GenAI is fun… until you try to keep it running in prod 😅
I’ve been seeing tons of GenAI demos lately and yeah, they look great. But every time I end up thinking, okay cool, but how do you operate this thing after the demo?
Recently AWS started talking more seriously about GenAIOps.
GenAI just doesn’t behave like normal apps. Same prompt, different output. “Works” but not always right. Tokens quietly draining money. Stuff breaks in weird ways.
Funny thing is, just recently I found myself using shell scripts and multi-stage Azure DevOps pipelines to build some guardrails and ops around GenAI workflows. Not fancy, but very real. And that’s when it hit me: yeah, this absolutely needs its own ops mindset.
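To make "guardrails in a pipeline" concrete, here's a minimal sketch of the kind of gate a pipeline stage could run — all names, metrics, and thresholds are illustrative, not what I actually shipped; in a real pipeline the pass rate and token counts would come from an eval job's output rather than being stubbed inline:

```shell
#!/usr/bin/env sh
# Hypothetical deploy gate: block the pipeline when eval quality
# regresses or token spend spikes. Values are stubbed so the
# sketch runs standalone.
set -eu

PASS_RATE="0.94"      # fraction of eval cases judged correct
TOTAL_TOKENS="120000" # tokens consumed by the eval run

# Numeric compare via awk (sh itself can't compare floats).
# check A B ge  -> succeeds when A >= B; check A B le -> A <= B.
check() {
  awk -v a="$1" -v b="$2" -v op="$3" 'BEGIN {
    if (op == "ge") exit !((a+0) >= (b+0))
    else            exit !((a+0) <= (b+0))
  }'
}

check "$PASS_RATE" 0.90 ge || { echo "quality gate failed"; exit 1; }
check "$TOTAL_TOKENS" 500000 le || { echo "cost gate failed"; exit 1; }
echo "guardrails passed"
```

Each gate is just an exit code, so any CI system (Azure DevOps, GitHub Actions, Jenkins) can fail the stage on it without plugins.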
AWS is basically saying the same: treat prompts, models, and agents like deployable artifacts. Monitor quality, not just uptime. Add safety, cost controls, and evals. It’s like MLOps… but leveled up for GenAI chaos.
This feels less like hype and more like reality catching up. We’re clearly moving from GenAI experiments to GenAI systems. And systems always need ops.
Good reads if you’re curious: https://aws.amazon.com/blogs/machine-learning/operationalize-generative-ai-workloads-and-scale-to-hundreds-of-use-cases-with-amazon-bedrock-part-1-genaiops/
I hope you are happy now @mods. 😜
#AWS #GenAIOps #GenerativeAI #DevOps #MLOps #CloudEngineering
https://redd.it/1pt3b7w
@r_devops
Terraform AWS Infrastructure Framework (Multi-Env, Name-Based, Scales by Config)
🚀 Excited to share my latest open-source project: a Terraform framework for AWS focused on multi-environment infrastructure management.
After building and refining patterns across multiple environments, I open-sourced a framework that helps teams keep deployments consistent across dev / qe / prod.
The problem:
Managing AWS infra across dev / qe / prod usually leads to:
- Configuration drift between environments
- Hardcoded resource IDs everywhere
- Repetitive boilerplate when adding “one more” resource
- Complex dependency management across modules
The solution:
A workspace-based framework with automation:
- ✅ Automatic resource linking — reference resources by name, not IDs. The framework resolves and injects IDs automatically across modules.
- ✅ DRY architecture — one codebase for dev / qe / prod using Terraform workspaces.
- ✅ Scale by configuration, not code — create unlimited resources WITHOUT re-calling modules. Just add entries in a .tfvars file using plain-English names (e.g., “prod_vpc”, “private_subnet_az1”, “eks_cluster_sg”).
What’s included:
- VPC networking (multi-AZ, public/private subnets)
- Internet gateway, NAT gateway, route tables, EIPs
- Security groups + SG-to-SG references
- VPC endpoints (Gateway & Interface)
- EKS cluster + managed node groups
Real example:
# terraform.tfvars (add more entries, no new module blocks)
eks_clusters = {
  prod = {
    my_cluster = {
      cluster_version = "1.34"
      vpc_name        = "prod_vpc"               # name, not ID
      subnet_name     = ["pri_sub1", "pri_sub2"] # names, not IDs
      sg_name         = ["eks_cluster_sg"]       # name, not ID
    }
  }
}
# Framework injects vpc_id, subnet_ids, sg_ids automatically
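The injection step can be sketched in plain Terraform as a name-to-ID map built from module outputs and a lookup over it — this is an illustrative pattern, not the framework’s actual internals, and the resource names are hypothetical:

```hcl
# Illustrative only: build a name -> ID map from subnets created
# with for_each, so consumers never hardcode IDs.
locals {
  subnet_ids_by_name = {
    for name, s in aws_subnet.private : name => s.id
  }
}

# tfvars supplies plain-English names...
variable "subnet_name" {
  type    = list(string)
  default = ["pri_sub1", "pri_sub2"]
}

# ...and a lookup resolves them into the IDs downstream
# modules (e.g., EKS) actually need.
locals {
  resolved_subnet_ids = [
    for n in var.subnet_name : local.subnet_ids_by_name[n]
  ]
}
```

Because the map is derived from resource attributes, Terraform’s dependency graph still orders creation correctly — the lookup is just sugar over the IDs it would compute anyway.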
GitHub:
https://github.com/rajarshigit2441139/terraform-aws-infrastructure-framework
Looking for:
- Feedback from the community
- Contributors interested in IaC patterns
- Teams standardizing AWS deployments
Question:
What are your biggest challenges with multi-environment Terraform? How do you handle cross-module references today?
#Terraform #AWS #InfrastructureAsCode #DevOps #CloudEngineering #EKS #Kubernetes #OpenSource #CloudArchitecture #SRE
https://redd.it/1qkjko7
@r_devops