Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Where are people using AI in DevOps today? I can't find real value

Two recent experiments highlight serious risks when AI tools modify Kubernetes infrastructure and Helm configurations without human oversight. Using kubectl-ai to apply “suggested” changes in a staging cluster led to unexpected pod failures, cost spikes, and hidden configuration drift that made rollbacks a nightmare. Attempts to auto-generate complex Helm values.yaml files resulted in hallucinated keys and misconfigurations, costing more time to debug than manually editing a 3,000-line file.

I ran

kubectl ai apply --context=staging --suggest

and watched it adjust CPU and memory limits, replace container images, and tweak our HorizontalPodAutoscaler settings without producing a diff or requiring human approval. In staging, that caused pods to crash under simulated load, inflated our cloud bill overnight, and masked configuration drift until rollback became a multi-hour firefight. Even its debug changes override the changes I made through ArgoCD, which then get reverted. I feel the concept is nice, but in practice it needs full context or it will never be useful. The tool feels like we are just throwing pasta against the wall.
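
One way to keep a human in the loop is to render a diff of every suggested change and refuse to apply it without explicit approval. The sketch below is only an illustration of that gate, not how kubectl-ai works: `render_diff`, `gated_apply`, and the `approve`/`apply` callbacks are hypothetical names, and the manifests are treated as plain text.

```python
import difflib


def render_diff(current: str, proposed: str) -> str:
    """Return a unified diff between the live manifest and the suggested one."""
    lines = difflib.unified_diff(
        current.splitlines(),
        proposed.splitlines(),
        fromfile="live",
        tofile="suggested",
        lineterm="",
    )
    return "\n".join(lines)


def gated_apply(current: str, proposed: str, approve, apply) -> bool:
    """Apply `proposed` only if it differs and a reviewer approves the diff.

    `approve` and `apply` are hypothetical callbacks: `approve(diff)` returns
    True or False, and `apply(manifest)` performs the actual change.
    """
    diff = render_diff(current, proposed)
    if not diff:
        return False  # nothing to change
    if not approve(diff):
        return False  # reviewer rejected the change
    apply(proposed)
    return True
```

In a real cluster the `approve` step could be as simple as printing the diff and waiting for a "yes", and for manifests on disk `kubectl diff -f` already shows what would change before anything is applied.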

Another example: I used AI models to scaffold a complex Helm values.yaml. The output ignored our chart’s schema and invented arbitrary keys like imagePullPolicy: AlwaysFalse and resourceQuotas.cpu: high. Static analysis tools flagged dozens of invalid or missing fields before deployment, and I spent more time tracing Kubernetes errors caused by those bogus keys than I would have spent manually editing our 3,000-line values file.
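
Invented keys like these can be caught mechanically before anything reaches the cluster. The sketch below is a minimal illustration, assuming the values have already been parsed into a dict; the `ALLOWED` schema and its key names are hypothetical and stand in for whatever the real chart defines (Helm charts can also ship a values.schema.json for the same purpose).

```python
# Hypothetical schema: nested dicts describe mappings, sets list the
# allowed strings for a key, and types describe plain scalar values.
ALLOWED = {
    "image": {
        "repository": str,
        "tag": str,
        "pullPolicy": {"Always", "IfNotPresent", "Never"},
    },
    "resources": {
        "limits": {"cpu": str, "memory": str},
    },
}


def validate(values: dict, schema: dict, path: str = "") -> list[str]:
    """Return one error string per unknown key or invalid value."""
    errors = []
    for key, value in values.items():
        where = f"{path}.{key}" if path else key
        if key not in schema:
            errors.append(f"{where}: unknown key")
            continue
        rule = schema[key]
        if isinstance(rule, dict):
            if isinstance(value, dict):
                errors.extend(validate(value, rule, where))
            else:
                errors.append(f"{where}: expected a mapping")
        elif isinstance(rule, set):
            if not isinstance(value, str) or value not in rule:
                errors.append(f"{where}: {value!r} not in {sorted(rule)}")
        elif not isinstance(value, rule):
            errors.append(f"{where}: expected {rule.__name__}")
    return errors
```

Running `validate` on generated output before a deploy turns "dozens of Kubernetes errors" into a short list of offending paths, which is usually quicker to review than the cluster events.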

Has anyone else captured any real, measurable benefits (faster rollouts or fewer human errors) without giving up control or visibility? Please share your honest war stories.

https://redd.it/1klgx3h
@r_devops
How to know if I'm suitable for an SRE/DevOps position

Hi folks

I've been a SWE for about 4 years now, and I'd consider myself a bit of a polyglot (fluent in lots of languages, front end to back end), and I've done a fair amount of work on the cloud and infrastructure side.

I'm curious if Reddit thinks I'd be capable of taking a job as an SRE or in DevOps based on my experience:
- Built and managed several Kubernetes clusters (no managed services)
- Built a multi-region, multi-vendor automated Kubernetes cluster deployer
- Worked with GitLab CI/CD to support releases for Spring Boot apps, various Node projects and more
- Built and maintained image scanning pipelines (using Trivy and Black Duck)
- Managed Terraform and Ansible projects for deploying infrastructure in AWS (including all the usual suspects: EC2, RDS, etc.)

Thanks!

https://redd.it/1klii7h
@r_devops
Self-hosted MySQL for production - how hard is it really?

I started software engineering in 2002, there was no cloud back then and we would buy physical servers, rent a partial rack in a datacenter, deploy the servers there and install everything manually, from the OS to the database.

With 10-15 servers we quickly needed someone full time to manage the OS upgrades, patches, etc.

I have a side project that's getting hit around 5,000 times per minute uncached. Behind the back end sits a MySQL 8 database currently managed by DigitalOcean. I'm paying around $100 per month for the database, with 4 GB of RAM, 2 vCPUs and around 8 GB of disk.

Separately, I've been a customer of OVH since 2008 and I've never had real problems with them. For $90 per month I can have something stupidly better: AMD Ryzen 5 5600X 6c @ 3.7 GHz/4.6 GHz, 64 GB of DDR4 RAM (can get 192 GB for only $50 extra), 2x 960 GB NVMe SSD in RAID, 25 Gbps unmetered private bandwidth.
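
To put the numbers above side by side, here is a small back-of-the-envelope sketch. It only uses the approximate figures quoted in this post; the function names are made up for illustration, and it compares RAM pricing only, ignoring the time spent on patching, backups, and failover.

```python
def per_second(hits_per_minute: float) -> float:
    """Convert a per-minute request rate to requests per second."""
    return hits_per_minute / 60


def dollars_per_gb(monthly_cost: float, ram_gb: float) -> float:
    """Monthly cost divided by the RAM included at that price."""
    return monthly_cost / ram_gb


load_rps = per_second(5_000)        # roughly 83 requests per second
managed = dollars_per_gb(100, 4)    # managed database: 25.0 dollars per GB
dedicated = dollars_per_gb(90, 64)  # dedicated server: about 1.41 dollars per GB
```

On RAM alone the dedicated box is roughly 18 times cheaper per GB, so the real question is how many hours of maintenance per month that difference has to pay for.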

My question: do any of you have practical experience these days with the work involved in keeping a database updated and upgraded? Is it worth the hassle? What tools / stack do you use for this?

Note: I'm not affiliated with either OVH or DigitalOcean. The question is really about self-managed bare metal (OVH, Hetzner, etc.) vs. managed cloud (AWS, DigitalOcean, Linode, etc.)

https://redd.it/1kljcuz
@r_devops
What’s one cloud concept you pretended to understand at first?

Let’s be real—cloud has a steep learning curve. In my first few months, I nodded along when people mentioned VPCs, but deep down I had no clue what was really happening under the hood.

I eventually had to swallow my pride, go back to basics, and sketch it all out on paper. It finally clicked, but man—I struggled before that 😅

What about you?
Was there a concept (IAM, subnets, container orchestration?) you “faked till you made it”?
Curious what tripped others up early on.

https://redd.it/1klk7qt
@r_devops
BPMN for DevOps?

I'm looking into using a BPMN tool (like Camunda) or engine (like Zeebe or something more OSS) to describe complex DevSecOps processes, and would love to pick your brain on this topic.

I'm somewhat surprised that BPMN is not the standard, and that even the best tools either only support DAGs or are just very developer-friendly (e.g. Temporal). Have you used BPMN for DevOps automation/orchestration?

My idea is to keep using GitLab CI for ... well ... CI, but that would end at building containers. All the other orchestration, including cross-project orchestration and integrating several tools (Datadog, Slack, etc.), would happen at the BPMN layer. (I'm still thinking of using either a GitLab job or a Kubernetes Job when I need a longer-running task, like a DB migration, but even that would be launched as part of BPMN.)

While I struggle to find people using BPMN for these tasks, I see more and more people using durable execution engines (e.g. Temporal) for it. If you were part of such a decision, would you mind sharing why you went one way or the other?

https://redd.it/1kllbaa
@r_devops
im finally a DevOps Engineer

5 years ago I had zero college, zero experience, no certifications, and no marketable skills coming out of the army. i set the goal for myself to become a DevOps engineer and today I did it.

got into IT with zero experience and one certification in 2020 when i got out of the army infantry.

first job was help desk, then sysadmin, then a couple tier 2/3 remote support positions including as a RHCSA at red hat. then i got a sysadmin position for my current company in August of 2023.

i worked my ass off. i have built full terraform/Terragrunt modules, deployment pipelines, and incident response tools for our clients, who are some of the biggest tech organizations in the world. google, zoom, red hat, Microsoft, etc... I do this across multiple cloud providers based on client needs. it's actually kind of shocking the amount of work we do at the level we do given the size of our team. I'm the only systems person and I get to touch infrastructure for large organizations on a regular basis.

today i got the email that i have officially been promoted to DevOps engineer.

im really proud of myself. I barely graduated high school because of my ADHD. I did well in the army but the violent environment was not good for my soul. college is very uncomfortable for me. I wasn't sure if I'd ever make a good living, let alone doing smart people stuff.

when I was getting into IT I looked for the most lucrative positions. then looked for the one that I thought seemed the most interesting and that was DevOps. now im a DevOps engineer.

I'm really proud of myself.



https://redd.it/1klp28x
@r_devops
Devops positions are harsh for mid-level

Hey buddies,

I have been in DevOps for 2 years, and in the tech industry for roughly 3 years. I am not a senior yet, more of a mid-level engineer working at a good company here in Cyprus, but I'm not getting what I want. I'm trying to switch jobs, like any normal person looking for a change, and my current company is pretty reputable and well known in the market. I have 2 AWS certifications and the CKA, and my CV scores a solid 99/100 on ATS reviewers. But I'm still not getting in. All positions are looking for seniors, and this is killing me.
I am doing really well in interviews, always showing good energy and answering all the technical questions as well as I can. I did more than 15 interviews this year and even reached the last stages with big companies like AWS and Exness, but bad luck is a curse. Either someone more experienced takes the role, or it gets filled internally, or the recruiter is a jerk... Any tips?

https://redd.it/1klulbg
@r_devops
How did your "trial by fire" go?

Hey! I'm in my first DevOps gig and it's kicking my butt. I was told that our environment is pretty complicated. We have a pretty intricate project pipeline with tons of jobs, rules, and variables. I'm having a hard time keeping up. I'm in year one and most of the tech we are using is technically new to me. It's making me want to quit, but there are some pretty smart and PATIENT people who are taking me under their wing a bit. I don't want to disappoint them. And I'll admit, at this point it isn't interesting work to me, but I feel like that's only because I haven't got a firm grasp on it. I've been a sys engineer for 20 years and I feel like I started at the bottom again.

What was your trial by fire like?

https://redd.it/1klxi7a
@r_devops
How do I level up beyond my golden-cage role?

Hey r/devops,

I’ve been in a junior DevOps role for 9 months: great pay, stable environment, but zero real mentorship or sandbox to experiment. I’ve built my own Puppet lab with Dockerfiles and even spun up a NetBox for our company (we use it to inventory all our VMs), yet I’m still stuck on company policies, black-box CI/CD, and no cloud exposure.

I’m not looking to be hand-held. Give me your tips:

• Self-training: Must-have home-lab setups, tools, projects or challenges that actually translate to production skills?

• Pipeline mastery: What are the best resources or exercises to go from “black box” to “I own any CI/CD stack”?

• Career acceleration: Beyond certs and Udemy, what separates a “good” DevOps engineer from a “great” one in 2025?

Drop your strongest advice—books, courses, hands-on labs, community challenges, mindset shifts—anything that helped you break out of a comfortable but stagnant role.

Let’s hear your best!

https://redd.it/1klxy2b
@r_devops
Personal ops horror stories?

Share your ops horror stories so we can share the pain.

I'll go first. I once misconfigured a prod MX server and pointed it to Mailtrap. Didn't notice for nearly 24 hours. On-call reached out first, only because we had a midnight migration that ALWAYS alerts/sends email; this time it didn't, and that caught the attention of whoever was on call. Fun time bisecting Terraform configs and commits for the next 3 hrs.

https://redd.it/1km0s7l
@r_devops
Essential Kubernetes Design Patterns

As Kubernetes becomes the go-to platform for deploying and managing cloud-native applications, engineering teams face common challenges around reliability, scalability, and maintainability.

In my latest article, I explore Essential Kubernetes Design Patterns that every cloud-native developer and architect should know—from Health Probes and Sidecars to Operators and the Singleton Service Pattern.

These patterns aren’t just theory—they’re practical, reusable solutions to real-world problems, helping teams build production-grade systems with confidence.

Whether you’re scaling microservices or orchestrating batch jobs, these patterns will strengthen your Kubernetes architecture.

Read the full article:
Essential Kubernetes Design Patterns: Building Reliable Cloud-Native Applications

https://www.rutvikbhatt.com/essential-kubernetes-design-patterns/

Let me know which pattern has helped you the most—or which one you want to learn more about!

#Kubernetes #CloudNative #DevOps #SRE #Microservices #Containers #EngineeringLeadership #DesignPatterns #K8sArchitecture

https://redd.it/1km21st
@r_devops
DevOps engineer live coding interview

Hey guys! I've never had a live coding interview for DevOps engineering roles. Does anyone have experience with what questions might be asked? I was told it won't be LeetCode-style or algorithmic. Any experience you can share would be greatly appreciated!

https://redd.it/1km7db7
@r_devops
Looking for DevOps Role

Hi everyone, I've been looking for a DevOps role for quite some time now. If you have any openings in your organization, please DM me with the company name. I have 6 years of experience with top cloud platforms, tools, and technologies. I prefer remote, but I'm open to relocating if a visa is provided.

https://redd.it/1km96dw
@r_devops
How do you handle scaling challenges in your devops setup?

Hey everyone! I’ve been running into some scaling issues with my current devops setup. How do you typically approach scaling when your infrastructure starts to hit its limits? Do you have any tools or strategies that have worked well for you? Would love to hear your thoughts and experiences!

https://redd.it/1kmcnm8
@r_devops
Is it normal to still feel nervous every time you deploy?

Hey everyone, We (at kuberns) have been focused on simplifying deployments for dev teams, and one thing we keep hearing is that no matter how automated your process gets, there’s always that moment of hesitation before hitting “deploy.”

Even with things like automated rollbacks, health checks, and continuous monitoring, we still see teams dealing with deployment anxiety.

We’ve built Kuberns to help minimize those risks and provide more confidence, but we’re curious to know: for those of you who've mastered deployment, what’s helped you overcome that “nervous moment”?

Looking forward to hearing your tips or stories, especially if you’ve used anything to reduce deployment stress.

https://redd.it/1kmdte5
@r_devops
I'm a DevOps engineer with strong AWS skills but weak fundamentals — how can I fill the gaps without burning out?

Hey folks,

I'm a DevOps engineer with a few years of hands-on experience — mostly focused on CI/CD, infrastructure automation, Kubernetes, observability, and cloud tooling.

I have strong proficiency in AWS and Terraform. I’ve built and managed production infrastructure, automated pipelines, and deployed scalable services with infrastructure as code. That part of the job feels natural to me.

But here's the thing:
I don’t have a programming background like many other DevOps engineers. I’ve never studied computer science, and I’ve always disliked “studying” in the traditional sense. Most of what I know came from solving real problems at work, often under pressure. This helped me get by, but I’ve realized that it also left serious gaps in my foundational knowledge.

For example:

I can deploy and troubleshoot apps in Kubernetes, but I couldn’t confidently explain what a `kubelet` is.
I work with Linux servers daily, but I’ve never deeply understood things like cgroups or namespaces.
I use networking tools all the time, but explaining how NAT, routing, or TCP really work makes me feel insecure.
I’ve never written a proper app — just shell scripts and YAML. I’d like to learn Go from scratch, but I’m not sure how to structure that.

I’m getting worried that these gaps will hold me back — especially in future interviews or higher-responsibility roles.
I genuinely want to fix this, but I need to do it in a sustainable way. Sitting down for hours of study doesn’t work well for me. I lose focus quickly, especially when I already “kind of” know the topic.

https://redd.it/1kmgjkx
@r_devops
I used to default to S3 for everything—until I realized not all storage is equal

When I started learning AWS, S3 felt like the answer to every storage need. Logs? S3. Backups? S3. App data? Yep—S3 again.

Then I ran into problems:

Needed fast reads → latency was too high
Needed a POSIX filesystem → oops, not S3
Needed relational structure → suddenly reinventing a database in JSON

That’s when I finally sat down and learned the why behind AWS storage options:

S3 is great for blobs and backups
EFS for shared file storage across instances
EBS for block storage tied to EC2
FSx if you need Windows or Lustre performance
And Glacier for deep archiving

Now I think less about “where to dump data” and more about “how it’ll be accessed.”
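
The "how will it be accessed" rule of thumb can be written down as a simple lookup. This is only a sketch of the list above, not an AWS API: the `Access` enum, its member names, and `pick_storage` are hypothetical, and real choices also depend on cost, throughput, and durability needs.

```python
from enum import Enum


class Access(Enum):
    """Hypothetical access patterns, taken from the list above."""
    OBJECT = "blobs and backups"
    SHARED_FILES = "shared file storage across instances"
    BLOCK = "block storage tied to one EC2 instance"
    WINDOWS_OR_LUSTRE = "Windows or Lustre file systems"
    ARCHIVE = "deep archiving"


# Mapping from access pattern to the service named in the post.
SERVICE_FOR = {
    Access.OBJECT: "S3",
    Access.SHARED_FILES: "EFS",
    Access.BLOCK: "EBS",
    Access.WINDOWS_OR_LUSTRE: "FSx",
    Access.ARCHIVE: "Glacier",
}


def pick_storage(access: Access) -> str:
    """Return the storage service suggested for a given access pattern."""
    return SERVICE_FOR[access]
```

For example, `pick_storage(Access.SHARED_FILES)` returns "EFS". Relational data is deliberately missing from the table, since that calls for a database rather than a storage service.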

Anyone else hit this wall before?
What helped you figure out the right fit for each use case?

https://redd.it/1kmlcbt
@r_devops
How do you dockerize your Java application?

Hey folks, I've started learning about Docker and so far I'm loving it. I realised the best way to learn is to dockerize something, and I already have my Java code with me.


I have a couple of questions for which I need some help

- I'm using a lot of localhost addresses in my code. I'm using a Caddy reverse proxy, Redis, MongoDB and the Java code itself, which has an embedded server (Jetty). All of them run on localhost with different ports
- I need to create separate containers for the Java code (jar), Caddy, Redis and MongoDB
- What am I going to do about all the localhost references? I have them in the Java code and in the Caddy config as well

It seems like a lot of work to manually replace localhost with the service name everywhere. Is manually changing localhost to the service name the only way to dockerize an application?
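
A common alternative to hard-coding either name is to read each host from an environment variable and fall back to localhost when it is not set. The sketch below shows the pattern in Python for brevity (in Java the same lookup is done with System.getenv); the `service_address` helper and the `REDIS_HOST`/`REDIS_PORT` style variable names are assumptions for illustration, not something Docker defines.

```python
import os


def service_address(name: str, default_port: int, env=os.environ) -> tuple[str, int]:
    """Resolve a dependency's host and port from the environment.

    Looks up <NAME>_HOST and <NAME>_PORT (hypothetical variable names) and
    falls back to localhost and the default port, so the same code runs
    unchanged on a laptop and inside a container.
    """
    prefix = name.upper()
    host = env.get(f"{prefix}_HOST", "localhost")
    port = int(env.get(f"{prefix}_PORT", default_port))
    return host, port
```

Outside Docker nothing is set and everything still points at localhost; in a Compose file you would set something like `REDIS_HOST=redis` on the app container, since containers on the same Compose network can reach each other by service name.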

Can you please guide me on this ?


https://redd.it/1kmmgq4
@r_devops
We started using Testcontainers to catch integration bugs before CI — huge improvement in speed and reliability

Our devs used to rely on mocks and shared staging environments for integration testing. We switched to Testcontainers to run integration tests locally using real services like PostgreSQL, and it changed everything.

No more mock maintenance
Immediate feedback inside the IDE
Reduced CI load and test flakiness
Faster lead time to changes (thanks DORA metrics!)

Wrote a detailed blog post on it here:

https://blog.abhimanyu-saharan.com/posts/catch-bugs-early-with-testcontainers-shift-left-testing-made-easy

Would love feedback or to hear how others are doing shift-left testing.

https://redd.it/1kmqt71
@r_devops
Seeking Advice from DevOps Experts for Hosting a Rental E-Commerce Platform

Hey seniors I need help!

I’m a 3rd-year CSE student working at an early-stage startup (full-stack + DevOps role). We’re building a rental e-commerce platform, and ~50-60% of our production-grade code is ready. Before deployment, I’d love some advice beyond just tooling—strategies, pitfalls, and real-world experiences.

Current Stack & Setup:
Infra: DigitalOcean (servers), S3 (object storage), CloudFront (CDN)
Orchestration: Docker Swarm (initially)
Monitoring: Prometheus + Loki + Grafana (planned)

Questions:
Best zero-downtime strategy for small teams? (Blue-green, canary, rolling?)

Docker Swarm gotchas in production?
How to handle sudden traffic spikes?
Common runtime errors to prep for?
Critical alerts for a rental platform?
Backup and failure strategy for Postgres/MongoDB/Redis?
Security tips?

Beyond these questions, feel free to share any other experience that might be helpful!

Thanks




https://redd.it/1kmqrc0
@r_devops