AI is flooding codebases, and a third of teams aren’t reviewing it before deploy
42% of devs say AI writes half their code. Are we seriously ready for that?
Cloudsmith recently surveyed 307 DevOps practitioners: not randoms, but actual folks in the trenches. Nearly 40% came from orgs with 50+ software engineers, and the results hit hard:
- 42% of AI-using devs say at least half their code is now AI-generated
- Only 67% review AI-generated code before deploy (!!!)
- 80% say AI is increasing OSS malware risk, especially around dependency abuse
- Attackers are shifting tactics: we're seeing increased slopsquatting and supply chain poisoning, because attackers know AI tools will happily pull in risky packages
As vibe coding takes a bigger seat in the SDLC, we’re seeing speed gains - but also way more blind spots and bad practices. Most teams haven’t locked down artifact integrity, provenance, or automated trust checks in their pipelines.
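One way to picture an "automated trust check" is a pipeline step that refuses any artifact whose digest does not match a pinned value. Below is a minimal Python sketch, assuming the expected SHA-256 comes from a lockfile or registry metadata you already trust. The function names are illustrative and not taken from the survey report.

```python
from __future__ import annotations

import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file's digest matches the pinned one."""
    # compare_digest does a constant-time comparison of the two hex strings.
    return hmac.compare_digest(sha256_of(path), expected_sha256.lower())
```

A CI job could call `verify_artifact` on each downloaded dependency and fail the build on the first mismatch. This only covers integrity. Provenance (who built the artifact, and from what source) needs signed metadata on top.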
Cool tech, but without the guardrails, we're just accelerating into a breach.
Does this resonate with you? If so, check out the free survey report today:
https://cloudsmith.com/blog/ai-is-now-writing-code-at-scale-but-whos-checking-it
https://redd.it/1lecppz
@r_devops
Cloudsmith
AI is now writing code at scale - but who’s checking it? | Cloudsmith
As Generative AI (GenAI) reshapes the software development landscape, the risks and complexities around managing what gets built, where it comes from, and how it’s secured are growing just as fast. The Cloudsmith 2025 Artifact Management Report dives into…
Help planning workers
Hey, I am building an app and I need to create jobs, plus workers for these jobs, to update my database.
I do not have experience with jobs, so here is my approach:
- I will use Redis to create a job queue
- I will use workers to consume that job queue
What would be better for the workers and Redis: my own VPS (starting at 15 dollars a month) with Docker Swarm or Kubernetes, or a container-as-a-service provider like Fly.io or Railway?
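For readers new to the pattern, the queue-plus-worker approach described above can be sketched in a few lines of Python. To keep the sketch self-contained, the standard library's `queue.Queue` stands in for Redis. In a real deployment the producer would typically push to a Redis list (for example with `LPUSH`) and the worker would block on it (for example with `BRPOP`). The job kind `update_user` and its handler are hypothetical placeholders for the actual database update.

```python
from __future__ import annotations

import json
import queue
import threading


def enqueue(jobs: queue.Queue, kind: str, payload: dict) -> None:
    """Serialize a job as JSON, as it would be stored in a Redis list."""
    jobs.put(json.dumps({"kind": kind, "payload": payload}))


def worker(jobs: queue.Queue, handlers: dict, results: list) -> None:
    """Consume jobs until a None sentinel arrives; one failed job must not kill the loop."""
    while True:
        raw = jobs.get()
        if raw is None:  # sentinel: shut down cleanly
            jobs.task_done()
            break
        job = json.loads(raw)
        try:
            results.append(handlers[job["kind"]](job["payload"]))
        except Exception as exc:  # record the failure and keep consuming
            results.append(f"failed: {exc!r}")
        finally:
            jobs.task_done()


def run_demo() -> list:
    jobs: queue.Queue = queue.Queue()
    results: list = []
    # Hypothetical handler standing in for a real database update.
    handlers = {"update_user": lambda p: f"updated user {p['id']}"}
    t = threading.Thread(target=worker, args=(jobs, handlers, results))
    t.start()
    enqueue(jobs, "update_user", {"id": 1})
    enqueue(jobs, "update_user", {"id": 2})
    jobs.put(None)
    t.join()
    return results
```

The worker loop looks the same whether it runs on a VPS or a managed container platform, so the hosting choice mostly comes down to cost and how much operational work you want to take on.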
https://redd.it/1lecs0e
@r_devops
Reddit
From the devops community on Reddit
Explore this post and more from the devops community
Hackathon challenge: Monitor EKS with literally just bash (no joke, it worked)
Had a hackathon last weekend with the theme "simplify the complex" so naturally I decided to see if I could replace our entire Prometheus/Grafana monitoring stack with... bash scripts.
Challenge was: build EKS node monitoring in 48 hours using the most boring tech possible. Rules were no fancy observability tools, no vendors, just whatever's already on a Linux box.
What I ended up with:
DaemonSet running bash loops that scrape /proc
gnuplot for making actual graphs (surprisingly decent)
12MB total, barely uses any resources
Simple web dashboard you can port-forward to
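To show what "scraping /proc" involves, here is a small sketch of computing CPU utilization from two samples of `/proc/stat`. It is written in Python for illustration and is not the author's bash script. It assumes the standard aggregate `cpu` line, whose first eight counters are user, nice, system, idle, iowait, irq, softirq and steal.

```python
from __future__ import annotations

import time


def parse_cpu_line(stat_text: str) -> tuple[int, int]:
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
            # Only the first eight counters are summed: guest time is already
            # included in user/nice, so adding it again would double count.
            return idle, sum(values[:8])
    raise ValueError("no aggregate cpu line found")


def cpu_busy_percent(before: str, after: str) -> float:
    """Percentage of non-idle time between two /proc/stat snapshots."""
    idle_a, total_a = parse_cpu_line(before)
    idle_b, total_b = parse_cpu_line(after)
    total_delta = total_b - total_a
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1 - (idle_b - idle_a) / total_delta)


def sample_local(interval: float = 1.0) -> float:
    """Take two live samples on a Linux host (not exercised in the example below)."""
    with open("/proc/stat") as fh:
        first = fh.read()
    time.sleep(interval)
    with open("/proc/stat") as fh:
        second = fh.read()
    return cpu_busy_percent(first, second)
```

For example, snapshots of `cpu 100 0 100 800 ...` followed by `cpu 150 0 150 900 ...` cover 200 elapsed ticks, 100 of them idle, so `cpu_busy_percent` reports 50% busy.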
The kicker? It actually monitors our nodes better than some of the "enterprise" stuff we've tried. When CPU spikes I can literally cat the script to see exactly what it's checking.
Judges were split between "this is brilliant" and "this is cursed" lol (TL;DR - I won)
Now I'm wondering if I accidentally proved that we're all overthinking observability. Like maybe we don't need a distributed tracing platform to know if disk is full?
Posted the whole thing here: https://medium.com/@heinancabouly/roll-your-own-bash-monitoring-daemonset-on-amazon-eks-fad77392829e?source=friends_link&sk=51d919ac739159bdf3adb3ab33a2623e
Anyone else done hackathons that made you question your entire tech stack? This was eye-opening for me.
https://redd.it/1ledzu9
@r_devops
Medium
🩺 Roll Your Own Bash Monitoring DaemonSet on Amazon EKS
A sturdy, zero‑vendor‑lock‑in path to cluster observability — no Prometheus, no Grafana, no fuss
Reading Material
Hello DevOps community,
I'm new here but thought it would be a good place to start. Lately I've realized that Reddit being my default time filler is not as appealing as it used to be. Many times I've thought, I wish I was reading something actually beneficial to my life.
I am a cloud engineer, and I mostly focus on automation at scale. Do you all have any staple books that still hold weight today, even if they were written years ago? I don't read a lot, especially in tech, but my brain defaults to "if it was published 10 years ago, it's probably out of date". So I came to ask which books you think have held up, and maybe where you go to "learn more by reading more".
Thanks!
https://redd.it/1lehcav
@r_devops
A quirky, fun and gamified Wordle for hard-core Devops pals! 🎮
Helloo!
I just built a gamified version of Wordle, but exclusively with words related to DevOps, Observability and Monitoring.
There will be a five-letter word, and you have five guesses. The score is based on the time taken to crack it. There's also a hint (maybe slightly cryptic) that can help you guess right.
Soo be on your toes and think right!
Try it out here at - https://signoz.io/todaysdevopswordle
Play ON! 🎮 🎲
https://redd.it/1lej0bu
@r_devops
signoz.io
DevOps Wordle | SigNoz
This game will help you learn the evolving language of observability, DevOps, and monitoring in a playful way. After each game, you will find resources to explore the word.
What’s the best tooling stack your company uses for logging?
I work at a large bank and am responsible for handling a massive volume of logs every day. In banking, it’s critical to trace errors as quickly as possible because it involves money and customers. We use the ELK stack as our solution, and it’s very effective thanks to its full-text search. ELK is great, but it has one drawback: its compressed log volume is huge, which drives up maintenance and storage costs. We’ve looked into Loki and ClickHouse as alternatives, but neither can match ELK’s log-tracing speed with full-text search. Do you have a more balanced solution? What logging system are you running at your company?
https://redd.it/1lekk07
@r_devops
People looking for a career in Network Engineering, Telecom or Cloud Network Engineering and don’t know where to start…just hit me up!
If you are interested in working in network automation or cloud computing, just hit me up.
To be more specific, some job roles in this field include:
1. SDN Engineer / SDN Developer
2. NFV Engineer / VNF Integration Engineer
3. Network Automation Engineer
4. Cloud Network Architect
5. Telecom Network Engineer (5G Core)
6. DevOps / NetDevOps Engineer
7. Network Security Engineer (Virtualized Environments)
and many more…
If you’re looking to build up your skills in these and get placed, just hit me up ASAP!
Strictly for people in India
If you’re a fresher who’s stuck and unsure what to do next, I have a great opportunity for you. DM me!
https://redd.it/1lem3wm
@r_devops
SaltStack vs Puppet or something else
Hi,
We still deploy a ton of virtual machines in all sorts of environments, and Ansible has done a great job so far during deployments. But we're seeing more and more cases where Ansible isn’t a good fit — usually because the machines aren't reachable during deployment, or the setup is just weird.
So now we’re looking at alternatives that can live on the VM and pull configs themselves. SaltStack and Puppet are the two I’m looking at. We’re not planning to go all-in with config management - the main goal is just to kick off some Microsoft DSC stuff once the VM is up and running. This includes installing some software or so during the deployment.
I’ve used Puppet before, but only as a “consumer” - writing manifests and modules (beginners level), but never setting up or running the backend.
Anyone using Salt or Puppet like this? Especially curious about the pull model - having the agent phone home is a big plus for us.
SaltStack is open source, but it's backed by Broadcom. Given their previous actions, should we even consider them?
https://redd.it/1len93f
@r_devops
As someone who already knows other cloud providers, how long does it take me to learn Azure?
I'm a senior software engineer, a DevOps engineer and a sysadmin with a 20+ year career, so depending on the company I'm working at, I do the role asked of me.
I used Azure a bit in 2015 and 2018. Currently there's a company that might hire me but needs an Azure expert. I'm already familiar with AWS, Google Cloud, Oracle Cloud and Hetzner, to name a few.
I didn't work much with Azure simply because the companies I worked at preferred to use other cloud providers.
How hard is it for someone like me to pick up Azure? Is it a deal breaker? Can I learn it in 2 weeks to get through the interview or not?
https://redd.it/1lep4wl
@r_devops
I built a free visual Kubernetes YAML generator – would love your feedback!
Hey everyone!
I just released an open-source tool called Kube Composer — it’s a browser-based visual editor that helps you build Kubernetes YAML without writing it by hand.
🧩 Drag-and-drop UI for defining resources
📄 Clean YAML export
🌐 No login, no install — runs entirely in the browser
🔗 https://kube-composer.com
💻 GitHub: https://github.com/same7ammar/kube-composer
I built this to reduce the pain of manually writing and validating YAML over and over again. Still early stage, so I’d love your feedback, suggestions, or even bug reports.
Happy to answer any questions!
https://redd.it/1leqxf8
@r_devops
Kube Composer
Kube Composer - Free Kubernetes YAML Generator
Generate production-ready Kubernetes YAML files in minutes with our intuitive visual editor. Perfect for developers and DevOps teams. No registration required!
Interview Question, Is the Interviewer Wrong?
Had an interview recently at a large financial firm with their Director of DevOps.
One of the questions was regarding my experience with monitoring/logging tools, where I was asked to explain examples of my use along with what I have used.
The interviewer seemed to scold me for the fact that our company uses both Prometheus and Loki. I politely explained the differences between Prometheus (metrics) and Loki (logging); however, the interviewer seemed adamant that we should be down-selecting one of the two, as they are apparently the same.
I think I answered all his other questions well, but am I going mad? We have used Loki as a logging tool and Prometheus as part of our monitoring stack. That was the final question, twenty minutes into my thirty-minute interview.
I would have thought a person in this position, in all of his wisdom, would have known the difference between the two.
https://redd.it/1lesiem
@r_devops
How do you justify your salary expectations
Hi, so this is my first time looking for a switch after landing my first job as a DevOps Engineer. I have finally started to get some interview calls.
Recently I gave an interview for an early stage startup (team of about 15-20 people). They had a 6 days working policy and the work hours were also not that flexible so I wasn't sure that I would want to join because suddenly work pressure would get 2-3x for me. I still gave it for the interview experience.
The interview had 2 rounds. It went well, but I struggled to answer 2 questions:
1. My biggest professional achievement
2. How would you justify the salary ask (50% raise)
Now, I only have 1.5 years of experience, and 5 months of that were spent in training/learning, doing very basic things. Only in the last 8-9 months have they started giving me some substantial work.
How do you guys generally answer these questions?
https://redd.it/1lezh3p
@r_devops
Does anyone use Docker Compose in production? I do, and here are my thoughts.
I work with a few clients, building, deploying, and maintaining internal business software tailored to each of their needs. These apps typically solve very specific operational problems and are deployed on VPS instances, running with docker compose. The setup is simple and works like a charm.
One of the biggest advantages of using docker compose in production is how straightforward it makes managing multi-container applications. Instead of juggling dozens of commands or configuring complex orchestration tools, everything stays in a single docker-compose.yml file. That means your entire environment, from databases to web servers to caches, can be spun up or updated with a single command.
For deployments, I use a simple manual workflow (shell script): run tests, check lints, build the Docker image, export it, and transfer it to the server. It’s intentionally minimal, no CI/CD tools involved, just a few reliable terminal commands.
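A workflow like the one described could look roughly like the sketch below. It is written as a Python function that returns the commands, so the sequence can be inspected before anything runs. The post does not name its tools, so the test runner (`pytest`), the linter (`ruff`), the remote directory, and the final load-and-restart steps on the server are all assumptions for illustration.

```python
from __future__ import annotations

import shlex
import subprocess


def deploy_commands(image: str, host: str, remote_dir: str = "/srv/app") -> list[list[str]]:
    """Build the ordered command list: test, lint, build, export, transfer, load, restart."""
    archive = image.replace("/", "_").replace(":", "_") + ".tar"
    remote_archive = f"{remote_dir}/{archive}"
    return [
        ["pytest"],                                  # assumed test runner
        ["ruff", "check", "."],                      # assumed linter
        ["docker", "build", "-t", image, "."],
        ["docker", "save", "-o", archive, image],    # export the image to a tarball
        ["scp", archive, f"{host}:{remote_archive}"],
        # Assumed server-side steps: load the image, then recreate the services.
        ["ssh", host, f"docker load -i {shlex.quote(remote_archive)}"],
        ["ssh", host, f"cd {shlex.quote(remote_dir)} && docker compose up -d"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False, stopping on the first failure."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run(deploy_commands("acme/app:1.0", "deploy@example.com"))` prints the plan without touching anything. Because `check=True` aborts on the first failure, a failing test or lint step prevents the image from being built or shipped.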
The challenge I’ve faced is monitoring containers across multiple servers, especially logs. To deal with that, I set up a lightweight solution that collects logs from different machines into one place, where I can search and filter as needed.
So far, I haven’t had any problems using docker compose in production. I like it, and I’ll probably keep using it as long as it continues to fit my needs.
What’s your experience with docker compose in production?
https://redd.it/1lezx8h
@r_devops
What are some really cool projects that you've worked on, participated in, or seen people create?
I'm getting more and more involved in automation and devops (personally). I'd love to know what projects people have worked on to see if it'll inspire new ideas in me.
https://redd.it/1lf1xcg
@r_devops
Did anyone try openobserve?
Hey folks, as part of our observability pipeline we have Dynatrace, which is super expensive, so we are planning to look for open-source solutions, but not too many tools because we are a small team. I came across OpenObserve and kinda liked it, but I want to hear your opinions about the platform.
Please advise!!
https://redd.it/1lez029
@r_devops
LLMs Don’t Crash (They Just Quietly Ruin Your Reputation)
We’re seeing more companies add generative AI to their products...chatbots, smart assistants, summarizers, search, you name it. But many of them ship features without any real testing strategy. That’s not just risky, it’s reckless!!
One hallucination, a minor data leak, or a weird tone shift in production, and you’re dealing with trust issues, support tickets, legal exposure or worse: people getting hurt.
But how do you test GenAI-enabled applications? Below are lessons that we have learned!
Start with defining what “good enough” means.
Seriously. What’s a good output? What’s wrong but tolerable? What’s flat-out unacceptable? Teams often skip this step, then argue about results later..
Use real inputs.
Not polished prompts. The kind of messy, typo-ridden, contradictory stuff real users write when they’re tired or frustrated. That’s the only way to know how it’ll perform.
Break the thing!!
Feed it adversarial prompts, contradictions, junk data. Push it until it fails. Better you than your users.
Track how it changes over time.
We saw assistants go from helpful to smug, or vague to overly confident, without a single code change. Model drift is real, especially with upstream updates.
Save everything.
Prompt versions, outputs, feedback. If something goes sideways, you’ll want a full trail. Not just for debugging, also for compliance.
Run chaos drills.
Every quarter, have your engineers or an external red team try to mess with the system. Give them a scorecard. Fix whatever they break.
Don’t fake your data.
Synthetic data has a place...especially for edge cases or sensitive topics, but it won’t reflect how weird and unpredictable actual users are. Anonymized real data beats generated samples.
If you’re in the EU or planning to be, the AI Act is NOT theoretical.
Employment tools, legal bots, health stuff, even education assistants, all count as high-risk. You’ll need formal testing and traceability. We’re mapping our work to ISO/IEC 42001 and the NIST AI Risk Management Framework now because we’ll have to show our homework.
Use existing tools.
We’re using LangSmith, Weights & Biases, and Evidently to monitor performance, flag bad outputs, detect drift, and tie feedback back to the prompt or version that caused it.
Once it’s live, the job’s just beginning..
You need alerts for prompt drift, logs with privacy controls, feedback loops to flag hallucinations or sensitive errors, and someone on call for when it says something weird at 2 a.m.
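The "save everything" and traceability points above can be made concrete with an append-only log that ties every output to the prompt version that produced it. Below is a minimal Python sketch, assuming a local JSONL file and a version ID derived by hashing the prompt template. Both are illustrative choices, not how LangSmith, Weights & Biases, or Evidently store their data.

```python
from __future__ import annotations

import hashlib
import json
import time
from pathlib import Path


def prompt_version(template: str) -> str:
    """Derive a short, stable ID from the template text so any edit yields a new version."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def log_interaction(log_path, template: str, user_input: str, output: str,
                    feedback: str | None = None) -> dict:
    """Append one record per model call; nothing is ever rewritten in place."""
    record = {
        "ts": time.time(),
        "prompt_version": prompt_version(template),
        "user_input": user_input,
        "output": output,
        "feedback": feedback,
    }
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def records_for_version(log_path, version: str) -> list[dict]:
    """Pull every interaction produced by one prompt version, e.g. after a bad report."""
    with Path(log_path).open(encoding="utf-8") as fh:
        return [r for r in map(json.loads, fh) if r["prompt_version"] == version]
```

The sketch stores raw user input. To get the "logs with privacy controls" mentioned above, a real system would redact or pseudonymize sensitive fields before writing them.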
This isn’t about perfection, but rather about keeping things under control, and keeping people safe! GenAI doesn’t come with guardrails, instead, we have to build them!
What are you doing to test GenAI that actually works? What doesn't work in your experience?
https://redd.it/1lf6847
@r_devops
A Brief DevOps History: The Roots of Infrastructure as Code
I came across this article on the history of DevOps practices and tools, and felt like it should be shared - https://thenewstack.io/a-brief-devops-history-the-roots-of-infrastructure-as-code/
https://redd.it/1lf73ts
@r_devops
The New Stack
A Brief DevOps History: The Roots of Infrastructure as Code
If you look at the history of computing, configuration management tools have been around since the 1970s. How did we get from make files to Terraform configuration?
How should I manage prerequisites for this application?
I have inherited a very old application that has some prerequisites, including Java, the VC++ redistributables, and some SQL ODBC drivers. It has been deployed and maintained manually so far and is in a bit of a sorry state.
Should these prerequisite installs be completed as part of the application's release process, or during server provisioning?
These are very old dependencies that are unlikely to change, even for things like vulnerability management (I know, it’s not good).
I have no control over the image put onto the VM.
https://redd.it/1lf75m7
@r_devops
How do you check if Node.js is using all of the available memory in your container?
I noticed that when I run an app on Windows, the hard memory limit is 4 GB, but in my containers it seems to be 2 GB. How do I increase that? I want to raise the maximum memory for both the container and Node.js itself. I found a way to do it on Windows, but I need to do it in my containers and want to find the best approach.
https://redd.it/1lf8u1l
@r_devops
How are you actually handling observability in 2025? (Beyond the marketing fluff)
I've been diving deep into observability platforms lately and I'm genuinely curious about real-world experiences. The vendor demos all look amazing, but we know how that goes...
What's your current observability reality?
For context, here's what I'm dealing with:
* Logs scattered across 15+ services with no unified view
* Metrics in Prometheus, APM in New Relic (or whatever), errors in Sentry - context switching nightmare
* Alert fatigue is REAL (got woken up 3 times last week for non-issues)
* Debugging a distributed system feels like detective work with half the clues missing
* Developers asking "can you check why this is slow?" and it takes 30 minutes just to gather the data
The million-dollar questions:
1. What's your observability stack? (Honest answers - not what your company says they use)
2. How long does it take you to debug a production issue? From alert to root cause
3. What percentage of your alerts are actually actionable?
4. Are you using unified platforms (Datadog, New Relic) or stitching together open source tools?
5. For developers: How much time do you spend hunting through logs vs actually fixing issues?
What's the most ridiculous observability problem you've encountered?
I'm trying to figure out if we should invest in a unified platform or if everyone's just as frustrated as we are. The "three pillars of observability" sound great in theory, but in practice it feels like three separate headaches.
https://redd.it/1lf9wge
@r_devops
Tooltitude for YAML extension
We recently released a new extension: Tooltitude for YAML. YAML is widely used in devops, so we think this is relevant to members of this community.
It provides the following features:
- Configurable YAML formatter, which allows setting indent size, and the indentation style for lists
- Outline, including the breadcrumbs bar
We recently released it, so if you have feature requests, feel free to share them with us here or on the issue tracker. Read more: https://marketplace.visualstudio.com/items?itemName=tooltitudeteam.tooltitude-ym
P.S. We have been creating extensions for more than 2 years. The most popular of them is Tooltitude for Go: https://marketplace.visualstudio.com/items?itemName=tooltitudeteam.tooltitude
https://redd.it/1lfbc53
@r_devops
Visualstudio
Tooltitude for YAML - Visual Studio Marketplace
Extension for Visual Studio Code - Productivity features for YAML