DevOps team in the AI era
It feels like in the near future DevOps teams will be busy building, supporting, and maintaining remote MCP servers across different teams. Kinda becoming AI tool enablers.
I can imagine the requests will be “team, we are starting a new project, so we need support for a new tool in the MCP server” or “please fix a bug in this MCP because our AI client recently got a wrong response”. CI/CD for MCP 😅 hallucination monitoring dashboards
https://redd.it/1ldebv1
@r_devops
We reduced our Kubernetes costs by 40% using automation — here’s what helped most
In our Kubernetes clusters, we've been focusing a lot on cost optimisation. We wanted to share a few minor yet significant adjustments that we found to be effective (we'd love to know what else is working as well):
✅ Developer namespaces were automatically reduced after business hours.
✅ Appropriate pod requests and limits according to actual usage (no more 2Gi on idle jobs 😅)
✅ Remaining debug pods, outdated replicas, and unused PVCs were cleaned up.
✅ To cut down on noise, usage-based triggers were used in place of always-on alerts.
In addition to saving a tonne of engineering hours, AlertMend (https://alertmend.io/) helped us reduce idle resources by tying Prometheus metrics to cost insights and automatically running cleanup/scale workflows.
I'm curious about what other people are doing to save money over time, particularly if you're automating using Prometheus, scripts, or third-party tools.
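For what it's worth, the after-hours scale-down in the first bullet doesn't need a product at all; here is a minimal CronJob sketch, where the `env=dev` namespace label, the schedule, and the `scaler` service account (which needs RBAC to patch deployments) are my assumptions, not from the post:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev
  namespace: ops
spec:
  schedule: "0 19 * * 1-5"            # 19:00 on weekdays, cluster time zone
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler  # assumed SA with permission to scale deployments
          restartPolicy: OnFailure
          containers:
            - name: scaler
              image: bitnami/kubectl:latest
              command: ["/bin/sh", "-c"]
              args:
                - |
                  # scale every deployment in dev-labelled namespaces to zero
                  for ns in $(kubectl get ns -l env=dev -o name | cut -d/ -f2); do
                    kubectl scale deploy --all --replicas=0 -n "$ns"
                  done
```

A mirror-image CronJob in the morning scales things back up, or developers can `kubectl scale` manually when they need an environment off-hours.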
https://redd.it/1ldfnsw
@r_devops
SREs – got 2 mins?
Working on a blog post about how (or if) AI is actually useful in incident management and observability. Trying to include thoughts from folks.
If you're an SRE or work on infra/on-call stuff, would love to hear from you. Even if your team hasn't touched AI tools yet, that’s super relevant.
**Form’s here (3-5 mins tops):**
👉 [https://docs.google.com/forms/d/e/1FAIpQLSc5Sxwv8ebPJD943xNKTZPKSkb0ECozEqrZzmjRy7K2AvRH4A/viewform](https://docs.google.com/forms/d/e/1FAIpQLSc5Sxwv8ebPJD943xNKTZPKSkb0ECozEqrZzmjRy7K2AvRH4A/viewform)
# A few things:
* No spam, no sales, just writing a blog.
* You can stay anonymous, or there’s an option to be quoted if you're cool with that.
* Not asking for any infra details. Just your takes.
Will share the post here once it's live if folks are curious. Appreciate any responses 🙏
https://redd.it/1ldhrno
@r_devops
Who's using Backstage? What are your use cases?
Hey everyone,
I’m curious to hear if anyone is actively using [Backstage](https://backstage.io/) in production. I'm evaluating it for internal developer portals and wanted to get a better sense of real-world use cases.
* What are you using Backstage for?
* Which plugins do you rely on most?
* Any gotchas, lessons learned, or things you’d do differently?
Would really appreciate hearing about your setups — from solo dev projects to large orgs!
Thanks in advance 🙌
https://redd.it/1ldjjcu
@r_devops
Automation VS SOX Compliance - any insights?
I have been automating a lot of financial reporting for my employer using a variety of tools like Power Platform and ETL/ELT (Informatica, Snowflake, Azure Analysis Services, i.e. AAS).
Our accounting suite is SAP ECC (will likely migrate to S/4HANA by 2027).
And then our auditors yelped "SOX ITGCs/ITACs!"
(Sarbanes-Oxley Act Information Technology General/Application Controls; basically, publicly traded companies need to disclose every single step in the data flow to auditors to guarantee data integrity between source and target.)
And they made it abundantly clear that automation cannot be used where any data flow could affect data integrity, as it would have to be re-reviewed step by step in each audit.
They (EY) make it seem like a black-and-white thing, and frankly in a patronising manner. For instance, quarterly exports from SAP supported by screenshots from the moment of capture.
So what to do?
I am mainly looking for general insights, so do share. Sources on ITAC controls would be even better (ITGCs are straightforward; ISO 27001), but my issue in particular focuses on two parts:
1. SOX Compliance with middleware
We use both Informatica and Snowflake. Both offer SOX compliance controls; neither is set up yet.
But our issue is that we were previously working on Informatica with a SQL data warehouse (AAS).
Now we are moving to Snowflake, but we are still using Informatica to move data from SAP to Snowflake.
I feel that is a step too many, as it would require the same controls in both Informatica and Snowflake.
I also understand this is the only way to have continuous monitoring in place (as opposed to snapshots), which is where SOX 404 is heading, from what I understand.
2. SOX Compliance without middleware
Limiting the data lineage from source (SAP) to target (audit report) is an obvious answer.
But now I want to play Devil's Advocate:
Do I have to do these repeatable steps manually?
Or:
Can't RPA do it?
Hypothetically (seriously, I have NOT done this... yet), suppose I were to implement automation through a mix of Python and maybe some Excel; on the surface it would still look like I manually exported a quarterly report.
That way it is just a few repeatable steps automated through a form of RPA (Robotic Process Automation) under my username and without touching data integrity (no change to the source data).
And it could save the company hours. Seriously, we have one guy losing half a day each time he needs to do a data dump of SAP's ACDOCA table.
Auditors would not see the difference.
Okay, I could also have the Python code audited, but is that really necessary when a process is automated at the user level?
SOX is supposed to be about controls, not manual tedium. That's not what they (EY) are having us believe however.
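One way to make that kind of user-level automation easier to defend is to have the script emit its own control evidence alongside the export. This is a minimal sketch under my own assumptions (the file names, log format, and the idea of hashing the export are mine, not an EY-approved control):

```python
import csv
import datetime
import hashlib
import json
import os
import pathlib


def export_with_evidence(rows, out_path, log_path="export_audit_log.jsonl"):
    """Write the report and append a tamper-evident audit record."""
    out = pathlib.Path(out_path)
    with out.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    # SHA-256 of the exact bytes written: any later edit breaks the match
    digest = hashlib.sha256(out.read_bytes()).hexdigest()
    record = {
        "file": out.name,
        "sha256": digest,
        "rows": len(rows),
        "user": os.getenv("USER") or os.getenv("USERNAME") or "unknown",
        "utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    with open(log_path, "a") as log:
        log.write(json.dumps(record) + "\n")
    return record
```

The logged hash plus timestamp plays the same role as the "screenshot at the moment of capture": if the exported file is changed after the fact, it no longer matches the recorded digest.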
https://redd.it/1ldklhc
@r_devops
Critical Python Package Vulnerability Now Actively Exploited – CVE-2025-3248
There's a critical unauthenticated RCE vulnerability (CVSS 9.8) in Langflow (<1.3.0), a widely-used Python framework for building AI apps (70k+ GitHub stars, 21k+ PyPI downloads/week).
Link to blog post:
https://cloudsmith.com/blog/cve-2025-3248-serious-vulnerability-found-in-popular-python-ai-package
Attackers are actively exploiting this flaw to install the Flodrix DDoS botnet via the /api/v1/validate/code endpoint, which (incredibly) uses ast.parse() + compile() + exec() without auth.
If you're pulling anything from PyPI or running Langflow-based AI services exposed to the internet, you should check your versions now.
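A quick way to check an environment is to compare the installed version against the fixed release. The helper below is my own sketch; it treats anything below 1.3.0 as affected (per the post) and does a naive numeric compare that assumes plain x.y.z version strings:

```python
from importlib import metadata


def langflow_vulnerable(installed=None):
    """Return True if the installed Langflow is below the fixed 1.3.0 release."""
    if installed is None:
        try:
            installed = metadata.version("langflow")
        except metadata.PackageNotFoundError:
            return False  # not installed, nothing to patch
    # naive compare: fine for plain x.y.z, not for pre-release suffixes
    def parse(v):
        return tuple(int(p) for p in v.split(".")[:3])
    return parse(installed) < (1, 3, 0)
```

Run it in each environment that might host Langflow; for anything more exotic than x.y.z versions, use `packaging.version` instead of the naive parser.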
https://redd.it/1ldlfhg
@r_devops
Share your idea for my setup.
Hey r/devops!
I have my own freelancing company, and I would like to offer hosting to my clients. After studying options and considering my budget, I settled on Oracle Cloud and found that I can even have a free K8s cluster with 4 nodes. If you were in my position and had to set this up, while also serving some applications from this cluster, setting up CI/CD for them, and monitoring their status, how would you tackle it?
https://redd.it/1ldm8dz
@r_devops
Flutter Developer Thinking of Switching to Cloud Engineering – Is It Worth It? Where to Start?
Hey everyone,
I’m currently working as a Flutter developer and have been in mobile app development for a while now. Lately, I’ve been really curious about Cloud Engineering — the idea of building scalable infrastructure, working with DevOps tools, and understanding cloud platforms like AWS, Azure, or GCP sounds exciting.
But honestly, I have no idea where to start.
Is it worth making the switch from Flutter to Cloud Engineering? How steep is the learning curve? And if I do want to start exploring, are there any beginner-friendly tutorials or roadmaps you’d recommend?
I’m not planning to completely abandon mobile development just yet, but I’d love to eventually land a role in cloud or DevOps. Any advice, insights, or resources would be super appreciated.
Thanks in advance!
https://redd.it/1ldoc40
@r_devops
How to commit a bugfix for PROD in main when few commits should not get transported?
Hello everyone,
Let's say there is a main branch which has been deployed to Prod. Then there are additional commits pushed to main via pull requests, so now main is ahead of production by 2 commits. Then a bug is found in Prod which requires an urgent fix. The fix is ready but not yet merged to the main branch. The condition is that the 2 commits should not be moved to PROD, only the fix which came after those 2 commits. How can this work out?
The stack looks as below, better read from bottom to top:
--- BugFix (I want only this to get deployed, and not the 2 commits from wave1)
--- wave1 feature code
--- wave1 enhancement
--- main (that's where wave0 exists and got deployed to PROD)
========================================
One possible solution is to comment out the code from the 2 commits in a new commit along with the fix, and then deploy.
The other is to create branches specific to releases, such as release/wave0, and continue with main. At the end, create release/wave1 from main and start working on the next wave in main.
Are there any alternatives?
Thanks
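One common alternative is to cut a hotfix branch from whatever PROD actually runs and cherry-pick only the fix commit onto it. A toy repo showing the idea (the `prod` tag and `hotfix/prod` branch name are made up for the illustration):

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email you@example.com && git config user.name you
echo wave0 > app.txt && git add app.txt && git commit -qm "wave0"
git tag prod                                   # mark what PROD currently runs
echo feature >> app.txt && git commit -qam "wave1 feature"
echo enhancement >> app.txt && git commit -qam "wave1 enhancement"
echo fix > fix.txt && git add fix.txt && git commit -qm "bugfix"
fix_sha=$(git rev-parse HEAD)
git checkout -qb hotfix/prod prod              # start from the prod state
git cherry-pick -x "$fix_sha"                  # bring over ONLY the bugfix
git log --oneline                              # bugfix on top of wave0, no wave1
```

Deploy `hotfix/prod`; main already contains the original fix commit, so nothing needs back-porting. Note cherry-picking only works cleanly when the fix doesn't depend on the wave1 changes.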
https://redd.it/1ldp3bz
@r_devops
Infisical vs others
Thoughts on infisical.com?
Anyone using it in production?
Seems to me that it compares with AWS Parameter Store and HashiCorp Vault.
https://redd.it/1ldqikc
@r_devops
severe grafana CVE: patch now or forever hold your peace (CVE-2025-4123 Grafana)
there's a pretty significant cross-site scripting vulnerability in many versions of grafana...
'''
A cross-site scripting (XSS) vulnerability exists in Grafana caused by combining a client path traversal and open redirect. This allows attackers to redirect users to a website that hosts a frontend plugin that will execute arbitrary JavaScript. This vulnerability does not require editor permissions and if anonymous access is enabled, the XSS will work. If the Grafana Image Renderer plugin is installed, it is possible to exploit the open redirect to achieve a full read SSRF. The default Content-Security-Policy (CSP) in Grafana will block the XSS through the connect-src directive. This vulnerability is fixed in v10.4.18+security-01, v11.2.9+security-01, v11.3.6+security-01, v11.4.4+security-01, v11.5.4+security-01, v11.6.1+security-01, and v12.0.0+security-01
'''
https://nvd.nist.gov/vuln/detail/CVE-2025-4123
https://grafana.com/security/security-advisories/cve-2025-4123/
https://www.bleepingcomputer.com/news/security/over-46-000-grafana-instances-exposed-to-account-takeover-bug/
https://redd.it/1ldsg2x
@r_devops
I addressed the Fatal Mistake in my resume I got roasted for yesterday. Ty for 100+ responses
Hi everyone.
https://i.imgur.com/seBld3F.jpeg < - My new streamlined resume
---
Thank you for the 100+ constructive comments I got on my post yesterday.
Here -> What fatal mistake do you see in my resume? I am getting 0 ( ZERO ) response to any job applications
I think I've addressed most of it. I agree with the comments about it being an essay. We live in a weird time where I expect the AI machine to process my resume well before a human gets to it, so I was trying to load as much info as possible into a 2-page resume. DevOps is a field where we are doing new things basically every week, and I feel like 50% of the stuff I've worked with isn't even on the resume lol.
But yes, you guys are correct. Hope my new resume is better.
Is it a bit too light? Looking forward to feedback, thank you
https://redd.it/1ldu6tp
@r_devops
DB scripts! How do you handle that?
Hi guys good day. Hope you're doing well.
So I have worked in multiple projects and it seems that db scripts are the one thing that requires a lot of attention and human intervention. Would love to know -
1. How do you handle db scripts using pipelines?
2. What is the most challenging part of implementation?
3. How do you take care of rollbacks if required?
4. What's the trickiest thing that you have ever done while designing db scripts pipelines?
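For question 1, the usual answer is to treat db scripts as numbered, versioned migrations that the pipeline applies exactly once each, tracked in a bookkeeping table. A minimal sketch against SQLite (the `schema_migrations` table and the file-naming convention are my assumptions; real tools like Flyway or Liquibase do the same thing with more safety):

```python
import pathlib
import sqlite3


def apply_migrations(db_path, migrations_dir):
    """Apply numbered .sql files in order, skipping ones already recorded."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    applied = []
    for script in sorted(pathlib.Path(migrations_dir).glob("*.sql")):
        if script.name in done:
            continue  # already applied on a previous pipeline run
        conn.executescript(script.read_text())
        # record the script so reruns of the pipeline skip it
        conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (script.name,))
        conn.commit()
        applied.append(script.name)
    conn.close()
    return applied
```

For question 3, most teams either pair each up script with a matching down script, or prefer roll-forward (a new migration that undoes the change), since down scripts are rarely exercised before they're needed.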
https://redd.it/1lduujd
@r_devops
Anyone else feel like you’re “learning” but not actually making progress?
Lately I’ve been thinking that I spend hours watching tutorials, taking notes, and following along with code... but when I try to build something from scratch, I freeze.
Like I understood it while watching, but didn’t really absorb anything.
That’s when I realized... learning isn’t just about consuming info, it’s about making stuff, even if it’s bad or tiny or full of bugs.
Now I’ve started focusing more on building little tools, scripts, and weird automations, just to apply what I learn as I learn it.
Anyone else going through this phase?
How do you make sure you're actually learning instead of just binging tutorials?
https://redd.it/1ldtejw
@r_devops
Recruiter/Headhunter Recommendations?
I was wondering if any of you have any recommendations for recruiters/headhunters you may have hired to help you find a new position? I have 15 YOE in tech, 10 of which have been in senior/lead devops roles, and my biggest challenge right now is finding the time to apply with all the associated accoutrement; to the point where I'd like to hire someone to help.
Anyone have any good experiences they can share?
https://redd.it/1le0eoc
@r_devops
IaC Platforms Complexity
Lately I've been wondering, why are modern IaC platforms so complex to use?
It feels like most solutions (Terraform, Pulumi, Crossplane, etc.) are extremely powerful but often come with steep learning curves and unintuitive workflows.
Is this complexity necessary due to the nature of infrastructure itself? Or is there a general lack of focus on usability in this space?
Are there any efforts or platforms that prioritize simplicity and better user experience? Or has the industry kind of accepted that complexity is just the norm, and users are expected to adapt?
https://redd.it/1le14t4
@r_devops
Advice Needed! Transition from Senior desktop support analyst to DevOps engineer????
Hey Reddit,
I work for a large enterprise and I'm currently a Senior I.T. Technical Lead (basically Senior Desktop Support Analyst) supporting a department of around 200 users, mostly Mac users, with some accountants using Windows 11. I have no direct reports, so I'm Solo Dolo in this shit lol
Unfortunately, there's a chance that my department may be laid off in 12 months. So I want to take the one year to figure out what I'll enjoy, lock in and upskill.
But the problem is that I'm stuck deciding what to explore next, and I'd love to get y'all's thoughts on which career path I should look into based on my background and interests.
>Current Day to Day: (Outside basic end user support)
>Microsoft Power Automate (I'm comfortable with Expressions + JSON)
>Microsoft Power Apps (comfortable with PowerFX and Model Driven Apps)
>Microsoft Dataverse (Also PowerFx formula columns + Relational Databases)
>Microsoft Excel (Pivot Tables, Power Query, Data Array Function)
>Very basic HTML (For Building Reports within Power Automate)
>Managing SharePoint sites
>Managing user permissions in Active Directory and Microsoft Entra
>White glove VIP Executive Support
>Paths I'm Considering:
>Cloud Engineering
>DevOps Engineering
>Data Engineering
>System Admin (If all else fails)
>My Approach & Resources:
>I'm comfortable diving into intensive study, Python, R, SQL, whatever it takes.
>My current company is a large enterprise, and I have access to various tools and tech department contacts, so I'm not too worried about getting the chance to practice what I learn and to get hands-on experience.
>My plan is to solve a real business problem before I leave the job so it gives me some experience and stories to tell in my next interview.
So based on all of that, which path do you think aligns best with my skills and interests?
https://redd.it/1le9sy0
@r_devops
Hey Reddit,
I work for a large enterprise and I'm currently a Senior I.T. Technical Lead (basically Senior Desktop Support Analyst) supporting a department of around 200 users mostly Mac users, with some accountants using Windows 11. I have no directive port report so I'm Solo Dolo in this shit lol
Unfortunately, there's a chance that my department may be laid off in 12 months. So I want to take the one year to figure out what I'll enjoy, lock in and upskill.
But the problem is that I'm stuck deciding on what to explore next, and I'd love to get y'all thoughts on which career path I should look into based on my background and interests????
>Current Day to Day: (Outside basic end user support)
>Microsoft Power Automate (I'm comfortable with Expressions + JSON)
>Microsoft Power Apps (comfortable with PowerFX and Model Driven Apps)
>Microsoft Dataverse (Also PowerFx formula columns + Relational Databases)
>Microsoft Excel (Pivot Tables, Power Query, Data Array Function)
>Very basic HTML (For Building Reports within Power Automate)
>Managing SharePoint sites
>Managing user permissions in Active Directory and Microsoft Entra
>White glove VIP Executive Support
>Paths I'm Considering:
>Cloud Engineering
>DevOps Engineering
>Data Engineering
>System Admin (If all else fails)
>My Approach & Resources:
>I'm comfortable diving into intensive study: Python, R, SQL, whatever it takes.
>My current company is a large enterprise, and I have access to various tools and tech department contacts, so I'm not too worried about getting the chance to practice what I learn and to get hands-on experience.
>My plan is to solve a real business problem before I leave the job so it gives me some experience and stories to tell in my next interview.
So based on all of that, which path do you think aligns best with my skills and interests?
https://redd.it/1le9sy0
@r_devops
Snapshot vs backup
In my previous company we would always take snapshots before system or package upgrades, but it got me thinking whether that's actually sufficient. What are the chances of an upgrade causing persistent on-disk metadata corruption that a snapshot rollback couldn't undo, making real backups necessary? Are snapshots actually enough for maintenance procedures?
https://redd.it/1leaxkx
@r_devops
AI is flooding codebases, and most teams aren’t reviewing it before deploy
42% of devs say AI writes half their code. Are we seriously ready for that?
Cloudsmith recently surveyed 307 DevOps practitioners (not randoms, actual folks in the trenches). Nearly 40% came from orgs with 50+ software engineers, and the results hit hard:
- 42% of AI-using devs say at least half their code is now AI-generated
- Only 67% review AI-generated code before deploy (!!!)
- 80% say AI is increasing OSS malware risk, especially around dependency abuse
Attackers are shifting tactics: we're seeing more slopsquatting and supply-chain poisoning, since they know AI tools will happily pull in risky packages.
As vibe coding takes a bigger seat in the SDLC, we're seeing speed gains, but also way more blind spots and bad practices. Most teams haven't locked down artifact integrity, provenance, or automated trust checks in their pipelines.
Cool tech, but without the guardrails, we're just accelerating into a breach.
Does this resonate with you? If so, check out the free survey report today:
https://cloudsmith.com/blog/ai-is-now-writing-code-at-scale-but-whos-checking-it
https://redd.it/1lecppz
@r_devops
Help planning workers
Hey, I'm building an app and I need to create jobs, plus workers for those jobs, to update my database.
I don't have experience with jobs, so here is my approach:
- I will use Redis to create a job queue
- I will use workers to consume that job queue
What would be better for the workers and Redis: my own VPS (starting at $15/month) running Docker Swarm or k8s, or a container-as-a-service provider like Fly.io or Railway?
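Whichever host you pick, the queue itself stays small. Here's a minimal sketch of the Redis-list pattern, assuming redis-py; the queue key and the `handle` callback are illustrative, not anyone's real code:

```python
import json
import uuid

QUEUE_KEY = "jobs"  # Redis list used as the queue (name is illustrative)

def make_job(kind: str, payload: dict) -> str:
    """Serialize a job so the producer can LPUSH it onto the list."""
    return json.dumps({"id": str(uuid.uuid4()), "kind": kind, "payload": payload})

def parse_job(raw: bytes) -> dict:
    """Decode what BRPOP hands a worker."""
    return json.loads(raw)

# Producer side (in the app), with redis-py (pip install redis):
#   import redis
#   r = redis.Redis()
#   r.lpush(QUEUE_KEY, make_job("refresh_user", {"user_id": 42}))
#
# Worker loop; BRPOP blocks until a job arrives, so idle workers are cheap:
#   while True:
#       _key, raw = r.brpop(QUEUE_KEY)
#       handle(parse_job(raw))  # hypothetical handler that does the DB update
```

Either deployment option runs this unchanged: you need one Redis instance plus N copies of the worker process, so it mostly comes down to which is cheaper for you to operate.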
https://redd.it/1lecs0e
@r_devops
Hackathon challenge: Monitor EKS with literally just bash (no joke, it worked)
Had a hackathon last weekend with the theme "simplify the complex" so naturally I decided to see if I could replace our entire Prometheus/Grafana monitoring stack with... bash scripts.
Challenge was: build EKS node monitoring in 48 hours using the most boring tech possible. Rules were no fancy observability tools, no vendors, just whatever's already on a Linux box.
What I ended up with:
- DaemonSet running bash loops that scrape /proc
- gnuplot for making actual graphs (surprisingly decent)
- 12MB total, barely uses any resources
- Simple web dashboard you can port-forward to
The kicker? It actually monitors our nodes better than some of the "enterprise" stuff we've tried. When CPU spikes I can literally cat the script to see exactly what it's checking.
Judges were split between "this is brilliant" and "this is cursed" lol (TL;DR: I won)
Now I'm wondering if I accidentally proved that we're all overthinking observability. Like maybe we don't need a distributed tracing platform to know if disk is full?
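For the curious, the core of the /proc scrape really is tiny. Here's a rough Python stand-in for the bash loop (field layout per /proc/stat; a real monitor would diff two samples over an interval rather than read once):

```python
def cpu_busy_fraction(stat_cpu_line: str) -> float:
    """Fraction of jiffies spent non-idle, from the 'cpu ' line of /proc/stat.

    Fields after 'cpu': user nice system idle iowait irq softirq steal ...
    Idle time is idle + iowait; everything else counts as busy.
    """
    fields = [int(x) for x in stat_cpu_line.split()[1:]]
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
    return 1.0 - idle / sum(fields)

# On a node, the DaemonSet loop would read the live file each tick:
#   with open("/proc/stat") as f:
#       line = f.readline()  # first line aggregates all CPUs
# Using a fixed sample here so the sketch is self-contained:
sample = "cpu 100 0 50 800 50 0 0 0 0 0"
print(round(cpu_busy_fraction(sample), 2))  # 150 busy of 1000 jiffies
```

That's the whole trick: parse a line, do arithmetic, append to a log that gnuplot can read. No agent, no exporter, no time-series database.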
Posted the whole thing here: https://medium.com/@heinancabouly/roll-your-own-bash-monitoring-daemonset-on-amazon-eks-fad77392829e?source=friends_link&sk=51d919ac739159bdf3adb3ab33a2623e
Anyone else done hackathons that made you question your entire tech stack? This was eye-opening for me.
https://redd.it/1ledzu9
@r_devops