LaunchDarkly rug pull coming
Hey everyone!
If you're using LaunchDarkly on their existing user-based pricing scheme, be aware that they're moving to a new usage-based pricing model.
Upside? Unlimited users.
Downside? They charge per service connection. What's a service connection? Any independent instance of an app connecting to LaunchDarkly. For example, a VM, a Kubernetes pod, or a Heroku worker.
They're charging $12/month per service connection ($10 on an annual commitment).
We were paying $10k annually on user-based pricing; on the new per-service-connection pricing we would pay $45k.
For anyone going through the same thing, there are plenty of open source feature flag tools you can use, like Flagsmith. Just deploy them in your infrastructure and call it a day.
https://redd.it/1rr4fen
@r_devops
Empowering DevOps Teams
I came across an article sharing how to empower DevOps teams. If you are given the following choices and can pick only one to make your life better, which one would you pick?
1. A good team leader who understands what's going on and cares about their team. Pay and workload remain the same.
2. A better-paying job with less stress, but you're required to relocate.
3. A big promotion with far better pay and perks, but more stress and responsibilities.
https://redd.it/1rr74xm
@r_devops
CVE-2026-28353, the Trivy security incident nobody is talking about. idk why, but now I'm rethinking whether the scanner is even the right fix for container image security
Saw this earlier: https://github.com/aquasecurity/trivy/discussions/10265
pull_request_target misconfiguration, PAT stolen Feb 27, 178 releases deleted March 1, malicious VSCode extension pushed, repo renamed. CVE-2026-28353 filed.
That workflow was in the repo since October 2025. Four months before anyone noticed. Release assets from that whole window are permanently deleted. GPG signing key for Debian/Ubuntu/RHEL may be gone too.
Someone checked the cosign signature on v0.69.2 independently and got private-trivy in the identity field instead of the main repo. Quietly fixed in v0.69.3.
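For anyone who wants to do that check themselves, here's a hedged sketch of an independent keyless-signature verification with cosign. The image reference, tag, and identity pattern below are illustrative, not the project's documented values — the point is that the certificate identity is pinned, so a signer like private-trivy instead of the main repo fails loudly:

```shell
# Verify the signature AND that the signing identity is the expected
# GitHub Actions workflow in the main repo (keyless / Fulcio flow).
# A mismatched identity is exactly the red flag described above.
cosign verify \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  --certificate-identity-regexp '^https://github.com/aquasecurity/trivy/' \
  ghcr.io/aquasecurity/trivy:0.69.2
```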
Maintainers confirmed: if you pulled via the install script or get.trivy.dev during that window, those assets cannot be checked. Not "we think they're fine." Cannot be checked.
Scanning for CVEs assumes the pipeline that built the image was clean. If it wasn't, the scan result means nothing.
Am I missing something or is this just not a big deal to people? Because it made me completely rethink how much I trust open source container image pipelines.
Looking at SLSA Level 3 for base images now. Hermetic builds, signed provenance. What are people actually using for distroless container images that ships with that level of build integrity baked in? Not scanners. The images themselves.
And before anyone says just switch to Grype or related, please don't. Same problem. You're still scanning images after the fact with no visibility into how they were built or whether the pipeline that produced them was clean. Another scanner doesn't fix a provenance problem.
https://redd.it/1rqmrhi
@r_devops
Advice Wanted Transitioning an internal production tool to Open Source (First-timer)
Hey everyone,
I’m looking for some "war stories" or guidance from people who have successfully moved a project from an internal private repo to a public Open Source project.
The Context:
I started this project as "vibe code", heavy AI-assisted prototyping just to see if a specific automation idea for our clusters would work.
Surprisingly, it scaled well. I’ve spent the last 3 months refactoring it into proper production-grade code, and it’s currently handling our internal workloads without issues.
I want to "donate" this to the community, but since this is my first time acting as a maintainer, I want to do it right the first time. I’ve seen projects fail because of poor Day 1 execution, and I’d like to avoid that.
Specific hurdles I’m looking for help with:
1. Sanitization: Besides .gitignore, what are the best tools for scrub-testing a repo for accidental internal URLs or legacy secrets in the git history before the first public push?
2. Documentation for Strangers: My internal docs assume you know our infrastructure. What’s the "Gold Standard" for a README that makes a cluster tool accessible to someone with zero context?
3. Licensing: For infrastructure/orchestration tools, is Apache 2.0 still the "safe" default, or should I be looking at something else to encourage contribution while protecting the project?
4. Community Building: How do you handle that first "Initial Commit" vs. a "Version 0.1.0" release to get people to actually trust the code?
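On point 1, a minimal sketch of a history scrub using gitleaks and git-filter-repo (both real tools; the paths, report name, and replacement mapping below are illustrative):

```shell
# Scan the full history (not just the worktree) for secrets
# before the first public push.
gitleaks detect --source . --log-opts="--all" --report-path gitleaks-report.json

# If anything turns up, rewrite history on a fresh mirror clone.
# replacements.txt maps leaked strings to placeholders, e.g.:
#   internal.example.corp==>PUBLIC_HOSTNAME
git clone --mirror /path/to/internal-repo scrubbed.git
cd scrubbed.git
git filter-repo --replace-text ../replacements.txt
```

Note that rewriting history changes every commit hash after the first replaced one, so this has to happen before the repo goes public, not after.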
Please don't downvote, I'm genuinely here to learn the "right" way to contribute back to the ecosystem. If you have a blog post, a checklist, or just a "I wish I knew this before I went public" tip, I’d really appreciate it.
TL;DR: My "vibe code" turned into a production tool. Now I want to open-source it properly. How do I not mess this up?
https://redd.it/1rqoqnz
@r_devops
Is it worth taking on a part time Lvl 4 DevOps apprenticeship (UK) as a network design analyst
After 3 years at university I recently landed a graduate role, and I’m currently about 6 months into my job as a Network Design Analyst. My role mainly involves supporting commissions and migrations of Fortinet-based networks, working alongside engineers and project teams.
I’m about a month away from sitting my CCNA, and after that my plan was to start working towards Fortinet certifications to deepen my networking knowledge.
My company has offered me the opportunity to do a part-time DevOps Upskiller apprenticeship through Multiverse, which they would fully fund.
My main question is: what are the pros and cons of taking this apprenticeship given the path I’m currently on?
Would it complement a networking career (e.g. automation, infrastructure, cloud), or would it be better to stay focused purely on networking certifications and experience?
I’d be interested to hear from people who have taken a similar path or work in networking / DevOps.
https://redd.it/1rqxc5d
@r_devops
Designing enterprise-level CI/CD access between GitHub <--> AWS
I have an interesting challenge for you today.
Context
I have a GitHub organization with over 80 repositories, and all of these repositories need to access different AWS accounts, more or less 8 to 10 accounts.
Each account has a different purpose (e.g. security, logging).
We have a deployment account that should be the only entry point the pipelines go through.
Constraints
Not all repos should have to have access to all accounts.
Repos should only have access to the account where they should deploy things.
All of the actual provisioning roles (assumed by the pipeline role) should have least-privilege permissions.
The system should scale easily without requiring any manual operations.
How would you guys work around this?
EDIT:
I'm adding additional information to the post not to mislead on what the actual challenge is.
The architecture I already have in mind is:
GitHub Actions -> deployment account OIDC role -> workload account provisioning role
The actual challenge is the control plane behind it:
- where the repo/env/account mapping lives
- who creates and owns those roles
- how onboarding scales for 80+ repos without manual per-account IAM work
- how to keep workload roles least-privilege without generating an unmaintainable snowflake per repo
I’m leaning toward a central platform repo that owns all IAM/trust relationships from a declarative mapping, and app repos only consume pre-created roles.
So the real question is less “how do I assume a role from GitHub?” and more “how would you design that central access-management layer?”
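For illustration, the two-hop flow in the EDIT boils down to two STS calls. The role ARNs, account IDs, and session names here are made up; in a real workflow the first hop is usually handled by the aws-actions/configure-aws-credentials action rather than called directly:

```shell
# Hop 1: exchange the GitHub OIDC token for deployment-account credentials.
# The deploy role's trust policy conditions on the token's "sub" claim,
# which is what scopes a repo to only the roles it may use.
aws sts assume-role-with-web-identity \
  --role-arn arn:aws:iam::111111111111:role/github-deploy \
  --role-session-name "gha-my-repo" \
  --web-identity-token "$GITHUB_OIDC_TOKEN"

# Hop 2: from the deployment account, assume the workload account's
# provisioning role. That role trusts only the deploy role, which keeps
# the deployment account as the single entry point.
aws sts assume-role \
  --role-arn arn:aws:iam::222222222222:role/provision-my-service \
  --role-session-name gha-deploy
```

The control-plane question then becomes who renders the trust policies and `sub`-claim conditions for both hops from the central declarative mapping.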
https://redd.it/1rqwjxt
@r_devops
Ask HN / FinOps: How do you actually attribute AI / GPU costs to specific customers or products in multi-tenant SaaS?
Hi there,
I'm digging into billing transparency for AI workloads in multi-tenant systems.
Cloud billing usually shows allocated resources, but mapping real utilization (tokens, GPU time, CPU/RAM usage) to a specific customer or product feature seems surprisingly hard.
Curious how teams handle this in practice:
* How do you attribute infrastructure / AI costs to specific customers?
* Do you track allocation vs real utilization?
* What tools do you use (Kubecost, CloudZero, custom pipelines, etc.)?
Thanks!
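One pragmatic baseline, regardless of tooling: meter per-tenant usage yourself (GPU-seconds, tokens) and prorate the bill by usage share. A toy sketch with hypothetical sample data:

```shell
# Hypothetical metering export: tenant,gpu_seconds
cat > /tmp/usage.csv <<'EOF'
acme,3600
globex,1800
initech,600
EOF

# Prorate a $100 GPU bill by each tenant's share of GPU-seconds.
awk -F, '{use[$1]=$2; total+=$2}
         END {for (t in use) printf "%s %.2f\n", t, 100*use[t]/total}' /tmp/usage.csv | sort
# acme 60.00
# globex 30.00
# initech 10.00
```

The hard part the post asks about — allocation vs real utilization — is exactly the gap between this number and what the cloud bill's instance-level line items say.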
https://redd.it/1rqqd78
@r_devops
How to find projects as a Freelancer
I worked with two different companies last year, but neither of them were in my niche. Now I want to find freelance projects specifically in data analytics. However, I’m unsure where to look or how to find such opportunities.
https://redd.it/1rq0apr
@r_devops
A workflow for encrypted .env files using SOPS + age + direnv for the LLM era
I work on multiple computers, especially when traveling, and I don't really want to store .env files for all my projects in my password manager. So I needed a way to store secrets on GitHub securely. And in a world where we vibe code, it's not uncommon for an LLM to push your secrets for you, so I solved that problem!
Most projects rely on two things:
1. .env files sitting in plaintext on disk
2. .gitignore not failing
That's… not great.
So I built a small workflow using SOPS + age + direnv. Now secrets:
- Stay encrypted in git
- Auto-load when entering a project
- Disappear when leaving the directory
- Never exist as plaintext .env files
The entire setup is free, open-source, and takes about five minutes.
I wrote up the full walkthrough here: https://jfmaes.me/blog/stop-committing-your-secrets-you-know-who-you-are/
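For a rough idea of the moving parts, here's a hedged sketch of what a SOPS + age + direnv setup can look like. The public key, file layout, and .envrc one-liner are illustrative assumptions, not necessarily the blog's exact steps:

```shell
# One-time per machine: generate an age keypair (never committed).
age-keygen -o ~/.config/sops/age/keys.txt

# .sops.yaml tells sops which recipients encrypt which files
# (the age public key below is a placeholder).
cat > .sops.yaml <<'EOF'
creation_rules:
  - path_regex: \.env$
    age: age1examplepublickeyreplacemexxxxxxxxxxxxxxxxxxxxxxxxxxx
EOF

# Encrypt in place; the encrypted file is what gets committed.
sops --encrypt --in-place .env

# .envrc: direnv decrypts and exports on cd-in, unloads on cd-out.
cat > .envrc <<'EOF'
eval "$(sops --decrypt .env | sed 's/^/export /')"
EOF
direnv allow
```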
https://redd.it/1rr3crv
@r_devops
Jean-Francois Maes
Stop Committing Your Secrets (You Know Who You Are) | Jean-Francois Maes
Plaintext .env files are a stupid little footgun. Here's the SOPS + age + direnv setup I use to keep secrets encrypted, auto-loaded, and out of Git.
I analyzed 1.6M git events to measure what happens when you scale AI code generation without scaling QA. Here are the numbers.
Hi. I've been a dev for 7 years. I worked on an enterprise project where management adopted AI tools aggressively but cut dedicated testers on new features. Within a few months the codebase was unrecoverable and in perpetual escalation.
I wanted to understand why, so I built a model and validated it on 27 public repos (FastAPI, Django, React, Spring Boot, etc.) plus that enterprise project. About 1.6 million file touch events total.
Some results:
AI increases gross code generation by about 55%, but without QA the net delivery velocity drops to 0.85x (below the pre AI baseline)
Adding one dedicated tester restores it to 1.32x. ROI roughly 18:1
Unit tests in the enterprise case had the lowest filter effectiveness of the entire pipeline. Code review was slightly better but still insufficient at that volume
The model treats each QA step (unit tests, integration tests, code review, static analysis) as a filter with effectiveness that decays exponentially with volume
Everything is open access on Zenodo with reproducible scripts.
https://zenodo.org/records/18971198
I'm not a mathematician, so I used LLMs to help formalize the ideas into equations and structure the paper. The data, the analysis, and the interpretations are mine.
Would like to hear if this matches what you see in your pipelines. Especially interested in whether teams with strong CI/CD automation still hit the same wall when volume goes up.
https://redd.it/1rrrj0v
@r_devops
Zenodo
The AI Quality Paradox: How Code Complexity Drives Rework in AI-Assisted Development
Adopting AI coding tools without proportional QA investment does not accelerate delivery — it amplifies technical debt. We model software development as a coupled ODE system where AI-generated code erodes the team's cognitive validation capacity (σ) at rate…
Need Advice on taking the next good role
I have 2 offers in hand. Both are contract positions for major clients, one being media giant and other being Insurance giant.
- The media company is offering me a Tech Lead (Infrastructure) position to lead their infra/CI-CD/K8s. They are heavy in K8s and multi-cloud infra. Things are already in place but can still be extended, depending on how I skill up on the K8s ecosystem.
- The insurance company is offering me an AWS DevOps position to lead their infra/CI-CD and other serverless tech. They are pure AWS and have yet to transition to containerized workloads. (I have a lot of room to grow here, as I can lead many things.)
The packages offered are almost identical, and both positions are based in NYC.
I'm unable to make a clear decision on which one to proceed with. What would the pros and cons be?
Kindly guide me 🙏
https://redd.it/1rrucw3
@r_devops
AWS vs Azure for DevOps transition (6 yrs IT experience) – which is better to start with?
I’m planning to transition into a DevOps / Cloud Engineer role and would like some guidance.
My background:
6 years total experience
4 yrs IT Helpdesk
2 yrs Windows Server & VMware administration (L2, not advanced actions)
My plan was to first gain Cloud Engineer experience and then move into DevOps.
Initially I thought Amazon Web Services (AWS) would be the best option since it has a large market share. But it seems entry-level roles are very competitive and expectations are quite high.
Because of that, I’m also considering Microsoft Azure, especially since many companies use Microsoft environments.
For people already working in cloud or DevOps:
1. Which platform is easier to break into for a first cloud role?
2. How do job demand and competition compare between AWS and Azure?
3. What tools and responsibilities are common in Azure DevOps roles vs AWS-based DevOps?
From a career growth perspective, which would you recommend starting with?
Any insights from real-world experience would be really helpful.
https://redd.it/1rr64lo
@r_devops
Roles for those who might be "not good enough" to be DevOps?
2-page resume (not a full CV, as that's 11-pages):
https://imgur.com/a/0yPYHOM
1-page resume (what I usually use to apply for jobs):
https://imgur.com/YnxLDy1
I'm finding myself in a bit of a weird spot, having been laid off in January. My company had me listed as a "DevOps Engineer" even on my offer letter, but I suspect they (an MSP) paid people in job-title inflation rather than real salary: our "SREs" would do things like build a site-to-site VPN entirely via ClickOps in two cloud-platform web consoles, rather than my natural inclination (doing it all in Terraform). So in spite of the job title, I never had software engineers/developers to support, and didn't really touch containers or CI/CD until 1-2 years into the job.
My role was more Ansible-monkey + Packer-monkey than anything else (Cloud Engineer? Infrastructure Engineer?). At best I can write out the Terraform + Ansible code and tie it all together with a Gitlab CI Pipeline so that a junior engineer could adjust some variables, run the pipeline, and about 2 hours later you're looking at a 10-node Splunk cluster deployed (EC2, ALB, Kinesis Firehose, S3, SQS), all required Splunk TA apps installed, ingesting required logs (Cloudwatch => Kinesis, S3 => SQS, etc.) from AWS. Used to need about 150+ allocated hours to do that manually.
But I don't have formal work experience with k8s. And ironically I'm not well-practiced with writing Bash/Python/Powershell because most of my time was spent doing the exact opposite (converting cartoonishly long User Data scripts => Ansible plays, I swear someone tried to install Splunk using 13 Python scripts).
I also trip over Basic Linux CLI questions (I can STIG various Linux distros without bricking them, but I can't tell you by heart which CLI tools to check if "Linux is slow").
So yeah, I'm feeling a bit of imposter syndrome here and wanted to see what roles might suit someone like me (more Ops than Dev) who might not be qualified to be mid-level DevOps Engineer on Day 1 who has to hit the ground running without a full slide backwards into say, Systems Administration?
From what I can tell, Platform Engineer and SRE roles tend to have harsher programming requirements.
Cloud Engineer, Infrastructure Engineer, and Linux Administrator postings tend to have extremely low volume.
"Automation Engineer" tends to be polluted with wrong industry results (Automotive or Manufacturing). "Release Engineer" doesn't seem to have any results (may be Senior-only).
https://redd.it/1rrxfri
@r_devops
Ingress NGINX EOL this month — what runway are teams giving themselves to migrate?
Ingress NGINX reaches end of support this month, and I'm guessing there are still thousands of clusters running it in production.
Curious what runway teams are giving themselves to migrate off of it?
For lots of orgs I've worked with, Ingress NGINX has been the default for years. With upstream maintenance coming to a halt, many teams are evaluating alternatives.
Traefik
HAProxy Ingress
AWS ALB Controller (for EKS)
Gateway API
What's the sentiment around these right now? Are any of them reasonably close to a drop-in replacement for existing clusters?
Also wondering if some orgs will end up doing what we see with other projects that go EOL and basically run a supported fork or extended maintenance version while planning a slower migration.
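Since Gateway API is on the list above: for simple host/path routing, the shape of the migration is an Ingress resource becoming a Gateway plus an HTTPRoute. A minimal sketch (the gateway class, hostname, and service names are placeholders):

```yaml
# Hypothetical minimal Gateway API equivalent of a simple Ingress rule.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: web-gateway
spec:
  gatewayClassName: example-gateway-class   # provided by your chosen controller
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app-route
spec:
  parentRefs:
    - name: web-gateway
  hostnames: ["app.example.com"]
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: app-service
          port: 8080
```

Worth noting Gateway API is a spec rather than a controller: you still pick an implementation (Envoy Gateway, Traefik, Istio, a cloud provider's controller) to actually serve it, so it's less a drop-in replacement than a re-platform.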
https://redd.it/1rr49pn
@r_devops
Sonatype Nexus Repository CE
Hey folks, I'm trying to evaluate the "new" Sonatype Nexus Community Edition.
However, the download page at https://www.sonatype.com/products/nexus-community-edition-download requires me to enter all sorts of personal details (including a company name; what if I don't have one, lol).
Obviously I could enter random data, but I'm not sure whether the download link is then sent to the email address.
As far as you know, is there a direct download link? Sonatype's website must be deliberately indexed like crap, because I can't find anything useful there.
https://redd.it/1ryalu1
@r_devops
How do you keep track of which repos depend on which in a large org?
I work in an infrastructure automation team at a large org (~hundreds of repos across GitLab). We build shared Docker images, reusable CI templates, Terraform modules, the usual stuff.
A challenge I've seen is: someone pushes a breaking change to a shared Docker image or a Terraform module, and then pipelines in other repos start failing. We don't have a clear picture of "if I change X, what else is affected." It's mostly "tribal knowledge". A few senior engineers know which repos depend on what, but that's it. New people are completely lost.
We've looked at GitLab's dependency scanning but that's focused on CVEs in external packages, not internal cross-repo stuff. We've also looked at Backstage but the idea of manually writing YAML for every dependency relationship across hundreds of repos feels like it defeats the purpose.
How do you handle this? Do you have some internal tooling, a spreadsheet, or do you just accept that stuff breaks and fix it after the fact?
Curious how other orgs deal with this at scale.
https://redd.it/1ry0edd
@r_devops
Added a lightweight AWS/Azure hygiene scan to our CI - sharing the 20 rules we check
We’ve been trying to keep our AWS and Azure environments a bit cleaner without adding heavy tooling, so we built a small read‑only scanner that runs in CI and evaluates a conservative set of hygiene rules. The focus is on high‑signal checks that don’t generate noise in IaC‑driven environments.
It’s packaged as a Docker image and a GitHub Action so it’s easy to drop into pipelines. It assumes a read‑only role and just reports findings - no write permissions.
https://github.com/cleancloud-io/cleancloud
Docker Hub: https://hub.docker.com/r/getcleancloud/cleancloud
docker run getcleancloud/cleancloud:latest scan
GitHub Marketplace: https://github.com/marketplace/actions/cleancloud-scan
```yaml
- uses: cleancloud-io/scan-action@v1
  with:
    provider: aws
    all-regions: 'true'
    fail-on-confidence: HIGH
    fail-on-cost: '100'
    output: json
    output-file: scan-results.json
```
# 20 rules across AWS and Azure
Conservative, high‑signal, designed to avoid false positives in IaC environments.
# AWS (10 rules)
Unattached EBS volumes (HIGH)
Old EBS snapshots
CloudWatch log groups with infinite retention
Unattached Elastic IPs (HIGH)
Detached ENIs
Untagged resources
Old AMIs
Idle NAT Gateways
Idle RDS instances (HIGH)
Idle load balancers (HIGH)
# Azure (10 rules)
Unattached managed disks
Old snapshots
Unused public IPs (HIGH)
Empty load balancers (HIGH)
Empty App Gateways (HIGH)
Empty App Service Plans (HIGH)
Idle VNet Gateways
Stopped (not deallocated) VMs (HIGH)
Idle SQL databases (HIGH)
Untagged resources
Rules without a confidence marker are MEDIUM - they use time-based heuristics or multiple signals. We started by failing CI only on HIGH confidence findings, then tightened things as teams validated.
We're also adding multi-account scanning (AWS Organizations + Azure Management Groups) in the next few days, since that's where most of the real-world waste tends to hide.
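For flavor, here is roughly what a HIGH-confidence rule like "unattached EBS volumes" boils down to, written against the shape of boto3's `describe_volumes` response. This is my own illustration, not cleancloud's actual implementation, and the price constant is a made-up example:

```python
"""Illustrative 'unattached EBS volumes' hygiene rule, operating on dicts
shaped like boto3's EC2 describe_volumes response. Not the actual cleancloud
implementation; the per-GB price is an example, real pricing varies by region."""
GB_MONTH_USD = 0.08  # example gp3-style price, assumed for illustration

def unattached_volumes(volumes: list[dict]) -> list[dict]:
    """Flag volumes in 'available' state, i.e. not attached to any instance."""
    findings = []
    for v in volumes:
        if v.get("State") == "available" and not v.get("Attachments"):
            findings.append({
                "rule": "unattached-ebs-volume",
                "confidence": "HIGH",  # deterministic signal, no heuristics
                "volume_id": v["VolumeId"],
                "monthly_cost_usd": round(v.get("Size", 0) * GB_MONTH_USD, 2),
            })
    return findings
```

The "available"-state check is why this class of rule can be HIGH confidence: it needs no time-based heuristic, the API state alone is conclusive.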
Curious how others are handling lightweight hygiene checks in CI and what rules you consider “must‑have” in your setups.
https://redd.it/1rxuyet
@r_devops
Looking for a rolling storage solution
Where I work, we have a lot of data stored in file shares on a set of on-prem devices. Unfortunately we keep running into storage limits, and given the current price of everything, expansion might not be possible.
What I'm looking for is something that can look at all of these SAN devices, find files that have not been read or modified in X days, and archive that data to the cloud, similar to how S3 lifecycles progressively move cold data to colder storage. I want our on-prem SANs to be hot and cloud storage to get progressively colder. And just as S3 does it, I want reads and writes to be transparent.
Budgets are tight, but my time is not. I'm not afraid to learn and deploy some open source software that fulfills these requirements, but I don't know what that software is. If I have to buy something, I would prefer to be able to configure it with terraform.
Thanks in advance for your suggestions!
https://redd.it/1rxuc40
@r_devops
Has anyone actually used Port1355? Worth it or just hype?
Has anyone here actually used this? Is it worth trying?
I know I could just search or ask AI, but I’m more interested in hearing from real people who have used it and seen actual benefits.
Not just something that’s “nice to have,” but something genuinely useful.
https://port1355.dev/
https://redd.it/1ryrj1p
@r_devops
I calculated how much my CI failures actually cost
I calculated how much failed CI runs cost over the last month, and the number was worse than I expected.
I've been tracking CI metrics on a monorepo pipeline that runs on self-hosted 2xlarge EC2 spot instances (we need the size for several of the jobs).
It's a build and test workflow with 20+ parallel jobs per run - Docker image builds, integration tests, system tests. Over about 1,300 runs the success rate was 26%. 231 failed, 428 cancelled, 341 succeeded. Average wall-clock time per run is 43 minutes, but the actual compute across all parallel jobs averages 10 hours 54 minutes. Total wasted compute across failed and cancelled runs: 208 days. So almost exactly half of all compute produced nothing.
That 43 min to 11 hour gap is what got me. Each run feels like 43 minutes but it's burning nearly 11 hours of EC2 time across all the parallel jobs. 15x multiplier.
On spot 2xlarge instances at ~$0.15/hr, 208 days of waste works out to around $750. On-demand would be 2-3x that. Not great, but honestly the EC2 bill is the small part.
The expensive part is developer time. Every failed run means someone has to notice it, dig through logs across 20+ parallel jobs, figure out if it's their code or a flaky test or infra, fix it or re-run, wait another 43 minutes, then context-switch back to what they were doing before. At a 26% success rate that's happening 3 out of every 4 runs. If you figure 10 min of developer time per failure at $100/hr loaded cost, the 659 failed+cancelled runs cost something like $11K in engineering time. The $750 EC2 bill barely registers.
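The arithmetic above is easy to sanity-check from the post's own stated numbers:

```python
# Reproducing the post's cost arithmetic from its stated numbers.
wasted_compute_days = 208
spot_rate_usd_per_hr = 0.15
ec2_waste = wasted_compute_days * 24 * spot_rate_usd_per_hr     # ~$750

failed_plus_cancelled = 231 + 428                               # 659 runs
dev_minutes_per_failure = 10                                    # the post's assumption
loaded_rate_usd_per_hr = 100
dev_cost = failed_plus_cancelled * (dev_minutes_per_failure / 60) * loaded_rate_usd_per_hr  # ~$11K

# wall-clock 43 min vs 10h54m of total parallel compute per run
multiplier = (10 * 60 + 54) / 43                                # ~15x
```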
A few things surprised me:
The cancelled runs (428) actually outnumber the failed runs (231). We have concurrency groups set up, so when a dev pushes a new commit before the last build finishes, the old run gets cancelled. Makes sense as a policy, but it means a huge chunk of compute gets thrown away mid-run. Also, at a 26% success rate the CI isn't really a safety net anymore; it's a bottleneck. It's blocking shipping more than it's catching bugs. And nobody noticed, because GitHub says "43 minutes per run", which sounds totally fine.
Curious what your pipeline success rate looks like. Has anyone else tracked the actual wasted compute time?
https://redd.it/1rxlfxd
@r_devops
New junior DevOps engineer - the best way to succeed
Hi guys, I started working as a junior DevOps engineer 9 days ago. Before that, I finished college and worked for 1 year as a Tier 1 system administrator.
Now, I have my own dedicated mentor/buddy, and the first few days were really awesome: he wanted to help with information and everything. But in the last few days I've been getting some really weird feedback with a blaming vibe about how I don't know something. And I'm not asking silly things, just stuff like checking before running any plan or apply script in our CI/CD pipeline, because I don't want to destroy anything. He has already told our team lead, which makes me a bit worried/scared about how to proceed. I do believe it's smart not to play hero, but on the other hand, if questions in the first few weeks (or even months) from a person who has never worked in this position get a "how come you don't know that" and get reported to the TL, I'm really confused about what to ask and how to approach things.
Also, documentation almost doesn't exist. As seniors left the company, documentation never got written, and now so many are gone that the few who remain don't have time to write it because of their own workload, which I can understand. One piece of feedback I also got was asking why I don't ask questions in the daily meetings when he's explaining something. Well, how am I supposed to ask if he seems a bit unwilling to help even in DMs? My bf is telling me that situations like this never got better for him in the past, so he says I should already be chasing another opportunity while coasting in this one.
I don't know. I don't like quitting at all, and it's really a great opportunity, but I've never had a situation like this before.
And yeah, college, courses, certs, and even my own projects are barely a scratch on the surface once you get into production; about the only thing helping me is knowing my way around the terminal, haha.
https://redd.it/1rxipdh
@r_devops