Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Interviewed somebody today; lots of skills, not much person

I interviewed a person today for a DevOps role. His resume was very thick with technical things. Software he's used, frameworks, programming languages, security and compliance regulations, standards, etc. There was not much about how he worked with those things, what he did with them, which bits he was more familiar with and less familiar with.

I tried to get an idea about what kind of techie he is. Did he learn these things on his own? Or is he driven more by learning things as needed for the job? Has he designed anything on his own? Is he lawful good or chaotic neutral or...? Etc.

The answers I got made it feel like most of what he's done is work where someone else directed him, he coordinated with other teams, used vendor tools with pre-determined actions, ran scripts, etc. This is okay, since this wasn't for a senior role. But it made me think about how important it is, as a job seeker, to give a potential employer an idea of what kind of work you do. It's not just about checking boxes or flexing on hard skills, but showing that you're a person as well. Especially since these days everyone's on the lookout for AI chatbot answers. In this case, maybe he was just nervous. Maybe he's not good in formal situations. Or maybe he's just "not a good fit", as they say.

https://redd.it/1rfr007
@r_devops
Lucrative DevOps Fields/Jobs?

Based on your experience, what DevOps positions tend to pay high salaries (250k+)?

I come from a networking background, but since then I've made the switch to DevOps. Back then in the networking space, if you wanted to make a lot of money you would get a CCIE certification and try to work at a networking vendor such as Cisco, Arista, or Juniper. There's also the option of working at high-frequency trading companies, where stress levels are high but so is the pay.

What's the equivalent for DevOps?

Do companies like AWS pay their in-house DevOps engineers a lot? What skills does the industry value to command that type of pay? Are there high-paying DevOps vendors out there? I know certifications aren't really valued the way they used to be.

https://redd.it/1rfvwf4
@r_devops
Helm in production: lessons and gotchas

Hi everyone! I've been using Helm in production at scale for the past few years and collected lessons and gotchas that surprised me:

- Helm doesn't manage CRDs.
- --wait doesn't wait for readiness of all resources.
- Dry run is dependent on the state of an existing release.
- Values can be validated with JSON schema.
- OCI registries can be used for charts alongside container images.

I think the tip about values validation is the coolest, because loading the schema into yaml-language-server is a great development experience boost and helps LLMs do better work writing values.
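
To make the values validation tip a bit more concrete, here is a minimal sketch. Helm reads a values.schema.json file from the chart root; the schema below and its keys (replicaCount, image) are hypothetical, and the checker is a hand-rolled stand-in covering only `required` and `type`, not Helm's actual JSON Schema validation.

```python
# Hypothetical schema for an imaginary chart; in a real chart this would be
# the JSON content of values.schema.json, next to values.yaml.
SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": ["replicaCount", "image"],
    "properties": {
        "replicaCount": {"type": "integer"},
        "image": {"type": "object"},
    },
}

# Map JSON Schema type names to Python types (small subset only).
_TYPES = {"integer": int, "string": str, "object": dict, "boolean": bool}


def check_values(values, schema=SCHEMA):
    """Return a list of problems found in the top-level values.

    Covers only `required` and `type`; a real validator does far more.
    """
    problems = []
    for key in schema.get("required", []):
        if key not in values:
            problems.append(f"missing required value: {key}")
    for key, rule in schema.get("properties", {}).items():
        if key not in values:
            continue
        expected = _TYPES[rule["type"]]
        value = values[key]
        # bool is a subclass of int in Python, so reject it for "integer".
        wrong_type = not isinstance(value, expected)
        if wrong_type or (expected is int and isinstance(value, bool)):
            problems.append(f"{key}: expected {rule['type']}")
    return problems
```

With this sketch, a values file that sets replicaCount to the string "3" and omits image would produce two problems, which is the kind of mistake a schema catches before anything is rendered.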

Hope you find this post useful, I think even experienced Helm users can learn something from it.

https://redd.it/1rgdp5x
@r_devops
ECS CICD Rollback?

Hi guys! What's the best way to roll back in an ECS CI/CD pipeline? Should I describe the last active task definition and redeploy it (though that will leave a diff against the task definition in GitHub), or just revert to the last successful Action? I think the latter would be better. Or is there another solution?

Any blogs or suggestions would be great.
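
For the first option (redeploying an earlier task definition), the selection step could be sketched roughly like this. It assumes you already have a list of task definition ARNs, for example from `aws ecs list-task-definitions --family-prefix <family> --status ACTIVE`, and it assumes "previous revision" is the one you want, which is not guaranteed to be the last revision that actually deployed successfully. The function names are made up for illustration.

```python
def parse_revision(task_def_arn):
    """Split '...:task-definition/family:7' into ('family', 7)."""
    name = task_def_arn.rsplit("/", 1)[-1]
    family, revision = name.rsplit(":", 1)
    return family, int(revision)


def pick_rollback_target(task_def_arns, current_arn):
    """Return the newest ARN of the same family with a lower revision, or None."""
    family, current_rev = parse_revision(current_arn)
    candidates = []
    for arn in task_def_arns:
        fam, rev = parse_revision(arn)
        if fam == family and rev < current_rev:
            candidates.append((rev, arn))
    if not candidates:
        return None
    # Highest revision below the current one.
    return max(candidates)[1]
```

The chosen ARN could then be passed to `aws ecs update-service --task-definition`. The diff against the task definition in GitHub mentioned above still exists with this approach, so it is probably best treated as an emergency path rather than the normal flow.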

https://redd.it/1rfx80d
@r_devops
What is platform engineering exactly?

Every time I tell someone what I like and how I think, they end up in some way or another recommending platform engineering.

For example, I’ve always wanted to contribute to open source projects I liked, but always thought I wasn’t technically good enough to help outside infra and cloud, which prompted another “PE is perfect”. Every explanation I get is different, and not slightly different: each one could be categorized as a different role.

I won’t make the post long by explaining exactly what I like and what I don’t, but I want to know what it is, to maybe understand why it’s been recommended to me so much. I’d also appreciate some examples of the output of such a role compared to normal DevOps.

https://redd.it/1rhefsl
@r_devops
Cloud Engineer roadmap check: Networking + Linux completed, next steps?

I’m transitioning to Cloud Engineering from scratch. I’ve completed basic networking (TCP/IP, DNS, subnetting) and Linux fundamentals (CLI, file permissions, processes). I’m currently learning Git and GitHub. My goal is to get a junior cloud role in 6–9 months. What should I focus on next?

https://redd.it/1rezupb
@r_devops
CleanCloud v1.6.3 - 20 rules to find what's costing you money in AWS/Azure

A while ago I posted about CleanCloud, a shift-left cloud waste reporting tool that enforces hygiene as a CI/CD gate. It now has cost estimates and a --fail-on-cost CLI option.

AWS Rules (10):

1. Unattached EBS volumes (HIGH)
2. Old EBS snapshots
3. Infinite retention logs
4. Unattached Elastic IPs (HIGH)
5. Detached ENIs
6. Untagged resources
7. Old AMIs
8. Idle NAT Gateways
9. Idle RDS instances (HIGH)
10. Idle load balancers (HIGH)

Azure Rules (10):

1. Unattached Managed Disks
2. Old Snapshots
3. Unused Public IPs
4. Empty Load Balancers
5. Empty Application Gateways
6. Empty App Service Plans
7. Idle VNet Gateways
8. Stopped (Not Deallocated) VMs — still incurring full compute charges
9. Idle SQL Databases (zero connections 14+ days)
10. Untagged Resources

Every finding includes:
- Confidence level (HIGH / MEDIUM)
- Evidence and signals used
- Resource details and age
- Cost waste estimates

Enforce in CI/CD:

cleancloud scan --provider aws --all-regions --fail-on-confidence HIGH --fail-on-cost 2000

Exit 0 = pass.

Exit 2 = policy violation.
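
As a rough sketch of how a pipeline step might act on those exit codes: the command is the one from the post, codes 0 and 2 are the ones listed above, and treating any other code as a tool error is an assumption of this example, not documented CleanCloud behavior. The function names are hypothetical.

```python
import subprocess

# Command and thresholds copied from the post's example.
SCAN_CMD = [
    "cleancloud", "scan", "--provider", "aws", "--all-regions",
    "--fail-on-confidence", "HIGH", "--fail-on-cost", "2000",
]


def interpret_exit_code(code):
    """Map an exit code to a CI outcome label.

    0 and 2 come from the post; "error" for anything else is an assumption.
    """
    if code == 0:
        return "pass"
    if code == 2:
        return "policy-violation"
    return "error"


def run_gate(cmd=SCAN_CMD):
    """Run the scan and return (outcome, exit_code)."""
    result = subprocess.run(cmd, check=False)
    return interpret_exit_code(result.returncode), result.returncode
```

Separating "policy-violation" from "error" lets a pipeline fail the build on real findings while reporting a broken scan (missing credentials, tool not installed) differently.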

pipx install cleancloud and run your first scan in 5 minutes.

If you’re one of the 200+ users who have downloaded CleanCloud, we’d love to hear what you found.

Please open an issue here or leave a comment below.

https://redd.it/1rf84m8
@r_devops
27001 didn’t change our stack but it sure as hell changed our discipline

We missed two deals so it finally made sense to leadership to pursue ISO 27001.


We did end up tightening parts of our stack. A few workflows became more structured, and some things moved out of people’s heads and into systems. Those changes had their upsides, but they weren’t the real shift.


The uncomfortable part was answering some questions we’d never formally defined. A lot of our processes were muscle memory, and ISO forced us to define them, assign ownership and create a review cadence.


The discipline we gained changed everything.

https://redd.it/1reqg60
@r_devops
Why does docker output everything to standard error?

Every time I look inside my GitHub workflows I see everything output to stderr. Why does this happen?


Thank you!

https://redd.it/1rhts32
@r_devops
Build a website for DevOps Learning

Hey folks
After a long time, I finally rebuilt (vibe-coded) and revamped one of my old projects, DevOps Atlas.
It’s basically a one-stop search engine for DevOps learning resources.
The goal is simple:
Help DevOps engineers discover high-quality learning resources without endless searching.
Any suggestions and feedback are most welcome. Check it out at https://devopsatlas.com/ and let me know what you think!

https://redd.it/1rhwo1p
@r_devops
hackerbot-claw: An AI-Powered Bot Actively Exploiting GitHub Actions - Microsoft, DataDog, and CNCF Projects Hit So Far

https://www.stepsecurity.io/blog/hackerbot-claw-github-actions-exploitation#attack-6-aquasecuritytrivy---evidence-cleared

Now the Trivy repo is empty: https://github.com/aquasecurity/trivy

Some advice:

1. Verify the integrity of your Trivy binaries if installed at the end of February
2. Switch to the Docker image (if still available on GHCR/Docker Hub), verify Cosign signatures
3. Keep Checkov or Grype as a fallback
4. Audit your GitHub Actions workflows: no pull_request_target + checkout of the fork, no unescaped ${{ }} in run blocks.
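
A first pass at the workflow audit in point 4 could look something like the sketch below. It is a line-based heuristic and not a YAML parser, so it can miss cases and over-report others (not every ${{ }} in a run block is attacker-controlled); the function name and messages are made up for illustration.

```python
import re

# Matches any GitHub Actions expression such as ${{ github.event.issue.title }}.
EXPR = re.compile(r"\$\{\{.*?\}\}")


def audit_workflow(text):
    """Return (line_number, message) findings for one workflow file's text."""
    findings = []
    run_indent = None  # column of the current `run:` key while inside its block
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        indent = len(line) - len(line.lstrip())
        if "pull_request_target" in stripped:
            findings.append((number, "pull_request_target: check what gets checked out"))
        # A line indented no deeper than the `run:` key ends the run block.
        if run_indent is not None and indent <= run_indent:
            run_indent = None
        # Treat "- run: ..." like "run: ..." and remember the key's column.
        key = stripped[2:].lstrip() if stripped.startswith("- ") else stripped
        if key.startswith("run:"):
            run_indent = indent + (len(stripped) - len(key))
        if run_indent is not None and EXPR.search(stripped):
            findings.append((number, "${{ }} expression inside a run block"))
    return findings
```

Anything it flags still needs a human look. A common suggestion for the second kind of finding is to pass the value through `env:` and reference the environment variable in the script instead of interpolating the expression directly.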

https://redd.it/1ri4nwu
@r_devops
I parsed cloud Interview questions

Hey Folks,

Last time I published my 100 interview questions. I've added 10 new questions from Glassdoor reviews covering cloud.

Companies are Amazon, Accenture, Kayak, Adobe, Autodesk, EPAM, Lyft, Twitch, Coinbase.
These are AWS questions, and I've added videos for them as well.

https://github.com/devops-interviews/devops-interview-questions

Nothing on GitHub is paywalled. If you ever feel like thanking me, just star the repo. Thanks!

https://redd.it/1ro861x
@r_devops
I made an interactive progressive roadmap for new DevOps Engineers

TL;DR

The Roadmap: https://roadmap.esc.sh/
Source: https://github.com/MansoorMajeed/infra-roadmap
Blog Post (the philosophy for learning SRE/DevOps): https://blog.esc.sh/sre-devops-roadmap/


I have been an SRE for over a decade, and I’ve mentored a lot of junior engineers. The single biggest hurdle they all face is that the DevOps/SRE field is just incredibly overwhelming to beginners.

Many juniors make the mistake of jumping straight into learning tools (Docker, K8s, Terraform) without actually understanding what problems those tools were built to solve, how they fit together, or the foundations underneath them. Traditional DevOps roadmaps and the CNCF landscape often make the problem worse: it’s just a massive bingo card of logos that doesn't explain the "why" behind anything.

So, I decided to build a better way to visualize this: an interactive, progressive roadmap.

How it’s different:

- Question-Driven: Each node follows a general thought or question a new engineer may have and lets them choose the next path they find interesting.
- Progressive Disclosure: It doesn't show you 200 tools at once. The map expands as you explore, keeping cognitive load low.
- Open Source & Static: It’s a fully offline, static site.



Note about how it was made: I am an SRE, not a frontend dev (I still struggle with frontend and I decided that it is not my cup of tea), so I used Claude to help write the React Flow/Next.js engine and some boilerplate text. However, the architecture, the paths, the connections, and the core learning flow are 100% my own design based on my experience. Because of that, it might be biased or missing things, so PRs are more than welcome!

I also wrote a short blog post expanding on why I think we need to teach "concepts over tools" if anyone is interested in the philosophy behind it. https://blog.esc.sh/sre-devops-roadmap/

I hope this helps some of the juniors build a mental model. Would love to hear your feedback!

I am also happy to answer any questions any new folks may have!

https://redd.it/1rojfho
@r_devops
DevOps to Build/Release Eng

So I needed to find a full remote role because my current hybrid arrangement isn’t gonna work out moving forward. I ended up receiving an offer for a build and release engineer position.

My background is in traditional DevOps, supporting developers and their CI pipelines, which I do enjoy. The toolset is: GitHub Actions, AWS, EKS runner infra.

This new position is more like technical program/project management. I’ll be responsible for what releases go out the door, managing the GitHub branching strategy, and also owning the CI/CD pipelines + release automation.

The new role is a +20% TC, full remote position. Has anyone else made this transition? Loved it? Hated it? Interested to hear your experiences.

https://redd.it/1roke6e
@r_devops
I'm looking to move to a proper devops/platform engineer role

I don't know if this is the right place for this post, but I have been looking for a job change. My roles have been mixed: initially I worked as a DevOps engineer for two years, then I was moved to cloud migration, then cloud operations, mainly in Azure. I have knowledge of Terraform for infrastructure provisioning (mainly virtual machines), Jenkins from previous experience, Python scripting, Kubernetes (AKS), Docker, and Azure DevOps pipelines. It's like I know a little bit of everything but not enough of anything. Does anyone know how to permanently switch to DevOps/platform engineering?

I'm stuck. I blew an interview at round 2 because I didn't know much about system design, so I would appreciate any sort of help.

I don't know where to start. What tools should I stick to and learn properly?

https://redd.it/1roj11d
@r_devops
Choosing DNS to host

I am designing an environment for a malware simulation that uses DNS tunneling to export data, bypassing the firewall. For this I need to host an internal authoritative DNS server for a dummy domain that would cache requests with encoded information.

Do you have any recommendations for which software to use? I’m leaning towards bind9 on a Debian host, but I’m not sure whether it’s overkill, since it’s an enterprise-grade solution and all I’m doing is a simple demo.

The infra runs on multi-node Proxmox and I use OPNsense for the firewall, if it matters.

https://redd.it/1rnghlb
@r_devops
AI’s Impact on DevOps: Opportunities and Challenges

Read this article -- https://medium.com/@averageguymedianow/ais-impact-on-devops-opportunities-and-challenges-6cdba7a5a45e.

What really caught my eye is this statement:

"Integrating AI into DevOps workflows introduces significant complexity. Teams must now understand not only traditional infrastructure and application concerns but also machine learning models, training data requirements, model versioning, and AI-specific monitoring needs. This complexity can create new forms of technical debt when AI systems are implemented without proper governance or understanding."

From what I'm seeing, technical debt keeps piling up.


https://redd.it/1rocti9
@r_devops
Complete Guide to Building a CLI

In this article, I’ll cover a complete guide on how to build a professional CLI (Command Line Interface) that is easy to use and, most importantly, easy to integrate with other applications. If you’ve never built a CLI before, don’t worry — we’ll start from scratch.

https://vibelog.mateusmoutinho.com.br/en/article?date=2026/03/07&id=cli-guide/

https://redd.it/1ro0rqk
@r_devops
Hands-on with OVHcloud Managed Kubernetes

Been testing EU managed k8s providers one by one for eucloudcost.com, OVH was next.

Short version: it just works.

Free control plane, free egress in EU regions. You only pay for nodes. Coming from AWS this feels wrong somehow.

I also managed to set both vRack subnets to no_gateway = true and then spent an hour wondering why Traefik was stuck in Pending. Turns out Octavia needs a gateway on the load balancer subnet. Anyway.

Main issue is no RWX volumes out of the box. File Storage for RWX exists but starts at 150 GiB, which is overkill for most things, so out of the box only RWO exists.

Also they burned down a datacenter in 2021 so now every resource in the console shows you the AZ deployment mode.

Put together a reference repo with the full OpenTofu setup if you want a starting point: https://github.com/mixxor/opentofu-kubernetes-ovhcloud

Full writeup in comments.

Anyone else running OVHcloud in prod/dev?
Curious if you hit anything weird I missed.

https://redd.it/1rmp4f9
@r_devops
Would you be interested in an official r/DevOps Discord server?

Hi r/devops,

Would you be interested in having a community Discord server related to the subreddit?

This is simply an open discussion to gauge interest. Please comment with your opinion.

https://redd.it/1rnnxq8
@r_devops
Finally stopped manually SSH-ing to deploy my code. I built a simple CI/CD pipeline and it saved my sanity.

Last month, I spent 3 hours debugging a broken deployment on a Friday at midnight.

For context, I’m building a full-stack ERP (TypeScript, Node.js, React). Every time I wanted to ship a new feature, my routine was: open terminal -> SSH into my DigitalOcean Droplet -> git pull -> npm install -> npm run build -> restart PM2 -> pray Nginx doesn’t throw a 500 error. It took way too long and was super prone to typos.

I finally decided to automate it. I drew up an architecture diagram and wrote a GitHub Actions .yml file.

Now, my workflow is just:

1. git push origin main
2. GH Actions sets up a Node environment, installs dependencies, and runs the TS build (to catch errors early).
3. If it passes, an SSH action connects to my Droplet, pulls the code, and restarts PM2.
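
For anyone curious what the remote half of step 3 boils down to, here is a rough sketch that builds the same command chain as the manual routine above. The host, directory, and PM2 process name are placeholders, the function names are hypothetical, and this is not the author's actual workflow file or SSH action.

```python
import shlex
import subprocess


def build_remote_script(app_dir, pm2_name):
    """Chain the deploy steps with && so the first failure stops the rest."""
    steps = [
        f"cd {shlex.quote(app_dir)}",
        "git pull",
        "npm install",
        "npm run build",
        f"pm2 restart {shlex.quote(pm2_name)}",
    ]
    return " && ".join(steps)


def build_ssh_command(host, app_dir, pm2_name):
    """Return the argv that runs the deploy script on the server over ssh."""
    return ["ssh", host, build_remote_script(app_dir, pm2_name)]


def deploy(host, app_dir, pm2_name):
    """Run the deploy and return the ssh exit code (non-zero means a step failed)."""
    cmd = build_ssh_command(host, app_dir, pm2_name)
    return subprocess.run(cmd, check=False).returncode
```

Chaining with && means a failed build never reaches the PM2 restart, so the old version keeps running instead of being replaced by a half-built one.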

Total time: ~30 seconds. Zero manual work. I deployed 3 times today in my pajamas.

I was debating between Jenkins and GitHub Actions, but GH Actions felt like the frictionless choice since my code is already there. For the senior DevOps folks here: at what scale do you usually outgrow GitHub Actions and move to something like Jenkins? Any security flaws in my current setup I should be aware of?

https://redd.it/1rp1n6f
@r_devops