Reddit DevOps
266 subscribers
30.9K links
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Cloud Engineer roadmap check: Networking + Linux completed, next steps?

I’m transitioning to Cloud Engineering from scratch. I’ve completed basic networking (TCP/IP, DNS, subnetting) and Linux fundamentals (CLI, file permissions, processes). I’m currently learning Git and GitHub. My goal is to get a junior cloud role in 6–9 months. What should I focus on next?

https://redd.it/1rezupb
@r_devops
CleanCloud v1.6.3 - 20 rules to find what's costing you money in AWS/Azure

A while ago I posted about CleanCloud, a shift-left cloud waste reporting tool that enforces hygiene as a CI/CD gate. It now has cost estimates and a --fail-on-cost CLI option.

AWS Rules (10):

1. Unattached EBS volumes (HIGH)
2. Old EBS snapshots
3. Infinite retention logs
4. Unattached Elastic IPs (HIGH)
5. Detached ENIs
6. Untagged resources
7. Old AMIs
8. Idle NAT Gateways
9. Idle RDS instances (HIGH)
10. Idle load balancers (HIGH)
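To make the first rule concrete, here is a minimal Python sketch of how an "unattached EBS volume" check can work. It is not CleanCloud's code; it assumes volume records shaped like the Volumes list returned by boto3's EC2 describe_volumes call, where a State of "available" means the volume is not attached to any instance. The output field names (SizeGiB, AgeDays) are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def unattached_volumes(
    volumes: list[dict[str, Any]], now: Optional[datetime] = None
) -> list[dict[str, Any]]:
    """Return a finding for every volume that exists but is attached to nothing."""
    now = now or datetime.now(timezone.utc)
    findings = []
    for vol in volumes:
        # In EC2, "available" is the state of a volume with no instance attachment.
        if vol.get("State") == "available" and not vol.get("Attachments"):
            findings.append({
                "VolumeId": vol["VolumeId"],
                "SizeGiB": vol["Size"],
                # Age helps separate "just detached" from "forgotten for months".
                "AgeDays": (now - vol["CreateTime"]).days,
            })
    return findings
```

In a real scan the input would come from boto3's describe_volumes paginator, optionally filtered server-side with the "status" filter set to "available".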

Azure Rules (10):

1. Unattached Managed Disks
2. Old Snapshots
3. Unused Public IPs
4. Empty Load Balancers
5. Empty Application Gateways
6. Empty App Service Plans
7. Idle VNet Gateways
8. Stopped (Not Deallocated) VMs — still incurring full compute charges
9. Idle SQL Databases (zero connections 14+ days)
10. Untagged Resources

Every finding includes:
- Confidence level (HIGH / MEDIUM)
- Evidence and signals used
- Resource details and age
- Cost waste estimates

Enforce in CI/CD:

cleancloud scan --provider aws --all-regions --fail-on-confidence HIGH --fail-on-cost 2000

Exit 0 = pass.

Exit 2 = policy violation.
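Because the gate signals its result through the exit code, a CI wrapper can report a policy violation differently from a crash. A minimal Python sketch: the 0 and 2 meanings come from the post, while treating every other code as a tool or runtime error is an assumption, not documented CleanCloud behaviour.

```python
import subprocess
import sys

# Exit codes stated in the post; anything else is assumed to be a tool/runtime error.
EXIT_MEANINGS = {0: "pass", 2: "policy violation"}


def describe_exit(code: int) -> str:
    """Translate a scan exit code into a human-readable CI status."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code} (tool or runtime error?)")


def run_gate(cmd: list[str]) -> int:
    """Run the scan, log what its exit code means to stderr, and pass the code through."""
    code = subprocess.run(cmd).returncode
    print(f"cleancloud gate: {describe_exit(code)}", file=sys.stderr)
    return code


if __name__ == "__main__":
    sys.exit(run_gate([
        "cleancloud", "scan", "--provider", "aws", "--all-regions",
        "--fail-on-confidence", "HIGH", "--fail-on-cost", "2000",
    ]))
```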

Install it with pipx install cleancloud and run your first scan in 5 minutes.

If you’re one of the 200+ users who have downloaded CleanCloud, we’d love to hear what you found.

Please open an issue here or leave a comment below.

https://redd.it/1rf84m8
@r_devops
27001 didn’t change our stack but it sure as hell changed our discipline

We missed two deals so it finally made sense to leadership to pursue ISO 27001.


We did end up tightening parts of our stack: a few workflows became more structured, and some things moved out of people’s heads and into systems. Those changes had real benefits of their own, but they weren’t the real shift.


The uncomfortable part was answering questions we’d never formally defined. A lot of our processes ran on muscle memory, and ISO forced us to define them, assign ownership, and create a review cadence.


The discipline we gained changed everything.

https://redd.it/1reqg60
@r_devops
Why does docker output everything to standard error?

Every time I look at my GitHub workflow logs, I see everything written to stderr. Why does this happen?


Thank you!

https://redd.it/1rhts32
@r_devops
Built a website for DevOps learning

Hey folks
After a long time, I finally rebuilt (vibe-coded) and revamped one of my old projects, DevOps Atlas.
It’s basically a one-stop search engine for DevOps learning resources.
The goal is simple:
Help DevOps engineers discover high-quality learning resources without endless searching.
Any suggestions and feedback are most welcome. Check it out at https://devopsatlas.com/ and let me know what you think!

https://redd.it/1rhwo1p
@r_devops
hackerbot-claw: An AI-Powered Bot Actively Exploiting GitHub Actions - Microsoft, DataDog, and CNCF Projects Hit So Far

https://www.stepsecurity.io/blog/hackerbot-claw-github-actions-exploitation#attack-6-aquasecuritytrivy---evidence-cleared

The Trivy repo is now empty: https://github.com/aquasecurity/trivy

Some advice:

1. Verify the integrity of your Trivy binaries if you installed them at the end of February
2. Switch to the Docker image (if still available on GHCR/Docker Hub), verify Cosign signatures
3. Keep Checkov or Grype as a fallback
4. Audit your GitHub Actions workflows: no pull_request_target combined with a checkout of the fork's code, and no unescaped ${{ }} expressions in run blocks.
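Point 4 can be partly automated. Below is a minimal Python sketch of a heuristic, line-based scanner (not a YAML parser, and not a tool from the linked write-up) that flags the two patterns above: a pull_request_target trigger together with a checkout whose ref points at the PR head, and ${{ }} expressions written directly inside run steps. Expect false positives and negatives; treat its output as a list of places to review by hand.

```python
import re

# A GitHub Actions expression such as ${{ github.event.issue.title }}.
EXPR = re.compile(r"\$\{\{.*?\}\}")
# A checkout whose ref: is pinned to the (untrusted) pull-request head.
PR_HEAD_REF = re.compile(r"ref:\s*\$\{\{\s*github\.event\.pull_request\.head\.")
# "run: ..." or "- run: ..." at the start of a left-stripped line.
RUN_KEY = re.compile(r"^(?:-\s+)?run:\s*(.*)$")


def audit_workflow(text: str) -> list[str]:
    """Return findings for one workflow file's raw YAML text."""
    findings: list[str] = []
    if "pull_request_target" in text and PR_HEAD_REF.search(text):
        findings.append("pull_request_target combined with checkout of the PR head")

    in_run = False   # currently inside a multi-line run: block scalar
    run_indent = 0   # indentation of the line that opened that block
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        if in_run:
            if stripped and indent <= run_indent:
                in_run = False  # dedent: the block scalar has ended
            else:
                if EXPR.search(line):
                    findings.append(f"line {lineno}: " + "${{ }} inside run block")
                continue
        m = RUN_KEY.match(stripped)
        if m:
            value = m.group(1)
            if value.startswith(("|", ">")):
                in_run, run_indent = True, indent  # multi-line script follows
            elif EXPR.search(value):
                findings.append(f"line {lineno}: " + "${{ }} inside run block")
    return findings
```

The usual fix for a flagged run block is to pass the expression through an env: variable and reference it as a shell variable ("$TITLE") inside the script.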

https://redd.it/1ri4nwu
@r_devops
I parsed cloud interview questions

Hey Folks,

Last time I published my 100 interview questions. I've now added 10 more cloud questions taken from Glassdoor reviews.

Companies are Amazon, Accenture, Kayak, Adobe, Autodesk, EPAM, Lyft, Twitch, Coinbase.
These are AWS questions, and I've added videos for them as well.

https://github.com/devops-interviews/devops-interview-questions

Nothing on GitHub is paywalled. If you ever feel like thanking me, just star the repo. Thanks!

https://redd.it/1ro861x
@r_devops
I made an interactive progressive roadmap for new DevOps Engineers

TL;DR

The Roadmap [https://roadmap.esc.sh/](https://roadmap.esc.sh/)
Source: https://github.com/MansoorMajeed/infra-roadmap
Blog Post (the philosophy for learning SRE/DevOps) : [https://blog.esc.sh/sre-devops-roadmap/](https://blog.esc.sh/sre-devops-roadmap/)


I have been an SRE for over a decade, and I’ve mentored a lot of junior engineers. The single biggest hurdle they all face is that the DevOps/SRE field is just incredibly overwhelming to beginners.

Many juniors make the mistake of jumping straight into learning tools (Docker, K8s, Terraform) without understanding what problems those tools were built to solve, how they fit together, or the foundations underneath them all. Traditional DevOps roadmaps and the CNCF landscape often make the problem worse: they are just a massive bingo card of logos that doesn't explain the "why" behind anything.

So, I decided to build a better way to visualize this: an interactive, progressive roadmap.

How it’s different:

Question-Driven: Each node follows a question a new engineer might have and lets them choose the next path they find interesting.
Progressive Disclosure: It doesn't show you 200 tools at once. The map expands as you explore, keeping cognitive load low.
Open Source & Static: It’s a fully offline, static site.
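The progressive-disclosure idea boils down to a simple data structure: a tree of questions where only the root and the children of nodes the learner has expanded are visible. A minimal Python sketch of that model, with hypothetical node questions; the actual site is built on React Flow/Next.js and its internals may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    question: str                      # assumed unique, used as the node's key
    children: list["Node"] = field(default_factory=list)


@dataclass
class ProgressiveMap:
    root: Node
    expanded: set[str] = field(default_factory=set)

    def visible(self) -> list[str]:
        """Questions on screen: the root plus the children of every expanded node."""
        out: list[str] = []
        stack = [self.root]
        while stack:
            node = stack.pop()
            out.append(node.question)
            if node.question in self.expanded:
                # Reverse so children appear in their declared order (depth-first).
                stack.extend(reversed(node.children))
        return out

    def expand(self, question: str) -> None:
        """Reveal the next layer under one question, keeping the rest collapsed."""
        self.expanded.add(question)
```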



Note about how it was made: I am an SRE, not a frontend dev (I still struggle with frontend and I decided that it is not my cup of tea), so I used Claude to help write the React Flow/Next.js engine and some boilerplate text. However, the architecture, the paths, the connections, and the core learning flow are 100% my own design based on my experience. Because of that, it might be biased or missing things, so PRs are more than welcome!

I also wrote a short blog post expanding on why I think we need to teach "concepts over tools" if anyone is interested in the philosophy behind it. https://blog.esc.sh/sre-devops-roadmap/

I hope this helps some of the juniors build a mental model. Would love to hear your feedback!

I am also happy to answer any questions any new folks may have!

https://redd.it/1rojfho
@r_devops
DevOps to Build/Release Eng

So I needed to find a full remote role because my current hybrid arrangement isn’t gonna work out moving forward. I ended up receiving an offer for a build and release engineer position.

My background is in traditional DevOps, supporting developers and their CI pipelines which I do enjoy. The toolset is: GitHub actions, AWS, EKS runner infra.

This new position is more like technical program/project management. I’ll be responsible for what releases go out the door, managing the GitHub branching strategy, and also owning the CI/CD pipelines + release automation.

The new role is a +20% TC, full remote position. Has anyone else made this transition? Loved it? Hated it? Interested to hear your experiences.

https://redd.it/1roke6e
@r_devops
I'm looking to move to a proper devops/platform engineer role

I don't know if this is the right place to make this post, but I have been looking for a job change. My roles have been mixed: I initially worked as a DevOps engineer for two years, then was moved to cloud migration, then cloud operations, mainly in Azure. I have knowledge of Terraform for infrastructure provisioning (mainly virtual machines), Jenkins from previous experience, Python scripting, Kubernetes (AKS), Docker, and Azure DevOps pipelines. It's like I know a little bit of everything but not enough of anything. Does anyone know how to switch permanently into DevOps/platform engineering?

I'm stuck. I blew an interview at round 2 because I didn't know much about system design, so I would appreciate any sort of help.

I don't know where to start, or which tools to stick to and learn properly.

https://redd.it/1roj11d
@r_devops
Choosing DNS to host

I am designing an environment for a malware simulation that uses DNS tunneling to exfiltrate data past the firewall. For this I need to host an internal authoritative DNS server for a dummy domain that will receive the queries carrying the encoded information.

Do you have any recommendations for which software to use? I’m leaning towards bind9 on a Debian host, but I’m not sure whether it’s overkill, since it’s an enterprise-grade solution and all I’m doing is a simple demo.

The infra runs on a multi-node Proxmox cluster, and I use OPNsense as the firewall, if it matters.
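For a demo like this, the interesting part is how data fits into query names, whatever DNS server ends up answering them. A minimal Python sketch of one common scheme (lowercase, unpadded base32 split into labels under the dummy zone), with a decoder for the server side. The zone names are hypothetical, and real tunneling tools add sequencing, chunking, and framing that this omits.

```python
import base64

LABEL_MAX = 63   # RFC 1035: each label is at most 63 octets
NAME_MAX = 253   # 255 octets on the wire = at most 253 characters in dotted text form


def encode_query(data: bytes, zone: str) -> str:
    """Pack bytes into a query name under `zone` using lowercase, unpadded base32."""
    # Base32 uses only letters and digits 2-7, so it survives case-insensitive DNS.
    b32 = base64.b32encode(data).decode("ascii").rstrip("=").lower()
    labels = [b32[i:i + LABEL_MAX] for i in range(0, len(b32), LABEL_MAX)]
    name = ".".join(labels + [zone])
    if len(name) > NAME_MAX:
        raise ValueError("payload too large for one query; split it across several")
    return name


def decode_query(name: str, zone: str) -> bytes:
    """Reverse encode_query on the authoritative server side."""
    name = name.rstrip(".").lower()   # resolvers may add a trailing dot or change case
    suffix = "." + zone.lower()
    if not name.endswith(suffix):
        raise ValueError(f"{name!r} is not under {zone!r}")
    b32 = name[: -len(suffix)].replace(".", "").upper()
    b32 += "=" * (-len(b32) % 8)      # restore the padding stripped by encode_query
    return base64.b32decode(b32)
```

The same label-length and name-length limits are what make such queries stand out to detection rules, which is useful to show on the defensive side of the demo.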

https://redd.it/1rnghlb
@r_devops
AI’s Impact on DevOps: Opportunities and Challenges

I read this article: https://medium.com/@averageguymedianow/ais-impact-on-devops-opportunities-and-challenges-6cdba7a5a45e

What really caught my eye is this statement:

"Integrating AI into DevOps workflows introduces significant complexity. Teams must now understand not only traditional infrastructure and application concerns but also machine learning models, training data requirements, model versioning, and AI-specific monitoring needs. This complexity can create new forms of technical debt when AI systems are implemented without proper governance or understanding."

From what I'm seeing, technical debt keeps piling up.


https://redd.it/1rocti9
@r_devops
Complete Guide to Building a CLI

In this article, I’ll walk through how to build a professional CLI (Command Line Interface) that is easy to use and, most importantly, easy to integrate with other applications. If you’ve never built a CLI before, don’t worry: we’ll start from scratch.
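As a taste of what "easy to integrate" usually means, here is a minimal Python sketch (not taken from the linked guide) of three common conventions: an optional machine-readable --json output, results on stdout with diagnostics on stderr, and meaningful exit codes. The greet command is hypothetical; argparse itself exits with status 2 on usage errors.

```python
import argparse
import json
import sys
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="greet", description="Hypothetical example CLI.")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("--json", action="store_true", help="print machine-readable JSON")
    return parser


def render(args: argparse.Namespace) -> str:
    """Format the result for humans by default, or as JSON for other programs."""
    result = {"greeting": f"Hello, {args.name}!"}
    return json.dumps(result) if args.json else result["greeting"]


def main(argv: Optional[Sequence[str]] = None) -> int:
    args = build_parser().parse_args(argv)   # bad usage: argparse exits with status 2
    if not args.name:
        print("error: name must not be empty", file=sys.stderr)  # diagnostics -> stderr
        return 1
    print(render(args))                      # results -> stdout, safe to pipe
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Keeping main() argument-driven and returning an int makes the CLI easy to test and to call from other Python code as well as from the shell.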

https://vibelog.mateusmoutinho.com.br/en/article?date=2026/03/07&id=cli-guide/

https://redd.it/1ro0rqk
@r_devops
Hands-on with OVHcloud Managed Kubernetes

Been testing EU managed k8s providers one by one for eucloudcost.com, OVH was next.

Short version: it just works.

Free control plane, free egress in EU regions. You only pay for nodes. Coming from AWS this feels wrong somehow.

I also managed to set both vRack subnets to no_gateway = true and then spent an hour wondering why Traefik was stuck in Pending. Turns out Octavia needs a gateway on the load balancer subnet. Anyway.

The main issue is that there are no RWX volumes out of the box; only RWO is available. File Storage for RWX exists, but it starts at 150 GiB, which is overkill for most things.

Also, one of their datacenters burned down in 2021, so now every resource in the console shows you its AZ deployment mode.

Put together a reference repo with the full OpenTofu setup if you want a starting point: https://github.com/mixxor/opentofu-kubernetes-ovhcloud

Full writeup in comments.

Anyone else running OVHcloud in prod or dev?
Curious if you hit anything weird I missed...

https://redd.it/1rmp4f9
@r_devops
Would you be interested in official r/DevOps Discord server ?

Hi r/devops,

Would you be interested in having a community Discord server related to the subreddit?

This is simply an open discussion to gauge interest.. please comment your opinion.

https://redd.it/1rnnxq8
@r_devops
Finally stopped manually SSH-ing in to deploy my code. I built a simple CI/CD pipeline and it saved my sanity.

Last month, I spent 3 hours debugging a broken deployment on a Friday at midnight.

For context, I’m building a full-stack ERP (TypeScript, Node.js, React). Every time I wanted to ship a new feature, my routine was: open terminal -> SSH into my DigitalOcean Droplet -> git pull -> npm install -> npm run build -> restart PM2 -> pray Nginx doesn’t throw a 500 error. It took way too long and was super prone to typos.

I finally decided to automate it. I drew up the architecture in Excalidraw and wrote a GitHub Actions .yml file.

Now, my workflow is just:

1. git push origin main
2. GH Actions sets up a Node environment, installs dependencies, and runs the TS build (to catch errors early).
3. If it passes, an SSH action connects to my Droplet, pulls the code, and restarts PM2.
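Step 3 works best when the server-side half is a single script that stops at the first failing command, so a broken build never gets restarted into production. A minimal Python sketch of such a script, assuming the SSH action runs it on the Droplet; APP_DIR and PM2_APP are hypothetical names, and npm ci is swapped in for npm install because it installs exactly what package-lock.json pins.

```python
import subprocess
from typing import Callable, Sequence

APP_DIR = "/srv/erp"   # hypothetical checkout location on the Droplet
PM2_APP = "erp"        # hypothetical PM2 process name

STEPS: list[list[str]] = [
    ["git", "pull", "--ff-only"],   # refuse to create merge commits on the server
    ["npm", "ci"],                  # clean install of exactly what the lockfile pins
    ["npm", "run", "build"],        # TypeScript build; a failure here stops the deploy
    ["pm2", "restart", PM2_APP],    # only reached if every previous step succeeded
]


def run_step(cmd: Sequence[str]) -> int:
    """Run one command inside the app directory and return its exit code."""
    return subprocess.run(cmd, cwd=APP_DIR).returncode


def deploy(run: Callable[[Sequence[str]], int] = run_step) -> int:
    """Run each step in order; stop at the first failure and return its exit code."""
    for cmd in STEPS:
        code = run(cmd)
        if code != 0:
            print(f"deploy failed at: {' '.join(cmd)} (exit {code})")
            return code
    return 0


if __name__ == "__main__":
    raise SystemExit(deploy())
```

Returning the failing exit code lets the SSH step, and therefore the GitHub Actions job, go red instead of reporting a false success.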

Total time: ~30 seconds. Zero manual work. I deployed 3 times today in my pajamas.

I was debating between Jenkins and GitHub Actions, but GH Actions felt like the frictionless choice since my code is already there. For the senior DevOps folks here: at what scale do you usually outgrow GitHub Actions and move to something like Jenkins? Any security flaws in my current setup I should be aware of?

https://redd.it/1rp1n6f
@r_devops
Python modules for creating and modifying Helm & k8s manifests

I'm now working on a DBaaS service for the developers in my department. Since it's my first project like this, I'd be happy if anyone could recommend Python modules they like for this kind of automation, mainly creating or modifying Helm charts and k8s manifests.
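Whatever modules you end up with, much of this work is layering per-database overrides onto a base values dict. A minimal, stdlib-only Python sketch of a deep merge that follows the same basic rule Helm uses when combining values files (nested maps merge key by key, lists and scalars are replaced wholesale). Helm's extra rules, such as setting a key to null to delete it, are not handled here; for reading and writing the YAML itself, PyYAML's yaml.safe_load and yaml.safe_dump are a common choice.

```python
import copy
from typing import Any


def merge_values(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: nested dicts merge recursively, everything else
    (scalars and lists) in `override` replaces the value in `base`."""
    result = copy.deepcopy(base)          # never mutate the caller's base values
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_values(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result
```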

https://redd.it/1roxmm9
@r_devops
How to make Documentation Discoverable?

Hey, DevOps Engineer here!

How do you handle the problem where documentation exists, but no one knows where it is (except the two seniors who were there when it was written)? We use Confluence, for example.

The goal is to make the documentation explicitly available where it is most needed, instead of having to ask someone else “Where are the docs on X?” The reason this matters is that if someone is sick or unavailable, we avoid a single point of failure :D

Ideas I’ve come up with:

* Add relevant documents to the Jira ticket (for example, the deployment guide attached to deployment tickets).
* Create “Hook Pages” that are framed around a problem and point to (or include) the relevant guide. For example:
* “How do I do X?” → links to guide on X
* “What is Service?” → links to “Service Architecture Explanation Guide”
* **One guide can have multiple problem/question hooks**
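The hook-page idea is essentially a many-to-one index from questions to guides. A minimal Python sketch of that model with a naive keyword lookup; the hook questions and wiki.example.com URLs are hypothetical placeholders, and a real setup would lean on Confluence's own search and labels.

```python
import re
from collections import defaultdict

# Hypothetical hooks: several questions may point at the same guide.
HOOKS = {
    "How do I deploy to staging?": "https://wiki.example.com/deployment-guide",
    "How do I roll back a deployment?": "https://wiki.example.com/deployment-guide",
    "What is the billing service?": "https://wiki.example.com/billing-architecture",
}

# Filler words that would otherwise match almost every hook question.
STOP = {"a", "an", "the", "is", "what", "how", "do", "i", "to", "of"}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOP


def find_docs(query: str, hooks: dict[str, str] = HOOKS) -> list[str]:
    """Rank guides by the best word overlap between the query and any of their hooks."""
    q = tokens(query)
    scores: dict[str, int] = defaultdict(int)
    for question, guide in hooks.items():
        scores[guide] = max(scores[guide], len(q & tokens(question)))
    return sorted((g for g, s in scores.items() if s > 0), key=lambda g: -scores[g])
```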

How do you make your documentation easy to find when you need it?

https://redd.it/1rp2noq
@r_devops
Advice For Surviving Current Job Market 6 Months After Layoff 3+ YOE

I was laid off about 6 months ago, back in September. After being made redundant, I took some time off from anything work-related, then got back to applying for DevOps/Platform engineering roles. Although a dozen or so recruiters have contacted me and I've made it to a few final-round interviews, my confidence is waning at this point.

My emergency funds are solid and should last roughly 12 more months. I'm mainly interested in feedback on my CV, as I fear I may be missing something there. I'm applying mainly for mid-level DevOps/Platform engineer roles.

My CV is here

https://redd.it/1rp95f3
@r_devops
LaunchDarkly rug pull coming

Hey everyone!

If you're on LaunchDarkly's existing user-based pricing, they're moving you to new usage-based pricing.

Upside? Unlimited users.

Downside? They charge per service connection. What's a service connection? Any independent instance of an app connecting to LaunchDarkly, for example a VM, a Kubernetes pod, or a Heroku worker.

They're charging $12/month per service connection ($10 on an annual commitment).

We were paying $10k a year under user-based pricing. Under the new per-service-connection pricing, we would pay $45k.
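The arithmetic is easy to rerun for your own fleet. A minimal Python sketch, assuming a flat per-connection rate with no tiers, minimums, or volume discounts (none are mentioned in the post). If the $45k figure uses the $10 annual-commitment rate, it corresponds to 45,000 / (10 × 12) = 375 service connections, and the break-even point against a $10k annual bill is about 83 connections.

```python
import math


def annual_cost(connections: int, monthly_rate: float) -> float:
    """Yearly bill under flat per-service-connection pricing."""
    return connections * monthly_rate * 12


def break_even_connections(current_annual: float, monthly_rate: float) -> int:
    """Largest connection count that costs no more than the current annual bill."""
    return math.floor(current_annual / (monthly_rate * 12))
```

Note that every pod replica and autoscaled instance counts as a connection under this model, so peak replica counts matter more than average ones.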

For anyone going through the same thing, there are plenty of open source feature flag tools you can use, like Flagsmith. Just deploy them in your infrastructure and call it a day.

https://redd.it/1rr4fen
@r_devops
Empowering DevOps Teams

I came across an article sharing how to empower DevOps teams. If you are given the following choices and can pick only one to make your life better, which one would you pick?

1. A good team leader who understands what's going on and cares about his/her team. Pay and workloads remain the same.
2. A better-paying job with less stress, but you are required to relocate.
3. A big promotion with far better pay and perks but with more stress and responsibilities.



https://redd.it/1rr74xm
@r_devops