Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
To all the new prospects

It's good to see so many new people interested in DevOps. Our field definitely needs fresh perspectives. But I've seen a common issue. A lot of folks entering DevOps, especially if they're coming straight from college or some internships, don't always have a gut feel for the intense, unpredictable side of live operational work. They might know about certain tools, but they haven't always built up the deep resilience or the sharp, practical problem-solving skills you get from really tough, real-world challenges.

Think about what it's like on a working fishing boat. Imagine a vessel where its constant, reliable operation is absolutely essential for the crew to make their living. At the same time, this boat is often run on a tight budget, meaning ingenuity and making the most of what you have are more common than expensive, easy fixes. This boat isn't for fun. It's a vital piece of equipment. People's livelihoods and their safety absolutely depend on it running reliably, day after day. That makes its operation critical. And with limited resources, every repair or challenge demands clever solutions. You've got to make do, get creative, and find smart ways forward with what you've already got.

Things inevitably go wrong on that boat. Often it happens far from shore, in bad weather or tough conditions. When that occurs, the results are immediate and serious. An engine failure isn't some abstract problem. It’s a critical situation that needs to be diagnosed and fixed right now, with practical skills. There's no option to just pass the problem up the chain. That kind of environment forces you to become truly resourceful. It teaches you to solve complex problems when you're under serious pressure. You learn to understand the whole system because one small failure can affect everything else. You also develop a real toughness and a calm focus. Panicking doesn't help when you're dealing with a crisis.

This type of experience, where you're constantly adapting and learning by doing, with real responsibility and clear results, is incredibly valuable. It builds a kind of practical wisdom and resilience that's tough to get from more sheltered learning situations. Some internships are great for introducing tools. But they might not expose you to the actual stress and uncertainty of a live system failure. They may not show you how to make critical decisions when you don't have all the answers.

The parallels to the DevOps world are strong. We manage systems that are absolutely production critical. When they fail, the impact is immediate, affecting users, company revenue, and reputation. And while some companies have huge budgets, many DevOps teams work with limits. They need to find smart, efficient solutions instead of just throwing more money at every problem. We need people who can think on their feet. We need folks who can diagnose tricky issues across connected systems and stay effective when the pressure is high. We need that same ingenuity and resilience you'd find on that fishing boat, the kind that comes from real necessity.

So, if you're looking to build a solid foundation for a DevOps career, I'd really encourage you to look for experiences that genuinely challenge you. Find situations that force you to develop these core skills. Don't just focus on learning tools by themselves. Try to understand how systems actually work, how they break, and how you can fix them when the stakes are high. It's often true that the most effective people in DevOps also have a strong track record as successful developers. They don't just know that systems operate; they understand how they are built from the code on up. That deep insight is incredibly valuable. It’s also a fundamental truth that operating a system is only as good as its implementation. You can't effectively run or automate something that was poorly designed or built in the first place. No amount of operational heroism can truly make up for a flawed foundation.

Look for opportunities that push you to be resourceful, to take real ownership, and to keep going through tough times. This could be in a job, a project, or even a demanding hobby. And remember, the best use of a good DevOps engineer is to serve the developers, to act as a force multiplier for them. Our primary role should be to make their work smoother, faster, and more effective, clearing obstacles so they can build and innovate. While we support the business, empowering the engineering teams is where we truly shine.

It's this kind of broader experience and focused mindset that builds the practical skills and the strong character so essential in our field. Being able to navigate those "storms," understand the code, and support your development teams is what truly makes a difference.

https://redd.it/1ky3qn1
@r_devops
Remote SWE Role for AI Infrastructure (Top Tier CS Backgrounds, Flexible Hours)

Hey all – wanted to share a SWE contract role I came across that might interest those with strong backend or API experience, especially if you're from a top-tier CS background.

It's from a platform called Mercor, which connects developers to AI-focused research projects. They've raised $100M+ and work with top labs to build tools and infrastructure that support large-scale Reinforcement Learning (RL) systems.

---

🛠️ The role (contract / remote):
- Help design and build secure APIs, database schemas, and backend infra used in AI training
- You'll also simulate synthetic environments to test RL systems
- 10–20 hrs/week (asynchronous, fully remote)
- Applicants must be based in the US, UK, or Canada
- Comp is a hybrid hourly+commission model with $50–$150/hr range depending on throughput

They’re looking for folks with:
- Strong CS fundamentals from top schools
- 1+ year in high-pressure environments (startups, quant funds, etc.)
- Real experience structuring DBs and building APIs (testing, auth, deployment, etc.)

You can check the official listing here.

I’m posting because I’ve been working with them and having good experiences so far. Worth a look if you’re interested in contributing to AI infra work and want something flexible but high-caliber.

Disclosure: referral link included above

https://redd.it/1ky4wpj
@r_devops
AWS ECS Alert

I want to set up an alert in Slack for ECS state changes in my cluster. What's the best approach?

I am planning to do it via EventBridge with Lambda.

Any other suggestions?
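
For reference, the Lambda side of the EventBridge approach could be sketched roughly like this. It assumes an EventBridge rule that matches `source: aws.ecs` with `detail-type: ECS Task State Change`, and a Slack incoming-webhook URL stored in an environment variable. The variable name `SLACK_WEBHOOK_URL` and the function names are made up for illustration.

```python
import json
import os
import urllib.request


def build_slack_message(event):
    """Turn an ECS Task State Change event into a Slack webhook payload."""
    detail = event.get("detail", {})
    # ARNs end in ".../<name-or-id>", so the last path segment is the short name.
    cluster = detail.get("clusterArn", "unknown").split("/")[-1]
    task = detail.get("taskArn", "unknown").split("/")[-1]
    text = (
        f"ECS task `{task}` in cluster `{cluster}`: "
        f"{detail.get('lastStatus', '?')} (desired: {detail.get('desiredStatus', '?')})"
    )
    reason = detail.get("stoppedReason")
    if reason:
        text += f"\nReason: {reason}"
    return {"text": text}


def lambda_handler(event, context):
    # SLACK_WEBHOOK_URL is an assumed environment variable set on the function.
    request = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],
        data=json.dumps(build_slack_message(event)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Supplying `data` makes this a POST request.
    with urllib.request.urlopen(request, timeout=5) as response:
        return {"status": response.status}
```

Keeping the message formatting in its own function means it can be checked locally with a sample event before anything is deployed.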

https://redd.it/1ky4hyn
@r_devops
I am looking for some DevOps project ideas: things to deploy in Docker, Kubernetes, etc.

My status: I am qualified to deploy "anything" on bare metal without hassle, i.e. on a virtual machine.

I just started with Docker and Kubernetes. I am looking for projects that I can deploy on GitLab. There are tons of open source projects out there, like:

* artemis-platform
* ipfire
* jumpserver

While these, along with the awesome-selfhosted GitHub repo, are enough food for thought to learn deployment, I am posting this just for fun.

https://redd.it/1ky6iun
@r_devops
The hardest part of learning cloud wasn’t the tech; it was letting go of “I need to understand everything first”

When I first started learning cloud, I kept bouncing between services.
I'd open the AWS docs for EC2, then jump to IAM, then to VPCs, and suddenly I'm 40 tabs deep wondering why everything feels disconnected.

I thought I had to fully understand everything *before* touching it.

But the truth is:

* You learn best when you build, break, and fix
* It's okay to treat the docs like a reference, not a textbook
* You'll never feel “ready”—you just get more comfortable being confused

Once I let go of the need to “master it all upfront,” I actually started making progress.

Anyone else go through that mindset shift?
What helped you move from overwhelm to action?

https://redd.it/1ky9el4
@r_devops
Kubernetes observability is way more complex than it needs to be

Every time something breaks, I'm stuck digging through endless logs or adding more instrumentation code just to see what's happening. And agent-based tools are eating up CPU and memory.

Are there any monitoring solutions that don't require me to modify application code or pay a fortune just to see what's going on in my cluster? Would love to hear what's worked for others who don't have enterprise-level resources!

https://redd.it/1kyace1
@r_devops
First DevOps Internship Interview—What Should I Focus On?

Hey everyone! I’m prepping for my first DevOps internship interview and would love advice on key areas and likely questions. Here’s my background:

* **Microservices:** Built Spring Boot services (Healthcare platform).
* **Docker:** Wrote Dockerfiles and managed images locally.
* **Kubernetes:** Deployed to a Minikube cluster for local testing.
* **CI/CD:** Configured GitHub Actions per service branch for build, test, and deploy.
* **Blog Post:** Wrote “Introduction to Docker” covering its history and use cases.

**My questions for the community:**

1. **Key topics to study:**
* Should I dive deeper into Kubernetes concepts, or focus more on CI/CD best practices?
* How important is it to have hands-on AWS knowledge vs. local setups like Minikube?
2. **Interview question prep:**
* What practical or scenario-based questions might they ask?
* Any common algorithmic or systems-design questions for a DevOps intern?
3. **Soft skills & culture fit:**
* What kind of behavioral questions are typical?
4. **Additional resources:**
* Recommended tutorials, books, or courses to fill any gaps?

Thanks in advance for any advice, sample questions, or pointers! I really appreciate any help to make sure I’m covering the most relevant areas. 😊

https://redd.it/1kybht9
@r_devops
Are you guys willing to switch to (and re-learn) a different cloud provider if it is required for a job?

As the title says, is it wise to start learning Azure from scratch for a job opportunity if you already have a few years of experience with AWS and some AWS certs? (Specifically, switching from Amazon EKS to Azure AKS and learning how to deploy it with Terraform.)

https://redd.it/1kyeo40
@r_devops
Downgrade CPU

https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd.it%2Fdowngrade-cpu-v0-ftvxu72m3r3f1.png%3Fwidth%3D1662%26format%3Dpng%26auto%3Dwebp%26s%3De581291ccbf7835f9d45124c034b286e97e4d7b3


The virtual machine is provisioned with 4 vCPUs.
Here's the breakdown of the CPU usage from GCP over the last 14 days.
Occasionally it goes up to 86.4%, but most of the time it stays at around 30%.

Is it safe to downgrade it to 2 vCPUs? What kind of factors should I consider?
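
A quick back-of-the-envelope check of those numbers, as a rough sketch. It assumes CPU demand scales linearly with vCPU count and that the reported percentages are averages across all 4 vCPUs. Both are simplifications, and the helper name is made up.

```python
def projected_utilization(current_pct, current_vcpus, target_vcpus):
    """Scale a utilization percentage to a different vCPU count (linear assumption)."""
    busy_vcpus = current_pct / 100 * current_vcpus  # vCPUs' worth of work in use
    return busy_vcpus / target_vcpus * 100


for label, pct in [("typical", 30.0), ("peak", 86.4)]:
    projected = projected_utilization(pct, 4, 2)
    print(f"{label}: {pct}% of 4 vCPUs -> {projected:.1f}% of 2 vCPUs")
# typical: 30.0% of 4 vCPUs -> 60.0% of 2 vCPUs
# peak: 86.4% of 4 vCPUs -> 172.8% of 2 vCPUs
```

A projected value above 100% means the peak load would not fit on 2 vCPUs, so during those spikes work would queue up and take longer.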

https://redd.it/1kyfykm
@r_devops
Looking for Secure Dev Team Access to Cloud Resources (without Cloud Accounts)

Hi everyone,

I’m trying to design a **secure and cloud-agnostic access solution** for my dev team, and I’d appreciate some guidance or suggestions.

🔒 **What I want to achieve:**

* I want my devs to securely access certain cloud resources (e.g., VMs, internal services) **without creating cloud user accounts** for them (e.g., no IAM/AD accounts).
* Ideally, they should be able to connect with a client (similar to a VPN) and get seamless, controlled access to assigned resources.
* I need **identity-based access control**, centralized management of access policies, and something **cloud-agnostic** so I’m not tied to a specific cloud vendor.
* This should cover use cases like **SSH access to VMs** and **access to internal web services**.

🌐 **What I’ve tried:**
I’ve been experimenting with **OpenZiti** to set up secure overlays (for example, mapping `vm.ziti` to a target VM’s public IP). However, I’m facing challenges:

* **Overlaying SSH connections to public IPs of target VMs** hasn’t been easy; I’m having a couple of issues.
* I’m not sure if my setup is incorrect or if OpenZiti isn’t ideal for this use case.

📢 **So I’m looking for:**

* Alternative solutions that are easier to set up than OpenZiti but still provide **zero-trust, identity-based access control**.
* Solutions where developers can connect via a VPN-like client and get access based on policies, with **no user account management in the cloud**.
* **Cloud-agnostic** setups that work across different cloud providers.

🤝 If anyone has experience with **OpenZiti**, especially in overlaying SSH access to public IPs, I’d love to connect and discuss further!

Thanks in advance for any advice or recommendations 🙌

https://redd.it/1kyiqrd
@r_devops
I had an interviewer refer to AWS' DNS service as "Route 34"

I gave my best poker face and pretended not to notice... if you know you know.

https://redd.it/1kyk9g2
@r_devops
The terror of a "ZERO CVE" metric and how the bureaucrats lost.

Hey, I recently worked at a company with a 'Zero CVE' policy, and I would like to share my story on my blog. Feel free to ask any questions. It was a lot of fun to write, and I hope you guys like it as well.

Please share your best stories, especially metrics that the bureaucrats in your company made up. I'm fascinated by the silliness other companies invent.

I suppose Goodhart's Law really fits this topic.

https://redd.it/1kykf9o
@r_devops
Scraping control plane metrics in Kubernetes… without exposing a single port. Yes, it’s possible.

“You can scrape etcd and kube-scheduler by binding to 0.0.0.0”

Opening etcd to 0.0.0.0 so Prometheus can scrape it is like inviting the whole neighborhood into your bathroom because the plumber needs to check the pressure once per year.

kube-prometheus-stack is cool until it tries to scrape control-plane components.

At that point, your options are:

* Edit static pod manifests (...)
* Bind etcd and scheduler to 0.0.0.0 (lol)
* Deploy a HAProxy just to forward localhost (???)
* Accept that everything is DOWN and move on (sexy)

No thanks.

I just dropped a Helm chart that integrates cleanly with kube-prometheus-stack:

* A Prometheus Agent DaemonSet runs only on control-plane nodes
* It scrapes etcd / scheduler / controller-manager / kube-proxy on 127.0.0.1
* It pushes metrics via `remote_write` to your main Prometheus
* Zero services, ports, or hacks
* No need to expose critical components to the world just to get metrics.

Add it alongside your main kube-prometheus-stack and you’re done.

GitHub → [https://github.com/adrghph/kps-zeroexposure](https://github.com/adrghph/kps-zeroexposure)

Inspired by all cursed threads like [https://github.com/prometheus-community/helm-charts/issues/1704](https://github.com/prometheus-community/helm-charts/issues/1704) and [https://github.com/prometheus-community/helm-charts/issues/204](https://github.com/prometheus-community/helm-charts/issues/204)

bye!

https://redd.it/1kym1qu
@r_devops
Helping DevOps with Automation! - Import Postman & Swagger collections & instantly create APIs!

I created a website that streamlines API creation by letting you import Postman or Swagger collections.

Instead of manually setting up endpoints, just upload your collection and let my website generate your API and responses automatically.

Then simply click run to make the APIs accessible!

Just trying to make devs' lives easier 😊

https://redd.it/1kyqdr8
@r_devops
Scripts and tools to diagnose and find issues with your database?

Do you guys have queries you can run, or tools that connect to the DB, to see if there are things you can optimize or improve? Things like a SQL script that detects all the long-running queries that need to be rewritten.

https://redd.it/1kyrjb2
@r_devops
I don't understand high-level languages for scripting/automation

Title basically sums it up: how do people get things done efficiently without Bash? I'm a year and a half into my first DevOps role (my first role out of college as well), and I do not understand how to interact with machines without using Bash.


For example, say I want to write a script that stops a few systemd services, does something, then starts them.

```bash
#!/bin/bash

systemctl stop X Y Z
# ... do something ...
systemctl start X Y Z
```

What is the python equivalent for this? Most of the examples I find interact with the DBus API, which I don't find particularly intuitive. As well as that, if I need to write a script to interact with a *different* system utility, none of my newfound DBus logic applies.

Do people use higher-level languages like Python for automation because they are interacting with web APIs rather than system utilities?
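
For comparison, one way this is commonly done without DBus is to shell out from Python with the standard-library `subprocess` module, which works the same way for any other system utility. A minimal sketch: the unit names `X`, `Y`, `Z` are the placeholders from the Bash script, the helper names are made up, and the `run` parameter exists only so the command runner can be swapped out when testing.

```python
import subprocess

SERVICES = ["X", "Y", "Z"]  # placeholder unit names from the Bash example


def systemctl(action, services, run=subprocess.run):
    """Run `systemctl <action> <services...>` and raise if it exits non-zero."""
    # check=True raises CalledProcessError on failure, similar to `set -e` in Bash.
    return run(["systemctl", action, *services], check=True)


def restart_around(task, services=SERVICES, run=subprocess.run):
    """Stop the services, run `task`, then start them again even if `task` fails."""
    systemctl("stop", services, run=run)
    try:
        task()
    finally:
        systemctl("start", services, run=run)
```

The `try`/`finally` is a design choice that goes slightly beyond the Bash version: the services are started again even when the middle step raises an exception.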

https://redd.it/1kyu0xf
@r_devops
Handling Secrets with Deployments via github

Hey Folks,

I am using Argo CD for my k3s cluster and komo.do for my Docker deployments. Both are self-hosted.

Ever since I started, I have had a problem with handling secrets for my deployments.

I read about HashiCorp Vault, but can't find much information about setting it up.

Do you know any good tutorials on how I can set up and use HashiCorp Vault? An alternative would also work for me.

Thanks

https://redd.it/1kyvltl
@r_devops
Bohr Model of Atom Animations Using HTML, CSS and JavaScript (Free Source Code)

Bohr Model of Atom Animations: Science is enjoyable when you get to see how different things operate. The Bohr model explains how atoms are built. What if you could observe atoms moving and spinning in your web browser?

In this article, we will design Bohr model animations using HTML, CSS, and JavaScript. They are user-friendly, quick to respond, and ideal for students, teachers, and science fans.

You will also receive the source code for every atom.

# Bohr Model of Atom Animations

1. Bohr Model of Hydrogen
2. Bohr Model of Helium
3. Bohr Model of Lithium
4. Bohr Model of Beryllium
5. Bohr Model of Boron
6. Bohr Model of Carbon
7. Bohr Model of Nitrogen
8. Bohr Model of Oxygen
9. Bohr Model of Fluorine
10. Bohr Model of Neon
11. Bohr Model of Sodium
12. Bohr Model of Magnesium
13. Bohr Model of Aluminium
14. Bohr Model of Silicon
15. Bohr Model of Phosphorus
16. Bohr Model of Sulfur
17. Bohr Model of Chlorine
18. Bohr Model of Argon
19. Bohr Model of Potassium
20. Bohr Model of Calcium
21. Bohr Model of Scandium
22. Bohr Model of Titanium
23. Bohr Model of Vanadium
24. Bohr Model of Chromium
25. Bohr Model of Manganese
26. Bohr Model of Iron
27. Bohr Model of Cobalt
28. Bohr Model of Nickel
29. Bohr Model of Copper
30. Bohr Model of Zinc

You can download the codes and share them with your friends.

Let’s make atoms come alive!

Stay tuned for more science animations!

https://redd.it/1kywunl
@r_devops
Switching from Flutter to DevOps? Need some assistance or guidance

I've been working as a Flutter developer for around 2 years and have built several projects, including a personal project available on the Play Store, built with Flutter and Node.js and running on my own server hosted by Hostinger. After managing my own app and my freelance project, I found that my interest lies more in scaling and managing products than in development. For that reason, and obviously for higher pay as well, I'm switching roles.

I've covered Ansible, Kubernetes, AWS, basic CI/CD (without Jenkins), Coolify, and Nginx. I'm still learning more and have started applying for similar roles.

Can anyone help guide me on whether I'm on the right path or not? And what approach should I follow to be the best? I already have hands-on experience with VPSes and more.

I'm also looking to purchase a KodeKloud subscription once I clear my interview, so that I can get more hands-on practice during the notice period at my current company.

Please Guide...

https://redd.it/1kyyksn
@r_devops