Reddit DevOps
268 subscribers
30.9K links
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Making system design diagrams less painful.

Hi everyone!

After years of painstakingly drawing system design diagrams by hand, I decided to try to make the whole process smoother and faster.

I developed RapidChart, a free technical diagram generator that lets you design your system architecture much faster!

I’d love for you to try it out and let me know what you think.

Best, Sami

https://redd.it/1m6aclv
@r_devops
How do small SaaS teams handle CI/CD and version control?

Solo dev here, building a multi-tenant Laravel/Postgres school management system.

I’m at the stage where I need proper CI/CD for staging + prod deploys, and I’m unsure whether to:

Self-host GitLab + runners (on DigitalOcean or a personal physical server)
Use GitHub/GitLab’s cloud offering

My biggest concerns:

Security/compliance (especially long-term SOC2)
Secrets management (how to safely deploy to AWS/DigitalOcean)
Availability (what if the runner or repo server goes down?)

Questions:

1. Do you self-host version control and CI/CD? On your cloud provider? Home lab?
2. How do you connect it to your AWS/DO infra securely? (Do you use OIDC? SSH keys? Vault?)
3. For solo devs and small teams — is it better to keep things simple with cloud providers?
4. If I self-host GitLab, can it still be considered secure/compliant enough for audits (assuming hardened infra)?

My plan right now is:

GitLab on a home server or a separate DO droplet, hardened with Keycloak and WireGuard
Runners on the same network
Deploy apps to DOKS (or ECS later)
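On question 2 (connecting CI securely to AWS), one common pattern that avoids long-lived keys is OIDC federation: the CI job presents a short-lived identity token issued by the CI platform and exchanges it for temporary AWS credentials. A rough sketch with the AWS CLI; the role ARN and token variable are placeholders, and how your CI platform exposes the token differs per product:

```shell
# Exchange a CI-issued OIDC token for temporary AWS credentials.
# IAM trusts the CI platform's OIDC provider, so no secret is stored in CI.
# The role ARN and $CI_OIDC_TOKEN below are placeholders.
aws sts assume-role-with-web-identity \
  --role-arn arn:aws:iam::123456789012:role/ci-deploy \
  --role-session-name "ci-deploy-session" \
  --web-identity-token "$CI_OIDC_TOKEN" \
  --duration-seconds 3600
# Prints temporary AccessKeyId / SecretAccessKey / SessionToken.
```

The same shape works from GitLab, GitHub Actions, or a self-hosted runner, as long as IAM is configured to trust the token issuer.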

Would love to hear how others manage this.

Thanks!

https://redd.it/1m6d5ep
@r_devops
A simple fix for Docker Hub rate limit errors in CI/CD

Hi r/devops,

My team has been struggling with intermittent CI failures ever since the new Docker Hub rate limits were enforced. The shared IP of our runners kept hitting the anonymous pull limit, which was a major headache.

We looked into the standard solutions:

Docker Pro/Team: The per-seat pricing felt wrong for an infrastructure problem.
Self-hosting Harbor/Nexus: The operational overhead of setting up and maintaining another piece of infrastructure just for this was too high for our small team.

We wanted a "set it and forget it" utility, so I ended up building one. I'm sharing it here in case it can help other teams facing the same issue.

It's a free, public caching mirror for Docker Hub called RateLimitShield. It requires no sign-up. It solves the problem by handling authentication and caching on the backend, so your runners don't hit the anonymous pull limit.

To use it, you just need to configure the Docker daemon on your runners. Edit the /etc/docker/daemon.json file:

{
  "registry-mirrors": ["https://public-mirror.ratelimitshield.io"]
}

And then restart the Docker service (sudo systemctl restart docker).
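A quick way to confirm the daemon actually picked up the mirror after the restart (not from the original post, just a sanity check):

```shell
# Show the registry mirrors the running daemon is using
docker info --format '{{.RegistryConfig.Mirrors}}'
# Subsequent pulls should now be served through the mirror
docker pull alpine:latest
```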

That's it. Our builds have been stable ever since. The project website with more details is at ratelimitshield.io.

The public mirror uses a shared cache, which is great for common base images. I'm also gauging interest in future premium plans for teams that might need a dedicated, private cache for guaranteed performance.

Would love to hear how other teams are tackling this problem and get any feedback on this approach. Thanks!

https://redd.it/1m6gpn1
@r_devops
Setting Up a Production-Grade Kubernetes Cluster from Scratch Using Kubeadm (No Minikube, No AKS)

Hi,

I've published a detailed blog on how to set up a 3-node Kubernetes cluster (1 master + 2 workers) completely from scratch using kubeadm — the official Kubernetes bootstrapping tool.

This is not Minikube, Kind, or any managed service like EKS/GKE/AKS. It’s the real deal: manually configured VMs, full cluster setup, and tested with real deployments.

Read here: https://ariefshaik.hashnode.dev/setting-up-k8s-using-kubeadm

What’s in the guide:

How to spin up 3 Ubuntu VMs for K8s

Installing containerd, kubeadm, kubelet, and kubectl

Setting up the control plane (API server, etcd, controller manager, scheduler)

Adding worker nodes to the cluster

Installing Calico CNI for networking

Deploying an actual NGINX app using NodePort

Accessing the cluster locally (outside the VM)

Managing multiple kubeconfig files

I’ve also included an architecture diagram to make everything clearer.
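For readers who want the overall shape before clicking through, the core bootstrap steps described above can be sketched as follows (the Calico version, pod CIDR, and join parameters are illustrative; the blog has the full, tested commands):

```shell
# On the control-plane node (containerd, kubeadm, kubelet, kubectl already installed)
sudo kubeadm init --pod-network-cidr=192.168.0.0/16

# Make kubectl usable for your regular user
mkdir -p "$HOME/.kube"
sudo cp /etc/kubernetes/admin.conf "$HOME/.kube/config"
sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"

# Install the Calico CNI (pin the version the Calico docs currently recommend)
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml

# On each worker node, run the join command printed by `kubeadm init`
# (<control-plane-ip>, <token>, and <hash> are placeholders)
sudo kubeadm join <control-plane-ip>:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>

# Back on the control plane: all three nodes should eventually report Ready
kubectl get nodes
```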

Perfect for anyone preparing for the CKA, building a homelab, or just trying to go beyond toy clusters.



Would love your feedback or ideas on how to improve the setup. If you’ve done a similar manual install, how did it go for you?

TL;DR:

Real K8s cluster using kubeadm

No managed services

Step-by-step from OS install to running apps

Architecture + troubleshooting included

Happy to answer questions or help troubleshoot if anyone’s trying this out!

https://redd.it/1m6eq0e
@r_devops
Third year CS student trying to get into DSA & DevOps, any beginner-friendly internships or advice?

Hello everyone,
I’m Dhyan Bellary, currently in my 3rd year of engineering (CSE). I’ve just started learning DSA and DevOps, but honestly, I still feel pretty lost. I'm looking for internships (even unpaid ones) where I can get hands-on experience, learn by doing, and figure out what to focus on next.

Are there any platforms, programs, or open-source projects where beginners like me can start contributing or learning practically?
Any advice or resources would also be hugely appreciated.

Thanks in advance!

https://redd.it/1m6o4zb
@r_devops
What's your team's branching strategy for React Native? (GitFlow-Lite vs. Trunk-Based Development)

Hey r/devops 👋

My team could use some community wisdom. We're a small team of 3 devs working on a React Native app using Expo, EAS, and Jenkins for CI/CD.

We're currently debating our branching and release strategy and have landed on two main options:

1. Option A: GitFlow-Lite (main / develop branches)

How it works: Features are merged into `develop`. This branch is used for internal test builds and OTA testing. When we're ready for a release, we merge `develop` into `main`, which represents the production App Store version.
Pros: This feels very safe, especially for separating native changes from simple OTA updates. There's a clear buffer between our daily work and what goes to the app stores.

2. Option B: Trunk-Based Development (main only)

How it works: All features get merged directly into `main`, protected by feature flags.
Pros: We love the simplicity and development speed. It eliminates "merge hell" and feels more aligned with true CI/CD.
Cons: We're cautious about the risks with mobile. A bad merge with a new native dependency could break the app for everyone until a new binary is released. It feels like it requires extreme discipline.

We know the big tech companies (Google, Meta, etc.) use Trunk-Based Development successfully, but we're curious how it works for small to medium-sized teams like ours.

So, we wanted to ask the community:

What's your team size and which strategy have you adopted?
If you use Trunk-Based Development, how do you manage the risk of native dependencies? Is it all on feature flags and careful release coordination, and has it ever bitten you?
If you use a GitFlow-style strategy, do you ever find it slows you down too much?
How do you structure your workflow for OTA updates vs. full app store releases within your chosen strategy?
Any major "gotchas" or lessons you've learned that you wish you knew earlier?

Any insights, war stories, or advice would be hugely appreciated. Thanks!

https://redd.it/1m6pl26
@r_devops
Best AI for DevOps?

Hey everyone,

I mostly work in .NET environments with a lot of PowerShell/C# tasks.

Azure DevOps on prem for CI/CD.

What do you think the best AI for this would be? I am currently using ChatGPT and Copilot, and the experience is ok-ish.


https://redd.it/1m6roo6
@r_devops
why pay for incident management platforms?

Just got off two weeks back to back on call rotation, rant incoming.

All "incident management" platforms are just insanely expensive phone plans that wake me up in the middle of the night. It's like I'm a masochist paying for my own torture. After we wake up, we just jump into Slack anyway to actually fix the problem. Why are we paying for tools that just add a step and create more work?

Holy crap, the UIs, man. At 3am I do not function normally; I spent the first 10 minutes trying to remember how a mouse works, let alone clicking dropdowns and navigating five layers deep.

Trying to check who’s on schedule for escalation feels like I'm trying to defuse a bomb in an interface designed 15 years ago.

too bad SLAs require three nines of uptime. I'd kill this whole thing so fast if I had the guts and the money weren't so good LOL

ok rant over, thanks for reading.

https://redd.it/1m6dr7z
@r_devops
Introduction to Maven: The Build Tool That Modernized Java Development

With Maven 4.0.0 just around the corner, I thought it would be a good idea to write a quick introduction to Apache Maven for any newcomers that are interested in getting acquainted with the tool, its history and philosophies.

I hope you find this interesting! :)

https://medium.com/maven-by-nature/introduction-to-maven-the-build-tool-that-modernized-java-development-f3c038b4d32e?sk=fe44db3512f026787bc2cd7d31e98b5f

https://redd.it/1m6tbuw
@r_devops
How Do I Learn AWS, Kubernetes, and Modern DevOps Tools If My Company Doesn’t Use Them (And Without Spending a Fortune)?

I currently work at a company where our tech stack is fairly traditional — we use Apache, Nginx, and Docker Compose for deployments. There’s **no AWS**, no **Kubernetes**, no **CI/CD pipelines**, and barely any of the modern DevOps tooling that’s in demand right now.

While I’m grateful for the learning so far (I’ve gained solid Linux and server fundamentals), I’m starting to feel like I’m falling behind in the DevOps world. I really want to get hands-on experience with:

* AWS (EC2, S3, IAM, CloudFormation, etc.)
* Kubernetes (EKS, Helm, ArgoCD)
* Terraform, CI/CD tools like Jenkins/GitLab CI, etc.

But here’s the catch — **AWS can get expensive** real fast when you're practicing. I’m also trying to be mindful of costs, as I’m self-learning in my spare time. So I’m looking for **advice** from folks who’ve been in a similar situation:

https://redd.it/1m6amv4
@r_devops
CI & CD Pipeline Setup

Hello Guys,
I've been able to learn the basics of Docker and Kubernetes by testing them on my Linux laptop using kind.
But I don't yet understand what a production pipeline looks like. If someone could explain how their CI/CD pipeline works and what components are involved, that would be a great help in understanding how a pipeline runs in a real production environment.
Thanks in advance!
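To make the question concrete for other readers: most production pipelines reduce to a build, test, publish, deploy sequence that a CI server runs on every push. A rough sketch in plain shell; the registry, image, and deployment names are made up, and real pipelines express these steps as stages in Jenkins, GitLab CI, GitHub Actions, etc.:

```shell
set -euo pipefail

# CI half: build an immutable image and run the test suite against it
docker build -t registry.example.com/myapp:"$GIT_SHA" .
docker run --rm registry.example.com/myapp:"$GIT_SHA" ./run-tests.sh

# Publish the exact artifact that passed the tests
docker push registry.example.com/myapp:"$GIT_SHA"

# CD half: roll that image out to the cluster and wait for it to go healthy
kubectl set image deployment/myapp app=registry.example.com/myapp:"$GIT_SHA"
kubectl rollout status deployment/myapp --timeout=120s
```

Around that core, teams add triggers (run on every push or merge), environments (the same deploy step pointed at staging first, then prod), and secrets management for the registry and cluster credentials.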

https://redd.it/1m6z5ss
@r_devops
generate sample YAML objects from Kubernetes CRD

Built a tool that automatically generates sample YAML objects from Kubernetes Custom Resource Definitions (CRDs). Simply paste your CRD YAML, configure your options, and get a ready-to-use sample manifest in seconds.

Try it out here: https://instantdevtools.com/kubernetes-crd-to-sample/

https://redd.it/1m710tj
@r_devops
Opensearch Cross Cluster Replication

Hello everyone.
I have two OpenSearch clusters, each installed on a different EKS cluster in a different region. I have peered the VPCs so the two EKS clusters can reach each other; one cluster is located in Asia and one in Europe.
I was able to set up cross-cluster replication following the official guide, but the problem I'm facing is that when I set up the auto-follower, it replicates all the indices below 250 MB and fails on the bigger ones.
On the failing ones I get UNALLOCATED, with the reason "cannot allocate because allocation is not permitted to any of the nodes".

PS: I have used the same configurations for both clusters (installed via helm chart)
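Not part of the original post, but when shards sit unallocated like this, the cluster allocation explain API on the follower cluster usually names the exact blocking rule (often shard-count limits, disk watermarks, or allocation filtering). The endpoint and index name below are placeholders; add auth flags as your setup requires:

```shell
# Ask the follower cluster why a specific shard cannot be allocated
curl -s -XGET "https://follower-cluster:9200/_cluster/allocation/explain?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"index": "my-large-index", "shard": 0, "primary": true}'
```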

https://redd.it/1m737r5
@r_devops
I created a browser extension for pre-alerting of high costs in AWS console

Hello,

I had a surprise the other day when AWS charged me $300 for two public exportable certificates. I didn't notice the small note under the "enable export" option that made each certificate cost $150 upfront.

For this reason, I have created a multi-browser extension that warns you when the option you just selected is quite expensive. See the GitHub repo for a visual example: https://github.com/xavi-developer/aws-pricing-helper

The extension is open source; right now it warns in two sections (EC2 & Certificate Manager).

Anyone willing to contribute with PRs or comments is welcome.



https://redd.it/1m75cty
@r_devops
pERSONAL cREDENTIALS AND ideS

Hey all,

I am new-ish to DevOps and currently learning the ins and outs. I am working on learning Azure DevOps and integrating VS Code into managing code within that environment, and I have some vision of what I want to accomplish in the short term.

I have accumulated a library of PowerShell scripts that I use day to day for various things (managing Intune, generating reports, etc.), and I'd like to extend them to the wider group. A lot of these scripts call REST APIs that require OAuth 2.0 authentication, and the tokens they rely on are personalized to the individual. Obviously, I don't want to store my own credentials/tokens in the scripts in DevOps.

So: what is the strategy for using personal credentials in code? Is there a local mechanism people use for personal credentials that can be integrated into scripts and other code? It feels pretty ham-fisted to require people to manually store things like personal refresh tokens in a personal key vault, then routinely pull a script, go to the vault, copy the token to the clipboard, and paste it into the script. Is this what people normally do?

Ultimately, the final destination for work like this is maybe some kind of Azure Function with a Managed Identity or some other secure credential authentication mechanism, but I am not quite there yet.
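One interim pattern that avoids the manual vault-to-clipboard loop described above is the PowerShell SecretManagement module: each person registers a local vault once, and shared scripts call Get-Secret at runtime, so no personal token ever lives in the repo. A sketch; the vault and secret names are made up:

```powershell
# One-time per machine: install and register a local secret store
Install-Module Microsoft.PowerShell.SecretManagement, Microsoft.PowerShell.SecretStore -Scope CurrentUser
Register-SecretVault -Name LocalStore -ModuleName Microsoft.PowerShell.SecretStore -DefaultVault

# Each person stores their own token once, interactively
Set-Secret -Name IntuneRefreshToken -Secret (Read-Host -AsSecureString 'Paste refresh token')

# Shared scripts then fetch the caller's token at runtime instead of hard-coding it
$token = Get-Secret -Name IntuneRefreshToken -AsPlainText
```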

Edit: The awkward moment when you notice your caps lock was on when typing the subject title...

https://redd.it/1m76j7u
@r_devops
Scratching my head trying to differentiate between Canary release vs blue green deployment

Hello, I am new to learning the shenanigans of Quality assurance, and this one in particular is making me go crazy.

First, let me share what I initially thought: canary testing had two methods, one being incremental deployment and the other blue-green deployment. In the first, you use the software's existing infrastructure and drop experimental updates on a selected subset of users (the canaries). In the second, you create a parallel environment that mimics the original setup and send some selected users to this new experimental version via a load balancer; if everything turns out fine, you start sending all of your users to the new version while the original one gets scrapped.

Additionally, the first one was used for non-web-based software like mobile apps, while the second one was used for web-based services like a payment gateway, for example.

But the more I read, the more I see that canary testing also happens in a parallel environment that closely resembles the original one. If that's the case, how is it any different from blue-green? Or is it just a terminology issue, and can blue-green totally be part of canary testing? I am so confused.
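A distinction that may help: blue-green is about maintaining two full environments with an instant, all-at-once cutover, while canary is about gradually shifting a fraction of traffic; a canary rollout is often run on top of blue-green-style parallel infrastructure, which is why the definitions blur. A toy sketch of just the routing difference (plain shell, not any real load balancer's API):

```shell
# Canary: each request has a p% chance of landing on the new version.
route_canary() {  # $1 = percent of traffic for the new version (0-100)
  awk -v p="$1" 'BEGIN { srand(); if (rand() * 100 < p) print "new"; else print "old" }'
}

# Blue-green: ALL traffic goes to whichever environment is active;
# a release is an instant flip of the active environment.
route_blue_green() {  # $1 = active environment, e.g. "blue" or "green"
  echo "$1"
}

route_blue_green green   # always prints: green
route_canary 10          # prints "new" roughly 10% of the time, else "old"
```

With weight 0 the canary router behaves like the old environment, and with weight 100 it behaves like a completed blue-green cutover; everything in between is the gradual rollout that blue-green alone doesn't give you.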

I would be hella grateful if someone helped me understand this.

https://redd.it/1m777h8
@r_devops
do these principles line up with how your team handles on-call?

Engineers at X walk through 6 practical principles they used to seriously reduce on-call fatigue and alert volume. Think process over product: things like clear ownership, dependency hygiene, and proactive maintenance.

Link to article: [https://leaddev.com/technical-direction/on-call-firefighting-future-proofing](https://leaddev.com/technical-direction/on-call-firefighting-future-proofing)

Some takeaways that resonated:

* "No ownership = no accountability = endless alerts"
* On-call quality > just reducing alert count
* Fixing broken dependencies before they break you

https://redd.it/1m794xw
@r_devops
Can someone explain the difference between Elasticsearch ERUs and Splunk Cloud? Can they be used for central logging and central observability?

Same as above: looking to buy one of the two, but I have nobody to explain the difference.

https://redd.it/1m794bu
@r_devops