Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
How I Automated My Infrastructure with Terraform

Hello everyone!
I wanted to share one of my more... questionable engineering decisions: I Terraformed my entire home network.

I've been managing my Mikrotik setup (router + switches + wireless) with Terraform for about a year now. Everything from VLANs to firewall rules is defined as code and version controlled.

All of the code is available here: https://github.com/mirceanton/mikrotik-terraform/

Why Terraform for networking?
Honestly, because it's the tool I know. When I found out the RouterOS provider existed, I just had to try it. Probably not the most practical approach, but it's been a great learning experience!

The state management situation is... creative. Can't exactly use S3 when you might accidentally terraform your own internet connection away! I ended up going with local state + SOPS encryption + Git. Works, I guess, but it's definitely not textbook.

Oh, and the amount of terraform state mv commands I've run during refactoring... SO many. I can't just destroy and recreate resources because they are, quite literally, my internet connection. I don't think I've ever had to do this much state surgery... even at work.
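
For anyone facing the same kind of refactor, here is a minimal sketch of scripting those moves instead of typing them one by one. It only builds `terraform state mv SOURCE DESTINATION` command strings from a rename map; the helper name and the resource addresses are hypothetical and not taken from the linked repository.

```python
import shlex


def state_mv_commands(renames):
    """Build one `terraform state mv` command per (old, new) address pair."""
    commands = []
    for old, new in renames.items():
        if old == new:
            continue  # nothing to move
        # shlex.quote protects addresses with brackets or quotes, e.g. foo.bar["x"]
        commands.append(
            "terraform state mv " + shlex.quote(old) + " " + shlex.quote(new)
        )
    return commands


# Hypothetical addresses, for illustration only.
renames = {
    "routeros_interface_vlan.guest": "module.vlans.routeros_interface_vlan.guest",
}
for command in state_mv_commands(renames):
    print(command)
```

Printing the commands rather than running them leaves room to review each move before it touches the state file.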

The whole thing taught me a lot about both Terraform and networking. Sometimes picking an overly complicated approach is the best way to learn!

Made a video about it too, if you're interested, where I go into my setup as well, not just the code: https://youtu.be/86LRoxuU5kg

Anyone else using Terraform in non-conventional ways? Would love to hear about other creative use cases or approaches!

https://redd.it/1kv99c6
@r_devops
Learn by doing

I'm looking to team up with some like-minded individuals who have a basic grasp of various tools and are ready to jump into some exciting projects! I've got a few cool ideas we could start working on together.

If you're interested in collaborating and bringing some of these ideas to life, let's create a Discord server and get started

https://redd.it/1kvdbhj
@r_devops
Hiring Managers

1) What are some of the skills with the most demand right now and will stay in demand for the next 30 or so years?

2) How is the job market right now for Cloud/DevOps and SRE roles?

https://redd.it/1kvesqr
@r_devops
Bare metal K8s Cluster Inherited


We inherited an infrastructure consisting of 5 physical servers that make up a k8s cluster: one master and four worker nodes. Workloads are also allowed to run on the master itself.

It is an ancient installation and the physical servers have either RAID-0 or single disk. They used OpenEBS Hostpath for persistent volumes for all the products.

Now, this is a development cluster but it contains important data. We have several small issues to fix, like:

- Migrate the PV to a distributed storage like NFS

- Make backups of relevant data

- Reinstall the servers and have proper RAID-1 ( at least )

We do not have many resources. We do not have ( for now ) a spare server.

We do have a NFS server. We can use that.

What are good options to implement to mitigate the problems we have? Our goal is to reinstall the servers using proper RAID-1 and migrate some PV to NFS so the data is not lost if we lose one node.

I listed some action points:

- Use the NFS, perform backups using Velero

- Migrate the PVs to the NFS storage


At least we would have backups and some safety.

But how could we start with the servers that do not have RAID-1? The very master itself is single disk. How could we reinstall it and bring it back to the cluster?

The ideal would be to reinstall server by server until all of them have RAID-1 ( or RAID-6 ). But how could we start? We have only one master, and the PVs are attached to the nodes themselves.

Would be nice to convert this setup to proxmox or some virtualization system. But I think this is a second step.

Thanks!

https://redd.it/1kvdnb3
@r_devops
Scaling Postgres with Kubernetes: a guide on partitioning, sharding and replication

I have written a guide on setting up a high-availability Postgres cluster with sharding, replication and partitioning. Hope you find this helpful. 🐘



https://blog.sagyamthapa.com.np/scaling-postgresql-with-kubernetes

https://redd.it/1kvdc66
@r_devops
Developer to Devops resume review

I'm a backend developer with over 2.5 years of experience, and I’m looking to transition into a DevOps role. In my resume, the Developer and DevOps roles are listed under the same company. I’ve been involved in DevOps tasks for the past year, but there wasn’t much to learn beyond the tools I’ve already mentioned. That’s why I worked on personal projects to gain a deeper understanding.

Most of the DevOps skills I’ve acquired have been through these personal projects.

I’ve currently separated the Developer and DevOps roles into two parts on my resume, as I wasn’t sure how to present the experience correctly.

I would appreciate your guidance while keeping these points in mind. I’m open to omitting anything unnecessary and willing to add whatever is needed.

My resume is below; kindly review:
https://i.postimg.cc/4x1BFCXw/IMG-20250523-225607.jpg

https://redd.it/1kviy4n
@r_devops
Cheaper Datadog alternative for APM?

Our Datadog bill is starting to get eye-watering for web APM purposes. We use Datadog for web APM because we need insight into site code for a couple of Python and Node.js services, and, well, they were the safe choice. But our data volume has gone up quite a bit over the past 4 months, so I'm now tasked with evaluating other options.

We already use Elastic for an internal service and we're happy with it, so that could be an option for logging. I'm open to ideas: Honeycomb, Sentry, Sumo Logic, Splunk, New Relic, Dynatrace, Grafana, Groundcover, whatever works. Cloud metrics are cool, but that's not what we use DD for, so if it can't do traces it's automatically a non-starter. Preferably no deep dev integration (no code changes at all would be great); we just don't have the resources and have other fires to fight. I'm also open to a database APM feature that works well with PostgreSQL workloads and can tie web APM traces to DB traces.

Advice / input appreciated.

https://redd.it/1kvlssd
@r_devops
How I Blocked 95% of Web Attacks Using AWS WAF Blog


I recently wrote a blog post about securing web apps using AWS WAF, and how you can block up to 95% of common attacks (like SQL injection, XSS, bot traffic, and even basic DDoS) with just a few clicks in the AWS Console.

If you’re on AWS and haven’t tried WAF yet (or find it intimidating), this guide breaks it down step by step:

https://blog.prateekjain.dev/how-to-block-up-to-95-of-attacks-using-aws-waf-e2223efc1f55?sk=cc74156befaab48297655a00f352f4e6

https://redd.it/1kvm4gp
@r_devops
Best books/courses to transition from Developer to DevOps

Hello everyone,
I am a full-stack developer with 4 years of experience. I use Angular/TypeScript for the frontend and Spring Boot/Java for the backend.

I also have basic knowledge of Docker and Jenkins (using the pipeline and writing basic templates). I also have the Kubernetes Developer Certification, some knowledge of cloud (basic AWS services, plus Azure Fundamentals), and some Linux basics.

I would like to transition from developer to DevOps, but I am a bit lost about which path to follow. So I would like some recommendations for a couple of books or courses to help me make the transition.



PS: I know it depends, and maybe a bit subjective but any guide would help me understand.

Thank you!


https://redd.it/1kvoyoz
@r_devops
Build an incident response workflow with n8n + Prometheus

Hey guys,

I’m working on a monitoring setup that automates basic incident resolutions.

This is the visualization of the flow:

https://drive.google.com/file/d/1HiobPj50VZp1VylyqLTXLAeqDoJtrG_x/view

I’m using Prometheus - Grafana for monitoring, Alertmanager to send alerts, and n8n to orchestrate a workflow, then an AWS Lambda function to restart the services. “Restart services” is a kind of demo action, you can customize it for your needs.

How does it work?

Prometheus: I configure some basic rules to alert when CPU/memory exceeds a threshold. When a threshold is exceeded, Alertmanager sends a webhook to n8n.
n8n flow: Gets the alert information, analyzes the metrics, checks business hours or incident duration, and sends alerts to Discord or escalates to PagerDuty.
AI agent (in n8n): I define a prompt that checks the input. The agent considers the metrics and the current context to decide whether to restart the services.
Lambda function: Receives commands from the AI agent and acts on them if necessary. Currently, I allow it to restart an EC2 instance to make the service available again when the system is overloaded.
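
To make the branching in the steps above concrete, here is a rough sketch of the decision written as plain rules. It deliberately swaps the AI agent for fixed thresholds, and every name and number in it (`Alert`, `decide`, the 90% limits, the 15-minute escalation window, the business-hours definition) is an assumption for illustration, not something taken from the shared workflow.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Alert:
    service: str
    cpu_percent: float
    memory_percent: float
    started_at: datetime


# Illustrative assumptions, not values from the workflow.
CPU_LIMIT = 90.0
MEMORY_LIMIT = 90.0
ESCALATE_AFTER_MINUTES = 15


def is_business_hours(now):
    """Monday to Friday, 09:00 to 17:59 local time (assumed definition)."""
    return now.weekday() < 5 and 9 <= now.hour < 18


def decide(alert, now):
    """Return 'ignore', 'restart', 'notify', or 'escalate'."""
    overloaded = (
        alert.cpu_percent >= CPU_LIMIT or alert.memory_percent >= MEMORY_LIMIT
    )
    if not overloaded:
        return "ignore"
    minutes = (now - alert.started_at).total_seconds() / 60
    if minutes >= ESCALATE_AFTER_MINUTES:
        return "escalate"  # long-running incident: page a human
    if is_business_hours(now):
        return "notify"  # people are around: post to the chat channel
    return "restart"  # off hours and short-lived: try the automated fix


alert = Alert("web", 95.0, 40.0, datetime(2025, 5, 24, 2, 0))
print(decide(alert, datetime(2025, 5, 24, 2, 5)))
```

A rule table like this can also serve as a fallback when the agent's answer is missing or malformed.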

I hope this helps you set up an automated stack in your team. I’ve shared the example materials in these repositories:

One-click setup for Prometheus - Alertmanager - Grafana: https://github.com/Bubobot-Team/monitoring-stack/tree/main/stacks/prometheus-stack

N8n workflow in JSON format (just copy into your n8n dashboard): https://github.com/Bubobot-Team/automation-workflow-monitoring

Btw, just wondering, what recovery actions would you automate? (e.g., disk cleanup, rollback deployments). I would like to hear your feedback to improve the current flow.

https://redd.it/1kvqdph
@r_devops
Is a container an instance of an image, like an object is an instance of a class in code?

class Dog {
    String name;
    int age;

    Dog(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// Creating multiple instances with different values
Dog dog1 = new Dog("James", 3);
Dog dog2 = new Dog("Bella", 5);

Docker

docker run -d --name app1 -e NAME=James -e AGE=3 mydogimage
docker run -d --name app2 -e NAME=Bella -e AGE=5 mydogimage



Is this correct, or am I misunderstanding?
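
One way to test the analogy is to model it directly. The sketch below is a toy model only: `Image`, `Container`, and `run` are hypothetical names, not part of any Docker API. It treats the image as a read-only template and each container as an instance with its own name, environment, and writable state.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Image:
    """Read-only template: the same image backs every container."""
    name: str


@dataclass
class Container:
    """Running instance: its own name, environment, and writable state."""
    name: str
    image: Image
    env: dict
    files: dict = field(default_factory=dict)


def run(image, name, **env):
    """Loosely mirrors `docker run --name NAME -e KEY=VALUE IMAGE`."""
    return Container(name=name, image=image, env=dict(env))


image = Image("mydogimage")
app1 = run(image, "app1", NAME="James", AGE="3")
app2 = run(image, "app2", NAME="Bella", AGE="5")

app1.files["/tmp/note"] = "only in app1"  # writes stay in this container
print(app1.image is app2.image)  # True: both share one image
print(app2.files)  # {}: app2 is unaffected
```

In this model the analogy holds in the way the question describes: one template, many independent instances configured at creation time.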

https://redd.it/1kvvp25
@r_devops
Atlassian Bamboo

Any devops who are still using this?

I’m 3 months into my promotion as devops engineer and have been given the keys to the bamboo kingdom.

It’s legacy and deprecated I believe. Also, with it being on premise it’s not the easiest to lab.

Interested in finding out who still uses this and how they find it?

I’m currently implementing a Snyk integration for our code.

Thanks and have a wonderful day!

https://redd.it/1kvx0mg
@r_devops
Migration from GCP to OCI instances

I have 10+ servers on GCP that I want to migrate to OCI. Some are production instances with live traffic and some are dev/testing servers. What is the best approach to migrate them along with all their data? Is it possible to transfer snapshots?
The GCP instances run CentOS, while the OCI instances will run Oracle Linux images.
Any leads would be helpful.

https://redd.it/1kvy85p
@r_devops
Questions about the LFS258 Kubernetes Course – Worth It for CKA Prep?

Hi everyone,

I'm looking into taking the **LFS258 - Kubernetes Fundamentals** course from the Linux Foundation, and I have a few questions for those who have taken it:

* Is the course mostly pre-recorded video lectures?
* Does it include hands-on labs and troubleshooting practice?
* Is it beginner-friendly for someone with **no prior Kubernetes experience**?
* Is it enough on its own to prepare for the **CKA (Certified Kubernetes Administrator)** exam?
* Would you recommend buying **just the course**, or going for the **bundle with the exam voucher**?
* Are there any known **discount codes or promotions** for this course?
* Lastly, would you say this course is a good choice for someone coming from a **Cloud Engineering background** and looking to transition into **DevOps**?

Appreciate any insights or advice you can share – thank you!

https://redd.it/1kw1ner
@r_devops
I ruined a POC

I've been in DevOps for 4.5 years. I started as a Linux administrator and now manage cloud, DB, and container orchestration. My manager asked me to do a POC on Traefik, which is a reverse proxy just like nginx. It went well and I explored the features, but I was unable to get the fail2ban plugin working in it. When I was presenting this to my manager, I forgot basic Docker Compose syntax, and now I think my role is in jeopardy. Has anyone else faced this? Motivate me please, I'm scared.

https://redd.it/1kw0o9g
@r_devops
What’s the best SSO solution for a mid-sized company of 50+ people in 2025?

Curious to hear what the DevOps community is seeing work best today.

For companies with ~50–200 employees, minimal internal IT, and tools like GitHub, Gmail, Vault, AWS, and Graylog, what are your go-to SSO solutions?

Looking for feedback on:

Ease of integration (SAML/OIDC)
Multi-IDP support
Support for SCIM provisioning
Transparent, scalable pricing (no bloated enterprise overhead)
Good developer experience

Here’s a list I often see in conversations:

Azure AD (Entra ID)
[Keycloak](https://www.keycloak.org/)
Authentik
[WorkOS](https://workos.com/)
SSOJet

Would love to hear your experience with any of these or other favorites — especially across multi-tenant or external user auth use cases.

https://redd.it/1kw0uvh
@r_devops
Docker image works fine locally but not on GCP

Hi everyone,

I’m running a Docker image with an old Ruby version on Debian. It works locally with Docker Compose, but fails with “Service Unavailable” on GCP Cloud Run. The issue seems to be incompatibility with the latest Ubuntu version used in the infra.

I can’t upgrade Ruby due to legacy constraints—we’re rewriting it in another language. Any suggestions for getting this to run on Cloud Run as-is?

Thanks!



https://redd.it/1kw0fpi
@r_devops
Multi-stage release pipeline, how to require one approval from each of two separate groups?

Hi all, I am trying to implement a release pipeline in Azure DevOps using YAML.

I have a requirement where two groups need to manually approve a release. At least one person per group must approve. So I deploy to an environment like `staging` or `prod`, but before deployment I want a manual approval gate where at least one person from `group a` and at least one person from `group b` need to manually approve.

I want to avoid using the Classic Release UI as I want the whole process to be code-defined in yaml.

I have tried looking at the YAML definition, but I did not get very far. To be honest, if I could version control groups here, that would be a really nice feature. Using ManualValidation@0 in YAML sounded interesting, but as far as I can tell anyone can approve and there is no concept of groups, so it is out of the question.

I have tried looking into `environments` with approval checks but Azure DevOps only supports assigning a single group to an environment’s approval gate. That doesn't seem to allow me to enforce the "one per group" logic.

I came across the idea of using two environments per stage eg `staging-group-a` and `staging-group-b`. I was also thinking to have two representatives for the workflow and let them defer approval if necessary. Both options sound clunky and I think I prefer the latter one the most.
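
Whichever mechanism ends up collecting the approvals, the rule itself is small. Here is a sketch of the "at least one approver per group" check, assuming the list of approvers and the group memberships are available as plain data; the function names and user names are hypothetical, and this is not an Azure DevOps API.

```python
def missing_groups(approvers, groups):
    """Return the names of groups that have no approver yet, sorted."""
    approved = set(approvers)
    return sorted(
        name for name, members in groups.items() if not approved & set(members)
    )


def release_approved(approvers, groups):
    """True only when every group has at least one approver."""
    return not missing_groups(approvers, groups)


# Hypothetical membership data, for illustration only.
groups = {"group a": {"alice", "bob"}, "group b": {"carol", "dave"}}

print(release_approved(["alice", "bob"], groups))  # False
print(missing_groups(["alice", "bob"], groups))  # ['group b']
print(release_approved(["bob", "carol"], groups))  # True
```

Note that in this sketch a person who belongs to both groups satisfies both on their own; if that is not acceptable, the check would also need to require distinct approvers.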

Is there a simple way to solve this problem? It feels more complicated than it has to be.

https://redd.it/1kw5khg
@r_devops
AWS Native macOS App

I'm a huge infrastructure dev and love working in AWS. But I absolutely hate the UI, and I think it turns a lot of people off by making it seem too complicated.

I'm curious what folks think about a UI on top of AWS. I've been working on a project in the background and am curious whether others feel similarly or it's just me. Not sure of the best way to share pics.

I love native apps, so building it as a macOS app to start.

Edit: posted an Imgur link in the comments

https://redd.it/1kw7sk4
@r_devops
Can someone please show me a better way to find related resources in Kubernetes?

I know this problem is solved. I just don't want to go on Google and try a handful of specific tools; I want to find one good tool that:


Allows me to link my deployments to github repositories and show me what services are connected to other services or resources (eg databases)

I want to know the tables of the database and the data models and contracts so I can focus on my features/testing rather than going through loads of microservice repositories

https://redd.it/1kw7s9o
@r_devops
Enterprise application requirements management?

Hi all,

My team manage over 100 applications and requirements management hasn't been a strong suit in the past.

What business-facing processes and systems would be considered best practice to manage current and in-development functional and non-functional requirements/stories in an Enterprise?

We can maintain product backlogs in an SDLC process, but for large initiatives/projects, we have PMs that often create new Azure DevOps or Jira projects and end up with a decentralised list of requirements to link test cases to.

I want transparency and collaboration with the various product owners in our organisation to help maintain a central list of requirements that we can establish test cases against and refer to it when needed for root-cause analysis and change management.



https://redd.it/1kwa545
@r_devops