Reddit DevOps
Thanks @reddit2telegram and @r_channels
Self-hosting message brokers

Anyone have field experience, war stories, or common oversights to share from hosting your own message brokers like RabbitMQ?

Looking at the documentation makes it seem like you'd never want to self-host it: it's quite complex and seems to require deep expertise throughout the OSI stack as well. However, it seems to be a commonly self-hosted solution, so maybe I am overthinking it?

https://redd.it/1mljd7d
@r_devops
What's One 'Standard' DevOps Practice That's Actually Rare in Production?

We all talk about GitOps, immutable infrastructure, and zero-downtime deployments - but what's something that's considered 'standard' that you rarely see implemented properly in real production environments?

https://redd.it/1mlke5t
@r_devops
More specialist platform/cloud eng role or generalist DevOps

Hello everyone,

I'm currently weighing two offers and could use some advice. Context: I have 3 YoE as a DevOps engineer at a bank, and I'm currently lacking hands-on experience in some areas like observability and k8s.

1. Cloud/Platform Role: This is a more specialized role at a large corp, focused on fewer areas of DevOps. The pay is decent, the culture seems good, and it's hybrid, which is ideal since my health isn't great.
2. Generalist Role: Much smaller company, with only a few DevOps people. This position offers hands-on experience with the entire stack. It comes with a 20-30% higher salary than the first option, but it's fully on-site with a culture that is notoriously micromanage-y.

I'm torn between career growth and better work-life balance. I know that many markets require hands-on knowledge of the entire DevOps stack, which makes the second option appealing for long-term career growth. However, the first offer's remote work and positive culture are also very important to me, and its brand name is better.

https://redd.it/1mllypa
@r_devops
Help with architecture design

Hello everyone!

We are undergoing some changes at work, and I wanted a diagram to get a better overview of what we want and to impress them by taking the initiative.

Let me first explain what we currently have:

1. Windows server - App 1 - Production frontend + backend
2. Database server - App 1 - Production
3. Windows server - App 1 - Staging frontend + backend
4. Database server - App 1 - Staging
5. Windows server - App 2 - Production frontend + backend + database
6. Windows server - App 2 - Staging frontend + backend + database
7. Linux server with Plesk

This is by no means a perfect setup, but it has served us fine for years.


We are now changing our server provider, and while doing that I figured it's time to take a look at the architecture and how everything can be improved. We are considering two machines that we can run VMs on, and we are contemplating mirroring everything on machine one to machine two so we have redundancy if something were to happen.

With the new solution for App 2, I would like to be able to "flip a switch" to swap staging and production without any downtime. I would also like the option of spinning up a new server for a branch, either automatically or manually.

When it comes to the diagram, App 1 can be ignored. It will run in four VMs, much like it does today.

But App 2, which is under development, can be moved off IIS and onto Docker or k3s.
The database is Microsoft SQL Server, so that must stay in a VM.

I don't really have any experience with this kind of stuff, but after doing some research and consulting with ChatGPT, this is what I have made so far. https://imgur.com/a/0xyABEi

This diagram is mostly for App 2, as App 1 and Plesk can live on their own VMs, I think.

I would love it if anyone has tips on the architecture and how it can be improved.
I feel like a lot is missing from the diagram, but I don't really know what else to add.

I am also contemplating whether everything should be as it is in the diagram, with everything on the same k3s cluster, or whether I should have two environments: one for production and staging, the other for development.

All of the software choices have been made with input from ChatGPT, because I don't have experience with any of them, except of course GitHub.
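For the "flip a switch" swap between staging and production on k3s, one common pattern is blue/green via a Service selector: both stacks stay running, and changing a single label decides which one receives production traffic. A hypothetical sketch (all names invented):

```yaml
# Hypothetical sketch: the production Service routes to whichever
# App 2 deployment carries track: blue. Flipping the selector to
# track: green swaps staging <-> production with no downtime.
apiVersion: v1
kind: Service
metadata:
  name: app2-production
spec:
  selector:
    app: app2
    track: blue   # change to "green" to flip environments
  ports:
    - port: 80
      targetPort: 8080
```

The flip itself can then be a one-liner like `kubectl patch service app2-production -p '{"spec":{"selector":{"app":"app2","track":"green"}}}'`, which takes effect as soon as the endpoints update.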

https://redd.it/1mlpnt8
@r_devops
Pull Req strategy for deploying new services?

A common task on my team is deploying new services, which we call "dataflows". A developer will open a ticket asking us to enable dataflow to a backend application. Our process right now is to create a repo for the application from a GH template containing orchestrated Terraform that deploys the necessary infra across our CDN, layer 3 firewall, and cloud platform, then use tfvars for each environment and deploy via GH Actions. Pretty simple and efficient compared to how everything used to be done manually.

Next step: I want to empower devs to make these requests with just a pull request, needing only manual approval from my team, but I think the new-repo process adds a lot of complication and is maybe unnecessary. I had this thought: put all dataflows in a SINGLE repo, with an individual folder per dataflow containing its tfvars; a GH Actions workflow that runs on push executes TF Plan on the newly added or updated tfvars, and another workflow that runs on approved PR executes TF Apply. The devs can independently stand up a feature branch with a folder for their new dataflow's tfvars. (State files are all separate, btw, per application environment - I'm not worried about overlap within the same repo; we have state management under control.)

I found some basic GH Actions steps that check for differences between the current commit and the previous commit to grab the added/modified tfvars, but I haven't had much time to experiment yet. Before I go much further, I figured I'd ask: has anyone done something like this? Any feedback? Are there better methods than the one I described (comparing differences between commits)?
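The commit-comparison step can be prototyped locally before wiring it into Actions. A minimal sketch (the `dataflows/` layout and file names are made up for the demo; it builds a throwaway repo so the diff has something to compare):

```shell
# Sketch: detect tfvars files changed in the latest commit -- the same
# HEAD^..HEAD comparison a GH Actions step would run on push.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
mkdir -p dataflows/app-a dataflows/app-b
echo 'env = "dev"' > dataflows/app-a/dev.tfvars
git add -A && git commit -qm "add app-a dataflow"
echo 'env = "dev"' > dataflows/app-b/dev.tfvars
git add -A && git commit -qm "add app-b dataflow"
# The core of the workflow step: which tfvars changed in this commit?
changed=$(git diff --name-only HEAD^ HEAD -- 'dataflows/*/*.tfvars')
for f in $changed; do
  echo "plan needed in: $(dirname "$f")"   # prints: dataflows/app-b
done
```

In the real workflow the loop body would run `terraform plan` in each changed folder; the diff-against-previous-commit approach does miss multi-commit pushes, so diffing against the PR base ref is often the safer variant.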

https://redd.it/1mlr7x8
@r_devops
Need career advice

Hey folks,

I’ve got about 3 years in DevOps, mostly on the Salesforce side — managing releases, setting up CI/CD for Salesforce metadata deployments with Copado, Gearset, and GitHub Actions/YAML, doing Bash/Linux scripting and a bit of Python, and using Salesforce admin/CLI tools.

I’ve never worked on cloud infrastructure (AWS/GCP/Azure) or tools like Docker, Kubernetes, Terraform, or Prometheus, but I really want to move into a broader DevOps role that’s more infra/cloud-focused.

Has anyone here made that switch or can offer guidance? I find Salesforce DevOps pretty boring and want to work on actual infrastructure.

How should I approach this? Should I mention my Salesforce experience in interviews? Do companies consider platform-focused DevOps folks for cloud roles?

Thanks

https://redd.it/1mluflb
@r_devops
Beginner in DevOps – Can I land an internship yet?

TECHNICAL SKILLS

• Cloud Platforms: Amazon Web Services (AWS)-EC2, S3, VPC, NACL, NAT Gateway, Security Groups

• Infrastructure as Code (IaC): Terraform

• Configuration Management: Ansible

• Containerization: Docker, Dockerfile

• Operating Systems: Linux (Ubuntu), Windows

• Programming/Scripting Languages: JavaScript, Python, Bash Scripting

• Version Control & Tools: Git, GitHub

PROJECTS

1. Automated Multi-Stage Node.js Application Deployment on AWS EC2
• Designed and implemented a multi-stage Dockerfile to efficiently build and containerize a Node.js application, reducing the final image size for faster deployment.
• Deployed the Dockerized application to an AWS EC2 instance, managing the instance's configuration.
• Configured an AWS Security Group to control inbound traffic, ensuring the application was accessible on a specific port while maintaining a secure environment.

2. Static Website Hosting and Management on AWS S3
• Hosted a static website on an AWS S3 bucket, configured the bucket for public web hosting.
• Managed bucket policies and access control lists to allow public read access, demonstrating an understanding of AWS permissions and security.

3. Cloud Infrastructure Provisioning with Terraform
• Utilized Terraform to provision and manage a Virtual Private Cloud (VPC) on AWS, including subnets and route tables.
• Used Infrastructure as Code (IaC) approach to define the infrastructure, enabling repeatable and consistent deployments.
• Managed the state of the infrastructure and applied changes to the environment in a controlled manner.

4. Automated Dependency Management with Ansible
• Developed Ansible playbooks to automate the installation of dependencies on an EC2 instance.
• Demonstrated the ability to configure servers consistently and reduce manual setup time.
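For reference, the multi-stage build described in project 1 typically looks something like this minimal sketch (image tags, paths, and the `dist/` output directory are illustrative assumptions, not taken from the resume):

```dockerfile
# Build stage: full toolchain to install deps and compile the app.
FROM node:20 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: only production deps and build output, so the
# final image is much smaller than the build stage.
FROM node:20-slim
WORKDIR /app
COPY --from=build /app/package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/index.js"]
```

The size reduction comes from the second `FROM`: nothing from the build stage survives except what is explicitly copied across.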

https://redd.it/1mlv6vy
@r_devops
How are you handling IPv4 vs IPv6 in your setups?

I’m tweaking our network setup and got stuck thinking about IPv4 vs IPv6 for our cloud infra. I found this IPv4 vs IPv6 breakdown that explains the address space limits and security differences, which got me wondering how folks here are managing the transition. Are you sticking with IPv4 with NAT, going full IPv6, or running dual-stack? What’s been the biggest pain point for you with either protocol in production?

https://redd.it/1mlx1wh
@r_devops
How would you design multi-cluster EKS job triggers at scale?

I’m building a central dashboard (in its own EKS cluster) that needs to trigger long-lived Kubernetes Jobs in multiple target EKS clusters — one per env (**dev**, **qa**, **uat**, **prod**).

The flow is simple: dashboard sends a request + parameters → target cluster runs a job (`db-migrate`, `data-sync`, `report-gen`, etc.) → job finishes → dashboard gets status/logs.

Current setup:

* Target clusters have **public API endpoints** locked down via strict IP allowlists.
* Dashboard only needs **create Job + read status** perms in a namespace (no cluster-admin).
* All triggers should be **auditable** (who ran it, when, what params).

I’m *okay* with sticking to public endpoints + IP restrictions for now but I’m wondering: is this actually **scalable and secure** once you go beyond a handful of clusters?

How would *you* solve this problem and design it for scale?

* Networking
* Secure parameter passing
* RBAC + auditability
* Operational overhead for 4–10+ clusters

If you’ve done something like this, I’d love to hear
Links, diagrams, blog posts — all appreciated.


**TL;DR:** Need to trigger parameterised Jobs across multiple private EKS clusters from one dashboard. Public endpoints with IP allowlists are fine for now, but I’m looking for scalable, secure, auditable designs from folks who’ve solved this before. Ideas/resources welcome.
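The "create Job + read status" permission set described above maps cleanly onto a namespaced Role; a sketch, with the namespace, Role name, and log access as assumptions:

```yaml
# Minimal namespaced Role for the dashboard's identity in each target
# cluster: create Jobs and read their status -- nothing cluster-wide.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dashboard-job-runner
  namespace: dataflow-jobs
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]   # only needed if the dashboard fetches logs
```

Bound via a RoleBinding per cluster, this also helps auditability: Kubernetes audit logs will attribute every Job creation to that identity, so "who ran it, when, with what params" falls out of the audit trail plus the Job spec itself.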

https://redd.it/1mlu559
@r_devops
AI LLM for troubleshooting

Which one do you prefer to use, if any — Claude, ChatGPT, etc. — specifically for vibe coding or troubleshooting issues with things like Terraform, Ansible, Dockerfiles, GitHub Actions, etc.?

https://redd.it/1mm49ab
@r_devops
How realistic is it to land a full-time Cloud/DevOps role in Toronto as a 4th year CS student?

I’m in my 4th year of a Computer Science degree specializing in Cloud Computing, and I’m considering switching to part-time studies if I can secure a full-time role. I have my AWS CCP.

Right now, I’m working on two portfolio projects:
• a cloud-native environmental dashboard (AWS, Terraform, CI/CD)
• a real-time cloud monitoring dashboard

I’m also studying for my AWS Solutions Architect – Associate and Terraform Associate certs.

How realistic is it to land a junior Cloud/DevOps position in the Toronto market right now? And is it worth trying before graduation?

https://redd.it/1mm8b5i
@r_devops
Transitioning from Accounting + Analytics into FinOps (AI + Cloud Cost Optimization) — Need Guidance

Hi all,

I’m exploring a career transition into FinOps / Cloud Financial Management and would love insights from professionals already in the field.

Background:

4+ years in accounting & finance (financial reporting, reconciliations, audits)

Completed MS in Business Analytics (STEM OPT eligible) in the U.S.

Tools: Power BI, SQL, Excel (Advanced), QuickBooks, INFOR FM

Experience building dashboards & data models for finance teams

Familiar with U.S. GAAP and Indian GAAP

Strong process improvement background (SOPs, RCA)

Goal:

Move into FinOps Analyst or Cloud Financial Analyst role with an AI + automation focus

Open to starting as Financial Analyst (Cloud/Tech) as an entry point

Questions:

How realistic is it to land a FinOps role without prior FinOps experience if I bring strong finance + BI skills?

Which technical skills or certs (FinOps Certified Practitioner, AWS, Azure) make the biggest difference in getting hired?

I’m seeing fewer FinOps job postings right now — is this normal? Will demand grow or shrink as AI adoption increases in cloud operations?

For those in the field — how much of your role is technical vs stakeholder communication?

Any recommended tools, datasets, or side projects I can start to show FinOps capability?

Thanks in advance for any advice or resources!

https://redd.it/1mm9xwi
@r_devops
Experiences of cloud/on-prem infrastructure security headaches

I’m collecting real-world stories about common cloud infra/security headaches for a project. Ever had a misconfig bite you? Curious what the #1 pain point is in your setup.

https://redd.it/1mm94va
@r_devops
Guide me please

I can build websites using the MERN stack, and I now want to start learning DevOps. I know I need to learn Linux and networking basics first. Could you suggest some sources, free or paid, where I can learn these skills? Also, is there anything else I should learn before starting DevOps?

https://redd.it/1mmbe7o
@r_devops
Just started as a System Engineer — aiming for DevOps, need guidance on next

Hey everyone, I recently got my first role as a System Engineer. My work right now involves system administration, Linux basics, and some networking. In the long run, I want to move into DevOps - I'm interested in automation, CI/CD pipelines, and cloud platforms - but I'm honestly a bit nervous about the coding side. I'm fine with basic scripting, but anything beyond that feels intimidating.

Here's what I currently know:

- Comfortable with Linux (process/user management, permissions, disk mgmt, still learning networking)
- Basic cloud concepts (AWS fundamentals)
- Some exposure to shell scripting
- Basic SQL & Excel (from previous work)

I'd love advice on:

- Which DevOps tools or platforms to learn first (e.g., Jenkins, Docker, Kubernetes, Terraform, etc.)
- How much coding I realistically need to be good at for DevOps
- Recommended learning order so I don't get overwhelmed
- Any tips for building a DevOps portfolio as a beginner
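On the "how much coding" worry: a lot of day-to-day DevOps scripting stays at roughly this level - loops, conditionals, and text-processing glue. An entirely made-up bash example (paths and threshold invented) that flags oversized files in a directory:

```shell
# Toy glue script: flag files over a size threshold -- the kind of
# basic bash that covers much of everyday DevOps automation.
threshold_kb=100
dir=$(mktemp -d)
# seed a 200 KB file so the demo has something to flag
dd if=/dev/zero of="$dir/big.log" bs=1024 count=200 2>/dev/null
flagged=0
while read -r size path; do
  if [ "$size" -gt "$threshold_kb" ]; then
    echo "over threshold: $path (${size} KB)"
    flagged=$((flagged + 1))
  fi
done < <(du -k "$dir"/*)   # du -k prints "size<TAB>path" per entry
echo "flagged: $flagged"
```

Anything much fancier than this usually graduates to Python, but comfort at this level plus one config language (YAML for Ansible/Kubernetes, HCL for Terraform) goes a long way.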

https://redd.it/1mmeemy
@r_devops
Beginner to AWS (Day 5): rate the level of this project (and suggest some good projects so I can land an internship/job). PS: I am currently in my last year of engineering

Built a production-ready AWS VPC architecture:

• Deployed EC2 instances in private subnets across two Availability Zones.

• Configured Application Load Balancer for incoming traffic distribution.

• Implemented Auto Scaling for elastic capacity.

• Enabled secure outbound internet access using dual NAT gateways for high availability.

• Ensured fault tolerance and resilience with multi-AZ design.
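The dual-NAT-per-AZ piece of a setup like this, as a Terraform sketch (resource names and the referenced subnets/route tables are hypothetical; it assumes matching `aws_subnet.public`, and `aws_route_table.private` resources defined elsewhere):

```hcl
# One NAT gateway per AZ, so a single-AZ failure doesn't cut off
# outbound internet access for the surviving private subnet.
resource "aws_eip" "nat" {
  count  = 2
  domain = "vpc"
}

resource "aws_nat_gateway" "nat" {
  count         = 2
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
}

resource "aws_route" "private_out" {
  count                  = 2
  route_table_id         = aws_route_table.private[count.index].id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.nat[count.index].id
}
```

The key detail interviewers tend to probe: each private route table must point at the NAT gateway in its *own* AZ, otherwise the "high availability" claim doesn't hold.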

https://redd.it/1mmkgt5
@r_devops
GDG Organizer: Done All Associate GCP Certs, Voucher in Hand - Which Professional Should I Pursue Next? (Seeking Advice)

https://preview.redd.it/owi4ldw028if1.png?width=414&format=png&auto=webp&s=b43e7c0183027f7456a4223ad6e5b6237ae4e5b6


Hey everyone!

As a GDG Organizer, I'm deep into cloud stuff. I've got a GCP cert voucher burning a hole in my pocket, and I'm ready to tackle a Professional cert after doing ACE and CDL.

My background is heavy in backend (Python/Go/FastAPI), DevOps (CI/CD, Docker, K8s, Terraform, Prometheus, Grafana), and I'm also into AI/ML (LLMs, GenAI).

Given that, which Professional cert would you recommend and why? (Architect, DevOps, ML Engineer, Developer, Data Engineer?)

Also, quick side note/vent: GCP support has been truly horrendous for me lately. Seriously the worst support system I've dealt with. Anyone else feel this pain?

Thanks for any tips!

https://redd.it/1mmnjqf
@r_devops
Looking for Suggestions on Solid, Interview-Worthy DevOps Projects

I’ve recently completed most of the core fundamentals in my DevOps learning journey, and now I’m looking for ideas for a solid, end-to-end project that can really stand out in interviews and demonstrate my skills.

Here’s what I’ve covered so far:

Networking fundamentals
Linux system administration and shell scripting
Docker & Kubernetes
Git and GitHub
Basic AWS (with 2–3 small practice projects)
Python scripting

I’ve also done some basic projects (like setting up my own Linux environment, simple CI/CD pipelines, etc.), but now I want to build something closer to real-world, infrastructure-related work that hiring managers would appreciate.

What would you suggest as an impressive, portfolio-worthy DevOps project that’s both challenging and relevant to industry use cases?

TL;DR: Finished core DevOps skills, looking for suggestions on a real-world, interview-ready DevOps project to showcase in my portfolio.

https://redd.it/1mmqhxa
@r_devops
How do you start planning and setting up a new project?

I’ve just been assigned to a project as a DevOps intern and I’m trying to understand how experienced engineers approach things from the start. When you join a project, how do you plan the architecture, choose the right resources, decide on integrations, set up CI/CD pipelines, and define the main components needed? I’m curious about the typical thought process and steps you follow so I can learn how to approach my own work better

https://redd.it/1mmtomn
@r_devops
Ops-to-DevOps reconversion via apprenticeship. I'm currently learning GitLab and Git while sending job applications. Any tips/hints/recommendations?

Here is my main roadmap until I find a job for my apprenticeship program. What do you think of it?

I built it by looking at job boards and my school's roadmap, prioritizing what is most in demand plus its prerequisites (for example: I see that Kubernetes is in high demand, but without Ansible, Terraform, and Docker it's a no-go). Plus, I wanted to put everything in Git to learn it by practicing.

I have a homelab based on an old gaming computer (with a good CPU and 32 GB of RAM... better than nothing to experiment with!)

Anyway, here is my actual roadmap (which lives in the README.md of my GitLab, my only file right now x) ).
Tomorrow I'll start Docker, so it'll be a good opportunity to start a new project and add new files, like the configuration; make it evolve with branches; get into the habit of keeping main clean and easy to read/maintain; work multi-branch; etc.

I don't know yet what I'll put on my servers, but probably things useful for my homelab plus something interesting to show during job interviews!

EDIT: I'd already read a lot and talked about the DevOps mindset :) so that part is already pretty well advanced.

## Roadmap

### Phase 1: Git Mastery

- [x] Create lab-infra project with clear README
- [x] First branches -> Merge Requests -> squash merge
- [x] Protect main (no direct push)
- [x] Install Git; set user.name and user.email; add SSH key
- [ ] Master Git conflict resolution and multi-branch workflow

### Phase 2: Docker Fundamentals

- [ ] Install Docker Desktop on Windows
- [ ] Build first containerized web service (Nginx)
- [ ] Master Docker best practices (.dockerignore, non-root user, HEALTHCHECK)
- [ ] Create multi-stage builds for optimized images

### Phase 3: Terraform Infrastructure

- [ ] Terraform basics: providers, resources, and state management
- [ ] Deploy VMs on Proxmox with a reusable module
- [ ] Output VM IPs and names for the automation chain
- [ ] Version control Terraform configurations in Git

### Phase 4: Ansible Configuration

- [ ] Ansible fundamentals: inventory, playbooks, and modules
- [ ] Build dynamic inventory from Terraform outputs
- [ ] Create roles for system hardening and Docker installation
- [ ] Achieve idempotence for reliable automation

### Phase 5: Kubernetes Orchestration

- [ ] Deploy k3s cluster on Proxmox VMs using Terraform and Ansible
- [ ] Master Kubernetes basics: pods, services, deployments
- [ ] Implement persistent volumes and ingress controllers
- [ ] Deploy applications with proper manifests

### Phase 6: Monitoring and Observability

- [ ] Deploy Prometheus and Grafana stack with automation
- [ ] Configure node_exporter on all VMs automatically
- [ ] Create custom dashboards and import professional templates
- [ ] Set up Alertmanager and test alert workflows
- [ ] Integrate Datadog for advanced monitoring capabilities

### Phase 7: CI/CD Integration

- [ ] Create GitLab CI/CD pipeline for Docker image builds
- [ ] Implement automated security scanning with Trivy
- [ ] Deploy to Kubernetes via GitOps approach
- [ ] Add infrastructure quality gates (tflint, checkov, ansible-lint)

### All Along: Professional Presentation and Habits

- [ ] Architecture documentation with diagrams
- [ ] Disaster recovery procedures documentation
- [ ] Clean public repository with no secrets
- [ ] Screenshots and setup instructions for portfolio
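One note on the Phase 4 idempotence goal: it mostly falls out of using declarative Ansible modules instead of raw shell commands. A tiny hypothetical playbook sketch (host group and package names are placeholders):

```yaml
# Idempotent by construction: the apt and service modules describe
# desired state, so re-running reports "ok" instead of making changes.
- hosts: homelab
  become: true
  tasks:
    - name: Ensure Docker prerequisites are present
      ansible.builtin.apt:
        name: [ca-certificates, curl]
        state: present
        update_cache: true

    - name: Ensure Docker service is enabled and running
      ansible.builtin.service:
        name: docker
        state: started
        enabled: true
```

Running the playbook twice and confirming the second run shows zero `changed` tasks is a quick self-test of that checkbox, and a nice thing to demo in an interview.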



https://redd.it/1mmulu9
@r_devops