Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Help with architecture design

Hello everyone!

We are undergoing some changes at work, and I wanted a diagram to get a better overview of what we want and to impress them by taking the initiative.

Let me first explain what we currently have:

1. Windows server - App 1 - Production frontend + backend
2. Database server - App 1 - Production
3. Windows server - App 1 - Staging frontend + backend
4. Database server - App 1 - Staging
5. Windows server - App 2 - Production frontend + backend + database
6. Windows server - App 2 - Staging frontend + backend + database
7. Linux server with Plesk

This is by no means a perfect setup, but it has served us fine for years.


We are now changing our server provider, and while doing that I figured it's time to take a look at the architecture and how everything can be improved. We are considering two machines that we can run VMs on, and we are contemplating whether we want everything on machine one mirrored to machine two, so we have redundancy if something were to happen.

With the new solution for App 2, I would like to be able to "flip a switch" to swap staging and production without any downtime. I would also like the possibility of spinning up a new server for a branch, either automatically or manually.
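For what it's worth, the "flip a switch" idea usually boils down to a blue-green swap: production traffic follows a Service selector label, so retargeting the label moves traffic instantly with no redeploy. A minimal sketch of the concept, assuming a k3s Service whose selector carries a hypothetical `track` label (in practice you would apply the change with `kubectl patch`):

```python
def flip_environments(service_spec):
    """Swap which deployment track the Service points at.

    The Service's selector decides where traffic goes, so changing
    "production" to "staging" (or back) retargets traffic without
    touching the running pods.
    """
    current = service_spec["selector"]["track"]
    service_spec["selector"]["track"] = (
        "staging" if current == "production" else "production"
    )
    return service_spec

# Hypothetical Service spec for App 2
svc = {"selector": {"app": "app2", "track": "production"}}
flip_environments(svc)
# svc["selector"]["track"] is now "staging"
```

The same idea works at the ingress level (pointing the production hostname at a different backend Service), which may be simpler if the two environments must keep separate Services.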

When it comes to the diagram, App 1 can be ignored. It will run in four VMs, much like it does today.

But App 2, which is under development, can be moved away from IIS and over to Docker or k3s.
The database is Microsoft SQL Server, so that must stay in a VM.

I don't really have any experience with this kind of stuff, but after doing some research and consulting with ChatGPT, this is what I have made so far. https://imgur.com/a/0xyABEi

This diagram is mostly for App 2, as App 1 and the Plesk server can live on their own VMs, I think.

I would love it if anyone has any tips on the architecture and how it can be improved.
I feel like there's a lot missing from the diagram, but I don't really know what else to add.

I am also contemplating whether everything should be as it is in the diagram, with everything on the same k3s cluster, or whether I should have two environments: one with production and staging, and the other for the development stuff.

All of the software choices were made with input from ChatGPT because I don't have any experience with any of them, except of course GitHub.

https://redd.it/1mlpnt8
@r_devops
Pull Req strategy for deploying new services?

A common task on my team is deploying new services, which we call "dataflows". A developer will open a ticket asking us to enable a dataflow to a backend application. Our process right now is to open a repo for the application using a GH template which contains orchestrated Terraform deploying the necessary infra across our CDN, layer 3 firewall, and cloud platform - then use tfvars for each environment and deploy via GH Actions. Pretty simple and efficient compared to how everything used to be done manually.

Next step - I want to empower devs to make these reqs using just a pull request, with only manual approval needed from my team, but I think the new-repo process adds a lot of complication and is maybe unnecessary. I had this thought: put all dataflows in a SINGLE repo, with individual folders for each dataflow containing their tfvars, then use a GH Actions workflow which runs on push to run TF Plan on the newly added or updated tfvars, and a GH Actions workflow which runs on approved PR to run TF Apply. The devs can independently stand up a feature branch with a folder for their new dataflow with tfvars. (State files are all separate btw, per application environment - I'm not worried about overlap within the same repo, we have state management under control.)

I found some basic GH Actions steps that check for differences between the current commit and the previous commit for grabbing the added/modified tfvars, but haven't had much time to experiment yet. Before I go much further, figured I'd ask: has anyone done something like this? Any feedback? Are there better methods for this than what I described a moment ago (comparing differences between commits)?

https://redd.it/1mlr7x8
@r_devops
Need career advice

Hey folks,

I’ve got about 3 years in DevOps, mostly on the Salesforce side — managing releases, setting up CI/CD for Salesforce metadata deployments with Copado, Gearset, GitHub Actions/YAML, doing Bash/Linux scripting, a bit of Python scripting, and using Salesforce admin/CLI tools.

I’ve never worked on cloud infrastructure (AWS/GCP/Azure) or tools like Docker, Kubernetes, Terraform, Prometheus, but I really want to move into a broader DevOps role that’s more infra/cloud-focused.

Has anyone here made that switch or can offer guidance? I find Salesforce DevOps pretty boring and want to work on actual infrastructure.

How should I approach this? Should I mention my Salesforce experience in interviews? Do companies consider platform-focused DevOps folks for cloud roles?

Thanks

https://redd.it/1mluflb
@r_devops
Beginner in DevOps – Can I land an internship yet?

TECHNICAL SKILLS

• Cloud Platforms: Amazon Web Services (AWS)-EC2, S3, VPC, NACL, NAT Gateway, Security Groups

• Infrastructure as Code (IaC): Terraform

• Configuration Management: Ansible

• Containerization: Docker, Dockerfile

• Operating Systems: Linux (Ubuntu), Windows

• Programming/Scripting Languages: JavaScript, Python, Bash Scripting

• Version Control & Tools: Git, GitHub

PROJECTS

1. Automated Multi-Stage Node.js Application Deployment on AWS EC2
• Designed and implemented a multi-stage Dockerfile to efficiently build and containerize a Node.js application, reducing the final image size for faster deployment.
• Deployed the Dockerized application to an AWS EC2 instance, managing the instance's configuration.
• Configured an AWS Security Group to control inbound traffic, ensuring the application was accessible on a specific port while maintaining a secure environment.

2. Static Website Hosting and Management on AWS S3
• Hosted a static website on an AWS S3 bucket and configured the bucket for public web hosting.
• Managed bucket policies and access control lists to allow public read access, demonstrating an understanding of AWS permissions and security.

3. Cloud Infrastructure Provisioning with Terraform
• Utilized Terraform to provision and manage a Virtual Private Cloud (VPC) on AWS, including subnets and route tables.
• Used Infrastructure as Code (IaC) approach to define the infrastructure, enabling repeatable and consistent deployments.
• Managed the state of the infrastructure and applied changes to the environment in a controlled manner.

4. Automated Dependency Management with Ansible
• Developed Ansible playbooks to automate the installation of dependencies on an EC2 instance.
• Demonstrated the ability to configure servers consistently and reduce manual setup time.

https://redd.it/1mlv6vy
@r_devops
How are you handling IPv4 vs IPv6 in your setups?

I’m tweaking our network setup and got stuck thinking about IPv4 vs IPv6 for our cloud infra. I found this IPv4 vs IPv6 breakdown that explains the address space limits and security differences, which got me wondering how folks here are managing the transition. Are you sticking with IPv4 with NAT, going full IPv6, or running dual-stack? What’s been the biggest pain point for you with either protocol in production?

https://redd.it/1mlx1wh
@r_devops
How would you design multi-cluster EKS job triggers at scale?

I’m building a central dashboard (in its own EKS cluster) that needs to trigger long-lived Kubernetes Jobs in multiple target EKS clusters — one per env (**dev**, **qa**, **uat**, **prod**).

The flow is simple: dashboard sends a request + parameters → target cluster runs a job (`db-migrate`, `data-sync`, `report-gen`, etc.) → job finishes → dashboard gets status/logs.

Current setup:

* Target clusters have **public API endpoints** locked down via strict IP allowlists.
* Dashboard only needs **create Job + read status** perms in a namespace (no cluster-admin).
* All triggers should be **auditable** (who ran it, when, what params).

I’m *okay* with sticking to public endpoints + IP restrictions for now but I’m wondering: is this actually **scalable and secure** once you go beyond a handful of clusters?

How would *you* solve this problem and design it for scale?

* Networking
* Secure parameter passing
* RBAC + auditability
* Operational overhead for 4–10+ clusters

If you’ve done something like this, I’d love to hear about it.
Links, diagrams, blog posts — all appreciated.


**TL;DR:** Need to trigger parameterised Jobs across multiple private EKS clusters from one dashboard. Public endpoints with IP allowlists are fine for now, but I’m looking for scalable, secure, auditable designs from folks who’ve solved this before. Ideas/resources welcome.
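One way to make the "auditable parameterised Jobs" part concrete is to have the dashboard record the requester and parameters on the Job object itself, so the audit trail travels with the workload. A sketch of the manifest-building side, before submitting it with namespace-scoped create-Job RBAC (the namespace, annotation keys, image, and names are all hypothetical):

```python
import json
from datetime import datetime, timezone

def build_job_manifest(name, image, params, requested_by):
    """Build a parameterised batch/v1 Job manifest; the requester and
    parameters are recorded as annotations so every trigger is auditable."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": f"{name}-{stamp}",
            "namespace": "dashboard-jobs",  # hypothetical target namespace
            "annotations": {
                # Who asked for the run and with what parameters
                "dashboard.example.com/requested-by": requested_by,
                "dashboard.example.com/params": json.dumps(params, sort_keys=True),
            },
        },
        "spec": {
            "backoffLimit": 0,
            "ttlSecondsAfterFinished": 3600,  # clean up finished Jobs
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Parameters become CLI args; sorted for determinism
                        "args": [f"--{k}={v}" for k, v in sorted(params.items())],
                    }],
                }
            },
        },
    }

job = build_job_manifest(
    "db-migrate", "registry.example.com/db-migrate:1.4",
    {"target": "qa", "dry_run": "false"}, "alice@example.com",
)
```

For secret parameters you would reference a Secret or external secret store instead of annotations/args, since both are visible to anyone who can read the Job.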

https://redd.it/1mlu559
@r_devops
AI LLM for troubleshooting

Which one do you prefer to use, if any - Claude, ChatGPT, etc. - specifically for vibe coding or troubleshooting issues with things like Terraform, Ansible, Dockerfiles, GitHub Actions, etc.?

https://redd.it/1mm49ab
@r_devops
How realistic is it to land a full-time Cloud/DevOps role in Toronto as a 4th year CS student?

I’m in my 4th year of a Computer Science degree specializing in Cloud Computing, and I’m considering switching to part-time studies if I can secure a full-time role. I have my AWS CCP.

Right now, I’m working on two portfolio projects:
• a cloud-native environmental dashboard (AWS, Terraform, CI/CD)
• a real-time cloud monitoring dashboard

I’m also studying for my AWS Solutions Architect – Associate and Terraform Associate certs.

How realistic is it to land a junior Cloud/DevOps position in the Toronto market right now? And is it worth trying before graduation?

https://redd.it/1mm8b5i
@r_devops
Transitioning from Accounting + Analytics into FinOps (AI + Cloud Cost Optimization) — Need Guidance

Hi all,

I’m exploring a career transition into FinOps / Cloud Financial Management and would love insights from professionals already in the field.

Background:

4+ years in accounting & finance (financial reporting, reconciliations, audits)

Completed MS in Business Analytics (STEM OPT eligible) in the U.S.

Tools: Power BI, SQL, Excel (Advanced), QuickBooks, INFOR FM

Experience building dashboards & data models for finance teams

Familiar with U.S. GAAP and Indian GAAP

Strong process improvement background (SOPs, RCA)

Goal:

Move into FinOps Analyst or Cloud Financial Analyst role with an AI + automation focus

Open to starting as Financial Analyst (Cloud/Tech) as an entry point

Questions:

How realistic is it to land a FinOps role without prior FinOps experience if I bring strong finance + BI skills?

Which technical skills or certs (FinOps Certified Practitioner, AWS, Azure) make the biggest difference in getting hired?

I’m seeing fewer FinOps job postings right now — is this normal? Will demand grow or shrink as AI adoption increases in cloud operations?

For those in the field — how much of your role is technical vs stakeholder communication?

Any recommended tools, datasets, or side projects I can start to show FinOps capability?

Thanks in advance for any advice or resources!

https://redd.it/1mm9xwi
@r_devops
Experiences of cloud/on-prem infrastructure security headaches

I’m collecting real-world stories about common cloud infra/security headaches for a project. Ever had a misconfig bite you? Curious what the #1 pain point is in your setup.

https://redd.it/1mm94va
@r_devops
Guide me please

I can build websites using the MERN stack, and I now want to start learning DevOps. I know I need to learn Linux and networking basics first. Could you suggest some sources, free or paid, where I can learn these skills? Also, is there anything else I should learn before starting DevOps?

https://redd.it/1mmbe7o
@r_devops
Just started as a System Engineer — aiming for DevOps, need guidance on next steps

Hey everyone, I recently got my first role as a System Engineer. My work right now involves system administration, Linux basics, and some networking. In the long run, I want to move into DevOps — I’m interested in automation, CI/CD pipelines, and cloud platforms — but I’m honestly a bit nervous about the coding side. I’m fine with basic scripting, but anything beyond that feels intimidating.

Here’s what I currently know:

• Comfortable with Linux (process/user management, permissions, disk mgmt, still learning networking)
• Basic cloud concepts (AWS fundamentals)
• Some exposure to shell scripting
• Basic SQL & Excel (from previous work)

I’d love advice on:

• Which DevOps tools or platforms to learn first (e.g., Jenkins, Docker, Kubernetes, Terraform, etc.)
• How much coding I realistically need to be good at for DevOps
• Recommended learning order so I don’t get overwhelmed
• Any tips for building a DevOps portfolio as a beginner

https://redd.it/1mmeemy
@r_devops
Beginner to AWS (Day 5): rate the level of this project (also suggest some good projects so that I'll be able to land an internship/job). PS: I am currently in my last year of Engineering.

Built a production-ready AWS VPC architecture:

• Deployed EC2 instances in private subnets across two Availability Zones.

• Configured Application Load Balancer for incoming traffic distribution.

• Implemented Auto Scaling for elastic capacity.

• Enabled secure outbound internet access using dual NAT gateways for high availability.

• Ensured fault tolerance and resilience with multi-AZ design.

https://redd.it/1mmkgt5
@r_devops
GDG Organizer: Done All Associate GCP Certs, Voucher in Hand - Which Professional Should I Pursue Next? (Seeking Advice)

https://preview.redd.it/owi4ldw028if1.png?width=414&format=png&auto=webp&s=b43e7c0183027f7456a4223ad6e5b6237ae4e5b6


Hey everyone!

As a GDG Organizer, I'm deep into cloud stuff. I've got a GCP cert voucher burning a hole in my pocket, and I'm ready to tackle a Professional cert after doing ACE and CDL.

My background is heavy in backend (Python/Go/FastAPI), DevOps (CI/CD, Docker, K8s, Terraform, Prometheus, Grafana), and I'm also into AI/ML (LLMs, GenAI).

Given that, which Professional cert would you recommend and why? (Architect, DevOps, ML Engineer, Developer, Data Engineer?)

Also, quick side note/vent: GCP support has been truly horrendous for me lately. Seriously the worst support system I've dealt with. Anyone else feel this pain?

Thanks for any tips!

https://redd.it/1mmnjqf
@r_devops
Looking for Suggestions on Solid, Interview-Worthy DevOps Projects

I’ve recently completed most of the core fundamentals in my DevOps learning journey, and now I’m looking for ideas for a solid, end-to-end project that can really stand out in interviews and demonstrate my skills.

Here’s what I’ve covered so far:

Networking fundamentals
Linux system administration and shell scripting
Docker & Kubernetes
Git and GitHub
Basic AWS (with 2–3 small practice projects)
Python scripting

I’ve also done some basic projects (like setting up my own Linux environment, simple CI/CD pipelines, etc.), but now I want to build something closer to real-world, infrastructure-related work that hiring managers would appreciate.

What would you suggest as an impressive, portfolio-worthy DevOps project that’s both challenging and relevant to industry use cases?

TL;DR: Finished core DevOps skills, looking for suggestions on a real-world, interview-ready DevOps project to showcase in my portfolio.

https://redd.it/1mmqhxa
@r_devops
How do you start planning and setting up a new project?

I’ve just been assigned to a project as a DevOps intern, and I’m trying to understand how experienced engineers approach things from the start. When you join a project, how do you plan the architecture, choose the right resources, decide on integrations, set up CI/CD pipelines, and define the main components needed? I’m curious about the typical thought process and steps you follow, so I can learn how to approach my own work better.

https://redd.it/1mmtomn
@r_devops
Ops-to-DevOps reconversion via apprenticeship. I'm currently learning GitLab and Git while sending job applications. Any tips/hints/recommendations?

Here is my main roadmap until I find a job for my apprenticeship training; what do you think about it?

I built it by looking at job boards and my school's roadmap, prioritising what is most in demand along with its prerequisites (for example: I see that Kubernetes is super in demand, but without Ansible, Terraform and Docker it's a no-go... plus I wanted to put everything in Git to learn it by practicing).

I have a homelab based on an old gaming computer (with a good CPU and 32 GB of RAM... better than nothing to experiment with!).

Anyway, here is my actual roadmap (which lives in the README.md of my GitLab, my only file right now x) ).
Tomorrow I'll start Docker, so it'll be a good opportunity to start a new project and add new files, like the configuration, make it evolve with branches, get into the habit of keeping a clean and easy-to-read/maintain main, work with multiple branches, etc.

I don't know yet what I'll put on my servers, but probably things useful for my homelab plus something interesting to show during job interviews!

EDIT: I'd already read a lot and talked about the DevOps mindset :) so that part is already pretty well advanced.

## Roadmap

### Phase 1: Git Mastery

- [x] Create lab-infra project with clear README
- [x] First branches -> Merge Requests -> squash merge
- [x] Protect main (no direct push)
- [x] Install Git; set user.name and user.email; add SSH key
- [ ] Master Git conflict resolution and multi-branch workflow

### Phase 2: Docker Fundamentals

- [ ] Install Docker Desktop on Windows
- [ ] Build first containerized web service (Nginx)
- [ ] Master Docker best practices (.dockerignore, non-root user, HEALTHCHECK)
- [ ] Create multi-stage builds for optimized images

### Phase 3: Terraform Infrastructure

- [ ] Terraform basics: providers, resources, and state management
- [ ] Deploy VMs on Proxmox with a reusable module
- [ ] Output VM IPs and names for the automation chain
- [ ] Version control Terraform configurations in Git

### Phase 4: Ansible Configuration

- [ ] Ansible fundamentals: inventory, playbooks, and modules
- [ ] Build dynamic inventory from Terraform outputs
- [ ] Create roles for system hardening and Docker installation
- [ ] Achieve idempotence for reliable automation

### Phase 5: Kubernetes Orchestration

- [ ] Deploy k3s cluster on Proxmox VMs using Terraform and Ansible
- [ ] Master Kubernetes basics: pods, services, deployments
- [ ] Implement persistent volumes and ingress controllers
- [ ] Deploy applications with proper manifests

### Phase 6: Monitoring and Observability

- [ ] Deploy Prometheus and Grafana stack with automation
- [ ] Configure node_exporter on all VMs automatically
- [ ] Create custom dashboards and import professional templates
- [ ] Set up Alertmanager and test alert workflows
- [ ] Integrate Datadog for advanced monitoring capabilities

### Phase 7: CI/CD Integration

- [ ] Create GitLab CI/CD pipeline for Docker image builds
- [ ] Implement automated security scanning with Trivy
- [ ] Deploy to Kubernetes via GitOps approach
- [ ] Add infrastructure quality gates (tflint, checkov, ansible-lint)

### All Along: Professional Presentation and Habits

- [ ] Architecture documentation with diagrams
- [ ] Disaster recovery procedures documentation
- [ ] Clean public repository with no secrets
- [ ] Screenshots and setup instructions for portfolio



https://redd.it/1mmulu9
@r_devops
What does Kubernetes do that Google Cloud cannot?

I never used Kubernetes, but I use Google Cloud at work.

We run our code in multiple products from Google Cloud:

- Cloud Run for our APIs;
- Cloud Scheduler for some scheduled background jobs;
- Cloud SQL for DB;
- etc.

What does Kubernetes offer that I can't get from Google Cloud products? Why would I ever use GKE (Google Kubernetes Engine)?

https://redd.it/1mmwesq
@r_devops
Interview this week for a contract Platform Engineer role at Nvidia

I have an interview this week for a contract Platform Engineer role at Nvidia.
Are there any other contractors/FT employees working for Nvidia who can offer some insights?
What are some of the questions I can ask to find out whether this is legit?

Potential red flags:
- Asking for my references before the interview, plus very generic information about the role
- Rate is very low
- Direct vendor

Please DM if you do not want to share info on reddit

TL;DR: check whether the req for a Platform Engineer at Nvidia is legit or an attempt to get my details.

TIA



https://redd.it/1mmxrwx
@r_devops
I'm struggling to figure out how to handle user data in the context of cattle-like VMs when the VMs are developers' primary workstations.

Working in azure for context.

We have developers currently in personal assignment pools, where everyone gets their own unique VM. We have FSLogix handling the mounting of user profiles, but it doesn't do anything for user data, such as git repos that have been cloned down. Users often run local builds, so performance is important here.

The workstations are very much pets, in that they get patched regularly and are very long-lived. We are trying to move to immutable images and break free from the pets at the same time.

I can use a pooled host pool for the VM itself and FSLogix for the user profile, but I always get stuck on the user data. If performance weren't an issue, a network share with ACLs mounted on login seems like the best solution. Given that users would constantly be doing local builds on that share, I would like a "local" solution instead; I'm just not sure what it is.

Something that could attach an existing data disk at logon or even at boot, based on the user that signed in, would be ideal, but I don't know of such a capability.

How would y'all handle a scenario like this?

Edit to add: forgot to note that this is for Windows

https://redd.it/1mn01a9
@r_devops