Reddit DevOps
Need help on devsecops pipeline and branching strategy

I'm starting my DevSecOps internship, and our IT architect told me that we will have three environments: development, staging, and production. I'm having difficulty understanding when the pipeline will trigger, when a deployment to the dev, staging, or prod environment will be made, and which of my pipeline's tests will run for each.

The deployments will go to Kubernetes clusters running on VMs on on-premises VMware ESXi hosts.
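One common way to reason about this is a table that maps branch patterns to a target environment and a set of pipeline stages. The sketch below is only an illustration under assumptions: the branch names (`feature/*`, `develop`, `release/*`, `main`) and the stage names are hypothetical and would need to be replaced with whatever the team's actual branching strategy defines.

```python
from fnmatch import fnmatch

# Hypothetical branch patterns and stage names; the real ones depend on
# the team's branching strategy, which is not spelled out in the post.
PIPELINE_RULES = [
    ("feature/*", None, ["lint", "unit-tests", "sast"]),
    ("develop", "development", ["lint", "unit-tests", "sast", "image-scan"]),
    ("release/*", "staging", ["integration-tests", "image-scan", "dast"]),
    ("main", "production", ["smoke-tests"]),
]


def plan_for(branch):
    """Return (target_environment, stages) for a pushed branch."""
    for pattern, environment, stages in PIPELINE_RULES:
        if fnmatch(branch, pattern):
            return environment, stages
    # Unknown branches run nothing and deploy nowhere.
    return None, []


if __name__ == "__main__":
    for name in ["feature/login", "develop", "release/1.2", "main"]:
        print(name, "->", plan_for(name))
```

In this sketch a push to a feature branch only runs checks and deploys nowhere, while the longer-lived branches each map to exactly one environment.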

This screenshot of the branching strategy, provided by a DevOps engineer, may be helpful:
[image: branching]

https://redd.it/1j6sqd7
@r_devops
When working on migration projects, I encountered an unexpected issue related to the GKE (Google Kubernetes Engine) Ingress controller.

When working on migration projects, I ran into an unexpected issue with the GKE (Google Kubernetes Engine) Ingress controller: it doesn't support URL path rewriting. Let me explain the issue with an example and walk you through the challenges it caused during debugging.
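To make the difference concrete, here is a minimal sketch of what "path rewriting" means compared with forwarding the path unchanged. It is not GKE code; the function names and the `/api/v1` prefix are hypothetical and only illustrate the behaviour a backend would see in each case.

```python
def forward_as_is(request_path):
    """What a proxy without path rewriting sends to the backend."""
    return request_path


def rewrite_prefix(request_path, public_prefix, backend_prefix="/"):
    """Replace public_prefix with backend_prefix, as a rewriting proxy would."""
    prefix = public_prefix.rstrip("/")
    if request_path != prefix and not request_path.startswith(prefix + "/"):
        return request_path  # prefix does not match; leave the path untouched
    remainder = request_path[len(prefix):].lstrip("/")
    return backend_prefix.rstrip("/") + "/" + remainder


if __name__ == "__main__":
    path = "/api/v1/users"
    print(forward_as_is(path))              # backend must serve /api/v1/users
    print(rewrite_prefix(path, "/api/v1"))  # backend only needs to serve /users
```

Without rewriting, a backend that only serves `/users` never sees a path it recognises, which is the kind of mismatch that can be confusing to debug after a migration.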

I wrote an article about it; I hope it will be helpful for the community.

https://medium.com/@rasvihostings/challenges-with-url-path-forwarding-in-gke-ingress-controller-c175057a76d6



https://redd.it/1j6rl5u
@r_devops
As a technical resource how do you deal with sales staff?

The setup here is that I manage a team of support engineers, and a lot of times we're asked to support customer "events" where there is elevated traffic. There is a lot we can do mid-event to mitigate problems and even prevent them, and a lot more that's well outside our control.

I keep running into situations where something will happen during an event (sudden router failure somewhere on the network, misconfiguration leaves a component vulnerable to a traffic spike, etc), a short lived spike or two in errors results from it, the customer calmly asks for an RFO and the next week of my life is spent dealing with an escalating chain of internal account execs and non-technical customer relations people with escalating temperatures who are all demanding a technical explanation of what happened, but don't like the answer they get.

"I can't spin this" is the phrase that I keep hearing when I explain how the thing broke, why it was impossible for a tier 1 support engineer to predict/prevent, and a step by step of configuration changes that can be made to prevent this from happening in the future. Like, what else did you want if the literal correct technical answer isn't good enough? More often than not we'll triage with an engineering team who is already familiar with the account because 6 months ago they warned the account team about the possibility of exactly what broke and the recommendations were ignored.

Whenever this happens I have a sit down with my own managers and they seem pretty confident that we handled it appropriately. But naturally the sales oriented teams have the ear of upper management and execs, and the story that lives on as canon to both management and the customer is that the support team blew it and didn't flip the switch from "broken" to "fixed" fast enough.

I'll admit there's plenty I don't know about the business end of things, and blaming the first available lowest ranked person you can find will certainly get you off the phone quick enough, but I simply don't see a business upside to painting your support team as incompetent. Is there any approach to navigating this that actually helps or is this just the way it is everywhere?

https://redd.it/1j6u9pq
@r_devops
PM2 process exits after SSH session ends when deploying via AWS CodeBuild

I’m deploying a Node.js backend using AWS CodeBuild and SSH into an EC2 instance to run the deployment steps. The deployment script successfully:

1. Fetches the latest application code from S3
2. Extracts it to `/home/ubuntu/app`
3. Sets up environment variables from AWS SSM Parameter Store
4. Installs dependencies (`npm ci`)
5. Runs database migrations (`npx prisma generate`)
6. Builds the application (`npm run build`)
7. Starts the application using PM2
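The steps above can be sketched as a small fail-fast runner, which makes it easier to see which step was the last to succeed. This is only a sketch under assumptions: the commands are the ones quoted in the post, the final PM2 command reuses the restart invocation mentioned there, and the S3 fetch, extraction, and SSM steps are left out because the post does not show how they are run.

```python
import shlex
import subprocess

# Commands quoted from the post; the final PM2 command reuses the restart
# invocation mentioned there. Fetching from S3, extraction, and loading SSM
# parameters are omitted because the post does not show those commands.
DEPLOY_STEPS = [
    "npm ci",
    "npx prisma generate",
    "npm run build",
    "pm2 restart backend-app --update-env",
]


def run_steps(steps, cwd, runner=subprocess.run):
    """Run each step in order and stop at the first non-zero exit code.

    Returns the list of steps that completed successfully.
    """
    completed = []
    for step in steps:
        result = runner(shlex.split(step), cwd=cwd)
        if result.returncode != 0:
            break
        completed.append(step)
    return completed
```

On the instance this might be called as `run_steps(DEPLOY_STEPS, cwd="/home/ubuntu/app")`. The `runner` parameter exists only so the ordering logic can be exercised without running real commands.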

**The problem:**

* **When CodeBuild runs the script via SSH, everything executes successfully, and PM2 starts the application.**
* **However, once the SSH session from CodeBuild ends, the process moves to an "errored" state.**
* **Manually running** `pm2 restart backend-app --update-env` **after SSH logs out restores the process to "online."**

# What I’ve tried so far:

* Ensured PM2 is running as the `ubuntu` user
* Used `pm2 save` to persist the process list
* Ran `pm2 startup systemd -u ubuntu --hp /home/ubuntu`
* Enabled PM2 as a systemd service (`systemctl enable pm2-ubuntu`)
* Restarted the PM2 service (`systemctl restart pm2-ubuntu`)
* Set `export PM2_HOME="/home/ubuntu/.pm2"`

But the issue persists—PM2 starts fine during deployment, yet after CodeBuild finishes, the process moves to "errored." If I log in via SSH as `ubuntu` and manually restart it, **it works perfectly.**

**Why is PM2 treating the process as "errored" when the SSH session ends? How can I ensure it remains running after CodeBuild logs out?**

# Additional Info:

I’ve also tried using **CodeDeploy and SSM** instead of SSH, but both had their own issues:

* **CodeDeploy Agent** doesn't pick up the latest changes properly and causes problems with root user permissions.
* **SSM Run Commands** either behave the same way as SSH (failing after session ends) or stay stuck in an **"in progress"** state indefinitely.

Any insights or suggestions would be greatly appreciated!
P.S. I'm limited to using only AWS CI/CD tools.

https://redd.it/1j6z253
@r_devops
Creating EC2 security group rules for Pingdom?

I have an EC2 instance hosting a webserver that Pingdom performs uptime tests against.

I need 80/443 open to my web server so Pingdom can hit it, but I don't want the web server to be publicly accessible.

I was thinking of manually adding all of Pingdom's probe IP addresses, but there are a couple hundred.
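If the probe IPs are available as a plain list, the rules do not have to be typed by hand. The sketch below builds a permissions structure in the shape that boto3's `authorize_security_group_ingress` takes for its `IpPermissions` argument. The function name and the "Pingdom probe" description are hypothetical, and how the IP list is obtained is left open. It is also worth checking the per-security-group rule quota, since a couple hundred addresses across two ports may not fit in a single group.

```python
import ipaddress


def build_ingress_permissions(probe_ips, ports=(80, 443)):
    """Build an IpPermissions list allowing TCP on `ports` from each probe IP."""
    # Normalise every address to a CIDR block and drop duplicates.
    networks = {ipaddress.ip_network(ip.strip()) for ip in probe_ips if ip.strip()}
    v4 = sorted(str(n) for n in networks if n.version == 4)
    v6 = sorted(str(n) for n in networks if n.version == 6)

    permissions = []
    for port in ports:
        permission = {"IpProtocol": "tcp", "FromPort": port, "ToPort": port}
        if v4:
            permission["IpRanges"] = [
                {"CidrIp": cidr, "Description": "Pingdom probe"} for cidr in v4
            ]
        if v6:
            permission["Ipv6Ranges"] = [
                {"CidrIpv6": cidr, "Description": "Pingdom probe"} for cidr in v6
            ]
        permissions.append(permission)
    return permissions
```

The result could then be passed along the lines of `ec2.authorize_security_group_ingress(GroupId=..., IpPermissions=permissions)`, with the group ID supplied by the caller.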

It seems like people have made projects to get around this issue (see PicnicSupermarket/pingdom-probes-aws-whitelist and andypowe11/AWS-Lambda-Pingdom-SG on GitHub).

However, many of the projects are pretty old. I was curious if someone could suggest a project/method that they know works in 2025. Thanks!

https://redd.it/1j70c2f
@r_devops
Trying for a DevOps role, offered a software support role, will it help?

It’s a weird situation, but I am a fresh computer science grad who is really interested in DevOps and has been studying it through bootcamps and whatnot. In my job search, I applied to this role and was told that I am being heavily considered and might get accepted soon.

The role itself is mostly database support and maybe some customer communication regarding APIs and such (things that can be handled with the company’s documentation as well). My issue is that, while I’d love to finally work and gain experience, I fear that I might have landed a role that is far from what I want, and that later I won’t be able to transition easily to DevOps or similar roles.

I decided to ask the experts here as I personally know no one who even understands what DevOps means or stands for.

Context note: the company isn’t huge, around 500 employees and does have DevOps engineers in it, I’ve seen 4 on LinkedIn so I assumed a team of 4.

https://redd.it/1j738do
@r_devops
Looking for DevOps projects: can someone send links to lecture videos? Recently completed the basics!

I recently completed the basics of DevOps and have a medium-level understanding of CI/CD, Docker, Terraform, and Kubernetes. Now I want to work on some real-world projects to solidify my skills. Could you suggest some videos where I can learn to build end-to-end projects to add to my portfolio?

https://redd.it/1j75nvk
@r_devops
Considering a Career Shift to IT - DevOps or Other Remote Roles?

Hello friends,

I'm reaching out for some advice and insights. I'm currently in a non-IT role that's at risk of being outsourced, and with a salary of about $120k at 47, I feel the need to diversify my skills to ensure future stability and my mental health.

I don’t want to lose my job without a plan.

And it would be nice to pursue remote work.

Hypothetically, if I don’t lose my job, it would be nice to have a part-time, home-based side hustle.

In the past, I served as a data systems analyst in the Air Force from 1997 to 2017, so I do have some background in IT although archaic.

To mitigate some of the anxiety about my job security, I've started exploring new skills, particularly in roles that could lead to remote work.

I've been playing around with Perplexity AI asking for suggestions.

And DevOps keeps popping up, but from what I've read, it seems like it's far from entry-level.

There are plenty of "gurus" offering training courses claiming you can be DevOps-ready in just six months, which I'm skeptical of.

Currently, I'm taking Python courses and considering moving on to Linux and possibly Kubernetes, though I've heard it takes about a year to master which is a ridiculous time investment.

Perplexity suggested some alternative roles that might be more realistic.

1. Remote Python Developer (Backend/Frontend)

2. AI/ML Engineering Support Roles

3. Cloud Automation Specialist

I'm hoping that by the time I get some skills and certs down, the job market will have improved.

Has anyone transitioned into these roles from a non-IT background, especially with a focus on remote work?

Any advice or experiences you can share would be greatly appreciated!

Thanks in advance for your input.

https://redd.it/1j7dcck
@r_devops
Looking for contributors for my dockerfile template repository

I have created a template repository with Dockerfiles to kick off new projects or set up environments for existing projects.

Templates can be easily downloaded using a shell script that I host on my personal web server (curl the script into a file and run it; further details are in the repo).

The main purpose is to provide a very low friction method for fast project kickoffs / experiments and easy env setup of existing projects

https://github.com/arjunprakash027/Templates

I am looking for contributors to add more templates to the repository

https://redd.it/1j7d0c5
@r_devops
ITCareerQuestions did not answer me: I am exploring my options to stay relevant in a fast-changing career and have some career-shifting questions for professionals in the field today.

It's been 10 months and I have had no luck finding work. Not even 1 interview. Very very quickly, my background...you can skip to the end for my actual questions, but you can use this as reference.

Academic Bkg: I live in Ontario, Canada. B. Eng in Electronics Systems Engineering. It was a very practical program - we had at least 1 engineering project every semester, sometimes multiple, amounting to 10 total.

Co-ops/Paid Internships: Three in total: one at BlackBerry-QNX, one at Ciena, and one at a startup. All three were in the realm of high-level SWE. This taught me everything in my toolbox, which landed me my jobs after graduation.

Professional Experience: My first job was in data engineering. They provided all the training material and were patient, but I got laid off due to lack of work. My second job was at a very famous Canadian company, working on their automation team. At the end of probation, they terminated me due to lack of skill. Total YoE: 2 years (1.5 + 0.5, respectively).

First 8 months: I tried to focus on SWE fields, such as DevOps, and upskilling, but not doing the certs since my other SWE friends told me that just having it on your resume is a strong bait, but you will have to prove yourself in the interview. Just 1 phone screen.

Last 2 months: Three of my friends who left their respective careers and became data analysts advised me to strongly consider DA or BA because the barrier to entry is low and they all have stable jobs. So I took a big course, did a few personal projects, put them on my resume, and started applying. Not a single peep, just recruiters hopping on calls to get my details and ghosting me immediately after I tell them I am pivoting to DA/BA.

Now: I'm exploring my options. I am in a capable spot to pursue a master's and I want to see what's the best course of action for moving forward.



1. How is the job market for entry-level roles?

2. Is there even a master’s for it?

3. Will a master’s level the playing field for me, or is it professional exp >>> courses and master's?

4. If I need to upskill, at what level? (i.e., Udemy vs. actual professional certs from AWS or GCP)

Thank you for taking the time to read through my post. Have a wonderful Sunday!

https://redd.it/1j7j6ag
@r_devops
Question for seasoned vets and best practice sticklers from a college student.

I am a CS student who wants to work in DevOps, but I don't know if you all see the job market. How can I learn to program like a senior-level developer to set myself apart from the new grads? Coding like a senior comes from experience.

If you were in my shoes, which practices and resources would you recommend? Would you suggest capturing best practices from documentation, staying updated on new releases and tech, and learning security best practices so I can impress the right people?

And if there is anything else you recommend I do so that I can have a good shot at finding a job in this oversaturated market, compared to master's students, prestigious university grads, experienced developers, and people with big-name internships on their resumes, please let me know. Cheers

https://redd.it/1j7oklt
@r_devops
Are there ever slow days/weeks?

So I'm really new to this DevOps position. No idea how I got this job really, but they said they'd teach me, and I've been working my butt off trying to study, learn, and catch up to all these brilliant programmers around me. I'm even newer to the Dev side of Ops. Anyway, my workload is already lighter than the Sr. guy's on my team, but even then I'm curious: are there ever slow days in general?

Like is it just constantly fixing things? My brain gets annoyed at this agile stuff when I'm just like do it right the first time and make updates when you want to make something better. So imo waterfall > agile.

I will say a lot of this work was started before any of us got here, so that affects how much we have to fix. Still, I'm wondering: when working on a project, app, company, website, or whatever, is it always constant tweaking, or is there ever a "well, everything works, we can make small tweaks and tear down and rebuild in minutes, so relax for a little"? Or is it always sprint after freaking sprint, with something that needs to be done, refactored, or whatever?

And if so, what does that look like? Less work hours less money? Find a 2nd job? Contracts still paying so why leave just enjoy the time?

https://redd.it/1j7qfsi
@r_devops
How do you all use nginx exporter ?

I need the exporter to show metrics in Grafana through Prometheus. But given that the nginx Prometheus exporter is very basic, how can I customise it further? For example, showing latency and error/success counts for each API.

It would also be helpful if you could suggest some other important metrics.
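Per-endpoint latency and error counts usually have to come from the access log rather than from nginx's basic status page. The sketch below shows only the aggregation step, assuming a hypothetical custom log format where each line is `method path status request_time` (for example `GET /api/users 200 0.123`). Exposing the result to Prometheus is a separate step that is not shown.

```python
from collections import defaultdict


def summarize(lines):
    """Aggregate request count, 5xx count, and mean latency per path."""
    totals = defaultdict(lambda: {"requests": 0, "errors": 0, "seconds": 0.0})
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed format
        _method, path, status, request_time = parts
        entry = totals[path]
        entry["requests"] += 1
        entry["errors"] += int(status) >= 500  # True counts as 1
        entry["seconds"] += float(request_time)
    return {
        path: {
            "requests": e["requests"],
            "errors": e["errors"],
            "mean_seconds": e["seconds"] / e["requests"],
        }
        for path, e in totals.items()
    }
```

Here only 5xx responses are counted as errors; whether 4xx responses should count too is a judgment call that depends on the API.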

Intern need to impress manager.

https://redd.it/1j7rav8
@r_devops
JENKINS MISHAP????

Hi, I need advice. I used Jenkins for a deployment, and when it was done, I found that the old files on my server had been deleted. How can I recover them?

https://redd.it/1j7rjb3
@r_devops
Would you use a Kubernetes Terraform template that provides a platform-grade setup?

Hey r/devops,

I’m exploring the idea of a platform that provides ready-to-use, production-grade Kubernetes infrastructure templates—something that could save teams time by offering pre-configured setups for essential components like:

* Observability (Prometheus, Grafana, Loki, OpenTelemetry, etc.)
* GitOps (ArgoCD, Flux)
* Cert Management (cert-manager, external-dns)
* Service Mesh & Networking (Istio, Linkerd, Cilium)

The goal is to help teams skip the painful initial setup and get straight to deploying applications with a solid, scalable foundation. Instead of spending weeks fine-tuning Kubernetes infrastructure, you’d have a well-tested Terraform/Kubernetes template that you can deploy in minutes.

I’d love to hear from you:

* Would you (or your company) pay for a service like this?
* What are the biggest pain points in setting up Kubernetes infrastructure?

Looking forward to your insights—especially from those who manage K8s at scale! 🚀

https://redd.it/1j7vj7c
@r_devops
Docker assumes my Harbor registry is DockerHub

Hello, everyone!


I’m new to DevOps and running into an issue with Docker and a private Harbor registry. The registry is running on the same server as my CI/CD runner. When I push images using 'localhost', everything works fine. But when I try using the server’s hostname, Docker assumes it’s a DockerHub repository instead of my Harbor registry.



Logging in to Harbor works without any issues, and images are listed correctly. However, when I push using the hostname, I get errors like access being denied or the tag not existing. In fact, Docker assumes that I'm trying to push docker.io/server/image.
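This behaviour matches how Docker decides whether the first component of an image name is a registry host: it is treated as one only if it contains a `.` or a `:`, or is exactly `localhost`; otherwise the whole name is assumed to live on Docker Hub. The sketch below is a simplified illustration of that rule (real parsing has a few more cases), and the hostnames in it are hypothetical.

```python
DEFAULT_REGISTRY = "docker.io"


def split_registry(reference):
    """Split an image reference into (registry, remainder).

    Simplified: the first path component counts as a registry host only if
    it contains "." or ":" or is exactly "localhost".
    """
    first, sep, rest = reference.partition("/")
    looks_like_host = "." in first or ":" in first or first == "localhost"
    if not sep or not looks_like_host:
        return DEFAULT_REGISTRY, reference
    return first, rest


if __name__ == "__main__":
    for ref in ["server/image", "localhost/image", "server:443/image",
                "server.example.internal/image"]:
        print(ref, "->", split_registry(ref))
```

Under this rule a bare hostname such as `server` is read as a Docker Hub namespace, whereas adding an explicit port or using a fully qualified name makes it parse as a registry.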



Has anyone faced this before? Any ideas on how to make Docker properly recognize the registry when using the hostname? Any help would be greatly appreciated!

https://redd.it/1j7vvfi
@r_devops
Thrown into the Deep End in DevOps, Need Guidance for the Next Step

Hey everyone,

I wanted to share my journey so far and get some advice from this community.

I joined a prop-tech startup right after college with limited DevOps knowledge. Initially, I worked alongside a senior engineer, starting with tasks like writing backup and restore scripts and creating POCs in the sandbox environment. One of the key things I worked on was a metrics exporter for a database, which helped me secure a full-time offer.

I officially started as a full-time DevOps Engineer in September. I took charge of stage deployments and started learning more about AWS and monitoring. The pay was okay for a fresher, but I stayed because I was gaining valuable experience.

Around December, my senior left, and their replacement didn’t have much experience with our setup. Since I had about 6 months of hands-on work with our infrastructure, I was given production access. Since then, I've been handling tasks like database replications, deployments, observability, monitoring, security audits, and disaster recovery practices.

I'm currently preparing for the CKA (Certified Kubernetes Administrator) exam, aiming to appear around May-June. My goal is to land a mid-level DevOps role by March 2026.

I'm looking for advice on:

1. Skills/Certifications I should focus on alongside the CKA to increase my chances.
2. How to effectively showcase my experience to land that mid-level role.
3. Any resources or strategies that can help me fast-track my growth.

Would love to hear your thoughts, especially from those who've navigated a similar path. Thanks in advance!

https://redd.it/1j7wivc
@r_devops
Seeking validation on Go CLI for Dockerfile Template Discovery

Hey folks,

I'm building a Go CLI that helps users find Dockerfile templates, and I’m exploring two approaches:

1. Cache Approach: Pull templates from well-known repositories (think Awesome Docker Templates or other curated Dockerfile libraries) and cache them locally.


2. Dynamic Search: Query Docker Hub directly to search for images and dynamically generate a template based on what’s available.
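The cache approach can be sketched in a few lines. The example is in Python for brevity rather than Go, and everything in it is hypothetical: the directory layout, the `fetch` callback (standing in for whatever pulls a template from a curated repository), and the one-day default expiry.

```python
import time
from pathlib import Path


def cache_path(cache_dir, source, name):
    """Location of a cached template: <cache_dir>/<source>/<name>.Dockerfile."""
    return Path(cache_dir) / source / f"{name}.Dockerfile"


def load_template(cache_dir, source, name, fetch, max_age_seconds=86400, now=time.time):
    """Return template text, calling fetch(source, name) only when the cached
    copy is missing or older than max_age_seconds."""
    path = cache_path(cache_dir, source, name)
    if path.exists() and now() - path.stat().st_mtime < max_age_seconds:
        return path.read_text()
    text = fetch(source, name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return text
```

One pitfall this makes visible is staleness: a cached template is only as current as its last fetch, so the expiry window becomes a user-facing decision.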


I’d love to hear what you think about this idea, does it sound useful? Any advice or pitfalls I should consider?

If you feel the idea has no base and is completely useless, let me know that too!

https://redd.it/1j7uwyt
@r_devops
Serverless observability for dummies

I'm the only dev (frontend background) in an early stage startup.

We use AWS Lambda (with serverless.com) and Next.js (hosted on Vercel).

I use AWS CloudWatch to inspect logs, but it has no alerts or nice UI, so all I want is a nice UI to sit on top of CloudWatch.

I tried setting up New Relic and Honeycomb, but honestly I feel the effort required is way too involved for my time and skillset.

Is there an easy tool optimized for serverless? I don't have OpenTelemetry or anything like that.
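For the alerting part specifically, CloudWatch itself can raise alarms from log contents by combining a metric filter with an alarm. The sketch below only builds the keyword arguments, in the shape that boto3's `put_metric_filter` (logs client) and `put_metric_alarm` (cloudwatch client) accept. The function name, filter name, `ERROR` pattern, namespace, and thresholds are all assumptions to adjust.

```python
def error_alarm_requests(log_group, alarm_name, alarm_topic_arn, namespace="App/Lambda"):
    """Build keyword arguments for a log metric filter and an alarm on it."""
    metric_name = "ErrorCount"
    metric_filter = {
        "logGroupName": log_group,
        "filterName": "errors",
        "filterPattern": "ERROR",  # matches log events containing this term
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": namespace,
                "metricValue": "1",  # each matching event adds 1
            }
        ],
    }
    alarm = {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "Sum",
        "Period": 300,  # seconds
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [alarm_topic_arn],  # e.g. an SNS topic that emails you
    }
    return metric_filter, alarm
```

These would be applied with something like `boto3.client("logs").put_metric_filter(**metric_filter)` followed by `boto3.client("cloudwatch").put_metric_alarm(**alarm)`. This does not address the UI side of the question.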



https://redd.it/1j7zt7n
@r_devops