Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Currently an IT Technician doing lots of DevOps-related tasks, trying to get into DevOps/Cloud Engineering

Hi fellow devops-ers, I currently work for an MSP where my primary job is day-to-day tickets related to networking, SharePoint, and other Microsoft issues, but lately I have been working more with PowerShell, Microsoft Power Automate, and PowerApps, basically creating lots of automation flows and scripts for any task the company needs. I have a good understanding of Linux and a mini homelab where I run various Docker services and even host my own website on a cluster of 4 Raspberry Pis and a Mini PC. In addition to strong networking skills, I am also a decent dev with knowledge of React, Next.js, SQL, MongoDB, and Python (I have developed my own website and web apps).

I was thinking of moving into more of a cloud engineering or Jr. DevOps role, but I'm not sure which positions would let me break into that domain of IT. I have applied to lots of positions on LinkedIn and Indeed, but nothing has really materialized. I am also getting my AWS Solutions Architect Associate cert to boost my chances.

Would love to hear from you guys and if anyone has any recommendations for what my next steps should be.


Portfolio: mointech.dev

https://redd.it/1ir76ef
@r_devops
How long should we stay in one job: one or two years?

Please advise. I am three months into my first job and I hate it because of the environment.

https://redd.it/1ir9iu5
@r_devops
Self-service portal with templating and automation

I'm looking for a SaaS developer portal that has self-service items (e.g., "Create S3 bucket") with input fields. On submission, it should create a GitHub pull request using a template you define with the input fields.

Has anyone used a commercial product that could be configured to do something like this? The team has low bandwidth at the moment, so I'm looking for a solution that needs minimal development. It also needs to be user friendly, so I'm looking for nice UI/UX.
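For context on what such a portal would do behind the scenes, here's a rough sketch of the form-to-PR flow using plain git and the GitHub CLI; the repo layout, branch name, and the Terraform stanza are made-up placeholders, not a real product's behavior:

```shell
# Render the admin-defined template with the form inputs, then open a PR.
# Assumes the portal runs inside a checkout of the infrastructure repo.
git checkout -b request/s3-bucket-analytics

# Hypothetical template output: a Terraform stanza built from the form fields
cat > buckets/analytics.tf <<'EOF'
resource "aws_s3_bucket" "analytics" {
  bucket = "analytics-data"
}
EOF

git add buckets/analytics.tf
git commit -m "Self-service request: create S3 bucket 'analytics-data'"
git push -u origin request/s3-bucket-analytics
gh pr create --title "Create S3 bucket: analytics-data" --body "Requested via portal"
```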

https://redd.it/1irhny9
@r_devops
AWS ECS task ephemeral storage filling up unexpectedly

I have a PHP application running in an ECS service, and I recently implemented Datadog to monitor my services. I then ran into an error where my tasks' ephemeral storage filled up unexpectedly; I have never hit this error in the past 1.5 years. What could be the reason behind it? Could it have something to do with my Datadog implementation?
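In case it helps with debugging, one way to see what is actually filling a task's ephemeral storage is ECS Exec; the cluster/task IDs and container name below are placeholders, and ECS Exec must be enabled on the service:

```shell
# Open a shell inside the running task (IDs are placeholders)
aws ecs execute-command --cluster my-cluster --task abc123 \
  --container app --interactive --command "/bin/sh"

# Then, inside the container:
df -h /                                        # overall disk usage
du -xh / 2>/dev/null | sort -rh | head -20     # largest directories; agent
                                               # log/APM buffers are a common culprit
```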

https://redd.it/1irido5
@r_devops
Rclone and S3: suitable Google Drive replacement?

Hi, I was just wondering if Rclone with S3 cloud storage would be a suitable replacement for Google Drive?

I don't care about conflicts right now; it's mainly performance for multi-gigabyte files.

I would wrap rclone in my own application for user authentication.

Or is there something else to consider?

What i need is:
- custom user auth
- cloud storage
- fast upload and download
- file permission filtering / allow list
- api or sdk or cli to control everything if needed
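For what it's worth, a minimal sketch of pointing rclone at S3 and tuning it for large files; the remote name, bucket, and tuning values are assumptions, not recommendations:

```shell
# One-time remote setup (credentials taken from the environment)
rclone config create s3remote s3 provider=AWS env_auth=true region=us-east-1

# Multi-gigabyte transfers: raise parallelism and multipart chunk size
rclone copy ./bigfiles s3remote:my-bucket/bigfiles \
  --transfers 8 --s3-upload-concurrency 8 --s3-chunk-size 64M --progress
```

Your wrapper application could shell out to commands like these (or use librclone), enforcing its own auth and allow-lists before issuing the transfer.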




https://redd.it/1irn28r
@r_devops
How to Deploy Static Site to GCP CDN with GitHub Actions

Hey folks! 👋

After getting tired of managing service account keys and dealing with credential rotation, I spent some time figuring out a cleaner way to deploy static sites to GCP CDN using GitHub Actions and OpenID Connect authentication (or as GCP likes to call it, "Workload Identity Federation" 🙄).

I wrote up a detailed guide covering the entire setup, with full Infrastructure as Code examples using OpenTofu (Terraform's open source fork). Here's what I cover:

- Setting up GCP storage buckets with CDN enabled
- Configuring Workload Identity Federation between GitHub and GCP
- Creating proper IAM bindings and service accounts
- Setting up all the necessary DNS records
- Building a complete GitHub Actions workflow
- Full example of a working frontend repository

The whole setup is production-ready and focuses on security best practices. Everything is defined as code (using OpenTofu + Terragrunt), so you can version control your entire infrastructure.
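As a rough illustration of the shape of such a workflow (the pool/provider path, service account, and bucket below are placeholders, not values from the guide):

```yaml
name: deploy
on:
  push:
    branches: [main]
permissions:
  id-token: write   # required so the job can mint an OIDC token
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/123456789/locations/global/workloadIdentityPools/github/providers/github
          service_account: deployer@my-project.iam.gserviceaccount.com
      - uses: google-github-actions/setup-gcloud@v2
      - run: gsutil -m rsync -r ./dist gs://my-static-site-bucket
```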

Here's the guide:
https://developer-friendly.blog/blog/2025/02/17/how-to-deploy-static-site-to-gcp-cdn-with-github-actions/

Would love to hear your thoughts or if you have alternative approaches to solving this!

I'm particularly curious if anyone has experience with similar setups on other cloud providers.

https://redd.it/1iropdv
@r_devops
Docker interview

Hi, so as the title suggests, I have a technical interview about Docker/Python. It's for an entry-level role (Junior DevOps). I had a previous candidate screening call and I was open and honest with the tech lead at the company about not having used these tools before, but they still want to invite me to the interview after hearing about my experience with cloud platforms etc. They said the interview will mainly revolve around problem solving. So I was wondering if you guys can provide me with some tips to help prepare for it. Thanks

https://redd.it/1iro8ku
@r_devops
Alerting System That Supports Custom Scripts & Smart Alerting

Hey everyone,

In my company, we developed an internal system for alerting that works like this:

1. We have a chain of applications passing data between them until it reaches a database (e.g., an IoT sensor sends data to an on-premise server, which forwards it through RabbitMQ/Kafka to a processing app in a Kubernetes cluster, which finally writes it to a DB).
2. Each component in the chain exposes a CNC data endpoint (HTTP, Prometheus, etc.).
3. A sampling system (like Prometheus) collects this data and stores it in a database for postmortem analysis.
4. Our internal system queries this database (via SQL, PromQL, or similar) and runs custom Python scripts that contain alerting logic (e.g., "if value > 5, trigger an alert").
5. If an alert is triggered, the operations team gets notified.

We’re now looking into more established, open-source (or commercial) solutions that can:
- Support querying a time-series database (Prometheus, InfluxDB, etc.)
- Allow executing custom scripts for advanced alerting logic
- Save all sampled data for later postmortems
- Support smarter alerting: for example, if an IoT module has no ping, we should only see one alert ("No ping to IoT module") instead of multiple cascading alerts like "No input to processing app."

I've looked into Prometheus + Alertmanager, Zabbix, Grafana Loki, Sensu, and Kapacitor, but I’m wondering if there’s something that natively supports custom scripts and prevents redundant alerts in a structured way.

Would love to hear if anyone has used something similar or if there are better tools out there! Thanks in advance.

https://redd.it/1irr036
@r_devops
How do you manage your most frequently used infrastructure automation scripts?

Hey folks! How do you manage your most frequently used infrastructure automation scripts?

https://redd.it/1irt7m1
@r_devops
Rolling out new features, but everything is slowing down... help?

We’re preparing to roll out a set of new features for our app, but during staging tests, we noticed something weird: the app is running significantly slower. It’s strange because the new features don’t seem heavy on the backend, but somewhere along the way, our API response times nearly doubled.

I’ve already tried a few tools to diagnose the issue:

- perf – Gave some general insights but didn't pinpoint the bottleneck.
- Flamegraph – Useful for a high-level view, but I'm struggling to get actionable details.
- Py-Spy – Helpful for lightweight Python scripts, but not sufficient for this scale.
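Since the question mentions live environments: py-spy can also attach to an already-running process without restarting it (the PID below is a placeholder):

```shell
# One-off stack snapshot of a live process
py-spy dump --pid 1234

# 30-second flamegraph; --nonblocking avoids pausing the target,
# which matters when profiling in production
py-spy record -o profile.svg --pid 1234 --duration 30 --nonblocking
```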

At this point, I’m at a loss. Has anyone dealt with something similar? What profiling tools or approaches worked for you? I’m especially curious about tools that work well in live environments, as the slowdown doesn’t always appear in staging.

https://redd.it/1is96rx
@r_devops
How do you manage Docker images across different environments in DevOps?

I have a few questions regarding Docker image management across different environments (e.g., test, UAT, and production).


Single Image vs. Rebuild Per Environment

Should we build a single Docker image and promote it across different environments by retagging?
Or should we rebuild the image for each branch/environment (e.g., test, uat, prod)?
If we are rebuilding per environment, isn't there a risk that the production image is different from the one that was tested in UAT?
Or is consistency maintained at the branch level (i.e., ensuring the same code is used for all builds)?
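For reference, the build-once, promote-by-retag option from the first question might look like this (registry, repo, and tag names are placeholders):

```shell
# Build and push one immutable image keyed by commit SHA
docker build -t nexus.example.com/neo/neo:${GIT_SHA} .
docker push nexus.example.com/neo/neo:${GIT_SHA}

# Later, promote the exact bytes that passed UAT without rebuilding
docker tag nexus.example.com/neo/neo:${GIT_SHA} nexus.example.com/neo/neo:prod
docker push nexus.example.com/neo/neo:prod
```

The retag guarantees production runs the same image that was tested; environment differences then live in runtime configuration, not in the image.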

Handling Environment-Specific Builds

If we promote the same image across environments but still have server-side build steps (e.g., compilation, minification), how can we properly manage environment variables?
Since they are not embedded in the image, what are the best practices for handling this in a production-like setting?

Jenkinsfile Structure: Bad Practice?

Below is a snippet of my current Jenkinsfile. Is this considered a bad approach?
Should I optimize it, or is there a more scalable way to handle multiple environments?


    steps {
        script {
            if (BRANCH_NAME == 'uat') {
                echo "Running ${BRANCH_NAME} Branch"
                env.IMAGE = "neo/neo:${BRANCH_NAME}-${COMMIT_HASH}"
                echo "New Image Name: ${env.IMAGE}"
                docker.withRegistry('https://nexus.example.com', 'HARBOR_CRED') {
                    docker.build("${env.IMAGE}", "-f Dockerfile.${BRANCH_NAME} .").push()
                }
            } else if (BRANCH_NAME == 'test') {
                echo "Running ${BRANCH_NAME} Branch"
                env.IMAGE = "neo/neo:${BRANCH_NAME}-${COMMIT_HASH}"
                echo "New Image Name: ${env.IMAGE}"
                docker.withRegistry('https://nexus.example.com', 'HARBOR_CRED') {
                    docker.build("${env.IMAGE}", "-f Dockerfile.${BRANCH_NAME} .").push()
                }
            } else if (BRANCH_NAME == 'prod') {
                echo "Running ${BRANCH_NAME} Branch"
                env.IMAGE = "neo/neo:${BRANCH_NAME}-${COMMIT_HASH}"
                echo "New Image Name: ${env.IMAGE}"
                docker.withRegistry('https://nexus.example.com', 'HARBOR_CRED') {
                    docker.build("${env.IMAGE}", "-f Dockerfile.${BRANCH_NAME} .").push()
                }
            }
        }
    }


https://redd.it/1isa0tx
@r_devops
Question

Can you get an entry-level DevOps job in the current industry scenario? I am currently studying AWS; I know how to use Docker, Jenkins, and Git, and have basic knowledge of Linux, networking, and OS fundamentals. After practicing AWS I'll study Kubernetes and Terraform. LMK if there is anything I should or shouldn't do, and also what the market is like for entry-level DevOps engineers. TY

https://redd.it/1isbbrp
@r_devops
Can't configure a consent screen. Clicking on "OAuth consent screen" redirects me to "Google Auth Platform / Overview". What is going on?

I can't create client credentials because I can't configure an OAuth consent screen, which I can't do because I keep getting redirected to /auth/overview.

Is this intended behavior or a bug? Honestly stumped over here, and I've set up social login dozens of times in the past.

https://redd.it/1isamvc
@r_devops
Building custom Chromium, how do I stay aligned with official Chromium versioning?

Hello,

We have a fairly complex system in place where we fetch a clean Chromium, patch our changes and build the custom browser.

We have an update server where we manage versions, but we want to keep it aligned with Chromium's versions.

For example, Chromium is on 133.0.6943.99, but we continuously release new versions of our custom browser. When we finish building, we're supposed to upload the new artifact to the update server, but it won't trigger an update from the client's "About" page, since the version is still the same.

It's NOT possible to:

- Add a custom patch 99-mypatch
- Add another semver segment like 133.0.6943.99.123

We would like to stay aligned with the official version. I'm not sure how to handle this situation.

Any tips would be welcome.

Thank you!

https://redd.it/1isdtae
@r_devops
Which cloud for South America?

My friend wants to deploy his app (still a work in progress), hoping to establish it as a major player in South America. The big three are there, but they are not cheap; we all know that. What about OVH Cloud? How do I check whether latency and bandwidth are comparable? How about local providers?
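One rough way to compare latency from the target market is timing DNS, connect, and time-to-first-byte against each provider's nearest endpoint with curl, run from a machine in the region (the hosts below are examples; substitute real regional endpoints for each candidate provider):

```shell
# Compare name lookup, TCP connect, and TTFB per candidate endpoint
for host in s3.sa-east-1.amazonaws.com example-regional-endpoint.ovh.net; do
  curl -s -o /dev/null \
    -w "$host  dns:%{time_namelookup}s  connect:%{time_connect}s  ttfb:%{time_starttransfer}s\n" \
    "https://$host"
done
```

For bandwidth, downloading a known large test object from each provider's regional storage and timing it gives a comparable first approximation.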

https://redd.it/1ise9mk
@r_devops
Is KodeKloud subscription worth it?

A KodeKloud PRO subscription currently costs 8250 INR per year, and KodeKloud for Business is 12250 INR.
Is it worth buying?
Can I share KodeKloud for Business with someone even if I bought it for my personal use?

https://redd.it/1isff9m
@r_devops
Recommended GitOps CI/CD pipelines for self-managed Kubernetes

I'm working on an AI development team, currently setting up the CI/CD pipelines for development and staging, and am looking for recommendations on how to set everything up smoothly.

For context, we are running Kubernetes on bare metal; the current setup is 3-4 nodes on the same LAN with fast bandwidth between them. The system consists of Longhorn for storage, Sealed Secrets, and ArgoCD. We have a GitOps repository that ArgoCD watches and deploys from, and the devs operate on their own application repos. When an application is built, the CI pipeline pushes the new image and commits the updated tag to the GitOps repository. Here are some of the pain points I have been dealing with; I'd welcome suggestions on how to resolve them:

1. We are running on the company network infrastructure, so traffic can only come from the local network or from outside through the company's reverse proxy. Currently we can only use NodePort to expose services, and only machines on the private network can reach them. To make an app public we have to file a request with the IT team to update the DNS and reverse proxy. Is this the only way to go? One thing I'm worried about is managing NodePorts as the number of services grows.
2. Most of the devs here are not familiar with the Kubernetes world, so to deploy a new application stack, I have them create Dockerfiles and a Docker Compose file for reference. It takes time to translate everything fully into a Helm chart, which then gets committed to the GitOps repository. I then create a new Application in ArgoCD and start the deployment process. So for each new app, I spend most of my time configuring the new Helm chart for deployment.
I'm looking for a way to automate this process, or at least simplify it. Or would having the devs learn to write Kubernetes manifests be worth it in the long run?
3. We as the AI team of the company rely heavily on large ML models, most of which are from Hugging Face. In the past, to deploy an AI app we used Docker Compose to mount a model cache folder where we stored downloaded ML models, so applications wouldn't need to re-download them every time we reloaded or ran a new application with the same model. The problem is that we are now migrating the system to k8s, so there needs to be a way to effectively cache these models, which can vary from 500MB to 15GB in size. I'm currently considering an NFS-backed ReadWriteMany PV so every node can access the models.
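Point 3's NFS-backed cache idea might be sketched like this (the server address, export path, and size are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: model-cache
spec:
  capacity:
    storage: 200Gi
  accessModes: ["ReadWriteMany"]
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.0.10          # placeholder NFS server on the LAN
    path: /exports/model-cache
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: ""          # bind to the pre-provisioned PV above
  resources:
    requests:
      storage: 200Gi
```

Every AI pod can then mount the same claim read-write, so a model downloaded once (e.g. into a shared HF_HOME) is reused across nodes.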

Any suggestions or comments about the system are welcome.

https://redd.it/1ishqn7
@r_devops
Yaml watch

Hey, I made a cool YAML watch face for Android-based watches; LMK what you think!
I used it to practice parsing YAML more easily and faster :)

https://play.google.com/store/apps/details?id=com.balappfacewatch.dev

https://redd.it/1isigl4
@r_devops
AWS Security groups and Facebook webhooks

Hello,
I'm implementing a WhatsApp Business chatbot, and I need to allow the Facebook addresses in order to receive the incoming webhook calls.

When I looked it up, I ran the command and received around 900 addresses, and they say the list changes periodically.
https://developers.facebook.com/docs/whatsapp/cloud-api/guides/set-up-webhooks#ip-addresses

How can I add all those addresses? Has anyone run into this problem and solved it?
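Meta's docs point at the ranges announced by AS32934; a rough sketch of pulling those CIDRs and loading them into a security group (the group ID is a placeholder; note that default security-group rule quotas are far below ~900 entries, so an AWS managed prefix list referenced from the SG is the usual workaround):

```shell
# List IPv4 routes announced by Facebook's AS32934, then allow each one.
# In practice: feed these into a managed prefix list instead of raw SG rules.
whois -h whois.radb.net -- '-i origin AS32934' | awk '/^route:/ {print $2}' |
while read -r cidr; do
  aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 443 --cidr "$cidr"
done
```

Since the list changes periodically, this would need to run on a schedule, diffing against the current rules rather than blindly re-adding.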
Thank you !

https://redd.it/1isgfgb
@r_devops