Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
What Was Your "I Broke Something In Production" Moment?

I'm a little under a year into my role as a DevSecOps engineer, and I have a huge fear of breaking something in production: a botched upgrade, loss of data, etc. My coworkers reassure me that everybody breaks something at some point.

When did you, or someone you know, break something in production? What was the impact? What did you learn from that experience?

https://redd.it/1l6rnxp
@r_devops
GitHub Actions and nightly deployment question

Hi, hopefully you kind folk can help me out here. We've recently onboarded our build pipelines into GitHub Actions, and for the most part it's been pretty amazing. However we've got a recent requirement which doesn't seem to be easily accomplished. For context, we have 3 environments, dev, staging and production. Staging and production have deployment protection rules requiring reviewers to approve.

The new requirement is for nightly builds to be deployed to the staging environment. We can accomplish this by using a schedule in the workflow, however because of the deployment protection, someone has to manually approve these jobs.

Is there a way to automate nightly builds and still maintain an environment's deployment protections?

https://redd.it/1l6vanq
@r_devops
Is it worth studying programming?

I was reading about the case of Shawn K, who has to make a living delivering orders because he can no longer find work as a programmer. On the other hand, Bill Gates says artificial intelligence cannot replace programmers.

What do you think?

https://redd.it/1l6xb8b
@r_devops
Need advice to switch from my build and release management job?

So I've been working as a build and release management engineer for the past 8 years. My work usually revolves around creating ITSM requests for production releases and managing all the release activities. My other tasks are basic management of applications and their environments in lower-level environments. I have nothing to do with Linux or any scripting or programming stack, for that matter. I understand code and can help fix some issues, but that's it.

For a while I've been trying to switch jobs, as I'm stuck with this project and haven't really been able to work on anything new because of a personal life crisis during COVID.

Now I'm studying and applying but haven't been able to get interview calls. I don't know what to do.

Any advice?

https://redd.it/1l6yj2l
@r_devops
DevOps Engineer planning next cloud move: AWS, Azure, or GCP?

I’m a mid-level DevOps Engineer (3–5 YOE) currently working with AWS (SAA-C03 certified), using orchestration, ci/cd-gitops, IaC, etc.

I'm at a point where I want to deepen my Cloud DevOps focus and am trying to decide which platform to specialize in next:

* **Double down on AWS** with DevOps Pro (saturated but high demand)
* **Pivot to GCP** for less competition and niche appeal (especially with SRE/Data/AI)
* **Explore Azure**, given its enterprise traction (seems strong in Europe and government orgs)

My long-term goal is to be positioned for roles at strong, globally-oriented tech companies. I'm thinking about both skill growth and long-term positioning in the job market.

From your experience or observation, which cloud platform gives the best career ROI right now, especially in mature, competitive markets?

Would love to hear from people working in companies that hire across multiple regions or those who recently made a similar decision.

https://redd.it/1l6zbah
@r_devops
Instant Incident Response - Deep dependency graph of the infra

Hello!

We have been working on an incident resolution feature at Anyshift: it helps surface root causes in minutes by connecting layers that don’t usually talk: cloud, Kubernetes, monitoring, and Git.

Classic monitoring stops at symptoms. We wanted to go deeper — so we built a live infra knowledge graph (Neo4j) updated by event-driven pipelines. It links AWS, Terraform, Datadog, and GitHub data to show what changed, where, and why.

It works as a Slackbot or web UI. Setup takes ~5 mins (GitHub app or AWS read-only on a dev account).

It’s free to try for now as we’re looking for as much usage and feedback as possible to shape what comes next.
Video is enclosed. Would love your thoughts, and to answer any of your questions!

Thanks a lot,
Roxane




https://redd.it/1l6zp1x
@r_devops
I'm a student and I need help

Hi everyone, I hope you're doing well. I'm taking an academic exam in cloud computing/DevOps. It will be multiple-choice questions (MCQ) on cloud computing and virtualization, whether network or storage, plus Docker and Kubernetes, and I need help finding MCQ practice tests to train on.

https://redd.it/1l6zic8
@r_devops
Future German Job Market?

Hi, I’m currently learning Cloud Engineering tools and concepts, and I plan to add DevOps knowledge as well if possible. My tech stack so far includes Terraform, Docker, Kubernetes, CI/CD basics, and I'm planning to go deeper into AWS/GCP.

I’m a non-EU Master’s student in Germany, with 1 year left to graduate. My German level is B2 in listening/reading, and around B1 in speaking. I have no prior work experience in tech.

The plan was to build up my Cloud/DevOps skills, improve my German, and then apply for jobs. But lately I’m seeing a lot of posts saying the junior market is dead, Cloud jobs require 2–3 years experience, and the IT sector is slowing down. On top of that, I’ve been pushing myself hard for years and I’m near burnout.

My questions are:

1. Is there any realistic chance for someone like me (0 experience, but decent German and solid skills) to break into Cloud Engineering or DevOps roles in Germany?


2. Do you think the market for Cloud Engineers in Germany will get better in the next year or two? Or is it already saturated?



I’m reaching a point where I’m wondering if it’s worth continuing this path or if I should just enjoy my time here and plan to return home after my degree. Any honest advice would be appreciated.



https://redd.it/1l70e23
@r_devops
Feeling Overworked and Frustrated as a Senior Cloud Engineer – Should I Quit?

I have 4 years of DevOps experience and am currently working remotely as a Senior Cloud Engineer at a startup, earning 12 LPA. Lately, I’ve been feeling overwhelmed and frustrated. My company recently assigned me to a new project with just one colleague, tasking me with migrating an application from Docker Compose to Kubernetes using Pulumi. The problem? I have zero experience with Pulumi or TypeScript. I’m struggling to make progress, and the lack of support is making it worse. My senior is never available for calls or guidance. I think about quitting daily but don’t have any job offers lined up. I’m stuck and don’t know how to move forward. Should I quit, or is there another way to handle this?

https://redd.it/1l70hb0
@r_devops
Junior in DevOps learning

I've been on the DevOps team for 1 year and 6 months, and lately I've been given more responsibilities since I'm no longer a trainee, which is fair enough. But I've been feeling very overwhelmed. My team has reassured me and is supportive, but I wanted to know how I can accelerate my learning. I keep a doc of the errors and solutions I come across, I have recordings for when I need help, and I have my team. Is there anything else I can do?

When I asked my manager, he said there's nothing else I need to do and that he's fine with my progress so far, but I still feel something's amiss.

https://redd.it/1l75rfo
@r_devops
Anyone with experience comparing AWS and Oracle Cloud

Hello!
My team and I are currently exploring the possibility of switching from AWS to Oracle Cloud (OCI), and we have a few questions. We're specifically trying to compare the following services:

* EKS (AWS) vs OKE (OCI) for Kubernetes
* EC2 vs OCI Compute
* AWS Load Balancers vs OCI Load Balancer

We're especially interested in hearing about:

* Differences in performance and cost
* Ease of setup and day-to-day management
* Integration with other cloud services like IAM, autoscaling, monitoring, etc.
* Data transfer costs: this is a big concern for us. AWS charges for most outbound traffic, while OCI offers a free monthly bandwidth quota (like 10TB, depending on region).
* Any lessons learned or suggestions for switching from AWS to OCI

If anyone has experience working with both platforms, we’d really appreciate your insights. Thanks in advance!

https://redd.it/1l76bhe
@r_devops
Confusion on improving DevEx with platform engineering

Hey, so today we are using Terraform across our org (a lot of copy and paste, without centralized modules). We also have k8s and Argo CD. The problem today is that the process of creating new services and infra for developers is not entirely smooth or clear.

We've been tasked with improving this process and making it easier and faster for developers to self-serve what they need. I've been exploring whether things like Crossplane would make sense, but that has just left me even more unsure.

Any suggestions on what has worked for you guys would be appreciated. Things are so opinionated these days that I often just end up going in circles 😅

https://redd.it/1l77r04
@r_devops
Has anyone been able to programmatically grab the SHA256 file for Telegraf?

Hello,

This is a bit of a weird ask, but I'm trying to fully automate the updates of our Telegraf service on a Windows server, and Telegraf's SHA256 file is sitting behind a JavaScript button for some reason.

Has anyone been able to automate the download & verification of the newest telegraf SHA file? I've mostly got it, but the SHA file sitting behind a weird JS component is the one hitch in my steps.
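
For the verification half of this (not the JS-button part, which this does not solve), a minimal Python sketch could look like the following. It assumes you already have the installer on disk and the checksum text in hand, and that the checksum text is either a bare hex digest or the common `<hex>  <filename>` layout; that layout is an assumption, not something confirmed for Telegraf's file. The function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_text(text: str) -> str:
    """Take the first whitespace-separated token as the digest.

    Assumption: the text is '<hex>' or '<hex>  <filename>'.
    """
    return text.strip().split()[0].lower()


def verify_download(path: Path, checksum_text: str) -> bool:
    """True when the file's digest matches the published checksum."""
    return sha256_of_file(path) == parse_checksum_text(checksum_text)
```

If `verify_download` returns `False`, the update script should stop before touching the running service.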

https://redd.it/1l761cp
@r_devops
Rate My Idea !! A temporary app hosting service — just a resume project, not a startup

Hey everyone,

So I’ve been learning DevOps for a while now, and instead of just following tutorials or deploying sample apps, I thought of building something a bit more real-world.

The idea is pretty simple — a platform where anyone can deploy their GitHub project (frontend/backend) and host it temporarily for 1 day. After that, the app gets removed automatically.

Basically:

1. You give a GitHub link
2. Jenkins pulls it, builds it using Docker
3. It gets hosted on my server with a unique port or subdomain
4. You get the link via email
5. After 24 hours, the app is removed from the server

Only 4–5 apps will be live at a time, just to keep it manageable on my VPS. The main goal is to learn proper CI/CD, automation, container handling, cleanup scripts, and also make something that others can try out.
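
The 24-hour expiry and the cap on live apps could be sketched roughly like this in Python. It only covers the decision logic (what to remove, whether a new app fits), not the Docker or Jenkins calls. The `Deployment` type, its field names, and the cap of 5 are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_LIVE_APPS = 5                  # assumption: upper end of "4-5 apps"
APP_LIFETIME = timedelta(hours=24)


@dataclass(frozen=True)
class Deployment:
    name: str             # hypothetical container or subdomain name
    started_at: datetime  # timezone-aware start time


def expired(deployments, now):
    """Deployments that have been live for 24 hours or longer."""
    return [d for d in deployments if now - d.started_at >= APP_LIFETIME]


def can_accept_new(deployments, now):
    """True when fewer than MAX_LIVE_APPS unexpired apps are running."""
    live = [d for d in deployments if now - d.started_at < APP_LIFETIME]
    return len(live) < MAX_LIVE_APPS


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    apps = [Deployment("demo-app", now - timedelta(hours=30))]
    for app in expired(apps, now):
        print(f"would remove {app.name}")
```

A cron job or scheduled Jenkins job could call `expired` every few minutes and tear down whatever it returns.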

Not trying to launch a startup or anything — just a hands-on project to showcase on my resume and maybe help other devs who want a quick place to test or show their app.

I just want to know:

* Is this idea worth building?
* Any suggestions on what I can improve or add?
* Anything that could go wrong or that I should handle better?

Thanks in advance 🙏 Just trying to learn and build something useful for the dev community.

https://redd.it/1l7ahex
@r_devops
Anyone here tried Rafay’s GPU PaaS stack for managing AI infra?

Been seeing more mentions of Rafay's GPU PaaS push for AI workloads. Curious if anyone here has used their platform or evaluated it?

How does it stack up against Sagemaker or any other solution?

https://redd.it/1l7ddl6
@r_devops
AWS Cognito authentication with Keycloak as 3rd party IdP

Hi everyone, I'm not sure this is the right place to ask, but hopefully someone can lend a hand with suggestions on my current setup, which is rather rigid for this use case.


I am using AWS Cognito for authentication/authorization in our web application. But I noticed that the users all live in AWS, which is not a good way to manage them when our applications use Keycloak as the IdP. So I decided to integrate Keycloak as an external provider in AWS Cognito to see how it goes. So far the integration works and users can log in (in testing mode, with the default AWS login page).



But when I checked the user's ID token, I noticed it is missing several attributes that I need in order to put users into different groups in Cognito. I tried a pre token generation Lambda trigger to add the custom attribute to the ID token, but it did not work. First, the default ID token does not include the realm_role attribute that determines the user's role. Second, I could not create a custom field in the ID token no matter what I did with the example AWS provides. I am not sure whether this is an actual limitation of AWS Cognito with a third-party IdP setup.
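
For reference, here is a minimal Python sketch of a pre token generation handler using the V1 event shape (`claimsOverrideDetails`). It assumes the Keycloak role claim has been mapped, via the user pool's attribute mapping, to a custom attribute called `custom:realm_role`; that attribute name is hypothetical. It also assumes a multi-valued claim may arrive as a bracketed string like `[admin, viewer]`. Note that `groupsToOverride` only changes the `cognito:groups` claim in the issued token; it does not change stored group membership.

```python
# Hypothetical attribute name: assumes the Keycloak role claim is mapped
# to this custom attribute in the user pool's attribute mapping.
ROLE_ATTRIBUTE = "custom:realm_role"


def parse_roles(raw: str) -> list:
    """Split 'admin,viewer' or '[admin, viewer]' into clean role names."""
    return [r.strip() for r in raw.strip("[]").split(",") if r.strip()]


def lambda_handler(event, context):
    """Pre token generation trigger (V1 event shape)."""
    attributes = event["request"].get("userAttributes", {})
    roles = parse_roles(attributes.get(ROLE_ATTRIBUTE, ""))

    event["response"]["claimsOverrideDetails"] = {
        # V1 claim values must be strings, so the roles are joined.
        "claimsToAddOrOverride": {"realm_role": ",".join(roles)},
        # Replaces the cognito:groups claim in the token only.
        "groupOverrideDetails": {"groupsToOverride": roles},
    }
    return event
```

If `custom:realm_role` never shows up in `userAttributes`, the attribute mapping from the IdP is the first thing worth checking, since the trigger can only see what Cognito has already mapped.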


I am not sure if there is a direct solution to this issue. I have a workaround idea, but it sounds weird: make an API call to Keycloak to get each user's required attributes and dump them into an S3 bucket, then have a background job or event-driven trigger invoke a Lambda that updates the users' memberships and assigns them to different groups. It sounds silly, like a loop just to complete the task.
Has anyone encountered this issue before? What was your solution?

Thank you!

https://redd.it/1l7adu8
@r_devops
Logging Failed Writes/Reads in Redis (AWS Valkey cache)

We’re encountering issues in our Valkey cache where it sometimes doesn’t update. Is there a way to log the failed writes and reads? I tried checking CloudWatch, but it doesn’t have native metrics to catch these failures.

https://redd.it/1l7qe65
@r_devops
A Complete Load Testing Setup with k6 and Grafana

I recently put together a modern load testing setup using k6 to run tests, and Grafana to visualise the results, with GitHub Actions for automation.

In my guide, I use Grafana Cloud's Prometheus Remote Write to keep things simple, but you can easily plug in your own self-hosted Grafana + Prometheus stack.

The setup includes:

Running k6 on a lightweight EC2 instance
Pushing metrics to the Prometheus Remote Write endpoint
Visualising test results in Grafana dashboards
Automating test runs for multiple services via GitHub Actions

It’s a DevOps-friendly, repeatable approach that works for QA and engineering teams alike.

Full guide here (with code & workflows): https://medium.com/@prateekjain.dev/modern-load-testing-for-engineering-teams-with-k6-and-grafana-4214057dff65?sk=eacfbfbff10ed7feb24b7c97a3f72a93

https://redd.it/1l7pytd
@r_devops
Claude Code under root and without Docker — permission-bypass CLI wrapper

Hi all,



I’ve built a small CLI wrapper around Claude Code that allows you to bypass all the usual restrictions and run it in environments that normally wouldn’t allow it — like under root, without Docker, or offline.



Main features:



* Always enables --dangerously-skip-permissions
* Fakes getIsDocker() and hasInternetAccess() responses
* Works fine under root
* Can run in headless/server environments
* Simple alias (cl) for quick usage





I know it’s a simple workaround, but I couldn’t find a working solution anywhere, so I figured I’d just make one and share it.



Still rough around the edges, but works well in practice.



GitHub repo:

[https://github.com/gagarinyury/claude-code-root-runner](https://github.com/gagarinyury/claude-code-root-runner)



Would love feedback or ideas if you have any.

https://redd.it/1l7t7xp
@r_devops
Finally solved GNOME's annoying multi-monitor workspace problem (for me)



Been dealing with this for months on my 3-monitor setup. GNOME's workspace switching moves ALL monitors together, so when I switch contexts on my external displays, I lose my communication apps on the laptop screen. Drives me nuts.

Tried a bunch of existing extensions but nothing worked right. So I built my own.

**The fix:** Extension tracks which monitor your mouse is on. When you switch workspaces, only that monitor gets new content. The other monitors' windows automatically shift to keep everything in sync.

Example: I swipe left on my code monitor. My browser and terminal shift left too, but stay visible on their respective screens. No more losing Slack when I'm debugging.

**How it works:** Instead of blocking GNOME's workspace system (which breaks things), it works WITH it. Lets GNOME do the workspace change normally, then quickly moves windows around to maintain the illusion of per-monitor independence.
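
To illustrate the idea (this is a toy Python model, not the extension's actual code, which runs inside GNOME Shell): after GNOME switches every monitor by `delta` workspaces, windows on the monitors you were *not* on get moved by the same `delta`, so they stay on screen. The `Window` type and the clamping at the first/last workspace are assumptions for the sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Window:
    title: str
    monitor: int
    workspace: int


def compensate(windows, active_monitor, delta, workspace_count):
    """Window positions after a global workspace switch by `delta`.

    Windows on the active monitor stay where they are, so that monitor
    shows new content. Windows on every other monitor follow the switch,
    so they remain visible on their own screens.
    """
    moved = []
    for w in windows:
        if w.monitor == active_monitor:
            moved.append(w)
        else:
            # Assumption: clamp to the static workspace range.
            target = min(max(w.workspace + delta, 0), workspace_count - 1)
            moved.append(replace(w, workspace=target))
    return moved


if __name__ == "__main__":
    layout = [Window("editor", 0, 0), Window("slack", 1, 0)]
    for w in compensate(layout, active_monitor=0, delta=1, workspace_count=4):
        print(w)
```

In this model, Slack follows the switch to workspace 1 and stays visible, while the editor stays behind on workspace 0.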

**Gotchas:**

* Requires static workspaces (not dynamic)
* Brief window animation when switching - it's not native behavior
* Your windows are technically moving between workspaces constantly, but you don't really notice

Took way longer than expected because GNOME really wasn't designed for this. Had to try 3 different approaches before finding one that didn't crash the shell.

Code's on GitHub if anyone wants to try it or improve it: [https://github.com/devops-dude-dinodam/smart-workspace-manager](https://github.com/devops-dude-dinodam/smart-workspace-manager)

Works great for my workflow now. Laptop stays on comms, externals switch contexts independently. Finally feels like macOS did this right and Linux caught up.

Anyone else solved this differently? Always interested in other approaches.

https://redd.it/1l7uks8
@r_devops