Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
How do you keep track of all the changes in your deployments for audit or compliance checks?

With how fast deployments happen these days, especially in more agile or automated environments, keeping a clear, auditable trail of every single change feels like a constant battle. It's not just about knowing what changed, but who changed it, when, and why, especially when multiple teams are pushing updates continuously. That level of detail is crucial for security and compliance, but it often feels like you're trying to capture water.

The challenge really hits during an audit when you need to quickly pull up specific records or prove adherence to a standard, and the information is scattered across different tools, logs, or even mental notes. How do you manage to maintain a robust, easily auditable history of all your deployment changes without slowing down your release cycles? Thanks for any insights!
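One lightweight pattern that keeps up with fast release cycles is to have the pipeline itself emit the audit record: every deploy job appends a structured who/what/when/why line to an append-only log that later ships to immutable storage. A minimal sketch, assuming nothing about your CI system (the `CI_USER`, `GIT_COMMIT`, and `CHANGE_TICKET` variable names are placeholders to map onto whatever your runner actually exposes):

```python
import json
import os
from datetime import datetime, timezone

def audit_record(log_path: str = "deploy-audit.jsonl") -> dict:
    """Append one who/what/when/why record per deployment to an append-only log.

    The env-var names are illustrative; map them to whatever your CI system
    exposes, and ship the file to versioned object storage for audits.
    """
    record = {
        "when": datetime.now(timezone.utc).isoformat(),
        "who": os.environ.get("CI_USER", "unknown"),
        "what": os.environ.get("GIT_COMMIT", "unknown"),
        "why": os.environ.get("CHANGE_TICKET", "unspecified"),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Because every record is one JSON line, proving "who changed what and when" during an audit becomes a grep or a log-store query rather than a scavenger hunt across tools.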

https://redd.it/1lprkx4
@r_devops
What automation do you maintain manually because it keeps failing?

Our setup requires me to manually update config across 3 different web consoles whenever we deploy new services - the same 20 clicks every time, but the interfaces keep changing, so automation breaks constantly (I've tried).

Anyone else stuck doing repetitive console work because the tooling changes too fast for scripts to keep up? Could be AWS, monitoring tools, CI/CD platforms - anything where you know you should automate it but gave up after rebuilding the script.

What's one task you'd automate if it'd work reliably?

https://redd.it/1lptfbv
@r_devops
What DevOps Job Titles Really Mean

Here's my version, let's hear yours:

"DevOps Engineer" - need one person who can do everything, especially hand-holding our developers and making up for their inadequacies. We'll treat you with as much respect as we used to give Tech Support.
"SRE" - we had too many incidents, we need to productionize but we have no idea how.
"Cloud Engineer" - Terraform and a bit of pipelines, maybe some Ansible/Puppet/Chef.
"Platform Engineer" - Kubernetes admin.

https://redd.it/1lpvp4n
@r_devops
Feeling like an imposter in my Cloud Engineering internship - is my CompE degree a waste?

**TL;DR:** I'm a 22-year-old computer engineering student about to graduate. I've studied everything from transistors to software, but my cloud engineering internship feels completely different from my degree. I'm enjoying it but feel like a massive imposter. Looking for advice from the pros on how to build a solid career in this field and not get replaced by AI.

Hey r/devops,

I'm in a bit of a weird spot and could use some perspective from you seasoned veterans. I'm about to wrap up my computer engineering degree. My studies have been a deep dive, starting from the fundamentals of chip design and transistors and moving all the way up the stack to software development.

In this brutal tech job market, I feel incredibly fortunate to have landed a cloud engineering internship right before I graduate. The work is in AWS and Azure, and I'm getting my hands dirty with some cool stuff. I'm working with Infrastructure as Code (IaC) using Terraform, building out pipelines in Azure DevOps, and dealing with a lot of networking-related concepts so far. I also finished the Azure Fundamentals certification. To be honest, I'm starting to really enjoy it. The whole process of automating and managing infrastructure is fascinating.

Here's the thing, though: I have this nagging feeling of being an imposter. Almost nothing I'm doing on a daily basis directly relates to the low-level concepts I spent years learning in my degree. It feels like I'm operating at the highest level of abstraction, which is a world away from hardware design.

So, my question to all of you who have been in the game for a while is:

* **How can I leverage my computer engineering background to excel in a cloud/DevOps career?**
* **What should I be focusing on right now to build a successful and lasting career in this sector?**
* **How do I position myself to be one of the highly skilled workers and avoid the whole "AI is coming for our jobs" doom and gloom?**

Any advice or shared experiences would be hugely appreciated. Thanks in advance!

https://redd.it/1lpx02s
@r_devops
What social media-like apps/sites would you recommend for keeping up with the latest news in the bubble and also to broaden your knowledge on key systems

Just a disclaimer: I used the term social media-like because I prefer having a "feed" I can scroll with output from multiple people, instead of e.g. reading a blog written by a single person. But I'm also open to other ways of keeping up with news / deepening your knowledge.

Reddit is the most obvious answer, but even the home feed is saturated with a lot of fluff/memes/people with little to no technical knowledge/straight-up nonsense.

So I guess I'm looking for sources where the output comes from people with real credentials to talk about these things, or something along those lines.

I downloaded Substack yesterday, but for some reason my feed seems to be full of only far-right ideology and conspiracy theorists, along with dumb memes and TikToks, even though I subscribed only to IT-related fields.

So my question is: what do you use for daily reading/keeping up with things?

For background: I'm a freshly graduated network engineer currently being trained as a DevOps engineer, and I want to use some of my free time to learn useful stuff instead of browsing Reddit/IG/whatever and wasting my screen time on fluff.

https://redd.it/1lpyb6o
@r_devops
Easy SonarQube Continuous Integration

I have created a shell tool that simplifies code quality control with SonarQube; the goal is easy integration into a CI pipeline. There are two projects: one creates a custom SonarQube configuration (SONARSCRATCH) and the other is for the CI pipeline (SONARSCRATCH checker). Link: https://github.com/saidani-proj

https://redd.it/1lpysr2
@r_devops
Ass-and-a-half'ing it

We half-assed it the first time.

Then we realized we needed to full-ass it the second time.

So we ended up doing 1.5 asses worth of work. An ass and a half.

Maybe we should have just full-assed it the first time. Or maybe we got 0.6 asses of value from delivering the early version, so 1.5 asses of work is still a net gain. It can go either way, and sometimes 1.5 asses is the right amount of work, but it should be an intentional choice when we do it.

The thing to avoid is defaulting to half-assing it without a concrete value delivery to justify that decision. If we always half-ass it, then we're always signing up for 1.5 asses of work in the long run (at least) even when it doesn't bring us any extra value. That's how you end up delivering 33% less value over a quarter: one ass of value for 1.5 asses of effort is roughly a third less value per unit of work.

https://redd.it/1lq1r0d
@r_devops
DEVOPS GPT

Hi team,
Recently I noticed that ChatGPT has added a feature/plugin named "DevOps GPT". Do you think this will negatively affect the field?

https://redd.it/1lq3d7n
@r_devops
Email Tracking Pipeline Advice?

Hey folks 👋

Currently refining our email observability pipeline. We're using AWS SES → SNS → CloudWatch → Datadog, but as expected, the data is too high-level. We need to track and query metrics like opens, clicks, and bounces per subject and recipient, ideally monthly.

Pinpoint is off the table (deprecated + TF modules reject pinpoint_destination). I tried dashboards in Datadog via query filters, but can’t drill down to the email-level granularity we need.

GPT suggested a cleaner route:
SES → SNS → Lambda → Firehose → S3 → Athena + QuickSight/Grafana

I’m considering this, but before investing, I’m curious:

Anyone implemented something similar in production?

Is there a more Terraform-native or managed approach?

Any caveats with Athena on large-scale event logs?

Would love to hear your take or stack suggestions. Open to hybrid/cloud-native patterns.

Thanks in advance!
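For what it's worth, the Lambda step in the suggested route mostly just flattens each SES event into a row for Firehose/Athena. A rough sketch, assuming the standard SES event-publishing JSON shape (`eventType`, `mail.destination`, `mail.commonHeaders.subject`; verify against your configuration set's actual payloads), with the Firehose call itself omitted to keep it self-contained:

```python
import json

def flatten_ses_event(sns_record: dict) -> dict:
    """Flatten one SES event (delivered via SNS) into a queryable row."""
    event = json.loads(sns_record["Sns"]["Message"])
    mail = event.get("mail", {})
    return {
        "event_type": event.get("eventType"),  # Open, Click, Bounce, ...
        "timestamp": mail.get("timestamp"),
        "recipients": mail.get("destination", []),
        "subject": mail.get("commonHeaders", {}).get("subject"),
    }

def handler(event, context):
    # In the real Lambda you would batch these rows into
    # firehose.put_record_batch instead of returning them.
    return [flatten_ses_event(r) for r in event.get("Records", [])]
```

If you go this way, have Firehose write the records under date-partitioned S3 prefixes (e.g. `dt=2025-07-02/`) and declare those partitions in Athena; that bounds what each monthly query scans, which is the usual caveat with Athena on large event logs.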

https://redd.it/1lq4nsf
@r_devops
Moley: Open source CLI to expose local services using Cloudflare Tunnel & your domain name

Hey !

I'm sharing with you a small CLI tool I built for hackathons. Something I needed, and maybe others do too.

At ETH Prague, our deployed backend needed to call a service still running on my teammate’s laptop. He used ngrok — but on the free tier, the URL changed every reboot.

I had to constantly update env vars, redeploy, and retest. Super annoying, super stressful, even more so when we had to pitch.

So I built Moley: a small, no-infra CLI that lets you expose local services using Cloudflare Tunnels and your own domain name, with automatic DNS setup and cleanup.

It’s designed for people who already use Cloudflare to manage their domain — and want something simple and stable for sharing or deploying local apps.

👉 https://github.com/stupside/moley

# What it solves

* No more random URLs (like with the ngrok free tier)
* No more Nginx or reverse proxies
* No need for a public server
* You get clean URLs like `api.mydomain.dev`, instantly
* Works great for demos, APIs, webhooks, or internal tools
* Can even be used to deploy small apps without provisioning anything

# Key features

|Feature|Description|
|:-|:-|
|🔧 Tunnel Automation|Creates and cleans Cloudflare tunnels with one command|
|🌐 DNS Management|Sets subdomains via Cloudflare API|
|🧾 YAML Config|One file to define all your exposed services|
|💸 Free|Just needs a domain and a Cloudflare account|
|🚀 Zero Infra|No Nginx, no VPS, no dashboard, no headache|

# How it works (basic flow)

```shell
# Install cloudflared & authenticate
brew install cloudflare/cloudflare/cloudflared
cloudflared tunnel login

# Clone & build
git clone https://github.com/stupside/moley
cd moley
make build

# Set your Cloudflare API token
./moley config --cloudflare.token="your-token"

# Initialize config
./moley tunnel init

# Edit generated moley.yml
# (e.g. to expose localhost:3000 as api.mydomain.dev)

# Start tunnel
./moley tunnel run
```

When you stop the process, it automatically deletes the tunnel and DNS records.

# Status

* Fully working and tested in real hackathon scenarios
* ⚠️ No formal test suite yet — built it in 2 days because I needed it fast
* 🔐 Token is stored securely (never in source)
* 📦 Dependency-free, binary + YAML config

# Looking for feedback & contributors

It’s still early, but I’m using it regularly for hackathons and personal projects.

Would love feedback, issues, or PRs — especially for:

* Adding tests
* Improving usability / UX
* Supporting more config options
* Better docs or install flows

Thanks for checking it out 🙏

https://redd.it/1lq4xpp
@r_devops
Ways to get hands-on k8s experience as a manager?

I'm in a leadership role, and due to the timing of my promotion into management, I seem to have side-stepped the container revolution - I have 15 years in industry at pretty much all levels and all industries, but all from the old-school VM era. My current management role has been largely hands-off from tech - I've not raised a PR on production code for years.

I'm now in the situation where I have no direct hands-on exposure to Kubernetes, and it seems that pretty much all jobs these days need that - even management. It's not like I'm a luddite - I know kubectl and I'm able to have a conversation about it, but I seem to be just skimming the surface for recruiters. I've had some initial chats, but no actual interviews, always because I lack "hands-on" Kubernetes experience.

In terms of solutions - I'm out of ideas. My current job has no feasible work where using Kubernetes hands-on would be "in scope", as I'm basically just a people manager at this stage.

I'm happy to put the money and effort into taking the CKA on my own time if it would help - but it's an expensive bet to make.

Opinions welcome!

https://redd.it/1lq4d09
@r_devops
DevOps professionals - I need your insights!

Hi everyone ☺️ I'm a postgraduate student researching why DevOps adoption in large organisations (such as AWS, Microsoft, Google, Meta, etc.) sometimes fails to match the hype.
I call it the DevOps Implementation Paradox (DIP) framework: companies adopt DevOps for prestige or branding, but face real struggles with legacy systems, culture and leadership misalignment.
For research, I'm running a quick survey (anonymous) to capture real-world challenges and enablers from engineers, SREs, DevOps leads and anyone working within this field or with CI/CD pipelines.
Your input will help expose the gap between DevOps hype and practical reality 👏🏻 and will be used ethically in my dissertation.

If you've experienced DevOps wins, frustrations, or fake "DevOps theatre" at work, I'd greatly appreciate your insights 🙏🏻

Copy survey link here:
https://docs.google.com/forms/d/e/1FAIpQLSf17Bd_kAM7G7OTeGIdq5Vcy-uGWlJ3NNaj1qzqFLKBzxkvjw/viewform?usp=header

Thank you for helping bridge the DevOps reality gap! Happy to share final insights with anyone interested.

https://redd.it/1lq75sm
@r_devops
SRE Interview Coming Up – I’m Lost!

Hey everyone!

I have an upcoming interview for a Site Reliability Engineer (SRE) position, and honestly, I don’t have much background in this area (I interned as an SDET) and don’t have any formal work experience yet.

They sent me an email outlining the main components of the technical interview:

1. Applying algorithms, data structures, and computer science fundamentals
2. Explaining and implementing solutions in code without typical engineering aids (e.g., IDEs, online documentation)
3. Communication
4. Pace and speed

I'm wondering: is this all they will focus on? Am I not expected to know things like Kubernetes, AWS, CI/CD pipelines, or production logs, since none of that is on my resume?

I’d really appreciate any advice on how to prepare well for this interview.
Thank you! 🙏


https://redd.it/1lqa1br
@r_devops
How can I restrict access to a service connection in Azure DevOps to prevent misuse, while still allowing my team to deploy infrastructure using Bicep templates?

I have a team of four people, each working on a separate project. I've prepared a shared infrastructure-as-code template using Bicep, which they can reuse. The only thing they need to do is fill out a `parameters.json` file and create/run a pipeline that uses a service connection (an SPN with **Owner** rights on the subscription).

**Problem:**
Because the service connection grants Owner permissions, they could potentially write their own YAML pipelines with inline PowerShell/Bash and assign themselves or their Entra ID groups to resource groups they shouldn't have access to (say, team member A granting himself access to team member B's project, which can be sensitive, since both projects live in the same subscription). This is a serious security concern, and I want to prevent this kind of privilege escalation.

**Goal:**

* Prevent abuse of the service connection (e.g., RBAC assignments to unauthorized resources).
* Still allow team members to:
  * Access the shared Bicep templates in the repo.
  * Fill out their own `parameters.json` file.
  * Create and run pipelines to deploy infrastructure within their project boundaries.

**What’s the best practice to achieve this kind of balance between security and autonomy?**
Any guidance would be appreciated.
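One common mitigation (a sketch, not the only best practice) is to drop the shared subscription-Owner SPN entirely: give each team its own service connection whose SPN has Contributor, scoped to that team's resource group only, and lock each connection's pipeline permissions to the approved pipeline. Contributor cannot create role assignments, since Microsoft.Authorization writes are excluded from it, which blocks the escalation path described above. A hypothetical fragment of the shared template (the connection and resource-group names are placeholders):

```yaml
steps:
  - task: AzureCLI@2
    inputs:
      # 'team-a-deploy' is a service connection whose SPN has Contributor
      # scoped to team-a-rg only, instead of Owner on the subscription.
      azureSubscription: 'team-a-deploy'
      scriptType: bash
      scriptLocation: inlineScript
      inlineScript: |
        az deployment group create \
          --resource-group team-a-rg \
          --template-file main.bicep \
          --parameters @parameters.json
```

With this setup, a rogue inline script can still run, but only with Contributor rights inside its own resource group, so cross-team RBAC grants fail.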

https://redd.it/1lq6x4g
@r_devops
What are the best Continuous Delivery tools on the market today?

I'm looking for a great CD tool that automates various stages of the software delivery pipeline, such as building, testing, packaging, and deploying... What are y'all using these days?

https://redd.it/1lqd820
@r_devops
Splunk alerts are delayed by 15 minutes, so I started building a side project to fix it. Has anyone else done something similar?

I work in a regulated industry where fast production alerts are critical. Our team relies on Splunk, but over time it’s become so bloated that alerts can be delayed by 15 minutes. That delay has real consequences — our support team no longer trusts it.

Out of frustration, I started building my own real-time alerting system as a side project. I wanted something fast, lightweight, and self-hostable. It's still early, but I’ve already learned a lot (I even implemented passkey login recently just for fun).

I’m curious — have any of you built your own monitoring or alerting tool to replace bloated enterprise solutions like Splunk? What did you learn in the process?

Would love to hear your experiences. I'm trying to stick with this project long-term and keep improving it.

https://redd.it/1lqfv0t
@r_devops
Anyone working with MCP in VSCode for Kubernetes deployments?

I’m exploring the use of the MCP model in VSCode to streamline Kubernetes deployment workflows either by defining context-aware prompts or automating manifest generation. Curious if others are integrating MCP with Kubernetes or VSCode tasks. Any insights, repos, or use cases to share?

https://redd.it/1lqgrsm
@r_devops
How long did it take to finish KodeKloud DevOps roadmap as a beginner?

I’m a complete beginner starting the KodeKloud DevOps Engineer path.
How long did it take you guys to complete it?
And did you feel job-ready after finishing it?


https://redd.it/1lqj1is
@r_devops
Single pane of glass Observability MCP server( a Jarvis style AI assistant)

I'm excited to share a project I've been diligently working on this past month during my free time to help out #devops #sre folks who are always on call and "firefighting" incidents: an observability MCP server.

This MCP server, named Eagle-Eye, acts like a Jarvis-style assistant.
Eagle-Eye aims to streamline workflows for on-call #devops and #sre engineers by providing quick insights using the power of AI.

You can ask Eagle-Eye things like:
🔍 “Why is this Kubernetes pod crashing?”
📊 “What’s this Datadog alert about?”
🧑‍💻 “Who’s on call in PagerDuty?”
📈 “Can you explain this PromQL query?”

Eagle-Eye connects to systems using the MCP server, retrieves data, and uses AI to provide recommendations back to the user.

Currently integrated systems include:
* Kubernetes (k8s)
* PagerDuty
* Prometheus
* Datadog

…and more integrations are on the way!

It currently uses the Cursor IDE to interact with the MCP server, making it feel like you're chatting directly with your infrastructure.

Feel free to download the repo and add more integrations or update the code — it’s completely open source. The idea, as I mentioned, is to have a single-pane-of-glass tool that helps DevOps, SREs, or on-call folks.

I’ve attached some snapshots inside the repo for quick reference.

Here's the link to the repo: https://github.com/neeltom92/eagle-eye-mcp/blob/main/README.md

In my next post, I plan to share how I leveraged Facebook’s Prophet forecasting library and time-series metrics from Datadog to build an MCP server that does infrastructure capacity planning at scale.
Imagine a tool that could help predict traffic patterns on CPU, memory, HPA, and more — perfect for handling spikes during Black Friday sales or marketing campaigns.
Excited to keep building and sharing!

#mcp #server #ai #observability #devops #sre

https://redd.it/1lqkgxm
@r_devops
Cloud to Local Server - Should we do Openstack?

Hi,

I work at a startup with a small platform team, currently running on AWS. We rely on AWS mostly for Aurora MySQL, EKS, and load balancers. We also have Site-to-Site VPNs and Direct Connect links, but they are confined to the higher environments. We use Kafka for queues, but we manage it ourselves with a Strimzi cluster inside EKS. Similarly, we manage our own observability and SIEM solutions deployed in the EKS cluster.

Recently we have been contemplating moving our lower test environments out of the cloud to save a few thousand dollars a month. Our customers would also be happy at the end of the day, since we usually pass the cloud bill on to them. So I'm stuck on the questions below:

1. If we were to do this and move the lower environments out of the cloud:
   1. Should we look at solutions like OpenStack? We would want an exact replica of the environment we have in AWS, so that devs get the same environment and platform-related bugs surface for everyone. Or will this overcomplicate things for us?
   2. Or, instead of OpenStack, should we deploy our own Kubernetes cluster and MySQL and manage the rest of the stack the way we already do in AWS?
2. Should we skip bare metal entirely and instead move the lower environments to cheaper clouds like DigitalOcean?
3. Should we even do this? Are the cost savings worth the effort the platform team puts into managing multiple cloud/bare-metal environments? Currently we pay around 3-5k USD per month in AWS costs for the test environment per customer.

PS: We are a team of 4 engineers who manage DevOps, cloud, DB management, Kafka automation frameworks, observability, and SIEM.

Thanks in advance for your insights.

https://redd.it/1lqln1d
@r_devops
How do you identify new attack vectors that target your cloud setup?

Cloud security is a whole different beast compared to on-prem, isn't it? It feels like you're constantly trying to keep up with new services, features, and configurations across multiple accounts or even different providers. The sheer scale and rapid pace of change can make it incredibly difficult to ensure every corner of your environment is locked down and compliant, leading to that nagging feeling that something might be overlooked.

Whether it's managing endless IAM policies, keeping tabs on configuration drift, or just getting a truly unified view of your risks, there's always something that feels like an uphill battle. What's the one aspect of cloud security posture management that consistently gives you the biggest headache? Appreciate any insights you can share!


https://redd.it/1lqlymi
@r_devops