Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
source code management for aws instances

Hello, I'm a junior backend developer and I recently joined a company. So far my tasks have just been updating the DB and creating APIs for mobile. Now I'm trying to learn how to manage source code for the prod, development, and UAT servers that are hosted on AWS instances. I've tried reading about version control with Git, but I still don't have a clear picture of how to do it. I asked AI and so on, but I'm still missing some pieces related to SCM on AWS instances. Does anyone have documentation related to this, or any experience with it?

thank you so much

https://redd.it/1kq3nu1
@r_devops
Tracking your AI Agents

We built AgentWatch, an open-source tool to track and understand AI agents.

It logs agents' actions and interactions and gives you a clear view of their behavior. It works across different platforms and frameworks. It's useful if you're building or testing agents and want visibility.

https://github.com/cyberark/agentwatch

Everyone can use it.

https://redd.it/1kq6iju
@r_devops
I have a question about DNS configuration

I deployed my web app using Render. I am using NameBright for my domain name. I usually just deal with name servers, but Render gave me an A record and a CNAME record.

My DNS configuration is below, and I deleted the default NameBright name servers. That was last night, and DNS Checker still shows it hasn’t propagated. Is my configuration correct, assuming that it matches what Render gave me?

Configuration:
A: subdomain = @ | ip address = 216.24.57.1
CNAME: subdomain = www | CNAME = trendy-wqzi.onrender.com
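
A quick way to see what your own resolver currently returns for the A record is sketched below. This is only a minimal illustration using Python's standard library; `example.com` in the usage comment is a placeholder because the post does not name the actual domain, and the `resolver` parameter is a hypothetical hook added so the check can be exercised without network access.

```python
import socket
from typing import Callable, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the local resolver currently sees for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return set()  # the name does not resolve (yet)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return {info[4][0] for info in infos}


def a_record_matches(
    hostname: str,
    expected_ip: str,
    resolver: Callable[[str], Set[str]] = resolve_ipv4,
) -> bool:
    """True if expected_ip is among the addresses returned for hostname."""
    return expected_ip in resolver(hostname)


# Usage (placeholder domain, replace with the real apex domain):
#   a_record_matches("example.com", "216.24.57.1")
```

This only reflects the view of the resolver on the machine running it, so a result that differs from DNS Checker is possible while caches expire.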

https://redd.it/1kq86ld
@r_devops
Is DORA Enough? What We Learned After Building Full-Stack Continuous Delivery

What's your north star as a DevOps engineer?

Has anyone here built out full-stack continuous delivery and started measuring more than just DORA metrics? Does this matter to you? If not this then how do you make sure you align to what the business needs?

We’ve been deep in this space, trying to solve the real delivery pain: fragmented pipelines, duplicated logic across tools, and constant drift between environments. So we built a platform, not to replace CI/CD, but to make it actually work end to end. It covers everything from infrastructure provisioning to Kubernetes-native application deployment, with tooling and observability wired in automatically. I believe the key point here is to have CD that works the same way, without changes, for local development on a dev laptop as it does for our huge cloud Kubernetes clusters.

The flow starts with GitLab CI triggering a call to our platform’s API. That API handles a global spec for the environment, selects the appropriate delivery path, and renders validated Helm values for the workload. It then hands it off to ArgoCD, which manages the sync into Kubernetes. From there, everything lands in a unified state: infrastructure, core tools, and apps deployed and monitored together.
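
The "global spec plus validated Helm values" step could look roughly like the sketch below. The post does not show the platform's API, so everything here is hypothetical: the function names, the `REQUIRED_KEYS` tuple, and the shape of the spec are illustrative assumptions, not the authors' implementation.

```python
import copy

# Hypothetical: keys every rendered values file must contain before hand-off.
REQUIRED_KEYS = ("image", "replicas")


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def render_values(global_spec: dict, workload: dict) -> dict:
    """Combine environment-wide defaults with workload settings and validate the result."""
    values = deep_merge(global_spec, workload)
    missing = [key for key in REQUIRED_KEYS if key not in values]
    if missing:
        raise ValueError(f"missing required Helm values: {missing}")
    return values


global_spec = {"replicas": 1, "resources": {"cpu": "100m", "memory": "128Mi"}}
workload = {"image": "app:1.2.3", "resources": {"cpu": "500m"}}
values = render_values(global_spec, workload)
```

Validating before the hand-off means a bad spec fails in the pipeline rather than during the sync into the cluster.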

All tools are deployed Kubernetes-first, using native patterns: Helm charts, CRDs, secrets via External Secrets, persistent volumes via CSI, and Git-based configuration. The environment comes up with everything pre-integrated, nothing glued together post-deploy.

Our base platform includes OpenTelemetry for tracing, OpenSearch for logs, PostgreSQL instances pre-wired into services, Sentry for error monitoring, and NATS as an internal event bus for inter-service communication and platform signaling. Debugging is no longer jumping across five tools—our platform gives full visibility across deployment layers, from Helm history to K8s runtime status to distributed traces.

The biggest shift has been in reliability. Before, we’d see around five broken deployments per feature branch, mostly due to differences between staging and prod. Now, with delivery flows and environments standardized, we’re down to about one failed deployment in every fifty commits—and most of those are app logic issues, not infrastructure or delivery bugs.

We still track DORA (lead time, deployment frequency, change failure rate, time to restore), but those metrics alone aren’t cutting it anymore. They don’t reflect time lost in debugging pipelines, investigating drift, or recovering from partial failures when infra and app deploys go out of sync.
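
For reference, the "one failed deployment in every fifty commits" figure above corresponds to a failure rate of 2%. A minimal sketch of that arithmetic follows, assuming one deployment per commit (the post does not say this explicitly) and using a hypothetical `Deployment` record.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Deployment:
    commit: str
    succeeded: bool


def change_failure_rate(deployments: List[Deployment]) -> float:
    """Fraction of deployments that failed; 0.0 when there is no history."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if not d.succeeded)
    return failed / len(deployments)


# Fifty deployments, exactly one of which failed (illustrative data only).
history = [Deployment(commit=f"c{i}", succeeded=(i != 0)) for i in range(50)]
rate = change_failure_rate(history)
print(f"{rate:.0%}")  # 2%
```

Note that this counts any failed deployment, which is broader than a rate limited to changes that cause a failure in production.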

Curious if others here are building similar full-stack delivery systems, or tracking alternative metrics that get closer to real delivery friction.
How are you quantifying the quality of delivery?

Is DORA enough, or are there better ways to measure what's actually slowing us down?

https://redd.it/1kq7m3p
@r_devops
How do you manage hybrid clouds?

If you have some servers in the cloud and some in your local infra, how do you manage the connections between them?

I'm thinking of using a VPN, but I'm sure I can do something better with Google Cloud.

https://redd.it/1kq9e4k
@r_devops
Bohr Model of Atom Animations Using HTML, CSS and JavaScript - JV Codes 2025

Bohr Model of Atom Animations: Science is enjoyable when you get to see how different things operate. The Bohr model explains how atoms are built. What if you could observe atoms moving and spinning in your web browser?

In this article, we will design Bohr model animations using HTML, CSS, and JavaScript. They are user-friendly, quick to respond, and ideal for students, teachers, and science fans.

You will also receive the source code for every atom.

# Bohr Model of Atom Animations

# Bohr Model of Hydrogen

1. Bohr Model of Hydrogen
2. Bohr Model of Helium
3. Bohr Model of Lithium
4. Bohr Model of Beryllium
5. Bohr Model of Boron
6. Bohr Model of Carbon
7. Bohr Model of Nitrogen
8. Bohr Model of Oxygen
9. Bohr Model of Fluorine
10. Bohr Model of Neon
11. Bohr Model of Sodium

You can download the codes and share them with your friends.

Let’s make atoms come alive!

Stay tuned for more science animations!


https://redd.it/1kqcqhf
@r_devops
After 24 years in IT, I'm done.

I don't want to debug another fucking YAML file.

This is not how I foresee spending my life.

Thank you.



https://redd.it/1kqe912
@r_devops
We’re Part of the Founding Engineering Team at groundcover!

Hey 👋 We’re here to chat about all things cloud-native observability! This post will run from May 19-23, so jump in and ask away. No topic is off-limits.

# Who We Are

We’re part of the founding engineering team at groundcover, building a modern, cloud-native observability platform that’s redefining how teams monitor and troubleshoot applications in Kubernetes environments.

Our engineering efforts focus on:

Building a high-performance, low-overhead observability tool powered by eBPF
Leveraging a unique Bring Your Own Cloud (BYOC) architecture to shift-left costs and privacy with no infrastructure markups
Tackling real-world troubleshooting challenges in large-scale, distributed cloud environments
Making observability fast, accessible, and seamless — for managed and self-hosted cloud environments
Developing zero-instrumentation solutions to give engineers immediate, out-of-box actionable insights

We also run an active [Slack community](https://app.slack.com/huddle/T03ELGQ5J2W/C03ELGQ6Y2E) and updated [Docs](https://docs.groundcover.com/) for devs, SREs, and cloud enthusiasts to discuss cloud monitoring, eBPF, OpenTelemetry, and more. Feel free to join!

\--

About Us

Noam Levy, Field CTO @groundcover
I’m a Field CTO and part of groundcover’s founding engineering team. For the past decade, I’ve led engineering groups focused on building microservices-based web applications, optimizing complex application pipelines, and tackling system engineering challenges at scale.

Aviv Zohari, Field CTO @groundcover
I’m a Field CTO and founding engineer at groundcover, where I work on eBPF-based observability solutions. My passion lies in deeply understanding how software systems behave in the wild and designing tools that make monitoring them simple and efficient. Previously, I worked as a security researcher breaking weird machines for a living.

\---

# What We'll Cover

We’re here to talk about the cloud monitoring and observability landscape, including:

Exploring the power of eBPF in Kubernetes
Kubernetes troubleshooting: how to fix common issues
Troubleshooting cloud-native apps, including the most frequent errors
Next-gen microservice architecture trends
On-prem observability considerations
BYOC (Bring Your Own Cloud) — what it means and when it makes sense
OpenTelemetry and eBPF: everything you need to know
AI Agents and Observability — what’s coming next
OpenTelemetry: benefits, challenges, and best practices

…and anything else you’d like to throw at us!

We’ll help unpack the most interesting observability trends, tradeoffs, and challenges in 2025, and share what we’re seeing out there in the wild.

Let’s dive into your questions!



https://redd.it/1kqebyj
@r_devops
What’s your “I’m definitely a cloud person now” moment?

For me, it was when I caught myself saying things like “I’ll just spin up an environment real quick” while making coffee at 7am.

Or the time I set lifecycle rules for my personal Google Drive after spending a week with S3 policies 😂

It’s weird how cloud thinking just... seeps into your brain.
What was your moment?
When did you realize cloud had officially taken over your brain?

https://redd.it/1kqgyfl
@r_devops
Read-only Fridays led to creating Neofetch for Terraform

My boss advocates for dedicating specific hours each week to learning new, fulfilling, and interesting topics. We’ve implemented read-only Fridays, where we allocate a few hours in the morning or afternoon to acquire new skills that pique our interest. Personally, I’ve been on a side quest to enhance my Go skills. So this past Friday, I decided to experiment with a seemingly useless but enjoyable tool to add some flair to our infra repositories. It’s called Terrafetch (Neofetch for Terraform), which implements a straightforward terminal interface that provides statistics on various aspects of our infrastructure, including variables, outputs, providers, modules, and documentation. I highly recommend adopting a similar structure where team members can allocate time for learning. It keeps things fresh and spicy. If you’re interested in Terrafetch, here’s the repository.

https://redd.it/1kqdr38
@r_devops
The DevOps Skills Score Card

I've been doing some hard-core skill analysis and made this to help me find my weak spots.

Figured I should go ahead and share it. Let me know what you think!

https://docs.google.com/spreadsheets/d/1QT2iUlLlt9R44U4lsTL0u5rOC_Cr_zuYLYAazp-2oA8/edit?usp=sharing

edit: lol, I misspelled score card.. whatever, I'm keeping it.

https://redd.it/1kqj76j
@r_devops
Task executor with "friendly" UI

We have automations all over the place and we're looking into centralizing them into a single tool. The points we're trying to hit: HA (if it's self-hosted); if it's cloud-hosted, an agent or some other way to run scripts inside our network so we can run them on-prem; SSO/SAML with RBAC; the ability to run Python with libraries, etc.; a REST API so we can start jobs remotely; notifications if something went wrong; and so on. While this would be for us, I would love it if there was a non-scary UI so internal people can run jobs.

I've been casually looking for a month, and it looks like there are three categories: "holy hell, there goes my kidney" (e.g. runbook/process automation that has a yearly fee and per-user licensing); low-code solutions that are consumption-based and that I'm not confident will work with much of the custom logic we'd want (e.g. Azure Logic Apps, n8n) [we have MSSQL and use dynamic ports, so all those "query MSSQL" actions? Yeah, those don't work]; and on-prem solutions that miss one or more of the major points: Argo Workflows [worried it's complex enough to build an automation in that people won't use it, compared to AWS Lambda], AWX [locks us into Ansible], Jenkins [technically does everything, but we're actively trying to kill these off, so I don't want to make another one if possible], Rundeck [no HA; SSO if one is willing to hack it a bit, but I don't want to rely on hacking things together].

We have budget, but I don't have $25K/yr plus more for users. I'm leery of consumption-based pricing because I'd want to move the monitors we have, which trigger every minute or two, into that system. Is there something you guys have used that fits this, or am I being unrealistic?

https://redd.it/1kqtbno
@r_devops
Career Advice

So I am in IT and having a hard time choosing a major to focus on. I am currently trying to focus on cloud and Unix, because cloud (Azure) is really in demand in Canada and Unix is my strongest area since I have spent more time on it. So I am choosing both, which are essential for DevOps. Is this good? I hate networking, and cybersecurity is secondary.

https://redd.it/1kqsmeh
@r_devops
CS grad who interned as a network engineer looking for next step

Hi, I just graduated a couple of weeks ago and am now trying to continue learning as I apply for jobs. My goal is to work in the cloud engineering or DevOps space, and right now I want to learn more about DevOps. In my capstone we worked with Azure DevOps for version control, and I interned as a network engineer last summer. (I'm applying for everything from developer to network to data science type roles, but I believe my desired field is DevOps, as I feel it incorporates a lot of what I've learned rather than being hyper-focused.)

Right now I'm considering either purchasing Continuous Delivery by Jez Humble, jumping straight into building a beginner/intermediate CI/CD pipeline following a tutorial, or doing one of those freeCodeCamp DevOps programs, focusing on what I don't know.

Any recommendations on what my best use of time would be?

https://redd.it/1kqrk79
@r_devops
How to interview experienced people?

I have to interview people with 3-4YOE.

What should I ask them? Should I ask targeted questions on the things we use, questions that one should be able to answer if they have really used the tools?

Like IAM policies and cross account access, S3 resource policies, etc. And Ansible or Terraform basics like commands, underlying logic, etc.

And what should I ask them on Kubernetes? How to judge someone and send them to the next round?

The real challenge is when a candidate's resume mentions things that I have zero idea about. How should I question such a candidate and judge their technical skills?

https://redd.it/1kqwxc9
@r_devops
I made a TUI for OpenTofu (Terraform) provider registry

If you're like me, when developing terraform code, you often switch to your browser and then google "terraform aws provider" or "terraform github provider" to browse available resources, their documentation, versions etc. I hated that workflow and decided to fix it by creating a TUI that interacts with OpenTofu registry API (still compatible with Terraform). Now whether you are a VIM, VSCode or IntelliJ user, you can use the terminal that's always nearby to look up exactly what you need.

GitHub: https://github.com/djetelina/tofuref
PyPI: https://pypi.org/project/tofuref/


Any feedback and suggestions are appreciated. While I was content enough with the current state to release it as 1.0, I'm sure there's more this tool could do :)

https://redd.it/1kqynmk
@r_devops
Feeling lost - don't know what to do with my career

Hi guys,
I am writing this post because I am lost about what to do with my career.

Small background:
I am 23, and 3 years ago, just after my first year at university, I started an internship at a big company, as I wanted to quickly gain some experience and internships at my college are obligatory anyway (studying Telecommunication Engineering/CS).
As I was really devoted to the internship (Python developer), I took every extra task possible and tried to help with every interesting topic in sight, got very positive feedback, and stayed on.
With time my job quickly gravitated towards DevOps, with more responsibilities, while I was still studying full time.

And here I am, after 3 years of studying full time, logging in to dailies and meetings in the breaks between lectures, and spending all my spare time doing homework after work or doing work after a day at university.
I barely finished my degree, after extending it by half a year.
Now, after pursuing my master's for half a year, I will probably have to start it again, as I have already failed most of my exams.
Things which used to be fun are now only a chore; I have to force myself to study anything after 8 hours at work. Even things that used to interest me.

Now I am staring at another failed Terraform pipeline, wondering how I ended up here. Something that was supposed to be a quick internship ended up being a full-time career.
But here is a trap which I don't know how to deal with: the job is well paid, much more than what any of my colleagues from uni make, the team is fine, and I am really appreciated here. The problem is, I don't really like this kind of job. I always wanted to do something more "interesting", and this job is quite frustrating (continuous debugging, fixing pipelines, and waiting ages for someone to finish their tasks to unblock me (big company)).

I am feeling lost with next steps:

1. Taking some loooong break and focusing on uni.
2. Trying to focus on the job, hoping it will get better with more free time (but I am not sure if I will ever go for a master's degree if I skip it now...). Maybe DevOps isn't that bad and I will regret changing careers in the future?
3. Trying to join a company focused on my interests (space exploration, also programming), where I have finished the first interview rounds and am waiting for a decision. The catch is, it pays half the salary I make here.

https://redd.it/1kr0a51
@r_devops
Similar to cold start problem

My Spring Boot application takes 120s to start when a new pod is spawned in the Kubernetes cluster.

So I have to include a readiness probe, which is slowing down the load testing.

Am I missing something here? Can the Spring application startup happen beforehand?
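
For slow-starting containers, Kubernetes also offers a `startupProbe`, which holds off the liveness and readiness probes until it succeeds and allows up to `failureThreshold * periodSeconds` for startup. The sketch below only works out those numbers for a 120s start; the endpoint path, port 8080, and the 150s budget are assumptions for illustration, and this does not make the application itself start any faster.

```python
import math


def startup_probe(path: str, port: int, max_startup_s: int, period_s: int = 5) -> dict:
    """Build a startupProbe spec that tolerates up to max_startup_s of startup time."""
    # The probe gives up after failureThreshold consecutive failures,
    # so the startup budget is failureThreshold * periodSeconds.
    failure_threshold = math.ceil(max_startup_s / period_s)
    return {
        "httpGet": {"path": path, "port": port},
        "periodSeconds": period_s,
        "failureThreshold": failure_threshold,
    }


# Assumed values: Spring Boot Actuator readiness endpoint, port 8080,
# and a 150s budget to leave headroom over the observed 120s start.
probe = startup_probe("/actuator/health/readiness", 8080, max_startup_s=150)
budget_s = probe["periodSeconds"] * probe["failureThreshold"]
```

With the startup window handled separately, the readiness probe can use a short period without having to cover the full startup time.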

https://redd.it/1kqxx9u
@r_devops
Built something to monitor and forecast API usage across providers like OpenAI — curious if other DevOps folks face this pain

Hey all,

I’ve been working on a side project to deal with a challenge I ran into while building with LLM APIs — tracking and forecasting usage across providers like OpenAI and Anthropic. Especially when running workloads at scale, it’s easy to lose visibility into token consumption, cost spikes, or quota limits.

The tool I’m building:
• Monitors real-time usage (tokens, credits, endpoint data)
• Alerts when you hit certain thresholds (like 80% of quota)
• Forecasts future usage based on historical trends
• And checks if providers are up/down before your workflows break
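
As a rough illustration of the threshold alert and trend forecast described above, here is a minimal sketch. It is not the tool's implementation: the function names are hypothetical, and a straight-line least-squares fit is just one simple way to forecast from historical usage.

```python
from typing import List


def quota_alert(used: float, quota: float, threshold: float = 0.8) -> bool:
    """True once usage reaches the given fraction of the quota (80% by default)."""
    return quota > 0 and used / quota >= threshold


def forecast_linear(daily_usage: List[float], days_ahead: int) -> float:
    """Fit a least-squares line to (day index, usage) and extrapolate days_ahead
    days past the last observation."""
    n = len(daily_usage)
    if n == 0:
        raise ValueError("need at least one observation")
    if n == 1:
        return daily_usage[0]  # no trend can be estimated from a single point
    mean_x = (n - 1) / 2
    mean_y = sum(daily_usage) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_usage))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + days_ahead)


# Illustrative numbers only: token counts for four consecutive days.
usage = [100.0, 200.0, 300.0, 400.0]
projected = forecast_linear(usage, days_ahead=2)  # 600.0
```

A linear fit ignores weekly cycles and sudden spikes, so in practice it is a baseline rather than a reliable predictor.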

Would love to know:
Do any of you manage LLM or third-party API usage this way?
What tooling do you use today to keep track of spend and reliability?

Not trying to pitch anything — just genuinely curious how others are solving this in a DevOps environment, especially when infra teams are told to “make sure OpenAI doesn’t break production” 🙃

If you’re interested, I’m happy to share a link in the comments so you can try it out and give feedback. Thanks!

https://redd.it/1kr222o
@r_devops
Anybody here built their own K8s operator? If so, what was the use case?

I’m trying to expand my K8s knowledge and Go skills by figuring out some good use cases for creating my own operator.

So far, the only thing I could come up with is an operator that analyzes cluster event logs and offers up a report for security improvements, leveraging an AI API.

I would like to find something a bit more practical though.

https://redd.it/1kr2twg
@r_devops