Reddit DevOps

OpsGenie shutting down, PagerDuty or Rootly?

I sure as hell will not switch my entire workflow / ticketing system over to Atlassian LOL. but i get it, most companies they're targeting probably already have Atlassian contracts.

Stuff I need:

- integrations with ASPM / DSPM (crowdstrike/groundcover). i'm not writing lambda functions to convert one alert into another.

- not getting charged an arm and a leg for phone calls

- slack integration would be a massive plus.

- good team modelling.

- different on-call schedules and overrides. if it can integrate with an HR management system, that'd save me so much time LOL

- don't really care about the UI much, hopefully i don't have to log in more than a few times a month

pricing: obviously, the cheaper the better.

looks like both have "easy" migration, where they'll do it for us

thoughts?

https://redd.it/1lkcjxg
@r_devops

From the devops community on Reddit

Explore this post and more from the devops community

9 views18:28

Reddit DevOps

Azure - VMSS undergoing maintenance.

Anyone else seeing this over and over today? I'm in Central US and all my VMSSs have been going into maintenance on and off for the last few hours.

https://redd.it/1lkc9m1
@r_devops

8 views19:28

Reddit DevOps

I have an interview for a Junior DevOps engineer position at EY, what to expect in interview?

So the interview is supposed to be strictly 30 minutes. My guess is it will mostly be behavioral questions about my background. Does anyone have any experience with this? It's with the IT Risk and Compliance Team.

https://redd.it/1lkfy4z
@r_devops

9 views20:28

Reddit DevOps

what is the best way to learn helm charts?

i have completed a helm charts course on cloud guru and i feel like i get the concept well enough, but i wouldn't know where to even begin if i had to actually develop a helm chart for an application without using the public repo. which sucks, because i have been tasked to do exactly that at work.

to those who are proficient at Helm, what was your learning method? how did you go from watching or reading about it to actually developing working charts?

https://redd.it/1lkh9zi
@r_devops

9 views21:28

Reddit DevOps

study course or book to learn DevOps from zero to hero

I was googling, and there are so many offerings for learning devops that i wanted to come on here and ask what the preferred way to start my journey is.

my background is network engineering. i have used ansible and the netmiko python library to run simple repetitive tasks like backing up configs on network gear.

thanks

https://redd.it/1lki900
@r_devops

11 views22:28

Reddit DevOps

Transitioning from Cybersecurity to Cloud Architecture — Advice Welcome

I recently transitioned from a cybersecurity role into a cloud architect position and received a $17K raise—bringing my total comp to $115K. I’ve got around 3 years of experience, hold a master’s degree, and currently work as a Lead Associate with a TS/SCI clearance.

That said… I can’t shake the feeling that I’m still underpaid given my background, skills, and clearance. I'm looking ahead and trying to figure out what’s next in my journey.

Reddit—has anyone made a similar leap or been in this position before? What advice would you give someone trying to level up from here?

https://redd.it/1lkllx0
@r_devops

9 views00:28

Reddit DevOps

Introducing DockedUp: A Live, Interactive Docker Dashboard in Your Terminal 🐳

Hello r/devops!

I’ve been working on DockedUp, a CLI tool that makes monitoring Docker containers easier and more intuitive. If you’re tired of juggling docker ps, docker stats, and switching terminals to check logs or restart containers, this might be for you!

## What My Project Does
DockedUp is a real-time, interactive dashboard that displays your Docker containers’ status, health, CPU, and memory usage in a clean, color-coded terminal view. It automatically groups containers by docker-compose projects and uses emojis to make status (Up 🟢, Down 🔴) and health (Healthy ✅, Unhealthy ⚠️) instantly clear. Navigate containers with arrow keys and use hotkeys to:
- l: View live logs
- r: Restart a container
- x: Stop a container
- s: Open a shell inside a container
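
For readers wondering how hotkeys like these could map onto the plain Docker CLI, here is a minimal Python sketch. It is not DockedUp's actual implementation: the `HOTKEY_COMMANDS` table, the `command_for_hotkey` helper, the `--tail 100` limit, and the choice of `/bin/sh` as the shell are all assumptions made for illustration.

```python
import shlex
from typing import Dict, List

# Hypothetical mapping from the hotkeys listed above to docker CLI arguments.
# Illustration only; DockedUp itself may work quite differently.
HOTKEY_COMMANDS: Dict[str, List[str]] = {
    "l": ["docker", "logs", "--follow", "--tail", "100"],  # live logs
    "r": ["docker", "restart"],                            # restart container
    "x": ["docker", "stop"],                               # stop container
    "s": ["docker", "exec", "-it"],                        # interactive shell
}


def command_for_hotkey(key: str, container: str) -> List[str]:
    """Return the docker argv for a hotkey; raises KeyError if the key is unbound."""
    argv = HOTKEY_COMMANDS[key] + [container]
    if key == "s":
        # Assumption: the image ships /bin/sh (minimal images often lack bash).
        argv.append("/bin/sh")
    return argv


if __name__ == "__main__":
    # Print the command each hotkey would run for a container named "web-1".
    for key in "lrxs":
        print(key, "->", shlex.join(command_for_hotkey(key, "web-1")))
```

Returning an argv list rather than a shell string means the result can be handed to `subprocess.run` without quoting issues.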

## Target Audience
DockedUp is designed for developers and DevOps engineers who work with Docker containers and want a quick, unified view of their environment without leaving the terminal. It’s ideal for those managing docker-compose stacks in development or small-scale production setups. Whether you’re a Python enthusiast, a CLI lover, or a DevOps pro looking to streamline workflows, DockedUp is built to save you time and hassle.

## Comparison
Unlike docker ps and docker stats, which require multiple commands and terminal switching, DockedUp offers a single, live-updating dashboard with interactive controls. Compared to tools like Portainer (web-based) or lazydocker (another CLI), DockedUp is lightweight, focuses on docker-compose project grouping, and integrates emoji-based visual cues for quick status checks. It’s Python-based, easy to install via PyPI, and doesn’t need a web server, making it a great fit for terminal-centric workflows.

## Try It Out
It’s on PyPI and takes one command to install (I recommend pipx for CLI tools):

pipx install dockedup

Or:

pip install dockedup

Then run dockedup to start the monitor. Check out the GitHub repo for more details and setup instructions. If you like the project, I’d really appreciate a ⭐ on GitHub to help spread the word!

## Feedback Wanted!
I’d love to hear your thoughts—any features you’d like to see or issues you run into? Contributions are welcome (it’s MIT-licensed).

What’s your go-to way to monitor Docker containers?

Thanks for checking it out! 🚀

https://redd.it/1lkmf9d
@r_devops

GitHub

GitHub - anilrajrimal1/dockedup: A real-time, interactive CLI dashboard for monitoring Docker containers. View status, health,…

A real-time, interactive CLI dashboard for monitoring Docker containers. View status, health, CPU, and memory usage with a clean, color-coded interface. Supports docker-compose grouping and hotkeys...

11 views01:28

Reddit DevOps

SysDE at AWS worth it?

I'm in an interview loop with AWS for the Systems Development Engineer role building a new region.

My current experience is mainly in AWS, K8s, Python & Shell. The learning opportunities in my current role are great, despite the pay being average. My goal is to maximise my earning potential by getting into big tech, while also having access to learning opportunities, especially on the dev side of devops.

Despite the pay at AWS being potentially great, the job description of the SysDE role seems very vague. I haven't been told much other than the fact that it involves Linux and some programming.

Anyone been a SysDE at AWS? What's the exact tech stack? How much dev work does it really involve? If it turns out to be mostly Linux administration, I'm not sure it's worth it, even with the great pay package.

https://redd.it/1lknnyf
@r_devops

8 views02:28

Reddit DevOps

Do you spend time optimizing jenkins jobs?

Hey guys,

In our company we have a lot of Jenkins jobs, almost 400. Some are for deployments used by devs, others are our own for metrics and monitoring.

For the past 1-2 years, my manager has focused heavily on consolidating these into common jobs to reduce the total number. Even if there are only 4-5 jobs of a type, he asks us to create a common job that covers all of them, so that when a change is required in all of them we only have to make it in one place. Initially I was involved in creating a common pipeline for all deployments; that went well and we did it. But now he is asking us to "commonize" every repeating pair or part of Jenkins jobs that he sees.

Is this relevant for devops? Will it help with anything? Or is he just trying to solve a problem that never existed? Do you take part in these activities? Will they ever help a devops engineer in any way? Will putting these things on your resume or CV attract recruiters?

https://redd.it/1lku3k7
@r_devops

9 views08:28

Reddit DevOps

Grafana monitoring

Hello Folks,

Those who are using Azure and Grafana to visualize data, how are you querying the data?
We are using SQL to fetch the data, but the queries run frequently and increase SQL usage, so we want to avoid relying on SQL.
What is your approach?

https://redd.it/1lkvwt4
@r_devops

10 views10:28

Reddit DevOps

UK Thinking of moving from IT Field Engineer to DevOps

Hey folks,

Been in IT for about 12 years now, basically all I’ve ever done in my life. Started out in tech support and eventually moved up to IT Field Engineer. Still doing hands-on work, and while I enjoy it, I’ve been seriously thinking about shifting into DevOps.

Main reason? DevOps salaries here in the UK look a lot healthier than what I’m on right now, even if I had to start over as a Junior (vs experienced tech).

I’ve got my AWS CCP, which is due to expire later this year (never managed to use it in any of my jobs though), and I’ve dabbled in Azure (VMs only) in the past through work. I’ve also done some homelab stuff using Oracle Cloud (free tier): nothing massive, but enough to get some knowledge.

I was considering doing a bootcamp to accelerate things, since I tend to pick up new tech pretty fast. But I’m not sure if it’s worth the investment or if I should just go the self-study route and build a portfolio or certs instead.

Also, curious about how DevOps folks are feeling about AI right now. Within my current role, I’m not too worried; I don’t see AI replacing it any time soon. But what’s your take? Is it changing the DevOps space already? I feel that, if the company allows you to use it, it can be a good ally at work when it comes to writing scripts, etc., and a boost to productivity.

Would love to hear any advice or experiences from others who made the switch. Cheers!

https://redd.it/1ll0i3s
@r_devops

13 views14:28

Reddit DevOps

Arachni/Codename-SCNR Shutdown

Arachni was a DAST scanner I had used in previous projects. I went looking for it earlier this year, only to find out it had been converted to a new project, Codename-SCNR, owned by ecsypno.

Here is the origin story, taken from the wayback machine since their site is down:

Origin

Today when going to the site I discovered that it no longer exists:

ECSYPNO

And the only thing I could find was a somewhat cryptic post on twitter from the owner, stating "Ecsypno.com is closing shop for the foreseeable future due to sabotage of my personal and professional lives."

Anyone here a customer? I wonder what will happen to the software for people who have already paid. It was definitely a smaller commercial enterprise, so hopefully not too many orgs are impacted, but it is interesting nonetheless.

https://redd.it/1ll49ke
@r_devops

Ecsypno

The Arachni Chronicles

A story of curiosity, experimentation, development, million euro deal, fraudsters, abandonment and revitalization.
From the inception of the F/OSS Arachni WebAppSec scanner to the opening of Ecsypno’s doors with its flagship product Codename SCNR.

8 views16:28

Reddit DevOps

I hate existing doc tooling

I don't think this breaks community guidelines (I post here regularly); if it does, please remove the post.

I'm increasingly frustrated with how documentation tooling stinks at striking a balance between being useable for non-technical users and being well suited for automation/compliance workflows. I'm considering putting a service together and have a [quick survey](https://forms.fillout.com/t/aZtDWSYiMrus) that could help me validate some ideas. Also welcome discussion below.

* Why does nobody tackle document localization?
* Why does every service expect data backups to be done with some half-baked manual export function?
* Aside from Confluence, most have no options for data residency.

https://redd.it/1ll6h22
@r_devops

9 views18:28

Reddit DevOps

Bare metal k8s interview questions, what will be asked?

I said I know bare metal k8s, but I'm only familiar with cloud-managed k8s. What kind of questions can I expect, and how should I answer them? Can anyone share some insights?

https://redd.it/1ll75yn
@r_devops

8 views19:28

Reddit DevOps

Boss encourages a culture of „fixing in prod“ and it drives me insane

Disclaimer: I’m not a native speaker, I apologize for any confusion.

I’m the „DevOps engineer“ in a kinda established start up (running for more than 6 years, not yet profitable, Series A in October 2023). Technically what we do is not DevOps, rather classic ops just with more chaos but that’s not the topic.

I am responsible for doing the prod deployments, and more than half of them do not go through smoothly. Manual scale-downs need to be done beforehand, pods need restarting, and sometimes I even need to pull in engineers to tell me what’s wrong, and then they manually create an index, run a database query or things like that.

After another round of botched deployments today, I was so pissed off that I wrote a manifesto called „no cowboy ops manifesto“ and here’s the content:

Disclaimer: I used ChatGPT for phrasing but I explicitly fed it my thoughts beforehand.

**Production is Sacred – Let’s Treat It That Way**

Purpose

We work on live, customer-facing systems. Every action in production reflects our reliability, trustworthiness, and respect for our users. This manifesto defines the **boundaries** we operate within during deployments and incidents.

> “Production is not a playground. It’s a place for professionals who prioritize safety, reliability, and shared responsibility.”
>

## Core Principles

### 1. Rollback First, Investigate Later

If an update breaks, we **roll back within 10 minutes** unless there’s a playbooked fix. Recovery is a process, not a race.

> “Don’t treat rollback as failure. It’s responsible engineering.”
>

### 2. No Manual Fixes Without Audit Trail

No:

- kubectl exec hacks
- Manual scaling or pod deletions
- Ad-hoc DB queries or index creations
- Queue drops or reboots

Unless:

- It’s part of an **approved, version-controlled procedure**
- Actions are **fully logged and explained** in the postmortem

### 3. Every Fix Must Be Reproducible

If it’s not automatable, it’s not production-ready.

Fixes made by hand must be:

- Committed to a Git repo
- Reviewed and explained in plain English
- Deployable via CI/CD

### 4. No Heroes, Only Teams

We value coordinated response over solo acts. If you “save prod” alone:

- You take on **postmortem ownership**
- You also take on **automation follow-up**
- You may still get challenged on why rollback wasn’t used

### 5. Visibility Over Secrecy

Manual interventions will trigger alerts/logs. That’s not to shame—it’s to **protect the system** and learn from every exception.

## **Your Checklist in Production**

- Issue during update? Roll back within 10 minutes.
- Need a manual fix? Stop. Is it versioned or pre-approved?
- Touching prod? Tell the team. No silent SSHs.
- Fixed something? Great. Now write it up and automate it.
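
Principle 1 and the first checklist item can be read as a small decision rule. The Python sketch below is one possible interpretation, not part of the manifesto as circulated: the `next_action` helper, its return labels, and the reading of "playbooked fix" as a simple boolean are assumptions.

```python
from datetime import datetime, timedelta

# Taken from the manifesto's "roll back within 10 minutes".
ROLLBACK_BUDGET = timedelta(minutes=10)


def next_action(detected_at: datetime, now: datetime, has_playbooked_fix: bool) -> str:
    """Suggest what to do after an update breaks (hypothetical labels)."""
    if has_playbooked_fix:
        # An approved, version-controlled procedure exists: use it.
        return "apply-playbook"
    if now - detected_at <= ROLLBACK_BUDGET:
        # Still inside the budget: roll back now, investigate later.
        return "roll-back"
    # Budget already blown: roll back anyway and flag it for the postmortem.
    return "roll-back-overdue"


if __name__ == "__main__":
    broke_at = datetime(2025, 6, 26, 20, 0)
    print(next_action(broke_at, broke_at + timedelta(minutes=3), False))
```

Encoding the rule this way makes the disagreement concrete: under this reading the only input that ever prevents a rollback is the existence of a playbooked fix.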

Now my boss’s response was this:

Strongly disagree!

* Generally: When we tell customers we want to do something, we should at all times do everything in our power to ensure that it actually happens. A Rollback means we did not do what we promised to do in a first place and it is for sure considered as failing our promise.

* Priority 1 is always to make an update successful.
We are not in a business where Maintenance windows can be granted multiple times a day or week and this is connected with a huge communication effort on all sides and should be handled very carefully. So if there is any way to proceed even with manual intervention it is encouraged to do whatever needed to be able to make that update a success.

* Rollback should be the last possible solution when everything else failed. Within a maintenance window a rollback can be decided within the last 10 minutes if until that point we were not able to successfully update.

* We should do everything (including Rollback if needed) to be back online before the end of our maintenance window, respecting the window is super important.

* In case we see that we will not hold our maintenance window, we should rollback. In case we need to Rollback, we need to make sure to gather as much information as somehow possible before rolling back to make sure we understand the problem before the rollback is triggered.

* Fixing whatever was needed to do manually to make the update a success or in case of rollback fixing what caused it is always highest possible priority as a next action

———

I think this is not a way to run a stable environment and it’s driving me crazy. I have been in this business for over a decade and am quite confident in my abilities and views, but I would still appreciate your opinion and advice. Thanks and apologies for the wall of text. I tried to be as brief as possible without missing many details.

https://redd.it/1ll9ul3
@r_devops

7 views20:28

Reddit DevOps

Solution to re-run terminated AWS spot instances in CI jobs?

Hey guys,

I'm currently running a script every 15 minutes to re-run terminated jobs via the GitHub API, but it's far from ideal and still misses some of the terminated workflows.

I saw this post from 3 years ago and was wondering if anyone has come up with a better solution by now.
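
For context, the polling approach described above might look roughly like the Python sketch below. It is not the poster's script. It assumes the input is the `workflow_runs` array returned by `GET /repos/{owner}/{repo}/actions/runs`, and that a run killed by a spot interruption ends up with a `failure` or `cancelled` conclusion, which should be checked against real terminated runs. The helper names and the `max_attempts` cap are hypothetical; the actual HTTP calls and authentication are left out.

```python
from typing import Dict, List

API_ROOT = "https://api.github.com"

# Assumption: spot-terminated runs finish with one of these conclusions.
RETRYABLE_CONCLUSIONS = {"failure", "cancelled"}


def runs_to_retry(runs: List[Dict], max_attempts: int = 3) -> List[int]:
    """Return IDs of finished runs worth re-running, capped by run_attempt."""
    return [
        run["id"]
        for run in runs
        if run.get("status") == "completed"
        and run.get("conclusion") in RETRYABLE_CONCLUSIONS
        and run.get("run_attempt", 1) < max_attempts
    ]


def rerun_url(owner: str, repo: str, run_id: int) -> str:
    """URL to POST to in order to re-run only the failed jobs of a run."""
    return f"{API_ROOT}/repos/{owner}/{repo}/actions/runs/{run_id}/rerun-failed-jobs"


if __name__ == "__main__":
    sample = [
        {"id": 1, "status": "completed", "conclusion": "failure", "run_attempt": 1},
        {"id": 2, "status": "completed", "conclusion": "success", "run_attempt": 1},
    ]
    for run_id in runs_to_retry(sample):
        print("POST", rerun_url("acme", "app", run_id))
```

Note that this filter cannot tell a spot termination from a genuine test failure; the `run_attempt` cap only stops it from retrying a truly broken build forever.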

Thanks!

https://redd.it/1llco45
@r_devops

6 views22:28

Reddit DevOps

📡 Anyone setting up HTTPS for JupyterHub? Here’s my method using Jupyter AI setup

Hi all,

I recently had to configure HTTPS for JupyterHub while working with Jupyter AI and wanted to share a working method in case anyone else is trying to do the same.

The process involved:

- Generating self-signed SSL certs (or using Let's Encrypt)
- Editing the JupyterHub config
- Restarting with the right flags and paths
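
The post does not include the actual config, so the fragment below is only a sketch of what the "editing the JupyterHub config" step commonly looks like. `c.JupyterHub.ssl_cert` and `c.JupyterHub.ssl_key` are JupyterHub's own options; the file paths are hypothetical placeholders, and nothing here is specific to Jupyter AI.

```python
# jupyterhub_config.py (fragment). JupyterHub loads this file itself and
# provides get_config(); it is not meant to be run as a standalone script.
c = get_config()  # noqa: F821

# Hypothetical paths: point these at your own certificate and private key
# (self-signed or issued by Let's Encrypt).
c.JupyterHub.ssl_cert = "/etc/jupyterhub/ssl/fullchain.pem"
c.JupyterHub.ssl_key = "/etc/jupyterhub/ssl/privkey.pem"

# Restart afterwards, pointing JupyterHub at this file, for example:
#   jupyterhub -f /etc/jupyterhub/jupyterhub_config.py
```

With a self-signed certificate, browsers will still show a warning until the certificate is trusted on each client.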

It took a bit of trial and error to get it stable, especially since Jupyter AI has some subtle differences in environment behavior.

Would love to hear how others secure their notebook environments — especially for production or collaborative setups.

#Jupyter #HTTPS #DevOps #SelfHosted #JupyterHub #Security #Tips

https://redd.it/1llf10d
@r_devops

9 views00:28

Reddit DevOps

How do you handle the glue between Java builds, Docker images, and deployment?

I'm curious how teams out there handle the glue code between building Java projects and getting them into production.

What tools are you using to build your Java projects (Maven, Gradle, something else)?

Once you build the JAR, how do you package it into a Docker image?

Are you scripting this with bash, using Maven plugins, or something more structured?

How do you push the image and trigger deployment (Terraform, GitOps, something else)?

Is this process reliable for you, or do you hit flaky edge cases (e.g., image push failures, ECS weirdness, etc)?

Bonus points if you're using ECS or Kubernetes, but any insights from teams with Java + Docker + CI/CD setups are welcome.

https://redd.it/1lli67l
@r_devops

11 views02:28

Reddit DevOps

Am I deploying to On-Prem right?

# Context

I'm the all-rounder at my agency, handling development, DevOps, database administration, sys admin, as well as whatever else is needed when someone doesn't have the necessary skills available.

A colleague comes to me, having built a script (in TypeScript) that needs to run on a cron on a customer-controlled platform, specifically an RHEL VM on an on-premises server, for specific reasons (unimportant at this point, just need to accept this is not able to be changed).

# Problem

Most of my experience is building and deploying artifacts in a cloud environment for containerised services, so my experience with on-prem, non-containerised workloads is not too well honed.

Currently, the on-premises server is locked down to a VPN and accessible via SSH.

# Current Approach

My current approach is to use Ansible executed from a CI/CD runner (right now, there is some uncertainty about which CI/CD system we will be using, so it's unclear whether I need to get the runner to connect to the VPN or whether I can request that the runner be whitelisted).

This seems like the exact use case for Ansible, but due to my lack of experience with Ansible, I'm wondering if there are better options (by better options I don't mean using other tools like Chef, Puppet, Saltstack or something else, I mean specifically higher level).

https://redd.it/1llkb5p
@r_devops

10 views04:28
