Reddit DevOps

10 Must-Have Grafana Dashboards for Kubernetes Monitoring with Prometheus (2025 Edition)

Overwhelmed by Kubernetes metrics? Check out this practical guide featuring 10 essential dashboards and why OpenTelemetry integration matters. Read here

https://preview.redd.it/t2s5r1qhdspe1.png?width=1200&format=png&auto=webp&s=18be801ebd5be402855ab4f3ea7bd8bc0ea1bb65

https://redd.it/1jfifli
@r_devops

Skedler

Top 10 Grafana Dashboards for Kubernetes Monitoring with Prometheus (2025) + Reporting Tips

Unlock the power of your Grafana Dashboard in Kubernetes. Discover tips for effective monitoring and reporting automation.

6 views06:28

Reddit DevOps

FREE LINUX AND KUBERNETES LEARNING RESOURCES 2025

Click Here

https://redd.it/1jfio4k
@r_devops

GitHub

GitHub - Pulkit12966/redhat_official_studyguide_RHCSA: Set of two books for official red hat's RHCSA exam.

Set of two books for official red hat's RHCSA exam. - Pulkit12966/redhat_official_studyguide_RHCSA

9 views07:28

Reddit DevOps

Framing work experience

Hi DevOps community. I was hoping the community could shed some light on how to frame a particular year of my work experience while looking for new roles. For context, I have 4 total years of professional experience. For 1 of those years I worked as a Systems Engineer for a well-known IT management consulting firm that is primarily a DoD contractor (won't directly say the name of the company but it’s the one that “House of Lies” is based on), and while there I had an active Secret clearance. On top of that there was so much red tape that I was only ever assigned to two (very) slow-moving projects. I don’t know how to properly frame my experience there in interviews. Please be constructive but kind. Thanks everyone!

https://redd.it/1jfk5d0
@r_devops

From the devops community on Reddit

Explore this post and more from the devops community

7 views08:28

Reddit DevOps

The Art of Argo CD ApplicationSet Generators with Kubernetes

The Art of Argo CD ApplicationSet Generators with Kubernetes: https://piotrminkowski.com/2025/03/20/the-art-of-argo-cd-applicationset-generators-with-kubernetes/

https://redd.it/1jfla5x
@r_devops

Piotr's TechBlog

The Art of Argo CD ApplicationSet Generators with Kubernetes - Piotr's TechBlog

This article will teach you how to use the Argo CD ApplicationSet generators to manage your Kubernetes cluster using a GitOps approach.

10 views10:28

Reddit DevOps

Anyone use Cribl?

I have a team at work that is doing a PoC of the Cribl product for a very specific use case, but I'm wondering if it is worth a closer look as an enterprise o11y pipeline tool.

https://redd.it/1jfp117
@r_devops

9 views14:28

Reddit DevOps

Weird situation after reorg

Hey all. I am looking for some advice. As part of a reorg, I was transitioned to the ops team's manager, who manages a team of infra/devops engineers. Previously, I used to report to the engineering team director and I am the only devops guy managing an app.

It's been over 2 weeks but I haven't heard anything from this new manager. I even sent an email 4 days ago asking to set up a quick call, but no response. He also doesn't look to be on PTO, his status always shows available or in a meeting. I am feeling a bit stuck and left out. To add to the challenge, the other team members of this team manage totally different products/apps, so there hasn't been much overlap or opportunities to naturally connect.

Just wanted to get any ideas on how to approach this. I'm also worried about lack of communication going forward working with his team.

Thanks!

https://redd.it/1jfrd5r
@r_devops

6 views15:28

Reddit DevOps

AWS DevOps & SysAdmin: Your Biggest Deployment Challenge?

Hi everyone, I've spent years streamlining AWS deployments and managing scalable systems for clients. What’s the toughest challenge you've faced with automation or infrastructure management? I’d be happy to share some insights and learn about your experiences.

https://redd.it/1jfscf7
@r_devops

8 views16:28

Reddit DevOps

How to set realistic expectations for adhoc work

I'm a DevOps consultant, and this is about a previous employer. The feedback I got from my manager was that I wasn't scanning Slack enough for ad-hoc work. I was a team of 1 in charge of everything infrastructure and security related for the startup. Sometimes, if I was working on something that required a lot of concentration and debugging, I would not want to context switch to a Slack thread, particularly if I wasn't tagged or sent a direct message.

Basically I was expected to constantly scan Slack channels, respond to any issues developers were having ASAP, and drop everything I was doing. For example, one of the GitLab runners was slow and having poor performance. The GitLab runner was still operational, but builds were taking 10 to 15 minutes longer than normal for a job that usually takes 10 minutes. My manager told me that I was at fault because I didn't stop everything I was working on, reply within 15 minutes that I was working on a fix, and resolve the issue within 1 to 2 hours. I was told this days later, after the issue had been fixed, because I had worked on the fix for the slow GitLab runner later in the day.

I was not getting direct messages or being tagged, so this would mean scanning the common Slack channels every 5 to 10 minutes all day, which seemed unrealistic if I am doing active development work throughout the day on other features. I didn't want to seem lazy, because I was willing to work 70-hour weeks if it was required, but the client got mad because I would not respond within 20 minutes at 8 PM, when I was at the gym, to messages about a code review for something not urgent.

Are these just really odd expectations of DevOps at startups, or has anyone else encountered unrealistic expectations like this from a manager? If so, how did you meet them, or how did you convince the manager to accept more realistic expectations?

https://redd.it/1jfr7pn
@r_devops

8 views17:28

Reddit DevOps

Need help for PipeLines

# TLDR;

Junior dev, the only one on the team who cares about pipelines, looking for advice on how to go about serverless.

# Thanks a lot

So I'm back. I'm the guy from this post. I'm very grateful for the help you guys gave me a couple of months ago. We're using Liquibase that a lot of you recommended and I managed to create a couple of pipelines in GitLab trying to automate a couple of things. I'm here because, while I enjoyed trying out Liquibase and building those little pipes, I'm pretty lost.

Let me explain:

## What we have

We started using Liquibase as I mentioned before and it's really helping. After that I decided to try Gitea and test some pipes (we were using GitHub Enterprise Server on-premises). Long story short, I really liked it, but I felt like it wasn't as enterprise-ready as GitLab.

We started using GitLab and with its sprint management and pipes the whole team was impressed. Well, more for sprint management. I decided that automating things was good, so I got to work and after a week I had a set of usable steps for pipes.

We are not using a repo for pipes because we are still trying it out, we only have a couple of repos and this repo is the only one that has pipes. I read that you can create a single repo for those and have another repo call the step on that or something.

Anyway we develop on .Net for BE and typescript with React for FE. I created 3 groups of pipes distributed in some stages:

- build

- test

- analyze (used for static analysis with SonarQube)

- lint

- deploy (used to publish a new version of lambda and push new files to S3 for FE)

- publish (used to apply that new THING on the various envs dev|test|demo|prod)

Maybe the names publish and deploy are swapped, but you get the idea.

Build, test, analyze and lint are executed on every commit on main (we are using Trunk but no one knows about it except me, I keep it a secret because some people don't like it)

Deploy is executed on tags like Release-v0.5.89 while publish on Release-dev|test|demo|prod-v0.5.89. We started logging the status code of the action executed by BE from both APIs and BusinessLogic to CloudWatch to track the error rate in a future pipe although I don't know how to use this data yet.
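
The tag convention described above can be sketched as a small classifier. This is only an illustration of the naming scheme in the post, not the actual GitLab configuration: the regular expressions and the `classify_tag` helper are assumptions made up for this example.

```python
import re
from typing import Optional

# Hypothetical patterns derived from the tag examples in the post:
#   Release-v0.5.89       -> deploy
#   Release-prod-v0.5.89  -> publish to the "prod" environment
DEPLOY_TAG = re.compile(r"^Release-v(\d+\.\d+\.\d+)$")
PUBLISH_TAG = re.compile(r"^Release-(dev|test|demo|prod)-v(\d+\.\d+\.\d+)$")


def classify_tag(tag: str) -> Optional[dict]:
    """Return which stage a tag should trigger, or None if it matches neither."""
    match = PUBLISH_TAG.match(tag)
    if match:
        return {"stage": "publish", "env": match.group(1), "version": match.group(2)}
    match = DEPLOY_TAG.match(tag)
    if match:
        return {"stage": "deploy", "env": None, "version": match.group(1)}
    return None


if __name__ == "__main__":
    for tag in ("Release-v0.5.89", "Release-demo-v0.5.89", "v0.5.89"):
        print(tag, "->", classify_tag(tag))
```

Keeping the rule in one place like this makes it easy to check that a tag cannot trigger both stages at once.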

I feel like I need a little hint. Like what to look for or what the purpose of the next action should be. I was thinking about a way to auto rollback but our site is not in production so we are the only ones using it at the moment. Help?? 🥹

If it helps I can post the pipes via a pastebin or something tomorrow morning (Central European time zone).

Edit: fixed syntax and linting 😆. The first version was rushed and I didn't really read back what I wrote.

https://redd.it/1jfuczh
@r_devops

8 views18:28

Reddit DevOps

The outdated and the new tools you use/prefer?

I'm a fresher (3rd-year undergrad). I heard from a senior that Docker is getting outdated and that the container runtime is no longer Docker but containerd. This is new to me: I have heard of containerd but never worked with it. What other tools like this are there that could differentiate me from others?

https://redd.it/1jfxtqy
@r_devops

7 views20:28

Reddit DevOps

Problem solving, troubleshooting for juniors

Hello,
I am a junior (I mentioned before that I am currently on an internship) and I would like to ask you about your approach to debugging, troubleshooting, and problem-solving. Do you have any interesting books or courses that could help or guide me on different methodologies and improve these skills? Right now, what I do is I write the bug description in the chat and I know what it relates to, then I look at the code to see what’s wrong.
I have found this book: https://artoftroubleshooting.com/book/
What do you think?

https://redd.it/1jfzl7h
@r_devops

THE ART OF TROUBLESHOOTING

Read The Art Of Troubleshooting Book

“It’s everything to fix anything.” Download the Ebook PDF: The Art Of Troubleshooting – ebook (78.5 MB, 403 pages) EPUB: The Art Of Troubleshooting – ebook (96.7 MB).z…

6 views21:28

Reddit DevOps

How do you leverage your TAMs?

We are multi-cloud, but mostly AWS. We have enterprise accounts but honestly we almost never talk to them except to escalate a ticket, and even that is extremely rare.

What kinds of things do you use a TAM for? I honestly don't even know what I would ask them to support with.

https://redd.it/1jfxvf9
@r_devops

7 views22:28

Reddit DevOps

How much traction does SLSA have? With ML pipeline safety trending, is it getting more interest?

I remember there was a big splash a few years ago with Google kicking off a public SLSA (Supply-chain Levels for Software Artifacts, it's a mouthful) group. Is anyone actually actively adopting SLSA? Or under pressure to adopt it?

Just looking at public sources, there's a lot of regular activity on https://slsa.dev/, with release 1.1 coming out soon. And I've found some papers that are recently published, and the occasional blog post on the topic. And I did notice a recent small spike in google search queries.

Is there more to it than that? I don't see very many Reddit posts about it at any rate.

https://redd.it/1jg2hak
@r_devops

SLSA

Supply-chain Levels for Software Artifacts

SLSA is a security framework. It is a check-list of standards and controls to prevent tampering, improve integrity, and secure packages and infrastructure in your projects, businesses or enterprises. It’s how you get from safe enough to being as resilient…

6 views23:28

Reddit DevOps

AWS costs. Save me.

Why does it feel impossible to forecast application hosting prices? I have used the AWS calculator and it is like another language. I literally want to host a KeyCloak server and a .NET/Postgres RDS calendar scheduling, PDF storage and note-taking application that will initially serve 4 people but could serve 5000 daily active users by next year. The AWS calculator gives me anywhere between £100 and £20,000 a month. Why isn't there a human guide to these costs? Like "10,000 people transferring x MB per session per day would cost X amount".
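
The "10,000 people transferring x MB per session per day" question is plain arithmetic once a per-GB rate is known. A minimal sketch follows, with loud assumptions: the 5 MB per session and the 0.10 price per GB are made-up placeholders, not AWS pricing, and the result covers data transfer only (no compute, RDS, or storage).

```python
def monthly_transfer_gb(daily_users: int, mb_per_session: float,
                        sessions_per_day: float = 1.0, days: int = 30) -> float:
    """Total data transferred in a month, in GB (assuming 1 GB = 1024 MB)."""
    total_mb = daily_users * mb_per_session * sessions_per_day * days
    return total_mb / 1024


def monthly_transfer_cost(daily_users: int, mb_per_session: float,
                          price_per_gb: float, sessions_per_day: float = 1.0,
                          days: int = 30) -> float:
    """Monthly transfer cost; price_per_gb must come from the provider's price list."""
    gb = monthly_transfer_gb(daily_users, mb_per_session, sessions_per_day, days)
    return gb * price_per_gb


if __name__ == "__main__":
    # Placeholder inputs: 10,000 users, 5 MB per session, one session a day.
    gb = monthly_transfer_gb(10_000, 5.0)
    cost = monthly_transfer_cost(10_000, 5.0, price_per_gb=0.10)  # 0.10 is hypothetical
    print(f"{gb:.1f} GB per month, cost {cost:.2f} at the placeholder rate")
```

Running the same function for 4 users and for 5000 users gives the two ends of the range for this one cost component.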

https://redd.it/1jg3154
@r_devops

5 views00:28

Reddit DevOps

I’ve applied to over 100 jobs with no luck. Can you please roast my resume?

What’s wrong with my resume? I have yet to receive any positive responses from the companies I’ve applied to. I would appreciate some feedback. Thanks in advance!

Here’s my resume: https://imgur.com/a/akSS1FL

https://redd.it/1jg33yr
@r_devops

Imgur

Resume DevOps

Discover the magic of the internet at Imgur, a community powered entertainment destination. Lift your spirits with funny jokes, trending memes, entertaining gifs, inspiring stories, viral videos, and so much more from users.

7 views01:28

Reddit DevOps

Newbie to DevOps here - General advice requested

Hi. I'm starting with DevOps and would like to do a Proof of Concept deployment of an application to experiment and learn.

The application has 3 components (frontend, backend and keycloak) which can be deployed as containers. The data tier is implemented through a PostgreSQL database.

There is no development involved for the components. The application is an integration of existing components.

We are using GitLab with Ultimate licenses and target AWS for the deployment.

We would like to deploy on a Kubernetes cluster using AWS EKS service. For the database we want to use Aurora RDS for postgresql.

The deployment will be replicated in 4 environments (test, uat, stage, production), each of them with different sizing for the components (e.g. number of nodes in the kubernetes cluster, number of availability zones, size of the ec2 instances...). Each of those environments is implemented in a different AWS account, all of them part of the same AWS Organization.

In our vision we will have a pipeline with 4 jobs, each of them deploying the infrastructure components in the relevant AWS account using terraform. The first job (deploy to test) is triggered by a commit on the main branch. The rest are triggered manually, with the success of the previous job as a prerequisite.
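
The promotion rule described above (automatic on commit for test, manual and gated on the previous environment for the rest) can be modelled in a few lines. This is a sketch of the rule only, not GitLab CI syntax; the `can_run` helper and the trigger names are hypothetical.

```python
# Environments in promotion order, as listed in the post.
ENVIRONMENTS = ["test", "uat", "stage", "production"]


def can_run(env: str, trigger: str, succeeded: set) -> bool:
    """Decide whether the deploy job for `env` may start.

    trigger   -- "commit_on_main" or "manual" (names are assumptions)
    succeeded -- environments whose deploy job already succeeded in this pipeline
    """
    index = ENVIRONMENTS.index(env)
    if index == 0:
        # First job: runs automatically on a commit to main.
        return trigger == "commit_on_main"
    # Later jobs: manual only, and only after the previous environment succeeded.
    previous = ENVIRONMENTS[index - 1]
    return trigger == "manual" and previous in succeeded


if __name__ == "__main__":
    print(can_run("test", "commit_on_main", set()))
    print(can_run("uat", "manual", {"test"}))
    print(can_run("production", "manual", {"test", "uat"}))
```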

And we have some (millions of) doubts... but I will include here only a few of them:

1. GitLab groups/projects: should we have a single project for everything, or a group containing one project for the infrastructure and another for the deployment of the application? Or is it better to organize it in a completely different way?

2. Kubernetes/EKS: a single cluster per environment or a cluster per component (e.g. frontend, backend, keycloak...)?

3. Helm: we plan to do the deployment on the kubernetes cluster using helm charts. Any thoughts on that?

Thanks in advance to everybody reading this and trying to help!

https://redd.it/1jg1jbm
@r_devops

8 views03:28

Reddit DevOps

Any Dev or User Experience with CoreWeave or Nebius for AI/ML Workloads?

I’m curious to hear about your experience—good or bad—as a developer or user working with CoreWeave or Nebius, especially for AI or machine learning workloads.
• How’s the developer experience (e.g., SDKs, APIs, tooling, documentation)?
• What’s the user experience like in terms of performance, reliability, and support?
• How do they compare in cost, scalability, and ease of integration with existing ML pipelines?
• Anything you love or hate about either platform?

Would love to hear your insights or compare notes if you’ve used one or both.

https://redd.it/1jg90ks
@r_devops

8 views05:28

Reddit DevOps

What DevOps project should I build to showcase my skills in interviews?

Not sure if this is the right place to ask, but I recently started a DevOps course, and so far, I’ve learned about Git, Docker, Kubernetes, Helm, and Ansible. I’m looking to build a project that I can showcase in future interviews to demonstrate my skills, but I’m not sure what would be the most impactful.

I searched on ChatGPT for project ideas, and one suggestion was:
• A scalable web platform: Deploying a web app using Terraform, Kubernetes, and Docker, with CI/CD pipelines, load balancing, and monitoring.

While this sounds interesting, I’m not sure if it would be enough to stand out. If you were interviewing a DevOps candidate, what kind of projects would impress you? What real-world problems should I try to tackle to make my project more relevant?

Any advice or recommendations would be greatly appreciated!

https://redd.it/1jgchw8
@r_devops

9 views09:28

Reddit DevOps

Open-source for On-Call Solution?

We’ve been working on **Versus Incident**, an open-source incident management tool that supports alerting across multiple channels with easy custom messaging. Now **we’ve added on-call support with AWS Incident Manager integration**! 🎉

This new feature lets you escalate incidents to an on-call team if they’re not acknowledged within a set time. Here’s the rundown:

* **AWS Incident Manager Integration**: Trigger response plans directly from Versus when an alert goes unhandled.
* **Configurable Wait Time**: Set how long to wait (in minutes) before escalating. Want it instant? Just set `wait_minutes: 0` in the config.
* **API Overrides**: Fine-tune on-call behavior per alert with query params like `?oncall_enable=false` or `?oncall_wait_minutes=0`.
* **Redis Backend**: Use Redis to manage states, so it’s lightweight and fast.

Here’s a quick peek at the config:

    oncall:
      enable: true
      wait_minutes: 3 # Wait 3 mins before escalating, or 0 for instant
      aws_incident_manager:
        response_plan_arn: ${AWS_INCIDENT_MANAGER_RESPONSE_PLAN_ARN}

    redis:
      host: ${REDIS_HOST}
      port: ${REDIS_PORT}
      password: ${REDIS_PASSWORD}
      db: 0
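
To make the escalation rule concrete, here is a rough model of how the wait time and the per-alert query overrides could combine. It is not Versus Incident's actual code: `OnCallSettings`, `apply_overrides` and `should_escalate` are hypothetical names, and only the config keys and query parameter names are taken from the post.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class OnCallSettings:
    enable: bool
    wait_minutes: int


def apply_overrides(defaults: OnCallSettings, query: Mapping[str, str]) -> OnCallSettings:
    """Apply ?oncall_enable= and ?oncall_wait_minutes= on top of the config defaults."""
    enable = defaults.enable
    wait_minutes = defaults.wait_minutes
    if "oncall_enable" in query:
        enable = query["oncall_enable"].lower() == "true"
    if "oncall_wait_minutes" in query:
        wait_minutes = int(query["oncall_wait_minutes"])
    return OnCallSettings(enable=enable, wait_minutes=wait_minutes)


def should_escalate(settings: OnCallSettings, minutes_since_alert: float,
                    acknowledged: bool) -> bool:
    """Escalate only if on-call is enabled, nobody acked, and the wait has elapsed."""
    if not settings.enable or acknowledged:
        return False
    return minutes_since_alert >= settings.wait_minutes


if __name__ == "__main__":
    defaults = OnCallSettings(enable=True, wait_minutes=3)
    print(should_escalate(defaults, 2, acknowledged=False))
    instant = apply_overrides(defaults, {"oncall_wait_minutes": "0"})
    print(should_escalate(instant, 0, acknowledged=False))
```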

I’d love to hear what you think! Does this fit your workflow? Thanks for checking it out—I hope it saves someone’s bacon during a 3 AM outage! 😄.

Check here: [https://versuscontrol.github.io/versus-incident/on-call-introduction.html](https://versuscontrol.github.io/versus-incident/on-call-introduction.html)

https://redd.it/1jgdljl
@r_devops

14 views10:28

Reddit DevOps

Gitlab project domain transfer

Hi there,

I'm a start up owner (don't worry, service biz, not AI bollocks) and I'm very stuck with some gitlab stuff. If someone can help out / do this for me, I'm also very happy to pay. Our current software devs are far too busy on our current project to help with it and the previous dev who built our system doesn't work on this kind of stuff any more as he's set up a new biz.

We have

- a website

- a booking form

- a staff app

- an admin panel

- digital reports for our customers

all of these are hosted on the same domain which is the problem

i.e.

domain.com

domain.com/booking

domain.com/admin

domain.com/reports

We have a new website built in webflow that we can't publish on domain.com because it crashes all the above as there's nowhere pointing to them once we host the domain on webflow.

We either need to move all of the above to subdomains i.e. booking.domain.com or to copy the project and host them on webflow or something.
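
For the subdomain option, the path-to-subdomain mapping is mechanical and could be expressed as a redirect rule. The sketch below only shows the URL rewrite itself, using the placeholder paths from the post; `MOVED_PREFIXES` and `subdomain_url` are hypothetical names, and where the rule would actually run (DNS, reverse proxy, hosting settings) depends on a setup the post does not describe.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping: first path segment -> subdomain label.
MOVED_PREFIXES = {"booking": "booking", "admin": "admin", "reports": "reports"}


def subdomain_url(url: str) -> str:
    """Rewrite e.g. https://domain.com/booking/x to https://booking.domain.com/x.

    URLs whose first path segment is not in MOVED_PREFIXES are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.lstrip("/").split("/", 1)
    first = segments[0]
    if first not in MOVED_PREFIXES:
        return url
    rest = "/" + segments[1] if len(segments) > 1 else "/"
    host = f"{MOVED_PREFIXES[first]}.{parts.netloc}"
    return urlunsplit((parts.scheme, host, rest, parts.query, parts.fragment))


if __name__ == "__main__":
    for url in ("https://domain.com/booking/new?x=1",
                "https://domain.com/admin",
                "https://domain.com/"):
        print(url, "->", subdomain_url(url))
```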

I have very entry level database knowledge and maybe I'm looking at this totally wrong, but we are dying to launch our website and are stuck in the meantime. We're actually building out a whole new system that will replace all of the above, but it's not ready yet. So all of this would be a temporary fix until it is so we can at least publish our new website.

Here's hoping the above isn't complete gibberish. Thanks all.

https://redd.it/1jgdrfh
@r_devops

9 views11:28

Reddit DevOps

DevOps/Platform recommended reading

Hi. Am looking for any current recommended reads around the devops/ platform area. Wondered if books like Accelerate or Continuous Delivery are still current enough to be a valuable read without being too dated. Have read Phoenix project and The DevOps Handbook so anything in that vein would be good. Thank you!

https://redd.it/1jgdi4v
@r_devops

6 views14:28
