Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Would you use a Kubernetes Terraform template that provides a platform-grade setup?

Hey r/devops,

I’m exploring the idea of a platform that provides ready-to-use, production-grade Kubernetes infrastructure templates—something that could save teams time by offering pre-configured setups for essential components like:

Observability (Prometheus, Grafana, Loki, OpenTelemetry, etc.)
GitOps (ArgoCD, Flux)
Cert Management (cert-manager, external-dns)
Service Mesh & Networking (Istio, Linkerd, Cilium)

The goal is to help teams skip the painful initial setup and get straight to deploying applications with a solid, scalable foundation. Instead of spending weeks fine-tuning Kubernetes infrastructure, you’d have a well-tested Terraform/Kubernetes template that you can deploy in minutes.

I’d love to hear from you:

Would you (or your company) pay for a service like this?
What are the biggest pain points in setting up Kubernetes infrastructure?

Looking forward to your insights—especially from those who manage K8s at scale! 🚀

https://redd.it/1j7vj7c
@r_devops
Docker assumes my Harbor registry is Docker Hub

Hello, everyone!

I’m new to DevOps and running into an issue with Docker and a private Harbor registry. The registry is running on the same server as my CI/CD runner. When I push images using 'localhost', everything works fine. But when I try using the server’s hostname, Docker assumes it’s a Docker Hub repository instead of my Harbor registry.

Logging in to Harbor works without any issues, and images are listed correctly. However, when I push using the hostname, I get errors like access being denied or the tag not existing. In fact, Docker assumes that I'm trying to push to docker.io/server/image.

Has anyone faced this before? Any ideas on how to make Docker properly recognize the registry when using the hostname? Any help would be greatly appreciated!
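The behaviour described above matches how Docker parses image references: the part before the first `/` is only treated as a registry host if it contains a `.` or a `:`, or is exactly `localhost`. Otherwise the whole name is assumed to live on Docker Hub, so a bare hostname like `server` turns into `docker.io/server/...`. The Python sketch below re-implements that rule for illustration only (modeled on `splitDockerDomain` in the Go `distribution/reference` package, simplified; `server` and `server.example.com` are placeholder names):

```python
"""Sketch of how Docker splits an image reference into registry + repository.

Modeled on splitDockerDomain() in the Go distribution/reference package;
simplified: no tag/digest parsing and no validation.
"""

DEFAULT_REGISTRY = "docker.io"


def split_registry(name: str) -> tuple[str, str]:
    """Return (registry, repository) for an image name without a tag/digest."""
    first, sep, rest = name.partition("/")
    looks_like_host = (
        "." in first                 # e.g. harbor.example.com
        or ":" in first              # e.g. server:443
        or first == "localhost"
        or first.lower() != first    # uppercase letters are never a namespace
    )
    if not sep or not looks_like_host:
        # No registry component: Docker falls back to Docker Hub.
        registry, repo = DEFAULT_REGISTRY, name
    else:
        registry, repo = first, rest
    if registry == "index.docker.io":
        registry = DEFAULT_REGISTRY
    # Single-component Docker Hub names live under the "library/" namespace.
    if registry == DEFAULT_REGISTRY and "/" not in repo:
        repo = "library/" + repo
    return registry, repo


if __name__ == "__main__":
    for ref in ["server/project/app", "localhost/project/app",
                "server.example.com/project/app", "server:443/project/app"]:
        print(ref, "->", split_registry(ref))
```

In practice this suggests tagging with a fully qualified name (`server.example.com/project/app`) or an explicit port (`server:443/project/app`), assuming Harbor is reachable under that name. Either form also has to match what Docker trusts for TLS (the certificate's name, or an `insecure-registries` entry in the daemon config).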

https://redd.it/1j7vvfi
@r_devops
Thrown into the Deep End in DevOps, Need Guidance for the Next Step

Hey everyone,

I wanted to share my journey so far and get some advice from this community.

I joined a prop-tech startup right after college with limited DevOps knowledge. Initially, I worked alongside a senior engineer, starting with tasks like writing backup and restore scripts and creating POCs in the sandbox environment. One of the key things I worked on was a metrics exporter for a database, which helped me secure a full-time offer.

I officially started as a full-time DevOps Engineer in September. I took charge of stage deployments and started learning more about AWS and monitoring. The pay was okay for a fresher, but I stayed because I was gaining valuable experience.

Around December, my senior left, and their replacement didn’t have much experience with our setup. Since I had about 6 months of hands-on work with our infrastructure, I was given production access. Since then, I've been handling tasks like database replication, deployments, observability, monitoring, security audits, and disaster recovery practices.

I'm currently preparing for the CKA (Certified Kubernetes Administrator) exam, aiming to take it around May-June. My goal is to land a mid-level DevOps role by March 2026.

I'm looking for advice on:

1. Skills/Certifications I should focus on alongside the CKA to increase my chances.
2. How to effectively showcase my experience to land that mid-level role.
3. Any resources or strategies that can help me fast-track my growth.

Would love to hear your thoughts, especially from those who've navigated a similar path. Thanks in advance!

https://redd.it/1j7wivc
@r_devops
Seeking validation on a Go CLI for Dockerfile template discovery

Hey folks,

I'm building a Go CLI that helps users find Dockerfile templates, and I’m exploring two approaches:

1. Cache Approach: Pull templates from well-known repositories (think Awesome Docker Templates or other curated Dockerfile libraries) and cache them locally.

2. Dynamic Search: Query Docker Hub directly to search for images and dynamically generate a template based on what’s available.

I’d love to hear what you think about this idea. Does it sound useful? Any advice or pitfalls I should consider?

If you feel the idea has no base and is completely useless, let me know that too!
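For the cache approach, the core piece is small: fetch once, store locally with a timestamp, refresh after a TTL. The sketch below is in Python rather than Go for brevity; `fetch` stands in for whatever actually pulls templates from the curated repositories (hypothetical here), and the 24-hour default TTL is an arbitrary assumption:

```python
"""Sketch of the "cache approach": fetch templates once, reuse them until a TTL expires."""
import json
import tempfile
import time
from pathlib import Path
from typing import Callable, Dict, List, Optional


class TemplateCache:
    def __init__(self, path: Path, fetch: Callable[[], Dict[str, str]],
                 ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.time):
        self.path = path    # local JSON cache file
        self.fetch = fetch  # hypothetical: returns {template_name: dockerfile_text}
        self.ttl = ttl_seconds
        self.clock = clock

    def _load(self) -> Optional[Dict[str, str]]:
        if not self.path.exists():
            return None
        data = json.loads(self.path.read_text())
        if self.clock() - data["fetched_at"] > self.ttl:
            return None  # stale: force a refresh
        return data["templates"]

    def templates(self) -> Dict[str, str]:
        cached = self._load()
        if cached is not None:
            return cached
        fresh = self.fetch()
        self.path.write_text(json.dumps({"fetched_at": self.clock(), "templates": fresh}))
        return fresh

    def search(self, term: str) -> List[str]:
        """Case-insensitive substring match on template names."""
        term = term.lower()
        return sorted(name for name in self.templates() if term in name.lower())


if __name__ == "__main__":
    def demo_fetch() -> Dict[str, str]:
        # Placeholder data; a real CLI would download these from curated repos.
        return {"python-slim": "FROM python:3.12-slim\n", "node-alpine": "FROM node:20-alpine\n"}

    with tempfile.TemporaryDirectory() as tmp:
        cache = TemplateCache(Path(tmp) / "templates.json", demo_fetch)
        print(cache.search("python"))
```

A TTL keeps the CLI fast and usable offline between refreshes at the cost of occasionally stale results; the dynamic-search approach makes the opposite trade.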

https://redd.it/1j7uwyt
@r_devops
Serverless observability for dummies

I'm the only dev (frontend background) in an early stage startup.

We use AWS Lambda (with serverless.com) and Next.js (hosted on Vercel).

I use AWS CloudWatch to inspect logs, but it has no alerts or nice UI, so all I want is a nice UI that sits on top of CloudWatch.

I tried setting up New Relic and Honeycomb, but honestly I feel the effort required is way too much for my time and skill set.

Is there an easy tool optimized for serverless? I don't have OpenTelemetry or anything like that.
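If alerting is the main gap, one low-effort option that stays inside AWS is a CloudWatch alarm on Lambda's built-in `Errors` metric (namespace `AWS/Lambda`, dimension `FunctionName`). A minimal Python sketch, assuming boto3 and AWS credentials are available; the function name and SNS topic ARN are placeholders, and the threshold and 5-minute period are arbitrary choices:

```python
"""Sketch: alert on Lambda errors with a CloudWatch alarm (no extra vendor needed)."""


def lambda_error_alarm_params(function_name: str, sns_topic_arn: str,
                              threshold: float = 1.0, period_s: int = 300) -> dict:
    """Build keyword arguments for CloudWatch's PutMetricAlarm API."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",           # built-in Lambda metrics
        "MetricName": "Errors",              # invocations that ended in a function error
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": period_s,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no invocations is not a failure
        "AlarmActions": [sns_topic_arn],     # e.g. an SNS topic that emails you
    }


def create_alarm(function_name: str, sns_topic_arn: str) -> None:
    import boto3  # imported lazily so the builder above has no dependencies
    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**lambda_error_alarm_params(function_name, sns_topic_arn))
```

This does not give a nicer log UI, but it covers "tell me when something breaks" without adding OpenTelemetry or another vendor.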



https://redd.it/1j7zt7n
@r_devops
Switching from CodeBuild to GH Actions. Managing all the workflows?

My team is in the process of moving off a CodeCommit/CodeBuild-based CI/CD system and over to GitHub/GitHub Actions. Our dev team is having a pretty easy time making the change to workflows. Since I'm responsible for the Terraform stack, it's a little trickier for me, but mainly I've noticed that I had a staggering amount of code governing CB and EB triggers, which I'm now ripping out in favor of workflows. It seems to be a much less complicated system.

I haven't done anything too complicated yet that requires multiple TF deployments calling the same workflow with a different variable value (but I know it's coming). I can see this all getting a bit unkempt and going against TF's DRY principle. My list of GH workflows keeps growing, and I'm curious how others manage them. I'm already planning to switch from a multi-repo TF environment to a monorepo (probably, anyway; I started a new repo to rough it out) so that all the workflows can live in one place, instead of having a million copies doing the same thing that I have to edit en masse whenever I need to change something. What else can I do to tame all my workflows, in TF and in other dev projects?

https://redd.it/1j815ev
@r_devops
🚀 Announcing Wait4X v3.0.0: Smarter, Faster, and Feature-Packed! 🎉

Hey everyone! I’m excited to announce the release of **Wait4X v3.0.0**, packed with new features and improvements to make waiting for services easier and more efficient than ever before.

**🔄 What’s New in v3.0.0?**

1. **🌐 DNS Feature (New!)**
   * You can now wait for DNS resolutions directly! Perfect for scenarios where DNS propagation timing is critical.
2. **Improved Performance**
   * Enhanced execution efficiency, reducing wait times and resource consumption.
3. **🛠️ Better CLI Experience**
   * Refined command options and output for a smoother and more intuitive user experience.
4. **🐛 Bug Fixes and Stability**
   * Addressed several minor bugs and improved overall reliability.
5. **📚 Enhanced Documentation**
   * Comprehensive guides and examples to help you get started quickly.

**💡 About Wait4X** Wait4X is a CLI tool designed to wait for various services like HTTP, TCP, Databases, Messaging Queues, and now DNS to be ready before proceeding. It’s a handy tool for scripting, CI/CD pipelines, and deployment automation.
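The pattern a tool like this packages up is a polling loop: run a readiness check, sleep, retry until a deadline. A minimal Python sketch of that idea (not Wait4X's implementation or CLI), with DNS and TCP checks built on the standard `socket` module; the default timeout and interval are arbitrary:

```python
"""Sketch of the "wait until ready" pattern that tools like Wait4X automate."""
import socket
import time
from typing import Callable


def wait_for(check: Callable[[], bool], timeout_s: float = 30.0,
             interval_s: float = 1.0,
             sleep: Callable[[float], None] = time.sleep,
             clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `check` until it returns True or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        try:
            if check():
                return True
        except OSError:
            pass  # connection refused, DNS failure, etc. mean "not ready yet"
        if clock() >= deadline:
            return False
        sleep(interval_s)


def dns_resolves(host: str) -> bool:
    """True once `host` resolves to at least one address."""
    return len(socket.getaddrinfo(host, None)) > 0


def tcp_open(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """True once a TCP connection to host:port succeeds."""
    with socket.create_connection((host, port), timeout=timeout_s):
        return True
```

In a pipeline step this would be used as, for example, `wait_for(lambda: tcp_open("db", 5432), timeout_s=60)`, where `db` is a placeholder hostname; `sleep` and `clock` are injectable only to make the loop testable.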

**📥 Get It Now!** You can download or update to v3.0.0 from [GitHub](https://github.com/atkrad/wait4x) and start exploring the new features!

**🙏 Feedback Welcome!** I’d love to hear your feedback, suggestions, or any issues you encounter. Drop a comment or open an issue on GitHub.

Thanks for your support and happy waiting! 🎉

https://redd.it/1j7zjq0
@r_devops
How to Setup Preview Environments with FluxCD in Kubernetes

Hey guys!

I just wrote a detailed guide on setting up GitOps-driven preview environments for your PRs using FluxCD in Kubernetes.

If you're tired of PaaS limitations or want to leverage your existing K8s infrastructure for preview deployments, this might be useful.

What you'll learn:

- Creating PR-based preview environments that deploy automatically when PRs are created

- Setting up unique internet-accessible URLs for each preview environment

- Automatically commenting those URLs on your GitHub pull requests

- Using FluxCD's ResourceSet and ResourceSetInputProvider to orchestrate everything

The implementation uses a simple Go app as an example, but the same approach works for any containerized application.
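Per-PR URLs usually need names that are valid both as a Kubernetes namespace and as a DNS label (lowercase letters, digits and `-`, at most 63 characters). The guide handles this through FluxCD's ResourceSet templating; the Python sketch below only illustrates the naming constraint, with `pr-<number>-<app>` as an assumed convention and `preview.example.com` as a placeholder domain:

```python
"""Sketch: derive DNS-safe names for a per-PR preview environment."""
import re

MAX_LABEL = 63  # DNS labels and Kubernetes namespace names are limited to 63 chars


def preview_names(pr_number: int, app: str, base_domain: str) -> dict:
    """Return a hypothetical namespace, hostname and URL for one pull request."""
    # Lowercase and collapse anything outside [a-z0-9-] into single dashes.
    slug = re.sub(r"[^a-z0-9-]+", "-", app.lower()).strip("-")
    # Starting with "pr-" guarantees an alphanumeric first character;
    # truncate, then drop a trailing dash so the label ends alphanumerically.
    label = f"pr-{pr_number}-{slug}"[:MAX_LABEL].rstrip("-")
    return {
        "namespace": label,
        "host": f"{label}.{base_domain}",
        "url": f"https://{label}.{base_domain}",
    }
```

Keeping the namespace and the hostname identical makes it easy to map a preview URL back to the PR and to clean up both when the PR closes.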

https://developer-friendly.blog/blog/2025/03/10/how-to-setup-preview-environments-with-fluxcd-in-kubernetes/

Let me know if you have any questions or if you've implemented something similar with different tools. Always curious to hear about alternative approaches!

https://redd.it/1j83kt3
@r_devops
OpenTelemetry Collector vs Grafana Alloy

Hi, does anybody have experience with both of these collectors and could share it?

Which should I choose for a fresh environment: the vendor-agnostic OpenTelemetry Collector or the vendor-specific Grafana Alloy?

Is there any significant difference that would make one a better choice than the other?

Thanks in advance.

https://redd.it/1j84p18
@r_devops
UPDATE: Hired as a "Junior DevOps Engineer", now a "Business Operations Manager"—is this good or bad?

About a month ago, I posted about how I was hired (7 months ago) for a DevOps/software engineering role at a Fortune 500 company, only to be moved to a different team doing mostly Power Automate, SharePoint, and Power Apps—far from the AWS, Terraform, and Docker work I was expecting.

Since then, things have taken an even weirder turn. I recently checked my job title in our internal system and saw that my manager had changed it from Junior DevOps Engineer to Business Operations Manager—despite the fact that I’m not actually doing anything related to business operations. I’m still just writing scripts and building cloud-based tools, yet my title now makes it sound like I’m in a finance or admin role.

When I finally asked my manager about it, they said that due to an organizational restructure, my title was changed to better align with their team. This way, when N+2 managers interact with them and me, my job title eliminates any confusion and indicates that I work under them rather than the original manager who hired me. They also said this title was going to benefit me a lot moving forward.

What annoyed me is I never got any heads-up about this, and my work hasn’t changed. I’m still doing the same mix of automation and scripting. But now I’m wondering:

Is this a good thing (maybe it makes me look more versatile/above my pay grade)?
Or a bad thing (is my resume getting tanked, and should I jump ship ASAP)?

I was already considering leaving because this role isn’t fully aligned with my career goals, but this title change has left me confused.

Would love to hear if anyone’s been in a similar situation.

https://redd.it/1j866id
@r_devops
New to devOps: Tracing, observability, a bit lost.

Hi!

I'm in charge of developing the observability part of my company's software.

I'm pretty inexperienced in DevOps, so I wanted to keep things simple. At first, I went for OpenTelemetry and Jaeger (in a Docker PaaS). Then I realised I have no persistence/storage and no auth security with Jaeger alone.

So I searched a bit, and the solutions offering trace storage and auth security seemed a bit cumbersome:
- Adding Keycloak on top of Jaeger for security, and compiling a Jaeger plugin to get a storage connection
- Going for a Grafana stack and deploying an OTel collector between my app and Grafana

I feel like PaaS offerings are not suited to observability solutions and I should go for a VPS or something similar. The primary reason I wanted to stay on my PaaS provider (Clever Cloud) is that I'm taking over an old project with parts deployed on many different providers, and I wanted to stick to one to avoid chaos.

So I'm a bit lost for now. Do you have any advice?

https://redd.it/1j83pj3
@r_devops
Need help and ideas to continue

Hi, about a year ago I jumped into the Linux world, loved it, and slowly became interested in DevOps. I followed this roadmap https://roadmap.sh/devops, but now I'm almost halfway through, and I know I have to do something (like a small project) to bring together everything I've learned so far, polish it, and fill the gaps along the way. But I'm totally lost: I have no idea what project or task to take on, and I need help.

I could just continue down the roadmap, but I think that would do more harm than good.

I just need a project or work idea that tells me the steps, like "first, use this tool to do this, and then ..."

Where can I find something like this? A mentor, maybe? Someone who can help me?

https://redd.it/1j88k05
@r_devops
How do you remember so many things?

I want to know how you do it. When I get into something, I learn it, but after a few weeks I forget it partially or totally. In interviews they ask about things I knew but have since forgotten, and it's kinda frustrating. What do you do to keep all this existing and new information available?

https://redd.it/1j8a43i
@r_devops
Security Tips for Docker Compose with Nginx as a Reverse Proxy

Hey everyone!

I have an application deployed via Docker Compose, distributed across multiple VPS, and my setup is as follows:

* I use containers for **Next.js (a variable number of clients), Bun (server), Gluetun (to isolate the server within a VPN, which is necessary for my application), and Certbot**, but none of them have exposed ports.
* The only container with open ports is **Nginx**, which listens on ports **80 and 443** and acts as a reverse proxy.
* SSH access is available on port **22** on some of the VPS.

I want to ensure my setup is as secure as possible. Some security practices I already follow:

* I use **Certbot** to manage SSL.
* No internal services are accessible externally.
* SSH access is **key-based only**, and root login is **disabled**.
* I install **CrowdSec** on all VPS.

My main concern is **Nginx**, as it is the only exposed service. In the logs, I see many **path traversal attempts and random access attempts**. I believe my `nginx.conf` is properly configured, but is there anything else I should check to further enhance security?

I would love to hear your insights:

* What additional security measures would you recommend for this setup?
* What would professionals do or avoid in this kind of environment?
* Are there any specific configurations to harden **Nginx** or **Docker Compose**?
* Do I need Kubernetes if everything is already running? I generate the yml files dynamically (for the Next.js containers) using a bash script, and sometimes it can get to 15-20 containers.
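Since the compose files are generated by a script, a cheap safeguard is to audit the rendered config before deploying, so a new service never publishes a port by accident. This matters because ports published by Docker are added to iptables directly and can bypass host firewalls such as ufw. A minimal Python sketch that reads the JSON output of `docker compose config --format json`; the allow-list (only `nginx` may publish ports) and the specific checks are assumptions to adapt:

```python
"""Sketch: flag compose services that publish ports or run with risky settings."""
import json
import sys

# Assumption: only the reverse proxy is supposed to publish ports.
ALLOWED_PUBLIC = frozenset({"nginx"})


def audit(compose: dict, allowed: frozenset = ALLOWED_PUBLIC) -> list:
    """Return human-readable findings for a parsed compose config."""
    findings = []
    for name, svc in sorted(compose.get("services", {}).items()):
        if svc.get("ports") and name not in allowed:
            findings.append(f"{name}: publishes ports {svc['ports']}")
        if svc.get("privileged"):
            findings.append(f"{name}: runs privileged")
        if svc.get("network_mode") == "host":
            findings.append(f"{name}: uses host networking")
        # Works for both short ("src:dst") and long (dict) volume syntax.
        if "/var/run/docker.sock" in json.dumps(svc.get("volumes", [])):
            findings.append(f"{name}: mounts the Docker socket")
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: docker compose -f generated.yml config --format json > compose.json
    #        python audit_compose.py compose.json
    with open(sys.argv[1]) as fh:
        for finding in audit(json.load(fh)):
            print(finding)
```

Services routed through Gluetun with `network_mode: "service:gluetun"` are not flagged; only host networking is. Failing the deploy script when `audit` returns anything keeps the "only Nginx is exposed" property from silently eroding as the number of containers grows.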

I am the front-end and back-end developer and infrastructure manager of my SaaS. All of this has been a huge opportunity for me to learn and grow in my career and any advice to make my setup more secure and with higher professional standards is appreciated. Thanks!

https://redd.it/1j8bq6m
@r_devops
What are the better alternatives to SonarQube that you currently use?

Hey r/DevOps,

Most of our codebase is in JavaScript, TypeScript, and React, and we're currently looking for alternatives to SonarQube.

Does anyone have experience with AI tools that can help with static code analysis, code quality checks, and security vulnerability scanning for these languages?

Would love to hear what’s worked for you and whether any new, reliable AI tools can take on the task!

https://redd.it/1j8kpab
@r_devops
Can I Run MongoDB and PostgreSQL on Hetzner Cloud Volumes?

I was checking out Hetzner's documentation and noticed that their Cloud Volumes offer sustained IOPS (read/write) of up to 5000 and burst up to 7500 (Hetzner Cloud Volumes Overview). Given these specs, I'm curious if it's feasible to run MongoDB and PostgreSQL on these volumes for a medium-size web app focused on data processing.

Has anyone had success running MongoDB or PostgreSQL on Hetzner Cloud Volumes?
Have you encountered any performance or latency issues under moderate loads with these IOPS numbers?
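To put the quoted IOPS figures in throughput terms: if every I/O is one PostgreSQL page (8 KiB by default), throughput is at most IOPS × 8 KiB. This is a rough upper bound only; it ignores per-request latency and any separate bandwidth limits, and real workloads mix I/O sizes:

```python
"""Back-of-envelope: what an IOPS cap means in MiB/s for a given I/O size."""

PG_PAGE_KIB = 8  # PostgreSQL's default block (page) size


def iops_to_mib_s(iops: int, io_size_kib: float) -> float:
    """Upper bound on throughput if every I/O is exactly io_size_kib."""
    return iops * io_size_kib / 1024


if __name__ == "__main__":
    for label, iops in [("sustained", 5000), ("burst", 7500)]:
        print(f"{label}: {iops_to_mib_s(iops, PG_PAGE_KIB):.1f} MiB/s at 8 KiB per I/O")
```

With the numbers above this works out to about 39 MiB/s sustained and about 59 MiB/s burst, which is a useful figure to compare against the working set size and expected write rate of the app.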

https://redd.it/1j8le9s
@r_devops
Could anyone please help me with this project title and give me guidance on how to begin? I'm a beginner leading a group of three members.

Title - Deterministic log test replay framework for devops

Abstract - Imagine trying to fix a bug in a complex software system where every step matters, but the logs that record these steps are jumbled, making it hard to recreate the exact conditions that led to the error. Our project, DLTRF (Deterministic Log Test Replay Framework), tackles this problem by capturing every log entry produced during testing along with its precise timestamp, then storing them in a structured way so that they can be replayed in exactly the same order every time. Drawing inspiration from an IEEE study on FPGA-based deterministic replay, which achieves bit-accurate visibility of hardware behavior, DLTRF applies similar principles to software logs in DevOps environments. In simple terms, DLTRF guarantees that when you re-run a test, you experience the same sequence of events, allowing developers to consistently recreate the test scenario, accurately trace bugs, and clearly determine whether issues stem from configuration differences or genuine software defects. This reliable, repeatable replay process not only improves debugging precision but also boosts developer productivity by reducing the time spent isolating and fixing errors.
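The core of the capture-and-replay idea can be sketched in a few lines of Python. This is an illustration under assumptions, not DLTRF's design: entries are stored as JSON Lines, and replay order is defined by `(timestamp, sequence number)`, because timestamps alone can tie or drift between processes, while a capture-order counter gives a total, reproducible order:

```python
"""Sketch of deterministic log capture and replay (illustrative, not DLTRF itself)."""
import itertools
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass(frozen=True)
class LogEntry:
    timestamp_ns: int  # capture time
    seq: int           # global capture order; breaks timestamp ties
    source: str        # e.g. service or test-step name
    message: str


class Recorder:
    def __init__(self, clock: Callable[[], int] = time.time_ns):
        self._clock = clock
        self._seq = itertools.count()
        self.entries: List[LogEntry] = []

    def log(self, source: str, message: str) -> None:
        self.entries.append(LogEntry(self._clock(), next(self._seq), source, message))

    def dump(self) -> str:
        """Serialize to JSON Lines so a run can be stored and replayed later."""
        return "\n".join(json.dumps(asdict(e)) for e in self.entries)


def replay(jsonl: str) -> List[LogEntry]:
    """Return entries in a total order, (timestamp, seq): same input, same order."""
    entries = [LogEntry(**json.loads(line)) for line in jsonl.splitlines() if line]
    return sorted(entries, key=lambda e: (e.timestamp_ns, e.seq))
```

Because the sort key is total and stored with each entry, replaying the same file always yields the same sequence. A multi-process setup would need the sequence number to come from a single writer or a logical clock rather than a per-process counter.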

https://redd.it/1j8kmnb
@r_devops
AI or engineering jobs in the medical field: tell me if you know

Do you happen to know anyone who got a job in the medical field (AI in medicine) as a software engineer? If you know anything about it, tell me what kind of skill set they had.

https://redd.it/1j8p57m
@r_devops
Best cloud provider for AI workloads?

Been exploring different cloud providers for AI workloads, and I keep running into the same problem: AWS and Azure are overpriced as hell. Spot instances help, but they’re unreliable for longer jobs, and I’ve had training runs get killed halfway through because my instance got reclaimed. I’m using Compute with hivenet right now, which is much better imo. Even if it doesn’t have templates yet, it does the job of running GPU instances on demand and costs way less than Amazon.

https://redd.it/1j8pmir
@r_devops
Lenovo ThinkPad X1 Carbon G12 Touch (21KC000MUS) or Apple MacBook Pro 14.2” with M4 (24GB, 1TB SSD)

Hello, everyone.

Since I plan to learn DevOps, I’m trying to figure out which one is better for DevOps work. Can you advise?

Thank you, in advance.

https://redd.it/1j8qqar
@r_devops