Reddit DevOps

How are you running short-lived Docker containers for integration tests in Java apps?

I see a lot of people using Jib or Buildx for building Docker images and Helm/Terraform for deployment.

What about running containers during integration tests? For example, spinning up Postgres, Redis, Elasticsearch, or other services locally or in CI to test against?

Are you using docker run in CI scripts or custom bash logic?

Using something like Testcontainers?

Building your own test infra harness?

I'm curious what patterns you’ve seen work (or fall apart) when trying to reliably run and stop Docker containers from within Java-based test flows or CI pipelines.

Have you hit reliability or cleanup issues?

Thanks.

https://redd.it/1llq016
@r_devops

From the devops community on Reddit

Explore this post and more from the devops community

13 views10:28

Reddit DevOps

Alternative to SigNoz

My organization wants to adopt an API monitoring tool, ideally the best one available. We wanted to go forward with SigNoz, but right now SigNoz doesn't provide user management, so it's not what we're looking for.

What alternatives to SigNoz are out there? Tell me all of them, even the paid ones.

https://redd.it/1llr3ki
@r_devops

9 views11:28

Reddit DevOps

Distributed Logging Store?

Hi,
we are building software (backend + app) for a large retailer with thousands of stores. Each store has its own server, so our backend basically has 10,000 instances distributed across the world.

When it comes to logging we have two conflicting requirements, and every second week we have a meeting about them:

1. All logs should be stored centrally for monitoring purposes, and the costs must be acceptable. We use Elastic for that and expect to pay a few million euros per year for logs. So we should not log too much.

2. When there is a bug, we often get the complaint that the logs are not detailed enough. But we cannot add more logs, otherwise we would violate our cost constraints.

One idea is a system with decentralized log stores. Basically, each server would have its own log server and store everything locally, and only the most important logs would also be sent to Elastic for central monitoring. But we need a way to connect to each store and run queries there. Do you know of a system that provides decentralized log stores with a centralized management hub? We don't want to connect to each server individually via remote desktop (they are Windows, btw).
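The split described above can be pictured with a minimal sketch. It is not a recommendation of any specific product: every record goes to a local sink, and only records at or above a threshold are forwarded to the central one. The `ListHandler` class and the in-memory lists standing in for the local disk and for Elastic are assumptions made for illustration.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted records in a list (stand-in for a real sink)."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.records = []

    def emit(self, record):
        self.records.append(self.format(record))


def build_store_logger(name="store"):
    """Return a logger with a verbose local sink and a terse central sink."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()
    local = ListHandler(logging.DEBUG)      # everything stays in the store
    central = ListHandler(logging.WARNING)  # only important records leave it
    logger.addHandler(local)
    logger.addHandler(central)
    return logger, local, central


logger, local_sink, central_sink = build_store_logger()
logger.debug("scanner 3 handshake ok")
logger.info("receipt 1042 printed")
logger.error("payment terminal timeout")
```

Here the local sink ends up with all three records while the central sink only receives the error. Querying the local stores remotely from a central hub is the harder part of the question and is not covered by this sketch.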

https://redd.it/1lltexe
@r_devops

13 views13:28

Reddit DevOps

If you’re starting with AWS, focus on these 5 services

When I started learning AWS, I felt completely lost.

There were so many services, so much jargon, and no real roadmap. I kept bouncing between random tutorials and still had no idea how everything fit together.

What helped me most was focusing on a few key services that actually taught me how the cloud works at a basic level.

Here are five that made things start to make sense:

EC2
Taught me how virtual machines work in the cloud. Launching one, connecting to it, and running a basic app helped me understand compute in a hands-on way.

S3
This was my intro to cloud storage. Uploading files, managing folders, and setting permissions gave me a real sense of how cloud apps store data.

IAM
I used to get constant access errors until I spent time learning this. Once I understood users, roles, and policies, everything got easier.

RDS
Made working with databases much simpler. I didn't need to install anything locally, and I could finally connect apps to a managed database in the cloud.

Lambda
Running code without setting up a server felt like magic. It helped me understand how event-driven applications work and introduced me to automation.

While I was working through these, I made a simple system in Notion to stay organized, track what I was learning, and avoid getting overwhelmed.

What AWS service made things finally click for you? Always curious how others got started.

https://redd.it/1lltycb
@r_devops

11 views14:28

Reddit DevOps

Migrating 5PB from AWS S3 to GCP Cloud Storage Archive – My Architecture & Recommendations

Migrating 5 petabytes of data from AWS S3 to Google Cloud Storage Archive is quite a complex project.

I’ve recently completed a detailed discovery and analysis phase and published an architecture and recommendations based on my findings.

I’d love to know: Do you think my recommendations make sense? Or do you have any suggestions or lessons learned from similar large-scale migrations?

https://medium.com/@rasvihostings/migrating-5-petabytes-from-aws-s3-to-gcp-cloud-storage-archive-a107634969eb

https://redd.it/1llytar
@r_devops

9 views17:28

Reddit DevOps

How to make DevOps projects to showcase my skills and learn?

I want to learn and showcase my skills, but without collecting certificates or building a software application from scratch. What are some ways to practice using Docker, Kubernetes, Linux, and all that stuff?

https://redd.it/1lm0vnx
@r_devops

10 views18:28

Reddit DevOps

Anyone running wide events in a sizeable codebase?

- What hurdles or wins did you hit while instrumenting them?
- Did they shorten MTTR or surface new insights (numbers welcome!)?
- How do you reconcile single-service wide events with the cross-service view you get from distributed tracing?

Success stories, horror stories, and hard metrics all appreciated.

https://redd.it/1lm1wbm
@r_devops

10 views19:28

Reddit DevOps

Last year CS student — Should I focus on Frontend (React) or DevOps/Cloud Path?

Hey everyone,
I'm in my final year of Computer Science and trying to figure out which career path to focus on.

Here’s what I currently know:

Frontend:

HTML, CSS, JavaScript

React (some basic projects, but not many standout ones yet)

DevOps / Cloud:

Linux (comfortable with CLI)

Docker

Kubernetes (can deploy apps to a basic K8s cluster)

AWS (EC2, S3, some deployment experience)

I enjoy both sides, but I'm stuck choosing which one to double down on for the next few months to become job-ready.

Which path would be more strategic to focus on right now — frontend or DevOps/cloud — considering demand, entry-level opportunities, and my current skills?

Any advice on how to make myself stand out or project ideas that could help would also be super appreciated!

Thanks in advance!

https://redd.it/1llzpkx
@r_devops

7 views20:28

Reddit DevOps

I feel like I’m barely needed at my job.

I'm in DevOps but feel so much less useful than when I was a systems admin. It feels like, as time goes on, regular IT people are needed less and less and more of the work is handed to developers. Will DevOps exist in a few years? Writing YAML and making small changes to our IDP feels like mediocre work. Basically all infrastructure will eventually be owned and controlled by software developers who also write the application code. There won't be any IT left except for those in low-level support positions.

Someone tell me why I'm wrong.

https://redd.it/1lm4e3y
@r_devops

12 views21:28

Reddit DevOps

Do you write tests for your code?

I write Python scripts to automate stuff; they usually never exceed 1-2k LOC. I also never bother to write tests, because I don't see the value in testing utility scripts. Once I saw a guy who wrote tests for a Helm chart, and in my mind that is a total waste of time.

Just write a script, run it, and if it fails, fix it until it works. Am I crazy? What is your way of working?

https://redd.it/1lm5d8r
@r_devops

10 views22:28

Reddit DevOps

What software and coding languages are the most important to learn?

I've been learning Python and Docker, and in the past I also learned JavaScript, though it's been a while since I used it. I am also very well versed in Linux terminal commands (I have both a Windows and a Linux laptop) and have used a virtual machine on Linux in the past.

I want to follow the DevOps career path, and I want to know which software and coding languages are important to learn for it.

https://redd.it/1lm9eh4
@r_devops

12 views00:28

Reddit DevOps

Changing processes

I work in a pretty decent software department. Good talent, good practices, modern technologies, decent management.

But one thing we can't nail is how to change processes. We have some way we've been doing things, we identify something that needs to be improved but we are failing at transitioning to the new way.

Some people, including staff engineers, believe in these trickle-down initiatives where they pitch a solution, maybe write an article or RFC, and expect everyone to buy in because of how awesome the solution is. In their heads it's done. Sounds like a circlejerk to me. Some people buy in and most people don't. The old way still works, they are too busy to care, and at the end of the day we have 2 ways of doing something instead of 1.

I'm cynical enough to believe that there will only be full adoption if it comes from management and it is mandatory. Management is reluctant to do this because they don't want to create bureaucracy and too many rules. I see the point but it doesn't solve the problem.

I'm not even sure my autocratic point of view is the right way. Or does full adoption just not happen in medium/large organizations? It just starts to hurt productivity if you need to ask around "so how are we doing this thing now?" too much.

Example: we have 10 different ways we are building and pushing images in different teams/services. We want to unify it using reusable workflows so there's only one way. This is not fully adopted so now we have 11 ways.

Not looking to rant. I'm curious if someone found a smart way to deal with this.

https://redd.it/1lmcflz
@r_devops

9 views03:28

Reddit DevOps

Advice Needed for DevOps Job

I have been fucking up constantly in my job, mainly due to my lack of time-keeping, honestly. A bit of background: I work for a major MNC, and we have many teams and departments in this company. The company uses Azure PaaS for everything. It is so big that we have a dedicated department for RBAC alone. Network firewalls are outsourced to a 3rd-party company, and Cloud Infra Provisioning also has its own department. What I'm trying to say is that when we provision a new resource like Azure Kubernetes, we need Service Principals and network firewall changes, and all of this requires a 3-week process.

Now, I have 4 projects. I haven't been doing a good job at time-keeping and haven't been raising the tickets properly. The RBAC department is notorious for rejecting any ticket as soon as they spot even the most minute mistake, such as a Key Vault name that violates the 24-character length rule or a Key Vault name that already exists. The funny thing is that we are required to put 01 in our Key Vault names, so I was thinking: what's stopping them from just changing it to 02? Each rejection means another 3-day delay, because I have to go through the approval process again.
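Rejections like these can sometimes be caught before a ticket is raised. A rough sketch, assuming Azure's published Key Vault naming rules (3-24 characters, only letters, digits and hyphens, starting with a letter, ending with a letter or digit, no consecutive hyphens). The function name and the `taken` list are hypothetical; a real check for existing names would have to query Azure or an internal inventory.

```python
import re


def key_vault_name_problems(name, taken=()):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not 3 <= len(name) <= 24:
        problems.append("length must be 3-24 characters")
    if not re.fullmatch(r"[A-Za-z0-9-]+", name):
        problems.append("only letters, digits and hyphens are allowed")
    if not name[:1].isalpha():
        problems.append("must start with a letter")
    if not name[-1:].isalnum():
        problems.append("must end with a letter or digit")
    if "--" in name:
        problems.append("must not contain consecutive hyphens")
    # Hypothetical inventory lookup: compare case-insensitively.
    if name.lower() in {t.lower() for t in taken}:
        problems.append("name already exists")
    return problems


print(key_vault_name_problems("kv-payments-prod-01"))
print(key_vault_name_problems("kv-payments-production-eu-01"))
print(key_vault_name_problems("kv-payments-prod-01", taken=["KV-Payments-Prod-01"]))
```

Running this kind of check locally before submitting does not shorten the approval process, but it removes one class of avoidable rejections.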

My mistakes were bad enough that my boss has created a group chat with me, my line manager and my project manager, highlighting the mistakes I made: why I keep creating tickets that get rejected, why I am assigning the wrong server owners, blah blah.

I have been losing sleep recently, because I don't feel in control of how long these tickets will take. It would be a different feeling if I had the implementation capabilities myself, but I don't, and that's the issue.

TLDR: A lot of the tickets I raise keep getting rejected for the most minor reasons, I'm not good at the soft skills needed to ask why I'm getting blocked, and I'm delaying our project timelines. Not just one, a few at least.

I keep feeling like I'm the most irresponsible DevOps engineer on the team. I have completed 2 out of the 4 projects, but at the expense of the other 2. One project has been successfully deployed to PROD (a miracle, honestly); the other had an mTLS error due to an expiring cert, which was so bad that our services were down for 12 hours and I had to write an RCA report.

https://redd.it/1lmgxot
@r_devops

6 views07:28

Reddit DevOps

Security of deniable encrypted links

So I am exploring the concept of deniable encryption, where any password is "correct", as with the XOR algorithm. But there is an entropy problem: an incorrect password will almost always output uncommon characters. I attempted to solve this at its core by diving into the maths and some research papers, but got nowhere, as it seemed to be almost impossible.

What I wanted was an algorithm that would give you perfect plausible deniability, so if you shared a link X with a password, you could use a different password and get Y, and say you never intended to share X. I came up with a workaround; it's kind of cool and it works. It just adds decoys, which are mutable XOR ciphers joined together, and it lets you set what other data is included, but it is not the perfect solution I was going for. Demo, Deniable Encrypted Link

I think it would be safe to share data encrypted with this method. I've done some basic brute-force tests and did not find any shortcuts. I have a rough estimate of a billion years on a server farm for a 12-digit password, and it is cool that every password is technically right.
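The XOR property the post relies on can be shown in a few lines. This is a minimal sketch of plain XOR with a key as long as the message, not the decoy scheme used in the demo (whose details are not given here). The two example URLs and the helper name `xor_bytes` are made up for illustration.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


real = b"https://example.com/real"
decoy = b"https://example.com/cats"  # same length as the real link

key = secrets.token_bytes(len(real))
ciphertext = xor_bytes(real, key)

# A second key that "decrypts" the same ciphertext to the decoy can be
# derived after the fact from the ciphertext and the decoy.
decoy_key = xor_bytes(ciphertext, decoy)

# Any other key of the right length also decrypts without error, but the
# result is almost certainly noise rather than readable text.
noise = xor_bytes(ciphertext, secrets.token_bytes(len(ciphertext)))

print(xor_bytes(ciphertext, key))
print(xor_bytes(ciphertext, decoy_key))
```

Note that the derived key is as long as the message and looks random, which is exactly the entropy problem described above: a short, human-chosen password cannot in general be picked so that it maps the ciphertext to an arbitrary readable decoy.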

https://redd.it/1lmj618
@r_devops

QR Catalyst

Anonymous Link Sharing | QR Catalyst

Share links anonymously with password protection and encode/decode text with compression. Fully client-side.

8 views10:28

Reddit DevOps

A small utility to add a security check before running remote installer scripts in pipelines.

Hey everyone,

We've all been there. You need to install a tool in a Dockerfile or a CI/CD pipeline, and the official method is:

    # Super convenient, but always feels a bit sketchy...
    curl -sSL https://some-tool.com/install.sh | bash

This works, but it's a blind trust fall. What if the script changes without you knowing? A typo could be added, or worse, something malicious. The usual alternative is to manually download, inspect, and run the script, which is safer but breaks the convenience of automation. So, I built a small, single-file bash utility called `vet` to solve this. The idea is to keep the convenience but add a transparent security layer right on the command line.

**What vet does:** It wraps the execution of a remote script in a safe, interactive workflow:

* **Shows you a diff:** If you've run the script before, `vet` caches it. The next time you run it, it will automatically show you a diff of exactly what has changed. No more silent updates.
* **Integrates with ShellCheck:** If you have `shellcheck` installed, `vet` will run it against the downloaded script first and warn you about any potential issues before you even review it.
* **Requires explicit confirmation:** After the diff and linting, it still prompts you for a final \[y/N\] before executing anything.
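The three steps above can be sketched roughly as follows. This is not `vet`'s actual implementation (which is a bash script); it only illustrates the cache-then-diff-then-confirm idea, and the function names and sample script contents are hypothetical.

```python
import difflib
import hashlib


def review_script(new_text, cached_text=None):
    """Return (sha256 digest, unified diff lines) for a downloaded script."""
    digest = hashlib.sha256(new_text.encode("utf-8")).hexdigest()
    if cached_text is None:
        return digest, []  # first run: nothing to compare against
    diff = list(difflib.unified_diff(
        cached_text.splitlines(), new_text.splitlines(),
        fromfile="cached", tofile="downloaded", lineterm=""))
    return digest, diff


def confirmed(answer):
    """Treat only an explicit yes as consent, mirroring a [y/N] prompt."""
    return answer.strip().lower() in {"y", "yes"}


old = "#!/bin/sh\necho installing\n"
new = "#!/bin/sh\necho installing\ncurl https://example.invalid/extra | sh\n"
digest, diff = review_script(new, old)
print("\n".join(diff))
```

In a pipeline, the digest is the useful part: pinning the expected hash of an audited script and refusing to run anything else gives a non-interactive equivalent of the manual review.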

Here’s how the workflow changes:

**Before (The risky way):**

    curl -sSL https://nvm.sh/install | bash

**After (The vet way):**

    # In your terminal, this will prompt you.
    vet https://nvm.sh/install

    # In a pipeline, after you've audited the script.
    vet --force https://nvm.sh/install

**GitHub Repo:** [https://github.com/vet-run/vet](https://github.com/vet-run/vet)

Would love to hear your thoughts, feedback, or critiques. Is this something you'd find useful in your own pipelines?

Thanks

https://redd.it/1lmmje2
@r_devops

8 views13:28

Reddit DevOps

Exploring the Future of Developer Tools: Memory-Driven Automation and Local AI Kernels

Hi everyone,

I’ve been working on a concept aimed at transforming how developers interact with their workflows and tools. The idea revolves around creating a memory and automation layer that lives locally alongside AI kernels. Think of it as a personal assistant that remembers your context, tools, and preferences, rather than trying to know everything.

What makes this different:

* Always-on, local-first operation for privacy and low latency
* Complete sovereignty over your data and workflows
* Deep, actionable integration with developer tools (editors, version control, CI/CD) to automate repetitive tasks, surface relevant context, and provide traceability across multi-feature projects
* Designed for real project continuity: persistent memory, version awareness, and workflow automation, not just chat history

I’m still in the early stages and haven’t shipped anything yet, but I’m excited about the potential here. I’d love to hear your thoughts on the challenges or opportunities you see in this space. What would you want from a developer-centric AI assistant that truly understands your workflow and project history?

I’m sharing this to get feedback and connect with others passionate about AI and developer tooling. Looking forward to your insights!

https://redd.it/1lmp94l
@r_devops

4 views15:28

Reddit DevOps

Just started my DevOps journey

Hi,

I have 3 years of overall experience as a system admin and recently cleared my RHCSA exam.

I want to switch my career to a DevOps profile. For this I learnt Linux, and now I am learning Git and GitHub. I have learnt the fundamentals of Git and GitHub, like init, push, pull, clone, fork, and authentication types like SSH and PAT, etc.

Now I need a study partner who is also learning DevOps, and I'd also be happy to connect with someone who is ready to help whenever I get stuck.

Anyone who is open to connect, just dm me.

Thanks for your help and support.

https://redd.it/1lmtgo3
@r_devops

4 views18:28

Reddit DevOps

SRE Interview Coming up, no Experience

I have an interview for a Site Reliability Engineer role, but I have no experience in it! I only trained as an SDET, so I was surprised when a company reached out about this SRE position. I honestly have no background in it at all.

What kind of questions should I expect?

They also mentioned there will be a technical interview and that I need to share my screen with them! What kind of coding tasks or other topics might they ask about?

Please help this person land the job!😅

https://redd.it/1lmtnm3
@r_devops

5 views19:28

Reddit DevOps

Adding personal account to work laptop?

Hey! I’m currently an intern and I have a really great work laptop. I need some extra material to use during my projects, mainly some notes from my uni courses that are on my student account. I was wondering if it would be wrong for me to add my personal university account and download the notes from my drive? I don’t really care too much if they have access to it, and I can delete it afterwards. If anything, the notes are legally protected by the professor, so only people who have taken the courses may have them; for anyone else it would mean legal trouble.

https://redd.it/1lmwfrq
@r_devops

5 views20:28

Reddit DevOps

Learning to Build an AI Agent for DevOps – What Would Actually Make It Useful?

Yo! I’m in the process of learning how to build AI agents, and I’m trying to figure out how to make one genuinely useful for my team at work (DevOps/SRE focus). The idea is to create a bot that helps troubleshoot issues, remembers past incidents, and maybe even catches patterns we’d normally miss—kind of like a second brain that never forgets weird root causes.

Right now mine can:

Parse incident docs and chunk them into embeddings for semantic search - not very hard
Let you chat with it to troubleshoot or recall past issues (as long as the app is running)
Run locally as a CLI, but could grow into a Slack bot or web UI later
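As a toy illustration of the chunk-and-search step listed above: the sketch below uses word counts and cosine similarity as a stand-in for real embeddings, so it matches on shared words rather than meaning. The incident names and texts are invented sample data, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, size=40):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Stand-in 'embedding': a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def search(query, docs, top_k=1):
    """Return the top_k (score, doc name, chunk) tuples for the query."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), name, c)
              for name, text in docs.items() for c in chunk(text)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


incidents = {
    "2024-04-outage": "api latency spike caused by expired tls certificate on ingress",
    "2024-06-disk": "node disk pressure evicted pods after log volume filled up",
}
best = search("certificate expired on ingress", incidents)[0]
print(best[1])
```

Swapping `embed` for a real embedding model and the list scan for a vector index keeps the same shape, which is why this part is "not very hard"; the harder questions are which incident data to index and how to keep it current.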

What I’m trying to learn is:
If you had something like this, what would actually make it valuable for you and your team?

Would you want it to:

Surface similar past incidents automatically?
Suggest fixes or known playbooks?
Explain confusing Terraform or k8s configs?
Help triage alerts and logs?
Say “this looks like that one outage in April”?

Also: are any of you already using tools like this? Whether it's scripts, platforms, or vendor stuff—I’d love to know what’s out there and whether it’s worth the cost.

I’m not trying to pitch anything—just hoping to learn from others building or using AI in this space. Appreciate any thoughts, feedback, or links.

https://redd.it/1lmyg9w
@r_devops

7 views22:28

Reddit DevOps

Why do so few AI projects have real observability?

So many teams are shipping AI agents, co-pilots, chatbots — but barely track what’s happening under the hood.

If an AI assistant gives a bad answer, where did it fail? If an SMB loses a sale because the bot didn’t hand off to a human, where’s the trace?

Observability should be standard for AI stacks:
• Traces for every agent step (MCP calls, vector search, plugin actions)
• Logs structured with context you can query
• Metrics to show ROI (good answers vs. hallucinations, conversions driven)
• Real-time dashboards business owners actually understand
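To make the first two bullets concrete, here is a minimal sketch of per-step spans that share a trace ID and serialize to structured, queryable log lines. It uses only the Python standard library as a stand-in and is not the OpenTelemetry API; the step names, the `hits` attribute, and the stubbed model call are assumptions for illustration.

```python
import json
import time
import uuid
from contextlib import contextmanager

SPANS = []  # stand-in for an exporter or log pipeline


@contextmanager
def span(name, trace_id, **attrs):
    """Record one agent step with timing, status and free-form context."""
    record = {"trace_id": trace_id, "span_id": uuid.uuid4().hex[:16],
              "name": name, "attrs": attrs, "status": "ok"}
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        record["status"] = "error"
        record["error"] = repr(exc)
        raise
    finally:
        record["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        SPANS.append(record)


def answer(question):
    trace_id = uuid.uuid4().hex  # one trace per user request
    with span("vector_search", trace_id, query=question) as s:
        s["attrs"]["hits"] = 2  # hypothetical result count
    with span("llm_call", trace_id, model="hypothetical-model"):
        reply = "stub reply"  # a real model call would go here
    return trace_id, reply


trace_id, _ = answer("why is checkout slow?")
lines = [json.dumps(s, sort_keys=True) for s in SPANS]
print("\n".join(lines))
```

With every step carrying the same `trace_id`, a bad answer can be followed back through retrieval and the model call, which is the "where did it fail?" question raised above.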

SMBs want trust, devs need debuggability, and enterprises need audit trails — yet most teams treat AI like a black box.

Curious:
→ If you run an AI product, what do you trace today?
→ What’s missing in your LLM or agent logs?
→ What would real end-to-end OTEL look like for your use case?

Working on it now — here’s a longer breakdown if you want it: https://go.fabswill.com/otelmcpandmore

https://redd.it/1ln24vo
@r_devops

YouTube

End-to-End Observability with OpenTelemetry + MCP & Semantic Search | Next.js, .NET, Qdrant, Docker

🔍 How should you think about Observability in modern AI-powered apps?

In this deep-dive session, we tackle End-to-End Observability using OpenTelemetry — plus we build a Model Context Protocol (MCP) server with Semantic Search powered by Qdrant, Next.js…

7 views01:28
