Reddit DevOps

10 years of building Apache Kafka

Hey folks, I've started a Substack dedicated to the development of Apache Kafka. I've started off with some posts about our build infrastructure and I thought this community might find it interesting.

Here's a blurb:

> The Apache Kafka build system has evolved many times over the years. There has been a concerted effort to modernize the build in the past few months. After dozens of commits, many conversations with the ASF Infrastructure team, and a lot of trial and error, Apache Kafka is now using GitHub Actions.

Read the full post on my free Substack: https://mumrah.substack.com/p/10-years-of-building-apache-kafka

https://redd.it/1i0iujk
@r_devops

Building Apache Kafka

10 years of building Apache Kafka

A New Year, a new build system.

Reddit DevOps

Horror Story/Rant: Bad manager that just destroyed team work

My manager (let's call him Bob) has pretty good people-leadership skills, and that is a good trait to have in a manager.

However, he refuses to take engineers' recommendations for resolving technical debt, operational challenges, and stack complexity. For example:
- We have three different EKS clusters in the same region because Bob thinks that increases reliability and HA. Mind you, those clusters are also backed by the same EC2 in the same region and AZs. If EKS and EC2 are down, all three clusters are down too, no matter how many clusters we have. We told him we just need one, and the answer was no, for the reason above. Now EKS is out of date and we are forced to upgrade three EKS clusters. And surprisingly, we let go of our team's EKS admin last month lol. The recommendation was made 6 months ago.
- We have a release approval process for any changes to Prod, which is controlled by Terraform. But Bob tends to make changes by hand without release approval and then asks us to redo them in Terraform with release approval. We told Bob we shouldn't do this, that we should follow the correct process, and that we are violating the company's release approval chain. Again, no. Bob does what Bob needs to.
- Bob thinks being DevOps means being a great SRE and a great developer at the same time. Sure, those fields are related, but one person can only do so much. If there are such people, they are unicorns and get paid way more than us.
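
The first point can be made concrete with a rough availability calculation. This is only a back-of-the-envelope sketch: the availability figures are made-up assumptions, the function name is hypothetical, and it treats cluster-level failures as independent while the region/AZ/EC2 layer is a single shared dependency.

```python
def combined_availability(shared: float, per_cluster: float, clusters: int) -> float:
    """Availability of N redundant clusters that all sit on one shared dependency.

    shared:      availability of the shared region/AZ/EC2 layer (assumed figure)
    per_cluster: availability of a single cluster on its own (assumed figure)
    """
    # The service is up only if the shared layer is up AND at least one cluster is up.
    at_least_one_cluster_up = 1 - (1 - per_cluster) ** clusters
    return shared * at_least_one_cluster_up


# Hypothetical numbers chosen only for illustration.
SHARED = 0.999
PER_CLUSTER = 0.999

one = combined_availability(SHARED, PER_CLUSTER, 1)
three = combined_availability(SHARED, PER_CLUSTER, 3)
print(f"1 cluster:  {one:.6f}")
print(f"3 clusters: {three:.6f}")
```

Under these assumptions the result can never exceed the availability of the shared layer, however many clusters are added, while every extra cluster still has to be upgraded and operated.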

I know the ship is going down. I am trying to save the ship but the captain is just bad.

Rant ends.

https://redd.it/1i0lfpl
@r_devops

From the devops community on Reddit

Explore this post and more from the devops community

Reddit DevOps

Anyone regretted moving back to Engineering?

Has anyone successfully transitioned from Management back into Engineering and regretted it? If so, what did you regret and did you end up taking a pay cut? If not, are you happier now?

Edit: I am a Manager now with a decent salary, but I realized I don’t care about management at all and really miss hands-on work, so I’m considering transitioning back into Engineering, be that DevOps, Cloud, or something similar.

https://redd.it/1i0levs
@r_devops

Reddit DevOps

Expensive logging

I work in a GCP environment and due to reportedly hideously expensive logging costs, I'm being told to cut down on logging. I believe in logging errors, but now we take a Java exception and report that XYZ exception occurred. No stack trace.

Tragically, this code will be deployed to production, leaving some poor support person the unenviable task of guessing where and why the exception occurred.

How are modern corporate apps doing logging given the unaffordable cost of logging? Please note, our current logging is going to GCP log explorer. The multi-billion-dollar corporation cannot afford to log, at least to GCP log explorer.
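
One possible middle ground between a full stack trace and none at all is a single-line summary that keeps only the innermost frames. The sketch below is in Python rather than Java, purely for illustration. The helper `compact_trace` and the default of three frames are assumptions, not something from the codebase described above.

```python
import logging
import traceback

logger = logging.getLogger("app")


def compact_trace(exc: BaseException, max_frames: int = 3) -> str:
    """Return a one-line summary: exception type, message, and innermost frames."""
    # Keep only the last few frames, i.e. the ones closest to where it failed.
    frames = traceback.extract_tb(exc.__traceback__)[-max_frames:]
    where = " <- ".join(f"{f.name}:{f.lineno}" for f in reversed(frames))
    return f"{type(exc).__name__}: {exc} at {where}"


def risky():
    # Stand-in for application code that fails.
    return 1 / 0


try:
    risky()
except ZeroDivisionError as exc:
    summary = compact_trace(exc)
    logger.error(summary)  # one short log line instead of a multi-line trace
```

This keeps the function name and line number where the exception was raised, which is usually the part a support engineer needs first, while logging far fewer bytes per error than a full trace.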

https://redd.it/1i0lumb
@r_devops

Reddit DevOps

Documentation Tools, Strategies and Processes

This covers a wide area but I'd like some input from those in the community who have established a setup for automated documentation.

I work for a company that is growing rapidly. We in the infra team are a little bit haphazard with our documentation (aren't we all?). I know there are various schools of thought on documentation more generally and I'm not trying to get into that here. I want to know what approach people would suggest to centralizing our docs concerning a myriad of different tools and services, across equally as many repos.

It needs to be something robust which can handle generating documentation of multiple versions and be updated automatically on new releases of said tools and services.

We've dabbled in just using classic readme files, GitHub Pages, etc. We've toyed with Sphinx and Hugo but not sure if we should go the whole hog with these CMS tools. It nearly feels like it'd take an entire team to set this up. Curious to hear what others do and what some of the big companies like Netflix and Spotify do?

https://redd.it/1i0k5vf
@r_devops

Reddit DevOps

Broadcom CDDirector

Hi folks, I'm no vet, but I had never heard of this SaaS offering, and I'm beginning to loathe it...

It's been in the org for a good few years (since before I joined) and our implementation is messy: it's flanked by in-house apps to read/write to Jira/Jenkins, and we're not even using the useful features such as promotion through regions...

So has anyone heard of this or have experience with it? Should I run screaming? It just feels like a layer of abstraction on top of Jenkins, and the more integrated features like pipeline-generated releases just sound like GitOps without the community conventions.

TIA!

https://redd.it/1i0i2e7
@r_devops

Reddit DevOps

DevOps Job market 2025 Event Audio

Hey Folks,

As many of you requested, we recorded the 2025 DevOps Job Market event. I hope our community will find this type of event useful. If anybody wants to speak at our next event, please DM me.

Speakers:
- Javier, Director of Public Cloud @ Orange, Ex-AWS, Ex-Huawei, Ex-Microsoft
- Luis, Staff SRE @ Intuit, Ex-NYSE
- Ali, SWE @ Google, Google Cloud Team
- Baha, Principal DevOps engineer, talks about running DevOps contracting

This was a free event. I'm hosting the audio on SoundCloud; you can find the info, timestamps, and embedded audio here:

https://prepare.sh/events/2025-devops-job-market

Our next event is planned for 31 Jan (date might change), and our guest speaker is an exceptional DevOps engineer and specialist in the field of observability who was personally a role model for me. He is a Principal Engineer @ AWS, Ex-Red Hat, CNCF Ambassador, and Apache Foundation contributor.

The topic of our next event is "Roadmap to become 10x DevOps Engineer".

You can join the event on our server; you can find the link on prepare.sh.

https://redd.it/1i0u8u5
@r_devops

prepare.sh

2025 DevOps Job Market Discussions

Join us for an in-depth discussion on the 2025 DevOps job market, featuring industry experts sharing insights and advice.

Reddit DevOps

GoDaddy's API Restrictions Got You Down? Help Us Find a Cert-Manager-Friendly DNS Provider!

In our Kubernetes environments, we use Cert-Manager to automate certificate renewals, and it has been working flawlessly. However, with GoDaddy's recently imposed restrictions (which I’m sure many of you are aware of), we’re looking to migrate our domains to a DNS provider with an API that doesn’t have such limitations.

Can anyone recommend a DNS provider that integrates well with Cert-Manager to continue automating the renewal process?

Thanks in advance for your help!

https://redd.it/1i1005z
@r_devops

Reddit DevOps

Learning platform - which one to choose?

Hi guys, I have some Linux experience, some technical support experience at a SaaS company, and about 10 months of software engineering with QA and some DevOps work with Jenkins, Terraform, and Kubernetes. They fired a lot of us at my last job as a SWE and I want to upskill. Which e-learning platform with hands-on labs do you believe would work best for me: KodeKloud, Cloud Academy, Pluralsight, or Coursera? After that I plan to create some projects of my own and upload them to GitHub.

https://redd.it/1i10mte
@r_devops

Reddit DevOps

Docker image optimisation with docker-repack

Tom Forbes from GitGuardian recently published a tool to optimize Docker image size and download speed: docker-repack. From his benchmark, the results seem promising, with up to 8x faster downloads and 9x smaller images. The average reduction is closer to 2-3x.

He published some details in a blog post: https://blog.gitguardian.com/demystifying-docker-optimizing-images/.

I'm not a Docker internals expert but that seems like quite an improvement. I wonder if this could be available as an option to docker build at some point. Do you really want to do that in production in the first place? My gut feeling says yes, but there might be hidden downsides.

https://redd.it/1i138hm
@r_devops

GitHub

GitHub - orf/docker-repack: Repack docker images to optimize for pulling speed.

Repack docker images to optimize for pulling speed. - orf/docker-repack

Reddit DevOps

How to prepare for 'pairing' exercises?

I've had a number of recruitment processes where I've been asked to do a pairing exercise and not done well. I'm wondering how I can better prepare for these. I'm a platform engineer with 10 YOE.

Typically, in my experience, any pairing exercise comes after the stages of meeting the team or hiring manager and system design. Tentative conclusion: my interview technique isn't awful if I get through those. I am asked to log into some remote environment and/or work via screenshare. This is typically homemade rather than SaaS and often poorly integrated, e.g. high latency, low-resolution screensharing with mismatched key bindings. I had one where I was using a Mac to access an Ubuntu desktop system and all the key bindings were Windows... Bye bye any extensions, local snippets etc. that I would normally use.

People claim that they want me to 'just tackle this as you normally would' when, besides the above, what they actually mean is 'we want to see you access it exactly the way that we think that we would ourselves in some notional perfect world'. e.g. for a new error I would typically google the error message as first step or use Claude/ChatGPT. Sure, maybe you don't like what you get from an LLM but have you seen what I can do with it? Really feels like this year's version of sneering at people using VSCode rather than Vim.

The exercise is typically something incredibly specific to their particular use case rather than a general concept and often about solving a problem in a really specific way (which just happens to be their pet method that they are hoping to implement real soon now) where there might be multiple valid solutions.

Sometimes the task is something that has very little relationship to the advertised spec, e.g. some sort of pure coding exercise for a platform engineer is a favourite gatekeep for software devs - ok so you're a full-time software dev and this is something you feel strong on to assess candidates but it's a small part of what I do and not going to highlight my strengths. If you're a startup, are you really all about artisanal hand-crafted code or are you more focussed on getting stuff out the door as fast as possible that gets the job done?

As I say, I have significant experience and I absolutely can get stuff done in the real world. Ranting about the poor match to the real world isn't going to help me pass such tests. There seems to be such a randomness of environments and scenarios that I struggle to see how I can prepare better. Any tips?

https://redd.it/1i13e9c
@r_devops

Reddit DevOps

Senior devs, here's a FREE workshop on release management!

# What it is not:

This isn’t another 101 session—You'll get advanced insights tailored to engineers operating at scale. Whether you’re managing large-scale production systems or refining your team’s delivery processes, this workshop will deliver actionable takeaways you can implement immediately.

# What it is:

We’re hosting a free workshop for experienced engineers and engineering leaders managing complex systems and an AMA session focused on scaling release management processes.

You will learn directly from leaders who’ve optimized software delivery in some of the most demanding

# Meet the Experts:

- Ankit Jain: CEO and co-founder of Aviator, a developer productivity startup. Ankit is a former Google engineer with extensive experience leading engineering teams and building efficient release pipelines.

- Vilas Veeraraghavan: Former Engineering Leader at Netflix, Walmart, Bill . com, and TruckStop. With deep expertise in scaling CI/CD, chaos engineering, cloud-native systems, and DevEx tooling, Vilas has delivered solutions in industries ranging from streaming to logistics.

## What to Expect:

🔍 Analyze Key Challenges:
Get clarity on common pitfalls in release cycles, including:
- Streamlining deployments and rollbacks.
- Managing production risks and distributed systems at scale.
- Identifying bottlenecks that slow delivery in high-performing teams.

🔧 Learn Scalable Best Practices:
Discover actionable strategies for:
- Automating release workflows tailored to complex infrastructures.
- Improving deployment visibility for better incident management.
- Managing service-specific release processes in diverse team setups.

💡 Interactive Problem-Solving Session:
Engage directly with our speakers and an open AMA to tackle your toughest challenges.

Here's the RSVP link with more info

See you there! 👨‍💻👩‍💻

https://redd.it/1i16qqx
@r_devops

lu.ma

How Netflix and Walmart Mastered Fast and Reliable Software Releases: Best Practices & Pitfalls · Luma

Join us for a 1-hour interactive workshop and AMA session to elevate your release management process, featuring insights from Ankit Jain, CEO of Aviator, and…

Reddit DevOps

Options for in-house container (potential VM) platform

Most of our production workloads are in the cloud but we have a legacy setup spanning back nearly 20 years in-house that we are trying to modernize.

I'm looking to shift most development/staging to containers. I have a decent amount of experience with containers/Docker etc. but not with orchestration, Kubernetes etc. Nomad seems like a decent option, but I'm wary about getting into bed with HashiCorp too.

I'm looking at options for a smaller environment without having to get super deep into the complexities of Kubernetes. I've seen Nomad mentioned as well as minikube, k3s etc. I don't know what to start with.

Also, our VM platform is oVirt/RHEV, which is basically dead in the water, and if we continue with VMs I need to replace it with something else (Proxmox perhaps). Something that can do both VMs and containers, like OpenShift, could be an option, but I could potentially get off VMs altogether and go 100% container, or build a container platform on top of a VM cluster.

Again, since most of this setup will be for development/staging purposes it doesn't have to be super redundant but I do have the infrastructure available to do basically whatever needed.

Should I bite the bullet and go straight to k8s or look at other alternatives?

https://redd.it/1i17cqk
@r_devops

Reddit DevOps

Do you guys enjoy writing terraform?

For those building in the cloud and working in smaller orgs, do you actually enjoy writing Terraform? I find that I would enjoy my job much more if I could just focus on building out features instead of splitting my focus between development, cloud training, and infra buildout.

Is there anything you guys use for self-service? I recently wanted to do a PoC on AWS ECS but then had to deal with the headache of figuring out the right internal module version to use and then running it before I was able to start working on my PoC.

https://redd.it/1i194v0
@r_devops

Reddit DevOps

anyone here setup bitnami kafka + cert manager + istio ingress kubernetes gateway API?

I am trying to figure out how to actually connect to it using the URL. I have it running in the cluster now. ChatGPT is sending me down a rabbit hole, so I'm requesting help from my fellow humans. I would appreciate it if anyone could share their setup.

https://redd.it/1i1ccha
@r_devops

Reddit DevOps

Devops career growth

I recently got moved from being a fed developer into leading a small team of contractors to build agnostic pipelines for a large organization. I am concerned that I may have just been given busy work because I’m female… I guess I am looking for some reassurance that there is still potential for a lot of growth as a DevOps engineer. Opinions?

https://redd.it/1i1e70z
@r_devops

Reddit DevOps

Logstash alternatives

Logstash has been my go-to tool for ETLs for most of my professional career. Either it has already been in place as the ETL process, or the destination has been an Elasticsearch cluster, making it the easiest choice to implement. I've never actually looked at any alternatives. Does anyone have any recommendations?

https://redd.it/1i1bo9e
@r_devops

Reddit DevOps

Does GCP consider image downloads as outgoing traffic?

I wanted to clarify something with more knowledgeable people. I have a website with a lot of images that are loaded by the frontend; there are no requests to the backend.

Does GCP consider images being loaded by the frontend as outgoing traffic from the server? I know it's a stupid question, but I just don't understand it anymore.

Every month I receive a bill of $120 for about 700-800 GB of outgoing traffic from my server in Europe to America.

At the same time, the project hosted on my server does not send requests anywhere; I did not write any such methods and I do not need them.
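
As a quick sanity check on the numbers above, the implied price per GB can be worked out directly. This is plain arithmetic on the figures in the post. It assumes the whole $120 is egress and says nothing about GCP's actual price list.

```python
# Figures taken from the post; everything else is plain division.
monthly_bill_usd = 120.0
egress_gb_low, egress_gb_high = 700.0, 800.0

rate_high = monthly_bill_usd / egress_gb_low   # fewer GB -> higher implied rate
rate_low = monthly_bill_usd / egress_gb_high   # more GB  -> lower implied rate

print(f"Implied rate: ${rate_low:.3f} to ${rate_high:.3f} per GB")
```

Every image a visitor's browser downloads is a response sent from the server, so it counts as data leaving the server even when the application code never initiates a request itself.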

https://redd.it/1i1g8cm
@r_devops

Reddit DevOps

What do people expect from DevOps/SRE at 150k+ base salary positions?

I am wondering what technical areas one should currently focus on to land a high-paying job. I am mostly talking about US salaries because I haven't seen such high ones in Europe or elsewhere. Is it simply something like Kubernetes and containerization overall, common IaC tooling, clouds, Ansible, and logging, i.e. just basic DevOps stuff but with deeper understanding? Is it something more specific or foundational like NALSD, DSA, or OS? Or maybe it's just about matching a job that is looking for a person with deep knowledge of one particular topic?

Please share your experience or observations!

https://redd.it/1i1hcjz
@r_devops

Reddit DevOps

Your blue-green deployment approach

Is anyone here using AWS CDK to do blue-green deployment via CI/CD self-service? If so, how are you doing it? I was thinking about the state that CloudFormation keeps about the resources it has already deployed. How does blue-green work in that case? Also, are you happy you used AWS CDK to build your automated CI/CD pipeline?

Or maybe I should be open to other ideas aside from AWS CDK, Terraform, and OpenTofu. How did you build your automated CI/CD pipeline? How are your developers using it to deploy their resources?
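
For readers less familiar with the pattern, the core of blue-green can be sketched independently of any tool. The toy model below is an assumption-laden illustration only: the class `BlueGreen`, its fields, and the `healthy` callback are hypothetical stand-ins, not AWS CDK, CloudFormation, or Terraform APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class BlueGreen:
    """Tracks two environments and which one receives live traffic."""
    live: str = "blue"
    versions: Dict[str, str] = field(
        default_factory=lambda: {"blue": "v1", "green": "v1"}
    )

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def release(self, version: str, healthy: Callable[[str], bool]) -> bool:
        """Deploy to the idle environment; switch traffic only if it is healthy."""
        target = self.idle
        self.versions[target] = version   # stand-in for the real deployment step
        if not healthy(target):           # stand-in for smoke tests / health checks
            return False                  # live environment was never touched
        self.live = target                # stand-in for the router / DNS / LB switch
        return True


bg = BlueGreen()
ok = bg.release("v2", healthy=lambda env: True)       # green becomes live
failed = bg.release("v3", healthy=lambda env: False)  # blue fails checks, no switch
print(bg.live, bg.versions, ok, failed)
```

The point of the sketch is that both environments exist side by side in whatever state the tool tracks, and a release only changes the idle one plus a single pointer to the live one.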

https://redd.it/1i1i3ja
@r_devops

Reddit DevOps

Need help about DePIN powered server uptime manager

For a while, we’ve been developing a DePIN-powered uptime monitoring tool designed to potentially handle data from millions of devices. Our current infrastructure monitoring and uptime management service, Checkmate, is evolving to include DePIN integration. This will allow users to burn tokens to access data from the UpRock DePIN network.

This is currently how it works under the hood:

- Connect your wallet

- Select the server you want to monitor

- Choose a geographic focus (specific cities, countries, or entire continents) for Checkmate to send ping messages

While managing large volumes of data isn’t an issue at this stage, visualization remains a challenge. We’ve implemented MapLibre to display the data, giving users the flexibility to send one-off ping requests to the DePIN network or schedule continuous checks (e.g., every minute).

Given the novelty of this concept (similar to RIPE Atlas), visualizations will play a critical role for admins. Here's what we can currently offer on the dashboard:

- Node distribution on a map: Visualize the number of nodes per country.

- Selective probing: Choose probes directly on the map.

- Probe details: View all probes selected for a specific server.

- One-off ping tests: Perform immediate connectivity checks.
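
To make the first dashboard item concrete, here is a minimal sketch of the aggregation behind a nodes-per-country view. The record shape (`id`, `country`) and the sample data are hypothetical and chosen for illustration. They are not the actual Checkmate or UpRock data model.

```python
from collections import Counter
from typing import Iterable, Mapping


def nodes_per_country(probes: Iterable[Mapping[str, str]]) -> Counter:
    """Count probes by their country code."""
    return Counter(p["country"] for p in probes)


# Hypothetical sample records; real probe data would come from the network.
sample_probes = [
    {"id": "p1", "country": "DE"},
    {"id": "p2", "country": "US"},
    {"id": "p3", "country": "DE"},
    {"id": "p4", "country": "BR"},
]

counts = nodes_per_country(sample_probes)
for country, n in counts.most_common():
    print(country, n)
```

A map layer could then colour each country by its count, and the same grouping would work for cities or continents by swapping the key.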

I need some feedback on how to move ahead. Since we are just a few weeks away from the general release, it would be great if I could get some thoughts. We’re considering whether this is the right balance of features or if adjustments are needed.

My immediate questions would be:

- If you had access to a global DePIN network for server monitoring, what would you prioritize seeing on the dashboard?

- Would you be interested in seeing historical logs, such as access logs going back to a specific time?

- Would you want to customize the packet size (set the size of the packets being sent)?

There are probably other questions coming, but I would like to start with a small UI set initially.

https://redd.it/1i1jzck
@r_devops

Checkmate

Checkmate - Open source infrastructure monitoring

Monitor your servers, websites, Docker containers, and infrastructure with Checkmate. Open-source, self-hosted, and built for teams who value control.
