Reddit DevOps
266 subscribers
30.9K links
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Two choices for the career path

Dear Nerds,

I’m calling for the advice of the lord of the nerds, please hear me.

Context: I work at a SaaS company with the title Product Support Engineer. It is a combined role: 60% support and 40% DevOps tasks. Recently, I delivered the entire infrastructure and pipelines for our new product.

I got an offer from another company doing secure OT, and the position is NOC Operator / Automation Engineer.

Goal: I want to become a full-time DevOps engineer. Which of these roles would be the more relevant or easier stepping stone?

https://redd.it/1mbkges
@r_devops
Best code coverage online tool for large open source monorepo (100+ packages)?

Hi everyone,

I'm working on a large open source monorepo with over 100 packages, and I'm looking to properly set up code coverage reporting.

# Current setup:

Each package generates its own `lcov` file
I can merge them into a single root `lcov` file, if necessary.
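
For anyone wanting to sanity-check per-package numbers before choosing a hosted tool, here is a minimal sketch of reading line coverage straight from each package's `lcov` tracefile. It assumes the standard `SF:` and `DA:<line>,<hits>` records; the function names and the sample package names are hypothetical, not part of any coverage service's API.

```python
from collections import defaultdict


def line_coverage(lcov_text: str) -> tuple[int, int]:
    """Return (lines_hit, lines_found) for one lcov tracefile, counted from DA records."""
    # Key on (source file, line number) so repeated records for a line are merged.
    hits: dict[tuple[str, int], int] = defaultdict(int)
    current_file = ""
    for raw in lcov_text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            current_file = line[3:]
        elif line.startswith("DA:"):
            fields = line[3:].split(",")  # an optional third field (checksum) is ignored
            hits[(current_file, int(fields[0]))] += int(fields[1])
    lines_found = len(hits)
    lines_hit = sum(1 for count in hits.values() if count > 0)
    return lines_hit, lines_found


def coverage_by_package(reports: dict[str, str]) -> dict[str, float]:
    """Map each package name to its line coverage percentage, rounded to one decimal."""
    result = {}
    for package, text in reports.items():
        lines_hit, lines_found = line_coverage(text)
        result[package] = round(100.0 * lines_hit / lines_found, 1) if lines_found else 0.0
    return result


if __name__ == "__main__":
    # Hypothetical sample data standing in for two packages' lcov files.
    sample = {
        "pkg-a": "SF:src/a.js\nDA:1,1\nDA:2,0\nend_of_record\n",
        "pkg-b": "SF:src/b.js\nDA:1,3\nend_of_record\n",
    }
    print(coverage_by_package(sample))
```

The per-package percentages could feed a badge generator or a CI threshold check, independent of whichever online service ends up hosting the reports.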

With that, I'm looking for:

A solid online code coverage tool
Supports monorepos
Can show coverage badges per package
Integrates easily with CI (GitHub Actions)

# Questions:

What tools do you recommend? (e.g., Codecov, Coveralls, SonarCloud, others?)
Have you set up coverage reporting for a monorepo of this scale before? Any tips or lessons learned?

I’ve never handled coverage at this scale before, so any guidance, examples, or war stories would be super helpful.

Thanks in advance!

https://redd.it/1mblvnh
@r_devops
Connecting to Cloud SQL From Cloud Run without a VPC (GCP)

According to this post that was recently sent to me, it's not necessary to create a VPC, and doing so would create a network detour: traffic would go out of a GCP-managed VPC to your own VPC and back to their VPC. I'm wondering what everyone's thoughts are on this sort of network architecture, i.e. enabling peering to make this connection happen. As it stands, it seems like I wouldn't be able to use IAM auth with this method and would need dedicated Postgres credentials for my Cloud Run jobs. One, is this a valid method of making this connection happen? And two, should I actually be using dedicated credentials (instead of IAM tokens) in production? Lastly, is there any reason to do all this instead of just using a Cloud SQL Connector? In my case, regarding the connector, there is no support for psycopg yet as a database adapter, but that is soon changing. In the meantime, I'd have to use asyncpg if I wanted to use a connector.

https://redd.it/1mbngxm
@r_devops
Do OSS compliance tools have to be this heavy? Would you use one if it was just a CLI?

Posting this to get a sanity check from folks working in software, security, or legal review.
There are a bunch of tools out there for OSS compliance stuff, like:
License detection (MIT, GPL, AGPL, etc.)
CVE scanning
SBOM generation (SPDX/CycloneDX)
Attribution and NOTICE file creation
Policy enforcement

Most of the well-known options (like Snyk, FOSSA, ORT, etc.) tend to be SaaS-based, config-heavy, or tied into CI/CD pipelines.

Do you ever feel like:
These tools are heavier or more complex than you need?
They're overkill when you just want to check a repo’s compliance or risk profile?
You only use them because “the company needs it” — not because they’re developer-friendly?

If something existed that was:
Open-source
Local/offline by default
CLI-first
Very fast
No setup or config required
Outputs SPDX, CVEs, licenses, obligations, SBOMs, and attribution in one scan...

Would that kind of tool actually be useful at work?
And if it were that easy — would you even start using it for your own side projects or internal tools too?

https://redd.it/1mbowvo
@r_devops
Falling in love with problems... not tools

Time and time again, I find myself falling in love with a tool rather than the initial problem I set out to solve. This tends to lead to over-engineering because I'm constantly chasing the most optimized way to structure the codebase, create pipelines that meet each and every use case, and build scalability into every single app that might only ever have five users (I'm looking at you k8s).

I feel like it's not inherently wrong to strive for optimization or scalability. But as the saying goes: progress over perfection. Our job is to deliver what the business needs and solve problems that drive the company and broader industry forward. Sometimes I lose sight of that fundamental truth.

The infrastructure we build, the automation we create, and the systems we design are all means to an end. They're not the destination... they're the vehicle that gets us there. When we become too enamored with the elegance of our technical solutions, we risk losing sight of the business value we're supposed to deliver.

Anybody else feel this way?



https://redd.it/1mbqy6v
@r_devops
What secret management tool do you use?

We are interested in implementing this at home to securely transfer passwords and certificates from one specialist to another. The tools should have an option to be integrated with services such as Jenkins and Ansible.

Although I have not worked with this type of program before, I believe a good starting point would be to try HashiCorp Vault https://github.com/hashicorp/vault. What are your thoughts on this, and which ones do you use?

https://redd.it/1mbsyje
@r_devops
SRE / DevOps more exciting than full stack development?

Looking for some vibes-based career advice.

I'm currently a web dev at a f5000, 3 yoe, and kinda bored. Lately, I feel most engaged and satisfied when a production bug gets me into the zone, and I have to use all my mental energy to resolve it ASAP and make a meaningful difference to a user.

This happens about once a week for a few hours at a time. The rest of the time I'm babysitting GitHub copilot to do some CRUD ticket.

I know it's a pretty nice gig, grass is greener on the other side, etc etc. I am still interested in hearing some perspectives:

if you've moved from full stack web dev to SRE or DevOps, do you find the work more engaging? More secure? More lucrative? Is there downtime?

For more context, my company does not have dedicated SRE / DevOps roles. I'm planning ahead for if I get laid off, or decide to commit to upskilling for a 'better' job.

To be honest, I have a limited understanding of what SRE and DevOps roles involve. I imagine working with kubernetes, terraform, being on call a lot, etc. Do let me know if there's something I'm missing. TIA

https://redd.it/1mbv64v
@r_devops
Started a newsletter digging into real infra outages - first post: Reddit’s Pi Day incident

Hey guys, I just launched a newsletter where I’ll be breaking down real-world infrastructure outages - postmortem-style.

These won’t just be summaries, I’m digging into how complex systems fail even when everything looks healthy. Things like monitoring blind spots, hidden dependencies, rollback horror stories, etc.

The first post is a deep dive into Reddit’s 314-minute Pi Day outage - how three harmless changes turned into a $2.3M failure:

Read it here

If you're into SRE, infra engineering, or just love a good forensic breakdown, I'd love for you to check it out.

https://redd.it/1mbo3oq
@r_devops
DevOps Projects Feedback

Hi Reddit Fam!

I have been trying to create a portal that offers realistic projects people can do to get hands-on experience.

Building the portal was not the challenge; gathering quality projects in one place is. The best way I could think of to collect projects was to target various certification exams and build projects around them.

I have added a few projects, and I would appreciate your feedback on them. What other types of projects should I add? Any recommendations would be appreciated.

Website: https://bartman.ai/
Coupon code: DOCKERSEC

If something doesn’t work then let me know.

For now, I am focused on CKA certification for this week.

https://redd.it/1mc4uky
@r_devops
Anyone integrated an AI code reviewer into your CI/CD?

We just rolled out CARE — an AI-powered plugin that performs code reviews directly in your CI/CD pipelines or locally. 

It’s tailored for Guidewire/Gosu (but also supports Java or any other popular programming language) and integrates with Bitbucket/Git/Azure DevOps. 

Instead of static rule checks, CARE does:

Real-time feedback in MRs
Unit test/code generation
Inline responses to dev comments
Seamless updates with new best practices

Trying to gauge: is DevOps moving toward proactive QA with AI, or is this still too early for most teams? 

https://redd.it/1mc5obe
@r_devops
Do DevOps teams at newer companies still choose Terraform for IaC, or native IaC services (like CloudFormation/Bicep)?

Terraform has been the go-to for companies with cloud resources across multiple platforms or migrating from on-prem, because of its great cross-platform support. But for newer startups or organisations starting out in the cloud, I’d say using platform-specific IaC services is usually easier than picking up Terraform, and the platform integration is probably better too. Native tools also don’t require installing extra CLIs or managing state files.

If you're at a newer company or helping clients spin up infra, what are you using for IaC? Are platform-native tools good enough now, or is Terraform still the default?

https://redd.it/1mc7p46
@r_devops
Free DevOps Tool Developer Experience Audit

I'm offering free developer experience audits specifically focused on DevOps tools.

My background: Helped dyrectorio (deployment orchestration and container management) and Gimlet (GitOps deployment) gain significant GitHub adoption through improved developer onboarding and documentation. Not affiliated with them anymore.

I specialize in identifying friction points in CI/CD pipelines, infrastructure tooling adoption, and developer-facing automation workflows.

What I'll analyze:

Developer onboarding for your DevOps tools
CI/CD pipeline user experience and documentation
Infrastructure-as-code developer workflows
Tool integration friction points

DM me if you'd like an audit of your developer-facing DevOps processes.

https://redd.it/1mc8qna
@r_devops
Problem when fetching image via api gateway

I'm trying to use KrakenD as an API gateway. I have this endpoint on a Flask microservice (both the gateway and the microservice are containerized):

/images/<date>/<hour>/<filename>

When I fetch the image with a direct connection there are no errors. When I use the endpoint on the gateway it gives back a 404 error. This is the endpoint. I have other endpoints but those work.

        {
            "endpoint": "/api/images/{date}/{hour}/{filename}",
            "method": "GET",
            "inputparams": [
                "date",
                "hour",
                "filename"
            ],
            "backend": [
                {
                    "url_pattern": "/images/{date}/{hour}/{filename}",
                    "host":
                        "https://data_processor:8080"
                   
                }
            ]
        }


This is the configuration of the endpoint.

https://redd.it/1mc9nq4
@r_devops
Rollouts

Hello folks,

I want to understand how you guys handle rollouts.

We are hosting services on Azure.

During a rollout, we make a few manual changes to app config, KV, DB, etc., and then push services one by one to AKS. How do you handle this? Sharing different approaches would help everybody understand the options and implement them.

https://redd.it/1mc7v24
@r_devops
If I hear "treat your platform as a product" one more time...

Let's just admit it, we've all been there:
You start with a clean slate. You build a platform tailored perfectly to your org.

Custom pipelines. Custom tooling. A CI/CD “stack” that makes sense to you.

And it works… until it doesn’t.

Suddenly, your internal platform is this black box only you and your team understand.

It’s brittle, hard to onboard new people to, impossible to scale cleanly, and when something breaks, you’re reinventing the wheel again.

We all say things like “our business is unique”, “our scale is different”, “our use case is too complex”. But in reality, the foundations are the same across the board.

https://redd.it/1mcc78f
@r_devops
Tried Jenkins again, and it was not as bad as I remembered!

Hi everyone,
as the title says, I gave Jenkins another shot. The last time I used it was at my former company, with a pretty archaic setup: several VMs running Docker Engine, the Docker plugin to spin up workers, and some static servers for on-site deployments in a local datacenter. All of it glued together with some cool Ansible playbooks (still proud of those, ngl). The goal back then was to avoid the classic pet server scenario. If you know me personally, you probably know the company I worked for!

Now I gave it a fresh spin and approached it with a Kubernetes-first mindset. I deployed everything via Helm charts and used the Kubernetes plugin. And since I like working with Pulumi (and have worked for them since then), I used that too. You could likely do the same with Terraform and the Kubernetes/Helm provider.

I wrote it all down here: https://www.pulumi.com/blog/jenkins-pulumi-2025-experience/

Any "old" DevOps tech you have also given a new look/try?

https://redd.it/1mcg2kk
@r_devops
Farewell to my dad

https://blog.mattsbit.co.uk/2025/07/23/dad/

I originally wrote the speech in my blog repo, just for writing purposes.

My dad's funeral was a couple of days ago, and I wondered whether someone might appreciate it, either because they've lost their dad or because it makes them appreciate their dad a little more.
Particularly in this community, as I assume you probably grew up messing with computers and/or servers and probably had a similar influence from your dads.

https://redd.it/1mcheuv
@r_devops
Is there a proper way to get depot sizes on Perforce?

I wrote a script for our Perforce server, but soon after I ran it, the server crashed.
The server (Linux) had 4 CPUs and 8 GB of RAM and was stable until I ran my script. After the crash I doubled the resources to 8 CPUs and 16 GB of RAM.

I'm still wary of using my script below, so I'm asking how Perforce admins query depot sizes safely.


depot_sizes.sh
—————————————————

#!/bin/bash
for depot in $(p4 depots | awk '{print $2}'); do
  echo "Depot: $depot"
  p4 sizes //$depot/... | awk '{total += $4} END {print "  Total Size: " total " bytes\n"}'
done

—————————————————
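
One variation worth trying is a minimal sketch that asks the server for a single summary line per depot with `p4 sizes -s`, instead of streaming one line per file through awk. It assumes the summary line looks like `//depot/... 8 files 687 bytes`, which should be checked against your server version. The helper names are hypothetical, and whether this avoids the crash described above is untested.

```python
import re
import subprocess

# Assumed shape of a `p4 sizes -s` summary line: "<path> <N> files <M> bytes".
SUMMARY_RE = re.compile(r"^(?P<path>.+) (?P<files>\d+) files (?P<bytes>\d+) bytes")


def parse_sizes_summary(line: str) -> tuple[str, int, int]:
    """Parse one summary line into (path, file_count, total_bytes)."""
    match = SUMMARY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected p4 sizes output: {line!r}")
    return match["path"], int(match["files"]), int(match["bytes"])


def depot_size_bytes(depot: str) -> int:
    """Run `p4 sizes -s` for one depot and return its total size in bytes."""
    completed = subprocess.run(
        ["p4", "sizes", "-s", f"//{depot}/..."],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_sizes_summary(completed.stdout.splitlines()[0])[2]


if __name__ == "__main__":
    # Parsing demo on a sample line; no Perforce server is contacted here.
    print(parse_sizes_summary("//depot/... 8 files 687 bytes"))
```

Running it one depot at a time during a quiet period, and watching server load before moving to the next, keeps the experiment easy to stop if the server struggles.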


https://redd.it/1mcherq
@r_devops
DevOps Confessions

Hey guys, I just ran into something funny on YouTube and thought you might enjoy it.

Plus, AI videos are terrifying.

https://www.youtube.com/watch?v=Y1xIRAjzTjM

https://redd.it/1mcgq4j
@r_devops
Test your database backups before they fail you in production

Hey devs! 👋

Just shipped BackupGuardian - tired of backup validation tools that only check syntax but don't actually test restoration.

This one spins up Docker containers and actually restores your entire backup to see what breaks. Supports PostgreSQL/MySQL + has a CLI for CI/CD.

Built it after a 3 AM incident where a "validated" backup was missing half the constraints 😅

Demo: https://www.backupguardian.org

Anyone else been burned by "good" backups before?

https://redd.it/1mctf97
@r_devops
Good tip

I came across this tip and couldn't help but share it. It's so good and useful.

I think we should follow him and help him reach 10 followers

https://x.com/username_husan/status/1950368472258793538?s=46

https://redd.it/1mcwusn
@r_devops