Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
AWS network automation

I find myself in the funny position of having to redo part of our network in AWS. We have two parts: the newer one uses transit gateways centralized in a single account, and the older one uses VPC peering between many accounts/VPCs. We try to use Terraform for everything. That said, how the $%^&* do you automate transit gateways?

In Terraform, I have taken the following steps in the past:

1) Go into the product's Terraform repo and run our attachment module, which outputs the transit gateway attachment ID.

2) Go into the centralized network account repo, add the CIDR/attachment ID under a region in a large JSON file, and run it. It associates the attachment ID with a route table (non-prod vs. prod) and adds a static route to the CIDR in other regions as needed. The Terraform module I wrote is "clever", and Kernighan's law makes it difficult for me to debug problems, even with the fewer than 100 VPCs we have now.
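One way to make step 2 less "clever" is to move the routing logic out of HCL into a small, unit-testable script that validates the JSON and emits an explicit plan Terraform can consume (for example via `jsondecode` and `for_each`). A minimal sketch, assuming a hypothetical schema of `region -> [{cidr, attachment_id, env}]` and assuming every other region needs a static route to each CIDR (the real "as needed" rule would replace that inner loop):

```python
import ipaddress
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple

# Hypothetical shape of the "large json file": region -> attachments in that region.
EXAMPLE = """
{
  "us-east-1": [
    {"cidr": "10.0.0.0/16", "attachment_id": "tgw-attach-aaa", "env": "prod"},
    {"cidr": "10.1.0.0/16", "attachment_id": "tgw-attach-bbb", "env": "nonprod"}
  ],
  "eu-west-1": [
    {"cidr": "10.2.0.0/16", "attachment_id": "tgw-attach-ccc", "env": "prod"}
  ]
}
"""


@dataclass(frozen=True)
class Association:
    region: str
    route_table: str  # "prod" or "nonprod" TGW route table
    attachment_id: str


@dataclass(frozen=True)
class StaticRoute:
    region: str       # region whose route table gets the route
    route_table: str
    cidr: str
    via_region: str   # region that owns the CIDR


def validate(config: Dict[str, List[dict]]) -> None:
    """Fail fast on overlapping CIDRs instead of debugging routes later."""
    nets = [(ipaddress.ip_network(a["cidr"]), a["attachment_id"])
            for attachments in config.values() for a in attachments]
    for i, (net_a, id_a) in enumerate(nets):
        for net_b, id_b in nets[i + 1:]:
            if net_a.overlaps(net_b):
                raise ValueError(f"{net_a} ({id_a}) overlaps {net_b} ({id_b})")


def plan(config: Dict[str, List[dict]]) -> Tuple[List[Association], List[StaticRoute]]:
    associations, routes = [], []
    for region in sorted(config):
        for att in config[region]:
            associations.append(Association(region, att["env"], att["attachment_id"]))
            # Assumption: every other region gets a static route in the same env table.
            for other in sorted(config):
                if other != region:
                    routes.append(StaticRoute(other, att["env"], att["cidr"], region))
    return associations, routes


if __name__ == "__main__":
    cfg = json.loads(EXAMPLE)
    validate(cfg)
    assocs, routes = plan(cfg)
    print(json.dumps({"associations": [asdict(a) for a in assocs],
                      "routes": [asdict(r) for r in routes]}, indent=2))
```

Because the output is plain data, a diff of it in code review shows exactly which associations and routes change before `terraform plan` even runs.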

How do people handle this with hundreds of VPCs in a way that keeps state? I can see this working with a bunch of CloudWatch Events rules and Lambda functions, but that seems very push-and-pray to me, whereas with Terraform I know what I'm getting before I apply it.

https://redd.it/1kdcirx
@r_devops
macOS, Homebrew, and Open Source Tooling

Hey guys!

Quick question for you: I've been at my job for a while now, but we just transitioned over to macOS. We were on Windows machines before, and software was always distributed through self-service software centers or pushed via org policy.
Now, however, I'm running into issues getting up and running with my dev tooling (mostly CLI tools and local cluster dev). Homebrew isn't currently an approved technology, but it's so common for installing tools that I'm not familiar with any other common patterns. I've been tasked with making an argument to allow it for the devs on my team.
I'm anticipating that security folks and others will be highly skeptical because, as far as I'm aware, they cannot "own" the software that gets installed that way. The current pattern would have me contact the helpdesk to get software installed via .pkg or otherwise distributed.

Other package managers are currently allowed, like conda, npm, yarn, etc., but I know it's not quite an apples-to-apples comparison.

What arguments would you make to allow Homebrew into the ecosystem? Are any of your jobs able to track what's installed accurately? I'm assuming the local MDR/AV software would pick up something.
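On the "can we track what's installed" question, one argument for security is that Homebrew state can be inventoried and checked against an allowlist. A minimal sketch, assuming a hypothetical team allowlist and assuming `brew list --versions` prints one `name version [version ...]` line per package (whether casks are included may depend on the Homebrew version):

```python
import shutil
import subprocess
from typing import Dict, List

# Hypothetical allowlist a security team might approve.
APPROVED = {"git", "jq", "kubernetes-cli", "helm", "kind"}


def parse_brew_list(output: str) -> Dict[str, List[str]]:
    """Parse lines of the form 'name version [version ...]'."""
    inventory: Dict[str, List[str]] = {}
    for line in output.splitlines():
        parts = line.split()
        if parts:
            inventory[parts[0]] = parts[1:]
    return inventory


def unapproved(inventory: Dict[str, List[str]], approved=APPROVED) -> List[str]:
    return sorted(name for name in inventory if name not in approved)


def collect() -> Dict[str, List[str]]:
    """Run `brew list --versions` if brew is on PATH; return {} otherwise."""
    if shutil.which("brew") is None:
        return {}
    result = subprocess.run(["brew", "list", "--versions"],
                            capture_output=True, text=True, check=True)
    return parse_brew_list(result.stdout)


if __name__ == "__main__":
    inventory = collect()
    print(f"{len(inventory)} packages; not on allowlist: {unapproved(inventory)}")
```

The report itself could be shipped by whatever MDM or endpoint tooling the company already runs; that part is left out here.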

https://redd.it/1kdcehg
Need Advice on scaling my platform architecture

I’m building a trading platform where users interact with a chatbot to create trading strategies. Here's how it currently works:

User chats with a bot to generate a strategy
The bot generates code for the strategy
FastAPI backend saves the code in PostgreSQL (Supabase)
Each strategy runs in its own Docker container

Inside each container:

Fetches price data and checks for signals every 10 seconds
Updates profit/loss (PNL) data every 10 seconds
Executes trades when signals occur

The Problem:
I'm aiming to support 1000+ concurrent users, with each potentially running 2 strategies — that's over 2000 containers, which isn't sustainable. I’m now relying entirely on AWS.

Proposed new design:
Move to a multi-tenant architecture:

One container runs multiple user strategies (thinking 50–100 per container depending on complexity)
Containers scale based on load
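The multi-tenant design can be sketched as one process that schedules many strategies on an event loop, with start/stop per strategy ID. A minimal asyncio sketch, assuming each strategy exposes a hypothetical async `tick(strategy_id)` that fetches prices, checks signals, and trades, and assuming that work is I/O-bound:

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical strategy hook: fetch prices, check signals, update PNL, maybe trade.
Tick = Callable[[str], Awaitable[None]]


class StrategyRunner:
    """Runs many strategies in one process, each ticking roughly every interval_s."""

    def __init__(self, interval_s: float = 10.0) -> None:
        self.interval_s = interval_s
        self._tasks: Dict[str, asyncio.Task] = {}

    def start(self, strategy_id: str, tick: Tick) -> None:
        # Idempotent: starting an already-running strategy is a no-op.
        if strategy_id not in self._tasks:
            self._tasks[strategy_id] = asyncio.create_task(self._loop(strategy_id, tick))

    async def stop(self, strategy_id: str) -> None:
        task = self._tasks.pop(strategy_id, None)
        if task is not None:
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass

    def running(self) -> List[str]:
        return sorted(self._tasks)

    async def _loop(self, strategy_id: str, tick: Tick) -> None:
        while True:
            try:
                await tick(strategy_id)
            except Exception as exc:  # one failing strategy must not stop the others
                print(f"{strategy_id}: tick failed: {exc!r}")
            await asyncio.sleep(self.interval_s)


async def demo():
    counts: Dict[str, int] = {}

    async def tick(strategy_id: str) -> None:
        counts[strategy_id] = counts.get(strategy_id, 0) + 1

    runner = StrategyRunner(interval_s=0.01)
    for i in range(50):
        runner.start(f"s{i}", tick)
    await asyncio.sleep(0.05)
    await runner.stop("s0")
    return runner.running(), counts


if __name__ == "__main__":
    running, counts = asyncio.run(demo())
    print(len(running), "running;", sum(counts.values()), "ticks total")
```

CPU-heavy strategies would block the event loop, so those would need a process pool or separate workers instead. Start/stop signals (from LISTEN/NOTIFY, SQS, Redis, etc.) would end up calling `start()`/`stop()` on whichever container owns the strategy.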

Still figuring out:

How to start/stop individual strategies efficiently — maybe an event-driven system? (PostgreSQL on Supabase is currently used, but not sure if that’s the best choice for signaling)
How to update the database with the latest price + PNL without overloading it. Previously, each container updated PNL in parallel every 10 seconds. Can I keep doing this efficiently at scale?
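For the PNL writes, one common pattern is to coalesce updates in memory (keeping only the latest value per strategy) and flush them in one transaction per interval, so the database sees one batched write per container every 10 seconds instead of one write per strategy. A minimal sketch, using SQLite's in-memory database as a stand-in for PostgreSQL and a hypothetical `strategy_pnl` table; the `INSERT ... ON CONFLICT ... DO UPDATE` upsert syntax also exists in PostgreSQL (with driver-specific placeholders):

```python
import sqlite3
import time
from typing import Dict, Tuple

# Hypothetical table; sqlite3 stands in for PostgreSQL in this sketch.
SCHEMA = """CREATE TABLE IF NOT EXISTS strategy_pnl (
    strategy_id TEXT PRIMARY KEY,
    pnl REAL,
    last_price REAL,
    updated_at REAL
)"""

UPSERT = """INSERT INTO strategy_pnl (strategy_id, pnl, last_price, updated_at)
VALUES (?, ?, ?, ?)
ON CONFLICT(strategy_id) DO UPDATE SET
    pnl = excluded.pnl,
    last_price = excluded.last_price,
    updated_at = excluded.updated_at"""


class PnlBuffer:
    """Keeps only the latest PNL per strategy and writes all of them in one transaction."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.pending: Dict[str, Tuple[float, float]] = {}  # strategy_id -> (pnl, price)

    def record(self, strategy_id: str, pnl: float, price: float) -> None:
        self.pending[strategy_id] = (pnl, price)  # newer values overwrite older ones

    def flush(self) -> int:
        """Write pending rows (e.g. once per 10 s tick); returns how many were written."""
        if not self.pending:
            return 0
        now = time.time()
        rows = [(sid, pnl, price, now) for sid, (pnl, price) in self.pending.items()]
        with self.conn:  # commits on success, rolls back (and keeps pending) on error
            self.conn.executemany(UPSERT, rows)
        self.pending.clear()
        return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    buf = PnlBuffer(conn)
    buf.record("s1", 10.0, 100.0)
    buf.record("s1", 12.5, 101.0)  # overwrites the first s1 value before flushing
    buf.record("s2", -3.0, 99.0)
    print(buf.flush(), conn.execute("SELECT * FROM strategy_pnl ORDER BY strategy_id").fetchall())
```

With 2000 strategies, that is at most about 2000 rows per 10 seconds in total (roughly 200 row updates per second), arriving as a few batched transactions rather than thousands of independent writes.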

Questions:

1. Is this architecture reasonable for handling 1000+ users?
2. Can I rely on PostgreSQL LISTEN/NOTIFY at this scale? I read it uses a single connection — is that a bottleneck or a bad idea here?
3. Is batching updates every 10 seconds acceptable? Or should I move to something like Kafka, Redis Streams, or SQS for messaging?
4. How can I determine the right number of strategies per container?
5. What AWS services should I be using here? From what I gathered from ChatGPT, I need to:
Create a Docker image for the strategy runner
Push it to AWS ECR
Use Fargate (via ECS) to run it
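For question 4, a rough starting point is to bound strategies per container by both CPU and memory, then confirm with a load test. The sketch below does that arithmetic; every number (tick cost, memory per strategy, base overhead, 60% target CPU utilization) is a placeholder assumption to be replaced with measured values:

```python
import math


def strategies_per_container(
    vcpu: float,
    mem_mb: float,
    tick_cpu_ms: float,
    mem_per_strategy_mb: float,
    base_mem_mb: float = 512,
    interval_s: float = 10.0,
    target_util: float = 0.6,
) -> int:
    """Largest strategy count that fits both the CPU and the memory budget."""
    # CPU-milliseconds available per interval, leaving headroom via target_util.
    cpu_budget_ms = vcpu * interval_s * 1000 * target_util
    by_cpu = math.floor(cpu_budget_ms / tick_cpu_ms)
    by_mem = math.floor((mem_mb - base_mem_mb) / mem_per_strategy_mb)
    return max(0, min(by_cpu, by_mem))


def containers_needed(total_strategies: int, per_container: int) -> int:
    return math.ceil(total_strategies / per_container)


if __name__ == "__main__":
    # Placeholder inputs: 1 vCPU / 2 GB task, 50 ms CPU per tick, 20 MB per strategy.
    n = strategies_per_container(vcpu=1, mem_mb=2048, tick_cpu_ms=50, mem_per_strategy_mb=20)
    print(n, "strategies per container;", containers_needed(2000, n), "containers for 2000")
```

With those placeholder numbers, a 1 vCPU / 2048 MB container is memory-bound at 76 strategies (CPU alone would allow 120), so 2000 strategies need 27 containers.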

https://redd.it/1kdftny
🚨 DevOps Interview in 2 Days with Zero Experience – Need Your Guidance!

Hey r/devops community,

I'm reaching out for some advice. I have an interview for a DevOps internship in just two days. My background includes basic knowledge of Git, Linux, and Python, but I have no prior experience in DevOps.

Given the limited time, what key areas should I focus on to make the most of my preparation? Any resources, tips, or guidance would be greatly appreciated.

Thank you in advance for your support!

https://redd.it/1kdk7va
Redis is open source again?

Redis seems to be Open Source again!!!

With Redis 8, Redis is going back to an open source license (AGPLv3).

Source: https://thenewstack.io/redis-is-open-source-again/

Guys, let's discuss this. Is this real?

https://redd.it/1kdlg94
As a DevOps Engineer, do I need to know databases?

That's pretty much the question. How important is it to know databases to be a better DevOps Engineer? Mind you, I'm already a DevOps Engineer, but I barely touch anything database-related, or even networking-related, TBH. Networking aside, how important is it to know databases? I know databases (Postgres and MSSQL) a bit; do I need to know a whole lot more?

https://redd.it/1kdrpcq
I made bikya for selling used products and real estate, please check it out!

I made it entirely in PHP. Any tips would be helpful.

https://bikya.infy.uk/

https://redd.it/1kdvift
Cobbler/Chef Educational Resources

I’m a network engineer by day and a part-time lab assistant in the evening to earn a few extra bucks. Within the next 90 days, they want to get me spun up on assisting with tickets, as the physical lift-and-rack work and cable audit are wrapping up. They use Cobbler and Chef today and asked me to start learning them; I’ve never touched either. Are there any good resources or recommendations for getting the basics down? I have some familiarity with Ansible, but that’s it.

https://redd.it/1kdv75y
What is k8s in bare metal?

Newbie understanding: if I'm not mistaken, k8s on bare metal means deploying/managing a k8s cluster on a single-node server. In other words, the control plane and node components are on a single server.

However, in managed k8s services like AWS (EKS) and DigitalOcean (DOKS), I see that the control plane and node components can be on different servers (multi-node).

So that means EKS and DOKS are more suitable for complex setups, and bare metal for more manageable ones.

I'll appreciate any knowledge/answer shared for my question. TIA.

https://redd.it/1kdy5af
Jira time logging for DevOps

I work at a big company, and we are required to log the time we spend on Jira tickets to measure our productivity and for other management reports. Sometimes I work the full 8 hours, but most of the time I finish my tasks and sit idle for most of the day. So sometimes I fake the logged hours so they see that I'm fully utilized. I've raised this with my manager, and he said to fill my backlog and improve the system. I get that I can find some things to improve, but that won't always be the case, and I'll still end up with some idle time.

So my questions for you are:
Do you face similar situations at your company? What does it look like?
How do you measure the productivity of the team?
Is logged time a good measure of engineers' productivity?
Any other thoughts? :) Thanks

https://redd.it/1kdxiak
From Rejection to Redemption: How I Broke Into DevOps

Guys, I'm sitting in my backyard on a beautiful Saturday, about to sign an offer letter with a Fortune 500 company, with a 25% salary increase.

But just a few months ago, I was getting rejected from interviews that didn’t even last 10 minutes. I was so embarrassed by how badly I did in those interviews. With over a decade in IT (supporting Windows and Linux systems, solving tough problems, and holding a high-level security clearance), I thought I had a solid foundation. But in the world of DevOps, I kept hearing the same message:

“You don’t have enough experience.”

“You’re not worth senior-level DevOps pay.”

And ironically, being a high earner already seemed to work *against* me.

I was turned down from at least eight interviews. Some didn’t even give me a chance to speak. I started doubting myself — hard.

So when another recruiter reached out, I told her:

"I don’t want to waste your team’s time. My background might not align."

She said:

"Actually, we really like what we see. Let’s get you in front of the hiring manager."

After the first interview with the **hiring manager**, I asked for **two weeks** to prepare for the technical round — not to delay, but because I was *determined* not to fail again.

At that point, I didn’t even have a home lab. But I went all in.

In those two weeks:

- Built a full homelab from scratch

- Deployed the Sock Shop app using ArgoCD

- Provisioned infrastructure with Terraform

- Set up monitoring with **Prometheus, Grafana, and Kuberhealthy**

- Studied nonstop for a HackerRank assessment, a platform I had never heard of

- **Watched DevOps interview Q&A videos on YouTube while driving, even while taking my dog to the vet**

- **Skipped volleyball (something I love) and turned down social invites from friends just to stay locked in**

The **technical interview was round 2 of 4**, but after one hour of walking through my setup, architecture, and decisions, they said:

"We’re skipping the rest. We're making you an offer."

That moment changed everything.

**My clearance didn’t get me here. My title didn’t. My past salary didn’t.**

But *grit, sacrifice, and proof of ability* did.

And the cherry on top? I’ll get to **work from home eventually** — a goal I’ve had for years.

To anyone trying to break into DevOps:

Don’t wait until you’re “ready.”

**Start building, start learning, and never stop showing up.**

Your breakthrough might be closer than you think.

Sorry, English isn't my first language, and I used ChatGPT to help me write this, but it's truly my experience. So good luck out there; if I can make it, you can!!!! Cheers!!!

https://redd.it/1ke1fjq
Time-based permissions

What tools are you using for managing time-based temporary permissions, such as AWS/GCP accounts, databases, SSH access, etc.?

Looking for a solution for managing permissions for people accessing restricted resources.
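Whatever tool you pick, the core mechanism is usually the same: each grant carries an expiry, access checks honor it, and a periodic job revokes expired grants at the source. A minimal sketch of that model, where the resource naming and the `revoke` callback (for example, detaching a policy or removing an SSH key) are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional


@dataclass
class Grant:
    user: str
    resource: str  # e.g. "aws:prod-account" or "ssh:bastion" (hypothetical naming)
    expires_at: datetime

    def active(self, now: datetime) -> bool:
        return now < self.expires_at


@dataclass
class GrantStore:
    grants: List[Grant] = field(default_factory=list)

    def grant(self, user: str, resource: str, ttl: timedelta,
              now: Optional[datetime] = None) -> Grant:
        now = now or datetime.now(timezone.utc)
        g = Grant(user, resource, now + ttl)
        self.grants.append(g)
        return g

    def is_allowed(self, user: str, resource: str, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        return any(g.user == user and g.resource == resource and g.active(now)
                   for g in self.grants)

    def reap(self, revoke: Callable[[Grant], None], now: Optional[datetime] = None) -> int:
        """Call revoke() for each expired grant, drop it, and return how many were revoked."""
        now = now or datetime.now(timezone.utc)
        expired = [g for g in self.grants if not g.active(now)]
        for g in expired:
            revoke(g)
        self.grants = [g for g in self.grants if g.active(now)]
        return len(expired)
```

Where the platform supports it natively, expiry can also live in the policy itself, e.g. the `aws:CurrentTime` condition key in AWS IAM or conditions on `request.time` in GCP IAM, so access ends even if the reaper job fails.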

https://redd.it/1ke2w14
Need Guidance for Amazon Systems/DevOps Engineer Interview (Cloud Support Background)


Hope you're all doing well.

I'm currently working as a Cloud Support Engineer and have managed to land an interview with Amazon for a Systems/DevOps Engineer role. While I’m excited, I’m also feeling a bit stressed—mainly because I haven’t officially worked as a Systems or DevOps Engineer before.

The interview email was pretty detailed (and a little overwhelming). As most of you know, the world of DevOps is huge—tons of tools, technologies, and concepts—and it’s tough to gain hands-on experience with all of them. To top it off, the interview includes live coding sessions, which has me even more anxious.

The below qualifications are mentioned in the job description:

• Proficient at executing standard operating procedures and following operational best practices
• Knowledge of scripting processes in a language such as Bash, Python, or Ruby or coding software applications in a modern language such as Java, TypeScript, or similar
• Experience working cross-organizationally and leading strategic team efforts requiring work from multiple team members
• Experience performance tuning software applications and optimizing fleet utilization
• Experience with Infrastructure as Code (such as CDK, CloudFormation, Puppet, Chef, Ansible, or similar)

I’m using the prep material Amazon provided, but I’d love any advice on what to focus on—specific tools, topics, or concepts that are likely to come up. Also, if anyone has insight into the kind of coding questions typically asked, that would be super helpful.
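Nobody outside Amazon can say what the live coding will contain, but the scripting bullet in the job description suggests ops-style tasks. As an illustration only (not a known Amazon question), a typical exercise is summarizing a web access log; the log format below is an assumption:

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed format: 10.0.0.1 - - [01/May/2025:10:00:00 +0000] "GET /api HTTP/1.1" 503 123
LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+')


def summarize(lines: Iterable[str], top: int = 3) -> Tuple[Counter, List[Tuple[str, int]]]:
    """Count status codes and return the IPs that produced the most 5xx responses."""
    statuses: Counter = Counter()
    errors_by_ip: Counter = Counter()
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip malformed lines instead of crashing
        status = m["status"]
        statuses[status] += 1
        if status.startswith("5"):
            errors_by_ip[m["ip"]] += 1
    return statuses, errors_by_ip.most_common(top)


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [01/May/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
        '10.0.0.2 - - [01/May/2025:10:00:01 +0000] "GET /api HTTP/1.1" 503 0',
        "not a log line",
    ]
    print(summarize(sample))
```

Interviewers often care as much about the edge cases (malformed lines, large files read line by line) as about the happy path.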

Any resources, tips, or just general encouragement would be massively appreciated!

Thanks in advance, and apologies if this isn’t the right place to post.

https://redd.it/1ke1vlz
Upwind's Cloud Security CNAPP. Is it viable?

Can anyone share their real-world experience implementing Upwind's "Runtime-Powered" Cloud Security Platform?

The promise of using real-time runtime data (I think they use eBPF sensors?) to focus only on actual threats and drastically cut alert fatigue – supposedly by 95% – sounds incredibly appealing, especially for teams drowning in alerts from native tools or older solutions. They also talk about 10x faster root cause analysis.

But what's the reality? What are you giving up? Is the eBPF approach truly agentless and low-overhead as claimed, or is there hidden complexity? Do its coverage and visibility really stack up against established agentless players when it comes to posture management, vulnerability scanning, and workload protection all rolled into one?

I'm also interested in the value ($) proposition and how it compares in practice to vendors like Wiz or Orca. Is it genuinely simplifying vulnerability management and threat detection effectively?

https://redd.it/1ke5wbp
Where to get started

Hello, I’m a long-time admirer of this forum. I’m a “junior DevOps engineer” in the financial field and was previously a mid-level software engineer. I’ve been doing so-called DevOps work for about a year now: I’m assigned to a team where I manage their pipelines, but I feel like I’m not doing real DevOps. I’ve been studying outside of work to get more exposure to the field, but I want to know if there are any seniors here who can point me in the right direction to get more exposure to DevOps technologies. At my job, we don’t use many DevOps technologies. I am starting a new project at work on Monday, so hopefully I will get exposure to more of them. Any pointers would be helpful.


https://redd.it/1ke8cl1
Why did it take OpenAI 24 hours to roll back a faulty model?

Hi everyone,

I read through an article by OpenAI and stumbled upon the following segment:

>With the recent GPT‑4o update, we started the rollout on Thursday, April 24th and completed it on Friday, April 25th. We spent the next two days monitoring early usage and internal signals, including user feedback. By Sunday, it was clear the model’s behavior wasn’t meeting our expectations.

>We took immediate action by pushing updates to the system prompt late Sunday night to mitigate much of the negative impact quickly, and initiated a full rollback to the previous GPT‑4o version on Monday. The full rollback took around 24 hours to manage stability and avoid introducing new issues across the deployment.

>Today, GPT‑4o traffic is now using this previous version. Since the rollback, we've been working to fully understand what went wrong and make longer-term improvements.

I am just a developer who uses services like Vercel for deployment (or, in a more professional context, Azure Web Apps). Of course, I understand that for a larger user base, more servers have to be migrated and that this can take longer. However, 24 hours feels like a long time to me, and I would like to understand what exactly takes that long in the process. Does anyone have insights or information on this?
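OpenAI's post doesn't say what the 24 hours were spent on beyond managing stability. One plausible contributor in large fleets is that rollbacks are themselves staged: traffic moves back in waves, with a bake (observation) period between waves to catch regressions before the next step. A minimal sketch of that arithmetic, where the wave percentages and durations are hypothetical and not OpenAI's:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Wave:
    percent: int    # cumulative % of traffic back on the previous version after this wave
    start_h: float  # hours after the rollback began


def plan_rollback(waves_pct: List[int], deploy_h: float, bake_h: float) -> List[Wave]:
    """Each wave rolls out for deploy_h, then is observed for bake_h before the next wave."""
    plan, t = [], 0.0
    for pct in waves_pct:
        plan.append(Wave(pct, t))
        t += deploy_h + bake_h
    return plan


def total_hours(waves_pct: List[int], deploy_h: float, bake_h: float) -> float:
    # The final wave also gets its own rollout and bake before the rollback is declared done.
    return len(waves_pct) * (deploy_h + bake_h)


if __name__ == "__main__":
    waves = [1, 5, 10, 25, 50, 100]  # hypothetical wave sizes
    for w in plan_rollback(waves, deploy_h=1.0, bake_h=3.0):
        print(f"t+{w.start_h:>4.1f}h -> {w.percent}%")
    print("total:", total_hours(waves, 1.0, 3.0), "hours")
```

With six waves at 1 hour of rollout plus 3 hours of observation each, the rollback takes 24 hours even though no single step is slow.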

Thank you :)

https://redd.it/1kedugg
DevSecOps / AI CTF today - Ctf.punksecurity.co.uk

Our CTF runs today, with entry-level and difficult challenges across DevSecOps and AI. It's free to play, with some prizes for the best teams.

CTFs are small, competitive, puzzle-based games designed to expose you to different tech and make you think in different ways. In our case, that means CI/CD attacks and AI prompt injection attacks :)

https://ctf.punksecurity.co.uk



https://redd.it/1keev01
What would you be willing to pay for at your company?

Over the years, we’ve seen several licensing dramas and ongoing debates even on this sub — the latest being Redis becoming open source again.

Someone once said: “I'm fine with companies making money from software” — and I’d say that’s the bare minimum.

But the real question is: what would your company actually be willing to pay for? Just compute power? Services? Or even open source software?

If it's the latter: what are you looking for? Suppose a piece of software simply works, has decent documentation, and no major feature gaps — would you still be willing to support it financially?

How do you evaluate offerings that get paid for packaging and delivery, like Linkerd or Chainguard? This is what I'm currently pursuing: I release and package only the latest version for free, so you can try it and test it, but you would never go to production with software that isn't version-pinned. So I can offer stable, version-pinned releases (always based on upstream, no forks) with an SBOM, a detailed changelog, and upgrade instructions if required.
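On the point that nobody should run unpinned software in production: teams often enforce that with a CI check on container image references. A minimal sketch, assuming "pinned" means either a `@sha256:` digest or a full `MAJOR.MINOR.PATCH` tag (a tag can still be re-pushed, so a digest is the stronger guarantee):

```python
import re

DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")
# Full MAJOR.MINOR.PATCH tag, with an optional leading "v" and suffix such as "-alpine".
VERSION_TAG = re.compile(r":v?\d+\.\d+\.\d+([-+][0-9A-Za-z.\-]+)?$")


def is_pinned(image_ref: str) -> bool:
    """True if the image reference uses a digest or a full semantic-version tag."""
    if DIGEST.search(image_ref):
        return True
    # Look only at the last path segment so a registry port (host:5000/...) isn't taken for a tag.
    name_and_tag = image_ref.rsplit("/", 1)[-1]
    return bool(VERSION_TAG.search(name_and_tag))


if __name__ == "__main__":
    for ref in ["redis:7.2.4", "redis:latest", "redis:7", "localhost:5000/redis"]:
        print(f"{ref}: {'pinned' if is_pinned(ref) else 'NOT pinned'}")
```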

https://redd.it/1keei1p
Canary like deployments for Custom Resources?

Why is there no canary-style deployment orchestrator for Custom Resources with quality gate analysis?

AFAIK, Flagger, Keptn (which has some maintenance problems), and Argo Rollouts are tightly bound to vanilla K8s resources and Ingress in general. But what if I want to deploy a Custom Resource, then check metrics, then run some custom action, and eventually promote "the deployment"? Of course, I know what canary is and what traffic shifting is.

Like, how are you versioning and deploying Workflows for batch operations? I want to test a new version by using it for 10% of workloads, then promote it incrementally based on a quality gate check (Prometheus metrics in this case).
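The controller logic being asked about is fairly small once traffic shifting is replaced by "share of workloads". A minimal sketch, where `set_weight` (send N% of batch workloads to the new Workflow version) and `error_rate` are hypothetical hooks, and the helper parses a Prometheus `/api/v1/query` instant-vector response (each series carries a `value` pair of timestamp and string value):

```python
import json
from typing import Callable, List


def parse_prometheus_scalar(body: str) -> float:
    """Extract the single value from a Prometheus /api/v1/query instant-vector response."""
    data = json.loads(body)
    if data.get("status") != "success":
        raise RuntimeError(f"query failed: {data.get('error')}")
    result = data["data"]["result"]
    if len(result) != 1:
        raise ValueError(f"expected exactly one series, got {len(result)}")
    return float(result[0]["value"][1])  # value is [unix_ts, "string_value"]


def progressive_promote(
    steps: List[int],
    set_weight: Callable[[int], None],  # hypothetical: route N% of workloads to the new version
    error_rate: Callable[[], float],    # hypothetical: e.g. wraps a PromQL query
    max_error_rate: float,
) -> bool:
    """Raise the share step by step; roll back to 0% as soon as the gate fails."""
    for pct in steps:
        set_weight(pct)
        # A real controller would wait a bake period here before reading metrics.
        if error_rate() > max_error_rate:
            set_weight(0)
            return False
    return True
```

A real controller would also persist its progress (for example in the Custom Resource's status), so a restart doesn't lose track of where the promotion stopped.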

Thanks

Is this use case nonsense, or the

https://redd.it/1kehhvs