Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Streamlining Secrets Management for AWS Lambda with AWS Secrets Manager & TypeScript


Hello r/devops,

I’d like to share my latest video tutorial on securing AWS Lambda functions using AWS Secrets Manager in a TypeScript monorepo. This method centralizes secret management, improves security, and ensures cost efficiency—key aspects for modern DevOps practices.

Watch the video: https://youtu.be/I5wOfGrxZWc
Access the source code here: https://github.com/radzionc/radzionkit

I appreciate any thoughts or feedback you may have. Thanks for reading!

https://redd.it/1jd4cck
@r_devops
I chose docker swarm

I wanted to know your opinion on this setup I made.

So I got hired by this company, which has a lot of mobile apps and websites. All backends were dockerized and put on one mega EC2 instance, each bound to a different port on the machine, with an nginx reverse proxy listening on the domain and sending traffic to the respective port on localhost.

The server's load was through the roof and they wanted to add more and more backends.

One more thing of relevance here: I'm the only devops guy there. The rest are backend developers with little knowledge of Docker or frontend devs with no knowledge of Docker.

The solution I proposed: Docker Swarm over multiple EC2 instances.

First, I ran nginx in Docker instead of installing it on the instance directly, one replica per instance.

Second, every internet-facing app is added to the nginx Docker network. This eliminates the need to bind it to a host port, and it can be reached internally from the nginx container using stackname_servicename:serviceport.
The app can join a second network if its stack has any other services.
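
The stackname_servicename:serviceport naming above can be sketched as a small generator for the nginx server blocks. This is only a minimal illustration, not the poster's actual config: the `Backend` fields, the domain `shop.example.com`, the stack/service names and the port are all made-up assumptions, and it presumes nginx and the app share an overlay network so the swarm service name resolves from inside the nginx container.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """One internet-facing app (all field values used below are hypothetical)."""
    domain: str   # public hostname nginx answers for
    stack: str    # name given to `docker stack deploy`
    service: str  # service name inside that stack's compose file
    port: int     # port the container listens on (not published on the host)


def upstream_address(backend: Backend) -> str:
    """Name of the service as reachable from a shared overlay network."""
    return f"{backend.stack}_{backend.service}:{backend.port}"


def render_server_block(backend: Backend) -> str:
    """Render one nginx server block proxying a domain to its swarm service."""
    return (
        "server {\n"
        "    listen 80;\n"
        f"    server_name {backend.domain};\n"
        "    location / {\n"
        f"        proxy_pass http://{upstream_address(backend)};\n"
        "        proxy_set_header Host $host;\n"
        "    }\n"
        "}\n"
    )


if __name__ == "__main__":
    shop = Backend(domain="shop.example.com", stack="shop", service="api", port=8000)
    print(render_server_block(shop))
```

Generating the blocks from one list of backends keeps the nginx config and the deployed stacks from drifting apart as more backends are added.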

We can use almost the same docker compose files that were used before. Aside from the few new commands devs have to learn, they can all understand the infra.

Now I could set up an ASG in AWS, but I would prefer to do it manually for now. I prepared a Terraform/Ansible script that provisions the leader/nodes of the swarm, and I can simply increase the number of nodes and they will be provisioned and configured into the swarm.

For DNS, I want to add every node's public IP to every domain (now this bit surely needs improvement) so that it reaches the nginx on the node itself.

Databases are still a problem, as I chose to put them all on the leader node so I would preserve the data on restarts. I chose this over doing EBS multi-attach or EFS.

Let me know your opinion on this and how you would improve it

https://redd.it/1jd75nc
@r_devops
The eternal struggle

Tech is easy. You have a problem, you troubleshoot, you fix it. Rinse and repeat. But explaining that problem to someone who isn’t knee-deep in logs and YAML files? That’s where I crash and burn.

I’ve been working in DevOps for a while now, and the more I progress technically, the more I realize that my soft skills are lagging hard. Talking to stakeholders, justifying decisions, even something as basic as daily stand-ups. Half the time, I feel like I’m either over-explaining or not making sense at all. It’s like my brain refuses to translate tech into human language.

And it’s not just a work thing. The same awkwardness bleeds into my personal life. Making conversation? Small talk? Networking? It feels like an impossible task. Meanwhile I see colleagues who just get people. They navigate meetings like it’s a dance, while I’m out here stepping on toes and knocking over chairs.

I know soft skills are a muscle that needs training, but imo it requires actual effort and consistency, and I’d rather refactor a spaghetti-code terraform module than actively work on my communication skills.

https://redd.it/1jd95s6
@r_devops
Roast My SaaS Monorepo Refactor (DDD + Nx) - Where Do Migrations & Databases Go?

Hey r/devops, roast my attempt at refactoring my SaaS monorepo! I’m knee-deep in an Nx setup with a Telegram bot (and future web app/API), trying to apply DDD and clean architecture. My old aws_services.py was a dumpster fire of mixed logic lol.

I am seeking some advice.

Context: I run an image-editing SaaS (~$5K MRR, 30% monthly growth) I built post-uni with no formal AWS/devops training. It’s a Telegram bot for marketing agencies, using AI to process uploads. Currently at 100-150 daily users, hosted on AWS (EC2, DynamoDB, S3, Lambda). I’m refactoring to add an affiliate system and prep for a PostgreSQL switch, but my setup’s a mess.

Technical Setup:

Nx Monorepo:
/apps/telegram-bot: Bot logic, still has a bloated aws_services.py.
/apps/infra: AWS CDK for DynamoDB/S3 CloudFormation.
/libs/core/domain: User, Affiliate models, services, abstract repos.
/libs/infrastructure: DynamoDB repos, S3 storage.
Database: Single DynamoDB (UserTable, planning Affiliates).
Goal: Decouple domain logic, add affiliates (clicks/revenue), abstract DB for future Postgres.
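
One way the "abstract repos" goal above could look in Python, as a minimal sketch rather than the poster's actual code: the class names, fields and the `LinkEmailService` use case are all hypothetical. The domain layer depends only on an abstract `UserRepository`, and an in-memory implementation stands in for the DynamoDB (or later Postgres) one that would live in /libs/infrastructure.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class User:
    """Domain model; the field names are illustrative assumptions."""
    user_id: str
    telegram_user_id: Optional[int] = None
    email: Optional[str] = None


class UserRepository(ABC):
    """Port owned by the domain; it knows nothing about DynamoDB or Postgres."""

    @abstractmethod
    def get(self, user_id: str) -> Optional[User]:
        ...

    @abstractmethod
    def save(self, user: User) -> None:
        ...


class InMemoryUserRepository(UserRepository):
    """Stand-in adapter; a DynamoDB or Postgres adapter would implement the same port."""

    def __init__(self) -> None:
        self._users: Dict[str, User] = {}

    def get(self, user_id: str) -> Optional[User]:
        return self._users.get(user_id)

    def save(self, user: User) -> None:
        self._users[user.user_id] = user


class LinkEmailService:
    """Domain service that only talks to the abstract repository."""

    def __init__(self, users: UserRepository) -> None:
        self._users = users

    def link_email(self, user_id: str, email: str) -> User:
        user = self._users.get(user_id)
        if user is None:
            raise KeyError(f"unknown user: {user_id}")
        user.email = email
        self._users.save(user)
        return user
```

With this shape, swapping the database means writing one new adapter class, while the bot, web app and API keep calling the same service.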

Problems:

Migrations feel weird in /apps. DB is for the business, not just the bot.
One DB or many? I’ve got a Telegram bot now, but a web app, API, and second bot are coming.

Questions:

1. Migrations in a Monorepo: Sticking them in /libs/infrastructure/migrations (e.g., DynamoDB scripts)—good spot, or should they go in /apps/infra with CDK?
2. Database Strategy: One central DB (DynamoDB) for all apps now, hybrid (central + app-specific) later. When do you split, and how do you sync data?
3. DDD + Nx: How do you balance app-centric /apps with domain-centric DDD? Feels clunky.

Specific Points of Interest:

Migrations: Centralize them or tie to infra deployment? Tools for DynamoDB → Postgres?
DB Scalability: Stick with one DB or go per-app as I grow? (e.g., Telegram’s telegram_user_id vs. web app’s email).
Best Practices: Tips for a DDD monorepo with multiple apps?

Roast away lol. What am I screwing up? How do I make this indestructible as I move from alpha to beta? DM me if you’re keen to collab. My 0-1 and sales skills are solid, but 1-100 robustness is my weak spot. Thanks for any wisdom!

https://redd.it/1jd94kp
@r_devops
Monitoring terraform flow

What's the correct way to monitor the Terraform flow with an S3 bucket as the backend in a big devops team?
Is there an option to get easily human-readable output?

Or is the best way just to use something like Atlantis and abstain from running Terraform from local machines?

https://redd.it/1jdbchn
@r_devops
Let's talk about remediating cloud security issues

Let’s say an issue pops up in a cloud security tool (Wiz, Orca, Prisma Cloud, etc.).

I’m trying to understand what happens after an alert is prioritized and added to Jira for remediation by DevOps/DevSecOps.

What takes the most time in the remediation process? I assume it depends a lot on the alert type. I’d imagine that figuring out the impact of a change on existing infrastructure and applications takes a while—but does it? Is there anything else that slows things down?

Also, do the "simple" alerts—like closing an S3 bucket, restricting an IAM role, or changing a policy—still take time to remediate?

Thanks!

Disclaimer - I am now building a security startup and I want to understand this problem better.

https://redd.it/1jdboxq
@r_devops
How toil killed my team

When I first stepped into the world of Site Reliability Engineering, I was introduced to the concept of toil. Google’s SRE handbook defines toil as anything repetitive, manual, automatable, reactive, and scaling with service growth—but in reality, it’s much worse than that. Toil isn’t just a few annoying maintenance tickets in Jira; it’s a tax on innovation. It’s the silent killer that keeps engineers stuck in maintenance mode instead of building meaningful solutions.

I saw this firsthand when I joined a new team plagued by recurring Jira tickets from a failing dnsmasq service on their autoscaling GitLab runner VMs. The alarms never stopped. At first, I was horrified when the proposed fix was simply restarting the daemon and marking the ticket as resolved. The team had been so worn down by years of toil and firefighting that they’d rather SSH into a VM and run a command than investigate the root cause. They weren’t lazy—they were fatigued.

This kind of toil doesn’t happen overnight. It’s the result of years of short-term fixes that snowball into long-term operational debt. When firefighting becomes the norm, attrition spikes, and innovation dies. The team stops improving things because they’re too busy keeping the lights on. Toil is self-inflicted, but the first step to recovery is recognizing it exists and having the will to automate your way out of it.

https://redd.it/1jdd63a
@r_devops
DDoS, what's your story? How much? Who? What do you do against it? Any horror stories to share?

I'm curious to hear about your DevOps experience regarding DDoS attacks.

How often do you encounter DDoS attacks, and what type of DDoS are they (L7, for example)?

Have you noticed specific patterns or events that trigger these attacks?

What tools do you use to defend against them?

Do you have any horror stories to share?

https://redd.it/1jddc3f
@r_devops
Grafana Alloy: My Promtail Migration Journey (with HCL configs ready to steal)

Hey fellow DevOps warriors,

After putting it off for months (fear of change is real!), I finally bit the bullet and migrated from Promtail to Grafana Alloy for our production logging stack.

Thought I'd share what I learned in case anyone else is on the fence.

Highlights:

- Complete HCL configs you can copy/paste (tested in prod)

- How to collect Linux journal logs alongside K8s logs

- Trick to capture K8s cluster events as logs

- Setting up VictoriaLogs as the backend instead of Loki

- Bonus: Using Alloy for OpenTelemetry tracing to reduce agent bloat

Nothing groundbreaking here, but hopefully saves someone a few hours of config debugging.

The Alloy UI diagnostics alone made the switch worthwhile for troubleshooting pipeline issues.

Full write-up:

https://developer-friendly.blog/blog/2025/03/17/migration-from-promtail-to-alloy-the-what-the-why-and-the-how/

Not affiliated with Grafana in any way - just sharing my experience.

Curious if others have made the jump yet?

https://redd.it/1jdhqnk
@r_devops
How many of you fellow devopses actually do meaningful work?

I'm not talking about "some" work, but actually meaningful work like:

- migrating big important workloads

- solving high scaling issues

- setting up stuff from ground up (tenants for clients that pay a lot)

- managing fleets of k8s clusters

---

Recently I joined a team that supports an e-commerce platform, but the majority of the work is small fixes here and there. Pay is good and I have a lot of free time, but I'm wondering how many people are doing barely anything like me and how many are doing the heavy lifting.

https://redd.it/1jdiygl
@r_devops
Advice on CI/CD setup with GitHub Actions

I'll try to keep this short. We use GitHub as our code repository, and therefore I decided to use GitHub Actions for CI/CD pipelines. I don't have much experience with all the devops stuff but I am currently trying to learn it.

We have multiple services, each in its own repository (this is pretty new, we've had a mono repository before and therefore the following problem didn't exist until now). All of these repos have at least 3 branches: dev, staging and production. Now, I need the following: Whenever I push to staging or production, I want it to basically redeploy to AWS using Kubernetes (with kustomize for segregating the environments).

My intuitive approach was to make a new "infra" repository where I can centrally manage my deployment workflow, which basically consists of these steps: setting up AWS credentials, building images and pushing them to the AWS registry (ECR), and applying the K8s kustomize manifests, which pick up the new images and redeploy accordingly.

I initially thought introducing the infra repo to separate the concerns (business logic vs infra code) and make the infra stuff more reusable would be a great idea, but I realized fast that this comes with some issues: The image build process has to take place in the "service repo", because it has to access the Dockerfile. However, the infra process has to take place in the infra repo because this is where I have all my k8s files. Ultimately this somehow leads to a contradiction, because I found out that if I call the infra workflow from the service repository, it will also be executed in the context of the service repo and therefore I don't have access to all the k8s files in the infra repo.

My conclusion is that I would somehow have to do the image build and push in the service repo. Consequently, the infra repo must listen for this and somehow get triggered to do the redeployments. Or should I just check out the other repo?
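
One possible way to do the "infra repo listens and gets triggered" part is GitHub's repository_dispatch event: the service repo builds and pushes the image, then sends a `POST /repos/{owner}/{repo}/dispatches` call to the infra repo. The sketch below only builds that request and does not send it. The event name `deploy-service` and the `client_payload` keys are made-up assumptions, and the token would need to be one that has access to the infra repo, since a workflow's default token is scoped to its own repository.

```python
import json
import urllib.request


def build_dispatch_request(
    owner: str,
    repo: str,
    token: str,
    service: str,
    image_tag: str,
    environment: str,
) -> urllib.request.Request:
    """Build (but do not send) a repository_dispatch request for the infra repo."""
    body = {
        # Hypothetical event name; the infra workflow would filter on it.
        "event_type": "deploy-service",
        # Free-form data made available to the triggered workflow.
        "client_payload": {
            "service": service,
            "image_tag": image_tag,
            "environment": environment,
        },
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/dispatches",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    req = build_dispatch_request(
        "my-org", "infra", "<token>", "orders", "sha-abc123", "staging"
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

On the infra side, a workflow with `on: repository_dispatch` and `types: [deploy-service]` would run in the infra repo's own context, so the k8s files are available, and it could read the image tag from `github.event.client_payload`.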

Sorry if something is misleading - as I said, I am pretty new to devops. I'd appreciate any input from you guys, it's important to me to somehow follow best practices so don't be gentle with me.

Edit: typos

https://redd.it/1jdksxo
@r_devops
I did an analysis of the DevOps job market for 2025

Hi Folks,

At the beginning of 2024 I did a pet project and scraped around 700 LinkedIn DevOps job posts. I still had the data and wanted to do something with it, so yesterday I compared it to March 2025.

Here are the findings: coding is required much more than it used to be. Golang went up 13%, and Python went up 9%, as did JS.
Hate to say it, but Jenkins went up. I don't know why, but my guess is that fewer people work with it and there is a shortage.
There are other things too, like certificates being required or mentioned less now (by a lot).

anyway here is the article https://prepare.sh/articles/devops-job-market-trends-2025

I advise you to check it out, but just in case you want a very minimal version:
TL;DR

Go +13%
Python +9%
Jenkins +6.8% (almost 7%)
Terraform +9%
Flux down, Argo up (slightly)

Certs are mentioned way less than they used to be, by 15-20%. Everyone seems to have one, so they are saturated.
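
A comparison like the one above could be computed roughly as follows. This is a sketch under assumptions, not the author's actual method: it counts the share of postings that mention a keyword in each snapshot and reports the difference in percentage points (the post does not say whether its figures are points or relative change). The postings are made-up toy strings, not the scraped data.

```python
import re
from typing import Sequence


def mention_share(postings: Sequence[str], keyword: str) -> float:
    """Percentage of postings that mention `keyword` as a whole word."""
    if not postings:
        return 0.0
    # Word boundaries stop "Go" from matching inside "Argo" or "Golang".
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    hits = sum(1 for text in postings if pattern.search(text))
    return 100.0 * hits / len(postings)


def share_change(old: Sequence[str], new: Sequence[str], keyword: str) -> float:
    """Change in mention share between two snapshots, in percentage points."""
    return mention_share(new, keyword) - mention_share(old, keyword)


if __name__ == "__main__":
    # Toy snapshots, invented for illustration only.
    old = ["Python and Jenkins", "Terraform, AWS", "Bash, Jenkins", "Kubernetes"]
    new = ["Go and Terraform", "Python, Terraform", "Go, Kubernetes", "Argo CD"]
    for kw in ("Go", "Terraform", "Jenkins"):
        print(kw, share_change(old, new, kw))
```

Matching on whole words matters for short keywords, since a plain substring search for "go" would also count every posting that mentions Argo.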

https://redd.it/1jdo4zd
@r_devops
How’s the MacBook Air M4 for a software engineer?

I'm thinking about getting the MacBook Air M4 for my everyday engineering tasks. I don’t do anything too intense—just running web apps, scripts, and a few Docker containers on my local machine. It’s mostly standard DevOps stuff. My work leans more toward DevOps and cloud computing, and I usually run the heavier applications on a remote server.

For those with a MacBook Air, do you think it’s a good fit for my typical workload?

https://redd.it/1jdvppg
@r_devops
Best devops tutorials that are equivalent or almost equivalent to actual work experience

In my experience, practical tutorials are the best way to get ready to take on any job, so I am wondering what the best practical tutorials for devops are.

https://redd.it/1jdvmez
@r_devops
Do We Still Need Daily Stand-Ups & Cross-Team Syncs?

With so many tools for async collaboration, do we still need frequent one-on-one syncs between teams, or can automated updates and feedback loops replace them?

Are daily stand-ups and constant check-ins still necessary, or has your team found a better way to collaborate? Would love to hear how different teams handle this!

https://redd.it/1jdywwh
@r_devops
Large critical data stores in the cloud

How do you feel about having large critical data stores in the cloud? On-site databases allow you to take physical backups and move them off site, so you can always recover if necessary, however impractical that might be. Although the cloud gives you better resilience, does that give you full confidence in your ability to recover from any disaster, e.g. a bad actor? Is cross-account backup sufficient? Do you back up to a different vendor? Or do you still sync the data to on-premises storage just in case?

https://redd.it/1jdzrxy
@r_devops
DevOps Engineers – Please Help With My Graduation Project on Security Scanning Tools!

Hey everyone!

I’m working on my thesis and need your help! I'm conducting a short survey as part of my research to improve security scanning tools for DevOps teams, and I would really appreciate your input.

The survey is focused on understanding your experiences with security scanning tools like Microsoft Defender (for Cloud), Trivy, Snyk, and others within your DevOps pipelines. It includes questions about:

How often you scan container images for vulnerabilities
The tools you currently use for security scanning
The challenges and limitations you face
Your feedback on what improvements would make these tools better

This short survey is part of my graduation assignment, where I’m developing a new security scanner for Azure DevOps, aimed at improving security in DevOps environments. Your input will directly help shape the development of this tool.

Deadline: Please complete the survey by March 25, 2025.

🔗 Take the Survey Here!

Thank you so much for your help! 🙏

Your insights are invaluable for my project and will contribute to making DevOps security tools better for everyone!

https://redd.it/1je0eh7
@r_devops
EU SysEleven: has anyone worked with it?

hey devops people,

I may start working at a company which will transition from AWS & Azure to SysEleven, a German-based open-source provider that offers managed Kubernetes solutions. This decision has already been made; it's just a matter of implementing it now.

Has anybody worked with SysEleven? What's the vibe there? What were some pain points during the transition? Any opinions and feedback from your work with it are welcome.

https://redd.it/1je1nen
@r_devops
What's the best starting point for devops?

Hi there, I started self-learning IT a couple of months ago. I am fascinated by the devops world, but I know it is not an entry-level position. I already looked at the roadmap, so I know that many skills like Linux, scripting, etc. are required in order to get to that point, and it will surely take some years. But in the meantime, is it better to start working as a developer or as a helpdesk/sysadmin? Which one would be more helpful for a future devops role?

https://redd.it/1je17vs
@r_devops
DevOps job prospects, EU

For someone who is fluent in the host nation's language and has 5+ years of experience with AWS, Azure, etc., how is the job market looking in Germany/Netherlands/Belgium etc. for cybersecurity roles at present? Is there much demand?

https://redd.it/1je33y5
@r_devops