Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
[Question] CI/CD Design - Architecture Book Request

Hello fellow DevOps enthusiasts,

I’m looking for a solid book (or eBook) that goes beyond CI/CD basics and covers design patterns and architecture for real-world setups. Ideally it would also help me deal with the corporate BS I am facing from the infra and system teams about environments, security, and dev/prod segregation.

Ideally, it should include:

* Production vs development environment design.
* Jenkins agent-controller architecture and best practices.
* Patterns for scaling and securing Jenkins.
* Examples of integrating Jenkins with Git, Docker, Kubernetes, etc.

I’ve already read Continuous Delivery by Jez Humble, but I’m looking for something more practical. It doesn't matter if it covers GitLab Runner or GitHub Actions instead; I'm mostly interested in the architecture and design aspects.


Thank you.

https://redd.it/1mo7p5q
@r_devops
A move to AgentOps Organizations

The focus of many organizations will shift from human-driven execution to human-driven orchestration of AI agents. Our jobs will become training and auditing these agents, fine-tuning them to fit the needs of the org. This will create flatter org charts, where small teams have a dedicated agent as the nucleus of each department. Example:

Marketing agent,

Operations agent,

Research agent,

Finance agent,

DevOps agent,

Security agent,

HR agent,

Compliance agent, etc.

We humans will be the crew members of those agents. Cross-department work will happen between the agents, not just by us picking up the phone or holding a Teams meeting with long, drawn-out 30-60-90 day plans. Our focus will be entirely on the agent, making sure it operates well in creativity, ethics, judgment, and relationship management.

We are on the cusp of this movement. Startups will have the easiest time establishing this organizational model, followed by companies with robust R&D departments; older-style organizations such as government/federal agencies will migrate last.

What are your thoughts?

https://redd.it/1moawl0
@r_devops
7 real S3 screw-ups I see all the time (and how to fix them)

My post in r/aws got a lot of traction, so I'm sharing it here too!

S3 isn’t that expensive… until you ignore it for a few months. Then suddenly you’re explaining to finance why storage costs doubled.

Here’s the stuff I keep seeing over and over:

1. Data nobody touches - You’ve got objects sitting in Standard for years without a single access. Set up lifecycle rules to shove them into Glacier or Deep Archive automatically.
2. Intelligent-Tiering everywhere - Sounds great until you realize it has a per-object monitoring fee and moves to deep archive at a snail’s pace. Only worth it when access patterns are truly unpredictable.
3. API errors quietly eating your budget - 4xx and 5xx errors are way more common than people think. I’ve seen billions of them in a single day just from bad retry logic.
4. Versioning without cleanup - Turn it on without an expiration policy and you’ll pay to keep every single version forever.
5. Archiving thousands of tiny files - Those 1KB objects add up. Compact them before archiving; you can do it through the API, no need to download.
6. Backup graveyards - Backups that nobody touches but still sit in Standard storage. If you’re not reading them often, save them directly into a cheaper class, worst case - pay for the retrieval.
7. Pointless lifecycle transitions - Don’t store something in Standard for 1 day and then move it. Just put it in the right class from the start and skip the extra PUT fee.
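
Items 1 and 4 above can be sketched as a lifecycle configuration. This is a minimal sketch, not a recommended policy: the `logs/` prefix, the rule IDs, the day counts, and the bucket name in the commented call are all assumptions made for illustration. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts; nothing here talks to AWS.

```python
import json


def build_lifecycle_configuration(archive_after_days=90,
                                  deep_archive_after_days=365,
                                  noncurrent_expire_days=30):
    """Return an S3 lifecycle configuration as a plain dict.

    All thresholds and the 'logs/' prefix are illustrative assumptions.
    """
    return {
        "Rules": [
            {
                # Item 1: move untouched data out of Standard automatically.
                "ID": "archive-cold-data",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},  # hypothetical prefix
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"},
                    {"Days": deep_archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            },
            {
                # Item 4: versioning with cleanup, so old versions expire.
                "ID": "expire-old-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix = whole bucket
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_expire_days},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            },
        ]
    }


config = build_lifecycle_configuration()
print(json.dumps(config, indent=2))

# Applying it needs boto3 and AWS credentials, so it is left commented out:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-example-bucket",  # hypothetical bucket name
#     LifecycleConfiguration=config,
# )
```

Keeping the rules in code makes them reviewable, but the right day counts depend on your own access patterns and retrieval costs.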

Sounds obvious... but those fixes might be worth 50% of your S3 bill...

(Disclaimer: Not here to sell you anything, just sharing stuff I’ve learned working with a bunch of companies from small startups to huge enterprises after founding reCost. Hope it helps!)



https://redd.it/1moc5xo
@r_devops
Planning to Become a DevOps Engineer in 2025? Here’s What Actually Matters

I see a lot of people jumping straight into Docker and Kubernetes and then wondering why they feel lost. DevOps isn’t just “learn these 5 tools”; it’s a mix of mindset, fundamentals, and the right tools at the right time. Here’s a breakdown of how I’d start if I were new in 2025.

1. Learn the Fundamentals First
Before you even touch fancy automation tools, make sure you actually understand the stuff you’ll be automating. That means:

Linux basics (file system, processes, permissions, services)

Networking (IP, DNS, HTTP/S, ports, routing, NAT, firewalls)

System administration (users, groups, package management, logs)

Bash scripting for automating simple tasks

Basic Python scripting (log parsing, API calls, automation scripts)

If you can’t explain what happens when you curl a URL or why a service isn’t starting, you’ll struggle later.
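
To make the curl question concrete, here is a rough sketch of the steps a plain `http://` request goes through: parse the URL, resolve the hostname, open a TCP connection, send the request text, read the response. The function names and the `curl-sketch` user agent are made up for illustration, and TLS, redirects, and proxies are deliberately left out.

```python
import socket
from urllib.parse import urlsplit


def build_request(url):
    """Split a URL into (host, port, raw HTTP/1.1 request bytes)."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles plain http:// URLs")
    host = parts.hostname
    port = parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host_header = host if port == 80 else f"{host}:{port}"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "User-Agent: curl-sketch/0.1\r\n"
        "Accept: */*\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return host, port, request


def fetch(url, timeout=5.0):
    """Resolve, connect, send, and read until the server closes the socket."""
    host, port, request = build_request(url)
    # Step 1: DNS lookup turns the hostname into one or more addresses.
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    # Step 2: TCP handshake with the first address returned.
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(address)
        # Step 3: the HTTP request is just text sent over that connection.
        sock.sendall(request)
        # Step 4: read the status line, headers, and body.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


# Example (needs network access, so it is not run here):
# print(fetch("http://example.com/")[:200])
```

Each step maps to a place where things break in practice: name resolution, firewalls and ports, or the application answering with an error.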

2. Version Control and CI/CD Are Core Skills
Every DevOps pipeline starts with Git. Learn branching, merging, pull requests, and resolving conflicts.

Then move into CI/CD (Continuous Integration/Continuous Deployment). Popular tools:

Jenkins

GitLab CI

GitHub Actions

CircleCI

You don’t just need to “click a deploy button” — understand pipeline stages, automated testing, build artifacts, and how to roll back if something breaks.
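
As a toy illustration of stages plus rollback, the sketch below runs stages in order and undoes the deploy when a later stage fails. It is not how Jenkins or any other CI tool works internally; the stage names, the `state` dictionary, and the failing smoke test are all invented for the example.

```python
def run_pipeline(stages, rollback):
    """Run (name, step) pairs in order; on the first failure, roll back and stop.

    Each step is a callable returning True on success.
    """
    completed = []
    for name, step in stages:
        try:
            ok = step()
        except Exception:
            ok = False
        if not ok:
            rollback(completed)
            return {"status": "failed", "failed_stage": name, "completed": completed}
        completed.append(name)
    return {"status": "succeeded", "failed_stage": None, "completed": completed}


# Hypothetical deployment state standing in for a real environment.
state = {"live": "v1", "previous": None, "artifact": None}


def build():
    state["artifact"] = "app-v2.tar.gz"
    return True


def test():
    return True


def deploy():
    state["previous"] = state["live"]
    state["live"] = "v2"
    return True


def smoke_test():
    return False  # simulate a broken release


def rollback(completed):
    # Only undo the deploy if it actually happened.
    if "deploy" in completed:
        state["live"] = state["previous"]


result = run_pipeline(
    [("build", build), ("test", test), ("deploy", deploy), ("smoke_test", smoke_test)],
    rollback,
)
print(result["status"], result["failed_stage"], state["live"])
```

The point is the shape: every stage has a clear pass/fail result, the build artifact is produced once, and the previous version is recorded before it is replaced so that rolling back is a known operation.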

3. Containers and Orchestration
Containers are a big part of DevOps. Start with Docker:

Build images with Dockerfiles

Use volumes and networks

Work with multi-container apps via Docker Compose

Once you’re solid there, learn Kubernetes (K8s). Don’t rush this — it’s a lot. Focus on:

Pods, deployments, services

ConfigMaps and secrets

Scaling and rolling updates

Ingress and service discovery

You’ll also want to understand managed K8s services like AWS EKS, Azure AKS, or GCP GKE.

4. Cloud Skills Are Non-Negotiable
Pick one cloud provider to start: AWS, Azure, or GCP. AWS is the most common, but it’s fine to choose based on job market in your area.

Learn:

Compute (EC2)

Networking (VPC, subnets, security groups)

Storage (S3, EBS)

IAM (roles, policies, least privilege)

Then, learn how to deploy containers or Kubernetes clusters in the cloud.

5. Infrastructure as Code (IaC)
This is how you make cloud resources repeatable and version-controlled. Terraform is the most popular and works with all major clouds.

Learn how to:

Define infrastructure in .tf files

Use variables and modules

Apply and destroy infrastructure safely

Store state securely

6. Monitoring, Logging, and Alerting
If you build and deploy something but can’t see when it’s failing, you’re not doing DevOps.

Get hands-on with:

Prometheus + Grafana for metrics

ELK stack (Elasticsearch, Logstash, Kibana) for logging

Cloud-native tools like AWS CloudWatch or Google Cloud Monitoring (formerly Stackdriver)

7. Security (DevSecOps Basics)
Security is now a core part of DevOps, not an afterthought. Learn to:

Scan code for vulnerabilities (Snyk, Trivy)

Manage secrets (Vault, AWS Secrets Manager)

Secure Docker images

Apply IAM best practices

8. Build Real Projects
Don’t just follow tutorials. Build something end-to-end, like:

A microservice app with Docker

CI/CD pipeline → Docker → Kubernetes → Cloud deployment

Terraform for infra provisioning

Monitoring + logging setup

Push everything to GitHub with a README that explains your setup.

9. Network With the Community
Join DevOps communities:

Reddit (r/devops, r/kubernetes, r/aws)

CNCF Slack channels

DevOps Discord servers

Local meetups or conferences

Ask questions, share your progress, and help others.

10. Stay Consistent & Keep Learning
DevOps tools evolve fast. Even once you land a job, you’ll keep learning. Read blogs, watch KubeCon talks, experiment in your home lab.

If you start from zero and commit a few hours per week, you could be job-ready in 6–8 months. The key is not to try and master everything at once — build layer by layer, and make sure each new tool you learn connects to something you already understand.

If you want a well-structured course and resource suggestions to follow this roadmap step by step, DM me and I’ll share what worked for me and others breaking into DevOps.

https://redd.it/1moe2i9
@r_devops
Focus Career in DevOps

I have grown to have a strong interest in the world of DevOps and I keep seeing these "road maps" posted on LinkedIn and other threads. I'm curious from those who actually work in a DevOps focused role what a true development path would be.

Currently, I am focused on the following areas:

- Networking fundamentals (certified with CompTIA Net+ and hands-on experience with Fortinet and some Cisco)
- AWS cloud basics, with hands-on experience with EC2, S3, IAM, and CloudWatch. I have noob-level experience with Terraform as well.
- PowerShell scripting
- Microsoft services (Exchange, MDM, SharePoint, Entra)
- Windows Server
- Active Directory
- Linux basics


What areas should I add or consider learning besides the ones I am already dedicating time to? I heard Kubernetes and Docker are good areas to learn, but I have zero experience with containers and no idea where to start.

https://redd.it/1moc4mq
@r_devops
Devops job market

Just curious how the DevOps job market compares to software engineering. Is it as bad as software engineering these days?

https://redd.it/1mok4we
@r_devops
Retraining into DevOps/cloud with no prior experience—Is “DevOps Beginners to Advanced with Projects” a solid starting point?

>Hey everyone, I’m looking to switch into a DevOps or cloud role for a better work-home balance and have zero background in IT or ops. I’ve found the Udemy course “DevOps Beginners to Advanced with Projects” (by Imran Teli). It’s a bestseller with a 4.6 rating, updated August 2025, with over 54 hours of lessons. Tools covered include Linux, scripting, AWS, Jenkins, GitHub Actions, Ansible, Docker, Kubernetes, Terraform, etc.
>
>The hands-on, project-based format seems promising, but I wonder whether it’s too broad. Have any of you taken this course (or something similar)? Does it give a solid foundation? What additional resources or next steps would you recommend to truly understand the why behind the tools, and start applying them effectively in real-world scenarios?
>
>I'd appreciate any advice; pointers to hands-on labs, free resources, certification paths, or community groups would be really helpful.

https://redd.it/1mokol5
@r_devops
Are LangGraph + Temporal a good combo for automating KYC/AML workflows to cut compliance overhead?

I’m designing a compliance-heavy SaaS platform (real estate transactions) where every user role—seller, investor, wholesaler, title officer—has to pass full KYC/KYB, sanctions/PEP screening, and milestone-based rescreening before they can act.

The goal:

* Automate onboarding checks, sanctions rescreens, and deal milestone gating
* Log everything immutably for audit readiness (no manual report compilation)
* Trigger alerts/escalations if compliance requirements aren’t met
* Reduce the human compliance team’s workload by ~70% so they only handle exceptions
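
Two of these goals, milestone gating and an audit trail, can be sketched in plain Python, independent of LangGraph or Temporal. This is only an illustration under assumptions: `REQUIRED_CHECKS`, the event fields, and the function names are hypothetical, and an in-memory hash chain is tamper-evident rather than truly immutable, so a real system would still need durable, write-once storage behind it.

```python
import hashlib
import json


class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev_hash, event):
        payload = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, event):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        self.entries.append(
            {"event": event, "prev_hash": prev_hash, "hash": self._digest(prev_hash, event)}
        )

    def verify(self):
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True


REQUIRED_CHECKS = {"kyc", "sanctions"}  # hypothetical set of checks per milestone


def gate_milestone(user_id, passed_checks, log):
    """Allow the milestone only if every required check passed; log the decision."""
    missing = sorted(REQUIRED_CHECKS - set(passed_checks))
    allowed = not missing
    log.append({"user": user_id, "allowed": allowed, "missing": missing})
    return allowed


log = AuditLog()
print(gate_milestone("seller-1", {"kyc", "sanctions"}, log))
print(gate_milestone("investor-2", {"kyc"}, log))
print(log.verify())
```

A decision that blocks a user is logged with the missing checks, which is the kind of record that could feed the alerts and escalations mentioned above.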

I’m considering using LangGraph to orchestrate AI agents for decisioning, document validation, and notifications, combined with Temporal to run deterministic workflows for onboarding, milestone checks, and partner webhooks (title/escrow updates).

Question to the community:

* Has anyone paired LangGraph (or similar LLM graph orchestration) with Temporal for production-grade compliance operations?
* Any pitfalls in using Temporal for long-lived KYC/AML processes (14-day onboarding timeouts, daily sanctions cron, etc.)?
* Does this combo make sense for reducing manual workload in a high-trust, regulated environment, or would you recommend another orchestration stack?

Looking for insights from anyone who’s run similar patterns in fintech, proptech, or other regulated SaaS.

https://redd.it/1mokg0f
@r_devops
Trading Support Engineer looking to transition into SRE/Devops after lay off. What are my chances?

I am currently weighing my options as I recently got laid off and I see no future in the support engineering role.


It really sucks to be in this position, as I know that having different titles on my resume can hurt my chances because it doesn't look like a sensible trajectory.

My experience:

In the past I worked as a Quality Analyst for Facebook (2 years) under contract with Wipro, as a Testing Engineer for Facebook (2 years), also through Wipro, and as a Quality Assurance Engineer for a year at a lesser-known company. In my current role as a Support Engineer, where I have 4 years of experience, I manage incidents, failovers, and config management, troubleshoot Kubernetes services, handle monitoring and alerting, approve releases, and do rollbacks. I support a low-latency trading platform at a hedge fund and often have to investigate networking problems using Grafana and look at logs from all types of services.

Transition into Devops/SRE:

As I did my research, I came across DevOps as the path to take when transitioning to SRE roles, but I don't have experience in the following: cloud, Linux, Terraform, deployments. I have basic experience with Python and with SQL for data analytics projects, and I use Grafana and ELK, but I don't build the dashboards myself. I know how to use ArgoCD and have used Jenkins before, although I have forgotten most of it. I have exposure to most tools on a superficial level.

My plan:

I am considering doing the Cloud Computing and DevOps Certification Program from Purdue and Simplilearn to get experience in these areas. I think this is going to give me the guidance and structure I need and the hands-on experience I am lacking, as it's project heavy. After finishing, I would take some AWS certs that are relevant to the roles I am applying for.

My questions:

- Has anyone heard of or taken this certification?
- Is this line of work affected by the tech layoffs?
- What are my chances of entering a well-known company with my experience and the certifications?
- Is Support Engineering -> DevOps or SRE a good transition path, or are these not related?
- Any advice anyone can give me as I navigate my options in DevOps and SRE?

Side note: I know my work is reactive and DevOps/SRE is proactive. But I think it can help that I deal with live issues in production environments, where the goal is to reduce downtime?



https://redd.it/1mos4j2
@r_devops
Ask for Career Shift Advice

Can I transition from business to DevOps? I'm 27 years old, and I worry I'm too old to start learning something as heavy as DevOps from scratch without knowing any programming language. What would you recommend I start with?



https://redd.it/1moxsir
@r_devops
VSCode extensions

Which extensions have helped you the most when working with k8s, TF, Fastlane, GitLab, etc.?

https://redd.it/1moysyx
@r_devops
Prep for AWS DevOps role - associate level?

Hi all,

So I have been a sysadmin for the past 2½ years and am now looking for a DevOps role (preferably AWS). Finally, after applying for 100+ jobs, I was shortlisted for one, and the interview is scheduled for the 24th of August (10 days to prep). I don't want to blow it, so how would you recommend I prepare?

About me:

2.6 years of experience as an API Gateway sysadmin

Good at Linux, Python, Docker, GitHub Actions, API gateway (although it doesn't matter for this job, I guess)

Moderate in Ansible, AWS (EC2, ECS, IAM, networking basics)

Below par in Kubernetes in general, AWS EKS, S3, CodeDeploy, IaC, and other stuff


A few things: I've learned most of the DevOps tools on my own and don't have experience using them at a prod/enterprise level. Even with AWS, I've used the free tier and learned most things that way.

I believe EKS/Kubernetes is something I can't slack on, so should I try running a dummy EKS cluster? (It is not in the free tier, so I haven't tried it yet.)


So how should I go about getting interview-ready? Any tips/resources would be helpful.

Thanks in advance.

P.S. I've been genuine on my resume and during the first screening about my background and knowledge.

https://redd.it/1mozfb4
@r_devops
eBPF tools moving fast, but docs still a mess

Been playing around with eBPF lately for some observability stuff. The tools are getting really good, but finding clear info on kernel changes or verifier errors is still painful.

How are you all keeping up? Blogs? Just trial and error?

https://redd.it/1mp015h
@r_devops
Company doesn't pay for training - should I leave ?

I work in the UK as a Junior DevOps Engineer on 40k per year. I have been with my company a year now.
I have managed to touch a wide range of the DevOps tool stack and I feel quite confident in my skills.
I've been looking for new roles to hopefully move into the mid level. And although I know experience is better than certs, every single recruiter I have spoken to has highlighted my lack of certificates.
The problem is that my company doesn't pay for them. They refuse to buy any online courses, and they even refuse to provide us with a sandpit account or learning resources on AWS.

I don't earn a lot of money, but I feel like saving a bit and getting the AWS SAA under my belt with my own money.
Does anyone know any ways I can make this cheaper for myself, or have better recommendations on what I should do?

https://redd.it/1moy0q1
@r_devops
Understanding SAP

I’ve got a web shop project to manage that creates SAP orders, and I need to get comfortable with the way SAP operates. Every company implements SAP its own way, so I imagine there is no plug-and-play strategy, but the docs I got are shit, so I’m hoping there is some common ground. I have started going through BAPI tutorials, since BAPIs are the external communication endpoint, and maybe that will help me understand the docs a little more.
I’ll appreciate any advice 🙏

https://redd.it/1mp1xag
@r_devops
Scaling open-source Jenkins vs. adopting CloudBees: What's the real tipping point?

Looking for some real-world takes on the Jenkins scaling dilemma.

I work for a company with ~1,500 employees. Our self-managed Jenkins is hitting ~450 concurrent jobs, and we expect that number to keep climbing. We're at a crossroads: keep throwing more hardware at it, or seriously consider CloudBees, which offers horizontal scaling along with other enterprise features.

I'm trying to figure out the real tipping point.

For CloudBees customers: What pain point finally made you adopt CloudBees? Did it truly solve your scaling problems, and was it worth the cost?
For Jenkins admins: How have you scaled past this point? Is there a practical limit to just beefing up the hardware?

Genuinely curious to hear your experience to make an informed decision. Thanks!

https://redd.it/1mp3bmz
@r_devops
Pro tip - avoid working at small no-names at all costs

Out of my 3 year career, 2 of those were spent at a small unknown eCommerce SaaS (it doesn't matter that they have/had interesting clients), and my job hunt is basically just:

* It doesn't matter that I have the skills that I have.

* It doesn't matter what I've done or achieved.

* It doesn't matter if I'm an exact match to the job description.

* Nothing about me, my work history, etc. matters.

* Because I didn't spend enough time at a bigger/more impactful company, and so I couldn't possibly be a viable person to hire.

I had 3 separate calls today all mentioning this directly. Back to square one, again (I'm crashing out if you can't tell).

https://redd.it/1mp5q6s
@r_devops
I built a LeetCode-style site for real-world Linux & DevOps debugging challenges

While preparing for my Meta Production Engineer interview, I realized there’s no good place to practice these Linux operations problems.

* Linux troubleshooting
* Bash scripting & automation
* Performance bottlenecks
* Networking misconfigurations
* Debugging weird production issues

So I built [sttrace.com](https://sttrace.com/). It's a LeetCode-like platform, but for real-world software engineering ops problems.

Right now it only has 6 questions but I will add more soon. Let me know what you guys think.

🔗 [sttrace.com](https://sttrace.com/)

**PS:** Apologies if the website feels slow, currently it is hosted on my homelab.

https://redd.it/1mp6ott
@r_devops
Need Help with Elasticsearch, Redis, and Weighted Round Robin for Product Search System (Newbie Here!)

Hi everyone, I'm working on a search system for an e-commerce platform and need some advice. I'm a bit new to this, so please bear with me if I don't explain things perfectly. I'll try to break it down and would love your feedback on whether my approach makes sense or if I should do something different. Here's the setup:

# What I'm Trying to Do

I want to use **Elasticsearch** (for searching products) and **Redis** (for caching results to make searches faster) in my system. I also want to use **Weighted Round Robin (WRR)** to prioritize how products are shown. The idea is to balance **sponsored products** (paid promotions) and **non-sponsored products** (regular listings) so that both get fair visibility.

* **Per page**, I want to show **70 products**, with **15 of them being sponsored** (from different indices in Elasticsearch) and the rest non-sponsored.
* I want to split the sponsored and non-sponsored products into **separate WRR pools** to control how they’re displayed.
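
One way to picture the 70/15 split is a small merge step that takes two already-ranked lists and spreads the sponsored items evenly through the page. This is a sketch under assumptions: `compose_page` and the even-spacing rule are illustrative choices, not something the setup above prescribes, and the inputs stand for product IDs fetched separately from the sponsored and non-sponsored indices.

```python
def compose_page(sponsored, organic, page_size=70, sponsored_slots=15):
    """Merge two ranked lists into one page, spacing sponsored items evenly."""
    sponsored = sponsored[:sponsored_slots]
    organic = organic[: page_size - len(sponsored)]
    total = len(sponsored) + len(organic)

    page = []
    s = o = 0
    for position in range(total):
        # Number of sponsored items that should have appeared by this position.
        target = (position + 1) * len(sponsored) // total
        if s < target or o >= len(organic):
            page.append(sponsored[s])
            s += 1
        else:
            page.append(organic[o])
            o += 1
    return page


sponsored_ids = [f"S{i}" for i in range(20)]   # hypothetical product IDs
organic_ids = [f"P{i}" for i in range(100)]
page = compose_page(sponsored_ids, organic_ids)
print(len(page), sum(1 for item in page if item.startswith("S")))
```

If fewer than 15 sponsored products match a query, the remaining slots fall back to non-sponsored results, so the page still fills up.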

# My Weight Calculation for WRR

To decide which products get shown more often, I'm calculating a **weight** based on:

* **Product reviews** (positive feedback from customers)
* **Total product sales** (how many units sold)
* **Seller feedback** (how reliable the seller is)

Here's the formula I'm planning to use:
`Weight = 0.5 * (1 + log(productPositiveFeedback)) + 0.3 * (1 + log(totalProductSell)) + 0.2 * (1 + log(sellerFeedback))`

To make sure big sellers don’t dominate completely, I want to **cap the weight** in a way that balances things for new sellers. For example:

* If the calculated weight is above **10**, it gets counted as **11** (e.g., actual weight of 20 becomes 11).
* If it’s above **100**, it becomes **101** (e.g., actual weight of 960 becomes 101).
* So, a weight of **910** would also count as **101**, and so on.

This way, I hope to give newer sellers a chance to compete with big sellers. **Question 1: Does this weight calculation and capping approach sound okay? Or is there a better way to balance things?**
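
The formula and the caps above can be tried out with a few lines of Python. Two things here are assumptions, not part of the original description: `log` is taken to be the natural logarithm, and a count of zero contributes 0 instead of hitting `log(0)`, which is undefined. The function names are made up for the sketch.

```python
import math


def signal(value):
    """1 + ln(value); counts below 1 contribute 0 (assumption, avoids log(0))."""
    return 1.0 + math.log(value) if value >= 1 else 0.0


def raw_weight(positive_feedback, total_sold, seller_feedback):
    """Weighted sum from the formula above, using natural logs."""
    return (
        0.5 * signal(positive_feedback)
        + 0.3 * signal(total_sold)
        + 0.2 * signal(seller_feedback)
    )


def capped_weight(weight):
    """Apply the tiers described above: >100 counts as 101, >10 counts as 11."""
    if weight > 100:
        return 101.0
    if weight > 10:
        return 11.0
    return weight


new_seller = raw_weight(3, 10, 5)
big_seller = raw_weight(1_000_000, 1_000_000, 1_000_000)
print(round(new_seller, 2), round(big_seller, 2), capped_weight(big_seller))
```

Running numbers like these shows that the logarithm already compresses large values heavily: even a million in every input gives a raw weight of roughly 14.8, so under the natural-log assumption the above-100 tier would almost never apply.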

# My Search Process

Here’s how I’m planning to handle searches:

1. When someone searches (e.g., "GTA 5"), the system first checks **Redis** for results.
2. If it’s not in Redis, it queries **Elasticsearch**, stores the results in Redis, and shows them on the UI.
3. This way, future searches for the same term are faster because they come from Redis.

**Question 2: Is this Redis + Elasticsearch approach good? How many products should I store in Redis per search to keep things efficient?** I don’t want to overload Redis with too much data.
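
The check-cache-then-query flow described in the steps above is usually called cache-aside. Here is a minimal sketch of it with in-memory stand-ins: `TTLCache` plays the role of Redis, `query_backend` plays the role of an Elasticsearch query, and the key format and the 300-second TTL are assumptions. Caching one page of product IDs per key, rather than full documents, is one way to keep the cache small.

```python
import time


class TTLCache:
    """In-memory stand-in for a Redis-style cache with expiring keys."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


def cache_key(term, page):
    """Normalize the search term so 'GTA 5' and ' gta  5 ' share one entry."""
    normalized = " ".join(term.lower().split())
    return f"search:{normalized}:page:{page}"


def search(term, page, cache, query_backend, ttl_seconds=300):
    """Return (results, source), where source is 'cache' or 'backend'."""
    key = cache_key(term, page)
    cached = cache.get(key)
    if cached is not None:
        return cached, "cache"
    results = query_backend(term, page)
    cache.set(key, results, ttl_seconds)
    return results, "backend"


calls = []


def fake_backend(term, page):
    # Hypothetical stand-in for the real search query; returns product IDs.
    calls.append((term, page))
    return [f"{term.lower().strip()}-product-{i}" for i in range(3)]


cache = TTLCache()
print(search("GTA 5", 1, cache, fake_backend)[1])
print(search(" gta  5 ", 1, cache, fake_backend)[1])
print(search("GTA", 1, cache, fake_backend)[1])
```

Because each distinct query gets its own key, "GTA" and "GTA 5" are cached independently even when their results overlap; the expiry keeps stale rankings from lingering.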

# Handling Categories

My products are also organized by **categories** (e.g., electronics, games, etc.). **Question 3: Will my weight calculation mess up how products are shown within categories?** Like, will it prioritize certain products across all categories in a weird way?

# Search Term Overlap Issue

I noticed that if someone searches for **"GTA 5"** and I store those results in Redis, a search for just **"GTA"** might pull up a lot of the same GTA 5 products. Since both searches have similar data, **Question 4: Could this cause problems with how products are prioritized?** Like, is one search getting higher priority than it should?

# Where to Implement WRR

Finally, I’m unsure where to handle the **Weighted Round Robin logic**. Should I do it in **Elasticsearch** (when fetching results) or in **Redis** (when caching or serving results)? **Question 5: Which is better for WRR, and why?**
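
One option is to run the rotation in application code, after the candidates have been fetched and before the page is cached, treating Elasticsearch as retrieval and Redis as storage. The sketch below uses the "smooth" weighted round robin variant, which spreads picks out instead of emitting long runs of the heaviest item. The seller names and weights are made up, and this is an illustration of the technique, not a statement about what either product supports natively.

```python
def smooth_weighted_round_robin(weights, count):
    """Return `count` picks; each key appears in proportion to its weight."""
    total = sum(weights.values())
    if not weights or total <= 0:
        return []

    current = {key: 0 for key in weights}
    picks = []
    for _ in range(count):
        # Every candidate earns its weight each round...
        for key, weight in weights.items():
            current[key] += weight
        # ...the current leader is picked and pays back the total.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


# Hypothetical capped weights for three sellers.
order = smooth_weighted_round_robin({"big_seller": 5, "mid_seller": 1, "new_seller": 1}, 7)
print(order)
```

Over any full cycle of 7 picks, the seller with weight 5 appears 5 times and the others once each, but the lighter sellers are interleaved rather than pushed to the end.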

# Note for Readers

I’m pretty new to building systems like this, so I might not have explained everything perfectly. I’ve read about Elasticsearch, Redis, and WRR, but putting it all together is a bit overwhelming. I’d really appreciate it if you could explain things in a simple way or point out any big mistakes I’m making. If you need more details, let me know!

Thanks in advance for any help! 🙏

https://redd.it/1mpbkba
@r_devops