What’s your experience with an incident that you will never forget?
I would like to hear about your experiences: how was the cross-team collaboration handled during the incident war room, and what came out of the retrospective?
https://redd.it/1ktzxzn
@r_devops
Reddit
From the devops community on Reddit
Explore this post and more from the devops community
Differences in DB
Short version... I'm learning k8s right now. My lecture uses the example of "Redis as an in-memory DB" > (worker app) > "PostgreSQL as a persistent DB"... why can't one DB be used for both sides?
I hope this is just my lack of niche knowledge. My understanding of the core concepts has been going so well.
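For anyone picturing the lecture's pipeline, here is a minimal sketch of the pattern, not the lecture's actual code. It assumes the in-memory store is used as a fast buffer and the SQL database as the durable record. A `collections.deque` stands in for Redis and an in-memory SQLite database stands in for PostgreSQL; the table and function names are hypothetical.

```python
import sqlite3
from collections import deque

# Stand-in for Redis: a fast in-memory buffer (contents vanish with the process).
queue = deque()

# Stand-in for PostgreSQL: a SQL store meant to hold the lasting record.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (item_id TEXT PRIMARY KEY, payload TEXT)")


def submit(item_id, payload):
    """Front end: accept the write quickly and return."""
    queue.append((item_id, payload))


def drain():
    """Worker: move buffered items into the SQL store, newest value wins."""
    moved = 0
    while queue:
        item_id, payload = queue.popleft()
        db.execute(
            "INSERT OR REPLACE INTO results (item_id, payload) VALUES (?, ?)",
            (item_id, payload),
        )
        moved += 1
    db.commit()
    return moved
```

The front end only touches the buffer and the worker only touches the SQL store, so each side can be sized and tuned for its own job. One database could serve both roles in a small setup; the split is a design choice rather than a hard requirement.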
https://redd.it/1ku0grm
@r_devops
Unethical question: should I lie about my experience?
Hello,
For the past year or so I’ve been working towards becoming a full time devops engineer (was a system integrator). Made countless projects, took courses, and had some freelance jobs. I even helped the devops team in my old workplace. Unfortunately these do not count, and I always get crossed out before I can prove myself, either by automated systems or HR, for not having the 2-3 years of required experience (this is the standard for junior positions where I live, no one hires without experience, unless you have a degree and even then…). After applying to every position available within 80km (around 100 jobs), I have yet to receive even a phone call.
Is it really that valuable? And if it is, how am I supposed to get 2-3 years of experience when no one hires me? I’m genuinely considering lying about my experience, at this point not even to get a job, just to see if my skills are enough for these positions.
I really don’t want to, and I think honesty and clarity are more important than anything, but I’m getting desperate.
Some people recommended that I take a related position (like sysadmin or SRE) and move to devops later, but that takes a long time and it’s still somewhat of a gamble. Plus, none of the things that got me interested in devops to begin with are part of these roles.
What should I do?
https://redd.it/1ku2vlx
@r_devops
I feel like a tool boy
I've been a DevOps engineer/SRE for years but lately got stuck. I've had the chance to work with many toolchains: bootstrapping Kubernetes, building CI/CD (GitLab CI, GitHub Actions, Argo), implementing IaC with Terraform, secret management, using cloud (AWS), etc. I've learnt so many tooling practices. But lately I realized I don't really understand what's under the hood: what the exact capacity of the infra is, or which parameters of the DB, Redis... we have to tune. I also don't understand the business that's running on my infra, so I can hardly excel in operations. Does anyone feel the same? Please give me some advice to grow.
https://redd.it/1ku44k4
@r_devops
Need help integrating two services using key pair auth via GitHub Actions
Has anyone here ever integrated two services, especially Grafana and Snowflake, with key pair auth via GitHub Actions?
Please let me know any information or docs you can share if you know about this or have worked on it.
https://redd.it/1ku4n79
@r_devops
HIRING SRE Intern @ Fintech Company (Bangalore - India) 2025
Hey folks!
We're hiring an SRE (Site Reliability Engineering) Intern at a leading Fintech Company for 2025. If you're passionate about automation, reliability, and large-scale systems – this might be a great fit!
📍 Location: Bangalore - India (In-office)
🕒 Duration: 4-6 months (flexible start date, ideally June/July)
🎓 Who can apply: Final year B.Tech / M.Tech / MCA students or recent grads with a strong background in systems, infrastructure, or devops
💼 What You'll Work On:
- Building automation to reduce toil in operations
- Writing scripts and tools for service monitoring and alerting
- Helping improve uptime and performance for distributed systems
- Working with experienced SREs to improve system resilience and productivity
🛠️ What We're Looking For:
- Strong Linux fundamentals & scripting (Python, Bash, Go preferred)
- Familiarity with monitoring/logging tools and CI/CD pipelines
- Eagerness to learn and contribute to real-world production systems
- Previous project/internship experience in backend, devops, or infrastructure is a plus
📧 How to Apply:
DM me your resume. We'll review and get in touch with you quickly.
https://redd.it/1ku553h
@r_devops
"use AI, improve your productivity by 20%!" - meanwhile, a layoff org chart that cuts 50% of engineering including all non-seniors was found.
awful leadership, the worst decisions and lack of actual impact on the company that I've ever seen.
of course, they're still on the org chart post-layoffs :)
and as someone who uses those tools, I know they can't do the job, I know a couple seniors can't do the job of everyone magically with those tools, and I know the problem is not productivity but the terrible management without any clue about what we do.
I've been interviewing for a couple months now, companies all look for the exact tools they're using in the exact configuration they've set them up - no matter if you have 15+ years of experience with everything under the sun and a track record of becoming the go-to for any new thing after a month of working with it.
anyway, senior infrastructure engineer looking for an EU remote position, based in France. hit me up if you need someone who does good work on anything, but especially kubernetes.
https://redd.it/1ku6k5o
@r_devops
Are Kubernetes bundle certificates worth it for me?
I come from an SAP BASIS background where we don't actually work on Kubernetes. I am looking to upskill and move to DevOps. I already use KodeKloud to learn Kubernetes. Would completing the Kubernetes bundle help in landing a DevOps job, considering the price of the bundle?
https://redd.it/1ku6989
@r_devops
I need help
Hi everyone,
I'm conducting academic research for my thesis on zero trust architectures in cloud security within large enterprises and I need your help!
If you work in cybersecurity or cloud security at a large enterprise, please consider taking a few minutes to complete my survey. Your insights are incredibly valuable for my data collection and your participation would be greatly appreciated.
https://forms.gle/pftNfoPTTDjrBbZf9
Thank you so much for your time and contribution!
https://redd.it/1kua7fk
@r_devops
Dealing with huge amount of key/value pairs, environment variables, secrets - does a tool exist?
Hey all, I was wondering if anyone here knows if a tool exists that can do the following:
- have the ability to read from multiple key-value + secrets "sources". Think local environment, k8s configmaps and secrets, files, Vault, etc.
- take that as input and "initialize" the environment of a system/pod/container, placing config files and setting environment variables
The reason I'm asking is that literally EVERY CI/CD env I've worked on where I wasn't involved from the start seems to be this unholy mess of hardcoded arguments to command line tools, environment variables set in GitLab groups and projects, values.yaml files with hardcoded or sometimes templated values, .env files, and env vars set in things like .gitlab-ci.yml.
It's a total maintenance nightmare, dealing with 800+ key/values and secrets set all over the place, with redundancy and duplicates. I've been trying to look at the problem more abstractly and figured the following:
1. I have essentially two broad worlds I need key-value pairs and secrets in: build-time (during the creation and testing of software artifacts) and run-time (when the created software is invoked)
2. It would be marvelous if some sort of init-thing existed which could take those key-value pairs and secrets from multiple sources and initialize an environment before build steps or runtime execution occurs. Initialize in this context would mean setting/constructing env vars and placing config files at some filesystem location, where these files run through a template of sorts.
3. Having this init-thing would then make it possible to harmonize where key/values and secrets come from, since the init-thing abstracts it away (i.e., you could change the source of a k/v from a configmap in Kubernetes to an env file somewhere else; the init-thing doesn't care where it comes from and will initialize the environment all the same).
4. The tool would ideally run without the need for any service component, and with as few dependencies as possible.
Anyway, my reason for posting was: maybe some of you had these same experiences and thoughts about it + maybe some of you know of a tool which does more or less that.
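To make the idea concrete, here is a rough sketch of what such an init-thing could look like, using only the Python standard library. It is not an existing tool. Every function name is hypothetical, the "later source wins" precedence rule is an assumption, and real sources such as Vault or configmaps would need their own loaders that return a plain dict.

```python
import os
from string import Template


def load_env_file(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


def merge_sources(*sources):
    """Merge dicts in order; later sources override earlier ones (assumed rule)."""
    merged = {}
    for source in sources:
        merged.update(source)
    return merged


def initialize(values, template_text, environ=None):
    """Set env vars in `environ` and return the rendered config text."""
    environ = os.environ if environ is None else environ
    for key, value in values.items():
        environ[key] = value
    # safe_substitute leaves unknown $placeholders untouched instead of raising.
    return Template(template_text).safe_substitute(values)


# Example: defaults overridden by an env file, rendered into a config template.
defaults = {"DB_HOST": "localhost", "DB_PORT": "5432"}
overrides = load_env_file("# overrides\nDB_HOST=db.internal\n")
target_env = {}  # a plain dict here, so the example leaves os.environ alone
config_text = initialize(
    merge_sources(defaults, overrides),
    "host=$DB_HOST\nport=$DB_PORT\n",
    environ=target_env,
)
```

Because every source is reduced to a dict before merging, swapping where a key comes from only changes which loader is called, which is the abstraction point 3 describes. Writing `config_text` to a file is left out to keep the sketch short.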
https://redd.it/1kualuf
@r_devops
Kubernetes take home assignment - eks
How would you build Kubernetes on EKS for a take-home assignment for a job? I’ve built the Terraform with a plan and deploy pipeline, and a Docker image creation pipeline that pushes to ECR.
Would you just run the Kubernetes manifest files with kubectl/eksctl from a terminal for setup, or put them in a pipeline as well?
The assignment is just building a 3-tier web app using the tech stack I listed; anything else is a bonus.
TIA
https://redd.it/1kubqa5
@r_devops
Free DevOps projects websites
Hi, I approached a couple of "tech influencers" to share this list; however, they have not done it. I don't know what the story behind 'not sharing free resources' is. The only reason I asked them is that they have a wider audience reach. So, I decided to do this myself.
I hope this helps people who are new to the field of DevOps or even experienced people. Some of them don't need a test environment. Please feel free to add if you know more. I will keep updating this post.
P.S. I do not own any of these. If you own any of them and want them removed from this list (for whatever reasons), please do let me know. I will remove them.
Linux
https://linuxupskillchallenge.org/
https://overthewire.org/wargames/
DevOps
https://workshops.aws/
https://kodekloud.com/free-labs
https://sadservers.com/scenarios
https://labs.iximiuz.com/
https://devopsupskillchallenge.com/
https://engineer.kodekloud.com/practice
https://cloudresumechallenge.dev/docs/the-challenge/aws/
https://learngitbranching.js.org/
https://labs.play-with-docker.com/
https://madhuakula.com/kubernetes-goat/
https://github.com/bregman-arie/devops-exercises
https://redd.it/1kudmi2
@r_devops
Where do you store your documentation? Or what tool do you use?
I’m looking for different documentation tools I could use in my organization. From complex technical docs to the simple todos, what do you guys use?
https://redd.it/1kueea3
@r_devops
ELI5: CAP Theorem in System Design
This is a super simple ELI5 explanation of the CAP Theorem. I mainly wrote it because I found that sources online are either not concise or lack important points. I included two system design examples where CAP Theorem is used to make design decisions. Maybe this is helpful to some of you :-) Here is the repo: https://github.com/LukasNiessen/cap-theorem-explained
## Super simple explanation
C = Consistency = Every user gets the same data
A = Availability = Users can retrieve the data always
P = Partition tolerance = Even if there are network issues, everything works fine still
Now the CAP Theorem states that in a distributed system, you need to decide whether you want consistency or availability. You cannot have both.
### Questions
And in non-distributed systems? CAP Theorem only applies to distributed systems. If you only have one database, you can totally have both. (Unless that DB server is down, obviously; then you have neither.)
Is this always the case? No, if everything is green, we have both consistency and availability. However, if a server loses internet access, for example, or any other fault occurs, THEN we can only have one of the two: either consistency or availability.
### Example
As I said already, the problem only arises when we have some sort of fault. Let's look at this example.
      US (Master)                 Europe (Replica)
    ┌─────────────┐                ┌─────────────┐
    │             │                │             │
    │  Database   │◄──────────────►│  Database   │
    │   Master    │    Network     │   Replica   │
    │             │  Replication   │             │
    └─────────────┘                └─────────────┘
           │                              │
           │                              │
           ▼                              ▼
       [US Users]                     [EU Users]
Normal operation: Everything works fine. US users write to master, changes replicate to Europe, EU users read consistent data.
Network partition happens: The connection between US and Europe breaks.
      US (Master)                 Europe (Replica)
    ┌─────────────┐                ┌─────────────┐
    │             │    ╳╳╳╳╳╳╳     │             │
    │  Database   │◄────╳╳╳╳╳─────►│  Database   │
    │   Master    │    ╳╳╳╳╳╳╳     │   Replica   │
    │             │    Network     │             │
    └─────────────┘     Fault      └─────────────┘
           │                              │
           │                              │
           ▼                              ▼
       [US Users]                     [EU Users]
Now we have two choices:
Choice 1: Prioritize Consistency (CP)
- EU users get error messages: "Database unavailable"
- Only US users can access the system
- Data stays consistent but availability is lost for EU users
Choice 2: Prioritize Availability (AP)
- EU users can still read/write to the EU replica
- US users continue using the US master
- Both regions work, but data becomes inconsistent (EU might have old data)
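The two choices can be sketched in a few lines of Python. This is a toy model under simplifying assumptions, not how any real database implements replication: the `Replica` class, its `partitioned` flag, and the error message are all made up for illustration.

```python
class Replica:
    """A toy read replica that may lose contact with its master."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.data = {}
        self.partitioned = False

    def replicate(self, key, value):
        """Update pushed by the master; it never arrives while partitioned."""
        if not self.partitioned:
            self.data[key] = value

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk a stale answer.
            raise ConnectionError("Database unavailable")
        # Healthy network, or availability first: answer from local data.
        return self.data.get(key)


cp_replica, ap_replica = Replica("CP"), Replica("AP")
for replica in (cp_replica, ap_replica):
    replica.replicate("price", 100)   # replicated before the fault
    replica.partitioned = True        # the link to the master breaks
    replica.replicate("price", 120)   # this update is lost in transit

stale_value = ap_replica.read("price")  # AP answers, but with the old value
try:
    cp_replica.read("price")
    cp_refused = False
except ConnectionError:
    cp_refused = True                   # CP refuses to answer
```

Both replicas hold the same outdated data after the fault. The only difference is whether they are willing to hand it out.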
## What are Network Partitions?
Network partitions are when parts of your distributed system can't talk to each other. Think of it like this:
- Your servers are like people in different rooms
- Network partitions are like the doors between rooms getting stuck
- People in each room can still talk to each other, but can't communicate with other rooms
Common causes:
- Internet connection failures
- Router crashes
- Cable cuts
- Data center outages
- Firewall issues
The key thing is: partitions WILL happen. It's not a matter of if, but when.
## The "2 out of 3" Misunderstanding
CAP Theorem is often presented as "pick 2 out of 3." This is wrong.
Partition tolerance is not optional. In distributed systems, network partitions will happen. You can't choose to "not have" partitions - they're a fact of life, like rain or traffic jams... :-)
So our choice is: When a partition happens, do you want Consistency OR Availability?
- CP Systems: When a partition occurs → node stops responding to maintain consistency
- AP Systems: When a partition occurs → node keeps responding but users may get inconsistent data
In other words, it's not "pick 2 out of 3," it's "partitions will happen, so pick C or A."
## System Design Example 1: Video Streaming Service
Scenario: Building Netflix
Decision: Prioritize Availability (AP)
Why? If some users see slightly outdated movie names for a few seconds, it's not a big deal. But if the users cannot watch movies at all, they will be very unhappy.
## System Design Example 2: Flight Booking System
Here, we will not apply CAP Theorem to the entire system but to parts of the system. So we have two different parts with different priorities:
### Part 1: Flight Search
Scenario: Users browsing and searching for flights
Decision: Prioritize Availability
Why? Users want to browse flights even if prices/availability might be slightly outdated. Better to show approximate results than no results.
### Part 2: Flight Booking
Scenario: User actually purchasing a ticket
Decision: Prioritize Consistency
Why? If we prioritized availability here, we might sell the same seat to two different users. Very bad. We need strong consistency here.
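Here is a small sketch of the double-selling problem, assuming each region keeps its own seat inventory during a partition. The class and function names are hypothetical, and a real booking system would use transactions or locks on a single authoritative store instead of this toy dictionary.

```python
class SeatInventory:
    """One node's view of which seats are sold."""

    def __init__(self):
        self.sold = {}  # seat -> passenger

    def book(self, seat, passenger):
        """Sell the seat only if this node believes it is still free."""
        if seat in self.sold:
            return False
        self.sold[seat] = passenger
        return True


def conflicts(a, b):
    """Seats that two nodes sold to different passengers."""
    return sorted(
        seat
        for seat in a.sold.keys() & b.sold.keys()
        if a.sold[seat] != b.sold[seat]
    )


# AP during a partition: both regions accept the booking independently.
us, eu = SeatInventory(), SeatInventory()
us.book("12A", "alice")
eu.book("12A", "bob")
double_sold = conflicts(us, eu)

# CP: every booking goes through one authoritative node, so the second fails.
primary = SeatInventory()
first_ok = primary.book("12A", "alice")
second_ok = primary.book("12A", "bob")
```

In the AP case the conflict only shows up once the regions can compare notes again, by which point both customers already hold a ticket.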
### PS: Architectural Quantum
What I just described, having two different scopes, is the concept of having more than one architecture quantum. There is a lot of interesting stuff online to read about the concept of architecture quanta :-)
https://redd.it/1kufxrm
@r_devops
Quick update: That “I’ll fix your infra in 48 hours” post kinda blew up
Didn’t expect this, but that post got over 220k views, 180+ comments, and around 70 DMs.
Spent the last two weeks helping people fix all kinds of things: weird CI bugs, Terraform headaches, K8s issues, GPU cost blowups… the usual chaos. A few folks just needed a nudge in the right direction; others had full-on dumpster fires.
Out of all that, 12 people offered legit work. I stuck with 3-4 of them; we’ve been deep in infra stuff for the past couple of weeks and it's honestly been solid.
Here’s the part I need your help with now:
IF YOU’RE DEALING WITH INFRA OR DEVOPS PAIN RIGHT NOW, I’D LOVE TO KNOW WHAT IT IS.
Also curious what tools you’re using daily.
Drop anything, even just a one-liner; it’ll help me see what patterns are popping up across teams.
Still around and still down to help. Let’s keep it going.
https://redd.it/1kuhnxm
@r_devops
What’s one DevOps tool you still don’t fully trust?
I’ll go first: Helm.
I’ve used it in multiple projects, and yeah, it’s powerful—but it always feels like I’m one typo away from chaos. Templating gone wrong, values.yaml overrides not working, random “why is this resource even here” moments…
Same goes for Ansible sometimes—like I blink and it rewrites half my infra.
Do you have a tool like that?
One you use, but always double-check… just in case?
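One common source of the "values.yaml overrides not working" surprise is how layered values are combined: maps are merged key by key, while lists are replaced wholesale. The sketch below is a simplified Python model of that behavior, not Helm's actual implementation; `merge_values` and the sample values are made up for illustration, and it ignores details such as `null` deleting a key.

```python
def merge_values(base, override):
    """Simplified model: dicts merge recursively, anything else is replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars and lists from the override replace the default entirely.
            merged[key] = value
    return merged


# Hypothetical chart defaults and a user-supplied override.
chart_defaults = {
    "image": {"repository": "nginx", "tag": "1.25"},
    "env": [
        {"name": "LOG_LEVEL", "value": "info"},
        {"name": "PORT", "value": "8080"},
    ],
}
override = {
    "image": {"tag": "1.27"},
    "env": [{"name": "LOG_LEVEL", "value": "debug"}],
}

result = merge_values(chart_defaults, override)
# result["image"] keeps "repository" and picks up the new "tag".
# result["env"] contains only LOG_LEVEL: the PORT entry is gone.
```

Rendering the chart locally with `helm template` and reading the output before applying it is one way to catch this kind of thing early.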
https://redd.it/1kui6os
@r_devops
Saving 50%+ off our $80K cloud monitoring bill cont'd
Checking back in after my last post, which dove into piloting new cloud monitoring infra to tackle my client's ridiculous $80K/month o11y bill.
As planned, we expanded the pilot, getting a ton more services and traffic flowing through the BYOC eBPF/OTEL setup.
The concerns about having to manage the GC stack completely miss the fully-managed point. The stack runs on our infrastructure but is 100% managed by the GC team. There is no tuning or monitoring ClickHouse on our side; they do it all for us, and that was exactly what happened. We get an endpoint to send data to, and that’s it.
Reality vs. Sales Pitch / "Gotchas": With the BYOC approach, the customer (or my client) is the one paying for the infrastructure, so TCO is more complex (subscription + hosting) and required more back and forth up and down the chain of command. We also had to make sure all the incentives were aligned and that GC could help us optimize the infrastructure and the data stored. In other words, pay for only what we use.
I've yet to put it to the test, but the G community Slack channels are monitored (though NOT under an enterprise SLA). This is passable for now, and my team will find out more in the coming months.
A few key learnings during and immediately after the migration process:
\- Search syntax takes time to wrap our heads around. Docs could be expanded much more.
\- Prometheus compatibility was super critical (we missed this completely during the requirement phase), but thankfully PromQL queries converted 1:1.
\- Migration tools to convert dashboards & monitors were a nice touch.
OK, TL;DR of everything so far: we saved money by
1. Better data tiering: hot log retention reduced to 7 days, with 90 days cold for compliance.
2. Unified platforms (MELT + RUM, Hybrid eBPF/OTEL)
3. Owning the infra with no management overhead
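To make the tiering in point 1 concrete, here is a minimal sketch of the retention rule as a function. The name `storage_tier` is hypothetical, and two things are assumptions on my part rather than something the vendor documents: that the 90 days are counted from ingestion (not from the end of the hot window), and that data past 90 days is dropped.

```python
HOT_DAYS = 7    # logs younger than this stay in hot (fast, expensive) storage
COLD_DAYS = 90  # assumed total retention, counted from ingestion


def storage_tier(age_days):
    """Return the tier a log record of the given age would live in."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < HOT_DAYS:
        return "hot"
    if age_days < COLD_DAYS:
        return "cold"
    return "expired"  # assumption: anything older is deleted
```

Under this rule, only the most recent week of logs sits on the expensive tier at any time; everything else is either cold or gone.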
No questions at this time. I'm going to sign off and enjoy the Memorial Day long weekend.
https://redd.it/1kuh0t1
@r_devops
Hey everyone, I hope this is okay to post here – just looking for a few people to beta test a tool I’m working on.
I’ve been working on a tool that helps businesses get more Google reviews by automating the process of asking for them through simple text templates. It’s a service I’m calling STARSLIFT, and I’d love to get some real-world feedback before fully launching it.
Here’s what it does:
✅ Automates the process of asking your customers for Google reviews via SMS
✅ Lets you track reviews and see how fast you’re growing (review velocity)
✅ Designed for service-based businesses who want more reviews but don’t have time to manually ask
Right now, I’m looking for a few U.S.-based businesses willing to test it completely free. The goal is to see how it works in real-world settings and get feedback on how to improve it.
If you:
Are a service-based business in the U.S. (think contractors, salons, dog groomers, plumbers, etc.)
Get 5-20 or more customers a day
Are interested in trying it out for a few weeks
… I’d love to connect.
As a thank you, you’ll get free access even after the beta ends.
If this sounds interesting, just drop a comment or DM me with:
What kind of business you have
How many customers you typically serve in a day
Whether you’re in the U.S.
I’ll get back to you and set you up! No strings attached – this is just for me to get feedback and for you to (hopefully) get more reviews for your business.
https://redd.it/1kutyuv
@r_devops
What is the best way to learn DevOps?
I am a MERN stack developer (starting my 4th year in IT). The way I learnt MERN was to learn the basics of each part, then watch people build projects and build alongside them. When I didn't understand a piece of code, I would use ChatGPT and document that particular concept. After 1-2 projects, I started building basic stuff myself.
TL;DR: Learnt the MERN stack through YouTube and AI.
Unfortunately, I can't do the same with DevOps because the concepts are too theoretical, I presume. Is there something that would help me learn it?
PS: Sorry for the long description. Thank you for any advice.
https://redd.it/1kuw1sm
@r_devops
🚀 Milestone Unlocked: 2K Stars! 🌟
My Cheat-Sheet Collection just hit 2,000 stars on GitHub!
Huge thanks to everyone who starred, shared, and contributed. Your support keeps this project growing. 🙌
If you haven't checked it out yet — it's a curated collection of high-quality PDF cheat sheets for developers, DevOps engineers, and tech enthusiasts. 📚💻
Feel free to explore, contribute, and share!
\#DevOps #CheatSheet #GitHub #OpenSource #Infosec #DevSecOps #Kubernetes #Linux
https://redd.it/1kuxk2d
@r_devops
GitHub
GitHub - sk3pp3r/cheat-sheet-pdf: 📜 A Cheat-Sheet Collection from the WWW
📜 A Cheat-Sheet Collection from the WWW. Contribute to sk3pp3r/cheat-sheet-pdf development by creating an account on GitHub.