Python learning path
Hey guys, I've wanted to learn Python for quite a while now. Could someone please suggest some useful resources? I have worked with Python a bit, tweaking code here and there.
Could someone please share a course that they have found useful.
Also, is it worth putting in the effort to learn, especially now that AI is around?
https://redd.it/1lo31ki
@r_devops
Certified Kubernetes Administrator (CKA) Exam Guide - V1.32 (2025)
Your ultimate resource for acing the CKA exam on your first attempt! This repo offers detailed explanations, hands-on labs, and essential study materials, empowering aspiring Kubernetes administrators to master their skills and achieve certification success. Unlock your Kubernetes potential today!
https://github.com/techwithmohamed/CKA-Certified-Kubernetes-Administrator
https://redd.it/1lo3aba
Got Amazon Devops 2 interview in a few days!
Got Amazon DevOps 2 interview in a few days! Could someone please help me with what to prepare and what kinds of questions I can expect in the interview? Thank you
https://redd.it/1lo4p8n
I'm getting an error after certificate renewal, please help
Hello,
My Kubernetes cluster was running smoothly until I tried to renew the certificates after they expired. I ran the following commands:
>sudo kubeadm certs renew all
>echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> ~/.bashrc
>source ~/.bashrc
After that, some abnormalities started to appear in my cluster. Calico is completely down, and even after deleting and reinstalling it, it does not come back up at all.
When I check the daemonsets and deployments in the kube-system namespace, I see:
>kubectl get daemonset -n kube-system
>NAME          DESIRED  CURRENT  READY  UP-TO-DATE  AVAILABLE  NODE SELECTOR           AGE
>calico-node   0        0        0      0           0          kubernetes.io/os=linux  4m4s
>
>kubectl get deployments -n kube-system
>NAME                     READY  UP-TO-DATE  AVAILABLE  AGE
>calico-kube-controllers  0/1    0           0          4m19s
Before this, I was also getting "unauthorized" errors in the kubelet logs, which started after renewing the certificates. This is definitely abnormal: the pods created from deployments are not coming up and remain stuck.
There is no error message shown during deployment either. Please help.
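A possible lead, offered as a hedged guess from the output above: a DaemonSet with DESIRED 0 and a Deployment with UP-TO-DATE 0 both suggest kube-controller-manager is not reconciling. The kubeadm certificate-management docs note that after `kubeadm certs renew`, kube-apiserver, kube-controller-manager, kube-scheduler and etcd must be restarted before they use the new certificates. One common way to restart these static pods is to move their manifests out of the kubelet's static-pod directory and back. A minimal sketch of that step, assuming the kubeadm default `/etc/kubernetes/manifests`, a hypothetical parking directory, and an assumed ~20 s for the kubelet to notice the change:

```python
import shutil
import time
from pathlib import Path
from typing import Callable, List


def restart_static_pods(
    manifest_dir: Path = Path("/etc/kubernetes/manifests"),  # kubeadm default (assumed not customised)
    parking_dir: Path = Path("/etc/kubernetes/manifests-parked"),  # hypothetical temporary location
    wait_seconds: float = 20.0,  # assumed time for the kubelet to notice the manifests are gone
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Move every static-pod manifest out and back so the kubelet recreates the pods.

    Freshly started control-plane processes read the renewed certificates and
    kubeconfig files from disk, which is what `kubeadm certs renew` updated.
    """
    parking_dir.mkdir(parents=True, exist_ok=True)
    manifests = sorted(p.name for p in manifest_dir.glob("*.yaml"))
    for name in manifests:
        # Removing the manifest makes the kubelet stop the corresponding static pod.
        shutil.move(str(manifest_dir / name), str(parking_dir / name))
    sleep(wait_seconds)
    for name in manifests:
        # Restoring it makes the kubelet start a new pod from the same spec.
        shutil.move(str(parking_dir / name), str(manifest_dir / name))
    return manifests
```

This would run as root on each control-plane node, followed by `kubeadm certs check-expiration` to confirm the new dates. The kubelet's own client certificate is handled separately from `kubeadm certs renew`, so the "unauthorized" kubelet errors may need checking on their own.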
https://redd.it/1lo52hc
Update: DockedUp v1.0.0 release, check the demo once !!!
Hey r/devops!
Last week I introduced **DockedUp** — a real-time, interactive terminal dashboard for managing Docker containers. Thanks so much for the support and feedback! 🙌
I’ve just pushed a big update with performance improvements, better logs, and smoother UI — plus a new demo to show it off:
**Check out the new demo GIF**
### Install via pip or pipx:
pipx install dockedup
or
pip install dockedup
### Then just run:
dockedup
#### Links:
GitHub: [github.com/anilrajrimal1/dockedup](https://github.com/anilrajrimal1/dockedup)
PyPI: pypi.org/project/dockedup
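Terminal dashboards like this usually compute per-container CPU from the Docker Engine API's container stats endpoint, which returns the current (`cpu_stats`) and previous (`precpu_stats`) cumulative counters. A minimal sketch of the formula the docker CLI uses for `docker stats` on Linux; it is not taken from DockedUp's source, and the input is assumed to be one decoded stats sample:

```python
from typing import Any, Dict


def cpu_percent(stats: Dict[str, Any]) -> float:
    """CPU usage in percent from one Docker Engine API stats sample."""
    cpu = stats["cpu_stats"]
    pre = stats["precpu_stats"]
    # Nanoseconds of CPU time the container used between the two samples.
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    # Nanoseconds of CPU time the whole host accounted for in the same window.
    system_delta = cpu.get("system_cpu_usage", 0) - pre.get("system_cpu_usage", 0)
    # Prefer online_cpus; older daemons only expose the per-CPU usage list.
    online_cpus = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage") or []) or 1
    if cpu_delta <= 0 or system_delta <= 0:
        return 0.0
    return cpu_delta / system_delta * online_cpus * 100.0
```

A value above 100% is expected for a container busy on more than one core.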
https://redd.it/1loa4j8
Suggestions for an innovation sprint project? What useful new concepts or tech is 'trending'?
We are planning an innovation sprint (1 week to create a demo/PoC for a green-field project, 1 week to finalise, prep slides and demonstrate) and are at the ideas stage. I had firm plans for how I wanted to use the time, which were completely trainwrecked by a late directive that projects must qualify for R&D tax credits.
I'm now in a position where I am absolutely uninterested and would like some help taking back some control of this valuable time - and not get roped in as a 6th person working on a 'support hub chat bot' project.
Any suggestions for things to consider?
- Is there somewhere I can follow for good coverage of new trends and evolution in the DevOps field?
- We have AKS clusters in Azure for deployments without any tools like Kubecost implemented. Could this be a good way to brush up on my k8s/Helm knowledge and deliver something that would look good in my annual review if it achieves any cost savings?
Thanks for any advice!
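For the Kubecost idea, the core of Kubernetes cost allocation can be prototyped before installing anything: price a node's CPU and memory, then charge each namespace for what its pods request. This is a simplified sketch, not how Kubecost or OpenCost actually compute costs (they use cloud pricing data and real usage); the 50/50 CPU/memory weighting and all names here are assumptions:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PodRequest:
    namespace: str
    cpu_cores: float   # summed container CPU requests
    memory_gib: float  # summed container memory requests


def allocate_node_cost(
    pods: List[PodRequest],
    node_hourly_cost: float,
    node_cpu_cores: float,
    node_memory_gib: float,
    cpu_weight: float = 0.5,  # assumed share of the node price attributed to CPU
) -> Dict[str, float]:
    """Split one node's hourly cost across namespaces by requested CPU and memory."""
    cpu_price = node_hourly_cost * cpu_weight / node_cpu_cores          # cost per core-hour
    mem_price = node_hourly_cost * (1 - cpu_weight) / node_memory_gib   # cost per GiB-hour
    costs: Dict[str, float] = defaultdict(float)
    for pod in pods:
        costs[pod.namespace] += pod.cpu_cores * cpu_price + pod.memory_gib * mem_price
    # Capacity nobody requested is reported as idle rather than hidden.
    costs["__idle__"] = node_hourly_cost - sum(costs.values())
    return dict(costs)
```

The `__idle__` line is often the most interesting number for a cost-savings demo, since it points at over-provisioned node pools.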
https://redd.it/1loa5w5
Is it possible to route non-HTTP traffic by DNS name with Istio?
My assumption is no, but maybe there’s something that would work
Let’s say I have a JDBC connection for 3 databases db1.com, db2.com, db3.com
In K8s with Istio VirtualServices/Gateway (without multiple load balancers), is it possible for all 3 connections to listen on TCP 5432 and then be routed to a DB in a specific namespace?
Example (assume the LB is exactly the same in all 3 cases):
User (db1) —> LB(5432) —> namespace 1
User (db2) —> LB(5432) —> namespace 2
User (db3) —> LB(5432) —> namespace 3
My assumption is that, as this isn’t HTTP, we’d be looking at L4, meaning the DNS name would be unknown to us/not usable.
Is this correct? Is there any way to do the above for a DB TCP connection with a single LB/port but route to namespaces based on the DNS name?
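The assumption holds for plain TCP: an L4 proxy sees only IPs and ports, and database wire protocols typically do not name the host the client dialed. The standard exception is TLS, where the client's first message (the ClientHello) carries the hostname in the SNI extension; Istio can route on it with a Gateway in TLS `PASSTHROUGH` mode and VirtualService `tls` routes matching `sniHosts`. The sketch shows what a proxy can read from the first bytes of a connection. Caveat to verify per database and driver: PostgreSQL's classic negotiation starts with a plaintext SSLRequest message rather than a ClientHello, so in that mode there is no SNI to route on. `make_client_hello` is a hypothetical helper that builds a real ClientHello with the stdlib, without touching the network.

```python
import ssl
import struct
from typing import Optional


def extract_sni(data: bytes) -> Optional[str]:
    """Return the SNI hostname from a TLS ClientHello, or None if the bytes are not one."""
    # TLS record header: content type 22 (handshake), 2-byte version, 2-byte length.
    if len(data) < 9 or data[0] != 0x16 or data[5] != 0x01:  # data[5]: handshake type 1 = ClientHello
        return None
    pos = 5 + 4 + 2 + 32  # record header, handshake header, client_version, random
    if len(data) < pos + 1:
        return None
    pos += 1 + data[pos]  # session_id
    if len(data) < pos + 2:
        return None
    pos += 2 + struct.unpack_from("!H", data, pos)[0]  # cipher_suites
    if len(data) < pos + 1:
        return None
    pos += 1 + data[pos]  # compression_methods
    if len(data) < pos + 2:
        return None
    end = min(len(data), pos + 2 + struct.unpack_from("!H", data, pos)[0])
    pos += 2
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", data, pos)
        pos += 4
        if ext_type == 0x0000:  # server_name extension
            # server_name_list length (2), name_type (1, 0 = host_name), name length (2), name
            if pos + 5 > end or data[pos + 2] != 0:
                return None
            name_len = struct.unpack_from("!H", data, pos + 3)[0]
            return data[pos + 5 : pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None


def make_client_hello(hostname: str) -> bytes:
    """Produce a real ClientHello carrying SNI=hostname using in-memory BIOs."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the handshake now waits for a ServerHello that never arrives
    return outgoing.read()
```

`extract_sni(make_client_hello("db1.com"))` yields `"db1.com"`, while the 8-byte PostgreSQL SSLRequest (`00 00 00 08 04 d2 16 2f`) yields `None`: the hostname simply is not on the wire.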
https://redd.it/1lodag9
Good observability tooling doesn’t mean teams actually understand it
Been an engineering manager at a large org for close to three years now. We’re not exactly a “digitally native” company, but we have \~5K developers. Platform org has solid observability tooling (LGTM stack, decent golden paths).
What I keep seeing though - both in my team and across the org - is that product engineers rarely understand the nuances of the “three pillars” of observability - logs, metrics, and traces.
Not because they’re careless, but because their cognitive budget is limited. They're focused on delivering product value, and learning three completely different mental models for telemetry is a real cost.
Even with good platform support, that knowledge gap has real implications -
* Slower incident response and triage
* Platform teams needing to educate and support a lot more
* Alert fatigue and poor signal-to-noise ratios
I wrote up [some thoughts](https://open.substack.com/pub/musingsonsoftware/p/org-implications-of-contemporary?r=57p3s&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true) on why these three pillars exist (hint - it’s storage and query constraints) and what that means for teams trying to build observability maturity -
* Metrics, logs, and traces are separate because they store and query data differently.
* That separation forces dev teams to learn three mental models.
* Even with “golden path” tooling, you can’t fully outsource that cognitive load.
* We should be thinking about unified developer experience, not just unified tooling.
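One way to read the "unified developer experience" point: give engineers one call that emits all three signals with a shared trace ID, so they learn one habit instead of three APIs. The sketch is hypothetical (in-memory lists stand in for the LGTM backends; this is not the OpenTelemetry or Grafana API). It also illustrates the storage constraint the post describes: the trace ID goes into logs and spans but not into metric labels, because per-request IDs would explode metric cardinality.

```python
import json
import time
import uuid
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Telemetry:
    """Hypothetical in-memory sink standing in for metrics, logs and traces backends."""
    counters: Counter = field(default_factory=Counter)  # metrics: (name, route, status) -> count
    log_lines: List[str] = field(default_factory=list)  # logs: one JSON line per event
    spans: List[Dict] = field(default_factory=list)     # traces: timed spans keyed by trace_id

    def record_request(self, route: str, status: int, duration_s: float, trace_id: str = "") -> str:
        trace_id = trace_id or uuid.uuid4().hex
        # Metric: cheap aggregate with low-cardinality labels only (no trace_id).
        self.counters[("http_requests_total", route, status)] += 1
        # Log: full detail for one event, carrying trace_id so it can be joined to the trace.
        self.log_lines.append(json.dumps(
            {"msg": "request handled", "route": route, "status": status, "trace_id": trace_id}))
        # Trace: a span with timing, keyed by the same trace_id.
        end = time.time()
        self.spans.append({"trace_id": trace_id, "name": route, "start": end - duration_s, "end": end})
        return trace_id
```

During triage, the shared ID lets someone jump from a spike in the counter to matching log lines and then to the span, without first learning each backend's query model.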
Curious whether others here have seen the same gap between tooling maturity and team understanding, and if you have, I'm eager to understand how you address it in your orgs.
https://redd.it/1loes4q
VENT Seeing engineers use LLMs to generate all the code that I used to write for them is concerning
One of our engineering directors decided to spin up a new service. Within minutes, he was able to produce the scripts / Terraform to bring up the infra for these services, along with the scripts to deploy them. It’s very clear that this code was written by an LLM.
It’s good, clean code too.
This is all stuff that I used to do, and I am realizing that pretty soon I will no longer be needed for this set of tasks.
This leads me to wonder what types of tasks I should focus on so as not to get automated away entirely.
I'm not trying to be a luddite or an alarmist. It's great that these tools have enabled higher productivity, and honestly writing those types of scripts was never particularly fun or engaging. Just trying to stay ahead of getting eaten by the AI bear.
https://redd.it/1lodvtm
Monday Questions - r/DevOptimize
r/DevOptimize is taking questions on simplifying delivery and packaging. Feel free to ask here or there.
* Do your deploys take more steps than "install packages; per-env config; start services"? More than 100 lines?
* Do you have separate IaC source repos or branches for each environment? Let's discuss!
* Do you have more than two or three layers in your container build?
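One way to get "per-env config" without separate IaC repos or branches per environment: keep a single base definition and layer small per-environment overrides on top at deploy time. A minimal Python sketch; the config keys, image tag and environment names are hypothetical:

```python
from copy import deepcopy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay per-environment values onto a shared base config."""
    merged = deepcopy(base)  # never mutate the shared base
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested sections
        else:
            merged[key] = deepcopy(value)  # scalars and lists replace the base value
    return merged


# Hypothetical single source of truth: one base plus tiny per-environment diffs.
BASE = {"service": {"image": "app:1.4.2", "replicas": 2}, "log_level": "info"}
ENVIRONMENTS = {"staging": {"log_level": "debug"}, "prod": {"service": {"replicas": 6}}}
```

Promoting a change then means editing `BASE` once, and the diff between environments stays visible in one small dict instead of across branches.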
https://redd.it/1loi1wt
Are we supposed to know everything?
I used to think DevOps interviews would focus on CI/CD, observability, and maybe some k8s troubleshooting.
Then came a “design a distributed key-value store” question. My brain just… rebooted.
It’s not that I didn’t know what quorum or replication meant. But I hadn’t reviewed consensus protocols since college. I fumbled the difference between consistency and availability under pressure.
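For reference, the quorum idea in its simplest (Dynamo-style) form: with N replicas, a write waits for W acknowledgements and a read queries R replicas; if R + W > N, every read set overlaps every successful write set, so the read sees the latest acknowledged write. A toy sketch, assuming a single global version counter in place of the vector clocks or timestamps a real system would use; the class and method names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Replica:
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)  # key -> (version, value)


class QuorumKV:
    """Toy leaderless store: writes need W acks, reads ask R replicas and keep the newest version."""

    def __init__(self, n: int, r: int, w: int) -> None:
        if not (1 <= r <= n and 1 <= w <= n):
            raise ValueError("need 1 <= r, w <= n")
        self.replicas: List[Replica] = [Replica() for _ in range(n)]
        self.r, self.w = r, w
        self.version = 0  # simplification: one global counter

    def strict_quorum(self) -> bool:
        # R + W > N guarantees read and write sets intersect.
        return self.r + self.w > len(self.replicas)

    def put(self, key: str, value: str, reachable: Sequence[int]) -> None:
        if len(reachable) < self.w:
            raise RuntimeError("write quorum not reached")
        self.version += 1
        for i in reachable:
            self.replicas[i].data[key] = (self.version, value)

    def get(self, key: str, reachable: Sequence[int]) -> Optional[str]:
        if len(reachable) < self.r:
            raise RuntimeError("read quorum not reached")
        answers = [self.replicas[i].data[key] for i in reachable if key in self.replicas[i].data]
        return max(answers)[1] if answers else None  # highest version wins
```

With N=3, R=2, W=2, any two replicas answer with the newest value; with R=1, W=1 a read can land on a replica that never saw the write. That is the consistency-versus-availability trade in one parameter choice.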
That interview was a wake-up call: if you're applying to DevOps roles that lean heavily on the “dev,” you will be asked to reason through failure models, caching layers, GC behavior, or how your system handles 4x traffic spikes without falling over.
Since then, I’ve been treating system design prep like a separate skill. I watch ByteByteGo on 1.5x speed. I sketch distributed tracing pipelines in Notion. I’ve also been using Beyz coding assistant to walk through mock scenarios. The kind where you have to balance tradeoffs and justify design choices on the fly.
It’s not about memorizing Raft vs Paxos. It’s about showing that you can ask good questions, make sane decisions, and evolve your design when requirements shift. (Also, knowing when not to build a whole new infra stack just to sound smart.)
System design interviews aren't going away. But neither is your ability to improve. Anyone else trying to "relearn" distributed systems after years of just... shipping YAML?
https://redd.it/1loj6m2
I need a UDP load balancer that can retry on timeouts
Greetings, friends,
Recently, I've been frantically searching for a solution to my problem:
I have a system that is composed of multiple servers that receive UDP packets and send back responses.
I need a load balancer that can also retry sending the UDP packet if no response comes back to it within 3 milliseconds. I need to check for ANY response, no parsing or anything.
I know that UDP itself doesn't guarantee any response, but unfortunately a response is exactly what I need; otherwise, there are edge cases where I no longer have 100% availability.
So far, I'm using Envoy Proxy, but it does not support this functionality for UDP.
I looked into extending Envoy Proxy with a custom UDP filter that performs these retries, but it seems to be a pretty daunting task.
I couldn't even compile Envoy to begin with. It took 4 hours and ended in an error.
Does anyone know of any solution that could help achieve this? A LOT of traffic needs to be handled.
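The retry logic itself is simple to state: send to one backend, wait up to the deadline for any datagram back, and on timeout send the same payload to the next backend. A minimal Python sketch of that logic only (the function name and parameters are hypothetical). It handles one request at a time with blocking sockets, so it is nowhere near fast enough for heavy traffic at a 3 ms deadline; a real balancer would track many in-flight requests concurrently.

```python
import socket
from typing import Optional, Sequence, Tuple

Addr = Tuple[str, int]


def forward_with_retry(
    payload: bytes,
    backends: Sequence[Addr],
    timeout_s: float = 0.003,  # the 3 ms deadline from the post
    bufsize: int = 65535,
) -> Optional[Tuple[bytes, Addr]]:
    """Send payload to each backend in turn until one answers within timeout_s."""
    for backend in backends:
        # A fresh socket per attempt, connected to one backend, so a late reply
        # from an earlier backend can never be mistaken for this attempt's answer.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout_s)
            sock.connect(backend)
            sock.send(payload)
            try:
                reply = sock.recv(bufsize)  # any datagram counts; no parsing
            except (socket.timeout, ConnectionRefusedError):
                continue  # no answer in time (or ICMP unreachable): try the next backend
            return reply, backend
    return None  # every backend timed out
```

A backend that is merely slow (say it replies at 4 ms) will see the same packet processed again by another backend, so the backends need to tolerate duplicates.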
https://redd.it/1loix24
Snyk free plan limits
Hi there,
I'm currently using Snyk on a private GitHub repository integrated with my GitHub Actions pipeline. Although I've exceeded the usage limits of the free plan by quite a bit, everything still seems to be working without issue.
Does anyone know why that might be the case? Should I expect the scans to stop working suddenly, or is there typically some buffer or grace period before enforcement?
Thanks in advance!
https://redd.it/1lohq5q
Deploying OpenStack on Azure VMs — Common Practice or Overkill?
Hey everyone,
I recently started my internship as a junior cloud architect, and I’ve been assigned a pretty interesting (and slightly overwhelming) task:
Set up a private cloud using OpenStack, but hosted entirely on Azure virtual machines.
Before I dive in too deep, I wanted to ask the community a few important questions:
1. Is this a common or realistic approach?
Using OpenStack on public cloud infrastructure like Azure feels a bit counterintuitive to me. Have you seen this done in production, or is it mainly used for learning/labs?
2. Does it help reduce costs, or can it end up being more expensive than using Azure-native services or even on-premises servers?
3. How complex is this setup in terms of architecture, networking, maintenance, and troubleshooting?
Any specific challenges I should be prepared for?
4. What are the best practices when deploying OpenStack in a public cloud environment like Azure? (e.g., VM sizing, network setup, high availability, storage options…)
5. Is OpenStack-Ansible a good fit for this scenario, or should I consider other deployment tools like Kolla-Ansible or DevStack?
6. Are there security implications I should be especially careful about when layering OpenStack over Azure?
7. If anyone has tried this before — what lessons did you learn the hard way?
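One concrete sizing question for OpenStack on Azure VMs: Nova's libvirt driver can only use `virt_type = kvm` if the Azure VM exposes hardware virtualization (nested virtualization), which Azure offers only on some VM sizes; otherwise instances fall back to much slower `qemu` emulation. A minimal check, assuming a Linux x86 guest where `/proc/cpuinfo` lists `vmx` (Intel) or `svm` (AMD) when virtualization is exposed; the function names are hypothetical:

```python
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect the CPU feature flags from /proc/cpuinfo-style text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def suggest_nova_virt_type(cpuinfo_text: str) -> str:
    """'kvm' if the VM exposes hardware virtualization, else fall back to 'qemu' emulation."""
    return "kvm" if cpu_flags(cpuinfo_text) & {"vmx", "svm"} else "qemu"


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(suggest_nova_virt_type(f.read()))
```

The flag alone does not prove KVM works; also confirm that `/dev/kvm` exists on each compute VM before setting `[libvirt] virt_type` in `nova.conf`.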
If you’ve got any recommendations, links, or even personal experiences, I’d really appreciate it. I'm here to learn and avoid as many beginner mistakes as possible 😅
Thanks a lot in advance!
https://redd.it/1lol38q
GitHub action failing - Cannot read password despite clearly seeing it as GITHUBTOKEN
Hey guys,
Technical question here:
I am getting an error even though my GITHUB_TOKEN is clearly being seen. [Tested by adding 'echo "${#GITHUB_TOKEN}"'; the pound symbol outputs the length, obviously not the actual token.]
Yet I am getting 'err: fatal: could not read Password for 'https://***@github.com': ' in my GitHub Actions logs when trying to run git pull.
git pull https://${GITHUBTOKEN}@github.com/x/x.git main
Banging my head against this for the past three hours. Below is how I grab the GITHUB TOKEN.
on:
  push:
    branches: [ main ]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Deploy to server
        uses: appleboy/[email protected]
        env:
          GITHUBTOKEN: ${{ secrets.GITHUBTOKEN }}
        with:
          host: ${{ secrets.HOST }}
          username: ${{ secrets.USERNAME }}
          key: ${{ secrets.SSHPRIVATEKEY }}
          port: ${{ secrets.PORT || 22 }}
          envs: GITHUBTOKEN
          script: |
Thank you!
Mike
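The message means git found a username in the URL but no password, tried to prompt for one, and failed because CI has no terminal. A common fix is to put the token in the password position; GitHub's docs show the form `https://x-access-token:<token>@github.com/owner/repo.git` for GitHub App installation tokens, which is what the automatic `GITHUB_TOKEN` is. Whether that applies depends on what `secrets.GITHUBTOKEN` actually holds (a personal access token is a different case), so treat it as an assumption to test. It is also worth checking that one variable name is used throughout: the length test reads `GITHUB_TOKEN` while the workflow and `git pull` use `GITHUBTOKEN`. A Python rendering of the idea, with hypothetical helper names; in the bash `script:` the equivalent is the same URL shape plus `GIT_TERMINAL_PROMPT=0`:

```python
import os
import subprocess
from urllib.parse import quote


def authenticated_remote(repo: str, token: str, user: str = "x-access-token") -> str:
    """HTTPS remote with both a username and the token as password, so git never prompts."""
    return f"https://{quote(user, safe='')}:{quote(token, safe='')}@github.com/{repo}.git"


def redact(text: str, token: str) -> str:
    """Hide the token before anything is printed to CI logs."""
    return text.replace(token, "***") if token else text


def git_pull(repo: str, branch: str = "main", cwd: str = ".") -> subprocess.CompletedProcess:
    token = os.environ["GITHUBTOKEN"]  # name taken from the workflow's env/envs settings
    # GIT_TERMINAL_PROMPT=0 makes git fail immediately instead of asking for a password.
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")
    cmd = ["git", "pull", authenticated_remote(repo, token), branch]
    result = subprocess.run(cmd, cwd=cwd, env=env, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(redact(result.stderr, token))
    return result
```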
https://redd.it/1loje9q
How does your company define DevOps, SRE, and Platform Teams?
For context: I’ve been a software engineer for 20 years and got into DevOps over a decade ago. I’ve held a variety of roles since then, and one thing I’ve noticed is that every company seems to structure the “ops” side of the house differently. I’m curious how other companies approach it.
At my current company, here’s how things are set up:
* **DevOps Team**: Owns cloud infrastructure, manages our CDK setup and CI/CD pipelines, and has a grab bag of other responsibilities.
* **SRE Team**: Functions more like a traditional NOC, handling day-to-day server support and managing observability. There's some overlap with the DevOps team, and the boundaries aren't always clear.
* **Platform Team**: Software engineers focused on building internal tools to support development and QA.
I’m still relatively new here, and the structure feels a bit unusual, especially compared to the model laid out in Google’s SRE book. I’d love to hear how other companies are organizing things.
https://redd.it/1lomymp
Another team took my work to corporate leadership and now they're "leading" a global rollout while I'm cast to the shadows. I had zero knowledge of this until they failed to reverse-engineer and contacted me.
Let me start by saying I’m early career, a year into this corporate job at a "billion-dollar" multinational company. I fully understand that any work I do while employed is legally the company's intellectual property. That said, this post is more about how I can leverage my contributions for my career rather than be brushed aside.
Long story short, I single-handedly modernized a legacy system used in my region, automated several processes and deployments, migrated infra to the cloud, introduced GitOps and proper CI/CD pipelines, and implemented monitoring dashboards with Prometheus+Grafana. This overhaul gained so much traction that a team from another region asked me to build the same system for them, tailored to their needs.
Now here’s where things got interesting. Apparently, during conversations with this other region, someone higher up at the global level got access to my project and showed it to their boss, who is just one level below the CEO. I still have no idea who this person is or how they even gained access to my work. Anyway, this corporate leader was so impressed that they decided the system should be rolled out globally as soon as possible. The person who shared my project then took it upon themselves to assign a team dedicated to replicating it for all regions.
Now this assigned team somehow managed to access my project (I genuinely suspect a security breach or admin-level involvement) and tried to reverse-engineer everything I built, but failed. They then began trying to identify who was behind the project and eventually contacted my manager (the "official" project manager) by pulling him into a meeting without prior notice. Odd.
So my manager then decided to setup a proper call with this team with me involved this time. In this call, they basically came forward and requested us to provide all the code, tools, and cloud infrastructure so they can simply copy and paste it for all regions, as well as requesting several technical sessions. To make matters worse, they want me to handle all the IT bureaucratic processes for every region to get things set up (I can already see myself being roped into supporting all regions and not just my own at this point). However, I strongly believe this "replication" approach will be destined to fail as each region has different user requirements and processes not quite comparable to ours. And I also strongly believe they will struggle to get anything running, due to their limited technical and business knowledge of the processes, and the type of technical questions I was being asked.
Anyways, if this team rolls out my solution globally for each region, they’ll receive all the visibility and credit (they'll be hosting demo sessions with region leaders which for sure I wont be invited to), while I'll be essentially cast into the shadows. What’s frustrating is that I have full knowledge of the system and am responsible for it so why isn't my manager at least being the one leading this global rollout and not some random team?
I’ve been trying to indirectly nudge my manager to take ownership of the global rollout, instead of letting this new team take over. But I’m not sure how this will play out. The person who assigned this team is closer to the corporate leader, while my manager is a few steps lower in the hierarchy. So far, all he’s done is try to keep our regional manager informed of the situation playing out. Realistically, only the regional manager can mention this to the corporate leader, but I’m not confident that will happen.
My manager often says "how will this benefit the team?" But in this case, it’s clear he’s struggling to see any benefit in simply handing over our work to another team that will walk away with all the credit.
We’re still in the early stages, and I haven’t handed anything over yet. But I’m deeply
Let me start by saying I’m early in my career, a year into this corporate job at a "billion-dollar" multinational company. I fully understand that any work I do while employed is legally the company's intellectual property. That said, this post is more about how I can leverage my contributions for my career rather than get brushed aside.
Long story short, I single-handedly modernized a legacy system used in my region: I automated several processes and deployments, migrated infra to the cloud, introduced GitOps and proper CI/CD pipelines, and implemented monitoring dashboards with Prometheus+Grafana. This overhaul gained so much traction that a team from another region asked me to build the same system for them, tailored to their needs.
Now here’s where things got interesting. Apparently, while in conversations with this other region, someone higher up at the global level got access to my project and showed it to their boss who is just one level below the CEO. I still have no idea who this person is or how they even gained access to my work. Anyways, this corporate leader was so impressed that they decided the system should be rolled out globally as soon as possible. The person who shared my project then took it upon themselves to assign a team dedicated to replicating it for all regions.
Now this assigned team somehow managed to access my project (I genuinely suspect a security breach or admin-level involvement) and tried to reverse-engineer everything I built, but failed. They then tried to identify who was behind the project and eventually contacted my manager (the "official" project manager) by pulling him into a meeting without prior notice. Odd.
So my manager then decided to set up a proper call with this team, with me involved this time. In this call, they basically asked us to provide all the code, tools, and cloud infrastructure so they could simply copy and paste it for all regions, and they requested several technical sessions. To make matters worse, they want me to handle all the IT bureaucratic processes for every region to get things set up (I can already see myself being roped into supporting all regions, not just my own). However, I strongly believe this "replication" approach is destined to fail, as each region has different user requirements and processes that aren't really comparable to ours. I also strongly believe they will struggle to get anything running, given their limited technical and business knowledge of the processes, judging by the kind of technical questions they were asking me.
Anyway, if this team rolls out my solution globally for each region, they’ll receive all the visibility and credit (they'll be hosting demo sessions with region leaders, which I surely won’t be invited to), while I'll essentially be cast into the shadows. What’s frustrating is that I have full knowledge of the system and am responsible for it, so why isn't my manager at least the one leading this global rollout instead of some random team?
I’ve been trying to indirectly nudge my manager to take ownership of the global rollout, instead of letting this new team take over. But I’m not sure how this will play out. The person who assigned this team is closer to the corporate leader, while my manager is a few steps lower in the hierarchy. So far, all he’s done is try to keep our regional manager informed as the situation plays out. Realistically, only the regional manager can mention this to the corporate leader, but I’m not confident that will happen.
My manager often says "how will this benefit the team?" But in this case, it’s clear he’s struggling to see any benefit in simply handing over our work to another team that will walk away with all the credit.
We’re still in the early stages, and I haven’t handed anything over yet. But I’m deeply concerned about how this is unfolding. From a career perspective, it looks like I'm gaining nothing from this besides telling myself I did the work. This early in my career, a project like this would really benefit me tenfold, and I really don't want to waste this chance to turn it into something beneficial.
https://redd.it/1lor008
@r_devops
Built an audiobook on AI infra (NVIDIA cert prep) – Free chapters out now
Hey,
If you’ve ever had to manage GPUs, troubleshoot inference endpoints, or optimize AI workloads, this might interest you:
🎧 I’m building an audiobook series based on the NVIDIA Certified AI Infrastructure & Operations (NCA-AIIO) certification.
The first 4 chapters are free and walk through:
AI infra basics
GPU architecture
AI/ML frameworks
Networking for AI inference and training
I created it for those who prefer learning on the go.
The full version will include real-world ops, deployment patterns, performance tuning, and security.
🔗 Free chapters here
Would love feedback from anyone working with production ML or AI systems!
https://redd.it/1losjiz
@r_devops
FlashGenius
FlashGenius - AI-Powered Certification Exam Prep
Master professional certifications with AI-powered flashcards, exam simulations, and personalized learning paths.
AWS Spot Instance selection tool - looking for automation ideas
Sharing spotinfo - a CLI that simplifies spot instance selection for automation workflows.
**What it provides**:
* Query spot prices and interruption rates
* Single Go binary, no dependencies
* Works offline (embedded data)
* JSON/CSV output for scripting
* AI assistant integration via MCP
**Current automation patterns**:
1. **Dynamic selection**:
```bash
INSTANCE=$(spotinfo --cpu=4 --memory=16 --sort=price --output=text | head -1)
terraform apply -var="instance_type=$INSTANCE"
```
2. **Region optimization**:
```bash
spotinfo --type="m5.large" --region=all --output=csv | \
awk -F',' '$5 < 10 {print $1, $6}' | sort -k2 -n
```
3. **Fleet configuration**:
```bash
spotinfo --region=us-east-1 --output=json | \
jq '[.[] | select(.Range.max < 20)]' > spot-fleet.json
```
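One gap in the dynamic-selection pattern: if the query matches nothing (or spotinfo prints nothing), `head -1` yields an empty string and `terraform apply` runs with an empty `instance_type`. Below is a minimal sketch of a guard, assuming (not verified against spotinfo's docs) that `--output=text` prints one candidate per line. `pick_instance` and the fallback type are hypothetical names for illustration, not part of spotinfo.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of spotinfo): guard against an empty selection.
# Assumes candidates arrive on stdin one per line, e.g. from
# `spotinfo ... --output=text`; the exact text format is an assumption.

# pick_instance FALLBACK
#   Prints the first non-blank stdin line (leading/trailing spaces and tabs trimmed),
#   or FALLBACK when stdin contains no non-blank line.
pick_instance() {
  local fallback="$1"
  local first
  # NF skips blank lines; gsub trims surrounding whitespace; exit stops after the first match.
  first=$(awk 'NF { gsub(/^[ \t]+|[ \t]+$/, ""); print; exit }')
  if [ -n "$first" ]; then
    printf '%s\n' "$first"
  else
    printf '%s\n' "$fallback"
  fi
}
```

To use it, swap `head -1` for `pick_instance <fallback-type>` inside the `INSTANCE=$(...)` line. Falling back to a fixed type keeps the pipeline moving; if you would rather fail loudly, exit non-zero in the empty branch instead.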
Also works with Claude Desktop/Cursor for team members who prefer natural language queries.
GitHub: [https://github.com/alexei-led/spotinfo](https://github.com/alexei-led/spotinfo)
(Stars help me understand usage patterns)
What spot instance automation patterns are you using? Which features would make your workflows smoother?
https://redd.it/1lou2pe
@r_devops
GitHub
GitHub - alexei-led/spotinfo: CLI for exploring AWS EC2 Spot inventory. Inspect AWS Spot instance types, saving, price, and interruption…
CLI for exploring AWS EC2 Spot inventory. Inspect AWS Spot instance types, saving, price, and interruption frequency. - alexei-led/spotinfo
Tried doing ASPM in-house. Gave up after 3 sprints
We’re a mid-size SaaS shop running IaC + containers + CI/CD on GitHub Actions. Thought we could build a lightweight ASPM framework with OSS + some repo scanning.
Reality: maintaining policy-as-code at scale + tracking exposures across services + correlating to runtime risk was hell. Half the alerts were noisy; the rest got buried in Jira.
We’re now testing out a commercial CNAPP with ASPM baked in. Wondering if others went this route or made internal ASPM stick?
https://redd.it/1louxim
@r_devops