DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN (Roskomnadzor) registration: https://knd.gov.ru/license?id=67704b536aa9672b963777b3&registryType=bloggersPermission
Lessons from a Rollback Gameday

Insights and best practices from a real-world rollback gameday


https://medium.com/expedia-group-tech/lessons-from-a-rollback-gameday-4d05cf1c9524
Graceful External Termination: Handling Pod Deletions in Kubernetes Data Ingestion and Streaming Jobs

https://medium.com/ibm-data-ai/graceful-external-termination-handling-pod-deletions-in-kubernetes-data-ingestion-and-streaming-df1b2cd8d727
Securing Kubernetes API Server Health Checks Without Anonymous Access

https://dev.to/azalio/securing-kubernetes-api-server-health-checks-without-anonymous-access-31f9
Upgrading Stateful Kubernetes Clusters with near-zero downtime

At Freshworks, we regularly perform blue-green migrations to upgrade our EKS clusters and implement Redis-related changes with minimal disruption. In this article, we’ll walk through how we migrate approximately 900 Redis endpoints — spanning one staging region and five production regions, each with 4–5 EKS clusters — while ensuring high availability for our stateful Redis workloads.

Our mission was clear: complete the migration with minimal disruption to our services while ensuring data consistency. Here’s how we tackled this complex engineering challenge and achieved near-zero downtime migrations at scale.


https://medium.com/freshworks-engineering-blog/fast-k8s-upgrades-9cb60be7f93e
Cloud-Native Secret Management: OIDC in K8s Explained

External Secrets is the de-facto choice for secrets management in Kubernetes clusters. It simplifies the task of the administrator(s) of the cluster, ensuring only the secrets that are explicitly defined are present and accessible.

It comes with many great features, but the most important of all is its integration with major cloud providers.

In this blog post you will learn how to deploy it without hard-coded credentials and using only the power of OpenID Connect for trust relationship between services.


https://developer-friendly.blog/blog/2025/03/24/cloud-native-secret-management-oidc-in-k8s-explained/
Speeding Up My ZSH Shell

Super quick one I want to document here! I got myself on a side quest, again! No biggie, my ZSH shell was taking ages to load. When I say ages, more like 5+ seconds every time I opened a new terminal, that sort of thing can add up. This is just something I’ve lived with over the years, nothing has prompted this other than me wondering why it’s slow, then searching for how to profile it.


https://scottspence.com/posts/speeding-up-my-zsh-shell
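
Before profiling, it helps to have a baseline number for shell startup. The sketch below is not taken from the linked post; it assumes `zsh` is on the PATH and times `zsh -i -c exit`, which starts an interactive shell (so `.zshrc` is loaded) and exits immediately. The helper name `time_command` is hypothetical.

```python
import shutil
import statistics
import subprocess
import time
from typing import List, Sequence


def time_command(cmd: Sequence[str], runs: int = 5) -> List[float]:
    """Run cmd `runs` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    if shutil.which("zsh") is None:
        print("zsh not found on PATH; nothing to measure")
    else:
        # Assumption: `-i` forces an interactive shell so the usual startup files run.
        samples = time_command(["zsh", "-i", "-c", "exit"])
        print(f"median startup: {statistics.median(samples):.2f}s over {len(samples)} runs")
```

Running it before and after a change to `.zshrc` gives a rough measure of whether the change helped. The median is used because the first run is often slower than the rest.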
ChatOps fatigue: how to create alerts that matter

In today's workplace, communication tools like Slack or Microsoft Teams are essential for staying connected at work. However, as orchestration and automation needs increase, so does the volume of notifications flooding these channels. What’s meant to streamline work can quickly become overwhelming. We call it "ChatOps fatigue" - when teams get so many alerts, they start tuning them out.


https://www.tines.com/blog/chatops-fatigue-how-to-create-alerts-that-matter
YAML templating was a mistake

Modern Kubernetes deployment methodologies have grown increasingly complex, layering abstraction upon abstraction in pursuit of flexibility. This article challenges that trajectory by examining how fundamental Unix tools combined with Makefiles can provide a more transparent and maintainable alternative to popular solutions like Helm and Kustomize.


https://dev.to/avkr/replace-helm-with-kiss-456a
Defining and Implementing Effective SLOs and SLIs for ArgoCD

https://kuqja424671.substack.com/p/defining-and-implementing-effective
From Docker Compose to Kubernetes: Migrating Spring Boot & Kafka microservices

https://medium.com/@devripper133127/migration-of-an-event-driven-architecture-to-kubernetes-b62691c5a858
From Laptop to Hybrid Cloud: Building a Modern and Frugal Kubernetes Network with Cilium ClusterMesh

https://medium.com/@shih.chieh.cheng/from-laptop-to-hybrid-cloud-building-a-modern-and-frugal-kubernetes-network-with-cilium-67559d404eca
freelens

Freelens is a free and open-source user interface designed for managing Kubernetes clusters. It provides a standalone application compatible with macOS, Windows, and Linux operating systems, making it accessible to a wide range of users. The application aims to simplify the complexities of Kubernetes management by offering an intuitive and user-friendly interface.


https://github.com/freelensapp/freelens
kubetail

Kubetail is a general-purpose logging dashboard for Kubernetes, optimized for tailing logs across multi-container workloads in real-time. With Kubetail, you can view logs from all the containers in a workload (e.g. Deployment or DaemonSet) merged into a single, chronological timeline, delivered to your browser or terminal.


https://github.com/kubetail-org/kubetail
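
The idea of merging several containers' logs into one chronological timeline can be illustrated with a k-way merge. This is a minimal sketch and not Kubetail's actual implementation; it assumes each container's stream is already sorted by timestamp, and the `LogLine` type and `merge_logs` function are hypothetical names.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class LogLine(NamedTuple):
    timestamp: datetime
    container: str
    message: str


def merge_logs(*streams: Iterable[LogLine]) -> Iterator[LogLine]:
    """Lazily merge per-container streams, each sorted by time, into one timeline."""
    return heapq.merge(*streams, key=lambda line: line.timestamp)


if __name__ == "__main__":
    web = [
        LogLine(datetime(2025, 1, 1, 12, 0, 0), "web", "request received"),
        LogLine(datetime(2025, 1, 1, 12, 0, 2), "web", "response sent"),
    ]
    sidecar = [
        LogLine(datetime(2025, 1, 1, 12, 0, 1), "sidecar", "proxying upstream"),
    ]
    for line in merge_logs(web, sidecar):
        print(f"{line.timestamp.isoformat()} [{line.container}] {line.message}")
```

Because `heapq.merge` is lazy, the same approach works for streams that are still being tailed, as long as each individual stream stays in timestamp order.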
nelm

Nelm is a Helm 3 alternative. It is a Kubernetes deployment tool that manages Helm Charts and deploys them to Kubernetes. It is also the deployment engine of werf. Nelm can do (almost) everything that Helm does, but better, and adds quite a lot on top.


https://github.com/werf/nelm
Lessons from scaling PostgreSQL queues to 100k events per second

At RudderStack, we decided to use PostgreSQL as our main streaming engine and queuing system instead of specialized tools like Apache Kafka. We picked PostgreSQL because it's flexible, reliable for transactions, and easier to debug. If you are curious to learn more about that decision, read the previous post about the rationale behind why we chose Postgres over Apache Kafka and the initial architectural patterns we employed. Over the past six years, this system has proven reliable and has scaled to handle 100,000 events per second—but only after we successfully navigated challenges like table bloat, query performance degradation, index bottlenecks, and retry storms.

This post is a chronicle of the critical, hard-won lessons learned while maturing PostgreSQL into a highly performant and resilient queuing system.


https://www.rudderstack.com/blog/scaling-postgres-queue
How we discovered, and recovered from, Postgres corruption on the matrix.org homeserver

https://matrix.org/blog/2025/07/postgres-corruption-postmortem
How well do you know your "penguins"? 🐧

Intern or seasoned sysadmin, test your Linux skills in a quiz from Selectel

Can you easily tell Ubuntu from Fedora, are you careful with terminal commands, and do you know chmod syntax? Then you will breeze through the quiz. And if you find gaps in your knowledge, read the Selectel Academy materials or take their free system administration course.

Advertisement. Selectel JSC, INN 7810962785, ERID: 2VtzqvvfrCG
marchat

Terminal-based group chat app with real-time WebSocket messaging, file sharing, themes, and admin tools — built with Go and Bubble Tea.


https://github.com/Cod-e-Codes/marchat