DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

Roskomnadzor (RKN) registration: https://knd.gov.ru/license?id=67704b536aa9672b963777b3&registryType=bloggersPermission
s3grep

s3grep is a parallel CLI tool for searching logs and unstructured content in Amazon S3 buckets. It supports .gz decompression, progress bars, and robust error handling, making it well suited to cloud-native log analysis.
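s3grep's own source is not reproduced here. As a hedged illustration of the core idea (fan out over many objects, gunzip `.gz` keys, match lines, and keep going when one object is broken), the sketch below works on an in-memory `dict` of key to bytes standing in for S3 objects. Fetching the objects from S3 is left out, and `grep_objects` / `search_object` are hypothetical names, not s3grep's API.

```python
import gzip
import re
from concurrent.futures import ThreadPoolExecutor

Match = tuple[str, int, str]  # (object key, 1-based line number, line text)


def search_object(key: str, body: bytes, pattern: re.Pattern) -> list[Match]:
    """Return every line of one object that matches the pattern."""
    # Transparently decompress objects whose key ends in .gz.
    if key.endswith(".gz"):
        body = gzip.decompress(body)
    text = body.decode("utf-8", errors="replace")
    return [(key, n, line) for n, line in enumerate(text.splitlines(), start=1)
            if pattern.search(line)]


def grep_objects(objects: dict[str, bytes], regex: str,
                 workers: int = 8) -> tuple[list[Match], dict[str, str]]:
    """Search objects in parallel; per-object failures are collected, not fatal."""
    pattern = re.compile(regex)
    results: list[Match] = []
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(search_object, k, b, pattern): k for k, b in objects.items()}
        for future, key in futures.items():
            try:
                results.extend(future.result())
            except (OSError, EOFError) as exc:  # e.g. corrupt or truncated .gz data
                errors[key] = str(exc)
    return sorted(results), errors
```

Collecting errors per key, rather than aborting the whole run, mirrors the "robust error handling" the blurb mentions; in a real tool the worker would also download each object before searching it.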


https://github.com/dacort/s3grep
Debugging the One-in-a-Million Failure: Migrating Pinterest’s Search Infrastructure to Kubernetes

While migrating Pinterest’s search infrastructure (which powers core experiences for millions of users monthly) to Kubernetes, we hit a problem in the new environment: one in every million search requests took 100x longer than usual.

This post chronicles our investigation, uncovering an elusive interaction between our memory-intensive search system and a seemingly innocent monitoring process. The journey involves profiling search systems, debugging performance issues, Linux kernel features, and memory management.


https://medium.com/pinterest-engineering/debugging-the-one-in-a-million-failure-migrating-pinterests-search-infrastructure-to-kubernetes-bef9af9dabf4
sentinel

Multi-protocol service monitoring system with real-time alerts and web dashboard. Supports HTTP/HTTPS, TCP and gRPC monitoring with Telegram notifications.
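The post does not describe sentinel's internals. The following is a minimal sketch, assuming a TCP check simply means "a connection handshake completes within a timeout", paired with a transition-based alert rule of the kind that keeps notifications (Telegram or otherwise) from firing on every failed probe. `check_tcp`, `should_alert`, and `CheckResult` are hypothetical names.

```python
import socket
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    target: str
    up: bool
    latency_ms: float
    error: str = ""


def check_tcp(host: str, port: int, timeout: float = 2.0) -> CheckResult:
    """The target is 'up' if a TCP connection can be opened within the timeout."""
    target = f"{host}:{port}"
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # handshake succeeded; close immediately
    except OSError as exc:  # refused, unreachable, timed out, DNS failure
        return CheckResult(target, False, (time.monotonic() - start) * 1000, str(exc))
    return CheckResult(target, True, (time.monotonic() - start) * 1000)


def should_alert(previous_up: Optional[bool], result: CheckResult) -> bool:
    """Alert only on state changes (up to down or down to up), not on every probe."""
    return previous_up is not None and previous_up != result.up
```

HTTP and gRPC checks would follow the same shape, with a protocol-specific success condition in place of the bare handshake.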


https://github.com/sxwebdev/sentinel
Overcoming the downsides of mutating webhooks: Our journey to an alternative

UiPath Automation Suite has many services that communicate using FQDN (Fully Qualified Domain Name). Because the suite runs on our customers' premises, customers are free to choose their own FQDN. Often, the certificate for their chosen FQDN is not signed by a known authority. To communicate securely over HTTPS, every service must trust that FQDN’s certificate. However, these services are owned by multiple teams. Asking each team to handle this individually is cumbersome and makes managing future certificate trust requests more challenging.


https://engineering.uipath.com/overcoming-the-downsides-of-mutating-webhooks-our-journey-to-an-alternative-5b0fbea83c59
Submariner Lighthouse: Multi-Cluster Service Discovery for Kubernetes

https://dev.to/reoring/submariner-lighthouse-multi-cluster-service-discovery-for-kubernetes-4fj7
HAMi

HAMi, formerly known as 'k8s-vGPU-scheduler', is heterogeneous device management middleware for Kubernetes. It can manage different types of heterogeneous devices (such as GPUs and NPUs), share them among pods, and make better scheduling decisions based on device topology and scheduling policies.


https://github.com/Project-HAMi/HAMi
From Linux Primitives to Kubernetes Security Contexts

In Kubernetes, containers typically start with root privileges.

This happens because container processes run as UID 0 unless the image or the Pod spec sets a different user.

Kubernetes does not impose a non-root policy; it inherits whatever the image defines.

This isn't a bug; it's a design choice carried over from Docker.

While convenient during development, it introduces unnecessary risk in production environments.

If an attacker compromises the container, root access increases the likelihood of privilege escalation to the host.

The Kubernetes API offers several ways to restrict container privileges using the Security Context.

With it, you can control the user a container runs as, manage Linux capabilities, enforce read-only filesystems, and block privilege escalation.

However, despite their importance, Security Contexts are often misunderstood or misapplied.

Many teams discover these controls only after a security audit or scanner flags a running container.

The usual response is to patch the config reactively, suppress the warning, and move on.

Before we get into Kubernetes Security Contexts, we need to understand what they're actually configuring under the hood.
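To make the settings above concrete, here is a small, hypothetical linter (not from the article) that walks a Pod spec parsed into a Python dict and flags the gaps the post describes. The field names (`runAsNonRoot`, `runAsUser`, `allowPrivilegeEscalation`, `readOnlyRootFilesystem`, `capabilities.drop`) are real Kubernetes API fields; the rules, messages, and the `audit_security_context` name are illustrative assumptions.

```python
from typing import Any


def audit_security_context(pod_spec: dict[str, Any]) -> list[str]:
    """Return findings for containers that may run as root or keep extra privileges."""
    findings = []
    pod_ctx = pod_spec.get("securityContext", {})
    for c in pod_spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        ctx = c.get("securityContext", {})
        # runAsUser / runAsNonRoot can be set per Pod; container values take precedence.
        run_as_user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot"))
        # Unset user and no runAsNonRoot means whatever the image defines, often UID 0.
        if run_as_user == 0 or (run_as_user is None and not run_as_non_root):
            findings.append(f"{name}: may run as UID 0")
        # allowPrivilegeEscalation effectively defaults to true when unset.
        if ctx.get("allowPrivilegeEscalation", True):
            findings.append(f"{name}: allowPrivilegeEscalation not set to false")
        if not ctx.get("readOnlyRootFilesystem", False):
            findings.append(f"{name}: root filesystem is writable")
        if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
            findings.append(f"{name}: capabilities not dropped (drop: [ALL])")
    return findings
```

A spec with `runAsNonRoot: true` at the Pod level and the three container-level settings hardened produces no findings, while a bare `{"containers": [{"name": "app"}]}` produces all four.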


https://learnkube.com/security-contexts
When PostgreSQL performance slows down, here is where to look first

https://stormatics.tech/blogs/when-postgresql-performance-slows-down-here-is-where-to-look-first
Terraformer: Reverse Engineering Infrastructure as Code

As infrastructure as code (IaC) becomes a foundational pillar of modern cloud-native and DevOps practices, tools that bridge the gap between existing infrastructure and code are increasingly valuable. One such powerful utility is Terraformer, an open-source tool developed by Google that helps users generate Terraform configurations from existing infrastructure resources. This article thoroughly explores Terraformer, including its architecture, use cases, benefits, challenges, and practical examples.
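Terraformer's real pipeline (provider-specific importers that also produce Terraform state) is more involved. A stripped-down, hypothetical sketch of the final step, turning attributes read from a live resource into a `resource` block, could look like this; `render_resource` and the sample attributes are assumptions, and the output is not meant to match Terraformer's formatting.

```python
import json
from typing import Any


def to_hcl_value(value: Any) -> str:
    """Render a Python value as an HCL literal."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        # JSON string escaping is close to HCL's; "${" sequences would still need "$${".
        return json.dumps(value)
    if isinstance(value, list):
        return "[" + ", ".join(to_hcl_value(v) for v in value) + "]"
    if isinstance(value, dict):
        items = ", ".join(f"{json.dumps(k)} = {to_hcl_value(v)}" for k, v in value.items())
        return "{ " + items + " }"
    raise TypeError(f"unsupported value: {value!r}")


def render_resource(rtype: str, name: str, attrs: dict[str, Any]) -> str:
    """Emit one Terraform resource block from attributes of an existing resource."""
    lines = [f'resource "{rtype}" "{name}" {{']
    lines += [f"  {k} = {to_hcl_value(v)}" for k, v in sorted(attrs.items())]
    lines.append("}")
    return "\n".join(lines)
```

For example, `render_resource("aws_s3_bucket", "logs", {"bucket": "my-logs", "force_destroy": False, "tags": {"Team": "sre"}})` yields a four-attribute-line block with `bucket`, `force_destroy`, and `tags` in sorted order.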


https://blog.stackademic.com/terraformer-reverse-engineering-infrastructure-as-code-a4542ab44ba9
Writing an internal Terraform provider from A to Z

We recently wrote a Terraform provider for an internal API at Typeform. This allowed us to manage mutable runtime data stored in an API through source files, with good change control, and a nice developer experience. Some of the steps were a little tricky, or required us to trawl through documentation, and I thought to myself: “I hope this is easier next time we do it!”
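Real providers are written in Go against HashiCorp's plugin SDKs, and the post's code is not shown here. As a language-neutral, hedged illustration of what a provider enables: Terraform compares the desired configuration with the state it reads back through the provider, then calls the provider's Create, Update, or Delete for each difference. The `plan` function and the sample item names below are hypothetical and ignore details such as forced replacement.

```python
from typing import Any


def plan(desired: dict[str, dict[str, Any]],
         actual: dict[str, dict[str, Any]]) -> list[tuple[str, str]]:
    """Diff desired resources (from config) against actual ones (read from the API).

    Each action corresponds to one provider CRUD call: create, update, or delete.
    """
    actions = []
    for key in sorted(desired.keys() | actual.keys()):
        if key not in actual:
            actions.append(("create", key))
        elif key not in desired:
            actions.append(("delete", key))
        elif desired[key] != actual[key]:
            actions.append(("update", key))
    return actions
```

Keeping the runtime data in source files and letting a diff like this drive the API calls is what gives the change control the post describes.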


https://medium.com/typeforms-engineering-blog/writing-an-internal-terraform-provider-from-a-to-z-c5704a5f584b
Modern Kubernetes: Can we replace Helm?

For a long time, Kubernetes resource management has been synonymous with Helm.

There have been plenty of attempts to replace Helm and its templating miasma known as Charts. But those attempts never seem to stick, sometimes because they’re not different enough, or more often because the size and mass of the Helm ecosystem creates an inertia that’s hard to overcome.

This post explores how Yoke is trying to do the impossible: introducing Flights, a complete alternative to Helm Charts, while bringing Helm along for the ride.


https://yokecd.github.io/blog/posts/helm-compatibility
Hot-Patching Pods in Kubernetes 1.33: What Breaks, What Works, and How We’re Making It Usable

https://www.cloudbolt.io/blog/hot-patching-pods-in-kubernetes-1-33
kubernetes-controller-sharding

Makes Kubernetes controllers horizontally scalable by distributing reconciliation of API objects across multiple controller instances. Removes the limitation of having only a single active replica (leader) per controller.
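One common way to split objects among controller replicas is consistent hashing, which keeps most assignments stable when a replica joins or leaves. The sketch below is an independent, hypothetical illustration of that technique, not the project's code; the object key format (`namespace/name`), the shard names, and the `HashRing` class are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping object keys to controller shards (replicas)."""

    def __init__(self, shards: list[str], vnodes: int = 100) -> None:
        # Each shard gets `vnodes` points on the ring to even out the distribution.
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard) for shard in shards for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(s: str) -> int:
        return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

    def shard_for(self, key: str) -> str:
        """Return the owner of the first ring point after the key's hash (wrapping)."""
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]
```

When a shard is removed, only the keys it owned move to other shards; every other object keeps its owner, so replicas do not all have to resync.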

https://github.com/timebertt/kubernetes-controller-sharding
Kwatcher

Kwatcher is a Kubernetes operator that:

1. Automatically creates a ConfigMap from data fetched from an external URL, using credentials stored in a Secret,
2. Periodically polls the URL (based on refreshInterval),
3. Updates the ConfigMap when the data changes,
4. And automatically triggers pod redeployment via annotations in the related Deployments.
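Steps 3 and 4 above can be sketched as a single reconcile function, assuming the common pattern of writing a checksum of the config into a pod-template annotation so that the Deployment rolls out new pods. Kwatcher's actual annotation key and code may differ; `CHECKSUM_ANNOTATION`, `checksum`, and `reconcile` are hypothetical, objects are plain dicts rather than API calls, and the polling loop (`refreshInterval`) is left out.

```python
import hashlib
import json
from typing import Any

# Hypothetical annotation key; Kwatcher's real annotation name may differ.
CHECKSUM_ANNOTATION = "example.com/config-checksum"


def checksum(data: dict[str, str]) -> str:
    """Stable SHA-256 over ConfigMap data; keys are sorted so ordering doesn't matter."""
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()


def reconcile(fetched: dict[str, str], configmap: dict[str, Any],
              deployment: dict[str, Any]) -> bool:
    """Update the ConfigMap and Deployment in place if the fetched data changed.

    Returns True when a rollout would be triggered.
    """
    if configmap.get("data") == fetched:
        return False  # nothing changed since the last poll
    configmap["data"] = dict(fetched)
    meta = deployment.setdefault("spec", {}).setdefault("template", {}).setdefault("metadata", {})
    # Changing a pod-template annotation changes the template, so the
    # Deployment controller performs a rolling update.
    meta.setdefault("annotations", {})[CHECKSUM_ANNOTATION] = checksum(fetched)
    return True
```

Using a content hash rather than a timestamp means an unchanged payload never restarts pods, even if the URL is polled frequently.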


https://github.com/Berg-it/Kwatcher