DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN (Roskomnadzor) registration: https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
YamlQL

Query YAML files with SQL. Transform any YAML structure into a queryable database instantly.


https://github.com/AKSarav/YamlQL
kube-composer

A modern, intuitive Kubernetes YAML generator that simplifies deployment configuration for developers and DevOps teams.


https://github.com/same7ammar/kube-composer
What Is OTLP and Why It's the Future of Observability

You're probably reading this because you don't want to sink time or money into proprietary protocols and agents anymore. Why would you? They tie you to a single vendor, force you to adapt to their quirks, and make it painful to change direction later.

What you really need is an open, consistent way to instrument, collect, and move your telemetry without worrying about compatibility or lock-in. That's exactly what OpenTelemetry (OTel) gives you. And at the center of it all is the OpenTelemetry Protocol (OTLP), the common language that makes your services, collectors, and backends speak fluently with each other.

This guide will walk you through OTLP in detail: what it is, why it matters, and how to use it in real pipelines. By the end, you'll see how embracing OTLP and pairing it with an OTel-native backend helps you solve the challenges of modern observability while keeping your stack open, reliable, and free of lock-in.


https://www.dash0.com/knowledge/opentelemetry-protocol-otlp
What are metrics in OpenTelemetry: A Complete Guide

A comprehensive guide to understanding metrics in OpenTelemetry. What they are, how they work, and how to implement them effectively with practical code examples.


https://oneuptime.com/blog/post/2025-08-26-what-are-metrics-in-opentelemetry/view
Cloudreve

Self-hosted file management system with multi-cloud support.


https://github.com/cloudreve/Cloudreve
Building a Unified OpenTelemetry Pipeline in Kubernetes

https://fatihkoc.net/posts/opentelemetry-kubernetes-pipeline
velld

A self-hosted database backup management tool. Schedule automated backups, monitor status, and manage multiple databases from one place.


https://github.com/dendianugerah/velld
PrivateCaptcha

Private Captcha is an independent, privacy-first, self-hostable Proof-of-Work CAPTCHA service made in EU.


https://github.com/PrivateCaptcha/PrivateCaptcha
flint

A single <11MB binary with a modern Web UI, CLI, and API for KVM.
No XML. No bloat. Just VMs.


https://github.com/volantvm/flint
cluster-bare-autoscaler

Cluster Bare Autoscaler (CBA) automatically adjusts the size of a bare-metal Kubernetes cluster by powering nodes off or on based on real-time resource usage, while safely cordoning and draining nodes before shutdown.


https://github.com/docent-net/cluster-bare-autoscaler
volcano-vgpu-device-plugin

The Volcano vGPU device plugin provides a device-sharing mechanism for NVIDIA devices managed by Volcano.


https://github.com/Project-HAMi/volcano-vgpu-device-plugin
KAI-Scheduler

KAI Scheduler is a robust, efficient, and scalable Kubernetes scheduler that optimizes GPU resource allocation for AI and machine learning workloads.


https://github.com/NVIDIA/KAI-Scheduler
kubezonnet

Monitor cross-zone network traffic in Kubernetes.


https://github.com/polarsignals/kubezonnet
k3k

K3k, Kubernetes in Kubernetes, is a tool that empowers you to create and manage isolated K3s clusters within your existing Kubernetes environment.


https://github.com/rancher/k3k
ramalama

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.


https://github.com/containers/ramalama
hwameistor

HwameiStor is an HA local storage system for cloud-native stateful workloads. It creates a local storage resource pool for centrally managing all disks such as HDD, SSD, and NVMe. It uses the CSI architecture to provide distributed services with local volumes and provides data persistence capabilities for stateful cloud-native workloads or components.


https://github.com/hwameistor/hwameistor
Why Environments Beat Clusters For Dev Experience

The cloud ecosystem has reached a turning point. Tools for operators/administrators are now mature and can handle most day-to-day operations that deal with Kubernetes clusters. Finally, we can turn our focus to application developers and their needs.

If you look at all the Kubernetes tools available, you’ll understand that most of them treat Kubernetes as another form of infrastructure. You can easily find tools that install Kubernetes, monitor Kubernetes, secure Kubernetes, do cost estimations for Kubernetes, etc. But how many Kubernetes tools can you find that target application developers and their day-to-day responsibilities?

Several companies even try to hide Kubernetes completely from developers by using leaky abstractions or so-called developer portals. These adoption efforts almost always fail simply because nobody asked the developers what they really need. Don’t fall into this trap.

In this article, we see some common examples of what companies “think” about developers’ needs versus what developers need in practice, in the context of application development for Kubernetes.


https://medium.com/containers-101/why-environments-beat-clusters-for-dev-experience-f6eef0cd928b
Terraform state locking explained (and why it hurts at scale)

Terraform state locking is a textbook example of solving a distributed coordination problem with the wrong primitive. You have concurrent actors, partial modifications, and dependency graphs—and the solution is a global mutex on a JSON blob. The scaling characteristics are exactly what you'd predict from this mismatch.


https://stategraph.dev/blog/terraform-state-locking-explained
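
As a rough illustration of the "global mutex on a JSON blob" idea from the state locking article above, here is a minimal Python sketch. ToyStateBackend, StateLockedError, and the actor and resource names are hypothetical and are not Terraform's backend API; the sketch only shows that a single lock covers the whole state, so two unrelated changes still have to run one after the other.

```python
import json
import threading


class StateLockedError(Exception):
    """Raised when an actor touches the state without holding the lock."""


class ToyStateBackend:
    """Hypothetical stand-in for a state backend: one JSON blob, one lock."""

    def __init__(self):
        self._blob = json.dumps({"serial": 0, "resources": {}})
        self._holder = None              # name of the actor holding the lock
        self._guard = threading.Lock()   # protects _holder and _blob

    def lock(self, actor):
        # The lock is global: it does not matter which resources the actor
        # intends to change, only whether anyone else holds the lock.
        with self._guard:
            if self._holder is not None:
                raise StateLockedError(f"state is locked by {self._holder}")
            self._holder = actor

    def unlock(self, actor):
        with self._guard:
            if self._holder == actor:
                self._holder = None

    def write(self, actor, resource, value):
        with self._guard:
            if self._holder != actor:
                raise StateLockedError(f"{actor} does not hold the lock")
            # Every write rewrites the whole blob and bumps the serial.
            state = json.loads(self._blob)
            state["resources"][resource] = value
            state["serial"] += 1
            self._blob = json.dumps(state)

    def read(self):
        return json.loads(self._blob)


backend = ToyStateBackend()

# The network team starts a change and takes the global lock.
backend.lock("team-network")

# The database team wants to change an unrelated resource, but is rejected.
try:
    backend.lock("team-database")
    blocked = False
except StateLockedError:
    blocked = True

backend.write("team-network", "vpc", "10.0.0.0/16")
backend.unlock("team-network")

# Only now can the second, unrelated change proceed.
backend.lock("team-database")
backend.write("team-database", "db", "postgres")
backend.unlock("team-database")

final_state = backend.read()
```

The two changes touch different resources, yet the second actor is blocked until the first releases the lock. That serialization is the scaling cost the article describes.
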
How to write and rightsize Terraform modules

There are four key areas to consider when deciding on best practices for designing Terraform modules: scope, code strategy, security, and testing.


https://hashicorp.com/en/blog/how-to-write-and-rightsize-terraform-modules
pogocache

Fast caching software built from scratch with a focus on low latency and CPU efficiency.


https://github.com/pogocache/pogocache