DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN: https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
Different Ways to Aggregate Nines

While working on SLOs, SLAs and SLIs, I have found that there are only so many ways to aggregate service metrics. I have not yet found a resource that reviews the different aggregation methods and their relative strengths and weaknesses.
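
To illustrate why the choice of aggregation method matters, here is a minimal sketch that is not taken from the linked article. It assumes each service reports a hypothetical `(good_requests, total_requests)` pair and compares two common aggregations, an unweighted mean and a request-weighted ratio. The function names are illustrative only.

```python
import math


def nines(availability: float) -> float:
    """Convert an availability ratio (e.g. 0.999) into a count of nines."""
    if availability >= 1.0:
        return math.inf
    return -math.log10(1.0 - availability)


def mean_availability(services):
    """Unweighted mean: every service counts equally, regardless of traffic."""
    ratios = [good / total for good, total in services]
    return sum(ratios) / len(ratios)


def request_weighted_availability(services):
    """Request-weighted: every request counts equally, so busy services dominate."""
    good = sum(g for g, _ in services)
    total = sum(t for _, t in services)
    return good / total


# Hypothetical data: (good_requests, total_requests) per service.
services = [(999, 1000), (90, 100)]

mean = mean_availability(services)
weighted = request_weighted_availability(services)
print(f"unweighted mean:  {mean:.4f} ({nines(mean):.2f} nines)")
print(f"request-weighted: {weighted:.4f} ({nines(weighted):.2f} nines)")
```

With this sample data the two methods disagree noticeably, because the low-traffic service drags down the unweighted mean far more than it affects the request-weighted figure.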


https://hross.substack.com/p/different-ways-to-aggregate-nines
Distributed Tracing: A Whistle Stop Tour

Know enough to be dangerous in 10 minutes


https://metoro.io/blog/distributed-tracing-whistle-stop-tour
spqr

SPQR is a production-ready system for horizontal scaling of PostgreSQL via sharding. We appreciate any kind of feedback and contribution to the project.


https://github.com/pg-sharding/spqr
Grafana Loki: Optimising log based metrics

There are multiple layers where the performance of Loki can be improved and fine-tuned. From optimising the query, channeling it efficiently for processing, to allocating the right computational resources, we will cover the following parameters that make a significant improvement to the performance.


https://dev.to/siddharthjain1715/grafana-loki-optimising-log-based-metrics-5edb
Is GitOps actually useful?

GitOps doesn’t solve all deployment problems or even cover the entire deployment process, but it’s a solid foundational building block.


https://medium.com/@briankgrant/is-gitops-actually-useful-a1c851ba99d8
rotz

Fully cross platform dotfile manager and dev environment bootstrapper written in Rust.


https://github.com/volllly/rotz
Moving fast breaks things: the importance of a staging environment

https://graphite.dev/blog/staging-environment
Terragrunt Reference Architecture

This repository embodies a structured approach to organizing Terraform code with Terragrunt, focusing on reusability, ease of management, and scalability across multiple environments and cloud providers. It's crafted to guide teams in building robust cloud infrastructure that adheres to best practices and principles.


https://github.com/Excoriate/terragrunt-ref-arch
oneuptime

OneUptime is a comprehensive solution for monitoring and managing your online services. Whether you need to check the availability of your website, dashboard, API, or any other online resource, OneUptime can alert your team when downtime happens and keep your customers informed with a status page. OneUptime also helps you handle incidents, set up on-call rotations, run tests, secure your services, analyze logs, track performance, and debug errors.


https://github.com/oneuptime/oneuptime
Understanding Kubernetes emptyDir — With 3 Practical Use-cases

Learn how to effectively implement emptyDir memory for pods, with hands-on use cases for temporary data handling in Kubernetes.


https://decisivedevops.com/understanding-kubernetes-emptydir-with-3-practical-use-cases-960f550e0e34
Mastering Kubernetes: Journey with Cluster API

Let’s talk about how at Hepsiburada, we efficiently manage hundreds of Kubernetes clusters that directly handle about 95% of our over 100 million monthly visitor traffic. We’ll delve into the complexities of managing multiple clusters and discuss the strategies we employ to tackle these challenges.


https://medium.com/hepsiburadatech/mastering-kubernetes-journey-with-cluster-api-2fb779ee7177
Horizontal Autoscaling in Kubernetes

In this article I will write about horizontal autoscaling in Kubernetes. The intended audience is software developers and DevOps/SRE engineers who have at least an elementary background in Kubernetes and are interested in learning about autoscaling. When I was learning this topic, I didn’t find a single straightforward article that explains all the relevant concepts, so I took up the challenge and wrote one myself.


https://medium.com/@aharon.haravon/horizontal-autoscaling-in-kubernetes-b9ef7a9f067a
Testing Service Mesh Performance in Multi-Cluster Scenario: Istio vs Kuma vs NSM

This article may be useful for those who are familiar with service meshes and are trying to improve scalability and connectivity between applications in Kubernetes and other container orchestration systems, for example by adding encryption and authorization for application connections.


https://dev.to/pragmagic/testing-service-mesh-performance-in-multi-cluster-scenario-istio-vs-kuma-vs-nsm-4agj
Maximizing the Utility of Scarce AI Resources: A Kubernetes Approach

Optimizing the use of limited AI training accelerators


https://towardsdatascience.com/maximizing-the-utility-of-scarce-ai-resources-a-kubernetes-approach-0230ba53965b