DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN (Roskomnadzor): https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
Alerts Are Fundamentally Messy

Good alerting hygiene consists of a few components: chasing down alert conditions, reflecting on incidents, and thinking of what makes a signal good or bad. The hope is that we can get our alerts to the stage where they will page us when they should, and they won’t when they shouldn’t.

However, the reality of alerting in a socio-technical system must cater not only to the mess around the signal, but also to the longer term interpretation of alerts by people and automation acting on them. This post will expand on this messiness and why Honeycomb favors an iterative approach to setting our alerts.


https://www.honeycomb.io/blog/alerts-are-fundamentally-messy
glasskube

Using traditional package managers or applying manifests directly can be confusing and doesn't scale. Glasskube helps you install your favorite Kubernetes packages through the Glasskube UI, for reduced complexity and increased transparency. It also provides a brew-inspired CLI for advanced users. Packages are dependency-aware, as you would expect from a package manager. Glasskube is designed as a cloud-native application, so you can keep following your GitOps approach.


https://github.com/glasskube/glasskube
apisix

Apache APISIX is a dynamic, real-time, high-performance API Gateway.

APISIX API Gateway provides rich traffic management features such as load balancing, dynamic upstream, canary release, circuit breaking, authentication, observability, and more.

You can use APISIX API Gateway to handle traditional north-south traffic, as well as east-west traffic between services. It can also be used as a Kubernetes ingress controller.


https://github.com/apache/apisix
Multiple Terraform projects in a mono-repo. How to survive a mess?

Do you have a set of projects sitting in a mono-repo with various workspaces, file structures, and Terraform versions? Is it a pain to switch versions and remember all the path/workspace combinations? Are you uncertain whether the workspace or plan file is correct before applying it?
I feel you! I'll share my experience managing such projects, an approach that makes it much easier, and a simple tool I wrote a few years ago for that. How is it related to Docker Compose? I'll tell you…


https://tech.westwing.de/multiple-terraform-projects-in-a-mono-repo-how-to-survive-a-mess-e1ec5a136d17
k8s-cleaner

Cleaner is a Kubernetes controller that identifies unused or unhealthy resources, helping you maintain a streamlined and efficient Kubernetes cluster. It provides flexible scheduling, label filtering, Lua-based selection criteria, resource removal or update, and notifications via Slack, Webex, and Discord.


https://github.com/gianlucam76/k8s-cleaner
gitbutler

The GitButler version control client, backed by Git, powered by Tauri/Rust/Svelte


https://github.com/gitbutlerapp/gitbutler
dotslash

DotSlash (dotslash) is a command-line tool that lets you represent a set of platform-specific, heavyweight executables with an equivalent small, easy-to-read text file. In turn, this makes it efficient to store executables in source control without hurting repository size. This paves the way for checking build toolchains and other tools directly into the repo, reducing dependencies on the host environment and thereby facilitating reproducible builds.


https://github.com/facebook/dotslash
sad

Basically sad is a Batch File Edit tool.

It will show you a really nice diff of proposed changes before you commit them.

Unlike sed, you can double check before you fat finger your edit.
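
The preview-before-apply idea can be sketched in a few lines of Python. This is only an illustration of the workflow, not how sad itself is implemented; the function name `preview_edit` and the sample file content are assumptions made up for the example.

```python
import difflib
import re


def preview_edit(text: str, pattern: str, replacement: str, name: str = "file"):
    """Apply a regex substitution in memory and return (new_text, unified_diff).

    Nothing is written to disk; the caller decides whether to commit the change
    after reading the diff. (Hypothetical helper, not part of sad.)
    """
    new_text = re.sub(pattern, replacement, text)
    diff = "".join(
        difflib.unified_diff(
            text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"{name} (current)",
            tofile=f"{name} (proposed)",
        )
    )
    return new_text, diff


# Illustrative input: bump an image tag, but look at the diff first.
original = "replicas: 2\nimage: app:1.0\n"
proposed, diff = preview_edit(original, r"app:1\.0", "app:1.1", name="deploy.yaml")
print(diff)
```

Only after reviewing the printed diff would a caller write `proposed` back to the file, which is the safety step a plain sed one-liner skips.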


https://github.com/ms-jpq/sad
Why large companies and fast-moving startups are banning merge commits

Over the past decade, more and more closed-source repos have started banning merge commits on trunk and shifting to a squash-rebase-and-merge workflow. The benefits are clear: rebasing creates a cleaner, more understandable history & state of the world without the clutter of merge commits. Trunk branches remain linear, and branches function as brief, atomic diffs off the trunk. Some operations become more complex (largely due to incomplete/missing Git tooling), but the end state is a tidier history.
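
One way to picture "linear trunk" is that every commit on it has at most one parent. The sketch below checks that property on text in the shape printed by `git rev-list --parents` (one commit per line, followed by its parent hashes). The helper names and the shortened hashes are assumptions for illustration, not something the linked article provides.

```python
def parse_parents(rev_list_output: str) -> dict[str, list[str]]:
    """Parse lines of the form '<commit> <parent1> <parent2> ...' into a mapping."""
    parents: dict[str, list[str]] = {}
    for line in rev_list_output.splitlines():
        fields = line.split()
        if fields:
            parents[fields[0]] = fields[1:]
    return parents


def merge_commits(parents: dict[str, list[str]]) -> list[str]:
    """Return commits with more than one parent, i.e. merge commits."""
    return [commit for commit, ps in parents.items() if len(ps) > 1]


def is_linear(parents: dict[str, list[str]]) -> bool:
    """A history is linear when it contains no merge commits."""
    return not merge_commits(parents)


# Illustrative histories with made-up short hashes.
squashed = "c3 b2\nb2 a1\na1\n"
with_merge = "d4 c3 f9\nc3 b2\nf9 b2\nb2 a1\na1\n"

print(is_linear(parse_parents(squashed)))        # True
print(merge_commits(parse_parents(with_merge)))  # ['d4']
```

A check like this could run in CI against the trunk branch, although hosted Git platforms usually offer a built-in setting for the same rule.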


https://graphite.dev/blog/why-ban-merge-commits
Implementing Unit and integration tests in AWS using Terraform, Terratest, and Golang

https://blog.playgroundtech.io/implementing-unit-and-integration-tests-in-aws-using-terraform-terratest-and-golang-5f92c676ede1
Mastering Terraform: Best Practices for Scalable, Secure, and Reliable Infrastructure as Code

https://dev.to/prakhyatkarri/terraform-45-best-practices-62l
5 AWS/GCP Terraform Gotchas

Programming is full of ups and downs. While the victories always feel great, there are also lots of tricky little things that end up sapping your time and energy. I feel a sense of responsibility to share those experiences in the hopes that it helps even just one fellow programmer.


https://awstip.com/5-aws-gcp-terraform-gotchas-20d0afaab163
awesome-limits

Everything has limits, including software systems. When you hit these limits, bad things can happen.

You've probably hit memory and disk limits, but those aren't the only ones.

This page lists limits that, when breached, led to someone having a bad time. I tweeted about limits and got all sorts of interesting responses. This page contains some of them, with links to the tweets, which often contain more details.


https://github.com/lorin/awesome-limits
Tell me about a time…

Here are some proposed questions for interviewing someone for an SRE role. Really, these are just conversation starters to get them reflecting and discussing specific incident details.


https://surfingcomplexity.blog/2023/12/24/tell-me-about-a-time
10 Tips for Onboarding New SRE Hires

There’s more than one way to mess up your new SRE hire and get them stuck in a loop.

Here are 6 ways new hires will know you’ve made this mistake:

1. unclear role requirements
2. going too advanced too soon
3. not having any tangible, measurable things to do in the first few months
4. not feeling connected with the rest of the SRE team
5. no clarity on how SRE fits into the wider organization
6. little to no collaboration with teams outside of SRE

This article will unpack these 6 sticking points and show how to solve them.


https://www.srepath.com/10-tips-for-onboarding-new-sre-hires
Starting SRE at startups and smaller organizations

Most of the original thinking behind SRE focuses on implementing it in large-scale systems.

I believe that any organization that has software at the foundation of its core business should at the very least pay attention to SRE principles.

You can always pare hyperscale ideas down to your level of need, which we will explore later in this article.


https://www.srepath.com/starting-sre-at-startups-and-smaller-organizations
AWS Extended EKS Support: A Costly Band-Aid for Kubernetes Clusters

Amazon Web Services (AWS) recently announced extended support for Amazon Elastic Kubernetes Service (EKS) versions (starting April 2024), allowing customers to use older versions of Kubernetes for an additional 12 months. While this may seem like a convenient option, it comes with a hefty price tag and several drawbacks that customers should carefully consider before opting for it.

https://medium.com/@talkimhi/aws-extended-eks-support-a-costly-band-aid-for-kubernetes-clusters-120b8d537abe