DevOps&SRE Library

Alerts Are Fundamentally Messy

Good alerting hygiene consists of a few components: chasing down alert conditions, reflecting on incidents, and thinking of what makes a signal good or bad. The hope is that we can get our alerts to the stage where they will page us when they should, and they won’t when they shouldn’t.

However, the reality of alerting in a socio-technical system must cater not only to the mess around the signal, but also to the longer term interpretation of alerts by people and automation acting on them. This post will expand on this messiness and why Honeycomb favors an iterative approach to setting our alerts.

https://www.honeycomb.io/blog/alerts-are-fundamentally-messy


glasskube

Using traditional package managers or applying manifests directly can be confusing and doesn't scale. Glasskube helps you install your favorite Kubernetes packages through the Glasskube UI, for reduced complexity and increased transparency. We also provide a brew-inspired CLI for advanced users. Our packages are dependency-aware, as you would expect from a package manager. Glasskube is designed as a cloud-native application, so you can keep following your GitOps approach.

https://github.com/glasskube/glasskube


apisix

Apache APISIX is a dynamic, real-time, high-performance API Gateway.

APISIX API Gateway provides rich traffic management features such as load balancing, dynamic upstream, canary release, circuit breaking, authentication, observability, and more.

You can use APISIX API Gateway to handle traditional north-south traffic, as well as east-west traffic between services. It can also be used as a k8s ingress controller.

https://github.com/apache/apisix


Multiple Terraform projects in a mono-repo. How to survive a mess?

Do you have a set of projects sitting in a mono-repo, with various workspaces, file structures, and Terraform versions? Is it a pain to switch versions and remember all the path/workspace combinations? Are you unsure whether the workspace or plan file is correct before applying it?
I feel you! I'll share my experience managing such projects, an approach that makes it much easier, and a simple tool I wrote a few years ago for that. How is it related to Docker Compose? I'll tell you…

https://tech.westwing.de/multiple-terraform-projects-in-a-mono-repo-how-to-survive-a-mess-e1ec5a136d17


k8s-cleaner

Cleaner is a Kubernetes controller that identifies unused or unhealthy resources, helping you maintain a streamlined and efficient Kubernetes cluster. It provides flexible scheduling, label filtering, Lua-based selection criteria, resource removal or update, and notifications via Slack, Webex, and Discord.

https://github.com/gianlucam76/k8s-cleaner


gitbutler

The GitButler version control client, backed by Git, powered by Tauri/Rust/Svelte

https://github.com/gitbutlerapp/gitbutler


dotslash

DotSlash (dotslash) is a command-line tool that lets you represent a set of platform-specific, heavyweight executables with an equivalent small, easy-to-read text file. In turn, this makes it efficient to store executables in source control without hurting repository size. This paves the way for checking build toolchains and other tools directly into the repo, reducing dependencies on the host environment and thereby facilitating reproducible builds.

https://github.com/facebook/dotslash
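
The idea of replacing heavyweight binaries with a small text file can be sketched roughly as follows. This is only an illustration of the concept: the manifest layout, the `select_artifact` helper, and the `example.invalid` URLs are all hypothetical and are not DotSlash's actual file format or API.

```python
import json

# Hypothetical manifest layout for illustration only; NOT DotSlash's real schema.
MANIFEST_TEXT = """
{
  "name": "mytool",
  "platforms": {
    "linux-x86_64": {
      "url": "https://example.invalid/mytool-linux.tar.gz",
      "sha256": "aaaa"
    },
    "macos-aarch64": {
      "url": "https://example.invalid/mytool-macos.tar.gz",
      "sha256": "bbbb"
    }
  }
}
"""


def select_artifact(manifest_text: str, platform_key: str) -> dict:
    """Pick the artifact entry for one platform from the small text manifest."""
    manifest = json.loads(manifest_text)
    platforms = manifest["platforms"]
    if platform_key not in platforms:
        raise KeyError(f"{manifest['name']} has no artifact for {platform_key}")
    return platforms[platform_key]


if __name__ == "__main__":
    # Only this short text file lives in the repo; the binary is fetched on demand.
    print(select_artifact(MANIFEST_TEXT, "linux-x86_64")["url"])
```

The repository stores only the short manifest, so its size stays small no matter how large the per-platform executables are.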


sad

Basically, sad is a batch file edit tool.

It will show you a really nice diff of proposed changes before you commit them.

Unlike sed, you can double-check before you fat-finger your edit.

https://github.com/ms-jpq/sad
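
The "preview the diff before committing the edit" workflow can be sketched in a few lines. This is a minimal illustration using Python's standard `re` and `difflib` modules, not how sad itself is implemented; the `preview_substitution` helper and the sample file name are made up for the example.

```python
import difflib
import re


def preview_substitution(
    text: str, pattern: str, replacement: str, name: str = "file"
) -> tuple[str, str]:
    """Return (new_text, unified_diff) without writing anything to disk."""
    new_text = re.sub(pattern, replacement, text)
    diff = "".join(
        difflib.unified_diff(
            text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"{name} (before)",
            tofile=f"{name} (after)",
        )
    )
    return new_text, diff


if __name__ == "__main__":
    original = "port = 8080\nhost = localhost\n"
    proposed, diff = preview_substitution(original, r"8080", "9090", name="app.conf")
    # Review the diff first; only write `proposed` back if it looks right.
    print(diff)
```

Because the function returns the proposed text instead of writing it, the caller decides whether to apply the change after reading the diff.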


Why large companies and fast-moving startups are banning merge commits

Over the past decade, more and more closed-source repos have started banning merge commits on trunk and shifting to a squash-rebase-and-merge workflow. The benefits are clear: rebasing creates a cleaner, more understandable history & state of the world without the clutter of merge commits. Trunk branches remain linear, and branches function as brief, atomic diffs off the trunk. Some operations become more complex (largely due to incomplete/missing Git tooling), but the end state is a tidier history.

https://graphite.dev/blog/why-ban-merge-commits
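
What "linear trunk" means can be made concrete with a small check. The sketch below assumes the commit graph is already available as a mapping from commit ID to its list of parent IDs; the `is_linear_history` helper and the sample graphs are hypothetical and are not taken from the article or from any Git tooling.

```python
def is_linear_history(parents: dict[str, list[str]], head: str) -> bool:
    """Return True if no commit reachable from `head` has more than one parent."""
    commit = head
    while commit is not None:
        commit_parents = parents.get(commit, [])
        if len(commit_parents) > 1:
            # A commit with two or more parents is a merge commit.
            return False
        commit = commit_parents[0] if commit_parents else None
    return True


if __name__ == "__main__":
    # Squash/rebase workflow: each commit has exactly one parent.
    linear = {"c": ["b"], "b": ["a"], "a": []}
    # Merge workflow: "m" joins two lines of history.
    merged = {"m": ["c", "f"], "c": ["b"], "f": ["b"], "b": ["a"], "a": []}
    print(is_linear_history(linear, "c"))
    print(is_linear_history(merged, "m"))
```

A check like this is what a "no merge commits on trunk" policy enforces: history stays a single chain that can be read top to bottom.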


Implementing Unit and integration tests in AWS using Terraform, Terratest, and Golang

https://blog.playgroundtech.io/implementing-unit-and-integration-tests-in-aws-using-terraform-terratest-and-golang-5f92c676ede1


Mastering Terraform: Best Practices for Scalable, Secure, and Reliable Infrastructure as Code

https://dev.to/prakhyatkarri/terraform-45-best-practices-62l


Sentry Automation via Terraform: Project and DSN

https://engineering.getmidas.com/sentry-terraform-automation-project-and-dsn-d4f489b97a09


5 AWS/GCP Terraform Gotchas

Programming is full of ups and downs. While the victories always feel great, there are also lots of tricky little things that end up sapping your time and energy. I feel a sense of responsibility to share those experiences in the hopes that it helps even just one fellow programmer.

https://awstip.com/5-aws-gcp-terraform-gotchas-20d0afaab163


awesome-limits

Everything has limits, including software systems. When you hit these limits, bad things can happen.

You've probably hit memory and disk limits, but those aren't the only ones.

This page lists limits that, when breached, led to someone having a bad time. I tweeted about limits and got all sorts of interesting responses. This page contains some of them, with links to the tweets, which often contain more details.

https://github.com/lorin/awesome-limits


Tell me about a time…

Here are some proposed questions for interviewing someone for an SRE role. Really, these are just conversation starters to get them reflecting and discussing specific incident details.

https://surfingcomplexity.blog/2023/12/24/tell-me-about-a-time


Rebuilding Netflix Video Processing Pipeline with Microservices

https://netflixtechblog.com/rebuilding-netflix-video-processing-pipeline-with-microservices-4e5e6310e359


10 Tips for Onboarding New SRE Hires

There’s more than one way to mess up your new SRE hire and get them stuck in a loop.

Here are 6 ways new hires will know you’ve made this mistake:

1. unclear role requirements
2. going too advanced too soon
3. not having any tangible, measurable things to do in the first few months
4. not feeling connected with the rest of the SRE team
5. no clarity on how SRE fits into the wider organization
6. little to no collaboration with teams outside of SRE

This article will unpack these 6 sticking points and show how to solve them.

https://www.srepath.com/10-tips-for-onboarding-new-sre-hires


Starting SRE at startups and smaller organizations

Most of the original thinking behind SRE focuses on implementing it in large-scale systems.

I believe that any organization that has software at the foundation of its core business should at the very least pay attention to SRE principles.

You can always pare hyperscale ideas down to your level of need, which we will explore later in this article.

https://www.srepath.com/starting-sre-at-startups-and-smaller-organizations


Ansible vs Terraform: Choose One or Use Both?

https://www.env0.com/blog/ansible-vs-terraform-when-to-choose-one-or-use-them-together


AWS Extended EKS Support: A Costly Band-Aid for Kubernetes Clusters

Amazon Web Services (AWS) recently announced extended support for Amazon Elastic Kubernetes Service (EKS) versions, starting April 2024, allowing customers to use older versions of Kubernetes for an additional 12 months. While this may seem like a convenient option, it comes with a hefty price tag and several drawbacks that customers should carefully consider before opting for it.

https://medium.com/@talkimhi/aws-extended-eks-support-a-costly-band-aid-for-kubernetes-clusters-120b8d537abe
