DevOps&SRE Library

A Glimpse into the Redesigned Goku-Ingestor vNext at Pinterest

Better performance, lower cost and less code complexity

https://medium.com/pinterest-engineering/a-glimpse-into-the-redesigned-goku-ingestor-vnext-at-pinterest-d68159473464


Simplicity

In May 2009, Google hosted an internal "Design Wizardry" panel, with talks by Jeff Dean, Mike Burrows, Paul Haahr, Alfred Spector, Bill Coughran, and myself. Here is a lightly edited transcript of my talk. Some of the details have aged out, but the themes live on, now perhaps more than ever.

https://commandcenter.blogspot.com/2023/12/simplicity.html


A deep dive into CPU requests and limits in Kubernetes

In this post, we are going to dive a bit deeper into CPU and share some general recommendations for specifying CPU requests and limits. We will also explore the differences between using the default policy (CFS quota) and the CPU Manager’s static policy. We are not going to consider memory resources in this post.

https://www.datadoghq.com/blog/kubernetes-cpu-requests-limits
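For readers who want the arithmetic behind the CFS quota policy mentioned in the blurb, here is a minimal Python sketch. It assumes cgroup v1 and the default 100 ms CFS period. The function names are hypothetical, and the kubelet applies extra minimum and maximum clamps that are not modeled here.

```python
# Sketch only: how CPU requests and limits (in millicores) map to
# cgroup v1 values under the default CFS quota policy.
# Assumption: 100 ms CFS period, the usual default.
CFS_PERIOD_US = 100_000


def cpu_limit_to_cfs_quota_us(limit_millicores: int,
                              period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (microseconds) a container may use per CFS period."""
    if limit_millicores <= 0:
        raise ValueError("limit_millicores must be positive")
    # 1000 millicores (one full CPU) corresponds to one whole period.
    return limit_millicores * period_us // 1000


def cpu_request_to_shares(request_millicores: int) -> int:
    """Relative weight (cpu.shares) derived from a CPU request."""
    if request_millicores <= 0:
        raise ValueError("request_millicores must be positive")
    # 1000 millicores corresponds to 1024 shares.
    return request_millicores * 1024 // 1000


if __name__ == "__main__":
    print(cpu_limit_to_cfs_quota_us(500))  # quota for a 500m limit
    print(cpu_request_to_shares(250))      # shares for a 250m request
```

Under these assumptions, a limit caps how much CPU time a container gets in each period, while a request only sets its relative weight when CPUs are contended.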


GitOps Guide: ArgoCD vs Flux

https://www.codereliant.io/gitops-guide-argocd-vs-flux


A Spooky Performance Regression in AWS EBS Volumes

https://www.dolthub.com/blog/2023-11-22-spooky-performance-regression-aws-ebs

Christmas Come Early: An AWS EBS Performance Regression Update

https://www.dolthub.com/blog/2023-12-08-christmas-come-early-ebs-performance-regression-update


Scaling SRE Teams

Scaling teams of site reliability engineers comes with many challenges. Here, explore the challenges of scaling and review a successful scaling framework.

https://dzone.com/articles/scaling-sre-teams


Mastering AWS Lambda with Terraform: A Comprehensive Guide

https://blog.awsfundamentals.com/aws-lambda-with-terraform


VictoriaMetrics: A Comprehensive Guide, Comparing It to Prometheus, and Implementing Kubernetes Monitoring

https://medium.com/@seifeddinerajhi/victoriametrics-a-comprehensive-guide-comparing-it-to-prometheus-and-implementing-kubernetes-03eb8feb0cc2


Kubernetes And Kernel Panics

How Netflix’s Container Platform Connects Linux Kernel Panics to Kubernetes Pods

https://netflixtechblog.com/kubernetes-and-kernel-panics-ed620b9c6225


Kubewatch: A Kubernetes Watcher for Observability and Monitoring

Kubewatch is a Kubernetes watcher that publishes notifications to available collaboration hubs/notification channels. It watches the cluster for resource changes and notifies you through webhooks.

https://medium.com/@seifeddinerajhi/kubewatch-a-kubernetes-watcher-for-observability-and-monitoring-d6dea1dbeb06

https://github.com/robusta-dev/kubewatch


Notes on Self-hosted Transactional Email

Since a little more than two months ago, Healthchecks.io has been sending transactional email (~300’000 emails per month) through its own SMTP server. Here are my notes on setting it up.

https://blog.healthchecks.io/2023/08/notes-on-self-hosted-transactional-email


Healthchecks.io Hosting Setup, 2022 Edition

https://blog.healthchecks.io/2022/02/healthchecks-io-hosting-setup-2022-edition

Healthchecks.io Hosting, Questions and Answers

https://blog.healthchecks.io/2022/05/healthchecks-io-hosting-questions-and-answers


Martian Kubernetes Kit: a smooth-sailing toolkit from our SRE team

We’ve been using Kubernetes since before it was a “thing”, and as of 2023, we believe that it is still underutilized. In fact, it’s the best (and basically only real “at-scale”) solution for orchestrating Docker containers—or containers in general, after you’ve outgrown services like Heroku or Fly.io! That’s a bold claim, but it’s a belief backed up by our years of SRE experience. In this post, we’ll expand on that, and we’ll introduce a Kubernetes toolkit we already use and support for our clients, which simultaneously de-complexifies and highlights the benefits of Kubernetes.

https://evilmartians.com/chronicles/martian-kubernetes-kit-a-smooth-sailing-toolkit-from-our-sre-team


tofuenv

OpenTofu version manager inspired by tfenv

https://github.com/tofuutils/tofuenv


Service Level Indicators

Introduction to SLI, examples, counterexamples and tips

https://blog.alexewerlof.com/p/sli


On Error Budgets

An error budget is essentially the permissible limit of risk or failure that a service can tolerate while still meeting its objectives. It is closely tied to Service Level Objectives, which define the expected level of service reliability. For instance, if an SLO dictates 99.9% uptime, the error budget allows for a 0.1% margin of error or downtime.

https://www.codereliant.io/on-error-budgets
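The 99.9% example in the blurb can be turned into concrete downtime figures. The following is a minimal Python sketch, assuming a time-based availability SLO measured over a rolling 30-day window. The function names and the window length are illustrative assumptions, not taken from the linked article.

```python
# Sketch only: convert an availability SLO into an error budget.
# Assumption: availability is measured as uptime over a fixed window.


def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Downtime (minutes) allowed by the SLO over the window."""
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in (0, 100]")
    total_minutes = window_days * 24 * 60
    return total_minutes * (100 - slo_percent) / 100


def budget_remaining_minutes(slo_percent: float,
                             downtime_minutes: float,
                             window_days: int = 30) -> float:
    """Budget left after the downtime already spent (negative if overspent)."""
    return error_budget_minutes(slo_percent, window_days) - downtime_minutes


if __name__ == "__main__":
    # A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(99.9), 1))
    # After a 30-minute outage, roughly 13.2 minutes remain.
    print(round(budget_remaining_minutes(99.9, 30), 1))
```

A negative remainder means the budget is overspent for that window.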


Upgrading GitHub.com to MySQL 8.0

GitHub uses MySQL to store vast amounts of relational data. This is the story of how we seamlessly upgraded our production fleet to MySQL 8.0.

https://github.blog/2023-12-07-upgrading-github-com-to-mysql-8-0


AWS CDK vs Terraform

IaC is one of the key DevOps practices, and AWS CDK and Terraform are both great IaC tools for managing your AWS infrastructure. Having used both extensively, let me share my experience with the two tools.

https://medium.com/@kansvignesh/aws-cdk-vs-terraform-738c39d91f7a


git rebase: what can go wrong?

https://jvns.ca/blog/2023/11/06/rebasing-what-can-go-wrong-


Testing Framework in Terraform 1.6: A deep-dive

https://mattias.engineer/posts/terraform-testing-deep-dive


Take your testing to the cloud

https://mattias.engineer/posts/take-your-testing-to-the-cloud
