DevOps&SRE Library

In this article, we’ll explore load shedding, which involves deciding which traffic to serve when you can’t handle all of it. The reasons for insufficient capacity vary: we might face unexpectedly high traffic from a promotion, a malicious attempt to take our service offline, or a change we’ve rolled out that doesn’t scale properly despite our best efforts to catch it in testing.

https://medium.com/agoda-engineering/load-shedding-private-cloud-first-81ddd5ab53ac
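As a rough illustration of the idea in the blurb above (choosing which traffic to serve when capacity runs out), here is a minimal sketch of priority-based shedding. It is not taken from the linked article: the `Request` type, the `shed_load` helper, the "lower number means more important" convention, and the sample request names are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    name: str
    priority: int  # hypothetical convention: lower number = more important


def shed_load(requests, capacity):
    """Split requests into (served, shed) when only `capacity` can be handled.

    Requests are ranked by priority; ties keep arrival order because
    Python's sort is stable.
    """
    ranked = sorted(requests, key=lambda r: r.priority)
    return ranked[:capacity], ranked[capacity:]


# Hypothetical incoming traffic with room for only three requests.
incoming = [
    Request("checkout", 0),
    Request("search", 1),
    Request("recommendations", 2),
    Request("payment", 0),
]
served, shed = shed_load(incoming, capacity=3)
print("served:", [r.name for r in served])
print("shed:", [r.name for r in shed])
```

A real system would usually make this decision per request as load changes rather than over a fixed batch, but the ranking step is the same in spirit.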

A Hands-On Guide to Kubernetes Endpoints & EndpointSlices

Understanding Kubernetes Endpoints and Endpoint Slices: A Comprehensive Guide

https://medium.com/@muppedaanvesh/a-hands-on-guide-to-kubernetes-endpoints-endpointslices-%EF%B8%8F-1375dfc9075c

Amazon EKS- managing and fixing ETCD database size

A story detailing how to investigate and fix ETCD database issues when using EKS. You will find out how I managed to completely break our EKS cluster because of an overloaded ETCD.

https://marcincuber.medium.com/amazon-eks-managing-and-fixing-etcd-database-size-b6fb875888cb

Unexpected HPA Scale Down of ArgoCD Rollouts

https://medium.com/@user.andrei/unexpected-hpa-scale-down-of-argocd-rollouts-964172271ab3

A Hands-On Guide to Kubernetes QoS Classes

Understanding Quality of Service Classes in Kubernetes: A Practical Example

https://medium.com/@muppedaanvesh/a-hands-on-guide-to-kubernetes-qos-classes-%EF%B8%8F-571b5f8f7e58

DBaaS in 2024: Which PostgreSQL operator for Kubernetes to select for your platform?

P1: https://medium.com/@davidpech_39825/dbaas-in-2024-which-postgresql-operator-for-kubernetes-to-select-for-your-platform-51cf4d5dec4a

P2: https://medium.com/@davidpech_39825/dbaas-in-2024-which-postgresql-operator-for-kubernetes-to-select-for-your-platform-4d17352b35a1

Scaling Strategies on AWS EKS: Understanding HPA, VPA, and Cluster Autoscaler

https://towardsaws.com/scaling-strategies-on-aws-eks-understanding-hpa-vpa-and-cluster-autoscaler-12b88758d1d5

Deploying a scalable STUN service in Kubernetes

https://medium.com/l7mp-technologies/deploying-a-scalable-stun-service-in-kubernetes-c7b9726fa41d

Private Kubernetes ingress with Tailscale operator, cert-manager and external-dns

https://medium.com/@mattiaforc/zero-trust-kubernetes-ingress-with-tailscale-operator-cert-manager-and-external-dns-8f42272f8647

How to attach USB devices to Kubernetes pods using Akri

https://medium.com/@hampusc/how-to-attach-usb-devices-to-kubernetes-pods-using-akri-19fb70d41f1e

zeropod

Zeropod is a Kubernetes runtime (more specifically a containerd shim) that automatically checkpoints containers to disk after a certain amount of time has passed since the last TCP connection. While in the scaled-down state, it listens on the same port the application inside the container was listening on and restores the container on the first incoming connection. Depending on the memory size of the checkpointed program, this happens in tens to a few hundred milliseconds, virtually unnoticeable to the user. Because all memory contents are stored to disk during checkpointing, the application's full state is restored.

https://github.com/ctrox/zeropod
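The lifecycle described in the zeropod entry above (checkpoint after an idle period, restore on the first new connection) can be pictured as a small state machine. The sketch below is a toy model only and does not use or reflect zeropod's actual code or API: the `IdleCheckpointer` class, its method names, and the 60-second timeout are assumptions for illustration, and no real checkpointing happens.

```python
class IdleCheckpointer:
    """Toy model: 'running' until idle too long, then 'checkpointed'."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout  # seconds of idleness before checkpoint
        self.last_connection = 0.0        # time of the most recent connection
        self.state = "running"

    def tick(self, now):
        """Periodic check: checkpoint if idle for at least idle_timeout."""
        idle_for = now - self.last_connection
        if self.state == "running" and idle_for >= self.idle_timeout:
            self.state = "checkpointed"
        return self.state

    def on_connection(self, now):
        """Handle an incoming connection; return True if a restore was needed."""
        restored = self.state == "checkpointed"
        self.state = "running"
        self.last_connection = now
        return restored


# Hypothetical timeline, in seconds.
pod = IdleCheckpointer(idle_timeout=60)
pod.on_connection(0)          # traffic at t=0
print(pod.tick(30))           # still within the idle window
print(pod.tick(60))           # idle long enough to be checkpointed
print(pod.on_connection(61))  # first connection triggers a restore
```

In the real project the restore step is what takes the tens to hundreds of milliseconds mentioned above; here it is reduced to a state flip.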

AWS Controllers for Kubernetes

Manage AWS services using Kubernetes

https://aws-controllers-k8s.github.io/community

helmper

A little helper that pushes Helm Charts and images to your registries, easily configured with a declarative spec.

https://github.com/ChristofferNissen/helmper

contrast

Contrast runs confidential container deployments on Kubernetes at scale.

https://github.com/edgelesssys/contrast

prom-analytics-proxy

prom-analytics-proxy is a lightweight proxy application designed to sit between your Prometheus server and its clients. It provides valuable insights by collecting detailed analytics on PromQL queries, helping you understand query performance, resource usage, and overall system behavior. This can significantly improve observability for Prometheus users, providing actionable data to optimize query execution and infrastructure.

https://github.com/nicolastakashi/prom-analytics-proxy
