Announcing: 52 Weeks of SRE - A Journey to Master Site Reliability Engineering
https://jpereira.me/announcing-52-weeks-of-sre-a-journey-to-master-site-reliability-engineering
Week 1: https://jpereira.me/week-1-introduction-to-sre-where-the-magic-begins
Week 2: https://jpereira.me/week-2-monitoring-fundamentals
Week 3: https://jpereira.me/week-3-service-level-objectives-slos
The Karpenter Effect: Redefining Our Kubernetes Operations
https://medium.com/adevinta-tech-blog/the-karpenter-effect-redefining-our-kubernetes-operations-80c7ba90a599
A reflection on our journey to AWS Karpenter and how it improved upgrades, flexibility, and cost-efficiency across a fleet of 2,000+ nodes.
How to - Choose the Right Instance Size for AWS RDS
https://reliabilityengineering.substack.com/p/how-to-choose-the-right-instance
Managing AWS EKS access entries with Terraform and OpenTofu
https://dev.to/aws-builders/managing-aws-eks-access-entries-with-terraform-and-opentofu-414
terraform-aws-clickops-notifier
https://github.com/cloudandthings/terraform-aws-clickops-notifier
Get notified when actions are taken in the AWS Console.
Kubernetes networking: service, kube-proxy, load balancing
https://learnk8s.io/kubernetes-services-and-load-balancing
TL;DR: This article explores Kubernetes networking, focusing on Services, kube-proxy, and load balancing.
How Agoda Handles Load Shedding in Private Cloud
https://medium.com/agoda-engineering/load-shedding-private-cloud-first-81ddd5ab53ac
In this article, we’ll explore load shedding, which involves deciding which traffic to serve when you can’t handle all of it. The reason for having insufficient capacity can vary. We might face unexpected high traffic from a promotion, a malicious attempt to take our service offline, or maybe we’ve rolled out a change that doesn’t scale properly despite our best efforts to catch it in testing.
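To make the idea of "deciding which traffic to serve" concrete, here is a minimal sketch of priority-based load shedding. It is not taken from the Agoda article; the priority tiers, the thresholds, and the `LoadShedder` name are all assumptions chosen for illustration. Lower-priority traffic is rejected earlier as the service fills up, so the remaining capacity stays available for critical requests.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    # Hypothetical tiers; a real service would define its own.
    CRITICAL = 0
    NORMAL = 1
    BACKGROUND = 2


# Assumed thresholds: the fraction of capacity that may be in use
# before requests of a given priority start being shed.
ADMIT_BELOW = {
    Priority.CRITICAL: 1.0,
    Priority.NORMAL: 0.8,
    Priority.BACKGROUND: 0.5,
}


@dataclass
class LoadShedder:
    capacity: int        # maximum number of in-flight requests
    in_flight: int = 0   # requests currently being served

    def try_admit(self, priority: Priority) -> bool:
        """Admit the request unless its priority tier is already full."""
        if self.in_flight >= self.capacity * ADMIT_BELOW[priority]:
            return False  # shed: the caller would return e.g. HTTP 503
        self.in_flight += 1
        return True

    def release(self) -> None:
        """Call when an admitted request finishes."""
        self.in_flight = max(0, self.in_flight - 1)


shedder = LoadShedder(capacity=10)
background = [shedder.try_admit(Priority.BACKGROUND) for _ in range(6)]
normal = [shedder.try_admit(Priority.NORMAL) for _ in range(4)]
critical = [shedder.try_admit(Priority.CRITICAL) for _ in range(3)]
```

With a capacity of 10, background traffic is shed once 5 requests are in flight, normal traffic at 8, and critical traffic only when the service is completely full.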
A Hands-On Guide to Kubernetes Endpoints & EndpointSlices
https://medium.com/@muppedaanvesh/a-hands-on-guide-to-kubernetes-endpoints-endpointslices-%EF%B8%8F-1375dfc9075c
Understanding Kubernetes Endpoints and Endpoint Slices: A Comprehensive Guide
Amazon EKS- managing and fixing ETCD database size
https://marcincuber.medium.com/amazon-eks-managing-and-fixing-etcd-database-size-b6fb875888cb
A story about investigating and fixing etcd database size issues on EKS, including how I managed to completely break our EKS cluster by overloading etcd.
Unexpected HPA Scale Down of ArgoCD Rollouts
https://medium.com/@user.andrei/unexpected-hpa-scale-down-of-argocd-rollouts-964172271ab3
A Hands-On Guide to Kubernetes QoS Classes
https://medium.com/@muppedaanvesh/a-hands-on-guide-to-kubernetes-qos-classes-%EF%B8%8F-571b5f8f7e58
Understanding Quality of Service Classes in Kubernetes: A Practical Example
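As a rough illustration of how a pod ends up in one of the three QoS classes, the sketch below applies the documented rules to a list of container specs written as plain dictionaries. It is a simplification and not code from the linked guide: the `qos_class` helper is hypothetical, quantities are compared as strings (so "500m" and "0.5" would wrongly count as different), and init containers are ignored.

```python
def qos_class(containers):
    """Return the QoS class for a list of container specs (dicts)."""
    any_set = False     # does any container set a CPU/memory request or limit?
    guaranteed = True   # do all containers have request == limit for both?

    for container in containers:
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # Kubernetes defaults a missing request to the limit, if one is set.
            request = requests.get(name, limit)
            if limit is not None or request is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False

    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


guaranteed_pod = [{"resources": {"limits": {"cpu": "500m", "memory": "128Mi"}}}]
burstable_pod = [{"resources": {"requests": {"memory": "64Mi"}}}]
best_effort_pod = [{"name": "app"}]

classes = [qos_class(p) for p in (guaranteed_pod, burstable_pod, best_effort_pod)]
```

Under memory pressure the kubelet generally evicts BestEffort pods first and Guaranteed pods last, which is why the classification matters in practice.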
DBaaS in 2024: Which PostgreSQL operator for Kubernetes to select for your platform?
P1: https://medium.com/@davidpech_39825/dbaas-in-2024-which-postgresql-operator-for-kubernetes-to-select-for-your-platform-51cf4d5dec4a
P2: https://medium.com/@davidpech_39825/dbaas-in-2024-which-postgresql-operator-for-kubernetes-to-select-for-your-platform-4d17352b35a1
Scaling Strategies on AWS EKS: Understanding HPA, VPA, and Cluster Autoscaler
https://towardsaws.com/scaling-strategies-on-aws-eks-understanding-hpa-vpa-and-cluster-autoscaler-12b88758d1d5
Deploying a scalable STUN service in Kubernetes
https://medium.com/l7mp-technologies/deploying-a-scalable-stun-service-in-kubernetes-c7b9726fa41d
Private kubernetes ingress with tailscale operator, cert-manager and external-dns
https://medium.com/@mattiaforc/zero-trust-kubernetes-ingress-with-tailscale-operator-cert-manager-and-external-dns-8f42272f8647
How to attach USB devices to Kubernetes pods using Akri
https://medium.com/@hampusc/how-to-attach-usb-devices-to-kubernetes-pods-using-akri-19fb70d41f1e
zeropod
https://github.com/ctrox/zeropod
Zeropod is a Kubernetes runtime (more specifically a containerd shim) that automatically checkpoints containers to disk after a certain amount of time has passed since the last TCP connection. While in the scaled-down state, it listens on the same port the application inside the container was listening on and restores the container on the first incoming connection. Depending on the memory size of the checkpointed program, this happens in tens to a few hundred milliseconds, virtually unnoticeable to the user. As all the memory contents are stored to disk during checkpointing, all state of the application is restored.
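The checkpoint and restore cycle described above can be pictured as a small state machine. The sketch below is only a toy model of that behavior, not zeropod's implementation: the `ScaleToZeroModel` class, its fields, and the explicit `now` timestamps are all assumptions made for illustration, and no real checkpointing or socket handling takes place.

```python
from dataclasses import dataclass


@dataclass
class ScaleToZeroModel:
    idle_timeout: float           # seconds without connections before checkpointing
    last_connection: float = 0.0  # timestamp of the most recent connection
    checkpointed: bool = False    # True while the container is scaled down
    restores: int = 0             # how many times the container was restored

    def tick(self, now: float) -> None:
        """Periodic check: checkpoint once the idle timeout has elapsed."""
        if not self.checkpointed and now - self.last_connection >= self.idle_timeout:
            self.checkpointed = True

    def on_connection(self, now: float) -> None:
        """An incoming connection restores a checkpointed container first."""
        if self.checkpointed:
            self.checkpointed = False
            self.restores += 1
        self.last_connection = now


model = ScaleToZeroModel(idle_timeout=30.0)
model.on_connection(now=0.0)
model.tick(now=10.0)
still_running = not model.checkpointed
model.tick(now=30.0)
scaled_down = model.checkpointed
model.on_connection(now=31.0)
```

In this model the first connection after a checkpoint pays the restore cost, which corresponds to the tens to a few hundred milliseconds mentioned in the project description.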
AWS Controllers for Kubernetes
https://aws-controllers-k8s.github.io/community
Manage AWS services using Kubernetes
helmper
https://github.com/ChristofferNissen/helmper
A little helper that pushes Helm Charts and images to your registries, easily configured with a declarative spec.
contrast
https://github.com/edgelesssys/contrast
Contrast runs confidential container deployments on Kubernetes at scale.
prom-analytics-proxy
https://github.com/nicolastakashi/prom-analytics-proxy
prom-analytics-proxy is a lightweight proxy application designed to sit between your Prometheus server and its clients. It provides valuable insights by collecting detailed analytics on PromQL queries, helping you understand query performance, resource usage, and overall system behavior. This can significantly improve observability for Prometheus users, providing actionable data to optimize query execution and infrastructure.