The Saga is Antipattern
The Saga pattern is often positioned as a better way to handle distributed transactions. I see no point in discussing Saga's disadvantages because the problem is that Saga should not be used in microservices at all.
If you need distributed transactions across a few microservices, you have most likely defined and separated your domains incorrectly.
Below is a long explanation of why.
https://dev.to/siy/the-saga-is-antipattern-1354
Lost in transit: debugging dropped packets from negative header lengths
https://blog.cloudflare.com/lost-in-transit-debugging-dropped-packets-from-negative-header-lengths
Analyzing Volatile Memory on a Google Kubernetes Engine Node
TL;DR At Spotify, we run containerized workloads in production across our entire organization in five regions where our main production workloads are in Google Kubernetes Engine (GKE) on Google Cloud Platform (GCP). If we detect suspicious behavior in our workloads, we need to be able to quickly analyze it and determine if something malicious has happened. Today we leverage commercial solutions to monitor them, but we also do our own research to discover options and alternative methods.
One such research project led to the discovery of a new method for conducting memory analysis on GKE by combining three open source tools, AVML, dwarf2json, and Volatility 3, the result being a snapshot of all the processes and memory activities on a GKE node.
This new method empowers us and other organizations to use an open source alternative if we do not have a commercial solution in place or if we want to compare our current monitoring to the open source one.
In this blog post, I’ll explain in detail how memory analysis works and how this new method can be used on any GKE node in production today.
https://engineering.atspotify.com/2023/06/analyzing-volatile-memory-on-a-google-kubernetes-engine-node
Crossplane: Why it Didn't Work for Us
We investigated Crossplane at a deep level and found it wasn't for us. Read on to learn about our investigation and the issues we found.
https://masterpoint.io/updates/passing-on-crossplane
IaC CI/CD integration for Terraform Vet
https://medium.com/google-cloud/iac-ci-cd-integration-for-terraform-vet-d67ef528a982
Automate AWS SSO Using Terraform
Leveraging Terraform to automate the setup and configuration of SSO resources, streamline user management, and enhance security.
https://medium.com/cloud-native-daily/automate-aws-sso-using-terraform-2f219a45c16f
tfgen
Terragrunt alternative to keep your Terraform code consistent and DRY
https://github.com/refl3ction/tfgen
terraform-registry
This is an implementation of the Terraform registry protocol used to host a private Terraform registry.
https://github.com/nrkno/terraform-registry
tfvar
tfvar is a template generator for Terraform variable definitions. It scans your Terraform configurations or modules and extracts the variables into formats of your choice for editing, e.g., tfvar, environment variables, etc.
https://github.com/shihanng/tfvar
SRE Engagement Models
- Consulting
- Embedded
- Infra Team
https://certomodo.substack.com/p/sre-engagement-models
CloudFront and Terraform Essentials: How to Optimize Content Delivery
We are going to describe how CloudFront can be integrated with API Gateway to provide lower latency. We will also go through the attributes of the CloudFront resources in Terraform, including the ones that we need to create the distribution and configure origins and behaviors.
https://medium.com/@xpiotrkleban/cloudfront-and-terraform-essentials-how-to-optimize-content-delivery-27c84e8aef04
Best practices for monitoring static web applications
https://www.datadoghq.com/blog/static-web-application-monitoring-best-practices
latency: a primer
hi! this article is aimed at folks who are interested in performance analysis or operations of software, and want to understand the impact on user experience. the examples will be centered around web applications and web services, but can be applied in other contexts as well.
https://igor.io/latency
Principles of Reliable Software Design
Reliable software design is a discipline that involves a careful balance of numerous principles, each of which is intended to ensure the development of high-quality software that meets the needs of users and stakeholders.
https://www.codereliant.io/principles-of-reliable-software-design-part-1
Failover
What is it? How does it work? When to use it and when not to use it?
https://blog.alexewerlof.com/p/failover
Solving challenges caused by Out Of Memory (OOM) Killer in Linux
Learn how out of memory events created challenges for our team, and how we solved them.
https://redpanda.com/blog/solve-out-of-memory-killer-events
acme-dns
A simplified DNS server with a RESTful HTTP API to provide a simple way to automate ACME DNS challenges.
https://github.com/joohoi/acme-dns
Building and operating a pretty big storage system called S3
Today, I am publishing a guest post from Andy Warfield, VP and distinguished engineer over at S3. I asked him to write this based on the Keynote address he gave at USENIX FAST ’23 that covers three distinct perspectives on scale that come along with building and operating a storage system the size of S3.
https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html
Bridging the gap between IaC and Schema Management
When we started building Atlas a couple of years ago, we noticed that there was a substantial gap between what was then considered state-of-the-art in managing database schemas and the recent strides from Infrastructure-as-Code (IaC) to managing cloud infrastructure.
In this post, we review that gap and show how Atlas, along with its Terraform provider, can bridge the two domains.
https://atlasgo.io/blog/2023/07/19/bridging-the-gap-between-iac-and-schema-management