DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN: https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
Best practices for monitoring static web applications

https://www.datadoghq.com/blog/static-web-application-monitoring-best-practices
latency: a primer

hi! this article is aimed at folks who are interested in performance analysis or operations of software, and want to understand the impact on user experience. the examples will be centered around web applications and web services, but can be applied in other contexts as well.

https://igor.io/latency
Principles of Reliable Software Design

Reliable software design is a discipline that involves a careful balance of numerous principles, each of which is intended to ensure the development of high-quality software that meets the needs of users and stakeholders.

https://www.codereliant.io/principles-of-reliable-software-design-part-1
Failover

What is it? How does it work? When to use it and when not to use it?

https://blog.alexewerlof.com/p/failover
Solving challenges caused by Out Of Memory (OOM) Killer in Linux

Learn how out of memory events created challenges for our team, and how we solved them.

https://redpanda.com/blog/solve-out-of-memory-killer-events
acme-dns

A simplified DNS server with a RESTful HTTP API to provide a simple way to automate ACME DNS challenges.

https://github.com/joohoi/acme-dns
Building and operating a pretty big storage system called S3

Today, I am publishing a guest post from Andy Warfield, VP and distinguished engineer over at S3. I asked him to write this based on the keynote address he gave at USENIX FAST '23 that covers three distinct perspectives on scale that come along with building and operating a storage system the size of S3.

https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html
Bridging the gap between IaC and Schema Management

When we started building Atlas a couple of years ago, we noticed that there was a substantial gap between what was then considered state-of-the-art in managing database schemas and the recent strides from Infrastructure-as-Code (IaC) to managing cloud infrastructure.

In this post, we review that gap and show how Atlas – along with its Terraform provider – can bridge the two domains.

https://atlasgo.io/blog/2023/07/19/bridging-the-gap-between-iac-and-schema-management
A misadventure with Terraform Sets & PagerDuty Schedules

How Terraform's setunion() disregards ordering.
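
The ordering point can be illustrated with a rough Python analogy. This is not Terraform code and does not reproduce the linked article's configuration. The names `primary`, `secondary`, and `ordered_union` are hypothetical, chosen only for this sketch. It contrasts a set-based union, which carries no positional information, with a list-based union that keeps first-seen order.

```python
def ordered_union(*lists):
    """Union of several lists that keeps first-seen order (hypothetical helper)."""
    seen = set()
    result = []
    for items in lists:
        for item in items:
            if item not in seen:
                seen.add(item)
                result.append(item)
    return result


# Hypothetical on-call rotations where the input order is meaningful.
primary = ["carol", "alice"]
secondary = ["bob", "alice"]

# Set-based union: duplicates are removed, but the result has no defined
# order, so nothing downstream should depend on element positions.
as_set = set(primary) | set(secondary)

# List-based union: duplicates are removed and first-seen order is kept.
as_list = ordered_union(primary, secondary)

print(sorted(as_set))
print(as_list)
```

When the position of each element matters, as it would in a rotation, an order-preserving structure is the safer choice than a set.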

https://tratnayake.dev/a-misadventure-with-terraform-sets-pagerduty-schedules
Stop using IAM User Credentials with Terraform Cloud

I recently started using Terraform Cloud but discovered that the getting started tutorial which describes how to integrate it with Amazon Web Services (AWS) suggested using IAM user credentials. This is not ideal as these credentials are long-lived and can lead to security issues.

https://www.wolfe.id.au/2023/07/17/stop-using-iam-user-credentials-with-terraform-cloud
Secure Your AWS Environments with Terraform, Vault, and Veeam

https://julia.hashnode.dev/secure-your-aws-environments-with-terraform-vault-and-veeam
sre-checklist

A checklist for anyone practicing Site Reliability Engineering.

https://github.com/bregman-arie/sre-checklist
Why bother with SLI and SLO?

Is there really any value in setting service level indicators and objectives?

https://blog.alexewerlof.com/p/why-bother-with-sli-and-slo
Traffic Jams in the Cloud: Are Overloads Sabotaging Your Application's Reliability?

https://blog.fluxninja.com/blog/traffic-jams-in-the-cloud-unveiling-the-true-enemy-of-reliability
PostgreSQL: No More VACUUM, No More Bloat

PostgreSQL, a powerful open-source object-relational database system, has been lauded for its robustness, functionality, and flexibility. It is not without its challenges, though, and one of them is the notorious VACUUM process. OrioleDB, a novel engine designed for PostgreSQL, promises to eliminate the need for the resource-consuming VACUUM.

https://www.orioledata.com/blog/no-more-vacuum-in-postgresql
Identifying GCP’s Hidden Network Inter-Zone Egress Costs

Learn how to identify your Inter-Zone Egress costs in a few easy steps, using commonly available methods.

Ever wondered where those Inter-Zone Egress costs are coming from? Found yourself looking at GCP’s network pricing page many times to break it down? Me too. So I thought I might as well try to help clear things up.

https://www.doit.com/identifying-gcps-hidden-network-inter-zone-egress-costs