DevOps&SRE Library – Telegram

DevOps&SRE Library

18.4K subscribers

465 photos

4 videos

2 files

4.98K links

A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

Roskomnadzor (RKN): https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3

DevOps&SRE Library

SQL is No Excuse to Avoid DevOps

It sounds wild, but some teams still don't manage their database schema migrations in code. A great article on the topic by Thomas Limoncelli.

https://queue.acm.org/detail.cfm?id=3300018
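Here is a minimal Go sketch of what "migrations in code" boils down to. Numbered schema changes are applied in order, and the applied versions are recorded so that a rerun is a no-op. The `Migration` type, the `Store` interface, and the in-memory `memStore` are hypothetical stand-ins for a real database. They are not taken from the article.

```go
package main

import (
	"fmt"
	"sort"
)

// Migration is a hypothetical versioned schema change.
type Migration struct {
	Version int
	SQL     string
}

// Store is a hypothetical abstraction over the database: it executes SQL
// and records which migration versions have already been applied.
type Store interface {
	Exec(stmt string) error
	Applied() (map[int]bool, error)
	MarkApplied(version int) error
}

// Migrate applies every migration that has not been applied yet, in
// ascending version order, and returns the versions it applied.
// Running it again with the same migrations applies nothing.
func Migrate(s Store, migrations []Migration) ([]int, error) {
	sorted := append([]Migration(nil), migrations...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Version < sorted[j].Version })

	done, err := s.Applied()
	if err != nil {
		return nil, err
	}
	var applied []int
	for _, m := range sorted {
		if done[m.Version] {
			continue
		}
		if err := s.Exec(m.SQL); err != nil {
			return applied, fmt.Errorf("migration %d: %w", m.Version, err)
		}
		if err := s.MarkApplied(m.Version); err != nil {
			return applied, err
		}
		applied = append(applied, m.Version)
	}
	return applied, nil
}

// memStore is an in-memory stand-in for a real database (illustration only).
type memStore struct {
	executed []string
	applied  map[int]bool
}

func (m *memStore) Exec(stmt string) error {
	m.executed = append(m.executed, stmt)
	return nil
}

func (m *memStore) Applied() (map[int]bool, error) {
	out := make(map[int]bool, len(m.applied))
	for v := range m.applied {
		out[v] = true
	}
	return out, nil
}

func (m *memStore) MarkApplied(version int) error {
	m.applied[version] = true
	return nil
}

func main() {
	store := &memStore{applied: map[int]bool{}}
	migrations := []Migration{
		{Version: 2, SQL: "ALTER TABLE users ADD COLUMN email TEXT"},
		{Version: 1, SQL: "CREATE TABLE users (id INTEGER PRIMARY KEY)"},
	}
	first, _ := Migrate(store, migrations)
	second, _ := Migrate(store, migrations)
	fmt.Println(first, second) // [1 2] []
}
```

With a real database, run each migration and its version bookkeeping in one transaction. That way a failed step does not leave the schema half-applied.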

2.34K views · 02:19

DevOps&SRE Library

Stack Overflow: How We Do Monitoring - 2018 Edition

How monitoring works at Stack Overflow.

https://nickcraver.com/blog/2018/11/29/stack-overflow-how-we-do-monitoring

2.82K views · 02:22

DevOps&SRE Library

Designing resilient systems: Circuit Breakers or Retries?

A two-part series on two important concepts for building distributed, fault-tolerant systems: circuit breakers and retries.

https://engineering.grab.com/designing-resilient-systems-part-1
https://engineering.grab.com/designing-resilient-systems-part-2

2.12K views · 01:34

DevOps&SRE Library

Which Redis metrics to monitor, and how.

How to monitor Redis performance metrics:
https://www.datadoghq.com/blog/how-to-monitor-redis-performance-metrics

How to collect Redis metrics:
https://www.datadoghq.com/blog/how-to-collect-redis-metrics

Monitor Redis using Datadog:
https://www.datadoghq.com/blog/monitor-redis-using-datadog
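One metric covered in the Datadog posts is the cache hit rate. It is derived from `keyspace_hits` and `keyspace_misses` in the output of the Redis `INFO` command. Below is a minimal Go sketch that parses `INFO` text and computes the ratio. `ParseInfo` and `HitRate` are hypothetical helpers, and the sample input is hand-written rather than captured from a server.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// ParseInfo turns the text returned by Redis' INFO command ("key:value"
// lines, "# Section" headers, CRLF line endings) into a flat map.
func ParseInfo(info string) map[string]string {
	fields := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(info))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and section headers
		}
		if k, v, ok := strings.Cut(line, ":"); ok {
			fields[k] = v
		}
	}
	return fields
}

// HitRate computes keyspace_hits / (keyspace_hits + keyspace_misses).
// It returns 0 when no lookups have happened yet.
func HitRate(fields map[string]string) (float64, error) {
	hits, err := strconv.ParseFloat(fields["keyspace_hits"], 64)
	if err != nil {
		return 0, fmt.Errorf("keyspace_hits: %w", err)
	}
	misses, err := strconv.ParseFloat(fields["keyspace_misses"], 64)
	if err != nil {
		return 0, fmt.Errorf("keyspace_misses: %w", err)
	}
	if hits+misses == 0 {
		return 0, nil
	}
	return hits / (hits + misses), nil
}

func main() {
	// Hand-written sample in INFO format, not real server output.
	sample := "# Stats\r\nkeyspace_hits:750\r\nkeyspace_misses:250\r\nevicted_keys:0\r\n"
	rate, err := HitRate(ParseInfo(sample))
	fmt.Println(rate, err) // 0.75 <nil>
}
```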

2.22K views · 01:55

DevOps&SRE Library

Which NGINX metrics to monitor, and how.

How to monitor NGINX:
https://www.datadoghq.com/blog/how-to-monitor-nginx

How to collect NGINX metrics:
https://www.datadoghq.com/blog/how-to-collect-nginx-metrics

How to monitor NGINX with Datadog:
https://www.datadoghq.com/blog/how-to-monitor-nginx-with-datadog
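The basic source of metrics in open-source NGINX is the `stub_status` page. Below is a minimal Go sketch that parses its output and derives dropped connections as `accepts - handled`. It assumes the standard layout shown in the doc comment. `ParseStubStatus` and `StubStatus` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// StubStatus holds the counters exposed by ngx_http_stub_status_module.
type StubStatus struct {
	Active, Accepts, Handled, Requests, Reading, Writing, Waiting int64
}

// ParseStubStatus parses stub_status output of the form:
//
//	Active connections: 291
//	server accepts handled requests
//	 16630948 16630948 31070465
//	Reading: 6 Writing: 179 Waiting: 106
func ParseStubStatus(text string) (StubStatus, error) {
	f := strings.Fields(text)
	if len(f) != 16 || f[0] != "Active" || f[3] != "server" ||
		f[10] != "Reading:" || f[12] != "Writing:" || f[14] != "Waiting:" {
		return StubStatus{}, fmt.Errorf("unexpected stub_status format: %q", text)
	}
	// Token positions of the seven numbers in the layout above.
	var n [7]int64
	for i, pos := range []int{2, 7, 8, 9, 11, 13, 15} {
		v, err := strconv.ParseInt(f[pos], 10, 64)
		if err != nil {
			return StubStatus{}, fmt.Errorf("token %q: %w", f[pos], err)
		}
		n[i] = v
	}
	return StubStatus{n[0], n[1], n[2], n[3], n[4], n[5], n[6]}, nil
}

// Dropped is accepts - handled: connections NGINX accepted but did not
// handle, usually because a resource limit such as worker_connections was hit.
func (s StubStatus) Dropped() int64 { return s.Accepts - s.Handled }

func main() {
	// Hand-written sample in stub_status format, not captured from a server.
	sample := "Active connections: 291 \nserver accepts handled requests\n 16630948 16630945 31070465 \nReading: 6 Writing: 179 Waiting: 106 \n"
	s, err := ParseStubStatus(sample)
	fmt.Println(s.Active, s.Dropped(), err) // 291 3 <nil>
}
```

The counters are cumulative since NGINX started. For alerting you would usually look at how they change between scrapes rather than at the raw values.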

3.5K views · 01:58

DevOps&SRE Library

Automating Datacenter Operations at Dropbox

A great post on the Dropbox tech blog about how they provision switches and servers in their data centers.

https://blogs.dropbox.com/tech/2019/01/automating-datacenter-operations-at-dropbox

4.79K views · 23:49

DevOps&SRE Library

PagerDuty Incident Response

PagerDuty has made its internal incident response documentation publicly available.

https://response.pagerduty.com

4.72K views · 23:57

DevOps&SRE Library

nginx mirroring tips and tricks

A post on Alex Dzyoba's blog about a new nginx feature, the mirror module.

https://alex.dzyoba.com/blog/nginx-mirror

2.65K views · 00:06

DevOps&SRE Library

Replying to DevOps&SRE Library: «Go is a great language for automation, and DevOps engineers and SREs should take a look at it. Since February 2015, the SRE (site reliability engineering) team at Stack Overflow has switched from a mixture of Python and Bash to Go. Even though Go…»

Continuing the topic of Go for DevOps/SRE engineers: two interesting posts on the GopherSRE blog about switching from Python to Go.

Why I moved from Python to Go (Part I):
https://www.gophersre.com/2017/08/05/why-i-moved-from-python-to-go-part-i

Why I moved from Python to Go (Part II):
https://www.gophersre.com/2017/08/10/why-i-moved-from-python-to-go-part-ii

2.8K views · 11:34

DevOps&SRE Library

I've tried to pull together a list of useful materials for preparing for SRE interviews. It is based on my own modest experience interviewing for this role at several companies (GitLab, Google, Revolut, etc.).

Feedback is very welcome. Send comments and suggestions to @mxssl in a direct message, and star the repo on GitHub if you find the list useful.

https://github.com/mxssl/sre-interview-prep-guide

3.56K views · 13:18

DevOps&SRE Library

Grokking the System Design Interview

The best course for preparing for a system design interview. It covers the core cases, terms, and concepts you need to know to design systems, concisely and to the point.

Part 1:
https://coursehunter-club.net/t/educative-io-design-gurus-grokking-the-system-design-interview-part-1/579

Part 2:
https://coursehunter-club.net/t/educative-io-design-gurus-grokking-the-system-design-interview-part-2/580

Part 3:
https://coursehunter-club.net/t/educative-io-design-gurus-grokking-the-system-design-interview-part-3/581

Part 4:
https://coursehunter-club.net/t/educative-io-design-gurus-grokking-the-system-design-interview-part-4/583

Part 5:
https://coursehunter-club.net/t/educative-io-design-gurus-grokking-the-system-design-interview-part-5/584

6.98K views · 22:49

DevOps&SRE Library

Serverless Failure Stories

A collection of failure stories involving serverless infrastructure.

https://github.com/cristim/serverless-failure-stories

2.4K views · edited 23:16

DevOps&SRE Library

The cloud skills shortage and the unemployed army of the certified

A provocative post:

Why it’s so hard to find roles in cloud technology, while jobs go unfilled.

https://itnext.io/the-cloud-skills-shortage-and-the-unemployed-army-of-the-certified-bd405784cef1

2.92K views · edited 00:06

DevOps&SRE Library

In your opinion, which CI/CD platform is the best?

Anonymous Poll

GoCD - https://www.gocd.org

2%

Drone - https://drone.io

2%

Concourse CI - https://concourse-ci.org

50%

GitLab CI - https://about.gitlab.com

33%

Jenkins - https://jenkins.io

6%

TeamCity - https://www.jetbrains.com/teamcity

2%

CircleCI - https://circleci.com

2%

Travis CI - https://travis-ci.org

1%

Bamboo - https://www.atlassian.com/software/bamboo

2%

VSTS - https://visualstudio.microsoft.com/team-services

1.26K voters · 3.88K views · 15:52

DevOps&SRE Library

Architecting for Reliability

A series of posts on ways to improve application reliability.

Part 1 - Concepts:
https://medium.com/becloudy/architecting-for-reliability-part-1-concepts-17028343089

Part 2 - Resiliency and Availability Design Patterns for the Cloud:
https://medium.com/becloudy/architecting-for-reliability-part-2-resiliency-and-availability-design-patterns-for-the-cloud-cf7aaaed0df2

Part 3 - High Availability Architectures:
https://medium.com/becloudy/architecting-for-reliability-part-3-high-availability-architectures-8dfd0f87d25e
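A common starting point for reasoning about reliability is availability arithmetic for components in series and in parallel. If every component in a chain must be up, A = A1·A2·…·An. If any one of several redundant replicas is enough, A = 1 − (1−A1)(1−A2)…(1−An). Below is a minimal Go sketch that assumes independent failures. `Serial` and `Parallel` are hypothetical helpers, and the example numbers are illustrative, not taken from the posts.

```go
package main

import "fmt"

// Serial returns the availability of a chain in which every component
// must be up: A = A1 * A2 * ... * An.
func Serial(avail ...float64) float64 {
	a := 1.0
	for _, x := range avail {
		a *= x
	}
	return a
}

// Parallel returns the availability of redundant replicas where any one
// is enough, assuming independent failures: A = 1 - (1-A1)...(1-An).
func Parallel(avail ...float64) float64 {
	down := 1.0
	for _, x := range avail {
		down *= 1 - x
	}
	return 1 - down
}

func main() {
	// Illustrative numbers: a load balancer at 99.99% in front of two app
	// replicas at 99% each, backed by a single database at 99.9%.
	app := Parallel(0.99, 0.99)
	total := Serial(0.9999, app, 0.999)
	fmt.Printf("app tier %.4f, end to end %.4f\n", app, total)
	// app tier 0.9999, end to end 0.9988
}
```

Redundancy lifts the app tier from 99% to about 99.99%. The single 99.9% database then becomes the weakest link in the chain.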

4.97K views · 17:54

DevOps&SRE Library

Dapp / Werf

The folks at Flant rewrote Dapp from Ruby to Go and gave it the odd name Werf. Definitely worth a look: the team is serious about it and has put in a lot of work.

https://github.com/flant/werf

3.74K views · 01:11

DevOps&SRE Library

Awesome Prometheus alerts

A collection of ready-made Prometheus alerts. It is quite decent, but many of the thresholds are worth tuning to your own infrastructure and needs.

https://awesome-prometheus-alerts.grep.to

3.02K views · 15:52

DevOps&SRE Library

Site Reliability Engineering | Technostream

A decent lecture on SRE from Mail.ru Group's course «Designing High-Load Systems».

https://youtu.be/4VW4FGYHMPs

4.22K views · 03:46

DevOps&SRE Library

Prometheus Alert Testing utility

PAT lets you write unit tests for Prometheus alerts.

This approach is described in the SRE Workbook:

At Google, we test our monitoring and alerting using a domain-specific language that allows us to create synthetic time series. We then write assertions based upon the values in a derived time series, or the firing status and label presence of specific alerts.

https://github.com/kevinjqiu/pat

4.14K views · edited 08:08

DevOps&SRE Library

Colleagues point out that promtool, the native tool from the Prometheus developers, now also supports unit tests for alerts:

https://www.robustperception.io/unit-testing-rules-with-prometheus

3.84K views · 10:56