DevOps&SRE Library

kubernetes-best-practices

A cookbook of best practices for working with Kubernetes.

https://github.com/diegolnasc/kubernetes-best-practices

Prometheus Definitive Guide Part III - Prometheus Operator

https://dev.to/ninii72387534/prometheus-definitive-guide-part-iii-prometheus-operator-5338

Hello everyone!

We are Deutsche Telekom, Europe's largest telecommunications provider and one of the world's leading companies.

Our team on the NT Common Testing project is growing rapidly. The project's goal is to monitor the quality of Deutsche Telekom's network infrastructure. By joining the team, you will work with BSS and OSS specialists on automated integration testing of the new high-tech solutions DT is rolling out, carry out their validation and verification, and monitor the performance of network equipment and subscriber access management systems.

We currently have 3 open Network Automation Engineer positions on this project, and we welcome both Middle- and Senior-level specialists. You can find the full job description here: https://deutschetelekomitsolutions.ru/jobs/983/?sphrase_id=1465

We offer our employees an excellent benefits package:
• Salary from 150,000 rubles net (the upper limit is effectively uncapped);
• Voluntary health insurance (DMS) from the first month;
• Sports reimbursement;
• Company-paid training;
• Referral and welcome bonuses;
• Flexible hours and the option to work fully remotely.

We would be happy to talk with you and tell you more about the project and the company. If you are interested in the position and want to join our team, email [email protected] or message @Mlymar on Telegram.

Below: a time travelling resource monitoring tool

below was designed and developed by the resource control team at Facebook to view and record historical Linux system data.

https://developers.facebook.com/blog/post/2021/09/21/below-time-travelling-resource-monitoring-tool

code: https://github.com/facebookincubator/below

SLOs and why you should care

Ever wondered what all the fuss over Service Level Objectives (SLOs) is about? Let’s find out.

https://engineering.solarisbank.com/slos-and-why-you-should-care-136f80bf686e

kubermetrics

Kubermetrics is an open-source dev tool that provides Kubernetes cluster monitoring and data visualization in a simple, easy-to-understand user interface. It integrates both Prometheus and Grafana dashboards on a single page, allowing for customizable dashboards and alerts.

https://github.com/oslabs-beta/kubermetrics

What is expected in the SRE role? We analyzed 30 job postings to find out

https://spike.sh/blog/sre-role-2021-analysed-30-job-postings

Making Kubernetes Operations Easy with kubectl Plugins

https://martinheinz.dev/blog/58

Kube-fledged: Cache Container Images in Kubernetes

https://itnext.io/kube-fledged-cache-container-images-in-kubernetes-7880a00bab91

Distributed Tracing with Spring Cloud Jaeger

https://amrutprabhu.medium.com/distributed-tracing-with-spring-cloud-jaeger-1ce2bb9d8294

The Definitive Guide to Kubernetes in Production

https://www.weave.works/blog/the-definitive-guide-to-kubernetes-in-production

How to set up monitoring tools for Java application

https://wkrzywiec.medium.com/how-to-set-up-monitoring-tools-for-java-application-322d14c191e4

Are SSDs Really More Reliable Than Hard Drives?

https://www.backblaze.com/blog/are-ssds-really-more-reliable-than-hard-drives

Best practices for writing incident postmortems

https://www.datadoghq.com/blog/incident-postmortem-process-best-practices

peirates

Peirates, a Kubernetes penetration tool, enables an attacker to escalate privilege and pivot through a Kubernetes cluster. It automates known techniques to steal and collect service accounts, obtain further code execution, and gain control of the cluster.

https://github.com/inguardians/peirates

Custom Prometheus Metrics with Go

https://dev.to/metonymicsmokey/custom-prometheus-metrics-with-go-520n

youki

youki is an implementation of the OCI runtime-spec in Rust, similar to runc.

https://github.com/containers/youki

Reverse Proxy, HTTP Keep-Alive Timeout, and sporadic HTTP 502s

https://iximiuz.com/en/posts/reverse-proxy-http-keep-alive-and-502s

automated-cloud-advisor

Automated Cloud Advisor is an extensible tool that aims to facilitate cost optimization in AWS by collecting data on underutilized resources. It is also a great learning tool for new DevOps/Cloud engineers who want to start automating things in AWS.

https://github.com/disneystreaming/automated-cloud-advisor

The Speed of Time

How long does it take to read the time? How would you time time? These strange questions came to the fore back in 2014 when Netflix was switching services from CentOS Linux to Ubuntu, and I helped debug several weird performance issues including one I'll describe here.

https://www.brendangregg.com/blog/2021-09-26/the-speed-of-time.html
