DevOps & SRE notes
Helpful articles and tools for DevOps & SRE

WhatsApp: https://whatsapp.com/channel/0029Vb79nmmHVvTUnc4tfp2F

For paid consultation (RU/EN), contact: @tutunak


All ways to support the channel: https://telegra.ph/How-support-the-channel-02-19
☁️ AWS Podcast – The Official Podcast for Developers ☁️

Stay ahead with the latest AWS news, trends, and innovations! Join Simon Elisha and Hawn Nguyen-Loughren for regular updates, deep dives, new launches, and expert interviews covering storage, security, serverless, infrastructure, and more.

🚀 Explore Cutting-Edge Cloud Tech
🎙 Deep Dives & Expert Insights
🔧 For Developers, Architects & IT Pros

✨ Subscribe now and be at the forefront of cloud innovation! 📲

The article "Linux LKM Persistence" explores advanced techniques for maintaining persistence on Linux systems using loadable kernel modules (LKMs). It covers how malicious kernel modules can be loaded at boot time, focusing on the use of the systemd-modules-load service to install and hide a rootkit.

https://righteousit.com/2024/11/18/linux-lkm-persistence/
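
From the defender's side, the boot-time loading path mentioned above can be audited with a few lines of Python. This is a minimal sketch, not taken from the article: it assumes the standard modules-load.d directories read by systemd-modules-load, and the function names are hypothetical. A kernel-level rootkit may hide its own files from userspace, so an empty or clean-looking result is not proof of a clean system.

```python
from pathlib import Path

# Directories that systemd-modules-load reads *.conf files from (see modules-load.d(5)).
MODULES_LOAD_DIRS = (
    "/etc/modules-load.d",
    "/run/modules-load.d",
    "/usr/lib/modules-load.d",
)


def parse_modules_conf(text):
    """Return module names from one config file, skipping blank lines and comments."""
    modules = []
    for line in text.splitlines():
        line = line.strip()
        # Lines starting with '#' or ';' are comments in modules-load.d files.
        if not line or line.startswith(("#", ";")):
            continue
        modules.append(line)
    return modules


def boot_loaded_modules(dirs=MODULES_LOAD_DIRS):
    """Map each *.conf file found under dirs to the modules it requests at boot."""
    found = {}
    for directory in dirs:
        base = Path(directory)
        if not base.is_dir():
            continue
        for conf in sorted(base.glob("*.conf")):
            found[str(conf)] = parse_modules_conf(conf.read_text(errors="replace"))
    return found


if __name__ == "__main__":
    for conf, modules in boot_loaded_modules().items():
        print(conf, modules)
```

Comparing the output against a known-good baseline for the host is one way to spot an unexpected module entry.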

🚀 Join our community 💻

The article "Using Tracetest with OpenTelemetry for Trace-based Testing" explores the concept of trace-based testing in distributed systems and introduces tools to implement this approach. It delves into the basics of distributed tracing, compares it with traditional logging methods, and demonstrates how to use OpenTelemetry for instrumentation and Tracetest for creating and running trace-based tests.

https://tracetest.io/blog/trace-based-testing-with-opentelemetry-using-tracetest-with-opentelemetry

The article details Timescale's journey to optimize upsert performance on compressed hypertables in TimescaleDB, a PostgreSQL extension. It describes how the team addressed a customer's challenge with suboptimal upsert performance, ultimately achieving a 300x speed improvement by leveraging existing indexes for efficient conflict resolution during upserts on high-cardinality datasets.

https://www.timescale.com/blog/how-we-made-postgresql-upserts-300x-faster-on-compressed-data
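
For readers unfamiliar with the term, an upsert is an insert that falls back to an update when a unique index reports a conflict. The sketch below shows only that general pattern and not Timescale's optimization for compressed hypertables. It uses SQLite from the Python standard library as a stand-in for PostgreSQL (the `ON CONFLICT ... DO UPDATE` syntax is closely similar), and the table, index, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (device_id TEXT, ts INTEGER, value REAL)")
# The unique index is what the ON CONFLICT clause uses to detect an existing row.
conn.execute("CREATE UNIQUE INDEX metrics_device_ts ON metrics (device_id, ts)")


def upsert(conn, device_id, ts, value):
    """Insert a reading, or overwrite the value if (device_id, ts) already exists."""
    conn.execute(
        "INSERT INTO metrics (device_id, ts, value) VALUES (?, ?, ?) "
        "ON CONFLICT (device_id, ts) DO UPDATE SET value = excluded.value",
        (device_id, ts, value),
    )


upsert(conn, "sensor-1", 1000, 20.5)
upsert(conn, "sensor-1", 1000, 21.0)  # same key: updates instead of inserting
upsert(conn, "sensor-2", 1000, 19.0)

rows = conn.execute(
    "SELECT device_id, ts, value FROM metrics ORDER BY device_id"
).fetchall()
print(rows)
```

The second call hits the unique index and updates the existing row, so the table ends up with one row per `(device_id, ts)` pair.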

The article discusses setting up a self-hosted full-stack observability system for startups using open-source tools, emphasizing the importance of monitoring system performance as businesses scale. It outlines key components and tools such as OpenTelemetry, OpenSearch, Prometheus, and Grafana to achieve comprehensive system visibility without incurring high SaaS costs.

https://osuite.io/articles/full-stack-observability-self-hosted

☁️ AWS Morning Brief – AWS News with a Twist ☁️

Stay updated on the latest AWS news, sprinkled with snark! With more than 60 AWS posts daily, we cut through the noise to bring you the hidden gems, top community contributions, and the must-know updates, all summarized with wit and clarity.

🔍 Curated AWS News & Insights
🎙 No-Nonsense, Snarky Summaries
🚀 Stay Informed Without the Overload

✨ Subscribe now and get your AWS updates, minus the nonsense! 📲

The author explores the emerging trend of AI agents in the observability and monitoring space, discussing how these agents could change the way operational data is processed and used. The article highlights various startups developing AI-powered solutions for DevOps, incident response, and SRE tasks, while also addressing potential challenges such as data privacy concerns and the need for benchmarking to evaluate agent effectiveness.

https://monitoring2.substack.com/p/ai-agents-invade-observability

The article compares the performance characteristics of Classic and Quorum queues in RabbitMQ, highlighting their strengths and use cases. It presents benchmark results showing that Classic queues offer higher throughput and lower latency, making them suitable for high-performance applications, while Quorum queues provide better fault tolerance and durability at the cost of reduced performance, making them a better fit for mission-critical systems that require high availability.

https://dzone.com/articles/battle-of-the-rabbitmq-queues-performance-insights

Slack's engineering team details how they evolved their Chef infrastructure to manage tens of thousands of EC2 instances efficiently. The article explores the challenges faced and the solutions implemented as they moved from a single Chef stack to a sharded setup, improving reliability and deployment safety for a large and growing fleet.

https://slack.engineering/advancing-our-chef-infrastructure/

The blog post discusses the creation of SREBench, a Kubernetes task dataset designed to evaluate LLM performance in root cause analysis of Kubernetes issues. It details the challenges the Parity team faced in developing a reliable benchmark for their AI agent, which ultimately led to a synthetic dataset inspired by the MuSR murder mystery reasoning benchmark.

https://www.tryparity.com/blog/how-and-why-we-made-srebench-swebench-for-k8s
