CatOps
DevOps and other issues by Yurii Rochniak (@grem1in) - SRE @ Preply && Maksym Vlasov (@MaxymVlasov) - Engineer @ Star. Opinions are our own.

We do not post ads, including event announcements. Please do not bother us with such requests!
AI agents invade observability: snake oil or the future of SRE?

We went from Twitter's "measure everything" to Honeycomb's "monitor only what matters." Yet alert fatigue, convoluted dashboards, and garbage metrics are still an issue today.

Could AI solve this? We simply don't know yet. The linked article speculates on this topic: is AI in observability just another marketing trick, or something that could help engineers solve issues faster or, more importantly, prevent those issues altogether?

#ai #observability
My German teacher is also a volunteer. Together with his buddies, he constantly raises money for ongoing requests from the Ukrainian defenders.

You can support them via this Monobank Jar:

https://send.monobank.ua/jar/8jQSHW57kP

#donations #Ukraine
YouTube algorithms got me an interesting video titled: Microservices are Technical Debt. This is an interview with a principal engineer at DoorDash (a US food delivery service) and a reference to an old article in DoorDash’s blog: Future-proofing: How DoorDash Transitioned from a Code Monolith to a Microservice Architecture.

It's quite ironic that the original article's title contains the words “future-proofing”. Although the argument about monoliths vs microservices is old and boring, there are a few interesting insights in the interview. For example, microservices did help DoorDash move faster when COVID hit, something they most likely wouldn’t have been able to do without them. So, even with all the clumsiness they bring today, that decision was worth it at the time.

#architecture
Recently I was at a meetup where a guy from Grafana Labs presented Beyla, their new eBPF-based application instrumentation solution. It's an interesting concept that allows one to "instrument" an app without changing its code, by intercepting system calls using eBPF.

What can I say: it's a cool concept. Here, Netflix describes how they use eBPF to monitor noisy neighbors. Yet, in Netflix's case, it involves a lot of custom code, ofc.

#observability #ebpf
Today you may encounter mentions of a 9.9 CVE for Linux. Most likely, it's all about this one.

This CVE is related to CUPS - a printing service for Linux. So, if you don't print things, you can just uninstall or disable it on your Linux machine and move on with your day.

Anyway, this is an interesting read on its own, especially the story of how they found this vulnerability.

P.S. This news comes from our chat, btw.

#security
For today’s Donations Monday, I would like to remind you about a fundraiser for the International Legion of Ukraine.

This is a fundraiser for drones and other equipment, and it’s almost closed already. So, with your help we could close it today!

This is a great opportunity to accomplish something on Monday already 😅

Monobank Jar:

https://send.monobank.ua/jar/7282sCqqgy

#donations #Ukraine
Today I want to share 2 articles from a guy who knows a thing or two about SLOs - Alex Ewerlöf.

- Heterogeneous SLI vs Homogeneous SLI, in which he discusses the types of events used for SLIs and how to reason about them
- SLO: Elastic vs Datadog vs Grafana, in which he compares the SLO offerings of 3 major observability providers

#slo #observability
When using Elasticsearch for logs, you most likely create indices periodically and have a job to rotate them.

However, ES can also be used as a database, and in that case one should be more careful with the data.

Here is a neat how-to article on changing the field type in the index mapping without downtime.
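A common way to change a field type without downtime (not necessarily the exact steps from the linked article) is to create a new index with the corrected mapping, reindex into it, and then atomically move an alias from the old index to the new one. The sketch below only builds the list of REST calls, so it runs without a cluster. The index and alias names are hypothetical, and it assumes applications read and write exclusively through the alias, which currently points at the old index.

```python
from __future__ import annotations

from typing import Any


def plan_field_type_change(alias: str, old_index: str, new_index: str,
                           new_mappings: dict[str, Any]) -> list[tuple[str, str, dict[str, Any]]]:
    """Return the Elasticsearch REST calls for a reindex + alias swap, in order.

    Assumption: clients only ever talk to `alias`, never to concrete index names.
    """
    return [
        # 1. Create a new index whose mapping already has the corrected field type.
        ("PUT", f"/{new_index}", {"mappings": new_mappings}),
        # 2. Copy every document from the old index into the new one.
        ("POST", "/_reindex", {"source": {"index": old_index},
                               "dest": {"index": new_index}}),
        # 3. Repoint the alias in a single atomic _aliases request, so readers
        #    never see a moment where the alias resolves to nothing.
        ("POST", "/_aliases", {"actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]}),
    ]


if __name__ == "__main__":
    # Hypothetical example: make "price" a double.
    plan = plan_field_type_change(
        "products", "products_v1", "products_v2",
        {"properties": {"price": {"type": "double"}}},
    )
    for method, path, body in plan:
        print(method, path, body)
```

Documents written to the old index while _reindex is running won't be in the new one, so in practice you'd either pause writes or reindex the delta before swapping the alias.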

#elasticsearch
An interesting article was shared in our chat yesterday.

This is a summary of an analysis of AI's influence on code quality. Some excerpts:

> The data strongly correlates “using Copilot” with “mistake code” being pushed to the repository more frequently.

> The 17% decrease in “move” operations when compared to 2021 hints at the built-in trait of AI assistants to discourage code reuse. Instead of refactoring and working to DRY (“Don’t Repeat Yourself”) code, they offer a one-keystroke temptation to repeat existing code.

> Especially next to the decrease in “moved code,” the 11% increase in the proportion of duplicated code confirms the drop in overall code quality in 2023 when compared to 2021.

And my favorite one:

> In the absence of a CTO or VP of Engineering who actively schedules time to reduce “tech debt,” “copy/pasted code” often never gets consolidated into the appropriate component libraries.

Although I saw this even before AI.

#ai #programming
Kubernetes Pi Cluster release v1.9.

This might be interesting for you if you run Kubernetes clusters on Raspberry Pis. But what's also interesting about this particular release is that they state the reasons for migrating from ArgoCD to FluxCD for cluster bootstrap.

(saving you a click)

Main reasons for this migration:

- FluxCD native support of Helm.

  - ArgoCD does not use Helm to deploy applications; instead, the helm template command is used to generate the manifest files to be applied to the cluster. The engine Argo CD uses to apply manifests to the cluster is not always fully compatible with all possible Helm configurations (hooks, lookups, random passwords), causing out-of-sync situations.

  - FluxCD uses the helm command to deploy Helm charts, so charts installed this way support all Helm functions. It also eases debugging, because the helm CLI tool can be used to see installed packages and the configuration applied.

- Support for defining dependencies, and improved performance of the bootstrap process.

  - ArgoCD does not support defining application dependencies; only synchronization waves can be defined. Applications can be assigned to one of the synchronization waves, so some kind of bootstrap order can be specified. The problem with this approach is that a synchronization wave cannot start until the previous one has ended successfully, which makes the full process take longer.

  - FluxCD supports defining dependencies between applications, so the cluster can be bootstrapped in order. Each application starts its deployment as soon as all its dependencies are synchronized, reducing the time required for a full cluster deployment.

- No need to define extra configuration in the manifest files to fix never-ending ArgoCD out-of-sync issues. Due to Argo CD's drift-assessment logic, certain optional or server-assigned fields are marked as out-of-sync, and they have to be configured to be ignored during the sync process.

The cluster bootstrap process using an Ansible playbook has been updated to use FluxCD instead of ArgoCD.
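To get a feel for why dependencies beat sync waves, here is a toy model in Python (not how either tool is implemented). It assumes each app's sync wave equals its depth in the dependency graph (in Argo CD you assign waves by hand), the app names and deploy times are made up, and health checks and retries are ignored.

```python
# Hypothetical bootstrap graph: app -> (deploy time in seconds, dependencies).
APPS = {
    "cert-manager": (30, []),
    "longhorn": (120, []),
    "ingress-nginx": (40, ["cert-manager"]),
    "monitoring": (10, ["longhorn"]),
    "my-app": (20, ["ingress-nginx"]),
}


def wave_of(apps, name, memo):
    """Assumed sync-wave model: an app's wave is its depth in the dependency graph."""
    if name not in memo:
        deps = apps[name][1]
        memo[name] = 1 + max((wave_of(apps, d, memo) for d in deps), default=-1)
    return memo[name]


def total_time_with_waves(apps):
    """A wave starts only after the slowest app of the previous wave has finished."""
    memo = {}
    slowest_per_wave = {}
    for name, (duration, _) in apps.items():
        wave = wave_of(apps, name, memo)
        slowest_per_wave[wave] = max(slowest_per_wave.get(wave, 0), duration)
    return sum(slowest_per_wave.values())


def total_time_with_dependencies(apps):
    """Each app starts as soon as its own dependencies are ready."""
    finish = {}

    def finish_time(name):
        if name not in finish:
            duration, deps = apps[name]
            finish[name] = duration + max((finish_time(d) for d in deps), default=0)
        return finish[name]

    return max(finish_time(name) for name in apps)


if __name__ == "__main__":
    print("sync waves:  ", total_time_with_waves(APPS), "s")
    print("dependencies:", total_time_with_dependencies(APPS), "s")
```

With these invented numbers, the wave model takes 180 s because the slow longhorn deploy holds back ingress-nginx and my-app, while the dependency model finishes in 130 s.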

#kubernetes #argo #flux
Another amazing article from the chat - 6 Reasons You Don't Need an SRE Team.

This quote from there should be carved in stone and put upon every technical manager's desk:

> reliability is everyone's job.
> If you, or the engineering leads who work in your org, don't think so, then hiring a separate team to care about it isn't going to help.


There is also a complementary article, Oncall: An Equal-Opportunity Waste of Time, which is OK, but I enjoyed the first one much more!

#sre #culture
For today’s Donations Monday, I would like to remind you about my German tutor’s standing jar.

https://send.monobank.ua/jar/8jQSHW57kP

He’s also a volunteer and he uses this jar to address ongoing requests from the combatants.

These things may not be as appealing to the media as some other equipment, but that doesn’t make them any less important.

#donations #Ukraine
Today I stumbled upon an interesting project: Withmarble helps you learn computer science topics using interactive flash cards.

It also looks like it uses some LLM under the hood to generate certain answers, but this is just a guess.

In any case, the project is very raw: it has only a couple of cases and bugs on both mobile and desktop. For example, if you open a flash card, there is no way to close it and go back to the list.

Still, I think it's a nice idea for teaching folks computer science. Maybe some of you could take this idea and execute it better :D

#programming
Kubernetes on a High Traffic Environment: 3 Key Takeaways is a nice, brief article on concrete things that help in a high-load environment. These things are:

- NodeLocal DNS cache
- Peak EWMA algorithm for load balancing
- Multiple Ingress Controllers for different incoming traffic streams (if this applies to you).

This article also contains links to other articles, where you can learn more about each thing separately.
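Peak EWMA (the idea comes from Finagle and is also used by Linkerd) keeps a latency estimate per backend that jumps up immediately on a slow response but decays back gradually, and multiplies it by the number of in-flight requests. Below is a minimal sketch of that idea, not the implementation from any particular ingress controller. The decay constant tau, the millisecond units, and picking the cheapest backend out of all of them (real balancers typically compare two randomly chosen backends) are my assumptions.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    cost: float = 0.0   # peak-EWMA latency estimate, ms
    stamp: float = 0.0  # time of the last observation, s
    pending: int = 0    # requests currently in flight

    def observe(self, rtt: float, now: float, tau: float = 10.0) -> None:
        """Fold a new round-trip time into the estimate."""
        elapsed = max(now - self.stamp, 0.0)
        # The older the previous estimate, the less weight it keeps.
        w = math.exp(-elapsed / tau)
        if rtt > self.cost:
            # "Peak": a latency spike is adopted immediately.
            self.cost = rtt
        else:
            # Otherwise decay toward the new, lower sample.
            self.cost = self.cost * w + rtt * (1.0 - w)
        self.stamp = now

    def score(self) -> float:
        # Busy backends look more expensive even if they are usually fast.
        return self.cost * (self.pending + 1)


def pick(backends: list[Backend]) -> Backend:
    """Send the next request to the backend with the lowest score."""
    return min(backends, key=lambda b: b.score())


if __name__ == "__main__":
    a, b = Backend("a"), Backend("b")
    a.observe(20.0, now=0.0)
    b.observe(20.0, now=0.0)
    b.observe(500.0, now=1.0)  # b has a latency spike
    print(pick([a, b]).name)   # a
```

The asymmetry is the point: one slow response pushes traffic away from a backend right away, while recovery is only trusted gradually.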

#kubernetes
There is a slight disagreement between those who believe that AI is here to save the world from software developers with a job, and those who believe that this is just an advanced autocomplete.

This article provides some arguments for the latter point.

For me, first and foremost, it offers interesting insight into how people test new AI models.

P.S. If you are from the optimistic tribe, make sure to check out Den's video (in Ukrainian) about Cursor - an AI-powered editor.

#ai #programming
A friend of mine is raising funds for a van for her relative who is serving in the AFU right now.

The fundraiser is in Privat Bank, which doesn’t accept non-Ukrainian cards for whatever reason. However, there’s also PayPal.

Privat for Ukrainian cards: https://next.privat24.ua/send/dntp4

PayPal (worldwide): [email protected]

If you’re gonna use PayPal, please, put a comment that this is for a van, so it’s easier for her to distinguish between donations.

#donations #Ukraine
Many people know about resources in Kubernetes, because every second article talks about the importance of setting them correctly. Many people know that resources in Kubernetes are later translated into Linux cgroups, because this is a common interview question.

Yet how many people know exactly how resource requests and limits are translated into cgroups?
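Here is a rough sketch of the CPU side, assuming the conversions kubelet applies for cgroup v1 and the shares-to-weight mapping runc uses for cgroup v2 (constants as I understand them, worth double-checking against the source). Requests become cpu.shares, a relative weight that only matters under contention; limits become a CFS quota per 100 ms period. Memory limits map more directly, to memory.limit_in_bytes (v1) or memory.max (v2).

```python
from __future__ import annotations

# Assumed constants for the cgroup v1 CPU controller.
MIN_SHARES = 2
MAX_SHARES = 262_144
SHARES_PER_CPU = 1024
MILLI_CPU_PER_CPU = 1000
CFS_PERIOD_US = 100_000  # default cpu.cfs_period_us: 100 ms
MIN_QUOTA_US = 1_000


def request_to_shares(milli_cpu: int) -> int:
    """CPU request (millicores) -> cpu.shares: a relative weight, not a cap."""
    if milli_cpu == 0:
        return MIN_SHARES
    shares = milli_cpu * SHARES_PER_CPU // MILLI_CPU_PER_CPU
    return max(MIN_SHARES, min(shares, MAX_SHARES))


def limit_to_quota(milli_cpu: int, period_us: int = CFS_PERIOD_US) -> int | None:
    """CPU limit (millicores) -> CPU time allowed per period (us); None = no limit.

    On cgroup v1 this is cpu.cfs_quota_us; on v2 it goes into cpu.max as "quota period".
    """
    if milli_cpu == 0:
        return None
    return max(MIN_QUOTA_US, milli_cpu * period_us // MILLI_CPU_PER_CPU)


def shares_to_weight(shares: int) -> int:
    """cgroup v1 cpu.shares [2..262144] -> cgroup v2 cpu.weight [1..10000]."""
    return 1 + ((shares - 2) * 9999) // 262_142


if __name__ == "__main__":
    shares = request_to_shares(250)  # requests.cpu: 250m
    print(shares, shares_to_weight(shares), limit_to_quota(500))  # limits.cpu: 500m
```

So a container with a 250m CPU request and a 500m CPU limit would get 256 shares (a cpu.weight of 10 on cgroup v2) and at most 50 ms of CPU time per 100 ms period.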

#kubernetes #linux