Reddit DevOps
eBPF in Observability Systems

Over the past few years, eBPF seems to have really taken off. Being able to attach programs to the Linux kernel opens up a whole world of possibilities. As a DevOps Engineer, I have really loved deploying Cilium to inspect K8s network traffic.

As a technology, eBPF is a natural fit for observability systems, and it has already been adopted by a number of vendors. This is the first of a two-part article looking at some general patterns of eBPF in observability and then going on to look at its implementation in a number of products.

I'd be really interested to hear other people's thoughts on eBPF both in general as well as in observability platforms.

Thanks!

https://observability-360.com/article/ViewArticle?id=ebpf-in-observability






https://redd.it/1di514i
@r_devops
Why are you managing your infrastructure? What was the most challenging part?

In addition to on-premises, cloud providers count too (AWS, Azure, DigitalOcean, etc.).

In my case, it is because I wanted to save costs and have a means to deploy my application.

https://redd.it/1di4plq
@r_devops
Which Cloud Platform to Learn?

Hi Guys,

I'd like to know the best way for me to learn cloud. I've been studying AWS for a little while, doing projects on and off, but our company uses GCP. We have a RACI model where we are Responsible and Accountable for IaC for project deployments, IAM policies, and cloud bucket deployments.
But we barely use it, since our company doesn't really need it at the moment. Compute Engine deployment was our chance to do deployments, but for various reasons the company decided not to let our team handle those and to have the developers do it instead. I guess they are the DevOps team.

My question:

Should I learn AWS, where I have a free voucher for the CCP exam, where the market is larger (so I guess the opportunity is larger too), and where I already have a bit more knowledge than in GCP, though no work experience?

Or should I learn GCP, since the company is using it? (Note that the opportunity may never come for me, because our company doesn't really hire internally for these kinds of roles, i.e. DevOps/cloud engineer.)

https://redd.it/1di622l
@r_devops
Q Is this a useful ebook bundle for beginner devops?

Hello, I took a job as a sysadmin, and it turns out it's mostly DevOps. I'm learning as I go, but my understanding of what goes on behind the scenes is pretty limited, as I haven't done software engineering in years. I received an email for an O'Reilly ebook bundle whose titles mostly touch on the kinds of things I interact with, and I'm wondering what the DevOps community thinks of these books. Titles include:


Python for DevOps
Efficient Linux at the Command Line
Learning DevSecOps
Web Application Security
Terraform Up & Running
Practical Linux System Administration
Ansible Up & Running
Docker Up & Running
Linux Pocket Guide (Essential Commands)
FastAPI
Learning Modern Linux
Network Programmability & Automation
Kubernetes Up & Running
Learning Git
Learning GitHub Actions

https://redd.it/1di5kps
@r_devops
Too many manual PRs to get a change out, am I doing it wrong?

How do you handle CI/CD for your internal libraries? I have worked at 3 different companies and it's the same every time:

1. Make change, create a PR
2. Merge and generate artifact
3. Another PR to bump the artifact wherever used

Sometimes it takes a few more steps. I'm not sure whether the problem is the convoluted mess that the internal libraries are or the way CI/CD has been set up.

This doesn't necessarily have to be about internal libraries. Perhaps a more general problem statement is: how do you propagate newly generated artifact IDs? Is it worth automating the bumps so they are applied wherever necessary, or should bumps go through a PR like any other change?
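
One way to picture automating step 3 is a small script that rewrites the pinned version wherever the artifact is consumed, which CI could then commit directly or open as a PR. The sketch below is only illustrative and assumes a pip-style manifest with `name==version` pins; the function name `bump_pin` and the package name `internal-lib` are made up for the example.

```python
import re


def bump_pin(manifest_text: str, package: str, new_version: str) -> str:
    """Return manifest_text with `package==<old>` rewritten to `package==<new_version>`.

    Lines that pin other packages are left untouched.
    """
    # Match a whole line of the form "<package>==<anything without spaces>".
    pattern = re.compile(rf"^({re.escape(package)}==)\S+$", flags=re.MULTILINE)
    # A function replacement avoids backreference clashes with digits in the version.
    return pattern.sub(lambda m: m.group(1) + new_version, manifest_text)


# Example usage with a hypothetical manifest:
# bump_pin("requests==2.31.0\ninternal-lib==1.4.2\n", "internal-lib", "1.5.0")
```

Whether the result is auto-merged or still goes through review is a policy choice; the script only removes the manual edit.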

https://redd.it/1di9lz3
@r_devops
Android emulator in GitLab CI/CD

Hello, I am building a GitLab CI pipeline to test Android stuff (bear with me a little, as I am not a mobile dev and don't know a lot about the Android world yet). We are using AWS to host our own GitLab runners.
The pipeline should be able to run the Android SDK emulator, which seems to require hardware acceleration via KVM. I can use AWS bare metal instances, which provide KVM. Ideally I would like to use Docker to run the tasks, and that seems to be possible by mounting /dev/kvm (?)
I also need to decide between ARM and x86 machines.
The cheapest ARM ones are a lot cheaper than the cheapest x86 ones, so I would like to go with those (we currently wouldn't fully utilize the x86 bare metal machines), but I am not sure about performance.
Has any of you done something similar? What did you end up using, and how well is it performing? If there is a good way to do this that doesn't require bare metal machines, I am also very open to hearing about it.
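
For the Docker part, passing the device through generally means adding `--device /dev/kvm` to `docker run`. A minimal sketch, assuming the runner host exposes `/dev/kvm`; the helper name `emulator_docker_cmd` and the image name in the usage comment are placeholders, not a real image.

```python
import os


def emulator_docker_cmd(image: str, kvm_path: str = "/dev/kvm") -> list[str]:
    """Build a `docker run` argument list that passes the KVM device into the container."""
    # Fail early with a clear message if the host has no KVM device node.
    if not os.path.exists(kvm_path):
        raise RuntimeError(f"{kvm_path} not found; this host does not expose KVM")
    return ["docker", "run", "--rm", "--device", kvm_path, image]


# Example usage (placeholder image name):
# subprocess.run(emulator_docker_cmd("my-android-emulator-image"), check=True)
```

The existence check only confirms that the device node is present; whether the emulator performs acceptably on a given instance type still has to be measured.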

https://redd.it/1di4uju
@r_devops
mac

I have about 9 Mac nodes building Mac stuff.

I have my pipelines updating a log with their resource statuses, but is there a good lightweight platform for monitoring these things?

https://redd.it/1di9y3u
@r_devops
Graduated Computer Scientist needs help with Azure.

Hello. I have an AP degree in Computer Science, so my knowledge is limited. I have been granted AZ-900 and AZ-204 courses to get certified, but I feel like the courses are very lackluster. Is there anyone out there who can provide some material for me to study? Preferably for AZ-204, as I am a bit lost.

Thanks in advance

https://redd.it/1dilz1s
@r_devops
How does your company handle local administrator users?

This might be more for r/sysadmin, but I wanted to know how it's handled in development/operations teams at different orgs, where there may be a greater need for elevated permissions than in other departments.

A while ago I was tasked with removing all administrator rights. Instead of getting a paid solution, I was asked to come up with one using PowerShell/Python scripts, which I don't think is a great approach, as many things can go wrong.

If you use Windows on end devices, how do you handle local admin?

https://redd.it/1dinims
@r_devops
Comprehensive guide to Kubernetes Networking with packet walk in Public Cloud (GKE, Ingress, GCLB, Network Policies, etc.)

Hi everyone! Coming from a strong networking background, I was always a bit frustrated that packet walks were not a natural thing in Kubernetes. Information on the topic is scattered, so I decided to put together a comprehensive walkthrough with packet walks.

It follows an application flow from a user to a GKE-hosted app, covering common components like GCLB, Ingress, and Network Policies. I start with the basics and also show how to use the Client Intents CRD (from the Otterize OSS project) for easier Network Policy management.

Would love to hear your feedback and any questions you have!

https://otterize.com/blog/mastering-kubernetes-networking-otterize-s-journey-in-cloud-native-packet-management

https://redd.it/1dir9xb
@r_devops
Automated testing in C++

I want to set up automated testing in C++ that runs when I create a PR, but I can't find any good resources to learn about automated testing. How can I learn about automated testing with GitHub?

https://redd.it/1diraqt
@r_devops
AI tool to manage AWS infrastructure

Are you currently using any AI tool to manage AWS infrastructure? If yes, is the tool able to manage AWS infrastructure using plain English commands (e.g. list my VPCs, create a VPC, delete a VPC, etc.)?

Amazon Q currently supports querying AWS resources but does not work for deleting or provisioning them. I'm also aware of AI tools that can generate commands from a prompt for use with the AWS CLI, but that is not what I am looking for.

https://redd.it/1ditlru
@r_devops
New Release: Infisical PKI — Private CA & Certificate Management

Hi everyone! I wanted to bring attention to an awesome development that we've been working on at Infisical that's fresh out of the oven this week.

Infisical has released a new PKI product line including Private Certificate Authority (CA) and Certificate Management this week :)

With this release, it is now possible for you to create Private CA hierarchies and issue X.509 certificates for applications ranging from creating encrypted TLS communication channels to authenticating users, computers, and IoT devices.

With this first step into PKI tooling, you can now use Infisical not only for your secrets management needs but also for your internal PKI.

In the coming months, we’re excited to roll out more updates to this new product line including advanced alerting, comprehensive event bus / webhooks for lifecycle events, and much more.

You can read the fuller announcement for it here!

https://redd.it/1diw9gt
@r_devops
Monitoring and Alerting

Hello,

I am looking for best practices for monitoring and alerting in the DevOps world.

I am currently working as a NOC Operator/Analyst for my company that specializes in retail. I mainly oversee all servers (on-prem & cloud), UPS, switches and ISP circuits at our corporate office. I have alerts set in place if any issues occur along with triage/troubleshooting steps and RCAs created after the issue is resolved.

Recently, I was asked to assist our IT Ecom/Dev team in revising the monitoring and alerting system they currently have in place. Within a month, I was brought more or less up to speed, considering I have no DevOps experience. So far, I have been given a list of the alerts we have in place and where they occur on the data flow map, along with basic knowledge of how all services are linked to our middleware. I also know the order flow and the "Life of a SKU".

I've already noticed some gaps and areas in need of improvement. Here is a small list of what I plan to work on. Please let me know if I'm on the right track and if there's anything else I should look into.

- Missing descriptions in the alerts: Currently, some of the alerts have a basic "Failed to do XYZ" message but don't explain why, so the dev team and I have to spend time looking through logs to find out what went wrong. I believe we can add logic for specific error codes to specify the root cause.
- Missing triage/troubleshooting: If an alert comes in, there are no instructions on what to do, such as who to notify or what to check. I believe we need a process in place for each alert, especially SEV1s. It can be a status check on services and cloud instances plus a check of the logs for errors, followed by escalation to the correct team depending on the issue.
- Run a weekly report on common alerts: Review alerts that come in frequently to investigate trends and see if we can set up automation to resolve the issues, or preventive action so the error no longer occurs.
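
As a rough illustration of the first and third ideas, here is a minimal sketch that attaches a root-cause hint to an alert based on its error code and counts which alerts fire most often. Everything in it is assumed for the example: the alert fields `name` and `code`, the error codes, and the hint texts are hypothetical and would need to match whatever the real alert payloads contain.

```python
from collections import Counter

# Hypothetical mapping from error codes to root-cause hints.
ROOT_CAUSES = {
    "E401": "Credentials rejected by the upstream service",
    "E504": "Upstream service timed out",
}


def describe(alert: dict) -> str:
    """Return the alert name followed by a root-cause hint for its error code."""
    hint = ROOT_CAUSES.get(alert.get("code"), "Unknown cause; check the logs")
    return f"{alert['name']}: {hint}"


def most_frequent(alerts: list[dict], top: int = 3) -> list[tuple[str, int]]:
    """Return the `top` most common alert names with their counts."""
    return Counter(a["name"] for a in alerts).most_common(top)
```

Run over a week of alerts, `most_frequent` gives a starting list of candidates for automation or preventive fixes.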

Here are the tools I have access to:

Grafana
GCP (Pub/Sub) (Logs)
OpsGenie Alerts
OMS dashboard
WMS dashboard
Shopify

I was also wondering if there's a role that specializes in monitoring and alerts + investigations in the devops environment. I've been enjoying learning a lot about pipelines/workflows and would love to be that support system for our team so the Devs can focus on their sprints and less on the issues that randomly come up.

https://redd.it/1divjvh
@r_devops
OpenShift project help

Hi y’all, I’m working on an OpenShift project for school. Basically, it is to build a 3-tier architecture with 3 containers: one for DB2, one for the app itself, and one for the API.

I’m not sure how to start. I’ve already written the code for the app, but I have been stuck on the DB2 and API communication ever since.

Any ideas would be greatly appreciated.

Can provide link for the Git repo if needed

https://redd.it/1diyeov
@r_devops
Cloud for DevOps

I'm just getting started on my DevOps journey. Most DevOps courses on YouTube require AWS, but I am more familiar with GCP than with AWS (beginner level in GCP, cleared GCP ACE).

Is it possible to continue my journey with GCP, or is AWS a necessary requirement for learning DevOps and for future use?

https://redd.it/1div6em
@r_devops
Any learning tool where I can practice the concepts I've learned?

Hi all,
Just curious to know if there is any online tool/website where I can learn and practice at the same time. Switching from video tutorials to my Linux machine is a little distracting. I'm looking for places where this could be more engaging and efficient.

https://redd.it/1dj9u9v
@r_devops
PlatformCon 2024 Workshop: Deep Dive: Delivering and Managing an LLM Agent Application with KusionStack

YouTube: https://www.youtube.com/watch?v=ekYrvL27gv4

This workshop serves as an in-depth exploration of KusionStack and how it can be leveraged to deploy an LLM Agent application to the cloud. The demo consists of several stages that progressively carry out a story to deploy and manage an LLM Agent, each slightly more complex than the last.

This session is designed to provide a practical understanding of managing complex applications in modern cloud-native environments. Through real-world examples, we'll demonstrate KusionStack’s pivotal role in getting from intent to actual delivery, providing actionable insights for using KusionStack in a similar environment.

Prerequisites:
- A Kubernetes cluster (Minikube or Kind is fine too)
- AWS Account with AccessKey and SecretKey ready (Optional, for provisioning cloud resources)
- An OpenAI key (Optional, for testing the Agent Application)

Website: https://www.kusionstack.io/docs/
GitHub: https://github.com/KusionStack

https://redd.it/1djbbyw
@r_devops
Is your company's tech documentation also a mess?

Hey everyone, I work at a medium-sized tech company and our documentation is all over the place. We mainly use Confluence, but also GitLab READMEs and Slack channels. The information is spread out, and it's quite hard to find what you need. From reading this subreddit, it seems a lot of tech companies are in the same boat.

Has anyone used or built anything internally to help with this?

I'm thinking of something like a semantic search across Confluence, GitLab, and Slack channels. Does anybody know of something that can do this?

https://redd.it/1djd7yz
@r_devops
How do you roll out components to your clusters?

I have a vague idea and would like some help fleshing it out!

Say you need to upgrade Kong using Terraform (TF) and Flux, and you have one AKS cluster in each region.

Do you go into each cluster repo and upgrade one at a time?

or

Do you go to one repo and upgrade all of them simultaneously?

The problem is that there are many clusters. Is there an architecture where you could just upgrade all of them at once? What is the requirement for that: having applications running on infra that can handle it?
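
Between "one at a time" and "all at once" there is also a middle ground: upgrade one cluster first, then the rest in batches. The sketch below only shows the ordering logic under that assumption; the function name `rollout_waves`, its parameters, and the region names in the usage comment are hypothetical, and how each wave is actually applied (for example, which repo or branch Flux watches) is left open.

```python
def rollout_waves(
    clusters: list[str], canary_count: int = 1, wave_size: int = 2
) -> list[list[str]]:
    """Split clusters into an initial canary wave followed by fixed-size waves."""
    if canary_count < 0 or wave_size < 1:
        raise ValueError("canary_count must be >= 0 and wave_size must be >= 1")
    waves: list[list[str]] = []
    # The first wave is the small group that receives the upgrade before anyone else.
    canary = clusters[:canary_count]
    if canary:
        waves.append(canary)
    # The remaining clusters are upgraded in batches of wave_size.
    rest = clusters[canary_count:]
    for start in range(0, len(rest), wave_size):
        waves.append(rest[start:start + wave_size])
    return waves


# Example usage (hypothetical region names):
# rollout_waves(["westeurope", "eastus", "uksouth", "japaneast"])
```

Setting `canary_count=0` and a `wave_size` equal to the number of clusters degenerates to the "all at once" option, so the same ordering logic covers both extremes.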

https://redd.it/1djflbb
@r_devops