Reddit DevOps
270 subscribers
6 photos
31.1K links
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Need Guidance (Student)

Hey there, I am an engineering student with a strong interest in DevOps. I have started some lectures on AWS. Any guidance, however small, for getting started would be helpful.

https://redd.it/1eprn6m
@r_devops
Is there a book/course/podcast/tool-suite that helped you go from fundamental understanding to mastery?

I'm a software developer with 15 years of experience, entering the DevOps space.

https://redd.it/1epsrj2
@r_devops
How do you handle disaster recovery?

As a best practice, let's say.

And on that note, what does HA (high availability) mean to you?

https://redd.it/1epvu41
@r_devops
Distributed Tracing Weekend Project with Grafana Tempo

🌟 This weekend, I got curious about exploring distributed tracing in a microservice architecture using containers in my homelab. I decided to dive in and simulate an order-processing system with multiple services: order-service, inventory-service, payment-service, warehouse-service, and fraud-service.

🔍 One of the challenges with microservices is debugging when a single request traverses multiple services. It can be tricky to trace and understand what’s happening across the system. To tackle this, I implemented distributed tracing using OpenTelemetry, Grafana Tempo, and Grafana Loki. The setup is designed to provide a seamless way to view traces directly from logs, making it easier to debug and monitor the entire process.
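The repo relies on OpenTelemetry, which handles context propagation for you. To make the mechanism concrete, here is a stdlib-only Python sketch (not code from the repo) of W3C Trace Context propagation, the header format OpenTelemetry uses by default: each service reuses the incoming trace ID, mints its own span ID, and records the caller's span as its parent. The service names come from the post; the helper functions are hypothetical, and a real SDK also handles sampling, `tracestate`, and exporting spans to a backend such as Tempo.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    service: str
    trace_id: str             # shared by every span in one request
    span_id: str              # unique per span
    parent_id: Optional[str]  # span_id of the caller, None for the root span


def make_traceparent(span: Span, sampled: bool = True) -> str:
    """Build a W3C `traceparent` header: version-traceid-parentid-flags."""
    flags = "01" if sampled else "00"
    return f"00-{span.trace_id}-{span.span_id}-{flags}"


def parse_traceparent(header: str) -> tuple:
    """Return (trace_id, parent_span_id) from an incoming header."""
    version, trace_id, parent_id, _flags = header.split("-")
    if version != "00" or len(trace_id) != 32 or len(parent_id) != 16:
        raise ValueError(f"unsupported traceparent: {header!r}")
    return trace_id, parent_id


def start_span(service: str, incoming: Optional[str] = None) -> Span:
    """Continue the caller's trace if a header arrived, else start a new trace."""
    span_id = secrets.token_hex(8)  # 16 hex chars
    if incoming is None:
        return Span(service, secrets.token_hex(16), span_id, None)  # 32-hex trace id
    trace_id, parent_id = parse_traceparent(incoming)
    return Span(service, trace_id, span_id, parent_id)


def log_line(span: Span, message: str) -> str:
    """A log line carrying trace_id, so a log view can link to the matching trace."""
    return f"service={span.service} trace_id={span.trace_id} span_id={span.span_id} msg={message!r}"


# One order request fanning out to downstream services.
order = start_span("order-service")
inventory = start_span("inventory-service", make_traceparent(order))
payment = start_span("payment-service", make_traceparent(order))
fraud = start_span("fraud-service", make_traceparent(payment))
for span in (order, inventory, payment, fraud):
    print(log_line(span, "handled"))
```

Writing the trace ID into every log line is what makes a jump from a log entry to its trace possible.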

🚀 The project includes Docker Compose with auto-configuration, so you can easily spin it up and explore the architecture yourself. If you're interested, feel free to check out the repo, and don't forget to give it a ⭐️ if you find it useful!

GitHub repo:

https://github.com/ruanbekker/grafana-tempo-loki-tracing

https://redd.it/1epww6b
@r_devops
Risks of running 2nd Express server with health check port?

I have a simple Node.js/Express backend running on Ubuntu on AWS EC2, with free monitoring via UptimeRobot (UTR). I didn't want to leave the health check API exposed publicly, so I stood up a second Express instance in my server.js on a second port (4431) that serves only my health check route. I then locked down access to port 4431 via a Security Group so that only IP ranges owned by UTR (https://uptimerobot.com/help/locations) can reach it. It all works as intended: UTR can monitor successfully while the port and health check remain closed to the public. Just curious: are there any risks or critical tradeoffs with this approach? Something like "a second Express server drastically increases resource consumption", etc.?
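For illustration, here is the same two-listener pattern sketched with Python's standard library rather than Express (a swapped-in stack, not the poster's code): one listener serves the app and deliberately has no `/health` route, the other serves only `/health` and would sit behind the Security Group rule. Handler names, paths, and the `get_status` smoke-test helper are placeholders.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class JSONHandler(BaseHTTPRequestHandler):
    def send_json(self, status: int, body: dict) -> None:
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):  # keep the example quiet
        pass


class AppHandler(JSONHandler):
    """Public listener: serves the app, and has no /health route."""
    def do_GET(self):
        if self.path == "/":
            self.send_json(200, {"message": "hello"})
        else:
            self.send_json(404, {"error": "not found"})


class HealthHandler(JSONHandler):
    """Private listener: serves only /health; the firewall decides who reaches it."""
    def do_GET(self):
        if self.path == "/health":
            self.send_json(200, {"status": "ok"})
        else:
            self.send_json(404, {"error": "not found"})


def serve_in_background(server):
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def start_servers(host="127.0.0.1", app_port=0, health_port=0):
    """Port 0 lets the OS pick a free port; in production you would pass fixed ports."""
    app = serve_in_background(ThreadingHTTPServer((host, app_port), AppHandler))
    health = serve_in_background(ThreadingHTTPServer((host, health_port), HealthHandler))
    return app, health


def get_status(port, path, host="127.0.0.1"):
    """Smoke-test helper: return the HTTP status code for GET host:port/path."""
    try:
        with urllib.request.urlopen(f"http://{host}:{port}{path}", timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

In a real deployment you would call something like `start_servers(host="0.0.0.0", app_port=..., health_port=4431)` and keep the process alive. In Node, an extra `app.listen()` in the same process shares the existing event loop, so the added cost is usually one more listening socket rather than a second runtime.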

https://redd.it/1epy6cw
@r_devops
DevOps Testing Tools For 2024 Compared

The article discusses various testing tools commonly used in DevOps workflows. It gives an overview of the following popular tools for different types of testing (unit, integration, performance, security, monitoring) to help readers choose the right tools for their specific needs and integrate them: [9 Best DevOps Testing Tools For 2024](https://www.codium.ai/blog/best-devops-testing-tools/)

* QA Wolf
* k6
* Opkey
* Parasoft
* Typemock
* EMMA
* SimpleTest
* Tricentis Tosca
* AppVerify

https://redd.it/1eq4pyv
@r_devops
Pragmatic scaling of small self hosted CI runner fleet

Hi,

I'm managing the self-hosted CI infrastructure for a small software dev team. Mostly, I'm ensuring we have enough runners for the team's needs. We don't host apps; the runners just build code, run tests, etc. (in Docker, so the runners only need Docker installed).

We have a couple small servers, and what I had so far was a basic Linux distro plus a homemade script to have the runners up and running.

Now I'm facing two issues. First, we're moving from GitLab to GitHub, where a runner can only execute one job at a time: parallelizing a dozen jobs used to be a single parameter in a GitLab config file, but now I actually need to instantiate a dozen runners. Second, the team is growing and the CI does more and more, so I need to be able to add runners from time to time.

Now I'm looking for the most pragmatic way to scale this runner fleet up, given that I've never played with k8s, Proxmox, Ansible and the like. It should be easy to maintain and scale, and not too hard to set up.

I'm thinking about installing Proxmox on each node with 4 VMs, each running a runner (set up through a script), but managing all of this manually already feels hard.

What are the best options that are simple and not overkill, yet efficient, given that I don't know k8s yet and we don't have a cluster or anything like that?
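Since each GitHub Actions runner process handles one job at a time, scaling out mostly means installing N runners per host, each in its own directory with a unique name. A small, hypothetical helper that plans those runners and prints registration commands for `config.sh` (the script shipped in the actions/runner package) might look like this; the host names, the `/opt/actions-runners` layout, the `docker` label, and the `RUNNER_TOKEN` environment variable are all assumptions.

```python
from dataclasses import dataclass

RUNNER_ROOT = "/opt/actions-runners"  # hypothetical layout: one directory per runner


@dataclass(frozen=True)
class RunnerSpec:
    host: str
    index: int

    @property
    def name(self) -> str:
        # Runner names must be unique, so encode host and slot number.
        return f"{self.host}-runner-{self.index:02d}"

    @property
    def workdir(self) -> str:
        return f"{RUNNER_ROOT}/{self.name}"


def plan_runners(capacity: dict) -> list:
    """capacity maps host name -> how many concurrent jobs that host can take."""
    return [
        RunnerSpec(host, i)
        for host, count in sorted(capacity.items())
        for i in range(1, count + 1)
    ]


def config_command(spec: RunnerSpec, repo_url: str, labels=("docker",)) -> str:
    """Shell command that registers one runner non-interactively.

    RUNNER_TOKEN is assumed to hold a registration token obtained from GitHub.
    """
    return (
        f"cd {spec.workdir} && ./config.sh --unattended "
        f"--url {repo_url} --token \"$RUNNER_TOKEN\" "
        f"--name {spec.name} --labels {','.join(labels)}"
    )


# Two hypothetical nodes, four runners each; adding capacity is a one-line change.
for spec in plan_runners({"ci-node-1": 4, "ci-node-2": 4}):
    print(config_command(spec, "https://github.com/your-org/your-repo"))
```

Each generated directory still needs its own extracted copy of the runner package, typically installed as a service with `./svc.sh install`; keeping the capacity map in one place is what makes adding a node a small change.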

https://redd.it/1eq7sqy
@r_devops
HAProxy and hostPort on EKS

Hello everyone,

I have a situation at work where I need to change how we expose our sandboxed environments to clients. Our current infrastructure runs on EKS, and we provision pods to clients on demand using NodePort as the Service type, with one node in the cluster exposed publicly and acting as the entry point for client connections. We run this setup because all of the client connections are TCP-based, and the guy who designed the original infra obviously didn't anticipate the user base growing and eventually hitting the NodePort range limit on EKS (the default range, 30000-32767, allows only 2,768 ports to be used simultaneously).

Now, I am thinking about using the HAProxy controller and hostPort to map each client connection directly to its pod, but I have no idea whether or how that would work; it's just an initial idea. I would love to hear suggestions for solutions and/or pointers on how to start implementing my idea. There are only two constraints: all the application pods are TCP-based, and each client needs its own exclusive pod.
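The arithmetic behind the ceiling, plus a toy allocator that models any one-port-per-client design: a NodePort per client, or a hostPort per client on a single public entry node, both draw from a fixed pool, so concurrent clients can never exceed the pool size. The 30000-32767 range is the Kubernetes default; the class and method names are hypothetical.

```python
NODEPORT_MIN, NODEPORT_MAX = 30000, 32767  # Kubernetes' default Service NodePort range


def range_size(low: int, high: int) -> int:
    """Number of ports in an inclusive range."""
    return high - low + 1


class PortAllocator:
    """Hands out one port per client from a fixed range (a one-port-per-client design)."""

    def __init__(self, low: int = NODEPORT_MIN, high: int = NODEPORT_MAX):
        self._free = list(range(high, low - 1, -1))  # reversed so pop() yields the lowest port
        self._by_client = {}

    def allocate(self, client_id: str) -> int:
        if client_id in self._by_client:  # a returning client keeps its port
            return self._by_client[client_id]
        if not self._free:
            raise RuntimeError("port range exhausted: no port left for a new client")
        port = self._free.pop()
        self._by_client[client_id] = port
        return port

    def release(self, client_id: str) -> None:
        self._free.append(self._by_client.pop(client_id))

    def available(self) -> int:
        return len(self._free)


print(range_size(NODEPORT_MIN, NODEPORT_MAX))  # 2768
```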

https://redd.it/1eqayb4
@r_devops
Organizing & minimizing cloud costs in AWS

We're running a bunch of workloads on AWS in different accounts. Every now and then (usually when we have a big spike in expenses), we find ourselves trying to figure out where our main expenses are coming from, what kind of workloads are currently running and wasting money, and whether we have redundant workloads that we should get rid of.

In general, we are trying to constantly add tags to workloads and educate the team to add the relevant tags to any workloads they start (often developers starting EC2 machines, snapshots, Sagemaker pipelines, S3 buckets etc.).

Needless to say, sometimes people don't add tags at all, or don't add the appropriate ones. Sometimes people leave expensive instances running idle over the weekend, etc.

How do you guys handle monitoring your workloads (which asset belongs to which project) and expenses, keeping redundant workloads to a minimum, and generally maintaining good hygiene so that a lot of money isn't spent unnecessarily?
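A small audit script can help with the "people forget to tag" part. The sketch below assumes you already have an inventory of resources as records with a `ResourceARN` and a list of `Tags` key/value pairs (the shape used by AWS's Resource Groups Tagging API); how you collect that inventory, the required tag keys, and the sample data are all assumptions.

```python
from collections import defaultdict

REQUIRED_TAGS = ("project", "owner")  # hypothetical tagging policy


def tag_map(resource: dict) -> dict:
    return {t["Key"]: t["Value"] for t in resource.get("Tags", [])}


def audit(resources, required=REQUIRED_TAGS) -> dict:
    """ARN -> list of required tag keys that are missing or empty."""
    report = {}
    for res in resources:
        tags = tag_map(res)
        missing = [key for key in required if not tags.get(key, "").strip()]
        if missing:
            report[res["ResourceARN"]] = missing
    return report


def group_by_tag(resources, key: str, default: str = "untagged") -> dict:
    """Tag value -> ARNs, e.g. to see which project owns what."""
    groups = defaultdict(list)
    for res in resources:
        groups[tag_map(res).get(key, default)].append(res["ResourceARN"])
    return dict(groups)


SAMPLE = [  # made-up inventory; 111122223333 is a placeholder account ID
    {"ResourceARN": "arn:aws:ec2:eu-west-1:111122223333:instance/i-0aaa",
     "Tags": [{"Key": "project", "Value": "search"}, {"Key": "owner", "Value": "dana"}]},
    {"ResourceARN": "arn:aws:s3:::ml-scratch-bucket",
     "Tags": [{"Key": "project", "Value": "ml"}]},
    {"ResourceARN": "arn:aws:ec2:eu-west-1:111122223333:snapshot/snap-0bbb", "Tags": []},
]

for arn, missing in audit(SAMPLE).items():
    print(f"{arn}: missing {', '.join(missing)}")
```

Run on a schedule, a report like this can feed a chat message or a ticket, and grouping by `project` gives a rough ownership view to line up against the bill.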

https://redd.it/1eqf1h3
@r_devops
New to Devops - How do I find where our grafana instance is installed in our EKS cluster?

Good day, folks. I was tasked with troubleshooting a Grafana/Loki issue, but I don't know where to start. I looked at our console and tried to verify that the Loki data source connection was working, but it isn't: it can't call the resource. I was told that it worked at one point and then stopped working a few weeks ago. I didn't configure Grafana or Loki myself, so I don't know the details.

At this point I am just trying to find where the Grafana/Loki configuration is located. The lead for our section of the project is out sick. And even when he gets back, I hate asking him stuff like this because I get the sense that either 1. he feels I should already know it, or 2. he just hates being bothered. He's never said so, but his tone isn't really inviting lol.

I have been a systems admin for quite a while, and only this year got the opportunity to get deep into DevOps, so sorry if my responses aren't as educated as one would expect lol. Our environment seems very intricate, and it's not just me: a few other new hires with 10-15+ years in IT are also saying that the way these guys are getting us accustomed to this environment isn't optimal lol.
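Since the setup is unknown, one low-risk first step is to list every pod in the cluster and filter for Grafana or Loki by name or label, which tells you which namespace(s) to look in. Shown here as a small Python wrapper around `kubectl get pods --all-namespaces -o json`; it only reads, and the keyword matching is a heuristic, not a guarantee.

```python
import json
import subprocess

KEYWORDS = ("grafana", "loki")


def find_pods(pod_list: dict, keywords=KEYWORDS) -> list:
    """Return sorted (namespace, pod name) pairs whose name or labels mention a keyword."""
    hits = []
    for item in pod_list.get("items", []):
        meta = item.get("metadata", {})
        name = meta.get("name", "")
        labels = meta.get("labels") or {}
        haystack = " ".join([name, *labels.values()]).lower()
        if any(word in haystack for word in keywords):
            hits.append((meta.get("namespace", ""), name))
    return sorted(hits)


def pods_from_cluster() -> dict:
    """Assumes kubectl already points at the EKS cluster (e.g. via `aws eks update-kubeconfig`)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


# Usage against a real cluster:
#   for namespace, pod in find_pods(pods_from_cluster()):
#       print(namespace, pod)
```

Once you know the namespace, `kubectl get configmaps,secrets -n <namespace>` usually surfaces the configuration, and if the stack was installed with Helm (an assumption), `helm list -n <namespace>` shows the release whose values hold it.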


Thanks in advance.

https://redd.it/1eqgonz
@r_devops
Where to store Py script run as part of GH Actions Workflow?

I have a GitHub Actions workflow that orchestrates Terraform resources across a few different platforms. One step of the process runs a Python script that queries one of our platforms for a few key pieces of info and appends them to tfvars. Currently that script lives in the module root folder. This is part of a template that is cloned to create multiple services, so each repo has its own copy of the script. That's probably a bad practice: we'll have to update the script in each individual repo if we ever make changes. What is the best way to make this script available to the workflow as a single source of truth?
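Whichever home the script ends up in (for example, a shared repository that each workflow checks out via `actions/checkout`'s `repository` input, or a reusable workflow called through `workflow_call`), a single copy only works if nothing repo-specific is baked into it. Below is a hypothetical sketch of the tfvars-writing half of such a script, with the platform query left out since the post doesn't name the platform: it takes values and a file path as inputs and replaces existing keys instead of appending duplicates, so running it twice is harmless.

```python
import json
import re

ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_-]*)\s*=")


def render_tfvar(key: str, value) -> str:
    """Render one scalar (str, int, float, bool) as a tfvars assignment.

    The JSON encoding of these scalars is also valid HCL literal syntax.
    """
    return f"{key} = {json.dumps(value)}"


def merge_tfvars(existing: str, values: dict) -> str:
    """Replace keys that already exist and append new ones, so re-runs stay idempotent.

    Assumes each existing assignment fits on a single line.
    """
    out, seen = [], set()
    for line in existing.splitlines():
        match = ASSIGNMENT.match(line)
        if match and match.group(1) in values:
            key = match.group(1)
            out.append(render_tfvar(key, values[key]))
            seen.add(key)
        else:
            out.append(line)
    out.extend(render_tfvar(k, v) for k, v in values.items() if k not in seen)
    return "\n".join(out) + "\n"


def update_tfvars_file(path: str, values: dict) -> None:
    """Read-modify-write; a missing file is treated as empty."""
    try:
        with open(path, encoding="utf-8") as fh:
            existing = fh.read()
    except FileNotFoundError:
        existing = ""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(merge_tfvars(existing, values))
```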

https://redd.it/1eqgpp8
@r_devops
Sharing a Kubernetes + Azure Key Vault Integration Guide - Feedback Welcome!

Hey r/DevOps community,
I've been working on integrating Kubernetes with Azure Key Vault using OIDC, and I thought I'd share what I've learned in case it's helpful to others.
I've put together a detailed video guide (about 2 hours long) covering:

* Setting up a K3d cluster
* Establishing OIDC trust between Kubernetes and Azure Key Vault
* Implementing the External Secrets Operator
* Practical secret management in a production-like environment

Here's the link: https://youtu.be/JFJJWB7neIg?si=auHt3HF0wqZT5ZC7
I'm not here to promote myself, just to share knowledge. I've learned so much from this community, and I hope this can give something back.
If you do check it out, I'd be incredibly grateful for any feedback, corrections, or suggestions for improvement. There's always more to learn, and I'm sure many of you have tackled similar challenges in different (probably better) ways.
Some questions I'm particularly interested in:

* Have you faced any specific challenges with Kubernetes-Azure integration that aren't covered here?
* Are there any best practices or security considerations you think are crucial for this kind of setup?
* How do you handle secret management in your organizations?

Whether you watch the video or not, I'd love to hear your thoughts and experiences on this topic. Thanks for being such a great community for learning and sharing!

https://redd.it/1eqj2gl
@r_devops
EKS for dev teams

I got a task to build an EKS cluster for software developers. While the EKS setup is clear, I have a question: what would developers prefer for deploying their stuff (including observability, logs, etc.)? I am looking at tools like ArgoCD, but I've also heard some not-so-favorable comments about it. So I'd prefer pure pipelines, but in my opinion it's still nice to see “something” in the cluster.

https://redd.it/1eql1s9
@r_devops
How to make my app reachable from outside through URL

Hello, I applied to a free bootcamp that offers job opportunities, and they sent me a use case: create an API service with a health endpoint to check the app's health, then create a Dockerfile and push my app's image to Docker Hub. After that, set up Kubernetes with a rule that restarts the application if the health endpoint fails. I've done everything up to that point.

But they also want the app to be reachable through a URL. I can use a native load balancer or tools like NGINX.

I don't know how to do that, since I don't have a DNS name at hand.
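The "restart when the health endpoint fails" rule is what a Kubernetes `livenessProbe` provides: the kubelet probes the endpoint periodically and restarts the container after `failureThreshold` consecutive failures, with any success resetting the count. Below is a toy Python model of that bookkeeping (the class and function names are hypothetical, and timing details such as `initialDelaySeconds`, `periodSeconds`, and restart back-off are deliberately ignored), useful for reasoning about how many failed checks it takes before a restart.

```python
from dataclasses import dataclass


@dataclass
class LivenessModel:
    """Toy model of liveness-probe bookkeeping for one container."""
    failure_threshold: int = 3  # Kubernetes' default failureThreshold
    consecutive_failures: int = 0
    restarts: int = 0

    def observe(self, healthy: bool) -> bool:
        """Record one probe result; return True if it triggers a restart."""
        if healthy:
            self.consecutive_failures = 0  # any success resets the streak
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.restarts += 1
            self.consecutive_failures = 0  # the restarted container starts fresh
            return True
        return False


def simulate(results, failure_threshold=3):
    """Return (indices of probes that caused a restart, total restarts)."""
    model = LivenessModel(failure_threshold)
    restart_at = [i for i, ok in enumerate(results) if model.observe(ok)]
    return restart_at, model.restarts


# Two failures followed by a success do nothing; three in a row trigger a restart.
print(simulate([True, False, False, True, False, False, False]))
```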

I would be glad to hear your advice.
Thank you

https://redd.it/1eqniti
@r_devops
Using a work-provided sandbox for learning? At odds here...

I have access to an Azure and AWS sandbox provided by work. As a total newbie and junior, I need to upskill. I can see where I'm lacking: AWS, Azure, and cloud-based skills, which my employer knew.


I want to start using it by deploying sample websites to learn PKI, certificate installs, buying domains, etc. My concerns with that are: 1. Buying things (domains, etc.) is not cheap. 2. I was told it's for learning purposes and to kill off anything I won't be using (I was also planning to teach myself Terraform this way, to help destroy any infrastructure I create).

Is there anything inherently bad about learning how to do this? For one, I wouldn't be setting up a personal website or the like; I'd just like to learn how to deploy one. Since this account is being paid for, would I be better off mocking something we have in our DEV or test environment to learn from in my own sandbox?

I've never really had one, so I'm seeking some input on that.

https://redd.it/1eqpfxy
@r_devops
Created a Terraform config for faster Docker builds using a remote BuildKit instance

Hey folks, I know most of you use Docker in your CI/CD pipelines. Slow Docker builds are so annoying and frustrating—we’ve all been there!

I created this open-source repo, https://github.com/useblacksmith/remote-buildkit-terraform, which contains a Terraform config to quickly spin up and configure a remote BuildKit instance in AWS that caches Docker layers and substantially speeds up Docker builds.
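Why a persistent builder helps: layer caching is keyed on the chain of steps, so a step can be reused only if it and everything before it are unchanged, and ephemeral CI runners usually start with an empty local cache. Below is a deliberately simplified model in Python (not BuildKit's real cache-key algorithm, which is more involved); the step list and digests are illustrative.

```python
import hashlib


def step_key(parent_key: str, instruction: str, inputs_digest: str = "") -> str:
    """Cache key for one build step: depends on everything before it."""
    h = hashlib.sha256()
    for part in (parent_key, instruction, inputs_digest):
        h.update(part.encode())
        h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    return h.hexdigest()


def build(steps, cache: set) -> list:
    """Walk the steps, reporting CACHED or RUN for each; `cache` persists across builds."""
    key, report = "", []
    for instruction, inputs_digest in steps:
        key = step_key(key, instruction, inputs_digest)
        report.append("CACHED" if key in cache else "RUN")
        cache.add(key)
    return report


STEPS = [  # (instruction, digest of files it reads); digests are made up
    ("FROM python:3.12-slim", ""),
    ("COPY requirements.txt .", "req-v1"),
    ("RUN pip install -r requirements.txt", ""),
    ("COPY . .", "src-v1"),
]

shared_cache = set()  # stands in for a long-lived remote builder's cache
print(build(STEPS, shared_cache))  # first build: every step runs
print(build(STEPS[:3] + [("COPY . .", "src-v2")], shared_cache))  # code change: only the last step reruns
```

The same model shows why instruction order matters: changing `requirements.txt` invalidates the dependency install and everything after it, while a source-only change reuses the expensive install step.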

It is not perfect and wouldn’t work for large engineering teams, but it could really help many folks here.

Feel free to use it and let me know what you think.

https://redd.it/1eqr954
@r_devops
Do we have a good use case for k8s Jobs?

Hello,

We are looking at optimising our Kubernetes workloads. The clusters are hosted on AWS EKS.

For reference, here's a short overview of how a typical Java/Python service works in our cluster:

We use AWS Step Functions to create a message in SQS, and our pods constantly poll their respective queue. When there is a new message, the pod performs the task. We scale the pods based on queue size to handle higher traffic; for the HPA, we use the Zalando adapter for the metrics server (GitHub).

So far this works quite well. However, most of our services are triggered infrequently, which means we have a lot of pods running without doing anything.

To make better use of our resources, we are thinking about migrating some of these services from pods to jobs: a new message in a queue would trigger a Kubernetes Job (it looks like KEDA could be used for this), the service would perform its task, and then the Job would terminate.

Would this be a good use case for Kubernetes Jobs, or would you recommend looking at other approaches?
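A rough way to compare the two models: with a Deployment plus HPA, replicas follow the documented formula `ceil(currentReplicas * currentMetric / targetMetric)` but never drop below `minReplicas`, whereas a job-per-work model launches nothing when the queue is empty. The job arithmetic below is a simplified assumption, loosely inspired by how KEDA's ScaledJob sizes work, not KEDA's exact algorithm; it also ignores the HPA's tolerance band and stabilization window. Function and parameter names are hypothetical.

```python
import math


def jobs_to_launch(queue_length: int, running_jobs: int,
                   messages_per_job: int = 1, max_jobs: int = 10) -> int:
    """New Jobs to create so queued work is covered, capped at max_jobs in flight."""
    if messages_per_job < 1:
        raise ValueError("messages_per_job must be >= 1")
    wanted = min(math.ceil(queue_length / messages_per_job), max_jobs)
    return max(wanted - running_jobs, 0)


def hpa_desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                         min_replicas: int = 1, max_replicas: int = 10) -> int:
    """HPA formula: ceil(current * current_metric / target), clamped to [min, max]."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(desired, max_replicas))


# An empty queue: the Deployment still keeps min_replicas pods, the Job model launches nothing.
print(hpa_desired_replicas(current_replicas=2, current_metric=0, target_metric=5))  # 1
print(jobs_to_launch(queue_length=0, running_jobs=0))  # 0
```

One trade-off to weigh: with Jobs, each message pays container start-up time, so the approach tends to pay off for infrequent, longer-running tasks and less so for a steady stream of short ones.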

Thanks!

https://redd.it/1eqnxup
@r_devops
Should we use CI/CD for production?

Yesterday, my colleague told me he didn't think implementing CI/CD for the production environment was a good idea, since it could accidentally break something and get out of control. He suggested that we deploy to production manually. What do you guys think? Please let me know.

https://redd.it/1equmsf
@r_devops
Immutable VM image bakery companies?

What companies create hardened, immutable VM images? For containers/Docker images, Chainguard seems to be the front-runner. Do any companies focus on VM images?

https://redd.it/1eqv2sm
@r_devops