Distributed Tracing Weekend Project with Grafana Tempo
🌟 This weekend, I got curious about exploring distributed tracing in a microservices architecture using containers in my homelab. I decided to dive in and simulate an order processing system with multiple services: order-service, inventory-service, payment-service, warehouse-service, and fraud-service.
🔍 One of the challenges with microservices is debugging when a single request traverses multiple services. It can be tricky to trace and understand what’s happening across the system. To tackle this, I implemented distributed tracing using OpenTelemetry, Grafana Tempo, and Grafana Loki. The setup is designed to provide a seamless way to view traces directly from logs, making it easier to debug and monitor the entire process.
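To make the trace-to-logs jump concrete, here is a minimal Python sketch (not code from the repo) of the usual trick: attach the current trace ID to every log record so a Loki derived field can turn `trace_id=...` in a log line into a clickable Tempo trace link. In a real service the ID would come from the active OpenTelemetry span context; here a UUID stands in.

```python
import logging
import uuid

# Hypothetical sketch: tag every log line with a trace_id so a Loki
# "derived field" regex can link the line to the matching Tempo trace.
class TraceIdFilter(logging.Filter):
    def __init__(self, trace_id: str):
        super().__init__()
        self.trace_id = trace_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = self.trace_id  # attach the current trace context
        return True

def make_logger(trace_id: str) -> logging.Logger:
    logger = logging.getLogger("order-service")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
    )
    logger.addHandler(handler)
    logger.addFilter(TraceIdFilter(trace_id))
    return logger

if __name__ == "__main__":
    # Real code would read this from the active OTel span, not generate it.
    trace_id = uuid.uuid4().hex
    make_logger(trace_id).info("order received")
```

With OpenTelemetry's logging instrumentation the same effect comes for free, but the principle is identical: the trace ID travels in the log line, and Grafana stitches the two backends together.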
🚀 The project includes Docker Compose with auto-configuration, so you can easily spin it up and explore the architecture yourself. If you're interested, feel free to check out the repo, and don't forget to give it a ⭐️ if you find it useful!
GitHub Repo:
https://github.com/ruanbekker/grafana-tempo-loki-tracing
https://redd.it/1epww6b
@r_devops
Risks of running 2nd Express server with health check port?
I have a simple app running on a Node.js/Express backend on an Ubuntu AWS EC2 instance, with free monitoring from UptimeRobot (UTR). I decided I didn't want to leave the health check API exposed publicly, so I stood up a second Express instance in my server.js on a second port (4431) and configured it to host only my health check route. I then locked down access to port 4431 via a Security Group to only the IP ranges owned by UTR (https://uptimerobot.com/help/locations). It all works as intended: UTR can monitor successfully while the port and health check remain closed to the public. Just curious: are there any risks or critical tradeoffs with this approach? Something like "a second Express server drastically increases resource consumption", etc.?
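For a sense of what a second listener actually costs, here is a rough Python stand-in for the Express setup (the post uses Node.js; this is just the same pattern sketched with the standard library): two HTTP servers in one process, the second serving only the health route. It is just an extra socket and handler thread, so the resource overhead is tiny. Port numbers and handler names are illustrative.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sketch: one listener for the app, a second listener that serves only
# the health route. The second server is firewalled to the monitor's IPs
# at the Security Group level, exactly as in the post.

class AppHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"app")

    def log_message(self, *args):  # silence per-request logging
        pass

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass

def serve(port, handler):
    """Start an HTTP server on a background thread and return it."""
    srv = HTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv

if __name__ == "__main__":
    serve(8080, AppHandler)     # public app port
    serve(4431, HealthHandler)  # health-only port, locked to UTR's IP ranges
```

The second server shares the process, event loop (in Express's case), and memory of the first; the marginal cost is one listening socket, which is negligible next to the app itself.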
https://redd.it/1epy6cw
@r_devops
DevOps Testing Tools For 2024 Compared
The article discusses testing tools that are commonly used in DevOps workflows. It provides an overview of the following popular tools for different types of testing (unit, integration, performance, security, monitoring) to help readers choose the right testing tools for their specific needs and integrate them into their pipelines: [9 Best DevOps Testing Tools For 2024](https://www.codium.ai/blog/best-devops-testing-tools/)
* QA Wolf
* k6
* Opkey
* Parasoft
* Typemock
* EMMA
* SimpleTest
* Tricentis Tosca
* AppVerify
https://redd.it/1eq4pyv
@r_devops
Pragmatic scaling of a small self-hosted CI runner fleet
Hi,
I'm managing the self-hosted CI infrastructure for a small software dev team. Mostly, I'm ensuring we have enough runners for the team's needs. We don't host apps; the runners are just building code, running tests, etc. (in Docker, so the runners just need to have Docker).
We have a couple of small servers, and what I had so far was a basic Linux distro plus a homemade script to get the runners up and running.
Now I'm facing two issues: we're leaving GitLab for GitHub, where a runner can only execute one job at a time. So while the capability to parallelize a dozen jobs was just one simple param in a config file for GitLab, now I actually need to instantiate a dozen runners. Plus, the team is growing and the CI does more and more, so I need to be able to add runners from time to time.
Now I'm looking for the most pragmatic way to scale this runner fleet up, given that I've never played with k8s, Proxmox, Ansible and the like. It should be easy to maintain and scale, and not too hard to set up.
I'm thinking about putting Proxmox on each node with 4 VMs, each running a runner (set up through a script), but managing all this manually already feels hard.
What are the best options (simple and not overkill, yet efficient), given that I don't know k8s yet and we don't have a cluster or anything like that?
https://redd.it/1eq7sqy
@r_devops
HAProxy and hostport on EKS
Hello everyone,
I have a situation at work where I need to change how we expose our sandboxed environments to clients. Our current infrastructure runs on EKS, and we provision pods to clients on demand using NodePort as the service type, with one node in the cluster exposed publicly and acting as the entry point for client connections. We run this setup because all of the client connections are TCP-based, and the guy who designed the original infra obviously didn't put much thought into the user base growing or into the NodePort range limitation that we'd eventually run into (the default range 30000-32767 allows only 2,768 ports to be used simultaneously).
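As a sanity check on that ceiling: Kubernetes' default service node-port range (settable via the `--service-node-port-range` API server flag) is 30000-32767 inclusive, which works out to 2,768 ports cluster-wide. The snippet below is just that arithmetic.

```python
# Kubernetes' default service node-port range: 30000-32767, inclusive.
# (Configurable with the API server flag --service-node-port-range.)
DEFAULT_NODE_PORT_RANGE = range(30000, 32768)

print(len(DEFAULT_NODE_PORT_RANGE))  # → 2768
```

Because every NodePort service reserves its port on every node, this cap applies to the whole cluster, not per node, which is why the pool runs out as the client count grows.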
Now, I am thinking about using the HAProxy ingress controller and hostPort to map client connections directly to pods, but I have no idea how that would work; it's just an initial idea. I would love to hear some solution suggestions and/or pointers on how I would start implementing it. The only two constraints are that all the application pods are TCP-based and I need a dedicated pod for each client.
https://redd.it/1eqayb4
@r_devops
Organizing & minimizing cloud costs in AWS
We're running a bunch of workloads on AWS in different accounts. Every now and then (usually when we have a big spike in expenses), we find ourselves trying to figure out where our main expenses are coming from, what kind of workloads are currently running and wasting money, and whether we have redundant workloads that we should get rid of.
In general, we are constantly trying to add tags to workloads and educate the team to tag any workloads they start (often developers starting EC2 machines, snapshots, SageMaker pipelines, S3 buckets, etc.).
Needless to say, sometimes people don't add tags at all, or don't add the appropriate ones. Sometimes people forget about their expensive instances running idle over the weekend, etc.
How do you handle monitoring your workloads (which asset belongs to which project), tracking expenses, reducing redundant workloads to a minimum, and generally keeping good hygiene so that a lot of money isn't spent unnecessarily?
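One common building block for this is a scheduled job that flags resources missing the tags your cost reports depend on. Here is a minimal Python sketch; the required tag set and the sample records are assumptions, and in practice the records would come from the AWS Resource Groups Tagging API (e.g. boto3's `resourcegroupstaggingapi` `get_resources` paginator, whose responses have this `ResourceARN`/`Tags` shape).

```python
# Hypothetical tagging policy: every billable resource must carry these tags.
REQUIRED_TAGS = {"project", "owner"}

def untagged(resources):
    """Return the ARNs of resources missing at least one required tag."""
    missing = []
    for res in resources:
        tags = {t["Key"].lower() for t in res.get("Tags", [])}
        if not REQUIRED_TAGS <= tags:  # required set not covered
            missing.append(res["ResourceARN"])
    return missing

if __name__ == "__main__":
    # Sample records in the shape the Tagging API returns.
    sample = [
        {"ResourceARN": "arn:aws:ec2:eu-west-1:111122223333:instance/i-1",
         "Tags": [{"Key": "project", "Value": "ml"},
                  {"Key": "owner", "Value": "ana"}]},
        {"ResourceARN": "arn:aws:ec2:eu-west-1:111122223333:instance/i-2",
         "Tags": []},
    ]
    print(untagged(sample))  # the untagged i-2 instance is reported
```

Feeding the output into a Slack alert or a weekly report is usually enough social pressure to keep tag coverage from decaying.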
https://redd.it/1eqf1h3
@r_devops
New to DevOps - How do I find where our Grafana instance is installed in our EKS cluster?
Good day folks. I was tasked with troubleshooting a Grafana-Loki issue but I don't know where to start. I looked at our console and tried to verify that the Loki data connection was good to go, but it isn't: it can't reach the resource. I was told that it once worked and then stopped working a few weeks ago. I didn't configure Grafana or Loki myself, so I don't know the details.
At this point I am just trying to find where the Grafana/Loki configuration is located. The lead for our particular section of the project is out sick, so that's not an option right now. And even when he gets back, I hate asking him stuff like this because I get the notion that 1. he feels like I should know it already, or 2. he just hates being bothered. He never voiced this, but his tone isn't really inviting lol.
I have been a systems admin for quite a while, and just this year I got the opportunity to get deep into DevOps. So, sorry if my responses aren't as educated as one would expect lol. Our environment seems very intricate, and not only me but a few other new hires with 10-15 years in IT are saying the way these guys are going about getting us accustomed to this environment isn't optimal lol.
Thanks in advance.
https://redd.it/1eqgonz
@r_devops
Where to store a Python script run as part of a GH Actions workflow?
I have a GitHub Actions workflow which orchestrates Terraform resources across a few different platforms. One step of the process runs a Python script which queries one of our platforms for a few key pieces of info before appending them to tfvars. Currently that script lives in the module root folder. This is part of a template which is cloned to create multiple services, so each repo has its own copy of the script - probably a bad practice, since we'll have to update each individual script in each repo if we ever make changes. What is the best way to make this script available to the workflow as a single source of truth?
https://redd.it/1eqgpp8
@r_devops
Sharing a Kubernetes + Azure Key Vault Integration Guide - Feedback Welcome!
Hey r/DevOps community,
I've been working on integrating Kubernetes with Azure Key Vault using OIDC, and I thought I'd share what I've learned in case it's helpful to others.
I've put together a detailed video guide (about 2 hours long) covering:
* Setting up a K3d cluster
* Establishing OIDC trust between Kubernetes and Azure Key Vault
* Implementing the External Secrets Operator
* Practical secret management in a production-like environment
Here's the link: https://youtu.be/JFJJWB7neIg?si=auHt3HF0wqZT5ZC7
I'm not here to promote myself, just to share knowledge. I've learned so much from this community, and I hope this can give something back.
If you do check it out, I'd be incredibly grateful for any feedback, corrections, or suggestions for improvement. There's always more to learn, and I'm sure many of you have tackled similar challenges in different (probably better) ways.
Some questions I'm particularly interested in:
* Have you faced any specific challenges with Kubernetes-Azure integration that aren't covered here?
* Are there any best practices or security considerations you think are crucial for this kind of setup?
* How do you handle secret management in your organizations?
Whether you watch the video or not, I'd love to hear your thoughts and experiences on this topic. Thanks for being such a great community for learning and sharing!
https://redd.it/1eqj2gl
@r_devops
EKS for dev teams
I got a task to build an EKS cluster for software developers. While the EKS setup is clear, I have a question: what would developers prefer for deploying their stuff (including observability, logs, etc.)? I am looking at tools like ArgoCD, but I've also heard some not-so-favorable comments about it. So I prefer pure pipelines, but still being able to see "something" in the cluster is, in my opinion, nice.
https://redd.it/1eql1s9
@r_devops
How to make my app reachable from outside through a URL
Hello, I applied to a free bootcamp that offers job opportunities, and they sent me a use case. They want me to create an API service with a health endpoint to check the app's health, then create a Dockerfile and push my app to Docker Hub, and after that set up Kubernetes with a rule that restarts the application if the health endpoint fails. I did everything up to that point.
But they also want the app to be reachable through a URL. I can use a native load balancer or a tool like nginx.
I don't know how to do this, since I don't have a DNS domain at hand.
I would be glad to hear your advice.
Thank you
https://redd.it/1eqniti
@r_devops
Using a work-provided sandbox for learning? At odds here..
I have access to an employer-provided Azure and AWS sandbox. As a total newb and junior, I need to upskill, and I can see where I am lacking: AWS, Azure, and cloud-based skills in general, which my employer knew.
I want to start using it for deploying sample websites, learning PKI, learning certificate installs, buying domains, etc. My only concerns are: 1. buying things (domains, etc.) is not cheap, and 2. I was told it's for learning purposes and to kill off anything I won't be using (I was also planning to self-learn Terraform this way, to help destroy any infrastructure I create).
Is there anything inherently bad about learning this way? I would not be establishing a personal website or the like, but I would like to learn how to deploy one, since this account is being paid for. Or would I be better off mocking something we have in our dev or test environment in my own sandbox?
I've never really had one, so I'm seeking some input on that.
https://redd.it/1eqpfxy
@r_devops
Created a Terraform config for faster Docker builds using a remote BuildKit instance
Hey folks, I know most of you use Docker in your CI/CD pipelines. Slow Docker builds are so annoying and frustrating—we’ve all been there!
I created this open-source repo, https://github.com/useblacksmith/remote-buildkit-terraform, which contains a Terraform config to quickly spin up and configure a remote BuildKit instance in AWS that caches Docker layers and substantially speeds up Docker builds.
It is not perfect and wouldn't work for large engineering teams, but it could really help many folks here.
Feel free to use it and let me know what you think.
https://redd.it/1eqr954
@r_devops
Do we have a good use case for k8s Jobs?
Hello,
We are looking at optimising our Kubernetes workloads. The clusters are hosted on AWS EKS.
For reference, a small overview of how a typical Java/Python service works in our cluster:
We use AWS Step Functions to create a message in SQS, and our pods are constantly polling their respective queues. If there is a new message, a pod performs the task. Based on queue size we scale the pods to handle higher traffic. As the HPA metrics source we use the Zalando adapter for the metrics server (GitHub).
So far this works quite well. However, most of our services are not triggered often, which means we have a lot of pods just running without doing anything.
To make better use of our resources, we thought about migrating some of these services from long-running pods to Jobs. If a new message is sent to a queue, it would trigger a Kubernetes Job (it looks like KEDA could be used for this). The service would perform its task and then the Job would terminate.
Would this be a good use case for Kubernetes Jobs, or would you recommend looking at other approaches?
Thanks!
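The run-to-completion shape described above can be sketched in a few lines of Python: a worker that drains whatever is queued and then exits, which matches the contract of a Kubernetes Job (for instance one launched by a KEDA ScaledJob). The in-memory queue stands in for SQS; comments mark where the real SQS calls would go.

```python
import queue

# Sketch of a Job-style worker: instead of a long-lived pod polling SQS
# forever, the process drains whatever is queued and exits, letting the
# Job reach "Completed" and free its resources.

def drain(q, handler):
    """Process messages until the queue is empty; return how many were handled."""
    processed = 0
    while True:
        try:
            msg = q.get_nowait()   # with SQS: sqs.receive_message(...)
        except queue.Empty:
            return processed       # nothing left: let the Job complete
        handler(msg)
        processed += 1             # with SQS: delete_message after success

if __name__ == "__main__":
    q = queue.Queue()
    for i in range(3):
        q.put({"order_id": i})
    print(drain(q, lambda m: None))  # → 3
```

The trade-off versus always-on pods is pod startup latency per burst of messages, so this fits best for the rarely-triggered services mentioned above.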
https://redd.it/1eqnxup
@r_devops
Should we do CI/CD on production?
Yesterday, my colleague told me that he didn't think implementing CI/CD for the production environment was a good idea, since it could accidentally make something go wrong and get out of control. He suggested that we deploy to production manually. What do you guys think about it? Please let me know.
https://redd.it/1equmsf
@r_devops
argc - Top-tier utility/framework for creating shell scripts
https://github.com/sigoden/argc
I’m not the author. Whoever it is, they are a bloody legend!
Figured I would share it as it deserves way more love.
https://redd.it/1eqvgzw
@r_devops
Immutable VM image bakery companies?
What companies create hardened immutable VM images? For containers/Docker images, Chainguard seems to be the front runner. Do any companies focus on VM images?
https://redd.it/1eqv2sm
@r_devops
Attempting a Website Builder
Hey everyone. I'm attempting to build a website builder (targeting low-traffic sites).
My plan was to deploy a single VM initially and run multiple containers on it (backend, frontend, reverse proxy, certbot) for the app/builder, and have a service like Vercel/Netlify handle all the domains and deployment of users' websites.
But then I had the bright idea: what if I have a go at it myself and learn more DevOps on the way? What do I need to know to build the DevOps side of a website builder?
I thought at first I should run everything on a single VM to reduce costs initially and scale vertically, and worry about scaling horizontally across multiple VMs later (I know it's a single point of failure).
Am I crazy for even thinking a website builder can operate without Kubernetes?
Currently I have a CI/CD pipeline, with infrastructure managed by Terraform, and Ansible configuring my VM and pulling and running my Docker images.
Any direction or thoughts would help. I am fairly new to DevOps, so sorry if my explanations aren't clear.
Many thanks.
https://redd.it/1eqyhw6
@r_devops
ZFS or Ceph
Any use in learning ZFS or Ceph if I want to switch to a DevOps job?
https://redd.it/1er1wzs
@r_devops
See the cost of your Terraform in IntelliJ IDEs, as you develop it
Hi, my name is Owen and I recently started working at Infracost (YC W21 batch) (https://infracost.io). Infracost shows engineers how much their code changes will cost on the cloud before it gets deployed. For example, when an engineer changes a cloud resource (like an AWS virtual machine), Infracost posts a comment in CI/CD telling them "This change is going to increase your costs by 25% next month from $500/m to $625/m".
Previously, I was one of the founders of tfsec, the code security scanner; I quickly realised that identifying issues in your code (especially infrastructure code, i.e. Terraform) as soon as possible was the best defence. A lot of the principles of code scanning for security misconfigurations translate well to identifying cost impact. Many times, people are surprised by how cloud resources are priced and how expensive they can be. It is also really unfair that engineers are never given a ‘checkout screen’ when buying infrastructure, and then are blamed for breaking cloud budgets.
I believe engineers should have access to key information about cloud costs at the time of writing the code. So, I spent some time and built an Infracost plugin for the IntelliJ family of IDEs (https://plugins.jetbrains.com/plugin/24761-infracost).
With this plugin installed, as you develop your Terraform code, you will get the cost impact of your current project, and quickly see where the expensive resources are hiding in your code (just hit save & it will recalculate). Two main use cases I’m thinking of:
As you change resources, you can see the cost impact. For example, I increased the instance size from my Dev to Prod environment to handle prod-sized workloads, and I can see the increased costs.
Comparing costs: I can copy + paste blocks of code and see the cost impact of using different configuration options, like removing multi-AZ options from test environments etc. I can see I save a few thousand dollars per year that way immediately.
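To illustrate the kind of change the plugin prices, here is a hedged Terraform fragment. The instance types, variable name, and AMI ID are made-up examples, not Infracost output or code from the plugin:

```hcl
# Hypothetical example of a resource change a cost tool would price.
# Instance types and the AMI ID are illustrative placeholders.
variable "environment" {
  type    = string
  default = "dev"
}

resource "aws_instance" "app" {
  ami = "ami-0123456789abcdef0" # placeholder AMI ID

  # dev uses a small instance; prod bumps it up — the monthly cost
  # difference shows up as soon as the file is saved
  instance_type = var.environment == "prod" ? "m5.xlarge" : "t3.small"
}
```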
You can still use Infracost in GitHub/GitLab to automate the cost analysis in CI/CD, and check for best practices, and the IDE tools will help you spot the issues sooner.
I’d love to get your feedback on this. I want to know if it is helpful, what other cool features we can add, and how it can be improved. Also if you spot any issues or bugs, let me know!
Here is how to install it: https://plugins.jetbrains.com/plugin/24761-infracost
I've done a demo video to get you started too - https://www.youtube.com/watch?v=kgfkdmUNzEo
https://redd.it/1er7966
@r_devops
Take control over GitHub repositories through leaked secrets in artifacts
New research shows how organizations tend to embed secrets in GitHub Actions workflow artifacts, mainly GitHub tokens. While the GITHUB_TOKEN is invalidated as soon as the job is complete, it's still possible to catch the artifact upload and use the token to push code to the repository before the job finishes.
The issue was found in highly popular open-source projects owned by Google, Microsoft, AWS, Red Hat, Canonical (Ubuntu), OWASP, and others.
https://unit42.paloaltonetworks.com/github-repo-artifacts-leak-tokens/
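As a hedged illustration of the risky pattern described (not taken from the article's proofs of concept): `actions/checkout` persists the GITHUB_TOKEN in `.git/config` by default, so a workflow that uploads the whole workspace as an artifact ships the token along with it:

```yaml
# Hypothetical workflow showing the leak pattern described above.
name: build
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        # the default persist-credentials: true writes the GITHUB_TOKEN
        # into .git/config inside the workspace
      - run: make build
      - uses: actions/upload-artifact@v4
        with:
          name: workspace
          path: .   # uploads .git/config too — the token leaks with the artifact
```

A safer variant is to set `persist-credentials: false` on the checkout step, or to upload only the build output directory instead of the entire workspace.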
https://redd.it/1er8x0j
@r_devops
Unit 42
ArtiPACKED: Hacking Giants Through a Race Condition in GitHub Actions Artifacts
New research uncovers a potential attack vector on GitHub repositories, with leaked tokens leading to potential compromise of services.