Reddit DevOps – Telegram

Reddit DevOps

269 subscribers

3 photos

31K links

Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels


GitHub Actions analytics: what am I missing?

How are you actually tracking GitHub Actions costs across your org?

I've been working on a GitHub Actions analytics tool for the past year, and honestly, when GitHub rolled out their own metrics dashboard 6 months ago, I thought I was done for.

But after using GitHub's implementation for a while, it's pretty clear they built it for individual developers, not engineering managers trying to get org-wide visibility. The UX is clunky, and you can't easily compare teams or projects.

For those of you managing GitHub Actions at scale - what's been your experience? Are you struggling with the same issues, or have you found workarounds that actually work?

Some specific pain points I've heard:

No easy way to see which teams/repos are burning through your Actions budget
Can't create meaningful reports for leadership
Impossible to benchmark performance across different projects
Zero alerting when costs spike
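
To make the first and last pain points concrete, here is a minimal sketch of per-repo cost aggregation with a simple spike check. Everything in it is an assumption for illustration: the per-minute rates are placeholders, the `(repo, runner_os, minutes)` record shape is hypothetical, and a real setup would pull usage from your own billing export.

```python
from collections import defaultdict

# Placeholder per-minute rates in USD; check your own billing plan before relying on them.
RATE_PER_MINUTE = {"linux": 0.008, "windows": 0.016, "macos": 0.08}


def cost_by_repo(runs):
    """Sum estimated cost per repository from (repo, runner_os, minutes) records."""
    totals = defaultdict(float)
    for repo, runner_os, minutes in runs:
        totals[repo] += minutes * RATE_PER_MINUTE[runner_os]
    return dict(totals)


def spiking_repos(current, baseline, factor=2.0):
    """Return repos whose current cost exceeds `factor` times their baseline cost.

    A repo with no baseline entry is treated as baseline 0, so any spend flags it.
    """
    return sorted(
        repo for repo, cost in current.items()
        if cost > factor * baseline.get(repo, 0.0)
    )


# Hypothetical usage records for one billing period.
runs = [
    ("org/api", "linux", 1000),
    ("org/api", "macos", 100),
    ("org/web", "linux", 500),
]
current = cost_by_repo(runs)
alerts = spiking_repos(current, baseline={"org/api": 5.0, "org/web": 3.0})
```

The same totals can be grouped by team instead of repo if each record carries a team label.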

Currently working on octolense.com to tackle these problems, but curious what other approaches people are taking. Anyone found tools that actually solve the enterprise analytics gap?

https://redd.it/1luc2t4
@r_devops

Octolense - Operational intelligence for teams building on GitHub

Turn pull requests, workflows, and CI activity into clear signals—so you can spot issues faster and keep engineering moving.

8 views02:28

Setting up a Remote Development Machine for development

Hello everyone. I am fairly new to this, but I have been assigned to set up a Remote Development Machine (RDM) at my office (a software development company). The company wants to minimize laptop use within the office, as some employees' laptops don't have the computing power to deploy and test code. What they expect of the RDM is as follows:

* The RDM will be one main machine that all employees (around 10-12) can access simultaneously, with an account created for each of them on the machine in advance. If 10 is too many for one machine, we can have two separate RDMs with 5 users each.

* The RDM should (for now) be accessible only on the local network; making it public is not needed yet.

* Each employee will be assigned their own account on the RDM, so every employee can see ONLY their own files and folders.
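
The third requirement (each user sees only their own files) can be sketched as a permission check. This is a minimal sketch, assuming a Linux host where each account has its own home directory under a common base path; the function names are hypothetical, and it only covers file permissions, not resource limits or network access.

```python
import os
import stat


def is_private(path):
    """True if `path` grants no permissions to group or others (e.g. mode 0o700)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def lock_down(path):
    """Restrict a directory to its owner only (rwx------)."""
    os.chmod(path, 0o700)


def exposed_homes(base):
    """List directory names under `base` that group or other users can access."""
    return sorted(
        name for name in os.listdir(base)
        if os.path.isdir(os.path.join(base, name))
        and not is_private(os.path.join(base, name))
    )
```

Running `exposed_homes("/home")` as an administrator would list the accounts whose home directories still need `lock_down`.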

Now my question here is, is this achievable? I can't find an online source that has done it this way. The only source I could find that matched my requirements was this:
https://medium.com/@timatomlearning/building-a-fully-remote-development-environment-adafaf69adb7

https://medium.com/walmartglobaltech/remote-development-an-efficient-solution-to-the-time-consuming-local-build-process-e2e9e09720df (This just syncs the files between the host and the server, which is half of what I need)

Any help would be appreciated. I'm a bit stuck here

https://redd.it/1lugiq8
@r_devops

Building a Fully Remote Development Environment

Read about how we built a fully remote development environment at Atom Learning using cloud-hosted Kubernetes.

7 views05:28

What would be considered the best achievement to list on a CV for a DevOps intern role?

Hi everyone,
I’m currently preparing my CV for DevOps intern applications and I’m wondering — what kind of achievements or experience would actually stand out?

I’ve worked on a few personal projects with Docker, GitHub Actions, and basic CI/CD setups. But I’m not sure how to frame them as solid achievements. Could anyone share examples or tips on what recruiters/hiring managers look for at the intern level?

Thanks in advance!

https://redd.it/1lui32h
@r_devops

From the devops community on Reddit

Explore this post and more from the devops community

7 views07:28

Looking for recommendations on SMS and email providers with API and pay-as-you-go pricing

Hi everyone,

I’m developing a software app that needs to send automated SMS and email notifications to customers.

I’m looking for reliable SMS and email providers that:

* offer easy-to-use APIs
* support pay-as-you-go pricing
* provide delivery reports

What providers do you recommend? Any personal experience or advice would be really appreciated!

Thanks in advance!

https://redd.it/1lujeur
@r_devops

6 views09:28

Is "self-hosting" and "homelab" something I should mention in my CV/Resume

for DevOps/SRE/Platform/Cloud intern positions?

https://redd.it/1lujqly
@r_devops

5 views10:28

QA with security testing background looking to transition to DevSecOps

Hello,

I am a QA engineer with more than 11 years of experience in the software industry. I have acquired cybersecurity skills by doing pentesting for my employers and taking part in public bug bounties (but never professionally or with a security-related job title). I want to move into a DevSecOps role; my motive is purely financial, as I have reached the tipping point as a QA.
What should my transition plan/path be? Is there any certification you can recommend for this role specifically?

Below is what ChatGPT recommended, along with a plan to acquire the skills listed. Is this the right path and the right set of skills?

🧰 Key Responsibilities:

| Area | Responsibilities |
| --- | --- |
| CI/CD Security | Automate security scanning in pipelines (SAST, DAST, secrets detection, dependency scanning) |
| Cloud Security | Implement IAM best practices, manage cloud security policies (e.g., AWS IAM, KMS, GuardDuty) |
| Infrastructure as Code (IaC) | Secure Terraform/CloudFormation scripts using tools like Checkov, tfsec |
| Container/K8s Security | Harden Docker images, manage security in Kubernetes clusters |
| Secrets Management | Use tools like Vault, AWS Secrets Manager, or Sealed Secrets |
| Monitoring & Compliance | Implement runtime security, SIEM integration, compliance audits (e.g., CIS Benchmarks) |
| Security-as-Code | Apply policies using tools like OPA/Gatekeeper, Conftest |

🧠 Skills Required:

Strong scripting knowledge (Bash, Python, or similar)

Hands-on experience with CI/CD tools (GitHub Actions, GitLab, Jenkins)

Familiarity with cloud providers (AWS, Azure, GCP)

IaC experience (Terraform, Ansible, etc.)

Container tools: Docker, Kubernetes, Falco, Trivy

Security toolchains: Snyk, Anchore, Checkov, etc.

https://redd.it/1lukcdi
@r_devops

6 views11:28

Who is responsible for setting up and maintaining CI/CD pipelines in your org?

In my experience, setting up and maintaining CI/CD pipelines has typically been a joint effort between DevOps and Developers. But I’ve recently come across teams where QAs play a major role in owning and maintaining these pipelines.

We’re currently exploring how to structure this in our organisation, whether it should be Developers, DevOps or QAs who take ownership of the CI/CD process.

I’d love to hear how it works in your company. Also please comment what's working and what's not working with the current process.

https://redd.it/1lunc34
@r_devops

5 views13:28

Anyone else tried Bash 5.3 yet? Some actually useful improvements for once

Been testing Bash 5.3 in our staging environment and honestly didn't expect much, but there are some solid quality-of-life improvements that actually matter for day-to-day work.

The ones I'm finding most useful:

Better error messages - Parameter expansion errors actually tell you what's wrong now instead of just "bad substitution". Saved me 20 minutes of debugging yesterday.

Built-in microsecond timestamps - $EPOCHREALTIME gives you epoch time with microsecond precision (the variable has been available since Bash 5.0, so it is not specific to 5.3). Great for timing deployment steps without needing external tools.

Process substitution debugging - When complex pipelines break, it actually tells you which part failed. Game changer for troubleshooting.

Improved job control - The wait builtin can handle multiple PIDs properly now. Makes parallel deployment scripts way more reliable.

Faster tab completion - Noticeable improvement in directories with thousands of files.

The performance improvements are real too. Startup time and memory usage both improved, especially with large scripts.

Most of these solve actual problems I hit weekly in CI/CD pipelines and deployment automation. Not just theoretical improvements.

Has anyone else been testing it? Curious what other practical improvements people are finding.

Also wondering about compatibility - so far everything's been backward compatible but want to hear if anyone's hit issues.

Been documenting all my findings if anyone wants a deeper dive - happy to share here: https://medium.com/@heinancabouly/bash-5-3-is-here-the-shell-update-that-actually-matters-97433bc5556c?source=friends_link&sk=2f7a69f424f80e856716d256ca1ca3b9

https://redd.it/1luoqk3
@r_devops

Bash 5.3 is Here: The Shell Update That Actually Matters

Bash 5.3 brings genuinely useful improvements: read more to learn how to utilize!

4 views14:28

Creating customer specific builds out of a template that holds multiple repos

I hope the title makes sense. I only recently started working with Azure DevOps (pipeline)
Trying my best to make sense:

My infrastructure looks like this:

I have a product (`Banana!Supreme`) that is composed of 4 submodules:

- Banana.Vision @ 1a2b3c4d5e6f7g8h9i0j

- Banana.WPF @ a1b2c3d4e5f6a7b8c9d0

- Banana.Logging @ abcdef1234567890abcd

- Banana.License @ 123456abcdef7890abcd

Now, for each customer, I basically *rebrand the program*, so I might have:

- `Jackfruit!Supreme v1.0` using current module commits

- `Blueberry!Supreme v1.0` a week later, possibly using newer module commits

I want to:

- Lock in which submodule versions were used for a specific customer build (so I can rebuild it in the future).

What I am currently trying to build (sketched as a framework of thought):

```
SupremeBuilder/
├── Banana.Vision/           ⬅️ submodule
├── Banana.WPF/              ⬅️ submodule
├── Banana.Logging/          ⬅️ submodule
├── Banana.License/          ⬅️ submodule
├── customers/
│   ├── Jackfruit/
│   │   └── requirements.yml ⬅️ which module versions to use
│   └── Blueberry/
│       ├── requirements.yml
│       └── branding.config  ⬅️ optional: name, icons, colors
├── build.ps1                ⬅️ build script reading requirements
└── azure-pipelines.yml      ⬅️ pipeline entry
```

The requirements.yml locks in which submodules are used for the build and at which commit.

Example `requirements.yml`:

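```yaml
app_name: Jackfruit!Supreme
version: "1.0"  # quoted so YAML keeps it a string rather than a float
modules:
  # module name -> pinned commit (placeholder hashes from the post)
  Banana.Vision: 1a2b3c4d5e6f7g8h9i0j
  Banana.WPF: a1b2c3d4e5f6a7b8c9d0
  Banana.Logging: abcdef1234567890abcd
  Banana.License: 123456abcdef7890abcd
```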
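
One way the build script could consume such a lock file is sketched below. It assumes the file has already been parsed into a dictionary with a module-to-commit mapping (for example by a YAML library), that each submodule lives in a directory named after the module, and that commits are written as 7 to 40 hex characters. The function name and the validation rule are hypothetical, not part of Azure DevOps or git.

```python
import re

# Assumption: pinned commits are abbreviated or full hex SHAs.
HEX_SHA = re.compile(r"^[0-9a-f]{7,40}$")


def checkout_commands(lock):
    """Build one `git checkout` command per pinned submodule."""
    commands = []
    for module, commit in lock["modules"].items():
        if not HEX_SHA.match(commit):
            raise ValueError(f"{module}: {commit!r} is not a hex commit SHA")
        # -C runs git inside the submodule directory; --detach pins the exact commit.
        commands.append(["git", "-C", module, "checkout", "--detach", commit])
    return commands


# Parsed lock data. Banana.Vision is left out here because its placeholder
# hash in the post contains non-hex characters and would be rejected.
lock = {
    "app_name": "Jackfruit!Supreme",
    "version": "1.0",
    "modules": {
        "Banana.WPF": "a1b2c3d4e5f6a7b8c9d0",
        "Banana.Logging": "abcdef1234567890abcd",
        "Banana.License": "123456abcdef7890abcd",
    },
}
commands = checkout_commands(lock)
```

Each command list can be passed to `subprocess.run` by the build step; committing the lock file per customer is what makes a later rebuild reproducible.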

Is this even viable?
I wanna stay in Azure DevOps and work with .yaml.

Happy for any insight or examples

Similar reddit post by u/mike_testing:
[https://www.reddit.com/r/devops/comments/18eo4g5/how_do_you_handle_cicd_for_multiple_repos_that/](https://www.reddit.com/r/devops/comments/18eo4g5/how_do_you_handle_cicd_for_multiple_repos_that/)

edit: I keep writing versions instead of commits. Updated

https://redd.it/1lupz73
@r_devops

9 views15:28

Notificator Alertmanager GUI

Hello !

I had been using Karma as an alert viewer for Alertmanager for a while.

After running into a lot of trouble with the web UI, I decided to create my own project.

Notificator: a GUI for Alertmanager with sounds and notifications on your laptop!

Developed with Go.

Here is the GitHub repo. I hope you like it 😊

https://github.com/SoulKyu/notificator

https://redd.it/1lusprq
@r_devops

GitHub - SoulKyu/notificator: Notificator is a GUI for alertmanager with sounds and notifications

7 views16:28

We built this project to increase LLM throughput by 3x. Now it has been adopted by IBM in their LLM serving stack!

Hi guys, our team built this open source project, LMCache, to reduce repetitive computation in LLM inference so that systems can serve more people (3x more throughput in chat applications). It is used in IBM's open source LLM inference stack.

In LLM serving, the input is first computed into intermediate states called the KV cache, which is then used to generate answers. This data is relatively large (~1-2 GB for long context) and is often evicted when GPU memory runs short. When that happens and a user asks a follow-up question, the software has to recompute the same KV cache. LMCache is designed to avoid that by efficiently offloading KV caches to DRAM and disk and loading them back when needed. This is particularly helpful in multi-round QA settings, where context reuse is important but GPU memory is limited.
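
The offload-instead-of-recompute idea can be illustrated with a toy two-tier cache. This is only a sketch of the general technique, not LMCache's API: the class, its method names, and the use of plain dictionaries to stand in for GPU memory and DRAM/disk are all assumptions made for the example.

```python
from collections import OrderedDict


class TieredCache:
    """Toy two-tier cache: a small fast tier that spills evicted entries to a slow tier."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # stands in for GPU memory, kept in LRU order
        self.slow = {}             # stands in for DRAM or disk
        self.recomputes = 0        # how often the expensive path was taken

    def _admit(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value  # offload instead of discarding

    def get(self, key, compute):
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.slow:
            value = self.slow.pop(key)  # load back from the slow tier
        else:
            value = compute(key)        # expensive recomputation (prefill)
            self.recomputes += 1
        self._admit(key, value)
        return value


cache = TieredCache(fast_capacity=1)
for conversation in ["a", "b", "a"]:
    cache.get(conversation, compute=lambda k: f"kv-state-for-{k}")
```

With a fast tier that holds only one conversation, the follow-up on "a" is served from the slow tier, so only two computations happen for three requests.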

Ask us anything!

Github: https://github.com/LMCache/LMCache

https://redd.it/1luumz3
@r_devops

GitHub - LMCache/LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

10 views17:28

Very simple GitHub Action to detect changed files (with grep support, no dependencies)

I built a minimal GitHub composite action to detect which files have changed in a PR with no external dependencies, just plain Bash! Writing here to share a simple solution to something I commonly bump into.

Use case: trigger steps only when certain files change (e.g. *.py, *.json, etc.), without relying on third-party actions. Inspired by tj-actions/changed-files, but rebuilt from scratch after recent security concerns.

Below you will find important bits of the action, feel free to use, give feedback or ignore!
I explain more around it in my blog post

runs:
  using: composite
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0

    - id: changed-files
      shell: bash
      env:
        # Pass expressions through env so they are not interpolated into the script text.
        BASE_REF: ${{ github.event.pull_request.base.ref }}
        FILE_GREP: ${{ inputs.file-grep }}
      run: |
        git fetch origin "$BASE_REF"
        files=$(git diff --name-only "origin/$BASE_REF" HEAD)
        if [ -n "$FILE_GREP" ]; then
          files=$(echo "$files" | grep -E "$FILE_GREP" || true)
        fi
        echo "changed-files<<EOF" >> "$GITHUB_OUTPUT"
        echo "$files" >> "$GITHUB_OUTPUT"
        echo "EOF" >> "$GITHUB_OUTPUT"
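
For testing the filter logic outside a workflow, the same behaviour can be sketched in a few lines. The function name is hypothetical, and Python's `re` syntax is close to but not identical to `grep -E`, so treat this as an approximation of the action's filtering step rather than a drop-in replacement.

```python
import re


def filter_changed(files, pattern=""):
    """Keep paths matching `pattern`; an empty pattern keeps everything."""
    if not pattern:
        return list(files)
    regex = re.compile(pattern)
    # search() matches anywhere in the path, like grep without anchors.
    return [path for path in files if regex.search(path)]


changed = ["src/app.py", "README.md", "cfg/settings.json"]
selected = filter_changed(changed, r"\.(py|json)$")
```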

https://redd.it/1luv6fs
@r_devops

:octocat: Github action to retrieve all (added, copied, modified, deleted, renamed, type changed, unmerged, unknown) files and directories. - tj-actions/changed-files

8 views18:28

PagerDuty Pros/Cons

Our team is considering using PagerDuty (PD). How has it been for your team? Any issues? Alternatives?

https://redd.it/1luzfbu
@r_devops

10 views21:28

Why do providers only charge for egress + other networking questions

Hi!

I have a few networking questions. I have of course asked AI and searched around, but I cannot find concrete answers.

1. Why do cloud providers only charge for egress? Is it because the customer has already paid for the ingress via their ISP? Does the ISP (say, AT&T) pay for internet exchange routes in the area, or do ISPs usually just have their own lines everywhere around the country? [US]

2. How much egress do you think you can send out via your ISP before they cut you off for the month? When I have signed up with ISPs, they have usually just stated the speed (100 Mbps, for example) but nothing about egress.
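
For the second question, the theoretical ceiling is simple arithmetic. The sketch below assumes the quoted speed is in megabits per second, the link is saturated for a full 30-day month, and decimal units are used; real ISP fair-use limits are a policy matter and are not modelled here.

```python
def max_monthly_transfer_tb(link_mbps, days=30):
    """Upper bound on data moved in `days` at a sustained `link_mbps` (decimal units)."""
    bits = link_mbps * 1_000_000 * days * 24 * 60 * 60
    return bits / 8 / 1e12  # bits -> bytes -> terabytes


ceiling_tb = max_monthly_transfer_tb(100)  # about 32.4 TB for a 100 Mbps link
```

So a 100 Mbps line cannot physically move much more than roughly 32 TB in a month in one direction, whatever the contract says.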

https://redd.it/1lv2re5
@r_devops

10 views23:28

Does anyone choose devops? I somehow ended up as the only devops person in my team and can’t figure things out most of the time… when does it get better?

I feel lost. I am dealing with deploying old codebases. I know my way around AWS for the most part, but I feel like most of my deployments fail. I considered myself a somewhat good engineer before, when I was doing development work, but now I feel kinda dumb. My bosses seem to be happy with me, but idk what I'm doing most of the time. Things break all the time, and it takes me forever to fix them and figure out these stacks and technologies. Does this ever get better?

https://redd.it/1lv4sfe
@r_devops

5 views00:28

Wasps With Bazookas v2 - A Distributed http/https load testing system

# What the Heck is This?

Wasps With Bazookas is a distributed swarm-based load testing tool made up of two parts:

Hive: the central coordinator (think: command center)
Wasps: individual agents that generate HTTP/S traffic from wherever you deploy them

You can install wasps on as many machines as you want — across your LAN, across the world — and aim the swarm at any API or infrastructure you want to stress test.

It’s built to help you measure actual performance limits, find real bottlenecks, and uncover high-overhead services in your stack — without the testing tool becoming the bottleneck itself.

# Why I built it

As you can tell, I came up with the name as a nod to its inspiration, Bees with Machine Guns.

I spent months debugging performance bottlenecks in production systems. Every time I thought I found the issue, it turned out the load testing tool itself was the bottleneck, not my infrastructure.

This project actually started 6+ years ago as a Node.js wrapper around wrk, but that had limits. I eventually rewrote it entirely in Rust, ditched wrk, and built the load engine natively into the tool for better control and raw speed.

# What Makes This Special?

# The Hive Architecture

🏠 HIVE (Command Center)
↕️
🐝🐝🐝🐝🐝🐝🐝🐝
Wasp Army Spread Out Across the World (or not)
↕️
🎯 TARGET SERVER

Hive: Your command center that coordinates all wasps
Wasps: Individual load testing agents that do the heavy lifting
Distributed: Each wasp runs independently, maximizing throughput
Millions of RPS: Scale to millions of requests per second
Sub-microsecond Latency: Precise timing measurements
Real-time Reporting: Get results as they happen
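
To illustrate what a coordinator has to do with results from independent agents, here is a minimal sketch of merging per-agent reports into swarm-wide numbers. The report fields (`requests`, `seconds`, `mean_latency_ms`) and the function name are hypothetical and are not taken from the WaspsWithBazookas codebase.

```python
def merge_reports(reports):
    """Combine per-agent results into swarm-wide totals."""
    total_requests = sum(r["requests"] for r in reports)
    # Agents run in parallel, so wall-clock time is the longest run, not the sum.
    duration = max(r["seconds"] for r in reports)
    # Weight each agent's mean latency by how many requests it sent.
    weighted_latency = sum(r["requests"] * r["mean_latency_ms"] for r in reports)
    return {
        "requests": total_requests,
        "rps": total_requests / duration,
        "mean_latency_ms": weighted_latency / total_requests,
    }


summary = merge_reports([
    {"requests": 3000, "seconds": 10, "mean_latency_ms": 20.0},
    {"requests": 1000, "seconds": 10, "mean_latency_ms": 40.0},
])
```

Percentiles cannot be merged this way from per-agent summaries; they need the raw samples or mergeable histograms.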

I hope you enjoy WaspsWithBazookas! I frequently create open-source projects to simplify my life and, ideally, help others simplify theirs as well. Right now, the interface is quite basic, and there's plenty of room for improvement. I'm excited to share this project with the community in hopes that others will contribute and help enhance it further. Thanks for checking it out and I truly appreciate your support!

https://redd.it/1lv5r5q
@r_devops

GitHub - Phara0h/WaspsWithBazookas: Its like bees with machine guns but way more power

4 views01:28

Release cycles, ci/cd and branching strategies

For all mid sized companies out there with monolithic and legacy code, how do you release?

I work at a company where the release cycle is daily, with a confusing branching strategy (a combination of trunk-based and Gitflow). A release will often contain hotfixes and ready-to-deploy features. The release process has been tedious lately.

For now, we mainly have two main branches (apart from feature and bugfix branches). Code changes are first merged to dev after unit tests run (and QA tests if necessary). We then deploy the changes to an environment daily, run e2e tests, and create a PR to the release branch. If the PR is reviewed and all is well with the tests and the code exceptions, we merge the PR and deploy to staging, where we run e2e tests again, and then deploy to prod.

Is there a way to improve this process? I'm curious about the release cycle of big companies

https://redd.it/1lv6brv
@r_devops

3 views02:28

Advice Needed Robust PII Detection Directly in the Browser (WASM / JS)

Hi everyone,

I'm currently building a feature where we execute SQL queries using DuckDB-WASM directly in the user's browser. Before displaying or sending the results, I want to detect any potential PII (Personally Identifiable Information) and warn the user accordingly.

Current Goal:
- Run PII detection entirely on the client-side, without sending data to the server.
- Integrate seamlessly into existing confirmation dialogs to warn users if potential PII is detected.

Issue I'm facing:
My existing codebase is primarily Node.js/TypeScript. I initially attempted integrating Microsoft Presidio (Python library) via Pyodide in-browser, but this approach failed due to Presidio’s native dependencies and reliance on large spaCy models, making it impractical for browser usage.

Given this context (Node.js/TypeScript-based environment), how could I achieve robust, accurate, client-side PII detection directly in the browser?
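
One lightweight direction is pattern-based detection with a checksum to cut false positives. The sketch below is written in Python for brevity, but it uses only regular expressions and arithmetic, so the same logic should port to TypeScript and run in the browser. The patterns are simplified assumptions (email, US-style SSN, payment card with a Luhn check) and will miss names, addresses, and other free-text PII that model-based tools target.

```python
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_ok(number):
    """Validate a card-like digit string with the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_pii(text):
    """Return the set of PII categories that appear in `text`."""
    found = set()
    if EMAIL.search(text):
        found.add("email")
    if US_SSN.search(text):
        found.add("us_ssn")
    if any(luhn_ok(m.group()) for m in CARD.finditer(text)):
        found.add("card")
    return found
```

Running `detect_pii` over each cell of a query result and collecting the categories per column would be enough to drive a warning in a confirmation dialog.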

Thanks in advance for your advice!

https://redd.it/1lv72bs
@r_devops

5 views03:28

DataDog synthetics are the best but way overpriced. Made something better and free

After seeing DataDog Synthetics pricing, I built a distributed synthetic monitoring solution that we've been using internally for about a year. It's scalable, performant, and completely free.

Current features:

Distributed monitoring nodes
Multi-step browser checks
API monitoring
Custom assertions

Coming soon:

Email notifications (next few days)
Internal network synthetics
Additional integrations
Open sourcing most of the codebase

If you need synthetic monitoring but can't justify enterprise pricing, check it out: https://synthmon.io/

Would love feedback from the community on what features you'd find most useful.

https://redd.it/1lv8xlz
@r_devops

5 views04:28

Best way to continue moving into devops from helpdesk?

I’ve looked over some of the roadmaps, and I know I already have some of the knowledge, so I was curious what I have already done/what I should do to continue to move down the career path to get into devops. Below are some of the things I am considering as I am moving down this career path.

1) I have graduated about a year ago with a degree in computer science. During this time I was exposed to several coding languages including C, Java, and most importantly (in my opinion) python

2) I have an A+ certification and am almost finished studying for my network+

3) As stated in the title, I currently work in a helpdesk position. I have only been there about 4 months, but during that time I have been writing some basic PowerShell scripts to help automate tasks in Active Directory, and I've written one major script in Python that helps ticket creation go a bit smoother (nothing fancy, it's really just a way to format text, as a lot of what we do is copying and pasting information, but it works)

4) I currently have a homelab. A lot of what I do is based around docker containers that each run their own web application. I won’t pretend I am super familiar with docker but it is something I have used a decent amount

5) I have used SQL, as well as some NoSQL databases such as Neo4j. I've also hosted a SQL database on AWS, but that was a while ago and it would take me a while to do it again.

Is there anything else that I could do to further my knowledge? Any other certifications or intermediate career jumps I could make before landing a DevOps position? I'm a little bit lost, so any help would be appreciated

https://redd.it/1lvbncd
@r_devops

9 views06:28

My aws ubuntu instance status checks failed twice

I did not set up any CloudWatch restarts. Last week, all of a sudden, my AWS instance's status checks failed.
After restarting the instance, it started working again.

When I then checked the logs, I found this:

```
amazon-ssm-agent[405]: ... dial tcp 169.254.169.254:80: connect: network is unreachable
systemd-networkd-wait-online: Timeout occurred while waiting for network connectivity
```

It was working fine after that. Then last night the same instance failed again. This time the errors were:
```
Jul 8 15:36:25 systemd-networkd[352]: ens5: Could not set DHCPv4 address: Connection timed out
Jul 8 15:36:25 systemd-networkd[352]: ens5: Failed
```

This is the command i used to get the logs:

grep -iE "oom|panic|killed process|segfault|unreachable|network|link down|i/o error|xfs|ext4|nvme" /var/log/syslog | tail -n 100
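
To see which of those keywords dominate before a failure, the same pattern can be tallied per keyword. This is a small sketch using the exact alternation from the command above; the function name is hypothetical, and the lines would come from reading /var/log/syslog or an equivalent log export.

```python
import re
from collections import Counter

PATTERN = re.compile(
    r"oom|panic|killed process|segfault|unreachable|network|link down|i/o error|xfs|ext4|nvme",
    re.IGNORECASE,
)


def count_hits(lines):
    """Count how often each keyword appears across the given log lines."""
    hits = Counter()
    for line in lines:
        for match in PATTERN.finditer(line):
            hits[match.group().lower()] += 1
    return hits


sample = ["ens5: Link down", "connect: network is unreachable"]
hits = count_hits(sample)
```

If network-related keywords dominate and storage or memory keywords are absent, that points toward the interface or DHCP lease rather than disk or OOM problems.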

Why is this happening?

https://redd.it/1lvbqq3
@r_devops

6 views07:28