Reddit DevOps
266 subscribers
30.9K links
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
DevSecOps / AI CTF today - Ctf.punksecurity.co.uk

Our CTF runs today, with entry-level and difficult challenges across DevSecOps and AI. It's free to play, with some prizes for the best teams.

CTFs are small, competitive, puzzle-based games designed to expose you to different tech and make you think in different ways. In our case, it’s CI/CD attacks and AI prompt injection attacks :)

https://ctf.punksecurity.co.uk



https://redd.it/1keev01
@r_devops
What would you be willing to pay for at your company?

Over the years, we’ve seen several licensing dramas and ongoing debates even on this sub — the latest being Redis becoming open source again.

Someone once said: “I'm fine with companies making money from software” — and I’d say that’s the bare minimum.

But the real question is: what would your company actually be willing to pay for? Just compute power? Services? Or even open source software?

If it's the latter: what are you looking for? Suppose a piece of software simply works, has decent documentation, and no major feature gaps — would you still be willing to support it financially?

How do you evaluate offerings that charge for packaging and delivery, like Linkerd or Chainguard? This is what I'm currently pursuing: releasing and packaging the latest builds, which you can try and test freely. Since you would never go to production with software that isn't version-pinned, I can offer stable, version-pinned releases (always built from upstream, no forks) with an SBOM, a detailed changelog, and upgrade instructions where needed.

https://redd.it/1keei1p
@r_devops
Canary like deployments for Custom Resources?

Why is there no canary-like deployment orchestrator for Custom Resources with quality gate analysis?

AFAIK, Flagger, Keptn (which has some maintenance problems), and Argo Rollouts are all tightly bound to vanilla K8s resources and, in general, to Ingress. But what if I want to deploy a Custom Resource, check metrics, run some custom action, and eventually promote "the deployment"? Of course, I know what canary releases and traffic shifting are.

For example, how are you versioning and deploying Workflows for batch operations? I want to test a new version by using it for 10% of workloads, then promote it incrementally based on a quality gate check (Prometheus metrics in this case).
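A minimal sketch of the incremental promotion loop described above, not tied to Flagger, Keptn, or Argo Rollouts. `set_weight` and `error_rate` are hypothetical callbacks: the first would route N% of workloads to the new Workflow version, the second might wrap a Prometheus query. The step sizes and the 1% error threshold are assumptions for illustration.

```python
from typing import Callable, Sequence


def progressive_promote(
    set_weight: Callable[[int], None],  # hypothetical: send N% of workloads to the new version
    error_rate: Callable[[], float],    # hypothetical: e.g. wraps a Prometheus query
    steps: Sequence[int] = (10, 25, 50, 100),
    max_error_rate: float = 0.01,
) -> bool:
    """Raise the new version's share step by step; roll back if the quality gate fails."""
    for weight in steps:
        set_weight(weight)
        # Quality gate: compare the observed metric with the allowed threshold.
        if error_rate() > max_error_rate:
            set_weight(0)  # roll back: all workloads go to the old version
            return False
    return True
```

A real controller would also wait for an analysis interval between shifting weight and querying the metric; that wait is left out here to keep the gate logic visible.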

Thanks

Is this use case nonsense, or the

https://redd.it/1kehhvs
@r_devops
Some packages on Sonatype Nexus aren't updated when using as a Composer repository

Hello,

We have a Sonatype Nexus repository for Composer. The DevOps engineer who maintained it has left, and now we are not sure why some packages aren't being updated to the latest version.

For example, we need to install the package robrichards/xmlseclibs: https://packagist.org/packages/robrichards/xmlseclibs


We need the latest version, 3.1.3, but our repository only has 3.1.1, and it was last updated in 2024: https://ibb.co/4ZtJF9Gd


We are not sure how to make Nexus fetch the latest version when someone runs the `composer require robrichards/xmlseclibs` command.
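One way to confirm that the proxy is serving stale metadata is to compare the newest version upstream with the newest version Nexus returns. A minimal sketch, assuming you have already collected both version lists (for example from the Packagist page and the Nexus UI); `parse`, `latest`, and `is_stale` are hypothetical helpers that only handle plain `X.Y.Z` tags, not Composer's full version syntax.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn 'v3.1.3' or '3.1.3' into (3, 1, 3); assumes numeric dot-separated parts."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def latest(versions: list[str]) -> str:
    """Highest version by numeric comparison (so 3.10.0 beats 3.9.9)."""
    return max(versions, key=parse)


def is_stale(upstream: list[str], proxied: list[str]) -> bool:
    """True when the proxy's newest version is older than upstream's newest."""
    return parse(latest(proxied)) < parse(latest(upstream))
```

If the check says the proxy is behind, the proxy repository's metadata caching settings in Nexus are a likely place to look.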


What should I try to do?

Thanks!

https://redd.it/1keiala
@r_devops
Built a basic SSH connection manager in Python — would love feedback

Hey folks,

I’ve built a small Python library to manage SSH connections and run commands across multiple remote machines. It’s still in the early stages but works well for basic cluster operations.

# 🔧 What it does:

* Load SSH hosts from config
* Connect to multiple servers in parallel
* Run commands across all of them
* Some basic monitoring and result handling
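For readers new to the pattern, here is a minimal sketch of the parallel fan-out idea using only the standard library. It is not this library's API: `ssh_runner` and `run_everywhere` are hypothetical names, the runner shells out to the system `ssh` binary, and it is injectable so the logic can be tested without real hosts.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Runner = Callable[[str, str], tuple[int, str]]


def ssh_runner(host: str, command: str) -> tuple[int, str]:
    """Run one command on one host via the ssh client; BatchMode avoids password prompts."""
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True, text=True, timeout=60,
    )
    return proc.returncode, proc.stdout


def run_everywhere(hosts: list[str], command: str,
                   runner: Runner = ssh_runner,
                   max_workers: int = 10) -> dict[str, tuple[int, str]]:
    """Run `command` on all hosts in parallel; map each host to (exit code, stdout)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each host with its own result.
        results = pool.map(lambda host: runner(host, command), hosts)
        return dict(zip(hosts, results))
```

Threads suit this workload because each call spends its time waiting on the network, not on the CPU.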

I made this because I often deal with multiple remote machines (AI training, deployments, etc.) and got tired of managing them manually or writing repetitive scripts.

The current version is lightweight and has only a Pythonic API (no CLI, GUI, or TUI yet), but I’m thinking of adding those in the future.

Would really appreciate:

* Feedback on structure or usage
* Suggestions for features or improvements
* Whether this is even useful to others in its current state

# 🔗 GitHub:

[https://github.com/goravaa/ssh-clusters-manager](https://github.com/goravaa/ssh-clusters-manager)

Thanks in advance to anyone who checks it out 🙌

https://redd.it/1kenbqu
@r_devops
American Sign Language in DevOps Communities and Teaching

Hello everyone,

I’m a university student who hosts workshops in our local Google Developer Groups chapter.

I go to a university that has a substantial deaf and hard of hearing population.

This year, I’ve hosted several talks, and occasionally some deaf students have attended. On those days, we requested interpreting services and were able to get them, which has been great.

However, I have a nagging feeling that, although all of our talks are in English, there is still a language barrier. When talking about Kubernetes, containers, Linux, and other development frameworks, I’m not sure the ideas in my presentations fully come across in an ASL context.

Has anyone encountered a similar predicament? Looking for some tips to improve my communication skills within workshop environments to make everyone feel included.

https://redd.it/1kenn11
@r_devops
Built a fast multi-host terminal log viewer with timeline histogram – looking for feedback

Hey all – I’ve been working on Nerdlog: an open-source, fast, terminal-based log viewer loosely inspired by Graylog/Kibana, with a similar timeline histogram on top, but designed to be snappy, lightweight, and setup-free (it just SSHes into the hosts and uses standard tools such as awk, tail, head, etc.).

It's optimized for reading system logs (from /var/log/messages, /var/log/syslog, or straight from journalctl) as efficiently as possible. To share some numbers: I've been using it daily with 20+ hosts simultaneously, reading 1GB+ log files on each of them, and getting logs for the last hour took 2-3 seconds.
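To illustrate the timeline-histogram idea, here is a minimal sketch that buckets classic syslog lines (`May  5 14:03:22 host ...`) by minute. It is a Python illustration only, not Nerdlog's code (Nerdlog does this work on the remote hosts with awk and friends). The timestamp format is an assumption; journalctl output or RFC 5424 timestamps would need different parsing.

```python
from collections import Counter


def minute_histogram(lines: list[str]) -> Counter:
    """Count log lines per 'Mon D HH:MM' bucket from classic syslog timestamps."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()  # split() also collapses the double space in 'May  5'
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        month, day, clock = parts[0], parts[1], parts[2]
        counts[f"{month} {day} {clock[:5]}"] += 1
    return counts


def render(counts: Counter) -> str:
    """One bar per minute; relies on insertion order, so input should be chronological."""
    return "\n".join(f"{bucket} {'#' * n}" for bucket, n in counts.items())
```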

Initially, I hacked it together as a revolt against the company-wide enforcement of Splunk, which I found way too slow for the volume of logs we had, but the project is now outgrowing the initial proof-of-concept stage.

I'd love feedback from the DevOps crowd. So far, it has focused on my needs as a developer reading backend logs, but I think it has good potential to be useful in an ops context as well; I just need to know your pain points and specific needs. Is there a feature that is painfully missing in whatever log viewer you're using now? Or, vice versa, a feature you love in some other log viewer that Nerdlog should have too? Let me know!

GitHub repo here.

And thanks!

https://redd.it/1kerwu1
@r_devops
Self-hosted alternative to AWS Elastic Beanstalk with GitHub deploy and automatic horizontal scaling (no Kubernetes)?

I’m looking for a self-hosted platform similar to AWS Elastic Beanstalk that lets me push my code to GitHub and handles deployment plus automatic horizontal scaling on VPS servers.

Requirements:

* GitHub → automatic deploy
* VPS-based horizontal (instance-level) scaling
* Not a serverless (AWS Lambda-style) solution
* No Kubernetes (I don’t want to manage K8s clusters)
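To make the horizontal-scaling requirement concrete: whichever tool you pick has to decide how many VPS instances to run. A minimal sketch of one common rule, borrowed from the formula documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × currentMetric / targetMetric)), clamped to min/max bounds. The 60% CPU target and the bounds are hypothetical defaults.

```python
import math


def desired_instances(current: int, cpu_util: float, target_util: float = 0.6,
                      min_n: int = 1, max_n: int = 10) -> int:
    """Scale the instance count in proportion to load, clamped to [min_n, max_n]."""
    wanted = math.ceil(current * cpu_util / target_util)
    return max(min_n, min(max_n, wanted))
```

Real autoscalers usually add a tolerance band and a cooldown so the fleet does not flap on small load changes; both are omitted here.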

Which open-source tools or platforms would you recommend?

https://redd.it/1kesqg3
@r_devops
What else do I need before I apply?

I've been a systems admin for over a decade. For the last two years, I've been doing GitOps with Ansible and Terraform and managing some on-prem Kubernetes clusters. I know enough Azure to get around, but I'm not an expert. I've written some minor CI/CD pipelines as well. I'd like to move into an actual DevOps position but am not sure what else I need. I'm not an expert software engineer, but given enough time I can write a PowerShell or Python script.

https://redd.it/1kesm74
@r_devops
Resource recs for cloud engineer that eventually needs to help developers

Hi everyone!

I know this is a horrible title btw. And excuse me if I got some terms wrong. And I meant "occasionally".

Here's the issue: I work as a cloud support engineer for a very small cloud shop, and our clients are mainly startups, so keep that in mind lol. We are supposed to support only our clients' infrastructure, but we often receive tickets asking for help with things that lean into DevOps and software development. I have a very superficial background in backend development, so sometimes, after a bit of reading the docs and researching, I can help, but often I feel like my "help" is lacking and not substantial enough. For example, the other day a client asked how he could reduce downtime in his app during (schema, I assume) migrations. My colleague helped him, but this weekend I researched the topic and I'm not sure the advice he gave was great.

On top of that, I'm pretty new to technology in general, still in college and I have A TON of things to learn and study on my to-do list that are related to cloud, networking, IaC, etc, but I feel like it would be incredibly useful to pick up some things in other related fields that would help me in my job.

I'm not assuming in any way that I can pick up a book and suddenly become a genius, but what are the resources - courses, videos, books that in your experience could be helpful to someone in a position like the one I'm in?

https://redd.it/1kew9rd
@r_devops
Introducing VPS Pilot – My open-source project to manage and monitor VPS servers!

Built with:

Agents (Golang) installed on each VPS

Central server (Golang) receiving metrics via TCP

Dashboard (React.js) for real-time charts

TimescaleDB for storing historical data

Features so far:

CPU, memory, and network monitoring (5m to 7d views)

Discord alerts for threshold breaches

Live WebSocket updates to the dashboard
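The "Discord alerts for threshold breaches" feature boils down to comparing the latest sample against a limit and posting a message. A minimal Python sketch of that logic, not VPS Pilot's Go code: the thresholds are hypothetical, and the payload uses the `content` field that Discord webhooks accept. Actually sending it (an HTTP POST to your webhook URL) is left out.

```python
import json

THRESHOLDS = {"cpu": 90.0, "memory": 85.0}  # hypothetical percent limits


def breaches(host: str, sample: dict[str, float]) -> list[str]:
    """Return one alert message per metric that exceeds its threshold."""
    return [
        f"{host}: {metric} at {value:.1f}% (limit {THRESHOLDS[metric]:.0f}%)"
        for metric, value in sample.items()
        if metric in THRESHOLDS and value > THRESHOLDS[metric]
    ]


def discord_payload(messages: list[str]) -> str:
    """Build a Discord webhook JSON body; message content is capped at 2000 characters."""
    return json.dumps({"content": "\n".join(messages)[:2000]})
```

Metrics without a configured threshold (like `disk` in a sample) are simply ignored rather than raising an error.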

Coming soon:

Project management via config.vpspilot.json

Remote command execution and backups

Cron job management from central UI

Looking for contributors!
If you're into backend, DevOps, React, or Golang, PRs are welcome!
GitHub: https://github.com/sanda0/vps_pilot

#GoLang #ReactJS #opensource #monitoring #DevOps

https://redd.it/1kewtsx
@r_devops
Please guide me in learning infrastructure automation

I currently manage a few servers on DigitalOcean running some ecommerce sites (WordPress) and some custom PHP-based applications (vanilla PHP and Laravel). My setup is pretty basic and consists of:

* Fedora Cloud OS (I upgrade servers every 6 months for my sanity)
* Nginx, PHP-FPM (multiple pools), MariaDB, Valkey (Redis)
* Postfix (send-only mail server), OpenDKIM
* Logrotate (to rotate logs per user)
* Cron jobs that back up files and databases to each user's directory; logrotate renames the backups and retains the last x days of them.
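As a sketch of what the "retain the last x days of backups" step does, here is a small pruning function that a cron job could call instead of relying on logrotate renames. It assumes a hypothetical naming scheme `<name>-YYYY-MM-DD.tar.gz` and only computes which files to delete; it does not touch the filesystem.

```python
import re
from datetime import date, timedelta

# Hypothetical naming scheme: <name>-YYYY-MM-DD.tar.gz
DATE_RE = re.compile(r"-(\d{4}-\d{2}-\d{2})\.tar\.gz$")


def to_delete(filenames: list[str], today: date, keep_days: int) -> list[str]:
    """Return backups dated before today - keep_days; files without a date stamp are kept."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in filenames:
        match = DATE_RE.search(name)
        if match and date.fromisoformat(match.group(1)) < cutoff:
            old.append(name)
    return old
```

Keeping the decision separate from the deletion makes it easy to dry-run before wiring it into cron or an Ansible task.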

Earlier, when there were only a few servers, I used to set up and configure everything manually. Each server would be taken down for a couple of hours of maintenance and upgrades every 6 months.

Then I automated the basic setup and configuration with custom bash scripts. Maintenance time dropped from hours to less than 30 minutes every 6 months. Downloading and restoring backups is now the only time-consuming part, since the data is huge.

I'm now at a stage where I need to fully automate it, as the number of servers is growing each month. From what I've understood, I need to:

* Switch from Nginx, PHP-FPM to Caddy & FrankenPHP
* Containerize each application. We currently use docker-compose for development and testing. I guess we need to learn how to use that safely in production.
* Switch from raw logs to ELK stack.
* Switch from Postfix and OpenDKIM to a Maddy/Haraka/Postal setup on a separate server, with the other servers sending mail to it over SMTP.
* Switch from Fedora to some LTS OS like Ubuntu.
* Switch from bash scripts for setup and configuration to something like Ansible combined with Terraform and Nomad (not sure about these two).
* Add replication to MariaDB.
* Add CI/CD pipelines with private GitHub repos.

I'm quite overwhelmed and it's taking a lot of time to wrap my head around these things. I know I have to take it slow and not do it all at once.

Has anyone gone through a similar move from a manual to a fully automated setup? How did you find your way? Please guide me if you have experience with any of these.

Edit: List formatting.

https://redd.it/1kf0qts
@r_devops
What’s one cloud concept that took you way longer to understand than expected?

For me, it was IAM on AWS. At first, it seemed simple—just give users permissions, right? But once I got into roles, policies, trust relationships, and least privilege... it felt like falling down a rabbit hole.
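The role/policy/trust distinction is easier to see side by side. A minimal sketch that builds the two JSON documents an EC2 instance role involves: the trust policy (who may assume the role) and a permissions policy (what the role may do once assumed). The bucket name `example-bucket` is hypothetical; the read-only scope illustrates least privilege.

```python
import json


def trust_policy(service: str = "ec2.amazonaws.com") -> dict:
    """Who can assume the role: here, an AWS service principal via sts:AssumeRole."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": service},
            "Action": "sts:AssumeRole",
        }],
    }


def read_only_bucket_policy(bucket: str) -> dict:
    """What the role can do: list one bucket and read its objects, nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # ListBucket applies to the bucket ARN itself...
            {"Effect": "Allow", "Action": "s3:ListBucket",
             "Resource": f"arn:aws:s3:::{bucket}"},
            # ...while GetObject applies to the objects inside it.
            {"Effect": "Allow", "Action": "s3:GetObject",
             "Resource": f"arn:aws:s3:::{bucket}/*"},
        ],
    }


trust_json = json.dumps(trust_policy(), indent=2)
```

A common source of "access denied" confusion is that both documents must allow the action: the trust policy lets the principal become the role, and the permissions policy decides what the role can then touch.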

I kept second-guessing myself every time I tried to troubleshoot access issues. Even now, I still double-check every policy I write like three times 😅

Curious—what was your “wait, why is this so complicated?” moment when learning cloud?

https://redd.it/1kf2vqj
@r_devops
LLMs ('AI') are coming for our jobs whether or not they work - Chris's Wiki

From here:

> In most non-tech organizations, both internal development and system administration is something similar to janitorial services; you have to have it because otherwise your organization falls over, but you don't like it and you're happy to spend as little on it as possible.

https://redd.it/1kf2kzc
@r_devops
IBM Event Notifications question

Hello everyone,

I am having difficulty configuring my alerts with different templates.
Can someone help me?

In Event Notifications, I have created a source.
This source has 2 topics.
I have 2 subscriptions and 2 templates.

But only one of the templates is used to send the alerts to Slack.

How can I change that?

Ideally, I would write the template query so that it pulls the alert description into Slack.
Is this possible?

https://redd.it/1kf8w1x
@r_devops
Passive FTP into Kubernetes? Sounds cursed. Works great.

“talk about forcing some ancient tech into some very new tech wow... surely there's a better way” said a VMware admin watching my counter FTP strategy😅

Challenge accepted

I recently needed to run a passive-mode FTP server inside a Kubernetes cluster and quickly hit all the usual problems: random ports, sticky control sessions, health checks failing for no reason… you know the drill.

So I built a Helm chart that deploys vsftpd, exposes everything via stable NodePorts, and even generates a full haproxy.cfg based on your cluster’s node IPs, following the official HAProxy best practices for passive FTP.
You drop that file on your HAProxy box, restart the service, and FTP/FTPS just work.
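A rough sketch of the kind of generation step described above, not the chart's actual template: given node IPs and ports, emit an HAProxy TCP config with one frontend for the FTP control port and one for the passive range. The node IPs, control NodePort 30021, and passive range 30100-30109 are hypothetical. It assumes the passive ports are exposed as NodePorts with the same numbers, and relies on HAProxy reusing the client's destination port when a `server` line omits the port.

```python
def haproxy_cfg(node_ips: list[str], control_nodeport: int,
                pasv_min: int, pasv_max: int) -> str:
    """Render a TCP-mode HAProxy config for FTP control and passive data ports."""
    lines = [
        "frontend ftp_control",
        "    mode tcp",
        "    bind *:21",
        "    default_backend ftp_control_nodes",
        "",
        "backend ftp_control_nodes",
        "    mode tcp",
    ]
    # Control traffic goes to the control NodePort on every node, with health checks.
    lines += [f"    server node{i} {ip}:{control_nodeport} check"
              for i, ip in enumerate(node_ips, 1)]
    lines += [
        "",
        "frontend ftp_passive",
        "    mode tcp",
        f"    bind *:{pasv_min}-{pasv_max}",
        "    default_backend ftp_passive_nodes",
        "",
        "backend ftp_passive_nodes",
        "    mode tcp",
    ]
    # No port on these server lines: HAProxy forwards to the same port the client hit.
    lines += [f"    server node{i} {ip}" for i, ip in enumerate(node_ips, 1)]
    return "\n".join(lines) + "\n"
```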

https://github.com/adrghph/kubeftp-proxy-helm

Originally, this came out of a painful Tanzu/TKG setup (where the built-in HAProxy is locked down), but the chart is generic enough to be used in any Kubernetes cluster with a HAProxy VM in front.

Let me know if anyone else is fighting with FTP in modern infra. Bye!

https://redd.it/1kfa7mz
@r_devops
Restart Operator: Schedule K8s Workload Restarts

github: [https://github.com/archsyscall/restart-operator](https://github.com/archsyscall/restart-operator)

Built a simple K8s operator that lets you schedule periodic restarts of Deployments, StatefulSets, and DaemonSets using cron expressions.

apiVersion: restart-operator.k8s/v1alpha1
kind: RestartSchedule
metadata:
  name: nightly-restart
spec:
  schedule: "0 3 * * *" # 3am daily
  targetRef:
    kind: Deployment
    name: my-application

It works by adding an annotation to the pod template spec, triggering Kubernetes to perform a rolling restart. Useful for apps that need periodic restarts to clear memory, refresh connections, or apply config changes.
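The mechanism described above is the same trick `kubectl rollout restart` uses: changing an annotation on the pod template makes the controller roll out new pods. A minimal sketch of the patch body such an operator might send; the annotation key is the one kubectl sets, and whether restart-operator uses the same key is an assumption.

```python
from datetime import datetime, timezone


def restart_patch(now: datetime) -> dict:
    """Patch body that changes the pod template, which triggers a rolling restart."""
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        # Key used by `kubectl rollout restart`; any changed value
                        # in the pod template is enough to trigger a rollout.
                        "kubectl.kubernetes.io/restartedAt": now.isoformat(),
                    }
                }
            }
        }
    }


body = restart_patch(datetime(2025, 5, 6, 3, 0, tzinfo=timezone.utc))
```

Serialized to JSON, the same body could be passed to `kubectl patch deployment my-application -p '...'` to trigger a one-off restart by hand.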

helm repo add archsyscall https://archsyscall.github.io/restart-operator
helm repo update
helm install restart-operator archsyscall/restart-operator

Look, we all know restarts aren't always the most elegant solution, but they're surprisingly effective at solving tricky problems in a pinch.

Thank you!

https://redd.it/1kfbkfl
@r_devops
Does anyone here use Humanitec? Feedback wanted!

I’ve been looking into Humanitec and I’m curious to hear from people who are actually using it.

What use case(s) are you solving with it?
How is it integrated into your workflows?
Any wins or challenges you've encountered?
Would you recommend it to others building platform tooling?

I’m especially interested in any honest pros and cons.
Appreciate any insight you can share!

https://redd.it/1kfcpze
@r_devops
I got my first devops position

I'm really happy about this, but I don't have a lot of experience; I'm actually straight out of college. I studied what Kubernetes and Docker are and even went to Linode to create a Kubernetes cluster to get some experience. After messing around a bit, I realized I have no idea what to do with this stuff.

I start working in a few weeks, and I'm a little worried I'll go in not knowing enough (which they probably know). I was wondering if anyone here had advice on what I could do in the meantime to prepare. My current goal is just to get better at Bash scripting, because it seems really important.

Thanks in advance!

https://redd.it/1kfce49
@r_devops
Any experience monitoring Redshift

Does anyone have experience monitoring Redshift? We've been having a series of data incidents and lack visibility into what's happening with various jobs. The team usually resorts to querying various sys_xxx tables to investigate failures. We're also using dbt, which writes some state to tables in Redshift as well. We're using Datadog and pulling in metrics for both Glue and Redshift, but none of them seem particularly helpful. I'm looking for any tips anyone has.

https://redd.it/1kfcfcy
@r_devops
Got a 3hr interview coming up. Tips/advice appreciated.

I got through the recruiter screening and meetings with their main DevOps guy and the CTO, and I've been told I'm moving on to the next round: a 3-hour interview with other members of the team. I doubt it'll be 3 straight hours; it'll probably be more like three 1-hour blocks.

Anyways, any tips, advice, or suggestions? The interviews I've done so far were pretty chill, and I think this might be the last round. The company is pretty cool and works in a space where I have some expertise, which I think gave me a leg up. I really want the job, so help me get through the final push. A little background: I have about 10 years of full-stack engineering experience, and for the last 5ish years I've been doing DevOps exclusively.

Oh edit to add: this is all completely remote

https://redd.it/1kffm0m
@r_devops