Reddit DevOps
266 subscribers
30.9K links
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
how do you actually stay on top of configuration drift?

so i've been thinking a lot about config drift lately, especially in fast-moving environments where infrastructure changes constantly. even with IaC and automated policies, things always seem to slip through... manual tweaks, unexpected dependencies, or just plain human error.

i came across this article that breaks down some solid strategies for controlling drift, but i'm curious - what’s actually worked for you in practice? do you rely more on automation, strict policies, or just accept a certain level of drift as inevitable?

would love to hear how different teams approach this.
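
For anyone who wants something concrete to reason about, here is a minimal sketch of the core idea behind drift detection: compare the configuration you declared against the configuration that is actually live, and report every difference. It assumes both sides are already available as nested dictionaries with string keys. The names `find_drift` and `MISSING` are hypothetical and do not come from any particular tool.

```python
MISSING = "<missing>"  # placeholder for a key that exists on only one side


def find_drift(desired, actual, path=""):
    """Return a list of (path, desired_value, actual_value) differences."""
    drift = []
    for key in sorted(set(desired) | set(actual)):
        here = f"{path}.{key}" if path else str(key)
        want = desired.get(key, MISSING)
        have = actual.get(key, MISSING)
        if isinstance(want, dict) and isinstance(have, dict):
            # Recurse into nested sections so the report names the exact leaf.
            drift.extend(find_drift(want, have, here))
        elif want != have:
            drift.append((here, want, have))
    return drift


if __name__ == "__main__":
    declared = {"instance_type": "t3.small", "tags": {"env": "prod"}}
    live = {"instance_type": "t3.large", "tags": {"env": "prod", "owner": "bob"}}
    for where, want, have in find_drift(declared, live):
        print(f"{where}: declared={want!r} live={have!r}")
```

Running a comparison like this on a schedule and alerting on a non-empty result is one way to catch manual tweaks early. Whether to revert them automatically or only report them is a policy decision.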

https://redd.it/1j56676
@r_devops
Opsgenie is shutting down! Here are 5 open source alternatives to switch to

Hi,

In their recent blog post, Atlassian announced they'll be shutting down Opsgenie on June 4th, 2025. There's currently a heated discussion about this on Hacker News for anyone interested.

If you're affected by this change, I've compiled some of the best open-source alternatives to Opsgenie:

https://openalternative.co/alternatives/opsgenie

This is by no means a complete list, so if you know of any solid alternatives that aren't included, please let me know.

Thanks!

https://redd.it/1j572gh
@r_devops
Teleport Application | Hashicorp Vault UI | Expose issues

Hi!

I'm trying to use Teleport to expose the HashiCorp Vault UI that we run on our Kubernetes cluster.

I get a blank page with 500 errors when I try to access it. This is my kube-agent config:


...
app_service:
  enabled: true
  apps:
    - name: vault-dev
      uri: https://develop-vault-server-active.vault.svc.cluster.local:8200
      labels:
        env: develop
        service: vault
      rewrite:
        headers:
          - 'Host: develop-vault-server-active.vault.svc.cluster.local:8200'
...

Kube-agent logs


2025-03-05T11:19:26.510Z INFO [KUBERNETE] Starting Kube service via proxy reverse tunnel. pid:6.1 service/kubernetes.go:257
2025-03-05T11:19:26.575Z INFO [APP:SERVI] Cache "apps" first init succeeded. cache/cache.go:1152
2025-03-05T11:19:29.618Z INFO [APP:SERVI] All applications successfully started. pid:6.1 service/service.go:6224
2025-03-05T11:19:29.618Z INFO [PROC:1] The new service has started successfully. Starting syncing rotation status. pid:6.1 max_retry_period:4m16s service/connect.go:642
2025-03-05T11:22:09.831Z INFO emitting audit event event_type:app.session.chunk fields:map[app_name:vault-dev app_public_addr:vault.dev.teleport.xxx.co app_uri: cluster_name:teleport.xxx.co code:T2008I ei:6.65831065482e+11 event:app.session.chunk namespace:default private_key_policy:none server_id:21235eb8-04a9-400d-85a1-c58792a0f5f8 server_version:17.2.2 session_chunk_id:60b98e63-6fa4-4864-9293-e5a9e35eb0c3 sid:8671e5e0d3b649b50dc0d77860af90de88912c7d4b5addeff76f6599e740ed64 time:2025-03-05T11:22:09.831Z trace.component:audit uid:8396daf7-5fd3-44ae-b465-10a3b4e62382 user:username user_kind:1] events/emitter.go:287
2025-03-05T11:22:09.842Z INFO [APP:SERVI] Round trip: GET , code: 307, duration: 10.831033ms tls:version: 304, tls:resume:false, tls:csuite:1301, tls:server:74656c65706f72742e7470662e636f.teleport.cluster.local reverseproxy/reverse_proxy.go:223
2025-03-05T11:22:09.888Z INFO emitting audit event event_type:app.session.chunk fields:map[app_name:vault-dev app_public_addr:vault.dev.teleport.xxx.co app_uri: cluster_name:teleport.xxx.co code:T2008I ei:6.0885849394e+10 event:app.session.chunk namespace:default private_key_policy:none server_id:21235eb8-04a9-400d-85a1-c58792a0f5f8 server_version:17.2.2 session_chunk_id:9862c82f-32e5-4c4a-87cd-dd4648dd3c38 sid:063e3000708b3f2fdebe6610a068ef36daf56cf5103e63d3df7689ce3e8e43f2 time:2025-03-05T11:22:09.886Z trace.component:audit uid:b8afdde3-43ad-4cb8-9d93-a3d234d2d169 user:username user_kind:1] events/emitter.go:287
2025-03-05T11:22:09.902Z INFO [APP:SERVI] Round trip: GET , code: 307, duration: 16.153207ms tls:version: 304, tls:resume:false, tls:csuite:1301, tls:server:74656c65706f72742e7470662e636f.teleport.cluster.local reverseproxy/reverse_proxy.go:223
2025-03-05T11:22:09.928Z INFO [APP:SERVI] Round trip: GET , code: 200, duration: 4.198207ms tls:version: 304, tls:resume:false, tls:csuite:1301, tls:server:74656c65706f72742e7470662e636f.teleport.cluster.local reverseproxy/reverse_proxy.go:223
2025-03-05T11:22:09.994Z INFO [APP:SERVI] Round trip: GET , code: 200, duration: 2.837296ms tls:version: 304, tls:resume:false, tls:csuite:1301, tls:server:74656c65706f72742e7470662e636f.teleport.cluster.local reverseproxy/reverse_proxy.go:223
2025-03-05T11:22:10.228Z INFO [APP:SERVI] Round trip: GET , code: 200, duration: 2.695592ms tls:version: 304, tls:resume:false, tls:csuite:1301, tls:server:74656c65706f72742e7470662e636f.teleport.cluster.local reverseproxy/reverse_proxy.go:223
2025-03-05T11:22:10.238Z INFO [APP:SERVI] Round trip: GET , code: 200, duration: 2.327523ms tls:version: 304, tls:resume:false, tls:csuite:1301, tls:server:74656c65706f72742e7470662e636f.teleport.cluster.local reverseproxy/reverse_proxy.go:223
2025-03-05T11:22:10.241Z INFO [APP:SERVI] Round trip: GET , code: 200, duration: 3.076735ms tls:version: 304, tls:resume:false, tls:csuite:1301,
While executing "vagrant up", I am encountering the following error. I would be thankful if you could guide me on this. Thank you in advance.

==> controlplane: Setting hostname...

==> controlplane: Configuring and enabling network interfaces...

The SSH connection was unexpectedly closed by the remote end. This usually indicates that SSH within the guest machine was unable to properly start up. Please boot the VM in GUI mode to check whether it is booting properly.



The following is the complete output up to the point where the error occurred and the run stopped:

ub1@ub1-VirtualBox:~/certified-kubernetes-administrator-course/kubeadm-clusters/virtualbox$ vagrant up

Bringing machine 'controlplane' up with 'virtualbox' provider...

Bringing machine 'node01' up with 'virtualbox' provider...

Bringing machine 'node02' up with 'virtualbox' provider...

==> controlplane: Box 'ubuntu/jammy64' could not be found. Attempting to find and install...

controlplane: Box Provider: virtualbox

controlplane: Box Version: >= 0

==> controlplane: Loading metadata for box 'ubuntu/jammy64'

controlplane: URL: https://vagrantcloud.com/api/v2/vagrant/ubuntu/jammy64

==> controlplane: Adding box 'ubuntu/jammy64' (v20241002.0.0) for provider: virtualbox

controlplane: Downloading: https://vagrantcloud.com/ubuntu/boxes/jammy64/versions/20241002.0.0/providers/virtualbox/unknown/vagrant.box

==> controlplane: Successfully added box 'ubuntu/jammy64' (v20241002.0.0) for 'virtualbox'!

==> controlplane: Importing base box 'ubuntu/jammy64'...

==> controlplane: Matching MAC address for NAT networking...

==> controlplane: Setting the name of the VM: controlplane

Vagrant is currently configured to create VirtualBox synced folders with

the `SharedFoldersEnableSymlinksCreate` option enabled. If the Vagrant

guest is not trusted, you may want to disable this option. For more

information on this option, please refer to the VirtualBox manual:



https://www.virtualbox.org/manual/ch04.html#sharedfolders



This option can be disabled globally with an environment variable:



VAGRANT_DISABLE_VBOXSYMLINKCREATE=1



or on a per folder basis within the Vagrantfile:



config.vm.synced_folder '/host/path', '/guest/path', SharedFoldersEnableSymlinksCreate: false

==> controlplane: Clearing any previously set network interfaces...

==> controlplane: Preparing network interfaces based on configuration...

controlplane: Adapter 1: nat

controlplane: Adapter 2: bridged

==> controlplane: Forwarding ports...

controlplane: 22 (guest) => 2222 (host) (adapter 1)

==> controlplane: Running 'pre-boot' VM customizations...

==> controlplane: Booting VM...

==> controlplane: Waiting for machine to boot. This may take a few minutes...

controlplane: SSH address: 127.0.0.1:2222

controlplane: SSH username: vagrant

controlplane: SSH auth method: private key

controlplane: Warning: Connection reset. Retrying...

controlplane: Warning: Remote connection disconnect. Retrying...

controlplane: Warning: Connection reset. Retrying...

controlplane:

controlplane: Vagrant insecure key detected. Vagrant will automatically replace

controlplane: this with a newly generated keypair for better security.

controlplane:

controlplane: Inserting generated public key within guest...

controlplane: Removing insecure key from the guest if it's present...

controlplane: Key inserted! Disconnecting and reconnecting using new SSH key...

==> controlplane: Machine booted and ready!

==> controlplane: Checking for guest additions in VM...

controlplane: The guest additions on this VM do not match the installed version of

controlplane: VirtualBox! In most cases this is fine, but in rare cases it can

controlplane: prevent things such as shared folders from working properly. If you see

controlplane: shared folder errors, please make sure the guest additions within the

controlplane: virtual machine match the version of VirtualBox you have installed on

controlplane: your host and reload your VM.

controlplane:

controlplane: Guest Additions Version: 6.0.0 r127566

controlplane: VirtualBox Version: 7.1

==> controlplane: Setting hostname...

==> controlplane: Configuring and enabling network interfaces...

The SSH connection was unexpectedly closed by the remote end. This

usually indicates that SSH within the guest machine was unable to

properly start up. Please boot the VM in GUI mode to check whether

it is booting properly.



https://redd.it/1j5ae1f
@r_devops
Recommended learning path for AWS infrastructure services

Hi,

What learning path, strategy, and resources would you recommend for someone who wants practical skills in designing, building, and managing cloud infrastructure in AWS with IaC, and who wants to stay on top of automation and monitoring?

Existing experience includes:
Strong networking, including core networking as well as application proxies and WAFs
Strong Linux and scripting skills
C, Python, and Go programming experience
Strong DBA experience, plus directory services and auth solutions
System design and infrastructure architecture experience, including many types of virtualization platforms
but very limited public cloud production experience

Once again, I'm not looking for a certification path, but for a hands-on, practical route to becoming a successful platform engineer using AWS foundational services plus EKS and ECS.

Ideally, I'd like to learn from real-world examples or by building and running complex real-world systems in AWS.

What would a practical approach to learning this look like?

https://redd.it/1j58qj0
@r_devops
Is there a local dev (single license) setup for JFrog Artifactory?

My company uses JFrog Artifactory, so, being a good dev, I installed it locally to learn the finer points. However, when I brought up the UI of my new install, it asked me for a license and then completely blocked me from doing anything 😂

Most other companies let you use their full product locally for evaluation purposes... What do you all suggest?

I know they have alternative versions (Artifactory OSS and JFrog Container Registry), which are more limited (Java, Docker). Are those my best bet?

I noticed they also have a cloud managed version (with free trial) but I was hoping to self-host so I could really learn it, but maybe it's not worth the hassle?

https://redd.it/1j5cjtx
@r_devops
General Advice For a Kubernetes setup

Our planned setup is:

1 Kubernetes Cluster - CI/CD via Jenkins
1 Deployment (2-3 pods) for our UI
1 Deployment (2-3 pods) for our Server
SQL server hosted any way we please
The top 3 are mandatory per the situation (we don't own the infrastructure) but the DB we have some say over.

Question:

We are a small team, and none of us does much DevOps.
Would folks recommend putting the database into the cluster itself, or would it be easier to host the database elsewhere and connect to it?
I have heard that managing persistent StatefulSet resources in the cluster can be painful.

https://redd.it/1j5aw89
@r_devops
Is there a canvas app that lets you quickly design a DevOps infrastructure?

I would like to design something and have someone look at it and criticize it. Is there any app like that? It would be really useful.

https://redd.it/1j5dyxg
@r_devops
What are the main benefits of setting up a VPS for your project?

I want to learn more about VPSs in general and how I could benefit from setting one up.

https://redd.it/1j5ga1g
@r_devops
Seeking clients as a DevOps freelancer

I work as a full-time DevOps engineer, but these days I don't have much project work, so I want to take on freelance projects on the side. What are the best ways to do that?

https://redd.it/1j5hkzy
@r_devops
s1h: ssh + scp + passwords manager unified in one simple CLI

Hello everyone, I use ssh a lot, with a mixture of passwords and private keys, which is a pain to work with. To solve that pain point, I created a tool called s1h, inspired by k9s:
https://github.com/noboruma/s1h
Hope you find it useful as well!

https://redd.it/1j5jlzo
@r_devops
Seeking feedback on my approach to building a container orchestrator (Uncloud)

Hey DevOps folks,

I'm reaching out for some honest feedback on a personal open source project that stemmed from my curiosity about simplifying the state of the art in container orchestration.

After spending years working with Kubernetes at a unicorn and for my home infra, I found myself increasingly frustrated by the operational overhead and complexity. I kept thinking: "Surely there must be a middle ground between simple Docker Compose and full-blown Kubernetes for small-medium scale? Can it work without Raft?" I wanted container orchestration to bring me joy again, the way Ansible did when I first tried it a decade ago, or Docker after that. Do you sometimes feel the same?

That frustration led me to start building Uncloud, intentionally focusing on core design principles that differ from traditional container orchestrators like Kubernetes, Docker Swarm, or Nomad:

No control plane: Fully decentralised design without quorum eliminates single points of failure and reduces operational overhead. Each machine maintains a synchronised copy of the cluster state through peer-to-peer communication, keeping cluster operations functional even if some machines go offline
Zero-config private network: Automatic WireGuard mesh with peer discovery and NAT traversal. Containers get unique IPs for direct cross-machine communication
Imperative over declarative: Favoring imperative operations over state reconciliation simplifies both the mental model and troubleshooting
Partition tolerant: Ability to function during network partitions at the cost of eventual consistency
Batteries included: Built-in service discovery using DNS, load balancing, ingress with HTTPS
Docker-like CLI: Familiar commands for managing both infrastructure and applications

I want well-designed building blocks that just work together. When a service needs high availability, I should be able to scale it across machines and know that if any machine goes down the remaining ones will continue serving traffic. I don’t need advanced auto-healing or auto-scaling magic that is easy to misconfigure. When I deploy, I want immediate feedback, not wondering whether the reconciliation loop will eventually catch up.

Please check out the GitHub page for more technical details and a Demo: https://github.com/psviderski/uncloud

I know this approach won't fit everyone's needs and by no means does it intend to replace K8s at scale. Always use what works best for your specific situation and don’t forget to have fun!

I’d really love to hear your feedback:

Am I alone in wanting something more powerful than Docker Compose but less complex than Kubernetes?
If you're dealing with similar challenges, what would you prioritise in a tool like this?

https://redd.it/1j5dxkr
@r_devops
CI/CD compliance audit

Have you ever conducted a compliance audit of CI/CD pipelines? By compliance, I mean ensuring that all CI/CD pipeline configurations comply with internal policies or external norms and frameworks (CIS Benchmark, NIST, NIS2, ISO 27001, etc.).

I'd be very interested in any feedback on this.
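
To make "complying with a policy" concrete, here is a minimal sketch of one automated check over GitHub Actions workflow text. The rule it enforces, that every third-party `uses:` reference must be pinned to a full 40-character commit SHA, is an example policy chosen for illustration and is not quoted from any of the frameworks listed above. The function name `unpinned_actions` is hypothetical, and a real audit would parse the YAML properly instead of using a regular expression.

```python
import re

# Matches "uses: owner/repo@ref" with or without a leading "- ".
USES_LINE = re.compile(r"^[ \t]*-?[ \t]*uses:[ \t]*[\"']?([^\s\"'#]+)", re.MULTILINE)
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def unpinned_actions(workflow_text):
    """Return every `uses:` reference that is not pinned to a commit SHA."""
    findings = []
    for match in USES_LINE.finditer(workflow_text):
        ref = match.group(1)
        if ref.startswith("./") or ref.startswith("docker://"):
            continue  # local actions and container images are out of scope here
        _, _, version = ref.partition("@")
        if not FULL_SHA.fullmatch(version):
            findings.append(ref)
    return findings


if __name__ == "__main__":
    sample = (
        "steps:\n"
        "  - name: Checkout\n"
        "    uses: actions/checkout@v3\n"
        "  - uses: ./actions/local\n"
    )
    print(unpinned_actions(sample))
```

A check like this can run in the pipeline itself and fail the build when the list is non-empty, which gives you an audit trail per commit.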

https://redd.it/1j5kwo2
@r_devops
Understanding and mitigating Tail Latency by using request Hedging

Hi folks! 👋

I recently dove deep into latency mitigation strategies and wrote about request hedging, a technique I discovered while studying Grafana's distributed system toolkit. I thought this might be valuable for others working on distributed systems.

The article covers:
- What tail latency is and why it matters
- How request hedging works to combat latency spikes
- Practical implementation example with some simulated numbers

Blog post: https://blog.alexoglou.com/posts/hedging

If you have worked on tackling tail latency in your systems, I would love to know what you implemented and how it performed!
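
For readers who have not seen the technique, here is a minimal sketch of request hedging using Python's `asyncio`. It is not the implementation from the blog post or from Grafana's toolkit. The names `hedged_call`, `make_request`, and `hedge_delay` are hypothetical. The idea: send the request once, and if no answer has arrived after a short delay, send an identical second request and take whichever answer comes back first.

```python
import asyncio


async def hedged_call(make_request, hedge_delay):
    """Run make_request(); if it is still pending after hedge_delay seconds,
    start a second identical attempt and return the first result to arrive.

    make_request must be a zero-argument coroutine function that is safe to
    call twice (the request should be idempotent).
    """
    primary = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if done:
        return primary.result()  # fast path: no hedge was needed

    backup = asyncio.create_task(make_request())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop the slower attempt
    # If the first attempt to finish failed, its exception is raised here.
    return done.pop().result()
```

The hedge delay is the main tuning knob: a value near the backend's typical p95 latency keeps the extra load small while still cutting off the slowest responses. That is a common rule of thumb, so measure it against your own system before relying on it.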

https://redd.it/1j5ld3g
@r_devops
Lighthouse and TTFB on Azure

I have an Azure Ubuntu server where I host a website built with PHP (Symfony) and Node.js, with MySQL running on an Azure MySQL server. I've been trying to improve the website's Lighthouse performance score. In general, I get 60-70 for performance, and we aim to reach 90. I've looked into different aspects, including caching, compression, HTTP/2, and an Azure CDN. The results are slightly better but not close to our target. One thing I notice a lot is that the TTFB values fluctuate all over the place, from 60 to 1100 ms, which seems like a lot. Has anybody tried any solutions to improve that?
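
One way to put numbers on that fluctuation is to sample TTFB repeatedly and look at percentiles instead of single readings. The sketch below uses only the Python standard library. The function names are hypothetical, and `measure_ttfb_ms` opens a fresh connection each time, so it includes DNS, TCP, and TLS setup and will not match Lighthouse's figure exactly.

```python
import http.client
import math
import time


def measure_ttfb_ms(host, path="/", timeout=10.0):
    """Milliseconds from sending a GET until the response headers arrive."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        start = time.perf_counter()
        conn.request("GET", path)
        response = conn.getresponse()  # returns once status line and headers are read
        elapsed = time.perf_counter() - start
        response.read()  # drain the body so the connection closes cleanly
        return elapsed * 1000.0
    finally:
        conn.close()


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples):
    """Collapse raw samples into the figures worth tracking over time."""
    return {
        "min": min(samples),
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "max": max(samples),
    }
```

Collecting a few dozen samples with `measure_ttfb_ms` and passing them to `summarize` shows whether the slow responses are rare outliers (low p50, high p95) or the norm, which helps narrow down where to look next.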

https://redd.it/1j5ng6w
@r_devops
Github actions, share custom actions

Hi everyone, I'm using Github Actions to build and deploy my applications.

I've already read that Github Actions has many shortcomings when it comes to advanced settings.

I'm using a private repo to share my custom actions: my-actions-repo.

When I need to use a custom action in a job, I have to specify the full reference: my_user_name/my-actions-repo/actions/aws/aws-login@main, even though the workflow and actions are in the same repository.

name: "Workflow reusable"
on:
    workflow_call:
        inputs:
          image:
            description: "The Docker image to use"
            type: string
            required: true

jobs:

    job1:
        runs-on: ubuntu-latest
        container:
            image: ${{ inputs.image }}
        needs: build
        steps:
            - name: Checkout
              uses: actions/checkout@v3
            - name: AWS Login
              uses: my_user_name/my-actions-repo/actions/aws/aws-login@main
              with:
                region: "us-east-1"

How can I specify that the custom actions live in the actions repository (my-actions-repo)? Or what other options do I have? Writing out the full reference every time feels messy; I would like to write only `./actions/aws/aws-login`.

If I just put "`/actions/aws/aws-login`", it tries to look for the actions in the repository where I'm calling my reusable workflow.

https://redd.it/1j5qrpv
@r_devops
Failed to get a junior DevOps job

Hello everyone,

For the past seven months, I have been studying and attending DevOps courses on Udemy. I also purchased TechWorld with Nana’s DevOps Bootcamp and have been learning all the essential tools that every DevOps engineer should know. I also have solid Linux knowledge. However, I have not yet succeeded in securing a Junior DevOps position.

Currently, I am working as a Software Support Engineer, but I want to build a career in DevOps. What workflow should I follow to gain real-world DevOps experience until I get accepted for a Junior DevOps role?

https://redd.it/1j5q1lo
@r_devops
ArgoCD + naming convention for multi-cluster deployments

Just curious how people handle naming their applications when using ArgoCD?

I'm currently setting up an ApplicationSet that I want to deploy to multiple clusters. The problem is that I want them all to have the same Helm names inside the cluster.

I.e., I want the Helm chart in the cluster to be called {{name}}, not {{name}}-{{cluster}}. I don't care if the application name inside ArgoCD is different, but is there a way to reuse Helm chart names?

https://redd.it/1j5vbpb
@r_devops