how do you actually stay on top of configuration drift?
so i've been thinking a lot about config drift lately, especially in fast-moving environments where infrastructure changes constantly. even with IaC and automated policies, things always seem to slip through... manual tweaks, unexpected dependencies, or just plain human error.
i came across this article that breaks down some solid strategies for controlling drift, but i'm curious - what’s actually worked for you in practice? do you rely more on automation, strict policies, or just accept a certain level of drift as inevitable?
would love to hear how different teams approach this.
https://redd.it/1j56676
@r_devops
The New Stack
The Engineer’s Guide to Controlling Configuration Drift
Automated validation is key here — it involves running tests that compare your actual environment with what you’ve defined.
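The validation idea in the preview above, comparing the actual environment with what was defined, can be sketched as a plain diff between two nested dictionaries. This is only a minimal sketch: `find_drift`, the field names, and the sample values are hypothetical and are not taken from the article or from any specific tool.

```python
from typing import Any


def find_drift(desired: dict[str, Any], actual: dict[str, Any], prefix: str = "") -> list[str]:
    """Return human-readable differences between desired and actual settings."""
    findings: list[str] = []
    for key in sorted(set(desired) | set(actual)):
        path = f"{prefix}{key}"
        if key not in actual:
            # Declared in code but absent from the live environment.
            findings.append(f"missing: {path}")
        elif key not in desired:
            # Present in the live environment but never declared (e.g. a manual tweak).
            findings.append(f"unmanaged: {path}")
        elif isinstance(desired[key], dict) and isinstance(actual[key], dict):
            # Recurse into nested sections and keep a dotted path for reporting.
            findings.extend(find_drift(desired[key], actual[key], prefix=f"{path}."))
        elif desired[key] != actual[key]:
            findings.append(
                f"changed: {path} (expected {desired[key]!r}, found {actual[key]!r})"
            )
    return findings


# Hypothetical declared state (what IaC says) and observed state (what is running).
desired = {"instance_type": "t3.small", "tags": {"env": "prod"}, "port": 443}
actual = {"instance_type": "t3.large", "tags": {"env": "prod", "debug": "true"}}

report = find_drift(desired, actual)
for line in report:
    print(line)
```

Run on a schedule, a check like this would at least surface manual tweaks early; how the "actual" side is collected depends entirely on the platform.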
Opsgenie is shutting down! Here are 5 open source alternatives to switch to
Hi,
In their recent blog post, Atlassian announced they'll be shutting down Opsgenie on June 4th, 2025. There's currently a heated discussion about this on Hacker News for anyone interested.
If you're affected by this change, I've compiled some of the best open-source alternatives to Opsgenie:
https://openalternative.co/alternatives/opsgenie
This is by no means a complete list, so if you know of any solid alternatives that aren't included, please let me know.
Thanks!
https://redd.it/1j572gh
@r_devops
Work Life by Atlassian
The Evolution of IT Operations and Opsgenie
Learn about our new IT Ops capabilities, and what they mean for Opsgenie.
Teleport Application | Hashicorp Vault UI | Expose issues
Hi!
I'm trying to use Teleport to expose the HashiCorp Vault UI that runs on our Kubernetes cluster.
I get a blank page with 500 errors when I try to access it. This is my kube-agent config:
...
app_service:
  enabled: true
  apps:
  - name: vault-dev
    uri: https://develop-vault-server-active.vault.svc.cluster.local:8200
    labels:
      env: develop
      service: vault
    rewrite:
      headers:
      - 'Host: develop-vault-server-active.vault.svc.cluster.local:8200'
...
Kube-agent logs
2025-03-05T11:19:26.510Z INFO [KUBERNETE] Starting Kube service via proxy reverse tunnel. pid:6.1 service/kubernetes.go:257
2025-03-05T11:19:26.575Z INFO [APP:SERVI] Cache "apps" first init succeeded. cache/cache.go:1152
2025-03-05T11:19:29.618Z INFO [APP:SERVI] All applications successfully started. pid:6.1 service/service.go:6224
2025-03-05T11:19:29.618Z INFO [PROC:1] The new service has started successfully. Starting syncing rotation status. pid:6.1 max_retry_period:4m16s service/connect.go:642
2025-03-05T11:22:09.831Z INFO emitting audit event event_type:app.session.chunk fields:map[app_name:vault-dev app_public_addr:vault.dev.teleport.xxx.co app_uri: cluster_name:teleport.xxx.co code:T2008I ei:6.65831065482e+11 event:app.session.chunk namespace:default private_key_policy:none server_id:21235eb8-04a9-400d-85a1-c58792a0f5f8 server_version:17.2.2 session_chunk_id:60b98e63-6fa4-4864-9293-e5a9e35eb0c3 sid:8671e5e0d3b649b50dc0d77860af90de88912c7d4b5addeff76f6599e740ed64 time:2025-03-05T11:22:09.831Z trace.component:audit uid:8396daf7-5fd3-44ae-b465-10a3b4e62382 user:username user_kind:1] events/emitter.go:287
2025-03-05T11:22:09.842Z INFO [APP:SERVI] Round trip: GET , code: 307, duration: 10.831033ms tls:version: 304, tls:resume:false, tls:csuite:1301, tls:server:74656c65706f72742e7470662e636f.teleport.cluster.local reverseproxy/reverse_proxy.go:223
2025-03-05T11:22:09.888Z INFO emitting audit event event_type:app.session.chunk fields:map[app_name:vault-dev app_public_addr:vault.dev.teleport.xxx.co app_uri: cluster_name:teleport.xxx.co code:T2008I ei:6.0885849394e+10 event:app.session.chunk namespace:default private_key_policy:none server_id:21235eb8-04a9-400d-85a1-c58792a0f5f8 server_version:17.2.2 session_chunk_id:9862c82f-32e5-4c4a-87cd-dd4648dd3c38 sid:063e3000708b3f2fdebe6610a068ef36daf56cf5103e63d3df7689ce3e8e43f2 time:2025-03-05T11:22:09.886Z trace.component:audit uid:b8afdde3-43ad-4cb8-9d93-a3d234d2d169 user:username user_kind:1] events/emitter.go:287
2025-03-05T11:22:09.902Z INFO [APP:SERVI] Round trip: GET , code: 307, duration: 16.153207ms tls:version: 304, tls:resume:false, tls:csuite:1301, tls:server:74656c65706f72742e7470662e636f.teleport.cluster.local reverseproxy/reverse_proxy.go:223
2025-03-05T11:22:09.928Z INFO [APP:SERVI] Round trip: GET , code: 200, duration: 4.198207ms tls:version: 304, tls:resume:false, tls:csuite:1301, tls:server:74656c65706f72742e7470662e636f.teleport.cluster.local reverseproxy/reverse_proxy.go:223
2025-03-05T11:22:09.994Z INFO [APP:SERVI] Round trip: GET , code: 200, duration: 2.837296ms tls:version: 304, tls:resume:false, tls:csuite:1301, tls:server:74656c65706f72742e7470662e636f.teleport.cluster.local reverseproxy/reverse_proxy.go:223
2025-03-05T11:22:10.228Z INFO [APP:SERVI] Round trip: GET , code: 200, duration: 2.695592ms tls:version: 304, tls:resume:false, tls:csuite:1301, tls:server:74656c65706f72742e7470662e636f.teleport.cluster.local reverseproxy/reverse_proxy.go:223
2025-03-05T11:22:10.238Z INFO [APP:SERVI] Round trip: GET , code: 200, duration: 2.327523ms tls:version: 304, tls:resume:false, tls:csuite:1301, tls:server:74656c65706f72742e7470662e636f.teleport.cluster.local reverseproxy/reverse_proxy.go:223
2025-03-05T11:22:10.241Z INFO [APP:SERVI] Round trip: GET , code: 200, duration: 3.076735ms tls:version: 304, tls:resume:false, tls:csuite:1301,
tls:server:74656c65706f72742e7470662e636f.teleport.cluster.local reverseproxy/reverse_proxy.go:223
https://develop-vault-server-active.vault.svc:8200
https://develop-vault-server-active.vault.svc:8200/favicon.ico
https://develop-vault-server-active.vault.svc:8200
https://develop-vault-server-active.vault.svc:8200/
https://develop-vault-server-active.vault.svc:8200/ui/
https://develop-vault-server-active.vault.svc:8200/ui/
https://develop-vault-server-active.vault.svc:8200/ui/assets/vendor-d7bcb4a6a4344380e4c2303094d4ca7d.css
https://develop-vault-server-active.vault.svc:8200/ui/assets/chunk.143.e91479deff7823988269.css
https://develop-vault-server-active.vault.svc:8200/ui/assets/vault-83d1a3f61679fd041c567318ad68c607.css
Has anyone already exposed the HashiCorp Vault UI with Teleport?
https://redd.it/1j592h8
@r_devops
Reddit
From the devops community on Reddit
Explore this post and more from the devops community
While executing "vagrant up", I am encountering the following error. I would be grateful if you could guide me on this. Thank you in advance.
==> controlplane: Setting hostname...
==> controlplane: Configuring and enabling network interfaces...
The SSH connection was unexpectedly closed by the remote end. This usually indicates that SSH within the guest machine was unable to properly start up. Please boot the VM in GUI mode to check whether it is booting properly.
Below is the complete output up to the point where the error occurred and the run stopped:
ub1@ub1-VirtualBox:~/certified-kubernetes-administrator-course/kubeadm-clusters/virtualbox$ vagrant up
Bringing machine 'controlplane' up with 'virtualbox' provider...
Bringing machine 'node01' up with 'virtualbox' provider...
Bringing machine 'node02' up with 'virtualbox' provider...
==> controlplane: Box 'ubuntu/jammy64' could not be found. Attempting to find and install...
controlplane: Box Provider: virtualbox
controlplane: Box Version: >= 0
==> controlplane: Loading metadata for box 'ubuntu/jammy64'
controlplane: URL: https://vagrantcloud.com/api/v2/vagrant/ubuntu/jammy64
==> controlplane: Adding box 'ubuntu/jammy64' (v20241002.0.0) for provider: virtualbox
controlplane: Downloading: https://vagrantcloud.com/ubuntu/boxes/jammy64/versions/20241002.0.0/providers/virtualbox/unknown/vagrant.box
==> controlplane: Successfully added box 'ubuntu/jammy64' (v20241002.0.0) for 'virtualbox'!
==> controlplane: Importing base box 'ubuntu/jammy64'...
==> controlplane: Matching MAC address for NAT networking...
==> controlplane: Setting the name of the VM: controlplane
Vagrant is currently configured to create VirtualBox synced folders with
the `SharedFoldersEnableSymlinksCreate` option enabled. If the Vagrant
guest is not trusted, you may want to disable this option. For more
information on this option, please refer to the VirtualBox manual:
https://www.virtualbox.org/manual/ch04.html#sharedfolders
This option can be disabled globally with an environment variable:
VAGRANT_DISABLE_VBOXSYMLINKCREATE=1
or on a per folder basis within the Vagrantfile:
config.vm.synced_folder '/host/path', '/guest/path', SharedFoldersEnableSymlinksCreate: false
==> controlplane: Clearing any previously set network interfaces...
==> controlplane: Preparing network interfaces based on configuration...
controlplane: Adapter 1: nat
controlplane: Adapter 2: bridged
==> controlplane: Forwarding ports...
controlplane: 22 (guest) => 2222 (host) (adapter 1)
==> controlplane: Running 'pre-boot' VM customizations...
==> controlplane: Booting VM...
==> controlplane: Waiting for machine to boot. This may take a few minutes...
controlplane: SSH address: 127.0.0.1:2222
controlplane: SSH username: vagrant
controlplane: SSH auth method: private key
controlplane: Warning: Connection reset. Retrying...
controlplane: Warning: Remote connection disconnect. Retrying...
controlplane: Warning: Connection reset. Retrying...
controlplane:
controlplane: Vagrant insecure key detected. Vagrant will automatically replace
controlplane: this with a newly generated keypair for better security.
controlplane:
controlplane: Inserting generated public key within guest...
controlplane: Removing insecure key from the guest if it's present...
controlplane: Key inserted! Disconnecting and reconnecting using new SSH key...
==> controlplane: Machine booted and ready!
==> controlplane: Checking for guest additions in VM...
controlplane: The guest additions on this VM do not match the installed version of
controlplane: VirtualBox! In most cases this is fine, but in rare cases it can
controlplane: prevent things such as shared folders from working properly. If you see
controlplane: shared folder errors, please make sure the guest additions within the
controlplane: virtual machine match the version of VirtualBox you have installed on
controlplane: your host and reload your VM.
controlplane:
controlplane: Guest Additions Version: 6.0.0 r127566
controlplane: VirtualBox Version: 7.1
==> controlplane: Setting hostname...
==> controlplane: Configuring and enabling network interfaces...
The SSH connection was unexpectedly closed by the remote end. This
usually indicates that SSH within the guest machine was unable to
properly start up. Please boot the VM in GUI mode to check whether
it is booting properly.
https://redd.it/1j5ae1f
@r_devops
Recommended learning path for AWS infrastructure services
Hi,
so what learning path/strategy/resources would you recommend for someone who wants to get practical skills and be able to design, build, and manage cloud infrastructure in AWS using IaC, and stay on top of the game when it comes to automation and monitoring?
Existing experience includes: strong networking - including core networking as well as application proxies and WAFs
Strong Linux and scripting skills
C, Python, Go programming experience
Strong DBA experience, also directory services and auth solutions
System design and infrastructure architecture experience, including many types of virtualization platforms
but very limited public cloud production experience
Once again, I'm not looking for a certification path, but for a hands-on, practical way to get up to speed and become a successful platform engineer using AWS foundational services plus EKS and ECS.
Ideally, I'd like to learn from real-world examples or by building and running complex real-world systems in AWS.
What would a practical approach to learning look like?
https://redd.it/1j58qj0
@r_devops
Is there a local dev (single license) setup for JFrog Artifactory?
My company uses JFrog Artifactory, so being a good dev I installed it locally to learn the finer points. However, when I brought up the UI of my new install, it asked me for a license and then completely blocked me from doing anything 😂
Most other companies let you use their full product locally for evaluation purposes... What do you all suggest?
I know they have alternative versions (Artifactory OSS & JFrog Container Registry) which are more limited (Java, Docker). Are those my best bet?
I noticed they also have a cloud managed version (with free trial) but I was hoping to self-host so I could really learn it, but maybe it's not worth the hassle?
https://redd.it/1j5cjtx
@r_devops
General Advice For a Kubernetes setup
Our planned setup is:
1 Kubernetes Cluster - CI/CD via Jenkins
1 Deployment (2-3 pods) for our UI
1 Deployment (2-3 pods) for our Server
SQL server hosted any way we please
The first three are mandatory given our situation (we don't own the infrastructure), but we have some say over the DB.
Question:
We are a small team, and none of us does a ton of DevOps.
Would folks recommend trying to put the database into the cluster itself or would it be easier to host the database elsewhere and connect to it?
I have heard managing persistent statefulset resources in the cluster can be painful.
https://redd.it/1j5aw89
@r_devops
Is there a canvas app that lets you quickly design a DevOps infrastructure?
I would like to design something and have someone look at it and criticize it. Is there any app like that? It would be really useful.
https://redd.it/1j5dyxg
@r_devops
What are the main benefits of setting up a vps for your project?
I want to learn more about VPSs in general and how I could benefit from setting one up.
https://redd.it/1j5ga1g
@r_devops
Seeking clients as a DevOps Freelancer
I work as a full-time DevOps engineer, but these days I don't have much project work, so I want to take on freelance projects on the side. What are the best ways to do that?
https://redd.it/1j5hkzy
@r_devops
s1h: ssh + scp + passwords manager unified in one simple CLI
Hello everyone, I use ssh a lot, and I have a mixture of passwords and private keys, which is a pain to work with. To solve that pain point, I created this tool called s1h, inspired by k9s:
https://github.com/noboruma/s1h
Hope you find it useful as well!
https://redd.it/1j5jlzo
@r_devops
GitHub
GitHub - noboruma/s1h: ssh + scp + passwords manager
ssh + scp + passwords manager. Contribute to noboruma/s1h development by creating an account on GitHub.
Seeking feedback on my approach to building a container orchestrator (Uncloud)
Hey DevOps folks,
I'm reaching out for some honest feedback on a personal open source project that stemmed from my curiosity about simplifying the state of the art in container orchestration.
After spending years working with Kubernetes at a unicorn and for my home infra, I found myself increasingly frustrated by the operational overhead and complexity. I kept thinking: "Surely there must be a middle ground between simple Docker Compose and full-blown Kubernetes for small-medium scale? Can it work without Raft?" I wanted container orchestration to bring me joy again, the way Ansible did when I first tried it a decade ago, or Docker after that. Do you sometimes feel the same?
That frustration led me to start building Uncloud, intentionally focusing on core design principles that differ from traditional container orchestrators like Kubernetes, Docker Swarm, or Nomad:
No control plane: Fully decentralised design without quorum eliminates single points of failure and reduces operational overhead. Each machine maintains a synchronised copy of the cluster state through peer-to-peer communication, keeping cluster operations functional even if some machines go offline
Zero-config private network: Automatic WireGuard mesh with peer discovery and NAT traversal. Containers get unique IPs for direct cross-machine communication
Imperative over declarative: Favoring imperative operations over state reconciliation simplifies both the mental model and troubleshooting
Partition tolerant: Ability to function during network partitions at the cost of eventual consistency
Batteries included: Built-in service discovery using DNS, load balancing, ingress with HTTPS
Docker-like CLI: Familiar commands for managing both infrastructure and applications
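To make the "no control plane" and "partition tolerant" points more concrete, here is a generic last-writer-wins merge of the kind often used for quorum-free, eventually consistent state. It is an illustration only and is not taken from Uncloud's code: `Entry`, `merge`, the logical timestamps, and the service names are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    value: str
    timestamp: int  # hypothetical logical clock attached to each write
    node_id: str    # tie-breaker when two writes carry the same timestamp


def merge(local: dict[str, Entry], remote: dict[str, Entry]) -> dict[str, Entry]:
    """Combine two replicas; for each key the newest write wins."""
    merged = dict(local)
    for key, theirs in remote.items():
        ours = merged.get(key)
        if ours is None or (theirs.timestamp, theirs.node_id) > (ours.timestamp, ours.node_id):
            merged[key] = theirs
    return merged


# Two machines accept writes independently while partitioned.
machine_a = {"web": Entry("replicas=2", 5, "a"), "db": Entry("running", 3, "a")}
machine_b = {"web": Entry("replicas=3", 7, "b"), "cache": Entry("running", 4, "b")}

# Once they can talk again, each merges the other's state.
state_a = merge(machine_a, machine_b)
state_b = merge(machine_b, machine_a)

# Both sides converge without any leader or quorum.
print(state_a == state_b)
print(state_a["web"].value)
```

The price of this simplicity is the one named in the list: during a partition the two machines can disagree, and the older of two conflicting writes is silently dropped on merge.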
I want well-designed building blocks that just work together. When a service needs high availability, I should be able to scale it across machines and know that if any machine goes down the remaining ones will continue serving traffic. I don’t need advanced auto-healing or auto-scaling magic that is easy to misconfigure. When I deploy, I want immediate feedback, not wondering whether the reconciliation loop will eventually catch up.
Please check out the GitHub page for more technical details and a Demo: https://github.com/psviderski/uncloud
I know this approach won't fit everyone's needs and by no means does it intend to replace K8s at scale. Always use what works best for your specific situation and don’t forget to have fun!
I’d really love to hear your feedback:
Am I alone in wanting something more powerful than Docker Compose but less complex than Kubernetes?
If you're dealing with similar challenges, what would you prioritise in a tool like this?
https://redd.it/1j5dxkr
@r_devops
GitHub - psviderski/uncloud: A lightweight tool for deploying and managing containerised applications across a network of Docker hosts. Bridging the gap between Docker and Kubernetes.
CI/CD compliance audit
Have you ever conducted a compliance audit of CI/CD pipelines? By compliance, I mean ensuring that all CI/CD pipeline configurations comply with internal policies or external norms and frameworks (CIS Benchmark, NIST, NIS2, ISO 27001, etc.).
I'd be very interested in any feedback or experience you can share.
https://redd.it/1j5kwo2
@r_devops
Understanding and mitigating Tail Latency by using request Hedging
Hi folks! 👋
I recently dove deep into latency mitigation strategies and wrote about request hedging, a technique I discovered while studying Grafana's distributed system toolkit. I thought this might be valuable for others working on distributed systems.
The article covers:
- What tail latency is and why it matters
- How request hedging works to combat latency spikes
- A practical implementation example with some simulated numbers
Blog post: https://blog.alexoglou.com/posts/hedging
If you've worked on tail latency in your own systems, I would love to know what you implemented and how it performed!
https://redd.it/1j5ld3g
@r_devops
GitHub - grafana/dskit: Distributed systems kit
Lighthouse and TTFB on Azure
I have an Azure Ubuntu server hosting a website built with PHP (Symfony) and Node.js, with MySQL running on an Azure MySQL server. I’ve been trying to improve the site's Lighthouse performance score. We generally get 60-70 for performance and are aiming for 90. I’ve looked into several areas, including caching, compression, HTTP/2, and an Azure CDN. The results are slightly better but still not close to our target. One thing I keep noticing is that TTFB fluctuates widely, from 60 to 1100 ms, which seems like a lot. Has anybody tried any solutions to improve that?
https://redd.it/1j5ng6w
@r_devops
GitHub Actions, share custom actions
Hi everyone, I'm using GitHub Actions to build and deploy my applications.
I've already read that GitHub Actions has many shortcomings when it comes to advanced settings.
I'm using a private repo to share my custom actions: my-actions-repo.
When I need to use a custom action in a job, I have to specify the full reference, `my_user_name/my-actions-repo/actions/aws/aws-login@main`, even though the workflow and the actions are in the same repository.

    name: "Workflow reusable"
    on:
      workflow_call:
        inputs:
          image:
            description: "The Docker image to use"
            type: string
            required: true
    jobs:
      job1:
        runs-on: ubuntu-latest
        container:
          image: ${{ inputs.image }}
        needs: build
        steps:
          - name: Checkout
            uses: actions/checkout@v3
          - name: AWS Login
            uses: my_user_name/my-actions-repo/actions/aws/aws-login@main
            with:
              region: "us-east-1"

How can I tell the workflow that the custom actions live in the actions repository (my-actions-repo), or what other options do I have? Spelling out the full reference every time feels very messy; I would like to write only `./actions/aws/aws-login`.
If I just put `./actions/aws/aws-login`, it looks for the actions in the repository that calls my reusable workflow.
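One workaround that may fit here, sketched under assumptions and not confirmed for this exact setup: check out the actions repository into a subdirectory of the workspace, then reference the action with a local `./` path. The directory name `.my-actions` and the secret name `ACTIONS_REPO_TOKEN` are hypothetical. The secret is assumed to hold a token with read access to the private repo and would have to be passed to the reusable workflow by the caller.

```yaml
# Replacement for the steps of job1 (illustrative names, adjust to your setup)
steps:
  - name: Checkout calling repository
    uses: actions/checkout@v4

  # Place the shared actions repo in a subdirectory of the workspace
  - name: Checkout shared actions
    uses: actions/checkout@v4
    with:
      repository: my_user_name/my-actions-repo
      ref: main
      path: .my-actions                          # hypothetical directory name
      token: ${{ secrets.ACTIONS_REPO_TOKEN }}   # hypothetical secret name

  # A "./" path is resolved against the workspace, so it now finds the action
  - name: AWS Login
    uses: ./.my-actions/actions/aws/aws-login
    with:
      region: "us-east-1"
```

The trade-off is an extra checkout per job, in exchange for short `uses:` paths and a single place (`ref`) to pin the version of the shared actions.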
https://redd.it/1j5qrpv
@r_devops
Failed to get a junior DevOps job
Hello everyone,
For the past seven months, I have been studying and taking DevOps courses on Udemy. I also purchased TechWorld with Nana’s DevOps Bootcamp and have been learning the essential tools that every DevOps engineer should know. I also have solid Linux knowledge. However, I have not yet succeeded in securing a Junior DevOps position.
Currently, I am working as a Software Support Engineer, but I want to build a career in DevOps. What workflow should I follow to gain real-world DevOps experience until I get accepted for a Junior DevOps role?
https://redd.it/1j5q1lo
@r_devops
Argo CD + naming convention for multi-cluster deployments
Just curious how people handle naming their applications when using Argo CD?
I'm currently setting up an ApplicationSet that I want to deploy to multiple clusters. The problem is that I want them all to have the same Helm names inside each cluster.
I.e. I want the Helm chart in the cluster to be called {{name}}, not {{name}}-{{cluster}}. I don't care if the application name inside Argo CD is different, but is there a way to reuse Helm chart names?
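If "Helm chart name" here means the Helm release name, one option that may work is the `helm.releaseName` field of the Application source, which overrides the default of using the Argo CD application name. Below is a minimal sketch, assuming a cluster generator. The chart name `my-app`, the repo URL, the version, and the namespaces are all placeholders. Note that in the cluster generator `{{name}}` expands to the cluster name, unlike in the question above.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: my-app                     # placeholder
  namespace: argocd
spec:
  generators:
    - clusters: {}                 # one Application per registered cluster
  template:
    metadata:
      # Must be unique inside Argo CD, so the cluster name stays here
      name: 'my-app-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://example.com/charts   # placeholder chart repo
        chart: my-app
        targetRevision: 1.0.0
        helm:
          # Same release name in every cluster, independent of the app name
          releaseName: my-app
      destination:
        server: '{{server}}'
        namespace: my-app
```

Only the Argo CD application name then carries the cluster suffix, while resources rendered from the chart use the shared release name.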
https://redd.it/1j5vbpb
@r_devops
Managing Terminating Namespaces: Real-World Lessons in Kubernetes Cleanup
https://blog.abhimanyu-saharan.com/posts/managing-terminating-namespaces-real-world-lessons-in-kubernetes-cleanup
https://redd.it/1j5xhjn
@r_devops
Learn how to diagnose and resolve stuck Kubernetes namespaces caused by lingering finalizers and orphaned resources using practical commands and real-world examples.