I built Backup Guardian after a 3AM production disaster with a "good" backup
Hey r/devops
This is actually my first post here, but I wanted to share something I built after getting burned by database backups one too many times.
The 3AM story:
Last month I was migrating a client's PostgreSQL database. The backup file looked perfect, passed all syntax checks, file integrity was good. Started the migration and... half the foreign key constraints were missing. Spent 6 hours at 3AM trying to figure out what went wrong.
That's when it hit me: most backup validation tools just check SQL syntax and file structure. They don't actually try to restore the backup.
What I built:
Backup Guardian actually spins up fresh Docker containers and restores your entire backup to see what breaks. It's like having a staging environment specifically for testing backup files.
How it works:
Upload your `.sql`, `.dump`, or `.backup` file
Creates isolated Docker container
Actually restores the backup completely
Analyzes the restored database
Gives you a 0-100 migration confidence score
Cleans up automatically
Also has a CLI for CI/CD:
```shell
npm install -g backup-guardian
backup-guardian validate backup.sql --json
```
Perfect for catching backup issues before they hit production.
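Here's a sketch of what a CI gate around the `--json` output could look like. The `score` and `issues` field names (and the `"critical"` severity value) are assumptions for illustration, not the tool's documented schema; check the CLI docs for the real one.

```python
import json
import subprocess

def should_block_deploy(report: dict, threshold: int = 80) -> bool:
    """Decide whether to fail the pipeline based on a validation report.

    'score' and 'issues' are hypothetical field names; adjust them to
    the actual backup-guardian JSON schema.
    """
    critical = [i for i in report.get("issues", [])
                if i.get("severity") == "critical"]
    return report.get("score", 0) < threshold or bool(critical)

def validate_backup(path: str) -> dict:
    # Requires `npm install -g backup-guardian` on the CI runner.
    result = subprocess.run(
        ["backup-guardian", "validate", path, "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

In CI you'd call `validate_backup("backup.sql")` and exit nonzero when `should_block_deploy(report)` is true.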
Try it: https://www.backupguardian.org
CLI docs: https://www.backupguardian.org/cli
GitHub: https://github.com/pasika26/backupguardian
Tech stack: Node.js, React, PostgreSQL, Docker (Railway + Vercel hosting)
Current support: PostgreSQL, MySQL (MongoDB coming soon)
What I'm looking for:
Try it with your backup files - what breaks?
Feedback on the validation logic - what am I missing?
Feature requests for your workflow
Your worst backup disaster stories (they help me prioritize features!)
I know there are other backup tools out there, but couldn't find anything that actually tests restoration in isolated environments. Most just parse files and call it validation.
Being my first post here, I'd really appreciate any feedback - technical, UI/UX, or just brutal honesty about whether this solves a real problem!
What's the worst backup disaster you've experienced?
https://redd.it/1mb9435
@r_devops
What’s the worst cloud cost horror story you’ve experienced or heard of?
I'm looking for real-life cloud cost horror stories: unexpected bills, misconfigured resources, out-of-control autoscaling, forgotten services running for months… you name it. This is for a blog I'm planning to write, so if you don't mind, please share your worst cloud spend nightmare.
https://redd.it/1mb9ywn
@r_devops
“Buy 2 boxes” to “wrangle 20 services”: did Cloud + K8s really make Ops net easier?
https://preview.redd.it/mu7bionsjkff1.png?width=2542&format=png&auto=webp&s=62dafbc9f44af33967d047f3d6ea01af373f5420
TL;DR I’m about to spec fresh on‑prem gear because an uptick of EU‑based customers cite local data‑protection rules. Meanwhile, our Cloud/K8s stack feels like the “buy 2 of everything” rule turned into “wrangle 20 loosely coupled things.”
I assume this is a regular kind of post in here, but:
Context
• Ideal: “The cloud will abstract ops so we can focus on code!”
• Current reality: Terraform, EKS, Helm, Prometheus, ArgoCD, Istio, OPA, Velero, external‑DNS, cert‑manager, Gatekeeper… Each layer buys freedom at a complexity tax.
• Customers in Europe/APAC now insist data stay inside national borders and under their own encryption keys, meaning we either pony up for dedicated regions (≈$$$) or roll our own small‑ish DC.
Questions for the hive mind
1. If you’ve pivoted from cloud‑first back to on‑prem/hybrid and possibly a monolith setup, did it by any chance actually simplify things? (Networking? Cost forecasting? Audit trail?)
2. Which hyperscale options truly compete in the “sovereign cloud” space today?
I’d love war stories, cost curves or regrets that can be shared.
https://redd.it/1mba4ex
@r_devops
Switching Career Paths: DevOps vs Cloud Data Engineering – Need Advice
Hi everyone 👋
I'm currently working in an SAP BW role and actively preparing to transition into the cloud space. I’ve already earned AWS certification and I’m learning Terraform, Docker, and CI/CD practices. At the same time, I'm deeply interested in data engineering—especially cloud-based solutions—and I've started exploring tools and architectures relevant to that domain.
I’m at a crossroads and hoping to get some community wisdom:
🔹 Option 1: Cloud/DevOps
I enjoy working with infrastructure-as-code, containerization, and automation pipelines. The rapid evolution and versatility of DevOps appeal to me, and I see a lot of room to grow here.
🔹 Option 2: Cloud Data Engineering
Given my background in SAP BW and data-heavy implementations, cloud data engineering feels like a natural extension. I’m particularly interested in building scalable data pipelines, governance, and analytics solutions on cloud platforms.
So here’s the big question:
👉 Which path offers better long-term growth, work-life balance, and alignment with future tech trends?
Would love to hear from folks who’ve made the switch or are working in these domains. Any insights, pros/cons, or personal experiences would be hugely appreciated!
Thanks in advance 🙌
https://redd.it/1mbamhx
@r_devops
How do you think AI can affect Infrastructure management?
Hello everyone,
I've been thinking about how AI can affect infrastructure management, and beyond agents that detect anomalies, I don't have many ideas about how it could change the infrastructure side.
Can you share your thoughts, or tools you see emerging?
A great week for you all.
https://redd.it/1mbda6q
@r_devops
Proxmox-GitOps - Self-configuring GitOps Environment for Container Automation in Proxmox VE
Hi everyone, I wanted to share my GitOps project for my homelab, a self-configuring CI/CD environment for Proxmox:
https://github.com/stevius10/Proxmox-GitOps
Proxmox-GitOps is built to manage and deploy LXC containers in Proxmox, fully defined as code and easy to modify via Pull Request. Consistent, modular, and dynamically adapting to changing environments and base configurations.
A single command (and accepting the Pull Request in the Docker environment, ha) bootstraps the recursive deployment:
- The Docker-based environment pushes its own codebase as a monorepo, referencing modular components (containers you define are automatically integrated as submodules), each integrated into CI/CD. This triggers the pipeline.
- The pipeline then triggers itself — updating references, enforcing state, and continuing recursively.
Provisioning is handled via Ansible using the Proxmox API. Configuration is managed with Chef/Cinc cookbooks focused on application logic.
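As an illustration of the provisioning layer (not code taken from the repo), an Ansible task driving the Proxmox API through the `community.general.proxmox` module looks roughly like this:

```yaml
# Illustrative sketch only; not from Proxmox-GitOps.
# Creates an LXC container through the Proxmox API.
- name: Provision an LXC container via the Proxmox API
  community.general.proxmox:
    api_host: "{{ proxmox_api_host }}"
    api_user: "{{ proxmox_api_user }}"
    api_token_id: "{{ proxmox_api_token_id }}"
    api_token_secret: "{{ proxmox_api_token_secret }}"
    node: pve
    hostname: app01
    ostemplate: "local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst"
    state: present
```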
Shared configuration is applied consistently across all services. Changes to the base system propagate automatically.
It’s easily extensible, aiming to have all containers built the same way. There’s an explanation of how to do this in the README of the repository.
This project is still young and there are most likely some bugs. I built it primarily for my own homelab, but I’d like to develop it further. Would really appreciate your input – even (or especially) if you run into issues.
Thank you in advance for any interest or feedback you have 🙂
https://redd.it/1mb8mem
@r_devops
Built a small GitHub Action to send Slack/Email alerts from any workflow step
Github Action : https://github.com/Hookflo/notify-action
I was tired of waiting around for long CI jobs to finish, or of manually checking logs when tests failed or cron jobs completed. Sometimes a workflow fails and the only way to find out is to dig back through the Actions tab; why not get a simple Slack alert about the failure, with the reason?
So I put together a tiny GitHub Action that sends Slack/Email alerts from any step in your workflow.
It uses Hookflo under the hood to send alerts and log each event, so you get both real-time alerts and a central view of what happened across your pipelines.
Works great for:
Test failures
Cron job done
Long-running jobs
Job timeout
Just add a single step, pass a message + Hookflo webhook configuration, and you're done.
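The action's actual inputs are in its README, but the mechanism behind any such notifier is a small JSON POST to a webhook. A stdlib-only sketch; the payload field names here are illustrative, not Hookflo's actual schema:

```python
import json
import urllib.request

def build_alert(job: str, status: str, reason: str = "") -> dict:
    """Assemble an alert payload. Field names are illustrative,
    not Hookflo's real schema."""
    payload = {"job": job, "status": status}
    if reason:
        payload["reason"] = reason
    return payload

def send_alert(webhook_url: str, payload: dict) -> None:
    # POST the payload as JSON to the configured webhook endpoint.
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
```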
Star it if you like the action, and definitely give Hookflo's free trial a try.
https://redd.it/1mbg5xq
@r_devops
DevOps in Startups
Hello community, I've been trying to get into DevOps at startups. I could be working more, but I think it's better that I keep learning DevOps first. How should I actually go about this? I follow good communities that surface startup openings, but I'm confused about how to approach startups. Is anyone here working at a startup as a DevOps or cloud engineer? Meanwhile, I've been writing cold emails, and I have 6 months of internship experience, so to most people I'm a fresher.
Let me know which approach works best: LinkedIn, cold emails, or X.
https://redd.it/1mbhu1d
@r_devops
Two choices for the career path
Dear Nerds,
I’m calling for the advice of the lord of the nerds, please hear me.
Context: I work at a SaaS company with the title Product Support Engineer and it is a combined role so there is a 60% Support - 40% DevOps Tasks. Recently, I delivered the whole infra and pipelines of this new product we have.
I got an offer from another company doing secure OT, and the position is NOC Operator / Automation Engineer.
Goal: I need the better approach to help me reach my goal of becoming a full-time DevOps engineer. Which of these roles would be the easier stepping stone?
https://redd.it/1mbkges
@r_devops
Best code coverage online tool for large open source monorepo (100+ packages)?
Hi everyone,
I'm working on a large open source monorepo with over 100 packages, and I'm looking to properly set up code coverage reporting.
# Current setup:
Each package generates its own `lcov` file
I can merge them into a single root lcov file, if necessary
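The merge step is usually handled by `lcov -a a.info -a b.info -o merged.info` or `lcov-result-merger`, but for intuition, merging amounts to summing per-line hit counts per source file. A deliberately minimal sketch that only handles `SF:`/`DA:` records (real reports also carry `FN`/`BRDA` records that the proper tools merge):

```python
from collections import defaultdict

def merge_lcov(reports: list[str]) -> str:
    """Merge LCOV report strings by summing DA (line hit) counts
    per source file. Minimal sketch: SF/DA/end_of_record only."""
    hits: dict[str, dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for report in reports:
        current = None
        for line in report.splitlines():
            if line.startswith("SF:"):          # start of a file record
                current = line[3:]
            elif line.startswith("DA:") and current:
                lineno, count = line[3:].split(",")[:2]
                hits[current][int(lineno)] += int(count)
    out = []
    for path in sorted(hits):
        out.append(f"SF:{path}")
        for lineno in sorted(hits[path]):
            out.append(f"DA:{lineno},{hits[path][lineno]}")
        out.append("end_of_record")
    return "\n".join(out)
```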
With that, I'm looking for:
A solid online code coverage tool
Supports monorepos
Can show coverage badges per package
Integrates easily with CI (GitHub Actions)
# Questions:
What tools do you recommend? (e.g., Codecov, Coveralls, SonarCloud, others?)
Have you set up coverage reporting for a monorepo of this scale before? Any tips or lessons learned?
I’ve never handled coverage at this scale before, so any guidance, examples, or war stories would be super helpful.
Thanks in advance!
https://redd.it/1mblvnh
@r_devops
Connecting to Cloud SQL From Cloud Run without a VPC (GCP)
According to this post that was recently sent to me, it's not necessary to create a VPC, and doing so creates a network detour effect: traffic goes out of a GCP-managed VPC to your own VPC and back to their VPC. I'm wondering what everyone's thoughts are on this sort of network architecture, i.e. enabling peering to make this connection happen. As it stands, it seems like I wouldn't be able to use IAM auth with this method and would need dedicated Postgres credentials for my Cloud Run jobs. One, is this a valid way of making this connection? Two, should I actually be using dedicated credentials (instead of IAM tokens) in production? Lastly, any reason to do all this instead of just using a Cloud SQL Connector? In my case, the connector doesn't yet support psycopg as a database adapter, though that is changing soon; in the meantime, I'd have to use asyncpg if I wanted to use a connector.
https://redd.it/1mbngxm
@r_devops
Do OSS compliance tools have to be this heavy? Would you use one if it was just a CLI?
Posting this to get a sanity check from folks working in software, security, or legal review.
There are a bunch of tools out there for OSS compliance stuff, like:
License detection (MIT, GPL, AGPL, etc.)
CVE scanning
SBOM generation (SPDX/CycloneDX)
Attribution and NOTICE file creation
Policy enforcement
Most of the well-known options (like Snyk, FOSSA, ORT, etc.) tend to be SaaS-based, config-heavy, or tied into CI/CD pipelines.
Do you ever feel like:
These tools are heavier or more complex than you need?
They're overkill when you just want to check a repo’s compliance or risk profile?
You only use them because “the company needs it” — not because they’re developer-friendly?
If something existed that was:
Open-source
Local/offline by default
CLI-first
Very fast
No setup or config required
Outputs SPDX, CVEs, licenses, obligations, SBOMs, and attribution in one scan...
Would that kind of tool actually be useful at work?
And if it were that easy — would you even start using it for your own side projects or internal tools too?
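For a feel of what a CLI-first, zero-config scan could look like: license detection at its crudest is pattern matching against known license texts. A deliberately naive sketch (real scanners like ScanCode or ORT do full-text similarity matching, which is part of why they're heavier):

```python
import re

# A few SPDX ids mapped to phrases typical of each LICENSE text.
# Rough heuristic only; real tools match against full license corpora.
LICENSE_PATTERNS = {
    "MIT": r"Permission is hereby granted, free of charge",
    "Apache-2.0": r"Apache License,?\s+Version 2\.0",
    "AGPL-3.0": r"GNU AFFERO GENERAL PUBLIC LICENSE",
    "GPL-3.0": r"GNU GENERAL PUBLIC LICENSE\s+Version 3",
}

def detect_license(text: str) -> str:
    """Return a best-guess SPDX id for a LICENSE file's text, or 'unknown'."""
    for spdx_id, pattern in LICENSE_PATTERNS.items():
        if re.search(pattern, text, re.IGNORECASE):
            return spdx_id
    return "unknown"
```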
https://redd.it/1mbowvo
@r_devops
Falling in love with problems... not tools
Time and time again, I find myself falling in love with a tool rather than the initial problem I set out to solve. This tends to lead to over-engineering because I'm constantly chasing the most optimized way to structure the codebase, create pipelines that meet each and every use case, and build scalability into every single app that might only ever have five users (I'm looking at you k8s).
I feel like it's not inherently wrong to strive for optimization or scalability. But as the saying goes: progress over perfection. Our job is to deliver what the business needs and solve problems that drive the company and broader industry forward. Sometimes I lose sight of that fundamental truth.
The infrastructure we build, the automation we create, and the systems we design are all means to an end. They're not the destination... they're the vehicle that gets us there. When we become too enamored with the elegance of our technical solutions, we risk losing sight of the business value we're supposed to deliver.
Anybody else feel this way?
https://redd.it/1mbqy6v
@r_devops
What secret management tool do you use?
We are interested in implementing this in-house to securely transfer passwords and certificates from one specialist to another. The tool should be able to integrate with services such as Jenkins and Ansible.
Although I have not worked with this type of program before, I believe a good starting point would be to try HashiCorp Vault https://github.com/hashicorp/vault. What are your thoughts on this, and which ones do you use?
https://redd.it/1mbsyje
@r_devops
SRE / DevOps more exciting than full stack development?
looking for some vibes based career advice.
I'm currently a web dev at an F5000 company, 3 YOE, and kind of bored. Lately, I feel most engaged and satisfied when a production bug gets me into the zone and I have to use all my mental energy to resolve it ASAP and make a meaningful difference to a user.
This happens about once a week for a few hours at a time. The rest of the time I'm babysitting GitHub copilot to do some CRUD ticket.
I know it's a pretty nice gig, grass is greener on the other side, etc etc. I am still interested in hearing some perspectives:
if you've moved from full stack web dev to SRE or DevOps, do you find the work more engaging? More secure? More lucrative? Is there downtime?
For more context, my company does not have dedicated SRE / DevOps roles. I'm planning ahead for if I get laid off, or decide to commit to upskilling for a 'better' job.
To be honest, I have a limited understanding of what SRE and DevOps roles involve. I imagine working with kubernetes, terraform, being on call a lot, etc. Do let me know if there's something I'm missing. TIA
https://redd.it/1mbv64v
@r_devops
Started a newsletter digging into real infra outages - first post: Reddit’s Pi Day incident
Hey guys, I just launched a newsletter where I’ll be breaking down real-world infrastructure outages - postmortem-style.
These won’t just be summaries, I’m digging into how complex systems fail even when everything looks healthy. Things like monitoring blind spots, hidden dependencies, rollback horror stories, etc.
The first post is a deep dive into Reddit’s 314-minute Pi Day outage - how three harmless changes turned into a $2.3M failure:
Read it here
If you're into SRE, infra engineering, or just love a good forensic breakdown, I'd love for you to check it out.
https://redd.it/1mbo3oq
@r_devops
DevOps Projects Feedback
Hi Reddit Fam!
I have been trying to create a portal of real projects that people can actually do to get hands-on experience.
Building the portal wasn't the hard part; collecting quality projects in one place was. The best way I found to do that was to target the various certification exams and build projects around them.
I've added a few projects. If you can give me feedback on them, and suggest what other kinds of projects I should add, any recommendations would be appreciated.
Website: https://bartman.ai/
Coupon code: DOCKERSEC
If something doesn’t work, let me know.
For now, I’m focused on the CKA certification this week.
https://redd.it/1mc4uky
@r_devops
Anyone integrated an AI code reviewer into your CI/CD?
We just rolled out CARE, an AI-powered plugin that performs code reviews directly in your CI/CD pipelines or locally.
It’s tailored for Guidewire/Gosu (but also supports Java and other popular programming languages) and integrates with Bitbucket, Git, and Azure DevOps.
Instead of static rule checks, CARE does:
✅ Real-time feedback in MRs
✅ Unit test/code generation
✅ Inline responses to dev comments
✅ Seamless updates with new best practices
Trying to gauge: is DevOps moving toward proactive QA with AI, or is this still too early for most teams?
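For anyone weighing the question, the CI integration shape is fairly generic even when the reviewer itself is proprietary. Below is a hypothetical Python sketch, not CARE's actual API: parse the merge request's unified diff into per-file hunks, run each hunk through a reviewer (here a trivial stub standing in for the model call), and collect inline-comment payloads that an MR/PR comment API could post.

```python
# Hypothetical sketch of an AI review step in CI. The function and field
# names are illustrative only; the "reviewer" is a stub, not a model call.

def parse_diff(diff: str):
    """Yield (path, hunk_text) pairs from a unified diff."""
    path, hunk = None, []
    for line in diff.splitlines():
        if line.startswith("+++ b/"):
            if path and hunk:          # flush the previous file's last hunk
                yield path, "\n".join(hunk)
                hunk = []
            path = line[6:]
        elif line.startswith("--- "):
            continue                   # old-file header carries no hunk content
        elif line.startswith("@@"):
            if path and hunk:          # flush the previous hunk
                yield path, "\n".join(hunk)
            hunk = [line]
        elif hunk:
            hunk.append(line)
    if path and hunk:
        yield path, "\n".join(hunk)

def stub_reviewer(hunk: str) -> list[str]:
    """Stand-in for the model call: flag leftover TODOs on added lines."""
    return [f"Unresolved TODO: {line[1:].strip()}"
            for line in hunk.splitlines()
            if line.startswith("+") and "TODO" in line]

def review(diff: str) -> list[dict]:
    """Produce inline-comment payloads for the MR/PR comment API."""
    return [{"path": path, "comment": comment}
            for path, hunk in parse_diff(diff)
            for comment in stub_reviewer(hunk)]
```

In a real pipeline the stub would be replaced by a model request and the resulting payloads posted back through the Bitbucket/GitLab/Azure DevOps comment endpoints; the diff-to-hunks plumbing stays the same either way.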
https://redd.it/1mc5obe
@r_devops
Do DevOps teams at newer companies still choose Terraform for IaC, or native IaC services (like CloudFormation/Bicep)?
Terraform has been the go-to for companies with cloud resources across multiple platforms, or migrating from on-prem, because of its cross-platform support. But for newer startups or organisations just starting out in the cloud, platform-specific IaC services are usually easier to pick up than Terraform, and the platform integration is often better too. Native tools also don’t require installing extra CLIs or managing state files.
If you're at a newer company or helping clients spin up infra, what are you using for IaC? Are platform native tools good enough now, or is Terraform still the default?
https://redd.it/1mc7p46
@r_devops
Free DevOps Tool Developer Experience Audit
I'm offering free developer experience audits specifically focused on DevOps tools.
My background: Helped dyrectorio (deployment orchestration and container management) and Gimlet (GitOps deployment) gain significant GitHub adoption through improved developer onboarding and documentation. Not affiliated with them anymore.
I specialize in identifying friction points in CI/CD pipelines, infrastructure tooling adoption, and developer-facing automation workflows.
What I'll analyze:
Developer onboarding for your DevOps tools
CI/CD pipeline user experience and documentation
Infrastructure-as-code developer workflows
Tool integration friction points
DM me if you'd like an audit of your developer-facing DevOps processes.
https://redd.it/1mc8qna
@r_devops
Problem when fetching image via api gateway
I'm trying to use KrakenD as an API gateway. I have this endpoint on a Flask microservice (both the gateway and the microservice are containerized):
/images/<date>/<hour>/<filename>
When I fetch the image with a direct connection there are no errors, but when I use the endpoint on the gateway it gives back a 404. My other endpoints work. This is the configuration of the endpoint:
{
  "endpoint": "/api/images/{date}/{hour}/{filename}",
  "method": "GET",
  "input_params": [
    "date",
    "hour",
    "filename"
  ],
  "backend": [
    {
      "url_pattern": "/images/{date}/{hour}/{filename}",
      "host": "https://data_processor:8080"
    }
  ]
}
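For intuition, a gateway-only 404 usually means the incoming path isn't matching the endpoint pattern before the backend is ever reached. Here's a conceptual Python sketch of the substitution a gateway performs (not KrakenD internals; `ENDPOINT`, `route`, and the plain-HTTP host are illustrative assumptions):

```python
# Conceptual sketch (not KrakenD internals) of how a gateway maps an
# incoming request path onto the backend URL via {placeholder} patterns.
import re

ENDPOINT = "/api/images/{date}/{hour}/{filename}"
BACKEND_PATTERN = "/images/{date}/{hour}/{filename}"
BACKEND_HOST = "http://data_processor:8080"  # assuming plain HTTP inside the network

def route(request_path: str) -> str:
    """Extract placeholder values from the request path, then build the backend URL."""
    # Turn the endpoint pattern into a capturing regex: {date} -> (?P<date>[^/]+)
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", ENDPOINT)
    match = re.fullmatch(pattern, request_path)
    if match is None:
        # A mismatch here is what surfaces as a 404 at the gateway
        raise LookupError(f"no endpoint matches {request_path}")
    return BACKEND_HOST + BACKEND_PATTERN.format(**match.groupdict())
```

If the placeholders declared in `endpoint` and `url_pattern` don't line up exactly (spelling, count, braces), the substitution fails and the gateway 404s even though the backend is fine. It may also be worth confirming whether the Flask container actually serves `https://` or only plain HTTP.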
https://redd.it/1mc9nq4
@r_devops