How to Deploy a Containerized Backend for Free?
Howdy!! I’m working on a small charity project for a client and I’m trying to stay entirely within the free tier. The backend is built with microservices and includes:
- A Redis container
- A PostgreSQL container
- An API Gateway using Spring Cloud
- Around six microservices for business logic
In terms of infrastructure, demand will be modest: around 100 users are expected. So I was planning to use Oracle Cloud’s Free Tier VMs, install Docker, and run all the services there.
Additionally, I’m considering running Prometheus in a separate VM for monitoring and logging.
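For concreteness, a stack like the one described could start as a single Docker Compose file on one free-tier VM. Everything below (service names, images, build paths) is a placeholder sketch, not the poster's actual setup:

```yaml
# Hypothetical compose file for the described stack on a single VM.
services:
  redis:
    image: redis:7-alpine
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: change-me   # move to a secret in practice
    volumes:
      - pgdata:/var/lib/postgresql/data
  gateway:
    build: ./gateway                 # Spring Cloud Gateway module
    ports:
      - "80:8080"
    depends_on: [redis, postgres]
  orders-service:                    # one of the ~6 business services
    build: ./orders
    depends_on: [postgres]
volumes:
  pgdata:
```

At ~100 users, running all of this on one Ampere free-tier VM is plausible; the main constraint is RAM for six JVM-based services, not CPU.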
Are there better (still free) alternatives you'd recommend for containerized deployments?
https://redd.it/1lj2yrs
@r_devops
Built an AI agent for adaptive security scanning - lessons for infrastructure automation
Traditional security scanners are the worst kind of infrastructure tooling - rigid, fragile, and break when you change one config. Built a ReAct agent that reasons through targets instead of following predefined playbooks.
The infrastructure problem: Security scanning tools are like bad Ansible playbooks - they assume everything stays the same. Change a port, modify a service, update an endpoint - they fail. Modern infrastructure needs adaptive automation.
What this agent does:
- Reasons about what to probe next based on discovered services
- Adapts its scanning strategy when it encounters unexpected responses
- Chains multi-step discovery (finds service → identifies version → tests specific vulnerabilities)
- No hardcoded scan sequences: it decides what's worth checking
Implementation challenges that apply to any infrastructure automation:
- Non-deterministic tool execution (LLMs sometimes get lazy and quit early)
- Context management in multi-step workflows
- Balancing automation with reliable execution patterns
- Token cost control in long-running processes
Results: Found SQL injection, directory traversal, and auth bypasses through adaptive reasoning. Discovered attack vectors that rigid scanners miss because they can actually think through the target.
Infrastructure automation insights:
- LLMs can make decisions that are impossible to code traditionally
- Need hybrid control: LLM reasoning + deterministic flow control
- State management is crucial for complex multi-step operations
- Adaptive logic beats rigid playbooks for unknown environments
Think of it as Infrastructure as Reasoning instead of Infrastructure as Code. Could apply similar patterns to any ops automation that needs to adapt to changing environments.
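The "hybrid control" idea above can be sketched in a few lines: the LLM picks the next action, while a deterministic loop enforces a step budget and a tool whitelist. This is my own minimal illustration, not the author's implementation; `fake_decide` stands in for a real LLM call.

```python
# Hybrid control sketch: LLM proposes, deterministic code disposes.
MAX_STEPS = 5  # hard upper bound, regardless of what the model wants

def run_agent(decide, tools, target):
    """decide(state) -> (tool_name, arg); returns the final state."""
    state = {"target": target, "findings": []}
    for _ in range(MAX_STEPS):            # deterministic flow control
        tool, arg = decide(state)
        if tool == "stop" or tool not in tools:
            break                         # whitelist enforcement
        state["findings"].append(tools[tool](arg))
    return state

def fake_decide(state):
    # Stand-in for an LLM call: probe once, then stop.
    return ("probe", state["target"]) if not state["findings"] else ("stop", "done")

tools = {"probe": lambda host: f"open ports on {host}: 22, 443"}
result = run_agent(fake_decide, tools, "example.internal")
```

The point of the pattern is that the loop, the step cap, and the whitelist are ordinary code you can test, while only the `decide` step is non-deterministic.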
Technical implementation: https://vitaliihonchar.com/insights/how-to-build-react-agent
Anyone experimenting with LLM-based infrastructure automation? What patterns work for reliable execution in production environments?
https://redd.it/1lj4rjh
@r_devops
I’m starting my DevOps journey: what skills, tools, and real-world challenges should I focus on mastering?
Hi everyone!
I’m an engineering student / early-career professional interested in becoming a DevOps engineer. I don’t just want to study theory or pass certifications; I really want to master real-world skills, work on solid projects, and understand what DevOps looks like in production environments.
I have a few questions and I would love to hear from those with experience:
1) What tools, practices, and concepts did you find most important when working as a DevOps engineer in real-world jobs?
2) What challenges did you face that theory/certification didn’t prepare you for?
3) If you could go back and guide your beginner self, what would you focus on learning or practicing early?
4) What kind of projects (personal or in a lab) would actually make me job-ready?
5) What mistakes do DevOps beginners usually make that I should avoid?
I’m especially interested in AWS, CI/CD pipelines, Terraform, Docker/Kubernetes, and automation, but open to all advice!
Thanks so much for your time, looking forward to learning from your experience!
https://redd.it/1lj48nu
@r_devops
TOP 10 DevOps Tools in 2025: Based on 300 LinkedIn job posts
Hey folks,
Recently I was looking for a new job and got curious about which DevOps tools are actually in demand right now. Here's what I did:
- Analyzed 300 recent LinkedIn DevOps job posts
- Used AI to analyze the job descriptions and pull out the most-mentioned tools
- Cross-checked the results against my own experience
To be honest, I supplied the data and asked ChatGPT to write up the rest, so the data is mine but the write-up isn't. Still, I think it's quite useful.
1. GitHub Actions
2. Terraform
3. Kubernetes
4. ArgoCD
5. Docker
6. Jenkins
7. Prometheus
8. Ansible
9. Vault
10. Pulumi
Honorable mentions: GitLab CI/CD, Helm, Grafana, AWS CodePipeline.
If you want the full breakdown (and some honest pros/cons for each tool), I put together a full article here: https://prepare.sh/articles/devops-job-market-trends-2025
Would love to hear what tools your team is actually using, or if there’s anything you think should’ve made the list.
https://redd.it/1lj93a1
@r_devops
These 5 small Python projects actually help you learn basics
When I started learning Python, I kept bouncing between tutorials and still felt like I wasn’t actually learning.
I could write code when following along, but the second I tried to build something on my own… blank screen.
What finally helped was working on small, real projects. Nothing too complex. Just practical enough to build confidence and show me how Python works in real life.
Here are five that really helped me level up:
1. File sorter: organizes files in your Downloads folder by type. Taught me how to work with directories and conditionals.
2. Personal expense tracker: logs your spending and saves it to a CSV. Simple, but great for learning input handling and working with files.
3. Website uptime checker: pings a URL every few minutes and alerts you if it goes down. Helped me learn about requests, loops, and scheduling.
4. PDF merger: combines multiple PDF files into one. Surprisingly useful, and introduced me to working with external libraries.
5. Weather app: pulls live weather data from an API. This was my first experience using APIs and handling JSON.
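As an illustration, the first project on the list fits in about a dozen lines of standard library. The extension-to-folder mapping here is just an example:

```python
from pathlib import Path
import shutil

# Example mapping of file extensions to destination folder names.
FOLDERS = {".pdf": "Documents", ".jpg": "Images", ".png": "Images", ".zip": "Archives"}

def sort_downloads(downloads: Path) -> int:
    """Move files with known extensions into subfolders; return count moved."""
    moved = 0
    for item in downloads.iterdir():
        dest = FOLDERS.get(item.suffix.lower())
        if item.is_file() and dest:
            target = downloads / dest
            target.mkdir(exist_ok=True)           # create folder on first use
            shutil.move(str(item), str(target / item.name))
            moved += 1
    return moved
```

Unknown extensions are simply left in place, which is a good beginner exercise in conditionals doing the right thing by default.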
While I was working on these, I created a system in Notion to track what I was learning, keep project ideas organized, and make sure I was building skills that actually mattered.
I’ve cleaned it up and shared it as a free resource in case it helps anyone else who’s in that stuck phase I was in.
You can find it in my profile bio.
If you’ve got any other project ideas that helped you learn, I’d love to hear them. I’m always looking for new things to try.
https://redd.it/1ljant6
@r_devops
How to be the devops lead?
I recently joined a company as the DevOps lead. They have been running their infra on GitHub Actions and Azure Container Apps. While the deployment scripts themselves are a bit unwieldy, they do seem to work more or less. But the team lacks confidence in launching the product because they think the infra may not be up to the task.
I want to ask the folks here: what would your goal be in such a situation? Is there a platonic ideal of DevOps we should aim towards? Where should I dedicate my efforts? Basically, I have been asked to improve the reliability of the infrastructure and make it a more modern, flexible, easy-to-use, easy-to-rollback setup. I feel a bit lost because the space is too open, and I am not sure where to focus my attention or how to approach this systematically.
I would love to learn about your philosophy of approaching DevOps and the high-level concepts I should be aiming for.
https://redd.it/1ljchu8
@r_devops
Evaluated 15 SSO providers while scaling auth — here's what caught us off guard
We’re scaling auth for a multi-tenant SaaS product and needed to support enterprise SSO (SAML, OIDC, SCIM, etc.).
Expected it to be a quick eval, but ended up comparing 15+ providers: Okta, Auth0, WorkOS, FusionAuth, Ping, etc.
What surprised us:
- SCIM support isn’t always included (and pricing is all over the place)
- Admin UX + branding controls vary widely
- Some dev SDKs were great, others were... painful
- Session control and audit logs aren’t as standard as you'd think
We documented it all in a side-by-side matrix (happy to share if useful), but I’m curious:
If you've implemented SSO or CIAM recently — what were your dealbreakers?
Also, did you self-host (like Keycloak) or go fully managed?
Would love to hear what mattered most to this community.
https://redd.it/1ljdsin
@r_devops
how do you manage scheduled jobs inside your cluster's containers?
I've been asked to develop/advise on scheduling a job that runs in the app's backend.
That means I can't just run a cron-job container, since it can't execute code inside the backend container; it's its own container.
So I could use the schedule library (Python), create a loop that listens to SQS, or something similar.
But the problem is that listening for any cron/time-based event requires an infinite loop.
That's a wtf moment to me. I thought: if the container already has the friggin date, why can't it simply run the job according to its own clock???
But no, it needs to count the seconds, the minutes, whatever, to run at the appropriate time.
I might be totally uninformed, so I'd appreciate you pointing me in the right direction.
EDIT:
The reason I don't want an infinite loop is that it sounds way too risky to put in a production env, can create unnecessary load, and in general doesn't sound like good practice, unless you really know how to write an efficient loop with all the error handling of an expert.
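For what it's worth, the "infinite loop" that in-process schedulers use is usually one long sleep per run, not a busy wait, so the load concern is mostly moot. A stdlib-only sketch (function names are my own, not from any library):

```python
import time
from datetime import datetime, timedelta

def seconds_until(hour, minute):
    """Seconds from now until the next wall-clock occurrence of hour:minute."""
    now = datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)   # already passed today, so tomorrow
    return (target - now).total_seconds()

def run_daily(hour, minute, job):
    """The feared 'infinite loop': one sleep per day, near-zero CPU."""
    while True:
        time.sleep(seconds_until(hour, minute))
        try:
            job()
        except Exception as exc:      # never let one failure kill the loop
            print(f"job failed: {exc}")
```

That said, if the cluster is Kubernetes, a CronJob that calls an internal HTTP endpoint on the backend (rather than running its code) sidesteps the loop entirely.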
https://redd.it/1lja1df
@r_devops
Career help
I want to transition my career from Windows support L1 to Azure DevOps. I'm also interested in exploring a career in Azure with OpenShift. Could you please guide me on the right learning path to get started?
https://redd.it/1ljffx6
@r_devops
Public Nexus repository for granting file access to third-parties
Forgive my complete ignorance on this topic, but I am an account manager at a company and am being asked by one of our customers to utilize a Nexus Repository in order to send some installer files of our application.
I'm trying to lighten the load on our dev team and learn some of this myself, but am having a hard time figuring out how Nexus could be utilized as a way to share our exe's and such.
Does anybody have familiarity with this? Are there any specific vendors that you would recommend? Reach out to Sonatype sales folks directly?
https://redd.it/1lji7q0
@r_devops
Networking Across AWS and Azure
I have an ECS app running in private subnets on AWS. To avoid NAT gateway costs, I set up VPC endpoints for ECR and Secrets Manager access. Everything works great for AWS services.
Problem: I just realized my app also needs to connect to Azure PubSub, and obviously there's no VPC endpoint for that since it's not an AWS service.
Is there a way to make Azure Pubsub accessible from private subnets without a NAT gateway? Or should I just bite the bullet on NAT costs?
Any advice appreciated!
https://redd.it/1ljk0ye
@r_devops
Hashicorp 3rd Party Support Services?
Hi Guys,
We're just starting out using HashiCorp Nomad, Consul, Vault (or OpenBao), and Packer, all open-source variants.
We've got some technical questions that aren't exactly covered in the docs, and there aren't many resources online (especially regarding Nomad and Consul).
Does anyone know of any third-party company providing HashiCorp support services? We don't have deep pockets, but we are open to subscribing to a support retainer or purchasing a number of hours.
It's really for consultation, troubleshooting, scenario-specific questions, and solutioning. We're not expecting anyone to write anything for us. Also, speaking to someone with operational experience with these tools would really help.
Thank you!
https://redd.it/1ljdult
@r_devops
Tool Release: Kube Composer – Visually Build & Prototype Kubernetes Configs (198⭐️ on GitHub)
Hey 👋
I’ve been working on an open-source project called Kube Composer — it’s a visual editor for designing and prototyping Kubernetes configurations without writing raw YAML.
🚀 What’s it for?
• Quickly scaffold Kubernetes resources for apps and microservices
• Visualize relationships between objects (e.g., Services, Deployments, Ingress)
• Export production-ready YAML configs
• Great for platform teams, internal developer platforms (IDPs), and onboarding
🧑💻 New update just dropped:
• Cleaner and more intuitive UI
• Layout & performance improvements
• Usability fixes from real-world feedback
⭐ We just passed 198 GitHub stars!
Appreciate all the support from the community — your stars, feedback, and issues have helped shape the direction.
👷♀️ Looking for collaborators:
If you’re into Kubernetes, GitOps, or building internal tools, I’d love your feedback or help on shaping features like CRD support, Helm integration, and OpenTelemetry flow mapping.
🔗 GitHub: https://github.com/same7ammar/kube-composer
Would love to hear how this could fit into your workflows or dev environments. Always open to suggestions and PRs 🙌
https://redd.it/1ljj9a7
@r_devops
Managing browser-heavy CI/CD tests without heavy containers: any slick setups?
My CI pipeline relies heavily on browser-based end-to-end tests (OAuth flows, payment redirects, multi-session scenarios). Containers and headless browsers work, but they're resource-intensive and sometimes inaccurate due to fingerprint differences.
Has anyone used tools that provide isolated, local browser sessions you can script or profile-test with minimal overhead?
https://redd.it/1ljw1a6
@r_devops
[Feedback Wanted] Container Platform Focused on Resource Efficiency, Simplicity, and Speed
Hey r/devops! I'm working on a cloud container platform and would love your thoughts and feedback on the concept. The objective is to make container deployment simpler while maximizing resource efficiency. My research shows that only 13% of provisioned cloud resources are actually utilized (I used to work for AWS and can verify this number), so if we pack containers together, we can get much higher utilization. I'm building a platform that will attempt to maintain ~80% node utilization, allowing for 20% burst capacity without moving any workloads around. If a node does step into the high-pressure zone, we will move less-active pods to different nodes so the very active nodes keep enough headroom to scale up.
My main motivation was wanting to make edits to open-source projects and deploy those edits to production without either self-hosting or using something like ECS or EKS, which have a lot of overhead and are very expensive... Now I see that Cloudflare JUST came out with their own container hosting solution after I had already started working on this, but I don't think a little friendly competition ever hurt anyone!
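The ~80% target with a higher pressure threshold reads like a simple bin-packing rule. A toy sketch of the placement side (thresholds and node tuples are my own illustration, not the platform's code):

```python
TARGET = 0.80    # steady-state utilization ceiling for new placements
PRESSURE = 0.90  # above this, evict least-active pods (eviction not shown)

def pick_node(nodes, pod_cpu):
    """Best-fit placement: choose the fullest node that stays under TARGET.

    nodes: list of (name, used_cpu, capacity_cpu) tuples.
    Returns the chosen node name, or None if no node fits.
    """
    best = None
    for name, used, cap in nodes:
        after = (used + pod_cpu) / cap
        # Prefer the highest post-placement utilization that still fits,
        # which packs nodes tightly and leaves others free to drain.
        if after <= TARGET and (best is None or after > best[1]):
            best = (name, after)
    return best[0] if best else None
```

Best-fit (rather than spreading) is what actually drives utilization up; the hard part in practice is the eviction path when a packed node crosses PRESSURE.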
I also wanted to build something that is faster than commodity AWS or Digital Ocean servers without giving up durability so I am looking to use physical servers with the latest CPUs, full refresh every 3 years (easy since we run containers!), and RAID 1 NVMe drives to power all the containers. The node's persistent volume, stored on the local NVMe drive, will be replicated asynchronously to replica node(s) and allow for fast failover. No more of this EBS powering our databases... Too slow.
Key Technical Features:
* True resource-based billing (per-second, pay for actual usage)
* Pod live migration and scale down to ZERO usage using [zeropod](https://github.com/ctrox/zeropod)
* Local NVMe storage (RAID 1) with cross-node backups via [piraeus](https://piraeus.io/)
* Zero vendor lock-in (standard Docker containers)
* Automatic HTTPS through Cloudflare.
* Support for port forwarding raw TCP ports with additional TLS certificate generated for you.
Core Technical Goals:
1. Deploy any Docker image within seconds.
2. Deploy Docker containers from the CLI by just pushing to our Docker registry (not real yet): `docker push ctcr.io/someuser/container:dev`
3. Cache common base images (redis, postgres, etc.) on nodes.
4. Support failover between regions/providers.
Container Selling Points:
* No VM overhead - containers use ~100MB instead of 4GB per app
* Fast cold starts and scaling - containers take seconds to start vs servers which take minutes
* No cloud vendor lock-in like AWS Lambda
* Simple pricing based on actual resource usage
* Focus on environmental impact through efficient resource usage
Questions for the Community:
1. Has anyone implemented similar container migration strategies? What challenges did you face?
2. Thoughts on using Piraeus + ZeroPod for this use case?
3. What issues do you foresee with the automated migration approach?
4. Any suggestions for improving the architecture?
5. What features would make this compelling for your use cases?
I'd really appreciate any feedback, suggestions, or concerns from the community. Thanks in advance!
https://redd.it/1ljvhjs
@r_devops
DevOps Roadmap
Hello guys, I really want to migrate to DevOps, but I'm struggling to find a job. Here is some of my background: I've been in the IT field for 4+ years, mainly dealing with networking equipment, Linux servers, firewalls, and IPS. I have self-studied Python and have also worked in a home environment with Git, Docker, and K8s (obviously not a pro). Any tips at this point would be appreciated, and if you want to share the story of how you became a DevOps engineer, feel free to do so. Thanks in advance!
https://redd.it/1ljy9lq
@r_devops
Codeline, baseline, and mainline confusion, especially between codeline and baseline
Mainline seems to be the line that will be released, but codeline and baseline sound similar. What is the difference? Context: git-flow workflow.
https://redd.it/1ljz6w8
@r_devops
DevOps/SE Starter Guide
Business Management graduate here, working at a tech consulting company in the UK and looking to get into Project Management. My company does a lot of software engineering and DevOps, but my technical background is very limited, so I understand the financial aspects of projects but not the service delivery side.
Does anybody have recommendations for free courses (or even YouTube videos) that start from the beginning? Most that I have tried assume you have some prior knowledge, of which I have basically none. Thanks!
https://redd.it/1ljzzsk
@r_devops
Am I literally the ONLY person who's hit this ArgoCD + Crossplane silent failure issue??
Okay, this is driving me absolutely insane. Just spent the better part of a week debugging what I can only describe as the most frustrating GitOps issue I've ever encountered.
The problem: ArgoCD showing resources as "Healthy" and "Synced" while Crossplane is ACTIVELY FAILING to provision AWS resources. Like, completely failing. AWS throwing 400 errors left and right, but ArgoCD? "Everything's fine! 🔥 This is fine! 🔥"
I'm talking about Lambda functions not updating, RDS instances stuck in limbo, IAM roles not getting created - all while our beautiful green ArgoCD dashboard mocks us with its lies.
The really weird part: I've been Googling this for DAYS and I'm finding basically NOTHING. Zero blog posts, zero Stack Overflow questions, zero GitHub issues that directly address this. It's like I'm living in some alternate dimension where I'm the only person running ArgoCD with Crossplane who's noticed that the health checks are fundamentally broken.
The issue is in the health check Lua logic - it processes status conditions in array order, so if `Ready: True` comes before `Synced: False` in the conditions array, ArgoCD just says "cool, we're healthy!" and completely ignores the fact that your cloud resources are on fire.
Seriously though - has NOBODY else hit this?
Are you all just... not using health checks with Crossplane?
Is everyone just monitoring AWS directly and ignoring ArgoCD status?
Am I the unluckiest person alive?
Did I stumble into some cursed configuration that nobody else uses?
I fixed it by reordering the condition checks (error conditions first, then healthy conditions), but I'm genuinely baffled that this isn't a known issue. The default Crossplane health checks that everyone copies around have this exact problem.
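The reordering fix is easy to see in isolation. Here's a small Python sketch of the ordering logic only - the real ArgoCD health checks are written in Lua inside `argocd-cm`, and the condition names below are just the `Ready`/`Synced` conditions from the post:

```python
def health_first_match(conditions):
    # Mimics the buggy check: returns on the first matching condition
    # in array order, so an early Ready: True shadows a later Synced: False.
    for c in conditions:
        if c["type"] == "Ready" and c["status"] == "True":
            return "Healthy"
        if c["type"] == "Synced" and c["status"] == "False":
            return "Degraded"
    return "Progressing"

def health_errors_first(conditions):
    # The fix: scan the whole array for error conditions before
    # concluding the resource is healthy.
    if any(c["type"] == "Synced" and c["status"] == "False" for c in conditions):
        return "Degraded"
    if any(c["type"] == "Ready" and c["status"] == "True" for c in conditions):
        return "Healthy"
    return "Progressing"

conds = [
    {"type": "Ready", "status": "True"},
    {"type": "Synced", "status": "False"},  # resource is actually failing
]
print(health_first_match(conds))   # Healthy  (wrong - the failure is hidden)
print(health_errors_first(conds))  # Degraded (correct)
```

The same order-independent structure carries over to the Lua script: evaluate every condition that should mark the resource Degraded across the whole array before ever returning Healthy.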
Either I'm missing something obvious, or the entire GitOps community is living in blissful ignorance of their deployments silently failing.
Please tell me I'm not alone here. PLEASE.
UPDATE: Fine, I wrote up the technical details and solution here because apparently I'm pioneering uncharted DevOps territory over here. If even ONE person hits this after me, at least there will be a record of it existing.
https://redd.it/1lk0wgz
@r_devops
nbuild, Yet Another CI/CD
nbuild in action: https://nappgui.com/builds/en/builds/r6349.html
Oriented to C/C++ projects based on CMake.
Written in ANSI C90 with NAppGUI-SDK.
Runs as a command line tool: `nbuild -n network.json -w workflow.json`
Works on a local network, no cloud bills.
Monolithic design, no scripting.
Splits large build jobs into priority queues.
Threading. Multiple runners in parallel.
SSH is the only requirement on runners, apart from CMake and compilers.
Power on/off on demand. Supports VirtualBox, UTM, VMware, macOS bless.
Runners are preconfigured. No setup from scratch.
Supports legacy systems.
Generates HTML5/LaTeX/PDF project documentation with ndoc.
HTML5 build reports.
Open Source: https://github.com/frang75/nbuild
https://redd.it/1ljyfhz
@r_devops
🛡️ RELIAKIT TL-15 Open-Source Chaos + Healing Framework for Planet-Grade Infrastructure
Built for resilience engineers, platform teams, and SREs who want more than just monitoring — they want autonomous recovery.
Let me know what you think — would love your input and improvements!
🔗 GitHub again:
https://github.com/zebadiee/reliakit-tl15
🤝 Looking For
• Feedback on architecture
• Contributors to test new zones
• Suggestions for AI drift detection features
• Adoption in real infrastructure setups
https://redd.it/1lk2lli
@r_devops