Reddit DevOps
270 subscribers
5 photos
31K links
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Your blue-green deployment approach

Is anyone here using AWS CDK to do blue-green deployments via self-service CI/CD? If so, how are you doing it? I was thinking about the CloudFormation state of the resources that have already been deployed; how would it do blue-green in that case? Also, are you happy you used AWS CDK to build your automated CI/CD pipeline?

Or maybe I should be open to ideas other than AWS CDK, Terraform, or OpenTofu. How did you build your automated CI/CD pipeline? How are your developers using it to deploy their resources?

https://redd.it/1i1i3ja
@r_devops
Need help with a DePIN-powered server uptime manager

For a while, we’ve been developing a DePIN-powered uptime monitoring tool designed to potentially handle data from millions of devices. Our current infrastructure monitoring and uptime management service, Checkmate, is evolving to include DePIN integration. This will allow users to burn tokens to access data from the UpRock DePIN network.

This is currently how it works under the hood:

- Connect your wallet

- Select the server you want to monitor

- Choose a geographic focus (specific cities, countries, or entire continents) for Checkmate to send ping messages

While managing large volumes of data isn’t an issue at this stage, visualization remains a challenge. We’ve implemented MapLibre to display the data, giving users the flexibility to send one-off ping requests to the DePIN network or schedule continuous checks (e.g., every minute).
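Under the hood, the distinction between a one-off check and a scheduled check can be sketched in a few lines of stdlib Python. Note this is an illustrative sketch, not the tool's actual code: `tcp_ping` is a hypothetical stand-in for the network probes (it measures TCP connect time rather than sending ICMP, which requires raw sockets):

```python
import socket
import time

def tcp_ping(host: str, port: int = 443, timeout: float = 2.0):
    """One-off check: return TCP connect time in milliseconds, or None if unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return None

def scheduled_checks(host: str, interval_s: float, rounds: int):
    """Continuous mode: repeat the one-off check on a fixed interval (e.g. every minute)."""
    results = []
    for _ in range(rounds):
        results.append(tcp_ping(host))
        time.sleep(interval_s)
    return results
```

In the real system the per-probe results would carry the probe's geography, which is what feeds the map visualization described below.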

Given the novelty of this concept (similar to RIPE Atlas), visualizations will play a critical role for admins. Here's what we can currently offer on the dashboard:

- Node distribution on a map: Visualize the number of nodes per country.

- Selective probing: Choose probes directly on the map.

- Probe details: View all probes selected for a specific server.

- One-off ping tests: Perform immediate connectivity checks.

I need some feedback on how to move ahead. Since we are just a few weeks away from the general release, it would be great if I could get some thoughts. We’re considering whether this is the right balance of features or if adjustments are needed.

My immediate questions would be:

- If you had access to a global DePIN network for server monitoring, what would you prioritize seeing on the dashboard?

- Would you be interested in historical logs, e.g., access logs going back to a specific point in time?

- Would you want to customize the size of the packets being sent?

There will probably be more to come, but I would like to start with a small UI feature set initially.

https://redd.it/1i1jzck
@r_devops
Salary depression

I’m a lead/staff SRE/DevOps practitioner currently on the market. Is it just me, or are companies in the US trying really hard to drive salaries down? I’ve seen on-call lead engineer roles advertised as “max 120k”, and I talked to someone today who hadn’t advertised a salary but whose max was 140k for a lead SRE with 10+ years of experience in a senior role.

Are people actually taking these salaries?

https://redd.it/1i1mzs4
@r_devops
Full-Time DevOps also doing contracting gigs?

Hi all,

I’m currently a full-time DevOps engineer. I enjoy what I do at my current employer, have great management, and don’t want to leave. However, I would like to earn more by potentially finding DevOps related contract jobs to do part-time. If any of you out there are doing this, are there any apps or resources you could point me to? Thanks in advance.

https://redd.it/1i1mkqm
@r_devops
Does Palantir's Apollo offer any real value?

Does Palantir's Apollo offer any real value? It looks and smells like a scam, but it's hard to tell. What do you think about it?

https://redd.it/1i1ofh9
@r_devops
Introducing Whispr: A DevOps tool to fetch secure vault secrets Just-In-Time for Apps

Hi DevOps community, let me introduce an exciting tool we created at Cybrota.

Whispr (pronounced "whisper") is an open-source tool that fetches vault secrets (AWS, Azure, or GCP) and injects them straight into your app's environment, either as environment variables or as STDIN args. This is very handy for keeping your `.env` file free of plain-text secrets and fetching them on demand for local/CI app development. Because it stores nothing, it mitigates attacks like stolen credentials.

All it takes is:

`pip install whispr`

How does it work?

1. Place an empty `.env` file in your project and let Whispr fetch the corresponding secrets from a connected vault, injecting the values into your program's environment. All you need to do is run:

```sh
$ whispr run 'your_command_with_args'
```

2. Whispr uses your existing vault's authentication (IAM) to securely fetch secrets. So no new auth mechanisms are required.

3. In addition, Whispr comes with handy utilities to quickly peek at a secret (vault-agnostic) or generate a crypto-safe random sequence for rotating secrets.

4. If you want to inject secrets into an app's environment programmatically (without `run`), the whispr package provides an elegant API.

Here is the GitHub project: https://github.com/cybrota/whispr

The tool currently attracts 2K downloads per month, and various enterprise teams are already using it to set up safe, authorized pre-commit hooks and standardize local app development.

The project itself follows security best practices: code scanning, no shell use when launching the app, PyPI verified attestations for released packages, etc.
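The fetch-and-inject pattern described above can be sketched with the stdlib alone. This is not Whispr's actual API; `get_secrets` is a hypothetical stand-in for the real vault SDK calls, and the no-shell launch mirrors the practice mentioned above:

```python
import os
import shlex
import subprocess
import sys

def get_secrets() -> dict:
    """Hypothetical stand-in for a vault fetch (the real tool calls the AWS/Azure/GCP SDKs)."""
    return {"DB_PASSWORD": "example-value"}

def run_with_secrets(command: str) -> int:
    """Merge fetched secrets into the child environment and run the command without a shell."""
    env = {**os.environ, **get_secrets()}
    # shlex.split produces an argv list, so no shell ever sees the command string.
    return subprocess.run(shlex.split(command), env=env).returncode

# The child process sees DB_PASSWORD, but nothing is ever written to .env or disk.
code = run_with_secrets(
    f"{sys.executable} -c \"import os; assert 'DB_PASSWORD' in os.environ\""
)
```

The secret exists only in the child's process environment for the lifetime of the run, which is the property that keeps `.env` files free of plain text.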

I would love to hear your feedback about possible improvements, criticism, and suggestions! I hope it will show up in your workflows soon!

https://redd.it/1i1qffo
@r_devops
Secured an Apple DevOps Interview

Hey everyone, I recently landed an interview for a DevOps engineering position. I’ve mostly done CloudOps/DevOps work in AWS (4 years), with some network admin/support work (2.5 years) earlier in my career.


This role seems to focus more on KVM, Xen, containers, enterprise Linux, Ansible (with Python and Bash, obviously), and telemetry tools such as Prometheus and Alertmanager. I'm looking for help with a preparation plan if someone has gone through a similar interview process. Any advice or tips would be great!

https://redd.it/1i1okno
@r_devops
My CAPA Experience

Disclaimer: This story was written by one of our employees



I recently earned my CAPA certification and wanted to share my experience.



For preparation, I took the DevOps and Workflow Management with Argo course (LFS256). While the course taught me a lot about the Argo project and how it works, I feel like it didn’t cover everything on the exam. Out of 60 questions, at least 10 caught me off guard because they covered topics I had never encountered before.



If I were to take the exam again, I’d definitely read through the entire documentation for each Argo project and focus on the details. The course links some parts of the docs, but in hindsight, that wasn’t enough.



Comparing this to my experience with the CKA exam (which I passed about 18 months ago), the prep for the CKA felt tougher, even though I had great study resources. That said, I walked away from the CKA feeling confident I’d passed, while with CAPA, I was genuinely unsure and thought I might need a retake.



I’m not sure if my struggle with CAPA was because I hate multiple-choice exams, put less effort into prep, didn’t have the right materials, or was surprised by some questions, but for me, CAPA felt harder.



Has anyone done the CAPA exam? Can you compare it to some other CNCF certification exams?

https://redd.it/1i1vuyp
@r_devops
Feedback for OneUptime: Open Source Monitoring and Observability Platform

We're building an open source observability platform - OneUptime (https://oneuptime.com). Think of it as your open-source alternative to Datadog, NewRelic, PagerDuty, and Incident.io—100% FOSS and Apache Licensed.

Already using OneUptime? Huge thanks! We’d love to hear your feedback.

Not on board yet? We’re curious why and eager to know how we can better serve your needs. What features would you like to see implemented? We listen to this community very closely and will ship updates for you all.

Looking forward to hearing your thoughts and feedback!

https://redd.it/1i1xa5y
@r_devops
Any Alternative to TEAMS for AWS Identity Center

https://aws-samples.github.io/iam-identity-center-team/
Is there any alternative solution like TEAM that can grant elevated access, specifically for the management (master) account?

https://redd.it/1i1y8w1
@r_devops
A Small Tool I Built for Faster Feedback: cfex

Hi everyone,
As a developer, I noticed that startups and small teams often face delays when sharing applications for feedback or demos due to the hassle of setting up staging environments. To solve this, I built cfex, a small CLI tool that lets you go live instantly.

With just one command:

cfex api.yourdomain.com:8080

Your app is live at https://api.yourdomain.com, with HTTPS and HTTP/3 enabled by default. It’s perfect for quick iterations, testing, or showing progress to stakeholders.

The tool is similar to ngrok but built on top of cloudflared, leveraging Cloudflare's robust infrastructure.

The code is open source: https://github.com/muthuishere/cfex-cli
More details: https://muthuishere.medium.com/one-command-to-go-live-with-cfex-135d74d81b45

I’d love to hear your feedback or ideas for improving it. If you think it could help your team or project, feel free to give it a try!



https://redd.it/1i1zqmn
@r_devops
Biotech pros, dive into our Apache NiFi demo for large-scale data automation.

We created a demo video showing how Apache NiFi can be used. The video doesn't explicitly show data or workflows specific to biotech, but it does show NiFi functionality.

The reason for this post is that I'm looking to see whether other biotech businesses are running into data ingestion limitations and need ingestion solutions at scale.

Below are our case studies and the link to the demo video. I would love feedback on how effective this solution is for biotech businesses.

Case Studies: https://dasnuve.com/case-studies

NiFi Workflows Demo: https://videoshare.dasnuve.com/video/nifi-workflows-demo

https://redd.it/1i206yc
@r_devops
Database DevOps survey (<10min): Five chances to win $100 for submitting your responses!

Hello to our friends in r/devops – the database DevOps community eagerly seeks your input on the state, needs, and opportunities of database change management workflows in 2025. 

If you’re on a developer, database, DevOps, platform, or data team, we want to hear from you! Your participation helps make modern pipelines faster, easier, safer, and better integrated.

We’re also giving away five $100 gift cards (or charitable donations) to survey respondents. Plus, you’ll get early access to the report containing the survey’s findings and perspectives from industry experts.

Submit your responses by February 7, 2025, and help shape database workflows that support modern opportunities and challenges like:

Cloud ecosystems
Platform engineering
AI/ML workloads
Security and compliance

Take the 2025 Database DevOps Adoption & Innovation Survey: https://hubs.li/Q0324Mk40 

https://redd.it/1i22arp
@r_devops
Dependency management organization wide

Due to security regulations and a recent implementation of SCA, I want to limit my organization's use of external libs. The idea is to maintain an artifact repository containing not only builds of internal libs but also external ones, and to restrict deployments to using only those libs. This would give us more control over our dependencies and their versions, so we don't introduce vulnerabilities, or even supply chain attacks from recent upstream commits, into our stack.

First of all, do you think that's a good idea? And second: is there a good way to implement this, particularly the restriction part?
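For Python, one common way to implement the restriction is to point pip exclusively at the curated repository via a managed config file. The URL below is a hypothetical internal index (e.g. an Artifactory, Nexus, or devpi instance proxying an allow-list of approved packages):

```ini
; pip.conf (pip.ini on Windows), rolled out to all dev machines and build agents.
; "artifacts.internal.example" is a placeholder for your internal repository host.
[global]
index-url = https://artifacts.internal.example/api/pypi/approved/simple
```

Config alone is bypassable with `--index-url` on the command line, so teams that enforce this typically pair it with egress rules blocking pypi.org (and npmjs.org, etc.) from build environments, making the internal proxy the only route out.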

https://redd.it/1i25qrs
@r_devops
Job interview take home assignment

This company basically has me implementing a single-node cluster locally and doing the entire write-up and documentation in one day, along with READMEs for each tool (e.g., Helm, Terraform, the overall repo).

They sent me a few skeleton dirs and files, and everything was practically blank/empty save for some empty templating and Helm configs (thank God I didn't have to make that up too). I have to add and test all dependencies with versioning, and create and configure all the necessary Terraform files.


Is this normal? This is the last step in the interview process. Although I’m finding it fun and understand the “why”, it’s just really taking quite a bit of time. I have been interviewing with other companies and none of them requested anything similar.

https://redd.it/1i2680q
@r_devops
Should My Startup Use Cloud Services or Local Equipment for Hosting?

We’re a small startup preparing to launch our web application. Our outsourcing partner recommends purchasing local equipment for hosting, but we’re considering cloud services like AWS for flexibility and easier maintenance.

Here are the key factors:

1. Early stage with unpredictable resource usage.
2. Limited budget but need scalability.
3. We want to minimize costs without compromising service quality.

What approach would you recommend for startups in this situation? Are cloud services generally more cost-effective and scalable in the long term, or should we start with local equipment and later migrate?
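One way to frame the cloud-vs-hardware question is a simple break-even calculation: how many months of cloud savings it takes to pay back the upfront hardware spend. All the numbers below are illustrative assumptions, not real quotes:

```python
def months_to_break_even(hardware_cost: float, monthly_colo: float, monthly_cloud: float) -> float:
    """Months after which owned hardware becomes cheaper than cloud.
    monthly_colo covers ongoing costs of self-hosting (colocation, power, maintenance)."""
    monthly_saving = monthly_cloud - monthly_colo
    if monthly_saving <= 0:
        # Cloud is already the cheaper monthly option; hardware never pays off.
        return float("inf")
    return hardware_cost / monthly_saving

# Toy numbers: $12,000 of servers vs an $800/mo cloud bill and $300/mo colo/power.
payback = months_to_break_even(12_000, 300, 800)  # 24.0 months
```

The calculation ignores the factors that usually dominate at an early stage, such as unpredictable load (factor 1 above) and the staff time needed to run hardware, which is why many startups accept a higher steady-state cloud bill until usage stabilizes.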

Any advice or shared experiences would be greatly appreciated!

https://redd.it/1i2abdp
@r_devops
Are YAMLs and Bash enough for CI/CD?

I’ve been doing CI/CD for a while, from Jenkins to GitLab CI and GitHub Actions. Recently, with this whole platform engineering approach, I’ve started feeling stuck with the CI platforms. And it’s not just CI/CD; it’s also the automation for resource orchestration, ephemeral environments, spinning up new services, custom tests, and so on.

We’re building increasingly complex automations, and sometimes plain Bash just isn’t enough. I really love Bash, but we all know how hard it can be to develop, debug, test, and reuse code with it. On top of that, we often end up creating custom images every time we need something like jq, yq, or docker.

I’m considering introducing a programming language like Python or Go for these more complex automations. The idea would be to use the CI platform just to define when and where scripts run, keeping the logic portable.
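That split (the CI platform decides when and where, a script owns the logic) might look like the minimal Python sketch below. The registry host and the `docker push` step are hypothetical stand-ins for whatever the real automation does:

```python
import argparse
import subprocess

def image_tag(service: str, sha: str) -> str:
    """Pure logic like this is trivially unit-testable, unlike string-mangling buried in Bash.
    The registry host is a hypothetical example."""
    return f"registry.example.com/{service}:{sha[:7]}"

def main(argv=None) -> int:
    parser = argparse.ArgumentParser(
        description="Portable deploy step; the CI YAML only decides when and where to run it."
    )
    parser.add_argument("service")
    parser.add_argument("sha")
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args(argv)

    tag = image_tag(args.service, args.sha)
    if args.dry_run:
        print(tag)
        return 0
    # A real run shells out once, with an argv list (no shell string to quote or escape).
    return subprocess.run(["docker", "push", tag]).returncode

rc = main(["api", "abcdef1234", "--dry-run"])
```

The CI step then reduces to something like `run: python deploy.py "$SERVICE" "$GIT_SHA"`, so the same script runs locally, in GitHub Actions, or in GitLab CI without changes, which is the portability property in question.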

I’ve looked into tools like Dagger.io, but I’m hesitant to add another dependency when I’m trying to reduce them. I also know Humanitec has a "Platform Orchestrator" to handle this kind of complex logic, but again, that's another commitment.

Have you implemented something like that? How was it? Was it worth it?

Do you recommend full Python/Go/etc. scripting, or mixing in some Bash for less complex automations?

How much do you value portability in your automations/CI scripts?

Is this being discussed somewhere? 

Would love to hear your thoughts!

https://redd.it/1i29vyv
@r_devops
Share your story about how you manage to automate your job in your company

I’ve always been curious about how people manage to automate everything and end up with so much free time that they can basically take a second job without worrying about their system being unstable. How do you do it?

https://redd.it/1i2dnpk
@r_devops
Has anyone checked out Pagerduty’s AIOps?

Has anyone used this or taken a deeper dive into it? Curious if it’s legit or if it’s too good to be true?

https://redd.it/1i2dvyn
@r_devops
Docker: still worth relearning?

I'm not trying to make myself super marketable, but I also don't want to learn a dying technology. I knew basic Docker skills about 10 years ago (give or take), and I want to spin up some basic web apps, partly for the fun of it. Is Docker worth investing my time in, or should I leverage something else to handle my infra needs?

https://redd.it/1i2g0mm
@r_devops