I got tired of writing boilerplate config parsers in C, so I built a zero-dependency schema-to-struct generator (cfgsafe)
Hey everyone,
Like a lot of you, I find dealing with application configuration in C to be a massive pain. You usually end up choosing between:
1. Pulling in a heavy library.
2. Using a generic INI parser that forces you to use string lookups (`hash_get("db.port")`) everywhere.
3. Writing a bunch of manual, brittle `strtol` and validation boilerplate.
I wanted something that gives me **strongly-typed structs** and **guarantees that my data is valid** before my core application logic even runs.
So I built **cfgsafe**. It’s a pure C99 code generator and parser.
You define your configuration shape in a tiny `.schema` file:
```
schema ServerConfig {
    service_name: string {
        min_length: 3
    }

    section database {
        host: string { default: "localhost", env: "DB_HOST" }
        port: int { range: 1..65535 }
    }

    use_tls: bool { default: false }
    cert: path {
        required_if: use_tls == true
        exists: true
    }
}
```
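For illustration, a `config.ini` that satisfies this schema might look like the following. Note this is my sketch, not from the cfgsafe docs: the exact INI section/key syntax the parser expects is an assumption, so check the README for the real format.

```ini
; hypothetical config.ini for the ServerConfig schema above
service_name = my-service        ; must be at least 3 chars

[database]
host = db.internal               ; overridden by DB_HOST if that env var is set
port = 5432                      ; must fall within 1..65535

use_tls = true
cert = /etc/ssl/certs/server.pem ; required because use_tls == true; must exist on disk
```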
Then you run my generator (`cfg-gen config.schema`). It spits out a **single-file STB-style C header** containing both your exact structs and the parsing implementation.
In your `main.c`, using it is completely native and completely safe:
```c
ServerConfig_t cfg;
cfg_error_t err;

// Loads the INI, applies ENV variables, and runs your validation checks
cfg_status_t status = ServerConfig_load(&cfg, "config.ini", &err);

if (status == CFG_SUCCESS) {
    // 100% type-safe. No void pointers. No manual parsing.
    printf("Starting %s on %s:%d\n",
           cfg.service_name,
           cfg.database.host,
           (int)cfg.database.port);

    ServerConfig_free(&cfg);
} else {
    // Gives you granular errors: e.g. "Field 'database.port' out of range"
    fprintf(stderr, "Startup error (%s): %s\n", err.field, err.message);
}
```
# Why I think it's cool:
* **Zero Dependencies:** No external regex engines or JSON libraries needed. The generated STB header is all you need.
* **Complex Validation Baked In:** Built-in support for numeric ranges (`1..100`), regex patterns, array lengths, cross-field conditional logic (`required_if`), and even checking if file paths actually exist on the system *during* parsing!
* **First-Class Env Variables:** If `DB_HOST` is set in the environment, it seamlessly overrides the INI file.
I’d love to get feedback from other C developers. Is this something you'd use in your projects? Are there config features I missed?
**Repo:** [https://github.com/aikoschurmann/cfgsafe](https://github.com/aikoschurmann/cfgsafe) *(Docs and examples are in the README!)*
https://redd.it/1ryup8z
@r_devops
Managing state of applications
I recently got a new job and I'm importing every cloud resource into IaC. Then I will just change the Terraform variables and deploy everything to prod (they don't have a prod yet).
There is Postgres and Keycloak deployed. I also think that I should manage Postgres databases and users in code via Ansible, and the same with Keycloak. I'm thinking of reducing the developers' permissions in Postgres and Keycloak, so the only way they can create things is through PRs to the Ansible repo with my review.
I want to double-check whether this has any downsides or is good practice.
Any comments?
https://redd.it/1rz19ei
@r_devops
I Benchmarked Redis vs Valkey vs DragonflyDB vs KeyDB
Hi everyone
I just created a benchmark comparing Redis, Valkey, DragonflyDB, and KeyDB.
Honestly this one was pretty interesting, and some of the results were surprising enough that I reran the benchmark quite a few times to make sure they were real.
As requested on my previous benchmarks, I also uploaded the benchmark to GitHub.
|Benchmark|Redis 8.4.0|DragonflyDB v1.37.0|Valkey 9.0.3|KeyDB v6.3.4|
|:-|:-|:-|:-|:-|
|Small writes throughput (higher is better)|452,812 ops/s|494,248 ops/s|432,825 ops/s|385,182 ops/s|
|Hot reads throughput (higher is better)|460,361 ops/s|494,811 ops/s|445,592 ops/s|475,307 ops/s|
|Mixed workload throughput (higher is better)|444,026 ops/s|468,316 ops/s|428,907 ops/s|405,764 ops/s|
|Pipeline throughput (higher is better)|1,179,179 ops/s|951,274 ops/s|1,461,472 ops/s|647,779 ops/s|
|Hot reads p95 latency (lower is better)|0.607 ms|0.743 ms|1.191 ms|0.711 ms|
|Mixed workload p95 latency (lower is better)|0.623 ms|0.783 ms|1.271 ms|0.735 ms|
|Pub/Sub p95 latency (lower is better)|0.592 ms|0.583 ms|1.002 ms|0.557 ms|
Full benchmark + charts: here
GitHub
Happy to run more tests if there’s interest
https://redd.it/1rz2tx1
@r_devops
What cloud cost fixes actually survive sprint planning on your team?
I keep coming back to this because it feels like the real bottleneck is not detection.
Most teams can already spot some obvious waste:

* gp2 to gp3
* log retention cleanup
* unattached EBS
* idle dev resources
* old snapshots nobody came back to

But once that has to compete with feature work, a lot of it seems to die quietly.
The pattern feels familiar:

* everyone agrees it should be fixed
* nobody really argues with the savings
* a ticket gets created
* then it loses to roadmap work and just sits there
So I’m curious how people here actually handle this in practice.
What kinds of cloud cost fixes tend to survive prioritization on your team?
And what kinds usually get acknowledged, ticketed, and then ignored for weeks?
I’ve been building around this problem, so I’m biased, but I’m starting to think the real gap is not finding waste. It’s turning it into work that actually has a chance of getting done.
https://redd.it/1rz607q
@r_devops
Does anyone work for SKY TV UK?
Hi All,
I have an interview scheduled at the SKY head office next Monday for the second round for an SRE engineer role. Does anyone have an idea of what it will be like?
https://redd.it/1rz7blw
@r_devops
Trivy - Supply chain attack
https://arstechnica.com/security/2026/03/widely-used-trivy-scanner-compromised-in-ongoing-supply-chain-attack/
Of course this hits late on a Friday :(
https://redd.it/1rz98r2
@r_devops
A Technical Write Up on the Trivy Supply Chain Attack
I wrote a little blog on some deeper dives into how the Trivy Supply Chain attack happened: https://rosesecurity.dev/2026/03/20/typosquatting-trivy.html
https://redd.it/1rzbg4l
@r_devops
Is it wise for me to work on this and migrate out of Jenkins to Bitbucket Pipelines?
I have an existing infra repository that uses Terraform to build resources on AWS for various projects. It already has VPCs and other networking set up, and everything is working well.
I'm looking to migrate it to OpenTofu and use Bitbucket Pipelines for our CI/CD, as opposed to Jenkins, which is our current CI/CD solution.
Is it wise for me to create another VPC in a new mono-repo, or should I just leverage the existing VPC for this?
I'm looking to shift all our staging environments on-site, using NGINX and an ALB to direct all traffic to the relevant on-site resources, and only use AWS for prod services. Would love to have your advice on this.
https://redd.it/1rzg0no
@r_devops
Replacing MinIO with RustFS via simple binary swap (Zero-data migration guide)
Hi everyone, I’m from the RustFS team (u/rustfs_official).
If you’re managing MinIO clusters, you’ve probably seen the recent repo archiving. For the r/devops community, "migration" usually means a massive headache—egress costs, downtime, and the technical risk of moving petabytes of production data over the network.
We’ve been working on a binary replacement path to skip that entirely. Instead of a traditional move, you just update your Docker image or swap the binary. The engine is built to natively parse your existing bucket metadata, IAM policies, and lifecycle rules directly from the on-disk format.
Why this fits a DevOps workflow:
* **Actually "Drop-in":** Designed to be swapped into your existing `docker-compose` or K8s manifests. It maintains S3 API parity, so your application-level endpoints don't need to change.
* **Rust-Native Performance:** We built this for high-concurrency AI/ML workloads. Using Rust lets us eliminate the GC-related latency spikes often found in Go-based systems. RDMA and DPU support are on our roadmap to offload the storage path from the CPU.
* **Predictable Tail Latency:** We’ve focused on a leaner footprint and more consistent performance than legacy clusters, especially under heavy IOPS.
* **Zero-Data Migration:** No re-uploading or network transfer. RustFS reads the existing MinIO data layout natively, so you keep your data exactly where it is during the swap.
We’re tracking the technical implementation and the step-by-step migration guide in this GitHub issue:
https://github.com/rustfs/rustfs/issues/2212
We are currently at v1.0.0-alpha.87 and pushing toward a stable Beta in April.
https://redd.it/1rz148h
@r_devops
Finding RCA using AI when an alert is triggered.
I am trying to build a service that finds the RCA based on different data sources such as ELK, New Relic (NR), and ALB when an alert is triggered.
Please suggest whether I am on the right track.
```bash
curl https://localhost:8000/rca/9af624ff-e749-46d2-a317-b728c345e953
```
output
```json
{
"incident_id": "9af624ff-e749-46d2-a317-b728c345e953",
"generated_at": "2026-03-20T18:57:17.759071",
"summary": "The incident involves errors in the `prod-sub-service` service, specifically related to the `/api/v2/subscription/coupons/{couponCode}` endpoint. The root cause appears to be a code bug within the application logic handling coupon code updates, leading to errors during PUT requests. The absence of ALB data and traffic volume information limits the ability to assess traffic-related factors.",
"probable_root_causes": [
{
"rank": 1,
"root_cause": "Code bug in coupon update logic",
"description": "The New Relic APM traces indicate an error occurring within the `WebTransaction/SpringController/api/v2/subscription/coupons/{couponCode}` endpoint during a PUT request. The ELK logs show WARN messages originating from multiple instances of the `subscription-backend-newecs` service around the same time as the New Relic errors, suggesting a widespread issue. The lack of ALB data prevents correlation with specific user requests, but the New Relic trace provides a sample URL indicating the affected endpoint.",
"confidence_score": 0.85,
"supporting_evidence": [
"NR: Error in WebTransaction/SpringController/api/v2/subscription/coupons/{couponCode} (PUT)",
"NR: sampleUrl: /api/v2/subscription/coupons/CMIMT35",
"ELK: WARN messages from multiple instances of `subscription-backend-newecs` service"
],
"mitigations": [
"Rollback the latest deployment if a recent code change is suspected.",
"Investigate the coupon update logic in the `api/v2/subscription/coupons/{couponCode}` endpoint."
]
}
],
"overall_confidence": 0.8,
"immediate_actions": "Monitor the error rate and consider rolling back the latest deployment if the error rate continues to increase. Investigate the application logs for more detailed error messages.",
"permanent_fix": "Identify and fix the code bug in the coupon update logic. Add more robust error handling and logging to the `api/v2/subscription/coupons/{couponCode}` endpoint. Implement thorough testing of coupon-related functionality before future deployments."
}
```
```bash
curl https://localhost:8000/evidence/9af624ff-e749-46d2-a317-b728c345e953
```
```json
{
"incident_id": "9af624ff-e749-46d2-a317-b728c345e953",
"summary": "Incident 9af624ff-e749-46d2-a317-b728c345e953: prod-sub-service_4xx>400",
"error_signatures": [
{
"source": "newrelic",
"error_class": "UnknownError",
"error_message": "Error in WebTransaction/SpringController/api/v2/subscription/coupons/{couponCode} (PUT)",
"transaction": "WebTransaction/SpringController/api/v2/subscription/coupons/{couponCode} (PUT)",
"count": 1,
"sources": [
"newrelic"
]
},
{
"source": "elk",
"service": "prod-subscription-service",
"error": "2026-03-20T18:55:02.352Z WARN 1 --- [subscription-backend-newecs] [o-7570-exec-207] [69bd98062347b35a37a12ec7150a752f-37a12ec7150a752f] c.h.s.e.handlers.GlobalExceptionHandler : Exception: CustomException(code=404, message=Customer does not exist for id: 1759206496052 or number: , timestamp=Fri Mar 20 18:55:02 GMT 2026, path=/api/v1/subscription/customer)",
"count": 1,
"sources": [
"elk"
]
},
{
"source": "elk",
"service": "prod-subscription-service",
"error": "2026-03-20T18:55:02.348Z WARN 1 ---
[subscription-backend-newecs] [io-7570-exec-27] [69bd9806ff3c59d567dab14f8f053ec9-67dab14f8f053ec9] c.h.s.e.handlers.GlobalExceptionHandler : Exception: CustomException(code=404, message=Customer does not exist for id: amp-q2qBEcUz8XpTtq6uRj7Mlg or number: , timestamp=Fri Mar 20 18:55:02 GMT 2026, path=/api/v1/subscription/customer)",
"count": 1,
"sources": [
"elk"
]
},
{
"source": "elk",
"service": "prod-subscription-service",
"error": "2026-03-20T18:55:02.294Z WARN 1 --- [subscription-backend-newecs] [io-7570-exec-15] [69bd9806d2f343be667802fffd087c32-667802fffd087c32] c.h.s.e.handlers.GlobalExceptionHandler : Exception: CustomException(code=404, message=Customer does not exist for id: 1769877708220 or number: , timestamp=Fri Mar 20 18:55:02 GMT 2026, path=/api/v1/subscription/customer)",
"count": 1,
"sources": [
"elk"
]
},
{
"source": "elk",
"service": "prod-subscription-service",
"error": "2026-03-20T18:55:02.139Z WARN 1 --- [subscription-backend-newecs] [o-7570-exec-210] [69bd980671619f9bdb0caa96d4af52e5-db0caa96d4af52e5] c.h.s.e.handlers.GlobalExceptionHandler : Exception: CustomException(code=404, message=Customer does not exist for id: 1769877708220 or number: , timestamp=Fri Mar 20 18:55:02 GMT 2026, path=/api/v1/subscription/customer)",
"count": 1,
"sources": [
"elk"
]
},
{
"source": "elk",
"service": "prod-subscription-service",
"error": "2026-03-20T18:55:00.660Z WARN 1 --- [subscription-backend-newecs] [o-7570-exec-327] [69bd980424debc250365d3ed4c60d3c0-0365d3ed4c60d3c0] c.h.s.e.handlers.GlobalExceptionHandler : Exception: CustomException(code=404, message=Customer does not exist for id: 1618108529209 or number: , timestamp=Fri Mar 20 18:55:00 GMT 2026, path=/api/v1/subscription/customer)",
"count": 1,
"sources": [
"elk"
]
}
],
"slow_traces": [
{
"transaction": "WebTransaction/SpringController/api/v2/subscription/coupons/{couponCode} (PUT)",
"error_class": "",
"error_message": "Error in WebTransaction/SpringController/api/v2/subscription/coupons/{couponCode} (PUT)",
"sample_uri": "/api/v2/subscription/coupons/CZMINT35",
"count": 1,
"trace_id": "trace-unknown"
}
],
"failed_requests": [
{
"source": "newrelic",
"url": "/api/v2/subscription/coupons/CZMINT35",
"error_class": "",
"error_message": "Error in WebTransaction/SpringController/api/v2/subscription/coupons/{couponCode} (PUT)",
"trace_id": "trace-unknown"
}
],
"traffic_analysis": {
"total_requests": 0,
"total_errors": 0,
"error_rate_pct": 0.0,
"top_client_ips": [],
"top_user_agents": [],
"ip_concentration_alert": false,
"ua_concentration_alert": false
},
"blast_summary": "New Relic: 1 error transactions | ELK: 588 error log entries",
"timeline_summary": "First error at 2026-03-20T18:52:17.356000 | Peak at 2026-03-20T18:55:02.353000"
}
```
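As a sanity check on the direction: the `timeline_summary` field above can be derived with a simple bucketing pass over raw events. Here is a minimal sketch in plain Python; the function name and the event shape are hypothetical (not from the actual service), and the timestamps are borrowed from the evidence payload above.

```python
from collections import Counter
from datetime import datetime

def summarize_timeline(events):
    """Given events from any source (ELK/NR/ALB) carrying ISO-8601
    'timestamp' fields, report the first error and the busiest
    one-minute window -- one way to produce a timeline summary."""
    times = sorted(datetime.fromisoformat(e["timestamp"]) for e in events)
    first = times[0]
    # Bucket events into one-minute windows and pick the peak bucket.
    buckets = Counter(t.replace(second=0, microsecond=0) for t in times)
    peak, count = buckets.most_common(1)[0]
    return {
        "first_error": first.isoformat(),
        "peak_minute": peak.isoformat(),
        "peak_count": count,
    }

events = [
    {"source": "newrelic", "timestamp": "2026-03-20T18:52:17.356000"},
    {"source": "elk", "timestamp": "2026-03-20T18:55:02.139000"},
    {"source": "elk", "timestamp": "2026-03-20T18:55:02.352000"},
]
print(summarize_timeline(events))
# {'first_error': '2026-03-20T18:52:17.356000',
#  'peak_minute': '2026-03-20T18:55:00', 'peak_count': 2}
```

The interesting design question is what window size to bucket by; one minute matches the granularity shown in the summary, but burst-heavy alerts may want finer bins.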
https://redd.it/1rz6z3m
@r_devops
Need advice on changing domain from Azure IAM to Azure devops
Hey folks,
I currently work at TCS as a support engineer helping customers resolve Azure IAM tickets.
With 5 YOE my salary is just 4.5 LPA (INR).
I need advice on moving to Azure DevOps.
Do I need certifications, or is there any other upskilling advice?
Would really appreciate it.
https://redd.it/1rzpct1
@r_devops
jsongrep is faster than {jq, jmespath, jsonpath-rust, jql}
jsongrep is an open source tool I made for querying JSON that is fast, like really really fast.
I started working on the project as part of my undergraduate research. It has an intuitive regular path query language and also exposes its search engine as a Rust library if you’re looking to integrate it into your Rust projects.
I find the tool incredibly useful for working with JSON and it has become my de facto JSON tool over existing projects like jq.
Technical blog post: https://micahkepe.com/blog/jsongrep/
GitHub: https://github.com/micahkepe/jsongrep
Benchmarks: https://micahkepe.com/jsongrep/end_to_end_xlarge/report/index.html
https://redd.it/1rzv98l
@r_devops
Going from DevOps to L3 Support role
Hi community, I need some advice from you guys. This is a bit of an unusual scenario.
I have about 4 years of DevOps experience, and I'm looking to move from a DevOps Engineer role to an L3 support role within the same company. I know it sounds like a downgrade, but let me compare the facts.
Currently, I'm working as a DevOps Engineer at an early-stage company, but there are a few problems, so I'm considering moving into the L3 support team. There are pros and cons; let me list them.
**DevOps Engineer**
**Pros**
* Tech stack is good. (AWS, ECS, Terraform, GitHub Actions)
* Weekends are usually free (there is a weekend support roster, but it's manageable).
**Cons**
* High-pressure environment (frequent DB access tickets and pipeline failures).
* High context switching with a heavy message load.
* Due to the high workload and fast delivery pace, we regularly have to work extra hours (12+ hour days).
* Job security is low. People are getting terminated for low performance, and the remaining team members are exhausted.
* No leave/holidays.
* Salary is relatively low compared to the L3 team, and no benefits.
**L3 Support Engineer (same company)**
**Pros**
* The team is familiar to me, so I think the culture will be supportive.
* Job security is relatively high, thanks to understanding management.
* **Salary is possibly 15% higher**, with other benefits like medical insurance.
* Relatively less pressure for now; a manageable number of tickets, since they're filtered by L2 support first. Not sure whether the ticket count will increase in the future.
**Cons**
* 24x7 roster, so I'll have to do night shifts twice a week.
* No weekends off since it's a roster, but there are about 2 days off after every 6 days.
* The tech stack is application support, so we need to understand how the app works in depth, with code-level understanding, and work with databases. But **no direct DevOps exposure.**
I know DevOps is technically a much better job, but for me it's difficult to work in this high-pressure, fast-paced team.
My mind says maybe I should move into the L3 support team. If I do, I'll need to do regular certifications and personal projects to keep my DevOps skills intact. That's my plan.
I can't just go find another DevOps job because the job market is very bad right now, and the salary here is above market rate.
What's your view on this? I'd like to get some outside views on this problem.
TIA!!
https://redd.it/1s028uz
@r_devops