Reddit DevOps
266 subscribers
30.9K links
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Getting to the final round of interviews only to be passed over for the other candidate feels bad.

I didn't receive any particular feedback that said why, but if I had to guess it's because I'm in a larger city, where the cost of living necessitates a higher salary so I was asking for the higher end of what they were offering. But that's pure speculation. Could be the other candidate was just more qualified too.

Either way, it sucks. I've been out of work for months trying to find something. I really, REALLY don't want to work for defense contractors, but they're some of the only people in my state that are hiring and paying, and it's also mostly in-office (or all in-office).

I'll just keep looking until I find something, but yeah feelsbadman

https://redd.it/1knn56d
@r_devops
How do you approach OpenTelemetry traces, metrics, and logs for local/CI envs in your day-to-day work? Looking to exchange experiences.

Hello Folks,

I'm working on a project where I'm helping the team instrument the services in a way that helps the devs get more insight into what their code is doing, and also helps OPS teams understand what is happening on the CI side from time to time.

Of course I could just push the money-printer button and use Datadog or something similar, but I'm thinking about the dev experience with local (open source) tools.

In the past, I've used the following tools:

* OpenSearch: Data Prepper + OpenSearch requires one configuration file, but you get hit by ~1.5 GB of memory usage;
* Grafana Labs: Grafana + Alloy + Tempo + Loki + Prometheus works, but requires more configuration.

The thing is: when something fails, devs have trouble identifying which component or microservice of the observability stack failed; some don't even know that something is not working.

So I'm trying to improve the situation above, and of course maybe someone will call it hair-splitting... but I think I've now found the most lightweight setup I could've asked for:

* davetron5000/otel-desktop-viewer + Prometheus + Dozzle: Prometheus now has an OTLP receiver, and the otel-desktop-viewer is simple: no need to set up otelcol or anything else. Dozzle for logs.

The solution above doesn't have any kind of correlation, but it's really lightweight: if you can't see the traces interface, recreate the container; same goes for Prometheus metrics.
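For anyone wanting to reproduce it, here's a minimal docker-compose sketch of that stack. The image names, tags, and ports are assumptions on my part, so check each project's README before relying on them:

```yaml
services:
  otel-desktop-viewer:
    image: davetron5000/otel-desktop-viewer   # assumed image name, per the project above
    ports:
      - "8000:8000"   # web UI
      - "4318:4318"   # OTLP/HTTP endpoint your services point at

  prometheus:
    image: prom/prometheus:v2.54.0
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --enable-feature=otlp-write-receiver   # the OTLP receiver mentioned above
    ports:
      - "9090:9090"

  dozzle:
    image: amir20/dozzle:latest
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro   # Dozzle tails container logs via the Docker API
    ports:
      - "8080:8080"
```

With the OTLP feature flag enabled, Prometheus accepts metrics pushed to its `/api/v1/otlp` endpoint, so services need only the standard OTLP/HTTP exporter pointed at the right ports.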

With the above in mind, I'd like to ask:

What toolset do you employ for the scenario above? What do you like most about it?

Thanks in advance.

https://redd.it/1knqptx
@r_devops
Common pattern of success.

Good evening, fellow engineers.

Tonight I’ve been reflecting on everything that’s been happening to me and of course I know I’m not alone. Every one of us has a story. Joy, pain, burnout, moments of pride, periods of depression, wins and losses. Life hits us all.
So here’s my honest question to the truly SUCCESSFUL, GROUNDED, and BRILLIANT engineers in this space:
What's your recipe? What keeps you moving forward even when mentally, emotionally, or spiritually you're completely drained by all kinds of life circumstances: family, society, etc.?

I’m not some kid with wide-eyed wonder asking a feel-good, cliche question. I’m an adult who’s been in and still is in a never-ending grind. But at some point, I just have to ask: how? What’s the actual difference between someone who breaks through and someone who stays stuck, looping in the same spiral for years?

Let’s put aside the motivational quotes and hustle porn etc. There must be something real, something practical and shared that unites those who consistently get through the fog and stay on the path.

So what are your biggest struggles when it comes to your career? How do you overcome them day in, day out? What patterns or mindsets do you guys have that actually move you forward?

P.S. to folks with a high sense of humor: I'm all for humor and good energy, but this one matters, so please let's keep it real. This could genuinely help a lot of people who are stuck in silence right now.

https://redd.it/1knsu0y
@r_devops
Devops projects

Can you guys please help me with some of the best projects that I can add to my resume, as I am from a testing background?
I want to do 30 days, 30 projects.

https://redd.it/1knutfb
@r_devops
What would be your next step?

Some background: I've got about 11 years of experience running or leading software projects in different areas, from small business automation to 2 start-ups, and now also close to 3 years of experience as a Python/Django developer. When I left my most recent start-up, I was hired as a developer; after 3 months I got a new head of department, and my role changed to be more DevOps. Over the next 3 months I migrated 3 projects from a Linode server to K8s, and then also upgraded several parts of the existing K8s infrastructure, moving from K8s Secrets to GCP Secret Manager.

All this work went well and I learnt loads. My work is in production, so I must have done something right.

However, last week, I got fired. No prior indication in 1:1 meetings that anything was wrong. The reason I was given is that the role I was in is very technical, and the ratio of my experience is the wrong way round. (They want it to be more 11 years as developer IC and 3 years managing projects)


I really enjoy working as an IC, and especially enjoy the K8s/DevOps side of things. I've been looking at applying for technical project management roles, but that seems like it would take me completely out of the IC or DevOps side of things. On the other hand I am not sure what type of role to go for next, where my experience won't end up counting against me again.

UK based.

Appreciate your thoughts.

https://redd.it/1knv27n
@r_devops
Optimising OpenTelemetry pipelines to cut observability vendor costs with filtering, sampling etc

If you're using a managed observability vendor and not self-hosting, rising ingestion and storage costs can quickly become a major issue, especially as your telemetry volume grows.

Here are a few approaches I’ve implemented to reduce telemetry noise and control costs in OpenTelemetry pipelines:

* Filtering health check traffic: Drop spans and logs from periodic `/health` or `/ready` endpoints using the OTel Collector `filterprocessor`.
* Trace sampling: Apply tail-based or probabilistic sampling to reduce high-volume, low-signal traces (e.g., homepage GET requests) while retaining statistically meaningful coverage.
* Log severity filtering: Drop low-severity (`DEBUG`) logs in production pipelines, keeping only `INFO` and above.
* Vendor ingest controls: Use backend features like SigNoz Ingest Guard, Datadog Logging Without Limits, or Splunk Ingest Actions to cap ingestion rates and manage surges at the source.

I've written a detailed blog that covers how to identify observability noise and implement these strategies, including solid OTel Collector config examples.
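To illustrate, the first three ideas map onto Collector processors roughly like this. This is a sketch, not the blog's config; the OTTL conditions depend on your span attributes and semantic conventions:

```yaml
processors:
  # 1. Drop spans for periodic health/readiness probes
  filter/healthchecks:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/health" or attributes["http.route"] == "/ready"'
  # 2. Keep a statistically useful fraction of high-volume traces
  probabilistic_sampler:
    sampling_percentage: 10
  # 3. Drop DEBUG and lower in production log pipelines
  filter/severity:
    logs:
      log_record:
        - 'severity_number < SEVERITY_NUMBER_INFO'

service:
  pipelines:
    traces:
      processors: [filter/healthchecks, probabilistic_sampler, batch]
    logs:
      processors: [filter/severity, batch]
```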

https://redd.it/1kntxrv
@r_devops
Is it just me, or does the demand for DevSecOps / Cloud Security suck right now? Based in the Netherlands

Hey guys,

I've been working in DevSecOps / Cloud Security for a couple of years, based out of the Netherlands. I mostly have experience in AWS, but I'm starting to work in GCP.

Recently I was searching for opportunities on LinkedIn, and it seems they're super hard to come by. I can see a lot of opportunities for DevOps people, but it's like no one wants a DevOps person dedicated to security.

The ones I've seen either require 6-7 years of experience with every cloud-based technology under the sun, or they want no one.

Also, I'm not sure if it's just the market in NL, but it seems like a lot of companies have their infra in Azure, so every other DevOps / DevSecOps opportunity mentions that tooling. Companies with their infra in AWS seem really few and far between.

So I wanted to come on here and ask other engineers: is it just my experience, or is yours similar?

Also, any other pointers about the DevOps market in NL would be helpful

Thank you !

https://redd.it/1knwxyp
@r_devops
Junior DevOps role

Hello guys, I have been in the IT field for 4 years working as a Network Security Administrator, and for some time now I have wanted to move to a DevOps team. I have started self-studying the necessary technologies for the role, and my question to you: what are my chances of starting in such a role with NO previous experience in Development or Operations? At this point I am good with Linux and Python/Bash scripts, and have some basic knowledge and hands-on experience with Docker, K8s, and Terraform. Just wondering if I have a realistic chance of getting hired, thanks in advance!

https://redd.it/1knxd8k
@r_devops
Any tips & tricks to reduce Datadog logging costs in volatile environments?

If log volumes and usage patterns are volatile, what are the best ways to tame Datadog bills for log management? Aggressive filtering and minimal retention of indexed logs isn't the solution, apparently. The problem here is to find and maintain an adequate balance between signal and noise.

Folks, has anybody run into something like this, and how have you solved it?

https://redd.it/1knycwy
@r_devops
Kubernetes 1.33 brings in-place Pod resource resizing (finally!)

Kubernetes 1.33 just dropped with a feature many of us have been waiting for - in-place Pod vertical scaling in beta, enabled by default!

What is it? You can now change CPU and memory resources for running Pods without restarting them. Previously, any resource change required Pod recreation.

Why it matters:

* No more Pod restart roulette for resource adjustments
* Stateful applications stay up during scaling
* Live resizing without service interruption
* Much smoother path for vertical scaling workflows
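As a rough sketch of what this looks like in practice (resource values and names here are mine, not from the post; the `resize` subresource needs kubectl v1.32+):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo
spec:
  containers:
  - name: app
    image: nginx:1.27
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired    # resize CPU in place, no container restart
    - resourceName: memory
      restartPolicy: NotRequired
    resources:
      requests: { cpu: 250m, memory: 128Mi }
      limits:   { cpu: 500m, memory: 256Mi }
# Bump CPU on the running Pod via the new resize subresource:
#   kubectl patch pod resize-demo --subresource resize --patch \
#     '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"500m"},"limits":{"cpu":"1"}}}]}}'
```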

I've written a detailed post with a hands-on demo showing how to resize Pod resources without restarts. The demo is super simple - just copy, paste, and watch the magic happen.

Medium Post

Check it out if you're interested in the technical details, limitations, and future integration with VPA!

https://redd.it/1ko0mx5
@r_devops
Looking for 2025 DevOps trends and pain points

Hey folks!

I’m helping my team define OKRs and we want to bring more business value through DevOps and Cloud projects.

What are the main pain points you've seen in 2025 so far?
Any industries struggling more than others?
What kind of DevOps-driven offers could support business teams better?

Appreciate any thoughts or links. Thanks in advance!

https://redd.it/1knzqat
@r_devops
I self-created a LinkedIn job and applied with 18 different resumes to see which resume format passes the ATS; here it is.

Hi Folks,

During the past few weeks I was experimenting with LinkedIn. I created a few accounts with different setups to see what gives a candidate a higher chance of getting a job, or of being rejected by LinkedIn's filters.

Out of 56 candidates, only 18 appeared in my Inbox; for the others I had to manually open the "Not a Fit" section (the spam folder) to see them, as they are hidden. They get a rejection letter 3 days after applying. LinkedIn does this 3-day thing so as not to frustrate people; a shitty thing if you ask me, cuz you are hopeful for that time while in fact you are already rejected.

Before I go on, full disclosure: I'm sharing a [LaTeX formatted resume](https://interview10x.com/resume.latex) as a TL;DR (LaTeX is an open-source format for creating documents), and I'm also adding a UI I built for those who just wanna [drag and drop a PDF](https://interview10x.com/linkedin-optimization/). Before you accuse me of something, you should be aware that this app is open source, free, and doesn't require signup; it basically takes your current resume and converts it to the very same LaTeX resume so you don't have to do it manually. You can use either; both will be equally fine. The UI works only for PDFs (no Word files) and it fails sometimes (1-2% of the time); I have no plans of improving it, but you can.

Ok, let's continue with LinkedIn filters:

* The very first and **most brutal** filter is if your country is not the same country where the job was advertised.
* If the job is advertised as Hybrid or On-Site and your location is way too far away, even in the same country, you have a 50-50 chance of ending up in spam (auto-reject).
* Another one is your phone number's country code; don't use foreign numbers.
* Another big one is resume format. Some PDF resume formats, especially fancy ones, are not parsed well by LinkedIn, and if they can't parse it they will rank you significantly lower. Keep it very simple in terms of styling.
* Don't spam a bunch of keywords, e.g. a comma-separated/bullet list of technologies at the bottom of the page; this kind of trick doesn't work anymore and will do more harm by triggering the spam filter. Keywords should be naturally integrated into descriptions of what you did at your past jobs. If you need to highlight them for recruiters, you can use bold text.
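In the same spirit as the linked template, a bare-bones LaTeX skeleton that parsers handle well looks something like this (names and entries are placeholders, not the actual template):

```latex
\documentclass[11pt]{article}
\usepackage[margin=2cm]{geometry}

\begin{document}

\begin{center}
  {\LARGE Jane Doe}\\[2pt]
  City, Country \quad jane.doe@example.com \quad +1 555 0100
\end{center}

\section*{Experience}
\textbf{DevOps Engineer, ExampleCorp} \hfill 2021--2025\\
Migrated three services to Kubernetes, cutting deploy time from 40 to 5 minutes.

\section*{Education}
B.Sc.\ Computer Science, Example University \hfill 2017--2021

\end{document}
```

Plain single-column `article`-class output like this tends to survive PDF-to-text extraction cleanly, which is what the parser ranking above rewards.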

https://redd.it/1ko27x6
@r_devops
Quality vs speed?

Lone DevOps engineer, still considered junior even after 2 years. There is so much crap I need to wrap my head around, and I still feel like I am learning every day. Some days I feel like I need to relearn what I learned months ago. A never-ending cycle.

I had to push up and ship an ask which was brand new to me, so I learned something while doing it. But it also occurred to me that I may have skipped some best practices. I created my PR anyway and merged it. I figured it was best to ship it now rather than put it on hold, and I can come back and iterate on it.


As someone who is still on the lower end of the totem pole here, I wanted to ask you all: do you guys find yourselves shipping (rather, merging) new functionality that may not always follow best practices, just to get it out there due to 1. not blocking the dev team, 2. having that new shiny functionality the team wants, 3. deadlines, or whatever else?


I also did so because it felt like a ton of weight off my shoulders, but I know I will need to come back and iterate on it. Am I in the wrong for this? (I do have a senior mentor, but this person does not work on the project with me and is out on parental leave, so I have no one to ask but you kind reddit strangers :) )

https://redd.it/1ko32r4
@r_devops
How to Consolidate Two Postgres Databases from Separate Cloud SQL Instances to Save Costs and Maintain Easy Migration?

I currently have two Google Cloud SQL instances, each hosting one Postgres database. Since my GCP credits are about to expire, I want to reduce costs by shutting down one Cloud SQL instance and moving its database elsewhere.

I’m considering two main options:

# Option 1: Move the database to the surviving Cloud SQL instance (2 databases in 1 instance)

Pros:

* Easy migration using Google Database Migration Service
* Managed backups, maintenance, and security handled by Cloud SQL
* Easier future migration since it remains a managed Postgres service

Cons:

* Potentially higher cost due to storage and instance size
* Slightly against the best practice of using multiple smaller instances instead of one large instance

# Option 2: Host the database myself on an existing VM (using Postgres in Docker)

Pros:

* Cheaper in terms of Cloud SQL costs
* Full control over configuration and tuning

Cons:

* Need to manage backups, upgrades, and security manually
* Possible performance impact on the VM running the application
* Migration and scaling could be more complex in the future
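For what it's worth, Option 1 doesn't strictly require Database Migration Service; for modest database sizes, plain pg_dump/pg_restore also works. A sketch with placeholder hosts, database, and role names:

```shell
# Dump from the instance being retired (custom format allows selective, parallel restore)
pg_dump -h OLD_INSTANCE_IP -U postgres -d app_db -Fc -f app_db.dump

# Create the target database on the surviving instance and restore into it
createdb -h SURVIVING_INSTANCE_IP -U postgres app_db
pg_restore -h SURVIVING_INSTANCE_IP -U postgres -d app_db --no-owner app_db.dump
```

Cloud SQL doesn't grant full superuser, so `--no-owner` (and possibly `--no-acl`) helps avoid ownership errors during the restore.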

# My questions:

1. Are there other cost-effective and manageable options I should consider for consolidating or migrating my Postgres databases?
2. If I choose Option 1, how significant are the downsides of running two databases on a single Cloud SQL instance? Is this a common and recommended practice?
3. If I choose Option 2, what are the best practices to ensure reliability, backups, and easy future migration?
4. Any tips on minimizing costs while maintaining performance and ease of management in Google Cloud SQL?

https://redd.it/1ko68nc
@r_devops
How can I detect when a new version of a chart is released so my repo updates and argo pushes it?

Is there a way to update my Chart.yaml's version when, for example, the traefik chart is updated upstream?

I'm using ArgoCD to manage my homelab. I tell it to watch one of my GitHub repos.
In this repo I've got all my apps in /namespace/app folders.
For some I use Helm charts, and for others I use Kustomize.


For my example, I've got /automated/common/traefik containing Chart.yaml and values.yaml.


In my Chart.yaml I've got:

```yaml
name: traefik
apiVersion: v2
version: 1.0.0
dependencies:
  - name: traefik
    repository: https://helm.traefik.io/traefik
    version: 33.2.0
```

But if I go to https://github.com/traefik/traefik-helm-chart/blob/master/traefik/Chart.yaml, I can see they updated the chart to version 35.2.0. Is there something out there I can use to detect that and update mine?

GitHub Actions? A script I can run?
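Along the "script I can run" line, one low-tech sketch: ask the upstream repo for its newest version with helm itself and compare it to the pinned one. This assumes helm, jq, and yq are installed; the repo path is from the example above:

```shell
# Latest chart version published in the upstream repo
helm repo add traefik https://helm.traefik.io/traefik >/dev/null
helm repo update >/dev/null
latest=$(helm search repo traefik/traefik --output json | jq -r '.[0].version')

# Version currently pinned in the Chart.yaml dependencies
current=$(yq '.dependencies[0].version' automated/common/traefik/Chart.yaml)

if [ "$latest" != "$current" ]; then
  echo "traefik chart: $current -> $latest available"
fi
```

Run it on a schedule in GitHub Actions and have the workflow open a PR bumping the version; Renovate can also automate this kind of Helm dependency bump if you'd rather not maintain a script.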



https://redd.it/1ko4j35
@r_devops
Dynamic helm values files: ansible, terraform, or something else?

The title alludes to an X+Y problem; the original problem is that our project is currently repeating a crap ton of things in our values files, and our projects continue to bloat.


For example: we share x volumes mounted across n subchart deployments, so in the parent chart we are specifying volume.mounts x times under subchart.extraVolumes n times.


I first wanted to try creating a parent dict containing all extraVolumes, and then distributing those values to their respective subchart.extraVolumes, but apparently that's not possible.


I got excited when I started reading about Values.global, but it seems to be completely useless unless a chart adds support for any and all variables to be overridden by the possible existence of a value (e.g. Values.global.extraVolumes); I imagine it'd be a lot more powerful if it could be referenced by parent and subcharts without the global key.


So now I'm wondering if I should pick Ansible back up and write templates to generate values files in our CI pipelines. I read it's possible to do this in Terraform too, but I'm not as familiar with it and would have to spend more time learning it for something that feels more complicated than it needs to be (vs. just leaving it alone and continuing as is).
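Before reaching for Ansible or Terraform templating, one middle ground worth knowing: YAML anchors deduplicate within a single values file, and Helm expands them when it parses values. A sketch with placeholder subchart names:

```yaml
# Define the shared volumes once, anchored...
commonExtraVolumes: &commonExtraVolumes
  - name: shared-config
    configMap:
      name: app-config
  - name: shared-data
    persistentVolumeClaim:
      claimName: shared-data

# ...then alias them under each subchart's extraVolumes
subchart-a:
  extraVolumes: *commonExtraVolumes
subchart-b:
  extraVolumes: *commonExtraVolumes
```

Anchors are expanded when the file is read, so each subchart still receives its own full copy of the list; it only removes the repetition in the source file, which is most of the pain described above.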


Relevant threads in my searching:

- https://github.com/helm/helm/issues/6699
- https://github.com/helm/helm/issues/30851
- https://github.com/helm/helm/issues/2492

https://redd.it/1ko84fw
@r_devops
Transferable Skills and Tools?

I am starting as a Systems Engineer soon in an OpenStack Red Hat shop, with a couple of years of experience in support and product. I have a few different options for which team I will be on, and one is the SRE team, but at this company they only really touch OpsGenie, Dynatrace, Commvault backups, and the CMDB in ServiceNow. They have other teams that manage container orchestration (OpenShift), CI/CD pipelines, and automation tools (Terraform, Ansible, etc). My question: in order to learn transferable skills for future jobs as an SRE, DevOps, or Platform Engineer at other companies, should I join the SRE team, or join another team to learn OpenShift, CI/CD, Terraform, Ansible, etc.? Any help or recommendations would be appreciated, since I want to learn as much as possible. I am also interested in their Web Infra and Linux teams.

https://redd.it/1kogwzh
@r_devops
Chainguard

I really hate Chainguard. It is so expensive and they say it’s open source but it’s not really open source.

https://redd.it/1kohh8j
@r_devops
When does kodekloud usually have discounts?

I plan on purchasing the standard plan for KodeKloud so I can follow the SRE or maybe even the DevOps path with labs, especially Kubernetes, Docker, Ansible, Terraform, and Linux.

When does kodekloud usually have discounts? I read that sometimes there are steep discounts on the plans. Should I just wait for it?

Or is it better to just grab these courses separately from other places and by different people? I chose Kodekloud because it has labs ready and I tried the free docker labs and it is engaging to me.

https://redd.it/1koiyn6
@r_devops