Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Manage spot nodepool in GKE

Hi Everyone,

We run about 60-65% of our workloads on spot VMs, but during peak hours we usually hit a stockout, and new pods stay in the Pending state for hours waiting for a spot VM. So we have implemented two ways to improve this.

1. Deploy the same deployment on a pay-as-you-go (PAYG) node pool with a higher HPA threshold, so it scales when the spot pods don't scale.

2. Create a node pool with a different machine series but the same configuration, taints, and labels. At times, though, one node pool doesn't scale even though it isn't hitting a stockout, whereas the other node pool has stocked out.

Are there any better ways you tackle the stockout situation? Kindly advise.

Thanks!

https://redd.it/1g5wxze
@r_devops
How do you handle urgent communication when a team leader from another team doesn't respond?

Hey all,

I’m curious about how you handle situations where you need an urgent response from a team leader in another department or team, but they take hours to reply. For example, if you're waiting for critical input and it's been over two hours with no answer—how do you proceed?

Do you escalate the issue, follow up with other team members, or try to find alternative solutions? How do you balance urgency without coming across as too pushy?

https://redd.it/1g5z5y6
@r_devops
Move from Sprints to Kanban?

For a long time I have hated sprints and pretend agile for DevOps in our organisation. We plan, refine, retro, demo - blah blah blah - but the whole thing is pointless in our org: we plan and refine, but when the sprint starts we abandon it for another team’s work, because they didn’t think to involve us early enough or they said no when we asked if they had any requirements, and now we have blockers.

Instead, we’re going to use Kanban. Anything in to-do can be picked up by anyone at any time. Items only move from backlog to to-do after they have been refined, backlog items don’t show on the board, and the board is for DevOps-only work. We no longer include other teams’ work on our Kanban board.

I’m not saying we won’t do work for other teams. Instead, we rotate through other teams, embedding in their process. Our tasks in their epics belong on their board because they belong to that team.

My fear is: how will we prioritise our own board if we’re embedding in other teams? What happens to unfinished work after a rotation?

What are people’s thoughts? Is there a better approach?

https://redd.it/1g61gzq
@r_devops
What happens if I have multiple IP addresses in a single weighted routing record in Route 53?

Basically the title.

I am in the process of migrating from simple routing to weighted routing and want to test it with a few servers.

Currently, we have a single A record with simple routing that contains all the server IPs.

I am trying to take some servers out of it and add weighted routing entries for them instead.

If I have 3 records,
Record A - weighted, 2 IPs, weight 50
Record B - weighted, 1 IP, weight 50

Will each of the IPs in record A get equal traffic, i.e. 25%?

I was not able to replicate the above.

Please help.

Thanks in advance.
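The arithmetic behind the question can be sketched as follows. Route 53 documents that a weighted record is chosen with probability equal to its weight divided by the sum of the weights. Everything else here is an assumption for illustration: the record names and IPs are hypothetical, and the sketch assumes a chosen record returns all of its IPs and that clients spread evenly across them, which in practice depends on resolver and client behaviour and on DNS caching.

```python
from fractions import Fraction

# Hypothetical record sets mirroring the question: name -> (weight, IPs).
records = {
    "record-a": (50, ["192.0.2.1", "192.0.2.2"]),
    "record-b": (50, ["192.0.2.3"]),
}


def expected_share_per_ip(records):
    """Return the expected fraction of traffic per IP.

    Assumes a record is chosen with probability weight / total weight, and
    that clients spread evenly over the IPs in the chosen record.
    """
    total = sum(weight for weight, _ in records.values())
    shares = {}
    for weight, ips in records.values():
        record_share = Fraction(weight, total)
        for ip in ips:
            # Split the record's share evenly across its IPs (assumption).
            shares[ip] = shares.get(ip, 0) + record_share / len(ips)
    return shares


shares = expected_share_per_ip(records)
for ip, share in shares.items():
    print(ip, float(share))
```

Under these assumptions each IP in record A would see about 25% and the single IP in record B about 50%. Measured traffic can differ noticeably over short windows because resolvers cache answers for the TTL.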

https://redd.it/1g67daj
@r_devops
First time creating a simple CI/CD pipeline help

Hey guys,

I'm a self-taught developer trying to broaden my skills and learn more about cloud services and CI/CD pipelines. To start simple, I want to host my personal website on an EC2 instance and set up GitHub Actions to containerize and deploy it to my instance, so that when I push changes to my repository, it all happens automatically. My first basic question is: how do you handle private environment variables? Do you use AWS Parameter Store? I'm using Go and the 'godotenv' package, but if I change my code to use AWS Parameter Store, how would I test it locally? I wouldn't have access to the Parameter Store, right?

Thanks!

(you'll probably see me a lot around here in the near future 🙌)
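One common way to keep local testing simple is to read each setting from the environment first and only fall back to a remote store when it is missing. The sketch below is in Python rather than Go, purely to show the pattern; `load_setting`, `fetch_remote`, and the setting names are hypothetical, and the remote fetcher is a stand-in rather than a real Parameter Store client.

```python
import os
from typing import Callable, Mapping, Optional


def load_setting(
    name: str,
    env: Mapping[str, str] = os.environ,
    fetch_remote: Optional[Callable[[str], str]] = None,
) -> str:
    """Return a setting from the environment, falling back to a remote store."""
    if name in env:
        return env[name]
    if fetch_remote is not None:
        return fetch_remote(name)
    raise KeyError(f"setting {name!r} not found in environment or remote store")


# Local development: values come from the environment (for example a .env
# file loaded by your tooling), so no cloud access is needed.
local = load_setting("DB_PASSWORD", env={"DB_PASSWORD": "local-dev-secret"})

# Deployed: a stand-in fetcher plays the role of the remote store client.
fake_store = {"DB_PASSWORD": "value-from-store"}
deployed = load_setting("DB_PASSWORD", env={}, fetch_remote=fake_store.__getitem__)

print(local, deployed)
```

Because the remote lookup is injected as a function, tests can pass a fake, and the same code path runs locally and on the instance.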

https://redd.it/1g6auuk
@r_devops
Interview experience

I recently interviewed at a mid-level company, and they rejected me in the second round, stating that although I have experience with CI/CD tools like Jenkins and Harness, I don't have experience with Azure CI/CD. Why can't they understand that it's just a tool, and the logic and workflow are the same everywhere?

https://redd.it/1g6b7yf
@r_devops
How Well Does "all in the same repo" CDK approach Scale?

I asked this in r/aws but will also post it here.


I am in the process of adopting and learning about CDK for our large-scale microservices architecture. What I want to know is how well it scales in an environment with hundreds of microservices and pipelines.

Has anyone got any recommendations on best practices in terms of structuring and managing CDK for scale? Does anyone have experience using CDK in environments with 100+ microservices?

I can see that the biggest shift with CDK is essentially coupling the CI/CD config, infra config and application code all in the same repo. How does this approach/recommendation scale?

Let's say I have 100s of microservices and I need to update CI/CD or some infra config across all. Every time you make a change to the pipeline config in the repo, you are potentially "touching" the app and making a release. I can accept the changes to the infra "close" to the app like Lambda config, SQS etc., but I'm not sure CI/CD config is the same.

How do others manage updates to shared infrastructure or CI/CD configurations across multiple services?

Also, regarding self-mutating pipelines: it's something I tried 5 years ago with raw CloudFormation but found that if there was a change to the CodePipeline executing the change to itself, the execution would instantly fail and I would need to rerun it. Has this been fixed?

Lastly, why would a developer want to see the "pipeline update" step execute and do nothing 99% of the time, just wasting time and slowing down the CI/CD cycle?

I'd love to hear about your experiences and best practices for using CDK at scale. Any insights would be greatly appreciated!



https://redd.it/1g6dfye
@r_devops
DevOps/Cloud Engineers: How Do You Manage and Visualize Your Infrastructure?

Hey everyone! 👋

I’m working on a new tool concept to help DevOps teams and cloud engineers better manage their infrastructure. Before diving deeper into development, I’d love to get some feedback from the experts—that’s you! 😊

If you have a few minutes, could you help me by answering some quick questions?

# 1. Visualizing Cloud Infrastructure

How do you currently visualize your infrastructure (e.g., cloud resources, clusters, VMs)?
Do you rely on built-in cloud tools, open-source solutions, or paid platforms?

# 2. Managing Costs and Outdated Components

How do you track cloud costs across different resources?
Do you encounter issues with outdated Terraform modules, providers, or infrastructure drift? If so, how do you handle them?

# 3. What’s Your Biggest Pain Point?

What’s your biggest challenge in managing infrastructure or optimizing cloud costs?
If a tool existed that could simplify visualization and provide actionable cost insights, would that be valuable to you?

Thanks a ton in advance! Your insights will help us make something that really solves problems for the community. 🙌

https://redd.it/1g6dazv
@r_devops
How can a devops engineer develop in machine learning?

Hi everyone, I'm a DevOps engineer and I really love my job. I love working with Linux, writing scripts, configuring networks, and so on. But I also love math, algorithms, and data structures. I want to take part in the development of artificial intelligence and apply my math knowledge, but still keep doing DevOps. Any advice?

https://redd.it/1g6gacx
@r_devops
My employer is offering me a 65% raise and a bonus in the next pay cycle if I rescind my 2 weeks notice.


In the past year working at a startup, I transitioned from senior cloud infrastructure engineer to junior, and now mid-level, full stack engineer. In the last few months, 2 senior cloud guys and 1 senior full stack engineer left our company to take roles at FAANGs (which also happen to be customers for our product). We reorged and some duties got divvied out amongst us, but I got bombarded doing my own job and taking on cloud duties again. My mental health has been suffering with the deadlines and with management asking us to push new releases on a Friday, which takes up some of my weekend. I’m just so done. I’ve been offered employment elsewhere and put my notice in so I can take a month off for vacation and reset. Well, I got a call almost instantly from the CTO, Product, and the CEO asking whether there is anything they can do to keep me, including offering me a promotion to senior, a huge raise, a focus on backend development only, and a $25k retention bonus in the next pay cycle. The raise is about 10% more than the new employer is offering.


They want to give me the weekend to think it over. I’m contemplating whether I should take the offer or not.

https://redd.it/1g6he1w
@r_devops
Promoted to Manager

I've been in the sysadmin space for 7ish years, plus another 3 as a DevOps engineer. The change into DevOps was a bit rocky at first, and I think I suffered from imposter syndrome. I was the weakest one on the team (in terms of hard technical background and DevOps "years"). In recent months I realized I actually did beat out my coworkers at other skills: communication, organization/planning, enabling/empowering other engineers, not letting "great be the enemy of good", and making actual progress. Recently we had a bit of a reorg and now I am managing the DevOps team. In a way it makes sense to me: I know the principles, our goals, and the bite-sized chunks it takes to get there. I'll never be able to write slick bash one-liners on the fly with 5 people in a Zoom meeting watching. Sorry for the rant and ramble. Long story short: any tips or suggestions for this transition from engineer to manager? Do you think I have enough background to succeed? Any suggested material or reading? (Right now I've been reading Radical Candor.)

https://redd.it/1g6hdmb
@r_devops
Books for experienced DevOps?

Hey,
I would like to hear recommendations for books that go beyond explaining what DevOps is from zero.

I'm thinking about The Phoenix Project, as I heard it's great for both beginner and experienced engineers in IT, and I have also heard about the O'Reilly SRE book.
What do you think?




https://redd.it/1g6kon6
@r_devops
The new release of Dockerfile.app has launched.

Visit https://www.dockerfile.app

Features:
→ Save dockerfiles
→ Browse them
→ Upvote them
→ Search for dockerfiles
→ Create an account

All to create a community-driven location to get top-notch dockerfiles for all languages and frameworks.

Bugs? Let me know.

Feedback is welcome.

https://redd.it/1g6m2ce
@r_devops
Are there any DevOps or Infrastructure-type jobs where you work on boats?

Weird Q, but is there any need for these roles on oil rigs, cruise ships, or anything similar? I had a random thought that it might be a fun alternative / break out of normal society for a while. But I have no idea if there is any demand for this kind of job in those environments. Presumably basic IT support, but I'm not sure what else.

https://redd.it/1g6mlxi
@r_devops
Looking for a DevOps role; need suggestions

Hi there everyone.

I’m a student currently doing a bachelor's here in the USA and looking for a DevOps role.

I did DevOps for 1 year, primarily maintaining full stack apps, automating builds with GitHub CI/CD, and running reproducible servers with NixOS. I’m proficient with Rust, Python, and basic web development, and I’m sure I can quickly learn any new tech that’s needed.

My financial situation is really tough right now. If anyone has a job or any opportunities, please help me out.


Appreciate it 🙏

https://redd.it/1g6seyv
@r_devops
Management platform

Does any integrated open source software exist to manage servers, K8s clusters, credentials, etc. for technical DevOps people? I can’t find anything besides Portainer, Lens, and the usual suspects.

https://redd.it/1g6lh4e
@r_devops
Interview Mess - DevOps

I'm interviewing these days, and I have a good understanding of DevOps tools and technologies. But whenever I go into an interview, the interviewers start asking troubleshooting questions about issues I have never faced, so I'm not sure how to answer them, and I get rejected. I know I'm not an expert in all of these, but none of them give any credit for a basic understanding. I'm getting rejected day after day... 😕
What should I do? There is so much to learn.

https://redd.it/1g6ya2c
@r_devops
Monitoring setup with Grafana Alloy and Mimir

Hello everyone, I'm currently working at a startup, and we have just one cluster for which we would like to set up logging and monitoring. This is the first time I'm setting up logging and monitoring, and I can't work out whether I need Prometheus or not, since Alloy can write directly to Mimir. Is there any benefit if Alloy sends data to Prometheus and Prometheus writes to Mimir?

https://redd.it/1g726gb
@r_devops
Seeking Some Words of Wisdom

Hi all,

I’m currently working as a Platform Engineer at a large multinational company, but my journey here has been anything but straightforward. I started my career 11 years ago as a .NET developer. After a few years, I began feeling stagnant and found myself drawn to the world of cloud technologies. Driven by this passion, I started teaching myself everything I could from online tutorials and guides, determined to gain the necessary skills in platform engineering.

About 3-4 years ago, I took the leap and fully transitioned into the Platform Engineering space, and I’m happy with that decision.
However, I’m constantly reminded of just how fast the world of DevOps evolves—especially with the rise of GenAI, MLOps, and other emerging technologies. It’s exciting, but also overwhelming at times.

No matter how much I learn or how many projects I work on, I can’t shake the feeling that it’s never enough. I struggle with the question of whether I’m truly “qualified” to call myself a Platform Engineer. I don’t hold any formal Kubernetes or cloud certifications, but I’ve gained hands-on experience working with these technologies. Still, the lingering doubt remains—how much is enough?

I find myself feeling uncertain about areas like networking and Linux, especially since I transitioned from a purely Windows-focused background. This sense of not knowing enough sometimes makes me question my place in this field.

I’m hoping to hear from others who may have faced similar feelings or have advice on how to navigate these challenges. How do you balance continuous learning with feeling confident in what you already know? How do you define “enough” in a field that never stops changing?


https://redd.it/1g7684l
@r_devops
Automate Deployment config changes?

There is something I have always wondered how best to solve. The problem:

Our deployments run in a Kubernetes cluster and are based on Helm and ArgoCD. Most things can be automated quite easily with this setup, but what always seems to become troublesome in bigger projects are changes to ConfigMaps and Secrets, especially staging them when they are environment-specific.

Current setup:

Developers try to document all required changes and set the values in a secret store that is referenced. However, this still requires a lot of effort before deployments to change environment variables in the Helm charts, secret references, etc.

Is there a setup to fully automate this easily? We have a ton of different staging environments >25...


Edit: All generic environment variables and ConfigMaps already get baked into the Helm base charts/images.
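One way to think about the environment-specific part is as a small overlay on top of shared base values, similar in spirit to how Helm layers several values files with later ones taking precedence. The sketch below only illustrates that merge logic; the keys, environment names, and the `deep_merge` helper are hypothetical and not taken from any real chart.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where override values win, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Shared defaults (hypothetical keys).
base_values = {
    "env": {"LOG_LEVEL": "info", "FEATURE_X": "off"},
    "secretRef": {"store": "shared-store", "key": "app/default"},
}

# Per-environment overlays: only what differs from the base is listed.
overrides = {
    "staging-01": {"env": {"FEATURE_X": "on"}, "secretRef": {"key": "app/staging-01"}},
    "staging-02": {"env": {"LOG_LEVEL": "debug"}},
}

rendered = {name: deep_merge(base_values, o) for name, o in overrides.items()}
print(rendered["staging-01"])
```

Keeping each overlay to the differences only means a new environment-specific variable is a one-line change in the repository for each environment that needs it, which a GitOps tool can then pick up without manual chart edits.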

https://redd.it/1g76zcp
@r_devops