Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
How often do you guys use SSH?

I personally find it a huge hassle to jump between several servers and modify the same configuration manually. I know there are tons of tools out there like Ansible that automate configuration, but my firm is unique in that we have a fairly small set of deployments where manual intervention is possible and automation is not yet necessary.

Curious if fellow DevOps engineers have the same issues / common patterns when interacting with remote servers, or is it mostly automated nowadays? My experience is limited, so it's hard to tell what happens at larger firms.

If you do interact with SSH regularly, what’s the thing that slows you down the most or feels unnecessarily painful? And have you built (or wished for) a better way to handle it?
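
For a handful of servers, one middle ground between manual SSH sessions and a full Ansible setup is a small fan-out script. Here is a minimal sketch, assuming key-based authentication is already set up and the `ssh` client is on the PATH. The host names and helper names are made up for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical host names; replace with your own inventory.
HOSTS = ["web1.example.com", "web2.example.com"]


def build_ssh_command(host: str, remote_command: str) -> list[str]:
    """Build the argv for running one command on one host via the ssh CLI."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]


def run_on_host(host: str, remote_command: str) -> tuple[str, int, str]:
    """Run the command on a single host and return (host, exit code, output)."""
    # Raises subprocess.TimeoutExpired if the host does not answer in time.
    result = subprocess.run(
        build_ssh_command(host, remote_command),
        capture_output=True,
        text=True,
        timeout=30,
    )
    return host, result.returncode, result.stdout + result.stderr


def run_everywhere(hosts: list[str], remote_command: str) -> list[tuple[str, int, str]]:
    """Run the same command on every host in parallel, preserving host order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(lambda h: run_on_host(h, remote_command), hosts))
```

Calling `run_everywhere(HOSTS, "uptime")` returns one `(host, exit code, output)` tuple per host, so a failed host shows up as a non-zero exit code rather than stopping the whole run. Once the list of hosts or the number of steps grows, this is roughly the point where a real configuration tool starts to pay off.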

https://redd.it/1ilnf0x
@r_devops
DevOps/DevSecOps graduation thesis ideas?

I'm currently working on my graduation thesis and looking for interesting topics related to DevOps/DevSecOps. I want to explore something that is both academically relevant and practically useful in the industry. I'm working as a software engineer now, but I have some cloud certs, such as AZ-104.

Some areas that have caught my attention include:

Security automation in CI/CD pipelines
Comparing traditional DevOps vs. DevSecOps implementations
Zero Trust security models in DevOps environments
Security in Cloud

I'm open to suggestions, especially if you've worked on a similar topic or have insights into emerging trends. Any recommendations or resources would be greatly appreciated!

https://redd.it/1ilqcgy
@r_devops
Practicing with Terraform and Ansible

I understand, in principle, the functions of these two tools, but as I work to better understand where the lines are (can be, or should be) drawn, I'm still failing to understand. I'm currently running a Proxmox server, and would like to configure and provision some resources. To learn, while achieving a task that will help me, I want to build the following, using as much IaC tooling as possible (if I have to write my own Python scripts, or learn some Go, that's not out of the question):

Configure several VMs (Terraform)

On said VMs, provision a variety of Docker containers (Terraform or Ansible)

Manage configuration for these docker containers (Ansible)

Ultimately, I want to spin up the Pterodactyl (https://pterodactyl.io/) application on a webserver and an instance of Wings (a daemon that Pterodactyl interfaces with to create Docker containers). Then, through Pterodactyl's API, I want to create and configure multiple game servers (Minecraft). Wings handles spinning them up, but I need to define their creation and resources, which can be managed via the API. From there, I want to configure these game servers with the correct settings and plugins. While all this is happening, I want to interface with and configure OPNsense on my router to permit the correct ports, and Telegraf/InfluxDB for collection of metrics and logs.

The part I'm most confused about is spinning up Docker containers: is Ansible or Terraform a better fit for this? I see plenty of Ansible modules available for configuring my applications, but not all of them would cooperate with an application running in a Docker container. The second part is interfacing with Pterodactyl and instructing it to spin up several game servers.

https://redd.it/1ilq7kr
@r_devops
Kubernetes deployment pods being restarted from time to time

I am a beginner in DevOps. Any help would be highly appreciated.
I am running several containers related to a website in a Kubernetes cluster.

While the other containers are running perfectly in the cluster, there is one container that is being restarted continuously, and the reason is "OOMKilled". Here is the graph of its memory usage:
https://ibb.co/93S7bMWG
Also, this is the deployment with the highest memory utilization out of all deployments.

Its CPU usage is completely normal (below 40%) at all times.

I have the following resource configuration in its deployment YAML file:

resources:
  requests:
    memory: "750Mi"
    cpu: "400m"
  limits:
    memory: "800Mi"
    cpu: "600m"

Also, this deployment is running HPA with minReplicas: 2 and maxReplicas: 4 with CPU-based autoscaling (80%).
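
To make the numbers above concrete, here is a rough sketch of two calculations: the headroom between the memory request and limit, and the replica count a CPU-based HPA would ask for. It uses the simplified rule from the Kubernetes documentation (desired = ceil(current * usage / target)) and ignores tolerance and stabilization windows. It also assumes the "below 40%" CPU figure is measured relative to the CPU request, which the post does not state. The function names are made up for illustration.

```python
import math

# Binary suffixes used by Kubernetes memory quantities (subset for this sketch).
_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '750Mi' to bytes (only Ki/Mi/Gi handled)."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain bytes


def desired_replicas(current: int, usage_pct: float, target_pct: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Simplified HPA rule: ceil(current * usage / target), clamped to bounds."""
    raw = math.ceil(current * usage_pct / target_pct)
    return max(min_replicas, min(max_replicas, raw))


headroom = parse_memory("800Mi") - parse_memory("750Mi")
print(headroom // 1024**2)                # 50 (MiB between request and limit)
print(desired_replicas(2, 40, 80, 2, 4))  # 2 (stays at the minimum)
```

Under these assumptions, memory never enters the scaling decision, so the HPA keeps the deployment at 2 replicas while each pod has only 50Mi of room above its request before it hits the 800Mi limit.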

Here is the memory usage of the node it is running on. All nodes have a similar pattern.
https://ibb.co/hJkFPSXZ

Also, I have Cluster Autoscaler with max-nodes set to 6.
The cluster is running 5 nodes, and all have similar resource requests/limits:

Resource  Requests      Limits
--------  --------      ------
cpu       1030m (54%)   2110m (111%)
memory    1410Mi (24%)  4912Mi (84%)

Now my question is:

1. Isn't the resource request/limit for a deployment applied per replica?
2. (On the node) While RAM Free shows little memory available, a lot of RAM is being used as cache. Why is my pod being killed instead of the cache being reduced? (I recently upgraded the VM to more memory to see if that solves the problem, but I still have the same issue.)
3. The two replicas are running on separate nodes (I checked that in Grafana). Why are they both being terminated together?
4. Should I use memory-based HPA, use VPA, or stay with the current configuration? And why?

Thank you.

https://redd.it/1ilze5h
@r_devops
They Said It Was Impossible… But Here We Are!

A couple of months ago, I asked about breaking into DevOps as an intern. The response?



"DevOps isn’t entry-level."
"Start in helpdesk and maybe in 10 years, you'll get there."
"DevOps is for the pros, not juniors!"

Well… today, I officially accepted a DevOps internship offer! 🎉💪

Moral of the story? Don’t let random internet strangers tell you what you can or can’t achieve. If you put in the work, learn, and stay persistent, you’ll find the right opportunity.

For those still trying to break in—keep going. The gatekeepers don’t get to decide your career. 🚀

https://redd.it/1im0lvz
@r_devops
Is knowledge of Nix/NixOS relevant in DevOps?

Would you say learning Nix/NixOS is relevant and gives an advantage when applying for DevOps roles?

Very few companies use Nix now, but I have the feeling that it will become relevant in the future. Would you support this claim? What are your thoughts?

https://redd.it/1im2chl
@r_devops
Help with monitoring system project

I'm doing a 6-month internship and was assigned a project to create a monitoring system for the company.
They want to monitor metrics (CPU, memory, etc.), logs from services such as Apache (req/min, DDoS, errors, ...) and SSH, as well as their SaaS, backend, WebSockets and applications.

They don't want to use any premade tools such as Prometheus, Grafana, New Relic or anything similar. Instead, they said I have to create Python agents for scraping metrics and logs and develop a Flask/Vue.js dashboard where I will visualize them, both in real time and with history.

During my research I've come across multiple technologies and libraries/packages to use.
For databases, I decided to go with InfluxDB for the metrics, and Elasticsearch for logs (though I hear it's very resource heavy?)

I'm still unsure how the data should be transmitted.
For metrics, to limit traffic, my tutor suggested using MQTT to push the data to the dashboard in real time, so the DB isn't queried every x seconds (I was thinking about using a WebSocket). At the same time, each target would save its metrics directly to the database (here I was thinking about writing them in batches to limit the number of requests, or using a WebSocket). The dashboard can then retrieve history from the database.
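
The batching idea can be sketched independently of the transport. Below is a minimal example, assuming a hypothetical `send` callable that would wrap whatever actually writes to InfluxDB or publishes over MQTT. No real client library is used here, and the class and parameter names are made up for illustration.

```python
import time
from typing import Callable


class MetricBatcher:
    """Buffer samples and hand them to `send` in batches."""

    def __init__(self, send: Callable[[list], None], max_size: int = 50,
                 max_age_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self._send = send            # hypothetical writer (DB write, MQTT publish, ...)
        self._max_size = max_size    # flush when this many samples are buffered
        self._max_age_s = max_age_s  # or when the oldest sample is this old
        self._clock = clock
        self._buffer = []
        self._first_ts = None

    def add(self, sample: dict) -> None:
        """Queue one sample; flush if the batch is full or too old."""
        if not self._buffer:
            self._first_ts = self._clock()
        self._buffer.append(sample)
        too_old = self._clock() - self._first_ts >= self._max_age_s
        if len(self._buffer) >= self._max_size or too_old:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered, then start a new batch."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send(batch)
```

An agent would call `add()` for every scraped sample and `flush()` once on shutdown. Note that the age check only runs when a new sample arrives, so a quiet agent may hold a partial batch longer than `max_age_s`; a timer-driven flush would be needed to tighten that.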

For logging, I haven't done enough research on how I should be using Elasticsearch, or whether I should at all.

I'm still a bit lost, since when it comes to monitoring, all my projects used basic Prometheus + Grafana.

I need advice on what I should do given the above. Did I choose the right technologies? Is the data collection mechanism fine? Any important tips for things I'm unaware of, or any sort of guidance? Anything helps.

https://redd.it/1im4wqp
@r_devops
Need help plz..

Recently I got selected as a jr devops engineer but I will be on probation for 3 months and then there will be a performance review which can result in a permanent role or most probably termination.

I don't have real-world experience in DevOps and I am freaking out now..

---------------------------------------------------

Here is the JD :-

Key Responsibilities:

Support in Continuous Integration/Continuous Deployment (CI/CD)

Assist in the setup and maintenance of CI/CD pipelines.

Monitor build and deployment processes to ensure smooth operation.

Learn and assist in the implementation of IaC using tools like Terraform, Ansible, or CloudFormation.

Support the automation of infrastructure provisioning and management.

Assist in setting up monitoring and logging tools.

Monitor system performance and generate reports.

Collaborate with development, QA, and operations teams.

Participate in training sessions and team meetings to enhance skills and knowledge.

---------------------------------------------------

Can anyone please help me with what to learn and where to learn it... 👏👏👏



https://redd.it/1im566q
@r_devops
Talk to your Kubernetes logs in natural language (AI-driven K8s operator)

Can you talk to your Kubernetes cluster using natural language? Yes! I've implemented the simplest AI-powered interaction with Kubernetes to inspire others to explore this path further—or even transform K8S Whisperer into the Tony Stark of Kubernetes management. 🚀


Demo video :

https://www.youtube.com/watch?v=T3E9Wjbq44E&list=RDsa7uGYm-ixA&index=25


Source code :

https://github.com/ARAldhafeeri/K8sWhisperer-




https://redd.it/1im6tmo
@r_devops
I created 3 FREE AWS Practice Exams w/ hundreds of random questions to help you ace your certification (SAA, Cloud & AI Practitioner)🎯

I'm excited to share a comprehensive AWS certification practice pack with you! As someone who has navigated the AWS certification journey, I understand the importance of having access to quality study materials. That's why I've created this free resource pack featuring three complete practice exams:

You can access all three practice exams here

AWS Cloud Practitioner
AWS Solutions Architect Associate
AWS AI Practitioner

Each practice exam features hundreds of carefully selected questions covering all essential exam topics and domains. You can choose between two formats:

Basic mode: 35 questions, 40-minute duration
Full mode: 65 questions, 90-minute duration

Key Features:

Real-time score tracking during the exam
Detailed answer review to learn from your mistakes
Randomized questions for more effective studying
Comprehensive coverage of all exam domains
Matches the real exam format and difficulty

While these practice exams are valuable study tools, remember that hands-on experience is crucial! I highly recommend complementing your studies with AWS Skillbuilder for practical experience.

I developed these practice exams with dedication and care to support our community. While you'll find information about contributing to the project within the links, rest assured they will always remain completely free, regardless of contributions. I believe quality AWS certification preparation should be accessible to everyone!

Want to stay updated on future resources? Connect with me on LinkedIn!

https://redd.it/1im9y66
@r_devops
K8s CD tools where spoke clusters create connection to hub cluster

I'm investigating open source CD tools to deploy apps on multiple clusters running on IoT devices. We're considering something similar to a traditional hub-and-spoke pattern, but where the K8s agent/operator on the device cluster initiates the connection to the hub CD management plane. That means the hub no longer needs ingress to the devices hosting the cluster.

Does anyone know of CD tools that work this way? I have found ArgoCD Agent (https://github.com/argoproj-labs/argocd-agent), but that is still experimental. We're not married to GitOps tools, so open to alternatives.
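
The pull model described above can be sketched as a single reconcile step that runs on the spoke. This is only an illustration of the pattern, not how ArgoCD Agent or any other tool implements it. `fetch_desired` and `apply` are hypothetical callables standing in for an outbound request to the hub and a local deploy action.

```python
from typing import Callable


def reconcile_once(fetch_desired: Callable[[], dict], applied: dict,
                   apply: Callable[[str, str], None]) -> dict:
    """One pull cycle: ask the hub for desired versions and apply any differences.

    The spoke makes the outbound call, so the hub needs no route to the device.
    Removal of apps that disappear from the desired state is not handled here.
    """
    desired = fetch_desired()  # outbound request initiated by the spoke
    for app, version in desired.items():
        if applied.get(app) != version:
            apply(app, version)
    return dict(desired)  # becomes the `applied` state for the next cycle
```

An agent would call this in a loop with a sleep or backoff between cycles, passing the returned dict back in as `applied`.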

https://redd.it/1im9j5m
@r_devops
Externalizing pipeline and making it consumable

Good news / bad news

Current application owners love my new pipeline….automated huge portions of the build and deployment process, I even built custom pieces to create RFCs 💅🏻

Bad news, entire org wants to move to my pipeline

So… for those who have done something like this. How do I do this without losing my mind?

I want to move individual steps out, token rotations, security scans, build steps… etc. Move them one at a time and make them consumable…?

The current application uses Gradle… some use Maven… some both lmao

Was literally just told “You choose how to handle it”…

So… help? 😅😅😅

https://redd.it/1imazty
@r_devops
Are there versioning tools for deploying multi-service mono-repos?

I keep running into the same challenge: I have a multi-service software architecture where all the services are built from the same git repository. These services are all deployed as Docker images to a Kubernetes cluster.

The challenge I have is that I only want to rebuild and redeploy the particular services that have actually changed. I also need to handle cases where one service may reference a common library which can also change.

So, let's say I have three services A, B and C. A and C both reference common library lib-FOO. So, for example:

If C's source code changes, I want C to be rebuilt and deployed, whereas nothing happens to the other services (A and B).

If the source code to lib-FOO changes, lib-FOO, A and C all need to be rebuilt and redeployed. B hasn't changed, so no actions should be taken for it.
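
The two cases above amount to walking a dependency graph backwards from whatever changed. Here is a minimal sketch, assuming each component lives in its own top-level directory and that the dependency map is maintained by hand. Both are assumptions for illustration, and the function names are made up.

```python
# Hypothetical layout: one top-level directory per component.
DEPENDS_ON = {
    "A": {"lib-FOO"},
    "B": set(),
    "C": {"lib-FOO"},
    "lib-FOO": set(),
}


def changed_components(changed_paths):
    """Map changed file paths to the components that own them."""
    tops = {path.split("/", 1)[0] for path in changed_paths}
    return {name for name in tops if name in DEPENDS_ON}


def affected(changed, depends_on=DEPENDS_ON):
    """Return changed components plus everything depending on them, transitively."""
    result = set(changed)
    grew = True
    while grew:
        grew = False
        for component, deps in depends_on.items():
            if component not in result and deps & result:
                result.add(component)
                grew = True
    return result


print(sorted(affected(changed_components(["C/src/main.py"]))))     # ['C']
print(sorted(affected(changed_components(["lib-FOO/util.py"]))))  # ['A', 'C', 'lib-FOO']
```

In a CI job, the list of changed paths could come from something like `git diff --name-only` between the last deployed commit and the current one. Monorepo build tools typically derive the dependency map from the build definitions instead of a hand-written dict.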

Are there any specific tools or technologies I can use for this scenario?



https://redd.it/1ime7kj
@r_devops
What should a transitioning engineer know to be successful in dev ops?

Greetings,
I am a systems engineer working in defense, who also has experience in the embedded world. I am considering moving industries. (Never mind why)

I was talking with a friend of a friend the other day who works in back-end web development. Going over my skills, I was rather surprised that they suggested DevOps as a possible new role. Their reasoning: I often take old software and create integrations to keep it running in modern environments. Sometimes I use VMs, sometimes Docker/Podman, and occasionally I just recompile the code with small changes to the code or build scripts. (This isn't my main role, just something I get tasked with regularly.)

Long story short, what kind of skills would look good to an employer for someone transitioning into this field? I.e. with experience but not directly related experience. Any certs or online classes worth checking out, whether for the resume or for practical knowledge?

https://redd.it/1imbt93
@r_devops
How often do you consult other team members?

To preface, I am a junior and was lucky enough to land this role. But I have been “let loose”, so to speak, with barely any oversight. At the same time, I don't have anyone to talk to or bounce ideas off of, other than my boss, who is the lead.

Got a recent request from the Dev team to implement some stages in our pipelines, and it will require a good amount of net-new changes. I have an idea of what I need to do. I don't have it all figured out yet, but I will go step by step. (Yes, I know 90% of you are experts; I am a novice.) I wanted to pull in the boss man, or just ping him to let him know about the request, and maybe jot down some high-level ideas for next steps. Boss is a busy person, but he usually answers my questions, albeit with very brief responses.


Would it be a bad look to reach out and say “Hey boss man, got this recent request, have an idea of how to go about it, but wanted to bounce some ideas off of you”, or even just to provide a high-level outline of what I would be working on? I'm trying to move up to mid-level and don't want this to mess up my chances. I’m probably freaking out over nothing, but still…

https://redd.it/1imh1af
@r_devops
Seeking Advice on Managing Environment-Specific Configuration without changing code

I’m working on a Next.js project deployed to OpenShift using Docker, with five distinct environments. The issue is that the project needs pre-build configuration, which includes environment-specific variables that are injected in the entry point with a prebuild Node.js script. Every time there’s a change in the environment (like modifying an ingress in a deployment), I need to adjust the configuration file for that environment and redeploy.

I can’t use OpenShift’s configMaps or secrets because they’re meant for runtime, and I don’t want to have a separate Dockerfile for each environment. So, I’ve considered two possible solutions:

1. Create an API that provides the configuration values before the build, basically a Service Discovery pattern. This way, in the script that runs before the build, I can make a request to get those variables. I’ve heard HashiCorp Consul could be used for this, but I’m looking for something free.
2. Do the build with placeholders for those values (like the environment's hostname) and use a script with envsubst in the Dockerfile before starting the container to replace the placeholders. This would involve analyzing many files during the build.
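
The second option can be sketched as a small script that runs at container start, before the server is launched. This version uses a made-up `__ENV_NAME__` placeholder convention instead of envsubst's `${NAME}` syntax, on the assumption that a distinctive marker is less likely to collide with `$` characters already present in built JavaScript. The function names and the file suffix list are also assumptions.

```python
import re
from pathlib import Path

# Hypothetical placeholder convention: __ENV_API_HOST__ is replaced by env["API_HOST"].
PLACEHOLDER = re.compile(r"__ENV_([A-Z][A-Z0-9_]*?)__")


def render_placeholders(text: str, env: dict) -> str:
    """Replace known placeholders with values from env; unknown ones are left as-is."""
    def replace(match):
        return env.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(replace, text)


def render_tree(root: Path, env: dict, suffixes=(".js", ".html", ".json")) -> int:
    """Rewrite matching files under root in place; return how many files changed."""
    changed = 0
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            original = path.read_text(encoding="utf-8")
            rendered = render_placeholders(original, env)
            if rendered != original:
                path.write_text(rendered, encoding="utf-8")
                changed += 1
    return changed
```

An entrypoint could call `render_tree(Path("<build output dir>"), dict(os.environ))` after importing `os`, which would let the values come from a ConfigMap or secret at runtime even though the build itself only ever saw placeholders. Whether every value the build needs can be deferred this way depends on the project.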

Both solutions would require a redeployment, but at least I wouldn’t need to modify the code directly. Plus, the API solution could work for other projects that also need environment-specific variables during the build. The purpose of this post is to get some feedback, as I’ve been thinking about this problem for so long that I might be overcomplicating the solution.

https://redd.it/1imhsq1
@r_devops
Kubernetes is the new Jenkins....

With all the operators, extensions and the nightmare of keeping it up to date, Kubernetes is the new Jenkins....

https://redd.it/1imk1ry
@r_devops
NOC to DevOps or Cybersecurity?

Hey guys, a little bit about myself: I'm turning 22 soon. I started my career as a Tier 2 Support Specialist for 2 years, then transitioned to NOC within the same company, where I've now been for 4 months.

This department will be closed in May and I don't know which path to choose.

NOC > Junior DevOps or other similar roles (I have existing experience with DevOps tools and cloud, and I keep learning during my work)
NOC > SOC / Incident Response Analyst (where I'd need to learn from scratch, but I've always had a passion for it)

If you guys were in my shoes, which path would you choose, or what would you do?

If there are any more similar roles, feel free to list them here.

Thank you guys it means a lot!

https://redd.it/1imkv5l
@r_devops
Would anyone want Datadog Vector as a SaaS?

I see developers and DevOps struggling with running in-house OTel pipelines.
Would you like to subscribe to a SaaS version of Vector (https://vector.dev/)?

The only prohibitive cost would be outbound data transfer, which can be offset if the SaaS provides a CloudFront endpoint to send the data to, since data transfer from EC2 to CloudFront costs USD zero.

Would you still not use this service, and why?

https://redd.it/1imjuas
@r_devops
How to Publish to GitHub Pages From Another Repository

Hey DevOps folks!

I wrote a detailed guide on deploying static sites from one GitHub repository to another using GitHub Actions and OpenTofu.

This setup is particularly useful if you want to:

- Keep your source code private while using free GitHub Pages hosting
- Manage infrastructure as code using OpenTofu/Terraform
- Automate cross-repository deployments with GitHub Actions

The guide walks through:

1. Setting up the target GitHub Pages repository
2. Configuring the source code repository
3. Creating necessary deploy keys and GitHub Actions workflows
4. Implementing the deployment pipeline using OpenTofu
5. Managing the infrastructure with Terragrunt

All code examples are provided, including complete GitHub Actions workflows and OpenTofu configurations.

https://developer-friendly.blog/blog/2025/02/10/how-to-publish-to-github-pages-from-another-repository/

Let me know if you have any questions!

Please share in the comments if you prefer an alternative approach.

https://redd.it/1imb0fy
@r_devops
YAHH - Per-project history file

YAHH is a Zsh-based tool that helps you manage separate command histories on a per-project basis. Instead of having one global history file or one per directory, YAHH allows you to keep distinct histories—called realms—for each of your projects.

This makes it easier to recall recurring commands that are specific to a given project or operational environment, which is useful in professional services, consulting and other context-switching roles.

https://github.com/Positronico/yahh

https://redd.it/1imt4fz
@r_devops