Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
# bcrypt hash of the string "password": $(echo password | htpasswd -BinC 10 admin | cut -d: -f2)
hash: '$2a$10$2b2cU8CPhOTaGrs1HRQuAueS7JTT5ZHsHSzYiFPm1leZck7Mc8T4W'
username: 'admin'
userID: '08a8684b-db88-4b73-90a9-3cd1661f5466'


# Enable local users
enablePasswordDB: true
# Allow password grants with local users
oauth2:
  passwordConnector: local


I've run the following on GCP:
    gcloud iam workload-identity-pools create $POOL_ID \
        --location="global" \
        --description="Pool for Hetzner workloads" \
        --display-name="Hetzner Pool" \
        --project=$PROJECT_ID

    gcloud iam workload-identity-pools providers create-oidc $PROVIDER_ID \
        --location="global" \
        --workload-identity-pool=$POOL_ID \
        --issuer-uri="https://auth.example.ai" \
        --allowed-audiences="//iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$POOL_ID" \
        --attribute-mapping="google.subject=assertion.sub,attribute.email=assertion.email,attribute.groups=assertion.groups" \
        --project=$PROJECT_ID

    gcloud iam service-accounts add-iam-policy-binding $SERVICE_ACCOUNT \
        --member="principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$POOL_ID/subject/$SUBJECT" \
        --role="roles/iam.serviceAccountTokenCreator" \
        --project=$PROJECT_ID

    gcloud iam workload-identity-pools add-iam-policy-binding $POOL_ID \
        --location="global" \
        --member="principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$POOL_ID/subject/$SUBJECT" \
        --role="roles/iam.workloadIdentityUser" \
        --project=$PROJECT_ID
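
For anyone following along, here is a rough sketch of the request body a workload would send to Google's Security Token Service (https://sts.googleapis.com/v1/token) to swap the OIDC token from the issuer for a federated token. The function name and the sample values are made up for illustration. Note the assumption that the STS audience is the full provider resource name (with a /providers/ segment), which is not the same string as the --allowed-audiences value above.

```python
import json

STS_URL = "https://sts.googleapis.com/v1/token"


def build_sts_exchange(project_number, pool_id, provider_id, oidc_token):
    """Build the JSON body for exchanging an OIDC token for a federated token."""
    # Assumption: the audience names the provider inside the pool.
    audience = (
        f"//iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )
    return {
        "grantType": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": audience,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requestedTokenType": "urn:ietf:params:oauth:token-type:access_token",
        "subjectToken": oidc_token,
        "subjectTokenType": "urn:ietf:params:oauth:token-type:jwt",
    }


# Hypothetical values, only to show the shape of the body that gets POSTed to STS_URL.
example_body = build_sts_exchange("123456789", "hetzner-pool", "dex", "<oidc-token>")
print(json.dumps(example_body, indent=2))
```

If the exchange fails, comparing the aud claim of the OIDC token against the provider's allowed audiences is probably the first thing to check.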







https://redd.it/1li7cn4
@r_devops
How to reach the devops or cloud people that need remote support?

So I'm from the DevOps and cloud field, and I've started taking gigs on Fiverr. I've been thinking about how to reach potential clients by email. I've done client support and remote support work for a few clients, and I'm moving towards freelancing. What are your thoughts: how would you reach people who need work support?

https://redd.it/1li89s3
@r_devops
AWS terraform documentation feels like trash

Hi, I recently started working on AWS using Terraform, and to be honest I am quite disappointed with the implementation of the modules and their official documentation. I also work with Azure using Terraform, and its implementation and documentation of modules are much more comprehensive, mature and well designed.

Do you also face issues while working with AWS Terraform? What do you refer to when you're stuck? Would love to hear your thoughts and experience.

Thanks in advance.

https://redd.it/1liat8s
@r_devops
Airflow webserver UI - integrate LDAP with Kerberos?


Is it possible to do away with the LDAP bind username and password and use Kerberos instead? We are on Airflow 2, and a lot of the answers are for Airflow 1. There is also a lack of examples of implementing this. Is anyone able to advise?

https://redd.it/1liersm
@r_devops
Will learning devops help me become a better backend developer?

I have studied primarily Java and Python for 2 years. I love backend and have built a couple of rest APIs. But I’m still a newbie and want to get even better at it.

I’ve got 2 options now:
A) study devops for 2 years, this is new for me
B) study frontend for 2 years, this is not new for me, so I would just take a lot of the free time to build my own projects

Now the only reason I am considering devops is that I don’t know much about it, so if it can actually help me become better at backend, I would love to study it for that sake!

https://redd.it/1lif7ja
@r_devops
Lessons from comparing SSO vendors for a growing SaaS platform

We had to scale from homegrown auth to proper SSO and dug into a bunch of vendors — from developer-focused ones like FusionAuth and WorkOS to enterprise stacks like Okta and Microsoft Entra.

Comparing deployment models, docs, SDKs, SCIM support, and pricing taught us a lot.

Anyone else go through this recently? Curious what you optimized for — integration speed? CIAM vs workforce? Multi-tenant support?

https://redd.it/1lihsul
@r_devops
new to grafana - display mem usage and limits from containers

Hi I am new to K8S and Grafana. Mainly worked on AWS IAC the last few years.

I am using the official traefik dashboard in grafana and trying to extend it to also display the pod memory usage, limits and requests.

I am having to use two different metrics endpoints (kube_pod_* and go_mem_*) to achieve this, and I am unable to get the dashboard to work in such a way that the limit and cpu switch between the different services from the dropdown box that acts as a filter.

Is anyone able to explain where I'm going wrong, or able to help? I tried Copilot with no luck; real humans are required.


"pluginVersion": "10.4.12",
"targets": [
  {
    "datasource": {
      "type": "prometheus",
      "uid": "Prometheus"
    },
    "editorMode": "code",
    "expr": "go_memstats_sys_bytes{container=~\".*traefik.*\", service=~\"$service\"}",
    "instant": false,
    "legendFormat": "{{container}}",
    "range": true,
    "refId": "A"
  },
  {
    "datasource": {
      "type": "prometheus",
      "uid": "c8cf1b2b-d68b-4b9a-93c0-e3520f97bcf3"
    },
    "editorMode": "code",
    "expr": "label_replace(\n kube_pod_container_resource_requests{container=~\".*traefik.*\", resource=\"memory\"},\n \"service\", \"$1\", \"container\", \"(.*)\"\n) ",
    "hide": false,
    "instant": false,
    "legendFormat": "{{service}}-limits",
    "range": true,
    "refId": "B"
  },
  {
    "datasource": {
      "type": "prometheus",
      "uid": "c8cf1b2b-d68b-4b9a-93c0-e3520f97bcf3"
    },
    "editorMode": "code",
    "expr": "label_replace(\n kube_pod_container_resource_requests{container=~\".*traefik.*\", resource=\"memory\"},\n \"service\", \"$1\", \"container\", \"(.*)\"\n)",
    "hide": false,
    "instant": false,
    "legendFormat": "{{service}}-requests",
    "range": true,
    "refId": "C"
  }
],
"title": "Memory Usage",
"transformations": [
  {
    "filter": {
      "id": "byRefId",
      "options": "B"
    },
    "id": "filterFieldsByName",
    "options": {
      "byVariable": true,
      "include": {
        "variable": "$service"
      }
    },
    "topic": "series"
  },
  {
    "filter": {
      "id": "byRefId",
      "options": "C"
    },
    "id": "filterFieldsByName",
    "options": {
      "byVariable": true,
      "include": {
        "variable": "$service"
      }
    },
    "topic": "series"
  },
  {
    "filter": {
      "id": "byRefId",
      "options": "A"
    },
    "id": "filterFieldsByName",
    "options": {
      "byVariable": false,
      "include": {
        "variable": "$service"
      }
    },
    "topic": "series"
  }
],




https://redd.it/1ligynd
@r_devops
Ory Kratos for new projects in 2025?

I like the idea behind Ory Kratos and since I only need authentication (authorization is handled elsewhere) I took a closer look and built a small PoC for my workflow. There are quite a few inconsistencies in the API, documentation and code examples unfortunately and the repository doesn't see too many commits anymore. I wonder if it's still a good choice for new projects in 2025.

Has anyone here experience with the self-hosted version of Kratos and would like to share it?

https://redd.it/1lijs20
@r_devops
Good resources/path to learn and move to devops

I’ve been in QA automation for the past 4ish years and have recently started losing interest in the field.

I do manage pipelines and some parts of the QA infra, and I have developed an interest in DevOps recently.

I’m struggling to find good resources and a path to learn DevOps. Has anyone found any good resources that they can share?

Before I start learning, I like to know the outline of what I’ll learn and what comes next, so I would like to know the path to follow as well!
Thank you!

https://redd.it/1lik3v2
@r_devops
Best approach to prevent Windows reboots

Hello DevOps fellows. I'm working on a Jenkins pipeline that manages Windows 10 hosts, and I need to check for pending Windows updates and reboots to prevent unexpected interruptions during pipeline executions in these hosts.

Currently I'm calling two PowerShell scripts that tell me whether any updates/reboots are pending, but I can't get the time remaining until Windows forces a reboot, and sometimes the pending-updates script fails (don't know why :-( ).

Have any of you already had to implement something like this? If so, how? Any tips?

I thought about searching for a patch management tool, but didn't find anything open source to test.

Thanks in advance!
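
One possible starting point, sketched in Python rather than PowerShell: check the registry keys that are commonly used as "reboot pending" markers. This is only a sketch under assumptions. The function names are made up, the two key paths are the usual Windows Update and Component Based Servicing markers, and it does not answer the "time remaining until a forced reboot" part.

```python
# Registry subkeys (under HKEY_LOCAL_MACHINE) commonly present when a reboot is pending.
REBOOT_KEYS = [
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired",
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\RebootPending",
]


def _key_exists(path):
    """Return True if the HKLM subkey exists. Works on Windows only."""
    import winreg  # imported lazily so the module still loads on other platforms

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path):
            return True
    except FileNotFoundError:
        return False


def pending_reboot_reasons(key_exists=_key_exists):
    """Return the marker keys that are present; an empty list means none were found."""
    return [path for path in REBOOT_KEYS if key_exists(path)]
```

A Jenkins stage could call pending_reboot_reasons() on the host and skip or postpone the run when the list is non-empty. The key_exists parameter is only there so the logic can be exercised off Windows.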



https://redd.it/1ligwix
@r_devops
Looking for recommendations of open-source projects to showcase DevOps/Kubernetes skills

I'm preparing for job interviews and want to demonstrate my DevOps skills: CI/CD pipelines, IaC, Kubernetes deployments, Helm, monitoring, and more.

I'm looking for recommendations of small to medium open-source backend projects (preferably Python or Go) that I can fork and use to build a full infrastructure around — including Kubernetes manifests, Helm charts, pipelines with Jenkins, and cloud infrastructure setup.

I won't be modifying or developing the application itself — just using the fork for demo purposes (e.g. to run in my own cluster and show full DevOps lifecycle). I won’t publish or promote my fork as a new product or separate project.

Any recommendations would be greatly appreciated. Thank you!

https://redd.it/1lio497
@r_devops
From Bash Scripts to the Cloud: Where Do I Go From Here?

Hey folks,

I’m someone who has a solid interest in Linux and the command line. I’ve been learning the basics of operating systems, Linux, and bash scripting, and I find myself really enjoying the terminal workflow and the logic behind automating things.

Now, I want to break into the Cloud/DevOps domain — but I’m not exactly sure where I stand and what entry points would make the most sense given my current skillset.

Here’s what I currently know:

- Basic OS concepts (processes, memory, etc.)
- Linux fundamentals (file system, permissions, package managers)
- Bash scripting (basic to intermediate level)
- Comfortable navigating and working on the Linux CLI

What I want to know:

1. With this skillset, what kinds of roles should I target? (internships, junior DevOps roles, etc.)


2. What should I start learning next to become job-ready in the cloud/devops space? (e.g., Git, Docker, CI/CD tools, cloud platforms?)


3. Is it possible to land a Cloud/DevOps internship or entry-level role before being fully certified or “expert” level in everything?


4. Any roadmap or learning path recommendations that build naturally on top of my current Linux CLI knowledge?



Would love to hear from people who’ve walked a similar path or are working in the domain. I’m motivated and committed to keep learning, and I feel like I’m finally heading in the right direction — just need some guidance.

TL;DR:
I know Linux, OS basics, and bash scripting. I love using the CLI and want to get into the Cloud/DevOps field. What kind of roles can I aim for now, and what should I learn next to improve my chances of landing an internship or junior role?


https://redd.it/1liec2w
@r_devops
Leveraging Your Prometheus Data: What's Beyond Dashboards and Alerts?

So, I work at an early-stage ISP as network dev and we're growing pretty fast, and from the beginning, I've implemented decent monitoring utilizing Prometheus. This includes custom exporters for network devices, OLTs, ONTs, last-mile CPEs, radios, internal tools, network Netflow, and infrastructure metrics, all together, close to 15ish exporters pulling metrics. I have dashboards and alerts for cross-checking, plus some Slack bots that can call metrics via Slack. But I wanted to see if anyone has done anything more than the basics with their wealth of metrics? Just looking for any ideas to play with!

Thanks for any ideas in advance.

https://redd.it/1lj25sy
@r_devops
How to Deploy a Containerized Backend for Free?

Howdy!! I’m working on a small charity project for a client and I’m trying to stay entirely within the free tier. The backend is built with microservices and includes:
- A Redis container
- A PostgreSQL container
- An API Gateway using Spring Cloud
- Around 6 Microservices for business logic

In terms of infrastructure, the project is not expecting heavy demand; around 100 users are expected. So I was planning to use Oracle Cloud’s Free Tier VMs, install Docker, and run all the services there.

Additionally, I’m considering running Prometheus in a separate VM for monitoring and logging.

Are there better (still free) alternatives you'd recommend for containerized deployments?

https://redd.it/1lj2yrs
@r_devops
Built an AI agent for adaptive security scanning - lessons for infrastructure automation

Traditional security scanners are the worst kind of infrastructure tooling - rigid, fragile, and break when you change one config. Built a ReAct agent that reasons through targets instead of following predefined playbooks.

The infrastructure problem: Security scanning tools are like bad Ansible playbooks - they assume everything stays the same. Change a port, modify a service, update an endpoint - they fail. Modern infrastructure needs adaptive automation.

What this agent does:

- Reasons about what to probe next based on discovered services
- Adapts scanning strategy when it encounters unexpected responses
- Chains multi-step discovery (finds service → identifies version → tests specific vulnerabilities)
- No hardcoded scan sequences - decides what's worth checking

Implementation challenges that apply to any infrastructure automation:

- Non-deterministic tool execution (LLMs sometimes get lazy and quit early)
- Context management in multi-step workflows
- Balancing automation with reliable execution patterns
- Token cost control in long-running processes

Results: Found SQL injection, directory traversal, and auth bypasses through adaptive reasoning. Discovered attack vectors that rigid scanners miss because they can actually think through the target.

Infrastructure automation insights:

- LLMs can make decisions impossible to code traditionally
- Need hybrid control - LLM reasoning + deterministic flow control
- State management crucial for complex multi-step operations
- Adaptive logic beats rigid playbooks for unknown environments

Think of it as Infrastructure as Reasoning instead of Infrastructure as Code. Could apply similar patterns to any ops automation that needs to adapt to changing environments.
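
To make the "hybrid control" idea concrete, here is a minimal sketch of a reason-act loop where the decision step is pluggable but the loop itself stays deterministic: a hard step cap, a tool whitelist, and an explicit history. It is not the implementation from the linked article. The names run_agent, scripted_policy and the two fake tools are made up, and the scripted policy stands in for an LLM call.

```python
def run_agent(decide, tools, max_steps=5):
    """Run a decide -> act loop with deterministic guard rails around the decision step."""
    history = []
    for _ in range(max_steps):  # hard cap, so the loop always terminates
        tool, arg = decide(history)
        if tool == "finish":
            break
        if tool not in tools:  # never execute something outside the whitelist
            history.append(f"error: unknown tool {tool!r}")
            continue
        history.append(f"{tool}({arg}) -> {tools[tool](arg)}")
    return history


def scripted_policy(history):
    """Stand-in for the LLM: picks the next action from what has been observed so far."""
    if not history:
        return ("probe_port", "80")
    if len(history) == 1:
        return ("fingerprint", "80")
    return ("finish", "")


# Fake tools for illustration; real ones would wrap actual scanners.
TOOLS = {
    "probe_port": lambda port: "open",
    "fingerprint": lambda port: "http server",
}

print(run_agent(scripted_policy, TOOLS))
```

The history list is the state that gets fed back into each decision, which is where context management and token cost would have to be handled in a real agent.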

Technical implementation: https://vitaliihonchar.com/insights/how-to-build-react-agent

Anyone experimenting with LLM-based infrastructure automation? What patterns work for reliable execution in production environments?

https://redd.it/1lj4rjh
@r_devops
I’m starting my DevOps journey. What skills, tools, and real-world challenges should I focus on mastering?

Hi everyone!

I’m an engineering student / early-career professional interested in becoming a DevOps engineer. I don’t just want to study theory or pass certifications, I really want to master real-world skills, work on solid projects, and understand what DevOps looks like in production environments.

I have a few questions and I would love to hear from those with experience:

1) What tools, practices, and concepts did you find most important when working as a DevOps engineer in real-world jobs?

2) What challenges did you face that theory/certification didn’t prepare you for?

3) If you could go back and guide your beginner self, what would you focus on learning or practicing early?

4) What kind of projects (personal or in a lab) would actually make me job-ready?

5) What mistakes do DevOps beginners usually make that I should avoid?

I’m especially interested in AWS, CI/CD pipelines, Terraform, Docker/Kubernetes, and automation, but open to all advice!

Thanks so much for your time, looking forward to learning from your experience!

https://redd.it/1lj48nu
@r_devops
TOP 10 DevOps Tools in 2025: Based on 300 LinkedIn job posts

Hey folks,

Recently I was looking for a new job and got curious about which DevOps tools are actually in demand right now. What I did:

- Analyzed 300 recent LinkedIn DevOps job posts, using AI to go through the job descriptions and pull out the most mentioned tools
- Cross-checked with my own experience

Tbh I added all the data and asked ChatGPT to write up the rest, so the data is from me but the writeup is not. Still, imo it's quite useful.
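
For anyone who wants to repeat the counting step without an AI model, a plain keyword count gets part of the way there. This sketch uses simple whole-word matching, which is a swapped-in technique and not what was used for the list below. The tool list and the function name are illustrative.

```python
import re
from collections import Counter

# Illustrative subset; extend with whatever tools you care about.
TOOLS = ["GitHub Actions", "Terraform", "Kubernetes", "Docker", "Jenkins"]


def count_tool_mentions(posts, tools=TOOLS):
    """Count how many job posts mention each tool at least once (case-insensitive)."""
    counts = Counter()
    for text in posts:
        for tool in tools:
            # \b keeps "Docker" from matching inside "Dockerfile"
            if re.search(rf"\b{re.escape(tool)}\b", text, flags=re.IGNORECASE):
                counts[tool] += 1
    return counts
```

Calling count_tool_mentions(job_texts).most_common(10) would give a top-10 ranking, with the caveat that aliases such as "k8s" need their own entries.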

1. GitHub Actions
2. Terraform
3. Kubernetes
4. ArgoCD
5. Docker
6. Jenkins
7. Prometheus
8. Ansible
9. Vault
10. Pulumi

Honorable mentions: GitLab CI/CD, Helm, Grafana, AWS CodePipeline.

If you want the full breakdown (and some honest pros/cons for each tool), I put together a full article here: https://prepare.sh/articles/devops-job-market-trends-2025

Would love to hear what tools your team is actually using, or if there’s anything you think should’ve made the list.

https://redd.it/1lj93a1
@r_devops
These 5 small Python projects actually help you learn basics

When I started learning Python, I kept bouncing between tutorials and still felt like I wasn’t actually learning.

I could write code when following along, but the second I tried to build something on my own… blank screen.

What finally helped was working on small, real projects. Nothing too complex. Just practical enough to build confidence and show me how Python works in real life.

Here are five that really helped me level up:

1. File sorter: Organizes files in your Downloads folder by type. Taught me how to work with directories and conditionals.
2. Personal expense tracker: Logs your spending and saves it to a CSV. Simple but great for learning input handling and working with files.
3. Website uptime checker: Pings a URL every few minutes and alerts you if it goes down. Helped me learn about requests, loops, and scheduling.
4. PDF merger: Combines multiple PDF files into one. Surprisingly useful and introduced me to working with external libraries.
5. Weather app: Pulls live weather data from an API. This was my first experience using APIs and handling JSON.
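
As a taste of how small these can be, here is a rough sketch of project 3, the uptime checker. It uses only the standard library (urllib) instead of the requests package, and the function names, timeout and interval are arbitrary choices for illustration.

```python
import time
import urllib.error
import urllib.request


def is_up(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP error statuses.
        return False


def watch(url, checks, interval_seconds, check=is_up, sleep=time.sleep):
    """Check the URL a fixed number of times, pausing between checks."""
    results = []
    for i in range(checks):
        up = check(url)
        results.append(up)
        if not up:
            print(f"ALERT: {url} looks down")
        if i < checks - 1:
            sleep(interval_seconds)
    return results
```

Something like watch("https://example.com", checks=3, interval_seconds=300) would check three times, five minutes apart. Swapping the print for an email or chat message is a natural next step.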

While I was working on these, I created a system in Notion to track what I was learning, keep project ideas organized, and make sure I was building skills that actually mattered.

I’ve cleaned it up and shared it as a free resource in case it helps anyone else who’s in that stuck phase I was in.

You can find it in my profile bio.

If you’ve got any other project ideas that helped you learn, I’d love to hear them. I’m always looking for new things to try.

https://redd.it/1ljant6
@r_devops
How to be the devops lead?

I recently joined a company as the devops lead. They have been running their infra on GitHub Actions and Azure Container Apps. While the deployment scripts themselves are a bit unwieldy, they do seem to work more or less. But the team lacks confidence in launching the product because they think that the infra may not be up to the task.

I want to ask the folks here: what would be your goal in such a situation? Is there a platonic ideal of devops that we should aim towards? Where should I dedicate my efforts? Basically I have been asked to improve the reliability of the infrastructure and make it a more modern, flexible, easy to use, easy to roll back setup. I feel a bit lost because the space is too open and I am not sure where to focus my attention or how to approach this systematically.

I would love to learn about your philosophy of approaching devops and the more high-level concepts I should be aiming for.

https://redd.it/1ljchu8
@r_devops
Evaluated 15 SSO providers while scaling auth — here's what caught us off guard

We’re scaling auth for a multi-tenant SaaS product and needed to support enterprise SSO (SAML, OIDC, SCIM, etc.).
Expected it to be a quick eval, but ended up comparing 15+ providers: Okta, Auth0, WorkOS, FusionAuth, Ping, etc.

What surprised us:

- SCIM support isn’t always included (and pricing is all over the place)
- Admin UX + branding controls vary widely
- Some dev SDKs were great, others were... painful
- Session control and audit logs aren’t as standard as you'd think

We documented it all in a side-by-side matrix (happy to share if useful), but I’m curious:

If you've implemented SSO or CIAM recently — what were your dealbreakers?
Also, did you self-host (like Keycloak) or go fully managed?

Would love to hear what mattered most to this community.

https://redd.it/1ljdsin
@r_devops
how do you manage scheduled jobs inside your cluster's containers?

I am required to develop/advise on scheduling a job that runs in the app's backend.

which means I can't run a CronJob container, since it can't run the code inside the backend container; it's its own container.

so I can use the schedule library (Python), or create a loop that listens to SQS, or whatever.

but the problem is that any listening for a cron/time-based event requires an INFINITE loop.

that's a wtf moment to me. I thought: if the container already has the friggin date, why can't it simply run the job according to its own date???

but no, it needs to count the seconds, the minutes, whatever, to run at the appropriate time.

so I might be totally uninformed, and I'd appreciate you pointing me in the right direction.


EDIT:

the reason I don't want an infinite loop is that it sounds way too risky to put in a production env. It can create unnecessary load, and in general it doesn't sound like good practice unless you really know how to write an efficient loop with expert-level error handling.
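
On the load worry: a scheduling loop does not have to count seconds. A rough sketch, assuming a once-a-day job inside the backend process: compute the next run time once, then block on a threading.Event until then. The thread sleeps in the OS and uses no CPU while waiting. The names next_run_after and run_daily are made up for illustration.

```python
import threading
from datetime import datetime, timedelta


def next_run_after(now, hour, minute):
    """Return the next datetime at hour:minute that is strictly after now."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)
    return target


def run_daily(job, hour, minute, stop):
    """Run job once a day at hour:minute until the stop event is set."""
    next_run = next_run_after(datetime.now(), hour, minute)
    while True:
        delay = (next_run - datetime.now()).total_seconds()
        # Blocks without polling; returns True early only if stop was set.
        if stop.wait(timeout=max(delay, 0.0)):
            return
        try:
            job()
        except Exception as exc:  # keep the scheduler alive if one run fails
            print(f"scheduled job failed: {exc}")
        next_run += timedelta(days=1)
```

It would be started once at app startup, for example with threading.Thread(target=run_daily, args=(job, 3, 0, stop), daemon=True).start(). One caveat, stated as an assumption about the setup: if the backend runs several replicas, each replica would run the job, so some form of locking or a single designated runner is still needed.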

https://redd.it/1lja1df
@r_devops