Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Just started as a System Engineer: aiming for DevOps, need guidance on next steps

Hey everyone,

I recently got my first role as a System Engineer. My work right now involves system administration, Linux basics, and some networking.

In the long run, I want to move into DevOps (I’m interested in automation, CI/CD pipelines, and cloud platforms), but I’m honestly a bit nervous about the coding side. I’m fine with basic scripting, but anything beyond that feels intimidating.

Here’s what I currently know:

Comfortable with Linux (process/user management, permissions, disk mgmt; still learning networking)
Basic cloud concepts (AWS fundamentals)
Some exposure to shell scripting
Basic SQL & Excel (from previous work)

I’d love advice on:

Which DevOps tools or platforms to learn first (e.g., Jenkins, Docker, Kubernetes, Terraform, etc.)
How much coding I realistically need to be good at for DevOps
A recommended learning order so I don’t get overwhelmed
Any tips for building a DevOps portfolio as a beginner

https://redd.it/1mmeemy
@r_devops
Beginner to AWS (Day 5): rate the level of this project (and suggest some good projects that could help me land an internship/job). PS: I'm currently in my last year of engineering

Built a production-ready AWS VPC architecture:

• Deployed EC2 instances in private subnets across two Availability Zones.

• Configured Application Load Balancer for incoming traffic distribution.

• Implemented Auto Scaling for elastic capacity.

• Enabled secure outbound internet access using dual NAT gateways for high availability.

• Ensured fault tolerance and resilience with multi-AZ design.
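As a rough illustration of the address planning behind a layout like this, here is a minimal Python sketch using only the standard `ipaddress` module. It assumes a hypothetical `10.0.0.0/16` VPC, placeholder AZ names `az-a`/`az-b`, /24 subnets, and one NAT gateway per AZ (the post gives none of these details): each AZ gets a public subnet for the ALB and its NAT gateway, and a private subnet for the Auto Scaling group's EC2 instances.

```python
import ipaddress

# Hypothetical values: the post does not state a CIDR block or AZ names.
VPC_CIDR = "10.0.0.0/16"
AZS = ["az-a", "az-b"]


def plan_subnets(vpc_cidr: str, azs: list, new_prefix: int = 24) -> dict:
    """Carve one public and one private subnet per AZ out of the VPC CIDR.

    Public subnets would hold the ALB and that AZ's NAT gateway;
    private subnets would hold the EC2 instances of the Auto Scaling group.
    """
    # Generator of consecutive, non-overlapping /new_prefix blocks.
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = {}
    for az in azs:
        plan[az] = {"public": str(next(blocks)), "private": str(next(blocks))}
    return plan


if __name__ == "__main__":
    for az, subnets in plan_subnets(VPC_CIDR, AZS).items():
        print(az, subnets)
```

Running it prints `az-a {'public': '10.0.0.0/24', 'private': '10.0.1.0/24'}` and `az-b {'public': '10.0.2.0/24', 'private': '10.0.3.0/24'}`. In Terraform, the same split is usually expressed with `cidrsubnet()`.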

https://redd.it/1mmkgt5
@r_devops
GDG Organizer: Done All Associate GCP Certs, Voucher in Hand - Which Professional Should I Pursue Next? (Seeking Advice)

https://preview.redd.it/owi4ldw028if1.png?width=414&format=png&auto=webp&s=b43e7c0183027f7456a4223ad6e5b6237ae4e5b6


Hey everyone!

As a GDG Organizer, I'm deep into cloud stuff. I've got a GCP cert voucher burning a hole in my pocket, and I'm ready to tackle a Professional cert after doing ACE and CDL.

My background is heavy in backend (Python/Go/FastAPI), DevOps (CI/CD, Docker, K8s, Terraform, Prometheus, Grafana), and I'm also into AI/ML (LLMs, GenAI).

Given that, which Professional cert would you recommend and why? (Architect, DevOps, ML Engineer, Developer, Data Engineer?)

Also, quick side note/vent: GCP support has been truly horrendous for me lately. Seriously the worst support system I've dealt with. Anyone else feel this pain?

Thanks for any tips!

https://redd.it/1mmnjqf
@r_devops
Looking for Suggestions on Solid, Interview-Worthy DevOps Projects

I’ve recently completed most of the core fundamentals in my DevOps learning journey, and now I’m looking for ideas for a solid, end-to-end project that can really stand out in interviews and demonstrate my skills.

Here’s what I’ve covered so far:

Networking fundamentals
Linux system administration and shell scripting
Docker & Kubernetes
Git and GitHub
Basic AWS (with 2–3 small practice projects)
Python scripting

I’ve also done some basic projects (like setting up my own Linux environment, simple CI/CD pipelines, etc.), but now I want to build something closer to real-world, infrastructure-related work that hiring managers would appreciate.

What would you suggest as an impressive, portfolio-worthy DevOps project that’s both challenging and relevant to industry use cases?

TL;DR: Finished core DevOps skills, looking for suggestions on a real-world, interview-ready DevOps project to showcase in my portfolio.

https://redd.it/1mmqhxa
@r_devops
How do you start planning and setting up a new project?

I’ve just been assigned to a project as a DevOps intern, and I’m trying to understand how experienced engineers approach things from the start. When you join a project, how do you plan the architecture, choose the right resources, decide on integrations, set up CI/CD pipelines, and define the main components needed? I’m curious about the typical thought process and steps you follow, so I can learn to approach my own work better.

https://redd.it/1mmtomn
@r_devops
Career change into the Ops side of DevOps through an apprenticeship: I'm currently learning GitLab and Git while sending job applications. Any tips/hints/recommendations?

Here is the roadmap I'm following until I find an apprenticeship position for my training program. What do you think of it?

I built it from job boards and my school's roadmap, prioritizing the most in-demand skills and their prerequisites (for example, Kubernetes is in high demand, but without Ansible, Terraform, and Docker it's a no-go; plus, I wanted to put everything in Git so I learn it by practicing).

I have a homelab based on an old gaming computer (with a good CPU and 32 GB of RAM... better than nothing to experiment with!)

Anyway, here is my current roadmap (it lives in the README.md of my GitLab project, which is my only file right now).
Tomorrow I'll start Docker, so it will be a good opportunity to start a new project and add new files like the configuration, evolve it with branches, and get into the habit of keeping a clean, easy-to-read and maintainable main branch alongside multiple feature branches.

I don't know yet what I'll put on my servers, but probably things that are useful for my homelab, plus something interesting to show during job interviews!

EDIT: I've already read and talked a lot about the DevOps mindset :) so that part is already pretty well advanced.

\## Roadmap

\### Phase 1: Git Mastery

\- [x\] Create lab-infra project with clear README

\- [x\] First branches -> Merge Requests -> squash merge

\- [x\] Protect main (no direct push)

\- [x\] Install Git; Set user.name and user.email; add SSH key

\- [ \] Master Git conflicts resolution and multi-branch workflow

\### Phase 2: Docker Fundamentals

\- [ \] Install Docker Desktop on Windows

\- [ \] Build first containerized web service (Nginx)

\- [ \] Master Docker best practices (.dockerignore, non-root user, HEALTHCHECK)

\- [ \] Create multi-stage builds for optimized images

\### Phase 3: Terraform Infrastructure

\- [ \] Terraform basics: providers, resources, and state management

\- [ \] Deploy VMs on Proxmox with reusable module

\- [ \] Output VM IPs and names for automation chain

\- [ \] Version control Terraform configurations in Git

\### Phase 4: Ansible Configuration

\- [ \] Ansible fundamentals: inventory, playbooks, and modules

\- [ \] Build dynamic inventory from Terraform outputs

\- [ \] Create roles for system hardening and Docker installation

\- [ \] Achieve idempotence for reliable automation

\### Phase 5: Kubernetes Orchestration

\- [ \] Deploy k3s cluster on Proxmox VMs using Terraform and Ansible

\- [ \] Master Kubernetes basics: pods, services, deployments

\- [ \] Implement persistent volumes and ingress controllers

\- [ \] Deploy applications with proper manifests

\### Phase 6: Monitoring and Observability

\- [ \] Deploy Prometheus and Grafana stack with automation

\- [ \] Configure node_exporter on all VMs automatically

\- [ \] Create custom dashboards and import professional templates

\- [ \] Setup Alertmanager and test alert workflows

\- [ \] Integrate Datadog for advanced monitoring capabilities

\### Phase 7: CI/CD Integration

\- [ \] Create GitLab CI/CD pipeline for Docker image builds

\- [ \] Implement automated security scanning with Trivy

\- [ \] Deploy to Kubernetes via GitOps approach

\- [ \] Add infrastructure quality gates (tflint, checkov, ansible-lint)

\### All Along: Professional Presentation And Habits

\- [ \] Architecture documentation with diagrams

\- [ \] Disaster recovery procedures documentation

\- [ \] Clean public repository with no secrets

\- [ \] Screenshots and setup instructions for portfolio
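Phase 4's "build dynamic inventory from Terraform outputs" step can be sketched as a small Python inventory script. This is a minimal sketch, not an official Ansible plugin: it assumes the Terraform configuration exposes two hypothetical list outputs, `vm_names` and `vm_ips` (in the same order), and that the script runs from the Terraform working directory. It reads `terraform output -json` (where each output is an object with a `value` key) and prints JSON in the shape Ansible expects from an inventory script called with `--list`, with host variables under `_meta.hostvars`.

```python
#!/usr/bin/env python3
import json
import subprocess
import sys

# Hypothetical output names: adjust to whatever your Terraform module exports.
NAMES_OUTPUT = "vm_names"
IPS_OUTPUT = "vm_ips"


def build_inventory(tf_outputs: dict, group: str = "lab") -> dict:
    """Turn `terraform output -json` data into Ansible dynamic-inventory JSON."""
    names = tf_outputs[NAMES_OUTPUT]["value"]
    ips = tf_outputs[IPS_OUTPUT]["value"]
    # Pair each VM name with its IP; Ansible connects via ansible_host.
    hostvars = {name: {"ansible_host": ip} for name, ip in zip(names, ips)}
    return {
        group: {"hosts": list(hostvars)},
        "_meta": {"hostvars": hostvars},
    }


def main() -> None:
    # Ansible calls inventory scripts with --list or --host <name>.
    if len(sys.argv) > 1 and sys.argv[1] == "--host":
        # Host variables are already returned in _meta, so nothing extra here.
        print(json.dumps({}))
        return
    raw = subprocess.run(
        ["terraform", "output", "-json"],
        check=True, capture_output=True, text=True,
    ).stdout
    print(json.dumps(build_inventory(json.loads(raw)), indent=2))


if __name__ == "__main__":
    main()
```

After making it executable (e.g. as a hypothetical `tf_inventory.py`), `ansible-inventory -i tf_inventory.py --graph` should list the VMs under the placeholder `lab` group, which the k3s and monitoring playbooks in later phases could target.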



https://redd.it/1mmulu9
@r_devops
What can Kubernetes do that Google Cloud can't?

I never used Kubernetes, but I use Google Cloud at work.

We run our code in multiple products from Google Cloud:

\- Cloud Run for our APIs;
\- Cloud Scheduler for some scheduled background jobs;
\- Cloud SQL for DB;
\- etc;

What does Kubernetes offer that I can't get from Google Cloud products? Why would I ever use GKE (Google Kubernetes Engine)?

https://redd.it/1mmwesq
@r_devops
Interview this week for a contract Platform Engineer role at Nvidia

Are there any contractors or full-time employees at Nvidia who can offer some insights?
What questions can I ask to find out whether this is legit?

Potential red flags
Asking for my references before the interview and very generic information about the role
Rate is very low
Direct vendor

Please DM if you do not want to share info on reddit

TL;DR: checking whether the Platform Engineer req at Nvidia is legit or an attempt to get my details

TIA



https://redd.it/1mmxrwx
@r_devops
I'm struggling to figure out how to handle user data in the context of cattle-like VMs when the VMs are developers' primary workstations.

For context, we're working in Azure.

We have developers currently in personal assignment pools where everyone gets their own unique vm. We have FSLogix handling the mounting of the user profiles, but it doesn't do anything for the user data, such as git repos that have been cloned down. Users often run local builds, so performance is important here.

The workstations are very much pets in that they get patches regularly and are very long lived. We are trying to move to immutable images, and trying to break free from the pets at the same time.

I can use a pooled host pool for the VM itself and FSLogix for the user profile, but I always get stuck on the user data. If performance weren't an issue, a network share with ACLs mounted at login would seem to be the best solution. Given that users would constantly be running local builds on that share, I'd like a 'local' solution instead; I'm just not sure what it is.

Something that could attach an existing data disk at logon or even at boot, based on the user that signed in, would be ideal, but I don't know of such a capability.

How would y'all handle a scenario like this?

Edit to add: forgot to note that this is for Windows

https://redd.it/1mn01a9
@r_devops
What’s the full list of moving parts needed to build a real financial exchange from scratch?

I’m not talking about a simple trading app. I mean a proper electronic exchange in the league of NYSE, MCX, or LME, possibly with physical settlement, that can actually function in the real world.

If someone wanted to create one from the ground up, what exactly would need to be in place? I’m trying to get my head around the entire picture:

Core technology stack and matching engine design
Clearing and settlement systems
Regulatory licensing and jurisdictional differences
Membership structures, listing requirements, and onboarding
Market-making and liquidity provision
Risk management and surveillance systems
Connectivity to participants and data vendors
Physical delivery and warehousing

I’m especially interested in the less obvious operational and legal layers people tend to underestimate. If you’ve ever been involved in building, running, or integrating with an exchange, I’d really value a detailed breakdown from your perspective.

https://redd.it/1mn3er2
@r_devops
Could my lack of Kubernetes knowledge be what’s stopping me from working in DevOps?

Long story short, I’ve been working in IT for the past 3-4 years, mostly in infrastructure and support. I started in support mainly to get my foot in the door; I’ve always wanted to move into DevOps since that’s what I enjoy. I have the SAA-C03 certification, but it doesn’t carry much weight since most of my AWS experience comes from hobby projects (just EC2, S3, VPCs, and Lambda). The only "proper" CI/CD experience I’ve had was building a pipeline for a B2B e-commerce project at my previous job, using GitHub Actions and Docker for the front end and back end.

https://redd.it/1mn6dy1
@r_devops
Splash: Transform hard-to-grok plain text into beautiful color-coded logs

Splash (https://github.com/joshi4/splash) is a CLI that automatically transforms logs in various formats into easy-to-scan, color-coded logs.

I built Splash after I found myself squinting and leaning into the screen every time I needed to find a specific log line. It's so frustrating when you realize that what you were looking for was in the logs all along, but you never saw it.

Other times, I'd add "ASDF" or "####" to certain log lines and use the terminal's search to find those lines. To solve this, I built string and regexp matching into Splash.
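This is not Splash's implementation; it is a hypothetical Python sketch of the general technique of highlighting literal-string or regexp matches in log lines with ANSI color codes, so a marker like "ASDF" stands out without a terminal search. The color choice (bold yellow) and the command-line interface are assumptions for illustration.

```python
import re
import sys

# ANSI escape sequences: bold yellow on, then reset to default attributes.
HIGHLIGHT = "\033[1;33m"
RESET = "\033[0m"


def highlight(line: str, pattern: str, regex: bool = False) -> str:
    """Wrap every match of `pattern` in `line` with ANSI color codes.

    With regex=False the pattern is matched literally (like searching for
    "ASDF"); with regex=True it is treated as a regular expression.
    """
    compiled = re.compile(pattern if regex else re.escape(pattern))
    return compiled.sub(lambda m: f"{HIGHLIGHT}{m.group(0)}{RESET}", line)


if __name__ == "__main__":
    # Hypothetical usage: some_command | python highlight.py PATTERN [--regex]
    pat = sys.argv[1]
    use_regex = "--regex" in sys.argv[2:]
    for raw in sys.stdin:
        print(highlight(raw.rstrip("\n"), pat, use_regex))
```

For example, `tail -f app.log | python highlight.py 'request_id=\w+' --regex` (file name and pattern are placeholders). Escaping the pattern with `re.escape` in literal mode is what keeps strings containing characters like `.` or `*` from being interpreted as regex syntax.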

Hope people here find it useful! Always happy to get feature requests.



https://redd.it/1mn69ri
@r_devops
We recently released Zellij 0.43: bringing your terminal to the browser - would love to hear your thoughts

Hi all,

I am the lead maintainer of Zellij* and last week we released a significant version that includes the ability to share existing terminal sessions in the browser, as well as start new ones or resurrect exited ones. This ability includes built-in security measures such as authentication and enforcing HTTPS on external interfaces.

Personally, I am a terminal developer and use the web terminal as my daily driver. Many others use it to securely access their machines remotely. I would be curious to hear feedback from the DevOps community: would you find this useful for other things (e.g., in-place SSH key exchanges and the like)? If not, what would you be missing?

If you'd like to read more, the announcement is here: https://zellij.dev/news/web-client-multiple-pane-actions/
And I made a brief screencast/tutorial demonstrating the feature as well as how it integrates with native browser features such as bookmarks: https://zellij.dev/tutorials/web-client/

Curious to hear your thoughts.

*Zellij is a terminal workspace and multiplexer - in case you don't know it, there's more info here: https://zellij.dev/about/

https://redd.it/1mn63k0
@r_devops
How do you deal with GPU shortages or scheduling?

Feels like every AI project I’m on turns into “The Hunger Games” for GPUs.

* Either they’re all booked
* Or sitting idle somewhere I can’t use them
* Or I’m stuck juggling AWS/GCP/on-prem like a madman

How are you all handling this? Do you have some magic scheduler, or is it just Slack messages and crossed fingers?

Would love to hear your war stories.

https://redd.it/1mn8jvu
@r_devops
How are you tracking engineering performance metrics in real time?

We’ve been syncing PR review times, cycle time, and throughput into a central dashboard (ours lives in monday dev), but I’m curious what pipelines others have built. Do you pull data from GitHub Actions or custom scripts? How do you avoid drowning in too many numbers while still spotting real bottlenecks?
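One way to keep the numbers manageable is to reduce raw PR data to a few medians and a weekly count. The Python sketch below is hypothetical and tool-agnostic (it does not reflect how monday dev or any other product computes these metrics): it assumes each PR record has already been mapped to `opened_at`, `first_review_at`, and `merged_at` timestamps, for example from a Git host's API, and defines cycle time as opened-to-merged.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class PullRequest:
    # Hypothetical fields; map them from whatever your Git host's API returns.
    opened_at: datetime
    first_review_at: Optional[datetime]
    merged_at: Optional[datetime]


def hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def summarize(prs: list) -> dict:
    """Median review wait, median cycle time (hours), and merges per ISO week."""
    merged = [p for p in prs if p.merged_at is not None]
    reviewed = [p for p in prs if p.first_review_at is not None]
    # Key each merge by (ISO year, ISO week number).
    weekly = Counter(tuple(p.merged_at.isocalendar()[:2]) for p in merged)
    return {
        "median_review_wait_h": (
            median(hours(p.opened_at, p.first_review_at) for p in reviewed)
            if reviewed else None
        ),
        "median_cycle_time_h": (
            median(hours(p.opened_at, p.merged_at) for p in merged)
            if merged else None
        ),
        "merged_per_week": dict(weekly),
    }
```

Medians are used instead of means so that a single long-lived PR does not dominate the dashboard; a sustained rise in the review-wait median relative to cycle time points at review as the bottleneck.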

https://redd.it/1mn9ixr
@r_devops
Need recommendations for database archival and purging

Looking for an open-source solution to archive and purge old data in GCP Cloud SQL

Incrementally archive table data older than 3 months into Google Cloud Storage (GCS).

After archiving, automatically purge the archived records from the database.

Ideally, I'd like something that supports incremental runs (so it doesn't reprocess already archived data) and can be scheduled or automated.

Has anyone implemented something similar or can recommend a tool for this?
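A minimal sketch of the archive-then-purge pattern, assuming a hypothetical `events` table with an integer `id` and an ISO-8601 UTC `created_at` column. Python's built-in `sqlite3` and a local directory stand in for Cloud SQL and a GCS bucket so the sketch runs anywhere; in a real setup the file write would become an upload to GCS and the connection would point at Cloud SQL (whose driver may use a different placeholder style than `?`). "3 months" is approximated as 90 days.

```python
import json
import sqlite3
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path


def archive_and_purge(conn: sqlite3.Connection, archive_dir: Path, now: datetime,
                      retention_days: int = 90, batch_size: int = 10_000) -> int:
    """Archive one batch of rows older than the cutoff, then delete exactly those rows.

    Returns the number of rows archived; call repeatedly until it returns 0.
    """
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    rows = conn.execute(
        "SELECT id, created_at, payload FROM events"
        " WHERE created_at < ? ORDER BY id LIMIT ?",
        (cutoff, batch_size),
    ).fetchall()
    if not rows:
        return 0

    # 1) Write the batch to the archive first, named after its id range.
    archive_dir.mkdir(parents=True, exist_ok=True)
    out = archive_dir / f"events_{rows[0][0]}_{rows[-1][0]}.jsonl"
    with out.open("w", encoding="utf-8") as f:
        for row_id, created_at, payload in rows:
            f.write(json.dumps({"id": row_id, "created_at": created_at,
                                "payload": payload}) + "\n")

    # 2) Only after the write succeeded, delete the same ids in one transaction.
    with conn:
        conn.executemany("DELETE FROM events WHERE id = ?", [(r[0],) for r in rows])
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT)")
    now = datetime.now(timezone.utc)
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [(1, (now - timedelta(days=120)).isoformat(), "old"),
         (2, (now - timedelta(days=5)).isoformat(), "recent")],
    )
    target = Path(tempfile.mkdtemp())
    while archive_and_purge(conn, target, now):
        pass
    print(sorted(p.name for p in target.iterdir()))
```

Because archived rows are deleted right after they are written, each run only sees data that has not been archived yet, which covers the "incremental" requirement without a separate watermark table. If the purge fails after the write, the next run archives those rows again, so the archive can contain duplicates, but nothing is deleted before it has been written out. Scheduling could then be any cron-style trigger.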

https://redd.it/1mn9f6p
@r_devops
Looking for some advice on career switch

I have been working as a QA engineer for almost 8 years now and have experience in web, mobile, and API testing, both manual and automated. Currently I mostly do performance testing with JMeter, and I'm looking to switch careers into DevOps, since I interact with DevOps engineers a lot and am interested in the work.

I have worked with Python and Java but am not that great at programming. I know there is a roadmap shared here and a lot of threads with advice, but I'm mostly looking for what I can do to switch and what kinds of jobs to look for that would pay the same as or slightly more than what I currently get.

I tried using Copilot to give me a basic rundown, but it suggests certificates from sites like devops university, etc. I have also seen some offered by Azure, AWS, and Google, but I'm not sure which to go for or where to start. I am from India but am open to jobs anywhere, so general advice and guidance would be good. Thanks

https://redd.it/1mn63dy
@r_devops
How to learn on a bike tour?

Hey guys!

I'll be on a solo month-long bike tour, and since I'm a full-stack developer switching to DevOps, I'd like to spend that time wisely and keep learning. I'm highly motivated about DevOps and have already set up a home lab running Proxmox, which I can access via WireGuard.

I know hands-on experience is the best way to learn and to get a job, but since I can't really do that while biking or on a phone, I'm looking for podcasts, courses, YouTube channels, and books (I have a Kindle as well) that would be a great way to spend my time!

https://redd.it/1mnc6lf
@r_devops
On-site SRE Assessment

I’ve got an upcoming on-site technical assessment for a Site Reliability Engineer role. I already passed the oral technical interview, but I have no idea what they might throw at me in the hands-on part.

Has anyone here been through a similar assessment? What kinds of tasks or challenges should I be prepared for? Any tips on what to focus on in the next few days would be really appreciated.

Thanks in advance!

https://redd.it/1mnja43
@r_devops
Debian 12 Packer image on Proxmox keeps on waiting for auto configuration network

I'm struggling a bit to make Packer work on my Proxmox hypervisor to create a VM template.

I keep getting hit by the "Network autoconfiguration failed" prompt, even though my preseed.cfg disables network autoconfiguration.

It seems like the settings in my preseed.cfg aren't being used: I've set a fixed IP address, but the installer keeps stopping at this prompt...

[![Screenshot of Debian 12 prompting "Network autoconfiguration failed"][1]][1]

Here are my files:

## debian12.pkr.hcl:

```
// debian12.pkr.hcl

packer {
  required_plugins {
    name = {
      version = "1.1.6"
      source  = "github.com/hashicorp/proxmox"
    }
  }
}

variable "bios_type" {
  type = string
}

variable "boot_command" {
  type = string
}

variable "boot_wait" {
  type = string
}

variable "bridge_firewall" {
  type    = bool
  default = false
}

variable "bridge_name" {
  type = string
}

variable "cloud_init" {
  type = bool
}

variable "iso_file" {
  type = string
}

variable "iso_storage_pool" {
  type    = string
  default = "local"
}

variable "machine_default_type" {
  type    = string
  default = "pc"
}

variable "network_model" {
  type    = string
  default = "virtio"
}

variable "os_type" {
  type    = string
  default = "l26"
}

variable "proxmox_api_token_id" {
  type = string
}

variable "proxmox_api_token_secret" {
  type      = string
  sensitive = true
}

variable "proxmox_api_url" {
  type = string
}

variable "proxmox_node" {
  type = string
}

variable "qemu_agent_activation" {
  type    = bool
  default = true
}

variable "scsi_controller_type" {
  type = string
}

variable "ssh_timeout" {
  type = string
}

variable "tags" {
  type = string
}

variable "io_thread" {
  type = bool
}

variable "cpu_type" {
  type    = string
  default = "kvm64"
}

variable "vm_info" {
  type = string
}

variable "disk_discard" {
  type    = bool
  default = true
}

variable "disk_format" {
  type    = string
  default = "qcow2"
}

variable "disk_size" {
  type    = string
  default = "16G"
}

variable "disk_type" {
  type    = string
  default = "scsi"
}

variable "nb_core" {
  type    = number
  default = 1
}

variable "nb_cpu" {
  type    = number
  default = 1
}

variable "nb_ram" {
  type    = number
  default = 1024
}

variable "ssh_username" {
  type = string
}

variable "ssh_password" {
  type = string
}

variable "ssh_handshake_attempts" {
  type = number
}

variable "storage_pool" {
  type    = string
  default = "local-lvm"
}

variable "vm_id" {
  type    = number
  default = 99999
}

variable "vm_name" {
  type = string
}

locals {
  // Build timestamp in YYYYMMDD-hhmm form, appended to the template description
  packer_timestamp = formatdate("YYYYMMDD-hhmm", timestamp())
}

source "proxmox-iso" "debian12" {
  bios                    = var.bios_type
  boot_command            = [var.boot_command]
  boot_wait               = var.boot_wait
  cloud_init              = var.cloud_init
  cloud_init_storage_pool = var.storage_pool
  communicator            = "ssh"
  cores                   = var.nb_core
  cpu_type                = var.cpu_type
  // Files in this directory are served by Packer's built-in HTTP server during
  // the build; boot_command can reach them via {{ .HTTPIP }}:{{ .HTTPPort }}
  http_directory           = "autoinstall"
  insecure_skip_tls_verify = true
  iso_file                 = var.iso_file
  machine                  = var.machine_default_type
  memory                   = var.nb_ram
  node                     = var.proxmox_node
  os                       = var.os_type
  proxmox_url              = var.proxmox_api_url
  qemu_agent               = var.qemu_agent_activation
  scsi_controller          = var.scsi_controller_type
  sockets                  = var.nb_cpu
  ssh_handshake_attempts   = var.ssh_handshake_attempts
  ssh_pty                  = true
  ssh_timeout              = var.ssh_timeout
  ssh_username             = var.ssh_username
  ssh_password             = var.ssh_password
  tags                     = var.tags
  template_description     = "${var.vm_info} - ${local.packer_timestamp}"
  token                    = var.proxmox_api_token_secret