Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
HELP: Trying to optimize my GitHub Action so it doesn't install things every time. I'm new to this CI/CD thing

Hi friends, I'm looking for advice on speeding up my GitHub Actions workflow. Currently, a significant portion of my workflow's run time goes to:

sudo apt-get install -y gettext
yarn install --frozen-lockfile --silent
yarn <my custom script>, which runs the react-gettext-parser npm library

These steps run on every push/PR, and I'm wondering whether there's a more efficient way to handle them.
For instance, would it be better to build what I'm installing once, and then reuse that prebuilt result whenever my action triggers, instead of installing everything every time?
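The common way to avoid reinstalling on every run is to cache dependencies under a key derived from the lockfile, so the cache is reused until yarn.lock changes. The Python below is only a sketch of that keying idea, not how GitHub implements it; the helper names `cache_key` and `should_install` and the `<os>-<prefix>-<sha256>` key format are illustrative assumptions.

```python
import hashlib
from pathlib import Path
from typing import Optional


def cache_key(lockfile: Path, os_name: str = "Linux", prefix: str = "yarn") -> str:
    """Key that stays the same until the lockfile's content changes.

    The "<os>-<prefix>-<sha256>" format is an illustrative assumption,
    loosely mirroring the common `${{ runner.os }}-yarn-${{ hashFiles('yarn.lock') }}` pattern.
    """
    digest = hashlib.sha256(lockfile.read_bytes()).hexdigest()
    return f"{os_name}-{prefix}-{digest}"


def should_install(current_key: str, cached_key: Optional[str]) -> bool:
    """Run the full install only on a cache miss (no cache yet, or a different key)."""
    return cached_key != current_key
```

In a real workflow you would not write this yourself: actions/cache (or the `cache: yarn` input of actions/setup-node) computes the key and restores the cache, and actions/cache exposes a `cache-hit` output you can check before running an install step. The "build it once" idea from the post also maps onto running the job inside a prebuilt container image that already has gettext installed, using the job-level `container:` key.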

Has anyone faced similar challenges and found effective solutions? I'm open to any suggestions or best practices you can share. Thanks in advance : )

https://redd.it/1iyr471
@r_devops
How can I improve at performance tuning topologies/systems/deployments?

Machine learning engineer here, ~4.5 YOE. Most of my XP has been training and evaluating models. But I just started a new job where my primary responsibility will be to optimize systems/pipelines for low-latency, high-throughput inference. TL;DR: I struggle at this and want to know how to get better.

Model building and model serving are completely different beasts, requiring different considerations, skill sets, and tech stacks. Unfortunately I don't know much about model serving - my sphere of knowledge skews more heavily towards data science than computer science, so I'm only passingly familiar with hardcore engineering ideas like networking, multiprocessing, different types of memory, etc. As a result, I find this work very challenging and stressful.

For example, a typical task might entail answering questions like the following:

- Given some large model, should we deploy it with a CPU or a GPU?

- If GPU, which specific instance type and why?

- From a cost-saving perspective, should the model be available on-demand or serverlessly?

- If using Kubernetes, how many replicas will it probably require, and what would be an appropriate trigger for autoscaling?

- Should we set it up for batch inferencing, or just streaming?

- How much concurrency will the deployment require, and how does this impact the memory and processor utilization we'd expect to see?

- Would it be more cost effective to have a dedicated virtual machine, or should we do something like GPU fractionalization where different models are bin-packed onto the same hardware?

- Should we set up a cache before a request hits the model? (okay this one is pretty easy, but still a good example of a purely inference-time consideration)

The list goes on and on, and surely includes things I haven't even encountered yet.
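Several of these questions (concurrency, replica count, autoscaling) are tied together by Little's law: the average number of requests in flight equals the arrival rate times the average latency. Dividing that by how many concurrent requests a single replica can handle gives a rough replica count. This is a back-of-the-envelope sketch only; the per-replica concurrency is assumed to come from your own load tests, and the function name and the 70% utilization default are illustrative assumptions, not recommendations.

```python
import math


def estimate_replicas(requests_per_sec: float,
                      avg_latency_sec: float,
                      concurrency_per_replica: int,
                      target_utilization: float = 0.7) -> int:
    """Rough replica count from Little's law (L = lambda * W).

    requests_per_sec: expected peak arrival rate (lambda)
    avg_latency_sec: average time a request spends in the service (W)
    concurrency_per_replica: requests one replica can process at once,
        assumed to be measured by load testing
    target_utilization: headroom so replicas are not run flat out
    """
    in_flight = requests_per_sec * avg_latency_sec  # L: average concurrent requests
    usable = concurrency_per_replica * target_utilization
    return max(1, math.ceil(in_flight / usable))


# 200 req/s at 150 ms average latency, 8 concurrent requests per replica
print(estimate_replicas(200, 0.150, 8))  # → 6
```

The same arithmetic shows the coupling the post describes: lowering latency (e.g. a faster instance type or a cache) or raising per-replica concurrency (e.g. batching) both reduce the replica count.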

I am one of those self-taught engineers, and while I have overall had considerable success as an MLE, I am definitely feeling my own limitations when it comes to performance tuning. To date I have learned most of what I know on the job, but this stuff feels particularly hard to learn efficiently because everything is interrelated with everything else: tweaking one parameter might mean a different parameter set earlier now needs to change. It's like I need to learn this stuff in an all-or-nothing fashion, which has proven quite challenging.

Does anybody have any advice here? Ideally there'd be a tutorial series (preferred), blog, book, etc. that teaches how to tune deployments, ideally with some real-world case studies. I've searched high and low myself for such a resource, but have surprisingly found nothing. Every "how to" for ML these days just teaches how to train models, not even touching the inference side. So any help appreciated!

https://redd.it/1iysmlj
@r_devops
Can Kaniko build a container with provenance=mode=min?

In the Kaniko docs I can't find anything equivalent to the "--provenance" flag. Is setting the provenance level not a feature of Kaniko? Is there an alternative way of setting provenance with Notary/ORAS? Is the provenance level set to min by default?

https://redd.it/1iyrvv9
@r_devops
Can you guys roast my resume?

Hello everyone, I'm a master's student who has just started to apply for jobs. I don't have much experience in the IT field, so I built my resume solely around projects. I'm looking for jobs in DevOps (I know companies don't hire freshers for DevOps roles), SRE, cloud engineering, and related roles. I'm still learning DevOps, which is why I don't have any DevOps projects yet, but I'll add some once I've learned more.
Could any of you roast/review my resume? It would be really appreciated.

Resume link: https://www.reddit.com/r/aws/comments/1iyws7u/can_you_guys_roast_my_resume/

Thanks in advance!

https://redd.it/1iywybb
@r_devops
Should I get a degree in Cloud Computing or Software Engineering from WGU?

I have an associate's degree in computer science and internship experience in DevOps. I've been applying for jobs with no luck. I'm thinking about getting a bachelor's degree from WGU in Cloud Computing, or should I go for Software Engineering, Data Analytics, or Cybersecurity instead?

https://redd.it/1iyypoh
@r_devops
What to do

I am looking to pursue a major. If I want to become a DevOps engineer, should I choose computer engineering, software engineering, or electrical engineering?

https://redd.it/1iyz313
@r_devops
How do you manage database access?

We have a few AWS Aurora PostgreSQL databases where we manage database roles for our applications. This is done via psql.

The obvious problem is that it's very manual and not visible without running multiple psql commands. It's tedious to see which roles are available and which schemas, tables, columns they have access to.

What do you all use to visualize and manage this? Even better if it's a universal tool for other kinds of databases (MySQL, Trino, etc.)

Thanks for any advice!

https://redd.it/1iyqa64
@r_devops
IIS vs NGINX vs Apache

I had to install and configure a server to deploy web applications and APIs built in Node.js. I should clarify that these are intranet applications: they will be used only inside the company's local network. This is my first server and I was a little bit scared, so I started with Windows Server. I built an Express server to serve each web app and managed to deploy every single web service.

I wanted to put a built-in web server in front of them to handle concerns such as caching and security, acting as a gateway that protects these APIs and serves these applications, so I went with IIS. However, I am having trouble deploying web apps developed with React. All I hear about IIS is that it's crap and only fits with Microsoft technologies.

I have the freedom to change anything I want, so I want to ask you: should I switch the host to a Linux distro and use NGINX or Apache to fulfill my needs, even though I don't have experience with built-in web servers or with Linux in general? Or should I stick with IIS for now, until I learn about Linux and web servers properly?

https://redd.it/1iz1kt3
@r_devops
Vagrant - WSL - Ansible

Does anyone know how to make this setup work properly? I figured out how to get WSL, Windows, and Vagrant working together with VirtualBox, but it's the Ansible piece that's killing my project.

My goal is pretty simple: I am learning Ansible, so I want to spin up 3 Ubuntu VMs in Vagrant and then have Ansible run through each node and create a new user on each machine. The problem seems to be with SSH, as it gets stuck after creating the first VM.

https://redd.it/1iz1kv3
@r_devops
Is there a debugger or some tool to check which container calls which container?

I have around 30 containers calling one another via messages and HTTP calls, and sometimes it's impossible to know what is calling what, because the services are tightly coupled and keep calling one another.
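The usual tool for this is distributed tracing (for example OpenTelemetry instrumentation with a backend such as Jaeger), where each service propagates a trace context header so calls can be stitched into a dependency graph. The sketch below only illustrates the aggregation step on records you have already collected; the record shape (`caller`, `callee`, `kind`) and the helper names are hypothetical, not any tracing tool's format.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_call_graph(records: Iterable[dict]) -> Dict[str, Counter]:
    """Aggregate caller -> callee edges with call counts.

    Each record is assumed (hypothetically) to look like
    {"caller": "orders", "callee": "payments", "kind": "http"}.
    """
    graph: Dict[str, Counter] = defaultdict(Counter)
    for rec in records:
        graph[rec["caller"]][rec["callee"]] += 1
    return dict(graph)


def mutual_calls(graph: Dict[str, Counter]) -> List[Tuple[str, str]]:
    """Pairs of services that call each other in both directions."""
    pairs = set()
    for a, callees in graph.items():
        for b in callees:
            if a != b and a in graph.get(b, {}):
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)
```

Mutual pairs are a quick way to spot the "keep calling one another" coupling the post describes; a tracing UI shows the same edges visually.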

https://redd.it/1iz4bk9
@r_devops
Sonatype Nexus OSS: Error during transaction commit and more DB errors

I am using Nexus version `3.70.1-02`, which is the last version that supports OrientDB. It is deployed on a k8s cluster as a pod. I have been facing multiple issues ever since I tried to fetch statistics about the sizes of the different repositories hosted on Nexus, using `kubectl exec -it -u root <nexus-pod>` and executing the following commands:

java -jar /opt/sonatype/nexus/lib/support/nexus-orient-console.jar
> CONNECT PLOCAL:/nexus-data/db/component admin admin
> select bucket.repositoryname as repository,sum(size) as bytes from asset group by bucket.repositoryname order by bytes desc limit 10;

This command worked as expected, but ever since then I have been facing various transaction errors when reading/writing or even fetching metadata from various repos. I host APT, Docker, and raw repos on Nexus.

com.orientechnologies.orient.core.db.OPartitionedDatabasePool$DatabaseDocumentTxPooled - $ANSI{green {db=component}} Error on transaction commit 570FD604
com.orientechnologies.orient.core.exception.OStorageException: Error during transaction commit
DB name="component"

First I suspected a permissions problem, since the persistent volume is on the host machine, so I ran `chmod -R 775 <nexus-persistent-location>` and `chown 200:200 <nexus-persistent-location>`, but this didn't solve the problem.

Every now and then I have to rebuild the indices with the `REBUILD INDEX *;` command and then delete the Nexus pod so that k8s creates a new one, and that works for some time (4-7 hrs). Any clues what may be wrong here?

https://redd.it/1iz7rgk
@r_devops
Looking for Feedback on Our Multi-Environment (Dev/RC/Prod) GitLab CI/CD + Docker + Nexus Setup with Semantic Versioning

tl;dr: We have a multi-branch approach (develop, rc, main) with Docker + GitLab CI + Nexus for images. We’re finalizing how we do semantic versioning, environment variables, and Docker Compose setups. Would appreciate any wisdom from experienced DevOps folks!

Hey everyone! I’m working on a small team, and we’re currently establishing a DevOps pipeline for our microservice (a Java/Spring Boot app) and plan to replicate the same approach across multiple projects. We’d love to get some feedback from the DevOps community on our architecture and any potential pitfalls or improvements. Here’s our rough setup:


---

Our Git / Branching Model

We have three main branches:

1. develop – merges from feature/hotfix branches
2. rc – merges from develop when we’re ready for a release candidate
3. main – merges from rc for final production releases

Each branch deploys to its corresponding environment (dev → staging/RC → prod). We protect these branches so only maintainers can approve merges.

---

CI/CD with GitLab

We’re using Docker-in-Docker (dind) to build our Docker images inside GitLab CI, then pushing to Nexus as our Docker registry.

For Semantic Versioning, we’re still deciding between:

- Option A: Formal semver only on production merges, while dev/rc images get tagged with branch + commitSHA.
- Option B: Distinct semver or “pre-release” tags for dev (v1.2.3-dev), rc (v1.2.3-rc), and final (v1.2.3).

Considering Conventional Commits + semantic-release to auto-bump versions in the future, but that might be overkill initially.
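To make the two options concrete, here is a small sketch of the image tag each branch would get under Option A versus Option B. The branch names come from the setup above; the function name and the `<branch>-<short SHA>` format for Option A are illustrative assumptions. In GitLab CI the inputs would typically come from the predefined variables `CI_COMMIT_BRANCH` and `CI_COMMIT_SHORT_SHA`, with the version read from wherever you keep it.

```python
def image_tag(branch: str, version: str, short_sha: str, option: str = "A") -> str:
    """Return a Docker image tag for a build (hypothetical scheme).

    Option "A": semver only on main; develop/rc get <branch>-<short_sha>.
    Option "B": semver with a pre-release suffix for develop and rc.
    """
    if branch == "main":
        return f"v{version}"
    if option == "A":
        return f"{branch}-{short_sha}"
    if option == "B":
        suffix = {"develop": "dev", "rc": "rc"}.get(branch)
        if suffix is None:
            raise ValueError(f"no pre-release suffix defined for branch {branch!r}")
        return f"v{version}-{suffix}"
    raise ValueError(f"unknown option {option!r}")
```

One trade-off this makes visible: under Option B, every develop build of 1.2.3 produces the same tag and overwrites the previous image unless you add a counter or SHA (SemVer pre-releases such as `1.2.3-rc.2` allow this), whereas Option A tags are unique per commit by construction.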



---

Docker Compose & Environment Variables

We have a single docker-compose.yml that spins up PostgreSQL, pgAdmin, and our app container.

For different environments, we might use:

- Separate .env files (e.g. .env.dev, .env.rc, .env.prod)
- Or Docker Compose profiles (e.g., --profile dev / --profile rc).

Secrets and credentials (DB user/pass, etc.) are stored in GitLab CI variables. During deploy, we generate a .env on the target server (or pass env vars directly).

For production, everything is behind protected branches and environment-scoped variables.
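A sketch of the "generate a .env on the target server" step: copy an explicit allowlist of variables from the deploy job's environment (where GitLab exposes CI/CD variables) into a .env file readable only by its owner. The helper name `write_env_file` and the example variable names are hypothetical. Quoting rules for .env files differ between tools, so this sketch rejects values containing newlines rather than trying to escape them.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence


def write_env_file(path: Path, keys: Sequence[str],
                   env: Optional[Mapping[str, str]] = None) -> None:
    """Write KEY=value lines for an allowlist of variables, owner-readable only."""
    env = os.environ if env is None else env
    lines = []
    for key in keys:
        if key not in env:
            raise KeyError(f"required variable {key} is not set")
        value = env[key]
        if "\n" in value:
            raise ValueError(f"{key} contains a newline; not supported by this sketch")
        lines.append(f"{key}={value}")
    # The 0o600 mode only applies if os.open creates the file...
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    # ...so tighten permissions explicitly in case the file already existed.
    os.chmod(path, 0o600)
```

The allowlist keeps unrelated CI variables (including other secrets) out of the file; a secrets manager such as Vault would replace the environment lookup, not the overall shape.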



---

Questions / Areas We’d Love Feedback On

1. Semantic Versioning Approach – Is it practical to do formal semver only for production and keep “branch + commitSHA” tags for dev/rc? Or is a uniform semver approach better?
2. Docker-in-Docker – Any pros/cons we should be wary of? Are there better ways to build Docker images in GitLab pipelines?
3. .env Handling – We plan to generate .env in the pipeline or store it on the server. Is that a good practice, or should we consider a different approach (e.g., Vault or similar)?
4. Nexus as a Docker Registry – Any best practices for tag management, cleanup, or security we should know?
5. Overall Flow – Does the dev → rc → main branching and environment progression sound solid, or do you recommend a different branching flow?

We’d love any advice, critiques, or “watch out for this!” tips from people who’ve done similar setups in production. Thanks in advance for your insights!

Thanks so much, everyone!

https://redd.it/1iz9evh
@r_devops
Best server configuration

Suppose I want to run the following services:

- Laravel service
- Redis service
- Node service
- RabbitMQ service

Which server architecture and Linux distribution would be good for an early-stage startup?

It's for running an Uber-like application.

https://redd.it/1izbv1x
@r_devops
AWS centralized secrets management and delegation across multiple accounts + how to share relevant secrets within the team and with third parties if needed?

https://redd.it/1izbsfw
@r_devops
Help Deploying OWASP ZAP on Kubernetes and Linking to GitLab CI

I’m integrating OWASP ZAP into my CI/CD pipeline and have been asked to deploy it on Kubernetes and connect it to GitLab CI. However, I haven’t found relevant documentation on how to properly set this up.

Has anyone done this before or found good resources to follow? Any guidance or examples would be greatly appreciated!

https://redd.it/1izfeuv
@r_devops
Guidance

Hello All,

I have been learning cloud and DevOps for the last 5-6 months and have built 3 applications.
I built a Java API application that connects to Azure Cosmos DB and is deployed to AKS/Azure Web App using Azure DevOps.

I followed the same process to build and deploy a Node.js application and a Python application. For IaC I used Bicep.

I have been looking to change jobs but have been unsuccessful so far. Could you share your experience and guidance on which other skills I need to learn to stand out and at least get selected for an interview?

Thank you in advance for all the help. Looking forward to your help.

Thank you 🙇🏻‍♂️

https://redd.it/1izgdri
@r_devops
Does anyone know the ComplianceAsCode project, and whether it is easily upgradable?

I've been assigned to an old project that uses the ComplianceAsCode framework to write structured documentation. This project has been kept as-is since 0.1.58, and we would now like to revive it and bring it up to the current version, 0.1.76.

I'm looking for advice: does anybody know this project?

https://redd.it/1izfy2j
@r_devops
Flexible rate limiting on applications that have none

We have some .NET IIS applications sitting behind ALBs that have no concept of rate limiting. They are not getting upgraded to .NET Core anytime soon. There are features built into IIS, but it would mean a redeploy every time we want to change something. It's also IP-based, which is a non-starter because some customers have multiple accounts coming from the same IP. Ideally, we'd crack open the bearer token and get the ID of whoever sent the request. Then we could set different rate limits for big vs. small customers.
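A minimal sketch of the idea in this paragraph: read a customer ID out of the bearer token, then apply a per-customer token bucket whose limits depend on the customer's tier. Everything here is an assumption for illustration: the token is taken to be a JWT carrying the ID in a `sub` claim, the tier table and helper names are made up, and state lives in process memory. The sketch only decodes the payload and does not verify the signature; a real gateway would verify it before trusting the claim and would keep buckets in shared storage (e.g. Redis) so every instance behind the ALB agrees.

```python
import base64
import json
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


def customer_id_from_bearer(auth_header: str, claim: str = "sub") -> Optional[str]:
    """Read a claim from a JWT payload. Does NOT verify the signature."""
    prefix = "Bearer "
    if not auth_header.startswith(prefix):
        return None
    parts = auth_header[len(prefix):].split(".")
    if len(parts) != 3:  # header.payload.signature
        return None
    payload = parts[1] + "=" * (-len(parts[1]) % 4)  # JWTs strip base64url padding
    try:
        data = json.loads(base64.urlsafe_b64decode(payload))
    except ValueError:  # covers binascii.Error, UnicodeDecodeError, JSONDecodeError
        return None
    if not isinstance(data, dict) or data.get(claim) is None:
        return None
    return str(data[claim])


@dataclass
class TokenBucket:
    rate: float      # tokens refilled per second (sustained requests/sec)
    capacity: float  # maximum burst size
    tokens: float
    updated: float   # clock reading at the last refill

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """In-memory, per-customer token buckets with tier-specific limits."""

    def __init__(self, tiers: Dict[str, Tuple[float, float]],
                 tier_of: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic):
        self.tiers = tiers      # tier name -> (rate, capacity)
        self.tier_of = tier_of  # customer ID -> tier name
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, customer_id: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(customer_id)
        if bucket is None:
            rate, capacity = self.tiers[self.tier_of(customer_id)]
            bucket = TokenBucket(rate, capacity, tokens=capacity, updated=now)
            self.buckets[customer_id] = bucket
        return bucket.allow(now)
```

Because limits are looked up per customer rather than per IP, several accounts behind one corporate IP each get their own bucket, which is the property the IIS built-in feature lacks.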

What have you seen that's effective? From googling today, it looks like the options are either nginx with some Lua scripting against Redis, or something like Kong Gateway, whose price is...prohibitive. There seem to be some creative solutions with Istio, but since these are all EC2 instances and not containers, I am not confident how that would work.

https://redd.it/1izjqob
@r_devops
AI agent creates a Terraform DevOps project on AWS

I used Gemini 2.0 Flash Thinking to create a DevOps project from scratch. I used the Roo VS Code extension and gave it an advanced, detailed prompt. I got it to download and study docs, write Terraform code, run fmt and validate, and fix all errors until it succeeded 🎉

I'm a gray-bearded DevOps veteran (if I had a beard!), and not much into making videos. Let me know how I can improve or what you'd like to see (AI + DevOps).

https://youtube.com/watch?v=9ltORvpb57o

https://redd.it/1izkn10
@r_devops