Feeling Stuck in My DevOps Role – Need Career Advice
Hey DevOps folks,
I'm a DevOps engineer with 2 years of experience working at a startup. I primarily work with AWS cloud and some Azure (mostly pipelines), managing 7 applications across 3 environments each. Recently, we migrated to ECS with a cross-account setup, which was an exciting challenge. However, now that most things are automated with Terraform, there’s not much left to do—rarely any production issues, and my work feels stagnant.
Since I’m still early in my career, I don’t want to get stuck doing just this. I’m planning to switch to a new company and need some advice:
1. What type of company should I target? (Startups vs. bigger companies, service-based vs. product-based)
2. What technologies should I focus on learning? (I have hands-on experience with AWS, Azure DevOps, Jenkins, Prometheus, and Grafana. I know Kubernetes but haven’t used it in a real project.)
3. Any other suggestions? (e.g., full remote jobs, certifications, or alternative career paths)
Would really appreciate your insights!!
https://redd.it/1iyipp2
@r_devops
Reddit
From the devops community on Reddit
Explore this post and more from the devops community
Jenkins CICD pipeline migration to GitLab
Hey guys,
What's your experience with migrating CI/CD pipelines from Jenkins to GitLab? Is rewriting the CI/CD files one by one really the only way, or is there a tool for that? What do you think is the best practice?
https://redd.it/1iyjoyf
@r_devops
Debug & chill #2 - Articles of infra & devops debugging
Thrilled to Share the Second Episode of My Debug & Chill Series!
Back in 2020, I started documenting some of my most intriguing troubleshooting adventures, and now I’m releasing them as a blog series. Each post dives into real problems I faced, how I used different tools, and my step-by-step logic.
This second installment dives into a puzzling case of packet duplication in a VMware environment—a seemingly simple scenario that turned out to be much trickier than it looked. Curious about the cause and how we tracked it down?
Check out Debug & Chill #2 here:
https://royreznik.substack.com/p/debug-and-chill-2-strange-packet
I’d love to hear your thoughts or any similar experiences you’ve had. Let me know in the comments!
https://redd.it/1iyjs7q
@r_devops
Using engineering metrics for good!
Can you share some examples of implementing engineering metrics in your daily workflow that positively impact your team performance?
https://redd.it/1iyin9z
@r_devops
Analyzing OpenTelemetry Data in Real Time with SQL - All Open Source
Hi folks!
I recently wrote a blog post on how to analyze OTel data in real time with SQL, using Feldera and Grafana, both open source tools.
We collect data from the OTel Collector, send it to a self-hosted Feldera instance for analysis, and visualize it with Grafana.
The blog post: https://www.feldera.com/blog/opentelemetry
We also have a more detailed use case article: https://docs.feldera.com/use_cases/otel/intro
Feel free to ask any questions, and hopefully this is useful to you!
https://redd.it/1iymaze
@r_devops
Just Started a DevOps Blog – Looking for Feedback & Suggestions! 🚀
Hey r/devops community!
I recently launched a personal blog where I share my experiences, challenges, and insights as a DevOps engineer. My goal is to post weekly about new technologies, interesting problems I encounter, and solutions I find useful in real-world scenarios.
My latest post is about EKS Auto Mode – I cover provisioning from scratch, deploying both stateless and stateful applications, and all the details involved in setting up a cluster in Auto Mode. I believe it could be a game-changer in the field, and I’d love to hear your thoughts on it!
👉 https://haykops.com/posts/eks-auto-mode/
I'm open to any feedback—whether it's about the content, topics you'd like me to cover, or how I can make the blog more valuable for the DevOps community.
Would love to hear your thoughts! Thanks in advance. 🙌
https://redd.it/1iyligy
@r_devops
I built an open-source dashboard for VM images
Hi,
I built this project because I wanted an easier way to visualise all Virtual Machine Images. I was also just very sick of people not following naming conventions and keeping track of images in spreadsheets.
Img-Dash is a simple dashboard for VM images across AWS, GCP and Azure that you can run locally.
Features:
- Consolidated view of all VM images and their data
- View, Attach or Delete contextual information (IaC code, Event Data, Compliance Scripts)
- Even displays which VMs are using which Image
- Simple search and list of images in the dashboard
As a DevOps engineer, it has been ages since I've developed a full stack application so feedback is much appreciated!
Repo: https://github.com/shaozae/Img-Dash
https://redd.it/1iyq02j
@r_devops
HELP Trying to optimize my Github Action to not install things every time. I'm new to this CI/CD thing
Hi friends, I'm looking for advice on speeding up my GitHub Actions workflow. Currently, a significant portion of the run time goes to these steps:
sudo apt-get install -y gettext
yarn install --frozen-lockfile --silent
yarn my custom script which runs the react-gettext-parser npm library
These steps are executed on every push/PR, and I'm wondering if there's a more efficient way to handle them?
I wonder whether I could build or cache what I'm installing once, and then reuse that result when the action triggers instead of installing everything every time.
Has anyone faced similar challenges and found effective solutions? I'm open to any suggestions or best practices you can share. Thanks in advance : )
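For install steps like the ones described above, one common approach is dependency caching keyed on the lockfile: installed packages are restored from a cache as long as `yarn.lock` is unchanged. The following is only a sketch of that key-derivation idea in Python, not a workflow definition. The function name `cache_key` and the `yarn-` prefix are assumptions made for illustration, and in practice the CI system's own cache feature would do this work.

```python
import hashlib


def cache_key(lockfile_bytes: bytes, prefix: str = "yarn-") -> str:
    """Derive a cache key that changes only when the lockfile content changes."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    return f"{prefix}{digest[:16]}"


# Same lockfile content -> same key (cache hit, the install can be skipped).
# Changed lockfile -> new key (cache miss, install runs and refills the cache).
old = cache_key(b"react@18.2.0\n")
same = cache_key(b"react@18.2.0\n")
new = cache_key(b"react@18.3.1\n")
print(old == same, old == new)
```

The same idea would not cover the `apt-get` step directly; system packages are usually handled by a prebuilt image or a separate cache.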
https://redd.it/1iyr471
@r_devops
How can I improve at performance tuning topologies/systems/deployments?
Machine learning engineer here, ~4.5 YOE. Most of my XP has been training and evaluating models. But I just started a new job where my primary responsibility will be to optimize systems/pipelines for low-latency, high-throughput inference. TL;DR: I struggle at this and want to know how to get better.
Model building and model serving are completely different beasts, requiring different considerations, skill sets, and tech stacks. Unfortunately I don't know much about model serving - my sphere of knowledge skews more heavily towards data science than computer science, so I'm only passingly familiar with hardcore engineering ideas like networking, multiprocessing, different types of memory, etc. As a result, I find this work very challenging and stressful.
For example, a typical task might entail answering questions like the following:
- Given some large model, should we deploy it with a CPU or a GPU?
- If GPU, which specific instance type and why?
- From a cost-saving perspective, should the model be available on-demand or serverlessly?
- If using Kubernetes, how many replicas will it probably require, and what would be an appropriate trigger for autoscaling?
- Should we set it up for batch inferencing, or just streaming?
- How much concurrency will the deployment require, and how does this impact the memory and processor utilization we'd expect to see?
- Would it be more cost effective to have a dedicated virtual machine, or should we do something like GPU fractionalization where different models are bin-packed onto the same hardware?
- Should we set up a cache before a request hits the model? (okay this one is pretty easy, but still a good example of a purely inference-time consideration)
The list goes on and on, and surely includes things I haven't even encountered yet.
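For the replica and concurrency questions in the list above, a rough first estimate can come from Little's law (requests in flight = arrival rate × latency). The sketch below is a back-of-the-envelope calculation only; the function name `replicas_needed`, the `headroom` factor, and all numbers are illustrative assumptions, and real sizing would still need load testing.

```python
import math


def replicas_needed(requests_per_sec: float,
                    latency_sec: float,
                    per_replica_concurrency: int,
                    headroom: float = 0.7) -> int:
    """Estimate the replica count from Little's law: L = arrival rate * latency."""
    in_flight = requests_per_sec * latency_sec      # average concurrent requests
    usable = per_replica_concurrency * headroom     # leave slack for bursts
    return max(1, math.ceil(in_flight / usable))


# Hypothetical workload: 200 req/s at 250 ms latency, 8 concurrent requests per replica.
print(replicas_needed(200, 0.25, 8))
```

The estimate also shows why the parameters feel interrelated: lowering latency (for example by moving to a GPU) reduces the in-flight count, which in turn changes the replica count and the autoscaling threshold.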
I am one of those self-taught engineers, and while I have overall had considerable success as an MLE, I am definitely feeling my own limitations when it comes to performance tuning. To date I have learned most of what I know on the job, but this stuff feels particularly hard to learn efficiently because everything is interrelated with everything else: tweaking one parameter might mean a different parameter set earlier now needs to change. It's like I need to learn this stuff in an all-or-nothing fashion, which has proven quite challenging.
Does anybody have any advice here? Ideally there'd be a tutorial series (preferred), blog, book, etc. that teaches how to tune deployments, ideally with some real-world case studies. I've searched high and low myself for such a resource, but have surprisingly found nothing. Every "how to" for ML these days just teaches how to train models, not even touching the inference side. So any help appreciated!
https://redd.it/1iysmlj
@r_devops
Can Kaniko build a container with provenance=mode=min?
When going through the Kaniko docs, I don't see any mention of a "--provenance" flag. Is setting the provenance level not a feature of Kaniko? Is there an alternative way of setting provenance with Notary/ORAS? Is the provenance level set to min by default?
https://redd.it/1iyrvv9
@r_devops
Can you guys roast my resume?
Hello everyone, I'm a master's student who has just started applying for jobs. I don't have much experience in the IT field, so I built my resume solely around projects. I'm looking for jobs in DevOps (I know companies don't hire freshers for DevOps roles), SRE, cloud engineering, and related areas. I'm still learning DevOps, which is why I don't have any DevOps work on it yet, but I will add some soon.
Could any of you roast/review my resume? It would be really appreciated.
Resume link: https://www.reddit.com/r/aws/comments/1iyws7u/can_you_guys_roast_my_resume/
Thanks in advance!
https://redd.it/1iywybb
@r_devops
Should I get a degree in Cloud Computing or Software Engineering from WGU?
I have an associate's degree in computer science and internship experience in DevOps. I've been applying for jobs with no luck. I'm thinking about getting a bachelor's degree from WGU in cloud computing, or should I go for software engineering, data analytics, or cybersecurity instead?
https://redd.it/1iyypoh
@r_devops
What to do
I am looking to pursue a major. Should I choose computer engineering, software engineering, or electrical engineering if I want to become a DevOps engineer?
https://redd.it/1iyz313
@r_devops
How do you manage database access?
We have a few AWS Aurora PostgreSQL databases where we manage database roles for our applications. This is done via psql.
The obvious problem is that it's very manual and not visible without running multiple psql commands. It's tedious to see which roles are available and which schemas, tables, columns they have access to.
What do you all use to visualize and manage this? Even better if it's a universal tool for other kinds of databases (MySQL, Trino, etc.)
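As a starting point before adopting a dedicated tool, PostgreSQL's `information_schema.role_table_grants` view can be queried once and summarized per role. The sketch below assumes the rows have already been fetched with psql or any client library; the `summarize_grants` helper and the sample rows are hypothetical.

```python
from collections import defaultdict

# Query to run with psql or any PostgreSQL client (table-level grants only).
GRANTS_QUERY = """
SELECT grantee, table_schema, table_name, privilege_type
FROM information_schema.role_table_grants
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY grantee, table_schema, table_name;
"""


def summarize_grants(rows):
    """Group (grantee, schema, table, privilege) rows into role -> table -> privileges."""
    summary = defaultdict(lambda: defaultdict(set))
    for grantee, schema, table, privilege in rows:
        summary[grantee][f"{schema}.{table}"].add(privilege)
    return {role: {tbl: sorted(privs) for tbl, privs in tables.items()}
            for role, tables in summary.items()}


# Hypothetical rows, shaped like the query result above.
sample = [
    ("app_rw", "public", "orders", "SELECT"),
    ("app_rw", "public", "orders", "INSERT"),
    ("app_ro", "public", "orders", "SELECT"),
]
print(summarize_grants(sample))
```

Column-level grants live in a separate view (`information_schema.role_column_grants`), so this summary covers only part of the picture, and it is specific to PostgreSQL rather than a universal answer.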
Thanks for any advice!
https://redd.it/1iyqa64
@r_devops
IIS vs NGINX vs Apache
I had to install and configure a server to deploy web applications and APIs built in Node.js. To clarify, these are intranet applications that will be used only inside the local company network. This is my first server and I was a little scared, so I started with Windows Server. I built an Express server to serve each web app and managed to deploy every single web service.
I wanted to go with a built-in web server to handle concerns such as caching and security, acting as a gateway to protect these APIs and serve these applications, and I went with IIS. However, I am having trouble deploying web apps developed with React. All I hear about IIS is that it is crap and that it only fits with Microsoft technologies.
I have the freedom to change anything I want, so I want to ask you: should I switch the host to a Linux distro and use NGINX or Apache to fulfill my needs, even though I don't have experience with built-in web servers or with Linux in general? Or should I stick with IIS for now until I learn about Linux and web servers properly?
https://redd.it/1iz1kt3
@r_devops
Vagrant - WSL - Ansible
Does anyone know how to make this setup work properly? I figured out how to get WSL, Windows, and Vagrant working together with VirtualBox, but it’s the Ansible piece that’s killing my project.
My goal is pretty simple: I am learning Ansible, so I want to spin up 3 Ubuntu VMs in Vagrant and then have Ansible run through each of the nodes and create a new user on each machine. My problem seems to happen at the SSH step, as it gets stuck after creating the first VM.
https://redd.it/1iz1kv3
@r_devops
Is there a debugger or some tool to check which container calls which container?
I have about 30 containers calling one another using messages and HTTP calls, and sometimes it's impossible to know what is calling what because the services are tightly coupled and keep calling one another.
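Distributed tracing is the usual way to recover this kind of information. Once caller/callee pairs are available from trace spans or access logs, the dependency map is a simple aggregation. The sketch below assumes such pairs already exist; the service names and the helpers `build_call_graph` and `callers_of` are hypothetical.

```python
from collections import Counter, defaultdict


def build_call_graph(calls):
    """Aggregate (caller, callee) pairs into caller -> {callee: call count}."""
    graph = defaultdict(Counter)
    for caller, callee in calls:
        graph[caller][callee] += 1
    return {caller: dict(callees) for caller, callees in graph.items()}


def callers_of(graph, target):
    """Return the sorted list of services that call `target`."""
    return sorted(c for c, callees in graph.items() if target in callees)


# Hypothetical observations, e.g. extracted from spans or proxy logs.
observed = [
    ("orders", "payments"),
    ("orders", "inventory"),
    ("payments", "inventory"),
    ("orders", "payments"),
]
graph = build_call_graph(observed)
print(graph)
print(callers_of(graph, "inventory"))
```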
https://redd.it/1iz4bk9
@r_devops
Sonatype Nexus OSS: Error during transaction commit and more DB errors
I am using Nexus version `3.70.1-02`, which is the last version that supports OrientDB. It is deployed on a k8s cluster as a pod. I have been facing multiple issues ever since I tried to fetch statistics about the sizes of different repositories hosted on Nexus. I used `kubectl exec -it -u root <nexus-pod>` and executed the following commands:
java -jar /opt/sonatype/nexus/lib/support/nexus-orient-console.jar
> CONNECT PLOCAL:/nexus-data/db/component admin admin
> select bucket.repositoryname as repository,sum(size) as bytes from asset group by bucket.repositoryname order by bytes desc limit 10;
This command worked as expected, but ever since I have been facing various transaction errors while reading, writing, or even fetching metadata from various repos. I host APT, Docker, and raw repos on Nexus.
com.orientechnologies.orient.core.db.OPartitionedDatabasePool$DatabaseDocumentTxPooled - $ANSI{green {db=component}} Error on transaction commit
com.orientechnologies.orient.core.exception.OStorageException: Error during transaction commit
DB name="component"
First I suspected something was wrong with permissions, as the persistent volume is on the host machine, so I ran
`chmod -R 775 <nexus-persistent-location>` and `chown 200:200 <nexus-persistent-location>`, but this didn't solve the problem.
Every now and then I have to rebuild the indices using the
`REBUILD INDEX *;` command and then delete the Nexus pod so that k8s creates a new one, and that works for some time (4-7 hrs). Any clues as to what may be wrong here?
https://redd.it/1iz7rgk
@r_devops
Looking for Feedback on Our Multi-Environment (Dev/RC/Prod) GitLab CI/CD + Docker + Nexus Setup with Semantic Versioning
tl;dr: We have a multi-branch approach (develop, rc, main) with Docker + GitLab CI + Nexus for images. We’re finalizing how we do semantic versioning, environment variables, and Docker Compose setups. Would appreciate any wisdom from experienced DevOps folks!
Hey everyone! I’m working on a small team, and we’re currently establishing a DevOps pipeline for our microservice (a Java/Spring Boot app) and plan to replicate the same approach across multiple projects. We’d love to get some feedback from the DevOps community on our architecture and any potential pitfalls or improvements. Here’s our rough setup:
---
Our Git / Branching Model
We have three main branches:
1. develop – merges from feature/hotfix branches
2. rc – merges from develop when we’re ready for a release candidate
3. main – merges from rc for final production releases
Each branch deploys to its corresponding environment (dev → staging/RC → prod). We protect these branches so only maintainers can approve merges.
---
CI/CD with GitLab
We’re using Docker-in-Docker (dind) to build our Docker images inside GitLab CI, then pushing to Nexus as our Docker registry.
For Semantic Versioning, we’re still deciding between:
Option A: Formal semver only on production merges, while dev/rc images get tagged with branch + commitSHA.
Option B: Distinct semver or “pre-release” tags for dev (v1.2.3-dev), rc (v1.2.3-rc), and final (v1.2.3).
Considering Conventional Commits + semantic-release to auto-bump versions in the future, but that might be overkill initially.
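Option A can be sketched as a small tagging function. This is only an illustration, not the team's actual pipeline: the name `image_tag`, its parameters, and the 8-character short SHA are assumptions (in GitLab CI the branch and short SHA would typically come from the predefined `CI_COMMIT_REF_NAME` and `CI_COMMIT_SHORT_SHA` variables).

```python
import re
from typing import Optional

SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def image_tag(branch: str, commit_sha: str, version: Optional[str] = None) -> str:
    """Option A: semver tag on main, branch + short SHA everywhere else."""
    if branch == "main":
        if version is None or not SEMVER.match(version):
            raise ValueError("production builds need a MAJOR.MINOR.PATCH version")
        return f"v{version}"
    # Docker tags cannot contain '/', so feature/foo becomes feature-foo.
    safe_branch = re.sub(r"[^A-Za-z0-9_.-]", "-", branch)
    return f"{safe_branch}-{commit_sha[:8]}"


print(image_tag("develop", "3f9c2ab417de"))        # develop-3f9c2ab4
print(image_tag("main", "3f9c2ab417de", "1.2.3"))  # v1.2.3
```

One trade-off this makes visible: under Option A a dev or rc image cannot be ordered against another by tag alone, whereas Option B's pre-release tags can, at the cost of deciding the next version number before the release.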
---
Docker Compose & Environment Variables
We have a single docker-compose.yml that spins up PostgreSQL, pgAdmin, and our app container.
For different environments, we might use:
Separate .env files (e.g. .env.dev, .env.rc, .env.prod)
Or Docker Compose profiles (e.g., --profile dev / --profile rc).
Secrets and credentials (DB user/pass, etc.) are stored in GitLab CI variables. During deploy, we generate a .env on the target server (or pass env vars directly).
For production, everything is behind protected branches and environment-scoped variables.
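The "generate a .env during deploy" step could look roughly like the sketch below. It assumes the CI variables are exposed to the job as ordinary environment variables; the helper names `render_env` and `write_env` and the variable names `DB_USER` and `DB_PASS` are hypothetical.

```python
import os


def render_env(required, source=None):
    """Render KEY=value lines for the required keys, failing fast on missing ones."""
    source = os.environ if source is None else source
    missing = [key for key in required if key not in source]
    if missing:
        raise KeyError(f"missing variables: {', '.join(missing)}")
    return "".join(f"{key}={source[key]}\n" for key in required)


def write_env(path, content):
    """Write the file readable by the owner only, since it holds credentials."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)


# Hypothetical values standing in for CI variables.
print(render_env(["DB_USER", "DB_PASS"], {"DB_USER": "app", "DB_PASS": "s3cret"}), end="")
```

Failing on a missing variable before anything is written avoids deploying with a half-filled file. Values containing spaces or newlines would need quoting, which this sketch does not handle.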
---
Questions / Areas We’d Love Feedback On
1. Semantic Versioning Approach – Is it practical to do formal semver only for production and keep “branch + commitSHA” tags for dev/rc? Or is a uniform semver approach better?
2. Docker-in-Docker – Any pros/cons we should be wary of? Are there better ways to build Docker images in GitLab pipelines?
3. .env Handling – We plan to generate .env in the pipeline or store it on the server. Is that a good practice, or should we consider a different approach (e.g., Vault or similar)?
4. Nexus as a Docker Registry – Any best practices for tag management, cleanup, or security we should know?
5. Overall Flow – Does the dev → rc → main branching and environment progression sound solid, or do you recommend a different branching flow?
We’d love any advice, critiques, or “watch out for this!” tips from people who’ve done similar setups in production. Thanks in advance for your insights!
Thanks so much, everyone!
https://redd.it/1iz9evh
@r_devops
Best server configuration
Suppose I want to run these services:
- Laravel service
- Redis service
- Node service
- RabbitMQ service
Which server architecture and Linux distribution would be a good fit for an early-stage startup running an Uber-like application?
https://redd.it/1izbv1x
@r_devops