Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
DevOps online courses help

Hey everyone,
I'm coming from a JavaScript Full Stack background and looking to transition into DevOps. I found the "DevOps Beginners to Advanced with Projects" course on Udemy ( https://www.udemy.com/course/decodingdevops/?couponCode=2021PM20 ) and was wondering if it's a good starting point to be able to pursue a junior position in the field. Has anyone taken this course? Would you recommend it or suggest something else?


I was also recommended more specific AWS and CKA courses, but I'm aiming to take them after going through a complete DevOps course.

Thanks in advance!

https://redd.it/1g440rd
@r_devops
Ideas for creating Dead Man's Switch emailing system

Hey guys.

I am not sure if this is the right sub for this, but I feel like you all are my best bet.

Well, I am looking to set up a system that basically functions as a Dead Man's Switch that will send out an email to my family members in case I pass away or something. I have seen services like deadmansswitch.net, but there are a few reasons why I am not using their service.

Basically, it would have to work such that the system sends you reminders by email every now and then, and you have to click on a link. If you don't click the link within a predetermined period, the system will trigger and send out a predefined email to your recipients.
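
As a rough sketch of that logic in Python: a scheduled job (cron, for example) could call a function like the one below and then send whichever email it asks for. The names `SwitchState` and `next_action`, the 7-day reminder interval, and the 30-day deadline are all made up for illustration; the email sending and link handling are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SwitchState:
    last_check_in: datetime                   # last time the owner clicked the link
    last_reminder: Optional[datetime] = None  # last reminder email sent, if any
    triggered: bool = False                   # True once the final email went out


def next_action(
    state: SwitchState,
    now: datetime,
    reminder_every: timedelta = timedelta(days=7),  # assumed interval
    deadline: timedelta = timedelta(days=30),       # assumed deadline
) -> str:
    """Return 'trigger', 'remind', or 'wait' for the current moment."""
    if state.triggered:
        return "wait"  # the final email was already sent; never send it twice
    if now - state.last_check_in >= deadline:
        return "trigger"  # no check-in within the deadline
    # Only remind if neither a check-in nor a reminder happened recently.
    last_contact = max(state.last_check_in, state.last_reminder or state.last_check_in)
    if now - last_contact >= reminder_every:
        return "remind"
    return "wait"
```

Clicking the link would simply set `last_check_in` to the current time, which resets both the reminder and the deadline.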

I am not a hardcore DevOp like most of you guys, but I know some basic programming. What would be the easiest way to go about building a homemade solution like this?

https://redd.it/1g487tb
@r_devops
How common is it for companies that host hackathons to forbid contractors from participating?

I understand there are a variety of opinions on hackathons.

I work at a place that forbids full time contractors from participating in them. I'm trying to understand if the policy has a legal basis, if it's financially driven or has other motivations that aren't apparent to me.

https://redd.it/1g49sjx
@r_devops
Advice on new architecture. No more AWS ECS, Ansible to orchestrate docker on EC2 instead. Am I insane?

The problem
I currently use AWS ECS Fargate and ALB to serve my API and run my background workers. I don't like this setup because 1) it's too expensive for what we're getting out of it, 2) I can't truly replicate it locally¹, and 3) most importantly, squeezing DB migrations into an ECS deployment has been quite painful. DB migrations specifically highlighted to me that ECS seems to be the wrong tool here.

So I'd like some thoughts on whether I'm thinking right about completely switching away from ECS, and on the tools I'm picking to do so.
Some context on the application first though. It's a B2B app, so sudden demand increase is unlikely. I will scale up slowly when needed, but I do want to have a plan for when that need comes. Zero downtime deployments are also not a requirement, a few minutes of downtime at night are fine.

My plan
A single EC2 instance, as beefy as needed. It runs the app in containers, and the way to scale is to increase the number of containers. I'll use Traefik as a reverse proxy in front of the API containers. I don't know what the appropriate load balancing algorithm would be here, but I don't think this is that big of a deal either, right? As in, would the answer to this question affect the architecture I'm deciding on? Or can I just revisit this after I've implemented everything to better optimize the load balancer?

When code is merged to master, GitHub Actions will build a new app image, and then I'll use Ansible to automate the deployment in that EC2 instance.
I picked Ansible instead of writing a custom bash script because it seemed that I can use Ansible to declare what I want to happen in some sense, but I can still also imperatively write how it should be done. Is that correct?
This is the most vague area to me to be honest, so feedback here is greatly appreciated. I have never used Ansible before.
Another relevant point to mention here as well is that my custom bash script would be annoyingly stateful, which seemed too error prone. For example I have to check and ensure the state of the machine first, like spinning up traefik if it's not already running, and checking the db connection, etc. Ansible seemed like it had a good approach to this issue. It's a complex issue though, so I'm keeping my expectations low. Experience with things like this is appreciated.

I'm planning on using parameterized docker-compose files to configure the containers and set their network, static env vars, entrypoints, etc., in addition to passing in dynamic configuration (i.e. from AWS SSM) as env vars as well.
Running the entire app locally would simply be `docker-compose up -d` or I can even simulate a deployment process exactly by running the Ansible playbook targeting my local machine.

Here's the deployment logic if it's useful to know:
(I will use a GitHub Actions workflow to run the Ansible playbook directly against the instance to update it)
- Pull the new container image.
- Add a higher priority route in Traefik to point api traffic to a maintenance page.
- Wait 30 seconds for any ongoing requests to complete since Traefik doesn't support connection draining.
- Stop (not remove) old app containers.
- Run DB migration using new app image. Migration atomicity will be ensured by a few steps that are irrelevant here, but what's relevant to know is that it will be done using direct access to the PostgreSQL CLI.
- If db migration is successful, spin up containers for new app version, remove maintenance page and clean up old containers. Deployment is done!
- If db migration fails, start old app containers, remove maintenance page and send alerts.
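
The control flow of these steps, including the rollback branch, can be sketched in Python as below. This is only an illustration of the ordering, not the actual playbook: `deploy` and the step names are hypothetical, `run` stands in for whatever actually executes a step (an Ansible task, a shell command), and stopping early when the image pull fails is my own assumption.

```python
from typing import Callable, List

# Runs a named step and returns True on success (hypothetical interface).
Step = Callable[[str], bool]


def deploy(run: Step) -> List[str]:
    """Run the deployment steps in order; roll back if the migration fails."""
    log: List[str] = []

    def do(name: str) -> bool:
        log.append(name)  # record every step that was attempted
        return run(name)

    if not do("pull_image"):
        return log  # assumption: nothing has changed yet, so just stop
    do("enable_maintenance_page")
    do("wait_for_in_flight_requests")  # the 30 second pause
    do("stop_old_containers")          # stopped, not removed, for rollback
    if do("run_db_migration"):
        do("start_new_containers")
        do("disable_maintenance_page")
        do("remove_old_containers")
    else:
        do("start_old_containers")
        do("disable_maintenance_page")
        do("send_alert")
    return log
```

Passing a fake `run` (for example `lambda name: name != "run_db_migration"`) makes it easy to check that the rollback path restarts the old containers before the maintenance page is removed.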

As for observability, Prometheus would scrape the local collectors (cAdvisor, OTel, etc.) for logs and metrics.

When I need to scale up the api or the workers I'll add more containers, and when I reach hardware limits I can upgrade the instance type. Any gotchas I should know about when upgrading instance types? From what I read in the docs, upgrading within the same family seems to be a simple task, no?

I know this is all over the place, there's a lot of things that need to fit in together properly, but I tried to only mention what's relevant. If there are any points I forgot to mention, I'll be happy to answer.
If what I'm asking for isn't clear I can also try and clarify that further.
Please mention any pitfalls I might fall into, even if you don't think they apply to my situation.

---

¹: I know that at some point replicating a scalable system locally is not a realistic expectation, but in our current state I see no reason why I can't spin up the same containers and reverse proxy locally and get the same deployed setup on my local machine. When the need comes for something like a hosted global service, I'll drop that requirement, but for now I don't see why I shouldn't be able to do so.


https://redd.it/1g4alqw
@r_devops
I launched my DevOps PostgreSQL platform today - feedback?

My name is Elliott. I’ve been building a DevOps platform for the last three years on top of best-in-class open source platforms (Kubernetes, Elixir, PostgreSQL, Grafana, etc). The goal is to give engineering teams access to a modern DevOps infrastructure without needing to have full SRE/DevOps committed resourcing.

It’s also open source/fair source - all the source code is here → [https://github.com/batteries-included/batteries-included](https://github.com/batteries-included/batteries-included)
I just shipped a public beta today and would love to hear initial reactions, thoughts, feedback.

Here’s some of the specific details of the platform:

* The platform features a user-friendly suggestion-based interface that guides users on topics like PostgreSQL cluster memory/CPU ratios, serverless web hosting, and secure secret sharing. Advanced users can quickly access full control over their data.
* It’s an Elixir-based UI on a database-driven, self-hosted Kubernetes platform. It can automatically deploy a scalable cloud installation (currently on AWS, with more options to follow) without the need for YAML or Terraform configurations. Alternatively, it can set up a development instance using Kind and Docker or Podman, facilitating a smooth transition from local to production environments.
* The platform supports easy AI project hosting for various workloads. Use Ollama embedding models for text embedding, eliminating OpenAI costs and data leakage risks. With PGVector and Cloud Native PG for vector databases, you can achieve near-state-of-the-art performance without exposing your data to third-party APIs. Experiment with Jupyter Notebooks, featuring optional Nvidia Plugin batteries for no DevOps-required experimentation.
* Single Sign-On is streamlined via Keycloak, Istio Ingress, and OAuth Proxy, securely hosted on your machine or cloud account. We've simplified security with full mTLS, Istio, SSL generation, and automated routing with Let's Encrypt and Acme for HTTP2. Istio Ingress services are seamlessly configured down to the contents of config maps.
* Grafana and Victoria Metrics can be auto-configured with just a few clicks for easy installation.

Here’s also a look at the demo of the database deploy [https://www.youtube.com/watch?v=YbvkWja3VIQ](https://www.youtube.com/watch?v=YbvkWja3VIQ)

The platform follows all the best practices learned for configuring and running a maintainable system without Kubernetes' GitOps pain.

If you want to check it out, here are links to docs, site, repo, and join:

* [https://www.batteriesincl.com/](https://www.batteriesincl.com/)
* [https://home.batteriesincl.com/signup](https://home.batteriesincl.com/signup)
* [https://github.com/batteries-included](https://github.com/batteries-included)
* [https://www.batteriesincl.com/docs](https://www.batteriesincl.com/docs)

https://redd.it/1g49t4q
@r_devops
Starting DevOps (no CS background)

Hey everyone, I’m starting to learn DevOps and am buying a course on Udemy by Imran Teli. I’m looking for advice and suggestions on what to focus on while learning DevOps. Also, will my lack of an IT/CS background affect my hiring process once I’m ready to work?

https://redd.it/1g4k5xp
@r_devops
Is there an easy way to see which containers triggered an error without explicitly sending the errors to a logging service?

I have 25 containers constantly sending messages to one another and sometimes one of them gets an error, but I have no idea which container got it. Is there a way to listen for errors on every Docker container and centralize logging without explicitly writing code to send errors to a microservice? I am using a local Docker environment.

https://redd.it/1g4ljef
@r_devops
How much of a challenge are telemetry (metrics, logs, traces) storage costs for your team / company?

Real-life cases of overpriced or inefficiently used telemetry storage are very welcome in the comments!


https://redd.it/1g4iysh
@r_devops
Started DevOps trainee role in a startup one month ago, any advice?

I recently got a job at a startup that creates and manages infrastructure and pipelines for other companies (clients). There is a lot to learn here, as I am working directly under a senior DevOps engineer, and my salary is fairly competitive for the region. However, I am getting overwhelmed by the work, and the working hours are 9-6. How can I manage this so that I don't make a mess of my life at the start of my career?

https://redd.it/1g4q5o4
@r_devops
Need resources for DevOps beginner guide

Hey there,

Given my background in deployment and automation, I want to explore the DevOps career path.

I started with a few Udemy courses and some hands-on work with Docker. I'm now exploring Kubernetes, Ansible, AWS, etc.

Let me know of good resources to start with.

Hoping for a positive response.

https://redd.it/1g4pq1c
@r_devops
Rate My Startup Idea: Cloud Management Platform Using Natural Language Queries

I'm developing a cloud management platform where users can manage AWS, Azure, GCP, etc., using natural language commands. The platform integrates an LLM (like GPT) to translate user queries (e.g., "Launch an EC2 instance") into API calls that perform actual operations on the cloud provider. This aims to simplify cloud management, especially for non-technical users. What do you think of the idea, and how could I improve it?
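
To make the translation step concrete, here is a minimal Python sketch of how I picture it, under some loud assumptions: the LLM is prompted to reply with JSON of the form `{"action": ..., "params": {...}}`, and that reply is checked against an allowlist before any cloud API is called. `ALLOWED_ACTIONS`, `parse_intent`, and the action names are hypothetical, and the actual provider call is left out.

```python
import json

# Hypothetical allowlist: action name -> required parameter names.
ALLOWED_ACTIONS = {
    "launch_instance": {"instance_type", "region"},
    "stop_instance": {"instance_id"},
}


def parse_intent(llm_output: str) -> dict:
    """Validate the model's JSON reply before anything touches a cloud API."""
    intent = json.loads(llm_output)  # raises ValueError on malformed JSON
    if not isinstance(intent, dict):
        raise ValueError("expected a JSON object")
    action = intent.get("action")
    params = intent.get("params", {})
    if not isinstance(params, dict):
        raise ValueError("params must be a JSON object")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {action!r}")
    missing = ALLOWED_ACTIONS[action] - set(params)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    return {"action": action, "params": params}
```

Only a validated intent would then be handed to the code that calls the provider, so a hallucinated or destructive action fails before it runs.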

https://redd.it/1g4rwwz
@r_devops
Seniors who took a chance on a junior hire, how did it go?

Did it pay off? Where are they now? Just curious of other people's stories.

https://redd.it/1g4sqg3
@r_devops
What does your development environment look like? What editor/IDE do you use and how do you keep your workflow as fast and easy as possible?

I once again stumbled upon Primeagen's brutally fast and systematized terminal-only workflow, where his workspace does exactly what he wants, when he wants it, using shortcuts and scripts he created himself. He doesn't need to touch his mouse for most of his work. I have to say that I'm pretty envious.

The thing is, devs mostly work on their own system, whereas I more often than not work on another system or on many systems in parallel. On top of that, I'm stuck using Windows at work. That leads me to use a random sandbox Linux machine that I remote into using Cursor (VSCode) as my main workspace, to get at least some sanity back. But this does not "really" work, since many VSCode plugins just don't work remotely, and my ZSH customizations don't carry over to any other server I have to connect to in order to get work done.

I like it fast. Fast is good.
In the end, this job is, with the exception of writing actual code or automations, pure efficiency hell.
It is impossible to reach 'Flow' with all of these context and program switches.

How the hell do you guys survive this - or do you also suffer through it?
What does your dev environment look like, are you able to optimize everything?
Are you able to work without touching your mouse?

https://redd.it/1g4tfs7
@r_devops
Best practices for using Terraform on a local machine or a VM

I have some questions about this. Most of the people I know use Terraform locally, but I created an instance to run Terraform on; since it's on the web, I can access it from wherever I want. How do you use yours?

https://redd.it/1g4rcxx
@r_devops
Need advice on balancing project work and self study

I’ve worked in DevOps/SRE positions in different organisations for more than 12 years now. I need some suggestions on how you senior engineers here manage your daily schedule for self study and project work. Every day I want to set aside some hours for self learning to upskill myself on different technologies, but somehow it mostly doesn't happen, due to personal stuff. I feel a sense of guilt each day after work that I’m not learning much outside of my job.

https://redd.it/1g4v1br
@r_devops
Getting started in deployment

I am a full stack software engineer with decent experience in JS frameworks. I am now planning to deploy a MERN app on DigitalOcean. Can anyone please tell me how I can learn deployment on DigitalOcean? I have a basic idea of Ubuntu, so I can clone the repo and run it on localhost, but I have no idea how to connect it to a domain or use other things like PM2. Can anyone please guide me on how I can learn all of this?

https://redd.it/1g4vznc
@r_devops
How do you make your staging applications private?

Specifically for simple applications with the front end hosted on S3, served by CloudFront, and the back end hosted on EKS/ECS/EC2. Still using authentication.

I understand CloudFront is not meant to be private, but how do you handle that question? You could have a staging infra that differs from prod (no CloudFront, with the front end running in a container, for example), but in my opinion that doesn't make sense for CI/CD, repeatability, and testing purposes.

I have tried a solution with Tailscale AppConnector, but it's flaky and browsers cache DNS weirdly so traffic sometimes doesn't go through Tailscale app connector node.


How do you manage that? Do you keep your staging applications publicly available?

https://redd.it/1g4udag
@r_devops
What would be a "magical" tool to resolve your problems?

I’m curious—what’s a tool that you wish existed to make your job easier? Something that could solve those recurring problems that current tools don’t quite handle. Whether it's better automation, monitoring, CI/CD pipelines, or integrations, I’d love to hear what you think is missing!

What tool would make your day-to-day smoother?

https://redd.it/1g4xlt5
@r_devops
Concerns with Windows Subsystem for Linux

The tools I work with often aren't cross platform and are intended for Linux only. Typically this comes from needing MPI support in python, etc. I've had a couple of colleagues warn against using and developing inside of Windows Subsystem for Linux, citing various concerns on "what it will do to my system".

After several months, I've had no issues at all. I'm able to use it just the same as my Ubuntu dual boots or Ubuntu Raspberry Pis, so I'm just wondering if the concerns are valid?

Anyone have any horror stories with WSL that would steer me away from using it?

https://redd.it/1g4ycdt
@r_devops
Need Career Advice: Are AWS SAA, CKA, and Terraform Associate Enough for Job Hunting?

Hi everyone,

I recently graduated from the University of Waterloo with an engineering degree and have passed the AWS Certified Solutions Architect – Associate (SAA), Certified Kubernetes Administrator (CKA), and Terraform Associate exams. I’m currently exploring job opportunities in Canada, but I’m wondering if these certifications are enough to start applying directly for cloud and DevOps roles.

If not, I’m planning to pursue more certifications and would appreciate your thoughts on the order of priority:

1. Microsoft AZ-104 (Azure Administrator Associate)
2. AWS DevOps Engineer – Professional (DOP)
3. Certified Kubernetes Application Developer (CKAD)

Do you think this plan makes sense? Or would you recommend a different approach? Any feedback or insights would be greatly appreciated!

Thanks in advance!

https://redd.it/1g4zntp
@r_devops