Reddit DevOps
270 subscribers
8 photos
31.1K links
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Bucardo Alternatives?

I know this probably isn't a "true" DevOps question, but lucky me, I've inherited our ex-DBA's responsibilities! Anyway, we currently have an on-prem Postgres cluster in a master-standby setup using streaming replication. I'm looking to migrate this into RDS, more specifically to replicate into RDS without disrupting our current master. Eventually, after testing is complete, we would cut over to the RDS instance. As far as we are concerned, the master is "untouchable".

I've been weighing my options:

- Bucardo: seems like a non-starter, as it would require adding triggers to tables, and I can't run any DDL on a secondary since they are read-only. It would have to be set up on the master (which is a no-no here). And the app/db is so fragile and sensitive to latency that everything would fall down (I'm working on fixing this next lol)
- Streaming replication: can't do this into RDS
- Logical replication: I don't think there is a way to set this up on one of my secondaries, as they are already hooked into the streaming setup? This option is a maybe I guess, but I'm really unsure.
- pg_dump/restore: not feasible, as it would require too much downtime, and my RDS instance needs to be fully in sync when it's time for cutover.

I've been trying to weigh my options, and from what I can surmise there are no really good ones. Other than looking for a new job XD

I'm curious if anybody else has had a similar experience and how they were able to overcome, thanks in advance!
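For reference, native logical replication into RDS looks roughly like the sketch below (hostnames and credentials are placeholders). The catch: on PostgreSQL 15 and earlier the publication, and `wal_level = logical`, must live on the primary, which may collide with the "untouchable" constraint; PostgreSQL 16+ can do logical decoding from a standby. AWS DMS is the other commonly used route for this kind of near-zero-downtime move.

```sql
-- On the source (the primary for PG <= 15; a standby can host the
-- slot from PG 16 onward). Requires wal_level = logical.
CREATE PUBLICATION rds_migration FOR ALL TABLES;

-- On the RDS target, after loading the schema first
-- (e.g. pg_dump --schema-only | psql):
CREATE SUBSCRIPTION rds_migration_sub
    CONNECTION 'host=onprem-primary dbname=mydb user=replicator password=secret'
    PUBLICATION rds_migration;
```

Once the subscription catches up, cutover is a matter of stopping writes, waiting for replication lag to hit zero, and repointing the app at RDS.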

https://redd.it/1hyaefd
@r_devops
Socket.io connection being cancelled on fly.io

Hi everyone!

I have a single fly.io machine running my Python backend app, which does text-to-speech audio streaming using Socket.io to stream the audio chunks to the Next.js frontend.
Locally the audio streaming over websockets works, but when I deploy the backend to fly.io and the frontend to Vercel, the audio streaming randomly stops after ~2 seconds.

My Python backend uses RealtimeTTS (https://github.com/KoljaB/RealtimeTTS) with Azure Speech Services.

It’s almost like something forces the audio streaming to stop after a few seconds. Could the fly.io loadbalancer or proxy be the reason here?

My fly.io config:

app = 'staging-polyglotpal'
primary_region = 'gru'

[processes]
app = "./main.py"
otelcollector = "otelcol-contrib --config /etc/otelcol/otel-config.yaml"

[build]
dockerfile = "Dockerfile"

[[services]]
name = "app"
processes = ["app"]
internal_port = 8000
protocol = "tcp"
auto_stop_machines = "suspend"
auto_start_machines = true
min_machines_running = 1
force_https = true

[services.concurrency]
hard_limit = 20
soft_limit = 15

[[services.ports]]
handlers = ["tls", "http"]
tls_options = { "alpn" = ["h2", "http/1.1"], "versions" = ["TLSv1.2", "TLSv1.3"] }
port = 443

[[services.ports]]
handlers = ["http"]
port = 80

[[services.http_checks]]
interval = "10s"
grace_period = "30s"
method = "get"
path = "/api/healthcheck/status"
protocol = "http"
timeout = "5s"
tls_skip_verify = true

[[services]]
name = "otelcollector"
processes = ["otelcollector"]
internal_port = 40000

[[services.tcp_checks]]
interval = "10s"
grace_period = "30s"
timeout = "5s"

[[vm]]
size = "performance-1x"
#size = "shared-cpu-1x"
cpus = 1
memory = "2gb"
processes = ["app"]

[[vm]]
size = "shared-cpu-1x"
cpus = 1
memory = "512mb"
processes = ["otelcollector"]


Socket.io (https://socket.io/) config:

ping_interval=45,
ping_timeout=120,
transports: ["websocket"],
forceNew: false,
reconnection: true,
reconnectionAttempts: 3,
reconnectionDelay: 3000,
timeout: 10000,

I’m quite stuck right now and would really appreciate some feedback. Has anyone had a similar issue like this?
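One thing worth ruling out (an educated guess, not a confirmed diagnosis): the TLS options in the config above advertise h2, so browsers may negotiate HTTP/2, and WebSocket upgrades in most Node/Python stacks require HTTP/1.1. Restricting ALPN is a cheap experiment:

```toml
# Hypothetical tweak to the [[services.ports]] block: advertise only
# HTTP/1.1 so WebSocket connections are not negotiated over HTTP/2.
[[services.ports]]
handlers = ["tls", "http"]
tls_options = { "alpn" = ["http/1.1"], "versions" = ["TLSv1.2", "TLSv1.3"] }
port = 443
```

If the stream survives past 2 seconds with this change, the h2 negotiation was the culprit; if not, the next suspects are idle timeouts on the Fly proxy or on Vercel's side.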

https://redd.it/1hy7odc
@r_devops
Errors when validating Packer .hcl template: Error: Extraneous label for build and Error: Unsupported block type

Using Packer v1.11.2 with the AWS plugin v1.3.4_x5.0 on a RHEL 8 EC2 instance, trying to build out a Windows Server 2019 AMI.

I have a background with Terraform and am trying to keep the .hcl template simple; however, I keep hitting two errors (Error: Unsupported block type and Error: Extraneous label for build) in various places and can't figure out from searching online how to fix them.

packer {
  required_version = ">=1.10.0"

  required_plugins {
    amazon = {
      version = ">=1.3.3"
      source  = "github.com/hashicorp/amazon"
    }
  }
}

variables {
  ami_name      = "packer-windows-server-2019"
  instance_type = "g4dn.xlarge"
  region        = "us-east-1"
  vpc_id        = "vpc-abc123"
  subnet_id     = "us-east-1b"
  key_name      = "generic-keypair"
}

build "amazon-ebs" {
  region        = "${var.region}"
  source_ami    = "ami-04d76aa3cb20388b6"
  instance_type = "${var.instance_type}"
  vpc_id        = "${var.vpc_id}"
  subnet_id     = "${var.subnet_id}"

  associate_public_ip_address = true
  ssh_username                = "Administrator"
  ssh_password                = "Password-goes-here"
  ssh_interface               = "winrm"
  winrm_protocol              = "https"
  winrm_transport             = "ntlm"
  communicator                = "winrm"

  ami_name = "${var.ami_name}"

  tags = {
    Name = "${var.ami_name}"
  }

  provisioner "shell" {
    inline = [
      "powershell.exe -Command \"Install-WindowsFeature -Name IIS -IncludeManagementTools\"",
      "powershell.exe -Command \"iisreset\"",
    ]
  }
}
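Both errors are structural. In Packer HCL2 a `build` block takes no label; builder configuration lives in a separate `source "amazon-ebs" "<name>"` block that the build references. And `variables` (plural) is not a supported block type; each variable gets its own `variable` block. A corrected skeleton using the poster's values (the `winrm_*` settings are a hedged guess at intent, since `ssh_*` options don't apply to the WinRM communicator):

```hcl
variable "region" {
  type    = string
  default = "us-east-1"
}

variable "instance_type" {
  type    = string
  default = "g4dn.xlarge"
}

variable "ami_name" {
  type    = string
  default = "packer-windows-server-2019"
}

# vpc_id / subnet_id / key_name variable blocks omitted for brevity.

source "amazon-ebs" "windows2019" {
  region        = var.region
  source_ami    = "ami-04d76aa3cb20388b6"
  instance_type = var.instance_type

  communicator   = "winrm"
  winrm_username = "Administrator"
  winrm_password = "Password-goes-here"
  winrm_use_ssl  = true
  winrm_insecure = true # assumption: self-signed WinRM certificate

  ami_name = var.ami_name

  tags = {
    Name = var.ami_name
  }
}

build {
  sources = ["source.amazon-ebs.windows2019"]

  provisioner "powershell" {
    inline = [
      "Install-WindowsFeature -Name IIS -IncludeManagementTools",
      "iisreset",
    ]
  }
}
```

`packer validate` on a skeleton like this should clear both messages; anything left (e.g. the WinRM bootstrap user data) surfaces at build time.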

https://redd.it/1hycz8e
@r_devops
AI-Driven DevOps?

Hello, my dear DevOps engineers.

I’m writing to gather information about platforms that you may be using that have already incorporated AI and are seeing significant benefits from it.

I’ve always been straightforward when creating deployment pipelines. I recently switched from GitLab to GitHub Actions due to a change in my role, but the issue persists.

This company ecosystem is fragmented, and we urgently need to transition from manual deployments to automated ones (We are AWS powered).

While I’m hesitant about DevOps AI platforms, the future seems to point in that direction. Therefore, I’m requesting your assistance in understanding how platforms like Harness or any other can alleviate the challenges associated with managing automated deployments.

In my opinion, we should begin by standardizing code at the architecture level to enable the utilization of reusable deployment patterns.

I would greatly appreciate any guidance on AI-driven solutions in the following areas:

Deployment platforms
Observability
Platforms that automatically generate test cases based on Epics and stories (as context)
Security

Any assistance you can provide would be invaluable.

Thank you.

https://redd.it/1hyfc3a
@r_devops
Key Management Question: Rewrite Docker ENV or Rewrite JSON config for script?

I have a simple AWS SES mailer node script where I read the AWS keys from a config file. The keys aren't in my repo, but on deploy I write them to a config file.

I know it's a better practice to use the ENV variables in docker because they aren't written to a file in the container directly, but... I'm still rewriting the Dockerfile and that persists after launch. It feels like it doesn't really solve the problem of "can someone actually see these keys in a deploy process?"

Any suggestions for a better way to handle the AWS keys on deploy? I'm using a simple script to do the actual deploy.
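A middle ground that keeps the keys out of the repo, the image, and any file inside the container: inject them at `docker compose up` time from the deploy host's environment (the service and image names below are hypothetical). Better still, if the container runs on EC2/ECS, attach an IAM role; the AWS SDK then fetches temporary credentials from the metadata service and there is no key material to write anywhere.

```yaml
# docker-compose.yml sketch: values are substituted from the shell
# environment of whoever runs `docker compose up`, so nothing is
# baked into the image or committed to the repo.
services:
  mailer:
    image: my-ses-mailer:latest   # hypothetical image name
    environment:
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
      AWS_REGION: us-east-1
```

With runtime injection the deploy script never rewrites a Dockerfile, so nothing secret persists in image layers or build history.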

https://redd.it/1hybtlw
@r_devops
Assigning instance role to my ec2 instance breaks network connectivity to ec2 endpoint and other aws endpoints

Hey all... really weird issue I am having.

Originally I was trying to set up an EKS cluster, and the nodes were not joining the cluster. I checked it out, and apparently nodeadm-config was unable to do an ec2:DescribeInstances -- but not due to permissions errors, instead due to a network timeout for the ec2.region.amazonaws.com endpoint. Indeed a direct curl to the endpoint just hangs. Other public services e.g. google.com, text.npr.org can be accessed. But stuff on amazonaws.com ... no go.

Through trial and error, I narrowed the issue down to the instance profile used for the ec2 instances. I have made several test ec2 instances, and it seems that adding an instance profile causes requests to the ec2 endpoint to hang.

Does anyone have any idea why this might be happening? Thanks in advance.

https://redd.it/1hyj8gd
@r_devops
40 to 50 year olds, please check in?

Are there DevOps engineers in this age group? I will be 40 next year. I have such a bad fear of being laid off, I don't know what I would do. I haven't been let go yet, and I've been socking away money, but I know a lot of work friends who were let go around 50 in non-DevOps positions. I'm a dad too, so I can't focus on leveling up all the time.

https://redd.it/1hyjz2g
@r_devops
Any free alternatives to SonarQube?

Any free alternatives to SonarQube? I'm already using Prettier and ESLint.

https://redd.it/1hyn7uj
@r_devops
What terminal do you guys use as a devops engineer?

Looking to enhance my terminal experience. What terminal do you guys use? How has your experience been? What's the best feature you like about it?

https://redd.it/1hyqg6p
@r_devops
Questionnaire on Log aggregation and monitoring for University Project

I’m working on a university project, and I’d really appreciate it if you could take a few minutes to answer this questionnaire, thanks. This questionnaire is mainly targeting sysadmins. https://forms.gle/cb7Vg1s8avGSvjJDA

https://redd.it/1hysbxz
@r_devops
AWS internal CI/CD best practices

I was setting up my own pipeline and I ran into this article when doing a quick Google search.

It said 80% CPU and 80% MEM on the rollback alarm. What do y'all think? In general, I think fault rate percentage depends on what your overall traffic volume is.
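For concreteness, a rollback alarm along those lines might look like this in CloudFormation (the namespace, dimension, and names are illustrative, not from the article):

```yaml
Resources:
  HighCpuRollbackAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Roll back the deployment if CPU stays above 80%
      Namespace: AWS/EC2
      MetricName: CPUUtilization
      Statistic: Average
      Period: 60
      EvaluationPeriods: 3
      Threshold: 80
      ComparisonOperator: GreaterThanThreshold
      Dimensions:
        - Name: AutoScalingGroupName
          Value: my-service-asg   # illustrative
```

Wiring an alarm like this into a CodeDeploy deployment group's alarm configuration makes the rollback automatic instead of a human decision mid-incident.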

https://redd.it/1hysonc
@r_devops
Personal projects/homelab learning experience

Hey all, bit of a meta post I guess


I've found that having my own pet projects (CI, k3s cluster, Rancher, development with Angular) has significantly increased my "intuition" for solving lots of issues in the realm of DevOps topics.
This intuition has transferred really well into day-to-day topics at work, even though the tools we use there are different.


I'd like to know if you have a homelab setup/personal project and how that has helped you with your skills.

On the other side, I'd also be interested in folks who refuse to do any projects in their spare time, and the reason for this mindset.


Idk what my actual point is, I guess I just want to open up a conversation about personal projects vs. strictly work-only.

https://redd.it/1hyt7vc
@r_devops
Devops reading list, what to add ?

I've already read the generally recommended list of DevOps books. What else is worth looking at? I'm more interested in organizational changes and methodology, not so much tools and tech.

* The DevOps handbook
* The Phoenix Project
* Team Topologies
* The Unicorn Project
* DORA report

https://redd.it/1hyu78a
@r_devops
Ship it podcast discontinued

A couple of weeks ago I stumbled across the ship it podcast only to learn that it's being discontinued. Dang, I liked the content. What could I listen to instead?

https://redd.it/1hysk3q
@r_devops
From idea to deployment in 4 hours

Hey guys,

yesterday I accomplished something I've had in my head for a few weeks, and it only took around 4 hours from idea to deployment. Check it out: https://og-img.com

It is completely free with no registration required.

The idea:

I wanted to have some dynamic Opengraph Images (those og:image metadata you see in blog & social media posts) on my own blog to look better when shared via social media.

The dynamic part is in the URL, which means if I want to write a blog post about DevOps salaries, I can easily create my own OpenGraph image by simply changing the URL, like this: https://og-img.com/Devops%20salaries/og.png

The "Devops%20salaries" part is the dynamic part which results in the final OpenGraph Image.

You can generate any image you can imagine, as long as the text fits in 100 characters, and embed it into your own blog or social media posts.

Tech stack:

The heavy lifting is done via node.js in the backend plus vanilla JavaScript in the frontend.

I decided to dockerize the whole application, especially because I wanted Caddy as a reverse proxy to handle the node.js container traffic.

The node.js + Caddy stack runs in a docker compose setup on a mediocre VM from Hetzner.

I announced the service on r/webdev and got some huge traffic yesterday, but the stack handled all of it perfectly.
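For anyone curious what that wiring looks like, a minimal Caddyfile along these lines would do it (the domain and the upstream container name/port are illustrative):

```
og-img.com {
    # Caddy terminates TLS (with automatic certificates) and proxies
    # requests to the node.js container over the compose network.
    reverse_proxy app:3000
}
```

Because Caddy provisions certificates automatically, the node.js container only ever speaks plain HTTP on the internal network.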

If you have any feedback or questions let me know! <3




https://redd.it/1hywubd
@r_devops
Is there a good static analysis tool that's free and that's better than semgrep?

https://redd.it/1hz3emt
@r_devops
Logging software recommendations

Hey! So I'm looking for logging software with a free starter plan and cheap paid plans with good retention times. It's for Java apps (specifically Minecraft servers) where I run multiple instances per server type (for example, the first server type would run 5 instances and the second would run 10). Any recommendations? I've seen Axiom and I'm currently considering it. Or should I use self-hosted Sentry or another product? Thanks in advance.

https://redd.it/1hz0tv6
@r_devops
How to pass environment variables to remote server using GitHub Actions

I have a flask app for which I want to setup CI/CD using GitHub Actions. I am not sure what the best way to pass environment variables for my Postgres and flask containers. Of course I cannot keep a `.env` file in version control.

My approach is to connect to the server through SSH and create a `.env` file and refer to it in my `docker-compose.yml`

Is this bad practice?

name: Deploy Flask App

on:
  push:
    branches:
      - main

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Deploy to production server
        uses: appleboy/ssh-action@master
        with:
          host: ${{ secrets.SERVER_HOST }}
          username: ${{ secrets.SERVER_USERNAME }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          script: |
            mkdir -p /app
            git clone https://my_app /app
            cd /app

            # Create .env file from GitHub secrets
            cat > .env.prod << EOL
            REDDIT_USER=${{ secrets.REDDIT_USER }}
            REDDIT_USER_PASSWORD=${{ secrets.REDDIT_USER_PASSWORD }}
            REDDIT_CLIENT_ID=${{ secrets.REDDIT_CLIENT_ID }}
            REDDIT_CLIENT_SECRET=${{ secrets.REDDIT_CLIENT_SECRET }}
            USER_AGENT=${{ secrets.USER_AGENT }}
            POSTGRES_DB=${{ secrets.POSTGRES_DB }}
            POSTGRES_USER=${{ secrets.POSTGRES_USER }}
            POSTGRES_PASSWORD=${{ secrets.POSTGRES_PASSWORD }}
            POSTGRES_HOST=${{ secrets.POSTGRES_HOST }}
            POSTGRES_PORT=${{ secrets.POSTGRES_PORT }}
            OPENAI_API_KEY=${{ secrets.OPENAI_API_KEY }}
            OPENAI_ENCODER=cl100k_base
            OPENAI_EMBEDDING_MODEL=text-embedding-3-small
            OPENAI_EMBEDDING_MODEL_TOKEN_LIMIT=8191
            CHUNK_SIZE=1200
            CHUNK_OVERLAP=120
            EOL
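One caveat with the heredoc approach, independent of whether it's good practice: the `${{ secrets.* }}` expressions are expanded by GitHub before the script reaches the remote shell, and with an unquoted `EOL` delimiter the shell then performs its own expansion on the result, so a secret containing `$` gets silently mangled. Quoting the delimiter (`<< 'EOL'`) avoids this. A minimal demonstration:

```shell
# Unquoted delimiter: the shell expands $-sequences in the body,
# so the `$$` below becomes the shell's PID.
cat > bad.env << EOL
PASSWORD=pa$$word
EOL

# Quoted delimiter: the body is taken literally.
cat > good.env << 'EOL'
PASSWORD=pa$$word
EOL
```

Since GitHub's own `${{ }}` substitution happens before the shell ever runs, quoting the delimiter loses nothing.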

https://redd.it/1hz6xv7
@r_devops
Free solution to having Nx Cloud on our bare metal server?

Hi everyone. I wanted to ask if there is a free solution to having Nx Remote Cache locally on our bare metal server? Basically a free local version of Nx Cloud.

I have a project in my team that uses an Nx monorepo, and we have our own bare metal servers.

I want to be able to share the Nx cache with other team members, so it can speed up development and testing.

Is there any solution out there? Even if it doesn't have all the bells and whistles of the paid version?

https://redd.it/1hz8shb
@r_devops
OpenTofu 1.9 is here with foreach support for providers, here is how to use it.

Last week I decided to try out the provider for_each, and there was not really any clear documentation on how to use it. So after looking through the PRs and some other mentions to figure it out, I threw together a blog post with a complete example.


The TL;DR, so you don't even need to go to the blog is:



variable "aws_regions" {
  type = map(string)
  default = {
    "global" : "us-east-1",
    "backup" : "us-west-2"
  }
}

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~>5"
    }
  }
}

provider "aws" {
  for_each = var.aws_regions
  alias    = "by_region"

  region = each.value
}

data "aws_availability_zones" "this" {
  for_each = var.aws_regions
  provider = aws.by_region[each.key]
}

output "provider_regions" {
  value = { for k, v in data.aws_availability_zones.this : k => tolist(v.group_names)[0] }
}

data "aws_availability_zones" "global" {
  provider = aws.by_region["global"]
}

output "global_region" {
  value = tolist(data.aws_availability_zones.global.group_names)[0]
}




Follow on blog post showing some higher complexity use cases: https://dwood.dev/posts/opentofuproviderforeachcomplexexample/

https://redd.it/1hzch9r
@r_devops
Become a Data Engineer in 2025 (Based on 100 jobs data!)

Happy New Year, everyone! I'm reposting a combination of three of my most upvoted posts from last year for those looking to set ambitious career goals in 2025, assuming a lot of new people are looking for this info now. After all, there's no better time to plan your next big leap into Data Engineering!

**1. Top skills in demand -**

I analyzed 100 data engineering job descriptions from Fortune 500 companies to find the most frequently mentioned skills. Here are the top skills in demand:

|**Skill Group**|**Frequency**|**Constituents with Frequency**|
|:-|:-|:-|
|Programming Languages|196|SQL (85), Python (76), Scala (21), Java (14)|
|ETL and Data Pipeline|136|ETL (65), Pipeline (46), Integration (25)|
|Cloud Platforms|85|AWS (45), Azure (26), GCP (14)|
|Data Modeling and Warehousing|83|Data Modeling (40), Warehousing (22), Architecture (21)|
|Big Data Tools|67|Spark (40), Big Data Tools (19), Hadoop (8)|
|DevOps, Version Control and CI/CD|52|Git (14), CI/CD (13), Jenkins (7), Version Control (7), Terraform (6)|
|Data Quality and Governance|42|Data Quality (20), Data Governance (13), Data Validation (9)|
|Data Visualization|23|Data Visualization (11), Tableau (6), Power BI (6)|
|Collaboration and Communication|18|Communication (10), Collaboration (8)|
|API and Microservices|11|API (8), Microservices (3)|
|Machine Learning|10|Machine Learning (7), MLOps (2), AI/ML Model Development (1)|

**2. 4 Month Study Plan -**

**Month 1: Foundations**

* DBMS & SQL: Basics of database concepts, querying, and design.
* Python: Focus on Python essentials, including libraries like Pandas and NumPy.
* Linux: Basic commands and navigation.
* DSA: Data structures and algorithms, especially for big tech roles.

**Month 2: Key Concepts & Tools**

* Data Concepts: Topics such as Data Lake, Data Mart, Fabric, and Mesh.
* Data Governance: Management, security, and ethics in data.
* Spark: Introductory concepts with Apache Spark.
* Distributed Systems: Overview of Hadoop, Hive, and MPP systems.
* Cloud Services: Options such as AWS, GCP, or Azure.

**Month 3: Advanced Topics**

* Orchestration: Basics of workflow orchestration with tools like Apache Airflow.
* Compute: Databricks, Snowflake, or equivalents like AWS EMR.
* Containers: Introduction to Docker and Kubernetes.
* CI/CD: Tools such as Jenkins and SonarQube.
* Streaming: Fundamentals of Kafka.
* ETL/ELT: Tools like dbt and Talend, along with architecture basics.
* Terraform: Code-based infrastructure setup.

**Month 4: Projects & Portfolio**

* Build a project portfolio to showcase skills. Examples include:
* Bank Data Warehouse
* Fraud Detection ETL
* Reddit Review Tracker
* Retail Analytics
* Trip Data Transformation
* YouTube Clone

**3. Certifications**

Note - You don't have to do all of these: do 1 or 2 of AWS or Azure, 1 of Databricks or Snowflake, and 1 or 2 of the optional certifications based on your interests. Also, I have mentioned resources only for the ones I know; for the ones I haven't attempted, I've left the column empty. Please add suggestions in the comments.

|**Certification**|**Coverage**|**Cost (USD)**|**Resource**|
|:-|:-|:-|:-|
|AWS Certified Cloud Practitioner|Basics of AWS Cloud concepts, services, and support.|$100|Stephane Maarek's Udemy courses|
|**AWS Certified Solutions Architect – Associate**|Designing and deploying scalable systems on AWS.|$150|Stephane Maarek's Udemy courses|
|**AWS Certified Data Engineer – Associate**|Managing data pipelines, analytics, and ETL workflows on AWS.|$150|Stephane Maarek's Udemy courses, AWS Builder Labs|
|Microsoft Azure Data Fundamentals (DP-900)|Core data concepts and implementation using Azure.|$99|Eshant Garg/Scott Duffy Udemy courses, Coursera prep courses|
|**Microsoft Azure Data Engineer Associate (DP-203)**|Integrating and transforming data for analytics on Azure.|$165|Eshant Garg/Scott Duffy Udemy courses, Coursera prep courses|
|Databricks Lakehouse Fundamentals|Basics of Databricks Lakehouse architecture and workflows.|Free||
|**Databricks Certified Data