Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Flyway with Jenkins

Anybody here tried using this stack before? How was your experience? Does anyone have a use case I can use as a reference? Currently trying out Flyway to see if we can adapt it in our dev environment and whether we should get the subscription. Any insight is appreciated, thanks!
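For reference, a minimal way to wire Flyway into a Jenkins declarative pipeline is a stage that shells out to the Flyway CLI; this is a sketch, and the credentials ID, JDBC URL, and sql/ migrations folder are assumptions, not anything from the post:

```groovy
pipeline {
  agent any
  stages {
    stage('Migrate DB') {
      steps {
        // Flyway picks up versioned migrations (V1__init.sql, V2__..., ...)
        // from the configured locations and applies any it hasn't run yet.
        withCredentials([usernamePassword(credentialsId: 'db-creds',
            usernameVariable: 'DB_USER', passwordVariable: 'DB_PASS')]) {
          sh 'flyway -url=jdbc:postgresql://db:5432/app ' +
             '-user=$DB_USER -password=$DB_PASS ' +
             '-locations=filesystem:sql migrate'
        }
      }
    }
  }
}
```

Running `flyway info` with the same flags before and after the stage is a cheap way to see which migrations were applied.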

https://redd.it/1el21aa
@r_devops
Configure ec2 in Github Actions workflow via SSH or use Ansible?

Working on a GitHub Actions workflow, part of which deploys an AWS EC2 instance via Terraform. To configure the EC2 instance for a Node.js application, I could theoretically SSH in or remotely run commands on the instance from the workflow, but is there an advantage to running an Ansible playbook via the Actions workflow instead? One reason that may be in favor of Ansible: it increases the modularity of the pipeline, meaning I could more easily port it to another workflow or even another CI/CD platform (Jenkins, etc.), as the Ansible playbook is agnostic to the CI/CD platform on which it runs. Any other thoughts?
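A sketch of what the Ansible variant might look like as a workflow job; the `provision` job, its `ec2_ip` output, the playbook name, and the `SSH_PRIVATE_KEY` secret are all assumptions for illustration:

```yaml
  configure:
    needs: provision          # assumed Terraform job exposing an ec2_ip output
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Ansible playbook against the new instance
        run: |
          echo "${{ secrets.SSH_PRIVATE_KEY }}" > key.pem && chmod 600 key.pem
          ansible-playbook -i "${{ needs.provision.outputs.ec2_ip }}," \
            -u ec2-user --private-key key.pem nodejs-app.yml
        env:
          ANSIBLE_HOST_KEY_CHECKING: "false"
```

The trailing comma in `-i "IP,"` is Ansible's inline-inventory syntax, which avoids needing an inventory file for a single freshly provisioned host. Porting to Jenkins would then only mean re-wrapping the same `ansible-playbook` call.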

https://redd.it/1el1ryf
@r_devops
Careers after DevOps - experience or suggestions?

Awful economy and a stupidly wide range of roles within "DevOps Engineer" that are almost impossible to fulfill. So what are good exit careers after DevOps?

obviously development (if your programming skills are up to scratch)
what else?



https://redd.it/1elav9p
@r_devops
How OpenAI Scaled Kubernetes to 7,500 Nodes by Removing One Plugin

Hi everyone. I recently read an article about how OpenAI scaled Kubernetes to 7,500 nodes.

There was a lot of information in there but I thought the most important part was how they replaced Flannel with Azure CNI.

So I spent a lot of hours doing a bit more research into the specifics and here are my takeaways:

• Flannel is a Container Network Interface (CNI) plugin that handles pod-to-pod communication between nodes

• Flannel works well for smaller clusters, but it was not designed for thousands of nodes

• Flannel's performance got worse with the increased node count because of things like route table creation and traffic routing

• OpenAI already hosted its infrastructure on Azure and used the Azure Kubernetes Service (AKS)

• They switched from Flannel to Azure CNI, which is specifically designed for AKS

• Azure CNI is different from Flannel in several ways which made it a better solution for OpenAI

• The switch to Azure CNI ended up making pod-to-pod communication a lot faster

Okay, this is a super basic summary, but if you want a more detailed explanation with nice visuals, check out the full article.

https://redd.it/1eld525
@r_devops
What Python Frameworks do you use?

I was using the search feature and was surprised not to see a question raised about this. What frameworks should you learn as a DevOps engineer / what modules do you use? I know for a fact that everyone should learn to import csv, or even Flask / FastAPI.

What do you all use / think everyone should know how to use even on a basic level?
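For what it's worth, the stdlib csv module the post mentions covers a lot of day-to-day glue work with no dependencies at all; a minimal sketch (the billing-report data is made up):

```python
import csv
import io

# Parse a CSV report (e.g. exported from a cloud billing console)
# and total a numeric column -- no third-party packages required.
raw = """service,cost
ec2,120.50
s3,30.25
rds,80.00
"""

reader = csv.DictReader(io.StringIO(raw))
total = sum(float(row["cost"]) for row in reader)
print(f"total monthly cost: {total:.2f}")  # -> total monthly cost: 230.75
```

The same pattern works on real files by swapping `io.StringIO(raw)` for `open("report.csv")`.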

https://redd.it/1elgr21
@r_devops
Pull request branch auto-pull on target branch update

I haven't done much DevOps in my life and need some advice on an issue I am facing. I didn't find anything close to what I needed; either I missed it or didn't know how to phrase my question.

In my team, we tend to have 15-20+ open pull requests at a time, and it's quite bothersome: when one gets merged, the TL refuses to review anything else until the other PRs are up to date.

As you can imagine it gets annoying, and because the issue couldn't be solved by having them review the PRs anyway, even if they're a couple of commits behind, I thought I would solve it technically.


Here is what I could stitch together as a CI/CD step:



update_branches:
  stage: update-branches
  script:
    - git fetch --all
    - TARGET_BRANCH=$(git branch --contains $CI_COMMIT_SHA | sed -n 's/^\* //p')
    - |
      for branch in $(git branch -r | grep -v '\->' | grep -v "$TARGET_BRANCH" | sed 's/ *origin\///'); do
        git checkout $branch
        if git merge origin/$TARGET_BRANCH; then
          git push origin $branch
        else
          echo "Merge conflict in $branch. Resolve conflicts manually."
        fi
      done



I would love any advice. Please tell me if this is bad practice and how I could approach it another way, what other options I have, etc.

https://redd.it/1eligfi
@r_devops
Blue/Green on Internal Service Microservice

Hi all, for those of you running a microservices environment who are able to perform blue/green deployments on an individual-microservice basis - how exactly are you achieving this when performing blue/green on an API service that is consumed only by another microservice (and does not have a front end)?

Suppose the following traffic flow in AWS.

Client desktop browser -> ALB -> microservice_1 -> ALB -> microservice_2 -> ALB -> microservice_3

Suppose I wanted to perform blue/green on microservice_2. I create another target group (blue) for microservice_2 and keep traffic pointing to green. I now have the ability to directly hit microservice-2-blue from some other machine and run a suite of smoke tests. That said, I'd also want to validate the end-to-end flow from the client desktop to microservice_3, using microservice_2 blue.

I would imagine this would require some mechanism like an HTTP cookie (use_microservice_2_dark) that each of the intermediary services would have to pass through at each hop, but I might be overthinking it.
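The pass-through idea can be sketched as a small helper each intermediate service applies to its outbound calls; the header name here is hypothetical, not an AWS feature, and ALB listener rules would then route on it to the blue target group:

```python
# Hypothetical routing flag that selects the dark/blue deployment.
ROUTING_HEADERS = ("x-use-microservice-2-dark",)

def propagate_routing_headers(incoming: dict, outgoing: dict) -> dict:
    """Copy recognized routing headers from the incoming request onto an
    outbound call's headers, so the flag survives every hop in the chain."""
    for name in ROUTING_HEADERS:
        if name in incoming:
            outgoing[name] = incoming[name]
    return outgoing

# microservice_1 forwarding a request to microservice_2:
inbound = {"x-use-microservice-2-dark": "true", "accept": "application/json"}
outbound = propagate_routing_headers(inbound, {"content-type": "application/json"})
print(outbound)
```

Only the routing flag is copied; other inbound headers (like `accept` above) stay out of the outbound call unless you choose to forward them too.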

Has anyone come across this particular pattern before?

Thanks!





https://redd.it/1elke7d
@r_devops
TechWorld with Nana DevOps Bootcamp vs KodeKloud Bootcamp

Hey everyone!

I know this has been asked in the past before, but I wanted to know if anyone has had any recent experience with taking the DevOps Bootcamp from TechWorld with Nana, or doing the DevOps / SRE learning path from KodeKloud?

I’m fortunate to have a learning budget at my company, so I’m not necessarily looking for the cheapest option; I want the best fit in terms of learning material and practical experience. If anyone has other options or recommendations, I’m happy to hear those as well!

https://redd.it/1elm09m
@r_devops
Missed call from AWS HR

I attended a 5-round technical interview with AWS recently. I got a call from HR today to tell me the results of the interview, but I missed it.

If anybody in this subreddit has experience with Amazon HR, please let me know what you think might happen.

I have been driving myself crazy thinking about the possibilities.

1. Either they reject me, but this call is a courtesy where they go in detail as to how much I suck

2. Or they tell me I have cleared the technical rounds and now have to go through HR round

3. Or they just tell me I have cleared and wanna work out the logistics.

What do you guys think?

https://redd.it/1elmjsf
@r_devops
Best side hustle/side job for a DevOps engineer?

Hey everyone,

I'm a DevOps engineer with about 3.5 years of experience working at a Fortune 500 company in the US. I mostly deal with Infrastructure as Code, pipelines, GitHub Actions, and some Python scripting—basically a mix of sys admin and coding/automation.

I have a decent salary and a great work-life balance, which gives me some extra time to explore side hustles. Earlier this year, I started teaching an online computer science class. It brings in an extra $1000 a month and takes about 9 hours a week, mostly grading assignments and helping students.

I'm looking for more ways to make some extra cash on the side without committing to another full-time job. Ideally, something that only takes a few hours a week and uses my cloud engineering, programming, or DevOps skills. I also get the occasional consulting gig through AlphaSights, but that's rare.

Any suggestions for side gigs or income streams that fit this criteria? I’d love to hear your ideas or experiences. Thanks!

https://redd.it/1elp1pw
@r_devops
DB access and all night pings

My devops team is based in the US but about half of our engineers are in Serbia and India. We currently have no plans to add devops headcount at our international sites. As a result overnight pages are extremely common and on call is pretty brutal for us right now. The WORST part is it’s usually minor issues that the dev could fix on their own, but they don’t have access to our prod DBs, etc. so they can’t do anything until we come online.

I’m looking into ways to give them self-serve access to specific tables outside of normal working hours (access needs to be auditable, and table-level scoping is a must due to compliance requirements). My wife, who wakes up every time I get paged, will be extremely grateful for any recs.

https://redd.it/1elp829
@r_devops
Adding subfolders in Artifactory Repository Tree while deploying

I am trying to add a subfolder to a repository tree and cannot find a way to do it. I’ve tried appending to the path name before adding the file I want to deploy but nothing seems to help.

https://redd.it/1elo3sc
@r_devops
How do you describe the prod ways to deploy to production?

I wanna learn paths to deploy apps to production, and also have the develop/staging environments too.




I’m trying to do this one, for example:



A GitHub project with a Dockerfile; a GitHub workflow builds that Dockerfile and pushes it to GHCR, and later that build is picked up by Railway and deployed.



I can make it work fine with the environment vars, but the secrets are giving me a hard time. I think if someone gets the Docker image they can in theory see the secrets, so they would no longer be secrets, right? Should I copy/create the secrets folder in the Docker image during the build process (Docker or GitHub workflow)?



/run/secrets/api_key

/run/secrets/password
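One option that avoids baking secrets into image layers at all is BuildKit's secret mount: the secret is available only during the RUN step it is mounted into and is never written to a layer, so pulling the image reveals nothing. At runtime you would then inject the real values via Railway's variables rather than the image. A sketch, reusing the `db_password` id from the workflow's `secrets:` block (the `npm run build` step is an assumption):

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20
WORKDIR /app
COPY . .
RUN npm install
# The secret is mounted at /run/secrets/db_password only for this RUN;
# it does not persist in any image layer.
RUN --mount=type=secret,id=db_password \
    DB_PASSWORD=$(cat /run/secrets/db_password) npm run build
```

Note this covers build-time use only; for secrets the server needs while running, environment variables injected by the platform are the usual answer.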





> node index.js

Aug 06 15:13:47  Server is running on port https://localhost:6684/
Aug 06 15:13:47  Environment Variables:
Aug 06 15:13:47  APP_VERSION: 1.0.1
Aug 06 15:13:47  BUILD_ENV: development
Aug 06 15:13:47  NODE_ENV: production
Aug 06 15:13:47  PORT: 6684
Aug 06 15:13:47  Error: ENOENT: no such file or directory, open '/run/secrets/db_password'



name: CI/CD

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm install

      # - name: Build the app
      #   run: npm run build

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v1
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v2
        with:
          context: .
          file: ./Dockerfile
          platforms: linux/amd64
          push: true
          tags: ghcr.io/${{ github.repository_owner }}/simple-web-server:latest
          build-args: |
            APP_VERSION=${{ env.APP_VERSION }}
            BUILD_ENV=${{ env.BUILD_ENV }}
          secrets: |
            db_password=${{ secrets.DB_PASSWORD }}
            api_key=${{ secrets.API_KEY }}



# Stage 1: Build the application
FROM node:20 AS builder

# Set build-time arguments
ARG APP_VERSION
ARG BUILD_ENV

# Log build-time arguments
RUN echo "Building with APP_VERSION=${APP_VERSION} and BUILD_ENV=${BUILD_ENV}"

# Set environment variables
ENV APP_VERSION=${APP_VERSION}
ENV BUILD_ENV=${BUILD_ENV}

# Create and change to the app directory
WORKDIR /app

# Copy the application code to the container
COPY package*.json ./
COPY . .

# Install dependencies
RUN npm install

# Stage 2: Run the application
FROM node:20

# Re-declare the build args so their values are visible in this stage
# (ARGs declared in stage 1 do not carry over to later stages)
ARG APP_VERSION
ARG BUILD_ENV

# Set environment variables
ENV NODE_ENV=production
ENV PORT=3000
ENV APP_VERSION=${APP_VERSION}
ENV BUILD_ENV=${BUILD_ENV}

# Create and change to the app directory
WORKDIR /app

# Copy the build artifacts from the builder stage
COPY --from=builder /app ./

# Log environment variables
RUN echo "Running with APP_VERSION=${APP_VERSION}, BUILD_ENV=${BUILD_ENV}, NODE_ENV=${NODE_ENV}, and PORT=${PORT}"

# Expose the application port
EXPOSE 3000

# Start the application
CMD ["npm", "start"]




This is the one I use for local; it works fine because I'm copying in the secrets:


Dockerfile.local

# Stage 1: Build the application
FROM node:20 AS builder

# Set build-time arguments
ARG APP_VERSION
ARG BUILD_ENV

# Log build-time arguments
RUN echo "Building with APP_VERSION=${APP_VERSION} and BUILD_ENV=${BUILD_ENV}"

# Set environment variables
ENV APP_VERSION=${APP_VERSION}
ENV BUILD_ENV=${BUILD_ENV}

# Create and change to the app directory
WORKDIR /app

# Copy the application code to the container
COPY package*.json ./
COPY . .

# Install dependencies
RUN npm install

# Stage 2: Run the application
FROM node:20

# Re-declare the build args so their values are visible in this stage
# (ARGs declared in stage 1 do not carry over to later stages)
ARG APP_VERSION
ARG BUILD_ENV

# Set environment variables
ENV NODE_ENV=production
ENV PORT=3000
ENV APP_VERSION=${APP_VERSION}
ENV BUILD_ENV=${BUILD_ENV}

# Create and change to the app directory
WORKDIR /app

# Copy the build artifacts from the builder stage
COPY --from=builder /app ./

# Log environment variables
RUN echo "Running with APP_VERSION=${APP_VERSION}, BUILD_ENV=${BUILD_ENV}, NODE_ENV=${NODE_ENV}, and PORT=${PORT}"

# Create Environment Variables in GitHub Actions:
# Go to your GitHub repository.
#
# Click on Settings > Secrets and Variables > Actions.
#
# Add the following variables:
#
# APP_VERSION
# BUILD_ENV
#
# Add the following secrets:
#
# DB_PASSWORD
# API_KEY

# Copy secrets for local testing
COPY secrets/db_password /run/secrets/db_password
COPY secrets/api_key /run/secrets/api_key

# Expose the application port
EXPOSE 3000

# Start the application
CMD ["npm", "start"]






https://redd.it/1elt24o
@r_devops
What tools are there to manage autoscaling for kafka?

I'm familiar with Cruise Control, but I wonder what options are out there and which are the most popular? Are they fully automatic, or do they require some level of continuous manual work?
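Whatever tool you land on, the core loop is usually the same: compare consumer lag (or disk/network load) against a target and emit a scale decision. A toy sketch of that decision logic, with made-up thresholds (real autoscalers like Cruise Control or KEDA's Kafka scaler add smoothing, cooldowns, and partition-count ceilings on top):

```python
def scale_decision(total_lag: int, current_instances: int,
                   lag_per_instance_target: int = 10_000,
                   max_instances: int = 20) -> int:
    """Return the desired consumer-group size for a given total lag:
    enough instances that each one handles at most the target lag,
    clamped between 1 and max_instances."""
    desired = max(1, -(-total_lag // lag_per_instance_target))  # ceiling division
    return min(desired, max_instances)

print(scale_decision(0, 3))       # no lag -> scale in to the floor of 1
print(scale_decision(45_000, 3))  # 45k lag / 10k target -> 5 instances
```

The "fully automatic vs. manual" question mostly comes down to whether a human reviews decisions like this before they are executed.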

https://redd.it/1eluf1q
@r_devops
Thinking about getting an extra job

I currently work for a company, and lately, the demand has been quite low to the point where I'm convinced I can handle another job.

However, I’m still not sure about the approach I should take when applying for another position. I’ve done some interviews where I mentioned that I was already working and wanted a second job, but that didn't go very well, haha.

My contract doesn’t have an exclusivity clause, but I wouldn't want them to know that I work somewhere else. I know some companies do a reference check and might end up contacting my current employer.

Any tips on how to proceed? Should I lie about being employed? Tell the truth?

https://redd.it/1elv30q
@r_devops
Challenges with CI/CD permissions management in OSS project: GitHub action.

Hi all :)

We have an OSS project sitting under OSS organization and I encountered a challenge with our CI/CD workflows and hope to get some insights.

The project is in GitHub, and it is a multilingual client library for ValKey/Redis OSS.

We are a team working for one of the big cloud companies, mainly dedicated to this project (not owned by the company, fully open source).


Most of the workflows are simple ones that can run on a regular GitHub-hosted machine.
But some of our CI tests involve interacting with our company's service, in order to test massive cases and verify that the project also works when the server is the cloud-hosted version.
The issue is that in order to interact with the service safely, we need to hold the keys in the repo secrets, and those are available only from the main repo.


The maintainers don't work on the main repo but on their forks, opening PRs from forks to the main repo - so their PRs don't have access to the secrets and CI cannot run all the tests.
It is an OSS project, so we have to find a way to keep the secrets safe while still making them available to CI triggered from a maintainer's fork, after approval from one of the organization (Valkey) members.
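One common pattern for exactly this (a sketch; the environment name, test script, and secret name are assumptions) is to run the secret-using job on `pull_request_target`, which executes in the base repo's context where secrets resolve, and gate it behind a GitHub environment with required reviewers so a maintainer approves each fork run before any secret is released:

```yaml
on: pull_request_target   # base-repo context, so repo/environment secrets resolve

jobs:
  cloud-integration-tests:
    # An environment protection rule (required reviewers) pauses the run
    # until a maintainer approves it.
    environment: cloud-testing
    runs-on: ubuntu-latest
    steps:
      # Check out the PR head explicitly -- review the diff before approving,
      # since this code runs with access to the secrets below.
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - run: ./run-cloud-tests.sh
        env:
          SERVICE_API_KEY: ${{ secrets.SERVICE_API_KEY }}
```

The explicit-checkout step is the dangerous part of `pull_request_target`, which is why the human approval gate matters.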

Any ideas, offers, or insights?

Maybe somebody even wants to join the community and help us with our DevOps challenges? :P

https://redd.it/1elwmy3
@r_devops
Cypher for Kubernetes API: An expressive new way to work with k8s

Hey everybody 👋
I created this tool six months ago and it's been a daily driver for me since.
It lets me use a syntax similar to Cypher (Neo4j's query language, which I adore) to perform CRUD operations on K8s.

My main use for this is examining resources; crafting custom JSON payloads with data from multiple resource kinds is a breeze.

This is an alpha release, and while Cyphernetes has been in real-world use by me and a handful of other folks, test thoroughly before performing create/update/delete operations in production.

https://redd.it/1elvb3v
@r_devops
Ideas for a local development CICD pipeline

TL;DR : I have some ideas how I want to handle a local CICD pipeline, in order to increase developers speed in coding and testing.

What problem I want to solve: developers want to test their changes locally and as fast as possible, but the application they are working on has dependencies, and their resources are limited (ram, cpu...). Also, they don't want to wait for the remote CICD pipeline to work its magic and finally tell them it failed.

What solution I want to build : We create an application dependencies graph, as a JSON object. Our local CICD script will read this graph and build everything the application needs. Note, we shouldn't have to rebuild everything each time, only the base application.

How I see it working :

// Application building //

1. Pull the dev configurations from an outside source or .env (if you pull from outside, the credentials to connect are also in the .env)
2. Build the application image for docker. (image will not be pushed and deleted every rebuild)
3. Launch the container (your application needs to be able to wait on its dependencies)
4. We should be able to launch the containers locally, on a local VM, on a remote machine (via SSH?), or on a "dev cloud"

// Application dependencies building //

1. Resolve dependencies, pull the necessary code or build steps for all of them, creating subfolders (don't forget to .gitignore them).
2. Pull the configuration for each dependency, like before.
3. Build images if necessary (also not pushed to repo)
4. Launch the container
5. We can do extra steps, like include a dataset in the dependencies for later testing.

// Recursive dependencies building //

1. Repeat "Application dependencies building" recursively; each dependency needs to have its own dependencies graph.
2. Allow the developer to decide the depth of dependencies to resolve
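The recursive step with the developer-chosen depth might look like this; a toy sketch over an in-memory graph, where a real version would read each dependency's graph JSON from its subfolder and run the Docker builds:

```python
def resolve(graph: dict, app: str, max_depth: int,
            _depth: int = 0, _seen=None) -> list:
    """Walk the dependency graph depth-first, returning dependencies in
    build order (dependencies before dependents), honoring the requested
    depth and skipping anything already resolved."""
    if _seen is None:
        _seen = set()
    order = []
    if _depth >= max_depth:
        return order
    for dep in graph.get(app, []):
        if dep in _seen:
            continue
        _seen.add(dep)
        order += resolve(graph, dep, max_depth, _depth + 1, _seen)
        order.append(dep)
    return order

graph = {"app": ["db", "cache"], "db": ["config-svc"], "cache": []}
print(resolve(graph, "app", max_depth=2))  # ['config-svc', 'db', 'cache']
print(resolve(graph, "app", max_depth=1))  # ['db', 'cache']
```

The `_seen` set doubles as the "don't rebuild what's already built" rule, and `max_depth` is the developer's depth knob from step 2.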

// Testing //

1. Now the developer can launch any automatic or manual tests, linters he wants...
2. All the tests remain optional before pushing (no pre-commit / pre-push)
3. The tests available to the developer should be the same as in the remote CI/CD pipeline, so he can be confident he is pushing correct code (if he launched the tests...)
4. "Nuke all and push to Git" button.

My question: does something like this already exist? This whole pipeline needs to be its own tool. I could probably do all of this with bash, git, docker, and vagrant.

Note: I do all of this for fun in my free time; at my company we do things very differently.

https://redd.it/1elzf5l
@r_devops
5YOE aws engineer, any good Azure crash courses?

Hello, I have been working in cloud/DevOps for 5 years, primarily with AWS. I got laid off and have a third-round interview with a place that is mostly an Azure shop. I understand cloud computing well; I just need Azure-specific info, and have about a week to study it. Are there any recommended courses that would help for an interview?

I did the AZ-104 cert 2 years ago but don't remember anything.

https://redd.it/1em5llh
@r_devops
Do you have a strategy for dealing with 100s of alerts/rules?

Started a new job recently and their alerting seems a bit of a mess. We have default alerts enabled in tools like Datadog and Lacework, monitoring a few dozen AWS and GCP accounts.



Hoping for some help/advice on how you guys have approached the high-level strategy around alerting. I think it will start with an audit of what rules are enabled and where (there seems to be some overlap).


Maybe categorising alerts at a high level and churning through them to assess whether each is useful?
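The audit step can be partly mechanical: dump every rule from each tool's API into one list and group by what it actually watches, so cross-tool overlaps surface immediately. A toy sketch (the field names and sample rules are made up):

```python
from collections import defaultdict

# One row per alert rule, as exported from each tool's API.
rules = [
    {"tool": "datadog",  "metric": "aws.ec2.cpu",  "scope": "prod", "severity": "high"},
    {"tool": "lacework", "metric": "aws.ec2.cpu",  "scope": "prod", "severity": "med"},
    {"tool": "datadog",  "metric": "gcp.gce.disk", "scope": "prod", "severity": "low"},
]

# Group by (metric, scope): any group with rules from 2+ tools is an overlap.
groups = defaultdict(list)
for r in rules:
    groups[(r["metric"], r["scope"])].append(r["tool"])

overlaps = {k: v for k, v in groups.items() if len(set(v)) > 1}
print(overlaps)  # {('aws.ec2.cpu', 'prod'): ['datadog', 'lacework']}
```

From there, categorising by severity and churning through one group at a time is much less daunting than eyeballing hundreds of rules in two UIs.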

https://redd.it/1em6kf3
@r_devops