Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
How can I reduce the oncall burden?

Hey everyone,

I'm looking for some advice on how to make on-call duties a bit more bearable. I end up being on call every month for a full week (24/7), and those nighttime pages are killing me!

Would love to hear about how you all manage the on-call burden:

Metrics: What do you track to keep on-call healthy and manageable?
Reducing Burden: Any processes or strategies that work well for you?
Tools: What tools help you monitor and improve your on-call setup?
Team Structure: Does each team handle on-call, or do you use a NOC and have escalation policies?

Thanks a bunch!

https://redd.it/1eil9up
@r_devops
What metrics do you use to track your success and influence promotions and pay?

How do you track them? Do you monitor them manually, or use in-house or OSS tools?

For example, I keep an eye on the cost savings I produce over a time period for the services I manage. When my self-performance review comes up, I use this metric to quantify how well I've kept costs down, but the process could use improvement.

https://redd.it/1eilq5z
@r_devops
What do you recommend for integrated logs navigation?

We have a small microservices architecture (5-6 services) running on AWS (mostly Lambda, EC2, and S3). We lean on Sentry, CloudWatch, and FullStory for observability.

I'd really like to be able to aggregate, track, visualize and navigate all of these in a single place for both performance and debugging, with big picture and granularity. Before embarking on an in-house solution, is there a platform you recommend? If in-house, do you have approaches that work for you?

https://redd.it/1eilet5
@r_devops
Windows runners

Our CI is built in bash; until now, most of our jobs have run on Linux runners.

Recently we've developed a huge need for Windows runners as well. We don't have the capacity to rewrite everything in PowerShell, the codebase is huge, and we're not sure of the alternatives.

Has anyone else had this problem? Can you point me in the right direction?
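
If these are GitHub Actions runners (an assumption on my part; the post doesn't say), one direction worth checking before any rewrite: GitHub's hosted Windows images ship with Git Bash, so existing bash jobs can often run unchanged by forcing the shell. A minimal sketch, with a hypothetical script path:

```yaml
jobs:
  build:
    runs-on: windows-latest
    defaults:
      run:
        shell: bash          # use Git Bash instead of the default PowerShell
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/build.sh   # hypothetical existing bash script, unchanged
```

The main caveats are path separators, case sensitivity, and any Linux-only tools the scripts call out to.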

https://redd.it/1eiml9d
@r_devops
How do you layout your resume?

I've been in DevOps for 2 years and been part of layoffs for the second year in a row. I've applied to over 900 jobs, tried resume prep services, and asked friends, but I've been trying to find a DevOps job for the past few months with no luck.

I came from help desk and running help desks and transitioned to DevOps at the end of the stay-at-home phase for the pandemic.

So any tips would be great. I have a year in AWS and one in Azure, and have worked at a startup and at an MSP. I genuinely enjoy cloud DevOps and would like to not go back to Help Desk support (even in a higher-tier or managerial role), but that is seeming less possible every day.

https://redd.it/1eiqbnk
@r_devops
Observability Meetup in San Francisco

Hi r/devops :-)

I'm hosting an Observability meetup in San Francisco on August 8th, so if you're in the area and want free pizza, beer, and to listen to some cool talks on Observability, stop by!

We'll have speakers from Checkly (Monitoring as code), the co-creator of Hamilton (https://www.tryhamilton.dev/) and Burr (https://github.com/DAGWorks-Inc/burr), and the CEO/Founder of Delta Stream (who is also the creator of ksqlDB).


Should be a solid time :-)

https://redd.it/1eilaii
@r_devops
deleting bin log on primary database


Hey all,

We have a bit of a problem: our current primary MariaDB database server is at 99% disk usage.

We have quite a lot of binary logs, with replication configured to a secondary. My question: would purging all binary logs on the primary, apart from the one the secondary is currently reading, cause any issues with the integrity of the data on the primary?
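
For what it's worth, purging binary logs does not touch the primary's data files at all; the only risk is to replication, if a log the secondary still needs gets removed. A hedged sketch of the usual pattern (the binlog file name below is a placeholder; read the real one from the replica first):

```sql
-- On the secondary: note Relay_Master_Log_File (the binlog the SQL
-- thread is still executing)
SHOW SLAVE STATUS\G

-- On the primary: purge everything strictly before that file
PURGE BINARY LOGS TO 'mysql-bin.000123';   -- hypothetical file name

-- Or purge by age, and cap retention going forward
PURGE BINARY LOGS BEFORE NOW() - INTERVAL 3 DAY;
SET GLOBAL expire_logs_days = 3;
```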

Thanks for any advice

https://redd.it/1eikmft
@r_devops
Proxmoxgk: a shell tool for deploying LXC/QEMU guests, with Cloud-init

Good evening everyone, I've just released a small command-line utility for Proxmox v7/v8 that automates the provisioning and deployment of your containers and virtual machines with Cloud-init.

**Key features:**


* Unified configuration of LXC and QEMU/KVM guests via Cloud-init.
* Flexible guest deployment:
  * in single or serial mode
  * fully automated, or with your own presets
* Fast, personalized provisioning of your Proxmox templates

[Presentation on the Proxmox forum](https://forum.proxmox.com/threads/proxmox-automator-for-deploy-lxc-and-qemu-guests-with-cloud-init.152183/)

[GitHub](https://github.com/asdeed/proxmoxgk)

https://redd.it/1eiwlvl
@r_devops
Deploying to cloud (beginner)

Hey 👋🏻...

I am building a project with scripts that scrape prices from websites, and I also want to learn how to deploy it to the cloud.

So I have one beginner question:
I have two scripts, one in Node (Puppeteer) and the other in Python (Selenium). (I am learning and trying both languages.)

How can I deploy these two scripts to run in the cloud automatically, one after the other, daily? And how do I check for errors, completion, etc., so I can have some logic to retry if they fail?

Do I need some central component to coordinate the tasks?
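
At the simplest end, the "central component" can just be a wrapper script run daily by cron (or any cloud scheduler) that runs both scrapers in order and retries each on failure. A minimal sketch; the scraper file names and cron path are hypothetical placeholders:

```shell
#!/usr/bin/env bash
# daily.sh: run both scrapers sequentially, retrying each a few times.
set -u

run_with_retry() {
    local attempts=3 delay="${RETRY_DELAY:-60}" i
    for ((i = 1; i <= attempts; i++)); do
        "$@" && return 0                        # success: stop retrying
        echo "attempt $i/$attempts failed: $*" >&2
        sleep "$delay"                          # wait before retrying
    done
    return 1                                    # all attempts failed
}

# A crontab entry such as `0 6 * * * /opt/scrapers/daily.sh` would invoke:
# run_with_retry node scrape_puppeteer.js
# run_with_retry python3 scrape_selenium.py
```

Once this outgrows a single VM, the same shape maps onto managed schedulers (e.g. scheduled container jobs or a workflow orchestrator), but the retry/ordering logic stays the same.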

Thank you

https://redd.it/1ej2ga8
@r_devops
Terraform came in clutch

I am currently working on a project that I always had to deploy manually to an Azure VM. It wasn't cumbersome, but as a programmer, repeated boring tasks are definitely not my cup of tea, so I had to find some sort of automation for this. My practice with GitHub Actions wasn't enough, so what did I do? I tried every solution that came to mind: I wrote bash scripts and set up listeners (via a PowerShell script) to check whether a new zip file had been uploaded and then run the bash script that should handle the install. The problem was that Python ran the bash script in a subprocess, which invited some really annoying bugs. Hell, I even tried writing a listener in C to detect a new file upload and run the installation bash script, but that couldn't do the trick either.

I almost gave up on this pure automation until today. I had wanted to learn Terraform for a while because I never really understood its use case or its true power, but today I finally took the courage to start reading the docs, and I understood what Terraform really is. After an hour of playing around I thought, let's just use Terraform for my task. To my surprise, after just 30 minutes of small adjustments, I was finally able to build a locally hosted CI/CD pipeline using Terraform that deploys the code to the VM.


I understand that this solution may or may not be the standard or ideal way, but it was definitely worth the effort. Any thoughts on this implementation?

https://redd.it/1ej44cy
@r_devops
NGINX Configuration Help: URL Cleanup Before Redirect

Hi everyone,

I'm working on cleaning up URLs in my NGINX configuration before redirecting them. Specifically, I want to replace all instances of %2F with / in the URL. I'm using a rewrite rule to achieve this, but I'm running into some issues. Here's the configuration I'm working with:

server {
    listen 80;
    server_name cleaner.home.localhost;

    root /usr/share/nginx/html;

    location / {
        # Do not apply rewrite if it's already been redirected
        if ($request_uri ~* "%2F") {
            rewrite ^(.*)%2F(.*)$ $1/$2 last;
        }

        return 301 https://localhost$request_uri;
    }
}


Here are the problems I'm encountering:
1. When using the last argument with rewrite, I get a 404 error. I suspect this is due to an infinite loop, which triggers NGINX's fail-safe mechanism.
2. If I remove the last argument, the redirect works, but the rewrite rule doesn't seem to be applied at all. It looks like $request_uri is not affected by the rewrite.

My questions are:
1. How can I ensure that the rewrite rule is applied correctly and the %2F sequences are replaced with /?
2. Is there a better way to implement this URL cleanup?
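
Two nginx behaviors may explain what you're seeing (worth verifying against the rewrite module docs): `$request_uri` holds the raw request line and is never modified by `rewrite`, which operates on `$uri`; and `rewrite ... last` restarts location matching, re-entering `location /` and looping. Since nginx percent-decodes the URI into `$uri` during normalization, `%2F` has typically already become `/` there, so a sketch that may be enough is to redirect with `$uri` instead:

```nginx
server {
    listen 80;
    server_name cleaner.home.localhost;

    location / {
        # $uri is the normalized, percent-decoded URI ($request_uri keeps
        # the raw %2F); $is_args$args re-appends any query string.
        return 301 https://localhost$uri$is_args$args;
    }
}
```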

Thanks in advance for any help!

https://redd.it/1ej5xsl
@r_devops
Open Source Platform Orchestrator Kusion v0.12.1 is Out!

What has changed?

* Storage backend enhancements, including support for `path-style` endpoints for AWS S3, and a new `kusion release unlock` command for better release management.
* Optimized display of sensitive information to reduce the risk of leakage.
* Support for importing existing cloud resources and skipping their deletion during `kusion destroy`.
* The workspace `context` now supports declaring Kubernetes cluster configs and Terraform provider credentials.
* Support for using the `Spec` file as the input for the `kusion preview` and `kusion apply` commands.

More info can be found in our [Medium blog post](https://medium.com/@kusionstack/kusion-v0-12-1-release-improve-comprehensive-capabilities-and-optimize-user-experience-0375075d8fde).

Please check out the new release at: [https://github.com/KusionStack/kusion/releases/tag/v0.12.1](https://github.com/KusionStack/kusion/releases/tag/v0.12.1)

Your feedback and suggestions are welcome!

https://redd.it/1ej7cdf
@r_devops
How would you suggest provisioning an external WAF such as Azure/AWS/CF when TLS termination happens in the ingress controller and cert-manager is responsible for producing and renewing all of the certificates?

Currently I am running multiple clusters across multiple clouds. I am using ingress-nginx together with external-dns, cert-manager, and Argo CD to automatically deploy, generate certificates, register DNS names, and route traffic.

Now I want to add an external WAF, so nothing like mod-security or Calico's but instead Azure Application Gateway, AWS WAF, Cloudflare, etc. Ideally it should be one WAF to rule them all, but I can also live with WAF per provider for now.

Long story short: since I use cert-manager to generate and renew my certificates and nginx does the TLS termination, I am not sure how I should pass these certificates to the external WAF so it can terminate TLS and re-encrypt afterwards. I don't want to reinvent the wheel by generating more certificates from outside the cluster.
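
One possible bridge (a sketch, assuming the WAF's API or Terraform provider can accept an uploaded certificate): cert-manager already materializes every certificate as a Kubernetes TLS Secret, so a sync job running inside the cluster can read it and push it out on renewal. Secret and namespace names here are hypothetical:

```
# Extract the cert/key pair that cert-manager maintains
kubectl get secret my-cert-tls -n ingress \
  -o jsonpath='{.data.tls\.crt}' | base64 -d > tls.crt
kubectl get secret my-cert-tls -n ingress \
  -o jsonpath='{.data.tls\.key}' | base64 -d > tls.key
# ...then upload tls.crt/tls.key to the WAF via its CLI, API, or
# a Terraform run triggered from the cluster
```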

Basically I want everything to be managed through my cluster, including the setup of the WAF.

I haven't seen much information on this subject, so I tend to assume that I am thinking about it wrongly or am missing something in the process.

One thing I want to note: I did think about using some sort of management cluster to provision part of these things in advance, but ideally I'd rather not go there if possible.

I'd appreciate your opinions, ideas, and help. Thank you.

https://redd.it/1ej7uz5
@r_devops
github actions doesn't seem to be using docker cache

I have a Dockerfile for a Python app and a GitHub Actions workflow file, and I am not sure why the cache is not being used. Every time, it re-does all the `apt-get` related steps. Any idea what could be going on?

FROM python:3.11-bookworm AS builder
WORKDIR /app

ENV POETRY_VERSION=1.8.3
ENV POETRY_NO_INTERACTION=1
ENV POETRY_VIRTUALENVS_IN_PROJECT=1
ENV POETRY_VIRTUALENVS_CREATE=1
ENV POETRY_CACHE_DIR=/tmp/poetry_cache

RUN pip install --no-cache-dir poetry==$POETRY_VERSION;

COPY pyproject.toml poetry.lock ./
RUN --mount=type=cache,target=$POETRY_CACHE_DIR poetry install

# FROM python:3.11-slim-buster AS runtime
# NOTE: issue is these nvidia dependencies are more widely available in ubuntu-X
# packages. Eg. sometimes arm images packages missing etc. So to avoid
# extra headache, we use ubuntu+cuda images
#
# Otherwise we need to do:
# RUN wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
# RUN dpkg -i cuda-keyring_1.1-1_all.deb
# RUN apt-get update && apt upgrade -y
# RUN apt-get install -y --no-install-recommends <pkg_name>
#
# all of which gets very os and arch specific which cause another issue
# because nvidia refers to amd64 as x86_64 in the URL and the docker
# buildx auto env var sets it as amd64 etc. can of worms.
FROM nvidia/cuda:12.0.0-cudnn8-runtime-ubuntu22.04 AS runtime
WORKDIR /app

ENV TZ=Etc/UTC
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get install -y --no-install-recommends \
    libpq5 \
    tesseract-ocr \
    libtesseract-dev \
    software-properties-common

RUN DEBIAN_FRONTEND=noninteractive add-apt-repository ppa:deadsnakes/ppa && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
    python3.11 && \
    rm -rf /var/lib/apt/lists/*

ENV VIRTUAL_ENV=/app/.venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
ENV PYTHONUNBUFFERED=1
# NOTE: Usually manually setting PYTHONPATH is not needed, but with the
# cuda-ubuntu image, something gets messed up and we need to set this
# manually
ENV PYTHONPATH="$VIRTUAL_ENV/lib/python3.11/site-packages"

COPY --from=builder ${VIRTUAL_ENV} ${VIRTUAL_ENV}
COPY src src

ENTRYPOINT ["python3.11", "-m", "src.abc.cli"]


And I have a github actions workflow for building:

name: "build:push:deploy:abc"
on:
  push:
    branches: [main]

# in case of back-to-back deploy, we cancel older deploy
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

permissions:
  id-token: write # This is required for requesting the JWT
  contents: read  # This is required for actions/checkout

env:
  DOCKER_USERNAME: xyz
  DOCKER_REPOSITORY: abc
  IMAGE_TAG: latest

jobs:
  build:
    # see https://github.com/orgs/community/discussions/19197
    runs-on: ubuntu-latest
    steps:
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      - uses: actions/checkout@v4
      - name: Log in to Docker Hub
        uses: docker/login-action@f4ef78c080cd8ba55a85445d5b36e214a81df20a
        with:
          username: ${{ env.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
        with:
          version: v0.11.2
          buildkitd-flags: --debug
      - name: Free Disk Space (Ubuntu)
        uses: jlumbroso/free-disk-space@main
      - name: Build & Push
        uses: docker/build-push-action@v5
        with:
          context: pipeline/transformations/abc
          file: pipeline/transformations/abc/Dockerfile
          platforms: linux/amd64,linux/arm64
          push: true
          tags: "${{ env.DOCKER_USERNAME }}/${{ env.DOCKER_REPOSITORY }}:${{ env.IMAGE_TAG }}"
          cache-from: type=gha
          cache-to: type=gha,mode=max
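
Two things may be worth ruling out here (assumptions, not a diagnosis): the GHA cache backend has a per-repository size limit of roughly 10 GB, and a multi-platform CUDA image built with `mode=max` can blow past it, so layers get evicted between runs; separately, `RUN --mount=type=cache` mounts live only in the builder's local state and are never exported by `cache-to`, so the poetry cache mount starts empty on every ephemeral runner. A registry-backed cache sidesteps the size limit (`buildcache` is a hypothetical tag):

```yaml
      - name: Build & Push
        uses: docker/build-push-action@v5
        with:
          # ...same context/file/platforms/push/tags as above...
          cache-from: type=registry,ref=xyz/abc:buildcache
          cache-to: type=registry,ref=xyz/abc:buildcache,mode=max
```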

https://redd.it/1ejb7ed
@r_devops
Acquiring a New AWS Environment: Seeking Insights on Best Practices for Smooth Transition and Integration

Hello AWS community,

Our company is in the process of acquiring another firm, and part of this acquisition involves taking over their AWS environment. The services they use include EKS, RDS, and Elastic Beanstalk, among others. We'll receive a replica of their system on a new AWS account that will be handed over to us.

What do you recommend we keep a lookout for? Has anybody here been through such a transition?

https://redd.it/1ejd2gj
@r_devops
dynamic vs single database approach for beta version

Hi everyone,

We're about to launch a beta version of our product and have implemented a dynamic database approach: creating a separate database for each company using our platform. We chose this method to enhance security and stability, ensuring that if one database goes down, the others remain unaffected.

However, a friend of a friend recently commented that this approach might be overkill and expensive, suggesting that a single database could be more efficient and effective.

I would really like to hear your advice on this. For those who have experience with both approaches: which is better, in the long run or at all? I really want to avoid a pitfall here, so if you have some knowledge to share, it would be really appreciated!
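
For context on the usual middle ground (a sketch, not a recommendation): many multi-tenant systems use a single database with every row scoped by a tenant key, or one schema per tenant, and reserve database-per-tenant for customers with strict isolation requirements. The table and column names below are hypothetical:

```sql
-- Shared-database, shared-schema multi-tenancy
CREATE TABLE companies (
  id   BIGINT PRIMARY KEY,
  name TEXT NOT NULL
);

CREATE TABLE invoices (
  id         BIGINT PRIMARY KEY,
  company_id BIGINT NOT NULL REFERENCES companies(id),
  amount     NUMERIC NOT NULL
);

-- Every query is scoped by tenant:
-- SELECT * FROM invoices WHERE company_id = :company_id;
```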

https://redd.it/1ejcyin
@r_devops
Do any of you actually genuinely care about your employer’s core business?

I've always been a technically oriented person, and I sometimes find myself zoning out once co-workers on the non-technical side of the firm start talking about stuff they find interesting.

I might get drawn in if I sense they really know their stuff, but on a day-to-day basis, unless the core business is related to something I'm familiar with, I'd be fighting for my life to stay awake in workshops with the project managers.

Has anyone else experienced something similar, and if so, what motivated you to overcome it? E.g. you might be a backend engineer at Airbnb who really cares about the service industry and how Airbnb could introduce new features to improve it.

In contrast, I have enough on my plate, and if the PM thinks some cool feature should go into a sprint, then OK. But I would rather stay up to date on AWS, Docker, and K8s than discuss Airbnb's review system.

Background: I've been in tech my whole adult life and had some opportunities to do customer-facing tech jobs, i.e. customer support. But once I finished my master's in computer science, the technical part of my brain just kept churning, and I got thrown into the deep end of a steep learning curve, technically speaking.

https://redd.it/1eji6l9
@r_devops
Monitoring Recs?

Hey All

I just spun up some infrastructure for a project I'm working on (everything is currently hosted on AWS, in case it matters).

I want to wire in some monitoring. Since this is somewhat of a side project at this point, I'm curious whether anyone has recommendations for monitoring frameworks, or has had their eye on a cool monitoring project they may or may not have had a chance to check out yet.

Typically I lean toward either Prometheus+Grafana or Icinga2, but I figure I might as well spice things up a bit.

Thanks in advance!

https://redd.it/1eji2nn
@r_devops
What does your DevOps look like, and what can I propose to my company to improve our DevOps

For some context, our company is a BPO (Business Process Outsourcing) provider. We write custom software solutions to sell to businesses. Our core business is data scanning and document formatting. Most of our applications are monoliths, plus some shared services like a GPU farm. (I am also newish, at 1.5 years, so I may not know all our stuff.) Here is what I guess you could need in a DevOps environment:

A central place for issues, tickets

We have Mantis, a helpdesk, and "Espaceprojet", a tool to manage client demands. We were promised Jira, but I don't think it's happening. Nothing is linked to the DevOps pipelines...

A central place for code, code reviews, version management

We have self-hosted GitLab and use a gitflow workflow. We have runners for code linting/testing. Our git projects generally look like this:

- project-core

- project-iac (100% puppet)

- project-terraform

- project-confjobs (jobs are for Rundeck; confs are a bunch of YAML files that get ingested at provision time)



A central place for credentials

Our self-hosted Vault works well for things like renewing certificates.

A central place for configurations

We use git, but we are moving to Vault even for basic stuff. Vault Agent can dynamically load configurations; however, many of our applications need a restart to apply modifications. I am not sure Vault should be used like that.

A central place for pipelines

We use Jenkins; it automatically builds "project-core" and "project-iac" as Linux packages. We need to launch "project-terraform" manually to create all the VMs necessary for a project (no Docker... 🙁)

A central place to store deployment ready images (artifacts)

We use Nexus for all our packages and artifacts. We have many, many Puppet modules to manage the installation of tools on our VMs.

A virtualization platform or bare metal

We have both: two big Nutanix clusters, a lot of VMs on Outscale (https://fr.outscale.com/), and a few VMware and Hyper-V hosts here and there. We also have a developer cloud (OpenStack), plus a GPU farm used for some of our products.

A service discovery service: we don't have that.

An orchestrator per project

We use Rundeck with scheduled jobs (backups, restores, reboots, Tenable scans, jobs to export logs, and many other things...). We also use Rundeck for provisioning VMs and deploying applications to different environments. Most of this is manual, but it's always just one button to press.

A central monitoring platform

We have Centreon, plus an internal tool that leverages Rundeck jobs to check service health, create a report, and send it to Nagios, which can send us alerts or even calls.

A place to store procedures, on-call procedures, contacts, and architecture documents

On SharePoint and in a wiki.

The local developer environment: We have a project skeleton and an OpenStack cloud for devs to bootstrap a dev environment and run their own tests.

Please roast us.

https://redd.it/1ejkdux
@r_devops
How many of you have seen the original Flickr/Velocity-2009 presentation?

https://www.youtube.com/watch?v=LdOe18KhtT4

This is essentially the moment 'devops' was born as a popular buzzword concept; Patrick Debois started the eponymous conference a few months later. We all know the term grew to mean everything and anything, and then eventually nothing, but I'm curious what % of devops redditors ever saw the original meaning/purpose in the first place?


https://redd.it/1ejklvt
@r_devops