Reddit DevOps
How to do a proper canary deployment for a multi-region application?

Hello, I am in charge of designing canary deployments for our microservices. Within a single region it's relatively simple: I use weighted Route 53 records and wrote a Lambda that controls the weights while listening to alerts for rollbacks.

How do I do a proper canary for an application that is active-passive across two AWS regions? The application can't be active-active due to data consistency concerns. My current idea is to canary one region and then the other, but that seems inefficient, so I am here asking for industry best practice. Thanks!
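
A minimal sketch of the "one region, then the other" idea described above, written as pure decision logic so it can be tested without AWS. Everything here is an assumption for illustration: the names `CanaryState`, `next_weight`, `run_region` and `rollout` are hypothetical, the step size is arbitrary, and the actual Route 53 weight update and alarm polling are deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class CanaryState:
    weight: int                # percentage of traffic sent to the canary (0-100)
    rolled_back: bool = False  # True once an alarm has forced a rollback


def next_weight(state: CanaryState, alarm_firing: bool, step: int = 25) -> CanaryState:
    """Advance the canary by one step, or drop to 0 if an alarm is firing."""
    if state.rolled_back:
        return state
    if alarm_firing:
        return CanaryState(weight=0, rolled_back=True)
    return CanaryState(weight=min(state.weight + step, 100))


def run_region(alarms: Iterable[bool], step: int = 25) -> CanaryState:
    """Step one region until it reaches 100% or rolls back.

    `alarms` stands in for successive alarm checks (hypothetical input).
    """
    state = CanaryState(weight=0)
    for firing in alarms:
        state = next_weight(state, firing, step)
        if state.rolled_back or state.weight == 100:
            break
    return state


def rollout(regions: List[str], alarms_by_region: Dict[str, List[bool]],
            step: int = 25) -> Dict[str, CanaryState]:
    """Canary regions strictly in order and stop at the first rollback."""
    results: Dict[str, CanaryState] = {}
    for region in regions:
        results[region] = run_region(alarms_by_region[region], step)
        if results[region].rolled_back:
            break  # later regions are never touched
    return results
```

Ordering the passive region first is one possible choice, not an established best practice: it limits the blast radius, but the passive region may see too little traffic for alerts to be meaningful.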

https://redd.it/1eaqadx
@r_devops
Start-up DevOps

I just joined a start-up

They have a few GoDaddy web hosts where multiple websites are hosted.

One was a Windows server with multiple databases and .NET projects.
Should I tell the CEO that it's cheaper to use Lambda/Linux servers for some of the services?


https://redd.it/1eawj7c
@r_devops
Does anyone have internal CLI tools they have built?

I've started building a CLI tool for our team to perform regular actions or search logs in a way that is more aligned with how we deploy our applications (think get logs <some-api-we-have>, and it returns a sensible time-ordered collection of logs from various k8s pods, queues and such).
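
As a rough illustration of the "time-ordered collection from several sources" part, here is a small sketch that merges log streams which are each already sorted by timestamp. The `LogLine` type and `merge_logs` function are hypothetical names, and fetching the lines from pods or queues is assumed to happen elsewhere.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class LogLine(NamedTuple):
    timestamp: datetime
    source: str   # e.g. a pod or queue name (illustrative)
    message: str


def merge_logs(*streams: Iterable[LogLine]) -> Iterator[LogLine]:
    """Lazily merge streams that are each already sorted by timestamp."""
    return heapq.merge(*streams, key=lambda line: line.timestamp)


if __name__ == "__main__":
    pod = [
        LogLine(datetime(2024, 7, 24, 10, 0, 0), "pod-a", "request received"),
        LogLine(datetime(2024, 7, 24, 10, 0, 5), "pod-a", "request done"),
    ]
    queue = [LogLine(datetime(2024, 7, 24, 10, 0, 2), "queue", "message enqueued")]
    for line in merge_logs(pod, queue):
        print(line.timestamp.isoformat(), line.source, line.message)
```

`heapq.merge` only keeps one pending line per stream in memory, which suits long log streams, but it does rely on each input being sorted beforehand.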

Does anyone else have similar tools? What do they do? Do you find them useful?

https://redd.it/1eb0ni4
@r_devops
Just in time (JIT) AWS escalation tool?

Looking for some tool or service that is:

- cheap / free
- not awful to set up
- can be used with one account/organization
- allows approval and review for temporary, audited elevated AWS access

I read through the AWS TEAM tool, but it requires a second federated organization and my team doesn't want to set up another org in our AWS account.

Any suggestions?

https://redd.it/1eb2ew8
@r_devops
I am a complete noob to DevOps and was offered an IaC role. I am terrified to take it, but I really think it can be a great opportunity.

Hi guys, I am currently a cloud/network engineer supporting a live financial application. I've written SQL scripts and PS scripts, built a few network automation scripts in Python, built a few playbooks with Ansible, and learned OOP with C++ in college. However, I have been offered an IaC engineer role (no production code involved, yet) and I am extremely nervous to take it. I only have about 5 years of true experience in IT, but I think this role can be a great segue into automation, which is what I've always wanted to focus on rather than the pure infrastructure side of things. I would love to succeed in this role, but I do not have much help except this community. Please offer me any advice you have!

https://redd.it/1eb3e3x
@r_devops
CrowdStrike Preliminary Post Incident Review

CrowdStrike put out their official PIR on the incident. I hope whoever wrote this was banging their head against a desk when they had to basically write out "our only testing for this was an automated test that didn't even officially pass".

Here's the link for anyone interested: https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/

https://redd.it/1eb40oo
@r_devops
No Vault TLS for Production cluster

Hi, I'm trying to set up a Vault production cluster for our company.
The issue I'm having right now is that the browser doesn't recognize my CA certificate. I created it with these commands:

# generate CA in /tmp
cfssl gencert -initca ca-csr.json | cfssljson -bare /tmp/ca

# generate certificate in /tmp
cfssl gencert \
  -ca=/tmp/ca.pem \
  -ca-key=/tmp/ca-key.pem \
  -config=ca-config.json \
  -hostname="vault,vault.vault.svc.cluster.local,vault.vault.svc,localhost,127.0.0.1" \
  -profile=default \
  ca-csr.json | cfssljson -bare /tmp/vault

As I understand it, this is a self-signed certificate that's valid only inside my cluster. I used this method because the Vault setup requires tls-server and tls-ca. I can generate the tls-server certificate in my Cloudflare account or use cert-manager to create one myself, but it doesn't work as intended.

extraEnvironmentVars:
  VAULT_CACERT: /vault/userconfig/tls-ca/tls.crt

extraVolumes:
  - type: secret
    name: tls-server
  - type: secret
    name: tls-ca

standalone:
  enabled: false
ha:
  enabled: true
  replicas: 3
  config: |
    ui = true

    listener "tcp" {
      tls_disable = 0
      address = "0.0.0.0:8200"
      tls_cert_file = "/vault/userconfig/tls-server/tls.crt"
      tls_key_file = "/vault/userconfig/tls-server/tls.key"
      tls_min_version = "tls12"
    }

    storage "consul" {
      path = "vault"
      address = "consul-consul-server:8500"
    }

# Vault UI
ui:
  enabled: true
  externalPort: 8200

I was thinking of maybe having another certificate that covers only the ingress, and using the self-signed certificates inside the cluster, but that doesn't work either.
Here's the ingress I'm trying to use for the connection:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vault-ingress
  namespace: vault
spec:
  rules:
    - host: vault.company.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: vault-ui
                port:
                  number: 8200
  tls:
    - hosts:
        - vault.company.com
      secretName: default-workshop-example-tls
  ingressClassName: nginx

I've been trying to get my head around this for a week, but I can't. Any help would be welcome! 🙏

The questions are:
How do I generate a valid CA certificate? As I understand it, I can't.
How do I enable TLS in Vault?
Is my config maybe wrong?


https://redd.it/1eb273e
@r_devops
Serverless observability tools (New Relic etc.): are my expectations off?

Recently I tried New Relic with AWS Lambda (Python) and was surprised by how awkward "the basics" of logging and metrics seemed to be compared to my previous experiences with other tools (Datadog, Elasticsearch, Grafana, Riemann). Most of my experience with those tools is not with serverless, though.

I'm wondering whether that's more a New Relic issue or a general problem of poor support for serverless. Am I expecting too much? Am I doing it wrong?

What I expected:

* Searchable, correlated logs across multiple services
* Application metrics (e.g., products sold per hour) and infrastructure metrics (e.g., 50x responses per hour)
* Alerts based on these metrics, integrated with slack and out-of-hours tools like pagerduty
* Performance tracing

What I got from their lambda extension (there are other integration options - e.g. Opentelemetry - but it seems they all have limitations and are a little work-in-progress):

* Quirky Lambda extension with documentation / usability issues
* Logs: currently not clear to me that it's useable to search across multiple Lambdas/services (?!)
* Custom metrics: fine for what I needed, but with caveats (e.g., no tags - no "dimensional metrics")
* Alerts: seems fine. I didn't try it with slack/pagerduty though
* Performance tracing: I didn't need this in my test, but again hindered by documentation issues

How do other tools do on serverless? (datadog, honeycomb, etc.)

https://redd.it/1eb7ddg
@r_devops
SRE/DevOps IDE

Hi! Imagine the perfect SRE/DevOps IDE for your tasks. In your opinion, what is the most important feature it should have? What specific technologies, stacks, integrations, and scenarios should it support? Is there anything else you would like to include?

https://redd.it/1eb9qsf
@r_devops
Daily Work Problems Faced by Engineers in Cloud DevOps and SRE

Hello people.

I know that this may sound like a very general ask, but bear with me. I am looking for problems or process improvements in the Cloud DevOps and SRE work domain. What's something that you as an engineer (or any other employee in this area) face on a daily basis, or have faced in the past, and would like to see solved or turned into a tool? My intention is to start a year-long project (so the problem should be big/small enough) that will span my whole senior year in college, with the end product being something that helps with or solves said problem.

P.S I would really prefer if it's something that can use some ML to enhance it.

https://redd.it/1ebbo6y
@r_devops
Rant: Is DevOps always so tedious and annoying?

Hello!

This is a rant. There will be mistakes in this text because I typed it fast and English is not my primary language.

I'm always trying to debug some obscure random errors with little to no documentation. I find devops so annoying. I'm currently hosting my websites on Azure Static Web Apps (basic React app with Vite and a few Azure Functions) and there are always bugs. I'm using their official GitHub Action pipeline to auto deploy when I push in the main branch or open a PR.

Examples of bugs:
- Exceeding the maximum app size. OK, maybe it's my fault, but I'm only making a simple React website. I know React (and JavaScript development in general) is bloated, but come on, why do they offer to deploy React apps if everyone will exceed the limit so easily?
- Random error saying "Failure during content distribution". It fixed itself after a few hours, but holy shit that message is so unclear. I searched for hours to find out whether I did something wrong.
- Sometimes the pipeline works, I push code, and BAM, it now fails (I only added React code; nothing changed in the pipeline or the config).

Other things I don't like:
- Everything is so hard to learn. There's too much documentation (a double-edged sword: it's good, but not for beginners). I find myself reading doc after doc after doc, and after a few pages I read something like "If you are using Vite, read [this]".......... :/
- How can I know if I'm doing something right?
- Should I go with Azure, AWS, Vercel, Cloudflare, Github Pages, or even self host? How can I know what's the best for my use cases???

Thanks for reading my rant. I feel comfortable programming pretty much anything, but when it comes to devops, I feel like a pirate navigating a freaking desert with no water in sight.

I would love to hear where and how you deploy your apps. What services do you use? Do you split the app in a "website" service and a "functions" service? Do you use a backend or only Serverless Functions? What am I doing wrong? What should I do differently?

I'm open to recommendations, feedback and discussion. Feel free to comment; I'm not angry at you or this community, I'm angry at myself for having a hard time with anything DevOps-related...

https://redd.it/1ebc3cd
@r_devops
Is a full CI/CD pipeline for a containerized application possible without kube?

Hi r/devops

The context around why I'm asking this is a bit long, so either bear with me or skip the context part and go straight to the technical question.

## Context:

I'm a long-time tinkerer/lurker of general dev stuff (built a home NAS and hosted a web server on it, like to play script kiddie on several old PCs with Linux) who recently jumped into it professionally and was hired as a DevOps intern.

I was hired as the sole developer (besides my boss and some front-end contractors) in a startup a little more than 6 months ago, and boy, has it been --hard-- instructive.

My boss is a great manager and we frequently have debates over what tool to implement and other infra questions that I enjoy a lot.

He has experience mainly as a front-end developer though, and I found that it really shows. When coupled with his generally optimistic attitude ("We'll figure it out, don't worry!"), which is in itself a good thing, it can lead to some... unreasonable expectations.

For example, when I was hired, the "app" consisted of a fully fledged React app hosted on Firebase... which turned out to be an empty shell in terms of data and functionality, fed by Python scripts run locally by my boss.

My first task was to "Deploy it" (the backend, ie jupyter notebooks) then "connect it to the front", which I successfully completed, although if I had to do it again, I would certainly make different choices (mainly use firebase as it was intended instead of twisting it into working with a traditional backend)

We are now at a state where the app (front, back and DB) talk to each other and somehow work, but it is honestly kind of a Frankenstein monster. Any software architect worth their salt would probably have a heart attack looking at the repo, and the questionable decisions made by my even less experienced self have already been problematic: going for Google App Engine for the backend has proven troublesome when it came to orchestrating, for example, a (long-running) data pipeline. It is, still to this day, a simple cron on my NAS triggering a Python script on the remote backend because of weird GCP and GAE limitations.

All that to give the background of the problem: since the app works, and it's been done mostly by a single developer, we've unlocked funding and are now working towards unifying many other apps, which were until now proofs of concept, into a single unified behemoth. We've hired several people, including a "senior devops" who... well, let's just say he's been with us for 2 weeks now and still has trouble getting his Python venvs to work.

All that to get to where I am now: I am factually the only ops-ish person in a now 10-dev-strong startup, each dev working on one to a couple of apps.

For now, I somehow hold it together, as so far the devs who worked fast were also of the not-too-blunt type, and I managed to help them make their part work on GCR (Lambdas for you AWS people), but I can feel it becoming overwhelming quickly, especially since serverless isn't gonna cut it for a few data- and computation-heavy apps we're working on. For these, I can spin up an instance real quick using Terraform/Pulumi and Ansible.

But what then? What about when it doesn't work in prod? "But it works on my machine" has made my boss tell me to "build a CI/CD", but I've come to realize it's not that simple.

## The actual question:

Where do I store all the secrets/configs for so many apps in a monorepo? How do I inject them safely along the CI/CD? A single giant .env/secret manager? If so, what about service accounts/credentials files? How do I handle database connections locally for tests?

How do I even tackle CI/CD on a monorepo with trunk-based development without branches?

The answer I've found so far online to all these concerns is always kube, which also seems to solve further issues like scaling and conf management.

I feel like a big multi-service app is unmanageable without Kubernetes, but my boss refuses to hear about it: apart from me with my bootcamp-level knowledge of kube, and a handful of devs who have worked in an already established cluster, we have nothing. And no time to learn it: we've got to deliver by next calendar year.

My guts tell me we're going head first into a wall, and that it's probably gonna be my job to run around everywhere with a huge roll of tape, but I've come round to realizing that's what a DevOps is to most managers and devs.

I'd still like to hear more experienced views on the matter, though: Am I gonna make it out alive without kube ?

https://redd.it/1eb7dj0
@r_devops
Starting a new job next month as a DevOps engineer. What have I gotten myself into?

Like the title says, I'll start in about three weeks. DevOps engineer at a large org, the team is essentially a "DevOps Center of Excellence" for the entire company. They admin the (self-hosted) GitLab platform, Jfrog, and something called SonarQube. A lot of the work is coaching, training, troubleshooting, and generally ensuring that the engineering teams at all of the different divisions of the company are adopting DevOps best practices and standards. It sounds like there's quite a bit of resistance from the older and more entrenched engineers.

I've been in IT for nearly three decades, but the last ten years doing architecture on the Cloud almost exclusively. Prior to that I was a developer and software engineer mostly for Windows apps and backend web stuff.

What do I need to know to get a head start on this job? I've never worked with a CI/CD pipeline, written automated tests, or done any of the cool DevOps stuff that took over while I was fiddling with EC2 instances (I guess the IaC stuff I did counts, but only barely).

Looking for books, videos, projects, tutorials, or anything else to get me started. Already going through some of the GitLab University material and a Udemy course by Valentin Despa.

Thanks!

https://redd.it/1ebes9j
@r_devops
How are you guys dealing with GitHub "monorepo" releases?

I joined a profitable company about 1 month ago and one of my first tasks was to improve the release process. As at most late-2010s companies, microservices were a thing there: they had 8 git repositories that had to be released once every 2 weeks, and the release manager used to struggle a lot. The team is also made up of 5 devs and 2 QA, so 8 repositories is really quite an overhead.

To make things easier I proposed we move everything to a single git repository with 8 root folders and each project would just be a folder within the repository. Not really trying to build any complexity with shared node modules or anything, just 8 folders instead of 8 repositories. Release manager loves me for it because now he has 1 repo to manage instead of 8.

Another thing about it is that they have develop, staging and production. It used to be 3 branches but they complained a lot about getting git conflicts while moving things from dev to staging and from staging to prod. I worked on making use of git tags instead of branches so it dropped the conflict by 99% and devs are happy with it.

New set of issues arrive:
- no way to release only what has changed. Every git tag needs to release all 8 projects. GitHub seems extremely lacking in its ability to compare a new "staging" tag with the previous staging tag to figure out which folders have changes and run only those workflows.

- no easy way to promote staging tag to production tag. GitHub release page only allows to select an existing branch or recent commits as the tag. Since the staging release is cutoff one week ahead of production release, by the release day it's no longer a recent commit. We are forced to grab the tag with the latest staging release, make a branch out of it and then tag production using that branch. This is less of an issue as it's just annoyance (compared to high expenditure of the previous issue)

- no way to require approval / review of a git tag. Basically whoever has access to create a tag on the repository could just as easily create a personal branch, delete a bunch of code, push that branch to GitHub and then tag it as the latest release for production.
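
For the first issue in the list above, one possible workaround is to compute the changed folders yourself from a diff between the two tags and trigger only those releases. The sketch below is an assumption-laden illustration: `changed_projects` and `diff_paths` are hypothetical helper names, the project folder names are made up, and it relies only on `git diff --name-only <old> <new>`, run in a checkout that has both tags fetched.

```python
import subprocess
from pathlib import PurePosixPath
from typing import Iterable, List


def changed_projects(changed_paths: Iterable[str], projects: Iterable[str]) -> List[str]:
    """Return the top-level project folders touched by the given file paths."""
    touched = {PurePosixPath(path).parts[0] for path in changed_paths if path}
    # Files outside known project folders (e.g. a root README) are ignored.
    return sorted(touched & set(projects))


def diff_paths(old_tag: str, new_tag: str) -> List[str]:
    """List files that differ between two tags using git itself."""
    result = subprocess.run(
        ["git", "diff", "--name-only", old_tag, new_tag],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Illustrative paths instead of a real diff; folder names are made up.
    paths = ["billing/src/index.ts", "auth/package.json", "README.md"]
    print(changed_projects(paths, ["auth", "billing", "orders"]))
```

A CI job could feed the resulting list into whatever mechanism starts per-project releases; how that is wired into GitHub workflows is left open here.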


Overall, these are somewhat new issues for me because I spent most of my career working with only develop and main branches (dev and prod) and PRs with review required for main. Git conflicts didn't happen because everything was in develop at all times. The introduction of a third stage and the lag between releasing to staging and production create some git history shenanigans that are annoying to cope with, and git tags seem really terrible for managing releases (from a developer and DevOps perspective).

What are you guys doing to manage 3 release pipelines?

https://redd.it/1ebi0e2
@r_devops
How do you set up alerts in Dynatrace?

Hi everyone, I wrote a metric in DQL in Dynatrace. Given that graphed metric, if it goes below some threshold, how do I set up an alert that is sent to, say, PagerDuty? I read the documentation but I can't find a way to select the specific metric I created.

https://redd.it/1ebjf1e
@r_devops
Are there any good Cloud playgrounds? I find many of them are highly restrictive on what you can build

For example, look at KodeKloud. There are so many restrictions that you can't build a real-world architecture.

https://redd.it/1ebh7y0
@r_devops
Any good way to migrate a wiki from Github to Azure DevOps?

Where I work they are switching from Github to Azure Repos, the code part was super easy. The issue is we had a pretty nice and very useful wiki setup on Github, and would like to migrate it to Azure Wiki if possible. I don't have much familiarity with DevOps other than configuring a simple pipeline, so I would like to know if there is a simple way to do that migration other than copying article by article

https://redd.it/1ebi7u4
@r_devops
Has anyone ever used Google's OpenDLP software as their main DLP solution?

Has anyone ever used Google's OpenDLP software as their main DLP solution? How is it nowadays? If we do not have much of a budget for a DLP solution, can I use that, or can you suggest some other free software?

https://redd.it/1eboqky
@r_devops
Malware scanning suggestions?

Had a few Wordpress sites get compromised recently. I've got all the things you'd expect (closed ports / 2FA / Wordfence security).

I'm looking into something like https://sandflysecurity.com/ - has anyone got any experience with this?

https://redd.it/1ebqc4p
@r_devops
GCP GKE - Kube-DNS - Custom rewrite rules ?

Hi,

I cannot find a way to define a custom DNS config in my GKE cluster. I'm curious whether I'm simply missing something or it's not possible.

I want to add a rewrite for a custom domain based on a name regex. While I can very easily do that on EKS by changing the CoreDNS ConfigMap, I cannot find anything like that on GKE.

Help appreciated.

ps. ChatGPT is like "Yea just update coredns configmap on GKE".. + /r/kubernetes seems dead

https://redd.it/1ebpu3n
@r_devops