Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
How we brought automated Rollbacks to 2,100+ services using Argo Rollouts

Hey everyone 👋 I work in the Backend Platform team at Monzo.

We've written about how our team brought automated rollbacks to our deployment system. This is the most substantial change we’ve made to our deployment system in some time, so it was not without its challenges!

At the heart of this new feature is Argo Rollouts - a Kubernetes extension that supports advanced deployment strategies. In this post we dig into how we integrated Argo Rollouts with our existing deployment tooling, while keeping the Monzo delight factor. We show how we migrated all 2,000+ services to this new system and discuss the lessons we learnt along the way.

🔗 Here's the link: https://monzo.com/blog/2022/11/02/argo-rollouts-at-scale

We’d love to hear your thoughts and questions.

https://redd.it/yox68a
@r_devops
Hi r/devops, how would you write end-to-end system tests for a system composed of multiple Java apps connected by Kafka and backed by multiple databases? I have managed to run the whole system in Docker for development. Now I need a framework to write test cases like the one below and run them in Docker.

app_1 --> kafka_topic_1 --> app_2 --> kafka_topic_2 --> app_3 --> postgres_db

Example test case: app_1 publishes a message, then assert that a new DB entry is created.
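
Because the pipeline above is asynchronous, a test of this shape usually has to poll for the database row rather than assert immediately after publishing. Below is a minimal sketch of that "publish, then wait until" pattern using only the Python standard library. The `publish` and `row_exists` callables are hypothetical placeholders: in a real test they would wrap whatever Kafka producer and Postgres client you choose, neither of which is shown here.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def run_pipeline_test(
    publish: Callable[[], None],      # hypothetical: sends one message into the head of the pipeline
    row_exists: Callable[[], bool],   # hypothetical: queries the database for the expected row
    timeout: float = 30.0,
) -> bool:
    """Publish one message, then wait for the expected database entry to appear."""
    publish()
    return wait_until(row_exists, timeout=timeout)
```

A test would then read `assert run_pipeline_test(publish_order, order_row_exists)`, where both arguments are functions you write against your own containers. Polling with a timeout keeps the test from being flaky when the apps take a variable amount of time to process the message.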

https://redd.it/yown5v
@r_devops
Serverless containers: am I forced to do microservices? Per entity or per operation?

Tech stack if it matters: Fastify GraphQL Docker image.

I have a monolith application that was initially on Google Cloud Run, and the cold start was pretty bad. Thinking about it now, that was probably because my container was a monolith.

I now plan on migrating to AWS, where Lambda can run Docker containers. The AWS talks I have been watching say you should keep everything small to reduce cold start. Please note: I do not want to use AWS AppSync; I want my GraphQL schema to live with my application rather than with AWS, to stay cloud agnostic. Then again, I think I would have to make my Docker containers specific to the Lambda image.

Should AWS Lambda containers be treated the same as Google Cloud Run? They are essentially the same, right?

Back to the main question: with either AWS Lambda containers or Google Cloud Run containers, both being serverless containers, I am pretty much forced to do microservices just to get a small cold start, correct?

Do I break these down per entity or per method? One container per CRUD operation on user (4x Lambdas), or 1x container for the user entity and all its methods?
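
To make the "one container per entity" option concrete, here is a minimal sketch of a single entry point that routes all four user operations to separate handlers. It is written in Python purely for illustration (the stack in the post is Fastify). The event shape (`operation`, `payload`) and the in-memory `USERS` store are assumptions for this sketch and are not the Lambda or Cloud Run request format.

```python
from typing import Any, Callable, Dict

# In-memory stand-in for a real database (assumption for this sketch).
USERS: Dict[str, Dict[str, Any]] = {}


def create_user(payload: Dict[str, Any]) -> Any:
    USERS[payload["id"]] = dict(payload)
    return USERS[payload["id"]]


def read_user(payload: Dict[str, Any]) -> Any:
    return USERS.get(payload["id"])


def update_user(payload: Dict[str, Any]) -> Any:
    USERS[payload["id"]].update(payload)
    return USERS[payload["id"]]


def delete_user(payload: Dict[str, Any]) -> Any:
    return USERS.pop(payload["id"], None)


# One deployable unit for the whole "user" entity: the routing table is the only
# thing that would change if the operations were split into separate functions.
OPERATIONS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "create": create_user,
    "read": read_user,
    "update": update_user,
    "delete": delete_user,
}


def handle(event: Dict[str, Any]) -> Dict[str, Any]:
    """Route a hypothetical event to the matching user operation."""
    operation = OPERATIONS.get(event.get("operation", ""))
    if operation is None:
        return {"status": 400, "error": "unknown operation"}
    return {"status": 200, "body": operation(event.get("payload", {}))}
```

Splitting per operation would mean deploying each handler behind its own entry point. Whether that measurably improves cold start depends mostly on image size and startup work, so it is worth measuring before committing to either layout.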

https://redd.it/yos4wp
@r_devops
CyberSec Question - How do I implement secure installation of a debian package?

Hi,

I am currently working on a project and have hit a wall, and I am not sure how to proceed. I have software that creates a Debian package by running through multiple BB repositories. That package is later transferred to an offline system (no internet access), where I run dpkg to install it.

Now the thing is, I want some sort of verification for this procedure. I want dpkg to go through only for THIS specific Debian package, and for future packages I create using the software, not for just any package it is given. I also want a specific user to be able to perform this installation, so I want to put a NOPASSWD line in the sudoers.d/user file for the dpkg command, allowing that user to install the package, but only if verification succeeds. I could just add dpkg [filename] to the sudoers file, but matching on the file name is not good enough.

I am not really good at cybersec, so please give me some ideas on how to proceed. Thank you!!
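
One way to picture the verification step described above: instead of allowing dpkg itself in sudoers, allow a small root-owned wrapper that checks the package before installing it. The sketch below shows only the check, using a SHA-256 allowlist from the Python standard library. This is a simplification chosen for illustration: an allowlist must be updated for every new build, so covering future packages would more likely call for a cryptographic signature verified against a key installed on the offline system, which is not shown here. The function names and the idea of a `trusted_digests` list are assumptions, not an existing tool.

```python
import hashlib
import hmac
from pathlib import Path
from typing import Iterable


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_trusted_package(path: Path, trusted_digests: Iterable[str]) -> bool:
    """True only if the file's digest matches one of the trusted digests.

    `trusted_digests` is a hypothetical allowlist that would have to be
    root-owned and delivered to the offline system over a trusted channel.
    """
    actual = sha256_of(path)
    return any(hmac.compare_digest(actual, expected) for expected in trusted_digests)
```

The wrapper would call dpkg only when `is_trusted_package` returns True and exit with an error otherwise. The sudoers entry would then name the wrapper, not dpkg.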

https://redd.it/yoo1ad
@r_devops
Are there any "Configuration Manager" solutions out there?

Hi

I am not sure if "Configuration Manager" is the correct term.

I deploy my infrastructure as code by using a JSON file as a parameter file for each environment. I was wondering if there was any "Configuration Manager" solution on the market. I am thinking of a solution that would provide a user interface with the ability to create a "form" and add fields with their types (drop-down, int, string). Then, a user could create a "new environment", fill in and select values in the form, and click "Save". The records would be saved in a backend database, and the pipeline would be designed to retrieve the records from the database.

The closest I can think of is Azure DevOps Variable Groups, but it does not support value types and validation; it cannot have a drop-down menu, for example.
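
The typed-form idea above boils down to a schema of fields (name, type, optional list of allowed values) plus a validation pass over each environment's values before they are saved. A minimal sketch of that core, assuming hypothetical names (`FieldSpec`, `validate`) and example fields that are not taken from any existing product:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FieldSpec:
    name: str
    kind: type                              # expected Python type, e.g. str or int
    choices: Optional[Tuple[Any, ...]] = None  # drop-down options, if any


def validate(schema: List[FieldSpec], values: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable errors; an empty list means the values are valid."""
    errors: List[str] = []
    known = {spec.name for spec in schema}
    for spec in schema:
        if spec.name not in values:
            errors.append(f"{spec.name}: missing")
            continue
        value = values[spec.name]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_type = not isinstance(value, spec.kind) or (spec.kind is int and isinstance(value, bool))
        if wrong_type:
            errors.append(f"{spec.name}: expected {spec.kind.__name__}")
        elif spec.choices is not None and value not in spec.choices:
            errors.append(f"{spec.name}: must be one of {list(spec.choices)}")
    for name in values:
        if name not in known:
            errors.append(f"{name}: unknown field")
    return errors
```

For example, with a schema of `FieldSpec("region", str, ("eu-west", "us-east"))` and `FieldSpec("replicas", int)`, the values loaded from an environment's JSON file could be checked in the pipeline before deployment, even without a UI in front of them.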

Thank you

https://redd.it/yp693i
@r_devops
Anyone an expert in APM (Application Portfolio Management)??

Hi, I need to build an Excel file of all our business and tech applications, APM style, with details. Has anyone done this before and have a template of sorts? Thanks.

https://redd.it/yp7it3
@r_devops
Datadog Cost Optimization Tips

Hi folks! This sub provided inspiration for my company to add Datadog as an integration to our product so this is my attempt to return the favor.

This is a list of Datadog cost optimizations we have put into practice with customers and generally a collection of tips that experienced SREs seemed to know about but that we could not find listed publicly anywhere. Hope you find it helpful and please comment if there are more we are missing: https://www.vantage.sh/blog/datadog-cost-optimization-tips

https://redd.it/yp2xz4
@r_devops
What are some of the best ways to handle someone’s ego at work?

Serious question…

There must be an appropriate way to handle peoples’ egos without causing yourself problems…

I’m a Senior Staff level SRE, I’m fairly introverted, and I’m new to this DevOps (infrastructure) team at a very tiny startup company that’s trying to grow to enterprise level. I come from a history of working at much larger enterprises and doing legitimate SWE and actual product SRE/DevOps, and a situation has come up at work where one of the “rockstars” on the team has “corrected” me on something that they really don’t know anything about… and now their inaccuracy is going to cause misinformation throughout the org as well as some production issues that aren’t exactly trivial... But if I correct them, I fear that I may piss them off, may actually make myself out to be an asshole, and put a target on my back. This feels like a catch-22.

Moreover, I’ve already learned that this engineering org is toxic, and management is even more toxic (by far). I want to avoid leaving; I’m actually fairly stimulated by the challenge of surviving (and even possibly thriving) in a bad culture.

How have you been successful in dealing with a person’s ego in a situation where you feel compelled to speak up, especially when that person holds status within the org?

TL;DR: Just need the question in the title answered.

https://redd.it/yp8xbb
@r_devops
Is NixOS a thing?

I've been falling down a NixOS rabbit hole for a few days now. I want to ask whether it's any good for production and deployment.
Right now I'm using Puppet to manage all my servers, but the NixOS approach looks magnificent to me.
Does NixOS have tools like Hiera for managing multiple machines from the same repo and including manifests as packages?
Is NixOps mature enough today?

https://redd.it/ypb3pg
@r_devops
Governance Azure Policy to set WAF IP restrictions

I'm attempting to stop deployments of app services if they do not have the proper WAF custom rules containing our IP restrictions for the FD that they are pushed through. I started writing some PowerShell for this, but Azure Policy would be best. If not Azure Policy, I would like to mimic policy behavior as much as possible. I was initially told I couldn't do this with policy because the solution I'm trying would need two major resources to understand each other's logic...

Is the only way to go about this to maybe delete the app service rather than block deployment? That seems overboard and unfair to the app service devs. How often can this run? Can it be triggered by app service deployments? Can it be applied to just a single subscription? Etc. It would be great if it could auto-enforce.

https://redd.it/ypaqkw
@r_devops
Should I run if networking is created by hand in a Terraform-backed project?

So, I am on a project which has Terraform in the stack, but we don't have permissions for various things in the VPC category, which means Terraform cannot deploy our network fully.

Should I run from the project? What are your thoughts?

https://redd.it/yp0vtm
@r_devops
Is it alright to build an app on the same VPS that it is running on?


https://redd.it/ypes9u
@r_devops
What are some of the most unconventional job titles for devops/cloud engineer that you have come across?

I'll go first.
Recently I saw a LinkedIn post from someone who had their title set as 'Chief Devops Wizard'.

https://redd.it/ypgiga
@r_devops
New to DevOps

I have been a full stack developer for about 5 years now and recently moved to a new company. I knew that they didn't have a DevOps team upon interviewing with them but I didn't realize how bad it was. Since I had experience with some DevOps principles at my last job, I had some suggestions as to what could be changed. This led them to ask me to be their DevOps engineer as well (since they didn't have budget to hire one). I was happy to do this because I find DevOps very interesting and look forward to learning more.

That being said, I have no idea where to begin. I have begun to add insight to their code with logs and tracing but I don't feel like that is really DevOps, it's just necessary.

Things aren't containerized, their deployment is very manual, IaC is non-existent, and lots of other things are missing.

My question is, where do I start? What is a good base so that I can begin to bring things into the modern era, that is also easy enough for someone with little DevOps experience?

Note: We do use AWS but not to its fullest extent. Also, getting some consultant time is a hard sell.

Any advice would be very appreciated!

https://redd.it/yoy9wo
@r_devops
GCP Associate Cloud Engineer

How much time would it take for someone to prepare for this exam?

I have work experience with AWS (plus the Cloud Practitioner and Solutions Architect Associate certs).

Is it very different from AWS, or is it mostly just the naming of the services?

https://redd.it/ypirmp
@r_devops
DevOps best practices - Staging environments

Hi,

I am new to DevOps and learning about the different staging environments.

I find it hard to find a single authoritative source that I can read on the best practices and which is the best approach to take.

My knowledge comes from anecdotes and talking with colleagues.

What I have so far is:

Dev/Non-Prod/Production environments

Blue/Green Deployment

Which type of process should be applied, and how do you technically implement these different environments? Do you have a single repo, and a branch for each environment?

To get some further light on this would be great!

https://redd.it/yk8j36
@r_devops
CICD strategy with UAT

Hi guys,

usual approach:

We usually use the default or a slightly modified git branching strategy with feature-dev-master branches.

We create feature branches from dev and merge them back into dev. After some time a code freeze is declared: dev is "locked", tested by QA, and then pushed into master. Master is considered prod-ready, and packages built from it are shared with clients.

current project approach:

On another project that I joined, my client provides a website to its own clients. Those clients upload data, which is transformed and prepared to be consumed as files and reports. Their logic is mainly separate, but there are some common parts, so some parts may interfere with each other(!)

Their current workflow is feature-dev-master branches BUT they have different environments.

They use the dev branch to publish to the dev env and, after dev testing, to QA for proper QA testing.

After that's done, the branch goes into master. The master branch is published to the UAT environment and, after confirmation from the client, it goes to the Prod env as well.

https://ibb.co/1nMR50w

problem:

Now the problem here is that everything in master should be marked as "ready for production", which means every client has to check their story and give their approval.

And we are no longer in the development phase but rather in the support phase, which means no planned releases, mainly small changes and bug fixes.

So my team is facing the following issue: we have a couple of features/bug fixes implemented and ready to be delivered after UAT testing. Suddenly another client comes with a critical data issue that we need to fix. We fix it, but we cannot push it into prod because there are 2 changes still waiting for UAT approval.

A quick solution here would be to cherry-pick. But this is quite a typical scenario, so we would have to cherry-pick every time. Moreover, since the critical fix was tested on UAT together with the other 2 features, we cannot guarantee (maybe 99.99%, but not 100%) that it behaves the same once we push it into production without them. Ideally we would need to test it again, which doesn't make a lot of sense.

I came up with a new flow. It works better in that we will have a branch containing only the changes that will go to production. But it doesn't eliminate the cherry-pick issue completely, and I'm not sure whether there is anything else we can improve.

https://ibb.co/Lddpq4m

https://ibb.co/8D3Dv2s

https://redd.it/ypkzmz
@r_devops
Automation API-like feature for Terraform CDK?

Is there a way to embed Terraform CDK code in a clean way like we can do with Pulumi's Automation API?

https://redd.it/ypnu10
@r_devops