Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
What can we do better?

I work at a small startup and we offer a system on the web. We currently have 500 subscribers.

Our most pressing issue is that our updates don't always ship with the quality they should, and we end up hurting our clients in the process.

A few of our latest issues have been:

A guy dropped an index on our database and thought his query to create another index to replace it had run successfully, but it hadn't. Our database was overloaded for about an hour, until he realized his mistake.

A developer was using an ORM to generate queries but one of the generated queries ended up using the wrong date field, which had no index. Many clients reported the system was slow as a result of that.

A front-end developer fixed a bug but ended up bringing back another bug, which had already been fixed before.

I updated our Redis cluster (which changed its hostname) and forgot that there was a pretty important Lambda function which used that hostname. Only found out a week later.

Our main system is in Java, with a mix of Spring and Struts. It's also pretty monolithic. We're currently in the process of migrating most of our SSR pages to Angular and we're also making our back-end available as a public API through AWS API Gateway.

All our developers get full dumps of our production database whenever they need it. The downside of that (security aside) is that they have to wait for hours for their local database to be ready. The back-end developers can also connect directly to our production database (running on AWS RDS) when they need to debug.

Our back-end has a few tests, but they're all very basic and they were only introduced because someone said "hey, we need tests". No new tests have been added since September, even though we've made a lot of updates since then.

Our front-end has zero tests.

We create a separate Git branch for every new issue. After the developer finishes their work, they hand it off to our staging tester, who manually checks that everything works as intended. A big limitation is that she can't catch bugs that need a lot of traffic to reproduce, and those end up being the most serious ones, since they affect everyone.
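Bugs that only surface under load can often be smoked out before staging sign-off with even a crude concurrency harness. A minimal sketch (all names invented), where `action` would wrap an HTTP call against the staging environment:

```python
from concurrent.futures import ThreadPoolExecutor

def hammer(action, workers=50, calls=500):
    """Invoke `action` (e.g. a request to a staging endpoint) many times
    in parallel and collect per-call results, so traffic-dependent bugs
    (races, connection-pool exhaustion, lock contention) can surface
    before they hit production."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda _: action(), range(calls)))
```

This is no substitute for a real load-testing tool, but even a few hundred parallel calls will expose the class of bug a single manual tester can never reproduce.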

After staging sign-off, the developer opens a pull request. Another developer reviews the PR and approves it. The code then goes through a pipeline that builds it with Maven and uploads it to Elastic Beanstalk. When traffic isn't too high, we start the update manually, selecting the version we want and rolling back if there's any issue.

The infrastructure for our main system was created manually, through the AWS console. I've been using IaC (AWS CDK) for new micro-services and when I need to move a service to a different kind of infrastructure. There is no pipeline for infrastructure; updates are performed manually.

Whenever there's a performance/stability issue, I use CloudWatch metrics and logs, as well as VisualVM, to diagnose it. One problem we have is that we don't have a history of our JVM metrics. If we don't happen to be at the office at the time of the issue, we have no way of telling what went wrong until the problem resurfaces.

https://redd.it/10inb1i
@r_devops
How do y'all do Self Service/ Ease of setup for Observability with Dev's?

I am becoming the observability guy for a larger company. We are getting better at DevOps patterns, but our observability really sucks.

I am trying to set up new standards and make things easier for our devs, platform-engineer style.

So I'm seeking input on how you all did it or what you'd do differently (we have to use ELK but are willing to implement new tech).

The part I can't figure out is how to make this easier for the devs without putting a lot of extra demand on them.

We mostly use ELK and have logs, metrics, and traces for the areas willing to take the time to implement them, but those are rare. Looking to remove the obstacles for the other devs.
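One way to lower the bar is to hand devs a drop-in logging setup that emits one JSON object per line, so whatever shipper feeds Elasticsearch needs no per-team grok patterns. A Python sketch (the field names are my choice, not a standard):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line, which a log
    shipper (Filebeat, Logstash, etc.) can forward into Elasticsearch
    without custom parsing rules."""
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })

# Wiring it up once in a shared library means every service gets
# ELK-parseable logs for free.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger("orders").addHandler(handler)
```

Shipping this as an internal package (one import, zero config) is the "platform engineer" move: the devs who won't take the time to instrument still get structured logs.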

https://redd.it/10io87o
@r_devops
Take home assignments during recruitment (Poll)

Got a take-home assignment and tbh it's not difficult. I estimate 8 hours of work to finish and test it (to make sure everything is OK). We are talking about a fully automated deployment. I eventually refused to complete it (did most of it except the CI/CD part and VPC peering), as I think it's a waste of time for a senior DevOps engineer and those questions can easily be asked during a technical interview.

I'm quite frustrated that I spent 4 hours on something useless. Is this the norm in the industry?


Here is the assignment:
Create 3 VPCs (database, application, and public) with multi-AZ
Create an application load balancer in the public subnet
Create an RDS database for the application
Create an ECS or Kubernetes application with a simple NGINX serving any kind of hello world
Create a way for developers to push changes and have them deployed to AWS.

So, as I understand it, what they want is:
3 VPCs, VPC peering (as you can't link security groups otherwise), LBs, a target group, ECS (it's faster than standing up a full-blown k8s cluster), ECR, IAM roles, a CloudWatch log group, the RDS setup, CI/CD deployment (most likely AWS CodeCommit/CodeBuild/CodeDeploy/CodePipeline), all coded in Terraform, done nicely with modules and variables, and ideally a remote-exec to build the image and upload it to ECR.

Is this the norm in the industry? Could we just vote to gauge general opinion?

Thanks


TL;DR What is your opinion about take-home assignments during recruitment?

View Poll

https://redd.it/10iohr4
@r_devops
Where have you had secrets leaked?

Git is the obvious place for secrets to get leaked, with accidental files/changes being committed, etc.

There has been some research recently into places like PyPI, finding secrets in code pushed up in packages.

I was just wondering where else people have seen this kind of issue happen. I guess docs systems are another candidate?
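Wherever text flows (docs systems, wikis, tickets, package registries), the same handful of credential shapes tends to show up, so a crude pattern scan catches a surprising amount. A sketch with just two illustrative patterns:

```python
import re

# A few well-known credential shapes; AWS access key IDs start with
# "AKIA" followed by 16 uppercase alphanumerics, and PEM private keys
# have a fixed header line.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan(text):
    """Return the names of credential patterns found in `text`."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]
```

Running something like this over doc exports and wiki dumps, not just git history, is exactly how the "other candidates" get found.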

https://redd.it/10ir2n0
@r_devops
Rating for my two clusters and their storage IOPS



I'd appreciate it if you could rate the disk read/write IOPS on my two clusters.

Tier one plus (4 hosts) has 24 VMs; storage is a mix of local datastores and NVMe.

Highest disk read: 7,144 (average ~6,000)

Highest disk write: 5,363 (average ~4,000)

Tier one (6 hosts) has 87 VMs; storage is a mix of local datastores and NVMe.

Highest disk read: 49,879 (average ~35,000)

Highest disk write: 11,820 (average ~9,500)

Is this normal, or should more tuning be done to limit IOPS?

https://redd.it/10is4bo
@r_devops
Challenging myself with DevOps - want to see if I’m on the right track

Hey DevOps friends - I'm a hobbyist who has historically used Heroku for all my app deployments. I'm using their recent pricing changes as a good excuse to push my deployment skills further, and would love some guidance if I'm veering off track. I'm comfortable building a frontend, spinning up a backend server/API, and general DB management, and now I really want to dive headfirst into the DevOps world. Heads up: it's a lot of questions/information! Here's the current setup:


1. I have a monorepo set up with NPM workspaces. One workspace has two repos (Svelte frontend and Express backend); the other is a “common” workspace with shared schemas, env configurations, etc. General structure:

deploy/
  backend.dockerfile
  frontend.dockerfile
packages/
  backend/
    index.js
    package.json
    ...routers, controllers, db, etc.
  frontend/
    build/
    src/
    esbuild.config.js
    package.json
common/
  schemas/
    zod-schema-files/
    package.json
  env/
    env-configs/
    package.json
package.json
package-lock.json
docker-compose.yml
nginx.conf



2. I use esbuild to compile and bundle all my frontend assets into a build/ folder. I have an NGINX file set up to serve these assets.


3. There are two Dockerfiles in a deploy/ folder (one for the frontend, one for the backend). The frontend file uses the NGINX image, copies the frontend assets, then copies and installs everything from the frontend repo as well as the shared repo.


This is the config of my frontend.dockerfile, which installs the packages from the frontend repo as well as both common ones, runs my build script, then copies the appropriate files over for my NGINX config:

FROM node:alpine AS web

WORKDIR /build

COPY ./package*.json ./
COPY ./packages/frontend/ ./packages/frontend/
COPY ./common/schemas/ ./common/schemas/
COPY ./common/env/ ./common/env/
RUN npm install
RUN npm run build

FROM nginx:latest

# COPY paths are relative to the build context (the repo root here),
# so ../ would escape the context and fail the build
COPY ./nginx.conf /etc/nginx/conf.d/default.conf
COPY --from=web /build/packages/frontend/build /usr/share/nginx/html/



And the backend file, which copies the backend repo and the common ones again (this seemed extraneous but I couldn't get it running without copying in both places). It also generates my Prisma instance:

FROM node:alpine

WORKDIR /usr/src/app

COPY ./package*.json ./
COPY ./packages/backend/ ./packages/backend/
COPY ./common/schemas/ ./common/schemas/
COPY ./common/env/ ./common/env/
RUN npm install
RUN cd ./packages/backend/ && npx prisma generate

EXPOSE 3003

CMD ["npm", "start"]



These are the contents of my docker-compose file:

version: "3"
services:
  web:
    build:
      context: .
      dockerfile: ./deploy/frontend.dockerfile
    ports:
      - 8000:80
  node:
    build:
      context: .
      dockerfile: ./deploy/backend.dockerfile
    ports:
      - 49160:3003
    depends_on:
      - web



And my nginx.conf file:

server {
    listen 80;
    root /usr/share/nginx/html;
    gzip on;
    gzip_types text/css application/javascript application/json image/svg+xml;
    gzip_comp_level 9;
    etag on;
    index index.html index.htm;

    error_page 404 /404.html;
    error_page 500 502 503 504 /50x.html;

    location ~* \.(?:css|js|map|jpe?g|gif|png|ico)$ { }

    location / {
        autoindex on;
        try_files $uri $uri/ /index.html;
    }
}


Running things locally works fine (I can open the app in localhost, navigate around, hit my API, query my DB, etc.), but I have some questions and confusion on next steps:

1. I see some setups include Postgres and Redis images in their Docker setup. Is this a best practice, or typical just for local testing? Both my prod and dev DBs are set up through AWS, my .env file points at my dev DB URL, and I'm planning to add Redis for caching. I'm struggling to see the benefit of including images in Docker for either.


2. My plan is to host everything via AWS. If my proposed end state is:
-Frontend is routed to my-domain.com, served via NGINX
-Backend is routed to api.my-domain.com, also served via NGINX

Should the NGINX config have locations for both my top-level and subdomain? Or should there be two separate NGINX configurations? If one file, does it matter if the NGINX configuration is with my frontend image?


3. If I plan to use AWS for deployment, what's the best way to do it with Docker Compose? I saw a tutorial where they deployed a web build image via ECS, but I've also seen tutorials recommending EC2. Or could/should they be used in conjunction?


4. I see a lot of different recommendations online about setting up SSL certs (note: I already have my domain and certificates through AWS). From what I gather, it'll be its own Docker image, though I think it largely depends on how the project is ultimately deployed.


5. Presumably CI/CD should manage updates after all tests/checks have passed. It would update the images/containers wherever they're ultimately hosted in AWS, and also take any changes made to my dev DB schema and apply them to my prod DB, correct?


Again: This was a lot, but any guidance for a newbie in DevOps world would be great. I know I'm boiling the ocean to a degree, but that's part of the fun.

https://redd.it/10iwm68
@r_devops
Which monitoring system do you use in your company?

Please explain why you think it is good or bad!

https://redd.it/10iztux
@r_devops
Is there a GitHub Actions equivalent to CircleCI dynamic config?

I’m using a monorepo and only want to run workflows for the affected projects. In CircleCI, this was pretty simple using https://circleci.com/docs/dynamic-config/. GHA doesn’t seem to mention anything similar in its documentation, though it seems some people have done something like it: https://stackoverflow.com/questions/65384420/how-to-make-a-github-action-matrix-element-conditional.

Have you guys done dynamic workflows in GHA, if so, how’d you do it?
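The common GHA substitute for dynamic config is a first job that computes the affected projects and publishes them as a matrix. A sketch of the path-mapping piece, with an entirely invented monorepo layout:

```python
import json

# Hypothetical mapping of monorepo path prefixes to project names.
# A change under libs/shared/ (mapped to None) rebuilds everything.
PROJECTS = {
    "packages/api/": "api",
    "packages/web/": "web",
    "libs/shared/": None,
}

def affected(changed_files):
    """Given `git diff --name-only` output, return the list of projects
    whose workflows should run."""
    hit = set()
    for path in changed_files:
        for prefix, name in PROJECTS.items():
            if path.startswith(prefix):
                if name is None:
                    return sorted({p for p in PROJECTS.values() if p})
                hit.add(name)
    return sorted(hit)
```

The detect job would write `json.dumps(affected(...))` to `$GITHUB_OUTPUT`, and a downstream job would consume it with roughly `strategy: matrix: project: ${{ fromJson(needs.detect.outputs.projects) }}`, which is GHA's closest equivalent to CircleCI's continuation pipelines.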

Best answer gets a beer 🍻

https://redd.it/10j4100
@r_devops
Location of cloud builder workspace?

When you clone a repo in gcloud Cloud Build, where is the actual local workspace located?

https://redd.it/10iue0o
@r_devops
Is there a single specification and a tool which can run pipelines in any ci/cd provider?

CircleCI, Bitbucket Pipelines, GitHub workflows: all of these are similar in nature but differ in their specs (YAML).

To migrate between CI/CD providers, you have to convert your pipeline YAML to the other provider's spec.

Are there tools which work at a higher level, as a superset of all these provider specs, and can produce a provider-specific spec dynamically?
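To make the idea concrete, here's a toy sketch of what such a compiler does: take a provider-neutral pipeline description and emit one provider's shape (GitHub Actions shown; the neutral spec's field names are invented for illustration):

```python
def to_github_actions(pipeline):
    """Compile a neutral {name, jobs: [{name, steps}]} spec into the
    dict shape of a GitHub Actions workflow file. A real tool would
    have one such backend per provider, all fed from the same spec."""
    return {
        "name": pipeline["name"],
        "on": "push",
        "jobs": {
            job["name"]: {
                "runs-on": "ubuntu-latest",
                "steps": [{"run": cmd} for cmd in job["steps"]],
            }
            for job in pipeline["jobs"]
        },
    }

spec = {"name": "build", "jobs": [{"name": "test", "steps": ["make test"]}]}
gha = to_github_actions(spec)
```

The hard part in practice isn't this translation but the features that don't map one-to-one between providers (caching, matrix semantics, secrets), which is why a lossless universal spec stays elusive.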

https://redd.it/10imsw2
@r_devops
Looking for a way to test CI pipeline (gitlab) locally

Hey,


I want to improve our current CI pipeline and was wondering if there is a good solution for testing my changes locally. Something like a GitLab runner that I can set up on my machine, give it my CI definition file, and have it run?
Always having to push the changes to GitLab and check over there whether they work doesn't feel like the best workflow.

https://redd.it/10j7e7l
@r_devops
What do you use for externalizing APIs?

Hey,

Hope it's okay to ask this here, I'm slowly getting into the devops side of things, experience largely based in the legacy era.

I've been asked to look into what we can do to make some of our apps (standard apps, think Jira/Jenkins) available to respond to other SaaS services. I was looking into Apigee and Azure API Management (we are hosted on Azure). I think these may be overkill/overpriced for what we require.

I think we essentially need an API Proxy, something that can receive json and then trigger a process internally to make an API request to the internal apps.

Any suggestions for other solutions I could look into?

Thanks in advance.

https://redd.it/10j89be
@r_devops
DevOps training project ideas

Hi all,

I am currently on a path of self-learning after recently becoming the Cloud/DevOps person at my workplace. I come from a physical/on-prem infrastructure background rather than development.

Looking to push myself and increase practical experience. I've looked through the roadmap and know what I need to learn.

I'm looking for some examples of personal projects I can focus on that would replicate a real world scenario - something I can use to say 'I have demonstrable experience of doing xyz'

Thanks!

https://redd.it/10jbmfn
@r_devops
Are there any DevOps podcasts that you would recommend for learning purposes ?


https://redd.it/10jecii
@r_devops
Self-made API Gateway in a MicroService architecture ?

I'm starting a new project where I want to explore building a microservice architecture for an e-commerce website. The whole ecosystem would be made of the following components:

A `register` microservice that handles account creation (email-password || SSO) and reset-password features
An `auth` microservice that handles authentication by issuing JWTs with embedded permissions
An `inventory` microservice that handles CRUD of sellable items
An `order` microservice that handles order requests from buyers
A `pdf` microservice that handles creation/distribution of PDF invoices using templates
A `storefront` microservice that serves the front-end app to buy stuff


All these services would be independent units running in a self-hosted Kubernetes cluster. I will be writing these in different languages (Python/Golang/Node) and they should still be able to interact with each other. The whole system would be supported by one PostgreSQL database.
I want to support a complete observability stack, with a focus on distributed tracing. For this I will use Loki/Grafana/Tempo/Mimir (LGTM), and hopefully be able to implement log-to-traces and traces-to-metrics.

With this in mind, I have a few questions:

Should I build my own API gateway, or are there tools out there that could do the heavy lifting for me? I imagine this API gateway having an Ingress on api.foo.bar/; requests from my users would be POST api.foo.bar/auth/ + {email,password}, which would then be internally routed to POST https://auth.svc/. I have seen this article, which shows that it shouldn't take more than ~100 LoC in Golang;

Should the API gateway do authorization instead of delegating it to the microservices themselves? Does blocking requests at the gateway level, before they reach the service, have any advantages? I fear this pattern would mean rewriting the gateway's code each time a new service is added.

Would sharing database tables between services be an issue in the future? `auth` and `register` would both be reading/writing a User table. How would you share the database model, and how would you handle migrations?
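On the first two questions: off-the-shelf options do exist. In Kubernetes, an Ingress controller (ingress-nginx, Traefik) or a gateway like Kong already does prefix routing and can enforce auth at the edge. That said, the core of a hand-rolled gateway really is small. A toy sketch in Python (service names and the claim check are illustrative, not a recommendation):

```python
# Map external path prefixes to internal service URLs (illustrative).
ROUTES = {
    "/auth/": "http://auth.svc",
    "/inventory/": "http://inventory.svc",
    "/orders/": "http://order.svc",
}

def resolve(path):
    """Map an external path like /auth/login to the internal upstream URL,
    or None if no service owns the prefix."""
    for prefix, upstream in ROUTES.items():
        if path.startswith(prefix):
            return upstream + path[len(prefix) - 1:]
    return None

def authorize(claims, path):
    """Coarse check at the gateway (is there a valid subject at all?).
    Fine-grained permission rules stay inside each service, so adding a
    new service only means adding a ROUTES entry, not new gateway code."""
    if path.startswith("/auth/"):  # login itself needs no token
        return True
    return bool(claims.get("sub"))
```

The coarse/fine split in `authorize` is one common answer to the second question: the gateway rejects obviously unauthenticated traffic early, while per-endpoint permissions live with the service that owns them.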


I'm still learning about this kind of architecture, so feel free to share your experience with it and any common pitfalls I could avoid !

https://redd.it/10jc02k
@r_devops
Ubuntu 22.04 RDP breaks connection and takes me to the login screen

What is this issue with login when RDP connecting on Ubuntu 22.04? I connect successfully with Remmina, but when I try to switch to just single screen in Displays, the connection breaks and the server takes me to the login screen.

Here it says to install this Gnome extension and log out from Ubuntu (server machine) and then log in from Remmina, but then it won't connect to the logged out Ubuntu at all. How do you solve this?

https://askubuntu.com/questions/1411504/connect-when-remote-desktop-is-on-login-screen-or-screen-locked-without-autolog

https://redd.it/10jke2a
@r_devops
How do you debug and log CI docker-build failures?

I have Jenkins Pipelines connected to the GitHub repositories of my developers' services, with a job that builds each service on commit. Sometimes, of course, the pipelines fail. My Jenkinsfiles are split into stages, but even that doesn't make it clear enough why a build failed. More specifically, if the developer introduced a build error, we have to scroll through logs until we understand in which layer the error occurred and why.
Is there any method or tool to extract build logs, errors, and status without string manipulation or other Jenkins hacks? I could send the result to Slack or any other place, but first I need to make it clear why the pipeline failed, and if it failed on build, why specifically the build failed.
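Short of plugins (Jenkins' Build Failure Analyzer does regex-based log categorization along these lines), a small post-processing step over the console log is a common stopgap before posting to Slack. A hedged sketch; the marker list is guesswork and would need tuning to your toolchain:

```python
import re

# Heuristic markers for "the build broke here"; extend per toolchain
# (Maven, npm, Docker, etc.).
ERROR_MARKERS = re.compile(r"(ERROR|FAILURE|error:|npm ERR!|BUILD FAILED)")

def failure_context(log_text, context=3):
    """Return the first error-looking line plus a few surrounding lines,
    a far more postable summary than the full console log."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if ERROR_MARKERS.search(line):
            lo = max(0, i - context)
            return "\n".join(lines[lo:i + context + 1])
    return "no obvious error marker found"
```

This is still string manipulation under the hood, but confined to one place: the pipeline's failure handler calls it once and ships the snippet, rather than every reader scrolling the raw log.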

https://redd.it/10jgrih
@r_devops
is DevOps just another job now?

So I've been around a while, and when the conversation and thinking began to change in my circles around 2010-2012, DevOps was a concept, a culture: building on Agile principles and bringing in the thought leadership of the likes of Jez Humble and a dozen others.

It was universally felt at that time that you couldn't have a "DevOps engineer", and job postings for that role were mocked as recruiters and orgs not understanding what DevOps was.

Rolling forward 10 years, DevOps engineer is now a commonly found standard role, which seems to me to be an owner/enforcer of automation tooling.

Conversations I'm having with newer "thought leaders" sound very much like DevOps is just tooling and I'm doubting my own understanding or recollections of the past.

Seeking discussion and viewpoints on this, ideally from others who were around pre-2012 and have worked through this evolution.

Does the original DevOps still exist? Did it ever? Or has it become something else now?

https://redd.it/10jnyyd
@r_devops
Career Options as a CS Sophomore

Hello everyone! I am a 2nd-year Computer Science student. I've been checking out DevOps lately and eventually want to work as a DevOps engineer after graduation. However, as I've heard from this sub, I can't work in DevOps straight out of college as a fresh graduate, since I may need experience as a software developer first, and internships are quite rare. If this is 100% true, what other career options can I pursue that'll eventually ease me into DevOps (cloud engineer, solutions architect...)?

https://redd.it/10jnwu2
@r_devops