Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Anybody adopted OpenTelemetry for all observability signals (logs, metrics, and traces)? If so, any thoughts?

Looks like there have been a lot of recent advancements with OTel, and I know traces have always had first-class support. I'm curious if anybody has also adopted OTel to handle their metrics and logs. If so, what's the good/bad/ugly?

https://redd.it/12u3g7z
@r_devops
How do you deploy secret files within a pipeline?

I currently have a Bitbucket pipeline that builds an image and pushes it to Docker Hub, and that's fine and all, but now I need a way to also include a file (a .crt certificate) while building the image and deploying it to Docker Hub.

I am imagining holding said file in a bucket, something called PRODUCT-secrets, then copying all contents of the bucket to a folder within the image, but that would kinda suck to do.

The other alternative, of course, is to simply commit the file to the source code, but then every developer would have access to that file, making it a security risk.

There is also the other alternative, which is to keep the files on the VM that the image gets deployed onto and bind mount them into the container, but then when the VM gets destroyed the files would be lost, which is kinda bad as well.

Is there a better way to do this?
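
One pattern that avoids both the bucket copy and committing the file is to keep the certificate base64-encoded in a secured pipeline variable and decode it to a file at build time. The sketch below assumes that setup; the variable name `CERT_B64` and the destination path in the usage note are hypothetical, and it is an illustration rather than a definitive implementation.

```python
import base64
import os
from pathlib import Path


def write_secret_file(env_var: str, dest: Path) -> Path:
    """Decode a base64-encoded secret from the environment into a file."""
    encoded = os.environ.get(env_var)
    if not encoded:
        raise RuntimeError(f"secret variable {env_var} is not set")
    # validate=True rejects input that is not clean base64.
    data = base64.b64decode(encoded, validate=True)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with owner-only permissions before writing the secret.
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return dest
```

A pipeline step could call `write_secret_file("CERT_B64", Path("build/certs/server.crt"))` just before `docker build`. Keep in mind that anything copied into the image is readable by anyone who can pull it, so mounting the file at runtime may still be the safer option.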

https://redd.it/12u5fpb
@r_devops
Broken websites: are they 2x as frustrating because we know how to fix them?

Specifically: 404 on https://careers.etsy.com/ca/

Or broken profile views when an ad blocker is installed: https://www.reddit.com/r/Etsy/comments/11512uh/favorites_not_loading_properly/

Sometimes it seems like reddit or some other "social media" site is the only way to report these things.

Is there an alternative? Like a "report problem" link with a CAPTCHA. Who does this well?

Or hell, just checking your links with a scanner or checking your logs for errors.

https://redd.it/12udo4x
@r_devops
How Useful has Distributed Tracing Been for You?

Do you think it’s worth investing in? What pain points do you have?

https://redd.it/12ufbqh
@r_devops
Five Rookie Mistakes with Kubernetes on AWS

We recently set up a new Kubernetes cluster on AWS and made some rookie mistakes. In a nutshell:

Not setting resource memory limits
Using EBS storage with more than one AZ
Using the default instance templates
Not using version control for your config

Here is the full story: Five Rookie Mistakes with Kubernetes on AWS

Can you think of any others or any that you have stumbled over yourself?

https://redd.it/12ugpqp
@r_devops
6 months, 1 failed attempt and a lot of studying - got my RHCSA

Finally got this one over the line today, what a rollercoaster 🎢.

Would recommend to anyone though, feels like the biggest cert I've achieved to date. Onwards to the next (CKA + CKS babyyy!)

Cheers

Good night

https://redd.it/12ujta0
@r_devops
Mentors?

I was wondering if anyone can suggest any mentorship programs, either paid or free. I've got so little time nowadays that working on a single project for too long is difficult and I usually lose interest. I was hoping to find someone who can help me structure some kind of plan and aid in my career. Any recommendations are appreciated, even if they're not related to mentorship!

https://redd.it/12udkts
@r_devops
First DevOps Job is a dumpster fire

I started my first DevOps job a few months ago, and it has rapidly shown itself as a total shit show. The team is heavily demoralized, with almost half on the verge of quitting and/or being fired. We are underappreciated and overstressed, with no upper management buy-in for additional personnel or helpful tools.


The team is too busy dealing with day-to-day fires to build appropriate automation, additional tasks and responsibilities keep getting added on, etc. I was specifically hired to improve automation, but spend so much time dealing with daily work that I am unable to. The job is also 100% onsite for anyone who isn’t management or related to the CEO, and the overall compensation is poor.


While I am learning a lot about DevOps tooling, ways of thinking, etc, I am also definitely getting fed up with the overall culture and environment here.

My question is, how long should I stay here before moving on to a better position/company? Is there anything I can do in the meantime to make the future transition easier or to improve things where I am?

https://redd.it/12umxp0
@r_devops
Intermittent DNS issues in EKS

So I've got a simple pytest that deploys a curlimages/curl pod, execs into it and sequentially curls several AWS endpoints (like 'https://s3.us-west-2.amazonaws.com' etc.).

It mostly works. However, sometimes it doesn't, and that's fun. Sometimes curl returns 'Could not resolve host' - for different endpoints, on different nodes etc.

I've tried many things and none have helped. Some coredns tweaks (using the EKS coredns addon, EKS 1.25 if that helps), using an ubuntu image instead of curl based on alpine (read some scary stuff), looking into coredns logs (failed curls don't show up there so I guess coredns is off the hook?), adding some sleeps between execs, using FQDNs ('https://s3.us-west-2.amazonaws.com.'), adding more coredns pods (up to 10 on a 10 node cluster)...

More than that, I can't reproduce this manually (i.e. running multiple infinite loops of kubectl exec -ti curlpod -- curl -vvv url doesn't produce errors **at all**). That's the strangest thing - both pytest and manual kubectl execs run locally from my machine, and pytest fails fairly consistently, while manual repro doesn't, at all.
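
One way to narrow this down is to take curl and `kubectl exec` out of the picture and count plain resolver failures from inside a pod. The sketch below assumes a pod image with Python available; the function name and parameters are made up for illustration, and the `resolve` argument exists only so the loop can be exercised without a network.

```python
import socket
import time
from collections import Counter
from typing import Callable, Iterable


def probe_dns(
    hosts: Iterable[str],
    attempts: int = 20,
    delay: float = 0.0,
    resolve: Callable = socket.getaddrinfo,
) -> Counter:
    """Resolve every host `attempts` times and count failures per host."""
    host_list = list(hosts)
    failures: Counter = Counter()
    for _ in range(attempts):
        for host in host_list:
            try:
                resolve(host, 443)
            except socket.gaierror:
                # The resolver gave up on this lookup.
                failures[host] += 1
        if delay:
            time.sleep(delay)
    return failures
```

If `probe_dns(["s3.us-west-2.amazonaws.com"], attempts=200)` comes back empty while the pytest run still fails, the problem is more likely in how the test drives the exec than in cluster DNS.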

Y'all, I hate computers sometimes.

https://redd.it/12unh2a
@r_devops
Hey guys, just landed a gig as a DevOps release engineer! I'm super stoked but also pretty nervous. Any seasoned vets out there have any tips or advice for a newbie like me? Would love to hear your thoughts!

Hey everyone, I just got a sweet job offer at a company that makes iOS and Android apps. Despite not having many options in these tough times, I accepted the offer and I'm excited to get started.

The company's infrastructure is all on-premise, with Jenkins as the main tool and some Python scripts thrown in to maintain the apps. My main task will be writing pipelines to build the apps with "xcode," but I don't have any experience with mobile app development.

Do you guys have any recommendations for materials I can use to get up to speed before I start next year? Any fellow release engineers out there with advice for a newbie like me? Thanks in advance!

https://redd.it/12ubjcr
@r_devops
I’m looking for my first engineering role and I feel like I’m being marketed for roles I’m not qualified for

Quick background about my skill set
* Work at a large company in a non-technical IT team
* I have all AWS associate certs but I’m not a great coder (basic-intermediate Python and HTML/CSS). I’ve done a few small projects (ex. Cloud Resume challenge)
* I use Linux often and am relatively comfortable with it and some very basic bash

Quick overview of my company IT teams
* We have teams of Tier 1-2 Application Engineers across IT that own 1-2 web apps. They basically just code and use CICD to push small stuff through with a handful of AWS services. Nothing flashy. This is what I envisioned my first step being.
* We also have mixed teams (Technical Project Managers, Scrum Masters, and often just one tier 2-3 Cloud Engineer). These engineering roles are closer to DevOps with Docker, Ansible, Kubernetes and seem quite advanced. This is what is being marketed to me.

My issue - I’m being pushed towards the latter (mixed team) based on my manager’s networking and opportunities arising in my department/subdivision. He seems to think I can do it. The hiring managers seem to think so too, but I really don’t. I envisioned my first engineering role to be on a team of others I can learn from. Instead, it seems like everyone wants me in a team where I’m alone to build things by myself. This is really daunting to me as someone who can barely code. Yeah I know a lot about AWS, but I’ve never even deployed to prod before. I’m going through courses now for front end to complement my Python but it’s going to take me months to even be intermediate. I can’t help but feel like all of these people are overestimating me. My manager talks like a Tier 1 Application Engineer is beneath me (which sounds crazy to me).

Are they overestimating my ability like I think they are? Or am I underestimating myself?

https://redd.it/12uaax5
@r_devops
How can I pass a Terraform variable into a custom_data script block in a Terraform file?

Please accept my apologies if I explain this incorrectly. I’m a self-taught DevOps engineer coming from a traditional Windows infrastructure engineering background, and I got here with some guidance from this page.

I have a script that silently installs an application on an Azure Linux VM, with a configuration, using Terraform. I’m using custom_data and this works great, but it can only silently install the package if I give it the service account password, instance name, etc. This is all sensitive information that I’d rather the bash script get from variables in the variable.tf file.

Is there any way to pass these variables through to the bash script? For example, for the service account password I’d like the bash script to use var.password from my variable.tf file. I’m hitting a dead end trying to add Terraform variables from variable.tf to my script.sh file using custom_data.

Happy to hear any other elegant way to add configuration like this. I have Terraform, Azure, and GitHub Actions to achieve this task; unfortunately I don’t have Ansible or cloud-init to work with. Any help would be appreciated.
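
In Terraform itself the usual route is the built-in `templatefile()` function, wrapped in `base64encode()` for `custom_data`. The idea behind it, rendering `${...}` placeholders in the script and then base64-encoding the result, can be sketched in Python as below. The installer path, its flags, and the placeholder names are hypothetical, and this is an illustration of the technique rather than what Terraform runs internally.

```python
import base64
from string import Template

# Hypothetical install script with ${...} placeholders, the same
# interpolation syntax that Terraform templates use.
SCRIPT = """#!/bin/bash
./install.sh --instance "${instance_name}" --password "${service_password}"
"""


def render_custom_data(script_template: str, variables: dict) -> str:
    """Fill in the placeholders, then base64-encode the rendered script."""
    # substitute() raises KeyError if a placeholder has no value.
    rendered = Template(script_template).substitute(variables)
    return base64.b64encode(rendered.encode("utf-8")).decode("ascii")
```

Rendered values end up in the VM's custom data and in the Terraform state, so marking the variable as sensitive hides it from plan output but does not keep it out of those two places.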

https://redd.it/12uvqeu
@r_devops
Advice after failure

Failed my apprenticeship in the uk

I was a devops apprentice and I worked in cloud.

I tried extremely hard to pass. I worked really hard on my project and created a good presentation.

The feedback says I didn’t show that I met the knowledge of monitoring, CICD, and unit testing.

In the presentation I talked about installing monitoring tools (Azure Log Analytics with Terraform), I talked about configuring Argo CD notifications, and I debugged them with logs showing the credential had expired. I showed the CICD process and I talked about unit testing with kustomize and validation scripts.

It seems like the assessor was extremely harsh or got confused during my presentation because it wasn’t clear enough. In the report they don’t mention any of the above they just hold onto the fact that some of the stuff was broken / being tested by other people at the time of the demo even though I showed previous runs from when it was working.

They’re telling me to resit it I have to pick a brand new project and spend 3 months preparing.

I already was offered a full time position which I started a month or two ago.

I guess I’m just looking for advice.

1) Do you think this is going to affect my employment in any way?

2) Should I try to keep it a secret at work?

3) Should I try to retake it with a new project and spend another 3 months on it or focus on new things?

The qualification was going to be a Level 4 apprenticeship.

Thanks.

https://redd.it/12uw3wp
@r_devops
Building from source or not?

Wrote up a counter argument to "Build Once, Deploy Anywhere" artifact promotion here:
https://dabase.com/blog/2023/build-from-source/

Perhaps I'm being naive about:
A) Build + test can be fast
B) Builds can be reproducible?

https://redd.it/12uy6q4
@r_devops
Terraform giving 403 AuthorizationFailure after accidentally deleting the private endpoint to a storage account

I added a wrong configuration (multiple subresource names for an endpoint) and applied instead of planned. My old private endpoint got deleted and now I get this 403 error whenever I try to reapply with the good configuration.

I am applying it from a GitHub workflow (that I did not create, because I am a beginner). Can anyone give me a suggestion? I also tried creating the endpoint manually, but for some reason the organization rules I am under did not allow me to manually add a private DNS zone. So now I get "Failure sending request, status code = 0, context deadline exceeded". Help please?

https://redd.it/12u3kdh
@r_devops
Question: How to approach a system architecture design problem when needing to balance expandability and development

I have a project I want to take seriously which involves designing an app that performs similarly to Google Maps (mostly geo queries).

I need to store data about sites as well as their position and allow searching and filtering by location (as well as other parameters).

### Some technical data:

- Support around 1,000 monthly users (similar use time to DoorDash or Uber).

- Support around 1,000 geolocations

- Support fast image queries (around 5,000 images), editing them is rare.

- Support user authentication and authorization.


Since I’m working alone I need to build something that will serve as a prototype and would be useful to convert into a real usable system by someone more experienced than me.

### My current plan:

1) Use a no-sql database and basic geohashing to store data.

2) Store the database in a centralized server (since all locations will probably start in 1 city as a pilot).

3) Use a REST API to interact with the database (mostly due to scalability issues I might face). Avoid microservices for now (I’m really not sure about this so I need your help).

4) Use an SQL database with blobs to store images. Alternatively use buckets, depending on how hard it would be to implement and how important it would be.

5) Use a third party service (such as firebase) to support push notifications and 2 factor authentication.
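
To make the "basic geohashing" in step 1 concrete, here is a minimal sketch of the standard geohash encoding: repeatedly halve the longitude and latitude ranges, interleave the resulting bits starting with longitude, and emit one base-32 character per five bits. The function name and the default precision are illustrative choices, not something from the plan above.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            # Five bits make one base-32 character.
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, precision=4))  # u4pr
```

Nearby sites usually share a hash prefix, so storing the hash as an indexed string lets a prefix query narrow the candidates before an exact distance check. Points on either side of a cell boundary can have different prefixes, so a real search would also look at neighbouring cells.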


I’m a software engineer and most of my experience has been on building web apps, so I need some feedback about my plan.

- What can I give up on to make development easier and faster, while not requiring to rewrite everything later?

- Are there any things that I must do that aren’t in my list?

https://redd.it/12v00z8
@r_devops
What are good job boards for senior (ICs and managers/directors) EU-based remote jobs?

Hey all! I'm looking for information and opinions from both hiring managers/recruiters and job seekers. What are the best places to look for/post jobs for remote EU-based SRE/DevOps roles, ideally at a senior IC level and above, and for the management side, e.g. Manager, Director, VP.

Most job boards I've seen are overwhelmingly US-focused and have very little traction with EU roles/viewers. Some don't even have a way to filter only EU-based posts. The best I've seen so far is LinkedIn, but I feel like there must be something better. What say you, r/DevOps?

(and yes, I used the search, and I considered other subreddits but this one seems appropriate given this is a discussion. mods correct me if I'm wrong)

https://redd.it/12uzvdy
@r_devops
Are there things like Ephemeral Virtual Machines that can be used in CI system?

So I have prepared an Ubuntu-based VM via Packer + Ansible for our team at work. It all works smoothly. Here is the catch: the team wishes to run some integration tests in the VM.

AFAIK Packer won't be suitable in this case because on successful runs, Packer (with the qemu plugin) will try to generate a new VM on the test server, which is not what I want. I want something where I can spin up my VM, push the test dirs into it, trigger them, and throw the VM away after successful testing.

The only solution I can think of at the moment is spinning up the VM via a bash script with qemu-system-x86_64 commands and then achieving what I want from there.
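
For reference, the throwaway part of that approach mostly comes down to QEMU's `-snapshot` flag, which sends disk writes to temporary files and discards them when the VM exits. Below is a rough sketch of a wrapper a CI job could use, assuming a qcow2 image built by Packer; the function names, the default port, and the memory size are illustrative, and waiting for SSH and copying tests in are left out.

```python
import subprocess
from contextlib import contextmanager
from typing import Iterator, List


def qemu_command(image: str, ssh_port: int = 2222, memory_mb: int = 2048) -> List[str]:
    """Build a QEMU command line that never writes back to the base image."""
    return [
        "qemu-system-x86_64",
        "-m", str(memory_mb),
        "-display", "none",
        "-snapshot",  # disk writes go to temp files and are dropped on exit
        "-drive", f"file={image},format=qcow2",
        # Forward a host port to the guest's SSH port for pushing tests in.
        "-nic", f"user,hostfwd=tcp::{ssh_port}-:22",
    ]


@contextmanager
def ephemeral_vm(image: str, **options) -> Iterator[subprocess.Popen]:
    """Start the VM for the duration of a `with` block, then stop it."""
    proc = subprocess.Popen(qemu_command(image, **options))
    try:
        yield proc
    finally:
        proc.terminate()
        proc.wait(timeout=30)
```

Inside `with ephemeral_vm("ubuntu.qcow2"):` the job would wait for SSH on the forwarded port, copy the test dirs over, and run them; the base image is left untouched whatever the outcome.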

Is there some tool that can integrate well with my GitLabCI system to achieve this ephemeral VM logic?

I thought about Vagrant but I doubt it might be a good solution since I see it more as a development tool.

https://redd.it/12v4gwe
@r_devops
Github actions conditional matrix

Hi all,

Trying to rein in the GitHub Actions runners and can't seem to figure out how to do it, or even if it's possible.
Here's a trimmed version of the workflow file:

    name: Build on Pull Request

    on:
      pull_request:
        paths:
          - 'path/test1/**'
          - 'path/test2/**'
          - '.github/workflows/this-file.yaml'

    jobs:
      build:
        runs-on: ubuntu-latest

        permissions:
          id-token: write
          contents: read
          pull-requests: write

        strategy:
          matrix:
            env:
              - "dev"
              - "prod"
            dirs:
              - dummy
            exclude:
              - env: ${{ github.base_ref == 'main' && 'dev' || github.base_ref == 'develop' && 'prod' }}

        steps:
          - name: Build
            run: build something
            working-directory: ${{ matrix.dirs }}
          - name: Validate
            run: validate something
            working-directory: ${{ matrix.dirs }}

The problem is that if I want to make it more complex for other items, I am having trouble understanding how to do it.
For example, if I wanted to change the format to:


    env:
      - account: "1234567890"
        name: "dev"
      - account: "0987654321"
        name: "prod"




How can I go about getting the same behaviour? The documentation for [exclusions](https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs#excluding-matrix-configurations) is extremely lightweight and deals more with hardcoded values than conditional logic.


Trying to add logic with `if` fails. Here is my attempt:

    strategy:
      matrix:
        env:
          - account: "1234567890"
            name: "dev"
          - account: "0987654321"
            name: "prod"
        dirs:
          - dummy
        exclude:
          - env:
              - account: "1234567890"
                name: "dev"
            if: github.base_ref == 'main'
          - env:
              - account: "0987654321"
                name: "prod"
            if: github.base_ref == 'develop'


Any help would be appreciated
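
One workaround for this kind of conditional matrix is to build the matrix in an earlier job and load it with `fromJSON` in `strategy.matrix`, since matrix entries themselves do not take an `if`. The sketch below shows a script such a job could run, reusing the account values and base-branch rules from the workflow above; the function names are illustrative.

```python
import json
from typing import Dict, Iterable, List

ENVIRONMENTS: List[Dict[str, str]] = [
    {"account": "1234567890", "name": "dev"},
    {"account": "0987654321", "name": "prod"},
]

# Same rule as the original exclude expression:
# PRs into main skip dev, PRs into develop skip prod.
EXCLUDED_BY_BASE = {"main": "dev", "develop": "prod"}


def build_matrix(base_ref: str, dirs: Iterable[str]) -> dict:
    """Return the matrix with the environment excluded for this base branch removed."""
    excluded = EXCLUDED_BY_BASE.get(base_ref)
    envs = [env for env in ENVIRONMENTS if env["name"] != excluded]
    return {"env": envs, "dirs": list(dirs)}


def matrix_output_line(base_ref: str, dirs: Iterable[str]) -> str:
    """Format the matrix as a single `name=value` line for a job output."""
    compact = json.dumps(build_matrix(base_ref, dirs), separators=(",", ":"))
    return "matrix=" + compact
```

The first job would append `matrix_output_line(...)` to the file named by `$GITHUB_OUTPUT` and expose it as a job output; the build job then sets `matrix: ${{ fromJSON(needs.<job>.outputs.matrix) }}` and reads `matrix.env.account` and `matrix.env.name` in its steps.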

https://redd.it/12v5bcm
@r_devops
Nomad agent dev - missing drivers (docker)

Hello,

I'm desperately trying to run ***any*** job with `nomad agent -dev`. I am using nomad `v1.5.3`.

Unfortunately I get something like:

```
2023-04-22T18:12:02+02:00: Task Group "redis-test-web" (failed to place 1 allocation):
* Constraint "missing drivers": 1 nodes excluded by filter
2023-04-22T18:12:02+02:00: Evaluation "949981aa" waiting for additional capacity to place remainder
```

The information I could get was to reboot and/or restart the `docker` service (did not work). I even uninstalled and reinstalled `nomad`, but didn't work either.

So in summary:
* Rebooting didn't work
* My user is in the `docker` group
* Reinstalling `nomad` didn't work/help either

Where can I get more information? I tried the logs (terminal output) but they didn't really help me.

I am using `archlinux`.

Thank you very much in advance for any help

https://redd.it/12vb0x3
@r_devops
Do you know, or is it worth knowing, any front-end languages?

I know languages like Python, Bash, etc. are useful for DevOps, but what about front-end languages like basic HTML, CSS, JavaScript?

https://redd.it/12vd3ni
@r_devops