Reddit DevOps
270 subscribers
5 photos
31K links
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Write Docker image size and build date to a file and include it in the image

I want to be able to read the container image size and build date from a file in the container after it's published and while it's running.
I'm also working on a bash script to read the date from the file, but I'm having some issues.
Any suggestions or help greatly appreciated!
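One hedged sketch of how this could work, assuming a hypothetical Dockerfile that declares an ARG BUILD_DATE and writes it to a file (the docker commands are shown as comments since they depend on your build context):

```shell
# Capture the build date once, in UTC, so it can be passed into the image
# via a build arg and baked into a file there.
BUILD_DATE="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
echo "BUILD_DATE=$BUILD_DATE"

# In the build pipeline (hypothetical image name/Dockerfile, not run here):
#   docker build --build-arg BUILD_DATE="$BUILD_DATE" -t myapp .
# with these lines in the Dockerfile:
#   ARG BUILD_DATE
#   RUN echo "built: $BUILD_DATE" > /etc/build-info
#
# The image size is only known after the build, so record it afterwards,
# e.g. as a label or alongside the published artifact:
#   docker image inspect myapp --format '{{.Size}}'
```

Anything running in the container can then read /etc/build-info. The size is the awkward part: writing it into a layer would change the size again, which is why a label or an external record is the usual compromise.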

https://redd.it/yk33ls
@r_devops
I wrote an OSS tool to tunnel your IDE to Kubernetes

Since the day I started my DevOps journey, it was always a dream of mine to create an open-source devtool.
I co-wrote a tool called #KubeTunnel, which connects your local development environment to your Kubernetes cluster for debugging complex microservice architectures: without deploying them locally, without waiting for a long CI/CD process, and without any syncing mechanism to the cluster.
This lets you develop exactly as you would locally, with the added benefit of full network access to and from your cluster.

Check it out here: https://github.com/we-dcode/kubetunnel


*Buy me a cup of coffee by leaving a star on GitHub 🌟*

https://redd.it/yk2i5b
@r_devops
Different IaC environments on cloud

So I've been working with IaC (Terraform and CloudFormation) on AWS for a while. I've touched on simple environment stacks where Dev, SIT, UAT and Prod are identical; this makes trunk-based development very simple and easy.

However, I also touched on more complicated environments where the application stack uses different AWS services in different environments to save cost.

Just as an example, Dev may only use EC2 instances to run the app, then UAT will add an ASG, and Prod will use ASG + ALB...

I'm curious to know whether this practice of using different services in different environments is normal? I find it very difficult to make an IaC change to, say, the ALB when it only exists in Prod.

In my opinion, UAT should be an exact replica of Prod, so testing can be done in UAT (non-production) at the least... This still makes me wonder what branching and coding strategy is right for this type of infrastructure requirement.

Has anyone else here faced similar challenges?
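One way to soften the "ALB only exists in Prod" problem is to keep a single code path and gate prod-only resources behind a flag, so every environment runs the same module with different tfvars. A sketch with hypothetical variable names:

```hcl
variable "enable_alb" {
  type    = bool
  default = false # set to true only in prod's tfvars
}

resource "aws_lb" "app" {
  count              = var.enable_alb ? 1 : 0
  load_balancer_type = "application"
  subnets            = var.public_subnets # hypothetical variable
}
```

This keeps trunk-based development workable: an ALB change shows up in every environment's plan (as a no-op where count is 0) instead of living in a prod-only code path.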

https://redd.it/yk3ppf
@r_devops
Datadog has OAuth Support Now

I'm a little surprised it took them this long, but now I expect several companies will build on top of it. For example, LambdaTest can show test results from within Datadog: https://www.datadoghq.com/blog/oauth/

It's not clear yet which endpoints are exposed, but I imagine documentation will be forthcoming, and hopefully self-serve submissions too.

https://redd.it/yk6whi
@r_devops
How do you control images pulled from public image repositories like DockerHub?

We have a need to control which images a developer can source from DockerHub. Ideally we only want them to pull verified, approved images. But how do we ensure that only approved images are sourced?

For any images brought in, we want to have them scanned to ensure that they are safe to use. But are there any other controls you'd recommend?

I work in a highly regulated industry and our risk tolerance is very low. The more safeguards, the better. But we are new to container management.
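One common safeguard in this situation is to block direct Docker Hub access at the network level and route all pulls through an internal curated registry (Harbor, Artifactory, etc.) where scanning and allow-lists are enforced. A minimal sketch of the Docker daemon side, assuming a hypothetical internal mirror hostname (daemon.json):

```json
{
  "registry-mirrors": ["https://registry.internal.example.com"]
}
```

Note that a mirror alone only redirects Docker Hub pulls; actually preventing unapproved images usually also needs egress firewall rules and, on Kubernetes, an admission policy that rejects images from outside the approved registry.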

https://redd.it/yk90ba
@r_devops
Guidance on provisioning QEMU VM images based on specific hardware products

## Description
I work for a company that mainly develops custom industrial-grade computer hardware. As part of the software, we ship the hardware with an Ubuntu image with all the bells and whistles in it (think Docker, Linux Cockpit, necessary configuration, container images).

### Tools Used
- Cloud-init (first-boot provisioning)
- HashiCorp Packer with the QEMU plugin for x86_64
- Ansible (post-processor provisioning)

### Resultant Output

I have `qcow2` images that are successfully pushed to our internal artifact registry.


## Query

Since we produce a couple of different hardware products in-house, I would like to separate the provisioning of the QEMU virtual machine images based on the hardware product family.

The only problem is that in a QEMU virtual image, hardware-derived Ansible facts generally do not work. We build the images in a CI system, then create the filesystem tarballs and boot them "manually" in the post-production stage of the hardware.

Is there some way I can create Ansible roles that provision according to the product hardware family, without actually running on "actual hardware"?

### TL;DR

How do I create Ansible roles for diverse hardware products when provisioning images virtually using QEMU?
e.g.
Product A --> consists of APT packages x,y,z,docker
Product B --> consists of APT packages x,z,docker
Product C --> consists of APT packages y,docker

etc.
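Since hardware facts aren't available inside a QEMU build, one approach is to not rely on facts at all and key the provisioning off an explicit variable injected by CI (e.g. `-e product_family=product_a`). A sketch with hypothetical package and variable names:

```yaml
- hosts: all
  vars:
    family_packages:
      product_a: [x, y, z, docker.io]
      product_b: [x, z, docker.io]
      product_c: [y, docker.io]
  tasks:
    - name: Install the APT package set for this product family
      ansible.builtin.apt:
        name: "{{ family_packages[product_family] }}"
        state: present
```

The same variable can also drive which roles get applied per family, and any checks that genuinely need real hardware facts can be deferred to first boot via cloud-init.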

https://redd.it/ykfuf7
@r_devops
DevOps for generated art?

Not sure if this is the correct subreddit to post in, but here goes. (Feel free to point me to a more appropriate one.)

I am getting into generated art, which is heading in the direction of AI. I want to deploy some sort of pipeline of AI tools/services, but I don't know where to start. What tools should I be using? What AI models are simple to deploy and use?

If anyone has experience doing this, I'd love to hear from you.

Thanks!

https://redd.it/ykqhou
@r_devops
I need help with jq

Hey all. Hope it's OK to post this question here, since the context for what I'm trying to do with `jq` is an automation/monitoring task my team is working on.

I have a JSON payload with the following structure:

{
  "bigArray": [
    {
      "key1": "value1",
      "key2": "value2",
      "key3": "value3",
      "key4": "value4",
      "key5": "value5",
      "key6": "value6",
      "key7": "value7"
    },
    {
      "key1": "value1",
      "key2": "value2",
      "key3": "value3",
      "key4": "value4",
      "key5": "value5",
      "key6": "value6",
      "key7": "value7"
    },
    {
      "key1": "value1",
      "key2": "value2",
      "key3": "value3",
      "key4": "value4",
      "key5": "value5",
      "key6": "value6",
      "key7": "value7"
    },
    ...
  ]
}

I must parse/reduce this JSON. I don't care about all the key/value pairs; I only care about, say, key2 and key4. So I need a `jq` query that takes the JSON above as input and generates the JSON below as output:

{
  "bigArray": [
    {
      "key2": "value2",
      "key4": "value4"
    },
    {
      "key2": "value2",
      "key4": "value4"
    },
    {
      "key2": "value2",
      "key4": "value4"
    },
    ...
  ]
}

I have no clue how to do this. Can anyone help? I've been Googling things like "filter by key" but no luck so far.
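For what it's worth, assuming bigArray is a JSON array of objects, a filter along these lines should do it (`|=` reassigns the field in place, and `{key2, key4}` is jq shorthand for `{key2: .key2, key4: .key4}`):

```shell
# Reduce each object in .bigArray to just key2 and key4.
echo '{"bigArray":[{"key1":"a","key2":"b","key4":"c","key5":"d"}]}' \
  | jq -c '.bigArray |= map({key2, key4})'
# → {"bigArray":[{"key2":"b","key4":"c"}]}
```

With a file instead of echo: `jq '.bigArray |= map({key2, key4})' input.json`.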

https://redd.it/yky998
@r_devops
Are you concerned about the economy and potential layoffs?

What skills are you brushing up on, and trying to pick up on in case the inevitable happens?

I feel like our org could strip 80% of our Agile <insert buzzword here> roles lol..

https://redd.it/yl4c6n
@r_devops
How do you avoid DevOps jobs that are really just ops / sysadmin jobs?

Title. How do you pick out the actual DevOps / SWE-Infra jobs from the ones that are really just sysadmin jobs?

https://redd.it/yl3845
@r_devops
(RANT) Gov DevOps is Difficult


Run away from any environment in which you do not have complete control of / access to everything. All the pain you will experience is not worth it unless you are getting paid six figures.

https://redd.it/yl7z0l
@r_devops
GitOps as an audit log is not very accessible or informative.

Wrote a blog post about something that was bugging me for a long time:

GitOps as an audit log is not very accessible or informative.

https://gimlet.io/blog/three-problems-with-gitops-as-deployment-history-and-how-we-overcome-them

But I like the GitOps approach, so I wanted to fix this, and I believe many others have made attempts at doing so. What do you think of the issue? How did you solve it?

https://redd.it/yl60ax
@r_devops
How to communicate to my manager that our implementation of Ansible is totally wrong?

Title.

Last month I started working for a new company. We work with Ansible and automate mostly simple tasks within our organization: loads of LDAP management, some infra, etc. In my experience with Ansible I've never come across an environment like the one we have now, and I know that none of the best practices are being followed. Things that should be simple playbooks are created as roles. The roles have only one main.yml tasks file and a couple of variables in defaults/, but absolutely nothing else. Stuff like that should just be playbooks, whilst roles should contain more than a couple of things (templates, vars, files, etc). They also create new roles which are 90% import_roles from other places and only 10% "new" tasks. Needless to say, this creates dependency hell. What happens when they update one role? They need to update it in another 50 places. Ah... they don't use Ansible tags either.

I believe this environment is beyond salvation at this point. It's been going on for a long time, so there is a lotttt of work done following these implementations, and fixing it would also require a change of mindset. How do I tell this to my new manager without sounding like a moron, and without my teammates disliking me for basically telling them their work is done wrong? I wanted to create some sort of analysis of the situation and present it to my manager, just to explain why this does not follow standards, and also to provide a better understanding of what steps should be followed to improve our work environment... while admitting that everything we have done so far would take too long to repair, so we should change that way of working from now on.

https://redd.it/ylebse
@r_devops
How do you handle metrics aggregation over a period of time? Sliding window?

Let's say you are monitoring a metric and you want to alert off of a timespan, not just a single instance.

Let's say CPU. You want to alert if CPU is over 80% for 5 minutes. Do you use static 5-minute analysis windows? E.g. on minute 5, average minutes 1-5; on minute 10, average minutes 6-10; on minute 15, average minutes 11-15, etc. So if the threshold is exceeded over minutes 1-5, the alert condition would hold for the next 5 minutes, until minute 10 when it is recalculated and possibly resolved.

Or do you approach this with a sliding window? E.g. on minute 5, average minutes 1-5; on minute 6, average minutes 2-6; on minute 7, average minutes 3-7; etc. If the threshold is exceeded over minutes 1-5, the alert condition would hold for only the next minute, until minute 6 when the recalculation puts it below the threshold.

I feel like the sliding window is more accurate because it doesn't "reset" the counter at every duration milestone, but I'm curious what the industry-standard approach is.
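The sliding-window variant described above can be sketched with a small ring buffer; here in awk over a toy stream of per-minute CPU readings, using the 80%-over-5-minutes numbers:

```shell
# Keep the last 5 readings in a ring buffer; alert when their average > 80.
printf '%s\n' 90 85 88 92 70 60 | awk '
  { buf[NR % 5] = $1 }                # overwrite the oldest sample
  NR >= 5 {                           # window is full from minute 5 onward
    s = 0; for (i in buf) s += buf[i]
    avg = s / 5
    printf "minute %d: avg %.1f%s\n", NR, avg, (avg > 80 ? " ALERT" : "")
  }'
# prints:
#   minute 5: avg 85.0 ALERT
#   minute 6: avg 79.0
```

Note how minute 6 clears the alert one sample after the spike leaves the window, whereas the static-window scheme would keep the alert active until the minute-10 recalculation.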

https://redd.it/ylf52s
@r_devops
Jenkins Error - doesn’t look like a JDK directory

Hi all

I'm getting the following error in Jenkins while trying to specify JAVA_HOME:

"/usr/lib/jvm/java-11-openjdk-11.0.16.0.8-1.amzn2.0.1.x86_64 doesn’t look like a JDK directory"

However, Java is installed at this directory, as can be seen below:

echo $JAVA_HOME

/usr/lib/jvm/java-11-openjdk-11.0.16.0.8-1.amzn2.0.1.x86_64

[root@ip-172-31-xx-xxx jvm]# ls

java-11-openjdk-11.0.16.0.8-1.amzn2.0.1.x86_64 jre jre-11 jre-11-openjdk jre-11-openjdk-11.0.16.0.8-1.amzn2.0.1.x86_64 jre-openjdk


I don't think I should be receiving this error. I tried changing it to jre-11-openjdk-11.0.16.0.8-1.amzn2.0.1.x86_64, but that also didn't work.
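For what it's worth, Jenkins's complaint boils down (roughly) to "there is no usable bin/java under that directory for the user Jenkins runs as". A quick sanity check, using a hypothetical check_jdk helper:

```shell
# Verify the directory has an executable bin/java; run this as the
# jenkins user, since file permissions are a common culprit.
check_jdk() {
  if [ -x "$1/bin/java" ]; then
    echo "ok: $1"
  else
    echo "not a JDK dir: $1"
  fi
}

check_jdk "${JAVA_HOME:-/usr/lib/jvm/java-11-openjdk-11.0.16.0.8-1.amzn2.0.1.x86_64}"
```

If bin/java is there and executable but the error persists, the directory may effectively be a JRE; on Amazon Linux 2 the full JDK (including javac) typically comes from the java-11-openjdk-devel package.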

Any help would be appreciated. Thanks in advance.

https://redd.it/ylk5k7
@r_devops
Looking for opinions on spinning up dev/staging environment databases.

As the title says, I'm trying to work out a good plan for spinning up ad hoc environments with Pulumi, and the databases are the sticking point. All DBs are SQL Server on Azure.

Prisma is implemented for some of the newer stuff, so theoretically I can set up the DB migrations to run as part of the release pipeline, but there are some legacy DBs with a fair amount of stored proc code & lookup tables (and no setup script files).

Ideally I'd like to be able to do any of the following based on the needs of the environment:

1. Spin up a copy of staging/prod with data
2. Spin up an empty copy of the database, with stored procedures, table schema, and lookup tables
3. Spin up a copy of the database with sanitized or faked data

1 & 2 are sufficient, but if there are tools out there to help with 3 (without my having to write a sanitize script), that would work.

https://redd.it/yljepd
@r_devops
Deploying Next site + Node app and database.

Hey all, I'm looking to host a community site I built in NextJS using server-side rendering, which is the least of my worries. I'm also trying to host an instance of Directus (a Node app that interacts with a DB and adds an API layer). I'm trying to find a cost-effective way of hosting this setup that is reliable. I don't mind doing heavy lifting, but I also don't want to over-engineer, and I'd prefer a setup I could rebuild for similar projects in the future.

I figured I could go the droplet route and set everything up myself, which I went ahead and did 90% of today, but I ran into issues with the reverse proxy on nginx. It was also a bit hefty, though I could probably build an Ansible role to do most of it. I was thinking maybe Docker would be a way to make this happen? Directus has a Docker image and I could use a Postgres image as well; I just don't know how well that works in production, or whether I should just host it on a droplet or a container service (sounds pricier, but idk).
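Since Directus publishes an official image, the Docker route can be sketched as a small Compose file; the values below are placeholders to change, and KEY/SECRET are Directus's own required config variables:

```yaml
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_USER: directus
      POSTGRES_PASSWORD: change-me
      POSTGRES_DB: directus
    volumes:
      - db-data:/var/lib/postgresql/data

  directus:
    image: directus/directus
    depends_on: [db]
    ports:
      - "8055:8055"
    environment:
      KEY: replace-with-a-random-value
      SECRET: replace-with-a-random-value
      DB_CLIENT: pg
      DB_HOST: db
      DB_PORT: "5432"
      DB_DATABASE: directus
      DB_USER: directus
      DB_PASSWORD: change-me

volumes:
  db-data:
```

nginx then only has to reverse-proxy two upstreams (the Next server and port 8055), so the droplet work you've already done stays mostly intact.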

Sorry if I'm not super clear; I'm just trying to find a way to make this work and keep it cheap. I imagine traffic will be pretty damn minimal, so I don't think I need much. I also don't want to over-engineer or have things band-aided together.

thanks.

https://redd.it/ylq6md
@r_devops
GCP from AWS

Beyond searching for the equivalent of the services between cloud providers (e.g. EC2 vs Compute Engine), are there any tips and advice one could share for organizations switching from AWS to GCP?

For starters, I’ve found that there are no accounts, but instead groupings based on “projects” in GCP.

https://redd.it/ylpz5h
@r_devops
Scaling Your Team From 5 to 250+ Engineers: FULL Checklist from your feedback!

A few weeks ago I shared a post on here about scaling your engineering organization from 2 to 250 engineers. It was a long blog post that detailed the stages of growth and what to do in terms of Velocity, Quality and Outcomes.

The feedback I got on that post was honestly overwhelming!

I love this community, and your comments and suggestions were truly valuable as I've been putting together something a bit more extensive for engineering leaders... a full checklist to help navigate these stages, step by step: what to focus on in terms of yourself as a leader, your teams, and your processes. I included items on culture (something a lot of you brought up), and each checklist item has extra resources so you can explore more :)

It came out on Product Hunt a couple of hours ago, so you can check it out there, and if you like it, give it an upvote!

This checklist is a living thing, and it really wouldn't be possible without this community. So if you have more feedback and suggestions, let me know in the comments, as I'll be adding more items and resources as they come!!

Thank you so much for all your support on this!

https://redd.it/yltt9e
@r_devops