Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
* It simplifies managing traffic between microservices by providing a unified control plane.
* It provides built-in security features that help to secure communication between microservices.
* Istio makes it easier to manage and monitor microservices by providing centralized visibility and control over the network.
* Istio provides an API-driven approach to traffic management that makes it easy to manage and control the flow of requests between microservices.

## Problems with using Istio in Kubernetes Jobs and Docker images built using “FROM scratch”

Kubernetes Jobs (and CronJobs) are used to run batch workloads, and when the main container completes, the Istio proxy sidecar continues to run, which can cause issues with resource usage and security.

To terminate the Istio proxy sidecar, you need to run the following command in your main container:

wget -q --post-data='' -S -O /dev/null http://127.0.0.1:15020/quitquitquit

However, this can be a challenge when the main container is built from the "FROM scratch" Docker image, as it does not include the wget/curl tools needed to call the Istio proxy's shutdown endpoint.

In such cases, there are a few alternative options to consider:

* Modify the Docker image to include the wget tool: You can modify the Docker image to include the wget tool by using a base image that includes it or by installing it directly in the Dockerfile. This solution would require modifying the existing Docker image and could potentially impact the size of the image.
* Configure Istio to automatically terminate the proxy: Istio can be configured to automatically terminate the proxy when the main container completes. This can be done by setting the terminationGracePeriodSeconds property in the Kubernetes Job definition. This solution would require modifying the existing Kubernetes Job definition and may require additional configuration changes in Istio; we also need to be sure that the main container has actually completed.
* Use init containers to copy the needed tools into the main container.
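If the first option is acceptable, a minimal sketch (assuming the app is a single statically linked binary, /myapp) could look like this, swapping FROM scratch for busybox:musl:

```dockerfile
# Sketch of the first option (assumes /myapp is a statically linked
# binary): base the image on busybox:musl, which ships a static
# /bin/busybox whose applets include wget.
FROM busybox:musl
COPY myapp /myapp
ENTRYPOINT ["/myapp"]
```

The image grows by only a couple of megabytes, and the Job's command can then call `wget` (or `busybox wget`) without any shared volume.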

Let’s see what the third option looks like. 

An init container is a special type of container that runs before the main container in a Pod.

In this case, the init container can be based on the **busybox** image and have the following steps:

* Copy /bin/sh and /bin/busybox to a volume (e.g. emptyDir) shared between the init container and the main app.

The command for the main container will be set as follows:

set -x; EXIT_CODE=0; /myapp || EXIT_CODE=$?; /share/busybox wget -q --post-data='' -S -O /dev/null http://127.0.0.1:15020/quitquitquit; exit ${EXIT_CODE}

The above command runs the main application /myapp and stores an exit code in the EXIT_CODE variable. When the main application finishes, the Istio proxy sidecar is terminated by sending a POST request with wget to http://127.0.0.1:15020/quitquitquit. The exit code of the main application is then returned so that the correct exit status is passed back to the system.
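The exit-code capture is the subtle part of that one-liner; here is a stand-alone sketch of the same pattern in plain shell, with a stand-in command in place of /myapp and a no-op in place of the wget call:

```shell
# Demonstrates `cmd || EXIT_CODE=$?`: run a "main app" that fails with
# code 42, do the cleanup step, then exit with the captured code.
run_with_capture() {
  EXIT_CODE=0                   # default for the success path
  ( exit 42 ) || EXIT_CODE=$?   # stand-in for /myapp
  true                          # stand-in for the wget call to the sidecar
  return "${EXIT_CODE}"
}
rc=0
run_with_capture || rc=$?
echo "captured exit code: ${rc}"   # → captured exit code: 42
```

The default assignment matters: if /myapp succeeds, nothing else sets EXIT_CODE, and `exit ${EXIT_CODE}` would otherwise expand to a bare `exit`.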

Here is an example of a Kubernetes CronJob YAML definition that implements the solution:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-cronjob
spec:
  schedule: "0 0 * * *" # run every day at midnight
  jobTemplate:
    spec:
      template:
        spec:
          volumes:
            - name: share
              emptyDir: {}
          initContainers:
            - name: init
              image: busybox:musl
              command: ["cp", "/bin/sh", "/bin/busybox", "/share/"]
              volumeMounts:
                - name: share
                  mountPath: /share
          containers:
            - name: main
              image: my-main-container-image
              command:
                - /share/sh
                - -c
                - set -x; CODE=0; /myapp || CODE=$?; /share/busybox wget -q --post-data='' -S -O /dev/null http://127.0.0.1:15020/quitquitquit; exit ${CODE}
              volumeMounts:
                - name: share
                  mountPath: /share
              livenessProbe:
                tcpSocket:
                  port: 15020
              readinessProbe:
                tcpSocket:
                  port: 15020
          restartPolicy: Never

In addition, the above definition includes liveness/readiness probes on the main container that check port 15020 to verify the Istio proxy container has started and is ready to serve traffic. Keep in mind that this CronJob spec doesn't include the special annotations/labels needed to enable istio-proxy sidecar injection.

## Conclusion

When using Istio with Kubernetes Jobs, it is important to consider the limitations of the main container's Docker image and the tools it includes. By exploring the alternatives above, such as modifying the Docker image, configuring Istio, or copying tools in with an init container, you can ensure that your Kubernetes Jobs run smoothly and securely with Istio.

https://redd.it/120jdfe
@r_devops
Syntax error Unexpected token after deployment

Hi guys. I have this app which is deployed on Kubernetes. I was previously using localhost, and have now shifted to a domain and deployed. But I can't solve this issue: when I try to access the app through the domain, it gives the error “Uncaught SyntaxError: Unexpected token ‘<’ at main.f2d33……”. Please help me

https://redd.it/120n0ii
@r_devops
Did you become a Devops Engineer instead of SWE because you suck at leetcode?

Not to belittle anybody - we all know the devops space is extremely complex so we don't have to hide behind SWEs when it comes to solving hard problems.

However, after I initially became an SRE "by accident" (the job was simply offered after a SWE internship), I only stayed on the path because the interviews don't require (as much) Leetcode as normal SWE gigs and pay equally well. Even though I find YAML boring and actually dislike overengineering using 100 tools.

View Poll

https://redd.it/120pbgv
@r_devops
CircleCI reduced outages after their promise to do so. We proved it with data.

Almost a year ago CircleCI committed to reducing outages.

I decided to check it with a data-backed analysis of CircleCI’s outages before and after their public commitment.

Here is what I did:

1. I used my own product, StatusGator, an outage data platform, to export CircleCI’s incident history for 2022.
2. I calculated the average number and duration of outages by month.
3. That produced this chart (the numbers are average per month)

I can say that CircleCI came very close to keeping its promises -- they decreased outage duration by 41% per month on average.


| |Before promise|After promise|Change|
|:-|:-|:-|:-|
|Outage count|4.5|5.63|+18%|
|Outage duration|4:37:30|2:47:30|-41%|

Making a commitment to such a level of transparency and providing comments on outages is a brave move for CircleCI.
Rob Zuber, CircleCI’s CTO, and their entire team are definitely improving reliability.

https://redd.it/120ppb9
@r_devops
Heads up: Datadog overcharging customers and (probably) not informing them

So for background, I've been peeved by Datadog's pricing for a while now. It's insanely high, about as much as my company's actual infra. It's so high that it's almost making the priority cut at a startup (with about a million important things to do) which is wild.

I spent some time early last year optimizing our bill by carefully calculating how much reserved capacity we would need and configuring that in Datadog to reduce our bill.

Then in Nov/Dec I went to review our bill and saw that they'd been ignoring the reserved capacity I'd configured and charging us on-demand for everything.

I alerted them of the billing error and now finally fucking four months later they issued a refund in the exact amount of their error.

I've never seen a company take four months to resolve a billing issue. They could have addressed it quickly like a normal company. And/or issued a token credit as a "sorry for cheating you" apology like any decent company would. I had to catch them with their hand in the cookie jar or they simply would have done it indefinitely.

Ever since I alerted them of the error, I've been waiting for a batch email letting me and all other affected customers know there was an issue with the billing system that my account was affected by. Never came, and they didn't address it when I asked, so all I can assume is that they are not informing other customers and continuing to cheat you.

So, heads up!

https://redd.it/120rjz5
@r_devops
[Production Issue] RabbitMQ TLS Mystery

Hi Guys,

Just been debugging a prod issue and still can't get to the root cause, although I have fixed the issue, so thought I might crowd source some answers to help me sleep at night.

We use rabbitmq heavily where I work, a typical service bus and microservices situation.

Rabbitmq is managed by ansible, but never really touched.

Anyway, last night one service restarted and after that it was unable to connect to rabbitmq. After debugging with s_client, the issue ended up being that the CA cert and server cert on the rabbitmq nodes (deployed by ansible literally years ago) were mismatched: the server cert was created with a different CA than the provided CA cert.
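For reference, this kind of mismatch can be demonstrated with openssl alone; a sketch with throwaway file names:

```shell
# Sign a server cert with CA "A", then verify it against an unrelated
# CA "B": the same symptom as a mismatched CA cert on the broker.
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca_a.key -out ca_a.pem \
  -days 1 -subj "/CN=ca-a"
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca_b.key -out ca_b.pem \
  -days 1 -subj "/CN=ca-b"
openssl req -newkey rsa:2048 -nodes -keyout srv.key -out srv.csr \
  -subj "/CN=rabbitmq"
openssl x509 -req -in srv.csr -CA ca_a.pem -CAkey ca_a.key \
  -CAcreateserial -out srv.pem -days 1
openssl verify -CAfile ca_a.pem srv.pem        # prints "srv.pem: OK"
openssl verify -CAfile ca_b.pem srv.pem \
  || echo "mismatched CA detected"
```

Against the live broker, `openssl s_client -connect host:5671 -CAfile ca_certificate.pem` plus `openssl verify` on the deployed server cert show the same mismatch without generating anything.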

This was confirmed as other services connected to rabbitmq wouldn't be able to reconnect when restarted.

The rabbitmq cluster hasn't had any changes to its configuration in a very long time; the services were connecting fine just 12 hours before and working correctly.

How has this system been working this entire time?

Why has TLS all of a sudden started failing? How was it even working for this many years?

I know there isn't tons of detail here, but I am not that well versed in PKI, so I'm wondering if a wizard could help me.

Thanks,
Spoved

https://redd.it/120ryg3
@r_devops
What is your process of patching 3rd party configurations?

Say a vendor deployed an app on-prem in your infrastructure via terraform/helm-charts a while back, and it's been working. When you find security issues, like the storage they use isn't encrypted or an EC2 instance is missing some settings, what is your process for addressing them?

I could patch their terraform+helm-charts, but that means when I upgrade the app, it might be a pain to merge changes back into their app. If I do my own config on top of theirs, terraform might want to change things back when I re-deploy or upgrade. Do you just maintain your own IaC to patch and re-patch other people's software?

https://redd.it/120utr4
@r_devops
Internal Package Library

Wondering if anyone has experience creating an internal package library for consistent package management. I know you can make one using PyPI, but that’s Python-specific. And the industry-standard alternatives seem to be Nexus and Artifactory, which you have to pay for.

Anyone have experience with any of these? I’m wondering if it’s even feasible to create an internal library for package management across projects.

https://redd.it/120qxo8
@r_devops
Download packages for different architectures in your Dockerfiles using dumb-downloader, instead of writing scripts or separate Dockerfiles

I hope somebody will find it useful or show me a better solution.

## What
Link: [https://github.com/allanger/dumb-downloader](https://github.com/allanger/dumb-downloader)

This tool uses `env::consts::{ARCH, OS}` to get the current system info and then applies mappings. For example, if you're running `dudo "{{ version }}-{{ os }}_{{ arch }}.tar.gz" -p v1.0.0` on `linux` on `x86_64`, with the default mapping it will try to download
- v1.0.0-linux-x86_64
- v1.0.0-linux-amd64
- v1.0.0-linux-intel

(Once one of these downloads succeeds, it doesn't try the rest.)

If the default mapping doesn't work for you, you can pass a custom one. For example, say you need to download a package `package-v1.0.0/gnu-linux_amd64intel-v1.0.0` on `x86_64` and `package-v1.0.0/linux_aarch64nonintel-v1.0.0` on `arm64`. Then you can create a config:

```YAML
---
os:
  linux:
    - linux
    - gnu-linux
arch:
  x86_64:
    - amd64intel
  aarch64:
    - aarch64nonintel
```

And just pass the path to that file to `dudo` via the `-c` flag.
## Why?
If you're wondering why I would do that...

I'm pretty certain that there are better ways, but I haven't found one yet. Every time I tried to build a Docker image that should work on both `amd64` and `arm64`, I was annoyed by the inconsistent names for systems and architectures.
The worst, IMO, are macOS builds, because they are `macos/darwin/mac/apple` and `intel/amd64/x86_64/arm/aarch64/arm64/m1`, and even though the situation on Linux is better, I couldn't say it's perfect.

I'm not talking about packages that can be installed by a package manager, but, for example, when you need to download `helmfile` in your `Dockerfile`. To me, writing several Dockerfiles is not an option, so I need to get the arch at build time. The most obvious way I found was, of course, `uname -m`. But if I run it on a machine provided by GitHub Actions, I get `x86_64` while the package is named `amd64`, and for `arm64` there is some trouble too, because it could be either `aarch64` or `arm64`, etc.

Writing a script with that kind of mapping and putting it in the repo was an option, but since this wasn't the first time I'd faced the issue, I decided to write a CLI tool.
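For comparison, the script-based workaround mentioned above usually looks something like this (the asset names are illustrative):

```shell
# Map the kernel's machine name onto the names release assets use.
case "$(uname -m)" in
  x86_64 | amd64)  ASSET_ARCH=amd64 ;;
  aarch64 | arm64) ASSET_ARCH=arm64 ;;
  *) echo "unsupported architecture: $(uname -m)" >&2; exit 1 ;;
esac
echo "would fetch helmfile_linux_${ASSET_ARCH}.tar.gz"
```

This is exactly the mapping that dudo automates, plus the OS aliases on top.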

And now I can just run `dudo -l "https://github.com/helmfile/helmfile/releases/download/v{{ version }}/helmfile_{{ version }}_{{ os }}_{{ arch }}.tar.gz" -i /tmp/helmfile.tar.gz -p $HELMFILE_VERSION`

https://redd.it/120o95w
@r_devops
Automatically provisioning custom domains for users

I recently came across a web app that had interesting behaviour and thought I’d ask to see if someone knows how they are doing it.

Let’s say a user with a business called businessX signs up for an account with this company. They get automatic access to the web app for a 30-day trial period.

What I noticed is that the URL that the user uses to access the site contains their business name as part of the subdomain. I’m not sure if they then allow users to point custom domains to that URL.

What I wanted to ask: does anyone know how they can automatically get that URL to reflect the business name? The website is cliniko.com if anyone wants to see the behaviour I’m talking about in action

https://redd.it/1211j7z
@r_devops
OneNote and Teams

Hi

I'm having some issues and thought it best to reach out for guidance/advice in this subreddit, but I'll also post separately in another one. I'm trying to move away from the conventional boards system, with to-do tasks and the card-based system, to OneNote, as it seems visually more intuitive and less jumbled.

I want to feature a chat conversation within a page in OneNote that links/mirrors Teams, so that every time I update the chat in Teams it updates OneNote. From the research I've done, it seems as though OneNote only does static links that you have to manually click through to get to external content, not dynamic, live, interactive content.

So I tried a workaround of using Power Automate to capture info from a Teams post and then feed this info into OneNote. I set up two flows, one to sync Teams with Power Automate and the other to sync Power Automate with OneNote. The issue is that replies to the post don't seem to come through to OneNote, just the post's initial message and mostly notifications.

Also, I have about 60 cards with individual chats in Teams for 60 different topics, so using Power Automate would in my view be incredibly complex and time-consuming.

So then I thought: what if I created an external bot in Replit, registered it with Microsoft Azure, and then embedded it somehow on a blank webpage? By getting it to mirror conversations from one of the Teams chats I want to feature in OneNote, I could then embed that webpage onto the desired OneNote page. I'm still working on the bot; this seems even more complex and challenging, but the reason I thought of this option is that OneNote offers live previews of things like YouTube videos, so why can't it offer a live, interactive preview of a Teams chat?

Thoughts? Any advice on how to proceed, or any other workarounds? I don't want plain hyperlinks that I could add to the OneNote page, because that somehow just seems messy.

https://redd.it/120lt8g
@r_devops
CCNA + AWS Solutions Architect and knowledge of Python. How to enter DevOps/Cloud?

I have 3 months of helpdesk experience.

What are the next steps?

Thanks.

https://redd.it/120lses
@r_devops
How is the Delivery or Deployment process supposed to work with dynamic data (i.e. environment variables)?

So I have some code for an application in a GitLab repository, and some of that code needs to read variables containing data like secrets, URLs, etc. These variables are in a .env file.

The system that ultimately runs the application would, once pulled, read the variables from that file into its environment where the application could then retrieve them (I recognize that another common method is to put the variables into a permissions protected file as opposed to using environment variables, and I think this question would cover both use cases).

In my pipeline, I have a job that collects this data (e.g. from HashiCorp Vault), but now what?

I was under the impression that a pipeline job would be able to take this data and write it back out to the appropriate file(s) in the repository. That seems to me like a perfectly reasonable approach to making it so that a user or operator could check out code and run it without having to do things like manual variable substitution, which seems counterintuitive to me.

And so I tried to build a job that did that, but while the file in the job had the correct data, the file in the repository didn't. While I was trying to figure out why, I read a bunch of things that basically said "don't let your pipeline write back to your repository, that's bad".

If it's actually bad practice, what is good practice?

To be clear, keeping passwords and secrets protected isn't the important point of this question. The important point is how to make it so people don't need to be involved in the deployment by editing variables in pulled code before running it.
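For what it's worth, one pattern that avoids writing back to the repository entirely is passing values forward between pipeline jobs. In GitLab CI this can be sketched with a dotenv report artifact (the job names, the variable, and the stubbed Vault step are illustrative):

```yaml
stages: [prepare, deploy]

fetch-config:
  stage: prepare
  script:
    # a real job would read this from Vault; stubbed here
    - echo "API_URL=https://api.example.internal" >> build.env
  artifacts:
    reports:
      dotenv: build.env   # values become variables in later jobs

deploy:
  stage: deploy
  script:
    - echo "deploying with API_URL=$API_URL"   # injected by the dotenv report
```

The repository stays untouched; the deploy job sees the values as environment variables, which matches the goal of nobody editing pulled code by hand.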

https://redd.it/120lk3w
@r_devops
Question: Deploying a Node.js app on a Windows server

A bit of a dumb question since I’m not a devops expert.

I have an Express.js REST API and a React app that sends requests to it.

The whole system runs on a private network for internal use; the problem is that the machine currently used runs Windows Server with IIS, serving other internal systems.

Serving the React static files was not a problem, but running Node was a bit harder.

### What I tried:

1) Running `run start-prod` as a service. I didn't like this approach since I don't have control over the traffic in IIS, and it was hard to restart the server if a crash happened.

2) Using iisnode. This didn't work and required me to change my code, which I'm trying to avoid so I can easily develop and test my API locally.

What I wanted to do is build the API as a Docker container, since I'm more familiar with configuring a Linux machine, but then I'd need to run Docker inside the Windows machine, which seems like adding an unnecessary layer to the production environment.

Is there a correct way to do this? All the methods I tried look dirty and incorrect, and I'd love to get some tips from people with experience on this.
A really nice bonus would be automating code updates to include somewhat of a CI/CD setup.

TLDR: I have an Express.js API and need to run it on a Windows server that is already serving other applications.

https://redd.it/120kvxm
@r_devops
Tell me your CI/CD process

I'd like to see how you set up your CI/CD at a high level for developers. I'd like to get ideas in case we ever upgrade. I'll start.

1. Users create a project in GitLab. They follow rules like specific JSON filenames for dev, stage, and prod; these JSON files are for building the CD pipeline in Spinnaker. Another JSON file, pipeline.json, is meant for automatically building the CI in Jenkins.
2. Once a developer pushes their code with valid JSON files, a Jenkins job is automatically created. It builds their application inside a slave machine (triggered by the Jenkins master).
3. Once the build is complete, an artifact is created (an rpm, deb, tar.gz, etc.) and uploaded to a central internal registry.
4. Spinnaker is poked by the finished Jenkins job, which says "I'm done with my job."
5. Spinnaker has a CD pipeline that does several tasks: it pulls the artifact from the registry and builds the resources in AWS (we are in AWS). So if the developer chose the EC2 type, Spinnaker provisions an EC2 instance and installs the artifact.
6. Done

Tell me your setup. Thank you.

https://redd.it/121aav8
@r_devops
Dataflow app I'm working on

Hi, I'm working on launching a little web app / service that helps people in data engineering, operations, DevOps, and perhaps other functions get data from where it originates into the cloud. Curious whether people have thoughts about this and might find it useful:

https://hexcloud.co/

https://redd.it/120jckl
@r_devops
Render.com vs DigitalOcean App Platform - PHP Benchmark

I'm benchmarking PaaS providers to determine which one will be my next, and I thought why not share the results with the community...


DO instance type: PRO - 2GB/1vCPU ($25/month)

Render instance type: Standard - 2GB/1vCPU ($25/month)

Both running PHP 8.2


Test script:

**https://github.com/lavoiesl/php-benchmark/blob/master/tests/memory.php**

DO PHP memory benchmark (1/2/1381 ms):

    1024 * 256          1 ms             0 B
    1024 * 1024         2 ms     100 %   0 B
    1024 * 1024 * 16    1381 ms  13800 % 0 B

Render.com PHP memory benchmark (1/6/1870 ms):

    1024 * 256          1 ms             0 B
    1024 * 1024         6 ms     500 %   0 B
    1024 * 1024 * 16    1870 ms  186900 % 0 B

I must say I'm a bit disappointed in Render's result, I was leaning towards them.

https://redd.it/121cm3j
@r_devops
DevOps speakers

I was put in charge of the agenda for our biweekly DevOps meeting for the entire IT team in our company. The idea was to find some internal/external speakers. I'm OK with the internal speakers, but I wonder how I can find people from outside of the company. Has anyone done anything similar? Thanks for any tips!

https://redd.it/121e142
@r_devops
DevOps and bombardment of tools

Hi everyone,

Does anyone feel like we are under a bombardment of tools? When I started working as a DevOps engineer 5 years ago, there weren't too many tools for the various tasks, for example:

Zabbix for VM monitoring

Ansible for configuration management

Jenkins for all CI/CD (you can also combine Ansible with Jenkins)

Graylog for logging

and bash scripts for everything else.

But now cloud providers have their own tools for every single thing.

Think about k8s and related tools: everything has become unbelievably complicated.

Every company has a different tool set; if you change your job, boom, your knowledge no longer applies and you have to learn new tools for the same job. I really don't understand; all I want is to deploy code, for example, but it has become rocket science.

Depending on cloud tools is the worst thing, based on my experience; I've seen my company suddenly demand that we migrate all our infra from AWS to GCP a couple of years ago.

What do you think?

https://redd.it/121dfc9
@r_devops