Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
How do you Bootstrap an Organization in Google Cloud Platform?

I found this process very intense from a team interaction point of view, especially when the conversation goes down a rabbit hole trying to solve the chicken and egg problem.

I try to optimise based on principles while acknowledging that we are in a state where we cannot adhere to them 100%. I proceed in a three-phase approach:

* Inception Phase (Ring 0)
* Pre-operational Phase (Ring 1)
* Operational Phase (Ring 2)

You can think of these three phases like the protection rings in an operating system, where you gradually tighten adherence to principles and policies. I explain this in more detail in this video: [https://youtu.be/RDF4Yf5JhPI](https://youtu.be/RDF4Yf5JhPI)

Would appreciate any feedback.

https://redd.it/11njbmz
@r_devops
How do you handle CSP headers for a multi-tenant application?

Right now it's just one CSP for all of our tenants, and we keep adding domains whenever we see a block. As you can imagine, our CSP is huge.

Do you think using a * would not be a security issue? (My heart says it is... lol)

The dev team doesn't seem to think it's a priority to handle this per tenant in the application.
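One way to avoid both the giant shared policy and a wildcard is to build the header from a per-tenant allowlist. A minimal sketch (the tenant names and domains below are hypothetical placeholders, not anyone's real config):

```python
# Map each tenant to only the origins that tenant actually needs.
# Tenant names and domains here are hypothetical placeholders.
TENANT_SOURCES = {
    "acme": ["https://cdn.acme.example", "https://analytics.acme.example"],
    "globex": ["https://assets.globex.example"],
}

def csp_header(tenant: str) -> str:
    """Build a Content-Security-Policy value for one tenant."""
    sources = TENANT_SOURCES.get(tenant, [])
    script_src = " ".join(["'self'", *sources])
    return f"default-src 'self'; script-src {script_src}"

print(csp_header("acme"))
# default-src 'self'; script-src 'self' https://cdn.acme.example https://analytics.acme.example
```

The lookup could live in middleware or even at the reverse proxy, keyed by hostname, so the application code itself doesn't need to change per tenant.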

https://redd.it/11nlice
@r_devops
I'm looking for a good vulnerability management tool, big bonus if it can integrate into the deployment pipeline

Hello all.

Can anybody recommend a good vulnerability management tool for cloud-native applications?

I am currently trying to find something that can perform real-time monitoring for a basic tech stack (Java, PHP, Go, etc.), APIs, and ideally infrastructure.

Bonus if it can perform static code analysis! I am hoping there is a single tool I can get for all of this, but it looks like I may have to go with two or three.

Currently looking at Dynatrace, which has some great agent-based real-time monitoring, but it lacks the overall infra and static code monitoring.

Any help appreciated, thanks

https://redd.it/11nl6jf
@r_devops
Is push-based GitOps dying, or is it just me?

Personally, as someone who works with k8s a lot, I prefer push-based GitOps because I feel I have more control over what I'm dealing with. The whole ArgoCD trend is driving me nuts, because I think ArgoCD is cool if it's provided as a service to devs, so they can deploy what they need without my help; but if I'm running something myself, I prefer other tools, for example Helmfile.

I feel that pull-based GitOps is displacing the push-based kind. It seems a logical continuation of removing the human factor, but it still doesn't seem right to me. My job is to deal with systems, and I need to be able to do that fast in an emergency; with pull-based GitOps, I don't feel that's possible.

I've had a lot of situations where I had to create/remove/update k8s resources manually to fix things fast, and then write those changes to the git repo without any hurry, because everything was already fixed. When your whole infra is deployed by ArgoCD with auto-sync enabled, you can't really do that without disabling it, which feels like a very ugly workaround to me, like a sign that the system doesn't work after all. (Or you follow GitFlow: update manifests/charts/whatever, merge, and check whether the problem is solved, so fixing a problem takes far longer.)

And the main benefit I've been told about is that you can always see everything running in the cluster, and the state is always synced. But I can see everything running in the cluster with kubectl (and I would actually never go to the ArgoCD dashboard myself), and the state is not always synced, because someone will eventually disable syncing to fix something and won't enable it again. Also, I don't care what developers are running there, as long as I'm not afraid they will ruin the cluster, though preventing that is also part of my job. And you can still apply things that won't be spotted by ArgoCD, so you can't even be sure you see everything there.

But I'm getting more and more confused about this ArgoCD topic while reading articles, talking to people, etc. I've got a feeling that everybody wants to use the pull-based GitOps approach for everything, but I still don't see the real benefits. And since I seem to be in a shrinking minority, I think I should consider changing my mind on this topic. So, if you prefer pull-based GitOps for infra, could you say why?

UPD: I meant push-based GitOps, of course, but I don't think I can change the title anymore.

https://redd.it/11noszn
@r_devops
How are you splitting out your IaC pipelines?

Hi everyone,

I have started to leverage Azure DevOps pipelines and IaC (with Bicep) to manage our Azure environment. I am curious how I should separate resources into different pipelines, though.

Currently we have been running a pipeline for each resource type with different environment variables depending on which environment we want to update via a specific set of bicep/parameters files.

Does it make sense to continue this way? I went over the MS Learn documentation, and a lot of it involved deploying an entire application, with various modules for different resources, via a single deployment.

So say for virtual machines, would I have them all deploy via a single "VM" pipeline or group them by application in separate pipelines?

I've also started to work with microservices (api gateway w/ functions, logic apps, etc.) and it seems like for that, it would make more sense to deploy all the components of a specific application via its own pipeline but I'm not entirely sure that's the best approach.

Any input or suggestions are greatly appreciated.

Thanks!

https://redd.it/11nnvzh
@r_devops
How We Save About $10,000 a Year Using Self-Hosted GitLab

## Moving from GitLab CE to GitLab Premium

In October 2022, GitLab changed its subscription model. There are three plans: 

* Free
* Premium—$19 per user/month
* Ultimate—$99 per user/month.

Switching to a paid subscription or looking for alternatives became necessary for large teams and projects.

The free plan supports up to 5 users in a project or group and is unsuitable for us. It is possible to deploy our self-hosted GitLab CE, but this will require infrastructure and support costs.

In one of our projects, the CI setup and environments were already tightly coupled to GitLab, and the repositories numbered several dozen. First, we looked at GitLab's own offerings, so as not to waste time rebuilding pipelines. We also had to take into account that the project had around 64 users in total.

Let’s do the simple math.

If we had bought a Premium subscription: 64 users \* $19 = **$1,216 per month**, or $14,592 per year (subscriptions must be paid annually).

And if we run our own GitLab in AWS (the cost in GCP is about the same):

* The minimum recommended instance for a self-hosted service of up to 500 users is 4 CPUs / 8 GB RAM, which is \~$130 per month;
* A 200 GB drive with daily snapshots kept for up to 14 days is \~$26 per month;
* An RDS database with daily snapshot storage for up to 10 days is \~$50 per month;
* An S3 bucket for storing caches and artifacts is \~$1 per month.
* **Total: \~$207 per month**.

NB: Here we consider only the main GitLab service, without runners, because their cost is the same in either case.

After the estimation, we presented the results to the customer and discussed the obvious benefit of saving roughly $1,000 per month. We also separately drew attention to the need to support and regularly update our own GitLab CE. We budgeted about 6 hours a month for support and started the migration.
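The arithmetic behind the decision, using the figures from the estimate above:

```python
# Premium seats vs. self-hosted AWS costs (figures from the estimate above)
users, price_per_user = 64, 19
premium_annual = users * price_per_user * 12      # $1,216/month, billed annually

self_hosted_monthly = 130 + 26 + 50 + 1           # instance + disk + RDS + S3
self_hosted_annual = self_hosted_monthly * 12

savings = premium_annual - self_hosted_annual
print(premium_annual, self_hosted_annual, savings)  # 14592 2484 12108
```

The raw difference is about $12,100 a year; the roughly 6 hours a month of engineer support time is what brings the net saving down toward the ~$10,000 in the title.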

## Getting ready

Using Terraform, we created a network, storage, S3, instance, and RDS in the cloud. This is our favorite IaC (Infrastructure as Code) approach, which makes it convenient to manage the infrastructure and, if necessary, reuse the finished code.

As a VM image, we used the official GitLab CE AMI (Amazon Machine Image)—an image that is updated and maintained by GitLab itself. To update the GitLab version and not be afraid that the instance will break, we used the ASG (Auto Scaling Group) with the Launch template, to which we transferred the AMI image, instance type, disk configuration, etc.

Moreover, we used a small bash script in User Data to reconfigure and roll our data and configs automatically. It runs immediately after creating the instance, checks the availability of the allocated IP address and storage with data and configuration, and subsequently reconfigures the new model into “our” GitLab.

So, “our” GitLab is configured and tested. Next, we must migrate users and repositories from SaaS GitLab to self-hosted GitLab.

## Moving

To avoid missing deadlines and interfering with the developers, we agreed on and drew up a plan to migrate the repositories. To make user migration easier, we added Google OAuth with authorization in our Google organization and asked all developers to log in to the new GitLab, which created their accounts.

Furthermore, the repositories had to be migrated one by one, manually, through the export/import mechanism. We also had to keep in mind that CI/CD and webhook settings are not exported, because they depend on the environment; they had to be reconfigured manually for each repository. In addition, we had to connect our own group runners, since the shared runners from GitLab SaaS would no longer be available.

We moved the repositories, set up the CI environment and webhooks, and checked with the developers that everything worked.

Pros and cons of this decision:

|**FEATURES**|**SELF-HOSTED GITLAB**|**SAAS GITLAB**|
|:-|:-|:-|
|Price|**+**|**-**|
|Support|**-**|**+**|
|Logs|**+**|**-**|
|Administration|**+**|**-**|
|Full access to the API|**+**|**-**|
|Privacy|**+**|**-**|

While Self-Hosted GitLab provides more options, it requires you to have your own support.

## Results

For a modest amount of money, the client received git hosting (GitLab CE) that is slightly inferior to the premium SaaS version in some aspects but generally suitable for work.

If you have a large team and are not willing to pay over $10,000 per year, self-hosted GitLab is for you. Of course, such a choice obliges you to handle support and allocate additional engineer time, and responsibility for keeping GitLab running falls entirely on the DevOps team, but it can save you a lot of money.

If you have a small team and don't want to spend time maintaining git hosting, SaaS is a great option. You can get an out-of-the-box, working solution by buying a subscription rather than worrying about infrastructure.

https://redd.it/11nr4gv
@r_devops
Feeling pretty down/demoralized. Any suggestions on easy wins for my team?

In charge of the DevOps team that’s part of a dev org touched by layoffs about a month ago.


It was everyone’s first time experiencing something like that. Anxiety and nervousness have been almost palpable while we’ve kind of just been attempting to run the same org without a lot of the teammates we cared about.


Been trying to figure out some easy wins for myself and the broader team to try and feel like progress is being made. Any and all suggestions welcome.

https://redd.it/11nttd2
@r_devops
Weird override experience on Pagerduty (compared with Opsgenie)

Imagine there are a couple of overrides defined in the schedule (like Person-A coordinated with Person-M to swap their oncall using override for their weeks, and so on).

Now if someone deletes an override, all future overrides shift left, changing the people originally covering the on-calls. In some weird cases, it's even possible that Person-A ends up overriding their own shift.

This can be annoying, especially if a person has spent time arranging their overrides; with the resulting updates to the other overrides, you might suddenly end up with no overrides at all for a given period.

Opsgenie, on the other hand, treats overrides just how you'd expect. If you swap with someone for a given week, it stays the same regardless of whether other overrides are deleted.

If anyone has come across this, please share your experience and/or any solution to this problem. Thank you.

PS: Not making this up. Experienced it myself and was not able to find any online help on this topic.

https://redd.it/11nsqfd
@r_devops
We created a free AI Code Assistant that understands context

Hey devops folks!

I want to share Safurai's AI Code Assistant - available for VS Code at **www.safurai.com**!
Our code assistant is the ultimate tool for anyone looking to streamline their coding experience. And it's completely free (we'll monetize on enterprises in the future).

Safurai can understand your project, remember past questions and is fine-tuned (thanks to our own models) to give you the best possible results.

If you've ever spent hours trying to debug your code or searching for the right solution, you know how frustrating it can be. With Safurai's AI Code Assistant, you can get personalized and contextual code recommendations based on your specific project needs - saving you time and energy. We believe that coding should be accessible and user-friendly, and our new AI Code Assistant brings that vision to life.

We're just looking for feedback - let me know if you have any :)

https://redd.it/11nxsmg
@r_devops
Is there an ELI5 for HashiCorp Consul for someone who is not using Kubernetes but rather docker compose?

I have been trying to wrap my head around what HashiCorp Consul does, but I just cannot join the dots with my home lab, where at the moment I have Pi 4s running docker compose files.

Is Consul NOT meant for compose?

https://redd.it/11nvyqa
@r_devops
Feeling burnt out

Good day y’all.

Im feeling a bit… run down. A little bit, tired, exhausted and generally feel beaten.

I’m a senior DevOps engineer; I do everything you’d expect me to do. I’m happy where I am, great people, great company, but there are a few things driving me into the ground.

The first thing is meetings. Oh, meetings. My day is probably 50-70% meetings, meetings that could easily be a note or message in Teams. I’m told I’m needed, and I ask for the reasoning. When I don’t get it, I decline. When I decline, I’m bombarded with messages on Teams asking why. Ugh. So I join.
Last week I was asked why I declined a meeting, even though my calendar said I was free. Like… what?! Just because the calendar has a free slot doesn’t mean I’m available. I have work to do. If I’m in meetings all the time, I’ll never get work done… which I don’t.

Secondly. Everything is urgent to everyone. I’m exhausted. I have projects to get out, which are urgent, and I’m fighting fires.

Last but not least, my team. They’re great people, but boy, they do not understand… anything. I have to hold their hands on everything: asking me about errors that could easily be googled, asking me how to do X in bash, how to do Y in Kubernetes. Guys, use your initiative and do your own research.

I feel like I’m drowning.

Devs don’t care about how to support their application whilst it’s in production. They don’t care about the disparate characteristics of VMs and pods, so code is a complete mess and doesn’t function correctly in Kubernetes. (Why are we running hour long synchronous processes in a pod and wondering why clients complain when the autoscaler kills it)

Im just, burnt out I think. I need some encouragement.

https://redd.it/11o18cn
@r_devops
How are you handling Terraform & Dev accounts?

Curious how others enforce that engineers go through Terraform when creating resources in a dev or other lower-environment account.

The number one complaint I get is that "writing everything through Terraform is slow"; they'd rather experiment quickly in the console and then move on to Terraform.

However, we need some type of process in place.

I want to restrict people from doing changes through the console, but at the same time, I don't want to hamper their ability to POC or experiment.

Any ideas?

https://redd.it/11o0gos
@r_devops
Scaling java deployment on Kubernetes based on heap memory utilization?

How do you monitor and trigger scaling on heap memory utilization? If my average heap usage across pods is around 2 GB, I want to scale when heap usage reaches about 80%, as I'm seeing heap-space issues.
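One common route is to export JVM heap usage as a custom metric (e.g. via JMX plus a Prometheus adapter, which is an assumption about your stack, not something stated above) and point an HPA at it. The controller's documented scaling rule is simple enough to sketch:

```python
import math

def desired_replicas(current_replicas: int,
                     current_heap_pct: float,
                     target_heap_pct: float) -> int:
    """Core HPA rule: desired = ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_heap_pct / target_heap_pct)

# 3 pods averaging 95% heap against an 80% target -> scale out to 4
print(desired_replicas(3, 95.0, 80.0))  # 4
```

Note that heap-space errors often stem from undersized `-Xmx` or container memory limits rather than too few replicas, so it's worth confirming the workload actually sheds heap when traffic is spread across more pods.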

https://redd.it/11o3t0x
@r_devops
Container Certification

I'm debating between the CKA and a Red Hat containerization cert. Is there any strong argument to go with Red Hat as opposed to the CKA?

I'm a non-IT guy looking to break into DevOps or SRE. For what it's worth, I have RHCSA, doing Sec+ exam in a couple of days, and doing RHCE exam in a couple of weeks. I'm already leaning Red Hat for the purposes of working toward RHCA in Dev Ops or Open Hybrid Cloud.

https://redd.it/11nuq66
@r_devops
Help me understand networking of kubernetes + wireguard

So, here’s the situation:

I have 3 virtual machines(HyperV), let’s call them A,B and C on my computer.

I also have 2 VMs set up on Azure, let's call them D and E, so I thought it would be a cool idea to set up a VPN using wireguard to create a private network for all of those machines. I've created a 192.168.20.0/24 private subnet using these configs:


[Interface]
PrivateKey = <privatekey>
Address = 192.168.10.3/24
ListenPort = 51194
SaveConfig = false
PostUp = /etc/wireguard/helper/add-nat-routing.sh
PostDown = /etc/wireguard/helper/remove-nat-routing.sh

[Peer]
PublicKey = <pubkey>
AllowedIPs = 192.168.10.2/32

<other peers...>

On the server(machine D).
And

[Interface]
PrivateKey = <private key>
Address = 192.168.10.2/24

[Peer]
PublicKey = <pubkey>
AllowedIPs = 0.0.0.0/0
Endpoint = <ip address>:51194
PersistentKeepalive = 15

On the client(machine A).


Everything worked smoothly, but then I wanted to create a kubernetes cluster (my first time ever) on machine A. I used


sudo kubeadm init --pod-network-cidr=192.168.10.0/24

and that's when the problems started. I was no longer able to connect/ping from machine D to A, and vice versa. I figured there had to be a problem with iptables, and after some digging I found that packets were being dropped because of


-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP

Mark "0x8000/0x8000" seems to be set in


-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000

except... there's no rule that actually jumps there! This left me rather baffled, and after some searching I discovered that if I replace AllowedIPs = 0.0.0.0/0 with AllowedIPs = 192.168.10.0/24, I can connect to machine A again.

I searched a little more and found that wireguard marks packets with fwmark 0xca6c, so that they can be identified when going through ip rules. If I'm understanding -m mark documentation correctly.

[!] --mark value[/mask] Matches packets with the given unsigned mark value (if a mask is specified, this is logically ANDed with the mask before the comparison).

So, since ANDing 0x8000 with 0x8000 seems redundant, I think "this" refers to the mark value on the packet? In that case 0xca6c & 0x8000 equals 0x8000, which would make the packet match the DROP condition, but that still leaves the question of why changing AllowedIPs helps. Is wireguard traffic in that case completely local, and so not subject to iptables rules?
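The mask arithmetic can be checked directly (0xca6c is the fwmark value mentioned above; 51820, WireGuard's default port, in hex):

```python
wg_fwmark = 0xca6c   # fwmark WireGuard sets on its own encrypted packets
kube_mask = 0x8000   # mask/value in the KUBE-FIREWALL drop rule

# iptables '-m mark --mark 0x8000/0x8000' ANDs the packet's mark with the
# mask, then compares the result to the value. Bit 15 of 0xca6c is set,
# so marked WireGuard packets match the drop rule:
matches = (wg_fwmark & kube_mask) == 0x8000
print(hex(wg_fwmark & kube_mask), matches)  # 0x8000 True
```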


So, does anyone have any thoughts on that? Am I correct in my assumptions on that last paragraphs, or am I completely misunderstanding everything?


And also, what is KUBE-MARK-DROP for, since it appears to not be used at all? Will it be used later on as I add more services?


Also, here's the entire iptables configuration, if it's needed:



# Generated by iptables-save v1.8.7 on Thu Mar 9 11:45:10 2023
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:KUBE-IPTABLES-HINT - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-PROXY-CANARY - [0:0]
COMMIT
# Completed on Thu Mar 9 11:45:10 2023
# Generated by iptables-save v1.8.7 on Thu Mar 9 11:45:10 2023
*filter
:INPUT ACCEPT [2631617:9241306330]
:FORWARD ACCEPT [2700:348185]
:OUTPUT ACCEPT [2609743:9125831887]
:KUBE-EXTERNAL-SERVICES - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-FORWARD - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-PROXY-CANARY - [0:0]
:KUBE-PROXY-FIREWALL - [0:0]
:KUBE-SERVICES - [0:0]
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes