Reddit DevOps
270 subscribers
5 photos
31K links
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
How do you avoid DevOps jobs that are really just ops / sysadmin jobs?

Title. How do you filter the actual DevOps / SWE-Infra jobs from the ones that are really just sysadmin jobs?

https://redd.it/yl3845
@r_devops
(RANT) Gov Devops is Difficult


Run away from any environment where you do not have complete control of, and access to, everything in it. All the pain you will experience is not worth it unless you are getting paid six figures.

https://redd.it/yl7z0l
@r_devops
GitOps as an audit log is not very accessible or informative.

Wrote a blog post about something that was bugging me for a long time:

GitOps as an audit log is not very accessible or informative.

https://gimlet.io/blog/three-problems-with-gitops-as-deployment-history-and-how-we-overcome-them

But I like the GitOps approach, so I wanted to fix this, and I believe many others have made attempts too. What do you think of the issue? How did you solve it?

https://redd.it/yl60ax
@r_devops
How do I communicate to my manager that our implementation of Ansible is totally wrong?

Title.

Last month I started working for a new company. We work with Ansible and automate mostly simple tasks within our organization: loads of LDAP management, some infra, etc. In my experience with Ansible I've never come across an environment like the one we have now, and I know that none of the best practices are being followed. Things that should be simple playbooks are created as roles. The roles have only one main.yml tasks file and a couple of variables in defaults/, but absolutely nothing else. Stuff like that should just be playbooks, whilst roles should contain more than a couple of things (templates, vars, files, etc). They also create new roles which are 90% import_role calls from other places, and the other 10% is "new" tasks. Needless to say, this creates dependency hell. What happens when they update one role? They need to update it in another 50 places. Ah, and they don't use Ansible tags either.

I believe this environment is beyond salvation at this point. It's been going on for a long time, so there is a lot of work built on these implementations, and fixing it would also require a change of mindset. How do I tell this to my new manager without sounding like a moron, and without my teammates disliking me for basically telling them their work is done wrong? I want to create some sort of analysis of the situation and present it to my manager: explain why this doesn't follow standards, give a better idea of what steps we should take to improve our work environment, and admit that everything done so far would take too long to repair, so we should change the way of working from now on.
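A before/after pair can make the point concrete in an analysis like this. A hypothetical illustration only; the LDAP task, module arguments, and variable names here are invented, not taken from the actual environment:

```yaml
# Before: a "role" that is really just one task file plus a default or two
# roles/add_ldap_user/tasks/main.yml
- name: Add user to LDAP group
  community.general.ldap_attrs:
    dn: "{{ group_dn }}"
    attributes:
      member: "{{ user_dn }}"
    state: present

# After: the same thing as a plain playbook, with no role indirection
# playbooks/add_ldap_user.yml
- hosts: ldap_servers
  vars:
    group_dn: "cn=devops,ou=groups,dc=example,dc=org"
  tasks:
    - name: Add user to LDAP group
      community.general.ldap_attrs:
        dn: "{{ group_dn }}"
        attributes:
          member: "{{ user_dn }}"
        state: present
```

The playbook version removes one level of indirection and one thing to version, without losing any reuse, since there was nothing in the role worth reusing.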

https://redd.it/ylebse
@r_devops
How do you handle metrics aggregation over a period of time? Sliding window?

Let's say you are monitoring a metric and you want to alert off of a timespan, not just a single instance.

Let's say CPU. You want to alert if CPU is over 80% for 5 minutes. Do you use static 5-minute analysis windows? E.g. on minute 5, average minutes 1-5. On minute 10, average minutes 6-10. On minute 15, average minutes 11-15, etc. So if the metric is exceeded in minutes 1-5, the alert condition holds for the next 5 minutes, until minute 10, when it is recalculated and possibly resolved.

Or do you approach this with a sliding window? E.g. on minute 5, average minutes 1-5. On minute 6, average minutes 2-6. On minute 7, average minutes 3-7. Etc. If the metric is exceeded in minutes 1-5, the alert condition holds only until minute 6, when the recalculation may already put it below the threshold.

I feel like the sliding window is more accurate because it doesn't "reset" the counter at every duration milestone. But I'm curious what the industry-standard approach is.
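Both schemes from the question fit in a few lines. This is an illustration of the two evaluation strategies, not any particular monitoring product's behavior; the 80% / 5-minute numbers are taken from the post:

```python
from collections import deque

def tumbling_alerts(samples, window=5, threshold=80.0):
    """Fixed windows: evaluate once at the end of minutes [1-5], [6-10], ..."""
    return [
        (end, sum(samples[end - window:end]) / window > threshold)
        for end in range(window, len(samples) + 1, window)
    ]

def sliding_alerts(samples, window=5, threshold=80.0):
    """Sliding window: re-evaluate every minute over the trailing samples."""
    buf = deque(maxlen=window)
    out = []
    for minute, value in enumerate(samples, start=1):
        buf.append(value)
        if len(buf) == window:
            out.append((minute, sum(buf) / window > threshold))
    return out

# CPU hot for the first 5 minutes, then idle:
cpu = [90, 90, 90, 90, 90, 10, 10, 10, 10, 10]
# tumbling: fires at minute 5 and is not re-evaluated until minute 10
# sliding: fires at minute 5, already resolved at minute 6 (avg drops to 74)
```

The example shows the asymmetry the post describes: with tumbling windows the alert lingers until the next window boundary, while the sliding window resolves one minute after the CPU drops.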

https://redd.it/ylf52s
@r_devops
Jenkins Error - doesn’t look like a JDK directory

Hi all,

I'm getting the following error in Jenkins while trying to specify JAVA_HOME:

"/usr/lib/jvm/java-11-openjdk-11.0.16.0.8-1.amzn2.0.1.x86_64 doesn't look like a JDK directory"

However, Java is installed at this exact path, as can be seen below.

```sh
$ echo $JAVA_HOME
/usr/lib/jvm/java-11-openjdk-11.0.16.0.8-1.amzn2.0.1.x86_64

[root@ip-172-31-xx-xxx jvm]# ls
java-11-openjdk-11.0.16.0.8-1.amzn2.0.1.x86_64  jre  jre-11  jre-11-openjdk  jre-11-openjdk-11.0.16.0.8-1.amzn2.0.1.x86_64  jre-openjdk
```


I don't think I should be receiving this error. I tried changing it to jre-11-openjdk-11.0.16.0.8-1.amzn2.0.1.x86_64, but that also didn't work.

Any help would be appreciated. Thanks in advance.
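For what it's worth, this class of Jenkins error is usually about what's inside the directory rather than the path itself: as far as I know, the validation roughly amounts to checking for a bin/java launcher under the configured home. A rough approximation of that check, not Jenkins's actual code:

```python
import os

def looks_like_jdk(home: str) -> bool:
    """Approximation of the check behind "doesn't look like a JDK directory":
    the configured home must contain a bin/java (or bin/java.exe) launcher."""
    return any(
        os.path.isfile(os.path.join(home, "bin", exe))
        for exe in ("java", "java.exe")
    )
```

Two things worth ruling out: whether that java-11-openjdk directory actually contains a bin/java (some distro packages split things across jre/ symlinks), and where the check runs, since Jenkins validates tool paths on the node doing the check, so a path that only exists on an agent can fail validation on the controller.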

https://redd.it/ylk5k7
@r_devops
Looking for opinions on spinning up dev/staging environment databases.

As the title says, trying to work out a good plan for spinning up ad hoc environments with Pulumi and the databases are the sticking point. All dbs are SQL Server on Azure Cloud.

Prisma is implemented for some of the newer stuff, so theoretically I can set up the DB migrations to run as part of the release pipeline, but there are some legacy DBs with a fair amount of stored-proc code & lookup tables (and no setup script files).

Ideally I'd like to be able to do any of the following based on the needs of the environment:

1. Spin up a copy of staging/prod with data
2. Spin up an empty copy of the database, with stored procedures, table schema, and lookup tables
3. Spin up a copy of the database with sanitized or faked data

1 & 2 are sufficient, but if there are tools out there to help with 3 (without me having to write a sanitize script), that would work too.
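On option 3: without a dedicated tool, sanitization often comes down to deterministic pseudonymization of the PII columns, so that the same input always maps to the same token and joins across tables still line up. A toy sketch; the column names are invented for illustration, not taken from the poster's schema:

```python
import hashlib

def mask_value(value: str) -> str:
    """Deterministically pseudonymize a value: same input -> same token,
    so foreign-key relationships survive sanitization."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def sanitize_row(row: dict, pii_columns: set) -> dict:
    """Return a copy of the row with string values in PII columns masked."""
    return {
        col: mask_value(val) if col in pii_columns and isinstance(val, str) else val
        for col, val in row.items()
    }

row = {"CustomerId": 42, "Email": "jane@example.com", "Region": "EU"}
clean = sanitize_row(row, pii_columns={"Email"})
```

A real pipeline would run this per table between a restore and the environment handoff; the deterministic hash is the important design choice, since random fakes break referential integrity across tables.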

https://redd.it/yljepd
@r_devops
Deploying Next site + Node app and database.

Hey all, I'm looking to host a community site I built in NextJS using server-side rendering, which is the least of my worries. I'm also trying to host an instance of Directus (a Node app that sits on a DB and adds an API layer). I'm trying to find a cost-effective way of hosting this setup that is reliable. I don't mind doing heavy lifting, but I also don't want to over-engineer, and I'd prefer a setup I could rebuild for similar projects in the future.

I figured I could go the droplet route and set up everything myself. I went ahead today and did 90% of just that, but ran into issues with the reverse proxy on nginx. It was also a bit hefty, though I could probably build an Ansible role to do most of it. But I was thinking maybe Docker would be an option to make this happen? Directus has a Docker image and I could use a Postgres image as well. I just don't know how well that works in production, or whether I should host it on a droplet or a container service (sounds pricier, but I don't know).

Sorry if I'm not super clear; I'm just trying to find a way to make this work and keep it cheap. I imagine traffic will be pretty minimal, so I don't think I need much. I also don't want to over-engineer or have things band-aided together.
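If the Docker route wins out, a single droplet running Compose is a common shape for this. A sketch only: the image tags, port, and env values below are placeholders, and Directus needs more configuration than shown here (admin credentials and a secret key, at minimum), so check its docs before using anything like this:

```yaml
# docker-compose.yml -- single-droplet sketch; values are placeholders
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: change-me
    volumes:
      - pgdata:/var/lib/postgresql/data
  directus:
    image: directus/directus:latest
    depends_on:
      - db
    environment:
      DB_CLIENT: pg
      DB_HOST: db
      DB_PORT: "5432"
      DB_DATABASE: postgres
      DB_USER: postgres
      DB_PASSWORD: change-me
    ports:
      - "8055:8055"
volumes:
  pgdata:
```

nginx (or Caddy) can then reverse-proxy the Next site and the Directus port from the same host, which keeps the whole thing on one cheap droplet until traffic says otherwise.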

thanks.

https://redd.it/ylq6md
@r_devops
GCP from AWS

Beyond searching for the equivalent of the services between cloud providers (e.g. EC2 vs Compute Engine), are there any tips and advice one could share for organizations switching from AWS to GCP?

For starters, I’ve found that there are no accounts, but instead groupings based on “projects” in GCP.

https://redd.it/ylpz5h
@r_devops
Scaling Your Team From 5 to 250+ Engineers: FULL Checklist from your feedback!

A few weeks ago I shared a post on here about scaling your engineering organization from 2 to 250 engineers. It was a long blog post that detailed the stages of growth and what to do in terms of Velocity, Quality and Outcomes.

The feedback I got on that post was honestly overwhelming!

I love this community, and your comments and suggestions were truly valuable as I put together something a bit more extensive for engineering leaders... a full checklist to help navigate these stages, step by step: what to focus on in terms of yourself as a leader, your teams, and your processes. I included items on culture (something a lot of you brought up), and each checklist item has extra resources so you can explore more :)

It came out on Product Hunt a couple of hours ago, so you can check it out there, and if you like it, give it an upvote!

This checklist is a living thing, and it really wouldn't be possible without this community, so, if you have more feedback and suggestions, let me know in the comments, as I'll be adding more items and resources as they come!!

Thank you so much for all your support on this!

https://redd.it/yltt9e
@r_devops
Using a single Flux instance and single repo to deploy workload on remote cluster with a kubeconfig secret

I am trying to configure multiple K8s clusters via a single Flux instance and a single repo with the following process. Please note cluster provisioning is handled outside of Flux.

The following example installs Flux on the management cluster and syncs it with the repo.
Once Flux is set up, I clone the repo and add sync files for it to deploy workloads on the remote staging cluster using a kubeconfig secret.

When I omit the kubeConfig from staging-sync.yaml, everything gets deployed on the management cluster (which is logical, but not what I want to achieve). As soon as I add the kubeConfig, I get the following error: `Kustomization/flux-system/apps dry-run failed, error: no matches for kind "Kustomization" in version "kustomize.toolkit.fluxcd.io/v1beta2"` and I can't get it to work.

But in [this](https://github.com/fluxcd/flux2/discussions/2258) github discussion from January, a member of the Flux project stated that targeting remote clusters using kubeconfig is fully supported.

```hcl
# Store Gitlab known_hosts file in a variable
locals {
  known_hosts = file("${path.module}/known_hosts")
}

# Generate an SSH keypair
resource "tls_private_key" "main" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

# Generate manifests
data "flux_install" "main" {
  target_path = var.target_path
}

data "flux_sync" "main" {
  target_path = var.target_path
  url         = "ssh://git@${var.gitlab_base_url}:2222/${var.gitlab_owner}/${var.repository_name}.git"
  branch      = var.branch
}

# Create Namespace for Flux
resource "kubernetes_namespace" "flux_system" {
  metadata {
    name = "flux-system"
  }

  lifecycle {
    ignore_changes = [
      metadata[0].labels,
    ]
  }
}

# Split multi-doc YAML
data "kubectl_file_documents" "install" {
  content = data.flux_install.main.content
}

data "kubectl_file_documents" "sync" {
  content = data.flux_sync.main.content
}

# Convert documents list to include parsed yaml data
locals {
  install = [for v in data.kubectl_file_documents.install.documents : {
    data : yamldecode(v)
    content : v
  }]
  sync = [for v in data.kubectl_file_documents.sync.documents : {
    data : yamldecode(v)
    content : v
  }]
}

# Apply manifests on the cluster
resource "kubectl_manifest" "install" {
  for_each   = { for v in local.install : lower(join("/", compact([v.data.apiVersion, v.data.kind, lookup(v.data.metadata, "namespace", ""), v.data.metadata.name]))) => v.content }
  depends_on = [kubernetes_namespace.flux_system]
  yaml_body  = each.value
}

resource "kubectl_manifest" "sync" {
  for_each   = { for v in local.sync : lower(join("/", compact([v.data.apiVersion, v.data.kind, lookup(v.data.metadata, "namespace", ""), v.data.metadata.name]))) => v.content }
  depends_on = [kubernetes_namespace.flux_system]
  yaml_body  = each.value
}

# Generate a Kubernetes secret with the GitLab credentials
resource "kubernetes_secret" "main" {
  depends_on = [kubectl_manifest.install]

  metadata {
    name      = data.flux_sync.main.secret
    namespace = data.flux_sync.main.namespace
  }

  data = {
    identity       = tls_private_key.main.private_key_pem
    "identity.pub" = tls_private_key.main.public_key_pem
    known_hosts    = local.known_hosts
  }
}

# Create a repository
resource "gitlab_project" "main" {
  name                   = var.repository_name
  visibility_level       = var.repository_visibility
  initialize_with_readme = true
  default_branch         = var.branch
}

# Deploy generated SSH public key to Gitlab
resource "gitlab_deploy_key" "main" {
  title   = "flux"
  project = gitlab_project.main.id
  key     = tls_private_key.main.public_key_openssh

  depends_on = [gitlab_project.main]
}

# Deploy generated manifests to Gitlab project
resource "gitlab_repository_file" "install" {
  project        = gitlab_project.main.id
  branch         = gitlab_project.main.default_branch
  file_path      = data.flux_install.main.path
  content        = base64encode(data.flux_install.main.content)
  commit_message = "Add ${data.flux_install.main.path}"

  depends_on = [gitlab_project.main]
}

resource "gitlab_repository_file" "sync" {
  project        = gitlab_project.main.id
  branch         = gitlab_project.main.default_branch
  file_path      = data.flux_sync.main.path
  content        = base64encode(data.flux_sync.main.content)
  commit_message = "Add ${data.flux_sync.main.path}"

  depends_on = [gitlab_repository_file.install]
}

resource "gitlab_repository_file" "kustomize" {
  project        = gitlab_project.main.id
  branch         = gitlab_project.main.default_branch
  file_path      = data.flux_sync.main.kustomize_path
  content        = base64encode(data.flux_sync.main.kustomize_content)
  commit_message = "Add ${data.flux_sync.main.kustomize_path}"

  depends_on = [gitlab_repository_file.sync]
}

# Create the kubeconfig secret for the staging cluster
resource "kubernetes_secret" "kubeconfig_staging" {
  metadata {
    name      = "kubeconfig-staging"
    namespace = data.flux_sync.main.namespace
  }

  data = {
    "value.yaml" = base64decode(data.terraform_remote_state.downstream_cluster.outputs.kubeconfig)
  }
}
```


## Repository structure after pushing new files

```sh
.
├── apps                        # Added files after Flux is setup
│   ├── base
│   │   └── podinfo
│   │       ├── kustomization.yaml
│   │       ├── namespace.yaml
│   │       └── release.yaml
│   └── staging
│       ├── kustomization.yaml
│       └── podinfo-values.yaml
├── clusters
│   ├── management
│   │   ├── flux-system         # Files generated by Terraform flux provider
│   │   │   ├── gotk-components.yaml
│   │   │   ├── gotk-sync.yaml
│   │   │   └── kustomization.yaml
│   │   ├── kustomization.yaml
│   │   └── staging-sync.yaml
│   └── staging                 # Added files after Flux is setup
│       ├── apps.yaml
│       └── infrastructure.yaml
├── infrastructure              # Added files after Flux is setup
│   ├── kustomization.yaml
│   ├── nginx
│   │   ├── kustomization.yaml
│   │   ├── namespace.yaml
│   │   └── release.yaml
│   ├── redis
│   │   ├── kustomization.yaml
│   │   ├── kustomizeconfig.yaml
│   │   ├── namespace.yaml
│   │   ├── release.yaml
│   │   └── values.yaml
│   └── sources
│       ├── bitnami.yaml
│       ├── kustomization.yaml
│       └── podinfo.yaml
└── README.md
```

## Files content

### clusters/management/flux-system

Note: I omitted gotk-components.yaml intentionally because of its length.

```yaml
# clusters/management/flux-system/gotk-sync.yaml

# This manifest was generated by flux. DO NOT EDIT.
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m0s
  ref:
    branch: master
  secretRef:
    name: flux-system
  url: URL
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./clusters/management
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```

```yaml
# clusters/management/flux-system/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-sync.yaml
  - gotk-components.yaml
```

### clusters/management
These are the files that target the remote staging cluster using the kubeconfig for that cluster.

```yaml
# clusters/management/staging-sync.yaml

---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m0s
  ref:
    branch: master
  secretRef:
    name: flux-system
  url: URL
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m0s
  kubeConfig:
    secretRef:
      name: kubeconfig-staging
  timeout: 2m10s
  path: ./clusters/staging/
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```

```yaml
# clusters/management/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  -
```
Infrastructure as Code Through Ansible

Latest webcast from the Software Engineering Institute, https://youtu.be/PSUDNYXAONA

It's a basic beginner's introduction to the concept and tool. Any thoughts?

Description: Infrastructure as code (IaC) is a concept that enables organizations to automate the provisioning and configuration of their IT infrastructure. This concept also aids organizations in applying the DevOps process (plan, code, build, test, release, deploy, operate, monitor, repeat) to their infrastructure. Ansible is a popular choice within the IaC tool landscape for realizing this goal.

What Attendees Will Learn:

• What is Infrastructure as Code (IaC)?

• How does Ansible fit into the IaC tool landscape?

• How do I get started with Ansible?

Who Should Attend:

• system administrators

• IT operations managers

• automation engineers

• DevOps engineers

https://redd.it/ylzsbp
@r_devops
Switching to DevOps

Hi all,

I'm your typical Linux sysadmin (I have an RHCE cert), also working with SAN and VMware, with some basic network knowledge (routing, VLANs, LACP, NAT). I have worked with deployments before (GitLab to staging and to prod) and briefly worked on some AWS infrastructure, but I'd consider myself a beginner in that role. I have experience with Docker, but not with Kubernetes. I got a job offer that pays significantly more than my current sysadmin job; I was very upfront about my knowledge and what I'm missing, and of course now my imposter syndrome is kicking in.

Is it realistic to learn Terraform, more AWS, and Kubernetes (administration, not setting it up) in 2 months or so? They of course offered to let me learn on the job and said I could take any courses or certifications I'd want, but it's still quite a bit different from what I've worked on until now. I am interested in this work and have been meaning to start learning it; I just haven't gotten around to it yet. I think I learn pretty quickly and am definitely willing to, but what if I'm biting off more than I can chew?

Thanks for any comments.

https://redd.it/ylx9yp
@r_devops
DevOps & DevSecOps: What Are the Key Differences Between the Two?

The terms DevOps and DevSecOps have been in the air for a long time, but the concepts behind them are still misunderstood by many; some are not even aware of the differences between them.

Let's dive into the concepts of DevOps and DevSecOps in detail.

https://redd.it/yly6je
@r_devops
Best certification for DevOps?

Yes, I know it's not required; I have 3 years of experience and no one has ever bothered me about it beyond a ton of free courses. But still, it sure is a nice piece of paper.

Looking at this certificate, I feel like I could get most of it right off the bat, except CloudFormation, which I never touched, as I've only ever used Terraform. I actually completed the training for it, since we were subscribed to AWS Skill Builder. What's the passing grade on the exam? Is it really worth it?

https://redd.it/yly4iq
@r_devops
Career Change

Hi everyone,

Today is my last day at my current job. I'm taking a huge risk by quitting and pulling my pension to pay off my debt and cover rent, so that I can take some time for myself and really figure out the trajectory of the rest of my life. At 33 it might seem a little late, but I just need a change in life, to work on something that I've become passionate about.

Over the last few years, I've been immersed in the self-hosting, coding, web-development, networking, cyber-security, all-things-IT world. I really want to continue and build a new career from it. But as you can probably tell, I'm having a hard time focusing and deciding on which pathway I should take.

I have the most experience working with Docker and self-hosting although I have yet to deploy anything that doesn't get taken down within a week or so because I either screw it up or change my mind. I also have been working a lot with front-end web development using JavaScript, HTML, & CSS, mostly using ReactJS.

There's a reason why I have spent most of my time in self-hosting using Docker and frontend development with React. It's because I enjoy having ownership over my data and privacy (self-hosting) and love exercising my creative side which is where frontend web development comes in.

With all that being said, I'm just looking for some guidance or possible mentorship in continuing this journey. I find that I do better when there is structure in my life so I've been looking for some type of online course or even just a lesson plan that I follow. I don't want to go back to a traditional classroom structure, I don't think that'll work for me. Online courses and bootcamps are expensive and risky.

Does anyone have any suggestions, tools, resources, YouTube channels, lesson plans, or pathways that I should take?

Thank you!

https://redd.it/ym6o6i
@r_devops
How common are DevOps jobs without on-call? Sliding window?

Throughout my career, I've found that I am less interested in the business domain and more interested in technology. I care less about the features my company delivers to customers and more about infrastructure (Terraform, Kubernetes, AWS, Azure, CI/CD build and deployment, Grafana, Elastic, security, service mesh, Java, JavaScript). However, I dread being on call and fire-fighting.

How common are these types of jobs? I find that the jobs companies are hiring for usually involve being on call and fire-fighting. I also find that jobs that are strictly backend (i.e., Java, .NET) or front-end (i.e., React) involve less on-call work and fire-fighting. What are my options?

https://redd.it/ymatwn
@r_devops
My job title is “DevOps Engineer” but the work doesn’t line up. Help?

I 32F have been a “DevOps Engineer” for two years & made a switch from being a Big Data Engineer for three years.

I made the switch. I was headhunted, it’s a great company to have on my LinkedIn & the work that I was told we’d be doing sounded exciting but none of it has actually happened.

Since then, our team stack has changed to a point where I know that it’s not really DevOps anymore.

We do use config management tools, namely Puppet & Terraform. We no longer look after the CI/CD tools (Jenkins & Spinnaker); they're now maintained by Release Engineers in another team. We do look after logging tools: ELK, InfluxDB & Grafana.

We were told that we would be looking at adding Docker as a containerisation tool only for that to be full-steam-ahead by Release Engineers.

I genuinely feel like a fraud. Having a job title with tasks & tools that don’t align. I’ve spent most of the year doing documentation & on-call on random things. I feel like tech support & I hate that.

I’m not growing at all. I’m incredibly bored. I’ve barely done any code all year. I’ve been doing a lot of self-learning to fill my knowledge gaps that it doesn’t look like I’ll get in this team.

I’ve been told we’re changing our names to SREs but that doesn’t make sense either.

I’m not a DevOps Engineer really, am I. Any advice?

https://redd.it/yle5jk
@r_devops