Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Which Sheet should I follow for my Intern Preparation?

I am unsure which sheet I should follow: Striver's A2Z or SDE. I have been advised to use A2Z as it is more beginner friendly, and to use SDE for my revision. But I do not have much time; companies have already started coming to my campus. I'd like to know your opinions.

https://redd.it/1eaa7xg
@r_devops
Networking for DevOps

Hey there,

I'm a junior backend engineer with experience in both Python and Go. I'm interested in gradually transitioning into the DevOps field and was wondering how much networking knowledge is required for an entry-level DevOps position. Are the study materials for Network+ (or A+) sufficient, or do they contain too many unnecessary details, or should I aim for higher-level certifications? Also, do you have any course recommendations?

https://redd.it/1eabyne
@r_devops
DOCKERS in JENKINS

I'm trying to study up on Docker and there are a few things I don't understand so far. Firstly, why do you need a DB connection to your database when you instantiate a Docker container? If you are using a Java project, you may have zipped libraries in your JAR file to connect to the DB, but the DB itself is never even in the Git repo of a Java project to begin with. Secondly, am I right that for a pipeline you need only one Docker image, which will then determine where to send your code?

https://redd.it/1eac89g
@r_devops
What should I know when going from a bigger team to a team where I'm the only DevOps engineer?

I'm in talks with some potential employers and all of them have a small number of DevOps engineers (1-3 people) or they need only one DevOps engineer for the position.

At the moment I'm in a team of around 10-15 DevOps engineers (it's mostly DevOps with a mix of SecOps engineers and DBAs). If I'm stuck with something I have the option to ask someone else on the team for help.

What should I know if I switch to a mixed team that has developers/QAs and I'm the only DevOps engineer?

https://redd.it/1ea4c1y
@r_devops
Roast this GitHub app I built, DevOps use case?

Hi folks 👋

I'm sharing my GitHub app called Pull Checklist. Pull Checklist lets you build checklists that block PR merging until all checks are ticked.

I created this tool because:

1. I found myself using checklists outside of GitHub to follow specific deployment processes
2. I worked at a company where we had specific runbooks we needed to follow when migrating the db or interacting with the data pipeline

Would really appreciate any feedback on this and whether there's a good use case for DevOps teams.
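
For anyone wondering what "block merging until all checks are ticked" could look like in practice, here is a minimal sketch. It assumes the checklist lives in the PR description as Markdown task-list items (`- [ ]` / `- [x]`); that is an assumption for illustration, not a description of how Pull Checklist is actually implemented, and the function names are made up.

```python
import re

# Matches Markdown task-list items such as "- [ ] step" or "* [x] step".
TASK_RE = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$", re.MULTILINE)


def unchecked_items(body: str) -> list[str]:
    """Return the text of every task-list item that is not ticked."""
    return [text.strip() for mark, text in TASK_RE.findall(body) if mark == " "]


def merge_allowed(body: str) -> bool:
    """Allow the merge only when no unticked item remains (hypothetical policy)."""
    return not unchecked_items(body)
```

Note that a description with no checklist at all passes this check; a real gate would probably also want to require that the expected checklist is present.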

https://redd.it/1eahtmu
@r_devops
Best PagerDuty Alternative? Let's be honest, PagerDuty is expensive and full of feature bloat.

My team has been using PagerDuty for a bit, but we are now looking for an alternative as the system itself is a bit confusing, the scheduling sucks, and the pricing is ridiculous for what we are looking for.

Rather than spend weeks testing and trialing everything on the market, we thought we would ask the group what oncall management/alerting tool you all have had the best luck with.

We are truly just looking for on-call scheduling, alerting, and possibly call routing, as well as the ability to integrate with some common systems we utilize.

What are everyone's thoughts on a better alternative to PagerDuty? Thanks in advance!

https://redd.it/1eahol3
@r_devops
Need Help with Terraform EKS Cluster - Cannot Access API Endpoint from Jumphost

Hey everyone,

I'm currently facing an issue with the EKS infrastructure I set up using Terraform. Everything seems to be standing up correctly, but I'm having trouble accessing the cluster.

Here's a brief overview of what I've done:

1. I wrote the infrastructure in Terraform to create an EKS cluster and associated resources.
2. Everything deploys without any errors.
3. I set up an SSH tunnel to a jump host to access the EKS API server.

However, when I try to access the API endpoint, I get a timeout. Here's what I'm doing:

> connect to jumphost via ssh
curl --insecure https://eks-api-endpoint:6443

Despite the tunnel being established, the curl command times out. I've double-checked my Security Groups and VPC configurations, and everything appears to be in order. Is there anything I'm missing or doing wrong? Any help or pointers would be greatly appreciated!
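
One way to narrow down a timeout like this is to check from the jump host whether the endpoint accepts TCP connections at all, and on which port. EKS normally exposes the Kubernetes API over HTTPS on port 443, so probing both 443 and 6443 would show whether the port in the curl command is part of the problem. A minimal sketch using only the standard library; the hostname in the comment is the same placeholder as above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS resolution failures.
        return False


# Example use on the jump host (placeholder hostname):
#   for port in (443, 6443):
#       print(port, can_connect("eks-api-endpoint", port))
```

If neither port connects, the likely suspects are the cluster security group rules for the bastion and whether the cluster endpoint is private-only.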

My main.tf looks like this:

locals {
  name   = "some"
  region = "eu-north-1"

  vpc_cidr = "10.0.0.0/16"
  azs      = slice(data.aws_availability_zones.available.names, 0, 3)

  bastion_ami_type  = data.aws_ami.amazon_linux_23.id
  ec2_instance_type = "t3.small"
  tags = {
    Example = local.name
  }
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = local.name
  cidr = local.vpc_cidr

  azs              = local.azs
  public_subnets   = [cidrsubnet(local.vpc_cidr, 8, 0), cidrsubnet(local.vpc_cidr, 8, 1), cidrsubnet(local.vpc_cidr, 8, 2)]
  private_subnets  = [cidrsubnet(local.vpc_cidr, 8, 3), cidrsubnet(local.vpc_cidr, 8, 4), cidrsubnet(local.vpc_cidr, 8, 5)]
  database_subnets = [cidrsubnet(local.vpc_cidr, 8, 6), cidrsubnet(local.vpc_cidr, 8, 7), cidrsubnet(local.vpc_cidr, 8, 8)]

  enable_nat_gateway = true
  single_nat_gateway = true
  # one_nat_gateway_per_az = false
  create_database_subnet_group = true
  map_public_ip_on_launch      = true

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "${local.name}-cluster"
  cluster_version = "1.30"

  cluster_addons = {
    # aws-ebs-csi-driver = {}
    coredns    = {}
    kube-proxy = {}
    vpc-cni    = {}
  }

  vpc_id                      = module.vpc.vpc_id
  subnet_ids                  = module.vpc.private_subnets
  create_cloudwatch_log_group = false

  eks_managed_node_groups = {
    bottlerocket = {
      ami_type = "BOTTLEROCKET_x86_64"
      platform = "bottlerocket"

      instance_types = ["c5.large"]
      capacity_type  = "ON_DEMAND"

      min_size     = 1
      max_size     = 3
      desired_size = 1
    }
  }

  tags = local.tags
}

resource "aws_key_pair" "terraformec2key" {
  key_name   = "terraformec2key"
  public_key = file("terraformec2key.pub")
}

module "ec2" {
  source  = "terraform-aws-modules/ec2-instance/aws"
  version = "~> 5.0"

  name                   = "bastion-${local.name}"
  ami                    = local.bastion_ami_type
  instance_type          = local.ec2_instance_type
  subnet_id              = module.vpc.public_subnets[1]
  vpc_security_group_ids = [module.ec2_security_group.security_group_id]
  key_name               = "terraformec2key"
}

https://redd.it/1eahid2
@r_devops
Do you abstract and reuse common IaC patterns?

I'm in the middle of sort of a philosophical discussion and I'm curious where you all stand. Say with something like CDK, you notice the same pattern of resources being implemented multiple times. For example, an SQS queue triggers a Lambda function. The same lines of code are written over and over again to create the queue, Lambda, event source, alarms, Slack notification, etc. Or maybe it's the same API Gateway to Lambda setup. Or it could be a little more complicated, like a DynamoDB stream filter and EventBridge. Point is, you keep seeing the same code copy/pasted.

Does the repetition bother you? Do you think it should be swapped out for a custom built (shared) construct that creates all of those resources instead of everyone copying/pasting the same code over and over? How do you decide? Is there a threshold of complexity that makes you lean either way?

Pros/cons for building a reusable package? Pros/cons to just keep copying and pasting?

https://redd.it/1eamgo0
@r_devops
Injecting files securely into container during runtime.

Hi. I have a file for Django (local_settings.py) that has lots of secrets/passwords in it. Right now I'm keeping that file locally on my server and copying it into place before building the image from the Dockerfile, which copies it into the container. I'm wondering how folks are copying files from a secure location into the container and then protecting them if they contain a lot of passwords.
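
One common alternative to baking the file into the image is to have the settings module read each secret at startup from a file mounted into the container at runtime (Docker secrets are mounted under /run/secrets by default) or from an environment variable. A minimal sketch of such a helper; the function name and the secret names in the usage note are hypothetical.

```python
import os
from pathlib import Path


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> str:
    """Read a secret from a mounted file, falling back to an environment variable."""
    path = Path(secrets_dir) / name
    if path.is_file():
        # Mounted secret files often end with a trailing newline.
        return path.read_text().strip()
    value = os.environ.get(name.upper())
    if value is None:
        raise RuntimeError(f"secret {name!r} not found in {secrets_dir} or environment")
    return value
```

In local_settings.py this would be used as, for example, `SECRET_KEY = read_secret("django_secret_key")`, so the image itself never contains the passwords.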

https://redd.it/1eagosd
@r_devops
Telegraf / Sensu

Evening, first post here.

Has anyone any experience with using Telegraf and Sensu together?

In our Sensu setup we have complete control over writing subscriptions, but no access to the servers or anything via SSH.

I've installed Telegraf on a server, following their standard install guide, with a basic config. The only input at the moment is cpu, for testing purposes. The output is the Sensu API URL.

In Sensu the event appears, but I've no idea how to transform the data into a useful alert/monitor.

I.e., if I was sending 10 different inputs and wanted to grab metrics around disk space, how would I do that?

Thanks in advance

P.S. Not using Sensu isn't an option 😩😆

https://redd.it/1eagoz8
@r_devops
How to do a proper canary deployment for a multi-region application?

Hello, I am in charge of designing canary deployments for our microservices. Within a single region it's relatively simple: I use weighted Route 53 records and wrote a Lambda to control the weights while listening to alerts for rollbacks.

How do I do a proper canary for an application that's active-passive across two AWS regions? The application has a limitation: it can't be active-active due to data consistency concerns. My current idea is to canary one region, then do the other region, but that doesn't seem efficient, so I am asking here for industry best practice. Thanks!
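
The "one region, then the other" idea above can be sketched as plain control logic. This is only an illustration under assumptions: the health check is a callback standing in for the alert listener, the Route 53 weight update is left as a comment, and the region names in the usage note are examples.

```python
from typing import Callable, Dict, Iterable


def ramp_region(is_healthy: Callable[[int], bool], step: int = 25) -> int:
    """Ramp canary weight towards 100; return 0 (rolled back) if a check fails."""
    weight = 0
    while weight < 100:
        weight = min(weight + step, 100)
        # A real Lambda would update the weighted Route 53 record here.
        if not is_healthy(weight):
            return 0
    return weight


def rollout(regions: Iterable[str],
            is_healthy: Callable[[str, int], bool],
            step: int = 25) -> Dict[str, int]:
    """Canary regions one at a time and stop at the first rollback."""
    results: Dict[str, int] = {}
    for region in regions:
        results[region] = ramp_region(lambda w, r=region: is_healthy(r, w), step)
        if results[region] == 0:
            break  # later regions are left untouched
    return results
```

Calling `rollout(["eu-west-1", "us-east-1"], check)` returns the final weight per region, and a region that rolled back stops the sequence before the next region is touched.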

https://redd.it/1eaqadx
@r_devops
Start-up DevOps

I just joined a start-up.

They have a few GoDaddy web hosts where multiple websites are hosted.

One was a Windows server with multiple databases and .NET projects.
Should I tell the CEO that it's cheaper to use Lambda/Linux servers for some of the services?


https://redd.it/1eawj7c
@r_devops
Does anyone have internal CLI tools they have built?

I've started building a CLI tool for our team to use to perform regular actions or search logs in a way that is more aligned to how we deploy our applications (think get logs <some-api-we-have> and it'll return a sensible time-ordered collection of logs from various k8s pods, queues and such).

Does anyone else have similar tools? What do they do? Do you find them useful?
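
To make the "time-ordered collection of logs from various sources" part concrete, here is a minimal sketch of the merge step. It assumes each source (pod, queue, and so on) already yields its lines in timestamp order; the `LogLine` type and `merge_logs` name are made up for the example and are not taken from the tool described above.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class LogLine(NamedTuple):
    timestamp: datetime
    source: str
    message: str


def merge_logs(*streams: Iterable[LogLine]) -> Iterator[LogLine]:
    """Lazily merge streams that are each sorted by time into one ordered stream."""
    return heapq.merge(*streams, key=lambda line: line.timestamp)
```

Because `heapq.merge` is lazy, this works on long streams without loading everything into memory first.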

https://redd.it/1eb0ni4
@r_devops
Just in time (JIT) AWS escalation tool?

Looking for some tool or service that is:

- cheap / free
- not awful to set up
- can be used with one account/organization
- allows approval and review for temporary, audited, elevated AWS access

I read through this AWS TEAM tool but it requires a second federated organization and my team doesn't want to set up another org in our AWS account.

Any suggestions?

https://redd.it/1eb2ew8
@r_devops
I am a complete noob to DevOps, and was offered an IaC role. I am terrified to take it but I really think it can be a great opportunity.

Hi guys, I am currently a cloud/network engineer supporting a live financial application. I've written SQL scripts and PowerShell scripts, built a few network automation scripts in Python, built a few playbooks with Ansible, and learned OOP with C++ in college. However, I have been offered an IaC engineer role (no production code involved, yet) and I am extremely nervous to take it. I only have about 5 years of true experience in IT, but I think this role can be a great segue for me into automation, which is what I've always wanted to focus on rather than the pure infrastructure side of things. I'm extremely nervous, and I would love to succeed in this role, but I do not have much help except this community. Please offer me any advice you have!

https://redd.it/1eb3e3x
@r_devops
CrowdStrike Preliminary Post Incident Review

CrowdStrike put out their official PIR on the incident. I hope whoever wrote this was banging their head against a desk when they had to basically write out "our only testing for this was an automated test that didn't even officially pass".

Here's the link for anyone interested: https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/

https://redd.it/1eb40oo
@r_devops
No Vault TLS for Production cluster

Hi, I'm trying to set up a Vault production cluster for our company.
The issue I'm having right now is that the browser doesn't recognize my CA certificate. I created it with these commands:

#generate ca in /tmp
cfssl gencert -initca ca-csr.json | cfssljson -bare /tmp/ca

#generate certificate in /tmp
cfssl gencert \
-ca=/tmp/ca.pem \
-ca-key=/tmp/ca-key.pem \
-config=ca-config.json \
-hostname="vault,vault.vault.svc.cluster.local,vault.vault.svc,localhost,127.0.0.1" \
-profile=default \
ca-csr.json | cfssljson -bare /tmp/vault

As I understand it, this is a self-signed certificate that's valid only inside my cluster. I used this method because the Vault setup requires tls-server and tls-ca. I can generate the tls-server certificate in my Cloudflare account or use cert-manager to create one myself, but it doesn't work as intended.

extraEnvironmentVars:
  VAULT_CACERT: /vault/userconfig/tls-ca/tls.crt

extraVolumes:
  - type: secret
    name: tls-server
  - type: secret
    name: tls-ca

standalone:
  enabled: false
ha:
  enabled: true
  replicas: 3
  config: |
    ui = true

    listener "tcp" {
      tls_disable = 0
      address = "0.0.0.0:8200"
      tls_cert_file = "/vault/userconfig/tls-server/tls.crt"
      tls_key_file = "/vault/userconfig/tls-server/tls.key"
      tls_min_version = "tls12"
    }

    storage "consul" {
      path = "vault"
      address = "consul-consul-server:8500"
    }

# Vault UI
ui:
  enabled: true
  externalPort: 8200

I was thinking of maybe having another certificate that covers only the ingress, and using the self-signed certificates inside the cluster, but it doesn't work like that either.
Here's the ingress I'm trying to use for the connection:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vault-ingress
  namespace: vault
spec:
  rules:
    - host: vault.company.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: vault-ui
                port:
                  number: 8200
  tls:
    - hosts:
        - vault.company.com
      secretName: default-workshop-example-tls
  ingressClassName: nginx

I've been trying to get my head around this for a week, but I can't. Any help would be welcomed! 🙏


The questions are:
How do I generate a valid CA certificate? As I understand it, I can't.
How do I enable TLS in Vault?
Is my config maybe wrong?


https://redd.it/1eb273e
@r_devops
Serverless observability tools (New Relic etc.): are my expectations off?

Recently I tried New Relic with AWS Lambda (Python) and was surprised by how awkward "the basics" of logging and metrics seemed to be compared to my previous experiences with other tools (Datadog, Elasticsearch, Grafana, Riemann). Most of my experience with those tools is not with serverless, though.

I'm wondering if that's more a New Relic issue or a general problem of poor support for serverless. Am I expecting too much? Am I doing it wrong?

What I expected:

* Searchable, correlated logs across multiple services
* Application metrics (e.g., products sold per hour) and infrastructure metrics (e.g., 50x responses per hour)
* Alerts based on these metrics, integrated with Slack and out-of-hours tools like PagerDuty
* Performance tracing

What I got from their Lambda extension (there are other integration options, e.g. OpenTelemetry, but they all seem to have limitations and to be a bit work-in-progress):

* Quirky Lambda extension with documentation / usability issues
* Logs: currently not clear to me that it's usable for searching across multiple Lambdas/services (?!)
* Custom metrics: fine for what I needed, but with caveats (e.g., no tags - no "dimensional metrics")
* Alerts: seems fine. I didn't try it with Slack/PagerDuty though
* Performance tracing: I didn't need this in my test, but again hindered by documentation issues

How do other tools do on serverless? (Datadog, Honeycomb, etc.)

https://redd.it/1eb7ddg
@r_devops
SRE/DevOps IDE

Hi! Imagine the perfect SRE/DevOps IDE for your tasks. In your opinion, what is the most important feature it should have? What specific technologies, stacks, integrations, and scenarios should it support? Is there anything else you would like to include?

https://redd.it/1eb9qsf
@r_devops
Daily Work Problems Faced by Engineers in Cloud DevOps and SRE

Hello people.

I know that this may sound like a very general ask, but bear with me. I am looking for problems or process improvements in the Cloud DevOps and SRE work domain. What's something that you as an engineer (or any other employee in this area) face on a daily basis, or have faced in the past, and would like to see solved or have a tool made for? My intention is to start a year-long project (so the problem should be big/small enough) that will span my whole senior year in college, with the end product being something that helps with or solves the said problem.

P.S. I would really prefer if it's something that can use some ML to enhance it.

https://redd.it/1ebbo6y
@r_devops