Reddit DevOps
266 subscribers
30.9K links
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Review my resume for a 3 months experience

I have been working at this small service-based company for about 7 months now, but the first 4.5 months were just training/self-learning; I've only really worked since mid-May. I don't see myself growing much here, so I want to switch as soon as possible, before the end of this year. I feel like my company doesn't focus much on IaC, and I'm missing out on a lot of good practices too. Please suggest what I can add to or remove from my resume, and what kind of DevOps-related personal projects I could add.
https://drive.google.com/file/d/1aK5ZCTxR4NJ94IXnRzAP6zw9_eJSY5-n/view?usp=drivesdk


https://redd.it/1e9zsmo
@r_devops
Multiple deployment channels Octo/DEVOPS

Our Octo setup has multiple deployment channels (Feature 1, Feature 2, Main, etc.) that each go through all the different environments. Currently, when a PR is merged on TFS, we can choose which feature branch to build, but we only ever use this for testing; Main is the only branch/package that goes all the way through to production.

I currently have only one pipeline in DevOps, which will act as Main and, as far as I understand, will be set up as a continuous deployment. How do I cater for all the other channels in DevOps?


Hope I'm being clear, this is my first time doing something like this
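One common way to model multiple channels in a single Azure Pipelines definition is to trigger on all the branches and gate the production stage on `main` only. A hypothetical sketch (branch names, stage names, and environments are illustrative, not your actual setup):

```yaml
# azure-pipelines.yml (hypothetical sketch)
trigger:
  branches:
    include:
      - main
      - feature/*

stages:
  - stage: Build
    jobs:
      - job: Build
        steps:
          - script: echo "build and package"

  # Test environments run for every triggering branch ("channel")
  - stage: Test
    dependsOn: Build
    jobs:
      - deployment: DeployTest
        environment: test
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "deploy to test"

  # Production only runs when the source branch is main
  - stage: Production
    dependsOn: Test
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      - deployment: DeployProd
        environment: production
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "deploy to production"
```

Alternatively, each channel can be a separate pipeline pointing at the same YAML with different branch filters; the single-pipeline-with-conditions approach just keeps the logic in one place.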



https://redd.it/1ea1ow9
@r_devops
Prometheus as receiver

Hello all,

I am relatively new to Prometheus and have a quick question. We want to use our Prometheus as a receiver and get metrics from a remote-write Prometheus. From what I have read, we need to use --enable-feature=remote-write-receiver. Prometheus is installed locally on an Ubuntu Linux server.

In which file do I have to enter --enable-feature=remote-write-receiver?

Is the endpoint that I have to pass to the remote-write Prometheus the following: LocalServerIP/api/v1/write? Can I find the URL in a file? Which port is used for this?
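For reference: --enable-feature=remote-write-receiver is a command-line flag, not a prometheus.yml setting, so on a typical local install it goes on the ExecStart line of the systemd unit. The receiver then accepts writes on Prometheus's normal listen port (default 9090) at /api/v1/write. A sketch, assuming a systemd-managed install (paths are illustrative):

```
# On the RECEIVING Prometheus, e.g. /etc/systemd/system/prometheus.service:
[Service]
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --enable-feature=remote-write-receiver
# then: systemctl daemon-reload && systemctl restart prometheus

# On the SENDING Prometheus, in prometheus.yml:
remote_write:
  - url: http://<receiver-ip>:9090/api/v1/write
```

If Prometheus listens on a non-default port (--web.listen-address), use that port in the URL instead of 9090.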

Many thanks in advance!

https://redd.it/1ea2gtn
@r_devops
⚠️ Need Help Migrating MySQL DB from 8.0 to 8.0.23 in Docker

I'm in need of some assistance regarding the migration of my MySQL database.

Current Setup:

- I have a slave MySQL database running on MySQL 8.0 in a Docker container.
- I've mounted custom folders as follows:
  -v /opt/mysql/data:/var/lib/mysql \
  -v /opt/mysql/my.cnf:/etc/mysql/my.cnf \
  -v /opt/mysql/log:/var/lib/mysql/log \

- The database size is 73G.

Goal:

- I want to upgrade to MySQL 8.0.23.

Issue:

- When I mount the same directories to the new MySQL 8.0.23 image, I encounter the following error:

2024-07-19 16:24:07+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2024-07-19 16:24:07+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.23-1debian10 started.
2024-07-19 16:24:32+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.23-1debian10 started.
2024-07-19 16:24:32+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2024-07-19 16:24:32+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.23-1debian10 started.


- The container keeps restarting in loops.

Request:

- Can anyone guide me on how to properly migrate from MySQL 8.0 to 8.0.23? What steps should I take to ensure a smooth transition and avoid the errors mentioned above?
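One possible diagnosis (an assumption, not confirmed from the logs shown): if the running "8.0" container is actually a later patch release than 8.0.23, this is a downgrade, and MySQL does not support in-place downgrades within a series, so 8.0.23 cannot open the existing data directory and the container crash-loops. A hedged sketch of how to check and, if needed, migrate via a logical dump (container names, paths, and the password variable are illustrative):

```
# 1. Read past the entrypoint lines to the real mysqld error
docker logs mysql-new 2>&1 | tail -50

# 2. If it is a version mismatch, take a logical dump from the OLD container
#    (replication coordinates, if needed, must be handled separately)
docker exec mysql-old mysqldump -uroot -p"$MYSQL_ROOT_PASSWORD" \
  --all-databases --single-transaction > /opt/mysql/full-dump.sql

# 3. Start 8.0.23 against an EMPTY data directory, then restore
docker run -d --name mysql-new \
  -v /opt/mysql/data-new:/var/lib/mysql \
  -v /opt/mysql/my.cnf:/etc/mysql/my.cnf \
  -e MYSQL_ROOT_PASSWORD="$MYSQL_ROOT_PASSWORD" mysql:8.0.23
docker exec -i mysql-new mysql -uroot -p"$MYSQL_ROOT_PASSWORD" < /opt/mysql/full-dump.sql
```

With a 73 GB database the dump/restore will be slow; it is only worth attempting after the actual error in step 1 confirms the data-directory version is the problem.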


Thank you in advance for your help!

https://redd.it/1ea3d52
@r_devops
GitFlow question, my senior is on vacation, did I fuck up?

My senior, who usually handles the whole deployment process, is on vacation this week. During this time, I have to deploy several of the features he finished and some of the features I am working on. After executing the git flow workflow, my develop and main branches are in different states: main is 8 commits ahead of develop and 14 commits behind, which does not seem right. Looking at the difference in the commits, the feature branch merged into develop apparently has commits that main does not. Honestly, I don't know how this happened, as I pulled the latest versions of main and develop, merged both feature branches into develop from the GitHub web app, and then proceeded to pull develop, run git flow release start (branch_name), run the usual commands for code formatting & tests, and git flow release finish.

Now I am a bit afraid of doing anything on main outside of the usual workflow. Main and develop can be merged, and I guess I can just go to main, merge develop, and push it, but I don't know if someone is going to kill me on Monday. I was wondering if someone could give me a hint on how this happened and whether it can be fixed in a safe way.
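Before merging anything, it can help to see exactly which commits are on each side. A sketch of the inspection (assumes the remote is named origin; the reconcile step at the end is one option, not the only safe one):

```
git fetch origin

# commits main has that develop lacks (the "8 ahead")
git log --oneline origin/develop..origin/main
# commits develop has that main lacks (the "14 behind")
git log --oneline origin/main..origin/develop

# git flow release finish merges the release branch into main AND back into
# develop, so develop being ahead by unreleased feature merges is normal.
# If main really needs develop's commits now, one option is:
git checkout main
git pull origin main
git merge --no-ff develop
git push origin main
```

If the extra commits on develop are simply features that have not been through a release yet, the divergence is expected and nothing needs fixing until the next git flow release.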

https://redd.it/1ea3lzr
@r_devops
Add Users to SQL Database (Azure SQL Managed) In CI/CD Pipeline - Permissions Question

Hello,

I originally posted this in the terraform sub but it hasn't gained any traction so trying here.

I have a CI/CD pipeline in AzDevops that runs on a self-hosted agent with a user-assigned Identity. I provision a new SQL Database with terraform and want to add a user to it in the pipeline.

The only solution I've seen so far is to add the agent's identity as an admin on the SQL server via an Entra group. This feels bad security-wise, as a breach of the CI/CD agent would expose every database we have. Am I overthinking this?

Any better solutions?
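For context, the lower-privilege pattern people usually reach for is a contained database user: an identity that already has rights on just the new database (e.g. a provisioning identity granted access per-database, rather than server admin) runs something like the following against the new database itself, not master. The identity name below is hypothetical:

```sql
-- Run against the NEW database by an identity permitted to create users there.
-- "ci-agent-identity" is the managed identity's Entra display name (hypothetical).
CREATE USER [ci-agent-identity] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [ci-agent-identity];
ALTER ROLE db_datawriter ADD MEMBER [ci-agent-identity];
```

This keeps the pipeline's blast radius to the databases it was explicitly granted, instead of server-wide admin.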

https://redd.it/1ea272u
@r_devops
CI with JENKINS

I am a QA, and at all the companies I have been at, QAs don't even use Maven, let alone Jenkins, but I am trying to understand the CI process. Here is the way I see it; correct me where I am wrong. Firstly, I think CI is only used if you have automated testing, since with manual testing there is nothing to integrate dev code with. Also, you can have dev without QA (though your app will be riddled with defects), but you can't have QA without dev. That is the reason Jenkins connects to the dev branch on Git. After packaging, it sends the JAR to a Docker container, which then distributes the code to various environments. It goes to the PROD environment only when you do a release. A build is any update to the code, and one release is comprised of multiple builds. Still some unanswered questions, but is all of that correct?
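For reference, the flow described (pull the dev branch, build, run tests, package the JAR, deploy to an environment) is usually written down as a Jenkins pipeline. A minimal hypothetical declarative Jenkinsfile for a Maven project (repo URL and deploy step are illustrative):

```groovy
pipeline {
    agent any
    stages {
        stage('Checkout') {
            steps { git branch: 'develop', url: 'https://example.com/repo.git' }
        }
        stage('Build & Unit Test') {
            steps { sh 'mvn -B clean verify' }   // compiles and runs the test suite
        }
        stage('Package') {
            steps { sh 'mvn -B package -DskipTests' }   // produces the JAR
        }
        stage('Deploy to Test Env') {
            steps { sh 'echo deploy JAR or Docker image to a test environment' }
        }
    }
}
```

Each push to the watched branch triggers a build; which environment a build reaches (test vs. PROD) is controlled by extra stages or a separate release pipeline, not by Docker itself.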

https://redd.it/1ea7fj2
@r_devops
Running a Sidecar container as a cron job

Googling this topic shows a few methods of achieving this but I'm not sure which way would be best for my needs.

In my current setup I'm spinning up a pod with 2 containers:

- Main container (Thanos Ruler)
- Sidecar container (just my Python script)

This is the Helm values file:

ruler:
  enabled: true
  logLevel: debug
  clusterName: local-ruler
  alertmanagers:
    - https://prometheus-kube-prometheus-alertmanager.prometheus.svc.cluster.local:9093
  extraFlags:
    - --rule-file=/synced-rules/*.yml
  sidecars:
    - name: rule-syncer
      image: python:3.12-alpine
      imagePullPolicy: IfNotPresent
      command: ["/bin/sh"]
      args:
        - -c
        - |
          echo "Starting rule-syncer sidecar"
          pip install requests pyyaml --quiet
          echo "Running script"
          python /scripts/ruler_syncer.py
      volumeMounts:
        - name: synced-rules
          mountPath: /synced-rules
        - name: rule-syncer-script
          mountPath: /scripts
  extraVolumes:
    - name: synced-rules
      emptyDir: {}
    - name: rule-syncer-script
      configMap:
        name: rule-syncer-script
        defaultMode: 0755
  extraVolumeMounts:
    - name: synced-rules
      mountPath: /synced-rules

Instead of running my script in a `while True` loop, I'd rather just run it as a cron job. My script needs to be mounted with the volume used in the main container.

What would be the ideal way to achieve this? I'm planning to build an image for the script/sidecar, but once that's done, how would I run it periodically?

Any help would be appreciated. Kind of new to Kubernetes.
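Once the script is baked into its own image, a CronJob is the natural fit, with one caveat: an emptyDir volume is scoped to a single pod, so a CronJob pod cannot mount the ruler pod's emptyDir. The rules need a medium both pods can reach, e.g. a ReadWriteMany PVC. A hypothetical sketch (image, schedule, and PVC name are illustrative):

```yaml
# NOTE: replaces the emptyDir with a shared PVC so the ruler pod and the
# CronJob pod can both see /synced-rules.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: rule-syncer
spec:
  schedule: "*/5 * * * *"          # every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: rule-syncer
              image: my-registry/rule-syncer:latest   # your prebuilt script image
              volumeMounts:
                - name: synced-rules
                  mountPath: /synced-rules
          volumes:
            - name: synced-rules
              persistentVolumeClaim:
                claimName: synced-rules-pvc           # hypothetical RWX PVC
```

An alternative that avoids shared storage entirely is having the sidecar write the rules to a ConfigMap (or object store) that the ruler mounts, but that changes the ruler's volume setup too.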

https://redd.it/1ea8z43
@r_devops
Which Sheet should I follow for my Intern Preparation?

I am unsure about which sheet I should follow: Striver's A2Z or SDE. I have been advised to use A2Z, as it is more beginner-friendly, and to use SDE for revision. But I do not have much time; companies have already started campus recruitment. I want to know your opinions.

https://redd.it/1eaa7xg
@r_devops
Networking for DevOps

Hey there,

I'm a junior backend engineer with experience in both Python and Go. I'm interested in gradually transitioning into the DevOps field and was wondering how much networking knowledge is required for an entry-level DevOps position. Are the study materials for Network+ (or A+) sufficient, or do they contain too many unnecessary details, or should I aim for higher-level certifications? Also, do you have any course recommendations?

https://redd.it/1eabyne
@r_devops
DOCKER in JENKINS

Trying to study up on Docker, and there are a few things I don't understand so far. Firstly, why, when you instantiate a Docker container, do you need a DB connection to your database? If you are using a Java project, you may have zipped libraries in your JAR file to connect to the DB, but the DB itself is never even in the Git repo of a Java project to begin with. Secondly, am I right that for a pipeline, you need only one Docker image, which will then determine where to send your code?

https://redd.it/1eac89g
@r_devops
What should I know when going from a bigger team to a team where I'm the only DevOps engineer?

I'm in talks with some potential employers and all of them have a small number of DevOps engineers (1-3 people) or they need only one DevOps engineer for the position.

At the moment I'm in a team of around 10-15 DevOps engineers (it's mostly DevOps with a mix of SecOps engineers and DBAs). If I'm stuck with something, I have the option to ask someone else on the team for help.

What should I know if I switch to a mixed team that has developers/QAs and I'm the only DevOps engineer?

https://redd.it/1ea4c1y
@r_devops
Roast this Github app I built, DevOps use case?

Hi folks πŸ‘‹

I'm sharing my GitHub app called Pull Checklist. Pull Checklist lets you build checklists that block PR merging until all checks are ticked.

I created this tool because:

1. I found myself using checklists outside of Github to follow specific deployment processes
2. I worked at a company where we had specific runbooks we needed to follow when migrating the db or interacting with the data pipeline

Would really appreciate any feedback on this and whether there's a good use case for DevOps teams.

https://redd.it/1eahtmu
@r_devops
Best PagerDuty alternative? Let's be honest, PagerDuty is expensive and full of feature bloat.

My team has been using PagerDuty for a bit, but we are now looking for an alternative as the system itself is a bit confusing, the scheduling sucks, and the pricing is ridiculous for what we are looking for.

Rather than spend weeks testing and trialing everything on the market, we thought we would ask the group what oncall management/alerting tool you all have had the best luck with.

We are truly just looking for on-call scheduling, alerting, and possibly call routing, as well as the ability to integrate with some common systems we utilize.

What are everyone's thoughts on a better alternative to PagerDuty? Thanks in advance!

https://redd.it/1eahol3
@r_devops
Need Help with Terraform EKS Cluster - Cannot Access API Endpoint from Jumphost

Hey everyone,

I'm currently facing an issue with the EKS infrastructure I set up using Terraform. Everything seems to be standing up correctly, but I'm having trouble accessing the cluster.

Here's a brief overview of what I've done:

1. I wrote the infrastructure in Terraform to create an EKS cluster and associated resources.
2. Everything deploys without any errors.
3. I set up an SSH tunnel to a jump host to access the EKS API server.

However, when I try to access the API endpoint, I get a timeout. Here’s what I’m doing:

connect to the jumphost via ssh, then:
curl --insecure https://eks-api-endpoint:6443

Despite the tunnel being established, the curl command times out. I've double-checked my Security Groups and VPC configurations, and everything appears to be in order. Is there anything I'm missing or doing wrong? Any help or pointers would be greatly appreciated!

My main.tf looks like this:

locals {
  name   = "some"
  region = "eu-north-1"

  vpc_cidr = "10.0.0.0/16"
  azs      = slice(data.aws_availability_zones.available.names, 0, 3)

  bastion_ami_type  = data.aws_ami.amazon_linux_23.id
  ec2_instance_type = "t3.small"

  tags = {
    Example = local.name
  }
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = local.name
  cidr = local.vpc_cidr

  azs              = local.azs
  public_subnets   = [cidrsubnet(local.vpc_cidr, 8, 0), cidrsubnet(local.vpc_cidr, 8, 1), cidrsubnet(local.vpc_cidr, 8, 2)]
  private_subnets  = [cidrsubnet(local.vpc_cidr, 8, 3), cidrsubnet(local.vpc_cidr, 8, 4), cidrsubnet(local.vpc_cidr, 8, 5)]
  database_subnets = [cidrsubnet(local.vpc_cidr, 8, 6), cidrsubnet(local.vpc_cidr, 8, 7), cidrsubnet(local.vpc_cidr, 8, 8)]

  enable_nat_gateway = true
  single_nat_gateway = true
  # one_nat_gateway_per_az = false
  create_database_subnet_group = true
  map_public_ip_on_launch      = true

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "${local.name}-cluster"
  cluster_version = "1.30"

  cluster_addons = {
    # aws-ebs-csi-driver = {}
    coredns    = {}
    kube-proxy = {}
    vpc-cni    = {}
  }

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  create_cloudwatch_log_group = false

  eks_managed_node_groups = {
    bottlerocket = {
      ami_type = "BOTTLEROCKET_x86_64"
      platform = "bottlerocket"

      instance_types = ["c5.large"]
      capacity_type  = "ON_DEMAND"

      min_size     = 1
      max_size     = 3
      desired_size = 1
    }
  }

  tags = local.tags
}

resource "aws_key_pair" "terraform_ec2_key" {
  key_name   = "terraform_ec2_key"
  public_key = file("terraform_ec2_key.pub")
}

module "ec2" {
  source  = "terraform-aws-modules/ec2-instance/aws"
  version = "~> 5.0"

  name                   = "bastion-${local.name}"
  ami                    = local.bastion_ami_type
  instance_type          = local.ec2_instance_type
  subnet_id              = module.vpc.public_subnets[1]
  vpc_security_group_ids = [module.ec2_security_group.security_group_id]
  key_name               = "terraform_ec2_key"
}

https://redd.it/1eahid2
@r_devops
Do you abstract and reuse common IaC patterns?

In the middle of sort of a philosophical discussion. I'm curious where you all stand. Say with something like CDK. You notice the same pattern of resources being implemented multiple times. For example an SQS queue triggers a lambda function. The same lines of code are written over and over again to create the queue, lambda, event source, alarms, slack notification, etc. Or maybe it's the same API Gateway to lambda setup. Or it could be a little more complicated like a dynamo stream filter and event bridge. Point is you keep seeing the same code copy/pasted.

Does the repetition bother you? Do you think it should be swapped out for a custom built (shared) construct that creates all of those resources instead of everyone copying/pasting the same code over and over? How do you decide? Is there a threshold of complexity that makes you lean either way?

Pros/cons for building a reusable package? Pros/cons to just keep copying and pasting?
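As a concrete reference point for the SQS-triggers-a-Lambda example: the shared-construct option usually looks like one construct owning the queue, DLQ, function, event source, and alarm. A hypothetical aws-cdk-lib v2 sketch (names, defaults, and the alarm threshold are illustrative, not prescriptive):

```typescript
import { Construct } from 'constructs';
import { Duration } from 'aws-cdk-lib';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import { SqsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

export interface QueueProcessorProps {
  handler: lambda.FunctionProps;     // caller supplies the function config
  visibilityTimeout?: Duration;
}

export class QueueProcessor extends Construct {
  public readonly queue: sqs.Queue;
  public readonly fn: lambda.Function;

  constructor(scope: Construct, id: string, props: QueueProcessorProps) {
    super(scope, id);

    const dlq = new sqs.Queue(this, 'Dlq');
    this.queue = new sqs.Queue(this, 'Queue', {
      visibilityTimeout: props.visibilityTimeout ?? Duration.seconds(30),
      deadLetterQueue: { queue: dlq, maxReceiveCount: 3 },
    });

    this.fn = new lambda.Function(this, 'Fn', props.handler);
    this.fn.addEventSource(new SqsEventSource(this.queue));

    // the alarm every copy/paste repeats, defined once here
    new cloudwatch.Alarm(this, 'DlqAlarm', {
      metric: dlq.metricApproximateNumberOfMessagesVisible(),
      threshold: 1,
      evaluationPeriods: 1,
    });
  }
}
```

The escape hatch matters for the cons column: the moment one team needs a knob the construct doesn't expose, you either grow the props interface or they fork back to copy/paste, which is usually the threshold people use to decide.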

https://redd.it/1eamgo0
@r_devops
Injecting files securely into container during runtime.

Hi. I have a Django file (local_settings.py) that has lots of secrets/passwords in it. Right now I'm keeping that file locally on my server and copying it into place before building the image; the Dockerfile then copies it into the container. I'm wondering how folks copy files from a secure location into a container, and how they protect such a file when it holds a lot of passwords.
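One common alternative to COPYing the file at build time (where it ends up baked into the image layers) is mounting it at runtime. A hypothetical docker-compose sketch; paths and names are illustrative:

```yaml
services:
  web:
    image: myapp:latest
    # option 1: bind-mount the secrets file read-only at runtime,
    # so it never enters the image
    volumes:
      - /srv/secrets/local_settings.py:/app/myproject/local_settings.py:ro
    # option 2: compose/swarm secrets; the file then appears inside the
    # container at /run/secrets/local_settings
    secrets:
      - local_settings

secrets:
  local_settings:
    file: /srv/secrets/local_settings.py
```

Either way the Dockerfile no longer needs the COPY, and the image can be shared or pushed to a registry without leaking credentials; on the host, restrict the file with normal Unix permissions (e.g. chmod 600).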

https://redd.it/1eagosd
@r_devops
Telegraf / Sensu

Evening, first post here.

Does anyone have any experience using Telegraf and Sensu together?

In our Sensu setup we have complete control over writing subscriptions, but no access to the servers themselves or anything via SSH.

I've installed Telegraf on a server following their standard install guide, with a basic config; the only input at the moment is cpu, for testing purposes. The output is the Sensu API URL.

In Sensu the event appears; however, I've no idea how to transform the data into a useful alert/monitor.

E.g. if I were sending 10 different inputs and wanted to grab metrics around disk space... how do I do that?

Thanks in advance

P.S. not using Sensu isn't an option 😩😆
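On the Telegraf side, each kind of metric is just another input plugin; disk space comes from the disk input, and every enabled input's points ride along in the same Sensu output event. A hypothetical telegraf.conf fragment (the backend URL and check name are illustrative):

```toml
[[inputs.cpu]]

[[inputs.disk]]
  # skip pseudo filesystems so only real mounts report disk_used_percent etc.
  ignore_fs = ["tmpfs", "devtmpfs", "overlay"]

[[outputs.sensu]]
  backend_api_url = "http://sensu-backend:8080"   # your existing Sensu API URL
  [outputs.sensu.check]
    name = "telegraf-metrics"
```

Turning those points into alerts then happens in Sensu itself (e.g. a handler or check that thresholds on the disk_used_percent metric), since Telegraf only ships the measurements.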

https://redd.it/1eagoz8
@r_devops
how to do proper canary deployment for a multi-region application?

Hello, I am in charge of designing canary deployment for our microservices. Within a single region it's relatively simple: I use a weighted Route 53 record and wrote a Lambda to control the weights while listening for alerts to trigger rollbacks.

How do I do proper canary deployments for an application that is active-passive across two AWS regions? The application can't be active-active due to data-consistency concerns. My current idea is to canary one region, then do the other, but that seems inefficient, so I am asking here for industry best practice. Thanks!
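For reference, the region-agnostic part of the Lambda (the promotion/rollback weight math) can be kept as a pure function, so the same schedule can be replayed per region; the step percentages below are a hypothetical schedule, not a recommendation:

```python
# Hypothetical weight schedule for a weighted Route 53 record pair
# (canary record vs. stable record, total weight 100).

CANARY_STEPS = [5, 25, 50, 100]  # percent of traffic on the new version

def next_weights(current_canary_pct: int) -> tuple[int, int]:
    """Return (canary_weight, stable_weight) for the next promotion step."""
    for step in CANARY_STEPS:
        if step > current_canary_pct:
            return step, 100 - step
    return 100, 0  # already fully promoted

def rollback_weights() -> tuple[int, int]:
    """On an alert, send all traffic back to stable."""
    return 0, 100
```

For the active-passive case, one common (if slow) pattern is exactly the one described: run this schedule to completion in the active region, then bake the passive region by failing over or replaying the schedule there, since the passive region's canary cannot receive real traffic anyway.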

https://redd.it/1eaqadx
@r_devops
Start-up DevOps

I just joined a start-up.

They have a few GoDaddy web hosts where multiple websites are hosted. One of them is a Windows server with multiple databases and .NET projects.

Should I tell the CEO that it's cheaper to use Lambda/Linux servers for some of the services?


https://redd.it/1eawj7c
@r_devops