Reddit DevOps
Add Users to SQL Database (Azure SQL Managed) In CI/CD Pipeline - Permissions Question

Hello,

I originally posted this in the terraform sub but it hasn't gained any traction so trying here.

I have a CI/CD pipeline in Azure DevOps that runs on a self-hosted agent with a user-assigned managed identity. I provision a new SQL database with Terraform and want to add a user to it in the pipeline.

The only solution I've seen so far is to add the agent's identity as an admin on the SQL server via an Entra group. This feels bad security-wise: a breach of the CI/CD agent would expose every database we have. Am I overthinking this?

Any better solutions?
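(One common pattern that avoids server-level admin is to grant the pipeline identity rights on only the target database and create the new user as a contained database user. A rough T-SQL sketch — `[app-user]` is a placeholder principal, and this runs against the target database, not master:)

```sql
-- run in the target database while connected as an Entra identity
-- that has sufficient rights there; [app-user] is a placeholder
CREATE USER [app-user] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [app-user];
ALTER ROLE db_datawriter ADD MEMBER [app-user];
```

This scopes the blast radius of an agent compromise to the databases the identity was explicitly granted on.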

https://redd.it/1ea272u
@r_devops
CI with JENKINS

I am a QA, and at all the companies I have been at, QAs don't even use Maven, let alone Jenkins, but I am trying to understand the CI process. Here is the way I see it; correct me where I am wrong. Firstly, I think that CI is only used if you have automation testing, since with manual testing there is nothing to integrate dev code with. Also, you can have dev without QA (though your app will be riddled with defects), but you can't have QA without dev. That is the reason Jenkins connects with the dev branch on Git. After packaging, it sends the JAR to a Docker container, which then distributes the code to various environments. It goes to the PROD environment only when you do a release. A build is any update in the code, and one release is comprised of multiple builds. Still some unanswered questions, but is all that correct?
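(For a concrete picture of the flow described above, a minimal declarative Jenkinsfile might look like this — the repo URL, stage names, and `deploy.sh` helper are all illustrative, not from any real project:)

```groovy
pipeline {
    agent any
    stages {
        stage('Checkout') {
            // Jenkins pulls the dev branch from the Git repo
            steps { git branch: 'dev', url: 'https://github.com/example/app.git' }
        }
        stage('Build & Test') {
            // Maven compiles, runs unit tests, and packages the JAR
            steps { sh 'mvn clean package' }
        }
        stage('Docker Image') {
            // the JAR is baked into an image, tagged per build
            steps { sh 'docker build -t example/app:${BUILD_NUMBER} .' }
        }
        stage('Deploy to QA') {
            // promotion to PROD is typically a separate, gated step
            steps { sh './deploy.sh qa' }
        }
    }
}
```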

https://redd.it/1ea7fj2
@r_devops
Running a Sidecar container as a cron job

Googling this topic shows a few methods of achieving this but I'm not sure which way would be best for my needs.

In my current setup I'm spinning up a pod with 2 containers:

- Main container (Thanos Ruler)
- Sidecar container (just my Python script)

This is the Helm values file:

ruler:
  enabled: true
  logLevel: debug
  clusterName: local-ruler
  alertmanagers:
    - https://prometheus-kube-prometheus-alertmanager.prometheus.svc.cluster.local:9093
  extraFlags:
    - --rule-file=/synced-rules/*.yml
  sidecars:
    - name: rule-syncer
      image: python:3.12-alpine
      imagePullPolicy: IfNotPresent
      command: ["/bin/sh"]
      args:
        - -c
        - |
          echo "Starting rule-syncer sidecar"
          pip install requests pyyaml --quiet
          echo "Running script"
          python /scripts/ruler_syncer.py
      volumeMounts:
        - name: synced-rules
          mountPath: /synced-rules
        - name: rule-syncer-script
          mountPath: /scripts
  extraVolumes:
    - name: synced-rules
      emptyDir: {}
    - name: rule-syncer-script
      configMap:
        name: rule-syncer-script
        defaultMode: 0755
  extraVolumeMounts:
    - name: synced-rules
      mountPath: /synced-rules

Instead of running my script in a `while True` loop, I'd rather just run it as a cron job. My script needs to be mounted with the volume used in the main container.

What would be the ideal way to achieve this? I'm planning to build an image for the script/sidecar, but once that's done, how would I run it periodically?

Any help would be appreciated. Kind of new to Kubernetes.
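(One caveat worth knowing before reaching for a CronJob: a CronJob creates its own pod, so it cannot write into the emptyDir that lives inside the ruler pod. Sharing files across pods needs a ReadWriteMany PVC, or you keep the sidecar pattern and schedule inside it. Assuming a hypothetical shared PVC named `synced-rules-pvc`, a sketch of the CronJob could look like:)

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: rule-syncer
spec:
  schedule: "*/5 * * * *"            # every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: rule-syncer
              image: python:3.12-alpine   # or the prebuilt script image
              command: ["python", "/scripts/ruler_syncer.py"]
              volumeMounts:
                - name: synced-rules
                  mountPath: /synced-rules
                - name: rule-syncer-script
                  mountPath: /scripts
          volumes:
            - name: synced-rules
              persistentVolumeClaim:
                claimName: synced-rules-pvc   # hypothetical RWX PVC
            - name: rule-syncer-script
              configMap:
                name: rule-syncer-script
```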

https://redd.it/1ea8z43
@r_devops
Which Sheet should I follow for my Intern Preparation?

I am unsure which sheet I should follow: Striver's A2Z or SDE. I have been recommended A2Z as it is more beginner-friendly, with SDE kept for revision, but I do not have much time; companies have already started coming to my campus. I want to know your opinions.

https://redd.it/1eaa7xg
@r_devops
Networking for DevOps

Hey there,

I'm a junior backend engineer with experience in both Python and Go. I'm interested in gradually transitioning into the DevOps field and was wondering how much networking knowledge is required for an entry-level DevOps position. Are the study materials for Network+ (or A+) sufficient, or do they contain too many unnecessary details, or should I aim for higher-level certifications? Also, do you have any course recommendations?

https://redd.it/1eabyne
@r_devops
DOCKERS in JENKINS

Trying to study up on Docker, and there are a few things I don't understand so far. Firstly, why, when you instantiate a container, do you need a DB connection to your database? If you are using a Java project, you may have zipped libraries in your JAR file to connect to the DB, but the DB itself is never even in the Git repo of a Java project to begin with. Secondly, am I right that for a pipeline you need only one Docker image? It will then determine where to send your code.

https://redd.it/1eac89g
@r_devops
What should I know when going from a bigger team to a team where I'm the only DevOps engineer?

I'm in talks with some potential employers and all of them have a small number of DevOps engineers (1-3 people) or they need only one DevOps engineer for the position.

At the moment I'm in a team of around 10-15 DevOps engineers (it's mostly DevOps with a mix of SecOps engineers and DBAs). If I'm stuck on something, I have the option to ask someone else on the team for help.

What should I know if I switch to a mixed team that has developers/QAs and I'm the only DevOps engineer?

https://redd.it/1ea4c1y
@r_devops
Roast this GitHub app I built, DevOps use case?

Hi folks 👋

I'm sharing my GitHub app called Pull Checklist. Pull Checklist lets you build checklists that block PR merging until all checks are ticked.

I created this tool because:

1. I found myself using checklists outside of Github to follow specific deployment processes
2. I worked at a company where we had specific runbooks we needed to follow when migrating the db or interacting with the data pipeline

Would really appreciate any feedback on this and whether there's a good use case for DevOps teams.

https://redd.it/1eahtmu
@r_devops
Best PagerDuty Alternative? Let's be honest, PagerDuty is expensive and full of feature bloat.

My team has been using PagerDuty for a bit, but we are now looking for an alternative as the system itself is a bit confusing, the scheduling sucks, and the pricing is ridiculous for what we are looking for.

Rather than spend weeks testing and trialing everything on the market, we thought we would ask the group what oncall management/alerting tool you all have had the best luck with.

We are truly just looking for on-call scheduling, alerting, and possibly call routing, as well as the ability to integrate with some common systems we utilize.

What are everyone's thoughts on a better alternative to PagerDuty? Thanks in advance!

https://redd.it/1eahol3
@r_devops
Need Help with Terraform EKS Cluster - Cannot Access API Endpoint from Jumphost

Hey everyone,

I'm currently facing an issue with the EKS infrastructure I set up using Terraform. Everything seems to be standing up correctly, but I'm having trouble accessing the cluster.

Here's a brief overview of what I've done:

1. I wrote the infrastructure in Terraform to create an EKS cluster and associated resources.
2. Everything deploys without any errors.
3. I set up an SSH tunnel to a jump host to access the EKS API server.

However, when I try to access the API endpoint, I get a timeout. Here’s what I’m doing:

# connect to jumphost via ssh, then:
curl --insecure https://eks-api-endpoint:6443

Despite the tunnel being established, the curl command times out. I've double-checked my Security Groups and VPC configurations, and everything appears to be in order. Is there anything I'm missing or doing wrong? Any help or pointers would be greatly appreciated!
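(One thing worth checking, stated as a likely cause rather than a certain diagnosis: the EKS cluster endpoint serves HTTPS on port 443, while 6443 is the default port for self-managed kube-apiservers, so a curl against 6443 will time out even with correct security groups. A sketch of the flow with placeholder names:)

```shell
# hostnames and key names below are placeholders
ssh -i bastion-key.pem ec2-user@<bastion-public-ip>

# from the jumphost: hit the endpoint on 443 (the default HTTPS port).
# Any HTTP response at all, even a 401/403 JSON body, proves connectivity.
curl --insecure https://<eks-cluster-endpoint>/
```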

My main.tf looks like this:

locals {
  name   = "some"
  region = "eu-north-1"

  vpc_cidr = "10.0.0.0/16"
  azs      = slice(data.aws_availability_zones.available.names, 0, 3)

  bastion_ami_type  = data.aws_ami.amazon_linux_23.id
  ec2_instance_type = "t3.small"
  tags = {
    Example = local.name
  }
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = local.name
  cidr = local.vpc_cidr

  azs              = local.azs
  public_subnets   = [cidrsubnet(local.vpc_cidr, 8, 0), cidrsubnet(local.vpc_cidr, 8, 1), cidrsubnet(local.vpc_cidr, 8, 2)]
  private_subnets  = [cidrsubnet(local.vpc_cidr, 8, 3), cidrsubnet(local.vpc_cidr, 8, 4), cidrsubnet(local.vpc_cidr, 8, 5)]
  database_subnets = [cidrsubnet(local.vpc_cidr, 8, 6), cidrsubnet(local.vpc_cidr, 8, 7), cidrsubnet(local.vpc_cidr, 8, 8)]

  enable_nat_gateway = true
  single_nat_gateway = true
  # one_nat_gateway_per_az = false
  create_database_subnet_group = true
  map_public_ip_on_launch      = true

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "${local.name}-cluster"
  cluster_version = "1.30"

  cluster_addons = {
    # aws-ebs-csi-driver = {}
    coredns    = {}
    kube-proxy = {}
    vpc-cni    = {}
  }

  vpc_id                      = module.vpc.vpc_id
  subnet_ids                  = module.vpc.private_subnets
  create_cloudwatch_log_group = false

  eks_managed_node_groups = {
    bottlerocket = {
      ami_type = "BOTTLEROCKET_x86_64"
      platform = "bottlerocket"

      instance_types = ["c5.large"]
      capacity_type  = "ON_DEMAND"

      min_size     = 1
      max_size     = 3
      desired_size = 1
    }
  }

  tags = local.tags
}

resource "aws_key_pair" "terraform_ec2_key" {
  key_name   = "terraform_ec2_key"
  public_key = file("terraform_ec2_key.pub")
}

module "ec2" {
  source  = "terraform-aws-modules/ec2-instance/aws"
  version = "~> 5.0"

  name                   = "bastion-${local.name}"
  ami                    = local.bastion_ami_type
  instance_type          = local.ec2_instance_type
  subnet_id              = module.vpc.public_subnets[1]
  vpc_security_group_ids = [module.ec2_security_group.security_group_id]
  key_name               = "terraform_ec2_key"
}

https://redd.it/1eahid2
@r_devops
Do you abstract and reuse common IaC patterns?

In the middle of sort of a philosophical discussion. I'm curious where you all stand. Say with something like CDK. You notice the same pattern of resources being implemented multiple times. For example an SQS queue triggers a lambda function. The same lines of code are written over and over again to create the queue, lambda, event source, alarms, slack notification, etc. Or maybe it's the same API Gateway to lambda setup. Or it could be a little more complicated like a dynamo stream filter and event bridge. Point is you keep seeing the same code copy/pasted.

Does the repetition bother you? Do you think it should be swapped out for a custom built (shared) construct that creates all of those resources instead of everyone copying/pasting the same code over and over? How do you decide? Is there a threshold of complexity that makes you lean either way?

Pros/cons for building a reusable package? Pros/cons to just keep copying and pasting?

https://redd.it/1eamgo0
@r_devops
Injecting files securely into container during runtime.

Hi. I have a file for Django (local_settings.py) that has lots of secrets/passwords in it. Right now I'm keeping that file locally on my server and copying it into place before building the image; the Dockerfile then copies it into the container. I'm wondering how folks copy files from a secure location into a container, and how they protect a file that holds a lot of passwords.
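(One common alternative to COPYing the file at build time is to mount it read-only when the container starts, so the secrets never land in an image layer. A sketch, where the host path and image name are placeholders:)

```shell
# keep local_settings.py outside the build context entirely, and
# bind-mount it read-only at runtime instead of baking it into the image
docker run -d \
  --name django-app \
  -v /etc/myapp/local_settings.py:/app/local_settings.py:ro \
  myapp:latest
```

Compose's `secrets:` support or an external secret manager (Vault, cloud secret stores) are the usual next step up from file mounts.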

https://redd.it/1eagosd
@r_devops
Telegraf / Sensu

Evening, first post here.

Has anyone any experience with using Telegraf and Sensu together?

With our Sensu setup, we have complete control of writing subscriptions but no access to the servers or anything via SSH.

Telegraf: I've installed this on a server and followed their standard install guide with a basic config; inputs at the moment are just CPU, for testing purposes. Output is the Sensu API URL.

In Sensu the event appears; however, I've no idea how to transform the data into a useful alert/monitor.

I.e. if I was sending 10 different inputs and I wanted to grab metrics around disk space, how do I do that?

Thanks in advance

P.S. Not using Sensu isn't an option 😩😆

https://redd.it/1eagoz8
@r_devops
How to do proper canary deployment for a multi-region application?

Hello, I am in charge of designing canary deployments for our microservices. In the same region it's relatively simple: I use weighted Route 53 records and wrote a Lambda to control the weights while listening to alerts for rollbacks.

How do I do a proper canary for an application that's active-passive in two AWS regions? The application can't be active-active due to data-consistency concerns. My current idea is to canary one region, then do the other, but that seems inefficient, so I am asking here for the industry best practice. Thanks!
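(For context on the single-region mechanism described above: the weight adjustment the Lambda performs boils down to one Route 53 API call per record set. A sketch of the equivalent CLI call — zone IDs, record names, and alias targets are all placeholders:)

```shell
# shift 10% of traffic to the canary record set (all values are placeholders)
aws route53 change-resource-record-sets \
  --hosted-zone-id Z123EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "A",
        "SetIdentifier": "canary",
        "Weight": 10,
        "AliasTarget": {
          "HostedZoneId": "Z2EXAMPLEALB",
          "DNSName": "canary-alb.eu-west-1.elb.amazonaws.com",
          "EvaluateTargetHealth": true
        }
      }
    }]
  }'
```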

https://redd.it/1eaqadx
@r_devops
Start-up DevOps

I just joined a start-up

They have few GoDaddy web hosts.
Where
Multiple websites are hosted.

1 was windows server with multiple Databases and.
Net projects.
Should I tell the CEO that it's cheaper to use lambda/Linux servers for some of the services


https://redd.it/1eawj7c
@r_devops
Does anyone have internal CLI tools they have built?

I've started building a CLI tool for our team to use to perform regular actions or search logs in a way that is more aligned to how we deploy our applications (think get logs <some-api-we-have> and it'll return a sensibly time-ordered collection of logs from various k8s pods, queues and such).

Does anyone else have similar tools? What do they do? Do you find them useful?
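(As one data point, the log-aggregation side of a tool like that often starts life as a thin kubectl wrapper before growing into a real CLI. A sketch — the `getlogs` name and the `app=<service>` label convention are assumptions, not anything from a real tool:)

```shell
#!/bin/sh
# usage: getlogs <service>   (hypothetical wrapper)
service="$1"
# --prefix tags each line with its pod/container; --timestamps prepends an
# RFC3339 timestamp, which lets us merge lines from many pods into one
# time-ordered stream by sorting on that field
kubectl logs -l "app=${service}" --all-containers --prefix --timestamps --since=1h \
  | sort -k2
```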

https://redd.it/1eb0ni4
@r_devops
Just in time (JIT) AWS escalation tool?

Looking for some tool or service that is:

- cheap / free
- not awful to set up
- can be used with one account/organization
- allows approval and review for temporary audited access to elevated AWS access

I read through the AWS TEAM tool, but it requires a second federated organization, and my team doesn't want to set up another org in our AWS account.

Any suggestions?

https://redd.it/1eb2ew8
@r_devops
I am a complete noob to DevOps and was offered an IaC role. I am terrified to take it, but I really think it can be a great opportunity.

Hi guys, I am currently a cloud/network engineer supporting a live financial application. I've written SQL scripts and PS scripts, built a few network automation scripts in Python, built a few playbooks with Ansible, and learned OOP with C++ in college. However, I have been offered an IaC engineer role (no production code involved, yet) and I am extremely nervous to take it. I only have about 5 years of true experience in IT, but I think this role can be a great segue for me into automation, which is what I've always wanted to focus on rather than the pure infrastructure side of things. I'm extremely nervous, and I would love to succeed in this role, but I do not have much help except this community. Please offer me any advice you have!

https://redd.it/1eb3e3x
@r_devops
CrowdStrike Preliminary Post Incident Review

CrowdStrike put out their official PIR on the incident. I hope whoever wrote this was banging their head against a desk when they had to basically write out "our only testing for this was an automated test that didn't even officially pass".

Here's the link for anyone interested: https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/

https://redd.it/1eb40oo
@r_devops
No Vault TLS for Production cluster

Hi, I'm trying to set up a Vault production cluster for our company.
The issue I'm having right now is that the browser doesn't recognize my CA certificate. I created it with these commands:

#generate ca in /tmp
cfssl gencert -initca ca-csr.json | cfssljson -bare /tmp/ca

#generate certificate in /tmp
cfssl gencert \
-ca=/tmp/ca.pem \
-ca-key=/tmp/ca-key.pem \
-config=ca-config.json \
-hostname="vault,vault.vault.svc.cluster.local,vault.vault.svc,localhost,127.0.0.1" \
-profile=default \
ca-csr.json | cfssljson -bare /tmp/vault

As I understand it, this is a self-signed certificate that's valid only inside my cluster. I used this method as the Vault setup requires tls-server and tls-ca. I can generate the tls-server cert in my Cloudflare account or use cert-manager to create one myself, but it doesn't work as intended.

extraEnvironmentVars:
  VAULT_CACERT: /vault/userconfig/tls-ca/tls.crt

extraVolumes:
  - type: secret
    name: tls-server
  - type: secret
    name: tls-ca

standalone:
  enabled: false
ha:
  enabled: true
  replicas: 3
  config: |
    ui = true

    listener "tcp" {
      tls_disable = 0
      address = "0.0.0.0:8200"
      tls_cert_file = "/vault/userconfig/tls-server/tls.crt"
      tls_key_file = "/vault/userconfig/tls-server/tls.key"
      tls_min_version = "tls12"
    }

    storage "consul" {
      path = "vault"
      address = "consul-consul-server:8500"
    }

# Vault UI
ui:
  enabled: true
  externalPort: 8200

I was thinking maybe to have another certificate that covers only the ingress, and to use the self-signed certificates inside the cluster, but that doesn't work either.
Here's the ingress where I try to create the connection:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vault-ingress
  namespace: vault
spec:
  ingressClassName: nginx
  rules:
    - host: vault.company.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: vault-ui
                port:
                  number: 8200
  tls:
    - hosts:
        - vault.company.com
      secretName: default-workshop-example-tls

I've been trying to get my head around this for a week, but I can't. Any help would be welcome! 🙏


The questions are:
How do I generate a valid CA certificate? As I understand it, I can't.
How do I enable TLS in Vault?
Is my config maybe wrong?
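(On the first question: a browser will only trust a self-signed CA if that CA is imported into the OS or browser trust store; otherwise the ingress needs a certificate from a publicly trusted issuer, and cert-manager with an ACME/Let's Encrypt issuer is the usual route. To see which certificate the listener is actually serving, a quick check like this can help — hostname and port are placeholders for the setup above:)

```shell
# inspect the certificate presented by the Vault endpoint
openssl s_client -connect vault.company.com:8200 -showcerts </dev/null \
  | openssl x509 -noout -subject -issuer -ext subjectAltName
```

If the subject/SANs don't match the hostname the browser uses, importing the CA into the trust store still won't make the warning go away.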


https://redd.it/1eb273e
@r_devops