Reddit DevOps
271 subscribers
9 photos
31.1K links
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Automate server patching across multiple cloud providers

So I've been tasked with finding a long-term solution for automating and centralising the patching of all of our Linux servers across multiple cloud providers. We're currently mostly on AWS and GCP, with some Azure exposure and, to add to the mess, some old on-prem stuff on vSphere.

So far I've been dealing mostly with AWS myself, and have successfully automated the patching of EC2 instances using built-in functionality like Systems Manager Patch Manager and Automation runbooks.

As of today, for some reason, the bosses no longer want to patch our servers from within each cloud provider; instead they're asking for a centralised solution, with the goal of patching all of our servers through one unified procedure, regardless of cloud location or operating system (we run RedHat, Debian, Amazon Linux, etc.).



I need to come up with a plan. The first thing that comes to mind is setting up Ansible playbooks and running them across all the VMs, targeting each operating system. I'm not sure how to proceed yet, though.

Do you have any suggestions/tips as to how you would tackle this? Also is there a service out there already doing this?

Any insight is much appreciated!
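The Ansible route is a common answer here precisely because its package modules abstract over distros: the generic `ansible.builtin.package` module maps to apt/dnf/yum per host, so one playbook plus an inventory spanning AWS, GCP, Azure, and vSphere can patch everything, and tools like AWX add the central scheduling and reporting layer. As a rough illustration only (not anyone's production script), the per-distro dispatch that module performs looks roughly like this in plain shell:

```shell
#!/usr/bin/env bash
# Sketch: pick the right patch command for whatever distro we landed on.
# Ansible's generic package module does this mapping for you; shown here
# in shell purely to illustrate the dispatch.
patch_cmd() {
  if command -v apt-get >/dev/null 2>&1; then # Debian / Ubuntu
    echo "apt-get update && apt-get -y upgrade"
  elif command -v dnf >/dev/null 2>&1; then   # RHEL 8+ / Amazon Linux 2023
    echo "dnf -y upgrade"
  elif command -v yum >/dev/null 2>&1; then   # older RHEL / Amazon Linux 2
    echo "yum -y update"
  else
    echo "unsupported"
  fi
}
patch_cmd
```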




https://redd.it/1d7ym6u
@r_devops
Struggling to stop containers in Docker

When I try to stop containers in Docker Desktop, the containers just... don't stop. So I end up restarting my computer, which forces them to stop.

The problem is that I am developing in VS Code with Docker. And every couple of hours, VS Code will lose the Docker connection. So I will restart my computer, which 'solves' the problem, but obviously isn't a great solution.


VS Code reports the Docker error as 'Error: Process exited with code 126'.
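On the 126 specifically: that exit status is the shell convention for "command found but cannot execute", which in Docker setups usually points at an entrypoint or script missing its executable bit, having a wrong interpreter line, or sitting on a noexec mount. A quick way to see where the code comes from, outside Docker (a plain-bash demonstration, not a diagnosis of this particular setup):

```shell
# Exit status 126 means the file was found but could not be executed.
script=$(mktemp)
echo 'echo hello' > "$script"   # note: no chmod +x
"$script" 2>/dev/null           # fails: permission denied
echo "exit code: $?"            # prints: exit code: 126
rm -f "$script"
```

(Exit 127, by contrast, would mean the command was not found at all.)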

https://redd.it/1d7zuy1
@r_devops
Consulting Educational Resources

I have recently been asked by a former director to consult part time for a company that I used to work for. The initial contract would be quite short, and consist only of providing subject matter expertise and project planning to the current team of Platform Engineers.

I have been considering transitioning into consulting at some point, likely after my wife goes back to work (SAHM until our daughter goes to school) in about a year. I think this might be a good opportunity to get a foot in the door; however, I have not contracted for companies before, and am quickly trying to get up to speed on legal concerns, taxes, finances, etc. This offer was made on pretty short notice.

If anyone has good resources they would recommend on how to get an LLC set up, or general advice they are willing to share, I would greatly appreciate it.

https://redd.it/1d7z3ni
@r_devops
I made an iOS monitoring app for DigitalOcean with home screen widgets. Free, no ads, no tracking.

App Store link: https://apps.apple.com/us/app/status-for-digitalocean/id6499493955



Core features:

- View service status

- View current incidents

- View past incidents

- View scheduled and active maintenances

- Add customizable home screen widgets

- Light and dark mode support

- Customizable app icon



I just wanted to make something cool people would use. I'm not a DevOps engineer, so I'd appreciate any feedback. If you'd like to suggest improvements, feel free to leave a comment below.

https://redd.it/1d8534x
@r_devops
Does an IaC Platform exist? HELP!

I had always dreamed of using a platform where I could select some settings based on the type of application and then generate all the infra code in Terraform that I would need to achieve that, and probably just manage everything in that platform as well.


For example, imagine I have a Java dockerized microservice that will use Kubernetes, needs security, and has a Postgres DB.

Let's assume it will be deployed in a brand-new AWS account.

What I'm looking for is a platform where I can create everything through a wizard; that platform has to:

1) Create code to connect GitHub/GitLab to AWS (build docker container and push it to ECR)
2) Create a CI/CD pipeline to deploy the dockerized service to EKS (This can be triggered from the platform as well, to hide implementation)
3) Under the hood, based on previous settings, it knows the service needs an EKS cluster, ECR, Cognito, and Postgres.

Wondering if you are aware of a platform with those capabilities.



https://redd.it/1d86won
@r_devops
NLB to Lightsail data charges



Hi,

AWS Support has given me both answers on two separate tickets about this, so I'm turning to you guys.

If an NLB proxies TCP/UDP connections to Lightsail through VPC peering (using Lightsail's private IP), and Lightsail "replies" to the client through a TCP tunnel, does the data transferred to the end client count as Lightsail data transfer or EC2 data transfer? (Meaning: if Lightsail sends 4 TB worth of data, will it be charged, or fall under the free tier of the respective package?)

Keep in mind that the client ONLY ever sees/sends/receives traffic from the NLB's public IP, none of the packets are marked with Lightsail's public IP.

AWS support has literally given me two contradictory answers, one being that it falls under EC2 and the other being the opposite -.-

Thanks in advance

https://redd.it/1d84drl
@r_devops
Certifications worth pursuing?

I am aware that experience outweighs certifications any day. However, besides the obvious ones like AWS, GCP, and Azure, are there any other certifications that would make a difference for a software engineer transitioning into DevOps?

If you would pick any from this list, which one would it be?

1. Certified Kubernetes Administrator (CKA)
2. Docker Certified Associate (DCA)
3. HashiCorp Certified: Terraform Associate
4. Certified Jenkins Engineer (CJE)
5. Red Hat Certified Specialist in Ansible Automation
6. Puppet Certified Professional
7. ITIL Foundation Certification
8. Certified Agile DevOps Professional
9. CompTIA Linux+
10. Google Professional Cloud DevOps Engineer

https://redd.it/1d8ct5k
@r_devops
Seeking Advice on Learning Development and Starting a Tech Startup

Hi everyone,

I'm a former AI Product Manager with no prior development experience. I have a strong desire to build a software product and start my own tech startup. To achieve this, I know I need to gain development knowledge and learn how to code.

Where should I start? Any tips or resources you can recommend would be greatly appreciated!

Thanks in advance!

https://redd.it/1d8j5py
@r_devops
Internet Speed vs LAN Switch

Vendors are trying to push a gigabit switch when my internet download speed is 30 Mb/s. Are they trying to upsell me?

Will I lose internet performance if I go for a switch that only supports up to 100 Mb/s?
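One thing worth double-checking before answering is units, since ISPs quote megabits while download meters often show megabytes. A quick back-of-the-envelope (assuming the quoted 30 is megabits per second, as ISPs usually state it):

```shell
# 1 byte = 8 bits; switch ports are rated in megabits per second (Mbit/s).
internet_mbit=30
echo "$internet_mbit Mbit/s"          # as megabits: well under a 100 Mbit/s port

internet_mbyte=30
echo "$((internet_mbyte * 8)) Mbit/s" # as megaBYTES: 240 Mbit/s, over a 100 Mbit/s port
```

So if the connection really is 30 Mbit/s, a 100 Mbit/s switch won't slow your internet at all; the gigabit switch only pays off for transfers between machines on the LAN. If the figure was actually 30 MB/s, that's 240 Mbit/s and a 100 Mbit/s switch would cap it.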

https://redd.it/1d8kuu8
@r_devops
Should I create the database and the user using Terraform or Ansible?

I am working at a software house, and for app demonstrations to clients we use EC2 instances with a LEMP stack installed. I provision the server with Terraform:


```
resource "aws_instance" "instance" {
  count = var.ec2_instance_num

  ami                    = var.ami
  instance_type          = "t3a.micro"
  key_name               = var.ssh_key
  iam_instance_profile   = aws_iam_instance_profile.ec2_profile.name
  vpc_security_group_ids = var.ec2_security_groups

  root_block_device {
    volume_size = 30
    volume_type = "gp3"
  }

  connection {
    type        = "ssh"
    user        = "ubuntu"
    private_key = file(var.private_key_path)
    host        = self.public_ip
  }

  provisioner "file" {
    source      = "${path.module}/provision.sh"
    destination = "/home/ubuntu/provision.sh"
  }

  provisioner "remote-exec" {
    inline = [
      "chmod +x /home/ubuntu/provision.sh",
      local.final_provision_command,
    ]
  }
}
```

With the following script:

```
#!/usr/bin/env bash

if tput colors >/dev/null 2>&1; then
RED='\033[0;31m'
YELLOW='\033[1;33m'
CYAN='\033[0;36m'
NC='\033[0m' # No Color
else
RED=''
YELLOW=''
CYAN=''
NC=''
fi

print_help() {
echo -e "Usage: ${YELLOW}$0${NC} [options]"
echo -e "${CYAN}Options:${NC}"
echo " --php_ver <version> Specify PHP version (default is 8.2)"
echo " --nodb Do not install any database"
echo " --db_root_password <pass> Set root password for the database"
echo " -h, --help Show this help message"
}

cleanup () {
echo -e "${CYAN}Cleanup${NC}"
rm -rf /home/ubuntu/install

apt-get -y autoremove && apt-get -y autoclean
reboot
exit 0;
}

if [ "$EUID" -ne 0 ]; then
echo -e "${RED}ERROR: Run this script as root or via using sudo.${NC}"
echo
print_help
exit 1;
fi

export DEBIAN_FRONTEND=noninteractive

PHP_VERSION="8.2"
DB_TYPE="mariadb"


while [ "$1" != "" ]; do
case $1 in
"--php_ver")
PHP_VERSION=$2
shift 2
;;

"--nodb")
DB_TYPE="none"
shift
;;

"--db_root_password")
DB_ROOT_PASSWORD=$2
shift 2
;;

"-h" | "--help")
print_help
exit 0
;;

*)
echo -e " ${RED}Invalid option: ${YELLOW}$1${NC}"
exit 1
;;
esac
done

apt-get update && apt-get upgrade -y


if [ "$PHP_VERSION" == "" ]; then
echo -e "${RED}No php version provided defaulting into 8.2${NC}"
PHP_VERSION="8.2"
fi

echo -e "${CYAN}PHP ${YELLOW}$PHP_VERSION${CYAN} will be installed ${NC}"

apt-get install -y nginx ca-certificates apt-transport-https software-properties-common ruby-full
add-apt-repository -y ppa:ondrej/php
apt-get update

apt-get install -y php${PHP_VERSION}-fpm \
php${PHP_VERSION}-mbstring \
php${PHP_VERSION}-mysql \
php${PHP_VERSION}-oauth \
php${PHP_VERSION}-opcache \
php${PHP_VERSION}-readline \
php${PHP_VERSION}-xml



POOL_CONF="/etc/php/${PHP_VERSION}/fpm/pool.d/www.conf"
if [ -f "$POOL_CONF" ]; then
echo -e "${CYAN}Configuring PHP-FPM to listen on ${YELLOW}127.0.0.1:9000${NC}"
sed -i "s|^listen = .*|listen = 127.0.0.1:9000|" "$POOL_CONF"
systemctl restart php${PHP_VERSION}-fpm
else
echo -e "${RED}Failed to configure PHP-FPM: ${POOL_CONF} not found${NC}"
cleanup
exit 1
fi

echo -e "${CYAN}Configuring default Vhost${NC}"

rm -rf /var/www/html/*

echo "<?php phpinfo();" > /var/www/html/index.php
systemctl stop nginx

# Quote the delimiter so nginx variables like $uri are not expanded by the shell
cat >/etc/nginx/sites-available/default <<'EOL'
server {
listen 80 default_server;
listen [::]:80 default_server;

root /var/www/html;

index index.php index.html index.htm index.nginx-debian.html;

server_name _;

location / {
try_files $uri $uri/ =404;
}

location ~ \.php$ {
include snippets/fastcgi-php.conf;


# With php-cgi (or other tcp sockets):
fastcgi_pass 127.0.0.1:9000;
}

location ~ /\.ht {
deny all;
}
}
EOL

systemctl start nginx

echo -e "${CYAN}Installing ${YELLOW}CodeDeploy Agent${NC}"

rm -rf ./install
wget https://aws-codedeploy-eu-west-1.s3.eu-west-1.amazonaws.com/latest/install
chmod +x ./install
./install auto
systemctl start codedeploy-agent

echo -e "${CYAN}Configuring ${YELLOW}cron${CYAN} for the ${YELLOW}CodeDeploy Agent${NC}"

croncmd="@reboot systemctl start codedeploy-agent"
( crontab -l | grep -v -F "$croncmd" ; echo "$croncmd" ) | crontab -

if [ "$DB_TYPE" == 'none' ];then
echo -e "${YELLOW}No Db support will be installed${NC}"
cleanup
exit 0;
fi

echo -e "${CYAN}Installing ${YELLOW}${DB_TYPE}${NC}"

apt-get -y install mariadb-server mariadb-client

if [ "$DB_ROOT_PASSWORD" == "" ]; then
echo -e "${YELLOW}DB Root password is missing. skipping${NC}"
cleanup
exit 0;
fi

echo -e "${CYAN}Provisioning Root User${NC}"

# Make sure that NOBODY can access the server without a password.
# Note: SET Password/PASSWORD() only works on older MySQL/MariaDB; on
# MySQL 5.7+/MariaDB 10.4+ use:
#   mysql -e "ALTER USER 'root'@'localhost' IDENTIFIED BY '${DB_ROOT_PASSWORD}'"
mysql -e "UPDATE mysql.user SET Password = PASSWORD('${DB_ROOT_PASSWORD}') WHERE User = 'root'"
# Kill the anonymous users
mysql -e "DROP USER ''@'localhost'"
# Because our hostname varies we'll use some Bash magic here.
mysql -e "DROP USER ''@'$(hostname)'"
# Kill off the demo database
mysql -e "DROP DATABASE IF EXISTS test"
# Make our changes take effect
mysql -e "FLUSH PRIVILEGES"


```

And I have a question: should I also create the DB via Terraform, or use Ansible for that? My concern is that, because Terraform encourages immutable infrastructure, if I need to change the DB user's password I will also lose the DB data.

So, do you recommend using Ansible instead?

https://redd.it/1d8kd2f
@r_devops
Debug GitHub Actions with the help of an LLM-powered pull request bot

[I built this during a recent hackday](https://github.com/marketplace/treebeard-build)...here's the background:


I maintain a popular [pytest plugin](https://github.com/treebeardtech/nbmake) and throughout its life have supported and observed many developers struggling with GitHub actions.

* It's hard to identify what caused a failure given the length of some CI logs
* Multiple CI jobs can fail with the same cause, which makes it noisy
* It's unclear how to prioritise fixes for these failures

This GitHub app gives you a prioritised, de-duplicated list of issues relating to your GitHub Actions failure.

It uses LLMs (GPT-3.5 at the moment) to identify the most likely root cause, highlight relevant source files, and order the issues by priority.

Feedback welcome!



https://redd.it/1d8kbto
@r_devops
Is monitoring Kafka hard for you? Looking for feedback on features for better monitoring and troubleshooting of Kafka

Working in the observability and monitoring space for the last few years, we have had multiple folks complain about the lack of detailed monitoring for message queues, and Kafka in particular. Especially with the arrival of instrumentation standards like OpenTelemetry, we thought there must be a better way to solve this.

We dived deeper into the problem, trying to understand what could be done to make understanding and remediating issues in messaging systems much easier.

We would love to know whether these problem statements resonate with the community here, and we'd welcome feedback on how this could be made more useful to you. We have also shared some wireframes of proposed solutions, but those are just to make our current thought process more concrete. We'd love feedback on which flows and starting points would be most useful to you.

One of the key things we want to leverage is distributed tracing. Most current monitoring solutions for Kafka show metrics about Kafka, but metrics are aggregated and often don't give much detail on where exactly things are going wrong. Traces, on the other hand, show the exact path a message has taken and provide a lot more detail. One of our focuses is how we can leverage information from traces to help solve issues much faster.

Please have a look at the detailed blog we have written on some of these problems and proposed solutions: https://signoz.io/blog/kafka-monitoring-opentelemetry/

Would love any feedback on the same -

1. Which of these problems resonate with you?
2. Do the proposed solutions/wireframes make sense? What could be done better?
3. Anything we missed that might be important to consider?

https://redd.it/1d8pue6
@r_devops
What factors are important when wanting to downscale datadog-agents?

I'm working on reducing the number of datadog-agents we have running on our hosts. A lot of the hosts I have visualized show below 5% CPU utilization, low disk usage, and not much log ingestion. However, the downside of removing the agent is that you have no visibility other than accessing the instance itself or using cloud-provider metrics. I wonder if those three points are valid reasons to remove visibility on a host. I could not find anything on the internet that answers something similar.

https://redd.it/1d8pb7d
@r_devops
Is anybody using dagger.io in production?

Hello,

I recently discovered https://dagger.io/ and I love the concept and after playing around with some scripts and examples, I have a semi working pipeline im starting to get happy with.

I am interested in looking into using this in production (or at least in some staging environments for now), but I have concerns, mostly: (a) the documentation is terrible; it's constantly out of date and suggests different approaches; (b) the API is constantly changing and there is no clear approach to solving problems. The best source of documentation has been the GitHub issues list, which worries me.

That being said, it seems like a fantastic tool to prevent our current YAML soup and promote code reuse. Once I made some initial headway, it has been refreshing.

Is anybody using it and having the same difficulties? Does anybody know of similar tooling for producing build images? Interested to hear people's thoughts.

https://redd.it/1d8sqk6
@r_devops
What makes a monitoring tool production-ready?

There are numerous tools available for monitoring, ranging from open-source and freeware to premium options. Based on your experience, what makes a tool suitable for production use? What criteria/functionality do you consider when selecting a tool or stack for your environments?

In line with this discussion, here is my go-to for discovering new open-source tools:

https://ossinsight.io/collections/monitoring-tool/

https://redd.it/1d8uul5
@r_devops
Need Advice

After I graduated, I got my first career job two years ago as an Integration Dev. I want to start learning new tech so I can find a DevOps job starting next year. I'm looking at KodeKloud; I tried their DevOps course on Udemy and it covers 90% of the things I do at work. Will it be hard finding a DevOps job with 3 years of experience? I only do light work at my job, and the people with 20+ years of experience do the hard work. I want to learn more, but everyone is busy at my job and there has been no salary increase for three years. Any courses or subjects I could learn that would help me break into DevOps?
I have a BS in CS and have worked at IBM for two years now as an integration dev, but with barely any work, as I had no experience at all.


https://redd.it/1d8wfoe
@r_devops
Web app deployment not pulling forked repository code

### Scenario
I'm attempting to deploy a sample .NET Hello World web app to Azure, referencing a forked GitHub repository. The deployment process indicates that it is referencing my forked repository, but the GitHub Actions workflow does not trigger, and the web app is not deployed from the forked repo. I've tinkered with various settings in the Terraform config, to no avail.

### Issue
Despite configuring the repository and providing the GitHub authentication token, the GitHub Actions workflow is not triggered, and the web app is not deployed from the forked repository. I've also tinkered with the source control slot's manual_integration variable, to no avail.

### Question:
Is there an additional authentication step or configuration that I'm missing between GitHub and the Azure portal to ensure the deployment triggers and pulls the code from my forked repository?

I am being patient post-deployment, waiting 10 minutes for the site to show. And it does work when deployed manually in the Azure portal.

Any help would be greatly appreciated! Thank you!

## Terraform config:

### webapp.tf

```
resource "azurerm_service_plan" "srv_plan" {
  name                = "${local.prefix}service-plan"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  sku_name            = var.webapp_sku
  os_type             = var.webappos
  tags                = local.common_tags
}

resource "azurerm_windows_web_app" "dot_net_web_app" {
  name                                           = "${local.prefix}dotnet-app"
  location                                       = azurerm_resource_group.rg.location
  resource_group_name                            = azurerm_resource_group.rg.name
  service_plan_id                                = azurerm_service_plan.srv_plan.id
  https_only                                     = true
  public_network_access_enabled                  = false
  enabled                                        = true
  webdeploy_publish_basic_authentication_enabled = true

  site_config {
    always_on           = true
    minimum_tls_version = var.tls_version
    application_stack {
      current_stack  = var.stack
      dotnet_version = var.stack_version
    }
  }
}

resource "azurerm_source_control_token" "source_token" {
  type         = "GitHub"
  token        = var.github_auth_token
  token_secret = var.github_auth_token
}

resource "azurerm_windows_web_app_slot" "slot" {
  name           = "${local.prefix}app-slot"
  app_service_id = azurerm_windows_web_app.dot_net_web_app.id
  site_config {}
}

resource "azurerm_app_service_source_control_slot" "git_source" {
  slot_id                = azurerm_windows_web_app_slot.slot.id
  repo_url               = var.webapp_repo_url
  branch                 = var.webapp_repo_branch
  use_mercurial          = false
  use_manual_integration = false
  depends_on             = [azurerm_source_control_token.source_token]

  github_action_configuration {
    generate_workflow_file = false
  }
}
```


### variables.tf

```
variable "webapp_sku" {
  type    = string
  default = "P1v2"
}

variable "webappos" {
  type    = string
  default = "Windows"
}

variable "tls_version" {
  type    = string
  default = "1.2"
}

variable "stack" {
  type    = string
  default = "dotnet"
}

variable "stack_version" {
  type    = string
  default = "v7.0"
}

variable "subresource" {
  type    = list(string)
  default = ["sites"]
}

variable "webapp_repo_url" {
  type    = string
  default = "https://github.com/ZimCanIT/hello-world-webapp"
}

variable "webapp_repo_branch" {
  type    = string
  default = "main"
}

variable "github_auth_token" {
  type        = string
  sensitive   = true
  description = "Token for authorization"
}
```


https://redd.it/1d8wack
@r_devops
What does your company's logging platform look like?

Would love to know the following:

- Technology, e.g. ELK, Splunk, or any other self-managed OSS?
- Data size
- UX: SQL or free-form text?
- Structured/unstructured logs, or a mix?

https://redd.it/1d8zyzu
@r_devops
DevOps Resume Review

I've been in DevOps for a little while and I'm looking for another DevOps position. I'm striking out on my hunt, and I think it's because of my resume.

Please either

- roast me
- tell me how to make my resume better

Thanks!

https://imgur.com/UNMF3Ar

https://redd.it/1d8yp6m
@r_devops