Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Seeking Advice on Learning Development and Starting a Tech Startup

Hi everyone,

I'm a former AI Product Manager with no prior development experience. I have a strong desire to build a software product and start my own tech startup. To achieve this, I know I need to gain development knowledge and learn how to code.

Where should I start? Any tips or resources you can recommend would be greatly appreciated!

Thanks in advance!

https://redd.it/1d8j5py
@r_devops
Internet Speed vs LAN Switch

Vendors are trying to push a gigabit switch on me when my internet download speed is 30mb/s. Are they trying to upsell me?

Will I lose internet performance if I go for a switch that only supports up to 100mb/s?
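
The arithmetic behind the question can be sketched in a few lines of shell. This is only an illustration: it assumes the "30mb/s" in the post means 30 megabits per second, and the `min_mbps` helper is a made-up name. The idea is that traffic to the internet is capped by the slowest link in the path.

```shell
#!/usr/bin/env bash
# Throughput to the internet is capped by the slowest link in the path.
# All figures are in megabits per second (Mb/s). Reading the post's
# "30mb/s" as 30 megabits/s is an assumption.

min_mbps() {
  # Print the smaller of two integer link speeds.
  if [ "$1" -le "$2" ]; then echo "$1"; else echo "$2"; fi
}

internet_mbps=30     # assumed: 30 megabits/s
fast_ethernet=100    # 100 Mb/s switch
gigabit=1000         # gigabit switch

echo "Internet via 100 Mb/s switch: $(min_mbps "$internet_mbps" "$fast_ethernet") Mb/s"
echo "Internet via gigabit switch:  $(min_mbps "$internet_mbps" "$gigabit") Mb/s"

# If the plan were 30 megaBYTES/s instead, that is 30 * 8 = 240 Mb/s,
# and a 100 Mb/s switch would become the bottleneck.
echo "30 MB/s plan via 100 Mb/s switch: $(min_mbps $((30 * 8)) "$fast_ethernet") Mb/s"
```

Under the megabit assumption both switches leave the internet speed unchanged; the faster switch would only matter for traffic between devices on the LAN itself.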

https://redd.it/1d8kuu8
@r_devops
Should I create the database and the user using Terraform or ansible?

I work at a software house, and for app demonstrations to clients we use EC2 instances with a LEMP stack installed. For the server I use Terraform:


```
resource "aws_instance" "instance" {
  ami                  = var.ami
  instance_type        = "t3a.micro"
  key_name             = var.ssh_key
  iam_instance_profile = aws_iam_instance_profile.ec2_profile.name

  root_block_device {
    volume_size = 30
    volume_type = "gp3"
  }

  count = var.ec2_instance_num

  vpc_security_group_ids = var.ec2_security_groups

  provisioner "file" {
    source      = "${path.module}/provision.sh"
    destination = "/home/ubuntu/provision.sh"
  }

  provisioner "remote-exec" {
    inline = [
      "chmod +x /home/ubuntu/provision.sh",
      local.final_provision_command
    ]
  }

  connection {
    type        = "ssh"
    user        = "ubuntu"
    private_key = file(var.private_key_path)
    host        = self.public_ip
  }
}
```

With the following script:

```
#!/usr/bin/env bash

if tput colors >/dev/null 2>&1; then
RED='\033[0;31m'
YELLOW='\033[1;33m'
CYAN='\033[1;35m'
NC='\033[0m' # No Color
else
# No color support: define the same variables as empty strings
RED=''
YELLOW=''
CYAN=''
NC=''
fi

print_help() {
echo -e "Usage: ${YELLOW}$0${NC} [options]"
echo -e "${CYAN}Options:${NC}"
echo " --php_ver <version> Specify PHP version (default is 8.2)"
echo " --nodb Do not install any database"
echo " --db_root_password <pass> Set root password for the database"
echo " -h, --help Show this help message"
}

cleanup () {
echo -e "${CYAN}Cleanup${NC}"
rm -rf /home/ubuntu/install

apt-get autoremove -y && apt-get autoclean -y
reboot
exit 0;
}

if [ "$EUID" -ne 0 ]; then
echo -e "${RED}ERROR: Run this script as root or via using sudo.${NC}"
echo
print_help
exit 1;
fi

export DEBIAN_FRONTEND=noninteractive

PHP_VERSION="8.2"
DB_TYPE="mariadb"


while [ "$1" != "" ]; do
case $1 in
"--php_ver")
PHP_VERSION=$2
shift 2
;;

"--nodb")
DB_TYPE="none"
shift
;;

"--db_root_password")
DB_ROOT_PASSWORD=$2
shift 2
;;

"-h" | "--help")
print_help
exit 0
;;

*)
echo -e " ${RED}Invalid option: ${YELLOW}$1${NC}"
exit 1
;;
esac
done

apt-get update && apt-get upgrade -y


if [ "$PHP_VERSION" == "" ]; then
echo -e "${RED}No PHP version provided, defaulting to 8.2${NC}"
PHP_VERSION="8.2"
fi

echo -e "${CYAN}PHP ${YELLOW}$PHP_VERSION${CYAN} will be installed ${NC}"

apt-get install -y nginx ca-certificates apt-transport-https software-properties-common ruby-full
add-apt-repository -y ppa:ondrej/php
apt-get update

apt-get install -y php${PHP_VERSION}-fpm \
php${PHP_VERSION}-mbstring \
php${PHP_VERSION}-mysql \
php${PHP_VERSION}-oauth \
php${PHP_VERSION}-opcache \
php${PHP_VERSION}-readline \
php${PHP_VERSION}-xml



POOL_CONF="/etc/php/${PHP_VERSION}/fpm/pool.d/www.conf"
if [ -f "$POOL_CONF" ]; then
echo -e "${CYAN}Configuring PHP-FPM to listen on ${YELLOW}127.0.0.1:9000${NC}"
sed -i "s|^listen = .*|listen = 127.0.0.1:9000|" "$POOL_CONF"
systemctl restart php${PHP_VERSION}-fpm
else
echo -e "${RED}Failed to configure PHP-FPM: ${POOL_CONF} not found${NC}"
cleanup
exit 1
fi

echo -e "${CYAN}Configuring default Vhost${NC}"

rm -rf /var/www/html/*

echo "<?php phpinfo();" > /var/www/html/index.php
systemctl stop nginx

# Quote the heredoc delimiter so bash does not expand nginx variables such as $uri
cat >/etc/nginx/sites-available/default <<'EOL'
server {
listen 80 default_server;
listen [::]:80 default_server;

root /var/www/html;

index index.php index.html index.htm index.nginx-debian.html;

server_name _;

location / {
try_files $uri $uri/ =404;
}

location ~ \.php$ {
include snippets/fastcgi-php.conf;


# With php-cgi (or other tcp sockets):
fastcgi_pass 127.0.0.1:9000;
}

location ~ /\.ht {
deny all;
}
}
EOL

systemctl start nginx

echo -e "${CYAN}Installing ${YELLOW}CodeDeploy Agent${NC}"

rm -rf ./install
wget https://aws-codedeploy-eu-west-1.s3.eu-west-1.amazonaws.com/latest/install
chmod +x ./install
./install auto
systemctl start codedeploy-agent

echo -e "${CYAN}Config ${YELLOW}cron${CYAN} for ${YELLOW}CodeDeploy Agent${NC}"

croncmd="@reboot systemctl start codedeploy-agent"
( crontab -l | grep -v -F "$croncmd" ; echo "$croncmd" ) | crontab -

if [ "$DB_TYPE" == 'none' ];then
echo -e "${YELLOW}No Db support will be installed${NC}"
cleanup
exit 0;
fi

echo -e "${CYAN}Installing ${YELLOW}${DB_TYPE}${NC}"

apt-get -y install mariadb-server mariadb-client

if [ "$DB_ROOT_PASSWORD" == "" ]; then
echo -e "${YELLOW}DB Root password is missing. skipping${NC}"
cleanup
exit 0;
fi

echo -e "${CYAN}Provisioning Root User${NC}"

# Make sure that NOBODY can access the server without a password
mysql -e "UPDATE mysql.user SET Password = PASSWORD('${DB_ROOT_PASSWORD}') WHERE User = 'root'"
# Kill the anonymous users (IF EXISTS avoids an error when they are already gone)
mysql -e "DROP USER IF EXISTS ''@'localhost'"
# Because our hostname varies we'll use some Bash magic here.
mysql -e "DROP USER IF EXISTS ''@'$(hostname)'"
# Kill off the demo database
mysql -e "DROP DATABASE IF EXISTS test"
# Make our changes take effect
mysql -e "FLUSH PRIVILEGES"


```

My question: should I also create the database via Terraform, or use Ansible for that? My concern is that, because Terraform encourages immutable infrastructure, if I need to change the DB user password I will also lose the DB data.

So do you recommend using Ansible instead?
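
One way to picture the concern is that the database and user can be managed by idempotent SQL that is safe to re-run, separately from the instance lifecycle. The sketch below is an assumption-laden illustration, not a recommendation from the post: `render_db_sql`, `demo_app`, and `demo_user` are hypothetical names, and the statements assume a MariaDB version that supports `CREATE USER IF NOT EXISTS` and `ALTER USER`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render idempotent SQL for a database and an app user.
# Re-running it with a new password only changes the password; existing
# data in the database is left alone.
# Note: values are interpolated as-is, so they must not contain quotes.

render_db_sql() {
  local db="$1" user="$2" pass="$3"
  cat <<SQL
CREATE DATABASE IF NOT EXISTS \`${db}\`;
CREATE USER IF NOT EXISTS '${user}'@'localhost' IDENTIFIED BY '${pass}';
ALTER USER '${user}'@'localhost' IDENTIFIED BY '${pass}';
GRANT ALL PRIVILEGES ON \`${db}\`.* TO '${user}'@'localhost';
SQL
}

# Preview the statements. On the server they could be piped into the client:
#   render_db_sql demo_app demo_user 'new-secret' | mysql
render_db_sql demo_app demo_user 'example-password'
```

Whether these statements are run by Ansible, by the provisioning script, or by hand, the point of the sketch is that a password change does not require recreating the instance.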

https://redd.it/1d8kd2f
@r_devops
Debug GitHub Actions with the help of an LLM-powered pull request bot

[I built this during a recent hackday](https://github.com/marketplace/treebeard-build)...here's the background:


I maintain a popular [pytest plugin](https://github.com/treebeardtech/nbmake) and throughout its life have supported and observed many developers struggling with GitHub Actions.

* It's hard to identify what caused a failure given the length of some CI logs
* Multiple CI jobs can fail with the same cause, which makes the results noisy
* It's unclear how to prioritise fixes to these failures

This GitHub app gives you a prioritised, de-duplicated list of issues relating to your GitHub Actions failure.

It uses LLMs (GPT-3.5 at the moment) to identify the most likely root cause, highlight relevant source files, and order the issues by priority.
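
To make "de-duplicated and prioritised" concrete, here is a plain-text approximation in shell. It is not how the app works (the app uses an LLM); it simply groups identical failure messages from several jobs and ranks them by how often they occur. The `job|message` input format and the `dedupe_failures` name are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Not the app's method: a plain-text stand-in that groups identical failure
# messages across CI jobs and ranks them by frequency.
# Assumed input format, one line per failure: <job-name>|<message>

dedupe_failures() {
  # Drop the job name, count identical messages, sort by count descending,
  # then strip the leading padding that uniq -c adds.
  cut -d'|' -f2- | sort | uniq -c | sort -rn | sed -E 's/^ +//'
}

printf '%s\n' \
  "unit-tests-py310|ModuleNotFoundError: No module named 'yaml'" \
  "unit-tests-py311|ModuleNotFoundError: No module named 'yaml'" \
  "lint|E501 line too long" \
  | dedupe_failures
```

The two test jobs that failed for the same reason collapse into one entry with a count, which is the noise reduction the second bullet above describes.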

Feedback welcome!



https://redd.it/1d8kbto
@r_devops
Is monitoring Kafka hard for you? Looking for feedback on some features for better monitoring and troubleshooting Kafka

Having worked in the observability and monitoring space for the last few years, we have heard multiple folks complain about the lack of detailed monitoring for messaging queues, and Kafka in particular. Especially with the arrival of instrumentation standards like OpenTelemetry, we thought there must be a better way to solve this.

We dug deeper into the problem to understand what could be done to make understanding and remediating issues in messaging systems much easier.

We would love to know whether these problem statements resonate with the community here, and would welcome any feedback on how this could be more useful to you. We have also shared some wireframes of proposed solutions, but those are just there to make our current thinking more concrete. We would love any feedback on which flows and starting points would be most useful to you.

One of the key things we want to leverage is distributed tracing. Most current monitoring solutions for Kafka show metrics about Kafka, but metrics are often aggregated and don't give much detail on where exactly things are going wrong. Traces, on the other hand, show you the exact path a message has taken and provide a lot more detail. One of our focuses is how we can use information from traces to help solve issues much faster.

Please have a look at the detailed blog post we have written on some of the problems and proposed solutions: https://signoz.io/blog/kafka-monitoring-opentelemetry/

We would love any feedback on it:

1. Which of these problems resonate with you?
2. Do the proposed solutions/wireframes make sense? What can be done better?
3. Is there anything we missed that might be important to consider?

https://redd.it/1d8pue6
@r_devops
What factors are important when wanting to downscale datadog-agents?

I'm working on reducing the number of datadog-agents running on our hosts. Many of the hosts I have looked at have below 5% CPU utilization, low disk usage, and not much log ingestion. However, the downside of removing the agent is that you have no visibility other than accessing the instance itself or using cloud provider metrics. I wonder whether those three points are valid reasons to remove visibility on hosts. I could not find anything on the internet that answers something similar.

https://redd.it/1d8pb7d
@r_devops
Is anybody using dagger.io in production?

Hello,

I recently discovered https://dagger.io/ and I love the concept. After playing around with some scripts and examples, I have a semi-working pipeline I'm starting to get happy with.

I am interested in using this in production (or at least in some staging environments for now), but I have concerns, mostly: a) the documentation is terrible; it's constantly out of date and suggests different approaches, and b) the API is constantly changing and there is no clear approach to solving problems. The best source of documentation has been the GitHub issues list, which worries me.

That being said, it seems like a fantastic tool to prevent our current YML soup and promote code reuse. Once I made some initial headway, it has been refreshing.

Is anybody using it and having the same difficulties? Does anybody know of similar tooling for producing build images? Interested to hear people's thoughts.

https://redd.it/1d8sqk6
@r_devops
What makes a monitoring tool production-ready?

There are numerous tools available for monitoring, ranging from open-source and freeware to premium options. Based on your experience, what makes a tool suitable for production use? What criteria and functionality do you consider when selecting a tool or stack for your environments?

In line with this discussion, here is my go-to for discovering new open-source tools:

https://ossinsight.io/collections/monitoring-tool/

https://redd.it/1d8uul5
@r_devops
Need Advice

After I graduated, I got my first career job two years ago as an integration dev. I want to start learning new tech so I can find a DevOps job starting next year. I'm looking at KodeKloud; I tried their DevOps course on Udemy and it's 90% of the things I do at work. Will it be hard to find a DevOps job with 3 years of experience, given that I only do light work at my job while the people with 20+ years of experience do the hard work? I want to learn more, but everyone at my job is busy and there is no salary increase for three years. Are there any courses or subjects I could learn that would help me break into DevOps?
I have a BS in CS and have worked at IBM for two years now as an integration dev, but I get barely any work as I had no experience at all.


https://redd.it/1d8wfoe
@r_devops
Web app deployment not pulling forked repository code

### Scenario
I'm attempting to deploy a sample .NET Hello World web app to Azure, referencing a forked GitHub repository. The deployment process indicates that it is referencing my forked repository, but the GitHub Actions workflow does not trigger, and the web app is not deployed from the forked repo. I've tinkered with various settings in the Terraform config, to no avail.

### Issue
Despite configuring the repository and providing the GitHub authentication token, the GitHub Actions workflow is not triggered, and the web app is not deployed from the forked repository. Changing the manual_integration variable of the source control slot did not help either.

### Question
Is there an additional authentication step or configuration that I'm missing between GitHub and the Azure portal to ensure the deployment triggers and pulls the code from my forked repository?

I am being patient post-deployment and waiting 10 minutes for the site to show. And it does work when deployed manually in the Azure portal.

Any help would be greatly appreciated! Thank you!

## Terraform config:

### webapp.tf

    resource "azurerm_service_plan" "srv_plan" {
      name                = "${local.prefix}service-plan"
      location            = azurerm_resource_group.rg.location
      resource_group_name = azurerm_resource_group.rg.name
      sku_name            = var.webapp_sku
      os_type             = var.webappos
      tags                = local.common_tags
    }

    resource "azurerm_windows_web_app" "dot_net_web_app" {
      name                                           = "${local.prefix}dotnet-app"
      location                                       = azurerm_resource_group.rg.location
      resource_group_name                            = azurerm_resource_group.rg.name
      service_plan_id                                = azurerm_service_plan.srv_plan.id
      https_only                                     = true
      public_network_access_enabled                  = false
      enabled                                        = true
      webdeploy_publish_basic_authentication_enabled = true

      site_config {
        always_on           = true
        minimum_tls_version = var.tls_version
        application_stack {
          current_stack  = var.stack
          dotnet_version = var.stack_version
        }
      }
    }

    resource "azurerm_source_control_token" "source_token" {
      type         = "GitHub"
      token        = var.github_auth_token
      token_secret = var.github_auth_token
    }

    resource "azurerm_windows_web_app_slot" "slot" {
      name           = "${local.prefix}app-slot"
      app_service_id = azurerm_windows_web_app.dot_net_web_app.id
      site_config {}
    }

    resource "azurerm_app_service_source_control_slot" "git_source" {
      slot_id                = azurerm_windows_web_app_slot.slot.id
      repo_url               = var.webapp_repo_url
      branch                 = var.webapp_repo_branch
      use_mercurial          = false
      use_manual_integration = false
      depends_on             = [azurerm_source_control_token.source_token]
      github_action_configuration {
        generate_workflow_file = false
      }
    }


### variables.tf

    variable "webapp_sku" {
      type    = string
      default = "P1v2"
    }

    variable "webappos" {
      type    = string
      default = "Windows"
    }

    variable "tls_version" {
      type    = string
      default = "1.2"
    }

    variable "stack" {
      type    = string
      default = "dotnet"
    }

    variable "stack_version" {
      type    = string
      default = "v7.0"
    }

    variable "subresource" {
      type    = list(string)
      default = ["sites"]
    }

    variable "webapp_repo_url" {
      type    = string
      default = "https://github.com/ZimCanIT/hello-world-webapp"
    }

    variable "webapp_repo_branch" {
      type    = string
      default = "main"
    }

    variable "github_auth_token" {
      type        = string
      sensitive   = true
      description = "Token for authorization"
    }


https://redd.it/1d8wack
@r_devops
What does your company's logging platform look like?

Would love to know the following:

- Technology, e.g. ELK, Splunk, or any other self-managed OSS?
- Data size
- UX: SQL or free-form text
- Logs: structured, unstructured, or a mix

https://redd.it/1d8zyzu
@r_devops
Devops Resume Review

I've been in DevOps for a little while and I'm looking for another DevOps position. I'm striking out on my hunt and I think it's because of my resume.

Please either

- roast me
- tell me how to make my resume better

Thanks!

https://imgur.com/UNMF3Ar

https://redd.it/1d8yp6m
@r_devops
There are no good options for hosting medium-sized applications

I am a software engineer and have been actively developing and maintaining small to medium web applications for the last few years, for example small CRMs, e-shops with thousands of users, etc.

There are great hosting options for small and large applications but no good options for medium-sized ones.

For example, for free, a small-size website/web app can be hosted by a static website hosting provider such as Netlify/Vercel/Cloudflare pages or even by a PaaS such as Render or Digital Ocean Apps.

Large-size applications are complex, and there are plenty of options, such as Kubernetes.

Medium-size applications do not need the complexity of the large ones. They usually have one server for the web application, one queue system such as Celery or Laravel Horizon, one service for scheduling cron jobs, one cache service such as Redis, and one relational database.

If you host this in AWS using managed services such as AWS RDS, ElastiCache, Fargate, etc., you might end up paying thousands of dollars for something that could be hosted for 50 dollars per month on a simple VPS at Hetzner. However, if you set everything up manually, it is too much trouble to maintain many such applications without a dedicated full-time sysadmin/DevOps engineer.

What are you using in these cases?

https://redd.it/1d938ub
@r_devops
How do I successfully start this open source project in GitHub Codespaces?

Here is a repo: https://github.com/pschlan/cron-job.org

Using a GitHub Codespaces instance with 16 GB of RAM, I'm trying to figure out how to get the site working with Docker Compose; some instructions are in the README. But the docker-compose.yml seems to be outdated.

I want to be able to access the local frontend, create an account, open the confirmation email, and then log in using my credentials. I also want to be able to create a cron job and view it in the dashboard.

I will have to set up an SMTP server in docker-compose.yml for the site to work. I tried using a mailserver container, but it can't even send mail because the api container doesn't work.

I tried all this myself but got errors with MySQL and the SMTP server not starting. Here is my troubleshooting so far: https://github.com/pschlan/cron-job.org/issues/242

Appreciate any help, thank you

https://redd.it/1d953ij
@r_devops
Oracle Cloud weird issue



I have 3 MongoDB VMs.

I have an NSG that allows both ingress and egress for the CIDR that all 3 VMs are part of.

If I allow all ports on source and destination in these NSG rules, everything is fine.

But when I allow only the MongoDB ports, the communication stops:

https://www.mongodb.com/docs/manual/reference/default-mongodb-port/

Even netcat cannot reach another VM on the mongo ports when only the mongo ports are allowed on the NSG.

I have tried this with both ingress and egress NSG rules, with the same result.

I also have a security list with mongo port 27017 for the MongoDB VMs' CIDR block. I have not touched that yet.
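
The netcat check described above can be repeated for each default port with a small script. This is a diagnostic sketch only: `mongo_ports`, `check_port`, and `probe_mongo` are made-up helper names, the address in the usage comment is a placeholder, and it relies on bash's built-in `/dev/tcp` plus the coreutils `timeout` command.

```shell
#!/usr/bin/env bash
# Probe the default MongoDB ports on a peer VM using bash's /dev/tcp.

mongo_ports() {
  # Defaults listed on the MongoDB manual page linked in the post.
  echo "27017 27018 27019"
}

check_port() {
  # Succeeds if a TCP connection to host $1, port $2 opens within 3 seconds.
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

probe_mongo() {
  # Report the state of every default MongoDB port on one host.
  local host="$1" port
  for port in $(mongo_ports); do
    if check_port "$host" "$port"; then
      echo "${host}:${port} open"
    else
      echo "${host}:${port} blocked or closed"
    fi
  done
}

# Example (10.0.0.12 is a placeholder address, not from the post):
#   probe_mongo 10.0.0.12
```

One thing possibly worth checking while probing: client connections normally leave from an ephemeral source port, so a rule that restricts the source port range to the MongoDB ports, and not only the destination port range, would block them.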

https://redd.it/1d96oxo
@r_devops
Subgroup of team selected for secret 6-12 months planning

Found out about it yesterday by mistake, same day as the meeting. 4 out of 10 in the team, and it's obviously been kept under wraps. We recently had an incident in which the incredibly incompetent scrum master screamed at some of our team members, yet for some reason both the PO and the PL not only keep the wanker on, but he's also in this meeting.


Meanwhile, I am known within the organization as the problem solver and for working hard, and I sincerely believe that the entire team has input well worth taking into consideration.


Literally the only thing they would have had to do, with minimum effort, was say "we're doing this planning meeting, please put forth your opinions, suggestions and recommendations". Instead, more than half of the people involved in this meeting are either bad or completely incompetent at their jobs.


I am honestly furious about this - I've worked very hard for the last ten months to get the platform into a usable state. I have a 1on1 with the PL tomorrow, and I'm feeling like I'm about to do a rage quit.


Would really appreciate some input on how to handle this - I really don't like to feel angry.

https://redd.it/1d9f0l7
@r_devops
PagerDuty

Hi guys, recently my manager tasked me with learning PagerDuty, as I have to implement it in our project. Are there any starting points you can recommend? Any sites or materials? Is it worth doing a certification for it?

https://redd.it/1d9g2vz
@r_devops
Help with my resume: is it so bad that I can't even get an interview for an intern position?

Basically, I am trying to get an entry-level DevOps position. I've been applying to every DevOps internship position, but I still haven't heard back from a single one of them. I do have experience with IT helpdesk; would that help in getting a DevOps role?

Link to my resume: https://imgur.com/xCD8Pe8

https://redd.it/1d9f5xc
@r_devops
Stormforge.io AI for automating kubernetes resource requests

Has anyone tried this out? I came across an ad for it on Reddit, and it seems like it would solve a lot of the issues I'm dealing with in trying to administer a centralised runner solution for thousands of end users, where I'm not allowed to go too hard on limits because that would impact developers too much, and so on.
I'd love to hear if anyone has experience with this product and what their thoughts are.
It kind of seems a bit too good to be true at the moment, but I haven't investigated the pricing models yet.
https://www.stormforge.io/

https://redd.it/1d9i7ei
@r_devops