Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Advice regarding job orchestrator (HPC)

We are currently putting together an HPC cluster (up to 10 machines).
For the time being, every customer gets a local account so they can launch their jobs.

Obviously, this has a number of drawbacks:
* quite often, customers forget to use tmux / screen, so their jobs get killed once they disconnect
* every account has to be provisioned on each machine
* resources are not used optimally
* ...
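
To make the first drawback concrete: a job started from an SSH login stays attached to that login's terminal and is usually killed when the session ends. Below is a minimal sketch (POSIX only) of launching a job in its own session so that a disconnect does not take it down. The function name `launch_detached` and the log path are made up for illustration; a real scheduler such as SLURM or Nomad does this and much more.

```python
import subprocess


def launch_detached(command, log_path):
    """Start `command` in its own session so a dropped SSH login does not kill it."""
    log_file = open(log_path, "ab")
    try:
        process = subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,   # no terminal input to lose
            stdout=log_file,            # output goes to a file instead of the tty
            stderr=subprocess.STDOUT,
            start_new_session=True,     # setsid(): the job gets no controlling terminal
        )
    finally:
        log_file.close()  # the child keeps its own copy of the descriptor
    return process
```

For example, `launch_detached(["python3", "train.py"], "train.log")` would return immediately while the (hypothetical) `train.py` keeps running after logout. It does nothing for the other drawbacks: accounts and resource sharing still need an orchestrator.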


To manage the infrastructure *better*, we are looking for an orchestration solution.

Other departments are using SLURM but we find the administration quite *painful*.

I came across Nomad which promises to be lighter and easier.

Can anyone share their experience with Nomad or that type of tool?

https://redd.it/g8wh9j
@r_devops
UI to run Ansible playbook

I am using AWX now and it has a lot of bugs. Semaphore looks good, but it looks like there have been no releases in the past two years. Can anyone suggest other good alternatives?

https://redd.it/g8t4wl
@r_devops
macOS Catalina apps

Hi there! For those using MacBooks, I'm curious whether you're using zsh or bash, and which terminal, IDE, text editor, and other apps you use.
Please let me know so I can take a look. Thanks!!

https://redd.it/g8srpy
@r_devops
Monitoring & logging - how to navigate current situation with 143 different tools?

I may sound like an old grouch, but coming back after a couple of years away from the devops/sysadmin landscape and researching monitoring solutions gave me a headache :) Back in the day there were just one or two solutions (none of them as advanced as the current ones, of course), which made things easy to navigate.

---

Is there a simple rundown of the tools? Below is what I have managed to wrap my mind around from Googling. I would like to write a simple wiki entry, but I'm lost myself :)

## All-in-one
- `nagios`: still in business, and still pretty much simple offline/online alerting & monitoring
- `observium`: a one-stop solution for smaller installations, with little customization
- `Cacti`: an old friend with mostly SNMP-centered support
- `Zabbix`: all-in-one solution for monitoring & graphing; supports both passive checks (the server polls the agent) and active checks (the agent pushes)
- `GrayLog`: complete solution for log collection & analysis; uses `ElasticSearch` as a data store (+ `MongoDB` for app config). Can collect text & structured app data.
- `DataDog`: cloud-based, closed-source solution for complete monitoring, with no 2FA (?!) and a crazy slow UI. It's based on agents and a push model.

---

## Data Visualization
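- `Grafana`: Swiss-army knife for visualizing any data; keeps its own configuration (dashboards, users) in SQLite, MySQL, or PostgreSQL, while the metrics themselves stay in the data sources
- `Kibana`: similar to `Grafana`, but from the `ElasticSearch` authors and built around `ElasticSearch` data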

---

## Data Storage/Collection

### Metrics
- `InfluxDB`: (just?) a database for storing time series data, implementing a push model; usually used with `Grafana`. Persists data to disk.
- `Prometheus`: like `InfluxDB`, but uses a pull model (push is possible through the separate Pushgateway). Includes alerting too. Usually used with `Grafana`; replaces `InfluxDB` + `Telegraf` + `Kapacitor`. Offers data persistence.
- `Graphite`: a metrics aggregation tool which describes itself as "Kick ass. Chew bubblegum." (ooh... ok.).
- `Telegraf`: an agent which is able to pull information from a local system and push it to almost any database (and since it's from the `InfluxDB` authors, presumably `InfluxDB` is recommended)
- `Kapacitor`: alerting for environments without `Prometheus` but with `InfluxDB`
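
The push/pull distinction in this list is easier to see next to the wire formats. A minimal sketch, assuming simple names and values with no characters that need escaping: with a pull model the application exposes lines in the Prometheus text format at an endpoint the server scrapes, while with a push model an agent such as `Telegraf` sends InfluxDB line protocol to the database. The helper names and the sample metric are made up for illustration.

```python
def prometheus_line(name, labels, value):
    """One sample in the Prometheus text exposition format (pull: scraped from the app)."""
    label_text = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    return f"{name}{{{label_text}}} {value}"


def influx_line(measurement, tags, fields, timestamp_ns):
    """One point in InfluxDB line protocol (push: the agent sends it to the database)."""
    tag_text = ",".join(f"{key}={val}" for key, val in sorted(tags.items()))
    field_text = ",".join(f"{key}={val}" for key, val in sorted(fields.items()))
    head = f"{measurement},{tag_text}" if tags else measurement
    return f"{head} {field_text} {timestamp_ns}"


print(prometheus_line("cpu_usage", {"host": "web01"}, 0.42))
print(influx_line("cpu", {"host": "web01"}, {"usage": 0.42}, 1588195200000000000))
```

The data is the same in both cases; what differs is who initiates the transfer. The Prometheus line carries no timestamp here because the scraper normally assigns one at collection time.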

### Logs
- `Loki`: like `Prometheus`, but deals with logs. Usually used with `Grafana`
- `Logstash`: a log collection and processing pipeline from the `ElasticSearch` authors; usually feeds `ElasticSearch`, with `Kibana` on top


### All-in-one
- `ElasticSearch`: a tool I know from the dev side of things; a datastore centered around searching. For some reason it can be plugged into `Grafana` as a data source
- `OpenTSDB`: like `InfluxDB` but for really, really, REALLY large data sets. Can be plugged into `Grafana` as a data source



OK... I'm officially lost. I have probably missed some crucial tools here. So far, however, it looks like there are essentially two major players in visualization and a ton of tools to send them data, and whichever tool you choose, you will be happy.

https://redd.it/g8qnhh
@r_devops
AWS First Timer: Any suggestions on how to get AWS certifications?

I am a first timer with zero knowledge of or experience with AWS. There are a ton of online resources, but I am not sure which one would be the right one to help me prepare for the AWS certification exams. I heard the courses on the official AWS training websites are not that great. However, I found that the AWS Educate website has solid material, though I'm not sure whether it is relevant to the AWS certifications. It would be great if any of you experienced folks out there could share your success stories.

https://redd.it/g8baad
@r_devops
7 DevOps Books

I put together these 7 DevOps books that I read recently, and I highly recommend them to anyone who is into DevOps transformation.

Some of them, like The Goal and Toyota Kata, are not even about technology, but are process-oriented and focused on continuous improvement.

Of course, the list has the classics from Gene Kim: The DevOps Handbook, The Phoenix Project, and The Unicorn Project.

Now I would like to hear your opinion on these books and what real outcomes you've taken from them.

And just as important, what recommendations do you have that could be part of this list?

[https://medium.com/devops-cloud-it-career/7-must-read-devops-books-f7b6e9f30f6e](https://medium.com/devops-cloud-it-career/7-must-read-devops-books-f7b6e9f30f6e)

https://redd.it/gallif
@r_devops
Debugging builds on a remote server

Currently my workflow is to hit build in Visual Studio, copy the files to the server via WinSCP, and then run them through Mono (it's a .NET Framework app) on the remote server (I can't run the app locally). This makes debugging a pain. Is there some way I can hit run and step through the code as it runs on the server? Or should I just create a replica of the infrastructure using local VMs?

https://redd.it/gam6d1
@r_devops
Which cloud provider do you choose and why?

Is there any way to say which cloud provider is the best, or does it depend on the company's vision and other factors?

How do you usually evaluate cloud providers?

https://redd.it/gacm66
@r_devops
Database for large amount of data without retention

Hello,

I have an app which I would like to monitor (usage statistics mostly). We are talking about roughly 100 metrics every 30 to 60 seconds from each of about 100 servers. The number of servers will grow in the future, although not at a high rate. I don't expect the number of metrics to grow.

Right now we are using Cacti (rrd files...) but it has become a PITA to maintain, so I'm thinking of moving this to a more modern approach that supports Grafana. I am considering Prometheus or InfluxDB/Telegraf for this. The thing is that we must keep all the data at all times (no retention limit at all) since it is vital to the way our company operates.
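
For sizing, the numbers above can be turned into a rough estimate. This is a back-of-the-envelope sketch only: it assumes 100 metrics per server, and the on-disk size per sample is a free parameter, because it depends heavily on the database and its compression.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(servers, metrics_per_server, interval_seconds):
    """Number of data points written per day at a fixed collection interval."""
    return servers * metrics_per_server * (SECONDS_PER_DAY // interval_seconds)


def gib_per_year(samples_each_day, bytes_per_sample):
    """Raw yearly volume in GiB for an ASSUMED on-disk size per sample."""
    return samples_each_day * 365 * bytes_per_sample / 2**30


fast = samples_per_day(servers=100, metrics_per_server=100, interval_seconds=30)
slow = samples_per_day(servers=100, metrics_per_server=100, interval_seconds=60)
print(fast, slow)  # 28800000 14400000
```

At the 30-second interval that is about 10.5 billion samples per year, and with no expiry the total only grows. Calling `gib_per_year(fast, bytes_per_sample)` with a figure measured on a test installation gives a more honest disk estimate than any vendor number.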

Any opinions on this? Was anyone ever tasked with something like this?

Thanks in advance

https://redd.it/gagl5p
@r_devops
What is the best practice to organize kubernetes tools (ELK, zipkin,..) by namespaces?

I have some tools running in my Kubernetes cluster (ELK, Zipkin, ...) and I want to know which namespace to place them in. For example, I have fluentd running as a DaemonSet in the kube-system namespace. Should I place Elasticsearch in the same namespace, or put them together in a custom namespace so they can reach each other? I just want to know what the best practice is.
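
On the "so they can reach each other" part: by default, namespaces do not isolate network traffic (NetworkPolicies can change that), so a pod can reach a Service in another namespace through its cluster DNS name. A small sketch of how that name is formed; the Service name `elasticsearch`, the namespace `logging`, and the default cluster domain `cluster.local` are assumptions for illustration.

```python
def service_dns_name(service, namespace, cluster_domain="cluster.local"):
    """Build the in-cluster DNS name of a Service in any namespace."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


# fluentd in kube-system could point at Elasticsearch in a separate namespace:
print(service_dns_name("elasticsearch", "logging"))
```

So reachability does not force everything into one namespace; the choice can be made on ownership, access control, and quotas instead.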

https://redd.it/gaea16
@r_devops
Getting '502' Errors on creating a simple load balancer through NGINX

I'm creating a simple load balancer with NGINX in front of two Azure VMs and getting '502 Bad Gateway' errors. I'm probably making a simple mistake but can't figure it out. My Azure firewall allows TCP connections on port 80, and I also removed the symbolic link. My nginx.conf file is as follows:

    http {
        upstream backend {
            server 13.xxx.xxx.xxx;
            server 52.xxx.xxx.xxx;
        }

        # This server accepts all traffic to port 80 and passes it to the upstream.
        server {
            listen 80;

            location / {
                proxy_pass https://backend;
            }

            access_log /var/log/nginx/access.log;
            error_log /var/log/nginx/error.log;
        }
    }

I can run a simple web page and access it, and it gives the default NGINX splash screen, but the load balancing doesn't work. My error.log shows these messages:

    2020/04/29 21:33:44 [alert] 20746#20746: *1 open socket #3 left in connection 2
    2020/04/29 21:33:44 [alert] 20746#20746: *2 open socket #10 left in connection 3
    2020/04/29 21:33:44 [alert] 20746#20746: aborting
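
One thing that may be worth checking (a guess, not something the log excerpt confirms): the upstream `server` entries carry no port, so nginx uses port 80 for them, while `proxy_pass` uses the `https://` scheme, so nginx tries to speak TLS to that port. If the backends only serve plain HTTP there, the handshake fails, which typically surfaces as a 502. The sketch below probes a backend over both schemes from the load balancer host; the function name `probe` is made up for illustration.

```python
import http.client
import ssl


def probe(host, port, scheme, timeout=2.0):
    """Return True if `host:port` answers an HTTP request over the given scheme."""
    if scheme == "https":
        context = ssl.create_default_context()
        context.check_hostname = False       # backends are addressed by IP here
        context.verify_mode = ssl.CERT_NONE  # only testing whether TLS is spoken at all
        connection = http.client.HTTPSConnection(host, port, timeout=timeout, context=context)
    else:
        connection = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        connection.request("GET", "/")
        connection.getresponse()
        return True
    except (OSError, http.client.HTTPException):
        return False
    finally:
        connection.close()
```

If `probe(backend_ip, 80, "http")` succeeds and `probe(backend_ip, 80, "https")` fails for the two backends, then the scheme in `proxy_pass` and the port in the `upstream` block do not match what the backends actually serve.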

https://redd.it/gaial1
@r_devops
Low ESX to VM density

I encountered this statement recently.

What does this mean?

> low ESX to VM density (1:9)

ESX is a hypervisor. VM is the virtual machine. Is this saying that the number of hypervisors should be higher relative to the number of VMs? Is the suggestion that there should be more hypervisors? More hypervisors, I'm assuming, give more granular control over each individual VM?
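
Read literally, a 1:9 ratio is one ESX host for every nine VMs, and "low density" would then mean that each host carries few VMs, not that more hypervisors are needed. A small worked example under that reading; the fleet size of 90 VMs and the denser 1:30 ratio are made-up numbers for comparison.

```python
import math


def hosts_needed(vm_count, vms_per_host):
    """Number of hypervisor hosts required to run `vm_count` VMs at a given density."""
    return math.ceil(vm_count / vms_per_host)


print(hosts_needed(90, 9))   # 10 hosts at a 1:9 density
print(hosts_needed(90, 30))  # 3 hosts at a hypothetical 1:30 density
```

Under this interpretation the statement points toward consolidation: the same VMs could run on fewer hosts if the hardware has the headroom.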

https://redd.it/gagpma
@r_devops
Sidecar container pattern is the best we can do?

Am I the only one who thinks sidecar container patterns bring the worst of system administration into the containerized world? For example, inter-process communication (IPC) through the file system, or running crontabs in pods as sidecars: https://www.reddit.com/r/kubernetes/comments/g7sw1o/running_crontabs_in_pods_as_sidecars/

https://redd.it/gatpkg
@r_devops
Preseed file for Ubuntu 18.04 / 20.04 hanging on formatting disk screen

Hello, I'm trying to create a working preseed file to automate the installation of an Ubuntu VM (version 18.04 or 20.04) via Packer.

packer build -var-file variables.json ubuntu_buildtemplate.json

The problem is that it does start the setup and selects the different options, but:

* On Ubuntu 18.04: on the "Filesystem Setup" screen, it loops on "No" in the pop-up window "Confirm destructive action: selecting continue below will continue the installation process and results in the loss of data on the disks selected to be formatted."
* On Ubuntu 20.04: it gets stuck on the "Guided Storage Configuration" screen. "Use entire disk" is checked, but no confirmation pop-up appears.

Here is the JSON file:

{
"builders": [
{
"type": "vsphere-iso",

"vcenter_server": "{{user `vsphere_server`}}",
"username": "{{user `vsphere_username`}}",
"password": "{{user `vsphere_password`}}",

"insecure_connection": "true",

"vm_name": "T-ubuntu",
"datastore": "{{user `vsphere_datastore`}}",
"folder": "{{user `vsphere_folder`}}",
"host": "{{user `vsphere_host`}}",
"convert_to_template": "true",
"network": "{{user `vsphere_network`}}",
"boot_order": "disk,cdrom",

"guest_os_type": "ubuntu64Guest",

"CPUs": 1,
"RAM": 2048,
"RAM_reserve_all": false,

"disk_controller_type": "pvscsi",
"disk_size": 80000,
"disk_thin_provisioned": false,

"network_card": "vmxnet3",


"ssh_username": "{{user `ssh_username`}}",
"ssh_password": "{{user `ssh_password`}}",

"iso_paths": [
"[datastore1] ISO/Linux/ubuntu-18.04.4-live-server-amd64.iso"
],

"boot_command": [
" d-i auto-install/enable boolean true<wait>",
" d-i debconf/priority select critical<wait>",

" d-i debian-installer/locale string en_US.UTF-8<wait>",
" d-i localechooser/supported-locales multiselect en_US.UTF-8<wait>",
" d-i console-setup/ask_detect boolean false<wait>",
" d-i keyboard-configuration/xkb-keymap select GB<wait>",

" d-i /choose_interface select auto<wait>",
" d-i netcfg/get_hostname string unassigned-hostname<wait>",
" d-i netcfg/get_domain string unassigned-domain<wait>",
" d-i hw-detect/load_firmware boolean true<wait>",


" d-i mirror/country string manual<wait>",
" d-i mirror/http/hostname string archive.ubuntu.com<wait>",
" d-i mirror/http/directory string /ubuntu<wait>",
" d-i mirror/http/proxy string<wait>",


" d-i passwd/root-login boolean true<wait>",
" d-i passwd/root-password-crypted password !!<wait>",
" d-i passwd/make-user boolean false<wait>",


" d-i clock-setup/utc boolean true<wait>",
" d-i time/zone string UTC<wait>",
" d-i clock-setup/ntp boolean true<wait>",
" d-i clock-setup/ntp-server string ntp.ubuntu.com<wait>",

" d-i preseed/early_command string umount /media || true<wait>",

" d-i grub-installer/only_debian boolean true<wait>",
" d-i grub-installer/with_other_os boolean true<wait>",

" d-i partman-efi/non_efi_system boolean true<wait>",
" d-i partman-auto/disk string /dev/sda<wait>",
" d-i partman-auto/init_automatically_partition select biggest_free<wait>",
" d-i partman-auto/method string regular<wait>",
" d-i partman-auto/choose_recipe select atomic<wait>",

" d-i partman/confirm_write_new_label boolean true<wait>",
" d-i partman/choose_partition select finish<wait>",
" d-i partman/confirm boolean true<wait>",
" d-i partman/confirm_nooverwrite boolean true<wait>",
" d-i partman-auto/confirm boolean true<wait>",

" d-i base-installer/install-recommends boolean true<wait>",
" d-i base-installer/kernel/image string linux-generic<wait>",
" d-i debconf debconf/frontend select Noninteractive<wait>",

" d-i apt-setup/restricted boolean true<wait>",
" d-i apt-setup/universe boolean true<wait>",
" d-i apt-setup/backports boolean true<wait>",
" d-i apt-setup/use_mirror boolean false<wait>",
" d-i apt-setup/services-select multiselect security, updates<wait>",
" d-i apt-setup/security_host string security.ubuntu.com<wait>",
" d-i apt-setup/security_path string /ubuntu<wait>",


" d-i tasksel/first multiselect none<wait>",
" d-i pkgsel/include string openssh-server python<wait>",
" d-i pkgsel/upgrade select full-upgrade<wait>",
" d-i pkgsel/update-policy select unattended-upgrades<wait>",


" d-i pkgsel/include string openssh-server vim git tmux build-essential open-vm-tools telnet wget curl python<wait>",


" d-i debian-installer/splash boolean false<wait>",
" d-i cdrom-detect/eject boolean true<wait>",

" d-i finish-install/reboot_in_progress note<wait>",
" d-i debian-installer/exit/poweroff boolean true<wait>"
]
}
],


"provisioners": [
{
"type": "shell",
"inline": ["echo 'Template build complete'"]
}
]
}


Has anyone made a working preseed for Ubuntu?
What am I missing?


Thank you for your help

https://redd.it/gaglte
@r_devops
Offline server for automatic backups?

Disclaimer: this is not related to me; I am just wondering how to solve this problem correctly.

Recently there was a ransomware attack on two universities in my country, and their data was encrypted. Let's ignore how this happened; the most important part is that the backups were also encrypted and therefore useless.

This leads me to suspect that if the attack was carried out by gaining root access to the main server, then the main server also had direct access to the backup servers, which seems like a big security flaw unless there is a secure backup of the backup servers.

How would you create an automatic backup of a server in such a manner that even if the root account was compromised, the attacker would not be able to touch the backups stored on another (secured) server?

My first idea is to use some external tool that is able to perform operations on both machines (let's simplify it to SSH access), with the tool itself living in a private network accessible only from inside the building.
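
One common way to get that property is a pull model: the backup server logs in to the main server and copies the data, and the main server holds no credentials for the backup server at all. A minimal sketch that builds such an `rsync` command, meant to run on the backup host; the account name `backup`, the host name, and the directory layout with one snapshot per day are assumptions for illustration.

```python
import datetime


def pull_backup_command(source_host, source_path, backup_root, today=None):
    """Build an rsync command meant to run ON the backup server (pull model)."""
    today = today or datetime.date.today()
    previous = today - datetime.timedelta(days=1)
    return [
        "rsync",
        "-a",        # archive mode: keep permissions, times, symlinks
        "--delete",  # deletions only affect today's snapshot directory
        # Unchanged files are hard-linked to yesterday's snapshot to save space.
        f"--link-dest={backup_root}/{previous.isoformat()}",
        # Assumed: a read-only account named "backup" on the main server.
        f"backup@{source_host}:{source_path}/",
        f"{backup_root}/{today.isoformat()}/",
    ]


print(" ".join(pull_backup_command("main.example.org", "/srv/data", "/backups")))
```

Because each day lands in its own directory, files encrypted on the main server only show up in the newest snapshot and older ones stay readable. An attacker with root on the main server still cannot log in to the backup host, as long as no key or password for it is stored there.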

https://redd.it/gaaoyc
@r_devops
ci/cd pipeline for deploying once every few months?

I have always worked in companies that deploy at least once per week, but I recently moved to a company in a highly regulated environment. Because of that, our software needs to be validated before each deployment, and the whole thing is painful in general :D
I'm the only devops engineer there. Here is how the process looks:
1. A merge to dev builds new Docker images and deploys them to ECS.
2. A merge to test happens at a specific time. A new image is built and deployed. QA validates it before the move to uat.
3. A merge to uat. Again, images are built and deployed to the uat cluster.
4. A manual step in prod: pick a build number in Jenkins, which changes that tag in the ECR repository and redeploys the ECS services.
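
Step 4 can be sketched as a small "promote by retag" script, which keeps the production image byte-for-byte identical to the one QA validated. This is a sketch under assumptions, not the poster's Jenkins job: `ecr` and `ecs` are expected to be boto3 clients, and the repository, tag, cluster, and service names are placeholders.

```python
def promote_build(ecr, ecs, repository, build_tag, release_tag, cluster, services):
    """Point `release_tag` at the image already tested as `build_tag`, then redeploy."""
    response = ecr.batch_get_image(
        repositoryName=repository,
        imageIds=[{"imageTag": build_tag}],
    )
    images = response["images"]
    if not images:
        raise LookupError(f"no image tagged {build_tag!r} in {repository!r}")
    # Registering the same manifest under another tag moves the tag without a rebuild.
    ecr.put_image(
        repositoryName=repository,
        imageManifest=images[0]["imageManifest"],
        imageTag=release_tag,
    )
    for service in services:
        # Force a new deployment so the tasks pull the image now behind release_tag.
        ecs.update_service(cluster=cluster, service=service, forceNewDeployment=True)
```

With boto3 installed it could be called as `promote_build(boto3.client("ecr"), boto3.client("ecs"), "app", "build-42", "prod", "prod-cluster", ["web"])`. Keeping the promotion in a script like this also makes it independent of the CI server, which helps if Jenkins itself is the part that feels inflexible.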

However, I find that Jenkins is not very flexible with these infrequent deployments. I'm not even sure where to start improving this build process, as we only deploy to prod once every 2 months and every deployment is very painful, because usually not only the code but also the infrastructure changes.

https://redd.it/ga7x22
@r_devops
Are there any business classes that one would consider essential?

I am currently pursuing a BS in computer science.

Just as the title states, would certain business classes help me pursue a career as either a DevOps engineer or an SRE?

I understand that DevOps is a philosophy, so even if not business classes, are there any classes outside of the standard curriculum that would be beneficial?

https://redd.it/gaaoez
@r_devops
Do you prefer GitLab or Jenkins?

Student here, a total newbie on the topic. After trying to work in a team and running into merge hell, I've concluded that a structured pipeline with continuous integration is the way to go. After some research, both GitLab and Jenkins caught my eye. There are tons of features to compare between the two, but I could only understand so much (also, some of the posts were outdated, so they don't necessarily make fair comparisons since they didn't have newer features in mind). What caught my eye about Jenkins is the great customization through plugins and the full workspace control. What caught my eye about GitLab is that it doesn't always have to be self-hosted and is easier to set up and get going with. Which one do you prefer and why? Does one scale better than the other?

https://redd.it/gayl97
@r_devops
AZ-204 Exam Prep

Hey all,

I've been going over the AZ-204 prep pretty heavily. I noticed that Microsoft changed the exam guide for the AZ-204, which will be updated in May; you can find the guide [here](https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4oZ7B).

The one thing that stood out to me was that Microsoft is now specifying Azure PowerShell and AZ CLI. Before it was just something along the lines of "know an SDK".

Because of that, I put together a GitHub repository that is currently a work in progress with PowerShell/AZ CLI code for each module on the AZ-204. If you like it or would like to contribute to the idea, feel free to let me know! [https://github.com/AdminTurnedDevOps/AZ-204-Code](https://github.com/AdminTurnedDevOps/AZ-204-Code)

https://redd.it/gawevd
@r_devops