Reddit DevOps
I want to scale up/down as fast as AWS Lambda but also be able to allocate vCPUs (minimum 8) per task, any advice?

# Goal

So I'm looking for an AWS product or combination of products to accomplish the following:

* Ability to do CPU-heavy (non-parallel) calculations (min 8 vCPUs per node)
* Minimum timeout limit of 1800 seconds (AWS Lambda caps out at 900 seconds)
* Ability to automatically scale down to 0/1 and back up
* Ability to scale up really fast (<30 seconds)
* Event-driven execution model (one task per node; the node gets destroyed after the job is finished)
* Assign the desired vCPUs per event/task/job "type" (3 predefined types)

I'll try to be as detailed as possible in describing what the job is, the two setups I've tried, and what I don't like about them.

---

## Definitions

* ROE: Route Optimisation Engine
* VRP: Vehicle Routing Problem. A single JSON containing all the information regarding the stops, vehicles, time windows and start/end addresses, which needs to be optimised. VRPs can be classified into three complexities \[easy, medium, hard\] based on:
  * number of stops
  * restrictions per stop (time window, capacity)
  * number of vehicles
  * restrictions per vehicle (time window, maximum range, capacity, breaks)
* Easy VRPs need fewer CPU resources and can be solved more quickly than, for example, hard VRPs.
* Solution: the best solution found for the VRP (the most efficient routes for the VRP)

## Setup 1 - AWS Lambda

## Diagram

[https://preview.redd.it/iqabr1pt2tu41.png?width=743&format=png&auto=webp&s=fe05f8015a8d72ef71c016c9aa200023a390cf6c](https://preview.redd.it/iqabr1pt2tu41.png?width=743&format=png&auto=webp&s=fe05f8015a8d72ef71c016c9aa200023a390cf6c)

## AWS SQS

A standard AWS SQS queue for optimisation request messages. Every message that arrives triggers the ROE (AWS Lambda).

## AWS Lambda

The Lambda (3008 MB memory) is triggered by the AWS SQS queue for optimisation request messages and processes messages as they are added to the queue. The maximum timeout of any AWS Lambda function is 15 minutes.

## Problems

* AWS Lambda does not let us allocate more CPU resources (CPU scales only with memory), so optimising medium-complexity VRPs takes a long time.
* AWS Lambda's maximum timeout of 15 minutes makes using a Lambda to optimise high-complexity VRPs impossible.

## Setup 2 - AWS Batch

## Diagram

[https://preview.redd.it/twkxguw03tu41.png?width=808&format=png&auto=webp&s=34f951730078dc9fc25c345a3a5584ce8980992d](https://preview.redd.it/twkxguw03tu41.png?width=808&format=png&auto=webp&s=34f951730078dc9fc25c345a3a5584ce8980992d)

## Steps

* WA sends optimisation request to API
* API creates a VRP and stores the VRP in S3
* API evaluates the complexity of the VRP (low, medium or high)
* API creates an AWS Batch Job (with the parameter problemId=123456789)
* API allocates the correct number of vCPUs and memory (predefined) to that job based on the VRP complexity
* API adds the AWS Batch Job to the correct AWS Batch queue (the queue is determined by the VRP complexity)
* API returns a 200 OK response to the WA
* AWS Batch Job is taken from the AWS Batch Queue for execution
* AWS Batch Job fetches the VRP from S3 based on the problemId it received during step 4
* AWS Batch Job solves the VRP
* AWS Batch Job stores the solution in S3
* WA requests the solution from the API
* API gets the solution from S3 and returns it to WA
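Steps 4–6 above can be sketched in Python with boto3. The queue names, job definition name, and the complexity-to-resource presets below are illustrative assumptions, not values from the post:

```python
# Hypothetical per-complexity resource presets (illustrative values only).
PRESETS = {
    "low":    {"vcpus": 8,  "memory": 16384},
    "medium": {"vcpus": 16, "memory": 32768},
    "high":   {"vcpus": 32, "memory": 65536},
}

def build_submit_args(problem_id, complexity):
    """Build the kwargs for batch.submit_job() for a VRP of a given complexity."""
    preset = PRESETS[complexity]
    return {
        "jobName": f"roe-{problem_id}",
        "jobQueue": f"roe-{complexity}",        # one queue per complexity class
        "jobDefinition": "roe-job-definition",  # hypothetical definition name
        "parameters": {"problemId": str(problem_id)},
        "containerOverrides": {
            "vcpus": preset["vcpus"],
            "memory": preset["memory"],         # MiB
        },
    }

# Actually submitting requires boto3 and AWS credentials:
#   import boto3
#   boto3.client("batch").submit_job(**build_submit_args(123456789, "medium"))
```

Note that `containerOverrides` can only request what the compute environment can supply; the slow 0→1 scale-up complained about below comes from EC2 instance provisioning, not from the submission call itself.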

## Problems

* Scaling up takes unacceptably long
* Going from 0 to 1 node takes too long (300+ seconds)
* Going from 1 to X nodes takes too long (60–300+ seconds)

https://redd.it/g7gs31
@r_devops
Using AWS SDK from Fargate

I'm new to devops, new to Terraform, new to AWS and new to the cloud. So please be nice to me! :)

With Terraform I have provisioned an ECS environment running a Fargate container that executes a Go binary on AWS.

In my go app I am using the AWS SDK as a library to fetch values from SSM parameter store. The app fails because no credentials are provided to make the API call.

From what I read, it is the task execution role that determines the AWS credentials and access in this environment. So if the task execution role has permission to get parameters from SSM, my app should be able to use the SDK to do the same. I have added the corresponding policy to the role, but it does not work. And let me say, I did this under the strong supervision of random people on the internet, open-source code and Medium articles, so I do not fully understand what I was actually doing. So let me show you.

```
# ECS task execution role data
data "aws_iam_policy_document" "ecs_task_execution_role" {
  version = "2012-10-17"

  statement {
    sid     = ""
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

# ECS task execution role
resource "aws_iam_role" "ecs_task_execution_role" {
  name               = var.ecs_task_execution_role_name
  assume_role_policy = data.aws_iam_policy_document.ecs_task_execution_role.json
}

variable "iam_policy_arn" {
  default = [
    "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy",
    "arn:aws:iam::aws:policy/AmazonSSMReadOnlyAccess" # the relevant policy
  ]
}

# ECS task execution role policy attachment
resource "aws_iam_role_policy_attachment" "ecs_task_execution_role" {
  count      = length(var.iam_policy_arn)
  role       = aws_iam_role.ecs_task_execution_role.name
  policy_arn = var.iam_policy_arn[count.index]
}
```
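One distinction worth noting as a possible explanation (a sketch, not a confirmed diagnosis): in ECS, the *execution* role is what the ECS agent itself uses to pull the image and inject the `secrets` block, which is why the entrypoint test works, while SDK calls made *inside* the container use the separate *task* role. A hypothetical addition, reusing the assume-role policy document above:

```hcl
# Hypothetical task role, distinct from the execution role above.
resource "aws_iam_role" "ecs_task_role" {
  name               = "myapp-task-role"
  assume_role_policy = data.aws_iam_policy_document.ecs_task_execution_role.json
}

resource "aws_iam_role_policy_attachment" "ecs_task_role_ssm" {
  role       = aws_iam_role.ecs_task_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMReadOnlyAccess"
}

resource "aws_ecs_task_definition" "myapp" {
  # ... existing settings ...
  execution_role_arn = aws_iam_role.ecs_task_execution_role.arn
  task_role_arn      = aws_iam_role.ecs_task_role.arn # what the in-app SDK uses
}
```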

For a test case I was even fetching a value from SSM in my ECS task definition template and setting it as an environment variable. It looked like this:


```
"secrets": [
  {
    "name": "POSTGRES_PASSWORD",
    "valueFrom": "${database_password_arn}"
  }
]
```

Only to later print it out to the shell in my *docker-entrypoint.sh*, which it did with the correct value.

On the next line I execute the Go binary, which fails in the following code that tries to do the very same thing, but inside my application:


```
// imports used by this snippet
import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ssm"
)

sess, err := session.NewSessionWithOptions(session.Options{
	SharedConfigState: session.SharedConfigEnable,
})
if err != nil {
	panic(err)
}

ssmsvc := ssm.New(sess, aws.NewConfig().WithRegion("eu-central-1"))
keyname := "/myapp-secret/database/password/master"
withDecryption := true
param, err := ssmsvc.GetParameter(&ssm.GetParameterInput{
	Name:           &keyname,
	WithDecryption: &withDecryption,
})

if err != nil {
	panic(err) // no credentials provided
}
```

I am hoping for support 🙈 Thank you in advance. I'm sure there are also some misconceptions on my side, so don't hesitate to correct me.

https://redd.it/g7fsoz
@r_devops
Ubuntu 20.04 may be causing issues in docker builds with tzdata prompt

Today a routine docker build in our CI got stuck.

```
Setting up tzdata (2019c-3ubuntu1) ...
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
Configuring tzdata
------------------

Please select the geographic area in which you live. Subsequent configuration
questions will narrow this down by presenting a list of cities, representing
the time zones in which they are located.

1. Africa 4. Australia 7. Atlantic 10. Pacific 13. Etc
2. America 5. Arctic 8. Eur
Job Finished

```

The base image we used is ubuntu:latest. But Ubuntu released 20.04 Focal Fossa just yesterday, and one of the changes in it broke our builds.

Here are 2 options to work around it:

#1 Change to ubuntu:18.04

Just change your base image down to an older stable version.

```
-FROM ubuntu:latest
+FROM ubuntu:18.04
```

#2 DEBIAN_FRONTEND=noninteractive apt-get…

```
 RUN apt-get update && \
-    apt-get install -y \
+    DEBIAN_FRONTEND=noninteractive TZ=Asia/Singapore \
+    apt-get install -y \
```

I don’t go for ENV DEBIAN_FRONTEND=noninteractive because that’s not an env var I wanna set permanently for the container image.
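A third option worth mentioning (a common pattern, not from the original post): `ARG` variables only exist during the build, so the frontend setting never persists into the running container the way `ENV` would:

```dockerfile
FROM ubuntu:20.04

# ARG is build-time only, so this does not leak into the final
# image's environment the way ENV would.
ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y tzdata
```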

Anyway that’s just a quick fix, I hope they patch it soon!

[Original source](https://anonoz.github.io/tech/2020/04/24/docker-build-stuck-tzdata.html)

https://redd.it/g781rs
@r_devops
BI / Analytics / Reporting in a modern microservice stack

Hi All

I'm a PM at an AgriTech BPM startup.

We're in the process of planning out the next version of our tech stack. We'll be breaking down our current roughly 5-year-old PHP/jQuery/OrientDB monolith into (most likely) Node.js/Vue.js/Couchbase microservices.

We currently use 2 separate components for reporting: hand-built frontend tables, graphs and the like for the basic "inline" stuff like KPIs and basic aggregated tables and charts, and then M$ Power BI for the more advanced dashboard-like stuff, which we then embed. The latter is fed by ETL jobs that push data on a schedule into an Azure SQL DB. The licensing thereof is also damn near crippling, TBH.

So with all that in mind, I'd very much like to hear your opinions on how/what/where to incorporate a reporting and BI solution. I'm basically trying to answer the following question:

Is it realistic to want to build one microservice that does everything from quick basic KPIs all the way up to complex dashboards and embedded reports?

* We're thinking along the lines of a Postgres instance with Cube.js and some form of Vue.js & D3.js frontend that can be embedded, or even something like Plotly's Dash (Python stack). Ideally we'd like to be able to eventually hire a data analyst or 3 who can run with this with minimal input from devs.
* Is there merit in splitting the two ends of the spectrum into basically what we have now: one component that sits inline on the frontend, pulls from the production DB, and only ever remains a mini-framework for basics like the aforementioned KPIs, while treating the proper embedded reports and dashboards as a separate thing? Also bear in mind the possible AI/ML components that may come along in future.

P.S. I'm trying to stay away from the Docker & Kubernetes conversation on this specific point, but if it's relevant please feel free.

I've been accused of rambling on Reddit before, so please forgive if that's the case here.

Looking forward to your opinions.

https://redd.it/g77y88
@r_devops
I'm doing a survey in the devops community: What challenges do you face implementing continuous security into your workflows?

I'm hoping to do a talk on the several challenges that make DevSecOps difficult. What do your security people struggle with compared to the old days of "pentest at the end"? With continuous delivery, pipelines get more complicated and attack vectors are no longer limited to the applications themselves, so as a community we have to adapt. For example, we struggle a lot with the cloud infrastructure side: most of the staff working on our Terraform builds and cloud infrastructure pieces aren't actually infrastructure people, which leads to exposed services and lazy, "function-first" configurations. Any comments or opinions like this would be much appreciated!

Thank you.

https://redd.it/g76tf5
@r_devops
What are good log analysis tools (not using Java)

I'm looking to set up some remote log analytics, but Elasticsearch, the ELK stack, Graylog etc. are all Java-based and not suitable for a small VM with 2 GB RAM (Java has issues without enough memory!). Surely there are some lightweight solutions out there? Maybe something written in Go?

https://redd.it/g76exg
@r_devops
Any tips/tutorials/cheatsheets to build skills in core networking concepts (VPN, subnetting, proxy, NAT, SSH tunnels, port forwarding...)?

Working as a junior DevOps engineer, it is frustrating that I have weak knowledge of networking... I need some guidance. How did you guys learn networking?

https://redd.it/g87f2c
@r_devops
Embarrassing Question

So I’ve been a DevOps Engineer now for about a year and a half, Windows sysadmin and application analyst for a few years prior to this and I’m embarrassed to admit I type horrifically.

It was never a huge deal before but I’ve found that as I’ve moved more into scripting, IaC and even high level app coding my current typing method is just super slow, inaccurate and cumbersome.

It's not that I peck at each key, but I almost always need to look at the keyboard, and I'm often slow and still make errors.

I was wondering if anyone had any good tips or resources for learning to type quickly and without needing to look at the keyboard? I’ve of course just googled for typing tutorials and the like but I wanted to see if anyone had things they could say had helped them personally.

Thanks for the time all! Hope everyone is staying healthy and sane!

https://redd.it/g84mlk
@r_devops
Exposing internal services to developers

Hello!

I'm going to create three Kubernetes clusters for a small team of developers. I'm going to need a set of internal tools (e.g. Grafana dashboards). Would you propose to use SSO or VPN to expose internal services?

https://redd.it/g81for
@r_devops
Understand how Prometheus Monitoring works | Explaining Prometheus Architecture

Prometheus has become the mainstream monitoring tool of choice in the container and microservice world.

[**In this video**](https://youtu.be/h4Sl21AKiDg) I explain the following topics:

* **Why Prometheus is so important** in such an infrastructure, and some specific use cases
* **Where and why Prometheus is used**, with specific use cases
* How Prometheus works: what are targets and metrics?
* How Prometheus collects those metrics from its targets
* **Prometheus architecture, explained with simple diagrams**, going through the main components: Prometheus Server, Pushgateway, Alertmanager
* Configuring Prometheus: an example YAML configuration
* The **advantages** of Prometheus's pull system compared to alternative monitoring tools that use a push system
* Using Prometheus monitoring with **Docker 🐳 and Kubernetes**
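As a companion to the configuration point in the list above, a minimal `prometheus.yml` looks like the following (the `node-exporter` target address is a hypothetical example):

```yaml
global:
  scrape_interval: 15s   # how often Prometheus pulls metrics from targets

scrape_configs:
  - job_name: "prometheus"          # Prometheus scraping its own /metrics
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node"                # hypothetical node_exporter target
    static_configs:
      - targets: ["node-exporter:9100"]
```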

A separate practical video for monitoring Kubernetes services with Prometheus will follow.

Thought I'd share it; it could be helpful for some of you 🙂 I'd also appreciate any feedback.

https://redd.it/g7v7ny
@r_devops
Need recommendation for a CD platform

Hi,

I'm looking to host everything on a $20 droplet on DigitalOcean, and I need a recommendation for the following setup.

1. Has an option to build container images (Docker) and save them to a local registry.
2. Deploys the said container images as a single-instance system on the same machine.
3. Routes HTTP/HTTPS traffic to the per-container domain (e.g. [myapp.mydomain.com](https://myapp.mydomain.com)) and requests Let's Encrypt certificates automatically.
4. Has a web GUI (optional).
5. Directly syncs with GitHub.
6. Is easy to install.

I tried Dokku and such and I didn't like the Duct-tapyness of it - I'm looking for something more enterprise-grade that has an on-premise option available.

https://redd.it/g80j53
@r_devops
Trying to understand Helm and multiple applications

Hi Everyone,

I'm working on architecting a structure for a complex application and need to understand if I'm approaching this incorrectly. I am new to K8s and Helm, so you've been warned. ;)

I have a collection of applications (Client apps) that consume services from a single application (Server app). Since I would rather not spin up a duplicate Server app for each Client app, I would like the charts to detect and use the existing services that are already there. From my understanding, I might be able to accomplish this with requirements.yaml, but I can't find any documentation to confirm it.
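For reference (chart names and repo URL below are hypothetical): in Helm 2, shared sub-charts are declared in requirements.yaml, and a `condition` flag lets each Client chart skip installing its own copy of the Server chart when one already exists. In Helm 3 the same block moves into Chart.yaml under `dependencies:`.

```yaml
# requirements.yaml (Helm 2); in Helm 3 this lives in Chart.yaml
# under "dependencies:". Names and repository URL are hypothetical.
dependencies:
  - name: server-app
    version: "1.2.3"
    repository: "https://charts.example.com"
    # Only install the bundled server when enabled; set
    # server-app.enabled=false to point at an existing release instead.
    condition: server-app.enabled
```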

To add to this, I plan on having multiple environments in the development cluster. I know I could use namespaces, but I'm hoping I can avoid it if it's not necessary.

PS: if Helm is the wrong approach, I'm not married to it either.

Thanks!

https://redd.it/g7v40h
@r_devops
DevOps vs. SRE — Which is better for your career?

What do you guys think? Which is a better job title?
[https://medium.com/devops-dudes/devops-vs-sre-which-is-better-for-your-career-5694b5719d88?source=friends\_link&sk=7fde8bc1092eb01bf57cd79ba666f0d9](https://medium.com/devops-dudes/devops-vs-sre-which-is-better-for-your-career-5694b5719d88?source=friends_link&sk=7fde8bc1092eb01bf57cd79ba666f0d9)

https://redd.it/g7zgde
@r_devops
kubeletctl is an open-source client for kubelet with an option to scan for vulnerable containers

What can it do:

* Run any kubelet API call
* Scan for nodes with opened kubelet API
* Scan for containers with RCE
* Run a command on all the available containers by kubelet at the same time
* Get service account tokens from all available containers by kubelet
* Nice printing 📷

Check it out: [https://github.com/cyberark/kubeletctl](https://github.com/cyberark/kubeletctl)

#kubernetes #kubelet #kubeletctl

https://redd.it/g7ssbm
@r_devops
Openshift pipeline help

I need some good reading or video resources to understand how to design and implement an entire pipeline on OpenShift or Kubernetes. I understand Kubernetes and OpenShift from an infrastructure standpoint, but I need to learn how to take a traditional on-prem application and convert it to a DevOps pipeline on OpenShift or Kubernetes, with the entire build-test-deploy flow (all levels of testing). I have always been an infrastructure guy and never worked as a software developer. Thank you in advance.

https://redd.it/g7sglh
@r_devops
Seeking guidance

Hello everyone,

I have some thoughts about which programming language I should learn (Python || Ruby), and I want to share them with you and get some advice.

I have been working with Ansible to provision infrastructure for a long time already, using Molecule with Testinfra (Python) to test playbooks. Then an issue put me in the situation of migrating to Chef and all its tools (InSpec, RSpec, Serverspec, KitchenCI, etc.), which I don't regret at all; I actually kind of love it, because it gave me test-driven provisioning. But all of that is based on Ruby.
Now every time I get an interview for a DevOps position, the requirements always include Python and Bash for scripting, which is OK, but what if I can do the same scripting in Ruby?

Thanks in advance.

https://redd.it/g7s2ps
@r_devops
Migration from Docker Swarm to Kubernetes with same IP?

HI All,

I am working on migrating Docker Swarm based microservices to Kubernetes using Helm 3 charts. The migration job/script handles importing the current config files, volumes, etc. All the services now come up and we are able to validate them. I am going to use the MetalLB load balancer for the services.

Any suggestions on how to switch the IP over from the Docker Swarm VM to Kubernetes/MetalLB without downtime?

Note that both systems have a single entry point for the microservices (an API gateway).
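For reference, MetalLB's layer-2 mode (its ConfigMap-based configuration, current as of this post) can pin a pool to a specific address, so the gateway Service can take over the exact IP the Swarm VM held; the addresses below are hypothetical:

```yaml
# Minimal MetalLB layer-2 config (addresses are hypothetical): pinning
# the pool to the Swarm VM's current IP lets the api-gateway Service
# keep the same address after cutover.
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
      - name: default
        protocol: layer2
        addresses:
          - 192.168.1.50/32   # the IP previously held by the Swarm VM
```

The gateway Service can then request that exact address via `spec.loadBalancerIP`, which MetalLB honours when the IP is in a configured pool.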

https://redd.it/g7pnq4
@r_devops
Lacking some devops basics

So I've been working as a DevOps engineer for the past year. I had no prior knowledge of what DevOps was coming out of college. But I made do, getting hands-on experience with AWS, Kubernetes, etc. Still, I feel like I'm missing some basic knowledge that I should have. I know how to fix certain issues, debug my way out of random requirements, and use AWS services, all by googling on the job. But I don't know how to start learning and understanding why certain things work the way they do: topics like SNI, or why some TCP traffic needs Layer 3/4 handling; mostly networking-, cert- and proxy-related things. Not that I can't google all this myself, but can anyone point me in the right direction, or suggest any books that helped them really understand these abstract topics?

https://redd.it/g7pr1n
@r_devops
How come Amazon deploys 23,000 times a day? What are they changing so often?

OK, so I'm new to DevOps. I came across this image [https://imgur.com/a/3uBZKBN](https://imgur.com/a/3uBZKBN) and I was wondering what exactly Amazon (and other companies) change in all these deployments. Because I see pretty much the same website every day.

https://redd.it/g8ktuu
@r_devops
Praise dependabot! The github bot to manage your code's vulnerabilities

I just joined a new project in an automation engineer role, to help stretch the limited resources this team has. The first order of business was moving out of their private GitLab box, which wasn't enforcing HTTPS, to a GitHub org, so we can be a little more confident in the confidentiality of our source code.

I enabled dependency alerts on the new private repo, and now there's this trusty bot named Dependabot scanning and submitting PRs to update dependencies, clearing all sorts of CVEs that have been posted against the tools in use. I've never seen this feature before, so I figured I'd inform the masses of this neat feature.
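Beyond the automatic security-update PRs described above (which need no configuration), GitHub-native Dependabot can also open scheduled version-update PRs via a checked-in config file; the ecosystem and schedule below are examples to adapt to the repo's package manager:

```yaml
# .github/dependabot.yml — enables scheduled version-update PRs in
# addition to the automatic security PRs. Ecosystem, directory and
# interval here are example values.
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "weekly"
```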

https://redd.it/g8ncd9
@r_devops