Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
github-actions: Is there any way to build docker-compose from cache?

I have a docker-compose file, and I don't want to build it from scratch every time I open a pull request. Instead, I want it to build from the cache.

Here is my action.yml file, in case you need it:

```yaml
name: Github Action
on:
  push:
    branches: [staging]
jobs:
  test:
    runs-on: ubuntu-18.04
    steps:
      - uses: actions/checkout@v1
      - name: Bootstrap app on Ubuntu
        uses: actions/setup-node@v1
        with: { node-version: '12' }
      - uses: whoan/docker-build-with-cache-action@v5
        with: { image_name: whoan/node }
      - name: Install global packages
        run: npm install -g yarn prisma
      - name: Install project deps
        # No step with id "cache-yarn" exists here, so this condition is always true.
        if: steps.cache-yarn.outputs.cache-hit != 'true'
        run: yarn
      - name: Build docker-compose
        run: docker-compose -f docker-compose.test.prisma.yml up --build -d
```

https://redd.it/ga0q8d
@r_devops
How to set up HTTPS on AWS EC2

I'm running into problems while trying to serve my API, which is hosted on AWS EC2, over HTTPS.

The API runs normally **without** the ELB; however, after configuring it (I followed the recommended steps), I get a **502 Bad Gateway** error.

Here's my configuration:

* AWS EC2 (t3a.small) running a docker container of my ExpressJS app listening on port 3000;
* Security group with http:80 and https:443 open;
* ACM that covers the following domains (mydomain.com, \*.mydomain.com);
* ELB listening on ports: http:80, https:443, https:3000;
* Route 53 with my hosted zone containing the A-type record with the ELB DNS value;

**Before**

https://ec2-ip-address.zone.compute.amazonaws.com:3000/api/

**Now**

https://api.mydomain.com:443/api/{resourceName}

I'd appreciate any insight on how to set this up properly; if I missed something, please let me know.

https://redd.it/ga01ym
Self-hosting a cloud-native microservice project

I'm planning to create a large-ish cloud-native microservice project as a learning experience and a playground to test various technologies that I don't get to use at work. Usually I would go with AWS, but for cost reasons I have to self-host most of the infrastructure on a home server.

- There will be two Kubernetes clusters for production and pre-production environments.
- Inside the clusters I will use Istio as the service mesh.
- Code will be hosted on gitlab.com (or self-hosted gitlab if necessary).
- I will follow a push-based GitOps workflow: when a PR is merged into master, the CI pipeline builds the Docker image, publishes it and deploys it to the production environment. For now I will keep the necessary credentials as environment variables, which means deployments can only happen on protected branches; otherwise someone from outside could open a PR that changes the .gitlab-ci.yml to deploy whatever they want. I don't know yet how to automate deploying to the pre-production environment and running the integration tests there. If I made a second "staging" branch besides master that deploys to pre-production, staging and master would quickly diverge, and because the "staging" branch is protected, it is not possible to overwrite commits there (which is necessary during testing/QA).
- In place of S3 I have to self-host a MinIO storage instance. Assets of the frontend-application will be uploaded there so that older assets are still available during incremental rollouts.
- Docker images will be published either to Gitlab.com's container registry (10GB free per repo) or to my own MinIO storage.
- I want to use Terraform as much as possible for creating all my infrastructure. There will be an infrastructure repository that applies changes on commit to master. Secrets in the Terraform files will be encrypted using git-crypt.
- I will use only open source products for observability: ELK for logging and OpenTelemetry for metrics+tracing. That means at the very least I have to self-host Kibana, Zipkin, Prometheus and Grafana instances.
- I suppose I will need a domain name and somehow link that to my server so that the web app will be available from outside. For development and access to the preproduction web app I can use ZeroTier instead of a corporate VPN.
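
The deployment rule in the GitOps bullet above (only protected branches may deploy, because only they see the credentials) boils down to a small decision function. This is a minimal sketch, assuming GitLab's predefined `CI_COMMIT_BRANCH` and `CI_COMMIT_REF_PROTECTED` variables; the `DEPLOY_TARGETS` mapping is hypothetical, and in a real `.gitlab-ci.yml` the same gate would normally be expressed with `rules:` plus protected CI/CD variables rather than a script.

```python
from typing import Mapping, Optional

# Hypothetical mapping from a protected branch to the environment it deploys to.
DEPLOY_TARGETS = {"master": "production"}


def deploy_target(env: Mapping[str, str]) -> Optional[str]:
    """Return the environment this pipeline may deploy to, or None.

    `env` is the job's environment (e.g. os.environ inside a GitLab CI job).
    """
    # GitLab sets CI_COMMIT_REF_PROTECTED to "true" only for protected refs.
    # Anything else (a feature branch, a merge request from outside) never
    # deploys, because it must never see the deploy credentials.
    if env.get("CI_COMMIT_REF_PROTECTED") != "true":
        return None
    return DEPLOY_TARGETS.get(env.get("CI_COMMIT_BRANCH", ""))
```

Inside a job you would call `deploy_target(os.environ)` and skip the deploy step whenever it returns `None`.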

To sum it up, my home-server will run at least: 2 Kubernetes clusters, Gitlab Runners, MinIO, ZeroTier, lots of databases for the microservices, Kibana, Zipkin, Prometheus, Grafana, an internal Maven repository, some kind of service to link my domain-name to the dynamic IP, and a personal NAS.

This foundational ops stuff is all new to me. Where do I even start setting this up? Should I host everything on bare metal or use VMs? If VMs, how would I provision them in a reproducible manner? Where do the databases for the microservices live?

Naturally this is completely overkill for a side-project, but the whole point is for me to learn how to do it, so I want to follow enterprise best practices as closely as is manageable.

https://redd.it/g9sqgf
MaaS Node reboots as soon as PXE loads initrd, Supermicro X11SSE-F

Using MaaS 2.6.2, when I try to commission blades with the Supermicro X11SSE-F motherboard, they PXE boot, load the kernel and initrd, and immediately reboot. I'm currently at a loss as to what may be the cause. Has anyone else experienced anything like this?

https://redd.it/g9tjkt
moving into leadership to start a new devops team from scratch - seeking advice

copy/paste from this thread: https://old.reddit.com/r/ITCareerQuestions/comments/g9s5ku/im_being_offered_a_position_for_a_title_that/

i'm currently a devops engineer, in discussions with another company to help start a brand new department and lead the "devops transformation" charge. it would be part engineering, part mentoring, part hiring and staffing a team...hence the question about moving into management, i'd be over these new people that i'd help hire.

i've been a team lead before, but never a manager. i've helped start new teams, but never as a leader, always as a peer advocate and consultant.

they're talking like this would be the equivalent to a director level position, reporting to a VP. i've never been in a leadership position, this would be a move to start a new team from scratch, so it's all foreign territory to me.

i'm asking for help on how to figure out what a salary for a position like this would be and how to research my market rate so that i'm asking for a realistic number, or able to evaluate their offer. is there a general step & column of what a salary increase should be going from a staff employee to a leadership position?

https://redd.it/g9tit8
Anyone here work for Oracle?

I just got approached by an internal employee about a DevOps position. I work for one of their competitors, and my company would definitely sue. Is Oracle good to work for?

https://redd.it/g9tdax
How do you deal with an automated job that modifies a git repo?

How do you deal with automatic merging and potential conflicts?

Tips?
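
One common pattern for the conflict part is to let the job commit locally, try to push, and on rejection rebase onto the latest remote state and retry, handing off to a human only on a genuine conflict. A minimal sketch, assuming the job works in a clone whose remote is named `origin`; `push_with_retry` and the injectable `run` parameter are hypothetical names (the latter exists only so the logic can be exercised without a real repository).

```python
import subprocess
from typing import Callable, List

# A runner takes git arguments and returns the process exit code.
Runner = Callable[[List[str]], int]


def run_git(args: List[str]) -> int:
    """Run `git <args>` in the current directory and return its exit code."""
    return subprocess.run(["git", *args]).returncode


def push_with_retry(branch: str, run: Runner = run_git, attempts: int = 3) -> bool:
    """Push the job's local commit(s) to origin/<branch>, rebasing on rejection.

    Returns True once a push succeeds, and False on a real conflict or
    when all attempts are used up.
    """
    for _ in range(attempts):
        if run(["push", "origin", f"HEAD:{branch}"]) == 0:
            return True
        # Rejected push, most likely because someone else pushed first:
        # replay our commit(s) on top of the latest remote state.
        if run(["pull", "--rebase", "origin", branch]) != 0:
            # A genuine conflict: undo the rebase and leave it for a human.
            run(["rebase", "--abort"])
            return False
    return False
```

Keeping each automated change small and scoped to files humans rarely touch makes the rebase path succeed far more often than the conflict path.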

https://redd.it/g9szoz
Django HelloWorld Changes not showing up.

I created a HelloWorld starter app.

I pulled the code onto my machine and started making changes. I can see the changes locally but not on AWS.

I can see my changes in CodeCommit's master branch, and the project is building successfully.

What should I try?

Thank you.

https://redd.it/g9g0y0
Building Docker images on public GitLab runners

Is it possible these days to build and push docker images from public gitlab runners? I know you normally need a privileged / own runner for that, but maybe there are workarounds?

https://redd.it/g8zmoa
devops' iteration is wearing me out

I'm coming from a developer background where you save your program and in a matter of seconds (usually less) you get feedback on whether what you did was right or wrong.

Now I've taken a devops role and I find it **5x harder to concentrate**. I've been building a big pipeline and I'm trying to test it. Each iteration circle takes so much time though:

1. Fix/Build the pipeline
2. Trigger the pipeline to run and wait for it to run, as a test.
3. Something does break, cool; time to fix it.
4. Back to step (1)

The issue is that step (2) takes minutes to run, because you need to install dependencies, build Docker images, possibly build infrastructure (clusters) through Terraform and all that fuss! I browse Reddit / check Slack / daydream while waiting for something to break... but of course this makes me lose focus too much!

I've resorted to building the pipeline image myself and walking through it command by command by hand: when something fails, I fix it and rebuild the image myself. That saves the constant git commit/push cycle and gives me better control over what I run and what I skip as irrelevant, but it requires a lot of manual work and is still very repetitive, not to mention that I'm no longer reproducing the pipeline accurately at all times.

Seriously, it's tiring as hell. I've never felt like this before, and I've been developing for many years.

How do you solve this? Any devs-to-devops here to chime in or any tips in general? It's driving me crazy!
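
The manual command-by-command walk-through described above can be partly automated with a small runner that executes the pipeline's commands in order, stops at the first failure, and resumes from that step after a fix. A minimal sketch, assuming the steps are plain commands copied out of the pipeline definition and run inside the pipeline's own image; `run_steps` and the demo steps are hypothetical, not part of any CI tool.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_steps(steps: Sequence[Sequence[str]], start: int = 0) -> Optional[int]:
    """Run pipeline steps in order, beginning at index `start`.

    Returns the index of the first failing step (so you can fix it and
    resume from there), or None if every step succeeded.
    """
    for i in range(start, len(steps)):
        print(f"[step {i}] {' '.join(steps[i])}", flush=True)
        if subprocess.run(list(steps[i])).returncode != 0:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical steps; replace them with the commands from your pipeline.
    demo = [
        [sys.executable, "-c", "print('install dependencies')"],
        [sys.executable, "-c", "print('build image')"],
    ]
    print("first failing step:", run_steps(demo))
```

If `run_steps(steps)` returns 3, fix the problem and call `run_steps(steps, start=3)` instead of rerunning the earlier steps.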

https://redd.it/g8z11r
Cloud in Turkey

Hi ya,

My company is expanding and we need to get our apps running in Turkey (this is a must-have). None of the big cloud providers have their data centers there.

A perfect solution would allow me to dynamically add resources like machines/hdd/ram and so on. Managed Kubernetes would also be much appreciated. English speaking support is also a must (no one in the company speaks Turkish).

Google does not show many options. I'm looking for a good, reliable provider. Do you guys have any recommendations?

Cheers

https://redd.it/g8yacy
Monitoring celery cluster

We have a Celery cluster, using RabbitMQ as the broker, that runs asynchronous tasks; some take a long time (4h+) and some run periodically. However, the Flower monitoring is not ideal: sometimes we lose track of tasks, as if a task's process keeps running on its own while its associated task is either unknown or in a succeeded/failed state.

What alternative to Flower can be used to monitor a Celery cluster? I'm thinking of monitoring the RabbitMQ queues + processes using Datadog, but then, if I want to kill a particular task without going to the server, is there a Celery API for that? If Flower can do it remotely, I suppose it's doable.
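
Celery itself exposes remote control through `app.control`, so killing a task needs neither Flower nor a shell on the server. A minimal sketch, assuming `app` is your configured `Celery` application object; `find_active_tasks` and `kill_task` are hypothetical helper names, while `inspect().active()` and `revoke(..., terminate=True)` are Celery's own remote-control calls. Per the Celery docs, `terminate=True` is meant as a last resort for stuck tasks, since it signals the worker process executing the task.

```python
from typing import Any, Dict, List


def find_active_tasks(app: Any, task_name: str) -> List[Dict[str, Any]]:
    """Return the currently executing tasks named `task_name`, across all workers."""
    # inspect().active() maps worker name -> list of task-info dicts,
    # or returns None when no worker replies within the timeout.
    replies = app.control.inspect().active() or {}
    return [
        {**task, "worker": worker}
        for worker, tasks in replies.items()
        for task in tasks
        if task.get("name") == task_name
    ]


def kill_task(app: Any, task_id: str) -> None:
    """Revoke a task and terminate the process currently running it."""
    # Without terminate=True, revoke only stops tasks that haven't started yet.
    app.control.revoke(task_id, terminate=True, signal="SIGTERM")
```

Each returned dict carries the task's `id`, which is what `kill_task` expects.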

https://redd.it/g8xlya
Advice regarding job orchestrator (HPC)

We are currently in the process of putting together an HPC cluster (up to 10 machines).
For the time being, every customer gets a local account so they can launch their jobs.

Obviously, this has a number of drawbacks:
* quite often, customers forget to use tmux / screen, so their jobs get killed once they disconnect
* all accounts should be provisioned on each machine
* resources are not used optimally
* ...


In order to get a *better* management of the infrastructure, we are looking for an orchestration solution.

Other departments are using SLURM but we find the administration quite *painful*.

I came across Nomad which promises to be lighter and easier.

Can anyone share their experience with Nomad or that type of tool?

https://redd.it/g8wh9j
UI to run Ansible playbook

I am using AWX now and it has a lot of bugs. Semaphore looks good, but it seems there have been no releases in the past two years. Can anyone suggest other good alternatives?

https://redd.it/g8t4wl
MacOS Catalina apps

Hi there! For those using MacBooks, I'm curious whether you're using zsh or bash, and what terminal, IDE, text editor, and other apps you use.
Please let me know so I can take a look. Thanks!!

https://redd.it/g8srpy
Monitoring & logging - how to navigate current situation with 143 different tools?

I may sound like an old grouch, but coming back after a couple of years away from the devops/sysadmin landscape and researching monitoring solutions gave me a headache :) Back in the day there were just 1-2 solutions (none of them as advanced as the current ones, of course), which made the landscape easy to navigate.

---

Is there any simple rundown of which tool does what? From Googling around, this is what I've managed to wrap my head around. I'd like to write a simple wiki entry, but I'm lost myself :)

## All-in-one
- `nagios`: still in business and it's still pretty much a simple offline/online alerting & monitoring
- `observium`: a one-stop solution for smaller installations and little customization
- `Cacti`: an old friend with mostly SNMP-centered support
- `Zabbix`: all-in-one solution for monitoring & graphing; its agents support both passive (polled by the server) and active (pushed to the server) checks
- `GrayLog`: complete solution for log collection & analysis, uses `ElasticSearch` as a data store (+`MongoDB` for app config). Can collect text & structured app data.
- `DataDog`: cloud-based, closed-source solution for complete monitoring, with no 2FA authentication (?!) and a crazy slow UI. It's based on a push model and agents.

---

## Data Visualization
- `Grafana`: Swiss Army knife for visualizing data from almost any data source; it doesn't store the metrics itself, only its own configuration (dashboards, users, etc.) in SQLite, MySQL or PostgreSQL
- `Kibana`: same as `Grafana` but from `ElasticSearch` authors

---

## Data Storage/Collection

### Metrics
- `InfluxDB`: (just?) a database for storing time series data that uses a push model (clients write data to it), usually used with `Grafana`. Data is persisted to disk.
- `Prometheus`: like `InfluxDB`, but uses a pull model (it scrapes its targets); pushing is possible through the separate `Pushgateway` component. Includes alerting too. Usually used with `Grafana`; covers roughly what `InfluxDB` + `Telegraf` + `Kapacitor` do together. Stores data on local disk.
- `Graphite`: stores numeric time-series data and renders graphs of it on demand; describes itself as "Kick ass. Chew bubblegum." (ooh... ok.).
- `Telegraf`: an agent that collects information from a local system and pushes it to almost any database (since it's from the `InfluxDB` authors, `InfluxDB` is presumably the recommended target)
- `Kapacitor`: alerting for environments without `Prometheus` but with `InfluxDB`

### Logs
- `Loki`: same as `Prometheus` but deals with logs. Usually used with `Grafana`
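- `Logstash`: a pipeline that collects, parses and forwards logs, typically into `ElasticSearch` for viewing in `Kibana`; unlike `Loki`, it doesn't store logs itself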


### All-in-one
- `ElasticSearch`: a tool I know from the dev side of things; a datastore centered around searching. For some reason it can be plugged into `Grafana` as a data source
- `OpenTSDB`: like `InfluxDB` but for really, really, REALLY large data sets. Can be plugged into `Grafana` as a data source
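
The push vs. pull distinction that keeps coming up in the lists above is easiest to see in the wire formats. A minimal sketch that renders one sample in Prometheus' text exposition format (which the app exposes for Prometheus to scrape) and one point in InfluxDB line protocol (which an agent like Telegraf pushes to InfluxDB). It assumes names, label values and tags contain no characters that need escaping, and handles only float fields; real client libraries cover both.

```python
from typing import Dict


def prometheus_sample(name: str, labels: Dict[str, str], value: float) -> str:
    """One sample in Prometheus' text exposition format (pull model).

    The application serves lines like this over HTTP (conventionally at
    /metrics) and Prometheus scrapes them on its own schedule.
    """
    if not labels:
        return f"{name} {value}"
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"


def influx_point(measurement: str, tags: Dict[str, str],
                 fields: Dict[str, float], timestamp_ns: int) -> str:
    """One point in InfluxDB line protocol (push model).

    An agent such as Telegraf (or the app itself) sends lines like this to
    InfluxDB whenever it has new data.
    """
    tag_str = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_str} {field_str} {timestamp_ns}"


print(prometheus_sample("cpu_usage", {"host": "server01"}, 0.64))
print(influx_point("cpu", {"host": "server01"}, {"usage": 0.64}, 1434055562000000000))
```

This prints `cpu_usage{host="server01"} 0.64` and `cpu,host=server01 usage=0.64 1434055562000000000`: the first waits to be collected, the second is sent by the producer along with its own timestamp.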



Ok.... I'm officially lost. I've probably missed some crucial tools here. So far, however, I see that there are essentially two major players in visualization and a ton of tools to send them data, and whichever tool you choose, you will be happy.

https://redd.it/g8qnhh
AWS First Timer: Any suggestions on how to get AWS certifications?

I am a first timer with zero knowledge of or experience with AWS. There are a ton of online resources, but I am not sure which one would best help me prepare for the AWS certification exams. I heard the courses on the official AWS training website are not that great; however, I found that the AWS Educate website has solid material, though I'm not sure it is relevant to the AWS certifications. It would be great if any of you experienced folks out there could share your success stories.

https://redd.it/g8baad
7 DevOps Books

I put together these 7 DevOps books that I read recently, and I highly recommend them to anyone who is into DevOps transformation.

Some of them, like The Goal and Toyota Kata, are not even about technology, but process-oriented and focused on continuous improvement.

Of course, the list includes the classics from Gene Kim: The DevOps Handbook, The Phoenix Project and The Unicorn Project.

Now I would like to hear your opinion on those books and what real outcomes you've taken from them.

And just as important: what recommendations do you have that could be part of this list?

[https://medium.com/devops-cloud-it-career/7-must-read-devops-books-f7b6e9f30f6e](https://medium.com/devops-cloud-it-career/7-must-read-devops-books-f7b6e9f30f6e)

https://redd.it/gallif
Debugging builds on a remote server

Currently my workflow is to hit build in Visual Studio, copy the files to the server via WinSCP, and then run them through Mono (.NET Framework app) on the remote server (I can't run the app locally). This makes debugging a pain. Is there some way I can hit run and step through the code as it runs on the server? Or should I just create a replica of the infrastructure using local VMs?

https://redd.it/gam6d1
Which cloud provider do you choose and why?

Is there any way to say one provider is the best, or does it depend on the company's vision and other factors?

How do you usually consider cloud providers?

https://redd.it/gacm66
Database for a large amount of data with no retention limit

Hello,

I have an app that I would like to monitor (usage statistics mostly). We are talking about roughly 100 metrics every 30-60 seconds from about 100 servers. The number of servers will grow in the future, although not at a high rate; the number of metrics probably won't.

Right now we are using Cacti (RRD files...), but it has become a PITA to maintain, so I'm thinking of moving to a more modern approach that supports Grafana. I am considering Prometheus or InfluxDB/Telegraf for this. The thing is that we must keep all the data forever (no retention limit at all), since it is vital to the way our company operates.
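
A quick way to size "keep everything forever" is to count samples. A back-of-the-envelope sketch using the numbers above (100 servers × 100 metrics at a 30 s or 60 s interval), assuming roughly 1.5 bytes per stored sample; the Prometheus storage docs quote about 1-2 bytes per sample after compression, and InfluxDB's ratio will differ, so treat the result as an order of magnitude only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000


def yearly_samples(servers: int, metrics_per_server: int, interval_s: int) -> int:
    """Samples written per year when every metric is recorded once per interval."""
    return servers * metrics_per_server * (SECONDS_PER_YEAR // interval_s)


def yearly_gb(samples: int, bytes_per_sample: float = 1.5) -> float:
    """Disk usage in (decimal) GB; bytes_per_sample is an assumed compression ratio."""
    return samples * bytes_per_sample / 1e9


for interval in (30, 60):
    n = yearly_samples(servers=100, metrics_per_server=100, interval_s=interval)
    print(f"{interval}s interval: {n:,} samples/year, ~{yearly_gb(n):.1f} GB/year")
```

At 30 s this works out to 10,512,000,000 samples (about 15.8 GB) per year, and at 60 s to half of that, so the raw volume stays modest even with no retention limit.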

Any opinions on this? Was anyone ever tasked with something like this?

Thanks in advance

https://redd.it/gagl5p