Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Password Managers

Just wondering what folks here use for sharing passwords. I've used Dashlane, LastPass and 1Password, but I'm looking for a tool that is SOC 2 compliant and supports GSuite as an identity provider.

Thus far, only Keeper seems to fit the bill but I haven't had a chance to test it out yet.

Looking for suggestions/recommendations.

https://redd.it/k7zphn
@r_devops
Stumped on automating SAML single sign-on for each new AWS account

Basically, I'm using JumpCloud as my SSO provider, and they have no APIs for programmatically creating a single sign-on application. This is a pain point because setting up SAML single sign-on is a very manual process (the AWS side can be automated with Terraform, but not the JumpCloud side). Can anyone recommend a better SSO provider? I also tried AWS SSO itself and ran into the same manual-process issue, and HashiCorp SSO also does not appear to have resources or APIs for creating SAML SSO programmatically. Any suggestions?

https://redd.it/k7yn2i
@r_devops
32 hour outage and it's my fault. What do?

Throwaway account.

I'm a DevOps/CloudEng at a growing fintech company. We're approaching launch day and as you might imagine, everyone wants to deploy their projects and get them live ASAP.

Ultimately the goal is to automate as much as possible, but as an early-stage company we face many unknowns, so there's still a lot of human intervention in the deployment process. Lately I've been working non-stop, 10-11 hours every day and some weekends.

So come Friday night, I changed the S3/CloudFront configuration for our main website, only partially tested it, and left for the weekend. I didn't realize it was misconfigured until this morning, when the CEO Slacked me that the site was down.

Obviously I panicked and fixed it immediately, only to realize that the site had been down for around 32 hours (the weekend).

I wrote a detailed postmortem with the CloudTrail events and I intend to discuss it with everyone later. But some comments from the CTO made me feel quite guilty and bad, as it looks like it was a total rookie mistake.

What do you recommend doing here?

https://redd.it/k7t0yy
@r_devops
Windows and Docker in production

Does anyone use Windows Docker containers for live applications?

I’ve been experimenting with it for a while and am unconvinced it’s suitable for production use. Docker EE seems massively under-documented and has an almost non-existent community.

It works, I can create images and run containers using Docker EE on Windows Server 2019 but I’m concerned we would get issues in live that could be very complex to resolve.

I’m wondering if we’d be better off sticking with scripted VMs and saving containers until we can go fully Linux.

https://redd.it/k7r0mc
@r_devops
Terraform: EC2 Stop action

Hi,

I'm currently trying to set up a CloudWatch metric alarm in Terraform that will stop an EC2 instance when the alarm enters its "Alarm" state. The idea is to stop the EC2 instance if it remains idle for too long. This seems to be possible via the AWS Console according to these [docs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/UsingAlarmActions.html#AddingStopActions), but I'm struggling to create an equivalent "Stop action" in Terraform.

I used the [aws\_cloudwatch\_metric\_alarm](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_metric_alarm) resource to create the alarm, but that is as far as I've got 🤷🏻‍♂️

Does anybody happen to know how to do that (if it's even possible)?

Note: I've found a few tutorials/gists online showing how to achieve that via a lambda but I was hoping there would be another way given it's feasible with just a couple clicks in the console.
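
For what it's worth, the linked AWS docs describe the stop action as a plain ARN of the form `arn:aws:automate:<region>:ec2:stop`, and an alarm takes it like any other alarm action (in Terraform that would presumably go in the `alarm_actions` list of the alarm resource). Below is a minimal sketch of the same idea using the parameter names of the CloudWatch `PutMetricAlarm` API. The alarm name, instance ID, region, threshold and periods are all made-up placeholders, not values from the post.

```python
def stop_action_arn(region: str) -> str:
    """Build the EC2 stop action ARN described in the CloudWatch alarm docs."""
    return f"arn:aws:automate:{region}:ec2:stop"


def idle_stop_alarm(instance_id: str, region: str, cpu_threshold: float = 5.0) -> dict:
    """Return PutMetricAlarm parameters for a 'stop when idle' alarm.

    All concrete values here are illustrative assumptions: low average CPU
    for 12 consecutive 5-minute periods is treated as "idle".
    """
    return {
        "AlarmName": f"stop-idle-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 12,
        "Threshold": cpu_threshold,
        "ComparisonOperator": "LessThanThreshold",
        # The stop action itself: no Lambda involved, just this ARN.
        "AlarmActions": [stop_action_arn(region)],
    }


params = idle_stop_alarm("i-0123456789abcdef0", "eu-west-1")
# With boto3 installed and credentials configured, the alarm could be created with:
#   boto3.client("cloudwatch", region_name="eu-west-1").put_metric_alarm(**params)
```

The point of the sketch is only that the stop action is data on the alarm, not a separate resource, which is why it takes a couple of clicks in the console.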

https://redd.it/k93v45
@r_devops
Adaptive Request Concurrency. Resilient observability at scale.

This post, written by the [Vector.dev](https://Vector.dev) team, discusses Adaptive Request Concurrency for observability pipelines: automatically optimizing HTTP communication. It brings an old networking concept into the o11y domain. Let us know what you think!

[https://vector.dev/blog/adaptive-request-concurrency/](https://vector.dev/blog/adaptive-request-concurrency/)
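
To give a feel for the kind of "old networking concept" involved, here is a minimal sketch of an additive-increase/multiplicative-decrease (AIMD) limiter, the scheme known from TCP congestion control. This is an illustration under assumptions, not Vector's implementation: the class name, the step sizes and the notion of "backpressure" (for example a timeout or an HTTP 429) are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class AimdLimiter:
    """Concurrency limit that grows slowly and shrinks quickly."""

    limit: float = 1.0       # current concurrency limit
    max_limit: float = 64.0  # hard upper bound (assumed value)
    increase: float = 1.0    # additive step on success
    decrease: float = 0.5    # multiplicative factor on backpressure

    def on_success(self) -> None:
        # Additive increase: probe for more capacity one step at a time.
        self.limit = min(self.max_limit, self.limit + self.increase)

    def on_backpressure(self) -> None:
        # Multiplicative decrease: back off sharply, but never below 1.
        self.limit = max(1.0, self.limit * self.decrease)

    @property
    def allowed(self) -> int:
        """Number of requests that may be in flight right now."""
        return int(self.limit)


limiter = AimdLimiter()
for _ in range(9):
    limiter.on_success()
after_ramp = limiter.allowed      # limit grew from 1 to 10

limiter.on_backpressure()
after_backoff = limiter.allowed   # limit halved to 5
```

The asymmetry is the design choice: a slow ramp-up avoids overwhelming the downstream service, while a sharp cut relieves it quickly once it starts pushing back.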

https://redd.it/k966i3
@r_devops
Going From Infrastructure to Developer Is A Reality

If you compare conferences from, say, five years ago with today's, they have changed drastically. There's no more talk about operating systems or server versions. Instead, it's all about "the cloud" and "development". Conferences like:

1. MS Build
2. AWS re:Invent
3. MS Ignite

and even smaller conferences/meetups.

What does this mean for infrastructure people?

Well, here's the thing... it's exactly what we've always seen in tech, a transition. The same transition as when bare-metal server folks had to start thinking about virtualization. It's just another shift. The biggest difference is this shift is happening pretty fast.

It also kind of feels like the standard sysadmin/infrastructure engineer is being pushed to the side, which may be the reality. However, this reality isn't a death sentence. It's an opportunity. Let me explain why.

When people think of a "developer", they automatically assume that being one means building the next Twitter or some other crazy app. That's simply not the case. You can be an infrastructure developer or a cloud developer who writes code for cloud servers or on-prem environments.

The problem with the explanations we see today is no one is actually explaining HOW a sysadmin or infrastructure engineer can move into a developer role and STILL BE an infrastructure guy or gal. No one is explaining what concepts an infrastructure person needs to know to be a developer. A few of these concepts are:

1. Computer science concepts like declarative/imperative or pointers.
2. Testing (unit, mock, integration, etc.) infrastructure code
3. Source control. Not only GitHub, but a little history like distributed vs centralized source control
4. What sprints are and different types of cultural working environments
5. CI/CD for infrastructure
6. Code editors and IDEs

There are definitely more concepts, but I think these are the core ones.

I recently created a YouTube video about this and I'm going to start a little series on this. Let me know your thoughts :)

[https://www.youtube.com/watch?v=u-0T-JN0GZc](https://www.youtube.com/watch?v=u-0T-JN0GZc)

https://redd.it/k97vuf
@r_devops
Great GitHub feature announcements

Really cool features announced at GitHub Universe.

- Dark mode
- Auto-merge pull requests
- Discussions for community chats
- Required manual approval for Actions

And many more as well. Read more [here](https://github.blog/2020-12-08-new-from-universe-2020-dark-mode-github-sponsors-for-companies-and-more).

Really great to see GitHub advancing as a DevOps platform!

https://redd.it/k9945g
@r_devops
sops - A simple and flexible tool for managing secrets

I thought the r/devops subreddit might be interested in this project I just found!

https://github.com/mozilla/sops

If you like this, [I do a weekly roundup of open source projects that includes an interview with one of the devs you can subscribe to.](https://console.substack.com/)

https://redd.it/k96s27
@r_devops
Issues with my demo configuration for a university project (Docker Compose + NGINX)

Hello everyone,

I'm currently preparing a presentation for my cyber security class that includes a demo of an attack. My general topic is denial-of-service and distributed denial-of-service attacks. My plan is to cover multiple attack types and important historical attacks (such as the attacks on Dyn as an example of DDoS). For the demo I chose my personal favorite, the Slowloris/slow HTTP attack: first, because it's a relatively easy demo that doesn't involve bombarding my own servers with gigabytes of data and risking getting kicked out; second, because I think there is some beauty to this attack.

Now, I obviously know that I can't jump around the internet randomly DoSing websites, and not all web servers are equally vulnerable. That's why I chose to build a custom example setup. For portability, I thought Docker would be a good idea, especially since I can then share my configuration and code with my classmates so they can try it themselves. The reason I'm sharing my question with this specific subreddit is the setup I decided to build.

I want two web servers serving two different static pages (I built two simple pages explaining common use cases for the Apache 2 and NGINX web servers, as well as some history of both programs). They both listen on different ports under web.mydomain.tld, waiting to be DoSed thanks to their vulnerable configuration. Under monitoring.mydomain.tld I decided to make a Grafana dashboard available to show the resource usage of the system (under attack), as well as how the web servers behave (showing a graph of open connections, etc.). I think Prometheus should be the best option as a data source.

The problem is that I'm familiar with the basic concepts of Prometheus and Grafana and have worked with both as an end user, but I've never had to configure them myself (let alone with Docker Compose), and that is exactly where I'm struggling and hoping to get some help from you guys.

So far I've got Apache and NGINX serving their respective sites, and Prometheus, node-exporter and cAdvisor running on startup (this was the sample configuration I found when Googling). My next planned step was to configure the Nginx and Apache exporters for Prometheus, starting with Nginx because I personally hate Apache.

I know that for the Nginx exporter to work, I need a status endpoint for Nginx, which is provided by the stub\_status directive that I can expose on a specific location. I found [the documentation](https://nginx.org/en/docs/http/ngx_http_stub_status_module.html#stub_status) and implemented it accordingly.

The configuration for my Nginx service looks as follows:

    nginx-webserver:
      image: nginx
      container_name: nginx-webserver
      ports:
        - 80:80
      networks:
        - back-tier
        - front-tier
      volumes:
        - ./nginx:/usr/share/nginx/html:ro
        - ./nginx.conf:/etc/nginx/nginx.conf.d/custom.conf:ro

The nginx.conf that is included contains the following:

    http {
        server {
            listen 80;
            server_name web.mysite.net;
            root /usr/share/nginx/html;
            index index.html;
        }
        server {
            listen 127.0.0.1:80;
            server_name 127.0.0.1;
            location /nginx_status {
                stub_status on;
                allow 127.0.0.1;
                deny all;
            }
        }
    }

Now, to my eye, this looks like it should work without any issues, but when visiting the status endpoint I receive a 404 Not Found instead. Do you see any issues with my configuration?

The main suggestion I found was people saying that the stub\_status\_module needs to be enabled, but I checked by running the nginx -V command inside a temporary container and found the \*--with-http\_stub\_status\_module\* flag, so this should not be the
Help on learning system design | scenario and options

I am working through some different system design scenarios to improve my limited skills. This question is not for a project or business goal; it is to learn what makes a better design.

Here is the scenario, with some options for how to handle the design. I am looking for input on how to design a system like this: which option is better, or whether there is some other option I haven't thought of that would beat both.

**Scenario:** I have an application made up of multiple microservices, split between EKS and Lambda. The auth service signs up a 'sub' user, which needs to be accounted for in the company service. The company service controls all data about the company, including the number of sub users and high-level details about them. Both services mentioned are running in EKS.

**Option 1:**

The auth service does an API call to the company service which updates the correct info.

*Pros:*

Simple

Fastest time to update

*Cons:*

Possible lost update if company service is down

**Option 2:**

The auth service processes the sub user signup normally, then sends an SQS message which is processed by a Lambda.

*Pros:*

Can tolerate the Lambda being down: messages stay queued and are processed later on

*Cons:*

More complex: needs another Lambda and an SQS queue

Possible delay in updating which could cause unknown errors
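
Option 2 can be sketched in a few lines to make the trade-off concrete. This is a toy model under assumptions, not a real deployment: an in-process `queue.Queue` stands in for SQS, the `drain` function stands in for the Lambda, and every class and function name is hypothetical.

```python
import queue


class CompanyService:
    """Toy stand-in for the company service."""

    def __init__(self) -> None:
        self.available = True
        self.sub_users: dict[str, set[str]] = {}  # company id -> sub user ids

    def add_sub_user(self, company_id: str, user_id: str) -> None:
        if not self.available:
            raise ConnectionError("company service is down")
        # A set makes the update idempotent, so a redelivered
        # message cannot double-count a sub user.
        self.sub_users.setdefault(company_id, set()).add(user_id)


def sign_up(events: queue.Queue, company_id: str, user_id: str) -> None:
    """Auth service: complete the signup, then enqueue an event."""
    events.put({"company_id": company_id, "user_id": user_id})


def drain(events: queue.Queue, company: CompanyService) -> int:
    """Consumer: apply each queued event once per pass, requeue failures."""
    processed = 0
    for _ in range(events.qsize()):
        event = events.get()
        try:
            company.add_sub_user(event["company_id"], event["user_id"])
            processed += 1
        except ConnectionError:
            events.put(event)  # keep the event for a later pass
    return processed


events: queue.Queue = queue.Queue()
company = CompanyService()

company.available = False
sign_up(events, "acme", "u1")
first_pass = drain(events, company)   # nothing applied, event still queued

company.available = True
second_pass = drain(events, company)  # event applied once the service is back
```

The sketch shows both sides of the list above: the update survives an outage, but the company service lags behind the auth service until the queue is drained, so anything reading the sub user count in between sees stale data.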

https://redd.it/k9d378
@r_devops
Job offers and developers

Hi!

I found out that it is really hard to find a good Discord server where I can find job offers or post an offer for developers. Because of that, I decided to create a new Discord server just for that. I would like to build a nice, friendly community where we help each other find new projects or developers to build incredible new things! I would like to invite you there; here is a link: [https://discord.gg/gmy8P52J](https://discord.gg/gmy8P52J). I am also looking for mods and people who would like to help me grow it, so please feel free to write to me and ask to join our admins!

Kind regards

https://redd.it/k9eclp
@r_devops
Sad day for CentOS users

Currently CentOS is built to be binary compatible with stable releases of RHEL. This will change in the future: CentOS will become the upstream source for RHEL. This is a sad day for any CentOS users. [https://blog.centos.org/2020/12/future-is-centos-stream/](https://blog.centos.org/2020/12/future-is-centos-stream/)

https://redd.it/k9b397
@r_devops
How to get side gigs while working as a full-time DevOps?

I want to expand a bit, at least during this pandemic time with some side gigs to work on.

Anyone that works full-time with gigs on the side please share your story.

I've had a few projects through recommendations, but they all dried up. So I'm looking for something new and don't know if I should pursue it again through referrals/networking or go with Fiverr, Upwork, and the like.

I'm not *very* interested in part-time gigs; I'd prefer project-based ones. Finish the project, get paid. What do you guys suggest?

https://redd.it/k9bmbh
@r_devops
Help: Memory Usage by Process

Guys, I’m investigating how a process uses memory on Linux and in a container. First, I ran the process in my Linux environment, grabbed its PID, and recorded its memory usage to a CSV file. In the meantime, I also opened the system monitor to see visually how the memory was used. An interesting thing I found is that the values recorded in the CSV file are larger than what the system monitor shows. For example, when the system monitor shows 1.1 GB, the CSV file shows 1.7 GB.

Can anyone help explain this?
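
One possible explanation, and it is only a guess without knowing how the CSV was produced, is that the two tools report different metrics: a script typically records the full resident set size (RSS), while some system monitors subtract shared pages from it. A minimal sketch for comparing the two views from `/proc/<pid>/status` follows. The field names `VmRSS`, `RssFile` and `RssShmem` are real Linux fields, but the sample text and its numbers are made up purely for illustration.

```python
def parse_status_kb(status_text: str) -> dict[str, int]:
    """Extract the 'Name: <n> kB' fields from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name] = int(parts[0])
    return fields


def memory_views(fields: dict[str, int]) -> dict[str, int]:
    """Return RSS, its shared part, and RSS with the shared part removed."""
    rss = fields["VmRSS"]
    shared = fields.get("RssFile", 0) + fields.get("RssShmem", 0)
    return {"rss_kb": rss, "shared_kb": shared, "rss_minus_shared_kb": rss - shared}


# Made-up sample, shaped like /proc/<pid>/status output.
SAMPLE = """Name:\tdemo
VmSize:\t 4200000 kB
VmRSS:\t 1700000 kB
RssAnon:\t 1100000 kB
RssFile:\t  550000 kB
RssShmem:\t   50000 kB
"""

views = memory_views(parse_status_kb(SAMPLE))
# For a real process on Linux, read the text with:
#   open(f"/proc/{pid}/status").read()
```

If the two numbers you see line up with `rss_kb` and `rss_minus_shared_kb` for your process, the difference is just shared memory being counted in one place and not the other.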

https://redd.it/k9a3kv
@r_devops
Tomorrow I have an interview for a DevOps intern role. How should I prepare for it?

What type of questions might they ask, given that I have very little knowledge of DevOps?

https://redd.it/k93jlq
@r_devops
Introduction to Docker

I wrote an article about what Docker is and what it aims to solve. Would be glad if you all could give it a read and provide some feedback. Thanks :D


[https://dev.to/rinkiyakedad/introduction-to-docker-1hp2](https://dev.to/rinkiyakedad/introduction-to-docker-1hp2)

https://redd.it/k91rt8
@r_devops
Cannot deploy Portworx in Kubernetes

Hi guys, has anybody had issues deploying Portworx in Kubernetes?

The pods give the following error: Failed to load PX filesystem dependencies for kernel 4.18.0-193.28.1.el8\_2.x86\_64

Please assist

https://redd.it/k8ynmq
@r_devops
Second DevOps Interview Readiness Help

Hi DevOps,

First off, I want to say thank you to everyone in this subreddit; you have helped me grow so much.

I had an interview for a DevOps Engineer role at a Microsoft Azure shop and it went quite well. The interview was scheduled for 30 minutes, but the IT director and I chatted for 50+ minutes. My current role is system administrator, but I have cloud and DevOps experience.

My second interview is with a higher-up engineer, and I want to ace it. I'm wondering if you lovely people can help me with some questions I should ask and what to expect from a senior engineer. Or am I overthinking this, and will this interview be more relaxed? My apologies for the bad spelling and grammar; I have dyslexia along with dysnomia and dysgraphia.

Summary of the job description:

Support Microsoft web app

Skills with Microsoft Azure
Powershell scripting
Microsoft SQL
SQL Azure
Cosmos
Docker Containers
Cut costs with cloud services, cost, ratios, monitor
Customer support
Redundant web applications within the budget
Infrastructure as code on Azure

https://redd.it/k9lr8r
@r_devops