Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
When should I start applying for devops jobs?

Hi everyone, I have been working as a cloud engineer for 2 years now, primarily with GCP/AWS and in administration. I have three certs: GCP Professional Architect, AWS Solutions Architect Associate, and AWS Security Specialty. I have been wanting to transition into DevOps because I think the technology is very cool, so my question is: when should I start applying? Right now I'm scared to apply, thinking I may not have all the qualifications needed.

I am familiar with CloudFormation and Python, and have general conceptual knowledge of other DevOps tools such as Kubernetes, Docker, Terraform, and Ansible, but I have not had the opportunity to work with them. I am thinking I need to get the AWS Developer Associate and AWS DevOps Professional certs first. After that, I could start applying while pursuing the Certified Kubernetes Administrator and Terraform certs.

Should I start applying for jobs right now, even if I know I don't meet all the prerequisites? Or should I wait until I get some additional certs? Thank you all!

https://redd.it/k8kmkv
@r_devops
Cannot ssh to my vm after switching to NAT and port forwarding

[https://www.simplified.guide/\_media/virtualbox/port-forwarding/virtualbox-settings-network-advanced-port-forwarding-configure.png](https://www.simplified.guide/_media/virtualbox/port-forwarding/virtualbox-settings-network-advanced-port-forwarding-configure.png?w=620&tok=e460a2)

I switched from a bridged adapter to NAT and then set up port forwarding, but I can no longer connect over SSH. I used to connect to 192.168.2.16 on port 22, but when I put those two values in the Guest IP and Guest Port fields, SSH stops working. Can someone explain what I am doing wrong?
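A likely cause, assuming VirtualBox's default NAT setup: once the adapter is NAT, the guest no longer has the bridged address 192.168.2.16 (the NAT engine typically hands it 10.0.2.15), and the SSH client connects to the *host* side of the forwarding rule, not to the guest. VirtualBox's `VBoxManage modifyvm --natpf1` option takes a rule of the form `name,protocol,hostip,hostport,guestip,guestport`, and the IP fields may be left empty. The sketch below only assembles that command; the VM name `myvm`, the rule name `guestssh`, and host port 2222 are hypothetical.

```python
import shlex


def natpf_rule(name: str, host_port: int, guest_port: int,
               protocol: str = "tcp", host_ip: str = "", guest_ip: str = "") -> str:
    """Build a --natpf1 rule: name,protocol,hostip,hostport,guestip,guestport.

    Empty host_ip listens on all host interfaces; empty guest_ip lets
    VirtualBox forward to whatever address the guest got from the NAT engine.
    """
    if protocol not in ("tcp", "udp"):
        raise ValueError("protocol must be 'tcp' or 'udp'")
    return f"{name},{protocol},{host_ip},{host_port},{guest_ip},{guest_port}"


def modifyvm_command(vm: str, rule: str) -> str:
    """Shell-quoted VBoxManage command that adds the rule to NAT adapter 1."""
    return shlex.join(["VBoxManage", "modifyvm", vm, "--natpf1", rule])


rule = natpf_rule("guestssh", host_port=2222, guest_port=22)
print(modifyvm_command("myvm", rule))
# VBoxManage modifyvm myvm --natpf1 guestssh,tcp,,2222,,22
```

With such a rule in place (`modifyvm` expects the VM to be powered off), you would connect from the host with `ssh -p 2222 user@127.0.0.1`. Leaving the Guest IP field empty in the GUI dialog should have the same effect.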

https://redd.it/k8ulww
@r_devops
Having a little trouble with TLS cert for Hashicorp Vault/Consul cluster

Hi there.

So I'm a bit new at this, but I recently figured out all the steps to stand up a Vault/Consul cluster using Terraform. The cluster works, but I can't go live with it until I figure out what is going wrong with the cert.

Here's what I understand to be the facts - let me know if one of these is incorrect:

* In order for the backend (Consul) to work correctly, the cert needs a SAN for both vault.service.consul and 127.0.0.1.
* SSL/TLS certificates don't let you add SANs for domains that are not related to the primary Common Name, so my cert for vault.example.com gives errors if I try to add these in.
* The documentation directs you to use a Self-Signed cert, and then make the public key readily available. This is not ideal, as I'm trying to roll this out as our general password manager, and having to install certs is going to reduce the adoption rate amongst less tech-savvy users in my org.

Are there any suggestions on how to resolve this so that I can use a public cert and still have backend functionality? Does everyone else using Vault seriously use a self-signed cert? That seems bonkers to me.

UPDATE: Just throwing stuff at the wall to see what sticks, I installed a public wildcard cert with no SANs. The frontend works with no errors, and Vault appears to be talking to the backend. But if I run the `vault status` command on the Vault server, I get an error that there are no SANs for 127.0.0.1. I don't really want to leave it like this, because I don't know what else might not be talking.

I've been messing with different options in my Default.hcl file, but anything I update to the FQDN here either has DNS problems, or, if I route it correctly in the hosts file, swings back to the same SAN error.
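The `vault status` error matches how hostname verification generally works: the CLI checks the host in `VAULT_ADDR` (by default `https://127.0.0.1:8200`) against the certificate's SANs, an IP literal only matches an IP-type SAN, and a wildcard like `*.example.com` covers exactly one DNS label under `example.com`. The sketch below is a simplified matcher assuming RFC 6125-style rules (wildcard only as the whole leftmost label). It is not Vault's or Go's actual verification code, and the SAN lists are hypothetical.

```python
import ipaddress


def _is_ip(name: str) -> bool:
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def san_matches(san: str, host: str) -> bool:
    """Simplified check of one SAN entry ('DNS:<name>' or 'IP:<addr>') against a host."""
    kind, _, value = san.partition(":")
    if _is_ip(host):
        # IP literals only ever match IP-type SANs, compared as addresses.
        return kind == "IP" and ipaddress.ip_address(value) == ipaddress.ip_address(host)
    if kind != "DNS":
        return False
    host_labels = host.lower().rstrip(".").split(".")
    san_labels = value.lower().rstrip(".").split(".")
    if len(host_labels) != len(san_labels):
        return False  # a wildcard never spans more than one label
    if san_labels[0] not in ("*", host_labels[0]):
        return False
    return san_labels[1:] == host_labels[1:]


def missing_names(sans: list[str], required: list[str]) -> list[str]:
    """Return the required names that no SAN entry covers."""
    return [h for h in required if not any(san_matches(s, h) for s in sans)]


wildcard_only = ["DNS:*.example.com"]
needed = ["vault.example.com", "vault.service.consul", "127.0.0.1"]
print(missing_names(wildcard_only, needed))
# ['vault.service.consul', '127.0.0.1']
```

This also shows why a wildcard certificate for `example.com` cannot satisfy `127.0.0.1` or `vault.service.consul`: neither is a single label under `example.com`.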

https://redd.it/k8iu0u
@r_devops
Choosing between AWS and Heroku for a brand new stack

What would you build your new SaaS product on?

AWS is so powerful, and gives tons of free credits to startups. But then it's also quite complex, even with Terraform it seems tricky.

Heroku is so nice and simple, it just works, with CI and all. But then I imagine it'll get expensive quickly, and they only offer a basic set of services without much flexibility.

What else would you consider?

https://redd.it/k8ipqw
@r_devops
Password Managers

Just wondering what folks here use for sharing passwords. I've used Dashlane, LastPass, and 1Password, but I'm looking for a tool that is SOC 2 compliant and supports G Suite as an identity provider.

Thus far, only Keeper seems to fit the bill but I haven't had a chance to test it out yet.

Looking for suggestions/recommendations.

https://redd.it/k7zphn
@r_devops
Stumped on automating SAML single sign-on for each new AWS account

Basically, I'm using JumpCloud as my SSO provider, and it has no API to programmatically set up single sign-on. This is a pain point because creating the SAML single sign-on is a very manual process (the AWS side can be automated with Terraform, but not the JumpCloud portion). Any suggestions, or better SSO providers anyone would recommend? I also tried AWS SSO itself and ran into the same manual process, and HashiCorp's SSO also does not appear to have resources/APIs to programmatically create SAML SSO. Any suggestions?

https://redd.it/k7yn2i
@r_devops
32 hour outage and it's my fault. What do?

Throwaway account.

I'm a DevOps/CloudEng at a growing fintech company. We're approaching launch day and as you might imagine, everyone wants to deploy their projects and get them live ASAP.

Ultimately the goal is to automate as much as possible, but as an early-stage company there are many unknowns, so there's still a lot of human intervention in the deployment process. On top of that, I've been working non-stop, 10-11 hours every day and some weekends.

So on Friday night I changed the S3/CloudFront configuration for our main website, only partially tested it, and left for the weekend. I didn't realize it was misconfigured until this morning, when the CEO Slacked me that the site was down.

I obviously panicked and fixed it immediately, only to realize that the site had been down for around 32 hours (over the weekend).

I wrote a detailed postmortem with the CloudTrail events, and I intend to discuss it with everyone later. But some comments from the CTO made me feel quite guilty and bad, as it looks like a total rookie mistake.

What do you recommend doing here?

https://redd.it/k7t0yy
@r_devops
Windows and Docker in production

Does anyone use Windows Docker containers for live applications?

I’ve been experimenting with it for a while and am unconvinced it’s suitable for production use. Docker EE seems massively under-documented and has an almost non-existent community.

It works, I can create images and run containers using Docker EE on Windows Server 2019 but I’m concerned we would get issues in live that could be very complex to resolve.

I’m wondering if we’d be better off sticking with scripted VMs and save containers until we can go fully Linux.

https://redd.it/k7r0mc
@r_devops
Terraform: EC2 Stop action

Hi,

I'm currently trying to set up a Cloudwatch metric alarm in Terraform that will stop an EC2 instance when it gets into its "Alarm" state. The idea is to stop the ec2 instance if it remains idle for too long. This seems to be possible when done via the AWS Console according to those [docs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/UsingAlarmActions.html#AddingStopActions) but I struggle to create an equivalent "Stop action" in Terraform.

I used the [aws\_cloudwatch\_metric\_alarm](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_metric_alarm) resource to create the alarm but this is all so far 🤷🏻‍♂️

Does anybody happen to know how to do that (if it's even possible)?

Note: I've found a few tutorials/gists online showing how to achieve that via a lambda but I was hoping there would be another way given it's feasible with just a couple clicks in the console.
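CloudWatch can stop an instance without a Lambda: the console's "Stop this instance" action corresponds to the alarm action ARN `arn:aws:automate:<region>:ec2:stop`, which can go in the `alarm_actions` list of `aws_cloudwatch_metric_alarm`. To keep one language for examples here, the sketch below emits Terraform's JSON configuration syntax (a `*.tf.json` file, which Terraform loads alongside `.tf` files) from Python. The resource name, region, instance ID, and idle thresholds are hypothetical placeholders.

```python
import json


def idle_stop_alarm(instance_id: str, region: str, cpu_threshold: float = 5.0,
                    periods: int = 12, period_seconds: int = 300) -> dict:
    """Terraform JSON for an alarm that stops an instance after sustained low CPU.

    Defaults (average CPU < 5% for 12 x 300 s = 1 hour) are placeholders.
    """
    alarm = {
        "alarm_name": f"idle-stop-{instance_id}",
        "namespace": "AWS/EC2",
        "metric_name": "CPUUtilization",
        "statistic": "Average",
        "comparison_operator": "LessThanThreshold",
        "threshold": cpu_threshold,
        "period": period_seconds,
        "evaluation_periods": periods,
        "dimensions": {"InstanceId": instance_id},
        # Built-in EC2 stop action; no Lambda involved.
        "alarm_actions": [f"arn:aws:automate:{region}:ec2:stop"],
    }
    # "idle_stop" is a hypothetical Terraform resource name.
    return {"resource": {"aws_cloudwatch_metric_alarm": {"idle_stop": alarm}}}


if __name__ == "__main__":
    # Save the output next to your .tf files, e.g. as idle_stop.tf.json.
    print(json.dumps(idle_stop_alarm("i-0123456789abcdef0", "eu-west-1"), indent=2))
```

In plain HCL, the key line would be `alarm_actions = ["arn:aws:automate:${var.region}:ec2:stop"]`, where `var.region` stands for whatever region variable you already use.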

https://redd.it/k93v45
@r_devops
Adaptive Request Concurrency. Resilient observability at scale.

This is a post written by the [Vector.dev](https://Vector.dev) team that discusses Adaptive Request Concurrency for observability pipelines, a way of automatically optimizing HTTP communication. It brings an old networking concept into the o11y domain. Let us know what you think!

[https://vector.dev/blog/adaptive-request-concurrency/](https://vector.dev/blog/adaptive-request-concurrency/)
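As we understand the linked post, the "old networking concept" is TCP-style additive-increase/multiplicative-decrease (AIMD), applied to the number of in-flight HTTP requests: grow the limit slowly while responses look healthy, cut it sharply on errors or rising latency. The sketch below shows only that core rule. The latency threshold, decrease factor, and limits are illustrative assumptions, not Vector's actual parameters or code.

```python
class AimdLimiter:
    """Toy AIMD concurrency limit: +1 on healthy responses, multiply down on trouble."""

    def __init__(self, initial: int = 1, max_limit: int = 200, decrease: float = 0.5):
        self.limit = initial
        self.max_limit = max_limit
        self.decrease = decrease

    def on_response(self, ok: bool, rtt: float, baseline_rtt: float) -> int:
        # The caller maps backpressure (e.g. HTTP 429/503) to ok=False.
        # Latency well above the baseline also counts as congestion
        # (the 1.5x factor is a hypothetical threshold).
        congested = (not ok) or rtt > baseline_rtt * 1.5
        if congested:
            self.limit = max(1, int(self.limit * self.decrease))  # multiplicative decrease
        else:
            self.limit = min(self.max_limit, self.limit + 1)  # additive increase
        return self.limit


lim = AimdLimiter(initial=1)
for _ in range(9):
    lim.on_response(ok=True, rtt=0.05, baseline_rtt=0.05)
print(lim.limit)  # 10
lim.on_response(ok=False, rtt=0.05, baseline_rtt=0.05)
print(lim.limit)  # 5
```

The asymmetry is the point of the design: slow growth probes for spare capacity, while the sharp cut backs off quickly when the downstream service signals it is overloaded.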

https://redd.it/k966i3
@r_devops
Going From Infrastructure to Developer Is A Reality

If you compare conferences from, say, five years ago with today's, they've changed drastically. There's no more talk about operating systems or server versions. Instead, it's all about "the cloud" and "development". Conferences like:

1. MS Build
2. AWS re:Invent
3. MS Ignite

and even smaller conferences/meetups.

What does this mean for infrastructure people?

Well, here's the thing... it's exactly what we've always seen in tech, a transition. The same transition as when bare-metal server folks had to start thinking about virtualization. It's just another shift. The biggest difference is this shift is happening pretty fast.

It also kind of feels like the standard sysadmin/infrastructure engineer is being pushed to the side, which may be the reality. However, this reality isn't a death sentence. It's an opportunity. Let me explain why.

When people think of a "developer", they automatically assume that to be a developer you have to build the next Twitter or some other crazy app. The simple fact is, that's not the case. You can be an infrastructure developer or a cloud developer who writes code for cloud servers or on-prem environments.

The problem with the explanations we see today is no one is actually explaining HOW a sysadmin or infrastructure engineer can move into a developer role and STILL BE an infrastructure guy or gal. No one is explaining what concepts an infrastructure person needs to know to be a developer. A few of these concepts are:

1. Computer science concepts like declarative/imperative or pointers.
2. Testing (unit, mock, integration, etc.) infrastructure code
3. Source control. Not only GitHub, but a little history like distributed vs centralized source control
4. What sprints are and different types of cultural working environments
5. CICD for infrastructure
6. Code editors and IDEs

There are definitely more concepts, but I think these are the core ones.

I recently created a YouTube video about this and I'm going to start a little series on this. Let me know your thoughts :)

[https://www.youtube.com/watch?v=u-0T-JN0GZc](https://www.youtube.com/watch?v=u-0T-JN0GZc)

https://redd.it/k97vuf
@r_devops
Great GitHub feature announcements

Really cool features announced at GitHub Universe.

- Dark mode
- Auto-merge pull requests
- Discussions for community chats
- Required manual approval for Actions

And many more as well. Read more [here](https://github.blog/2020-12-08-new-from-universe-2020-dark-mode-github-sponsors-for-companies-and-more).

Really great to see GitHub advancing as a DevOps platform!

https://redd.it/k9945g
@r_devops
sops - A simple and flexible tool for managing secrets

I thought the r/devops subreddit might be interested in this project I just found!

https://github.com/mozilla/sops

If you like this, [I do a weekly roundup of open source projects that includes an interview with one of the devs you can subscribe to.](https://console.substack.com/)

https://redd.it/k96s27
@r_devops
Issues with my demo configuration for a university project (Docker Compose + NGINX)

Hello everyone,

I'm currently preparing my presentation for my cyber security class, which includes a demo of an attack. My general topic is denial-of-service and distributed denial-of-service attacks. My plan is to cover multiple attack types and important historical attacks (such as the attacks on Dyn as an example of DDoS). For the demo I chose my personal favorite, the Slowloris/slow HTTP attack: firstly, because it's a relatively easy demo that doesn't involve bombarding my own servers with gigabytes of data and risking getting kicked out, and secondly, because I think there is some beauty to this attack.

Now, I obviously know that I can't jump around the internet randomly DoSing websites, and not all web servers are equally vulnerable anyway. That's why I chose to build a custom example setup. For portability, I thought Docker would be a good idea, especially since I can then share my configuration and code with my classmates so they can try it themselves. The reason I'm posting in this specific subreddit is the setup I decided to build.

I want two web servers serving two different static pages (I built two simple pages explaining common use cases for the Apache 2 and NGINX web servers, as well as some history of both programs). They both listen on different ports under web.mydomain.tld, waiting to be DoSed thanks to their vulnerable configuration. Under monitoring.mydomain.tld I decided to make a Grafana dashboard available to show the resource usage of the system under attack, as well as how the web servers behave (a graph of open connections, etc.). I think Prometheus is the best option for the data source.

Problem is, I'm familiar with the basic concepts of Prometheus and Grafana and worked with both as an end user, but I never had to configure them myself (even less with docker compose), and that is exactly where I'm struggling and where I'm hoping to get some help from you guys.

My current progress: Apache and Nginx serve their respective sites, and Prometheus, node-exporter, and cAdvisor start up with the stack (this was the sample configuration I found when Googling). My next planned step was to configure the Nginx and Apache exporters for Prometheus, starting with Nginx because I personally hate Apache.

I know that for the Nginx exporter to work, I need a status endpoint for Nginx, which is provided by the `stub_status` directive that I can enable in a specific location. I found [the documentation](https://nginx.org/en/docs/http/ngx_http_stub_status_module.html#stub_status) and implemented it accordingly.

The configuration for my Nginx service looks as follows:

```yaml
nginx-webserver:
  image: nginx
  container_name: nginx-webserver
  ports:
    - 80:80
  networks:
    - back-tier
    - front-tier
  volumes:
    - ./nginx:/usr/share/nginx/html:ro
    - ./nginx.conf:/etc/nginx/nginx.conf.d/custom.conf:ro
```

The nginx.conf that is included contains the following:

```nginx
http {
    server {
        listen 80;
        server_name web.mysite.net;
        root /usr/share/nginx/html;
        index index.html;
    }
    server {
        listen 127.0.0.1:80;
        server_name 127.0.0.1;
        location /nginx_status {
            stub_status on;
            allow 127.0.0.1;
            deny all;
        }
    }
}
```

Now to my eye, this looks like it should work without any issues, but when visiting the status endpoint, I receive a 404 not found instead. Do you see any issues with my configuration?

The main suggestion I found was that the stub_status module needs to be enabled, but I checked by running the `nginx -V` command inside a temporary container and found the `--with-http_stub_status_module` flag, so this should not be the problem.

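A few things in the config above are worth checking, assuming the official `nginx` image. Its `/etc/nginx/nginx.conf` includes `/etc/nginx/conf.d/*.conf`, so a file mounted at `/etc/nginx/nginx.conf.d/custom.conf` is never loaded, and the default server answers `/nginx_status` with a 404. Files in `conf.d` are already included inside the `http` context, so the outer `http { }` wrapper would have to go. And `listen 127.0.0.1:80` plus `allow 127.0.0.1` only admit requests made from inside the nginx container, not from a separate exporter container. Once the endpoint responds, its plain-text output is what an exporter scrapes. The sketch below parses that format into numbers you could graph during the demo; it is not the exporter's code, and the sample values are illustrative.

```python
import re

# Matches the documented stub_status layout:
#   Active connections: N
#   server accepts handled requests
#    A H R
#   Reading: x Writing: y Waiting: z
_STUB_STATUS = re.compile(
    r"Active connections:\s*(?P<active>\d+)\s+"
    r"server accepts handled requests\s+"
    r"(?P<accepts>\d+)\s+(?P<handled>\d+)\s+(?P<requests>\d+)\s+"
    r"Reading:\s*(?P<reading>\d+)\s+Writing:\s*(?P<writing>\d+)\s+Waiting:\s*(?P<waiting>\d+)"
)


def parse_stub_status(text: str) -> dict[str, int]:
    """Turn nginx stub_status output into a dict of integer counters/gauges."""
    m = _STUB_STATUS.search(text)
    if m is None:
        raise ValueError("not stub_status output")
    return {key: int(value) for key, value in m.groupdict().items()}


sample = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""
print(parse_stub_status(sample)["active"])  # 291
```

Polling this every second or so while the attack runs should make the climb in active connections easy to show on a dashboard.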
Help on learning system design | scenario and options

I am working through some different scenarios for system design trying to improve my limited skills. This question is not for a project or business goal, it is to learn about what is a better design.

Below are the scenario and some options for handling the design. I'm looking for input on designing a system like this: which option is better, or whether there is another option I haven't thought of that would beat the ones provided.

**Scenario:** I have an application made up of multiple microservices, split between EKS and Lambda. The auth service signs up a 'sub' user, which needs to be accounted for in the company service. The company service controls all data about the company, including the number of sub users and high-level details about them. Both services mentioned run in EKS.

**Option 1:**

The auth service does an API call to the company service which updates the correct info.

*Pros:*

Simple

Fastest time to update

*Cons:*

Possible lost update if company service is down

**Option 2:**

The auth service processes the sub-user signup normally, then sends an SQS message that is processed by a Lambda

*Pros:*

Can handle the Lambda being down by queuing the message and processing it later on

*Cons:*

More complex, need another Lambda and SQS queue

Possible delay in updating which could cause unknown errors
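One detail that matters for Option 2: SQS standard queues deliver messages at least once, and a message that is not deleted after processing becomes visible again and gets retried. So the update to the company service should be idempotent. The sketch below simulates this with an in-memory `collections.deque` standing in for SQS and a small class standing in for the company service; all names and message fields are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CompanyService:
    """Stand-in for the company service; 'up' simulates outages."""
    up: bool = True
    sub_users: dict[str, set[str]] = field(default_factory=dict)

    def record_sub_user(self, company_id: str, user_id: str) -> None:
        if not self.up:
            raise ConnectionError("company service unavailable")
        # Adding to a set is idempotent: a redelivered message can't double-count.
        self.sub_users.setdefault(company_id, set()).add(user_id)


def drain(queue: deque, service: CompanyService) -> None:
    """Make one pass over the queued messages. Failed ones go back on the queue,
    roughly like an SQS message that reappears after its visibility timeout."""
    for _ in range(len(queue)):
        msg = queue.popleft()
        try:
            service.record_sub_user(msg["company_id"], msg["user_id"])
        except ConnectionError:
            queue.append(msg)  # retried on a later pass


queue = deque()
svc = CompanyService(up=False)
# Auth service: finish the signup, then enqueue instead of calling directly.
queue.append({"company_id": "c1", "user_id": "u1"})
queue.append({"company_id": "c1", "user_id": "u1"})  # duplicate delivery
drain(queue, svc)  # service down: nothing lost, both messages still queued
svc.up = True
drain(queue, svc)
print(len(svc.sub_users["c1"]), len(queue))  # 1 0
```

This is the trade-off in the cons above: the update is delayed while the service is down, but it is not lost, and duplicates are harmless as long as the write is idempotent.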

https://redd.it/k9d378
@r_devops
Job offers and developers

Hi!

I found that it is really hard to find a good Discord server where I can find job offers or post offers for developers, so I decided to create a new Discord server just for that. I would like to build a nice, friendly community where we help each other find new projects or developers to build incredible things! I'd like to invite you there; here is the link: [https://discord.gg/gmy8P52J](https://discord.gg/gmy8P52J). I am also looking for mods and people who would like to help me grow it, so please feel free to message me and ask to join the admin team!

Kind regards

https://redd.it/k9eclp
@r_devops
Sad day for CentOS users

Currently, CentOS is built to be binary compatible with stable releases of RHEL. This is changing going forward: CentOS will become the upstream source for RHEL. This is a sad day for any CentOS users [https://blog.centos.org/2020/12/future-is-centos-stream/](https://blog.centos.org/2020/12/future-is-centos-stream/)

https://redd.it/k9b397
@r_devops
How to get side gigs while working as a full-time DevOps?

I want to expand a bit, at least during this pandemic time with some side gigs to work on.

Anyone that works full-time with gigs on the side please share your story.

I've had a few projects through recommendations but they all dried up. So I'm looking for something new and don't know if I should pursue it again through referral/networking or go with Fiverr, Upwork, and the likes.

I'm not *very* interested in part-time gigs, but more in project-based ones: finish the project, get paid. What do you guys suggest?

https://redd.it/k9bmbh
@r_devops
Help: Memory Usage by Process

Guys, I’m investigating how a process uses memory on Linux and in a container. First, I ran the process directly on my Linux machine, grabbed its PID, and recorded its memory usage to a CSV file. In the meantime, I also opened System Monitor to see visually how much memory it used. Interestingly, the values recorded in the CSV file are larger than what System Monitor shows. For example, when System Monitor shows 1.1GB, the CSV file shows 1.7GB.

Can anyone help explain this?
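One common explanation, assuming the CSV was filled from RSS (for example `psutil`'s `memory_info().rss`, or `VmRSS` in `/proc/<pid>/status`): RSS counts every shared page (shared libraries, shared memory) in full, while desktop monitors often report something closer to the process's private or proportionally shared memory. Which figure your System Monitor version shows is worth confirming in its documentation. Linux 4.14+ exposes all of these in `/proc/<pid>/smaps_rollup`; the sketch below parses that file so you can log RSS, PSS, and private memory side by side. The function names are hypothetical.

```python
import os


def parse_smaps_rollup(text: str) -> dict[str, int]:
    """Parse 'Key:   123 kB' lines from /proc/<pid>/smaps_rollup into KiB values."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip the header (address-range) line and anything not in 'N kB' form.
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def memory_summary(pid: int) -> dict[str, int]:
    """RSS vs PSS vs private memory for one process, in KiB (Linux only)."""
    with open(f"/proc/{pid}/smaps_rollup") as f:
        m = parse_smaps_rollup(f.read())
    return {
        "rss_kib": m["Rss"],  # every resident page, shared ones counted in full
        "pss_kib": m["Pss"],  # shared pages divided among the processes mapping them
        "private_kib": m["Private_Clean"] + m["Private_Dirty"],  # pages only this process maps
    }


if __name__ == "__main__":
    path = f"/proc/{os.getpid()}/smaps_rollup"
    if os.path.exists(path):  # requires Linux 4.14+
        print(memory_summary(os.getpid()))
```

Logging all three columns in the CSV should show which one tracks the System Monitor number, and how much of the gap is shared memory.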

https://redd.it/k9a3kv
@r_devops