Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
How to promote career development and social engagement? DevOps Leadership

So I finally managed to get buy-in from upper management to encourage career development and social engagement within our team, following the recent wave of resignations attributed to burnout. We were formerly a startup, and the team grew quickly from 3 people to 10 (now 6). Before COVID-19 things were manageable because we had an office. We used to hold demos and Lunch and Learn sessions, but we can no longer do that now that we're remote.

Feedback I’ve gotten:

- No time to learn during business hours, and life gets in the way (people with families)
- No opportunities to apply new skills
- Loss of interest after a few sessions

I’m wondering if anyone else’s workplace has some sort of system to encourage people to learn new skills or to increase interest in the technology stack?

What has worked for you?

TIA

https://redd.it/kz9gei
@r_devops
Agile and CI/CD mean that a project is never finished

It also means that developers are paid to produce both unfinished work and also to "fix" what has already been released until the end of time.

https://redd.it/kzbc6w
@r_devops
Azure DevOps SFTP integration

I want to run a pipeline where everything is built on my target machine. After the build, I want to publish the output as an artifact that a release pipeline can consume. From there, I want to take that artifact (a simple CSV file of my own making) and transfer it to another computer via SFTP.

I have WinSCP on my machine now, but I want to make this more flexible so that it doesn't need to be installed on the machine. That way I can change machines on the fly without having to go in and install it if it's not already there.

Is there any way I can approach this?
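
One possible approach, sketched under assumptions: if the agent has the OpenSSH `sftp` client on its PATH and Python available, a small script in the release pipeline could drive `sftp` in batch mode instead of relying on WinSCP. The host, user, key path, and directory names below are hypothetical, and key-based authentication is assumed so that no password prompt appears.

```python
import subprocess


def build_sftp_upload(local_file, remote_dir, user, host, key_file, port=22):
    """Return (argv, batch_text) for a non-interactive upload with OpenSSH sftp."""
    # Commands sftp runs in order; in batch mode it aborts on the first failure.
    # Note: paths containing spaces would need quoting, which this sketch skips.
    batch_text = f"cd {remote_dir}\nput {local_file}\nbye\n"
    argv = [
        "sftp",
        "-b", "-",            # read the batch commands from standard input
        "-i", str(key_file),  # private key file (assumed to be provisioned on the agent)
        "-P", str(port),
        f"{user}@{host}",
    ]
    return argv, batch_text


def upload(local_file, remote_dir, user, host, key_file, port=22):
    """Run the upload; raises CalledProcessError if sftp exits non-zero."""
    argv, batch_text = build_sftp_upload(local_file, remote_dir, user, host, key_file, port)
    subprocess.run(argv, input=batch_text, text=True, check=True)
```

A pipeline step would then call something like `upload("report.csv", "/incoming", "deploy", "files.example.com", "id_ed25519")` after downloading the artifact. Keeping the command construction separate from the call that runs it makes the script easy to check without a live server.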

https://redd.it/kzf0dr
@r_devops
AWS CloudFormation - Python canary?

Hi Guys,

I'm trying to create a Python Synthetics canary via CloudFormation using the AWS::Synthetics::Canary resource with an inline script, and I'm getting stuck on the "Script.Handler" value.

My resource definition is as follows - I'm aware the script does not do anything, I'm just trying to get it to work at the moment:

FrontendCanary:
  Type: AWS::Synthetics::Canary
  Properties:
    ArtifactS3Location: !Sub 's3://${CanaryArtifactBucket}/frontend-loadbalancer'
    Code:
      Handler: 'script.handler'
      Script: | # TODO
        def handler(event, context):
            pass
    ExecutionRoleArn: !GetAtt CanaryRole.Arn
    FailureRetentionPeriod: 10
    SuccessRetentionPeriod: 10
    Name: !Sub 'pc-${EnvName}-fe'
    RuntimeVersion: syn-python-selenium-1.0
    Schedule:
      Expression: 'rate(1 minute)'
    StartCanaryAfterCreation: true

From the CloudFormation docs and the error messages I am getting, it appears the handler needs to end in '.handler'. I've tried various values for the first part, including the function name and 'index', all of which produce an error.

In a test canary that I created via the console, the handler value is set to 'pageLoadBlueprint.handler'. "pageLoadBlueprint" is the name of the file in the Lambda layer package, and "handler" is the name of the handler function.

When I download the Lambda function code package for my CloudWatch-generated canary, it is completely empty. Despite that, I can see the function code in the AWS console.

Annoyingly, I can't find any examples of the inline Python script CloudFormation pattern on the internet.

Does anybody have any ideas on this, or any examples?

https://redd.it/kzbj8e
@r_devops
How could I have handled this better?

Hi!

I'm a solo developer working on a management system for a client. After a month of trial testing, we're ready to release the system for approval tomorrow. For context, here's the stack that I use:

1. A monolithic Django app running with Gunicorn.
2. It uses PostgreSQL for persistent storage.
3. Nginx sits in front of it to handle HTTP requests.
4. All of the above run as separate Docker containers, which I set up with docker-compose.
5. The infrastructure is set up in AWS using Terraform (manually, through terraform apply)
6. Note that the application code and the database are hosted on the same EC2 instance
7. Configuration is handled by Ansible (manually as well, using playbooks)
8. Code is stored in GitLab

Essentially whenever I feel like my current code is good enough, I run a deploy playbook to update the prod's codebase and rebuild the containers. Every now and then I update my Terraform files when I need a new AWS service provisioned.

---

The problem:

I was working on a feature wherein a PDF copy of a Document entity is generated and saved to S3. This Document entity might contain a ton of images, so I decided to move the processing into the background using RabbitMQ. The process becomes this:

1. User submits a form containing Document data
2. The Document instance is saved to Postgres
3. The instance is then passed to RabbitMQ, where the PDF file is generated in the background so that the current request is not blocked

I ran the tests, deployed my code, and even used a dummy account in production to check that the feature worked as expected. I went to sleep and was woken up by the news that our prod server was down. Looking at the AWS metrics, it seems that CPU utilization hit its maximum just before the server crashed. So I rebooted the EC2 instance and ran my rebuild playbook, and everything worked fine again. The database seems up to date, and the PDFs are stored properly in S3 as well.

The first thought that came to mind is that Django might have passed too many tasks to RabbitMQ, overloading the CPU. But after looking at the RabbitMQ, Django, and Celery logs, I can't pinpoint anything specific that might confirm my theory. All of Celery's tasks completed without any error, FWIW.
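
One way to reason about the overload theory is to cap how many PDF jobs may run at the same time, so a burst of submissions queues up instead of saturating the CPU. The sketch below shows the idea with the standard library only; `render_pdf` is a hypothetical stand-in for the real job, and the limit of 2 is an assumption, not a recommendation. Celery exposes a comparable limit through the worker's `--concurrency` option.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class PeakCounter:
    """Context manager that records how many jobs were running at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self._current += 1
            self.peak = max(self.peak, self._current)
        return self

    def __exit__(self, *exc_info):
        with self._lock:
            self._current -= 1
        return False


def render_pdf(document_id):
    """Hypothetical placeholder for the CPU-heavy PDF generation."""
    time.sleep(0.01)
    return f"document-{document_id}.pdf"


def render_all(document_ids, max_workers=2):
    """Render every document, never running more than max_workers jobs at once."""
    counter = PeakCounter()

    def job(document_id):
        with counter:
            return render_pdf(document_id)

    # Jobs beyond max_workers wait in the executor's internal queue.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(job, document_ids))
    return results, counter.peak
```

Calling `render_all(range(10), max_workers=2)` returns the ten file names along with the highest number of jobs that overlapped, which should never exceed the configured limit.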

As a resolution, I temporarily removed the RabbitMQ components from my stack.

This is my first time handling such a system from development to production, and I want to improve my ops and cloud skills in order to identify and prevent such events. Can you guys provide some practical tips for me?

Thanks a lot and stay safe!

https://redd.it/kz9gcq
@r_devops
Help finding scripting opportunities

Hey all,

I recently (6 months ago) started a new position at a small shop (maybe taking care of ~500 machines), and as the most junior staff member I am not given any interesting tasks that could help me grow as an engineer. Mostly I do password resets or SSL installs, or fix customers' WordPress bugs.

Any interesting tasks that the company thinks of are given to either the CTO or the guy above me. I had a chat with my boss about this, and he said I would have to think of any other tasks I wanted to do myself rather than being given more interesting ones. Upper management would then decide whether each one was a good idea or not.

So, I'm struggling for ideas and was wondering if anyone can help? We mostly run websites with the LAMP stack, using Nginx, Varnish and Apache2. We also have some email servers and customers running their own email servers on VMs.

My current ideas are to:

1. Add automatic email checking for customers posting tickets to ensure they're authorised

2. Have a script to do an automatic new WordPress install.

3. A script looking at Varnish logs to see if we're catching stuff that barely gets hits from the cache and add exceptions for those to save cache space

4. A script to auto install email on a customer's VM (could this be done with Ansible?)

5. A script to check the email server configuration, just like apache2ctl configtest or nginx -t

6. Maybe some machine learning tool to auto optimise the Varnish cache, using Selenium to test whether the site is the same as when uncached.
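
As a rough starting point for idea 3, the following sketch counts cache hits and misses per URL. It assumes the log has already been reduced to lines of the form `URL hit` or `URL miss` (for example with a custom varnishncsa format along the lines of `%U %{Varnish:hitmiss}x`); the thresholds are made-up defaults.

```python
from collections import Counter


def hit_stats(lines):
    """Return {url: (hits, total_requests)} from 'URL hit|miss' lines."""
    hits, total = Counter(), Counter()
    for line in lines:
        parts = line.split()
        # Skip anything that does not match the assumed two-column format.
        if len(parts) != 2 or parts[1] not in ("hit", "miss"):
            continue
        url, outcome = parts
        total[url] += 1
        if outcome == "hit":
            hits[url] += 1
    return {url: (hits[url], total[url]) for url in total}


def rarely_hit(lines, min_requests=3, max_ratio=0.1):
    """URLs seen at least min_requests times whose hit ratio is at most max_ratio."""
    stats = hit_stats(lines)
    return sorted(
        url
        for url, (hit_count, total_count) in stats.items()
        if total_count >= min_requests and hit_count / total_count <= max_ratio
    )
```

The URLs returned by `rarely_hit` would be candidates to review by hand before adding any cache exceptions, since a low hit ratio over a short log window can be misleading.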

Just wondering if anyone has some neat script ideas, has thoughts on whether my ideas could be helpful, can build on my ideas, or even just has advice for someone in my position.

Cheers!

https://redd.it/kz471g
@r_devops
How to avoid a remote access server crashing?

So, I'm sure a lot of people are SSH'ing into servers these days to run their code remotely. I have a local server in my office which I SSH into. It runs Ubuntu, and I use it primarily for deep learning and ML training and inference.

Usually, there's some support staff who can act quickly and restart the server if it crashes. But recently, due to COVID, access has been pretty limited. While I was running a piece of code, it crashed the system (maybe!?), and I'm not sure what crashed. It could have been a memory error or some Python library issue. Either way, SSH now seems to be inactive and failing.

Is there some way to run code so that it never crashes my system? I'm thinking of adding a cron monitoring script to check every few minutes that the system is fine, maybe by testing SSH, and to restart quickly via sudo if there's an issue. Any other tips/tricks? This isn't a production server or anything, just my own system.
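
A minimal sketch of the check such a monitoring script might perform: try to open a TCP connection to the SSH port and report whether it succeeds. The host and port are whatever the server uses; what to do on failure (alerting, restarting a service) is left out because it depends on the setup. A check run from cron on the same machine will not fire if the whole machine hangs, so running it from a second machine is one option to consider.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical target; replace with the server's address.
    print("ssh reachable:", is_port_open("127.0.0.1", 22))
```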

Thanks!

https://redd.it/kz2awk
@r_devops
Slow local queries

Using AWS DocumentDB, I've deployed a test database cluster in Germany. When I run a test query on it from an EC2 instance in Germany, it takes less than 2 seconds. When I query it from my country (in the Middle East), it takes more than a minute.

This is the simple test I run (it just goes through the whole collection):

var cursor = db.dev.find();
var lastDoc = null;
// Walk the whole collection so every document is fetched
while (cursor.hasNext()) { lastDoc = cursor.next(); }

Thing is, we also have an RDS (MySQL) DB in AWS and a query of the same size takes less than 2 seconds from my country to RDS (Also in Germany).

I tried viewing the logs so I followed this document but I can't seem to find any logs from docdb in CloudWatch. These are my cluster parameters. I also tried opening a ticket with AWS but apparently our basic subscription doesn't allow creating tickets.

Does anyone happen to have a suggestion on how to tackle this? What should I do/look for?
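
One thing that might be worth ruling out, purely as an assumption, is per-batch network round trips: if the driver fetches the cursor in batches and each batch costs one round trip, a full scan takes roughly the number of batches times the round-trip time. The numbers below are invented for illustration and are not measurements from this cluster.

```python
import math


def estimated_scan_seconds(n_docs, batch_size, rtt_seconds):
    """Rough lower bound: one network round trip per batch of documents."""
    batches = math.ceil(n_docs / batch_size)
    return batches * rtt_seconds


# Hypothetical figures: same collection, nearby client vs. distant client.
nearby = estimated_scan_seconds(n_docs=10_000, batch_size=100, rtt_seconds=0.002)
distant = estimated_scan_seconds(n_docs=10_000, batch_size=100, rtt_seconds=0.1)
print(f"nearby: {nearby:.1f}s, distant: {distant:.1f}s")
```

If the real collection size, batch size, and measured latency give an estimate close to the observed minute, the delay is probably dominated by round trips rather than by the database itself.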

Thanks ahead!

https://redd.it/kzqs4d
@r_devops
Question: SaaS delivery to private customers

Has anyone delivered their public SaaS application to a customer who is also walled off privately (e.g. AWS Outposts, a private DC)?

Assumption: the platform used by this private customer conforms 100% to the architecture and requirements of your SaaS application (K8s, AWS, whatever).

If so, how do you manage your ops, especially when the customer controls inbound updates?

When your CI/CD pipeline delivers a release to the public, but the customer only accepts releases selectively (say every 6 months), does this create problems for engineering and/or DevOps?

Observability is blocked, and you only get controlled access after an error happens and the customer reports it, so you are always reacting.

https://redd.it/kzsleg
@r_devops
How to move WITs from one board to another?

I have a backlog with a bunch of features. I've described the states that a feature goes through from idea to done and want to implement this in my backlog setup. The states cover two overall processes:

1) Specifying the feature (4 states)

2) Developing and releasing the feature (5 states)

Question: Is it possible to have two boards attached to your backlog, so that after specifying the feature on Board #1, I move it to Board #2, where I handle development and release?

The easy solution would be to put the 9 states into the same board, but this would be a mess. I want to split them in two for an easier overview, since these are two separate workflows...


Any advice?

https://redd.it/kztzqp
@r_devops
Sysadmin looking to enter the dev side of things... where to begin?

Hey all, I've been working on the sysadmin/infrastructure side of things for the last several years, ranging from Windows shops w/PowerShell to AWS shops running everything in Terraform/Bash/Ansible/etc.

The writing is on the wall with things like CDK becoming more popular in the market. A lot of local companies I'm interested in are NodeJS/React shops. At first I considered a coding bootcamp, but I'm motivated enough to self-teach for the time being.

So that said... where do I begin? There's a plethora of courses out there, but I'm looking for ones that might appeal more to recovering sysadmins. Any suggestions?

Thanks!

https://redd.it/l07sz2
@r_devops
CICD pipeline

Hey guys, sorry for the novice questions but as I am studying CICD flows a lot of questions come to my mind and I am looking for a couple of answers:

In the scenario where I have a pipeline that builds a Docker image on every commit, what’s the best way to manage and handle all the Docker images created? Let’s say 10 devs commit/push code upstream and that builds a Docker image 10 times. How do we control that in a Docker repository? Do we keep the same tag or use different tags, so that the next phase of the pipeline can take the image and deploy it for testing?

Also, what’s the best way to create CD to deploy to a Kubernetes cluster with Jenkins? During the CI/CD pipeline, would my newly built Docker image have a release tag, which Jenkins could then use to trigger a set image on my deployment?
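
One common convention, shown here only as a sketch, is to give every build a unique tag derived from the commit SHA and then point the deployment at that exact tag. The registry, repository, deployment, and container names below are hypothetical; the second helper just assembles a `kubectl set image` command that a Jenkins stage could run.

```python
def image_ref(registry, repository, commit_sha):
    """Build a unique image reference from the (shortened) commit SHA."""
    return f"{registry}/{repository}:{commit_sha[:12]}"


def set_image_argv(deployment, container, image, namespace="default"):
    """Assemble the kubectl command that rolls a deployment to a new image."""
    return [
        "kubectl", "set", "image",
        f"deployment/{deployment}",
        f"{container}={image}",
        "--namespace", namespace,
    ]


if __name__ == "__main__":
    image = image_ref("registry.example.com", "shop/api", "0123456789abcdef0123456789abcdef01234567")
    print(image)
    print(" ".join(set_image_argv("api", "api", image, namespace="staging")))
```

With unique tags, ten commits produce ten distinct images, and each pipeline run can test and deploy exactly the image it built rather than whatever a shared tag currently points to.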

Thank you in advance!

https://redd.it/l07fk8
@r_devops
Anyone available for a quick chat regarding user provisioning?

"DevOps" / Software Developer here. I've done data integrations for K-12. Integrating Active Directory, Google Workspace, and a bunch of other apps. Important question to ask you!

Today companies generally have AD, Azure AD (Microsoft 365), and/or Google Workspace. Companies generally also have some kind of HR system.

To get new employees' accounts into these systems, IT gets an email from HR or a manager saying that someone is starting, and a manual back and forth begins. The rest of an employee's life cycle at the company requires manual steps as well.

I think it's crazy we don't have an easy way to automate the integration between HR and IT systems. Every company generally has the same issue. How do you handle this today?

There are tools like Okta, OneLogin, JumpCloud, etc. My issue with these apps (I've used Okta and OneLogin) is that, as a new business today, do you really want to pay for Okta on top of Microsoft 365 / Google?

There's also ManageEngine, Tools4ever, and a flurry of other products. My issue with all of these apps is that they hide the mappings (i.e. code) behind an app, which leads to all of your standard low/no-code issues.
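
To make the "mappings as code" idea concrete, here is a small sketch of what one such mapping might look like when it lives in plain, reviewable code. The HR field names, the username rule, and the domain are all hypothetical.

```python
def to_account(hr_record, domain="example.com"):
    """Map one HR record (a dict) to the attributes of a directory account."""
    first = hr_record["first_name"].strip()
    last = hr_record["last_name"].strip()
    # Hypothetical naming rule: first initial + last name, lowercased.
    username = f"{first[0]}{last}".lower()
    return {
        "username": username,
        "email": f"{username}@{domain}",
        "display_name": f"{first} {last}",
        "department": hr_record.get("department", ""),
        # Accounts are enabled only while HR marks the employee as active.
        "enabled": hr_record.get("status") == "active",
    }
```

Because the rule is an ordinary function, it can be version controlled, code reviewed, and unit tested like any other change.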

Would anyone be willing to hop on a quick call, video call, or even just a comment below?

https://redd.it/l05iya
@r_devops
Fullstack Webdev or Devops? (3rd-world citizen here)

Hello everyone, I'm a uni student from Egypt, a country where society is toxic, backwards, and corrupt. Living conditions are terrible. My mental state has been at an all-time low ever since I returned here.

I really need a lifeline. My goal is to live in any developed country where I can get by in English.

I'm currently a senior enrolled in an IT-related uni degree (it's essentially an MIS degree). My GPA is 3.4.

I am proficient in C++, frontend webdev essentials (JS, HTML/CSS), and some Python. Most of this I learnt on my own.

I'm currently at a roadblock and don't know what to learn next.

- I'm having a hard time evaluating what career/skillset would most likely help me land a job abroad as a 3rd-world citizen.

- I keep going back and forth on whether I should focus on fullstack webdev or just devops. Heck, sometimes I wonder if I should focus on business analysis/intelligence.

I've got around a year to learn something before I do my mandatory military service, so I need some direction. I'd like to know what you guys think gives me the best chance of getting out of here (i.e. what career would be best to pursue given my situation).

P.S. Yes, I realize that I’ll definitely need years of experience in some sort of job. But my question is essentially which set of skills can reduce the amount of time needed for me to get the fuck out of here, or at least land a remote position for the time being.

https://redd.it/l0adst
@r_devops
Log aggregation and parsing

Hello redditors working on log aggregation and log parsing, what tools do you use for log aggregation and parsing? Are you using any vendor tools or open source solutions like NiFi, Storm, etc.? And why?

I am working on a project and would like to learn from the experience of people who have implemented these solutions. I am trying to understand the pros and cons of the different options.

https://redd.it/l012fy
@r_devops
Best way to manage multiple game server containers?

Hi all. Long time lurker, first time poster. I am looking to create several game servers (personal servers hosted for the public) for older games such as Unreal Tournament and Call of Duty 2, and I would like to avoid locking myself into specific services. That is, I would like to remain as vendor neutral as possible (aside from Docker, I suppose) so that, if for some reason I feel the need to change vendors, I can do so with relative ease.

My thought was to build Docker containers from a base image for each game (probably on Alpine, if possible) and then modify each container as needed by attaching the necessary config file, third-party maps, etc. I will have over 10 servers, each in its own Docker container and probably each on its own individual instance (a t3.nano on AWS or equivalent).

I realize that managing all of these containers/servers might get unruly, so I am trying to understand the best way to manage them. I have very little exposure to Kubernetes. Would it be one method of managing these, or is that a separate use case? Any thoughts much appreciated!
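
As a sketch of the "one base image per game, config attached per server" idea, the snippet below describes each server as data and derives the `docker run` command from it. The image names, ports, and paths are hypothetical, and UDP is assumed for the game traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameServer:
    name: str        # container name
    image: str       # base image for the game (hypothetical)
    port: int        # UDP port exposed to players
    config_dir: str  # host directory with config files and maps


def docker_run_argv(server):
    """Assemble the docker run command for one game server."""
    return [
        "docker", "run", "--detach",
        "--name", server.name,
        "--restart", "unless-stopped",
        "--publish", f"{server.port}:{server.port}/udp",
        # Mount the per-server config read-only into the container.
        "--volume", f"{server.config_dir}:/config:ro",
        server.image,
    ]


SERVERS = [
    GameServer("ut99-dm", "games/ut99:latest", 7777, "/srv/ut99-dm"),
    GameServer("cod2-sd", "games/cod2:latest", 28960, "/srv/cod2-sd"),
]

if __name__ == "__main__":
    for server in SERVERS:
        print(" ".join(docker_run_argv(server)))
```

Keeping the server list as plain data means the same description could later feed a different tool, which fits the goal of staying vendor neutral.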

https://redd.it/l04xiw
@r_devops
DNS Load balancers?

Hello,

I'm currently working on a DDoS protection service for locations that other providers don't cover. The host I'm using offers cheap servers but low bandwidth (750GB). That's plenty for a few customers home-hosting an SMP for their friends, but not nearly enough for a few hundred players on the network 100% of the time.

So far, I have found no TCP load balancers that can provide load balancing without also proxying the connection. I started looking into the DNS side of things, and so far what I have found that could possibly work is a DNS round-robin solution. This would direct a player to any one of the proxies in the list.
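
To illustrate how DNS round robin spreads players, here is a toy simulation: the zone rotates the order of its address records on every query, and each client is assumed to connect to the first address it receives. The addresses are documentation-range placeholders. Real resolvers cache answers and plain round robin has no health checks, so actual distribution would be less even than this.

```python
from collections import Counter, deque


class RoundRobinZone:
    """Toy model of a DNS name with several A records served in rotating order."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def answer(self):
        """Return the current record order, then rotate for the next query."""
        response = list(self._records)
        self._records.rotate(-1)
        return response


if __name__ == "__main__":
    zone = RoundRobinZone(["203.0.113.1", "203.0.113.2", "203.0.113.3"])
    # Each simulated player connects to the first address in the answer.
    picks = Counter(zone.answer()[0] for _ in range(9))
    print(picks)
```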

If anyone has a better solution to this problem, please contact me.

Mitch, AusGuard

https://redd.it/l0e50o
@r_devops
Laptop?

I’ve always used a MBP for all of my dev and admin stuff at my previous job without an issue.
But I’m now taking a new senior role, and they asked what I wanted.

Have any MBP users had any DevOps Mac regrets?

I just need my stuff to work.

https://redd.it/l023r8
@r_devops
docker-compose failing

Hey,

I'm just a beginner at this.

I wrote this docker-compose file. The same folder also contains a folder for the app and a Dockerfile.

Why do I keep receiving these errors after running docker-compose up?

Is there anything wrong with the .yml?

version: ‘3’
services:
  consul:
    image: consul
    restart: unless-stopped
    command: agent -server -ui -node=server-1 -bootstrap-expect=1 -clinet=0.0.0.0
    ports:
      - 0.0.0.0:8500:8500
      - 0.0.0.0:8600:8600/udp
    networks:
      - consul
  proudctionapp:
    build: .
    ports:
      - 0.0.0.0:8000:8000
    networks:
      - consul
networks:
  consul:

**The errors I receive:**

Traceback (most recent call last):
  File "bin/docker-compose", line 6, in <module>
  File "compose/cli/main.py", line 71, in main
  File "compose/cli/main.py", line 124, in perform_command
  File "compose/cli/command.py", line 41, in project_from_options
  File "compose/cli/command.py", line 113, in get_project
  File "compose/config/config.py", line 385, in load
  File "compose/config/config.py", line 385, in <listcomp>
  File "compose/config/config.py", line 518, in process_config_file
  File "compose/config/config.py", line 226, in get_service_dicts
  File "distutils/version.py", line 46, in __eq__
  File "distutils/version.py", line 337, in _cmp
TypeError: '<' not supported between instances of 'str' and 'int'
[3972] Failed to execute script docker-compose
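
The traceback ends in a version comparison inside distutils/version.py, which hints that the version value was not read as a plain 3. One thing worth ruling out, as an assumption rather than a confirmed diagnosis, is typographic quotes: the file as posted shows the version wrapped in curly quotes (‘3’) rather than straight ones, which may or may not be an artifact of copying it here. A small check like the following lists where such characters occur in a file's text.

```python
# Left/right single and double typographic quotes.
CURLY_QUOTES = "\u2018\u2019\u201c\u201d"


def find_curly_quotes(text):
    """Return (line, column, character) for every typographic quote in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in CURLY_QUOTES:
                findings.append((lineno, col, ch))
    return findings


if __name__ == "__main__":
    # Assumes the compose file sits in the current directory.
    with open("docker-compose.yml", encoding="utf-8") as handle:
        for lineno, col, ch in find_curly_quotes(handle.read()):
            print(f"line {lineno}, column {col}: {ch!r}")
```

If the check reports anything on the version line, replacing the curly quotes with straight ones (or removing them) would be the first thing to try.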

Thanks...

https://redd.it/l04g76
@r_devops