Reddit DevOps
270 subscribers
8 photos
31.1K links
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
12-factor app and one-off scripts?

According to the 12-factor app methodology, one-off scripts should be run in an environment identical to production.

So if you have a script like python backfill_missing_records.py, you should run it in an environment that matches production, i.e., the same Docker image, same environment variables, etc.

But this is pretty hard to do?

Our production system is Docker containers running on ECS Fargate. We can have Jenkins issue a task that pulls the latest Docker image, runs the desired script, and then exits. However:

1 - This is slow.

2 - Feedback cycles are slow (you don't find out for several minutes whether the script had a problem).

3 - Compared to SSHing into a production machine to run the script, you can't immediately take corrective action if something does go wrong.

Until now we've been maintaining machines in production environments that expose an SSH port, and we've been manually going into those machines and running one-off scripts as needed. However, these machines aren't guaranteed to stay in parity with production and generally just have a copy of the repository and some of the correct environment variables.

These machines differ from prod (EC2 vs. ECS, no Docker image, parameter drift, etc.). But on the other hand, running these kinds of scripts in a prod-like environment seems pretty inconvenient as well.
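For reference, the "Jenkins issues a task" flow described above can be sketched with boto3: run the one-off script as a Fargate task using the exact task definition production runs, overriding only the command. All names below are hypothetical.

```python
# Hypothetical sketch: run a one-off script in a prod-identical
# environment by reusing the production ECS task definition and
# overriding only the container command.
def one_off_task_kwargs(cluster, task_def, container, command, subnets):
    return {
        "cluster": cluster,
        "taskDefinition": task_def,  # the exact task def prod runs
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {"subnets": subnets, "assignPublicIp": "DISABLED"}
        },
        "overrides": {  # only the command differs from prod
            "containerOverrides": [{"name": container, "command": command}]
        },
    }

def run_one_off(**kwargs):
    import boto3  # imported here so the sketch is readable without AWS set up
    ecs = boto3.client("ecs")
    return ecs.run_task(**one_off_task_kwargs(**kwargs))
```

This doesn't fix the slow feedback loop, but it does guarantee image and environment parity, which is the part the SSH machines lose.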

How do other teams resolve this?

https://redd.it/n7z34n
@r_devops
To be a DevOps engineer, do we need to be a programmer?

Do I need to know how to program in order to work in DevOps?

https://redd.it/n817i9
@r_devops
Failed dream job interview due to 2 questions, would appreciate any answers about these

Hey guys, coming down from a big failure today. So I got selected for an interview at a dream job: not a big name, but great in terms of $$$, work-life balance, culture, etc. I cleared the first round; it was standard, complete an assignment, all good. Then I had a talk with the DevOps lead. He asked me 2 questions that I had no answers for, and that was it. I mean, 2 questions.

So here is how it went:

He: So we want continuous deployment, very frequently. What is the best way to achieve this?

Me: A CI/CD pipeline using tools like Jenkins, GitLab, etc.

He: Okay, great. Now say for some reason there is a faulty release. How will you delete it and roll back automatically?

Me: Not sure... maybe we can have Prometheus send an alert which would trigger a Python k8s script for the rollback?

He: Not the best way.


Next one (he had asked me something related to Helm before):

He: So, let me give you a scenario. Our users are growing and we are making the application ever better by introducing new tools and more applications, like Redis, a Python app, etc. What would you do, from a DevOps perspective, so that adding and maintaining those new applications is as easy as possible?

Me: Create a Docker image for the new component, a manifest and a Helm chart, and then maybe we can have a git repository to store all the Helm charts?

He: Are you sure?

Me: Not really


Well, that's it to be honest, just some questions here and there. I guess either I'm kinda dumb, or maybe these were easy questions. Anyway, I'd appreciate the answers, guys. Thank you.
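For anyone landing here with the same questions: tools like Argo Rollouts and Flagger automate metric-driven rollback during a release, which I suspect is the answer he was after. A hypothetical Python sketch of the underlying decision (all names illustrative):

```python
# Hypothetical sketch of the alert-driven rollback idea from the first
# question. Progressive-delivery tools (Argo Rollouts, Flagger) build
# this in: they analyze metrics during the rollout and undo it
# automatically when they degrade.
def should_rollback(error_rate: float, threshold: float = 0.05) -> bool:
    """Roll back when the new release's error rate exceeds the threshold."""
    return error_rate > threshold

def handle_alert(error_rate: float) -> str:
    if should_rollback(error_rate):
        # In a real script this would be something like
        # subprocess.run(["kubectl", "rollout", "undo", "deployment/myapp"])
        # or a call to the Kubernetes API.
        return "rollback"
    return "ok"
```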

https://redd.it/n7qiwt
@r_devops
Podman vs Docker

How many of you are using Podman at work in place of Docker? How good is it?

https://redd.it/n7tidm
@r_devops
Host apps on different subdomains on the same ec2 server

Hello everyone,

Let me start by saying I'm a noob in DevOps. Recently I ran into a situation where I have to host 2 Flask apps on two different subdomains: flask_app_1 on sub1.example.com and flask_app_2 on sub2.example.com. I have an EC2 instance, I own the example.com domain, and I'm using Nginx as well. I tried making a .conf file in the sites-available folder for each of them and linking them, but that didn't work. I'm not sure if I'm doing something wrong in this method or whether I should be doing something else entirely.

I'm pretty sure this is a common thing that many websites do, but I just can't get it working. Can somebody help me fix this?
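In case it helps, the usual pattern is one Nginx server block per subdomain, each proxying to the app's local port. A minimal sketch, assuming the Flask apps listen on ports 8001 and 8002 (hypothetical ports) and DNS records for both subdomains point at the instance's IP:

```nginx
server {
    listen 80;
    server_name sub1.example.com;
    location / {
        proxy_pass http://127.0.0.1:8001;  # flask_app_1
        proxy_set_header Host $host;
    }
}
server {
    listen 80;
    server_name sub2.example.com;
    location / {
        proxy_pass http://127.0.0.1:8002;  # flask_app_2
        proxy_set_header Host $host;
    }
}
```

Nginx picks the block whose server_name matches the incoming Host header, so both apps can share port 80 on the same instance.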

Thank you!

https://redd.it/n90ewe
@r_devops
Developing on Apple M1 Silicon with Virtual Environments

I teach a graduate course on DevOps and Agile Methodologies which is 50% lecture and 50% hands-on. In the hands-on labs, I use Vagrant and VirtualBox to provide consistent development environments for my students. This eliminates the problems with some students having Macs while others have Windows... everyone develops on Linux! 😁

That worked really well until Apple released their new 2020 Macs with Apple M1 Silicon chips based on the ARM architecture which VirtualBox won't run on. My luck, 8 students showed up for the spring semester with Apple M1 Silicon Macs and none of my VirtualBox based labs would work. So I purchased an Apple M1 Mac mini and began looking for a solution.

I just published Developing on Apple M1 Silicon with Virtual Environments to document how I solved this problem and provided a consistent development environment for all students using Vagrant with Docker as a provider.

You can read it here (feedback/questions are welcome): https://johnrofrano.medium.com/developing-on-apple-m1-silicon-with-virtual-environments-4f5f0765fd2f

If you want to try this out on an Apple M1 Mac, you can clone one of my lab repos on GitHub and bring it up for yourself: https://github.com/nyu-devops/lab-flask-rest.git
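For the curious, the core of the approach is swapping VirtualBox for Vagrant's Docker provider. An illustrative-only sketch (the image name here is a placeholder; the article and repo above cover the actual setup, which uses an SSH-enabled image):

```ruby
# Vagrantfile sketch: use a Docker container instead of a VirtualBox VM.
Vagrant.configure("2") do |config|
  config.vm.provider :docker do |docker, override|
    override.vm.box = nil            # no VM box; a container is used instead
    docker.image = "ubuntu:20.04"    # placeholder image
    docker.remains_running = true
  end
end
```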

(The next article will be about how I got this working with Visual Studio Code Remote Containers as well.)

https://redd.it/n973sc
@r_devops
Web application support resources

Any resources you can recommend so I can get better at supporting web apps, mainly the server-side troubleshooting part? I know stuff from working over the years, but I'd like a proper go-to resource where I can learn about all the possible scenarios for why a site may go down and the best ways to troubleshoot them.

https://redd.it/n97j8q
@r_devops
Issues with TFS agent version on Android Builds?

Hey guys, so the dev team at our company started reporting Android build failures for our Android apps. It looks like it broke sometime in the last week (the last successful Android build was 5/3; however, we only have 2 Android apps and we don't touch them daily).

Wondering if anyone has seen this or gotten past it without just upgrading agent versions. I'm just wondering what happened that would force something like this, and I hope this info can help others.


Required version as of some time in the last week: 2.182.1

Our current agent versions: 2.173

https://redd.it/n96yra
@r_devops
DoorDash Custom Canary Kubernetes Controller

Hey folks, I thought you might be interested in learning how we at DoorDash built a custom Kubernetes Canary controller on top of Argo Rollouts in the linked blog post. Let us know your comments and feedback!
https://doordash.engineering/2021/04/14/gradual-code-releases-using-an-in-house-kubernetes-canary-controller/

https://redd.it/n8zo60
@r_devops
Making the case for a new quality dimension for K8s apps

One of my mentors once said, "Don't optimize before it works." I think we could make the case that cloud native applications do already work. Hence, the next logical step is to think about running our apps with the optimal resource configuration. My team and I make the case that resource efficiency should be a key dimension for cloud native applications, and I'd like to invite you to join our discussion next Thursday about the best ways to integrate efficiency into our daily work. Find more details here: https://www.stormforge.io/event/crossing-kubernetes-performance-chasm/?utm_medium=social&utm_source=Reddit&utm_campaign=crossingthechasm

https://redd.it/n7r4fb
@r_devops
Ad hoc jobs question

Hello everyone,

I was hoping to ask the experienced DevOps engineers here for some help with the concept of ad-hoc jobs. I have read that "AWS CodePipeline and GitHub Actions do not cater for ad hoc jobs. AWS CodePipeline needs a trigger, and then runs a static pipeline. GitHub Actions is listening to git events."

Our team is leaning towards GitHub Actions, and I am trying to determine whether GitHub Actions not catering for ad-hoc jobs is something we should seriously consider. However, I am not sure I clearly understand the concept of ad-hoc jobs. Could someone clarify what these ad-hoc jobs are and what they actually do? Is it a big downside that GitHub Actions does not cater for them? Any help will be much appreciated.
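For what it's worth, GitHub Actions does support manually triggered runs via the workflow_dispatch event, which covers the common meaning of "ad-hoc job" (a run you kick off on demand rather than from a git event). A minimal sketch, with hypothetical names:

```yaml
# Hypothetical workflow: runs only when triggered manually from the
# Actions UI or the API, with a user-supplied input.
name: ad-hoc-job
on:
  workflow_dispatch:
    inputs:
      script:
        description: "Script to run"
        required: true
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: python ${{ github.event.inputs.script }}
```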

https://redd.it/n7q44g
@r_devops
Slack ChatOps for releases

Looking for solutions,

problem:

Inherited project: the release process is semi-manual and cannot be completed without heavy engineer involvement.


proposition:

In order to win back time for engineers to refactor the release process into something a little more up to date, we should create a Slack chatbot which can be used to send API requests to our build and release automation services.


I have done some research, and the most accessible solution appears to be errbot.io, which supports a sort of ACL that would be perfect for the gated release process our stakeholders require.

The solution needs to be fairly lightweight and written in an accessible language that won't require much up skilling from our engineering team should they need to support it.

I am about to enter the rabbit hole on this topic; I have no idea what the infrastructure will look like yet. Hopefully it can all be run from a Lambda on AWS.


Any tips, ideas, warnings or references would be greatly appreciated at this point. Keen to hear what you've got for me DevOps!!!
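To make the gating concrete, here is a hypothetical, framework-agnostic Python sketch of the ACL idea (errbot ships with access controls along these lines out of the box; all names here are illustrative):

```python
# Hypothetical sketch of an ACL-gated ChatOps command dispatcher.
# A value of None means the command is open to everyone; a set
# restricts it to the named users.
ACL = {
    "release": {"alice", "bob"},  # only release managers may release
    "status": None,               # anyone may check status
}

def handle_command(user: str, command: str) -> str:
    if command not in ACL:
        return f"unknown command '{command}'"
    allowed = ACL[command]
    if allowed is None or user in allowed:
        return f"running '{command}' for {user}"
    return f"{user} is not permitted to run '{command}'"
```

In errbot itself this maps onto its plugin commands plus the bot's access-control settings, so you would configure the gating rather than hand-roll it.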

https://redd.it/n7lv7w
@r_devops
New to cloud CI infrastructure (Bitbucket Pipelines in my case). What is the proper way to make a release?

I do traditional desktop software development with slow point releases, e.g. myapp_1.2.1.tar.gz. So no CD, I just build, compress, and upload to the Downloads section of my repo, from where people can manually obtain my release packages. I recently started using Bitbucket.

I already set up Bitbucket Pipelines to launch a build whenever there's a commit. I'd like to start using it to create the actual release package: so manually click something to initiate my intention to make a 1.2.1 release from a commit, compile the app, run tests, update a header file with the string "1.2.1", create the archive myapp_1.2.1.tar.gz, and upload that.

I read the docs and learned how to do these steps, but I can't tell how I'm supposed to send the desired version name to this pipeline.

Based on what I read (their docs aren't the best, btw), I saw 2 ways:

* Pipelines can be triggered by a pushed git tag. So instead of making a release from the web UI, I could open a git terminal, then create and push the git tag "release-1.2.1". This triggers a pipeline configured to run on "release-*" tags. Anything in the pipeline can then extract "1.2.1" from the value of the $BITBUCKET_TAG env var. This seems very unnatural to me; it inverts the flow I'm used to.

* I could abuse repo variables: keep a CURRENT_RELEASE_VERSION=1.2.1 variable which I update before making a release. I don't like this because I could forget to update the variable, or click the Run Pipeline button by mistake when I'm not trying to make a release. Because of that, silly stuff can happen, like overwriting past versions, overwriting git tags, etc. Also, this doesn't communicate the explicit intent of making a release.

It feels like there's a clean 3rd method I'm missing.

Here's a skeleton of the pipeline I had in mind before I started: https://paste.ubuntu.com/p/zmn8ffkkJ6/
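For reference, the tag-trigger approach from the first bullet might look like this in bitbucket-pipelines.yml (a sketch; the step details are placeholders):

```yaml
# Sketch: pushing a tag like release-1.2.1 starts this pipeline,
# and the version is recovered from $BITBUCKET_TAG.
pipelines:
  tags:
    'release-*':
      - step:
          name: Build and package release
          script:
            - VERSION=${BITBUCKET_TAG#release-}
            - echo "Building myapp_${VERSION}.tar.gz"
            # build, run tests, update the version header,
            # create the archive, upload...
```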

https://redd.it/n9k1u8
@r_devops
Production Ready DevOps Books

Hi there,

I'm writing this post to ask for any books that discuss production-ready DevOps techniques/designs/prototypes. I'm moving our applications to Docker and I think it's ready to go to production. I also built a Jenkins pipeline that does the build and deployment to my Swarm cluster. It's running perfectly. But I'm interested in reading more about what others are building and using for their environments: best practices, security to-dos, and more. I know it's a big topic, but there must be a book that discusses this.
My stack is: Java Spring Boot, Jenkins, Docker Swarm, HAProxy.

Thank you guys for the help.

https://redd.it/n7hfvt
@r_devops
Data redundancy: where to back stuff up to?

Hi all.

I have an app in Google Cloud platform. I have the following data:

* Bucketed misc files in storage (~1 GB)
* Bucketed secondary files (~1 TB and growing). If we lost these, it's not the end of the world, but not ideal.
* Database (~1 GB)

What is the best way of keeping all that safe? I have regular 7-day backups of the database.

I am most concerned about a scenario where we are held ransom, or we lose access to our account.

I would ideally like to store this data offsite, or perhaps in another cloud provider? What do people recommend?

Edit: I came across rsync.net. Seems like something that could be useful as a simple solution?

https://redd.it/n7gcku
@r_devops
How are you measuring DevOps performance?

Hi r/devops,


Many people here are familiar with the four key metrics identified by DORA for measuring the performance of software development teams, i.e. lead time, deployment frequency, change failure rate, and mean time to recover.

I'm curious to know some of the different ways in which you are measuring these metrics. Are there any well-known tools/approaches that make this easy, or are you building internal applications to measure this stuff? E.g. incrementing a counter in a data store after every successful deployment and pulling this data into a nice dashboard.

I apologize if this is a simple question, just curious to see how others are measuring the impact of a devops culture
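As a trivial illustration of the roll-your-own route: once you have deployment records (e.g. pulled from your CI system's API), two of the four metrics fall out directly. A hypothetical sketch with made-up data:

```python
# Hypothetical sketch: computing deployment frequency and change
# failure rate from a list of (timestamp, succeeded) records.
from datetime import datetime, timedelta

# Illustrative data only.
deployments = [
    (datetime(2021, 5, 3, 10), True),
    (datetime(2021, 5, 4, 15), False),
    (datetime(2021, 5, 5, 9), True),
    (datetime(2021, 5, 7, 11), True),
]

def deployment_frequency(deploys, window: timedelta) -> float:
    """Average deployments per day over the given window."""
    return len(deploys) / (window / timedelta(days=1))

def change_failure_rate(deploys) -> float:
    """Fraction of deployments that failed."""
    return sum(1 for _, ok in deploys if not ok) / len(deploys)
```

Lead time and MTTR need more plumbing (commit timestamps and incident records respectively), which is where the purpose-built tools earn their keep.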

https://redd.it/n9o76m
@r_devops
How to persist volumes/filesystems in a Packer EBS AMI for use in newly created EC2 instances?



I'm trying to build an AWS AMI that has all my filesystems set up as I'd expect, i.e. /var, /var/log, /tmp, etc. I am attempting to achieve this using Packer in conjunction with the Ansible provisioner.

Here is my HCL2 build file:

source "amazon-ebs" "example" {
  ami_name        = "test_ami ${local.timestamp}"
  ami_description = "test ami with predefined filesystems ${local.timestamp}"
  instance_type   = "t2.micro"
  region          = "eu-west-2"

  source_ami_filter {
    filters = {
      name                = "amzn2-ami-hvm-2.0.*-gp2"
      root-device-type    = "ebs"
      virtualization-type = "hvm"
      architecture        = "x86_64"
    }
    most_recent = true
    owners      = ["amazon"]
  }

  # EBS for root volume
  launch_block_device_mappings {
    device_name           = "/dev/xvda"
    volume_size           = 10
    volume_type           = "gp2"
    delete_on_termination = true
  }

  # EBS for data volume
  launch_block_device_mappings {
    device_name           = "/dev/sdb"
    volume_size           = 5
    volume_type           = "gp2"
    delete_on_termination = true
  }

  ssh_username = "ec2-user"
}

I then have Ansible provisioners to set up my physical volumes, volume groups and logical volumes, along with some XFS filesystems. This all works fine during the Packer AMI build. I can verify with PACKER_LOG=1 packer build . that the plays in my Ansible playbook are successful.

Once the AMI is created, I built an EC2 instance off of it, but all the work the Ansible playbook did in setting up the aforementioned volumes and filesystems has disappeared. For example, /dev/sdb1 doesn't exist when I run blkid or fdisk -l. My /etc/fstab file has also disappeared.

I was under the impression that although I've selected delete_on_termination under launch_block_device_mappings, the snapshots created from the AMI build would be applied to any EC2 instances built from the AMI, and therefore my physical volumes and filesystems would be intact.

Am I misunderstanding this? If so, can anybody clarify where I'm going wrong?

https://redd.it/n9rtll
@r_devops
Delete CloudFormation Stack Including S3 Objects

I needed to create and tear down development environments. Deleting a CloudFormation stack has an issue with S3 objects: an S3 bucket cannot be deleted while it contains objects (to the best of my understanding). So I wrote a script which:

1. Removes deletion protection from DB instances belonging to the stack
2. Deletes S3 objects including versions (10 in parallel) in buckets belonging to the stack
3. Issues delete stack command after the above is finished
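For illustration, step 2 might look like this in Python/boto3 (a hypothetical sketch only; the author's actual script is linked below and written in Next Generation Shell). Note that delete_objects accepts at most 1,000 keys per request:

```python
# Hypothetical sketch: delete every object version and delete marker
# from a versioned bucket so CloudFormation can remove it.
def chunked(items, size):
    """Split items into lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def empty_versioned_bucket(bucket: str) -> None:
    import boto3  # imported here so the sketch is readable without AWS set up
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket):
        # Both object versions and delete markers must be removed.
        entries = page.get("Versions", []) + page.get("DeleteMarkers", [])
        for batch in chunked(entries, 1000):  # delete_objects caps at 1000 keys
            s3.delete_objects(
                Bucket=bucket,
                Delete={"Objects": [
                    {"Key": e["Key"], "VersionId": e["VersionId"]} for e in batch
                ]},
            )
```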

The script is at https://github.com/ngs-lang/nsd/blob/master/aws/cloudformation/delete-stack.ngs

It is written in Next Generation Shell.

Hope that helps!

https://redd.it/n9sf9o
@r_devops