Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Watch Kubernetes Experts Attempt to Fix Broken Kubernetes Clusters (Episode II)

Damn ... this week's was really fucking tough.

Both Jason DeTiberus (@detiber) and Walid Shaari (@walidshaari) decided to "mess" with `etcd` and debugging the problem was rather challenging.

I hope you find this entertaining and helpful.

I need a really strong drink after this one.

https://www.youtube.com/watch?v=JzGv36Pcq3g

https://redd.it/ls9ctm
@r_devops
calling terratest operators

Does anyone have a solution for generating a report from Terratest, something easy on the eyes?

https://redd.it/lscyrh
@r_devops
Team meeting on future priorities next week

Hi all, I'm currently a systems engineer in a team of 8, working under a larger IT services team of around 100 and serving around 10k staff in total. Next week we have a meeting to discuss our ongoing priorities and spitball what new ways of working we could implement and where to put our focus for the future. I'm desperate to put the case forward for DevOps processes, Agile working and a push towards the cloud/Azure. I'm wondering what areas I should suggest we focus on to move us from the old school into a much more DevOps-focused mindset.
Right off the bat I think we should initially concentrate on getting Azure Policy right (enforcing resource group tags, only allowing certain VM SKUs to be spun up, correct data residency, etc.) and rolling out DevTest Labs, creating a platform for our devs to start provisioning environments through self-service and working in Azure for dev/test and production.
I also think we should look at applying some elements of Site Reliability Engineering in our team. We don't operate a shift pattern, but I think we could benefit from having a designated 'on call' engineer who initially responds to our major outages/P1 tickets, oversees the resolution and documents it afterwards.
Does that make sense, or do we instead need to start the conversation fresh between the dev team and us to decide how we're going to achieve 'DevOps'? I think if we put in the initial groundwork to start removing the barriers between Ops and Dev as above, collaboration between the two teams will start to run a bit smoother and we can build from there.
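For the Azure Policy piece, a rule that denies resources missing a required tag is a common first step. A minimal sketch of such a policy rule (the tag name `costCentre` is just an example, not something from the post):

```json
{
  "if": {
    "field": "tags['costCentre']",
    "exists": "false"
  },
  "then": {
    "effect": "deny"
  }
}
```

The same `if`/`then` structure extends to the other examples mentioned (allowed VM SKUs, allowed locations for data residency) by swapping the `field` and condition.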

https://redd.it/lscmgf
@r_devops
Microsoft hosted agent - No folders to save?

Hey,

I'm pretty new to this, but I understand that the Microsoft-hosted agents have some limitations vs the self-hosted ones. I'm trying to do the following in a release:

I'm trying to download something and save it somewhere, but it seems like I can't save it anywhere; I can't seem to find any valid target. As a save location I already tried things like $(Build.ArtifactStagingDirectory) and $(Build.SourcesDirectory), but nothing seems to be available.
I've now created a cmd task to list the output of those variables with the following code:

echo "Structure of work folder of this pipeline:"
tree $(Agent.WorkFolder)\1 /f
echo "Build.ArtifactStagingDirectory:"
echo "$(Build.ArtifactStagingDirectory)"
echo "Build.BinariesDirectory:"
echo "$(Build.BinariesDirectory)"
echo "Build.SourcesDirectory:"
echo "$(Build.SourcesDirectory)"

The output is:

Folder PATH listing for volume Temporary Storage
Volume serial number is 0000028E 0CE3:CCEA
D:\A\1
Invalid path - \A\1
No subfolders exist
"Build.ArtifactStagingDirectory:"
"$(Build.ArtifactStagingDirectory)"

"Build.BinariesDirectory:"
"$(Build.BinariesDirectory)"

"Build.SourcesDirectory:"
"$(Build.SourcesDirectory)"

So like, where can I even download and put stuff?
I don't have an artifact connected, as this doesn't depend on code; the release just has some tasks to download stuff.
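One thing worth checking (an assumption about what is going wrong here, not a confirmed diagnosis): in classic release pipelines several Build.* variables are not defined, which is why the macro syntax prints literally. Predefined variables are also exposed to scripts as environment variables with dots replaced by underscores, so a script task can read a writable location directly. A minimal sketch for a Bash task, falling back to /tmp when run outside an Azure agent:

```shell
# Read the agent temp directory from the environment
# (Agent.TempDirectory -> AGENT_TEMPDIRECTORY on the agent);
# fall back to /tmp so the sketch also runs locally.
set -eu
DEST="${AGENT_TEMPDIRECTORY:-/tmp}/downloads"
mkdir -p "$DEST"
# A real download (curl/wget) would go here; simulated with a file write.
echo "downloaded payload" > "$DEST/payload.txt"
ls "$DEST"
```

On a hosted agent, anything written under the agent's work or temp folder is discarded when the job ends, so this is only useful for files consumed later in the same job.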

https://redd.it/lscgo8
@r_devops
Automated Remediation / Automated Runbooks

A while back there was a thread on automated remediation for incidents that's relevant to some work I'm doing. The idea of machines telling us where our systems are broken and then solving those problems for us sounds really nice in theory, but I have yet to run into someone responsible for operating systems who wants to remove the human from the loop... yet.

Which sort of brings us to the space of the automated runbooks where an SRE or DevOps team can define catalogues of system solutions that are served to support teams to click "execute" on.

One of the things I'm thinking through is that even with automated runbooks we still require human interaction: a customer complaint has come in, so something somewhere is broken, but we want a human being to check in on the problem and make sure a fix isn't making something worse.

However, I've seen some systems that were large/complex enough that ticket volume from redundant issues was so high it made sense to systematically automate problems away (as in, create scripts that resolve redundant issues in order to eliminate toil) just to get operational teams' heads above water. The justification is that, since it takes a while to 1) get to RCA and 2) get an engineer to prioritize a fix and get it deployed, operational teams ought to have a path to solving problems end to end.

Doing this at scale has its own set of risks and things that would understandably freak people out (don't make the situation worse!), but I'm curious if anyone has examples of this type of work happening on operational teams.
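As a concrete (entirely hypothetical) illustration of the middle ground described above, a "click-to-execute" runbook action can stay auditable by wrapping the remediation in a check-act-log script; the pidfile path, service name and log file here are invented for the sketch:

```shell
# Check-act-log pattern for a runbook step a support engineer triggers.
set -eu
log() { echo "$(date -u +%FT%TZ) $*" | tee -a remediation.log; }

# Hypothetical health signal: a pidfile the service maintains.
if [ ! -f /tmp/myapp.pid ]; then
  log "health check failed: pidfile missing"
  # systemctl restart myapp   # the actual remediation step, elided here
  log "remediation executed"
else
  log "service healthy; no action taken"
fi
```

The append-only log is the point: every execution leaves a trail, so the human stays in the loop for review even when the action itself is scripted.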

https://redd.it/lsgxi7
@r_devops
Devops for data engineering role

Hi folks,

We have a position open in our company:

https://www.linkedin.com/jobs/view/2427007934/

We are trying to find a DevOps lead with some Spark experience. We want to productize an ML analytics application built on Kafka + Spark on GCP, and we'd like to automate deployment of the architecture based on customer-supplied parameters.

If you are comfortable with Kafka deployment (on-prem/cloud), know how to set up authentication/authorization for a multi-tenant Kafka broker, know all about load balancing, have basic knowledge of Spark and database deployment in the cloud, and can automate this end-to-end pipeline using a tool of your choice, please do apply.

https://redd.it/lsg56p
@r_devops
How We Securely Deploy our Website with AWS

Hi all,

We've just put up our first blog post, a detailed look at how we rely on various AWS services such as CodePipeline, Secrets Manager and CloudFormation for the development and secure deployment of our company website.

We use Hugo to generate a static HTML site, Ansible for configuration management, and GitHub for our private repos.

There are some improvements we'd like to make, like using an S3 bucket to host the static site, and we'd also like to add AWS CodeGuru to the pipeline for ML-powered code reviews.

We'd love to hear any feedback about this post, and we'd be happy to discuss this solution further in the comments if anyone wants to.

https://www.acutestack.com/blog/how-we-securely-deploy-our-website-with-aws/

https://redd.it/ls0vik
@r_devops
Introducing: Dashbird's serverless Well-Architected Insights feature

I'm super excited to tell you all about Dashbird's new feature that I believe will change the way serverless developers build and operate their environments. TL;DR: Based on the best practices of the AWS Well-Architected Framework, Dashbird now runs 80+ continuous checks against your infrastructure and gives you actionable advice on how to optimize it.

Why did we decide to add this feature and how does it actually work? Learn more here: https://dashbird.io/blog/introducing-well-architected-insights/

https://redd.it/lsc5wc
@r_devops
Jenkins: agent label inheritance among stages?

Hey, I'm wondering whether a stage inherits the pipeline-level agent label, or whether declaring an agent within a stage overrides it?

I'm using parallel stages and want each stage to have its own agent label. Example:

pipeline {
    agent { label 'baselabel' }
    stages {
        stage('Build') {
            parallel {
                stage('Test 1') {
                    agent { label 'childlabel' }
                    steps {
                        script {
                            sh "echo 'hello world'"
                        }
                    }
                }
                stage('Test 2') {
                    agent { label 'childlabel2' }
                    steps {
                        script {
                            sh "echo 'hello world'"
                        }
                    }
                }
            }
        }
    }
}

In my example above, would stage Test 1 have both baselabel and childlabel? Or would childlabel override the base one?

Thanks,

https://redd.it/lsbxa6
@r_devops
Moving to a Devops role from windows

As the title suggests, I am moving from a Senior Windows Infrastructure Engineer role at a large PLC (mostly on-prem) to a DevOps role at a global company that provides accountancy software on AWS using Linux.

I am very excited to get stuck in and will no doubt be posting loads of questions. In my current position the technology stack wasn't keeping up with the market, and it became apparent that staying with them would eventually come back to bite me.

One question I do have for the DevOps community: do certifications still hold their value in this job sector? I was very much of the opinion that the Microsoft exams validated my skills, but they are time-consuming and demanding. The obvious route I can see is getting AWS DevOps certified; however, there are loads of other technologies in use, such as Ansible, Docker and TeamCity for CI/CD.

Interested to know your thoughts.

https://redd.it/ls5gx3
@r_devops
How to deal with multiple interviewing processes and offers?



I already had some interviews with positive feedback that will probably result in an offer, and also have still some interview processes starting and going on.

How do I deal with multiple processes and offers if I'm about to receive an offer from one company but still want to wait for others? Should I gather as many offers as possible and wait for all processes to finish before accepting one, or should I even try to delay the process at other companies?

Not sure how to handle this.

https://redd.it/ls66yy
@r_devops
DevOps for beginners? Part II

Hi all, I hope you are all having an exceptional week. Last week I published an article called DevOps is not a thing. Because of its length I had to split it into two articles, and I have the second one ready today, so go ahead and check it out: https://vibhanshuspeaks.medium.com/devops-is-not-a-thing-part-ii-26fb223f0dbf
If you like it, please show it some appreciation; if you find any issue, report back and I will fix it; want to discuss? I'm open to your comments! :)

https://redd.it/ls3rb0
@r_devops
Anyone tried Digital Ocean App Platform?

I am developing a web widget that can be embedded into another website. DO App Platform looks good since it has a load balancer built in. However, I am not sure if it can handle CORS, Content Security Policy, web server headers, etc.

https://redd.it/ls39pw
@r_devops
Full monitoring in one place with Grafana and Kubernetes (100+ instances)

I have over 100 instances on AWS. I want full monitoring in one place - Kubernetes cluster with Grafana.

My question is: what do you think about generating dashboards as code (IaC) with CPU/RAM/IOps usage views for so many instances?

Is it a good idea to use Helm for that, and then somehow switch values so it can fetch data from other instances and create charts on a per-instance basis?

Perhaps one dashboard with one chart, which shows all of CPU usage, another with RAM etc.?

What solutions worked for you in such a scenario?
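Not a full answer, but one low-tech way to sketch the per-instance approach is templating dashboard JSON for Grafana's file-based provisioning; the instance names and the stand-in template below are hypothetical:

```shell
# Generate one dashboard JSON per instance from a shared template.
set -eu
mkdir -p dashboards
# Minimal stand-in template; a real one carries full panel definitions.
cat > dashboard-template.json <<'EOF'
{"title": "__HOST__ CPU/RAM/IOps", "tags": ["generated", "__HOST__"]}
EOF
for host in web-01 web-02 db-01; do   # hypothetical instance names
  sed "s/__HOST__/$host/g" dashboard-template.json > "dashboards/$host.json"
done
ls dashboards
```

The same loop could be driven by the instance list Terraform or the AWS CLI already knows about, which keeps the dashboards in sync with what actually exists.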

https://redd.it/ls0f9l
@r_devops
CKA/CKAD still worthwhile?

My company is offering to pay for my training and exams. I have no k8s experience, so I think I'm going to go for it, if for nothing else just to learn the tech. Just curious if these certs are actually held in high regard?

https://redd.it/lroeh7
@r_devops
Terraform EC2 post deploy configuration

Wondering if anyone can share their ideas on getting config files and installing packages onto new EC2 instances provisioned using Terraform.

Options considered:

- baking packages into the AMI and deploying config files to the EC2 instance using Terraform
- using Terraform to run post-deploy exec hooks on the EC2 instance
- using Ansible to deploy scripts and packages to the EC2 instance after deploy

These seem to be the only ways to keep the configuration of the instance located with the IaC package. I'm a little fuzzy on how I would execute these solutions, so any advice, if you have done it before or think it's a good idea, would be useful.

I would like to avoid deploying supporting resources like a Chef or Puppet server.
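For the post-deploy hook option, the lightest-weight variant is often a user_data bootstrap script that Terraform passes via the `aws_instance` `user_data` argument. A hedged sketch of such a script (package, paths and config contents invented; a real script would write under /etc and install actual packages):

```shell
#!/bin/sh
# Hypothetical user_data bootstrap: install packages and drop a
# config file on first boot, keeping the config alongside the IaC.
set -eu
# A package install would go here on a real instance, e.g.:
#   yum install -y nginx
mkdir -p /tmp/myapp
cat > /tmp/myapp/myapp.conf <<'EOF'
listen_port = 8080
log_level   = info
EOF
echo "bootstrap complete"
```

The trade-off versus baking an AMI is boot time and drift: user_data runs on every fresh instance, while an AMI bakes the result in once.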

https://redd.it/lrniza
@r_devops
How do I manage several processes without containers?

Since it's 2021, the standard way of running several processes across a number of virtual machines is to run them in containers under Kubernetes. That enables automatic monitoring of the processes, failover, scaling, and all those good things.

But before containers were a thing (or even today, because containers and Kubernetes add a level of complexity that you may not want or need), how would you manage several running processes on a server cluster? Starting new processes on the machine with enough capacity, reporting if they fail, restarting, etc. -- there surely must be some tools for that, similar to what you get with Kubernetes but with standard Linux processes instead of containers.
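The classic pre-container answer on a single host is an init/supervisor system; systemd alone covers restarting failed processes and reporting their state. A minimal (hypothetical service name and path) unit sketch:

```ini
# /etc/systemd/system/myapp.service
[Unit]
Description=My app
After=network.target

[Service]
ExecStart=/usr/local/bin/myapp
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Cross-machine placement ("start on the machine with enough capacity") is the part plain systemd does not give you; that is where schedulers that also handle non-container workloads, such as Nomad's exec driver, or HA tooling like Pacemaker, traditionally come in.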

https://redd.it/lrmfis
@r_devops
Auditable SSH access to server maintenance + Jenkins jobs

We deploy and manage services/servers for lots of different customers and we need to comply with new regulatory requirements for auditability.

For most of the "manual" maintenance tasks we can just use a bastion server with SSH session recording, automatic key assignment, directory auth and 2FA, all of that, no problem. But when it comes to the jobs going through Jenkins, things become cloudy.

We have a few Jenkins nodes (agents) around, but most of the deployments go over SSH (Ansible, rsync, etc.). We can't just apply the same rule here (who is going to type in a 2FA code every time a job runs? ;-)), but at the very least we must be able to concentrate those accesses in the bastion and keep track of those activities too, separately from the Jenkins or repository audit.

Is this something you guys have been through?
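One pattern that might apply (an assumption, since it depends on the bastion supporting key-based pass-through without 2FA for service accounts): force the Jenkins agents' SSH traffic through the bastion with OpenSSH's ProxyJump, so job sessions are still recorded there. A hypothetical ~/.ssh/config fragment on the agents (all hostnames and key paths invented):

```
Host customer-*
    ProxyJump jenkins@audit-bastion.example.com
    User deploy
    IdentityFile ~/.ssh/jenkins_deploy_key
```

Because Ansible and rsync both honour ssh_config, this funnels the automated traffic through the same audited choke point as the manual sessions without changing the jobs themselves.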

https://redd.it/lrlrt2
@r_devops
Azure DevOps lefthand menu. #HATEPOST

Please. Anyone.

Does anyone know how to stop the hover-over functionality of the left-hand navigation menu?

https://imgur.com/KMt0M9U

I keep accidentally taking my hand off my mouse, which then falls onto one of these icons, and then I go to type and end up leaving the page without saving.

Fucking awful design.

https://redd.it/lsyyvx
@r_devops