Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Capturing dynamic infrastructure in an Infrastructure as Code world

I'm looking for someone to help me connect some dots conceptually.

I understand the basics of IaC, but in some situations it doesn't click for me.

For example, in my application, when I create a user, I want to create a pubsub topic for that user, so that when I need to send that user a notification, I can publish to their topic and let the part of the system subscribed to it deliver the message.

Likewise, if a 'team' is created in my application, I create a topic for that team, which fans out to the members of the team.

All of these things are created dynamically. How would I capture that in IaC? Or is this simply something that isn't IaC?
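
One common way to reconcile the two, offered as a sketch rather than the definitive answer: IaC declares the long-lived pieces (the delivery service, its subscriptions, IAM permissions), while per-user and per-team topics are treated as application data that the app creates at runtime in response to business events. The sketch assumes deterministic topic names and an idempotent "ensure" call; `InMemoryTopicRegistry` is a hypothetical stand-in for a real Pub/Sub admin client, and the `notify-<kind>-<id>` naming scheme is made up for illustration.

```python
from dataclasses import dataclass, field


def topic_id_for(entity_kind: str, entity_id: str) -> str:
    """Deterministic name, so the app can always find (or recreate) a topic."""
    return f"notify-{entity_kind}-{entity_id}"


@dataclass
class InMemoryTopicRegistry:
    """Hypothetical stand-in for a real Pub/Sub admin client."""
    topics: set = field(default_factory=set)

    def ensure_topic(self, topic_id: str) -> bool:
        """Create the topic if it is missing; True only when newly created."""
        if topic_id in self.topics:
            return False
        self.topics.add(topic_id)
        return True


def on_user_created(registry: InMemoryTopicRegistry, user_id: str) -> bool:
    # A runtime side effect of a business event, not something IaC declares.
    return registry.ensure_topic(topic_id_for("user", user_id))


def on_team_created(registry: InMemoryTopicRegistry, team_id: str) -> bool:
    return registry.ensure_topic(topic_id_for("team", team_id))


registry = InMemoryTopicRegistry()
on_user_created(registry, "42")
on_team_created(registry, "7")
print(sorted(registry.topics))  # ['notify-team-7', 'notify-user-42']
```

Because `ensure_topic` is idempotent, replaying a "user created" event is harmless, and a cleanup job can remove topics for deleted users using the same naming function.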

Thanks for reading!

https://redd.it/ftqnml
@r_devops
What do you use Python for in the DevOps space?

I'm looking for ideas for new DevOps tools to build with Python. Any ideas would be much appreciated. I'd like to solve new problems, but I'm also interested in what has already been solved with Python.

https://redd.it/ftr5mx
@r_devops
Custom OpenShift PaaS Alternative with Kubernetes

Hi, first of all, sorry for my poor English; I'm French. I'm not sure I'm taking the right approach here.

I work for a company that provides a web application for banking operations.

We host our application and run multiple separate instances, each with a different version and different parameters. They are hosted the "old way": a web server plus a database on bare metal or on a virtual machine. No Docker or Kubernetes yet.

Because we have so many instances with different versions and parameters, we have to maintain a "replica" of each one for the hotline team, so they can reproduce bugs whenever a customer calls. We want to change this.

We want a web form where a hotline operator can request an instance of the application with their own parameters and **the version to follow** (for example, when a tag is updated, their instance is updated too). After filling in this information, they receive a URL to access the instance.

To do this, I first dockerized the app and moved all of its parameters into environment variables. Then I tried OpenShift 3.11, which has a kind of web form that fills in a template file from the submitted parameters. The template contains a Deployment, a Service, and a Route (a kind of Ingress). OpenShift can also "follow" a tag and trigger a rollout of the Deployment when there is a new tag.

I got it working, but after running into many problems (Prometheus, an NFS subPath volume claim bug, etc.) caused by the old Kubernetes version bundled with OpenShift 3.11, and after seeing how painful the upgrade to OpenShift 4 would be, I decided to switch directly to Kubernetes and write custom scripts using CI/CD tools.

I installed Kubernetes successfully on bare metal.

I wrote a small Flask application that lets users fill in a YAML template. The template contains a Deployment, a Service, and an Ingress. I then apply this YAML to the cluster and reply with the URL.

This feature works.
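
For illustration, a minimal sketch of the rendering step such a Flask app might perform, using Python's `string.Template`. The post doesn't show the real template, so the label `bank-app`, the container name `web`, port 8080, and the host name are all hypothetical. Application parameters become container environment variables, as described above, and the instance name is checked against the DNS label rules Kubernetes applies to resource names before anything is rendered.

```python
import json
import re
from string import Template

# Lowercase alphanumerics and '-', starting with a letter and ending with an
# alphanumeric: acceptable for Deployment, Service and Ingress names (<= 63 chars).
DNS_LABEL = re.compile(r"[a-z]([-a-z0-9]*[a-z0-9])?")

MANIFEST = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $name
  labels: {app: bank-app, instance: $name}
spec:
  replicas: 1
  selector:
    matchLabels: {instance: $name}
  template:
    metadata:
      labels: {app: bank-app, instance: $name}
    spec:
      containers:
      - name: web
        image: $image
        env:
$env
---
apiVersion: v1
kind: Service
metadata:
  name: $name
spec:
  selector: {instance: $name}
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: $name
spec:
  rules:
  - host: $host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service: {name: $name, port: {number: 80}}
""")


def render_instance(name: str, image: str, host: str, params: dict) -> str:
    """Render a Deployment, Service and Ingress for one hotline instance;
    application parameters become container environment variables."""
    if len(name) > 63 or not DNS_LABEL.fullmatch(name):
        raise ValueError(f"invalid instance name: {name!r}")
    env_lines = [
        # json.dumps gives a double-quoted string, which YAML also accepts.
        f"        - name: {key}\n          value: {json.dumps(str(value))}"
        for key, value in sorted(params.items())
    ]
    env_block = "\n".join(env_lines) or "          []"
    return MANIFEST.substitute(name=name, image=image, host=host, env=env_block)


print(render_instance("hotline-ticket-42", "registry.local:5000/bank-app:1.2.0",
                      "ticket-42.hotline.example.com", {"CURRENCY": "EUR"}))
```

The rendered text can then be piped to `kubectl apply -f -`, and the Ingress host is the URL handed back to the operator.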

For the "version to follow" feature, I looked at some Jenkins tutorials (it's my first time using Jenkins).

The usual approach seems to be:

Scan the Git repository; whenever a specific branch is updated, clone it, build it, publish it to the registry, then recreate a specific Deployment on the Kubernetes cluster.

But I want this approach:

Scan the Git repository; whenever a specific branch is updated, clone it, build it, publish it to the registry, then patch and roll out every Deployment on the Kubernetes cluster that has the same label or the same image version.

Since I can't find any documentation on this approach, I'd like to know whether it's the right one. Can someone help me, please?

PS: We host everything ourselves:

Git server: Gitea

Docker registry: a private, self-hosted one
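
The "patch many Deployments" step could look roughly like this: after the pipeline pushes a new image, select every Deployment carrying a label that records which branch or tag it follows, and build a strategic-merge patch that bumps the image. This is a sketch under assumptions: the label key `follow`, the registry address, and the container names are hypothetical, and the input is assumed to be the `items` list from `kubectl get deployments -o json`.

```python
import json


def split_image(image: str):
    """Split 'repo[:tag]' without mistaking a registry port for a tag.
    Digest references (repo@sha256:...) are not handled in this sketch."""
    last_segment = image.rsplit("/", 1)[-1]
    if ":" in last_segment:
        repo, tag = image.rsplit(":", 1)
        return repo, tag
    return image, None


def build_patches(deployments, label_key, label_value, image_repo, new_tag):
    """Map deployment name -> strategic-merge patch (as JSON text) for every
    deployment labelled label_key=label_value whose containers use image_repo."""
    patches = {}
    for dep in deployments:
        labels = dep["metadata"].get("labels", {})
        if labels.get(label_key) != label_value:
            continue
        containers = dep["spec"]["template"]["spec"]["containers"]
        # Strategic merge matches containers by name, so name + image suffice.
        changed = [
            {"name": c["name"], "image": f"{image_repo}:{new_tag}"}
            for c in containers
            if split_image(c["image"])[0] == image_repo
        ]
        if changed:
            patch = {"spec": {"template": {"spec": {"containers": changed}}}}
            patches[dep["metadata"]["name"]] = json.dumps(patch)
    return patches
```

Each patch could then be applied with `kubectl patch deployment <name> -p '<patch>'`. Changing the pod template's image is what triggers the rolling update, and `kubectl rollout status deployment/<name>` waits for it to finish.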

https://redd.it/ftrc2n
@r_devops
Searching for a book on the Bamboo server/system(?)

My boyfriend started working in IT and is looking for a book about the Bamboo server. We only found the PDF version (free, we know, but we want a real book). Does anyone have a suggestion? Maybe a book covering this and some other topics?

https://redd.it/ftstw4
@r_devops
How to host your Helm repository on GitHub

Hi everyone, over the past couple of weeks I've been working on [a Helm chart](https://github.com/renovatebot/helm-charts) for [Renovate](https://github.com/renovatebot/renovate), an open source project I'm involved in. Previously, I'd have just hosted this chart in the [helm/charts](https://github.com/helm/charts) repository, but now that it's deprecated, it wasn't clear where the best place to host Helm charts was.

I'd seen some discussions about using GitHub Pages to host on [Helm Hub](https://hub.helm.sh/), and some tools like [Chart Testing](https://github.com/helm/chart-testing) or [helm-docs](https://github.com/norwoodj/helm-docs) to ensure high quality charts and documentation. I hadn't seen anything tying it all together in a complete package, so that's exactly what I tried to do.

Check out [How to host your Helm chart repository on GitHub](https://jamiemagee.co.uk/blog/how-to-host-your-helm-chart-repository-on-github/). It walks you through how to set up GitHub Pages and GitHub Actions to build and host your Helm repository, using best practices.
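
Part of why GitHub Pages works for this is that a Helm chart repository is just static files served over HTTP: packaged `.tgz` charts plus an `index.yaml` that lists them. `helm repo index` normally generates that file; the sketch below only illustrates its shape, with a simplified field set and a hypothetical Pages URL.

```python
import hashlib
from datetime import datetime, timezone


def build_index(charts, base_url: str) -> dict:
    """charts: iterable of (name, version, packaged_tgz_bytes).
    Returns a simplified chart repository index as a dict."""
    entries = {}
    for name, version, tgz in charts:
        entries.setdefault(name, []).append({
            "name": name,
            "version": version,
            # SHA-256 of the packaged chart archive.
            "digest": hashlib.sha256(tgz).hexdigest(),
            "urls": [f"{base_url.rstrip('/')}/{name}-{version}.tgz"],
        })
    return {
        "apiVersion": "v1",
        "entries": entries,
        "generated": datetime.now(timezone.utc).isoformat(),
    }
```

Serialized to YAML and published next to the `.tgz` files, an index like this is what `helm repo add <name> <url>` downloads.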

I'd love to hear how everyone else is doing this as well. And, of course, if you've got any feedback for me too!

https://redd.it/ftrf9y
@r_devops
Kubernetes cluster down on AWS?

I built a cluster on AWS Cloud9 using EKS, but the cluster is down and even the master node shows Not Ready. I don't know what's going on: I looked at EKS and checked the nodes, and all of them, including the master, say "Not Ready". It was working previously, but I think I might have done something to break it.

"AccessDenied: Your worker nodes do not have access to the cluster.

eksctl-eksworkshop-eksctl-nodegro-NodeInstanceRole-1BJZZG66QZ0KN"

I troubleshot using this guide [https://aws.amazon.com/premiumsupport/knowledge-center/eks-worker-nodes-cluster/](https://aws.amazon.com/premiumsupport/knowledge-center/eks-worker-nodes-cluster/), but the problem persists.
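
One common cause worth checking for this error is whether the node instance role is still mapped in the `aws-auth` ConfigMap in `kube-system` (for example, if that ConfigMap was edited or re-applied). A small sketch, assuming the placeholder account ID `111122223333`: it renders the usual `mapRoles` entry for a node role and does a crude text check for it.

```python
def node_role_arn(account_id: str, role_name: str) -> str:
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def node_role_mapping(account_id: str, role_name: str) -> str:
    """One aws-auth mapRoles entry that lets nodes using this role join."""
    return (
        f"- rolearn: {node_role_arn(account_id, role_name)}\n"
        # Plain (non-f) string: the double braces are emitted literally.
        "  username: system:node:{{EC2PrivateDNSName}}\n"
        "  groups:\n"
        "    - system:bootstrappers\n"
        "    - system:nodes\n"
    )


def role_is_mapped(map_roles_yaml: str, account_id: str, role_name: str) -> bool:
    """Crude line-based check; a real tool should parse the YAML."""
    wanted = f"rolearn: {node_role_arn(account_id, role_name)}"
    return any(line.strip().lstrip("- ") == wanted
               for line in map_roles_yaml.splitlines())
```

`kubectl -n kube-system get configmap aws-auth -o yaml` shows the current `mapRoles` to compare against.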

https://redd.it/fty2gc
@r_devops
Your work-life balance? On call nights and weekends?

I'm curious about transitioning into DevOps. I find the tools, the tech, and the mindset interesting, but I'm concerned about the work-life balance.

Would you guys mind sharing your perspective on this? Is it the typical 40 hours/week? How often are you on call: every night, or on weekends?

https://redd.it/ftu35m
@r_devops
Udemy is having a massive sale and my colleague recently published a course on Prometheus

Hi everyone,

Thought I might help him share his new course on Prometheus monitoring. It's 88 lessons and about 10 hours of content. It's only $10 USD because Udemy seems to be having a massive sale right now. The catch is that there's only 1 day left in the sale!

The link is his referral link because the course isn't in Udemy's search cache yet, but there's no incentive, because every course is still $10 during the sale.
If I'm doing this wrong (promoting), please let me know.

Thank you

[LINK](https://www.udemy.com/course/master-devops-monitoring-with-prometheus/?referralCode=37BDCE3060F77F89A1A2)

https://redd.it/fttayu
@r_devops
Struggling with where I should find issues in my IaC, like CFT, Terraform, Ansible, or ARM templates

Infrastructure as Code, whether via CloudFormation, Terraform, Ansible, ARM templates, or others, is the de facto standard for creating cloud environments in most enterprises.

If you use one of these technologies to build your infrastructure today, where would you prefer to get visibility into potential issues across the many cloud services you probably use, such as public-read S3 buckets that should be private?

1. In the IDE, with an IaC issue-detection plugin
2. In the CI/CD pipeline, with IaC scanning for issues
3. Only after the cloud infrastructure is built
4. It doesn't matter unless the security team flags it
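
Option 2 can start as a pipeline step that inspects Terraform's machine-readable plan. A minimal sketch, assuming the JSON produced by `terraform show -json <planfile>` and checking only the `acl` attribute of `aws_s3_bucket` and `aws_s3_bucket_acl` resources; dedicated scanners cover far more rules than this.

```python
import json

PUBLIC_ACLS = {"public-read", "public-read-write"}
ACL_TYPES = {"aws_s3_bucket", "aws_s3_bucket_acl"}


def public_bucket_findings(plan_json: str) -> list:
    """Addresses of planned S3 resources whose ACL would be public."""
    plan = json.loads(plan_json)
    findings = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") not in ACL_TYPES:
            continue
        # "after" is null for deletions, so fall back to an empty dict.
        after = (rc.get("change") or {}).get("after") or {}
        if after.get("acl") in PUBLIC_ACLS:
            findings.append(rc["address"])
    return findings


sample_plan = json.dumps({"resource_changes": [
    {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
     "change": {"actions": ["create"], "after": {"acl": "private"}}},
    {"address": "aws_s3_bucket.assets", "type": "aws_s3_bucket",
     "change": {"actions": ["create"], "after": {"acl": "public-read"}}},
    {"address": "aws_s3_bucket.old", "type": "aws_s3_bucket",
     "change": {"actions": ["delete"], "after": None}},
]})
print(public_bucket_findings(sample_plan))  # ['aws_s3_bucket.assets']
```

In CI this would run after `terraform plan -out=tfplan` and `terraform show -json tfplan`, failing the job whenever the list is non-empty.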

https://redd.it/ftwjp3
@r_devops
Documentation as a Path to DevOps Automation

Interesting post on how documentation can be a helpful step in setting up automation of your DevOps systems: [https://www.transposit.com/blog/2020.04.02-documentation-as-a-path-to-devops-automation/](https://www.transposit.com/blog/2020.04.02-documentation-as-a-path-to-devops-automation/)

https://redd.it/fu0d33
@r_devops
Considering Microsoft platforms: Azure DevOps or DevOps Server? Most of our data is on prem

To keep things simple: we don't currently have a suitable DevOps environment, so consider this a new instance. Most of our data is on prem. It's mostly SQL Server and a couple of .NET applications hosted on our IIS server.

We hope to build a more modern DevOps pipeline. With most of our data on prem, going with Azure DevOps Server makes sense at first glance. But I also want to consider whether the cloud version might offer something we haven't thought of.

We can support the hardware requirements on-prem and scale as appropriate, so that isn't a deal breaker.

Given this background, what would you recommend: the on-prem version or the cloud version?

Thanks

https://redd.it/ftr3d3
@r_devops
Anyone use Concourse at their company? What are your thoughts?

The place I just started at uses Concourse for their CI/CD, but I’ve never used it. Seems pretty straightforward to pick up.

One thing I did notice, though, is that it seems awfully slow, and the system itself is very unreliable.

https://redd.it/fsyefh
@r_devops
Kubernetes Custom Resource Metrics

Kubernetes Operators and Custom Resource Definitions (CRDs) manage Custom Resources. We have added support for getting metrics for Custom Resources, such as CPU/memory usage, number of underlying pods created, etc. in our KubePlus API add-on tool ([https://github.com/cloud-ark/kubeplus](https://github.com/cloud-ark/kubeplus)). You can read about it in this blog post:

[https://medium.com/@cloudark/kubernetes-custom-resource-metrics-2a947dd4d954](https://medium.com/@cloudark/kubernetes-custom-resource-metrics-2a947dd4d954)

Looking for feedback on how you might use these metrics in your environments and what additional metrics you might find useful.

https://redd.it/fu6zmn
@r_devops
Good devops courses that actually teach you how to do devops and build devops infrastructure for any project?

I took some courses on Docker, but they were so basic that I couldn't apply them to a production application that communicates with 15 microservices, because of the complex port management required. So I'm wondering if there's an actually good course that shows you how to do DevOps on a really complex, non-trivial application.

https://redd.it/fu778s
@r_devops
What are some free, open-source monitoring tools comparable to SignalFx/Datadog?

We're evaluating SignalFx at work, and I was wondering what free options are out there for monitoring:

- Ruby/Rails APM
- Kubernetes
- Google Cloud: Functions, Compute, Storage, GKE
- Golang APM

https://redd.it/fugd5p
@r_devops
Does anyone know of any video training specifically on Jenkinsfiles?

I've got some courses on Jenkins, but they only lightly touch on Jenkinsfiles. Is there comprehensive training on Jenkinsfiles? I'm asking for video training because that's how I (and, I think, most people) learn fastest. Thank you :).

https://redd.it/fuk5uy
@r_devops
Automating initial install of new server

Hey all,

This is a quarantine-driven exercise, part of a workflow that has been on my to-do list for quite a while. I'm a software dev by day, but I have two servers at home that I want to experiment with in this free time, and the name of the game is complete automation.

Since this is an exercise rather than a quick fix, I'm looking to follow industry standards where possible. The one exception is that I'm learning with servers at home, so "on-prem", whereas I know the industry is standardizing around cloud-native.

Now my question for you guys is: pretending these two servers are brand-new servers at your job, how is the OS installation handled in an automated way? I can't imagine someone sitting there babysitting a new server OS install. Or, as a DevOps person, is the OS installation taken care of before you get your hands on it?

With what I currently know, PXE booting the OS seems like the best (only?) option. From my limited reading, I'm assuming this can be fully automated, as long as the BIOS boot order is set to check for PXE/network boot. On my two servers this was enabled by default, so I'll assume that's fairly standard; correct me if I'm wrong.
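
As a sketch of the per-host piece, assuming PXELINUX as the network bootloader and an unattended installer driven by a kickstart file (the RHEL/CentOS-style `inst.ks=` boot option): one of the file names PXELINUX searches for in `pxelinux.cfg/` is `01-` followed by the machine's MAC address in lowercase with dashes, so each server can get its own unattended-install entry. The kernel/initrd file names, host names, MACs, and URLs below are hypothetical.

```python
import re

MAC_RE = re.compile(r"^[0-9a-fA-F]{2}([:-][0-9a-fA-F]{2}){5}$")


def pxelinux_config_name(mac: str) -> str:
    """Per-MAC file name PXELINUX searches for: '01-' (Ethernet) + lowercase, dashed MAC."""
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    return "01-" + mac.lower().replace(":", "-")


def install_entry(kickstart_url: str) -> str:
    """Boot entry that starts an unattended install driven by a kickstart file."""
    return (
        "DEFAULT install\n"
        "LABEL install\n"
        "  KERNEL vmlinuz\n"
        f"  APPEND initrd=initrd.img inst.ks={kickstart_url}\n"
    )


def per_host_files(hosts: dict, ks_base_url: str) -> dict:
    """hosts maps hostname -> MAC; returns {pxelinux.cfg/ file name: contents}."""
    base = ks_base_url.rstrip("/")
    return {
        pxelinux_config_name(mac): install_entry(f"{base}/{name}.cfg")
        for name, mac in hosts.items()
    }
```

Once a host finishes installing, one common pattern is to swap its file for an entry using `LOCALBOOT 0`, so a machine that keeps network boot first in its BIOS order boots from disk instead of reinstalling.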

Another aspect I haven't had time to research yet is an initial-setup script of sorts, to be run right after the OS installation. Any insights here?
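
With kickstart-based installs, one common home for that initial-setup script is the `%post` section, which by default runs chrooted into the freshly installed system. A minimal sketch, assuming the goal is just to hand the machine over to configuration management such as Ansible; the `ansible` user name and the SSH key are hypothetical.

```python
import shlex


def kickstart_post(ssh_pubkey: str, user: str = "ansible") -> str:
    """Kickstart %post section that creates a user with an SSH key and
    passwordless sudo, so configuration management can take over."""
    home = f"/home/{user}"
    lines = [
        "%post",
        f"useradd -m {user}",
        f"mkdir -p {home}/.ssh",
        # shlex.quote keeps the key (which contains spaces) as one shell word.
        f"echo {shlex.quote(ssh_pubkey)} >> {home}/.ssh/authorized_keys",
        f"chmod 700 {home}/.ssh && chmod 600 {home}/.ssh/authorized_keys",
        f"chown -R {user}:{user} {home}/.ssh",
        f"echo '{user} ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/{user}",
        f"chmod 440 /etc/sudoers.d/{user}",
        "%end",
    ]
    return "\n".join(lines) + "\n"


print(kickstart_post("ssh-ed25519 AAAAexample admin@laptop"))
```

Appended to a host's kickstart file, this leaves the server reachable over SSH with key auth, and everything after that (packages, users, services) can live in playbooks rather than in the install itself.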

If I'm wildly off base, please set me straight! This feels like a tricky part of the automation workflow to research, or my google-fu is off.

https://redd.it/fuirkl
@r_devops
Network automation, Ansible, and 2FA

Hey all,

So, I'm just getting started down the network automation path.

My current infrastructure is mostly Cisco IOS and Nexus, with some Juniper and Fortinet mixed in. In the future we hope to be moving mostly towards Cumulus + Fortinet.

Right now we have multiple authentication methods for network infrastructure. It's a bit disjointed, but it could be any of:

- TACACS --> Windows AD (1FA), restricted to 2FA-enforced (CAC) hosts and/or Guacamole (Duo)
- AD (LDAP) only (1FA), restricted to 2FA-enforced (CAC) hosts and/or Guacamole (Duo)
- RADIUS --> SecurID
- LDAP --> Duo --> LDAP to AD
- TACACS --> TACACS Server --> LDAP to Duo --> LDAP to AD
- Local

On top of this, we use separate AD accounts for our privileged access to network equipment. We log into our desktops and jumpboxes with our standard accounts (CAC) and then log into devices with our privileged accounts.

One thing that's on my docket is getting *everything* 2FA'd, in some capacity. If that means a mix of 2-3 solutions, so be it...

Has anybody done 2FA in such an environment? Did you just use a bastion host with some sort of key management and push out public keys to everything via playbooks? I can't find an easy way to handle this that wouldn't be a total culture shock for everyone. I don't see us going from 0 to 100 overnight with Ansible... but at the same time, Cisco doesn't seem to support any central management of SSH keys, or using SSH keys and LDAP/TACACS/RADIUS simultaneously.

https://redd.it/fufn7m
@r_devops
Digital Ocean VPC setup

Hi y'all.
I need to replicate a multi-AZ VPC setup from AWS (with public and private subnets and a NAT gateway) on DigitalOcean. How should I go about this?
Thanks!

https://redd.it/fucydg
@r_devops
Refreshing non-prod Windows MS SQL servers on-premise in a VMware setting

How would you define / process refreshing a non-production environment?

A while back, we were getting rid of CommVault, which our DBAs had tied their refresh/restore process to. Because of timing, we needed a refresh process that didn't use CommVault, since our new backup solution was just starting up.

I was tasked with doing this via storage snapshots using Pure Storage, and the process works well. I create new datastore snapshots in Pure, translate them to VMware, and then attach them to the correct VM, hands-off. I'm using Pure PowerShell, VMware PowerCLI, Windows PowerShell, and a JSON file with the mappings/translations from source drive to destination, all in a Git repo. Everything runs in Jenkins, with a drop-down to select which environment to refresh. I can refresh a server of any size, whether 100 GB or 2 TB, in the same amount of time: about 20 to 30 minutes. One downside is that I can only run one refresh at a time because of the datastore scanning that happens in VMware.
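
Since the run is driven by a JSON mapping of source drives to destinations, a cheap safety net is to validate that mapping before touching any storage. The post doesn't show the real file, so this sketch assumes a hypothetical schema (`{"<environment>": [{"source": "HOST:DRIVE", "target": "HOST:DRIVE"}]}`) with made-up server names. It rejects a target drive mapped twice and any target that lives on a source (production) server. It's shown in Python; the same checks port directly to PowerShell.

```python
import json


def validate_mapping(mapping: dict, environment: str) -> list:
    """Return problems found in one environment's drive mappings (empty = OK)."""
    pairs = mapping.get(environment)
    if not pairs:
        return [f"no drive mappings for environment {environment!r}"]
    problems = []
    source_hosts = {p["source"].split(":", 1)[0].lower() for p in pairs}
    seen_targets = set()
    for p in pairs:
        target = p["target"].lower()
        if target in seen_targets:
            problems.append(f"target {p['target']} mapped twice")
        seen_targets.add(target)
        # Never let a refresh overwrite a drive on a source server.
        if target.split(":", 1)[0] in source_hosts:
            problems.append(f"target {p['target']} is on a source server")
    return problems


mapping = json.loads("""
{"qa": [{"source": "SQLPROD01:E", "target": "SQLQA01:E"},
        {"source": "SQLPROD01:F", "target": "SQLQA01:F"}]}
""")
print(validate_mapping(mapping, "qa"))  # []
```

Running this as the first Jenkins stage means a bad edit to the mapping file fails fast, before any snapshot is created or attached.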

I guess my question is: does this sound like a good approach to stick with and improve? Or should we be using our backup solution for automated refreshes? A downside of that, though, is that restoring the larger databases can take 8+ hours. On top of that, after restoring, they still need to scrub the data, which takes time.

https://redd.it/fum978
@r_devops