Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Windows metrics relabeling in Prometheus


Hello team, I need some help.

In Grafana I have metrics from my Windows pods (collected via windows_exporter).

If I query a metric, for example `rate(windows_container_cpu_usage_seconds_total[5m])`,

I only see a container_id (pod ID), for example docker://0040308261f7aa694ac13ca1d7fc92ee9c781892de774b4b87ea4f6a167344be, rather than a pod name or container name, because windows_exporter can only provide the container ID.

As I understand it, I need to do relabeling in the Prometheus Operator (I'm deploying it via the kube-prometheus-stack Helm chart).

I believe I need to add metric_relabel_configs in the values file, but I don't understand how to do this.



additionalScrapeConfigs: []
# - job_name: kube-etcd
#   kubernetes_sd_configs:
#     - role: node
#   scheme: https
#   tls_config:
#     ca_file: /etc/prometheus/secrets/etcd-client-cert/etcd-ca
#     cert_file: /etc/prometheus/secrets/etcd-client-cert/etcd-client
#     key_file: /etc/prometheus/secrets/etcd-client-cert/etcd-client-key
#   relabel_configs:
#   - action: labelmap
#     regex: __meta_kubernetes_node_label_(.+)
#   - source_labels: [__address__]
#     action: replace
#     targetLabel: __address__
#     regex: ([^:;]+):(\d+)
#     replacement: ${1}:2379
#   - source_labels: [__meta_kubernetes_node_name]
#     action: keep
#     regex: .*mst.*
#   - source_labels: [__meta_kubernetes_node_name]
#     action: replace
#     targetLabel: node
#     regex: (.*)
#     replacement: ${1}
#   metric_relabel_configs:
#   - regex: (kubernetes_io_hostname|failure_domain_beta_kubernetes_io_region|beta_kubernetes_io_os|beta_kubernetes_io_arch|beta_kubernetes_io_instance_type|failure_domain_beta_kubernetes_io_zone)
#     action: labeldrop
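
To make the question concrete: in kube-prometheus-stack, additionalScrapeConfigs sits under prometheus.prometheusSpec in the values file. A minimal hedged sketch of what a values override might look like — the job name, discovery role, and relabel rule below are assumptions, not a verified fix:

```yaml
# values.yaml override for the kube-prometheus-stack chart (sketch only)
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: windows-exporter        # hypothetical job name
        kubernetes_sd_configs:
          - role: node
        metric_relabel_configs:
          # example rule, mirroring the chart's commented sample:
          # drop a noisy label (placeholder name) from scraped samples
          - regex: noisy_label_name
            action: labeldrop
```

Note that relabeling alone cannot invent a pod name the exporter never exposed. The usual approach is a PromQL join against kube-state-metrics, whose `kube_pod_container_info` metric carries both the container ID and the pod/container names, along the lines of `rate(windows_container_cpu_usage_seconds_total[5m]) * on(container_id) group_left(pod, container) kube_pod_container_info`.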

https://redd.it/l9ysz8
@r_devops
Question about devops engineer's job

Hi everybody, I am currently attending an engineering school to work in IT. I have both networking and programming courses. I've taken an interest in the "DevOps Engineer" role (at least as I understand it from the Internet), and I find the available information about this job pretty thin. Could you tell me more about it? Moreover, I like to write code. Do you code as a DevOps engineer? If so, what kind of programs or scripts do you write? Is being a DevOps engineer only about network management, or is it about both?

I would like to specify that I like both network management and programming.

Thank you for your answers.

P.S.: Sorry for any mistakes. English isn't my first language and I haven't written it in three years.

https://redd.it/l9s1iz
@r_devops
When are DevOps the blockers?

Hey everyone. Been a DevOps engineer for close to a year now.

I know the purpose of our role is to streamline processes, through automation, best practices, etc. But when do we become the blockers?

I've noticed situations where devs are blocked because DevOps has to review PRs that create infrastructure through Terraform or manage it via Ansible. But our role is so busy, with tickets coming in constantly, that this just ends up blocking the devs while they wait.

Another case: most members of DevOps don't even have AWS console permissions, so we have to use the CLI to get most of our work done, which is a huge drag. For example, doing something in the console can be 10x faster than figuring out the inline JSON syntax for the CLI.

Is my experience normal? And if so, what practices do you follow that make DevOps less of a blocker, both for others and for themselves?

https://redd.it/lb4ux1
@r_devops
Recommended Git client for Windows

Linux user looking for some help migrating Windows users from TortoiseSVN on a network share to GitHub. Would people recommend TortoiseGit, GitHub Desktop, or something else for the GUI approach? Would Chocolatey's git package be the best command-line / Ansible approach? Last time I tried running the git-scm.com installer it gave weird errors. Users are on Windows 10; deploys will go to 7 or 10. Thanks!
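
If you go the Ansible route, a minimal hedged sketch using the `chocolatey.chocolatey.win_chocolatey` module — the inventory group name is a placeholder, and WinRM connectivity to the Windows hosts is assumed to be set up already:

```yaml
# playbook.yml (sketch) -- install Git on Windows hosts via Chocolatey
- hosts: windows_workstations   # placeholder inventory group
  tasks:
    - name: Install Git via Chocolatey
      chocolatey.chocolatey.win_chocolatey:
        name: git
        state: present
```

The module bootstraps Chocolatey itself if it is missing, which sidesteps the flaky git-scm.com installer entirely.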

https://redd.it/laxxrq
@r_devops
What makes a DevOps standup/scrum etc successful ?


I work on a small team at a ~100-person tech company.

We have two "Sr. SRE"s (including myself), a DBA, a buildmaster, and a newly promoted "Manager, DevOps" who is reasonably technical but was previously in a senior customer-success type role, sort of like a sales engineer, though he was on our team. There is a totally separate small team that handles "Production Operations," and I have significant overlap with them in terms of responsibilities, often more than with my own team; the only difference is that I generally work on stuff before it is deployed rather than after, in theory at least.

We've been having a Monday morning standup call/meeting since I started over a year ago, and frankly I feel like it's lacking and unfruitful. Nobody really comes prepared; we just rattle off what we are planning to work on that week. I sometimes reference my Jira queue, but there is little coherence to the whole thing and no follow-up on it. The next Monday just rolls around and we do the same thing.

I've talked to people at other companies that do standups daily or 4x/week. It seems like we should have a list of items prepared and then review our progress as a team at the end of the week or something.

So what makes a successful scrum, and what should the lead of a small team like this be doing to ensure productivity and success?

https://redd.it/lb29ei
@r_devops
What is the best infrastructure as code tool for dynamic deployment?

I am searching for the best tool to dynamically manage and edit my infrastructure on the fly. By that I mean: we have a Kubernetes cluster on AWS, Azure, or Google Cloud, and I want to add a new worker group on the fly and delete it when I am done.
At the moment I am using Terraform to set up the cluster, but our product needs to continuously manage the underlying infrastructure. It should be able to cover at least the three bigger players (AWS, Azure, Google), and another requirement is that any tool I use must have a Golang client or expose an API so I can interact with it from Go.
Any suggestions?

https://redd.it/lasett
@r_devops
[HELP] Octopus deploy: Run a script in a worker container from an external feed

Hi all,

Pre-apologies if this does not fit this sub.

**What I am trying to do:**

* I am using hosted Octopus Deploy. I need to run an AWS CloudFormation job from an Octopus worker. The job is written in NodeJS.

**How I am trying to do it:**

* I am trying to use a dynamic Ubuntu 18.04 worker node that is offered on the hosted Octopus Deploy variant.
* I am trying to get the Ubuntu worker node to spin up a custom Alpine container from AWS ECR that contains all dependencies for the deployment. (I don't want to have to install all dependencies on the worker directly per deployment).
* I am trying to run a script in a GitHub repository on the Alpine container.

**What is happening:**

* The Ubuntu worker node gets the Alpine container and the GitHub repository containing the script successfully.
* The GitHub repository is unzipped to `/home/Octopus/work/xxxxxxxxx-xxxx-50` (notice the `50` at the end). This is the current working directory for this step.
* The Ubuntu worker then spins up the Alpine container successfully, but mounts the **wrong** directory. It mounts `/home/Octopus/work/xxxxxxxxx-xxxx-51` (notice the working directory has changed `51`).
* The script attempts to execute and fails with `standard_init_linux.go:211: exec user process caused "no such file or directory"`, which appears to be a Docker error message. A verbose message states that `Process /bin/bash in /home/Octopus/work/xxxxxxxxx-xxxx-51 exited with code 1`.

**What I have already tried:**

* Running the same steps above, but directly on the Ubuntu worker. This succeeds, as the working directory does not change from `50` to `51`.
* Moving the package to `/home/Octopus` (as this directory is mounted within the Alpine container), then using an inline script (i.e. copy/pasted code directly into the step rather than an external script from GitHub) to call the script that was moved to `/home/Octopus`. This fails as the inline script in the step is being placed into working directory `50` then `51` is being mounted to the container.
* Simply running an inline script within a container. Also fails with the same issue of the wrong working directory.


Any help to be able to even run a `Hello World` script from within my Alpine container will be greatly appreciated!!!

https://redd.it/lautoz
@r_devops
Is Jenkins mandatory for beginner?

Hello,

I wanted to pick a CI/CD tool for my next dev project, just to learn how it all works. I wanted to make use of GitHub Actions, since I use GitHub extensively. However, everyone everywhere recommends learning Jenkins, and I get the feeling that it's everywhere and I will miss out. Is it a good idea to learn something else first, like GitHub Actions? Are those skills transferable between CI/CD technologies?
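
For what it's worth, the core concepts (triggers, jobs, steps, runners/agents) do transfer between CI/CD tools; a minimal GitHub Actions workflow is just one YAML file in the repo. A hedged sketch — the build command is a placeholder:

```yaml
# .github/workflows/ci.yml -- minimal CI pipeline sketch
name: CI
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build and test
        run: echo "replace with your build/test commands"
```

The same trigger/job/step shape maps onto a Jenkins declarative pipeline's stages and steps, so starting with Actions does not lock you out of Jenkins later.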

Thanks :)

https://redd.it/larz2v
@r_devops
Is The DevOps Handbook a good read to learn and get into DevOps?

I am looking for a book to read, and I keep seeing The Unicorn Project and The DevOps Handbook around. I read The Phoenix Project, which I liked, so I am going for something along that line.

https://redd.it/lbczug
@r_devops
Discussion: Agility

I've been reading The Phoenix Project. I quote, "...get your build process automated." "Get humans out of the deployment business. Figure out how to get to ten deploys a day." (Chapter 30)

Getting 10 deploys a day was just a number that Erik threw out as a baseline for Bill to work towards; the threshold can be whatever works for your business.

I wanted to open this up to a discussion about agility in your day to day.

1. What does your current build process look like? (Is it fully automated from Dev to Prod? Is human interaction required?)
2. How did you get your build process automated? (tools/solutions used)
3. How many deploys are you pushing a day/week/month?
4. What is your deploy frequency? (on demand/daily/weekly/monthly?)
5. What constraints or bottlenecks, if any, are you still hitting?

https://redd.it/lb99en
@r_devops
Client side javascript usage - best practice how to analyze it?

So we have moved our applications to the cloud, in Azure Kubernetes Service. We have the Prometheus stack installed for some interesting analysis.

Our main website is a React client-side website. We want to monitor how our users use it: which menu items they press, what configuration preferences they have stored in their local storage, etc.

We already have a method to store "server side" metrics via prometheus-net (it's a .NET application).

I have looked into Azure Application Insights. They mention things like monitoring Retention, Funnels and Users, sessions, and events analysis. I am not sure if this makes sense for our scenario.

I am curious, what is the best practice way to gather this type of clientside data?

https://redd.it/lb5nxo
@r_devops
CI-CD Platform (A new initiative) --- Trying to make CI-CD processes smoother

Hey Everyone,

I am a Product Manager, driven by the aim of bringing innovation to B2B SaaS and delivering groundbreaking product initiatives.

One of the opportunity areas we have identified is "making DevOps teams less busy and more productive." We envision achieving this by conceptualising and building a "CI-CDaaS platform" that offers a seamless experience and efficient software-delivery pipeline management, without hiccups or much manual intervention.

If your daily life involves a tussle with CI-CD processes, understanding the challenges you face is of the UTMOST IMPORTANCE to us and will help us identify the right problem areas to solve.

Below is the link to a short survey to get your thoughts:

https://forms.office.com/Pages/ResponsePage.aspx?id=2bCoUwDZzEidflk13I1bF0GBbFQhc-5BqssHSqnq37NUMFk2QUFVWTZVSFJKNUxPUUdSMFlaSlBDWC4u

I would sincerely appreciate your response.

Cheers...!!

https://redd.it/larjvh
@r_devops
What belongs in an RPM vs a Config Management Tool (Ansible)?

Currently my team is trying to put things like:

* Creation of new users on a machine
* firewalld rules

into RPMs. This seems wrong to me though I can't really place why. I am newish to the team and only 3 years into my career though, so I don't have as much perspective as others. Is there a good reason why those things should be in an RPM versus an Ansible role? More generally, how do you delineate what belongs in an RPM versus Ansible?
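
For comparison, here is what those same two tasks look like on the config-management side — a hedged sketch of Ansible tasks, where the user name and port are placeholders:

```yaml
# tasks/main.yml of a hypothetical role (sketch; name and port are placeholders)
- name: Create an application user
  ansible.builtin.user:
    name: appuser
    state: present

- name: Open a service port in firewalld
  ansible.posix.firewalld:
    port: 8080/tcp
    permanent: true
    immediate: true
    state: enabled
```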

I don't think the other more senior developers on my team like Ansible and would prefer to put as much as possible in the RPM and document the rest of what needs to be setup. The RPMs and documentation would be given to other teams. However, I manage our own VMWare environment where I would much prefer to use Ansible to update development clusters.

To be fair, I think my predecessor put too much in our Ansible roles and, rather than using group_vars, put all variables in an included variable file. Every dev on our team then runs the Ansible playbooks to update their own cluster. Furthermore, every development cluster has its own yum repository (running in a Docker container; no one on our team uses or really wants to use Docker), which the Ansible roles copy RPMs to. In short, I think some of the design was overly complex and left a bad first impression on some of the other developers.

I am sorry for being scatterbrained, but the more I type, the more I wonder if this is a process / education / culture problem rather than a which-tool-to-use problem. Does any of this seem wacky?

Thanks, I appreciate any help. I don't really have anyone at work to talk with for advice.

https://redd.it/lbdhzi
@r_devops
wrap.sh: an experiment in devops UX

My team has been looking into ways of improving the devops experience for a while now, and testing our theories through the spontaneous creation of a bunch of small tools.

One theory we examined goes as follows:

1. The projects with bad CI experiences have poor, slow and low-signal feedback from their CI pipelines.

2. The most direct signal you can get is from direct access to the broken version of the project

3. So if a project lets you connect directly to the pipeline when a test fails, that should come with a good CI experience.

This led to one of the more promising experiments: Wrap.sh - an attempt to increase feedback and visibility in CI pipelines. It's also open-source.

We don't fully know what to expect from this thing; it might be useful to someone, maybe for OSS projects using end-to-end tests. Or maybe not.

In any case, we'd love to hear your questions and feedback regarding the tool (or the theory it's based on).

The links:

https://wrap.sh

https://github.com/layer-devops/wrap.sh

https://redd.it/lbdmgo
@r_devops
Containers monitoring solution

Hello Folks,

I am working on a task comparing the different monitoring tools currently out there for containers/k8s. I understand Prometheus has some native support, like service discovery for Kubernetes, but I would like to understand who the close competitors are. I am looking for about 6-7 tools for a deeper evaluation. A quick Google search returns many tools, but I would like to hear from the community experts about your preferences for monitoring k8s clusters. Any useful input on this is highly appreciated.

Thanks

https://redd.it/larf53
@r_devops
Solving ArgoCD Secret Management with ArgoCD-Vault-Plugin

Hi everyone, I wanted to share an ArgoCD plugin that I have been working on that allows connecting to Vault in a simple way that does not require an Operator or CRD. The plugin is in its early stages and only supports a couple of backends, but we look forward to any contributions, suggestions, or ideas you may have!

https://werne2j.medium.com/argocd-secret-management-with-argocd-vault-plugin-539f104aff05

https://redd.it/lbpcpp
@r_devops
How do you automate AWS AMI updates?

I currently manage most of our infra with terraform. I have a module that returns the latest AWS AMI for a particular service (EKS, ECS, etc). This means that whenever we run a terraform plan for a project that uses the service, the plan will include an AMI update if AWS has released a newer AMI. This has worked fine but I'd like to make this a little bit more stable. I'd like to have the latest AMI run for a while in our non-prod environments and then have some sort of approval process so that production gets updated later. Any ideas on how to make this work? Or any ideas for an alternative approach?
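
One common pattern (a hedged sketch, not a prescription) is to stop resolving "latest" at plan time and instead pin the AMI behind an explicit variable promoted per environment. The EKS SSM parameter path is the one AWS publishes, but the variable wiring and Kubernetes version here are assumptions:

```hcl
# Pin the AMI per environment instead of always tracking "latest".
variable "ami_id" {
  description = "AMI pinned for this environment; empty means fall back to latest"
  type        = string
  default     = ""
}

# AWS publishes the recommended EKS worker AMI in SSM Parameter Store.
data "aws_ssm_parameter" "eks_ami" {
  name = "/aws/service/eks/optimized-ami/1.21/amazon-linux-2/recommended/image_id"
}

locals {
  effective_ami = var.ami_id != "" ? var.ami_id : data.aws_ssm_parameter.eks_ami.value
}
```

Non-prod tfvars leave `ami_id` empty and track the latest AMI; once it has soaked, the "approval process" is simply the reviewed PR that writes the tested AMI ID into the prod tfvars.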

https://redd.it/lbrfhs
@r_devops
transfer thousands of files of any size with optimization

We have been using a mix of manual processes and some scripts to transfer files of various sizes from one system to another. Basically, there are shares where people may dump hundreds or thousands of files of varying sizes, and we then move these files to another location.

We want a tool that would transfer the files automatically, optimizing speed/performance based on file size and count. (NiFi maybe?)
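
Pending a proper tool evaluation, one hedged baseline is rsync, which batches many small files over a single connection and can resume partial large-file transfers; NiFi or a managed transfer service may still be the better fit. A sketch, where the "move" semantics are an assumption about your workflow:

```shell
#!/bin/sh
# sync_dump SRC DST -- mirror a dump directory into DST, then delete the
# transferred source files ("move" semantics; note rsync leaves the
# now-empty source directories behind).
sync_dump() {
  src="$1"
  dst="$2"
  # -a                    recurse and preserve attributes
  # --partial             keep partially-transferred files so large ones resume
  # --remove-source-files delete each source file once it has transferred
  rsync -a --partial --remove-source-files "$src" "$dst"
}
```

Run on a schedule (cron/systemd timer), this gives an automated move with resumable transfers; parallelizing across several rsync invocations is the usual next step when one stream saturates.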

https://redd.it/lboo0s
@r_devops