Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Terminating Elegantly: A Guide to Graceful Shutdowns

For applications deployed in orchestrated environments (e.g., Kubernetes), graceful handling of termination signals is crucial.

I prepared this repo to demonstrate how to do it in Go/Kubernetes to make sure there is no loss of requests/data - https://github.com/plutov/packagemain/tree/master/graceful-shutdown
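
For readers who don't want to open the repo, here is a minimal sketch of the general pattern. It is written in Python rather than Go and is not taken from the linked repo. The names `handle_termination`, `drain` and `main`, and the sleep-based "requests", are made up for illustration. The idea: on SIGTERM stop accepting new work, then give in-flight work a bounded time to finish before exiting.

```python
import signal
import threading
import time

shutdown_requested = threading.Event()


def handle_termination(signum, frame):
    """Signal handler: only record the request, do the real work elsewhere."""
    shutdown_requested.set()


def drain(workers, timeout_s):
    """Wait up to timeout_s seconds in total for in-flight workers to finish.

    Returns True if every worker finished before the deadline.
    """
    deadline = time.monotonic() + timeout_s
    for worker in workers:
        worker.join(max(deadline - time.monotonic(), 0.0))
    return not any(worker.is_alive() for worker in workers)


def main(grace_period_s=20.0):
    # Kubernetes sends SIGTERM, waits terminationGracePeriodSeconds, then SIGKILL.
    signal.signal(signal.SIGTERM, handle_termination)
    signal.signal(signal.SIGINT, handle_termination)

    in_flight = []
    while not shutdown_requested.is_set():
        # Placeholder for "accept one request": each request is a short sleep.
        worker = threading.Thread(target=time.sleep, args=(0.1,))
        worker.start()
        in_flight = [w for w in in_flight if w.is_alive()] + [worker]
        shutdown_requested.wait(0.5)

    # Stop taking new work, then let what is already running finish.
    return 0 if drain(in_flight, grace_period_s) else 1
```

Calling `main()` runs the loop until the process receives SIGTERM or Ctrl+C. The 20-second default is an assumption chosen to stay below the pod's `terminationGracePeriodSeconds` (30 seconds unless configured otherwise).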

https://redd.it/1e5sdp4
@r_devops
What was the most challenging bug you ever fixed?

What's the most challenging bug you ever fixed?

Share your toughest debugging stories! 🚀

https://redd.it/1e5pffl
@r_devops
I am so baffled

Can someone explain to me what DevOps is? I am starting a DevOps role this September, which is a 4-year apprenticeship with uni, but have nooo idea what DevOps is. I've been reading the thread and see a lot of different versions of what responsibilities they have. Is DevOps a support role like IT where you’re just a fixer? Am I working on the cloud using AWS? What the flip is a Docker? Is DevOps a mix between being a support engineer and a cloud engineer? Any help would be appreciated!!

https://redd.it/1e60ae4
@r_devops
We have a "code sync up" meeting after our standup that I find useless..

Some of our devs want to discuss our code more and requested an additional daily meeting an hour after standup..

I kind of got a bit flustered and said something along the lines of..

- We're all senior+ devs here.. if you have an issue just bring it up after standup..

- Put a PR together and i/we can review your code and provide feedback

- 9AM - 10AM standup then 11AM code review standup kills my entire morning. What. the. fuck my dudes.. figure it tf out or gtfo..

- If we're all actually senior devs then we do not need an additional meeting to dive deeper into stories/code.. (honestly i got quite flustered and pushed this point lol)

..and yes i'm actively looking for an internal transfer.. love the company but this team is just odd

https://redd.it/1e5zvd9
@r_devops
Build server specifications

Hi everyone,

I'm planning to set up a new on-premise build server for our development team and could use some advice on the specifications. Here are the key details and requirements:

Project Details:

Type of Projects: A mix of C++ and C#
Number of Developers: Around 15 developers.
Build Frequency: Multiple builds per day, with CI/CD pipelines (AzureDevOps).
Expected Load: Simultaneous builds for different PRs

Current Specifications:

vCPU: VM_1 24, VM_2 24, VM_3 24
RAM: VM_1 24, VM_2 24, VM_3 16
OS: Windows Server

The 3 VMs run in a vSphere cluster under VMware. The physical machine is shared with testing and PO environments. We would like to build a dedicated build server.

Currently the total build process of a PR takes 1 hour. Some applications build on VM_1, some on VM_2 and others on VM_3.


In my wettest dreams I'd love a Docker configuration. Anyway, we would like to bring the PR build time down to 10-20 minutes.

Additional Information:

Budget: Open to suggestions, but looking for a balance between performance and cost-efficiency. They actually asked for 2 tiers: a mid-tier solution and a beefy solution.
Scalability: Should be able to scale with increased load in the future.
Other Requirements: Suggestions for backup solutions, redundancy, or any other considerations would be appreciated.

Any recommendations or experiences you can share would be incredibly helpful.

Thanks in advance!

https://redd.it/1e65bmf
@r_devops
Dependency Track not showing components (and vulnerabilities) for some SBOMs made with Syft

I'm using Dependency Track to monitor for vulnerabilities on multiple systems. I create an SBOM in CycloneDX 1.6 format using Syft and then import the SBOM into Dependency Track. The problem is that for some systems I upload the SBOM and the system accepts it without complaining that something is wrong but then nothing happens. The component list just stays empty and nothing is shown.

For other systems doing the same works just fine.

Any ideas what could be wrong?
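
One way to narrow this down is to compare an SBOM that imports fine with one that ends up empty. The sketch below assumes the SBOMs are CycloneDX JSON files. `summarize_sbom` and `summarize_sbom_file` are hypothetical helper names, and the fields read (`bomFormat`, `specVersion`, `components`, `purl`) are standard CycloneDX JSON keys. Whether the installed Dependency Track release accepts spec version 1.6 is worth checking in its release notes; nothing here confirms that as the cause.

```python
import json


def summarize_sbom(document):
    """Return the fields worth eyeballing before uploading a CycloneDX JSON SBOM."""
    components = document.get("components") or []
    return {
        "bom_format": document.get("bomFormat"),
        "spec_version": document.get("specVersion"),
        "component_count": len(components),
        # Components without a package URL are harder to match against vulnerability data.
        "missing_purl": sum(1 for c in components if not c.get("purl")),
    }


def summarize_sbom_file(path):
    """Load a JSON SBOM from disk and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_sbom(json.load(handle))
```

Running `summarize_sbom_file` on a working and a failing SBOM and diffing the two results shows quickly whether the failing file is already empty when it leaves Syft or whether the components are lost during import.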

https://redd.it/1e65jfu
@r_devops
9LPA TO 8LPA developer to cloud engineer Pune

Hi friends. I currently earn 9 LPA as a front-end developer with 2.7 YOE. I want to transition to a cloud/infra role for a good career in the future.

I got an opportunity to interview for a cloud engineer role requiring 2-3 years of experience. I have acquired all the skills the company needs.

The actual company is a big tech giant, but the consultant is telling me that the budget is 8 LPA.

Will this pay cut make my future better, or should I continue with development?

https://redd.it/1e66mq5
@r_devops
How to get real time work experience as a Devops engineer?

Hello, I'm from Hyderabad, Telangana, India. I completed DevOps training two months ago and have been practicing for 3-4 months. Previously, I worked for 2 years in a finance company and left because of the low salary and increasing targets. Prior to that, I worked in a web hosting company until the COVID-19 pandemic and gained experience working with servers and troubleshooting errors related to hosting and websites.

So after knowing about Devops subject and considering my previous experience with servers, I've joined a training institute to learn Devops with AWS and Linux. The topics I've learnt are : Git, Github, Maven, Jenkins, Nexus,Tomcat, Ansible, Docker, K8s, Prometheus, Grafana, Argocd, Helm and Terraform.

Now, before applying for DevOps jobs, I want to experience what real-time work looks like. What should I do? I'm ready to work full-time for free in exchange for gaining real-time experience. Any suggestions or help is appreciated. Sorry for the lengthy post.

Thanks.

https://redd.it/1e66g9e
@r_devops
Why deploy Argo components (Workflows, ArgoCD, Events, Rollouts) in different namespaces?

I'm looking to stand up the full Argo ecosystem in a test cluster to try out the full Argo-flavoured CI/CD system. In the Argo docs each component is to be deployed in its own namespace:

- ArgoCD -> `argocd` namespace
- Argo Workflows -> `argo` namespace
- Argo Events -> `argo-events` namespace
- Argo Rollouts -> `argo-rollouts` namespace

Why is this?

Wouldn't it make more sense to deploy them all in an `argo-system` ns for example?

If anybody has deployed the different components to a single common namespace, I would love to hear about your experience.

https://redd.it/1e67txc
@r_devops
What are some of your life hacks when it comes to DevOps? Share your tricks

It was only recently that I learnt about Firefox containers. Really cool feature that allows me to have multiple different AWS accounts open at the same time. I used to have to keep different browsers open for this.

Good documentation is also a good one. I try to document pretty much everything I do. That way whenever I get stuck, I hopefully have a note somewhere that helps.
I also always have a tab open on the far left of my browser for ChatGPT.

Really interested to hear any tips you all have for getting a tiny bit further in your day to day work.

https://redd.it/1e6b4al
@r_devops
Who deploys and manages API Gateway

Folks - I have a question on API gateway usage. Who actually uses API gateways? Who sets them up and manages them? Is it an infrastructure engineer who sets them up and manages them, while devs use them to configure routes?

https://redd.it/1e6asow
@r_devops
What are some practices you follow to reduce cloud infra cost ?


I have been advised to look at cloud cost across AWS and GCP and it’s wild. Is there anything you do that helps you control infra costs? These are demo or test envs, by the way. The resources with the highest cost are:

Compute engines
Cloud filestore
Kubernetes engine
RDS

https://redd.it/1e6ca8v
@r_devops
Triggering alerts for PrometheusRules in a multi cluster setup

We deployed kube-prom-stack in a multi cluster setup where Thanos is deployed on the observability cluster and can see all of the rules we configure. Thanos does this by reading a path we provide it. Just for context we're doing something like this:

Deploy Thanos with Terraform and provide it with a values file:

...
...
values = [
  "${templatefile("chart_thanos_values.yml", {
    cluster_name = local.cluster_name,
    ...
    ...
    ...
    alerts = join("\n", [
      for fn in fileset("", "./monitoring-rules/prometheus/*.yml") : file(fn)
    ])
  })
  }"
]
}

Then in the values file we pass alerts:

ruler:
  enabled: true
  clusterName: ${env}-ruler
  alertmanagers:
    - https://prometheus-kube-prometheus-alertmanager:9093
  config: |-
    groups:
      ${indent(6, alerts)}
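
To make the templating step easier to follow, here is a small Python emulation of what the `join` and `indent` calls produce. It is only an illustration, not Terraform itself. It assumes `groups:` sits four spaces deep under `config: |-`, and the two rule snippets are made-up stand-ins for files in ./monitoring-rules/prometheus/. Terraform's `indent` pads every line except the first, which is why the placeholder's own position in the template supplies the first line's indentation.

```python
def tf_indent(spaces, text):
    """Mimic Terraform's indent(): pad every line except the first one."""
    return text.replace("\n", "\n" + " " * spaces)


def render_ruler_fragment(rule_files):
    """Build the text that ends up under `config: |-` in the values file."""
    alerts = "\n".join(rule_files)  # same as join("\n", [...]) in the Terraform snippet
    # The first line gets its padding from where the placeholder sits in the template.
    return "    groups:\n      " + tf_indent(6, alerts)


# Hypothetical contents of two files under ./monitoring-rules/prometheus/
rule_a = "- name: node\n  rules:\n    - alert: NodeDown"
rule_b = "- name: api\n  rules:\n    - alert: ApiErrors"

print(render_ruler_fragment([rule_a, rule_b]))
```

In the printed result both `- name:` entries land at the same six-space depth under `groups:`. The ruler configuration built this way only contains what was in that folder when Terraform rendered the template.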


Up until recently we never created PrometheusRules manually; we only defined them in that ./monitoring-rules/prometheus/ folder.

For testing purposes I created a rule manually on our staging cluster. The rule is visible in Thanos (obsrv cluster) and it's even in firing state, but Alertmanager doesn't pick up the rule, so we're not getting the alert.

I'm pretty new to Prometheus and maybe I'm missing something, but how do I make my Alertmanager see these rules? Eventually we're planning on creating multiple rules on different clusters, but either it's not possible with our current config, or I'm just not doing something right.

I tried moving to Grafana Alerts. It sees all of the alerts, including the manual ones, it sees that it's firing but I wasn't able to make them fire from Grafana's alert manager. It seems like it's not possible for Grafana to alert on non-Grafana-managed rules.

Any help would be appreciated.

https://redd.it/1e66d4j
@r_devops
Career Advice Network Engineer -> Software / CloudDev / DevOps

Good day,

Looking for advice on the above.

Essentially I am currently in a Helpdesk role with a company and looking at paths to further my career.

Preferably, the end goal would be for a remote position, however, that is not a requirement.

Current certification is primarily CCNA, of which I am pursuing my Cisco DevNet as well.

I've played around a bit with software development in a small number of languages, as well as Docker, which I find rather fascinating. I'm not 100% sure which path would work best for me, but I am still researching what each position entails and would, if possible, like to hear from people already in similar roles who wouldn't mind offering some guidance.

I have considered looking into a BSc in Computer Science from the University of London, however, with my current age, (31) I'm not sure how feasible that would be.

Any and all advice, suggestions, opinions are welcome.



https://redd.it/1e6gifa
@r_devops
Documentation

Shout out to y'all who spent hours writing support documentation that will never be read and will sit stashed away in Confluence until the end of time. Peace out homies.

https://redd.it/1e6ii75
@r_devops
Sysadmin here - do you manage your software yourself or let admins do it?

Hello,

Sysadmin here, currently updating software via SCCM, to get rid of some vulnerabilities. I've noticed that a lot of dev & devops users do not update their software (docker, python etc).

Since I'm a sysadmin, I'm more than happy to do it for you in bulk, but I'm aware that developer apps are very delicate and can break when updating.

So my question is - would you rather receive an email giving you a month to update your apps (after that time, it's my time to shine), or do you not care and want admins to do it for you?

I realize the first option may not work, as probably a lot of people would just ignore an email.

All thoughts appreciated, thanks.

https://redd.it/1e6l38a
@r_devops
DevOps for industrial automation - SCADA, PLC controllers and the like (rant and a question)

/ ===== OPENING RANT =====
Hope you enjoy my writing
It provides context for the question
But it is not required to understand it
Skip to the next comment like this if you don't care :(
======================== /

I got hired as the IT team at a small company two weeks ago. I'm not even out of university and I'm already an entire engineering department, cool. We do mainly PV substations, construction and maintenance; but also home automation, power grid connections and the like. Since the company is small (less than 10 people) I also do the gritty industrial stuff, both in the office and on-site, in addition to being a code monkey and the like.

Hailing from the software engineering world, I have a very particular take on the process of creating stuff. I have a nice modern code editor with bells and whistles, variable names are long, I write tests, commit changes to a VCS, run tests, maybe even automate running tests. Sometimes I even automate deployments! There's also the project management side - GitHub issues, projects, checklists, TODOs in code and out of code. Libraries are well documented (usually), or at the very least, I can look at the code.

Imagine the whiplash I got when I opened the SCADA software we use. It's older than me, the documentation is impenetrable (or maybe I just don't get it), and one of the main protocols is broken (though we don't know who's responsible, both implementors blame each other). Support for automating away boilerplate is almost non-existent. Did you know you can use AutoHotkey as an ad-hoc "code" generator? It's really neat! The SCADA uses JS as its scripting language. The engine has probably not been updated since I was born. It does not even have standard types, so you have to learn a custom `String` type. The system also uses a proprietary data format - it's a bunch of XML nodes glued with binary data. You can manually edit that, but only to a limited extent.
Okay, so maybe it'll get better when we get to safety-critical systems. After all, they better not fail. It would be quite unwise to, I dunno, not disconnect a 500 kW PV plant when the protection and management controller loses power, wouldn't it? This is not an edge case, right? You probably don't want the grid to exist in an unmanaged state. Wouldn't it be unfortunate if this specific scenario- who am I kidding, this happened today. No damage done, beyond a reputation hit, because it was done during project hand-off to the owner. Testing is done by finishing the contract and hoping nothing will explode or catch fire. I don't think the editor even supports tests, nor can I really check because it's proprietary software and a second copy has not yet been bought.
Version management does not exist. It's just not a thing. I copied the SCADA design file I was working on to my local computer, renamed it to include the feature I was working on and I upload it to the main server when I'm done for the day, with a README.txt in which I describe what's been done and what's left to do. I fuck something up? I better remember what I did, or revert to yesterday's copy entirely. Editor history is sketchy at best. Merging changes from two different people to two different things? You wish. One of us will need to manually copy the changes to their project. What changes? God himself could not diff those files and neither can I.
Project management? Done with notebooks. Sometimes. By some people. Usually we just wing it. I don't know what others are doing, and they don't know what I'm doing. I started writing READMEs, but I don't think this will catch on. 


/
===== QUESTION TIME =====
I apologize for any incoherence
I am currently kinda sick
And also falling asleep
Hope it was enjoyable anyway
========================= /

How would you go about DevOps for industrial automation? I can use Git with a self-hosted frontend for tracking changes, but that's not really enough. Files are in a semi-binary format, so a standard diff isn't the right tool, and merging will basically have to not be done. I'm thinking of rolling a custom tool, specifically for working with those files, both for diffing and merging, but that will require reverse-engineering the file format and also plenty of time. Is there anything that can be done in the meantime? What about testing? I've read a paper about a similar situation, and, inspired by it, I'm considering hooking up the controller to a simulated IO device and either rolling my own or adapting an existing test harness to use real world hardware.
What about deployment? I think it's done rarely enough that doing it manually is fine, but still, not having to do that would be cool. Would something like Ansible work here?
If you have had experience dealing with similar systems, could you share any tips, mistakes you have made and the like?
Feel free to be imaginative - I have very few workflows and people to fight. Securing funding might be difficult, but I'm willing to give wacky ideas a shot.
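
As a stopgap for the diffing part, something like the sketch below might already be useful. It assumes nothing about the proprietary format beyond "XML nodes glued with binary data": it pulls out runs of printable ASCII (much like the Unix `strings` tool) and diffs only those. `text_runs` and `diff_blobs` are hypothetical names, and changes inside the binary sections stay invisible to it.

```python
import difflib
import re


def text_runs(blob, min_len=4):
    """Extract runs of at least min_len printable ASCII characters from bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(blob)]


def diff_blobs(old, new, old_name="old", new_name="new"):
    """Unified diff of the readable parts of two semi-binary files."""
    return list(
        difflib.unified_diff(
            text_runs(old),
            text_runs(new),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


# Made-up sample data standing in for two versions of a design file.
before = b'\x00\x01<node name="pump1"/>\xff\xfe<node name="valve"/>\x00'
after = b'\x00\x01<node name="pump2"/>\xff\xfe<node name="valve"/>\x00'

for line in diff_blobs(before, after, "yesterday", "today"):
    print(line)
```

A script like `text_runs` could also be hooked into Git as a textconv diff driver, so that `git diff` shows the readable parts of each revision. That is an untested idea for this file format, not a known-good setup.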

This isn't really fitting here, so feel free to point me in the direction of more fitting subreddits, but I already wrote an essay so who cares if it's a bit longer. Any suggestions regarding automating some parts of the workflow? AutoHotkey really does help, but it's flimsy. One wrong press of a button and you entered a string of commands that did god knows what. Changing parameters requires opening a code editor and replacing strings. Is there anything short of rev-engineering the custom file format that would allow me to not do the same thing 30 times in a row?

https://redd.it/1e6lq7j
@r_devops
Anyone else sitting here waiting for Azure to come back up?

Been hours now, we are currently trying to move 25TB of data from one cloud hosting to another while hoping Azure Central US comes back up.

https://redd.it/1e6qe2o
@r_devops
How Do You Automate Your Status Pages?

Hi r/devops community,

I'm looking for advice and best practices on automating status pages for monitoring service health and notifying users of outages or performance issues. Specifically, I'm considering using Instatus to create and manage our status page.
Here's a bit of background:

I'm running multiple Kubernetes services, and I want each service to have its own component on the status page.
The goal is to automate the process of updating the status (Operational, Partial Outage, Major Outage, Degraded Performance, Under Maintenance) for each service.

Before I dive into implementing anything, I wanted to ask:

1. How do you automate your status pages?
2. What tools and processes do you use?
3. Any tips or best practices for integrating Kubernetes with a status page tool like Instatus?
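
To make the goal concrete, here is a rough sketch of the mapping step only: turning replica counts for a Kubernetes service into one of the five statuses listed above. The function name, the 50% threshold and the idea of driving it from ready/desired replicas are all assumptions for illustration. Fetching the counts from the cluster and pushing the result to Instatus are deliberately left out.

```python
def component_status(ready, desired, maintenance=False, degraded_ratio=0.5):
    """Map replica counts for one service to a status page state.

    The thresholds are illustrative assumptions, not a recommendation.
    """
    if maintenance:
        return "Under Maintenance"
    if desired <= 0 or ready <= 0:
        return "Major Outage"
    if ready >= desired:
        return "Operational"
    if ready / desired >= degraded_ratio:
        return "Degraded Performance"
    return "Partial Outage"


# Hypothetical services with (ready, desired) replica counts.
services = {"api": (3, 3), "worker": (2, 3), "search": (1, 4), "billing": (0, 2)}
statuses = {name: component_status(*counts) for name, counts in services.items()}
print(statuses)
```

Keeping this mapping as a pure function makes it easy to test and to adjust the thresholds per service before wiring it to any status page API.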

I appreciate any insights or feedback!

https://redd.it/1e6xekh
@r_devops