Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Need help in my research on deciding on a centralized logging system

# Background

I am the sole person looking at setting up a robust infrastructure for my company (the IT department of an audiobook studio that has become a subsidiary aiming to help modernize the publishing business), which is not a big one. So I would like input from some people who are a bit more knowledgeable than me, if that is okay.

I am working on our CI (GitHub Actions), combined with Portainer to get some kind of CD using GitHub releases. It is not perfect at all, but what can I do as the only one looking at these things?

I have also set up some basic monitoring of our virtual machines with Prometheus and Grafana.

# Setup
We are mainly hosting on a physical server in a basement, which is not optimal, but we have to connect to CD production robots on the local network to produce physical CDs (mainly for the public libraries served by the origin company), and we have a lot of audio data, roughly 140 TB.

I am working on splitting our stack such that resources can be split between locations.

I have managed to set up a WireGuard connection that makes it possible for the different servers to talk to each other over the internal network.

The application the development team is working on is hosted on a docker swarm, which is currently hosted on the physical machine.

# Back to the question

I have looked at rsyslog, syslog-ng, fluentd and loki.

I started with rsyslog, but I cannot seem to configure it the way I want. I haven't tried the others yet; each seems good in its own way.

One thing I would really like is to achieve the same as I did with Prometheus: a pull mechanism instead of a push. Also, the lead developer needs to be able to look through the logs in plain text so he can grep them and such.
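For the plain-text/grep requirement, a minimal central-syslog sketch with rsyslog might look like the below (hostnames, ports, and paths are placeholders; note that rsyslog forwarding is push-based, unlike Prometheus):

```conf
# central server (/etc/rsyslog.d/central.conf): accept syslog over TCP and
# write one plain-text file per sending host, so logs stay grep-able
module(load="imtcp")
input(type="imtcp" port="514")
template(name="PerHost" type="string" string="/var/log/remote/%HOSTNAME%.log")
*.* action(type="omfile" dynaFile="PerHost")

# each client (/etc/rsyslog.d/forward.conf): forward everything to the server
*.* action(type="omfwd" target="logs.internal.example" port="514" protocol="tcp")
```

Loki's agent (Promtail) is also push-based, so if pull is a hard requirement, none of the four options listed maps exactly onto the Prometheus model.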

https://redd.it/11m1i2r
@r_devops
How do you visualize pod health alongside application health?

K9s and OpenLens seem pretty slick (compared to kubectl) for seeing the status of the pods in my cluster (CPU, Mem, etc.).


For my application metrics, I've used tools like Wavefront, Datadog, and CloudWatch.


Does anyone else have a tool that can combine both? I've been able to do this with metrics platforms (by emitting pod stats as metrics), but that approach has issues when pod counts increase.
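One common way to get both into one view is to scrape kube-state-metrics/cAdvisor into Prometheus and join pod health with app health on the shared `pod` label in a single Grafana dashboard. A rough PromQL sketch (metric names assume kube-state-metrics and a typical HTTP app exporter):

```promql
# pod health: container restarts per pod
sum by (pod) (kube_pod_container_status_restarts_total)

# app health: 5xx error rate per pod, plotted on the same panel/dashboard
sum by (pod) (rate(http_requests_total{status=~"5.."}[5m]))
```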


I've been working on and using something that can do this on a pod/app metric level. I have yet to use it on a super colossal cluster, though.


What combinations of tools do others here use to monitor both pod and app-level metrics in a singular cohesive view?

https://redd.it/11ma6d6
@r_devops
how to track VM images in Azure

Hey,

I am building custom VM images and putting them in a compute gallery for consumption, but now I need to be able to track their usage.

If a VM is just spun up using the image this is fine but the problem occurs when someone uses the image I created as the base for a new image.

I had hoped there would be a way to include some sort of unique identifier when building the image, but this doesn't seem to be the case. The URN is overwritten as soon as the new image is built, and tags don't seem to work either, as the compute gallery is shared across multiple subscriptions, and even then I can't see how tags can be permanently added to the image.

I would have thought this would be a pretty standard request but just can't seem to find any documentation about it.

Would anyone have any ideas how to go about this?
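One hypothetical workaround, since gallery metadata doesn't survive re-imaging: bake a lineage marker file into the image's filesystem at build time (e.g., via a custom-script or Packer provisioner), so any image derived from it still carries the identifier. The path and id value below are placeholders:

```shell
# On the real build VM this would write to /etc; ROOT makes the sketch
# runnable anywhere without root.
ROOT="${ROOT:-$(mktemp -d)}"
mkdir -p "${ROOT}/etc"
echo "base-image-id=golden-ubuntu-2204-v1" > "${ROOT}/etc/base-image-info"
cat "${ROOT}/etc/base-image-info"
```

Anything derived from the image keeps the file unless someone deletes it, so it's tracking by convention rather than enforcement.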

https://redd.it/11mayzu
@r_devops
Integration Testing in CI/CD Pipeline

How are you running integration testing in a pipeline?

I'm fairly new to devops and am currently in the process of writing some unit tests for various Lambda functions, and am then going to move on to creating integration tests for various AWS components. I did some digging and found some easy ways to accomplish this locally using SAM, but I'm struggling to find a solution to automate this in a pipeline.

My understanding is that SAM spins up a Docker container and runs the tests from within it. I have some concerns about whether it's even viable to have the build agent install Docker in a reasonable time. I'm also leveraging Terraform, and SAM's Terraform support is only in beta for the time being, although it looks fairly promising.

Curious to see what others are doing and how they are incorporating integration testing within their pipelines.
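If the pipeline runs on GitHub Actions, for example, the Docker concern mostly goes away: GitHub-hosted `ubuntu-latest` runners already ship with Docker, so `sam local` works without an install step. A hedged workflow sketch (job layout and test paths are assumptions):

```yaml
jobs:
  integration-tests:
    runs-on: ubuntu-latest   # Docker is preinstalled on GitHub-hosted Ubuntu runners
    steps:
      - uses: actions/checkout@v3
      - uses: aws-actions/setup-sam@v2
      - run: sam build
      - name: Run integration tests against a local Lambda endpoint
        run: |
          sam local start-lambda &      # serves functions on localhost:3001
          sleep 10
          python -m pytest tests/integration   # tests point boto3 at the local endpoint
```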

https://redd.it/11mcwgv
@r_devops
Any tool you use to track all your tools/library/platform/framework/language release version?

Just want a centralized spot to track my versions. I can't seem to find one, and I don't want to subscribe to thousands of different tool newsletters either.

https://redd.it/11me618
@r_devops
trying to understand d365 from devops perspective

Hey all - I have knowledge of devops practices, though I still feel pretty junior at it despite working in devops for a few years. I am very new to Microsoft D365 and haven't worked with anything like it, really. Basically, my initial impression is that it's kind of a devops nightmare. Does anyone else have this impression too? Do any other devops people have any high-level thoughts or understanding of it?

https://redd.it/11m1fkw
@r_devops
Sending build jobs to a backend service?

I've got a home server setup where I manually build custom firmware for others. This involves making their desired changes to a specific configuration file, initiating the build with PlatformIO's CLI, then waiting about 5 minutes for it to dump out a binary.

I would like to automate this, giving them a web interface where they can make these config changes, initiate the build remotely, then receive a download link for the file once complete. So far I've looked into Jenkins, Gitlab Runners, etc. However, being that each user will provide their own custom config file (or I generate it based on their input), I'm not sure these would be the best solution.

Is there anything that would better fit this particular goal, or am I better off just making something in Python?
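A plain Python service is a reasonable fit here, since the CI tools mostly assume builds are triggered by commits rather than per-user config uploads. A minimal stdlib sketch of the core: a worker thread takes jobs off a queue, writes the user's config, and shells out to the PlatformIO CLI. The web layer (Flask/FastAPI/whatever) would just enqueue jobs and serve the finished binaries. `pio run`, the config filename, and the output path are assumptions about the setup:

```python
# Job queue + build worker; the web front end enqueues (job_id, config_text)
# tuples and later looks up the artifact path in `results`.
import queue
import subprocess
import threading
from pathlib import Path

jobs: "queue.Queue[tuple[str, str]]" = queue.Queue()
results: dict = {}

def worker(project_dir: str, dry_run: bool = False) -> None:
    while True:
        job_id, config_text = jobs.get()
        # drop the user's config into the project before building
        Path(project_dir, "user_config.h").write_text(config_text)
        if not dry_run:  # dry_run lets you test the plumbing without PlatformIO
            subprocess.run(["pio", "run"], cwd=project_dir, check=True)
        results[job_id] = str(Path(project_dir, ".pio", "build"))
        jobs.task_done()
```

Running one worker keeps builds serialized (one at a time on the home server); add more worker threads only if concurrent builds are acceptable.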

https://redd.it/11mhhm6
@r_devops
How can I implement SSO for this situation?

It's my first time messing with authentication. I am creating a web app. The people in my office use two tools: Kibana and Grafana.

I want to create/implement a login portal where they will authenticate only once. Upon authentication I will display two links that can take them to their Kibana/Grafana accounts without having to login to them individually.

They both support LDAP and SAML. I've also heard a bit about Okta (these are all topics i don't know much about)

How can I achieve this? What do you recommend?
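The usual pattern, sketched with heavy hedging: stand up one identity provider (Keycloak, Okta, etc.), register Kibana and Grafana as clients of it, and let the shared IdP session give you the "log in once" behavior; your portal then just links to the two tools. For Grafana, the relevant piece is the generic OAuth section of its config (all URLs, realm, and client values below are placeholders):

```ini
[auth.generic_oauth]
enabled = true
name = Keycloak
client_id = grafana
client_secret = ${CLIENT_SECRET}
scopes = openid profile email
auth_url = https://idp.example.com/realms/main/protocol/openid-connect/auth
token_url = https://idp.example.com/realms/main/protocol/openid-connect/token
api_url = https://idp.example.com/realms/main/protocol/openid-connect/userinfo
```

Kibana's SAML/OIDC support depends on your Elastic license tier, so check that before committing to a design.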

https://redd.it/11m1043
@r_devops
Utility to observe github pipelines

Are there any tools (paid or OSS) commonly used to monitor GitHub Actions pipelines, retry failed steps, etc. (general observability of pipeline flow)? As in: how many jobs failed, resource usage (self-hosted runners), cost monitoring, etc.
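For the basics, the `gh` CLI (assumed installed and authenticated) covers a fair amount from the terminal before reaching for a dedicated product; illustrative commands:

```shell
gh run list --status failure       # recent failed workflow runs
gh run view <run-id> --log-failed  # logs from failed steps only
gh run rerun <run-id> --failed     # re-run just the failed jobs
```

For resource and cost monitoring of self-hosted runners, people typically fall back to their general monitoring stack (e.g., Prometheus node metrics on the runner hosts).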

https://redd.it/11mjvfh
@r_devops
Lambda function error

In my Python 3.9 Lambda function, I am using these imports:

import json
import os
from smbprotocol.connection import Connection
from smbprotocol.exceptions import SMBException



I get this error:

{
"errorMessage": "Unable to import module 'lambda_function': No module named 'smbprotocol'",
"errorType": "Runtime.ImportModuleError",
"requestId": "ead10ed4-a135-4ffc-a0b1-985bdff0b88b",
"stackTrace": []
}



I have added smbprotocol as a layer in the Lambda.

This is the library contents (screenshot)

https://i.imgur.com/V0Gqe4x.png


How do I fix this error?
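A frequent cause of this exact error: for Python runtimes, the layer zip must place packages under a top-level `python/` directory, or the runtime never adds them to `sys.path`. Since smbprotocol pulls in native wheels (cryptography), it also has to be built for the Lambda platform. A hedged build sketch:

```shell
# Package smbprotocol as a layer: deps must live under python/ in the zip,
# and wheels must match the Lambda runtime's platform and Python version.
mkdir -p layer/python
pip install smbprotocol -t layer/python \
    --platform manylinux2014_x86_64 --only-binary=:all: --python-version 3.9
cd layer && zip -r ../smbprotocol-layer.zip python
```

After uploading, `unzip -l` on the layer zip should show paths like `python/smbprotocol/...`; if the package sits at the zip root instead, you'll get exactly this ImportModuleError.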

https://redd.it/11mk3xn
@r_devops
Will AI automate the authoring of IaC?

I believe that AI can write IaC and will have a huge impact on the DevOps industry. There are already generative AIs that can create workflows from simple text instructions.
In addition, cloud APIs are extremely well documented, and there is already a lot of code out there to train the AI on.

What do you think?

https://redd.it/11lq342
@r_devops
How am I getting captcha for a site that I've visited the first time?

I'm abroad at the moment (I'm based in the UK by default), and suddenly all sites think I'm a robot, even sites I've never visited before.

How do these work? It must be an automated check, not something manually set up on their server, because they have no data to work with; I was visiting the site for the first time. I'm wondering if there are automated ways of detecting suspicious activity.

I want to set something like this up in my app, but never done it, so I'm looking for something simple, and automated.

https://redd.it/11mmy9p
@r_devops
A company is offering me an internship but I don't know anything about devops

I am doing a CS degree (first year) at the moment, and they have offered me an unpaid devops internship. Should I go for it? Is it worth it? For context, I am Eastern European.

I got into a CS degree because I want to make lots of money, and that's my main motivator. Do I like this degree? I do, but that's mainly because I see it as a great way to earn more money. This might seem very shallow, but that's just how I am.

So with that said, should I take this internship and start moving in this direction? So far I have been preparing myself to get a software dev role at some good company in the future, because it pays well as far as I know, but I am at the beginning stage, as I am still learning C++.

https://redd.it/11lq71s
@r_devops
Question: tools for JSON-RPC calls?

Hey, devs!

Can you share your experience with using online tools for JSON-RPC calls and provide recommendations on which tools are most effective for this purpose?

I would appreciate any insights you can offer.
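One observation that may help when evaluating tools: a JSON-RPC 2.0 call is just an HTTP POST with a small JSON envelope, so for quick testing even the Python stdlib suffices (the endpoint URL and method name in any real call are yours to fill in):

```python
# Minimal JSON-RPC 2.0 client over HTTP using only the stdlib.
import json
import urllib.request

def json_rpc_call(url: str, method: str, params=None, request_id: int = 1) -> dict:
    payload = json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params or [],
        "id": request_id,
    }).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Generic HTTP tools (curl, Postman, Insomnia) work the same way, since there is no special transport beyond POSTing that envelope.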

https://redd.it/11moczv
@r_devops
ai based recommendation engine

Hi guys,

Maybe this is a dumb question, but: how difficult is it to write an (AI-based) recommendation engine that recommends a handful of content pieces to a user after the user has made a few entries? (It should adapt based on how the user consumes said content pieces.)

Is it possible to do this with a small team of 1-3 devs in weeks/months, or is this a completely impossible task unless you have millions of dollars and a big company?


(also it needs to have an ai element)

Thank you !
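To calibrate the difficulty: a bare-bones baseline (item-based collaborative filtering over implicit feedback, no ML framework) is a weekend-sized task in pure Python, as the sketch below shows; the months go into data pipelines, cold start, the adaptive "AI" layer, and evaluation. All names here are illustrative:

```python
# Item-based collaborative filtering: score unseen items for a user by
# cosine-style overlap between items' consumer sets.
import math
from collections import defaultdict

def recommend(interactions: dict, user: str, k: int = 3) -> list:
    """interactions: user -> set of item ids that user consumed."""
    # invert the mapping: item -> set of users who consumed it
    item_users = defaultdict(set)
    for u, items in interactions.items():
        for it in items:
            item_users[it].add(u)
    seen = interactions.get(user, set())
    scores = defaultdict(float)
    for liked in seen:
        for candidate, users in item_users.items():
            if candidate in seen:
                continue
            overlap = len(item_users[liked] & users)
            if overlap:
                # normalized co-consumption counts as similarity
                scores[candidate] += overlap / math.sqrt(
                    len(item_users[liked]) * len(users)
                )
    return [it for it, _ in sorted(scores.items(), key=lambda x: -x[1])][:k]
```

Swapping this baseline for a learned model later (matrix factorization, a small neural ranker) is a natural upgrade path once there is real usage data.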

https://redd.it/11mrevu
@r_devops
Update: Datadog Outage

https://status.datadoghq.com/

Well everyone, the nightmare is nearing an end as DD Eng worked tirelessly through the day/night for close to a day straight on what has been an anxiety inducing day for everyone involved. A full post-mortem will be coming later but the main gist is below...

"At 06:00 UTC on March 8th, 2023 the Datadog platform started experiencing widespread issues across multiple products and regions. The web application was unavailable or intermittently loading, and data ingestion & monitor evaluation were delayed.

We will share a more detailed analysis post-recovery, but at a very high level:
A system update on a number of hosts controlling our compute clusters caused a subset of these hosts to lose network connectivity.

As a result a number of the corresponding clusters entered unhealthy states and caused failures in a number of the internal services, datastores and applications hosted on these clusters."

Data is being backfilled as we speak and we're back to fully operational. All things considered, this was a disaster, but we got through it. I know everyone (sorta rightfully) likes to shit on us for our AEs/CSMs and the price, but I know eng is doing their best because goddamn it was a long night for them trying to get us back to our usual flavor of "just working". And yes, for everyone who asks, we do in fact use our own software and it did in fact help us figure out what was going on.

Signed, a sales engineer who has to give a demo today and pray not too many hard questions get asked.

https://redd.it/11mt2eg
@r_devops
How the hell do you reference an artifact to download from another pipeline in Github Actions?

I've got two pipelines. One is called **Build.yml**:

      - name: Archive WebAppContent
        run: Compress-Archive -Path '${{ env.RUNNER_TEMP }}\WebAppContent' -DestinationPath './drop/drop.zip'

      - name: Upload artifact
        uses: actions/upload-artifact@v2
        with:
          name: drop
          path: './drop/drop.zip'

The other is called **Deploy.yml**:

      - name: Download Drop
        uses: actions/download-artifact@v3
        with:
          name: drop
          path: './drop/drop.zip'

      - name: Deploy to staging # deploys to uat-01-staging
        id: deploy-to-staging
        uses: azure/webapps-deploy@v2
        with:
          app-name: 'webapp'
          slot-name: 'staging'
          azure-tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          azure-subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
          azure-client-id: ${{ secrets.AZURE_CLIENT_ID }}
          azure-client-secret: ${{ secrets.AZURE_CLIENT_SECRET }}
          package: ${{ github.workspace }}/drop/drop.zip

How the hell do I get the second pipeline to find the location of the artifact created in the **Build.yml** pipeline and use it for my Azure deployment? I've scoured the internet and can't find any clear answer about why my artifact is not going to the correct place and why the deploy pipeline can't find it.

Note that both pipelines are within the same repository.
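The likely root cause: artifacts uploaded by `actions/upload-artifact` are scoped to their own workflow run, and `actions/download-artifact@v3` can only see artifacts from the same run, so a separate Deploy workflow will never find the Build workflow's `drop`. One common workaround (a sketch; trigger layout and the third-party action are assumptions, with names matching the snippets above) is a `workflow_run` trigger plus a cross-run download action such as `dawidd6/action-download-artifact`:

```yaml
on:
  workflow_run:
    workflows: ["Build"]     # runs Deploy after the Build workflow completes
    types: [completed]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Download Drop
        uses: dawidd6/action-download-artifact@v2
        with:
          workflow: Build.yml
          run_id: ${{ github.event.workflow_run.id }}   # the run that built the artifact
          name: drop
          path: ./drop        # download-artifact paths are directories, not files
```

Alternatively, keeping build and deploy as jobs of one workflow lets the stock upload/download actions work unchanged.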

Thank you for your help.

https://redd.it/11mv3o9
@r_devops
How do people organize their repos?

Our dev team is wondering what the best practice is for organizing GitHub repos around VS projects. I am responsible for all the DB stuff (i.e. SQL Server, SSIS, SSAS, SSRS, etc.). Is it best practice to create one repo for all these DB-related VS solutions or a separate repo for each one?

https://redd.it/11mvqxv
@r_devops
A 0.6 release of UI for Apache Kafka w/ cluster configuration wizard & ODD Platform integration is out!

Hi redditors!

Today I'm delighted to bring you the latest 0.6 release of UI for Apache Kafka, packed with new features and enhancements!

This version offers:
- A configuration wizard that simplifies cluster setup (right in the web UI!). Now you can launch the app via an AWS AMI image and set up a cluster on the go
- Integration with OpenDataDiscovery Platform to gain deeper insight into your metadata changes
- Support for protobuf imports & file references

Other minor, yet significant, enhancements include:
- Embedded Avro serde plugin
- Improved ISR display on Topic overview (now you can view it per partition!)

And a cherry on top? Now we’re able to work around kafka ACL errors so you won’t need to confront pesky permission issues when using the app.

Don’t wait, the update is already available on github & @ AWS Marketplace!

Full changelog: https://github.com/provectus/kafka-ui/releases/tag/v0.6.0
Thanks to everyone who just started or has continued to contribute!
In the next release, we'll focus a bit on expanding our RBAC possibilities (support for LDAP and universal OAuth providers) and some Wizard features!

https://redd.it/11mxpbj
@r_devops
RMM/UEM

Good morning everyone,


I've done quite a bit of Googling regarding this but haven't gotten very far. Short of taking advantage of all the free trials, which I will soon, it's hard to tell the difference from app to app.

With CMMC compliance on the horizon, I need to remotely manage around 10 Linux machines and 10 Macs spread across the US. Ideally I would self-host the central server, but most of the options I have come across are cloud-based.

Any suggestions or guidance is deeply appreciated.

Pros:

- Open source (TacticalRMM was all I found, but there were some glaring concerns)
- Can manage both Mac and Linux machines
- Hosted on-site
- CIS/NIST configuration templates are a major plus

https://redd.it/11mvso9
@r_devops
SUSE Elemental Toolkit

Has anybody used Elemental Toolkit? It seems to provide a good tool set for k8s cluster lifecycle management, including OS build and maintenance.

https://redd.it/11mtxom
@r_devops