Reddit DevOps
Stratoshark was released today – "Wireshark for the Cloud"

Stratoshark was just released; it was made by the same people who are behind Wireshark. It looks like it could be a super useful tool for my workflow, and I'm checking it out later today.

Here's their more in-depth description:
Stratoshark lets you explore and investigate the application-level behavior of your systems. You can capture system call and log activity and use a variety of advanced features to troubleshoot and analyze that activity. If you've ever used Wireshark, Stratoshark will look very familiar! It's a sibling application that shares the same dissection and filtering engine and much of the same user interface. It supports the same file format as Falco and Sysdig CLI, which lets you pivot seamlessly between each tool. As an added bonus, it's open source, just like Wireshark and Falco.
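If you want a capture to poke at before pointing it at anything real, the Sysdig CLI can record system activity into the .scap format Stratoshark reads. A sketch, assuming sysdig is installed and its kernel capture driver loads on your host (file names are placeholders):

```shell
# record ~30 seconds of system-call activity into a .scap file
# (requires root so sysdig can access the kernel capture driver)
sudo timeout 30 sysdig -w capture.scap

# or scope the capture to one process to keep the file small
sudo sysdig -w nginx.scap proc.name=nginx
```

Then open capture.scap in Stratoshark the same way you'd open a pcap in Wireshark.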

https://stratoshark.org/

https://redd.it/1i7dmb0
@r_devops
ECS with multiple containers hostname resolve issue

Hi,



I am working on a dev environment where I want to deploy my on-prem docker-compose setup on ECS.

The app needs to connect to the DB, but I got stuck on a hostname issue.



In Docker Compose, one container can easily reach another by referencing its service name on the bridge network. However, in AWS ECS, when I try to do the same in either bridge mode or awsvpc mode, it does not work.



I tried localhost, 127.0.0.1, and postgres.my-namespace.local; none of them works in my situation. What is the solution in this case?



They are both running on my EC2 instances via ECS. Much appreciated!



I feel ECS is like a Docker instance that you manage yourself. It is not really HA or robust unless you are using Fargate mode. The storage for the EC2-based variant is still the same and managed by myself. It is good for a testing environment, but to move forward it will be EKS.
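For what it's worth, two patterns usually solve the hostname problem on ECS: (1) run app and DB as containers in the same task with awsvpc mode, where they share the task's network namespace and can talk over localhost; or (2) run them as separate services and let ECS Service Connect (or Cloud Map service discovery) publish a DNS name. A sketch of a Service Connect configuration for the database service, passed to `aws ecs create-service --service-connect-configuration` (the namespace and names are placeholders, and "portName" must match a named portMapping in the task definition); clients in the same namespace can then connect to postgres:5432:

```json
{
  "enabled": true,
  "namespace": "my-namespace",
  "services": [
    {
      "portName": "postgres",
      "clientAliases": [
        { "port": 5432, "dnsName": "postgres" }
      ]
    }
  ]
}
```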



https://redd.it/1i7ie1j
@r_devops
BMC on unsupported HW?

New DevOps gig, and I come from managing enterprise servers.

Now the new place is fun, but they kinda shot themselves in the foot with hard-to-reach machines that have no native BMC support that I'm aware of (ThinkStation P620).

What can I do if I want to send them a reboot command, set the boot order to PXE, set the remote image URI, and reboot?

I was thinking of creating the images with SSH access so a script can log in passwordless (internal network only, so I don't mind) and maybe some GRUB magic to boot to PXE?
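For one-shot PXE boots without a BMC, UEFI's BootNext variable (or grub-reboot for GRUB entries) avoids permanently changing the boot order. A sketch of the idea; the entry number and menu entry name below are placeholders for whatever your machine actually shows:

```shell
# list UEFI boot entries; note the number of the PXE/network entry
efibootmgr

# boot that entry once on the next reboot, then resume the normal boot order
# (0003 is a placeholder -- use the PXE entry number from the listing above)
sudo efibootmgr --bootnext 0003
sudo systemctl reboot

# GRUB alternative: boot a specific menu entry exactly once
# (requires GRUB_DEFAULT=saved; "pxe-chainload" is a hypothetical entry name)
sudo grub-reboot "pxe-chainload"
sudo reboot
```

Either way, a cron or SSH-driven script on the internal network can issue these without any out-of-band management hardware.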

https://redd.it/1i7k8mf
@r_devops
Getting started with IaC?

Hi everybody, I’m fairly new to IaC, having only done KodeKloud’s Terraform course and played around with Pulumi at my new job.

Some guy set up some of our infra using Pulumi and left. No one else knows what that guy did. He also wrote the infra in C# and didn’t document what he did.

I code mostly in Python, we use Azure, and I want to set up some basic infra for some projects: function apps, some Docker web apps, some data/MLOps.

But because I’m new to Azure (I worked as a data scientist/ML engineer at a company that didn’t use the cloud, only self-hosted infrastructure), I am not familiar with a lot of the options for different resources, especially around networking. I find myself going back to the Azure GUI again and again, and I'm almost wondering if I should start from the Azure GUI and, once I’m more familiar with all the resources and their options, move over to Terraform/Pulumi. Thoughts?
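FWIW, the GUI-first approach can feed directly into IaC: click resources together in the portal, export what Azure actually created to study the settings, then bring them under Terraform one at a time. A sketch (the resource group name and subscription ID are placeholders):

```shell
# export an existing resource group as an ARM template to see every
# setting the portal configured for you
az group export --name my-rg > my-rg.json

# once you've written matching Terraform config, adopt the existing
# resource into state instead of recreating it
terraform import azurerm_resource_group.rg \
  /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg
```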

https://redd.it/1i7jr63
@r_devops
How Can I Redirect Azure APIM Traffic to a Self-Hosted API Gateway Without DNS Changes?

We are using Azure in the company I work for. I am a backend developer, and since I have limited knowledge of cloud systems, I was asked to review the costs alongside the relevant teams due to increased expenses.

An Azure API Management Service (APIM) has been set up here, which I believe is unnecessary because its costs are extremely high.

Unfortunately, there are clients, such as handheld terminals, that rely on this APIM. These devices do not dynamically retrieve the API address; it is hardcoded into the application. In short, changing the APIM address is not currently possible.

Is it feasible to assign the `xxxxx-azure-api.net` address (which is managed by Azure, so it doesn’t seem likely) to an Application Gateway? My assumption is that when the APIM is shut down, this DNS would become available, and I could assign it to the Application Gateway.

Ultimately, the goal is to shut down the APIM and redirect traffic to a self-hosted API Gateway without requiring a DNS change. Could you suggest any methods to achieve this?

Thank you for your assistance.

https://redd.it/1i7mnmm
@r_devops
Can we all together create The perfect DevOps roadmap?

Hey folks,

I know we have roadmap.sh; however, IMO it is too vague and has too many tools that serve the same purpose. For example, why does it list 4-5 observability tools, languages, and CI/CD systems? It also keeps outdated tools like Jenkins and Puppet while newer solutions are increasingly preferred in job postings. The whole thing is overwhelming and even confusing. Roadmaps are often used by newbies; they need guidance, not a list of all available tools.

I think we need something concrete, modern, and extremely relevant to the current job market, especially during these times.

Could we join forces to create a great DevOps roadmap? I created a public GitHub repo: https://github.com/prepare-sh/roadmaps. You can fork it, edit devops.json, and create a PR.

I also created a UI that will be displayed at: https://prepare.sh/roadmap/devops

thanks

edit: if you don't like the UI, we can host it anywhere else as long as it stays free and available to the community

https://redd.it/1i7rn02
@r_devops
Project Idea for Resume?

So I'm currently working as an Automation Architect for a mid-size company. I have a C.S. degree and decent coding proficiency. That being said, I've had a desire to make a lateral move, or at least dip my fingers into DevOps.

I'm going through KodeKloud right now doing their DevOps path and am enjoying it. I eventually want to get the Docker cert (how hard is it, btw?) and the Kubernetes cert.

That being said I don't really have anything personal I can sort of "practice" this on. I know people at work that have cool home labs but I just don't really have anything like that I can do.

An idea was to make a project and use Docker to develop/deploy a simple web application (I mean super simple) using various Docker containers and linking them up.

Like Flask/Redis/MySQL, etc...
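A stack like that is small enough to wire up in one compose file; a minimal sketch (image tags, ports, and credentials are all placeholders):

```yaml
# three-service sketch: Flask app + Redis cache + MySQL database
services:
  web:
    build: ./app            # Flask app with its own Dockerfile in ./app
    ports:
      - "8000:8000"
    environment:
      REDIS_URL: redis://cache:6379/0
      DATABASE_URL: mysql://app:secret@db:3306/appdb
    depends_on:
      - cache
      - db
  cache:
    image: redis:7
  db:
    image: mysql:8
    environment:
      MYSQL_DATABASE: appdb
      MYSQL_USER: app
      MYSQL_PASSWORD: secret
      MYSQL_ROOT_PASSWORD: secret
    volumes:
      - db-data:/var/lib/mysql   # persist data across container restarts
volumes:
  db-data:
```

Containers reach each other by service name (web connects to `db` and `cache`), which is the same cross-container networking idea interviewers tend to probe on.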

Is that something that would be like "ok, this guy at least knows some shit" if I were to put it on GitHub, or would it be kind of a waste?

Also curious what sort of things people practice on their homelab. Like, what products do you use?

https://redd.it/1i7rgeo
@r_devops
Is finishing Full Stack Open course (https://fullstackopen.com/) worth the time to get started?

Background: I currently work as a tech support. I have been taking online courses on web dev for some time now (have a few certs for basic courses on Coursera and Udemy), but I do not have any practical/professional experience with web development. My plan is to switch from tech support to web dev in the near future. I have just started Full Stack Open and I really like the course so far.


In my job, one of our tasks is to sometimes collaborate with our DevOps team. I have no experience or knowledge of DevOps, but I have been intrigued by what they do, so I have been looking up some info about the field. And I find it really interesting, to the point where I'm thinking of changing my plan and focusing on DevOps instead.

My questions are:

* Is it still worth the time to finish fullstackopen, or should I look for and focus on a more DevOps-focused course? The course covers a lot of skills required to be a full stack web developer, so I think it is still worth the time, but I could be wrong.
* The next best course I found is this specialization from Coursera: [IBM DevOps and Software Engineering Professional Certificate](https://www.coursera.org/professional-certificates/devops-and-software-engineering). I am thinking of either switching to this course now or taking it after completing fullstackopen. Is this a good course to get started?
* Brutally honest: is my goal realistic? That is, is it a common scenario for someone with no experience to take online courses, build a portfolio, and switch to a DevOps role (either within the company or outside)?


Thank you!

https://redd.it/1i7utwd
@r_devops
Terraform Github Provider | Question for users

Hi,

I'm one of the administrators of a large GitHub organization. We are in the process of creating self-service capabilities for our users since we are struggling with the volume of support tasks for minor operations, such as assigning organization secrets to repositories and adding repositories to GitHub apps.

Currently, we are using self-service repositories (running custom scripts on GitHub Actions under the hood) where users can create pull requests to request changes.

I am considering migrating to Terraform since it is more robust, and we can manage the current state more effectively than with custom scripts. I would appreciate hearing from those who have experience with the Terraform GitHub provider. What are the pros and cons, and what potential hidden issues should we watch out for?

The key requirement is that users should still be able to create pull requests with suggested changes, so we need to keep the configuration files user-friendly.

Looking forward to your insights!

https://redd.it/1i7yqf6
@r_devops
From Marketing Problem Solver to Developer: Seeking Guidance to Build My Tech Portfolio!

I'm considering a career transition into software development and would appreciate your insights and recommendations.

I have a background in problem-solving for clients in the marketing field, where I've spent the last 15 years. Throughout this time, I've frequently engaged in building MVPs and solutions to address issues arising from various platforms' inability to communicate effectively. My experience includes extensive data-driven analysis using tools like SQL and BigQuery.

Fundamentally, I was trained in the old days of VB6, ASP, and even some C, along with various front-end web development technologies. Additionally, I have a working understanding of machine learning models and have utilized large language models (LLMs) in a few projects.

While I have accumulated a lot of practical knowledge over the years, I sometimes feel like I have "too much knowledge for my own good" without a clear direction on how to formalize it. I'm eager to create a tangible portfolio that I can showcase on platforms like GitHub. My goal is to prepare myself for more formal projects or job opportunities in the software development field within the next year or two.

As a newbie looking to break into this field, I'm seeking advice on how to effectively leverage my existing skills, resources for building a portfolio, or steps to take for transitioning into development. Any guidance would be greatly appreciated!

https://redd.it/1i7yxk3
@r_devops
Opengrep - a truly Open Source fork of the Code Security tool Semgrep - Announced

In December, the code security scanner Semgrep made a bunch of changes to their licensing model and scanning engine, making it harder to use and share rules between various tools or to use the free version at scale. Opengrep was launched by a consortium of vendors as a truly open-source alternative: https://www.opengrep.dev/

https://redd.it/1i83yde
@r_devops
Cluster API to production: authentication with service accounts and RBAC using External Secrets and Kyverno

Hi everyone!
I've just published the third part of my Cluster API to production series, focusing on providing tenant clusters with service accounts for the management cluster.
This is an important step in managing clusters, as it provides clusters with credentials they can use to access a secret manager, container registry, object storage, and more.

The series follows every step needed from where the Cluster API documentation ends to deploying production clusters managed with GitOps.
With this part we're finally done with boilerplate for tenant clusters.
The next couple of parts will explore setting up a telemetry exporter with OpenTelemetry Collector, and setting up automated DNS and certificate renewal.
Slowly making our way towards the final goal: managing clusters with GitOps.

I'm still at the beginning of my technical writing journey and would appreciate any feedback.

https://redd.it/1i846pe
@r_devops
Building Reliable AI: A Step-by-Step Guide

Artificial intelligence is revolutionizing industries, but with great power comes great responsibility. Ensuring AI systems are reliable, transparent, and ethically sound is no longer optional—it’s essential.

Our new guide, "Building Reliable AI", is designed for developers, researchers, and decision-makers looking to enhance their AI systems.

Here’s what you’ll find:
✔️ Why reliability is critical in modern AI applications.
✔️ The limitations of traditional AI development approaches.
✔️ How AI observability ensures transparency and accountability.
✔️ A step-by-step roadmap to implement a reliable AI program.

💡 Case Study: A pharmaceutical company used observability tools to achieve 98.8% reliability in LLMs, addressing issues like bias, hallucinations, and data fragmentation.

📘 **Download the guide now** and learn how to build smarter, safer AI systems.

Let’s discuss: What steps are most critical for AI reliability? Are you already incorporating observability into your systems?

https://redd.it/1i89o6u
@r_devops
Thoughts on Theo’s viral Stripe repo? Where our research led us

yo all, Y Combinator alum here and bit of an outsider to this group. My cofounder and I still struggle with payments code, even at $2M ARR for one of our businesses - despite Stripe’s reputation for being easy. It took us weeks, sometimes months, to fix all the bugs. Then Theo shared his struggles with Stripe, and to our surprise, many other devs shared the same experience. [https://github.com/t3dotgg/stripe-recommendations](https://github.com/t3dotgg/stripe-recommendations)

We’re in the early days of building something new: 

* Delete your Stripe webhook
* Drop your “Payments” table
* Drop your “stripeCustomerId” and “stripeSubscriptionId” columns

It’s 2025. We don’t need to live like this lol.

We’re a Stripe alternative that requires no webhooks: drop-in payments and billing for React devs, of sorts.

It works like the rest of your React stack: updates “propagate” down from our server to you, so you don’t have to manage any billing / payments state on your side. 

You just call our “billing()” function on your backend, or our “useBilling()” hook on your frontend when you need your customers’ billing state. 

Since data flows down from us, you can run pricing experiments without ever needing to open a pull request or redeploy. This is all built on top of Stripe still, but built around react’s “updates flow down” paradigm.

Does this resonate with you? Why or why not? What payment challenges do you face, and if you had a magic wand, what would you want to fix?

Bonus points and tons of gratitude for [hopping on a call with us](https://cal.com/harrisontelyan/flowglad-chat) to treat us like your therapist to tell us about your payment problems. In exchange - down to provide a design critique on any projects you’re working on (RISD/founding designer of Imgur).

https://redd.it/1i8aox1
@r_devops
Need suggestions on: How to manage DB migrations across environments

# TLDR;

We have a PostgreSQL cluster with four DBs, one for each environment.
We develop in the Development environment, editing the structure of the tables through pgAdmin, and everything works fine.
Recently we had to port all the modifications to two other environments; we weren't able to do so due to conflicts.
Any suggestions on how to work around and fix this issue?

# Structure explained

So we are a team that was run into the ground by a bad project manager, and we had to start over.
New platform in development, new life for the devs.

The managers wanted a P.O.C. of an idea we had; we built it in a couple of months, they presented it to all the clients, the clients liked it, and the manager committed to a date without asking anything.

We didn't have the time to think and research too much about how to build the structure, but we had the experience of what didn't work before, so we built everything on AWS with 4 environments: Development, Test, Demo, Production.
Every environment has its own front end, with its own alias on the Lambda functions and its own DB inside the cluster.

The DB is an Aurora instance compatible with PostgreSQL

The FE is hosted through S3 behind CloudFront

# What does work?

The Lambda thing works well. We have a console that manages more things every day, from enabling the various environments to enabling logs, publishing new versions, and binding aliases to those new versions.

The FE deployment kinda works.
We don't have aliases and versions there, but through tags and branches in git we can deploy old and new versions as wanted in every environment.

# What doesn't work?

The management of the DB.

At the moment 2-3 people are touching the structure of the DBs, one of which is me.
We are doing all the work from pgAdmin through the UI.

It works for what we need, but some days ago we were required to apply all the new developments done over the months to the Test and Demo environments, and the DB migration didn't go as planned.

We used the schema diff functionality offered by pgAdmin, but the script was huge and the ALTERs were all over the place.

Fortunately we have yet to release anything to the public, so for now we were able to drop the old DB and recreate it, but when we deploy to Production we won't be able to do that, obviously.
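One common fix for exactly this problem is to stop diffing databases after the fact and instead treat every schema change as a numbered SQL file committed to git, applied in order to each environment. That's what tools like Flyway, Liquibase, or sqitch formalize, and the core idea is small enough to sketch in plain shell with psql (the connection string and paths are placeholders):

```shell
#!/usr/bin/env bash
# apply migrations/0001_*.sql, 0002_*.sql, ... in order, once each, per database
set -euo pipefail
DB_URL="postgres://user:pass@host:5432/dev"   # placeholder connection string

# table recording which migration files this database has already seen
psql "$DB_URL" -c "CREATE TABLE IF NOT EXISTS schema_migrations (version text PRIMARY KEY)"

for f in migrations/*.sql; do
  v=$(basename "$f")
  applied=$(psql "$DB_URL" -tAc "SELECT 1 FROM schema_migrations WHERE version = '$v'")
  if [ "$applied" != "1" ]; then
    echo "applying $v"
    psql "$DB_URL" -v ON_ERROR_STOP=1 -f "$f"
    psql "$DB_URL" -c "INSERT INTO schema_migrations (version) VALUES ('$v')"
  fi
done
```

Running the same script against Test, Demo, and later Production replays only the files each environment hasn't seen yet, so they can't drift apart the way hand-applied pgAdmin edits do.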

We don't have any CI/CD. This week I had the opportunity to do some research, and I landed on Jenkins, SonarQube, and Gitea (our GitHub is a self-hosted Enterprise Server instance which doesn't have Actions, so we have to try something else), but we are more interested in CI at the moment.

I know we are not well organized, but we try really hard, and we are a small team that produces a bunch of code every day.
The pace can't be slowed down due to "business needs", and we are tired of having problems caused by too little time dedicated to R&D.

BTW, the team is composed of 4 junior devs (I'm one of them) and a single senior dev who now has to manage the whole dev department.

I'm open to any suggestion.
Thanks to anyone who will help. <3

https://redd.it/1i8bvb7
@r_devops
Share artifacts between two jobs that run at different times

So the entire context is something like this,

I have two jobs, let's say JobA and JobB. JobA performs a scan and uploads the SAST scan report to an AWS S3 bucket; once the scan and upload are complete, it saves the path of the file uploaded to S3 in an environment variable and later publishes this file path as an artifact for JobB.

JobB executes only when JobA has completed successfully and pushed its artifacts. JobB then pulls the artifacts from JobA and checks whether the file path exists on S3; if yes, it performs the cleanup command, otherwise it doesn't. Some more context for JobB: it depends on JobA, meaning that if JobA fails, JobB shouldn't be executed. Additionally, JobB requires the artifact from JobA to perform this check before the cleanup process, and this artifact is necessary for this crucial cleanup operation.

Here's my Gitlab CI Template:

stages:
  - scan

image: <ecr_image>

.send_event:
  script: |
    function send_event_to_eventbridge() {
      event_body='[{"Source":"gitlab.pipeline", "DetailType":"cleanup_process_testing", "Detail":"{\"exec_test\":\"true\", \"gitlab_project\":\"${CI_PROJECT_TITLE}\", \"gitlab_project_branch\":\"${CI_COMMIT_BRANCH}\"}", "EventBusName":"<event_bus_arn>"}]'
      echo "$event_body" > event_body.json
      aws events put-events --entries file://event_body.json --region 'ap-south-1'
    }

clone_repository:
  stage: scan
  variables:
    REPO_NAME: "<repo_name>"
  tags:
    - $DEV_RUNNER
  script:
    - echo $EVENING_EXEC
    - printf "executing secret scans"
    - git clone --bare https://gitlab-ci-token:$[email protected]/testing/$REPO_NAME.git
    - mkdir ${CI_PROJECT_TITLE}-${CI_COMMIT_BRANCH}_secret_result
    - export SCAN_START_TIME="$(date '+%Y-%m-%d:%H:%M:%S')"
    - ghidorah scan --datastore ${CI_PROJECT_TITLE}-${CI_COMMIT_BRANCH}_secret_result/datastore --blob-metadata all --color auto --progress auto $REPO_NAME.git
    - zip -r ${CI_PROJECT_TITLE}-${CI_COMMIT_BRANCH}_secret_result/datastore.zip ${CI_PROJECT_TITLE}-${CI_COMMIT_BRANCH}_secret_result/datastore
    - ghidorah report --datastore ${CI_PROJECT_TITLE}-${CI_COMMIT_BRANCH}_secret_result/datastore --format jsonl --output ${CI_PROJECT_TITLE}-${CI_COMMIT_BRANCH}_secret_result/${CI_PROJECT_TITLE}-${CI_COMMIT_BRANCH}-${SCAN_START_TIME}_report.jsonl
    - mv ${CI_PROJECT_TITLE}-${CI_COMMIT_BRANCH}_secret_result/datastore /tmp
    - aws s3 cp ./${CI_PROJECT_TITLE}-${CI_COMMIT_BRANCH}_secret_result s3://sast-scans-bucket/ghidorah-scans/${REPO_NAME}/${CI_PROJECT_TITLE}-${CI_COMMIT_BRANCH}/${SCAN_START_TIME} --recursive --region ap-south-1 --acl bucket-owner-full-control
    - echo "ghidorah-scans/${REPO_NAME}/${CI_PROJECT_TITLE}-${CI_COMMIT_BRANCH}/${SCAN_START_TIME}/${CI_PROJECT_TITLE}-${CI_COMMIT_BRANCH}-${SCAN_START_TIME}_report.jsonl" > file_path # required to use this in another job
  artifacts:
    when: on_success
    expire_in: 20 hours
    paths:
      - "${CI_PROJECT_TITLE}-${CI_COMMIT_BRANCH}_secret_result/${CI_PROJECT_TITLE}-${CI_COMMIT_BRANCH}-*_report.jsonl"
      - "file_path"
    #when: manual
    #allow_failure: false
  rules:
    - if: $EVENING_EXEC == "false"
      when: always

perform_tests:
  stage: scan
  needs: ["clone_repository"]
  #dependencies: ["clone_repository"]
  tags:
    - $DEV_RUNNER
  before_script:
    - !reference [.send_event, script]
  script:
    - echo $EVENING_EXEC
    - echo "$CI_JOB_STATUS"
    - echo "Performing numerous tests on the previous job"
    - echo "Check if the previous job has successfully uploaded the file to AWS S3"
    - FILE_NOT_EXISTS=false
    - aws s3api head-object --bucket sast-scans-bucket --key `cat file_path` || FILE_NOT_EXISTS=true
    - |
      if [[ $FILE_NOT_EXISTS == true ]]; then
        echo "File doesn't exist in the bucket"
        exit 1
      else
        echo -e "File Exists in the bucket\nSending an event to EventBridge"
        send_event_to_eventbridge
      fi
  rules:
    - if: $EVENING_EXEC == "true"
      when: always
  #rules:
  #  - if: $CI_COMMIT_BRANCH == "test_pipeline_branch"
  #    when: delayed
  #    start_in: 5 minutes
  #rules:
  #  - if: $CI_PIPELINE_SOURCE == "schedule"
  #  - if: $EVE_TEST_SCAN == "true"

Now the issue I am facing with the above GitLab CI template: I've created two scheduled pipelines for the branch where this template resides, with an 8-hour gap between them. The conditions I am using above work fine for JobA: when the first pipeline runs, it executes only JobA and not JobB. When the second pipeline runs, it executes JobB and not JobA, but JobB is not able to fetch the artifacts from JobA.

Previously I tried using `rules` with `when: delayed` and a `start_in` time; it puts JobB in a pending state, and it later fetches the artifact successfully. However, in my case the runner kills any sleeping or pending job once it exceeds the timeout policy of 1 hour, which is not sufficient: JobB requires a gap of at least 12-14 hours before starting the cleanup process.
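Since the two scheduled runs are separate pipelines, same-pipeline artifact passing (`needs`/`dependencies`) can't see JobA's artifacts at all; one way out is to have JobB download them from the latest successful pipeline on the same branch via GitLab's jobs API. A sketch for JobB's script, using only GitLab's predefined CI variables and the job name from the template above:

```shell
# fetch artifacts of the latest successful "clone_repository" job on this
# branch, regardless of which pipeline produced them
curl --location --header "JOB-TOKEN: $CI_JOB_TOKEN" \
  --output artifacts.zip \
  "$CI_API_V4_URL/projects/$CI_PROJECT_ID/jobs/artifacts/$CI_COMMIT_REF_NAME/download?job=clone_repository"

# unpack and read the S3 key JobA recorded
unzip -o artifacts.zip
cat file_path
```

This also sidesteps the `expire_in: 20 hours` race: as long as the artifacts haven't expired when the second schedule fires, the API download works even though the jobs live in different pipelines.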

https://redd.it/1i8driq
@r_devops
Apple DevOps Interview

Hi, I have a 60-minute DevOps Engineer interview with the hiring manager coming up for an AI/ML team. Wondering how best to prepare? Please share any advice. Thank you in advance.

https://redd.it/1i8fi86
@r_devops
LGTM Stack with TF for AWS Infrastructure with Application Integration Running on AWS ECS Fargate

I'm looking for someone who has worked on something similar: integrating a current AWS ECS Fargate application infrastructure for metrics, logs & traces using TF only, with smooth integration plus dashboard creation as well. Something similar to what I shared in my recent post.

- Application Running on AWS ECS Fargate
- Grafana Stack : Grafana Alloy running as Sidecar with ECS Tasks + Loki, Mimir & Tempo running on ECS/EKS and AWS Managed Grafana for smooth SSO Integration with AWS for easy Login
- Grafana Dashboard for Metrics, Logs & Traces using TF as well

Separate consolidated dashboards for all the APIs, where metrics, logs, and traces for each of them are coupled in a single dashboard

Deployment using TF apply only, no ClickOps approach.


Please let me know if you've done something similar.

Thanks.

https://redd.it/1i8cph4
@r_devops