Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
How do you leverage your TAMs?

We are multi-cloud, but mostly AWS. We have enterprise accounts, but honestly we almost never talk to them except to escalate a ticket, and even that is extremely rare.

What kinds of things do you use a TAM for? I honestly don't even know what I would ask them to help with.

https://redd.it/1jfxvf9
@r_devops
How much traction does SLSA have? With ML pipeline safety trending, is it getting more interest?

I remember there was a big splash a few years ago when Google kicked off a public SLSA (Supply-chain Levels for Software Artifacts, it's a mouthful) group. Is anyone actually actively adopting SLSA? Or under pressure to adopt it?


Just looking at public sources, there's a lot of regular activity on https://slsa.dev/, with release 1.1 coming out soon. I've found some recently published papers and the occasional blog post on the topic. And I did notice a recent small spike in Google search queries.


Is there more to it than that? I don't see very many Reddit posts about it at any rate.

https://redd.it/1jg2hak
@r_devops
AWS costs. Save me.

Why does it feel impossible to forecast application hosting prices? I have used the AWS calculator and it is like another language. I literally want to host a Keycloak server and a .NET/Postgres (RDS) calendar-scheduling, PDF-storage, and note-taking application that will initially serve 4 people but could serve 5,000 daily active users by next year. The AWS calculator gives me anywhere between £100 and £20,000 a month. Why isn't there a human guide to these costs? Something like "10,000 people transferring X MB per session per day would cost Y amount".
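The "human guide" arithmetic the poster asks for is doable as a back-of-envelope sketch. This is only an estimate under loud assumptions: it covers data-transfer (egress) alone, uses $0.09/GB as a commonly quoted first-tier AWS egress price (check the current pricing page), and ignores compute, RDS, and storage, which usually dominate at small scale.

```python
def monthly_egress_cost(users, mb_per_session, sessions_per_day=1, price_per_gb=0.09):
    """Rough monthly data-transfer cost: users * MB/session * sessions/day * 30 days.

    price_per_gb is an assumed first-tier egress rate, not a quote.
    """
    gb_per_month = users * mb_per_session * sessions_per_day * 30 / 1024
    return gb_per_month * price_per_gb

# e.g. 10,000 users pulling 5 MB per daily session:
print(f"${monthly_egress_cost(10_000, 5):.2f}/month")  # → $131.84/month
```

The point of the exercise: at this scale, bandwidth is a rounding error next to the always-on EC2/RDS instances, which is why the calculator's range is so wide.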

https://redd.it/1jg3154
@r_devops
I’ve applied to over 100 jobs with no luck. Can you please roast my resume?

What’s wrong with my resume? I have yet to receive any positive responses from the companies I’ve applied to. I would appreciate some feedback. Thanks in advance!

Here’s my resume: https://imgur.com/a/akSS1FL

https://redd.it/1jg33yr
@r_devops
Newbie to DevOps here - General advice requested

Hi. I'm starting with DevOps and would like to do a Proof of Concept deployment of an application to experiment and learn.

The application has 3 components (frontend, backend and keycloak) which can be deployed as containers. The data tier is implemented through a PostgreSQL database.

There is no development involved for the components. The application is an integration of existing components.

We are using GitLab with Ultimate licenses and target AWS for the deployment.

We would like to deploy on a Kubernetes cluster using the AWS EKS service. For the database we want to use Aurora PostgreSQL on RDS.

The deployment will be replicated in 4 environments (test, uat, stage, production), each of them with different sizing for the components (e.g. number of nodes in the kubernetes cluster, number of availability zones, size of the ec2 instances...). Each of those environments is implemented in a different AWS account, all of them part of the same AWS Organization.

In our vision we will have a pipeline with 4 jobs, each of them deploying the infrastructure components into the relevant AWS account using Terraform. The first job (deploy to test) is triggered by a commit on the main branch, and the rest are triggered manually, with the success of the previous job as a prerequisite.
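The four-job pipeline described above can be sketched in a `.gitlab-ci.yml` roughly like this (job names, file paths, and variables are hypothetical; `needs:` plus `when: manual` gives "manual trigger, but only after the previous job succeeded"):

```yaml
stages: [test, uat, stage, production]

.deploy:
  image: hashicorp/terraform:light
  script:
    - terraform init -backend-config="env/${ENV}.tfbackend"
    - terraform apply -auto-approve -var-file="env/${ENV}.tfvars"

deploy_test:
  extends: .deploy
  stage: test
  variables: { ENV: test }
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'

deploy_uat:
  extends: .deploy
  stage: uat
  variables: { ENV: uat }
  needs: [deploy_test]
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual

# deploy_stage and deploy_production follow the same pattern,
# each with needs: pointing at the previous job.
```

Per-account credentials would come from environment-scoped CI/CD variables or, better, OIDC role assumption into each AWS account.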

And we have some (millions of) doubts... but I will include here only a few of them:

1. GitLab groups/projects: a single project for everything, or should we have a group containing one project for the infrastructure and another for the deployment of the application? Or is it better to organize it in a completely different way?

2. Kubernetes/EKS: a single cluster per environment or a cluster per component (e.g. frontend, backend, keycloak...)?

3. Helm: we plan to do the deployment on the kubernetes cluster using helm charts. Any thoughts on that?


Thanks in advance to everybody reading this and trying to help!

https://redd.it/1jg1jbm
@r_devops
Any Dev or User Experience with CoreWeave or Nebius for AI/ML Workloads?

I’m curious to hear about your experience—good or bad—as a developer or user working with CoreWeave or Nebius, especially for AI or machine learning workloads.
• How’s the developer experience (e.g., SDKs, APIs, tooling, documentation)?
• What’s the user experience like in terms of performance, reliability, and support?
• How do they compare in cost, scalability, and ease of integration with existing ML pipelines?
• Anything you love or hate about either platform?

Would love to hear your insights or compare notes if you’ve used one or both

https://redd.it/1jg90ks
@r_devops
What DevOps project should I build to showcase my skills in interviews?

Not sure if this is the right place to ask, but I recently started a DevOps course, and so far, I’ve learned about Git, Docker, Kubernetes, Helm, and Ansible. I’m looking to build a project that I can showcase in future interviews to demonstrate my skills, but I’m not sure what would be the most impactful.

I searched on ChatGPT for project ideas, and one suggestion was:
• A scalable web platform: Deploying a web app using Terraform, Kubernetes, and Docker, with CI/CD pipelines, load balancing, and monitoring.

While this sounds interesting, I’m not sure if it would be enough to stand out. If you were interviewing a DevOps candidate, what kind of projects would impress you? What real-world problems should I try to tackle to make my project more relevant?

Any advice or recommendations would be greatly appreciated!


https://redd.it/1jgchw8
@r_devops
Open-source for On-Call Solution?

We’ve been working on **Versus Incident**, an open-source incident management tool that supports alerting across multiple channels with easy custom messaging. Now **we’ve added on-call support with AWS Incident Manager integration**! 🎉

This new feature lets you escalate incidents to an on-call team if they’re not acknowledged within a set time. Here’s the rundown:

* **AWS Incident Manager Integration**: Trigger response plans directly from Versus when an alert goes unhandled.
* **Configurable Wait Time**: Set how long to wait (in minutes) before escalating. Want it instant? Just set `wait_minutes: 0` in the config.
* **API Overrides**: Fine-tune on-call behavior per alert with query params like `?oncall_enable=false` or `?oncall_wait_minutes=0`.
* **Redis Backend**: Use Redis to manage states, so it’s lightweight and fast.

Here’s a quick peek at the config:

oncall:
  enable: true
  wait_minutes: 3 # Wait 3 mins before escalating, or 0 for instant
  aws_incident_manager:
    response_plan_arn: ${AWS_INCIDENT_MANAGER_RESPONSE_PLAN_ARN}

redis:
  host: ${REDIS_HOST}
  port: ${REDIS_PORT}
  password: ${REDIS_PASSWORD}
  db: 0

I’d love to hear what you think! Does this fit your workflow? Thanks for checking it out—I hope it saves someone’s bacon during a 3 AM outage! 😄.

Check here: [https://versuscontrol.github.io/versus-incident/on-call-introduction.html](https://versuscontrol.github.io/versus-incident/on-call-introduction.html)

https://redd.it/1jgdljl
@r_devops
Gitlab project domain transfer

Hi there,

I'm a startup owner (don't worry, service biz, not AI bollocks) and I'm very stuck with some GitLab stuff. If someone can help out / do this for me, I'm also very happy to pay. Our current software devs are far too busy on our current project to help with it, and the previous dev who built our system doesn't work on this kind of stuff any more as he's set up a new biz.

We have

- a website

- a booking form

- a staff app

- an admin panel

- digital reports for our customers

All of these are hosted on the same domain, which is the problem,

i.e.

domain.com

domain.com/booking

domain.com/admin

domain.com/reports


We have a new website built in Webflow that we can't publish on domain.com, because doing so breaks all of the above: once the domain is hosted on Webflow, nothing points to them anymore.

We either need to move all of the above to subdomains, i.e. booking.domain.com, or copy the project and host it on Webflow, or something.
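For the subdomain route, this is usually a handful of DNS records rather than a code change: the apex record goes to Webflow and each app gets its own subdomain pointing at the current host. A hypothetical zone sketch (the hosts and the IP are placeholders; Webflow's docs give the exact targets for your account):

```
domain.com.           A      198.51.100.10             ; Webflow's IP, per their custom-domain docs
www.domain.com.       CNAME  proxy-ssl.webflow.com.
booking.domain.com.   CNAME  legacy-host.example.com.  ; wherever the apps live today
admin.domain.com.     CNAME  legacy-host.example.com.
reports.domain.com.   CNAME  legacy-host.example.com.
```

One caveat: the apps themselves (and their TLS certs and any hard-coded URLs) also need to be configured to answer on the new hostnames, which is the part a dev would typically help with.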

I have very entry-level database knowledge and maybe I'm looking at this totally wrong, but we are dying to launch our website and are stuck in the meantime. We're actually building out a whole new system that will replace all of the above, but it's not ready yet. So all of this would be a temporary fix until it is, so we can at least publish our new website.

Here's hoping the above isn't complete gibberish. Thanks all.

https://redd.it/1jgdrfh
@r_devops
DevOps/Platform recommended reading


Hi. I'm looking for any current recommended reads in the DevOps/platform area. I wonder whether books like Accelerate or Continuous Delivery are still current enough to be a valuable read without being too dated. I've read The Phoenix Project and The DevOps Handbook, so anything in that vein would be good. Thank you!

https://redd.it/1jgdi4v
@r_devops
Azure API - too many requests issue.

I am trying to fetch the cost for each subscription and flag the ones under a certain limit, like under $5. Could you please take a look at how I can optimize this? I have already fetched the subscription IDs into a separate txt file and I'm importing them in this script. I took some help from Copilot as well.



import requests
import pandas as pd
import time
import random
import ssl
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from datetime import datetime

# Azure Credentials
TENANT_ID = "x"
CLIENT_ID = "x"
CLIENT_SECRET = "x"
# File containing subscription IDs
SUBSCRIPTIONS_FILE = "subscriptions.txt"
# Exclude specific subscriptions
EXCLUDED_NAMES = ["visual studio", "suscripción de visual studio", "mpn", "pay-as-you-go"]

# Azure Endpoints
TOKEN_URL = f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token"
# Force TLS 1.2+ to prevent SSL errors
ssl_context = ssl.create_default_context()
ssl_context.set_ciphers('DEFAULT:@SECLEVEL=1')

# Configure Requests session with retries
session = requests.Session()
retries = Retry(
    total=3,
    backoff_factor=5,  # Increase delay between retries
    status_forcelist=[429, 500, 502, 503, 504]  # Retry on rate limits and server errors
)
session.mount("https://", HTTPAdapter(max_retries=retries))

# Get Access Token
def get_access_token():
    data = {
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "scope": "https://management.azure.com/.default"
    }
    response = session.post(TOKEN_URL, data=data)
    response.raise_for_status()
    return response.json()["access_token"]

# Read subscription IDs from file
def read_subscription_ids():
    with open(SUBSCRIPTIONS_FILE, "r") as file:
        return [line.strip() for line in file.readlines() if line.strip()]

# Get cost details for multiple subscriptions in a batch
def get_costs_for_subscriptions(subscription_ids, token):
    results = []
    failed_subscriptions = []

    BATCH_SIZE = 5  # Batch size to avoid Azure rate limits
    for i in range(0, len(subscription_ids), BATCH_SIZE):
        batch = subscription_ids[i:i + BATCH_SIZE]

        for sub_id in batch:
            COST_URL = f"https://management.azure.com/subscriptions/{sub_id}/providers/Microsoft.CostManagement/query?api-version=2023-03-01"
            headers = {"Authorization": f"Bearer {token}"}

            cost_query = {
                "type": "ActualCost",
                "timeframe": "Custom",
                "timePeriod": {
                    "from": "2025-02-01T00:00:00Z",
                    "to": "2025-02-28T23:59:59Z"
                },
                "dataset": {
                    "granularity": "None",
                    "aggregation": {
                        "totalCost": {
                            "name": "PreTaxCost",
                            "function": "Sum"
                        }
                    }
                }
            }

            for attempt in range(3):  # Retry max 3 times
                try:
                    response = session.post(COST_URL, headers=headers, json=cost_query)

                    if response.status_code == 429:
                        wait = 5 ** attempt + random.uniform(1, 3)  # Exponential backoff
                        print(f"🔁 429 Too Many Requests for {sub_id}. Retrying in {wait:.2f}s...")
                        time.sleep(wait)
                        continue  # Retry request
                    elif response.status_code == 400:
                        print(f"400 Bad Request for {sub_id}. Skipping...")
                        failed_subscriptions.append({"Subscription ID": sub_id,
                                                     "Error": "400 Bad Request"})
                        break  # Stop retrying on 400 errors
                    response.raise_for_status()
                    data = response.json()
                    rows = data.get("properties", {}).get("rows", [])

                    if rows:
                        cost = rows[0][0]
                        if cost < 5:
                            print(f"{sub_id} has low spend: ${cost}")
                            results.append({"Subscription ID": sub_id, "Monthly Spend ($)": cost})
                    break  # Exit retry loop if successful
                except requests.exceptions.SSLError as e:
                    print(f"⚠️ SSL Error on {sub_id}: {e}. Retrying in 5s...")
                    time.sleep(5)

                except requests.exceptions.RequestException as e:
                    print(f"Failed to fetch cost for {sub_id}: {e}")
                    failed_subscriptions.append({"Subscription ID": sub_id, "Error": str(e)})
                    break  # Stop retrying
            time.sleep(2)  # Slower request rate to prevent rate limiting
    return results, failed_subscriptions

# Main execution
if __name__ == "__main__":
    print("🔄 Fetching Azure costs for February (subscriptions under $5)...")

    token = get_access_token()
    subscriptions = read_subscription_ids()

    results, failed_subscriptions = get_costs_for_subscriptions(subscriptions, token)

    # Export results to Excel
    if results:
        df = pd.DataFrame(results)
        filename = f"low_cost_subscriptions_{datetime.now().strftime('%Y%m%d_%H%M%S')}.xlsx"
        df.to_excel(filename, index=False)
        print(f"\nExported low-cost subscriptions to: {filename}")

    if failed_subscriptions:
        df_fail = pd.DataFrame(failed_subscriptions)
        fail_filename = f"failed_subscriptions_{datetime.now().strftime('%Y%m%d_%H%M%S')}.xlsx"
        df_fail.to_excel(fail_filename, index=False)
        print(f"\n⚠️ Exported failed subscriptions to: {fail_filename}")
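One simplification worth considering: the script already mounts a `Retry`-configured adapter, but then hand-rolls its own 429 backoff loop on top of it. urllib3's `Retry` can handle 429s itself and honors the `Retry-After` header the API sends; the only catch is that POST is not retried by default and must be allowed explicitly. A sketch of the session setup under that approach (retry counts and backoff values are illustrative):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session():
    # Let urllib3 do the backoff: it sleeps backoff_factor * 2**(attempt-1)
    # between tries and respects Retry-After on 429 responses, so the manual
    # sleep/continue loop in the script becomes unnecessary.
    retries = Retry(
        total=5,
        backoff_factor=2,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "POST"],      # POST is not retried by default
        respect_retry_after_header=True,
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retries))
    return session
```

With this in place, each cost query collapses to a single `session.post(...)` followed by `raise_for_status()`, and only genuinely failed subscriptions land in the failure list.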





https://redd.it/1jgh69t
@r_devops
Anyone built their own personal CI/CD pipeline before?

Hello fellow devops engineers, has anyone ever tried to develop a basic self-hosted CI/CD pipeline before?

https://redd.it/1jgjpmw
@r_devops
I don't know where to get started

I'm a mid-level DevOps engineer with average Java backend experience, and I've just been assigned to a .NET project at my new company. Since my background is in Java, I honestly have no idea what's going on. The project's documentation isn't clear, and even though my teammates might help, I don't want to come across as someone who needs to be spoon-fed, especially since I'm new to the team. They gave me a high-level overview of the project, but I'm still confused: I don't even know which file to build or how to run things locally. Any advice?

https://redd.it/1jgh1xt
@r_devops
Is there a better way to build react production projects as a monorepo?

An interesting repo landed in my lap today; it is not meant for a containerized solution but for something native.

The repo is just a bunch of really small plugin-ish React projects, all configured with Vite. There are 20 such small plugins, and the final artifact to generate is all of the projects' production-ready distribution dirs bundled into a final tarball.

CI/CD: Gitlab-CI and push the generated artifacts to Artifactory.

Repo structure is as follows:

repo_root/
  plugins/
    example-1-plugin/
    ...
    example-20-plugin/


I made a simple Makefile

PLUGINS := example-1 example-2 ... example-20

all: $(PLUGINS)

$(PLUGINS):
	npm install --prefix=plugins/$@-plugin/
	npm run build --prefix=plugins/$@-plugin/


This will build each project, with the caveat that it keeps installing vite locally for each and every plugin.

To avoid redundantly pulling vite every time, I used npm link on the installed node_modules to symlink the already existing vite / vite-react-swc / tailwind packages.

$(PLUGINS):
	npm install --prefix=plugins/$@-plugin/ && \
	npm link --prefix=plugins/$@-plugin && \
	npm link --prefix=plugins/$@-plugin vite vite-react-swc && \
	npm run build --prefix=plugins/$@-plugin/


which reduced the build times for me.

Granted, this is not by a long shot a good repo structure, and I can't really call it a monorepo either, but this was what was handed to me to work with, and it got the job done.

Any recommendations or comments on things I can improve, watch out for, or refactor when working with such an npm/Node scenario?
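For what it's worth, npm itself (v7+) has a workspaces feature that addresses the duplicated-vite problem without `npm link`: a root `package.json` hoists shared dev dependencies into a single top-level `node_modules`, and one command builds every plugin. A sketch (package name and version range are hypothetical):

```json
{
  "name": "plugins-root",
  "private": true,
  "workspaces": ["plugins/*"],
  "devDependencies": {
    "vite": "^5.0.0"
  }
}
```

Then `npm install` at the root installs one shared copy of vite, and `npm run build --workspaces` runs each plugin's build script in turn, which would collapse the Makefile to a single target.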

https://redd.it/1jgnn7u
@r_devops
GitHub Actions Supply Chain Attack: A Targeted Attack on Coinbase Expanded to the Widespread tj-actions/changed-files Incident

The compromise of the tj-actions/changed-files GitHub Action reported last week was initially intended specifically to target Coinbase. After Coinbase mitigated it, the attacker launched the widespread attack.
https://unit42.paloaltonetworks.com/github-actions-supply-chain-attack/

https://redd.it/1jgob6a
@r_devops
Got a new role in DevOps but need advice since my background is sysadmin

Just received an offer for a full-time DevOps engineer role, but my background is in Linux/sysadmin work for the past 4 years. I will say that I was very stagnant in my previous position: instead of learning and developing, it was constant firefighting, and due to the unstable nature of the job market I was reluctant to look for a new job.

A recruiter reached out to me with this opportunity. Even though my experience was limited, I still had working knowledge of Jenkins/Datadog, though nothing related to Docker or AWS, but I went ahead and impressed them enough in the interview process that they gave me an offer. I really want to succeed in this position and just need help on where to upskill and which new tools to focus on so I can hit the ground running and keep up.

https://redd.it/1jgpd17
@r_devops
No-code platform for easy editing, responsiveness, and Figma integration

Hey everyone! How’s it going?

I’m a UX Designer, and I’m facing a problem that I believe you might be able to help me with. I design interfaces for an education network, and since we have multiple products, each with its own website, our development team struggled to implement basic updates and improvements. Simple requests, like changing images, text, or buttons, would take days to be completed.

Because of this, management decided to move our websites to a no-code or more user-friendly platform (I was against this decision) and chose WIX as the solution. The issue is that WIX has terrible integration with Figma. Every time I try to import a project, it breaks and comes with a lot of bugs. My only option is to design in Figma and then manually rebuild everything on the platform, which creates a huge amount of extra work. On top of that, the projects become heavy, and I have to fine-tune every little detail using prebuilt elements and templates, which significantly limits customization.

Another major issue is mobile responsiveness. WIX requires manual adjustments on almost every screen, and even then, the final result is far from optimized, which negatively impacts the user experience. Additionally, the platform is incredibly slow for basic tasks like aligning elements and adjusting spacing, making the editing process even more frustrating.

Do you know of any platform similar to WIX that integrates well with Figma, is easy to edit for someone with little coding knowledge, and offers better mobile responsiveness?

https://redd.it/1jgrabw
@r_devops
"devops"->"DevOps" on Linkedin gave 100,000+ more results

I've been looking for a new job for a few weeks now and decided to look for devops roles on LinkedIn. Typed in "devops" and got like a few thousand results... felt pretty down.

I've been working with the LinkedIn API, and by complete accident I capitalized it: "devops" -> "DevOps", and HOLY SHIT, 110,000+ JOBS APPEARED OUT OF NOWHERE! 🤯
This piece of crap website is case sensitive; no wonder I saw so few results in the UI.

https://ibb.co/9BvWDPK vs. https://ibb.co/fYdLJWgC
Anyway, my side project is a devops market analysis tool. I built a UI for it, and the results there match. I've got a few other stats too; gonna keep it updated: prepare.sh/trends/devops

https://redd.it/1jgx2mt
@r_devops