Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
What issues do you usually have with Splunk or other alerting platforms?

Yo, software developer here. I wanted to know what kind of issues people have with Splunk. Are there any pain points you are facing? One issue my team has is not getting alerts on time, because our internal Splunk team limits alerts to a 15-minute delay. It doesn't seem like much, but our production support team flips out every time it happens.

https://redd.it/1lteuuf
@r_devops
DevOps Azure Checkbox Custom Field

I feel I am losing my nut...

I want to add Custom Fields to my Bug Tickets & User Story tickets, but I want them to be checkboxes. The only option I have found is this one:
https://stackoverflow.com/questions/74994552/azure-devops-work-item-custom-field-as-checkbox

But it has really odd behaviour that goes beyond a simple checkbox.

The reason I do not want toggles is that I do not want an "Off" or "False" state as a visible option. I want users to check the box only if the option is applicable.

Surely there is a way to have a simple checkbox custom field on a work item type?

I am sure this has likely been asked a billion times, but my googling skills are letting me down, as I either get the same responses, or irrelevant responses.

Cheers

https://redd.it/1ltdg2p
@r_devops
Advice for CI/CD with Relational DBs

Hey there folks!

Most of the DBs I've worked with in the past have been either non-relational or laughably small PG DBs. I'm starting on a project that's going to rely on a much heavier PG DB in AWS. I don't think my current approaches are really viable for a big boy relational setup.

So if any of you could shed some light on how you approach handling your DBs, I'd very much appreciate it.

Currently I use Prisma, which works, but I don't think it's optimal. I'd like to move away from ORMs. I've been eyeing Liquibase.

https://redd.it/1ltcylo
@r_devops
Separate pipeline for application configuration? Or all in IaC?

I'm working in the AWS world, using CloudFormation + SAM templates, and have API endpoints, Lambda functions, S3 buckets and configuration all in one big template.

Initially I was working with a configuration file in DEV and now want to move these parameters over to Parameter Store in AWS, but the thought of adding these + tagging (required in our company) for about 30 parameters makes me feel like I'm catastrophically flooding the template with my configuration.

The configuration may change semi-regularly, independently of the code or any other infra, and would be pushed through the pipeline to release.

Is anyone out there running a configuration pipeline to release config changes? On one side it feels like overkill, on the other side it makes sense to me.

What are your opinions, brains trust?

https://redd.it/1ltjqmz
@r_devops
Canary Deployment Strategy with Third-Party Webhooks

We're setting up canary deployments in our multi-tenant architecture and looking for advice.

Our current understanding is that we deploy a v2 of our code and route some portion of traffic to it. Since we're multi-tenant, our initial plan was to route entire tenants' traffic to the v2 deployment.

However, we have a challenge: third-party tools send webhooks to our Azure function apps, which then create jobs in Redis that are processed by our workers. Since we can't keep changing the webhook endpoints at the third-party services, this creates a problem for our canary strategy.

Our architecture looks like:

* Third-party services → Webhooks → Azure Function Apps → Redis jobs → Worker processing

How do you handle canary deployments when you have external webhook dependencies? Any strategies for ensuring both v1 and v2 can properly process these incoming webhook events?
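
One way to picture the tenant-based routing described in this post is to keep the webhook endpoint fixed and pick the job queue per tenant inside the function app. The following is only a minimal sketch, assuming each webhook payload can be mapped to a tenant ID. The queue names and `queue_for_tenant` are hypothetical, and the actual Redis enqueue is left out.

```python
from typing import Iterable

# Hypothetical queue names: v1 workers consume one, v2 (canary) workers the other.
STABLE_QUEUE = "jobs:v1"
CANARY_QUEUE = "jobs:v2"


def queue_for_tenant(tenant_id: str, canary_tenants: Iterable[str]) -> str:
    """Return the queue a webhook job should be pushed to for this tenant."""
    return CANARY_QUEUE if tenant_id in set(canary_tenants) else STABLE_QUEUE


if __name__ == "__main__":
    canary = {"tenant-a"}  # tenants currently routed to the v2 deployment
    for tenant in ("tenant-a", "tenant-b"):
        print(tenant, "->", queue_for_tenant(tenant, canary))
```

With this shape the third-party services keep posting to the same endpoint, and moving a tenant between v1 and v2 only changes the canary set, not the webhook URL.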

Thanks for any insights or experiences you can share!

https://redd.it/1ltmjre
@r_devops
Can a Lambda inside a VPC get internet access without a NAT gateway?

Guys, I have a DevOps question.
Can a Lambda inside a VPC get internet access without a NAT gateway?
Note: I need to connect to my private RDS instance. I can't make it public, and I can't use a NAT instance either.

https://redd.it/1ltpqvu
@r_devops
Struggling to put two instances in targetid for alb module?

Do I need to create a separate ALB target group attachment resource block and associate it with the ALB module?

https://redd.it/1ltssey
@r_devops
Made a huge mistake that cost my company a LOT – What’s your biggest DevOps fuckup?

Hey all,

Recently, we did a huge load test at my company. We wrote a script to clean up all the resources we tagged at the end of the test. We ran the test on a Thursday and went home, thinking we had nailed it.

Come Sunday, we realized the script failed almost immediately, and none of the resources were deleted. We ended up burning $20,000 in just three days.

Honestly, my first instinct was to see if I could shift the blame somehow or make it ambiguous, but it was quite obviously my fuckup, so I had to own up to it. I thought it'd be cleansing to hear about other DevOps folks' biggest fuckups that cost their companies money. How much did it cost? Did you get away with it?


https://redd.it/1ltuz99
@r_devops
Is there some way to get $10 in AWS credits as a student?

Hey everyone!

I'm a student currently learning AWS and working on DevOps projects like Jenkins pipelines, Elastic Load Balancers, and EKS. I've already used up my AWS Free Tier, and I just need around $10 in credits to test my deployments for an hour or two and take screenshots for my resume/blog.

I’ve tried AWS Educate, but unfortunately it didn’t work out in my case. I also applied twice for the AWS Community Builders program, but got rejected both times.

Is there any other way (like student programs, sponsorships, or community grants) to receive a small amount of credits to continue building and learning?

I'd be really grateful for any suggestions — even a little support would go a long way in helping me continue this journey.

Thanks so much in advance! 🙏

https://redd.it/1ltuqjm
@r_devops
Set up real-time logging for AWS ECS using FireLens and Grafana Loki

I recently set up a logging pipeline for ECS Fargate using FireLens (Fluent Bit) and Grafana Loki. It's fully serverless, uses S3 as the backend, and connects to Grafana Cloud for visualisation.

I’ve documented the full setup, including task definitions, IAM roles, and Loki config, plus a demo app to generate logs.

Full details here if anyone’s interested: https://medium.com/@prateekjain.dev/logging-aws-ecs-workloads-with-grafana-loki-and-firelens-2a02d760f041?sk=cf291691186255071cf127d33f637446

https://redd.it/1ltxvni
@r_devops
requesting advice for Personal Project - Scaling to DevOps

TL;DR - I've built something on my own server, and could use a vector-check on whether my dev roadmap makes sense. Is this a 'pretty good order' to do things in, and is there anything I'm forgetting or don't know about?

---------------------------------

Hey all,

I've never done anything in a commercial environment, but I do know there is a difference between what's hacked together at home and what good industry code/practices should look like. In that vein, I'm going along the best I can, teaching myself and trying to design a personal project of mine according to industry best practices as I interpret what I find via the web and other GitHub projects.

Currently, in my own time, I've set up an Ubuntu server on an old laptop (with SSH configured for remote work from anywhere) and built a web app using Python, Flask, nginx, gunicorn, and PostgreSQL (with basic HTML/CSS). I use GitLab for version control (working on branches and merging to master when things are good, with a local CI/CD runner already configured and working) and take weekly DB backups to an S3 bucket. The server is secured/exposed to the internet through my personal router with DuckDNS. I've containerized everything, and it all comes up and down seamlessly with docker-compose.

The advice I could really use is if everything that follows seems like a cohesive roadmap of things to implement/develop:

Currently my database is empty, but the real thing I want to build next will involve populating it with data from API calls to various other websites/servers based on user inputs and automated scraping.

Currently, it only operates off HTTP and not HTTPS yet because my understanding is I can't associate an HTTPS certificate with my personal server since I go through my router IP. I do already have a website URL registered with Cloudflare, and I'll put it there (with a valid cert) after I finish a little more of my dev roadmap.

Next I want to transition to a Dev/Test/Prod pipeline using GitLab. Obviously the environment I've been working off has been exclusively Dev, but the goal is doing a DevEnv push which then triggers moving the code to a TestEnv to do the following testing:
Unit, Integration, Regression, Acceptance, Performance, Security, End-to-End, and Smoke.

Is there anything I'm forgetting?

My understanding is that a good choice for this is pytest, with results displayed via Allure.

Should I also set up a Staging Env for DAST before prod?

If everything passes TestEnv, it then either goes to StagingEnv for the next set of tests, or is primed for manual release to ProdEnv.

In terms of best practices, should I configure .gitlab-ci.yml to automatically spin up a new development container whenever a new branch is created?

My understanding is this is how dev is done with teams.
Also, I'm guessing there's "always" (at least) one DevEnv running for development, and only one ProdEnv running, but should a TestEnv always be running too, or does it only get spun up when there's a push?

And since everything is (currently) running off my personal server, should I just separate each env via individual .env.dev, .env.test, and .env.prod files that swap up the ports/secrets/vars/etc... used for each?
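
The per-environment file idea in the previous paragraph can be sketched roughly as below. This is only an illustration, assuming plain `KEY=VALUE` files named `.env.dev`, `.env.test`, and `.env.prod` as in the post. The helper names `parse_env_file` and `load_env` are hypothetical.

```python
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def load_env(name: str, base_dir: Path = Path(".")) -> dict[str, str]:
    """Load .env.<name>, where name is one of dev, test, or prod."""
    if name not in {"dev", "test", "prod"}:
        raise ValueError(f"unknown environment: {name}")
    return parse_env_file(base_dir / f".env.{name}")
```

Docker Compose also accepts an `--env-file` option, so a custom loader like this may only be needed for code that runs outside the containers.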

Eventually when I move to cloud, I'm guessing the ports can stay the same, and instead I'll go off IP addresses advertised during creation.

When I do move to the cloud (AWS), the plan is to use Terraform (which I'm already kinda familiar with) to spin up the resources (via gitlab-ci) to load the containers onto. Then I'm guessing environment separation is done via IP addresses (advertised during creation), and not ports anymore.
I am aware there's a whole other batch of skills to learn here regarding roles/permissions/AWS services (alerts/CloudWatch/CloudTrail/cost monitoring/etc...), and maybe some AWS certs (Solutions Architect > DevOps Pro).

I also plan on migrating everything to Kubernetes, managing the spin-up and deployment via Helm charts into the cloud, and getting into load balancing, with a canary instance and blue/green rolling deployments. I've done some preliminary messing around with minikube, but will probably use this time to dive into CKA as well.

I know this is a lot of time and work ahead of me, but I wanted to ask those of you with real skin-in-the-game if this looks like a solid gameplan moving forward, or you have any advice/recommendations.

https://redd.it/1ltz902
@r_devops
Need Help with Cloud Server Scheduling Setup

In our organization, we manage infrastructure across **three cloud platforms**: **AWS, Azure, and GCP**. We have **production, development, and staging servers** in each.

* **Production servers** run 24/7.
* **Development and staging servers** run based on a **scheduler**, from **9:00 AM to 8:00 PM**, Monday to Friday.

# Current Setup:

We are using **scheduler tags** to automate start/stop actions for dev and staging servers. Below are the tags currently in use:

* `5-sch` (9 AM to 5 PM)
* `in-sch` (9 AM to 8 PM)
* `10-sch` (9 AM to 10 PM)
* `12-sch` (9 AM to 12 AM)
* `ext-sch` (9 AM to 2 AM)
* `sat-sch` (Saturday only, 9 AM to 8 PM)
* `24-sch` (Always running)

**Issue:**
Developers request tag changes manually based on their working hours. For example, if someone requests a 9 AM to 11 PM slot, we assign the `12-sch` tag, which runs the server until 12 AM, **resulting in unnecessary costs**.
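
The cost gap in that example can be made concrete with a small sketch. It assumes the weekday tag list above and treats any stop time earlier than the 9 AM start as falling on the next day. The function names are hypothetical, and `sat-sch` and `24-sch` are left out for brevity.

```python
# Weekday stop times from the tag list above, in minutes after midnight of the
# start day. Midnight and 2 AM fall on the next day, so they exceed 24 * 60.
TAG_STOP_MINUTES = {
    "5-sch": 17 * 60,
    "in-sch": 20 * 60,
    "10-sch": 22 * 60,
    "12-sch": 24 * 60,
    "ext-sch": 26 * 60,
}

START_HOUR = 9  # all of these tags start the server at 9 AM


def to_minutes(hour: int, minute: int = 0) -> int:
    """Convert a stop time to minutes; times before the start hour count as next-day."""
    total = hour * 60 + minute
    return total + 24 * 60 if hour < START_HOUR else total


def tightest_tag(hour: int, minute: int = 0) -> tuple[str, int]:
    """Return (tag, idle_minutes) for the earliest-stopping tag that covers the request."""
    wanted = to_minutes(hour, minute)
    covering = [(stop, tag) for tag, stop in TAG_STOP_MINUTES.items() if stop >= wanted]
    if not covering:
        raise ValueError("no tag covers the requested stop time")
    stop, tag = min(covering)
    return tag, stop - wanted
```

For the 11 PM request, `tightest_tag(23)` picks `12-sch` and reports 60 idle minutes, which is the waste a per-request stop time would avoid.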

# Requirements for a New Setup:

1. **Developer Dashboard:**
   * A UI where developers can request server runtime extensions.
   * They should be able to select the server, date, and required stop time.
2. **DevOps Approval Panel:**
   * Once a request is made, DevOps gets notified and can approve it.
   * Upon approval, **automated actions** should update the schedule and stop the server at the requested time.
3. **Automated Start Times:**
   * Some servers should start at **8:00 AM**, others at **9:00 AM**.
   * This start time should be automatically managed per server.

Is there any **built-in dashboard** or tool that supports this kind of setup across all three clouds? Any suggestions or references would be really helpful.

https://redd.it/1ltzzwa
@r_devops
DiffuCode vs. LLMs. Non-linear code generation workflows

I know it's still unclear whether DiffuCode will change the game for software developers, but Mitch Ashley made a good point: "Developers rarely develop software in a linear flow. They design abstractions, objects, methods, microservices and common, reusable code, and often perform significant refactoring, adding functionality along the way." I always thought LLMs were flawed for software development and DevOps, and Apple open-sourcing DiffuCode on Hugging Face could be a seriously significant contribution to the AI race.
https://devops.com/apples-diffucode-why-non-linear-code-generation-could-transform-development-workflows/

https://redd.it/1ltzw2d
@r_devops
Best AWS CDK alternative for multicloud - Pulumi?

I'm a big fan of AWS CDK and want to use something similar across clouds, especially Azure or GCP. From my understanding, Terraform CDK is not properly supported. What is a good alternative? Pulumi?

https://redd.it/1lu7owc
@r_devops
Backstage - Is it possible to modify something you created with a template using backstage?

Hello everyone!

I'm new to Backstage and I am trying to fully understand what I can and can't do with it. Here is my question: if I deploy any code to a repository, am I able to change it in Backstage without re-creating it?

For example, I want to allow our devs to create some resources in AWS using Backstage + IaC, but I wish they could change configs even after they had created the resources. It would really be great if they could open the form again and change just what they want.

Thanks in advance!

https://redd.it/1lublee
@r_devops
GitHub Actions analytics: what am I missing?


How are you actually tracking GitHub Actions costs across your org?

I've been working on a GitHub Actions analytics tool for the past year, and honestly, when GitHub rolled out their own metrics dashboard 6 months ago, I thought I was done for.

But after using GitHub's implementation for a while, it's pretty clear they built it for individual developers, not engineering managers trying to get org-wide visibility. The UX is clunky, and you can't easily compare teams or projects.

For those of you managing GitHub Actions at scale - what's been your experience? Are you struggling with the same issues, or have you found workarounds that actually work?

Some specific pain points I've heard:

* No easy way to see which teams/repos are burning through your Actions budget
* Can't create meaningful reports for leadership
* Impossible to benchmark performance across different projects
* Zero alerting when costs spike
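
To make the first and last pain points concrete, here is a rough sketch of per-repo totals and a naive spike check. It assumes you already have daily usage exported as `(repo, day, billable_minutes)` tuples with ISO dates. That record shape, the function names, and the 2x threshold are all assumptions, not anything GitHub provides in this form.

```python
from collections import defaultdict
from statistics import mean


def minutes_by_repo(records):
    """Sum billable minutes per repo, largest consumer first."""
    totals = defaultdict(float)
    for repo, _day, minutes in records:
        totals[repo] += minutes
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


def spiking_repos(records, factor=2.0):
    """Flag repos whose latest day exceeds `factor` times the mean of their earlier days."""
    per_repo = defaultdict(list)
    # ISO date strings sort chronologically, so each series ends with the latest day.
    for repo, _day, minutes in sorted(records, key=lambda r: r[1]):
        per_repo[repo].append(minutes)
    flagged = []
    for repo, series in per_repo.items():
        if len(series) < 2:
            continue  # not enough history to form a baseline
        baseline = mean(series[:-1])
        if baseline > 0 and series[-1] > factor * baseline:
            flagged.append(repo)
    return sorted(flagged)
```

A fixed multiplier over a simple mean is crude. It is only meant to show the kind of check that is missing, not a tuned alerting rule.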

Currently working on octolense.com to tackle these problems, but curious what other approaches people are taking. Anyone found tools that actually solve the enterprise analytics gap?

https://redd.it/1luc2t4
@r_devops
Setting up a Remote Development Machine for development

Hello everyone. I am kind of a beginner at this, but I have been assigned to set up an RDM at my office (a software development company). The company wants to minimize the use of laptops within the office, as some employees don't have the computing power for deploying/testing code. What they expect of the RDM is as follows:

* The RDM will be a single main machine that all the employees (around 10-12) can access simultaneously (given that we create an account for each of them on the machine). If 10 is too many for one machine, we can have two separate RDMs, with 5 users on each.

* The RDM should (for now) be accessible only locally; making it public is not needed as of now.

* Each employee will be assigned their own account on the RDM, so every employee can see ONLY their own files and folders.

Now my question here is, is this achievable? I can't find an online source that has done it this way. The only source I could find that matched my requirements was this:
https://medium.com/@timatomlearning/building-a-fully-remote-development-environment-adafaf69adb7

https://medium.com/walmartglobaltech/remote-development-an-efficient-solution-to-the-time-consuming-local-build-process-e2e9e09720df (This just syncs the files between the host and the server, which is half of what I need)

Any help would be appreciated. I'm a bit stuck here

https://redd.it/1lugiq8
@r_devops
What would be considered as the best achievement to list in a CV for DevOps intern role?

Hi everyone,
I’m currently preparing my CV for DevOps intern applications and I’m wondering — what kind of achievements or experience would actually stand out?

I’ve worked on a few personal projects with Docker, GitHub Actions, and basic CI/CD setups. But I’m not sure how to frame them as solid achievements. Could anyone share examples or tips on what recruiters/hiring managers look for at the intern level?

Thanks in advance!

https://redd.it/1lui32h
@r_devops
Looking for recommendations on SMS and email providers with API and pay-as-you-go pricing

Hi everyone,

I’m developing a software app that needs to send automated SMS and email notifications to customers.

I’m looking for reliable SMS and email providers that:

* offer easy-to-use APIs
* support pay-as-you-go pricing
* provide delivery reports

What providers do you recommend? Any personal experience or advice would be really appreciated!

Thanks in advance!

https://redd.it/1lujeur
@r_devops