Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Could Platform Engineering Tactics Solve Some of these Common DevOps challenges?

DevOps has revolutionized software development and deployment, but as the complexity of modern cloud-native technologies increases, it has become evident that the current approach has limitations and inefficiencies. As a technology leader myself, it’s become increasingly clear to me that the traditional DevOps role may not survive if we can’t overcome our current challenges and our struggles to automate. DevOps leaders need to get on board with the latest evolution of DevOps in order to keep pace with ever-changing technology demands. The answer is here, and it starts with platform engineering.

Read on for the full blog: https://www.getambassador.io/blog/platform-engineering-solution-common-devops-challenges

https://redd.it/1bq0nxy
@r_devops
Do any of you actually have the capacity to do your jobs correctly? Mostly speaking to more “platform” folks.

In the last year I’ve become very apathetic to process and “rigor”. As long as we can get a change out with confidence that it didn’t break anything, I’ve stopped caring how we get to that point. If something’s broken, I only care that it was fixed. If someone’s misconfigured something, I only care that it’s now configured correctly with no additional thought to how we may be able to build a better process.

I just want to get the job done by any means necessary. We just don’t have the time to do anything else. The team’s been cut to ribbons and our ownership is massive. I want to stay because the upward mobility is good and the job market looks like crap, but it sucks feeling like I’m just keeping the lights on and not improving.

Do any of you actually have the time to do things correctly, automate, and simplify or are you just wading through the muck waiting for your shares to vest?

https://redd.it/1bq59to
@r_devops
Savvy: Create and Share Runbooks directly from your terminal

I built [Savvy](https://getsavvy.so) to help developers automatically create and share runbooks directly from their terminal.


**Savvy Record**

Savvy's [open source CLI](https://github.com/getsavvyinc/savvy-cli) works with any bash or zsh shell. It records your commands and uses AI to automatically create an accurate, easy-to-follow runbook. `savvy record` automatically expands all aliases so that anyone can follow the runbook.

`savvy record --ignore-errors` will ignore any command that returns a non-zero exit code. This is great for creating long runbooks without having to worry about typos you make along the way. Check out our demo here: [https://youtu.be/GzcvGEg6oYc](https://youtu.be/GzcvGEg6oYc)

Savvy makes it really easy to create a runbook from your shell history with `savvy record history`. Simply select your shell commands and watch Savvy's CLI do its magic. Here's a demo of `savvy record history` in action: [https://youtu.be/Nk_NeLjt2Tk](https://youtu.be/Nk_NeLjt2Tk)


**Share and Export Runbooks**


All runbooks are private by default. However, you can easily share runbooks with a private link or make them public. Here's the [public runbook](https://app.getsavvy.so/runbook/rb_7b74b43c5d61bd57/How-To-Validate-Kubernetes-Root-Certificate) I created from the demo above.


Savvy also allows you to export runbooks to markdown with one click. You can paste the runbook into any document editor that supports markdown, such as Notion, Coda, Slab, or Google Docs.


**Get Started**

* Follow our [Quick Start](https://github.com/getsavvyinc/savvy-cli?tab=readme-ov-file#quick-start) to download the CLI and create your first runbook
* Check out our public roadmap at [feedback.getsavvy.so](https://feedback.getsavvy.so)
* Join our [Discord](https://getsavvy.so/discord)

https://redd.it/1bq70kw
@r_devops
Help on understanding Dev, Staging and Prod Environments

Hey there,

I'm working on a small app with separate backend and frontend components. The backend consists of two containers: a Java API and a PostgreSQL database. These are deployed on AWS using one EC2 instance to host both containers and an S3 bucket primarily for storing assets like images. The frontend is built on React.

I've divided backend and frontend development, and now I'm figuring out how to manage different environments. Here are my questions:

Development/Test Environment: Testing the backend is straightforward on a local machine by running the containers, but what about S3? Should I simulate storage locally or connect to a dev S3 bucket?
Staging Environment: I'm using Terraform to provision staging and production environments. My plan is to create the staging environment when needed and tear it down after testing. Should I use separate S3 buckets for staging to avoid extra costs? Any recommendations for managing staging efficiently?
Production Environment: Once staging tests are successful, updates and fixes need to go to production. I'll use Terraform to check for updates and then update the code with Ansible. Does this approach make sense? Any recommendations for handling production updates?
Terraform Management: Should staging and production Terraform configurations be separate files? If yes, how do I promote changes from staging to production? Also, do I need Terraform for development/testing?
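
One way to picture the "simulate locally or use a dev bucket" choice is a small settings helper that keeps a separate bucket per environment and optionally points dev at a local S3-compatible emulator. This is only a sketch: the bucket names, the emulator URL, and the `s3_settings` helper are all hypothetical, not something from the setup described above.

```python
from typing import Any, Dict

# Hypothetical bucket names, one per environment so that staging tests
# can never touch production assets. Replace with your own.
BUCKETS = {
    "dev": "myapp-assets-dev",
    "staging": "myapp-assets-staging",
    "prod": "myapp-assets-prod",
}

# Hypothetical address of a local S3-compatible emulator used only in dev.
LOCAL_S3_ENDPOINT = "http://localhost:9000"


def s3_settings(env: str, use_local_emulator: bool = False) -> Dict[str, Any]:
    """Return the bucket name and S3 client keyword arguments for one environment."""
    if env not in BUCKETS:
        raise ValueError(f"unknown environment: {env!r}")
    if use_local_emulator and env != "dev":
        raise ValueError("the local emulator is only meant for dev")

    client_kwargs: Dict[str, Any] = {}
    if use_local_emulator:
        # An explicit endpoint makes the client talk to the emulator
        # instead of AWS.
        client_kwargs["endpoint_url"] = LOCAL_S3_ENDPOINT
    return {"bucket": BUCKETS[env], "client_kwargs": client_kwargs}
```

With boto3, the result could be used as `boto3.client("s3", **settings["client_kwargs"])`, so the application code stays the same whether it talks to the emulator or to a real dev bucket.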

I know it's a lot, but I want to follow best practices. Any advice would be greatly appreciated. Thanks!

https://redd.it/1bq5v44
@r_devops
Have you ever wanted to monitor whether the API response is correct or not?

Hi,

I'm wondering if anyone has had this thought before: having a third-party service that monitors whether your API responses are working as expected. For example:


1. An array/dictionary should never be empty.
2. A field should never be null.
3. Timestamps should always be greater than or equal to the current time.
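
The three rules above can be sketched as a small check over a decoded JSON response. The field names `items`, `owner`, and `expires_at` are made up for illustration, and a real monitor would take the rules from configuration rather than hard-coding them.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def check_response(payload: Dict[str, Any], now: Optional[datetime] = None) -> List[str]:
    """Return a list of rule violations for one decoded JSON response."""
    now = now or datetime.now(timezone.utc)
    problems: List[str] = []

    # Rule 1: an array/dictionary should never be empty.
    items = payload.get("items")  # hypothetical field name
    if not isinstance(items, (list, dict)) or len(items) == 0:
        problems.append("items is missing or empty")

    # Rule 2: a field should never be null.
    if payload.get("owner") is None:  # hypothetical field name
        problems.append("owner is null or missing")

    # Rule 3: a timestamp should be greater than or equal to the current time.
    raw = payload.get("expires_at")  # hypothetical field name
    try:
        expires_at = datetime.fromisoformat(raw)
    except (TypeError, ValueError):
        problems.append("expires_at is not an ISO 8601 timestamp")
    else:
        if expires_at.tzinfo is None:
            # Assumption: timestamps without an offset are in UTC.
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        if expires_at < now:
            problems.append("expires_at is in the past")

    return problems
```

An empty list means the response passed every rule; anything else is what the monitor would alert on.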

I'm just curious whether this is a common need.

https://redd.it/1bqdhrx
@r_devops
Grafana + Prometheus on AKS

Hey,

I'm using Managed Prometheus on AKS and have installed DCGM in order to scrape GPU data. When port-forwarding to DCGM, I can see that it is scraping fine. The Azure docs say that managed Prometheus will automatically pick up any new scrape targets. However, when I add metrics in Grafana such as 'DCGM_FI_DEV_GPU_TEMP{instance=~"${instance}", gpu=~"${gpu}"}' (temperature), I get the 'No data' screen.

Is there any other way I can verify that this managed prometheus is scraping the GPU data correctly?
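
One possible check is to ask the Prometheus-compatible query API for the bare metric, without the Grafana variable filters, and see whether any series come back. The sketch below only builds the standard `/api/v1/query` URL and reads the standard JSON response. The base URL of the managed workspace and the bearer-token authentication it needs are assumptions that are not shown here.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus HTTP API instant-query URL (GET /api/v1/query)."""
    query = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def series_labels(response_body: str) -> List[Dict[str, str]]:
    """Return the label set of every series in an instant-query response."""
    doc = json.loads(response_body)
    if doc.get("status") != "success":
        raise RuntimeError(f"query failed: {doc.get('error', 'unknown error')}")
    return [sample["metric"] for sample in doc["data"]["result"]]
```

If `series_labels` returns an empty list for plain `DCGM_FI_DEV_GPU_TEMP`, the metric is not reaching the workspace at all. If it returns series, comparing their `instance` and `gpu` labels with the values of the dashboard variables would be the next thing to look at.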

Thanks.

https://redd.it/1bqe3du
@r_devops
Anyone done hub and spoke networking across AWS and Azure

Any recommendations? I’m going to be new to Azure but 10 YoE with AWS.

Is it going to be worth doing TGW and Azure’s equivalent? Should I just do a site-to-site VPN and call it?

Any help is appreciated

https://redd.it/1bqg7f2
@r_devops
Retiring Ansible?

Hey guys,

So my company heavily uses Ansible. We run tons of k8s but we have our database and a few other legacy services running directly within VMs.

Said VMs are launched from a custom configured AMI but still some configuration is done via Ansible.

In my interviews with other companies, many said they have moved away from Ansible and now use only Terraform.

My question for those who run VMs outside k8s: how do you configure or set up your VMs using only Terraform?

In later interviews, it came out that many of them no longer ran any VMs outside of k8s, which allowed them to retire Ansible. That said, I'm curious if others have seen or done differently in their own experience.

I have used Terraform for a few projects but not full GitOps where my entire infra is managed by Terraform.

This post is an attempt to discover a piece of the infra, or a usage of Terraform, that I don't fully grasp or am not aware of. I don't know what I don't know, I guess lol.

Thanks

https://redd.it/1bqs1o3
@r_devops
What do you look for in case studies?

Hey all! I'm slightly new to the dev world and have been tasked with writing a case study for a data management solution. I've been reading through examples but I feel like they all say very similar things, and so I'm finding it difficult to understand what might set them apart to readers.

I'm hearing a lot that case studies are very valuable, but is it simply the fact of having one, no matter how generic, that is the important part, or are there specific things readers want to see?

https://redd.it/1bqszay
@r_devops
Is the DevOps role nearly dead?

There used to be a time when DevOps was booming; now we rarely see any openings for DevOps and SRE.

What are your thoughts?

https://redd.it/1bqwaig
@r_devops
What should I brush up on when it comes to Infrastructure provisioning and automation?

I have an interview coming up for a SWE internship with a team that works on provisioning, and I thought it would be best to ask you all what you think I should touch up on before the interview.

My previous internship involved working with Docker, k8s, and microservices in general, so I am assuming those are things I'll need to refresh.

https://redd.it/1bqv2eq
@r_devops
Build agent (runner) with option to execute as regular user

I searched for such an option but, surprisingly, I wasn’t able to find one. GitLab has an open issue for allowing their Windows runners to be installed without elevated access, but that’s the only one I found. Has anyone used a Windows build agent that has an option to run as a regular user? Thanks.

https://redd.it/1bqv14l
@r_devops
Senior advice

I'm in a weird spot. I manage a Jenkins instance and automation that oversees hundreds of millions of dollars in revenue, thousands of builds, and mobile application deployments. I do ad hoc projects, lots of scripting, lots of code, and application consultation for developers. I have very rudimentary cloud skills, and the closest I got to infrastructure as code was Windows desired state. I feel like I could pick all of these skills up very quickly, but I also feel like if I ever got laid off I would not be able to get into another DevOps role without them. I mostly just like to learn theory and how to do things; I don't really care about tools, as they are just an abstraction of the process.

I have managed Docker, but I've never managed Kubernetes. Am I fucked if I get fired?

https://redd.it/1bqzt02
@r_devops
Certification Exam Nightmare w PSI

Recently I attempted to take the HashiCorp Consul Associate exam through PSI Services. I clicked on the Launch button, and it immediately opened a psiexam:// URL, which the browser cannot understand (Firefox on both macOS and Windows 10). Apparently, the PSI Secure Browser is supposed to be installed, but there's no documented requirement for this. It seems like it should be installed through the Launch process, but that is not happening.

When I attempted to get support (calling to reach an actual human agent), the exam window expired, and I was marked as absent. So the agent said they could not help me. Apparently, I would have to buy multiple exams, get human tech support during one of these exam windows, and hopefully they would be able to find a solution by remoting into my system.


I am not sure what to do. I am concerned as I think Linux Foundation uses PSI services as well. :'(

https://redd.it/1br21dv
@r_devops
PSI Nightmares and How to Handle Them

I used the Firefox browser and did the compatibility checks. This was my third exam with PSI; the first two went smoothly.

More than the exam itself, PSI will teach you patience, and how to think and react when unexpected issues crop up in the exam software.

Do the compatibility checks, give all permissions, and allow all pop-ups from PSI before the exam. Otherwise use Chrome, which is said to give the best experience with PSI.

I even faced a PSI browser issue in the middle of the exam: with 26 minutes left, the session closed by itself while I was editing a file to launch a YAML object.

I followed one simple rule: don't panic!

I tried to restart the session, but it wouldn't open again.

Don't call tech support. Linux Foundation has limited technical personnel, and the PSI proctors even more so. Raise a ticket instead. It will take 2-4 days to be resolved, but at least things will be on record, with the timing of the ticket matching your exam.

I didn't get the original session back, but the check-in process was followed again. I had to restart the PC, uninstall the PSI browser, and reinstall it from the downloads to get a session.

I had to end the proctor session, otherwise I would not have been marked on the questions I had attempted. The proctor didn't end it before the exam timer ran out, which was great. I also mentioned to the new proctor that I didn't get my older session back. He/she assured me that there is a recording, so PSI will check where the fault is, and said it was better to raise a technical support ticket, which I had already done.

My exam was in grading for 3-4 days, so I raised an extra Linux Foundation ticket after the result time exceeded 24 hours.

Both tickets (one for the unexpected PSI browser close, one for grading still pending after 24 hours) were closed because I passed the exam. If I had not passed, I would have asked for a free retake, and I think I would have had a case: the issue happened in the middle of the exam, the session recording shows me doing the needed file edits, and a ticket was raised.

Some people face issues at the start of the exam, which also happened to my friend. In that case, as long as the exam hasn't started, you have the 2 hours plus the half hour before the required check-in time in hand; use it to raise a first support ticket. Otherwise, if you are marked absent as a no-show, you will need a support ticket on record to justify the issue you faced. Then troubleshoot by checking PSI browser compatibility with a different browser. Allow all pop-ups. Don't use a VPN or multiple monitors; PSI checks for these as well. Don't have multiple programs open, and mute notifications from Discord, Reddit, Outlook, etc., so they don't create a hassle during the exam. Don't use virtualization of any kind or a VM to launch the exam. Have good-bandwidth Wi-Fi and, if affordable, a UPS, so that even if a generator kicks in there is no network flap during the exam.

Don't give up, and don't be disheartened by these issues. Be prepared for the next exam: do all the needed checks, read through the exam documentation beforehand, think through the scenarios that might affect you, and search the community for posts about similar issues.


https://redd.it/1br6skw
@r_devops
Forced from Slack to Teams

My company pulled the plug on Slack after 8 years. We were given two weeks' notice to migrate over 100 integrations and all our alerts.

MS Teams freaked out a couple of times, and we've had to delete Teams channels and recreate them to get our integrations to work. Channels feel like Twitter or social media posts. I can't limit notifications as well or set groups to mention.

Is it wrong to quit just because they took away Slack? Has anyone else gone through this?

https://redd.it/1br7ig1
@r_devops
DevOps Intern Interview

I have an interview for a DevOps intern position at a relatively small company, and they use Terraform. Do you guys know what I would need to know prior to the interview?

https://redd.it/1br9t1e
@r_devops
The Critical Role of Continuous Integration in Agile Software Development

The guide explores how agile transforms software development, making it easier and faster when developers practice test-driven development (TDD) and continuous integration (CI) together. It also covers how to take CI to the next level with CodiumAI, which involves deeper integration with practices like Continuous Delivery (CD) and DevOps, enhanced automation, and improved collaboration and efficiency in software teams.

https://redd.it/1branno
@r_devops
(HELP) Currently I am only logging the requests that come to my Squid proxy server in access.log. I need to enable logging of the responses that come from the destination server. Can anyone help?

How can I enable the response body in the Squid logformat?

https://redd.it/1brb6za
@r_devops
One piece of advice you wish you'd heard sooner?

Mine is pretty basic: it's not worth it to learn a new framework before getting pretty good at one. I wasted a solid year (doing tech support and trying to break into a product team) because I kept changing languages/frameworks/tools. I guess the general advice is 'for the first year, pick a context and stick with it.'

It's a lot easier to learn AWS after you've stuck with Azure for a year solid. It's a lot easier to learn Playwright tests if you have a good grasp of Selenium, rather than switching back and forth as you're first learning.

https://redd.it/1brd69a
@r_devops
How would you devops a minecraft server?

Hi all, I'm a software dev trying to get better at devops. Google Cloud has a bunch of free credits for 90 days, so why not set up a modded Minecraft server from scratch?

My current setup is a pretty basic Terraform config where:

1. 2 buckets get provisioned (server_files and world_backups)
2. A script checks if the buckets are empty and if they are then local directories are zipped and uploaded to the buckets
3. Compute instance is provisioned and the startup script grabs the latest files from the buckets and runs the server

This works OK, but there is probably a better way to do it. I also want to add some more functionality, like automatic rolling backups of the world dir that can also be invoked manually if needed.

My first thought would be to add scripts to the server that are invoked with cron, and to add a lightweight API to allow manual execution, or just SSH in when I want to. But at the moment, updating the server requires completely destroying and recreating it, which takes ages and destroys the active world dir. So maybe I should put that stuff in a Docker container so I can rebuild it easily on the server, or maybe do everything with Cloud Functions that SSH in and run commands.
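
For the rolling backup part, a script along these lines could be run from cron or by hand. It is a minimal sketch using only the Python standard library: the `backup_world` name, the archive naming scheme, and the default of five kept archives are all assumptions, and uploading the archive to the world_backups bucket is left out.

```python
import tarfile
import time
from pathlib import Path
from typing import Optional


def backup_world(world_dir: Path, backup_dir: Path, keep: int = 5,
                 now: Optional[float] = None) -> Path:
    """Archive world_dir into backup_dir and keep only the newest `keep` archives."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)

    # UTC timestamp in the name, so sorting by name is sorting by age.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    archive = backup_dir / f"world-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(world_dir, arcname=world_dir.name)

    # Rolling window: delete everything except the newest `keep` archives.
    archives = sorted(backup_dir.glob("world-*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()
    return archive
```

Because the same function serves both the scheduled and the manual case, a cron entry and an ad hoc SSH session would produce identical archives. Pausing world saves on the server while the archive is written is not handled here.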

Essentially, it seems like there are a million ways to do this and each has downsides. Looking for some input

cheers


https://redd.it/1brdynb
@r_devops