Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Looking for a DevOps Role in the UK – 5+ Years Experience

Hi everyone,

I’m a DevOps Engineer with 5+ years of experience in cloud infrastructure, automation, and security. I have hands-on expertise in:

Cloud Platforms: Azure, AWS
Infrastructure as Code (IaC): Terraform, Bicep, ARM Templates
CI/CD Pipelines: Azure DevOps, GitHub Actions, Jenkins, FluxCD
Containers & Orchestration: Docker, Kubernetes (AKS, EKS)
Monitoring & Logging: Prometheus, Grafana, ELK Stack, Azure Monitor
Security & Compliance: OWASP ZAP, SonarCloud, SAST, DAST
Cloud Cost Optimization: FinOps
Scripting & Automation: PowerShell, Bash, Python

I’m the founder of Nimbus Compute, where I provide DevOps consulting, mentorship, and training. I break down complex DevOps concepts for beginners, using real-life examples and hands-on projects. I also post educational DevOps content on TikTok, helping newcomers navigate the field.

I am currently seeking a DevOps role in the UK and open to opportunities anywhere in the country. I’m ready to start immediately and eager to contribute my expertise to a great team.

If you know of any opportunities or have any advice, please reach out! I appreciate any leads, referrals, or networking connections.

Thanks in advance!

Location: Sheffield, England
Availability: Immediate
LinkedIn: https://linkedin.com/in/ifebuche-omeke-9181b7325


https://redd.it/1ixrxxs
@r_devops
3YOE SRE Struggling to Get Interviews. Resume and Career Advice Needed!

Hey everyone,


I've been applying for entry-level SRE/SWE roles for almost two months, but I haven't heard back from recruiters. I got my first SRE job right after college and now have 3 years of experience in the field.

I enjoy being an SRE, but I feel stuck at my current company with no growth (I’m doing 70% ops and 30% coding). I've also realized that this field requires deeper domain knowledge than I expected.


I’ve seen a lot of discussion in this subreddit about how SRE isn’t really an entry-level role, which matches what I’m seeing in job postings—most require at least 3+ years of software industry experience.


I’m now considering switching tracks to SWE, since it seems to have more entry-level opportunities and would let me gain some coding experience.


I have two questions:

1. Is there anything wrong with my SRE resume that might be hurting my chances?


2. If I want to gain developer experience to diversify my background, what should I change in my resume to improve my chances for SWE roles?


My resume is attached. Would really appreciate any feedback! Thank you! 🙏

https://imgur.com/a/XKP9rFZ


https://redd.it/1iy1wph
@r_devops
My company is hiring if you have high clearance and live in Northern Virginia.

Hi, my company is looking to hire junior and mid-level DevOps/Cloud engineers. If you have TS/SCI clearance and live in Northern Virginia, please DM me.

https://redd.it/1iy4u4q
@r_devops
Does your on-call engineer for the week focus on resolving bugs and operational issues?

In my previous roles, whoever was on call for a given week was the dedicated engineer for resolving bugs and keeping an eye on our observability. This freed up those not on call to focus on stories outside of identified issues.

I brought this up to my new team and was shot down with angst. It seems like a reasonable pattern to follow.

https://redd.it/1iy6f6x
@r_devops
How Can I Find a Good DevOps Job in the US or Europe?

Hi everyone,

I’ve been working in DevOps for about 5 years, currently leading a DevOps team at a large Russian mall. I’m AWS-certified, and I’ve been working remotely for a while now.

I want to transition to a DevOps job in the US or Europe, but coming from a third-world country, I’m not sure what the best approach is. I see many remote job postings, but I’ve heard that companies are hesitant to hire internationally unless you already have a visa or work authorization.

For those who have made the jump (or hired international DevOps engineers), what’s the best path? Should I focus on remote-first companies, apply to on-site roles and hope for visa sponsorship, or try something else? Any tips on how to stand out as a candidate?

Would appreciate any advice!

https://redd.it/1iy50vt
@r_devops
EC2 and Azure VM logs

Our security department is telling me we need to retain all logs for X number of days. What are others doing? Do you just send all your logs to CloudWatch in AWS or Log Analytics in Azure?
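
For the CloudWatch side of that question, one detail worth knowing is that a log group's retention can only be set to certain fixed values, so "X days" usually has to be rounded up. Below is a minimal sketch of that rounding, assuming the list of accepted values matches what the CloudWatch Logs documentation currently lists (check it before relying on it). The helper names `retention_for` and `apply_retention` are hypothetical; `put_retention_policy` is the boto3 CloudWatch Logs client call.

```python
import bisect

# Assumption: retention periods (in days) that CloudWatch Logs accepts
# for a log group, as listed in the AWS documentation.
VALID_RETENTION_DAYS = [
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
]


def retention_for(required_days):
    """Return the shortest accepted retention that covers required_days."""
    if required_days < 1:
        raise ValueError("required_days must be at least 1")
    idx = bisect.bisect_left(VALID_RETENTION_DAYS, required_days)
    if idx == len(VALID_RETENTION_DAYS):
        raise ValueError("requirement exceeds the longest fixed retention period")
    return VALID_RETENTION_DAYS[idx]


def apply_retention(logs_client, log_group_name, required_days):
    """Set retention on one log group; logs_client is a boto3 'logs' client."""
    days = retention_for(required_days)
    logs_client.put_retention_policy(
        logGroupName=log_group_name,
        retentionInDays=days,
    )
    return days
```

With a real client this would be called as `apply_retention(boto3.client("logs"), "/my/app", 45)`, where the log group name is a placeholder. Azure Log Analytics has its own retention settings, which this sketch does not cover.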

https://redd.it/1iy3zjs
@r_devops
How can I truly grow as a full-stack developer in the AI era?


I’m a solo full-stack developer at my company, managing infrastructure and development with my team lead. While I can deploy applications using Kubernetes, Docker, and other modern tools, I rely heavily on AI (ChatGPT, DeepSeek) to complete tasks. This has made me efficient, but I lack deep technical understanding and struggle to answer in-depth questions, making interviews challenging.

With AI rapidly evolving, I want to future-proof my career. My main concerns:
1. How can I build a deeper understanding of technologies instead of just relying on AI?
2. What skills should I focus on to stay competitive and confident in interviews?
3. Should I transition towards AI-related development, or strengthen core engineering skills?

Looking for advice from experienced developers—how do I break out of this cycle and grow meaningfully?

https://redd.it/1iyc1kw
@r_devops
How to Prevent Ephemeral Storage from Filling Up in AWS Fargate with FireLens & Datadog?

I'm running a PHP app on AWS ECS Fargate and using FireLens (Fluent Bit) to send logs to Datadog. However, I'm facing an issue where ephemeral storage fills up quickly due to backpressure.

I want to:

* **Limit RAM usage** for log buffering (e.g., 256MB).
* **Use ephemeral storage only when needed** (max 5GB).
* **Increase worker threads** (16) to flush logs faster.

I'm using `storage.type=filesystem`, but Fargate **doesn’t allow sourcePath** for volumes, so I can't explicitly define a storage path. My task definition keeps failing.

How can I configure FireLens in Fargate to handle backpressure efficiently without filling up storage? Any best practices?
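
As a rough illustration of how the three goals above could map onto Fluent Bit settings, here is a minimal sketch that renders a classic-format config fragment. It rests on several assumptions: that chunks are roughly 2 MB each, so a memory budget translates into `storage.max_chunks_up`; that `storage.path` can point at a directory inside the log router container's own filesystem (the path `/var/fluent-bit/state` is made up), which on Fargate would count against the task's ephemeral storage; and that the fragment can be supplied to FireLens as a custom config file. The function name `render_fragment` is hypothetical, and required Datadog settings such as the API key are left out.

```python
CHUNK_SIZE_MB = 2  # assumption: Fluent Bit chunks are roughly 2 MB each


def render_fragment(mem_limit_mb=256, disk_limit_gb=5, workers=16,
                    storage_path="/var/fluent-bit/state"):
    """Render a classic-format Fluent Bit fragment for the three goals above."""
    # With filesystem storage, memory use is bounded by how many chunks
    # may be held in memory at once rather than by a byte limit.
    max_chunks_up = max(1, mem_limit_mb // CHUNK_SIZE_MB)
    lines = [
        "[SERVICE]",
        f"    storage.path {storage_path}",
        f"    storage.max_chunks_up {max_chunks_up}",
        "",
        "[OUTPUT]",
        "    Name datadog",
        "    Match *",
        "    # apikey and other Datadog settings omitted",
        f"    workers {workers}",
        f"    storage.total_limit_size {disk_limit_gb}G",
    ]
    return "\n".join(lines)


print(render_fragment())
```

The `storage.total_limit_size` cap on the output is what keeps the on-disk backlog from growing past the 5GB budget; once it is reached, Fluent Bit discards the oldest chunks for that output, so some log loss under sustained backpressure is the trade-off.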

https://redd.it/1iyhwgb
@r_devops
Feeling Stuck in My DevOps Role – Need Career Advice

Hey DevOps folks,

I'm a DevOps engineer with 2 years of experience working at a startup. I primarily work with AWS cloud and some Azure (mostly pipelines), managing 7 applications across 3 environments each. Recently, we migrated to ECS with a cross-account setup, which was an exciting challenge. However, now that most things are automated with Terraform, there’s not much left to do—rarely any production issues, and my work feels stagnant.

Since I’m still early in my career, I don’t want to get stuck doing just this. I’m planning to switch to a new company and need some advice:

1. What type of company should I target? (Startups vs. bigger companies, service-based vs. product-based)

2. What technologies should I focus on learning? (I have hands-on experience with AWS, Azure DevOps, Jenkins, Prometheus, and Grafana. I know Kubernetes but haven’t used it in a real project.)

3. Any other suggestions? (e.g., full remote jobs, certifications, or alternative career paths)

Would really appreciate your insights!!

https://redd.it/1iyipp2
@r_devops
Jenkins CICD pipeline migration to GitLab

Hey guys,
What's your experience with migrating CI/CD pipelines from Jenkins to GitLab? Is rewriting the CI/CD files one by one really the only way, or is there a tool for that? What do you think is the best practice?

https://redd.it/1iyjoyf
@r_devops
Debug & chill #2 - Articles of infra & devops debugging

Thrilled to Share the Second Episode of My Debug & Chill Series!

Back in 2020, I started documenting some of my most intriguing troubleshooting adventures, and now I’m releasing them as a blog series. Each post dives into real problems I faced, how I used different tools, and my step-by-step logic.

This second installment dives into a puzzling case of packet duplication in a VMware environment—a seemingly simple scenario that turned out to be much trickier than it looked. Curious about the cause and how we tracked it down?

Check out Debug & Chill #2 here:

https://royreznik.substack.com/p/debug-and-chill-2-strange-packet

I’d love to hear your thoughts or any similar experiences you’ve had. Let me know in the comments!

https://redd.it/1iyjs7q
@r_devops
Using engineering metrics for good!

Can you share some examples of implementing engineering metrics in your daily workflow that positively impact your team performance?

https://redd.it/1iyin9z
@r_devops
Analyzing OpenTelemetry Data in Real Time with SQL - All Open Source

Hi folks!

I recently wrote a blog post on how to analyze OTel data in real time with SQL, using Feldera and Grafana, both open source tools.

We collect data from the OTel Collector, send it to your self-hosted Feldera instance for analysis, and visualize it with Grafana.

The blog post: https://www.feldera.com/blog/opentelemetry

We also have a more detailed use case article: https://docs.feldera.com/use_cases/otel/intro

Feel free to ask any questions, and hopefully this is useful to you!

https://redd.it/1iymaze
@r_devops
Just Started a DevOps Blog – Looking for Feedback & Suggestions! 🚀

Hey r/devops community!

I recently launched a personal blog where I share my experiences, challenges, and insights as a DevOps engineer. My goal is to post weekly about new technologies, interesting problems I encounter, and solutions I find useful in real-world scenarios.

My latest post is about EKS Auto Mode – I cover provisioning from scratch, deploying both stateless and stateful applications, and all the details involved in setting up a cluster in Auto Mode. I believe it could be a game-changer in the field, and I’d love to hear your thoughts on it!

👉 https://haykops.com/posts/eks-auto-mode/

I'm open to any feedback—whether it's about the content, topics you'd like me to cover, or how I can make the blog more valuable for the DevOps community.

Would love to hear your thoughts! Thanks in advance. 🙌

https://redd.it/1iyligy
@r_devops
I built an open-source dashboard for VM images

Hi,

I built this project because I wanted an easier way to visualise all Virtual Machine Images. I was also just very sick of people not following naming conventions and keeping track of images in spreadsheets.

Img-Dash is a simple dashboard for VM images across AWS, GCP and Azure that you can run locally.

Features:

- Consolidated view of all VM images and their data
- View, attach, or delete contextual information (IaC code, event data, compliance scripts)
- Displays which VMs are using which image
- Simple search and list of images in the dashboard

As a DevOps engineer, it has been ages since I've developed a full stack application so feedback is much appreciated!

Repo: https://github.com/shaozae/Img-Dash

https://redd.it/1iyq02j
@r_devops
HELP Trying to optimize my GitHub Action to not install things every time. I'm new to this CI/CD thing

Hi friends, I'm looking for advice on speeding up my GitHub Actions workflow. Currently, a significant portion of the workflow's run time goes to these steps:

sudo apt-get install -y gettext
yarn install --frozen-lockfile --silent
yarn my custom script which runs the react-gettext-parser npm library

These steps are executed on every push/PR, and I'm wondering if there's a more efficient way to handle them?
I wonder if it would be better if I could, for instance, compile what I'm installing, and instead use that compiled thing when my action triggers without having to install everything every time.

Has anyone faced similar challenges and found effective solutions? I'm open to any suggestions or best practices you can share. Thanks in advance : )
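
The usual answer to "install once, reuse afterwards" is dependency caching: store the installed packages under a key derived from the lockfile, and only reinstall when that key changes. In GitHub Actions this is what `actions/cache` does with a key built from `hashFiles('**/yarn.lock')`. The Python below is only a model of that idea, not workflow configuration; `cache_key`, `restore_or_install`, and the in-memory dict standing in for the cache are all hypothetical.

```python
import hashlib


def cache_key(lockfile_bytes, os_name="Linux", prefix="yarn"):
    """Derive a cache key that changes only when the lockfile content changes."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    return f"{os_name}-{prefix}-{digest}"


def restore_or_install(cache, key, install):
    """Return (dependencies, cache_hit); run install() only on a cache miss."""
    if key in cache:
        return cache[key], True
    deps = install()
    cache[key] = deps  # saved so the next run with the same key can skip install
    return deps, False
```

The design point is that the key depends on the lockfile, not on the commit: pushes that leave `yarn.lock` untouched hit the cache and skip the slow install, while a dependency change produces a new key and a fresh install. System packages such as `gettext` are not covered by this and would need a separate approach, for example a prebuilt container image.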

https://redd.it/1iyr471
@r_devops
How can I improve at performance tuning topologies/systems/deployments?

Machine learning engineer here, ~4.5 YOE. Most of my XP has been training and evaluating models. But I just started a new job where my primary responsibility will be to optimize systems/pipelines for low-latency, high-throughput inference. TL;DR: I struggle at this and want to know how to get better.

Model building and model serving are completely different beasts, requiring different considerations, skill sets, and tech stacks. Unfortunately I don't know much about model serving - my sphere of knowledge skews more heavily towards data science than computer science, so I'm only passingly familiar with hardcore engineering ideas like networking, multiprocessing, different types of memory, etc. As a result, I find this work very challenging and stressful.

For example, a typical task might entail answering questions like the following:

- Given some large model, should we deploy it with a CPU or a GPU?

- If GPU, which specific instance type and why?

- From a cost-saving perspective, should the model be available on-demand or serverlessly?

- If using Kubernetes, how many replicas will it probably require, and what would be an appropriate trigger for autoscaling?

- Should we set it up for batch inferencing, or just streaming?

- How much concurrency will the deployment require, and how does this impact the memory and processor utilization we'd expect to see?

- Would it be more cost effective to have a dedicated virtual machine, or should we do something like GPU fractionalization where different models are bin-packed onto the same hardware?

- Should we set up a cache before a request hits the model? (okay this one is pretty easy, but still a good example of a purely inference-time consideration)

The list goes on and on, and surely includes things I haven't even encountered yet.
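
For the replica and concurrency questions in particular, a common starting point is Little's law: the average number of requests in flight equals the arrival rate times the average latency. The sketch below turns that into a first-guess replica count. The function names, the 70% utilization headroom, and the example numbers are all assumptions for illustration; real sizing would still need load testing.

```python
import math


def required_concurrency(requests_per_second, latency_seconds):
    """Little's law: average in-flight requests = arrival rate * latency."""
    return requests_per_second * latency_seconds


def required_replicas(requests_per_second, latency_seconds,
                      per_replica_concurrency, target_utilization=0.7):
    """First-guess replica count, leaving headroom below full utilization."""
    in_flight = required_concurrency(requests_per_second, latency_seconds)
    usable_per_replica = per_replica_concurrency * target_utilization
    return max(1, math.ceil(in_flight / usable_per_replica))


# Example: 200 req/s at 250 ms average latency keeps about 50 requests in flight.
# If one replica handles 8 concurrent requests, run at 70% of that as a target.
print(required_concurrency(200, 0.25))   # 50.0
print(required_replicas(200, 0.25, 8))   # 9
```

The same relationship also hints at an autoscaling trigger: in-flight requests per replica tracks load more directly than CPU does when the model runs on a GPU.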

I am one of those self-taught engineers, and while I have overall had considerable success as an MLE, I am definitely feeling my own limitations when it comes to performance tuning. To date I have learned most of what I know on the job, but this stuff feels particularly hard to learn efficiently because everything is interrelated with everything else: tweaking one parameter might mean a different parameter set earlier now needs to change. It's like I need to learn this stuff in an all-or-nothing fashion, which has proven quite challenging.

Does anybody have any advice here? Ideally there'd be a tutorial series (preferred), blog, book, etc. that teaches how to tune deployments, ideally with some real-world case studies. I've searched high and low myself for such a resource, but have surprisingly found nothing. Every "how to" for ML these days just teaches how to train models, not even touching the inference side. So any help appreciated!

https://redd.it/1iysmlj
@r_devops
Can Kaniko build a container with provenance=mode=min?

When going through the Kaniko docs I don't see an area for the Kaniko "--provenance" flag. Is setting this provenance level not a feature of Kaniko? Is there an alternate way of setting provenance with Notary/Oras? Is the provenance level set to min by default?

https://redd.it/1iyrvv9
@r_devops
can you guys roast my resume?

Hello everyone, I'm a master's student who has just started applying for jobs. I don't have much experience in the IT field, so I built my resume solely around projects. I'm looking for jobs in DevOps (I know companies don't hire freshers for DevOps roles), SRE, cloud engineering, and related areas. I'm still learning DevOps, which is why my resume doesn't have any DevOps work on it yet, but I will add some soon.
Could any of you roast/review my resume? It would be really appreciated.

Resume link: https://www.reddit.com/r/aws/comments/1iyws7u/can_you_guys_roast_my_resume/

Thanks in advance!

https://redd.it/1iywybb
@r_devops