Help - Best way to interview SRE/DevOps
Looking for advice from anyone with experience as a hiring manager or interviewer for an SRE team.
I usually prefer candidates with some HackerRank coding experience, strong Linux administration, Kubernetes expertise, and networking fundamentals. If anyone can share their best practices for evaluating these skills, that would be great.
I need to validate candidates for the following skills:
* Linux Administration (hands-on with Ubuntu)
* Networking Concepts (L2/L3, OSI layers)
* Kubernetes Administration (on-prem)
* Programming - Python/Go (developer-level preferred, but not mandatory)
* Observability Stack (Prometheus, Grafana, Loki, VictoriaMetrics)
* AWS Proficiency
* Ansible (comfortable using it for automation)
The ideal candidate would have 5 years of experience. Again, I am only looking for feedback and tips on the interview process, so feel free to share your views.
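One practical way to evaluate the Python bullet alongside the Linux one is a short hands-on exercise rather than an abstract HackerRank puzzle. A hypothetical screening task (the log format and function name are made up for illustration): given access-log lines, count 5xx responses per status code, gracefully skipping malformed lines.

```python
# Hypothetical screening exercise: parse access-log lines and count
# 5xx responses per status code. Exercises string handling, dicts,
# and (importantly for SRE work) tolerance of malformed input.
from collections import Counter

def count_5xx(lines):
    """Return a Counter of 5xx status codes found in access-log lines."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        # In common log format the status code is the second-to-last field:
        #   ... "GET /path HTTP/1.1" 502 1234
        if len(parts) < 2:
            continue  # malformed line: skip rather than crash
        status = parts[-2]
        if status.isdigit() and 500 <= int(status) < 600:
            counts[status] += 1
    return counts

logs = [
    '10.0.0.1 - - [01/Jan/2025] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2025] "GET /api HTTP/1.1" 502 0',
    '10.0.0.3 - - [01/Jan/2025] "GET /api HTTP/1.1" 502 0',
    'garbage',
    '10.0.0.4 - - [01/Jan/2025] "POST /x HTTP/1.1" 503 17',
]
print(count_5xx(logs))  # expect 502 twice, 503 once
```

A follow-up discussion of how they would run this against a real log file with `tail -f` or `journalctl` also probes the Linux-administration bullet.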
https://redd.it/1iwjd7s
@r_devops
Looking for a DevSecOps Role - Remote
Hey folks! I'm looking for a DevSecOps role where I can leverage my skills in automation, security, CI/CD, and cloud infrastructure. Experienced in AWS, Kubernetes, Docker, Terraform, and security best practices. Also, have a strong background in SecOps, DevOps, and FinOps.
Open to remote opportunities! Feel free to connect or drop any leads. Cheers!
https://redd.it/1iwm5kh
@r_devops
Simplifying Infrastructure-as-Code with Our SaaS Solution
Imagine deploying powerful cloud infrastructure, like Google Cloud Storage or a full virtual machine, without ever needing to write a single line of code or wrestle with complex tools. Our Software-as-a-Service (SaaS) application takes the headache out of Infrastructure-as-Code (IaC) and puts it into the hands of anyone, regardless of experience. Whether you're a small business owner, a startup founder, or a developer looking to save time, we make Google Cloud Platform (GCP) deployments effortless, secure, and scalable.
What We Offer
Our SaaS is built for simplicity and power:
No Expertise Needed: You don’t need to know Terraform, IaC, or even how GCP fully works. Just connect your GCP project, pick a service—like Google Cloud Storage—and hit "Deploy." We handle the rest.
Ready-Made Building Blocks: We maintain a library of pre-built Terraform modules (think of them as blueprints for cloud services) in our own GitHub repository. These are battle-tested and ready to go.
Personalized Deployment: Your infrastructure lives in your GCP project, not ours. We use your authorized credentials to set everything up exactly where you want it.
Future-Proof Growth: Starting with services like Google Cloud Storage, we’re designed to easily add more GCP offerings as your needs evolve.
# How It Works: The Big Picture
Here’s what happens behind the scenes when you use our SaaS:
1. You Connect: Through a clean, intuitive interface, you link your GCP project to our app.
2. You Choose: Pick a service from our list, say, a secure storage bucket for your files.
3. We Deploy: Our system fetches the right Terraform module from our GitHub repo, customizes it for your project, and deploys it to GCP using your secure credentials. Done!
You get enterprise-grade infrastructure without the complexity.
# The Tech That Powers It
Frontend: This is where you log in, connect your GCP account, and make selections.
Backend: It securely handles your authentication, fetches the Terraform modules, and executes the deployment process.
Terraform Magic: We store our predefined Terraform modules in a GitHub repository (saas-infra-modules). These are reusable scripts that define how services like Google Cloud Storage should be built in GCP. When you deploy, we tailor and apply them to your project.
Scalability: Our architecture is modular. Adding support for new GCP services—like Compute Engine or BigQuery—is as simple as dropping new Terraform modules into our repo.
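A pre-built module of the kind described above might look like this minimal Terraform sketch (variable and resource names are illustrative, not the actual module contents):

```hcl
# Illustrative sketch of a reusable Cloud Storage module.
variable "project_id" { type = string }
variable "bucket_name" { type = string }
variable "location" {
  type    = string
  default = "US"
}

resource "google_storage_bucket" "this" {
  project                     = var.project_id
  name                        = var.bucket_name
  location                    = var.location
  uniform_bucket_level_access = true

  versioning {
    enabled = true # keep prior object versions recoverable
  }
}
```

At deploy time, the backend would fill in the variables from the user's selections and run the plan/apply against their project.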
# Authentication: How We Keep It Secure and Simple
Let’s talk about how we connect to your GCP project—because security and trust are non-negotiable. We use a standard called OAuth 2.0, the same technology you’ve likely used to log into apps with your Google account. Here’s how it works and why it’s safe:
1. Your Permission: When you connect your GCP project, our app redirects you to a Google login page. You sign in with your Google account—the one tied to your GCP project—and grant us permission to manage resources on your behalf. This happens in a secure, Google-controlled environment, not ours.
2. Limited Access: Google generates an OAuth token (a kind of digital key) that we use to act only within your project and only for the tasks you approve—like deploying a storage bucket. This token has an expiration date and can be revoked by you at any time through your Google account settings.
3. No Stored Secrets: We don’t ask for your GCP passwords or private keys. The OAuth token is temporary and encrypted, ensuring your credentials stay yours alone.
4. Our Side: To fetch our Terraform modules from GitHub, we use a Personal Access Token (PAT)—but that’s our key, not yours. It’s locked down to read-only access for our repo, keeping everything compartmentalized.
Think of it like giving a trusted contractor a keycard to renovate one specific room in your house. They can’t wander into other rooms, and you can take the keycard back whenever you want. That’s how we
authenticate and protect your project.
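For the technically curious, step 1 of that flow is a standard OAuth 2.0 authorization-code redirect. A minimal sketch of building the Google consent URL follows; the client ID and redirect URI are placeholders, and the exact scope your approval covers may differ from the broad `cloud-platform` scope shown here.

```python
from urllib.parse import urlencode

# Google's OAuth 2.0 authorization endpoint.
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"

params = {
    "client_id": "YOUR_CLIENT_ID.apps.googleusercontent.com",  # placeholder
    "redirect_uri": "https://example.com/oauth/callback",      # placeholder
    "response_type": "code",   # authorization-code flow
    "scope": "https://www.googleapis.com/auth/cloud-platform",
    "access_type": "offline",  # also request a refresh token
    "prompt": "consent",
}

# The user is redirected here, signs in with Google, and grants access;
# Google then redirects back with a short-lived authorization code.
consent_url = f"{AUTH_ENDPOINT}?{urlencode(params)}"
print(consent_url)
```

The backend then exchanges the returned code at Google's token endpoint for the short-lived access token described above.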
# Why This Matters to You
Time Savings: Deploying infrastructure that might take hours or days (and a hired expert) now takes minutes.
Cost Efficiency: No need to hire IaC specialists or spend weeks learning Terraform. Our SaaS is your shortcut.
Control: Your infrastructure lives in your GCP account, under your billing and ownership—not some third-party sandbox.
Security: With Google’s OAuth and our transparent process, you’re protected at every step.
# The Vision
Today, it’s Google Cloud Storage. Tomorrow, it’s Compute Engine, Kubernetes, or whatever GCP service you need. Our SaaS grows with you, simplifying the cloud so you can focus on your business—not the tech.
Ready to deploy your first service? Let’s connect your GCP project and get started—no coding required.
If you found this service helpful, how much would you be willing to pay to use it?
If you’re interested in this service, please reach out to join our waitlist! When we launch, you’ll get one month of free usage.
https://redd.it/1iwne3x
@r_devops
Best practices on storing user-uploaded files in containerized environment
I’m working on a job board and have recently containerized our Next.js/Node.js application using Docker (deployed on AWS ECS). One big technical hurdle is handling user-uploaded files (resumes) in a containerized setup.
Currently I'm writing these files to the container's filesystem, which is definitely not ideal! What's a clean, simple approach to file storage that aligns with DevOps best practices? Specifically:
1. Persistent storage options: Which solutions work best for ephemeral containers? An NFS volume, EFS, or a cloud storage bucket (e.g., S3)?
2. Deployment pipeline integration: How do you usually handle storing or moving uploads during blue/green or rolling deployments?
3. Security considerations: Any recommended steps to ensure data integrity and secure transfer? (e.g., encryption in transit, SSE for S3, etc.)
Ty!
https://redd.it/1iwohig
@r_devops
Keycloak on EKS Failing to Mount AWS Secrets Manager Credentials
Hey folks,
I’m running Keycloak on an EKS (v1.27) cluster and having trouble mounting secrets from AWS Secrets Manager using the Secrets Store CSI Driver (v1.3.4). Both the Keycloak and PostgreSQL pods are stuck in a `CreateContainerConfigError` state with errors like:
```
Error: secret "keycloak-secrets" not found
csi-secrets-store-controller: file matching objectName [secret] not found in pod
```
Below are the relevant details of my setup:
# Environment
* **EKS version**: 1.27
* **Secrets Store CSI Driver**: 1.3.4
* **AWS Secrets Manager**: Verified the secrets exist
* **IAM Policies**: Node role and/or IRSA with `SecretsManagerReadWrite` policy
# SecretProviderClass
Here’s an excerpt (Terraform format) showing how I’m configuring my `SecretProviderClass`:
```hcl
resource "kubernetes_manifest" "keycloak_secret_provider" {
  manifest = {
    apiVersion = "secrets-store.csi.x-k8s.io/v1"
    kind       = "SecretProviderClass"
    metadata = {
      name      = "keycloak-secret-provider"
      namespace = "my-namespace"
    }
    spec = {
      provider = "aws"
      secretObjects = [{
        secretName = "keycloak-secrets"
        type       = "Opaque"
        data = [{
          key        = "postgres-password"
          objectName = "nonprod-secret-postgres_keycloak_auth"
        }]
      }]
    }
  }
}
```
# Pod/Deployment Snippet
Here’s a condensed example of how my Keycloak Deployment references the `SecretProviderClass`:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keycloak
  namespace: my-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: keycloak
  template:
    metadata:
      labels:
        app: keycloak
    spec:
      securityContext:
        fsGroup: 1000
      serviceAccountName: keycloak-sa  # (Has IRSA or node role with Secrets Manager perms)
      containers:
        - name: keycloak
          image: quay.io/keycloak/keycloak:21.1
          volumeMounts:
            - name: secrets-store
              mountPath: /mnt/secrets
              readOnly: true
          # other container configs ...
      volumes:
        - name: secrets-store
          csi:
            driver: secrets-store.csi.k8s.io
            readOnly: true
            volumeAttributes:
              secretProviderClass: keycloak-secret-provider
```
# What’s Happening
1. Pods fail to start with `CreateContainerConfigError`.
2. Logs/Events complain that `secret "keycloak-secrets" not found`.
3. `csi-secrets-store-controller` logs say `file matching objectName [secret] not found in pod`.
# Troubleshooting So Far
* **AWS Secrets Manager**: Confirmed the secret `nonprod-secret-postgres_keycloak_auth1` exists.
* **IAM Policies**: Verified the node role (or service account with IRSA) has `secretsmanager:GetSecretValue` and other necessary permissions.
* **Terraform**: No drift reported; everything else is applying cleanly.
* **Namespace Check**: Both the `SecretProviderClass` and Keycloak pods are in the same namespace (`my-namespace`).
* **Multiple Pod Restarts**: No change in error status.
# Potential Issues / Questions
1. **Permission Gaps?** Is there a hidden or additional permission needed for the node (or service account) beyond `SecretsManagerReadWrite`?
2. **Secret Sync vs. Ephemeral Mount?** Am I accidentally referencing a Kubernetes Secret (`keycloak-secrets`) that isn’t being created because I only set up ephemeral volume mounting?
* If I need a native K8s Secret, do I have to enable `syncSecret.enabled: true` in the SecretProviderClass?
3. **Name Mismatch?** Could there be a subtle naming or label mismatch in my code—`keycloak-secret-provider` vs. `keycloak_secrets` or a missing [`metadata.name`](https://metadata.name) or `namespace`?
4. **Volume Permissions?** Does `fsGroup: 1000` cause any issues with
how the CSI driver writes secret files?
# Additional Info
* **Logs**: I’ve checked the CSI driver logs in `kube-system` (or wherever it’s installed). They only say “file not found” which hints it can’t read or place the files in `/mnt/secrets`.
* **Secrets Manager Tests**: I can successfully `aws secretsmanager get-secret-value` from my workstation using the same IAM role to confirm the secret is accessible.
* **Terraform**: My `kubernetes_manifest` might need more explicit fields. But so far, I haven’t spotted an obvious misconfiguration.
# Key Things I’d Love Feedback On
* Has anyone run into this “file matching objectName not found” error with Secrets Store CSI on EKS?
* Is there a detail or annotation required to mount AWS secrets as ephemeral files under `/mnt/secrets`?
* Am I missing a step in the process of syncing the AWS Secret to a native K8s Secret if that’s what my app is expecting?
Any insights, especially from folks who have Keycloak + AWS Secrets Manager working in EKS, would be hugely appreciated. Thank you! I feel like I am between a rock and a hard place and have been going in circles with this.
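For comparison, community examples of a working AWS SecretProviderClass typically include a `parameters.objects` list, which tells the AWS provider what to fetch and mount, alongside `secretObjects`, which only controls syncing to a native K8s Secret and fires only after a pod successfully mounts the volume. A sketch in plain YAML, with the poster's names substituted in:

```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: keycloak-secret-provider
  namespace: my-namespace
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "nonprod-secret-postgres_keycloak_auth"
        objectType: "secretsmanager"
  secretObjects:  # synced to a K8s Secret only after the volume mounts
    - secretName: keycloak-secrets
      type: Opaque
      data:
        - key: postgres-password
          objectName: "nonprod-secret-postgres_keycloak_auth"
```

If the Terraform excerpt above reflects the full spec, the absence of `parameters.objects` would be consistent with the "file matching objectName not found" error, though that is a guess from the posted snippet alone.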
https://redd.it/1iwnxmj
@r_devops
US cloud providers and Europe
Hi!
So I live in Europe, and we all know about current events in the US. A lot of companies are saying they should leave US cloud providers.
Many point to the GDPR (the EU's personal-data protection regulation) and to the fact that the US government can access data stored on US providers' servers at will (even when hosted in the EU).
What do you think about this? Does Europe need to worry?
https://redd.it/1iwqs76
@r_devops
Looking for a Devops/Data Engineer Job
Individual with 1 year 9 months of industry experience at an MNC. Looking for a job where I can learn and grow more.
https://redd.it/1iwvf6r
@r_devops
Best practices for managing schema updates during deployments (both rollout and rollback)
Hello there,
while walking my devops learning path, I started wondering about the industry best practices for the following use case:
1. the app container gets updated from v1 to v2
2. the database schema needs to be upgraded (new table, new columns)
3. (I suppose) the app has all the migration SQL commands to run on startup once it detects that the schema needs to change
4. the app is online, great
5. OUCH! Something went wrong. Let's roll back... two scenarios:
 1. data has been added to the DB in the meantime, and we need to save that data and merge it later
 2. let's ignore new data and just revert ASAP
What do you think about those two scenarios? Should the app be responsible for everything, or is it a separate process that isn't automatable?
Thanks for any explanation.
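To make step 3 concrete, here is a minimal sketch of a startup migration runner that tracks applied versions in a `schema_version` table (SQLite and the table/function names are illustrative; real tools like Flyway or Alembic do this more robustly, including the down-migrations needed for the rollback scenarios):

```python
import sqlite3

# Ordered migrations: version -> (upgrade SQL, rollback SQL).
MIGRATIONS = {
    1: ("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
        "DROP TABLE users"),
    2: ("ALTER TABLE users ADD COLUMN email TEXT",
        # DROP COLUMN requires SQLite >= 3.35; illustrative only
        "ALTER TABLE users DROP COLUMN email"),
}

def current_version(db):
    """Read the highest applied migration version (0 if fresh)."""
    db.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = db.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0

def migrate(db, target):
    """Apply upgrades (or rollbacks) until the schema matches `target`."""
    version = current_version(db)
    while version < target:          # roll forward on startup
        version += 1
        db.execute(MIGRATIONS[version][0])
        db.execute("INSERT INTO schema_version VALUES (?)", (version,))
    while version > target:          # roll back (scenario 2: discard new data)
        db.execute(MIGRATIONS[version][1])
        db.execute("DELETE FROM schema_version WHERE version = ?", (version,))
        version -= 1
    db.commit()

db = sqlite3.connect(":memory:")
migrate(db, 2)
print(current_version(db))  # 2
```

Scenario 1 (preserving data written under the new schema) is exactly why many teams prefer expand/contract migrations, where v2's schema changes stay backward-compatible with v1 so a rollback needs no destructive down-migration at all.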
https://redd.it/1iwy2su
@r_devops
Help Shape the Future of Incident Management! Seeking Insights from Engineering Teams
Ever found yourself wishing your incident response process was less "pulling hair out" and more "smooth sailing"? Well, here’s your chance to help make that happen! We’ve put together a survey because we’re dying to know how you handle the chaos when everything hits the fan.
From alert avalanches to post-mortem ghost towns, tell us what ticks you off and what tools save your bacon. It’s short, sweet, and your chance to rant (constructively!) about the tools and trials of your trade.
👉 Dive into the survey here: Incident Response 2025 Survey
Spare us 10 minutes (it's a coffee break well spent!) and who knows? Your insights might just lead to fewer late-night incident calls and more time for actual life. Let’s face it, we could all use a bit more of that.
https://redd.it/1iwywq8
@r_devops
One-time payment vs. subscription 🔥 what actually makes more money?
I built a habit-tracking app and launched it six months ago. Initially, I made it a one-time purchase for $9.99. Sales were okay, but nothing crazy. Recently, I switched to a $3.99/month subscription model, and suddenly my revenue is way higher... even with fewer purchases.
But now I’m getting tons of complaints from users who bought it before and feel “cheated.” Some are leaving 1-star reviews, and I feel like I burned my early adopters.
Did I screw up? Should I have offered lifetime access at a higher price? If you’ve switched models before, what worked best for you?
https://redd.it/1iwyszm
@r_devops
Why pay $150 per parallel e2e test, am I missing something?
Sharding Playwright across a few runners isn't particularly tricky. So, I'm confused how Sauce Labs and BrowserStack can charge $150 per parallel test in their virtual clouds. That's not even on real devices.
Is there something I'm missing that makes this appealing? Maybe it's only relevant for bigger test suites for reasons I haven't encountered yet.
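For context on the "not particularly tricky" claim, sharding Playwright across runners in GitHub Actions is roughly this (the workflow is a generic sketch, not a specific project's config):

```yaml
jobs:
  e2e:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]   # four parallel runners
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx playwright install --with-deps
      # Playwright's built-in sharding splits the suite across runners.
      - run: npx playwright test --shard=${{ matrix.shard }}/4
```

The hosted vendors' pitch is usually the browser/OS/device matrix and maintenance they provide, not the sharding itself.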
https://redd.it/1ix065e
@r_devops
Just tried a new profiler: what would you optimize first?
I was looking for better ways to debug performance bottlenecks and came across a new profiling tool that just dropped on GitHub. Decided to test it out on one of our services, and the results were... eye-opening.
The flame graph it generated (screenshot attached) revealed:
- A DB operation consuming way more resources than expected... we thought it was optimized, but apparently not.
- Some unexpected runtime garbage collection overhead that wasn't on our radar at all.
For those who’ve worked with flame graphs before, where would you start optimizing? Do I tackle the DB queries first or look at memory management?
Screenshot is attached here: https://drive.google.com/file/d/1QZJHtEyRxDr2LfIW8VIDVD6sZwokCneo/view?usp=sharing
https://redd.it/1ix0wfc
@r_devops
GitHub Actions, Pulumi GCP, Artifact Registry and Docker - Cannot perform an interactive login from a non TTY device
Hi everyone! [I'm cross-posting](https://stackoverflow.com/questions/79463461/github-actions-pulumi-gcp-artifact-registry-and-docker-cannot-perform-an-int) from Stack Overflow.
I'm using Pulumi in GitHub Actions to deploy to GCP's Artifact Registry with Workload Identity Federation. When it reaches Pulumi's code to push to Artifact Registry, I receive:
```
docker:image:Image temporal-worker-dev {"Client":{"Platform":{"Name":"Docker Engine - Community"},"Version":"26.1.3","ApiVersion":"1.45","DefaultAPIVersion":"1.45","GitCommit":"b72abbb","GoVersion":"go1.21.10","Os":"linux","Arch":"amd64","BuildTime":"Thu May 16 08:33:35 2024","Context":"default"},"Server":{"Platform":{"Name":"Docker Engine - Community"},"Components":[{"Name":"Engine","Version":"26.1.3","Details":{"ApiVersion":"1.45","Arch":"amd64","BuildTime":"Thu May 16 08:33:35 2024","Experimental":"false","GitCommit":"8e96db1","GoVersion":"go1.21.10","KernelVersion":"6.8.0-1021-azure","MinAPIVersion":"1.24","Os":"linux"}},{"Name":"containerd","Version":"1.7.25","Details":{"GitCommit":"bcc810d6b9066471b0b6fa75f557a15a1cbf31bb"}},{"Name":"runc","Version":"1.2.4","Details":{"GitCommit":"v1.2.4-0-g6c52b3f"}},{"Name":"docker-init","Version":"0.19.0","Details":{"GitCommit":"de40ad0"}}],"Version":"26.1.3","ApiVersion":"1.45","MinAPIVersion":"1.24","GitCommit":"8e96db1","GoVersion":"go1.21.10","Os":"linux","A
docker:image:Image temporal-worker-dev error: Error: Cannot perform an interactive login from a non TTY device
docker:image:Image temporal-worker-dev docker login failed
docker:image:Image remix-app-dev error: Error: Cannot perform an interactive login from a non TTY device
docker:image:Image remix-app-dev docker login failed
pulumi:pulumi:Stack alertdown-infra-dev running error: an unhandled error occurred: program failed:
docker:image:Image remix-app-dev **failed** 1 error
docker:image:Image temporal-worker-dev **failed** 1 error
pulumi:pulumi:Stack alertdown-infra-dev **failed** 1 error
Diagnostics:
docker:image:Image (remix-app-dev):
error: Error: Cannot perform an interactive login from a non TTY device
docker:image:Image (temporal-worker-dev):
error: Error: Cannot perform an interactive login from a non TTY device
pulumi:pulumi:Stack (alertdown-infra-dev):
error: an unhandled error occurred: program failed:
waiting for RPCs: docker login failed with error: exit status 1
```
I have two docker containers, and this is my yaml:
```
name: Deploy to Staging
on:
push:
branches:
- main
permissions:
actions: read
contents: read
id-token: write
jobs:
ci:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: oven-sh/setup-bun@v2
- uses: pnpm/action-setup@v4
with:
version: 9
- uses: actions/setup-node@v4
with:
node-version: 22
cache: 'pnpm'
- name: Install dependencies
run: pnpm install --frozen-lockfile
- name: Build affected apps
run: pnpm exec nx affected -t build
deploy:
runs-on: ubuntu-latest
environment: staging
needs: [ci]
steps:
- uses: actions/checkout@v4
- name: Create .env file
run: |
cat << EOF > libs/infrastructure/src/pulumi/.env
PULUMI_MAIN_SERVICE_ACCOUNT_STAGING="${{ secrets.PULUMI_MAIN_SERVICE_ACCOUNT_STAGING }}"
PULUMI_WORKLOAD_IDENTITY_PROVIDER_ID_STAGING="${{ secrets.PULUMI_WORKLOAD_IDENTITY_PROVIDER_ID_STAGING }}"
PULUMI_DOPPLER_REMIX_PROJECT="remix-app"
PULUMI_DOPPLER_REMIX_STAGING_TOKEN="${{ secrets.PULUMI_DOPPLER_REMIX_STAGING_TOKEN }}"
PULUMI_DOPPLER_REMIX_STAGING_BRANCH_NAME="stg"
PULUMI_DOPPLER_TEMPORAL_PROJECT="temporal-worker"
PULUMI_DOPPLER_TEMPORAL_STAGING_TOKEN="${{ secrets.PULUMI_DOPPLER_TEMPORAL_STAGING_TOKEN }}"
PULUMI_DOPPLER_TEMPORAL_STAGING_BRANCH_NAME="stg"
PULUMI_DOPPLER_CLOUD_RUN_REMIX_STAGING_TOKEN="${{ secrets.PULUMI_DOPPLER_CLOUD_RUN_REMIX_STAGING_TOKEN }}"
PULUMI_DOPPLER_CLOUD_RUN_TEMPORAL_STAGING_TOKEN="${{ secrets.PULUMI_DOPPLER_CLOUD_RUN_TEMPORAL_STAGING_TOKEN }}"
EOF
- name: Configure Workload Identity Federation
id: auth
uses: google-github-actions/auth@v2
with:
workload_identity_provider: ${{ secrets.GCP_STAGING_WORKLOAD_IDENTITY_PROVIDER_ID }}
project_id: ${{ secrets.GCP_STAGING_PROJECT_ID }}
service_account: [email protected]
token_format: 'access_token'
- name: Set up Cloud SDK
uses: google-github-actions/setup-gcloud@v2
- name: Configure Docker for Artifact Registry
run: |
gcloud auth configure-docker us-east1-docker.pkg.dev
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to Artifact Registry
uses: docker/login-action@v3
with:
registry: us-east1-docker.pkg.dev
username: oauth2accesstoken
password: ${{ steps.auth.outputs.access_token }}
- name: Run Pulumi
uses: pulumi/actions@v6
with:
work-dir: 'libs/infrastructure/src/pulumi'
command: 'up'
stack-name: 'alertdown/alertdown-infra/dev'
comment-on-pr: true
env:
PULUMI_ACCESS_TOKEN: ${{ secrets.PULUMI_ACCESS_TOKEN }}
```
I've verified that my service account has the right permissions, and that the `google-github-actions/auth@v2` works correctly.
Any ideas? I don't know what else to try.
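One approach that often resolves this class of error: pass the registry credentials to Pulumi's `docker.Image` resource explicitly, so the provider never shells out to an interactive `docker login` (which fails on non-TTY CI runners). A hedged TypeScript sketch — the resource names, image paths, and the env var carrying the token are assumptions, not the poster's actual code:

```typescript
import * as docker from "@pulumi/docker";
import * as pulumi from "@pulumi/pulumi";

// Token produced by google-github-actions/auth (steps.auth.outputs.access_token);
// the env var name here is hypothetical — wire it through however your job does.
const accessToken = pulumi.secret(process.env.GCP_ACCESS_TOKEN ?? "");

const remixImage = new docker.Image("remix-app-dev", {
    imageName: "us-east1-docker.pkg.dev/my-project/my-repo/remix-app:latest",
    build: { context: "./apps/remix-app" },
    // Explicit registry credentials: the provider authenticates the push
    // itself instead of relying on an ambient, interactive docker login.
    registry: {
        server: "us-east1-docker.pkg.dev",
        username: "oauth2accesstoken",
        password: accessToken,
    },
});
```

With credentials supplied on the resource, the `docker/login-action` step becomes redundant for Pulumi's push, though it can stay for other steps that need the registry.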
https://redd.it/1ix1bf8
@r_devops
Inexpensive managed code repos
Hey all,
I'm the CIO of a manufacturing firm. A couple of our engineers have asked me to spin up code repo infrastructure for storing some code and config files. Nothing serious, and they don't publish any public-facing apps. I have no intention of spending an inordinate amount of money.
That said, I want these repos owned by the organization while the engineers retain full control over creating and managing them. There should be very little, if any, IT support cost; it just needs to be owned by IT so that a terminated employee's code and configs can't be lost to time.
We use exclusively Microsoft services, so I was thinking potentially Azure Repos or GitHub. I have an absolute requirement for Entra SSO, but otherwise this will be a simple Git server. That said, what solution would be best for us?
Sorry if I seem exceedingly unfamiliar - I am! I don't typically work with firms that have any software dev capabilities.
https://redd.it/1ix490z
@r_devops
Is Product Hunt rigged? Some products start with 50 votes, is that normal?
Hey everyone,
I posted my product today on Product Hunt and I've been working hard to create hype around it on X, LinkedIn, and Reddit. However, looking at the graph, I noticed something odd: some products seem to get 50 votes or more right from the start, while mine (and others) had to build up votes over time. It looks like some products are boosting votes or starting with 50 out of nowhere.
Is this normal? How do some products get such a big initial push while others, like mine, don’t get the same? Any thoughts on this?
Thanks for your input!
https://drive.google.com/file/d/1QRt8PnAfN8lWeLD4S6v3TKbIyDwuL7hv/view?usp=sharing
this is the graph of the vote
https://redd.it/1ix5xgc
@r_devops
Do companies hire fresher DevOps?
Do companies hire newbies with no job experience in DevOps who have built some impressive projects revolving around DevOps?
https://redd.it/1ix5a7g
@r_devops
Anyone take the CKA AFTER the 18th Feb changes?
Hello everyone, my exam is scheduled for 2nd March. Can anyone who has taken the new exam share their experience?
Thanks
https://redd.it/1ix17lw
@r_devops
Ente: Self Host the Google Photos Alternative and Own Your Privacy
Hey folks,
After seeing too many half-baked self-hosting guides that leave out crucial production details, I decided to write a comprehensive guide on deploying Ente (an end-to-end encrypted Google Photos alternative) using Kubernetes.
What's covered:
- Full K8s deployment manifests with Kustomize
- Automated Docker image builds with GitHub Actions
- Frontend deployment to GitHub Pages
- Proper secrets management with External Secrets Operator
- Production-ready PostgreSQL setup using CloudNative PG operator
- Complete IaC using OpenTofu (Terraform)
No fluff, no basic tutorials - just practical, production-ready code that you can adapt for your setup.
All configurations are available in the post, and I've included detailed explanations for the important bits.
https://developer-friendly.blog/blog/2025/02/24/ente-self-host-the-google-photos-alternative-and-own-your-privacy/
Happy to answer any questions or discuss alternative approaches!
https://redd.it/1ix6zo8
@r_devops