DataDog ECS Fargate - Run ECS checks from agent service instead of within task definition?
Title mostly says it all. Datadog recommends including its agent as a second container in every task definition to monitor tasks running in Fargate, which means doubling the container count across all our Fargate tasks.
Previously, on EC2, we ran the agent as a daemon container per instance, which kept the resource overhead down.
Has anyone come up with a creative workaround in Fargate to avoid running two containers per task for the Datadog agent-based monitoring approach? We can run the agent itself as a service, but it's unclear how it would then be able to poll metrics about other containers (it may be impossible, hence their requirement to run it per task definition).
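For context, the sidecar pattern Datadog recommends looks roughly like this in the task definition (a hedged sketch; the image reference, `DD_API_KEY`, and `ECS_FARGATE` variables should be checked against Datadog's current Fargate docs, and the app container is illustrative):

```json
{
  "containerDefinitions": [
    {
      "name": "my-app",
      "image": "my-app:latest"
    },
    {
      "name": "datadog-agent",
      "image": "public.ecr.aws/datadog/agent:latest",
      "environment": [
        { "name": "DD_API_KEY", "value": "<api-key>" },
        { "name": "ECS_FARGATE", "value": "true" }
      ]
    }
  ]
}
```

The agent sidecar reaches task metadata through the local ECS endpoint, which is per-task on Fargate; that locality is what a standalone agent service would lack.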
https://redd.it/ywxc7g
@r_devops
DevSec for Scale Podcast Ep 6: Policy-as-Code
I really enjoyed this discussion from Akeyless on Policy-as-code and I figured that I could share it here. Podcast link.
Transcript for those who prefer written formats:
TRANSCRIPT
Jeremy Hess: Welcome to the DevSec for Scale Podcast, the show that makes security a first-class citizen for growing companies. My name is Jeremy Hess, Head of Developer Relations at Akeyless, the Secrets Management SaaS platform. This interview podcast brings security experts and practitioners together to offer practical and actionable ways for small and growing companies to implement security best practices using shift left principles, without interrupting developer lifecycles.
Welcome back everybody. My name is Jeremy Hess with Akeyless, and today my guest is Eran Bibi. He’s co-founder and Chief Product Officer at a fantastic startup called Firefly. They deal with many different things and we’re going to talk about that a little bit soon. But Eran, before we get into you and the company and all that, let’s talk a little bit about policy-as-code, which is what this episode is going to be about. So, policy-as-code, it’s a bit of a newer term in the industry and it has these remnants and these ideas of old school policy and things like that. So, can you give us a little understanding about what policy-as-code is all about?
Eran Bibi: Yes of course. So, policy-as-code is one of those trends of doing everything as code. And I think the main advantage of using this new methodology is the power for developers to create policies for themselves, and to use the community to extend the policy surface on their organization. So, it’s a really cool methodology I would say.
Jeremy Hess: Got it. Well, what was it like in terms of looking at policy as an old school term? What was the difference there between what it was and how it’s changed?
Eran Bibi: So, the idea is basically about enforcement and prevention, so you would like to have more control over the stuff that you are deploying on your Kubernetes cluster or cloud workloads. And policy-as-code basically gives you the opportunity to create those gatings in the CI, and to make sure that the configuration that you have in place meets the policy that you decided to put on your organization. It can be stuff that’s related to security. This is the most common use case, but it’s also about having alignment and best practices, and even making sure stuff is built for scale.
Jeremy Hess: Got it, really cool. Alright, so Eran, why don’t you give us a little bit about you, your background, and more about what Firefly is doing today?
Eran Bibi: So, my name is Eran Bibi. I was a DevOps engineer in my previous role. I’ve held a few DevOps positions in the past 10 years, and now I co-founded and lead the product at Firefly, a new startup that basically helps DevOps to get better control over their cloud.
Jeremy Hess: Got it. Alright. Well, can you give us a little bit more detail? What’s Firefly doing in that realm of the cloud and how is it helping customers?
Eran Bibi: Sure, so Firefly is basically a cloud asset management tool, and we scan the cloud and then give you visibility about all the stuff that you have in the cloud. One of the main metrics we provide is what portion of the cloud is managed by infrastructure-as-code, and what is not. And when I say not, this is workloads you’re creating manually using a ClickOps kind of usage or a CLI tool. But in any case, you don’t have any infrastructure-as-code. So, Firefly gives you that visibility, and then gives you the automation to help you to increase that coverage of infrastructure-as-code. So, think about it as the tool that helps you achieve your goals of meeting best practices and industry standards.
Jeremy Hess: Got it. Awesome. So, getting a bit more into policy-as-code. What was the impetus for these changes for policy-as-code? What’s novel about the idea of policy-as-code specifically?
Eran Bibi: So, we have this trend of
shifting left, and giving more power to developers to do stuff early in the lifecycle of the software. So policy-as-code is basically the opposite of buying a very expensive kind of security system that gives new enforcement on the runtime. So, policy-as-code gives the power to the developers to use the community manifest and the community powers to put guardrails on the CI, and to make sure nothing is being provisioned into production. And I think Kubernetes has a lot to do with the trend. So, policy-as-code, even if I’m talking more specifically about frameworks that are popular in that domain like OPA, gives you a very simple syntax to create policies on the stuff that you can provision into the Kubernetes cluster.
Jeremy Hess: Yeah, exactly, and that’s where I wanted to take this, to understand a little bit more about OPA. So, first of all, why don’t you give us a little bit more detail about OPA? It’s a very large project and let’s hear a little bit more about that.
Eran Bibi: So, OPA is a framework built by a company called Styra, and I think it’s something like four or five years old, so it’s relatively new. It provides a very simple syntax for creating policies. The syntax is called Rego, and it basically checks against a JSON manifest. So, if you have a workload like in Kubernetes or Terraform for example, you can create, with very few lines of code in a human readable kind of syntax, a gate for stuff.
So, the output of the OPA is basically allow or deny. So, you can create those rules and, let me give you some very concrete examples. You will not allow any workload without a liveness probe to be provisioned into your Kubernetes cluster. So, you basically can create the Rego syntax that works against the manifest of a Kubernetes deployment, and just makes sure you have that block of liveness in your manifest. And if you don’t have it, it will basically provide a deny kind of output and then you can gate it early in the stage of deployment, and make sure you don’t have any deployment running without a liveness probe.
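The liveness-probe gate Eran describes can be sketched in Rego (a hedged sketch; the package name, rule name, and input shape are illustrative and would need to match however the policy engine feeds in the Deployment manifest):

```rego
package kubernetes.admission

# Deny any Deployment that has a container without a livenessProbe
deny[msg] {
    input.kind == "Deployment"
    container := input.spec.template.spec.containers[_]
    not container.livenessProbe
    msg := sprintf("container %q is missing a livenessProbe", [container.name])
}
```

An empty `deny` set means the manifest passes the gate; any generated message is a violation that can fail the CI step.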
Jeremy Hess: Got it. Okay, so how would a startup or an early-stage company be able to implement a policy-as-code based on the OPA framework?
Eran Bibi: So, it’s very easy, because policy-as-code is a code, and it can be shared in GitHub. You can find tons of projects online that already have built-in kinds of policies that you can use even without writing a line of code. Just cloning those repositories and implementing them on your cluster. So, in Kubernetes’ case, there is a project called Gatekeeper that checks your policy against the running workload and makes sure you don’t have any violation on your runtime. There is also some tooling that you can put even before that in your CI that just scans your manifest. Like a Helm chart or a YAML manifest, and makes sure you don’t have anything that is misconfigured. But I think the real power of policy-as-code is the community.
The fact that you, without writing any line of code, have tons of policies out there and even like a compliance package. You can have PCI compliance, HIPAA compliance, best practices, well architected, all of that stuff already available for any developers to put on their CI to make sure nothing that is misconfigured will be provisioned to the cloud.
Jeremy Hess: Right, well that’s the power of community. We always love our community members or community friends, and developers helping developers is always a fantastic melding together of minds, so it’s really great. Let’s ask a little more about policy-as-code specifically in terms of the challenges, things that are a little bit more difficult, let’s say. What are the challenges that you see or what’s the biggest challenge, let’s say, of implementing policy-as-code at an early-stage company?
Eran Bibi: So, putting gates, whether it is like denying stuff from being provisioned to the cloud, can be something that can slow you down. So, if an early-stage startup is all about delivering fast value to the customers without having everything perfect, if the DevOps engineer or the
one that is in charge of putting those policies is too strict, then it might impact the delivery and the velocity of the development too. So, I think you need to use that wisely and make sure that you maintain velocity, but use policies to prevent something that can be a disaster to the company. So, with great power comes great responsibility.
Jeremy Hess: Yeah.
Eran Bibi: It’s great but you need to make sure again in early startups, to move fast, and then you have the time to fix and align other non-perfect kinds of workloads that you have on the cloud.
Jeremy Hess: Got it. Well, you come from a bit of a larger company that was a startup not too long ago, Aqua. So, you also have that security background. What would you say are some things that you saw at a larger company you were able to implement, but now at a startup it’s a little bit more difficult to implement those specific ideas and tools?
Eran Bibi: So, my experience with Aqua was very similar to what we have right now in Firefly. Because I joined Aqua Security at a very early stage when we were like 20 employees. And again, it was the same dilemma because I was the one in charge of the CI/CD pipeline. I put the tooling in place to make sure the right gating was in place. So, I think if I’m looking specifically at security, the policy was very strict regarding high or critical vulnerabilities, that the CI would basically stop the build if there were such findings, but for less than that we were only about alerting, and not enforcing and stopping the build. Because again, startups need to make sure there is a high velocity of workloads, and developers don’t have to deal all the time with why the CI is stopping the build. Again, as I mentioned, it’s really in terms of finding the right balance.
Jeremy Hess: Absolutely. So, another question I like to ask all my guests when we’re talking about, especially security is, at Firefly do you use your own product to check your own product?
Eran Bibi: Yes of course, we call it drinking our own champagne.
Jeremy Hess: Yeah. Instead of dog food, right? Instead of dog food.
Eran Bibi: Yes, dog food, we found a prettier term for that. So, the dog food protocol in Firefly is about making sure all of our AWS accounts and Kubernetes clusters are integrated with Firefly. And we use that to make sure the cloud is aligned with best practices and everything is codified, meaning described as code. It’s a great kind of experience. We have our own DevOps engineer who uses Firefly like any other customer. And if he finds a defect he opens a defect for the development and it’s a great methodology.
Jeremy Hess: Alright, fantastic. One more question I’d like to also ask is just for our listeners, because of course we’re trying to talk about shift left principles and how to make sure that developers are still able to continue doing their work, with as little stopping and changing as possible. But also make sure that they’re implementing security best practices. What are, let’s say one or two tips you might give generally about how developers can implement security? What are some, in your head, in your eyes, basics that a developer could implement without worrying too much about overhead?
Eran Bibi: So, if shift left was all about putting stuff in the CI, I think right now the trend is even more left than that. So, every plugin that a developer can put in the IDE, whether it’s VS Code or other kind of tooling that gives them the visibility about the status of this code in all of those perspectives. Whether it’s a configuration or even a security on a specific code block. There is great tooling out there that integrates with the IDE. So, this is stuff that I recommend. And I think it’s become a standard kind of approach of having those plugins in the IDE.
Jeremy Hess: Great, that’s a great one. Yeah, I think that developers definitely have that idea. And on top of that, I wanted to ask a little extra here which is not specifically related to this necessarily. I’m going to be doing a meetup with your team on the newest product
open-source project that was put out into the world, of ValidIaC. So, maybe as a little preview we can give our listeners, what’s ValidIaC all about?
Eran Bibi: Of course, so we basically created that SaaS offering for developers to have access directly from their browsers to make sure the infrastructure-as-code is aligned with a few verticals. So, we have a security scanner, a linter, and also a cost projection. We use a few of the most popular open-source tools and just combine them into one very nice SaaS-based UI. And of course, it’s open-source, so I encourage everybody to support us with stars and even contribute if you think additional tools can be added to that portal. But I will keep the demo to the meetup.
Jeremy Hess: Yeah, for later absolutely. Fantastic. Alright, so that was a fantastic interview. Eran, I really appreciate your time. Thank you so much. Good luck with Firefly, and hopefully we get to have you on again when Firefly hits their next rounds of funding and beyond. We wish you all the best. So, have a great day and thanks so much.
Eran Bibi: Thank you Jeremy. Thank you for having me.
https://redd.it/yx0ivu
@r_devops
What is a good way to document CI/CD pipelines?
I’m building some pipelines for various apps, including both CI and CD. I want to start by illustrating to the team the different tools and steps within these pipelines. Are there any free, DevOps-oriented tools for generating nice, illustrative docs?
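One free option that fits this kind of question: Mermaid diagrams render directly in GitHub/GitLab markdown, so the pipeline documentation can live in the same repo as the pipeline code. A minimal sketch (stage names are illustrative):

```mermaid
flowchart LR
    commit[Git commit] --> build[Build]
    build --> tests[Unit tests]
    tests --> image[Publish container image]
    image --> staging[Deploy to staging]
    staging --> prod[Deploy to production]
```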
https://redd.it/yx4jv6
@r_devops
Feeling not so great about being a DevOps\Cloud Engineer
Hello fellow DevOps friends,
Lately I have been feeling very down and depressed about how I'm functioning in DevOps, cloud, infrastructure, etc.
I got into DevOps about 6(?) years ago when I moved to it from QE, so I never had a strong programming background. I do love learning about different tools and technologies and finding effective ways to implement them, but I really feel like I'm lagging behind.
I've been working in Azure consistently for 7 months and still barely understand it even at a fundamental level. Aside from this, I've been experiencing major brain fog, not able to focus at work, etc. Not sure if that's stress of learning so many new tools, or how I'm feeling (or maybe it's all the Excel sheets... 🤢), but it's impacting how I'm performing.
I just wanna know if someone else in the DevOps world has experienced this, and have you/how did you overcome it? I'm feeling so scrambled 😫
https://redd.it/yx2pkx
@r_devops
DevOps infrastructure from scratch
I'm a long time Network/SysAdmin who wants to move into DevOps and SRE type roles.
I want to set up an environment from scratch implementing best practices, and need a little guidance with the foundational building blocks and where to start. I want to do this on the cheap using FOSS and low cost services (but only when necessary). That being said, I don't want to close the door on paid services, especially Azure, as our current application stack is Windows based (and could migrate to Azure in the future, but hopefully not).
* I have a somewhat beefy server (dual Xeon, 192GB RAM, redundant storage) on premises that is a blank slate (but I'd like to use Proxmox as it allows hosting any OS). We have gigabit internet.
* I also have a free tier Oracle Cloud (Ampere aka ARM) account that seems pretty decent.
* Finally, I could add a cheap VPS (think LowEndBox) if there is any benefit. I also have more hardware on premises I can use.
I'd like to start my build with something like Terraform, but Ansible, Puppet, etc. are options. This kind of feels like pulling yourself up by your own bootstraps. I'm trying to avoid installing directly on my workstation, and I'm unclear on where to start.
Eventually I'd progress into Docker (or Podman?), Kubernetes, host my own code repo, monitoring, etc. I guess my confusion is also about the order of operations so that I'm not having to undo/redo things.
Any help or advice is appreciated. Many thanks.
https://redd.it/yx2vci
@r_devops
Harness CI launches the fastest CI leveraging Drone
Interesting announcement from Harness on being the fastest CI on the planet. https://harness.io/blog/fastest-ci-tool There's a repo to test the results. Has anyone tried it?
https://redd.it/yx79jk
@r_devops
Argo Rollouts and Service Mesh: Automate and Control Canary Releases
A canary release, sometimes referred to as a phased rollout or an incremental rollout, is a technique for lowering the risk of releasing a new software version into production by gradually introducing the change to a small subset of users before making it available to everyone. Similar to a Blue-Green Deployment, you start by deploying the new version of your software to a subset of your infrastructure, to which no users are routed. When you are satisfied with the new version, you can begin directing a few select users to it. There are various strategies for selecting which users will see the new version: a simple strategy is to use a random sample; some companies release the new version to their internal users and employees before releasing it to the rest of the world; and a more sophisticated approach is to select users based on their profile and other demographics. As you gain more confidence in the new version, you can start releasing it to more servers in your infrastructure and routing more users to it.
This article will describe how to use Argo Rollouts and Service Mesh osm-edge for automated, controlled canary releases of applications.
https://blog.flomesh.io/argo-rollouts-and-service-mesh-automate-and-control-canary-releases-c71e5403eb2
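The gradual traffic shifting described above might look something like this as an Argo Rollouts canary strategy (an illustrative sketch with made-up names and weights, not taken from the article):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app               # hypothetical application name
spec:
  replicas: 5
  # selector/template omitted for brevity
  strategy:
    canary:
      steps:
        - setWeight: 10            # route 10% of traffic to the new version
        - pause: {duration: 10m}   # observe metrics before continuing
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100           # promote fully once confident
```

With a service mesh handling the routing, the weights control traffic split rather than just pod counts, which is what makes the fine-grained user selection the article discusses possible.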
https://redd.it/yxdvod
@r_devops
This experiment shows that Harness CI is faster than GitHub actions
The Drone founder Brad Rydzewski's new article is where the new data shows that Harness Continuous Integration builds up to four times faster than other leading CI tools, which validates the impact of three important new feature enhancements: Cache Intelligence, Test Intelligence (a technology exclusive to Harness CI), and Hosted Builds.
Here is the article: https://www.harness.io/blog/fastest-ci-tool
https://redd.it/yxgg0m
@r_devops
How to help train a newbie?
Hey all,
I just started my first senior role, the team is great and I love working with everyone. I suspect that one of my areas of focus will be training our Junior DevOps Engineer on our team! He's a nice kid who's basically straight out of college and has that "deer caught in the headlights" vibe lol. I don't think he knows much about DevOps and just accepted the job because why not (I was the same at that stage in my career, so I get it).
Any advice for training him up? I was given the sink or swim treatment but that doesn't always work for everyone. He's a bit shy and reserved so I think if I go that route he would just drown without asking for help....
Mostly I've been doing pair programming with him as I learn the system and shadow him but any more advice is always appreciated, I'm sure he'd thank you for any advice you give me as well!
https://redd.it/yxgwj1
@r_devops
Agile workflow with Jira and Git
Does anyone have a good resource to learn the best practices for a small development team (5-10 devs) working in an agile workflow? The process we use at my work is not very efficient but there are some challenges I can't figure out. I would love to hear about the workflows that are successful for you. I googled this but surprisingly I didn't find anything super detailed, just high level principles.
For context we work on web front-ends in React and back-ends (REST or GraphQL APIs) in Node. Database is on Planetscale. We use Jira and deploy on AWS but I would be open to any other tools/platforms.
Here's the process as I understand it, as well as my questions:
1. Developer creates a new branch to work on a feature/bugfix/etc (an item in the sprint)
2. When done, dev creates a pull request to merge their feature branch into the Staging branch
3. Someone (product manager in our case) tests the app against the acceptance criteria and if everything is OK they approve + merge the pull request into the Staging branch
1. For web front-ends (static sites like React, Vue, etc), all of these feature branches are automatically deployed using branch previews (like in Netlify, AWS Amplify, etc), so the PM can do their testing vs acceptance criteria in these temporary automatic deployments, no issue there. Database branching is done in Planetscale and that part works well too.
2. But how can/should this be done for web back-ends, or other systems that run on a server? For example a REST API running on Kubernetes or AWS ECS or AWS EC2 etc? Right now these need to be deployed manually, which is a huge pain. Even using terraform/cloudformation there are still manual steps required. Ideally it would be great if every branch was automatically deployed to some unique URL that we can use temporarily for testing.
3. How do we ensure the front-end is communicating with the correct instance of the back-end (.env management)? Same thing for pointing the back-end to the correct database branch URL. Right now this is done manually with environment variables.
4. Since the front-end and back-end are in separate Git repos, how do we ensure both of these branches and PRs for the same feature are in sync? How do we avoid having to approve/merge 2 separate PRs in 2 separate repos for the same feature / item in Jira? Is a monorepo the best approach? What if there are separate teams for front-end and back-end?
4. Once the PR is merged, the feature branch is deleted and the dev moves on to the next task in Jira and repeats the process
5. Once all the items we want to include in the next release are done and merged into staging, some more testing is done in staging and finally a release PR is created and merged into main and automatically deployed (CI/CD pipeline automatically runs and deploys the staging and main branches)
6. What do we do if we don't want to deploy every single change that was merged to staging? For example maybe the business decides to delay the release of a certain feature but it was already merged to staging. How can we avoid merging that into main? Do we have to implement feature flags for everything?
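On point 2 above (automatic per-branch deployments for back-ends), a common building block is deriving a stable, DNS-safe name from the branch, which CI then uses as a Kubernetes namespace and preview URL. A minimal sketch in Python; the domain `preview.example.com` and the naming scheme are assumptions, not an established convention:

```python
import re

def preview_name(branch: str, max_len: int = 40) -> str:
    """Derive a DNS-safe identifier (usable as a namespace or subdomain)
    from a Git branch name like 'feature/JIRA-123_new-login'."""
    slug = branch.lower()
    slug = re.sub(r"[^a-z0-9-]+", "-", slug)        # path separators, underscores...
    slug = re.sub(r"-{2,}", "-", slug).strip("-")   # collapse runs of dashes
    return slug[:max_len].rstrip("-")

def preview_url(branch: str, domain: str = "preview.example.com") -> str:
    """Predictable preview URL for a branch, e.g. for PM acceptance testing."""
    return f"https://{preview_name(branch)}.{domain}"
```

CI would then deploy the branch's image into that namespace (e.g. `helm upgrade --install`) and tear it down when the PR closes. The same slug, injected as an environment variable into the front-end preview build, also speaks to question 3: both previews for one branch resolve to each other by construction.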
Please forgive me if this isn't the right subreddit for this question, and I would greatly appreciate it if you could point me to a better place for this question.
https://redd.it/yxinb0
@r_devops
Best tips for preparing for technical DevOps interviews. Is grinding Leetcode needed/worth it at all?
Context: I am already a DevOps Engineer and currently looking for a new position. I had some previous experience as a Software Engineer doing Java development but took a break from development for a year as I didn't enjoy programming all day every day. Got a different position at a new company, more business focused, but opportunities and my skillset brought me through Release Engineering and then into DevOps. Coming into my current role, I knew enough about programming to get the position, but due to that year-long break my programming skills are a little rusty.
In my role I have been doing some light Groovy scripting. Maintaining our pipelines, adding new steps and functionality to a handful of them, but I don't feel like any of the work that I have been doing has been really exercising any HARD programming skills/concepts.
As I feel it is the most useful/practical in a DevOps role, and given my knowledge and background in OOP, I've been trying to learn Python from scratch (Bash comes next).
What types of problems/concepts should I be practicing when I am trying to study for the coding portion of technical DevOps interviews? Is grinding Leetcode problems and going through algorithm and data structure problems (stuff I would normally grind if I were going for a software engineer position) worth it, or might it be overkill for the questions I would get asked?
Any input helps! Thank you.
https://redd.it/yxntyg
@r_devops
Server starts dropping http connections after a certain amount of requests
Hello, I'm not sure if this is the right place to ask such a question but I'm trying to get help somewhere as I'm unable to get this resolved in any other places (tried stack overflow, plesk forums, numerous other forums).
I have two domains setup on our server - let's say usersite.com and api.usersite.com. usersite.com is powered by nuxt.js - a front-end framework which runs on node.js. It makes API calls to api.usersite.com, which is a Laravel application. Both of these projects are running inside docker containers. Usersite is using a reverse nginx proxy to the api site.
Now to the problem - when there is slightly higher traffic to usersite (200 users per minute), the API site starts to drop connections, immediately resulting in 504. Perhaps someone could guide me in the right direction of why this might be happening? I've noticed that the API website logs show that all requests come from the same IP (the server itself); that means that as requests are proxied they take the proxy server IP instead of the client IP. So perhaps a self-DDoS is happening, where nginx thinks one IP is flooding it with requests and starts dropping connections? What could be the possible solution for this?
What's weird is that it's not an uncommon practice to have back-end separate from front-end and for them to communicate through API with reverse proxy but I can't find any results regarding such issue that I have on Google...
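One thing worth checking, given the symptom that every request appears to come from the proxy's own IP: make sure the front-end's nginx forwards the client address, and that any rate limiting keys on the forwarded header rather than the socket peer. A sketch of the relevant nginx directives (the upstream name and location are illustrative; adapt to your config):

```nginx
location /api/ {
    proxy_pass http://api_upstream;          # your Laravel container
    proxy_set_header Host $host;
    # Pass the real client address through, so the API (and any per-IP
    # rate limiting) doesn't see the proxy as its only client:
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```

On the Laravel side you would also need to trust the proxy (the TrustProxies middleware) so `X-Forwarded-For` is honored. It is also worth checking `worker_connections` and upstream keepalive settings, since 504s under fairly modest load often point at connection exhaustion rather than deliberate rate limiting.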
https://redd.it/yxmmxr
@r_devops
Developer self-service portal for Kubernetes/Helm
We are working on a tool that allows **developers** to deploy their own services from a catalog, via a simple UI portal. DevOps engineers can create a catalog of deployable apps via templates. Each template can define custom user-inputs and can define one or more services(helm charts).
[https://github.com/JovianX/Service-Hub](https://github.com/JovianX/Service-Hub) (Please star ⭐ on GitHub if you think it's cool).
This is an alternative to what currently happens in many organizations, where DevOps create hacky solutions for developers to deploy on-demand services with Jenkins jobs, scaffolded Git repos with custom actions, and so on.
The tool offers a very simple way to create a Self-Service app deployment on Kubernetes with Helm. The tool creates a self-service UI, with custom user-inputs. The user-inputs can be used as Helm values to allow users to configure some parts of the application.
You can define [templates](https://github.com/JovianX/Service-Hub/blob/main/documentation/templates.md), which construct the catalog you expose to developers. An application template can compose multiple helm charts (for example, an app layer that needs a database, somewhat similar to Helmfile).
Here's a simple **Template** example for creating Redis-as-a-Service:
# Template reference and documentation at
# https://github.com/JovianX/Service-Hub/blob/main/documentation/templates.md
name: my-new-service
components:
  - name: redis
    type: helm_chart
    chart: bitnami/redis
    version: 17.0.7
    values:
      - db:
          username: {{ inputs.username }}
inputs:
  - name: username
    type: text
    label: 'User Name'
    default: 'John Connor'
    description: 'Choose a username'
The template creates this Self-Service experience [https://user-images.githubusercontent.com/2787296/198906162-5aaa83df-7a7b-4ec5-b1e0-3a6f455a010e.png](https://user-images.githubusercontent.com/2787296/198906162-5aaa83df-7a7b-4ec5-b1e0-3a6f455a010e.png)
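To make the `{{ inputs.username }}` placeholder concrete: conceptually, the portal substitutes user inputs into the values before installing the chart. A toy illustration of that substitution step in Python (this is not Service-Hub's actual implementation, just the idea):

```python
import re

def render_inputs(template_text: str, inputs: dict) -> str:
    """Replace {{ inputs.<name> }} placeholders with user-supplied values."""
    def substitute(match: re.Match) -> str:
        return str(inputs[match.group(1)])
    return re.sub(r"\{\{\s*inputs\.(\w+)\s*\}\}", substitute, template_text)

rendered = render_inputs("username: {{ inputs.username }}",
                         {"username": "John Connor"})
```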
We are gathering **feature requests**, and **user** **feedback**.
I would love to read thoughts and get extremely excited by GitHub **STARS**! ⭐
https://redd.it/yxrhxw
@r_devops
Aliasing of EKS endpoint domain
Hello peeps,
Would aliasing `https://<HASH>.gr7.<region>.eks.amazonaws.com` to a custom CNAME, such as <myClusterName>.<region>.domain, to have a predictable endpoint that in turn can be hardcoded in some places, be a bad practice? Any advice against or in favor of this?
Thank you for your input.
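One caveat worth flagging: the EKS API server's TLS certificate only lists the amazonaws.com names, so clients that verify TLS against your custom CNAME will fail the handshake. kubeconfig lets you connect via the alias while verifying against the real name; a sketch with illustrative values:

```yaml
clusters:
  - name: my-cluster
    cluster:
      server: https://mycluster.us-east-1.example.com   # your CNAME
      # Verify the certificate against the real endpoint name, not the
      # alias (supported in kubectl/client-go 1.17+):
      tls-server-name: ABCDEF123456.gr7.us-east-1.eks.amazonaws.com
      certificate-authority-data: <cluster CA, base64>
```

Other clients hardcoding the alias would need an equivalent SNI/verification override, which is part of why people often advise against this.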
https://redd.it/yxpeo0
@r_devops
Best options for SLA/SLO tracking outside of Datadog
We have very basic needs:
* Monitor uptime of our MongoDB Atlas cluster
* A few EC2 instances
* Need to ping a frontend React app
* Need to ping uptime for a GraphQL API endpoint
That’s about it
I’ve set this up with Datadog but am worried about the cost (not today, but in two years).
Are any other APMs going to be that much cheaper while still doing it all with one account?
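For needs this basic, it's worth noting that an HTTP uptime ping is a few lines of stdlib code, which some teams run from a small cron job or Lambda instead of (or alongside) a paid APM. A rough sketch in Python (the GraphQL URL in the comment is a placeholder):

```python
import urllib.request
import urllib.error

def check_endpoint(url: str, timeout: float = 5.0,
                   expect_status: int = 200) -> bool:
    """Return True if the URL answers with the expected HTTP status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == expect_status
    except urllib.error.HTTPError as err:   # non-2xx still carries a status
        return err.code == expect_status
    except (urllib.error.URLError, OSError):
        return False

# Example: check_endpoint("https://api.example.com/graphql?query={__typename}")
```

This doesn't replace dashboards or SLO math, but it covers "is it up" alerting for the React app and the GraphQL endpoint at near-zero cost.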
https://redd.it/yxo7t0
@r_devops
How do you track/help onboarding to on-call?
When it comes to something like interviewing, ramping someone to run interviews often involves a process of shadowing for a number of times and some level of feedback before you become officially 'ramped'.
When I've led teams before, as a team lead I've tracked which incidents people have been involved in and which services they've touched. But I never had a proper structure to the onboarding, probably because:
- Incident training often requires participating in real incidents, which can’t be scheduled in advance.
- When one does occur, responders want to focus fully on the incident: they don’t want to be searching for an onboarding spreadsheet, making coordinating onboarding a low priority.
- Incidents are varied, as is the way people participate in them, making it difficult to understand what qualifies as ‘training’.
I wondered if people have had more structure than me on this, and if so what and how are they tracking it?
The context is we're considering building this into our product (incident.io) as a concept of onboarding programmes, where you can say:
> You're ramped to handle SRE incidents once you've shadowed the lead for >3 incidents involving either Postgres, ElasticSearch, etc, and led at least one yourself
And want to know how/if people are doing this already.
https://redd.it/yxnd4o
@r_devops
My mandate is being moved from “DevOps” to “Developer Experience.” Has anyone else made this switch?
Context: Been overseeing the devops for an ecomm company for a little over three years. We brought in a new CTO from a rival startup earlier this year who seems to be way more plugged in to trends in the broader developer community than most of us.
After mentioning “Developer Experience” without much explanation, he’s formally asked me to make it my priority for 2023.
The problem I’m having is there doesn’t even seem to be a crystallized consensus on what “Developer Experience” even means.
From my early research it’s everything from building new CI/CD frameworks to “making sure the developers have the muffins they like.”
Hoping to get any insights you might have on best practices as well as what falls under this responsibility so I can start making a plan.
https://redd.it/yxxeen
@r_devops
Branching and deployment strategy for continuous integration
What branching/merging/deployment strategy would you use for a development team of 5 developing a webapp with 10,000 users (not small, not large)?
Currently we have three environments: development, staging, production. Features are developed on feature branches and merged to master, causing an auto-deployment to staging. After smoke testing on staging the developer click-ops to production.
If an issue is discovered on staging, the developer creates a new branch (hotfix) which is merged again to master. There is no way to reverse the feature branch merge to master after the fact.
An added complication: if production ever goes down while the master branch is compromised, the system will auto-deploy the compromised master branch to production.
Also, the development environment is a free-for-all.
There has to be a better approach...
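For what it's worth, a feature-branch merge to master *can* be reversed after the fact: `git revert -m 1` creates a new commit that undoes everything the merge brought in, without rewriting history. A minimal sketch in a throwaway repo (branch and file names are illustrative):

```shell
#!/bin/sh
# Demo: reverting a feature-branch merge commit after the fact.
set -e

# Set up a throwaway repo with a merged feature branch
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email demo@example.com && git config user.name Demo
trunk=$(git symbolic-ref --short HEAD)   # "master" or "main" depending on git version
echo base > app.txt
git add app.txt && git commit -qm "initial commit"

git checkout -qb feature/broken
echo broken >> app.txt
git commit -qam "broken feature"
git checkout -q "$trunk"
git merge -q --no-ff -m "merge feature/broken" feature/broken

# The fix: revert the merge commit itself. "-m 1" marks the first
# parent (the trunk side) as the mainline to keep; the new commit
# undoes everything the merge introduced.
git revert -m 1 --no-edit HEAD

cat app.txt   # back to "base"
```

On the auto-deploy problem: if the production pipeline deploys a tagged release (or a release branch) rather than whatever is currently on master, a redeploy triggered by an outage can't pick up a compromised master by accident.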
https://redd.it/yxzi8d
@r_devops