Comparison among techniques to share GPUs in Kubernetes
I recently released an [open-source library to dynamically leverage GPUs with NVIDIA MIG and MPS](https://github.com/nebuly-ai/nos), and its most appreciated component was the comparison among sharing technologies, so I wanted to share it here.
There are three approaches for sharing GPUs in Kubernetes:
1. Multi-Instance GPU ([MIG](https://github.com/NVIDIA/mig-parted))
2. Multi-Process Service ([MPS](https://docs.nvidia.com/deploy/mps/index.html))
3. Time Slicing ([TS](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/gpu-sharing.html))
# Multi-Instance GPU (MIG)
**Workload isolation**: best
**Pros**
* Processes are executed in parallel
* Full isolation (dedicated memory and compute resources)
**Cons**
* Supported by fewer GPU architectures (only Ampere or more recent architectures)
* Coarse-grained control over memory and compute resources
**References**: [Tutorial on how to use Dynamic MIG Partitioning](https://towardsdatascience.com/dynamic-mig-partitioning-in-kubernetes-89db6cdde7a3)
# Multi-Process Service (MPS)
**Workload isolation**: medium
**Pros**
* Supported by almost every GPU architecture
* Processes are executed in parallel
* Fine-grained control over memory and compute resource allocation
* Lets you set up memory limits
**Cons**
* No memory protection or error isolation
**References**: [Comparison of sharing techniques and tutorial on how to use MPS](https://towardsdatascience.com/how-to-increase-gpu-utilization-in-kubernetes-with-nvidia-mps-e680d20c3181)
# Time Slicing
**Workload isolation**: none
**Pros**
* Supported by almost every GPU architecture
* Processes are executed concurrently
**Cons**
* No resource limits
* No memory isolation
* Lower performance due to context-switching overhead
**References**: [Time-Slicing GPUs in Kubernetes](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/gpu-sharing.html)
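The trade-offs listed above can be encoded as a small lookup, which may help when deciding which technique fits a given cluster. This is only a sketch: the field names, the `candidates` helper, and the reduction of "supported by almost every GPU architecture" to a single boolean are assumptions made for illustration, not part of any NVIDIA or nos API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharingTechnique:
    name: str
    isolation: str         # "best", "medium", or "none", as rated in the post
    parallel: bool         # True: processes run in parallel; False: time-multiplexed
    resource_limits: bool  # True if memory/compute limits can be set per workload
    needs_ampere: bool     # True if only Ampere or newer GPUs are supported


# Values transcribed from the pros/cons lists above (simplified).
TECHNIQUES = [
    SharingTechnique("MIG", "best", True, True, True),
    SharingTechnique("MPS", "medium", True, True, False),
    SharingTechnique("Time Slicing", "none", False, False, False),
]

ISOLATION_RANK = {"none": 0, "medium": 1, "best": 2}


def candidates(min_isolation: str, ampere_or_newer: bool) -> list[str]:
    """Return techniques meeting the isolation floor, strongest isolation first."""
    floor = ISOLATION_RANK[min_isolation]
    usable = [
        t for t in TECHNIQUES
        if ISOLATION_RANK[t.isolation] >= floor
        and (ampere_or_newer or not t.needs_ampere)
    ]
    usable.sort(key=lambda t: ISOLATION_RANK[t.isolation], reverse=True)
    return [t.name for t in usable]


if __name__ == "__main__":
    print(candidates("medium", ampere_or_newer=False))
    print(candidates("none", ampere_or_newer=True))
```

On a pre-Ampere GPU with a requirement of at least medium isolation, only MPS remains in this simplified model.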
​
**Resources**
* [Dynamic GPU Partitioning documentation](https://docs.nebuly.com/nos/dynamic-gpu-partitioning/overview/)
* [NVIDIA GPU Operator documentation](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html)
* [NVIDIA MIG User guide](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/)
https://redd.it/10xty21
@r_devops
Need a catchy name for a data migration tool
My company is developing a new data migration tool and we're trying to come up with a catchy name. Open to any and all suggestions!
https://redd.it/10xy5j9
@r_devops
Terraform vs. CloudFormation for an all-AWS Environment in 2023?
My current company uses CloudFormation for everything. I work in an AWS-only environment (except for a few data workloads on GCP, which use Terraform, but they're an exception and not worth considering in this question).
I'm wondering: in 2023, is there a tangible benefit to ripping up all our CloudFormation and rewriting it in Terraform, assuming we have no plans to migrate off AWS anytime soon?
What are the pros and cons of Terraform vs. CloudFormation in 2023 for an AWS-only company?
https://redd.it/10y3p20
@r_devops
Docker/Kubernetes Role in CI/CD
I want to gain a better understanding of how Docker/Kubernetes generally fit into the CI/CD pipeline, as I am completely new to both.
Are Docker/Kubernetes generally used only in the lower environments, with companies installing the apps directly on servers (instead of in containers) once they reach the production stage?
https://redd.it/10y1frz
@r_devops
Documentation Advice
I’m looking for the best ways to create and manage documentation for our company’s projects.
Some basic principles I’ve been considering
- Documentation should be managed as source code in the same repository as the code
- Documentation should be generated and published as part of the deployment pipeline
- If possible, it could be helpful to use a linter to warn when documentation and code are out of sync
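As a rough illustration of the linter idea, the sketch below flags public functions and classes that have no docstring. It is written in Python using the standard `ast` module. The `missing_docstrings` helper and the `EXAMPLE` source are hypothetical, and the check only detects missing documentation, not documentation that has drifted from the code.

```python
import ast


def missing_docstrings(source: str) -> list[str]:
    """Return names of public functions and classes in `source` lacking a docstring."""
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name.startswith("_"):
                continue  # skip private helpers
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return sorted(missing)


# Hypothetical source file used only for demonstration.
EXAMPLE = '''
def documented():
    """Explain what this does."""
    return 1

def undocumented():
    return 2

class Service:
    def run(self):
        return 3
'''

if __name__ == "__main__":
    for name in missing_docstrings(EXAMPLE):
        print(f"warning: {name} has no docstring")
```

A check like this could run in the same pipeline that publishes the documentation, failing the build or only warning, depending on how strict the team wants to be.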
I’m also trying to figure out the different levels of documentation. Here’s what I’m considering currently.
- High level architecture and how the components interact
- UI user stories and features
- API documentation
- Method-level documentation such as Javadoc
Honestly I’m just looking for general advice and experiences.
Thanks!
https://redd.it/10y4wg4
@r_devops
How's the market for fully remote roles?
I'm from the UK. I've just accepted a new role that's two days on-site. I don't mind it as I feel it'll benefit me but would have preferred less time in the office. When I was applying I saw most places were hybrid, only a few advertised as fully remote.
How has it been finding fully remote work for you? Has your company ordered you back on a hybrid basis?
https://redd.it/10y7ne6
@r_devops
ChatGPT's Thoughts on DevOps Engineer vs Site Reliability Engineer vs Systems Engineer vs Software Architect vs Software Engineer
| Role | Description | Key Responsibilities | Skills |
|---------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DevOps Engineer | A DevOps Engineer is responsible for automating, testing, and deploying software releases. They work closely with software developers and IT operations to ensure that the software is reliable and scalable. They use tools like continuous integration, continuous delivery, and configuration management to automate the software delivery process. | Automating software releases, testing and deploying code, working with software developers and IT operations, using tools for continuous integration, delivery and configuration management. | Knowledge of continuous integration and delivery (CI/CD) tools, experience with automation and scripting, ability to work in a fast-paced environment, strong collaboration and communication skills. |
| Site Reliability Engineer (SRE) | An SRE is responsible for the availability, scalability, and performance of a company's production systems. They work closely with software developers to ensure that software is designed for operations, and they use a variety of tools and processes to automate and manage the deployment and maintenance of software releases. An SRE is also responsible for incident response, disaster recovery, and capacity planning. | Ensuring availability, scalability and performance of production systems, working with software developers, automating and managing software releases, incident response and disaster recovery, capacity planning. | Strong experience with Linux/Unix administration, experience with automation and scripting, experience with incident response and disaster recovery, ability to work with software developers, strong problem-solving skills. |
| Systems Engineer | A Systems Engineer is responsible for the design, implementation, and maintenance of an organization's IT systems. They work closely with software developers, network engineers, and security professionals to ensure that systems are secure, scalable, and reliable. They are often responsible for the deployment and configuration of hardware and software systems, as well as for the design of backup and recovery systems. | Designing, implementing and maintaining IT systems, working with software developers, network engineers and security professionals, deploying and configuring hardware and software systems, designing backup and recovery systems. | Strong experience with network and system administration, knowledge of security best practices, experience with deployment and configuration of hardware and software systems, ability to work with a variety of technical teams. |
| Software Architect | A Software Architect is responsible for the overall design and architecture of software systems. They work closely with software developers, business stakeholders, and other technical teams to ensure that software systems are scalable, secure, and meet the needs of the business. A Software Architect is also responsible for making technology decisions and determining the best tools and frameworks to use for a particular project. | Designing the architecture of software systems, working with software developers, business stakeholders and other technical teams, making technology decisions, determining best tools and frameworks for a project. | Strong experience with software development, knowledge of software architecture and design patterns, ability to think strategically, strong communication and collaboration skills. |
| Software Engineer | A Software Engineer is responsible for writing and testing code for software systems. They work closely with software architects, business stakeholders, and other technical teams to develop software that is scalable, secure, and meets the needs of the business. A software engineer is also responsible for fixing bugs and addressing technical issues that arise during the development process. | Writing and testing code, working with software architects, business stakeholders and other technical teams, fixing bugs and addressing technical issues. | Strong programming skills in one or more languages, experience with software development processes, ability to work in a fast-paced environment, strong problem-solving skills. |
Similarities:
- All of these roles are technical positions that involve working with software and technology systems.
- They all require strong problem-solving skills and the ability to work in a fast-paced environment.
- They all involve working with other technical teams and stakeholders, such as software developers, network engineers, and business stakeholders.
- They all require a strong understanding of software development processes and technologies.
Differences:
- DevOps Engineer: DevOps Engineers focus on automating and streamlining the software delivery process, and work closely with software developers and IT operations to ensure that software is reliable and scalable.
- Site Reliability Engineer (SRE): SREs focus on the availability, scalability, and performance of a company's production systems. They work closely with software developers to ensure that software is designed for operations and use a variety of tools and processes to automate and manage the deployment and maintenance of software releases.
- Systems Engineer: Systems Engineers are responsible for the design, implementation, and maintenance of an organization's IT systems. They work closely with software developers, network engineers, and security professionals to ensure that systems are secure, scalable, and reliable.
- Software Architect: Software Architects are responsible for the overall design and architecture of software systems. They work closely with software developers, business stakeholders, and other technical teams to ensure that software systems are scalable, secure, and meet the needs of the business.
- Software Engineer: Software Engineers focus on writing and testing code for software systems. They work closely with software architects, business stakeholders, and other technical teams to develop software that is scalable, secure, and meets the needs of the business.
Additionally, here is a table comparing the average compensation (in the United States) for these roles. Note: These figures are estimates and may vary depending on factors such as location, experience, and company size; compensation can vary greatly within a given role.
| Role | Average Compensation (in USD) |
|---------------------------|-----------------------------------|
| DevOps Engineer | $120,000 - $140,000 |
| Site Reliability Engineer | $140,000 - $165,000 |
| Systems Engineer | $110,000 - $130,000 |
| Software Architect | $140,000 - $165,000 |
| Software Engineer | $105,000 - $130,000 |
https://redd.it/10y8k5t
@r_devops
Can anyone suggest good YouTube videos for Jenkins?
DevOps is one of my college courses, and we implement it with Jenkins. I am a beginner and want to learn more about it through real-world projects. Can anybody recommend good YouTube courses that you liked?
https://redd.it/10y83jn
@r_devops
GitHub Actions doesn't show child job results when I remove "contents: write"
If I have `contents: write`, then my child job results (i.e. lint actions run by the parent job) show alongside the job results under Actions on the left-hand side, plus an "Annotations" section with all the specific lint errors. But if I switch the permission to `read`, they don't show up, and I would have no idea that I have lint errors, except that they do show up on the PR, for example. Why is this?
https://redd.it/10y6z0d
@r_devops
Terrahaxs: GitOps Terraform CI/CD
Hey r/devops!
I'm Gabe, the founder of Terrahaxs, a GitHub Application that makes it easier to get started with Terraform CI/CD.
Why did we build this?
We wanted something better than Atlantis and cheaper than TFE or Spacelift.
Atlantis gets the job done and we’ve used it. However, deploying Atlantis requires you to already have infrastructure set up and in place (e.g. VPC, subnets, K8s cluster) as well as DevOps skills. Terrahaxs allows you to get started with Terraform CI/CD without needing to deploy anything. Terrahaxs is also highly available (something Atlantis does not support), has unlimited concurrency, and supports features such as drift protection.
Spacelift and TFE are great, but they are expensive. Terrahaxs is a cheaper alternative.
How does it work?
Terrahaxs is a GitHub Application that you install with a few clicks of a button. Once installed, it will look for a Terrahaxs.yaml or atlantis.yaml file and start running your Terraform CI/CD commands. It is backwards compatible with Atlantis and implements most of its functionality, with more coming soon.
Terrahaxs uses a runner to execute commands and the runner can be hosted by Terrahaxs, run on GitHub Actions, or self-hosted.
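The config discovery described above might look roughly like the following. This is a sketch under stated assumptions: the lookup order (Terrahaxs.yaml before atlantis.yaml), the repository-root location, and the `find_config` name are all hypothetical, since the post only names the two files and does not describe how Terrahaxs actually resolves them.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Assumed lookup order; the post names both files but does not say which wins.
CONFIG_NAMES = ("Terrahaxs.yaml", "atlantis.yaml")


def find_config(repo_root: Path) -> Optional[Path]:
    """Return the first config file found in repo_root, or None if neither exists."""
    for name in CONFIG_NAMES:
        candidate = repo_root / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Demonstrate against a throwaway directory standing in for a repository.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "atlantis.yaml").write_text("version: 3\n")
        print(find_config(root))
```

Falling back to atlantis.yaml is one simple way a tool can stay compatible with repositories already configured for Atlantis.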
The ask
We would love to hear any feedback from people in the field on what we’ve built. Would you use this? It’s still early, there are kinks, but we really would love to hear your thoughts (positive or negative)! 😊
https://redd.it/10ycu2i
@r_devops
Am I wrong to suggest that we should move away from in-house managed applications for SRE team?
So I recently joined a startup as head of an SRE team of 4 engineers.
Two of the engineers have been with the company for a long time. They're brilliant engineers, but one of them is quite stubborn and strongly opinionated.
One of the problems I see is that the whole build and deployment happens on a server that was built in-house. Sort of like Jenkins, but it is way more integrated into the process.
The devs have absolutely no idea how the build and deployment works. And it's basically this one engineer who builds and maintains this system.
For example, CloudFormation YAML files are generated in code, rather than just being written as YAML. This, at least for me, makes the whole thing a black box to everybody, unless you have time to go through a ton of Ruby code to understand what's going on.
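To make the "generated in code" pattern concrete, here is a minimal sketch. The system described above is written in Ruby and its actual design is unknown, so this Python version, the `bucket_template` function, and the bucket names are purely illustrative assumptions. It emits JSON, which CloudFormation accepts alongside YAML.

```python
import json


def bucket_template(names: list[str]) -> dict:
    """Build a CloudFormation template with one S3 bucket per logical name."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            name: {"Type": "AWS::S3::Bucket"} for name in names
        },
    }


if __name__ == "__main__":
    # Rendering the template makes the generated output reviewable
    # without reading the generator itself.
    print(json.dumps(bucket_template(["LogsBucket", "AssetsBucket"]), indent=2))
```

Even in a toy like this, the final template only exists after the code runs. That is the black-box concern: reviewers see the generator, not the infrastructure definition, unless the rendered output is also committed or published.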
I suggested that we should, at least for production, make the process more streamlined and try to decouple it from this system, since it is a single point of failure and we don't need that in the production deployment path.
I also opined that for a small team like ours, we should try to use managed services as much as we can, and try to move away from services built and maintained in-house. Every in-house service is costly to maintain.
Understandably my opinion was not well received by this engineer, although other engineers agreed with it.
One of the arguments was that devs do not have to worry about build and deployment and it's the responsibility of the SRE team. And that having one central place everything happens is easier to maintain than 5 different managed services.
I strongly think using managed services is better, as it helps with continuity and with maintaining the platform, rather than having an in-house system that is mainly maintained by one engineer.
I don't want to create too much of a rift, as this engineer has been with the company for a long time and he's the go-to guy for any issue in the system.
But am I wrong?
Sorry for the long rant.
https://redd.it/10ybpx9
@r_devops
I have an app to build, all my designs are done, prototype flawless. Should I use ChatGPT for the hell of it?
It's a basic app that could be built with some HTML/CSS, JavaScript, and maybe some PHP or Node. Webapp it? Or Android/iOS it? Or all three (webapp/Android/iOS)?
https://redd.it/10yg1hh
@r_devops
Easy Prometheus/Grafana Setup With Dashboards Repo
So I came across this yesterday while streaming and setting up Prometheus and Grafana on the Kubernetes cluster I use on stream. It was so easy to set up, and it includes a bunch of pre-built Grafana dashboards for your Kubernetes cluster.
Highly recommend. I've also included a link to the part of my stream where you can see some of these live if you're curious how they look, but I'm very impressed.
The actual link to the Prometheus/Grafana bundle: https://github.com/prometheus-operator/kube-prometheus
My Twitch link to the section showing the dashboards: https://www.twitch.tv/videos/1731954476?t=02h02m15s
Hope this helps anyone who might be struggling to get this going.
https://redd.it/10xvczs
@r_devops
which one would you prefer
If anyone has worked in both places, comment the pros and cons below.
View Poll
https://redd.it/10yhi77
@r_devops
Moving from developer job to cloud architect (terraform) job?
Hi folks, did any of you move from a software developer job to a cloud architect job?
I received an offer from a company and talked with one of their employees to get an idea of what they do. He told me that they design cloud architectures and 80% of the job is writing Terraform modules. They also sometimes write Lambda functions in Python/JavaScript.
At the moment I work as a backend Java developer and I think I would miss coding, but I know the cloud market is hot and cloud architect is a niche role which could pay better in the future.
What do you think? I'm 1 year into my career. Would it be a good choice to switch?
https://redd.it/10yjgl6
@r_devops
Intern in DevOps as future SWE (CANADA)
I'm doing 2 internships back to back in DevOps and it's really interesting. As an undergrad engineer, should I also land an internship in full stack, or is DevOps more lucrative in the long run?
https://redd.it/10y5401
@r_devops
Version control with git + CI/CD for WordPress.
Googling suggests several potential options for version controlling a WordPress site using git and potentially setting up a deployment pipeline for it.
Does anybody have any experience of this at all that they'd care to share?
https://redd.it/10y4o47
@r_devops
I'm excited about how hard it was to push a change into a cluster I manage, so I blogged about it
Hi folks! 👋
After 4 hours of jumping through various hoops to get a change into our dev/prod clusters, I'm so happy it's done that I thought I'd write up the process. In hindsight (not at the time!), it's quite gratifying how hard it was to push something into production and have the various checks ensure that what comes out the other end is secure and supportable :)
Here's my midnight ramble: https://geek-cookbook.funkypenguin.co.nz/blog/2023/02/11/layered-kubernetes-security-is-a-pita/
D
https://redd.it/10yozmw
@r_devops
Is there a way to disable a duplicate workflow running from a trigger in Github Actions?
I've created a workflow that is triggered when a review is requested on a pull request and notifies a Slack channel. I do this using this trigger:

    name: PR Raised
    on:
      pull_request:
        types: [review_requested]

The issue I'm having is that when more than one reviewer is added to a pull request, the workflow runs more than once and delivers multiple notifications to Slack. Is there a way to add a condition within the YAML workflow so that, if a job is already running for this trigger, another one does not run, so that only one Slack notification is delivered rather than one for each reviewer?
Thanks, I hope I made this clear.
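One direction that might fit, as a sketch rather than a confirmed fix: GitHub Actions has a workflow-level `concurrency` setting, so runs triggered for the same pull request can share a group, and a newer run cancels one that is still in progress. The group name and the job below are illustrative assumptions, and the step is a placeholder rather than the real Slack notification.

```yaml
name: PR Raised
on:
  pull_request:
    types: [review_requested]

# Assumption: one concurrency group per pull request; the group name is illustrative.
concurrency:
  group: pr-raised-${{ github.event.pull_request.number }}
  cancel-in-progress: true

jobs:
  notify:
    runs-on: ubuntu-latest
    steps:
      # Placeholder step; replace with the actual Slack notification.
      - run: echo "Review requested on PR #${{ github.event.pull_request.number }}"
```

This only collapses runs that overlap in time. If the first run has already posted to Slack before the event for the second reviewer arrives, a second notification would still go out.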
https://redd.it/10yq5gv
@r_devops