Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
DevOps course for small companies and individuals

Hello everyone,

I've made a DevOps course covering a lot of different technologies and applications, aimed at startups, small companies and individuals who want to self-host their infrastructure.
To get this out of the way - this course doesn't cover Kubernetes or similar - I'm of the opinion that for startups, small companies, and especially individuals, you probably don't need Kubernetes. Unless you have a whole DevOps team, it usually brings more problems than benefits, and unnecessary infrastructure bills buried a lot of startups before they got anywhere.

As for prerequisites, you can't be a complete beginner in the world of computers. If you've never even heard of Docker, if you don't know at least something about DNS, or if you don't have any experience with Linux, this course is probably not for you. That being said, I do explain the basics too, but probably not in enough detail for a complete beginner.

Here's a 100% OFF coupon if you want to check it out:

https://www.udemy.com/course/real-world-devops-project-from-start-to-finish/?couponCode=FREEDEVOPS2302FIAPO

Be sure to BUY the course for $0, and not sign up for Udemy's subscription plan. The Subscription plan is selected by default, but you want the BUY checkbox. If you see a price other than $0, chances are that all coupons have been used already.

I encourage you to watch the "free preview" videos to get a sense of what will be covered, but here's the gist:

The goal of the course is to create an easily deployable and reproducible server which will have "everything" a startup or a small company will need - VPN, mail, Git, CI/CD, messaging, hosting websites and services, sharing files, calendar, etc. It can also be useful to individuals who want to self-host all of those - I've ditched Google almost entirely (99.9%), and besides that being a good feeling, I no longer worry that some AI bug will lock my account with no one to talk to about resolving the issue.

Considering that it covers a wide variety of topics, it doesn't go into depth on any of them. Think of it as going down a highway towards the end destination, but on the way there I show you all the junctions where I think it's useful to do more research on the subject.

We'll deploy services inside Docker and LXC (Linux Containers). Those will include a mail server (iRedMail), Zulip (Slack and Microsoft Teams alternative), GitLab (with GitLab Runner and CI/CD), Nextcloud (file sharing, calendar, contacts, etc.), checkmk (monitoring solution), Pi-hole (ad blocking on DNS level), Traefik with Docker and file providers (a single HTTP/S entry point with automatic routing and TLS certificates).

We'll set up WireGuard, a modern and fast VPN solution for secure access to the VPS's internal network, and I'll also show you how to get a wildcard TLS certificate with certbot and a DNS provider.

To wrap it all up, we'll write a simple Python application that will compare a list of the desired backups with the list of finished backups, and send a result to a Zulip stream. We'll write the application, do a 'git push' to GitLab which will trigger a CI/CD pipeline that will build a Docker image, push it to a private registry, and then, with the help of the GitLab runner, run it on the VPS and post a result to a Zulip stream with a webhook.
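
For a rough idea of what that final project boils down to, here is a minimal sketch of the compare-and-report step. It is not the course's code: the function names and message wording are hypothetical, and it posts through Zulip's message API (`POST /api/v1/messages`, authenticated with a bot's email and API key) as a stand-in for the webhook mentioned above.

```python
import json
import urllib.parse
import urllib.request
from base64 import b64encode


def find_missing_backups(desired, finished):
    """Return the desired backup names that are absent from the finished list."""
    finished_set = set(finished)
    return [name for name in desired if name not in finished_set]


def build_report(desired, finished):
    """Build a short status message for the chat stream (wording is an assumption)."""
    missing = find_missing_backups(desired, finished)
    if not missing:
        return f"All {len(desired)} backups finished."
    return f"{len(missing)} of {len(desired)} backups missing: " + ", ".join(missing)


def post_to_zulip(site, bot_email, api_key, stream, topic, content):
    """Send the report to a Zulip stream; site and credentials are placeholders."""
    data = urllib.parse.urlencode(
        {"type": "stream", "to": stream, "topic": topic, "content": content}
    ).encode()
    request = urllib.request.Request(f"{site}/api/v1/messages", data=data)
    token = b64encode(f"{bot_email}:{api_key}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

In a pipeline, the CI job would call `build_report(...)` with the two lists and pass the result to `post_to_zulip(...)`; keeping the comparison separate from the network call makes it easy to test without a Zulip server.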

When done, you'll be equipped to add additional services suited for your needs.

If this doesn't appeal to you, please leave the coupon for the next guy :)

I hope that you'll find it useful!


Happy learning,
Predrag

https://redd.it/10xttch
@r_devops
How proficient should Solution Architects be at writing code?

I am a lead architect for a retail enterprise org that has a small global footprint.

I have an engineer on the engineering team who comes from a Cloud Consultant background (Think Cloudreach/Logicworks) who can write some bash, but is terrible with writing any other language.

We have a tool that generates CloudFormation templates using Python, which is our IAC tool that leverages an open-source Python library. We live and die by this tool for our IAC, which I have implemented and is rock solid for our org, but this individual is on the struggle bus to use the tool at anything but a beginner level due to his lack of coding skill, and has little to no desire to level-up.
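
To give a sense of what "generating CloudFormation templates using Python" can look like, here is a minimal sketch that uses only the standard library. The post doesn't name the open-source library, so this is not that tool; `bucket_template` and the `ArtifactBucket` logical ID are hypothetical names chosen for illustration.

```python
import json


def bucket_template(logical_id, versioned=True):
    """Return a CloudFormation template (as a dict) declaring one S3 bucket."""
    resource = {"Type": "AWS::S3::Bucket"}
    if versioned:
        # Only emit Properties when there is something to set.
        resource["Properties"] = {"VersioningConfiguration": {"Status": "Enabled"}}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {logical_id: resource},
    }


if __name__ == "__main__":
    # Print the JSON template that would be handed to CloudFormation.
    print(json.dumps(bucket_template("ArtifactBucket"), indent=2))
```

Even at this size, using the tool means reading and writing functions, conditionals and data structures rather than editing a static template, which is where the coding skill comes in.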

He is very strong with how AWS services work and how they should be implemented in an org (Account management, config enforcement, etc), and understands when/how apps should be designed from the "pillar perspective".

I got talking with my leader, who seems to think that "Most Solutions Architects don't need to have strong development chops", which I strongly disagree with.

I understand that big cloud consulting agencies present clients with a team in which each person brings a specialty needed to fulfill the SOW, and from that perspective it might be fine. In enterprise, though, I was wondering what your experience is.

In our org, we need to empower our developers by showing them the way: implementing code in pipelines across multiple languages, which almost always involves reading their code and suggesting implementation changes, in addition to maintaining our own IAC tools. On top of that comes the normal architecture guidance on reliability, fault tolerance, etc.

IMO, having an idea of how it should be designed is only part of the picture. Architects should also know how to hand off the implementation and be familiar with the tools used to implement the design, which in our org are Groovy and Python based.

So, what does reddit think? How proficient should an Architect be at writing code?

https://redd.it/10xvhd1
@r_devops
Comparison among techniques to share GPUs in Kubernetes

I recently released an [open-source library to dynamically leverage GPUs with NVIDIA MIG and MPS](https://github.com/nebuly-ai/nos), and the most appreciated part was the comparison among sharing technologies, so I wanted to share it here.

There are three approaches for sharing GPUs in Kubernetes:

1. Multi-Instance GPU ([MIG](https://github.com/NVIDIA/mig-parted))
2. Multi-Process Service ([MPS](https://docs.nvidia.com/deploy/mps/index.html))
3. Time Slicing ([TS](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/gpu-sharing.html))

# Multi-Instance GPU (MIG)

**Workload isolation**: best

**Pros**

* Processes are executed in parallel
* Full isolation (dedicated memory and compute resources)

**Cons**

* Supported by fewer GPU architectures (only Ampere or more recent architectures)
* Coarse-grained control over memory and compute resources

**References**: [Tutorial on how to use Dynamic MIG Partitioning](https://towardsdatascience.com/dynamic-mig-partitioning-in-kubernetes-89db6cdde7a3)

# Multi-Process Service (MPS)

**Workload isolation**: medium

**Pros**

* Supported by almost every GPU architecture
* Processes are executed in parallel
* Fine-grained control over memory and compute resources allocation
* It lets you set up memory limits

**Cons**

* No memory protection or error isolation

**References**: [Comparison of sharing techniques and tutorial on how to use MPS](https://towardsdatascience.com/how-to-increase-gpu-utilization-in-kubernetes-with-nvidia-mps-e680d20c3181)

# Time Slicing

**Workload isolation**: none

**Pros**

* Supported by almost every GPU architecture
* Processes are executed concurrently

**Cons**

* No resource limits
* No memory isolation
* Lower performance due to context-switching overhead

**References**: [Time-Slicing GPUs in Kubernetes](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/gpu-sharing.html)

**Resources**

* [Dynamic GPU Partitioning documentation](https://docs.nebuly.com/nos/dynamic-gpu-partitioning/overview/)
* [NVIDIA GPU Operator documentation](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html)
* [NVIDIA MIG User guide](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/)
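
The comparison above can also be read as a small decision table. The sketch below encodes it in Python under stated assumptions: the numeric isolation ranks and the names `SharingMode`, `MODES` and `pick_mode` are hypothetical, while the Ampere-or-newer requirement and the presence or absence of resource limits are taken from the lists above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharingMode:
    name: str
    isolation: int               # 2 = best, 1 = medium, 0 = none (ranks are illustrative)
    needs_ampere_or_newer: bool  # per the MIG "Cons" list above
    resource_limits: bool        # whether memory/compute can be capped


# Ordered from strongest to weakest workload isolation.
MODES = [
    SharingMode("MIG", 2, True, True),
    SharingMode("MPS", 1, False, True),
    SharingMode("Time Slicing", 0, False, False),
]


def pick_mode(min_isolation, ampere_or_newer, need_limits):
    """Return the strongest-isolation mode that meets the constraints, or None."""
    for mode in MODES:
        if mode.isolation < min_isolation:
            continue
        if mode.needs_ampere_or_newer and not ampere_or_newer:
            continue
        if need_limits and not mode.resource_limits:
            continue
        return mode.name
    return None
```

For example, `pick_mode(1, ampere_or_newer=False, need_limits=True)` selects MPS, since MIG is ruled out by the architecture and Time Slicing offers no limits.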

https://redd.it/10xty21
@r_devops
Need a catchy name for a data migration tool

My company is developing a new data migration tool and we're trying to come up with a catchy name. Open to any and all suggestions!

https://redd.it/10xy5j9
@r_devops
Terraform vs. Cloudformation for an all-AWS Environment in 2023?

Current company uses Cloudformation for everything. I work in an AWS-only environment (except for a few data workloads on GCP, which use Terraform, but they're an exception and not worth considering in this question).

I'm wondering: in 2023, is there a tangible benefit to ripping up all our Cloudformation and rewriting it in Terraform, assuming we have no plans to migrate off of AWS anytime soon?

What are the pros and cons of Terraform vs. Cloudformation in 2023 for an AWS-only company?

https://redd.it/10y3p20
@r_devops
Docker/Kubernetes Role in CI/CD

I want to gain a better understanding of how docker/kubernetes generally fits into the CI/CD pipeline, as I am completely new to docker/kubernetes.

Are docker/kubernetes generally used in the lower environments, and then when we reach the production stage, do companies generally just install the apps directly on the servers instead of containers?

https://redd.it/10y1frz
@r_devops
Documentation Advice

I’m looking for the best ways to create and manage documentation for our company’s projects.

Some basic principles I’ve been considering:

- Documentation should be managed as source code in the same repository as the code
- Documentation should be generated and published as part of the deployment pipeline
- If possible, it could be helpful to use a linter to warn that documentation and code are out of sync
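
As one cheap approximation of the linter idea, a pipeline step can fail when public functions or classes have no documentation at all. This is only a sketch under assumptions: it checks Python docstrings with the standard `ast` module rather than Javadoc, it detects missing docs rather than stale ones, and `undocumented_names` is a hypothetical helper name.

```python
import ast


def undocumented_names(source):
    """Return sorted names of public functions and classes that lack a docstring."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Names starting with an underscore are treated as private and skipped.
            if not node.name.startswith("_") and ast.get_docstring(node) is None:
                missing.append(node.name)
    return sorted(missing)


if __name__ == "__main__":
    sample = 'def a():\n    """Documented."""\n\ndef b():\n    pass\n'
    print(undocumented_names(sample))
```

A CI job could run this over changed files and exit non-zero when the returned list is not empty.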

I’m also trying to figure out the different levels of documentation. Here’s what I’m considering currently.

- High level architecture and how the components interact
- UI user stories and features
- API documentation
- Method-level documentation such as Javadoc

Honestly I’m just looking for general advice and experiences.

Thanks!

https://redd.it/10y4wg4
@r_devops
How's the market for fully remote roles?

I'm from the UK. I've just accepted a new role that's two days on-site. I don't mind it as I feel it'll benefit me but would have preferred less time in the office. When I was applying I saw most places were hybrid, only a few advertised as fully remote.

How has it been finding fully remote work for you? Has your company ordered you back on a hybrid basis?

https://redd.it/10y7ne6
@r_devops
ChatGPT's Thoughts on DevOps Engineer vs Site Reliability Engineer vs Systems Engineer vs Software Architect vs Software Engineer

| Role | Description | Key Responsibilities | Skills |
|---------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DevOps Engineer | A DevOps Engineer is responsible for automating, testing, and deploying software releases. They work closely with software developers and IT operations to ensure that the software is reliable and scalable. They use tools like continuous integration, continuous delivery, and configuration management to automate the software delivery process. | Automating software releases, testing and deploying code, working with software developers and IT operations, using tools for continuous integration, delivery and configuration management. | Knowledge of continuous integration and delivery (CI/CD) tools, experience with automation and scripting, ability to work in a fast-paced environment, strong collaboration and communication skills. |
| Site Reliability Engineer (SRE) | An SRE is responsible for the availability, scalability, and performance of a company's production systems. They work closely with software developers to ensure that software is designed for operations, and they use a variety of tools and processes to automate and manage the deployment and maintenance of software releases. An SRE is also responsible for incident response, disaster recovery, and capacity planning. | Ensuring availability, scalability and performance of production systems, working with software developers, automating and managing software releases, incident response and disaster recovery, capacity planning. | Strong experience with Linux/Unix administration, experience with automation and scripting, experience with incident response and disaster recovery, ability to work with software developers, strong problem-solving skills. |
| Systems Engineer | A Systems Engineer is responsible for the design, implementation, and maintenance of an organization's IT systems. They work closely with software developers, network engineers, and security professionals to ensure that systems are secure, scalable, and reliable. They are often responsible for the deployment and configuration of hardware and software systems, as well as for the design of backup and recovery systems. | Designing, implementing and maintaining IT systems, working with software developers, network engineers and security professionals, deploying and configuring hardware and software systems, designing backup and recovery systems. | Strong experience with network and system administration, knowledge of security best practices, experience with deployment and configuration of hardware and software systems, ability to work with a variety of technical teams. |
| Software Architect | A Software Architect is responsible for the overall design and architecture of software systems. They work closely with software developers, business stakeholders, and other technical teams to ensure that software systems are scalable, secure, and meet the needs of the business. A Software Architect is also responsible for making technology decisions and determining the best tools and frameworks to use for a particular project. | Designing the architecture of software systems, working with software developers, business stakeholders and other technical teams, making technology decisions, determining best tools and frameworks for a project. | Strong experience with software development, knowledge of software architecture and design patterns, ability to think strategically, strong communication and collaboration skills. |
| Software Engineer | A Software Engineer is responsible for writing and testing code for software systems. They work closely with software architects, business stakeholders, and other technical teams to develop software that is scalable, secure, and meets the needs of the business. A software engineer is also responsible for fixing bugs and addressing technical issues that arise during the development process. | Writing and testing code, working with software architects, business stakeholders and other technical teams, fixing bugs and addressing technical issues. | Strong programming skills in one or more languages, experience with software development processes, ability to work in a fast-paced environment, strong problem-solving skills. |

Similarities:
- All of these roles are technical positions that involve working with software and technology systems.
- They all require strong problem-solving skills and the ability to work in a fast-paced environment.
- They all involve working with other technical teams and stakeholders, such as software developers, network engineers, and business stakeholders.
- They all require a strong understanding of software development processes and technologies.

Differences:
- DevOps Engineer: DevOps Engineers focus on automating and streamlining the software delivery process, and work closely with software developers and IT operations to ensure that software is reliable and scalable.
- Site Reliability Engineer (SRE): SREs focus on the availability, scalability, and performance of a company's production systems. They work closely with software developers to ensure that software is designed for operations and use a variety of tools and processes to automate and manage the deployment and maintenance of software releases.
- Systems Engineer: Systems Engineers are responsible for the design, implementation, and maintenance of an organization's IT systems. They work closely with software developers, network engineers, and security professionals to ensure that systems are secure, scalable, and reliable.
- Software Architect: Software Architects are responsible for the overall design and architecture of software systems. They work closely with software developers, business stakeholders, and other technical teams to ensure that software systems are scalable, secure, and meet the needs of the business.
- Software Engineer: Software Engineers focus on writing and testing code for software systems. They work closely with software architects, business stakeholders, and other technical teams to develop software that is scalable, secure, and meets the needs of the business.

Additionally, here is a table comparing the average compensation (in the United States) for these roles. Note: These figures are estimates and may vary depending on factors such as location, experience, and company size; compensation can vary greatly within a given role.

| Role | Average Compensation (in USD) |
|---------------------------|-----------------------------------|
| DevOps Engineer | $120,000 - $140,000 |
| Site Reliability Engineer | $140,000 - $165,000 |
| Systems Engineer | $110,000 - $130,000 |
| Software Architect | $140,000 - $165,000 |
| Software Engineer | $105,000 - $130,000 |

https://redd.it/10y8k5t
@r_devops
Can anyone suggest good YouTube videos for Jenkins?

DevOps is one of my college courses, and we implement it with Jenkins. I am a beginner and want to learn more about it through real-world projects. Can anybody recommend good YouTube courses that you liked?

https://redd.it/10y83jn
@r_devops
GitHub Actions doesn't show child job results when I remove "contents: write"

If I have "contents: write", my child job results (i.e. lint-actions run by the parent job) show up alongside the job results under Actions on the left-hand side, plus an "Annotations" section with all the specific lint errors. But if I switch the permission to "read", none of that shows up, and I would have no idea that I have lint errors, except that they do show up on the PR. Why is this?

https://redd.it/10y6z0d
@r_devops
Terrahaxs: GitOps Terraform CI/CD

Hey r/devops!

I'm Gabe, the founder of Terrahaxs, a GitHub Application that makes it easier to get started with Terraform CI/CD.

Why did we build this?



We wanted something better than Atlantis and cheaper than TFE or Spacelift.

Atlantis gets the job done and we’ve used it. However, deploying Atlantis requires you to already have infrastructure set up and in place (i.e. VPC, subnets, K8s cluster, etc.) and DevOps skills. Terrahaxs allows you to get started with Terraform CI/CD without needing to deploy anything. Terrahaxs is also highly available (something Atlantis does not support), has unlimited concurrency, and supports features such as drift protection.

Spacelift and TFE are great, but they are expensive. Terrahaxs is a cheaper alternative.

How does it work?



Terrahaxs is a GitHub Application that you install with a few clicks. Once installed, it will look for a Terrahaxs.yaml or atlantis.yaml file and start running your Terraform CI/CD commands. It is backwards compatible with Atlantis and implements most of its functionality, with more coming soon.

Terrahaxs uses a runner to execute commands and the runner can be hosted by Terrahaxs, run on GitHub Actions, or self-hosted.

The ask

We would love to hear any feedback from people in the field on what we’ve built. Would you use this? It’s still early, there are kinks, but we really would love to hear your thoughts (positive or negative)! 😊

https://redd.it/10ycu2i
@r_devops
Am I wrong to suggest that we should move away from in-house managed applications for SRE team?

So I recently joined a startup as head of an SRE team of 4 engineers.
Two of the engineers have been with the company for a long time. They're brilliant engineers, but one of them is quite stubborn and strongly opinionated.

One of the problems I see is that the whole build and deployment happens on a server that was built in-house. Sort of like Jenkins, but it is way more integrated into the process.
The devs have absolutely no idea how the build and deployment works. And it's basically this one engineer who builds and maintains this system.

For example, Cloudformation yaml files are generated in code rather than just written as yaml. This, at least for me, makes the whole thing a black box to everybody, unless you have time to go through a ton of Ruby code to understand what's going on.

I suggested that, at least for production, we should make the process more streamlined and try to decouple it from this system, since it is a single point of failure and we don't need that in the production deployment path.

I also opined that for a small team like ours, we should try to use managed services as much as we can and move away from services built and maintained in-house. Every in-house managed service is costly to maintain.

Understandably my opinion was not well received by this engineer, although other engineers agreed with it.
One of the arguments was that devs do not have to worry about build and deployment, as it's the responsibility of the SRE team, and that having one central place where everything happens is easier to maintain than 5 different managed services.

I strongly think using managed services is better, as it helps with continuity and with maintaining the platform, compared to having an in-house system that is mainly maintained by one engineer.

I don't want to create too much of a rift, as this engineer has been with the company for a long time and he's the go-to guy for any issue in the system.

But am I wrong?
Sorry for the long rant.

https://redd.it/10ybpx9
@r_devops
I have an app to build, all my designs are done, prototype flawless. Should I use ChatGPT for the hell of it?

It's a basic app that could be built with some HTML/CSS, JavaScript, maybe some PHP or Node. Webapp it? Or Android/iOS it? Or all three (webapp/Android/iOS)?

https://redd.it/10yg1hh
@r_devops
Easy Prometheus/Grafana Setup With Dashboards Repo

So I came across this while streaming yesterday and setting up Prometheus and Grafana on the Kubernetes cluster I use on stream. This thing was so easy to set up, and it includes a bunch of pre-built Grafana dashboards for your Kubernetes cluster.

Highly recommend. I've also included a link to the part of my stream where you can see some of these live if you're curious how they look, but I'm very impressed.

The actual link to the Prometheus/Grafana bundle: https://github.com/prometheus-operator/kube-prometheus

My Twitch link to the section showing the dashboards: https://www.twitch.tv/videos/1731954476?t=02h02m15s

Hope this helps for anyone that might be struggling to get this going.

https://redd.it/10xvczs
@r_devops
which one would you prefer

If anyone has worked in both places, comment the pros and cons below.

https://redd.it/10yhi77
@r_devops
Moving from developer job to cloud architect (terraform) job?

Hi folks, did any of you move from a software developer job to a cloud architect job?

I received an offer from a company and talked with one of their employees to get an idea of what they do. He told me that they design cloud architectures and 80% of the job is writing Terraform modules. They also write Lambda functions in Python/JavaScript sometimes.

At the moment I work as a backend Java developer and I think I would miss coding, but I know the cloud market is hot and cloud architect is a niche role which could pay better in the future.

What do you think? I'm 1 year into my career. Would it be a good choice to switch?

https://redd.it/10yjgl6
@r_devops