Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Security Team wants me to join

The security team at my company wants me to join their team because they want more DevOps oriented people and they like my experience.

I’m not entirely sure which road is better. Should I stick with my DevOps team or start down the security path?

Thoughts?

https://redd.it/ywatx6
@r_devops
K8s and HIPAA/PHI compliant systems - Need advice!

I work on setting up AKS clusters for a healthcare company. Our security team wants no secrets in the cluster, and any that do end up there should be encrypted. I am trying to understand how to avoid storing secrets as Kubernetes secrets, since by default those are only base64-encoded, not encrypted.

So far, for application secrets (e.g. database connection strings), we store them in Azure Key Vault and use azure-keyvault-secrets-store-csi-provider to bring those secrets into the cluster on a mount accessible only to the pods. Since we are using dotnet applications, we made our applications look for the properties file at /appsettings.json. This setup works well, since applications pick up the secrets from the file instead of from k8s secrets/configmaps.
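
As a rough illustration of the file-based pattern described above, here is a minimal Python sketch (the post itself uses dotnet) that reads settings from a file mounted into the pod. The `ConnectionStrings` layout and the function names are assumptions made for the example, not details from the post.

```python
import json
from pathlib import Path


def load_settings(path: str = "/appsettings.json") -> dict:
    """Load application settings from a file mounted into the pod."""
    settings_file = Path(path)
    if not settings_file.is_file():
        # Fail loudly if the volume was not mounted where we expect it.
        raise FileNotFoundError(f"settings file not mounted at {path}")
    with settings_file.open(encoding="utf-8") as handle:
        return json.load(handle)


def connection_string(settings: dict, name: str) -> str:
    """Return a named connection string (assumed 'ConnectionStrings' section)."""
    try:
        return settings["ConnectionStrings"][name]
    except KeyError as exc:
        raise KeyError(f"connection string {name!r} not found") from exc
```

The application never queries the Kubernetes API for a Secret object here; it only reads a file, which is the property the security team asked for.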

Now we are trying to set up ArgoCD as part of our GitOps setup, which needs cluster and repo secrets defined as `K8s secrets` in order to communicate with the cluster/repo. I don't think it's possible to set up Argo without Kubernetes secrets.

Please help me with the following questions:

1. How do you handle secrets in your applications if they aren't supposed to be stored in k8s secrets?

2. Is HashiCorp Vault going to fix the two issues above? I still haven't looked into it, but I guess if Argo is looking for a k8s secret, then I don't think Vault could help either.

Thank you.

https://redd.it/ywasp4
@r_devops
Question about PRs and chasing teams?

I wonder if this is the right place to post my question. We have a new group of devs with various characters. I want to avoid clashing, but as Release Manager I've been told for many years to chase the team and validate that all Pull Requests are done. That said, I feel really stupid chasing a group of very smart people to ask whether they did a trivial thing. Any idea, and excuse my stupidity, how I can automate it so I don't have to chase unless it's absolutely necessary?

https://redd.it/ywec7k
@r_devops
But it works on MY machine! Debugging GitHub Workflows with VS Code. Bad Practice?

I've written a ton of unit, integration and e2e tests in my career and often ran into issues where tests were failing in CI due to different environments. Especially as the number of architectural layers increases, e.g. in E2E tests, the likelihood of differences due to diverging environments increases as well.

For GitHub workflows I found a nifty little trick that allows me to hijack GitHub's build machine and debug code directly on that machine, and even push fixes back to the repository. Check it out: https://www.stateful.com/blog/debugging-github-workflows-with-vs-code

Do you think this will go away with devs moving to ephemeral workspaces? Do you see this as bad practice?

https://redd.it/yw43np
@r_devops
Remote management tool for various Linux servers

I am searching for an alternative to what we use today. Our use case is 2k+ servers with different OSes spread around the world. We currently use NeoRouter, which is an access-based VPN.

We need a replacement as it supports only 1000 servers. The replacement must support CentOS 6 as the lowest version.

Our requirements are that we need to grant users access to ssh into the server, also vnc on some. We need to easily be able to add a new user to a group of servers.

We tested:

- Teleport. It did not work in C6
- rPort. It is a hassle to connect for our end users to the servers (need to create a tunnel first)
- connectwise automate - really not a good option for linux, lacks real ssh

We would love if the software could:
- ACL for users (a must)
- Webgui script executions
- Server statistics / cpu memory etc / with warnings maybe?
- quick real ssh terminal
- easy file transfer
- vnc / http proxy

Any tip is appreciated!

https://redd.it/ywchsa
@r_devops
If you need to write onboarding documentation for a junior DevOps or a non-DevOps person, what would you include?

Let's say you have a complex cloud infrastructure using Ansible, Kubernetes and Terraform. What would you include in onboarding documentation to help a junior DevOps or a non-DevOps person eventually take on a senior role?

https://redd.it/ywj1o9
@r_devops
Thoughts on Postgrad program in Devops?

Wondering if anyone can vouch for this "grad" online boot-camp/program
Devops Certification Bootcamp by Caltech CTME - California


It seems that they use Simplilearn for their instruction platform -- I saw some videos there and it seems low quality. Are you getting what you are paying for with this program? Will this add more knowledge to make you stronger/hirable as Devops candidate --- or just a waste of money?

https://redd.it/ywmpbz
@r_devops
Did your GitHub or Stack Overflow qualify you for a tech job when you had no degree or work history? Did the hiring manager/recruiter look at your profile to your knowledge? Assuming your GitHub demonstrated everything the job in question was requiring, did you get hired?

I’m asking because I’m conducting research on recruiting and the lack of fairness very qualified candidates receive.

If you weren’t hired, please briefly explain and give when this occurred. How many times? Did you get any sort of explanation? What company?

https://redd.it/ywirh1
@r_devops
Are there any companies that share their experience with Open Policy Agent in the recent years?

Hello,

I am searching for examples of companies implementing Open Policy Agent in their infrastructure, code and authorization mechanisms.

One good video about this is the one provided by Netflix - https://www.youtube.com/watch?v=R6tUNpRpdnY

But I just want to know if this is still one of the best Authorization solutions or if there is something better or are there any best practices in implementing it in code/infrastructure or any technical articles shared by infrastructure engineers or such people that are very motivated and understand the benefits of the technology/policy engine.

https://redd.it/ywocd2
@r_devops
Local Ansible Control node configuring AWS EC2: how to connect?

Hello,

I'm trying to use Ansible from my local machine (Ubuntu 22.04) to automate/provision services within AWS EC2 instance.

I have completed all the required steps in the control node/dev machine

- Installed the AWS CLI (to manage the access key and secret key of the admin user)
- Installed python3
- Installed pip3
- Installed boto3

Within my IDE (VS Code) I have also created the desired playbook (for now, just a simple update task).

Question: how do I connect to AWS and run my playbook against the instance?

Reading online I got quite confused between access & secret key and .pem certificate

Is there a simple step by step guide I could follow?

Thank you

https://redd.it/ywo92y
@r_devops
Moving to a Devops team within a company

Hey guys, I'm a solutions engineer. In my new job we handle a lot of tickets, which involve multiple CSPs. I love the work because I get to learn a lot, but I honestly want to move to a DevOps team eventually, when I see an opportunity. For now I want to talk to the engineers, understand their day to day, and basically give myself an advantage when I apply as an internal candidate. Any tips and advice would be appreciated. If you have any questions I could ask the engineers to get more insight, please send those.

https://redd.it/ywpwhc
@r_devops
I created a CLI to collaboratively run container-native workflows

https://github.com/jatalocks/opsilon


With opsilon, you can connect to a folder/git repository, pull YAML-based workflow files (similar in syntax to Github/Gitlab Actions), and run them using arguments as prompts or flags.


My use case as a DevOps team was giving my developers an abstraction to run automation, tests and scripts without having to send them instructions or documentation every time. Instead, they run the CLI and automatically receive all available workflows and what they can/cannot run. It being container native also means they only need Docker on their machine, no other dependencies.

I'd love some feedback!

https://redd.it/ywqofb
@r_devops
I need to wait for a condition in my k8s pods. Should I have a sleep loop in the entrypoint or the init containers or should I have the execution fail until the condition is met?

I have a set of containers. I need some database migrations to run before they can start (triggered as a k8s job). Now I'm trying to decide which approach is best to do this:

1. In the entrypoint file I can wait for the migrations to finish running. This will also need a readiness probe to be added so that the pod doesn't appear ready
2. I can have an init container which does the waiting
3. I can have the entrypoint exit causing the container to restart
4. I can have the init container fail causing it to restart until the migrations have finished
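
The waiting variants (options 1 and 2) both come down to a polling loop with a timeout. A minimal Python sketch follows; the function name, the defaults, and the way the migration status is checked are assumptions for illustration, not anything stated in the post.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `condition` until it returns True or `timeout_s` elapses.

    Returns True if the condition was met, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In an init container or entrypoint, the caller would pass a hypothetical `migrations_done()` check and exit non-zero when `wait_until` returns False. That combines waiting with an eventual restart, so a stuck migration does not block the pod forever.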

Is there a better solution to the above here? How about a clear choice between them?

I don't really like the idea of having things constantly restarting. It seems a bit hacky. The init container solutions need a lot of extra k8s code to set up those pods. The sleeping-in-entrypoint solution would require that we add a readiness probe to prevent the old pods from being removed too soon.

I'm not entirely sure what the best option here would be. Any thoughts?

https://redd.it/ywxo0e
@r_devops
DataDog ECS Fargate - Run ECS checks from agent service instead of within task definition?

Title sort of says it all. DataDog recommends including their agent as a second container in every task definition to monitor tasks running in Fargate. This means running x2 containers total for all our Fargate tasks.

Previously, on EC2, we ran the container as a daemon per instance, limiting our container resource overhead.

Has anyone come up with a creative workaround in Fargate to not have to run x2 containers for the DataDog agent-based monitoring approach? We can run the agent itself as a service but unclear how it will then be able to poll metrics about other containers (it may be impossible, hence their requirement for running per task definition).

https://redd.it/ywxc7g
@r_devops
DevSec for Scale Podcast Ep 6: Policy-as-Code

I really enjoyed this discussion from Akeyless on Policy-as-code and I figured that I could share it here. Podcast link.

Transcript for those who prefer written formats:

TRANSCRIPT

Jeremy Hess: Welcome to the DevSec for Scale Podcast, the show that makes security a first-class citizen for growing companies. My name is Jeremy Hess, Head of Developer Relations at Akeyless, the Secrets Management SaaS platform. This interview podcast brings security experts and practitioners together to offer practical and actionable ways for small and growing companies to implement security best practices using shift left principles, without interrupting developer lifecycles.

Welcome back everybody. My name is Jeremy Hess with Akeyless, and today my guest is Eran Bibi. He’s co-founder and Chief Product Officer at a fantastic startup called Firefly. They deal with many different things and we’re going to talk about that a little bit soon. But Eran, before we get into you and the company and all that, let’s talk a little bit about policy-as-code, which is what this episode is going to be about. So, policy-as-code, it’s a bit of a newer term in the industry and it has these remnants and these ideas of old school policy and things like that. So, can you give us a little understanding about what policy-as-code is all about?

Eran Bibi: Yes of course. So, policy-as-code is one of those trends of doing everything as code. And I think the main advantage of using this new methodology is the power for developers to create policies for themselves, and to use the community to extend the policy surface on their organization. So, it’s a really cool methodology I would say.

Jeremy Hess: Got it. Well, what was it like in terms of looking at policy as an old school term? What was the difference there between what it was and how it’s changed?

Eran Bibi: So, the idea is basically about enforcement and prevention, so you would like to have more control over the stuff that you are deploying on your Kubernetes cluster or cloud workloads. And policy-as-code basically gives you the opportunity to create those gatings in the CI, and to make sure that the configuration that you have in place meets the policy that you decided to put on your organization. It can be stuff that’s related to security. This is the most common use case, but it’s also about having alignment and best practices, and even making sure stuff is built for scale.

Jeremy Hess: Got it, really cool. Alright, so Eran, why don’t you give us a little bit about you, your background, and more about what Firefly is doing today?

Eran Bibi: So, my name is Eran Bibi. I was a DevOps engineer in my previous role. I’ve held a few DevOps positions in the past 10 years, and now I co-founded and lead the product at Firefly, a new startup that basically helps DevOps to get better control over their cloud.

Jeremy Hess: Got it. Alright. Well, can you give us a little bit more detail? What’s Firefly doing in that realm of the cloud and how is it helping customers?

Eran Bibi: Sure, so Firefly is basically a cloud asset management tool, and we scan the cloud and then give you visibility about all the stuff that you have in the cloud. One of the main metrics we provide is what portion of the cloud is managed by infrastructure-as-code, and what is not. And when I say not, this is workloads you’re creating manually using a ClickOps kind of usage or a CLI tool. But in any case, you don’t have any infrastructure-as-code. So, Firefly gives you that visibility, and then gives you the automation to help you to increase that coverage of infrastructure-as-code. So, think about it as the tool that helps you achieve your goals of meeting best practices and industry standards.

Jeremy Hess: Got it. Awesome. So, getting a bit more into policy-as-code. What was the impetus for these changes for policy-as-code? What’s novel about the idea of policy-as-code specifically?

Eran Bibi: So, we have this trend of shifting left, and giving more power to developers to do stuff early in the lifecycle of the software. So policy-as-code is basically the opposite of buying a very expensive kind of security system that gives new enforcement on the runtime. So, policy-as-code gives the power to the developers to use the community manifest and the community powers to put guardrails on the CI, and to make sure nothing is being provisioned into production. And I think Kubernetes has a lot to do with the trend. So, policy-as-code, even if I’m talking more specifically about frameworks that are popular in that domain like OPA, gives you a very simple syntax to create policies on the stuff that you can provision into the Kubernetes cluster.

Jeremy Hess: Yeah, exactly, and that’s where I wanted to take this, to understand a little bit more about OPA. So, first of all, why don’t you give us a little bit more detail about OPA? It’s a very large project and let’s hear a little bit more about that.

Eran Bibi: So, OPA is a framework built by a company called Styra, and I think it’s something like four or five years old, so it’s relatively new. It provides a very simple syntax for creating policies. The syntax is called Rego, and it basically checks against a JSON manifest. So, if you have a workload like in Kubernetes or Terraform for example, you can create, with very few lines of code in a human readable kind of syntax, a gate for stuff.

So, the output of the OPA is basically allow or deny. So, you can create those rules and, let me give you some very concrete examples. You will not allow any workload without a liveness probe to be provisioned into your Kubernetes cluster. So, you basically can create the Rego syntax that works against the manifest of a Kubernetes deployment, and just makes sure you have that block of liveness in your manifest. And if you don’t have it, it will basically provide a deny kind of output and then you can gate it early in the stage of deployment, and make sure you don’t have any deployment running without a liveness probe.
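
To make the liveness-probe example concrete, here is a small Python sketch of the allow/deny decision such a policy encodes. It is not Rego and not how OPA itself evaluates policies. The function names and the result shape are assumptions for illustration. The manifest layout follows the standard Kubernetes Deployment structure.

```python
def containers_missing_liveness(manifest: dict) -> list[str]:
    """Return the names of containers in a Deployment manifest without a livenessProbe."""
    containers = (
        manifest.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    return [c.get("name", "<unnamed>") for c in containers if "livenessProbe" not in c]


def decide(manifest: dict) -> dict:
    """Produce an allow/deny decision plus human-readable violations."""
    missing = containers_missing_liveness(manifest)
    return {
        "allow": not missing,
        "violations": [f"container {name!r} has no livenessProbe" for name in missing],
    }
```

A CI step could run `decide` over each rendered manifest and fail the job when `allow` is false, which is the early gating the speaker describes.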

Jeremy Hess: Got it. Okay, so how would a startup or an early-stage company be able to implement a policy-as-code based on the OPA framework?

Eran Bibi: So, it’s very easy, because policy-as-code is a code, and it can be shared in GitHub. You can find tons of projects online that already have built-in kinds of policies that you can use even without writing a line of code. Just cloning those repositories and implementing them on your cluster. So, in Kubernetes’ case, there is a project called Gatekeeper that checks your policy against the running workload and makes sure you don’t have any violation on your runtime. There is also some tooling that you can put even before that in your CI that just scans your manifest. Like a Helm chart or a YAML manifest, and makes sure you don’t have anything that is misconfigured. But I think the real power of policy-as-code is the community.

The fact that you, without writing any line of code, have tons of policies out there and even like a compliance package. You can have PCI compliance, EPA compliance, best practices, well architected, all of that stuff already available for any developers to put on their CI to make sure nothing that is misconfigured will be provisioned to the cloud.

Jeremy Hess: Right, well that’s the power of community. We always love our community members or community friends, and developers helping developers is always a fantastic melding together of minds, so it’s really great. Let’s ask a little more about policy-as-code specifically in terms of the challenges, things that are a little bit more difficult, let’s say. What are the challenges that you see or what’s the biggest challenge, let’s say, of implementing policy-as-code at an early-stage company?

Eran Bibi: So, putting gates, whether it is like denying stuff from being provisioned to the cloud, can be something that can slow you down. So, if an early-stage startup is all about delivering fast value to the customers without having everything perfect, if the DevOps engineer or the one that is in charge of putting those policies is too strict, then it might impact the delivery and the velocity of the development too. So, I think you need to use that wisely and make sure that you maintain velocity, but use policies to prevent something that can be a disaster to the company. So, with great power comes great responsibility.

Jeremy Hess: Yeah.

Eran Bibi: It’s great but you need to make sure again in early startups, to move fast, and then you have the time to fix and align other non-perfect kinds of workloads that you have on the cloud.

Jeremy Hess: Got it. Well, you come from a bit of a larger company that was a startup not too long ago, Aqua. So, you also have that security background. What would you say are some things that you saw at a larger company you were able to implement, but now at a startup it’s a little bit more difficult to implement those specific ideas and tools?

Eran Bibi: So, my experience with Aqua was very similar to what we have right now in Firefly. Because I joined Aqua Security at a very early stage when we were like 20 employees. And again, it was the same dilemma because I was the one in charge of the CI/CD pipeline. I put the tooling in place to make sure the right gating was in place. So, I think if I’m looking specifically at security, the policy was very strict regarding high or critical vulnerabilities, that the CI would basically stop the build if there were such findings, but for less than that we were only about alerting, and not enforcing and stopping the build. Because again, startups need to make sure there is a high velocity of workloads, and developers don’t have to deal all the time with why the CI is stopping the build. Again, as I mentioned, it’s really in terms of finding the right balance.
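
The gating rule described here (stop the build on high or critical findings, only alert on anything lower) can be sketched in a few lines of Python. The finding format, the severity names, and the function name are assumptions for illustration. Real scanners each have their own output schema.

```python
# Severities that stop the build; everything else only produces a warning.
BLOCKING = {"HIGH", "CRITICAL"}


def evaluate(findings: list[dict]) -> tuple[int, list[str], list[str]]:
    """Split scanner findings into build-blocking and alert-only lists.

    Returns (exit_code, blocking, warnings); exit_code is 1 when any
    blocking finding exists, so a CI job can pass it to sys.exit().
    """
    blocking: list[str] = []
    warnings: list[str] = []
    for finding in findings:
        severity = str(finding.get("severity", "")).upper()
        line = f"{severity}: {finding.get('id', 'unknown')}"
        if severity in BLOCKING:
            blocking.append(line)
        else:
            warnings.append(line)
    return (1 if blocking else 0), blocking, warnings
```

Keeping the blocking set small is what preserves velocity: developers are interrupted only for the findings the team has agreed are unacceptable.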

Jeremy Hess: Absolutely. So, another question I like to ask all my guests when we’re talking about, especially security is, at Firefly do you use your own product to check your own product?

Eran Bibi: Yes of course, we call it drinking our own champagne.

Jeremy Hess: Yeah. Instead of dog food, right? Instead of dog food.

Eran Bibi: Yes, dog food, we found a prettier term for that. So, the dog food protocol in Firefly is about making sure all of our AWS accounts and Kubernetes clusters are integrated with Firefly. And we use that to make sure the cloud is aligned with best practices and everything is codified, meaning described as code. It’s a great kind of experience. We have our own DevOps engineer who uses Firefly like any other customer. And if he finds a defect he opens a defect for the development and it’s a great methodology.

Jeremy Hess: Alright, fantastic. One more question I’d like to also ask is just for our listeners, because of course we’re trying to talk about shift left principles and how to make sure that developers are still able to continue doing their work, with as little stopping and changing as possible. But also make sure that they’re implementing security best practices. What are, let’s say one or two tips you might give generally about how developers can implement security? What are some, in your head, in your eyes, basics that a developer could implement without worrying too much about overhead?

Eran Bibi: So, if shift left was all about putting stuff in the CI, I think right now the trend is even more left than that. So, every plugin that a developer can put in the IDE, whether it’s VS Code or other kind of tooling that gives them the visibility about the status of this code in all of those perspectives. Whether it’s a configuration or even a security on a specific code block. There is great tooling out there that integrates with the IDE. So, this is stuff that I recommend. And I think it’s become a standard kind of approach of having those plugins in the IDE.

Jeremy Hess: Great, that’s a great one. Yeah, I think that developers definitely have that idea. And on top of that, I wanted to ask a little extra here which is not specifically related to this necessarily. I’m going to be doing a meetup with your team on the newest open-source project that was put out into the world, ValidIaC. So, maybe as a little preview we can give our listeners, what’s ValidIaC all about?

Eran Bibi: Of course, so we basically created that SaaS offering for developers to have access directly from their browsers to make sure the infrastructure-as-code is aligned with a few verticals. So, we have a security scanner, a linter, and also a cost projection. We use a few of the most popular open-source tools and just combine them into one very nice SaaS-based UI. And of course, it’s open-source, so I encourage everybody to support us with stars and even contribute if you think additional tools can be added to that portal. But I will keep the demo to the meetup.

Jeremy Hess: Yeah, for later absolutely. Fantastic. Alright, so that was a fantastic interview. Eran, I really appreciate your time. Thank you so much. Good luck with Firefly, and hopefully we get to have you on again when Firefly hits their next rounds of funding and beyond. We wish you all the best. So, have a great day and thanks so much.

Eran Bibi: Thank you Jeremy. Thank you for having me.

https://redd.it/yx0ivu
@r_devops
What is a good way to document CI/CD pipelines?

I’m building some pipelines for various apps, this includes CI and CD. I want to start by illustrating to the team the different tools and steps within these pipelines. Are there any free tools for generating nice and illustrative docs that are DevOps orientated?

https://redd.it/yx4jv6
@r_devops
Feeling not so great about being a DevOps\Cloud Engineer

Hello fellow DevOps friends,

Lately I have been feeling very down and depressed about how I'm functioning in DevOps, cloud, infrastructure, etc.

I got into DevOps about 6(?) years ago when I moved to it from QE, so I never had a strong programming background. I do love learning about different tools and technologies and finding effective ways to implement them, but I really feel like I'm lagging behind.

I've been working in Azure consistently for 7 months and still barely understand it even at a fundamental level. Aside from this, I've been experiencing major brain fog, not able to focus at work, etc. Not sure if that's stress of learning so many new tools, or how I'm feeling (or maybe it's all the Excel sheets... 🤢), but it's impacting how I'm performing.

I just wanna know if someone else in the DevOps world has experienced this, and have you/how did you overcome it? I'm feeling so scrambled 😫

https://redd.it/yx2pkx
@r_devops
DevOps infrastructure from scratch

I'm a long time Network/SysAdmin who wants to move into DevOps and SRE type roles.

I want to set up an environment from scratch implementing best practices, and need a little guidance with the foundational building blocks and where to start. I want to do this on the cheap using FOSS and low-cost services (but only when necessary). That being said, I don't want to close the door on paid services, especially Azure, as our current application stack is Windows based (and could migrate to Azure in the future, but hopefully not).

* I have a somewhat beefy server (dual Xeon, 192GB RAM, redundant storage) on premises that is a blank slate (but I'd like to use Proxmox as it allows hosting any OS). We have gigabit internet.
* I also have a free tier Oracle Cloud (Ampere aka ARM) account that seems pretty decent.
* Finally, I could add a cheap VPS (think LowEndBox) if there is any benefit. I also have more hardware on premises I can use.

I'd like to start my build with something like Terraform, but Ansible, Puppet, etc. are options. This kind of feels like pulling myself up by my own bootstraps. I'm trying to avoid installing directly on my workstation. I'm unclear on where to start.

Eventually I'd progress into Docker (or Podman?), Kubernetes, host my own code repo, monitoring, etc. I guess my confusion is also about the order of operations so that I'm not having to undo/redo things.

Any help or advice is appreciated. Many thanks.

https://redd.it/yx2vci
@r_devops