Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
What about cloud cost optimization?

I'm curious: as a DevOps engineer, is optimizing cloud cost part of your role?


I've worked with several big organizations in the past where we were spending tons of money on the cloud, in some cases as much as 50 million dollars a year. With a quick scan of our servers I could literally reduce the monthly invoice by $250k simply by eliminating idle machines.
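
A minimal sketch of what such a quick scan for idle machines could look like, assuming per-instance average CPU utilization and monthly cost have already been exported from the cloud provider's monitoring and billing data. The `Instance` record, the 5% CPU threshold, and the sample numbers are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    avg_cpu_percent: float   # average CPU over the lookback window
    monthly_cost_usd: float


def find_idle(instances, cpu_threshold=5.0):
    """Return instances whose average CPU is below the threshold."""
    return [i for i in instances if i.avg_cpu_percent < cpu_threshold]


def monthly_savings(idle_instances):
    """Sum the monthly cost of the instances flagged as idle."""
    return sum(i.monthly_cost_usd for i in idle_instances)


# Hypothetical sample data, for illustration only.
fleet = [
    Instance("web-1", 42.0, 310.0),
    Instance("batch-old", 0.7, 1200.0),
    Instance("test-env-7", 1.9, 450.0),
]

idle = find_idle(fleet)
print([i.name for i in idle], monthly_savings(idle))
```

Low CPU alone does not prove a machine is unused, so a list like this is probably best treated as a starting point for a conversation with the owning team.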

I used to come to the DevOps team with these lists and numbers and was usually shooed away and dismissed.

So I'm seriously curious: do you, as a DevOps engineer, care about cloud cost? Do you work on optimizing it? Who in your organization cares?

I've also written a short article about cloud cost optimization. Might be relevant to the topic:

https://zukeep.com/what-is-cloud-cost-optimization-3-actions-you-can-take-today-to-reduce-your-cloud-cost/

https://redd.it/zvnfix
@r_devops
Has anyone here changed careers after 10 years, from test automation engineer (Selenium + Java) to DevOps engineer?

How was the learning curve, and how tough was it?

https://redd.it/zssoq5
@r_devops
Moving to GitLab/GitHub from ADO

I'm trying to write a proposal for my org to consider switching to a better platform such as GitLab or GitHub. ADO lacks even the most basic features, and I know for a fact that both GitLab and GitHub have a shit ton of neat features. Considering that MSFT is only focusing on GitHub now, I thought, why not?

If anyone has gone down this road, please share your input.

https://redd.it/zvq7d1
@r_devops
What is the key use of an artifact repository (Nexus or JFrog Artifactory)?

This is a question at the 3-years-of-experience level about Artifactory.

https://redd.it/zsrw89
@r_devops
How to mark an Airflow DAG run as failed if a function inside the DAG returns false?

GOAL: I am building a DAG in which a function returns a value at the end. If it returns false, I want that DAG run to be marked as failed; if it returns true, the run can finish as successful.
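
One common way to get this behavior, sketched under assumptions: Airflow marks a task as failed when its callable raises an exception, and by default a DAG run is marked failed when one of its final (leaf) tasks fails. The function names `final_check` and `fail_run_if_false` are hypothetical; the fallback class only exists so the sketch runs without Airflow installed.

```python
try:
    # Real Airflow exception class; available when Airflow is installed.
    from airflow.exceptions import AirflowException
except ImportError:
    # Fallback so the sketch also runs without Airflow installed.
    class AirflowException(Exception):
        pass


def final_check() -> bool:
    """Hypothetical stand-in for the function that returns True or False."""
    return False


def fail_run_if_false(check=final_check) -> None:
    """Task callable: raise when the check is False so the task fails."""
    if not check():
        raise AirflowException("final check returned False")
```

A callable like `fail_run_if_false` could be passed as the `python_callable` of a `PythonOperator` placed as the last task in the DAG.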

https://redd.it/zsqyci
@r_devops
My Terraform Bootcamp Udemy Course is Free!

Hey everyone, hope you all had a good holiday.

I'm an avid user of Terraform, and over the past 3 years, working at 4 different companies, my career has largely revolved around managing and scaling infra with Terraform.

I've created my first Udemy course, with 10+ hours of content, which focuses on taking people from beginner Terraform user to intermediate.

I absolutely love what I do and teaching others is just pretty fun.

The course is completely free. There are currently 95 coupons left until the promotion runs out (just how Udemy works).

https://www.udemy.com/course/terraform-iac-bootcamp/?couponCode=8F3602ECE527CA598D99

I'm really hoping this course helps someone understand Terraform and actually use Terraform at their workplace.

If this post goes against the subreddit rules, please let me know and I will take this down asap.

Cheers!


EDIT:

Looks like the coupon has run out. Unfortunately I've used up all my free promotion coupons and am not able to send out any more :/.

If this is your first time using Udemy:

NEVER BUY A COURSE AT FULL PRICE! Udemy frequently discounts courses to $9.99. Please wait for people to review my course to see if it's up to standard, then wait for a discount before thinking about purchasing it.

https://redd.it/zvyzqh
@r_devops
Is GKE Autopilot suitable for running CI pipelines?

My company uses CircleCI right now and I'm looking to bring costs and build times down. Circle has a new "container runner" for self hosting that works by spinning up individual pods in a k8s cluster -- like, one pod per CI job, so a pipeline might execute over a whole bunch of potentially heterogeneous pods.

I'm pretty new to k8s, so I was considering a GKE Autopilot cluster, which seemed easier to manage. Looking into it more closely, though, it seems like the scaling characteristics might be mismatched: as far as I can tell, Autopilot only spins up resources when there is already demand for them in scheduled work, with no concept of time to live. It seems like the way to scale smoothly with Autopilot is to use horizontal pod autoscaling with resource utilization triggers, so that more pods get allocated before the cluster is overwhelmed.

CI pipelines are obviously burst-y work, and it's worth overallocating a bit to prevent startup delays. So my intuition is that I'm better off managing the node pools myself for this application. Is that right, or is there an elegant way to do this with Autopilot I'm not seeing?

https://redd.it/zvzm63
@r_devops
Do you use a Helm chart repository?

Do you push the helm chart first into a repository before deploying it or do you install the helm chart directly from your git repository?

Would you please give me the reasoning why you prefer one over the other?

https://redd.it/zw19w3
@r_devops
Dumb question: can I boost my phone's ability to detect Wi-Fi?

Hey guys, it might be a dumb question, but is it possible to boost my phone's ability to detect a Wi-Fi network by using an app?
I'm trying to convince my friend that it is not possible.

https://redd.it/zshodg
@r_devops
Realistic data for load tests

Are there any load testing platforms/libraries that can automatically generate unique data (e.g. query params, basic JSON body data) for each API request in a larger load test?

I do have some existing logged request data. Are there any platforms that could sample from an existing dataset to populate a load test?
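
To make the idea concrete, here is a minimal sketch of sampling from logged requests and adding a unique value to each generated request. It uses only the Python standard library; the record layout (`path`, `params`, `body`) and the added `request_id` and `quantity` fields are hypothetical and would need to match the real API.

```python
import json
import random
import uuid


def make_request(logged_requests, rng):
    """Build one request by sampling a logged request and making it unique."""
    base = rng.choice(logged_requests)
    unique_id = str(uuid.UUID(int=rng.getrandbits(128)))
    return {
        "path": base["path"],
        # Copy the logged params and add a unique value per request.
        "params": {**base["params"], "request_id": unique_id},
        # Copy the logged body and vary one field.
        "body": json.dumps({**base["body"], "quantity": rng.randint(1, 10)}),
    }


# Hypothetical logged requests, for illustration only.
logged = [
    {"path": "/search", "params": {"q": "shoes"}, "body": {}},
    {"path": "/orders", "params": {}, "body": {"sku": "A-100"}},
]

rng = random.Random(42)  # seeded so a test run is reproducible
requests_to_send = [make_request(logged, rng) for _ in range(5)]
```

A generator like this could be called from whatever load testing tool drives the requests, as long as that tool allows custom code per request.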

https://redd.it/zrr3et
@r_devops
Can a sysadmin install an app through my connection to the company's Wi-Fi?

I noticed I have a new folder named 'linux' that contains two folders, 'docker-desktop' and 'docker-desktop-data'.

I think the sysadmin could use it to block specific services of mine and to track my interactions or record my screen remotely (which sounds like a violation of my privacy). I'm curious whether any app can actually do that.

If it is true about the app, can somebody please recommend a link on how to uninstall it?

Thanks for reading.

https://redd.it/zshua4
@r_devops
How to run MinIO with docker-compose + nginx reverse proxy?

I have a problem with MinIO: it does not come up on the selected domain, and I get a 502 error.
- my docker-compose.yml for the nginx reverse proxy + LE
services:
  nginx:
    container_name: nginx
    image: nginxproxy/nginx-proxy
    restart: unless-stopped
    ports:
      - 80:80
      - 443:443
    volumes:
      - /var/run/docker.sock:/tmp/docker.sock:ro
      - /var/docker/nginx/html:/usr/share/nginx/html
      - /var/docker/nginx/certs:/etc/nginx/certs
      - /var/docker/nginx/vhost:/etc/nginx/vhost.d
    logging:
      options:
        max-size: "10m"
        max-file: "3"

  letsencrypt-companion:
    container_name: nginx-le
    image: jrcs/letsencrypt-nginx-proxy-companion
    restart: unless-stopped
    volumes_from:
      - nginx
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/docker/nginx/acme:/etc/acme.sh
    environment:
      DEFAULT_EMAIL: [email protected]

- docker-compose.yml for minio
version: '2'

services:
  minio:
    container_name: minio.domain.com
    command: server /data --console-address ":9001"
    environment:
      - MINIO_ROOT_USER=admin
      - MINIO_ROOT_PASSWORD=supersecret
      - MINIO_BROWSER_REDIRECT_URL=https://minio.domain.com
      - MINIO_DOMAIN=minio.domain.com
    image: quay.io/minio/minio:latest
    volumes:
      - minio:/data
    restart: unless-stopped
    expose:
      - "9000"
      - "9001"
    environment:
      VIRTUAL_HOST: minio.domain.com
      LETSENCRYPT_HOST: minio.domain.com
    networks:
      - proxy

networks:
  proxy:
    external:
      name: nginx_default

volumes:
  minio:

- logs from docker logs for minio container
Warning: Default parity set to 0. This can lead to data loss.
WARNING: Detected default credentials 'minioadmin:minioadmin', we recommend that you change these values with 'MINIO_ROOT_USER' and 'MINIO_ROOT_PASSWORD' environment variables
MinIO Object Storage Server
Copyright: 2015-2022 MinIO, Inc.
License: GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Version: RELEASE.2022-12-12T19-27-27Z (go1.19.4 linux/amd64)

Status: 1 Online, 0 Offline.
API: https://192.168.0.7:9000 https://127.0.0.1:9000
Console: https://192.168.0.7:9001 https://127.0.0.1:9001

Documentation: https://min.io/docs/minio/linux/index.html


When I add this to the docker-compose file for MinIO:
    ports:
      - '9000:9000'
      - '9001:9001'


MinIO works, but it is reachable on every domain on my server.
How can I make MinIO available only on minio.domain.com?

https://redd.it/zrm101
@r_devops
Do you enjoy being in DevOps?

I would especially be interested in hearing from people who came from a general Systems / Network Administration background.

Rather than make the same post about how to switch from one field to DevOps, I am interested in how you all feel about being in the field. I understand YMMV depending on the role and company.

When did you feel like your programming knowledge was sufficient to make the leap?

https://redd.it/zr8sgs
@r_devops
How to automate: restart service on a remote machine

Hello, hopefully a devops question.

We have one old service that fails from time to time. It is not critical enough to fix and it does not fail often, so a `systemctl restart` is sufficient. It runs on a Linux VM.

Current implementation: logs go into Elasticsearch, and ElastAlert monitors for the failure event. Once such an event is detected, ElastAlert can execute an API call to Ansible AWX, which then runs a playbook to restart the failing service.

I'm considering dropping AWX for this and am looking for another approach.

Here are my options:

1. Execute the command via SSH. Secure and simple enough, but I need to keep an SSH key on the ElastAlert container, either for root or for an account with limited access to systemd.
2. Create a small Ansible playbook on the ElastAlert container and run it against the failing server.
3. Use gRPC; this tutorial makes it look fairly simple. But is it the right tool for this case?
4. Run a Flask app on the target server to listen for API events.

In all cases above, I need to add stuff to the Docker container image or load files from the host via volume mounts. Options 3 and 4 are safer in a way, as I can program them to run a limited set of commands, in my case restarting just one service. The first two options are less secure because restarting a systemd service needs root access, but I might be able to limit that in the `sudoers` config.
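
The "limited set of commands" idea behind options 3 and 4 can be sketched as an allowlist check that sits in front of the actual restart, whichever transport (gRPC, Flask, or something else) receives the event. The unit name `legacy-app.service` and the function names are hypothetical, and the use of `sudo` assumes a matching `sudoers` rule for exactly this command.

```python
import subprocess

# Hypothetical allowlist: the only units this handler may restart.
ALLOWED_SERVICES = {"legacy-app.service"}


def build_restart_command(service: str) -> list[str]:
    """Return the systemctl command for an allowlisted service, else raise."""
    if service not in ALLOWED_SERVICES:
        raise PermissionError(f"restart of {service!r} is not allowed")
    return ["sudo", "systemctl", "restart", service]


def restart(service: str, run=subprocess.run):
    """Restart an allowlisted service; `run` is injectable for testing."""
    return run(build_restart_command(service), check=True)
```

Because the command is passed as a list and the service name is checked against a fixed set, a caller cannot inject extra shell arguments or restart arbitrary units.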

ElastAlert supports many actions when an event is detected; two of them are running a command and making an HTTP POST. Since ElastAlert supports HTTP/HTTP2 POST, using that option to make an API call would ideally be the easiest.

Is there another, possibly standard way which I might not know? I might want to expand this to more than just one service, and use it to make a sort of self-healing self hosted infrastructure.

https://redd.it/zwevvp
@r_devops
React JS application in an S3 bucket

Is it possible to host a React JS application in an S3 bucket?

I want to deploy a React Js web application in an S3 bucket that will call an AWS Lambda function. Is it feasible?

My doubt is, since React JS is a dynamic scripting language, can this be hosted in an S3 bucket? Can React JS call a Lambda function endpoint?
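
For context, a built React app consists of static HTML, JavaScript, and CSS files, so S3 only has to serve the files; the JavaScript runs in the visitor's browser and can call an HTTP endpoint placed in front of a Lambda function (for example API Gateway or a Lambda function URL). Below is a minimal sketch of the Lambda side, assuming a proxy-style integration that expects a `statusCode`/`headers`/`body` response. The origin URL and the message are hypothetical.

```python
import json

# Hypothetical origin of the S3-hosted site; replace with the real one.
ALLOWED_ORIGIN = "https://my-react-app.example.com"


def handler(event, context):
    """Minimal Lambda handler returning JSON with a CORS header."""
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json",
            # Lets the browser accept the response on the S3-hosted page.
            "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
        },
        "body": json.dumps({"message": "hello from Lambda"}),
    }
```

The CORS header matters because the page is served from one origin (the S3 site) while the request goes to another (the Lambda endpoint).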

https://redd.it/zwe0ad
@r_devops
Helm-Dashboard now enables cluster installation

A few months ago, we at Komodor released a new open-source project called Helm-Dashboard, which got a lot of positive feedback and attention from the community. I’m happy to share that now Helm-Dashboard can be installed both locally AND on a cluster.

It’s basically a GUI for Helm, designed to solve some of the more acute pain points of Helm users by visualizing changes in Helm charts. The goal is to help beginner Helm users get started and to help more experienced users speed up operations. The new cluster installation capability enables users to collaborate better and share the same view of their charts.

Check it out on GitHub: https://github.com/komodorio/helm-dashboard

Feel free to join our Slack Kommunity: https://join.slack.com/t/komodorkommunity/shared_invite/zt-1dm3cnkue-ov1Yh~_95teA35QNx5yuMg

Give it a ⭐️ if you liked it :)

https://redd.it/zwg7wy
@r_devops
User lifecycle management and IaC

I wanted to know how people are managing user lifecycles in a way that is compatible with IaC. For example, we use Okta for provisioning and managing users but Terraform for basically everything else. Keeping our Terraform up to date with user churn has been a challenge for tools like PagerDuty, where the list of users is important but constantly changing.

https://redd.it/zwg7lt
@r_devops
Argo CD app not identifying resources

Hi,

I am trying to use the sample app from the documentation, and I cannot figure out why it's not identifying the underlying resources.

I tried "refresh" and "hard refresh" and checked the logs, but all seems OK. I even reinstalled Argo.

Any pointers would be appreciated.

https://redd.it/zwdp9b
@r_devops
Does your team do sprints this week when half the team is out for the holidays?

I *jokingly* suggested we just have a few days of learning time this week instead of starting another sprint, but that was shot down.

Oh well. March forward! AGILITY!

https://redd.it/zwi9iu
@r_devops