Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
SRE vs. Platform Engineering

Over the past decade, engineering and technology organizations have converged on a common set of best practices for building and deploying cloud-native applications. These best practices include continuous delivery, containerization, and building observable systems.

At the same time, cloud-native organizations have radically changed how they’re organized, moving from large departments (development, QA, operations, release) to smaller, independent development teams. These application development teams are supported by two new functions: site reliability engineering and platform engineering. SRE and platform engineering are the spiritual successors of traditional operations teams, and bring the discipline of software engineering to different aspects of operations.

https://blog.getambassador.io/the-rise-of-cloud-native-engineering-organizations-1a244581bda5

https://redd.it/lnhwkb
@r_devops
Should a DevOps Toolchain contain Azure Key Vault?

Basically what the title says. In your opinion, should a tool like Azure Key Vault be in a DevOps Toolchain?

https://redd.it/lnh5er
@r_devops
Infrastructure for hosting a web scraper that scrapes huge quantities of data? (Interview Q)

Hey guys, I’ve been given an interview question to complete over the next few days which I’m a little stuck with. Basically - I’ve been asked to design the infrastructure for hosting an internal web scraper (the code for the scraper has already been written). Have to create a diagram and name the technologies (Docker/AWS/HAProxy etc) and explain my decisions. It’s a little out of my depth and I’m wondering if anyone has any resources or tips where I could learn more about infrastructure design? I know I’ll need lots of databases and probably a load balancer to divide up work between worker nodes/servers and maybe a load balancer between those and the databases? I just want to learn a bit more about the specifics so that I can design something that makes sense! I know it’s a very open ended question and there is an infinite amount to learn - but any examples or central ideologies would be great! Thanks in advance :)
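One way to picture the "divide up work between worker nodes" part is a shared task queue that workers pull from, instead of a load balancer pushing work to them. Below is a minimal single-process sketch of that idea, under loud assumptions: `run_workers` and `fake_fetch` are made-up names, threads stand in for worker containers or servers, and the in-memory queue stands in for whatever external queue the real design would name. It is not the scraper from the interview task.

```python
import queue
import threading


def run_workers(urls, fetch, num_workers=4):
    """Spread URLs across worker threads through one shared queue.

    Returns a dict mapping each URL to its fetch result (or to the
    exception that the fetch raised).
    """
    tasks = queue.Queue()
    for url in urls:
        tasks.put(url)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                # Each worker pulls the next task itself, so a slow URL
                # only holds up the worker that picked it.
                url = tasks.get_nowait()
            except queue.Empty:
                return
            try:
                outcome = fetch(url)
            except Exception as exc:  # keep going if one page fails
                outcome = exc
            with lock:
                results[url] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def fake_fetch(url):
    """Hypothetical stand-in for the already-written scraper code."""
    return len(url)
```

The same shape should carry over to separate machines: swap the in-memory queue for a network queue and the threads for worker instances, and the number of workers becomes the knob for scaling.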

https://redd.it/lnb3eh
@r_devops
training question (employer paid vs PTO)

I'm a senior developer for a major national consulting company, billing 100% for a project (years & years). My employer offers FT employees seasonal certification bootcamps - DevSecOps, AWS, Azure, etc, which are usually several days and then the exam. Senior mentors from my company do the training in-house. The company pays for the test but they don't cover all the time for attending the bootcamp -- they do half and you have to take vacation time for half. (Since these certs are not-required it's legal to require PTO for training.)

I'm just curious how common this is. In my past jobs w/ non-consultant IT departments, the company covered all time & costs of training -- as a perk for the employee, and I imagine also because it benefits them to have better-qualified staff. This seems kinda cheap to me, considering the training certs are relevant to the work & tools we use on the project.

What are your experiences? Ever have to use PTO for your skills training?

https://redd.it/ln2om4
@r_devops
What are the disadvantages of going cloud-native?

So, I think my previous post about the benefits of going cloud-native (https://www.reddit.com/r/devops/comments/lkbx9e/what_cloud_native_is_really_good_for) was entertaining and certainly useful. My main take-away is that with cloud-native you design your software to make the best use of a public cloud infrastructure - with all the benefits that [public cloud infra] entails, such as scaling up and down, deploying when and where you need it, etc. All the other benefits mentioned (e.g. speed) can be realised without "cloud-native" in my view.

But surely cloud-native has its drawbacks too. Off the top of my head, I'd say performance overhead and dependence on a rather limited number of public cloud service providers.

Other views?

https://redd.it/lky489
@r_devops
Best way to learn Linux?

I've been looking at improving my core skills like networking and Linux. I was thinking about using LA playgrounds, installing Linux as dual boot on my laptop, renting a VPS etc...

Has anyone got any good recommendations?

https://redd.it/lob2ck
@r_devops
Watch Kubernetes Experts Fix Broken Kubernetes Clusters Live

I’ve launched a new series of episodes called Klustered. These episodes feature myself and a guest from the Kubernetes community attempting to fix some Kubernetes clusters. These clusters are also broken by community members 😀

We know nothing upfront. The first episode was very fun. Episodes will be live on YouTube every Thursday. Next week we have clusters broken by Jason DeTiberus and Justin Garrison.

I hope you enjoy

https://youtu.be/teB22ZuV_z8

https://redd.it/lo7a8v
@r_devops
Monitoring 5,000 nodes

Hello.

I’m curious what solutions a community like this employs for the following scenario:

We’re looking to put about 5,000 Linux boxes across America inside of stores. They serve an important purpose and will be more or less 5,000 of the same image. This is a big increase in scale for us as our existing Linux server footprint is roughly 1,500.

We currently use Zabbix but I find it lacks in scalability and supportability.

The support will require cross collaboration between Linux OS support, database support, and application developers, so I am looking for a solution where these disparate teams can write their own monitoring and alerting solutions for their use-cases relatively easily (definitely a challenge to do with Zabbix).

I’ve been thinking about Sensu but I am interested in hearing other options/experiences here.

https://redd.it/lo9l76
@r_devops
How do you trace root cause analysis on your microservices?

Hey guys, I'm trying to gain some inspiration to rethink how I can make this process less horrible in my own life.

It seems to me that everyone uses the same method for root cause analysis (on dev/staging/prod envs): plugging everything into some ELK stack, and using Kiali or another tool for tailing the logs of a specific MS.

The process is usually something like: get some first-order cause like a request failing -> find where it started -> go to the log-tailing tool (Kiali etc.) and find the exception -> get the trace id -> search in Kibana with the trace id -> move through a massive number of lines -> find the next stack trace on another MS -> repeat until you find the root cause.

This is of course when you even have a stack trace that gives you more info. What if it is some authorization issue between services, or a problem in some other DevOps tool in the stack (Istio etc.)?

Tools like Datadog/Splunk show the request trace and status, but in most cases this doesn't shorten the long root cause analysis.
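The manual loop described above (take a trace id, collect every line that carries it, then walk back to the earliest failure) can be sketched in a few lines. This is only an illustration under assumptions: the record shape (`ts`, `service`, `trace_id`, `level`, `message`) is hypothetical and not what ELK or Kiali actually emit, `ts` is assumed to be comparable across services, and "earliest ERROR in the trace" is a heuristic for a likely root cause, not a guarantee.

```python
from collections import defaultdict


def group_by_trace(records):
    """Group log records by trace id, each group sorted by timestamp."""
    traces = defaultdict(list)
    for rec in records:
        traces[rec["trace_id"]].append(rec)
    for events in traces.values():
        events.sort(key=lambda rec: rec["ts"])
    return dict(traces)


def earliest_error(events):
    """Return the first ERROR record in a time-sorted trace, or None.

    The earliest error is often closer to the root cause than the
    failure the caller finally saw, which is why it is surfaced first.
    """
    for rec in events:
        if rec["level"] == "ERROR":
            return rec
    return None
```

Calling `earliest_error(group_by_trace(records)[trace_id])` would jump straight to the first failing service for a trace instead of paging through every line. It does not help with the cases that never log an error, such as a silent authorization rejection.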

Hope you guys have something better in practice =)


Thanks in advance

https://redd.it/loawxb
@r_devops
Best practices for domain configuration

I'm setting up my own CI/CD pipeline on Docker with GitLab-CE and NGINX as a reverse proxy.

I'm trying to set it up so that it is fairly portable and I can bring it up quickly on different VPSes with just Docker Compose.

Right now I'm on my laptop, and in my hosts file I just mapped localhost to a fake local domain, local.lab.

What's the proper, secure way of doing this, and how is it done in companies?

When I'm preparing a setup like that, should I even rely on localhost, or should I use an actual domain with SSL certificates? If you use a real domain name, how do you restrict access to it and make it secure?

https://redd.it/loaq1r
@r_devops
Question regarding database for responsive analytics

On our current project we have a webapp with an analytics module. Users select some filters, and based on those filters a table or graph is shown. We want the module to be responsive, so that when a user selects a filter the data comes back in a matter of seconds.

The user filters query a large table (~1,000,000,000 rows and 20 columns). All columns except two are filterable. Currently we are using Redshift, but it's way too slow. Also, the daily import into the table takes around 15 hours (which is also too slow).

We are deciding between ClickHouse, Vertica and BigQuery to replace Redshift.

Has anyone had a similar use case, and which database solution would you recommend?

https://redd.it/loal9f
@r_devops
Nginx / uWSGI crashing about once an hour, please help

I’m running uWSGI and Nginx with Python. About once an hour my application goes down. When it goes down, I am unable to make API calls from the frontend (or hit any URL, for that matter).

I AM still able to SSH in, and when I run htop the CPU and memory are just fine. Even our long-running scripts are running and logging correctly. The /var/log/nginx/error.log file has these main errors:

connect() to unix:///tmp/price.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: 127.0.0.1, server

and

upstream timed out (110: Connection timed out) while reading response header from upstream,

and

upstream prematurely closed connection while reading response header from upstream
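To check whether the three errors above really cluster around the hourly outage, and which one shows up first, the error log can be bucketed by hour. This is a rough sketch, assuming each line starts with the usual nginx error-log timestamp (`YYYY/MM/DD HH:MM:SS`); the function name and the labels are made up for illustration, and the match strings are copied from the messages quoted above.

```python
import re
from collections import Counter

# Substrings taken from the three error messages quoted above.
PATTERNS = {
    "socket_unavailable": "Resource temporarily unavailable",
    "upstream_timeout": "upstream timed out",
    "upstream_closed": "upstream prematurely closed connection",
}

# Captures "YYYY/MM/DD HH" so that counts are bucketed per hour.
TIMESTAMP = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}):\d{2}:\d{2} ")


def count_errors_by_hour(lines):
    """Count each known error type per hour; skip lines that do not match."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue
        hour = match.group(1)
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[(hour, label)] += 1
                break
    return counts
```

Passing an open file handle for the error log as `lines` works, since the function only iterates over lines. If the socket error spikes just before the timeouts in every bad hour, that would point at the uWSGI side being saturated or stuck rather than at Nginx itself.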

I have tried increasing the max socket connections:
https://stackoverflow.com/questions/44581719/resource-temporarily-unavailable-using-uwsgi-nginx

I have tried increasing worker_rlimit_nofile and worker_connections

https://gist.github.com/denji/8359866

I have tried spinning up a heavier EC2 server (although like I said, memory and CPU are not issues)

I have tried increasing the listen setting in my uwsgi.ini file: https://stackoverflow.com/questions/12340047/your-server-socket-listen-backlog-is-limited-to-100-connections

If you have any idea what could be causing this please help, I’m running out of ideas.

https://redd.it/lon7cn
@r_devops
Is there a good certificate manager for managing all VMs, CF and K8s workload certificates?

Running in private cloud, so please no cloud vendor solutions.

https://redd.it/lol0zv
@r_devops
Thoughts on a new CI

Just saw these guys on Hacker News: https://deltaci.com

I’m tempted as the build time at my company is about 40 minutes and I’ve spent days shaving off minutes.

Is this really true? What am I missing here?

https://redd.it/lokltw
@r_devops
Publicly share IaC orchestration template for AWS/GCP/Azure etc...?

Is there a free SaaS IaC orchestrator? Basically I'm looking for something like AWS CloudFormation that I can export and give to other people, but that works for AWS/GCP/Azure etc...

Scenario: Build an IaC template that deploys a project (VM or container) that I can share with a community. The project is a Node.js game server which uses a webserver.

Goal: Share the 'IaC template' & wiki documentation via GitHub with the community. The community would be able to import the template, input their parameters, and deploy to their AWS/GCP/Azure account.

Reason: Bored ops + programming tinkerer person who would like a project to play with (to learn more AWS/GCP/Azure) and to support my community.

Someone else has already built this in AWS CloudFormation. I could go rebuild it in Azure Resource Manager and the like... but then there would be multiple independent templates.

I am about to start researching the Terraform Cloud free tier and Pulumi's free tier, but I'm wondering what other free hosted services are out there to look into.

https://redd.it/loerka
@r_devops
Industry Standards Now for CI/CD

Which technologies should I learn for setting up CI/CD, pipelines, etc? I work in an Azure environment, if that matters, and will be using containers and an orchestrator like AKS.

https://redd.it/lo5w2r
@r_devops
Question about learning AWS

Hey guys, I am currently a software student and I am interested in getting a job in DevOps. Currently I am trying to improve my Python, Git and Linux knowledge, and I saw that it is also important to learn a cloud provider service like AWS from the start, but I'm not sure what that means. Should I just learn the material from certification resources even if I'm not going to take a cert exam, or should I focus on certain services? If so, any recommendations on what to focus on that is more relevant to DevOps?

https://redd.it/lo9vjc
@r_devops
Is there any good tutorial on how to make a dev and a production environment for a WordPress application?

I am trying to learn some basics, so I can make a dev and a production environment with Docker for the simplest WordPress application.

https://redd.it/lo8eb9
@r_devops
Question about setting up rolling updates and pipelines

I'm trying to get a better understanding of DevOps concepts and haven't had much luck reading through the AWS documentation for rolling updates.

I'm aware of how rolling updates are supposed to work; my question is more about the specifics of how one would be configured.

Is there a specific AWS tool that would work best to set up automated rolling updates?

My example scenario would be a working pipeline that deploys to a test instance. The rolling update would then be set up and applied from the test instance onto a live production environment using CloudFormation (or is there a better service for this?).
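For reference, this is the rolling-update behaviour I have in mind, written as a plain sketch so the question is concrete. It is not tied to any AWS service or API: `rolling_update`, `deploy` and `is_healthy` are hypothetical names, and in a real setup the batching and health checks would be handled by whatever managed service does the rollout.

```python
def rolling_update(instances, new_version, batch_size, deploy, is_healthy):
    """Update instances one batch at a time, halting on the first bad batch.

    Returns (updated, failed): the instances confirmed healthy on the new
    version, and the instances from the batch that failed its health check.
    Instances in later batches are left untouched on the old version.
    """
    updated = []
    for start in range(0, len(instances), batch_size):
        batch = instances[start:start + batch_size]
        for instance in batch:
            deploy(instance, new_version)
        failed = [instance for instance in batch if not is_healthy(instance)]
        if failed:
            # Stop here so the rest of the fleet keeps serving traffic.
            return updated, failed
        updated.extend(batch)
    return updated, []
```

The batch size is the trade-off: smaller batches keep more capacity in service during the rollout but make it take longer.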

https://redd.it/lo92fu
@r_devops