Artem Ryblov’s Data Science Weekly
@artemfisherman’s Data Science Weekly: Elevate your expertise with a standout data science resource each week, carefully chosen for depth and impact.

Long-form content: https://artemryblov.substack.com
Machine Learning System Design Interview by Ali Aminian and Alex Xu

Machine learning system design interviews are among the most difficult technical interviews to tackle. This book offers a reliable strategy and knowledge base for approaching a broad range of ML system design questions: a step-by-step framework, illustrated with many real-world examples and detailed steps you can follow.

This book is an essential resource for anyone interested in ML system design, from beginners to experienced engineers. If you need to prepare for an ML interview, it is written specifically for you.

What’s inside?
- An insider’s take on what interviewers really look for and why.
- A 7-step framework for solving any ML system design interview question.
- 10 real ML system design interview questions with detailed solutions.
- 211 diagrams that visually explain how various systems work.

Table Of Contents
Chapter 1 Introduction and Overview
Chapter 2 Visual Search System
Chapter 3 Google Street View Blurring System
Chapter 4 YouTube Video Search
Chapter 5 Harmful Content Detection
Chapter 6 Video Recommendation System
Chapter 7 Event Recommendation System
Chapter 8 Ad Click Prediction on Social Platforms
Chapter 9 Similar Listings on Vacation Rental Platforms
Chapter 10 Personalized News Feed
Chapter 11 People You May Know

Links:
- Paper version
- Digital version
- Solutions

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #mlsd #machinelearning #machinelearningsystemdesign

@data_science_weekly
A/B Testing course by Skoltech

Each of us regularly makes decisions. The optimal solution is often not obvious, and the cost of error is high. A/B tests are the most accurate way to choose the best option.

A/B experiments are used to test the effectiveness of new drugs and are also widely used in business. Companies that use A/B experiments make more accurate decisions, allowing them to stay ahead of the competition.

Mathematical statistics is the foundation of A/B tests. It provides mathematically sound criteria for testing hypotheses. This allows us to be confident in the accuracy of our results.
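
As a small illustration of such a criterion, here is a minimal sketch of a two-sided z-test for a difference in conversion rates. It is not taken from the course: the function name and the sample numbers are hypothetical, and the test relies on the normal approximation, so it is only reasonable for large samples.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    Returns (z statistic, p-value). Uses the normal approximation,
    so it is only reasonable for large samples.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis "both groups convert equally".
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 1,000 users per group, 100 vs 130 conversions.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value means the observed difference would be unlikely if both variants converted at the same rate. The significance threshold should be fixed before the experiment starts.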

Upon completion of this course, you'll be able to design experiments and evaluate them using A/B testing techniques, including advanced ones such as variance reduction and ratio metric analysis. If you're a manager, you'll learn the full A/B testing pipeline, its key steps, and the typical mistakes people make when conducting A/B tests.

Table Of Contents
Week 1. A/B Tests. Introduction
Week 2. Statistics Basics. Parametric Estimation. Bootstrapping
Week 3. Statistics Basics. Hypothesis Testing
Week 4. A/B Tests. Basic Level
Week 5. A/B Tests. Increasing Sensitivity. Review of Modern Methods

Link: Course

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #statistics #ab #abtesting

@data_science_weekly
System Design 101

Explain complex systems using visuals and simple terms.

Whether you're preparing for a System Design Interview or you simply want to understand how systems work beneath the surface, we hope this repository will help you achieve that.

Link: Repo

Navigational hashtags: #armknowledgesharing #armrepo
General hashtags: #systemdesign

@data_science_weekly
Machine Learning Q and AI. 30 Essential Questions and Answers on Machine Learning and AI by Sebastian Raschka

If you’re ready to venture beyond introductory concepts and dig deeper into machine learning, deep learning, and AI, the question-and-answer format of Machine Learning Q and AI will make things fast and easy for you, without a lot of mucking about.

Born out of questions often fielded by author Sebastian Raschka, the direct, no-nonsense approach of this book makes advanced topics more accessible and genuinely engaging. Each brief, self-contained chapter journeys through a fundamental question in AI, unraveling it with clear explanations, diagrams, and hands-on exercises.

WHAT'S INSIDE:
FOCUSED CHAPTERS: Key questions in AI are answered concisely, and complex ideas are broken down into easily digestible parts.
WIDE RANGE OF TOPICS: Raschka covers topics ranging from neural network architectures and model evaluation to computer vision and natural language processing.
PRACTICAL APPLICATIONS: Learn techniques for enhancing model performance, fine-tuning large models, and more.

You’ll also explore how to:
• Manage the various sources of randomness in neural network training
• Differentiate between encoder and decoder architectures in large language models
• Reduce overfitting through data and model modifications
• Construct confidence intervals for classifiers and optimize models with limited labeled data
• Choose between different multi-GPU training paradigms and different types of generative AI models
• Understand performance metrics for natural language processing
• Make sense of the inductive biases in vision transformers
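
To give a flavour of one topic from the list above, here is a minimal sketch of a normal-approximation confidence interval for classifier accuracy. This is one common choice and not necessarily the method the book uses. The function name and the numbers are hypothetical, and the approximation assumes a reasonably large, independent test set.

```python
from math import sqrt
from statistics import NormalDist


def accuracy_confidence_interval(n_correct, n_total, confidence=0.95):
    """Normal-approximation interval for test-set accuracy."""
    acc = n_correct / n_total
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(acc * (1 - acc) / n_total)
    # Clip to [0, 1] because accuracy cannot leave that range.
    return max(0.0, acc - half_width), min(1.0, acc + half_width)


# Hypothetical classifier: 870 correct predictions out of 1,000 test examples.
lower, upper = accuracy_confidence_interval(n_correct=870, n_total=1000)
print(f"accuracy 0.870, 95% CI [{lower:.3f}, {upper:.3f}]")
```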

If you’ve been on the hunt for the perfect resource to elevate your understanding of machine learning, Machine Learning Q and AI will make it easy for you to painlessly advance your knowledge beyond the basics.

Link: Site

Navigational hashtags: #armknowledgesharing #armbook
General hashtags: #ml #machinelearning #nlp #cv #dl #nn #neuralnetworks #deeplearning #computervision #naturallanguageprocessing

@data_science_weekly
PyTorch internals

This talk is for those of you who have used PyTorch, and thought to yourself, "It would be great if I could contribute to PyTorch," but were scared by PyTorch's behemoth of a C++ codebase. I'm not going to lie: the PyTorch codebase can be a bit overwhelming at times. The purpose of this talk is to put a map in your hands: to tell you about the basic conceptual structure of a "tensor library that supports automatic differentiation", and give you some tools and tricks for finding your way around the codebase. I'm going to assume that you've written some PyTorch before, but haven't necessarily delved deeper into how a machine learning library is written.

The talk is in two parts. In the first part, I'm going to introduce you to the conceptual universe of a tensor library. I'll start by talking about the tensor data type you know and love, and give a more detailed discussion about what exactly this data type provides, which will lead us to a better understanding of how it is actually implemented under the hood. If you're an advanced user of PyTorch, you'll be familiar with most of this material. We'll also talk about the trinity of "extension points" (layout, device and dtype), which guide how we think about extensions to the tensor class. In the live talk at PyTorch NYC, I skipped the slides about autograd, but I'll talk a little bit about them in these notes as well.

The second part grapples with the nitty-gritty details of actually coding in PyTorch. I'll tell you how to cut your way through swaths of autograd code, what code actually matters and what is legacy, and also all of the cool tools that PyTorch gives you for writing kernels.
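
To make "how a tensor is implemented under the hood" a bit more concrete, here is a toy sketch in pure Python, assuming a strided layout (a flat buffer plus sizes, strides and an offset). The class `StridedView` is hypothetical and only illustrates the idea. It is not PyTorch's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StridedView:
    """Toy 2-D tensor view: a flat buffer plus sizes, strides and an offset."""
    storage: List[float]
    sizes: Tuple[int, int]
    strides: Tuple[int, int]
    offset: int = 0

    def __getitem__(self, index: Tuple[int, int]) -> float:
        i, j = index
        if not (0 <= i < self.sizes[0] and 0 <= j < self.sizes[1]):
            raise IndexError(index)
        # Logical index -> physical position in the flat buffer.
        return self.storage[self.offset + i * self.strides[0] + j * self.strides[1]]

    def transpose(self) -> "StridedView":
        # No data is copied: only the metadata (sizes and strides) is swapped.
        return StridedView(self.storage, self.sizes[::-1], self.strides[::-1], self.offset)


# A 2x3 row-major tensor [[0, 1, 2], [3, 4, 5]] stored in one flat list.
t = StridedView(storage=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0], sizes=(2, 3), strides=(3, 1))
tt = t.transpose()
print(t[1, 2], tt[2, 1])  # both read the same element of the shared buffer
```

Because the transposed view shares the same storage, it costs no copy. Several views can describe the same memory in different ways.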


Link: Site

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #dl #deeplearning #pytorch

@data_science_weekly
Deep Learning Tuning Playbook by Google

This document helps you train deep learning models more effectively. Although this document emphasizes hyperparameter tuning, it also touches on other aspects of deep learning training, such as training pipeline implementation and optimization.

This document assumes your machine learning task is either a supervised learning problem or a similar problem (for example, self-supervised learning). That said, some of the advice in this document may also apply to other types of machine learning problems.

Links:
- GitHub
- Site

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #dl #deeplearning #google

@data_science_weekly
Happy New Year!
Forwarded from TGStat Bot
Summary of the year for the channel "Artem Ryblov’s Data Science Weekly" from @TGStat
Tech Interview Cheat Sheet

This list is meant to be both a quick guide and reference for further research into these topics. It's basically a summary of that comp sci course you never took or forgot about, so there's no way it can cover everything in depth.

Link: Site

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #interview #techinterview #interviewprep #interviewpreparation

@data_science_weekly
A/B Testing & Experimentation Roadmap

This roadmap is for analysts, data scientists, and product folks who want to go from “I know what an A/B test is” to running trustworthy, advanced online experiments (CUPED, sequential testing, quasi-experiments, Bayesian, etc.).

It’s organized by topics. You don’t have to go strictly top-to-bottom, but earlier sections are foundations for later ones.
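
As a taste of one technique named above, here is a minimal sketch of the CUPED adjustment on simulated data. It is not taken from the roadmap: the function name and the data are hypothetical. In a real experiment theta is typically estimated on the pooled data of both groups, using a covariate measured before the experiment started.

```python
import random
from statistics import mean, variance


def cuped_adjust(metric, covariate):
    """Return CUPED-adjusted metric values: y - theta * (x - mean(x))."""
    x_bar = mean(covariate)
    y_bar = mean(metric)
    # Sample covariance between the pre-experiment covariate and the metric.
    cov_xy = sum(
        (x - x_bar) * (y - y_bar) for x, y in zip(covariate, metric)
    ) / (len(metric) - 1)
    theta = cov_xy / variance(covariate)
    return [y - theta * (x - x_bar) for x, y in zip(covariate, metric)]


# Simulated users: the in-experiment metric correlates with pre-experiment behaviour.
rng = random.Random(0)
pre = [rng.gauss(100, 20) for _ in range(2000)]
post = [0.8 * x + rng.gauss(0, 10) for x in pre]

adjusted = cuped_adjust(post, pre)
print(f"variance before: {variance(post):.1f}, after: {variance(adjusted):.1f}")
```

The adjusted metric keeps the same mean but has lower variance, so the same experiment can detect smaller effects.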

Link: GitHub

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #statistics #abtesting #ab

@data_science_weekly
Machine Learning Design Primer

Helpful notes for Machine Learning System Design Interview preparation, gathered by the author from various resources.

Link: GitHub

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #interview #techinterview #interviewprep #interviewpreparation #mlsd #mlsystemdesign #mlsysdes #systemdesign

@data_science_weekly