Artem Ryblov’s Data Science Weekly – Telegram

Artem Ryblov’s Data Science Weekly

@data_science_weekly

618 subscribers

139 photos

163 links

@artemfisherman’s Data Science Weekly: Elevate your expertise with a standout data science resource each week, carefully chosen for depth and impact.

Long-form content: https://artemryblov.substack.com

Artem Ryblov’s Data Science Weekly

Spark in Action by Jean-Georges Perrin

Spark in Action, Second Edition, teaches you to create end-to-end analytics applications. In this entirely new book, you’ll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you’ll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms.

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #spark #bigdata #sql

@data_science_weekly

Artem Ryblov’s Data Science Weekly

ML and LLM system design: 500 case studies to learn from

How do companies like Netflix, Airbnb, and Doordash apply AI to improve their products and processes? We put together a database of 500 case studies from 100+ companies that share practical ML use cases, including applications built with LLMs and Generative AI, and learnings from designing ML and LLM systems.

Navigation tips. You can play around with the database by filtering case studies by industry or ML use case. We added tags based on recurring themes. This is not a perfect or mutually exclusive division, but you can use the tags to quickly find:
- Generative AI use cases. Look for tags “generative AI” and “LLM” to find examples of real-world LLM applications.
- ML systems with different data types: computer vision (CV) or natural language processing (NLP).
- ML systems for specific use cases. The most popular are recommender systems, search and ranking, and fraud detection.
- We also labeled use cases where ML powers a specific user-facing "product feature": from grammatical error correction to generating outfit combinations.

Link: Site

Navigational hashtags: #armknowledgesharing #armsites
General hashtags: #mlsystemdesign #ml #systemdesign #llm

@data_science_weekly

Artem Ryblov’s Data Science Weekly

Data Structures & Algorithms by Google

Familiarize yourself with common data structures and algorithms such as lists, trees, maps, graphs, Big-O analysis, and more!

Topics:
- Maps/Dictionaries
- Linked Lists
- Trees
- Stacks & Queues
- Heaps
- Graphs
- Runtime Analysis
- Searching & Sorting
- Recursion & DP
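A quick taste of the Searching and Runtime Analysis topics: a minimal Python sketch of binary search. This is our own illustration, not code from Google's material. Each step halves the remaining interval, so it needs O(log n) comparisons on a sorted list, compared with O(n) for a linear scan.

```python
from typing import Sequence


def binary_search(items: Sequence[int], target: int) -> int:
    """Return the index of target in the sorted sequence items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1  # target can only be in the right half
        else:
            hi = mid - 1  # target can only be in the left half
    return -1


print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
print(binary_search([1, 3, 5, 7, 9, 11], 4))  # -1
```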

Link: Site

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #algorithms #leetcode #programming

@data_science_weekly

Artem Ryblov’s Data Science Weekly

Feature Engineering A-Z by Emil Hvitfeldt

This book is written to be used as a reference guide to nearly all feature engineering methods you will encounter. This is reflected in the chapter structure: any question a practitioner has should be answerable by looking at the index and finding the right chapter.

Each section tries to be as comprehensive as possible in the number of different methods and solutions it presents. A section on dimensionality reduction should list all the practical methods that could be used, as well as a comparison between them to help the reader decide which would be most appropriate. This does not mean that all methods are recommended. Some of them have few, narrow use cases. Methods deemed too domain-specific have been excluded from this book.

Each chapter covers a specific method or small group of methods, including the motivation for and an explanation of the method. Whenever possible, each method is accompanied by mathematical formulas and visualizations to illustrate its mechanics. A short list of pros and cons is provided for each method. Lastly, each section includes code snippets showing how to implement the methods in R and Python, using tidymodels and scikit-learn respectively. This book is a methods book first and a coding book second.
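To give a flavor of the kind of method the book catalogs, here is a minimal pure-Python sketch of standardization (z-scoring). The function name standardize is our own, and we assume the population standard deviation, which is what scikit-learn's StandardScaler uses; the book's own snippets rely on tidymodels and scikit-learn rather than hand-written code like this.

```python
from statistics import mean, pstdev


def standardize(values: list[float]) -> list[float]:
    """Center a feature to mean 0 and scale it to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


print(standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```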

Links:
- Site
- Repository

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ml #machinelearning #featureengineering

@data_science_weekly

Artem Ryblov’s Data Science Weekly

The Little Book of Deep Learning by François Fleuret

Although the bulk of deep learning is not difficult to understand, it combines diverse components such as linear algebra, calculus, probabilities, optimization, signal processing, programming, algorithmics, and high-performance computing, making it complicated to learn.

Instead of trying to be exhaustive, this little book is limited to the background necessary to understand a few important models. This proved to be a popular approach, resulting in more than 500,000 downloads of the PDF file in the 12 months following its announcement on Twitter.

Link: Site

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #machinelearning #ml #deeplearning #dl

@data_science_weekly

Artem Ryblov’s Data Science Weekly

Speech and Language Processing by Dan Jurafsky and James H. Martin

A textbook by Dan Jurafsky and James H. Martin that covers both classical and modern approaches: a timeless classic that is constantly updated.

As a supplement, you can also look at the course LSA 311: Computational Lexical Semantics by Dan Jurafsky.

Table of contents:

Part I: Fundamental Algorithms
1: Introduction
2: Regular Expressions, Tokenization, Edit Distance
3: N-gram Language Models
4: Naive Bayes, Text Classification, and Sentiment
5: Logistic Regression
6: Vector Semantics and Embeddings
7: Neural Networks
8: RNNs and LSTMs
9: Transformers
10: Large Language Models
11: Masked Language Models
12: Model Alignment, Prompting, and In-Context Learning

Part II: NLP Applications
13: Machine Translation
14: Question Answering, Information Retrieval, and RAG
15: Chatbots and Dialogue Systems
16: Automatic Speech Recognition and Text-to-Speech

Part III: Annotating Linguistic Structure
17: Sequence Labeling for Parts of Speech and Named Entities
18: Context-Free Grammars and Constituency Parsing
19: Dependency Parsing
20: Information Extraction: Relations, Events, and Time
21: Semantic Role Labeling and Argument Structure
22: Lexicons for Sentiment, Affect, and Connotation
23: Coreference Resolution and Entity Linking
24: Discourse Coherence
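Chapter 2's edit distance is a neat first example of dynamic programming. Below is a minimal Python sketch (our own, not the book's code) in which insertions and deletions cost 1 and substitutions cost sub_cost. Setting sub_cost=1 gives standard Levenshtein distance; the book also discusses a variant that charges 2 per substitution, which the parameter lets you reproduce.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Minimum edit distance via dynamic programming."""
    n, m = len(source), len(target)
    # d[i][j] holds the distance between source[:i] and target[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete all i characters
    for j in range(1, m + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            d[i][j] = min(
                d[i - 1][j] + 1,  # deletion
                d[i][j - 1] + 1,  # insertion
                d[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return d[n][m]


print(min_edit_distance("intention", "execution"))              # 5
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8
```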

Links:
- Site
- Book (wait, it'll load eventually)

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #nlp #naturallanguageprocessing #llm #nn

@data_science_weekly

Artem Ryblov’s Data Science Weekly

Learning Spark by Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee

Data is bigger, arrives faster, and comes in a variety of formats, and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark.

Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you'll be able to:
- Learn Python, SQL, Scala, or Java high-level Structured APIs
- Understand Spark operations and SQL Engine
- Inspect, tune, and debug Spark operations with Spark configurations and Spark UI
- Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
- Perform analytics on batch and streaming data using Structured Streaming
- Build reliable data pipelines with open source Delta Lake and Spark
- Develop machine learning pipelines with MLlib and productionize models using MLflow

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #bigdata #spark #pyspark

@data_science_weekly

Artem Ryblov’s Data Science Weekly

Trustworthy Online Controlled Experiments by Ron Kohavi

Getting numbers is easy; getting numbers you can trust is hard. This practical guide by experimentation leaders at Google, LinkedIn, and Microsoft will teach you how to accelerate innovation using trustworthy online controlled experiments, or A/B tests.

Based on practical experiences at companies that each run more than 20,000 controlled experiments a year, the authors share examples, pitfalls, and advice for students and industry professionals getting started with experiments, plus deeper dives into advanced topics for practitioners who want to improve the way they make data-driven decisions.

Learn how to:
- Use the scientific method to evaluate hypotheses using controlled experiments.
- Define key metrics and ideally an Overall Evaluation Criterion.
- Test for trustworthiness of the results and alert experimenters to violated assumptions.
- Build a scalable platform that lowers the marginal cost of experiments close to zero.
- Avoid pitfalls like carryover effects and Twyman's law.
- Understand how statistical issues play out in practice.
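To make the core statistical step concrete, here is a minimal sketch of a two-proportion z-test for comparing conversion rates between variants A and B, using only Python's standard library. It is our own illustration, not the book's code, and it assumes independent users and a correctly randomized split; the trustworthiness checks the book emphasizes are exactly what tell you whether such a test can be believed.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: variants A and B share one conversion rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under H0 both groups have the same rate, so estimate it from the pooled data.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data (pooled rate is 0 or 1)")
    z = (p_b - p_a) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: 2.0% vs 2.5% conversion with 10,000 users per arm.
z, p = two_proportion_z_test(200, 10_000, 250, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value only says the observed difference would be unlikely if the variants truly performed the same.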

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ab #statistics #abtests

@data_science_weekly

Artem Ryblov’s Data Science Weekly

The Kaggle Book by Konrad Banachewicz and Luca Massaron

Millions of data enthusiasts from around the world compete on Kaggle, the most famous data science competition platform of them all. Participating in Kaggle competitions is a surefire way to improve your data analysis skills, network with an amazing community of data scientists, and gain valuable experience to help grow your career.

The first book of its kind, The Kaggle Book assembles in one place the techniques and skills you'll need for success in competitions, data science projects, and beyond. Two Kaggle Grandmasters walk you through modeling strategies you won't easily find elsewhere, and the knowledge they've accumulated along the way. As well as Kaggle-specific tips, you'll learn more general techniques for approaching tasks based on image, tabular, textual data, and reinforcement learning. You'll design better validation schemes and work more comfortably with different evaluation metrics.

Whether you want to climb the ranks of Kaggle, build some more data science skills, or improve the accuracy of your existing models, this book is for you.
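A basic validation scheme behind many competition setups is k-fold cross-validation. Here is a minimal standard-library sketch of the index splitting; the function name k_fold_indices is our own. In practice you would likely reach for scikit-learn's KFold, or StratifiedKFold, GroupKFold, or TimeSeriesSplit when the data call for it.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Split range(n_samples) into k (train, validation) index pairs.

    Every sample lands in exactly one validation fold; fold sizes differ by at most 1.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, valid))
        start += size
    return folds


for train, valid in k_fold_indices(10, k=3):
    print(len(train), len(valid))  # 6 4, then 7 3, then 7 3
```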

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ml #machinelearning #featureengineering #kaggle #metrics #validation #hyperparameters #tabular #cv #nlp

@data_science_weekly

Artem Ryblov’s Data Science Weekly

Grokking Algorithms. An illustrated guide for programmers and other curious people by Aditya Y. Bhargava

Grokking Algorithms is a friendly take on this core computer science topic. In it, you'll learn how to apply common algorithms to the practical programming problems you face every day.

You'll start with tasks like sorting and searching. As you build up your skills, you'll tackle more complex problems like data compression and artificial intelligence. Each carefully presented example includes helpful diagrams and fully annotated code samples in Python.

By the end of this book, you will have mastered widely applicable algorithms as well as how and when to use them.

Table of Contents:
1. Introduction to algorithms
2. Selection sort
3. Recursion
4. Quicksort
5. Hash tables
6. Breadth-first search
7. Dijkstra's algorithm
8. Greedy algorithms
9. Dynamic programming
10. K-nearest neighbors
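As a companion to the Dijkstra's algorithm chapter, here is a minimal Python sketch on a small example graph. It is our own version, not the book's code: instead of scanning for the cheapest unprocessed node, it swaps in a binary heap from the standard heapq module. Like any Dijkstra implementation, it assumes non-negative edge weights.

```python
import heapq


def dijkstra(graph: dict[str, dict[str, float]], source: str) -> dict[str, float]:
    """Return the cheapest known cost from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]  # (tentative cost, node); the cheapest entry is popped first
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale entry: a cheaper path to node was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


graph = {
    "start": {"a": 6, "b": 2},
    "b": {"a": 3, "fin": 5},
    "a": {"fin": 1},
}
print(dijkstra(graph, "start"))  # {'start': 0.0, 'a': 5.0, 'b': 2.0, 'fin': 6.0}
```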

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #algorithms #datastructures #leetcode

@data_science_weekly

Artem Ryblov’s Data Science Weekly

The Hitchhiker’s Guide to Python. Python Best Practices Guidebook by Kenneth Reitz, Tanya Schlusser

The Hitchhiker's Guide to Python takes the journeyman Pythonista to true expertise. More than any other language, Python was created with the philosophy of simplicity and parsimony. Now 25 years old, Python has become the primary or secondary language (after SQL) for many business users. With popularity comes diversity and possibly dilution.

This guide, collaboratively written by over a hundred members of the Python community, describes best practices currently used by package and application developers. Unlike other books for this audience, The Hitchhiker's Guide is light on reusable code and heavier on design philosophy, directing the reader to excellent sources that already exist.

Links:
- Site
- Book

Navigational hashtags: #armknowledgesharing #armbooks #armsites
General hashtags: #python #development

@data_science_weekly

Artem Ryblov’s Data Science Weekly

Models Demystified
A Practical Guide from Linear Regression to Deep Learning
by Michael Clark & Seth Berry

This book is designed to guide readers on a comprehensive journey into the world of data science and modeling. For those just beginning their exploration, it offers:
- A solid foundation in the basics of modeling, presented from a practical and accessible perspective.
- A versatile toolkit of models and concepts that can be immediately applied to real-world problems.
- A balanced approach that integrates both statistical and machine learning methodologies.

For readers already experienced in modeling, the book provides:
- Deeper context and insights into familiar models.
- An introduction to new and advanced models that expand your knowledge.
- Enhanced understanding of how to select the most appropriate model for a given task and where to focus your efforts.

Above all, this book aims to highlight the common threads that connect different models, offering readers a clear and intuitive understanding of how they function and interrelate. Whether you're a beginner or a seasoned practitioner, this resource is crafted to deepen your expertise and broaden your perspective on the art and science of modeling.

Table of contents:

Preface
1 Introduction
2 Thinking About Models
3 The Foundation
4 Understanding the Model
5 Understanding the Features
6 Model Estimation and Optimization
7 Estimating Uncertainty
8 Generalized Linear Models
9 Extending the Linear Model
10 Core Concepts in Machine Learning
11 Common Models in Machine Learning
12 Extending Machine Learning
13 Causal Modeling
14 Dealing with Data
15 Danger Zone
16 Parting Thoughts
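Since the book's path starts from linear regression, here is a minimal pure-Python sketch of one-feature ordinary least squares using the closed-form solution. It is our own illustration, not code from the book; the estimation chapters cover more general approaches, such as optimization-based fitting.

```python
def fit_simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares intercept b0 and slope b1 for y ≈ b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx  # slope = covariance(x, y) / variance(x)
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1


print(fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```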

Link: Site

Navigational hashtags: #armknowledgesharing #armbooks #armsites
General hashtags: #ml #machinelearning #r #python #linear #metrics #featureengineering #optimization #mle #glm #dl #deeplearning

@data_science_weekly

Artem Ryblov’s Data Science Weekly

Deep Learning
by Ian Goodfellow, Yoshua Bengio and Aaron Courville

The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.

Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models.

Table of Contents:

1 Introduction
Part I: Applied Math and Machine Learning Basics
2 Linear Algebra
3 Probability and Information Theory
4 Numerical Computation
5 Machine Learning Basics
Part II: Modern Practical Deep Networks
6 Deep Feedforward Networks
7 Regularization for Deep Learning
8 Optimization for Training Deep Models
9 Convolutional Networks
10 Sequence Modeling: Recurrent and Recursive Nets
11 Practical Methodology
12 Applications
Part III: Deep Learning Research
13 Linear Factor Models
14 Autoencoders
15 Representation Learning
16 Structured Probabilistic Models for Deep Learning
17 Monte Carlo Methods
18 Confronting the Partition Function
19 Approximate Inference
20 Deep Generative Models
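Chapter 4 (Numerical Computation) discusses overflow and underflow, using softmax as a motivating example. Here is a minimal pure-Python sketch of the standard fix (our own code, not the book's). Subtracting the largest logit leaves the result mathematically unchanged, because the shift cancels in the ratio, but it keeps exp() from overflowing.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: shift by the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # every exponent is <= 0, so no overflow
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]; a naive math.exp(1000.0) raises OverflowError
```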

Links:
- Site
- Book

Navigational hashtags: #armknowledgesharing #armbooks #armsites
General hashtags: #dl #deeplearning

@data_science_weekly

Artem Ryblov’s Data Science Weekly

🤗 AI Agents Course

This free course will take you on a journey, from beginner to expert, in understanding, using and building AI agents.

In this course, you will:
- 📖 Study AI Agents in theory, design, and practice.
- 🧑‍💻 Learn to use established AI Agent libraries such as smolagents, LangChain, and LlamaIndex.
- 💾 Share your agents on the Hugging Face Hub and explore agents created by the community.
- 🏆 Participate in challenges where you will evaluate your agents against other students’.
- 🎓 Earn a certificate of completion by completing assignments.

And more!

At the end of this course you’ll understand how Agents work and how to build your own Agents using the latest libraries and tools.

Here is the general syllabus for the course:
Onboarding
Set you up with the tools and platforms that you will use.
Agent Fundamentals
Explain Tools, Thoughts, Actions, Observations, and their formats. Explain LLMs, messages, special tokens and chat templates. Show a simple use case using Python functions as tools.
Frameworks
Understand how the fundamentals are implemented in popular libraries: smolagents, LangGraph, LlamaIndex
Use Cases
Let’s build some real-life use cases (open to PRs 🤗 from experienced Agent builders)
Final Assignment
Build an agent for a selected benchmark and prove your understanding of Agents on the student leaderboard 🚀
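The Thought, Action, Observation cycle from Agent Fundamentals can be pictured as a loop. Below is a toy Python sketch of that loop. It does not reflect how smolagents, LangGraph, or LlamaIndex implement agents: scripted_llm is a hypothetical stand-in that returns canned output instead of calling a real model (so the Thought step stays implicit), get_weather is a hypothetical tool, and the JSON action format is our own assumption.

```python
import json
from typing import Callable


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather API."""
    return f"Sunny, 22°C in {city}"


TOOLS: dict[str, Callable[..., str]] = {"get_weather": get_weather}


def scripted_llm(history: list[str]) -> str:
    """Hypothetical stand-in for a model call.

    It emits a JSON tool call first, then a final answer once an observation exists.
    """
    observations = [h for h in history if h.startswith("Observation: ")]
    if not observations:
        return json.dumps({"action": "get_weather", "action_input": {"city": "Paris"}})
    return "Final Answer: " + observations[-1].removeprefix("Observation: ")


def run_agent(question: str, max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        output = scripted_llm(history)  # the "model" picks an action or answers
        if output.startswith("Final Answer: "):
            return output.removeprefix("Final Answer: ")
        call = json.loads(output)  # Action: which tool to run, with which arguments
        result = TOOLS[call["action"]](**call["action_input"])
        history.append(f"Observation: {result}")  # Observation is fed back to the model
    return "Stopped: step limit reached"


print(run_agent("What's the weather in Paris?"))  # Sunny, 22°C in Paris
```

The max_steps cap is a simple guard against a model that never produces a final answer.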

Link: Course

Navigational hashtags: #armcourses
General hashtags: #nlp #llm #agents

@data_science_weekly

Artem Ryblov’s Data Science Weekly

Machine Learning in Production by Carnegie Mellon University

This is a course for those who want to build software products with machine learning, not just models and demos. We assume that you can train a model or build prompts to make predictions, but what does it take to turn the model into a product and actually deploy it, have confidence in its quality, and successfully operate and maintain it at scale?

The course is designed to establish a working relationship between software engineers and data scientists: both contribute to building ML-enabled systems but have different expertise and focuses. To work together, they need a mutual understanding of each other's roles, tasks, concerns, and goals. This course is aimed at software engineers who want to build robust and responsible products that meet the specific challenges of working with ML components, and at data scientists who want to understand the requirements a model must meet for production use and help get a prototype model into production; it facilitates communication and collaboration between both roles. The course is a good fit for students looking at a career as an ML engineer. It focuses on all the steps needed to turn a model into a production system in a responsible and reliable manner.

It covers topics such as:
- How to design for wrong predictions the model may make?
How to assure safety and security despite possible mistakes? How to design the user interface and the entire system to operate in the real world?
- How to reliably deploy and update models in production?
How can we test the entire machine learning pipeline? How can MLOps tools help to automate and scale the deployment process? How can we experiment in production (A/B testing, canary releases)? How do we detect data quality issues, concept drift, and feedback loops in production?
- How to scale production ML systems?
How do we design a system to process huge amounts of training data, telemetry data, and user requests? Should we use stream processing, batch processing, lambda architecture, or data lakes?
- How to test and debug production ML systems?
How can we evaluate the quality of a model’s predictions in production? How can we test the entire ML-enabled system, not just the model? What lessons can we learn from software testing, automated test case generation, simulation, and continuous integration for testing for production machine learning?
- Which qualities matter beyond a model’s prediction accuracy?
How can we identify and measure important quality requirements, including learning and inference latency, operating cost, scalability, explainability, fairness, privacy, robustness, and safety? Does the application need to be able to operate offline, and how often do we need to update the models? How do we identify what’s important in an ML-enabled product in a production setting for a business? How do we resolve conflicts and tradeoffs?
- How to work effectively in interdisciplinary teams?
How can we bring data scientists, software engineers, UI designers, managers, domain experts, big data specialists, operators, legal counsel, and other roles together and develop a shared understanding and team culture?
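To make "experiment in production (canary releases)" concrete, here is a minimal sketch of one common way to route a small, stable share of users to a new model: hashing the user ID into a fixed bucket. This is our own illustration, not the course's code; assign_variant and its salt are hypothetical names.

```python
import hashlib


def assign_variant(user_id: str, canary_percent: float, salt: str = "model-v2-canary") -> str:
    """Route a user to "canary" or "stable" deterministically.

    Hashing salt + user_id maps each user to a fixed bucket in [0, 10000), so a
    user keeps seeing the same model, and raising canary_percent only ever moves
    users from stable to canary, never back.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000
    return "canary" if bucket < canary_percent * 100 else "stable"


users = [f"user-{i}" for i in range(10_000)]
share = sum(assign_variant(u, canary_percent=5) == "canary" for u in users) / len(users)
print(f"canary share: {share:.3f}")  # expected to be close to 0.05
```

Changing the salt reshuffles users, which is useful when you start a new, independent experiment.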

Link: Course

Navigational hashtags: #armcourses
General hashtags: #ml #dl #machinelearning #deeplearning #mlsystemdesign #mlops #mlsysdes

@data_science_weekly

Artem Ryblov’s Data Science Weekly

MIT 6.S191 Introduction to Deep Learning

An efficient and high-intensity bootcamp designed to teach you the fundamentals of deep learning as quickly as possible!

MIT's introductory program on deep learning methods, with applications to natural language processing, computer vision, biology, and more! Students will gain foundational knowledge of deep learning algorithms, practical experience in building neural networks, and an understanding of cutting-edge topics including large language models and generative AI. The program concludes with a project proposal competition with feedback from staff and a panel of industry sponsors. Prerequisites: calculus (i.e., taking derivatives) and linear algebra (i.e., matrix multiplication); we'll try to explain everything else along the way! Experience in Python is helpful but not necessary.

Link: Course

Navigational hashtags: #armcourses
General hashtags: #dl #deeplearning #llm #cv #nlp

@data_science_weekly

Artem Ryblov’s Data Science Weekly

What happens when...
you type google.com into your browser's address box and press enter?

This repository is an attempt to answer the age-old interview question "What happens when you type google.com into your browser's address box and press enter?"

Except instead of the usual story, we're going to try to answer this question in as much detail as possible. No skipping out on anything.

This is a collaborative process, so dig in and try to help out! There are tons of details missing, just waiting for you to add them! So send us a pull request, please!
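The very first step of the story, before any DNS lookup or TCP connection, is turning the typed text into a URL. Below is a rough Python sketch using the standard library's urllib.parse.urlsplit. Defaulting a bare host to https is our own simplifying assumption: real browsers apply much richer rules, such as search-versus-URL heuristics and HSTS. A natural next step would be resolving the host (for example with socket.getaddrinfo), which is left out here because it needs network access.

```python
from urllib.parse import urlsplit


def parse_address_bar(text: str) -> tuple[str, str, int, str]:
    """Very rough first step: turn typed text into (scheme, host, port, path)."""
    if "://" not in text:
        text = "https://" + text  # assumption: treat bare input as an HTTPS URL
    parts = urlsplit(text)
    port = parts.port or (443 if parts.scheme == "https" else 80)  # default ports
    return parts.scheme, parts.hostname or "", port, parts.path or "/"


print(parse_address_bar("google.com"))  # ('https', 'google.com', 443, '/')
```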

Link: GitHub

Navigational hashtags: #armrepo
General hashtags: #systemdesign

@data_science_weekly

Artem Ryblov’s Data Science Weekly

🤗 Deep Reinforcement Learning Course

This course will teach you about Deep Reinforcement Learning from beginner to expert. It’s completely free and open-source!

In this course, you will:
- 📖 Study Deep Reinforcement Learning in theory and practice.
- 🧑‍💻 Learn to use famous Deep RL libraries such as Stable Baselines3, RL Baselines3 Zoo, Sample Factory and CleanRL.
- 🤖 Train agents in unique environments such as SnowballFight, Huggy the Doggo 🐶, VizDoom (Doom) and classical ones such as Space Invaders, PyBullet and more.
- 💾 Share your trained agents with one line of code to the Hub and also download powerful agents from the community.
- 🏆 Participate in challenges where you will evaluate your agents against other teams. You’ll also get to play against the agents you’ll train.
- 🎓 Earn a certificate of completion by completing 80% of the assignments.

And more!

At the end of this course, you’ll have a solid foundation, from the basics to state-of-the-art (SOTA) methods.

The course is composed of:
- A theory part: where you learn a concept in theory.
- A hands-on part: where you’ll learn to use famous Deep RL libraries to train your agents in unique environments. These hands-on sessions are Google Colab notebooks, with companion tutorial videos if you prefer learning in video format!
- Challenges: you’ll put your agent up against other agents in different challenges. There will also be a leaderboard to compare the agents’ performance.
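For a feel of the theory part, here is a minimal sketch of the tabular Q-learning update, one of the classic algorithms in reinforcement learning. It is our own code, not the course's: states and actions are plain hashable labels, the Q-table is a defaultdict, and terminal-state handling is deliberately left out.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence


def q_learning_update(
    q: defaultdict,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.99,
) -> float:
    """Apply one tabular Q-learning update and return the new Q(state, action).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    Terminal states are not handled here; for them the max term would be 0.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]
print(q_learning_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions))  # 0.1
```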

Link: Course

Navigational hashtags: #armcourses
General hashtags: #reinforcementlearning #rl #deeprl #agents #hf #huggingface

@data_science_weekly

Artem Ryblov’s Data Science Weekly

Machine Learning from Scratch by Danny Friedman

This book is for readers looking to learn new machine learning algorithms or understand algorithms at a deeper level. Specifically, it is intended for readers interested in seeing machine learning algorithms derived from start to finish. Seeing these derivations might help a reader previously unfamiliar with common algorithms understand how they work intuitively. Or, seeing these derivations might help a reader experienced in modeling understand how different algorithms create the models they do and the advantages and disadvantages of each one.

This book will be most helpful for those with practice in basic modeling. It does not review best practices—such as feature engineering or balancing response variables—or discuss in depth when certain models are more appropriate than others. Instead, it focuses on the elements of those models.
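In the same derive-then-implement spirit, here is a minimal pure-Python sketch of one-feature logistic regression fitted by batch gradient descent on the mean log-loss. It is our own illustration, not code from the book, and the learning rate and epoch count are arbitrary choices for this tiny, hypothetical dataset. The two gradient lines follow directly from differentiating the log-loss: d/db0 = mean(p - y) and d/db1 = mean((p - y) * x).

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x: list[float], y: list[int], lr: float = 0.1, epochs: int = 5000) -> tuple[float, float]:
    """Fit p(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on the mean log-loss."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(epochs):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        g0 = sum(pi - yi for pi, yi in zip(p, y)) / n  # d(loss)/d(b0)
        g1 = sum((pi - yi) * xi for pi, yi, xi in zip(p, y, x)) / n  # d(loss)/d(b1)
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1


# Hypothetical, deliberately overlapping data so the coefficients stay finite.
x = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, decision boundary at x = {-b0 / b1:.3f}")
```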

Link: Book

Navigational hashtags: #armbooks
General hashtags: #ml #machinelearning

@data_science_weekly

Artem Ryblov’s Data Science Weekly

Deep Learning Models by Sebastian Raschka

A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks:
- Traditional Machine Learning
- Multilayer Perceptrons
- Convolutional Neural Networks
  - Basic
  - Concepts
  - AlexNet
  - DenseNet
  - Fully Convolutional
  - LeNet
  - MobileNet
  - Network in Network
  - VGG
  - ResNet
- Transformers
- Ordinal Regression and Deep Learning
- Normalization Layers
- Metric Learning
- Autoencoders
  - Fully-connected Autoencoders
  - Convolutional Autoencoders
  - Variational Autoencoders
  - Conditional Variational Autoencoders
- Generative Adversarial Networks (GANs)
- Graph Neural Networks (GNNs)
- Recurrent Neural Networks (RNNs)
  - Many-to-one: Sentiment Analysis / Classification
  - Many-to-Many / Sequence-to-Sequence
- Model Evaluation
  - K-Fold Cross-Validation
- Data Augmentation
- Tips and Tricks
- Transfer Learning
- Visualization and Interpretation
- PyTorch Workflows and Mechanics
  - PyTorch Lightning Examples
  - Custom Datasets
  - Training and Preprocessing
  - Improving Memory Efficiency
  - Parallel Computing
  - Other
  - Autograd
- TensorFlow Workflows and Mechanics
  - Custom Datasets
  - Training and Preprocessing
- Related Libraries

Link: GitHub

Navigational hashtags: #armtutorials
General hashtags: #ml #machinelearning #dl #deeplearning #pytorch #tensorflow #tf #pytorchlightning

@data_science_weekly

Artem Ryblov’s Data Science Weekly

MLOps guide by Chip Huyen

A collection of materials from introductory to advanced. This is roughly the path she would follow if she were to start her MLOps journey again.

Table of contents:
- ML + engineering fundamentals
- MLOps
  - Overview
  - Intermediate
  - Advanced
- Career
- Case studies
- Bonus

Link: Guide

Navigational hashtags: #armtutorials
General hashtags: #ml #mlops

@data_science_weekly
