Artem Ryblov’s Data Science Weekly
@artemfisherman’s Data Science Weekly: Elevate your expertise with a standout data science resource each week, carefully chosen for depth and impact.

Long-form content: https://artemryblov.substack.com
Multimodal Deep Learning

In the last few years, there have been several breakthroughs in the methodologies used in Natural Language Processing (NLP) as well as Computer Vision (CV). Beyond these improvements on single-modality models, large-scale multi-modal approaches have become a very active area of research.

In this seminar, the authors reviewed these approaches and attempted to create a solid overview of the field, starting with the current state-of-the-art approaches in the two subfields of Deep Learning individually.

- Further, modeling frameworks are discussed where one modality is transformed into the other (Chapter 3.1 and Chapter 3.2), as well as models in which one modality is utilized to enhance representation learning for the other (Chapter 3.3 and Chapter 3.4). To conclude the second part, architectures with a focus on handling both modalities simultaneously are introduced (Chapter 3.5).
- Finally, they also cover other modalities (Chapter 4.1 and Chapter 4.2) as well as general-purpose multi-modal models (Chapter 4.3), which are able to handle different tasks on different modalities within one unified architecture. One interesting application (Generative Art, Chapter 4.4) eventually caps off this booklet.

Links:
- Book

Navigational hashtags: #armbooks #armsite
General hashtags: #dl #deeplearning #nlp #cv

@data_science_weekly
👍4
The Tensor Cookbook by Thomas Dybdahl Ahle

What are Tensor Diagrams? Machine learning involves a lot of tensor manipulation, and it's easy to lose track of the larger structure when manipulating high-dimensional data using notation designed for vectors and matrices.
Graphical notation (first introduced by Roger Penrose in 1971) reduces the mental overhead and makes the connections "come alive":

In short, each edge is the index of a tensor, and connecting two edges contracts the tensors over this dimension. After a bit of practice, this becomes incredibly intuitive.
The Tensor Cookbook aims to popularize tensor diagrams by rewriting the classical "Matrix Cookbook". You can use it as a reference book, skip around for some cool diagrams, or treat it as a crash course full of exercises to practice your skills.
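
As a rough illustration of the contraction rule described above (a sketch in plain Python, not code from the book): a matrix is a node with two edges, and joining the column edge of one matrix to the row edge of another sums over the shared index, which is ordinary matrix multiplication. Joining a matrix's two edges to each other leaves no free edge and yields the trace. The helper names `contract` and `trace` are made up for this example.

```python
def contract(a, b):
    """Contract the second index of `a` with the first index of `b`.

    In diagram terms: connect the right edge of node `a` to the left
    edge of node `b`. The result keeps the two free (unconnected) edges.
    """
    shared = len(b)
    assert all(len(row) == shared for row in a), "connected edges must match in size"
    return [
        [sum(a[i][j] * b[j][k] for j in range(shared)) for k in range(len(b[0]))]
        for i in range(len(a))
    ]


def trace(m):
    """Connect a matrix's two edges to each other; the result is a scalar."""
    return sum(m[i][i] for i in range(len(m)))


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

AB = contract(A, B)
print(AB)         # [[2, 1], [4, 3]]
print(trace(AB))  # 5
```

Because a closed loop of connected edges has no starting point, the diagram also makes it visible that `trace(contract(A, B))` equals `trace(contract(B, A))`.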

Links:
- Book
- Site

Navigational hashtags: #armbooks #armsite
General hashtags: #tensor #matrix #derivative #statistics #probability #ml

@data_science_weekly
👍5
Part 2: Deep Learning from the Foundations by fast.ai

This course shows how to build a state-of-the-art deep learning model from scratch.

It takes you all the way from the foundations of implementing matrix multiplication and back-propagation, through to high performance mixed-precision training, to the latest neural network architectures and learning techniques, and everything in between.

It covers many of the most important academic papers that form the foundations of modern deep learning, using “code-first” teaching, where each method is implemented from scratch in Python and explained in detail (in the process, we’ll discuss many important software engineering techniques too). Before starting this part, you need to have completed Part 1: Practical Deep Learning for Coders.

The first five lessons use Python, PyTorch, and the fastai library; the last two lessons use Swift for TensorFlow, and are co-taught with Chris Lattner, the original creator of Swift, clang, and LLVM.

The purpose of Deep Learning from the Foundations is, in some ways, the opposite of Part 1. This time, we’re not learning practical things that we will use right away, but are learning foundations that we can build on. This is particularly important nowadays because this field is moving so fast. In this new course, we will learn to implement a lot of things that are inside the fastai and PyTorch libraries. In fact, we’ll be reimplementing a significant subset of the fastai library! Along the way, we will practice implementing papers, which is an important skill to master when making state-of-the-art models.
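
To give a flavor of what "from the foundations" can mean, here is a minimal sketch (not taken from the course materials) of a forward pass, a hand-derived backward pass, and gradient descent for a tiny linear model, written without any library. The function names, the toy data, and the learning rate are assumptions made for this example.

```python
def forward(w, b, xs):
    """Linear model: one prediction per input."""
    return [w * x + b for x in xs]


def mse(preds, ys):
    """Mean squared error."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def backward(w, b, xs, ys):
    """Gradients of the MSE loss with respect to w and b, via the chain rule."""
    n = len(xs)
    preds = forward(w, b, xs)
    # dL/dpred_i = 2 * (pred_i - y_i) / n
    dpreds = [2 * (p - y) / n for p, y in zip(preds, ys)]
    # dpred_i/dw = x_i and dpred_i/db = 1
    dw = sum(dp * x for dp, x in zip(dpreds, xs))
    db = sum(dpreds)
    return dw, db


# Toy data generated by y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0
for _ in range(2000):
    dw, db = backward(w, b, xs, ys)
    w -= 0.05 * dw  # plain gradient descent step
    b -= 0.05 * db

print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```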

Link: Course

Navigational hashtags: #armcourses
General hashtags: #dl #deeplearning #ml #machinelearning

@data_science_weekly
👍5
Personalized Machine Learning by Julian McAuley

Every day we interact with machine learning systems that personalize their predictions to individual users, whether to recommend movies, find new friends or dating partners, or organize our news feeds. Such systems involve several modalities of data, ranging from sequences of clicks or purchases, to rich modalities involving text, images, or social interactions.

While settings and data modalities vary significantly, in this book we introduce a common set of principles and methods that underpin the design of personalized predictive models.

The book begins by reviewing "traditional" machine learning models, with a special focus on how they should be adapted to settings involving user data. Later, we'll develop techniques based on more advanced principles such as matrix factorization, deep learning, and generative modeling. Finally, we conclude with a detailed study of the consequences and risks of deploying personalized predictive systems.

By understanding the principles behind personalized machine learning, readers will gain the ability to design models and systems for a wide range of applications involving user data. A series of case studies will help readers understand the importance of personalization in domains ranging from e-commerce to personalized health, and hands-on projects and code examples (and an online supplement) will give readers experience working with large-scale real-world datasets.

Link: Book

Navigational hashtags: #armbooks #armsite
General hashtags: #ml #machinelearning #regression #classification #recommendation #recsys #nlp

@data_science_weekly
👍7
Hey! 👋

We have some exciting news! Telegram offers a fantastic feature called auto-translation for posts, which would make our channel accessible to a much wider global audience 🌍

But here's the catch: To unlock this feature for our channel, we need Telegram Premium users to boost us! 🔋

How you can help (if you're a Premium user):

1. Tap on the channel name at the top.
2. Select "Boost Channel" (or find it in the channel menu).
3. Choose how many boosts you'd like to contribute (even 1 helps!).
4. Confirm – it's quick and free for Premium users!

Or simply use this link to boost!

Why boosting matters:

- 🌐 Break Language Barriers: Auto-translation will instantly translate our posts into your preferred language, making our content accessible to everyone.
- 💡 Share Knowledge Widely: Reach more people who can benefit from what we share here.
- 🚀 Grow Together: Help our community expand and become even more vibrant!

We're currently at 4 boosts (Level 2). Our goal is Level 4 to unlock auto-translation! Every single boost from a Premium user gets us closer.

To our amazing Premium members: Your boosts are incredibly valuable! If you find this channel useful and want to help us reach more people globally, please consider boosting us. It makes a huge difference! 🙏

To everyone else: Even if you're not Premium, you can still help massively! Please share this message with friends or groups who are Premium users and might be willing to support us. 🤝

Let's unlock the power of translation together! Thank you for being such a fantastic community!

With gratitude,
Artem Ryblov
👍3
python-patterns

A collection of design patterns and idioms in Python.

Remember that each pattern has its own trade-offs, so pay more attention to why you're choosing a certain pattern than to how to implement it.
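
As one example of weighing the "why" before the "how", here is a minimal sketch of the strategy pattern using plain functions. It was written for this post and is not copied from the repository; all names are hypothetical. The pattern earns its keep when behavior must be swapped at runtime; for a single fixed behavior, a direct function call is simpler.

```python
from dataclasses import dataclass
from typing import Callable


def no_discount(price: float) -> float:
    return price


def ten_percent_off(price: float) -> float:
    return price * 0.9


@dataclass
class Order:
    price: float
    # The strategy is just a callable, so swapping behavior needs no subclassing.
    discount: Callable[[float], float] = no_discount

    def total(self) -> float:
        return self.discount(self.price)


print(Order(100.0).total())                   # 100.0
print(Order(100.0, ten_percent_off).total())  # 90.0
```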

Link: GitHub

Navigational hashtags: #armsite
General hashtags: #python #programming #patterns #development #engineering

@data_science_weekly
👍6
How to avoid machine learning pitfalls by Michael A. Lones

Mistakes in machine learning practice are commonplace, and can result in a loss of confidence in the findings and products of machine learning.

This guide outlines common mistakes that occur when using machine learning, and what can be done to avoid them.

Whilst it should be accessible to anyone with a basic understanding of machine learning techniques, it focuses on issues that are of particular concern within academic research, such as the need to do rigorous comparisons and reach valid conclusions.

It covers five stages of the machine learning process:
- What to do before model building
- How to reliably build models
- How to robustly evaluate models
- How to compare models fairly
- How to report results

Link: arXiv

Navigational hashtags: #armarticles
General hashtags: #ml #machinelearning #mlsystemdesign

@data_science_weekly
👍5
Deep Learning Fundamentals by Sebastian Raschka and Lightning AI

Deep Learning Fundamentals is a free course on learning deep learning using a modern open-source stack.

If you found this page, you have probably heard that artificial intelligence and deep learning are taking the world by storm. This is correct. In this course, Sebastian Raschka, a best-selling author and professor, will teach you deep learning (machine learning with deep neural networks) from the ground up via a course of 10 units with bite-sized videos, quizzes, and exercises. The entire course is free and uses the most popular open-source tools for deep learning.

What will you learn in this course?
- What machine learning is and when to use it
- The main concepts of deep learning
- How to design deep learning experiments with PyTorch
- How to write efficient deep learning code with PyTorch Lightning

What will you be able to do after this course?
- Build classifiers for various kinds of data like tables, images, and text
- Tune models effectively to optimize predictive and computational performance

How is this course structured?
- The course consists of 10 units, each containing several subsections
- It is centered around informative, succinct videos that are respectful of your time
- In each unit, you will find optional exercises to practice your knowledge
- We also provide additional resources for those who want a deep dive on specific topics

What are the prerequisites?
- Ideally, you should already be familiar with programming in Python
- (Some lectures will involve a tiny bit of math, but a strong math background is not required!)

Are there interactive quizzes or exercises?
- Each section is accompanied by optional multiple-choice quizzes to test your understanding of the material
- Optionally, each unit also features one or more code exercises to practice implementing concepts covered in this class

Is there a course completion badge or certificate?
- At the end of this course, you can take an optional exam featuring 25 multiple-choice questions
- Upon answering 80% of the questions in the exam correctly (there are 5 attempts), you obtain an optional course completion badge that can be shared on LinkedIn

Link: Course

Navigational hashtags: #armcourses
General hashtags: #dl #deeplearning #pytorch #ligthning

@data_science_weekly
👍7
The Prompt Report: A Systematic Survey of Prompt Engineering Techniques

Generative Artificial Intelligence (GenAI) systems are increasingly being deployed across diverse industries and research domains. Developers and end-users interact with these systems through the use of prompting and prompt engineering.

Although prompt engineering is a widely adopted and extensively researched area, it suffers from conflicting terminology and a fragmented ontological understanding of what constitutes an effective prompt due to its relatively recent emergence.

The authors establish a structured understanding of prompt engineering by assembling a taxonomy of prompting techniques and analyzing their applications. They present a detailed vocabulary of 33 terms, a taxonomy of 58 LLM prompting techniques, and 40 techniques for other modalities.

Additionally, the authors provide best practices and guidelines for prompt engineering, including advice for prompting state-of-the-art (SOTA) LLMs such as ChatGPT. They further present a meta-analysis of the entire literature on natural language prefix-prompting. As a culmination of these efforts, this paper presents the most comprehensive survey on prompt engineering to date.

Link: arXiv

Navigational hashtags: #armarticles
General hashtags: #promptengineering #prompts #prompt #llm

@data_science_weekly
👍6
Linear Algebra for Data Science by Prof. Wanmo Kang and Prof. Kyunghyun Cho

The authors have been discussing over the past few years how they should teach linear algebra to students in this new era of data science and artificial intelligence.

Over these discussions, which also led to some research collaboration, they realized that the central concepts from linear algebra invoked frequently in practice, if not every day, were projection and, consequently, singular value decomposition (SVD), as well as, less frequently, positive definiteness.

Unfortunately, they noticed that existing courses on linear algebra often focus much more on invertibility (or lack thereof), to the point that many concepts are introduced not in order of their practicality or usefulness but in order of convenience in mathematical derivations/introductions.

They began to wonder whether they could introduce concepts and results in linear algebra in a radically different way.

So, here’s a new textbook on linear algebra, where they re-imagined how and in which order linear algebra could be taught.

Links:
- Site
- Book

Navigational hashtags: #armbooks
General hashtags: #math #mathematics #linearalgebra

@data_science_weekly
👍3
Problem Solving with Algorithms and Data Structures using Python by Brad Miller and David Ranum, Luther College

This textbook is about computer science. It is also about Python. However, there is much more.

The study of algorithms and data structures is central to understanding what computer science is all about. Learning computer science is not unlike learning any other type of difficult subject matter. The only way to be successful is through deliberate and incremental exposure to the fundamental ideas. A beginning computer scientist needs practice so that there is a thorough understanding before continuing on to the more complex parts of the curriculum. In addition, a beginner needs to be given the opportunity to be successful and gain confidence.

This textbook is designed to serve as a text for a first course on data structures and algorithms, typically taught as the second course in the computer science curriculum. Even though the second course is considered more advanced than the first course, this book assumes you are beginners at this level. You may still be struggling with some of the basic ideas and skills from a first computer science course and yet be ready to further explore the discipline and continue to practice problem solving.

The authors cover abstract data types and data structures, writing algorithms, and solving problems. They look at a number of data structures and solve classic problems that arise. The tools and techniques that you learn here will be applied over and over as you continue your study of computer science.

Links:
- Site
- Book

Navigational hashtags: #armbooks #armcourses
General hashtags: #python #algorithms #datastructures #programming #cs #computerscience

@data_science_weekly
👍5
Deep Learning and Computational Physics by Deep Ray, Orazio Pinti, Assad A. Oberai

These notes were compiled as lecture notes for a course developed and taught at the University of Southern California. They should be accessible to a typical engineering graduate student with a strong background in Applied Mathematics.

The main objective of these notes is to introduce a student who is familiar with concepts in linear algebra and partial differential equations to select topics in deep learning. These lecture notes exploit the strong connections between deep learning algorithms and the more conventional techniques of computational physics to achieve two goals. First, they use concepts from computational physics to develop an understanding of deep learning algorithms. Not surprisingly, many concepts in deep learning can be connected to similar concepts in computational physics, and one can utilize this connection to better understand these algorithms. Second, several novel deep learning algorithms can be used to solve challenging problems in computational physics. Thus, they offer someone who is interested in modeling physical phenomena a complementary set of tools.

Links:
- arXiv
- Book

Navigational hashtags: #armbooks
General hashtags: #dl #deeplearning #physics

@data_science_weekly
👍3
Feature Selection in Machine Learning by Soledad Galli

Feature selection is the process of selecting a subset of features from the total variables in a data set to train machine learning algorithms. Feature selection is an important aspect of data mining and predictive modelling.

Feature selection is key for developing simpler, faster, and highly performant machine learning models and can help to avoid overfitting. The aim of any feature selection algorithm is to create classifiers or regression models that run faster and whose outputs are easier to understand by their users.

In this book, you will find the most widely used feature selection methods to select the best subsets of predictor variables from your data. You will learn about filter, wrapper, and embedded methods for feature selection. Then, you will discover methods designed by computer science professionals or used in data science competitions that are faster or more scalable.

First, we will discuss the use of statistical and univariate algorithms in the context of artificial intelligence. Next, we will cover methods that select features through optimization of the model performance. We will move on to feature selection algorithms that are baked into the machine learning techniques. And finally, we will discuss additional methods designed by data scientists specifically for applied predictive modeling.

In this book, you will find out how to:
- Remove useless and redundant features by examining variability and correlation.
- Choose features based on statistical tests such as ANOVA, chi-square, and mutual information.
- Select features by using Lasso regularization or decision tree based feature importance, which are embedded in the machine learning modeling process.
- Select features by recursive feature elimination, addition, or value permutation.

Each chapter fleshes out various methods for feature selection that share common characteristics. First, you will learn the fundamentals of the feature selection method, and next you will find a Python implementation.

The book comes with an accompanying GitHub repository with the full source code that you can download, modify, and use in your own data science projects and case studies.

Feature selection methods differ from dimensionality reduction methods in that feature selection techniques do not alter the original representation of the variables, but merely select a reduced number of features from the training data that produce performant machine learning models.

Using the Python libraries Scikit-learn, MLXtend, and Feature-engine, you’ll learn how to select the best numerical and categorical features for regression and classification models in just a few lines of code. You will also learn how to make feature selection part of your machine learning workflow.
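
The idea of removing useless and redundant features by examining variability and correlation can be sketched in plain Python. This is an illustrative filter written for this post, not code from the book or from Scikit-learn, MLXtend, or Feature-engine; the function names, the toy data, and the 0.95 correlation threshold are all assumptions.

```python
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def select_features(columns, corr_threshold=0.95):
    """Keep features that vary and are not highly correlated with a kept one.

    `columns` maps feature name -> list of numeric values.
    """
    kept = []
    for name, values in columns.items():
        if pvariance(values) == 0:
            continue  # constant feature: carries no information
        if any(abs(pearson(values, columns[k])) > corr_threshold for k in kept):
            continue  # redundant with a feature that was already kept
        kept.append(name)
    return kept


data = {
    "age":        [25, 32, 47, 51, 62],
    "age_months": [300, 384, 564, 612, 744],  # exactly 12 * age
    "country":    [1, 1, 1, 1, 1],            # constant
    "income":     [30, 80, 45, 90, 40],
}

print(select_features(data))  # ['age', 'income']
```

Note that the selected columns are returned unchanged, which matches the distinction drawn above between feature selection and dimensionality reduction.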

Link:
- Book

Navigational hashtags: #armbooks
General hashtags: #ml #machinelearning #featureselection #fs

@data_science_weekly
👍7
SQL Tutorial

Learn to answer questions with data using SQL. No coding experience necessary.
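
For a taste of what answering a question with SQL looks like, here is a small sketch that runs a query through Python's built-in `sqlite3` module. The `orders` table and its rows are made up for illustration and do not come from the tutorial. The question being asked: which customer spent the most?

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 20.0), ("ben", 35.0), ("ana", 40.0), ("cy", 10.0)],
)

# Total spend per customer, highest first.
rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()

print(rows)  # [('ana', 60.0), ('ben', 35.0), ('cy', 10.0)]
conn.close()
```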

Link: Site

Navigational hashtags: #armknowledgesharing #armsites #armcourses
General hashtags: #sql

@data_science_weekly
👍6
Recommenders

The objective of Recommenders is to assist researchers, developers and enthusiasts in prototyping, experimenting with and bringing to production a range of classic and state-of-the-art recommendation systems.

Recommenders is a project under the Linux Foundation of AI and Data.

This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks:
- Prepare Data: Preparing and loading data for each recommendation algorithm.
- Model: Building models using various classical and deep learning recommendation algorithms such as Alternating Least Squares (ALS) or eXtreme Deep Factorization Machines (xDeepFM).
- Evaluate: Evaluating algorithms with offline metrics.
- Model Select and Optimize: Tuning and optimizing hyperparameters for recommendation models.
- Operationalize: Operationalizing models in a production environment on Azure.
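
The "Evaluate" task can be illustrated with one common offline metric, precision at k: the fraction of the top-k recommended items that the user actually interacted with in held-out data. The sketch below is plain Python written for this post; it does not use the Recommenders API, and the function name and sample data are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommended items that appear in `relevant`."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


recommended = ["item_a", "item_b", "item_c", "item_d"]  # ranked model output
relevant = {"item_b", "item_d", "item_x"}               # held-out interactions

print(precision_at_k(recommended, relevant, k=2))  # 0.5
print(precision_at_k(recommended, relevant, k=4))  # 0.5
```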

Several utilities are provided in recommenders to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. See the Recommenders documentation.

For a more detailed overview of the repository, please see the documents on the wiki page.

For some of the practical scenarios where recommendation systems have been applied, see scenarios.

Link: Repository

Navigational hashtags: #armknowledgesharing #armrepo
General hashtags: #recsys #recommendersystems #recommenders

@data_science_weekly
👍4
CS50’s Introduction to Programming with Python by Harvard

An introduction to programming using a language called Python. Learn how to read and write code as well as how to test and “debug” it. Designed for students with or without prior programming experience who’d like to learn Python specifically.

Learn about functions, arguments, and return values (oh my!); variables and types; conditionals and Boolean expressions; and loops. Learn how to handle exceptions, find and fix bugs, and write unit tests; use third-party libraries; validate and extract data with regular expressions; model real-world entities with classes, objects, methods, and properties; and read and write files.

Hands-on opportunities for lots of practice. Exercises inspired by real-world programming problems.

No software required except for a web browser, or you can write code on your own PC or Mac.

Link: Course

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #python

@data_science_weekly
👍3
Interpreting Machine Learning Models With SHAP. A Guide With Python Examples And Theory On Shapley Values by Christoph Molnar

Machine learning is transforming fields from healthcare diagnostics to climate change predictions through its predictive performance. However, these complex machine learning models often lack interpretability, which is becoming more essential than ever for debugging, fostering trust, and communicating model insights.

Introducing SHAP, the Swiss army knife of machine learning interpretability:
- SHAP can be used to explain individual predictions.
- By combining explanations for individual predictions, SHAP allows you to study the overall model behavior.
- SHAP is model-agnostic – it works with any model, from simple linear regression to deep learning.
- With its flexibility, SHAP can handle various data formats, whether it’s tabular, image, or text.
- The Python package shap makes the application of SHAP for model interpretation easy.

This book will be your comprehensive guide to mastering the theory and application of SHAP. It starts with the quite fascinating origin in game theory and explores what splitting taxi costs has to do with explaining machine learning predictions. Starting with using SHAP to explain a simple linear regression model, the book progressively introduces SHAP for more complex models. You’ll learn the ins and outs of the most popular explainable AI method and how to apply it using the shap package.
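
The taxi idea can be made concrete with a short sketch. Suppose three passengers share a ride and the fare for any group equals the fare to the farthest destination in that group (the names and fares below are invented for illustration, not taken from the book). A passenger's Shapley value is their marginal cost averaged over every order in which the passengers could join the ride. This brute-force enumeration is exact but grows factorially with the number of players, so it only suits tiny examples.

```python
from itertools import permutations

# Hypothetical fare to each passenger's destination when riding alone.
solo_fare = {"ann": 6.0, "bob": 12.0, "cat": 42.0}


def cost(group):
    """Fare for a shared ride: the cost of reaching the farthest destination."""
    return max((solo_fare[p] for p in group), default=0.0)


def shapley_values(players):
    """Average each player's marginal cost over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        riders = []
        for p in order:
            before = cost(riders)
            riders.append(p)
            totals[p] += cost(riders) - before
    return {p: total / len(orders) for p, total in totals.items()}


values = shapley_values(list(solo_fare))
print(values)  # {'ann': 2.0, 'bob': 5.0, 'cat': 35.0}
```

The three shares add up to the full fare of 42, which mirrors how SHAP values for a single prediction add up to the difference between that prediction and the average prediction.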

In a world where interpretability is key, this book is your roadmap to mastering SHAP, for machine learning models that are not only accurate but also interpretable.

Links:
- Paperback
- eBook

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ml #machinelearning #shap #interpretability #python #shapley #shapleyvalues

@data_science_weekly
👍8
Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin

Even bad code can function. But if code isn’t clean, it can bring a development organization to its knees. Every year, countless hours and significant resources are lost because of poorly written code. But it doesn’t have to be that way.

Noted software expert Robert C. Martin presents a revolutionary paradigm with Clean Code: A Handbook of Agile Software Craftsmanship. Martin, who has helped bring agile principles from a practitioner’s point of view to tens of thousands of programmers, has teamed up with his colleagues from Object Mentor to distill their best agile practice of cleaning code “on the fly” into a book that will instill within you the values of a software craftsman, and make you a better programmer, but only if you work at it.

What kind of work will you be doing? You’ll be reading code, lots of code. And you will be challenged to think about what’s right about that code, and what’s wrong with it. More importantly, you will be challenged to reassess your professional values and your commitment to your craft.

Clean Code is divided into three parts. The first describes the principles, patterns, and practices of writing clean code. The second part consists of several case studies of increasing complexity. Each case study is an exercise in cleaning up code―of transforming a code base that has some problems into one that is sound and efficient. The third part is the payoff: a single chapter containing a list of heuristics and “smells” gathered while creating the case studies. The result is a knowledge base that describes the way we think when we write, read, and clean code.

Readers will come away from this book understanding:
- How to tell the difference between good and bad code
- How to write good code and how to transform bad code into good code
- How to create good names, good functions, good objects, and good classes
- How to format code for maximum readability
- How to implement complete error handling without obscuring code
- How to unit test and practice test-driven development
- What “smells” and heuristics can help you identify bad code

This book is a must for any developer, software engineer, project manager, team lead, or systems analyst with an interest in producing better code.

Link: Paperback

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #development #cleancode

@data_science_weekly
👍8
A new perspective on Shapley values, part I: Intro to Shapley and SHAP by Edden Gerber

This post is the first in a series of two posts about explaining statistical models with Shapley values.

There are two main reasons you might want to read it:
1. To learn about Shapley values and the SHAP python library.
This is what this post is about after all. The explanations it provides are far from exhaustive, and contain nothing that cannot be gathered from other online sources, but it should still serve as a good quick intro or bonus reading on this subject.
2. As an introduction or refresher before reading the next post about Naive Shapley values.
The next post is my attempt at a novel contribution to the topic of Shapley values in machine learning. You may be already familiar with SHAP and Shapley and are just glancing over this post to make sure we’re on common ground, or you may be here to clear up something confusing from the next post.

Link: Post

Navigational hashtags: #armknowledgesharing #armsites
General hashtags: #shap #shapley #interpretation #ml #python

@data_science_weekly
👍7
Robotics Course by Hugging Face 🤗

This free course will take you on a journey, from classical robotics to modern learning-based approaches, in understanding, implementing, and applying machine learning techniques to real robotic systems.

This course is based on the Robot Learning Tutorial, which is a comprehensive guide to robot learning for researchers and practitioners. Here, we are attempting to distill the tutorial into a more accessible format for the community.

This first unit will help you onboard. You’ll see the course syllabus and learning objectives, understand the structure and prerequisites, meet the team behind the course, learn about LeRobot and the surrounding Hugging Face ecosystem, and explore the community resources that support your journey.

This course bridges theory and practice in Robotics! It's designed for students interested in understanding how machine learning is transforming robotics. Whether you're new to robotics or looking to understand learning-based approaches, this course will guide you step by step.


What to expect from this course?

Across the course you will study classical robotics foundations and modern learning‑based approaches, learn to use LeRobot, work with real robotics datasets, and implement state‑of‑the‑art algorithms. The emphasis is on practical skills you can apply to real robotic systems.

At the end of this course, you'll understand:
- How robots learn from data
- Why learning-based approaches are transforming robotics
- How to implement these techniques using modern tools like LeRobot

What's the syllabus?

Here is the general syllabus for the robotics course. Each unit builds on the previous ones to give you a comprehensive understanding of Robotics.
- Course Introduction. Welcome, prerequisites, and course overview
- Introduction to Robotics. Why Robotics matters and LeRobot ecosystem
- Classical Robotics. Traditional approaches and their limitations
- Reinforcement Learning. How robots learn through trial and error
- Imitation Learning. Learning from demonstrations and behavioral cloning
- Foundation Models. Large-scale models for general robotics

Link: Course

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #robotics #rf #reinforcementlearning #foundationalmodels #hf #huggingface

@data_science_weekly
👍6
A new perspective on Shapley values, part II: The Naïve Shapley method by Edden Gerber

Why should you read this post?
1. For insight into Shapley values and the SHAP tool.
Most other sources on these topics are explanations based on existing primary sources (e.g. academic papers and the SHAP documentation). This post is an attempt to gain some understanding through an empirical approach.
2. To learn about an alternative approach to computing Shapley values that, under some (limited) circumstances, may be preferable to SHAP.
If you are unfamiliar with Shapley values or SHAP, or want a short recap of how the SHAP explainers work, check out the previous post. In a hurry? The author has emphasized the key sentences in bold to assist your speed-reading.

Link: Post

Navigational hashtags: #armknowledgesharing #armsites
General hashtags: #shap #shapley #interpretation #ml #python

@data_science_weekly
👍6