Artem Ryblov’s Data Science Weekly
@artemfisherman’s Data Science Weekly: Elevate your expertise with a standout data science resource each week, carefully chosen for depth and impact.

Long-form content: https://artemryblov.substack.com
The Book of Statistical Proofs

A centralized, open and collaboratively edited archive of statistical theorems for the computational sciences!

Link: Site

Navigational hashtags: #armbooks #armsites
General hashtags: #statistics

@data_science_weekly
👍7
Machine Learning Refined. Foundations, Algorithms, and Applications by Jeremy Watt, Reza Borhani and Aggelos K. Katsaggelos

Now more than ever, it is crucial to understand the core foundations of AI and machine learning. True mastery of a subject means understanding its tenets from multiple, complementary angles. 

Ideally, this means being able to explain what you know intuitively.  
- Being able to draw a picture of an idea plainly on a cocktail napkin.
- Being able to recall key formulae that rigorously support or define an idea. 
- And finally, being able to apply a concept practically, in code.
 
This book aims to lead you towards this mastery of AI fundamentals by explaining every concept intuitively first, visually second, mathematically third, and fourth in code. In that order.  For every major concept.

Links:
- Site
- GitHub

Navigational hashtags: #armbooks #armsites
General hashtags: #ml #machinelearning #optimization #regression #classification #nn #neuralnetworks #trees

@data_science_weekly
👍4
Introduction to Machine Learning by Laurent Younes

This book introduces the mathematical foundations and techniques that lead to the development and analysis of many of the algorithms that are used in machine learning.

It starts with an introductory chapter that describes the notation used throughout the book, serves as a reminder of basic concepts in calculus, linear algebra and probability, and introduces some measure-theoretic terminology, which can be used as a reading guide for the sections that use these tools. The introductory chapters also provide background material on matrix analysis and optimization. The latter chapter provides theoretical support to many algorithms that are used in the book, including stochastic gradient descent, proximal methods, etc.

After discussing basic concepts for statistical prediction, the book includes an introduction to reproducing kernel theory and Hilbert space techniques, which are used in many places, before addressing the description of various algorithms for supervised statistical learning, including linear methods, support vector machines, decision trees, boosting, or neural networks.

The subject then switches to generative methods, starting with a chapter that presents sampling methods and an introduction to the theory of Markov chains.

The following chapters describe the theory of graphical models and give an introduction to variational methods for models with latent variables and to deep-learning-based generative models.

The next chapters focus on unsupervised learning methods, for clustering, factor analysis and manifold learning.

The final chapter of the book is theory-oriented and discusses concentration inequalities and generalization bounds.

Links:
- arXiv
- pdf

Navigational hashtags: #armbooks
General hashtags: #ml #machinelearning #optimization #regression #classification #nn #neuralnetworks #trees

@data_science_weekly
👍7
Multimodal Deep Learning

In the last few years, there have been several breakthroughs in the methodologies used in Natural Language Processing (NLP) as well as Computer Vision (CV). Beyond these improvements on single-modality models, large-scale multi-modal approaches have become a very active area of research.

In this seminar, the authors reviewed these approaches and attempted to create a solid overview of the field, starting with the current state-of-the-art approaches in the two subfields of Deep Learning individually.

- Further, modeling frameworks are discussed where one modality is transformed into the other (Chapter 3.1 and Chapter 3.2), as well as models in which one modality is utilized to enhance representation learning for the other (Chapter 3.3 and Chapter 3.4). To conclude the second part, architectures with a focus on handling both modalities simultaneously are introduced (Chapter 3.5).
- Finally, they also cover other modalities (Chapter 4.1 and Chapter 4.2) as well as general-purpose multi-modal models (Chapter 4.3), which are able to handle different tasks on different modalities within one unified architecture. One interesting application (Generative Art, Chapter 4.4) eventually caps off this booklet.

Links:
- Book

Navigational hashtags: #armbooks #armsite
General hashtags: #dl #deeplearning #nlp #cv

@data_science_weekly
👍4
The Tensor Cookbook by Thomas Dybdahl Ahle

What are Tensor Diagrams? Machine learning involves a lot of tensor manipulation, and it's easy to lose track of the larger structure when manipulating high-dimensional data using notation designed for vectors and matrices.
Graphical notation (first introduced by Roger Penrose in 1971) reduces the mental overhead and makes the connections "come alive".

In short, each edge is the index of a tensor, and connecting two edges contracts the tensors over this dimension. After a bit of practice, this becomes incredibly intuitive.
The Tensor Cookbook aims to popularize tensor diagrams by rewriting the classical "Matrix Cookbook". You can use it as a reference book, skip around for some cool diagrams, or treat it as a crash course full of exercises to practice your skills.
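
To make the contraction rule concrete, here is a minimal sketch using NumPy's `einsum`. This is our own choice of tool, not something the Cookbook prescribes, and the array names and shapes are made up for illustration. Each letter in the subscript string plays the role of an edge; a letter shared by two tensors and absent from the output is a connected edge that gets contracted.

```python
import numpy as np

rng = np.random.default_rng(0)

# A has two edges (indices i, j); B has two edges (indices j, k).
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))

# Connecting the shared edge j contracts over it: ordinary matrix multiplication.
C = np.einsum("ij,jk->ik", A, B)

# A diagram with no free edges is a scalar. Joining both edges of A to
# both edges of a copy of A gives the squared Frobenius norm.
frob_sq = np.einsum("ij,ij->", A, A)

# A three-edge tensor contracted with a vector over its last edge
# leaves two free edges, i.e. a matrix.
T = rng.normal(size=(2, 3, 4))
v = rng.normal(size=4)
M = np.einsum("ijk,k->ij", T, v)
```

The number of free letters after `->` is the number of dangling edges in the corresponding diagram, which is a quick way to sanity-check the shape of a result.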

Links:
- Book
- Site

Navigational hashtags: #armbooks #armsite
General hashtags: #tensor #matrix #derivative #statistics #probability #ml

@data_science_weekly
👍5
Part 2: Deep Learning from the Foundations by fast.ai

This course shows how to build a state-of-the-art deep learning model from scratch.

It takes you all the way from the foundations of implementing matrix multiplication and back-propagation, through to high performance mixed-precision training, to the latest neural network architectures and learning techniques, and everything in between.

It covers many of the most important academic papers that form the foundations of modern deep learning, using “code-first” teaching, where each method is implemented from scratch in Python and explained in detail (in the process, we’ll discuss many important software engineering techniques too). Before starting this part, you need to have completed Part 1: Practical Deep Learning for Coders.
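
For a flavor of what "from scratch" means at the very first step, here is a minimal sketch of matrix multiplication in plain Python. It is our own illustration, not code taken from the course, and the function name `matmul` is just a convenient label.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows, using plain Python loops."""
    n, m, p = len(a), len(b), len(b[0])
    assert all(len(row) == m for row in a), "inner dimensions must match"
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):          # rows of a
        for j in range(p):      # columns of b
            for k in range(m):  # shared inner dimension
                out[i][j] += a[i][k] * b[k][j]
    return out


product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The triple loop is slow but makes the definition explicit, which is a useful baseline before moving on to vectorized or library versions.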

The first five lessons use Python, PyTorch, and the fastai library; the last two lessons use Swift for TensorFlow, and are co-taught with Chris Lattner, the original creator of Swift, clang, and LLVM.

The purpose of Deep Learning from the Foundations is, in some ways, the opposite of Part 1. This time, we’re not learning practical things that we will use right away, but are learning foundations that we can build on. This is particularly important nowadays because this field is moving so fast. In this new course, we will learn to implement a lot of things that are inside the fastai and PyTorch libraries. In fact, we’ll be reimplementing a significant subset of the fastai library! Along the way, we will practice implementing papers, which is an important skill to master when making state-of-the-art models.

Link: Course

Navigational hashtags: #armcourses
General hashtags: #dl #deeplearning #ml #machinelearning

@data_science_weekly
👍5
Personalized Machine Learning by Julian McAuley

Every day we interact with machine learning systems that personalize their predictions to individual users, whether to recommend movies, find new friends or dating partners, or organize our news feeds. Such systems involve several modalities of data, ranging from sequences of clicks or purchases, to rich modalities involving text, images, or social interactions.

While settings and data modalities vary significantly, in this book we introduce a common set of principles and methods that underpin the design of personalized predictive models.

The book begins by revising "traditional" machine learning models, with a special focus on how they should be adapted to settings involving user data. Later, we'll develop techniques based on more advanced principles such as matrix factorization, deep learning, and generative modeling. Finally, we conclude with a detailed study of the consequences and risks of deploying personalized predictive systems.

By understanding the principles behind personalized machine learning, readers will gain the ability to design models and systems for a wide range of applications involving user data. A series of case-studies will help readers understand the importance of personalization in domains ranging from e-commerce to personalized health, and hands-on projects and code examples (and an online supplement) will give readers experience working with large-scale real-world datasets.

Link: Book

Navigational hashtags: #armbooks #armsite
General hashtags: #ml #machinelearning #regression #classification #recommendation #recsys #nlp

@data_science_weekly
👍7
Hey! 👋

We have some exciting news! Telegram offers a fantastic feature called auto-translation for posts, which would make our channel accessible to a much wider global audience 🌍

But here's the catch: To unlock this feature for our channel, we need Telegram Premium users to boost us! 🔋

How you can help (if you're a Premium user):

1. Tap on the channel name at the top.
2. Select "Boost Channel" (or find it in the channel menu).
3. Choose how many boosts you'd like to contribute (even 1 helps!).
4. Confirm – it's quick and free for Premium users!

Or simply use this link to boost!

Why boosting matters:

- 🌐 Break Language Barriers: Auto-translation will instantly translate our posts into your preferred language, making our content accessible to everyone.
- 💡 Share Knowledge Widely: Reach more people who can benefit from what we share here.
- 🚀 Grow Together: Help our community expand and become even more vibrant!

We're currently at 4 boosts (Level 2). Our goal is Level 4 to unlock auto-translation! Every single boost from a Premium user gets us closer.

To our amazing Premium members: Your boosts are incredibly valuable! If you find this channel useful and want to help us reach more people globally, please consider boosting us. It makes a huge difference! 🙏

To everyone else: Even if you're not Premium, you can still help massively! Please share this message with friends or groups who are Premium users and might be willing to support us. 🤝

Let's unlock the power of translation together! Thank you for being such a fantastic community!

With gratitude,
Artem Ryblov
👍3
python-patterns

A collection of design patterns and idioms in Python.

Remember that each pattern has its own trade-offs. Pay more attention to why you're choosing a certain pattern than to how to implement it.

Link: GitHub

Navigational hashtags: #armsite
General hashtags: #python #programming #patterns #development #engineering

@data_science_weekly
👍6
How to avoid machine learning pitfalls by Michael A. Lones

Mistakes in machine learning practice are commonplace, and can result in a loss of confidence in the findings and products of machine learning.

This guide outlines common mistakes that occur when using machine learning, and what can be done to avoid them.

Whilst it should be accessible to anyone with a basic understanding of machine learning techniques, it focuses on issues that are of particular concern within academic research, such as the need to do rigorous comparisons and reach valid conclusions.

It covers five stages of the machine learning process:
- What to do before model building
- How to reliably build models
- How to robustly evaluate models
- How to compare models fairly
- How to report results

Link: arXiv

Navigational hashtags: #armarticles
General hashtags: #ml #machinelearning #mlsystemdesign

@data_science_weekly
👍5
Deep Learning Fundamentals by Sebastian Raschka and Lightning AI

Deep Learning Fundamentals is a free course on learning deep learning using a modern open-source stack.

If you found this page, you probably heard that artificial intelligence and deep learning are taking the world by storm. This is correct. In this course, Sebastian Raschka, a best-selling author and professor, will teach you deep learning (machine learning with deep neural networks) from the ground up via a course of 10 units with bite-sized videos, quizzes, and exercises. The entire course is free and uses the most popular open-source tools for deep learning.

What will you learn in this course?
- What machine learning is and when to use it
- The main concepts of deep learning
- How to design deep learning experiments with PyTorch
- How to write efficient deep learning code with PyTorch Lightning

What will you be able to do after this course?
- Build classifiers for various kinds of data like tables, images, and text
- Tune models effectively to optimize predictive and computational performance

How is this course structured?
- The course consists of 10 units, each containing several subsections
- It is centered around informative, succinct videos that are respectful of your time
- In each unit, you will find optional exercises to practice your knowledge
- We also provide additional resources for those who want a deep dive on specific topics

What are the prerequisites?
- Ideally, you should already be familiar with programming in Python
- (Some lectures will involve a tiny bit of math, but a strong math background is not required!)

Are there interactive quizzes or exercises?
- Each section is accompanied by optional multiple-choice quizzes to test your understanding of the material
- Optionally, each unit also features one or more code exercises to practice implementing concepts covered in this class

Is there a course completion badge or certificate?
- At the end of this course, you can take an optional exam featuring 25 multiple-choice questions
- Upon answering 80% of the questions in the exam correctly (there are 5 attempts), you obtain an optional course completion badge that can be shared on LinkedIn

Link: Course

Navigational hashtags: #armcourses
General hashtags: #dl #deeplearning #pytorch #ligthning

@data_science_weekly
👍7
The Prompt Report: A Systematic Survey of Prompt Engineering Techniques

Generative Artificial Intelligence (GenAI) systems are increasingly being deployed across diverse industries and research domains. Developers and end-users interact with these systems through the use of prompting and prompt engineering.

Although prompt engineering is a widely adopted and extensively researched area, it suffers from conflicting terminology and a fragmented ontological understanding of what constitutes an effective prompt due to its relatively recent emergence.

The authors establish a structured understanding of prompt engineering by assembling a taxonomy of prompting techniques and analyzing their applications. They present a detailed vocabulary of 33 terms, a taxonomy of 58 LLM prompting techniques, and 40 techniques for other modalities.

Additionally, the authors provide best practices and guidelines for prompt engineering, including advice for prompting state-of-the-art (SOTA) LLMs such as ChatGPT. They further present a meta-analysis of the entire literature on natural language prefix-prompting. As a culmination of these efforts, this paper presents the most comprehensive survey on prompt engineering to date.

Link: ArXiv

Navigational hashtags: #armarticles
General hashtags: #promptengineering #prompts #prompt #llm

@data_science_weekly
👍6
Linear Algebra for Data Science by Prof. Wanmo Kang and Prof. Kyunghyun Cho

The authors have been discussing over the past few years how they should teach linear algebra to students in this new era of data science and artificial intelligence.

Over these discussions, which also led to some research collaboration, they realized that the central concepts from linear algebra invoked frequently in practice, if not every day, were projection and, consequently, singular value decomposition (SVD), as well as, less frequently, positive definiteness.

Unfortunately, they noticed that existing courses on linear algebra often focus much more on the invertibility (or lack thereof), to the point that many concepts are introduced not in the order of their practicality nor usefulness but in the order of the conveniences in mathematical derivations/introductions.

They began to wonder whether they could introduce concepts and results in linear algebra in a radically different way.

So, here’s a new textbook on linear algebra, where they re-imagined how and in which order linear algebra could be taught.
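
As a small illustration of how the three concepts named above fit together, here is a sketch in NumPy. It is our own example rather than material from the textbook, and the matrix `A` and vector `b` are random placeholders. The thin SVD yields an orthonormal basis of the column space, which gives the orthogonal projection directly, and the squared singular values are the (positive) eigenvalues of `A.T @ A`.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))  # columns span a 2-D subspace of R^5
b = rng.normal(size=5)

# Thin SVD: the columns of U form an orthonormal basis of the column space of A.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Orthogonal projection of b onto the column space of A.
P = U @ U.T
b_hat = P @ b

# The same projection, obtained from the least-squares solution.
x, *_ = np.linalg.lstsq(A, b, rcond=None)

# Positive definiteness: the eigenvalues of A^T A are the squared singular values.
eigvals = np.linalg.eigvalsh(A.T @ A)
```

The residual `b - b_hat` is orthogonal to every column of `A`, which is exactly what makes `b_hat` the closest point to `b` in that subspace.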

Links:
- Site
- Book

Navigational hashtags: #armbooks
General hashtags: #math #mathematics #linearalgebra

@data_science_weekly
👍3
Problem Solving with Algorithms and Data Structures using Python by Brad Miller and David Ranum, Luther College

This textbook is about computer science. It is also about Python. However, there is much more.

The study of algorithms and data structures is central to understanding what computer science is all about. Learning computer science is not unlike learning any other type of difficult subject matter. The only way to be successful is through deliberate and incremental exposure to the fundamental ideas. A beginning computer scientist needs practice so that there is a thorough understanding before continuing on to the more complex parts of the curriculum. In addition, a beginner needs to be given the opportunity to be successful and gain confidence.

This textbook is designed to serve as a text for a first course on data structures and algorithms, typically taught as the second course in the computer science curriculum. Even though the second course is considered more advanced than the first course, this book assumes you are beginners at this level. You may still be struggling with some of the basic ideas and skills from a first computer science course and yet be ready to further explore the discipline and continue to practice problem solving.

The authors cover abstract data types and data structures, writing algorithms, and solving problems. They look at a number of data structures and solve classic problems that arise. The tools and techniques that you learn here will be applied over and over as you continue your study of computer science.

Links:
- Site
- Book

Navigational hashtags: #armbooks #armcourses
General hashtags: #python #algorithms #datastructures #programming #cs #computerscience

@data_science_weekly
👍5
Deep Learning and Computational Physics by Deep Ray, Orazio Pinti, Assad A. Oberai

These notes were compiled as lecture notes for a course developed and taught at the University of Southern California. They should be accessible to a typical engineering graduate student with a strong background in Applied Mathematics.

The main objective of these notes is to introduce a student who is familiar with concepts in linear algebra and partial differential equations to select topics in deep learning. These lecture notes exploit the strong connections between deep learning algorithms and the more conventional techniques of computational physics to achieve two goals. First, they use concepts from computational physics to develop an understanding of deep learning algorithms. Not surprisingly, many concepts in deep learning can be connected to similar concepts in computational physics, and one can utilize this connection to better understand these algorithms. Second, several novel deep learning algorithms can be used to solve challenging problems in computational physics. Thus, they offer someone who is interested in modeling physical phenomena a complementary set of tools.

Links:
- ArXiv
- Book

Navigational hashtags: #armbooks
General hashtags: #dl #deeplearning #physics

@data_science_weekly
👍3
Feature Selection in Machine Learning by Soledad Galli

Feature selection is the process of selecting a subset of features from the total variables in a data set to train machine learning algorithms. Feature selection is an important aspect of data mining and predictive modelling.

Feature selection is key for developing simpler, faster, and highly performant machine learning models and can help to avoid overfitting. The aim of any feature selection algorithm is to create classifiers or regression models that run faster and whose outputs are easier to understand by their users.

In this book, you will find the most widely used feature selection methods to select the best subsets of predictor variables from your data. You will learn about filter, wrapper, and embedded methods for feature selection. Then, you will discover methods designed by computer science professionals or used in data science competitions that are faster or more scalable.

First, we will discuss the use of statistical and univariate algorithms in the context of artificial intelligence. Next, we will cover methods that select features through optimization of the model performance. We will move on to feature selection algorithms that are baked into the machine learning techniques. And finally, we will discuss additional methods designed by data scientists specifically for applied predictive modeling.

In this book, you will find out how to:
- Remove useless and redundant features by examining variability and correlation.
- Choose features based on statistical tests such as ANOVA, chi-square, and mutual information.
- Select features by using Lasso regularization or decision tree based feature importance, which are embedded in the machine learning modeling process.
- Select features by recursive feature elimination, addition, or value permutation.

Each chapter fleshes out various methods for feature selection that share common characteristics. First, you will learn the fundamentals of the feature selection method, and next you will find a Python implementation.

The book comes with an accompanying GitHub repository with the full source code that you can download, modify, and use in your own data science projects and case studies.

Feature selection methods differ from dimensionality reduction methods in that feature selection techniques do not alter the original representation of the variables, but merely select a reduced number of features from the training data that produce performant machine learning models.

Using the Python libraries Scikit-learn, MLXtend, and Feature-engine, you’ll learn how to select the best numerical and categorical features for regression and classification models in just a few lines of code. You will also learn how to make feature selection part of your machine learning workflow.
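
To give a feel for the simplest kind of filter method mentioned above (removing features by examining variability and correlation), here is a minimal NumPy-only sketch. It is not code from the book, which relies on Scikit-learn, MLXtend, and Feature-engine; the function name `filter_features`, its thresholds, and the toy data are all hypothetical.

```python
import numpy as np


def filter_features(X, var_tol=1e-12, corr_threshold=0.95):
    """Return indices of columns kept by a simple variance + correlation filter."""
    X = np.asarray(X, dtype=float)
    # Step 1: drop (quasi-)constant columns, which carry no information.
    candidates = [j for j in range(X.shape[1]) if X[:, j].var() > var_tol]
    # Step 2: greedily drop columns highly correlated with one already kept.
    kept = []
    for j in candidates:
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) > corr_threshold
            for k in kept
        )
        if not redundant:
            kept.append(j)
    return kept


rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = rng.normal(size=200)
X = np.column_stack([
    x0,                                      # informative
    np.ones(200),                            # constant, so it is dropped
    x1,                                      # informative
    2.0 * x0 + 0.01 * rng.normal(size=200),  # near-duplicate of column 0
])
selected = filter_features(X)
print(selected)  # [0, 2]
```

Note that the selected columns are returned unchanged, which is the difference from dimensionality reduction described earlier: the original variables are kept, only fewer of them.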

Link:
- Book

Navigational hashtags: #armbooks
General hashtags: #ml #machinelearning #featureselection #fs

@data_science_weekly
👍7
SQL Tutorial

Learn to answer questions with data using SQL. No coding experience necessary.

Link: Site

Navigational hashtags: #armknowledgesharing #armsites #armcourses
General hashtags: #sql

@data_science_weekly
👍6
Recommenders

The objective of Recommenders is to assist researchers, developers and enthusiasts in prototyping, experimenting with and bringing to production a range of classic and state-of-the-art recommendation systems.

Recommenders is a project under the Linux Foundation of AI and Data.

This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks:
- Prepare Data: Preparing and loading data for each recommendation algorithm.
- Model: Building models using various classical and deep learning recommendation algorithms such as Alternating Least Squares (ALS) or eXtreme Deep Factorization Machines (xDeepFM).
- Evaluate: Evaluating algorithms with offline metrics.
- Model Select and Optimize: Tuning and optimizing hyperparameters for recommendation models.
- Operationalize: Operationalizing models in a production environment on Azure.

Several utilities are provided in recommenders to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. See the Recommenders documentation.

For a more detailed overview of the repository, please see the documents on the wiki page.

For some of the practical scenarios where recommendation systems have been applied, see scenarios.

Link: Repository

Navigational hashtags: #armknowledgesharing #armrepo
General hashtags: #recsys #recommendersystems #recommenders

@data_science_weekly
👍4
CS50’s Introduction to Programming with Python by Harvard

An introduction to programming using a language called Python. Learn how to read and write code as well as how to test and “debug” it. Designed for students with or without prior programming experience who’d like to learn Python specifically.

Learn about functions, arguments, and return values (oh my!); variables and types; conditionals and Boolean expressions; and loops. Learn how to handle exceptions, find and fix bugs, and write unit tests; use third-party libraries; validate and extract data with regular expressions; model real-world entities with classes, objects, methods, and properties; and read and write files.

Hands-on opportunities for lots of practice. Exercises inspired by real-world programming problems.

No software required except for a web browser, or you can write code on your own PC or Mac.

Link: Course

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #python

@data_science_weekly
👍3
Interpreting Machine Learning Models With SHAP. A Guide With Python Examples And Theory On Shapley Values by Christoph Molnar

Machine learning models are transforming fields from healthcare diagnostics to climate change predictions through their predictive performance. However, these complex models often lack interpretability, which is becoming more essential than ever for debugging, fostering trust, and communicating model insights.

Introducing SHAP, the Swiss army knife of machine learning interpretability:
- SHAP can be used to explain individual predictions.
- By combining explanations for individual predictions, SHAP allows you to study the overall model behavior.
- SHAP is model-agnostic – it works with any model, from simple linear regression to deep learning.
- With its flexibility, SHAP can handle various data formats, whether it’s tabular, image, or text.
- The Python package shap makes the application of SHAP for model interpretation easy.

This book will be your comprehensive guide to mastering the theory and application of SHAP. It starts with the quite fascinating origin in game theory and explores what splitting taxi costs has to do with explaining machine learning predictions. Starting with using SHAP to explain a simple linear regression model, the book progressively introduces SHAP for more complex models. You’ll learn the ins and outs of the most popular explainable AI method and how to apply it using the shap package.
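
To see what splitting taxi costs looks like as a computation, here is a minimal sketch of exact Shapley values in plain Python. It is our own toy example and does not use the shap package; the rider names, the fares, and the rule that a shared ride costs as much as the farthest destination are all assumptions made for illustration. In the machine learning analogy, the riders play the role of features and the ride cost plays the role of the prediction.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical solo fares for three riders sharing one taxi.
solo_fare = {"Alice": 6.0, "Bob": 12.0, "Carol": 42.0}


def ride_cost(coalition):
    # Assumption: a shared ride costs as much as its farthest destination.
    return max((solo_fare[p] for p in coalition), default=0.0)


shares = shapley_values(list(solo_fare), ride_cost)
print(shares)  # {'Alice': 2.0, 'Bob': 5.0, 'Carol': 35.0}
```

The shares add up to the full fare of 42, and each rider pays only for the stretch of the trip they are responsible for. Enumerating every ordering grows factorially with the number of players, so this brute-force version is only practical for a handful of them.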

In a world where interpretability is key, this book is your roadmap to mastering SHAP. For machine learning models that are not only accurate but also interpretable.

Links:
- Paperback
- eBook

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ml #machinelearning #shap #interpretability #python #shapley #shapleyvalues

@data_science_weekly
👍8
Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin

Even bad code can function. But if code isn’t clean, it can bring a development organization to its knees. Every year, countless hours and significant resources are lost because of poorly written code. But it doesn’t have to be that way.

Noted software expert Robert C. Martin presents a revolutionary paradigm with Clean Code: A Handbook of Agile Software Craftsmanship. Martin, who has helped bring agile principles from a practitioner’s point of view to tens of thousands of programmers, has teamed up with his colleagues from Object Mentor to distill their best agile practice of cleaning code “on the fly” into a book that will instill within you the values of a software craftsman, and make you a better programmer, but only if you work at it.

What kind of work will you be doing? You’ll be reading code―lots of code. And you will be challenged to think about what’s right about that code, and what’s wrong with it. More importantly you will be challenged to reassess your professional values and your commitment to your craft.

Clean Code is divided into three parts. The first describes the principles, patterns, and practices of writing clean code. The second part consists of several case studies of increasing complexity. Each case study is an exercise in cleaning up code―of transforming a code base that has some problems into one that is sound and efficient. The third part is the payoff: a single chapter containing a list of heuristics and “smells” gathered while creating the case studies. The result is a knowledge base that describes the way we think when we write, read, and clean code.

Readers will come away from this book understanding:
- How to tell the difference between good and bad code
- How to write good code and how to transform bad code into good code
- How to create good names, good functions, good objects, and good classes
- How to format code for maximum readability
- How to implement complete error handling without obscuring code
- How to unit test and practice test-driven development
- What “smells” and heuristics can help you identify bad code

This book is a must for any developer, software engineer, project manager, team lead, or systems analyst with an interest in producing better code.

Link: Paperback

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #development #cleancode

@data_science_weekly
👍8