Artem Ryblov’s Data Science Weekly
618 subscribers
139 photos
163 links
@artemfisherman’s Data Science Weekly: Elevate your expertise with a standout data science resource each week, carefully chosen for depth and impact.

Long-form content: https://artemryblov.substack.com
Deep Learning. Foundations and Concepts by Chris Bishop and Hugh Bishop

"Deep Learning is Springer Nature’s bestselling book of 2024, cementing its position as a cornerstone resource in the field of artificial intelligence."
- Springer Nature

This book offers a comprehensive introduction to the central ideas that underpin deep learning. It is intended both for newcomers to machine learning and for those already experienced in the field. Covering key concepts relating to contemporary architectures and techniques, this essential book equips readers with a robust foundation for potential future specialization. The field of deep learning is undergoing rapid evolution, and therefore this book focusses on ideas that are likely to endure the test of time.

The book is organized into numerous bite-sized chapters, each exploring a distinct topic, and the narrative follows a linear progression, with each chapter building upon content from its predecessors. This structure is well-suited to teaching a two-semester undergraduate or postgraduate machine learning course, while remaining equally relevant to those engaged in active research or in self-study.

A full understanding of machine learning requires some mathematical background and so the book includes a self-contained introduction to probability theory. However, the focus of the book is on conveying a clear understanding of ideas, with emphasis on the real-world practical value of techniques rather than on abstract theory. Complex concepts are therefore presented from multiple complementary perspectives including textual descriptions, diagrams, mathematical formulae, and pseudo-code.

Link: Book

Navigational hashtags: #armbooks
General hashtags: #dl #deeplearning

@data_science_weekly
👍5
Full Speed Python by João Miguel Jones Ventura

This book aims to teach the Python programming language using a practical approach. Its method is quite simple: after a short introduction to each topic, the reader is invited to learn more by solving the proposed exercises.

These exercises have been used extensively in the author's web development and distributed computing classes at the Superior School of Technology of Setúbal. With these exercises, most students are up to speed with Python in less than a month. In fact, students of the distributed computing course, taught in the second year of the software engineering degree, become familiar with Python's syntax in two weeks and are able to implement a distributed client-server application with sockets in the third week.
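The client-server exercise mentioned above can be sketched with Python's standard socket module; this toy echo server is illustrative only, not taken from the book:

```python
import socket
import threading

def serve(sock):
    """Accept one connection and echo the message back, uppercased."""
    conn, _ = sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

# Server side: bind to port 0 so the OS picks a free port.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=serve, args=(server,), daemon=True).start()

# Client side: connect, send a message, read the reply.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"hello")
reply = client.recv(1024)
client.close()
server.close()

assert reply == b"HELLO"
```

Running server and client in one script via a thread keeps the example self-contained; the course exercises would naturally split them into separate processes.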

Link: Book

Navigational hashtags: #armbooks
General hashtags: #python #programming

@data_science_weekly
👍4
Pen and Paper Exercises in Machine Learning by Michael Gutmann

This is a collection of (mostly) pen-and-paper exercises in machine learning. Each exercise comes with a detailed solution. The following topics are covered:
- Linear Algebra
- Optimisation
- Directed Graphical Models
- Undirected Graphical Models
- Expressive Power of Graphical Models
- Factor Graphs and Message Passing
- Inference for Hidden Markov Models
- Model-based Learning (including ICA and unnormalised models)
- Sampling and Monte-Carlo Integration
- Variational Inference

Link: GitHub

Navigational hashtags: #armrepo
General hashtags: #math #mathematics #linearalgebra

@data_science_weekly
👍12
Practical Deep Learning for Coders by fast.ai

Practical Deep Learning for Coders 2022 part 1, recorded at the University of Queensland, covers topics such as how to:
- Build and train deep learning models for computer vision, natural language processing, tabular analysis, and collaborative filtering problems
- Create random forests and regression models
- Deploy models
- Use PyTorch, the world’s fastest-growing deep learning software, plus popular libraries like fastai and Hugging Face

There are 9 lessons, and each lesson is around 90 minutes long. The course is based on their 5-star rated book, which is freely available online.

Link: Course

Navigational hashtags: #armcourses
General hashtags: #dl #deeplearning #ml #machinelearning

@data_science_weekly
👍8
LLM Engineering Essentials by Nebius Academy

Gain the skills to build LLM-powered services that work. Master LLM APIs and self-hosted LLMs as you code, experiment, and create a platform for custom AI-powered NPCs.

During the course you will:
1. Understand the fundamentals of LLM APIs and workflows to create a chatbot based on your favorite fantasy character
2. Learn to work with self-hosted LLMs, encoders, and vector stores, and build a RAG system
3. Explore monitoring tools like Prometheus and Grafana. Optimize and fine-tune your LLM-powered service
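The RAG system in step 2 rests on a retrieval step that can be illustrated with a toy bag-of-words ranker; a real service would use a neural encoder and a vector store, and the documents below are made up:

```python
import math
from collections import Counter

# Toy document collection for an NPC knowledge base (invented examples).
docs = [
    "dragons guard the mountain treasure",
    "the wizard brews potions in his tower",
    "knights train in the castle courtyard",
]

def vec(text):
    """Bag-of-words term counts; a stand-in for a neural embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Return the top-k documents most similar to the query."""
    q = vec(query)
    ranked = sorted(docs, key=lambda d: cosine(q, vec(d)), reverse=True)
    return ranked[:k]

assert retrieve("where do dragons hide treasure") == [docs[0]]
```

Swapping `vec` for an encoder model and the sorted list for a vector store gives the production shape of the same pipeline.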

Syllabus:

Week 1. LLM Basics
Week 2. LLM Workflows
Week 3. Context
Week 4. Self-served LLMs
Week 5. Optimization and Monitoring
Week 6. Fine-Tuning

Link: GitHub

Navigational hashtags: #armcourses
General hashtags: #llm #largelanguagemodels

@data_science_weekly
👍10
The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

This book describes the important ideas in a variety of fields such as medicine, biology, finance, and marketing in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of colour graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting - the first comprehensive treatment of this topic in any book.

This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorisation, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.

Links:
- Book Homepage
- PDF

Navigational hashtags: #armbooks
General hashtags: #ml #machinelearning #supervised #unsupervised

@data_science_weekly
👍6
The Book of Statistical Proofs

A centralized, open and collaboratively edited archive of statistical theorems for the computational sciences!

Link: Site

Navigational hashtags: #armbooks #armsites
General hashtags: #statistics

@data_science_weekly
👍7
Machine Learning Refined. Foundations, Algorithms, and Applications by Jeremy Watt, Reza Borhani and Aggelos K. Katsaggelos

Now more than ever, it is crucial to understand the core foundations of AI and machine learning. True mastery of a subject means understanding its tenets from multiple, complementary angles. 

Ideally, this means:
- Being able to explain what you know intuitively.
- Being able to draw a picture of an idea plainly on a cocktail napkin.
- Being able to recall the key formulae that rigorously support or define an idea.
- And finally, being able to apply a concept practically, in code.

This book aims to lead you towards this mastery of AI fundamentals by explaining every concept intuitively first, visually second, mathematically third, and in code fourth. In that order, for every major concept.

Links:
- Site
- GitHub

Navigational hashtags: #armbooks #armsites
General hashtags: #ml #machinelearning #optimization #regression #classification #nn #neuralnetworks #trees

@data_science_weekly
👍4
Introduction to Machine Learning by Laurent Younes

This book introduces the mathematical foundations and techniques that lead to the development and analysis of many of the algorithms that are used in machine learning.

It starts with an introductory chapter that describes the notation used throughout the book, serves as a reminder of basic concepts in calculus, linear algebra, and probability, and introduces some measure-theoretic terminology, which can be used as a reading guide for the sections that use these tools. The introductory chapters also provide background material on matrix analysis and optimization; the latter provides theoretical support for many algorithms used in the book, including stochastic gradient descent and proximal methods.

After discussing basic concepts of statistical prediction, the book introduces reproducing kernel theory and Hilbert space techniques, which are used in many places, before describing various algorithms for supervised statistical learning, including linear methods, support vector machines, decision trees, boosting, and neural networks.

The subject then switches to generative methods, starting with a chapter that presents sampling methods and an introduction to the theory of Markov chains.

The following chapters describe the theory of graphical models and provide an introduction to variational methods for models with latent variables and to deep-learning-based generative models.

The next chapters focus on unsupervised learning methods for clustering, factor analysis, and manifold learning.

The final chapter of the book is theory-oriented and discusses concentration inequalities and generalization bounds.

Links:
- arXiv
- PDF

Navigational hashtags: #armbooks
General hashtags: #ml #machinelearning #optimization #regression #classification #nn #neuralnetworks #trees

@data_science_weekly
👍7
Multimodal Deep Learning

In the last few years, there have been several breakthroughs in the methodologies used in Natural Language Processing (NLP) as well as Computer Vision (CV). Beyond these improvements on single-modality models, large-scale multi-modal approaches have become a very active area of research.

In this seminar, the authors review these approaches and attempt to create a solid overview of the field, starting with the current state-of-the-art approaches in the two subfields of Deep Learning individually.

- Further, modeling frameworks are discussed in which one modality is transformed into the other (Chapter 3.1 and Chapter 3.2), as well as models in which one modality is utilized to enhance representation learning for the other (Chapter 3.3 and Chapter 3.4). To conclude the second part, architectures with a focus on handling both modalities simultaneously are introduced (Chapter 3.5).
- Finally, they also cover other modalities (Chapter 4.1 and Chapter 4.2) as well as general-purpose multi-modal models (Chapter 4.3), which are able to handle different tasks on different modalities within one unified architecture. One interesting application (Generative Art, Chapter 4.4) eventually caps off this booklet.

Links:
- Book

Navigational hashtags: #armbooks #armsite
General hashtags: #dl #deeplearning #nlp #cv

@data_science_weekly
👍4
The Tensor Cookbook by Thomas Dybdahl Ahle

What are Tensor Diagrams? Machine learning involves a lot of tensor manipulation, and it's easy to lose track of the larger structure when manipulating high-dimensional data using notation designed for vectors and matrices.
Graphical notation (first introduced by Roger Penrose in 1971) reduces the mental overhead and makes the connections "come alive".

In short, each edge is the index of a tensor, and connecting two edges contracts the tensors over this dimension. After a bit of practice, this becomes incredibly intuitive.
The Tensor Cookbook aims to popularize tensor diagrams by rewriting the classical "Matrix Cookbook". You can treat it as a reference book, skip around for the cool diagrams, or work through it as a crash course full of exercises to practice your skills.
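The edge-contraction rule maps directly onto NumPy's `einsum`, whose subscript strings read like a textual form of tensor diagrams; a minimal sketch (not from the book):

```python
import numpy as np

# Two tensors sharing an index: in diagram notation, an edge labelled k
# connects A and B; contracting that edge sums over k.
A = np.arange(6).reshape(2, 3)    # indices (i, k)
B = np.arange(12).reshape(3, 4)   # indices (k, j)

# Connecting the two k-edges contracts over that dimension,
# leaving the free indices (i, j) as the result's edges.
C = np.einsum("ik,kj->ij", A, B)

# The simplest tensor diagram is just matrix multiplication.
assert C.shape == (2, 4)
assert np.allclose(C, A @ B)
```

More elaborate diagrams (traces, higher-order contractions) correspond to longer subscript strings, which is exactly where the graphical notation starts paying off.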

Links:
- Book
- Site

Navigational hashtags: #armbooks #armsite
General hashtags: #tensor #matrix #derivative #statistics #probability #ml

@data_science_weekly
👍5
Part 2: Deep Learning from the Foundations by fast.ai

This course shows how to build a state-of-the-art deep learning model from scratch.

It takes you all the way from the foundations of implementing matrix multiplication and back-propagation, through to high performance mixed-precision training, to the latest neural network architectures and learning techniques, and everything in between.
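As a flavour of that from-scratch approach, matrix multiplication written as an explicit triple loop might look like the sketch below (illustrative only, not actual course code):

```python
import numpy as np

def matmul(a, b):
    """Naive triple-loop matrix multiply, the classic starting point
    before vectorizing and optimizing (illustrative sketch)."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m))
    for i in range(n):           # rows of a
        for j in range(m):       # columns of b
            for p in range(k):   # shared inner dimension
                out[i, j] += a[i, p] * b[p, j]
    return out

rng = np.random.default_rng(0)
a, b = rng.random((3, 4)), rng.random((4, 2))
assert np.allclose(matmul(a, b), a @ b)  # matches NumPy's built-in
```

The pedagogical move is then to progressively replace the loops with broadcasting and library calls, measuring the speedup at each step.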

It covers many of the most important academic papers that form the foundations of modern deep learning, using “code-first” teaching, where each method is implemented from scratch in Python and explained in detail (in the process, we’ll discuss many important software engineering techniques too). Before starting this part, you need to have completed Part 1: Practical Deep Learning for Coders.

The first five lessons use Python, PyTorch, and the fastai library; the last two lessons use Swift for TensorFlow, and are co-taught with Chris Lattner, the original creator of Swift, clang, and LLVM.

The purpose of Deep Learning from the Foundations is, in some ways, the opposite of part 1. This time, we’re not learning practical things that we will use right away, but are learning foundations that we can build on. This is particularly important nowadays because this field is moving so fast. In this new course, we will learn to implement a lot of things that are inside the fastai and PyTorch libraries. In fact, we’ll be reimplementing a significant subset of the fastai library! Along the way, we will practice implementing papers, which is an important skill to master when making state-of-the-art models.

Link: Course

Navigational hashtags: #armcourses
General hashtags: #dl #deeplearning #ml #machinelearning

@data_science_weekly
👍5
Personalized Machine Learning by Julian McAuley

Every day we interact with machine learning systems that personalize their predictions to individual users, whether to recommend movies, find new friends or dating partners, or organize our news feeds. Such systems involve several modalities of data, ranging from sequences of clicks or purchases, to rich modalities involving text, images, or social interactions.

While settings and data modalities vary significantly, in this book we introduce a common set of principles and methods that underpin the design of personalized predictive models.

The book begins by revisiting "traditional" machine learning models, with a special focus on how they should be adapted to settings involving user data. Later, we'll develop techniques based on more advanced principles such as matrix factorization, deep learning, and generative modeling. Finally, we conclude with a detailed study of the consequences and risks of deploying personalized predictive systems.
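Matrix factorization, the workhorse of classic recommender systems, fits compactly in NumPy; this toy gradient-descent version is an illustrative sketch, not the book's own code:

```python
import numpy as np

# Tiny user-item ratings matrix; zeros are treated as unobserved.
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0]])
mask = R > 0
k, lr, reg = 2, 0.01, 0.01          # latent dim, learning rate, L2 penalty

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(3, k))   # user factors
V = rng.normal(scale=0.1, size=(3, k))   # item factors

def rmse():
    err = (R - U @ V.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))

before = rmse()
for _ in range(5000):
    E = (R - U @ V.T) * mask        # error on observed ratings only
    U += lr * (E @ V - reg * U)     # gradient step for user factors
    V += lr * (E.T @ U - reg * V)   # gradient step for item factors
after = rmse()

assert after < before               # training reduces observed error
```

Predictions for unobserved cells are simply `U @ V.T` at those positions, which is what makes the factorization useful for recommendation.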

By understanding the principles behind personalized machine learning, readers will gain the ability to design models and systems for a wide range of applications involving user data. A series of case-studies will help readers understand the importance of personalization in domains ranging from e-commerce to personalized health, and hands-on projects and code examples (and an online supplement) will give readers experience working with large-scale real-world datasets.

Link: Book

Navigational hashtags: #armbooks #armsite
General hashtags: #ml #machinelearning #regression #classification #recommendation #recsys #nlp

@data_science_weekly
👍7
Hey! 👋

We have some exciting news! Telegram offers a fantastic feature called auto-translation for posts, which would make our channel accessible to a much wider global audience 🌍

But here's the catch: To unlock this feature for our channel, we need Telegram Premium users to boost us! 🔋

How you can help (if you're a Premium user):

1. Tap on the channel name at the top.
2. Select "Boost Channel" (or find it in the channel menu).
3. Choose how many boosts you'd like to contribute (even 1 helps!).
4. Confirm – it's quick and free for Premium users!

Or simply use this link to boost!

Why boosting matters:

- 🌐 Break Language Barriers: Auto-translation will instantly translate our posts into your preferred language, making our content accessible to everyone.
- 💡 Share Knowledge Widely: Reach more people who can benefit from what we share here.
- 🚀 Grow Together: Help our community expand and become even more vibrant!

We're currently at 4 boosts (Level 2). Our goal is Level 4 to unlock auto-translation! Every single boost from a Premium user gets us closer.

To our amazing Premium members: Your boosts are incredibly valuable! If you find this channel useful and want to help us reach more people globally, please consider boosting us. It makes a huge difference! 🙏

To everyone else: Even if you're not Premium, you can still help massively! Please share this message with friends or groups who are Premium users and might be willing to support us. 🤝

Let's unlock the power of translation together! Thank you for being such a fantastic community!

With gratitude,
Artem Ryblov
👍3
python-patterns

A collection of design patterns and idioms in Python.

Remember that each pattern has its own trade-offs, and you should pay more attention to why you're choosing a certain pattern than to how to implement it.
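As a taste of the Pythonic style the repository favors, the Strategy pattern often reduces to passing a plain callable instead of a single-method interface class (a minimal sketch; names are illustrative):

```python
from typing import Callable

def checkout(amount: float, discount: Callable[[float], float]) -> float:
    """Apply whichever discount strategy the caller supplies."""
    return round(discount(amount), 2)

def no_discount(amount: float) -> float:
    return amount

def ten_percent_off(amount: float) -> float:
    return amount * 0.9

assert checkout(100.0, no_discount) == 100.0
assert checkout(100.0, ten_percent_off) == 90.0
```

This is a good example of the "why over how" advice: in Python, first-class functions already provide the pluggable behaviour that class-heavy languages need pattern boilerplate for.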

Link: GitHub

Navigational hashtags: #armsite
General hashtags: #python #programming #patterns #development #engineering

@data_science_weekly
👍6
How to avoid machine learning pitfalls by Michael A. Lones

Mistakes in machine learning practice are commonplace, and can result in a loss of confidence in the findings and products of machine learning.

This guide outlines common mistakes that occur when using machine learning, and what can be done to avoid them.

Whilst it should be accessible to anyone with a basic understanding of machine learning techniques, it focuses on issues that are of particular concern within academic research, such as the need to do rigorous comparisons and reach valid conclusions.

It covers five stages of the machine learning process:
- What to do before model building
- How to reliably build models
- How to robustly evaluate models
- How to compare models fairly
- How to report results
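One pitfall covered under reliable model building is preprocessing leakage: computing normalization statistics on the full dataset before splitting lets test data influence training. A minimal illustration (an assumed example, not taken from the guide):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=100)
train, test = X[:80], X[80:]

# Wrong: statistics computed on the full dataset, test points included.
leaky = (train - X.mean()) / X.std()

# Right: statistics computed on the training split only.
clean = (train - train.mean()) / train.std()

# The two differ because the test points influenced X.mean() and X.std().
assert not np.allclose(leaky, clean)
```

The same rule applies to any fitted preprocessing (imputation, feature selection, scaling): fit on the training split, then apply to the test split.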

Link: arXiv

Navigational hashtags: #armarticles
General hashtags: #ml #machinelearning #mlsystemdesign

@data_science_weekly
👍5
Deep Learning Fundamentals by Sebastian Raschka and Lightning AI

Deep Learning Fundamentals is a free course on learning deep learning using a modern open-source stack.

If you found this page, you probably heard that artificial intelligence and deep learning are taking the world by storm. This is correct. In this course, Sebastian Raschka, a best-selling author and professor, will teach you deep learning (machine learning with deep neural networks) from the ground up, across 10 units with bite-sized videos, quizzes, and exercises. The entire course is free and uses the most popular open-source tools for deep learning.

What will you learn in this course?
- What machine learning is and when to use it
- The main concepts of deep learning
- How to design deep learning experiments with PyTorch
- How to write efficient deep learning code with PyTorch Lightning

What will you be able to do after this course?
- Build classifiers for various kinds of data like tables, images, and text
- Tune models effectively to optimize predictive and computational performance

How is this course structured?
- The course consists of 10 units, each containing several subsections
- It is centered around informative, succinct videos that are respectful of your time
- In each unit, you will find optional exercises to practice your knowledge
- We also provide additional resources for those who want a deep dive on specific topics

What are the prerequisites?
- Ideally, you should already be familiar with programming in Python
- (Some lectures will involve a tiny bit of math, but a strong math background is not required!)

Are there interactive quizzes or exercises?
- Each section is accompanied by optional multiple-choice quizzes to test your understanding of the material
- Optionally, each unit also features one or more code exercises to practice implementing concepts covered in this class

Is there a course completion badge or certificate?
- At the end of this course, you can take an optional exam featuring 25 multiple-choice questions
- Upon answering 80% of the questions in the exam correctly (there are 5 attempts), you obtain an optional course completion badge that can be shared on LinkedIn

Link: Course

Navigational hashtags: #armcourses
General hashtags: #dl #deeplearning #pytorch #lightning

@data_science_weekly
👍7
The Prompt Report: A Systematic Survey of Prompt Engineering Techniques

Generative Artificial Intelligence (GenAI) systems are increasingly being deployed across diverse industries and research domains. Developers and end-users interact with these systems through the use of prompting and prompt engineering.

Although prompt engineering is a widely adopted and extensively researched area, it suffers from conflicting terminology and a fragmented ontological understanding of what constitutes an effective prompt due to its relatively recent emergence.

The authors establish a structured understanding of prompt engineering by assembling a taxonomy of prompting techniques and analyzing their applications. They present a vocabulary of 33 terms, a taxonomy of 58 LLM prompting techniques, and 40 techniques for other modalities.

Additionally, the authors provide best practices and guidelines for prompt engineering, including advice for prompting state-of-the-art (SOTA) LLMs such as ChatGPT. They further present a meta-analysis of the entire literature on natural language prefix-prompting. As a culmination of these efforts, this paper presents the most comprehensive survey on prompt engineering to date.

Link: arXiv

Navigational hashtags: #armarticles
General hashtags: #promptengineering #prompts #prompt #llm

@data_science_weekly
👍6
Linear Algebra for Data Science by Prof. Wanmo Kang and Prof. Kyunghyun Cho

Authors have been discussing over the past few years how they should teach linear algebra to students in this new era of data science and artificial intelligence.

Over these discussions, which also led to some research collaboration, they realized that the concepts from linear algebra invoked most frequently in practice, if not every day, were projection, and consequently singular value decomposition (SVD), and, less frequently, positive definiteness.

Unfortunately, they noticed that existing courses on linear algebra often focus much more on invertibility (or the lack thereof), to the point that many concepts are introduced not in order of practicality or usefulness but in the order most convenient for mathematical derivation.

They began to wonder whether they could introduce concepts and results in linear algebra in a radically different way.

So, here’s a new textbook on linear algebra, where they re-imagined how and in which order linear algebra could be taught.
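The projection-first emphasis is easy to demonstrate in NumPy, since the SVD directly yields the orthogonal projector onto a matrix's column space (an illustrative sketch, not from the book):

```python
import numpy as np

# If A = U S V^T with numerical rank r, the orthogonal projector onto
# the column space of A is U_r @ U_r.T.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])              # 3x2, rank 2

U, S, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(S > 1e-10))              # numerical rank
P = U[:, :r] @ U[:, :r].T               # projector onto col(A)

b = np.array([1.0, 2.0, 3.0])
b_proj = P @ b                          # closest point to b in col(A)

# Projectors are idempotent and symmetric.
assert np.allclose(P @ P, P)
assert np.allclose(P, P.T)
# The residual is orthogonal to the column space.
assert np.allclose(A.T @ (b - b_proj), 0.0)
```

No inverse is ever formed; projection, SVD, and least squares all fall out of the same decomposition, which is the ordering the authors argue for.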

Links:
- Site
- Book

Navigational hashtags: #armbooks
General hashtags: #math #mathematics #linearalgebra

@data_science_weekly
👍3
Problem Solving with Algorithms and Data Structures using Python by Brad Miller and David Ranum, Luther College

This textbook is about computer science. It is also about Python. However, there is much more.

The study of algorithms and data structures is central to understanding what computer science is all about. Learning computer science is not unlike learning any other type of difficult subject matter. The only way to be successful is through deliberate and incremental exposure to the fundamental ideas. A beginning computer scientist needs practice so that there is a thorough understanding before continuing on to the more complex parts of the curriculum. In addition, a beginner needs to be given the opportunity to be successful and gain confidence.

This textbook is designed to serve as a text for a first course on data structures and algorithms, typically taught as the second course in the computer science curriculum. Even though the second course is considered more advanced than the first, this book assumes you are a beginner at this level. You may still be struggling with some of the basic ideas and skills from a first computer science course and yet be ready to further explore the discipline and continue to practice problem solving.

The authors cover abstract data types and data structures, writing algorithms, and solving problems. They look at a number of data structures and solve classic problems that arise. The tools and techniques that you learn here will be applied over and over as you continue your study of computer science.

Links:
- Site
- Book

Navigational hashtags: #armbooks #armcourses
General hashtags: #python #algorithms #datastructures #programming #cs #computerscience

@data_science_weekly
👍5
Deep Learning and Computational Physics by Deep Ray, Orazio Pinti, Assad A. Oberai

These notes were compiled as lecture notes for a course developed and taught at the University of Southern California. They should be accessible to a typical engineering graduate student with a strong background in Applied Mathematics.

The main objective of these notes is to introduce a student who is familiar with concepts in linear algebra and partial differential equations to selected topics in deep learning. These lecture notes exploit the strong connections between deep learning algorithms and the more conventional techniques of computational physics to achieve two goals. First, they use concepts from computational physics to develop an understanding of deep learning algorithms. Not surprisingly, many concepts in deep learning can be connected to similar concepts in computational physics, and one can utilize this connection to better understand these algorithms. Second, several novel deep learning algorithms can be used to solve challenging problems in computational physics. Thus, they offer someone who is interested in modeling physical phenomena a complementary set of tools.

Links:
- arXiv
- Book

Navigational hashtags: #armbooks
General hashtags: #dl #deeplearning #physics

@data_science_weekly
👍3