Artem Ryblov’s Data Science Weekly
@artemfisherman’s Data Science Weekly: Elevate your expertise with a standout data science resource each week, carefully chosen for depth and impact.

Long-form content: https://artemryblov.substack.com
The Pillars of Data Science

I've created a site where I have been developing two differently styled roadmaps based on the links I share on this channel.

Both guides contain the same information but are formatted differently for your convenience.

The first roadmap is called Topic Guides.
These guides are focused on topics like Machine Learning and then split into knowledge levels and resource types. Thus, you can use them if you want to focus on a specific topic and deepen your knowledge.

The second roadmap is called Content Type Guides.
These guides are organized by resource type, such as courses, and then divided into topics and knowledge levels. So, you can use them if you prefer a certain type of resource and want to expand your knowledge.

This site is updated as new links are posted.

Link: Site

@data_science_weekly
Prompt Engineering Guide

Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs). Researchers use prompt engineering to improve the capacity of LLMs on a wide range of common and complex tasks such as question answering and arithmetic reasoning. Developers use prompt engineering to design robust and effective prompting techniques that interface with LLMs and other tools.

Happy Prompting!

Links:
- https://github.com/dair-ai/Prompt-Engineering-Guide
- https://www.promptingguide.ai/

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #promptengineering #prompts #promptdesign #prompt #prompting

@data_science_weekly
Fundamentals of Algorithms (Основы алгоритмов)

With this handbook, you will learn to design, optimize, combine, and debug algorithms without being tied to any particular programming language. Besides theory, we have collected practical exercises of varying difficulty and prepared a system that automatically checks the efficiency of your algorithms. All of this will help you consolidate and sharpen your new skills.

Link: https://academy.yandex.ru/handbook/algorithms

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #algorithms #datastructures #datastructuresandalgorithms #python

@data_science_weekly
The Hugging Face Course

This course will teach you about natural language processing (NLP) using libraries from the Hugging Face ecosystem — 🤗 Transformers, 🤗 Datasets, 🤗 Tokenizers, and 🤗 Accelerate — as well as the Hugging Face Hub.

It’s completely free and without ads.

Link: https://huggingface.co/learn/nlp-course/chapter1/1

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #nlp #language #naturallanguageprocessing #huggingface #transformers #deeplearning #freecourse #freecourses

@data_science_weekly
Learn PyTorch for Deep Learning: Zero to Mastery

Welcome to the second-best place on the internet to learn PyTorch (the first being the PyTorch documentation).
This is the online book version of the Learn PyTorch for Deep Learning: Zero to Mastery course.
This course will teach you the foundations of machine learning and deep learning with PyTorch (a machine learning framework written in Python).
The course is video based. However, the videos are based on the contents of this online book.

Links:
- https://www.learnpytorch.io/
- https://github.com/mrdbourke/pytorch-deep-learning
- https://zerotomastery.io/courses/learn-pytorch/

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #deeplearning #machinelearning #python #computervision #transferlearning #classification #modeldeployment #pytorch #torch

@data_science_weekly
CS109: Probability for Computer Scientists

While the initial foundations of computer science began in the world of discrete mathematics (after all, modern computers are digital in nature), recent years have seen a surge in the use of probability as a tool for the analysis and development of new algorithms and systems. As a result, it is becoming increasingly important for budding computer scientists to understand probability theory, both to provide new perspectives on existing ideas and to help further advance the field in new ways.

CS109: Probability for Computer Scientists starts by providing a fundamental grounding in combinatorics, and then quickly moves into the basics of probability theory. We will then cover many essential concepts in probability theory, including particular probability distributions, properties of probabilities, and mathematical tools for analysing probabilities. Finally, the last third of the class will focus on data analysis and machine learning as a means for seeing direct applications of probability in this exciting and quickly growing subfield of computer science. This is going to be a great quarter, and we are looking forward to the chance to teach you.

Course Topics
Here are the broad strokes of the course (in approximate order). More information is available on our Schedule page. We cover a very broad set of topics so that you are equipped with the probability and statistics you will see in your future CS studies!
- Counting and probability fundamentals
- Single-dimensional random variables
- Probabilistic models
- Uncertainty theory
- Parameter estimation
- Introduction to machine learning

Links
- Course: https://web.stanford.edu/class/cs109/
- Course Book: https://chrispiech.github.io/probabilityForComputerScientists/en/index.html
- Python for Probability: https://web.stanford.edu/class/archive/cs/cs109/cs109.1238/handouts/python.html

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #statistics #probability #stanford #machinelearning #dataanalysis #computerscience #help #mathematics

@data_science_weekly
The Ultimate SQL Guide

Understanding SQL remains the best way to work with data in our organizations. For a stakeholder, just being able to read and understand SQL queries can completely change how they work. Instead of only working with static, flat dashboards, they can work more closely with the data team, ask more probing questions and be a smarter consumer of the data they do receive.

The guide contains live data and queries, explains concepts spatially, and offers pragmatic advice instead of overly technical explanations.

It was made with three people in mind:
- someone brand new to SQL who wants to learn the basics
- someone new to SQL who wants to upskill
- people who use SQL every day and may need to refresh certain concepts (like regular expressions) from time to time

Link
https://blog.count.co/the-ultimate-sql-guide/

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #data #sql #learning #free

@data_science_weekly
Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning by Sebastian Raschka

The correct use of model evaluation, model selection, and algorithm selection techniques is vital in academic machine learning research as well as in many industrial settings.
This article reviews different techniques that can be used for each of these three subtasks and discusses the main advantages and disadvantages of each technique with references to theoretical and empirical studies. Further, recommendations are given to encourage best yet feasible practices in research and applications of machine learning.
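To make one of these techniques concrete, here is a minimal sketch of k-fold cross-validation index splitting in plain Python. The function name `k_fold_indices` is hypothetical and is not taken from the article; the sketch also assumes the data has already been shuffled (or stratified) if that matters for the problem.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Every sample lands in exactly one test fold and in k - 1 training folds.
folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print(f"train={train} test={test}")
```

A model would be fitted on each `train` split and scored on the matching `test` split, and the k scores averaged to estimate generalization performance.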

Link
https://arxiv.org/abs/1811.12808

Navigational hashtags: #armknowledgesharing #armarticles
General hashtags: #machinelearning #ml #modelevaluation #evaluation #selection #cv #crossvalidation

@data_science_weekly
Mindful Modeler by Christoph Molnar

The newsletter combines the best of two worlds: the performance mindset of machine learning and the mindfulness of statistical thinking.

Machine learning has become mainstream while falling short in the silliest ways: lack of interpretability, biased and missing data, wrong conclusions, … To statisticians, these shortcomings are often unsurprising. Statisticians are relentless in their quest to understand how the data came about. They make sure that their models reflect the data-generating process and interpret models accordingly.
In a sea of people who basically know how to model.fit() and model.predict() you can stand out by bringing statistical thinking to the arena.
Sign up for this newsletter to combine performance-driven machine learning with statistical thinking. Become a mindful modeller.

You'll learn about:
- Thinking like a statistician while performing like a machine learner
- Spotting non-obvious data problems
- Interpretable machine learning
- Other modelling mindsets such as causal inference and prompt engineering

Link
https://mindfulmodeler.substack.com/

Navigational hashtags: #armknowledgesharing #armnewsletters
General hashtags: #modelling #modeling #ml #machinelearning #statistics #modelinterpretation #data #interpretability #causalinference

@data_science_weekly
Algorithmic concepts by Afshine Amidi and Shervine Amidi

This is a concise, illustrated guide for anyone who wants to brush up on their fundamentals, whether for coding interviews, computer science classes, or simply to satisfy their own curiosity.

It is divided into 4 parts:
- Foundations: main types of algorithms and related mathematical concepts
- Data structures: arrays, strings, queues, stacks, hash tables, linked lists and associated theorems and tricks
- Graphs and trees: graph concepts and graph traversal algorithms along with important types of trees
- Sorting and search: common, efficient sorting and search algorithms

Link
https://superstudy.guide/algorithms-data-structures/foundations/algorithmic-concepts

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #algorithms #datastructures #datastructuresandalgorithms #graphs #trees #sorting #search

@data_science_weekly
An Introduction to Statistical Learning with applications in PYTHON!

As the scale and scope of data collection continue to increase across virtually all fields, statistical learning has become a critical toolkit for anyone who wishes to understand data. An Introduction to Statistical Learning provides a broad and less technical treatment of key topics in statistical learning.

The Python edition (ISLP) was published in 2023.

The chapters cover the following topics:
- What is statistical learning?
- Regression
- Classification
- Resampling methods
- Linear model selection and regularization
- Moving beyond linearity
- Tree-based methods
- Support vector machines
- Deep learning
- Survival analysis
- Unsupervised learning
- Multiple testing

Link: https://www.statlearning.com

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ISLR #ISLP #regression #classification #resampling #linearmodels #regularization #trees #svm #deeplearning #unsupervisedlearning #abtesting

@data_science_weekly
Full Stack Deep Learning Course

Full Stack Deep Learning (FSDL) is the course and community for people who are building products that are powered by machine learning (ML).

Table of contents:
- Lecture 1: Course Vision and When to Use ML
- Lab Overview
- Lecture 2: Development Infrastructure & Tooling
- Lab 4: Experiment Management
- Lecture 3: Troubleshooting & Testing
- Lab 5: Troubleshooting & Testing
- Lecture 4: Data Management
- Lab 6: Data Annotation
- Lecture 5: Deployment
- Lab 7: Web Deployment
- Lecture 6: Continual Learning
- Lab 8: Model Monitoring
- Lecture 7: Foundation Models
- Lecture 8: ML Teams and Project Management
- Lecture 9: Ethics

Link: https://fullstackdeeplearning.com/course/2022/

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #deeplearning #fullstack #datamanagement #dataannotation #deployment #webdevelopment #continuallearning #modelmonitoring #foundationmodels #projectmanagement

@data_science_weekly
Machine Learning System Design by Valerii Babushkin and Arseny Kravchenko

Get the big picture and the important details with this end-to-end guide for designing highly effective, reliable machine learning systems.

In "Machine Learning System Design: With end-to-end examples" you will learn:
- The big picture of machine learning system design
- Analyzing a problem space to identify the optimal ML solution
- Acing ML system design interviews
- Selecting appropriate metrics and evaluation criteria
- Prioritizing tasks at different stages of ML system design
- Solving dataset-related problems through data gathering, error analysis, and feature engineering
- Recognizing common pitfalls in ML system development
- Designing ML systems to be lean, maintainable, and extensible over time

Link: https://www.manning.com/books/machine-learning-system-design

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ml #machinelearning #systemdesign #machinelearningsystemdesign

@data_science_weekly
Annotated PyTorch Paper Implementations

This is a collection of simple PyTorch implementations of neural networks and related algorithms. These implementations are documented with explanations, and the website renders these as side-by-side formatted notes (see the screenshot).

These implementations will help you understand the algorithms better.

Link: https://nn.labml.ai/

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #pytorch #deeplearning #ai #dl #article #paper #ml #machinelearning #deeplearningalgorithms

@data_science_weekly
MLOps Zoomcamp

Objective
Teach practical aspects of productionizing ML services — from training and experimenting to model deployment and monitoring.

Target audience
Data scientists and ML engineers. Also, software engineers and data engineers interested in learning about putting ML in production.

Pre-requisites
- Python
- Docker
- Being comfortable with command line
- Prior exposure to machine learning (at work or from other courses, e.g. from ML Zoomcamp)
- Prior programming experience (at least 1+ year)

Syllabus
- Module 1: Introduction
- Module 2: Experiment tracking and model management
- Module 3: Orchestration and ML Pipelines
- Module 4: Model Deployment
- Module 5: Model Monitoring
- Module 6: Best Practices
- Project

Link: https://github.com/DataTalksClub/mlops-zoomcamp

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #machinelearning #modeldeployment #mlops #modelmonitoring #modelorchestration

@data_science_weekly
Neural Networks: Zero to Hero by Andrej Karpathy

A course by Andrej Karpathy on building neural networks, from scratch, in code.

"We start with the basics of backpropagation and build up to modern deep neural networks, like GPT. In my opinion language models are an excellent place to learn deep learning, even if your intention is to eventually go to other areas like computer vision because most of what you learn will be immediately transferable. This is why we dive into and focus on language models."
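As a rough illustration of what "backpropagation from scratch" means, the sketch below builds a tiny scalar autograd engine in plain Python. It is written in the spirit of the course's first lecture but is not Karpathy's code; the class name `Value` and its attributes are assumptions made for this example, and only addition and multiplication are supported.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph so each node is processed
        # only after every node that depends on it.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


a, b, c = Value(2.0), Value(-3.0), Value(10.0)
loss = a * b + c  # 2 * -3 + 10 = 4
loss.backward()
print(loss.data, a.grad, b.grad, c.grad)
```

Gradients are accumulated with `+=` rather than assigned, so a value that is used more than once in the expression receives the sum of all its contributions.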

Prerequisites:
- Solid programming (Python)
- Intro-level math (e.g. derivative, Gaussian).

Current Syllabus:
- The spelled-out intro to neural networks and backpropagation: building micrograd
- The spelled-out intro to language modeling: building makemore
- Building makemore Part 2: MLP
- Building makemore Part 3: Activations & Gradients, BatchNorm
- Building makemore Part 4: Becoming a Backprop Ninja
- Building makemore Part 5: Building a WaveNet
- Let's build GPT: from scratch, in code, spelled out.
- ongoing...

Links:
- https://karpathy.ai/zero-to-hero.html
- https://github.com/karpathy/nn-zero-to-hero/tree/master

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #deeplearning #mlp #batchnorm #backprop #gpt #fromscratch #neuralnetworks #python

@data_science_weekly
Short Courses by DeepLearning.AI

Take your generative AI skills to the next level with short courses from DeepLearning.AI.

Their short courses help you learn new skills, tools, and concepts efficiently.

Available for free for a limited time:
- Understanding and Applying Text Embeddings
- ChatGPT Prompt Engineering for Developers
- Building Systems with the ChatGPT API
- LangChain for LLM Application Development
- LangChain: Chat with Your Data
- Finetuning Large Language Models
- Large Language Models with Semantic Search
- Building Generative AI Applications with Gradio
- Evaluating and Debugging Generative AI Models Using Weights and Biases
- How Diffusion Models Work
- How Business Thinkers Can Start Building AI Plugins With Semantic Kernel
- Pair Programming with a Large Language Model

Links:
- https://www.deeplearning.ai/short-courses/
- LinkedIn version of this post

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #deeplearning #deeplearningai #llm #transformers #embeddings #chatgpt #gradio #diffusion #semanticsearch #promptengineering #prompts

@data_science_weekly
Feature Engineering and Selection: A Practical Approach for Predictive Models by Max Kuhn and Kjell Johnson

The process of developing predictive models includes many stages. Most resources focus on the modelling algorithms, but neglect other critical aspects of the modelling process. This book describes techniques for finding the best representations of predictors for modelling and for finding the best subset of predictors for improving model performance. A variety of example data sets are used to illustrate the techniques, along with R programs for reproducing the results.

Table of Contents:
1. Introduction
2. Illustrative Example: Predicting Risk of Ischemic Stroke
3. A Review of the Predictive Modeling Process
4. Exploratory Visualizations
5. Encoding Categorical Predictors
6. Engineering Numeric Predictors
7. Detecting Interaction Effects
8. Handling Missing Data
9. Working with Profile Data
10. Feature Selection Overview
11. Greedy Search Methods
12. Global Search Methods

Links:
- Direct Link

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #machinelearning #ml #featureengineering #featureselection #missingdata #categoricalvariables

@data_science_weekly
TinyML and Efficient Deep Learning Computing

Large generative models (e.g., large language models, diffusion models) have shown remarkable performance, but they require a massive amount of computational resources. To make them more accessible, it is crucial to improve their efficiency.

This course will introduce efficient AI computing techniques that enable powerful deep learning applications on resource-constrained devices. Topics include model compression, pruning, quantization, neural architecture search, distributed training, data/model parallelism, gradient compression, and on-device fine-tuning. It also introduces application-specific acceleration techniques for large language models, diffusion models, video recognition, and point cloud. This course will also cover topics about quantum machine learning.
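To give a feel for one of the listed topics, quantization, here is a minimal sketch of symmetric 8-bit quantization with a single shared scale, in plain Python. The helper names `quantize_int8` and `dequantize` are hypothetical, and this simple scheme is an assumption chosen for illustration, not necessarily the method taught in the course.

```python
def quantize_int8(weights):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero input, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
for w, q, r in zip(weights, codes, restored):
    print(f"{w:+.4f} -> {q:+4d} -> {r:+.4f}")
```

Each weight is stored as a small integer plus one float for the whole list, at the cost of a rounding error of at most half a scale step per weight. Note how the smallest weight collapses to zero, which is the kind of accuracy trade-off such techniques have to manage.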

Students will get hands-on experience deploying large language models (e.g., LLaMA 2) on a laptop.

Link: https://hanlab.mit.edu/courses/2023-fall-65940

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #deeplearning #llm #largelanguagemodels #diffusion #diffusionmodels #pruning #quantization

@data_science_weekly
Machine Learning for Everyone. In simple words. With real-world examples. Yes, again.

Machine Learning is like sex in high school. Everyone is talking about it, a few know what to do, and only your teacher is doing it. If you ever tried to read articles about machine learning on the Internet, most likely you stumbled upon two types of them: thick academic trilogies filled with theorems (I couldn’t even get through half of one) or fishy fairytales about artificial intelligence, data-science magic, and jobs of the future.

A simple introduction for those who always wanted to understand machine learning. Only real-world problems, practical solutions, simple language, and no high-level theorems. One and for everyone. Whether you are a programmer or a manager.

Link: https://vas3k.com/blog/machine_learning/

Navigational hashtags: #armknowledgesharing #armarticles
General hashtags: #ml #machinelearning #data #features #algorithms #classification #regression #neuralnets #deeplearning #dl #supervised #unsupervised

@data_science_weekly