Artem Ryblov’s Data Science Weekly
@artemfisherman’s Data Science Weekly: Elevate your expertise with a standout data science resource each week, carefully chosen for depth and impact.

Long-form content: https://artemryblov.substack.com
An Introduction to Statistical Learning with applications in PYTHON!

As the scale and scope of data collection continue to increase across virtually all fields, statistical learning has become a critical toolkit for anyone who wishes to understand data. An Introduction to Statistical Learning provides a broad and less technical treatment of key topics in statistical learning.

The Python edition (ISLP) was published in 2023.

The chapters cover the following topics:
- What is statistical learning?
- Regression
- Classification
- Resampling methods
- Linear model selection and regularization
- Moving beyond linearity
- Tree-based methods
- Support vector machines
- Deep learning
- Survival analysis
- Unsupervised learning
- Multiple testing

Link: https://www.statlearning.com

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ISLR #ISLP #regression #classification #resampling #linearmodels #regularization #trees #svm #deeplearning #unsupervisedlearning #abtesting

@data_science_weekly
Full Stack Deep Learning Course

Full Stack Deep Learning (FSDL) is the course and community for people who are building products that are powered by machine learning (ML).

Table of contents:
- Lecture 1: Course Vision and When to Use ML
- Lab Overview
- Lecture 2: Development Infrastructure & Tooling
- Lab 4: Experiment Management
- Lecture 3: Troubleshooting & Testing
- Lab 5: Troubleshooting & Testing
- Lecture 4: Data Management
- Lab 6: Data Annotation
- Lecture 5: Deployment
- Lab 7: Web Deployment
- Lecture 6: Continual Learning
- Lab 8: Model Monitoring
- Lecture 7: Foundation Models
- Lecture 8: ML Teams and Project Management
- Lecture 9: Ethics

Link: https://fullstackdeeplearning.com/course/2022/

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #deeplearning #fullstack #datamanagement #dataannotation #deployment #webdevelopment #continuallearning #modelmonitoring #foundationmodels #projectmanagement

@data_science_weekly
Machine Learning System Design by Valerii Babushkin and Arseny Kravchenko

Get the big picture and the important details with this end-to-end guide for designing highly effective, reliable machine learning systems.

In "Machine Learning System Design: With end-to-end examples" you will learn:
- The big picture of machine learning system design
- Analyzing a problem space to identify the optimal ML solution
- Acing ML system design interviews
- Selecting appropriate metrics and evaluation criteria
- Prioritizing tasks at different stages of ML system design
- Solving dataset-related problems through data gathering, error analysis, and feature engineering
- Recognizing common pitfalls in ML system development
- Designing ML systems to be lean, maintainable, and extensible over time

Link: https://www.manning.com/books/machine-learning-system-design

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ml #machinelearning #systemdesign #machinelearningsystemdesign

@data_science_weekly
Annotated PyTorch Paper Implementations

This is a collection of simple PyTorch implementations of neural networks and related algorithms. These implementations are documented with explanations, and the website renders these as side-by-side formatted notes (see the screenshot).

These implementations will help you understand the algorithms better.

Link: https://nn.labml.ai/

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #pytorch #deeplearning #ai #dl #article #paper #ml #machinelearning #deeplearningalgorithms

@data_science_weekly
MLOps Zoomcamp

Objective
Teach practical aspects of productionizing ML services — from training and experimenting to model deployment and monitoring.

Target audience
Data scientists and ML engineers. Also, software engineers and data engineers interested in learning about putting ML in production.

Pre-requisites
- Python
- Docker
- Being comfortable with the command line
- Prior exposure to machine learning (at work or from other courses, e.g. from ML Zoomcamp)
- Prior programming experience (at least 1 year)

Syllabus
- Module 1: Introduction
- Module 2: Experiment tracking and model management
- Module 3: Orchestration and ML Pipelines
- Module 4: Model Deployment
- Module 5: Model Monitoring
- Module 6: Best Practices
- Project

Link: https://github.com/DataTalksClub/mlops-zoomcamp

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #machinelearning #modeldeployment #mlops #modelmonitoring #modelorchestration

@data_science_weekly
Neural Networks: Zero to Hero by Andrej Karpathy

A course by Andrej Karpathy on building neural networks, from scratch, in code.

"We start with the basics of backpropagation and build up to modern deep neural networks, like GPT. In my opinion language models are an excellent place to learn deep learning, even if your intention is to eventually go to other areas like computer vision because most of what you learn will be immediately transferable. This is why we dive into and focus on language models."

Prerequisites:
- Solid programming (Python)
- Intro-level math (e.g. derivative, Gaussian)

Current Syllabus:
- The spelled-out intro to neural networks and backpropagation: building micrograd
- The spelled-out intro to language modeling: building makemore
- Building makemore Part 2: MLP
- Building makemore Part 3: Activations & Gradients, BatchNorm
- Building makemore Part 4: Becoming a Backprop Ninja
- Building makemore Part 5: Building a WaveNet
- Let's build GPT: from scratch, in code, spelled out.
- ongoing...
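
To give a feel for what "building micrograd" in the first lecture refers to, here is a minimal sketch of scalar reverse-mode autodiff (backpropagation). It is not the course's code: the `Value` class below is a simplified illustration written for this post, supporting only addition and multiplication.

```python
class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Visit nodes in reverse topological order so each node's gradient
        # is complete before it is pushed to its parents.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


a, b = Value(2.0), Value(3.0)
loss = a * b + a  # a appears twice, so its gradient accumulates from both uses
loss.backward()
print(a.grad, b.grad)  # d(loss)/da = b + 1, d(loss)/db = a
```

Gradients are accumulated with `+=` rather than assigned, which is what makes a variable that is used more than once receive the sum of its contributions.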

Links:
- https://karpathy.ai/zero-to-hero.html
- https://github.com/karpathy/nn-zero-to-hero/tree/master

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #deeplearning #mlp #batchnorm #backprop #gpt #fromscratch #neuralnetworks #python

@data_science_weekly
Short Courses by DeepLearning.AI

Take your generative AI skills to the next level with short courses from DeepLearning.AI.

Their short courses help you learn new skills, tools, and concepts efficiently.

Available for free for a limited time:
- Understanding and Applying Text Embeddings
- ChatGPT Prompt Engineering for Developers
- Building Systems with the ChatGPT API
- LangChain for LLM Application Development
- LangChain: Chat with Your Data
- Finetuning Large Language Models
- Large Language Models with Semantic Search
- Building Generative AI Applications with Gradio
- Evaluating and Debugging Generative AI Models Using Weights and Biases
- How Diffusion Models Work
- How Business Thinkers Can Start Building AI Plugins With Semantic Kernel
- Pair Programming with a Large Language Model

Links:
- https://www.deeplearning.ai/short-courses/
- LinkedIn version of this post

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #deeplearning #deeplearningai #llm #transformers #embeddings #chatgpt #gradio #diffusion #semanticsearch #promptengineering #prompts

@data_science_weekly
Feature Engineering and Selection: A Practical Approach for Predictive Models by Max Kuhn and Kjell Johnson

The process of developing predictive models includes many stages. Most resources focus on the modelling algorithms, but neglect other critical aspects of the modelling process. This book describes techniques for finding the best representations of predictors for modelling and for finding the best subset of predictors for improving model performance. A variety of example data sets are used to illustrate the techniques, along with R programs for reproducing the results.

Table of Contents:
1. Introduction
2. Illustrative Example: Predicting Risk of Ischemic Stroke
3. A Review of the Predictive Modeling Process
4. Exploratory Visualizations
5. Encoding Categorical Predictors
6. Engineering Numeric Predictors
7. Detecting Interaction Effects
8. Handling Missing Data
9. Working with Profile Data
10. Feature Selection Overview
11. Greedy Search Methods
12. Global Search Methods

Links:
- Direct Link

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #machinelearning #ml #featureengineering #featureselection #missingdata #categoricalvariables

@data_science_weekly
TinyML and Efficient Deep Learning Computing

Large generative models (e.g., large language models, diffusion models) have shown remarkable performance, but they require a massive amount of computational resources. To make them more accessible, it is crucial to improve their efficiency.

This course will introduce efficient AI computing techniques that enable powerful deep learning applications on resource-constrained devices. Topics include model compression, pruning, quantization, neural architecture search, distributed training, data/model parallelism, gradient compression, and on-device fine-tuning. It also introduces application-specific acceleration techniques for large language models, diffusion models, video recognition, and point cloud. This course will also cover topics about quantum machine learning.

Students will get hands-on experience deploying large language models (e.g., LLaMA 2) on a laptop.
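
As a rough illustration of one topic from the list above, quantization, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is an assumption-laden toy, not material from the course: the function names and the sample weights are made up, and real toolchains add per-channel scales, calibration, and more.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]


weights = [0.12, -0.53, 0.98, -1.27, 0.0]  # hypothetical layer weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, max_error)
```

Each weight is stored in 8 bits instead of 32, and the rounding error per weight is bounded by half of `scale`.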

Link: https://hanlab.mit.edu/courses/2023-fall-65940

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #deeplearning #llm #largelanguagemodels #diffusion #diffusionmodels #pruning #quantization

@data_science_weekly
Machine Learning for Everyone. In simple words. With real-world examples. Yes, again.

Machine Learning is like sex in high school. Everyone is talking about it, a few know what to do, and only your teacher is doing it. If you ever tried to read articles about machine learning on the Internet, most likely you stumbled upon two types of them: thick academic trilogies filled with theorems (I couldn’t even get through half of one) or fishy fairytales about artificial intelligence, data-science magic, and jobs of the future.

A simple introduction for those who always wanted to understand machine learning. Only real-world problems, practical solutions, simple language, and no high-level theorems. One article for everyone, whether you are a programmer or a manager.

Link: https://vas3k.com/blog/machine_learning/

Navigational hashtags: #armknowledgesharing #armarticles
General hashtags: #ml #machinelearning #data #features #algorithms #classification #regression #neuralnets #deeplearning #dl #supervised #unsupervised

@data_science_weekly
Harvard CS50 (2023) – Full Computer Science University Course

This is CS50, Harvard University’s introduction to the intellectual enterprises of computer science and the art of programming, for concentrators and non-concentrators alike, with or without prior programming experience. (Two thirds of CS50 students have never taken CS before.) This course teaches you how to solve problems, both with and without code, with an emphasis on correctness, design, and style. Topics include computational thinking, abstraction, algorithms, data structures, and computer science more generally. Problem sets inspired by the arts, humanities, social sciences, and sciences.

More than teach you how to program in one language, this course teaches you how to program fundamentally and how to teach yourself new languages ultimately. The course starts with a traditional but omnipresent language called C that underlies today’s newer languages, via which you’ll learn not only about functions, variables, conditionals, loops, and more, but also about how computers themselves work underneath the hood, memory and all. The course then transitions to Python, a higher-level language that you’ll understand all the more because of C. Toward term’s end, the course introduces SQL, via which you can store data in databases, along with HTML, CSS, and JavaScript, via which you can create web and mobile apps alike. Course culminates in a final project.

Course Contents
⌨️ Lecture 0 - Scratch
⌨️ Lecture 1 - C
⌨️ Lecture 2 - Arrays
⌨️ Lecture 3 - Algorithms
⌨️ Lecture 4 - Memory
⌨️ Lecture 5 - Data Structures
⌨️ Lecture 6 - Python
⌨️ Lecture 7 - SQL
⌨️ Lecture 8 - HTML, CSS, JavaScript
⌨️ Lecture 9 - Flask
⌨️ Lecture 10 - Emoji
⌨️ Cybersecurity

Links:
- https://cs50.harvard.edu/x
- https://www.youtube.com/watch?v=LfaMVlDaQ24

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #cs #computerscience #harvard #algorithms #datastructures #datastructuresandalgorithms #python #sql #C #arrays

@data_science_weekly
Understanding Deep Learning by Simon J.D. Prince

Deep learning is a fast-moving field with sweeping relevance in today’s increasingly digital world. Understanding Deep Learning provides an authoritative, accessible, and up-to-date treatment of the subject, covering all the key topics along with recent advances and cutting-edge concepts. Many deep learning texts are crowded with technical details that obscure fundamentals, but Simon Prince ruthlessly curates only the most important ideas to provide a high density of critical information in an intuitive and digestible form. From machine learning basics to advanced models, each concept is presented in lay terms and then detailed precisely in mathematical form and illustrated visually. The result is a lucid, self-contained textbook suitable for anyone with a basic background in applied mathematics.

- Up-to-date treatment of deep learning covers cutting-edge topics not found in existing texts, such as transformers and diffusion models
- Short, focused chapters progress in complexity, easing students into difficult concepts
- Pragmatic approach straddling theory and practice gives readers the level of detail required to implement naive versions of models
- Streamlined presentation separates critical ideas from background context and extraneous detail
- Minimal mathematical prerequisites, extensive illustrations, and practice problems make challenging material widely accessible
- Programming exercises offered in accompanying Python Notebooks

Link: https://udlbook.github.io/udlbook/

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ml #machinelearning #dl #deeplearning #transformers #diffusion

@data_science_weekly
Spinning Up in Deep RL by OpenAI

This is an educational resource produced by OpenAI that makes it easier to learn about deep reinforcement learning (deep RL).

For the unfamiliar: reinforcement learning (RL) is a machine learning approach for teaching agents how to solve tasks by trial and error. Deep RL refers to the combination of RL with deep learning.
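
To make "trial and error" concrete, here is a minimal sketch of an epsilon-greedy agent on a multi-armed bandit, one of the simplest RL settings. It is not taken from Spinning Up; the function name, the win probabilities, and the hyperparameters are assumptions chosen for illustration, and there is no deep learning involved.

```python
import random


def run_bandit(win_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit; returns value estimates and pull counts."""
    rng = random.Random(seed)
    estimates = [0.0] * len(win_probs)  # running average reward per arm
    pulls = [0] * len(win_probs)

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm.
            arm = rng.randrange(len(win_probs))
        else:
            # Exploit: pick the arm that currently looks best.
            arm = max(range(len(win_probs)), key=lambda i: estimates[i])
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        pulls[arm] += 1
        # Incremental update of the average reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return estimates, pulls


estimates, pulls = run_bandit([0.2, 0.5, 0.8])  # hypothetical win probabilities
best_arm = max(range(len(estimates)), key=lambda i: estimates[i])
print(best_arm, pulls)
```

The agent is never told which arm pays best; it learns that from the rewards it observes. Deep RL replaces the small table of estimates with a neural network.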

More about the course: https://www.youtube.com/watch?v=fdY7dt3ijgY&t=1s (OpenAI Spinning Up in Deep RL Workshop)

Link: https://spinningup.openai.com/en/latest/index.html

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #rl #reinforcementlearning #deeprl #openai #deeplearning #dl

@data_science_weekly
Thinking Clearly with Data: A Guide to Quantitative Reasoning and Analysis by Ethan Bueno de Mesquita, Anthony Fowler

An introduction to data science or statistics shouldn’t involve proving complex theorems or memorizing obscure terms and formulas, but that is exactly what most introductory quantitative textbooks emphasize. In contrast, Thinking Clearly with Data focuses, first and foremost, on critical thinking and conceptual understanding in order to teach students how to be better consumers and analysts of the kinds of quantitative information and arguments that they will encounter throughout their lives.

Among much else, the book teaches how to assess whether an observed relationship in data reflects a genuine relationship in the world and, if so, whether it is causal; how to make the most informative comparisons for answering questions; what questions to ask others who are making arguments using quantitative evidence; which statistics are particularly informative or misleading; how quantitative evidence should and shouldn’t influence decision-making; and how to make better decisions by using moral values as well as data.

- An ideal textbook for introductory quantitative methods courses in data science, statistics, political science, economics, psychology, sociology, public policy, and other fields
- Introduces the basic toolkit of data analysis―including sampling, hypothesis testing, Bayesian inference, regression, experiments, instrumental variables, differences in differences, and regression discontinuity
- Uses real-world examples and data from a wide variety of subjects
- Includes practice questions and data exercises

Link: https://www.amazon.com/Thinking-Clearly-Data-Quantitative-Reasoning/dp/0691214352

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #datascience #correlation #regression #causation #randomizedexperiments #statistics

@data_science_weekly
Channel name was changed to «Data Science Links»
The Illustrated Machine Learning

The idea is to make the complex world of Machine Learning more approachable through clear and concise illustrations.

The goal is to provide a visual aid for students, professionals, and anyone preparing for a technical interview to better understand the underlying concepts of Machine Learning.

Whether you're just starting out in the field or you're a seasoned professional looking to refresh your knowledge, these illustrations will be a valuable resource on your journey to understanding Machine Learning.

- Machine Learning
  - Categorization
  - Sampling and Resampling
  - Bias/Variance
  - Supervised Learning
  - Unsupervised Learning
  - Hyperparameters Tuning
- Machine Learning Engineering
  - Introduction
  - Before the Project Starts
  - Data Collection and Preparation
- Projective Geometry
  - Introduction
  - Image Formation
  - Structure from Motion
  - Stereo Reconstruction
- Deep Learning Playbook

Link: https://illustrated-machine-learning.github.io/

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #machinelearning #ml #mlsystemdesign #machinelearningsystemdesign #geometry #visualization #illustrated #supervised #unsupervised #dl #deeplearning #bias #variance #biasvariance

@data_science_weekly
How to do a code review by Google

The pages in this section contain recommendations on the best way to do code reviews, based on long experience. All together, they represent one complete document, broken up into many separate sections. You don’t have to read them all, but many people have found it very helpful to themselves and their team to read the entire set.

- The Standard of Code Review
- What to Look For In a Code Review
- Navigating a CL in Review
- Speed of Code Reviews
- How to Write Code Review Comments
- Handling Pushback in Code Reviews

Link: https://google.github.io/eng-practices/review/reviewer/

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #computerscience #cs #codereview #coding #cl #changelist

@data_science_weekly
HarvardX: CS50's Introduction to Artificial Intelligence with Python

This course explores the concepts and algorithms at the foundation of modern artificial intelligence, diving into the ideas that give rise to technologies like game-playing engines, handwriting recognition, and machine translation. Through hands-on projects, students gain exposure to the theory behind graph search algorithms, classification, optimization, machine learning, large language models, and other topics in artificial intelligence as they incorporate them into their own Python programs. By course’s end, students emerge with experience in libraries for machine learning as well as knowledge of artificial intelligence principles that enable them to design intelligent systems of their own.

What you'll learn
- graph search algorithms
- adversarial search
- knowledge representation
- logical inference
- probability theory
- Bayesian networks
- Markov models
- constraint satisfaction
- machine learning
- reinforcement learning
- neural networks
- natural language processing

By the way, it starts today - December 14, 2023.

Links:
- https://www.edx.org/learn/artificial-intelligence/harvard-university-cs50-s-introduction-to-artificial-intelligence-with-python
- https://cs50.harvard.edu/ai/2024/

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #machinelearning #ml #deeplearning #dl #graphs #reinforcementlearning #rl #neuralnetworks #nn #naturallanguageprocessing #nlp

@data_science_weekly
LLM University by Cohere

Their comprehensive curriculum aims to give you a rock-solid foundation in NLP, equipping you with the skills needed to develop your own applications. Whether you want to learn semantic search, generation, classification, embeddings, or any other NLP technique, this is the place for you! The course caters to learners from all backgrounds, covering everything from the basics to the most advanced topics in large language models (LLMs), ensuring you can harness the full potential of LLMs. Plus, you'll have the opportunity to work on hands-on exercises, allowing you to build and deploy your very own models.

The Curriculum

In this course, you will learn everything about Large Language Models (LLMs), including:
- How do LLMs work?
  Learn about their architecture and their moving pieces, including transformer models, embeddings, similarity, and attention mechanisms.
- What are LLMs useful for?
  Learn about many real-world applications of LLMs, including:
  - Semantic search
  - Text generation
  - Text classification
  - Analyzing text using embeddings
- How can I use LLMs to build and deploy my apps?
  Learn how to use LLMs to build applications. This course will teach you:
  - How to use Cohere's endpoints: Classify, Generate, and Embed.
  - How to build apps, including semantic search models, text generators, etc.
  - (Coming soon...) How to deploy these apps on many platforms.

Link: https://docs.cohere.com/docs/llmu

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #deeplearning #dl #transformers #transformer #llm #largelanguagemodels #largelanguagemodel #textgeneration #semanticsearch #classification #textclassification #embeddings

@data_science_weekly
Prompt Engineering Guide by OpenAI

This guide shares strategies and tactics for getting better results from large language models (sometimes referred to as GPT models) like GPT-4. The methods described here can sometimes be deployed in combination for greater effect. We encourage experimentation to find the methods that work best for you.

Some of the examples demonstrated here currently work only with our most capable model, gpt-4. In general, if you find that a model fails at a task and a more capable model is available, it's often worth trying again with the more capable model.

Link: https://platform.openai.com/docs/guides/prompt-engineering

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #llm #openai #prompts #promptengineering #gpt #gpt3 #gpt4

@data_science_weekly
Channel name was changed to «Data Science Weekly»