Artem Ryblov’s Data Science Weekly
@artemfisherman’s Data Science Weekly: Elevate your expertise with a standout data science resource each week, carefully chosen for depth and impact.

Long-form content: https://artemryblov.substack.com
Channel name was changed to «Approximate Learning»
Hi!
I have accumulated a lot of information that I want to share:

- Books #armbooks
- Courses #armcourses
- YouTube channels #armyoutube
- Articles and blogs #armarticles
- Tutorials #armtutorials
- Python Libraries #armpackages
- Kaggle Notebooks #armkaggle
- Telegram channels #armtelegram
- Cheatsheets #armcheetsheets
- Repositories #armrepo
- Newsletters #armnewsletters

All the posts in the series can be found using the hashtag #armknowledgesharing (and the hashtags above)
Atomic Habits: An Easy & Proven Way to Build Good Habits & Break Bad Ones

This book is not about data science or machine learning, but I think anyone interested in building a productive life would love to read it, understand it, and start applying the techniques described in the book to real life.

I have read and listened to it in my native language, and I am going to read it again in English. I believe that such books should be re-read every year or two to refresh them in memory.

10 Things This Book Will Teach You

Learn how to…
- Build a system for getting 1% better every day.
- Break your bad habits and stick to good ones.
- Avoid the common mistakes most people make when changing habits.
- Overcome a lack of motivation and willpower.
- Develop a stronger identity and believe in yourself.
- Make time for new habits (even when life gets crazy).
- Design your environment to make success easier.
- Make tiny, easy changes that deliver big results.
- Get back on track when you get off course.
- And most importantly, how to put these ideas into practice in real life.
…and much more.

I also recommend signing up for the 3-2-1 Newsletter from the author of the book using the link in the comments section:

"The 3-2-1 Newsletter is one of the most popular newsletters in the world. Every Thursday, the latest issue is sent to over 2,000,000 people. Each message includes 3 short ideas from me, 2 quotes from others, and 1 question for you to ponder"

#armbooks #armknowledgesharing #book #habits #selfhelp #motivation
Open Machine Learning Course

Topics: #EDA, #Visualization, #Classification, #Regression, #Ensembles, #FeatureEngineering, #Clustering, #OnlineLearning, #TimeSeries, #GradientBoosting

mlcourse.ai is an open Machine Learning course by OpenDataScience (ods.ai), led by Yury Kashnitsky, Ph.D. Having both a Ph.D. degree in applied math and a Kaggle Competitions Master tier, Yury aimed at designing an ML course with a perfect balance between theory and practice. Thus, the course meets you with math formulae in lectures, and a lot of practice in the form of assignments and Kaggle InClass competitions. Currently, the course is in a self-paced mode. Here we guide you through the self-paced mlcourse.ai.

#armcourses #armknowledgesharing
Kaggle Learn

Kaggle not only allows you to participate in Data Science competitions, but also provides access to its courses.

Each course focuses on a particular topic and has several lessons. Completing them is not difficult and takes only a few hours, but the courses are interesting and help you refresh the basics (and maybe learn something new).

Conveniently, each course indicates which courses you should take before it and which ones to take after.

I finished Python, Intro to Machine Learning, and Intermediate Machine Learning when they were introduced. Now I'm going through Machine Learning Explainability.

Some of the must-have courses:
- Feature Engineering (https://lnkd.in/e7xF_9-Z)
- Time Series (https://lnkd.in/eA-cYvHi)
- Data Cleaning (https://lnkd.in/edCfAkat)
- Intro to AI Ethics (https://lnkd.in/eBiT2YHM)
- Machine Learning Explainability (https://lnkd.in/eTnCWkFD)

The courses are free.

There are also guides, such as the Natural Language Processing Guide:
https://www.kaggle.com/learn-guide/natural-language-processing

#armknowledgesharing #armcourses
#kaggle #python #machinelearning #datascience

@data_science_weekly
The Matrix Calculus You Need For Deep Learning

This paper is an attempt to explain all the matrix calculus you need in order to understand the training of deep neural networks. We assume no math knowledge beyond what you learned in calculus 1, and provide links to help you refresh the necessary math where needed. Note that you do not need to understand this material before you start learning to train and use deep learning in practice; rather, this material is for those who are already familiar with the basics of neural networks, and wish to deepen their understanding of the underlying math. Don't worry if you get stuck at some point along the way---just go back and reread the previous section, and try writing down and working through some examples.

Article: https://arxiv.org/abs/1802.01528
Browser-friendly version: https://ar5iv.labs.arxiv.org/html/1802.01528
Explained.ai version: https://explained.ai/matrix-calculus/
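
To give a feel for the kind of identity the paper works with, here is a minimal sketch that is not taken from the paper: it numerically checks that the gradient of y = w · x + b with respect to w is x. The function names and the sample values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch (not from the paper): check numerically that the
# gradient of y = w . x + b with respect to w equals x.

def affine(w, x, b):
    """Return the scalar w . x + b for two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def numerical_gradient(f, w, eps=1e-6):
    """Approximate df/dw_i with central differences, one coordinate at a time."""
    grad = []
    for i in range(len(w)):
        w_plus = w[:i] + [w[i] + eps] + w[i + 1:]
        w_minus = w[:i] + [w[i] - eps] + w[i + 1:]
        grad.append((f(w_plus) - f(w_minus)) / (2 * eps))
    return grad


x = [1.0, -2.0, 0.5]   # hypothetical input vector
w = [0.3, 0.8, -1.2]   # hypothetical weights
b = 0.1                # hypothetical bias

grad = numerical_gradient(lambda w_: affine(w_, x, b), w)
print(grad)  # each entry should be close to the matching entry of x
```

A finite-difference check like this is a common way to sanity-test a hand-derived gradient before trusting it.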

#armknowledgesharing #armarticles
#deeplearning #math #matrix #matrixcalculus

@data_science_weekly
Data Science for Tabular Data: Advanced Techniques

This is a collection of the best Kaggle notebooks (kernels) and other resources, including notebooks and discussion posts from prize competition winners, covering advanced data science techniques for tabular data.

Table of Contents:
- Exploratory Data Analysis (EDA)
- Feature Engineering (FE)
- Model Hyper-parameter Optimization
- Models Selection
- Time Series
- Probability Calibration
- Universal Tool-kits
- DS Tutorials

#armkaggle #armknowledgesharing
#datascience #kaggle #tabular #dataanalysis

@data_science_weekly
Python & ML tasks
Russian channel name: Задачи по Python и машинному обучению ("Python and machine learning tasks")

Today I want to share with you a Telegram channel that will help you retain your knowledge of Python and maybe learn something new.

Every day a question is posted and you can answer it using the quiz under the question.
If your answer is wrong, you can find out the correct one and read the explanation.

#armknowledgesharing #armtelegram #python

@data_science_weekly
The author of the channel is Valerii Babushkin. He is a Vice President (Data Science) at Blockchain.com.

He writes about machine learning, deep learning, A/B tests, article reviews, and job interviews.

He has his own YouTube channel, and you can also search for videos and podcasts with him.

Telegram (Russian version)
YouTube channel

#armknowledgesharing #armtelegram

@data_science_weekly
StatQuest with Josh Starmer

"Statistics, Machine Learning and Data Science can sometimes seem like very scary topics, but since each technique is really just a combination of small and simple steps, they are actually quite simple. My goal with StatQuest is to break down the major methodologies into easy to understand pieces. That said, I don't dumb down the material. Instead, I build up your understanding so that you are smarter."

This is how Joshua Starmer, PhD, describes his channel, and I completely agree with him!

I watch his videos to understand the meaning of the algorithms before going into details, and I encourage you to do the same!

YouTube: https://www.youtube.com/@statquest/videos
Website: https://statquest.org/
Book: https://www.amazon.com/StatQuest-Illustrated-Guide-Machine-Learning/dp/B09ZCKR4H6

#machinelearning #datascience #algorithms #statistics #phd
#armknowledgesharing #armyoutube

@data_science_weekly
Applying Machine Learning by Eugene Yan

"Applying machine learning is hard. Many organizations have yet to benefit from ML, and most teams still find it tricky to apply it effectively.

Though there are many ML courses, most focus on theory and students finish without knowing how to apply ML. Practical know-how is gained via hands-on experience and seldom documented—it's hard to find it in a textbook, class, or tutorial. There's a gap between knowing ML vs. applying it at work.

To fill this gap, ApplyingML collects tacit/tribal/ghost knowledge on applying ML via curated papers/blogs, guides, and interviews with ML practitioners. In a nutshell, it's 1/3 applied-ml, 1/3 ghost knowledge, and 1/3 Tim Ferriss Show. The intent is to make it easier to apply—and benefit from—ML at work."

The site contains three types of resources:
- Guides (teardowns, ML guides, non-ML guides)
- Interviews with machine learning practitioners
- Papers (curated list divided by topics)

Site: https://applyingml.com/
Personal site of Eugene Yan: https://eugeneyan.com/

#armknowledgesharing #armarticles
#machinelearning #ml #experience #production #datascience #blogs

@data_science_weekly
CS229: Machine Learning

It is time to remember the basics!

This course provides a broad introduction to machine learning and statistical pattern recognition.

Topics include:
- Supervised learning (generative/discriminative learning, parametric/non-parametric learning, neural networks, support vector machines);
- Unsupervised learning (clustering, dimensionality reduction, kernel methods);
- Learning theory (bias/variance tradeoffs, practical advice);
- Reinforcement learning and adaptive control.

The course will also discuss recent applications of machine learning, such as to robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing.

Links:
- Lecture videos
- Lecture notes
- Course materials
- Main page for the course
- Cheatsheets

Navigational tags: #armknowledgesharing #armcourses
General tags: #machinelearning #supervisedlearning #neuralnetworks #svm #unsupervisedlearning #clustering #kernel #bias #variance #tradeoff #reinforcementlearning #cheatsheet #data #learning #patternrecognition #datamining

@data_science_weekly
Machine Learning Simplified:
A gentle introduction to supervised learning

The underlying goal of "Machine Learning Simplified" is to develop strong intuition for ML inside you. We would use simple intuitive examples to explain complex concepts, algorithms or methods, as well as democratize all mathematics behind machine learning.

After reading this book, you would understand everything that comes into the scope of Supervised ML, and would be able to not only understand nitty-gritty details of mathematics behind the scene, but also explain to anyone how things work on a high level.

The book is free, but you can purchase the EPUB version through Amazon, or show your appreciation to the author and purchase the PDF through Leanpub.

Table of contents:
I. FUNDAMENTALS OF SUPERVISED LEARNING
Chapter 1. Introduction
Chapter 2. Overview of Supervised Learning
Chapter 3. Model Learning
Chapter 4. Basis Expansion & Regularization
Chapter 5. Model Selection
Chapter 6. Feature Selection
Chapter 7. Data Preparation
II. ADVANCED SUPERVISED LEARNING ALGORITHMS (WIP)
Chapter 1. Regression Models
Chapter 2. Logit Models
Chapter 3. Bayesian Models
Chapter 4. Maximum Margin Models
Chapter 5. Tree-Based Models
Chapter 6. Ensemble Models
Chapter 7. Algorithms Selection
Chapter 8. Hyperparameter Tuning
Chapter 9. Evaluation Metrics

Read for free: https://themlsbook.com/read
Buy on Amazon: https://www.amazon.com/dp/B0B216KMM4/qid=1653304321
Buy on LeanPub: https://leanpub.com/themlsbook
Repository: https://code.themlsbook.com/

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #machinelearning #ml #algorithms #learning #book

@data_science_weekly
End to End Machine Learning (FREE Courses)

The best way to learn new concepts is to use them to build something. These courses are structured to build foundational knowledge (100 series), provide in-depth applied machine learning case studies (200 series), and embark on project-driven deep-dives (300 series).

- 111. Getting ready to learn Python, Mac edition
- 112. Getting ready to learn Python, Windows edition
- 201. Intro to Python
- 211. Decision Trees with Python and Pandas
- 212. Time-Series Analysis
- 213. Nonlinear Modelling and Optimization
- 221. The k-nearest neighbours algorithm
- 311. Neural Network Visualization
- 312. Build a Neural Network Framework
- 313. Advanced Neural Network Methods
- 314. Neural Network Optimization
- 321. Convolutional Neural Networks in One Dimension
- 322. Convolutional neural networks in two dimensions

Come have a look around and try one out today!

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #machinelearning #ml #algorithms #learning #course #python #decisiontrees #pandas #timeseries #nonlinear #knn #neuralnetworks #neuralnetwork #convolutionalneuralnetworks #optimization #analysis #visualization

@data_science_weekly
The Linux command line for beginners

The Linux command line is a text interface to your computer. Often referred to as the shell, terminal, console, prompt or various other names, it can give the appearance of being complex and confusing to use. Yet the ability to copy and paste commands from a website, combined with the power and flexibility the command line offers, means that using it may be essential when trying to follow instructions online, including many on this very website!

This tutorial will teach you a little of the history of the command line, then walk you through some practical exercises to become familiar with a few basic commands and concepts. We’ll assume no prior knowledge, but by the end we hope you’ll feel a bit more comfortable the next time you’re faced with some instructions that begin “Open a terminal”.

What you’ll learn
- A little history of the command line
- How to access the command line from your own computer
- How to perform some basic file manipulation
- A few other useful commands
- How to chain commands together to make more powerful tools
- The best way to use administrator powers

What you’ll need
- A computer running Ubuntu or some other version of Linux

Bonus Links:
- The Art of Command Line: https://github.com/jlevy/the-art-of-command-line
- Mind Map of Linux Commands: https://xmind.app/m/WwtB/

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #linux #terminal #shell #console #prompt #unix #bash

@data_science_weekly
Dive into Deep Learning

- Interactive deep learning book with code, maths, and discussions.
- Implemented with PyTorch, NumPy/MXNet, JAX, and TensorFlow.
- Adopted at 400 universities from 60 countries.

Content and Structure

The book can be divided into roughly three parts, focusing on preliminaries, deep learning techniques, and advanced topics focused on real systems and applications:

Part 1: Basics and Preliminaries. Section 1 offers an introduction to deep learning. Then, in Section 2, we quickly bring you up to speed on the prerequisites required for hands-on deep learning, such as how to store and manipulate data, and how to apply various numerical operations based on basic concepts from linear algebra, calculus, and probability. Section 3 and Section 5 cover the most basic concepts and techniques in deep learning, including regression and classification; linear models; multilayer perceptrons; and overfitting and regularization.

Part 2: Modern Deep Learning Techniques. Section 6 describes the key computational components of deep learning systems and lays the groundwork for our subsequent implementations of more complex models. Next, Section 7 and Section 8 introduce convolutional neural networks (CNNs), powerful tools that form the backbone of most modern computer vision systems. Similarly, Section 9 and Section 10 introduce recurrent neural networks (RNNs), models that exploit sequential (e.g., temporal) structure in data and are commonly used for natural language processing and time series prediction. In Section 11, we introduce a relatively new class of models based on so-called attention mechanisms that has displaced RNNs as the dominant architecture for most natural language processing tasks. These sections will bring you up to speed on the most powerful and general tools that are widely used by deep learning practitioners.

Part 3: Scalability, Efficiency, and Applications. In Section 12, we discuss several common optimization algorithms used to train deep learning models. Next, in Section 13, we examine several key factors that influence the computational performance of deep learning code. Then, in Section 14, we illustrate major applications of deep learning in computer vision. Finally, in Section 15 and Section 16, we demonstrate how to pretrain language representation models and apply them to natural language processing tasks. This part is available online.

Navigational hashtags: #armknowledgesharing #armbooks #armcourses
General hashtags: #deeplearning #dl #tensorflow #pytorch #jax #numpy #computervision #naturallanguageprocessing #attention #neuralnetworks #algorithms

@data_science_weekly
CS 229 ― Machine Learning Cheatsheet

Set of illustrated Machine Learning cheatsheets covering the content of the CS 229 class.

They can (hopefully!) be useful to all future students of this course, as well as to anyone else interested in Machine Learning.

Navigational hashtags: #armknowledgesharing #armcheetsheets
General hashtags: #machinelearning #students #content #supervisedlearning #unsupervisedlearning #deeplearning #tips #tricks #statistics #probability #calculus

@data_science_weekly
Efficient Python Tricks and Tools for Data Scientists

"Why efficient Python? Because using Python more efficiently will make your code more readable and run more efficiently.

Why for data scientist? Because Python has a wide application. The Python tools used in the data science field are not necessarily useful for other fields, such as web development.

The goal of this book is to spread the awareness of efficient ways to do Python.
They include:
- efficient methods and libraries to work with iterator, dictionary, function, and class
- efficient methods to work with popular data science libraries such as pandas and NumPy
- efficient tools to incorporate in a data science project
- efficient tools to incorporate in any project
- efficient tools to work with Jupyter Notebook."
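
As a rough illustration of the first bullet (working with iterators and dictionaries), here is a short sketch using only the standard library. It is not taken from the book; the sample data and variable names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import islice

# Hypothetical sample data, used only for illustration.
words = ["pandas", "numpy", "pandas", "jupyter", "numpy", "pandas"]

# Counter tallies items without a hand-written counting loop.
counts = Counter(words)
top_two = counts.most_common(2)

# defaultdict removes the "if key not in dict" boilerplate when grouping.
by_length = defaultdict(list)
for word in counts:
    by_length[len(word)].append(word)

# islice takes the first items of a lazy iterator without building a full list.
first_three_squares = list(islice((n * n for n in range(10**9)), 3))

print(top_two)
print(dict(by_length))
print(first_three_squares)
```

Each of these replaces several lines of manual bookkeeping with a single, readable standard-library call.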

About The Author
Khuyen Tran wrote over 150 data science articles with 100k+ views per month on Towards Data Science. She also wrote 500+ daily data science tips at Data Science Simplified. Her current mission is to make open-source more accessible to the data science community.

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #python #pandas #datascientists #datascientist #datamanagement #datamining #pythonprogramminglanguage #datascience #jupyternotebook

@data_science_weekly
Geographic Data Science with Python

This book provides the tools, the methods, and the theory to meet the challenges of contemporary data science applied to geographic problems and data. Social media, new forms of data, and new computational techniques are revolutionizing social science. In the new world of pervasive, large, frequent, and rapid data, we have new opportunities to understand and analyse the role of geography in everyday life. This book provides the first comprehensive curriculum in geographic data science.

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #machinelearning #datascience #geospatial #geospatialdata #geographic #python #data #science

@data_science_weekly