Artem Ryblov’s Data Science Weekly
@artemfisherman’s Data Science Weekly: Elevate your expertise with a standout data science resource each week, carefully chosen for depth and impact.

Long-form content: https://artemryblov.substack.com
CS 229 ― Machine Learning Cheatsheet

Set of illustrated Machine Learning cheatsheets covering the content of the CS 229 class.

They can (hopefully!) be useful to all future students of this course, as well as to anyone else interested in Machine Learning.

Navigational hashtags: #armknowledgesharing #armcheetsheets
General hashtags: #machinelearning #students #content #supervisedlearning #unsupervisedlearning #deeplearning #tips #tricks #statistics #probability #calculus

@data_science_weekly
Efficient Python Tricks and Tools for Data Scientists

"Why efficient Python? Because using Python more efficiently will make your code more readable and run more efficiently.

Why for data scientist? Because Python has a wide application. The Python tools used in the data science field are not necessarily useful for other fields, such as web development.

The goal of this book is to spread the awareness of efficient ways to do Python.
They include:
- efficient methods and libraries to work with iterator, dictionary, function, and class
- efficient methods to work with popular data science libraries such as pandas and NumPy
- efficient tools to incorporate in a data science project
- efficient tools to incorporate in any project
- efficient tools to work with Jupyter Notebook."

About The Author
Khuyen Tran wrote over 150 data science articles with 100k+ views per month on Towards Data Science. She also wrote 500+ daily data science tips at Data Science Simplified. Her current mission is to make open-source more accessible to the data science community.

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #python #pandas #datascientists #datascientist #datamanagement #datamining #pythonprogramminglanguage #datascience #jupyternotebook

@data_science_weekly
Geographic Data Science with Python

This book provides the tools, the methods, and the theory to meet the challenges of contemporary data science applied to geographic problems and data. Social media, new forms of data, and new computational techniques are revolutionizing social science. In the new world of pervasive, large, frequent, and rapid data, we have new opportunities to understand and analyse the role of geography in everyday life. This book provides the first comprehensive curriculum in geographic data science.

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #machinelearning #datascience #geospatial #geospatialdata #geographic #python #data #science

@data_science_weekly
Statistics and Probability (Khan Academy)

Learn statistics and probability for free - everything you'd want to know about descriptive and inferential statistics:

Unit 1: Analyzing categorical data
Unit 2: Displaying and comparing quantitative data
Unit 3: Summarizing quantitative data
Unit 4: Modeling data distributions
Unit 5: Exploring bivariate numerical data
Unit 6: Study design
Unit 7: Probability
Unit 8: Counting, permutations, and combinations
Unit 9: Random variables
Unit 10: Sampling distributions
Unit 11: Confidence intervals
Unit 12: Significance tests (hypothesis testing)
Unit 13: Two-sample inference for the difference between groups
Unit 14: Inference for categorical data (chi-square tests)
Unit 15: Advanced regression (inference and transforming)
Unit 16: Analysis of variance (ANOVA)

Link: https://www.khanacademy.org/math/statistics-probability

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #statistics #testing #design #data #abtesting #abtest #probability #ttest

@data_science_weekly
SQL Academy - SQL Interactive Course

A comprehensive SQL course designed to change the way you think about SQL forever. Together we will walk the path to understand how this language works and gain all the necessary skills to use it effectively at work.

Module 0 - Introduction
In this short module, we'll look at how the course platform works, learn how to get the most out of it, and find out about our community.

Module 1 - Fundamentals
This module is designed to give you a basic understanding of databases and fill in potential gaps. Also in this module, we will get acquainted with the terminology of relational DBMS.

Module 2 - Basis of selection I
In this module, we will learn how to write our first SQL queries and deal with important concepts such as conditional selection, sorting, and data grouping.

Module 3 - Basis of selection II
We continue to write increasingly complex select queries: we learn how to get data from several tables, write subqueries, and get acquainted with common table expressions.

Module 4 - Data manipulation
In the previous modules, we learned how to write select-only queries. Now it's time to get more serious: we get acquainted with adding, updating, and deleting records.

Module 5 - Databases and tables
It's time not only to work with ready-made databases, but also to learn how to create your own.
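
To make the module topics above more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are hypothetical and chosen only for illustration. They are not taken from the course, and the course itself may use a different database engine.

```python
import sqlite3

# Hypothetical schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Module 5 idea: create your own table.
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Module 4 idea: add, update, and delete records.
cur.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 15.0), ("alice", 20.0), ("carol", 5.0)],
)
cur.execute("UPDATE orders SET amount = 25.0 WHERE customer = 'bob'")
cur.execute("DELETE FROM orders WHERE customer = 'carol'")
conn.commit()

# Module 2 idea: conditional selection, grouping, and sorting.
totals = cur.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount > 10
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()

# Module 3 idea: a common table expression combined with a subquery.
big_spenders = cur.execute(
    """
    WITH totals AS (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    )
    SELECT customer FROM totals
    WHERE total > (SELECT AVG(total) FROM totals)
    """
).fetchall()

print(totals)        # [('alice', 50.0), ('bob', 25.0)]
print(big_spenders)  # [('alice',)]
conn.close()
```

An in-memory database keeps the sketch self-contained: nothing is written to disk, and the script can be rerun safely.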

Links:
- https://sql-academy.org/en
- https://sql-academy.org/en/trainer?sort=byId

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #sql #data #databases #database #tutorial #guide #onlinetraining #simulator

@data_science_weekly
The System Design Primer. Learn how to design large-scale systems.

Learning how to design scalable systems will help you become a better engineer.

System design is a broad topic. There is a vast amount of resources scattered throughout the web on system design principles.

This repo is an organized collection of resources to help you learn how to build systems at scale.

Link: https://github.com/donnemartin/system-design-primer#the-system-design-primer

Navigational hashtags: #armknowledgesharing #armrepo
General hashtags: #systemdesign #softwareengineering #softwaredevelopment #engineer #learning #design #help

@data_science_weekly
CS 329S: Machine Learning Systems Design

This course aims to provide an iterative framework for developing real-world machine learning systems that are deployable, reliable, and scalable.
It starts by considering all stakeholders of each machine learning project and their objectives. Different objectives require different design choices, and this course will discuss the tradeoffs of those choices.
Students will learn about data management, data engineering, feature engineering, approaches to model selection, training, scaling, how to continually monitor and deploy changes to ML systems, as well as the human side of ML projects such as team structure and business metrics.

Link: https://stanford-cs329s.github.io/index.html#overview

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #mlsystemdesign #systemdesign #machinelearningsystemdesign #machinelearning #algorithms #design #architecture #engineering #software

@data_science_weekly
MACHINE LEARNING QUESTIONS

Bnomial publishes one machine learning question every day. It aims to teach you something new, one question at a time:

- The questions are practical.
- The answers are well explained, with a clear account of why each option is or is not correct.
- Reading resources are provided so you can learn more about the topic.

Link: https://today.bnomial.com/

Navigational hashtags: #armknowledgesharing #armnewsletters
General hashtags: #machinelearning #deeplearning #ai #statistics #datascience #dataanalytics

@data_science_weekly
R2D3 is an experiment in expressing statistical thinking with interactive design.

The site contains several guides:

- A VISUAL INTRODUCTION TO MACHINE LEARNING
  - Part 1: A Decision Tree
  - Part 2: Bias and Variance

- MISC
  - Design in a World where Machines are Learning
  - Making Sense of COVID-19

In short, the guides explain complex concepts using intuitive graphics.

Link: https://www.r2d3.us/

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #machinelearning #covid #learning #design #decisiontrees #bias #variance #visualization #eda

@data_science_weekly
The Most Comprehensive List of Kaggle Solutions and Ideas

This is a list of almost all available solutions and ideas shared by top performers in past Kaggle competitions. The list is updated as soon as a new competition finishes.

Link: https://farid.one/kaggle-solutions/

Navigational hashtags: #armknowledgesharing #armkaggle
General hashtags: #kaggle #datascience #machinelearning #competitions

@data_science_weekly
Channel name was changed to «Accelerated Learning»
Stanford CS 230 ― Deep Learning

Set of illustrated Deep Learning cheatsheets covering the content of the CS 230 class.

They can (hopefully!) be useful to all future students of this course as well as to anyone else interested in Deep Learning.

Link: https://stanford.edu/~shervine/teaching/cs-230/

Navigational hashtags: #armknowledgesharing #armcheetsheets
General hashtags: #machinelearning #students #content #deeplearning #tips #tricks #cheetsheet #convolutionalneuralnetworks #recurrentneuralnetworks

@data_science_weekly
The Pillars of Data Science

I've created a site where I have been developing two differently styled roadmaps based on the links I share on this channel.

Both guides contain the same information but are formatted differently for your convenience.

The first roadmap is called Topic Guides.
These guides are organized by topic, such as Machine Learning, and then split by knowledge level and resource type. Use them if you want to focus on a specific topic and deepen your knowledge.

The second roadmap is called Content Type Guides.
These guides are organized by resource type, such as courses, and then divided by topic and knowledge level. Use them if you prefer a certain type of resource and want to expand your knowledge.

This site is updated as new links are posted.

Link: Site

@data_science_weekly
Prompt Engineering Guide

Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs). Researchers use prompt engineering to improve the capacity of LLMs on a wide range of common and complex tasks such as question answering and arithmetic reasoning. Developers use prompt engineering to design robust and effective prompting techniques that interface with LLMs and other tools.

Happy Prompting!

Links:
- https://github.com/dair-ai/Prompt-Engineering-Guide
- https://www.promptingguide.ai/

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #promptengineering #prompts #promptdesign #prompt #prompting

@data_science_weekly
Fundamentals of Algorithms

With this handbook, you will learn to design, optimize, combine, and debug algorithms without being tied to any particular programming language. In addition to the theory, we have collected practical exercises of varying difficulty and prepared a system that automatically checks the efficiency of your algorithms. All of this will help you consolidate and sharpen your new skills.

Link: https://academy.yandex.ru/handbook/algorithms

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #algorithms #datastructures #datastructuresandalgorithms #python

@data_science_weekly
The Hugging Face Course

This course will teach you about natural language processing (NLP) using libraries from the Hugging Face ecosystem — 🤗 Transformers, 🤗 Datasets, 🤗 Tokenizers, and 🤗 Accelerate — as well as the Hugging Face Hub.

It’s completely free and without ads.

Link: https://huggingface.co/learn/nlp-course/chapter1/1

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #nlp #language #naturallanguageprocessing #huggingface #transformers #deeplearning #freecourse #freecourses

@data_science_weekly
Learn PyTorch for Deep Learning: Zero to Mastery

Welcome to the second-best place on the internet to learn PyTorch (the first being the PyTorch documentation).
This is the online book version of the Learn PyTorch for Deep Learning: Zero to Mastery course.
This course will teach you the foundations of machine learning and deep learning with PyTorch (a machine learning framework written in Python).
The course is video based. However, the videos are based on the contents of this online book.

Links:
- https://www.learnpytorch.io/
- https://github.com/mrdbourke/pytorch-deep-learning
- https://zerotomastery.io/courses/learn-pytorch/

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #deeplearning #machinelearning #python #computervision #transferlearning #classification #modeldeployment #pytorch #torch

@data_science_weekly
CS109: Probability for Computer Scientists

While the initial foundations of computer science began in the world of discrete mathematics (after all, modern computers are digital in nature), recent years have seen a surge in the use of probability as a tool for the analysis and development of new algorithms and systems. As a result, it is becoming increasingly important for budding computer scientists to understand probability theory, both to provide new perspectives on existing ideas and to help further advance the field in new ways.

CS109: Probability for Computer Scientists starts by providing a fundamental grounding in combinatorics, and then quickly moves into the basics of probability theory. We will then cover many essential concepts in probability theory, including particular probability distributions, properties of probabilities, and mathematical tools for analysing probabilities. Finally, the last third of the class will focus on data analysis and machine learning as a means for seeing direct applications of probability in this exciting and quickly growing subfield of computer science. This is going to be a great quarter, and we are looking forward to the chance to teach you.

Course Topics
Here are the broad strokes of the course (in approximate order). More information is available on our Schedule page. We cover a very broad set of topics so that you are equipped with the probability and statistics you will see in your future CS studies!
- Counting and probability fundamentals
- Single-dimensional random variables
- Probabilistic models
- Uncertainty theory
- Parameter estimation
- Introduction to machine learning
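
As a small illustration of the first topic, counting and probability fundamentals, the sketch below works through the classic birthday problem. The problem is chosen here as an example and is not taken from the course materials. The function names, the seed, and the trial count are hypothetical choices. The code computes the exact probability by counting and then checks it against a simple simulation.

```python
import random


def exact_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday (by counting)."""
    p_all_distinct = 1.0
    for i in range(n):
        # The i-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def simulated_shared_birthday(
    n: int, trials: int, seed: int = 0, days: int = 365
) -> float:
    """Monte Carlo estimate of the same probability, assuming uniform birthdays."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(n)]
        if len(set(birthdays)) < n:  # a duplicate means a shared birthday
            hits += 1
    return hits / trials


exact = exact_shared_birthday(23)
estimate = simulated_shared_birthday(23, trials=20_000)
print(f"exact={exact:.4f}, simulated={estimate:.4f}")
```

With 23 people the exact value is about 0.507, and the simulated estimate should land close to it. The gap shrinks as the number of trials grows.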

Links:
- Course: https://web.stanford.edu/class/cs109/
- Course Book: https://chrispiech.github.io/probabilityForComputerScientists/en/index.html
- Python for Probability: https://web.stanford.edu/class/archive/cs/cs109/cs109.1238/handouts/python.html

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #statistics #probability #stanford #machinelearning #dataanalysis #computerscience #help #mathematics

@data_science_weekly