Artem Ryblov’s Data Science Weekly
@artemfisherman’s Data Science Weekly: Elevate your expertise with a standout data science resource each week, carefully chosen for depth and impact.

Long-form content: https://artemryblov.substack.com
SQL Academy - SQL Interactive Course

A comprehensive SQL course designed to change the way you think about SQL forever. Together we will walk the path to understand how this language works and gain all the necessary skills to use it effectively at work.

Module 0 - Introduction
In this short module, we'll look at how the course platform works and how to get the most out of it. We'll also share information about our community.

Module 1 - Fundamentals
This module is designed to give you a basic understanding of databases and fill in potential gaps. Also in this module, we will get acquainted with the terminology of relational DBMS.

Module 2 - Basis of selection I
In this module we will learn how to write our first SQL queries, deal with such important concepts as conditional selection, sorting and data grouping.

Module 3 - Basis of selection II
We continue to write increasingly complex select queries: we learn how to get data from several tables, write subqueries, and get acquainted with common table expressions.

Module 4 - Data manipulation
In the previous modules, we learned how to write select-only queries. Now it's time to get more serious: we get acquainted with adding, updating, and deleting records.

Module 5 - Databases and tables
It's time to work not only with ready-made databases, but also learn how to create your own.
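
To give a feel for what Modules 2 to 5 cover, here is a minimal sketch using Python's built-in sqlite3 module. The tables, column names, and data are hypothetical and are not taken from the course; the course's own trainer may use a different SQL dialect.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Module 5: create your own tables.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "amount REAL NOT NULL)"
)

# Module 4: add, update, and delete records.
cur.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Cy")],
)
cur.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 10.0), (1, 30.0), (2, 5.0), (3, 99.0)],
)
cur.execute("UPDATE orders SET amount = 15.0 WHERE customer_id = 2")
cur.execute("DELETE FROM orders WHERE customer_id = 3")

# Module 2: conditional selection, grouping, and sorting.
totals = cur.execute(
    "SELECT customer_id, SUM(amount) AS total "
    "FROM orders WHERE amount > 0 "
    "GROUP BY customer_id ORDER BY total DESC"
).fetchall()

# Module 3: data from several tables, a subquery, and a common table expression.
top = cur.execute(
    """
    WITH totals AS (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    )
    SELECT c.name, t.total
    FROM customers AS c
    JOIN totals AS t ON t.customer_id = c.id
    WHERE t.total > (SELECT AVG(total) FROM totals)
    ORDER BY t.total DESC
    """
).fetchall()

conn.close()
print(totals)
print(top)
```

The last query lists customers whose order total is above the average total, which combines a join, a subquery, and a CTE in one statement.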

Links:
- https://sql-academy.org/en
- https://sql-academy.org/en/trainer?sort=byId

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #sql #data #databases #database #tutorial #guide #onlinetraining #simulator

@data_science_weekly
The System Design Primer. Learn how to design large-scale systems.

Learning how to design scalable systems will help you become a better engineer.

System design is a broad topic. There is a vast amount of resources scattered throughout the web on system design principles.

This repo is an organized collection of resources to help you learn how to build systems at scale.

Link: https://github.com/donnemartin/system-design-primer#the-system-design-primer

Navigational hashtags: #armknowledgesharing #armrepo
General hashtags: #systemdesign #softwareengineering #softwaredevelopment #engineer #learning #design #help

@data_science_weekly
CS 329S: Machine Learning Systems Design

This course aims to provide an iterative framework for developing real-world machine learning systems that are deployable, reliable, and scalable.
It starts by considering all stakeholders of each machine learning project and their objectives. Different objectives require different design choices, and this course will discuss the tradeoffs of those choices.
Students will learn about data management, data engineering, feature engineering, approaches to model selection, training, scaling, how to continually monitor and deploy changes to ML systems, as well as the human side of ML projects such as team structure and business metrics.

Link: https://stanford-cs329s.github.io/index.html#overview

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #mlsystemdesign #systemdesign #machinelearningsystemdesign #machinelearning #algorithms #design #architecture #engineering #software

@data_science_weekly
MACHINE LEARNING QUESTIONS

Bnomial publishes one machine learning question every day. It aims to teach you something new, one question at a time:

- The questions are practical.
- The answers are well explained, with a clear account of why each option is or is not correct.
- Reading resources are provided so one can learn more to clarify the topic.

Link: https://today.bnomial.com/

Navigational hashtags: #armknowledgesharing #armnewsletters
General hashtags: #machinelearning #deeplearning #ai #statistics #datascience #dataanalytics

@data_science_weekly
R2D3 is an experiment in expressing statistical thinking with interactive design.

The site contains several guides:

- A VISUAL INTRODUCTION TO MACHINE LEARNING
  - Part 1: A Decision Tree
  - Part 2: Bias and Variance
- MISC
  - Design in a World where Machines are Learning
  - Making Sense of COVID-19

Basically, they try to explain complex concepts using intuitive graphics.

Link: https://www.r2d3.us/

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #machinelearning #covid #learning #design #decisiontrees #bias #variance #visualization #eda

@data_science_weekly
The Most Comprehensive List of Kaggle Solutions and Ideas

This is a list of almost all available solutions and ideas shared by top performers in the past Kaggle competitions. This list gets updated as soon as a new competition finishes.

Link: https://farid.one/kaggle-solutions/

Navigational hashtags: #armknowledgesharing #armkaggle
General hashtags: #kaggle #datascience #machinelearning #competitions

@data_science_weekly
Channel name was changed to «Accelerated Learning»
Stanford CS 230 ― Deep Learning

Set of illustrated Deep Learning cheatsheets covering the content of the CS 230 class.

They can (hopefully!) be useful to all future students of this course as well as to anyone else interested in Deep Learning.

Link: https://stanford.edu/~shervine/teaching/cs-230/

Navigational hashtags: #armknowledgesharing #armcheetsheets
General hashtags: #machinelearning #students #content #deeplearning #tips #tricks #cheetsheet #convolutionalneuralnetworks #recurrentneuralnetworks

@data_science_weekly
The Pillars of Data Science

I've created a site where I have been developing two differently styled roadmaps based on the links I share on this channel.

Both guides contain the same information but are formatted differently for your convenience.

The first roadmap is called Topic Guides.
These guides are focused on topics like Machine Learning and then split into knowledge levels and resource types. Thus, you can use them if you want to focus on a specific topic and deepen your knowledge.

The second roadmap is called Content Type Guides.
These guides are organized by resource type, such as courses, and then divided into topics and knowledge levels. So, you can use them if you prefer a certain type of resource and want to expand your knowledge.

This site is updated as new links are posted.

Link: Site

@data_science_weekly
Prompt Engineering Guide

Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs). Researchers use prompt engineering to improve the capacity of LLMs on a wide range of common and complex tasks such as question answering and arithmetic reasoning. Developers use prompt engineering to design robust and effective prompting techniques that interface with LLMs and other tools.

Happy Prompting!

Links:
- https://github.com/dair-ai/Prompt-Engineering-Guide
- https://www.promptingguide.ai/

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #promptengineering #prompts #promptdesign #prompt #prompting

@data_science_weekly
Fundamentals of Algorithms

With this handbook, you will learn to design, optimize, combine, and debug algorithms without being tied to any particular programming language. In addition to theory, we have collected practical exercises of varying difficulty and prepared a system that automatically checks the efficiency of your algorithms. All of this will help you consolidate and sharpen your new skills.

Link: https://academy.yandex.ru/handbook/algorithms

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #algorithms #datastructures #datastructuresandalgorithms #python

@data_science_weekly
The Hugging Face Course

This course will teach you about natural language processing (NLP) using libraries from the Hugging Face ecosystem — 🤗 Transformers, 🤗 Datasets, 🤗 Tokenizers, and 🤗 Accelerate — as well as the Hugging Face Hub.

It’s completely free and without ads.

Link: https://huggingface.co/learn/nlp-course/chapter1/1

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #nlp #language #naturallanguageprocessing #huggingface #transformers #deeplearning #freecourse #freecourses

@data_science_weekly
Learn PyTorch for Deep Learning: Zero to Mastery

Welcome to the second-best place on the internet to learn PyTorch (the first being the PyTorch documentation).
This is the online book version of the Learn PyTorch for Deep Learning: Zero to Mastery course.
This course will teach you the foundations of machine learning and deep learning with PyTorch (a machine learning framework written in Python).
The course is video based. However, the videos are based on the contents of this online book.

Links:
- https://www.learnpytorch.io/
- https://github.com/mrdbourke/pytorch-deep-learning
- https://zerotomastery.io/courses/learn-pytorch/

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #deeplearning #machinelearning #python #computervision #transferlearning #classification #modeldeployment #pytorch #torch

@data_science_weekly
CS109: Probability for Computer Scientists

While the initial foundations of computer science began in the world of discrete mathematics (after all, modern computers are digital in nature), recent years have seen a surge in the use of probability as a tool for the analysis and development of new algorithms and systems. As a result, it is becoming increasingly important for budding computer scientists to understand probability theory, both to provide new perspectives on existing ideas and to help further advance the field in new ways.

CS109: Probability for Computer Scientists starts by providing a fundamental grounding in combinatorics, and then quickly moves into the basics of probability theory. We will then cover many essential concepts in probability theory, including particular probability distributions, properties of probabilities, and mathematical tools for analysing probabilities. Finally, the last third of the class will focus on data analysis and machine learning as a means for seeing direct applications of probability in this exciting and quickly growing subfield of computer science. This is going to be a great quarter, and we are looking forward to the chance to teach you.

Course Topics
Here are the broad strokes of the course (in approximate order). More information is available on our Schedule page. We cover a very broad set of topics so that you are equipped with the probability and statistics you will see in your future CS studies!
- Counting and probability fundamentals
- Single-dimensional random variables
- Probabilistic models
- Uncertainty theory
- Parameter estimation
- Introduction to machine learning

Links:
- Course: https://web.stanford.edu/class/cs109/
- Course Book: https://chrispiech.github.io/probabilityForComputerScientists/en/index.html
- Python for Probability: https://web.stanford.edu/class/archive/cs/cs109/cs109.1238/handouts/python.html

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #statistics #probability #stanford #machinelearning #dataanalysis #computerscience #help #mathematics

@data_science_weekly
The Ultimate SQL Guide

Understanding SQL remains the best way to work with data in our organizations. For a stakeholder, just being able to read and understand SQL queries can completely change how they work. Instead of only working with static, flat dashboards, they can work more closely with the data team, ask more probing questions and be a smarter consumer of the data they do receive.

The guide contains live data and queries, explains concepts spatially, and offers pragmatic advice instead of overly technical explanations.

It was made with three people in mind:
- someone brand new to SQL who wants to learn the basics
- someone new to SQL who wants to upskill
- people who use SQL every day and may need a refresher on certain concepts (like regular expressions) from time to time

Link: https://blog.count.co/the-ultimate-sql-guide/

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #data #sql #learning #free

@data_science_weekly
Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning by Sebastian Raschka

The correct use of model evaluation, model selection, and algorithm selection techniques is vital in academic machine learning research as well as in many industrial settings.
This article reviews different techniques that can be used for each of these three subtasks and discusses the main advantages and disadvantages of each technique with references to theoretical and empirical studies. Further, recommendations are given to encourage best yet feasible practices in research and applications of machine learning.
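
As a rough illustration of one widely used evaluation technique, k-fold cross-validation, here is a minimal pure-Python sketch. The function names, the toy data, and the trivial "predict the training mean" model are all assumptions made for this example and are not taken from the article.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Return the mean squared error measured on each held-out fold."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training part, then score on the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        scores.append(mean(errors))
    return scores


# Hypothetical baseline "model": always predict the mean of the training targets.
def fit_mean(xs, ys):
    return mean(ys)


def predict_mean(model, x):
    return model


# Toy data invented for the example.
xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]

scores = cross_validate(xs, ys, fit_mean, predict_mean, k=5)
print(len(scores), mean(scores))
```

Each sample is held out exactly once, so the average of the per-fold scores uses all of the data for testing without ever scoring a model on samples it was fitted on.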

Link: https://arxiv.org/abs/1811.12808

Navigational hashtags: #armknowledgesharing #armarticles
General hashtags: #machinelearning #ml #modelevaluation #evaluation #selection #cv #crossvalidation

@data_science_weekly
Mindful Modeler by Christoph Molnar

The newsletter combines the best of two worlds: the performance mindset of machine learning and the mindfulness of statistical thinking.

Machine learning has become mainstream while falling short in the silliest ways: lack of interpretability, biased and missing data, wrong conclusions, … To statisticians, these shortcomings are often unsurprising. Statisticians are relentless in their quest to understand how the data came about. They make sure that their models reflect the data-generating process and interpret models accordingly.
In a sea of people who basically know how to call model.fit() and model.predict(), you can stand out by bringing statistical thinking to the arena.
Sign up for this newsletter to combine performance-driven machine learning with statistical thinking. Become a mindful modeller.

You'll learn about:
- Thinking like a statistician while performing like a machine learner
- Spotting non-obvious data problems
- Interpretable machine learning
- Other modelling mindsets such as causal inference and prompt engineering

Link: https://mindfulmodeler.substack.com/

Navigational hashtags: #armknowledgesharing #armnewsletters
General hashtags: #modelling #modeling #ml #machinelearning #statistics #modelinterpretation #data #interpretability #casualinference

@data_science_weekly
Algorithmic concepts by Afshine Amidi and Shervine Amidi

This is a concise, illustrated guide for anyone who wants to brush up on their fundamentals for coding interviews, for computer science classes, or simply to satisfy their own curiosity.

It is divided into 4 parts:
- Foundations: main types of algorithms and related mathematical concepts
- Data structures: arrays, strings, queues, stacks, hash tables, linked lists and associated theorems and tricks
- Graphs and trees: graph concepts and graph traversal algorithms along with important types of trees
- Sorting and search: common, efficient sorting and search algorithms

Link: https://superstudy.guide/algorithms-data-structures/foundations/algorithmic-concepts

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #algorithms #datastructures #datastructuresandalgorithms #graphs #trees #sorting #search

@data_science_weekly
An Introduction to Statistical Learning with applications in PYTHON!

As the scale and scope of data collection continue to increase across virtually all fields, statistical learning has become a critical toolkit for anyone who wishes to understand data. An Introduction to Statistical Learning provides a broad and less technical treatment of key topics in statistical learning.

The Python edition (ISLP) was published in 2023.

The chapters cover the following topics:
- What is statistical learning?
- Regression
- Classification
- Resampling methods
- Linear model selection and regularization
- Moving beyond linearity
- Tree-based methods
- Support vector machines
- Deep learning
- Survival analysis
- Unsupervised learning
- Multiple testing

Link: https://www.statlearning.com

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ISLR #ISLP #regression #classification #resampling #linearmodels #regularization #trees #svm #deeplearning #unsupervisedlearning #abtesting

@data_science_weekly