Artem Ryblov’s Data Science Weekly
@artemfisherman’s Data Science Weekly: Elevate your expertise with a standout data science resource each week, carefully chosen for depth and impact.

Long-form content: https://artemryblov.substack.com
What the f*ck Python! 😱

Python, being a beautifully designed high-level and interpreter-based programming language, provides us with many features for the programmer's comfort. But sometimes, the outcomes of a Python snippet may not seem obvious at first sight.

Here's a fun project attempting to explain what exactly is happening under the hood for some counter-intuitive snippets and lesser-known features in Python.

While some of the examples you see below may not be WTFs in the truest sense, they'll reveal some of the interesting parts of Python that you might be unaware of. I find it a nice way to learn the internals of a programming language, and I believe that you'll find it interesting too!

If you're an experienced Python programmer, you can take it as a challenge to get most of them right in the first attempt. You may have already experienced some of them before, and I might be able to revive sweet old memories of yours! 😅
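
To give a flavour of what "not obvious at first sight" means, here is a minimal sketch of one well-known Python surprise, the mutable default argument. It is an illustration only and not necessarily taken verbatim from the project; the function names `append_item` and `append_item_safe` are made up for this example.

```python
def append_item(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # and is then shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_item_safe(item, bucket=None):
    # Using None as a sentinel gives each call its own fresh list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_item(1)
second = append_item(2)
print(second)            # [1, 2]
print(first is second)   # True: both names point at the same default list

print(append_item_safe(1))  # [1]
print(append_item_safe(2))  # [2]
```

The `None` sentinel is the usual way around this: the default value is still evaluated only once, but it is immutable, so nothing can leak between calls.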

Links:
- Interactive Website
- Interactive Notebook
- GitHub Version: ENG, RUS

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #python #programming #coding

@data_science_weekly
Data Analysis with Python and PySpark by Jonathan Rioux

In Data Analysis with Python and PySpark you will learn how to:

- Manage your data as it scales across multiple machines
- Scale up your data programs with full confidence
- Read and write data to and from a variety of sources and formats
- Deal with messy data with PySpark’s data manipulation functionality
- Discover new data sets and perform exploratory data analysis
- Build automated data pipelines that transform, summarize, and get insights from data
- Troubleshoot common PySpark errors
- Create reliable long-running jobs

Data Analysis with Python and PySpark is your guide to delivering successful Python-driven data projects. Packed with relevant examples and essential techniques, this practical book teaches you to build pipelines for reporting, machine learning, and other data-centric tasks. Quick exercises in every chapter help you practice what you’ve learned, and rapidly start implementing PySpark into your data systems. No previous knowledge of Spark is required.

Link: Direct

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #spark #pyspark #bigdata

@data_science_weekly
Practical Recommender Systems by Kim Falk

Practical Recommender Systems explains how recommender systems work and shows how to create and apply them for your site. After covering the basics, you’ll see how to collect user data and produce personalized recommendations. You’ll learn how to use the most popular recommendation algorithms and see examples of them in action on sites like Amazon and Netflix. Finally, the book covers scaling problems and other issues you’ll encounter as your site grows.

Link: Direct

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #recsys #recommendersystems

@data_science_weekly
CS224W: Machine Learning with Graphs

Complex data can be represented as a graph of relationships between objects. Such networks are a fundamental tool for modeling social, technological, and biological systems. This course focuses on the computational, algorithmic, and modeling challenges specific to the analysis of massive graphs. By means of studying the underlying graph structure and its features, students are introduced to machine learning techniques and data mining tools apt to reveal insights on a variety of networks.

Topics include: representation learning and Graph Neural Networks; algorithms for the World Wide Web; reasoning over Knowledge Graphs; influence maximization; disease outbreak detection; social network analysis.
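
As a rough illustration of "a graph of relationships between objects", here is a minimal sketch in plain Python. It is not course material: the friendship data and names are invented for this example, and real coursework would typically use a dedicated graph library.

```python
from collections import defaultdict

# Hypothetical undirected friendship edges (made-up names).
edges = [("ana", "bo"), ("ana", "cy"), ("bo", "cy"), ("cy", "dee")]

# Adjacency-set representation: each node maps to the set of its neighbours.
graph = defaultdict(set)
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)

# Node degree (number of neighbours) is one of the simplest structural features.
degree = {node: len(neighbours) for node, neighbours in graph.items()}
most_connected = max(degree, key=degree.get)

print(degree)
print(most_connected)  # cy
```

Features like degree are the kind of structural signal that can be derived from the graph before any learning takes place.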

Links:
- Direct
- Videos

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #graphs #graph #gnn #knowledgegraphs #socialnetworks

@data_science_weekly
🧠 Awesome ChatGPT Prompts

Welcome to the "Awesome ChatGPT Prompts" repository! This is a collection of prompt examples to be used with the ChatGPT model.

The ChatGPT model is a large language model trained by OpenAI that is capable of generating human-like text. By providing it with a prompt, it can generate responses that continue the conversation or expand on the given prompt.

In this repository, you will find a variety of prompts that can be used with ChatGPT.

To get started, simply clone this repository and use the prompts in the README.md file as input for ChatGPT. You can also use the prompts in this file as inspiration for creating your own.

Link: Direct

Navigational hashtags: #armknowledgesharing #armrepo
General hashtags: #prompts #prompt #promptengineering #chatgpt #gpt

@data_science_weekly
Mathematics Of Machine Learning by MIT

Broadly speaking, Machine Learning refers to the automated identification of patterns in data. As such it has been a fertile ground for new statistical and algorithmic developments. The purpose of this course is to provide a mathematically rigorous introduction to these developments with emphasis on methods and their analysis.

Link: Direct

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #math #maths #mathematics #ml

@data_science_weekly
Exceptional Resources for Data Science Interview Preparation. Part 3: Specialized Machine Learning

In the previous article, I shared materials for preparing for the interview stage on Classical Machine Learning.

In this article, we will look at materials that can be used to prepare for the section on specialized machine learning.

Table of contents
- Resources
- Deep Learning
- Natural Language Processing
- Computer Vision
- Graph Neural Networks
- Reinforcement Learning
- Recommender Systems
- Time Series
- Big Data
- Let’s sum it up
- What’s next?


NB:
I'm the author of the article.
It was initially published in Russian (on habr.com), and I later published an English version on medium.com. Russian speakers may prefer the Russian version and English speakers the English one; both will benefit from starring the repository, which will be maintained and updated as new resources become available.

Links:
- Medium (eng)
- Habr (rus)

Navigational hashtags: #armknowledgesharing #armarticles
General hashtags: #interview #interviewpreparation #machinelearning #ml #deeplearning #dl #nlp #cv #rl #gnn #recsys

@data_science_weekly
DevOps for Data Science by Alex K Gold

In this book, you’ll learn about DevOps conventions, tools, and practices that can be useful to you as a data scientist. You’ll also learn how to work better with the IT/Admin team at your organization, and even how to do a little server administration of your own if you’re pressed into service.

Link: Direct

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #devops #mlops #datascience

@data_science_weekly
Bash Scripting Tutorial for Beginners by Herbert Lindemans

Learn bash scripting in this crash course for beginners. Understanding how to use bash scripting will enhance your productivity by automating tasks, streamlining processes, and making your workflow more efficient.

⌨️ (00:00) Introduction
⌨️ (03:24) Basic commands
⌨️ (06:21) Writing your first bash script
⌨️ (11:29) Variables
⌨️ (14:55) Positional arguments
⌨️ (16:23) Output/Input redirection
⌨️ (23:23) Test operators
⌨️ (25:19) If/Elif/Else
⌨️ (28:37) Case statements
⌨️ (32:16) Arrays
⌨️ (34:12) For loop
⌨️ (36:03) Functions
⌨️ (41:31) Exit codes
⌨️ (42:30) AWK
⌨️ (45:11) SED

Link: Video

Navigational hashtags: #armknowledgesharing #armyoutube
General hashtags: #bash #cmd #terminal

@data_science_weekly
Immersive linear algebra by J. Ström, K. Åström, and T. Akenine-Möller

"A picture says more than a thousand words" is a common expression, and for text books, it is often the case that a figure or an illustration can replace a large number of words as well. However, they believe that an interactive illustration can say even more, and that is why they have decided to build their linear algebra book around such illustrations. They believe that these figures make it easier and faster to digest and to learn linear algebra (which would be the case for many other mathematical books as well, for that matter). In addition, they have added some more features (e.g., popup windows for common linear algebra terms) to their book, and they believe that those features will make it easier and faster to read and understand as well.

After using linear algebra for 20 years times three persons, they were ready to write a linear algebra book that they think will make it substantially easier to learn and to teach linear algebra. In addition, the technology of mobile devices and web browsers has improved beyond a certain threshold, so that this book could be put together in a very novel and innovative way (they think). The idea is to start each chapter with an intuitive concrete example that practically shows how the math works using interactive illustrations. After that, the more formal math is introduced, and the concepts are generalized and sometimes made more abstract. They believe it is easier to understand the entire topic of linear algebra with a simple and concrete example cemented into the reader's mind at the beginning of each chapter.

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #math #linearalgebra #algebra

@data_science_weekly
Oh Shit, Git!?!

Git is hard: screwing up is easy, and figuring out how to fix your mistakes is fucking impossible. Git documentation has this chicken and egg problem where you can't search for how to get yourself out of a mess, unless you already know the name of the thing you need to know about in order to fix your problem.

- I did something terribly wrong, please tell me git has a magic time machine!?!
- I committed and immediately realized I need to make one small change!
- I need to change the message on my last commit!
- I accidentally committed something to master that should have been on a brand new branch!
- I accidentally committed to the wrong branch!
- I tried to run a diff but nothing happened?!
- I need to undo a commit from like 5 commits ago!
- I need to undo my changes to a file!
- I give up

Link

Navigational hashtags: #armknowledgesharing #armarticles
General hashtags: #git #versioncontrol #github #gitlab

@data_science_weekly
Leetcode for ML

Super neat set of machine learning coding challenges.

It could be useful to prep for an exam or ML interview.

Link

Navigational hashtags: #armknowledgesharing #armsites
General hashtags: #ml #dl #machinelearning #deeplearning

@data_science_weekly
NeetCode: A better way to prepare for coding interviews

The best free resources for Coding Interviews. Period.
- Organized study plans and roadmaps (Blind 75, Neetcode 150).
- Detailed video explanations.
- Public Discord community with over 30,000 members.
- Sign in to save your progress.

Links:
- Roadmap
- Practice (Core Skills, Blind 75, Neetcode 150, Neetcode All)
- Algorithms and Data Structures for Beginners (course) paid
- Advanced Algorithms (course) paid

Navigational hashtags: #armknowledgesharing #armsites #armtutorials
General hashtags: #leetcode #python #algorithms #datastructures #interviewpreparation #technicalinterview

@data_science_weekly
Write faster Python code, and ship your code faster

Faster and more memory efficient data
- Articles: Learn how to speed up your code and reduce memory usage.
- Products: Observability and profiling tools to help you identify bottlenecks in your code.

Docker packaging for Python
- Articles: Learn how to package your Python application for production.
- Products: Educational books and pre-written software templates.

Navigational hashtags: #armknowledgesharing #armsites
General hashtags: #python #development #docker

@data_science_weekly
Ace the SQL Interview by Nick Singh

Practice the most common SQL & Data Interview Questions and Learn SQL.

Navigational hashtags: #armknowledgesharing #armsites
General hashtags: #sql

@data_science_weekly
Applied Causal Inference Powered by ML and AI by Victor Chernozhukov, Christian Hansen, Nathan Kallus, Martin Spindler, Vasilis Syrgkanis

An introduction to the emerging fusion of machine learning and causal inference.

The book introduces ideas from classical structural equation models (SEMs) and their modern AI equivalent, directed acyclic graphs (DAGs) and structural causal models (SCMs), and presents Debiased Machine Learning methods to do inference in such models using modern predictive tools.

Links:
- PDF
- Site
- GitHub

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #statistics #ml #ai #causal #causalinference

@data_science_weekly
Applied Geospatial Data Science with Python: Leverage geospatial data analysis and modeling to find unique solutions to environmental problems by David S. Jordan

Key Features
- Learn how to integrate spatial data and spatial thinking into traditional data science workflows
- Develop a spatial perspective and learn to avoid common pitfalls along the way
- Gain expertise through practical case studies applicable in a variety of industries with code samples that can be reproduced and expanded

Table of Contents
1. Introducing Geographic Information Systems and Geospatial Data Science
2. What Is Geospatial Data and Where Can I Find It?
3. Working with Geographic and Projected Coordinate Systems
4. Exploring Geospatial Data Science Packages
5. Exploratory Data Visualization
6. Hypothesis Testing and Spatial Randomness
7. Spatial Feature Engineering
8. Spatial Clustering and Regionalization
9. Developing Spatial Regression Models
10. Developing Solutions for Spatial Optimization Problems
11. Advanced Topics in Spatial Data Science

Links:
- Amazon
- Packt
- GitHub

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #datascience #geo #geospatial

@data_science_weekly
Introduction to Machine Learning (I2ML) by LMU Munich

This website offers an open and free introductory course on (supervised) machine learning. The course is designed to be as self-contained as possible and enables self-study through lecture videos, PDF slides, cheatsheets, quizzes, exercises (with solutions), and notebooks.

The quite extensive material can roughly be divided into:
- An introductory undergraduate part (chapters 1-10)
- A more advanced second one on MSc level (chapters 11-19)
- A third course, on MSc level (chapters 20-23).

A key goal of the course is to teach the fundamental building blocks behind ML, instead of introducing “yet another algorithm with yet another name”. We discuss, compare, and contrast risk minimization, statistical parameter estimation, the Bayesian viewpoint, and information theory and demonstrate that all of these are equally valid entry points to ML. Developing the ability to take on and switch between these perspectives is a major goal of this course, and in our opinion not always ideally presented in other courses.

Link:
- Main Course Website

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #ml #machinelearning #supervised

@data_science_weekly
Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos

This textbook is intended to provide a comprehensive introduction to forecasting methods and to present enough information about each method for readers to be able to use them sensibly. The authors do not attempt to give a thorough discussion of the theoretical details behind each method, although the references at the end of each chapter fill in many of those details.

The book is written for three audiences:
(1) people finding themselves doing forecasting in business when they may not have had any formal training in the area;
(2) undergraduate students studying business;
(3) MBA students doing a forecasting elective. The authors use it themselves for master's students and third-year undergraduate students at Monash University, Australia.

For most sections, the authors assume only that readers are familiar with introductory statistics and high-school algebra. There are a couple of sections that also require knowledge of matrices, but these are flagged.

At the end of each chapter, the authors provide a list of “further reading”. In general, these lists comprise suggested textbooks that provide a more advanced or detailed treatment of the subject. Where there is no suitable textbook, the authors suggest journal articles that provide more information.

Link: Book Website

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #forecasting #timeseries #ts

@data_science_weekly
The Cartoon Guide to Statistics by Larry Gonick, Woollcott Smith

The Cartoon Guide to Statistics covers all the central ideas of modern statistics: the summary and display of data, probability in gambling and medicine, random variables, Bernoulli Trials, the Central Limit Theorem, hypothesis testing, confidence interval estimation, and much more - all explained in simple, clear, and yes, funny illustrations. Never again will you order the Poisson Distribution in a French restaurant!

Links:
- Amazon
- Internet Archive

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #statistics #stats #probability

@data_science_weekly
Practitioners guide to MLOps: A framework for continuous delivery and automation of machine learning by Google Cloud

Across industries, DevOps and DataOps have been widely adopted as methodologies to improve quality and reduce the time to market of software engineering and data engineering initiatives. With the rapid growth in machine learning (ML) systems, similar approaches need to be developed in the context of ML engineering, which handle the unique complexities of the practical applications of ML. This is the domain of MLOps. MLOps is a set of standardized processes and technology capabilities for building, deploying, and operationalizing ML systems rapidly and reliably.

The document is in two parts. The first part, an overview of the MLOps lifecycle, is for all readers. It introduces MLOps processes and capabilities and why they’re important for successful adoption of ML-based systems.

The second part is a deep dive on the MLOps processes and capabilities. This part is for readers who want to understand the concrete details of tasks like running a continuous training pipeline, deploying a model, and monitoring predictive performance of an ML model.

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #mlops

@data_science_weekly