Artem Ryblov’s Data Science Weekly
@artemfisherman’s Data Science Weekly: Elevate your expertise with a standout data science resource each week, carefully chosen for depth and impact.

Long-form content: https://artemryblov.substack.com
Bash Scripting Tutorial for Beginners by Herbert Lindemans

Learn bash scripting in this crash course for beginners. Understanding how to use bash scripting will enhance your productivity by automating tasks, streamlining processes, and making your workflow more efficient.

⌨️ (00:00) Introduction
⌨️ (03:24) Basic commands
⌨️ (06:21) Writing your first bash script
⌨️ (11:29) Variables
⌨️ (14:55) Positional arguments
⌨️ (16:23) Output/Input redirection
⌨️ (23:23) Test operators
⌨️ (25:19) If/Elif/Else
⌨️ (28:37) Case statements
⌨️ (32:16) Arrays
⌨️ (34:12) For loop
⌨️ (36:03) Functions
⌨️ (41:31) Exit codes
⌨️ (42:30) AWK
⌨️ (45:11) SED

Link: Video

Navigational hashtags: #armknowledgesharing #armyoutube
General hashtags: #bash #cmd #terminal

@data_science_weekly
Immersive linear algebra by J. Ström, K. Åström, and T. Akenine-Möller

"A picture says more than a thousand words" is a common expression, and in textbooks a figure or an illustration can often replace a large number of words as well. However, the authors believe that an interactive illustration can say even more, and that is why they have decided to build their linear algebra book around such illustrations. They believe that these figures make it easier and faster to digest and to learn linear algebra (which would be the case for many other mathematical books as well, for that matter). In addition, they have added some more features (e.g., popup windows for common linear algebra terms) to their book, and they believe that those features will make it easier and faster to read and understand as well.

After using linear algebra for 20 years times three persons, they were ready to write a linear algebra book that they think will make it substantially easier to learn and to teach linear algebra. In addition, the technology of mobile devices and web browsers has improved beyond a certain threshold, so that this book could be put together in a very novel and innovative way (they think). The idea is to start each chapter with an intuitive concrete example that practically shows how the math works using interactive illustrations. After that, the more formal math is introduced, and the concepts are generalized and sometimes made more abstract. They believe it is easier to understand the entire topic of linear algebra with a simple and concrete example cemented into the reader's mind at the beginning of each chapter.

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #math #linearalgebra #algebra

Oh Shit, Git!?!

Git is hard: screwing up is easy, and figuring out how to fix your mistakes is fucking impossible. Git documentation has this chicken and egg problem where you can't search for how to get yourself out of a mess, unless you already know the name of the thing you need to know about in order to fix your problem.

- I did something terribly wrong, please tell me git has a magic time machine!?!
- I committed and immediately realized I need to make one small change!
- I need to change the message on my last commit!
- I accidentally committed something to master that should have been on a brand new branch!
- I accidentally committed to the wrong branch!
- I tried to run a diff but nothing happened?!
- I need to undo a commit from like 5 commits ago!
- I need to undo my changes to a file!
- I give up

Link

Navigational hashtags: #armknowledgesharing #armarticles
General hashtags: #git #versioncontrol #github #gitlab

Leetcode for ML

Super neat set of machine learning coding challenges.

It could be useful to prep for an exam or ML interview.

Link

Navigational hashtags: #armknowledgesharing #armsites
General hashtags: #ml #dl #machinelearning #deeplearning

NeetCode: A better way to prepare for coding interviews

The best free resources for Coding Interviews. Period.
- Organized study plans and roadmaps (Blind 75, Neetcode 150).
- Detailed video explanations.
- Public Discord community with over 30,000 members.
- Sign in to save your progress.

Links:
- Roadmap
- Practice (Core Skills, Blind 75, Neetcode 150, Neetcode All)
- Algorithms and Data Structures for Beginners (course, paid)
- Advanced Algorithms (course, paid)

Navigational hashtags: #armknowledgesharing #armsites #armtutorials
General hashtags: #leetcode #python #algorithms #datastructures #interviewpreparation #technicalinterview

Write faster Python code, and ship your code faster

Faster and more memory-efficient data processing
- Articles: Learn how to speed up your code and reduce memory usage.
- Products: Observability and profiling tools to help you identify bottlenecks in your code.
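
To give a flavor of the kind of measurement such articles rely on, here is a minimal sketch using only the Python standard library. It is not taken from the site: the helper `peak_memory` and the two `sum_squares_*` functions are hypothetical names chosen for illustration. It compares the peak memory of building a full list against streaming values through a generator.

```python
import tracemalloc


def peak_memory(fn):
    """Run fn() and return (result, peak bytes traced by tracemalloc)."""
    tracemalloc.start()
    try:
        result = fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def sum_squares_list(n):
    # Materializes all n squares in a list before summing them.
    return sum([i * i for i in range(n)])


def sum_squares_gen(n):
    # Streams the squares one at a time, so only a few live at once.
    return sum(i * i for i in range(n))


N = 200_000  # illustrative size, an assumption
list_result, list_peak = peak_memory(lambda: sum_squares_list(N))
gen_result, gen_peak = peak_memory(lambda: sum_squares_gen(N))

print(f"list:      peak {list_peak:>12,} bytes")
print(f"generator: peak {gen_peak:>12,} bytes")
```

Both functions return the same value; only the peak memory differs. Measuring before optimizing, as here, is the general habit the profiling tools above are meant to support.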

Docker packaging for Python
- Articles: Learn how to package your Python application for production.
- Products: Educational books and pre-written software templates.

Navigational hashtags: #armknowledgesharing #armsites
General hashtags: #python #development #docker

Ace the SQL Interview by Nick Singh

Practice the most common SQL & Data Interview Questions and Learn SQL.

Navigational hashtags: #armknowledgesharing #armsites
General hashtags: #sql

Applied Causal Inference Powered by ML and AI by Victor Chernozhukov, Christian Hansen, Nathan Kallus, Martin Spindler, Vasilis Syrgkanis

An introduction to the emerging fusion of machine learning and causal inference.

The book introduces ideas from classical structural equation models (SEMs) and their modern AI equivalents, directed acyclic graphs (DAGs) and structural causal models (SCMs), and presents Debiased Machine Learning methods for inference in such models using modern predictive tools.
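
As a rough illustration of the debiasing idea, the sketch below estimates a treatment effect by "partialling out" a confounder with two-fold cross-fitting. It is not code from the book. The simulated data, the true effect of 2.0, and the function names are all assumptions, and a plain least-squares line stands in for the modern predictive tools the book would use as nuisance learners.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit y ~ a + b*x; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def simulate(n, theta, seed=0):
    """Hypothetical data: X confounds both treatment D and outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        d = 0.8 * x + rng.gauss(0, 1)
        y = theta * d + 1.5 * x + rng.gauss(0, 1)
        rows.append((x, d, y))
    return rows


def naive_slope(rows):
    """Regress Y on D alone, ignoring the confounder X (biased)."""
    predict = fit_line([r[1] for r in rows], [r[2] for r in rows])
    return predict(1.0) - predict(0.0)


def dml_partial_out(rows, n_folds=2):
    """Cross-fitted partialling-out estimate of the effect of D on Y."""
    res_d, res_y = [], []
    for k in range(n_folds):
        train = [r for i, r in enumerate(rows) if i % n_folds != k]
        held_out = [r for i, r in enumerate(rows) if i % n_folds == k]
        xs = [r[0] for r in train]
        predict_d = fit_line(xs, [r[1] for r in train])  # nuisance: E[D|X]
        predict_y = fit_line(xs, [r[2] for r in train])  # nuisance: E[Y|X]
        # Residualize on the held-out fold only (cross-fitting).
        for x, d, y in held_out:
            res_d.append(d - predict_d(x))
            res_y.append(y - predict_y(x))
    # Regress outcome residuals on treatment residuals.
    num = sum(rd * ry for rd, ry in zip(res_d, res_y))
    den = sum(rd * rd for rd in res_d)
    return num / den


data = simulate(2000, theta=2.0)
print("naive estimate:   ", round(naive_slope(data), 3))
print("debiased estimate:", round(dml_partial_out(data), 3))
```

The naive regression absorbs part of the confounder's influence into the treatment coefficient, while the residual-on-residual regression removes it. In practice the two `fit_line` calls would be replaced by flexible learners, which is where cross-fitting matters most.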

Links:
- PDF
- Site
- GitHub

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #statistics #ml #ai #causal #causalinference

Applied Geospatial Data Science with Python: Leverage geospatial data analysis and modeling to find unique solutions to environmental problems by David S. Jordan

Key Features
- Learn how to integrate spatial data and spatial thinking into traditional data science workflows
- Develop a spatial perspective and learn to avoid common pitfalls along the way
- Gain expertise through practical case studies applicable in a variety of industries with code samples that can be reproduced and expanded

Table of Contents
1. Introducing Geographic Information Systems and Geospatial Data Science
2. What Is Geospatial Data and Where Can I Find It?
3. Working with Geographic and Projected Coordinate Systems
4. Exploring Geospatial Data Science Packages
5. Exploratory Data Visualization
6. Hypothesis Testing and Spatial Randomness
7. Spatial Feature Engineering
8. Spatial Clustering and Regionalization
9. Developing Spatial Regression Models
10. Developing Solutions for Spatial Optimization Problems
11. Advanced Topics in Spatial Data Science

Links:
- Amazon
- Packt
- GitHub

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #datascience #geo #geospatial

Introduction to Machine Learning (I2ML) by LMU Munich

This website offers an open and free introductory course on (supervised) machine learning. The course is designed to be as self-contained as possible and enables self-study through lecture videos, PDF slides, cheatsheets, quizzes, exercises (with solutions), and notebooks.

The quite extensive material can roughly be divided into:
- An introductory undergraduate part (chapters 1-10)
- A more advanced second part at MSc level (chapters 11-19)
- A third course, also at MSc level (chapters 20-23)

A key goal of the course is to teach the fundamental building blocks behind ML, instead of introducing “yet another algorithm with yet another name”. We discuss, compare, and contrast risk minimization, statistical parameter estimation, the Bayesian viewpoint, and information theory and demonstrate that all of these are equally valid entry points to ML. Developing the ability to take on and switch between these perspectives is a major goal of this course, and in our opinion not always ideally presented in other courses.

Link:
- Main Course Website

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #ml #machinelearning #supervised

Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos

This textbook is intended to provide a comprehensive introduction to forecasting methods and to present enough information about each method for readers to be able to use them sensibly. The authors do not attempt a thorough discussion of the theoretical details behind each method, although the references at the end of each chapter fill in many of those details.

The book is written for three audiences:
(1) people finding themselves doing forecasting in business when they may not have had any formal training in the area;
(2) undergraduate students studying business;
(3) MBA students doing a forecasting elective. The authors use it themselves for master's students and third-year undergraduate students at Monash University, Australia.

For most sections, the authors assume only that readers are familiar with introductory statistics and high-school algebra. A couple of sections also require knowledge of matrices, but these are flagged.

At the end of each chapter the authors provide a list of "further reading". In general, these lists comprise suggested textbooks that provide a more advanced or detailed treatment of the subject. Where there is no suitable textbook, the authors suggest journal articles that provide more information.

Link: Book Website

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #forecasting #timeseries #ts

The Cartoon Guide to Statistics by Larry Gonick, Woollcott Smith

The Cartoon Guide to Statistics covers all the central ideas of modern statistics: the summary and display of data, probability in gambling and medicine, random variables, Bernoulli Trials, the Central Limit Theorem, hypothesis testing, confidence interval estimation, and much more - all explained in simple, clear, and yes, funny illustrations. Never again will you order the Poisson Distribution in a French restaurant!

Links:
- Amazon
- Internet Archive

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #statistics #stats #probability

Practitioners guide to MLOps: A framework for continuous delivery and automation of machine learning by Google Cloud

Across industries, DevOps and DataOps have been widely adopted as methodologies to improve quality and reduce the time to market of software engineering and data engineering initiatives. With the rapid growth in machine learning (ML) systems, similar approaches need to be developed in the context of ML engineering, which handle the unique complexities of the practical applications of ML. This is the domain of MLOps. MLOps is a set of standardized processes and technology capabilities for building, deploying, and operationalizing ML systems rapidly and reliably.

The document is in two parts. The first part, an overview of the MLOps lifecycle, is for all readers. It introduces MLOps processes and capabilities and why they’re important for successful adoption of ML-based systems.

The second part is a deep dive on the MLOps processes and capabilities. This part is for readers who want to understand the concrete details of tasks like running a continuous training pipeline, deploying a model, and monitoring predictive performance of an ML model.

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #mlops

CS324 - Large Language Models by Stanford University

The field of natural language processing (NLP) has been transformed by massive pre-trained language models. They form the basis of all state-of-the-art systems across a wide range of tasks and have shown an impressive ability to generate fluent text and perform few-shot learning. At the same time, these models are hard to understand and give rise to new ethical and scalability challenges. In this course, students will learn the fundamentals about the modeling, theory, ethics, and systems aspects of large language models, as well as gain hands-on experience working with them.

TABLE OF CONTENTS
- Introduction
- Capabilities
- Harms I
- Harms II
- Data
- Security
- Legality
- Modeling
- Training
- Parallelism
- Scaling laws
- Selective architectures
- Adaptation
- Environmental impact

Link: Course

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #nlp #llm #transformer

Deep Learning with Python by François Chollet

Deep Learning with Python, Second Edition introduces the field of deep learning using Python and the powerful Keras library. In this revised and expanded new edition, Keras creator François Chollet offers insights for both novice and experienced machine learning practitioners. As you move through this book, you’ll build your understanding through intuitive explanations, crisp color illustrations, and clear examples. You’ll quickly pick up the skills you need to start developing deep-learning applications.

What's inside:
- Deep learning from first principles
- Image classification and image segmentation
- Time series forecasting
- Text classification and machine translation
- Text generation, neural style transfer, and image generation
- Printed in full color throughout

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #dl #deeplearning #keras

Competitive Programmer’s Handbook by Antti Laaksonen

The purpose of this book is to give you a thorough introduction to competitive programming. It is assumed that you already know the basics of programming, but no previous background in competitive programming is needed.

The book is especially intended for students who want to learn algorithms and possibly participate in the International Olympiad in Informatics (IOI) or in the International Collegiate Programming Contest (ICPC). Of course, the book is also suitable for anybody else interested in competitive programming.

It takes a long time to become a good competitive programmer, but it is also an opportunity to learn a lot. You can be sure that you will get a good general understanding of algorithms if you spend time reading the book, solving problems and taking part in contests.

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #leetcode #programming #competitiveprogramming

The Querynomicon. An Introduction to SQL for Weary Data Scientists

Upon first encountering SQL after two decades of Fortran, C, Java, and Python, the author thought he had stumbled into hell. He quickly realized that was optimistic: after all, hell has rules.

The author has since realized that SQL does too, and that its rules are no more confusing or contradictory than those of most other programming languages. They only appear so because SQL draws on a tradition unfamiliar to those of us raised with derivatives of C. To quote Terry Pratchett, it is not mad, just differently sane.

Welcome, then, to a world in which the strange will become familiar, and the familiar, strange. Welcome, thrice welcome, to SQL.

Table of contents:

1. Introduction
2. Core Features
3. Tools
4. Advanced Features
5. Python
6. R
7. PostgreSQL
8. Conclusion
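
One small example of SQL being "differently sane" is how it treats missing values. The sketch below is not from the tutorial: it uses Python's built-in `sqlite3` module with a hypothetical `penguins` table and made-up rows to show that comparing with `= NULL` matches nothing, while `IS NULL` and aggregates follow their own consistent rules.

```python
import sqlite3

# In-memory database with illustrative rows (assumed, not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE penguins (species TEXT, mass_g REAL)")
conn.executemany(
    "INSERT INTO penguins VALUES (?, ?)",
    [("Adelie", 3750.0), ("Gentoo", None), ("Chinstrap", 3500.0)],
)

# "= NULL" evaluates to NULL rather than true, so no row is selected.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM penguins WHERE mass_g = NULL"
).fetchone()[0]

# "IS NULL" is the operator that actually tests for missing values.
is_null = conn.execute(
    "SELECT COUNT(*) FROM penguins WHERE mass_g IS NULL"
).fetchone()[0]

# Aggregates such as AVG skip NULLs instead of propagating them.
average = conn.execute("SELECT AVG(mass_g) FROM penguins").fetchone()[0]

print("rows matched by '= NULL': ", equals_null)
print("rows matched by 'IS NULL':", is_null)
print("average mass ignoring NULL:", average)
conn.close()
```

Coming from a C-like language, one might expect `= NULL` to behave like an equality test against a null pointer. In SQL, NULL means "unknown", and a comparison with an unknown value is itself unknown.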

Link: Tutorial

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #sql

How to Win a Kaggle Competition by Darek Kłeczek

Darek Kłeczek:
When I join a competition, I research winning solutions from past similar competitions. It takes a lot of time to read and digest them, but it's an incredible source of ideas and knowledge. But what if we could learn from all the competitions? We've been given a list of Kaggle writeups in this competition, but there are so many of them! If only we could find a way to extract some structured data and analyze it... Well, it turns out that large language models (LLMs) [1] can help us extract structured data from unstructured writeups.

In this essay, the author starts with a quick overview of the process he uses to collect data. He then presents several insights from analyzing the resulting datasets. The focus is on understanding what the community has learned over the past 2 years of working and experimenting with Kaggle competitions. Finally, he mentions some ideas for future research.

Link: Kaggle

Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #kaggle #competitions

MACHINE LEARNING @ Vrije Universiteit Amsterdam

This page contains all public information about the course Machine Learning at the VU University Amsterdam.

They provide the following materials:
- Lecture slides and videos.
- Worksheets: very brief Jupyter notebooks to help you get the software installed and to show the basics. They introduce the libraries NumPy, Matplotlib, pandas, scikit-learn and Keras.
- Homework: small pen-and-paper exercises to help you test that you've really understood the more technical points of the lectures. Answers are provided.

If you are a registered student, please refer to the Canvas page instead. All material authored by Peter Bloem unless noted otherwise.

Link: Site

Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #machinelearning #ml #dl #deeplearning

Spark in Action by Jean-Georges Perrin

Spark in Action, Second Edition, teaches you to create end-to-end analytics applications. In this entirely new book, you’ll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you’ll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms.

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #spark #bigdata #sql

ML and LLM system design: 500 case studies to learn from

How do companies like Netflix, Airbnb, and Doordash apply AI to improve their products and processes? We put together a database of 500 case studies from 100+ companies that share practical ML use cases, including applications built with LLMs and Generative AI, and learnings from designing ML and LLM systems.

Navigation tips. You can play around with the database by filtering case studies by industry or ML use case. We added tags based on recurring themes. This is not a perfect or mutually exclusive division, but you can use the tags to quickly find:
- Generative AI use cases. Look for tags “generative AI” and “LLM” to find examples of real-world LLM applications.
- ML systems with different data types: computer vision (CV) or natural language processing (NLP).
- ML systems for specific use cases. The most popular are recommender systems, search and ranking, and fraud detection.
- We also labeled use cases where ML powers a specific user-facing "product feature": from grammatical error correction to generating outfit combinations.

Link: Site

Navigational hashtags: #armknowledgesharing #armsites
General hashtags: #mlsystemdesign #ml #systemdesign #llm
