Oh Shit, Git!?!
Git is hard: screwing up is easy, and figuring out how to fix your mistakes is fucking impossible. Git documentation has this chicken and egg problem where you can't search for how to get yourself out of a mess, unless you already know the name of the thing you need to know about in order to fix your problem.
- I did something terribly wrong, please tell me git has a magic time machine!?!
- I committed and immediately realized I need to make one small change!
- I need to change the message on my last commit!
- I accidentally committed something to master that should have been on a brand new branch!
- I accidentally committed to the wrong branch!
- I tried to run a diff but nothing happened?!
- I need to undo a commit from like 5 commits ago!
- I need to undo my changes to a file!
- I give up
Link
Navigational hashtags: #armknowledgesharing #armarticles
General hashtags: #git #versioncontrol #github #gitlab
@data_science_weekly
Leetcode for ML
Super neat set of machine learning coding challenges.
It could be useful to prep for an exam or ML interview.
Link
Navigational hashtags: #armknowledgesharing #armsites
General hashtags: #ml #dl #machinelearning #deeplearning
@data_science_weekly
NeetCode: A better way to prepare for coding interviews
The best free resources for Coding Interviews. Period.
- Organized study plans and roadmaps (Blind 75, Neetcode 150).
- Detailed video explanations.
- Public Discord community with over 30,000 members.
- Sign in to save your progress.
Links:
- Roadmap
- Practice (Core Skills, Blind 75, Neetcode 150, Neetcode All)
- Algorithms and Data Structures for Beginners (course, paid)
- Advanced Algorithms (course, paid)
Navigational hashtags: #armknowledgesharing #armsites #armtutorials
General hashtags: #leetcode #python #algorithms #datastructures #interviewpreparation #technicalinterview
@data_science_weekly
Write faster Python code, and ship your code faster
Faster and more memory efficient data
- Articles: Learn how to speed up your code and reduce memory usage.
- Products: Observability and profiling tools to help you identify bottlenecks in your code.
Docker packaging for Python
- Articles: Learn how to package your Python application for production.
- Products: Educational books and pre-written software templates.
Navigational hashtags: #armknowledgesharing #armsites
General hashtags: #python #development #docker
@data_science_weekly
Ace the SQL Interview by Nick Singh
Practice the most common SQL & Data Interview Questions and Learn SQL.
Navigational hashtags: #armknowledgesharing #armsites
General hashtags: #sql
@data_science_weekly
Applied Causal Inference Powered by ML and AI by Victor Chernozhukov, Christian Hansen, Nathan Kallus, Martin Spindler, Vasilis Syrgkanis
An introduction to the emerging fusion of machine learning and causal inference.
The book introduces ideas from classical structural equation models (SEMs) and their modern AI equivalent, directed acyclic graphs (DAGs) and structural causal models (SCMs), and presents Debiased Machine Learning methods to do inference in such models using modern predictive tools.
Links:
- PDF
- Site
- GitHub
Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #statistics #ml #ai #causal #causalinference
@data_science_weekly
Applied Geospatial Data Science with Python: Leverage geospatial data analysis and modeling to find unique solutions to environmental problems by David S. Jordan
Key Features
- Learn how to integrate spatial data and spatial thinking into traditional data science workflows
- Develop a spatial perspective and learn to avoid common pitfalls along the way
- Gain expertise through practical case studies applicable in a variety of industries with code samples that can be reproduced and expanded
Table of Contents
1. Introducing Geographic Information Systems and Geospatial Data Science
2. What Is Geospatial Data and Where Can I Find It?
3. Working with Geographic and Projected Coordinate Systems
4. Exploring Geospatial Data Science Packages
5. Exploratory Data Visualization
6. Hypothesis Testing and Spatial Randomness
7. Spatial Feature Engineering
8. Spatial Clustering and Regionalization
9. Developing Spatial Regression Models
10. Developing Solutions for Spatial Optimization Problems
11. Advanced Topics in Spatial Data Science
Links:
- Amazon
- Packt
- GitHub
Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #datascience #geo #geospatial
@data_science_weekly
Introduction to Machine Learning (I2ML) by LMU Munich
This website offers an open and free introductory course on (supervised) machine learning. The course is designed to be as self-contained as possible and enables self-study through lecture videos, PDF slides, cheatsheets, quizzes, exercises (with solutions), and notebooks.
The quite extensive material can roughly be divided into:
- An introductory undergraduate part (chapters 1-10)
- A more advanced second one on MSc level (chapters 11-19)
- A third course, on MSc level (chapters 20-23).
A key goal of the course is to teach the fundamental building blocks behind ML, instead of introducing “yet another algorithm with yet another name”. We discuss, compare, and contrast risk minimization, statistical parameter estimation, the Bayesian viewpoint, and information theory and demonstrate that all of these are equally valid entry points to ML. Developing the ability to take on and switch between these perspectives is a major goal of this course, and in our opinion not always ideally presented in other courses.
Link:
- Main Course Website
Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #ml #machinelearning #supervised
@data_science_weekly
Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos
This textbook is intended to provide a comprehensive introduction to forecasting methods and to present enough information about each method for readers to be able to use them sensibly. The authors don’t attempt to give a thorough discussion of the theoretical details behind each method, although the references at the end of each chapter will fill in many of those details.
The book is written for three audiences:
(1) people finding themselves doing forecasting in business when they may not have had any formal training in the area;
(2) undergraduate students studying business;
(3) MBA students doing a forecasting elective. The authors also use it themselves for master's students and third-year undergraduate students at Monash University, Australia.
For most sections, the authors only assume that readers are familiar with introductory statistics and high-school algebra. There are a couple of sections that also require knowledge of matrices, but these are flagged.
At the end of each chapter, the authors provide a list of “further reading”. In general, these lists comprise suggested textbooks that provide a more advanced or detailed treatment of the subject. Where there is no suitable textbook, the authors suggest journal articles that provide more information.
Link: Book Website
Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #forecasting #timeseries #ts
@data_science_weekly
The Cartoon Guide to Statistics by Larry Gonick and Woollcott Smith
The Cartoon Guide to Statistics covers all the central ideas of modern statistics: the summary and display of data, probability in gambling and medicine, random variables, Bernoulli Trials, the Central Limit Theorem, hypothesis testing, confidence interval estimation, and much more - all explained in simple, clear, and yes, funny illustrations. Never again will you order the Poisson Distribution in a French restaurant!
Links:
- Amazon
- Internet Archive
Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #statistics #stats #probability
@data_science_weekly
Practitioners guide to MLOps: A framework for continuous delivery and automation of machine learning by Google Cloud
Across industries, DevOps and DataOps have been widely adopted as methodologies to improve quality and reduce the time to market of software engineering and data engineering initiatives. With the rapid growth in machine learning (ML) systems, similar approaches need to be developed in the context of ML engineering, which handle the unique complexities of the practical applications of ML. This is the domain of MLOps. MLOps is a set of standardized processes and technology capabilities for building, deploying, and operationalizing ML systems rapidly and reliably.
The document is in two parts. The first part, an overview of the MLOps lifecycle, is for all readers. It introduces MLOps processes and capabilities and why they’re important for successful adoption of ML-based systems.
The second part is a deep dive on the MLOps processes and capabilities. This part is for readers who want to understand the concrete details of tasks like running a continuous training pipeline, deploying a model, and monitoring predictive performance of an ML model.
Link: Book
Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #mlops
@data_science_weekly
CS324 - Large Language Models by Stanford University
The field of natural language processing (NLP) has been transformed by massive pre-trained language models. They form the basis of all state-of-the-art systems across a wide range of tasks and have shown an impressive ability to generate fluent text and perform few-shot learning. At the same time, these models are hard to understand and give rise to new ethical and scalability challenges. In this course, students will learn the fundamentals about the modeling, theory, ethics, and systems aspects of large language models, as well as gain hands-on experience working with them.
TABLE OF CONTENTS
- Introduction
- Capabilities
- Harms I
- Harms II
- Data
- Security
- Legality
- Modeling
- Training
- Parallelism
- Scaling laws
- Selective architectures
- Adaptation
- Environmental impact
Link: Course
Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #nlp #llm #transformer
@data_science_weekly
Deep Learning with Python by François Chollet
Deep Learning with Python, Second Edition introduces the field of deep learning using Python and the powerful Keras library. In this revised and expanded new edition, Keras creator François Chollet offers insights for both novice and experienced machine learning practitioners. As you move through this book, you’ll build your understanding through intuitive explanations, crisp color illustrations, and clear examples. You’ll quickly pick up the skills you need to start developing deep-learning applications.
What's inside:
- Deep learning from first principles
- Image classification and image segmentation
- Time series forecasting
- Text classification and machine translation
- Text generation, neural style transfer, and image generation
- Printed in full color throughout
Link: Book
Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #dl #deeplearning #keras
@data_science_weekly
Competitive Programmer’s Handbook by Antti Laaksonen
The purpose of this book is to give you a thorough introduction to competitive programming. It is assumed that you already know the basics of programming, but no previous background in competitive programming is needed.
The book is especially intended for students who want to learn algorithms and possibly participate in the International Olympiad in Informatics (IOI) or in the International Collegiate Programming Contest (ICPC). Of course, the book is also suitable for anybody else interested in competitive programming.
It takes a long time to become a good competitive programmer, but it is also an opportunity to learn a lot. You can be sure that you will get a good general understanding of algorithms if you spend time reading the book, solving problems and taking part in contests.
Link: Book
Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #leetcode #programming #competitiveprogramming
@data_science_weekly
The Querynomicon. An Introduction to SQL for Weary Data Scientists
Upon first encountering SQL after two decades of Fortran, C, Java, and Python, the author thought he had stumbled into hell. He quickly realized that was optimistic: after all, hell has rules.
The author has since realized that SQL has rules too, and that they are no more confusing or contradictory than those of most other programming languages. They only appear so because SQL draws on a tradition unfamiliar to those of us raised with derivatives of C. To quote Terry Pratchett, it is not mad, just differently sane.
Welcome, then, to a world in which the strange will become familiar, and the familiar, strange. Welcome, thrice welcome, to SQL.
Table of contents:
1. Introduction
2. Core Features
3. Tools
4. Advanced Features
5. Python
6. R
7. PostgreSQL
8. Conclusion
Link: Tutorial
Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #sql
@data_science_weekly
How to Win a Kaggle Competition by Darek Kłeczek
Darek Kłeczek:
When I join a competition, I research winning solutions from past similar competitions. It takes a lot of time to read and digest them, but it's an incredible source of ideas and knowledge. But what if we could learn from all the competitions? We've been given a list of Kaggle writeups in this competition, but there are so many of them! If only we could find a way to extract some structured data and analyze it... Well, it turns out that large language models (LLMs) [1] can help us extract structured data from unstructured writeups.
In this essay, the author starts by providing a quick overview of the process he uses to collect data. He then presents several insights from analyzing the datasets. The focus is to understand what the community has learned over the past 2 years of working and experimenting with Kaggle competitions. Finally, he mentions some ideas for future research.
Link: Kaggle
Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #kaggle #competitions
@data_science_weekly
MACHINE LEARNING @ Vrije Universiteit Amsterdam
This page contains all public information about the course Machine Learning at the VU University Amsterdam.
They provide the following materials:
- Lecture slides and videos.
- Worksheets
These are very brief Jupyter notebooks to help you get the software installed and to show the basics. They introduce the libraries Numpy, Matplotlib, Pandas, Sklearn and Keras.
- Homework
The homework consists of small pen-and-paper exercises to help you test that you’ve really understood the more technical points of the lectures. Answers are provided. If you are a registered student, please refer to the Canvas page instead. All material authored by Peter Bloem unless noted otherwise.
Link: Site
Navigational hashtags: #armknowledgesharing #armcourses
General hashtags: #machinelearning #ml #dl #deeplearning
@data_science_weekly
Spark in Action by Jean-Georges Perrin
Spark in Action, Second Edition, teaches you to create end-to-end analytics applications. In this entirely new book, you’ll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you’ll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms.
Link: Book
Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #spark #bigdata #sql
@data_science_weekly
ML and LLM system design: 500 case studies to learn from
How do companies like Netflix, Airbnb, and Doordash apply AI to improve their products and processes? We put together a database of 500 case studies from 100+ companies that share practical ML use cases, including applications built with LLMs and Generative AI, and learnings from designing ML and LLM systems.
Navigation tips. You can play around with the database by filtering case studies by industry or ML use case. We added tags based on recurring themes. This is not a perfect or mutually exclusive division, but you can use the tags to quickly find:
- Generative AI use cases. Look for tags “generative AI” and “LLM” to find examples of real-world LLM applications.
- ML systems with different data types: computer vision (CV) or natural language processing (NLP).
- ML systems for specific use cases. The most popular are recommender systems, search and ranking, and fraud detection.
- We also labeled use cases where ML powers a specific user-facing "product feature": from grammatical error correction to generating outfit combinations.
Link: Site
Navigational hashtags: #armknowledgesharing #armsites
General hashtags: #mlsystemdesign #ml #systemdesign #llm
@data_science_weekly
Data Structures & Algorithms by Google
Familiarize yourself with common data structures and algorithms such as lists, trees, maps, graphs, Big-O analysis, and more!
Topics:
- Maps/Dictionaries
- Linked Lists
- Trees
- Stacks & Queues
- Heaps
- Graphs
- Runtime Analysis
- Searching & Sorting
- Recursion & DP
Link: Site
Navigational hashtags: #armknowledgesharing #armtutorials
General hashtags: #algorithms #leetcode #programming
@data_science_weekly
Feature Engineering A-Z by Emil Hvitfeldt
This book is written to be used as a reference guide to nearly all feature engineering methods you will encounter. This is reflected in the chapter structure. Any question a practitioner is having should be answered by looking at the index and finding the right chapter.
Each section tries to be as comprehensive as possible with the number of different methods and solutions that are presented. A section on dimensionality reduction should list all the practical methods that could be used, as well as a comparison between the methods to help the reader decide what would be most appropriate. This does not mean that all methods are recommended to use. A number of these methods have little and narrow use cases. Methods that are deemed too domain-specific have been excluded from this book.
Each chapter will cover a specific method or small group of methods. This will include motivations and explanations for the method. Whenever possible each method will be accompanied by mathematical formulas and visualizations to illustrate the mechanics of the method. A small pros and cons list is provided for each method. Lastly, each section will include code snippets showcasing how to implement the methods. This is done in R and Python, using tidymodels and scikit-learn respectively. This book is a methods book first, and a coding book second.
Links:
- Site
- Repository
Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ml #machinelearning #featureengineering
@data_science_weekly