The Data Science skill no one talks about...
Every aspiring data scientist I talk to thinks their job starts when someone else gives them:
1. a dataset, and
2. a clearly defined metric to optimize for, e.g. accuracy
But it doesn't.
It starts with a business problem you need to understand, frame, and solve. This is the key data science skill that separates senior from junior professionals.
Let's go through an example.
Example
Imagine you are a data scientist at Uber, and your product lead tells you:
👩‍💼: "We want to decrease user churn by 5% this quarter"
We say that a user churns when she decides to stop using Uber.
But why?
There are different reasons why a user would stop using Uber. For example:
1. "Lyft is offering better prices for that geo" (pricing problem)
2. "Car waiting times are too long" (supply problem)
3. "The Android version of the app is very slow" (client-app performance problem)
You build this list by asking the rest of the team the right questions. You need to understand the user's experience of the app from HER point of view.
Typically there is no single reason behind churn, but a combination of a few of these. The question is: which one should you focus on?
This is when you pull out your great data science skills and EXPLORE THE DATA.
You explore the data to understand how plausible each of the above explanations is. The output from this analysis is a single hypothesis you should consider further. Depending on the hypothesis, you will solve the data science problem differently.
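To make that exploration step concrete, here is a minimal sketch in plain Python. It assumes you already have one record per user with a flag for each candidate explanation. The field names (`cheaper_competitor`, `long_wait`, `slow_android`), the helper functions, and the eight toy records are all made up for illustration; nothing here comes from real Uber data.

```python
from collections import namedtuple

# Hypothetical user record: one flag per candidate explanation, plus the outcome.
User = namedtuple("User", "cheaper_competitor long_wait slow_android churned")

# Made-up toy data, only to show the mechanics.
users = [
    User(True,  False, False, True),
    User(True,  True,  False, True),
    User(True,  False, True,  False),
    User(False, True,  False, False),
    User(False, False, True,  False),
    User(False, True,  True,  True),
    User(False, False, False, False),
    User(True,  False, False, True),
]

def churn_rate(group):
    """Share of churned users in a group (0.0 for an empty group)."""
    return sum(u.churned for u in group) / len(group) if group else 0.0

def churn_gap(users, factor):
    """Churn rate of users exposed to `factor` minus the rate of the rest."""
    exposed = [u for u in users if getattr(u, factor)]
    rest = [u for u in users if not getattr(u, factor)]
    return churn_rate(exposed) - churn_rate(rest)

factors = ["cheaper_competitor", "long_wait", "slow_android"]
gaps = {f: churn_gap(users, f) for f in factors}

# Candidate explanations, ordered from largest to smallest churn gap.
ranking = sorted(factors, key=gaps.get, reverse=True)
```

A large gap is only a hint that an explanation deserves a closer look; it is a correlation, not proof of the cause. Whichever explanation you end up ranking first becomes your working hypothesis.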
For example…
Scenario 1: "Lyft Is Offering Better Prices" (Pricing Problem)
One solution would be to detect/predict the segment of users who are likely to churn (possibly using an ML model) and send them personalized discounts via push notifications. To test that your solution works, you will need to run an A/B test, so you will split a percentage of Uber users into 2 groups:
The A group. No user in this group will receive any discount.
The B group. Users in this group whom the model flags as likely to churn will receive a price discount on their next trip.
You could add more groups (e.g. C, D, E…) to test different price points.
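The split and the final comparison described above can be sketched in a few lines of Python. This is one possible approach under stated assumptions, not how Uber does it: users are bucketed by hashing their id, and the churn rates of the two groups are compared with a standard two-proportion z-test. The function names, the `salt` string, and the `threshold` value are hypothetical.

```python
import hashlib
import math

def assign_group(user_id, groups=("A", "B"), salt="churn-discount-v1"):
    """Deterministically map a user to a group by hashing the salted id."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return groups[int(digest, 16) % len(groups)]

def gets_discount(user_id, churn_score, threshold=0.5):
    """Only group-B users the model flags as likely churners get a discount."""
    return assign_group(user_id) == "B" and churn_score >= threshold

def two_proportion_z(churned_a, n_a, churned_b, n_b):
    """z statistic and two-sided p-value for the difference in churn rates."""
    p_a, p_b = churned_a / n_a, churned_b / n_b
    pooled = (churned_a + churned_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Made-up counts, only to show the call: churned users and group sizes.
z, p_value = two_proportion_z(churned_a=120, n_a=1000, churned_b=90, n_b=1000)
```

Hashing keeps each user in the same group on every request, and passing `groups=("A", "B", "C", "D")` covers the extra price points. For a fair comparison, you would likely score group A with the model too and compare only the flagged users in both groups.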
In a nutshell
1. Translating business problems into data science problems is the key data science skill that separates a senior from a junior data scientist.
2. Ask the right questions, list possible solutions, and explore the data to narrow down the list to one.
3. Solve this one data science problem.
FREE FREE FREE
10 Books on Data Science & Data Analysis will be posted on this channel on a daily basis
Book 1. Python for Data Analysis
Publisher: O'Reilly
wesmckinney.com/book/
Give it a like if you want me to continue ❤️
Coding & Data Science Resources
2. Fundamentals of Data Visualization
Publisher: O'Reilly
clauswilke.com/dataviz/
Like for more ❤️
5. Data Science at the Command Line
Publisher: O'Reilly
jeroenjanssens.com/dsatcl/
10 Data Science Books
Welcome | Data Science at the Command Line, 2e
This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and…
6. Introduction to Probability for Data Science
Publisher: Michigan Publishing
probability4datascience.com
9. Kafka, The Definitive Guide
Publisher: O'Reilly
https://assets.confluent.io/m/2849a76e39cda2bd/original/20201119-EB-Kafka_The_Definitive_Guide-Preview-Chapters_1_thru_6.pdf
10 Data Science Books
10. Python Data Science Handbook
Publisher: O'Reilly
HTML: https://jakevdp.github.io/PythonDataScienceHandbook/
PDF: https://github.com/terencetachiona/Python-Data-Science-Handbook/blob/master/Python%20Data%20Science%20Handbook%20-%20Jake%20VanderPlas.pdf
10 Data Science Books
One day or Day one. You decide.
Data Science edition.
One Day: I will learn SQL.
Day One: Download MySQL Workbench.
One Day: I will build my projects for my portfolio.
Day One: Look on Kaggle for a dataset to work on.
One Day: I will master statistics.
Day One: Start the free Khan Academy Statistics and Probability course.
One Day: I will learn to tell stories with data.
Day One: Install Power BI and create my first chart.
One Day: I will become a Data Analyst.
Day One: Update my resume and apply to some Data Science job postings.