Choosing the right chart type can make or break your data story. Today's tip: use bar charts for comparisons, and use line charts for WoW, MoM, and YoY analysis. What's your go-to chart?
9 Distance Metrics used in Data Science & Machine Learning.
In data science, distance measures are crucial for various tasks such as clustering, classification, and regression. Below are nine commonly used distance methods:
1. Euclidean Distance:
This measures the straight-line distance between two points in space, similar to measuring with a ruler.
2. Manhattan Distance (L1 Norm):
This distance is calculated by summing the absolute differences between the coordinates of the points, similar to navigating a grid-like city layout.
3. Minkowski Distance:
A general form of distance measurement that includes Manhattan (p = 1) and Euclidean (p = 2) distances as special cases of its order parameter p.
4. Chebyshev Distance:
This measures the maximum absolute difference between the coordinates of the points, that is, the greatest difference along any single dimension.
5. Cosine Similarity:
This assesses how similar two vectors are based on the angle between them, so it measures similarity rather than distance. To use it as a distance, it is usually converted to 1 minus the cosine similarity.
6. Hamming Distance:
This counts the number of positions at which corresponding symbols differ, commonly used for comparing strings or binary data.
7. Jaccard Distance:
This measures the dissimilarity between two sets as 1 minus the ratio of the size of their intersection to the size of their union.
8. Mahalanobis Distance:
This measures the distance between a point and a distribution, accounting for correlations among variables, making it useful for multivariate data.
9. Bray-Curtis Distance:
This measures dissimilarity between two samples based on the differences in counts or proportions, often used in ecological and environmental studies.
These distance measures are essential tools in data science for tasks such as clustering, classification, and pattern recognition.
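As a rough illustration of the definitions above, here is a minimal sketch in plain Python. The function names are my own and not taken from any library. Mahalanobis distance is left out because it needs the inverse covariance matrix of a distribution, which is easier to handle with a numerical library.

```python
import math


def minkowski(a, b, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 norm)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Largest absolute difference along any single dimension."""
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 minus cosine similarity; undefined (division by zero) for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(s, t):
    """1 minus |intersection| / |union|; undefined if both sets are empty."""
    s, t = set(s), set(t)
    return 1 - len(s & t) / len(s | t)


def bray_curtis(a, b):
    """Sum of absolute differences divided by the sum of absolute sums."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(abs(x + y) for x, y in zip(a, b))
    return num / den


p, q = [0, 0], [3, 4]
print(euclidean(p, q))   # 5.0
print(manhattan(p, q))   # 7
print(chebyshev(p, q))   # 4
print(hamming("karolin", "kathrin"))  # 3
```

In practice, libraries such as SciPy provide tested versions of these metrics; the sketch is only meant to make the formulas concrete.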
What is your preferred method for handling imbalanced datasets in machine learning?
1. Resampling techniques (oversampling/undersampling)
2. Synthetic data generation (SMOTE, ADASYN)
3. Algorithm-specific techniques (class weights, cost-sensitive learning)
4. Ensemble methods (bagging, boosting)
5. Other (share your approach in the comments below!)
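To make options 1 and 3 a little more concrete, here is a small sketch using only the standard library. The helper names `balanced_class_weights` and `random_oversample` are hypothetical. The weight formula n_samples / (n_classes * class_count) is the common "balanced" heuristic, and the oversampler simply duplicates minority rows at random; neither is the only way to do it.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Illustrative toy data: 8 samples of class 0 and 2 samples of class 1.
labels = [0] * 8 + [1] * 2
rows = [[i] for i in range(10)]

print(balanced_class_weights(labels))  # {0: 0.625, 1: 2.5}
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Oversampling should be applied to the training split only, so that duplicated rows do not leak into validation or test data.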
In today's world, it's crucial to focus on leading technologies like full-stack development or AI/ML.
However, many students are just copying projects instead of learning. To succeed, it's important to work on real, hands-on projects and truly understand the concepts.
Has anyone gone through an interview for a data science related role recently? Feel free to share your experience.
Here is a list of a few projects (found on Kaggle). They cover the basics of Python, advanced statistics, supervised learning (regression and classification problems), and data science.
Please also check the discussions and notebook submissions for different approaches and solutions after you have tried the problems yourself.
1. Basic Python and statistics
Pima Indians :- https://www.kaggle.com/uciml/pima-indians-diabetes-database
Cardio Goodness fit :- https://www.kaggle.com/saurav9786/cardiogoodfitness
Automobile :- https://www.kaggle.com/toramky/automobile-dataset
2. Advanced Statistics
Game of Thrones:-https://www.kaggle.com/mylesoneill/game-of-thrones
World University Ranking:-https://www.kaggle.com/mylesoneill/world-university-rankings
IMDB Movie Dataset:- https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset
3. Supervised Learning
a) Regression Problems
How much did it rain :- https://www.kaggle.com/c/how-much-did-it-rain-ii/overview
Inventory Demand:- https://www.kaggle.com/c/grupo-bimbo-inventory-demand
Property Inspection prediction:- https://www.kaggle.com/c/liberty-mutual-group-property-inspection-prediction
Restaurant Revenue prediction:- https://www.kaggle.com/c/restaurant-revenue-prediction/data
TMDB Box office Prediction:- https://www.kaggle.com/c/tmdb-box-office-prediction/overview
b) Classification problems
Employee Access challenge :- https://www.kaggle.com/c/amazon-employee-access-challenge/overview
Titanic :- https://www.kaggle.com/c/titanic
San Francisco crime:- https://www.kaggle.com/c/sf-crime
Customer satisfaction:- https://www.kaggle.com/c/santander-customer-satisfaction
Trip type classification:- https://www.kaggle.com/c/walmart-recruiting-trip-type-classification
Categorize cuisine:- https://www.kaggle.com/c/whats-cooking
4. Some helpful Data science projects for beginners
https://www.kaggle.com/c/house-prices-advanced-regression-techniques
https://www.kaggle.com/c/digit-recognizer
https://www.kaggle.com/c/titanic
5. Intermediate Level Data science Projects
Black Friday Data : https://www.kaggle.com/sdolezel/black-friday
Human Activity Recognition Data : https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones
Trip History Data : https://www.kaggle.com/pronto/cycle-share-dataset
Million Song Data : https://www.kaggle.com/c/msdchallenge
Census Income Data : https://www.kaggle.com/c/census-income/data
Movie Lens Data : https://www.kaggle.com/grouplens/movielens-20m-dataset
Twitter Classification Data : https://www.kaggle.com/c/twitter-sentiment-analysis2
Share with credits: https://t.iss.one/sqlproject
ENJOY LEARNING
Here are some of the most popular Python project ideas:
Simple Calculator
Text-Based Adventure Game
Number Guessing Game
Password Generator
Dice Rolling Simulator
Mad Libs Generator
Currency Converter
Leap Year Checker
Word Counter
Quiz Program
Email Slicer
Rock-Paper-Scissors Game
Web Scraper (Simple)
Text Analyzer
Interest Calculator
Unit Converter
Simple Drawing Program
File Organizer
BMI Calculator
Tic-Tac-Toe Game
To-Do List Application
Inspirational Quote Generator
Task Automation Script
Simple Weather App
Automate data cleaning and analysis (EDA)
Sales analysis
Sentiment analysis
Price prediction
Customer Segmentation
Time series forecasting
Image classification
Spam email detection
Credit card fraud detection
Market basket analysis
NLP, etc.
These are just starting points. Feel free to explore, combine ideas, and personalize your projects based on your interests and skills.
What is your favorite machine learning project that you've worked on, and what made it memorable?
Share your experience below!
Here is a simple example of an ML project with the steps involved:
https://t.iss.one/datasciencefun/1800
How do you stay updated with the latest advancements in machine learning and AI?
Some helpful Data science projects for beginners
https://www.kaggle.com/c/house-prices-advanced-regression-techniques
https://www.kaggle.com/c/digit-recognizer
https://www.kaggle.com/c/titanic
BEST RESOURCES TO LEARN DATA SCIENCE AND MACHINE LEARNING FOR FREE
https://developers.google.com/machine-learning/crash-course
https://www.kaggle.com/learn/overview
https://forums.fast.ai/t/recommended-python-learning-resources/26888
https://www.fast.ai/
https://imp.i115008.net/JrBjZR
https://ern.li/OP/1qvkxbfaxqj
ENJOY LEARNING
Free Projects to Practice Data Analysis and Python Skills
Here are free hands-on projects from Coursera, with no trial period and no card details required.
Each project takes about 8 hours to complete.
1. Web Scraping and Analyzing Data Analyst Job Listings with Python
In this project, you will help a recruitment agency find suitable job listings for their clients, giving them an edge over other job seekers. You'll need to extract job listing data from several websites, then visualize and analyze it.
https://bit.ly/3W3jFRB
2. Analyzing Social Media Usage Data with Python
In this project, you will work as a data analyst at a marketing firm specializing in brand promotion on social media. Your task is to use Python to extract, clean, and analyze tweets in specific categories (health, family, food, etc.) and create visualizations.
https://bit.ly/4bM1xlh
Explain the features of Python / Say something about the benefits of using Python?
Python is a must-have skill for students and working professionals who want to become strong software engineers, especially in the web development domain. Here are some of the key advantages of learning Python:
Simple and easy to learn:
* Learning the Python programming language is easy and fun.
* Compared to other languages such as Java or C++, its syntax is much simpler.
* You also don't have to worry about missing semicolons (;) at the end of a line!
* It is more expressive, which means the code is more understandable and readable.
* Python is a great language for beginner-level programmers.
* It supports the development of a wide range of applications, from simple text processing to web browsers to games.
* Easy to learn: Python has few keywords, a simple structure, and a clearly defined syntax. This makes it easy for beginners to pick up the language quickly.
* Easy to read: Python code is clearly defined and readable. It reads almost like plain English.
* Easy to maintain: Python's source code is fairly easy to maintain.
Features of Python
Python is Interpreted:
* Python is processed at runtime by the interpreter.
* You do not need to compile your program before executing it. This is similar to Perl and PHP.
Python is Interactive:
* Python supports an interactive mode that allows interactive testing and debugging of snippets of code.
* You can open the interactive terminal, also referred to as the Python prompt, and interact with the interpreter directly to write your programs.
Python is Object-Oriented:
* Python supports not only functional and structured programming methods, but also object-oriented programming.
Scripting Language:
* Python can be used as a scripting language, or it can be compiled to byte-code for building large applications.
Dynamic Language:
* It provides very high-level dynamic data types and supports dynamic type checking.
Garbage Collection:
* Garbage collection is a process in which objects that are no longer reachable are freed from memory.
* Memory management is very important when writing programs. Python supports automatic garbage collection, whereas manual memory management is one of the main sources of problems when writing programs in C and C++.
Large Open Source Community:
* Python has a large open source community, which is one of its main strengths.
* Its open source ecosystem offers more than 118 thousand libraries, and counting.
* If you are stuck with an issue, you can seek help directly from the millions of members of the Python community.
* A broad standard library: the bulk of Python's library is very portable and cross-platform compatible on UNIX, Windows, and Macintosh.
* Extendable: you can add low-level modules to the Python interpreter. These modules enable programmers to add to or customize their tools to be more efficient.
Cross-platform Language:
* Python is a cross-platform, or portable, language.
* Python can run on a wide variety of hardware platforms and has the same interface on all platforms.
* Python can run on different platforms such as Windows, Linux, Unix, and Macintosh.
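Two of the features above, dynamic typing and automatic garbage collection, are easy to see in a few lines. The snippet below is a small sketch: the `Node` class is a hypothetical placeholder, and the exact moment an object is freed is an implementation detail (the standard CPython interpreter usually frees it as soon as the last reference disappears).

```python
import gc
import weakref

# Dynamic typing: a name can be rebound to objects of different types,
# and types are checked when an operation runs, not ahead of time.
value = 42
kinds = [type(value).__name__]
value = "forty-two"
kinds.append(type(value).__name__)
print(kinds)

try:
    "1" + 1  # mixing str and int is rejected at runtime
    type_error_raised = False
except TypeError:
    type_error_raised = True
print(type_error_raised)


class Node:
    """Hypothetical class used only to have something to collect."""


# Garbage collection: once no strong reference remains, the object is freed.
node = Node()
probe = weakref.ref(node)   # a weak reference does not keep the object alive
alive_before = probe() is not None
del node                    # drop the last strong reference
gc.collect()                # explicitly ask the collector to run
alive_after = probe() is not None
print(alive_before, alive_after)
```

There is no call that frees the `Node` object by hand; dropping the reference is enough, which is the contrast with manual memory management in C and C++.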
What type of project do you enjoy working on the most?
1. Personal projects
2. Open-source contributions
3. Freelance work
4. Corporate projects
5. Academic projects
If you prefer any other type, add it in the comments.
Data Analytics is a wild career. One minute you're doing fancy product experimentation, statistics, and ML... and the next minute you're spending hours copying and pasting into an Excel doc while people tell you to hurry up.
SQL Interview Question for #DataScience:
A company has provided sales data containing information about customer purchases, as shown in the table below.
Your task is to:
Calculate Total Revenue
Calculate Total Sales by Product
Find Top Customers by Revenue
Solve it using SQL
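The table itself is not reproduced in this post, so the sketch below assumes a hypothetical `sales` table with columns `customer`, `product`, `quantity`, and `unit_price`, filled with made-up sample rows. It runs the three queries through Python's built-in `sqlite3` module; with the real table, only the column names would need adjusting.

```python
import sqlite3

# Hypothetical schema and sample rows (assumptions, not the original table).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer TEXT, product TEXT, quantity INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("alice", "widget", 2, 10.0),
        ("bob", "widget", 1, 10.0),
        ("alice", "gadget", 1, 25.0),
        ("carol", "gadget", 3, 25.0),
    ],
)

# 1. Total revenue across all purchases.
total_revenue = conn.execute(
    "SELECT SUM(quantity * unit_price) FROM sales"
).fetchone()[0]

# 2. Total sales by product.
sales_by_product = conn.execute(
    """
    SELECT product, SUM(quantity * unit_price) AS revenue
    FROM sales
    GROUP BY product
    ORDER BY revenue DESC
    """
).fetchall()

# 3. Top customers by revenue (top 2 here; the cutoff is an arbitrary choice).
top_customers = conn.execute(
    """
    SELECT customer, SUM(quantity * unit_price) AS revenue
    FROM sales
    GROUP BY customer
    ORDER BY revenue DESC
    LIMIT 2
    """
).fetchall()

print(total_revenue)      # 130.0
print(sales_by_product)   # [('gadget', 100.0), ('widget', 30.0)]
print(top_customers)      # [('carol', 75.0), ('alice', 45.0)]
```

All three answers follow the same pattern: compute revenue as quantity times price, then aggregate with `SUM`, adding `GROUP BY` when a per-product or per-customer breakdown is needed.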
Hi Guys,
Here are some of the Telegram channels that may help you in your data analytics journey:
SQL: https://t.iss.one/sqlanalyst
Power BI & Tableau: https://t.iss.one/PowerBI_analyst
Excel: https://t.iss.one/excel_analyst
Python: https://t.iss.one/dsabooks
Jobs: https://t.iss.one/jobs_SQL
Data Science: https://t.iss.one/datasciencefree
Artificial intelligence: https://t.iss.one/machinelearning_deeplearning
Data Engineering: https://t.iss.one/sql_engineer
Hope it helps :)