30 FREE Dataset Sources for Data Science Projects
Data Simplifier: https://datasimplifier.com/best-data-analyst-projects-for-freshers/
US Government Dataset: https://www.data.gov/
Open Government Data (OGD) Platform India: https://data.gov.in/
The World Bank Open Data: https://data.worldbank.org/
Data World: https://data.world/
BFI - Industry Data and Insights: https://www.bfi.org.uk/data-statistics
The Humanitarian Data Exchange (HDX): https://data.humdata.org/
Data at World Health Organization (WHO): https://www.who.int/data
FBI's Crime Data Explorer: https://crime-data-explorer.fr.cloud.gov/
AWS Open Data Registry: https://registry.opendata.aws/
FiveThirtyEight: https://data.fivethirtyeight.com/
IMDb Datasets: https://www.imdb.com/interfaces/
Kaggle: https://www.kaggle.com/datasets
UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php
Google Dataset Search: https://datasetsearch.research.google.com/
Nasdaq Data Link: https://data.nasdaq.com/
Recommender Systems and Personalization Datasets: https://cseweb.ucsd.edu/~jmcauley/datasets.html
Reddit - Datasets: https://www.reddit.com/r/datasets/
Open Data Network by Socrata: https://www.opendatanetwork.com/
Climate Data Online by NOAA: https://www.ncdc.noaa.gov/cdo-web/
Azure Open Datasets: https://azure.microsoft.com/en-us/services/open-datasets/
IEEE Data Port: https://ieee-dataport.org/
Wikipedia: Database: https://dumps.wikimedia.org/
BuzzFeed News: https://github.com/BuzzFeedNews/everything
Academic Torrents: https://academictorrents.com/
Yelp Open Dataset: https://www.yelp.com/dataset
The NLP Index by Quantum Stat: https://index.quantumstat.com/
Computer Vision Online: https://www.computervisiononline.com/dataset
Visual Data Discovery: https://www.visualdata.io/
Roboflow Public Datasets: https://public.roboflow.com/
Computer Vision Group, TUM: https://vision.in.tum.de/data/datasets
sergio-j-rojas-g-learning-scipy-for-numerical-and-2015.pdf
3.5 MB
Learning SciPy for Numerical and Scientific Computing
Sergio J. Rojas G., 2015
Marketing Research with R and Python.pdf
22.7 MB
Marketing Research with R and Python
Howard Pong Yuen Lam, 2023
1680810253047.docx
54.8 KB
One of the most effective ways to learn machine learning is by getting hands-on experience and building something yourself.
While finding inspiration can be challenging, exploring projects by others can open your eyes to the endless possibilities.
The projects I am sharing are perfect for those new to machine learning and curious about its potential.
20 Python Libraries You Aren't Using (But Should).pdf
4.1 MB
20 Python Libraries You Aren't Using (But Should)
Caleb Hattingh, 2016
Advice from 25 Amazing Data Scientist.pdf
2.8 MB
Resource PDF: Advice from 25 Amazing Data Scientists
Source: Jake Klamka
Python for Data Analysts - Quick Summary (1).pdf
64.4 KB
Data Scientist Roadmap
|
|-- 1. Basic Foundations
|   |-- a. Mathematics
|   |   |-- i. Linear Algebra
|   |   |-- ii. Calculus
|   |   |-- iii. Probability
|   |   `-- iv. Statistics
|   |
|   |-- b. Programming
|   |   |-- i. Python
|   |   |   |-- 1. Syntax and Basic Concepts
|   |   |   |-- 2. Data Structures
|   |   |   |-- 3. Control Structures
|   |   |   |-- 4. Functions
|   |   |   `-- 5. Object-Oriented Programming
|   |   |
|   |   `-- ii. R (optional, based on preference)
|   |
|   |-- c. Data Manipulation
|   |   |-- i. Numpy (Python)
|   |   |-- ii. Pandas (Python)
|   |   `-- iii. Dplyr (R)
|   |
|   `-- d. Data Visualization
|       |-- i. Matplotlib (Python)
|       |-- ii. Seaborn (Python)
|       `-- iii. ggplot2 (R)
|
|-- 2. Data Exploration and Preprocessing
|   |-- a. Exploratory Data Analysis (EDA)
|   |-- b. Feature Engineering
|   |-- c. Data Cleaning
|   |-- d. Handling Missing Data
|   `-- e. Data Scaling and Normalization
|
|-- 3. Machine Learning
|   |-- a. Supervised Learning
|   |   |-- i. Regression
|   |   |   |-- 1. Linear Regression
|   |   |   `-- 2. Polynomial Regression
|   |   |
|   |   `-- ii. Classification
|   |       |-- 1. Logistic Regression
|   |       |-- 2. k-Nearest Neighbors
|   |       |-- 3. Support Vector Machines
|   |       |-- 4. Decision Trees
|   |       `-- 5. Random Forest
|   |
|   |-- b. Unsupervised Learning
|   |   |-- i. Clustering
|   |   |   |-- 1. K-means
|   |   |   |-- 2. DBSCAN
|   |   |   `-- 3. Hierarchical Clustering
|   |   |
|   |   `-- ii. Dimensionality Reduction
|   |       |-- 1. Principal Component Analysis (PCA)
|   |       |-- 2. t-Distributed Stochastic Neighbor Embedding (t-SNE)
|   |       `-- 3. Linear Discriminant Analysis (LDA)
|   |
|   |-- c. Reinforcement Learning
|   |-- d. Model Evaluation and Validation
|   |   |-- i. Cross-validation
|   |   |-- ii. Hyperparameter Tuning
|   |   `-- iii. Model Selection
|   |
|   `-- e. ML Libraries and Frameworks
|       |-- i. Scikit-learn (Python)
|       |-- ii. TensorFlow (Python)
|       |-- iii. Keras (Python)
|       `-- iv. PyTorch (Python)
|
|-- 4. Deep Learning
|   |-- a. Neural Networks
|   |   |-- i. Perceptron
|   |   `-- ii. Multi-Layer Perceptron
|   |
|   |-- b. Convolutional Neural Networks (CNNs)
|   |   |-- i. Image Classification
|   |   |-- ii. Object Detection
|   |   `-- iii. Image Segmentation
|   |
|   |-- c. Recurrent Neural Networks (RNNs)
|   |   |-- i. Sequence-to-Sequence Models
|   |   |-- ii. Text Classification
|   |   `-- iii. Sentiment Analysis
|   |
|   |-- d. Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU)
|   |   |-- i. Time Series Forecasting
|   |   `-- ii. Language Modeling
|   |
|   `-- e. Generative Adversarial Networks (GANs)
|       |-- i. Image Synthesis
|       |-- ii. Style Transfer
|       `-- iii. Data Augmentation
|
|-- 5. Big Data Technologies
|   |-- a. Hadoop
|   |   |-- i. HDFS
|   |   `-- ii. MapReduce
|   |
|   |-- b. Spark
|   |   |-- i. RDDs
|   |   |-- ii. DataFrames
|   |   `-- iii. MLlib
|   |
|   `-- c. NoSQL Databases
|       |-- i. MongoDB
|       |-- ii. Cassandra
|       |-- iii. HBase
|       `-- iv. Couchbase
|
|-- 6. Data Visualization and Reporting
|   |-- a. Dashboarding Tools
|   |   |-- i. Tableau
|   |   |-- ii. Power BI
|   |   |-- iii. Dash (Python)
|   |   `-- iv. Shiny (R)
|   |
|   |-- b. Storytelling with Data
|   `-- c. Effective Communication
|
|-- 7. Domain Knowledge and Soft Skills
|   |-- a. Industry-specific Knowledge
|   |-- b. Problem-solving
|   |-- c. Communication Skills
|   |-- d. Time Management
|   `-- e. Teamwork
|
`-- 8. Staying Updated and Continuous Learning
    |-- a. Online Courses
    |-- b. Books and Research Papers
    |-- c. Blogs and Podcasts
    |-- d. Conferences and Workshops
    `-- e. Networking and Community Engagement