Data Science Projects
Perfect channel for Data Scientists

Learn Python, AI, R, Machine Learning, Data Science and many more

Admin: @love_data
Free Datasets to practice data science projects

1. Enron Email Dataset

Data Link: https://www.cs.cmu.edu/~enron/

2. Chatbot Intents Dataset

Data Link: https://github.com/katanaml/katana-assistant/blob/master/mlbackend/intents.json

3. Flickr 30k Dataset

Data Link: https://www.kaggle.com/hsankesara/flickr-image-dataset

4. Parkinsons Dataset

Data Link: https://archive.ics.uci.edu/ml/datasets/parkinsons

5. Iris Dataset

Data Link: https://archive.ics.uci.edu/ml/datasets/Iris

6. ImageNet dataset

Data Link: https://www.image-net.org/

7. Mall Customers Dataset

Data Link: https://www.kaggle.com/shwetabh123/mall-customers

8. Google Trends Data Portal

Data Link: https://trends.google.com/trends/

9. The Boston Housing Dataset

Data Link: https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html

10. Uber Pickups Dataset

Data Link: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city

11. Recommender Systems Dataset

Data Link: https://cseweb.ucsd.edu/~jmcauley/datasets.html

Source Code: https://bit.ly/37iBDEp

12. UCI Spambase Dataset

Data Link: https://archive.ics.uci.edu/ml/datasets/Spambase

13. GTSRB (German traffic sign recognition benchmark) Dataset

Data Link: https://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset

Source Code: https://bit.ly/39taSyH

14. Cityscapes Dataset

Data Link: https://www.cityscapes-dataset.com/

15. Kinetics Dataset

Data Link: https://deepmind.com/research/open-source/kinetics

16. IMDB-Wiki dataset

Data Link: https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/


17. Color Detection Dataset

Data Link: https://github.com/codebrainz/color-names/blob/master/output/colors.csv


18. UrbanSound8K Dataset

Data Link: https://urbansounddataset.weebly.com/urbansound8k.html

19. LibriSpeech Dataset

Data Link: https://www.openslr.org/12

20. Breast Histopathology Images Dataset

Data Link: https://www.kaggle.com/paultimothymooney/breast-histopathology-images

21. YouTube-8M Dataset

Data Link: https://research.google.com/youtube8m/

Join for more -> https://t.iss.one/addlist/ID95piZJZa0wYzk5

ENJOY LEARNING 👍👍
👍275🥰1
Build a linear regression model in just 5 minutes 👇👇
https://www.instagram.com/reel/C6BTlcqA6fl/?igsh=MTM5ZG02anB3aHZwNQ==
👍5
Top Platforms for Building Data Science Portfolio

Build an irresistible portfolio that hooks recruiters with these free platforms.

Landing a job as a data scientist begins with a portfolio that lists all your projects. To help you get started, here is a list of top data science platforms. Remember, the stronger your portfolio, the better your chances of landing your dream job.

1. GitHub
2. Kaggle
3. LinkedIn
4. Medium
5. MachineHack
6. DagsHub
7. HuggingFace
👍295🥰2
🚨30 FREE Dataset Sources for Data Science Projects🔥

Data Simplifier: https://datasimplifier.com/best-data-analyst-projects-for-freshers/

US Government Dataset: https://www.data.gov/

Open Government Data (OGD) Platform India: https://data.gov.in/

The World Bank Open Data: https://data.worldbank.org/

Data World: https://data.world/

BFI - Industry Data and Insights: https://www.bfi.org.uk/data-statistics

The Humanitarian Data Exchange (HDX): https://data.humdata.org/

Data at World Health Organization (WHO): https://www.who.int/data

FBI’s Crime Data Explorer: https://crime-data-explorer.fr.cloud.gov/

AWS Open Data Registry: https://registry.opendata.aws/

FiveThirtyEight: https://data.fivethirtyeight.com/

IMDb Datasets: https://www.imdb.com/interfaces/

Kaggle: https://www.kaggle.com/datasets

UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php

Google Dataset Search: https://datasetsearch.research.google.com/

Nasdaq Data Link: https://data.nasdaq.com/

Recommender Systems and Personalization Datasets: https://cseweb.ucsd.edu/~jmcauley/datasets.html

Reddit - Datasets: https://www.reddit.com/r/datasets/

Open Data Network by Socrata: https://www.opendatanetwork.com/

Climate Data Online by NOAA: https://www.ncdc.noaa.gov/cdo-web/

Azure Open Datasets: https://azure.microsoft.com/en-us/services/open-datasets/

IEEE Data Port: https://ieee-dataport.org/

Wikipedia Database Dumps: https://dumps.wikimedia.org/

BuzzFeed News: https://github.com/BuzzFeedNews/everything

Academic Torrents: https://academictorrents.com/

Yelp Open Dataset: https://www.yelp.com/dataset

The NLP Index by Quantum Stat: https://index.quantumstat.com/

Computer Vision Online: https://www.computervisiononline.com/dataset

Visual Data Discovery: https://www.visualdata.io/

Roboflow Public Datasets: https://public.roboflow.com/

Computer Vision Group, TUM: https://vision.in.tum.de/data/datasets
👍174🔥1
If you're into deep learning, then you know that students usually choose one of two paths:

- Computer vision
- Natural language processing (NLP)

If you're into NLP, here are 5 fundamental concepts you should know:

Before we start, what is NLP?

Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on the interaction between computers and humans through language.

It enables machines to understand, interpret, and respond to human language in a way that is both meaningful and useful.

Data scientists need NLP to analyze, process, and generate insights from large volumes of textual data, aiding in tasks ranging from sentiment analysis to automated summarization.

Tokenization

Tokenization involves breaking down text into smaller units, such as words or phrases. This is the first step in preprocessing textual data for further analysis or NLP applications.
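
To make tokenization concrete, here is a minimal sketch using only Python's standard library. The `tokenize` helper and its regular expression are assumptions made for illustration; production tokenizers (for example in NLTK or spaCy) handle many more edge cases.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens and standalone punctuation marks."""
    # \w+(?:'\w+)? keeps contractions such as "don't" in one piece;
    # [^\w\s] emits every punctuation character as its own token.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text.lower())

tokens = tokenize("Don't panic: NLP isn't magic!")
print(tokens)
```

Lowercasing up front is a design choice that shrinks the vocabulary, at the cost of losing case information that later steps such as NER may want.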

Part-of-Speech Tagging

This process involves identifying the part of speech for each word in a sentence (e.g., noun, verb, adjective). It is crucial for various NLP tasks that require understanding the grammatical structure of text.

Stemming and Lemmatization

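These techniques reduce words to their base or root form. Stemming chops off affixes (mostly suffixes) with simple rules, so the result is not always a real word. Lemmatization uses vocabulary and morphological analysis to return the dictionary form of the word, which is slower but usually more accurate.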
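
The difference is easiest to see side by side. The sketch below is a toy: the suffix list and the lemma dictionary are hypothetical and deliberately tiny, whereas real stemmers (such as Porter's) and lemmatizers rely on much larger rule sets and vocabularies.

```python
# Hypothetical, deliberately tiny rule set for illustration only.
SUFFIXES = ("ing", "ed", "es", "s")

def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma dictionary; real lemmatizers use a full vocabulary.
LEMMAS = {"studies": "study", "running": "run", "better": "good", "mice": "mouse"}

def toy_lemmatize(word: str) -> str:
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEMMAS.get(word, word)

for w in ["studies", "running", "better", "mice"]:
    print(f"{w:8} stem={toy_stem(w):8} lemma={toy_lemmatize(w)}")
```

The stemmer turns "studies" into "studi" and "running" into "runn", neither of which is a real word, while the dictionary lookup returns "study" and "run".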

Named Entity Recognition (NER)

NER identifies and classifies named entities in text into predefined categories such as the names of persons, organizations, locations, etc. It's essential for tasks like data extraction from documents and content classification.

Sentiment Analysis

This technique determines the emotional tone behind a body of text. It's widely used in business and social media monitoring to gauge public opinion and customer sentiment.
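
One simple way to approach sentiment analysis is a word-list (lexicon) scorer. This is a minimal sketch; the `POSITIVE` and `NEGATIVE` word sets and the `sentiment_score` function are made up for illustration, and modern systems typically use trained models instead.

```python
import re

# Hypothetical mini-lexicons; real lexicons contain thousands of scored words.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "sad"}

def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]: positive above 0, negative below 0."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # No sentiment-bearing words found: treat the text as neutral.
    return 0.0 if total == 0 else (pos - neg) / total

print(sentiment_score("I love this phone, the camera is great but the battery is bad"))
print(sentiment_score("The meeting is at noon"))
```

Note that this sketch ignores negation ("not good") and sarcasm, which is exactly where lexicon approaches tend to fail.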

That's a wrap! Which natural language processing or computer vision concepts do you know?

Like for more 😄

Share our channel with friends: https://t.iss.one/pythonspecialist
👍154🔥4
Struggling with Machine Learning algorithms? 🤖

Then you better stay with me! 🤓

We are going back to the basics to simplify ML algorithms.
... today's turn is Logistic Regression! 👇🏻

1️⃣ 𝗟𝗢𝗚𝗜𝗦𝗧𝗜𝗖 𝗥𝗘𝗚𝗥𝗘𝗦𝗦𝗜𝗢𝗡
It is a binary classification model used to classify our input data into two categories.

It can be extended to multiclass classification... but today we'll focus on the binary case.

Also known as Binary (or Binomial) Logistic Regression.

2️⃣ 𝗛𝗢𝗪 𝗧𝗢 𝗖𝗢𝗠𝗣𝗨𝗧𝗘 𝗜𝗧?
The Sigmoid Function is our mathematical wand, turning numbers into neat probabilities between 0 and 1.

It's what makes Logistic Regression tick, giving us a clear 'probabilistic' picture.
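
Here is a minimal sketch of the sigmoid in plain Python. The `predict_proba` helper, along with its `weights` and `bias` parameters, is a hypothetical name chosen for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the range 0 to 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, weights, bias):
    """Probability of the positive class for one sample (hypothetical helper)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

print(sigmoid(0.0))                                  # the midpoint of the curve
print(predict_proba([1.0, 2.0], [0.5, -0.25], 0.0))  # linear score of 0
```

A probability above a chosen threshold (0.5 is a common default) is then read as the positive class.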

3️⃣ 𝗛𝗢𝗪 𝗧𝗢 𝗗𝗘𝗙𝗜𝗡𝗘 𝗧𝗛𝗘 𝗕𝗘𝗦𝗧 𝗙𝗜𝗧?
For every parametric ML algorithm, we need a LOSS FUNCTION.

It is our map to find our optimal solution or global minimum.

(hoping there is one! 😉)
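
For Logistic Regression the usual loss is log loss (binary cross-entropy), which is convex in the parameters. The sketch below fits a one-feature model with plain gradient descent; the toy data, the learning rate, and the `fit_1d` name are assumptions for illustration, not a tuned setup.

```python
import math

def sigmoid(z):
    # Fine for the small values of z used in this toy example.
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; eps keeps log() away from 0."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def fit_1d(xs, ys, lr=0.1, epochs=2000):
    """Gradient descent on log loss for the model p = sigmoid(w * x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log loss: average of (prediction - label) * input.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up data: larger x tends to mean class 1, with one noisy point.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

initial = log_loss(ys, [sigmoid(0.0) for _ in xs])
w, b = fit_1d(xs, ys)
final = log_loss(ys, [sigmoid(w * x + b) for x in xs])
print(initial, final, w, b)
```

The loss starts at log(2), the value for a model that always predicts 0.5, and drops as the weights are updated.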

✚ 𝗕𝗢𝗡𝗨𝗦 - FROM LINEAR TO LOGISTIC REGRESSION
The sigmoid function falls out of the Linear Regression equation: model the log-odds of the positive class as a linear function of the inputs, then solve for the probability.
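
One common way to write out that derivation is shown below. The symbols are assumptions chosen for illustration: w for the weights, b for the bias, x for the inputs, and p for the probability of the positive class.

```latex
\begin{align*}
z &= w^\top x + b && \text{(linear regression equation)} \\
\log\frac{p}{1-p} &= z && \text{(assume the log-odds are linear in } x\text{)} \\
\frac{p}{1-p} &= e^{z} \\
p &= \frac{e^{z}}{1+e^{z}} = \frac{1}{1+e^{-z}} = \sigma(z)
\end{align*}
```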
11👍4👏1😁1
Complete NumPy Cheatsheet
👍127💋4
Need more cheatsheets like this?
Anonymous Poll
97%
Yes
3%
No
👍12
Here are a few project ideas that could help you stand out:

Quantitative Analysis of Financial Data: Create a project where you analyze historical financial data using statistical methods and time series analysis to identify patterns, correlations, and trends in the data.

Development of Trading Strategies: Design and backtest quantitative trading strategies using historical market data. Showcase your ability to develop, test, and optimize algorithmic trading models.

Risk Management Simulation: Build a simulation model to assess and manage financial risk. This could involve implementing Value at Risk (VaR) models or stress testing methodologies.

Machine Learning for Finance: Explore the application of machine learning algorithms to financial markets. Develop a project that uses machine learning for stock price prediction, sentiment analysis of news articles, or credit risk assessment.

Financial Modeling and Valuation: Create detailed financial models for companies or investment opportunities. This could include building discounted cash flow (DCF) models, comparable company analysis, and merger and acquisition (M&A) valuation.

Portfolio Optimization: Develop a project that focuses on portfolio optimization techniques, such as modern portfolio theory, mean-variance optimization, or factor modeling.
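
To give a feel for the Risk Management Simulation idea above, here is a minimal sketch of historical-simulation VaR in plain Python. The `historical_var` function, the nearest-rank quantile rule, and the sample returns are all assumptions made up for illustration; a real project would use actual market data and would likely compare several VaR methods.

```python
import math

def historical_var(returns, confidence=0.95):
    """Historical-simulation VaR: the loss at the given confidence quantile.

    Returns a positive number for a loss, e.g. 0.035 means a 3.5% loss.
    """
    if not returns:
        raise ValueError("returns must not be empty")
    losses = sorted(-r for r in returns)  # losses in ascending order
    # Nearest-rank quantile; round() guards against floating-point noise
    # in confidence * n before taking the ceiling.
    rank = math.ceil(round(confidence * len(losses), 9))
    return losses[rank - 1]

# Made-up daily returns, purely for illustration.
daily_returns = [0.01, -0.02, 0.005, -0.035, 0.012,
                 -0.004, 0.02, -0.015, 0.003, -0.05]

print(historical_var(daily_returns, confidence=0.9))
```

With ten observations and 90% confidence, the function returns the ninth-smallest loss, so only the single worst day exceeds it.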

By working on these projects, you can demonstrate your skills in quantitative analysis, financial modeling, and programming, which are highly valued in the field of quantitative finance.

Additionally, consider sharing your projects on platforms like GitHub or creating a personal website to showcase your work to potential employers.
👍168🥰1
Hey guys,
What's up, what are you all working on or learning these days?
Let me know in the comments 😄👇
👍16🔥10
Hey guys,
What are you all planning to do this weekend?
My plan: Brush up Machine Learning and Statistics concepts 😄
👍18👏2