Data Science Projects
Perfect channel for Data Scientists

Learn Python, AI, R, Machine Learning, Data Science and many more

Admin: @love_data
Here are some project ideas for a data science and machine learning project focused on generative AI:

1. Natural Language Generation (NLG) Model: Build a model that generates human-like text based on input data. This could be used for creating product descriptions, news articles, or personalized recommendations.

2. Code Generation Model: Develop a model that generates code snippets based on a given task or problem statement. This could help automate software development tasks or assist programmers in writing code more efficiently.

3. Image Captioning Model: Create a model that generates captions for images, describing the content of the image in natural language. This could be useful for visually impaired individuals or for enhancing image search capabilities.

4. Music Generation Model: Build a model that generates music compositions based on input data, such as existing songs or musical patterns. This could be used for creating background music for videos or games.

5. Video Synthesis Model: Develop a model that generates realistic video sequences based on input data, such as a series of images or a textual description. This could be used for generating synthetic training data for computer vision models.

6. Chatbot Generation Model: Create a model that generates conversational agents or chatbots based on input data, such as dialogue datasets or user interactions. This could be used for customer service automation or virtual assistants.

7. Art Generation Model: Build a model that generates artistic images or paintings based on input data, such as art styles, color palettes, or themes. This could be used for creating unique digital artwork or personalized designs.

8. Story Generation Model: Develop a model that generates fictional stories or narratives based on input data, such as plot outlines, character descriptions, or genre preferences. This could be used for creative writing prompts or interactive storytelling applications.

9. Recipe Generation Model: Create a model that generates new recipes based on input data, such as ingredient lists, dietary restrictions, or cuisine preferences. This could be used for meal planning or culinary inspiration.

10. Financial Report Generation Model: Build a model that generates financial reports or summaries based on input data, such as company financial statements, market trends, or investment portfolios. This could be used for automated financial analysis or decision-making support.

Do any of these projects sound interesting to you?
ETL vs REVERSE ETL vs ELT
Here is a list of a few projects (found on Kaggle). They cover the basics of Python, Advanced Statistics, Supervised Learning (Regression and Classification problems) & Data Science

Please also check the discussions and notebook submissions for different approaches and solutions after you have tried them yourself.

1. Basic python and statistics

Pima Indians :- https://www.kaggle.com/uciml/pima-indians-diabetes-database
Cardio Good Fitness :- https://www.kaggle.com/saurav9786/cardiogoodfitness
Automobile :- https://www.kaggle.com/toramky/automobile-dataset

2. Advanced Statistics

Game of Thrones:-https://www.kaggle.com/mylesoneill/game-of-thrones
World University Ranking:-https://www.kaggle.com/mylesoneill/world-university-rankings
IMDB Movie Dataset:- https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset

3. Supervised Learning

a) Regression Problems

How much did it rain :- https://www.kaggle.com/c/how-much-did-it-rain-ii/overview
Inventory Demand:- https://www.kaggle.com/c/grupo-bimbo-inventory-demand
Property Inspection prediction:- https://www.kaggle.com/c/liberty-mutual-group-property-inspection-prediction
Restaurant Revenue prediction:- https://www.kaggle.com/c/restaurant-revenue-prediction/data
TMDB Box office Prediction:- https://www.kaggle.com/c/tmdb-box-office-prediction/overview

b) Classification problems

Employee Access challenge :- https://www.kaggle.com/c/amazon-employee-access-challenge/overview
Titanic :- https://www.kaggle.com/c/titanic
San Francisco crime:- https://www.kaggle.com/c/sf-crime
Customer satisfaction:- https://www.kaggle.com/c/santander-customer-satisfaction
Trip type classification:- https://www.kaggle.com/c/walmart-recruiting-trip-type-classification
Categorize cuisine:- https://www.kaggle.com/c/whats-cooking

4. Some helpful Data science projects for beginners

https://www.kaggle.com/c/house-prices-advanced-regression-techniques

https://www.kaggle.com/c/digit-recognizer

https://www.kaggle.com/c/titanic

5. Intermediate Level Data science Projects

Black Friday Data : https://www.kaggle.com/sdolezel/black-friday

Human Activity Recognition Data : https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones

Trip History Data : https://www.kaggle.com/pronto/cycle-share-dataset

Million Song Data : https://www.kaggle.com/c/msdchallenge

Census Income Data : https://www.kaggle.com/c/census-income/data

Movie Lens Data : https://www.kaggle.com/grouplens/movielens-20m-dataset

Twitter Classification Data : https://www.kaggle.com/c/twitter-sentiment-analysis2

Share with credits: https://t.iss.one/sqlproject

ENJOY LEARNING 👍👍
Sharing 20+ Diverse Datasets 📊 for Data Science and Analytics practice!


1. How much did it rain :- https://www.kaggle.com/c/how-much-did-it-rain-ii/overview

2. Inventory Demand:- https://www.kaggle.com/c/grupo-bimbo-inventory-demand

3. Property Inspection prediction:- https://www.kaggle.com/c/liberty-mutual-group-property-inspection-prediction

4. Restaurant Revenue prediction:- https://www.kaggle.com/c/restaurant-revenue-prediction/data

5. Customer satisfaction:- https://www.kaggle.com/c/santander-customer-satisfaction

6. Iris Dataset: https://archive.ics.uci.edu/ml/datasets/iris

7. Titanic Dataset: https://www.kaggle.com/c/titanic

8. Wine Quality Dataset: https://archive.ics.uci.edu/ml/datasets/Wine+Quality

9. Heart Disease Dataset: https://archive.ics.uci.edu/ml/datasets/Heart+Disease

10. Bengaluru House Price Dataset: https://www.kaggle.com/amitabhajoy/bengaluru-house-price-data

11. Breast Cancer Dataset: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

12. Credit Card Fraud Detection: https://www.kaggle.com/mlg-ulb/creditcardfraud

13. Netflix Movies and TV Shows: https://www.kaggle.com/shivamb/netflix-shows

14. Trending YouTube Video Statistics: https://www.kaggle.com/datasnaek/youtube-new

15. Walmart Store Sales Forecasting: https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting

16. FIFA 19 Complete Player Dataset: https://www.kaggle.com/karangadiya/fifa19

17. World Happiness Report: https://www.kaggle.com/unsdsn/world-happiness

18. TMDB 5000 Movie Dataset: https://www.kaggle.com/tmdb/tmdb-movie-metadata

19. Students Performance in Exams: https://www.kaggle.com/spscientist/students-performance-in-exams

20. Twitter Sentiment Analysis Dataset: https://www.kaggle.com/kazanova/sentiment140

21. Digit Recognizer: https://www.kaggle.com/c/digit-recognizer


💻🔍 Don't miss out on these valuable resources for advancing your data science journey!
5️⃣ responses that you can use when you don’t know the answer to an interview question!

βœ… Option 1 (Shows that you are motivated)
Thank you for asking this question. However, I am not very well acquainted with this subject. But, I can assure you I will definitely do some research around this.

βœ… Option 2 (Shows that you are a fast-learner)
Right now I won’t be able to provide you with an exact answer. But I can assure you that I am a fast learner and will learn very quickly under your mentorship.

βœ… Option 3 (Best for technical questions)
I can’t think of an exact answer. I would request you to allow me some time and we can come back on this later.

βœ… Option 4 (If you don’t understand the question)
I did not understand the question properly. Could you please simplify and rephrase it? I don’t want to misinterpret it.

βœ… Option 5 (Redirect the conversation to the topic you are confident about)
While I don’t have much experience in X skill, I do have proper knowledge of Y. If the job requires me to learn X skill I will be excited to expand my knowledge.

Join for more: https://t.iss.one/englishlearnerspro

Remember that Interviewers don’t even expect you to answer every question perfectly. However, simply saying β€˜I don’t know’ could leave a bad impression. These responses will help you to tackle tricky interview questions!
Free Datasets to practice data science projects

1. Enron Email Dataset

Data Link: https://www.cs.cmu.edu/~enron/

2. Chatbot Intents Dataset

Data Link: https://github.com/katanaml/katana-assistant/blob/master/mlbackend/intents.json

3. Flickr 30k Dataset

Data Link: https://www.kaggle.com/hsankesara/flickr-image-dataset

4. Parkinson Dataset

Data Link: https://archive.ics.uci.edu/ml/datasets/parkinsons

5. Iris Dataset

Data Link: https://archive.ics.uci.edu/ml/datasets/Iris

6. ImageNet dataset

Data Link: https://www.image-net.org/

7. Mall Customers Dataset

Data Link: https://www.kaggle.com/shwetabh123/mall-customers

8. Google Trends Data Portal

Data Link: https://trends.google.com/trends/

9. The Boston Housing Dataset

Data Link: https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html

10. Uber Pickups Dataset

Data Link: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city

11. Recommender Systems Dataset

Data Link: https://cseweb.ucsd.edu/~jmcauley/datasets.html

Source Code: https://bit.ly/37iBDEp

12. UCI Spambase Dataset

Data Link: https://archive.ics.uci.edu/ml/datasets/Spambase

13. GTSRB (German traffic sign recognition benchmark) Dataset

Data Link: https://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset

Source Code: https://bit.ly/39taSyH

14. Cityscapes Dataset

Data Link: https://www.cityscapes-dataset.com/

15. Kinetics Dataset

Data Link: https://deepmind.com/research/open-source/kinetics

16. IMDB-Wiki dataset

Data Link: https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/


17. Color Detection Dataset

Data Link: https://github.com/codebrainz/color-names/blob/master/output/colors.csv


18. Urban Sound 8K dataset

Data Link: https://urbansounddataset.weebly.com/urbansound8k.html

19. Librispeech Dataset

Data Link: https://www.openslr.org/12

20. Breast Histopathology Images Dataset

Data Link: https://www.kaggle.com/paultimothymooney/breast-histopathology-images

21. Youtube 8M Dataset

Data Link: https://research.google.com/youtube8m/

Join for more -> https://t.iss.one/addlist/ID95piZJZa0wYzk5

ENJOY LEARNING 👍👍
Build a linear regression model in just 5 minutes 👇👇
https://www.instagram.com/reel/C6BTlcqA6fl/?igsh=MTM5ZG02anB3aHZwNQ==
Top Platforms for Building Data Science Portfolio

Build an irresistible portfolio that hooks recruiters with these free platforms.

Landing a job as a data scientist begins with building a portfolio that lists all your projects. To help you get started, here is a list of top data science platforms. Remember: the stronger your portfolio, the better your chances of landing your dream job.

1. GitHub
2. Kaggle
3. LinkedIn
4. Medium
5. MachineHack
6. DagsHub
7. HuggingFace
🚨30 FREE Dataset Sources for Data Science ProjectsπŸ”₯

Data Simplifier: https://datasimplifier.com/best-data-analyst-projects-for-freshers/

US Government Dataset: https://www.data.gov/

Open Government Data (OGD) Platform India: https://data.gov.in/

The World Bank Open Data: https://data.worldbank.org/

Data World: https://data.world/

BFI - Industry Data and Insights: https://www.bfi.org.uk/data-statistics

The Humanitarian Data Exchange (HDX): https://data.humdata.org/

Data at World Health Organization (WHO): https://www.who.int/data

FBI's Crime Data Explorer: https://crime-data-explorer.fr.cloud.gov/

AWS Open Data Registry: https://registry.opendata.aws/

FiveThirtyEight: https://data.fivethirtyeight.com/

IMDb Datasets: https://www.imdb.com/interfaces/

Kaggle: https://www.kaggle.com/datasets

UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php

Google Dataset Search: https://datasetsearch.research.google.com/

Nasdaq Data Link: https://data.nasdaq.com/

Recommender Systems and Personalization Datasets: https://cseweb.ucsd.edu/~jmcauley/datasets.html

Reddit - Datasets: https://www.reddit.com/r/datasets/

Open Data Network by Socrata: https://www.opendatanetwork.com/

Climate Data Online by NOAA: https://www.ncdc.noaa.gov/cdo-web/

Azure Open Datasets: https://azure.microsoft.com/en-us/services/open-datasets/

IEEE Data Port: https://ieee-dataport.org/

Wikipedia: Database: https://dumps.wikimedia.org/

BuzzFeed News: https://github.com/BuzzFeedNews/everything

Academic Torrents: https://academictorrents.com/

Yelp Open Dataset: https://www.yelp.com/dataset

The NLP Index by Quantum Stat: https://index.quantumstat.com/

Computer Vision Online: https://www.computervisiononline.com/dataset

Visual Data Discovery: https://www.visualdata.io/

Roboflow Public Datasets: https://public.roboflow.com/

Computer Vision Group, TUM: https://vision.in.tum.de/data/datasets
If you're into deep learning, then you know that students usually take one of two paths:

- Computer vision
- Natural language processing (NLP)

If you're into NLP, here are 5 fundamental concepts you should know:

Before we start, What is NLP?

Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on the interaction between computers and humans through language.

It enables machines to understand, interpret, and respond to human language in a way that is both meaningful and useful.

Data scientists need NLP to analyze, process, and generate insights from large volumes of textual data, aiding in tasks ranging from sentiment analysis to automated summarization.

Tokenization

Tokenization involves breaking down text into smaller units, such as words or phrases. This is the first step in preprocessing textual data for further analysis or NLP applications.
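
As a rough illustration of word-level tokenization, here is a minimal sketch using only Python's standard library. The `tokenize` helper and its regular expression are assumptions made for this example; libraries such as NLTK or spaCy handle many more edge cases.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens and standalone punctuation marks."""
    # \w+(?:'\w+)? keeps contractions such as "don't" in one piece;
    # [^\w\s] picks up each punctuation character as its own token.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text.lower())

print(tokenize("Data scientists don't fear text!"))
# ['data', 'scientists', "don't", 'fear', 'text', '!']
```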

Part-of-Speech Tagging

This process involves identifying the part of speech for each word in a sentence (e.g., noun, verb, adjective). It is crucial for various NLP tasks that require understanding the grammatical structure of text.

Stemming and Lemmatization

These techniques reduce words to their base or root form. Stemming strips affixes (mostly suffixes) with simple rules, so the result is not always a real word, while lemmatization uses vocabulary and morphological analysis to return the dictionary form, leading to more accurate results.
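
To see the difference in practice, here is a toy sketch. The suffix list and the tiny lemma dictionary are hypothetical and exist only for this example; real projects would typically rely on an established stemmer or lemmatizer.

```python
# Toy suffix list, checked in this order (an assumption for illustration).
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical mini-dictionary standing in for a real lemmatizer's vocabulary.
LEMMAS = {"studies": "study", "running": "run", "better": "good", "mice": "mouse"}

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEMMAS.get(word, word)

for w in ["studies", "running", "better", "mice"]:
    print(w, "->", toy_stem(w), "|", toy_lemmatize(w))
```

The stemmer turns "studies" into "studi" and "running" into "runn", while the dictionary lookup returns "study" and "run".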

Named Entity Recognition (NER)

NER identifies and classifies named entities in text into predefined categories such as the names of persons, organizations, locations, etc. It's essential for tasks like data extraction from documents and content classification.

Sentiment Analysis

This technique determines the emotional tone behind a body of text. It's widely used in business and social media monitoring to gauge public opinion and customer sentiment.
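
One very simple way to approximate this is a lexicon-based scorer. The sketch below is only an illustration: the word lists are hypothetical and far too small for real use, and production systems usually rely on larger lexicons or trained models.

```python
# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "sad"}

def sentiment_label(text):
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    # Lowercase, split on whitespace, and trim common punctuation from each word.
    words = [w.strip(".,!?;:") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment_label("I love this great product!"))
print(sentiment_label("Terrible support, bad app."))
```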

That's a wrap! Which natural language processing / computer vision concepts do you know?

Like for more 😄

Share our channel with friends: https://t.iss.one/pythonspecialist
Struggling with Machine Learning algorithms? 🤖

Then you better stay with me! 🤓

We are going back to the basics to simplify ML algorithms.
... today's turn is Logistic Regression! 👇🏻

1️⃣ π—Ÿπ—’π—šπ—œπ—¦π—§π—œπ—– π—₯π—˜π—šπ—₯π—˜π—¦π—¦π—œπ—’π—‘
It is a binary classification model used to classify our input data into two main categories.

It can be extended to multiple classifications... but today we'll focus on a binary one.

Also known as Simple Logistic Regression.

2️⃣ 𝗛𝗒π—ͺ 𝗧𝗒 π—–π—’π— π—£π—¨π—§π—˜ π—œπ—§?
The Sigmoid Function is our mathematical wand, turning numbers into neat probabilities between 0 and 1.

It's what makes Logistic Regression tick, giving us a clear 'probabilistic' picture.
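
Here is a minimal sketch of the sigmoid in plain Python, assuming a linear score z = w·x + b. The function names, weights, and inputs are made up for illustration.

```python
import math

def sigmoid(z):
    """Squash a real-valued score z into a probability-like value between 0 and 1."""
    # Two branches so math.exp never receives a large positive argument (avoids overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, w, b):
    """Probability of the positive class for features x, weights w, and bias b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

print(sigmoid(0))                                    # 0.5
print(predict_proba([1.0, 2.0], [0.5, -0.25], 0.0))  # score is 0, so 0.5
```

A score of 0 sits exactly on the decision boundary; positive scores push the probability above 0.5 and negative scores push it below.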

3️⃣ 𝗛𝗒π—ͺ 𝗧𝗒 π——π—˜π—™π—œπ—‘π—˜ π—§π—›π—˜ π—•π—˜π—¦π—§ π—™π—œπ—§?
For every parametric ML algorithm, we need a LOSS FUNCTION.

It is our map to find our optimal solution or global minimum.

(hoping there is one! 😉)
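
The post does not name a specific loss. The usual choice for logistic regression is log loss (binary cross-entropy), so the sketch below assumes that one. The `eps` clipping value is also an assumption, added to avoid taking log(0).

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip p away from exactly 0 and 1 so math.log stays finite.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0]
print(log_loss(labels, [0.9, 0.1]))  # confident and correct: low loss
print(log_loss(labels, [0.6, 0.4]))  # less confident: higher loss
```

Training then means adjusting the weights to make this number as small as possible, typically with gradient descent.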

✚ 𝗕𝗒𝗑𝗨𝗦 - FROM LINEAR TO LOGISTIC REGRESSION
To obtain the sigmoid function, we can derive it from the Linear Regression equation.
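
One common way to write this out, assuming p is the probability of the positive class and w, b are the weights and bias of the linear equation (notation chosen here for illustration):

```latex
\begin{aligned}
\log\frac{p}{1-p} &= w^\top x + b \\
\frac{p}{1-p} &= e^{\,w^\top x + b} \\
p &= \frac{e^{\,w^\top x + b}}{1 + e^{\,w^\top x + b}} = \frac{1}{1 + e^{-(w^\top x + b)}}
\end{aligned}
```

The last line is the sigmoid applied to the linear score.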
Complete Numpy Cheatsheet