Data Science Projects

You're not a Data Scientist until you have...

☑ Built a dashboard that no one uses
☑ Googled the same SQL window function for the 100th time
☑ Convinced yourself that the weird outlier is definitely not a bug
☑ Built a complex analysis in Python when you could have done it in SQL
☑ Broken a data pipeline because you "forgot" to test before pushing to prod
☑ Told your mom for the 8374th time that you're not a real scientist, and that she should stop telling her friends you are

<insert more here>

What else did I miss?


Data Science Projects

Platforms to learn Data Science
👇👇
https://www.linkedin.com/posts/sql-analysts_datascience-machinelearning-ai-activity-7208370270485594112-8ODh


Data Science Projects

-- iv. Statistics
|   |
|   |-- b. Programming
|   |   |-- i. Python
|   |   |   |-- 1. Syntax and Basic Concepts
|   |   |   |-- 2. Data Structures
|   |   |   |-- 3. Control Structures
|   |   |   |-- 4. Functions
|   |   |

-- ii. R (optional, based on preference)
|   |
|   |-- c. Data Manipulation
|   |   |-- i. NumPy (Python)
|   |   |-- ii. Pandas (Python)
|   |

-- iii. dplyr (R)
| |
|

-- d. Data Visualization
|       |-- i. Matplotlib (Python)
|       |-- ii. Seaborn (Python)
|

-- e. Data Scaling and Normalization
|
|-- 3. Machine Learning
|   |-- a. Supervised Learning
|   |   |-- i. Regression
|   |   |   |-- 1. Linear Regression
|   |   |

-- ii. Classification
|   |       |-- 1. Logistic Regression
|   |       |-- 2. k-Nearest Neighbors
|   |       |-- 3. Support Vector Machines
|   |       |-- 4. Decision Trees
|   |

-- 3. Hierarchical Clustering
|   |   |
|   |

-- ii. Dimensionality Reduction
|   |       |-- 1. Principal Component Analysis (PCA)
|   |       |-- 2. t-Distributed Stochastic Neighbor Embedding (t-SNE)
|   |

-- 3. Linear Discriminant Analysis (LDA)
|   |
|   |-- c. Reinforcement Learning
|   |-- d. Model Evaluation and Validation
|   |   |-- i. Cross-validation
|   |   |-- ii. Hyperparameter Tuning
|   |

-- iii. Model Selection
| |
|

-- e. ML Libraries and Frameworks
|       |-- i. Scikit-learn (Python)
|       |-- ii. TensorFlow (Python)
|       |-- iii. Keras (Python)
|

-- ii. Multi-Layer Perceptron
|   |
|   |-- b. Convolutional Neural Networks (CNNs)
|   |   |-- i. Image Classification
|   |   |-- ii. Object Detection
|   |

-- iii. Sentiment Analysis
|   |
|   |-- d. Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU)
|   |   |-- i. Time Series Forecasting
|   |

-- ii. Language Modeling
| |
|

-- e. Generative Adversarial Networks (GANs)
|       |-- i. Image Synthesis
|       |-- ii. Style Transfer
|

-- ii. MapReduce
|   |
|   |-- b. Spark
|   |   |-- i. RDDs
|   |   |-- ii. DataFrames
|   |

-- iii. MLlib
| |
|

-- c. NoSQL Databases
|       |-- i. MongoDB
|       |-- ii. Cassandra
|       |-- iii. HBase
|

-- iv. Shiny (R)
|   |
|   |-- b. Storytelling with Data
|

-- e. Teamwork
|

-- 8. Staying Updated and Continuous Learning
    |-- a. Online Courses
    |-- b. Books and Research Papers
    |-- c. Blogs and Podcasts
    |-- d. Conferences and Workshops
    `-- e. Networking and Community Engagement

Best Data Science & Machine Learning Resources: https://topmate.io/coding/914624

All the best 👍👍


Data Science Projects

Forwarded from Health Fitness & Diet Tips - Gym Motivation 💪

The 80/20 Principle:

- Health: 80% eating, 20% exercising
- Wealth: 80% habits, 20% math
- Talking: 80% listening, 20% speaking
- Learning: 80% understanding, 20% reading
- Achieving: 80% doing, 20% dreaming
- Happiness: 80% purpose, 20% fun
- Relationships: 80% giving, 20% receiving
- Improving: 80% persistence, 20% ideas

Prioritize the 80% and the rest will fall into place.


Data Science Projects

What roles make it easier to get into Data Science?

Most Data Scientists transition in from other roles.

The most common ones are Data Analyst, Business Intelligence Engineer, and Data Engineer.

For a fresher with only a bachelor's degree, I would advise the Data Analyst role. Depending on the team and the work, you may in essence be able to work as a Data Scientist.


Data Science Projects

Hi guys,

This post is for everyone who is confused about which path to take and worried that AI will take over what they're learning. One thing is for sure: life is random. AI is trending now, and in the future it might be something else. So be prepared for whatever comes up, and never stop learning. Recently, I learned a lesson from a friend who has a very good habit of investing money in stocks and crypto. He earned around 90 lakh+ from a single stock, even though he incurred some losses before making that profit.

Investing is a very good skill to have. I never did it before, but I am now planning to learn it.

Investing allows you to put your money into businesses you believe in, potentially growing your wealth over time. It can also provide financial security and help you achieve your long-term financial goals, such as buying a home, funding education, or planning for retirement.

I will share my learnings with you guys here 👇👇
https://t.iss.one/stockmarketinginsights

Learning additional skills helps you deal with uncertainty in life. So don't think too much before learning a new skill. Even if it isn't required right now, it might turn out to be very useful in the future. Keep learning and never give up.

What are your thoughts, guys? Let me know in the comments 👇👇


Data Science Projects

One of the very important and underrated skills while learning data science, machine learning, or any other new skill is patience.

Everything takes time, but patience helps you stay calm and focused. Learn from your mistakes, keep practicing, and steadily improve.

These early struggles will slowly turn into success 😄💪


Data Science Projects

What is your preferred programming language for data manipulation?

1. Python
2. R
3. Julia
4. MATLAB
5. SAS

Feel free to mention any other language you prefer in the comments! 👇👇


Data Science Projects

How do we evaluate classification models?

Depending on the classification problem, we can use the following evaluation metrics:

Accuracy
Precision
Recall
F1 Score
Logistic loss (also known as Cross-entropy loss)
Jaccard similarity coefficient score
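As a rough illustration, three of the metrics above (accuracy, logistic loss, and the Jaccard score) can be computed by hand for a binary problem. This is a minimal sketch using only the standard library; the function names, the 0.5 decision threshold, and the toy labels are assumptions made for the example, not part of any particular library.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; y_prob holds the predicted P(label == 1)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) never occurs
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)

def jaccard(y_true, y_pred, positive=1):
    """TP / (TP + FP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = tp + fp + fn
    return tp / denom if denom else 0.0  # assumption: report 0.0 when undefined

# Toy data (made up for illustration)
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))
print("log loss:", log_loss(y_true, y_prob))
print("jaccard:", jaccard(y_true, y_pred))
```

Note that logistic loss is computed from predicted probabilities, while accuracy and Jaccard need hard labels, so the choice of threshold affects the latter two only.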


Data Science Projects

Which machine learning framework do you find most effective?

1. TensorFlow
2. PyTorch
3. Scikit-learn
4. Keras
5. XGBoost

If you have a different favorite, share it in the comments below! 👇👇


Data Science Projects

Where to get data for your next machine learning project?

An overview of 5 amazing resources to accelerate your next project with data!

📌 Google Datasets
Searching for datasets on Google Dataset Search is as easy as searching for anything on Google Search: just enter the topic you need a dataset for.

📌 Kaggle Datasets
Explore, analyze, and share quality data.

📌 Open Data on AWS
This registry exists to help people discover and share datasets that are available via AWS resources.

📌 Awesome Public Datasets
A topic-centric list of high-quality open datasets.

📌 Azure public data sets
Public data sets for testing and prototyping.


Data Science Projects

Can you write a program to print "Hello World" in Python?


Data Science Projects

Can you write a program to print "Hello World" in Python?

Without using print() 😁


Data Science Projects

Many of you already guessed it correctly. Brilliant people ❤️

Here is one correct solution:

import sys
# Write directly to the standard output stream instead of calling print()
sys.stdout.write("Hello World\n")


Data Science Projects

What is your preferred method for handling missing data in datasets?

1. Imputation techniques (mean, median, mode)
2. Deleting rows/columns with missing data
3. Using predictive models for imputation
4. Handling missing data as a separate category
5. Other (please specify in comments) 👇👇
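For anyone who wants to see options 1, 2, and 4 side by side, here is a minimal standard-library sketch. Missing values are represented as None, and the helper names (impute, drop_missing, missing_as_category) and the sample columns are hypothetical, chosen only for this example.

```python
import statistics

def impute(values, strategy="mean"):
    """Option 1: replace None with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

def drop_missing(rows):
    """Option 2: keep only rows (dicts) that have no None values."""
    return [row for row in rows if all(v is not None for v in row.values())]

def missing_as_category(values, label="Missing"):
    """Option 4: treat a missing value as its own category."""
    return [label if v is None else v for v in values]

# Toy columns (made up for illustration)
ages = [25, None, 31, 40, None]
colors = ["red", "blue", None, "red"]
rows = [{"age": 25, "color": "red"}, {"age": None, "color": "blue"}]

print(impute(ages, "mean"))
print(impute(ages, "median"))
print(impute(colors, "mode"))
print(drop_missing(rows))
print(missing_as_category(colors))
```

Which option fits best depends on how much data is missing and why; deleting rows is simplest but throws information away.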


Data Science Projects

Forwarded from TrueMinds | Personality Development - Words of Wisdom & Life Quotes

Young people,

Go to the gym,
Even if you’re tired.

Start that business,
Even if you’re poor.

Invest in education,
Even if you’re broke.

Approach that boy or girl,
Even if you’re shy.

Do that work,
Even if you’re unmotivated.

You are not weak.

Find a way to get things done.

TrueMinds


Data Science Projects

How to validate your models?

One of the most common approaches is splitting the data into train, validation, and test sets.

Models are trained on the train set, hyperparameters (for example, when to stop training early) are selected on the validation set, and the final measurement is done on the test set.

Another approach is cross-validation: split the dataset into K folds, and each time train the model on K-1 folds and measure performance on the remaining validation fold.

You can also combine these approaches: set aside a test (holdout) set and run cross-validation on the rest of the data. The final quality is measured on the test set.
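The combined approach (holdout test set plus K-fold cross-validation on the rest) can be sketched in plain Python as follows. Everything here is an assumption made for illustration: the helper names, the toy "model" that just predicts the training mean, and mean absolute error as the score. In practice a library such as scikit-learn would normally handle the splitting.

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and hold out a fraction as the test set."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_fraction))
    return items[n_test:], items[:n_test]

def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx); each sample lands in exactly one validation fold."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

def cross_validate(data, k, fit, score):
    """Average the validation score over K folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(data), k):
        model = fit([data[i] for i in train_idx])
        scores.append(score(model, [data[i] for i in val_idx]))
    return sum(scores) / len(scores)

# Toy "model": predict the mean of the training values (illustration only)
def fit_mean(values):
    return sum(values) / len(values)

def mean_abs_error(model, values):
    return sum(abs(v - model) for v in values) / len(values)

data = [float(x) for x in range(20)]
train_val, test = train_test_split(data, test_fraction=0.2, seed=42)

# Model selection would use the cross-validated error on train_val only
cv_error = cross_validate(train_val, k=4, fit=fit_mean, score=mean_abs_error)

# The test set is touched once, at the very end
final_model = fit_mean(train_val)
test_error = mean_abs_error(final_model, test)
print("CV error:", cv_error, "test error:", test_error)
```

The point of the design is that the test set never influences any modeling decision, so the final number stays an honest estimate.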


Data Science Projects

How do you typically validate a machine learning model?

1. Train-test split
2. Cross-validation
3. Holdout validation
4. Bootstrap methods
5. Other (please specify in comments) 👇👇
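Option 4 is probably the least familiar of these, so here is a small sketch of one common bootstrap variant: train on a sample drawn with replacement and evaluate on the "out-of-bag" samples that were never drawn. The helper names, the mean-predicting toy model, and the error measure are assumptions made for this example.

```python
import random

def bootstrap_split(n_samples, rng):
    """Draw n indices with replacement; indices never drawn form the out-of-bag set."""
    train_idx = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(train_idx)
    oob_idx = [i for i in range(n_samples) if i not in chosen]
    return train_idx, oob_idx

def bootstrap_validate(data, n_rounds, fit, score, seed=0):
    """Collect one out-of-bag score per bootstrap round."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        train_idx, oob_idx = bootstrap_split(len(data), rng)
        if not oob_idx:  # rare: every sample was drawn, nothing left to evaluate on
            continue
        model = fit([data[i] for i in train_idx])
        scores.append(score(model, [data[i] for i in oob_idx]))
    return scores

# Toy "model": predict the mean of the training values (illustration only)
def fit_mean(values):
    return sum(values) / len(values)

def mean_abs_error(model, values):
    return sum(abs(v - model) for v in values) / len(values)

data = [float(x) for x in range(20)]
scores = bootstrap_validate(data, n_rounds=5, fit=fit_mean, score=mean_abs_error, seed=1)
print("out-of-bag errors:", scores)
```

The spread of the per-round scores gives a rough sense of how stable the model's performance is.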


Data Science Projects

Is accuracy always a good metric?

Accuracy is not a good performance metric when the dataset is imbalanced. For example, in binary classification with 95% of samples in class A and 5% in class B, always predicting class A gives an accuracy of 95%. With an imbalanced dataset, we need to choose precision, recall, or F1 score, depending on the problem we are trying to solve.
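The 95% / 5% example can be checked in a few lines. This sketch assumes 100 samples and a "model" that always answers A; it also reports recall for class B, meaning the share of actual B samples the model finds.

```python
# 95 samples of class A and 5 of class B (toy data matching the example above)
labels = ["A"] * 95 + ["B"] * 5
predictions = ["A"] * 100  # a constant prediction of class A

accuracy = sum(t == p for t, p in zip(labels, predictions)) / len(labels)

# Recall for B: correctly predicted B samples / all actual B samples
b_found = sum(t == "B" and p == "B" for t, p in zip(labels, predictions))
recall_b = b_found / labels.count("B")

print(f"accuracy = {accuracy:.2f}")            # accuracy = 0.95
print(f"recall for class B = {recall_b:.2f}")  # recall for class B = 0.00
```

The accuracy looks strong even though the model never identifies a single B sample.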

What are precision, recall, and F1-score?

Precision and recall are classification evaluation metrics:
P = TP / (TP + FP) and R = TP / (TP + FN).

Here TP is the number of true positives, FP is false positives, and FN is false negatives.

In both cases a score of 1 is the best: a precision of 1 means there are no false positives, and a recall of 1 means there are no false negatives.

F1 is a combination of both precision and recall in one score (harmonic mean):
F1 = 2 * PR / (P + R).
Max F score is 1 and min is 0, with 1 being the best.
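The formulas above translate directly into code. The following is a minimal sketch for the binary case; the function names and toy labels are made up for illustration, and returning 0.0 when a denominator is zero is an assumed convention rather than part of the definitions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, and false negatives."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
    return tp, fp, fn

def precision_recall_f1(y_true, y_pred, positive=1):
    """P = TP / (TP + FP), R = TP / (TP + FN), F1 = 2 * PR / (P + R)."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    # Assumption: report 0.0 whenever a ratio is undefined (zero denominator)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels: 3 true positives, 1 false positive, 1 false negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

p, r, f1 = precision_recall_f1(y_true, y_pred)
print(p, r, f1)  # 0.75 0.75 0.75
```

Because F1 is a harmonic mean, it drops sharply when either precision or recall is low, which makes it harder to game than either one alone.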

