Data Science & Machine Learning
73.2K subscribers
791 photos
2 videos
68 files
690 links
Join this channel to learn data science, artificial intelligence and machine learning with funny quizzes, interesting projects and amazing resources for free

For collaborations: @love_data
Download Telegram
What is AUC (AU ROC)? When to use it?

AUC stands for Area Under the ROC Curve. ROC is a probability curve and AUC represents degree or measure of separability. It's used when we need to value how much model is capable of distinguishing between classes. The value is between 0 and 1, the higher the better.
πŸ‘1
What is the PR (precision-recall) curve?

A precision-recall curve (or PR Curve) is a plot of the precision (y-axis) and the recall (x-axis) for different probability thresholds. Precision-recall curves (PR curves) are recommended for highly skewed domains where ROC curves may provide an excessively optimistic view of the performance.
What is the area under the PR curve? Is it a useful metric?

The Precision-Recall AUC is just like the ROC AUC, in that it summarizes the curve with a range of threshold values as a single score.

A high area under the curve represents both high recall and high precision, where high precision relates to a low false positive rate, and high recall relates to a low false negative rate.
What do we do with categorical variables?

Categorical variables must be encoded before they can be used as features to train a machine learning model. There are various encoding techniques, including:

One-hot encoding
Label encoding
Ordinal encoding
Target encoding
Which algorithm builds one tree at a time?
Anonymous Quiz
50%
Gradient boosting
50%
Random forest
Data Science & Machine Learning
Which algorithm builds one tree at a time?
What’s the difference between random forest and gradient boosting?

Random Forests builds each tree independently while Gradient Boosting builds one tree at a time.
Random Forests combine results at the end of the process (by averaging or "majority rules") while Gradient Boosting combines results along the way.
❀1
What happens to our linear regression model if we have three columns in our data: x, y, z β€Šβ€”β€Š and z is a sum of x and y?

We would not be able to perform the regression. Because z is linearly dependent on x and y so when performing the regression would be a singular (not invertible) matrix.
Today is the last day to get exclusive 75% discount by using coupon code JULY75
Everything you need to know about TensorFlow 2.0
Keras-APIs, SavedModels, TensorBoard, Keras-Tuner and more.

https://hackernoon.com/everything-you-need-to-know-about-tensorflow-2-0-b0856960c074?
Machine Learning for Everyone in some words

https://vas3k.com/blog/machine_learning/
πŸ‘1
Which regularization techniques do you know?

There are mainly two types of regularization,

L1 Regularization (Lasso regularization) - Adds the sum of absolute values of the coefficients to the cost function.
L2 Regularization (Ridge regularization) - Adds the sum of squares of coefficients to the cost function

Here, Lambda determines the amount of regularization.
How does L2 regularization look like in a linear model?

L2 regularization adds a penalty term to our cost function which is equal to the sum of squares of models coefficients multiplied by a lambda hyperparameter.

This technique makes sure that the coefficients are close to zero and is widely used in cases when we have a lot of features that might correlate with each other.
Which of the following Python Library can be exclusively used to plot graphs?
Anonymous Quiz
9%
Numpy
85%
Matplotlib
3%
Keras
3%
Tensorflow
What are the main parameters in the gradient boosting model?

There are many parameters, but below are a few key defaults.

learning_rate=0.1 (shrinkage).
n_estimators=100 (number of trees).
max_depth=3.
min_samples_split=2.
min_samples_leaf=1.
subsample=1.0.
πŸŽ“ Introduction to Deep Learning (by MIT) πŸŽ“

This is one of the top high-quality courses to learn the foundational knowledge of deep learning.

All lectures have been uploaded. 100% Free!
https://youtube.com/playlist?list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI
πŸ‘4
What are the main parameters of the random forest model?

max_depth: Longest Path between root node and the leaf

min_sample_split: The minimum number of observations needed to split a given node

max_leaf_nodes: Conditions the splitting of the tree and hence, limits the growth of the trees

min_samples_leaf: minimum number of samples in the leaf node

n_estimators: Number of trees

max_sample: Fraction of original dataset given to any individual tree in the given model

max_features: Limits the maximum number of features provided to trees in random forest model
❀1
In which technique, data is unlabeled and the algorithms learn to inherent structure from the input data?
Anonymous Quiz
17%
Supervised Learning
75%
Unsupervised Learning
8%
Semi-supervised Learning
Which of the following is not a type of unsupervised Learning?
Anonymous Quiz
31%
Association problems
55%
Classification problems
14%
Clustering problems
Quiz Explaination

Supervised Learning: All data is labeled and the algorithms learn to predict the output from the
input data

Unsupervised Learning: All data is unlabeled and the algorithms learn to inherent structure from
the input data.

Semi-supervised Learning: Some data is labeled but most of it is unlabeled and a mixture of
supervised and unsupervised techniques can be used to solve problem.

Unsupervised learning problems can be further grouped into clustering and association problems.

Clustering: A clustering problem is where you want to discover the inherent groupings
in the data, such as grouping customers by purchasing behavior.

Association: An association rule learning problem is where you want to discover rules
that describe large portions of your data, such as people that buy A also tend to buy B.
πŸ‘4
What is feature selection? Why do we need it?

Feature Selection is a method used to select the relevant features for the model to train on. We need feature selection to remove the irrelevant features which leads the model to under-perform.