Data Science & Machine Learning – Telegram

Data Science & Machine Learning

@datasciencefun

73.2K subscribers

791 photos

2 videos

68 files

690 links

Join this channel to learn data science, artificial intelligence and machine learning with funny quizzes, interesting projects and amazing resources for free

For collaborations: @love_data

Data Science & Machine Learning

What is AUC (AU ROC)? When to use it?

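AUC stands for Area Under the ROC Curve. The ROC curve plots the true positive rate against the false positive rate across classification thresholds, and AUC measures how well the model separates the classes. Use it when you need to assess how capable a model is of distinguishing between classes, independent of any single threshold. The value lies between 0 and 1, and higher is better; 0.5 corresponds to random guessing.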
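As a rough illustration, ROC AUC can be computed directly from its ranking interpretation: the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The function name `roc_auc` and the toy labels and scores are made up for this sketch; in practice a library routine would normally be used.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two negatives, two positives.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```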

👍1

3.7K views04:29

Data Science & Machine Learning

What is the PR (precision-recall) curve?

A precision-recall curve (or PR Curve) is a plot of the precision (y-axis) and the recall (x-axis) for different probability thresholds. Precision-recall curves (PR curves) are recommended for highly skewed domains where ROC curves may provide an excessively optimistic view of the performance.
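A minimal sketch of how the points on a PR curve arise: every distinct score is tried as a threshold, and precision and recall are computed for the examples scored at or above it. The helper name `precision_recall_points` and the toy data are assumptions for illustration only.

```python
def precision_recall_points(y_true, y_score):
    """Return (threshold, precision, recall) for each distinct score, highest first."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    points = []
    for t in sorted(set(y_score), reverse=True):
        # Labels of everything predicted positive at this threshold.
        predicted_pos = [y for y, s in zip(y_true, y_score) if s >= t]
        tp = sum(predicted_pos)
        precision = tp / len(predicted_pos)
        recall = tp / total_pos
        points.append((t, precision, recall))
    return points


# Toy data (hypothetical).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
points = precision_recall_points(y_true, y_score)
for threshold, precision, recall in points:
    print(threshold, round(precision, 3), recall)
```

Plotting recall on the x-axis against precision on the y-axis for these points gives the curve.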

3.78K views04:31

Data Science & Machine Learning

What is the area under the PR curve? Is it a useful metric?

The Precision-Recall AUC is like the ROC AUC in that it summarizes the curve across a range of threshold values as a single score.

A high area under the curve represents both high recall and high precision, where high precision relates to a low false positive rate, and high recall relates to a low false negative rate.
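One common way to turn a PR curve into a single number is average precision, which sums precision weighted by the increase in recall at each threshold. The post does not say which estimator it has in mind, so treating average precision as "the" PR AUC is an assumption of this sketch, as are the function name and the toy data.

```python
def average_precision(y_true, y_score):
    """Step-wise area under the PR curve: sum of (recall increase) * precision."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ap = 0.0
    prev_recall = 0.0
    for t in sorted(set(y_score), reverse=True):
        kept = [y for y, s in zip(y_true, y_score) if s >= t]
        tp = sum(kept)
        precision = tp / len(kept)
        recall = tp / total_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy data (hypothetical): an imperfect ranking and a perfect one.
ap_imperfect = average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
ap_perfect = average_precision([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
print(round(ap_imperfect, 4), ap_perfect)
```

The perfect ranking scores 1.0, while the imperfect one loses area because one negative is ranked above a positive.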

4.01K views04:32

Data Science & Machine Learning

What do we do with categorical variables?

Categorical variables must be encoded before they can be used as features to train a machine learning model. There are various encoding techniques, including:

One-hot encoding
Label encoding
Ordinal encoding
Target encoding
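The four techniques listed above can be sketched in plain Python on a toy column. The column values, the `size_order` mapping and all variable names are hypothetical; real projects usually rely on a library encoder and compute target encodings on training folds only to avoid leakage.

```python
colors = ["red", "green", "blue", "green"]  # categorical feature (toy data)
target = [1, 0, 1, 1]                       # binary target (toy data)

# One-hot encoding: one 0/1 column per category.
categories = sorted(set(colors))            # ['blue', 'green', 'red']
one_hot = [[1 if value == c else 0 for c in categories] for value in colors]

# Label encoding: an arbitrary integer id per category (no order implied).
label_ids = {c: i for i, c in enumerate(categories)}
label_encoded = [label_ids[value] for value in colors]

# Ordinal encoding: integers that follow a meaningful order chosen by us.
size_order = {"small": 0, "medium": 1, "large": 2}
sizes = ["medium", "small", "large"]
ordinal_encoded = [size_order[s] for s in sizes]

# Target encoding: replace each category with the mean target for that category.
sums, counts = {}, {}
for value, t in zip(colors, target):
    sums[value] = sums.get(value, 0) + t
    counts[value] = counts.get(value, 0) + 1
target_means = {c: sums[c] / counts[c] for c in sums}
target_encoded = [target_means[value] for value in colors]

print(one_hot)
print(label_encoded)
print(ordinal_encoded)
print(target_encoded)
```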

3.51K views05:46

Data Science & Machine Learning

Which algorithm builds one tree at a time?

Anonymous Quiz

Gradient boosting

551 voters3.75K views05:51

Data Science & Machine Learning

Which algorithm builds one tree at a time?

What’s the difference between random forest and gradient boosting?

Random forests build each tree independently, while gradient boosting builds one tree at a time, with each new tree correcting the errors of the trees before it.
Random forests combine results at the end of the process (by averaging or "majority rules"), while gradient boosting combines results along the way.
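The contrast can be sketched with depth-1 regression trees (stumps) on one-dimensional toy data. This is a heavily simplified illustration, not a real random forest or gradient boosting implementation: the "forest" here is plain bagging without feature subsampling, and every function name and hyperparameter value is an assumption.

```python
import random


def mean(values):
    return sum(values) / len(values)


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree (a single threshold) on 1-D data."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # all x values identical: predict the mean everywhere
        m = mean(ys)
        return (xs[0], m, m)
    return best[1:]


def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm


def fit_forest(xs, ys, n_trees, seed=0):
    """Bagging: each stump is fit independently on its own bootstrap sample."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_forest(stumps, x):
    # Results are combined once, at the end, by averaging.
    return mean([predict_stump(s, x) for s in stumps])


def fit_boosting(xs, ys, n_trees, learning_rate=0.5):
    """Boosting: each stump is fit to the residuals left by the previous ones."""
    base = mean(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Results are combined along the way: the running prediction is updated.
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict_boosting(model, x):
    base, stumps, lr = model
    return base + lr * sum(predict_stump(s, x) for s in stumps)


def mse(ys, preds):
    return mean([(y - p) ** 2 for y, p in zip(ys, preds)])


xs = list(range(10))
ys = [x * x for x in xs]  # toy target

forest = fit_forest(xs, ys, n_trees=25)
boosted_1 = fit_boosting(xs, ys, n_trees=1)
boosted_50 = fit_boosting(xs, ys, n_trees=50)

mse_forest = mse(ys, [predict_forest(forest, x) for x in xs])
mse_boost_1 = mse(ys, [predict_boosting(boosted_1, x) for x in xs])
mse_boost_50 = mse(ys, [predict_boosting(boosted_50, x) for x in xs])
print(mse_forest, mse_boost_1, mse_boost_50)
```

Note that the forest loop could run in any order (or in parallel), whereas the boosting loop is inherently sequential because each round needs the residuals of the previous one.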

❤1

3.58K views04:41

Data Science & Machine Learning

What happens to our linear regression model if we have three columns in our data: x, y, z — and z is a sum of x and y?

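If x, y and z are all used as predictors, we would not be able to fit the regression with the ordinary least squares closed-form solution. Because z is linearly dependent on x and y, the matrix X^T X that has to be inverted is singular (not invertible), so the coefficients are not uniquely determined.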
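A small check of the singularity claim, assuming x, y and z are all used as predictors and no intercept column is added. The data values and the helper `det3` are made up for the sketch; integer data keeps the arithmetic exact.

```python
x = [1, 2, 3, 4]
y = [2, 1, 0, 5]
z = [a + b for a, b in zip(x, y)]  # z is exactly x + y

columns = [x, y, z]
# Gram matrix X^T X: entry (i, j) is the dot product of columns i and j.
gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


det_with_z = det3(gram)
# Dropping z leaves a 2x2 Gram matrix for x and y only.
det_without_z = gram[0][0] * gram[1][1] - gram[0][1] * gram[1][0]

print(det_with_z)     # 0, so X^T X cannot be inverted
print(det_without_z)  # 324, so x and y alone are fine
```

Dropping one of the three columns, or adding regularization, removes the problem.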

3.62K views13:32

Data Science & Machine Learning

Today is the last day to get exclusive 75% discount by using coupon code JULY75

3.6K views04:15

Data Science & Machine Learning

Everything you need to know about TensorFlow 2.0
Keras-APIs, SavedModels, TensorBoard, Keras-Tuner and more.

https://hackernoon.com/everything-you-need-to-know-about-tensorflow-2-0-b0856960c074?

3.59K views05:05

Data Science & Machine Learning

Machine Learning for Everyone, in simple words

https://vas3k.com/blog/machine_learning/

👏1

3.67K viewsedited 05:16

Data Science & Machine Learning

Which regularization techniques do you know?

There are two main types of regularization:

L1 Regularization (Lasso regularization) - Adds the sum of the absolute values of the coefficients to the cost function.
L2 Regularization (Ridge regularization) - Adds the sum of the squares of the coefficients to the cost function.

In both cases the penalty is multiplied by a hyperparameter, lambda, which determines the amount of regularization.
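The two penalty terms can be written out directly. The function names, the coefficient values and the base loss value are hypothetical and serve only to show how lambda scales each penalty.

```python
def l1_penalty(coefs, lam):
    """Lasso penalty: lambda times the sum of absolute coefficient values."""
    return lam * sum(abs(w) for w in coefs)


def l2_penalty(coefs, lam):
    """Ridge penalty: lambda times the sum of squared coefficient values."""
    return lam * sum(w * w for w in coefs)


coefs = [3.0, -4.0]  # toy model coefficients
lam = 0.5            # regularization strength
base_loss = 10.0     # unregularized cost, e.g. a mean squared error (made up)

cost_l1 = base_loss + l1_penalty(coefs, lam)
cost_l2 = base_loss + l2_penalty(coefs, lam)
print(cost_l1, cost_l2)  # 13.5 22.5
```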

3.66K viewsedited 19:50

Data Science & Machine Learning

What does L2 regularization look like in a linear model?

L2 regularization adds a penalty term to the cost function equal to the sum of squares of the model's coefficients multiplied by a lambda hyperparameter.

This technique shrinks the coefficients toward zero (without usually making them exactly zero) and is widely used when we have many features that might correlate with each other.
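For a one-feature linear model without an intercept, the L2-penalized least squares problem has a closed form, w = sum(x*y) / (sum(x^2) + lambda), which makes the shrinkage easy to see. This single-feature setup, the function name and the data are simplifying assumptions for the sketch.

```python
def ridge_slope(xs, ys, lam):
    """Minimize sum((y - w*x)^2) + lam * w^2 over the single coefficient w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1, 2, 3]
ys = [2, 4, 6]  # exactly y = 2x

w_unregularized = ridge_slope(xs, ys, lam=0)
w_regularized = ridge_slope(xs, ys, lam=14)
print(w_unregularized, w_regularized)  # 2.0 1.0
```

Increasing lambda grows the denominator, so the coefficient moves toward zero but never reaches it for a finite lambda.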

3.61K views20:01

Data Science & Machine Learning

Which of the following Python libraries is used exclusively to plot graphs?

Anonymous Quiz

751 voters3.91K views06:10

Data Science & Machine Learning

What are the main parameters in the gradient boosting model?

There are many parameters, but below are a few key ones with their default values in scikit-learn's gradient boosting estimators.

learning_rate=0.1 (shrinkage).
n_estimators=100 (number of trees).
max_depth=3.
min_samples_split=2.
min_samples_leaf=1.
subsample=1.0.
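The values listed above can be collected in a dictionary and passed to an estimator. The sketch assumes scikit-learn's `GradientBoostingClassifier` is the intended model, which the post does not state, and it falls back to just the dictionary if scikit-learn is not installed.

```python
# Key gradient boosting parameters with the values listed in the post.
gb_params = {
    "learning_rate": 0.1,    # shrinkage applied to each tree's contribution
    "n_estimators": 100,     # number of trees (boosting rounds)
    "max_depth": 3,          # depth of each individual tree
    "min_samples_split": 2,  # samples needed to split an internal node
    "min_samples_leaf": 1,   # samples required at a leaf
    "subsample": 1.0,        # fraction of rows used to fit each tree
}

try:
    # Assumption: scikit-learn is available in the environment.
    from sklearn.ensemble import GradientBoostingClassifier

    model = GradientBoostingClassifier(**gb_params)
except ImportError:
    model = None  # the dictionary above still documents the settings

print(gb_params)
```

Lowering `learning_rate` usually calls for a larger `n_estimators`, and `subsample` below 1.0 gives stochastic gradient boosting.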

4.2K views06:15

Data Science & Machine Learning via @like

Pandas in 8 Pages.pdf

4.82K views19:25

👍 15 ❤️ 30

Data Science & Machine Learning

🎓 Introduction to Deep Learning (by MIT) 🎓

This is one of the top high-quality courses to learn the foundational knowledge of deep learning.

All lectures have been uploaded. 100% Free!
https://youtube.com/playlist?list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI

👍4

4.36K views06:15

Data Science & Machine Learning

What are the main parameters of the random forest model?

max_depth: The maximum depth of each tree, i.e. the longest path between the root node and a leaf

min_samples_split: The minimum number of observations needed to split a given node

max_leaf_nodes: Caps the number of leaves per tree and hence limits the growth of the trees

min_samples_leaf: The minimum number of samples required in a leaf node

n_estimators: The number of trees

max_samples: The fraction (or number) of samples from the original dataset given to each individual tree

max_features: The maximum number of features considered when looking for the best split
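As a sketch, these parameters map onto scikit-learn's `RandomForestClassifier`, which spells two of them `min_samples_split` and `max_samples`. The specific values below are arbitrary illustrations, not recommendations or defaults, and the use of scikit-learn itself is an assumption since the post names no library.

```python
# Illustrative (not default) settings for the parameters described above.
rf_params = {
    "n_estimators": 200,     # number of trees
    "max_depth": 8,          # longest allowed root-to-leaf path
    "min_samples_split": 4,  # observations needed to split a node
    "min_samples_leaf": 2,   # minimum samples in a leaf
    "max_leaf_nodes": 50,    # cap on leaves per tree
    "max_samples": 0.8,      # fraction of rows drawn for each tree
    "max_features": "sqrt",  # features considered at each split
}

try:
    # Assumption: scikit-learn is available in the environment.
    from sklearn.ensemble import RandomForestClassifier

    forest_model = RandomForestClassifier(**rf_params)
except ImportError:
    forest_model = None  # the dictionary above still documents the settings

print(sorted(rf_params))
```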

❤1

3.72K views10:15

Data Science & Machine Learning

In which technique is the data unlabeled, with the algorithm learning the inherent structure of the input data?

Anonymous Quiz

Supervised Learning

Unsupervised Learning

Semi-supervised Learning

535 voters3.85K views18:22

Data Science & Machine Learning

Which of the following is not a type of unsupervised learning?

Anonymous Quiz

Association problems

Classification problems

Clustering problems

542 voters3.87K views18:28

Data Science & Machine Learning

Quiz Explanation

Supervised Learning: All data is labeled and the algorithms learn to predict the output from the input data.

Unsupervised Learning: All data is unlabeled and the algorithms learn the inherent structure of the input data.

Semi-supervised Learning: Some data is labeled but most of it is unlabeled, and a mixture of supervised and unsupervised techniques can be used to solve the problem.

Unsupervised learning problems can be further grouped into clustering and association problems.

Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.

Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy A also tend to buy B.
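The "people that buy A also tend to buy B" idea is usually quantified with support and confidence. The baskets, item names and function names below are invented for this sketch.

```python
# Toy shopping baskets (hypothetical data).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


support_bread = support({"bread"}, baskets)
conf_bread_butter = confidence({"bread"}, {"butter"}, baskets)
print(support_bread, round(conf_bread_butter, 3))
```

Here bread appears in three of four baskets, and two of those three also contain butter, so the rule "bread implies butter" has a confidence of about 0.667.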

👍4

4.42K viewsedited 04:29

Data Science & Machine Learning

What is feature selection? Why do we need it?

Feature selection is a method for choosing the features that are relevant for the model to train on. We need it to remove irrelevant features, which can cause the model to underperform.
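One simple filter-style approach, shown here only as an example of the idea, keeps features whose absolute Pearson correlation with the target exceeds a threshold. The feature names, the data and the 0.5 threshold are all hypothetical, and many other selection methods exist.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either column has zero variance."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def select_features(features, target, threshold=0.5):
    """Keep feature names whose |correlation| with the target is >= threshold."""
    return [
        name
        for name, column in features.items()
        if abs(pearson(column, target)) >= threshold
    ]


# Toy dataset (hypothetical): only "size" carries signal about the target.
features = {
    "size": [1, 2, 3, 4, 5],
    "noise": [3, 1, 2, 1, 3],
    "constant": [7, 7, 7, 7, 7],
}
target = [2, 4, 6, 8, 10]

selected = select_features(features, target)
print(selected)  # ['size']
```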

4.19K views20:20