What does L2 regularization look like in a linear model?
L2 regularization adds a penalty term to the cost function equal to the sum of the squares of the model's coefficients, multiplied by a lambda hyperparameter. For linear regression this gives the ridge cost: Loss = MSE + lambda * sum(w_j^2).
This technique shrinks the coefficients toward zero and is widely used when there are many features that may correlate with each other.
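As a minimal sketch, scikit-learn's Ridge implements exactly this penalty, with the alpha parameter playing the role of lambda (the toy data is only for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

# Toy regression data with many (possibly correlated) features
X, y = make_regression(n_samples=100, n_features=20, noise=10.0, random_state=0)

# alpha is the lambda hyperparameter: larger alpha means stronger shrinkage
ridge = Ridge(alpha=1.0).fit(X, y)
ols = LinearRegression().fit(X, y)

# The L2 penalty pulls the coefficients toward zero
print("OLS sum of squared coefficients:  ", (ols.coef_ ** 2).sum())
print("Ridge sum of squared coefficients:", (ridge.coef_ ** 2).sum())
```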
Which of the following Python libraries is used exclusively for plotting graphs?
Anonymous Quiz
Numpy: 9%
Matplotlib: 85%
Keras: 3%
Tensorflow: 3%
What are the main parameters in the gradient boosting model?
There are many parameters, but below are a few key defaults (see the sketch after the list).
learning_rate=0.1 (shrinkage).
n_estimators=100 (number of trees).
max_depth=3.
min_samples_split=2.
min_samples_leaf=1.
subsample=1.0.
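These values match scikit-learn's GradientBoostingClassifier defaults; a minimal sketch spelling them out explicitly (the dataset is a stand-in):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# The arguments below simply restate the library defaults listed above
model = GradientBoostingClassifier(
    learning_rate=0.1,    # shrinkage applied to each tree's contribution
    n_estimators=100,     # number of boosting stages (trees)
    max_depth=3,          # depth of each individual tree
    min_samples_split=2,  # samples required to split an internal node
    min_samples_leaf=1,   # samples required at a leaf node
    subsample=1.0,        # fraction of samples per tree (1.0 = no subsampling)
).fit(X, y)

print(model.score(X, y))
```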
🎓 Introduction to Deep Learning (by MIT) 🎓
This is one of the best courses for building foundational knowledge of deep learning.
All lectures have been uploaded. 100% Free!
https://youtube.com/playlist?list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI
What are the main parameters of the random forest model?
max_depth: the longest path allowed between the root node and a leaf
min_samples_split: the minimum number of observations needed to split a given node
max_leaf_nodes: caps the number of leaf nodes and hence limits the growth of the tree
min_samples_leaf: the minimum number of samples required in a leaf node
n_estimators: the number of trees
max_samples: the fraction of the original dataset given to any individual tree
max_features: limits the number of features considered at each split
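A sketch wiring a few of these into scikit-learn's RandomForestClassifier (the specific values are illustrative assumptions, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees
    max_depth=10,         # longest root-to-leaf path allowed
    min_samples_split=4,  # observations needed to split a node
    min_samples_leaf=2,   # minimum samples in a leaf node
    max_features="sqrt",  # features considered at each split
    max_samples=0.8,      # fraction of the dataset drawn for each tree
    random_state=0,
).fit(X, y)

print(forest.score(X, y))
```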
In which technique is the data unlabeled and the algorithms learn the inherent structure from the input data?
Anonymous Quiz
Supervised Learning: 17%
Unsupervised Learning: 75%
Semi-supervised Learning: 8%
Which of the following is not a type of unsupervised learning?
Anonymous Quiz
Association problems: 31%
Classification problems: 55%
Clustering problems: 14%
Quiz Explanation
Supervised Learning: all data is labeled and the algorithms learn to predict the output from the input data.
Unsupervised Learning: all data is unlabeled and the algorithms learn the inherent structure from the input data.
Semi-supervised Learning: some data is labeled but most of it is unlabeled, and a mixture of supervised and unsupervised techniques can be used to solve the problem.
Unsupervised learning problems can be further grouped into clustering and association problems.
Clustering: a clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
Association: an association rule learning problem is where you want to discover rules that describe large portions of your data, such as "people who buy A also tend to buy B".
What is feature selection? Why do we need it?
Feature selection is a method for choosing the relevant features for the model to train on. We need it to remove irrelevant features, which can cause the model to underperform.
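One possible sketch with scikit-learn's SelectKBest; the scoring function and k here are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# 20 features, of which only 5 actually carry signal
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

# Keep the 5 features with the strongest ANOVA F-score against the target
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)  # (300, 20) -> (300, 5)
print("Kept feature indices:", selector.get_support(indices=True))
```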
What are decision trees?
This is a type of supervised learning algorithm that is mostly used for classification problems, although it works for both categorical and continuous dependent variables.
In this algorithm, we split the population into two or more homogeneous sets based on the most significant attributes/independent variables, making the groups as distinct as possible.
A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a value for the target variable.
Splits are chosen using various criteria, such as Gini impurity, information gain, entropy, and chi-square.
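A minimal sketch with scikit-learn's DecisionTreeClassifier using the Gini criterion (the iris dataset is just a convenient example):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# criterion="gini" picks splits by Gini impurity; "entropy" uses information gain
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0).fit(X, y)

# Prints the attribute tests (internal nodes) and target values (leaves)
print(export_text(tree))
```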
What are the benefits of a single decision tree compared to more complex models?
easy to implement
fast training
fast inference
good explainability
🤓 Technical Python concepts tested in data science job interviews include:
- Data types.
- Built-in data structures.
- User-defined data structures.
- Built-in functions.
- Loops and conditionals.
- External libraries (Pandas).
Source Article: https://www.kdnuggets.com/2021/07/top-python-data-science-interview-questions.html
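A quick illustrative snippet touching a few of these (built-in data structures plus a loop and conditional; the values are made up, not from the article):

```python
# Built-in data structures: list, tuple, set, dict
scores = [88, 92, 79]               # list: ordered, mutable
point = (3.0, 4.0)                  # tuple: ordered, immutable
tags = {"python", "pandas"}         # set: unique elements, unordered
user = {"name": "Ada", "age": 36}   # dict: key-value mapping

# Loop and conditional over a built-in structure
for s in scores:
    if s >= 80:
        print(f"{s}: pass")
    else:
        print(f"{s}: fail")
```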
Some interview questions related to data science:
1. What is the difference between structured data and unstructured data?
2. What is multicollinearity, and how do you remove it?
3. Which algorithms do you use to find the most correlated features in a dataset?
4. Define entropy.
5. What is the workflow of principal component analysis?
6. What are the applications of principal component analysis besides dimensionality reduction?
7. What is a convolutional neural network? Explain how it works.
Some useful Machine Learning projects for practice
https://elitedatascience.com/machine-learning-projects-for-beginners
https://hackernoon.com/top-5-machine-learning-projects-for-beginners-47b184e7837f
https://www.springboard.com/blog/machine-learning-projects
https://www.ubuntupit.com/top-20-best-machine-learning-projects-for-beginner-to-professional
What are precision, recall, and F1-score?
Precision and recall are classification evaluation metrics:
P = TP / (TP + FP) and R = TP / (TP + FN),
where TP is true positives, FP is false positives, and FN is false negatives.
In both cases a score of 1 is best: no false positives or false negatives, only true positives.
F1 combines precision and recall in one score (their harmonic mean):
F1 = 2PR / (P + R).
F1 ranges from 0 to 1, with 1 being the best.
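A small sketch computing these from the definitions and cross-checking against scikit-learn (the toy labels are made up):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

# Manual computation straight from the definitions
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)
# Cross-check against the library implementations
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```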
What is unsupervised learning?
Unsupervised learning aims to detect patterns in the data where no labels are given.
Would you prefer gradient boosting trees model or logistic regression when doing text classification with bag of words?
Usually logistic regression is better, because bag of words produces a matrix with a very large number of columns. On such wide, sparse data logistic regression is usually much faster to train than gradient boosting trees.
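A minimal sketch of that setup in scikit-learn (the texts and labels are placeholders):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great movie", "terrible plot", "loved it", "waste of time"]
labels = [1, 0, 1, 0]

# Bag of words yields a wide sparse matrix; a linear model handles it cheaply
clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

print(clf.predict(["loved the movie"]))
```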
What is clustering? When do we need it?
Clustering algorithms group objects such that similar feature points are put into the same groups (clusters) and dissimilar feature points are put into different clusters.
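For instance, a minimal k-means sketch with scikit-learn (the number of clusters is an assumption you would normally tune):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled feature points with three natural groupings
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Similar points share a cluster label; dissimilar points get different ones
print(kmeans.labels_[:10])
print(kmeans.cluster_centers_)
```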
What is bag of words? How we can use it for text classification?
Bag of Words is a representation of text that describes the occurrence of words within a document. The order or structure of the words is not considered. For text classification, we look at the histogram of the words within the text and consider each word count as a feature.
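A small sketch of the representation itself with scikit-learn's CountVectorizer:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat"]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Each column is a word, each cell a count; word order is discarded
print(vectorizer.get_feature_names_out())
print(counts.toarray())
```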
Source codes for data science projects 👇👇
1. Build chatbots:
https://dzone.com/articles/python-chatbot-project-build-your-first-python-pro
2. Credit card fraud detection:
https://www.kaggle.com/renjithmadhavan/credit-card-fraud-detection-using-python
3. Fake news detection
https://data-flair.training/blogs/advanced-python-project-detecting-fake-news/
4. Driver Drowsiness Detection
https://data-flair.training/blogs/python-project-driver-drowsiness-detection-system/
5. Recommender Systems (Movie Recommendation)
https://data-flair.training/blogs/data-science-r-movie-recommendation/
6. Sentiment Analysis
https://data-flair.training/blogs/data-science-r-sentiment-analysis-project/
7. Gender Detection & Age Prediction
https://www.pyimagesearch.com/2020/04/13/opencv-age-detection-with-deep-learning/
𝗘𝗡𝗝𝗢𝗬 𝗟𝗘𝗔𝗥𝗡𝗜𝗡𝗚👍👍