Which of the following may be involved in a data science project?
Anonymous Quiz
3%
Data Cleaning
4%
Data Visualization
2%
Feature selection
3%
Exploratory data analysis
89%
All of the above
Which of the following is not a machine learning algorithm?
Anonymous Quiz
2%
Linear Regression
6%
K-means clustering
87%
Data Cleaning
5%
Logistic Regression
DATA SCIENCE INTERVIEW QUESTIONS
[PART - 13]
1. How to identify a cause vs. a correlation? Give examples.
Ans. Causation and correlation can exist at the same time, but correlation does not imply causation. Causation applies to cases where action A causes outcome B. Correlation is simply a relationship between two variables. For example, ice cream sales and sunglasses sales are correlated: as ice cream sales rise, so do sunglasses sales. Neither causes the other; both are driven by a third factor, hot sunny weather. Causation goes a step further than correlation: to establish it you need a controlled experiment (for example, an A/B test) or another way to rule out such confounding factors.
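A minimal sketch of the ice cream and sunglasses example. The numbers are made up for illustration: both sales series are computed from a hypothetical temperature series, so they correlate strongly even though neither causes the other.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical daily figures: temperature is the common driver (the confounder).
temperature = [18, 21, 24, 27, 30, 33]
ice_cream_sales = [2 * t + 5 for t in temperature]
sunglasses_sales = [3 * t - 10 for t in temperature]

r = pearson(ice_cream_sales, sunglasses_sales)
print(f"correlation between the two sales series: {r:.3f}")
```

A high r here only says the two series move together; it says nothing about one causing the other.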
2. Precision, accuracy and recall?
Ans. Recall is the ratio of the relevant results returned (for example, by a search engine) to the total number of relevant results that could have been returned: TP / (TP + FN). Precision is the proportion of relevant results among all returned results: TP / (TP + FP). Accuracy is the proportion of all predictions, positive and negative, that are correct: (TP + TN) / (TP + TN + FP + FN).
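A small sketch of the three metrics computed from true/false positive and negative counts. The counts are hypothetical and only there to make the arithmetic concrete.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp)                  # of everything returned, how much was relevant
    recall = tp / (tp + fn)                     # of everything relevant, how much was returned
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that were correct
    return precision, recall, accuracy

# Hypothetical counts for 100 predictions.
precision, recall, accuracy = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(precision, recall, accuracy)
```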
3. Choose k in k-means?
Ans. A popular way to choose k for k-means clustering is the elbow method. It plots the cost (the within-cluster sum of squared distances) against k. As k increases, each cluster contains fewer elements and the cost keeps falling; the "elbow", the point where the decrease slows sharply, is taken as a good value of k.
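A rough sketch of the elbow idea using a tiny one-dimensional k-means written from scratch. The data, the deterministic initialisation and the helper name `kmeans_1d` are assumptions for illustration, not a production clustering routine.

```python
def kmeans_1d(points, k, iterations=20):
    """Very small 1-D k-means; returns the within-cluster sum of squares (the cost)."""
    pts = sorted(points)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sum(min((p - c) ** 2 for c in centroids) for p in pts)

# Hypothetical data with three obvious groups (around 1, 5 and 9).
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
costs = {k: kmeans_1d(data, k) for k in range(1, 6)}
for k, cost in costs.items():
    print(k, round(cost, 3))
```

The cost drops steeply up to k = 3 and barely changes afterwards, so the elbow suggests k = 3 for this data.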
4. word2vec methods?
Ans. Word2vec is a natural language processing technique published in 2013. It uses a shallow neural network to learn word associations from a large corpus of text, with two model architectures: continuous bag-of-words (CBOW), which predicts a word from its context, and skip-gram, which predicts the context from a word. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence.
5. Pruning in case of decision trees?
Ans. Pruning is a data compression technique in machine learning and search algorithms that reduces the size of a decision tree by removing sections of the tree that are non-critical or redundant for classifying instances. A smaller tree is also less likely to overfit.
ENJOY LEARNING
Which of the following is used to read a CSV file in Python using pandas?
import pandas as pd
Anonymous Quiz
10%
pd.readcsv(file.csv)
80%
pd.read_csv("file.csv")
6%
pd.read(file)
4%
pd(read_csv.file)
Today's interview Quest & Ans
DATA SCIENCE INTERVIEW QUESTIONS
[PART - 14]
1. Feature selection methods for selecting the right variables for building efficient predictive models?
Ans. Some feature selection techniques are: information gain, chi-square test, correlation coefficient, mean absolute difference (MAD), exhaustive selection, forward selection and regularization.
2. Treat missing values?
Ans. Common methods are:
1. Listwise (case) deletion
2. Pairwise deletion
3. Mean substitution
4. Regression imputation
5. Maximum likelihood
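Two of the methods in the list, listwise deletion and mean substitution, can be sketched in plain Python. The rows and column names are hypothetical, and `None` stands in for a missing value.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 25, "income": 50000},
    {"age": None, "income": 62000},
    {"age": 40, "income": None},
    {"age": 35, "income": 58000},
]

# Listwise deletion: drop every row that has at least one missing value.
complete_rows = [r for r in rows if None not in r.values()]

def mean_substitute(rows, column):
    """Mean substitution: fill gaps with the mean of the observed values in that column."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

filled = mean_substitute(mean_substitute(rows, "age"), "income")
print(len(complete_rows), filled)
```

Listwise deletion keeps the data clean but throws away half of these rows; mean substitution keeps every row at the cost of shrinking the column's variance.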
3. Assumptions used in linear regression? What would happen if they are violated?
Ans. 1. Linear relationship.
2. Multivariate normality.
3. No or little multicollinearity.
4. No auto-correlation.
5. Homoscedasticity.
If the data analyzed by linear regression violate one or more of these assumptions, the results of the analysis may be incorrect or misleading.
4. How is the grid search parameter different from the random search tuning strategy?
Ans. In grid search we provide an explicit set of possible values for each hyperparameter, and every combination is tried. In random search we instead provide a statistical distribution for each hyperparameter, and a fixed number of combinations is sampled from those distributions.
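A minimal sketch contrasting the two strategies. The `score` function is a hypothetical stand-in for training and validating a real model, and the hyperparameter names and ranges are assumptions.

```python
import itertools
import random

def score(learning_rate, depth):
    """Hypothetical validation score; a real search would train a model here."""
    return -(learning_rate - 0.1) ** 2 - 0.01 * (depth - 5) ** 2

# Grid search: an explicit list of values per hyperparameter, every combination tried.
grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [3, 5, 7]}
grid_trials = [dict(zip(grid, combo)) for combo in itertools.product(*grid.values())]
best_grid = max(grid_trials, key=lambda p: score(**p))

# Random search: a distribution per hyperparameter, a fixed number of samples drawn.
rng = random.Random(0)  # seeded so the run is repeatable
random_trials = [
    {"learning_rate": 10 ** rng.uniform(-2, 0),  # log-uniform between 0.01 and 1
     "depth": rng.randint(3, 7)}                 # uniform integer from 3 to 7
    for _ in range(9)
]
best_random = max(random_trials, key=lambda p: score(**p))

print(best_grid, best_random)
```

Both searches spend nine trials here, but random search can land on values the grid never contains.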
5. Is it good to do dimensionality reduction before fitting a Support Vector Model?
Ans. A Support Vector Machine often performs better in the reduced space. Dimensionality reduction before fitting an SVM is beneficial when the number of features is large compared to the number of observations.
6. ROC curve?
Ans. A ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. It plots the true positive rate against the false positive rate.
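A small sketch of how the points of a ROC curve can be computed by sweeping the threshold over the model's scores. The labels and scores are hypothetical, and the helper names are made up for this example.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold, from the highest score to the lowest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is predicted positive.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical true labels and predicted scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
print(points, round(auc(points), 3))
```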
ENJOY LEARNING
DATA SCIENCE INTERVIEW QUESTIONS
[PART -15]
1. Deal with unbalanced binary classification?
Ans. Techniques to handle unbalanced data:
1. Use the right evaluation metrics
2. Use K-fold cross-validation in the right way
3. Ensemble different resampled datasets
4. Resample with different ratios
5. Design your own models
2. Activation function?
Ans. An activation function is a mathematical function that determines the output of a neuron in a neural network. It is a non-linear transformation applied to the input before it is sent to the next layer of neurons or finalized as output.
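A minimal sketch of two common activation functions, ReLU and sigmoid, applied to a single neuron. The inputs, weights and bias are hypothetical values chosen only to show the transformation.

```python
import math

def relu(z):
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, z)

def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical values: the weighted sum is 0.5 * 1.0 + (-1.0) * 2.0 + 0.5 = -1.0
inputs, weights, bias = [1.0, 2.0], [0.5, -1.0], 0.5
relu_out = neuron(inputs, weights, bias, relu)
sigmoid_out = neuron(inputs, weights, bias, sigmoid)
print(relu_out, sigmoid_out)
```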
3. Dimension reduction?
Ans. Dimensionality reduction reduces the number of features under consideration by obtaining a smaller set of principal features.
4. Why is mean square error a bad measure of model performance?
Ans. Mean Squared Error (MSE) gives a relatively high weight to large errors because each error is squared; therefore, MSE tends to put too much emphasis on large deviations.
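A quick sketch of that effect, comparing MSE with mean absolute error (MAE, brought in here only as a point of comparison). The predictions are hypothetical: one set is off by 1 everywhere, the other is perfect except for a single error of 5.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [10, 10, 10, 10, 10]
small_errors = [11, 9, 11, 9, 11]     # every prediction off by 1
one_big_error = [10, 10, 10, 10, 15]  # four perfect predictions, one off by 5

print(mae(y_true, small_errors), mae(y_true, one_big_error))
print(mse(y_true, small_errors), mse(y_true, one_big_error))
```

Both prediction sets have the same MAE, but the MSE of the second set is five times larger because the single large error is squared.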
5. Remove multicollinearity?
Ans. To remove multicollinearity, we can do two things:
1. Create new features by combining the correlated ones.
2. Remove one of the correlated features from our data.
6. Long-tailed distribution?
Ans. A long-tailed distribution is a distribution with many occurrences far from the "head" or central part of the distribution. Most occurrences in this kind of distribution fall at the low values of the x-axis, and the tail tapers off gradually.
7. Outlier? Deal with it?
Ans. An outlier is an observation that deviates significantly from the rest of the observations. Outliers can be caused by measurement or execution errors.
Removing outliers is legitimate only for specific reasons. Outliers can be very informative about the subject area and the data collection process. If the outlier does not change the results but does affect assumptions, you may drop it. Alternatively, rather than trimming the data set, replace outliers with the nearest "good" values instead of truncating them completely.
8. Example where the median is a better measure than the mean?
Ans. If your data contains outliers, you would typically use the median, because the mean would be dominated by the outliers rather than the typical values. For example, in a set of salaries where one person earns far more than everyone else, the median describes the typical salary better. In short, if you are considering the mean, check your data for outliers first; if there are any, prefer the median.
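A tiny illustration with hypothetical salary figures (in thousands), where one extreme value pulls the mean far away from the typical value while the median stays put.

```python
from statistics import mean, median

# Hypothetical salaries in thousands; the last one is an outlier.
salaries = [30, 32, 35, 38, 40, 1000]

salary_mean = mean(salaries)
salary_median = median(salaries)
print(salary_mean, salary_median)
```

The median stays close to what most people in the list earn, while the mean is several times higher than every salary except the outlier.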
ENJOY LEARNING
Which of the following method/s can be used to handle missing values?
Anonymous Quiz
16%
Mean Substitution
6%
Pairwise deletion
11%
Regression imputation
66%
All of the above
Which of the following is not a feature selection technique?
Anonymous Quiz
21%
Information Gain
13%
Forward Selection
23%
Regularisation
44%
K-means clustering
Data Science Interview Questions
[PART-16]
Q. How can outlier values be treated?
A. An outlier is an observation in a dataset that differs significantly from the rest of the data, meaning it is much larger or smaller than the other values.
Some methods of treating outliers are: trimming (removing the outlier), quantile-based flooring and capping, and mean/median imputation.
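A rough sketch of quantile-based flooring and capping. The 10th and 90th percentile cut-offs, the data and the helper name `cap_outliers` are assumptions for illustration; the right percentiles depend on the data set.

```python
from statistics import quantiles

def cap_outliers(values, lower_pct=10, upper_pct=90):
    """Clamp values below the lower percentile up to it, and above the upper percentile down to it."""
    # n=100 gives 99 cut points; cuts[i] is the (i + 1)th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]

# Hypothetical measurements with one obvious outlier (95).
data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]
capped = cap_outliers(data)
print(capped)
```

Unlike trimming, every observation is kept; only the extreme values are pulled in to the chosen percentiles.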
Q. What is root cause analysis?
A. A root cause is a factor that contributed to a nonconformance and should be eliminated permanently through process improvement. It is the most fundamental reason that sets in motion the entire cause-and-effect chain leading to the problem(s). Root cause analysis (RCA) is a term for a variety of approaches, tools and procedures used to identify the root causes of problems. Some RCA approaches are directed toward uncovering actual root causes, others are more general problem-solving procedures, and yet others just support the core root cause analysis activity.
Q. What is bias and variance in Data Science?
A. Bias comes from the simplifying assumptions a model makes so that the target function is easier to estimate. In its most basic form, bias is the difference between the model's average prediction and the true value. Variance refers to how much the estimate of the target function fluctuates when different training data are used. In contrast to bias, variance arises when the model fits the fluctuations, or noise, in the training data.
Q. What is a confusion matrix?
A. A confusion matrix is a way of summarising a classification algorithm's performance. Calculating it helps you understand what your classification model is getting right and where it is going wrong. It gives us four counts: "true positive" for event values that were correctly predicted; "false positive" for no-event values that were mistakenly predicted as events; "true negative" for no-event values that were correctly predicted; and "false negative" for event values that were mistakenly predicted as no-events.
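A minimal sketch of counting the four cells of a binary confusion matrix. The labels are hypothetical, with 1 meaning "event" and 0 meaning "no event".

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts

# Hypothetical actual labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
matrix = confusion_matrix(y_true, y_pred)
print(dict(matrix))
```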
ENJOY LEARNING
Which of the following is not a Python library?
Anonymous Quiz
3%
Pandas
2%
Numpy
3%
Matplotlib
10%
Scikit-learn
82%
Array
Which of the following is not a machine learning algorithm?
Anonymous Quiz
5%
Linear Regression
9%
Random Forest
77%
Standard scaler
6%
Decision Tree
4%
Logistic Regression
Which of the following is not a supervised algorithm?
Anonymous Quiz
11%
Linear Regression
9%
Logistic Regression
64%
Clustering
16%
Decision Tree
Which of the following tools can be used for Data Visualization?
Anonymous Quiz
9%
Tableau
11%
Matplotlib
7%
Power BI
74%
All of the above
Data Science & Machine Learning
Do you want daily quiz to enhance your knowledge?
That's an amazing response from you guys
Which of the following cannot give 10 as an answer?
Anonymous Quiz
8%
5*2
7%
2+5*2-2
69%
2+5*(2-2)
16%
3*2+9//2
Data Science & Machine Learning
Which of the following cannot give 10 as an answer?
Well done guys!!
Explanation for those who marked the wrong answer:
Read the question again: it asks which expression cannot give 10.
(9//2) evaluates to 4, not 4.5, so 3*2+9//2 is 10. The answer is 2+5*(2-2), which is 2.
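The four quiz options can be checked directly in Python. This is just the quiz arithmetic written out; `//` is floor division and `/` is true division.

```python
# Each quiz option evaluated with Python's operator precedence.
results = {
    "5*2": 5 * 2,
    "2+5*2-2": 2 + 5 * 2 - 2,
    "2+5*(2-2)": 2 + 5 * (2 - 2),
    "3*2+9//2": 3 * 2 + 9 // 2,
}

for expression, value in results.items():
    print(expression, "=", value)

# Floor division drops the fractional part; true division keeps it.
print(9 // 2, 9 / 2)
```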