Python Data Science Jobs & Interviews
Your go-to hub for Python and Data Science—featuring questions, answers, quizzes, and interview tips to sharpen your skills and boost your career in the data-driven world.

Admin: @Hussein_Sheikho
Question 2 (Intermediate):
What is a common use case for the PCA (Principal Component Analysis) algorithm in machine learning?

A) Hyperparameter tuning
B) Data visualization and dimensionality reduction
C) Gradient descent optimization
D) Model ensembling

#MachineLearning #PCA #DimensionalityReduction #MLQuiz #DataScience
How can I implement Principal Component Analysis (PCA) for dimensionality reduction using scikit-learn? Provide a Python example, explain the concept of variance maximization, and discuss how to choose the number of principal components.

Answer:
PCA reduces the dimensionality of data while preserving as much variance as possible. It transforms features into new uncorrelated variables (principal components) ordered by explained variance.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Load dataset
data = load_iris()
X = data.data
y = data.target
feature_names = data.feature_names

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Print explained variance ratio
print("Explained Variance Ratio:", pca.explained_variance_ratio_)
print("Total Explained Variance:", sum(pca.explained_variance_ratio_))

# Plot results
plt.figure(figsize=(8, 6))
colors = ['red', 'green', 'blue']
for i in range(3):
    plt.scatter(X_pca[y == i, 0], X_pca[y == i, 1],
                c=colors[i], label=data.target_names[i])
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.title('PCA of Iris Dataset')
plt.legend()
plt.grid(True)
plt.show()

# Determine optimal number of components
pca_full = PCA()
pca_full.fit(X_scaled)
cumulative_variance = np.cumsum(pca_full.explained_variance_ratio_)

plt.figure(figsize=(8, 6))
plt.plot(range(1, len(cumulative_variance) + 1), cumulative_variance, marker='o')
plt.axhline(y=0.95, color='r', linestyle='--', label='95% Variance Threshold')
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Explained Variance')
plt.title('Choosing Number of Components')
plt.legend()
plt.grid(True)
plt.show()


Explanation:
- Standardization: Essential because PCA is sensitive to scale.
- PCA transformation: Finds the directions (components) that maximize variance in the data (see the sketch below).
- Components: The first component captures the most variance, the second the next highest, etc.
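
As a minimal sketch of the variance-maximization idea (reusing X_scaled from the example above), the principal components are the eigenvectors of the feature covariance matrix, and the eigenvalues give the variance each component captures:

# Sketch: PCA by eigendecomposition of the covariance matrix
cov_matrix = np.cov(X_scaled, rowvar=False)            # feature-by-feature covariance
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Sort eigenpairs by decreasing eigenvalue (variance captured)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Matches pca.explained_variance_ratio_ from scikit-learn
print("Manual explained variance ratio:", eigenvalues / eigenvalues.sum())

# Projecting onto the top-2 eigenvectors reproduces X_pca (up to sign flips)
X_manual = X_scaled @ eigenvectors[:, :2]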

Choosing Number of Components:
Use the "elbow method" or set a threshold (e.g., 95% total variance). In the example, n_components=2 retains about 96% of the variance, showing an effective reduction from 4D to 2D.
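
Alternatively, scikit-learn can choose the number of components for you: passing a float between 0 and 1 as n_components keeps the smallest number of components whose cumulative explained variance reaches that fraction. A short sketch, reusing X_scaled from above:

# Keep enough components to explain at least 95% of the variance
pca_95 = PCA(n_components=0.95)
X_reduced = pca_95.fit_transform(X_scaled)
print("Components kept:", pca_95.n_components_)
print("Variance explained:", pca_95.explained_variance_ratio_.sum())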

Time Complexity: O(nm² + m³), where n is the number of samples and m is the number of features.
Use Case: PCA is ideal for visualization, noise reduction, and improving model performance on high-dimensional data.
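
As one illustration of the model-performance use case, PCA is often used as a preprocessing step inside a pipeline before a classifier. A minimal sketch, using the Iris data from above and a logistic regression purely as an illustrative choice of model:

from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Sketch: scale -> PCA -> classifier, evaluated with 5-fold cross-validation
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("clf", LogisticRegression(max_iter=1000)),
])
scores = cross_val_score(pipe, X, y, cv=5)
print("Mean CV accuracy with 2 PCA components:", scores.mean())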

By: @DataScienceQ 🚀