Python Data Science Jobs & Interviews
Your go-to hub for Python and Data Science—featuring questions, answers, quizzes, and interview tips to sharpen your skills and boost your career in the data-driven world.

Admin: @Hussein_Sheikho
Question 4 (Intermediate):
In scikit-learn's KMeans implementation, what is the purpose of the n_init parameter?

A) Number of initial centroid configurations to try
B) Number of iterations for each run
C) Number of features to initialize
D) Number of CPU cores to use
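
To explore the question hands-on, here is a minimal sketch that fits the same data with two different n_init values. The synthetic dataset, the choice of init="random", and the variable names are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with four blobs (illustrative only)
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# A single run from one random set of starting centroids
single = KMeans(n_clusters=4, init="random", n_init=1, random_state=0).fit(X)

# Ten runs from different starting centroids; the run with the lowest inertia is kept
multi = KMeans(n_clusters=4, init="random", n_init=10, random_state=0).fit(X)

print("n_init=1 inertia: ", single.inertia_)
print("n_init=10 inertia:", multi.inertia_)
```

Comparing the two printed values shows whether the extra restarts found a better solution on this data.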

#Python #KMeans #Clustering #MachineLearning

By: https://t.iss.one/DataScienceQ
KMeans Interview Questions

What is the primary goal of KMeans clustering?

Answer:
To partition data into K clusters based on similarity, minimizing intra-cluster variance
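
As a rough illustration, the sketch below partitions synthetic data into three clusters with scikit-learn. The dataset and parameter values are assumptions chosen for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 150 points around 3 centers (illustrative only)
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=42)

model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

labels = model.labels_            # cluster index assigned to each sample
centers = model.cluster_centers_  # one centroid per cluster

print(labels[:10])
print(centers)
```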

How does KMeans determine the initial cluster centers?

Answer:
In the classic algorithm, by randomly selecting K data points as initial centroids; scikit-learn's KMeans defaults to k-means++ initialization instead

What is the main limitation of KMeans regarding cluster shape?

Answer:
It assumes roughly spherical, similarly sized clusters, so it struggles with elongated or irregularly shaped groups

How do you choose the optimal number of clusters (K) in KMeans?

Answer:
Using methods like the Elbow Method or Silhouette Score
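
Both methods can be sketched in one loop over candidate values of K. The dataset below is synthetic with four well-separated groups, and the range of K is an arbitrary assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Four well-separated synthetic groups (illustrative only)
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10], [10, -10]],
    cluster_std=1.0,
    random_state=0,
)

inertias, silhouettes = {}, {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_                         # plot against k and look for the "elbow"
    silhouettes[k] = silhouette_score(X, model.labels_)  # higher is better

best_k = max(silhouettes, key=silhouettes.get)
print("K with the highest silhouette score:", best_k)
```

On real data the elbow is often ambiguous, so it helps to compare both curves rather than rely on one.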

What is the role of the inertia metric in KMeans?

Answer:
Measures the sum of squared distances from each point to its cluster center
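
The definition can be checked by recomputing the value by hand and comparing it with the fitted model's inertia_ attribute. This is a small sketch on assumed synthetic data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=1)
model = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)

# Look up the centroid each sample was assigned to
assigned_centers = model.cluster_centers_[model.labels_]

# Sum of squared Euclidean distances to those centroids
manual_inertia = np.sum((X - assigned_centers) ** 2)

print(manual_inertia, model.inertia_)
```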

Can KMeans handle categorical data directly?

Answer:
No, it requires numerical data; categorical variables must be encoded
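
One common encoding is one-hot, sketched below with a made-up "color" column next to a numeric "size" column. Both columns and their values are hypothetical, and Euclidean distance on one-hot columns is only a rough proxy for categorical similarity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: one categorical column and one numeric column
colors = np.array([["red"], ["blue"], ["red"], ["green"], ["blue"], ["green"]])
sizes = np.array([[1.0], [5.0], [1.2], [9.0], [5.1], [9.3]])

# One-hot encode the categories, then stack them next to the numeric feature
encoded = OneHotEncoder().fit_transform(colors).toarray()
X = np.hstack([sizes, encoded])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)
```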

How does KMeans handle outliers?

Answer:
Poorly; it has no built-in outlier handling. Because centroids are means, outliers pull cluster centers toward themselves and inflate inertia
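
The effect on a single centroid is easy to see with a few made-up points, since a centroid is just the mean of its cluster:

```python
import numpy as np

# A tight group of points around (1, 1) (made-up values)
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [1.0, 1.0]])
centroid = points.mean(axis=0)

# Add one distant outlier and recompute the mean
with_outlier = np.vstack([points, [[21.0, 21.0]]])
shifted = with_outlier.mean(axis=0)

print(centroid)  # [1. 1.]
print(shifted)   # [5. 5.]
```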

What is the difference between KMeans and KMedoids?

Answer:
KMeans uses mean of points, while KMedoids uses actual data points as centers
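
A small NumPy sketch of the two kinds of center for one cluster, using made-up points. Here the medoid is taken as the member with the smallest total distance to the other members.

```python
import numpy as np

# One cluster of made-up points
cluster = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [4.0, 4.0]])

# KMeans-style center: the mean, which need not be an actual data point
mean_center = cluster.mean(axis=0)

# KMedoids-style center: the data point with the smallest summed distance to the rest
pairwise = np.linalg.norm(cluster[:, None, :] - cluster[None, :, :], axis=2)
medoid = cluster[pairwise.sum(axis=1).argmin()]

print(mean_center)  # [1.25 1.5 ]
print(medoid)       # [1. 0.]
```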

Why is feature scaling important for KMeans?

Answer:
To ensure all features contribute equally and prevent dominance by large-scale features
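
A quick sketch with two hypothetical features on very different scales (income in dollars, age in years). StandardScaler is one common choice; other scalers may suit other data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical columns: income (dollars), age (years)
X = np.array([
    [30000.0, 25.0],
    [32000.0, 60.0],
    [90000.0, 27.0],
    [88000.0, 62.0],
])

# Unscaled: the distance between rows 0 and 1 is driven almost entirely by income
raw_distance = np.linalg.norm(X[0] - X[1])

# Scaled: each column has zero mean and unit variance, so age counts too
X_scaled = StandardScaler().fit_transform(X)
scaled_distance = np.linalg.norm(X_scaled[0] - X_scaled[1])

print(raw_distance, scaled_distance)
```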

How does KMeans work in high-dimensional spaces?

Answer:
It suffers from the curse of dimensionality, making distance measures less meaningful
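
One way to see this is to compare the nearest and farthest distances from a query point as the dimension grows. The helper name relative_contrast and the uniform random data are assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(dim, n_points=1000):
    """(farthest - nearest) / nearest distance from a random query point."""
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

contrast_low = relative_contrast(2)
contrast_high = relative_contrast(1000)

# In high dimensions the nearest and farthest points are almost equally far away
print(contrast_low, contrast_high)
```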

What is the time complexity of KMeans?

Answer:
O(n * k * d * t), where n is samples, k is clusters, d is features, and t is iterations
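
The cost per iteration is visible in a bare-bones version of Lloyd's algorithm: every iteration compares each sample with each centroid, and each comparison touches every feature. The function name lloyd and its defaults are assumptions; this is a teaching sketch, not scikit-learn's implementation.

```python
import numpy as np

def lloyd(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: (n, k) squared distances, each summed over the features
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])
centers, labels = lloyd(X, k=3)
print(centers)
```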

What is the space complexity of KMeans?

Answer:
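O((n + k) * d), where n is samples, k is clusters, and d is features: the data matrix takes O(n * d) and the centroids themselves take only O(k * d)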

How do you evaluate the quality of KMeans clustering?

Answer:
Using metrics like silhouette score, within-cluster sum of squares, or Davies-Bouldin index
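
All three metrics are available from a fitted scikit-learn model, as in this sketch on assumed synthetic data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=7)
model = KMeans(n_clusters=3, n_init=10, random_state=7).fit(X)

sil = silhouette_score(X, model.labels_)      # in [-1, 1], higher is better
dbi = davies_bouldin_score(X, model.labels_)  # >= 0, lower is better
wcss = model.inertia_                         # within-cluster sum of squares, lower is tighter

print(sil, dbi, wcss)
```

Inertia always tends to fall as K grows, so it is most useful for comparing runs with the same K.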

Can KMeans be used for image segmentation?

Answer:
Yes, by treating pixel values as features and clustering them
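
A minimal sketch of the idea on a synthetic image, so no image file is needed. The 20x20 two-color image and the noise level are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 20x20 RGB image: left half reddish, right half bluish, plus slight noise
rng = np.random.default_rng(0)
image = np.zeros((20, 20, 3))
image[:, :10] = [0.9, 0.1, 0.1]
image[:, 10:] = [0.1, 0.1, 0.9]
image += rng.normal(scale=0.02, size=image.shape)

# Each pixel becomes one sample with three features (R, G, B)
pixels = image.reshape(-1, 3)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)

# Segment map, and the image redrawn using each pixel's cluster color
label_map = model.labels_.reshape(20, 20)
segmented = model.cluster_centers_[model.labels_].reshape(image.shape)

print(label_map[0])
```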

How does KMeans++ initialize centroids differently from standard KMeans?

Answer:
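The first centroid is picked at random; each later one is sampled with probability proportional to its squared distance from the nearest centroid already chosen. This spreads the centroids out, which tends to improve convergence speed and final quality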
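
The seeding idea can be sketched in a few lines of NumPy. The function name kmeans_pp_init is hypothetical, and this is the textbook procedure rather than scikit-learn's exact implementation.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    n = X.shape[0]
    # First centroid: a data point chosen uniformly at random
    centers = [X[rng.integers(n)]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen centroid
        diffs = X[:, None, :] - np.array(centers)[None, :, :]
        d2 = (diffs ** 2).sum(axis=2).min(axis=1)
        # Sample the next centroid with probability proportional to that distance
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(n, p=probs)])
    return np.array(centers)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])
init_centers = kmeans_pp_init(X, 3, rng)
print(init_centers)
```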

What happens if the number of clusters (K) is too small?

Answer:
Clusters may be overly broad, merging distinct groups

What happens if the number of clusters (K) is too large?

Answer:
Natural groups get split into artificial sub-clusters; inertia keeps dropping, but the extra clusters fit noise rather than real structure

Does KMeans guarantee a global optimum?

Answer:
No, it converges to a local optimum depending on initialization

How can you improve KMeans performance on large datasets?

Answer:
Using MiniBatchKMeans or sampling techniques
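
A short sketch of MiniBatchKMeans, which updates centroids from small random batches instead of the full dataset on every iteration. The dataset size and batch_size are arbitrary assumptions.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# A larger synthetic dataset (illustrative only)
X, _ = make_blobs(n_samples=20000, centers=5, random_state=0)

model = MiniBatchKMeans(n_clusters=5, batch_size=1024, n_init=3, random_state=0).fit(X)

print(model.cluster_centers_)
print(model.inertia_)
```

The result is usually slightly worse than full KMeans in inertia, in exchange for less computation per step.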

What is the effect of random seed on KMeans results?

Answer:
Different seeds lead to different initial centroids, affecting final clusters
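
In scikit-learn the seed is set through random_state. The sketch below repeats a fit with the same seed and then tries several seeds; the dataset and the helper name fit are assumptions, and init="random" with n_init=1 is used so that the seed's effect is not averaged away by restarts.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Overlapping synthetic clusters, where the starting point matters more
X, _ = make_blobs(n_samples=300, centers=5, cluster_std=2.0, random_state=3)

def fit(seed):
    return KMeans(n_clusters=5, init="random", n_init=1, random_state=seed).fit(X)

# Same seed: the two runs start from the same centroids
run_a, run_b = fit(0), fit(0)
print(np.allclose(run_a.cluster_centers_, run_b.cluster_centers_))

# Different seeds: final inertia may vary from run to run
inertias = [fit(seed).inertia_ for seed in range(5)]
print(inertias)
```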

#️⃣ #kmeans #machine_learning #clustering #data_science #ai #python #coding #dev

By: t.iss.one/DataScienceQ 🚀