💡 Python: Simple K-Means Clustering Project
K-Means is a popular unsupervised machine learning algorithm used to partition
Code explanation: This script loads the Iris dataset, scales its features using
#Python #MachineLearning #KMeans #Clustering #DataScience
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
K-Means is a popular unsupervised machine learning algorithm used to partition
n observations into k clusters, where each observation belongs to the cluster with the nearest mean (centroid). This simple project demonstrates K-Means on the classic Iris dataset using scikit-learn to group similar flower species based on their measurements.import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import numpy as np
# 1. Load the Iris dataset
iris = load_iris()
X = iris.data # Features (sepal length, sepal width, petal length, petal width)
y = iris.target # True labels (0, 1, 2 for different species) - not used by KMeans
# 2. (Optional but recommended) Scale the features
# K-Means is sensitive to the scale of features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# 3. Define and train the K-Means model
# We know there are 3 species in Iris, so we set n_clusters=3
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10) # n_init is important for robust results
kmeans.fit(X_scaled)
# 4. Get the cluster assignments for each data point
labels = kmeans.labels_
# 5. Get the coordinates of the cluster centroids
centroids = kmeans.cluster_centers_
# 6. Visualize the clusters (using first two features for simplicity)
plt.figure(figsize=(8, 6))
# Plot each cluster
colors = ['red', 'green', 'blue']
for i in range(3):
plt.scatter(X_scaled[labels == i, 0], X_scaled[labels == i, 1],
s=50, c=colors[i], label=f'Cluster {i+1}', alpha=0.7)
# Plot the centroids
plt.scatter(centroids[:, 0], centroids[:, 1],
s=200, marker='X', c='black', label='Centroids', edgecolor='white')
plt.title('K-Means Clustering on Iris Dataset (Scaled Features)')
plt.xlabel('Scaled Sepal Length')
plt.ylabel('Scaled Sepal Width')
plt.legend()
plt.grid(True)
plt.show()
# You can also compare with true labels (for evaluation, not part of clustering process itself)
# print("True labels:", y)
# print("K-Means labels:", labels)
Code explanation: This script loads the Iris dataset, scales its features using
StandardScaler, and then applies KMeans to group the data into 3 clusters. It visualizes the resulting clusters and their centroids using a scatter plot with the first two scaled features.#Python #MachineLearning #KMeans #Clustering #DataScience
━━━━━━━━━━━━━━━
By: @DataScienceM ✨