Data Science Machine Learning Data Analysis

💡 Python: Simple K-Means Clustering Project

K-Means is a popular unsupervised machine learning algorithm used to partition n observations into k clusters, where each observation belongs to the cluster with the nearest mean (centroid). This simple project demonstrates K-Means on the classic Iris dataset using scikit-learn to group similar flower species based on their measurements.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import numpy as np

# 1. Load the Iris dataset
iris = load_iris()
X = iris.data # Features (sepal length, sepal width, petal length, petal width)
y = iris.target # True labels (0, 1, 2 for different species) - not used by KMeans

# 2. (Optional but recommended) Scale the features
# K-Means is sensitive to the scale of features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 3. Define and train the K-Means model
# We know there are 3 species in Iris, so we set n_clusters=3
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10) # n_init is important for robust results
kmeans.fit(X_scaled)

# 4. Get the cluster assignments for each data point
labels = kmeans.labels_

# 5. Get the coordinates of the cluster centroids
centroids = kmeans.cluster_centers_

# 6. Visualize the clusters (using first two features for simplicity)
plt.figure(figsize=(8, 6))

# Plot each cluster
colors = ['red', 'green', 'blue']
for i in range(3):
    plt.scatter(X_scaled[labels == i, 0], X_scaled[labels == i, 1],
                s=50, c=colors[i], label=f'Cluster {i+1}', alpha=0.7)

# Plot the centroids
plt.scatter(centroids[:, 0], centroids[:, 1],
            s=200, marker='X', c='black', label='Centroids', edgecolor='white')

plt.title('K-Means Clustering on Iris Dataset (Scaled Features)')
plt.xlabel('Scaled Sepal Length')
plt.ylabel('Scaled Sepal Width')
plt.legend()
plt.grid(True)
plt.show()

# You can also compare with true labels (for evaluation, not part of clustering process itself)
# print("True labels:", y)
# print("K-Means labels:", labels)

Code explanation: This script loads the Iris dataset, scales its features using StandardScaler, and then applies KMeans to group the data into 3 clusters. It visualizes the resulting clusters and their centroids using a scatter plot with the first two scaled features.

#Python #MachineLearning #KMeans #Clustering #DataScience

━━━━━━━━━━━━━━━
By: @DataScienceM ✨

562 views06:10

About

Blog

Apps

Platform