K-means is a prototype-based clustering method that partitions a finite set of vectors into k groups by minimizing the within-cluster sum of squared Euclidean distances (WCSS) to the cluster means. This page implements the classic Lloyd’s algorithm: starting from k centroids in the plane, repeatedly (1) assign every data point to its nearest centroid (Voronoi regions in Euclidean distance) and (2) update each centroid to the arithmetic mean of all points currently assigned to it. Each full iteration is guaranteed to decrease (or leave unchanged) the WCSS, because the assignment step is optimal for fixed centers and the mean is the minimizer for fixed assignments.

The simulator is deliberately small-scale and deterministic wherever randomness appears: initial centroids are k distinct data points chosen with a reproducible shuffle controlled by an init seed, and an optional Gaussian mixture demo paints a toy multi-modal cloud. Empty clusters (possible in degenerate geometry) keep their previous centroid in the update step, a simple teaching default rather than full k-means++ repair logic. Students can click to add points, drag them, change k, re-randomize seeds, single-step Lloyd, or run to convergence while watching the SSE drop.
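As a concrete reference, here is a minimal sketch in Python/NumPy of the loop described above. The names (init_centroids, lloyd_step, kmeans) are illustrative, not the simulator's actual code:

    import numpy as np

    def init_centroids(points, k, seed=0):
        # Reproducible init: shuffle indices with a seeded RNG and take
        # the first k data points as starting centroids.
        rng = np.random.default_rng(seed)
        return points[rng.permutation(len(points))[:k]].copy()

    def lloyd_step(points, centroids):
        # Assignment step: d2[i, j] is the squared Euclidean distance
        # from point i to centroid j; each point takes its nearest centroid.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its previous centroid (the teaching default).
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            members = points[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        # SSE for the fresh assignment against the centroids just used.
        sse = d2[np.arange(len(points)), labels].sum()
        return labels, new_centroids, sse

    def kmeans(points, k, seed=0, max_iter=100, tol=1e-12):
        # Run Lloyd steps until the SSE stops improving.
        centroids = init_centroids(points, k, seed)
        prev_sse = np.inf
        for _ in range(max_iter):
            labels, centroids, sse = lloyd_step(points, centroids)
            if prev_sse - sse <= tol:
                break
            prev_sse = sse
        return labels, centroids, sse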
Who it's for: Introductory data-science, statistics, or numerical-methods students learning unsupervised clustering, Voronoi intuition, and coordinate-descent-style alternating minimization.
Key terms
K-means clustering
Lloyd's algorithm
Centroid
Voronoi assignment
Within-cluster sum of squares
Gaussian mixture toy data
Local optima
Empty cluster handling
How it works
K-means partitions points into k groups by alternating assignment (each point → nearest centroid) and update (centroid ← mean of its cluster); this alternation is Lloyd’s algorithm. Click to build a dataset, pick k, randomize centroids, then step through iterations and watch the SSE (sum of squared distances from points to their assigned centroids) fall.
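A hypothetical usage pass with the helpers sketched earlier: sample a seeded two-blob Gaussian mixture (the simulator's toy demo paints a similar cloud), then single-step Lloyd and watch the SSE fall:

    # Assumes init_centroids and lloyd_step from the sketch above.
    rng = np.random.default_rng(42)
    blob_a = rng.normal([0.0, 0.0], 0.6, size=(60, 2))
    blob_b = rng.normal([4.0, 4.0], 0.6, size=(60, 2))
    points = np.vstack([blob_a, blob_b])

    centroids = init_centroids(points, k=2, seed=7)
    for step in range(5):
        labels, centroids, sse = lloyd_step(points, centroids)
        print(f"step {step}: SSE = {sse:.2f}")  # monotone non-increasing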
Frequently asked questions
Why can different random starts give different final clusters?
Lloyd’s algorithm monotonically improves the WCSS objective, but only toward a local minimum of the non-convex k-means loss. Different initial centroids can land in different basins of attraction, especially with well-separated blobs. This is why production pipelines often try multiple restarts or smarter initializations (e.g., k-means++), which are beyond this minimal visualization.
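A minimal sketch of the multiple-restart remedy (not part of this simulator), reusing the hypothetical kmeans helper from above: try several init seeds and keep whichever run ends at the lowest SSE:

    def kmeans_restarts(points, k, n_restarts=10):
        # Each seed gives a different reproducible initialization;
        # keep whichever local minimum has the smallest final SSE.
        best = None
        for seed in range(n_restarts):
            labels, centroids, sse = kmeans(points, k, seed=seed)
            if best is None or sse < best[2]:
                best = (labels, centroids, sse)
        return best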
Does the algorithm guarantee the globally optimal partition?
No. Exact global optimization of k-means is NP-hard in the worst case, even in the plane for general discrete point sets. Lloyd’s iterations are a fast heuristic that works extremely well in practice for many smooth-looking clouds.
What happens if a cluster becomes empty?
In strict textbook code you might re-seed that centroid (e.g., farthest point from others). Here, an empty cluster keeps its previous centroid so the animation never “loses” a label mid-lesson; you can still see reassignment on the next assignment step.
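For contrast, here is a hedged sketch of the textbook repair mentioned above (not what this page does): re-seed an empty cluster at the point currently farthest from its assigned centroid. The helper name reseed_empty is illustrative:

    def reseed_empty(points, labels, centroids):
        # d2[i, j]: squared distance from point i to centroid j,
        # recomputed here so the helper is self-contained.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        for j in range(len(centroids)):
            if not np.any(labels == j):
                # The point worst-served by its own centroid becomes
                # the new centroid j (a common re-seeding heuristic).
                worst = d2[np.arange(len(points)), labels].argmax()
                centroids[j] = points[worst]
                labels[worst] = j
        return labels, centroids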