K-means is a prototype-based clustering method that partitions a finite set of vectors into k groups by minimizing the within-cluster sum of squared Euclidean distances (WCSS) to the cluster means. This page implements the classic Lloyd’s algorithm: starting from k centroids in the plane, repeatedly (1) assign every data point to its nearest centroid (Voronoi regions in Euclidean distance) and (2) update each centroid to the arithmetic mean of all points currently assigned to it. Each full iteration is guaranteed to decrease (or leave unchanged) the WCSS, because the assignment step is optimal for fixed centers and the mean is the minimizer for fixed assignments.

The simulator is deliberately small-scale and deterministic wherever randomness appears: initial centroids are k distinct data points chosen with a reproducible shuffle controlled by an init seed, and an optional Gaussian mixture demo paints a toy multi-modal cloud. Empty clusters (possible in degenerate geometry) keep their previous centroid in the update step, a simple teaching default rather than full k-means++ repair logic. Students can click to add points, drag them, change k, re-randomize seeds, single-step Lloyd, or run to convergence while watching the SSE drop.
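As a concrete reference, here is a minimal sketch in Python/NumPy of the loop described above. The names (init_centroids, lloyd_step, kmeans) are illustrative, not the simulator's actual code:

    import numpy as np

    def init_centroids(points, k, seed=0):
        # Reproducible init: shuffle indices with a seeded RNG and take
        # the first k data points as starting centroids.
        rng = np.random.default_rng(seed)
        return points[rng.permutation(len(points))[:k]].copy()

    def lloyd_step(points, centroids):
        # Assignment step: d2[i, j] is the squared Euclidean distance
        # from point i to centroid j; each point takes its nearest centroid.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its previous centroid (the teaching default).
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            members = points[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        # SSE for the fresh assignment against the centroids just used.
        sse = d2[np.arange(len(points)), labels].sum()
        return labels, new_centroids, sse

    def kmeans(points, k, seed=0, max_iter=100, tol=1e-12):
        # Run Lloyd steps until the SSE stops improving.
        centroids = init_centroids(points, k, seed)
        prev_sse = np.inf
        for _ in range(max_iter):
            labels, centroids, sse = lloyd_step(points, centroids)
            if prev_sse - sse <= tol:
                break
            prev_sse = sse
        return labels, centroids, sse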
Who it's for: Introductory data-science, statistics, or numerical-methods students learning unsupervised clustering, Voronoi intuition, and coordinate-descent-style alternating minimization.
Key terms
K-means clustering
Lloyd's algorithm
Centroid
Voronoi assignment
Within-cluster sum of squares
Gaussian mixture toy data
Local optima
Empty cluster handling
How it works
K-means partitions points into k groups by alternating assignment (each point → nearest centroid) and update (centroid ← mean of its cluster); this alternation is Lloyd’s algorithm. Click to build a dataset, pick k, randomize centroids, then step through iterations and watch the SSE (sum of squared distances from points to their assigned centroids) fall.
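A hypothetical usage pass with the helpers sketched earlier: sample a seeded two-blob Gaussian mixture (the simulator's toy demo paints a similar cloud), then single-step Lloyd and watch the SSE fall:

    # Assumes init_centroids and lloyd_step from the sketch above.
    rng = np.random.default_rng(42)
    blob_a = rng.normal([0.0, 0.0], 0.6, size=(60, 2))
    blob_b = rng.normal([4.0, 4.0], 0.6, size=(60, 2))
    points = np.vstack([blob_a, blob_b])

    centroids = init_centroids(points, k=2, seed=7)
    for step in range(5):
        labels, centroids, sse = lloyd_step(points, centroids)
        print(f"step {step}: SSE = {sse:.2f}")  # monotone non-increasing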
Frequently asked questions
Why can different random starts give different final clusters?
Lloyd’s algorithm monotonically improves the WCSS objective, but only toward a local minimum of the non-convex k-means loss. Different initial centroids can land in different basins of attraction, especially with well-separated blobs. This is why production pipelines often try multiple restarts or smarter initializations (e.g., k-means++), which are beyond this minimal visualization.
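A minimal sketch of the multiple-restart remedy (not part of this simulator), reusing the hypothetical kmeans helper from above: try several init seeds and keep whichever run ends at the lowest SSE:

    def kmeans_restarts(points, k, n_restarts=10):
        # Each seed gives a different reproducible initialization;
        # keep whichever local minimum has the smallest final SSE.
        best = None
        for seed in range(n_restarts):
            labels, centroids, sse = kmeans(points, k, seed=seed)
            if best is None or sse < best[2]:
                best = (labels, centroids, sse)
        return best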
Does the algorithm guarantee the globally optimal partition?
No. Exact global optimization of k-means is NP-hard in the worst case, even in the plane for general discrete point sets. Lloyd’s iterations are a fast heuristic that works extremely well in practice for many smooth-looking clouds.
What happens if a cluster becomes empty?
In strict textbook code you might re-seed that centroid (e.g., farthest point from others). Here, an empty cluster keeps its previous centroid so the animation never “loses” a label mid-lesson; you can still see reassignment on the next assignment step.
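For contrast, here is a hedged sketch of the textbook repair mentioned above (not what this page does): re-seed an empty cluster at the point currently farthest from its assigned centroid. The helper name reseed_empty is illustrative:

    def reseed_empty(points, labels, centroids):
        # d2[i, j]: squared distance from point i to centroid j,
        # recomputed here so the helper is self-contained.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        for j in range(len(centroids)):
            if not np.any(labels == j):
                # The point worst-served by its own centroid becomes
                # the new centroid j (a common re-seeding heuristic).
                worst = d2[np.arange(len(points)), labels].argmax()
                centroids[j] = points[worst]
                labels[worst] = j
        return labels, centroids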