DBSCAN (Density-Based Spatial Clustering of Applications with Noise) discovers clusters of arbitrary shape without fixing the number of clusters k. Two user parameters control density: a distance ε defining local neighborhoods, and minPts, the minimum number of points (including the query point itself) required inside an ε-ball for that point to be a core object. The algorithm grows clusters by transitively expanding from cores: any unvisited point in the ε-neighborhood of a core is pulled into the same cluster; non-core points reached this way are border points; anything never absorbed is labeled noise. Unlike k-means, DBSCAN can reject sparse outliers and separate nearby blobs when ε is small enough—while large ε tends to bridge distinct groups. This page recomputes labels instantly on the plane as you drag ε and minPts, colors clusters, outlines noise distinctly, and optionally draws ε-circles around cores in screen space (a teaching overlay, not a second metric). A built-in demo mixes four tight Gaussian blobs with uniformly scattered background points to make the noise class visually obvious.
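The core-object test described above can be sketched in a few lines of plain Python. This is a teaching sketch, not the page's actual implementation; the coordinates, `eps`, and `min_pts` values are illustrative, and Euclidean distance is assumed:

```python
from math import dist

def is_core(p, points, eps, min_pts):
    """True if the eps-ball around p contains at least min_pts points.
    The count includes p itself, matching the definition above."""
    return sum(dist(p, q) <= eps for q in points) >= min_pts

# Hypothetical data: three points packed near the origin, one far away.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
print(is_core((0.0, 0.0), pts, eps=0.5, min_pts=3))  # dense point -> True
print(is_core((5.0, 5.0), pts, eps=0.5, min_pts=3))  # isolated point -> False
```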
Who it's for: Introductory machine-learning or spatial-data students comparing partition-based k-means with density-based clustering; pairs naturally with the Lloyd k-means lab on this site.
Key terms
DBSCAN
ε-neighborhood
minPts
Core point
Border point
Noise
Density reachability
Arbitrary-shaped clusters
How it works
DBSCAN finds density-connected clusters without fixing k: a point is a core point if its ε-neighborhood (counting the point itself) contains at least minPts points. From each unvisited core, the algorithm expands by unioning the ε-neighborhoods of the cores it reaches; non-core points reached this way become border points; points never absorbed remain noise. Drag ε and minPts to split or merge blobs and watch noise points appear in sparse regions.
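The expansion loop above can be written as a short brute-force implementation. This is a hypothetical sketch assuming Euclidean distance and O(n²) neighbor search, which is fine for teaching-sized data; optimized libraries use spatial indexes instead:

```python
from math import dist

NOISE = -1
UNVISITED = None

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point (cluster id, or -1 for noise)."""
    labels = [UNVISITED] * len(points)
    cluster = 0

    def neighbors(i):
        # Indices of all points within eps of points[i], including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # tentative: may later become a border point
            continue
        labels[i] = cluster            # start a new cluster from this core
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # noise reached from a core becomes border
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:     # j is itself a core: keep expanding
                queue.extend(jn)
        cluster += 1
    return labels
```

On two tight blobs plus one stray point, `dbscan(pts, eps=0.3, min_pts=3)` assigns each blob its own label and marks the stray point -1.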
Frequently asked questions
Why does a tiny change in ε sometimes merge or split clusters dramatically?
DBSCAN’s decisions are thresholded on counts within a fixed-radius ball. Near critical densities, a small increase in ε can suddenly connect two dense regions through a sparse “bridge” of points, merging clusters; shrinking ε can sever that bridge and split them again. This non-smooth behavior is intrinsic to hard density thresholds.
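To see the threshold concretely: for a chain of points, the largest gap between consecutive points is the critical ε at which the two ends become density-connected (hypothetical coordinates; this assumes minPts is small enough that the chain points qualify as cores):

```python
from math import dist

# Two dense ends (spacing 1) joined by a sparse bridge (spacing 5).
chain = [(0, 0), (1, 0), (2, 0), (7, 0), (12, 0), (13, 0), (14, 0)]
gaps = [dist(chain[k], chain[k + 1]) for k in range(len(chain) - 1)]
print(max(gaps))  # 5.0: eps >= 5 merges the two ends; eps < 5 severs the bridge
```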
How should I pick minPts in 2D?
A common rule of thumb is minPts ≈ 2 × dim for low-dimensional spatial data (so minPts = 4 is a frequent baseline in the plane), then tune for noise tolerance: larger minPts demands denser cores and tends to label more border/sparse points as noise.
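A tiny illustration of that trade-off, counting how many points qualify as cores at different minPts values (hypothetical points; counts include the query point itself, per the definition above):

```python
from math import dist

pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (3, 3)]  # tight blob + one outlier

def n_cores(points, eps, min_pts):
    """Number of core points at the given density parameters."""
    return sum(
        sum(dist(p, q) <= eps for q in points) >= min_pts
        for p in points)

print(n_cores(pts, 0.3, 2), n_cores(pts, 0.3, 5))  # 4 0
# minPts = 2: the four blob points are cores, the outlier is not.
# minPts = 5: no point has a dense enough neighborhood, so nothing clusters.
```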
Does this implementation exactly match sklearn’s DBSCAN?
It follows the same core / border / noise logic and ε-neighborhood expansion, but omits advanced indexing (kd-trees) and edge-case policies used in optimized libraries. The goal is visual correctness for teaching, not bit-identical parity with production code.