Principal component analysis (PCA) in two coordinates starts by centering the data (subtracting the mean of each axis). The 2×2 covariance matrix of the centered points is symmetric positive semidefinite; its eigenvectors are orthogonal directions of maximal variance, ordered by eigenvalues λ₁ ≥ λ₂ ≥ 0. The first eigenvector PC1 spans the line through the mean along which orthogonal projection preserves the most variance; PC2 is perpendicular to it and captures the remaining spread. Each point’s score on PC1 is the inner product (p − μ)·v₁ with the unit eigenvector v₁: exactly the coordinate used when you project the cloud onto a one-dimensional subspace. This page draws the mean; arrows for PC1 and PC2 scaled roughly as √λ₁ and √λ₂ (with a user-tunable scale factor), so arrow length reflects spread along each principal axis; optional perpendicular segments from every point to its rank-one reconstruction on the PC1 line; and a bottom strip that maps PC1 scores to a horizontal axis, so the 1D view is literal, not abstract.
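Here is a minimal NumPy sketch of that pipeline, assuming a made-up `points` array and a `scale` variable standing in for the page’s data and arrow slider:

```python
import numpy as np

# Small 2D point cloud standing in for the page's data (hypothetical values).
points = np.array([[2.0, 1.1], [3.1, 2.0], [4.2, 2.9], [5.0, 4.1], [6.1, 4.8]])

mu = points.mean(axis=0)                     # mean of each axis
centered = points - mu                       # centering

cov = centered.T @ centered / len(points)    # 2x2 population covariance

# eigh returns ascending eigenvalues for a symmetric matrix; reverse the
# order to get lambda1 >= lambda2 >= 0 with matching unit eigenvectors.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]
v1, v2 = eigvecs[:, order].T                 # PC1, PC2 as unit vectors

scale = 1.0                                  # assumed user-tunable arrow scale
tip1 = mu + scale * np.sqrt(lam[0]) * v1     # PC1 arrow tip, length ~ sqrt(lambda1)
tip2 = mu + scale * np.sqrt(lam[1]) * v2     # PC2 arrow tip, length ~ sqrt(lambda2)

scores = centered @ v1                       # (p - mu)·v1: the bottom-strip coordinates
```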
Who it's for: Introductory linear algebra, statistics, or machine-learning students learning eigen-decompositions, variance maximization, and the geometry of least-squares rank-one approximation in the plane.
Key terms
Principal component analysis
Covariance matrix
Eigenvector
Eigenvalue
Centering
Orthogonal projection
Explained variance
Rank-one approximation
How it works
PCA (principal component analysis) finds orthogonal directions in feature space that capture the most variance. Here the data live in the plane: we center the cloud, form the 2×2 covariance matrix, and take its eigenvectors, PC1 (largest eigenvalue) and PC2. Each point’s score on PC1 is the dot product of the centered point with that unit vector; the bottom strip maps those scores to a 1D axis, which is equivalent to projecting orthogonally onto the infinite line through the mean along PC1. Toggle the projection segments to watch the reconstruction error drop as you align the cloud with PC1.
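As a sketch of that projection (again with a hypothetical `points` array), the squared lengths of the projection segments sum to the error that drops as the cloud aligns with PC1:

```python
import numpy as np

points = np.array([[2.0, 1.1], [3.1, 2.0], [4.2, 2.9], [5.0, 4.1], [6.1, 4.8]])
mu = points.mean(axis=0)
centered = points - mu

cov = centered.T @ centered / len(points)
eigvals, eigvecs = np.linalg.eigh(cov)
v1 = eigvecs[:, np.argmax(eigvals)]           # unit eigenvector, largest eigenvalue

scores = centered @ v1                        # 1D coordinates on the bottom strip
reconstruction = mu + np.outer(scores, v1)    # rank-one reconstruction on the PC1 line

# Each projection segment runs from a point to its reconstruction; the total
# squared length is the error that orthogonal projection onto PC1 minimizes.
error = np.sum((points - reconstruction) ** 2)
```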
Frequently asked questions
Why do the arrows rotate when I move a single outlier?
PCA is global: the covariance depends on all points. A far-outlying point can tilt the ellipse of inertia and change both eigenvectors. Robust alternatives (not shown) down-weight outliers.
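A small illustration of that sensitivity, with a synthetic cloud and one invented extreme point:

```python
import numpy as np

def pc1(points):
    # Unit eigenvector of the population covariance with the largest eigenvalue.
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, np.argmax(eigvals)]

rng = np.random.default_rng(0)
cloud = rng.normal(size=(30, 2)) * [3.0, 0.5]      # bulk of the data hugs the x-axis

with_outlier = np.vstack([cloud, [[0.0, 25.0]]])   # one far-outlying point

print(pc1(cloud))          # roughly (+-1, 0): PC1 follows the bulk
print(pc1(with_outlier))   # rotated toward the outlier: one point moved both PCs
```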
Is this sample covariance (divide by n−1) or population (divide by n)?
The simulator uses the population divisor n, for simplicity and for stable teaching plots at small n. The choice does not affect the eigenvectors or the ranking of the PCs; the eigenvalues simply scale by the factor n/(n−1) relative to the n−1 convention.
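A quick check of that relationship, using NumPy’s `ddof` parameter to switch between the two conventions:

```python
import numpy as np

points = np.random.default_rng(1).normal(size=(10, 2))
n = len(points)

cov_pop = np.cov(points.T, ddof=0)      # divide by n (this page's convention)
cov_sample = np.cov(points.T, ddof=1)   # divide by n - 1 (sample convention)

# Same matrix up to the scalar n/(n - 1): identical eigenvectors,
# rescaled eigenvalues, so the PC ordering never changes.
assert np.allclose(cov_sample, cov_pop * n / (n - 1))
```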
How does the bottom strip relate to sklearn’s `.transform(X)[:,0]`?
After centering with the same mean μ used here, sklearn’s first column is (X_centered) v₁ with v₁ the first principal axis, which is identical to the scores plotted on the strip (up to sign: a sign flip reverses both the arrow and the axis consistently).
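A sketch of that equivalence on a synthetic `X` (invented for the example); the final assertion allows for the global sign flip:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2)) @ np.array([[2.0, 0.3], [0.3, 0.5]])

# Scores computed as on this page: eigenvector of the covariance matrix.
centered = X - X.mean(axis=0)
cov = centered.T @ centered / len(X)
eigvals, eigvecs = np.linalg.eigh(cov)
v1 = eigvecs[:, np.argmax(eigvals)]
manual = centered @ v1

# sklearn centers with the same mean internally before projecting.
sklearn_scores = PCA(n_components=2).fit_transform(X)[:, 0]

# Identical up to a global sign flip of v1 (and hence of all the scores).
assert np.allclose(manual, sklearn_scores) or np.allclose(manual, -sklearn_scores)
```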