This lab is an interactive simple linear regression playground on the plane. You build a small dataset by clicking to add points, dragging to move them, and Shift+clicking to delete them, so the geometry of leverage and outliers is immediate. The model is y = β₀ + β₁x, with an intercept and a single slope. Ordinary least squares (OLS) minimizes the sum of squared vertical residuals. Ridge adds an L2 penalty on the slope only (the intercept is not shrunk), corresponding to the normal equations with a single diagonal regularizer on the slope parameter; this pulls the slope toward zero, reducing variance at the cost of bias. Lasso uses an L1 penalty on the slope only and is solved here with a short coordinate-descent loop; a sufficiently large penalty zeroes the slope exactly, a hard form of complexity control.

A dedicated Δy spike is applied only to the point with the largest |x| (a high-leverage location for a line), mimicking a classic vertical-outlier experiment: OLS often tilts dramatically to reduce squared error on that one point, while the penalized fits frequently stay closer to the bulk trend.

Readouts include SSE and R² = 1 − SSE/SST, with SST measured around ȳ on the currently plotted y-values (spike included). When viewing Ridge/Lasso, you can overlay a faint OLS line to compare slopes directly.
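For concreteness, here is a minimal NumPy sketch of the three estimators as described above. The function names and the λ scaling are assumptions for illustration, not the lab's actual code; the ridge regularizer D = diag(0, 1) is the "single diagonal regularizer on the slope parameter" mentioned above.

```python
import numpy as np

def fit_ols(x, y):
    """Closed-form OLS for y = b0 + b1*x via the normal equations."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ X, X.T @ y)

def fit_ridge(x, y, lam):
    """Ridge with an L2 penalty on the slope only: solve (X'X + lam*D) b = X'y
    with D = diag(0, 1), so the intercept is left unshrunk."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ X + lam * np.diag([0.0, 1.0]), X.T @ y)

def fit_lasso(x, y, lam, iters=200):
    """Coordinate descent for sum((y - b0 - b1*x)^2) + lam*|b1|:
    an exact unpenalized update for b0, a soft-thresholded update for b1."""
    b0, b1 = float(np.mean(y)), 0.0
    for _ in range(iters):
        b0 = float(np.mean(y - b1 * x))   # minimizer in b0 alone
        rho = x @ (y - b0)                # gradient term driving the slope
        b1 = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / (x @ x)
    return np.array([b0, b1])
```

The λ/2 in the soft threshold follows from differentiating the squared-error term written as a plain sum; other conventions (e.g., a ½ factor on the loss) shift where the threshold sits.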
Who it's for: Intro statistics / machine-learning students learning OLS vs penalized regression, R², and outlier sensitivity; pairs well with matrix-form normal-equation lectures.
Key terms
Ordinary least squares
Ridge regression
Lasso regression
L2 and L1 penalties
R-squared
Sum of squared errors
Outliers and leverage
Coordinate descent
How it works
Interactive scatter in the plane: fit y = β₀ + β₁ x with ordinary least squares (OLS), Ridge (L2 on the slope), or Lasso (L1 on the slope, intercept not penalized). A vertical spike on the largest |x| point mimics an outlier in y; compare how OLS tilts while penalized fits often stay closer to the bulk trend. Readouts include SSE and R²; optionally overlay a faint OLS line while viewing Ridge/Lasso.
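A hedged sketch of the spike experiment and the readouts, with illustrative data and an arbitrarily chosen λ; `fit` is a compact stand-in for the OLS/ridge helpers sketched earlier:

```python
import numpy as np

def fit(x, y, lam=0.0):
    # OLS when lam == 0; ridge on the slope only otherwise (D = diag(0, 1)).
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ X + lam * np.diag([0.0, 1.0]), X.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 15)
y = 1.0 + 0.5 * x + rng.normal(scale=0.4, size=x.size)
y[np.argmax(np.abs(x))] += 8.0          # the delta-y spike at the largest-|x| point

for name, lam in [("OLS", 0.0), ("Ridge", 50.0)]:
    b0, b1 = fit(x, y, lam)
    sse = np.sum((y - (b0 + b1 * x)) ** 2)
    sst = np.sum((y - y.mean()) ** 2)   # SST around ybar on the plotted y, spike included
    print(f"{name}: slope={b1:.3f}  SSE={sse:.2f}  R^2={1 - sse / sst:.3f}")
```

Running this shows the qualitative effect the lab demonstrates: the OLS slope tilts toward the spiked high-leverage point, while the ridge slope stays nearer the 0.5 bulk trend.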
Frequently asked questions
Why is only the slope penalized, not the intercept?
Penalizing the intercept would make the fit depend on an arbitrary shift of y; most textbook ridge/lasso formulations either center the responses/features or leave the intercept unpenalized so the model can match the overall level of the data. This simulator follows that teaching convention.
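A quick numeric check of the shift argument, under the slope-only ridge formulation used here (the data and λ are made up): adding a constant to every y leaves the slope unchanged and moves the intercept by exactly that constant, which would not hold if β₀ were penalized.

```python
import numpy as np

def ridge_slope_only(x, y, lam):
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ X + lam * np.diag([0.0, 1.0]), X.T @ y)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 3.1, 3.8, 5.2])

print(ridge_slope_only(x, y, lam=5.0))          # some (intercept, slope)
print(ridge_slope_only(x, y + 100.0, lam=5.0))  # same slope; intercept up by 100
```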
Does a large Ridge λ always give a better model?
No—λ trades off bias and variance. Too large a penalty shrinks the slope toward zero even when a steep slope is warranted, underfitting the signal. Cross-validation (not shown here) is the standard way to pick λ in practice.
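To see the shrinkage side of the tradeoff concretely, a small λ sweep (values illustrative) shows the slope-only ridge estimate decaying toward zero even when the underlying trend is genuinely steep:

```python
import numpy as np

def ridge_slope_only(x, y, lam):
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ X + lam * np.diag([0.0, 1.0]), X.T @ y)

x = np.linspace(0.0, 10.0, 20)
y = 2.0 + 1.5 * x                      # a genuinely steep, noise-free trend

for lam in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    b0, b1 = ridge_slope_only(x, y, lam)
    print(f"lam={lam:>7.1f}  slope={b1:.3f}")   # shrinks toward 0: underfitting
```

With one slope, the algebra reduces to slope = Sxy / (Sxx + λ) in centered sums, so the shrinkage is monotone in λ.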
Why does my Lasso slope hit exactly zero sometimes?
The L1 penalty can drive coefficients to exact zeros (sparse solutions). In this one-slope setup, a sufficiently large λ makes the optimal slope 0, leaving a constant model y ≈ β₀.
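In this one-slope setting the zero has a closed form: profiling out the unpenalized intercept leaves a single soft-thresholding step, so the slope is exactly 0 once λ/2 ≥ |Σ x̃ᵢỹᵢ| (tildes denote centered values). A sketch, assuming the objective Σ(yᵢ − β₀ − β₁xᵢ)² + λ|β₁| and illustrative data:

```python
import numpy as np

x = np.linspace(0.0, 10.0, 20)
y = 2.0 + 0.8 * x

xc, yc = x - x.mean(), y - y.mean()   # centering absorbs the unpenalized intercept
rho = xc @ yc                          # the term the soft threshold acts on

for lam in [1.0, 50.0, 2 * abs(rho) + 1.0]:
    b1 = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / (xc @ xc)
    print(f"lam={lam:.1f}  slope={b1:.4f}")   # exactly 0.0 once lam >= 2*|rho|
```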