PhysSandbox

Gradient Descent (2D)

Gradient descent is a fundamental optimization algorithm used to find the minimum of a function. This simulator visualizes the process in two dimensions, where the function f(x,y) represents a surface, such as a bowl or an elliptic well. The level sets, or contour lines, of f(x,y) are curves of constant function value, analogous to elevation lines on a topographic map.

The algorithm iteratively updates the current point (x,y) by moving a small step in the direction opposite the function's gradient, following the update rule (x,y) ← (x,y) − η∇f(x,y). Here, ∇f(x,y) is the gradient vector, which points in the direction of steepest ascent, and η (eta) is the learning rate, a positive scalar controlling the step size. By repeatedly stepping against the gradient, the path descends toward a local minimum.

The visualization demonstrates how the choice of learning rate and starting position affects convergence. A rate that is too small leads to slow progress, while one that is too large can cause overshooting and oscillation, or even divergence. The model simplifies real-world optimization by using smooth, convex functions with a single global minimum, avoiding complexities like saddle points, noise, or high-dimensional parameter spaces. Interacting with this simulation helps learners build intuition for the core mechanics of gradient-based optimization, a principle underpinning machine learning training, engineering design, and various scientific fitting procedures.
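The update rule above can be sketched in a few lines of Python. The bowl f(x, y) = x² + 4y² used here is an illustrative choice, not necessarily the simulator's own objective:

```python
# Gradient descent on an illustrative convex bowl f(x, y) = x^2 + 4y^2,
# whose minimum is at the origin. Its gradient is grad f = (2x, 8y).

def grad_f(x, y):
    return 2.0 * x, 8.0 * y

def gradient_descent(x, y, eta=0.12, steps=100):
    """Repeatedly apply (x, y) <- (x, y) - eta * grad f(x, y)."""
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - eta * gx, y - eta * gy
    return x, y

x_min, y_min = gradient_descent(2.2, 1.6)
print(x_min, y_min)  # both coordinates approach 0
```

With η = 0.12 each coordinate contracts geometrically (by factors 1 − 0.12·2 = 0.76 and 1 − 0.12·8 = 0.04 per step), so a hundred steps lands essentially at the minimum.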

Who it's for: Undergraduate students in calculus, multivariable analysis, or introductory machine learning courses, as well as anyone seeking an intuitive grasp of numerical optimization.

Key terms

  • Gradient Descent
  • Gradient
  • Learning Rate
  • Level Sets
  • Contour Plot
  • Optimization
  • Convex Function
  • Local Minimum

Objective & step

η = 0.12

Each step updates (x, y) ← (x, y) − η∇f(x, y). The contours show level sets of f. An η that is too large causes oscillation or divergence; one that is too small makes progress slow. These are the same trade-offs that arise in machine-learning optimizers, of which this simulator is a continuous two-dimensional analogue.

Measured values

x = 2.2000
y = 1.6000
f(x, y) = 15.08000
|∇f| = 13.5351
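The readout above happens to match an elliptic well of the form f(x, y) = x² + 4y² (an inference from the numbers themselves, not something the page states); a quick check:

```python
import math

# Hypothetical objective consistent with the measured values shown above.
def f(x, y):
    return x**2 + 4.0 * y**2

def grad_norm(x, y):
    gx, gy = 2.0 * x, 8.0 * y  # grad f = (2x, 8y)
    return math.hypot(gx, gy)

print(f(2.2, 1.6))          # ~15.08
print(grad_norm(2.2, 1.6))  # ~13.5351
```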

How it works

Visualize steepest descent on a smooth convex bowl, a bridge between calculus and optimization.

Frequently asked questions

Why do we move against the gradient, not with it?
The gradient vector ∇f points in the direction of the steepest increase of the function. To minimize the function, we want to go downhill, which is the direction of steepest decrease. Therefore, we subtract the gradient, moving in the direction of -∇f.
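This can be confirmed numerically. Using an illustrative bowl f(x, y) = x² + y² (not necessarily the simulator's function), a small step along −∇f lowers f while the same step along +∇f raises it:

```python
def f(x, y):
    return x**2 + y**2

def grad(x, y):
    return 2.0 * x, 2.0 * y

x, y = 1.0, 2.0
gx, gy = grad(x, y)
h = 0.01  # small step size

downhill = f(x - h * gx, y - h * gy)  # step against the gradient
uphill = f(x + h * gx, y + h * gy)    # step with the gradient
print(downhill < f(x, y) < uphill)    # True: -grad f decreases f
```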
What happens if I set the learning rate (η) too high?
An excessively high learning rate causes the algorithm to take steps that are too large. This can lead to overshooting the minimum, resulting in oscillations around it or even causing the path to diverge and move away from the minimum entirely, failing to converge.
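The divergence threshold is easiest to see in one dimension. For the illustrative function f(x) = x², each step maps x to (1 − 2η)x, so the iterates shrink when |1 − 2η| < 1 and grow without bound once η > 1:

```python
def descend_1d(x, eta, steps=20):
    # f(x) = x^2, so f'(x) = 2x and each step is x <- (1 - 2*eta) * x
    for _ in range(steps):
        x = x - eta * 2.0 * x
    return x

print(abs(descend_1d(1.0, eta=0.1)))  # shrinks toward 0
print(abs(descend_1d(1.0, eta=1.1)))  # grows: the path diverges
```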
Is the minimum found always the global (lowest) minimum?
Not necessarily. Gradient descent converges to a local minimum. In this simulator, the functions are convex (shaped like a bowl), so there is only one local minimum, which is also the global minimum. In more complex, non-convex functions, the algorithm could get stuck in a local minimum that is not the lowest point overall.
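A small non-convex example (not one of the simulator's objectives) makes this concrete: f(x) = x⁴ − 3x² + x has two separate minima, and which one gradient descent finds depends entirely on where it starts.

```python
def grad(x):
    # f(x) = x^4 - 3x^2 + x  =>  f'(x) = 4x^3 - 6x + 1
    return 4.0 * x**3 - 6.0 * x + 1.0

def descend(x, eta=0.01, steps=2000):
    for _ in range(steps):
        x = x - eta * grad(x)
    return x

right = descend(2.0)   # settles in the shallower minimum near x ~ 1.13
left = descend(-2.0)   # settles in the deeper minimum near x ~ -1.30
print(right, left)
```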
How is this related to machine learning?
Training a machine learning model often involves minimizing a 'loss function' that measures prediction error. Gradient descent is the core algorithm used to adjust the model's parameters (weights and biases) by following the negative gradient of this loss, thereby reducing error step-by-step.
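As a minimal illustration of that training loop (a toy model, not from the page): fitting a single weight w in the prediction w·x by gradient descent on the mean squared error recovers the slope of the data.

```python
# Toy training loop: fit y = w*x to data generated with true slope 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]  # noiseless targets

def loss_grad(w):
    # Derivative in w of the mean squared error (1/N) * sum (w*x - y)^2
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w, eta = 0.0, 0.05
for _ in range(200):
    w -= eta * loss_grad(w)  # the same update rule, applied to a parameter
print(w)  # approaches the true slope, 2.0
```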