Gradient descent is a fundamental optimization algorithm used to find the minimum of a function. This simulator visualizes the process in two dimensions, where the function f(x,y) represents a surface, such as a bowl or an elliptic well. The level sets, or contour lines, of f(x,y) are curves of constant function value, analogous to elevation lines on a topographic map.

The algorithm iteratively updates the current point (x,y) by moving a small step in the direction opposite the function's gradient, following the update rule: (x,y) ← (x,y) − η∇f(x,y). Here, ∇f(x,y) is the gradient vector, which points in the direction of steepest ascent, and η (eta) is the learning rate, a positive scalar controlling the step size. By repeatedly stepping against the gradient, the path descends toward a local minimum.

The visualization demonstrates how the choice of learning rate and starting position affects convergence. A rate that is too small leads to slow progress, while one that is too large can cause overshooting and oscillation, or even divergence. The model simplifies real-world optimization by using smooth, convex functions with a single global minimum, avoiding complexities like saddle points, noise, or high-dimensional parameter spaces. Interacting with this simulation helps learners build intuition for the core mechanics of gradient-based optimization, a principle underpinning machine learning training, engineering design, and various scientific fitting procedures.
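The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not the simulator's actual code; the example function f(x,y) = x² + 2y², the starting point, and the learning rate are arbitrary choices for demonstration:

```python
def grad_f(x, y):
    # Gradient of f(x, y) = x**2 + 2*y**2, an elliptic bowl
    # whose single minimum sits at the origin.
    return 2 * x, 4 * y

def gradient_descent(x, y, eta=0.1, steps=100):
    # Repeatedly apply (x, y) <- (x, y) - eta * grad f(x, y).
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - eta * gx, y - eta * gy
    return x, y

x_min, y_min = gradient_descent(3.0, -2.0)
print(x_min, y_min)  # both very close to 0, the global minimum
```

Note that the two coordinates shrink at different rates (the y-direction is steeper), which is exactly why the descent path on an elliptic well curves rather than heading straight for the center.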
Who it's for: Undergraduate students in calculus, multivariable analysis, or introductory machine learning courses, as well as anyone seeking an intuitive grasp of numerical optimization.
Key terms
Gradient Descent
Gradient
Learning Rate
Level Sets
Contour Plot
Optimization
Convex Function
Local Minimum
How it works
Visualize steepest descent on a smooth convex bowl: a bridge between calculus and optimization.
Frequently asked questions
Why do we move against the gradient, not with it?
The gradient vector ∇f points in the direction of the steepest increase of the function. To minimize the function, we want to go downhill, which is the direction of steepest decrease. Therefore, we subtract the gradient, moving in the direction of -∇f.
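A quick numerical check makes this concrete. Taking f(x,y) = x² + y² as an illustrative bowl (an assumption; the simulator may use other functions), stepping with the gradient increases f while stepping against it decreases f:

```python
def f(x, y):
    return x**2 + y**2  # a simple convex bowl

def grad_f(x, y):
    return 2 * x, 2 * y

x, y = 1.0, 2.0
gx, gy = grad_f(x, y)  # (2.0, 4.0), pointing uphill
step = 0.01
uphill = f(x + step * gx, y + step * gy)    # move with the gradient
downhill = f(x - step * gx, y - step * gy)  # move against it
print(downhill, f(x, y), uphill)  # downhill < f(x, y) < uphill
```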
What happens if I set the learning rate (η) too high?
An excessively high learning rate causes the algorithm to take steps that are too large. This can lead to overshooting the minimum, resulting in oscillations around it or even causing the path to diverge and move away from the minimum entirely, failing to converge.
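The divergence case can be seen analytically on the one-dimensional parabola f(x) = x² (a simplified stand-in for the simulator's 2D surfaces): each step multiplies x by (1 − 2η), so whenever |1 − 2η| > 1 the iterates grow instead of shrinking. A small sketch:

```python
def minimize_parabola(eta, x0=1.0, steps=20):
    # Gradient descent on f(x) = x**2, whose gradient is 2*x.
    # Each update is x <- x - eta * 2 * x = (1 - 2*eta) * x,
    # so the iterates diverge whenever |1 - 2*eta| > 1 (i.e. eta > 1).
    x = x0
    for _ in range(steps):
        x -= eta * 2 * x
    return x

print(minimize_parabola(0.1))  # converges toward 0
print(minimize_parabola(1.1))  # overshoots and grows: divergence
```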
Is the minimum found always the global (lowest) minimum?
Not necessarily. Gradient descent converges to a local minimum. In this simulator, the functions are convex (shaped like a bowl), so there is only one local minimum, which is also the global minimum. In more complex, non-convex functions, the algorithm could get stuck in a local minimum that is not the lowest point overall.
How is this related to machine learning?
Training a machine learning model often involves minimizing a 'loss function' that measures prediction error. Gradient descent is the core algorithm used to adjust the model's parameters (weights and biases) by following the negative gradient of this loss, thereby reducing error step-by-step.
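The same loop applies to a toy learning problem. As a hedged sketch (the data, model y = w·x, and learning rate here are invented for illustration), fitting a single weight by descending the mean squared error loss looks like this:

```python
# Fit y = w * x to toy data by gradient descent on the mean squared
# error loss L(w) = mean((w*x_i - y_i)**2), whose derivative is
# dL/dw = mean(2 * x_i * (w*x_i - y_i)).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated with true weight w = 2

w, eta = 0.0, 0.05
for _ in range(200):
    grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
    w -= eta * grad  # step against the gradient of the loss
print(w)  # close to 2.0
```

Real models have millions of parameters rather than one, but the update is structurally identical: compute the gradient of the loss with respect to each parameter, then step against it.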