
Decision Tree Classifier (2D toy)

A binary classification tree built in the plane treats each feature, x and y, as a coordinate axis. This simulator implements the textbook greedy CART recipe used in scikit-learn’s `DecisionTreeClassifier` for axis-aligned splits: at every node, enumerate every threshold between consecutive sorted values along x and along y that separates the current training subset into nonempty left and right child sets while respecting min_samples_leaf. Each candidate split is scored by the weighted impurity of its children, using either Gini impurity (the probability of misclassifying a point whose label is drawn at random from the node’s class distribution) or binary entropy in bits. The split with the largest information gain (parent impurity minus the weighted child impurity) is chosen, and the process recurses until leaves are pure, the depth reaches max_depth, no positive-gain split exists, or too few points remain to split.

Leaves predict their majority class. The canvas shades each leaf’s rectangle and overlays dashed split lines, so the recursive partition of ℝ² stays visible. Training error is the in-sample misclassification rate (a white ring on a point means it is misclassified by its own leaf).
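
For readers who want the loop structure spelled out, here is a minimal Python sketch of that greedy recipe. It assumes midpoint thresholds between neighboring distinct coordinates and implements the tie-break stated later on this page (larger gain, then x before y, then smaller threshold); it illustrates the textbook algorithm, not the simulator’s actual source.

```python
import math

def impurity(labels, criterion="gini"):
    """Gini impurity or binary entropy (in bits) of a 0/1 label list."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n                                 # fraction of class 1
    if criterion == "gini":
        return 1.0 - p * p - (1.0 - p) * (1.0 - p)
    if p in (0.0, 1.0):                                 # pure node
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def best_split(points, labels, min_samples_leaf, criterion):
    """Return (gain, axis, threshold) for the best candidate, or None.
    Tie-break: larger gain, then x before y, then smaller threshold."""
    parent, n, best = impurity(labels, criterion), len(labels), None
    for axis in (0, 1):                                 # x first: tie-break order
        coords = sorted({p[axis] for p in points})
        for a, b in zip(coords, coords[1:]):
            t = 0.5 * (a + b)                           # midpoint threshold
            left = [y for p, y in zip(points, labels) if p[axis] <= t]
            right = [y for p, y in zip(points, labels) if p[axis] > t]
            if min(len(left), len(right)) < min_samples_leaf:
                continue
            child = (len(left) * impurity(left, criterion)
                     + len(right) * impurity(right, criterion)) / n
            if best is None or parent - child > best[0] + 1e-12:
                best = (parent - child, axis, t)        # strict ">" keeps first tie
    return best

def grow(points, labels, depth=0, max_depth=4, min_samples_leaf=1,
         criterion="gini"):
    """Recurse until pure, depth-capped, or no positive-gain split exists."""
    majority = int(2 * sum(labels) >= len(labels))
    if depth >= max_depth or len(set(labels)) == 1:
        return {"leaf": majority}
    found = best_split(points, labels, min_samples_leaf, criterion)
    if found is None or found[0] <= 0.0:
        return {"leaf": majority}
    _, axis, t = found
    lo_p = [p for p in points if p[axis] <= t]
    lo_y = [y for p, y in zip(points, labels) if p[axis] <= t]
    hi_p = [p for p in points if p[axis] > t]
    hi_y = [y for p, y in zip(points, labels) if p[axis] > t]
    return {"axis": axis, "t": t,
            "lo": grow(lo_p, lo_y, depth + 1, max_depth, min_samples_leaf, criterion),
            "hi": grow(hi_p, hi_y, depth + 1, max_depth, min_samples_leaf, criterion)}

def predict(tree, p):
    """Descend to a leaf and return its majority class."""
    while "leaf" not in tree:
        tree = tree["lo"] if p[tree["axis"]] <= tree["t"] else tree["hi"]
    return tree["leaf"]
```

Evaluating `predict` over a grid of (x, y) samples reproduces the shaded-rectangle picture: every grid cell inherits the majority class of its leaf.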

Who it's for: Introductory machine-learning students comparing impurity criteria, depth limits, and the geometry of axis-aligned decision boundaries before moving to random forests or gradient boosting.

Key terms

  • CART
  • Gini impurity
  • Entropy
  • Information gain
  • Axis-aligned split
  • Min samples per leaf
  • Max depth
  • Majority vote

Tree learning

Interactive controls: tree-growth settings and a selector for the class of the next added point.

Splits: x ≤ t or y ≤ t. Tie-break: larger gain, then x before y, then smaller threshold.
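
This ordering maps directly onto a lexicographic sort key. A toy illustration with invented (gain, axis, threshold) candidates:

```python
# Candidates as (gain, axis, threshold); axis 0 = x, 1 = y. Values invented.
candidates = [(0.18, 1, 0.5), (0.18, 0, 0.5), (0.18, 0, 0.3)]
best = min(candidates, key=lambda c: (-c[0], c[1], c[2]))
print(best)  # (0.18, 0, 0.3): gain first, then x before y, then smaller t
```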

Shortcuts

  • Click — add labeled point (class buttons)
  • Drag — move point
  • Shift+click — delete nearest point
  • R — reload demo (same preset, new seed)

Measured values

  • Points: 88
  • Internal splits: 9
  • Leaves: 10
  • Train misclassification rate: 1.1%

How it works

Train a binary axis-aligned CART tree on a toy 2D point cloud. At each node the learner searches every vertical and horizontal threshold between sorted coordinates, scoring candidates by impurity drop (parent impurity minus the weighted child impurity) under either Gini or binary entropy. The chosen split recursively partitions the plane into rectangles, and leaves vote their majority class. Drag points, or change max depth and min samples per leaf, to contrast overfitting with underfitting, and to see how entropy sometimes favors purer children a split or two earlier than Gini on certain layouts.
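
As a concrete check of the gain formula, consider a hypothetical node invented for illustration: ten points split 50/50, and a candidate cut that yields 4-vs-1 children on each side.

```python
def gini(ones, n):
    """Gini impurity of a node with `ones` class-1 points out of n."""
    p = ones / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

parent = gini(5, 10)                              # 0.5: a 50/50 mix
child = (5 * gini(4, 5) + 5 * gini(1, 5)) / 10    # weighted children: 0.32
print(round(parent - child, 3))                   # information gain: 0.18
```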

Frequently asked questions

Why can’t the tree learn a diagonal boundary with one split?
Each split is parallel to an axis (either x ≤ t or y ≤ t), so a single oblique boundary like x + y = 0 can only be approximated by a staircase of many axis-aligned cuts. Capturing it in one stroke takes a different model class (a linear SVM, oblique trees).
When do Gini and entropy pick different first splits?
Both reward purer children, but the curvature differs: entropy penalizes 50/50 mixes more sharply, so it sometimes prefers a slightly different threshold that buys more purity early. On many toy clouds the first split matches; on borderline ties our tie-break prefers x over y, then a smaller threshold.
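
To see the shape difference numerically, compare the two impurity functions on a few binary mixes (a quick console check; no simulator data involved):

```python
import math

for p in (0.5, 0.4, 0.3, 0.2, 0.1):                     # fraction of class 1
    g = 2 * p * (1 - p)                                  # binary Gini, peak 0.5
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # entropy in bits
    print(f"p={p:.1f}  gini={g:.3f}  entropy={h:.3f}")
# p=0.5  gini=0.500  entropy=1.000
# p=0.4  gini=0.480  entropy=0.971  <- entropy stays closer to its peak
```

Relative to its maximum, entropy drops off more slowly near p = 0.5 (0.971 of peak versus 0.96 for Gini at p = 0.4), which is the curvature difference that can flip a close call between two candidate thresholds.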
Does this match sklearn’s `DecisionTreeClassifier` exactly?
The split search and impurity formulas mirror the teaching description, but sklearn adds its own tie-breaking details, optional `class_weight` and `max_features`, and floating-point handling. Expect near-identical trees on the same data with matching hyperparameters, not bit-identical dumps.
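
For readers who want to cross-check against scikit-learn itself, a minimal sketch (the toy points are invented; `criterion`, `max_depth`, and `min_samples_leaf` are the real sklearn parameter names):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[0.1, 0.2], [0.3, 0.8], [0.2, 0.6],   # class 0 cloud (made up)
     [0.9, 0.4], [0.7, 0.9], [0.8, 0.1]]   # class 1 cloud (made up)
y = [0, 0, 0, 1, 1, 1]

clf = DecisionTreeClassifier(criterion="gini", max_depth=4,
                             min_samples_leaf=1, random_state=0)
clf.fit(X, y)
print(export_text(clf, feature_names=["x", "y"]))  # dump the learned splits
```

On this data the first split should land near x ≤ 0.5, the same midpoint threshold the greedy search described above would pick.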