2026-05-05

Gradient descent metaphor for life:

Do your best where you are. Trust that there are multiple solutions with low loss. Trust that life will perturb you out of suboptimal points.

You don't need to know where the minimum is. You only need to know which way is down. After enough steps, if you're lucky and the landscape isn't too cruel, you end up somewhere with low loss.
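
A minimal sketch of the "only need to know which way is down" part. The bowl-shaped `loss`, the step size, and the helper names `local_slope` and `descend` are all made up for illustration. The point is that the loop never looks at where the minimum is; it only probes a tiny neighbourhood of the current point.

```python
def loss(x, y):
    # Hypothetical landscape: a simple bowl whose minimum sits at (3, -2).
    return (x - 3.0) ** 2 + 2.0 * (y + 2.0) ** 2


def local_slope(f, x, y, eps=1e-6):
    # Central finite differences: only looks eps away from the current point.
    dx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    dy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
    return dx, dy


def descend(f, x, y, step=0.1, n_steps=200):
    # Repeatedly take a small step in the locally downhill direction.
    for _ in range(n_steps):
        dx, dy = local_slope(f, x, y)
        x, y = x - step * dx, y - step * dy
    return x, y


final_x, final_y = descend(loss, 0.0, 0.0)
print(final_x, final_y, loss(final_x, final_y))
```

On a landscape this kind, the steps end up very close to the bottom of the bowl. A crueler landscape is where the luck comes in.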

Naive intuition says that gradient descent on a wildly non-convex surface in millions of dimensions should constantly get stuck in bad local minima and saddle points. It mostly doesn't. The current best understanding is something like: in very high dimensions, most critical points are saddles rather than minima; stochastic noise lets you escape saddles; and the local minima you do reach tend to be roughly equivalent in loss, for reasons related to over-parameterization and the geometry of the loss surface.
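
A toy version of the saddle story, as a sketch under assumptions rather than a claim about real networks. The two-dimensional `loss` below is hypothetical: it has a saddle at (0, 0) with loss 1 and two equally good minima at (0, 1) and (0, -1) with loss 0. Starting exactly on the ridge, plain gradient descent slides into the saddle and stays there; adding a little Gaussian noise to the gradient (a stand-in for minibatch noise) knocks it off the ridge and into one of the minima.

```python
import random


def loss(x, y):
    # Hypothetical surface: saddle at (0, 0), minima at (0, 1) and (0, -1).
    return x ** 2 + (y ** 2 - 1.0) ** 2


def grad(x, y):
    # Analytic gradient of the loss above.
    return 2.0 * x, 4.0 * y * (y ** 2 - 1.0)


def run(x, y, noise_scale, step=0.05, n_steps=500, seed=0):
    # Gradient descent with optional Gaussian noise added to each gradient.
    rng = random.Random(seed)
    for _ in range(n_steps):
        gx, gy = grad(x, y)
        gx += rng.gauss(0.0, noise_scale)
        gy += rng.gauss(0.0, noise_scale)
        x, y = x - step * gx, y - step * gy
    return x, y


# Start on the ridge y = 0, where the y-gradient is exactly zero.
plain = run(1.0, 0.0, noise_scale=0.0)
noisy = run(1.0, 0.0, noise_scale=0.01)

print("no noise:  ", plain, loss(*plain))
print("with noise:", noisy, loss(*noisy))
```

Which of the two minima the noisy run lands in depends on the seed, but both have the same loss, which is the "multiple solutions with low loss" part in miniature.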