7 Comments
Apr 13, 2023 · Liked by Tivadar Danka

Well explained 💯🔥


I like the differential equations view. People tend to add a disclaimer that it's the "small step size limit", but it turns out this approximation works quite well even for large step sizes in high dimensions. The stable step size is bounded by the largest eigenvalue, but if most of the mass comes from the remaining dimensions, that "large" step size is effectively quite small. I did some visualizations on this a while back: https://machine-learning-etc.ghost.io/gradient-descent-linear-update/
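
Here's a minimal sketch of the idea (not from the linked post; the quadratic loss, spectrum, and step-size choice below are illustrative assumptions): gradient descent on f(x) = ½xᵀAx is the explicit Euler discretization of the gradient flow dx/dt = −Ax, and with one large eigenvalue plus many small ones, a step near the stability bound 2/λmax still tracks the continuous flow closely.

```python
# Illustrative sketch: gradient descent vs. exact gradient flow on a quadratic
# loss f(x) = 0.5 * x^T A x, where the gradient is A x (A symmetric).
# The stable step size is bounded by 2 / lambda_max; here most of the
# spectral mass sits far below lambda_max, so even a step near that bound
# stays close to the continuous trajectory x(t) = exp(-A t) x0.
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# One large eigenvalue, the rest of the mass in many small ones.
eigvals = np.concatenate(([10.0], rng.uniform(0.01, 0.5, dim - 1)))
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal basis
A = Q @ np.diag(eigvals) @ Q.T

x0 = rng.normal(size=dim)
eta = 1.9 / eigvals.max()  # "large" step, just inside the stability bound
steps = 200

# Gradient descent: x_{k+1} = x_k - eta * A x_k
x = x0.copy()
gd_traj = [x.copy()]
for _ in range(steps):
    x = x - eta * (A @ x)
    gd_traj.append(x.copy())

# Exact gradient flow at matching times: x(t) = Q exp(-diag(eigvals) t) Q^T x0
flow_err = []
for k, xk in enumerate(gd_traj):
    t = k * eta
    x_flow = Q @ (np.exp(-eigvals * t) * (Q.T @ x0))
    flow_err.append(np.linalg.norm(xk - x_flow))

print(f"step size eta = {eta:.3f} (stability bound 2/lambda_max = {2 / eigvals.max():.3f})")
print(f"final ||x_gd - x_flow|| = {flow_err[-1]:.2e}")
```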

Apr 6, 2023 · Liked by Tivadar Danka

I think there's a typo in the derivative definition: isn't the limit supposed to go to zero?
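
For reference, the standard definition the comment is presumably pointing at, with the limit taken as h goes to zero:

```latex
f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
```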


I didn't get how you arrived at the equations in the `monotonicity describes long-term behavior` section.
