7 Comments
Apr 13, 2023 · Liked by Tivadar Danka

Well explained 💯🔥


I like the differential equations view. People tend to add a disclaimer that it's the "small step size limit", but it turns out this approximation works quite well even for large step sizes in high dimensions. The stable step size is bounded by the largest eigenvalue, but if most of the mass comes from the remaining dimensions, that "large" step size is effectively quite small. I did some visualizations on this a while back: https://machine-learning-etc.ghost.io/gradient-descent-linear-update/
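
Here's a minimal sketch of the idea (not from the linked post; the quadratic loss, spectrum, and step-size choice below are illustrative assumptions): gradient descent on f(x) = ½xᵀAx is the explicit Euler discretization of the gradient flow dx/dt = −Ax, and with one large eigenvalue plus many small ones, a step near the stability bound 2/λmax still tracks the continuous flow closely.

```python
# Illustrative sketch: gradient descent vs. exact gradient flow on a quadratic
# loss f(x) = 0.5 * x^T A x, where the gradient is A x (A symmetric).
# The stable step size is bounded by 2 / lambda_max; here most of the
# spectral mass sits far below lambda_max, so even a step near that bound
# stays close to the continuous trajectory x(t) = exp(-A t) x0.
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# One large eigenvalue, the rest of the mass in many small ones.
eigvals = np.concatenate(([10.0], rng.uniform(0.01, 0.5, dim - 1)))
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal basis
A = Q @ np.diag(eigvals) @ Q.T

x0 = rng.normal(size=dim)
eta = 1.9 / eigvals.max()  # "large" step, just inside the stability bound
steps = 200

# Gradient descent: x_{k+1} = x_k - eta * A x_k
x = x0.copy()
gd_traj = [x.copy()]
for _ in range(steps):
    x = x - eta * (A @ x)
    gd_traj.append(x.copy())

# Exact gradient flow at matching times: x(t) = Q exp(-diag(eigvals) t) Q^T x0
flow_err = []
for k, xk in enumerate(gd_traj):
    t = k * eta
    x_flow = Q @ (np.exp(-eigvals * t) * (Q.T @ x0))
    flow_err.append(np.linalg.norm(xk - x_flow))

print(f"step size eta = {eta:.3f} (stability bound 2/lambda_max = {2 / eigvals.max():.3f})")
print(f"final ||x_gd - x_flow|| = {flow_err[-1]:.2e}")
```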

Apr 6, 2023 · Liked by Tivadar Danka

I think there's a typo in the derivative definition: isn't the limit supposed to go to zero?
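
For reference, the standard definition the comment is presumably pointing at, with the limit taken as h goes to zero:

```latex
f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
```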


I didn't get how you arrived at the equations in the `monotonicity describes long-term behavior` section.
