Hands On: Basecamp Overshooting

Let’s talk about the learning rate again. In the final example of this chapter, I used a learning rate of 0.001. Try increasing the learning rate, and you might notice that at some point the loss starts to increase instead of decreasing. Can you imagine why?

If you can’t picture the answer in your mind, try drawing the loss function on paper. What happens to each step when the learning rate is very large? As usual, check the contents of the 03_gradient/solution directory if you’re stuck.
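If you would rather experiment before you draw, here is a minimal sketch that runs gradient descent with a few different learning rates. It is not the chapter's code: the one-parameter parabola `loss()`, its `gradient()`, the `train()` helper, and the learning rates in the loop are all assumptions made up for illustration. Only the weight update mirrors the idea from the chapter.

```python
def loss(w):
    # Hypothetical toy loss: a parabola with its minimum at w = 3.
    return (w - 3.0) ** 2


def gradient(w):
    # Derivative of the toy loss with respect to w.
    return 2.0 * (w - 3.0)


def train(lr, iterations=20, w=0.0):
    # Plain gradient descent; records the loss before and after every step.
    history = [loss(w)]
    for _ in range(iterations):
        w -= gradient(w) * lr
        history.append(loss(w))
    return history


# Compare the loss at the start and at the end for each learning rate.
for lr in (0.001, 0.1, 0.9, 1.1):
    history = train(lr)
    print(f"lr={lr}: first loss={history[0]:.4f}, last loss={history[-1]:.4f}")
```

Run it and compare the first and last loss for each learning rate. Then print a few consecutive values of `w` for the largest learning rate, and see where each step lands relative to the minimum.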
