At the beginning of Part II, we set a target for ourselves: 99% accuracy on MNIST. We just got tantalizingly close, reaching 98.6%. That last 0.4%? That's up to you.
I told you that configuring a network can be more art than science, and this Hands On is proof of that. Here's my suggestion: to find better hyperparameters than the ones we have now, drop compare.py. Instead, use neural_network_quieter.py, an alternative version of the network that logs accuracy only once every 10 epochs, so it runs much faster. Here are a few things that you can try:
The standardized version of MNIST, the one with inputs rescaled to zero mean and unit variance, generally works better. There's no reason not to use it.
Sometimes, it pays off to be patient and wait for more epochs before giving up. However, if 20 or 30 epochs go by without the accuracy increasing at all, it's probably time to try another configuration. (The first sketch after this list shows one way to automate that check.)
More hidden nodes slow down the training, but they make the network smarter when dealing with gnarly data. Don’t be afraid to push that number higher.
The smaller the learning rate, the slower the training, but those small steps help the network get closer to the minimum loss. Try a slightly smaller value of lr than the one we settled on.
We talked a lot about the pros and cons of small batches in the past. All that being said, in this specific case I had more luck with very large batch sizes. (The second sketch after this list shows how these tweaks might fit together.)
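If you want to automate the patience check from the second tip, here is a minimal sketch. The train_one_epoch and accuracy callables are hypothetical stand-ins for whatever your version of the network exposes; the bookkeeping around them is the point.

```python
# A minimal patience-based stopping loop. The two callables are
# hypothetical placeholders: pass in functions that wrap your own
# network's training step and accuracy measurement.
def train_with_patience(train_one_epoch, accuracy, patience=30, max_epochs=10_000):
    best, stagnant = 0.0, 0
    for epoch in range(max_epochs):
        train_one_epoch()                   # one pass over the training set
        current = accuracy()                # accuracy on the test set
        if current > best:
            best, stagnant = current, 0     # improvement: reset the counter
        else:
            stagnant += 1                   # another epoch without progress
        if stagnant >= patience:
            print("Stopping at epoch %d: no improvement in %d epochs "
                  "(best accuracy: %.4f)" % (epoch, patience, best))
            break
    return best
```

With a wrapper like this, giving each configuration a fair number of epochs before abandoning it becomes a single function call.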
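And here is the rough shape of a run that puts the other tips together. Treat it as a sketch, not the solution: the module names and the train() signature below are assumptions modeled on the code we've been using, and the numbers are illustrative starting points, so adapt everything to your own files and to the values we settled on.

```python
# A hedged sketch of one configuration worth trying. The module names
# and the train() signature are assumptions; the numbers are examples,
# not the answer.
import mnist_standardized as data        # the standardized MNIST (first tip)
import neural_network_quieter as nn      # logs accuracy once every 10 epochs

nn.train(data.X_train, data.Y_train,
         data.X_test, data.Y_test,
         n_hidden_nodes=1000,            # push the hidden nodes higher (third tip)
         epochs=100,                     # leave room to be patient (second tip)
         batch_size=600,                 # very large batches (last tip)
         lr=0.8)                         # example value; try slightly smaller than yours
```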
Hitting that 99% with our neural network is entirely possible. Happy hunting! Check out the 15_development/solution directory if you want to compare your hyperparameters to the ones that I found.