What You Just Learned

This chapter was all about overfitting. Overfitting happens when the system learns the statistical noise in the training data and fails to generalize to new data. The more powerful a supervised learning system is, the more likely it is to overfit. Deep neural networks are very powerful, so they’re especially prone to overfitting.

You can reduce overfitting by “smoothing out” the neural network’s model, so that it follows the general shape of the data instead of tracking every noisy fluctuation. That idea is called regularization. In this chapter, we looked at a few regularization techniques; a quick sketch of two common ones follows.
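
As one illustration (a minimal sketch, not necessarily the chapter’s exact code), here is how two widely used regularization techniques, L2 weight decay and dropout, could be added to a Keras network. The layer sizes and hyperparameters are placeholders:

    # Hedged sketch, assuming TensorFlow/Keras; the layer sizes and
    # hyperparameters below are placeholders, not the chapter's values.
    from tensorflow import keras
    from tensorflow.keras import layers, regularizers

    model = keras.Sequential([
        keras.Input(shape=(784,)),
        # L2 regularization penalizes large weights, which "smooths" the model.
        layers.Dense(100, activation="relu",
                     kernel_regularizer=regularizers.l2(0.001)),
        # Dropout randomly silences units during training, so the network
        # can't lean on any single noise-tracking pathway.
        layers.Dropout(0.3),
        layers.Dense(10, activation="softmax"),
    ])

    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])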

To understand overfitting, you should also know about the opposite problem: underfitting. A network underfits when its model is too simplistic for the data at hand. You recognize overfitting because the system’s accuracy is better on the training set than on the validation set; you recognize underfitting because the system’s accuracy is low on both sets, as in the diagnostic sketch below.
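
As a minimal sketch of that diagnosis, assuming a compiled Keras model and hypothetical names (model, x_train, y_train, x_val, y_val) standing in for your own network and data:

    # Hedged sketch: model, x_train, y_train, x_val, y_val are hypothetical
    # placeholders for your own trained network and datasets.
    _, train_acc = model.evaluate(x_train, y_train, verbose=0)
    _, val_acc = model.evaluate(x_val, y_val, verbose=0)

    if train_acc - val_acc > 0.05:   # gap threshold chosen for illustration
        print(f"Likely overfitting: train {train_acc:.2%} vs. validation {val_acc:.2%}")
    elif train_acc < 0.8:            # what counts as "low" depends on the problem
        print(f"Likely underfitting: accuracy is low on both sets "
              f"({train_acc:.2%}, {val_acc:.2%})")
    else:
        print(f"Looks reasonable: train {train_acc:.2%}, validation {val_acc:.2%}")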

Overfitting and underfitting aren’t mutually exclusive; a network can do a bit of both. However, as you keep reducing one, you’ll eventually increase the other: underfitting comes from too much bias (a model too simple to capture the data), while overfitting comes from too much variance (a model so flexible that it tracks the noise). That catch-22 is known as the bias-variance trade-off. A machine learning expert will aim for the sweet spot between overfitting and underfitting, as in the sweep sketched below.
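
As a rough illustration of hunting for that sweet spot, here is a hedged sketch that sweeps an L2 regularization strength and compares training and validation accuracy at each setting. The build_model factory and the data names are hypothetical placeholders, not the chapter’s code:

    # Hedged sketch: build_model(), x_train, y_train, x_val, y_val are
    # hypothetical placeholders; build_model(l2) is assumed to return a
    # compiled Keras model whose Dense layers use l2 as their L2 penalty.
    for l2 in [0.0, 0.0001, 0.001, 0.01, 0.1]:
        model = build_model(l2)
        history = model.fit(x_train, y_train,
                            validation_data=(x_val, y_val),
                            epochs=10, verbose=0)
        train_acc = history.history["accuracy"][-1]
        val_acc = history.history["val_accuracy"][-1]
        # Too little regularization: big train/validation gap (high variance).
        # Too much: both accuracies sag (high bias). Aim for the middle ground.
        print(f"l2={l2}: train {train_acc:.2%}, validation {val_acc:.2%}")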

And that’s it! We confronted overfitting head-on and came out with a nice collection of regularization methods, plus a 2% improvement in accuracy. That, however, is only the beginning. Deep neural networks can take us much farther than that, as long as we get familiar with more, and more varied, techniques. In the next chapter, you’ll learn enough ML tricks to fill up your bag.