Deep neural networks can be wild, untameable beasts. In this chapter, you learned a few useful techniques to turn them into cute, docile puppies.
We started with a lengthy discussion of activation functions. You learned that nonlinear activation functions are a necessity in neural networks, but that we must choose them carefully. Up to this point we had used sigmoids inside the network, but sigmoids can cause a number of problems as the network gets deeper: saddening dead neurons, perplexing vanishing gradients, and shocking exploding gradients. For that reason, we looked at a few alternatives to the sigmoid, in particular the popular ReLU activation function.
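To make that contrast concrete, here is a minimal NumPy sketch, with function names of my own choosing rather than anything from the chapter. It compares the gradients of the sigmoid and the ReLU at a few sample inputs: the sigmoid's gradient shrinks toward zero for large positive or negative inputs, which is where vanishing gradients come from, while the ReLU's gradient stays at 1 for any positive input.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_gradient(z):
    s = sigmoid(z)
    return s * (1 - s)

def relu(z):
    return np.maximum(0, z)

def relu_gradient(z):
    return (z > 0).astype(float)

z = np.array([-10.0, -1.0, 0.5, 1.0, 10.0])
print(sigmoid_gradient(z))  # near zero at the extremes: gradients vanish
print(relu_gradient(z))     # 1.0 for positive inputs, 0.0 for negative ones
```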
After that discussion of activation functions, we took a whirlwind tour through a number of other techniques that help us tame deep neural networks:
Xavier initialization to set a neural network’s initial weights.
A handful of advanced GD algorithms: learning rate decay, momentum, RMSprop, and Adam.
Dropout, a brilliantly counterintuitive regularization technique.
The extremely useful technique of batch normalization.
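To see where each of these pieces plugs in, here is a minimal sketch of a model definition. It assumes Keras (bundled with TensorFlow) as the library, and the layer sizes, input shape, and hyperparameters are made up for illustration; treat it as a sketch of how the techniques combine, not as a recipe from this chapter.

```python
import tensorflow as tf  # assumes TensorFlow 2.x with the bundled Keras API

# Learning rate decay: start at 0.001 and shrink the rate as training proceeds.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.001, decay_steps=1000, decay_rate=0.9)

model = tf.keras.Sequential([
    # Xavier ("Glorot") initialization for this layer's weights.
    tf.keras.layers.Dense(100, activation='relu',
                          kernel_initializer='glorot_uniform',
                          input_shape=(784,)),  # hypothetical input size
    # Batch normalization: standardize the previous layer's outputs
    # over each mini-batch.
    tf.keras.layers.BatchNormalization(),
    # Dropout: randomly silence 30% of these outputs during training.
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Adam combines momentum with RMSprop-style adaptive learning rates.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.summary()
```

In Keras, `glorot_uniform` is the library’s name for Xavier initialization, and it happens to be the default for `Dense` layers; it is spelled out here only to make the connection explicit.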
You’re probably itching to try out these techniques yourself. If so, the next hands-on exercise will scratch that itch. Otherwise, prepare for the next chapter: it’s going to take everything you’ve learned so far about neural networks and turn it on its head.