Here’s the Plan

Now that you know about the softmax, let’s replace the second sigmoid with a softmax and complete the design of our neural network:

The input layer has 785 columns: one per pixel, plus the bias. Its weighted values pass through a sigmoid to produce the hidden layer, which has 201 columns including the bias. The hidden layer's weighted values then pass through a softmax to produce the output layer, which has 10 columns, matching the 10 classes in MNIST. Finally, the matrices of weights have the dimensions that make all the matrix multiplications line up. That’s all we need to know to get started.
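To make those dimensions concrete, here is a minimal sketch of a single pass through the network, written in plain Python so that it runs without any libraries. It is not the book's implementation. The function names, the random stand-in weights, and the choice to put the bias in the first column are all assumptions made for illustration. The weight shapes (785 by 200 and 201 by 10) are derived from the column counts above.

```python
import math
import random

N_PIXELS = 784    # 28 x 28 MNIST image; 785 columns once the bias is added
N_HIDDEN = 200    # 201 columns once the bias is added
N_CLASSES = 10    # one output column per MNIST class


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    # Subtract the maximum before exponentiating, for numerical stability.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prepend_bias(row):
    # Assumption: the bias is a constant 1 placed in the first column.
    return [1.0] + list(row)


def multiply(row, weights):
    """Multiply a row vector by a matrix stored as a list of rows."""
    assert len(row) == len(weights), "row length must match matrix rows"
    n_columns = len(weights[0])
    return [sum(x * w[j] for x, w in zip(row, weights))
            for j in range(n_columns)]


def random_weights(n_rows, n_columns, rng):
    # Hypothetical stand-in for trained weights: small random values.
    return [[rng.uniform(-0.01, 0.01) for _ in range(n_columns)]
            for _ in range(n_rows)]


def forward(pixels, w1, w2):
    # Input (785 columns) -> sigmoid -> hidden (201 columns) -> softmax -> output (10).
    hidden = [sigmoid(z) for z in multiply(prepend_bias(pixels), w1)]
    return softmax(multiply(prepend_bias(hidden), w2))


rng = random.Random(0)
w1 = random_weights(N_PIXELS + 1, N_HIDDEN, rng)     # 785 rows, 200 columns
w2 = random_weights(N_HIDDEN + 1, N_CLASSES, rng)    # 201 rows, 10 columns
pixels = [rng.random() for _ in range(N_PIXELS)]     # stand-in for one image
probabilities = forward(pixels, w1, w2)
print(len(probabilities), sum(probabilities))
```

Each weight matrix has one more row than the layer that feeds it, because the bias column is added just before the multiplication. The softmax at the end turns the 10 outputs into values that sum to 1, so you can read them as one probability per class.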

You must be eager to get down to the code. But first, let’s recap what we learned in this chapter.

There is no standardized notation for neural networks. This book uses its own simple notation, like the one in Here’s the Plan. Images of neural networks from other sources are more likely to use a notation that you might call “blobs and lines.” It comes in different flavors, but it generally looks something like this:

images/designing/network_common_notation.png

The blobs and lines notation does have a few advantages. For example, all those lines give you a good place to jot down the values of individual weights. If you don’t need to do that, however, the lines end up cluttering the diagram and making it harder to sketch.

This book eschews blobs and lines in favor of two other notations. In most of the book, I use a streamlined notation that’s good for sketching neural networks. In Chapter 11, Training the Network, I use a different notation (“computational graphs”) to explain one specific concept. Even though you won’t see the blobs and lines notation in these pages, you should be aware that it exists. It’s useful in some edge cases, and it’s common in the wild.