Now that you have some Keras under your belt, here is an exercise for you: take the MNIST network we built in Part II of this book and rewrite it from scratch using Keras.
Keras already comes with a few common datasets, MNIST included—so you don’t have to use our mnist.py library. Instead, you can load MNIST and one-hot encode its labels with this piece of code:
| from keras.datasets import mnist |
| from keras.utils import to_categorical |
| |
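| # Load the raw MNIST images and labels (downloaded on first run) |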
| (X_train_raw, Y_train_raw), (X_test_raw, Y_test_raw) = mnist.load_data() |
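| # Flatten each 28-by-28 image to a row of 784 pixels, scaled to [0, 1] |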
| X_train = X_train_raw.reshape(X_train_raw.shape[0], -1) / 255 |
| X_test = X_test_raw.reshape(X_test_raw.shape[0], -1) / 255 |
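| # One-hot encode the labels |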
| Y_train = to_categorical(Y_train_raw) |
| Y_test = to_categorical(Y_test_raw) |
The first time you run this code, Keras will download MNIST all by itself.
Note that MNIST doesn’t have a validation set out of the box. You can either bend the rules and use the test set to validate the network, or split the test set into a validation set and a smaller test set, like we did before.
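If you take the second route, the split can be a couple of array slices. Here’s a minimal sketch (the half-and-half split point and the variable names are just one possible choice):
| # Carve the first 5,000 test examples out as a validation set |
| X_validation, X_test = X_test[:5000], X_test[5000:] |
| Y_validation, Y_test = Y_test[:5000], Y_test[5000:] |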
Also note that this code doesn’t standardize the inputs as carefully as our earlier implementations did: it just divides all the input pixels by 255 so they’ll range from 0 to 1. Feel free to use the more sophisticated standardization method we used in Standardization in Practice. When I tried that method, I found it didn’t make a difference for my network, so I chose not to use it.
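In case you want to give that method a try, here’s a sketch of mean/std standardization in place of the division by 255 above. I’m assuming the usual recipe here: compute the statistics on the training set only, then apply them to both sets:
| # Standardize with the training set's statistics (instead of dividing by 255) |
| X_train = X_train_raw.reshape(X_train_raw.shape[0], -1) |
| X_test = X_test_raw.reshape(X_test_raw.shape[0], -1) |
| average = X_train.mean() |
| standard_deviation = X_train.std() |
| X_train = (X_train - average) / standard_deviation |
| X_test = (X_test - average) / standard_deviation |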
I showed you the code to load the data, but writing the network is up to you. When it comes time to call model.compile(), you can decide which optimization algorithm to use: either SGD (that is, standard gradient descent) or the more advanced RMSProp that we used earlier in this chapter. If you wish, try both and compare the training speed and the final accuracy on the training and test sets.
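Just to give you the shape of it, here’s what the compile step might look like. This is a sketch, not the solution: it assumes you’ve already built a Keras model named model, and it sticks with each optimizer’s default learning rate as a starting point:
| # Compile with plain SGD... |
| model.compile(loss='categorical_crossentropy', |
|               optimizer='sgd', |
|               metrics=['accuracy']) |
| |
| # ...or swap in RMSProp and compare: |
| # model.compile(loss='categorical_crossentropy', |
| #               optimizer='rmsprop', |
| #               metrics=['accuracy']) |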
Don’t expect that the Keras MNIST network will work well with the same hyperparameters we found in Chapter 15, Let’s Do Development. You have a different implementation, so you’ll probably need to find new values for those hyperparameters. Aim for 99% accuracy or better. If you want to take a peek, you’ll find my solution in the 16_deeper/solution directory, as usual.
One last thing: you should notice that the Keras-based neural network is faster than the one we wrote from scratch. That’s no surprise: after all, Keras has been carefully optimized, and our code hasn’t. However, most of the CPU time during the training of a neural network is spent multiplying matrices, and our earlier code delegated that operation to the highly optimized NumPy. Bottom line: you should expect the Keras version of the neural network to be faster than the old bespoke version, but not crazy faster.
Happy coding!