Here is the complete code of our neural network:
| import numpy as np |
| |
| |
| def sigmoid(z): |
|     return 1 / (1 + np.exp(-z)) |
| |
| |
| def softmax(logits): |
|     exponentials = np.exp(logits) |
|     return exponentials / np.sum(exponentials, axis=1).reshape(-1, 1) |
| |
| |
| def sigmoid_gradient(sigmoid): |
|     return np.multiply(sigmoid, (1 - sigmoid)) |
| |
| |
| def loss(Y, y_hat): |
|     return -np.sum(Y * np.log(y_hat)) / Y.shape[0] |
| |
| |
| def prepend_bias(X): |
|     return np.insert(X, 0, 1, axis=1) |
| |
| |
| def forward(X, w1, w2): |
|     h = sigmoid(np.matmul(prepend_bias(X), w1)) |
|     y_hat = softmax(np.matmul(prepend_bias(h), w2)) |
|     return (y_hat, h) |
| |
| |
| def back(X, Y, y_hat, w2, h): |
|     w2_gradient = np.matmul(prepend_bias(h).T, (y_hat - Y)) / X.shape[0] |
|     w1_gradient = np.matmul(prepend_bias(X).T, np.matmul(y_hat - Y, w2[1:].T) |
|                             * sigmoid_gradient(h)) / X.shape[0] |
|     return (w1_gradient, w2_gradient) |
| |
| |
| def classify(X, w1, w2): |
|     y_hat, _ = forward(X, w1, w2) |
|     labels = np.argmax(y_hat, axis=1) |
|     return labels.reshape(-1, 1) |
| |
| |
| def initialize_weights(n_input_variables, n_hidden_nodes, n_classes): |
|     w1_rows = n_input_variables + 1 |
|     w1 = np.random.randn(w1_rows, n_hidden_nodes) * np.sqrt(1 / w1_rows) |
| |
|     w2_rows = n_hidden_nodes + 1 |
|     w2 = np.random.randn(w2_rows, n_classes) * np.sqrt(1 / w2_rows) |
| |
|     return (w1, w2) |
| |
| |
| def report(iteration, X_train, Y_train, X_test, Y_test, w1, w2): |
|     y_hat, _ = forward(X_train, w1, w2) |
|     training_loss = loss(Y_train, y_hat) |
|     classifications = classify(X_test, w1, w2) |
|     accuracy = np.average(classifications == Y_test) * 100.0 |
|     print("Iteration: %5d, Loss: %.8f, Accuracy: %.2f%%" % |
|           (iteration, training_loss, accuracy)) |
| |
| |
| def train(X_train, Y_train, X_test, Y_test, n_hidden_nodes, iterations, lr): |
|     n_input_variables = X_train.shape[1] |
|     n_classes = Y_train.shape[1] |
|     w1, w2 = initialize_weights(n_input_variables, n_hidden_nodes, n_classes) |
|     for iteration in range(iterations): |
|         y_hat, h = forward(X_train, w1, w2) |
|         w1_gradient, w2_gradient = back(X_train, Y_train, y_hat, w2, h) |
|         w1 = w1 - (w1_gradient * lr) |
|         w2 = w2 - (w2_gradient * lr) |
|         report(iteration, X_train, Y_train, X_test, Y_test, w1, w2) |
|     return (w1, w2) |
| |
| |
| import mnist |
| w1, w2 = train(mnist.X_train, mnist.Y_train, |
|                mnist.X_test, mnist.Y_test, |
|                n_hidden_nodes=200, iterations=10000, lr=0.01) |
To write the very last line of that program, the call to train(), I had to set values for the hyperparameters. The number of hidden nodes was easy: we'd already decided to have 200 hidden nodes, plus the bias node that the network adds automatically. By contrast, it took me some time to find a good value for lr: I tried a few different learning rates and picked the one that resulted in the lowest loss. Near the end of Part II, we'll look at a more systematic way to choose hyperparameter values.
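In the meantime, if you're curious what that trial-and-error search could look like in code, here is a rough sketch. It is not part of the program above, the candidate rates and the shortened iteration count are arbitrary examples, and it simply reuses train(), forward(), and loss() from the listing:
| # Hypothetical learning rate sweep: train briefly with each candidate |
| # rate, then compare the resulting training losses. |
| for candidate_lr in (1.0, 0.1, 0.01, 0.001): |
|     w1, w2 = train(mnist.X_train, mnist.Y_train, |
|                    mnist.X_test, mnist.Y_test, |
|                    n_hidden_nodes=200, iterations=100, lr=candidate_lr) |
|     y_hat, _ = forward(mnist.X_train, w1, w2) |
|     print("lr=%s: training loss %.8f" % (candidate_lr, loss(mnist.Y_train, y_hat))) |
Each run prints its own report lines, so in practice you eyeball the final loss of each candidate and keep the winner for the long training run.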
And at long last, here are the results of running the network:
| Iteration:     0, Loss: 2.38746031, Accuracy: 13.61% |
| Iteration:     1, Loss: 2.34527197, Accuracy: 15.00% |
| … |
| Iteration:  9999, Loss: 0.14668400, Accuracy: 93.25% |
The perceptron that we built in Chapter 7, The Final Challenge, couldn't reach 93% accuracy, no matter how long you trained it. The neural network has to do more calculations than a perceptron, so it takes a bit longer to pass 90% accuracy, but after that it just keeps going. After a few hours of number crunching, the network reaches over 93% accuracy at recognizing MNIST characters. Pretty good, although still far from our lofty goal of 99% accuracy.
In case you think that my enthusiasm for that 1% improvement is unwarranted, look at it from the opposite point of view: the perceptron makes a classification mistake over 8% of the time, while the network gets it wrong less than 7% of the time. In a production system, that improvement might already make a difference, and it's only the beginning. Over the next few chapters, we'll push that accuracy much higher.
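If you'd like to verify that error rate yourself, here is a quick sketch, not part of the program above, that reuses classify() on the test set. It assumes w1 and w2 are the weights returned by the call to train() at the end of the listing:
| # Assumes w1, w2 are the trained weights returned by train() above |
| classifications = classify(mnist.X_test, w1, w2) |
| error_rate = np.average(classifications != mnist.Y_test) * 100.0 |
| print("Test error rate: %.2f%%" % error_rate)  # roughly 100% - 93.25% = 6.75% |
This is the same comparison that report() uses to compute accuracy, only counting mismatches instead of matches.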
Let’s recap what we learned in this long chapter.