The CIFAR-10 Dataset

Do you remember back in Chapter 6, Getting Real, when I introduced MNIST? Those handwritten digits looked like a formidable challenge back then. By now, our neural networks are making short work of them. Since we passed 99% accuracy on MNIST, it’s getting hard to tell actual improvements apart from random fluctuations.

What do you do when your neural networks are too cool for MNIST? You turn to a more challenging dataset: CIFAR-10.

What CIFAR-10 Looks Like

In the field of image recognition, MNIST is considered an entry point. A tougher benchmark is the CIFAR-10 dataset.[27] CIFAR stands for Canadian Institute For Advanced Research, and 10 is the number of classes in it. A random sample of CIFAR-10 images is shown in the figure.

images/beyond/cifar10.png

Like MNIST, CIFAR-10 comes with a test set of 10,000 images, but its training set contains 50,000, and its images are tougher to classify. Even though they look pixelated, they’re bigger than MNIST’s digits. Instead of being matrices, CIFAR-10 images are tensors—that is, arrays with more than two dimensions. To be precise, they’re (32, 32, 3) tensors: 32 by 32 pixels with 3 color channels for red, green, and blue. That makes for a grand total of 3,072 variables per image—almost four times as many as MNIST’s 784.
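
If you want to double-check those numbers yourself, here’s a quick sketch. It assumes you have Keras installed; the first call downloads the dataset to your computer:

 from keras.datasets import cifar10
 
 (X_train_raw, Y_train_raw), (X_test_raw, Y_test_raw) = cifar10.load_data()
 print(X_train_raw.shape)    # (50000, 32, 32, 3)
 print(X_test_raw.shape)     # (10000, 32, 32, 3)
 print(X_train_raw[0].size)  # 3072, that is, 32 * 32 * 3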

However, the challenge of CIFAR-10 isn’t in the size of its examples as much as in their complexity. Instead of relatively simple digits, the images represent real-world subjects such as horses and ships. For example, check out how different the birds (labeled 2 in our sample) look from each other. By contrast, see how similar objects can be even when they belong to different classes, like the seagull and the fighter jet, or cars and trucks. With so many confusing subjects, CIFAR-10 puts even powerful neural networks through their paces.
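
One detail the sample doesn’t show: CIFAR-10’s labels are plain integers from 0 to 9. The class names aren’t bundled with the Keras download, so if we want readable names, we have to spell them out ourselves, in the dataset’s fixed label order:

 # CIFAR-10 class names, indexed by label.
 CLASS_NAMES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                'dog', 'frog', 'horse', 'ship', 'truck']
 print(CLASS_NAMES[2])   # => bird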

Let’s dust off the best neural network we’ve built so far and see how it copes with these images.

Falling Short of CIFAR

I retrieved our best neural network so far—the solution to Hands On: The 10 Epochs Challenge—and modified it to train on CIFAR-10 instead of MNIST. Keras, bless its binary soul, comes with built-in support for CIFAR-10. So I only had to change a few lines, which are marked by small arrows in the left margin:

19_beyond/cifar10_fully_connected.py
 import numpy as np
 from keras.models import Sequential
 from keras.layers import Dense, BatchNormalization
 from keras.optimizers import Adam
 from keras.initializers import glorot_normal
 from keras.utils import to_categorical
»from keras.datasets import cifar10
 
»(X_train_raw, Y_train_raw), (X_test_raw, Y_test_raw) = cifar10.load_data()
 X_train = X_train_raw.reshape(X_train_raw.shape[0], -1) / 255
 X_test_all = X_test_raw.reshape(X_test_raw.shape[0], -1) / 255
 X_validation, X_test = np.split(X_test_all, 2)
 Y_train = to_categorical(Y_train_raw)
 Y_validation, Y_test = np.split(to_categorical(Y_test_raw), 2)
 
 model = Sequential()
 model.add(Dense(1200, activation='relu'))
 model.add(BatchNormalization())
 model.add(Dense(500, activation='relu'))
 model.add(BatchNormalization())
 model.add(Dense(200, activation='relu'))
 model.add(BatchNormalization())
 model.add(Dense(10, activation='softmax'))
 
 model.compile(loss='categorical_crossentropy',
   optimizer=Adam(),
   metrics=['accuracy'])
 
 history = model.fit(X_train, Y_train,
   validation_data=(X_validation, Y_validation),
» epochs=25, batch_size=32)
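
One preprocessing detail worth calling out: Keras’s CIFAR-10 doesn’t come with a separate validation set, so the two np.split() calls carve the 10,000 test images into two halves of 5,000 each: one for validation during training, and one held back for the final test.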

Aside from loading CIFAR-10 instead of MNIST, the only change I made was to the number of epochs: I bumped it up from 10 to 25, to give the network more training time. When I ran this program, I had to wait for a few minutes as Keras downloaded CIFAR-10 to my computer—and then for a few hours as it churned through those 25 epochs. Here is what I got in the end:

 Train on 50000 samples, validate on 5000 samples
 Epoch 1 - loss: 1.7835 - acc: 0.3665 - val_loss: 1.9901 - val_acc: 0.3244
 …
 Epoch 25 - loss: 0.9512 - acc: 0.6596 - val_loss: 1.4610 - val_acc: 0.5216

The numbers are in: our neural network misclassifies about half the images in the validation set. Sure, it’s more accurate than the 10% we’d get by random guessing, but that doesn’t make it great.
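
Those two log lines also hint at a widening gap between training and validation accuracy. Because the listing stores the return value of fit() in history, we can chart the whole run with Matplotlib. Here’s a minimal sketch; note that the metric keys are 'acc' and 'val_acc' in the Keras version I used, while recent versions name them 'accuracy' and 'val_accuracy':

 import matplotlib.pyplot as plt
 
 # history.history is a dict of per-epoch metrics collected by fit().
 plt.plot(history.history['acc'], label='Training accuracy')
 plt.plot(history.history['val_acc'], label='Validation accuracy')
 plt.xlabel('Epoch')
 plt.ylabel('Accuracy')
 plt.legend()
 plt.show()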

It seems that CIFAR-10 is too much for a fully connected neural network to handle. Let’s see what happens if we switch to a different architecture, one that’s better suited to dealing with images.