Since 2010, an annual image classification competition called the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been run. The full ImageNet dataset consists of over 14 million labelled images, and the competition uses a subset of it covering 1,000 categories. Without this dataset, there would not be the huge interest in deep learning that there is today: the competition provided the stimulus for research in deep learning, and the models and weights learnt on the ImageNet dataset have since been reused in thousands of other deep learning models through transfer learning. The history of ImageNet is an interesting story in its own right. The article at https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/ explains how the project originally got little attention, but that changed with a number of linked events:
- The ILSVRC became the benchmark for image classification for researchers.
- NVIDIA had released libraries that allowed access to graphics processing units (GPUs). GPUs are designed to do massively parallel matrix operations, which is exactly what is needed to train deep neural networks.
- Geoffrey Hinton, Ilya Sutskever, and Alex Krizhevsky from the University of Toronto created a deep convolutional neural network architecture called AlexNet that won the competition in 2012. Although this was not the first use of convolutional neural networks, their submission beat the next-best entry by a huge margin (an error rate of 15.3% against 26.2% for the runner-up).
- Researchers noticed that models trained on the ImageNet dataset could be reused for other classification tasks. Starting from an ImageNet model and applying transfer learning almost always gave much better performance than training a model from scratch on the new task's own dataset.
The advances in image classification can be tracked with some notable entries in the ILSVRC competition:
| Team | Year | Top-5 error rate |
| --- | --- | --- |
| 2011 ILSVRC winner (not deep learning) | 2011 | 25.8% |
| AlexNet (8 layers) | 2012 | 15.3% |
| VGG Net (16 layers) | 2014 | 7.32% |
| GoogLeNet / Inception (22 layers) | 2014 | 6.67% |
| ResNet (152 layers) | 2015 | 3.57% |
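ILSVRC classification results are conventionally reported as top-5 error: a prediction counts as correct if the true label is among the model's five highest-scoring categories. Assuming that is the metric in the table above, the following minimal Python sketch shows how such an error rate could be computed. The function name `top_k_error` and the toy scores are illustrative assumptions, not part of Keras or of the competition tooling.

```python
def top_k_error(scores, true_labels, k=5):
    """Fraction of images whose true label is not among the k highest-scoring classes."""
    if not scores or len(scores) != len(true_labels):
        raise ValueError("scores and true_labels must be non-empty and of equal length")
    misses = 0
    for class_scores, truth in zip(scores, true_labels):
        # Class indices ordered from highest to lowest score for this image.
        ranked = sorted(range(len(class_scores)),
                        key=lambda c: class_scores[c],
                        reverse=True)
        if truth not in ranked[:k]:
            misses += 1
    return misses / len(scores)


# Hypothetical scores for 4 images over 8 classes (ILSVRC uses 1,000 classes).
scores = [
    [0.40, 0.25, 0.15, 0.08, 0.05, 0.04, 0.02, 0.01],  # true class ranked 1st
    [0.01, 0.02, 0.04, 0.05, 0.08, 0.15, 0.25, 0.40],  # true class ranked 4th
    [0.40, 0.25, 0.15, 0.08, 0.05, 0.04, 0.02, 0.01],  # true class ranked 8th
    [0.05, 0.40, 0.25, 0.15, 0.08, 0.04, 0.02, 0.01],  # true class ranked 2nd
]
true_labels = [0, 4, 7, 2]

print(top_k_error(scores, true_labels))       # top-5 error: 0.25
print(top_k_error(scores, true_labels, k=1))  # top-1 error: 0.75
```

Only the third image counts as a top-5 miss, whereas three of the four images are top-1 misses, which is why top-5 error rates are always at least as low as top-1 error rates for the same model.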
VGGNet, Inception, and ResNet are all available in Keras. A complete list of available networks can be found at https://keras.rstudio.com/reference/index.html#section-applications.
The models for these networks can be loaded in Keras and used to classify a new image into one of the 1,000 categories in ImageNet. We will look at this next. If you have a new classification task with a different set of images, you can also start from one of these networks and adapt it with transfer learning, which we will look at later in this chapter. The number of categories can be different; you do not need to have 1,000 categories for your task.
Perhaps the simplest model to begin with is VGGNet, as it is not that different from what we saw in Chapter 5, Image Classification Using Convolutional Neural Networks.