Transfer learning
The method known as ‘transfer learning’ is based on a pretty simple concept: take a CNN that has already been trained on some data (perhaps obtained from the internet), then train it further with a relatively small amount of data relating to your application. Would you expect this to work well? Probably not – but in fact it works remarkably well. To reiterate, transfer learning combines a ‘pre-trained’ network with a limited amount of application-specific data. The result is a CNN whose performance is comparable to that of a conventionally, fully trained CNN, but with a much smaller requirement for training data. Bingo! This offers the potential to dramatically increase the number of useful applications of deep learning, since one of the biggest challenges for companies wanting to use this technology is acquiring the large amounts of training data that have traditionally been required. Not only that, but these data generally need to be ‘annotated’ – in other words, each image needs a note attached to it saying what is being shown. One application we worked on used a CNN to automatically detect whether weeds are present in images. To do this, we took 6000 images of a field, and for each one a person had a close look to see if weeds were present. This is not as easy as it sounds – even for humans, deciding whether small amounts of weeds are present (e.g. 5–10% of the image) is not that easy. And for 6000 images? It’s a wonder we didn’t go stir-crazy.
In transfer learning, the lower convolutional layers of the pre-trained network (with ‘frozen’ weights) are combined with several untrained ‘fully connected’ layers. The resulting model is then trained on an application-specific dataset, so that only the fully connected layers are actually trained. The pre-trained network is typically generated using the dataset known as ImageNet (as mentioned above, this contains millions of images in thousands of classes, such as animals, types of vehicle, household objects and many more). In 2014, Ali Sharif Razavian et al. published a paper exploring the following idea: since the ImageNet dataset is so diverse, the lower convolutional layers of a CNN trained on it should contain very general feature detectors, which could serve as the low-level feature detectors for a wide range of machine vision tasks. This proved to be the case, and ImageNet-trained CNNs have shown strong performance compared with other, more sophisticated methods. A number of the best-performing models, pre-trained on ImageNet, are now freely available to download and use. Implemented in this way, transfer learning has become a very useful tool for creating high-performing models without the large data and computational requirements often needed to train complex networks from scratch. As mentioned, this matters because, for large amounts of data, considerable effort is required both to capture the images and to label them accurately. Returning to the automatic weed detection application, suppose you are a farmer who wishes to detect weeds in grass so that he/she can selectively spray them with herbicide. This is shown in the figure below, where the dock weed has been ‘segmented’, making it a relatively simple matter to spray some herbicide at it – or even to zap it with a high-power laser!
Using machine vision and deep learning, dock weed in grass has been ‘segmented’ (in the right image), making it a relatively simple matter to spray some herbicide at it. (Images by the author.)
It is, in fact, quite important to remove dock weed from grass, since ingestion of such a weed can be harmful to animals. A facility for automatically detecting the weeds and selectively spraying a small amount of herbicide at each one clearly offers the potential for very significant cost and environmental benefits. It’s surely only a matter of time until all farmers, at least in the West, employ such methods for weed control.
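The freeze-and-retrain recipe described above can be sketched in a few lines. The sketch below is purely illustrative: a fixed random projection with a ReLU stands in for the frozen convolutional layers of an ImageNet-trained CNN (which, in practice, you would load from a deep learning framework such as Keras or PyTorch), and the ‘weed / no weed’ dataset is synthetic. Only the new ‘fully connected’ head receives gradient updates; the ‘pre-trained’ weights are never touched.

```python
import numpy as np

rng = np.random.default_rng(0)

# 'Pre-trained' feature extractor: a stand-in for the frozen convolutional
# layers of an ImageNet-trained CNN. A fixed random projection plus ReLU
# keeps the sketch self-contained and runnable with NumPy alone.
W_frozen = rng.normal(size=(64, 128)) / np.sqrt(64)
W_before = W_frozen.copy()                 # snapshot, to verify freezing

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)   # ReLU 'feature maps'

# Tiny synthetic 'application-specific' dataset (toy weed / no-weed labels).
X = rng.normal(size=(200, 64))
y = (X[:, 0] > 0).astype(float)            # illustrative labels only

# New, trainable 'fully connected' head: logistic regression on the features.
w_head = np.zeros(128)
b_head = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(1000):
    F = extract_features(X)                # frozen layers: forward pass only
    p = sigmoid(F @ w_head + b_head)
    w_head -= lr * F.T @ (p - y) / len(y)  # gradients flow ONLY to the head
    b_head -= lr * np.mean(p - y)

# Training-set accuracy of the retrained head on the frozen features.
accuracy = np.mean((sigmoid(extract_features(X) @ w_head + b_head) > 0.5) == y)
```

The key point is structural: `W_frozen` appears only in forward passes, so the cheap part (a small head) is all that is fitted to the new data – exactly why transfer learning needs so few application-specific images.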
So, that’s great. The farmer does not want to spend days on end going around the field capturing vast numbers of weed images in order to train a CNN for this task, and by employing transfer learning he/she doesn’t have to. So much for transfer learning – but what about data augmentation?