Neural network models for classification usually accept three-channel images and produce a vector with probabilities across categories. To use a trained model, you need to know a few things:
- What preprocessing was applied to the input images during training
- Which layers are inputs and which are outputs
- How data is organized in the output tensor
- What the values in the output tensor mean
In our case, each model requires its own preprocessing and its own channel order (for example, RGB versus BGR). If either of these is wrong, the model's accuracy drops, sometimes slightly and sometimes dramatically. The models also use different names for their input and output layers.
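To make the per-model differences concrete, here is a minimal NumPy sketch of a preprocessing step driven by a per-model configuration. The `PreprocessConfig` class, its fields, and the `preprocess` function are hypothetical names for illustration; the mean, scale, channel order, and layer names must come from each model's own training setup. The sketch assumes images are loaded in BGR order (as OpenCV does) and have already been resized to the model's input size, and it produces a batch of one in NCHW layout.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class PreprocessConfig:
    # Hypothetical per-model settings; real values depend on how each model was trained.
    mean: tuple          # per-channel mean, in the channel order the model expects
    scale: float         # multiplier applied after mean subtraction
    swap_rb: bool        # True if the model expects RGB but images are loaded as BGR
    input_name: str      # name of the model's input layer
    output_name: str     # name of the model's output layer


def preprocess(image_bgr: np.ndarray, cfg: PreprocessConfig) -> np.ndarray:
    """Turn an HxWx3 BGR image into a 1x3xHxW float32 tensor for one model."""
    img = image_bgr.astype(np.float32)
    if cfg.swap_rb:
        img = img[:, :, ::-1]  # reverse the channel axis: BGR -> RGB
    # Subtract the per-channel mean, then scale.
    img = (img - np.array(cfg.mean, dtype=np.float32)) * cfg.scale
    # HWC -> CHW, then add a leading batch dimension.
    return np.ascontiguousarray(img.transpose(2, 0, 1)[np.newaxis])
```

Keeping these settings in one object per model makes it harder to feed an image prepared for one model into another, which is exactly the failure described above.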
In classification, the output vector contains a probability for every category, and the index of the maximum value is the index of the predicted category. To convert such an index into a name, you need to parse a file that maps category indexes to category names. These files can differ from model to model (and in our case, they do).
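A minimal sketch of the index-to-name step follows. It assumes a label file with one category name per line, where the 0-based line number is the category index; this format is an assumption, so adapt the parser to whatever file each model ships with. `load_labels` and `top_predictions` are hypothetical helper names.

```python
import numpy as np


def load_labels(path: str) -> list[str]:
    # Assumed format: one name per line; line i holds the name of category i.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f.read().splitlines()]


def top_predictions(output: np.ndarray, labels: list[str], k: int = 5):
    """Return (index, name, score) for the k highest-scoring categories."""
    scores = np.asarray(output).reshape(-1)   # flatten e.g. a 1xN output to N
    order = np.argsort(scores)[::-1][:k]      # indexes sorted by descending score
    return [(int(i), labels[int(i)], float(scores[i])) for i in order]
```

Because the label files differ between models, load one label list per model and always pair it with that model's output.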
After executing the code, you will get images similar to the following: