The Capsule Network and convolutional neural networks

The convolutional neural network (CNN) has been one of the most important milestones in deep learning. It has generated a great deal of excitement and has served as the cornerstone of much new research. But, as the saying goes, nothing is perfect in this world, and neither is our beloved CNN.

Can you recall how CNNs work? The core job of a CNN is convolution. When you pass an image through a CNN, the convolutional layers extract features, such as edges and color gradients, from the image pixels. Subsequent layers combine these simple features into more complex ones. Finally, dense (fully connected) layers placed on top of this stack enable the network to carry out the classification. The following diagram illustrates this for the image that we are working on:
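To make the feature-extraction step concrete, here is a minimal sketch in plain Python of what a single convolutional filter does. It assumes a hypothetical 5×5 grayscale image containing one vertical edge and a hand-picked vertical-edge kernel. A real CNN learns its kernels during training and uses many of them per layer; like most deep learning libraries, the sketch computes cross-correlation (no kernel flip) and applies a ReLU afterwards.

```python
def conv2d(image, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation, as in a CNN conv layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kernel-sized patch whose top-left corner is (i, j)
            s = 0
            for a in range(kh):
                for b in range(kw):
                    s += image[i + a][j + b] * kernel[a][b]
            row.append(s)
        out.append(row)
    return out


def relu(fmap):
    """Zero out negative responses, keeping only where the feature was found."""
    return [[max(0, v) for v in row] for row in fmap]


# Hypothetical image: dark left columns (0), bright right columns (1)
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked vertical-edge kernel (an assumption; a CNN would learn this)
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = relu(conv2d(image, kernel))
print(feature_map)  # [[3, 3, 0], [3, 3, 0], [3, 3, 0]]
```

The feature map responds strongly only where the kernel overlaps the dark-to-bright transition and is zero over the uniformly bright region. Deeper layers would then combine maps like this one into more complex features.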

The preceding diagram shows a basic CNN being used to detect the car in the image. The following diagram shows an image of a car with all its parts in order, alongside a fragmented image of the same car:

Let's say we pass these two images through the CNN to detect the car. How will the network respond to each of them? Take a moment to think about it and come up with an answer. As a hint: a car is made up of a number of components, such as wheels, a windshield, and a bonnet, but to the human eye a car is recognized as a car only when all of these parts are arranged in the right order. For a CNN, however, only the presence of the features matters. A CNN doesn't take the relative position and orientation of those features into account. So, the network will classify both images as a car, even though a human would not see a car in the fragmented one.
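The following toy sketch illustrates why both images end up with the same classification. It is a deliberate simplification, not a real CNN: the image is assumed to be a hypothetical 3×3 grid of already-detected car parts, each "feature detector" fires if its part appears anywhere in the grid (the effect of pooling over all positions), and a hypothetical equal-weight "dense layer" scores the result. Because only the presence of each part is recorded, the intact and the scrambled arrangements receive the same score.

```python
# Hypothetical part vocabulary for the toy example
PARTS = ("wheel", "windshield", "bonnet", "door")


def detect_features(grid):
    """One detector per part; each fires (1) if the part appears anywhere.

    Taking the max over all positions discards *where* the part was found,
    which is exactly the spatial information the classifier never sees.
    """
    return {
        part: max(1 if cell == part else 0 for row in grid for cell in row)
        for part in PARTS
    }


def car_score(features):
    """Toy 'dense layer': equal weights, score = fraction of parts present."""
    return sum(features.values()) / len(PARTS)


# Parts in a plausible car layout
intact = [
    [None, "windshield", None],
    ["bonnet", "door", None],
    ["wheel", None, "wheel"],
]

# The same parts, scattered into a meaningless arrangement
scrambled = [
    ["wheel", None, "door"],
    [None, "wheel", "bonnet"],
    ["windshield", None, None],
]

print(car_score(detect_features(intact)))     # 1.0
print(car_score(detect_features(scrambled)))  # 1.0
```

Nothing in this pipeline encodes that the wheels should sit below the windshield, so no choice of weights in `car_score` could tell the two grids apart. That relationship is the information the text says a CNN does not take into account.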

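To compensate, CNNs use max-pooling, which enlarges the view (the receptive field) of the neurons in higher layers and thus makes it possible to detect higher-order features. Max-pooling is a large part of what makes CNNs work, but it also causes information loss: each pooling window keeps only its strongest activation and discards the rest, including exactly where in the window that activation occurred. This is a major drawback of CNNs.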
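The information loss can be seen in a small sketch. It assumes 2×2 max-pooling with stride 2, a common configuration, applied to two hypothetical 4×4 feature maps. The two maps place their activations at different positions inside each window, yet they pool to exactly the same output, so nothing downstream can tell them apart.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Max-pooling: keep only the largest activation in each size x size window."""
    out = []
    for i in range(0, len(fmap) - size + 1, stride):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, stride):
            window = [fmap[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(window))
        out.append(row)
    return out


# Two hypothetical feature maps with the same peak values at different positions
map_a = [
    [9, 0, 0, 0],
    [0, 0, 0, 7],
    [0, 0, 0, 0],
    [5, 0, 0, 3],
]
map_b = [
    [0, 0, 7, 0],
    [0, 9, 0, 0],
    [0, 5, 3, 0],
    [0, 0, 0, 0],
]

print(max_pool2d(map_a))  # [[9, 7], [5, 3]]
print(max_pool2d(map_b))  # [[9, 7], [5, 3]]
```

Each pooled value also summarizes a 2×2 region of the layer below, which is how pooling widens the view of higher-layer neurons. The same summarizing step is what throws away the precise positions.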