ANNs are models built from perceptrons or similar basic building blocks, and the ones we will learn about in this book are based on perceptrons. One of the most popular ANN models is the MLP, which is the model we will use. The motivation for using perceptrons in an ANN is simple: instead of using a single perceptron for classification, what if we used many of them? Take a look at the following screenshot:
Here, we have three perceptrons, each with a different bias, but the values of our features are the same in all three cases. If we use three perceptrons, we get three output values, but since this is a binary classification problem, we need only one output. Now that we have three output values, we can combine them, or we can view them as input values for another perceptron. Take a look at the following screenshot:
As you can see in the preceding screenshot, we can take the output values from the first three perceptrons and feed them as input values to another perceptron, which gives us the final output. This is the intuition behind building neural networks or MLPs, and this is an example of an ANN:
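To make this concrete, the following is a minimal sketch in Python with NumPy, using made-up weights and biases purely for illustration: three perceptrons share the same two input features, and a fourth perceptron takes their three outputs as its inputs:

```python
import numpy as np

def perceptron(x, w, b):
    # Classic perceptron: weighted sum plus bias, then a step activation
    return 1 if np.dot(w, x) + b > 0 else 0

# Two input features for one observation (illustrative values)
x = np.array([0.5, -1.2])

# Three perceptrons: same inputs, but different weights and biases
hidden_weights = [np.array([0.4, 0.7]), np.array([-0.3, 0.9]), np.array([1.1, -0.5])]
hidden_biases = [0.1, -0.2, 0.05]

hidden_outputs = np.array([
    perceptron(x, w, b) for w, b in zip(hidden_weights, hidden_biases)
])

# A final perceptron combines the three outputs into a single prediction
output_weights = np.array([0.6, -0.4, 0.8])
output_bias = -0.1
prediction = perceptron(hidden_outputs, output_weights, output_bias)
print(prediction)  # single binary output for this observation
```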
In the preceding screenshot, we have the following three layers of an MLP (a small code example follows this list):
- Input Layer: In this layer, you have the original data or the training data that you will use to train this model
- Hidden Layer: This middle layer contains the perceptrons whose inputs come from the input layer and whose outputs are used as the inputs for the next layer
- Output Layer: In this layer, you have the output that you get from the network
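If you are working with scikit-learn, for example (the library and the toy dataset here are assumptions purely for illustration), this three-layer structure maps directly onto MLPClassifier: hidden_layer_sizes=(3,) requests a single hidden layer with three neurons, the input layer is determined by the number of features, and the output layer is a single unit for binary classification:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification

# Toy binary classification data with two features (illustrative only)
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# One hidden layer with three neurons, mirroring the structure described above
mlp = MLPClassifier(hidden_layer_sizes=(3,), max_iter=2000, random_state=0)
mlp.fit(X, y)
print(mlp.predict(X[:5]))  # binary predictions for the first five observations
```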
The following screenshot is another way to visualize the same ANN:
This is a more compact way to visualize it, but it's actually the same network. Instead of drawing three separate biases, we add one constant feature, 1, to every observation. This value of 1 gets multiplied by the different biases and is passed as input to the neurons in our hidden layer. The value of x1 gets multiplied by some weight and is passed as input to the same neurons, and the same happens with the value of x2. Then, the outputs of the hidden-layer neurons are used as inputs for the last perceptron in our network, which produces the overall output.
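The following is a minimal sketch of that compact view, again with NumPy and made-up weights: each observation is augmented with a constant feature of 1, so the biases become ordinary weights applied to that constant:

```python
import numpy as np

def step(z):
    # Perceptron activation: 1 if the weighted sum is positive, otherwise 0
    return np.where(z > 0, 1, 0)

# One observation: the constant feature 1, followed by x1 and x2
x = np.array([1.0, 0.5, -1.2])

# Hidden layer: each row is one neuron's [bias, weight for x1, weight for x2],
# so the bias is simply the weight applied to the constant feature of 1
W_hidden = np.array([[ 0.1,  0.4,  0.7],
                     [-0.2, -0.3,  0.9],
                     [ 0.05, 1.1, -0.5]])

# Outputs of the three hidden neurons, all computed from the same augmented input
h = step(W_hidden @ x)

# Output perceptron: a bias (weight on the constant 1) plus one weight per hidden output
w_out = np.array([-0.1, 0.6, -0.4, 0.8])
output = step(w_out @ np.concatenate(([1.0], h)))
print(output)  # the overall output of the network for this observation
```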