Perceptrons are the simplest type of artificial neuron, invented as a simple model for binary classification. As context, let's use the dataset that we have been using in this book, the credit card dataset. Let's say that we have only two features for classifying defaulters and nondefaulters: age and bill amount. The idea of the perceptron is to create a score. To do so, you take one constant, w1, and multiply it by the value of age, and then you add another constant, w2, multiplied by the value of the bill amount, as follows:
score = w1 × age + w2 × bill
As a rule, we classify this person as a defaulter if score > b.
So, from this simple operation, we create a score, and the rule turns that score into a class: if the score is greater than some number b, we classify the person as a defaulter; otherwise, we classify them as a nondefaulter.
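To make the score and the rule concrete, here is a minimal sketch in Python. The weights and the threshold are hypothetical values picked only for illustration; they are not taken from the credit card dataset, and the function names `score` and `classify` are likewise made up for this example.

```python
# Hypothetical weights and threshold, chosen only for illustration.
W_AGE = -0.5      # w1: constant multiplied by age
W_BILL = 0.001    # w2: constant multiplied by bill amount
THRESHOLD = 10.0  # b: the number the score is compared against


def score(age, bill):
    """Weighted sum of the two features: w1 * age + w2 * bill."""
    return W_AGE * age + W_BILL * bill


def classify(age, bill):
    """Return 1 (defaulter) if score > b, otherwise 0 (nondefaulter)."""
    return 1 if score(age, bill) > THRESHOLD else 0


# A 25-year-old with a bill of 50,000: score is about 37.5, above b.
print(classify(age=25, bill=50000))
# A 60-year-old with a bill of 2,000: score is about -28, below b.
print(classify(age=60, bill=2000))
```

With these made-up constants, the first person is classified as a defaulter and the second as a nondefaulter. Different constants would give a different rule.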
An equivalent way to state this rule is shown in the following screenshot:
So, the prediction of this model will be 1, or defaulter, if the quantity w1 × age + w2 × bill - b is greater than 0, and the prediction will be 0, or nondefaulter, if this quantity is less than or equal to 0. The b value is also known as the threshold or bias.
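Written out as a formula, the rule described above takes the following form. This is a reconstruction from the surrounding text, not a copy of the original figure, and the symbol \hat{y} for the prediction is an assumption introduced here.

```latex
\hat{y} =
\begin{cases}
1 & \text{if } w_1 \cdot \text{age} + w_2 \cdot \text{bill} - b > 0 \\
0 & \text{otherwise}
\end{cases}
```

It is equivalent to the earlier rule because subtracting b from both sides of score > b gives score - b > 0.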
In general, if we have n features, then our perceptron would look similar to the following screenshot:
As you can see, we have the same form. We predict 1 if the sum of the weights times the values of our features, minus b, is greater than 0; otherwise, we predict 0. Assuming that all features are on the same scale, the weights represent the importance of each feature in making the decision. For this particular problem, the features are on very different scales; for example, ages are on a different scale than bill amounts. But if you bring all of the features to a similar scale, you can think of each w value as a measure of how much its feature matters to the decision.
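The general rule for n features can be sketched in a few lines of plain Python. The function name `predict` and the example weights, features, and threshold are hypothetical and serve only to illustrate the rule.

```python
def predict(weights, features, b):
    """Predict 1 if sum(w_i * x_i) - b > 0, otherwise predict 0."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    total = sum(w * x for w, x in zip(weights, features))
    return 1 if total - b > 0 else 0


# Hypothetical example: three features assumed to be on a similar scale.
weights = [2.0, -1.0, 0.5]
b = 0.5

# 2.0*1.0 - 1.0*0.5 + 0.5*0.0 - 0.5 = 1.0, which is greater than 0.
print(predict(weights, [1.0, 0.5, 0.0], b))
# 2.0*0.0 - 1.0*1.0 + 0.5*1.0 - 0.5 = -1.0, which is not greater than 0.
print(predict(weights, [0.0, 1.0, 1.0], b))
```

In this example the first feature has the largest weight in absolute value, so, with the features on a similar scale, it has the most influence on the prediction.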
The following screenshot shows another way to visualize this perceptron:
So, you have the value of the threshold or bias, b, the value of Age, x1, and the value of Bill amount, x2. These three values go into an operation, and then you get an output. Now, there is a little modification that we can make to the perceptron, and this is to add what is known as an activation function. An activation function is any function f that takes the result of the operation and applies some transformation to it. So the input for the activation function is the resulting quantity from the operation, and then, after applying the activation function f, we will get the following output:
So, this is the perceptron. We can add an activation function to the perceptron and then we get the rule or the classification 1 or 0.
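As a rough sketch of this idea, the code below separates the operation from the activation function f. A step function is used here as f because it reproduces the 1-or-0 rule; the function names `step` and `perceptron` and all numeric values are hypothetical, and the inputs are assumed to have already been scaled to a similar range.

```python
def step(z):
    """Step activation: 1 if z is greater than 0, otherwise 0."""
    return 1 if z > 0 else 0


def perceptron(x1, x2, w1, w2, b, f=step):
    """Compute z = w1*x1 + w2*x2 - b, then return f(z)."""
    z = w1 * x1 + w2 * x2 - b
    return f(z)


# x1 stands for (scaled) age and x2 for (scaled) bill amount.
# z is about -0.2 + 1.8 - 0.5 = 1.1, so the step function returns 1.
print(perceptron(0.2, 0.9, w1=-1.0, w2=2.0, b=0.5))
# z is about -0.9 + 0.2 - 0.5 = -1.2, so the step function returns 0.
print(perceptron(0.9, 0.1, w1=-1.0, w2=2.0, b=0.5))
```

Passing a different function as `f` changes only the final transformation; the weighted sum minus b stays the same.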
Now, maybe you are wondering how we decide on the best weights and threshold for our perceptron, and which activation function we can use. The answers to these questions are provided by the perceptron learning algorithm, which we can use to train perceptrons. The good thing about perceptrons is that they are very simple to understand. However, they perform poorly compared to more sophisticated methods, such as the ones that we used in previous chapters. So, it is not worth covering the perceptron learning algorithm in detail here. However, these very simple models are the building blocks for ANNs.