ID3 algorithm

So far, the algorithms that we have seen were fairly trivial, and most of our focus was on implementing the API for a particular data structure. In this example, there is no data structure to implement; the algorithm itself generates the decision tree that we will use for our application.

First, let's take a look at how a table of historical data can be converted into a decision tree. The tree is primarily made up of decision nodes (which indicate a decision) and leaf nodes (which indicate the final response, such as yes or no). A decision node can have two or more subnodes, depending on the dataset. However, the tree has to start from somewhere, right? What would be the root node, and how do we get it?
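
Before we answer that, here is a minimal sketch of what those two kinds of nodes could look like in code. The class names and fields are our own illustration, not something this chapter defines; as noted earlier, we will not need to implement a tree structure by hand:

    from dataclasses import dataclass, field

    @dataclass
    class LeafNode:
        label: str  # the final response, such as 'Yes' or 'No'

    @dataclass
    class DecisionNode:
        attribute: str  # the attribute this node tests, such as 'Outlook'
        # One subtree per attribute value, e.g. {'Sunny': ..., 'Rain': ...}
        branches: dict = field(default_factory=dict)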

To determine the root node, we will first need to know two basic notions from information theory:

Entropy measures how mixed a set of examples is with respect to the target attribute: it is zero when all of the examples share the same response, and at its maximum when the responses are evenly split. Information gain is the reduction in entropy that we obtain by splitting the dataset on a particular attribute.

The attribute with the highest information gain becomes the root attribute. Then, we repeat the process for each subtree until we reach zero entropy, that is, until every example in a subtree has the same response. Let's take a look at this with an example and then start off with the code; the sketch below shows how these pieces fit together.
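
The following is a minimal sketch of that whole loop, assuming each row of the table is a plain dictionary. The function names (entropy, information_gain, id3) and the nested-dictionary tree representation are our own choices for illustration; the entropy formula the sketch relies on is defined formally at the end of this section:

    import math
    from collections import Counter

    def entropy(labels):
        # Entropy of a list of responses, for example ['Yes', 'No', ...].
        total = len(labels)
        return -sum((count / total) * math.log2(count / total)
                    for count in Counter(labels).values())

    def information_gain(rows, attribute, target):
        # Entropy of the target minus the weighted entropy that remains
        # after splitting the rows on the given attribute.
        base = entropy([row[target] for row in rows])
        remainder = 0.0
        for value in {row[attribute] for row in rows}:
            subset = [row[target] for row in rows if row[attribute] == value]
            remainder += (len(subset) / len(rows)) * entropy(subset)
        return base - remainder

    def id3(rows, attributes, target):
        labels = [row[target] for row in rows]
        if len(set(labels)) == 1:   # zero entropy: this is a leaf
            return labels[0]
        if not attributes:          # nothing left to split on: majority vote
            return Counter(labels).most_common(1)[0][0]
        best = max(attributes, key=lambda a: information_gain(rows, a, target))
        tree = {best: {}}
        for value in {row[best] for row in rows}:
            subset = [row for row in rows if row[best] == value]
            rest = [a for a in attributes if a != best]
            tree[best][value] = id3(subset, rest, target)
        return tree

Calling id3(rows, ['Outlook', 'Temperature', 'Humidity', 'Wind'], 'Play Soccer') on the rows of the following table would pick the root attribute for us and recurse from there, returning a nested dictionary whose top-level key is the root attribute.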

For the following example, we will take a simple and popular dataset to decide whether a game is going to be played or not, based on the weather conditions:

Outlook    Temperature   Humidity   Wind     Play Soccer
Sunny      Hot           High       Weak     No
Sunny      Hot           High       Strong   No
Overcast   Hot           High       Weak     Yes
Rain       Mild          High       Weak     Yes
Rain       Cool          Normal     Weak     Yes
Sunny      Cool          Normal     Weak     Yes
Rain       Mild          Normal     Weak     Yes
Sunny      Mild          Normal     Strong   Yes
Overcast   Mild          High       Strong   Yes
Overcast   Hot           Normal     Weak     Yes
Rain       Cool          Normal     Strong   No
Overcast   Cool          Normal     Strong   Yes
Sunny      Mild          High       Weak     No

In the preceding example, the target is the Play Soccer attribute. Let's assume that our input source is capable of sending n messages and that the probability of sending message i is p_i. Then, the entropy of the source is the following sum over all n messages:

E = -Σ p_i * log2(p_i), for i = 1 to n
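
As a quick worked check of that formula, the snippet below (an illustration of ours, not code from this chapter) computes the entropy of the Play Soccer column, which has 9 Yes and 4 No responses across the 13 rows:

    import math
    from collections import Counter

    # The Play Soccer column from the table above: 9 Yes, 4 No.
    labels = ['No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes',
              'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No']

    total = len(labels)
    E = -sum((count / total) * math.log2(count / total)
             for count in Counter(labels).values())
    print(round(E, 3))  # prints 0.89

An entropy of roughly 0.89 tells us that the responses are still quite mixed; the goal of each split the algorithm makes is to drive this value toward zero.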