Generating the decision tree

To keep the code complexity to a minimum, we will extract all the helper methods that we will be calling recursively, as seen in the previous example. We can start with the train() method, because it is called first to determine the root decision node.

Before we do that, let's create an Injectable service for our ID3 algorithm in the utils folder, which we will inject wherever we wish to use it. This logic can live anywhere you wish, server side or client side. One thing to note is that the dataset in this case is relatively small, so training on it and predicting the outcomes is fine to do on the client side. With larger datasets, which take much longer to train, it is recommended to do this on the server side, as follows:

import {Injectable} from "@angular/core";

@Injectable()
export class ID3 {

    constructor() {
    }

}

At each step of the algorithm, we will rely heavily on helper methods to keep the implementation details clear; most of these will be provided by lodash, so let's install and import it so that we can implement the train() method:

npm install lodash --save
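
Depending on your TypeScript configuration, you may also want lodash's type definitions, which are published as a separate package. This step is optional and is not part of the original setup:

npm install @types/lodash --save-dev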

Once lodash is installed, we can start with the train() method, which accepts three parameters: the training dataset, the target attribute, and the list of all the attributes extracted from the training dataset sans the target:

import {Injectable} from "@angular/core";
import { } from "lodash";

@Injectable()
export class ID3 {

    constructor() {
    }

    public train(trainingData, target, attributes) {
    }

}

To use this service, mark it as a provider in the main module and then inject it in the app.component:

...
import { ID3 } from '../utils/id3';
...

@NgModule({
    ...
    providers: [
        ID3
    ],
    ...
})
export class AppModule { }

Then, to consume it in the main component, we import the ID3 service we just created and call the train() method on the service instance:

import { Component, OnInit } from '@angular/core';
import { ID3 } from "../utils/id3";
import { without, keys, filter } from "lodash";
import { CreditCard } from "../utils/training-data/credit-card";

@Component({
    selector: 'app-root',
    templateUrl: './app.component.html',
    styleUrls: ['./app.component.scss']
})
export class AppComponent implements OnInit {
    tree;
    tests: any;

    constructor(private id3: ID3) {
        this.tree = this.id3.train(
            CreditCard.data,
            'approved',
            without(keys(CreditCard.data[0]), 'approved'));
    }

    ngOnInit() {
        this.tests = ... // testing data
    }

}
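
The CreditCard class referenced above simply exposes the training dataset as a static array of plain objects, with approved as the target attribute. The actual records are not listed in this section; the rows below are placeholders that only illustrate the shape the train() call expects:

// utils/training-data/credit-card.ts (illustrative shape only)
export class CreditCard {
    public static data = [
        { creditScore: 'Excellent', creditAge: 'High', approved: 'Yes' },
        { creditScore: 'Poor', creditAge: 'Low', approved: 'No' },
        // ... the remaining training records
    ];
}

Because the attribute list is derived with without(keys(CreditCard.data[0]), 'approved'), any categorical columns present in these records are picked up automatically.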

Let us also add some styles to our page to make it look nice, so update the app.component.scss file:

.split {
    width: 50%;
    float: left;
}

table, td, th {
    text-align: center;
    border: 1px solid black;
}

table {
    border-collapse: collapse;
    width: 100%;
}

th {
    height: 50px;
}

.true {
    background: #bcf9bc;
}

.false {
    background: #ffa2a7;
}

As discussed in the preceding algorithm, the first thing we do in our application is determine the root decision node, that is, the attribute with the highest information gain:

import {Injectable} from "@angular/core";
import { maxBy, uniq, map, filter, without, keys, size, chain, find, countBy } from "lodash";

@Injectable()
export class ID3 {

    constructor() {
    }

    public train(trainingData, target, attributes) {

        // calculate root node from current list of attributes
        var currentRootNode = this.getCurrentRootNode(
            trainingData, target, attributes);

    }

    private getCurrentRootNode(trainingData, target, attributes) {

        // get the attribute with the maximum information gain
        return maxBy(attributes, (attr) => {

            // calculate information gain at each attribute
            // e.g. 'creditScore', 'creditAge' etc.
            return this.gain(trainingData, target, attr);
        });
    }

    private gain(trainingData, target, attr) {
        // calculate the entropy of the target column e.g. approved
        var targetEntropy = this.entropy(map(trainingData, target));

        // calculate the summation of all branch entropies
        var sumOfBranchEntropies =
            chain(trainingData)

                // extract branches for the given attribute
                // e.g. creditScore has the branches Excellent, Good,
                // Average, Poor
                .map(attr)

                // make the values unique
                .uniq()

                // for each unique branch calculate the branch entropy
                // e.g. calculate entropy of Excellent, Good, Average, Poor
                .map((branch) => {

                    // extract only the subset of training data
                    // which belongs to the current branch
                    var branchTrainingData = filter(trainingData,
                        [attr, branch]);

                    // return (probability of branch) * entropy of branch
                    return (branchTrainingData.length / trainingData.length)
                        * this.entropy(map(branchTrainingData, target));
                })

                // add all branch entropies
                // e.g. add entropy of Excellent, Good, Average, Poor
                .reduce(this.genericReducer, 0)

                // return the final value
                .valueOf();

        // return information gain
        return targetEntropy - sumOfBranchEntropies;
    }

    private entropy(vals) {

        // take all values
        return chain(vals)

            // make them unique
            // e.g. an array of Yes and No
            .uniq()

            // calculate probability of each
            .map((x) => this.probability(x, vals))

            // calculate entropy
            .map((p) => -p * Math.log2(p))

            // reduce the value
            .reduce(this.genericReducer, 0)

            // return value
            .valueOf();
    }

    private probability(val, vals) {

        // count instances of the given value
        // e.g. Yes occurs 100 times out of the 300 values
        var instances = filter(vals, (x) => x === val).length;

        // total values passed e.g. 300
        var total = vals.length;

        // return instances/total e.g. 1/3
        return instances / total;
    }

    private genericReducer(a, b) {

        // add and return
        return a + b;
    }

}

From the preceding code, you can see that we first calculate the root decision node of the tree by computing the branch entropies of each attribute and selecting the attribute with the maximum information gain.
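
To make the numbers concrete, here is a small hand-worked version of the same calculation. The values below are illustrative and do not come from the actual CreditCard dataset:

// Illustrative only: a toy target column with two Yes and two No outcomes
const approved = ['Yes', 'Yes', 'No', 'No'];

// probability of each outcome
const pYes = approved.filter((v) => v === 'Yes').length / approved.length; // 0.5
const pNo = approved.filter((v) => v === 'No').length / approved.length;   // 0.5

// entropy = -(0.5 * log2(0.5)) - (0.5 * log2(0.5)) = 1 bit
const targetEntropy = -pYes * Math.log2(pYes) - pNo * Math.log2(pNo);      // 1

// Suppose an attribute splits these four rows into two branches of two rows
// each, where one branch contains only Yes and the other only No. Each branch
// entropy is then 0, so the weighted sum of branch entropies is 0 and the
// information gain is targetEntropy - 0 = 1, the maximum possible. Such an
// attribute would be chosen as the root node.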

Now that we have the root, we can recursively repeat the process for each branch of the node and continue finding decision nodes until we hit an entropy of 0, that is, a leaf node.

This modifies our train() method as follows:

public train(trainingData, target, attributes) {

    // extract all targets from the dataset e.g. Yes or No
    var allTargets = uniq(map(trainingData, target));

    // only Yes or No is remaining e.g. leaf node found
    if (allTargets.length === 1) {
        return { leaf: true, value: allTargets[0] };
    }

    // calculate root node from current list of attributes
    var currentRootNode = this.getCurrentRootNode(
        trainingData, target, attributes);

    // form node for current root
    var node: any = { name: currentRootNode, leaf: false };

    // remove currentRootNode from the list of all attributes
    // e.g. remove creditScore or whatever the root node was
    // from the entire list of attributes
    var remainingAttributes = without(attributes, currentRootNode);

    // get unique branch names for currentRootNode
    // e.g. creditScore has the branches Excellent, Good,
    // Average, Poor
    var branches = uniq(map(trainingData, currentRootNode));

    // recursively repeat the process for each branch
    node.branches = map(branches, (branch) => {

        // take each branch's training data
        // e.g. training data where creditScore is Excellent
        var branchTrainingData = filter(trainingData,
            [currentRootNode, branch]);

        // create a node for each branch
        var branchNode: any = { name: branch, leaf: false };

        // initialize branches for the node
        branchNode.branches = [];

        // train and push data to the sub-branch
        branchNode.branches.push(this.train(
            branchTrainingData, target, remainingAttributes));

        // return the branch node as a child of the parent node
        return branchNode;
    });

    return node;
}

With that, the train() method:

  1. Takes the input training data, the target attribute, and the attributes list.
  2. Gets the current root attribute by calculating the information gain of each attribute from its branch entropies and picking the maximum, and creates the root node of the tree.
  3. Pushes the recursively generated sub-trees into the branches of the root node, producing a nested structure like the one sketched below.
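
For reference, the object that train() returns is a plain nested structure of decision nodes, branch nodes, and leaf nodes. The exact tree depends on the training data; the attribute, branch, and leaf values below are only illustrative:

{
    name: 'creditScore',           // decision node: attribute with the highest gain
    leaf: false,
    branches: [
        {
            name: 'Excellent',     // branch node: one value of the attribute
            leaf: false,
            branches: [
                { leaf: true, value: 'Yes' }   // leaf node: entropy of 0
            ]
        },
        // ... one branch node per unique value (Good, Average, Poor, ...)
    ]
}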