Evaluation for classification

The core DL4J class for implementing evaluations is called Evaluation (https://static.javadoc.io/org.deeplearning4j/deeplearning4j-nn/0.9.1/org/deeplearning4j/eval/Evaluation.html, part of the DL4J NN module).

The dataset that will be used for the example presented in this subsection is the Iris dataset (it is available for download at https://archive.ics.uci.edu/ml/datasets/iris). It is a multivariate dataset that was introduced in 1936 by the British statistician and biologist Ronald Fisher (https://en.wikipedia.org/wiki/Ronald_Fisher). It contains 150 records – 50 samples from each of three species of Iris flower (Iris setosa, Iris virginica, and Iris versicolor). Four attributes (features) have been measured on each sample – the length and the width of the sepals and petals (in centimeters). The structure of this dataset was used for the example that was presented in Chapter 4, Streaming, in the Streaming data with DL4J and Spark section. Here's a sample of the data that's contained in this set:

sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
4.7,3.2,1.3,0.2,0
4.6,3.1,1.5,0.2,0
5.0,3.6,1.4,0.2,0
5.4,3.9,1.7,0.4,0
...

Typically, for cases of supervised learning like this, a dataset is split into two parts: the first (usually 70%) is used for training, while the second (the remaining 30%) is used to calculate the error and adjust the network if necessary. This is also the case for this section's example – we are going to use 70% of the dataset for network training and the remaining 30% for evaluation purposes.

The first thing we need to do is get the dataset using a CSVRecordReader (the input file is a list of comma-separated records):

val numLinesToSkip = 1
val delimiter = ","
val recordReader = new CSVRecordReader(numLinesToSkip, delimiter)
recordReader.initialize(new FileSplit(new ClassPathResource("iris.csv").getFile))

Now, we need to convert the data that's going to be used in the neural network:

val labelIndex = 4
val numClasses = 3
val batchSize = 150

val iterator: DataSetIterator = new RecordReaderDataSetIterator(recordReader, batchSize, labelIndex, numClasses)
val allData: DataSet = iterator.next
allData.shuffle()

Each row of the input file contains five values: the four input features, followed by an integer label (class) index. This means that the label is the fifth value (labelIndex is 4). The dataset has three classes, representing the three species of Iris flower, with integer values of zero (setosa), one (versicolor), or two (virginica).
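To make the row layout concrete, here is a minimal sketch in plain Scala (independent of DL4J and the RecordReader machinery) that splits a single record into its four features and its class index:

```scala
// A sample record from iris.csv (see the data excerpt above)
val row = "5.1,3.5,1.4,0.2,0"
val fields = row.split(",")

val features = fields.take(4).map(_.toDouble) // sepal/petal measurements
val label = fields(4).toInt                   // labelIndex = 4: the class (0 = setosa)
```

The RecordReaderDataSetIterator above performs exactly this kind of feature/label separation, but for the whole file and directly into ND4J arrays.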

As we mentioned previously, we split the dataset into two parts – 70% of the data is for training, while the rest is for evaluation:

val testAndTrain: SplitTestAndTrain = allData.splitTestAndTrain(0.70)

val trainingData: DataSet = testAndTrain.getTrain
val testData: DataSet = testAndTrain.getTest

The split happens through the SplitTestAndTrain class (https://deeplearning4j.org/api/latest/org/nd4j/linalg/dataset/SplitTestAndTrain.html) of ND4J.

We also need to normalize the input data (for both the training and evaluation sets) using the ND4J NormalizerStandardize class (https://deeplearning4j.org/api/latest/org/nd4j/linalg/dataset/api/preprocessor/NormalizerStandardize.html) so that we have a zero mean and a standard deviation of one:

val normalizer: DataNormalization = new NormalizerStandardize
normalizer.fit(trainingData)
normalizer.transform(trainingData)
normalizer.transform(testData)
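As a reminder of what standardization does (independently of DL4J), each feature value x is transformed to (x − μ) / σ, where the mean μ and standard deviation σ are computed on the training set only – which is why fit is called on trainingData alone above. A minimal sketch on one toy feature column (the values are assumed for illustration, not taken from the Iris set):

```scala
// Toy standardization of a single feature column
val train = Array(5.1, 4.9, 4.7, 4.6, 5.0)

val mean = train.sum / train.length
val std = math.sqrt(train.map(x => math.pow(x - mean, 2)).sum / train.length)

// Each value becomes (x - mean) / std; the resulting column has
// (approximately) zero mean and unit standard deviation
val standardized = train.map(x => (x - mean) / std)
```

The same mean and std would then be applied, unchanged, to the test data – exactly what the two transform calls above do.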

We can now configure and build the model (a simple feedforward neural network):

val seed = 123      // for reproducibility (example value)
val numInputs = 4   // the four Iris features
val outputNum = 3   // the three Iris classes

val conf = new NeuralNetConfiguration.Builder()
  .seed(seed)
  .activation(Activation.TANH)
  .weightInit(WeightInit.XAVIER)
  .l2(1e-4)
  .list
  .layer(0, new DenseLayer.Builder().nIn(numInputs).nOut(3)
    .build)
  .layer(1, new DenseLayer.Builder().nIn(3).nOut(3)
    .build)
  .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
    .activation(Activation.SOFTMAX)
    .nIn(3).nOut(outputNum).build)
  .backprop(true).pretrain(false)
  .build

The following screenshot shows a graphical representation of the network for this example:

The MultiLayerNetwork can be created starting from the preceding configuration:

val model = new MultiLayerNetwork(conf)
model.init()
model.setListeners(new ScoreIterationListener(100))

The training can be started if we use the portion (70%) of the input dataset that has been reserved for it:

for (idx <- 0 to 2000) {
  model.fit(trainingData)
}

At the end of training, the evaluation can be done using the reserved portion (30%) of the input dataset:

val eval = new Evaluation(3)
val output = model.output(testData.getFeatureMatrix)
eval.eval(testData.getLabels, output)
println(eval.stats)

The value that's passed to the Evaluation class constructor is the number of classes to account for in the evaluation – it is 3 here because we have three classes of flowers in the dataset. The eval method compares the labels array from the test dataset with the labels that were generated by the model. The result of the evaluation is finally printed to the output.

By default, the stats method of the Evaluation class displays the confusion matrix entries (one entry per line), Accuracy, Precision, Recall, and F1 Score, but other information can be displayed. Let's talk about what these stats are.

The confusion matrix is a table that is used to describe the performance of a classifier on a test dataset for which the true values are known. Let's consider the following example (for a binary classifier):

Prediction count = 200   Predicted as no   Predicted as yes
Actual: no                      55                 5
Actual: yes                     10               130

These are the insights we can get from the preceding matrix:

- The classifier made 200 predictions in total.
- Out of those 200 cases, it predicted yes 135 times and no 65 times.
- In reality, 140 cases are yes and 60 cases are no.

When this is translated into proper terms, the insights are as follows:

- True positives (TP): cases predicted yes that are actually yes (130).
- True negatives (TN): cases predicted no that are actually no (55).
- False positives (FP): cases predicted yes that are actually no (5).
- False negatives (FN): cases predicted no that are actually yes (10).

Let's consider the following example:

                 Predicted as no   Predicted as yes
Actual: no             TN                 FP
Actual: yes            FN                 TP

The first matrix was expressed in terms of raw counts, while this one uses the standard terms. A list of rates can be calculated from a confusion matrix. With reference to the code example in this section, they are as follows:

- Accuracy: how often the classifier is correct overall, (TP + TN) / total.
- Precision: how often a yes prediction is actually correct, TP / (TP + FP).
- Recall (sensitivity): how often actual yes cases are predicted as yes, TP / (TP + FN).
- F1 Score: the harmonic mean of precision and recall, 2 × (Precision × Recall) / (Precision + Recall).
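These rates can be checked by hand against the binary example above (TN = 55, FP = 5, FN = 10, TP = 130). A minimal, self-contained Scala sketch – DL4J computes the same quantities internally; this is only meant to make the formulas concrete:

```scala
// Counts taken from the example confusion matrix above
val tn = 55.0; val fp = 5.0; val fn = 10.0; val tp = 130.0
val total = tn + fp + fn + tp                // 200 predictions

val accuracy  = (tp + tn) / total            // fraction of correct predictions: 185 / 200 = 0.925
val precision = tp / (tp + fp)               // of all yes predictions, how many were right
val recall    = tp / (tp + fn)               // of all actual yes cases, how many were found
val f1        = 2 * precision * recall / (precision + recall)

println(f"Accuracy: $accuracy%.3f, Precision: $precision%.3f, Recall: $recall%.3f, F1: $f1%.3f")
```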

The Evaluation class can also display other information, such as the G-measure or the Matthews Correlation Coefficient, and much more. The confusion matrix can also be displayed in its full form:

println(eval.confusionToString)

The preceding command returns the following output:

The confusion matrix can also be accessed directly and converted into CSV format:

eval.getConfusionMatrix.toCSV

The preceding command returns the following output:

It can also be converted into HTML:

eval.getConfusionMatrix.toHTML

The preceding command returns the following output: