Other types of evaluation

Other evaluations are available through the DL4J API. This section lists them.

A network performing regression can be evaluated through the RegressionEvaluation class (https://static.javadoc.io/org.deeplearning4j/deeplearning4j-nn/1.0.0-alpha/org/deeplearning4j/eval/RegressionEvaluation.html, DL4J NN). Referring back to the example that we used in the Evaluation for classification section, evaluation for regression can be done in the following way:

val eval = new RegressionEvaluation(3) // 3 = number of output columns in the regression
val output = model.output(testData.getFeatureMatrix)
eval.eval(testData.getLabels, output)
println(eval.stats)

The output produced by the stats method includes the MSE (mean squared error), the MAE (mean absolute error), the RMSE (root mean squared error), the RSE (relative squared error), and the R^2 (coefficient of determination).

ROC (short for Receiver Operating Characteristic, https://en.wikipedia.org/wiki/Receiver_operating_characteristic) is another commonly used metric for the evaluation of classifiers. DL4J provides three different implementations for ROC:

- ROC: for single-output binary classifiers
- ROCBinary: for multi-task binary classifiers (multiple independent binary outputs)
- ROCMultiClass: for multi-class classifiers

All three of the preceding classes can calculate the area under the ROC curve (AUROC), through the calculateAUC method, and the area under the precision-recall curve (AUPRC), through the calculateAUPRC method. These three ROC implementations support two modes of calculation:

- Thresholded: an approximate calculation that bins predictions into a configurable number of thresholds; it uses less memory and is suited to large datasets
- Exact: an exact calculation over every individual prediction; it is suited to smaller datasets
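As a minimal sketch, assuming the same model and testData variables as in the regression example above and a single-output binary classifier, the AUROC could be computed as follows (a constructor argument of 0 selects the exact calculation mode; a positive value selects the thresholded mode):

```scala
import org.deeplearning4j.eval.ROC

// Exact ROC calculation (thresholdSteps = 0); pass a positive
// number of steps to use the approximate, thresholded mode instead
val roc = new ROC(0)
val output = model.output(testData.getFeatureMatrix)
roc.eval(testData.getLabels, output)
println(s"AUROC: ${roc.calculateAUC}")
```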

It is possible to export the ROC and precision-recall charts in HTML format so that they can be viewed in a web browser. The exportRocChartsToHtmlFile method of the EvaluationTools class (https://deeplearning4j.org/api/1.0.0-beta2/org/deeplearning4j/evaluation/EvaluationTools.html) is used to do this export. It expects the ROC implementation to export and a File object (the destination HTML file) as parameters. It saves both curves in a single HTML file.
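For example, assuming roc is a ROC instance that has already been populated through its eval method, and with an arbitrary destination file name, the export could look like this:

```scala
import java.io.File
import org.deeplearning4j.evaluation.EvaluationTools

// Writes both the ROC and the precision-recall chart to a single HTML file
EvaluationTools.exportRocChartsToHtmlFile(roc, new File("roc-charts.html"))
```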

To evaluate networks with binary classification outputs (one or more outputs, each taking the value 0 or 1), the EvaluationBinary class (https://deeplearning4j.org/api/1.0.0-beta2/org/deeplearning4j/eval/EvaluationBinary.html) can be used. The typical classification metrics (accuracy, precision, recall, F1 score, and so on) are calculated for each output. The following is the syntax for this class:

val size: Int = 1 // number of binary outputs to evaluate
val eval: EvaluationBinary = new EvaluationBinary(size)
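The populated instance is then used in the same way as the other evaluation classes; as a sketch, assuming the model and testData variables from the earlier examples:

```scala
// Accumulate statistics for each binary output, then print the per-output metrics
eval.eval(testData.getLabels, model.output(testData.getFeatureMatrix))
println(eval.stats)
```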

What about time series evaluation (in the case of RNNs)? It is quite similar to the evaluation approaches for classification that we have described so far in this chapter. For time series in DL4J, evaluation is performed separately on all the non-masked time steps. But what is masking for RNNs? When sequences of different lengths are batched together, the shorter ones must be padded to a common length; masking marks the padded (missing) time steps so that they can be excluded from training and evaluation. The only difference from the evaluation cases presented previously is the optional presence of these mask arrays. This means that, in many time series cases, you can just use the evaluate or evaluateRegression methods of the MultiLayerNetwork class: whether or not mask arrays are present, they are handled properly by those two methods.
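As a sketch, assuming net is a trained MultiLayerNetwork and testIterator is a DataSetIterator over the test time series (both names are placeholders), the evaluation reduces to a single call:

```scala
// evaluate handles any mask arrays in the test data internally,
// scoring only the non-masked time steps
val eval = net.evaluate(testIterator)
println(eval.stats)
```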

DL4J also provides a way to analyze the calibration of a classifier – the EvaluationCalibration class (https://deeplearning4j.org/api/1.0.0-beta2/org/deeplearning4j/eval/EvaluationCalibration.html). It provides a number of tools for this, such as the following:

- A reliability diagram for each class
- A residual plot (a histogram of the absolute difference between the labels and the predicted probabilities)
- Histograms of the predicted probabilities, both overall and per class

The evaluation of a classifier using this class is performed in a similar manner to the other evaluation classes. Its plots and histograms can be exported in HTML format through the exportevaluationCalibrationToHtmlFile method of the EvaluationTools class. This method expects an EvaluationCalibration instance and a File object (the destination HTML file) as arguments.
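Putting this together, a minimal sketch (again assuming the model and testData variables used earlier, with a hypothetical destination file name) could look like this:

```scala
import java.io.File
import org.deeplearning4j.eval.EvaluationCalibration
import org.deeplearning4j.evaluation.EvaluationTools

// Collect calibration statistics from the test set predictions
val calibration = new EvaluationCalibration()
calibration.eval(testData.getLabels, model.output(testData.getFeatureMatrix))

// Export the reliability diagram, residual plot, and probability
// histograms to a single HTML file
EvaluationTools.exportevaluationCalibrationToHtmlFile(calibration, new File("calibration.html"))
```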