Different evaluation metrics are used for classification and regression tasks. For classification, accuracy is the most commonly used evaluation metric. However, accuracy is only meaningful if the cost of errors is the same for all classes, which is not always the case. For example, in medical diagnosis, the cost of a false negative is much higher than the cost of a false positive. A false negative says that the person is not sick when they actually are, and the resulting delay in diagnosis can have serious, perhaps fatal, consequences. A false positive, on the other hand, says that the person is sick when they are not, which is upsetting for that person but not life threatening.
This issue is compounded when you have imbalanced datasets, that is, when one class is much more common than the other. Going back to our medical diagnosis example, if only 1% of people who get tested actually have the disease, then a machine learning algorithm can achieve 99% accuracy by simply declaring that nobody has the disease. In such cases, you should look at metrics other than accuracy. One metric that is useful for imbalanced datasets is the F1 score, which is the harmonic mean of precision and recall. The formula for the F1 score is as follows:
F1 = 2 * (precision * recall) / (precision + recall)
The formulas for precision and recall are as follows:
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
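The formulas above can be sketched in a few lines of plain Python; the labels below are illustrative values, not taken from the text, with 1 marking the positive class.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: 4 actual positives; the classifier finds 3 of them
# and raises 1 false alarm
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)  # 0.75 0.75 0.75
```

Note that a degenerate classifier that predicts all negatives would score 0 on recall, and therefore 0 on F1, even though its accuracy on an imbalanced dataset could be high.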
For regression, you have a choice of evaluation metrics: MAE, MSE, and RMSE. MAE, or Mean Absolute Error, is the simplest; it is just the average of the absolute differences between the actual and predicted values. The advantage of MAE is that it is easy to interpret: if MAE is 3.5, then the predicted value differs from the actual value by 3.5 on average. MSE, or Mean Squared Error, is the average of the squared errors; that is, it takes the difference between the actual value and the predicted value, squares it, and then averages those values. The advantage of MSE over MAE is that it penalizes errors according to their severity. If the errors for two rows were 2 and 5, MSE would put more weight on the second row because its error is larger. RMSE, or Root Mean Squared Error, is the square root of MSE. The advantage of RMSE over MSE is that it puts the error term back into units that are comparable to the actual values. For regression tasks, RMSE is usually the preferred metric.
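The three regression metrics can be sketched as follows; the actual and predicted values are illustrative, chosen so that two of the errors are 2 and one is 5, echoing the example above.

```python
import math

# Illustrative values: errors of 2, 2, and 5
actual    = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 25.0]

errors = [a - p for a, p in zip(actual, predicted)]

mae  = sum(abs(e) for e in errors) / len(errors)   # (2 + 2 + 5) / 3 = 3.0
mse  = sum(e ** 2 for e in errors) / len(errors)   # (4 + 4 + 25) / 3 = 11.0
rmse = math.sqrt(mse)                              # back in the original units

print(mae, mse, rmse)  # 3.0 11.0 3.3166...
```

Notice how the single error of 5 pulls RMSE (about 3.32) above MAE (3.0): squaring the errors before averaging gives larger errors proportionally more weight.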
For more information on metrics in MXNet, see https://mxnet.incubator.apache.org/api/python/metric/metric.html.
For more information on metrics in Keras, see https://keras.io/metrics/.