We have explored various machine learning techniques and built several models to predict the credit ratings of customers. The question now is which model we should select and how the models compare against each other. Our test data has 130 customers with a bad credit rating (0) and 270 customers with a good credit rating (1).
Earlier, we talked about using domain knowledge and business requirements after modeling to interpret results and make decisions. Our decision now is to choose the model that maximizes profits and minimizes losses for the German bank. Let us consider the following conditions:
Keeping these conditions in mind, we will build a comparison table for the various models, including some of the metrics we calculated earlier for the best model from each machine learning algorithm. Remember that, once all the performance metrics and business requirements are taken into account, no single model is the best on every count. Each model has its own strengths, which is evident in the following analysis:
The cells highlighted in the preceding table show the best performance for that particular metric. As we mentioned earlier, there is no single best model, so we have listed the models that performed best against each metric. Considering the total overall gain, the decision tree seems to be the model of choice. However, this assumes that the credit loan amount requested is the same for every customer.
If customers request loans of different amounts, the total gain figures can no longer be compared in this way, because both the profit earned and the loss incurred vary from one loan to another. This analysis is a bit complex and out of the scope of this chapter, but we will briefly mention how it can be computed. If you remember, there is a credit.amount feature, which specifies the credit amount requested by the customer. Since we already have the customer numbers in the training data, we can match the rated customers with their requested amounts, sum the amounts for the loans on which profits and losses are incurred, and obtain the total gain of the bank for each method.
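The aggregation described above can be sketched in a few lines. This is only a rough illustration in Python, not the chapter's own code: the `RatedCustomer` record, the `total_gain` function, and the `profit_rate` and `loss_rate` parameters are all hypothetical names. It also assumes that a rejected loan contributes neither profit nor loss, which may not match the business conditions listed earlier.

```python
from dataclasses import dataclass


@dataclass
class RatedCustomer:
    """One rated customer (hypothetical record layout)."""
    actual: int           # true credit rating: 1 = good, 0 = bad
    predicted: int        # rating predicted by a model: 1 = good, 0 = bad
    credit_amount: float  # value of the credit.amount feature


def total_gain(customers, profit_rate, loss_rate):
    """Sum profits and losses over the loans a model would approve.

    Assumptions (not taken from the chapter):
    - a loan is granted only when the model predicts a good rating (1);
    - a granted loan to a good customer earns profit_rate * amount;
    - a granted loan to a bad customer loses loss_rate * amount;
    - a rejected loan contributes nothing.
    """
    gain = 0.0
    for c in customers:
        if c.predicted != 1:
            continue  # loan rejected: no profit or loss in this sketch
        if c.actual == 1:
            gain += profit_rate * c.credit_amount
        else:
            gain -= loss_rate * c.credit_amount
    return gain


def compare_models(ratings_by_model, profit_rate, loss_rate):
    """Return {model name: total gain} for each model's rated customers."""
    return {
        name: total_gain(customers, profit_rate, loss_rate)
        for name, customers in ratings_by_model.items()
    }
```

To use it, build one list of `RatedCustomer` records per model from that model's predictions, call `compare_models` with rates that reflect the bank's actual conditions, and compare the resulting totals in place of the constant-amount gain figures.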