The following screenshot shows the dataframe in which we are going to save the models' performance. We are going to run four models, namely logistic regression, bagging, random forest, and boosting:
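A minimal sketch of this setup is shown below, assuming scikit-learn estimators and a pandas DataFrame to hold the results; the estimator settings and variable names are illustrative assumptions, not the book's exact code:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)

# The four models we are going to compare (hyperparameters are assumptions).
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "Bagging": BaggingClassifier(n_estimators=100),
    "RandomForest": RandomForestClassifier(n_estimators=100),
    "Boosting": GradientBoostingClassifier(),
}

# DataFrame where the performance of each model will be saved.
metrics = pd.DataFrame(index=["accuracy", "precision", "recall"],
                       columns=models.keys())
```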
We are going to use the following evaluation metrics in this case:
- accuracy: This metric measures how often the model classifies defaulters and non-defaulters correctly
- precision: This metric measures how often the model is correct when it predicts a default
- recall: This metric measures the proportion of actual defaulters that the model correctly identifies
The most important of these is the recall metric: since we want to maximize the proportion of actual defaulters that the model identifies, the model with the best recall is selected.
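Continuing the sketch above, the following shows one way to fit each model, compute the three metrics on a held-out test set, fill the performance DataFrame, and pick the model with the highest recall. The train/test variables (`X_train`, `X_test`, `y_train`, `y_test`) are assumed to be an existing split of the credit-default data, with 1 marking a defaulter:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    metrics.loc["accuracy", name] = accuracy_score(y_test, y_pred)
    metrics.loc["precision", name] = precision_score(y_test, y_pred)
    metrics.loc["recall", name] = recall_score(y_test, y_pred)

print(metrics)

# Select the model whose recall on the test set is highest.
best_model = metrics.loc["recall"].astype(float).idxmax()
print(f"Model with the best recall: {best_model}")
```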