Now that we have seen how feature engineering techniques help in building predictive models, let's try to improve the performance of these models and evaluate whether the newly built model works better than the previously built one. Then, we will talk about two very important concepts that you must always keep in mind when doing predictive analytics: the reducible and irreducible errors in your predictive models.
Let's first import the necessary modules, as shown in the following screenshot:
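Since the screenshot itself is not reproduced here, the following is a minimal sketch of the kind of import cell used in this notebook, covering the modules needed for the steps that follow:

```python
# Hedged sketch: the exact imports shown in the screenshot are assumed;
# these cover the pandas/numpy/PCA operations used in this section.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
```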
So, let's go to the Jupyter Notebook and take a look at the credit card default dataset that we imported earlier in this chapter. As you can see, some modifications have been made to this dataset:
For this model, instead of transforming the sex and marriage features into the two dummy features we have been using so far, male and married, let's encode the information in a slightly different way: as married_male and not_married_female, and see whether this works better. This is the first transformation that we are doing here. This is what the dataset looks like:
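A minimal sketch of this encoding is shown below on toy data; the column names and value codes (SEX == 1 for male, MARRIAGE == 1 for married, following the UCI credit card default dataset convention) are assumptions:

```python
import pandas as pd

# Toy data standing in for the real dataset; in the assumed encoding,
# SEX == 1 means male and MARRIAGE == 1 means married.
df = pd.DataFrame({"SEX": [1, 2, 1, 2], "MARRIAGE": [1, 1, 2, 2]})

# 1 only when the person is both married and male
df["married_male"] = ((df["SEX"] == 1) & (df["MARRIAGE"] == 1)).astype(int)
# 1 only when the person is neither married nor male
df["not_married_female"] = ((df["SEX"] == 2) & (df["MARRIAGE"] == 2)).astype(int)
```

Note that these two dummies are not exhaustive: an unmarried male or a married female gets 0 in both columns, which is exactly the extra information this encoding captures compared with separate male and married dummies.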
Now, let's do a little more feature engineering. The first thing that we will do is calculate new features, built by subtracting the payment amount from the bill amount, as shown in the following screenshot:
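The subtraction can be sketched as follows on toy data; the column names (bill_amt1..6, pay_amt1..6) and the name of the new delta columns are assumptions:

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real dataset (assumed column names)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(0, 10_000, size=(5, 12)),
    columns=[f"bill_amt{i}" for i in range(1, 7)]
    + [f"pay_amt{i}" for i in range(1, 7)],
)

# New features: how much of each monthly bill was left unpaid
for i in range(1, 7):
    df[f"delta_bill_amt{i}"] = df[f"bill_amt{i}"] - df[f"pay_amt{i}"]
```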
We will use the new features shown in the preceding screenshot to predict the target. Most of the information in the bill amount features is now encoded in these new features, so the originals are no longer needed; but instead of throwing them away, we can reduce the six bill amount features to just one using the PCA technique. So, let's apply PCA to reduce the six features to a single component, which gives us a new feature called bill_amt_new_feat. This was the second feature engineering step that we performed. Finally, for the pay_i features, we will preserve the first one as is, and apply PCA to the last five features, pay_2, pay_3, pay_4, pay_5, and pay_6, to reduce them to just two components. You can use the fit_transform method on the PCA object to get the components.
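The two PCA reductions can be sketched like this on toy data; the column names and the names of the two pay components (pay_pca_1, pay_pca_2) are assumptions, and the real notebook would fit on the actual dataset:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Toy data standing in for the real dataset (assumed column names)
rng = np.random.default_rng(42)
bill_cols = [f"bill_amt{i}" for i in range(1, 7)]
pay_cols = [f"pay_{i}" for i in range(1, 7)]
df = pd.DataFrame(rng.normal(size=(100, 12)), columns=bill_cols + pay_cols)

# Reduce the six bill amount features to a single component
pca_bill = PCA(n_components=1)
df["bill_amt_new_feat"] = pca_bill.fit_transform(df[bill_cols])[:, 0]

# Keep pay_1 as is; reduce pay_2..pay_6 to two components
pca_pay = PCA(n_components=2)
pay_components = pca_pay.fit_transform(df[pay_cols[1:]])
df["pay_pca_1"] = pay_components[:, 0]  # hypothetical column names
df["pay_pca_2"] = pay_components[:, 1]

# The original bill and pay_2..pay_6 columns are no longer needed
df = df.drop(columns=bill_cols + pay_cols[1:])
```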
Now, let's take a look at the following screenshot, showing all of the features that have to do with money. As you can see, the variances here are really huge because the currency amounts are large:
Now, rescale these features by dividing them by 1,000 in order to reduce the variances, as shown in the following screenshot:
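This rescaling is a single vectorized division; which columns count as money features is an assumption in the sketch below:

```python
import pandas as pd

# Toy data: two assumed monetary columns with large currency amounts
df = pd.DataFrame({
    "limit_bal": [200_000, 50_000, 120_000],
    "bill_amt_new_feat": [12_500.0, -6_000.0, 3_250.0],
})

# Divide the money features by 1,000 to shrink their variances
money_cols = ["limit_bal", "bill_amt_new_feat"]
df[money_cols] = df[money_cols] / 1_000
```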
This makes these numbers easier to interpret. So, this was the last transformation that we applied; now let's train our model with these new features.