Cross-validation and Parameter Tuning

Predictive analytics is about making predictions for unknown events. To do this, we need models that generalize beyond the data they were trained on, and we assess that generalization with a technique called cross-validation.

Cross-validation is a technique for assessing how the results of a statistical analysis generalize to an independent dataset; it gives a measure of out-of-sample accuracy. It does this by averaging the error over several random partitions of the data into training and test samples. Cross-validation is often used for hyperparameter tuning: we compute the cross-validation error for several candidate values of a parameter and choose the value that gives the lowest average error.
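The idea above can be sketched in a few lines of NumPy. This is a minimal, hand-rolled illustration, not a production implementation: the helper `kfold_cv_error` and the ridge-regression example around it are hypothetical names chosen for this sketch. It partitions the data into k folds, averages the test error over the folds, and then picks the penalty value with the lowest average error.

```python
import numpy as np

def kfold_cv_error(X, y, fit, predict, k=5, seed=0):
    """Average squared test error over k folds of one random partition."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))          # shuffle the row indices once
    folds = np.array_split(idx, k)         # split them into k roughly equal folds
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train], y[train])    # train on the other k-1 folds
        errors.append(np.mean((predict(model, X[test]) - y[test]) ** 2))
    return float(np.mean(errors))          # average error across the k folds

# Hypothetical example: tune the ridge penalty alpha by cross-validation.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

def make_ridge(alpha):
    # Closed-form ridge regression: w = (X'X + alpha*I)^-1 X'y
    fit = lambda Xtr, ytr: np.linalg.solve(
        Xtr.T @ Xtr + alpha * np.eye(Xtr.shape[1]), Xtr.T @ ytr)
    predict = lambda w, Xte: Xte @ w
    return fit, predict

scores = {a: kfold_cv_error(X, y, *make_ridge(a)) for a in (0.01, 1.0, 100.0)}
best_alpha = min(scores, key=scores.get)   # parameter with lowest CV error
```

Because the simulated data is almost perfectly linear, a small penalty should beat a heavy one; the same selection loop works for any model with `fit` and `predict` functions.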

There are many flavors of cross-validation, which fall into two broad kinds: exhaustive and non-exhaustive. Holdout cross-validation and k-fold cross-validation are both non-exhaustive methods. K-fold cross-validation gives a more reliable assessment of a model's performance than a single holdout split, and it is also the basis for hyperparameter tuning, which is the process of choosing the best hyperparameters for our models. Together, k-fold cross-validation and hyperparameter tuning are crucial for building good predictive analytics models.
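In practice, k-fold cross-validation and hyperparameter tuning are usually combined in one step. The sketch below shows one common way to do this, assuming scikit-learn is available; the dataset and the grid of alpha values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# A small regression dataset bundled with scikit-learn.
X, y = load_diabetes(return_X_y=True)

# 5-fold cross-validation with shuffled, reproducible splits.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Try each alpha, scoring every candidate by its average CV error.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=cv,
    scoring="neg_mean_squared_error",  # sklearn maximizes, so MSE is negated
)
search.fit(X, y)

best_alpha = search.best_params_["alpha"]  # value with the lowest CV error
```

After fitting, `search.best_estimator_` is a model refit on all the data with the winning hyperparameter, ready for prediction.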

In this chapter, we are going to cover the following topics: