Bagging

Bagging, also known as bootstrap aggregation, is a general-purpose procedure for reducing the variance of a machine learning model. It is based on the bootstrap sampling technique and is generally used with regression or classification trees, but in principle it can be used with any model.

The following steps are involved in the bagging process:

  1. We choose the number of estimators, or individual models, to use. Let's call this number B.
  2. We draw B bootstrap samples from the training set, that is, B datasets sampled with replacement, each usually the same size as the training set.
  3. We fit the machine learning model to each of the B bootstrap samples. This way, we get B individual predictors.
  4. We get the ensemble prediction by aggregating all of the individual predictions.
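
The four steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the names `fit_1nn`, `bagging_fit`, and `bagging_predict`, the toy data, and the choice of a 1-nearest-neighbour base model are all assumptions made for the example.

```python
import random
from statistics import mean


def fit_1nn(sample):
    """Hypothetical base learner: a 1-nearest-neighbour regressor on 1-D inputs."""
    def predict(x):
        # Predict the target of the closest training point in this sample.
        nearest_x, nearest_y = min(sample, key=lambda pair: abs(pair[0] - x))
        return nearest_y
    return predict


def bagging_fit(train, fit, B, seed=0):
    """Steps 2 and 3: draw B bootstrap samples and fit one model to each."""
    rng = random.Random(seed)
    predictors = []
    for _ in range(B):
        # Bootstrap sample: len(train) draws from the training set, with replacement.
        sample = rng.choices(train, k=len(train))
        predictors.append(fit(sample))
    return predictors


def bagging_predict(predictors, x, aggregate=mean):
    """Step 4: aggregate the individual predictions (mean by default)."""
    return aggregate([predict(x) for predict in predictors])


# Toy training set of (x, y) pairs; the values are made up for illustration.
train = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.1)]
predictors = bagging_fit(train, fit_1nn, B=25)  # step 1: we choose B = 25
print(bagging_predict(predictors, 2.4))
```

Passing the aggregation function as a parameter keeps the sketch usable for both regression (a mean) and classification (a vote); only step 4 differs between the two.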

In a regression problem, the most common way to get the ensemble prediction is to average all of the individual predictions.
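
Written as a formula, the averaging rule looks as follows. The notation is an assumption introduced here: \hat{f}_b(x) stands for the prediction of the model fitted to the b-th bootstrap sample, and \hat{f}_{\mathrm{bag}}(x) for the ensemble prediction.

```latex
\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}_b(x)
```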

In a classification problem, the most common way to get the aggregated prediction is a majority vote: each individual predictor votes for a category, and the category with the most votes wins. For example, if we have 100 individual predictors and 80 of them vote for one particular category, we choose that category as our aggregated prediction.
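
The vote can be sketched with `collections.Counter` from the Python standard library. The function name `majority_vote` and the labels `"spam"` and `"ham"` are hypothetical, chosen only to mirror the 80-out-of-100 example. Strictly speaking, the sketch returns the most frequent label even when it has fewer than half of the votes, and ties go to the label that was seen first.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label predicted by the largest number of individual predictors."""
    # most_common(1) yields a one-element list of (label, count) pairs.
    label, count = Counter(votes).most_common(1)[0]
    return label


# 100 individual predictions: 80 for one category, 20 for the other.
votes = ["spam"] * 80 + ["ham"] * 20
print(majority_vote(votes))  # prints "spam"
```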