Now that we have all of the ingredients, we can replicate the Naive Bayes approach that we are expected to outperform. This approach does not include any additional data preprocessing, attribute selection, or model selection. Because we do not have true labels for the test data, we will apply five-fold cross-validation to evaluate the model on a small dataset.
First, we initialize a Naive Bayes classifier, as follows:
Classifier baselineNB = new NaiveBayes();
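This snippet assumes the relevant Weka classes are on the classpath and imported; a minimal sketch of the imports used by the code in this section is as follows:
// Weka classes used by the baseline snippets in this section
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;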
Next, we pass the classifier to our evaluation function, which loads the data and applies cross-validation. The function returns the area under the ROC curve (AUC) score for each of the three problems, as well as the overall result:
double[] resNB = evaluate(baselineNB);
System.out.println("Naive Bayes\n"
    + "\tchurn: " + resNB[0] + "\n"
    + "\tappetency: " + resNB[1] + "\n"
    + "\tup-sell: " + resNB[2] + "\n"
    + "\toverall: " + resNB[3] + "\n");
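The evaluate function itself is not listed here. A minimal sketch of how such a function might be implemented with Weka's Evaluation class is shown below; the file names, class index, random seed, and the class value used for the AUC are assumptions for illustration, not the exact settings used to produce the results that follow:
import java.util.Random;
import weka.classifiers.AbstractClassifier;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public static double[] evaluate(Classifier model) throws Exception {
    // Hypothetical file names for the three target problems; adjust to your data.
    String[] problems = {"churn", "appetency", "upselling"};
    double[] results = new double[4];
    for (int i = 0; i < problems.length; i++) {
        // Load the dataset for this problem and use the last attribute as the class.
        Instances data = DataSource.read("data/orange_small_train_" + problems[i] + ".arff");
        data.setClassIndex(data.numAttributes() - 1);
        // Run five-fold cross-validation on a fresh copy of the classifier.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(AbstractClassifier.makeCopy(model), data, 5, new Random(1));
        // Area under the ROC curve for the first class value (assumed to be the positive label).
        results[i] = eval.areaUnderROC(0);
        results[3] += results[i];
    }
    // Overall score: the average AUC across the three problems.
    results[3] /= problems.length;
    return results;
}
An implementation along these lines returns one AUC score per problem plus their average as the overall score.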
In our case, the model returns the following results:
Naive Bayes
	churn: 0.5897891153549814
	appetency: 0.630778394752436
	up-sell: 0.6686116692438094
	overall: 0.6297263931170756
These results will serve as a baseline when we tackle the challenge with more advanced modeling. If we process the data with significantly more sophisticated, time-consuming, and complex techniques, we expect the results to be much better; otherwise, we are simply wasting resources. In general, when solving machine learning problems, it is always a good idea to create a simple baseline classifier that serves as a reference point.
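For an even lower reference point, Weka's ZeroR rule, which ignores all attributes and always predicts the majority class, can be passed through the same hypothetical evaluate function sketched above; its AUC hovers around 0.5, the score of random guessing, so any useful model should clear it comfortably:
// ZeroR predicts the majority class for every instance, giving an AUC of about 0.5.
Classifier trivial = new weka.classifiers.rules.ZeroR();
double[] resZeroR = evaluate(trivial);
System.out.println("ZeroR overall: " + resZeroR[3]);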