Getting the data

On the KDD Cup web page, you should see a page that looks similar to the following screenshot. First, under the Small version (230 var.) header, download the training data file. Next, download the three sets of true labels associated with this training data. The following files are found under the Real binary targets (small) header:

  • orange_small_train_appetency.labels
  • orange_small_train_churn.labels
  • orange_small_train_upselling.labels

Save and unzip all of the files marked in the red boxes, as shown in the screenshot:
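Once the files are unzipped, a quick sanity check can confirm that each label file is readable and show how imbalanced the classes are. The following is a minimal sketch, not part of the original workflow: the class name LabelCheck is hypothetical, the file paths are passed as command-line arguments, and the code assumes that each label file holds one integer label per line, with a positive value for the positive class and a negative value for the negative class.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class LabelCheck {

    // Counts positive and negative labels; blank lines are skipped.
    // Assumption: one integer label per line (for example 1 or -1).
    static int[] countLabels(List<String> lines) {
        int positives = 0;
        int negatives = 0;
        for (String line : lines) {
            String value = line.trim();
            if (value.isEmpty()) {
                continue;
            }
            if (Integer.parseInt(value) > 0) {
                positives++;
            } else {
                negatives++;
            }
        }
        return new int[] {positives, negatives};
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path to one unzipped .labels file.
        for (String arg : args) {
            Path path = Paths.get(arg);
            if (!Files.exists(path)) {
                System.out.println("Missing file: " + path);
                continue;
            }
            int[] counts = countLabels(Files.readAllLines(path));
            System.out.println(path.getFileName()
                    + ": positives=" + counts[0]
                    + ", negatives=" + counts[1]);
        }
    }
}
```

Run it with the paths of the three label files as arguments. A file reported as missing usually means that the archive was not unzipped into the expected directory.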

In the following sections, we will first load the data into Weka and apply basic modeling with the Naive Bayes classifier to obtain our own baseline AUC scores. Later, we will look at more advanced modeling techniques and tricks.