Day of the week

As a sanity check that the data was imported correctly, let's also explore the VDAYR variable, which indicates the day of the week on which the patient visit occurred:

X_train.groupby('VDAYR').size()

The output is as follows:

VDAYR
1    2559
2    2972
3    2791
4    2632
5    2553
6    2569
7    2506
dtype: int64

As we would expect, there are seven possible values, and the observations are fairly evenly distributed across them, with counts ranging from 2,506 to 2,972 per day. We could get fancy and engineer a WEEKEND feature, but engineering additional features can cost considerable time and memory, often for minimal gain. We'll leave that exercise up to the reader.
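For readers who do want to try the exercise, here is a minimal sketch of one way a WEEKEND feature might be derived. It runs on a small toy DataFrame rather than the real X_train, and the helper name add_weekend_feature is hypothetical. It also assumes that VDAYR codes 1 and 7 stand for Sunday and Saturday; confirm that coding against the data documentation before relying on it.

```python
import pandas as pd

# Toy stand-in for X_train; the real frame comes from the imported survey data.
X_demo = pd.DataFrame({'VDAYR': [1, 2, 3, 4, 5, 6, 7, 7, 1]})

# ASSUMPTION: 1 = Sunday and 7 = Saturday. Verify in the data documentation.
WEEKEND_CODES = [1, 7]

def add_weekend_feature(df, day_col='VDAYR', weekend_codes=WEEKEND_CODES):
    """Return a copy of df with a binary WEEKEND column (1 = weekend visit)."""
    out = df.copy()
    # Coerce to numeric in case the column was imported as strings.
    days = pd.to_numeric(out[day_col], errors='coerce')
    # Missing or unparseable day codes fall through as 0 (not a weekend).
    out['WEEKEND'] = days.isin(weekend_codes).astype(int)
    return out

X_demo = add_weekend_feature(X_demo)
print(X_demo.groupby('WEEKEND').size())
```

Applied to the real data, the same call would be add_weekend_feature(X_train), and any test set would need the identical transformation so that both frames keep the same columns.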