As a sanity check that the data was imported correctly, let's also explore the VDAYR variable, which indicates the day of the week on which the patient visit occurred:
X_train.groupby('VDAYR').size()
The output is as follows:
VDAYR
1    2559
2    2972
3    2791
4    2632
5    2553
6    2569
7    2506
dtype: int64
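To quantify how evenly the visits are spread, the printed counts can be compared against a perfectly uniform split across seven days. The sketch below rebuilds the counts from the output above as a standalone Series (rather than recomputing them from `X_train`) and reports each day's share and its ratio to the uniform expectation.

```python
import pandas as pd

# Visit counts per VDAYR value, copied from the groupby output above
counts = pd.Series(
    [2559, 2972, 2791, 2632, 2553, 2569, 2506],
    index=pd.Index(range(1, 8), name='VDAYR'),
)

total = counts.sum()      # total number of visits in the training set
expected = total / 7      # visits per day under a perfectly uniform split
share = counts / total    # observed fraction of visits on each day

summary = pd.DataFrame({
    'count': counts,
    'share': share.round(3),
    'ratio_to_uniform': (counts / expected).round(3),
})
print(summary)
```

With these counts, every day's ratio falls between roughly 0.94 and 1.12, which is what "relatively uniform" means here.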
As we would expect, there are seven possible values, and the visits are spread fairly evenly across them (between roughly 2,500 and 3,000 visits per day). We could get fancier and engineer a WEEKEND feature, but engineering additional features can be time-consuming and memory-intensive, often for minimal gain, so we leave that as an exercise for the reader.
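For readers who want to try the WEEKEND exercise, here is a minimal sketch. It assumes that VDAYR is coded 1 = Sunday through 7 = Saturday, so that codes 1 and 7 are the weekend; that mapping is an assumption and should be checked against the survey's codebook. The small DataFrame is a hypothetical stand-in for `X_train`.

```python
import pandas as pd

# Hypothetical stand-in for X_train; only the VDAYR column matters here
X_train = pd.DataFrame({'VDAYR': [1, 2, 3, 4, 5, 6, 7, 2]})

# ASSUMPTION: VDAYR is coded 1 = Sunday ... 7 = Saturday.
# Verify this against the codebook before relying on it.
WEEKEND_CODES = [1, 7]

# Binary indicator: 1 if the visit fell on a weekend day, else 0
X_train['WEEKEND'] = X_train['VDAYR'].isin(WEEKEND_CODES).astype(int)

print(X_train.groupby('WEEKEND').size())
```

The same transformation would need to be applied to the test set so that both sets have matching columns.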