Age

As people get older, one can expect them to be sicker and to be admitted to the hospital more frequently. This hypothesis will be tested once we see the variable importance results of our model.

There are three variables that reflect the age in the dataset. AGE is an integer value that gives the age in years. AGEDAYS is an integer value that gives the age in days if the patient is less than 1 year old. AGER is the age variable, except that it has been converted to a categorical variable. Let's convert the AGE variable to a numeric type, leave the AGER variable as is, and remove the AGEDAYS variable since it will be not applicable in the vast majority of cases:

X_train.loc[:,'AGE'] = X_train.loc[:,'AGE'].apply(pd.to_numeric)
X_test.loc[:,'AGE'] = X_test.loc[:,'AGE'].apply(pd.to_numeric)

X_train.drop('AGEDAYS', axis=1, inplace=True)
X_test.drop('AGEDAYS', axis=1, inplace=True)