Making the response variable

In some cases, the response variable that we are trying to predict already exists as a separate, well-defined column. In those cases, converting the response from a string to a numeric type before splitting the data into train and test sets is all that is needed.
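As a minimal sketch of that simpler case, assume a toy DataFrame whose response is already in one column but stored as strings. The column names `AGE` and `OUTCOME` and the values are hypothetical and do not come from the ED dataset:

```python
import pandas as pd

# Hypothetical toy data: 'AGE' and 'OUTCOME' are illustrative names only
df = pd.DataFrame({
    "AGE": [34, 71, 52, 18],
    "OUTCOME": ["0", "1", "1", "0"],  # response stored as strings
})

# Convert the string response to a numeric type;
# pd.to_numeric raises an error if a value cannot be parsed
df["OUTCOME"] = pd.to_numeric(df["OUTCOME"])

# Separate the predictors from the response, ready for a train/test split
X = df.drop(columns="OUTCOME")
y = df["OUTCOME"]
```

Converting before the split means the train and test sets share the same response type.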

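In our specific modeling task, we are trying to predict which patients presenting to the ED will eventually be hospitalized. Judging by the columns combined in the code below, hospitalization here encompasses admission to the hospital (ADMITHOS), transfer to another hospital, whether psychiatric or not (TRANPSYC, TRANOTH), and admission to the observation unit, whether the patient is subsequently hospitalized or discharged (OBSHOS, OBSDIS).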

Accordingly, we must do some data wrangling to assemble all of these various outcomes into a single response variable:

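# Columns that each record one form of hospitalization
response_cols = ['ADMITHOS','TRANOTH','TRANPSYC','OBSHOS','OBSDIS']

# Convert the outcome columns from strings to numbers. Assigning with
# df_ed[response_cols] replaces the columns, so the numeric dtype is kept;
# recent pandas versions may keep the old object dtype with .loc[:, cols].
df_ed[response_cols] = df_ed[response_cols].apply(pd.to_numeric)

# Sum the outcome columns per row, then flag rows with at least one outcome
df_ed['ADMITTEMP'] = df_ed[response_cols].sum(axis=1)
df_ed['ADMITFINAL'] = 0
df_ed.loc[df_ed['ADMITTEMP'] >= 1, 'ADMITFINAL'] = 1

# Drop the original outcome columns and the temporary sum column
df_ed.drop(response_cols, axis=1, inplace=True)
df_ed.drop('ADMITTEMP', axis=1, inplace=True)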

Let's discuss the previous code example in detail: