While imputing the single missing temperature value was not difficult in our example, imputing the two missing BNP values is more problematic for a number of reasons:
- There is a higher proportion of visits that have a missing BNP value.
- While the normal temperature range is simply 98.6, BNP has a huge normal range of 100 - 400 pg/mL. How do we select which value to impute when doing normal-value imputation?
- The mean of the BNP values in our dataset is 462.5, which is, in fact, abnormal. This means that if we tried mean imputation with this variable, we would be imputing an abnormal value for all patients who didn't have blood drawn, a highly unlikely scenario.
While there is no perfect answer for this problem, if we do try to salvage the raw BNP values (which means imputing the missing values), in this query set, we impute from a uniform distribution of values in the normal range:
sqlite> ALTER TABLE MORT_FINAL_2 ADD COLUMN Raw_BNP INTEGER;
sqlite> UPDATE MORT_FINAL_2 SET Raw_BNP =
(SELECT CAST(Lab_value as INTEGER)
FROM LABS AS L
WHERE MORT_FINAL_2.Pid = L.Pid
AND MORT_FINAL_2.Visit_date = L.Lab_date
AND L.Lab_name = 'Natriuretic peptide B');
sqlite> UPDATE MORT_FINAL_2 SET Raw_BNP =
ROUND(ABS(RANDOM()) % (300 - 250) + 250)
WHERE Raw_BNP IS NULL;