Fox News was excited: ‘Unplanned children develop more slowly, study finds’. The Telegraph was equally shrill in its headline: ‘IVF children have bigger vocabulary than unplanned children’. And the British Medical Journal press release drove it all: ‘Children born after an unwanted pregnancy are slower to develop’.
The last two, at least, made a good effort to explain that this effect disappeared when the researchers accounted for social and demographic factors. But was there ever any point in reporting the raw finding, from before this correction was made?
I will now demonstrate, with a nerdy table illustration, how you correct for things such as social and demographic factors. You’ll have to pay attention, because this is a tricky concept; but at the end, when the mystery is gone, you will see why reporting the unadjusted figures as the finding, especially in a headline, is simply wrong.
Correcting for an extra factor is best understood by doing something called ‘stratification’. Imagine you do a study, and you find that people who drink are three times more likely to get lung cancer than people who don’t. The results are in Table 1. Your odds of getting lung cancer as a drinker are 0.16 (that’s 366 ÷ 2,300). Your odds as a non-drinker are 0.05. So your odds of getting lung cancer are three times higher as a drinker (0.16 ÷ 0.05 is roughly 3, and that figure is called the ‘odds ratio’).
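If you want to check that arithmetic yourself, here is a minimal sketch in Python. The drinker counts (366 with cancer, 2,300 without) are quoted above; the non-drinker cell counts are assumptions of mine, chosen purely so that they reproduce the stated odds of 0.05.

```python
# Table 1 arithmetic. The drinker counts are quoted in the text; the
# non-drinker counts are illustrative values chosen to give odds of 0.05.
drinkers = {"cancer": 366, "no_cancer": 2300}
non_drinkers = {"cancer": 98, "no_cancer": 1856}  # assumed cell counts

odds_drinkers = drinkers["cancer"] / drinkers["no_cancer"]              # ~0.16
odds_non_drinkers = non_drinkers["cancer"] / non_drinkers["no_cancer"]  # ~0.05
odds_ratio = odds_drinkers / odds_non_drinkers                          # ~3

print(f"odds (drinkers):     {odds_drinkers:.2f}")
print(f"odds (non-drinkers): {odds_non_drinkers:.2f}")
print(f"odds ratio:          {odds_ratio:.1f}")
```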
But then some clever person comes along and says: Wait, maybe this whole finding is confounded by the fact that drinkers are more likely to smoke cigarettes. That could be an alternative explanation for the apparent relationship between drinking and lung cancer. So you want to factor smoking out.
The way to do this is to chop your data in half, and analyse non-smokers and smokers separately. So you take only the people who smoke, and compare drinkers against non-drinkers; then you take only the people who don’t smoke, and compare drinkers against non-drinkers in that group separately. You can see the results of this in the second and third tables.
Now your findings are a bit weird. Suddenly, since you’ve split the data up by whether people are smokers or not, drinkers and non-drinkers have exactly the same odds of getting lung cancer. The apparent effect of drinking has been eradicated, which means the apparent risk from drinking was entirely due to smoking: smokers had a much higher chance of lung cancer – in fact their odds were 0.3 rather than 0.03, ten times higher – and drinkers were more likely to also be smokers. Looking at the figures in these tables, only 203 out of 1,954 non-drinkers smoked (about one in ten), whereas 1,430 out of 2,666 drinkers did (more than half).
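Here is the same sketch continued, for anyone who wants to see those stratified tables worked through. The stratum totals (1,430 drinkers and 203 non-drinkers who smoke) are quoted above; the cancer counts inside each stratum are back-derived assumptions, chosen so that every group matches the stated odds (0.3 for smokers, 0.03 for non-smokers) and the two strata add up to Table 1.

```python
# Tables 2 and 3: the same (partly assumed) data, split by smoking status.
strata = {
    "smokers": {
        "drinkers":     {"cancer": 330, "no_cancer": 1100},  # 1,430 in total
        "non_drinkers": {"cancer": 47,  "no_cancer": 156},   # 203 in total
    },
    "non_smokers": {
        "drinkers":     {"cancer": 36, "no_cancer": 1200},
        "non_drinkers": {"cancer": 51, "no_cancer": 1700},
    },
}

for stratum, groups in strata.items():
    odds = {g: c["cancer"] / c["no_cancer"] for g, c in groups.items()}
    ratio = odds["drinkers"] / odds["non_drinkers"]
    print(f"{stratum}: drinkers {odds['drinkers']:.2f}, "
          f"non-drinkers {odds['non_drinkers']:.2f}, odds ratio {ratio:.1f}")

# Within each stratum the odds ratio is ~1: hold smoking fixed,
# and drinking adds nothing.
```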
I explained all this with a theoretical example, where the odds of cancer apparently trebled before correction for smoking. Why didn’t I just use the data from the unplanned pregnancies paper? Because in the real world of research, you’re often correcting for lots of things at once. In the case of this BMJ paper, the researchers corrected for parents’ socioeconomic position and qualifications, sex of child, age, language spoken at home, and a huge list of other factors.
When you’re correcting for so many things, you can’t use old-fashioned stratification, as I did in this simple example, because you’d be dividing your data up among so many smaller tables that some would have no people in them at all. That’s why you calculate your adjusted figures using cleverer methods, such as logistic regression and likelihood theory. But it all comes down to the same thing. In our example above, alcohol wasn’t really associated with lung cancer. And in this BMJ paper, unplanned pregnancy wasn’t really associated with slower development. Pretending otherwise is just silly.
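For the curious, here is a hedged sketch of what that kind of adjustment looks like in practice, run on the same illustrative counts as above. It uses the real pandas and statsmodels libraries; the model formulas are my own choice for this toy example, not the ones used in the BMJ paper. The unadjusted logistic regression reproduces the misleading threefold odds ratio, and adding smoking as a covariate makes it vanish, just as the stratification did.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# The same illustrative counts as in the sketches above, expanded to
# one row per person: (drinker, smoker, cancer, count).
counts = [
    (1, 1, 1, 330), (1, 1, 0, 1100),
    (0, 1, 1, 47),  (0, 1, 0, 156),
    (1, 0, 1, 36),  (1, 0, 0, 1200),
    (0, 0, 1, 51),  (0, 0, 0, 1700),
]
df = pd.DataFrame(
    [{"drinker": d, "smoker": s, "cancer": c}
     for d, s, c, n in counts for _ in range(n)]
)

# Unadjusted model: drinking looks harmful, odds ratio ~3.
crude = smf.logit("cancer ~ drinker", data=df).fit(disp=0)
# Adjusted model: add smoking as a covariate and the drinking effect
# vanishes (odds ratio ~1), just as the stratified tables showed.
adjusted = smf.logit("cancer ~ drinker + smoker", data=df).fit(disp=0)

print("crude odds ratio for drinking:   ",
      round(np.exp(crude.params["drinker"]), 2))
print("adjusted odds ratio for drinking:",
      round(np.exp(adjusted.params["drinker"]), 2))
```

Expanding the counts into one row per person is just the laziest way to feed a small table to a regression; a real analysis would fit the model to individual-level data directly, with all those other covariates included too.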