4

Means and their differences

DOUGLAS G ALTMAN, MARTIN J GARDNER

The rationale behind the use of confidence intervals was described in chapters 1 and 3. Here formulae for calculating confidence intervals are given for means and their differences. There is a common underlying principle of subtracting and adding to the sample statistic a multiple of its standard error (SE). This extends to other statistics, such as proportions and regression coefficients, but is not universal.

Confidence intervals for means are constructed using the t distribution if the data have an approximately Normal distribution. For differences between two means the data should also have similar standard deviations (SDs) in each study group. This is implicit in the example given in chapter 3 and in the worked examples below. The calculations have been carried out to full arithmetical precision, as is recommended practice (see chapter 14), but intermediate steps are shown as rounded results.

The case of non-Normal data is discussed both in this chapter and in chapter 5.

A confidence interval indicates the precision of the sample mean or the difference between two sample means as an estimate of the overall population value. As such, confidence intervals convey the effects of sampling variation but cannot control for non-sampling errors in study design or conduct.

Single sample

The confidence interval for a population mean is derived using the mean $\bar{x}$ and its standard error SE($\bar{x}$) from a sample of size n. For this case the standard error is obtained simply from the sample standard deviation (SD) as SE = SD/$\sqrt{n}$. Thus, the confidence interval is given by

$$\bar{x} - \left(t_{1-\alpha/2} \times \mathrm{SE}(\bar{x})\right) \quad \text{to} \quad \bar{x} + \left(t_{1-\alpha/2} \times \mathrm{SE}(\bar{x})\right)$$

where $t_{1-\alpha/2}$ is the appropriate value from the t distribution with n – 1 degrees of freedom associated with a “confidence” of 100(1 – α)%. For a 95% confidence interval α is 0·05, for a 99% confidence interval α is 0·01, and so on. Values of t can be found from Table 18.2 or in statistical textbooks.1,2 For a 95% confidence interval the value of t will be close to 2 for samples of 20 upwards but noticeably greater than 2 for smaller samples.

Worked example

Blood pressure levels were measured in a sample of 100 diabetic men aged 40–49 years. The mean systolic blood pressure was 146·4 mmHg and the standard deviation 18·5 mmHg. The standard error of the mean is thus found as 18·5/$\sqrt{100}$ = 1·85 mmHg.

To calculate the 95% confidence interval the appropriate value of $t_{0·975}$ with 99 degrees of freedom is 1·984. The 95% confidence interval for the population value of the mean systolic blood pressure is then given by

$$146·4 - (1·984 \times 1·85) \quad \text{to} \quad 146·4 + (1·984 \times 1·85)$$

that is, from 142·7 to 150·1 mmHg.
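The arithmetic of this worked example can be sketched in a few lines of Python. This is an illustration only: the function name `mean_ci` is an assumption of the sketch, and the t value is typed in from the text rather than computed.

```python
import math

def mean_ci(mean, sd, n, t_value):
    """Return (standard error, lower limit, upper limit) for a single mean.

    t_value is the tabulated t quantile for n - 1 degrees of freedom;
    it is supplied by the caller, not looked up here.
    """
    se = sd / math.sqrt(n)
    return se, mean - t_value * se, mean + t_value * se

# Summary statistics and t value (99 degrees of freedom) quoted in the text.
se, lower, upper = mean_ci(mean=146.4, sd=18.5, n=100, t_value=1.984)
print(f"SE = {se:.2f} mmHg; 95% CI {lower:.1f} to {upper:.1f} mmHg")
```

With the figures above this gives a standard error of 1·85 mmHg and limits of 142·7 and 150·1 mmHg, in agreement with the worked example.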

Two samples: unpaired case

The confidence interval for the difference between two population means is derived in a similar way. Suppose $\bar{x}_1$ and $\bar{x}_2$ are the two sample means, $s_1$ and $s_2$ the corresponding standard deviations, and $n_1$ and $n_2$ the sample sizes. Firstly, we need a “pooled” estimate of the standard deviation, which is given by

$$s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

From this the standard error of the difference between the two sample means is

$$\mathrm{SE}(d) = s \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where $d = \bar{x}_1 - \bar{x}_2$. The 100(1 – α)% confidence interval for the difference in the two population means is then

$$d - \left(t_{1-\alpha/2} \times \mathrm{SE}(d)\right) \quad \text{to} \quad d + \left(t_{1-\alpha/2} \times \mathrm{SE}(d)\right)$$

where $t_{1-\alpha/2}$ is taken from the t distribution with $n_1 + n_2 - 2$ degrees of freedom (see Table 18.2).

If the standard deviations differ considerably then a common pooled estimate is not appropriate unless a suitable transformation of scale can be found.3 Otherwise obtaining a confidence interval is more complex.4

Worked example

Blood pressure levels were measured in 100 diabetic and 100 non-diabetic men aged 40–49 years. Mean systolic blood pressures were 146·4 mmHg (SD 18·5) among the diabetics and 140·4 mmHg (SD 16·8) among the non-diabetics, giving a difference between sample means of 6·0 mmHg.

Using the formulae given above the pooled estimate of the standard deviation is

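$$s = \sqrt{\frac{99 \times 18·5^2 + 99 \times 16·8^2}{198}} = 17·7 \text{ mmHg}$$

and the standard error of the difference between the sample means is

$$\mathrm{SE}(d) = 17·7 \times \sqrt{\frac{1}{100} + \frac{1}{100}} = 2·50 \text{ mmHg}$$

To calculate the 95% confidence interval the appropriate value of $t_{1-\alpha/2}$ with 198 degrees of freedom is 1·972. Thus the 95% confidence interval for the difference in population means is given by

$$6·0 - (1·972 \times 2·50) \quad \text{to} \quad 6·0 + (1·972 \times 2·50)$$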

that is, from 1·1 to 10·9 mmHg, as shown in Figure 3.1.
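The steps of this unpaired calculation (pooled standard deviation, standard error of the difference, then the interval) can be sketched in Python as follows. The function names `pooled_sd` and `unpaired_ci` are assumptions of the sketch, and the t value is again taken from the text rather than computed. The method assumes similar standard deviations in the two groups.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled estimate of the standard deviation from two samples."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def unpaired_ci(mean1, s1, n1, mean2, s2, n2, t_value):
    """Return (pooled SD, SE of difference, lower limit, upper limit).

    t_value is the tabulated t quantile for n1 + n2 - 2 degrees of freedom.
    """
    s = pooled_sd(s1, n1, s2, n2)
    se = s * math.sqrt(1 / n1 + 1 / n2)
    d = mean1 - mean2
    return s, se, d - t_value * se, d + t_value * se

# Diabetic versus non-diabetic men; t value for 198 degrees of freedom.
s, se, lower, upper = unpaired_ci(146.4, 18.5, 100, 140.4, 16.8, 100, t_value=1.972)
print(f"pooled SD = {s:.1f}, SE = {se:.2f}, 95% CI {lower:.1f} to {upper:.1f} mmHg")
```

Because the sketch carries full precision through every step, the rounded results agree with those shown above (17·7, 2·50, and 1·1 to 10·9 mmHg).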

Suppose now that the samples had been of only 50 men each but that the means and standard deviations had been the same. Then the pooled standard deviation would remain 17·7 mmHg, but the standard error of the difference between the sample means would become

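$$\mathrm{SE}(d) = 17·7 \times \sqrt{\frac{1}{50} + \frac{1}{50}} = 3·53 \text{ mmHg}$$

The appropriate value of $t_{1-\alpha/2}$ on 98 degrees of freedom is 1·984, and the 95% confidence interval is calculated as

$$6·0 - (1·984 \times 3·53) \quad \text{to} \quad 6·0 + (1·984 \times 3·53)$$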

that is, from –1·0 to 13·0 mmHg, as shown in Figure 3.2.

For the original samples of 100 each the appropriate values of $t_{0·995}$ and $t_{0·95}$ with 198 degrees of freedom to calculate the 99% and 90% confidence intervals are 2·601 and 1·653, respectively. Thus the 99% confidence interval is calculated as

$$6·0 - (2·601 \times 2·50) \quad \text{to} \quad 6·0 + (2·601 \times 2·50)$$

that is, from –0·5 to 12·5 mmHg (Figure 3.3), and the 90% confidence interval is given by

$$6·0 - (1·653 \times 2·50) \quad \text{to} \quad 6·0 + (1·653 \times 2·50)$$

that is, from 1·9 to 10·1 mmHg (Figure 3.3).
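How the interval widens with smaller samples and with higher confidence levels can be seen by running the same calculation over the four cases discussed above. The following Python sketch does so; the helper name `diff_ci` and the layout of `cases` are assumptions of the sketch, and all t values are those quoted in the text.

```python
import math

def diff_ci(d, pooled_sd, n1, n2, t_value):
    """Confidence limits for a difference d between two sample means."""
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    return d - t_value * se, d + t_value * se

# With equal group sizes the pooled variance is the mean of the two variances.
s = math.sqrt((18.5**2 + 16.8**2) / 2)

# (label, number of men per group, t value quoted in the text)
cases = [
    ("95%, n = 100", 100, 1.972),
    ("95%, n = 50", 50, 1.984),
    ("99%, n = 100", 100, 2.601),
    ("90%, n = 100", 100, 1.653),
]

results = {}
for label, n, t_value in cases:
    lower, upper = diff_ci(6.0, s, n, n, t_value)
    results[label] = (lower, upper)
    print(f"{label}: {lower:.1f} to {upper:.1f} mmHg (width {upper - lower:.1f})")
```

The pooled standard deviation is unchanged across the four cases; only the standard error (through the sample size) and the multiplier t (through the confidence level and degrees of freedom) alter the width.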

Two samples: paired case

Paired data arise in studies of repeated measurements (for example, at different times or in different circumstances on the same subjects) and matched case-control comparisons. For such data the same formulae as for the single sample case are used to calculate the confidence interval, where $\bar{x}$ and SD are now the mean and standard deviation of the individual within subject or patient–control differences.

Worked example

Systolic blood pressure levels were measured in 16 middle-aged men before and after a standard exercise, giving the results shown in Table 4.1.

The mean difference (rise) in systolic blood pressure following exercise was 6·6 mmHg. The standard deviation of the differences, shown in the last column of Table 4.1, is 6·0 mmHg. Thus the standard error of the mean difference is found as 6·0/$\sqrt{16}$ = 1·49 mmHg (calculated from the unrounded standard deviation).

Table 4.1 Systolic blood pressure levels (mmHg) in 16 men before and after exercise

[Table 4.1: image not reproduced]

To calculate the 95% confidence interval the appropriate value of $t_{0·975}$ with 15 degrees of freedom is 2·131. The 95% confidence interval for the population value of the mean systolic blood pressure increase after the standard exercise is then given by

$$6·6 - (2·131 \times 1·49) \quad \text{to} \quad 6·6 + (2·131 \times 1·49)$$

that is, from 3·4 to 9·8 mmHg.
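The paired method reduces to the single sample method applied to the within-subject differences, which the following Python sketch makes explicit. The function name `paired_ci` is an assumption of the sketch, and the five pairs of readings are hypothetical values chosen for illustration; they are not the data of Table 4.1.

```python
import math
import statistics

def paired_ci(before, after, t_value):
    """Mean within-subject difference (after minus before) and its limits.

    t_value is the tabulated t quantile for n - 1 degrees of freedom,
    where n is the number of pairs.
    """
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    # statistics.stdev uses the n - 1 divisor, as required here.
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff - t_value * se, mean_diff + t_value * se

# Hypothetical readings (mmHg) for five subjects, NOT the data of Table 4.1.
before = [120, 130, 125, 140, 135]
after = [126, 134, 133, 143, 144]

# 2.776 is the 0.975 point of the t distribution with 5 - 1 = 4 degrees of freedom.
mean_diff, lower, upper = paired_ci(before, after, t_value=2.776)
print(f"mean rise {mean_diff:.1f} mmHg; 95% CI {lower:.1f} to {upper:.1f} mmHg")
```

Note that the standard deviation is that of the differences, not of the before or after readings themselves; treating the two columns as unpaired samples would ignore the pairing.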

Non-Normal data

The sample data may have to be transformed on to a different scale to achieve approximate Normality. The most common reason is because the distribution of the observations is skewed, with a long “tail” of high values. The logarithmic transformation is the most frequently used. Transformation often also helps to make the standard deviations on the transformed scale in different groups more similar.5

Single sample

For a single sample a mean and confidence interval can be constructed from the transformed data and then transformed back to the original scale of measurement.6 This is preferable to presenting the results in units of, say, log mmHg. With highly skewed or otherwise awkward data the median may be preferable to the mean as a measure of central tendency and used with non-parametric methods of analysis. Confidence intervals can be calculated for the median (see chapter 5).

Worked example

Table 4.2 shows T4 and T8 lymphocyte counts in 28 haemophiliacs7 ranked in increasing order of the T4 counts.

Suppose that we wish to calculate a confidence interval for the mean T4 lymphocyte count in the population of haemophiliacs. Inspection of histograms and plots of the data reveals that whereas the distribution of T4 values is skewed, after logarithmic transformation the values of $\log_e(\text{T4})$ have a symmetric near Normal distribution. We can thus apply the method given previously for calculating a confidence interval for a population mean derived from a single sample of observations.

The mean of the values of $\log_e(\text{T4})$ is –0·2896 and the standard deviation is 0·5921. Thus the standard error of the mean is found as 0·5921/$\sqrt{28}$ = 0·1119. The units here are log lymphocyte counts × $10^9$/l.

To calculate the 95% confidence interval the appropriate value of $t_{0·975}$ with 27 degrees of freedom is 2·052. The 95% confidence interval for the mean $\log_e(\text{T4})$ in the population is then given by

$$-0·2896 - (2·052 \times 0·1119) \quad \text{to} \quad -0·2896 + (2·052 \times 0·1119)$$

that is, from –0·5192 to –0·0600.

Table 4.2 T4 and T8 lymphocyte counts (× $10^9$/l) in 28 haemophiliacs7

[Table 4.2: image not reproduced]

We can transform this confidence interval on the logarithmic scale back to the original units to get a more meaningful confidence interval. First we transform back the mean of $\log_e(\text{T4})$ to get the geometric mean T4 count. This is given as exp(–0·2896) = 0·75 × $10^9$/l. (The geometric mean is found as the antilog of the mean of the log values.) In the same way we can transform back the values describing the confidence interval to get a 95% confidence interval for the geometric mean T4 lymphocyte count in the population of haemophiliacs, which is thus given by

$$\exp(-0·5192) \quad \text{to} \quad \exp(-0·0600)$$

that is, from 0·59 to 0·94 × $10^9$/l.
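The back-transformation can be sketched in Python as follows, working from the summary statistics on the log scale quoted above. The function name `geometric_mean_ci` is an assumption of the sketch, and the t value is taken from the text.

```python
import math

def geometric_mean_ci(mean_log, sd_log, n, t_value):
    """Geometric mean and confidence limits from log-scale summary statistics.

    The interval is calculated on the natural log scale and each limit
    is then antilogged (exponentiated) back to the original units.
    """
    se_log = sd_log / math.sqrt(n)
    lower_log = mean_log - t_value * se_log
    upper_log = mean_log + t_value * se_log
    return math.exp(mean_log), math.exp(lower_log), math.exp(upper_log)

# Log-scale mean and SD of the 28 T4 counts; t value for 27 degrees of freedom.
gm, lower, upper = geometric_mean_ci(-0.2896, 0.5921, 28, t_value=2.052)
print(f"geometric mean {gm:.2f}; 95% CI {lower:.2f} to {upper:.2f} (x 10^9/l)")
```

The resulting interval (0·59 to 0·94) is not symmetric about the geometric mean of 0·75, which is to be expected: it is symmetric only on the log scale.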

Two samples

For the case of two samples, only the logarithmic transformation is suitable.5 For paired or unpaired samples the confidence interval for the difference in the means of the transformed data has to be transformed back. For the log transformation the antilog of the difference in sample means on the transformed scale is an estimate of the ratio of the two population (geometric) means, and the antilogged confidence interval for the difference gives a confidence interval for this ratio. Other transformations do not lead to sensible confidence intervals when transformed back,5 but a non-parametric approach can be used to calculate a confidence interval for the population difference between medians (see chapter 5).

Worked example

Suppose that we wish to calculate a confidence interval for the difference between the T4 and T8 counts in the population of haemophiliacs using the results given in Table 4.2. Inspection of histograms and plots of these data reveals that the distribution of the differences T4 – T8 is skewed, but after logarithmic transformation the differences $\log_e(\text{T4}) - \log_e(\text{T8})$ have a symmetric near Normal distribution. We can thus apply the method given previously for calculating a confidence interval from paired samples. The method makes use of the fact that the difference between the logarithms of two quantities is exactly the same as the logarithm of their ratio. Thus

$$\log_e(\text{T4}) - \log_e(\text{T8}) = \log_e\left(\frac{\text{T4}}{\text{T8}}\right)$$

The mean of the differences between the logs of the T4 and T8 counts (shown in the final column of Table 4.2) is 0·5154 and the standard deviation is 0·5276. Thus the standard error of the mean is found as 0·5276/$\sqrt{28}$ = 0·0997.

To calculate the 95% confidence interval the appropriate value of $t_{0·975}$ with 27 degrees of freedom is 2·052. The 95% confidence interval for the difference between the mean values of $\log_e(\text{T4})$ and $\log_e(\text{T8})$ in the population of haemophiliacs is then given by

$$0·5154 - (2·052 \times 0·0997) \quad \text{to} \quad 0·5154 + (2·052 \times 0·0997)$$

that is, from 0·3108 to 0·7200.

The confidence interval for the difference between log counts is not as easy to interpret as a confidence interval relating to the actual counts. We can take antilogs of the above values to get a more useful confidence interval. The antilog of the mean difference between log counts is exp(0·5154) = 1·67. Because of the equivalence of the difference $\log_e(\text{T4}) - \log_e(\text{T8})$ and $\log_e(\text{T4}/\text{T8})$ this value is an estimate of the geometric mean of the ratio T4/T8 in the population. The antilogs of the values describing the confidence interval are exp(0·3108) = 1·36 and exp(0·7200) = 2·05, and these values provide a 95% confidence interval for the geometric mean ratio of T4 to T8 lymphocyte counts in the population of haemophiliacs.

Note that whereas for a single sample the use of the log transformation still leads to a confidence interval in the original units, for paired samples the confidence interval is in terms of a ratio and has no units.
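A short Python sketch of the paired log-scale calculation is given here, again working from the summary statistics quoted in the text. The function name `geometric_mean_ratio_ci` is an assumption of the sketch, and the pair of counts used to check the logarithm identity is hypothetical.

```python
import math

# The identity underlying the method, checked on a hypothetical pair of counts.
t4, t8 = 0.80, 0.50
assert math.isclose(math.log(t4) - math.log(t8), math.log(t4 / t8))

def geometric_mean_ratio_ci(mean_log_diff, sd_log_diff, n, t_value):
    """Geometric mean ratio and its limits from paired log differences.

    The inputs are the mean and SD of the within-pair differences of the
    natural logs; the outputs are ratios and therefore have no units.
    """
    se = sd_log_diff / math.sqrt(n)
    lower_log = mean_log_diff - t_value * se
    upper_log = mean_log_diff + t_value * se
    return math.exp(mean_log_diff), math.exp(lower_log), math.exp(upper_log)

# Mean and SD of log(T4) - log(T8) in 28 subjects; t value for 27 degrees of freedom.
ratio, lower, upper = geometric_mean_ratio_ci(0.5154, 0.5276, 28, t_value=2.052)
print(f"geometric mean ratio {ratio:.2f}; 95% CI {lower:.2f} to {upper:.2f}")
```

Since the lower limit exceeds 1, the interval lies wholly on the side of T4 counts being higher than T8 counts; a ratio of 1 on this scale corresponds to a difference of 0 on the log scale.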

If log transformation is considered necessary, a confidence interval for the difference in the means of two unpaired samples is derived in much the same way as for paired samples. The log data are used to calculate a confidence interval, using the method for unpaired samples given previously. The antilogs of the difference in the means of the log data and the values describing its confidence interval give the geometric mean ratio and its associated confidence interval.

Comment

The sampling distribution of a mean (and the difference between two means) will become more like a Normal distribution as the sample size increases. However, study sizes typical in medical research are usually not large enough to rely on this property, especially for a single mean, so it is useful to use the log transformation for skewed data. An exception is where one is interested only in the difference between means, not their ratio, such as in studies of cost data.8

1 Altman DG. Practical statistics for medical research. London: Chapman & Hall, 1991.

2 Campbell MJ, Machin D. Medical statistics. A commonsense approach. 3rd edn. Chichester: John Wiley, 1999.

3 Bland JM, Altman DG. Transforming data. BMJ 1996;312:770.

4 Armitage P, Berry G. Statistical methods in medical research. 3rd edn. Oxford: Blackwell Science, 1994:111–14.

5 Bland JM, Altman DG. The use of transformation when comparing two means. BMJ 1996;312:1153.

6 Bland JM, Altman DG. Transformations, means, and confidence intervals. BMJ 1996;312:1079.

7 Ball SE, Hows JM, Worsley AM, et al. Seroconversion of human T cell lymphotropic virus III (HTLV-III) in patients with haemophilia: a longitudinal study. BMJ 1985;290:1705–6.

8 Barber JA, Thompson SG. Analysis and interpretation of cost data in randomised controlled trials: review of published studies. BMJ 1998;317:1195–200.