9

Time to event studies

DAVID MACHIN, MARTIN J GARDNER

It is common in follow-up studies to be concerned with the survival time between the time of entry to the study and a subsequent event.1 The event may be death in a study of cancer, the disappearance of pain in a study comparing different steroids in arthritis, or the return of ovulation after stopping a long-acting method of contraception. These studies often generate some so-called “censored” observations of survival time. Such an observation would occur, for example, on any patient who is still alive at the time of analysis in a randomised trial where death is the end point. In this case the time from allocation to treatment to the latest follow-up visit would be the patient’s censored survival time.

The Kaplan–Meier product limit technique is the recognised approach for calculating survival curves in such studies.2,3 An outline of this method is given here. Details of how to calculate a confidence interval for the population value of the survival proportion at any time during the follow up and the median survival are given. Confidence interval calculations are also described for the difference in survival between two groups as expressed by the difference in survival proportions as well as for the hazard ratio between groups which summarises, for example, the relative death or relapse rate.

In some circumstances, the comparison between groups is adjusted for prognostic variables by means of Cox regression.3 In this case the confidence interval describing the difference between the groups is adjusted for the relevant prognostic variable.

In the survival comparisons context, confidence intervals convey only the effects of sampling variation on the precision of the estimated statistics and cannot control for any non-sampling errors such as bias in the selection of patients or in losses to follow up.

Survival proportions

Single sample

Suppose that the survival times after entry to the study (ordered by increasing duration) of a group of n subjects are t1, t2, t3, …, tn. The proportion of subjects surviving beyond any follow-up time t, often referred to as S(t) but here denoted p for brevity, is estimated by the Kaplan–Meier technique as

$$p = \prod_{t_i \le t} \frac{r_i - d_i}{r_i}$$

where ri is the number of subjects alive just before time ti (the ith ordered survival time), di denotes the number who died at time ti and ∏ indicates multiplication over each time a death occurs up to and including time t.
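The product-limit calculation can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a statistical package: the function name `kaplan_meier` and the seven follow-up times are hypothetical, and a subject censored at the same time as a death is assumed to be still at risk at that time.

```python
from collections import Counter

def kaplan_meier(times, died):
    """Return [(t_i, p_i)] at each distinct death time using the product-limit rule.

    times: follow-up time of each subject
    died:  True if the subject died at that time, False if censored
    """
    deaths = Counter(t for t, d in zip(times, died) if d)
    curve, p = [], 1.0
    for t in sorted(deaths):
        r = sum(1 for u in times if u >= t)  # r_i: alive just before t_i
        d = deaths[t]                        # d_i: deaths at t_i
        p *= (r - d) / r                     # multiply in (r_i - d_i) / r_i
        curve.append((t, p))
    return curve

# Hypothetical data (months); False marks a censored observation.
times = [2, 3, 3, 5, 8, 8, 12]
died = [True, True, False, True, False, True, True]
for t, p in kaplan_meier(times, died):
    print(f"t = {t:2d}  p = {p:.4f}")
```

The survival proportion at any time t is then the last listed value of p at or before t.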

The standard error (SE) of p is given by

$$SE(p) = \sqrt{\frac{p(1 - p)}{n_{\text{effective}}}}$$

where neffective is the “effective” sample size at time t. When there are no censored survival times, neffective will be equal to n, the total number of subjects in the study group. When censored observations are present, the effective sample size is calculated each time a death occurs.4

$$n_{\text{effective}} = n - \text{number of subjects censored before time } t$$

The 100(1 – α)% confidence interval for the population value of the survival proportion p at time t is then calculated as

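$$p - \left[z_{1-\alpha/2} \times SE(p)\right] \quad \text{to} \quad p + \left[z_{1-\alpha/2} \times SE(p)\right]$$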

where z1–α/2 is the appropriate value from the standard Normal distribution for the 100(1 – α/2) percentile. Thus for a 95% confidence interval α = 0·05 and Table 18.1 gives z1–α/2 = 1·96.

There are other and more complex alternatives for the calculation of the SE given here including that of Greenwood5 but, except in situations with very small numbers, these will lead to similar confidence intervals.3

The times at which to estimate survival proportions and their confidence intervals should be determined in advance of the results. They can be chosen according to practical convention— for example, the five-year survival proportions which are often quoted in cancer studies—or according to previous similar studies.

Worked example
Consider the survival experience of the 25 patients randomly assigned to receive γ-linolenic acid for the treatment of colorectal cancer of Dukes’s stage C.6 The ordered survival times (t), the calculated survival proportions (p), and the effective sample sizes (neffective) are shown in Table 9.1.
The data come from a comparative trial, but it may be of interest to quote the two-year survival proportion and its confidence interval for the group receiving γ-linolenic acid. The survival proportion to any follow-up time is taken from the entries in the table for that time if available or otherwise for the time immediately preceding. Thus for two years, t = 24 months, the survival proportion is p = S(24) = 0·5498. The corresponding effective sample size is neffective = 16.

Table 9.1 Survival data by month for 49 patients with Dukes’s C colorectal cancer randomly assigned to receive either γ-linolenic acid or control treatment6

img_095_001.gif
The standard error of this survival proportion is

$$SE(p) = \sqrt{\frac{0.5498 \times (1 - 0.5498)}{16}} = 0.1244$$

The 95% confidence interval for the population value of the survival proportion is then given by

$$0.5498 - (1.96 \times 0.1244) \quad \text{to} \quad 0.5498 + (1.96 \times 0.1244)$$

that is, from 0·31 to 0·79.
The estimated percentage of survivors to two years is thus 55% with a 95% confidence interval of 31% to 79%.
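The arithmetic of this worked example can be checked with a short Python sketch. It assumes the standard error takes the form √[p(1 – p)/neffective], which reproduces the interval quoted above; the function name `survival_ci` is illustrative only.

```python
import math

def survival_ci(p, n_effective, z=1.96):
    """Standard error and confidence limits for a survival proportion p."""
    se = math.sqrt(p * (1 - p) / n_effective)
    return se, p - z * se, p + z * se

# Two-year survival in the gamma-linolenic acid group (values from the text).
se, lower, upper = survival_ci(0.5498, 16)
print(f"SE = {se:.4f}, 95% CI {lower:.2f} to {upper:.2f}")
# prints: SE = 0.1244, 95% CI 0.31 to 0.79
```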

Median survival time

Single sample

If there are no censored observations, for example, if all the patients have died on a clinical trial, then the median survival time, M, is estimated by the middle observation (see also chapter 5) of the ordered survival times t1, t2, …, tn if the number of observations n is odd, and by the average of the (n/2)th and (n/2 + 1)th ordered observations if n is even. Thus

$$M = t_{(n+1)/2} \quad \text{if } n \text{ is odd}$$

or

$$M = \tfrac{1}{2}\left(t_{n/2} + t_{n/2+1}\right) \quad \text{if } n \text{ is even}$$

Worked example
If we ignore the fact that there are censored observations in Table 9.1 and therefore consider all the patients to have died, then the median survival time of the 25 patients receiving γ-linolenic acid is the 13th ordered observation or M = 12 months. Making the same assumption for the 24 patients of the control group the median is the average of the 12th and 13th ordered survival times, that is M = (16 + 18)/2 = 17 months.

In the presence of censored survival times the median survival is estimated by first calculating the Kaplan–Meier survival curve, then finding the value of t that satisfies the equation

$$p = S(t) = 0.5$$

This can be done by extending a horizontal line from p = 0·5 (or 50%) on the vertical axis of the Kaplan–Meier survival curve, until the actual curve is met, then moving vertically down from that point to cut the horizontal time axis at t = M, which is the estimated median survival time.
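The graphical rule amounts to reading off the first time at which the Kaplan–Meier step curve falls to 0·5 or below. A minimal Python sketch follows, assuming the curve is held as a list of (time, proportion) pairs at the death times; both the function name `km_median` and the curve are hypothetical.

```python
def km_median(curve):
    """Return the first time at which the survival proportion is <= 0.5.

    curve: list of (t, p) pairs at successive death times, in increasing t.
    Returns None when the curve never falls to 0.5, in which case the
    median cannot be estimated from the data.
    """
    for t, p in curve:
        if p <= 0.5:
            return t
    return None

# Hypothetical Kaplan-Meier curve (time in months, survival proportion).
curve = [(2, 0.86), (3, 0.71), (5, 0.54), (8, 0.36), (12, 0.0)]
print(km_median(curve))  # prints: 8
```

When the curve is exactly 0·5 over an interval, conventions differ between packages; this sketch simply takes the start of that interval.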

The calculations required for the confidence interval of a median are quite complicated and an explanation of how these are derived is complex.7 The expression for the standard error of the median includes SE(p) described above but evaluated at p = S(M) = 0·5. When p = 0·5,

$$SE(p) = \sqrt{\frac{0.5 \times 0.5}{n_{\text{effective}}}} = \frac{0.5}{\sqrt{n_{\text{effective}}}}$$

The standard error of the median is given by

$$SE(M) = SE(p) \times \frac{t_{\text{small}} - t_{\text{large}}}{p_{\text{large}} - p_{\text{small}}}$$

where tsmall is the smallest observed survival time from the Kaplan–Meier curve for which p is less than or equal to 0·45, while tlarge is the largest observed survival time from the Kaplan–Meier curve for which p is greater than 0·55; psmall and plarge are the survival proportions at these two times. The ratio [(tsmall – tlarge)/(plarge – psmall)] in the above expression is the reciprocal of an estimate of the height of the distribution of survival times at the median. Just as the blood pressure values of chapter 4 have a distribution, in that case taking the Normal distribution form, survival times will also have an underlying distribution of some form. The values of 0·45 and 0·55 are chosen at each side of the median of 0·5 to define “small” and “large” and are arbitrary. Should plarge = psmall, the two values will need to be chosen wider apart. They may be chosen closer to 0·5 for large study sizes.

The 100(1 – α)% confidence interval for the population value of the median survival M is then calculated as

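$$M - \left[z_{1-\alpha/2} \times SE(M)\right] \quad \text{to} \quad M + \left[z_{1-\alpha/2} \times SE(M)\right]$$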

where z1– α/2 is obtained from Table 18.1.

However, we must caution against the uncritical use of this method for small data sets as the value of SE(M) is unreliable in such circumstances, and also the values of tsmall and tlarge will be poorly determined.

Worked example
The Kaplan–Meier survival curve for the control patients of Table 9.1 is shown in Figure 9.1 and the hatched line indicates how the median is estimated. This gives M = 30 months. (We note that this is quite different from the incorrect value given in the illustrative example above.)

Figure 9.1 Kaplan–Meier estimate of the survival curve of 24 patients with Dukes’s C colorectal cancer.6

img_098_001.gif
The effective sample size at 30 months is neffective = 14 so that

$$SE(p) = \frac{0.5}{\sqrt{14}} = 0.1336$$

Reading from Table 9.1 at psmall = 0·3852 < 0·45 gives tsmall = 30 months also, and for plarge = 0·5870 > 0·55 gives tlarge = 20 months.
Thus

$$SE(M) = 0.1336 \times \frac{30 - 20}{0.5870 - 0.3852} = 6.62$$

The 95% confidence interval is therefore

$$30 - (1.96 \times 6.62) \quad \text{to} \quad 30 + (1.96 \times 6.62)$$

that is, from 17·0 to 43·0 months.
The estimated median survival is 30 months with a 95% confidence interval of 17 to 43 months. For the γ-linolenic acid group M = 32 months and SE(M) = 14·18.
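The control-group figures can be reproduced with the Python sketch below. It assumes that SE(M) is SE(p) at p = 0·5 multiplied by (tsmall – tlarge)/(plarge – psmall), a form that returns the 6·62 and the 17 to 43 months interval quoted here; the function name `median_ci` is illustrative.

```python
import math

def median_ci(m, n_effective, t_small, t_large, p_small, p_large, z=1.96):
    """Approximate standard error and confidence limits for a median survival time."""
    se_p = 0.5 / math.sqrt(n_effective)  # SE(p) evaluated at p = 0.5
    se_m = se_p * (t_small - t_large) / (p_large - p_small)
    return se_m, m - z * se_m, m + z * se_m

# Control group values read from the worked example.
se_m, lower, upper = median_ci(30, 14, t_small=30, t_large=20,
                               p_small=0.3852, p_large=0.5870)
print(f"SE(M) = {se_m:.2f}, 95% CI {lower:.1f} to {upper:.1f} months")
# prints: SE(M) = 6.62, 95% CI 17.0 to 43.0 months
```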

Two samples

The difference between survival proportions at any time t in two study groups of sample sizes n1 and n2 is measured by p1 – p2, where p1 = S1(t) and p2 = S2(t) are the survival proportions at time t in groups 1 and 2 respectively.

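The standard error of p1 – p2 is

$$SE(p_1 - p_2) = \sqrt{\frac{p_1(1 - p_1)}{n_{\text{effective},1}} + \frac{p_2(1 - p_2)}{n_{\text{effective},2}}}$$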

where neffective, 1 and neffective, 2 are the effective sample sizes at time t in each group.

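The 100(1 – α)% confidence interval for the population value of p1 – p2 is

$$(p_1 - p_2) - \left[z_{1-\alpha/2} \times SE(p_1 - p_2)\right] \quad \text{to} \quad (p_1 - p_2) + \left[z_{1-\alpha/2} \times SE(p_1 - p_2)\right]$$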

where z1–α/2 is obtained from Table 18.1.

Worked example
The survival experience of the patients receiving γ-linolenic acid and the controls can be compared from the results given in Table 9.1. At two years for example, p1 = 0·5498 and p2 = 0·5136 with neffective, 1 = 16 and neffective, 2 = 17. The estimated difference in two-year survival proportions is thus 0·5498 – 0·5136 = 0·0363.
The standard error of this difference in survival proportions is

$$SE(p_1 - p_2) = \sqrt{\frac{0.5498 \times 0.4502}{16} + \frac{0.5136 \times 0.4864}{17}} = 0.1737$$

The 95% confidence interval for the population value of the difference in two-year survival proportions is then given by

$$0.0363 - (1.96 \times 0.1737) \quad \text{to} \quad 0.0363 + (1.96 \times 0.1737)$$

that is, from –0·30 to 0·38.
Thus the study estimate of the increased survival proportion at two years for the patients given γ-linolenic acid compared with the control group is only about 4%. Moreover, the imprecision in the estimate from this small study is indicated by the 95% confidence interval ranging from –30% to +38%.
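A short Python sketch of the two-group calculation is given below. It assumes the standard error of the difference combines the two single-sample variances, which reproduces the interval quoted above; the function name `diff_proportions_ci` is illustrative.

```python
import math

def diff_proportions_ci(p1, n1_eff, p2, n2_eff, z=1.96):
    """Difference in survival proportions with its SE and confidence limits."""
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1_eff + p2 * (1 - p2) / n2_eff)
    return diff, se, diff - z * se, diff + z * se

# Two-year survival: gamma-linolenic acid group versus controls.
diff, se, lower, upper = diff_proportions_ci(0.5498, 16, 0.5136, 17)
print(f"SE = {se:.4f}, 95% CI {lower:.2f} to {upper:.2f}")
# prints: SE = 0.1737, 95% CI -0.30 to 0.38
```

From the rounded proportions the difference evaluates to 0·0362 rather than the 0·0363 quoted, presumably because the text worked from unrounded values; the interval is unaffected to two decimal places.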

Difference between median survival times

The difference between the median survival times in two study groups of sample sizes n1 and n2 is measured by M1 – M2, where M1 and M2 are the medians in groups 1 and 2 respectively. The standard error of M1 – M2 is

$$SE(M_1 - M_2) = \sqrt{\left[SE(M_1)\right]^2 + \left[SE(M_2)\right]^2}$$

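The 100(1 – α)% confidence interval for the population value of M1 – M2 is

$$(M_1 - M_2) - \left[z_{1-\alpha/2} \times SE(M_1 - M_2)\right] \quad \text{to} \quad (M_1 - M_2) + \left[z_{1-\alpha/2} \times SE(M_1 - M_2)\right]$$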

where z1–α/2 is obtained from Table 18.1.

Worked example
The median survival experiences of the patients receiving γ-linolenic acid and the controls can be compared from the results given in Table 9.1. Thus M1 = 32 and M2 = 30 months, a difference of M1 – M2 = 2 months. The standard error of this difference is estimated by

$$SE(M_1 - M_2) = \sqrt{14.18^2 + 6.62^2} = 15.65$$

The 95% confidence interval for the population value of the difference in medians is then given by

$$2 - (1.96 \times 15.65) \quad \text{to} \quad 2 + (1.96 \times 15.65)$$

that is, from –28·7 to 32·7 months.
Thus the study estimate of the increased median survival for the patients given γ-linolenic acid compared with the control group is only 2 months. Moreover, the imprecision in the estimate from this small study is indicated by the 95% confidence interval ranging from –29 to +33 months.
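The same check for the medians, as a Python sketch: the two standard errors are assumed to combine as the square root of the sum of their squares, which matches the figures above. The function name `diff_medians_ci` is illustrative.

```python
import math

def diff_medians_ci(m1, se1, m2, se2, z=1.96):
    """Difference in median survival times with its SE and confidence limits."""
    diff = m1 - m2
    se = math.hypot(se1, se2)  # sqrt(se1**2 + se2**2)
    return diff, se, diff - z * se, diff + z * se

# Medians and standard errors (months) quoted in the worked examples.
diff, se, lower, upper = diff_medians_ci(32, 14.18, 30, 6.62)
print(f"difference = {diff}, SE = {se:.2f}, 95% CI {lower:.1f} to {upper:.1f}")
# prints: difference = 2, SE = 15.65, 95% CI -28.7 to 32.7
```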

The hazard ratio

In a follow-up study of two groups the ratio of failure rates—for example, death or relapse rates—is termed the “hazard ratio”. It is a common measure of the relative effect of treatments or exposures. If O1 and O2 are the total numbers of deaths observed in the two groups then the corresponding expected numbers of deaths (E1 and E2) assuming an equal risk of dying at each time in both groups, may be calculated as

$$E_1 = \sum \frac{r_{1i}\, d_i}{r_i} \quad \text{and} \quad E_2 = \sum \frac{r_{2i}\, d_i}{r_i}$$

Here r1i and r2i are the numbers of subjects alive and not censored in groups 1 and 2 just before time ti with ri = r1i + r2i; di = d1i + d2i is the number who died at time ti in the two groups combined; and ∑ indicates addition over each time of death.

One estimator of the hazard ratio (HR) is (O1/E1)/(O2/E2) although, for technical reasons, the more complex estimator

$$HR = \exp\left(\frac{O_1 - E_1}{V}\right)$$

where

$$V = \sum \frac{r_{1i}\, r_{2i}\, d_i\,(r_i - d_i)}{r_i^2\,(r_i - 1)}$$

is more appropriate.
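The quantities O, E and V can be accumulated over the death times as in the Python sketch below. It assumes the usual logrank forms, E1 = ∑ r1i di/ri and V = ∑ r1i r2i di (ri – di)/[ri²(ri – 1)]; the function name `logrank_terms` and the four-row table are hypothetical and are not the trial data.

```python
def logrank_terms(table):
    """Accumulate observed deaths, expected deaths and V over the death times.

    table: list of (r1, r2, d1, d2), one row per distinct death time, giving
    the numbers at risk just before that time and the deaths at that time.
    """
    o1 = o2 = 0
    e1 = e2 = v = 0.0
    for r1, r2, d1, d2 in table:
        r, d = r1 + r2, d1 + d2
        o1 += d1
        o2 += d2
        e1 += r1 * d / r
        e2 += r2 * d / r
        if r > 1:  # the variance term is undefined when only one subject remains
            v += r1 * r2 * d * (r - d) / (r ** 2 * (r - 1))
    return o1, e1, o2, e2, v

# Hypothetical table: (at risk group 1, at risk group 2, deaths 1, deaths 2).
table = [(5, 5, 1, 0), (4, 5, 0, 1), (4, 4, 1, 1), (3, 3, 0, 1)]
o1, e1, o2, e2, v = logrank_terms(table)
print(f"O1 = {o1}, E1 = {e1:.2f}, O2 = {o2}, E2 = {e2:.2f}, V = {v:.2f}")
```

A useful check on any such calculation is that E1 + E2 equals O1 + O2.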

To obtain a 100(1 – α)% confidence interval for the population value of the hazard ratio one first calculates the two quantities

$$X = \frac{O_1 - E_1}{V} \quad \text{and} \quad Y = \frac{z_{1-\alpha/2}}{\sqrt{V}}$$

where z1–α/2 is the appropriate value from the standard Normal distribution for the 100(1 – α/2) percentile (see Table 18.1). Thus for a 95% confidence interval α = 0·05 and z1–α/2 = 1·96.

The hazard ratio is then estimated by HR = e^X and the confidence interval for the hazard ratio by8

$$e^{X - Y} \quad \text{to} \quad e^{X + Y}$$

The hazard ratio calculated from (O1/E1)/(O2/E2) will be close to e^X except in unusual data sets.

Worked example
For the data at the end of the trial, shown in Table 9.1, O1 = 10, E1 = 11·37, O2 = 12, E2 = 10·63, and V = 4·99.
The values of X and Y, with α = 0·05, are

$$X = \frac{10 - 11.37}{4.99} = -0.2745 \quad \text{and} \quad Y = \frac{1.96}{\sqrt{4.99}} = 0.8774$$

The hazard ratio is thus estimated as

$$HR = e^{-0.2745} = 0.76$$

The 95% confidence interval for the population value of the hazard ratio is then given by

$$e^{-0.2745 - 0.8774} \quad \text{to} \quad e^{-0.2745 + 0.8774}$$

that is, from 0·32 to 1·83.
The results indicate that treatment with γ-linolenic acid has been associated with an estimated reduction in mortality to 76% of that for the control treatment, while the alternative hazard ratio calculation gives a similar figure of 78%. The reduction, however, is imprecisely estimated as shown by the wide confidence interval of 32% to 183%.
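These figures can be reproduced from the quoted values of O1, E1, O2, E2 and V with the Python sketch below. It assumes X = (O1 – E1)/V and Y = z/√V, with HR = e^X and limits e^(X–Y) and e^(X+Y), which return the values in the text; the function name `hazard_ratio_ci` is illustrative.

```python
import math

def hazard_ratio_ci(o1, e1, v, z=1.96):
    """Hazard ratio exp(X) with confidence limits exp(X - Y) and exp(X + Y)."""
    x = (o1 - e1) / v
    y = z / math.sqrt(v)
    return math.exp(x), math.exp(x - y), math.exp(x + y)

# Values quoted for the data of Table 9.1.
hr, lower, upper = hazard_ratio_ci(10, 11.37, 4.99)
simple = (10 / 11.37) / (12 / 10.63)  # the alternative estimator (O1/E1)/(O2/E2)
print(f"HR = {hr:.2f} (95% CI {lower:.2f} to {upper:.2f}); simple ratio = {simple:.2f}")
# prints: HR = 0.76 (95% CI 0.32 to 1.83); simple ratio = 0.78
```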

In the case when the survival times in the two groups can each be assumed to follow an exponential distribution, the ratio of the inverses of the two medians provides an estimate of the hazard ratio, that is, HRmedian = M2/M1. In this case, the approximate confidence interval is given as9

img_102_001.gif

where

img_102_002.gif

As noted earlier, O1 and O2 are the number of deaths in the respective groups.

Worked example
For the data at the end of the study, shown in Table 9.1, M1 = 32, M2 = 30, while O1 = 10 and O2 = 12. This gives an estimate of the hazard ratio as 30/32 = 0·9375. This is equivalent to a reduction in mortality of 6%. The corresponding standard error is

img_102_003.gif

The 95% confidence interval for the population value of the hazard ratio is then given by

img_102_004.gif

that is, from 0·9375 – 0·4320 to 0·9375 + 0·4320, or 0·51 to 1·37.

Cox regression

Just as in the situations described in chapter 8 in which the linear regression equation is used for predicting one variable from another, it is often important to relate the outcome measure (here survival time) to other variables. In contrast to the y variable of chapter 8, the comparable variable is time t but with the added complication that this will usually have censored values in some cases. As a consequence, and for quite technical reasons, special methods have been developed for survival time regression.10 These Cox regression models are then utilised in much the same way as the regression models of chapter 8. In the special case of a comparison between two groups of subjects, the Cox model provides essentially the same estimate of HR and the associated confidence interval as described earlier. The basic assumption is that the risk of failure (death) in one group is the same constant multiple of the other group at any point in the follow-up time.3

The Cox regression model for the comparison of two groups assumes that the risk of death in the two groups can be respectively described by

$$\lambda_1(t) = \lambda_0(t)\, e^{\beta x_1} \quad \text{and} \quad \lambda_2(t) = \lambda_0(t)\, e^{\beta x_2}$$

Here if β = 0 then both groups have the same underlying death rate (hazard), λ0(t), at each time t, but this rate may change over time. For comparing two groups, it is usual to write x1 = 1 and x2 = 0, in which case

$$HR = \frac{\lambda_1(t)}{\lambda_2(t)} = \frac{\lambda_0(t)\, e^{\beta}}{\lambda_0(t)} = e^{\beta}$$

Since t does not appear in the above expression (e^β) the hazard ratio does not change with time.

The 100(1 – α)% confidence interval for the population hazard ratio is

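$$e^{\beta - \left[z_{1-\alpha/2} \times SE(\beta)\right]} \quad \text{to} \quad e^{\beta + \left[z_{1-\alpha/2} \times SE(\beta)\right]}$$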

where SE(β) is obtained from a computer program.

Worked example
For the data of Table 9.1 use of a standard statistical package gives β = –0·2528, with SE(β) = 0·4302. Thus the HRCox = e^(–0·2528) = 0·78. The corresponding 95% confidence interval for the hazard ratio is

$$e^{-0.2528 - (1.96 \times 0.4302)} \quad \text{to} \quad e^{-0.2528 + (1.96 \times 0.4302)}$$

or 0·33 to 1·80.
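Given β and SE(β) from a package, the conversion to a hazard ratio and its interval is a short calculation, sketched here in Python. The snippet does not fit the Cox model itself; it only back-transforms the quoted coefficient, and the function name `cox_hr_ci` is illustrative.

```python
import math

def cox_hr_ci(beta, se_beta, z=1.96):
    """Hazard ratio exp(beta) with limits exp(beta - z*SE) and exp(beta + z*SE)."""
    half_width = z * se_beta
    return math.exp(beta), math.exp(beta - half_width), math.exp(beta + half_width)

# Coefficient and standard error quoted in the worked example.
hr, lower, upper = cox_hr_ci(-0.2528, 0.4302)
print(f"HR = {hr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
# prints: HR = 0.78, 95% CI 0.33 to 1.80
```

Because the interval is symmetric on the log scale, it is asymmetric about the hazard ratio itself.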

It is useful to note that the estimate of HRCox and the corresponding 95% confidence interval are similar to those given in earlier calculations. They differ somewhat from those corresponding to HRmedian for which the assumption of a constant hazard (one that does not change with t) was made within each treatment group.

In certain circumstances there may be prognostic features of individual patients which may influence their survival and thus may modify the observed difference between groups. In such cases, we wish to compare the groups taking account of (or adjusted for) these variables. This leads to extending the single variable Cox model just described (with one explanatory variable indicating the group) to include also one or more prognostic variables as one may do in other multiple regression situations (see chapter 8). In the context of randomised controlled trials, described in chapter 11, we wish to check whether or not the treatment effect observed, as expressed by the hazard ratio, will be modified after taking account of these prognostic variables.3

1 Bland JM, Altman DG. Time to event (survival) data. BMJ 1997;317:468–9.

2 Altman DG, Bland JM. Survival probabilities (the Kaplan–Meier method). BMJ 1997;317:1572.

3 Parmar MKB, Machin D. Survival analysis: a practical approach. Chichester: John Wiley, 1995:26–40;115–42.

4 Peto J. The calculation and interpretation of survival curves. In: Buyse ME, Staquet MJ, Sylvester RJ (eds). Cancer clinical trials: methods and practice. Oxford: Oxford University Press, 1984:361–80.

5 Greenwood M. The natural duration of cancer. Reports of Public Health and Medical Subjects, 33. London: HMSO, 1926.

6 McIllmurray MB, Turkie W. Controlled trial of γ-linolenic acid in Dukes’s C colorectal cancer. BMJ 1987;294:1260 and 295:475.

7 Collett D. Modelling survival data in medical research. London: Chapman & Hall, 1994: section 2·4.

8 Daly L. Confidence intervals. BMJ 1988;297:66.

9 Altman DG. Practical statistics for medical research. London: Chapman & Hall, 1991:384–5.

10 Cox DR. Regression models and life tables (with discussion). J R Statist Soc Ser B 1972;34:187–220.