Using Kaplan-Meier survival curves

The Kaplan-Meier estimator, also known as the product-limit estimator, is a statistic used to estimate the probability that a member of a population survives beyond a given length of time. The formula for the Kaplan-Meier estimator is given as follows:

$$S(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$

In plain English, this formula means that the survivability at any time (t) is a product taken over every increment of time (t_i) that is less than or equal to the current time (t). Each term in the product is 1 minus the number of end events at that increment (d_i), the non-surviving population, divided by the number of the population that had not yet reached an end event at that increment (n_i), the surviving population.

Think of this as a running product over time. A value of 1 minus d divided by n is calculated for each increment of time less than or equal to the present time, and these values are then multiplied together to get the survival estimate. This is similar in concept to a running sum, except that multiplication is used instead of addition.
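The running product can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the recipe's implementation; the `intervals` counts below are hypothetical values made up for the example, where each pair holds the number of end events (d) and the number still at risk (n) for one increment of time.

```python
from itertools import accumulate
from operator import mul

# Hypothetical (d, n) pairs, one per time increment:
# d = end events at that increment, n = population still at risk.
intervals = [(1, 10), (2, 9), (0, 7), (1, 7)]

# One factor (1 - d/n) per increment.
factors = [1 - d / n for d, n in intervals]

# Running product of the factors: the survival estimate at each increment.
survival = list(accumulate(factors, mul))

for (d, n), s in zip(intervals, survival):
    print(f"d={d} n={n} survival={s:.3f}")
```

Swapping `mul` for `operator.add` would turn the same call into a running sum, which is the analogy drawn above.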

The Kaplan-Meier estimator is perhaps most closely associated with the clinical sciences, where it is used, somewhat morbidly, to estimate the percentage chance of survival over time once a patient has been diagnosed with a disease or has received treatment for a disease. This is accomplished by using historical data that records when patients died after being diagnosed with the same disease or receiving similar treatment.

However, all that is really required to use the Kaplan-Meier estimator is a defined population and, for each member, the duration of time between a start event and an end event. This means that there are numerous applications for the Kaplan-Meier estimator. For instance, the Kaplan-Meier estimator can be used to analyze the historical record of machine failures in order to estimate how long a machine is likely to run before its next failure. In this case, the start event is the machine starting operation and the end event is a failure or repair of the machine. Similarly, the Kaplan-Meier estimator can be used to estimate the length of time someone might remain jobless. In this instance, the start event is the loss of one's job and the end event is finding a new job.

This recipe uses the Kaplan-Meier estimator to estimate the expected tenure of employees within an organization using simple human resources data that should be available to nearly all businesses. In this case, the start event is the hiring of an employee and the end event is that employee leaving the organization. The recipe also demonstrates how to compare survival rates between two segments of the population, here two different departments. However, any segmentation could be used, such as employees who left voluntarily versus those who left involuntarily.
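As a rough sketch of the department comparison, the Python below computes one survival curve per segment. Everything in it is an assumption made for illustration: the `kaplan_meier` helper, the department names, and the tenure records are hypothetical and are not the recipe's data or code. It also assumes that employees who are still employed drop out of the at-risk count at their current tenure without being counted as an end event, a common convention that the text above does not spell out.

```python
from collections import Counter

def kaplan_meier(records):
    """Return [(tenure, survival)] for a list of (tenure_months, left) pairs.

    left is True if the employee left the organization (an end event) and
    False if the employee is still employed (assumed handling: removed from
    the at-risk count without counting as an end event).
    """
    exits = Counter(t for t, left in records if left)
    at_risk = len(records)
    survival = 1.0
    curve = []
    for t in sorted({t for t, _ in records}):
        d = exits.get(t, 0)
        if d:
            survival *= 1 - d / at_risk  # one factor of the running product
            curve.append((t, survival))
        # Everyone whose tenure equals t is no longer at risk afterwards.
        at_risk -= sum(1 for dur, _ in records if dur == t)
    return curve

# Hypothetical tenure records per department: (tenure in months, left?).
departments = {
    "Sales": [(3, True), (6, True), (6, True), (12, False), (18, True)],
    "Engineering": [(6, True), (12, False), (24, True), (24, False), (30, False)],
}

curves = {name: kaplan_meier(recs) for name, recs in departments.items()}

for name, curve in curves.items():
    print(name, [(t, round(s, 3)) for t, s in curve])
```

Plotting each department's curve as a step function on the same axes gives the usual visual comparison: the curve that drops faster belongs to the segment with shorter expected tenure.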