2

The Benchmark Operation

There is a major difference between the public perception of surgeons (doctors who treat mainly by cutting patients) and physicians (doctors who treat mainly by prescribing drugs). When a physician treats a patient, and the patient dies, then it is commonly seen as the patient’s fault: he or she ‘failed to respond’ to the treatment. This platitude becomes harder to swallow if the treatment is an elective operation: when a healthy patient walks into a hospital, has an operation, and then dies, it is much easier to think that the surgeon was at fault. After all, the temporal, if not causal, relationship between the operation and the outcome speaks with a resonant eloquence that is impossible to ignore. Nowhere is this relationship in sharper focus than in heart surgery: a heart operation that goes wrong leads more directly and more rapidly to death than, say, an orthopaedic or bowel operation.

One of the simplest clinical outcomes to measure is the death rate of an operation. When 100 patients have an operation and five die as a result, we say the mortality of the operation is 5 per cent. This simple percentage is a crude yet crucial measure of an operation’s success rate, and its importance was recognised as far back as the 1960s by Michael Crichton.

Crichton, who died in 2008 at the age of only 66, was a prolific American writer of novels, film scripts, and television series (including Jurassic Park and ER). He developed a keen interest in writing at a young age, but moved from studying literature to medicine while at Harvard. One of his early novels, A Case of Need, begins with the intriguing statement ‘All heart surgeons are bastards’. A page later, he describes one particular (fictional) heart surgeon in the following terms:

Because Frank Conway was good, because he was an eight-percenter, a man with lucky hands, a man with the touch, everyone put up with his temper tantrums, his moments of anger and destructiveness.

Many anaesthetists, operating room nurses, and trainee surgeons will sympathise with this profile, and perhaps recognise some of their own heart surgeons in this damning depiction, but much more important than the vivid description of surgical tantrums is the term ‘eight-percenter’. This is the surgeon’s mortality rate: the proportion of patients who die under the knife, so to speak, or soon after the operation.

Fortunately, we have come a long way from the days when 8 per cent mortality in heart surgery was considered evidence of excellence. Nowadays, the death rate after heart surgery in many great hospitals around the world is approximately 2 per cent, or even lower, but what is truly interesting about this passage in Michael Crichton’s book is not the percentage. It is this: even as far back as the 1960s, and in the eyes of a populist writer of mass-market medical fiction, albeit one studying medicine at one of the top medical schools in the world, the measurement of the quality of a heart surgeon’s work was well and truly established in people’s minds as the percentage mortality. This, rudimentary and crude as it may be, was probably the first sign of quality control and performance measurement in medicine.

On the face of it, the death rate after a particular type of operation is a straightforward measure of how good the operator and the hospital are. It is one definition of success that is crucial for at least two reasons. First, it is absolute: after an operation, the patient is either dead or alive.* Second, the patient needs to be alive to enjoy the benefits of whatever was done, so the patient will care more about survival than about other possible benefits, such as quality of life and symptom relief, which, important though they are, do not count for much if one is dead.

[* Though even such a simple outcome can, in practice, be difficult to define and a subject of controversy. This is addressed in Chapter Four.]

In heart surgery, the benchmark procedure (or ‘index procedure’) for measuring the quality of medical care is an operation used to treat angina. Having angina is bad news. For a start, it feels awful. It does not hurt in the same way that a paper cut or a hammer blow to your thumb does, but it is truly unpleasant. Angina is a perception of tightness or pressure across the chest, and usually comes on with physical exertion. Patients may describe it as a sensation of heaviness, a dead weight, or a vice-like grip. It does not score high on the pain severity scale, but it is often associated with a feeling of impending doom that, at the very least, makes you want to stop or slow down your physical exertions until it abates. Angina is heartache, literally.

More importantly, having angina means your coronary arteries are narrowed, and that is even worse news. A coronary artery becomes narrowed when its wall is furred up with cholesterol. The resulting formation is called a ‘plaque’. A plaque can rupture, and, when that happens, the cholesterol and other coarse muck within it become exposed to the blood flowing within the artery. The blood then sticks to the disrupted plaque and forms a clot, and the narrowed coronary artery closes completely. The bit of heart muscle that the coronary artery supplies begins to die, and this is called a heart attack. As everyone knows, heart attacks kill people. Having angina means that the sufferer, in addition to the pain that is angina, is at a higher risk of having a heart attack.

Fortunately, angina can now be cured (and the risk of heart attack dramatically reduced) by the most commonly performed heart operation ever: coronary artery bypass grafting, or CABG for short. In this operation, veins or arteries are taken from various body parts and used to bypass blockages or narrowings in the coronary arteries, those fine, fiddly, yet fiendishly important vital suppliers to the heart muscle itself. The operation is done to relieve the heartache that is angina and to help prevent heart attacks. The layperson calls this operation a bypass, or a double, triple, or quadruple bypass, depending on the number of coronary arteries that are treated and how much the patient wants to impress the select few who are remotely interested in someone else’s tales of medical adventure. Heart surgeons and cardiologists often just call it a ‘cabbage’, a corruption of the acronym CABG. Politically correct hospital managers really do not like the use of the term ‘cabbage’, especially when there are patients or relatives within earshot. They prefer to call it ‘See a Bee Gee’ (singing ‘Stayin’ Alive’, perhaps?).

I hope your coronary arteries are as clean and unobstructed as the day you were born, but, perish the thought, let us assume they are not. Perhaps years of smoking, obesity, and high blood pressure have taken their toll, or you have led a clean and healthy life but were cursed by a family genetic predisposition to plaque formation in the arteries. Whatever the cause, you now have angina that no number of tablets can relieve, a family history littered with early deaths from heart attacks, and coronary arteries full of blockages and narrowings. In short, you have been advised to have a CABG.

You have decided to go ahead, and your next decision is to choose between surgeon A and surgeon B. Both are friendly, likeable men of a similar age, have an excellent bedside manner, and work in the same hospital. In fact, you find it difficult to distinguish between them in most respects, but then you learn their mortality data: surgeon A has a CABG mortality of 1.25 per cent, and surgeon B has a CABG mortality of 2.08 per cent, about two-thirds higher than that of surgeon A. Which surgeon would you choose? Surgeon A, of course! This, as some would say, is a no-brainer.

But is it? You could, in fact, be making a very big mistake.

Consider the situation in more detail. You are a prospective patient contemplating having a CABG at St Elsewhere’s General Infirmary. You do your homework and obtain information about the hospital, its location and services, the quality of its food, its policy on visiting hours, and, perhaps most important of all, the ease and cost of parking your car there. You also, wisely, ask for data on the various heart surgeons and their CABG results. The hospital obliges willingly, as it has a policy of openness and transparency coupled with pride in the results. The information is provided to you by the hospital’s audit department. The data have been validated by a specialist society and published on the web for all to see. It simply states that, for CABG, the benchmark operation for measuring performance, the results are as follows:

CABG mortality at St Elsewhere’s General Infirmary

Surgeon A: 1.25%
Surgeon B: 2.08%

On the face of it, the decision is indeed a no-brainer, but I first need to tell you a few things about surgeons A and B that do not appear in the figures. Surgeon B is an ordinary bloke, with a Type B personality. He drives an ageing Saab, is a little obsessive about safety, hates taking risks, and practises medicine on the basis of scientific evidence. Surgeon A, however, has a Type A personality. He drives a Ferrari, cuts corners in the operating theatre as on the road, likes to take risks in his own life and with the lives of his patients, and believes that evidence-based medicine is like painting by numbers — all right for pedestrian artists, but not for him, the self-styled Leonardo da Vinci of the art of surgery. Furthermore, he is getting a little bored with CABG as a blandly predictable, bread-and-butter operation, and wants to explore new ways of treating his patients.

In fact, most medicine as practised nowadays is (or at least should be) evidence-based. There is so much medical research around that a doctor should be armed with the facts and figures before deciding to use this or that type of treatment. The opposite of evidence-based medicine is sometimes practised by stubborn yet famous high-profile doctors. I call it eminence-based medicine, and it is defined as ‘persisting in making the same mistake over and over again, but with ever-increasing conviction’. This is, of course, insane.

So we now have a mental picture of surgeon B as a solid citizen and surgeon A as a cavalier risk-taker. But, I hear you say, despite all that, surgeon A’s CABG mortality is definitely lower, and that is surely a good thing. In fact, it is not.

To understand why, we need some additional information about the two surgeons and their CABG practice. Last year, two identically matched groups of 100 CABG patients were referred for surgery, one group to each surgeon. This is the make-up of each group:

  1. All 100 have triple-vessel coronary artery disease and need a triple CABG.
  2. Eighty have strong hearts and are low-risk patients.
  3. Sixteen have weak hearts due to damage from a previous heart attack, with a stable, old scar on the heart, and are medium-risk patients.
  4. Four have weak hearts due to damage from a previous heart attack, and the scar on the heart has ballooned into an aneurysm, which is slowly expanding. They are high-risk patients.

Both surgeons do exactly the same for the 80 low-risk patients: they perform a triple CABG. The mortality in this group is expected to be low. Seventy-nine of the 80 patients sail through the operation without a hitch. One unfortunate patient dies as a result of the operation.

Both surgeons do exactly the same for the four high-risk patients: they carry out a triple CABG and cut out the aneurysm. This is a dangerous surgery, and, not surprisingly, one patient out of the four dies as a result, and the other three do well.

In the medium-risk group of 16 patients, surgeon B does a triple CABG, which is all they needed. One dies and 15 survive, as can be expected. Surgeon A, however, gets excited about the scar on the heart. He imagines it to be an aneurysm. He is getting bored with just doing CABG and wants some variety in his professional life. He fancies a challenge and happens to be feeling somewhat overconfident at the time. He decides to cut out the scar, call this additional procedure an aneurysmectomy, and then reshape the heart to make it work better. Of course, he has no scientific proof for any of this, but who needs proof when one is a surgical superstar? So he proceeds, despite the total lack of evidence that this will do any good, and despite the fact that this will unnecessarily complicate the surgery. In this group, surgeon A has three deaths: the one that was expected, plus two more due to the bleeding and heart-rhythm problems that arose directly as a result of the unnecessary cutting of the heart to remove an ‘aneurysm’ that wasn’t really there.

Each of the two surgeons has now completed the 100 operations. Surgeon B has three deaths, and surgeon A has five. They submit their results to the auditors, and this is what is published:

Simple operation (the benchmark procedure): CABG on its own

  1. Surgeon B: two deaths out of 96 = 2.08 per cent
  2. Surgeon A: one death out of 80 = 1.25 per cent
  3. Surgeon A wins!

Complex operation: CABG plus aneurysmectomy

  1. Surgeon B: one death out of 4 = 25 per cent
  2. Surgeon A: four deaths out of 20 = 20 per cent
  3. Surgeon A wins again!
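For readers who like to check the arithmetic, the published figures can be reproduced with a short Python sketch. The data layout and the function names below are my own illustrative assumptions, not anything an audit department actually uses; the patient numbers and deaths are those of the example above. The only difference between the two surgeons' records is the label attached to the 16 medium-risk patients.

```python
# Hypothetical encoding of the worked example. Each entry is
# (risk group, patients, deaths, procedure label reported to the auditors).
SURGEON_B = [
    ("low", 80, 1, "CABG"),
    ("medium", 16, 1, "CABG"),
    ("high", 4, 1, "CABG + aneurysmectomy"),
]
SURGEON_A = [
    ("low", 80, 1, "CABG"),
    # Same 16 patients, relabelled as complex cases: the category shift.
    ("medium", 16, 3, "CABG + aneurysmectomy"),
    ("high", 4, 1, "CABG + aneurysmectomy"),
]


def mortality_by_procedure(cases):
    """Return {procedure: (deaths, patients, percent)}, as the auditors see it."""
    totals = {}
    for _group, patients, deaths, procedure in cases:
        d, n = totals.get(procedure, (0, 0))
        totals[procedure] = (d + deaths, n + patients)
    return {p: (d, n, round(100 * d / n, 2)) for p, (d, n) in totals.items()}


def overall_mortality(cases):
    """Percentage of all patients who died, whatever the operation was called."""
    deaths = sum(case[2] for case in cases)
    patients = sum(case[1] for case in cases)
    return round(100 * deaths / patients, 2)


for name, cases in (("A", SURGEON_A), ("B", SURGEON_B)):
    print(f"Surgeon {name}")
    for procedure, (d, n, pct) in mortality_by_procedure(cases).items():
        print(f"  {procedure}: {d} deaths out of {n} = {pct} per cent")
    print(f"  All operations: {overall_mortality(cases)} per cent")
```

Broken down by procedure, surgeon A looks better in both categories; taken over all 100 patients, his mortality is 5 per cent against surgeon B's 3 per cent.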

‘But’, you may well object, ‘that is completely crazy; figures for death cannot possibly be that misleading!’ Most of the time you would be right, but not always.

The above example shows one rather extreme consequence of something I shall call the ‘category shift’, and it is a phenomenon that comes into play especially when the wrong outcome measure is used. Earlier, Michael Crichton, like so many others, evaluated heart surgeons by their mortality: what percentage of their patients died as a result of surgery. Later, many specialist organisations, professional bodies, hospital auditors, and media health correspondents became interested in ‘procedural mortality’; in other words, the death rate of a particular type of operation. More often than not, the operation first chosen for measurement is the one carried out most commonly. In heart surgery, that operation is CABG. This is now our benchmark, or index, procedure. For many years and in many institutions around the world, CABG mortality was the only outcome measure used in cardiac surgery and, indeed, in healthcare as a whole. In 1992, the state of New York was the first official administration to measure this outcome in its many hospitals and make the results available to the general public. The impact that this had on New York surgeons was substantial, and much of it was negative.

Imagine an average New York State surgeon in 1992. He* has been bumbling along, doing the best for his patients, and achieving acceptable results for the early ’90s with a CABG mortality of 4 per cent, when a newspaper, out of the blue, publishes a league table for CABG mortality in which he is ranked alongside all the other surgeons in the state. We have to remember that his practice (and his income, sports car, and second home in Florida) all depend on his being referred patients on whom to operate. The publication of outcomes means that patients, who, after all, have a choice in the matter, are likely to seek the surgeon with a low or even the lowest mortality. With half an eye on next year’s figures, our surgeon is in the middle of performing a CABG on a frail, elderly woman with many medical problems. The operation is not going at all well. It is technically difficult, the arteries are heavily diseased and challenging to suture, the heart is weak with a scar from a previous heart attack, and there is a lot of bleeding: in short, this patient looks like she might not make it. Like many American surgeons, this surgeon only does about 50 CABGs per year. If she dies, his mortality jumps from 4 per cent to 6 per cent, now well above the average. On the other hand, if he decides to tackle that scar, she is no longer a CABG case, but a CABG and something else, and therefore no longer an index procedure. The patient shifts into another category.

[* Even today, the overwhelming majority of heart surgeons are male, but this is slowly changing.]

‘Now that you mention it,’ he says to his assistant, ‘that scar looks more and more like an aneurysm to me …’
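The arithmetic behind this temptation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: I take the annual caseload to be exactly 50 CABGs and the 4 per cent baseline to mean two deaths among them, and the names in the code are hypothetical.

```python
def cabg_mortality(deaths: int, cases: int) -> float:
    """Percentage mortality for the index procedure."""
    return 100 * deaths / cases


ANNUAL_CABG_CASES = 50  # from the text: about 50 CABGs per year
BASELINE_DEATHS = 2     # assumption: 4 per cent of 50 cases

# Outcome 1: the patient dies and stays in the CABG category.
counted = cabg_mortality(BASELINE_DEATHS + 1, ANNUAL_CABG_CASES)

# Outcome 2: the operation is relabelled, so the case (and any death)
# leaves the CABG category altogether.
shifted = cabg_mortality(BASELINE_DEATHS, ANNUAL_CABG_CASES - 1)

print(f"Death counted as CABG: {counted:.2f} per cent")
print(f"Case shifted out:      {shifted:.2f} per cent")
```

With so few operations a year, a single death moves the published figure by two whole percentage points, whereas relabelling the case leaves it almost where it was.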

It is obviously difficult to track with any certainty how often this sort of behaviour happens, but there are reports that such ‘gaming’ has been observed in the US (Shahian 2001). Patients have had ‘aneurysm’ repair when there wasn’t one, tricuspid valve repair when it wasn’t needed, and other ingenious additions of unnecessary manoeuvres to shift them out of the index-procedure category. I have tried to gauge the prevalence of such behaviour in the United Kingdom by conducting an anonymous online survey of all heart surgeons. The questions I asked my fellow surgeons, and their responses, are outlined below:

It is possible for a cardiac surgeon to modify the appearance of surgeon-specific outcomes by using ‘category shift’. For example, in a CABG, adding a few stitches to the left ventricle and calling the operation CABG and LV aneurysm, or a couple of stitches to the tricuspid valve to add tricuspid valve repair, or excising a sliver of aorta in an AVR to call it AVR and aortoplasty, or ascending aortic repair. There are other examples. The net result is that an operation is shifted from a lower risk category to a higher risk category. Have you ever done this?

Are you aware of other surgeons doing this?

Of the 115 surgeons who responded, 12 (or just over 10 per cent) admitted to having practised category shift themselves, and more than half (55 per cent) stated that they were aware of other surgeons doing so.

The above findings illustrate the possible unpleasant consequences of a simple, well-meaning attempt at measuring the quality of a specialty service. The specialty is heart surgery, and the index procedure reasonably chosen to measure its quality is CABG, but the result of the exercise is a combination of damage to patients and unintentional muddying of the waters in the data pool. The problem with setting targets, as many health departments and ministers have discovered to their detriment and to that of their patients, is that, if you set the wrong target, you run the risk of distorting clinical decisions, with unexpected and sometimes damaging consequences. In fact, even if, with the best of intentions, you set the right target, but give insufficient consideration to the methods of measurement or to the human reaction to your plan, the law of unintended consequences may have some nasty surprises for you. The road to hell, as they say, is paved with good intentions.

In medicine, as in every field of human endeavour, we can choose what we want to measure. Sir Bruce Keogh, one-time cardiac surgeon and later medical director of the UK’s National Health Service, used to ask this perceptive and thought-provoking question at meetings: ‘Do we make what’s important measurable, or what’s measurable important?’ It is a crucial question, for the simple reason that it is so tempting, when faced with something that is easy to measure, to fall into the trap of making that something important just because we can measure it easily. It is much more difficult to look at what actually matters to us, and to find a way of measuring it. This is a vital concept that should be drilled into all politicians and managers in charge of healthcare.

Our hypothetical New York surgeon replicated the behaviour of surgeon A in producing category shift. Surgeon A did it out of ignorance and bloody-mindedness, and the New York surgeon did it semiconsciously with an eye on the league tables. Both of these instances are rather extreme representations of what can happen when the wrong targets are set, but there are even greater pitfalls than category shift in interpreting medical outcomes.