CHAPTER FIVE

DIAGNOSTIC THINKING

Medical shows on TV are popular because we are enraptured by the brilliant doctor who plucks an esoteric diagnosis out of a hat, based on some obscure clinical detail and a keen memory for arcane facts. When we ourselves are patients, however, we prefer decidedly less drama. We like it when the diagnosis is straightforward and reassuringly boring. In real life nobody wants to be a diagnostic puzzle or a cliffhanger ending.

As we can see with Jay, real life in medicine is immensely complicated. Textbook logic for making a diagnosis falls apart in the messy world of human physiology, rendering medicine even less akin to the aviation industry. Even when we think we have a culprit—in Jay’s case, the MRSA bacteria growing in his blood—there is still uncertainty. Jay’s condition was worsening despite antibiotics for MRSA. Was the problem ineffective treatment? Delayed treatment? Perhaps there were other diagnoses brewing alongside the MRSA infection?

It was particularly notable that Jay was short of breath and remained febrile. Shortness of breath has many possible explanations, but in the presence of a fever, pneumonia is the prime contender. His chest X-ray was reported as “negative” for pneumonia, but this illustrates one of the challenges of making an accurate diagnosis. Does a negative chest X-ray mean that a patient does not have pneumonia?

As always, context counts. A negative chest X-ray could reasonably rule out a pneumonia in a healthy person with a mild cough sitting comfortably in a doctor’s office. But in the context of a critically ill, febrile, immunocompromised patient in a hospital ward who is struggling to breathe, a negative chest X-ray assumes a different meaning, or several meanings. It could represent a false negative; that is, the pneumonia exists, but the X-ray did not show it. Or a negative X-ray report could mean that the radiologist made an error when reading the film, but the X-ray actually did show a pneumonia. Alternatively, it could be that an X-ray is the wrong test to diagnose pneumonia in this context; an X-ray might simply be incapable of picking up a pneumonia in a patient whose devastated immune system can’t gin up the normal inflammatory response that creates the typical X-ray findings.

Patients often assume that tests such as X-rays are objective, like calculators: put in the numbers and the correct answer will be spit out. Reading X-rays, however, is a learned, cognitive skill done by humans, who have to render subjective decisions about what constitutes normal and what constitutes pathology. Sometimes a pneumonia on a chest X-ray is blindingly obvious—an entire chunk of lung is whited-out from inflammation. But often, the radiologic signs of pneumonia are subtle. Countless times I’ve stared at a patch of haziness on an X-ray until my eyes water, debating whether this vague fuzziness represents a true pneumonia or whether it’s just schmutz (a word that has migrated seamlessly from Yiddish to official radiologic terminology). A good radiologist spends years staring at films, learning the copious variations that inflammation and infection of the lung tissue can take.

Because X-ray reading is essentially about visual pattern recognition, it’s an area in which technology and artificial intelligence are making headway. You teach a doctor to read X-rays by showing her enough examples during her training, and the idea is that you could similarly teach a computer system the same thing by inputting a sufficient number of images. Trial and error should allow the system to learn to distinguish what is pathology from what is schmutz.

A group of researchers from California tried to create such a system by uploading images from 112,120 chest X-rays.1 These X-rays had been individually labeled as normal or as having up to fourteen abnormalities, including pneumonia. The researchers created an algorithm to analyze the images and with “machine learning” were able to train the system in much the way you’d train a radiology resident. They then tested the system with 420 new X-rays to see how well it did in diagnosing pneumonia. And for fun, they tested it against nine radiologists from esteemed academic institutions, who independently reviewed the same 420 images.
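For the technically curious, here is a rough sense of what that “training” looks like under the hood. The minimal Python sketch below is offered as an illustration in the spirit of such systems rather than the researchers’ actual code; the network, the settings, and the data loader are my own assumptions. The essence is a loop: show the system a labeled X-ray, let it guess, measure how wrong it was, nudge its internal dials, and repeat hundreds of thousands of times.

```python
# Hypothetical sketch of multi-label chest X-ray training, in the spirit of
# such systems. Not the researchers' actual code; names and settings are
# illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

NUM_FINDINGS = 14  # pneumonia, lung mass, effusion, emphysema, and so on

# Start from a network pretrained on ordinary photographs, then swap its final
# layer so it outputs one score per radiographic finding.
model = models.densenet121(weights="DEFAULT")
model.classifier = nn.Linear(model.classifier.in_features, NUM_FINDINGS)

# A single X-ray can carry several labels at once, so each finding gets its own
# independent yes/no score rather than one winner-take-all answer.
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_one_epoch(model, loader):
    """One pass over the labeled images: guess, compare with the radiologists'
    labels, nudge the weights, repeat."""
    model.train()
    for images, labels in loader:        # labels: a batch-by-14 grid of 0s and 1s
        optimizer.zero_grad()
        logits = model(images)           # the system's raw scores, one per finding
        loss = loss_fn(logits, labels)   # how wrong was it on this batch?
        loss.backward()                  # trial and error, made mathematical
        optimizer.step()
```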

The computerized system did just as well as its human counterparts for ten of the fourteen pulmonary abnormalities, including pneumonia, lung masses, and fluid in and around the lungs (the radiologists edged out the system for emphysema, hiatal hernia, and enlarged heart). Furthermore, it was able to evaluate the 420 X-rays in 1.5 minutes, whereas the humans took an average of 240 minutes. It isn’t necessarily surprising that the computerized system did so well—and didn’t need coffee or bathroom breaks—since accurate pattern recognition relates to the sheer amount of experience in seeing prior patterns. With a computerized system, you can take the foie gras approach and endlessly cram in the images. Unlike with a goose—or a radiology resident, for that matter—you won’t get squawks, ruptured intestines, or “Oh, it’s 6 p.m., I gotta run.”

Success with visual pattern recognition has sparked interest in using artificial intelligence and computerized algorithms to improve diagnostic accuracy for a wide variety of medical conditions. For straightforward clinical situations, such as whether an ankle injury requires an X-ray to distinguish a sprain from a fracture, it’s relatively easy to create an algorithm. But it’s a whole other can of worms to teach a computer to make a diagnosis when a patient presents with a vague symptom like “My stomach hurts” or “I’m feeling more tired these days.”

Improving the grand diagnostic process is something of a holy grail for researchers in this field. Wouldn’t it be great to have a system in which you could enter the patient’s symptoms, and the program would create an accurate differential diagnosis? It would include all the rare diseases that fallible humans tend to forget but, of course, eliminate the ones that are too far out in left field. It would generate an intelligent road map for a thorough—but not reckless—workup. It would take into account cost efficiency and clinical context and assiduously avoid both false-positive and false-negative errors. Holy grail, indeed!

Several diagnostic tools have been created with this goal in mind, and a few of them are out there in practice: ISABEL, VisualDx, and DXplain (“Dx” is medical shorthand for diagnosis). A review that analyzed all the published studies about these programs came to mixed conclusions.2 Researchers didn’t find compelling evidence to make a wholesale recommendation for doctors to use them but did say that they had the potential to be of assistance. I spent an afternoon trying out these systems when I was supervising in our walk-in clinic. Each time a resident or student presented a case, I’d enter the symptoms into the system. Before I pressed “submit,” we’d come up with our own differential diagnosis and then compare our results with the computer’s. For the cases that were straightforward, the system was far too laborious; our minds were much faster and more efficient. But then we got a case that represented a diagnostic dilemma. Perfect for testing out the system.

The case involved a young, healthy woman in her early twenties who was experiencing episodes of rapid heart rate and shortness of breath. She’d previously played on sports teams but was now too fatigued to do so. Financial difficulties had recently forced her family to move into a cramped basement apartment. She disliked it intensely and felt very anxious whenever she was alone in the apartment.

She’d already stayed overnight in the hospital after an ER visit and the major cardiac causes had been ruled out. The cardiologists felt it was anxiety that was making her heart race, and when they gave her a beta-blocker to slow the heart rate, she felt better, though not completely.

As soon as my team and I began typing in the symptoms, we realized just how formidable it is to quantify the diagnostic process. Entering “tachycardia” and “dyspnea” (racing heart and shortness of breath) brought up a voluminous list of possible diagnoses. The system was casting a wide net so that it wouldn’t miss anything, but we humans wouldn’t have wasted an iota of mental effort on a good chunk of its diagnoses. For example, the list was headed up by “septic shock”—which of course can present with those symptoms. But when you are evaluating a healthy-appearing woman who is smiling and chatting amiably with you, septic shock would never enter your mind (as opposed to when you are evaluating a patient like Jay, who also had tachycardia and dyspnea). Nor would massive hemorrhage or ruptured aortic aneurysm—two other diagnoses that appeared on the list.

There was no box to type in the “gist” of what this young woman was like. There was no place for context. There was no field to enter psychosocial issues like “financial stress forcing move to claustrophobic, subterranean apartment.” I don’t fault the system for this, but these limitations highlight the breadth of the elements that enter into the diagnostic process. Furthermore, the system wouldn’t be able to consider that a basement apartment might host more mold than an upstairs apartment. Mold causes or exacerbates a host of pulmonary ailments, from asthma to aspergillosis to hypersensitivity pneumonitis, so the system would miss these possibilities.

In reviewing the differential diagnosis list that the system provided for our case, we quickly crossed off the assorted catastrophic conditions that weren’t remotely a consideration in a walking, talking, not-prostrate-on-a-gurney patient. The remainder of the list mostly contained things we’d already considered, such as hyperthyroidism and anemia. It did mention a few that we might not have thought about, such as acute porphyria and sodium azide poisoning. Our conclusion was that the system didn’t really parallel the manner in which we think about patients, but it could be useful to jog our memories for rarer things.
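To see why these systems cast such a wide net, it helps to picture the simplest possible version of one. The toy sketch below is emphatically not how ISABEL, VisualDx, or DXplain actually work, and the symptom lists are my own rough illustrations. The point is the mechanism: anything that can produce the findings you type in gets surfaced, ranked by overlap, with no way to weigh the smiling, chatting patient sitting in front of you.

```python
# Toy illustration of symptom matching (not how any real diagnostic system works).
# The disease-to-symptom lists are rough, illustrative assumptions.
SYMPTOM_MAP = {
    "septic shock":       {"tachycardia", "dyspnea", "fever", "hypotension"},
    "massive hemorrhage": {"tachycardia", "dyspnea", "hypotension", "pallor"},
    "hyperthyroidism":    {"tachycardia", "dyspnea", "weight loss", "tremor"},
    "anemia":             {"tachycardia", "dyspnea", "fatigue", "pallor"},
    "panic attack":       {"tachycardia", "dyspnea", "anxiety"},
    "acute porphyria":    {"tachycardia", "abdominal pain", "anxiety"},
}

def differential(entered_findings):
    """Rank every condition by how many of the entered findings it explains."""
    entered = set(entered_findings)
    scored = ((len(entered & symptoms), dx) for dx, symptoms in SYMPTOM_MAP.items())
    return [dx for score, dx in sorted(scored, reverse=True) if score > 0]

print(differential(["tachycardia", "dyspnea"]))
# Septic shock and massive hemorrhage surface right alongside anemia and panic
# attack; the list has no field for context, gist, or a cramped basement apartment.
```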

One of the major criticisms of these computerized diagnosis systems is that they have an incentive to cast their nets as widely as possible. This allows the commercial developers to tout impressive statistics about how frequently the correct diagnosis appears on the list. But in actual clinical practice, doctors have to make real-world trade-offs, especially when it comes to rarer diagnoses that require expensive tests with risks of harm to patients.

We also have to factor in the logistics of the diagnostic workup: How long will it take to get a CT scan? Does the patient’s insurance cover an MRI? When is the next available rheumatology appointment? Can the patient take off time from work to get that thyroid scan? There are also patients’ preferences that affect the diagnostic process: How aggressive does he want to be? How risk averse is she? What are his financial concerns? All of these real-world considerations play a role in how we do diagnostic workups—all intricacies that do not burden these sleekly insouciant algorithms.

Lastly, there are the practical aspects of real-time use. These systems take time to employ. A doctor would have to, in essence, stop her evaluation of the patient in order to give time to the algorithm. Given how time-crunched medical evaluations are these days, anything that subtracts from the (already limited) direct face time between doctors and patients has to offer proven value.

These systems are impressive works of technology whose role is still evolving. Most likely, they will be resources for complex cases and for teaching. But it is important to keep in mind that simply generating a list of possibilities isn’t the same as making the diagnosis. A computer doesn’t have to commit. A doctor does. And so does the patient.

For the young woman we were evaluating in the walk-in clinic, most of the workup came back negative. Her chest X-ray and pulmonary function tests were normal. Unlike a computerized algorithm, we still had to do something to help her symptoms. We still had to commit, even in the absence of a specific diagnosis.

The most important diagnostic clue remained the fact that her breathing was better when she was out of the apartment. So whether it was mold in the apartment that was triggering respiratory symptoms or claustrophobia that was causing anxiety, the best treatment we could offer was to help her structure her life to spend more time away from the apartment. She began spending weekends with her aunt and occasional weeknights with friends. On a follow-up visit she said she felt that her symptoms were improving. She was now focusing her energies on saving enough money to eventually move out on her own.

Computerized algorithms are one way to try to minimize diagnostic errors. But is there a way to improve how doctors run their own internal algorithms? Is there a way to improve the actual thinking? Refining this overall process of diagnostic reasoning is a more global approach that could potentially improve things in all areas of medicine, and it would avoid the pitfalls of the disease-of-the-week approach that many healthcare systems use in tackling medical error.

Mark Graber and Hardeep Singh, the researchers focused on improving diagnostic accuracy, admit that targeting thinking is far more challenging than standard quality-improvement projects such as reducing hospital-acquired infections or screening for depression. Is it an impossible task? “With ten thousand diseases, there is so much uncertainty,” Graber admits. “We get it right 90% of the time. That’s pretty amazing!” And then he adds slyly, “But can we get it to 95%?”

Nudging the accuracy rate up, rather than attempting to eliminate all diagnostic error, seems like a doable prospect. But thinking styles are deeply ingrained and decidedly recalcitrant when commanded to evolve. Our declarations of rationality are routinely undercut by our unconscious biases, not to mention those rascally emotions that we are convinced we retain mastery of. Additionally, our thinking styles are routinely and unceremoniously jolted when we are rushed, pressured, inattentive, or otherwise distracted.

There are times when we cogitate like Kant and other times when we rely on intuition and guesswork like $5 sidewalk psychics. Mental shortcuts abound in our reasoning and our reliance on them is matched only by our chipper unawareness of using them. Such capricious thinking patterns aren’t as amenable to typical patient-safety checklists or aviation-style standardization.

Furthermore, it is not clear how such research could even be done. As Semmelweis, Nightingale, and Pronovost all demonstrated, you need to be able to measure the problem, apply an intervention, and then track the results with meaningful real-world outcomes. How would you take even the first step with regard to diagnostic reasoning? There isn’t a thought-o-meter that researchers can seamlessly slip into our cerebral gyri to measure our reasoning process in all its fits of brilliance and banality. If asked, most of us probably couldn’t even describe how we think. So while it makes intuitive sense to work on the thinking process, doing the actual research is no picnic. As a result, data in this field are notably sparse.

Nevertheless, since the thought process is the source of most diagnostic error, there is still value in pursuing this avenue. It is especially beneficial in the early years of medical school and nursing school, as this is when diagnostic “habits” are formed. And even for established clinicians, there is probably at least some value in thinking about how we think. Even if only modestly effective, it is likely the truest way to decrease diagnostic error.

There are several techniques that can be used to hone our diagnostic thinking process, and they share in common the idea of resisting the urge to jump to an easy conclusion. They start with Graber’s dictum about the “discipline of the differential diagnosis.” Every time doctors evaluate patients, they should force themselves to consider a range of possibilities before homing in on a single one. The very act of considering alternatives opens up the mind to, well, alternatives. You can’t get to a particular diagnosis if you never actually consider it.

So how do you do this in real time? As soon as a first diagnosis is entertained, the doctor (and the patient) should ask, “Could it be anything else?”—Graber and Singh call this the “universal antidote” to diagnostic error—and then keep asking it. A cough, for example, may seem like a standard viral URI (upper respiratory infection) at first blush, but what else could it be? Sinusitis, bronchitis, influenza, and pneumonia all have a cough component. Gastric reflux can present with a cough, as can asthma. A cough could also be a sign of congestive heart failure or a side effect of ACE inhibitors (a class of blood pressure medications). A cough could signify tuberculosis or lung cancer. It might be emphysema or pertussis (whooping cough). A cough could be caused by environmental irritants such as mold, or by accidentally inhaling objects that are best left outside the body.

What else could it be? Well, there are a host of less common conditions that can present with cough, such as sarcoidosis, interstitial lung disease, vascular malformations, or blood clots in the lung. And then there are the rare diseases such as amyloidosis, relapsing polychondritis, granulomatosis with polyangiitis, and psychogenic cough. And the even rarer conditions like syngamosis, pulmonary Langerhans cell histiocytosis, and tracheobronchopathia osteochondroplastica.

Or maybe the cough is just plain old post-nasal drip.

While it isn’t necessary to hoof it as far as tracheobronchopathia osteochondroplastica for the average patient with a cough, the point is that the more you ask yourself, “What else could it be?” the more ideas you come up with. The vast majority of coughs will fall into the first few tiers of the differential, but it’s still important to think beyond. As attendings intone—ad infinitum—to medical students on rounds, “No one ever made the diagnosis of sarcoidosis without first considering sarcoidosis in the differential.” The nice thing about the what-else-could-it-be strategy is that it’s straightforward and its logic fits naturally with the whole idea of the differential diagnosis.

There are some variations on this strategy. Considering the consequences of a missed diagnosis of a more severe illness is another way to challenge your thinking. Yes, most coughs in a typical primary care setting flower in the URI/bronchitis/sinusitis garden and will get better on their own no matter what you do or don’t do. But what if the cough is the harbinger of a lung cancer? Or a blood clot? Missing these could be devastating, even deadly, so even though they are far less frequent, we always need to be sure we’ve thought about these severe diagnoses and then rallied the data that would exclude them.

When I am reviewing a case with students or interns, I push them to give a full differential, not just jump out with the one diagnosis that seems obvious. After they’ve provided their list, I’ll ask the two golden questions: “What else could it be?” and “Is there anything that we can’t afford to miss?”

The hard part is remembering to do this for myself. When I’m rushing through a busy day, backlogged with patients and overwhelmed with the never-ending EMR minutiae, this discipline melts away faster than the smooth talk from a drug rep. If it walks like a URI and quacks like a URI, I’ll pretty quickly chalk up that cough to a URI without rigorously questioning myself. How many diagnostic errors have I made by doing this? Unfortunately, I’ll never know.

Another cognitive trick is to focus on the data that don’t fit your presumptive diagnosis. If I’m making the diagnosis of URI in a patient with a cough who also has a rash, I should expend a neuron or two on the fact that rash is not usually associated with a URI. That would force me to reconsider my diagnosis. Maybe it’s something other than a URI, such as infection with parvovirus B19 or Epstein-Barr virus. Or maybe the patient has two different things going on. Patients with URIs, after all, are allowed to also have eczema. Or maybe the patient took a medication to treat the URI and had an allergic reaction to it.

These questions and exercises designed to sharpen diagnostic thinking have the makings of—you guessed it—a checklist! Mark Graber, along with John Ely and Pat Croskerry, explored the idea of checklists for diagnosis and recognized that you really need two distinct types of checklists—one for content and one for process.3 Content checklists could be the computerized algorithms mentioned earlier that are tailored to the specific patient data that you enter, or they could just be plain old lists. Ely, a family physician in Iowa, developed a convenient set of lists for outpatient use.4 He categorized the forty-six most common complaints in outpatient medicine (dizziness, abdominal/pelvic pain, diarrhea, headache, insomnia, etc.) and then listed a dozen or two of the most common causes, with a few tricky ones labeled “commonly missed” and a few serious ones labeled “do not miss.” It’s a quick way for a doctor to run down a list and make sure she hasn’t missed anything.
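The shape of such a content checklist is easy to imagine; the brief sketch below is a hypothetical illustration of that structure, with entries of my own choosing rather than Ely’s published lists. The value lies less in the cleverness of the format than in the habit of consulting the “do not miss” column every single time.

```python
# Hypothetical sketch of a content checklist for one common complaint.
# The entries are illustrative and are not drawn from Ely's published lists.
COUGH_CHECKLIST = {
    "common": ["viral URI", "bronchitis", "sinusitis", "asthma",
               "gastric reflux", "ACE-inhibitor side effect"],
    "commonly missed": ["pertussis", "post-nasal drip", "heart failure"],
    "do not miss": ["pneumonia", "lung cancer", "pulmonary embolism",
                    "tuberculosis"],
}

def unconsidered_dangers(checklist, considered):
    """Return anything on the 'do not miss' list that hasn't been considered yet."""
    return [dx for dx in checklist["do not miss"] if dx not in considered]

print(unconsidered_dangers(COUGH_CHECKLIST, considered={"viral URI", "bronchitis"}))
# ['pneumonia', 'lung cancer', 'pulmonary embolism', 'tuberculosis']
```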

By contrast, process checklists review the thought process, looking for biases and shortcuts that might undercut diagnostic accuracy. Graber, Singh, and colleagues created a process checklist that includes the standard “What else could it be?” and “What can I not afford to miss?” but also asks some other interesting questions that can affect accuracy. Did I simply accept the first diagnosis that came to mind? Did someone else—patient, colleague—already attach a diagnostic label that’s biasing me? Was the patient recently evaluated for the same complaint? Am I distracted or overtired right now? Is this a patient that I don’t like for some reason? Is this a patient I like too much (family member, friend, colleague)?5

The point of these questions is to make you stop and think. For complex cases in which things don’t make sense, you can take the extra step and do a full-fledged “diagnostic timeout,” similar to the standard timeout procedure before starting surgery. This is especially helpful when existing diagnoses don’t quite add up. I have a patient whose chart listed rheumatoid arthritis (RA) in every note since the Paleolithic era. When I took over her care from a departing doctor, I dutifully listed RA in every one of my notes. Over the years, though, it slowly dawned on me that she never actually experienced any specific symptoms of RA (swollen, tender joints in a symmetrical distribution associated with significant morning stiffness). One day I finally took a diagnostic timeout and dug back through her voluminous chart. After some deep hunting, I found the workup from years earlier in which two classic blood tests for RA were “positive.” That, alongside some nonspecific aches and pains, was how the diagnosis was implanted in her chart. It became entrenched and every subsequent doctor repeated the gospel until it was simply a fact of her medical history. But the truth was that she did not actually have rheumatoid arthritis—those initial blood tests were likely false positives. Diagnoses do indeed evolve over time; this one took a decade to figure out.

Diagnostic checklists, however, are harder to implement than other checklists. Presurgical checklists consist mostly of clear-cut, tangible items: Did we check the patient’s name? The surgical site? They take one second to answer and then you are done and can move on. Well, how do you know when a thought is done? How do you know when to end the “what else could it be” line of thinking? How do you decide whether your fatigue or distraction is significant enough to impair your thinking (given that everyone is exhausted and that interruptions are incessant)?

Additionally, most pre-surgical (and pre-flight) checklists are performed aloud with other people. Not so in diagnosis. “Diagnosis is usually silent, lonely work,” write Ely, Graber, and Croskerry. “A natural pause point to review the checklist, such as before takeoff or before incision, does not exist in diagnosis, which can stretch over hours, days, or even months.”

Many doctors find diagnostic checklists off-putting because they include things that are obvious, even insulting—take a complete history, read the X-ray yourself, take time to reflect. But these authors point out that pilots don’t feel insulted running down their checklists or being questioned by their copilots. They might have initially, but now it’s simply part of the job. Importantly, they don’t do it only in difficult situations; they do it every time, even when they are flying with the most experienced crews on the most perfect sunny days. Doctors, on the other hand, tend to see value in these checklists—if they do at all—only for the difficult cases and diagnostic conundrums.

When I think honestly about my own practice style as a physician, I recognize—with no small amount of abashment—that my approach is cursory or intuitive far more than I’d want to admit. In the survival mode that most doctors and nurses work in today, it’s easy to fall back on snap judgments and obvious diagnoses. Fighting the current to slow down and question my thinking is arduous, especially when I feel like I’m struggling just to keep my head above water most days.

One Monday morning, a patient handed me a note from a pain-management doctor that he was seeing at another institution. That doctor had sent out an unnecessarily expansive panel of blood tests, and a cortisol level had come back slightly low. The doctor had jotted a one-liner for me on a sheet of his prescription paper: “Rule out adrenal insufficiency.”

As my patient began updating me on his six other chronic conditions, I surreptitiously pulled up the webpage on adrenal insufficiency. Not that I don’t remember every detail of adrenal vagaries, mind you. And, sure, I’d re-memorized it all for my board recertification two years prior, but let’s just say that adrenal insufficiency is one of those conditions that resides out on a wobbly, far-flung cortical gyrus.

Adrenal insufficiency is a notoriously knotty topic. The symptoms are both varied and vague. There is primary adrenal insufficiency and secondary adrenal insufficiency. There is acute adrenal insufficiency and chronic adrenal insufficiency. To test for it, I’d have to give the patient a dose of a hormone to stimulate the adrenal gland and then check cortisol levels at zero, thirty, and sixty minutes. There were at least ten variations on how to administer that hormone and even more variations on how to interpret the results. And could I even figure out how to order three separate, timed blood draws? Within a minute, my head was spinning.

While my patient updated me on his back pain, his diabetes, and his GI symptoms, I dug through the fine print to remind myself which way the diurnal variation of cortisol runs: Up in the morning? Down at night? Or vice versa? My patient stacked his fifteen medications on my desk—all of which needed refills, and all of which could interfere with adrenal function and/or adrenal testing. I realized I simply could not sort this all out in the moment.

What I needed was time to think.

I found myself pining for those medical-school Saturdays in the library—endless hours to read and think. Nothing but me, knowledge, and silence, facing off in a tai chi battle of concentration. How I hated those study sessions then, and how I would have given my left adrenal for a few minutes of that now.

But a gazillion EMR fields were demanding attention. Three more charts were waiting in my box. The patient still had two MRI reports and an endoscopy report for me to review, plus a question about prostate testing.

His adrenal insufficiency was swamped by my cerebral insufficiency.

I could tell him I’d review his case later and get back to him. But what “later” were we talking about? My morning session would run overtime by hours—that was a given. There were last week’s labs to review, student notes to check, patient calls to return, meds to renew, forms and papers erupting in a Cubist dystopia all over my desk. There would never be any “later.” (“Later” is a fantasy dreamed up by bureaucrats who’ve never set foot outside a cubicle.) There was only now.

But if I made any clinical decisions now, they would be haphazard, rife with potential for error. They would be an embarrassment. I finally threw in the towel and scribbled a referral to endocrinology—let them deal with it. I hustled my patient out the door and hurried the next person in.

In the pressurized world of contemporary medicine, there is simply no time to think. It certainly doesn’t feel like I have time to run through extensive diagnostic checklists, no matter how logical and important they seem. I sometimes feel as though I am racing to cover the bare minimum, sprinting in subsistence-level intellectual mode because that’s all that’s sustainable. I confess that I harbor a fear of anything “atypical” popping up during a visit. I dread symptoms that don’t add up, test results that are contradictory, patients who lug in bagfuls of herbal supplements with instructions to “ask your doctor.” If I can’t spring to a conclusion in a minute flat, I’m sunk. God help me if their medical history includes Sturge-Weber syndrome or polyarteritis nodosa. I don’t even have enough time to type them (or spell them!), much less look them up and remind myself what they are.

So when I think about the reasoned approach that Graber, Singh, and other researchers suggest, I applaud it. I second it. I yearn for it. But it’s hard to see how it can fit in with the everyday experiences of most doctors and nurses.

A few days after the visit with that patient, I happened on Core IM, an internal medicine podcast created by some of my NYU colleagues. One of the hosts mentioned an episode on adrenal insufficiency. “It’s one of those topics,” he observed, “that’s never nailed down fully.”

Ah, so maybe I wasn’t the only idiot who couldn’t iron out adrenal insufficiency on the fly. Maybe I wasn’t such a loser for not being able to orient the hypothalamic-pituitary-adrenal axis in the middle of a chaotic clinic session. I listened to the episode and then reread the textbook chapter. With an actual case in hand, the physiology clicked more easily. The next day, I went to work early, opened the patient’s chart, and resifted through his data.

I still wanted him to see an endocrinologist, but at least now I didn’t feel like I was handing off a mess. I appended my initial note with a more intelligible analysis and called the patient to explain our plan. When I closed out the chart, I felt satisfied with the case for the first time. In retrospect, I realize that I had taken a full-fledged diagnostic timeout, which is what this case required. It felt frankly thrilling to have given medical care at the appropriate level of thoroughness, to have fended off the cutting of corners we’re so often forced to employ.

Of course, sorting out this one issue for this one patient took a full hour outside his visit. I couldn’t have pulled it off in the moment, and I can’t carve out an extra hour during that nonexistent “later” for every patient with a complex problem. But that’s what so many of our patients’ diagnoses require—time to think, consider, revisit, reanalyze. From the billing-and-coding perspective, this is supremely inefficient. There’s no diagnosis code for “cognitive pandemonium.” There’s no billing code for “contemplation.” But extra time dedicated to thinking—using diagnostic checklists to expand our differential diagnosis as well as to examine our thinking process—could actually be remarkably efficient.

Time to think seems quaint in our metrics-driven, pay-for-performance, throughput-obsessed healthcare system, but we’d make fewer diagnostic errors and surely save money by reducing unnecessary tests and cop-out referrals. I suspect time to think would also make a substantial dent in the demoralization of medical professionals today, but that’s a whole other story.

Reducing diagnostic error will ultimately require a culture shift in healthcare. We need to reorient how we think as well as the culture that impedes our thinking. According to Singh, this would involve “acknowledging uncertainty and associating humility rather than heroism with our diagnostic decision-making capabilities.”6 There are few diagnoses more rare in the medical species than intellectual humility. There are few allergies more common than that of doctors to uncertainty.

“Overconfidence is an enormous problem,” Graber observed to me, “both personal and organizational.” We’re so sure of our snap-judgment diagnosis that we rarely stop to think about what else it could be, much less whether our thought process was flawed in any way. If we do, it’s usually only on a very cursory level.

But Graber also admitted that overconfidence doesn’t come about just because we doctors think we’re so smart (although that arrogance is certainly a hefty contributor!); it also stems from a lack of feedback. If we never hear back from the patient, then we assume everything is fine and that we must have been correct in our diagnosis. That may indeed be the case some of the time. But it’s equally plausible that not hearing back means the patient didn’t get better and that we were wrong. The patient might have sought care elsewhere and been given the correct diagnosis by a different doctor. Or worse, not hearing back from our patients might mean they’ve exited stage left thanks to our missteps. But we have no way of knowing.

It was when Graber suggested that doctors ought to have something like “Stump the Chumps” that I knew we were kindred spirits. For more years than I care to admit, I’ve been inexplicably addicted to the radio show Car Talk. Hosted by Tom and Ray Magliozzi, brothers endowed with capacious Boston accents and equally capacious belly laughs, it was a call-in show about automotive repair. Full disclosure: I’m a Manhattanite who doesn’t own a car and hopes to remain auto-less till my last terrestrial breath. But there I was, week in and week out, riveted by the discussions of head gaskets and timing belts. The shows were funnier than most TV shows billed as comedy and surprisingly informative (hey, taxis break down occasionally, so even New Yorkers need to know their camshafts from their crankshafts).

Even when the show went off the air, I listened to reruns. Even after Tom sadly passed away, I listened to the podcasts—I was that much of a groupie. It didn’t matter that the cars they were talking about were twenty years out of date; there was something ridiculously comforting hearing about the “staff” that included Russian chauffeur Picov Andropov, Greek tailor Euripedes Imenedes, and the white-glove law firm Dewey, Cheetham, and Howe. After particularly aggravating days in the hospital, I turn on the Car Talk podcast before I’ve even hung up my white coat. It works faster than Valium and the only side effect is cackling like a goofball while crossing Twenty-Eighth Street.

So it was an unexpected delight when I interviewed Mark Graber and he—unprompted—brought up Car Talk in our conversation. Every few weeks, Tom and Ray would run a segment called “Stump the Chumps” in which they’d bring back a caller from a previous show. They’d replay the original call and review their analysis at the time. Then the caller would tell them how things played out, and Tom and Ray would learn if they’d made the correct automotive diagnosis or not.

Graber’s point was that we need something like “Stump the Chumps” in medicine—a regular feature in which patients come back and let us know how they fared and whether we got the diagnosis right. In academic centers, there are M&M (“morbidity and mortality”) conferences, but these tend to focus on disasters. And patients—even if they’ve survived—aren’t typically part of the M&M process. In reality, there isn’t really any forum—either in academia or private practice—for ongoing feedback from patients in these more ordinary situations. A dose of Car Talk for this (and for many a mind-numbing administrative meeting) might be just what the doctor ordered. You never know when you’ll need to call upon Car Talk’s director of staff bonuses, Xavier Breath, or bungee-jumping instructor, Hugo First.

Sixteen years after the Institute of Medicine published the To Err Is Human report, which set the patient-safety movement in motion, the IOM took up the subject of diagnostic error.* The report offered the chilling observation that nearly everyone will experience at least one diagnostic error in their lifetime.7 It’s quite a damning statistic, one that garnered eye-catching news headlines. Of course, not all of these diagnostic errors have significant clinical consequences (misdiagnosing mild arthritis as tendinitis is unlikely to harm anyone, especially since they are treated nearly identically). But many misdiagnoses have the potential to cause significant harm to patients, in addition to squandering prodigious amounts of money.

Refreshingly, the report did not simply point the finger at the incompetence of individual physicians, as both lawsuits and popular media tend to do. Rather, it described a Borgesian healthcare system that seems almost intentionally designed to stymie the diagnostic thought process. It noted that our reimbursement system favors procedures over thoughtful analysis. That is, more revenue is generated if I order an MRI for all of my patients with abdominal pain than if I spend extra time talking with them and sorting out the details.

If I review a case with a colleague to get a second opinion, or call a radiologist to discuss whether a less expensive ultrasound would suffice, that would not be reimbursed in our current system. If I make additional phone calls to a patient after the visit to elicit further clarifying information, that too would be ignored by the billing system.

Talking about reimbursement may reinforce the stereotype that doctors care only about money. But in reality, if something is not reimbursed, it’s hard to get it done because there are only so many hours in the day. For time-pressed clinicians, the system makes it faster and easier to simply order MRIs than to think longer and deeper about our patients’ cases.

So bravo to the IOM for recognizing that diagnosis can be a team sport, and that time spent analyzing a case is as critically important as tests and procedures. The report explicitly presses insurance companies to reimburse for the cognitive side of medicine and to eliminate the financial distortion that overwhelmingly favors procedures over thinking.

Additionally, there needs to be a mechanism for clinicians to report their own errors without fear of getting sued or reprimanded. Near misses—errors that almost happen, or errors that occur but don’t cause harm—represent perhaps the largest trove of information for improving healthcare. Yet medical professionals tend to keep quiet about them, because of both liability fears and the personal shame that accompanies such errors. In chapter 11, I’ll examine efforts to address these concerns.

Overall, diagnostic error is far thornier to tackle than errors related to procedures (e.g., putting in central lines) or even medication errors. The sheer number of possible diseases multiplied by infinite human variability makes diagnosis much less amenable to simplistic checklists and rigid algorithms. Real-life clinical medicine never plays out like the neat bullet points of task-force reports, no matter how expert or well-meaning the authors are.

There’s an old adage that 90% of diagnoses are made just by taking a patient’s history. This probably isn’t 100% accurate, but it’s pretty darn close. Patients and their families are arguably the truest experts when it comes to the illness at hand. Improving communication between doctors and patients would be an excellent investment for preventing diagnostic errors.

The other adage worth remembering is that the most important part of the stethoscope is the part between the earpieces. The work of Graber, Singh, and other researchers has demonstrated that most medical errors in diagnosis are cognitive, and so we have to pay attention to how we train clinicians to think, something I’ll touch on more in chapter 14. In nearly every diagnostic situation, there’s a certain part of our stethoscopes that could stand to be tuned a little more finely.

__________________

* In that same year, the IOM also changed its name to the National Academy of Medicine and folded into the far less abbreviatable “National Academies of Sciences, Engineering, and Medicine.” Like many of my colleagues, though, I retain a soft spot for the more mellifluous “IOM.”