
After we identify an argument, along with its purpose and structure, and fill it out with suppressed premises, we finally reach the point when we can evaluate it – that is, ask whether it is any good. To call something good, as we have seen, is to say that it meets the relevant standards. So, what are the relevant standards for arguments?

One standard is pragmatic. Just as we call an advertisement good when it increases sales, because that is its purpose, so we call an argument good when it serves its intended purpose. If an argument is presented in order to persuade some audience, then it is good in this pragmatic way to the extent that it succeeds in persuading that audience. However, the argument might persuade only by tricking its audience into believing something that they have no real reason to believe. The argument might offer no reason at all or only a very bad reason. In this case it persuades without justifying.

If we seek justification, understanding and truth instead of persuasion alone, then we hold arguments to a higher standard. We want arguments that provide good and adequate reasons or at least some real reason as opposed to a trick or misdirection. But then we need standards for determining when reasons are good in some epistemic sense that has to do with truth and justification instead of only belief or persuasion. That is the kind of standard and value that we will discuss in this chapter.

The particular relation to truth and justification that an arguer claims in an argument depends in part on its form. Some arguers want their premises to guarantee their conclusions, whereas others are happy with some evidence short of any guarantee. On this basis, it is common to distinguish deductive from inductive forms of arguments, so we will follow that tradition, although we will see that this distinction is problematic in some ways.

Was Sherlock Holmes a master of deduction?

Let’s start with a few simple examples. Imagine someone who argues like this:

(I)    Noel is a Brazilian.

Therefore, Noel speaks Portuguese.

This argument is clearly not valid, because Noel could easily be a Brazilian who does not speak Portuguese. Maybe Noel is a baby who is too young to speak any language or a recent immigrant who has not yet learned Portuguese.

Despite these weaknesses, it is easy to add a single suppressed premise that makes this argument valid:

(II)   All Brazilians speak Portuguese.

Noel is a Brazilian.

Therefore, Noel speaks Portuguese.

Now, it is not possible for both premises to be true when the conclusion is false. If the conclusion is false, because Noel does not speak Portuguese, then either Noel is not a Brazilian (in which case the second premise is false) or Noel is a Brazilian who does not speak Portuguese (in which case the first premise is false). This relation between its premises and conclusion makes argument (II) valid.

Great, so it is valid! Does that make argument (II) any better than argument (I)? No. Adding the suppressed premise that turned invalid (I) into valid (II) simply shifted any doubts from the relation between the premise and conclusion in (I) to the first premise in (II). This shift merely raises the question of whether we should accept that added premise.

What kind of evidence could support the premise that all Brazilians speak Portuguese? Maybe the speaker generalized from the Brazilians whom he knows. Then his argument might seem like this:

(III)   All Brazilians whom I know speak Portuguese.

Noel is a Brazilian.

Therefore, Noel speaks Portuguese.

Unfortunately, now the argument is back to being invalid, because it is possible that I do not know Noel, who does not speak Portuguese even though he is a Brazilian.

Another possibility is that the arguer read on Wikipedia that Brazilians speak Portuguese, and he assumed this meant all Brazilians.

(IV)   Wikipedia says that Brazilians speak Portuguese.

Therefore, all Brazilians speak Portuguese.

Noel is a Brazilian.

Therefore, Noel speaks Portuguese.

The last three lines are just like argument (II), so that second part is still valid. However, the inference from the first line to the second line is clearly not valid, because Wikipedia might be wrong or might have been referring only to Brazilians in general rather than to every single Brazilian, including babies and recent immigrants.

This sequence of arguments teaches an important lesson. Argument (II) – repeated in lines 2–4 of (IV) – is the only one that is valid. By squeezing the argument into this stilted form, the speaker suggests that he intends argument (II) to be valid. After all, it is obviously valid, and it took effort to formulate it to be valid, so the speaker must have wanted it to be valid and to appear valid. In contrast, arguments (I), (III) and the first two lines of (IV) are all obviously invalid, so speakers would not formulate these arguments in this way if they intended their arguments to be valid. This contrast shows that some speakers intend their arguments to be valid, while others do not.

That intention is the difference between deductive and inductive arguments. An argument is deductive if its proponent intends it to be valid. An argument is inductive if its proponent does not intend it to be valid. Thus argument (II) is deductive, but arguments (I) and (III) are inductive. Argument (IV) combines an inductive argument in its first two lines with a deductive argument in lines 2–4.

It might seem odd to distinguish forms of arguments in terms of what their proponents intend. The reference to intention is needed, however, because of bad deductive arguments, like this:

(V)  All Brazilians speak Portuguese.

All citizens of Portugal speak Portuguese.

Therefore, all Brazilians are citizens of Portugal.

If speakers were ever confused enough to give this invalid argument, then the fact that they put it in this form would suggest that they intended it to be valid. That intention explains why we would classify this argument as deductive, even though it is invalid and fallacious.

This way of distinguishing deduction and induction shows why that distinction is important. Since deductive arguments are intended to be valid, it is fair to criticize them for being invalid. In contrast, the fact that an inductive argument is invalid is no criticism at all, because it is not intended to be valid. To criticize an inductive argument for being invalid is just as inappropriate as criticizing a rugby ball for failing as a football (or soccer ball) when the rugby ball was never intended for use in that other game.

Although this notion of induction is common among philosophers and logicians, others conceive of induction very differently. Some people say that induction rises from particulars to generalizations. This characterization is inaccurate, because some inductive arguments run in the reverse direction, as we will see.

Another potential source of confusion is Sir Arthur Conan Doyle, who described his fictional detective Sherlock Holmes as a master of the science of deduction, because Holmes could draw conclusions from minor observations that others overlooked. In one story, Holmes glimpses a man on the street and immediately pegs him as ‘an old soldier … served in India … Royal Artillery’. How could he tell so much so quickly? ‘ “Surely,” answered Holmes, “it is not hard to say that a man with that bearing, expression of authority, and sun-baked skin, is a soldier, is more than a private, and is not long from India … He had not the cavalry stride, yet he wore his hat on one side, as is shown by the lighter skin on that side of his brow. His weight is against his being a sapper [a soldier who works on fortifications]. He is in the artillery.” ’1 These inferences are amazing, but are they deductive? Well, the arguments are clearly not valid, because it is possible that the man is an actor playing the part of an old artilleryman in India. Since their invalidity is so obvious, it is unlikely that anyone as smart as Holmes would have intended them to be valid. So these arguments are not deductive by our definition. That does not mean that the arguments are no good. Their brilliance is the point of the incident in the story. Still, instead of being a master of deduction, Holmes is a master of induction – in the philosophical sense of these terms.

What’s so great about deduction?

Why did Conan Doyle misleadingly describe Sherlock Holmes as a master of deduction instead of induction? Perhaps to heap the highest possible praise on Holmes’s reasoning. Many people assume that deduction is somehow better than induction. The comparisons among arguments (I)–(V) should already make us sceptical of this assumption, but it is worth asking why so many people believe it.

One reason for preferring deduction might be that it seems to achieve certainty by ruling out all contrary possibilities. A valid argument excludes any possibility of a false conclusion when its premises are true. Another apparent advantage of deduction is that validity is indefeasible in the sense that, if an argument is valid, then adding an extra premise can never make it invalid. (Just try it with argument (II).) Addition cannot invalidate validity.

These features of deduction seem desirable if you want certainty. Unfortunately, you can’t always get what you want, according to philosophers Mick Jagger and Keith Richards. The appearance of certainty in deductive arguments is an illusion. The conclusion of a valid argument is guaranteed only if its premises are true. If its premises are not true, then a valid argument shows nothing. Hence, when we cannot be certain of its premises, a deductively valid argument cannot create certainty about its conclusion.

An argument’s validity does rule out the option of believing the premises and denying the conclusion, but you still have several alternatives: you can either accept the conclusion or deny a premise. In argument (II) above, you can deny the conclusion that Noel speaks Portuguese as long as you give up either the premise that Noel is a Brazilian or the other premise, that all Brazilians speak Portuguese. The argument cannot tell you whether its own premises are true, so it cannot force you to accept its conclusion as long as you are willing to give up one of its premises.

This point is ossified in the adage: ‘One person’s modus ponens is another person’s modus tollens.’ Recall that modus ponens is the argument form ‘If x, then y; x; so y’, whereas modus tollens is the argument form ‘If x, then y; not y; so not x.’ In modus ponens, the antecedent x is accepted, so the consequent y is also accepted. But in modus tollens, the consequent y is rejected, so the antecedent x is also rejected. The conditional ‘If x, then y’ cannot tell us whether to accept its antecedent x and then apply modus ponens, or instead to deny its consequent y and then apply modus tollens. Similarly, a valid argument cannot tell us whether to accept its premises and then accept its conclusion, or instead to reject its conclusion and then also reject at least one of its premises. As a result, the valid argument by itself cannot tell us whether or not to believe its conclusion.
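The definition of validity used in this chapter (it is not possible for the premises to be true when the conclusion is false) can be checked mechanically for simple propositional forms by trying every combination of truth values. The following Python sketch does this for modus ponens, modus tollens and a simplified version of argument (I). Treating ‘Noel is a Brazilian’ and ‘Noel speaks Portuguese’ as the sentence letters x and y is an assumption made for illustration, and the helper names `implies` and `is_valid` are hypothetical, not taken from any library.

```python
from itertools import product

# Brute-force validity check for propositional forms with two sentence
# letters, x and y. Valid means: no assignment of truth values makes every
# premise true while the conclusion is false.

def implies(p, q):
    """Material conditional: 'If p, then q' is false only when p is true and q is false."""
    return (not p) or q

def is_valid(premises, conclusion):
    """Return True if no assignment of truth values to x and y is a counterexample."""
    for x, y in product([True, False], repeat=2):
        if all(premise(x, y) for premise in premises) and not conclusion(x, y):
            return False  # all premises true, conclusion false: a counterexample
    return True

def if_x_then_y(x, y):
    return implies(x, y)

# Modus ponens: If x, then y; x; so y.
modus_ponens = ([if_x_then_y, lambda x, y: x], lambda x, y: y)
# Modus tollens: If x, then y; not y; so not x.
modus_tollens = ([if_x_then_y, lambda x, y: not y], lambda x, y: not x)
# Argument (I), simplified (hypothetical encoding): Noel is a Brazilian (x);
# so Noel speaks Portuguese (y).
argument_i = ([lambda x, y: x], lambda x, y: y)

print(is_valid(*modus_ponens))   # True
print(is_valid(*modus_tollens))  # True
print(is_valid(*argument_i))     # False: x true, y false is a counterexample

# Indefeasibility: adding an extra premise to a valid form leaves it valid.
def extra(x, y):
    return x or y

print(is_valid(modus_ponens[0] + [extra], modus_ponens[1]))  # True
```

The checker only reports whether a form is valid. Like the argument itself, it says nothing about whether the premises are true, so it cannot decide for us between modus ponens and modus tollens.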

We cannot easily give up either premise if both premises are justified. However, all that shows is that the real force of a valid argument comes not from its validity but from the justifications for its premises. If my only reason to believe that all Brazilians speak Portuguese is that all Brazilians whom I know speak Portuguese, then it is hard to see why valid argument (II) is any better than invalid argument (III). The only real difference is that the uncertainty in argument (II) is about its first premise, whereas the uncertainty in argument (III) is about the relation of its premises to its conclusion. Neither form of argument avoids uncertainty. They simply locate that uncertainty in different places.

For these reasons, we need to give up our quest for certainty.2 One way to curtail this impossible dream is to turn from deductive arguments to inductive arguments. Inductive arguments are not intended to be valid or certain. They do not try or pretend to rule out every contrary possibility. They admit to being defeasible in the sense that further information or premises can turn a strong inductive argument into a weak one. All of this might seem disappointing, but it is actually invigorating. The realization that more information could make a difference motivates further inquiry. A recognition of uncertainty also brings humility and openness to contrary evidence and competing positions. These are advantages of inductive arguments.

How strong are you?

Since inductive arguments by definition do not aim at validity, what do they aim at? The answer is strength. An inductive argument is better if its premises provide stronger reasons for its conclusion. Satisfied? I hope not. You should be asking, ‘But what is strength? It is a relation between premises and conclusion, but how can we tell when one reason or argument is stronger than another? And what makes it stronger?’

No answer has achieved consensus. The notion of inductive strength is still highly controversial, but one natural way to think about strength is as probability. On this view, the strength of an inductive argument is (or depends on) the conditional probability of its conclusion, given its premises. An inductive argument is stronger when the probability of its conclusion – given its premises – is higher.

To understand this standard of strength, we need to learn a little about conditional probability. Imagine an area of India where it rains one out of five days in general, but it rains four out of five days during monsoon season. What is the probability that it will rain there on Gandhi’s birthday? That depends on the date of Gandhi’s birth. If you have no idea when Gandhi’s birthday is, it is reasonable to estimate this probability as one out of five or 0.20. But suppose you discover that Gandhi’s birthday is during the monsoon season in this area of India. With that extra information, it now becomes reasonable to estimate the probability of rain on Gandhi’s birthday as four out of five or 0.80. This new figure is the conditional probability of rain on Gandhi’s birthday in this area, given that his birthday is during the monsoon season in that area.
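The arithmetic behind these two figures can be made concrete with counts. The text gives only the two rates, so the 100-day split below (20 monsoon days, 16 of them rainy; 80 other days, 4 of them rainy) is a hypothetical assumption chosen to reproduce them. The conditional probability is simply the share of rainy days among the monsoon days rather than among all days.

```python
from fractions import Fraction

# Hypothetical counts for a 100-day stretch, chosen only so that the overall
# rain rate is 1 in 5 and the monsoon rain rate is 4 in 5.
monsoon_days, monsoon_rainy = 20, 16
other_days, other_rainy = 80, 4

total_days = monsoon_days + other_days
total_rainy = monsoon_rainy + other_rainy

# Unconditional probability: no information about when the birthday falls.
p_rain = Fraction(total_rainy, total_days)
# Conditional probability: count only the monsoon days.
p_rain_given_monsoon = Fraction(monsoon_rainy, monsoon_days)

print(float(p_rain))                # 0.2
print(float(p_rain_given_monsoon))  # 0.8
```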

The application to inductive arguments is straightforward. Consider this argument:

Our parade will occur on Gandhi’s birthday in that area.

Therefore, it will rain on our parade.

This argument is neither valid nor deductive, so it makes sense to evaluate it by the inductive standard of strength. The premise by itself gives no information about when Gandhi’s birthday is, so the conditional probability of the conclusion, given the premise, is 0.20. That argument is not very strong since it is more likely than not that it won’t rain in that area then, given only the information in the premise. But now let’s add a new premise:

Our parade will occur on Gandhi’s birthday in that area.

Gandhi’s birthday is during monsoon season in that area.

Therefore, it will rain on our parade.

The argument is still not valid, but it is stronger, because the conditional probability of the conclusion, given the premises, has risen to 0.80. The extra information in the new premise increases the probability. All of this is common sense. If you do not know when Gandhi’s birthday is, the first argument is not a strong reason to reschedule the parade. But when someone adds, ‘That’s during monsoon season!’, then it makes sense to reschedule the parade, unless you like walking in the rain.3

How do I induce thee? Let me count the ways

What is in the grab-bag of inductive arguments? Let’s reach deep into the bag and see what comes out.

Imagine that you want to open a restaurant, and you have chosen a location in Edinburgh, but you have not yet decided whether to serve Ethiopian food or Turkish food, your chef’s two specialties. The success of the restaurant depends on how many people in the neighbourhood like each kind of food. To answer this crucial question, you ask random people in the neighbourhood and discover that 60 per cent like Turkish food, but only 30 per cent like Ethiopian food. You conclude that these same percentages hold throughout the whole neighbourhood. This inference is a statistical generalization that argues from premises about the small sample that you tested to a conclusion about a larger group. Such generalizations are inductive arguments because they are not intended to be valid. The tested sample clearly might not match the whole neighbourhood.

Next, you need to test items for your menu. You decide to try them out on friends and neighbours, but you do not want to test Turkish food on people who do not like it, since they won’t come to your restaurant anyway. You wonder whether your neighbour to the south of your restaurant likes Turkish food. You don’t know anything special about him, so you conclude that he has a 60 per cent chance of liking Turkish food. This argument can be called a statistical application, because it applies a generalization about the whole population to an individual. It is inductive, because it is clearly not valid. It could underestimate the probability if, for example, your neighbour happens to be Turkish.

Finally, your restaurant opens, but nobody shows up. Why not? The explanation cannot be that people in the neighbourhood do not like Turkish food, since 60 per cent do. The explanation cannot be that your prices are too high or that your dishes taste bad, because potential customers do not know your prices or quality yet. The explanation cannot be lack of advertising, because you have big banners, a fancy website and advertisements in local papers. Then you hear that someone has been spreading rumours that your restaurant is filled with cockroaches. Who? Nobody else would have a motive, so you suspect the owner of the older restaurant across the street. This conclusion is supported by an inference to the best explanation. It is also an inductive argument, because its premises give some reason to believe your conclusion, but your suspicions could still be wrong.

Although discouraged, you regain hope when you remember the story of another Turkish restaurant that had a rough first month but then later became extremely popular as soon as people tried it. That other restaurant is a lot like yours, so you conclude that your restaurant will probably take off soon. This argument from analogy is inductive, because it is clearly not valid but does give some reason for hope.

Luckily, your restaurant turns into a huge success. Customers pile in. What attracts them to your restaurant? To find out, you lower your prices a little, but that has no effect on turnout. Then you check your records to see which dishes customers ordered more often, but nothing sticks out. Your curiosity is piqued, so you drop items off your menu one by one and observe changes in the clientele. There is a big drop in customers when you take kokoreç off the menu. Kokoreç consists of lamb or goat intestines wrapped around seasoned hearts, lungs and kidneys. You had no idea that local people like offal so much, but your experiment supports the conclusion that this dish is what causes people to come to your restaurant. This causal reasoning is inductive, because it is possible that something else is the cause, so the argument is not valid, but it still gives you some reason to believe its conclusion. Accordingly, you put kokoreç back on your menu.

All goes well until your restaurant is robbed. The only witness reports that the robber drove off in a Fiat. Only a small percentage (2 per cent) of the cars in Edinburgh are Fiats, so the witness’s report is surprising, and you wonder whether to trust it. You and the police estimate that this witness in these lighting conditions will identify a Fiat correctly around 90 per cent of the time and will misidentify another kind of car as a Fiat around 10 per cent of the time. That sounds pretty good, but then (using Bayes’ theorem) you calculate that the probability of this report being accurate is less than one in six.4 It is five times more likely that the witness misidentified another car as a Fiat. This argument exemplifies reasoning about probability.
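The Bayes’ theorem calculation can be reproduced in a few lines of Python from the figures given above: 2 per cent of cars are Fiats, the witness identifies a Fiat correctly 90 per cent of the time, and misidentifies another car as a Fiat 10 per cent of the time. This is a sketch of the standard formula, not a claim about how the police would actually compute it; the variable names are illustrative.

```python
# Bayes' theorem applied to the witness report, using the figures in the text.
p_fiat = 0.02                # prior: share of Edinburgh cars that are Fiats
p_report_if_fiat = 0.90      # witness correctly identifies a Fiat
p_report_if_not_fiat = 0.10  # witness misidentifies another car as a Fiat

# The two ways the witness could end up reporting "Fiat".
true_report = p_fiat * p_report_if_fiat             # about 0.018
false_report = (1 - p_fiat) * p_report_if_not_fiat  # about 0.098

# Posterior probability that the car really was a Fiat, given the report.
p_fiat_given_report = true_report / (true_report + false_report)

print(round(p_fiat_given_report, 3))         # 0.155
print(round(false_report / true_report, 2))  # 5.44
```

On these numbers the report is accurate in about 15.5 per cent of cases, just under one in six, and a mistaken report is roughly 5.4 times as likely as a correct one. The low prior (few Fiats on the road) outweighs the witness’s fairly good reliability.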

This story could go on, but it already includes six kinds of inductive arguments: statistical generalization, statistical application, inference to the best explanation, argument from analogy, causal reasoning and probability. Each of these forms of argument is common in many areas of everyday life. Each has its own standards and can be performed well or poorly. Each has special fallacies associated only with it. Instead of surveying them all, I will focus on a few of the most important kinds of inductive argument.5

How can dates and polls go wrong?

Profiling and stereotypes are anathema to many people. Police are supposed to choose whom to stop or arrest by observing what those people do instead of what they look like or where they are. In everyday life, many people aspire to Martin Luther King’s vision: ‘I have a dream that my four little children will one day live in a nation where they will not be judged by the colour of their skin but by the content of their character.’6 We all hope to be treated as individuals rather than as members of groups.

Despite these hopes and dreams, we all use stereotypes about groups to predict how other individuals will act. Marketing experts use generalizations about groups to predict which customers will buy their products, as with our Turkish restaurant. Doctors use risk factors – which include group membership – to recommend medications and operations. Insurance agents charge individual clients on the basis of whether or not they belong to groups whose members tend to cost insurers more in payouts. Universities decide which applicants to admit on the basis of their grades. We hope that these professionals will not judge customers, patients, clients or applicants by the colour of their skin, but they also do not base their decisions on the content of their character. They can’t, because they don’t know enough about their character.

In many contexts, it is hard to see how we could do without stereotypes. If I do not know someone at all, but I need to make a fast decision, then the only information I can use is what I can observe quickly. For example, if a stranger in a public bar talks casually with me for a few minutes and then offers to buy me a drink or dinner, then I need to decide whether to trust this stranger. What is he up to? As we saw, Sherlock Holmes might be able to induce a great deal about this stranger, but most of us have no choice but to rely on a few inaccurate generalizations based on our limited experience. We all do it, whether or not we accept the stranger’s offer.

These cases depend on arguments up and down. First, they generalize up from premises about a sample of a group to a conclusion about the group as a whole. Second, they apply the resulting generalization back down to a conclusion about the individual. These two stages can be described as generalization and application.

GENERALIZATION

Each of these forms of argument introduces numerous complexities and complications. Even the most sophisticated reasoning of this sort can go badly wrong. Just recall the surprising mistakes made by political polls in the Brexit vote in the UK and in the 2016 presidential election in the US. In those cases, even professional statisticians with tons of data were way off base. To avoid such errors and to fully understand statistical generalizations and applications, we would all need to take several courses in statistics and probability and then gather big data of high quality. Who has the time? Luckily a simple example can illustrate a few common methods and mistakes without going into technical detail.

Imagine that you are seeking a male life partner who will play golf with you, and you are curious about online dating websites. You go onto one site, randomly pick ten potential dates, and ask each of them how often he played golf in the last six months. Only one of them reports having played golf at all in the last six months. You reason that only 10 per cent of your sample played golf in the last six months, so around 10 per cent of people who use online dating services play golf. This argument is a statistical generalization, because it runs from a premise about a sample (the ten you asked) to a conclusion about the whole group (people who use online dating sites).

The next day, someone else who uses the site contacts you. You decide not to reply, because you reason like this: ‘This person uses an online dating website, and only 10 per cent of online dating website users play golf, so this person probably does not play golf – or, more precisely, there is only a 10 per cent chance that this person played golf in the last six months.’ This argument is a statistical application, because it applies premises that include a generalization about the whole group to a conclusion about this particular user.

Both of these arguments are inductive, because they are clearly not valid. It is possible that only 10 per cent of your sample plays golf, but many more people who use online dating services play golf. It is also possible that 10 per cent of people who use online dating services play golf, but it is much more likely that this individual plays golf. Because these possibilities are so obvious, these arguments are probably not intended to be valid.

How strong are these inductive arguments? That depends on the probability of the conclusion given the premises. To assess that, we need to ask a series of questions to determine how each argument could go astray.

The first question to ask about the generalization is whether its premise is true. Did only one out of your sample of ten play golf in the last six months? Even if only one reported playing golf then, maybe more of them played golf, but they chose to ignore that question; or maybe they played golf but forgot about it; or maybe they denied playing golf because they thought you were asking your question in order to weed out dates who play golf too often. People on online dating sites are not always trustworthy. What a surprise!

The second question is whether your sample is big enough. It is better to ask ten than to ask only three, but it would be better yet to ask a hundred, although it would take a long time to gather such a large sample. A sample of ten thus gives your argument some strength, but not much. Whether it is strong enough depends on how much is at stake. If the sample is too small, then the argument commits a fallacy called hasty generalization.
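A small simulation can show why a sample of ten gives only some strength. It assumes, purely hypothetically, that exactly 10 per cent of all users played golf, draws many random samples of sizes 3, 10 and 100, and measures how widely the sample percentages spread (their standard deviation). The sketch models only random sampling error; it does not model biased samples.

```python
import random
import statistics

# Hypothetical assumption: exactly 10% of all users played golf.
TRUE_RATE = 0.10
TRIALS = 2000  # how many separate samples to draw for each sample size

def sample_rate(rng, n):
    """Ask n randomly chosen users; return the fraction who report golf."""
    golfers = sum(rng.random() < TRUE_RATE for _ in range(n))
    return golfers / n

rng = random.Random(42)  # fixed seed so the run is reproducible
spread = {}
for n in (3, 10, 100):
    estimates = [sample_rate(rng, n) for _ in range(TRIALS)]
    # Standard deviation of the estimates: how far a typical sample wanders.
    spread[n] = statistics.pstdev(estimates)
    print(f"sample size {n:>3}: spread of estimates ≈ {spread[n]:.3f}")
```

Under these assumptions, estimates from samples of three typically miss the true rate by far more than estimates from samples of a hundred, which is the intuition behind calling a conclusion drawn from a tiny sample a hasty generalization.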

The third question is whether your sample is biased. A sample is biased when the percentage of the sample with the feature you are seeking is significantly higher or lower than the percentage of the whole group with that feature. Notice that even a large sample (such as 100 or 1,000 online daters) can be biased. This bias could occur if most golfers use a different online dating website, which reduces the number of golfers who use the website that you are sampling. Then you should not use your sample to draw any conclusion about how many people who use online dating services in general play golf. Even if you are interested only in this particular website, your sample might be biased if your profile mentioned that you play golf, and the website used this information to suggest possible contacts. Then the names that you received might include many more golfers than is representative of the website as a whole. Or the website might send you only names of local users, and you might live in an area with fewer (or more) golfers than other areas.
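The point that even a large sample can be biased can also be simulated. In this hypothetical sketch, 10 per cent of 10,000 users play golf, and the website, because your profile mentions golf, shows you golfers three times as often as non-golfers. Both the population and the weighting are invented for illustration. A sample of 1,000 drawn that way is large, yet its expected share of golfers is 25 per cent rather than 10 per cent; a random sample of the same size does not have that problem.

```python
import random

# Hypothetical population: 10,000 users, exactly 10% of whom play golf.
population = [True] * 1_000 + [False] * 9_000  # True = plays golf

# Hypothetical bias: golfers are three times as likely to be shown to you.
weights = [3 if plays_golf else 1 for plays_golf in population]

rng = random.Random(7)  # fixed seed so the run is reproducible
random_sample = rng.sample(population, 1_000)                       # unbiased
biased_sample = rng.choices(population, weights=weights, k=1_000)  # biased, with replacement

true_rate = sum(population) / len(population)
random_rate = sum(random_sample) / len(random_sample)
biased_rate = sum(biased_sample) / len(biased_sample)

print(f"true rate:          {true_rate:.2f}")
print(f"random sample rate: {random_rate:.2f}")
# Theoretical expectation for the biased method: 3000 / 12000 = 0.25.
print(f"biased sample rate: {biased_rate:.2f}")
```

Making the biased sample bigger would not help: it would only pin down the wrong figure more precisely.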

Another way to bias your sample is by asking leading or misleading questions. The percentage of affirmative answers would probably have been much higher if you had asked, ‘Would you ever be willing to play golf?’ and much lower if you had asked, ‘Are you fanatical about golf?’ To avoid this way of pushing your results in one direction or the other, you asked, ‘How often did you play golf in the last six months?’ This apparently neutral question still might have hidden biases. If you ask it in April, many golfers in snowy climates will not have played golf in six months, even though they will play as much as they can after the snow melts and their golf courses open. To avoid this problem, you should have asked about a full year. Or maybe they really do like to play golf, but they have nobody to play with, so they are also looking for a partner who plays golf. Then you should have asked whether they want to play golf. The results of generalizations are often affected by the questions used to gather a sample.

Overall, every inductive generalization from a sample needs to meet several standards. First, its premises must be true. (Duh! That is obvious, but people often forget it.) Second, its sample must be large enough. (Obvious again! But people rarely bother to ask how big the sample was.) Third, its sample must not be biased. (Bias is often less clear, because it is hidden in the sampling methods.) You will be fooled less often if you get into the habit of asking whether all three standards are met whenever you encounter or give an inductive generalization.

APPLICATION

The next kind of induction applies generalizations back down to individuals. Our example was this argument: ‘This person uses an online dating website, and only 10 per cent of online dating website users play golf, so this person probably does not play golf.’ How strong is this argument?

As always, the first question that you need to ask is whether its premises are true. If not (and if you should know this), then this argument does not give you a strong reason to believe the conclusion. But let’s assume that the premises are true.

You also need to ask whether the percentage is high (or low) enough. Your argument would provide a stronger reason for its conclusion if its second premise cited 1 per cent instead of 10 per cent and a weaker reason for its conclusion if its second premise cited 30 per cent instead of 10 per cent. And if its second premise were that 90 per cent of online daters play golf, then it could provide a strong reason for the opposite conclusion that this person probably does play golf. These numbers affect the strength of this kind of inductive argument.

Another kind of mistake is more subtle and quite common. What if the person who contacts you on the dating website contacted you because your profile mentioned golf? Suppose also that 80 per cent of users who contact people because their profiles mention golf are themselves golfers. We can build this new information into a conflicting statistical application. This person contacted you because your profile mentioned golf, and 80 per cent of users who contact people because their profiles mention golf are themselves golfers, so this person probably does play golf – or, more precisely, there is an 80 per cent chance that this person plays golf.

Now we have statistical applications with opposite conclusions. The first said that this person probably does not play golf. The second says that this person probably does play golf. Which is more accurate? Which should we trust? The crucial difference to notice is that these arguments cite different classes, called reference classes. The first argument cites percentages within the class of online dating website users, whereas the second cites percentages within the class of those special online dating website users who contact people because their profiles mention golf. The latter class is smaller and a proper subset of the former class. In cases like this, assuming that the premises are true and equally justified, the argument with the narrower reference class usually provides a stronger reason, because its information is more specific to the case at hand.
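The rule of preferring the narrower reference class can be sketched in code. Each class below records the golfing rate from the text and the features a person needs to belong to it; the feature labels and the function `narrowest_applicable` are hypothetical. Counting required features works as a proxy for narrowness only because these two classes are nested, one a proper subset of the other, and the sketch ignores the text’s further condition that the premises be true and equally justified.

```python
# Each reference class: (description, golfer rate from the text, features a
# person must have to belong to it). Feature labels are hypothetical.
reference_classes = [
    ("online dating website users", 0.10, {"uses site"}),
    ("users who contacted you because your profile mentions golf", 0.80,
     {"uses site", "contacted because of golf"}),
]

def narrowest_applicable(person_features, classes):
    """Among the classes this person belongs to, return the one with the most
    membership conditions. This equals the narrowest class only when the
    classes are nested, as they are here. Raises ValueError if none apply."""
    applicable = [c for c in classes if c[2] <= person_features]  # subset test
    return max(applicable, key=lambda c: len(c[2]))

this_person = {"uses site", "contacted because of golf"}
name, rate, _ = narrowest_applicable(this_person, reference_classes)
print(f"use '{name}': about {rate:.0%} chance this person plays golf")
```

For someone about whom we know only that they use the site, the same function falls back to the broader class and its 10 per cent figure.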

Conflicting reference classes are often overlooked by people who apply generalizations to individual conclusions. This mistake combined with the fallacy of hasty generalization lies behind a great deal of stereotyping and prejudice. We all depend on generalizations and stereotypes in some cases, but mistakes about disadvantaged and vulnerable ethnic, racial and gender groups can be especially harmful. A bigot might run into one stupid, violent or dishonest member of an ethnic group. Every group has bad apples. The bigot then hastily generalizes to the conclusion that everyone in that ethnic group is similarly stupid, violent or dishonest. Then the bigot meets a new member of that ethnic group and applies the hasty generalization. The bigot concludes that this new individual is also stupid, violent or dishonest, without considering the fact that this new individual also has other features that indicate intelligence, pacifism and honesty. The bigot’s small sample and failure to consider such narrow conflicting reference classes show how bad reasoning can play a role in originating and maintaining prejudice. Bad reasoning is not the whole story, of course, since emotion, history and self-interest also fuel bigotry, but we still might be able to reduce prejudice to some degree by avoiding simple mistakes in inductive arguments.

Why did that happen?

Our next form of inductive reasoning is inference to the best explanation. It might be the most common form of all. When a cake does not rise, the baker needs to figure out the best explanation of this catastrophe. When a committee member does not show up to a meeting, colleagues wonder why. When a car does not start in the morning, its owner needs to find the best explanation in order to figure out which part to fix. This kind of inductive argument is also what detectives (like Sherlock Holmes) use to catch criminals. Detectives infer a conclusion about who did it, because that conclusion provides the best explanation of their observations of the crime scene, the suspects and other evidence. Many crime dramas are, in effect, long inferences to the best explanation. Science also postulates theories as the best explanation of observed results in experiments, such as when Sir Isaac Newton postulated gravity to explain the tides or when palaeontologists hypothesized a meteor to explain the extinction of the dinosaurs. These arguments share a certain form:

(1) Observation: some surprising phenomenon needs to be explained.

(2) Hypothesis: a certain hypothesis explains the observations in (1).

(3) Comparison: the explanation in (2) is better than any alternative explanation of the observations in (1).

(4) Conclusion: the hypothesis in (2) is correct.

In our examples, the observations in (1) are the cake not rising, the colleague missing the meeting, the car not starting, the crime occurring, the tides rising and the dinosaurs disappearing. Each argument then needs a set of competing hypotheses to compare, plus some reasons to prefer one of those explanations.

Inferences to the best explanation are clearly not valid, since it is possible for the conclusion (4) to be false when the premises (1)–(3) are all true. That lack of validity is, however, a feature rather than a bug. Inferences to the best explanation are not intended to be valid, so it is unfair to criticize them for failing to be valid – just as it would be unfair to criticize a bicycle for failing to work in the ocean.

Inferences to the best explanation still need to meet other standards. They can go astray when any of their premises is false. Sometimes an inference to the best explanation is defective because the observation in premise (1) is not accurate. A detective might be misled when he tries to explain what he takes to be blood on the car seat, but the stain is really beetroot juice. An inference to the best explanation can also go astray when the hypothesis in premise (2) does not really explain the observation. You might think that your car did not start because it was out of fuel, when actually the starter did not even begin to turn over, and lack of fuel cannot explain that observation, since the starter does turn over when the car is out of fuel (but not when the electrical system fails). Perhaps the most common problem for inferences to the best explanation is when premise (3) is false either because a competing hypothesis is better than the arguer thinks or because the arguer overlooked an alternative hypothesis that provides an even better explanation. You might think that your colleague missed the meeting because she forgot, when really she was hit by a car on the way to the meeting. Such mistakes can lead to regret and apologies.

Overall, some inferences to the best explanation can provide strong reasons to believe their conclusions, as when a detective provides evidence beyond a reasonable doubt that a defendant is guilty. In contrast, other inferences to the best explanation fail miserably, such as when beetroot juice is mistaken for blood. In order to determine how strong an inference to the best explanation is, we need to look carefully at each premise and also at the conclusion.

HUSSEIN’S TUBES

Let’s try this with a controversial example. Some of the most important inferences to the best explanation lie behind political decisions, such as the decision by the United States to start the Iraq War. In his testimony before the United Nations Security Council on 5 February 2003, US Secretary of State Colin Powell gave this argument:

Saddam Hussein is determined to get his hands on a nuclear bomb. He is so determined that he has made repeated covert attempts to acquire high-specification aluminum tubes from eleven different countries … There is controversy about what these tubes are for. Most US experts think they are intended to serve as rotors in centrifuges to enrich uranium. Other experts, and the Iraqis themselves, argue that they are really to produce the rocket bodies for a conventional weapon, a multiple rocket launcher … First, it strikes me as quite odd that these tubes are manufactured to a tolerance that far exceeds US requirements for comparable rockets. Maybe Iraqis just manufacture their conventional weapon to a higher standard than we do, but I don’t think so. Second, we actually have examined tubes from several different batches that were seized clandestinely before they reached Baghdad. What we notice in these different batches is a progression to higher and higher levels of specification … Why would they continue refining the specifications, go to all that trouble for something that, if it was a rocket, would soon be blown into shrapnel when it went off? … [T]hese illicit procurement efforts show that Saddam Hussein is very much focused on putting in place the key missing piece from his nuclear weapons program, the ability to produce fissile material.7

Of course, I do not endorse this argument. There are many reasons to doubt its premises and conclusion, especially given what we learned later. My goal is only to understand it.

The most natural way to understand Powell’s argument is as an inference to the best explanation. He mentions a surprising phenomenon that needs to be explained and compares three potential explanations of that phenomenon, so his argument fits cleanly into the form above:

(1*) Observation: Saddam Hussein made repeated covert attempts to acquire high-specification aluminium tubes that were increasingly refined.

(2*) Hypothesis: Hussein’s desire to produce fissile material and use it to make a nuclear bomb could explain why he made the attempts described in (1*).

(3*) Comparison: the explanation in (2*) is better than any alternative explanation of the observations in (1*), including Hussein’s reported desire to produce conventional rocket bodies and higher standards in Iraqi manufacturing.

(4*) Conclusion: Hussein desires to produce fissile material for a nuclear bomb.

Powell adds more to back up his premises, but let’s start with the central argument (1*)–(4*). Reconstructing the argument in this form should reveal or clarify how its premises work together to provide some reason to believe its conclusion. But how strong is that reason? To assess the strength of the argument, we need to go through the premises and conclusion carefully.

Premise (1*) raises several questions. How high were the specifications of the tubes that Hussein tried to obtain? How do we know that he insisted on such high specifications? How many attempts did he make? How long ago? Were they covert in the sense of being hidden from everyone or only from the US? Why did he hide them? Although such questions are important, Powell could probably answer them, and he does cite evidence of Hussein’s attempts in other parts of his testimony, so it makes sense here to focus attention on his other premises.

Premise (2*) adds that the phenomenon in (1*) can be explained by Hussein’s desire to produce fissile material for a nuclear bomb. This makes sense. People who desire to make fissile material will want to acquire what is necessary to make it, and high-specification aluminium tubes were needed to produce fissile material. Indeed, the high specifications were needed only for fissile material of the kind used in nuclear bombs, and there would be little use for this kind of fissile material except in making nuclear bombs. At least that is what Powell assumes.

The most serious problems arise in premise (3*). This premise compares Powell’s preferred explanation in (2*) with two competitors: a desire to produce conventional rocket bodies and higher Iraqi standards in manufacturing rockets. Powell focuses on rocket bodies, because that explanation was offered by Hussein himself. Still, Powell’s argument would fail if any other explanation was as strong as Powell’s preferred explanation in (2*), so we need to consider both alternatives.

Powell criticizes the alternative explanation in terms of conventional rockets by asking rhetorical questions: ‘Why would they continue refining the specifications, go to all that trouble for something that, if it was a rocket, would soon be blown into shrapnel when it went off?’ His point here is that the explanation in terms of conventional rockets fails to explain the continual refinements, because rockets do not require these refinements, whereas his preferred explanation in terms of nuclear bombs succeeds in explaining these additional observations. Its ability to explain more observations is what is supposed to make his explanation better.

This increased explanatory power is a common ground for preferring one explanation to another. Suppose that the hypothesis that Gregor killed Maxim explains why the bootprints outside the murder scene are size 14, because Gregor wears size 14 boots, but this hypothesis cannot explain why those bootprints have their distinctive tread pattern, because Gregor does not own any boots with that tread pattern. Then that explanation is not as good as the hypothesis that Ivan killed Maxim, if Ivan wears size 14 and also owns boots with that distinctive tread pattern. We prefer hypotheses that explain more. Powell is simply applying this general principle to the case of aluminium tubes.
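The principle that we prefer hypotheses that explain more can be illustrated with a toy comparison in Python. It is only a sketch: counting explained observations is a crude stand-in for explanatory power, and real comparisons also weigh plausibility and fit with background knowledge. Each hypothesis is represented as the set of observations it explains, `best_explanation` is a hypothetical helper invented here, and the data come from the Gregor and Ivan bootprint example.

```python
def best_explanation(hypotheses, observations):
    """Pick the hypothesis that explains the most observations.

    `hypotheses` maps each hypothesis to the set of observations it explains.
    Returns None when there are no hypotheses or when two or more tie for the
    top score, because a tie gives no reason to prefer one of them.
    """
    if not hypotheses:
        return None
    # Crude proxy for explanatory power: how many observations are covered.
    scores = {h: len(explained & observations) for h, explained in hypotheses.items()}
    top = max(scores.values())
    leaders = [h for h, score in scores.items() if score == top]
    return leaders[0] if len(leaders) == 1 else None


observations = {"size 14 bootprints", "distinctive tread pattern"}
hypotheses = {
    # Gregor wears size 14 but owns no boots with that tread pattern.
    "Gregor killed Maxim": {"size 14 bootprints"},
    # Ivan wears size 14 and owns boots with that tread pattern.
    "Ivan killed Maxim": {"size 14 bootprints", "distinctive tread pattern"},
}
print(best_explanation(hypotheses, observations))  # Ivan killed Maxim
```

If another hypothesis also explained both observations, the function would return None: when two explanations tie for top place, this kind of comparison cannot pick one of them.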

This argument is still subject to many objections. Critics could deny or doubt that Iraq did continue refining the specifications, in which case there would be no need to explain this. Or they could reply that these continual refinements were needed for conventional rockets, so the alternative hypothesis does explain the observations. To avoid these objections, Powell needs background arguments that are not included in the quoted passage. Still, even without delving deeper, our reconstruction has pinpointed at least two issues for further exploration.

The other alternative that Powell mentions is that ‘Iraqis just manufacture their conventional weapon to a higher standard than we do.’ Here Powell seems to have his tongue in his cheek. That is why he thinks all he needs to say in response is simply, ‘I don’t think so.’ This sarcastic assurance seems to build on the assumption that US manufacturing is at least as precise as Iraqi manufacturing. That assumption might be obvious to this audience, but it is striking that Powell does not explicitly give any reason to favour his own explanation above this alternative.

It need not always be a problem to ignore or dismiss an alternative explanation without argument. Some alternative explanations are so clearly inadequate that they do not deserve any effort at refutation. Every inference to the best explanation would need to be irritatingly long in order to deal with every foolish alternative. Nonetheless, this failure to argue against an alternative does reduce the potential audience for the argument. It cannot reach anyone with any inclination to accept this alternative explanation.

The most serious weakness in Powell’s argument lies not in the alternatives he does mention, but in the alternatives he does not mention. This problem pervades inferences to the best explanation. Just recall any murder mystery in which a new suspect appears after the detectives thought that they had already solved the case. The same kind of possibility can undermine Powell’s argument, but here the suspects are hypotheses. In order to refute his argument, all that Powell’s opponents need to produce is one other viable hypothesis that explains the relevant data at least as well as Powell’s.

Notice that opponents do not have to produce a better alternative. If all they want to show is that he has not justified his conclusion, then all they need to show is that there is one alternative at least as good as his. If two alternative explanations tie for top place, then Powell’s argument cannot determine which of these top two is correct. In that case, Powell’s opponents win, because Powell is the one who is trying to argue for one of them over the other.

Still, it might be hard to come up with even one decent alternative. Maybe Hussein was controlled by aliens who eat fissile material, but he did not want any for himself. You cannot falsify that alternative hypothesis if there is no way to detect the presence of such aliens. Nonetheless, these aliens would violate well-established laws of physics, so we have plenty of reasons to dismiss this hypothesis as silly. A little more realistically, maybe Hussein had OCD (obsessive-compulsive disorder), and that is why he continually demanded more refined tubes. However, he did not show symptoms of OCD in other areas of his life, so there is no independent evidence that he had this mental disorder (though maybe he had others, such as narcissism). Hypotheses like these are clearly not even decent explanations.

What we really need for a realistic explanation to be as good as Powell’s is some common and plausible motive that would make Hussein seek more and more refined aluminium tubes. Well, maybe he wanted to use these tubes in some innocent kind of manufacturing. Maybe, but that hypothesis lacks explanatory power – it cannot explain much – until we make it more specific. What kind of products are such refined tubes needed to manufacture? The hypothesis that Hussein was planning to use the tubes to manufacture some other product also cannot explain why Hussein mentioned only conventional rockets in his defence. And Powell has already rejected the rocket hypothesis.

Thus it is at least not easy to come up with any explanation that is as good as Powell’s. Of course, this difficulty might be due to my (and your?) lack of knowledge about rockets, fissile material and Iraqi manufacturing. Even if we cannot come up with any viable alternative, there still might be some explanation that is as good as Powell’s. Nonetheless, in the absence of any such alternative, Powell’s argument does give us some reason to believe his conclusion.

Other problems arise, however, when we look closely at that conclusion. The conclusion of an inference to the best explanation is supposed to be the same as the hypothesis that explains the observations. However, people who use this form of argument often make subtle changes in their conclusions. That happens here. First, Hussein’s attempts to acquire the tubes occurred in the past. What explains these attempts is a desire at the past time when those attempts were made. However, the conclusion is about the present: Hussein desires – not desired – to produce fissile material for a nuclear bomb. Powell substituted ‘s’ for ‘d’! Moreover, the present tense is essential. Powell wants to justify invading Iraq soon after his testimony. His argument would not work if Hussein used to desire fissile material in the past, but no longer has that desire at present. So Powell at least owes us some reason to believe that Hussein has not changed.

Similarly, what if Hussein still desires fissile material for a nuclear bomb, but he has little or no chance of getting any of what he desires? The Rolling Stones are right again: you can’t always get what you want. Then the conclusion that Hussein wants fissile material for nuclear bombs would hardly be enough to justify invading Iraq. A lot of other world leaders want nuclear bombs, but the US is not justified in invading all of their countries. An invasion could be justified only if it would avoid some harm or danger, but a mere desire for nuclear bombs without any chance of fulfilling that desire would not be harmful or dangerous – or at least not harmful or dangerous enough to justify invasion. Thus Powell also owes us some reason to believe that Hussein has a significant chance of getting nuclear bombs.

These gaps show that Powell’s argument is at best incomplete. As before, my job here is not to determine whether he was correct, much less whether the United States was justified in invading Iraq. I doubt it, partly because of what we have learned in the intervening years, but that does not matter in this context. My goal is only to understand Powell and his argument better. Admitting these gaps in his argument is completely compatible with admitting that his argument still achieves something: it gives us some reason to believe the conclusion that Hussein desired to produce fissile material for a nuclear bomb. As with many arguments, we understand the argument more fully if we recognize both its accomplishments and also its limits.

This example also teaches other lessons. Powell’s argument shows that inferences to the best explanation can have important effects even when they are incomplete or worse. Like other arguments, inferences to the best explanation can persuade without justifying. We all need to learn how to evaluate inferences to the best explanation in order to avoid such mistakes and all of their accompanying costs.