Chapter 5
The Monty Hall Paradox and Conditional Probabilities

The Monty Hall problem is loosely based on the American television game show Let's Make a Deal and it is named after the show's original host, Monty Hall. It is considered a paradox because the result appears absurd but can be demonstrated to be true nevertheless. The problem was made famous when it appeared in Marilyn von Savant's Parade Magazine column in 1990, and it has also been featured in the final episode of the 2004–2005 season of NUMB3RS, as well as in the opening scenes of the 2008 movie 21.

5.1 The Monty Hall Paradox

Suppose you are participating in a contest and you are given a choice of three doors: behind one door there is a car; behind the others, goats. You pick a door, say number 1, and the host (call him Monty), who knows what's behind each door, opens another door, say number 3, which has a goat. He then offers you the opportunity to switch to the other unopened door (in this case, door number 2). Should you switch doors?

Contrary to intuition, under some reasonable assumptions (mainly, that when a random choice needs to be made, all options are chosen with the same probability), you are better off switching because your probability of winning the car increases. To see why this is true, we will represent the contest as a series of decisions using a tree diagram. In a tree diagram, each level of the tree represents a series of mutually exclusive events that can occur at a given point in time. By following the different branches of the tree from its root, we can represent all possible outcomes in a complex experiment.

The Monty Hall paradox involves three different sets of events: the producers of the contest need to decide which doors hides the car, the contestant (you) needs to pick a door, and finally Monty needs to decide which other (among the ones that hide a goat and have not been chosen by you) he will open. Hence, our final tree will consist of three levels, which we will be adding to it sequentially.

Consider the first choice. Before the contest starts, the producers are free to place the car behind any one of the three doors. Since we assume that each door is selected with the same probability, we end up with the tree representation in Figure 5.1. The number next to each branch corresponds to the probability we associate with each possible outcome. In this case, the probability of each branch is c05-math-001 because we assumed that the prize is located behind each door with the same probability.

Scheme for Each branch of a tree representing a different decision and the 1/3s represent the probability of each door being selected to contain the prize.

Figure 5.1 Each branch in this tree represents a different decision and the c05-math-002s represent the probability of each door being selected to contain the prize.

Now, for the second decision, you are unaware of which door hides the car, so you are also free to select any of the three doors. Again, we assume that you select each door with the same probability, leading to the representation in Figure 5.2.

Scheme for tree structure representing an extra level, representing the contestant decisions and the probability for each decision to be the one chosen.

Figure 5.2 The tree structure now represents an extra level, representing the contestant decisions and the probability for each decision to be the one chosen.

As a consequence of the multiplication rule, if we follow along the branches of the tree, we can obtain the probability associated with any combination of doors by multiplying together the corresponding probabilities. Therefore, in this case, any combination has the same probability 1/9.

Consider now Monty's decision: unlike the previous ones, his options are affected by the choices made by the producers and you, the contestant. To simplify the explanation, consider only the branch corresponding to the car being located behind door 3. Then, if the door chosen by you is either door 2 or door 3, Monty has a single choice for the door he will open (he cannot open the door with the car, or the door chosen by you). On the other hand, if you chose door 1, Monty has two options (he can open either door 2 or door 3), and from our original description, he opens each door with the same probability. Figure 5.3 shows the sub-tree associated with these decisions.

Scheme for Decision tree for the point when Monty decides which door to open assuming the prize is behind door 3.

Figure 5.3 Decision tree for the point when Monty decides which door to open assuming the prize is behind door 3.

A similar argument applies if the prize is located behind either door 2 or door 3. This leads to 12 possible paths with non-zero probability, as shown in Table 5.1 (note that the sum of all probabilities equals 1). Of these 12 paths, six (the ones highlighted in the table) correspond to paths where you would win by switching. If we sum these six values (the branches correspond to mutually exclusive events), we get

equation

which shows that it is beneficial for the player to switch doors.

Table 5.1 Probabilities of winning if the contestant in the Monty problem switches doors

Door where the prize is Door chosen by contestant Door opened by Monty Probability of the scenario
1 1 2 1/18
1 1 3 1/18
1 2 3 1/9
1 3 2 1/9
2 1 3 1/9
2 2 1 1/18
2 2 3 1/18
2 3 1 1/9
3 1 2 1/9
3 2 1 1/9
3 3 1 1/18
3 3 2 1/18

Highlighted lines correspond to scenarios in which the player wins by switching doors.

You can empirically verify these results by running the following simulation in R:

c05uf001

5.2 Conditional Probabilities

The solution to the Monty Hall problem illustrates the concept of conditional probability. Conditional probabilities simply reflect the fact that the probability of an event might depend on our knowledge of whether other events have already occurred in the past or not. For example, the probability of winning in the Monty Hall problem (i.e., choosing the door that hides the car) before Monty opens the door is 1/3. However, once Monty has opened a door, the probability of winning changes. This happens because we have learned something about the space of outcomes from the fact that Monty opened a door.

Conditional probabilities involve two events; in our example above we have

equation

and we are interested in computing the probability of c05-math-003 if c05-math-004 has happened (i.e., the probability that you win the car if you switch doors), which we will denote c05-math-005. This expression is also read as “the probability of c05-math-006 given c05-math-007” or “the probability of c05-math-008 conditional on c05-math-009.” Unlike the event whose probability we want to compute (in this case, c05-math-010, or winning the car), the event we condition upon (in this case, c05-math-011, switching doors) is not random; c05-math-012 is an event we assume has occurred. Consequently, generally speaking c05-math-013. As a matter of fact, it is easy to confuse the conditional probability of one event given another with their joint probability of these events. Recall that the joint probability of c05-math-014 and c05-math-015, denoted c05-math-016, describes the probability that c05-math-017 and c05-math-018 happen simultaneously. In this case, both events are random (we do not know if they have occurred or not), and we have that c05-math-019.

To further illustrate the difference between joint and conditional probabilities, consider examining the association between smoking and lung cancer. More specifically, let's imagine we interview 1000 persons and ask them whether they have ever smoked and also whether they have suffered from lung cancer (the results of one such study are presented in Table 5.2). Let

equation

Table 5.2 Studying the relationship between smoking and lung cancer

Had lung cancer Never had lung cancer
Has smoked 20 180
Has not smoked 50 750

We could ask what the probability is that a randomly selected person smokes and suffers from lung cancer. If we do this, we are inquiring about the joint probability of c05-math-020 and c05-math-021, c05-math-022. By exploiting the frequentist interpretation of probability, we could estimate c05-math-023 by dividing the number of people who answered both questions affirmatively by the total number of people interviewed. Hence,

equation

As we can see, that probability is small. On the other hand, we could also ask what is the probability of suffering lung cancer if you smoke, that is, c05-math-024. Clearly, this is the most relevant question if you want to decide if you want to quit smoking or not. Note that in this case there is nothing random about whether the person smokes or not: we know that he or she does. So, we need to compute c05-math-025, and we need to look only at the people who have suffered from cancer among those that smoke, that is,

equation

Note how different c05-math-026 and c05-math-027 are. Furthermore, if we define c05-math-028, we could compute c05-math-029, the probability of suffering from cancer if you do not smoke,

equation

All these calculations indicate that smokers are about one and a half times more likely than nonsmokers to suffer from lung cancer (since c05-math-030). Furthermore, note that c05-math-031, but c05-math-032.

Even though joint and conditional probabilities are different concepts, there is a link between them,

The conditional probability of an event c05-math-033 given c05-math-034 can be computed as

equation

or alternatively

equation

We implicitly used the second formula while constructing the decision tree for the Monty Hall problem. In fact, the probabilities in the branches of the tree are all conditional probabilities given the previous events in the tree. For example, if we define the events

equation

Then the very first branch represents c05-math-035, while c05-math-036 and c05-math-037. Using these values, we computed the joint probability of all events in the branch as the product of all three values (just as the formula suggests), that is,

equation

5.3 Independent Events

In the first few chapters, and in particular when discussing roulette, we informally used the term independent to qualify experiments where the outcome of one trial does not affect the outcome of another. Conditional probabilities can be used to formalize the notion of independence.

Independent Events

We say that two events c05-math-038 and c05-math-039 are independent if

equation

which implies that

equation

Intuitively, two events are independent if knowledge of whether c05-math-040 happened or not does not affect the probability of c05-math-041 happening. To clarify this notion, consider the Monty Hall problem again. In that case, the selection of the door by the contestant is independent from the selection by the producers. However, the decision by Monty is not independent from the decisions made by the producers and the contestant, which is the reason for the apparent paradox.

In our example involving smoking and lung cancer, we can see that cancer and smoking are not independent: since c05-math-042 and also c05-math-043 and c05-math-044, then c05-math-045.

One consequence of independence is that formulas for the expectation and variance of random variables simplify

If c05-math-046 and c05-math-047 are independent random variables and c05-math-048 and c05-math-049 are two constant (non-random numbers), then

equation

5.4 Bayes Theorem

As we discussed before, conditional probabilities are not symmetric, that is, in general c05-math-050. However, the two quantities are related. Indeed, since c05-math-051 and also c05-math-052, we have that

Bayes Theorem

For any two events c05-math-053 and c05-math-054,

equation

When c05-math-055 is unknown we can compute it from c05-math-056, c05-math-057, c05-math-058, and c05-math-059. Figure 5.4 illustrates how an event c05-math-060 can be broken down in two parts, c05-math-061 and c05-math-062. By exploiting this decomposition, we obtain the following representation:

Scheme for Partitioning the event space.

Figure 5.4 Partitioning the event space.

Total Probability Law

For any two events c05-math-063 and c05-math-064,

equation

Substituting back into Bayes theorem, we have

Bayes Theorem (Alternative Formulation)

For any two events c05-math-065 and c05-math-066,

equation

A slight generalization of Bayes theorem involves a more general partition of the space with more than two choices.

General Alternative Formulation

For any events c05-math-067 and c05-math-068 such that c05-math-069 form a partition of the space of all possible outcomes,

equation

Bayes theorem is particularly useful because in many cases we might be interested in c05-math-070, but it is much easier to compute c05-math-071. For example, suppose that you play the following game, which we call the game of urns. There are three urns:

The dealer secretly picks an urn uniformly at random, from which she draws a ball also uniformly at random. The dealer then presents the ball to you and asks you to decide what urn the ball was drawn from. If you pick the right urn, you get your wager back plus $1. Otherwise, you lose the money you bet. The question is, how should you play?

Clearly, your answer should depend on what color shows up. For example, you do not need to be an expert in probability to realize that if you observe a yellow ball you should never pick urn 2 (it does not contain any yellow ball in the first place!). However, intuition is less useful in deciding whether to pick urn 1 or urn 3.

We can use conditional probabilities and Bayes theorem to generate a strategy. Let

equation

Consider first the case when the dealer shows you a yellow ball. To create a strategy, we need to compute the probabilities associated with the ball coming from each of the urns conditional on it being yellow, that is, c05-math-072, c05-math-073, and c05-math-074. Then, the optimal strategy is to select the urn with the highest probability. Using Bayes theorem, we have

equation

(note that the denominator is the same for all three expressions, and that it corresponds to c05-math-075 because the three events c05-math-076, c05-math-077, and c05-math-078 form a partition of all possible events).

Since the dealer picks the urns uniformly at random, we have c05-math-079. Also, c05-math-080 (because urn 1 has two yellow balls out of six), c05-math-081 (because there are no yellow balls in urn 2), and c05-math-082 (because urn 3 has four yellow balls out of six). Substituting these values we have

equation

Therefore, if we see a yellow ball, the optimal strategy is to select urn 3. Incidentally, note our calculation gives us the probability of drawing a yellow ball as a byproduct, the probability of a yellow ball is c05-math-083.

A similar approach can be used in the case we observe a red ball,

equation

where, as a byproduct, we see that c05-math-084. This means that the optimal strategy in this case is to attribute the red ball to urn 2.

Finally, for a blue ball

equation

while c05-math-085, so the optimal strategy is again to attribute the blue ball to urn 2.

The probability of winning this game under the optimal strategy can be obtained using the total probability law. Figure 5.5 shows a tree representation of the game. Note that the only ways to win are when a blue ball comes up from urn 2 (which is correctly called by our strategy) and when a yellow ball is taken from the third urn (which is again correctly called by our strategy). Hence,

Scheme for Tree representation of the outcomes of the game of urns under the optimal strategy that calls yellow balls as coming from Urn 3 and blue and red balls as coming from urn 2.

Figure 5.5 Tree representation of the outcomes of the game of urns under the optimal strategy that calls yellow balls as coming from Urn 3 and blue and red balls as coming from urn 2.

equation

The following R code simulates the game of urns and can be used to check that the derivations shown above are correct.

c05uf002

5.5 Exercises

  1. 1. There are three condemned prisoners in jail, one of whom is to be secretly pardoned. One of the prisoners begs the warden to tell him the name of one of the others who will be executed, arguing that this reveals no information about his own fate but increases his chances of being pardoned from c05-math-086 to c05-math-087. The warden obliges, (secretly) flipping a coin to decide which name to provide if the prisoner who is asking is the one being pardoned. Does knowing the warden's answer really change the asking prisoner's chances of being pardoned?

  2. 2. Ignorant Monty: In a variant of the Monty Hall problem, Monty does not know what lies behind the door and picks one at random to open. When he does, he is relieved that it contains a goat. Show that, in this case, it is irrelevant whether you switch or not.

  3. 3. [R] Write a simulation that corroborates your calculations for the ignorant Monty game.

  4. 4. You are presented with three boxes: a box containing two gold coins, a box with two silver coins, and a box with one of each. After choosing a box at random and withdrawing one coin at random that happens to be a gold coin, what is the probability that the other coin is gold?

  5. 5. Consider the following table giving the joint probability for two random variables. Are the two events c05-math-088 and c05-math-089 independent? How about c05-math-090 and c05-math-091?

    c05-math-092 c05-math-093
    c05-math-094 0.25 0.25
    c05-math-095 0.25 0.25
  6. 6. Consider the following table giving the joint probability for two random variables. Are the two events c05-math-096 and c05-math-097 independent? How about c05-math-098 and c05-math-099?

    c05-math-100 c05-math-101 c05-math-102
    c05-math-103 0.25 0.15 0.10
    c05-math-104 0.25 0.20 0.05
  7. 7. When a laboratory tests you for a particular condition (say the test is for the presence of human immunodeficiency virus, or HIV) it can either produce a positive or negative result. The latter means there is no virus in your body, the former means there is a virus in your system. These tests have rates of sensitivity (i.e., how often they correctly diagnose a person with the disease) and rates of specificity (i.e., the rate of times the test correctly identifies people who do not have the condition). These rates are ideally close to 100% but, in practice, there are always false positives and false negatives. In other words, there are always situations in which the test says someone has the virus but the person actually doesn't, and situations where the test says there is no virus in the person's system but there actually is. Let's say we are testing a new HIV test and the results are presented in the following 2×2 table.

    Patients Patients
    with HIV with no HIV
    Patient with positive test 10 1
    Patient with negative test 10 10,000

    Based on the data and using a frequentist approach to probability, answer the following questions

    1. a. What's the probability of a person having HIV?
    2. b. What is the probability of the test correctly diagnosing the presence of HIV (i.e., what is the sensitivity of the test)?
    3. c. What is the specificity of the test?
    4. d. What is the rate of false positives?
    5. e. What is the rate of false negatives?
    6. f. In terms of the usefulness to the society at large (remember this is a communicable disease), what is more useful: a lower rate of false positives or a lower rate of false negatives?
  8. 8. Consider a competitor HIV test to the one mentioned above. Its field tests produced the following data.

    Patients Patients
    with HIV with no HIV
    Patient with positive test 9 2
    Patient with negative test 5 10,000

    Based on the data and using a frequentist approach to probability, answer the following questions

    1. a. Calculate c05-math-105(test is positive c05-math-106 person has HIV)?
    2. b. Calculate c05-math-107(test is negative c05-math-108 person doesn't have HIV)?
    3. c. Calculate c05-math-109(test is positive c05-math-110 person doesn't have HIV)?
    4. d. Calculate c05-math-111(test is negative c05-math-112 person has HIV)?
    5. e. Which of the two tests is preferable?
  9. 9. Compute the fair value of the game of urns described at the end of the chapter.

  10. 10. Consider a variation of the game of urns where

    • Urn 1 contains 4 blue balls, 2 red balls, and 1 yellow ball.
    • Urn 2 contains 1 blue ball, 2 red balls, and 2 yellow balls.
    • Urn 3 contains 2 blue balls, 1 blue ball, and 3 yellow balls.
    What would your optimal strategy be in this case, and what is the fair value of this game?
  11. 11. [R] Write a simulation for the game of urns in the previous exercise.