The Monty Hall problem is loosely based on the American television game show Let's Make a Deal and it is named after the show's original host, Monty Hall. It is considered a paradox because the result appears absurd but can be demonstrated to be true nevertheless. The problem was made famous when it appeared in Marilyn von Savant's Parade Magazine column in 1990, and it has also been featured in the final episode of the 2004–2005 season of NUMB3RS, as well as in the opening scenes of the 2008 movie 21.
Suppose you are participating in a contest and you are given a choice of three doors: behind one door there is a car; behind the others, goats. You pick a door, say number 1, and the host (call him Monty), who knows what's behind each door, opens another door, say number 3, which has a goat. He then offers you the opportunity to switch to the other unopened door (in this case, door number 2). Should you switch doors?
Contrary to intuition, under some reasonable assumptions (mainly, that when a random choice needs to be made, all options are chosen with the same probability), you are better off switching because your probability of winning the car increases. To see why this is true, we will represent the contest as a series of decisions using a tree diagram. In a tree diagram, each level of the tree represents a series of mutually exclusive events that can occur at a given point in time. By following the different branches of the tree from its root, we can represent all possible outcomes in a complex experiment.
The Monty Hall paradox involves three different sets of events: the producers of the contest need to decide which doors hides the car, the contestant (you) needs to pick a door, and finally Monty needs to decide which other (among the ones that hide a goat and have not been chosen by you) he will open. Hence, our final tree will consist of three levels, which we will be adding to it sequentially.
Consider the first choice. Before the contest starts, the producers are free to place the car behind any one of the three doors. Since we assume that each door is selected with the same probability, we end up with the tree representation in Figure 5.1. The number next to each branch corresponds to the probability we associate with each possible outcome. In this case, the probability of each branch is because we assumed that the prize is located behind each door with the same probability.
Figure 5.1 Each branch in this tree represents a different decision and the s represent the probability of each door being selected to contain the prize.
Now, for the second decision, you are unaware of which door hides the car, so you are also free to select any of the three doors. Again, we assume that you select each door with the same probability, leading to the representation in Figure 5.2.
Figure 5.2 The tree structure now represents an extra level, representing the contestant decisions and the probability for each decision to be the one chosen.
As a consequence of the multiplication rule, if we follow along the branches of the tree, we can obtain the probability associated with any combination of doors by multiplying together the corresponding probabilities. Therefore, in this case, any combination has the same probability 1/9.
Consider now Monty's decision: unlike the previous ones, his options are affected by the choices made by the producers and you, the contestant. To simplify the explanation, consider only the branch corresponding to the car being located behind door 3. Then, if the door chosen by you is either door 2 or door 3, Monty has a single choice for the door he will open (he cannot open the door with the car, or the door chosen by you). On the other hand, if you chose door 1, Monty has two options (he can open either door 2 or door 3), and from our original description, he opens each door with the same probability. Figure 5.3 shows the sub-tree associated with these decisions.
Figure 5.3 Decision tree for the point when Monty decides which door to open assuming the prize is behind door 3.
A similar argument applies if the prize is located behind either door 2 or door 3. This leads to 12 possible paths with non-zero probability, as shown in Table 5.1 (note that the sum of all probabilities equals 1). Of these 12 paths, six (the ones highlighted in the table) correspond to paths where you would win by switching. If we sum these six values (the branches correspond to mutually exclusive events), we get
which shows that it is beneficial for the player to switch doors.
Table 5.1 Probabilities of winning if the contestant in the Monty problem switches doors
Door where the prize is | Door chosen by contestant | Door opened by Monty | Probability of the scenario |
1 | 1 | 2 | 1/18 |
1 | 1 | 3 | 1/18 |
1 | 2 | 3 | 1/9 |
1 | 3 | 2 | 1/9 |
2 | 1 | 3 | 1/9 |
2 | 2 | 1 | 1/18 |
2 | 2 | 3 | 1/18 |
2 | 3 | 1 | 1/9 |
3 | 1 | 2 | 1/9 |
3 | 2 | 1 | 1/9 |
3 | 3 | 1 | 1/18 |
3 | 3 | 2 | 1/18 |
Highlighted lines correspond to scenarios in which the player wins by switching doors.
You can empirically verify these results by running the following simulation in R
:
The solution to the Monty Hall problem illustrates the concept of conditional probability. Conditional probabilities simply reflect the fact that the probability of an event might depend on our knowledge of whether other events have already occurred in the past or not. For example, the probability of winning in the Monty Hall problem (i.e., choosing the door that hides the car) before Monty opens the door is 1/3. However, once Monty has opened a door, the probability of winning changes. This happens because we have learned something about the space of outcomes from the fact that Monty opened a door.
Conditional probabilities involve two events; in our example above we have
and we are interested in computing the probability of if
has happened (i.e., the probability that you win the car if you switch doors), which we will denote
. This expression is also read as “the probability of
given
” or “the probability of
conditional on
.” Unlike the event whose probability we want to compute (in this case,
, or winning the car), the event we condition upon (in this case,
, switching doors) is not random;
is an event we assume has occurred. Consequently, generally speaking
. As a matter of fact, it is easy to confuse the conditional probability of one event given another with their joint probability of these events. Recall that the joint probability of
and
, denoted
, describes the probability that
and
happen simultaneously. In this case, both events are random (we do not know if they have occurred or not), and we have that
.
To further illustrate the difference between joint and conditional probabilities, consider examining the association between smoking and lung cancer. More specifically, let's imagine we interview 1000 persons and ask them whether they have ever smoked and also whether they have suffered from lung cancer (the results of one such study are presented in Table 5.2). Let
Table 5.2 Studying the relationship between smoking and lung cancer
Had lung cancer | Never had lung cancer | ||
Has smoked | 20 | 180 | |
Has not smoked | 50 | 750 |
We could ask what the probability is that a randomly selected person smokes and suffers from lung cancer. If we do this, we are inquiring about the joint probability of and
,
. By exploiting the frequentist interpretation of probability, we could estimate
by dividing the number of people who answered both questions affirmatively by the total number of people interviewed. Hence,
As we can see, that probability is small. On the other hand, we could also ask what is the probability of suffering lung cancer if you smoke, that is, . Clearly, this is the most relevant question if you want to decide if you want to quit smoking or not. Note that in this case there is nothing random about whether the person smokes or not: we know that he or she does. So, we need to compute
, and we need to look only at the people who have suffered from cancer among those that smoke, that is,
Note how different and
are. Furthermore, if we define
, we could compute
, the probability of suffering from cancer if you do not smoke,
All these calculations indicate that smokers are about one and a half times more likely than nonsmokers to suffer from lung cancer (since ). Furthermore, note that
, but
.
Even though joint and conditional probabilities are different concepts, there is a link between them,
The conditional probability of an event given
can be computed as
or alternatively
We implicitly used the second formula while constructing the decision tree for the Monty Hall problem. In fact, the probabilities in the branches of the tree are all conditional probabilities given the previous events in the tree. For example, if we define the events
Then the very first branch represents , while
and
. Using these values, we computed the joint probability of all events in the branch as the product of all three values (just as the formula suggests), that is,
In the first few chapters, and in particular when discussing roulette, we informally used the term independent to qualify experiments where the outcome of one trial does not affect the outcome of another. Conditional probabilities can be used to formalize the notion of independence.
Independent Events
We say that two events and
are independent if
which implies that
Intuitively, two events are independent if knowledge of whether happened or not does not affect the probability of
happening. To clarify this notion, consider the Monty Hall problem again. In that case, the selection of the door by the contestant is independent from the selection by the producers. However, the decision by Monty is not independent from the decisions made by the producers and the contestant, which is the reason for the apparent paradox.
In our example involving smoking and lung cancer, we can see that cancer and smoking are not independent: since and also
and
, then
.
One consequence of independence is that formulas for the expectation and variance of random variables simplify
If and
are independent random variables and
and
are two constant (non-random numbers), then
As we discussed before, conditional probabilities are not symmetric, that is, in general . However, the two quantities are related. Indeed, since
and also
, we have that
Bayes Theorem
For any two events and
,
When is unknown we can compute it from
,
,
, and
. Figure 5.4 illustrates how an event
can be broken down in two parts,
and
. By exploiting this decomposition, we obtain the following representation:
Figure 5.4 Partitioning the event space.
Total Probability Law
For any two events and
,
Substituting back into Bayes theorem, we have
Bayes Theorem (Alternative Formulation)
For any two events and
,
A slight generalization of Bayes theorem involves a more general partition of the space with more than two choices.
General Alternative Formulation
For any events and
such that
form a partition of the space of all possible outcomes,
Bayes theorem is particularly useful because in many cases we might be interested in , but it is much easier to compute
. For example, suppose that you play the following game, which we call the game of urns. There are three urns:
The dealer secretly picks an urn uniformly at random, from which she draws a ball also uniformly at random. The dealer then presents the ball to you and asks you to decide what urn the ball was drawn from. If you pick the right urn, you get your wager back plus $1. Otherwise, you lose the money you bet. The question is, how should you play?
Clearly, your answer should depend on what color shows up. For example, you do not need to be an expert in probability to realize that if you observe a yellow ball you should never pick urn 2 (it does not contain any yellow ball in the first place!). However, intuition is less useful in deciding whether to pick urn 1 or urn 3.
We can use conditional probabilities and Bayes theorem to generate a strategy. Let
Consider first the case when the dealer shows you a yellow ball. To create a strategy, we need to compute the probabilities associated with the ball coming from each of the urns conditional on it being yellow, that is, ,
, and
. Then, the optimal strategy is to select the urn with the highest probability. Using Bayes theorem, we have
(note that the denominator is the same for all three expressions, and that it corresponds to because the three events
,
, and
form a partition of all possible events).
Since the dealer picks the urns uniformly at random, we have . Also,
(because urn 1 has two yellow balls out of six),
(because there are no yellow balls in urn 2), and
(because urn 3 has four yellow balls out of six). Substituting these values we have
Therefore, if we see a yellow ball, the optimal strategy is to select urn 3. Incidentally, note our calculation gives us the probability of drawing a yellow ball as a byproduct, the probability of a yellow ball is .
A similar approach can be used in the case we observe a red ball,
where, as a byproduct, we see that . This means that the optimal strategy in this case is to attribute the red ball to urn 2.
Finally, for a blue ball
while , so the optimal strategy is again to attribute the blue ball to urn 2.
The probability of winning this game under the optimal strategy can be obtained using the total probability law. Figure 5.5 shows a tree representation of the game. Note that the only ways to win are when a blue ball comes up from urn 2 (which is correctly called by our strategy) and when a yellow ball is taken from the third urn (which is again correctly called by our strategy). Hence,
Figure 5.5 Tree representation of the outcomes of the game of urns under the optimal strategy that calls yellow balls as coming from Urn 3 and blue and red balls as coming from urn 2.
The following R
code simulates the game of urns and can be used to check that the derivations shown above are correct.
1. There are three condemned prisoners in jail, one of whom is to be secretly pardoned. One of the prisoners begs the warden to tell him the name of one of the others who will be executed, arguing that this reveals no information about his own fate but increases his chances of being pardoned from to
. The warden obliges, (secretly) flipping a coin to decide which name to provide if the prisoner who is asking is the one being pardoned. Does knowing the warden's answer really change the asking prisoner's chances of being pardoned?
2. Ignorant Monty: In a variant of the Monty Hall problem, Monty does not know what lies behind the door and picks one at random to open. When he does, he is relieved that it contains a goat. Show that, in this case, it is irrelevant whether you switch or not.
3. [R] Write a simulation that corroborates your calculations for the ignorant Monty game.
4. You are presented with three boxes: a box containing two gold coins, a box with two silver coins, and a box with one of each. After choosing a box at random and withdrawing one coin at random that happens to be a gold coin, what is the probability that the other coin is gold?
5. Consider the following table giving the joint probability for two random variables. Are the two events and
independent? How about
and
?
![]() |
![]() |
|
![]() |
0.25 | 0.25 |
![]() |
0.25 | 0.25 |
6. Consider the following table giving the joint probability for two random variables. Are the two events and
independent? How about
and
?
![]() |
![]() |
![]() |
|
![]() |
0.25 | 0.15 | 0.10 |
![]() |
0.25 | 0.20 | 0.05 |
7. When a laboratory tests you for a particular condition (say the test is for the presence of human immunodeficiency virus, or HIV) it can either produce a positive or negative result. The latter means there is no virus in your body, the former means there is a virus in your system. These tests have rates of sensitivity (i.e., how often they correctly diagnose a person with the disease) and rates of specificity (i.e., the rate of times the test correctly identifies people who do not have the condition). These rates are ideally close to 100% but, in practice, there are always false positives and false negatives. In other words, there are always situations in which the test says someone has the virus but the person actually doesn't, and situations where the test says there is no virus in the person's system but there actually is. Let's say we are testing a new HIV test and the results are presented in the following 2×2 table.
Patients | Patients | |
with HIV | with no HIV | |
Patient with positive test | 10 | 1 |
Patient with negative test | 10 | 10,000 |
Based on the data and using a frequentist approach to probability, answer the following questions
8. Consider a competitor HIV test to the one mentioned above. Its field tests produced the following data.
Patients | Patients | |
with HIV | with no HIV | |
Patient with positive test | 9 | 2 |
Patient with negative test | 5 | 10,000 |
Based on the data and using a frequentist approach to probability, answer the following questions
9. Compute the fair value of the game of urns described at the end of the chapter.
10. Consider a variation of the game of urns where
11. [R] Write a simulation for the game of urns in the previous exercise.