Chapter 4
Lotto and Combinatorial Numbers

Lotteries are a very common and popular type of gamble. The bet is typically placed on a number (or set of numbers) that is randomly selected during a special event held one or more times a week. In most cases, the randomization mechanism used in lotteries is such that any number has the same probability of occurring, so we are dealing with an equiprobable outcome space. In the United States, state governments typically administer lotteries, and the profits they generate are used to provide supplemental funding to public programs such as public schools and colleges; for example, see http://www.calottery.com/default.htm.

4.1 Rules and Bets

Lotto is one variant of lotteries that is extremely popular. In its simplest version, the player pays a fixed price for a ticket (often $1) and gets to select a few numbers (often 5 or 6) from among a longer list of them (anywhere between 42 and 90 numbers). You win different prizes depending on how many numbers in your list match the random draw (the more matches, the larger the prize). To reduce the risk to the organization running the lottery, the prizes are often set as a fixed percentage of total revenue (often a 50–50% split is used, making this type of game an extremely bad proposition, at least when compared with most casino games). A casino version of lotto, often called keno, is also widely popular but is even more difficult to win than the version offered by most state lotteries.

4.1.1 The Colorado Lotto

To be more specific, consider the 6-of-42 lotto offered by the Colorado Lottery (http://www.coloradolottery.com/GAMES/LOTTO/), which is drawn twice a week on Wednesday and Saturday evenings. This game costs $1 to play, distributes 50% of the revenue and pays out if you match 3, 4, 5, or 6 of the drawn numbers. We would like to compute the probability of winning for each of these four prizes.

Since we are dealing with an equiprobable space, we use the formula

$$P(A) = \frac{\text{number of outcomes favorable to } A}{\text{total number of outcomes}}.$$

Let's start with the probability of winning the main prize (i.e., of picking the 6 right numbers). The numerator in this case is very easy; since the order in which the numbers come up does not matter, there is just one combination of numbers that allows you to win.

To figure out what the denominator is, let's start by using the multiplication rule. We need to choose six numbers without replacement (i.e., the numbers cannot be repeated). Therefore, we have 42 options for the first number, 41 for the second, 40 for the third, and so on. This means that, as a first approximation, we have

$$42 \times 41 \times 40 \times 39 \times 38 \times 37 = 3{,}776{,}965{,}920.$$

This number is known as the number of permutations of 42 objects taken 6 at a time, and is denoted as ${}_{42}P_{6}$. This number can also be computed as

$${}_{42}P_{6} = \frac{42!}{(42-6)!} = \frac{42!}{36!},$$

where $n! = n \times (n-1) \times \cdots \times 2 \times 1$ is read $n$ factorial (with the convention $0! = 1$).
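As a quick check of the arithmetic, the number of permutations of 42 objects taken 6 at a time can be computed in R both as a product and as a ratio of factorials. This is only a small sketch; the variable names are chosen here for illustration.

```r
# Ordered selections of 6 numbers out of 42, without repetition.
perm_product   <- prod(42:37)                     # 42 * 41 * 40 * 39 * 38 * 37
perm_factorial <- factorial(42) / factorial(36)   # 42! / 36!

perm_product

# The two routes agree up to floating-point rounding
# (factorial(42) is far too large to be stored exactly).
isTRUE(all.equal(perm_product, perm_factorial))
```

The product form is preferable in practice because it avoids the very large intermediate values that the factorials produce.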

When computing the number of permutations, we implicitly assume that the order in which the numbers appear is important. That is, we assume that two sequences containing the same numbers in a different order, such as $(1, 2, 3, 4, 5, 6)$ and $(6, 5, 4, 3, 2, 1)$, are different. However, for the purpose of the game of lotto, these two sequences are really the same. To adjust for this, we need to figure out how many different orderings we have for 6 numbers. Again, we can use the multiplication rule: we need to fill up 6 spots with 6 numbers, so there are 6 options for the first spot, 5 options for the second, and so on. Hence, the total number of orderings of 6 numbers is $6! = 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 720$.

Since our previous calculation was counting each different combination of 6 numbers 720 times, we just need to divide ${}_{42}P_{6}$ (the number of subsets in which the order is important) by the total number of ways we can order 6 numbers ($6! = 720$),

$$\frac{{}_{42}P_{6}}{6!} = \frac{3{,}776{,}965{,}920}{720} = 5{,}245{,}786,$$

and the probability of matching exactly the 6 winning numbers is

$$P(\text{matching 6 numbers}) = \frac{1}{5{,}245{,}786} \approx 1.906 \times 10^{-7}.$$

Note that the whole calculation for the number of possible groups of 6 numbers out of 42 can be written as

$$\frac{42!}{6! \, (42-6)!} = \frac{42!}{6! \, 36!} = 5{,}245{,}786.$$

This quantity is known as the number of combinations of 42 elements taken 6 at a time, and is denoted as

$$\binom{42}{6},$$

which is read 42 choose 6.
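In R, the number of combinations is available through the base function `choose()`. The short sketch below (variable names are illustrative) computes the number of possible tickets and the probability of the main prize, and confirms that dividing the number of ordered selections by the number of orderings gives the same count.

```r
# Number of unordered groups of 6 numbers out of 42.
n_tickets <- choose(42, 6)
n_tickets

# Probability of matching all 6 numbers with a single ticket.
p_first <- 1 / n_tickets
p_first

# Same count, obtained as (ordered selections) / (orderings of 6 numbers).
n_tickets_check <- prod(42:37) / factorial(6)
n_tickets_check
```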

The results discussed earlier can be extended to situations in which $k$ objects need to be chosen from among $n$ of them.

The number of ways in which $k$ ordered objects can be selected from a total of $n$ options is given by the permutation number

$${}_{n}P_{k} = n \times (n-1) \times \cdots \times (n-k+1) = \frac{n!}{(n-k)!}.$$

In the special case where we are interested in the number of ways in which $n$ objects can be ordered, this reduces to

$${}_{n}P_{n} = n!.$$

The number of ways in which $k$ unordered objects can be selected from a total of $n$ options is given by the combinatorial number (sometimes called the binomial coefficient)

$$\binom{n}{k} = \frac{n!}{k! \, (n-k)!}.$$

To convince yourself that the formula for the combinatorial number is correct, consider a simple example in which we want to enumerate all the possible options. In particular, let's compute $\binom{6}{3}$, the number of ways in which 3 numbers can be selected out of 6 without repetition. Our formula above says that

$$\binom{6}{3} = \frac{6!}{3! \, 3!} = \frac{720}{6 \times 6} = 20.$$

This can be verified by explicitly enumerating all possible options (see Table 4.1). Sidebars 4.1 and 4.2 discuss how to use R to enumerate and count permutations and combinations.

Table 4.1 List of possible groups of 3 out of 6 numbers, if the order of the numbers is not important

1, 2, 3    1, 3, 4    1, 4, 6    2, 3, 6    3, 4, 5
1, 2, 4    1, 3, 5    1, 5, 6    2, 4, 5    3, 4, 6
1, 2, 5    1, 3, 6    2, 3, 4    2, 4, 6    3, 5, 6
1, 2, 6    1, 4, 5    2, 3, 5    2, 5, 6    4, 5, 6
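The same enumeration can be reproduced with the base R function `combn()`, which returns one group per column. The following is a minimal sketch; the object name `groups` is chosen here for illustration.

```r
# All unordered groups of 3 numbers chosen from 1, ..., 6.
groups <- combn(6, 3)   # a 3-row matrix; each column is one group

ncol(groups)            # how many groups were generated
choose(6, 3)            # the count predicted by the formula

t(groups)               # one group per row, for comparison with the table
```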

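Let's proceed now to calculate the probability of winning the second prize, that is, matching exactly 5 numbers out of 6. The denominator is the same as before, so we do not need to repeat the calculation. For the numerator, we need to pick 5 numbers out of the 6 that came up in the drawing, while the sixth number needs to come from among the 36 that are not winning numbers. So the numerator is

$$\binom{6}{5} \times \binom{36}{1} = 6 \times 36 = 216,$$

and the probability is

$$P(\text{matching 5 numbers}) = \frac{216}{5{,}245{,}786} \approx 4.12 \times 10^{-5}.$$

A similar argument applies for the third prize (getting 4 out of 6 numbers). For the number of combinations that match exactly 4 numbers, we need to first choose 4 among the 6 winning numbers, and then 2 numbers among the remaining 36 non-winning numbers. Hence,

$$P(\text{matching 4 numbers}) = \frac{\binom{6}{4} \times \binom{36}{2}}{\binom{42}{6}} = \frac{15 \times 630}{5{,}245{,}786} = \frac{9{,}450}{5{,}245{,}786} \approx 0.0018.$$

Finally, for the fourth prize (3 out of 6 numbers) we have

$$P(\text{matching 3 numbers}) = \frac{\binom{6}{3} \times \binom{36}{3}}{\binom{42}{6}} = \frac{20 \times 7{,}140}{5{,}245{,}786} = \frac{142{,}800}{5{,}245{,}786} \approx 0.0272.$$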
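The four prize probabilities follow the same pattern, so they can be computed together. The sketch below uses `choose()` for the counts and, as a cross-check that is not part of the argument above, compares the result with the hypergeometric probabilities returned by base R's `dhyper()`. Variable names are illustrative.

```r
total   <- choose(42, 6)   # number of possible tickets
matches <- 6:3             # first, second, third and fourth prize

# Pick `matches` of the 6 winning numbers and the rest from the 36 others.
favorable <- choose(6, matches) * choose(36, 6 - matches)
prob      <- favorable / total

data.frame(matches, favorable, prob)

# Cross-check: number of winning numbers among 6 drawn from an urn
# with 6 "winning" and 36 "non-winning" numbers.
prob_hyper <- dhyper(matches, m = 6, n = 36, k = 6)
isTRUE(all.equal(prob, prob_hyper))
```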

The following code can be used to simulate the outcome of the Colorado Lotto and estimate the probability of the third and fourth prizes (see Sidebar 4.3 for how to use R to sample without replacement).

[R code listing c04uf008]
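A minimal sketch of such a simulation, which is not necessarily the book's own listing, might look as follows. The ticket, the seed and the number of simulated draws are arbitrary choices made for this sketch.

```r
set.seed(1)          # arbitrary seed, only for reproducibility
n_sims <- 100000     # number of simulated drawings (an assumption)

# A hypothetical ticket; any 6 distinct numbers between 1 and 42 would do.
ticket <- c(3, 8, 15, 21, 30, 42)

# For each simulated drawing, count how many ticket numbers were drawn.
n_matches <- replicate(n_sims, {
  draw <- sample(1:42, 6, replace = FALSE)   # sampling without replacement
  sum(draw %in% ticket)
})

# Estimated probabilities: third prize (4 matches), fourth prize (3 matches).
est_third  <- mean(n_matches == 4)
est_fourth <- mean(n_matches == 3)
c(third = est_third, fourth = est_fourth)

# Exact values, for comparison.
c(third  = choose(6, 4) * choose(36, 2),
  fourth = choose(6, 3) * choose(36, 3)) / choose(42, 6)
```

Because the third prize is rare, its estimate is noticeably noisier than the estimate for the fourth prize; increasing `n_sims` reduces the discrepancy.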

4.1.2 The California Superlotto

Let's analyze now the Superlotto game offered by the California lottery. In this variant of lotto, 6 numbers are picked; the first 5 are selected between 1 and 47, and the 6th number (called the Mega) is selected between 1 and 27. Note that because the Mega is drawn separately from the other 5 numbers, it might be equal to one of them. The first prize is awarded to the tickets that match all 6 numbers; other prizes are awarded depending on how many of the first 5 numbers are matched, and on whether the Mega is also matched.

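The number of different tickets in the California Superlotto is

$$\binom{47}{5} \times 27 = 1{,}533{,}939 \times 27 = 41{,}416{,}353,$$

where the first term corresponds to the number of ways in which 5 numbers can be selected out of 47, while the second term corresponds to the number of ways in which the Mega number can be selected. Hence, the probability of winning the first prize is

$$P(\text{first prize}) = \frac{1}{41{,}416{,}353} \approx 2.41 \times 10^{-8}.$$

The second prize in the California Superlotto is awarded to those tickets that match the first 5 winning numbers but do not get the Mega number right. Since there are 26 ways to miss the Mega, the multiplication rule gives

$$P(\text{second prize}) = \frac{\binom{5}{5} \times 26}{41{,}416{,}353} = \frac{26}{41{,}416{,}353} \approx 6.28 \times 10^{-7}.$$

The third prize is awarded to tickets that match the Mega number and 4 out of the first 5 numbers. Using reasoning similar to the previous examples, the 4 matches are chosen from the 5 winning numbers, the remaining number must come from the 42 non-winning numbers, and there is only one way to match the Mega, so

$$P(\text{third prize}) = \frac{\binom{5}{4} \times \binom{42}{1} \times 1}{41{,}416{,}353} = \frac{210}{41{,}416{,}353} \approx 5.07 \times 10^{-6}.$$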

The probability associated with other prizes can be computed in a similar way (e.g., see Exercise 10).
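The first three Superlotto probabilities can be checked with a few lines of R. This is a sketch under the rules stated above (5 numbers out of 47, Mega between 1 and 27); the variable names are illustrative.

```r
n_mega <- 27                        # possible values of the Mega number
total  <- choose(47, 5) * n_mega    # number of different tickets
total

# First prize: all 5 numbers and the Mega.
p_first <- 1 / total

# Second prize: all 5 numbers, any of the 26 wrong Mega values.
p_second <- choose(5, 5) * (n_mega - 1) / total

# Third prize: 4 of the 5 winning numbers, 1 of the 42 others, right Mega.
p_third <- choose(5, 4) * choose(42, 1) * 1 / total

c(first = p_first, second = p_second, third = p_third)
```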

4.2 Sharing Profits: De Méré's Second Problem

Combinatorial numbers can be used to answer another question originally posed to Blaise Pascal by the Chevalier De Méré. This question revolves around how to split the proceeds of the bets when a series of games cannot be completed. For example, assume that John and Monica are betting on the outcome of a series of seven games played between two teams (like the Major League Baseball World Series). Assume also that the two teams are evenly matched (therefore, before they start playing, all possible sequences of seven games are equally likely), that both John and Monica bet $10 on their respective teams, and that the first team to win four games wins its fan the whole pot ($20). After playing four games, Monica's team has won three of them and John's only one. If the series has to be canceled, how should they split the $20 pot?

One possible answer is to split the pot evenly, as if the bets had never been made. However, Monica would (rightfully) argue that, since her team had won more games, she should also get a larger share of the pot. The question is, how much larger should it be?

To answer this question, we first need to compute the probability that Monica's team wins its fourth game before John's wins two more. Since the space is equiprobable, the exact history of how we got to the current state does not really matter (i.e., it does not matter who won what during the first four games, as long as we have three wins and one loss for Monica's team). Thus, we could say that the history is,

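$$\mathrm{L} \quad \mathrm{W} \quad \mathrm{W} \quad \mathrm{W} \quad \underline{\hspace{1em}} \quad \underline{\hspace{1em}} \quad \underline{\hspace{1em}}$$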

where W means that Monica's team won, L means that it lost, and the underlined spaces correspond to the unknown outcomes associated with the last three games. Now, let's consider the future. Since we would typically stop playing once one of the teams has reached four wins, there are four possible ways in which the World Series could end up being played.

History              Winner
L W W W W            Monica
L W W W L W          Monica
L W W W L L W        Monica
L W W W L L L        John

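Now, it is tempting to argue that, since three out of those four futures lead to Monica winning the bet, the probability of her team winning is 3/4. However, this is not quite right because the four outcomes above are not equiprobable: the first one only requires a specific result in one game (probability 1/2), the second requires specific results in two games (probability 1/4), and each of the last two requires specific results in three games (probability 1/8). Indeed, it is the whole sequences of seven characters that are equiprobable! Accordingly, we need to consider all the possible sequences of seven characters that start with L W W W (there are 8 of them):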

History              Winner
L W W W W W W        Monica
L W W W W L W        Monica
L W W W W W L        Monica
L W W W W L L        Monica
L W W W L W W        Monica
L W W W L W L        Monica
L W W W L L W        Monica
L W W W L L L        John

Note that the first seven imply that Monica wins the bet (the first four correspond to Monica's team winning the fifth game in the series, the next two correspond to Monica's team losing the fifth but winning the sixth, and the second to last corresponds to Monica's team losing the fifth and sixth games but winning the seventh), while only the last one implies that Monica will lose the bet. Therefore, her probability of winning is $7/8$ and not $3/4$!
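The eight equally likely futures can also be listed mechanically. The sketch below uses base R's `expand.grid()` to build every combination of results for games five to seven, written from the point of view of Monica's team; the column names are chosen here for illustration.

```r
# All possible results of the last three games (W = Monica's team wins).
futures <- expand.grid(game5 = c("W", "L"),
                       game6 = c("W", "L"),
                       game7 = c("W", "L"),
                       stringsAsFactors = FALSE)

# Monica's team already has three wins, so one more W is enough.
monica_wins <- rowSums(futures == "W") >= 1
futures$winner <- ifelse(monica_wins, "Monica", "John")

futures
mean(monica_wins)   # proportion of equally likely futures won by Monica: 0.875
```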

As before, we can convince ourselves that this reasoning is correct using a simple simulation:

[R code listing c04uf009]
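One way such a simulation could be written, offered as a sketch rather than as the book's own listing, is to play the remaining games one at a time and stop as soon as either team reaches four wins. The function name `simulate_rest`, the seed and the number of repetitions are assumptions of this sketch.

```r
# Plays out the rest of the series from the current score and
# returns TRUE if Monica's team is the first to reach four wins.
simulate_rest <- function(monica = 3, john = 1) {
  while (monica < 4 && john < 4) {
    if (runif(1) < 0.5) {        # evenly matched teams
      monica <- monica + 1
    } else {
      john <- john + 1
    }
  }
  monica == 4
}

set.seed(1)                      # arbitrary seed, only for reproducibility
est <- mean(replicate(100000, simulate_rest()))
est                              # should be close to 7/8
```

Note that this version stops the series early, exactly as it would be played, and still lands near 7/8 rather than 3/4.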

Once we have computed the probability that Monica will win, we can go back to our definition of a fair game and compute Monica's share of the pot as Monica's expected profit:

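$$\$20 \times \frac{7}{8} + \$0 \times \frac{1}{8} = \$17.50,$$

while John's share should be

$$\$20 \times \frac{1}{8} + \$0 \times \frac{7}{8} = \$2.50$$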

(note that both add up to $20, as they should).

This result can be generalized. Suppose that in the current state of the game John needs to win $j$ games to win the bet, and Monica needs to win $m$ of them. In our example above $j = 3$ and $m = 1$. Then, we need to consider an additional $j + m - 1$ rounds of the game (in our example, we considered $3 + 1 - 1 = 3$). There are $2^{j+m-1}$ possible different outcomes for these $j + m - 1$ rounds ($2^{3} = 8$ in our example), of which

$$\binom{j+m-1}{m} + \binom{j+m-1}{m+1} + \cdots + \binom{j+m-1}{j+m-1}$$

have Monica as the winner (in our example, $3 + 3 + 1 = 7$). In the previous sum, the first term corresponds to the number of future sequences of $j + m - 1$ games in which Monica's team wins exactly $m$ games (the minimum she needs to get the full pot), the second corresponds to the number of sequences in which it wins exactly $m + 1$ games, and so on.

From the previous results, the probability that Monica will win the bet is simply

$$P(\text{Monica wins}) = \frac{1}{2^{j+m-1}} \left[ \binom{j+m-1}{m} + \binom{j+m-1}{m+1} + \cdots + \binom{j+m-1}{j+m-1} \right].$$

Incidentally, note that if $j = m$ (i.e., both teams are tied at the time the series is halted) then, because the sum covers exactly half of the $2^{2m-1}$ possible sequences,

$$P(\text{Monica wins}) = \frac{1}{2^{2m-1}} \left[ \binom{2m-1}{m} + \cdots + \binom{2m-1}{2m-1} \right] = \frac{2^{2m-2}}{2^{2m-1}} = \frac{1}{2},$$

which implies that the pot should be evenly split (as we would have expected, given that the teams are evenly matched).
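The general rule is easy to turn into a small R function. In the sketch below, `j` is the number of wins John's team still needs and `m` the number Monica's team still needs; the function name `p_monica` and the argument names are choices made for this sketch.

```r
# Probability that Monica wins the bet when John's team still needs
# j wins and Monica's team still needs m wins (evenly matched teams).
p_monica <- function(j, m) {
  rounds <- j + m - 1                      # games that settle the bet
  sum(choose(rounds, m:rounds)) / 2^rounds
}

p_monica(3, 1)   # the example in the text: 0.875
p_monica(2, 2)   # a tied series: 0.5

# Fair split of the $20 pot in the example.
pot <- 20
shares <- c(Monica = pot * p_monica(3, 1),
            John   = pot * (1 - p_monica(3, 1)))
shares
```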

4.3 Exercises

  1. You are a photographer seating a group of 10 people in a row for pictures. How many different seating arrangements could you use?

  2. Eight horses (Alabaster, Beauty, Candy, Doughty, Excellente, Friday, Great One, and High 'n Mighty) run a race. In how many ways can the first three finishers turn out?

  3. A statistics class has 30 students. The students need to select a team of 5 people to represent them. How many different such teams can be formed?

  4. In how many ways can I seat 5 people at a circular table?

  5. In casino keno, a player chooses 10 numbers out of 80. If she matches all the 10 numbers, she wins the first prize. What is the probability of winning the first prize in this game? How does it compare with the probability of winning the Colorado Lotto (use an odds ratio to compare the results, and interpret it)?

  6. The Florida Lotto is a lottery offered in the state of Florida; the first prize is won when you get 6 winning numbers from a list of 53 numbers. What is the probability of winning the first prize? The second prize is won if you get 5 of those 6 winning numbers. What is the probability of winning the second prize?

  7. What is the probability of winning the first prize of the Florida Lotto if you buy 100 tickets? (Assume each ticket has a different set of numbers.)

  8. For the New York State lottery, the first prize is won if you get 6 winning numbers from a list of 59 numbers. Calculate the probability of winning this lottery. The third prize is obtained by getting 4 of the 6 winning numbers; calculate the probability for this prize as well.

  9. What is the probability of winning this last lottery if you buy 100 tickets? What is your expected profit for this situation if each ticket costs $1?

  10. What is the probability of winning the fourth prize in the California Superlotto? The fourth prize goes to tickets that get 4 out of the 5 first numbers correct, but miss the Mega.

  11. Imagine the California SuperLotto lottery is changed so that there's a second Mega number (chosen from the same list of 27 numbers as the first Mega number) and the first prize is obtained if the player gets the 5 winning numbers, the first Mega number and the second Mega number. What is the probability of getting the first prize in the new lottery? Is this prize harder or easier to win than the actual California SuperLotto?

  12. In the California Superlotto, what is the probability of getting any 3 of the 5 winning numbers and the Mega?

  13. In the SuperLotto, you can get a prize if you get 4 out of the 5 winning numbers and the Mega number, but you also get a prize if you just get 4 out of the 5 winning numbers and miss the Mega number. Which of the two prizes has the higher probability of being won?

  14. Which is more likely: getting 3 out of the 5 winning numbers and the Mega number, or getting 4 out of the 5 winning numbers?

  15. [R] Can you list out all the combinations of 3 numbers from the list of numbers from 1 to 6? Check your list using R.

  16. [R] Modify the R code for simulating the Colorado Lotto in order to instead estimate the probability of the third and fourth prizes of the California Superlotto.