In Chapter 1 we learn that probabilities are useful for thinking about events (things that occur or characteristics that exist) relative to observations (opportunities for things to occur or characteristics to exist). When thinking about probabilities, we can use literary, graphic, or mathematical language. In literary language, a probability is the frequency of events relative to the number of observations. In graphic language, we use Venn diagrams: a circle usually represents the occurrences of the event, and a rectangle represents all of the observations. The probability of the event is then reflected by the area of the circle relative to the area of the rectangle. Mathematically, probabilities are proportions, because the number of events is part of the number of observations.
Regardless of which language we use to think about probabilities, we notice that probabilities have certain properties. One of these is that a probability must have a value within the range of zero to one. A probability of zero tells us that the event never occurs. A probability of one tells us that the event always occurs. Probabilities between zero and one tell us that the event sometimes occurs.
In addition to thinking about single events, we can use probabilities to think about collections of events. The first collection of events considered in Chapter 1 is an event and its complement. The complement of an event includes everything that could happen in an observation except the event. An event and its complement always have two characteristics. First, they are collectively exhaustive; events are collectively exhaustive when at least one of them must occur in every observation. Second, they are mutually exclusive; events are mutually exclusive when at most one of them can occur in a particular observation.
We can have collections of events other than just a particular event and its complement. Other collections of events might be collectively exhaustive and/or mutually exclusive. An event and its complement, however, are always collectively exhaustive and mutually exclusive.
With other collections of events, we may be interested in two types of relationships among the events: their intersection and their union. In an intersection of events, we are interested in those observations in which all of the events occur. In a union of events, we are interested in those observations in which at least one of the events occurs.
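The ideas of complement, intersection, and union map directly onto set operations. The sketch below is only an illustration: the ten observations and the two events (`ate_a`, `ate_b`) are hypothetical, and a probability is computed as the frequency of the event relative to the number of observations.

```python
# Hypothetical observations: ten people, identified by number.
observations = set(range(1, 11))

# Hypothetical events: the people who ate item A and the people who ate item B.
ate_a = {1, 2, 3, 4}
ate_b = {3, 4, 5, 6, 7}


def probability(event: set, obs: set) -> float:
    """Frequency of the event relative to the number of observations."""
    return len(event & obs) / len(obs)


# The complement of A: every observation except those in which A occurs.
not_a = observations - ate_a

# An event and its complement are mutually exclusive (no overlap) ...
assert ate_a & not_a == set()
# ... and collectively exhaustive (together they cover every observation).
assert ate_a | not_a == observations

intersection = ate_a & ate_b  # observations in which both events occur
union = ate_a | ate_b         # observations in which at least one event occurs

print(probability(not_a, observations))         # 0.6
print(probability(intersection, observations))  # 0.2
print(probability(union, observations))         # 0.7
```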
When we are interested in the intersection of events, we use the multiplication rule to calculate the probability of the intersection. There are two versions of the multiplication rule. The simplified version multiplies the probabilities of the individual events together. It is appropriate only if the events are statistically independent of each other, that is, if the probability of each event is the same regardless of whether the other event(s) occur. The full version of the multiplication rule uses conditional probabilities.
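A minimal sketch of the two versions of the multiplication rule follows. The function names and the probabilities are hypothetical; the point is that the simplified version gives the right answer only when P(B | A) equals P(B).

```python
def intersection_simplified(p_a: float, p_b: float) -> float:
    """Simplified multiplication rule; valid only for independent events."""
    return p_a * p_b


def intersection_full(p_a: float, p_b_given_a: float) -> float:
    """Full multiplication rule: P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a


# Hypothetical probabilities.
p_a, p_b = 0.5, 0.4

# If A and B are independent, P(B | A) equals P(B), so both versions agree.
print(intersection_simplified(p_a, p_b))  # 0.2
print(intersection_full(p_a, p_b))        # 0.2

# If B is more likely when A occurs, say P(B | A) = 0.6 (hypothetical),
# only the full version gives the correct probability of the intersection.
print(intersection_full(p_a, 0.6))        # 0.3
```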
Many of the probabilities we encounter in health research and practice are conditional probabilities. What distinguishes conditional probabilities from other probabilities is that they address a subset of the observations rather than all of the observations. That subset is specified by the conditioning event(s). The event(s) whose probability is being addressed are specified by the conditional event(s). If events are statistically independent, then the probability of the conditional event is the same regardless of whether the conditioning event occurs.
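To make the "subset of the observations" idea concrete, the sketch below computes a conditional probability from counts. All counts are hypothetical. The denominator is restricted to observations in which the conditioning event B occurred, and comparing the result with the unconditional probability of A is the independence check described above.

```python
def conditional_probability(n_both: int, n_conditioning: int) -> float:
    """P(conditional event | conditioning event).

    The denominator is only the observations in which the conditioning
    event occurred; the numerator counts those in which both occurred.
    """
    return n_both / n_conditioning


# Hypothetical counts from 200 observations.
n_total = 200
n_b = 50        # conditioning event B occurred
n_a = 60        # conditional event A occurred
n_a_and_b = 20  # both A and B occurred

p_a = n_a / n_total                                    # unconditional: 0.3
p_a_given_b = conditional_probability(n_a_and_b, n_b)  # conditional: 0.4

# Different values mean A and B are not statistically independent.
print(p_a, p_a_given_b, p_a == p_a_given_b)
```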
When we are interested in the union of events, we use the addition rule to calculate the probability of the union. There are two versions of the addition rule. In the simplified version, the probabilities of the events in the union are added together. This simplified version can be used only when the events are mutually exclusive. If the events are not mutually exclusive, the probabilities of their intersections must be taken into account; for two events, the probability of their intersection is subtracted so that observations in which both events occur are not counted twice.
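A minimal sketch of the addition rule for two events is shown below, with hypothetical probabilities. Passing an intersection probability of zero corresponds to the simplified version for mutually exclusive events.

```python
def union_probability(p_a: float, p_b: float, p_a_and_b: float = 0.0) -> float:
    """Addition rule for two events.

    With p_a_and_b = 0 (mutually exclusive events) this reduces to the
    simplified version; otherwise the intersection is subtracted so that
    observations in which both events occur are not counted twice.
    """
    return p_a + p_b - p_a_and_b


# Hypothetical probabilities.
print(union_probability(0.2, 0.3))       # mutually exclusive events
print(union_probability(0.2, 0.3, 0.1))  # events that can occur together
```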
The conditional and conditioning events in conditional probabilities have very different functions. A conditional probability addresses the probability that the conditional event(s) will occur under the assumption that the conditioning event(s) have occurred. Often, however, we are interested in the probability that the conditioning event will occur assuming the conditional event has occurred. One example of this situation is the relationship among the probabilities used in interpreting diagnostic tests. Tests are characterized by their sensitivities and specificities. The conditioning events in sensitivity and specificity are whether or not a person has the disease. To interpret the result of a diagnostic test, however, we want to know the probability that a person has the disease. In other words, we want to change whether someone has the disease from being the conditioning event to being the conditional event. We exchange conditional and conditioning events by using Bayes' theorem.
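The sketch below applies Bayes' theorem to a diagnostic test, turning sensitivity P(test+ | disease) and specificity P(test− | no disease) into P(disease | test+), often called the positive predictive value. The sensitivity, specificity, and prevalence values are hypothetical, and P(test+) in the denominator is expanded over people with and without the disease.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """P(disease | positive test) via Bayes' theorem."""
    # P(test+ | disease) * P(disease)
    true_positive = sensitivity * prevalence
    # P(test+ | no disease) * P(no disease) = (1 - specificity) * (1 - prevalence)
    false_positive = (1 - specificity) * (1 - prevalence)
    # P(test+) is the sum of the two ways of testing positive.
    return true_positive / (true_positive + false_positive)


# Hypothetical test: sensitivity 0.90, specificity 0.95, prevalence 1%.
ppv = positive_predictive_value(0.90, 0.95, 0.01)
print(round(ppv, 3))  # 0.154
```

Even with a sensitive and specific test, a positive result in this hypothetical low-prevalence population corresponds to only about a 15% probability of disease, which is why the direction of conditioning matters.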
$P(\bar{A}) = 1 - P(A)$ | the probability of the complement of event A as it relates to the probability of event A. (see Equation 1.2) |
$P(A \cap B) = P(A) \times P(B \mid A) = P(B) \times P(A \mid B)$ | the probability of the intersection of events A and B. Either of the events can be represented by an unconditional probability; the other event is then represented by a conditional probability. This is the multiplication rule. (see Equation 1.4) |
$P(A) = P(A \mid B) = P(A \mid \bar{B})$ | three probabilities that are equal to the same value if event A is statistically independent of event B. (see Equation 1.8) |
$P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$ | the probability of event B given that event A occurs. (see Equation 1.9) |
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ | the probability of event A given that event B occurs. |
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$ | the probability of the union of events A and B. This is the addition rule. (see Equation 1.13) |
$P(B \mid A) = \dfrac{P(A \mid B) \times P(B)}{P(A)}$ | the relationship between the probability of event B occurring given that event A occurs and the probability of event A occurring given that event B occurs. This is Bayes' theorem. (see Equation 1.18) |
Suppose 25 persons who ate a buffet lunch at a particular restaurant developed salmonella infections (a type of food poisoning). As epidemiologists, we are interested in finding out what food from the buffet was associated with becoming ill. To investigate this, we ask the 25 persons who became ill (the cases), and another 100 persons who ate at the buffet, but did not become ill (the controls), what they ate. Imagine we observe the following results for the items offered that are most likely to be the source of the infection:
Table 1.1 Frequencies of eating different foods for cases and for controls.
FOOD | CASES | CONTROLS |
Potato salad | 10 | 40 |
Chicken salad | 5 | 20 |
Egg salad | 2 | 8 |
Seafood salad | 5 | 20 |
Cole slaw | 4 | 16 |
Deviled eggs | 8 | 32 |
Turkey | 12 | 18 |
Dressing | 12 | 24 |
Chicken | 10 | 40 |
Eating a particular food is considered an event.
For events to be mutually exclusive, the probability of one event occurring given that another event has occurred must be equal to zero. In this context, mutual exclusion would mean, for example, that a person could not eat both turkey and dressing. This is illustrated in Figure 1.1.
Figure 1.1 Venn diagram illustrating mutual exclusion between eating turkey and eating dressing.
This is unlikely to be true. In fact, the frequencies show that at least some of the people ate more than one item. Among cases, the total number of items eaten is 68, but there are only 25 cases. Among controls, the total number of items eaten is 218, but there are only 100 controls. The only explanation for these frequencies is that some people ate more than one item.
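The totals quoted above can be checked directly from Table 1.1. This sketch simply sums the frequencies in each column and compares them with the number of people in each group; the variable names are illustrative.

```python
# Frequencies from Table 1.1: (cases, controls) who ate each food.
TABLE_1_1 = {
    "Potato salad": (10, 40),
    "Chicken salad": (5, 20),
    "Egg salad": (2, 8),
    "Seafood salad": (5, 20),
    "Cole slaw": (4, 16),
    "Deviled eggs": (8, 32),
    "Turkey": (12, 18),
    "Dressing": (12, 24),
    "Chicken": (10, 40),
}
N_CASES, N_CONTROLS = 25, 100

items_eaten_cases = sum(cases for cases, _ in TABLE_1_1.values())
items_eaten_controls = sum(controls for _, controls in TABLE_1_1.values())

print(items_eaten_cases, N_CASES)        # 68 25
print(items_eaten_controls, N_CONTROLS)  # 218 100

# More items eaten than people means at least one person ate two or more of
# the listed foods, so the events cannot all be mutually exclusive.
print(items_eaten_cases > N_CASES and items_eaten_controls > N_CONTROLS)  # True
```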
For events to be collectively exhaustive, all of the possible events must be included. We are told that the listed items were “most likely to be the source of the infection.” This implies that there were other items on the buffet that were unlikely to cause a salmonella infection. If there were other items on the buffet, then the listed foods are not collectively exhaustive.
To determine which foods might be the source of the infection, we want to look at the associations between eating each type of food and becoming ill. An association between events is the same as the events not being statistically independent.
Statistical independence is defined by the relationship between conditional probabilities. If two events are statistically independent, then the probability of one event is the same regardless of whether another event occurs. Here, statistical independence means the probability of eating any particular item is the same for cases as it is for controls.
Next, we will convert the frequencies in Table 1.1 into probabilities of eating each of the foods among cases and among controls.
The numerator will be the number of cases eating a particular food or the number of controls eating a particular food.
The denominator will be the number of cases or the number of controls.
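The conversion just described can be sketched as follows: each frequency in Table 1.1 is divided by the number of cases (25) or the number of controls (100). The function name `table_1_2` is only an illustrative label.

```python
# Frequencies from Table 1.1: (cases, controls) who ate each food.
TABLE_1_1 = {
    "Potato salad": (10, 40),
    "Chicken salad": (5, 20),
    "Egg salad": (2, 8),
    "Seafood salad": (5, 20),
    "Cole slaw": (4, 16),
    "Deviled eggs": (8, 32),
    "Turkey": (12, 18),
    "Dressing": (12, 24),
    "Chicken": (10, 40),
}
N_CASES, N_CONTROLS = 25, 100


def table_1_2(table: dict, n_cases: int, n_controls: int) -> dict:
    """Convert frequencies to probabilities of eating each food by group."""
    return {
        food: (n_case / n_cases, n_ctrl / n_controls)
        for food, (n_case, n_ctrl) in table.items()
    }


probs = table_1_2(TABLE_1_1, N_CASES, N_CONTROLS)
for food, (p_case, p_ctrl) in probs.items():
    print(f"{food:<14}{p_case:>6.2f}{p_ctrl:>10.2f}")
```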
Table 1.2 Probabilities of cases and controls eating particular foods.
FOOD | CASES | CONTROLS |
Potato salad | 10/25 = 0.40 | 40/100 = 0.40 |
Chicken salad | 5/25 = 0.20 | 20/100 = 0.20 |
Egg salad | 2/25 = 0.08 | 8/100 = 0.08 |
Seafood salad | 5/25 = 0.20 | 20/100 = 0.20 |
Cole slaw | 4/25 = 0.16 | 16/100 = 0.16 |
Deviled eggs | 8/25 = 0.32 | 32/100 = 0.32 |
Turkey | 12/25 = 0.48 | 18/100 = 0.18 |
Dressing | 12/25 = 0.48 | 24/100 = 0.24 |
Chicken | 10/25 = 0.40 | 40/100 = 0.40 |
The probabilities in Table 1.2 are conditional probabilities, with being a case or a control as the conditioning event and eating a particular food as the conditional event.
If there is an association between eating a particular food and becoming ill, then the conditional probabilities of eating that food will differ between cases and controls. Equal conditional probabilities are a sign of statistical independence, which is the same as no association.
The conditional probabilities of eating turkey and of eating dressing differ between cases and controls. Cases had a higher probability of eating either of those foods than controls did (0.48 vs. 0.18 for turkey and 0.48 vs. 0.24 for dressing).
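The comparison above can be automated for every food in Table 1.1. This sketch flags foods whose conditional probabilities differ between cases and controls. It compares the proportions directly, as the text does (with `math.isclose` only to absorb floating-point rounding), and makes no allowance for chance variation between samples.

```python
import math

# Frequencies from Table 1.1: (cases, controls) who ate each food.
TABLE_1_1 = {
    "Potato salad": (10, 40),
    "Chicken salad": (5, 20),
    "Egg salad": (2, 8),
    "Seafood salad": (5, 20),
    "Cole slaw": (4, 16),
    "Deviled eggs": (8, 32),
    "Turkey": (12, 18),
    "Dressing": (12, 24),
    "Chicken": (10, 40),
}
N_CASES, N_CONTROLS = 25, 100


def associated_foods(table: dict, n_cases: int, n_controls: int) -> list:
    """Return foods whose conditional probabilities differ by group."""
    flagged = []
    for food, (n_case, n_ctrl) in table.items():
        p_case = n_case / n_cases     # P(ate food | case)
        p_ctrl = n_ctrl / n_controls  # P(ate food | control)
        # Equal conditional probabilities indicate statistical independence.
        if not math.isclose(p_case, p_ctrl):
            flagged.append(food)
    return flagged


print(associated_foods(TABLE_1_1, N_CASES, N_CONTROLS))  # ['Turkey', 'Dressing']
```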
When the probability of one event is equal to zero given that another event occurs, we call the events mutually exclusive.
Suppose we are interested in the probability that someone ate either turkey or chicken.
When we are interested in the probability that at least one of two or more events occurs, we are interested in the union of those events.
We find the union of two or more events by using the addition rule. When the events are mutually exclusive, we can use the simplified version of the addition rule.
Therefore, 88% of the cases ate either turkey or chicken.
Again, we are interested in the union of eating turkey and eating chicken, this time among controls. Since no one ate both, we can use the simplified version of the addition rule.
Therefore, 58% of the controls ate either turkey or chicken.
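The two union calculations can be reproduced from the frequencies in Table 1.1. This sketch relies on the text's statement that no one ate both turkey and chicken; that mutual exclusivity is what justifies simply adding the two probabilities.

```python
N_CASES, N_CONTROLS = 25, 100

# Conditional probabilities from Table 1.2 (frequencies from Table 1.1).
p_turkey = {"cases": 12 / N_CASES, "controls": 18 / N_CONTROLS}
p_chicken = {"cases": 10 / N_CASES, "controls": 40 / N_CONTROLS}

# Simplified addition rule: valid here only because no one ate both
# turkey and chicken, so the events are mutually exclusive.
p_union = {group: p_turkey[group] + p_chicken[group] for group in p_turkey}

for group, p in p_union.items():
    print(f"{group}: {p:.2f}")
```

This prints 0.88 for cases and 0.58 for controls.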
Now, suppose we are interested in the probability that a person ate both potato salad and dressing.
When we are interested in both (all) events occurring in the same observation, we are interested in the intersection of the events.
Here, we are asked to calculate the probability of the intersection of eating potato salad and eating dressing for cases. We are told the probability of eating potato salad given that a case ate dressing is equal to 0.2. We know the events are not statistically independent, since the conditional probability of eating potato salad is equal to 0.2, but the unconditional probability of eating potato salad among cases is 0.4 (from Table 1.2). Thus, we need to use the full version of the multiplication rule to calculate the probability of the intersection.
In this example, we are asked to calculate the probability of the intersection of eating potato salad and dressing among controls. The only difference between this example and the previous example is that we are considering controls. The conditional probability is the same for cases and controls, but the unconditional probability of eating dressing is different.
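Both intersection calculations can be sketched with the full multiplication rule: the probability of eating dressing in each group (Table 1.2) times the given conditional probability of eating potato salad among those who ate dressing (0.2 for both groups, as stated in the text).

```python
# Unconditional probability of eating dressing in each group (Table 1.2).
p_dressing = {"cases": 12 / 25, "controls": 24 / 100}

# Given in the text: P(potato salad | dressing) = 0.2 for cases and controls.
p_potato_given_dressing = 0.2

# Full multiplication rule:
# P(potato salad and dressing) = P(dressing) * P(potato salad | dressing)
p_both = {group: p * p_potato_given_dressing for group, p in p_dressing.items()}

for group, p in p_both.items():
    print(f"{group}: {p:.3f}")
```

This prints 0.096 for cases and 0.048 for controls; the difference comes entirely from the different probabilities of eating dressing.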
1.1. In a particular high school, 40 of the 200 graduating seniors report they have had unprotected sexual intercourse and 100 of the 200 graduating seniors report they have tried smoking marijuana at least once during high school. Further, 30 of the 100 graduating seniors who report they have tried smoking marijuana have also had unprotected sexual intercourse. Based on that information, which of the following is the best description of the relationship between having unprotected sexual intercourse and smoking marijuana?
1.2. Suppose 25% of the children in a certain elementary school developed nausea and vomiting following a holiday party. None of the children who drank the apple cider at that party became ill. Based on that information, which of the following is the best description of the relationship between becoming ill and drinking apple cider at the party?
1.3. In a particular population, 30% of the people smoke cigarettes and 10% of the people have chronic obstructive pulmonary disease (COPD). If having COPD is independent of smoking, what percent of the population would you expect to find who smoke and also have COPD?
1.4. In a particular population, 5% of the infants have a low birthweight (<2,500 g) and 60% of the mothers have at least 16 years of education. If having a mother with at least 16 years of education and having a low birthweight are statistically independent, which of the following is closest to the percentage of infants who have a mother with at least 16 years of education and who have a low birthweight?
1.5. Suppose in a population of 10,000 persons, 5,000 smoke cigarettes and 7,500 have a high fat diet. Further, suppose that among the 5,000 persons who smoke cigarettes there are 3,000 who are also among the 7,500 who have a high fat diet. Based on that information, which of the following is closest to the percentage of persons in the population who smoke cigarettes and/or have a high fat diet?
1.6. In a particular population, 60% of infants receive only their mother's breast milk, 20% receive only commercial infant formula, and 20% receive both. What is the chance that a particular infant selected randomly would receive breast milk and/or formula?
1.7. Suppose we are interested in the efficacy of a new treatment for anemia. To investigate this treatment, we randomly assign 50 persons with anemia to receive the new treatment and 50 persons with anemia to receive the standard treatment. Among those 100 persons, suppose 60% are cured. If there is no association between treatment and the chance of being cured, what percentage of the persons who received the new treatment would we expect to be cured?
1.8. In a certain industry, 20% of the workers develop liver disease and 40% develop respiratory disease. If there is no association between developing liver disease and developing respiratory disease, what percentage of the workers who develop liver disease would we expect to develop respiratory disease?