Appendix B

On Predictive Diversity

THE KEY TO THE DIVERSITY PREDICTION THEOREM IS TO DETERMINE (i) the average error of individual predictive models and (ii) the collective error of the models taken together, and to see how (i) compares with (ii).[1] First, we need a measure of (i), average individual error. Taking a cue from Francis Galton’s famous weight-judging competition (and somewhat simplifying the numbers), suppose we are trying to guess (A) the weight of an ox that actually weighs 1,000 pounds[2] (let us simply use thousands of pounds, so call this “1K”), and then (B) the weight of a bull that actually weighs 2,000 pounds (2K). Suppose an individual uses his or her predictive model for both competitions, and assume it predicts the same weight for the ox and for the bull, 1,500 pounds (1.5K). We cannot measure this model’s error by simply adding up or averaging its errors on the two contests: the sum would be $.5 + (-.5) = 0$, yet the model is manifestly error prone. We avoid this by first squaring the errors, $(.5)^2 + (-.5)^2 = .25 + .25 = .5$, which gives us this model’s error. Figure B-1 gives the individual errors for three predictive models, α, β, and γ, applied to our two weight-judging contests.


Figure B-1. An example of predictive diversity
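The cancellation problem with the single 1.5K model is easy to see numerically. The short Python sketch below (variable names are illustrative, not from the source) uses weights in thousands of pounds and compares simply adding the signed errors with adding their squares.

```python
# Weights in thousands of pounds: the ox weighs 1K, the bull 2K.
truths = [1.0, 2.0]
# The hypothetical model guesses 1.5K for both animals.
predictions = [1.5, 1.5]

# Signed errors: prediction minus true weight.
signed_errors = [p - t for p, t in zip(predictions, truths)]

# Adding signed errors lets the overestimate and underestimate cancel out.
naive_total = sum(signed_errors)

# Squaring first makes every miss count against the model.
squared_error = sum(e ** 2 for e in signed_errors)

print(signed_errors)  # [0.5, -0.5]
print(naive_total)    # 0.0
print(squared_error)  # 0.5
```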

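The average individual error here is (1.25 + 2 + 1.13)/3, or 1.46. More generally, for n models,

$$\text{Average Individual Error} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \theta)^2$$

where $x_i$ is model $i$’s prediction and $\theta$ is the true value; in our two-contest example, each squared term is summed over the ox and the bull.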
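As a check on these figures, here is a minimal Python sketch of each model’s error and of the average individual error. The predictions for α, β, and γ (ox: .5K, 2K, 1.75K; bull: 1K, 3K, 2.75K) are the ones used in the squared-distance calculations in this appendix; the names `TRUTH`, `MODELS`, and `model_error` are hypothetical. As in the text, a model’s error is its squared error summed over the two contests.

```python
# True weights in thousands of pounds.
TRUTH = {"ox": 1.0, "bull": 2.0}

# Predictions of the three models, in thousands of pounds.
MODELS = {
    "alpha": {"ox": 0.5, "bull": 1.0},
    "beta": {"ox": 2.0, "bull": 3.0},
    "gamma": {"ox": 1.75, "bull": 2.75},
}


def model_error(prediction, truth):
    """Squared error of one model, summed over both contests."""
    return sum((prediction[k] - truth[k]) ** 2 for k in truth)


errors = {name: model_error(pred, TRUTH) for name, pred in MODELS.items()}
average_individual_error = sum(errors.values()) / len(errors)

print(errors)                              # {'alpha': 1.25, 'beta': 2.0, 'gamma': 1.125}
print(round(average_individual_error, 2))  # 1.46
```

Unrounded, γ’s error is 1.125, which the text rounds to 1.13.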

The collective prediction (cp) is the average of the individual predictions: the models collectively predict that the ox weighs 1.42K and the bull 2.25K. We can now compute (ii), their collective error: $(1.42-1)^2 + (2.25-2)^2 = (.42)^2 + (.25)^2 = .18 + .06 = .24$. The collective error (.24) is less than the average individual error (1.46).
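The collective prediction and collective error can be sketched the same way. This Python sketch assumes the same three models as in the squared-distance calculations for α, β, and γ in this appendix; `TRUTH`, `MODELS`, and `collective` are illustrative names, not the author’s.

```python
# True weights and model predictions, in thousands of pounds.
TRUTH = {"ox": 1.0, "bull": 2.0}
MODELS = {
    "alpha": {"ox": 0.5, "bull": 1.0},
    "beta": {"ox": 2.0, "bull": 3.0},
    "gamma": {"ox": 1.75, "bull": 2.75},
}

# Collective prediction: the average of the individual predictions, per contest.
collective = {
    k: sum(pred[k] for pred in MODELS.values()) / len(MODELS) for k in TRUTH
}

# Collective error: squared error of the collective prediction, summed over contests.
collective_error = sum((collective[k] - TRUTH[k]) ** 2 for k in TRUTH)

print({k: round(v, 2) for k, v in collective.items()})  # {'ox': 1.42, 'bull': 2.25}
print(round(collective_error, 2))                        # 0.24
```

Computed without intermediate rounding, the collective error is about .236, which rounds to the .24 in the text.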

The reason for this is the predictive diversity of the group, which is defined as

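$$\text{Predictive Diversity} = \frac{1}{n}\sum_{i=1}^{n}(x_i - x_{cp})^2$$

where $x_i$ is the individual’s prediction, $x_{cp}$ is the collective prediction, and n is the number of models. To calculate the predictive diversity for our example: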

α’s squared distance from cp: $(.5-1.42)^2 + (1-2.25)^2 = 2.4$

β’s squared distance from cp: $(2-1.42)^2 + (3-2.25)^2 = .9$

γ’s squared distance from cp: $(1.75-1.42)^2 + (2.75-2.25)^2 = .36$

This yields a predictive diversity (the average of the squared distances) of (2.4 + .9 + .36)/3, or 1.22.
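The predictive diversity calculation can also be sketched in Python. The model predictions are the ones in the squared-distance calculations above; the names `MODELS`, `distances`, and `predictive_diversity` are hypothetical. Note that diversity does not depend on the true weights at all, only on how far the models spread around their own average.

```python
# Model predictions, in thousands of pounds.
MODELS = {
    "alpha": {"ox": 0.5, "bull": 1.0},
    "beta": {"ox": 2.0, "bull": 3.0},
    "gamma": {"ox": 1.75, "bull": 2.75},
}
contests = ("ox", "bull")

# Collective prediction for each contest: the mean of the model predictions.
collective = {k: sum(p[k] for p in MODELS.values()) / len(MODELS) for k in contests}

# Each model's squared distance from the collective prediction, summed over contests.
distances = {
    name: sum((pred[k] - collective[k]) ** 2 for k in contests)
    for name, pred in MODELS.items()
}
predictive_diversity = sum(distances.values()) / len(distances)

print({name: round(d, 2) for name, d in distances.items()})
# {'alpha': 2.4, 'beta': 0.9, 'gamma': 0.36}
print(round(predictive_diversity, 2))  # 1.22
```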

We thus arrive at Page’s Diversity Prediction Theorem:

Collective Error = Average Individual Error − Predictive Diversity.
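The theorem follows from a short algebraic identity. Here is a sketch for a single contest, writing $\theta$ for the true value, $x_i$ for the prediction of model $i$ (of n models), and $x_{cp}$ for the collective prediction, the mean of the $x_i$. Adding and subtracting $x_{cp}$ inside each squared individual error gives

$$
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n}(x_i-\theta)^2
&= \frac{1}{n}\sum_{i=1}^{n}\bigl[(x_i-x_{cp})+(x_{cp}-\theta)\bigr]^2\\
&= \frac{1}{n}\sum_{i=1}^{n}(x_i-x_{cp})^2
 + \frac{2(x_{cp}-\theta)}{n}\sum_{i=1}^{n}(x_i-x_{cp})
 + (x_{cp}-\theta)^2\\
&= \frac{1}{n}\sum_{i=1}^{n}(x_i-x_{cp})^2 + (x_{cp}-\theta)^2.
\end{aligned}
$$

The middle term vanishes because the deviations of the predictions from their mean sum to zero. Rearranging gives $(x_{cp}-\theta)^2 = \frac{1}{n}\sum_{i}(x_i-\theta)^2 - \frac{1}{n}\sum_{i}(x_i-x_{cp})^2$, that is, collective error equals average individual error minus predictive diversity. With several contests, as in the ox-and-bull example, the identity holds contest by contest, so it also holds for the sums.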

In our example, the collective error of .24 equals the average individual error of 1.46 minus the predictive diversity of 1.22. Although in our example the collective prediction is better than any individual prediction, that does not hold in general: an excellent individual predictor can beat the collective prediction. But the collective error can never exceed the average individual error, and it is strictly smaller whenever there is any predictive diversity. This is an important result: even if our predictive models are not very good, a diverse perspective or society can draw on diverse predictive models (understood as predictive diversity as defined above), and so significantly enhance its confidence in its estimates of the justice of alternative social worlds.
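Both claims, that the collective never does worse than the average individual and that a single excellent predictor can still beat the collective, can be checked numerically. The Python sketch below (the function name `decompose` is hypothetical) computes the three quantities for one contest, confirms on random predictions that collective error equals average individual error minus diversity, and then gives a two-model case in which one individual beats the collective.

```python
import random


def decompose(predictions, truth):
    """Return (collective error, average individual error, predictive diversity)."""
    n = len(predictions)
    cp = sum(predictions) / n  # collective prediction
    collective_error = (cp - truth) ** 2
    average_error = sum((x - truth) ** 2 for x in predictions) / n
    diversity = sum((x - cp) ** 2 for x in predictions) / n
    return collective_error, average_error, diversity


# Random trials: the theorem holds up to floating-point rounding.
rng = random.Random(0)
for _ in range(1000):
    truth = rng.uniform(0, 5)
    predictions = [rng.uniform(0, 5) for _ in range(rng.randint(1, 10))]
    ce, ae, div = decompose(predictions, truth)
    assert abs(ce - (ae - div)) < 1e-9
    assert ce <= ae + 1e-9

# A perfect individual can beat the collective, which still beats the average.
predictions = [1.0, 3.0]  # the first model hits the true value exactly
print(decompose(predictions, truth=1.0))  # (1.0, 2.0, 1.0)
```

In the last case the first model’s error is 0, the collective error is 1, and the average individual error is 2; with no diversity at all (every model predicting the same value), collective and average individual error would be equal.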

Information markets are an especially interesting mechanism of collective intelligence. Perhaps the best known is the Iowa Electronic Markets, which is essentially a market for future political events. As I write (August 23, 2014), there is a market for predicting the 2014 congressional elections, with basically four possibilities in the overall congressional prediction market: (1) the Democrats control both the House and the Senate; (2) the Democrats control the House and the Republicans the Senate; (3) the Republicans control the House and the Democrats the Senate; (4) the Republicans control both. For each “share” one buys, one gets $1 if correct and nothing if not. At this moment the price for option (4) is roughly 64¢; if one buys an option at 64¢ and the Republicans win control of both houses in the November 4, 2014 election (as they did), one receives $1. Option (3) is selling at 25¢, and option (1) at 3/10 of a cent. An important feature of such markets is that predictors’ bets carry information about how accurate they take their predictions to be: by putting their money where their predictions are, predictors who have high confidence in their predictions gain disproportionate influence. If high confidence is correlated with possessing better models, information markets not only draw on predictive diversity but also give extra influence to the best models. Moreover, the market does not only have a current price; traders can also issue bids and asks, and it turns out that the traders who do this, rather than simply take the current price, are those with the most information, and they have the greatest impact on the market.[3]
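The payoff arithmetic of such a winner-take-all share can be sketched in Python. The function names are hypothetical, and the expected-profit calculation assumes, for illustration only, a risk-neutral trader who assigns a subjective probability to the outcome; the text does not model traders this way.

```python
def profit_per_share(price, outcome_occurred):
    """Profit on one share that pays $1 if its outcome occurs and nothing otherwise."""
    payout = 1.0 if outcome_occurred else 0.0
    return payout - price


def expected_profit(price, belief):
    """Expected profit per share for a risk-neutral trader who assigns
    probability `belief` to the outcome (an illustrative assumption)."""
    return belief * 1.0 - price


# Option (4) in the text sold at roughly 64 cents.
print(round(profit_per_share(0.64, True), 2))   # 0.36
print(round(profit_per_share(0.64, False), 2))  # -0.64
# A trader who puts the chance at 80% expects to gain; one at 50% expects to lose.
print(round(expected_profit(0.64, 0.8), 2))     # 0.16
print(round(expected_profit(0.64, 0.5), 2))     # -0.14
```

On this simplified picture, a trader whose confidence exceeds the going price has a reason to buy, and the more confident the trader, the larger the expected gain per share; that is one way to see why confident traders tend to move the price.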

1 I am following the presentations of Page, The Difference, pp. 205–12, and Wagner, Zhao, Schneider, and Chen, “The Wisdom of Reluctant Crowds.”

2 “A little more than a year ago, I happened to be at Plymouth, and was interested in a Cattle exhibition, where a visitor could purchase a stamped and numbered ticket for sixpence, which qualified him to become a candidate in a weight-judging competition. An ox was selected, and each of about eight hundred candidates wrote his name and address on his ticket, together with his estimate of what the beast would weigh when killed and ‘dressed’ by the butcher. The most successful of them gained prizes. The result of these estimates was analogous, under reservation, to the votes given by a democracy, and it seemed likely to be instructive to learn how votes were distributed on this occasion, and the value of the result. So I procured a loan of the cards after the ceremony was past, and worked them out. … It appeared that in this instance the vox populi was correct to within 1 per cent, of the real value; it was 1207 pounds instead of 1198 pounds, and the individual estimates were distributed in such a way that it was an equal chance whether one of them selected at random fell within or without the limits of −3.7 per cent, or +2.4 per cent, of the middlemost value of the whole.” Francis Galton, Memories of My Life, pp. 280–81.

3 Sunstein, Infotopia, chap. 4.