In a numerical dataset, you might expect each of the digits 1 through 9 to be equally likely to appear as the leading digit of a number. In fact, in many real-world datasets, low leading digits are much more common than high ones.
The phenomenon was first noted by American astronomer Simon Newcomb in the 1880s. It was rediscovered and popularized in 1938 by the physicist and engineer Frank Benford.
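He found that the leading digits of numbers in a real-life dataset are distributed approximately as follows: 1, 30.1 percent; 2, 17.6 percent; 3, 12.5 percent; 4, 9.7 percent; 5, 7.9 percent; 6, 6.7 percent; 7, 5.8 percent; 8, 5.1 percent; 9, 4.6 percent.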
So the leading digit is a “1” about 30 percent of the time, but a “9” less than 5 percent of the time.
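These percentages follow from the standard statement of Benford's law: the probability that the leading digit is d equals log10(1 + 1/d). The short Python sketch below computes those expected shares and compares them with the leading digits of the first thousand powers of two. The powers of two are an illustrative choice made here, not one of Benford's datasets, and the function names are hypothetical.

```python
import math
from collections import Counter


def benford_probability(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1 to 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


def leading_digit_shares(values: list[int]) -> dict[int, float]:
    """Observed share of each leading digit among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}


# Illustrative dataset (an assumption, not Benford's own data): 2**1 .. 2**1000.
powers_of_two = [2**n for n in range(1, 1001)]
observed = leading_digit_shares(powers_of_two)

# Print the expected and observed share for each leading digit.
for d in range(1, 10):
    print(f"{d}: expected {benford_probability(d):.1%}, observed {observed[d]:.1%}")
```

Any list of positive integers can be passed to leading_digit_shares in place of the powers of two. Datasets that span several orders of magnitude tend to track the expected shares most closely.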
Benford tested the law on many datasets, including constants of nature, population statistics, life expectancies, and the sizes of rivers.
The Pareto principle: also known as the eighty–twenty rule, this states that 80 percent of monetary wealth is typically owned by 20 percent of the population.
Zipf’s law: the frequency of a word in a text is inversely proportional to its rank. So the nth most common word will occur with a probability proportional to 1/n.
Price’s law: half of the publications in a given field are written by a group of authors whose size is the square root of the total number of authors in that field. So if there are a hundred authors, half of all the literature will be written by ten of them.
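The arithmetic behind Zipf's law and Price's law can be sketched in a few lines of Python. This is a minimal illustration, not a fitted model: the function names are hypothetical, and normalizing the 1/n weights over a fixed, finite vocabulary is an assumption made here so that the probabilities sum to one.

```python
import math


def zipf_probabilities(vocabulary_size: int) -> list[float]:
    """Probability of the word at each rank, using weights of 1/rank.

    Assumption: the vocabulary is finite, so the weights can be
    normalized to sum to 1.
    """
    weights = [1 / rank for rank in range(1, vocabulary_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def price_core_authors(total_authors: int) -> int:
    """Size of the group that Price's law credits with half of all publications."""
    return round(math.sqrt(total_authors))


probabilities = zipf_probabilities(1000)
# Under Zipf's law the top-ranked word is twice as likely as the second.
print(probabilities[0] / probabilities[1])
# Number of authors credited with half the output in a field of 100 authors.
print(price_core_authors(100))
```

Rounding the square root is a design choice for fields whose author count is not a perfect square. The law itself is only a rough rule of thumb.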