In statistical modeling, the use of the Greek letter epsilon explicitly recognizes that uncertainty is intrinsic to our world. The statistical paradigm has two components: data or measurements, drawn from the world we observe, and the underlying processes generating those data. Epsilon appears in mathematical descriptions of those underlying processes and represents the inherent randomness with which the data we observe are generated. By collecting and modeling data, we hope to make better guesses at the mathematical form of those processes; the better we understand the data-generating mechanism, the better we can model and predict the world around us.
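To make this concrete, the most common way epsilon enters such a description is as an additive noise term. A minimal sketch of that signal-plus-noise form (one illustrative assumption among many possible models) is

$$
Y = f(X) + \varepsilon,
$$

where $f$ stands for the underlying process we are trying to learn and $\varepsilon$ is a random term, typically assumed to have mean zero, that captures the variation no description of $f$ can account for.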
That use of epsilon acknowledges that data-driven research can never perfectly predict the future, no matter how much computing power or data-collection capacity we bring to bear. It codifies the idea that uncertainty exists in the world itself. We may come to understand the structure of this uncertainty better over time, but the statistical paradigm asserts that it is fundamental.
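One way to see why this uncertainty is fundamental rather than a fixable shortcoming: under the additive form sketched above, and assuming $\varepsilon$ has mean zero and is independent of $X$, the expected squared prediction error of any fixed estimate $\hat{f}$ decomposes as

$$
\mathbb{E}\!\left[\big(Y - \hat{f}(X)\big)^2 \,\middle|\, X\right] = \big(f(X) - \hat{f}(X)\big)^2 + \operatorname{Var}(\varepsilon),
$$

so even a perfect estimate, $\hat{f} = f$, leaves an irreducible error floor of $\operatorname{Var}(\varepsilon)$.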
So we can never expect perfect predictions, even if we manage to take perfect measurements. This inherent uncertainty means that doubt isn’t a negative or a weakness but a mature recognition that our knowledge is imperfect. The statistical paradigm is increasingly being used as we continue to collect and analyze vast amounts of data, and as the output of algorithms and models becomes an ever larger source of information. We’re seeing the effects across society: evidence-based policy, evidence-based medicine, more sophisticated pricing and market-prediction models, social media customized to our online browsing patterns… The intelligent use of information derived from statistical models relies on understanding uncertainty, as do policymaking and our cultural understanding of this source of information. The 21st century is surely the century of data, and we need to understand how to use them correctly. The stakes are high.