Chapter 1. Introduction to Data Analysis

Data analysis is the process of organizing, cleaning, transforming, and modeling data to obtain useful information and, ultimately, new knowledge. The terms data analytics, business analytics, data mining, artificial intelligence, machine learning, knowledge discovery, and big data are also used to describe similar processes. The distinctions among these fields probably lie more in their areas of application than in their fundamental nature. Some argue that these are all part of the new discipline of data science.

The central process of gaining useful information from organized data is carried out by applying computer science algorithms. Consequently, these algorithms will be a central focus of this book.

Data analysis is both an old field and a new one. Its origins lie among the mathematical fields of numerical methods and statistical analysis, which reach back into the eighteenth century. But many of the methods that we shall study gained prominence much more recently, with the ubiquitous force of the internet and the consequent availability of massive datasets.

In this first chapter, we look at a few famous historical examples of data analysis. These can help us appreciate the importance of the science and its promise for the future.

Data is as old as civilization itself, maybe even older. The 17,000-year-old paintings in the Lascaux caves in France could well have been attempts by those primitive dwellers to record their greatest hunting triumphs. Those records provide us with data about humanity in the Paleolithic era. That data was not analyzed, in the modern sense, to obtain new knowledge. But its existence does attest to the need humans have to preserve their ideas in data.

Five thousand years ago, the Sumerians of ancient Mesopotamia recorded far more important data on clay tablets. That cuneiform writing included substantial accounting data about daily business transactions. To apply that data, the Sumerians invented not only text writing, but also the first number system.

In 1086, King William the Conqueror ordered a massive collection of data to determine the extent of the lands and properties of the crown and of his subjects. The resulting record was called the Domesday Book, because it was a final tallying of people's (material) lives. That data was analyzed to determine ownership and tax obligations for centuries to follow.