Books on big data tend to fall into one of two categories: either they offer no explanation as to how things actually work or they are highly mathematical textbooks suitable only for graduate students. The aim of this book is to offer an alternative by providing an introduction to how big data works, how it is changing the world around us, and the effect it has on our everyday lives and on the business world.
Data used to mean documents and papers, with maybe a few photos, but it now means much more than that. Social networking sites generate large amounts of data in the form of images, videos, and movies on a minute-by-minute basis. Online shopping creates data as we enter our address and credit card details. We are now at a point where the collection and storage of data are growing at a rate unimaginable only a few decades ago but, as we will see in this book, new data analysis techniques are transforming this data into useful information. While writing this book, I found that big data cannot be meaningfully discussed without frequent reference to its collection, storage, analysis, and use by the big commercial players. Since research departments in companies such as Google and Amazon have been responsible for many of the major developments in big data, these companies are mentioned often throughout.
The first chapter introduces the reader to the diversity of data in general before explaining how the digital age has led to changes in the way we define data. Big data is introduced informally through the idea of the data explosion, which involves computer science, statistics, and the interface between them. In Chapters 2 to 4, I have used diagrams quite extensively to help explain some of the new methods required by big data. The second chapter explores what makes big data special and, in doing so, leads us to a more specific definition. In Chapter 3, we discuss the problems related to storing and managing big data. Most people are familiar with the need to back up the data on their personal computer. But how do we do this with the colossal amounts of data that are now being generated? To answer this question, we will look at database storage and the idea of distributing tasks across clusters of computers. Chapter 4 argues that big data is only useful if we can extract useful information from it. A flavour of how data is turned into information is given using simplified explanations of several well-established techniques.
We then move on to a more detailed discussion of big data applications, starting in Chapter 5 with the role of big data in medicine. Chapter 6 analyses business practices with case studies on Amazon and Netflix, each highlighting different features of marketing using big data. Chapter 7 looks at some of the security issues surrounding big data and the importance of encryption. Data theft has become a big problem and we look at some of the cases that have been in the news including Snowden and WikiLeaks. The chapter concludes by showing how cybercrime is an issue that big data needs to address. In the final chapter, Chapter 8, we consider how big data is changing the society we live in, through the development of sophisticated robots and their role in the workplace. A consideration of the smart homes and smart cities of the future concludes the book.
In a very short introduction it is not possible to mention everything, so I hope the reader will pursue their interests through the recommendations in the Further reading section.