FOREWORD

Executive Letter from Rob Thomas

There’s an old story about two men working on a railroad track many years back. As they are laying track in the heat of the day, a person drives by in a car and rolls down the window (not enough to let the air conditioning out, but enough to be heard). He yells, “Tom, is that you?” Tom, one of the men working on the track, replies, “Chris, it’s great to see you! It must have been 20 years … How are you?” They continue the conversation and Chris eventually drives off. When he leaves, another worker turns to Tom and says, “I know that was the owner of the railroad and he’s worth nearly a billion dollars. How do you know him?” Tom replies, “Chris and I started working on the railroad, laying track, on the same day 20 years ago. The only difference between Chris and me is that I came to work for $1.25/hour and he came to work for the railroad.”

*****

Perspective. Aspiration. Ambition. These are the attributes that separate those who come to work for a paycheck from those who come to work to change the world. The coming of the Big Data Era is a chance for everyone in the technology world to decide into which camp they fall, as this era will bring the biggest opportunity for companies and individuals in technology since the dawn of the Internet.

Let’s step back for a moment and look at how the technology world has changed since the turn of the century:

• 80 percent of the world’s information is unstructured.

• Unstructured information is growing at 15 times the rate of structured information.

• Raw computational power is growing at such an enormous rate that today’s off-the-shelf commodity box is starting to display the power that a supercomputer showed half a decade ago.

• Access to information has been democratized: it is (or should be) available for all.

This is the new normal. These aspects alone will demand a change in our approach to solving information-based problems. Does this mean that our investments in the past decade are lost or irrelevant? Of course not! We will still need relational data stores and warehouses, and that footprint will continue to expand. However, we will need to augment those traditional approaches with technology that will allow enterprises to benefit from the Big Data Era.

The Big Data Era will be led by the individuals and companies that deliver a platform suitable for the new normal—a platform consisting of exploration and development toolsets, visualization techniques, search and discovery, native text analytics, machine learning, and enterprise stability and security, among other aspects. Many will talk about this, few will deliver.

I’m participating here because I know we can change the technology world, and that’s much more satisfying than $1.25/hour. Welcome to the Big Data Era.

Rob Thomas
IBM Vice President, Business Development

Executive Letter from Anjul Bhambhri

It was in the 1970s that the first prototype of a relational database system, System R, was created in the Almaden Research labs in San Jose. System R sowed the seeds for the most common way of dealing with data structured in relational form, called SQL; you’ll recognize it as a key contribution to the development of products such as DB2, Oracle, SQL/DS, ALLBASE, and NonStop SQL, among others. In combination with the explosion of computing power across mainframes, midframes, and personal desktops, databases have become a ubiquitous way of collecting and storing data. In fact, their proliferation led to the creation of a discipline around “warehousing” the data, such that it was easier to manage and correlate data from multiple databases in a uniform fashion. It’s also led to the creation of vertical slices of these warehouses into data marts for faster decisions that are tightly associated with specific line-of-business needs. These developments, over a short period of ten years in the 1990s, made the IT department a key competitive differentiator for every business venture. Thousands of applications were born, some horizontal across industries, some specific to domains such as purchasing, shipping, transportation, and more. Acronyms such as ERP (Enterprise Resource Planning), SCM (Supply Chain Management), and others became commonplace.

By the late 1990s, inevitably, different portions of an organization used different data management systems to store and search their critical data, leading to federated database engines (under the IBM codename Garlic). Then, in 2001, came the era of XML. The DB2 pureXML technology offers sophisticated capabilities to store, process, and manage XML data in its native hierarchical format. Although XML allowed a flexible schema and ease of portability as key advantages, the widespread use of e-mail, accumulation of back office content, and other technologies led to the demand for content management systems, and the era of analyzing unstructured and semistructured data in enterprises was born. Today, the advent of the Internet, coupled with complete democratization of content creation and distribution in multiple formats, has led to the explosion of all types of data. Data is now not only big, both in terms of volume and variety, but it has a velocity component to it as well. The ability for us to glean the nuggets of information embedded in such a cacophony of data, at precisely the time of need, makes it very exciting. We are sitting at the cusp of another evolution, popularly called Big Data.

At IBM, our mission is to help our clients achieve their business objectives through technological innovations, and we’ve been doing it for a century as of 2011. During the last five decades, IBM has invented technologies and delivered multiple platforms to meet the evolving data management challenges of our customers. IBM invented the relational database more than 30 years ago, which has become an industry standard with multiple products provided by IBM alone (for example, DB2, Informix, solidDB, and others). Relational databases have further specialized into multidimensional data warehouses, with highly parallel data servers, a breadth of dedicated appliances (such as Netezza or the Smart Analytics System), as well as analysis and reporting tools (such as SPSS or Cognos).

Across industries and sectors (consumer goods, financial services, government, insurance, telecommunications, and more), companies are assessing how to manage the volume, variety, and velocity of their untapped information in an effort to find ways to make better decisions about their business. This explosion of data comes from a variety of data sources such as sensors, smart devices, social media, billions of Internet and smartphone users, and more. This is data that arrives in massive quantities in its earliest and most primitive form.

Organizations seeking to find a better way, which differentiates them from their competitors, want to tap into the wealth of information hidden in this explosion of data around them to improve their competitiveness, efficiency, insight, profitability, and more. These organizations recognize the value delivered by analyzing all their data (structured, semistructured, and unstructured) coming from a myriad of internal and external sources. This is the realm of “Big Data.” While many companies appreciate that the best Big Data solutions work across functional groups touching many positions, few corporations have figured out how to proceed. The challenge for the enterprise is to have a data platform that leverages these large volumes of data to derive timely insight, while preserving their existing investments in Information Management. In reality, the best Big Data solutions will also help organizations to know their customer better than ever before.

To address these business needs, this book explores key case studies of how people and companies have approached this modern problem. The book diligently describes the challenges of harnessing Big Data and provides examples of Big Data solutions that deliver tangible business benefits.

I would like to thank Paul, George, Tom, and Dirk for writing this book. They are an outstanding group whose dedication to our clients is unmatched. Behind them is the Big Data development team, which continually overcomes the challenges of our decade. I get to work with an outstanding group of people who are passionate about our customers’ success, dedicated to their work, and continually innovating. It is a privilege to work with them.

Thank you, and enjoy the book.

Anjul Bhambhri
IBM Vice President, Big Data Development