Understanding Big Data by Paul Zikopoulos, IBM

Index

Cover Page
Understanding Big Data
Copyright Page
Contents
Foreword
Acknowledgments
About this Book

Part I Big Data: From the Business Perspective

1 What Is Big Data? Hint: You’re a Part of It Every Day

Characteristics of Big Data

Can There Be Enough? The Volume of Data
Variety Is the Spice of Life
How Fast Is Fast? The Velocity of Data

Data in the Warehouse and Data in Hadoop (It’s Not a Versus Thing)
Wrapping It Up

2 Why Is Big Data Important?

When to Consider a Big Data Solution
Big Data Use Cases: Patterns for Big Data Deployment

IT for IT Log Analytics
The Fraud Detection Pattern
They Said What? The Social Media Pattern
The Call Center Mantra: “This Call May Be Recorded for Quality Assurance Purposes”
Risk: Patterns for Modeling and Management
Big Data and the Energy Sector

3 Why IBM for Big Data?

Big Data Has No Big Brother: It’s Ready, but Still Young
What Can Your Big Data Partner Do for You?

The IBM $100 Million Big Data Investment

A History of Big Data Innovation

Domain Expertise Matters

Part II Big Data: From the Technology Perspective

4 All About Hadoop: The Big Data Lingo Chapter

Just the Facts: The History of Hadoop
Components of Hadoop

The Hadoop Distributed File System
The Basics of MapReduce
Hadoop Common Components

Application Development in Hadoop

Pig and PigLatin
Hive
Jaql

Getting Your Data into Hadoop

Basic Copy Data
Flume

Other Hadoop Components

ZooKeeper
HBase
Oozie
Lucene
Avro

Wrapping It Up

5 InfoSphere BigInsights: Analytics for Big Data at Rest

Ease of Use: A Simple Installation Process

Hadoop Components Included in BigInsights 1.2

A Hadoop-Ready Enterprise-Quality File System: GPFS-SNC

Extending GPFS for Hadoop: GPFS Shared Nothing Cluster
What Does a GPFS-SNC Cluster Look Like?
GPFS-SNC Failover Scenarios
GPFS-SNC POSIX-Compliance
GPFS-SNC Performance
GPFS-SNC Hadoop Gives Enterprise Qualities

Compression

Splittable Compression
Compression and Decompression

Administrative Tooling
Security
Enterprise Integration

Netezza
DB2 for Linux, UNIX, and Windows
JDBC Module
InfoSphere Streams
InfoSphere DataStage
R Statistical Analysis Applications

Improved Workload Scheduling: Intelligent Scheduler
Adaptive MapReduce
Data Discovery and Visualization: BigSheets
Advanced Text Analytics Toolkit
Machine Learning Analytics
Large-Scale Indexing
BigInsights Summed Up

6 IBM InfoSphere Streams: Analytics for Big Data in Motion

InfoSphere Streams Basics

Industry Use Cases for InfoSphere Streams

How InfoSphere Streams Works

What’s a Stream?
The Streams Processing Language
Source and Sink Adapters
Operators
Streams Toolkits

Enterprise Class

High Availability
Consumability: Making the Platform Easy to Use
Integration is the Apex of Enterprise Class Analysis