Index
MapReduce Design Patterns
Dedication
Preface
Intended Audience
Pattern Format
The Examples in This Book
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgements
1. Design Patterns and MapReduce
Design Patterns
MapReduce History
MapReduce and Hadoop Refresher
Hadoop Example: Word Count
Pig and Hive
2. Summarization Patterns
Numerical Summarizations
Pattern Description
Intent
Motivation
Applicability
Structure
Consequences
Known uses
Resemblances
Performance analysis
Numerical Summarization Examples
Minimum, maximum, and count example
MinMaxCountTuple code
Mapper code
Reducer code
Combiner optimization
Data flow diagram
Average example
Mapper code
Reducer code
Combiner optimization
Data flow diagram
Median and standard deviation
Mapper code
Reducer code
Combiner optimization
Memory-conscious median and standard deviation
Mapper code
Reducer code
Combiner optimization
Data flow diagram
Inverted Index Summarizations
Pattern Description
Intent
Motivation
Applicability
Structure
Consequences
Performance analysis
Inverted Index Example
Wikipedia reference inverted index
Mapper code
Reducer code
Combiner optimization
Counting with Counters
Pattern Description
Intent
Motivation
Applicability
Structure
Consequences
Known uses
Performance analysis
Counting with Counters Example
Number of users per state
Mapper code
Driver code
3. Filtering Patterns
Filtering
Pattern Description
Intent
Motivation
Applicability
Structure
Consequences
Known uses
Resemblances
Performance analysis
Filtering Examples
Distributed grep
Mapper code
Simple Random Sampling
Mapper code
Bloom Filtering
Pattern Description
Intent
Motivation
Applicability
Structure
Consequences
Known uses
Resemblances
Performance analysis
Bloom Filtering Examples
Hot list
Bloom filter training
Mapper code
HBase Query using a Bloom filter
Mapper code
Top Ten
Pattern Description
Intent
Motivation
Applicability
Structure
Consequences
Known uses
Resemblances
Performance analysis
Top Ten Examples
Top ten users by reputation
Mapper code
Reducer code
Distinct
Pattern Description
Intent
Motivation
Applicability
Structure
Consequences
Known uses
Resemblances
Performance analysis
Distinct Examples
Distinct user IDs
Mapper code
Reducer code
Combiner optimization
4. Data Organization Patterns
Structured to Hierarchical
Pattern Description
Intent
Motivation
Applicability
Structure
Consequences
Known uses
Resemblances
Performance analysis
Structured to Hierarchical Examples
Post/comment building on StackOverflow
Driver code
Mapper code
Reducer code
Question/answer building on StackOverflow
Mapper code
Reducer code
Partitioning
Pattern Description
Intent
Motivation
Applicability
Structure
Consequences
Known uses
Resemblances
Performance analysis
Partitioning Examples
Partitioning users by last access date
Driver code
Mapper code
Partitioner code
Reducer code
Binning
Pattern Description
Intent
Motivation
Structure
Consequences
Resemblances
Performance analysis
Binning Examples
Binning by Hadoop-related tags
Driver code
Mapper code
Total Order Sorting
Pattern Description
Intent
Motivation
Applicability
Structure
Consequences
Resemblances
Performance analysis
Total Order Sorting Examples
Sort users by last visit
Driver code
Analyze mapper code
Order mapper code
Order reducer code
Shuffling
Pattern Description
Intent
Motivation
Structure
Consequences
Resemblances
Performance analysis
Shuffle Examples
Anonymizing StackOverflow comments
Mapper code
Reducer code
5. Join Patterns
A Refresher on Joins
Reduce Side Join
Pattern Description
Intent
Motivation
Applicability
Structure
Consequences
Resemblances
Performance analysis
Reduce Side Join Example
User and comment join
Driver code
User mapper code
Comment mapper code
Reducer code
Combiner optimization
Reduce Side Join with Bloom Filter
Reputable user and comment join
User mapper code
Comment mapper code
Replicated Join
Pattern Description
Intent
Motivation
Applicability
Structure
Consequences
Resemblances
Performance analysis
Replicated Join Examples
Replicated user comment example
Mapper code
Composite Join
Pattern Description
Intent
Motivation
Applicability
Structure
Consequences
Performance analysis
Composite Join Examples
Composite user comment join
Driver code
Mapper code
Reducer and combiner
Cartesian Product
Pattern Description
Intent
Motivation
Applicability
Structure
Consequences
Resemblances
Performance analysis
Cartesian Product Examples
Comment Comparison
Input format code
Driver code
Record reader code
Mapper code
6. Metapatterns
Job Chaining
With the Driver
Job Chaining Examples
Basic job chaining
Job one mapper
Job one reducer
Job two mapper
Driver code
Parallel job chaining
Mapper code
Reducer code
Driver code
With Shell Scripting
Bash example
Bash script
Sample run
With JobControl
Job control example
Main method
Helper methods
Chain Folding
The ChainMapper and ChainReducer Approach
Chain Folding Example
Bin users by reputation
Parsing mapper code
Replicated join mapper code
Reducer code
Binning mapper code
Driver code
Job Merging
Job Merging Examples
Anonymous comments and distinct users
TaggedText WritableComparable
Merged mapper code
Merged reducer code
Driver code
7. Input and Output Patterns
Customizing Input and Output in Hadoop
InputFormat
RecordReader
OutputFormat
RecordWriter
Generating Data
Pattern Description
Intent
Motivation
Structure
Consequences
Resemblances
Performance analysis
Generating Data Examples
Generating random StackOverflow comments
Driver code
InputSplit code
InputFormat code
RecordReader code
External Source Output
Pattern Description
Intent
Motivation
Structure
Consequences
Performance analysis
External Source Output Example
Writing to Redis instances
OutputFormat code
RecordReader code
Mapper code
Driver code
External Source Input
Pattern Description
Intent
Motivation
Structure
Consequences
Performance analysis
External Source Input Example
Reading from Redis Instances
InputSplit code
InputFormat code
RecordReader code
Driver code
Partition Pruning
Pattern Description
Intent
Motivation
Structure
Consequences
Resemblances
Performance analysis
Partition Pruning Examples
Partitioning by last access date to Redis instances
Custom WritableComparable code
OutputFormat code
RecordWriter code
Mapper code
Driver code
Querying for user reputation by last access date
InputSplit code
InputFormat code
RecordReader code
Driver code
8. Final Thoughts and the Future of Design Patterns
Trends in the Nature of Data
Images, Audio, and Video
Streaming Data
The Effects of YARN
Patterns as a Library or Component
How You Can Help
A. Bloom Filters
Overview
Use Cases
Representing a Data Set
Reduce Queries to External Database
Google BigTable
Downsides
Tweaking Your Bloom Filter
Index
About the Authors
Colophon
Copyright