Index
Pig Design Patterns
Table of Contents
Pig Design Patterns
Credits
Foreword
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
Motivation for this book
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Third-party libraries Datasets
Errata Piracy Questions
1. Setting the Context for Design Patterns in Pig
Understanding design patterns
The scope of design patterns in Pig
Hadoop demystified – a quick reckoner
The enterprise context
Common challenges of distributed systems
The advent of Hadoop
Hadoop under the covers
Understanding the Hadoop Distributed File System
HDFS design goals
Working of HDFS
Understanding MapReduce
Understanding how MapReduce works
The MapReduce internals
Pig – a quick intro
Understanding the rationale of Pig
Understanding the relevance of Pig in the enterprise
Working of Pig – an overview
Firing up Pig
The use case
Code listing
The dataset
Understanding Pig through the code
Pig's extensibility
Operators used in code
The EXPLAIN operator
Understanding Pig's data model
Primitive types
Complex types
The relevance of schemas
Summary
2. Data Ingest and Egress Patterns
The context of data ingest and egress
Types of data in the enterprise
Ingest and egress patterns for multistructured data
Considerations for log ingestion
The Apache log ingestion pattern
Background Motivation Use cases Pattern implementation Code snippets
Code for the CommonLogLoader class
Code for the CombinedLogLoader class
Results Additional information
The Custom log ingestion pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
The image ingress and egress pattern
Background Motivation Use cases Pattern implementation
The image ingress implementation
The image egress implementation
Code snippets
The image ingress
Pig script
Image to a sequence UDF snippet
The image egress
Pig script
Sequence to an image UDF
Results Additional information
The ingress and egress patterns for the NoSQL data
MongoDB ingress and egress patterns
Background Motivation Use cases Pattern implementation
The ingress implementation The egress implementation
Code snippets
The ingress code The egress code
Results Additional information
The HBase ingress and egress pattern
Background Motivation Use cases Pattern implementation
The ingress implementation The egress implementation
Code snippets
The ingress code The egress code
Results Additional information
The ingress and egress patterns for structured data
The Hive ingress and egress patterns
Background Motivation Use cases Pattern implementation
The ingress implementation The egress implementation
Code snippets
The ingress code
Importing data using RCFile
Importing data using HCatalog
The egress code
Results Additional information
The ingress and egress patterns for semi-structured data
The mainframe ingestion pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
XML ingest and egress patterns
Background Motivation
Motivation for ingesting raw XML
Motivation for ingesting binary XML
Motivation for egression of XML
Use cases Pattern implementation
The implementation of the XML raw ingestion
The implementation of the XML binary ingestion
Code snippets
The XML raw ingestion code
The XML binary ingestion code
The XML egress code
Pig script
The XML storage
Results Additional information
JSON ingress and egress patterns
Background
Motivation Use cases Pattern implementation
The ingress implementation The egress implementation
Code snippets
The ingress code
The code for simple JSON
The code for nested JSON
The egress code
Results Additional information
Summary
3. Data Profiling Patterns
Data profiling for Big Data
Big Data profiling dimensions
Sampling considerations for profiling Big Data
Sampling support in Pig
Rationale for using Pig in data profiling
The data type inference pattern
Background Motivation Use cases Pattern implementation Code snippets
Pig script Java UDF
Results Additional information
The basic statistical profiling pattern
Background Motivation Use cases Pattern implementation Code snippets
Pig script Macro
Results Additional information
The pattern-matching pattern
Background Motivation Use cases Pattern implementation Code snippets
Pig script Macro
Results Additional information
The string profiling pattern
Background Motivation Use cases Pattern implementation Code snippets
Pig script Macro
Results Additional information
The unstructured text profiling pattern
Background Motivation Use cases Pattern implementation Code snippets
Pig script
Java UDF for stemming
Java UDF for generating TF-IDF
Results Additional information
Summary
4. Data Validation and Cleansing Patterns
Data validation and cleansing for Big Data
Choosing Pig for validation and cleansing
The constraint validation and cleansing design pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
The regex validation and cleansing design pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
The corrupt data validation and cleansing design pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
The unstructured text data validation and cleansing design pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
Summary
5. Data Transformation Patterns
Data transformation processes
The structured-to-hierarchical transformation pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
The data normalization pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
The data integration pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
The aggregation pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
The data generalization pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
Summary
6. Understanding Data Reduction Patterns
Data reduction – a quick introduction
Data reduction considerations for Big Data
Dimensionality reduction – the Principal Component Analysis design pattern
Background Motivation Use cases Pattern implementation
Limitations of PCA implementation
Code snippets Results Additional information
Numerosity reduction – the histogram design pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
Numerosity reduction – sampling design pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
Numerosity reduction – clustering design pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
Summary
7. Advanced Patterns and Future Work
The clustering pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
The topic discovery pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
The natural language processing pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
The classification pattern
Background Motivation Use cases Pattern implementation Code snippets Results Additional information
Future trends
Emergence of data-driven patterns
The emergence of solution-driven patterns
Patterns addressing programmability constraints
Summary
Index