Imperial Library
Index
Programming Hive Preface
Conventions Used in This Book Using Code Examples Safari® Books Online How to Contact Us What Brought Us to Hive?
Edward Capriolo Dean Wampler Jason Rutherglen
Acknowledgments
1. Introduction
An Overview of Hadoop and MapReduce
MapReduce
Hive in the Hadoop Ecosystem
Pig HBase Cascading, Crunch, and Others
Java Versus Hive: The Word Count Algorithm What’s Next
2. Getting Started
Installing a Preconfigured Virtual Machine Detailed Installation
Installing Java
Linux-specific Java steps Mac OS X-specific Java steps
Installing Hadoop Local Mode, Pseudodistributed Mode, and Distributed Mode Testing Hadoop Installing Hive
What Is Inside Hive? Starting Hive Configuring Your Hadoop Environment
Local Mode Configuration Distributed and Pseudodistributed Mode Configuration Metastore Using JDBC
The Hive Command
Command Options
The Command-Line Interface
CLI Options Variables and Properties Hive “One Shot” Commands Executing Hive Queries from Files The .hiverc File More on Using the Hive CLI
Autocomplete
Command History Shell Execution Hadoop dfs Commands from Inside Hive Comments in Hive Scripts Query Column Headers
3. Data Types and File Formats
Primitive Data Types Collection Data Types Text File Encoding of Data Values Schema on Read
4. HiveQL: Data Definition
Databases in Hive Alter Database Creating Tables
Managed Tables External Tables
Partitioned, Managed Tables
External Partitioned Tables Customizing Table Storage Formats
Dropping Tables Alter Table
Renaming a Table Adding, Modifying, and Dropping a Table Partition Changing Columns Adding Columns Deleting or Replacing Columns Alter Table Properties Alter Storage Properties Miscellaneous Alter Table Statements
5. HiveQL: Data Manipulation
Loading Data into Managed Tables Inserting Data into Tables from Queries
Dynamic Partition Inserts
Creating Tables and Loading Them in One Query Exporting Data
6. HiveQL: Queries
SELECT … FROM Clauses
Specify Columns with Regular Expressions Computing with Column Values Arithmetic Operators Using Functions
Mathematical functions Aggregate functions Table generating functions Other built-in functions
LIMIT Clause Column Aliases Nested SELECT Statements CASE … WHEN … THEN Statements When Hive Can Avoid MapReduce
WHERE Clauses
Predicate Operators Gotchas with Floating-Point Comparisons LIKE and RLIKE
GROUP BY Clauses
HAVING Clauses
JOIN Statements
Inner JOIN Join Optimizations LEFT OUTER JOIN OUTER JOIN Gotcha RIGHT OUTER JOIN FULL OUTER JOIN LEFT SEMI-JOIN Cartesian Product JOINs Map-side Joins
ORDER BY and SORT BY DISTRIBUTE BY with SORT BY CLUSTER BY Casting
Casting BINARY Values
Queries that Sample Data
Block Sampling Input Pruning for Bucket Tables
UNION ALL
7. HiveQL: Views
Views to Reduce Query Complexity Views that Restrict Data Based on Conditions Views and Map Type for Dynamic Tables View Odds and Ends
8. HiveQL: Indexes
Creating an Index
Bitmap Indexes
Rebuilding the Index Showing an Index Dropping an Index Implementing a Custom Index Handler
9. Schema Design
Table-by-Day Over Partitioning Unique Keys and Normalization Making Multiple Passes over the Same Data The Case for Partitioning Every Table Bucketing Table Data Storage Adding Columns to a Table Using Columnar Tables
Repeated Data Many Columns
(Almost) Always Use Compression!
10. Tuning
Using EXPLAIN EXPLAIN EXTENDED Limit Tuning Optimized Joins Local Mode Parallel Execution Strict Mode Tuning the Number of Mappers and Reducers JVM Reuse Indexes Dynamic Partition Tuning Speculative Execution Single MapReduce MultiGROUP BY Virtual Columns
11. Other File Formats and Compression
Determining Installed Codecs Choosing a Compression Codec Enabling Intermediate Compression Final Output Compression Sequence Files Compression in Action Archive Partition Compression: Wrapping Up
12. Developing
Changing Log4J Properties Connecting a Java Debugger to Hive Building Hive from Source
Running Hive Test Cases Execution Hooks
Setting Up Hive and Eclipse Hive in a Maven Project Unit Testing in Hive with hive_test The New Plugin Developer Kit
13. Functions
Discovering and Describing Functions Calling Functions Standard Functions Aggregate Functions Table Generating Functions A UDF for Finding a Zodiac Sign from a Day UDF Versus GenericUDF Permanent Functions User-Defined Aggregate Functions
Creating a COLLECT UDAF to Emulate GROUP_CONCAT
User-Defined Table Generating Functions
UDTFs that Produce Multiple Rows UDTFs that Produce a Single Row with Multiple Columns UDTFs that Simulate Complex Types
Accessing the Distributed Cache from a UDF Annotations for Use with Functions
Deterministic Stateful DistinctLike
Macros
14. Streaming
Identity Transformation Changing Types Projecting Transformation Manipulative Transformations Using the Distributed Cache Producing Multiple Rows from a Single Row Calculating Aggregates with Streaming CLUSTER BY, DISTRIBUTE BY, SORT BY GenericMR Tools for Streaming to Java Calculating Cogroups
15. Customizing Hive File and Record Formats
File Versus Record Formats Demystifying CREATE TABLE Statements File Formats
SequenceFile RCFile Example of a Custom Input Format: DualInputFormat
Record Formats: SerDes CSV and TSV SerDes ObjectInspector Think Big Hive Reflection ObjectInspector XML UDF XPath-Related Functions JSON SerDe Avro Hive SerDe
Defining Avro Schema Using Table Properties Defining a Schema from a URI Evolving Schema
Binary Output
16. Hive Thrift Service
Starting the Thrift Server Setting Up Groovy to Connect to HiveService Connecting to HiveServer Getting Cluster Status Result Set Schema Fetching Results Retrieving Query Plan Metastore Methods
Example Table Checker
Finding tables not marked as external
Administrating HiveServer
Productionizing HiveService Cleanup
Hive ThriftMetastore
ThriftMetastore Configuration Client Configuration
17. Storage Handlers and NoSQL
Storage Handler Background HiveStorageHandler HBase Cassandra
Static Column Mapping Transposed Column Mapping for Dynamic Columns Cassandra SerDe Properties
DynamoDB
18. Security
Integration with Hadoop Security Authentication with Hive Authorization in Hive
Users, Groups, and Roles Privileges to Grant and Revoke Partition-Level Privileges Automatic Grants
19. Locking
Locking Support in Hive with Zookeeper Explicit, Exclusive Locks
20. Hive Integration with Oozie
Oozie Actions
Hive Thrift Service Action
A Two-Query Workflow Oozie Web Console Variables in Workflows Capturing Output Capturing Output to Variables
21. Hive and Amazon Web Services (AWS)
Why Elastic MapReduce? Instances Before You Start Managing Your EMR Hive Cluster Thrift Server on EMR Hive Instance Groups on EMR Configuring Your EMR Cluster
Deploying hive-site.xml Deploying a .hiverc Script
Deploying .hiverc using a config step Deploying a .hiverc using a bootstrap action
Setting Up a Memory-Intensive Configuration
Persistence and the Metastore on EMR HDFS and S3 on EMR Cluster Putting Resources, Configs, and Bootstrap Scripts on S3 Logs on S3 Spot Instances Security Groups EMR Versus EC2 and Apache Hive Wrapping Up
22. HCatalog
Introduction MapReduce
Reading Data Writing Data
Command Line Security Model Architecture
23. Case Studies
m6d.com (Media6Degrees)
Data Science at M6D Using Hive and R M6D UDF Pseudorank M6D Managing Hive Data Across Multiple MapReduce Clusters
Cross deployment queries with Hive Replicating Hive data between deployments
Outbrain
In-Site Referrer Identification
Cleaning up the URLs Determining referrer type Multiple URLs
Counting Uniques
Why this is a problem Load a temp table Querying the temp table
Sessionization
Setting it up Finding origin pageviews Bucketing PVs to origins Aggregating on origins Aggregating on origin type Measure engagement
NASA’s Jet Propulsion Laboratory
The Regional Climate Model Evaluation System Our Experience: Why Hive? Some Challenges and How We Overcame Them
Conclusion
Photobucket
Big Data at Photobucket What Hardware Do We Use for Hive? What’s in Hive? Who Does It Support?
SimpleReach Experiences and Needs from the Customer Trenches
A Karmasphere Perspective Introduction Use Case Examples from the Customer Trenches
Customer trenches #1: Optimal data formatting for Hive Customer trenches #2: Partitions and performance Customer trenches #3: Text analytics with Regex, Lateral View Explode, Ngram, and other UDFs
Apache Hive in production: Incremental needs and capabilities
Collaborative multiuser environments Productivity enhancements Managing Hive assets Extending Hive for advanced analytics Extending Hive beyond the SQL skill set Data exploration capabilities Schedule and operationalize Hive queries
About Karmasphere
Hive features survey
Glossary A. References Index About the Authors Colophon Copyright
