Apache Hive Essentials · 2nd Edition by Du, Dayong

Index

Title Page
Copyright and Credits

Apache Hive Essentials Second Edition

Dedication
Packt Upsell

Why subscribe?
PacktPub.com

Contributors

About the author
About the reviewers
Packt is searching for authors like you

Preface

Who this book is for
What this book covers
To get the most out of this book

Download the example code files
Download the color images
Conventions used

Get in touch

Reviews

Overview of Big Data and Hive

A short history
Introducing big data
The relational and NoSQL databases versus Hadoop
Batch, real-time, and stream processing
Overview of the Hadoop ecosystem
Hive overview
Summary

Setting Up the Hive Environment

Installing Hive from Apache
Installing Hive from vendors
Using Hive in the cloud
Using the Hive command
Using the Hive IDE
Summary

Data Definition and Description

Understanding data types
Data type conversions
Data Definition Language
Database
Tables

Table creation
Table description
Table cleaning
Table alteration

Partitions
Buckets
Views
Summary

Data Correlation and Scope

Project data with SELECT
Filtering data with conditions
Linking data with JOIN

INNER JOIN
OUTER JOIN
Special joins

Combining data with UNION
Summary

Data Manipulation

Data exchanging with LOAD
Data exchange with INSERT
Data exchange with [EX|IM]PORT
Data sorting
Functions

Function tips for collections
Function tips for date and string
Virtual column functions

Transactions and locks

Transactions

UPDATE statement
DELETE statement
MERGE statement

Locks

Summary

Data Aggregation and Sampling

Basic aggregation
Enhanced aggregation

Grouping sets
Rollup and Cube

Aggregation condition
Window functions

Window aggregate functions
Window sort functions
Window analytics functions
Window expression

Sampling

Random sampling
Bucket table sampling
Block sampling

Summary

Performance Considerations

Performance utilities

EXPLAIN statement
ANALYZE statement
Logs

Design optimization

Partition table design
Bucket table design
Index design
Use skewed/temporary tables

Data optimization

File format
Compression
Storage optimization

Job optimization

Local mode
JVM reuse
Parallel execution
Join optimization

Common join
Map join
Bucket map join
Sort merge bucket (SMB) join
Sort merge bucket map (SMBM) join
Skew join

Job engine
Optimizer

Vectorization optimization
Cost-based optimization

Summary

Extensibility Considerations

User-defined functions

UDF code template
UDAF code template
UDTF code template
Development and deployment

HPL/SQL
Streaming
SerDe
Summary

Security Considerations

Authentication

Metastore authentication
Hiveserver2 authentication

Authorization

Legacy mode
Storage-based mode
SQL standard-based mode

Mask and encryption

The data-hashing function
The data-masking function
The data-encryption function
Other methods

Summary

Working with Other Tools

The JDBC/ODBC connector
NoSQL
The Hue/Ambari Hive view
HCatalog
Oozie
Spark
Hivemall
Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think
