
Apache HBase Primer

Deepak Vohra

White Rock, British Columbia

Canada

ISBN-13 (pbk): 978-1-4842-2423-6 ISBN-13 (electronic): 978-1-4842-2424-3 DOI 10.1007/978-1-4842-2424-3

Library of Congress Control Number: 2016959189

Copyright © 2016 by Deepak Vohra

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image, we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Managing Director: Welmoed Spahr

Lead Editor: Steve Anglin

Technical Reviewer: Massimo Nardone

Editorial Board: Steve Anglin, Pramila Balan, Laura Berendson, Aaron Black, Louise Corrigan, Jonathan Gennick, Robert Hutchinson, Celestin Suresh John, Nikhil Karkal, James Markham, Susan McDermott, Matthew Moodie, Natalie Pao, Gwenan Spearing

Coordinating Editor: Mark Powers

Copy Editor: Mary Behr

Compositor: SPi Global

Indexer: SPi Global

Artist: SPi Global

Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science+Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation. For information on translations, please e-mail rights@apress.com, or visit www.apress.com.

Apress and friends of ED books may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Special Bulk Sales–eBook Licensing web page at www.apress.com/bulk-sales .

Any source code or other supplementary materials referenced by the author in this text are available to readers at www.apress.com. For detailed information about how to locate your book's source code, go to www.apress.com/source-code/. Readers can also access source code at SpringerLink in the Supplementary Material section for each chapter.

Printed on acid-free paper.

Contents at a Glance

About the Author ............................................................................ xiii

About the Technical Reviewer ......................................................... xv

Introduction ................................................................................... xvii


Chapter 1: Fundamental Characteristics ........................................ 3

Chapter 2: Apache HBase and HDFS ............................................... 9

Chapter 3: Application Characteristics ......................................... 45


Chapter 4: Physical Storage ......................................................... 51

Chapter 5: Column Family and Column Qualifier .......................... 53

Chapter 6: Row Versioning ........................................................... 59

Chapter 7: Logical Storage ........................................................... 63


Chapter 8: Major Components of a Cluster ................................... 69

Chapter 9: Regions ....................................................................... 75

Chapter 10: Finding a Row in a Table ........................................... 81

Chapter 11: Compactions ............................................................. 87

Chapter 12: Region Failover ......................................................... 99

Chapter 13: Creating a Column Family ....................................... 105


Chapter 14: Region Splitting....................................................... 111

Chapter 15: Defining the Row Keys ............................................ 117


Chapter 16: The HBaseAdmin Class............................................ 123

Chapter 17: Using the Get Class ................................................. 129

Chapter 18: Using the HTable Class ............................................ 133


Chapter 19: Using the HBase Shell ............................................. 137

Chapter 20: Bulk Loading Data ................................................... 145

Index .............................................................................................. 149

Contents

About the Author ............................................................................ xiii

About the Technical Reviewer ......................................................... xv

Introduction ................................................................................... xvii


Chapter 1: Fundamental Characteristics ........................................ 3

Distributed ............................................................................................... 3

Big Data Store ......................................................................................... 3

Non-Relational ......................................................................................... 3

Flexible Data Model ................................................................................. 4

Scalable ................................................................................................... 4

Roles in Hadoop Big Data Ecosystem ...................................................... 5

How Is Apache HBase Different from a Traditional RDBMS? ................... 5

Summary ................................................................................................. 8

Chapter 2: Apache HBase and HDFS ............................................... 9

Overview ................................................................................................. 9

Storing Data .......................................................................................... 14

HFile Data Files: HFile v1 ....................................................... 15

HBase Blocks ........................................................................................ 17

Key Value Format .................................................................................. 18

HFile v2 ................................................................................................. 19

Encoding................................................................................................ 20


Compaction ........................................................................................... 21

KeyValue Class ...................................................................................... 21

Data Locality .......................................................................................... 24

Table Format ......................................................................................... 25

HBase Ecosystem .................................................................................. 25

HBase Services ..................................................................................... 26

Auto-sharding ........................................................................................ 27

The Write Path to Create a Table ........................................................... 27

The Write Path to Insert Data ................................................................ 28

The Write Path to Append-Only R/W ...................................................... 29

The Read Path for Reading Data ........................................................... 30

The Read Path Append-Only to Random R/W ........................................ 30

HFile Format .......................................................................................... 30

Data Block Encoding ............................................................................. 31

Compactions ......................................................................................... 32

Snapshots ............................................................................................. 32

The HFileSystem Class .......................................................................... 33

Scaling .................................................................................................. 33

HBase Java Client API............................................................................ 35

Random Access ..................................................................................... 36

Data Files (HFile) ................................................................................... 36

Reference Files/Links ............................................................................ 37

Write-Ahead Logs .................................................................................. 38

Data Locality .......................................................................................... 38

Checksums ............................................................................................ 40

Data Locality for HBase ......................................................................... 42


MemStore .............................................................................................. 42

Summary ............................................................................................... 43

Chapter 3: Application Characteristics ......................................... 45

Summary ............................................................................................... 47


Chapter 4: Physical Storage ......................................................... 51

Summary ............................................................................................... 52

Chapter 5: Column Family and Column Qualifier .......................... 53

Summary ............................................................................................... 57

Chapter 6: Row Versioning ........................................................... 59

Versions Sorting .................................................................................... 61

Summary ............................................................................................... 62

Chapter 7: Logical Storage ........................................................... 63

Summary ............................................................................................... 65


Chapter 8: Major Components of a Cluster ................................... 69

Master ................................................................................................... 70

RegionServers ....................................................................................... 70

ZooKeeper ............................................................................................. 71

Regions ................................................................................................. 72

Write-Ahead Log .................................................................................... 72

Store ...................................................................................................... 72

HDFS...................................................................................................... 73

Clients ................................................................................................... 73

Summary ............................................................................................... 73


Chapter 9: Regions ....................................................................... 75

How Many Regions? .............................................................................. 76

Compactions ......................................................................................... 76

Region Assignment ................................................................................ 76

Failover .................................................................................................. 77

Region Locality ...................................................................................... 77

Distributed Datastore ............................................................................ 77

Partitioning ............................................................................................ 77

Auto Sharding and Scalability ............................................................... 78

Region Splitting ..................................................................................... 78

Manual Splitting .................................................................................... 79

Pre-Splitting .......................................................................................... 79

Load Balancing ...................................................................................... 79

Preventing Hotspots .............................................................................. 80

Summary ............................................................................................... 80

Chapter 10: Finding a Row in a Table ........................................... 81

Block Cache ........................................................................................... 82

The hbase:meta Table .......................................................................... 83

Summary ............................................................................................... 85

Chapter 11: Compactions ............................................................. 87

Minor Compactions ............................................................................... 87

Major Compactions ............................................................................... 88

Compaction Policy ................................................................................. 88

Function and Purpose ........................................................................... 89

Versions and Compactions .................................................................... 90

Delete Markers and Compactions ......................................................... 90

Expired Rows and Compactions ............................................................ 90


Region Splitting and Compactions ........................................................ 90

Number of Regions and Compactions ................................................... 91

Data Locality and Compactions ............................................................. 91

Write Throughput and Compactions ...................................................... 91

Encryption and Compactions................................................................. 91

Configuration Properties ....................................................... 92

Summary ............................................................................................... 97

Chapter 12: Region Failover ......................................................... 99

The Role of the ZooKeeper .................................................................... 99

HBase Resilience ................................................................................... 99

Phases of Failover ............................................................................... 100

Failure Detection ................................................................................. 102

Data Recovery ..................................................................................... 102

Regions Reassignment ........................................................................ 103

Failover and Data Locality ................................................................... 103

Configuration Properties ..................................................... 103

Summary ............................................................................................. 103

Chapter 13: Creating a Column Family ....................................... 105

Cardinality ........................................................................................... 105

Number of Column Families ................................................................ 106

Column Family Compression ............................................................... 106

Column Family Block Size ................................................................... 106

Bloom Filters ....................................................................................... 106

IN_MEMORY ........................................................................................ 107

MAX_LENGTH and MAX_VERSIONS ..................................................... 107

Summary ............................................................................................. 107


Chapter 14: Region Splitting....................................................... 111

Managed Splitting ............................................................................... 112

Pre-Splitting ........................................................................................ 113

Configuration Properties ..................................................... 113

Summary ............................................................................................. 116

Chapter 15: Defining the Row Keys ............................................ 117

Table Key Design ................................................................................. 117

Filters .................................................................................................. 118

FirstKeyOnlyFilter Filter ........................................................................................ 118

KeyOnlyFilter Filter ............................................................................................... 118

Bloom Filters ....................................................................................... 118

Scan Time ............................................................................................ 118

Sequential Keys ................................................................................... 118

Defining the Row Keys for Locality ..................................... 119

Summary ............................................................................................. 119


Chapter 16: The HBaseAdmin Class............................................ 123

Summary ............................................................................................. 127

Chapter 17: Using the Get Class ................................................. 129

Summary ............................................................................................. 132

Chapter 18: Using the HTable Class ............................................ 133

Summary ............................................................................................. 134


Chapter 19: Using the HBase Shell ............................................. 137

Creating a Table ................................................................................... 137

Altering a Table .................................................................................... 138

Adding Table Data ................................................................................ 139

Describing a Table ............................................................................... 139

Finding If a Table Exists ....................................................................... 139

Listing Tables ....................................................................................... 139

Scanning a Table ................................................................................. 140

Enabling and Disabling a Table............................................................ 141

Dropping a Table .................................................................................. 141

Counting the Number of Rows in a Table ............................................ 141

Getting Table Data ............................................................................... 141

Truncating a Table ............................................................................... 142

Deleting Table Data ............................................................................. 142

Summary ............................................................................................. 143

Chapter 20: Bulk Loading Data ................................................... 145

Summary ............................................................................................. 147

Index .............................................................................................. 149

About the Author

Deepak Vohra is a consultant and a principal member of the NuBean software company. Deepak is a Sun-certified Java programmer and Web component developer. He has worked in the fields of XML, Java programming, and Java EE for over seven years. Deepak is the coauthor of Pro XML Development with Java Technology (Apress, 2006). Deepak is also the author of JDBC 4.0 and Oracle JDeveloper for J2EE Development, Processing XML Documents with Oracle JDeveloper 11g, EJB 3.0 Database Persistence with Oracle Fusion Middleware 11g, and Java EE Development in Eclipse IDE (Packt Publishing). He also served as the technical reviewer on WebLogic: The Definitive Guide (O'Reilly Media, 2004) and Ruby Programming for the Absolute Beginner (Cengage Learning PTR, 2007).

About the Technical Reviewer

Massimo Nardone has more than 22 years of experience in security, web/mobile development, and cloud and IT architecture. His true IT passions are security and Android. He has been programming and teaching how to program with Android, Perl, PHP, Java, VB, Python, C/C++, and MySQL for more than 20 years. His technical skills include security, Android, cloud, Java, MySQL, Drupal, Cobol, Perl, web and mobile development, MongoDB, D3, Joomla, Couchbase, C/C++, WebGL, Python, Pro Rails, Django CMS, Jekyll, Scratch, and more.

He currently works as Chief Information Security Officer (CISO) for Cargotec Oyj. He holds four international patents (in the PKI, SIP, SAML, and proxy areas). He worked as a visiting lecturer and supervisor for exercises at the Networking Laboratory of the Helsinki University of Technology (Aalto University). He has also worked as a Project Manager, Software Engineer, Research Engineer, Chief Security Architect, Information Security Manager, PCI/SCADA Auditor, and Senior Lead IT Security/Cloud/SCADA Architect for many years. He holds a Master of Science degree in Computing Science from the University of Salerno, Italy.

Massimo has reviewed more than 40 IT books for different publishing companies, and he is the coauthor of Pro Android Games (Apress, 2015).

Introduction

Apache HBase is an open source NoSQL database based on the wide-column data store model. HBase was initially released in 2008. While many NoSQL databases are available, Apache HBase is the database for the Apache Hadoop ecosystem.

HBase supports most of the commonly used programming languages, such as C, C++, PHP, and Java. The implementation language of HBase is Java. HBase provides client access through a Java API, a RESTful HTTP API, and Thrift.

Some of the other Apache HBase books have a practical orientation and do not discuss HBase concepts in much detail. In this primer-level book, I discuss Apache HBase concepts. For practical use of Apache HBase, refer to another Apress book, Practical Hadoop Ecosystem.

Chapter 1: Fundamental Characteristics

Apache HBase is the Hadoop database. HBase is open source, and its fundamental characteristics are that it is a non-relational, column-oriented, distributed, scalable, big data store. HBase provides schema flexibility. The fundamental characteristics of Apache HBase are as follows.

Distributed

HBase provides two distributed modes. In the pseudo-distributed mode, all HBase daemons run on a single node; in the fully-distributed mode, the daemons run on multiple nodes across a cluster. Pseudo-distributed mode can run against the local file system or an instance of the Hadoop Distributed File System (HDFS). When run against the local file system, durability is not guaranteed; edits are lost if files are not properly closed. The fully-distributed mode can only run on HDFS. Pseudo-distributed mode is suitable for small-scale testing, while fully-distributed mode is suitable for production. Running against HDFS preserves all writes.

HBase supports auto-sharding, which implies that tables are dynamically split and distributed by the database when they become too large.

Big Data Store

HBase is based on Hadoop and HDFS, and it provides low-latency, random, real-time read/write access to big data. HBase supports hosting very large tables with billions of rows and millions of columns, and it can handle petabytes of data. HBase is designed for queries of massive data sets and is optimized for read performance. Random read access is not an Apache Hadoop feature: with Hadoop alone, only batch processing is available, which implies that data is accessed sequentially and the entire dataset must be read even for the smallest of jobs.

Non-Relational

HBase is a NoSQL database. NoSQL databases are not based on the relational database model. Relational databases such as Oracle Database, MySQL, and DB2 store data in tables, which have relations between them and make use of SQL (Structured Query Language) to access and query the tables. NoSQL databases, in contrast, make use of a storage-and-query mechanism that is predominantly based on a non-relational, non-SQL data model. The data storage model used by NoSQL databases is not some fixed data model; it is a flexible-schema data model. The common feature among NoSQL databases is that the relational and tabular database model of SQL-based databases is not used. Most NoSQL databases use no SQL at all, but NoSQL does not imply that absolutely no SQL is used, which is why NoSQL is also termed "not only SQL."

Flexible Data Model

In 2006, the Google Labs team published a paper entitled "Bigtable: A Distributed Storage System for Structured Data" (http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf). Apache HBase is a wide-column data store based on Apache Hadoop and on Bigtable concepts. The basic unit of storage in HBase is a table. A table consists of one or more column families, each of which consists of columns; that is, columns are grouped into column families. Data is stored in rows. A row is a collection of key/value pairs, and each row is uniquely identified by a row key. The row keys are created when table data is added, and they determine the sort order and serve data sharding, which is splitting a large table and distributing the data across the cluster.

HBase provides a flexible schema model in which columns may be added to a table's column family as required, without predefining the columns. Only the table and its column family (or families) are required to be defined in advance. No two rows in a table are required to have the same columns. All columns in a column family are stored in close proximity.
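As a sketch of this flexibility against the pre-1.0 Java client API that this book covers (the table name users, the family profile, and the row contents are hypothetical), only the table and a column family are defined up front; different rows then receive different columns at write time:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class FlexibleSchemaExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Only the table and its column family are predefined.
    HTableDescriptor tableDesc = new HTableDescriptor(TableName.valueOf("users"));
    tableDesc.addFamily(new HColumnDescriptor("profile"));
    admin.createTable(tableDesc);

    // Column qualifiers are created on the fly at write time;
    // no two rows need to have the same columns.
    HTable table = new HTable(conf, "users");
    Put put1 = new Put(Bytes.toBytes("row1"));
    put1.add(Bytes.toBytes("profile"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
    Put put2 = new Put(Bytes.toBytes("row2"));
    put2.add(Bytes.toBytes("profile"), Bytes.toBytes("email"), Bytes.toBytes("bob@example.com"));
    table.put(put1);
    table.put(put2);

    table.close();
    admin.close();
  }
}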

HBase does not support transactions. HBase is not eventually consistent; it is strongly consistent at the record level. Strong consistency implies that the latest data is always served, but at the cost of increased latency. In contrast, eventual consistency can return out-of-date data.

HBase does not have the notion of data types; all data is stored as arrays of bytes. Rows in a table are sorted lexicographically by row key, a design feature that makes it feasible to store related rows (or rows that will be read together) near each other for optimized scans.
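A small illustration of this ordering, assuming the client's Bytes utility (the row key values are hypothetical): because keys compare as raw bytes, "row-10" sorts before "row-2", which is why numeric keys are usually zero-padded to keep related rows adjacent.

import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyOrdering {
  public static void main(String[] args) {
    // Row keys compare as byte arrays, not as numbers:
    // "row-10" sorts before "row-2" lexicographically.
    int cmp = Bytes.compareTo(Bytes.toBytes("row-10"), Bytes.toBytes("row-2"));
    System.out.println(cmp < 0); // true: "row-10" comes first

    // Zero-padding restores the intended numeric order.
    int padded = Bytes.compareTo(Bytes.toBytes("row-02"), Bytes.toBytes("row-10"));
    System.out.println(padded < 0); // true: "row-02" before "row-10"
  }
}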

Scalable

The basic unit of horizontal scalability in HBase is a region. Rows are sharded by regions. A region is a sorted set consisting of a range of adjacent rows stored together. A table's data can be stored in one or more regions. When a region becomes too large, it splits at the middle row key into two approximately equal regions. For example, in Figure 1-1, the region has 12 rows, and it splits into two regions with 6 rows each.


Figure 1-1. Region split example

For balancing, data associated with a region can be stored on different nodes in a cluster.

Roles in Hadoop Big Data Ecosystem

HBase is run against HDFS in the fully-distributed mode; the same option is also available in the pseudo-distributed mode. HBase stores data in StoreFiles on HDFS. HBase does not make use of the MapReduce framework of Hadoop, but it can serve as the source and/or destination of MapReduce jobs.

Like Hadoop, HBase is designed to run on commodity hardware with tolerance for individual node failure. The underlying Hadoop storage is designed for batch processing, optimized for streamed access to large data sets. While HBase supports random read access, it is not designed to optimize random access; HBase incurs a random read latency, which may be reduced by enabling the block cache and increasing the heap size.

HBase can be used for real-time analytics in conjunction with MapReduce and other frameworks in the Hadoop ecosystem, such as Apache Hive.

How Is Apache HBase Different from a Traditional RDBMS?

HBase stores data in tables, which have rows and columns just as an RDBMS does, but beyond the similar terminology, the two kinds of database are mostly different. Table 1-1 shows the salient differences.

Table 1-1. Salient Differences Between an RDBMS and HBase

Schema: An RDBMS has a fixed schema; HBase has a flexible schema.

Data volume: An RDBMS stores small to medium volumes (a few thousand or a million rows; TBs of data); HBase stores big data (hundreds of millions or billions of rows; PBs of data).

Primary query language: An RDBMS uses SQL; HBase uses Get, Put, and Scan shell commands.

Data types: An RDBMS has typed columns; HBase has no data types.

Relational integrity: Supported by an RDBMS; not supported by HBase.

Joins: An RDBMS supports joins, such as equi-joins and outer joins; HBase does not provide built-in support for joins, which are performed using MapReduce.

Data structure: An RDBMS holds highly structured, statically structured data with a fixed data model; HBase holds unstructured, semi-structured, and structured data with a flexible data model.

Transactions: Supported by an RDBMS; not supported by HBase.

Advanced query language: Supported by an RDBMS; not supported by HBase.

Indexes: An RDBMS provides primary, secondary, B-tree, and clustered indexes; HBase provides secondary indexes.

Consistency: An RDBMS is strongly consistent, with ACID consistency; HBase is strongly consistent in terms of the CAP theorem.

Scalability: An RDBMS provides some level of scalability with vertical scaling, in which additional capacity may be added to a server; HBase is highly scalable with horizontal scalability, in which additional servers may be added, and it scales linearly.

Distribution: An RDBMS is distributed to some extent; HBase is highly distributed.

Real-time queries: Supported by both.

Triggers: Supported by an RDBMS; HBase provides RDBMS-like triggers through coprocessors.

Hardware: An RDBMS uses specialized hardware and less of it; HBase uses commodity hardware and more of it.

Stored procedures: Supported by an RDBMS; HBase provides stored procedures through coprocessors.

Java: An RDBMS does not require Java; HBase requires Java.

High availability: An RDBMS supports high availability; HBase provides ultra-high availability.

Fault tolerance: An RDBMS is fault tolerant to some extent; HBase is highly fault tolerant.

Normalization: Required for large data sets in an RDBMS; not required in HBase.

Object-oriented programming model: In an RDBMS, the complexity of aggregating data using joins makes an object-oriented programming model unsuitable; HBase's key-value storage suits an object-oriented programming model, supported with client APIs in object-oriented languages such as PHP, Ruby, and Java.

Administration: An RDBMS requires more administration; HBase requires less, with auto-sharding, scaling, and rebalancing.

Architecture: An RDBMS is monolithic; HBase is distributed.

Sharding: An RDBMS has limited support (manual server sharding, table partitioning); HBase provides auto-sharding.

Write performance: An RDBMS does not scale writes well; HBase scales linearly.

Single point of failure: An RDBMS has a single point of failure; HBase does not.

Replication: Asynchronous in both.

Storage model: An RDBMS uses tablespaces; HBase uses StoreFiles (HFiles) in HDFS.

Compression: Some RDBMSs have compression built into the storage engine, table, or index, with various methods available to other RDBMSs; HBase has built-in gzip compression.

Caching: An RDBMS provides a standard data/metadata cache with a query cache; HBase provides in-memory caching.

Primary data object: The table, in both.

Read/write throughput: Thousands of queries per second in an RDBMS; millions of queries per second in HBase.

Security: Authentication and authorization, in both.

Row/column orientation: An RDBMS is row-oriented; HBase is column-oriented.

Sparse tables: An RDBMS is not optimized for sparse tables; HBase is suitable for sparse tables.

Wide/narrow tables: An RDBMS favors narrow tables; HBase favors wide tables.

MapReduce integration: An RDBMS is not designed for MapReduce; HBase is well integrated with it.

Summary

In this chapter, I discussed the fundamental characteristics of the Apache HBase big data store: it is distributed, scalable, and non-relational, with a flexible data model. I also discussed HBase's role in the Hadoop big data ecosystem and how HBase differs from a traditional RDBMS. In the next chapter, I will discuss how HBase stores its data in HDFS.

Chapter 2: Apache HBase and HDFS

Apache HBase runs on HDFS as the underlying file system and benefits from HDFS features such as data reliability, scalability, and durability. HBase stores data as Store Files (HFiles) on the HDFS Datanodes. HFile is the file format for HBase, and org.apache.hadoop.hbase.io.hfile.HFile is a Java class. HFile is an HBase-specific file format that is based on the TFile binary file format. A Store File is a lightweight wrapper around the HFile. In addition to storing table data, HBase also stores the write-ahead logs (WALs), which store data before it is written to HFiles on HDFS. HBase provides a Java API for client access. HBase is itself an HDFS client and makes use of the Java class DFSClient. References to DFSClient appear in the HBase client log messages and HBase logs, as HBase uses the class to connect to the NameNode to get block locations for Datanode blocks and to add data to the Datanode blocks. HBase leverages the fault tolerance provided by the Hadoop Distributed File System (HDFS). HBase requires some configuration at the client side (HBase) and the server side (HDFS).

Overview

HBase storage in HDFS has the following characteristics:

Large files

A lot of random seeks

Latency sensitive

Durability guarantees with sync

Computation generates local data

Large number of open files

HBase makes use of three types of files:

WALs or HLogs

Data files (also known as store files or HFile s)

0 length files: References (symbolic or logical links)

Each file is replicated three times. Data files are stored in the following format in which "userTable" and "table1" are example tables, and "098345asde5t6u" and "09jhk7y65" are regions, and "family" and "f1" are column families, and "09ojki8dgf6" and "90oifgr56" are the data files:


hdfs://localhost:41020/hbase/userTable/098345asde5t6u/family/09ojki8dgf6 hdfs://localhost:41020/hbase/userTable/09drf6745asde5t6u/family/09ojki8ddfre5 hdfs://localhost:41020/hbase/table1/09jhk7y65/f1/90oifgr56

Write-ahead logs are stored in the following format, in which "server1" is the region server:

hdfs://localhost:41020/hbase/.logs/server1,602045,123456090
hdfs://localhost:41020/hbase/.logs/server1,602045,123456090/server1%2C134r5

Links are stored in the following format in which "098eerf6" is a region, "family" is a column family and "8uhy67" is the link:

hdfs://localhost:41020/hbase/.hbase_snapshot/usertable_snapshot/098eerf6/ family/8uhy67

Datanode failure is handled by HDFS using replication. HBase provides real-time, random, read/write access to data in the HDFS. A data consumer reads data from HBase, which reads data from HDFS, as shown in Figure  2-1 . A data producer writes data to HBase, which writes data to HDFS. A data producer may also write data to HDFS directly.

Figure 2-1. Apache HBase read/write path

HBase is the most advanced user of HDFS and makes use of the HFileSystem (an encapsulation of FileSystem) Java API. HFileSystem has the provision to use separate filesystem objects for reading and writing WALs and HFiles. The HFileSystem.create(Path path) method is used to create the HFile data files and HLog WALs.

HBase communicates with both the NameNode and the Datanodes using HDFS client classes such as DFSClient. HDFS pipelines communications, including data writes, from one Datanode to another, as shown in Figure 2-2. Whenever feasible, HBase writes are local, which implies that the writes go to a local Datanode; as a result, RegionServers should not get too many write errors. HBase write errors, including communication issues between Datanodes, are logged to the HDFS logs and not the HBase logs; in particular, if a Datanode is not able to replicate data blocks, the errors are written to the HDFS logs.


Figure 2-2. Apache HBase communication with NameNode and Datanode

The hdfs-site.xml configuration file contains information such as the following:

The replication factor

The NameNode path

The Datanode path of the local file systems where you want to store the Hadoop infrastructure

An HBase client communicates with the ZooKeeper and the HRegionServers. The HMaster coordinates the RegionServers, which run on the Datanodes. When HBase starts up, the HMaster assigns the regions to each RegionServer, including the regions for the –ROOT- and .META. (the hbase:meta catalog table in later versions) tables. The HBase table structure comprises column families (or a single column family), which group similar column data. The basic element of distribution for an HBase table is a Region, which is composed of a Store per column family. A Region is a subset of a table's data. A set of regions is served by a RegionServer, and each region is served by only one RegionServer. A Store has an in-memory component called the MemStore and a persistent storage component called an HFile or StoreFile. The DFSClient is used to store and replicate the HFiles and HLogs on the HDFS Datanodes. The storage architecture of HBase is shown in Figure 2-3.


Figure 2-3. Apache HBase storage architecture

As shown in Figure 2-3, the HFiles and HLogs are primarily handled by the HRegions, but sometimes even the HRegionServers have to perform low-level operations. The following flow sequence is used when a new client contacts the ZooKeeper quorum to find a row key:

1. First, the client gets the name of the server that hosts the –ROOT- region from the ZooKeeper. In later versions, the –ROOT- region is not used and the hbase:meta table location is stored directly in the ZooKeeper.

2. Next, the client queries that server to get the server that hosts the .META. tables. The –ROOT- and .META. server info is cached and looked up only once.

3. Subsequently, the client queries the .META. server to get the server that has the row the client is trying to get.

4. The client caches the information about the HRegion in which the row is located.

5. The client contacts the HRegionServer hosting the region directly. The client does not need to contact the .META. server again and again once it finds and caches the location of the row(s).

6. The HRegionServer opens the region and creates an HRegion object. A Store instance is created for each HColumnFamily for each table. Each of the Store instances can have StoreFile instances, which are lightweight wrappers around the HFile storage files. Each HRegion has a MemStore and an HLog instance.

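The lookup chain just described is transparent to application code. A minimal sketch (hypothetical table, family, and qualifier names; pre-1.0 client API): the first get() triggers the ZooKeeper and catalog-table lookups, and subsequent calls reuse the cached region location.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RowLookup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "users");

    // The first get() resolves the region location via ZooKeeper and
    // the catalog table; the location is then cached by the client.
    Get get = new Get(Bytes.toBytes("row1"));
    Result result = table.get(get);
    byte[] value = result.getValue(Bytes.toBytes("profile"), Bytes.toBytes("name"));
    System.out.println(Bytes.toString(value));

    table.close();
  }
}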

HBase communicates with HDFS on two different ports:

1. Port 50010, using the DataNode class. It is configured in HDFS using the dfs.datanode.address configuration parameter.

2. Port 50020, using the ipc.Client interface. It is configured in HDFS using the dfs.datanode.ipc.address configuration parameter.

Being just another HDFS client, the configuration settings for HDFS also apply to HBase. Table  2-1 shows how the HDFS configuration settings pertain to retries and timeouts.

Table 2-1. HDFS Configuration Settings

ipc.client.connect.max.retries (core-default.xml): The number of tries an HBase client will make to establish a server connection. Default is 10. For SASL connections, the setting is hard-coded at 15 and cannot be reconfigured.

ipc.client.connect.max.retries.on.timeouts (core-default.xml): The number of tries an HBase client will make on socket timeout to establish a server connection. Default is 45.

dfs.client.block.write.retries (hdfs-default.xml): The number of retries HBase makes to write to a datanode before signaling a failure. Default is 3. After hitting the failure threshold, the HBase client reconnects with the NameNode to get block locations for another datanode.

dfs.client.socket-timeout (hdfs-default.xml): The time before which an HBase client trying to establish a socket connection or to read times out. Default is 60 seconds.

dfs.datanode.socket.write.timeout (hdfs-default.xml): The time before which a write operation times out. Default is 8 minutes.

Storing Data

Data is written to the actual storage in the following sequence:

1. The client sends an HTable.put(Put) request to the HRegionServer. The org.apache.hadoop.hbase.client.HTable class is used to communicate with a single HBase table; it provides the put(List<Put> puts) and put(Put put) methods to put data in a table. The org.apache.hadoop.hbase.client.Put class is used to perform Put operations for a single row.

2. The HRegionServer forwards the request to the matching HRegion.

3. Next, it is ascertained whether the data is to be written to the WAL, which is represented by the HLog class. Each RegionServer has an HLog object. The Put class method setWriteToWAL(boolean write), which is deprecated in HBase 0.94, or the setDurability(Durability d) method is used to set whether data is to be written to the WAL. Durability is an enum with the values listed in Table 2-2. Put extends the Mutation abstract class, which is the base for the classes implementing put, delete, and append operations.

Table 2-2. Durability Enums

ASYNC_WAL: Write the Mutation to the WAL asynchronously, as soon as possible.

FSYNC_WAL: Write the Mutation to the WAL synchronously and force the entries to disk.

SKIP_WAL: Do not write the Mutation to the WAL.

SYNC_WAL: Write the Mutation to the WAL synchronously.
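A short sketch of setting durability on a write (hypothetical table, family, and qualifier names; assumes an HBase version in which the Durability enum is available in the client package):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class DurabilityExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "users");

    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("profile"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
    // Write the WAL entry synchronously before the put completes;
    // SKIP_WAL would trade durability for write speed.
    put.setDurability(Durability.SYNC_WAL);
    table.put(put);

    table.close();
  }
}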

The WAL is a standard sequence file (SequenceFile) instance. If the data is to be written to the WAL, it is written at this point.

4. After the data is written (or not) to the WAL, it is put in the MemStore. If the MemStore is full, data is flushed to disk.

5. A separate thread in the HRegionServer writes the data to an HFile in HDFS. A last-written sequence number is also saved.

The HFile and HLog files are written in subdirectories of the root directory in HDFS for HBase, the /hbase directory. The HLog files are written to the /hbase/.logs directory. A sub-directory for each HRegionServer is created within the .logs directory. Within the HLog file, a log is written for each HRegion.


For the data files (HFiles), a subdirectory for each HBase table is created in the /hbase directory. Within the directories corresponding to the tables, subdirectories for regions are created; each region name is encoded to create the subdirectory for the region. Within each region's subdirectory, further subdirectories are created for the column families, and within the column-family directories the HFile files are stored. As a result, the directory structure in HDFS for data files is as follows:

/hbase/<tablename>/<encoded-regionname>/<column-family>/<filename>

The root of the region directory has the .regioninfo file containing metadata for the region. HBase regions split automatically if the HFile data files' storage exceeds the limit set by the hbase.hregion.max.filesize setting in the hbase-site.xml/hbase-default.xml configuration file. The default setting for hbase.hregion.max.filesize is 10GB. When a region's storage exceeds the hbase.hregion.max.filesize value, the region splits into two and reference files are created in the new regions. The reference files contain information such as the key at which the region was split, and they are used to read the original region data files. When compaction is performed, new data files are created in a new region directory and the reference files are removed; the original data files in the original region are also removed.
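Besides the cluster-wide hbase-site.xml setting, the split threshold can also be overridden per table at creation time. A sketch, assuming the pre-1.0 admin API (hypothetical table and family names):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MaxFileSizeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor tableDesc = new HTableDescriptor(TableName.valueOf("users"));
    tableDesc.addFamily(new HColumnDescriptor("profile"));
    // Per-table override of hbase.hregion.max.filesize:
    // split regions at roughly 20 GB instead of the 10 GB default.
    tableDesc.setMaxFileSize(20L * 1024 * 1024 * 1024);
    admin.createTable(tableDesc);

    admin.close();
  }
}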

Each /hbase/table directory also contains a compaction.dir directory, which is used when splitting and compacting regions.

HFile Data Files: HFile v1

Up to version 0.20, HBase used the MapFile format. In HBase 0.20, MapFile was replaced by the HFile format. An HFile is made of data blocks, which hold the actual data, followed by optional metadata blocks, a fileinfo block, the data block index, the metadata block index, and a fixed-size trailer, which records the offsets at which the HFile changes content type:

<data blocks><meta blocks><fileinfo><data index><meta index><trailer>

Each block has a "magic" header at the start. Blocks are in a key/value format. In data blocks, both the key and the value are byte arrays. In metadata blocks, the key is a String and the value is a byte array. An empty file structure is as follows:

<fileinfo><trailer>

The HFile format is based on the TFile binary file format. The HFile format version 1 is shown in Figure  2-4 .

Figure 2-4. HFile file format


The different sections of the HFile are discussed in Table 2-3.

Table 2-3. HFile Sections

Data: The data blocks in which the HBase table data is stored as key/value pairs. Optional, but an HFile most likely has data. The records in a data block are stored in the following format:

Key Length: int
Value Length: int
Key: byte[]
Value: byte[]

Meta: The metadata blocks, holding the metadata for the data stored in the HFile. Optional. Meta blocks are written upon file close. A RegionServer's StoreFile uses metadata blocks to store a bloom filter, which is used to avoid reading a file if there is no chance that the key is present. A bloom filter only indicates that the key may be in the file; the file(s) still need to be scanned to find whether the key is actually there.

FileInfo: Written upon file close. FileInfo is a simple Map with key/value pairs in which both the key and the value are byte arrays. A RegionServer's StoreFile uses FileInfo to store the max sequence ID, the major compaction key, and timerange info. The max sequence ID is used to avoid reading a file if the file is too old; the timerange is used to avoid reading a file if the file is too new. FileInfo holds meta information about the HFile such as MAJOR_COMPACTION_KEY, MAX_SEQ_ID_KEY, hfile.AVG_KEY_LEN, hfile.AVG_VALUE_LEN, hfile.COMPARATOR, and hfile.LASTKEY. The format of the FileInfo is as follows:

Last Key: byte[]
Avg Key Length: int
Avg Value Length: int
Comparator Class: byte[]

Data Index: Records the offsets of the data blocks. The format of the data index is as follows:

Block Begin: long
Block Size: int
Block First Key: byte[]

Meta Index: Records the offsets of the meta blocks. The format of the meta index is as follows:

Block Begin: long
Block Size: int

Trailer: Has pointers to the other blocks and is written last. The format of the Trailer is as follows:

FileInfo Offset: long
Data/Meta Index Offset: long
Data/Meta Index Count: long
Total Bytes: long
Entry Count: long
Compression Codec: int

On load, the following sequence is followed:

1. Read the trailer

2. Seek back to read the file info

3. Read the data index

4. Read the meta index

Each data block contains a “magic” header and KeyValue pairs of actual data in plain or compressed form, as shown in Figure  2-5 .

Figure 2-5. A data block

HBase Blocks

The default block size for an HFile is 64KB, far smaller than the HDFS block size of 64MB. A minimum block size between 8KB and 1MB is recommended for the HFile. If sequential access is the primary objective, a larger block size is preferred; a larger block size is inefficient for random access because more data has to be decompressed. For random access, small block sizes are more efficient, but they have the following disadvantages:

1. They require more memory to hold the block index

2. They may be slower to create

3. The smallest feasible block size is limited to 20KB-30KB because the compressor stream must be flushed at the end of each data block


The default minimum block size for an HFile may be configured in hbase-site.xml with the hfile.min.blocksize.size parameter. Blocks serve different purposes in HDFS and HBase: in HDFS, blocks are the unit of storage on disk, while in HBase, blocks are the unit of storage in memory. Many HBase blocks fit into a single HBase file. HBase is designed to maximize efficiency from the HDFS file system and fully utilize the HDFS block size.
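The block size can also be set per column family when the family is defined, overriding the file-wide default for that family's HFiles. A sketch, assuming the pre-1.0 API (the family name is hypothetical):

import org.apache.hadoop.hbase.HColumnDescriptor;

public class BlockSizeExample {
  public static void main(String[] args) {
    // Smaller blocks favor random reads; larger blocks favor scans.
    HColumnDescriptor family = new HColumnDescriptor("profile");
    family.setBlocksize(32 * 1024); // 32 KB instead of the 64 KB default
    System.out.println(family);
  }
}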

Key Value Format

The org.apache.hadoop.hbase.KeyValue class represents an HBase key/value. The KeyValue format is as follows:

<keylength> <valuelength> <key> <value>

A KeyValue is a low-level byte array structure made of the sections shown in Figure  2-6 .

Figure 2-6. A KeyValue

The key has the following format:

<rowlength> <row> <columnfamilylength> <columnfamily> <columnqualifier> <timestamp> <keytype>
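A sketch of building a KeyValue and reading back its parts via the pre-1.0 accessors (the row, family, qualifier, and value contents are hypothetical):

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

public class KeyValueExample {
  public static void main(String[] args) {
    KeyValue kv = new KeyValue(
        Bytes.toBytes("row1"),      // row
        Bytes.toBytes("profile"),   // column family
        Bytes.toBytes("name"),      // column qualifier
        System.currentTimeMillis(), // timestamp
        Bytes.toBytes("Alice"));    // value

    // The accessors expose the sections of the underlying byte array.
    System.out.println("row: " + Bytes.toString(kv.getRow()));
    System.out.println("family: " + Bytes.toString(kv.getFamily()));
    System.out.println("qualifier: " + Bytes.toString(kv.getQualifier()));
    System.out.println("timestamp: " + kv.getTimestamp());
    System.out.println("value: " + Bytes.toString(kv.getValue()));
  }
}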