Table of Contents
INTRODUCTION
WHO IS THIS BOOK FOR?
WHAT YOU NEED TO USE THIS BOOK
HOW THIS BOOK IS STRUCTURED
CONVENTIONS
SOURCE CODE
ERRATA
P2P.WROX.COM
CHAPTER 1: HADOOP INTRODUCTION
What Is ZooKeeper?
What Is Hive?
Integration with Other Systems
Summary
CHAPTER 2: STORAGE
Basics of Hadoop HDFS
Setting Up the HDFS Cluster in Distributed Mode
Advanced Features of HDFS
File Format
Cloud Storage
Summary
CHAPTER 3: COMPUTATION
Basics of Hadoop MapReduce
How to Launch a MapReduce Job
Advanced Features of MapReduce
The Difference from a Spark Job
Summary
CHAPTER 4: USER EXPERIENCE
Apache Hive
Apache Pig
UDF
Hue
Apache Oozie
Summary
CHAPTER 5: INTEGRATION WITH OTHER SYSTEMS
Apache Sqoop
Apache Flume
Apache Kafka
Apache Storm
Summary
CHAPTER 6: HADOOP SECURITY
Securing the Hadoop Cluster
Securing Data
Securing Applications
Summary
CHAPTER 7: ECOSYSTEM AT LARGE: HADOOP WITH APACHE BIGTOP
Basic Concepts
Developing a Custom-Tailored Stack
Deployment
Integration Validation
Putting It All Together
Summary
CHAPTER 8: IN-MEMORY COMPUTING IN HADOOP STACK
Introduction to In-Memory Computing
Apache Ignite: Memory First
Legacy Hadoop Acceleration with Ignite
Advanced Use of Apache Ignite
Summary
GLOSSARY
End User License Agreement
List of Illustrations
Chapter 1: Hadoop Introduction
Figure 1.1
Figure 1.2
Figure 1.3
Figure 1.4
Figure 1.5
Figure 1.6
Figure 1.7
Figure 1.8
Figure 1.9
Figure 1.10
Figure 1.11
Chapter 2: Storage
Figure 2.1
Figure 2.2
Figure 2.3
Figure 2.4
Figure 2.5
Figure 2.6
Figure 2.7
Figure 2.8
Figure 2.9
Figure 2.10
Chapter 3: Computation
Figure 3.1
Figure 3.2
Figure 3.3
Figure 3.4
Figure 3.5
Figure 3.6
Figure 3.7
Figure 3.8
Chapter 4: User Experience
Figure 4.1
Figure 4.2
Figure 4.3
Figure 4.4
Figure 4.5
Figure 4.6
Figure 4.7
Chapter 5: Integration with Other Systems
Figure 5.1
Figure 5.2
Figure 5.3
Figure 5.4
Figure 5.5
Figure 5.6
Figure 5.7
Figure 5.8
Figure 5.9
Figure 5.10
Chapter 6: Hadoop Security
Figure 6.1
Figure 6.2
Figure 6.3
Figure 6.4
Chapter 7: Ecosystem at Large: Hadoop with Apache Bigtop
Figure 7.1
Chapter 8: In-Memory Computing in Hadoop Stack
Figure 8.1
Figure 8.2
Figure 8.3
Figure 8.4
Figure 8.5
Figure 8.6
Figure 8.7
Figure 8.8
List of Tables
Chapter 2: Storage
Table 2.1 Read Operations
Table 2.2 Write Operations
Table 2.3 Other Operations
Table 2.4 NameNode Daemon Configurations
Table 2.5 DataNode Daemon Configurations
Table 2.6 snapshotDiff Modification Types
Table 2.7 Policy Details
Table 2.8 Reed-Solomon in Erasure Coding
Chapter 3: Computation
Table 3.1 APIs for MapReduce information
Table 3.2 Differences between Hadoop MapReduce and Spark job
Chapter 4: User Experience
Table 4.1 Batch mode command line options
Table 4.2 Hive interactive shell mode properties
Table 4.3 Built-in functions
Table 4.4 Batch mode on the command line
Table 4.5 Interactive shell mode commands
Table 4.6 Frequently used operators
Chapter 5: Integration with Other Systems
Table 5.1 Common Flume Agent Components
Chapter 6: Hadoop Security
Table 6.1 Node types
Table 6.2 Service Authorization policies
Table 6.3 core-site.xml properties
Table 6.4 core-site.xml for SSL properties
Table 6.5 ssl-server.xml keystore and truststore properties
Table 6.6 hadoop.rpc.protection properties
Table 6.7 hdfs-site.xml properties
Table 6.8 Security delegation token properties
Chapter 7: Ecosystem at Large: Hadoop with Apache Bigtop
Table 7.1 Extended components make up a Hadoop proper stack
Chapter 8: In-Memory Computing in Hadoop Stack
Table 8.1 Cache creating modes