Hadoop Operations
Dedication
SPECIAL OFFER: Upgrade this ebook with O’Reilly
Preface
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgments
1. Introduction
2. HDFS
Goals and Motivation
Design
Daemons
Reading and Writing Data
The Read Path
The Write Path
Managing Filesystem Metadata
Namenode High Availability
Namenode Federation
Access and Integration
Command-Line Tools
FUSE
REST Support
3. MapReduce
The Stages of MapReduce
Introducing Hadoop MapReduce
Daemons
Jobtracker
Tasktracker
When It All Goes Wrong
Child task failures
Tasktracker/worker node failures
Jobtracker failures
HDFS failures
YARN
4. Planning a Hadoop Cluster
Picking a Distribution and Version of Hadoop
Apache Hadoop
Cloudera’s Distribution Including Apache Hadoop
Versions and Features
What Should I Use?
Hardware Selection
Master Hardware Selection
Namenode considerations
Secondary namenode hardware
Jobtracker hardware
Worker Hardware Selection
Cluster Sizing
Blades, SANs, and Virtualization
Operating System Selection and Preparation
Deployment Layout
Software
Hostnames, DNS, and Identification
Users, Groups, and Privileges
Kernel Tuning
vm.swappiness
vm.overcommit_memory
Disk Configuration
Choosing a Filesystem
ext3
ext4
xfs
Mount Options
Network Design
Network Usage in Hadoop: A Review
HDFS
MapReduce
1 Gb versus 10 Gb Networks
Typical Network Topologies
Traditional tree
Spine fabric
5. Installation and Configuration
Installing Hadoop
Apache Hadoop
Tarball installation
Package installation
CDH
Configuration: An Overview
The Hadoop XML Configuration Files
Environment Variables and Shell Scripts
Logging Configuration
HDFS
Identification and Location
Optimization and Tuning
Formatting the Namenode
Creating a /tmp Directory
Namenode High Availability
Fencing Options
Basic Configuration
Automatic Failover Configuration
Initializing ZooKeeper State
Format and Bootstrap the Namenodes
Namenode Federation
MapReduce
Identification and Location
Optimization and Tuning
Rack Topology
Security
6. Identity, Authentication, and Authorization
Identity
Kerberos and Hadoop
Kerberos: A Refresher
Kerberos Support in Hadoop
Configuring Hadoop security
Authorization
HDFS
MapReduce
Other Tools and Systems
Apache Hive
Apache HBase
Apache Oozie
Hue
Apache Sqoop
Apache Flume
Apache ZooKeeper
Apache Pig, Cascading, and Crunch
Tying It Together
7. Resource Management
What Is Resource Management?
HDFS Quotas
MapReduce Schedulers
The FIFO Scheduler
Configuration
The Fair Scheduler
Configuration
The Capacity Scheduler
Configuration
The Future
8. Cluster Maintenance
Managing Hadoop Processes
Starting and Stopping Processes with Init Scripts
Starting and Stopping Processes Manually
HDFS Maintenance Tasks
Adding a Datanode
Decommissioning a Datanode
Checking Filesystem Integrity with fsck
Balancing HDFS Block Data
Dealing with a Failed Disk
MapReduce Maintenance Tasks
Adding a Tasktracker
Decommissioning a Tasktracker
Killing a MapReduce Job
Killing a MapReduce Task
Dealing with a Blacklisted Tasktracker
9. Troubleshooting
Differential Diagnosis Applied to Systems
Common Failures and Problems
Humans (You)
Misconfiguration
Hardware Failure
Resource Exhaustion
Host Identification and Naming
Network Partitions
“Is the Computer Plugged In?”
E-SPORE
Treatment and Care
War Stories
A Mystery Bottleneck
There’s No Place Like 127.0.0.1
10. Monitoring
An Overview
Hadoop Metrics
Apache Hadoop 0.20.0 and CDH3 (metrics1)
JMX Support
REST Interface
Using the metrics servlet
Using the JMX JSON servlet
Apache Hadoop 0.20.203 and Later, and CDH4 (metrics2)
What about SNMP?
Health Monitoring
Host-Level Checks
All Hadoop Processes
HDFS Checks
MapReduce Checks
11. Backup and Recovery
Data Backup
Distributed Copy (distcp)
Parallel Data Ingestion
Namenode Metadata
A. Deprecated Configuration Properties
Index
About the Author
Colophon
Copyright