Hadoop Operations by Sammer, Eric

Index

Hadoop Operations Dedication SPECIAL OFFER: Upgrade this ebook with O’Reilly Preface

Conventions Used in This Book Using Code Examples Safari® Books Online How to Contact Us Acknowledgments

1. Introduction

2. HDFS

Goals and Motivation Design Daemons Reading and Writing Data

The Read Path The Write Path

Managing Filesystem Metadata Namenode High Availability Namenode Federation Access and Integration

Command-Line Tools FUSE REST Support

3. MapReduce

The Stages of MapReduce Introducing Hadoop MapReduce

Daemons

Jobtracker Tasktracker

When It All Goes Wrong

Child task failures Tasktracker/worker node failures Jobtracker failures HDFS failures

YARN

4. Planning a Hadoop Cluster

Picking a Distribution and Version of Hadoop

Apache Hadoop Cloudera’s Distribution Including Apache Hadoop Versions and Features What Should I Use?

Hardware Selection

Master Hardware Selection

Namenode considerations Secondary namenode hardware Jobtracker hardware

Worker Hardware Selection Cluster Sizing Blades, SANs, and Virtualization

Operating System Selection and Preparation

Deployment Layout Software Hostnames, DNS, and Identification Users, Groups, and Privileges

Kernel Tuning

vm.swappiness vm.overcommit_memory

Disk Configuration

Choosing a Filesystem

ext3 ext4 xfs

Mount Options

Network Design

Network Usage in Hadoop: A Review

HDFS MapReduce

1 Gb versus 10 Gb Networks Typical Network Topologies

Traditional tree Spine fabric

5. Installation and Configuration

Installing Hadoop

Apache Hadoop

Tarball installation Package installation

CDH

Configuration: An Overview

The Hadoop XML Configuration Files

Environment Variables and Shell Scripts Logging Configuration HDFS

Identification and Location Optimization and Tuning Formatting the Namenode Creating a /tmp Directory

Namenode High Availability

Fencing Options Basic Configuration Automatic Failover Configuration

Initializing ZooKeeper State

Format and Bootstrap the Namenodes

Namenode Federation MapReduce

Identification and Location Optimization and Tuning

Rack Topology Security

6. Identity, Authentication, and Authorization

Identity Kerberos and Hadoop

Kerberos: A Refresher Kerberos Support in Hadoop

Configuring Hadoop security

Authorization

HDFS MapReduce Other Tools and Systems

Apache Hive Apache HBase Apache Oozie Hue Apache Sqoop Apache Flume Apache ZooKeeper Apache Pig, Cascading, and Crunch

Tying It Together

7. Resource Management

What Is Resource Management? HDFS Quotas MapReduce Schedulers

The FIFO Scheduler

Configuration

The Fair Scheduler

Configuration

The Capacity Scheduler

Configuration

The Future

8. Cluster Maintenance

Managing Hadoop Processes

Starting and Stopping Processes with Init Scripts Starting and Stopping Processes Manually

HDFS Maintenance Tasks

Adding a Datanode Decommissioning a Datanode Checking Filesystem Integrity with fsck Balancing HDFS Block Data Dealing with a Failed Disk

MapReduce Maintenance Tasks

Adding a Tasktracker Decommissioning a Tasktracker Killing a MapReduce Job Killing a MapReduce Task Dealing with a Blacklisted Tasktracker

9. Troubleshooting

Differential Diagnosis Applied to Systems Common Failures and Problems

Humans (You) Misconfiguration Hardware Failure Resource Exhaustion Host Identification and Naming Network Partitions

“Is the Computer Plugged In?”

E-SPORE

Treatment and Care War Stories

A Mystery Bottleneck There’s No Place Like 127.0.0.1

10. Monitoring

An Overview Hadoop Metrics

Apache Hadoop 0.20.0 and CDH3 (metrics1)

JMX Support REST Interface

Using the metrics servlet Using the JMX JSON servlet

Apache Hadoop 0.20.203 and Later, and CDH4 (metrics2) What about SNMP?

Health Monitoring

Host-Level Checks All Hadoop Processes HDFS Checks MapReduce Checks

11. Backup and Recovery

Data Backup

Distributed Copy (distcp) Parallel Data Ingestion

Namenode Metadata

A. Deprecated Configuration Properties Index About the Author Colophon SPECIAL OFFER: Upgrade this ebook with O’Reilly Copyright
