Index
About This E-Book
Title Page
Copyright Page
Contents at a Glance
Table of Contents
Preface
About the Author
Acknowledgments
Part I: Getting Started with Hadoop
Hour 1: Introducing Hadoop
Hadoop and a Brief History of Big Data
Hadoop Explained
The Commercial Hadoop Landscape
Typical Hadoop Use Cases
Summary
Q&A
Workshop
Hour 2: Understanding the Hadoop Cluster Architecture
HDFS Cluster Processes
YARN Cluster Processes
Hadoop Cluster Architecture and Deployment Modes
Summary
Q&A
Workshop
Hour 3: Deploying Hadoop
Installation Platforms and Prerequisites
Installing Hadoop
Deploying Hadoop in the Cloud
Summary
Q&A
Workshop
Hour 4: Understanding the Hadoop Distributed File System (HDFS)
HDFS Overview
Review of the HDFS Roles
NameNode Metadata
SecondaryNameNode Role
Interacting with HDFS
Summary
Q&A
Workshop
Hour 5: Getting Data into Hadoop
Data Ingestion Using Apache Flume
Ingesting Data from a Database Using Sqoop
Data Ingestion Using HDFS RESTful Interfaces
Data Ingestion Considerations
Summary
Q&A
Workshop
Hour 6: Understanding Data Processing in Hadoop
Introduction to MapReduce
MapReduce Explained
Word Count: The “Hello, World” of MapReduce
MapReduce in Hadoop
Summary
Q&A
Workshop
Part II: Using Hadoop
Hour 7: Programming MapReduce Applications
Introducing the Java MapReduce API
Writing a MapReduce Program in Java
Advanced MapReduce API Concepts
Using the MapReduce Streaming API
Summary
Q&A
Workshop
Hour 8: Analyzing Data in HDFS Using Apache Pig
Introducing Pig
Pig Latin Basics
Loading Data into Pig
Filtering, Projecting, and Sorting Data Using Pig
Built-in Functions in Pig
Summary
Q&A
Workshop
Hour 9: Using Advanced Pig
Grouping Data in Pig
Multiple Dataset Processing in Pig
User-Defined Functions in Pig
Automating Pig Using Macros and Variables
Summary
Q&A
Workshop
Hour 10: Analyzing Data Using Apache Hive
Introducing Hive
Creating Hive Objects
Analyzing Data with Hive
Data Output with Hive
Summary
Q&A
Workshop
Hour 11: Using Advanced Hive
Automating Hive
Complex Datatypes in Hive
Text Processing Using Hive
Optimizing and Managing Queries in Hive
Summary
Q&A
Workshop
Hour 12: Using SQL-on-Hadoop Solutions
What Is SQL on Hadoop?
Columnar Storage in Hadoop
Introduction to Impala
Introduction to Tez
Introduction to HAWQ and Drill
Summary
Q&A
Workshop
Hour 13: Introducing Apache Spark
Introducing Spark
Spark Architecture
Resilient Distributed Datasets in Spark
Transformations and Actions in Spark
Extensions to Spark
Summary
Q&A
Workshop
Hour 14: Using the Hadoop User Environment (HUE)
Introducing HUE
Installing, Configuring and Using HUE
Summary
Q&A
Workshop
Hour 15: Introducing NoSQL
Introduction to NoSQL
Introducing HBase
Introducing Apache Cassandra
Other NoSQL Implementations and the Future of NoSQL
Summary
Q&A
Workshop
Part III: Managing Hadoop
Hour 16: Managing YARN
YARN Revisited
Administering YARN
Application Scheduling in YARN
Summary
Q&A
Workshop
Hour 17: Working with the Hadoop Ecosystem
Hadoop Ecosystem Overview
Introduction to Oozie
Stream Processing and Messaging in Hadoop
Infrastructure and Security Projects
Machine Learning, Visualization, and More Data Analysis Tools
Summary
Q&A
Workshop
Hour 18: Using Cluster Management Utilities
Cluster Management Overview
Deploying Clusters and Services Using Management Tools
Configuration and Service Management Using Management Tools
Monitoring, Troubleshooting, and Securing Hadoop Clusters Using Cluster Management Utilities
Getting Started with the Cluster Management Utilities
Summary
Q&A
Workshop
Hour 19: Scaling Hadoop
Linear Scalability with Hadoop
Adding Nodes to Your Hadoop Cluster
Decommissioning Nodes from Your Cluster
Rebalancing a Hadoop Cluster
Benchmarking Hadoop
Summary
Q&A
Workshop
Hour 20: Understanding Cluster Configuration
Configuration in Hadoop
HDFS Configuration Parameters
YARN Configuration Parameters
Ecosystem Component Configuration
Summary
Q&A
Workshop
Hour 21: Understanding Advanced HDFS
HDFS Rack Awareness
HDFS High Availability
HDFS Federation
HDFS Caching, Snapshotting, and Archiving
Summary
Q&A
Workshop
Hour 22: Securing Hadoop
Hadoop Security Basics
Securing Hadoop with Kerberos
Perimeter Security Using Apache Knox
Role-Based Access Control Using Ranger and Sentry
Summary
Q&A
Workshop
Hour 23: Administering, Monitoring and Troubleshooting Hadoop
Administering Hadoop
Troubleshooting Hadoop
System and Application Monitoring in Hadoop
Best Practices and Other Information Sources
Summary
Q&A
Workshop
Hour 24: Integrating Hadoop into the Enterprise
Hadoop and the Data Center
Use Case: Data Warehouse/ETL Offload
Use Case: Event Storage and Processing
Use Case: Predictive Analytics
Summary
Q&A
Workshop
Index
Code Snippets