Hadoop in 24 Hours, Sams Teach Yourself, by Jeffrey Aven

Index

About This E-Book
Title Page
Copyright Page
Contents at a Glance
Table of Contents
Preface
About the Author
Acknowledgments

Part I: Getting Started with Hadoop

Hour 1: Introducing Hadoop

Hadoop and a Brief History of Big Data
Hadoop Explained
The Commercial Hadoop Landscape
Typical Hadoop Use Cases
Summary
Q&A
Workshop

Hour 2: Understanding the Hadoop Cluster Architecture

HDFS Cluster Processes
YARN Cluster Processes
Hadoop Cluster Architecture and Deployment Modes
Summary
Q&A
Workshop

Hour 3: Deploying Hadoop

Installation Platforms and Prerequisites
Installing Hadoop
Deploying Hadoop in the Cloud
Summary
Q&A
Workshop

Hour 4: Understanding the Hadoop Distributed File System (HDFS)

HDFS Overview
Review of the HDFS Roles
NameNode Metadata
SecondaryNameNode Role
Interacting with HDFS
Summary
Q&A
Workshop

Hour 5: Getting Data into Hadoop

Data Ingestion Using Apache Flume
Ingesting Data from a Database using Sqoop
Data Ingestion Using HDFS RESTful Interfaces
Data Ingestion Considerations
Summary
Q&A
Workshop

Hour 6: Understanding Data Processing in Hadoop

Introduction to MapReduce
MapReduce Explained
Word Count: The “Hello, World” of MapReduce
MapReduce in Hadoop
Summary
Q&A
Workshop

Part II: Using Hadoop

Hour 7: Programming MapReduce Applications

Introducing the Java MapReduce API
Writing a MapReduce Program in Java
Advanced MapReduce API Concepts
Using the MapReduce Streaming API
Summary
Q&A
Workshop

Hour 8: Analyzing Data in HDFS Using Apache Pig

Introducing Pig
Pig Latin Basics
Loading Data into Pig
Filtering, Projecting, and Sorting Data using Pig
Built-in Functions in Pig
Summary
Q&A
Workshop

Hour 9: Using Advanced Pig

Grouping Data in Pig
Multiple Dataset Processing in Pig
User-Defined Functions in Pig
Automating Pig Using Macros and Variables
Summary
Q&A
Workshop

Hour 10: Analyzing Data Using Apache Hive

Introducing Hive
Creating Hive Objects
Analyzing Data with Hive
Data Output with Hive
Summary
Q&A
Workshop

Hour 11: Using Advanced Hive

Automating Hive
Complex Datatypes in Hive
Text Processing Using Hive
Optimizing and Managing Queries in Hive
Summary
Q&A
Workshop

Hour 12: Using SQL-on-Hadoop Solutions

What Is SQL on Hadoop?
Columnar Storage in Hadoop
Introduction to Impala
Introduction to Tez
Introduction to HAWQ and Drill
Summary
Q&A
Workshop

Hour 13: Introducing Apache Spark

Introducing Spark
Spark Architecture
Resilient Distributed Datasets in Spark
Transformations and Actions in Spark
Extensions to Spark
Summary
Q&A
Workshop

Hour 14: Using the Hadoop User Environment (HUE)

Introducing HUE
Installing, Configuring and Using HUE
Summary
Q&A
Workshop

Hour 15: Introducing NoSQL

Introduction to NoSQL
Introducing HBase
Introducing Apache Cassandra
Other NoSQL Implementations and the Future of NoSQL
Summary
Q&A
Workshop

Part III: Managing Hadoop

Hour 16: Managing YARN

YARN Revisited
Administering YARN
Application Scheduling in YARN
Summary
Q&A
Workshop

Hour 17: Working with the Hadoop Ecosystem

Hadoop Ecosystem Overview
Introduction to Oozie
Stream Processing and Messaging in Hadoop
Infrastructure and Security Projects
Machine Learning, Visualization, and More Data Analysis Tools
Summary
Q&A
Workshop

Hour 18: Using Cluster Management Utilities

Cluster Management Overview
Deploying Clusters and Services Using Management Tools
Configuration and Service Management Using Management Tools
Monitoring, Troubleshooting, and Securing Hadoop Clusters Using Cluster Management Utilities
Getting Started with the Cluster Management Utilities
Summary
Q&A
Workshop

Hour 19: Scaling Hadoop

Linear Scalability with Hadoop
Adding Nodes to your Hadoop Cluster
Decommissioning Nodes from your Cluster
Rebalancing a Hadoop Cluster
Benchmarking Hadoop
Summary
Q&A
Workshop

Hour 20: Understanding Cluster Configuration

Configuration in Hadoop
HDFS Configuration Parameters
YARN Configuration Parameters
Ecosystem Component Configuration
Summary
Q&A
Workshop

Hour 21: Understanding Advanced HDFS

HDFS Rack Awareness
HDFS High Availability
HDFS Federation
HDFS Caching, Snapshotting, and Archiving
Summary
Q&A
Workshop

Hour 22: Securing Hadoop

Hadoop Security Basics
Securing Hadoop with Kerberos
Perimeter Security Using Apache Knox
Role-Based Access Control Using Ranger and Sentry
Summary
Q&A
Workshop

Hour 23: Administering, Monitoring and Troubleshooting Hadoop

Administering Hadoop
Troubleshooting Hadoop
System and Application Monitoring in Hadoop
Best Practices and Other Information Sources
Summary
Q&A
Workshop

Hour 24: Integrating Hadoop into the Enterprise

Hadoop and the Data Center
Use Case: Data Warehouse/ETL Offload
Use Case: Event Storage and Processing
Use Case: Predictive Analytics
Summary
Q&A
Workshop

Index
Code Snippets
