HDInsight Essentials by Nadipalli, Rajesh

HDInsight Essentials

Table of Contents
HDInsight Essentials
Credits
About the Author
About the Reviewers
www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?
Free Access for Packt account holders

Preface

What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support

Downloading the example code
Errata
Piracy
Questions

1. Hadoop and HDInsight in a Heartbeat

Big Data – hype or real?
Apache Hadoop concepts

Core components
Hadoop cluster layout
The Hadoop ecosystem

Data access
Data processing
The Hadoop data store
Management and integration

Hadoop distributions
HDInsight distribution differentiator
End-to-end solution using HDInsight

Key phases of a Hadoop project

Stage 1 – collect data
Stage 2a – process your data (build MapReduce)
Stage 2b – process your data (execute MapReduce)
Stage 3 – analyze data using JavaScript and Pig
Stage 4 – report data using JavaScript charts

Summary

2. Deploying HDInsight on Premise

HDInsight and Hadoop relationship
Deployment options for on-premise

Windows HDInsight server
Hortonworks Data Platform (HDP for Windows)
Supported platforms for on-premise install

Single-node install

Downloading the software
Running the install wizard
Validating the install

Multinode planning and preparation

Setting up the network
Setting common time on all nodes
Setting up remote scripting
Configuring firewall ports

Multinode installation

Downloading the software
Configuring the multinode install
Running the installer
Validating the install

Managing HDInsight services
Uninstalling HDInsight
Summary

3. HDInsight Azure Cloud Service

HDInsight Service on Azure

Considerations for Azure HDInsight Service

Provision your cluster
HDInsight management dashboard
Verify the cluster and run sample jobs

Access HDFS
Deploy and execute the sample MapReduce job
View job results

Monitor your cluster
Azure storage integration
Remove your cluster

Delete your cluster
Delete your storage
Restore your cluster

Summary

4. Administering Your HDInsight Cluster

Cluster status
Distributed filesystem health

NameNode URL
Browsing HDFS

MapReduce health

MapReduce summary
MapReduce Job History

Key files

Backing up NameNode content

Summary

5. Ingesting Data to Your Cluster

Loading data using Hadoop commands

Step 1 – connect to a Hadoop client
Step 2 – get your files on local storage
Step 3 – upload to HDFS

Loading data using Azure Storage Vault (ASV)

Storage access keys
Storage tools
Azure Storage Explorer

Registering your storage account
Uploading files to your blob storage

Loading data using interactive JavaScript
Shipping data to Azure
Loading data using Sqoop

Key benefits
Two modes of using Sqoop
Using Sqoop to import (SQL to Hadoop)

Summary

6. Transforming Data in Cluster

Transformation scenario

Scenario
Transformation objective
File organization

MapReduce solution

Design
Map code
Reduce code
Driver code
Compiling and packaging the code
Executing MapReduce
Results verification

Hive solution

Overview of Hive
Starting Hive in the HDInsight node
Step 1 – table creation
Step 2 – table loading
Step 3 – summary table creation
Step 4 – verifying the summary table

Pig solution

Pig architecture
Pig or Hive?
Starting Pig in the HDInsight node
Pig Grunt script

Code
Code explanation
Execution
Verification

Summary

7. Analyzing and Reporting Your Data

Analyzing and reporting using Excel

Step 1 – installing the Hive ODBC driver
Step 2 – creating Hive ODBC data source
Step 3 – importing data to Excel

Hive for ad hoc queries

Creating reference tables
Ad hoc queries
Analytic functions in HiveQL

Interactive JavaScript for analysis and reporting
Other business intelligence tools
Summary

8. Project Planning Tips and Resources

Architectural considerations

Extensible and modular
Metadata-driven solution
Integration strategy
Security

Project planning

Proof of Concept
Production implementation
Reference sites and blogs

Summary

Index