HDInsight Essentials
Table of Contents
HDInsight Essentials
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Hadoop and HDInsight in a Heartbeat
Big Data – hype or real?
Apache Hadoop concepts
Core components
Hadoop cluster layout
The Hadoop ecosystem
Data access
Data processing
The Hadoop data store
Management and integration
Hadoop distributions
HDInsight distribution differentiator
End-to-end solution using HDInsight
Key phases of a Hadoop project
Stage 1 – collect data
Stage 2a – process your data (build MapReduce)
Stage 2b – process your data (execute MapReduce)
Stage 3 – analyze data using JavaScript and Pig
Stage 4 – report data using JavaScript charts
Summary
2. Deploying HDInsight on Premise
HDInsight and Hadoop relationship
Deployment options for on-premise
Windows HDInsight server
Hortonworks Data Platform (HDP for Windows)
Supported platforms for on-premise install
Single-node install
Downloading the software
Running the install wizard
Validating the install
Multinode planning and preparation
Setting up the network
Setting common time on all nodes
Setting up remote scripting
Configuring firewall ports
Multinode installation
Downloading the software
Configuring the multinode install
Running the installer
Validating the install
Managing HDInsight services
Uninstalling HDInsight
Summary
3. HDInsight Azure Cloud Service
HDInsight Service on Azure
Considerations for Azure HDInsight Service
Provision your cluster
HDInsight management dashboard
Verify the cluster and run sample jobs
Access HDFS
Deploy and execute the sample MapReduce job
View job results
Monitor your cluster
Azure storage integration
Remove your cluster
Delete your cluster
Delete your storage
Restore your cluster
Summary
4. Administering Your HDInsight Cluster
Cluster status
Distributed filesystem health
NameNode URL
Browsing HDFS
MapReduce health
MapReduce summary
MapReduce Job History
Key files
Backing up NameNode content
Summary
5. Ingesting Data to Your Cluster
Loading data using Hadoop commands
Step 1 – connect to a Hadoop client
Step 2 – get your files on local storage
Step 3 – upload to HDFS
Loading data using Azure Storage Vault (ASV)
Storage access keys
Storage tools
Azure Storage Explorer
Registering your storage account
Uploading files to your blob storage
Loading data using interactive JavaScript
Shipping data to Azure
Loading data using Sqoop
Key benefits
Two modes of using Sqoop
Using Sqoop to import (SQL to Hadoop)
Summary
6. Transforming Data in Cluster
Transformation scenario
Scenario
Transformation objective
File organization
MapReduce solution
Design
Map code
Reduce code
Driver code
Compiling and packaging the code
Executing MapReduce
Results verification
Hive solution
Overview of Hive
Starting Hive in the HDInsight node
Step 1 – table creation
Step 2 – table loading
Step 3 – summary table creation
Step 4 – verifying the summary table
Pig solution
Pig architecture
Pig or Hive?
Starting Pig in the HDInsight node
Pig Grunt script
Code
Code explanation
Execution
Verification
Summary
7. Analyzing and Reporting Your Data
Analyzing and reporting using Excel
Step 1 – installing the Hive ODBC driver
Step 2 – creating Hive ODBC data source
Step 3 – importing data to Excel
Hive for ad hoc queries
Creating reference tables
Ad hoc queries
Analytic functions in HiveQL
Interactive JavaScript for analysis and reporting
Other business intelligence tools
Summary
8. Project Planning Tips and Resources
Architectural considerations
Extensible and modular
Metadata-driven solution
Integration strategy
Security
Project planning
Proof of Concept
Production implementation
Reference sites and blogs
Summary
Index