Index
HDInsight Essentials
Table of Contents
HDInsight Essentials
Credits
About the Author
About the Reviewers
www.PacktPub.com
  Support files, eBooks, discount offers and more
    Why Subscribe?
    Free Access for Packt account holders
Preface
  What this book covers
  What you need for this book
  Who this book is for
  Conventions
  Reader feedback
  Customer support
    Downloading the example code
    Errata
    Piracy
    Questions
1. Hadoop and HDInsight in a Heartbeat
  Big Data – hype or real?
  Apache Hadoop concepts
    Core components
    Hadoop cluster layout
    The Hadoop ecosystem
      Data access
      Data processing
      The Hadoop data store
      Management and integration
  Hadoop distributions
  HDInsight distribution differentiator
  End-to-end solution using HDInsight
    Key phases of a Hadoop project
      Stage 1 – collect data
      Stage 2a – process your data (build MapReduce)
      Stage 2b – process your data (execute MapReduce)
      Stage 3 – analyze data using JavaScript and Pig
      Stage 4 – report data using JavaScript charts
  Summary
2. Deploying HDInsight on Premise
  HDInsight and Hadoop relationship
  Deployment options for on-premise
    Windows HDInsight server
    Hortonworks Data Platform (HDP for Windows)
    Supported platforms for on-premise install
  Single-node install
    Downloading the software
    Running the install wizard
    Validating the install
  Multinode planning and preparation
    Setting up the network
    Setting common time on all nodes
    Setting up remote scripting
    Configuring firewall ports
  Multinode installation
    Downloading the software
    Configuring the multinode install
    Running the installer
    Validating the install
  Managing HDInsight services
  Uninstalling HDInsight
  Summary
3. HDInsight Azure Cloud Service
  HDInsight Service on Azure
    Considerations for Azure HDInsight Service
  Provision your cluster
  HDInsight management dashboard
  Verify the cluster and run sample jobs
    Access HDFS
    Deploy and execute the sample MapReduce job
    View job results
  Monitor your cluster
  Azure storage integration
  Remove your cluster
    Delete your cluster
    Delete your storage
    Restore your cluster
  Summary
4. Administering Your HDInsight Cluster
  Cluster status
  Distributed filesystem health
    NameNode URL
    Browsing HDFS
  MapReduce health
    MapReduce summary
    MapReduce Job History
  Key files
    Backing up NameNode content
  Summary
5. Ingesting Data to Your Cluster
  Loading data using Hadoop commands
    Step 1 – connect to a Hadoop client
    Step 2 – get your files on local storage
    Step 3 – upload to HDFS
  Loading data using Azure Storage Vault (ASV)
    Storage access keys
    Storage tools
    Azure Storage Explorer
      Registering your storage account
      Uploading files to your blob storage
  Loading data using interactive JavaScript
  Shipping data to Azure
  Loading data using Sqoop
    Key benefits
    Two modes of using Sqoop
    Using Sqoop to import (SQL to Hadoop)
  Summary
6. Transforming Data in Cluster
  Transformation scenario
    Scenario
    Transformation objective
    File organization
  MapReduce solution
    Design
    Map code
    Reduce code
    Driver code
    Compiling and packaging the code
    Executing MapReduce
    Results verification
  Hive solution
    Overview of Hive
    Starting Hive in the HDInsight node
    Step 1 – table creation
    Step 2 – table loading
    Step 3 – summary table creation
    Step 4 – verifying the summary table
  Pig solution
    Pig architecture
    Pig or Hive?
    Starting Pig in the HDInsight node
    Pig Grunt script
      Code
      Code explanation
      Execution
      Verification
  Summary
7. Analyzing and Reporting Your Data
  Analyzing and reporting using Excel
    Step 1 – installing the Hive ODBC driver
    Step 2 – creating Hive ODBC data source
    Step 3 – importing data to Excel
  Hive for ad hoc queries
    Creating reference tables
    Ad hoc queries
    Analytic functions in HiveQL
  Interactive JavaScript for analysis and reporting
  Other business intelligence tools
  Summary
8. Project Planning Tips and Resources
  Architectural considerations
    Extensible and modular
    Metadata-driven solution
    Integration strategy
    Security
  Project planning
    Proof of Concept
    Production implementation
    Reference sites and blogs
  Summary
Index