Mastering Apache Storm by Jain, Ankit

Index

Preface

What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support

Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions

Real-Time Processing and Storm Introduction

Apache Storm
Features of Storm
Storm components

Nimbus
Supervisor nodes
The ZooKeeper cluster

The Storm data model

Definition of a Storm topology
Operation modes in Storm

Programming languages
Summary

Storm Deployment, Topology Development, and Topology Options

Storm prerequisites

Installing Java SDK 7
Deployment of the ZooKeeper cluster

Setting up the Storm cluster
Developing the hello world example
The different options of the Storm topology

Deactivate
Activate
Rebalance
Kill
Dynamic log level settings

Walkthrough of the Storm UI

Cluster Summary section
Nimbus Summary section
Supervisor Summary section
Nimbus Configuration section
Topology Summary section

Dynamic log level settings

Updating the log level from the Storm UI
Updating the log level from the Storm CLI

Summary

Storm Parallelism and Data Partitioning

Parallelism of a topology

Worker process
Executor
Task
Configure parallelism at the code level
Worker process, executor, and task distribution

Rebalance the parallelism of a topology

Rebalance the parallelism of a SampleStormClusterTopology topology

Different types of stream grouping in the Storm cluster

Shuffle grouping
Field grouping
All grouping
Global grouping
Direct grouping
Local or shuffle grouping
None grouping
Custom grouping

Guaranteed message processing
Tick tuple
Summary

Trident Introduction

Trident introduction
Understanding Trident's data model
Writing Trident functions, filters, and projections

Trident function
Trident filter
Trident projection

Trident repartitioning operations

Utilizing shuffle operation
Utilizing partitionBy operation
Utilizing global operation
Utilizing broadcast operation
Utilizing batchGlobal operation
Utilizing partition operation

Trident aggregator

partitionAggregate
aggregate

ReducerAggregator
Aggregator
CombinerAggregator

persistentAggregate
Aggregator chaining

Utilizing the groupBy operation
When to use Trident
Summary

Trident Topology and Uses

Trident groupBy operation

groupBy before partitionAggregate
groupBy before aggregate

Non-transactional topology
Trident hello world topology
Trident state
Distributed RPC
When to use Trident
Summary

Storm Scheduler

Introduction to Storm scheduler
Default scheduler
Isolation scheduler
Resource-aware scheduler

Component-level configuration
Memory usage example
CPU usage example
Worker-level configuration
Node-level configuration
Global component configuration

Custom scheduler

Configuration changes in the supervisor node
Configuration setting at component level
Writing a custom supervisor class
Converting component IDs to executors
Converting supervisors to slots
Registering a CustomScheduler class

Summary

Monitoring of Storm Cluster

Cluster statistics using the Nimbus thrift client

Fetching information with Nimbus thrift

Monitoring the Storm cluster using JMX
Monitoring the Storm cluster using Ganglia
Summary

Integration of Storm and Kafka

Introduction to Kafka
Kafka architecture

Producer
Replication
Consumer
Broker
Data retention

Installation of Kafka brokers

Setting up a single node Kafka cluster
Setting up a three node Kafka cluster

Multiple Kafka brokers on a single node

Share ZooKeeper between Storm and Kafka
Kafka producers and publishing data into Kafka
Kafka Storm integration
Deploy the Kafka topology on Storm cluster
Summary

Storm and Hadoop Integration

Introduction to Hadoop

Hadoop Common
Hadoop Distributed File System

Namenode
Datanode
HDFS client
Secondary namenode

YARN

ResourceManager (RM)
NodeManager (NM)
ApplicationMaster (AM)

Installation of Hadoop

Setting passwordless SSH
Getting the Hadoop bundle and setting up environment variables
Setting up HDFS
Setting up YARN

Write Storm topology to persist data into HDFS
Integration of Storm with Hadoop
Setting up Storm-YARN
Storm-Starter topologies on Storm-YARN
Summary

Storm Integration with Redis, Elasticsearch, and HBase

Integrating Storm with HBase
Integrating Storm with Redis
Integrating Storm with Elasticsearch
Integrating Storm with Esper
Summary

Apache Log Processing with Storm

Apache log processing elements
Producing Apache log in Kafka using Logstash

Installation of Logstash

What is Logstash?
Why are we using Logstash?
Installation of Logstash
Configuration of Logstash

Why are we using Kafka between Logstash and Storm?

Splitting the Apache log line
Identifying country, operating system type, and browser type from the log file
Calculate the search keyword
Persisting the process data
Kafka spout and define topology
Deploy topology
MySQL queries

Calculate the page hit from each country
Calculate the count for each browser
Calculate the count for each operating system

Summary

Twitter Tweet Collection and Machine Learning

Exploring machine learning
Twitter sentiment analysis

Using Kafka producer to store the tweets in a Kafka cluster

Kafka spout, sentiments bolt, and HDFS bolt
Summary
