Imperial Library
Index
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
Real-Time Processing and Storm Introduction
Apache Storm
Features of Storm
Storm components
Nimbus
Supervisor nodes
The ZooKeeper cluster
The Storm data model
Definition of a Storm topology
Operation modes in Storm
Programming languages
Summary
Storm Deployment, Topology Development, and Topology Options
Storm prerequisites
Installing Java SDK 7
Deployment of the ZooKeeper cluster
Setting up the Storm cluster
Developing the hello world example
The different options of the Storm topology
Deactivate
Activate
Rebalance
Kill
Dynamic log level settings
Walkthrough of the Storm UI
Cluster Summary section
Nimbus Summary section
Supervisor Summary section
Nimbus Configuration section
Topology Summary section
Dynamic log level settings
Updating the log level from the Storm UI
Updating the log level from the Storm CLI
Summary
Storm Parallelism and Data Partitioning
Parallelism of a topology
Worker process
Executor
Task
Configure parallelism at the code level
Worker process, executor, and task distribution
Rebalance the parallelism of a topology
Rebalance the parallelism of a SampleStormClusterTopology topology
Different types of stream grouping in the Storm cluster
Shuffle grouping
Field grouping
All grouping
Global grouping
Direct grouping
Local or shuffle grouping
None grouping
Custom grouping
Guaranteed message processing
Tick tuple
Summary
Trident Introduction
Trident introduction
Understanding Trident's data model
Writing Trident functions, filters, and projections
Trident function
Trident filter
Trident projection
Trident repartitioning operations
Utilizing shuffle operation
Utilizing partitionBy operation
Utilizing global operation
Utilizing broadcast operation
Utilizing batchGlobal operation
Utilizing partition operation
Trident aggregator
partitionAggregate
aggregate
ReducerAggregator
Aggregator
CombinerAggregator
persistentAggregate
Aggregator chaining
Utilizing the groupBy operation
When to use Trident
Summary
Trident Topology and Uses
Trident groupBy operation
groupBy before partitionAggregate
groupBy before aggregate
Non-transactional topology
Trident hello world topology
Trident state
Distributed RPC
When to use Trident
Summary
Storm Scheduler
Introduction to Storm scheduler
Default scheduler
Isolation scheduler
Resource-aware scheduler
Component-level configuration
Memory usage example
CPU usage example
Worker-level configuration
Node-level configuration
Global component configuration
Custom scheduler
Configuration changes in the supervisor node
Configuration setting at component level
Writing a custom supervisor class
Converting component IDs to executors
Converting supervisors to slots
Registering a CustomScheduler class
Summary
Monitoring of Storm Cluster
Cluster statistics using the Nimbus thrift client
Fetching information with Nimbus thrift
Monitoring the Storm cluster using JMX
Monitoring the Storm cluster using Ganglia
Summary
Integration of Storm and Kafka
Introduction to Kafka
Kafka architecture
Producer
Replication
Consumer
Broker
Data retention
Installation of Kafka brokers
Setting up a single node Kafka cluster
Setting up a three node Kafka cluster
Multiple Kafka brokers on a single node
Share ZooKeeper between Storm and Kafka
Kafka producers and publishing data into Kafka
Kafka Storm integration
Deploy the Kafka topology on Storm cluster
Summary
Storm and Hadoop Integration
Introduction to Hadoop
Hadoop Common
Hadoop Distributed File System
Namenode
Datanode
HDFS client
Secondary namenode
YARN
ResourceManager (RM)
NodeManager (NM)
ApplicationMaster (AM)
Installation of Hadoop
Setting passwordless SSH
Getting the Hadoop bundle and setting up environment variables
Setting up HDFS
Setting up YARN
Write Storm topology to persist data into HDFS
Integration of Storm with Hadoop
Setting up Storm-YARN
Storm-Starter topologies on Storm-YARN
Summary
Storm Integration with Redis, Elasticsearch, and HBase
Integrating Storm with HBase
Integrating Storm with Redis
Integrating Storm with Elasticsearch
Integrating Storm with Esper
Summary
Apache Log Processing with Storm
Apache log processing elements
Producing Apache log in Kafka using Logstash
Installation of Logstash
What is Logstash?
Why are we using Logstash?
Installation of Logstash
Configuration of Logstash
Why are we using Kafka between Logstash and Storm?
Splitting the Apache log line
Identifying country, operating system type, and browser type from the log file
Calculate the search keyword
Persisting the process data
Kafka spout and define topology
Deploy topology
MySQL queries
Calculate the page hit from each country
Calculate the count for each browser
Calculate the count for each operating system
Summary
Twitter Tweet Collection and Machine Learning
Exploring machine learning
Twitter sentiment analysis
Using Kafka producer to store the tweets in a Kafka cluster
Kafka spout, sentiments bolt, and HDFS bolt
Summary