Index
Storm Blueprints: Patterns for Distributed Real-time Computation
Table of Contents; Storm Blueprints: Patterns for Distributed Real-time Computation; Credits; About the Authors; About the Reviewers; www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?; Free Access for Packt account holders
Preface
What this book covers; What you need for this book; Who this book is for; Conventions; Reader feedback; Customer support
Downloading the example code; Errata; Piracy; Questions
1. Distributed Word Count
Introducing elements of a Storm topology – streams, spouts, and bolts
Streams; Spouts; Bolts
Introducing the word count topology data flow
Sentence spout
Introducing the split sentence bolt; Introducing the word count bolt; Introducing the report bolt
Implementing the word count topology
Setting up a development environment; Implementing the sentence spout; Implementing the split sentence bolt; Implementing the word count bolt; Implementing the report bolt; Implementing the word count topology
Introducing parallelism in Storm
WordCountTopology parallelism
Adding workers to a topology; Configuring executors and tasks
Understanding stream groupings; Guaranteed processing
Reliability in spouts; Reliability in bolts; Reliable word count
Summary
2. Configuring Storm Clusters
Introducing the anatomy of a Storm cluster
Understanding the nimbus daemon; Working with the supervisor daemon; Introducing Apache ZooKeeper; Working with Storm's DRPC server; Introducing the Storm UI
Introducing the Storm technology stack
Java and Clojure; Python
Installing Storm on Linux
Installing the base operating system; Installing Java; ZooKeeper installation; Storm installation; Running the Storm daemons; Configuring Storm; Mandatory settings; Optional settings; The Storm executable; Setting up the Storm executable on a workstation; The daemon commands
Nimbus; Supervisor; UI; DRPC
The management commands
Jar; Kill; Deactivate; Activate; Rebalance; Remoteconfvalue
Local debug/development commands
REPL; Classpath; Localconfvalue
Submitting topologies to a Storm cluster; Automating the cluster configuration; A rapid introduction to Puppet
Puppet manifests; Puppet classes and modules; Puppet templates; Managing environments with Puppet Hiera; Introducing Hiera
Summary
3. Trident Topologies and Sensor Data
Examining our use case; Introducing Trident topologies; Introducing Trident spouts; Introducing Trident operations – filters and functions
Introducing Trident filters; Introducing Trident functions
Introducing Trident aggregators – Combiners and Reducers
CombinerAggregator; ReducerAggregator; Aggregator
Introducing the Trident state
The Repeat Transactional state; The Opaque state
Executing the topology; Summary
4. Real-time Trend Analysis
Use case; Architecture
The source application; The logback Kafka appender; Apache Kafka; Kafka spout; The XMPP server
Installing the required software
Installing Kafka; Installing OpenFire
Introducing the sample application
Sending log messages to Kafka
Introducing the log analysis topology
Kafka spout; The JSON project function; Calculating a moving average; Adding a sliding window; Implementing the moving average function; Filtering on thresholds; Sending notifications with XMPP
The final topology; Running the log analysis topology; Summary
5. Real-time Graph Analysis
Use case; Architecture
The Twitter client; Kafka spout; A Titan distributed graph database
A brief introduction to graph databases
Accessing the graph – the TinkerPop stack; Manipulating the graph with the Blueprints API; Manipulating the graph with the Gremlin shell
Software installation
Titan installation
Setting up Titan to use the Cassandra storage backend
Installing Cassandra; Starting Titan with the Cassandra backend
Graph data model; Connecting to the Twitter stream
Setting up the Twitter4J client; The OAuth configuration
The TwitterStreamConsumer class; The TwitterStatusListener class
Twitter graph topology
The JSONProjectFunction class
Implementing GraphState
GraphFactory; GraphTupleProcessor; GraphStateFactory; GraphState; GraphUpdater
Implementing GraphFactory; Implementing GraphTupleProcessor; Putting it all together – the TwitterGraphTopology class
The TwitterGraphTopology class
Querying the graph with Gremlin; Summary
6. Artificial Intelligence
Designing for our use case; Establishing the architecture
Examining the design challenges; Implementing the recursion
Accessing the function's return values; Immutable tuple field values; Upfront field declaration; Tuple acknowledgement in recursion; Output to multiple streams; Read-before-write
Solving the challenges
Implementing the architecture
The data model; Examining the recursive topology; The queue interaction; Functions and filters; Examining the Scoring Topology
Addressing read-before-write
Distributed locking; Retry when stale; Executing the topology
Enumerating the game tree
Distributed Remote Procedure Call (DRPC)
Remote deployment
Summary
7. Integrating Druid for Financial Analytics
Use case; Integrating a non-transactional system; The topology
The spout; The filter; The state design
Implementing the architecture
DruidState; Implementing the StormFirehose object; Implementing the partition status in ZooKeeper
Executing the implementation; Examining the analytics; Summary
8. Natural Language Processing
Motivating a Lambda architecture; Examining our use case; Realizing a Lambda architecture; Designing the topology for our use case; Implementing the design
TwitterSpout/TweetEmitter; Functions
TweetSplitterFunction; WordFrequencyFunction; PersistenceFunction
Examining the analytics; Batch processing / historical analysis; Hadoop
An overview of MapReduce; The Druid setup
HadoopDruidIndexer
Summary
9. Deploying Storm on Hadoop for Advertising Analysis
Examining the use case; Establishing the architecture
Examining HDFS; Examining YARN
Configuring the infrastructure
The Hadoop infrastructure; Configuring HDFS
Configuring the NameNode; Configuring the DataNode; Configuring YARN
Configuring the ResourceManager
Configuring the NodeManager
Deploying the analytics
Performing a batch analysis with the Pig infrastructure; Performing a real-time analysis with the Storm-YARN infrastructure
Performing the analytics
Executing the batch analysis; Executing real-time analysis
Deploying the topology; Executing the topology; Summary
10. Storm in the Cloud
Introducing Amazon Elastic Compute Cloud (EC2)
Setting up an AWS account; The AWS Management Console
Creating an SSH key pair
Launching an EC2 instance manually
Logging in to the EC2 instance
Introducing Apache Whirr
Installing Whirr
Configuring a Storm cluster with Whirr
Launching the cluster
Introducing Whirr Storm
Setting up Whirr Storm
Cluster configuration; Customizing Storm's configuration; Customizing firewall rules
Introducing Vagrant
Installing Vagrant; Launching your first virtual machine
The Vagrantfile and shared filesystem; Vagrant provisioning; Configuring multimachine clusters with Vagrant
Creating Storm-provisioning scripts
ZooKeeper; Storm; Supervisord
The Storm Vagrantfile; Launching the Storm cluster
Summary
Index