Index
Cassandra: The Definitive Guide
Dedication
SPECIAL OFFER: Upgrade this ebook with O’Reilly
A Note Regarding Supplemental Files
Foreword
Preface
Why Apache Cassandra?
Is This Book for You?
What’s in This Book?
Finding Out More
Conventions Used in This Book
Using Code Examples
Safari® Enabled
How to Contact Us
Acknowledgments
1. Introducing Cassandra
What’s Wrong with Relational Databases?
A Quick Review of Relational Databases
RDBMS: The Awesome and the Not-So-Much
Transactions, ACID-ity, and two-phase commit
Schema
Sharding and shared-nothing architecture
Summary
Web Scale
The Cassandra Elevator Pitch
Cassandra in 50 Words or Less
Distributed and Decentralized
Elastic Scalability
High Availability and Fault Tolerance
Tuneable Consistency
Brewer’s CAP Theorem
Row-Oriented
Schema-Free
High Performance
Where Did Cassandra Come From?
Use Cases for Cassandra
Large Deployments
Lots of Writes, Statistics, and Analysis
Geographical Distribution
Evolving Applications
Who Is Using Cassandra?
Summary
2. Installing Cassandra
Installing the Binary
Extracting the Download
What’s In There?
Building from Source
Additional Build Targets
Building with Maven
Running Cassandra
On Windows
On Linux
Starting the Server
Running the Command-Line Client Interface
Basic CLI Commands
Help
Connecting to a Server
Describing the Environment
Creating a Keyspace and Column Family
Writing and Reading Data
Summary
3. The Cassandra Data Model
The Relational Data Model
A Simple Introduction
Clusters
Keyspaces
Column Families
Column Family Options
Columns
Wide Rows, Skinny Rows
Column Sorting
Super Columns
Composite Keys
Design Differences Between RDBMS and Cassandra
No Query Language
No Referential Integrity
Secondary Indexes
Sorting Is a Design Decision
Denormalization
Design Patterns
Materialized View
Valueless Column
Aggregate Key
Some Things to Keep in Mind
Summary
4. Sample Application
Data Design
Hotel App RDBMS Design
Hotel App Cassandra Design
Hotel Application Code
Creating the Database
Loading the schema
Data Structures
Getting a Connection
Prepopulating the Database
The Search Application
Twissandra
Summary
5. The Cassandra Architecture
System Keyspace
Peer-to-Peer
Gossip and Failure Detection
Anti-Entropy and Read Repair
Memtables, SSTables, and Commit Logs
Hinted Handoff
Compaction
Bloom Filters
Tombstones
Staged Event-Driven Architecture
Managers and Services
Cassandra Daemon
Storage Service
Messaging Service
Hinted Handoff Manager
Summary
6. Configuring Cassandra
Keyspaces
Creating a Column Family
Transitioning from 0.6 to 0.7
Replicas
Replica Placement Strategies
Simple Strategy
Old Network Topology Strategy
Network Topology Strategy
Replication Factor
Increasing the Replication Factor
Partitioners
Random Partitioner
Order-Preserving Partitioner
Collating Order-Preserving Partitioner
Byte-Ordered Partitioner
Snitches
Simple Snitch
PropertyFileSnitch
Creating a Cluster
Changing the Cluster Name
Adding Nodes to a Cluster
Multiple Seed Nodes
Dynamic Ring Participation
Security
Using SimpleAuthenticator
Programmatic Authentication
Using MD5 Encryption
Providing Your Own Authentication
Miscellaneous Settings
Additional Tools
Viewing Keys
Importing Previous Configurations
Summary
7. Reading and Writing Data
Query Differences Between RDBMS and Cassandra
No Update Query
Record-Level Atomicity on Writes
No Server-Side Transaction Support
No Duplicate Keys
Basic Write Properties
Consistency Levels
Basic Read Properties
The API
Ranges and Slices
Setup and Inserting Data
Using a Simple Get
Seeding Some Values
Slice Predicate
Getting Particular Column Names with Get Slice
Getting a Set of Columns with Slice Range
Counts
Reversed
Getting All Columns in a Row
Get Range Slices
Multiget Slice
Deleting
Batch Mutates
Batch Deletes
Range Ghosts
Programmatically Defining Keyspaces and Column Families
Summary
8. Clients
Basic Client API
Thrift
Thrift Support for Java
Exceptions
Thrift Summary
Avro
Avro Ant Targets
Avro Specification
Avro Summary
A Bit of Git
Connecting Client Nodes
Client List
Round-Robin DNS
Load Balancer
Cassandra Web Console
Hector
Features
The Hector API
HectorSharp
Chirper
Chiton
Pelops
Kundera
Fauna
Summary
9. Monitoring
Logging
Tailing
General Tips
Following along
Warning signs
Overview of JMX and MBeans
MBeans
Integrating JMX
Interacting with Cassandra via JMX
Cassandra’s MBeans
org.apache.cassandra.concurrent
org.apache.cassandra.db
org.apache.cassandra.gms
org.apache.cassandra.service
StorageService
StreamingService
Custom Cassandra MBeans
Runtime Analysis Tools
Heap Analysis with JMX and JHAT
Detecting Thread Problems
Health Check
Summary
10. Maintenance
Getting Ring Information
Info
Ring
Range Tokens
Getting Statistics
Using cfstats
Using tpstats
Basic Maintenance
Repair
Flush
Cleanup
Snapshots
Taking a Snapshot
Clearing a Snapshot
Load-Balancing the Cluster
loadbalance and streams
Decommissioning a Node
Updating Nodes
Removing Tokens
Compaction Threshold
Changing Column Families in a Working Cluster
Summary
11. Performance Tuning
Data Storage
Reply Timeout
Commit Logs
Memtables
Concurrency
Caching
Buffer Sizes
Using the Python Stress Test
Generating the Python Thrift Interfaces
Getting Thrift
Running the Python Stress Test
Startup and JVM Settings
Tuning the JVM
Summary
12. Integrating Hadoop
What Is Hadoop?
Working with MapReduce
Cassandra Hadoop Source Package
Running the Word Count Example
Outputting Data to Cassandra
Hadoop Streaming
Tools Above MapReduce
Pig
Hive
Cluster Configuration
Use Cases
Raptr.com: Keith Thornhill
Imagini: Dave Gardner
Summary
A. The Nonrelational Landscape
Nonrelational Databases
Object Databases
XML Databases
SoftwareAG Tamino
eXist
Oracle Berkeley XML DB
MarkLogic Server
Apache Xindice
Summary
Document-Oriented Databases
IBM Lotus
Apache CouchDB
MongoDB
Riak
Graph Databases
FlockDB
Neo4J
Key-Value Stores and Distributed Hashtables
Amazon Dynamo
Project Voldemort
Redis
Columnar Databases
Google Bigtable
HBase
Hypertable
Polyglot Persistence
Summary
Glossary
Index
About the Author
Colophon
SPECIAL OFFER: Upgrade this ebook with O’Reilly
Copyright