Cassandra: The Definitive Guide: Distributed Data at Web Scale by Jeff Carpenter

Index

Foreword
Foreword
Preface

Why Apache Cassandra?
Is This Book for You?
What’s in This Book?

New for the Second Edition

Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgments

1. Beyond Relational Databases

What’s Wrong with Relational Databases?
A Quick Review of Relational Databases

RDBMSs: The Awesome and the Not-So-Much

Transactions, ACID-ity, and two-phase commit
Schema
Sharding and shared-nothing architecture

Web Scale
The Rise of NoSQL
Summary

2. Introducing Cassandra

The Cassandra Elevator Pitch

Cassandra in 50 Words or Less
Distributed and Decentralized
Elastic Scalability
High Availability and Fault Tolerance
Tuneable Consistency
Brewer’s CAP Theorem
Row-Oriented
High Performance

Where Did Cassandra Come From?

Release History

Is Cassandra a Good Fit for My Project?

Large Deployments
Lots of Writes, Statistics, and Analysis
Geographical Distribution
Evolving Applications

Getting Involved
Summary

3. Installing Cassandra

Installing the Apache Distribution

Extracting the Download
What’s In There?

Building from Source

Additional Build Targets

Running Cassandra

On Windows
On Linux
Starting the Server
Stopping Cassandra

Other Cassandra Distributions
Running the CQL Shell
Basic cqlsh Commands

cqlsh Help
Describing the Environment in cqlsh
Creating a Keyspace and Table in cqlsh
Writing and Reading Data in cqlsh

Summary

4. The Cassandra Query Language

The Relational Data Model
Cassandra’s Data Model

Clusters
Keyspaces
Tables
Columns

Timestamps
Time to live (TTL)

CQL Types

Numeric Data Types
Textual Data Types
Time and Identity Data Types
Other Simple Data Types
Collections
User-Defined Types

Secondary Indexes
Summary

5. Data Modeling

Conceptual Data Modeling
RDBMS Design

Design Differences Between RDBMS and Cassandra

No joins
No referential integrity
Denormalization
Query-first design
Designing for optimal storage
Sorting is a design decision

Defining Application Queries
Logical Data Modeling

Hotel Logical Data Model
Reservation Logical Data Model

Physical Data Modeling

Hotel Physical Data Model
Reservation Physical Data Model
Materialized Views

Evaluating and Refining

Calculating Partition Size
Calculating Size on Disk
Breaking Up Large Partitions

Defining Database Schema

DataStax DevCenter

Summary

6. The Cassandra Architecture

Data Centers and Racks
Gossip and Failure Detection
Snitches
Rings and Tokens
Virtual Nodes
Partitioners
Replication Strategies
Consistency Levels
Queries and Coordinator Nodes
Memtables, SSTables, and Commit Logs
Caching
Hinted Handoff
Lightweight Transactions and Paxos
Tombstones
Bloom Filters
Compaction
Anti-Entropy, Repair, and Merkle Trees
Staged Event-Driven Architecture (SEDA)
Managers and Services

Cassandra Daemon
Storage Engine
Storage Service
Storage Proxy
Messaging Service
Stream Manager
CQL Native Transport Server

System Keyspaces
Summary

7. Configuring Cassandra

Cassandra Cluster Manager
Creating a Cluster
Seed Nodes
Partitioners

Murmur3 Partitioner
Random Partitioner
Order-Preserving Partitioner
ByteOrderedPartitioner

Snitches

Simple Snitch
Property File Snitch
Gossiping Property File Snitch
Rack Inferring Snitch
Cloud Snitches
Dynamic Snitch

Node Configuration

Tokens and Virtual Nodes
Network Interfaces
Data Storage
Startup and JVM Settings

Adding Nodes to a Cluster
Dynamic Ring Participation
Replication Strategies

SimpleStrategy
NetworkTopologyStrategy
Changing the Replication Factor

Summary

8. Clients

Hector, Astyanax, and Other Legacy Clients
DataStax Java Driver

Development Environment Configuration
Clusters and Contact Points

Protocol version
Compression
Authentication and encryption

Sessions and Connection Pooling
Statements

Simple statement
Asynchronous execution
Prepared statement
Bound statement
Built statement and the Query Builder
Object mapper

Policies

Load balancing policy
Retry policy
Speculative execution policy
Address translator

Metadata

Node discovery
Schema access

Debugging and Monitoring

Logging
Metrics

DataStax Python Driver
DataStax Node.js Driver
DataStax Ruby Driver
DataStax C# Driver
DataStax C/C++ Driver
DataStax PHP Driver
Summary

9. Reading and Writing Data

Writing

Write Consistency Levels
The Cassandra Write Path
Writing Files to Disk

Commit log files
SSTable files

Lightweight Transactions
Batches

Reading

Read Consistency Levels
The Cassandra Read Path
Read Repair
Range Queries, Ordering and Filtering
Functions and Aggregates

User-defined functions
User-defined aggregates
Built-in functions and aggregates

Paging
Speculative Retry

Deleting
Summary

10. Monitoring

Logging

Tailing
Examining Log Files

Monitoring Cassandra with JMX

Connecting to Cassandra via JConsole
Overview of MBeans

Cassandra’s MBeans

Database MBeans

Storage Service MBean
Storage Proxy MBean
ColumnFamilyStoreMBean
CacheServiceMBean
CommitLogMBean
Compaction Manager MBean
Snitch MBeans
HintedHandoffManagerMBean

Networking MBeans

FailureDetectorMBean
GossiperMBean
StreamManagerMBean

Metrics MBeans
Threading MBeans
Service MBeans
Security MBeans

Monitoring with nodetool

Getting Cluster Information

describecluster
status
info
ring

Getting Statistics

Using tpstats
Using tablestats

Summary

11. Maintenance

Health Check
Basic Maintenance

Flush
Cleanup
Repair

Full repair, incremental repair, and anti-compaction
Sequential and parallel repair
Partitioner range repair
Subrange repair

Rebuilding Indexes
Moving Tokens

Adding Nodes

Adding Nodes to an Existing Data Center
Adding a Data Center to a Cluster

Handling Node Failure

Repairing Nodes

Recovering from disk failure

Replacing Nodes
Removing Nodes

Decommissioning a node
Removing a node
Assassinating a node

Upgrading Cassandra
Backup and Recovery

Taking a Snapshot
Clearing a Snapshot
Enabling Incremental Backup
Restoring from Snapshot

SSTable Utilities
Maintenance Tools

DataStax OpsCenter
Netflix Priam

Summary

12. Performance Tuning

Managing Performance

Setting Performance Goals
Monitoring Performance
Analyzing Performance Issues
Tracing
Tuning Methodology

Caching

Key Cache
Row Cache
Counter Cache
Saved Cache Settings

Memtables
Commit Logs
SSTables
Hinted Handoff
Compaction
Concurrency and Threading
Networking and Timeouts
JVM Settings

Memory
Garbage Collection

Using cassandra-stress
Summary

13. Security

Authentication and Authorization

Password Authenticator

Configuring the authenticator
Additional authentication providers
Adding users
Authenticating via the DataStax Java driver

Using CassandraAuthorizer
Role-Based Access Control

Encryption

SSL, TLS, and Certificates
Node-to-Node Encryption
Client-to-Node Encryption

JMX Security

Securing JMX Access
Security MBeans

PermissionsCacheMBean

Summary

14. Deploying and Integrating

Planning a Cluster Deployment

Sizing Your Cluster
Selecting Instances
Storage
Network

Cloud Deployment

Amazon Web Services
Microsoft Azure
Google Cloud Platform

Integrations

Apache Lucene, SOLR, and Elasticsearch
Apache Hadoop
Apache Spark

Use cases for Spark with Cassandra
Deploying Spark with Cassandra
The spark-cassandra-connector

Summary

Index
