Mastering Apache Cassandra by Neeraj, Nishant -- Read -- Imperial Library of Trantor

Index

Mastering Apache Cassandra

Table of Contents
Mastering Apache Cassandra
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?
Free Access for Packt account holders

Preface

What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support

Downloading the example code
Errata
Piracy
Questions

1. Quick Start

Introduction to Cassandra

Distributed database
High availability
Replication
Multiple data centers

A brief introduction to a data model
Installing Cassandra locally
CRUD with cassandra-cli
Cassandra in action

Modeling data
Writing code

Setting up Application

Summary

2. Cassandra Architecture

Problems in the RDBMS world
Enter NoSQL

The CAP theorem

Consistency
Availability
Partition-tolerance

Significance of the CAP theorem

Cassandra
Cassandra architecture

Ring representation
How Cassandra works

Write in action
Read in action

Components of Cassandra

Messaging service
Gossip
Failure detection
Partitioner
Replication
Log Structured Merge tree
CommitLog
MemTable
SSTable

Bloom filter
Index files
Datafiles

Compaction
Tombstones
Hinted handoff
Read repair and Anti-entropy

Merkle tree

Summary

3. Design Patterns

The Cassandra data model

The counter column
The expiring column
The super column
The column family
Keyspaces
Data types – comparators and validators

Writing a custom comparator
The primary index
The wide-row index
Simple groups
Sorting for free, free as in speech
An inverse index with a super column family
An inverse index with composite keys
The secondary index

Patterns and antipatterns

Avoid storing an entity in a single column (wherever possible)
Atomic update
Managing time series data

Wide-row time series
High throughput rows and hotspots
Advanced time series

Avoid super columns
Transaction woes
Use expiring columns
batch_mutate

Summary

4. Deploying a Cluster

Evaluating requirements

Hard disk capacity

RAM
CPU
Nodes
Network

System configurations

Optimizing user limits
Swapping memory
Clock synchronization
Disk readahead

The required software

Installing Oracle Java 6

RHEL and CentOS systems
Debian and Ubuntu systems

Installing the Java Native Access (JNA) library

Installing Cassandra

Installing from a tarball
Installing from ASF Repository for Debian/Ubuntu
Anatomy of the installation

Cassandra binaries
Configuration files

Setting up Cassandra's data directory and commit log directory

Configuring a Cassandra cluster

The cluster name
The seed node

Listen, broadcast, and RPC addresses

Initial token
Partitioners

The random partitioner
The byte-ordered partitioner
The Murmur3 partitioner

Snitches

SimpleSnitch
PropertyFileSnitch
GossipingPropertyFileSnitch
RackInferringSnitch
EC2Snitch
EC2MultiRegionSnitch

Replica placement strategies

SimpleStrategy
NetworkTopologyStrategy

NetworkTopologyStrategy and multiple data center setups

Launching a cluster with a script
Creating a keyspace

Authorization and authentication
Summary

5. Performance Tuning

Stress testing
Performance tuning

Write performance
Read performance

Choosing the right compaction strategy
Size tiered compaction strategy
Leveled compaction
Row cache
Key cache
Cache settings
Enabling compression
Tuning the bloom filter

More tuning via cassandra.yaml

index_interval
commitlog_sync
column_index_size_in_kb
commitlog_total_space_in_mb

Tweaking JVM

Java heap
Garbage collection
Other JVM options

Scaling horizontally and vertically
Network

Summary

6. Managing a Cluster – Scaling, Node Repair, and Backup

Scaling

Adding nodes to a cluster
Removing nodes from a cluster

Removing a live node
Removing a dead node

Replacing a node
Backup and restoration

Using Cassandra bulk loader to restore the data

Load balancing
Priam – managing large clusters on AWS
Summary

7. Monitoring

Cassandra JMX interface

Accessing MBeans using JConsole

Cassandra nodetool

Monitoring with nodetool

cfstats
netstats
ring and describering
tpstats
compactionstats
info

Administrating with nodetool

drain
decommission
move
removetoken
repair
upgradesstable
snapshot

DataStax OpsCenter

OpsCenter Features
Installing OpsCenter and an agent

Prerequisites
Running a Cassandra cluster
Installing OpsCenter from Tarball
Setting up an OpsCenter agent

Monitoring and administrating with OpsCenter
Other features of OpsCenter

Nagios – monitoring and notification

Installing Nagios

Prerequisites
Preparation
Installation

Installing Nagios
Configuring Apache httpd
Installing Nagios plugins
Setting up Nagios as a service

Nagios plugins

Nagios plugins for Cassandra
Executing remote plugins via an NRPE plugin

Installing NRPE on host machines
Installing NRPE plugin on a Nagios machine

Setting things up to monitor
Monitoring and notification using Nagios

Cassandra log

Enabling Java Options for GC Logging

Troubleshooting

High CPU usage
High memory usage
Hotspots
OpenJDK may behave erratically
Disk performance
Slow snapshot
Getting help from the mailing list

Summary

8. Integration

Using Hadoop
Hadoop and Cassandra

Introduction to Hadoop

HDFS – Hadoop Distributed File System
Data management

NameNode
DataNodes

Hadoop MapReduce

JobTracker
TaskTracker

Reliability of data and process in Hadoop

Setting up local Hadoop
Testing the installation

Cassandra with Hadoop MapReduce

ColumnFamilyInputFormat
ColumnFamilyOutputFormat
ConfigHelper

Wide-row support
Bulk loading
Secondary index support

Cassandra and Hadoop in action

Executing, debugging, monitoring, and looking at results

Hadoop in Cassandra cluster

Cassandra filesystem

Integration with Pig

Installing Pig
Integrating Pig and Cassandra

Cassandra and Solr

Development note on Solandra

DataStax Enterprise – the next level Solr integration

Summary

9. Introduction to CQL 3 and Cassandra 1.2

CQL – the Cassandra Query Language
CQL 3 for Thrift refugees

Wide rows
Composite columns

CQL 3 basics

The CREATE KEYSPACE query
The CREATE TABLE query
Compact storage
Creating a secondary index
The INSERT query
The SELECT query
select expression
The WHERE clause
The ORDER BY clause
The LIMIT clause
The USING CONSISTENCY clause
The UPDATE query
The DELETE query
The TRUNCATE query
The ALTER TABLE query

Adding a new column
Dropping an existing column
Modifying the data type of an existing column
Altering table options

The ALTER KEYSPACE query
BATCH querying
The DROP INDEX query
The DROP TABLE query
The DROP KEYSPACE query
The USE statement

What's new in Cassandra 1.2?

Virtual Nodes
Off-heap Bloom filters
JBOD improvements
Parallel leveled compaction
Murmur3 partitioner
Atomic batches
Query profiling
Collections support

Sets
Lists
Maps

Support for programming languages
Summary

Index
