I first came across Apache Cassandra at the start of 2009. We were beta testers for push alerts on the iPhone, and our mobile news application was having a difficult time handling millions of user lookups to send timely updates. After evaluating several different approaches, I stumbled onto Cassandra. Here was a database where I had to model my data for the query I was going to make, optimizing the “fetch” to return the whole result set in a single disk seek. What had been taking up to 30 minutes, and hurting latencies across the site, dropped to a matter of seconds. It was so fast that our development contacts at Apple had to ask us to lower our throughput!
Almost every aspect of Cassandra has changed substantially since then. What was initially a niche group of power users and distributed systems enthusiasts has blossomed into a thriving, diverse community. Over the years, Cassandra has proved itself time and again, running some of the largest workloads on the internet for services you use every day. An equally valid maturity signal is that you now have in front of you a third edition of this book.
Despite all its successes, Cassandra is still a difficult system to use. From installation to integration and operationalization, it remains nuanced, with plenty of gotchas. This book does a fantastic job of walking the reader through common pitfalls, with detailed explanations of important concepts.
No matter what your focus, read this book all the way through so that you understand the Cassandra system as a whole. As a developer, you may never need to care about anti-entropy repair (that’s the ops team’s job!), but you still need to understand its impact on maintaining data consistency. This may be the first time you have used a system where the operational mechanics affect how you configure your application for data consistency. As an operator, you’ll need to communicate the failure boundaries and cluster topology to your application team so they can configure data distribution and consistency levels correctly.
Regardless of your role, these concepts are difficult. This book, more than any other single resource I’ve come across to date, does an excellent job of explaining them. We all learn differently, though, so I encourage you to supplement this volume by engaging with the Apache Cassandra community. It is full of experienced power users, developers, and application architects who contribute code and documentation, participate in discussion lists and Slack channels, and speak at meetups and events. Use this book as your gateway into the world of Apache Cassandra.