Chapter 8. Application Development with Drivers

Now that we’ve looked at how to design a microservice architecture for a hotel application, let’s look at how you might implement one of the services within that application—the Reservation Service. To write an application using Cassandra, you’re going to need a driver, and thankfully you are in good hands.

You’re likely used to connecting to relational databases using drivers. For example, in Java, JDBC is an API that abstracts the vendor implementation of the relational database to present a consistent way of storing and retrieving data using Statements, PreparedStatements, ResultSets, and so forth. To interact with the database, you get a driver that works with the particular database you’re using, such as Oracle, SQL Server, or MySQL; the implementation details of this interaction are hidden from the developer.

There are a number of client drivers available for Cassandra as well, with support for most popular languages. These drivers are easy to embed in your own applications, and they frequently offer more features than the CQL native interface does, including connection pooling, JMX integration, and monitoring. In the following sections, you’ll learn about the various clients available and the features they offer.

Hector, Astyanax, and Other Legacy Clients

In the early days of Cassandra, the community produced a number of client drivers for different languages. These contributions were a key enabler of Cassandra adoption. We’ll mention a few of these clients here to pay tribute:

Hector was one of the first Cassandra clients. Hector provided a simple Java interface that helped many early developers avoid the challenges of writing to the Thrift API, and served as the inspiration for several other drivers.
Astyanax was a Java client originally built by Netflix on top of the Thrift API as a logical successor to the Hector driver. This driver helped many users transition from Thrift to CQL. The project was retired in 2016.
Other clients included Pycassa for Python, Perlcassa for Perl, Helenus for Node.js, and Cassandra-Sharp for the Microsoft .NET framework and C#. Most of these clients are no longer actively maintained, as they were based on the now-removed Thrift interface.

This Apache Cassandra page has a comprehensive list of both current and legacy drivers.

DataStax Java Driver

The introduction of CQL was the impetus for a major shift in the landscape of Cassandra client drivers. The simplicity and familiar syntax of CQL made the development of client programs similar to traditional relational database drivers. DataStax made a strategic investment in open source drivers for Java and several additional languages in order to fuel Cassandra adoption. These drivers quickly became the de facto standard for new development projects. You can access the drivers as well as additional connectors and tools at https://github.com/datastax.

DataStax Driver Compatibility Matrix

Visit the driver matrix page to access documentation and identify driver versions that are compatible with your server version.

The DataStax Java Driver is the oldest and most popular of these drivers, and typically the driver in which new features appear first. For this reason, we’ll focus on using the Java driver and use this as an opportunity to learn about the features that are provided by the DataStax drivers across multiple languages.

Development Environment Configuration

First, you’ll need to access the driver in your development environment. You could download the driver directly from the URL listed and manage the dependencies manually, but it is more typical in modern Java development to use a tool like Maven or Gradle to manage dependencies. If you’re using Maven, you’ll need to add something like the following to your project pom.xml file, while specifying a value for the driver version:

<dependency>
  <groupId>com.datastax.oss</groupId>
  <artifactId>java-driver-core</artifactId>
  <version>${driver.version}</version>
</dependency>

You can find the manual and the Javadoc for the Java driver in the online documentation. The Javadoc is also included in the source distribution.

All of the DataStax drivers are managed as open source projects on GitHub. If you’re interested in seeing the Java driver source, you can get a read-only trunk version using this command:

$ git clone https://github.com/datastax/java-driver.git

If you’re interested in learning more about the internals of the driver, or even potentially contributing to the project, the DataStax documentation site also has a developer guide.

Driver API Changes

The 4.0 release of the Java driver included significant breaking changes to the API and configuration of the driver in order to simplify application development and discourage configurations contrary to best practices. This book conforms to the newer APIs. The “Clients” chapter in the second edition of this book remains a good resource for those using the Java Driver 3.x and earlier.

In September 2019, DataStax announced a significant change to its driver strategy. Prior to that point, DataStax had maintained separate open source and enterprise drivers for use with Apache Cassandra and DataStax Enterprise, respectively. In early 2020, the codebases for the drivers in each of the supported languages were merged, bringing the benefits of several performance and availability improvements that were previously only available to DSE customers. DSE-specific driver features are out of the scope of this book, but are well documented on the sites we’ve referenced.

Connecting to a Cluster

Once you’ve configured your environment, it’s time to start coding. We’ll base the code samples for this chapter around the Reservation Service, a microservice implementation based on the hotel data model introduced in Chapter 5, and the corresponding application design discussed in Chapter 7. The source code for the Reservation Service is available at https://github.com/jeffreyscarpenter/reservation-service.

To start building your application, you’ll use the driver’s API to connect to a cluster. In the Java driver, connectivity to a cluster is represented by the com.datastax.oss.driver.api.core.CqlSession class.

The CqlSession class is the main entry point of the driver. It supports a fluent-style API using the builder pattern. For example, the following line creates a CqlSession that will attempt to connect to a Cassandra node on the local host at the default Cassandra native protocol port number:

CqlSession cqlSession = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
    .build();

Elimination of the Cluster Object

Previous versions of DataStax drivers supported the concept of a Cluster object used to create Session objects. Recent driver versions (for example, the 4.0 Java driver and later) have combined Cluster and Session into CqlSession.

In the terminology of the driver, the nodes you explicitly identify when creating a CqlSession are known as contact points. Contact points are similar to the concept of seed nodes that a Cassandra node uses to connect to other nodes in the same cluster.

The minimum required information to create a CqlSession is a single contact point. The driver defaults to a single contact point consisting of the local host and default port, so this statement is equivalent to the previous one (unless you are using file-based configuration, as we describe in “File-based configuration”):

CqlSession cqlSession = CqlSession.builder().build();

While this configuration is useful for development, when you might be running a Cassandra node on your local machine, for production environments you’ll want to specify multiple contact points. This is a good practice in case one of the nodes you pick happens to be down when the client application is attempting to create a CqlSession. You’ll also need to specify the name of the local data center. We’ll discuss naming data centers in Chapter 10.

CqlSession cqlSession = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("<some IP address>", 9042))
    .addContactPoint(new InetSocketAddress("<another IP address>", 9042))
    .withLocalDatacenter("<data center name>")
    .build();
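In practice, contact points usually come from configuration rather than being hardcoded. The following is a minimal sketch, using only the Java standard library, of turning a comma-separated setting into a list of addresses. The ContactPoints class, its parse() method, and the setting format are hypothetical names chosen for illustration; they are not part of the driver.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the driver): parses a setting such as
// "10.0.0.1:9042, 10.0.0.2" into socket addresses.
final class ContactPoints {
    static final int DEFAULT_PORT = 9042; // default native protocol port

    static List<InetSocketAddress> parse(String commaSeparated) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : commaSeparated.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate trailing commas and blank entries
            }
            // Simplification: IPv6 literals are not handled by this sketch.
            int colon = trimmed.lastIndexOf(':');
            String host = colon < 0 ? trimmed : trimmed.substring(0, colon);
            int port = colon < 0
                ? DEFAULT_PORT
                : Integer.parseInt(trimmed.substring(colon + 1));
            // createUnresolved avoids a DNS lookup while parsing.
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        return result;
    }

    public static void main(String[] args) {
        for (InetSocketAddress point : parse("10.0.0.1:9042, 10.0.0.2")) {
            System.out.println(point.getHostString() + ":" + point.getPort());
        }
    }
}
```

A list built this way could then be handed to the session builder’s addContactPoints() operation in place of the repeated addContactPoint() calls.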

When you create a CqlSession, the driver connects to one of the configured contact points to obtain metadata about the cluster. In the 4.x driver, this action will throw an AllNodesFailedException (the successor to the NoHostAvailableException of earlier driver versions) if none of the contact points is available, or an AuthenticationException if authentication fails. We’ll discuss authentication in more detail in Chapter 14.

You can optionally provide the name of a keyspace to connect to, as in this example that connects to the reservation keyspace:

CqlSession cqlSession = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("<some IP address>", 9042))
    .addContactPoint(new InetSocketAddress("<another IP address>", 9042))
    .withLocalDatacenter("<data center name>")
    .withKeyspace("reservation")
    .build();

If you do not specify a keyspace name when creating the CqlSession, you’ll have to qualify every table reference in your queries with the appropriate keyspace name.

Each CqlSession manages connections to a Cassandra cluster, which are used to execute queries and control operations using the Cassandra native protocol. The CqlSession contains a pool of TCP connections for each host.

Sessions Are Expensive

Because a CqlSession maintains TCP connections to multiple nodes, it is a relatively heavyweight object. In most cases, you’ll want to create a single CqlSession and reuse it throughout your application, rather than continually building up and tearing down CqlSessions. Another acceptable option is to create a CqlSession per keyspace, if your application is accessing multiple keyspaces.
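One way to enforce the “create once, reuse everywhere” guidance is a small holder that builds the expensive object lazily and hands out the same instance on every request. This is a sketch using only the standard library: SharedSession is a hypothetical helper, and FakeSession is a stand-in for a real session so that the example runs without a cluster.

```java
import java.util.function.Supplier;

final class SharedSessionDemo {
    static int builds = 0; // counts how many times the stand-in was constructed

    // Stand-in for a heavyweight session object (hypothetical, for illustration).
    static final class FakeSession implements AutoCloseable {
        FakeSession() { builds++; }
        @Override public void close() { }
    }

    public static void main(String[] args) throws Exception {
        try (SharedSession<FakeSession> shared = new SharedSession<>(FakeSession::new)) {
            FakeSession first = shared.get();
            FakeSession second = shared.get();
            System.out.println("same instance: " + (first == second));
            System.out.println("builds: " + builds);
        }
    }
}

// Hypothetical holder: creates the resource on first use and reuses it afterward.
final class SharedSession<T extends AutoCloseable> implements AutoCloseable {
    private final Supplier<T> factory;
    private T instance; // guarded by "this"

    SharedSession(Supplier<T> factory) {
        this.factory = factory;
    }

    synchronized T get() {
        if (instance == null) {
            instance = factory.get(); // runs at most once until close()
        }
        return instance;
    }

    @Override
    public synchronized void close() throws Exception {
        if (instance != null) {
            instance.close();
            instance = null;
        }
    }
}
```

In an application, the factory would build the real CqlSession instead of the stand-in. Dependency injection frameworks commonly achieve the same effect by registering the session as a singleton.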

Statements

Once you have created a CqlSession to connect to a cluster, you’re ready to perform reads or writes. To begin doing some real application work, you’ll create and execute CQL statements using implementations of Statement. Statement is an interface with several implementations, including SimpleStatement, BoundStatement, and BatchStatement.

The simplest way to create and execute a statement is to call the CqlSession.execute() operation with a string representing the statement. Here’s an example of a statement that will return the entire contents of the reservations table:

cqlSession.execute("SELECT * from reservation.reservations_by_confirmation");

This statement creates and executes a query in a single method call. In practice, this could turn out to be a very expensive query to execute in a large database, but it does serve as a useful example of a very simple query. Most queries will be more complex, as you’ll have search criteria to specify, or specific values to insert. You can certainly use Java’s various string utilities to build up the syntax of your query by hand, but this, of course, is error prone. It may even expose your application to injection attacks if you’re not careful to sanitize strings that come from end users.
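To make the injection risk concrete, here is a small standard-library sketch that only builds query strings; it does not talk to a database. The InjectionDemo class and the hostile input are made up for illustration. Note how a value containing a quote ends the string literal early, so the rest of the input is read as CQL syntax.

```java
final class InjectionDemo {

    // Error prone: untrusted input is spliced directly into the CQL text.
    static String concatenated(String confirmationNumber) {
        return "SELECT * FROM reservation.reservations_by_confirmation "
            + "WHERE confirmation_number = '" + confirmationNumber + "'";
    }

    // Safer shape: the query text is fixed, and the value is supplied
    // separately as a parameter when the statement is executed.
    static final String PARAMETERIZED =
        "SELECT * FROM reservation.reservations_by_confirmation "
            + "WHERE confirmation_number = ?";

    public static void main(String[] args) {
        // Hypothetical hostile input: closes the literal and appends CQL.
        String hostile = "RS2G0Z' ALLOW FILTERING --";

        System.out.println(concatenated("RS2G0Z")); // the intended query
        System.out.println(concatenated(hostile));  // structure has changed
        System.out.println(PARAMETERIZED);          // never changes with input
    }
}
```

The injected clause here is harmless, but the same mechanism can alter statements that modify data, which is why the parameterized forms are preferred.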

Simple Statements

Thankfully, you needn’t make things so hard on yourself. The Java driver provides the SimpleStatement class to help construct parameterized statements. As it turns out, the execute() operation is a convenience method for creating a SimpleStatement. The previous code is equivalent to the following, using the SimpleStatement.newInstance() method:

cqlSession.execute(SimpleStatement.newInstance(
  "SELECT * from reservation.reservations_by_confirmation"));

The newInstance() method is most useful in cases where you already have a set query string. Let’s try building a query with variable parameters using a SimpleStatementBuilder. Here’s an example of a statement that will insert a row in the reservations table, which you can then execute:

// Values must be Java types that map to the CQL column types
// (assuming date, smallint, and uuid columns in the schema)
SimpleStatement reservationInsert = SimpleStatement.builder(
  "INSERT INTO reservations_by_confirmation (confirmation_number, hotel_id, "
    + "start_date, end_date, room_number, guest_id) VALUES (?, ?, ?, ?, ?, ?)")
  .addPositionalValue("RS2G0Z")
  .addPositionalValue("NY456")
  .addPositionalValue(LocalDate.parse("2020-06-08"))
  .addPositionalValue(LocalDate.parse("2020-06-10"))
  .addPositionalValue((short) 111)
  .addPositionalValue(UUID.fromString("1b4d86f4-ccff-4256-a63d-45c905df2677"))
  .build();
cqlSession.execute(reservationInsert);

The first parameter to the call is the basic syntax of your query, indicating the table and columns you are interested in. The question marks are used to indicate values that you’ll be providing in additional parameters. Here the positional values supply the confirmation number, hotel ID, start and end dates, room number, and guest ID, in the same order as the column list.

If you’ve created your statement correctly, the insert will execute successfully (and silently). Now let’s create another statement to read back the row you just inserted:

SimpleStatement reservationSelect = SimpleStatement.builder(
  "SELECT * FROM reservations_by_confirmation WHERE confirmation_number=?")
  .addPositionalValue("RS2G0Z")
  .build();
ResultSet reservationSelectResult = cqlSession.execute(reservationSelect);

Again, you make use of parameterization to provide the ID for the search. This time, when you execute the query, make sure to receive the ResultSet that is returned from the execute() method. You can iterate through the rows returned by the ResultSet as follows:

for (Row row : reservationSelectResult) {
  System.out.format("confirmation_number: %s, hotel_id: %s, start_date: %s, "
      + "end_date: %s, room_number: %d, guest_id: %s\n",
    row.getString("confirmation_number"), row.getString("hotel_id"),
    row.getLocalDate("start_date"), row.getLocalDate("end_date"),
    row.getShort("room_number"), row.getUuid("guest_id"));
}

This code uses the ResultSet.iterator() operation to get an Iterator over the rows in the result set and loop over each row, printing out the desired column values. Note that you use special accessors to obtain the value of each column, depending on the desired type: in this case, Row.getString(), getLocalDate(), getShort(), and getUuid(). The accessor must match the CQL type of the column, so getShort() is used here on the assumption that room_number is a smallint. As you might expect, this will print out a result such as:

confirmation_number: RS2G0Z, hotel_id: NY456, start_date: 2020-06-08,
  end_date: 2020-06-10, room_number: 111, guest_id:
  1b4d86f4-ccff-4256-a63d-45c905df2677

Of course, you typically will set columns to values you receive as variables, rather than the hardcoded value used here. You can find code samples for working with SimpleStatements on the simple-statement-solution branch of the Reservation Service repository.

Prepared Statements

While SimpleStatements are quite useful for creating ad hoc queries, most applications tend to perform the same set of queries repeatedly. The PreparedStatement is designed to handle these queries more efficiently. The structure of the statement is sent to nodes a single time for preparation, and a handle for the statement is returned. To use the prepared statement, only the handle and the parameters need to be sent.

As you’re building your application, you’ll typically create PreparedStatements for reading data, corresponding to each access pattern you derive in your data model, plus others for writing data to your tables to support those access patterns.

Let’s create some PreparedStatements to represent the same reservation queries as before, using the CqlSession.prepare() operation:

PreparedStatement reservationInsertPrepared = cqlSession.prepare(
  "INSERT INTO reservations_by_confirmation (confirmation_number, hotel_id, "
    + "start_date, end_date, room_number, guest_id) VALUES (?, ?, ?, ?, ?, ?)");

PreparedStatement reservationSelectPrepared = cqlSession.prepare(
  "SELECT * FROM reservations_by_confirmation WHERE confirmation_number=?");

Note that the PreparedStatement uses the same parameterized syntax used earlier for the SimpleStatement. A key difference, however, is that a PreparedStatement is not a subtype of Statement. This prevents the error of trying to pass an unbound PreparedStatement to the CqlSession to execute. Note that there is also a variant of CqlSession.prepare() that accepts a parameterized SimpleStatement as input.

Let’s take a step back and discuss what is happening behind the scenes of the CqlSession.prepare() operation:

The driver passes the contents of your PreparedStatement to a Cassandra node and gets back a unique identifier for the statement. This unique identifier is referenced when you create a BoundStatement. If you’re curious, you can actually see this reference by calling PreparedStatement.getId().

Once the driver prepares the statement on one node, it proceeds to prepare the statement on the other nodes in the cluster. Nodes keep track of prepared statements internally. In earlier releases, prepared statements were stored in a cache, but beginning with the 3.10 release, each Cassandra node stores prepared statements in a local table so that they are present if the node goes down and comes back up.

The driver also provides the advanced.prepared-statements.reprepare-on-up configuration option; this is primarily useful if your cluster is using a release prior to Cassandra 3.10. If re-preparation is enabled (the default), the driver will re-prepare statements on nodes that have come back up.

If the driver tries to execute a PreparedStatement on a node where it has not been prepared, the driver automatically prepares the statement, at the cost of an additional round trip between the driver and the node.
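The exchange described above can be modeled with a toy example. The following sketch is not the driver’s or Cassandra’s implementation: ToyNode, executePrepared(), and idFor() are hypothetical names, and the MD5-based identifier is simply a convenient way to give every node the same ID for the same query text. It shows that a prepared query normally costs one exchange per execution, and that an unknown ID forces extra exchanges to prepare and retry.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class PrepareProtocolDemo {

    // Toy stand-in for a node that remembers prepared queries by ID.
    static final class ToyNode {
        private final Map<String, String> preparedById = new HashMap<>();
        int roundTrips = 0;

        // "Prepare": store the query text and return its identifier.
        String prepare(String query) {
            roundTrips++;
            String id = idFor(query);
            preparedById.put(id, query);
            return id;
        }

        // "Execute": only the ID and values are sent; null means "unknown ID".
        String execute(String id, Object... values) {
            roundTrips++;
            String query = preparedById.get(id);
            return query == null ? null : query + " with " + Arrays.toString(values);
        }
    }

    // Toy client logic: if the node does not know the ID, prepare and retry.
    static String executePrepared(ToyNode node, String query, Object... values) {
        String id = idFor(query);
        String result = node.execute(id, values);
        if (result == null) {
            node.prepare(query);
            result = node.execute(id, values);
        }
        return result;
    }

    // Deterministic ID so that every node derives the same one (toy choice).
    static String idFor(String query) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(query.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest).toString(16);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is a required JDK algorithm
        }
    }

    public static void main(String[] args) {
        String query =
            "SELECT * FROM reservations_by_confirmation WHERE confirmation_number=?";
        ToyNode node = new ToyNode();
        System.out.println(executePrepared(node, query, "RS2G0Z"));
        System.out.println("round trips after first call: " + node.roundTrips);
        executePrepared(node, query, "RS2G0Z");
        System.out.println("round trips after second call: " + node.roundTrips);
    }
}
```

In this model, the first call on an unprepared node takes three exchanges (failed execute, prepare, retry), and each later call takes one.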

You can think of a PreparedStatement as a template for creating queries. In addition to specifying the form of your query, there are other attributes that you can set on a PreparedStatement that will be used as defaults for statements it is used to create, including a default consistency level, retry policy, and tracing.

In addition to improving efficiency, PreparedStatements also improve security by separating the query logic of CQL from the data. This provides protection against injection attacks, which attempt to embed commands into data fields in order to gain unauthorized access.

Bound statement

Now your PreparedStatement is available to use to create queries. In order to make use of a PreparedStatement, you bind it with actual values by calling the bind() operation. For example, you can bind the SELECT statement created earlier as follows:

BoundStatement reservationSelectBound = reservationSelectPrepared.bind("RS2G0Z");

The bind() operation used here allows you to provide values that match each variable in the PreparedStatement. It is possible to provide the first n bound values, in which case the remaining values must be bound separately before executing the statement. There is also a version of bind() that takes no parameters, in which case all of the parameters must be bound separately. There are several set() operations provided by BoundStatement that can be used to bind values of different types. For example, you can take the INSERT prepared statement from earlier and bind each value by column name using operations such as setString() and setLocalDate():

BoundStatement reservationInsertBound = reservationInsertPrepared.bind()
  .setString("confirmation_number", "RS2G0Z")
  .setString("hotel_id", "NY456")
  .setLocalDate("start_date", LocalDate.parse("2020-06-08"))
  .setLocalDate("end_date", LocalDate.parse("2020-06-10"))
  .setShort("room_number", (short) 111)
  .setUuid("guest_id", UUID.fromString("1b4d86f4-ccff-4256-a63d-45c905df2677"));

Once you have bound all of the values, execute a BoundStatement using CqlSession.execute(). If you have failed to bind any of the values, they will be ignored on the server side, if protocol v4 (Cassandra 2.2 or later) is in use. The driver behavior for older protocol versions is to throw an IllegalStateException if there are any unbound values.

You can find code samples for working with PreparedStatement and BoundStatement on the prepared-statement-solution branch of the Reservation Service repository.

Query Builder

The driver also provides a QueryBuilder, which uses a fluent-style API for creating queries programmatically. This is especially useful for cases where there is variation in the query structure (such as optional parameters) that would make using PreparedStatements difficult. Similar to PreparedStatement, it also provides some protection against injection attacks.

To use the QueryBuilder, you’ll need to include an additional dependency, for example, in a Maven POM file:

<dependency>
  <groupId>com.datastax.oss</groupId>
  <artifactId>java-driver-query-builder</artifactId>
  <version>${driver.version}</version>
</dependency>

The QueryBuilder provides a set of static methods to facilitate building different types of statements represented by different classes. The common usage is to import the static methods of the QueryBuilder class:

import static com.datastax.oss.driver.api.querybuilder.QueryBuilder.*;

Importing methods statically improves code readability, as you’ll see as you look at some examples.

The QueryBuilder produces objects that implement the com.datastax.oss.driver.api.querybuilder.BuildableQuery interface and its sub-interfaces, such as Select, Insert, Update, Delete, and others. The methods on these interfaces return objects that represent the content of a query as it is being built up. You’ll likely find your IDE quite useful in helping to identify the allowed operations as you’re building queries.

Let’s reproduce the queries from before using the QueryBuilder to see how it works. First, build a CQL INSERT query:

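// literal() wraps a Java value as a CQL term; value() expects a term
Insert reservationInsert =
  insertInto("reservation", "reservations_by_confirmation")
  .value("confirmation_number", literal("RS2G0Z"))
  .value("hotel_id", literal("NY456"))
  .value("start_date", literal(LocalDate.parse("2020-06-08")))
  .value("end_date", literal(LocalDate.parse("2020-06-10")))
  .value("room_number", literal((short) 111))
  .value("guest_id",
    literal(UUID.fromString("1b4d86f4-ccff-4256-a63d-45c905df2677")));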

SimpleStatement reservationInsertStatement = reservationInsert.build();

The first operation calls the QueryBuilder.insertInto() operation to create an Insert statement for the reservations_by_confirmation table. Then use the Insert.value() operation repeatedly to specify values for each column you are inserting. The Insert.build() operation returns a SimpleStatement you can then pass to CqlSession.execute().

The construction of the CQL SELECT command is similar:

Select reservationSelect =
  selectFrom("reservation", "reservations_by_confirmation")
  .all()
  .whereColumn("confirmation_number").isEqualTo(literal("RS2G0Z"));

SimpleStatement reservationSelectStatement = reservationSelect.build();

For this query, call QueryBuilder.selectFrom() to create a Select statement. You use the Select.all() operation to select all columns, although you could also have used the column() operation to select specific columns. Add a CQL WHERE clause via the Select.whereColumn() operation, to which you pass the name of the column and then add an equality check for the confirmation number, using the isEqualTo() operation.

This sample demonstrates how you can use the QueryBuilder to create a PreparedStatement instead of a SimpleStatement, using the concept of a bind marker as a placeholder for a value to be specified when the PreparedStatement is bound:

Select reservationSelect =
  selectFrom("reservation", "reservations_by_confirmation")
  .all()
  .whereColumn("confirmation_number").isEqualTo(bindMarker());

PreparedStatement reservationSelectPrepared =
  cqlSession.prepare(reservationSelect.build());

// later
BoundStatement reservationSelectBound =
  reservationSelectPrepared.bind("RS2G0Z");

For a complete code sample using the QueryBuilder, see the query-builder-solution branch of the Reservation Service repository.

Object Mapper

You’ve learned several techniques for creating and executing query statements with the driver. There is one final technique to look at that provides a bit more abstraction. The Java driver provides an object mapper that allows you to focus on developing and interacting with domain models (or data types used on APIs). The object mapper works off of annotations in source code that are used to map Java classes to tables or user-defined types (UDTs). The object mapper is a useful tool for abstracting some of the details of interacting with Cassandra, especially if you have an existing domain model.

The mapper is provided in two separate libraries for use at compile time and runtime, so you will need to include additional Maven dependencies in order to use the mapper in your project. You’ll add the following dependency to the compile path of your application:

<dependency>
  <groupId>com.datastax.oss</groupId>
  <artifactId>java-driver-mapper-processor</artifactId>
  <version>${driver.version}</version>
</dependency>

You’ll also add the runtime library as a runtime dependency:

<dependency>
  <groupId>com.datastax.oss</groupId>
  <artifactId>java-driver-mapper-runtime</artifactId>
  <version>${driver.version}</version>
</dependency>

The mapper API is based on standard design patterns for data access, including entity classes and Data Access Objects (DAOs). You create an entity class to represent each table in your design, a DAO interface to specify queries on entities, and a mapper interface that helps generate DAO instances. The mapper generates code based on the classes and interfaces you provide.

For a complete example of using the mapper, you’ll want to look at the mapper-solution branch of the Reservation Service repository. We’ll share some of the highlights here. Let’s begin by creating a ReservationsByConfirmation entity class that will represent rows in the reservations_by_confirmation table:

import java.time.LocalDate;
import java.util.UUID;
import com.datastax.oss.driver.api.mapper.annotations.Entity;
import com.datastax.oss.driver.api.mapper.annotations.PartitionKey;
import com.datastax.oss.driver.api.mapper.annotations.NamingStrategy;
import static com.datastax.oss.driver.api.mapper.entity.naming.NamingConvention.SNAKE_CASE_INSENSITIVE;

@Entity
@NamingStrategy(convention = SNAKE_CASE_INSENSITIVE)
public class ReservationsByConfirmation {

    @PartitionKey
    private String confirmationNumber;

    private String hotelId;
    private LocalDate startDate;
    private LocalDate endDate;
    private short roomNumber;
    private UUID guestId;

    // constructors, get/set methods, hashcode, equals
}

There are several annotations used in this example. The class is denoted as an @Entity, and also as having a @NamingStrategy, which is a way of specifying how the mapper should correlate Java identifiers to CQL. For example, you can specify a SNAKE_CASE_INSENSITIVE convention as in the preceding code, which means that the mapper will convert Java-style class and member names to lowercase, with underscores separating words, which is the recommended CQL naming style. Thus the class name ReservationsByConfirmation will be mapped to the reservations_by_confirmation table, the confirmationNumber member will be mapped to the confirmation_number column, and so on.
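The name conversion described here is easy to picture with a few lines of plain Java. The following sketch approximates the snake case rule for simple identifiers; the SnakeCase class is a hypothetical illustration and not the mapper’s own implementation, which may treat edge cases such as consecutive capital letters differently.

```java
final class SnakeCase {

    // Inserts an underscore before each uppercase letter (except the first
    // character) and lowercases the result.
    static String toSnakeCase(String javaName) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < javaName.length(); i++) {
            char c = javaName.charAt(i);
            if (Character.isUpperCase(c)) {
                if (i > 0) {
                    result.append('_');
                }
                result.append(Character.toLowerCase(c));
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSnakeCase("ReservationsByConfirmation"));
        System.out.println(toSnakeCase("confirmationNumber"));
        System.out.println(toSnakeCase("hotelId"));
    }
}
```

Running this prints reservations_by_confirmation, confirmation_number, and hotel_id, matching the table and column names used in this chapter.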

The Reservation Service uses an additional entity class ReservationsByHotelDate that is used with the reservations_by_hotel_date table. Its implementation is quite similar, so we won’t reproduce it here.

You can also create entity classes corresponding to UDTs. If your domain model contains classes that reference other classes, you can annotate the referenced classes as user-defined types with the @Entity annotation. The object mapper processes objects recursively using your annotated types.

Next, you’ll create a DAO interface to represent queries on these entity classes:

import java.time.LocalDate;
import com.datastax.oss.driver.api.core.PagingIterable;
import com.datastax.oss.driver.api.mapper.annotations.*;

@Dao
public interface ReservationDao {

    @Select
    ReservationsByConfirmation findByConfirmationNumber(
        String confirmationNumber);

    @Query("SELECT * FROM ${tableId}")
    PagingIterable<ReservationsByConfirmation> findAll();

    @Insert
    void save(ReservationsByConfirmation reservationsByConfirmation);

    @Delete
    void delete(ReservationsByConfirmation reservationsByConfirmation);

    @Select (customWhereClause = "hotel_id = :hotelId AND start_date = :date")
    PagingIterable<ReservationsByHotelDate> findByHotelDate(
            @CqlName("hotel_id") String hotelId,
            @CqlName("start_date") LocalDate date);

    @Insert
    void save(ReservationsByHotelDate reservationsByHotelDate);

    @Delete
    void delete(ReservationsByHotelDate reservationsByHotelDate);
}

The ReservationDao interface is annotated as @Dao, and the various queries are marked with annotations such as @Select, @Insert, @Delete, and @Query.

The next step is to create a Mapper interface that can be used to obtain DAO instances:

import com.datastax.oss.driver.api.mapper.annotations.DaoFactory;
import com.datastax.oss.driver.api.mapper.annotations.Mapper;

@Mapper
public interface ReservationMapper {

    @DaoFactory
    ReservationDao reservationDao();

}

Annotate the interface with @Mapper, and each operation that returns a DAO with @DaoFactory. When you compile the application, the object mapper interprets your annotations to create a ReservationMapperBuilder class that you can invoke to obtain an implementation of the ReservationMapper interface that wraps the CqlSession, and from there obtain an object implementing the ReservationDao interface:

ReservationMapper reservationMapper =
  new ReservationMapperBuilder(cqlSession).build();

ReservationDao reservationDao = reservationMapper.reservationDao();

Since the mapper and DAO objects are using your CqlSession, you should reuse them just as you do the CqlSession.

Now you can use the ReservationDao to perform queries using your entity classes. Create a ReservationsByConfirmation object using a simple constructor that you can save using the DAO:

ReservationsByConfirmation reservation = new ReservationsByConfirmation(
  "RS2G0Z", "NY456", LocalDate.parse("2020-06-08"),
  LocalDate.parse("2020-06-10"), (short) 111,
  UUID.fromString("1b4d86f4-ccff-4256-a63d-45c905df2677"));

reservationDao.save(reservation);

You can use the java.util.UUID.fromString() operation here for convenience; in most applications, the value would have been passed in via a remote invocation.

The ReservationDao.save() operation is all you need to execute to perform a CQL INSERT or UPDATE, as these are really the same operation to Cassandra. The ReservationDao builds and executes the statement on your behalf.

To retrieve a specific reservation, use the ReservationDao.findByConfirmationNumber() operation, passing in an argument list that matches the partition key:

ReservationsByConfirmation reservation =
  reservationDao.findByConfirmationNumber("RS2G0Z");

Deleting a reservation is also straightforward:

reservationDao.delete(reservation);

The object mapper documentation describes more advanced features, including DAO methods that execute asynchronously, the ability to configure CQL statement options such as TTL or consistency level, and customizing how the mapper handles annotations.

Asynchronous Execution

The CqlSession.execute() operation is synchronous, which means that it blocks until a result is obtained or an error occurs, such as a network timeout. The driver also provides the asynchronous executeAsync() operation to support non-blocking interactions with Cassandra. These non-blocking requests can make it simpler to send multiple queries in parallel to speed performance of your client application.
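The benefit of issuing requests in parallel can be sketched with the standard library alone. In this example, fetchHotelAsync() is a hypothetical stand-in for an asynchronous query; it does not use the driver, so the sketch runs without a cluster. The key point is that every request is started before the code waits on any of them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

final class ParallelQueriesDemo {

    // Hypothetical stand-in for an asynchronous query; runs on another thread.
    static CompletableFuture<String> fetchHotelAsync(String confirmationNumber) {
        return CompletableFuture.supplyAsync(() -> "hotel-for-" + confirmationNumber);
    }

    static List<String> fetchAll(List<String> confirmationNumbers) {
        // Start every request first; collect() forces all of them to be issued.
        List<CompletableFuture<String>> inFlight = confirmationNumbers.stream()
            .map(ParallelQueriesDemo::fetchHotelAsync)
            .collect(Collectors.toList());

        // Only then wait for the results, preserving the input order.
        return inFlight.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(Arrays.asList("RS2G0Z", "RS2G1A", "RS2G2B")));
    }
}
```

With a synchronous API, the total time would be the sum of the individual request times; started this way, the requests overlap.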

You could take any of the Statements from the preceding examples and execute them asynchronously:

CompletionStage<AsyncResultSet> resultStage = cqlSession.executeAsync(statement);

As of the 4.0 release of the DataStax Java driver, the result of executeAsync() and other asynchronous methods is of the CompletionStage type introduced in Java 8. (Previous versions in the 3.x series relied on the ListenableFuture interface from Google’s Guava framework.) The CompletionStage represents a stage of a computation. These stages can be chained together so that when a stage completes, other dependent stages are triggered.
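If you haven’t worked with CompletionStage before, the following standard-library sketch shows the chaining mechanics in isolation. The lookupHotelId() and deleteByHotel() methods are hypothetical stand-ins for asynchronous queries and return canned values. The thenCompose() operation starts a second asynchronous step from the result of the first, and thenApply() transforms a result without starting new asynchronous work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

final class StageChainDemo {

    // Hypothetical stand-in for an asynchronous read.
    static CompletionStage<String> lookupHotelId(String confirmationNumber) {
        return CompletableFuture.supplyAsync(() -> "NY456");
    }

    // Hypothetical stand-in for an asynchronous delete; reports success.
    static CompletionStage<Boolean> deleteByHotel(String hotelId) {
        return CompletableFuture.supplyAsync(() -> hotelId.equals("NY456"));
    }

    // Chains the two steps: the delete starts only after the lookup completes.
    static CompletionStage<String> deleteReservation(String confirmationNumber) {
        return lookupHotelId(confirmationNumber)
            .thenCompose(StageChainDemo::deleteByHotel)
            .thenApply(deleted -> deleted ? "deleted" : "not found");
    }

    public static void main(String[] args) {
        deleteReservation("RS2G0Z")
            .whenComplete((outcome, error) -> {
                if (error != null) {
                    System.out.println("Failed: " + error.getMessage());
                } else {
                    System.out.println("Outcome: " + outcome);
                }
            })
            .toCompletableFuture()
            .join(); // block only so the demo does not exit early
    }
}
```

The join() at the end exists only to keep the demo’s main thread alive; application code would typically return the stage to its caller or attach further stages instead of blocking.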

With the Java driver, the asynchronous APIs can be used to assemble processing chains consisting of CQL queries and code that processes their results. Consider a chain in which the results of a SELECT query are used as inputs to perform a second query. In this example, you might load a reservation you wish to delete from the reservations_by_confirmation table in a preliminary selectStage, in order to obtain the primary key columns you can then use to delete the reservation from the reservations_by_hotel_date table in a subsequent deleteStage:

// Load the reservation by confirmation number
CompletionStage<AsyncResultSet> selectStage = cqlSession.executeAsync(
  "SELECT * FROM reservations_by_confirmation WHERE " +
    "confirmation_number = 'RS2G0Z'");

// Use fields of the reservation to delete from other table
CompletionStage<AsyncResultSet> deleteStage =
  selectStage.thenCompose(
    resultSet -> {
      Row reservationRow = resultSet.one();
      return cqlSession.executeAsync(SimpleStatement.newInstance(
        "DELETE FROM reservations_by_hotel_date WHERE hotel_id = ? AND " +
          "start_date = ? AND room_number = ?",
        reservationRow.getString("hotel_id"),
        reservationRow.getLocalDate("start_date"),
        // room_number is a CQL smallint, which maps to a Java short
        reservationRow.getShort("room_number")));
    });

// Check results for success
deleteStage.whenComplete(
    (resultSet, error) -> {
      if (error != null) {
        System.out.printf("Failed to delete: %s\n", error.getMessage());
      } else {
        System.out.println("Delete successful");
      }
    });

We simplified this for readability, as you might wish to use prepared statements, or take advantage of a batch to delete from the reservations_by_confirmation and reservations_by_hotel_date tables at the same time. You can find more extensive code samples using the asynchronous APIs on the async-solution branch of the Reservation Service repository.

In addition to the CqlSession.executeAsync() operation, the driver supports several other asynchronous operations, including CqlSession.closeAsync(), CqlSession.prepareAsync(), and several operations on the object mapper. You can also build the CqlSession asynchronously using CqlSessionBuilder.buildAsync(). For more information, see the Java driver’s asynchronous programming documentation.

Reactive Style Programming

If you’re interested in even more advanced asynchronous programming in Java, you may be familiar with reactive streams, an initiative to provide asynchronous stream processing with non-blocking back pressure. Reactive streams APIs became an official part of the Java platform in JDK 9 under the java.util.concurrent.Flow.* interfaces.

Beginning with the 4.4 release, the Java driver provides built-in support for reactive queries. The CqlSession interface extends a new ReactiveSession interface, which adds methods such as executeReactive() to process queries expressed as reactive streams. To learn more about these APIs, see the Java driver reactive streams documentation.
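To make the back pressure idea concrete without involving the driver, here is a small sketch built only on the JDK's java.util.concurrent.Flow interfaces. The class and method names are hypothetical, and the strings stand in for rows; the point is that the subscriber controls the pace by requesting one item at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

final class BackPressureSketch {

  // Subscriber that asks for one item at a time, so the publisher never outruns it.
  static final class OneAtATimeSubscriber implements Flow.Subscriber<String> {
    private final List<String> received = new ArrayList<>();
    private final CompletableFuture<List<String>> done = new CompletableFuture<>();
    private Flow.Subscription subscription;

    @Override public void onSubscribe(Flow.Subscription subscription) {
      this.subscription = subscription;
      subscription.request(1);            // signal demand for the first item
    }
    @Override public void onNext(String item) {
      received.add(item);
      subscription.request(1);            // ask for the next item only when ready
    }
    @Override public void onError(Throwable error) { done.completeExceptionally(error); }
    @Override public void onComplete() { done.complete(received); }
  }

  // Publishes the given items and returns what the subscriber received, in order.
  static List<String> consume(List<String> rows) {
    OneAtATimeSubscriber subscriber = new OneAtATimeSubscriber();
    try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
      publisher.subscribe(subscriber);
      rows.forEach(publisher::submit);
    } // close() signals onComplete once the submitted items have been delivered
    return subscriber.done.join();
  }

  public static void main(String[] args) {
    System.out.println(consume(List.of("row1", "row2", "row3")));
  }
}
```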

Driver Configuration

You’ve already looked at a few of the available options for configuring the driver, but now let’s take a step back and look at its overall configuration approach.

File-based configuration

While the CqlSession may be configured programmatically via the CqlSessionBuilder class, the Java driver also supports a file-based configuration approach that provides a fuller set of configuration options. File-based configuration is based on the Typesafe Config project, an open source library that provides configuration for JVM languages. In most cases it is preferable to use configuration values based on a configuration file rather than programmatic statements. For example, the configuration values provided previously could be specified in a configuration file such as the one provided for the Reservation Service:

datastax-java-driver {
  basic {
    contact-points = [ "127.0.0.1:9042", "127.0.0.2:9042" ]
    session-keyspace = reservation
  }
}

The configuration file here is written in the Human-Optimized Config Object Notation (HOCON) format. The Java driver uses the conventions of the Typesafe Config library for configuration file locations; it searches the Java classpath for files named application.conf, application.json, or application.properties. The configuration loader is a pluggable interface that you can override to create your own implementation.

Basic configuration options

The Java driver divides configuration values into two categories: basic configuration values that are customized most frequently, and advanced configuration values that are used less frequently. The basic options include the following:

Contact points and keyspace name, as discussed previously
A session-name that will be used in log messages and metrics (if none is provided, they will be generated in the form s1, s2, and so on for each distinct CqlSession created)
The config-reload-interval that specifies how often configuration values will be reloaded from the file (defaults to 5 minutes)
Default parameters applied to each request, including the request.timeout, the request.consistency (consistency level), and the request.page-size, which determines how many rows will be retrieved at a time for larger queries
The load-balancing-policy, which we’ll discuss in the next section

You can configure advanced options on a CqlSession, including query execution, connection management, security, logging, and metrics. We’ll examine several of these options in later sections. The DataStax documentation provides a reference configuration file, which is an excellent resource for learning about all of the available configuration options.

Load balancing

As discussed in Chapter 6, a query can be made to any node in a cluster, which is then known as the coordinator node for that query. Depending on the contents of the query, the coordinator may communicate with other nodes in order to satisfy the query. If a client directs all of its queries at the same node, this will produce an unbalanced load on the cluster, especially if other clients are doing the same.

To get around this issue, the driver provides a pluggable mechanism that will balance the query load across multiple nodes. Load balancing is implemented by selecting an implementation of this interface:

com.datastax.oss.driver.api.core.loadbalancing.LoadBalancingPolicy

Each LoadBalancingPolicy classifies each node in the cluster as local, remote, or ignored, according to the NodeDistance enumeration. (Drivers through the 3.x series exposed this as a distance() operation returning a HostDistance value.) The driver prefers interactions with local nodes and maintains more connections to local nodes than remote nodes. The other key operation is newQueryPlan(), which returns the nodes in the order they should be queried. The LoadBalancingPolicy interface also contains operations that are used to inform the policy when nodes are added or removed, or go up or down. These operations help the policy avoid including down or removed nodes in query plans.

Versions of the Java driver through the 3.x series provided multiple LoadBalancingPolicy implementations with a composable API that allowed a custom selection of behaviors. Beginning with the 4.0 release, the DataStax Java Driver ships with a single default LoadBalancingPolicy to simplify the developer experience. This default implementation reflects an opinionated point of view based on best practices observed from many deployments, including the following behaviors:

Round-robin queries: The policy allocates requests across the nodes in the cluster in a repeating pattern to spread the processing load (equivalent to the RoundRobinPolicy from the legacy driver).
Token awareness: Whenever you use a PreparedStatement, the policy uses the token value of the partition key in order to select a node that is a replica for the desired data, thus minimizing the number of nodes that must be queried (equivalent to the TokenAwarePolicy from the legacy driver).
Data center awareness: The policy requires setting a local data center. The default load balancing policy will only include nodes in the local data center as part of its query plans. The local data center must be identified explicitly when building the CqlSession via the withLocalDatacenter() operation, or via the configuration property basic.load-balancing-policy.local-datacenter.

This is a difference from the legacy driver, which provided a DCAwareRoundRobinPolicy that would include remote nodes in query plans after local nodes. This was intended as a reliability mechanism in case all replicas in the local data center were unavailable. In practice, however, if all the replicas in a local data center are down, it is typically a broader outage at the data center level, and shifting traffic to other nodes has proven to have undesirable side effects and be difficult to debug.
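The round-robin and data center behaviors described above can be illustrated with a small, driver-independent sketch. The LocalRoundRobinSketch and NodeInfo classes are hypothetical and are not part of the driver API; token awareness is deliberately left out to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

final class LocalRoundRobinSketch {

  // Hypothetical node description: an address plus the data center it belongs to.
  static final class NodeInfo {
    final String address;
    final String datacenter;
    NodeInfo(String address, String datacenter) {
      this.address = address;
      this.datacenter = datacenter;
    }
  }

  private final List<NodeInfo> localNodes;
  private final AtomicInteger nextStart = new AtomicInteger();

  LocalRoundRobinSketch(List<NodeInfo> allNodes, String localDatacenter) {
    // Data center awareness: nodes outside the local data center never enter a plan.
    this.localNodes = allNodes.stream()
        .filter(node -> node.datacenter.equals(localDatacenter))
        .collect(Collectors.toList());
  }

  // Round robin: each plan starts one node later than the previous plan.
  List<String> newQueryPlan() {
    int size = localNodes.size();
    List<String> plan = new ArrayList<>(size);
    if (size == 0) {
      return plan;
    }
    int start = Math.floorMod(nextStart.getAndIncrement(), size);
    for (int i = 0; i < size; i++) {
      plan.add(localNodes.get((start + i) % size).address);
    }
    return plan;
  }

  public static void main(String[] args) {
    LocalRoundRobinSketch policy = new LocalRoundRobinSketch(List.of(
        new NodeInfo("10.0.0.1", "dc1"),
        new NodeInfo("10.0.0.2", "dc1"),
        new NodeInfo("10.1.0.1", "dc2")), "dc1");
    System.out.println(policy.newQueryPlan());
    System.out.println(policy.newQueryPlan());
  }
}
```

Rotating the starting position means that successive queries choose different coordinators, while the remaining local nodes in each plan serve as fallbacks if the first choice is unavailable.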

Should you wish to set a different default LoadBalancingPolicy, you may specify it when building a CqlSession via the withLoadBalancingPolicy() operation, or by configuring the properties in the basic.load-balancing-policy group.

Retrying failed queries

When Cassandra nodes fail or become unreachable, the driver automatically and transparently tries other nodes, and schedules reconnection to the dead nodes in the background according to the configured reconnection policy. The reconnection policy is determined according to the advanced.reconnection-policy configuration options. Two reconnection policies are provided: the ExponentialReconnectionPolicy and the ConstantReconnectionPolicy.
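To give a feel for how an exponential reconnection schedule differs from a constant one, here is a minimal sketch of a doubling delay with an upper bound. The base and maximum delays used in main() are illustrative assumptions, and the driver's own ExponentialReconnectionPolicy may differ in details such as added randomness.

```java
import java.time.Duration;

final class ReconnectionDelaySketch {

  // Delay before the given attempt (1-based): base * 2^(attempt - 1), capped at max.
  static Duration delayFor(int attempt, Duration base, Duration max) {
    int doublings = Math.max(0, Math.min(attempt - 1, 30));  // clamp to keep the shift safe
    Duration delay = base.multipliedBy(1L << doublings);
    return delay.compareTo(max) > 0 ? max : delay;
  }

  public static void main(String[] args) {
    Duration base = Duration.ofSeconds(1);   // assumed base delay
    Duration max = Duration.ofSeconds(60);   // assumed maximum delay
    for (int attempt = 1; attempt <= 8; attempt++) {
      System.out.println("attempt " + attempt + ": " + delayFor(attempt, base, max));
    }
  }
}
```

A constant policy would simply return the same delay for every attempt; the exponential form backs off quickly from a node that stays down while still retrying promptly after a brief interruption.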

Because temporary changes in network conditions can also make nodes appear offline, the driver also provides a mechanism to retry queries that fail due to protocol or network-related errors. This removes the need to write retry logic in client code.

The driver retries failed queries according to the provided implementation of the com.datastax.oss.driver.api.core.retry.RetryPolicy interface. The onReadTimeout(), onWriteTimeout(), and onUnavailable() operations define the behavior that should be taken when a query fails with protocol- or network-related exceptions ReadTimeoutException, WriteTimeoutException, or UnavailableException, respectively. The onErrorResponse() operation describes the behavior for handling other recoverable server errors, and the onRequestAborted() operation handles cases in which the driver aborts a request before the server responds.

The RetryPolicy operations return a RetryDecision, which indicates whether the query should be retried, and if so, at what consistency level. If the exception is not retried, it can be rethrown or ignored, in which case the query operation will return an empty ResultSet.
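The shape of such a decision can be sketched in plain Java. Everything in this example is hypothetical: the Decision enum only loosely mirrors the idea of a retry decision, and the specific rules (retry a read timeout once when enough replicas replied but no data came back; try another node once when replicas are unavailable) are illustrative assumptions rather than a statement of what the driver's default policy does.

```java
final class RetryDecisionSketch {

  // Hypothetical outcomes a retry policy might choose between.
  enum Decision { RETRY_SAME_NODE, RETRY_NEXT_NODE, RETHROW }

  // Illustrative rule: retry a read timeout once, and only when enough replicas
  // answered but none of the answers carried the data.
  static Decision onReadTimeout(int required, int received, boolean dataPresent, int retryCount) {
    boolean firstFailure = retryCount == 0;
    boolean enoughReplicas = received >= required;
    return (firstFailure && enoughReplicas && !dataPresent)
        ? Decision.RETRY_SAME_NODE
        : Decision.RETHROW;
  }

  // Illustrative rule: when replicas are unavailable, try a different coordinator once.
  static Decision onUnavailable(int retryCount) {
    return retryCount == 0 ? Decision.RETRY_NEXT_NODE : Decision.RETHROW;
  }

  public static void main(String[] args) {
    System.out.println(onReadTimeout(2, 2, false, 0));
    System.out.println(onReadTimeout(2, 1, false, 0));
    System.out.println(onUnavailable(1));
  }
}
```

Bounding the number of retries, as both rules do here, keeps a persistent failure from turning into an endless loop of repeated requests.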

The 4.0 release of the driver provides a single opinionated implementation of the RetryPolicy based on best practices. Releases through 3.x had a FallthroughRetryPolicy that never recommended retries, and a DowngradingConsistencyRetryPolicy that downgrades the consistency level required on retries, as an attempt to get the query to succeed. The issue with the DowngradingConsistencyRetryPolicy was: if you are willing to accept a downgraded consistency level under some circumstances, do you really require a higher consistency level for the general case?

The RetryPolicy implementation can be overridden using the advanced.retry-policy configuration.

Speculative execution

While it’s great to have a retry mechanism that automates the response to network timeouts, you don’t often have the luxury of being able to wait for timeouts or even long garbage collection pauses. To speed things up, the driver provides a speculative execution feature. If the original coordinator node for a query fails to respond in a predetermined interval, the driver can preemptively start an additional execution of the query against a different coordinator node. When one of the queries returns, the driver provides that response and cancels any other outstanding queries.

Speculative execution is disabled by default via the NoSpeculativeExecutionPolicy, but can be enabled on a CqlSession by setting the ConstantSpeculativeExecutionPolicy. Here’s an example of how you configure this policy in the configuration file by specifying a maximum number of executions and a constant delay between executions (in milliseconds):

advanced.speculative-execution-policy {
  class = ConstantSpeculativeExecutionPolicy
  max-executions = 3
  delay = 100 milliseconds
}

You may create your own policy by implementing the com.datastax.oss.driver.api.core.specex.SpeculativeExecutionPolicy interface.
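The idea behind speculative execution can be demonstrated with JDK classes alone. In this sketch, the sendTo function is a hypothetical stand-in for sending a query to a coordinator node, and the node names are made up. Cancellation of the losing executions and error handling are omitted for brevity, so the returned future never completes if no node answers.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

final class SpeculativeExecutionSketch {

  // Tries each coordinator in turn, starting attempt i after i * delayMillis,
  // and completes with whichever response arrives first.
  static <T> CompletableFuture<T> execute(
      List<String> coordinators,
      Function<String, CompletableFuture<T>> sendTo,
      long delayMillis) {
    CompletableFuture<T> result = new CompletableFuture<>();
    for (int i = 0; i < coordinators.size(); i++) {
      String node = coordinators.get(i);
      Executor afterDelay =
          CompletableFuture.delayedExecutor(i * delayMillis, TimeUnit.MILLISECONDS);
      afterDelay.execute(() -> {
        if (!result.isDone()) {            // skip the extra execution if an answer already arrived
          sendTo.apply(node).thenAccept(result::complete);
        }
      });
    }
    return result;
  }

  public static void main(String[] args) {
    // node1 never answers; node2 answers immediately once it is tried.
    Function<String, CompletableFuture<String>> sendTo = node ->
        node.equals("node1")
            ? new CompletableFuture<String>()
            : CompletableFuture.completedFuture("from " + node);
    System.out.println(execute(List.of("node1", "node2"), sendTo, 100).join());
  }
}
```

Note that speculative executions send the same query more than once, so they are best suited to queries that are safe to repeat.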

Connection pooling

Because the CQL native protocol is asynchronous, it allows multiple simultaneous requests per connection; the maximum is 128 simultaneous requests in protocol v2, while v3 and later allow up to 32,768 simultaneous requests. Because of this larger number of simultaneous requests, fewer connections per node are required. In fact, the default is a single connection per node.

Connection pool settings are configurable via the advanced.connection configuration options, including the number of connections to use for local and remote hosts, and the maximum number of simultaneous requests per connection (defaults to 1,024). While the v4 driver does not provide the ability to scale the number of connections up and down as with previous versions, you can adjust these settings by updating the configuration file, and the changes will be applied at the next time the configuration file is reloaded.

The driver uses a connection heartbeat to make sure that connections are not closed prematurely by intervening network devices. This defaults to 30 seconds but can be overridden using the advanced.heartbeat configuration options.

Protocol version

The driver supports multiple versions of the CQL native protocol. Cassandra 4.0 uses CQL protocol version 5, while Cassandra 3.X releases support version 4.

By default, the driver negotiates the protocol version when establishing connections, even correctly handling connections to mixed clusters in which multiple versions of Cassandra are in use. You can force a protocol version using the advanced.protocol.version configuration option.

Execution profiles let you group a set of configuration overrides under a name and apply them to selected requests. For example, let’s say your default settings include a request timeout of one second and a consistency level of LOCAL_QUORUM. You could create an execution profile for requests that need a longer timeout and a stronger consistency level by adding this to the profiles section of the configuration file:

datastax-java-driver {
  profiles {
    long_request {
      basic.request.timeout = 3 seconds
      basic.request.consistency = QUORUM
    }
  }
}

Then, you can apply the values to a Statement:

// Statements are immutable, so keep the instance returned by the setter
statement = statement.setExecutionProfileName("long_request");

There is also a setExecutionProfileName() operation available when using the SimpleStatementBuilder. Or, if you create a PreparedStatement from a SimpleStatement (using CqlSession.prepare()), any execution profile you have set will be inherited by any BoundStatements created from the PreparedStatement.
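A profile only needs to list the options it overrides; anything it leaves out falls back to the default settings. The following sketch models that lookup with plain maps. The ProfileLookupSketch class is hypothetical and does not use the driver's configuration API, and the page size value is included only to show the fallback.

```java
import java.util.Map;

final class ProfileLookupSketch {

  // Defaults that apply to every request unless a profile overrides them.
  static final Map<String, String> DEFAULTS = Map.of(
      "basic.request.timeout", "1 second",
      "basic.request.consistency", "LOCAL_QUORUM",
      "basic.request.page-size", "5000");

  // Named profiles hold only the options they override.
  static final Map<String, Map<String, String>> PROFILES = Map.of(
      "long_request", Map.of(
          "basic.request.timeout", "3 seconds",
          "basic.request.consistency", "QUORUM"));

  // Looks in the named profile first, then falls back to the defaults.
  static String resolve(String profileName, String option) {
    Map<String, String> overrides = PROFILES.getOrDefault(profileName, Map.of());
    return overrides.getOrDefault(option, DEFAULTS.get(option));
  }

  public static void main(String[] args) {
    System.out.println(resolve("long_request", "basic.request.timeout"));
    System.out.println(resolve("long_request", "basic.request.page-size"));
  }
}
```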

Metadata

To access the cluster metadata, invoke the CqlSession.getMetadata() method, which returns an object implementing the com.datastax.oss.driver.api.core.metadata.Metadata interface. This object provides information about the cluster at a snapshot in time, including the nodes in the cluster, the tokens assigned to each node, and the schema, including keyspaces and tables.

Node discovery

A CqlSession maintains a control connection to the first node it connects with, which it uses to maintain information on the state and topology of the cluster. Using this connection, the driver will discover all the nodes currently in the cluster, and you can obtain this information through the Metadata.getNodes() operation, which returns a map of com.datastax.oss.driver.api.core.metadata.Node objects, keyed by host ID, to represent each node. You can view the state of each node through the Node.getState() operation, or you can register an implementation of the com.datastax.oss.driver.api.core.metadata.NodeStateListener interface to receive callbacks when nodes are added or removed from the cluster, or when they are up or down. This state information is also viewable in the driver logs, which we’ll discuss shortly.

Schema access

The Metadata class also allows the client to learn about the schema in a cluster, including operations that provide descriptions of individual keyspaces and tables. The schema version in use in a cluster can change over time as keyspaces and tables are created, altered, and deleted.

We discussed Cassandra’s support for eventual consistency at great length in Chapter 2. Because schema information is itself stored using Cassandra, it is also eventually consistent, and as a result it is possible for different nodes to have temporarily different versions of the schema. The driver has internal safeguards to check for schema agreement before initiating any statement that would change the schema. The driver provides a notification mechanism for clients to learn about schema changes by registering a com.datastax.oss.driver.api.core.metadata.schema.SchemaChangeListener with the CqlSession as it is built using the withSchemaChangeListener() operation on the builder, or via the advanced.schema-change-listener configuration option.

In addition to the schema access you’ve just examined in the Metadata class, the Java driver also provides a facility for managing schema in the com.datastax.oss.driver.api.querybuilder package. The SchemaBuilder provides a fluent-style API for creating Statements representing operations such as CREATE, ALTER, and DROP on keyspaces, tables, indexes, and user-defined types (UDTs).

For example, you could create the reservations_by_confirmation table using the createTable() schema builder:

import static com.datastax.oss.driver.api.querybuilder.SchemaBuilder.createTable;
import com.datastax.oss.driver.api.core.type.DataTypes;

cqlSession.execute(createTable("reservation", "reservations_by_confirmation")
  .ifNotExists()
  .withPartitionKey("confirmation_number", DataTypes.TEXT)
  .withColumn("hotel_id", DataTypes.TEXT)
  .withColumn("start_date", DataTypes.DATE)
  .withColumn("end_date", DataTypes.DATE)
  .withColumn("room_number", DataTypes.SMALLINT)
  .withColumn("guest_id", DataTypes.UUID)
  .build());

Managing Case-Sensitive Identifiers with the Java Driver

As you learned in Chapter 4, CQL is case-insensitive by default. While the practice is generally discouraged, it is possible to create case-sensitive names for keyspaces, tables, and columns by using quotes around identifiers in CQL. In order to simplify the handling of case sensitivity, the Java driver uses the CqlIdentifier class as a wrapper for all identifiers in its schema API. If you are writing code that manipulates schema, it’s a good practice to make use of these identifiers as well. Java Driver APIs that accept identifiers as arguments support both Java String (as shown previously) and CqlIdentifier formats (as shown in the Reservation Service implementation).
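The quoting rule is easy to get wrong, so here is a minimal sketch of how an identifier written in CQL maps to the name that is actually stored. The IdentifierCaseSketch class is hypothetical and is shown only to illustrate the rule; in application code you would rely on CqlIdentifier rather than a helper like this.

```java
import java.util.Locale;

final class IdentifierCaseSketch {

  // Converts an identifier as written in CQL to the name stored internally.
  static String toInternal(String cql) {
    boolean quoted = cql.length() >= 2 && cql.startsWith("\"") && cql.endsWith("\"");
    if (quoted) {
      // Quoted: keep the case, drop the outer quotes, unescape doubled quotes.
      return cql.substring(1, cql.length() - 1).replace("\"\"", "\"");
    }
    // Unquoted: case-insensitive, folded to lowercase.
    return cql.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    System.out.println(toInternal("Hotel_ID"));
    System.out.println(toInternal("\"Hotel_ID\""));
  }
}
```

Under this rule, Hotel_ID and hotel_id refer to the same column, while "Hotel_ID" in double quotes names a different, case-sensitive one.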

Debugging and Monitoring

The driver provides features for monitoring and debugging your client’s use of Cassandra, including facilities for logging and metrics. There are also capabilities for query tracing and tracking slow queries, which you’ll learn about in Chapter 13.

Driver logging

As you will learn in Chapter 11, Cassandra uses a logging API called Simple Logging Facade for Java (SLF4J). The Java driver uses the SLF4J API for logging as well. In order to enable logging on your Java client application, you need to provide a compliant SLF4J implementation on the classpath, such as Logback (used by the Reservation Service) or Log4j. The Java driver provides information at multiple levels; the ERROR, WARN, and INFO levels are the most useful to application developers.

You configure logging by taking advantage of Logback’s configuration mechanism, which supports separate configuration for test and production environments. Logback inspects the classpath first for the file logback-test.xml representing the test configuration, and then if no test configuration is found, it searches for the file logback.xml. Here’s an example extract from a logback.xml configuration file that enables the INFO log level for the Java driver:

<configuration>
  <!-- other appenders and loggers -->
  <logger name="com.datastax.oss.driver" level="INFO"/>
</configuration>

For more detail on Logback configuration, including sample configuration files for test and production environments, see the configuration page or the Reservation Service implementation.

Driver metrics

Sometimes it can be helpful to monitor the behavior of client applications over time in order to detect abnormal conditions and debug errors. The Java driver collects metrics on its activities and makes these available using the Dropwizard Metrics library. The driver reports metrics on connections, task queues, queries, and errors such as connection errors, read and write timeouts, retries, and speculative executions. A full list of metrics is available in the reference configuration.

You can access the Java driver metrics locally via the CqlSession.getMetrics() operation. The Metrics library can also integrate with the Java Management Extensions (JMX) to allow remote monitoring of metrics. We’ll discuss the remote monitoring of metrics from Cassandra nodes in Chapter 11, and the same techniques apply to gathering metrics from client applications. JMX reporting is disabled by default in the v4 drivers (it was enabled by default in v3), but can be configured.

Other Cassandra Drivers

There are several drivers available for other programming languages:

DataStax Python Driver

Summary

You should now understand the various drivers available for Cassandra, the features they provide, and how to install and use them. We gave particular attention to the DataStax Java Driver in order to get some hands-on experience, which should serve you well even if you choose to use one of the other DataStax or community drivers. You’ll continue to learn other driver features in the coming chapters as we discuss more details of reading and writing.