Now that we’ve looked at how to design a microservice architecture for a hotel application, let’s look at how you might implement one of the services within that application—the Reservation Service. To write an application using Cassandra, you’re going to need a driver, and thankfully you are in good hands.
You’re likely used to connecting to relational databases using drivers. For example, in Java, JDBC is an API that abstracts the vendor implementation of the relational database to present a consistent way of storing and retrieving data using Statements, PreparedStatements, ResultSets, and so forth. To interact with the database, you get a driver that works with the particular database you’re using, such as Oracle, SQL Server, or MySQL; the implementation details of this interaction are hidden from the developer.
There are a number of client drivers available for Cassandra as well, including support for most popular languages. There are benefits to these clients, in that you can easily embed them in your own applications, and that they frequently offer more features than the CQL native interface does, including connection pooling and JMX integration and monitoring. In the following sections, you’ll learn about the various clients available and the features they offer.
The introduction of CQL was the impetus for a major shift in the landscape of Cassandra client drivers. The simplicity and familiar syntax of CQL made the development of client programs similar to traditional relational database drivers. DataStax made a strategic investment in open source drivers for Java and several additional languages in order to fuel Cassandra adoption. These drivers quickly became the de facto standard for new development projects. You can access the drivers as well as additional connectors and tools at https://github.com/datastax.
Visit the driver matrix page to access documentation and identify driver versions that are compatible with your server version.
The DataStax Java Driver is the oldest and most popular of these drivers, and typically the driver in which new features appear first. For this reason, we’ll focus on using the Java driver and use this as an opportunity to learn about the features that are provided by the DataStax drivers across multiple languages.
First, you’ll need to access the driver in your development environment. You could download the driver directly from the URL listed and manage the dependencies manually, but it is more typical in modern Java development to use a tool like Maven or Gradle to manage dependencies. If you’re using Maven, you’ll need to add something like the following to your project pom.xml file, while specifying a value for the driver version:
<dependency>
  <groupId>com.datastax.oss</groupId>
  <artifactId>java-driver-core</artifactId>
  <version>${driver.version}</version>
</dependency>
You can find more information in the online manual for the Java driver, as well as in the driver’s Javadoc. The Javadocs are also included in the source distribution.
All of the DataStax drivers are managed as open source projects on GitHub. If you’re interested in seeing the Java driver source, you can get a read-only trunk version using this command:
$ git clone https://github.com/datastax/java-driver.git
If you’re interested in learning more about the internals of the driver, or even potentially contributing to the project, the DataStax documentation site also has a developer guide.
The 4.0 release of the Java driver included significant breaking changes to the API and configuration of the driver in order to simplify application development and discourage configurations contrary to best practices. This book conforms to the newer APIs. The “Clients” chapter in the second edition of this book remains a good resource for those using the Java Driver 3.x and earlier.
In September 2019, DataStax announced a significant change to its driver strategy. Prior to that point, DataStax had maintained separate open source and enterprise drivers for use with Apache Cassandra and DataStax Enterprise, respectively. In early 2020, the codebases for the drivers in each of the supported languages were merged, bringing the benefits of several performance and availability improvements that were previously only available to DSE customers. DSE-specific driver features are out of the scope of this book, but are well documented on the sites we’ve referenced.
Once you’ve configured your environment, it’s time to start coding. We’ll base the code samples for this chapter around the Reservation Service, a microservice implementation based on the hotel data model introduced in Chapter 5, and the corresponding application design discussed in Chapter 7. The source code for the Reservation Service is available at https://github.com/jeffreyscarpenter/reservation-service.
To start building your application, you’ll use the driver’s API to connect to a cluster. In the Java driver, connectivity to a cluster is represented by the com.datastax.oss.driver.api.core.CqlSession interface.
CqlSession is the main entry point of the driver. It supports a fluent-style API using the builder pattern. For example, the following line creates a CqlSession that will attempt to connect to a Cassandra node on the local host at the default Cassandra native protocol port number:
CqlSession cqlSession = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
    .build();
Previous versions of DataStax drivers supported the concept of a Cluster object used to create Session objects. Recent driver versions (for example, the 4.0 Java driver and later) have combined Cluster and Session into CqlSession.
In the terminology of the driver, the nodes you explicitly identify when creating a CqlSession are known as contact points. Contact points are similar to the concept of seed nodes that a Cassandra node uses to connect to other nodes in the same cluster.
The minimum required information to create a CqlSession is a single contact point. The driver defaults to a single contact point consisting of the local host and default port, so this statement is equivalent to the previous one (unless you are using file-based configuration, as we describe in “File-based configuration”):
CqlSession cqlSession = CqlSession.builder().build();
While this configuration is useful for development, when you might be running a Cassandra node on your local machine, for production environments you’ll want to specify multiple contact points. This is a good practice in case one of the nodes you pick happens to be down when the client application is attempting to create a CqlSession. You’ll also need to specify the name of the local data center. We’ll discuss naming data centers in Chapter 10.
CqlSession cqlSession = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("<some IP address>", 9042))
    .addContactPoint(new InetSocketAddress("<another IP address>", 9042))
    .withLocalDatacenter("<data center name>")
    .build();
When you create a CqlSession, the driver connects to one of the configured contact points to obtain metadata about the cluster. This action will throw an AllNodesFailedException (the 4.x counterpart of the NoHostAvailableException used by earlier driver versions) if none of the contact points is available, or an AuthenticationException if authentication fails. We’ll discuss authentication in more detail in Chapter 14.
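To see why listing more than one contact point helps, here is a minimal sketch in plain Java that does not use the driver at all. The ContactPointSketch class, the firstReachable() helper, and the reachability check passed to it are all hypothetical, and the idea of trying the points in list order is an assumption made for illustration rather than a description of how the driver chooses a node.

```java
import java.net.InetSocketAddress;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class ContactPointSketch {

    // Returns the first contact point that the supplied check reports as reachable.
    // An empty result corresponds to the case where no contact point is available.
    static Optional<InetSocketAddress> firstReachable(
            List<InetSocketAddress> contactPoints,
            Predicate<InetSocketAddress> isReachable) {
        return contactPoints.stream().filter(isReachable).findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical addresses; createUnresolved() avoids any DNS lookup.
        List<InetSocketAddress> contactPoints = List.of(
            InetSocketAddress.createUnresolved("10.0.0.1", 9042),
            InetSocketAddress.createUnresolved("10.0.0.2", 9042));

        // Pretend the first node is down and only the second one answers.
        Predicate<InetSocketAddress> onlySecondIsUp =
            address -> address.getHostString().equals("10.0.0.2");

        System.out.println(firstReachable(contactPoints, onlySecondIsUp));
    }
}
```

With a single contact point in the list, the same outage would leave the client with nothing to connect to.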
You can optionally provide the name of a keyspace to connect to, as in this example that connects to the reservation keyspace:
CqlSession cqlSession = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("<some IP address>", 9042))
    .addContactPoint(new InetSocketAddress("<another IP address>", 9042))
    .withLocalDatacenter("<data center name>")
    .withKeyspace("reservation")
    .build();
If you do not specify a keyspace name when creating the CqlSession, you’ll have to qualify every table reference in your queries with the appropriate keyspace name.
Each CqlSession manages connections to a Cassandra cluster, which are used to execute queries and control operations using the Cassandra native protocol. The CqlSession contains a pool of TCP connections for each host.
Because a CqlSession maintains TCP connections to multiple nodes, it is a relatively heavyweight object. In most cases, you’ll want to create a single CqlSession and reuse it throughout your application, rather than continually building up and tearing down CqlSessions. Another acceptable option is to create a CqlSession per keyspace, if your application is accessing multiple keyspaces.
Once you have created a CqlSession to connect to a cluster, you’re ready to perform reads or writes. To begin doing some real application work, you’ll create and execute CQL statements using implementations of Statement. Statement is an interface with several implementations, including SimpleStatement, BoundStatement, and BatchStatement.
The simplest way to create and execute a statement is to call the CqlSession.execute() operation with a string representing the statement. Here’s an example of a statement that will return the entire contents of the reservations_by_confirmation table:
cqlSession.execute("SELECT * from reservation.reservations_by_confirmation");
This statement creates and executes a query in a single method call. In practice, this could turn out to be a very expensive query to execute in a large database, but it does serve as a useful example of a very simple query. Most queries will be more complex, as you’ll have search criteria to specify, or specific values to insert. You can certainly use Java’s various string utilities to build up the syntax of your query by hand, but this, of course, is error prone. It may even expose your application to injection attacks if you’re not careful to sanitize strings that come from end users.
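To make the risk of hand-built query strings concrete, here is a small sketch in plain Java that uses no driver classes. The InjectionSketch class, its method names, and the hostile input are hypothetical. The only point is that input spliced into the CQL text can change the shape of the statement, whereas a fixed statement with a placeholder keeps the value as data; whether Cassandra would accept the altered statement is beside the point.

```java
import java.util.List;

public class InjectionSketch {

    // Risky: splices user input straight into the CQL text.
    static String concatenated(String confirmationNumber) {
        return "SELECT * FROM reservations_by_confirmation WHERE confirmation_number = '"
            + confirmationNumber + "'";
    }

    // Safer shape: the CQL text never changes and the value travels separately.
    static final String PARAMETERIZED =
        "SELECT * FROM reservations_by_confirmation WHERE confirmation_number = ?";

    static List<Object> values(String confirmationNumber) {
        return List.of(confirmationNumber);
    }

    public static void main(String[] args) {
        // The embedded quote ends the string literal early, so the rest of the
        // input is read as part of the statement rather than as data.
        String hostile = "nobody' AND hotel_id = 'NY456";

        System.out.println(concatenated(hostile));
        System.out.println(PARAMETERIZED + " with values " + values(hostile));
    }
}
```

In the second form the hostile string is still just one value, however many quotes it contains.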
Thankfully, you needn’t make things so hard on yourself. The Java driver provides the SimpleStatement class to help construct parameterized statements. As it turns out, the execute() operation is a convenience method for creating a SimpleStatement. The previous code is equivalent to the following, using the SimpleStatement.newInstance() method:
cqlSession.execute(SimpleStatement.newInstance( "SELECT * from reservation.reservations_by_confirmation"));
The newInstance() method is most useful in cases where you already have a set query string. Let’s try building a query with variable parameters using a SimpleStatementBuilder. Here’s an example of a statement that will insert a row in the reservations_by_confirmation table, which you can then execute:
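SimpleStatement reservationInsert = SimpleStatement.builder(
    "INSERT INTO reservations_by_confirmation (confirmation_number, hotel_id, "
        + "start_date, end_date, room_number, guest_id) VALUES (?, ?, ?, ?, ?, ?)")
    .addPositionalValue("RS2G0Z")
    .addPositionalValue("NY456")
    // date columns take a java.time.LocalDate rather than a String
    .addPositionalValue(LocalDate.parse("2020-06-08"))
    .addPositionalValue(LocalDate.parse("2020-06-10"))
    // assumes room_number is a smallint, matching the short used later in this chapter
    .addPositionalValue((short) 111)
    // uuid columns take a java.util.UUID rather than a String
    .addPositionalValue(UUID.fromString("1b4d86f4-ccff-4256-a63d-45c905df2677"))
    .build();

cqlSession.execute(reservationInsert);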
The first parameter to the call is the basic syntax of your query, indicating the table and columns you are interested in. The question marks are used to indicate values that you’ll be providing in additional parameters. The values are then added in the same order as the columns: the confirmation number, hotel ID, start and end dates, room number, and guest ID.
If you’ve created your statement correctly, the insert will execute successfully (and silently). Now let’s create another statement to read back the row you just inserted:
SimpleStatement reservationSelect = SimpleStatement.builder(
    "SELECT * FROM reservations_by_confirmation WHERE confirmation_number=?")
    .addPositionalValue("RS2G0Z")
    .build();

ResultSet reservationSelectResult = cqlSession.execute(reservationSelect);
Again, you make use of parameterization to provide the ID for the search. This time, when you execute the query, make sure to receive the ResultSet that is returned from the execute() method. You can iterate through the rows returned by the ResultSet as follows:
for (Row row : reservationSelectResult) {
  System.out.format("confirmation_number: %s, hotel_id: %s, start_date: %s, "
      + "end_date: %s, room_number: %d, guest_id: %s\n",
      row.getString("confirmation_number"),
      row.getString("hotel_id"),
      row.getLocalDate("start_date"),
      row.getLocalDate("end_date"),
      row.getShort("room_number"), // assumes room_number is a smallint column
      row.getUuid("guest_id"));
}
This code uses the ResultSet.iterator() option to get an Iterator over the rows in the result set and loop over each row, printing out the desired column values. Note that you use special accessors to obtain the value of each column, depending on the desired type: in this case, Row.getString(), getLocalDate(), getShort(), and getUuid(). As you might expect, this will print out a result such as:
confirmation_number: RS2G0Z, hotel_id: NY456, start_date: 2020-06-08, end_date: 2020-06-10, room_number: 111, guest_id: 1b4d86f4-ccff-4256-a63d-45c905df2677
Of course, you typically will set columns to values you receive as variables, rather than the hardcoded values used here. You can find code samples for working with SimpleStatements on the simple-statement-solution branch of the Reservation Service repository.
While SimpleStatements are quite useful for creating ad hoc queries, most applications tend to perform the same set of queries repeatedly. The PreparedStatement is designed to handle these queries more efficiently. The structure of the statement is sent to nodes a single time for preparation, and a handle for the statement is returned. To use the prepared statement, only the handle and the parameters need to be sent.
As you’re building your application, you’ll typically create PreparedStatements for reading data, corresponding to each access pattern you derive in your data model, plus others for writing data to your tables to support those access patterns.
Let’s create some PreparedStatements to represent the same reservation queries as before, using the CqlSession.prepare() operation:
PreparedStatement reservationInsertPrepared = cqlSession.prepare(
    "INSERT INTO reservations_by_confirmation (confirmation_number, hotel_id, "
        + "start_date, end_date, room_number, guest_id) VALUES (?, ?, ?, ?, ?, ?)");

PreparedStatement reservationSelectPrepared = cqlSession.prepare(
    "SELECT * FROM reservations_by_confirmation WHERE confirmation_number=?");
Note that the PreparedStatement uses the same parameterized syntax used earlier for the SimpleStatement. A key difference, however, is that a PreparedStatement is not a subtype of Statement. This prevents the error of trying to pass an unbound PreparedStatement to the CqlSession to execute. Note that there is also a variant of CqlSession.prepare() that accepts a parameterized SimpleStatement as input.
Let’s take a step back and discuss what is happening behind the scenes of the CqlSession.prepare() operation:
The driver passes the contents of your PreparedStatement to a Cassandra node and gets back a unique identifier for the statement. This unique identifier is referenced when you create a BoundStatement. If you’re curious, you can actually see this reference by calling PreparedStatement.getId().
Once the driver prepares the statement on one node, it proceeds to prepare the statement on the other nodes in the cluster. Nodes keep track of prepared statements internally. In earlier releases, prepared statements were stored in a cache, but beginning with the 3.10 release, each Cassandra node stores prepared statements in a local table so that they are present if the node goes down and comes back up.
The driver also provides the advanced.prepared-statements.reprepare-on-up configuration options; this is primarily useful if your cluster is using a release prior to Cassandra 3.10. If re-preparation is enabled (the default), the driver will re-prepare statements on nodes that have come back up.
If the driver tries to execute a PreparedStatement on a node where it has not been prepared, the driver automatically prepares the statement, at the cost of an additional round trip between the driver and the node.
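The steps above can be sketched as a toy model in plain Java. Nothing here is driver or server code: the PreparedStatementSketch and Node classes, the MD5-based identifier, and the round-trip counter are all assumptions made purely to illustrate the idea of preparing once, sending only an identifier and values afterward, and paying one extra round trip when a node has not seen the statement.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PreparedStatementSketch {

    /** Toy stand-in for a node that remembers prepared statements by identifier. */
    static class Node {
        private final Map<String, String> prepared = new HashMap<>();
        int roundTrips = 0;

        // The full query text crosses the wire only here.
        String prepare(String cql) {
            roundTrips++;
            String id = idFor(cql);
            prepared.put(id, cql);
            return id;
        }

        // Returns false when the identifier is unknown to this node.
        boolean execute(String id, List<Object> values) {
            roundTrips++;
            return prepared.containsKey(id);
        }
    }

    // Assumption for illustration: derive the identifier by hashing the query text.
    static String idFor(String cql) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(cql.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Client side: send identifier and values; re-prepare and retry once if needed.
    static void executePrepared(Node node, String cql, String id, List<Object> values) {
        if (!node.execute(id, values)) {
            node.prepare(cql);
            node.execute(id, values);
        }
    }

    public static void main(String[] args) {
        String cql = "SELECT * FROM reservations_by_confirmation WHERE confirmation_number=?";
        Node first = new Node();
        Node second = new Node(); // never saw the prepare request

        String id = first.prepare(cql);
        executePrepared(first, cql, id, List.of("RS2G0Z"));
        executePrepared(second, cql, id, List.of("RS2G0Z"));

        System.out.println("first node round trips: " + first.roundTrips);
        System.out.println("second node round trips: " + second.roundTrips);
    }
}
```

The second node needs one more exchange than a plain execution would, which is the extra round trip described in the last step.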
You can think of a PreparedStatement as a template for creating queries. In addition to specifying the form of your query, there are other attributes that you can set on a PreparedStatement that will be used as defaults for statements it is used to create, including a default consistency level, retry policy, and tracing.
In addition to improving efficiency, PreparedStatements also improve security by separating the query logic of CQL from the data. This provides protection against injection attacks, which attempt to embed commands into data fields in order to gain unauthorized access.
Now your PreparedStatement is available to use to create queries. In order to make use of a PreparedStatement, you bind it with actual values by calling the bind() operation. For example, you can bind the SELECT statement created earlier as follows:
BoundStatement reservationSelectBound = reservationSelectPrepared.bind("RS2G0Z");
The bind() operation used here allows you to provide values that match each variable in the PreparedStatement. It is possible to provide the first n bound values, in which case the remaining values must be bound separately before executing the statement. There is also a version of bind() that takes no parameters, in which case all of the parameters must be bound separately. There are several set() operations provided by BoundStatement that can be used to bind values of different types. For example, you can take the INSERT prepared statement from earlier and bind its values using setString() and the other typed set operations:
BoundStatement reservationInsertBound = reservationInsertPrepared.bind()
    .setString("confirmation_number", "RS2G0Z")
    .setString("hotel_id", "NY456")
    .setLocalDate("start_date", LocalDate.parse("2020-06-08"))
    .setLocalDate("end_date", LocalDate.parse("2020-06-10"))
    .setShort("room_number", (short) 111)
    .setUuid("guest_id", UUID.fromString("1b4d86f4-ccff-4256-a63d-45c905df2677"));
Once you have bound all of the values, execute a BoundStatement using CqlSession.execute(). If you have failed to bind any of the values, they will be ignored on the server side, if protocol v4 (Cassandra 2.2 or later) is in use. The driver behavior for older protocol versions is to throw an IllegalStateException if there are any unbound values.
You can find code samples for working with PreparedStatement and BoundStatement on the prepared-statement-solution branch of the Reservation Service repository.
The driver also provides a QueryBuilder, which uses a fluent-style API for creating queries programmatically. This is especially useful for cases where there is variation in the query structure (such as optional parameters) that would make using PreparedStatements difficult. Similar to PreparedStatement, it also provides some protection against injection attacks.
To use the QueryBuilder, you’ll need to include an additional dependency, for example, in a Maven POM file:
<dependency>
  <groupId>com.datastax.oss</groupId>
  <artifactId>java-driver-query-builder</artifactId>
  <version>${driver.version}</version>
</dependency>
The QueryBuilder provides a set of static methods to facilitate building different types of statements represented by different classes. The common usage is to import the static methods of the QueryBuilder class:
import static com.datastax.oss.driver.api.querybuilder.QueryBuilder.*;
Importing methods statically improves code readability, as you’ll see as you look at some examples.
The QueryBuilder produces objects that implement the com.datastax.oss.driver.api.querybuilder.BuildableQuery interface and its sub-interfaces, such as Select, Insert, Update, Delete, and others. The methods on these interfaces return objects that represent the content of a query as it is being built up. You’ll likely find your IDE quite useful in helping to identify the allowed operations as you’re building queries.
Let’s reproduce the queries from before using the QueryBuilder to see how it works. First, build a CQL INSERT query:
// literal() wraps a Java value as a CQL term, which is what value() expects
Insert reservationInsert =
    insertInto("reservation", "reservations_by_confirmation")
        .value("confirmation_number", literal("RS2G0Z"))
        .value("hotel_id", literal("NY456"))
        .value("start_date", literal(LocalDate.parse("2020-06-08")))
        .value("end_date", literal(LocalDate.parse("2020-06-10")))
        .value("room_number", literal(111))
        .value("guest_id",
            literal(UUID.fromString("1b4d86f4-ccff-4256-a63d-45c905df2677")));

SimpleStatement reservationInsertStatement = reservationInsert.build();
The first operation calls the QueryBuilder.insertInto() operation to create an Insert statement for the reservations_by_confirmation table. Then use the Insert.value() operation repeatedly to specify values for each column you are inserting. The Insert.build() operation returns a SimpleStatement you can then pass to CqlSession.execute().
The construction of the CQL SELECT command is similar:
Select reservationSelect =
    selectFrom("reservation", "reservations_by_confirmation")
        .all()
        .whereColumn("confirmation_number").isEqualTo(literal("RS2G0Z"));

SimpleStatement reservationSelectStatement = reservationSelect.build();
For this query, call QueryBuilder.selectFrom() to create a Select statement. You use the Select.all() operation to select all columns, although you could also have used the column() operation to select specific columns. Add a CQL WHERE clause via the Select.whereColumn() operation, to which you pass the name of the column and then add an equality check for the confirmation number, using the isEqualTo() operation.
This sample demonstrates how you can use the QueryBuilder to create a PreparedStatement instead of a SimpleStatement, using the concept of a bind marker as a placeholder for a value to be specified when the PreparedStatement is bound:
Select reservationSelect =
    selectFrom("reservation", "reservations_by_confirmation")
        .all()
        .whereColumn("confirmation_number").isEqualTo(bindMarker());

PreparedStatement reservationSelectPrepared =
    cqlSession.prepare(reservationSelect.build());

// later: bind() returns a BoundStatement, not a SimpleStatement
BoundStatement reservationSelectBound =
    reservationSelectPrepared.bind("RS2G0Z");
For a complete code sample using the QueryBuilder, see the query-builder-solution branch of the Reservation Service repository.
You’ve learned several techniques for creating and executing query statements with the driver. There is one final technique to look at that provides a bit more abstraction. The Java driver provides an object mapper that allows you to focus on developing and interacting with domain models (or data types used on APIs). The object mapper works off of annotations in source code that are used to map Java classes to tables or user-defined types (UDTs). The object mapper is a useful tool for abstracting some of the details of interacting with Cassandra, especially if you have an existing domain model.
The mapper is provided in two separate libraries for use at compile time and runtime, so you will need to include additional Maven dependencies in order to use the mapper in your project. You’ll add the following dependency to the compile path of your application:
<dependency>
  <groupId>com.datastax.oss</groupId>
  <artifactId>java-driver-mapper-processor</artifactId>
  <version>${driver.version}</version>
</dependency>
You’ll also add the runtime library as a runtime dependency:
<dependency>
  <groupId>com.datastax.oss</groupId>
  <artifactId>java-driver-mapper-runtime</artifactId>
  <version>${driver.version}</version>
</dependency>
The mapper API is based on standard design patterns for data access, including entity classes and Data Access Objects (DAOs). You create an entity class to represent each table in your design, a DAO interface to specify queries on entities, and a mapper interface that helps generate DAO instances. The mapper generates code based on the classes and interfaces you provide.
For a complete example of using the mapper, you’ll want to look at the mapper-solution branch of the Reservation Service repository. We’ll share some of the highlights here. Let’s begin by creating a ReservationsByConfirmation entity class that will represent rows in the reservations_by_confirmation table:
import java.time.LocalDate;
import java.util.UUID;

import com.datastax.oss.driver.api.mapper.annotations.Entity;
import com.datastax.oss.driver.api.mapper.annotations.NamingStrategy;
import com.datastax.oss.driver.api.mapper.annotations.PartitionKey;

import static com.datastax.oss.driver.api.mapper.entity.naming.NamingConvention.SNAKE_CASE_INSENSITIVE;

@Entity
@NamingStrategy(convention = SNAKE_CASE_INSENSITIVE)
public class ReservationsByConfirmation {

  @PartitionKey
  private String confirmationNumber;

  private String hotelId;
  private LocalDate startDate;
  private LocalDate endDate;
  private short roomNumber;
  private UUID guestId;

  // constructors, get/set methods, hashcode, equals
}
There are several annotations used in this example. The class is denoted as an @Entity, and also as having a @NamingStrategy, which is a way of specifying how the mapper should correlate Java identifiers to CQL. For example, you can specify a SNAKE_CASE_INSENSITIVE convention as in the preceding code, which means that the mapper will convert Java-style class and member names to lowercase, with underscores separating words, which is the recommended CQL naming style. Thus the class name ReservationsByConfirmation will be mapped to the reservations_by_confirmation table, the confirmationNumber member will be mapped to the confirmation_number column, and so on.
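If it helps to see the naming convention spelled out, here is a minimal sketch of the conversion in plain Java. The NamingSketch class and its toSnakeCase() method are hypothetical and are not the mapper’s own implementation; in particular, this simple version puts an underscore before every uppercase letter, so it makes no attempt to treat acronyms specially.

```java
public class NamingSketch {

    // Lowercases the name and inserts an underscore before each uppercase
    // letter that is not the first character.
    static String toSnakeCase(String javaName) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < javaName.length(); i++) {
            char c = javaName.charAt(i);
            if (Character.isUpperCase(c) && i > 0) {
                out.append('_');
            }
            out.append(Character.toLowerCase(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSnakeCase("ReservationsByConfirmation"));
        System.out.println(toSnakeCase("confirmationNumber"));
    }
}
```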
The Reservation Service uses an additional entity class ReservationsByHotelDate that is used with the reservations_by_hotel_date table. Its implementation is quite similar, so we won’t reproduce it here.
You can also create entity classes corresponding to UDTs. If your domain model contains classes that reference other classes, you can annotate the referenced classes as user-defined types with the @Entity annotation. The object mapper processes objects recursively using your annotated types.
Next, you’ll create a DAO interface to represent queries on these entity classes:
import java.time.LocalDate;

import com.datastax.oss.driver.api.core.PagingIterable;
import com.datastax.oss.driver.api.mapper.annotations.*;

@Dao
public interface ReservationDao {

  @Select
  ReservationsByConfirmation findByConfirmationNumber(String confirmationNumber);

  @Query("SELECT * FROM ${tableId}")
  PagingIterable<ReservationsByConfirmation> findAll();

  @Insert
  void save(ReservationsByConfirmation reservationsByConfirmation);

  @Delete
  void delete(ReservationsByConfirmation reservationsByConfirmation);

  @Select(customWhereClause = "hotel_id = :hotelId AND start_date = :date")
  PagingIterable<ReservationsByHotelDate> findByHotelDate(
      @CqlName("hotel_id") String hotelId,
      @CqlName("start_date") LocalDate date);

  @Insert
  void save(ReservationsByHotelDate reservationsByHotelDate);

  @Delete
  void delete(ReservationsByHotelDate reservationsByHotelDate);
}
The ReservationDao interface is annotated as @Dao, and the various queries are marked with annotations such as @Select, @Insert, @Delete, and @Query.
The next step is to create a Mapper interface that can be used to obtain DAO instances:
import com.datastax.oss.driver.api.mapper.annotations.DaoFactory;
import com.datastax.oss.driver.api.mapper.annotations.Mapper;

@Mapper
public interface ReservationMapper {

  @DaoFactory
  ReservationDao reservationDao();
}
Annotate the interface with @Mapper, and each operation that returns a DAO with @DaoFactory. When you compile the application, the object mapper interprets your annotations to create a ReservationMapperBuilder class that you can invoke to obtain an implementation of the ReservationMapper interface that wraps the CqlSession, and from there obtain an object implementing the ReservationDao interface:
ReservationMapper reservationMapper =
    new ReservationMapperBuilder(cqlSession).build();
ReservationDao reservationDao = reservationMapper.reservationDao();
Since the mapper and DAO objects are using your CqlSession, you should reuse them just as you do the CqlSession.
Now you can use the ReservationDao to perform queries using your entity classes. Create a ReservationsByConfirmation object using a simple constructor that you can save using the DAO:
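// assumes a constructor taking (String, String, LocalDate, LocalDate, short, UUID)
ReservationsByConfirmation reservation = new ReservationsByConfirmation(
    "RS2G0Z", "NY456",
    LocalDate.parse("2020-06-08"), LocalDate.parse("2020-06-10"),
    (short) 111,
    UUID.fromString("1b4d86f4-ccff-4256-a63d-45c905df2677"));

reservationDao.save(reservation);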
You can use the java.util.UUID.fromString() operation here for convenience; in most applications, the value would have been passed in via a remote invocation.
The ReservationDao.save() operation is all you need to execute to perform a CQL INSERT or UPDATE, as these are really the same operation to Cassandra. The generated DAO implementation builds and executes the statement on your behalf.
To retrieve a specific reservation, use the ReservationDao.findByConfirmationNumber() operation, passing in an argument list that matches the partition key:
ReservationsByConfirmation reservation = reservationDao.findByConfirmationNumber("RS2G0Z");
Deleting a reservation is also straightforward:
reservationDao.delete(reservation);
The object mapper documentation describes more advanced features, including DAO methods that execute asynchronously, the ability to configure CQL statement options such as TTL or consistency level, and customizing how the mapper handles annotations.
The CqlSession.execute() operation is synchronous, which means that it blocks until a result is obtained or an error occurs, such as a network timeout. The driver also provides the asynchronous executeAsync() operation to support non-blocking interactions with Cassandra. These non-blocking requests can make it simpler to send multiple queries in parallel to speed performance of your client application.
You could take any of the Statements from the preceding examples and execute them asynchronously:
CompletionStage<AsyncResultSet> resultStage = cqlSession.executeAsync(statement);
As of the 4.0 release of the DataStax Java driver, the result of executeAsync() and other asynchronous methods is of the CompletionStage type introduced in Java 8. (Previous versions in the 3.x series relied on the ListenableFuture interface from Google’s Guava framework.) The CompletionStage represents a stage of a computation. These stages can be chained together so that when a stage completes, other dependent stages are triggered.
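If you haven’t worked with CompletionStage before, the following sketch shows the chaining idea using only the JDK. The StageChainSketch class and its findHotelId() and deleteByHotel() methods are hypothetical stand-ins for asynchronous calls; they return canned values and do not talk to Cassandra.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

public class StageChainSketch {

    // Hypothetical stand-in for an asynchronous lookup.
    static CompletionStage<String> findHotelId(String confirmationNumber) {
        return CompletableFuture.supplyAsync(() -> "NY456");
    }

    // Hypothetical stand-in for a second asynchronous call that depends on the first.
    static CompletionStage<String> deleteByHotel(String hotelId) {
        return CompletableFuture.supplyAsync(() -> "deleted reservation at " + hotelId);
    }

    // thenCompose() starts the second stage only once the first has completed,
    // feeding it the first stage's result.
    static CompletionStage<String> lookupThenDelete(String confirmationNumber) {
        return findHotelId(confirmationNumber)
            .thenCompose(StageChainSketch::deleteByHotel);
    }

    public static void main(String[] args) {
        lookupThenDelete("RS2G0Z")
            // whenComplete() observes either the result or the error of the chain.
            .whenComplete((result, error) -> {
                if (error != null) {
                    System.out.println("Failed: " + error.getMessage());
                } else {
                    System.out.println(result);
                }
            })
            // Block here only so that the demo does not exit before the chain finishes.
            .toCompletableFuture()
            .join();
    }
}
```

An exception thrown in either stage skips the remaining stages and surfaces in whenComplete(), which is why a single error check at the end of the chain is enough.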
With the Java driver, the asynchronous APIs can be used to assemble processing chains consisting of CQL queries and code that processes their results. Consider a chain in which the results of a SELECT query are used as inputs to perform a second query. In this example, you might load a reservation you wish to delete from the reservations_by_confirmation table in a preliminary selectStage, in order to obtain the primary key columns you can then use to delete the reservation from the reservations_by_hotel_date table in a subsequent deleteStage:
// Load the reservation by confirmation number
CompletionStage<AsyncResultSet> selectStage = cqlSession.executeAsync(
    "SELECT * FROM reservations_by_confirmation WHERE confirmation_number='RS2G0Z'");

// Use fields of the reservation to delete from other table
CompletionStage<AsyncResultSet> deleteStage =
    selectStage.thenCompose(
        resultSet -> {
          Row reservationRow = resultSet.one();
          return cqlSession.executeAsync(SimpleStatement.newInstance(
              "DELETE FROM reservations_by_hotel_date "
                  + "WHERE hotel_id = ? AND start_date = ? AND room_number = ?",
              reservationRow.getString("hotel_id"),
              reservationRow.getLocalDate("start_date"),
              // assumes room_number is a smallint column
              reservationRow.getShort("room_number")));
        });

// Check results for success
deleteStage.whenComplete(
    (resultSet, error) -> {
      if (error != null) {
        System.out.printf("Failed to delete: %s\n", error.getMessage());
      } else {
        System.out.println("Delete successful");
      }
    });
We simplified this for readability, as you might wish to use prepared statements, or take advantage of a batch to delete from the reservations_by_confirmation and reservations_by_hotel_date tables at the same time. You can find more extensive code samples using the asynchronous APIs on the async-solution branch of the Reservation Service repository.
In addition to the CqlSession.executeAsync() operation, the driver supports several other asynchronous operations, including CqlSession.closeAsync(), CqlSession.prepareAsync(), and several operations on the object mapper. You can also build the CqlSession asynchronously using CqlSessionBuilder.buildAsync(). For more information, see the Java driver’s asynchronous programming documentation.
If you’re interested in even more advanced asynchronous programming in Java, you may be familiar with reactive streams, an initiative to provide asynchronous stream processing with non-blocking back pressure. Reactive streams APIs became an official part of the Java platform in JDK 9 under the java.util.concurrent.Flow.*
interfaces.
Beginning with the 4.4 release, the Java driver provides built-in support for reactive queries. The CqlSession
interface extends a new ReactiveSession
interface, which adds methods such as executeReactive()
to process queries expressed as reactive streams. To learn more about these APIs, see the Java driver reactive streams documentation.
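To make the idea of non-blocking back pressure more concrete, here is a minimal sketch that uses only the JDK's java.util.concurrent.Flow interfaces and SubmissionPublisher. It does not use the driver's reactive API. The class name, the subscriber, and the sample confirmation numbers are hypothetical. The point is that the subscriber sets the pace by requesting one item at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

// Hypothetical illustration using only JDK classes; this is not driver code.
class FlowBackPressureSketch {

  /** Subscriber that asks for one item at a time, which is how demand is signaled. */
  static final class OneAtATimeSubscriber implements Flow.Subscriber<String> {
    final List<String> received = new ArrayList<>();
    final CountDownLatch finished = new CountDownLatch(1);
    private Flow.Subscription subscription;

    @Override
    public void onSubscribe(Flow.Subscription subscription) {
      this.subscription = subscription;
      subscription.request(1); // request only the first item
    }

    @Override
    public void onNext(String item) {
      received.add(item);
      subscription.request(1); // signal readiness for one more item
    }

    @Override
    public void onError(Throwable error) {
      finished.countDown();
    }

    @Override
    public void onComplete() {
      finished.countDown();
    }
  }

  /** Publishes the items and returns them in the order the subscriber received them. */
  static List<String> publishAndCollect(List<String> items) {
    OneAtATimeSubscriber subscriber = new OneAtATimeSubscriber();
    try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
      publisher.subscribe(subscriber);
      for (String item : items) {
        publisher.submit(item); // blocks only if the subscriber's buffer is full
      }
    } // close() sends onComplete after the buffered items are delivered
    try {
      subscriber.finished.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for completion", e);
    }
    return subscriber.received;
  }

  public static void main(String[] args) {
    System.out.println(publishAndCollect(List.of("RS2G0Z", "QX7KB1", "M3N8PL")));
  }
}
```

A reactive query result can be consumed in the same spirit: the subscriber's demand, rather than the publisher's speed, determines how quickly rows flow to your code.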
You’ve already looked at a few of the available options for configuring the driver, but now let’s take a step back and look at its overall configuration approach.
While the CqlSession
may be configured programmatically via the CqlSessionBuilder
class, the Java driver also supports a file-based configuration approach that provides a fuller set of configuration options. File-based configuration is based on the Typesafe Config project, an open source library that provides configuration for JVM languages. In most cases it is preferable to use configuration values based on a configuration file rather than programmatic statements. For example, the configuration values provided previously could be specified in a configuration file such as the one provided for the Reservation Service:
datastax-java-driver {
  basic {
    contact-points = [ "127.0.0.1:9042", "127.0.0.2:9042" ]
    session-keyspace = reservation
  }
}
The configuration file here is written in the Human-Optimized Config Object Notation (HOCON) format. The Java driver uses the conventions of the Typesafe Config library for configuration file locations; it searches the Java classpath for files named application.conf, application.json, or application.properties. The configuration loader is a pluggable interface that you can override to create your own implementation.
The Java driver divides configuration values into two categories: basic
configuration values that are customized most frequently, and advanced
configuration values that are used less frequently. The basic options include the following:
Contact points and keyspace name, as discussed previously
A session-name
that will be used in log messages and metrics (if none is provided, they will be generated in the form s1
, s2
, and so on for each distinct CqlSession
created)
The config-reload-interval
that specifies how often configuration values will be reloaded from the file (defaults to 5 minutes
)
Default parameters applied to each request
, including the request.timeout
, the request.consistency
(consistency level), and the request.page-size
, which determines how many rows will be retrieved at a time for larger queries
The load-balancing-policy
, which we’ll discuss in the next section
You can configure advanced options on a CqlSession
, including query execution, connection management, security, logging, and metrics. We’ll examine several of these options in later sections. The DataStax documentation provides a reference configuration file, which is an excellent resource for learning about all of the available configuration options.
As discussed in Chapter 6, a query can be made to any node in a cluster, which is then known as the coordinator node for that query. Depending on the contents of the query, the coordinator may communicate with other nodes in order to satisfy the query. If a client directs all of its queries at the same node, this will produce an unbalanced load on the cluster, especially if other clients are doing the same.
To get around this issue, the driver provides a pluggable mechanism that will balance the query load across multiple nodes. Load balancing is implemented by selecting an implementation of this interface:
com.datastax.oss.driver.api.core.loadbalancing.LoadBalancingPolicy
Each LoadBalancingPolicy
must classify each node in the cluster as local, remote, or ignored (in the 4.x driver, by reporting distances through the DistanceReporter passed to its init() operation), according to the NodeDistance
enumeration. The driver prefers interactions with local nodes and maintains more connections to local nodes than remote nodes. The other key operation is newQueryPlan()
, which returns a list of nodes in the order they should be queried. The LoadBalancingPolicy
interface also contains operations that are used to inform the policy when nodes are added or removed, or go up or down. These operations help the policy avoid including down or removed nodes in query plans.
Versions of the Java driver through the 3.x series provided multiple LoadBalancingPolicy
implementations with a composable API that allowed a custom selection of behaviors. Beginning with the 4.0 release, the DataStax Java Driver ships with a single default LoadBalancingPolicy
to simplify the developer experience. This default implementation reflects an opinionated point of view based on best practices observed from many deployments, including the following behaviors:
The policy allocates requests across the nodes in the cluster in a repeating pattern to spread the processing load (equivalent to the RoundRobinPolicy
from the legacy driver).
Whenever you use a PreparedStatement
, the policy uses the token value of the partition key in order to select a node that is a replica for the desired data, thus minimizing the number of nodes that must be queried (equivalent to the TokenAwarePolicy
from the legacy driver).
The policy requires setting a local data center. The default load balancing policy will only include nodes in the local data center as part of its query plans. The local data center must be identified explicitly when building the CqlSession
via the withLocalDatacenter()
operation, or via the configuration property basic.load-balancing-policy.local-datacenter
.
This is a difference from the legacy driver, which provided a DCAwareRoundRobinPolicy
that would include remote nodes in query plans after local nodes. This was intended as a reliability mechanism in case all replicas in the local data center were unavailable. In practice, however, if all the replicas in a local data center are down, it is typically a broader outage at the data center level, and shifting traffic to other nodes has proven to have undesirable side effects and be difficult to debug.
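The three behaviors just described can be sketched in plain Java. This is a simplified illustration, not the driver's implementation: the QueryPlanSketch and SketchNode names are hypothetical, and the replica set is passed in directly rather than computed from tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the behaviors described above; not the driver's implementation.
class QueryPlanSketch {

  /** Minimal stand-in for a cluster node: a name plus the data center it belongs to. */
  static final class SketchNode {
    final String name;
    final String datacenter;

    SketchNode(String name, String datacenter) {
      this.name = name;
      this.datacenter = datacenter;
    }
  }

  private final String localDatacenter;
  private final AtomicInteger roundRobinCounter = new AtomicInteger();

  QueryPlanSketch(String localDatacenter) {
    this.localDatacenter = localDatacenter;
  }

  /**
   * Builds a query plan: local nodes only, rotated on each call, replicas first.
   * Pass an empty replica set when the partition key (and so the token) is unknown.
   */
  List<String> newQueryPlan(List<SketchNode> liveNodes, Set<String> replicaNames) {
    // 1. Keep only nodes in the local data center.
    List<String> local = new ArrayList<>();
    for (SketchNode node : liveNodes) {
      if (node.datacenter.equals(localDatacenter)) {
        local.add(node.name);
      }
    }
    if (local.isEmpty()) {
      return local;
    }
    // 2. Rotate the starting point so successive plans spread the load.
    int start = Math.floorMod(roundRobinCounter.getAndIncrement(), local.size());
    List<String> rotated = new ArrayList<>(local.subList(start, local.size()));
    rotated.addAll(local.subList(0, start));
    // 3. Move replicas for the requested data to the front of the plan.
    List<String> plan = new ArrayList<>();
    for (String name : rotated) {
      if (replicaNames.contains(name)) {
        plan.add(name);
      }
    }
    for (String name : rotated) {
      if (!replicaNames.contains(name)) {
        plan.add(name);
      }
    }
    return plan;
  }

  public static void main(String[] args) {
    List<SketchNode> nodes = List.of(
        new SketchNode("n1", "dc1"), new SketchNode("n2", "dc1"),
        new SketchNode("n3", "dc1"), new SketchNode("r1", "dc2"));
    QueryPlanSketch policy = new QueryPlanSketch("dc1");
    System.out.println(policy.newQueryPlan(nodes, Set.of("n3")));
    System.out.println(policy.newQueryPlan(nodes, Set.of()));
  }
}
```

Note that the node in the remote data center never appears in a plan, which mirrors the local-only behavior described above.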
Should you wish to set a different default LoadBalancingPolicy
, you may specify it when building a CqlSession
via the withLoadBalancingPolicy()
operation, or by configuring the properties in the basic.load-balancing-policy
group.
When Cassandra nodes fail or become unreachable, the driver automatically and transparently tries other nodes, and schedules reconnection to the dead nodes in the background according to the configured reconnection policy. The reconnection policy is determined according to the advanced.reconnection-policy
configuration options. Two reconnection policies are provided: the ExponentialReconnectionPolicy
and the ConstantReconnectionPolicy
.
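To see how the two schedules differ, consider this small sketch in plain Java. It is an illustration under stated assumptions rather than the driver's code: the class and method names are hypothetical, the 1-second base and 60-second cap are illustrative values, and any randomization the real policies may apply is omitted.

```java
import java.time.Duration;

// Hypothetical sketch of the two reconnection schedules; not the driver's implementation.
class ReconnectionDelaySketch {

  /** Constant schedule: every attempt waits the same amount of time. */
  static Duration constantDelay(Duration delay, int attempt) {
    return delay;
  }

  /** Exponential schedule: the delay doubles per attempt (attempt 0 = base) up to a cap. */
  static Duration exponentialDelay(Duration base, Duration max, int attempt) {
    long maxMillis = max.toMillis();
    long delayMillis = base.toMillis();
    for (int i = 0; i < attempt && delayMillis < maxMillis; i++) {
      delayMillis *= 2;
    }
    return Duration.ofMillis(Math.min(delayMillis, maxMillis));
  }

  public static void main(String[] args) {
    Duration base = Duration.ofSeconds(1);  // illustrative value
    Duration max = Duration.ofSeconds(60);  // illustrative value
    for (int attempt = 0; attempt < 8; attempt++) {
      System.out.println("attempt " + attempt + ": wait "
          + exponentialDelay(base, max, attempt).getSeconds() + "s");
    }
  }
}
```

The exponential schedule retries quickly after a brief interruption and backs off when a node stays down, while the constant schedule keeps probing at a fixed rate.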
Because temporary changes in network conditions can also make nodes appear offline, the driver also provides a mechanism to retry queries that fail due to protocol or network-related errors. This removes the need to write retry logic in client code.
The driver retries failed queries according to the provided implementation of the com.datastax.oss.driver.api.core.retry.RetryPolicy
interface. The onReadTimeout()
, onWriteTimeout()
, and onUnavailable()
operations define the behavior that should be taken when a query fails with protocol- or network-related exceptions ReadTimeoutException
, WriteTimeoutException
, or UnavailableException
, respectively. The onErrorResponse()
operation describes the behavior for handling other recoverable server errors, and the onRequestAborted()
operation handles cases in which the driver aborts a request before the server responds.
The RetryPolicy
operations return a RetryDecision
, which indicates whether the query should be retried, and if so, on the same node or on the next node in the query plan. If the query is not retried, the exception can be rethrown or ignored, in which case the query operation will return an empty ResultSet
.
The 4.0 release of the driver provides a single opinionated implementation of the RetryPolicy
based on best practices. Releases through 3.x had a FallthroughRetryPolicy
that never recommended retries, and a DowngradingConsistencyRetryPolicy
that downgrades the consistency level required on retries, as an attempt to get the query to succeed. The issue with the DowngradingConsistencyRetryPolicy
was: if you are willing to accept a downgraded consistency level under some circumstances, do you really require a higher consistency level for the general case?
The RetryPolicy
implementation can be overridden using the advanced.retry-policy
configuration.
While it’s great to have a retry mechanism that automates the response to network timeouts, you don’t often have the luxury of being able to wait for timeouts or even long garbage collection pauses. To speed things up, the driver provides a speculative execution feature. If the original coordinator node for a query fails to respond in a predetermined interval, the driver can preemptively start an additional execution of the query against a different coordinator node. When one of the queries returns, the driver provides that response and cancels any other outstanding queries.
Speculative execution is disabled by default via the NoSpeculativeExecutionPolicy
, but can be enabled on a CqlSession
by setting the ConstantSpeculativeExecutionPolicy
. Here’s an example of how you configure this policy in the configuration file by specifying a maximum number of executions and a constant delay between executions (in milliseconds):
advanced.speculative-execution-policy {
  class = ConstantSpeculativeExecutionPolicy
  max-executions = 3
  delay = 100 milliseconds
}
You may create your own policy by implementing the com.datastax.oss.driver.api.core.specex.SpeculativeExecutionPolicy
interface.
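The timing of constant-delay speculative execution can be simulated with the JDK's CompletableFuture. The sketch below is a hypothetical model, not driver code: the class, the simulatedNode() helper, and the latencies (a slow "node1" and fast peers) are assumptions chosen for illustration, and in-flight requests are simply ignored rather than cancelled.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical simulation of constant-delay speculative execution; not driver code.
class SpeculativeExecutionSketch {

  /** Simulated coordinator: "node1" is slow (1000 ms), every other node answers in 10 ms. */
  static CompletableFuture<String> simulatedNode(String node) {
    long latencyMillis = node.equals("node1") ? 1000 : 10;
    return CompletableFuture.supplyAsync(
        () -> "rows from " + node,
        CompletableFuture.delayedExecutor(latencyMillis, TimeUnit.MILLISECONDS));
  }

  /**
   * Sends the query to the first node, then to the next node in the plan each time
   * delayMillis passes without a response, up to maxExecutions. First response wins.
   */
  static String execute(List<String> queryPlan,
                        Function<String, CompletableFuture<String>> sendTo,
                        int maxExecutions, long delayMillis) {
    if (queryPlan.isEmpty() || maxExecutions < 1) {
      throw new IllegalArgumentException("Need at least one node and one execution");
    }
    final int executions = Math.min(maxExecutions, queryPlan.size());
    CompletableFuture<String> result = new CompletableFuture<>();
    AtomicInteger failures = new AtomicInteger();
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    try {
      for (int i = 0; i < executions; i++) {
        String node = queryPlan.get(i);
        scheduler.schedule(() -> {
          if (result.isDone()) {
            return; // an earlier execution already answered
          }
          sendTo.apply(node).whenComplete((rows, error) -> {
            if (error == null) {
              result.complete(rows); // only the first completion has any effect
            } else if (failures.incrementAndGet() == executions) {
              result.completeExceptionally(error);
            }
          });
        }, i * delayMillis, TimeUnit.MILLISECONDS);
      }
      return result.join();
    } finally {
      scheduler.shutdownNow(); // drop executions that have not started yet
    }
  }

  public static void main(String[] args) {
    System.out.println(execute(List.of("node1", "node2", "node3"),
        SpeculativeExecutionSketch::simulatedNode, 3, 100));
  }
}
```

With a 100-millisecond delay, the slow first coordinator is overtaken by the second node, and the third execution is never started. The trade-off is extra load on the cluster, which is one reason to enable the feature only for queries that are safe to execute more than once.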
Because the CQL native protocol is asynchronous, it allows multiple simultaneous requests per connection; the maximum is 128 simultaneous requests in protocol v2, while v3 and later allow up to 32,768 simultaneous requests. Because of this larger number of simultaneous requests, fewer connections per node are required. In fact, the default is a single connection per node.
Connection pool settings are configurable via the advanced.connection
configuration options, including the number of connections to use for local and remote hosts, and the maximum number of simultaneous requests per connection (defaults to 1,024). While the v4 driver does not provide the ability to scale the number of connections up and down as with previous versions, you can adjust these settings by updating the configuration file, and the changes will be applied the next time the configuration file is reloaded.
The driver uses a connection heartbeat to make sure that connections are not closed prematurely by intervening network devices. This defaults to 30 seconds but can be overridden using the advanced.heartbeat
configuration options.
The driver supports multiple versions of the CQL native protocol. Cassandra 4.0 uses CQL protocol version 5, while Cassandra 3.X releases support version 4.
By default, the driver negotiates the protocol version when establishing connections, even correctly handling connections to mixed clusters in which multiple versions of Cassandra are in use. You can force a protocol version using the advanced.protocol.version
configuration option.
The driver provides the option of compressing messages between your client and Cassandra nodes according to the compression options supported by the CQL native protocol. Enabling compression reduces network bandwidth consumed by the driver, at the cost of additional CPU usage for the client and server.
Currently there are two compression algorithms available: LZ4
and SNAPPY
. The compression defaults to NONE
but can be overridden by setting the advanced.protocol.compression
configuration property.
The driver provides a pluggable authentication mechanism that can be used to support a simple username/password login, or integration with other authentication systems. By default, no authentication is performed. An authentication provider can be selected by passing an implementation of the com.datastax.oss.driver.api.core.auth.AuthProvider
interface, such as the PlainTextAuthProvider
to the CqlSessionBuilder.withAuthProvider()
operation, or by setting the advanced.auth-provider
section in your configuration file. You can configure the PlainTextAuthProvider
and provide your username and password by using the CqlSessionBuilder.withAuthCredentials()
operation.
The driver can also encrypt its communications with the server to ensure privacy. Client-server encryption options are specified by each node in its cassandra.yaml file. The driver complies with the encryption settings specified by each node.
We’ll examine authentication, authorization, and encryption from both the client and server perspective in more detail in Chapter 14.
While some of the configuration values that you’ve learned can be overridden on individual Statements
, many of them cannot. So what can you do when the configuration values chosen are appropriate for some of your queries, but not others? The driver allows you to create execution profiles, which are sets of configuration values that can be applied to individual Statements
as an overlay over the default configuration. To learn which configuration options can be set in a profile, see the reference configuration file.
For example, let’s say your default settings include a request timeout of one second and a consistency level of LOCAL_QUORUM
. You could create an execution profile for requests that need a longer timeout and a stronger consistency level by adding this to the profiles
section of the configuration file:
datastax-java-driver {
  profiles {
    long_request {
      basic.request.timeout = 3 seconds
      basic.request.consistency = QUORUM
    }
  }
}
Then, you can apply the values to a Statement
:
// Statements are immutable, so keep the copy that the setter returns
statement = statement.setExecutionProfileName("long_request");
There is also a setExecutionProfileName()
operation available when using the SimpleStatementBuilder
. Or, if you create a PreparedStatement
from a SimpleStatement
(using CqlSession.prepare()
), any execution profile you have set will be inherited by any BoundStatements
created from the PreparedStatement
.
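The overlay idea can be sketched with two maps: one for default values and one per profile. This is a hypothetical model rather than the driver's configuration classes, and it simplifies one detail by falling back to the defaults when a profile name is unknown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of profile resolution; not the driver's configuration classes.
class ExecutionProfileSketch {
  private final Map<String, String> defaults = new HashMap<>();
  private final Map<String, Map<String, String>> profiles = new HashMap<>();

  /** Sets a value in the default configuration. */
  ExecutionProfileSketch withDefault(String option, String value) {
    defaults.put(option, value);
    return this;
  }

  /** Sets a value that applies only when the named profile is selected. */
  ExecutionProfileSketch withProfileOption(String profile, String option, String value) {
    profiles.computeIfAbsent(profile, name -> new HashMap<>()).put(option, value);
    return this;
  }

  /** Returns the profile's value if it overrides the option, else the default value. */
  String resolve(String profileName, String option) {
    Map<String, String> overrides = profiles.get(profileName);
    if (overrides != null && overrides.containsKey(option)) {
      return overrides.get(option);
    }
    return defaults.get(option);
  }

  public static void main(String[] args) {
    ExecutionProfileSketch config = new ExecutionProfileSketch()
        .withDefault("basic.request.timeout", "1 second")
        .withDefault("basic.request.consistency", "LOCAL_QUORUM")
        .withDefault("basic.request.page-size", "5000") // illustrative value
        .withProfileOption("long_request", "basic.request.timeout", "3 seconds")
        .withProfileOption("long_request", "basic.request.consistency", "QUORUM");
    System.out.println(config.resolve("long_request", "basic.request.timeout"));
    System.out.println(config.resolve("long_request", "basic.request.page-size"));
  }
}
```

A profile only needs to list the options it changes; every other option continues to come from the default configuration.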
To access the cluster metadata, invoke the CqlSession.getMetadata()
method, which returns an object implementing the com.datastax.oss.driver.api.core.metadata.Metadata
interface. This object provides information about the cluster at a snapshot in time, including the nodes in the cluster, the tokens assigned to each node, and the schema, including keyspaces and tables.
A CqlSession
maintains a control connection to the first node it connects with, which it uses to maintain information on the state and topology of the cluster. Using this connection, the driver will discover all the nodes currently in the cluster, and you can obtain this information through the Metadata.getNodes()
operation, which returns a map, keyed by host ID, of com.datastax.oss.driver.api.core.metadata.Node
objects to represent each node. You can view the state of each node through the Node.getState()
operation, or you can register an implementation of the com.datastax.oss.driver.api.core.metadata.NodeStateListener
interface to receive callbacks when nodes are added or removed from the cluster, or when they are up or down. This state information is also viewable in the driver logs, which we’ll discuss shortly.
The Metadata
class also allows the client to learn about the schema in a cluster, including operations that provide descriptions of individual keyspaces and tables. The schema version in use in a cluster can change over time as keyspaces and tables are created, altered, and deleted.
We discussed Cassandra’s support for eventual consistency at great length in Chapter 2. Because schema information is itself stored using Cassandra, it is also eventually consistent, and as a result it is possible for different nodes to have temporarily different versions of the schema. The driver has internal safeguards to check for schema agreement before initiating any statement that would change the schema. The driver provides a notification mechanism for clients to learn about schema changes by registering a com.datastax.oss.driver.api.core.metadata.schema.SchemaChangeListener
with the CqlSession
as it is built using the withSchemaChangeListener()
operation on the builder, or via the advanced.schema-change-listener
configuration option.
In addition to the schema access you’ve just examined in the Metadata
class, the Java driver also provides a facility for managing schema in the com.datastax.oss.driver.api.querybuilder
package. The SchemaBuilder
provides a fluent-style API for creating Statements
representing operations such as CREATE
, ALTER
, and DROP
on keyspaces, tables, indexes, and user-defined types (UDTs).
For example, you could create the reservations_by_confirmation
table using the createTable()
schema builder:
import static com.datastax.oss.driver.api.querybuilder.SchemaBuilder.createTable;
import com.datastax.oss.driver.api.core.type.DataTypes;

cqlSession.execute(createTable("reservation", "reservations_by_confirmation")
    .ifNotExists()
    .withPartitionKey("confirmation_number", DataTypes.TEXT)
    .withColumn("hotel_id", DataTypes.TEXT)
    .withColumn("start_date", DataTypes.DATE)
    .withColumn("end_date", DataTypes.DATE)
    .withColumn("room_number", DataTypes.SMALLINT)
    .withColumn("guest_id", DataTypes.UUID)
    .build());
As you learned in Chapter 4, CQL is case-insensitive by default. While the practice is generally discouraged, it is possible to create case-sensitive names for keyspaces, tables, and columns by using quotes around identifiers in CQL. In order to simplify the handling of case sensitivity, the Java driver uses the CqlIdentifier
class as a wrapper for all identifiers in its schema API. If you are writing code that manipulates schema, it’s a good practice to make use of these identifiers as well. Java Driver APIs that accept identifiers as arguments support both Java String
(as shown previously) and CqlIdentifier
formats (as shown in the Reservation Service implementation).
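The CQL rule that such a wrapper has to encode is short enough to sketch in plain Java. The class and method below are hypothetical and only illustrate the rule: an unquoted identifier folds to lowercase, while a double-quoted identifier keeps its case, with a doubled quote inside it standing for a literal quote.

```java
import java.util.Locale;

// Hypothetical sketch of CQL identifier case rules; not the driver's CqlIdentifier class.
class IdentifierCaseSketch {

  /** Converts an identifier as written in CQL into the name Cassandra stores. */
  static String toInternal(String cql) {
    boolean quoted = cql.length() >= 2 && cql.startsWith("\"") && cql.endsWith("\"");
    if (quoted) {
      // Case-sensitive: strip the surrounding quotes and unescape doubled quotes.
      return cql.substring(1, cql.length() - 1).replace("\"\"", "\"");
    }
    // Case-insensitive: unquoted identifiers are folded to lowercase.
    return cql.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    System.out.println(toInternal("Reservations_By_Confirmation"));
    System.out.println(toInternal("\"HotelId\""));
  }
}
```

Keeping this conversion in one place is what lets schema-handling code compare identifiers without worrying about how they were originally written.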
The driver provides features for monitoring and debugging your client’s use of Cassandra, including facilities for logging and metrics. There are also capabilities for query tracing and tracking slow queries, which you’ll learn about in Chapter 13.
As you will learn in Chapter 11, Cassandra uses a logging API called Simple Logging Facade for Java (SLF4J). The Java driver uses the SLF4J API for logging as well. In order to enable logging on your Java client application, you need to provide a compliant SLF4J implementation on the classpath, such as Logback (used by the Reservation Service) or Log4j. The Java driver provides information at multiple levels; the ERROR
, WARN
, and INFO
levels are the most useful to application developers.
You configure logging by taking advantage of Logback’s configuration mechanism, which supports separate configuration for test and production environments. Logback inspects the classpath first for the file logback-test.xml representing the test configuration, and then if no test configuration is found, it searches for the file logback.xml. Here’s an example extract from a logback.xml configuration file that enables the INFO
log level for the Java driver:
<configuration>
  <!-- other appenders and loggers -->
  <logger name="com.datastax.oss.driver" level="INFO"/>
</configuration>
For more detail on Logback configuration, including sample configuration files for test and production environments, see the configuration page or the Reservation Service implementation.
Sometimes it can be helpful to monitor the behavior of client applications over time in order to detect abnormal conditions and debug errors. The Java driver collects metrics on its activities and makes these available using the Dropwizard Metrics library. The driver reports metrics on connections, task queues, queries, and errors such as connection errors, read and write timeouts, retries, and speculative executions. A full list of metrics is available in the reference configuration.
You can access the Java driver metrics locally via the CqlSession.getMetrics()
operation. The Metrics library can also integrate with the Java Management Extensions (JMX) to allow remote monitoring of metrics. We’ll discuss the remote monitoring of metrics from Cassandra nodes in Chapter 11, and the same techniques apply to gathering metrics from client applications. JMX reporting is disabled by default in the v4 drivers (it was enabled by default in v3), but can be configured.
There are several drivers available for other programming languages:
The DataStax Python Driver was introduced in 2014, replacing the Pycassa client built on Cassandra’s legacy Thrift interface as the primary Python driver for Cassandra. The driver supports Python 2.7 as well as current Python 3 versions back to 3.4. You can install the driver by running the Python installer pip:
$ pip install cassandra-driver
The Python driver includes an object mapper called cqlengine and makes use of third-party libraries for performance, compression, and metrics. The driver source is available on GitHub.
The DataStax Node.js Driver was introduced in October 2014, based on the node-cassandra-cql project developed by Jorge Bay.
The Node.js driver is installed via the node package manager (NPM):
$ npm install cassandra-driver
As with other DataStax drivers, the source code is available on GitHub.
First released in July 2013, the DataStax C# Driver provides support for Windows clients using the .NET Framework. For this reason, it is also frequently referred to as the “.NET Driver.”
The C# Driver is available on NuGet, the package manager for the Microsoft development platform. Within PowerShell, run the following command at the Package Manager Console:
PM> Install-Package CassandraCSharpDriver
To use the driver, create a new project in Visual Studio and add a using
directive that references the Cassandra namespace. The C# Driver integrates with Language Integrated Query (LINQ), a Microsoft .NET Framework component that adds query capabilities to .NET languages; there is a separate object mapper available as well.
The DataStax C/C++ Driver was released in February 2014. The C/C++ Driver is a bit different than the other drivers in that its API focuses on asynchronous operations to the exclusion of synchronous operations.
The C/C++ driver uses the libuv library for asynchronous I/O operations, and optionally uses the OpenSSL library if needed for encrypted client-node connections. Instructions for compilation and linking vary by platform, so see the driver documentation for details.
DataStax also has drivers available for Ruby and PHP, although these are considered to be in maintenance mode and are updated only for critical bug fixes.
Open Database Connectivity (ODBC) is a standard developed by Microsoft that allows applications to access data using SQL. Java Database Connectivity (JDBC) is a Java API that provides a SQL abstraction—see the java.sql
package. JDBC and ODBC drivers are available from vendors, including Simba and Progress Software.
The Go language, created at Google, has seen a rapid increase in popularity for server applications since its public introduction in 2009. Its syntax resembles that of C, but the language adds improvements in memory management and concurrency.
GoCQL is an open source driver for the Go language. It is under active development and provides many of the same features as the DataStax drivers, including connection management, statement execution, paging, batches, and more.
You should now understand the various drivers available for Cassandra, the features they provide, and how to install and use them. We gave particular attention to the DataStax Java Driver in order to get some hands-on experience, which should serve you well even if you choose to use one of the other DataStax or community drivers. You’ll continue to learn other driver features in the coming chapters as we discuss more details of reading and writing.