Saves a database ID field in an object to maintain identity between an in-memory object and a database row.
Relational databases tell one row from another by using keys—in particular, the primary key. However, in-memory objects don’t need such a key, as the object system ensures the correct identity under the covers (or in C++’s case with raw memory locations). Reading data from a database is all very well, but in order to write data back you need to tie the database to the in-memory object system.
In essence, Identity Field is mind-numbingly simple. All you do is store the primary key of the relational database table in the object’s fields.
Although the basic notion of Identity Field is very simple, there are oodles of complicated issues that come up.
The first issue is what kind of key to choose in your database. Of course, this isn’t always a choice, since you’re often dealing with an existing database that already has its key structures in place. There’s a lot of discussion and material on this in the database community. Still, mapping to objects does add some concerns to your decision.
The first concern is whether to use meaningful or meaningless keys. A meaningful key is something like the U.S. Social Security number for identifying a person. A meaningless key is essentially a random number the database dreams up that’s never intended for human use. The danger with meaningful keys is that, while in theory they make good keys, in practice they don’t. To work at all, keys need to be unique; to work well, they need to be immutable. While assigned numbers are supposed to be unique and immutable, human error often makes them neither. If you mistype my SSN for my wife’s, the resulting record is neither unique nor immutable (assuming you would like to fix the mistake). The database should detect the uniqueness problem, but it can only do that after my record goes into the system, and of course that might not happen until after the mistake. As a result, meaningful keys should be distrusted. For small systems and/or very stable cases you may get away with them, but usually you should take a rare stand on the side of meaninglessness.
The next concern is simple versus compound keys. A simple key uses only one database field; a compound key uses more than one. The advantage of a compound key is that it’s often easier to use when one table makes sense in the context of another. A good example is orders and line items, where a compound of the order number and a sequence number makes a good key for a line item. While compound keys often make sense, there is a lot to be said for the sheer uniformity of simple keys. If you use simple keys everywhere, you can use the same code for all key manipulation. Compound keys require special handling in concrete classes. (With code generation this isn’t a problem.) Compound keys also carry a bit of meaning, so be careful about the uniqueness and particularly the immutability rule with them.
You have to choose the type of the key. The most common operation you’ll do with a key is equality checking, so you want a type with a fast equality operation. The other important operation is getting the next key. Hence a long integer type is often the best bet. Strings can also work, but equality checking may be slower and incrementing strings is a bit more painful. Your DBA’s preferences may well decide the issue.
(Beware of using dates or times in keys. Not only are they meaningful, they also lead to problems with portability and consistency. Dates in particular are vulnerable to this because they are often stored to some fractional-second precision, which can easily get out of sync and lead to identity problems.)
You can have keys that are unique to the table or unique database-wide. A table-unique key is unique across the table, which is what you need for a key in any case. A database-unique key is unique across every row in every table in the database. A table-unique key is usually fine, but a database-unique key is often easier to do and allows you to use a single Identity Map (195). Modern values being what they are, it’s pretty unlikely that you’ll run out of numbers for new keys. If you really insist, you can reclaim keys from deleted objects with a simple database script that compacts the key space—although running this script will require that you take the application offline. However, if you use 64-bit keys (and you might as well) you’re unlikely to need this.
Be wary of inheritance when you use table-unique keys. If you’re using Concrete Table Inheritance (293) or Class Table Inheritance (285), life is much easier with keys that are unique to the hierarchy rather than unique to each table. I still use the term “table-unique,” even if it should strictly be something like “inheritance graph unique.”
The size of your key may affect performance, particularly with indexes. This depends on your database system and how many rows you have, but it’s worth doing a crude check before you get locked into your decision.
The simplest form of Identity Field is a field that matches the type of the key in the database. Thus, if you use a simple integral key, an integral field will work very nicely.
Compound keys are more problematic. The best bet with them is to make a key class. A generic key class can store a sequence of objects that act as the elements of the key. The key behavior for the key object (I have a quota of puns per book to fill) is equality. It’s also useful to get parts of the key when you’re mapping to the database.
If you use the same basic structure for all keys, you can do all of the key handling in a Layer Supertype (475). You can put default behavior that will work for most cases in the Layer Supertype (475) and extend it for the exceptional cases in the particular subtypes.
You can have either a single key class, which takes a generic list of key objects, or a key class for each domain class with explicit fields for each part of the key. I usually prefer to be explicit, but in this case I’m not sure it buys very much. You end up with lots of small classes that don’t do anything interesting. The main benefit is that you can avoid errors caused by users putting the elements of the key in the wrong order, but that doesn’t seem to be a big problem in practice.
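As an illustration, an explicit key class for the line item case might look like the sketch below; the class and method names are mine, not part of the example code later in this pattern.

class LineItemKey...
   private long orderID;
   private long sequenceNumber;
   public LineItemKey(long orderID, long sequenceNumber) {
      this.orderID = orderID;
      this.sequenceNumber = sequenceNumber;
   }
   public long getOrderID() {return orderID;}
   public long getSequenceNumber() {return sequenceNumber;}
   public boolean equals(Object obj) {
      if (!(obj instanceof LineItemKey)) return false;
      LineItemKey other = (LineItemKey) obj;
      return orderID == other.orderID && sequenceNumber == other.sequenceNumber;
   }
   public int hashCode() {
      return (int) (orderID * 37 + sequenceNumber);
   }

The payoff is that callers can’t swap the order ID and the sequence number by accident; the cost is another small class per table.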
If you’re likely to import data between different database instances, you need to remember that you’ll get key collisions unless you come up with some scheme to separate the keys between different databases. You can solve this with some kind of key migration on the imports, but this can easily get very messy.
To create an object, you’ll need a key. This sounds like a simple matter, but it can often be quite a problem. You have three basic choices: get the database to auto-generate, use a GUID, or generate your own.
The auto-generate route should be the easiest. Each time you insert data into the database, the database generates a unique primary key without you having to do anything. It sounds too good to be true, and sadly it often is. Not all databases do this the same way, and many that do handle it in a way that causes problems for object-relational mapping.
The most common auto-generation method is declaring one auto-generated field, which, whenever you insert a row, is incremented to a new value. The problem with this scheme is that you can’t easily determine what value got generated as the key. If you want to insert an order and several line items, you need the key of the new order so you can put the value in the line item’s foreign key. Also, you need this key before the transaction commits so you can save everything within the transaction. Sadly, databases usually don’t give you this information, so you usually can’t use this kind of auto-generation on any table in which you need to insert connected objects.
An alternative approach to auto-generation is a database counter, which Oracle uses with its sequence. An Oracle sequence works by sending a select statement that references a sequence; the database then returns an SQL record set consisting of the next sequence value. You can set a sequence to increment by any integer, which allows you to get multiple keys at once. The sequence query is automatically carried out in a separate transaction, so that accessing the sequence won’t lock out other transactions inserting at the same time. A database counter like this is perfect for our needs, but it’s nonstandard and not available in all databases.
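As a rough sketch, fetching the next value from an Oracle sequence through JDBC looks something like this (the sequence name orderSeq is made up for the example):

class OracleKeyGenerator...
   public long nextKey(Connection conn) throws SQLException {
      PreparedStatement stmt = conn.prepareStatement("SELECT orderSeq.NEXTVAL FROM dual");
      try {
         ResultSet rs = stmt.executeQuery();
         rs.next();
         return rs.getLong(1);   // the single returned row holds the next sequence value
      } finally {
         stmt.close();           // closing the statement also closes its result set
      }
   }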
A GUID (Globally Unique IDentifier) is a number generated on one machine that’s guaranteed to be unique across all machines in space and time. Often platforms give you the API to generate a GUID. The algorithm is an interesting one involving ethernet card addresses, time of the day in nanoseconds, chip ID numbers, and probably the number of hairs on your left wrist. All that matters is that the resulting number is completely unique and thus a safe key. The only disadvantage to a GUID is that the resulting key string is big, and that can be an equally big problem. There are always times when someone needs to type in a key to a window or SQL expression, and long keys are hard both to type and to read. They may also lead to performance problems, particularly with indexes.
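In Java, for example, java.util.UUID will hand you such an identifier. The randomUUID call in this sketch uses random numbers rather than the MAC-address algorithm described above, but the uniqueness guarantee, and the size problem, are much the same.

class GuidKeyGenerator...
   public String nextKey() {
      // globally unique, but 36 characters long as a string
      return java.util.UUID.randomUUID().toString();
   }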
The last option is rolling your own. A simple staple for small systems is to use a table scan using the SQL max function to find the largest key in the table and then add one to use it. Sadly, this read-locks the entire table while you’re doing it, which means that it works fine if inserts are rare, but your performance will be toasted if you have inserts running concurrently with updates on the same table. You also have to ensure you have complete isolation between transactions; otherwise, you can end up with multiple transactions getting the same ID value.
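A sketch of that max-plus-one scan, with all the caveats above (the orders table is borrowed from the later example):

class NaiveKeyGenerator...
   public long nextKey(Connection conn) throws SQLException {
      // needs full isolation between transactions; on most databases this read-locks the whole table
      PreparedStatement stmt = conn.prepareStatement("SELECT MAX(ID) FROM orders");
      try {
         ResultSet rs = stmt.executeQuery();
         rs.next();
         return rs.getLong(1) + 1;   // MAX is null on an empty table, so getLong yields 0 and the first key is 1
      } finally {
         stmt.close();
      }
   }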
A better approach is to use a separate key table. This table typically has two columns: name and next available value. If you use database-unique keys, you’ll have just one row in this table. If you use table-unique keys, you’ll have one row for each table in the database. To use the key table, all you need to do is read that one row, note the number, increment it, and write it back to the row. You can grab many keys at a time by adding a suitable number when you update the key table. This cuts down on expensive database calls and reduces contention on the key table.
If you use a key table, it’s a good idea to design it so that access to it is in a separate transaction from the one that updates the table you’re inserting into. Say I’m inserting an order into the orders table. To do this I’ll need to lock the orders row on the key table with a write lock (since I’m updating). That lock will last for the entire transaction that I’m in, locking out anyone else who wants a key. For table-unique keys, this means anyone inserting into the orders table; for database-unique keys it means anyone inserting anywhere.
By putting access to the key table in a separate transaction, you only lock the row for that much shorter transaction. The downside is that, if you roll back the insert to the orders table, the key you got from the key table is lost to everyone. Fortunately, numbers are cheap, so that’s not a big issue. Using a separate transaction also allows you to get the ID as soon as you create the in-memory object, which is often some time before you open the transaction to commit the business transaction.
Using a key table affects the choice of database-unique or table-unique keys. If you use a table-unique key, you have to add a row to the key table every time you add a table to the database. This is more effort, but it reduces contention on the row. If you keep your key table accesses in a different transaction, contention is not so much of a problem, especially if you get multiple keys in a single call. But if you can’t arrange for the key table update to be in a separate transaction, you have a strong reason against database-unique keys.
It’s good to separate the code for getting a new key into its own class, as that makes it easier to build a Service Stub (504) for testing purposes.
Use Identity Field when there’s a mapping between objects in memory and rows in a database. This is usually when you use Domain Model (116) or Row Data Gateway (152). You don’t need this mapping if you’re using Transaction Script (110), Table Module (125), or Table Data Gateway (144).
For a small object with value semantics, such as a money or date range object that won’t have its own table, it’s better to use Embedded Value (268). For a complex graph of objects that doesn’t need to be queried within the relational database, Serialized LOB (272) is usually easier to write and gives faster performance.
One alternative to Identity Field is to extend Identity Map (195) to maintain the correspondence. This can be used for systems where you don’t want to store an Identity Field in the in-memory object. Identity Map (195) needs to look up both ways: give me a key for an object or an object for a key. I don’t see this very often because usually it’s easier to store the key in the object.
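A sketch of such a two-way map, assuming the domain objects rely on identity-based equality so they can serve as hash keys:

class TwoWayIdentityMap...
   private Map keyToObject = new HashMap();
   private Map objectToKey = new HashMap();
   public void put(Long key, Object subject) {
      keyToObject.put(key, subject);
      objectToKey.put(subject, key);
   }
   public Object objectFor(Long key) {return keyToObject.get(key);}
   public Long keyFor(Object subject) {return (Long) objectToKey.get(subject);}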
[Marinescu] discusses several techniques for generating keys.
The simplest form of Identity Field is an integral field in the database that maps to an integral field in an in-memory object.
class DomainObject...
public const long PLACEHOLDER_ID = -1;
public long Id = PLACEHOLDER_ID;
public Boolean isNew() {return Id == PLACEHOLDER_ID;}
An object that’s been created in memory but not saved to the database will not have a value for its key. For a .NET value object this is a problem since .NET values cannot be null. Hence, the placeholder value.
The key becomes important in two places: finding and inserting. For finding you need to form a query using a key in a where clause. In .NET you can load many rows into a data set and then select a particular one with a find operation.
class CricketerMapper...
public Cricketer Find(long id) {
return (Cricketer) AbstractFind(id);
}
class Mapper...
protected DomainObject AbstractFind(long id) {
DataRow row = FindRow(id);
return (row == null) ? null : Find(row);
}
protected DataRow FindRow(long id) {
String filter = String.Format("id = {0}", id);
DataRow[] results = table.Select(filter);
return (results.Length == 0) ? null : results[0];
}
public DomainObject Find (DataRow row) {
DomainObject result = CreateDomainObject();
Load(result, row);
return result;
}
abstract protected DomainObject CreateDomainObject();
Most of this behavior can live on the Layer Supertype (475), but you’ll often need to define the find on the concrete class just to encapsulate the downcast. Naturally, you can avoid this in a language that doesn’t use compile-time typing.
With a simple integral Identity Field the insertion behavior can also be held at the Layer Supertype (475).
class Mapper...
public virtual long Insert (DomainObject arg) {
DataRow row = table.NewRow();
arg.Id = GetNextID();
row["id"] = arg.Id;
Save (arg, row);
table.Rows.Add(row);
return arg.Id;
}
Essentially insertion involves creating the new row and using the next key for it. Once you have it you can save the in-memory object’s data to this new row.
If your database supports a database counter and you’re not worried about being dependent on database-specific SQL, you should use the counter. Even if you’re worried about being dependent on a database you should still consider it—as long as your key generation code is nicely encapsulated, you can always change it to a portable algorithm later. You could even have a strategy [Gang of Four] to use counters when you have them and roll your own when you don’t.
For the moment let’s assume that we have to do this the hard way. The first thing we need is a key table in the database.
CREATE TABLE keys (name varchar primary key, nextID int)
INSERT INTO keys VALUES ('orders', 1)
This table contains one row for each counter that’s in the database. In this case we’ve initialized the key to 1. If you’re preloading data in the database, you’ll need to set the counter to a suitable number. If you want database-unique keys, you’ll need only one row; if you want table-unique keys, you’ll need one row per table.
You can wrap all of your key generation code into its own class. That way it’s easier to use it more widely around one or more applications and it’s easier to put key reservation into its own transaction.
We construct a key generator with its own database connection, together with information on how many keys to take from the database at one time.
class KeyGenerator...
private Connection conn;
private String keyName;
private long nextId;
private long maxId;
private int incrementBy;
public KeyGenerator(Connection conn, String keyName, int incrementBy) {
this.conn = conn;
this.keyName = keyName;
this.incrementBy = incrementBy;
nextId = maxId = 0;
try {
conn.setAutoCommit(false);
} catch(SQLException exc) {
throw new ApplicationException("Unable to turn off autocommit", exc);
}
}
We need to ensure that no auto-commit is going on since we absolutely must have the select and update operating in one transaction.
When we ask for a new key, the generator looks to see if it has one cached rather than go to the database.
class KeyGenerator...
public synchronized Long nextKey() {
if (nextId == maxId) {
reserveIds();
}
return new Long(nextId++);
}
If the generator hasn’t got one cached, it needs to go to the database.
class KeyGenerator...
private void reserveIds() {
PreparedStatement stmt = null;
ResultSet rs = null;
long newNextId;
try {
stmt = conn.prepareStatement("SELECT nextID FROM keys WHERE name = ? FOR UPDATE");
stmt.setString(1, keyName);
rs = stmt.executeQuery();
rs.next();
newNextId = rs.getLong(1);
}
catch (SQLException exc) {
throw new ApplicationException("Unable to generate ids", exc);
}
finally {
DB.cleanUp(stmt, rs);
}
long newMaxId = newNextId + incrementBy;
stmt = null;
try {
stmt = conn.prepareStatement("UPDATE keys SET nextID = ? WHERE name = ?");
stmt.setLong(1, newMaxId);
stmt.setString(2, keyName);
stmt.executeUpdate();
conn.commit();
nextId = newNextId;
maxId = newMaxId;
}
catch (SQLException exc) {
throw new ApplicationException("Unable to generate ids", exc);
}
finally {
DB.cleanUp(stmt);
}
}
In this case we use SELECT... FOR UPDATE to tell the database to hold a write lock on the key table. This is an Oracle-specific statement, so your mileage will vary if you’re using something else. If you can’t write-lock on the select, you run the risk of the transaction failing should another one get in there before you. In this case, however, you can pretty safely just rerun reserveIds until you get a pristine set of keys.
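One way to make that rerun safe without a locking read is to fold the check into the update itself: include the value you originally read in the where clause and loop until the update actually changes a row. This is a sketch only; readNextId and updateNextId are hypothetical helpers wrapping the obvious SELECT and UPDATE.

class KeyGenerator...
   private void reserveIdsOptimistically() {
      while (true) {
         long current = readNextId();            // plain SELECT of nextID, no lock
         long newMax = current + incrementBy;
         // the update only succeeds if nextID still holds the value we read
         if (updateNextId(current, newMax)) {
            nextId = current;
            maxId = newMax;
            return;
         }
         // another generator grabbed this range first, so read again and retry
      }
   }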
Using a simple integral key is a good, simple solution, but you often need other types or compound keys.
As soon as you need something else it’s worth putting together a key class. A key class needs to be able to store multiple elements of the key and to be able to tell if two keys are equal.
class Key...
private Object[] fields;
public boolean equals(Object obj) {
if (!(obj instanceof Key)) return false;
Key otherKey = (Key) obj;
if (this.fields.length != otherKey.fields.length) return false;
for (int i = 0; i < fields.length; i++)
if (!this.fields[i].equals(otherKey.fields[i])) return false;
return true;
}
The most elemental way to create a key is with an array parameter.
class Key...
public Key(Object[] fields) {
checkKeyNotNull(fields);
this.fields = fields;
}
private void checkKeyNotNull(Object[] fields) {
if (fields == null) throw new IllegalArgumentException("Cannot have a null key");
for (int i = 0; i < fields.length; i++)
if (fields[i] == null)
throw new IllegalArgumentException("Cannot have a null element of key");
}
If you find you commonly create keys with certain elements, you can add convenience constructors. The exact ones will depend on what kinds of keys your application has.
class Key...
public Key(long arg) {
this.fields = new Object[1];
this.fields[0] = new Long(arg);
}
public Key(Object field) {
if (field == null) throw new IllegalArgumentException("Cannot have a null key");
this.fields = new Object[1];
this.fields[0] = field;
}
public Key(Object arg1, Object arg2) {
this.fields = new Object[2];
this.fields[0] = arg1;
this.fields[1] = arg2;
checkKeyNotNull(fields);
}
Don’t be afraid to add these convenience methods. After all, convenience is important to everyone using the keys.
Similarly you can add accessor functions to get parts of keys. The application will need to do this for the mappings.
class Key...
public Object value(int i) {
return fields[i];
}
public Object value() {
checkSingleKey();
return fields[0];
}
private void checkSingleKey() {
if (fields.length > 1)
throw new IllegalStateException("Cannot take value on composite key");
}
public long longValue() {
checkSingleKey();
return longValue(0);
}
public long longValue(int i) {
if (!(fields[i] instanceof Long))
throw new IllegalStateException("Cannot take longValue on non long key");
return ((Long) fields[i]).longValue();
}
In this example we’ll map to an order and line item tables. The order table has a simple integral primary key, the line item table’s primary key is a compound of the order’s primary key and a sequence number.
CREATE TABLE orders (ID int primary key, customer varchar)
CREATE TABLE line_items (orderID int, seq int, amount int, product varchar,
primary key (orderID, seq))
The Layer Supertype (475) for domain objects needs to have a key field.
class DomainObjectWithKey...
private Key key;
protected DomainObjectWithKey(Key ID) {
this.key = ID;
}
protected DomainObjectWithKey() {
}
public Key getKey() {
return key;
}
public void setKey(Key key) {
this.key = key;
}
As with other examples in this book I’ve split the behavior into find (which gets to the right row in the database) and load (which loads data from that row into the domain object). Both responsibilities are affected by the use of a key object.
The primary difference between these and the other examples in this book (which use simple integral keys) is that we have to factor out certain pieces of behavior that are overridden by classes with more complex keys. For this example I’m assuming that most tables use simple integral keys. However, some use something else, so I’ve made the simple integral key the default case and embedded the behavior for it in the mapper Layer Supertype (475). The order class is one of those simple cases. Here’s the code for the find behavior:
class OrderMapper...
public Order find(Key key) {
return (Order) abstractFind(key);
}
public Order find(Long id) {
return find(new Key(id));
}
protected String findStatementString() {
return "SELECT id, customer from orders WHERE id = ?";
}
class AbstractMapper...
abstract protected String findStatementString();
protected Map loadedMap = new HashMap();
public DomainObjectWithKey abstractFind(Key key) {
DomainObjectWithKey result = (DomainObjectWithKey) loadedMap.get(key);
if (result != null) return result;
ResultSet rs = null;
PreparedStatement findStatement = null;
try {
findStatement = DB.prepare(findStatementString());
loadFindStatement(key, findStatement);
rs = findStatement.executeQuery();
rs.next();
if (rs.isAfterLast()) return null;
result = load(rs);
return result;
} catch (SQLException e) {
throw new ApplicationException(e);
} finally {
DB.cleanUp(findStatement, rs);
}
}
// hook method for keys that aren't simple integral
protected void loadFindStatement(Key key, PreparedStatement finder) throws SQLException {
finder.setLong(1, key.longValue());
}
I’ve extracted out the building of the find statement, since that requires different parameters to be passed into the prepared statement. The line item is a compound key, so it needs to override that method.
class LineItemMapper...
public LineItem find(long orderID, long seq) {
Key key = new Key(new Long(orderID), new Long(seq));
return (LineItem) abstractFind(key);
}
public LineItem find(Key key) {
return (LineItem) abstractFind(key);
}
protected String findStatementString() {
return
"SELECT orderID, seq, amount, product " +
" FROM line_items " +
" WHERE (orderID = ?) AND (seq = ?)";
}
// hook methods overridden for the composite key
protected void loadFindStatement(Key key, PreparedStatement finder) throws SQLException {
finder.setLong(1, orderID(key));
finder.setLong(2, sequenceNumber(key));
}
//helpers to extract appropriate values from line item's key
private static long orderID(Key key) {
return key.longValue(0);
}
private static long sequenceNumber(Key key) {
return key.longValue(1);
}
As well as defining the interface for the find methods and providing an SQL string for the find statement, the subclass needs to override the hook method to allow two parameters to go into the SQL statement. I’ve also written two helper methods to extract the parts of the key information. This makes for clearer code than I would get by just putting explicit accessors with numeric indices from the key. Such literal indices are a bad smell.
The load behavior shows a similar structure—default behavior in the Layer Supertype (475) for simple integral keys, overridden for the more complex cases. In this case the order’s load behavior looks like this:
class AbstractMapper...
protected DomainObjectWithKey load(ResultSet rs) throws SQLException {
Key key = createKey(rs);
if (loadedMap.containsKey(key)) return (DomainObjectWithKey) loadedMap.get(key);
DomainObjectWithKey result = doLoad(key, rs);
loadedMap.put(key, result);
return result;
}
abstract protected DomainObjectWithKey doLoad(Key id, ResultSet rs) throws SQLException;
// hook method for keys that aren't simple integral
protected Key createKey(ResultSet rs) throws SQLException {
return new Key(rs.getLong(1));
}
class OrderMapper...
protected DomainObjectWithKey doLoad(Key key, ResultSet rs) throws SQLException {
String customer = rs.getString("customer");
Order result = new Order(key, customer);
MapperRegistry.lineItem().loadAllLineItemsFor(result);
return result;
}
The line item needs to override the hook to create a key based on two fields.
class LineItemMapper...
protected DomainObjectWithKey doLoad(Key key, ResultSet rs) throws SQLException {
Order theOrder = MapperRegistry.order().find(orderID(key));
return doLoad(key, rs, theOrder);
}
protected DomainObjectWithKey doLoad(Key key, ResultSet rs, Order order)
throws SQLException
{
LineItem result;
int amount = rs.getInt("amount");
String product = rs.getString("product");
result = new LineItem(key, amount, product);
order.addLineItem(result);//links to the order
return result;
}
//overrides the default case
protected Key createKey(ResultSet rs) throws SQLException {
Key key = new Key(new Long(rs.getLong("orderID")), new Long(rs.getLong("seq")));
return key;
}
The line item also has a separate load method for use when loading all the lines for the order.
class LineItemMapper...
public void loadAllLineItemsFor(Order arg) {
PreparedStatement stmt = null;
ResultSet rs = null;
try {
stmt = DB.prepare(findForOrderString);
stmt.setLong(1, arg.getKey().longValue());
rs = stmt.executeQuery();
while (rs.next())
load(rs, arg);
} catch (SQLException e) {
throw new ApplicationException(e);
} finally {DB.cleanUp(stmt, rs);
}
}
private final static String findForOrderString =
"SELECT orderID, seq, amount, product " +
"FROM line_items " +
"WHERE orderID = ?";
protected DomainObjectWithKey load(ResultSet rs, Order order) throws SQLException {
Key key = createKey(rs);
if (loadedMap.containsKey(key)) return (DomainObjectWithKey) loadedMap.get(key);
DomainObjectWithKey result = doLoad(key, rs, order);
loadedMap.put(key, result);
return result;
}
You need the special handling because the order object isn’t put into the order’s Identity Map (195) until after it’s created. Creating an empty object and inserting it directly into the Identity Map (195) would avoid the need for this (page 169).
Like reading, inserting has a default action for a simple integral key and the hooks to override this for more interesting keys. In the mapper supertype I’ve provided an operation to act as the interface, together with a template method to do the work of the insertion.
class AbstractMapper...
public Key insert(DomainObjectWithKey subject) {
try {
return performInsert(subject, findNextDatabaseKeyObject());
} catch (SQLException e) {
throw new ApplicationException(e);
}
}
protected Key performInsert(DomainObjectWithKey subject, Key key) throws SQLException {
subject.setKey(key);
PreparedStatement stmt = DB.prepare(insertStatementString());
insertKey(subject, stmt);
insertData(subject, stmt);
stmt.execute();
loadedMap.put(subject.getKey(), subject);
return subject.getKey();
}
abstract protected String insertStatementString();
class OrderMapper...
protected String insertStatementString() {
return "INSERT INTO orders VALUES(?,?)";
}
The data from the object goes into the insert statement through two methods that separate the data of the key from the basic data of the object. I do this because I can provide a default implementation for the key that will work for any class, like order, that uses the default simple integral key.
class AbstractMapper...
protected void insertKey(DomainObjectWithKey subject, PreparedStatement stmt)
throws SQLException
{
stmt.setLong(1, subject.getKey().longValue());
}
The rest of the data for the insert statement is dependent on the particular subclass, so this behavior is abstract on the superclass.
class AbstractMapper...
abstract protected void insertData(DomainObjectWithKey subject, PreparedStatement stmt)
throws SQLException;
class OrderMapper...
protected void insertData(DomainObjectWithKey abstractSubject, PreparedStatement stmt) {
try {
Order subject = (Order) abstractSubject;
stmt.setString(2, subject.getCustomer());
} catch (SQLException e) {
throw new ApplicationException(e);
}
}
The line item overrides both of these methods. It pulls two values out for the key.
class LineItemMapper...
protected String insertStatementString() {
return "INSERT INTO line_items VALUES (?, ?, ?, ?)";
}
protected void insertKey(DomainObjectWithKey subject, PreparedStatement stmt)
throws SQLException
{
stmt.setLong(1, orderID(subject.getKey()));
stmt.setLong(2, sequenceNumber(subject.getKey()));
}
It also provides its own implementation of the insert statement for the rest of the data.
class LineItemMapper...
protected void insertData(DomainObjectWithKey subject, PreparedStatement stmt)
throws SQLException
{
LineItem item = (LineItem) subject;
stmt.setInt(3, item.getAmount());
stmt.setString(4, item.getProduct());
}
Putting the data loading into the insert statement like this is only worthwhile if most classes use the same single field for the key. If there’s more variation in the key handling, then having just one command to insert the information is probably easier.
Coming up with the next database key is also something that I can separate into a default and an overridden case. For the default case I can use the key table scheme that I talked about earlier. But for the line item we run into a problem. The line item’s key uses the key of the order as part of its composite key. However, there’s no reference from the line item class to the order class, so it’s impossible to tell a line item to insert itself into the database without providing the correct order as well. This leads to the always messy approach of implementing the superclass method with an unsupported operation exception.
class LineItemMapper...
public Key insert(DomainObjectWithKey subject) {
throw new UnsupportedOperationException
("Must supply an order when inserting a line item");
}
public Key insert(LineItem item, Order order) {
try {
Key key = new Key(order.getKey().value(), getNextSequenceNumber(order));
return performInsert(item, key);
} catch (SQLException e) {
throw new ApplicationException(e);
}
}
Of course, we can avoid this by having a back link from the line item to the order, effectively making the association between the two bidirectional. I’ve chosen not to do it here to illustrate what to do when you don’t have that link.
By supplying the order, it’s easy to get the order’s part of the key. The next problem is to come up with a sequence number for the order line. To find that number, we need to find out what the next available sequence number is for an order, which we can do either with a max query in SQL or by looking at the line items on the order in memory. For this example I’ll do the latter.
class LineItemMapper...
private Long getNextSequenceNumber(Order order) {
loadAllLineItemsFor(order);
Iterator it = order.getItems().iterator();
LineItem candidate = (LineItem) it.next();
while (it.hasNext()) {
LineItem thisItem = (LineItem) it.next();
if (thisItem.getKey() == null) continue;
if (sequenceNumber(thisItem) > sequenceNumber(candidate)) candidate = thisItem;
}
return new Long(sequenceNumber(candidate) + 1);
}
private static long sequenceNumber(LineItem li) {
return sequenceNumber(li.getKey());
}
//comparator doesn't work well here due to unsaved null keys
protected String keyTableRow() {
throw new UnsupportedOperationException();
}
This algorithm would be much nicer if I used the Collections.max method, but since we may (and indeed will) have at least one null key, that method would fail.
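The SQL alternative mentioned earlier would be a max query on the line item table. Here’s a sketch using the schema above; note that, unlike the in-memory version, it won’t see line items that have been added to the order but not yet saved.

class LineItemMapper...
   private Long getNextSequenceNumberBySql(Order order) throws SQLException {
      PreparedStatement stmt = DB.prepare("SELECT MAX(seq) FROM line_items WHERE orderID = ?");
      ResultSet rs = null;
      try {
         stmt.setLong(1, order.getKey().longValue());
         rs = stmt.executeQuery();
         rs.next();
         return new Long(rs.getLong(1) + 1);   // MAX is null for a new order, so the first sequence number is 1
      } finally {
         DB.cleanUp(stmt, rs);
      }
   }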
After all of that, updates and deletes are mostly harmless. Again we have an abstract method for the assumed usual case and an override for the special cases.
Updates work like this:
class AbstractMapper...
public void update(DomainObjectWithKey subject) {
PreparedStatement stmt = null;
try {
stmt = DB.prepare(updateStatementString());
loadUpdateStatement(subject, stmt);
stmt.execute();
} catch (SQLException e) {
throw new ApplicationException(e);
} finally {
DB.cleanUp(stmt);
}
}
abstract protected String updateStatementString();
abstract protected void loadUpdateStatement(DomainObjectWithKey subject,
PreparedStatement stmt)
throws SQLException;
class OrderMapper...
protected void loadUpdateStatement(DomainObjectWithKey subject, PreparedStatement stmt)
throws SQLException
{
Order order = (Order) subject;
stmt.setString(1, order.getCustomer());
stmt.setLong(2, order.getKey().longValue());
}
protected String updateStatementString() {
return "UPDATE orders SET customer = ? WHERE id = ?";
}
class LineItemMapper...
protected String updateStatementString() {
return
"UPDATE line_items " +
" SET amount = ?, product = ? " +
" WHERE orderId = ? AND seq = ?";
}
protected void loadUpdateStatement(DomainObjectWithKey subject, PreparedStatement stmt)
throws SQLException
{
stmt.setLong(3, orderID(subject.getKey()));
stmt.setLong(4, sequenceNumber(subject.getKey()));
LineItem li = (LineItem) subject;
stmt.setInt(1, li.getAmount());
stmt.setString(2, li.getProduct());
}
Deletes work like this:
class AbstractMapper...
public void delete(DomainObjectWithKey subject) {
PreparedStatement stmt = null;
try {
stmt = DB.prepare(deleteStatementString());
loadDeleteStatement(subject, stmt);
stmt.execute();
} catch (SQLException e) {
throw new ApplicationException(e);
} finally {
DB.cleanUp(stmt);
}
}
abstract protected String deleteStatementString();
protected void loadDeleteStatement(DomainObjectWithKey subject, PreparedStatement stmt)
throws SQLException
{
stmt.setLong(1, subject.getKey().longValue());
}
class OrderMapper...
protected String deleteStatementString() {
return "DELETE FROM orders WHERE id = ?";
}
class LineItemMapper...
protected String deleteStatementString() {
return "DELETE FROM line_items WHERE orderid = ? AND seq = ?";
}
protected void loadDeleteStatement(DomainObjectWithKey subject, PreparedStatement stmt)
throws SQLException
{
stmt.setLong(1, orderID(subject.getKey()));
stmt.setLong(2, sequenceNumber(subject.getKey()));
}
Maps an association between objects to a foreign key reference between tables.
Objects can refer to each other directly by object references. Even the simplest object-oriented system will contain a bevy of objects connected to each other in all sorts of interesting ways. To save these objects to a database, it’s vital to save these references. However, object references are specific to a particular run of the program, so you can’t just save their raw values. Further complicating things is the fact that objects can easily hold collections of references to other objects. Such a structure violates the first normal form of relational databases.
A Foreign Key Mapping maps an object reference to a foreign key in the database.
The obvious key to this problem is Identity Field (216). Each object contains the database key from the appropriate database table. If two objects are linked together with an association, this association can be replaced by a foreign key in the database. Put simply, when you save an album to the database, you save the ID of the artist that the album is linked to in the album record, as in Figure 12.1.
Figure 12.1. Mapping a single-valued field to a foreign key.
That’s the simple case. A more complicated case turns up when you have a collection of objects. You can’t save a collection in the database, so you have to reverse the direction of the reference. Thus, if you have a collection of tracks in the album, you put the foreign key of the album in the track record, as in Figures 12.2 and 12.3. The complication occurs when you have an update. Updating implies that tracks can be added to and removed from the collection within an album. How can you tell what alterations to put in the database? Essentially you have three options: (1) delete and insert, (2) add a back pointer, and (3) diff the collection.
Figure 12.2. Mapping a collection to a foreign key.
Figure 12.3. Classes and tables for a multivalued reference.
With delete and insert you delete all the tracks in the database that link to the album, and then insert all the ones currently on the album. At first glance this sounds pretty appalling, especially if you haven’t changed any tracks. But the logic is easy to implement and as such it works pretty well compared to the alternatives. The drawback is that you can only do this if tracks are Dependent Mappings (262), which means they must be owned by the album and can’t be referred to outside it.
Adding a back pointer puts a link from the track back to the album, effectively making the association bidirectional. This changes the object model, but now you can handle the update using the simple technique for single-valued fields on the other side.
If neither of those appeals, you can do a diff. There are two possibilities here: diff with the current state of the database or diff with what you read the first time. Diffing with the database involves rereading the collection back from the database and then comparing the collection you read with the collection in the album. Anything in the database that isn’t in the album was clearly removed; anything in the album that isn’t on the disk is clearly a new item to be added. Then look at the logic of the application to decide what to do with each item.
Diffing with what you read in the first place means that you have to keep what you read. This is better as it avoids another database read. You may also need to diff with the database if you’re using Optimistic Offline Lock (416).
In the general case anything that’s added to the collection needs to be checked first to see if it’s a new object. You can do this by seeing if it has a key; if it doesn’t, one needs to be added to the database. This step is made a lot easier with Unit of Work (184) because that way any new object will be automatically inserted first. In either case you then find the linked row in the database and update its foreign key to point to the current album.
For removal you have to know whether the track was moved to another album, has no album, or has been deleted altogether. If it’s been moved to another album it should be updated when you update that other album. If it has no album, you need to null the foreign key. If the track was deleted, then it should be deleted when things get deleted. Handling deletes is much easier if the back link is mandatory, as it is here, where every track must be on an album. That way you don’t have to worry about detecting items removed from the collection since they will be updated when you process the album they’ve been added to.
If the link is immutable, meaning that you can’t change a track’s album, then adding always means insertion and removing always means deletion. This makes things simpler still.
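A sketch of the diff against what you read, in the style of the album and tracks sketch above; the Track class, the originalTracks argument, and the two helper methods are illustrative, not part of the example code that follows.

class AlbumMapper...
   private void diffTracks(Album album, List originalTracks) {
      Iterator it = originalTracks.iterator();
      while (it.hasNext()) {
         Track each = (Track) it.next();
         if (!album.getTracks().contains(each))
            removeTrackFromAlbum(each, album);   // removed since the read: null the foreign key or delete, as discussed above
      }
      it = album.getTracks().iterator();
      while (it.hasNext()) {
         Track each = (Track) it.next();
         if (!originalTracks.contains(each))
            linkTrackToAlbum(each, album);       // added since the read: point its foreign key at this album
      }
   }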
One thing to watch out for is cycles in your links. Say you need to load an order, which has a link to a customer (which you load). The customer has a set of payments (which you load), and each payment has orders that it’s paying for, which might include the original order you’re trying to load. Therefore, you load the order (now go back to the beginning of this paragraph.)
To avoid getting lost in cycles you have two choices, which boil down to how you create your objects. Usually it’s a good idea for a creation method to include the data that will give you a fully formed object. If you do that, you’ll need to place Lazy Load (200) at appropriate points to break the cycles. If you miss one, you’ll get a stack overflow, but if your testing is good enough you can manage that burden.
The other choice is to create empty objects and immediately put them in an Identity Map (195). That way, when you cycle back around, the object is already loaded and you’ll end the cycle. The objects you create aren’t fully formed, but they should be by the end of the load procedure. This avoids having to make special case decisions about the use of Lazy Load (200) just to do a correct load.
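A sketch of that second approach: register an empty instance before filling it in, so a cycle that comes back to the same key finds the half-built object instead of recursing. Here createEmptyObject is a hypothetical factory, and doLoad fills in an existing object in the style of the C# examples, rather than the create-and-return doLoad used in the Java code later.

class AbstractMapper...
   protected DomainObject load(ResultSet rs) throws SQLException {
      Long id = new Long(rs.getLong(1));
      if (loadedMap.containsKey(id)) return (DomainObject) loadedMap.get(id);
      DomainObject result = createEmptyObject(id);   // hypothetical factory for a not-yet-populated object
      loadedMap.put(id, result);                     // register first, so a cycle finds this object
      doLoad(result, rs);                            // fill in the fields, possibly triggering further loads
      return result;
   }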
A Foreign Key Mapping can be used for almost all associations between classes. The most common case where it isn’t possible is with many-to-many associations. Foreign keys are single values, and first normal form means that you can’t store multiple foreign keys in a single field. Instead you need to use Association Table Mapping (248).
If you have a collection field with no back pointer, you should consider whether the many side should be a Dependent Mapping (262). If so, it can simplify your handling of the collection.
If the related object is a Value Object (486) then you should use Embedded Value (268).
This is the simplest case, where an album has a single reference to an artist.
class Artist...
private String name;
public Artist(Long ID, String name) {
super(ID);
this.name = name;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
class Album...
private String title;
private Artist artist;
public Album(Long ID, String title, Artist artist) {
super(ID);
this.title = title;
this.artist = artist;
}
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
public Artist getArtist() {
return artist;
}
public void setArtist(Artist artist) {
this.artist = artist;
}
Figure 12.4 shows how you can load an album. When an album mapper is told to load a particular album it queries the database and pulls back the result set for it. It then queries the result set for each foreign key field and finds that object. Now it can create the album with the appropriate found objects. If the artist object was already in memory it would be fetched from the cache; otherwise, it would be loaded from the database in the same way.
Figure 12.4. Sequence for loading a single-valued field.
The find operation uses abstract behavior to manipulate an Identity Map (195).
class AlbumMapper...
public Album find(Long id) {
return (Album) abstractFind(id);
}
protected String findStatement() {
return "SELECT ID, title, artistID FROM albums WHERE ID = ?";
}
class AbstractMapper...
abstract protected String findStatement();
protected DomainObject abstractFind(Long id) {
DomainObject result = (DomainObject) loadedMap.get(id);
if (result != null) return result;
PreparedStatement stmt = null;
ResultSet rs = null;
try {
stmt = DB.prepare(findStatement());
stmt.setLong(1, id.longValue());
rs = stmt.executeQuery();
rs.next();
result = load(rs);
return result;
} catch (SQLException e) {
throw new ApplicationException(e);
} finally {cleanUp(stmt, rs);}
}
private Map loadedMap = new HashMap();
The find operation calls a load operation to actually load the data into the album.
class AbstractMapper...
protected DomainObject load(ResultSet rs) throws SQLException {
Long id = new Long(rs.getLong(1));
if (loadedMap.containsKey(id)) return (DomainObject) loadedMap.get(id);
DomainObject result = doLoad(id, rs);
doRegister(id, result);
return result;
}
protected void doRegister(Long id, DomainObject result) {
Assert.isFalse(loadedMap.containsKey(id));
loadedMap.put(id, result);
}
abstract protected DomainObject doLoad(Long id, ResultSet rs) throws SQLException;
class AlbumMapper...
protected DomainObject doLoad(Long id, ResultSet rs) throws SQLException {
String title = rs.getString(2);
long artistID = rs.getLong(3);
Artist artist = MapperRegistry.artist().find(artistID);
Album result = new Album(id, title, artist);
return result;
}
To update an album the foreign key value is taken from the linked artist object.
class AbstractMapper...
abstract public void update(DomainObject arg);
class AlbumMapper...
public void update(DomainObject arg) {
PreparedStatement statement = null;
try {
statement = DB.prepare(
"UPDATE albums SET title = ?, artistID = ? WHERE id = ?");
statement.setLong(3, arg.getID().longValue());
Album album = (Album) arg;
statement.setString(1, album.getTitle());
statement.setLong(2, album.getArtist().getID().longValue());
statement.execute();
} catch (SQLException e) {
throw new ApplicationException(e);
} finally {
cleanUp(statement);
}
}
While it’s conceptually clean to issue one query per table, it’s often inefficient since SQL consists of remote calls and remote calls are slow. Therefore, it’s often worth finding ways to gather information from multiple tables in a single query. I can modify the above example to use a single query to get both the album and the artist information with a single SQL call. The first alteration is that of the SQL for the find statement.
class AlbumMapper...
public Album find(Long id) {
return (Album) abstractFind(id);
}
protected String findStatement() {
return "SELECT a.ID, a.title, a.artistID, r.name " +
" from albums a, artists r " +
" WHERE a.ID = ? and a.artistID = r.ID";
}
I then use a different load method that loads both the album and the artist information together.
class AlbumMapper...
protected DomainObject doLoad(Long id, ResultSet rs) throws SQLException {
String title = rs.getString(2);
long artistID = rs.getLong(3);
ArtistMapper artistMapper = MapperRegistry.artist();
Artist artist;
if (artistMapper.isLoaded(artistID))
artist = artistMapper.find(artistID);
else
artist = loadArtist(artistID, rs);
Album result = new Album(id, title, artist);
return result;
}
private Artist loadArtist(long id, ResultSet rs) throws SQLException {
String name = rs.getString(4);
Artist result = new Artist(new Long(id), name);
MapperRegistry.artist().register(result.getID(), result);
return result;
}
There’s tension surrounding where to put the method that maps the SQL result into the artist object. On the one hand it’s better to put it in the artist’s mapper since that’s the class that usually loads the artist. On the other hand, the load method is closely coupled to the SQL and thus should stay with the SQL query. In this case I’ve voted for the latter.
The case for a collection of references occurs when you have a field that constitutes a collection. Here I’ll use an example of teams and players where we’ll assume that we can’t make player a Dependent Mapping (262) (Figure 12.5).
class Team...
public String Name;
public IList Players {
get {return ArrayList.ReadOnly(playersData);}
set {playersData = new ArrayList(value);}
}
public void AddPlayer(Player arg) {
playersData.Add(arg);
}
private IList playersData = new ArrayList();
Figure 12.5. A team with multiple players.
In the database this will be handled with the player record having a foreign key to the team (Figure 12.6).
class TeamMapper...
public Team Find(long id) {
return (Team) AbstractFind(id);
}
class AbstractMapper...
protected DomainObject AbstractFind(long id) {
Assert.True (id != DomainObject.PLACEHOLDER_ID);
DataRow row = FindRow(id);
return (row == null) ? null : Load(row);
}
protected DataRow FindRow(long id) {
String filter = String.Format("id = {0}", id);
DataRow[] results = table.Select(filter);
return (results.Length == 0) ? null : results[0];
}
protected DataTable table {
get {return dsh.Data.Tables[TableName];}
}
public DataSetHolder dsh;
abstract protected String TableName {get;}
class TeamMapper...
protected override String TableName {
get {return "Teams";}
}
Figure 12.6. Database structure for a team with multiple players.
The data set holder is a class that holds onto the data set in use, together with the adapters needed to update it to the database.
class DataSetHolder...
public DataSet Data = new DataSet();
private Hashtable DataAdapters = new Hashtable();
For this example, we’ll assume that it has already been populated by some appropriate queries.
The find method calls a load to actually load the data into the new object.
class AbstractMapper...
protected DomainObject Load (DataRow row) {
long id = (int) row ["id"];
if (identityMap[id] != null) return (DomainObject) identityMap[id];
else {
DomainObject result = CreateDomainObject();
result.Id = id;
identityMap.Add(result.Id, result);
doLoad(result,row);
return result;
}
}
abstract protected DomainObject CreateDomainObject();
private IDictionary identityMap = new Hashtable();
abstract protected void doLoad (DomainObject obj, DataRow row);
class TeamMapper...
protected override void doLoad (DomainObject obj, DataRow row) {
Team team = (Team) obj;
team.Name = (String) row["name"];
team.Players = MapperRegistry.Player.FindForTeam(team.Id);
}
To bring in the players, I execute a specialized finder on the player mapper.
class PlayerMapper...
public IList FindForTeam(long id) {
String filter = String.Format("teamID = {0}", id);
DataRow[] rows = table.Select(filter);
IList result = new ArrayList();
foreach (DataRow row in rows) {
result.Add(Load (row));
}
return result;
}
To update, the team saves its own data and delegates the player mapper to save the data into the player table.
class AbstractMapper...
public virtual void Update (DomainObject arg) {
Save (arg, FindRow(arg.Id));
}
abstract protected void Save (DomainObject arg, DataRow row);
class TeamMapper...
protected override void Save (DomainObject obj, DataRow row){
Team team = (Team) obj;
row["name"] = team.Name;
savePlayers(team);
}
private void savePlayers(Team team){
foreach (Player p in team.Players) {
MapperRegistry.Player.LinkTeam(p, team.Id);
}
}
class PlayerMapper...
public void LinkTeam (Player player, long teamID) {
DataRow row = FindRow(player.Id);
row["teamID"] = teamID;
}
The update code is made much simpler by the fact that the association from player to team is mandatory. If we move a player from one team to another, as long as we update both teams we don’t have to do a complicated diff to sort the players out. I’ll leave that case as an exercise for the reader.
Saves an association as a table with foreign keys to the tables that are linked by the association.
Objects can handle multivalued fields quite easily by using collections as field values. Relational databases don’t have this feature and are constrained to single-valued fields only. When you’re mapping a one-to-many association you can handle this using Foreign Key Mapping (236), essentially using a foreign key for the single-valued end of the association. But a many-to-many association can’t do this because there is no single-valued end to hold the foreign key.
The answer is the classic resolution that’s been used by relational data people for decades: create an extra table to record the relationship. Then use Association Table Mapping to map the multivalued field to this link table.
The basic idea behind Association Table Mapping is using a link table to store the association. This table has only the foreign key IDs for the two tables that are linked together; it has one row for each pair of associated objects.
The link table has no corresponding in-memory object. As a result it has no ID. Its primary key is the compound of the two primary keys of the tables that are associated.
In simple terms, to load data from the link table you perform two queries. Consider loading the skills for an employee. In this case, at least conceptually, you do queries in two stages. The first stage queries the skillsEmployees table to find all the rows that link to the employee you want. The second stage finds the skill object for the related ID for each row in the link table.
If all the information is already in memory, this scheme works fine. If it isn’t, this scheme can be horribly expensive in queries, since you do a query for each skill that’s in the link table. You can avoid this cost by joining the skills table to the link table, which allows you to get all the data in a single query, albeit at the cost of making the mapping a bit more complicated.
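A sketch of that joined query as a finder statement; the mapper class and the table and column names (employeeSkills, employeeID, skillID) are assumed to match the DDL given later in the example.

class SkillMapper...
   protected String findForEmployeeStatement() {
      // one join query instead of one query per link row
      return
         "SELECT s.ID, s.name " +
         "  FROM skills s, employeeSkills es " +
         " WHERE es.employeeID = ? AND es.skillID = s.ID";
   }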
Updating the link data involves many of the issues in updating a many-valued field. Fortunately, the matter is made much easier since you can in many ways treat the link table like a Dependent Mapping (262). No other table should refer to the link table, so you can freely create and destroy links as you need them.
The canonical case for Association Table Mapping is a many-to-many association, since there’s really no alternative for that situation.
Association Table Mapping can also be used for any other form of association. However, because it’s more complex than Foreign Key Mapping (236) and involves an extra join, it’s not usually the right choice. Even so, in a couple of cases Association Table Mapping is appropriate for a simpler association; both involve databases where you have less control over the schema. Sometimes you may need to link two existing tables, but you aren’t able to add columns to those tables. In this case you can make a new table and use Association Table Mapping. Other times an existing schema uses an associative table, even when it isn’t really necessary. In this case it’s often easier to use Association Table Mapping than to simplify the database schema.
In a relational database design you may often have association tables that also carry information about the relationship. An example is a person/company associative table that also contains information about a person’s employment with the company. In this case the person/company table really corresponds to a true domain object.
Here’s a simple example using the sketch’s model. We have an employee class with a collection of skills, each of which can appear for more than one employee.
class Employee...
public IList Skills {
get {return ArrayList.ReadOnly(skillsData);}
set {skillsData = new ArrayList(value);}
}
public void AddSkill (Skill arg) {
skillsData.Add(arg);
}
public void RemoveSkill (Skill arg) {
skillsData.Remove(arg);
}
private IList skillsData = new ArrayList();
To load an employee from the database, we need to pull in the skills using an employee mapper. Each employee mapper class has a find method that creates an employee object. All mappers are subclasses of the abstract mapper class that pulls together common services for the mappers.
class EmployeeMapper...
public Employee Find(long id) {
return (Employee) AbstractFind(id);
}
class AbstractMapper...
protected DomainObject AbstractFind(long id) {
Assert.True (id != DomainObject.PLACEHOLDER_ID);
DataRow row = FindRow(id);
return (row == null) ? null : Load(row);
}
protected DataRow FindRow(long id) {
String filter = String.Format("id = {0}", id);
DataRow[] results = table.Select(filter);
return (results.Length == 0) ? null : results[0];
}
protected DataTable table {
get {return dsh.Data.Tables[TableName];}
}
public DataSetHolder dsh;
abstract protected String TableName {get;}
class EmployeeMapper...
protected override String TableName {
get {return "Employees";}
}
The data set holder is a simple object that contains an ADO.NET data set and the relevant adapters to save it to the database.
class DataSetHolder...
public DataSet Data = new DataSet();
private Hashtable DataAdapters = new Hashtable();
To make this example simple—indeed, simplistic—we’ll assume that the data set has already been loaded with all the data we need.
The find method calls load methods to load data for the employee.
class AbstractMapper...
protected DomainObject Load (DataRow row) {
long id = (int) row ["id"];
if (identityMap[id] != null) return (DomainObject) identityMap[id];
else {
DomainObject result = CreateDomainObject();
result.Id = id;
identityMap.Add(result.Id, result);
doLoad(result,row);
return result;
}
}
abstract protected DomainObject CreateDomainObject();
private IDictionary identityMap = new Hashtable();
abstract protected void doLoad (DomainObject obj, DataRow row);
class EmployeeMapper...
protected override void doLoad (DomainObject obj, DataRow row) {
Employee emp = (Employee) obj;
emp.Name = (String) row["name"];
loadSkills(emp);
}
Loading the skills is sufficiently awkward to demand a separate method to do the work.
class EmployeeMapper...
private void loadSkills (Employee emp) {
DataRow[] rows = skillLinkRows(emp);
foreach (DataRow row in rows) {
long skillID = (int)row["skillID"];
emp.AddSkill(MapperRegistry.Skill.Find(skillID));
}
}
private DataRow[] skillLinkRows(Employee emp) {
String filter = String.Format("employeeID = {0}", emp.Id);
return skillLinkTable.Select(filter);
}
private DataTable skillLinkTable {
get {return dsh.Data.Tables["skillEmployees"];}
}
To handle changes in skills information we use an update method on the abstract mapper.
class AbstractMapper...
public virtual void Update (DomainObject arg) {
Save (arg, FindRow(arg.Id));
}
abstract protected void Save (DomainObject arg, DataRow row);
The update method calls a save method in the subclass.
class EmployeeMapper...
protected override void Save (DomainObject obj, DataRow row) {
Employee emp = (Employee) obj;
row["name"] = emp.Name;
saveSkills(emp);
}
Again, I’ve made a separate method for saving the skills.
class EmployeeMapper...
private void saveSkills(Employee emp) {
deleteSkills(emp);
foreach (Skill s in emp.Skills) {
DataRow row = skillLinkTable.NewRow();
row["employeeID"] = emp.Id;
row["skillID"] = s.Id;
skillLinkTable.Rows.Add(row);
}
}
private void deleteSkills(Employee emp) {
DataRow[] skillRows = skillLinkRows(emp);
foreach (DataRow r in skillRows) r.Delete();
}
The logic here does the simple thing of deleting all existing link table rows and creating new ones. This saves me having to figure out which ones have been added and deleted.
One of the nice things about ADO.NET is that it allows me to discuss the basics of an object-relational mapping without getting into the sticky details of minimizing queries. With other relational mapping schemes you’re closer to the SQL and have to take much of that into account.
When you’re going directly to the database it’s important to minimize the queries. For my first version of this I’ll pull back the employee and all her skills in two queries. This is easy to follow but not quite optimal, so bear with me.
Here’s the DDL for the tables:
create table employees (ID int primary key, firstname varchar, lastname varchar)
create table skills (ID int primary key, name varchar)
create table employeeSkills (employeeID int, skillID int, primary key (employeeID, skillID))
To load a single Employee I’ll follow a similar approach to what I’ve done before. The employee mapper defines a simple wrapper for an abstract find method on the Layer Supertype (475).
class EmployeeMapper...
public Employee find(long key) {
return find (new Long (key));
}
public Employee find (Long key) {
return (Employee) abstractFind(key);
}
protected String findStatement() {
return
"SELECT " + COLUMN_LIST +
" FROM employees" +
" WHERE ID = ?";
}
public static final String COLUMN_LIST = " ID, lastname, firstname ";
class AbstractMapper...
protected DomainObject abstractFind(Long id) {
DomainObject result = (DomainObject) loadedMap.get(id);
if (result != null) return result;
PreparedStatement stmt = null;
ResultSet rs = null;
try {
stmt = DB.prepare(findStatement());
stmt.setLong(1, id.longValue());
rs = stmt.executeQuery();
rs.next();
result = load(rs);
return result;
} catch (SQLException e) {
throw new ApplicationException(e);
} finally {DB.cleanUp(stmt, rs);
}
}
abstract protected String findStatement();
protected Map loadedMap = new HashMap();
The find methods then call load methods. An abstract load method handles the ID loading while the actual data for the employee is loaded on the employee’s mapper.
class AbstractMapper...
protected DomainObject load(ResultSet rs) throws SQLException {
Long id = new Long(rs.getLong(1));
return load(id, rs);
}
public DomainObject load(Long id, ResultSet rs) throws SQLException {
if (hasLoaded(id)) return (DomainObject) loadedMap.get(id);
DomainObject result = doLoad(id, rs);
loadedMap.put(id, result);
return result;
}
abstract protected DomainObject doLoad(Long id, ResultSet rs) throws SQLException;
class EmployeeMapper...
protected DomainObject doLoad(Long id, ResultSet rs) throws SQLException {
Employee result = new Employee(id);
result.setFirstName(rs.getString("firstname"));
result.setLastName(rs.getString("lastname"));
result.setSkills(loadSkills(id));
return result;
}
The employee needs to issue another query to load the skills, but it can easily load all the skills in a single query. To do this it calls the skill mapper to load in the data for a particular skill.
class EmployeeMapper...
protected List loadSkills(Long employeeID) {
PreparedStatement stmt = null;
ResultSet rs = null;
try {
List result = new ArrayList();
stmt = DB.prepare(findSkillsStatement);
stmt.setObject(1, employeeID);
rs = stmt.executeQuery();
while (rs.next()) {
Long skillId = new Long (rs.getLong(1));
result.add((Skill) MapperRegistry.skill().loadRow(skillId, rs));
}
return result;
} catch (SQLException e) {
throw new ApplicationException(e);
} finally {DB.cleanUp(stmt, rs);
}
}
private static final String findSkillsStatement =
"SELECT skill.ID, " + SkillMapper.COLUMN_LIST +
" FROM skills skill, employeeSkills es " +
" WHERE es.employeeID = ? AND skill.ID = es.skillID";
class SkillMapper...
public static final String COLUMN_LIST = " skill.name skillName ";
class AbstractMapper...
protected DomainObject loadRow (Long id, ResultSet rs) throws SQLException {
return load (id, rs);
}
class SkillMapper...
protected DomainObject doLoad(Long id, ResultSet rs) throws SQLException {
Skill result = new Skill (id);
result.setName(rs.getString("skillName"));
return result;
}
The abstract mapper can also help find employees.
class EmployeeMapper...
public List findAll() {
return findAll(findAllStatement);
}
private static final String findAllStatement =
"SELECT " + COLUMN_LIST +
" FROM employees employee" +
" ORDER BY employee.lastname";
class AbstractMapper...
protected List findAll(String sql) {
PreparedStatement stmt = null;
ResultSet rs = null;
try {
List result = new ArrayList();
stmt = DB.prepare(sql);
rs = stmt.executeQuery();
while (rs.next())
result.add(load(rs));
return result;
} catch (SQLException e) {
throw new ApplicationException(e);
} finally {DB.cleanUp(stmt, rs);
}
}
All of this works quite well and is pretty simple to follow. Still, there's a problem with the number of queries: each employee takes two SQL queries to load. Although we can load the basic employee data for many employees in a single query, we still need one query per employee to load the skills. Thus, loading a hundred employees takes 101 queries.
It's possible to bring back many employees, with their skills, in a single query. This is a good example of multitable query optimization, which is certainly more awkward to set up. For that reason do it when you need to, rather than every time. It's better to put your energy into speeding up the queries that are actually slow than into the many that matter less.
The first case we’ll look at is a simple one where we pull back all the skills for an employee in the same query that holds the basic data. To do this I’ll use a more complex SQL statement that joins across all three tables.
class EmployeeMapper...
protected String findStatement() {
return
"SELECT" + COLUMN_LIST +
" FROM employees employee, skills skill, employeeSkills es" +
" WHERE employee.ID = es.employeeID AND skill.ID = es.skillID AND employee.ID = ?";
}
public static final String COLUMN_LIST =
" employee.ID, employee.lastname, employee.firstname, " +
" es.skillID, es.employeeID, skill.ID skillID, " +
SkillMapper.COLUMN_LIST;
The abstractFind and load methods on the superclass are the same as in the previous example, so I won't repeat them here. The employee mapper loads its data differently to take advantage of the multiple data rows.
class EmployeeMapper...
protected DomainObject doLoad(Long id, ResultSet rs) throws SQLException {
Employee result = (Employee) loadRow(id, rs);
loadSkillData(result, rs);
while (rs.next()){
Assert.isTrue(rowIsForSameEmployee(id, rs));
loadSkillData(result, rs);
}
return result;
}
protected DomainObject loadRow(Long id, ResultSet rs) throws SQLException {
Employee result = new Employee(id);
result.setFirstName(rs.getString("firstname"));
result.setLastName(rs.getString("lastname"));
return result;
}
private boolean rowIsForSameEmployee(Long id, ResultSet rs) throws SQLException {
return id.equals(new Long(rs.getLong(1)));
}
private void loadSkillData(Employee person, ResultSet rs) throws SQLException {
Long skillID = new Long(rs.getLong("skillID"));
person.addSkill ((Skill)MapperRegistry.skill().loadRow(skillID, rs));
}
In this case the load method for the employee mapper actually runs through the rest of the result set to load in all the data.
All is simple when we’re loading the data for a single employee. However, the real benefit of this multitable query appears when we want to load lots of employees. Getting the reading right can be tricky, particularly when we don’t want to force the result set to be grouped by employees. At this point it’s handy to introduce a helper class to go through the result set by focusing on the associative table itself, loading up the employees and skills as it goes along.
I’ll begin with the SQL and the call to the special loader class.
class EmployeeMapper...
public List findAll() {
return findAll(findAllStatement);
}
private static final String findAllStatement =
"SELECT " + COLUMN_LIST +
" FROM employees employee, skills skill, employeeSkills es" +
" WHERE employee.ID = es.employeeID AND skill.ID = es.skillID" +
" ORDER BY employee.lastname";
protected List findAll(String sql) {
AssociationTableLoader loader = new AssociationTableLoader(this, new SkillAdder());
return loader.run(sql);
}
class AssociationTableLoader...
private AbstractMapper sourceMapper;
private Adder targetAdder;
public AssociationTableLoader(AbstractMapper primaryMapper, Adder targetAdder) {
this.sourceMapper = primaryMapper;
this.targetAdder = targetAdder;
}
Don’t worry about the skillAdder
—that will become a bit clearer later. For the moment notice that we construct the loader with a reference to the mapper and then tell it to perform a load with a suitable query. This is the typical structure of a method object. A method object [Beck Patterns] is a way of turning a complicated method into an object on its own. The great advantage of this is that it allows you to put values in fields instead of passing them around in parameters. The usual way of using a method object is to create it, fire it up, and then let it die once its duty is done.
The load behavior comes in three steps.
class AssociationTableLoader...
protected List run(String sql) {
loadData(sql);
addAllNewObjectsToIdentityMap();
return formResult();
}
The loadData method forms the SQL call, executes it, and loops through the result set. Since this is a method object, I’ve put the result set in a field so I don’t have to pass it around.
class AssociationTableLoader...
private ResultSet rs = null;
private void loadData(String sql) {
PreparedStatement stmt = null;
try {
stmt = DB.prepare(sql);
rs = stmt.executeQuery();
while (rs.next())
loadRow();
} catch (SQLException e) {
throw new ApplicationException(e);
} finally {DB.cleanUp(stmt, rs);
}
}
The loadRow method loads the data from a single row in the result set. It’s a bit complicated.
class AssociationTableLoader...
private List resultIds = new ArrayList();
private Map inProgress = new HashMap();
private void loadRow() throws SQLException {
Long ID = new Long(rs.getLong(1));
if (!resultIds.contains(ID)) resultIds.add(ID);
if (!sourceMapper.hasLoaded(ID)) {
if (!inProgress.keySet().contains(ID))
inProgress.put(ID, sourceMapper.loadRow(ID, rs));
targetAdder.add((DomainObject) inProgress.get(ID), rs);
}
}
class AbstractMapper...
boolean hasLoaded(Long id) {
return loadedMap.containsKey(id);
}
The loader preserves any order there is in the result set, so the output list of employees will be in the same order in which they first appeared. So I keep a list of IDs in the order I see them. Once I’ve got an ID I look to see if it’s already fully loaded in the mapper—usually from a previous query. If it isn’t, I load what data I have and keep it in an in-progress list. I need such a list since several rows combine to carry all the data for an employee and I may not hit those rows consecutively.
The trickiest part of this code is ensuring that I can add the skill I’m loading to the employee’s list of skills, but still keep the loader generic so it doesn’t depend on employees and skills. To achieve this I need to dig deep into my bag of tricks to find an inner interface—the Adder.
class AssociationTableLoader...
public static interface Adder {
void add(DomainObject host, ResultSet rs) throws SQLException ;
}
The original caller has to supply an implementation for the interface to bind it to the particular needs of the employee and skill.
class EmployeeMapper...
private static class SkillAdder implements AssociationTableLoader.Adder {
public void add(DomainObject host, ResultSet rs) throws SQLException {
Employee emp = (Employee) host;
Long skillId = new Long (rs.getLong("skillId"));
emp.addSkill((Skill) MapperRegistry.skill().loadRow(skillId, rs));
}
}
This is the kind of thing that comes more naturally to languages that have function pointers or closures, but at least the class and interface get the job done. (They don’t have to be inner in this case, but it helps bring out their narrow scope.)
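In a Java version with lambdas, for instance, the same binding could be supplied inline, since Adder has a single method—this is just a sketch of the alternative, not how the example's code is written:
class EmployeeMapper...
    protected List findAll(String sql) {
        AssociationTableLoader loader = new AssociationTableLoader(this, (host, rs) -> {
            Employee emp = (Employee) host;
            Long skillId = new Long(rs.getLong("skillId"));
            emp.addSkill((Skill) MapperRegistry.skill().loadRow(skillId, rs));
        });
        return loader.run(sql);
    }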
You may have noticed that I have a load and a loadRow method defined on the superclass, and the implementation of the loadRow is to call load. I did this because there are times when you want to be sure that a load action will not move the result set forward. The load does whatever it needs to do to load an object, but loadRow guarantees to load data from a row without altering the position of the cursor. Most of the time these two are the same thing, but in the case of this employee mapper they’re different.
Now all the data is in from the result set. I have two collections: a list of all the employee IDs that were in the result set in the order of first appearance and a list of new objects that haven’t yet made an appearance in the employee mapper’s Identity Map (195).
The next step is to put all the new objects into the Identity Map (195).
class AssociationTableLoader...
private void addAllNewObjectsToIdentityMap() {
for (Iterator it = inProgress.values().iterator(); it.hasNext();)
sourceMapper.putAsLoaded((DomainObject)it.next());
}
class AbstractMapper...
void putAsLoaded (DomainObject obj) {
loadedMap.put (obj.getID(), obj);
}
The final step is to assemble the result list by looking up the IDs from the mapper.
class AssociationTableLoader...
private List formResult() {
List result = new ArrayList();
for (Iterator it = resultIds.iterator(); it.hasNext();) {
Long id = (Long)it.next();
result.add(sourceMapper.lookUp(id));
}
return result;
}
class AbstractMapper...
protected DomainObject lookUp (Long id) {
return (DomainObject) loadedMap.get(id);
}
Such code is more complex than the average loading code, but this kind of thing can help cut down the number of queries. Since it’s complicated, this is something to be used sparingly when you have laggardly bits of database interaction. However, it’s an example of how Data Mapper (165) can provide good queries without the domain layer being aware of the complexity involved.
Has one class perform the database mapping for a child class.
Some objects naturally appear in the context of other objects. Tracks on an album may be loaded or saved whenever the underlying album is loaded or saved. If they aren’t referred to by any other table in the database, you can simplify the mapping procedure by having the album mapper perform the mapping for the tracks as well—treating this mapping as a dependent mapping.
The basic idea behind Dependent Mapping is that one class (the dependent) relies upon some other class (the owner) for its database persistence. Each dependent can have only one owner and must have one owner.
This manifests itself in terms of the classes that do the mapping. For Active Record (160) and Row Data Gateway (152), the dependent class won’t contain any database mapping code; its mapping code sits in the owner. With Data Mapper (165) there’s no mapper for the dependent; the mapping code sits in the mapper for the owner. With a Table Data Gateway (144) there will typically be no dependent class at all; all the handling of the dependent is done in the owner.
In most cases every time you load an owner, you load the dependents too. If the dependents are expensive to load and infrequently used, you can use a Lazy Load (200) to avoid loading the dependents until you need them.
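A minimal lazy-initialization sketch—assuming a hypothetical loadTracksFor method on the owner's mapper, and varying the album class shown in the example below—would only fetch an album's tracks the first time they're asked for:
class Album...
    private List tracks;   // stays null until first requested
    public Track[] getTracks() {
        if (tracks == null)
            tracks = MapperRegistry.album().loadTracksFor(this);   // hypothetical helper
        return (Track[]) tracks.toArray(new Track[tracks.size()]);
    }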
An important property of a dependent is that it doesn’t have an Identity Field (216) and therefore isn’t stored in an Identity Map (195). As a result it can’t be loaded by a find method that looks up an ID. Indeed, there’s no finder for a dependent since all finds are done with the owner.
A dependent may itself be the owner of another dependent. In this case the owner of the first dependent is also responsible for the persistence of the second dependent. You can have a whole hierarchy of dependents controlled by a single primary owner.
It’s usually easier for the primary key on the database to be a composite key that includes the owner’s primary key. No other table should have a foreign key into the dependent’s table, unless that object has the same owner. As a result, no in-memory object other than the owner or its dependents should have a reference to a dependent. Strictly speaking, you can relax that rule providing that the reference isn’t persisted to the database, but having a nonpersistent reference is itself a good source of confusion.
In a UML model, it’s appropriate to use composition to show the relationship between an owner and its dependents.
Since the writing and saving of dependents is left to the owner, and there are no outside references, updates to the dependents can be handled through deletion and insertion. Thus, if you want to update the collection of dependents you can safely delete all rows that link to the owner and then reinsert all the dependents. This saves you from having to do an analysis of objects added or removed from the owner’s collection.
Dependents are in many ways like Value Objects (486), although they often don’t need the full mechanics that you use in making something a Value Object (486) (such as overriding equals). The main difference is that there’s nothing special about them from a purely in-memory point of view. The dependent nature of the objects is only really due to the database mapping behavior.
Using Dependent Mapping complicates tracking whether the owner has changed. Any change to a dependent needs to mark the owner as changed so that the owner will write the changes out to the database. You can simplify this considerably by making the dependent immutable, so that any change to it needs to be done by removing it and adding a new one. This can make the in-memory model harder to work with, but it does simplify the database mapping. While in theory the in-memory and database mapping should be independent when you’re using Data Mapper (165), in practice you have to make the occasional compromise.
You use Dependent Mapping when you have an object that’s only referred to by one other object, which usually occurs when one object has a collection of dependents. Dependent Mapping is a good way of dealing with the awkward situation where the owner has a collection of references to its dependents but there’s no back pointer. Providing that the many objects don’t need their own identity, using Dependent Mapping makes it easier to manage their persistence.
For Dependent Mapping to work there are a number of preconditions.
• A dependent must have exactly one owner.
• There must be no references from any object other than the owner to the dependent.
There is a school of OO design that uses the notion of entity objects and dependent objects when designing a Domain Model (116). I tend to think of Dependent Mapping as a technique to simplify database mapping rather than as a fundamental OO design medium. In particular, I avoid large graphs of dependents. The problem with them is that it’s impossible to refer to a dependent from outside the graph, which often leads to complex lookup schemes based around the root owner.
I don’t recommend Dependent Mapping if you’re using Unit of Work (184). The delete and reinsert strategy doesn’t help at all if you have a Unit of Work (184) keeping track of things. It can also lead to problems since the Unit of Work (184) isn’t controlling the dependents. Mike Rettig told me about an application where a Unit of Work (184) would keep track of rows inserted for testing and then delete them all when done. Because it didn’t track dependents, orphan rows appeared and caused failures in the test runs.
In this domain model (Figure 12.7) an album holds a collection of tracks. This uselessly simple application doesn’t need anything else to refer to a track, so it’s an obvious candidate for Dependent Mapping. (Indeed, anyone would think the example is deliberately constructed for the pattern.)
Figure 12.7. An album with tracks that can be handled using Dependent Mapping.
This track just has a title. I’ve defined it as an immutable class.
class Track...
private final String title;
public Track(String title) {
this.title = title;
}
public String getTitle() {
return title;
}
The tracks are held in the album class.
class Album...
private List tracks = new ArrayList();
public void addTrack(Track arg) {
tracks.add(arg);
}
public void removeTrack(Track arg) {
tracks.remove(arg);
}
public void removeTrack(int i) {
tracks.remove(i);
}
public Track[] getTracks() {
return (Track[]) tracks.toArray(new Track[tracks.size()]);
}
The album mapper class handles all the SQL for tracks and thus defines the SQL statements that access the tracks table.
class AlbumMapper...
protected String findStatement() {
return
"SELECT ID, a.title, t.title as trackTitle" +
" FROM albums a, tracks t" +
" WHERE a.ID = ? AND t.albumID = a.ID" +
" ORDER BY t.seq";
}
The tracks are loaded into the album whenever the album is loaded.
class AlbumMapper...
protected DomainObject doLoad(Long id, ResultSet rs) throws SQLException {
String title = rs.getString(2);
Album result = new Album(id, title);
loadTracks(result, rs);
return result;
}
public void loadTracks(Album arg, ResultSet rs) throws SQLException {
arg.addTrack(newTrack(rs));
while (rs.next()) {
arg.addTrack(newTrack(rs));
}
}
private Track newTrack(ResultSet rs) throws SQLException {
String title = rs.getString(3);
Track newTrack = new Track (title);
return newTrack;
}
For clarity I’ve done the track load in a separate query. For performance, you might want to consider loading them in the same query along the lines of the example on page 243.
When the album is updated all the tracks are deleted and reinserted.
class AlbumMapper...
public void update(DomainObject arg) {
PreparedStatement updateStatement = null;
try {
updateStatement = DB.prepare("UPDATE albums SET title = ? WHERE id = ?");
updateStatement.setLong(2, arg.getID().longValue());
Album album = (Album) arg;
updateStatement.setString(1, album.getTitle());
updateStatement.execute();
updateTracks(album);
} catch (SQLException e) {
throw new ApplicationException(e);
} finally {DB.cleanUp(updateStatement);
}
}
public void updateTracks(Album arg) throws SQLException {
PreparedStatement deleteTracksStatement = null;
try {
deleteTracksStatement = DB.prepare("DELETE from tracks WHERE albumID = ?");
deleteTracksStatement.setLong(1, arg.getID().longValue());
deleteTracksStatement.execute();
for (int i = 0; i < arg.getTracks().length; i++) {
Track track = arg.getTracks()[i];
insertTrack(track, i + 1, arg);
}
} finally {DB.cleanUp(deleteTracksStatement);
}
}
public void insertTrack(Track track, int seq, Album album) throws SQLException {
PreparedStatement insertTracksStatement = null;
try {
insertTracksStatement =
DB.prepare("INSERT INTO tracks (seq, albumID, title) VALUES (?, ?, ?)");
insertTracksStatement.setInt(1, seq);
insertTracksStatement.setLong(2, album.getID().longValue());
insertTracksStatement.setString(3, track.getTitle());
insertTracksStatement.execute();
} finally {DB.cleanUp(insertTracksStatement);
}
}
Maps an object into several fields of another object’s table.
Many small objects that make sense in an OO system don’t make sense as tables in a database. Examples include currency-aware money objects and date ranges. Although the default thinking is to save an object as a table, no sane person would want a table of money values.
An Embedded Value maps the values of an object to fields in the record of the object’s owner. In the sketch we have an employment object with links to a date range object and a money object. In the resulting table the fields in those objects map to fields in the employment table rather than make new records themselves.
This exercise is actually quite simple. When the owning object (employment) is loaded or saved, the dependent objects (date range and money) are loaded and saved at the same time. The dependent classes won’t have their own persistence methods since all persistence is done by the owner. You can think of Embedded Value as a special case of Dependent Mapping (262), where the value is a single dependent object.
This is one of those patterns where the doing of it is very straightforward, but knowing when to use it is a little more complicated.
The simplest cases for Embedded Value are the clear, simple Value Objects (486) like money and date range. Since Value Objects (486) don’t have identity, you can create and destroy them easily without worrying about such things as Identity Maps (195) to keep them all in sync. Indeed, all Value Objects (486) should be persisted as Embedded Value, since you would never want a table for them.
The grey area is in whether it’s worth storing reference objects, such as an order and a shipping object, using Embedded Value. The principal question here is whether the shipping data has any relevance outside the context of the order. One issue is the loading and saving. If you only load the shipping data into memory when you load the order, that’s an argument for saving both in the same table. Another question is whether you’ll want to access the shipping data separately through SQL. This can be important if you’re reporting through SQL and don’t have a separate database for reporting.
If you’re mapping to an existing schema, you can use Embedded Value when a table contains data that you split into more than one object in memory. This may occur because you want a separate object to factor out some behavior in the object model, but it’s all still one entity in the database. In this case you have to be careful that any change to the dependent marks the owner as dirty—which isn’t an issue with Value Objects (486) that are replaced in the owner.
In most cases you’ll only use Embedded Value on a reference object when the association between them is single valued at both ends (a one-to-one association). Occasionally you may use it if there are multiple candidate dependents and their number is small and fixed. Then you’ll have numbered fields for each value. This is messy table design, and horrible to query in SQL, but it may have performance benefits. If this is the case, however, Serialized LOB (272) is usually the better choice.
Since so much of the logic for deciding when to use Embedded Value is the same as for Serialized LOB (272), there’s the obvious matter of choosing between the two. The great advantage of Embedded Value is that it allows SQL queries to be made against the values in the dependent object. Although using XML as the serialization, together with XML-based query add-ons to SQL, may alter that in the future, at the moment you really need Embedded Value if you want to use dependent values in a query. This may be important for separate reporting mechanisms on the database.
Embedded Value can only be used for fairly simple dependents. A solitary dependent, or a few separated dependents, works well. Serialized LOB (272) works with more complex structures, including potentially large object subgraphs.
Embedded Value has been called a couple of different names in its history. TOPLink refers to it as aggregate mapping. Visual Age refers to it as composer.
This is the classic example of a value object mapped with Embedded Value. We’ll begin with a simple product offering class with the following fields.
class ProductOffering...
private Product product;
private Money baseCost;
private Integer ID;
In these fields the ID is an Identity Field (216) and the product is a regular record mapping. We’ll map the base cost using Embedded Value. We’ll do the overall mapping with Active Record (160) to keep things simple.
Since we’re using Active Record (160) we need save and load routines. These simple routines are in the product offering class because it’s the owner. The money class has no persistence behavior at all. Here’s the load method.
class ProductOffering...
public static ProductOffering load(ResultSet rs) {
try {
Integer id = (Integer) rs.getObject("ID");
BigDecimal baseCostAmount = rs.getBigDecimal("base_cost_amount");
Currency baseCostCurrency = Registry.getCurrency(rs.getString("base_cost_currency"));
Money baseCost = new Money(baseCostAmount, baseCostCurrency);
Integer productID = (Integer) rs.getObject("product");
Product product = Product.find(productID);
return new ProductOffering(id, product, baseCost);
} catch (SQLException e) {
throw new ApplicationException(e);
}
}
Here’s the update behavior. Again it’s a simple variation on the updates.
class ProductOffering...
public void update() {
PreparedStatement stmt = null;
try {
stmt = DB.prepare(updateStatementString);
stmt.setBigDecimal(1, baseCost.amount());
stmt.setString(2, baseCost.currency().code());
stmt.setInt(3, ID.intValue());
stmt.execute();
} catch (Exception e) {
throw new ApplicationException(e);
} finally {DB.cleanUp(stmt);}
}
private String updateStatementString =
"UPDATE product_offerings" +
" SET base_cost_amount = ?, base_cost_currency = ? " +
" WHERE id = ?";
Saves a graph of objects by serializing them into a single large object (LOB), which it stores in a database field.
Object models often contain complicated graphs of small objects. Much of the information in these structures isn’t in the objects but in the links between them. Consider storing the organization hierarchy for all your customers. An object model quite naturally shows the composition pattern to represent organizational hierarchies, and you can easily add methods that allow you to get ancestors, siblings, descendants, and other common relationships.
Not so easy is putting all this into a relational schema. The basic schema is simple—an organization table with a parent foreign key. However, manipulating that schema requires many joins, which are both slow and awkward.
Objects don’t have to be persisted as table rows related to each other. Another form of persistence is serialization, where a whole graph of objects is written out as a single large object (LOB) in a table this Serialized LOB then becomes a form of memento [Gang of Four].
There are two ways you can do the serialization: as a binary object (BLOB) or as textual characters (CLOB). The BLOB is often the simplest to create since many platforms include the ability to automatically serialize an object graph. Saving the graph is then a simple matter of serializing it into a buffer and saving that buffer in the relevant field.
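In Java, for instance, a minimal BLOB sketch—assuming the department objects implement java.io.Serializable and the departments column is a binary type rather than the varchar used in the example below—could look like this:
byte[] serializeDepartments(List departments) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(buffer);
    out.writeObject(new ArrayList(departments));   // writes everything reachable from the list
    out.close();
    return buffer.toByteArray();                   // store with PreparedStatement.setBytes
}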
The advantages of the BLOB are that it’s simple to program (if your platform supports it) and that it uses the minimum of space. The disadvantages are that your database must support a binary data type for it and that you can’t reconstruct the graph without the object’s code, so the field is utterly impenetrable to casual viewing. The most serious problem, however, is versioning. If you change the department class, you may not be able to read all its previous serializations; since data can live in the database for a long time, this is no small thing.
The alternative is a CLOB. In this case you serialize the department graph into a text string that carries all the information you need. The text string can be read easily by a human viewing the row, which helps in casual browsing of the database. However, the text approach will usually need more space, and you may need to create your own parser for the textual format you use. It’s also likely to be slower than a binary serialization.
Many of the disadvantages of CLOBs can be overcome with XML. XML parsers are commonly available, so you don’t have to write your own. Furthermore, XML is a widely supported standard, so you can take advantage of tools as they become available to do further manipulations. The disadvantage that XML doesn’t help with is the matter of space. Indeed, it makes the space issue much worse because it’s a very verbose format. One way to deal with that is to use a zipped XML format as your BLOB—you lose the direct human readability, but it’s an option if space is a real issue.
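A sketch of that compromise—compressing the XML text before storing it in a binary field—is only a few lines with java.util.zip (this assumes the XML string has already been produced, as in the example below):
byte[] zipXml(String xml) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    GZIPOutputStream zipper = new GZIPOutputStream(buffer);
    zipper.write(xml.getBytes("UTF-8"));
    zipper.close();                                // finishes the gzip stream
    return buffer.toByteArray();                   // no longer human readable, but much smaller
}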
When you use Serialized LOB beware of identity problems. Say you want to use Serialized LOB for the customer details on an order. For this don’t put the customer LOB in the order table; otherwise, the customer data will be copied on every order, which makes updates a problem. (This is actually a good thing, however, if you want to store a snapshot of the customer data as it was at the placing of the order—it avoids temporal relationships.) If you want your customer data to be updated for each order in the classical relational sense, you need to put the LOB in a customer table so many orders can link to it. There’s nothing wrong with a table that just has an ID and a single LOB field for its data.
In general, be careful of duplicating data when using this pattern. Often it’s not a whole Serialized LOB that gets duplicated but part of one that overlaps with another one. The thing to do is to pay careful attention to the data that’s stored in the Serialized LOB and be sure that it can’t be reached from anywhere but a single object that acts as the owner of the Serialized LOB.
Serialized LOB isn’t considered as often as it might be. XML makes it much more attractive since it yields a easy-to-implement textual approach. Its main disadvantage is that you can’t query the structure using SQL. SQL extensions appear to get at XML data within a field, but that’s still not the same (or portable).
This pattern works best when you can chop out a piece of the object model and use it to represent the LOB. Think of a LOB as a way to take a bunch of objects that aren’t likely to be queried from any SQL route outside the application. This graph can then be hooked into the SQL schema.
Serialized LOB works poorly when you have objects outside the LOB reference objects buried in it. To handle this you have to come up with some form of referencing scheme that can support references to objects inside a LOB—it’s by no means impossible, but it’s awkward, awkward enough usually not to be worth doing. Again XML, or rather XPath, reduces this awkwardness somewhat.
If you’re using a separate database for reporting and all other SQL goes against that database, you can transform the LOB into a suitable table structure. The fact that a reporting database is usually denormalized means that structures suitable for Serialized LOB are often also suitable for a separate reporting database.
For this example we’ll take the notion of customers and departments from the sketch and show how you might serialize all the departments into an XML CLOB. As I write this, Java’s XML handling is somewhat primitive and volatile, so the code may look different when you get to it (I’m also using an early version of JDOM).
The object model of the sketch turns into the following class structures:
class Customer...
private String name;
private List departments = new ArrayList();
class Department...
private String name;
private List subsidiaries = new ArrayList();
The database for this has only one table.
create table customers (ID int primary key, name varchar, departments varchar)
We’ll treat the customer as an Active Record (160) and illustrate writing the data with the insert behavior.
class Customer...
public Long insert() {
PreparedStatement insertStatement = null;
try {
insertStatement = DB.prepare(insertStatementString);
setID(findNextDatabaseId());
insertStatement.setInt(1, getID().intValue());
insertStatement.setString(2, name);
insertStatement.setString(3, XmlStringer.write(departmentsToXmlElement()));
insertStatement.execute();
Registry.addCustomer(this);
return getID();
} catch (SQLException e) {
throw new ApplicationException(e);
} finally {DB.cleanUp(insertStatement);
}
}
public Element departmentsToXmlElement() {
Element root = new Element("departmentList");
Iterator i = departments.iterator();
while (i.hasNext()) {
Department dep = (Department) i.next();
root.addContent(dep.toXmlElement());
}
return root;
}
class Department...
Element toXmlElement() {
Element root = new Element("department");
root.setAttribute("name", name);
Iterator i = subsidiaries.iterator();
while (i.hasNext()) {
Department dep = (Department) i.next();
root.addContent(dep.toXmlElement());
}
return root;
}
The customer has a method for serializing its departments field into a single XML DOM. Each department has a method for serializing itself (and its subsidiaries recursively) into a DOM as well. The insert method then takes the DOM of the departments, converts it into a string (via a utility class) and puts it in the database. We aren’t particularly concerned with the structure of the string. It’s human readable, but we aren’t going to look at it on a regular basis.
<?xml version="1.0" encoding="UTF-8"?>
<departmentList>
<department name="US">
<department name="New England">
<department name="Boston" />
<department name="Vermont" />
</department>
<department name="California" />
<department name="Mid-West" />
</department>
<department name="Europe" />
</departmentList>
Reading back is a fairly simple reversal of this process.
class Customer...
public static Customer load(ResultSet rs) throws SQLException {
Long id = new Long(rs.getLong("id"));
Customer result = (Customer) Registry.getCustomer(id);
if (result != null) return result;
String name = rs.getString("name");
String departmentLob = rs.getString("departments");
result = new Customer(name);
result.readDepartments(XmlStringer.read(departmentLob));
return result;
}
void readDepartments(Element source) {
Iterator it = source.getChildren("department").iterator();
while (it.hasNext())
addDepartment(Department.readXml((Element) it.next()));
}
class Department...
static Department readXml(Element source) {
String name = source.getAttributeValue("name");
Department result = new Department(name);
Iterator it = source.getChildren("department").iterator();
while (it.hasNext())
result.addSubsidiary(readXml((Element) it.next()));
return result;
}
The load code is obviously a mirror image of the insert code. The department knows how to create itself (and its subsidiaries) from an XML element, and the customer knows how to take an XML element and create the list of departments from it. The load method uses a utility class to turn the string from the database into a utility element.
An obvious danger here is that someone may try to edit the XML by hand in the database and mess up the XML, making it unreadable by the load routine. More sophisticated tools that would support adding a DTD or XML schema to a field as validation will obviously help with that.
Represents an inheritance hierarchy of classes as a single table that has columns for all the fields of the various classes.
Relational databases don’t support inheritance, so when mapping from objects to databases we have to consider how to represent our nice inheritance structures in relational tables. When mapping to a relational database, we try to minimize the joins that can quickly mount up when processing an inheritance structure in multiple tables. Single Table Inheritance maps all fields of all classes of an inheritance structure into a single table.
In this inheritance mapping scheme we have one table that contains all the data for all the classes in the inheritance hierarchy. Each class stores the data that’s relevant to it in one table row. Any columns in the database that aren’t relevant are left empty. The basic mapping behavior follows the general scheme of Inheritance Mappers (302).
When loading an object into memory you need to know which class to instantiate. For this you have a field in the table that indicates which class should be used. This can be the name of the class or a code field. A code field needs to be interpreted by some code to map it to the relevant class. This code needs to be extended when a class is added to the hierarchy. If you embed the class name in the table you can just use it directly to instantiate an instance. The class name, however, will take up more space and may be less easy to process by those using the database table structure directly. As well it may more closely couple the class structure to the database schema.
In loading data you read the code first to figure out which subclass to instantiate. On saving the data the code needs to be written out by the superclass in the hierarchy.
Single Table Inheritance is one of the options for mapping the fields in an inheritance hierarchy to a relational database. The alternatives are Class Table Inheritance (285) and Concrete Table Inheritance (293).
These are the strengths of Single Table Inheritance:
• There’s only a single table to worry about on the database.
• There are no joins in retrieving data.
• Any refactoring that pushes fields up or down the hierarchy doesn’t require you to change the database.
The weaknesses of Single Table Inheritance are
• Fields are sometimes relevant and sometimes not, which can be confusing to people using the tables directly.
• Columns used only by some subclasses lead to wasted space in the database. How much this is actually a problem depends on the specific data characteristics and how well the database compresses empty columns. Oracle, for example, is very efficient in trimming wasted space, particularly if you keep your optional columns to the right side of the database table. Each database has its own tricks for this.
• The single table may end up being too large, with many indexes and frequent locking, which may hurt performance. You can avoid this by having separate index tables that either list keys of rows that have a certain property or that copy a subset of fields relevant to an index.
• You only have a single namespace for fields, so you have to be sure that you don’t use the same name for different fields. Compound names with the name of the class as a prefix or suffix help here.
Remember that you don’t need to use one form of inheritance mapping for your whole hierarchy. It’s perfectly fine to map half a dozen similar classes in a single table, as long as you use Concrete Table Inheritance (293) for any classes that have a lot of specific data.
Like the other inheritance examples, I’ve based this one on Inheritance Mappers (302), using the classes in Figure 12.8. Each mapper needs to be linked to a data table in an ADO.NET data set. This link can be made generically in the mapper superclass. The gateway’s data property is a data set that can be loaded by a query.
Figure 12.8. The generic class diagram of Inheritance Mappers (302).
class Mapper...
protected DataTable table {
get {return Gateway.Data.Tables[TableName];}
}
protected Gateway Gateway;
abstract protected String TableName {get;}
Since there is only one table, this can be defined by the abstract player mapper.
class AbstractPlayerMapper...
protected override String TableName {
get {return "Players";}
}
Each class needs a type code to help the mapper code figure out what kind of player it’s dealing with. The type code is defined on the superclass and implemented in the subclasses.
class AbstractPlayerMapper...
abstract public String TypeCode {get;}
class CricketerMapper...
public const String TYPE_CODE = "C";
public override String TypeCode {
get {return TYPE_CODE;}
}
The player mapper has fields for each of the three concrete mapper classes.
class PlayerMapper...
private BowlerMapper bmapper;
private CricketerMapper cmapper;
private FootballerMapper fmapper;
public PlayerMapper (Gateway gateway) : base (gateway) {
bmapper = new BowlerMapper(Gateway);
cmapper = new CricketerMapper(Gateway);
fmapper = new FootballerMapper(Gateway);
}
Each concrete mapper class has a find method to get an object from the data.
class CricketerMapper...
public Cricketer Find(long id) {
return (Cricketer) AbstractFind(id);
}
This calls generic behavior to find an object.
class Mapper...
protected DomainObject AbstractFind(long id) {
DataRow row = FindRow(id);
return (row == null) ? null : Find(row);
}
protected DataRow FindRow(long id) {
String filter = String.Format("id = {0}", id);
DataRow[] results = table.Select(filter);
return (results.Length == 0) ? null : results[0];
}
public DomainObject Find (DataRow row) {
DomainObject result = CreateDomainObject();
Load(result, row);
return result;
}
abstract protected DomainObject CreateDomainObject();
class CricketerMapper...
protected override DomainObject CreateDomainObject() {
return new Cricketer();
}
I load the data into the new object with a series of load methods, one on each class in the hierarchy.
class CricketerMapper...
protected override void Load(DomainObject obj, DataRow row) {
base.Load(obj,row);
Cricketer cricketer = (Cricketer) obj;
cricketer.battingAverage = (double)row["battingAverage"];
}
class AbstractPlayerMapper...
protected override void Load(DomainObject obj, DataRow row) {
base.Load(obj, row);
Player player = (Player) obj;
player.name = (String)row["name"];
}
class Mapper...
protected virtual void Load(DomainObject obj, DataRow row) {
obj.Id = (int) row ["id"];
}
I can also load a player through the player mapper. It needs to read the data and use the type code to determine which concrete mapper to use.
class PlayerMapper...
public Player Find (long key) {
DataRow row = FindRow(key);
if (row == null) return null;
else {
String typecode = (String) row["type"];
switch (typecode){
case BowlerMapper.TYPE_CODE:
return (Player) bmapper.Find(row);
case CricketerMapper.TYPE_CODE:
return (Player) cmapper.Find(row);
case FootballerMapper.TYPE_CODE:
return (Player) fmapper.Find(row);
default:
throw new Exception("unknown type");
}
}
}
The basic operation for updating is the same for all objects, so I can define the operation on the mapper superclass.
class Mapper...
public virtual void Update (DomainObject arg) {
Save (arg, FindRow(arg.Id));
}
The save method is similar to the load method—each class defines it to save the data it contains.
class CricketerMapper...
protected override void Save(DomainObject obj, DataRow row) {
base.Save(obj, row);
Cricketer cricketer = (Cricketer) obj;
row["battingAverage"] = cricketer.battingAverage;
}
class AbstractPlayerMapper...
protected override void Save(DomainObject obj, DataRow row) {
Player player = (Player) obj;
row["name"] = player.name;
row["type"] = TypeCode;
}
The player mapper forwards to the appropriate concrete mapper.
class PlayerMapper...
public override void Update (DomainObject obj) {
MapperFor(obj).Update(obj);
}
private Mapper MapperFor(DomainObject obj) {
if (obj is Footballer)
return fmapper;
if (obj is Bowler)
return bmapper;
if (obj is Cricketer)
return cmapper;
throw new Exception("No mapper available");
}
Insertions are similar to updates; the only real difference is that a new row needs to be made in the table before saving.
class Mapper...
public virtual long Insert (DomainObject arg) {
DataRow row = table.NewRow();
arg.Id = GetNextID();
row["id"] = arg.Id;
Save (arg, row);
table.Rows.Add(row);
return arg.Id;
}
class PlayerMapper...
public override long Insert (DomainObject obj) {
return MapperFor(obj).Insert(obj);
}
Deletes are pretty simple. They’re defined at the abstract mapper level or in the player mapper.
class Mapper...
public virtual void Delete(DomainObject obj) {
DataRow row = FindRow(obj.Id);
row.Delete();
}
class PlayerMapper...
public override void Delete (DomainObject obj) {
MapperFor(obj).Delete(obj);
}
Represents an inheritance hierarchy of classes with one table for each class.
A very visible aspect of the object-relational mismatch is the fact that relational databases don’t support inheritance. You want database structures that map clearly to the objects and allow links anywhere in the inheritance structure. Class Table Inheritance supports this by using one database table per class in the inheritance structure.
The straightforward thing about Class Table Inheritance is that it has one table per class in the domain model. The fields in the domain class map directly to fields in the corresponding tables. As with the other inheritance mappings the fundamental approach of Inheritance Mappers (302) applies.
One issue is how to link the corresponding rows of the database tables. A possible solution is to use a common primary key value so that, say, the row of key 101 in the footballers table and the row of key 101 in the players table correspond to the same domain object. Since the superclass table has a row for each row in the other tables, the primary keys are going to be unique across the tables if you use this scheme. An alternative is to let each table have its own primary keys and use foreign keys into the superclass table to tie the rows together.
The biggest implementation issue with Class Table Inheritance is how to bring the data back from multiple tables in an efficient manner. Obviously, making a call for each table isn’t good since you have multiple calls to the database. You can avoid this by doing a join across the various component tables; however, joins for more than three or four tables tend to be slow because of the way databases do their optimizations.
On top of this is the problem that in any given query you often don’t know exactly which tables to join. If you’re looking for a footballer, you know to use the footballer table, but if you’re looking for a group of players, which tables do you use? To join effectively when some tables have no data, you’ll need to do an outer join, which is nonstandard and often slow. The alternative is to read the root table first and then use a code to figure out what tables to read next, but this involves multiple queries.
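As a sketch—written as a find statement in the style of the earlier Java mappers, with hypothetical table and column names—the outer-join approach for a single player might look something like this:
protected String findStatement() {
    return
        "SELECT p.ID, p.name, p.type, f.club, c.battingAverage, b.bowlingAverage" +
        "  FROM players p" +
        "  LEFT OUTER JOIN footballers f ON f.ID = p.ID" +
        "  LEFT OUTER JOIN cricketers c ON c.ID = p.ID" +
        "  LEFT OUTER JOIN bowlers b ON b.ID = p.ID" +
        " WHERE p.ID = ?";
}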
Class Table Inheritance, Single Table Inheritance (278) and Concrete Table Inheritance (293) are the three alternatives to consider for inheritance mapping.
The strengths of Class Table Inheritance are
• All columns are relevant for every row so tables are easier to understand and don’t waste space.
• The relationship between the domain model and the database is very straightforward.
The weaknesses of Class Table Inheritance are
• You need to touch multiple tables to load an object, which means a join or multiple queries and stitching the results together in memory.
• Any refactoring of fields up or down the hierarchy causes database changes.
• The supertype tables may become a bottleneck because they have to be accessed frequently.
• The high normalization may make it hard to understand for ad hoc queries.
You don’t have to choose just one inheritance mapping pattern for one class hierarchy. You can use Class Table Inheritance for the classes at the top of the hierarchy and a bunch of Concrete Table Inheritance (293) for those lower down.
A number of IBM texts refer to this pattern as Root-Leaf Mapping [Brown et al.].
Here’s an implementation for the sketch. Again I’ll follow the familiar (if perhaps a little tedious) theme of players and the like, using Inheritance Mappers (302) (Figure 12.9).
Figure 12.9. The generic class diagram of Inheritance Mappers (302).
Each class needs to define the table that holds its data and a type code for it.
class AbstractPlayerMapper...
abstract public String TypeCode {get;}
protected static String TABLENAME = "Players";
class FootballerMapper...
public override String TypeCode {
get {return "F";}
}
protected new static String TABLENAME = "Footballers";
Unlike the other inheritance examples, this one doesn’t have an overridden table name, because we have to have the table name for this class even when the instance is an instance of the subclass.
If you’ve been reading the other mappings, you know the first step is the find method on the concrete mappers.
class FootballerMapper...
public Footballer Find(long id) {
return (Footballer) AbstractFind (id, TABLENAME);
}
The abstract find method looks for a row matching the key and, if successful, creates a domain object and calls the load method on it.
class Mapper...
public DomainObject AbstractFind(long id, String tablename) {
DataRow row = FindRow (id, tableFor(tablename));
if (row == null) return null;
else {
DomainObject result = CreateDomainObject();
result.Id = id;
Load(result);
return result;
}
}
protected DataTable tableFor(String name) {
return Gateway.Data.Tables[name];
}
protected DataRow FindRow(long id, DataTable table) {
String filter = String.Format("id = {0}", id);
DataRow[] results = table.Select(filter);
return (results.Length == 0) ? null : results[0];
}
protected DataRow FindRow (long id, String tablename) {
return FindRow(id, tableFor(tablename));
}
protected abstract DomainObject CreateDomainObject();
class FootballerMapper...
protected override DomainObject CreateDomainObject(){
return new Footballer();
}
There’s one load method for each class which loads the data defined by that class.
class FootballerMapper...
protected override void Load(DomainObject obj) {
base.Load(obj);
DataRow row = FindRow (obj.Id, tableFor(TABLENAME));
Footballer footballer = (Footballer) obj;
footballer.club = (String)row["club"];
}
class AbstractPlayerMapper...
protected override void Load(DomainObject obj) {
DataRow row = FindRow (obj.Id, tableFor(TABLENAME));
Player player = (Player) obj;
player.name = (String)row["name"];
}
As with the other sample code, but more noticeably in this case, I’m relying on the fact that the ADO.NET data set has brought the data from the database and cached it into memory. This allows me to make several accesses to the table-based data structure without a high performance cost. If you’re going directly to the database, you’ll need to reduce that load. For this example you might do this by creating a join across all the tables and manipulating it.
The player mapper determines which kind of player it has to find and then delegates to the correct concrete mapper.
class PlayerMapper...
public Player Find (long key) {
DataRow row = FindRow(key, tableFor(TABLENAME));
if (row == null) return null;
else {
String typecode = (String) row["type"];
if (typecode == bmapper.TypeCode)
return bmapper.Find(key);
if (typecode == cmapper.TypeCode)
return cmapper.Find(key);
if (typecode == fmapper.TypeCode)
return fmapper.Find(key);
throw new Exception("unknown type");
}
}
protected static String TABLENAME = "Players";
The update method appears on the mapper superclass.
class Mapper...
public virtual void Update (DomainObject arg) {
Save (arg);
}
It’s implemented through a series of save methods, one for each class in the hierarchy.
class FootballerMapper...
protected override void Save(DomainObject obj) {
base.Save(obj);
DataRow row = FindRow (obj.Id, tableFor(TABLENAME));
Footballer footballer = (Footballer) obj;
row["club"] = footballer.club;
}
class AbstractPlayerMapper...
protected override void Save(DomainObject obj) {
DataRow row = FindRow (obj.Id, tableFor(TABLENAME));
Player player = (Player) obj;
row["name"] = player.name;
row["type"] = TypeCode;
}
The player mapper’s update method overrides the general method to forward to the correct concrete mapper.
class PlayerMapper...
public override void Update (DomainObject obj) {
MapperFor(obj).Update(obj);
}
private Mapper MapperFor(DomainObject obj) {
if (obj is Footballer)
return fmapper;
if (obj is Bowler)
return bmapper;
if (obj is Cricketer)
return cmapper;
throw new Exception("No mapper available");
}
The method for inserting an object is declared on the mapper superclass. It has two stages: creating new database rows and then using the save methods to update these blank rows with the necessary data.
class Mapper...
public virtual long Insert (DomainObject obj) {
obj.Id = GetNextID();
AddRow(obj);
Save(obj);
return obj.Id;
}
Each class inserts a row into its table.
class FootballerMapper...
protected override void AddRow (DomainObject obj) {
base.AddRow(obj);
InsertRow (obj, tableFor(TABLENAME));
}
class AbstractPlayerMapper...
protected override void AddRow (DomainObject obj) {
InsertRow (obj, tableFor(TABLENAME));
}
class Mapper...
abstract protected void AddRow (DomainObject obj);
protected virtual void InsertRow (DomainObject arg, DataTable table) {
DataRow row = table.NewRow();
row["id"] = arg.Id;
table.Rows.Add(row);
}
The player mapper delegates to the appropriate concrete mapper.
class PlayerMapper...
public override long Insert (DomainObject obj) {
return MapperFor(obj).Insert(obj);
}
To delete an object, each class deletes a row from the corresponding table in the database.
class FootballerMapper...
public override void Delete(DomainObject obj) {
base.Delete(obj);
DataRow row = FindRow(obj.Id, TABLENAME);
row.Delete();
}
class AbstractPlayerMapper...
public override void Delete(DomainObject obj) {
DataRow row = FindRow(obj.Id, tableFor(TABLENAME));
row.Delete();
}
class Mapper...
public abstract void Delete(DomainObject obj);
The player mapper again wimps out of all the hard work and just delegates to the concrete mapper.
class PlayerMapper...
override public void Delete(DomainObject obj) {
MapperFor(obj).Delete(obj);
}
Represents an inheritance hierarchy of classes with one table per concrete class in the hierarchy.
As any object purist will tell you, relational databases don’t support inheritance—a fact that complicates object-relational mapping. Thinking of tables from an object instance point of view, a sensible route is to take each object in memory and map it to a single database row. This implies Concrete Table Inheritance, where there’s a table for each concrete class in the inheritance hierarchy.
I’ll confess to having had some difficulty naming this pattern. Most people think of it as leaf oriented, since you usually have one table per leaf class in a hierarchy. Following that logic, I could call this pattern leaf table inheritance, and the term “leaf” is often used for it. Strictly, however, a concrete class that isn’t a leaf usually gets a table as well, so I decided to go with the more correct, if less intuitive, term.
Concrete Table Inheritance uses one database table for each concrete class in the hierarchy. Each table contains columns for the concrete class and all its ancestors, so any fields in a superclass are duplicated across the tables of the subclasses. As with all of these inheritance schemes the basic behavior uses Inheritance Mappers (302).
You need to pay attention to the keys with this pattern. Punningly, the key thing is to ensure that keys are unique not just to a table but to all the tables from a hierarchy. A classic example of where you need this is if you have a collection of players and you’re using Identity Field (216) with table-wide keys. If keys can be duplicated between the tables that map the concrete classes, you’ll get multiple rows for a particular key value. Thus, you need a key allocation system that keeps track of key usage across tables; also, you can’t rely on the database’s primary key uniqueness mechanism.
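One way to arrange that, and one possible implementation of the GetNextID call used by the insert methods in the example code, is a single key table shared by the whole hierarchy. This is only a sketch: the Keys table, its nextID column, and keeping the counter in the cached data set are assumptions of mine, not part of the example’s schema.
class Mapper...
protected virtual long GetNextID() {
    // Every concrete mapper draws from the same counter, so a key value can
    // never appear in more than one of the concrete tables. (Assumed Keys table.)
    DataRow row = Gateway.Data.Tables["Keys"].Rows[0];
    long result = (long) row["nextID"];
    row["nextID"] = result + 1;
    return result;
}
In a real system you’d allocate from the database itself, or in batches, so that separate processes don’t hand out the same key.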
This becomes particularly awkward if you’re hooking up to databases used by other systems. In many of these cases you can’t guarantee key uniqueness across tables. In this situation you either avoid using superclass fields or use a compound key that involves a table identifier.
You can get around some of this by not having fields that are typed to the superclass, but obviously that compromises the object model. An alternative is to have accessors for the supertype in the interface but to use several private fields for each concrete type in the implementation. The interface then combines values from the private fields. If the public interface is a single value, it picks whichever of the private values isn’t null. If the public interface is a collection value, it replies with the union of the values from the implementation fields.
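As a sketch of that, imagine a hypothetical Team class that needs a captain and a roster of players; the class and field names here are illustrative, not part of the running example.
class Team...
// Private fields typed to the concrete classes, so nothing is keyed to the abstract Player.
private Footballer footballerCaptain;
private Cricketer cricketerCaptain;
public Player Captain {
    // Single-valued accessor: answer whichever private value isn't null.
    get {return (footballerCaptain != null) ? (Player) footballerCaptain : (Player) cricketerCaptain;}
}
private IList footballers = new ArrayList();
private IList cricketers = new ArrayList();
public IList Players {
    // Collection-valued accessor: answer the union of the private collections.
    get {
        ArrayList result = new ArrayList(footballers);
        result.AddRange(cricketers);
        return ArrayList.ReadOnly(result);
    }
}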
For compound keys you can use a special key object as your ID field for Identity Field (216). This key uses both the primary key of the table and the table name to determine uniqueness.
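A minimal sketch of such a key object follows; the class name is mine, and all that matters is that equality comes from the combination of table name and row key.
class DomainKey...
public readonly String TableName;
public readonly long Id;
public DomainKey(String tableName, long id) {
    TableName = tableName;
    Id = id;
}
// Two keys are equal only when both the table and the row key match.
public override bool Equals(object obj) {
    DomainKey other = obj as DomainKey;
    return other != null && other.TableName == TableName && other.Id == Id;
}
public override int GetHashCode() {
    return TableName.GetHashCode() ^ Id.GetHashCode();
}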
Related to this are problems with referential integrity in the database. Consider an object model like Figure 12.10. To implement referential integrity you need a link table that contains foreign key columns for the charity function and for the player. The problem is that there’s no table for the player, so you can’t put together a referential integrity constraint for the foreign key field that takes either footballers or cricketers. Your choice is to ignore referential integrity or use multiple link tables, one for each of the actual tables in the database. On top of this you have problems if you can’t guarantee key uniqueness.
Figure 12.10. A model that causes referential integrity problems for Concrete Table Inheritance.
If you’re searching for players with a select statement, you need to look at all tables to see which ones contain the appropriate value. This means using multiple queries or using an outer join, both of which are bad for performance. You don’t suffer the performance hit when you know the class you need, but you do have to use the concrete class to improve performance.
This pattern is sometimes referred to as leaf table inheritance. Some people prefer a variation with one table per leaf class instead of one per concrete class. If you don’t have any concrete superclasses in the hierarchy, the two work out to be the same thing; even if you do, the difference is pretty minor.
When figuring out how to map inheritance, Concrete Table Inheritance, Class Table Inheritance (285), and Single Table Inheritance (278) are the alternatives.
The strengths of Concrete Table Inheritance are:
• Each table is self-contained and has no irrelevant fields. As a result it makes good sense when used by other applications that aren’t using the objects.
• There are no joins to do when reading the data from the concrete mappers.
• Each table is accessed only when that class is accessed, which can spread the access load.
The weaknesses of Concrete Table Inheritance are:
• Primary keys can be difficult to handle.
• You can’t enforce database relationships to abstract classes.
• If the fields on the domain classes are pushed up or down the hierarchy, you have to alter the table definitions. You don’t have to do as much alteration as with Class Table Inheritance (285), but you can’t ignore this as you can with Single Table Inheritance (278).
• If a superclass field changes, you need to change each table that has this field because the superclass fields are duplicated across the tables.
• A find on the superclass forces you to check all the tables, which leads to multiple database accesses (or a weird join).
Remember that the trio of inheritance patterns can coexist in a single hierarchy. So you might use Concrete Table Inheritance for one or two subclasses and Single Table Inheritance (278) for the rest.
Here I’ll show you an implementation for the sketch. As with all inheritance examples in this chapter, I’m using the basic design of classes from Inheritance Mappers (302), shown in Figure 12.11.
Figure 12.11. The generic class diagram of Inheritance Mappers (302).
Each mapper is linked to the database table that’s the source of the data. In ADO.NET a data set holds the data table.
class Mapper...
public Gateway Gateway;
private IDictionary identityMap = new Hashtable();
public Mapper (Gateway gateway) {
this.Gateway = gateway;
}
private DataTable table {
get {return Gateway.Data.Tables[TableName];}
}
abstract public String TableName {get;}
The gateway class holds the data set within its data property. The data can be loaded up by supplying suitable queries.
class Gateway...
public DataSet Data = new DataSet();
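As a rough sketch of what those queries might look like, assuming SQL Server through System.Data.SqlClient, a connection string supplied from outside, and table names matching the concrete mappers, the gateway could fill one table per concrete class:
class Gateway...
public void LoadAll(String connectionString) {
    // One SELECT per concrete table; after this the mappers work entirely in memory.
    foreach (String name in new String[] {"Footballers", "Cricketers", "Bowlers"})
        new SqlDataAdapter("SELECT * FROM " + name, connectionString).Fill(Data, name);
}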
Each concrete mapper needs to define the name of the table that holds its data.
class CricketerMapper...
public override String TableName {
get {return "Cricketers";}
}
The player mapper has fields for each concrete mapper.
class PlayerMapper...
private BowlerMapper bmapper;
private CricketerMapper cmapper;
private FootballerMapper fmapper;
public PlayerMapper (Gateway gateway) : base (gateway) {
bmapper = new BowlerMapper(Gateway);
cmapper = new CricketerMapper(Gateway);
fmapper = new FootballerMapper(Gateway);
}
Each concrete mapper class has a find method that returns an object given a key value.
class CricketerMapper...
public Cricketer Find(long id) {
return (Cricketer) AbstractFind(id);
}
The abstract behavior on the superclass finds the right database row for the ID, creates a new domain object of the correct type, and uses the load method to load it up (I’ll describe the load in a moment).
class Mapper...
public DomainObject AbstractFind(long id) {
DataRow row = FindRow(id);
if (row == null) return null;
else {
DomainObject result = CreateDomainObject();
Load(result, row);
return result;
}
}
private DataRow FindRow(long id) {
String filter = String.Format("id = {0}", id);
DataRow[] results = table.Select(filter);
if (results.Length == 0) return null;
else return results[0];
}
protected abstract DomainObject CreateDomainObject();
class CricketerMapper...
protected override DomainObject CreateDomainObject(){
return new Cricketer();
}
The actual loading of data from the database is done by the load method, or rather by several load methods: one for the mapper class and one for each of its superclasses.
class CricketerMapper...
protected override void Load(DomainObject obj, DataRow row) {
base.Load(obj,row);
Cricketer cricketer = (Cricketer) obj;
cricketer.battingAverage = (double)row["battingAverage"];
}
class AbstractPlayerMapper...
protected override void Load(DomainObject obj, DataRow row) {
base.Load(obj, row);
Player player = (Player) obj;
player.name = (String)row["name"];
}
class Mapper...
protected virtual void Load(DomainObject obj, DataRow row) {
obj.Id = (int) row ["id"];
}
This is the logic for finding an object using a mapper for a concrete class. You can also use the mapper for the superclass, the player mapper, which needs to find an object in whichever table it happens to be living. Since all the data is already in memory in the data set, I can simply try each concrete mapper in turn:
class PlayerMapper...
public Player Find (long key) {
Player result;
result = fmapper.Find(key);
if (result != null) return result;
result = bmapper.Find(key);
if (result != null) return result;
result = cmapper.Find(key);
if (result != null) return result;
return null;
}
Remember, this is reasonable only because the data is already in memory. If you need to go to the database three times (or more for more subclasses) this will be slow. It may help to do a join across all the concrete tables, which will allow you to access the data in one database call. However, large joins are often slow in their own right, so you’ll need to do some benchmarks with your own application to find out what works and what doesn’t. Also, this will be an outer join, and besides being slower, its syntax is nonportable and often cryptic.
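If you do have to go to the database for each find, one way to get the lookup down to a single call is a UNION (rather than the outer join just mentioned) that asks all three tables at once which of them owns the key, and then lets the matching concrete mapper do the real load. This is only a sketch: the connection handling, the Footballers and Bowlers table names, and the assumption that the owning table’s rows are already in the data set are all mine.
class PlayerMapper...
public Player FindWithOneCall(long key, String connectionString) {
    String query = String.Format(
        "SELECT 'F' AS type FROM Footballers WHERE id = {0} " +
        "UNION ALL SELECT 'C' FROM Cricketers WHERE id = {0} " +
        "UNION ALL SELECT 'B' FROM Bowlers WHERE id = {0}", key);
    using (SqlConnection conn = new SqlConnection(connectionString)) {
        conn.Open();
        // One round trip tells us which concrete table, if any, holds the key.
        Object type = new SqlCommand(query, conn).ExecuteScalar();
        if (type == null) return null;
        if ("F".Equals(type)) return fmapper.Find(key);
        if ("C".Equals(type)) return cmapper.Find(key);
        return bmapper.Find(key);
    }
}
Whether this actually beats three separate finds depends on your database and driver, so the benchmarking advice above still applies.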
The update method can be defined on the mapper superclass.
class Mapper...
public virtual void Update (DomainObject arg) {
Save (arg, FindRow(arg.Id));
}
Similar to loading, we use a sequence of save methods for each mapper class.
class CricketerMapper...
protected override void Save(DomainObject obj, DataRow row) {
base.Save(obj, row);
Cricketer cricketer = (Cricketer) obj;
row["battingAverage"] = cricketer.battingAverage;
}
class AbstractPlayerMapper...
protected override void Save(DomainObject obj, DataRow row) {
Player player = (Player) obj;
row["name"] = player.name;
}
The player mapper needs to find the correct concrete mapper to use and then delegate the update call.
class PlayerMapper...
public override void Update (DomainObject obj) {
MapperFor(obj).Update(obj);
}
private Mapper MapperFor(DomainObject obj) {
if (obj is Footballer)
return fmapper;
if (obj is Bowler)
return bmapper;
if (obj is Cricketer)
return cmapper;
throw new Exception("No mapper available");
}
Insertion is a variation on updating. The extra behavior is creating the new row, which can be done on the superclass.
class Mapper...
public virtual long Insert (DomainObject arg) {
DataRow row = table.NewRow();
arg.Id = GetNextID();
row["id"] = arg.Id;
Save (arg, row);
table.Rows.Add(row);
return arg.Id;
}
Again, the player class delegates to the appropriate mapper.
class PlayerMapper...
public override long Insert (DomainObject obj) {
return MapperFor(obj).Insert(obj);
}
Deletion is very straightforward. As before, we have a method defined on the superclass:
class Mapper...
public virtual void Delete(DomainObject obj) {
DataRow row = FindRow(obj.Id);
row.Delete();
}
and a delegating method on the player mapper.
class PlayerMapper...
public override void Delete (DomainObject obj) {
MapperFor(obj).Delete(obj);
}
A structure to organize database mappers that handle inheritance hierarchies.
When you map from an object-oriented inheritance hierarchy in memory to a relational database, you have to minimize the amount of code needed to save and load the data to and from the database. You also want to provide both abstract and concrete mapping behavior that allows you to save or load a superclass or a subclass.
Although the details of this behavior vary with your inheritance mapping scheme (Single Table Inheritance (278), Class Table Inheritance (285), and Concrete Table Inheritance (293)) the general structure works the same for all of them.
You can organize the mappers with a hierarchy so that each domain class has a mapper that saves and loads the data for that domain class. This way you have one point where you can change the mapping. This approach works well for concrete mappers that know how to map the concrete objects in the hierarchy. There are times, however, when you also need mappers for the abstract classes. These can be implemented with mappers that are actually outside the basic hierarchy but delegate to the appropriate concrete mappers.
To best explain how this works, I’ll start with the concrete mappers. In the sketch the concrete mappers are the mappers for footballer, cricketer, and bowler. Their basic behavior includes the find, insert, update, and delete operations.
The find methods are declared on the concrete subclasses because they will return a concrete class. Thus, the find method on BowlerMapper should return a bowler, not an abstract class. Most statically typed OO languages don’t let you change the declared return type of an overriding method, so it’s not possible to inherit the find operation and still declare a specific return type. You can, of course, return an abstract type, but that forces the user of the class to downcast, which is best avoided. (A language with dynamic typing doesn’t have this problem.)
The basic behavior of the find method is to find the appropriate row in the database, instantiate an object of the correct type (a decision made by the subclass), and then load the object with data from the database. The load method is implemented by each mapper in the hierarchy; each one loads the data for its corresponding domain class. This means that the bowler mapper’s load method loads the data specific to the bowler class and calls the superclass method to load the data specific to the cricketer, which calls its superclass method, and so on.
The insert and update methods operate in a similar way using a save method. Here you can define the interface on the superclass—indeed, on a Layer Supertype (475). The insert method creates a new row and then saves the data from the domain object using the save hook methods. The update method just saves the data, also using the save hook methods. These methods operate similarly to the load hook methods, with each class storing its specific data and calling the superclass save method.
This scheme makes it easy to write the appropriate mappers to save the information needed for a particular part of the hierarchy. The next step is to support loading and saving an abstract class—in this example, a player. While a first thought is to put appropriate methods on the superclass mapper, that actually gets awkward. While concrete mapper classes can just use the abstract mapper’s insert and update methods, the player mapper’s insert and update need to override these to call a concrete mapper instead. The result is one of those combinations of generalization and composition that twist your brain cells into a knot.
I prefer to separate the mappers into two classes. The abstract player mapper is responsible for loading and saving the specific player data to the database. This is an abstract class whose behavior is used only by the concrete mapper objects. A separate player mapper class provides the interface for operations at the player level. The player mapper provides a find method and overrides the insert and update methods. For all of these its responsibility is to figure out which concrete mapper should handle the task and to delegate to it.
Although a broad scheme like this makes sense for each type of inheritance mapping, the details do vary. Therefore, it’s not possible to show a code example for this case. You can find good examples in each of the inheritance mapping pattern sections: Single Table Inheritance (278), Class Table Inheritance (285), and Concrete Table Inheritance (293).
This general scheme makes sense for any inheritance-based database mapping. The alternatives involve such things as duplicating superclass mapping code among the concrete mappers and folding the player’s interface into the abstract player mapper class. The former is a heinous crime, and the latter is possible but leads to a player mapper class that’s messy and confusing. On the whole, then, it’s hard to think of a good alternative to this pattern.