Chapter 7

Handling Temporal Data

IN THIS CHAPTER

Bullet Defining times and periods

Bullet Tracking what happened at specific times

Bullet Providing an audit trail of changes to the data

Bullet Handling both what happened and when the event was recorded

Bullet Formatting and parsing dates and times

Before SQL:2011, ISO/IEC standard SQL had no mechanism for dealing with data that was valid at one point in time but not valid at another. Any application that requires that an audit trail be kept needs that capability. This means that the burden of keeping track of what was true at a given time falls on the application programmer rather than the database. This sounds like a recipe for complicated, over-budget, late, and bug-infested applications.

Syntax was added to SQL:2011 that enables handling of temporal data without messing up the way code for non-temporal data is handled. This is a big advantage for anyone who wants to add temporal capability to an existing SQL database. Even though SQL:2011 standardized how to handle temporal data, it did not specify how temporal data templates should be formatted or parsed. That deficiency was finally remedied with SQL:2016.

What do I mean by the term temporal data? The ISO/IEC SQL:2011 standard does not use that term at all, but it is commonly used in the database community. In SQL:2011, temporal data is any data with one or more associated time periods during which that data is deemed to be effective or valid along some time dimension. In plain English, that means that with temporal data capability, you can determine when a predicate is true.

In this chapter, I introduce the concept of a period of time, defining it in a very specific way. We will look at various kinds of time and the effect that temporal data has on the definition of primary keys and referential integrity constraints. I then discuss the way that very complex data can be stored and operated on in bitemporal tables. Finally, I give the standard template placeholder arguments for date and time data.

Understanding Times and Periods

Although versions of the SQL standard prior to SQL:2011 provided for DATE, TIME, TIMESTAMP, and INTERVAL data types, they did not address the idea of a time period with a definite start time and a definite end time. One way of addressing this need is to define a new PERIOD data type. SQL:2011 however, did not do this. To introduce a new data type into SQL at this late stage in its development would wreak havoc with the ecosystem that has built up around SQL. Major surgery to virtually all existing database products would be required to add a new data type.

Instead of adding a PERIOD data type, SQL:2011 solved the problem by adding period definitions as metadata to tables. A period definition is a named table component, identifying a pair of columns that capture the period start and the period end time. The CREATE TABLE and ALTER TABLE statements used to create and modify tables were updated with new syntax to create or destroy the periods created by the period definitions.

A PERIOD is determined by two columns: a start column and an end column. These columns are conventional, just like the columns of the existing date data types, each with its own unique name. As mentioned previously, a period definition is a named table component. It occupies the same name space as column names, so it must not duplicate any existing column name.

SQL follows a closed-open model for periods, meaning that a period includes the start time but not the end time. For any table row, a period end time must be greater than its start time. This is a constraint that is enforced by the DBMS.

Remember There are two dimensions of time that are important when dealing with temporal data:

  • Valid time is the time period during which a row in a table correctly reflects reality.
  • Transaction time is the time period during which a row is committed to or recorded in a database.

The valid time and the transaction time for a row in a table need not be the same. For example, in a business database that records the period during which a contract is in effect, the information about the contract may be (and probably is) inserted before the contract start time.

As specified in SQL:2011, separate tables may be created and maintained to accommodate the two different kinds of time, or a single, bitemporal table (discussed later in this chapter) may serve the purpose. Transaction time information is kept in system-versioned tables, which contain the system-time period, denoted by the keyword SYSTEM_TIME. Valid time information, on the other hand, is maintained in tables that contain an application-time period. You can give an application-time period any name you want, provided the name is not already used for something else. You're allowed to define at most one system-time period and one application-time period.

Although temporal data support in SQL was introduced for the first time in SQL:2011, people have had to deal with temporal data long before the temporal constructs of SQL:2011 were included in any database products. This was typically done by defining two table columns, one for the start datetime and the other for the end datetime. The fact that SQL:2011 does not define a new PERIOD data type, but rather uses period definitions as metadata, means that existing tables with such start and end columns can easily be upgraded to incorporate the new capability. The logic for providing period information can be removed from existing application programs, simplifying them, speeding them up, and making them more reliable.

Working with Application-Time Period Tables

Consider an example using application-period time tables. Suppose a business wants to keep track of what department its employees belong to at any time throughout their period of employment. The business can do this by creating application-time period tables for employees and departments, like this:

CREATE TABLE employee_atpt(

EmpID INTEGER,

EmpStart DATE,

EmpEnd DATE,

EmpDept VARCHAR(30),

PERIOD FOR EmpPeriod (EmpStart, EmpEnd)

);

The starting datetime (EmpStart in the example) is included in the period, but the ending datetime (EmpEnd in the example) is not. This is what closed-open semantics means.

Technicalstuff I haven't specified a primary key yet, because that is a little more involved when you’re dealing with temporal data. I deal with that later in this chapter.

For now, put some data into this table and see what it looks like:

INSERT INTO employee_atpt

VALUES (12345, DATE '2018-01-01', DATE '9999-12-31', 'Sales');

The resulting table has one row, as shown in Table 7-1.

TABLE 7-1 The Application-Period Time Table Contains One Row

EmpID

EmpStart

EmpEnd

EmpDept

12345

2018-01-01

9999-12-31

Sales

The end date of 9999-12-31 indicates that this employee’s tenure with the company has not ended yet. For simplicity, I have left off the hours, minutes, seconds, and fractional seconds in this and subsequent examples.

Now suppose that on March 15, 2018, employee 12345 is temporarily assigned to the Engineering department until July 15, 2018, returning to the Sales department thereafter. You can accomplish this with the following UPDATE statement:

UPDATE employee_atpt

FOR PORTION OF EmpPeriod

FROM DATE '2018-03-15'

TO DATE '2018-07-15'

SET EmpDept = 'Engineering'

WHERE EmpID = 12345;

After the update, the table now has three rows, as shown in Table 7-2.

TABLE 7-2 Application-Time Period Table after an Update

EmpID

EmpStart

EmpEnd

EmpDept

12345

2018-01-01

2018-03-15

Sales

12345

2018-03-15

2018-07-15

Engineering

12345

2018-07-15

9999-12-31

Sales

Assuming employee 12345 is still employed in the Sales department, the table accurately records her department membership from New Year’s Day of 2018 up to the present time.

If you can insert new data into a table and update existing data in the table, you’d better be able to delete data from the table, too. However, deleting data from an application-time period table can be a little more complicated than merely deleting rows from an ordinary, non-temporal table. As an example, suppose that employee 12345, instead of being transferred to the Engineering department on March 15 of 2018, leaves the company on that date and is rehired on July 15 of the same year. Initially, the application-time period table will have one row, as shown in Table 7-3.

TABLE 7-3 Application-Time Period Table before Update or Deletion

EmpID

EmpStart

EmpEnd

EmpDept

12345

2018-01-01

9999-12-31

Sales

A DELETE statement will update the table to show the period during which employee 12345 was gone:

DELETE employee_atpt

FOR PORTION OF EmpPeriod

FROM DATE '2018-03-15'

TO DATE '2018-07-15'

WHERE EmpID = 12345;

The resulting table will be like Table 7-4.

TABLE 7-4 Application-Time Period Table after Deletion

EmpID

EmpStart

EmpEnd

EmpDept

12345

2018-01-01

2018-03-15

Sales

12345

2018-07-15

9999-12-31

Sales

The table now reflects the time periods during which employee 12345 was employed by the company and shows the gap during which she was not employed by the company.

You may have noticed something puzzling about the tables shown in this section. In an ordinary, non-temporal table listing an organization's employees, the employee ID number is sufficient to serve as the table’s primary key because it uniquely identifies each employee. However, an application-time period table of employees may contain multiple rows for a single employee. The employee ID number, by itself, is no longer usable as the table’s primary key. The temporal data must be added to the mix.

Designating primary keys in application-time period tables

In Tables 7-2 and 7-4, it is clear that the employee ID (EmpID) does not guarantee uniqueness. There are multiple rows with the same EmpID. To guarantee that there is no duplication of rows, the start date (EmpStart) and end date (EmpEnd) must be included in the primary key. However, just including them is not sufficient. Consider Table 7-5, showing the case where employee 12345 was merely transferred to Engineering for a few months, and then returned to her home department.

TABLE 7-5 A Situation You May Not Want to Occur

EmpID

EmpStart

EmpEnd

EmpDept

12345

2018-01-01

9999-12-31

Sales

12345

2018-03-15

2018-07-15

Engineering

The two rows of the table are guaranteed to be unique by inclusion of EmpStart and EmpEnd in the primary key, but notice that the two time periods overlap. It looks like employee 12345 is a member of both the Sales department and the Engineering department from March 15, 2018 until July 15, 2018. In some organizations, this may be possible, but it adds complication and could lead to data corruption. Enforcing a constraint that says that an employee can be a member of only one department at a time is perhaps what most organizations would want to do. You can add such a constraint to a table with an ALTER TABLE statement such as the following:

ALTER TABLE employee_atpt

ADD PRIMARY KEY (EmpID, EmpPeriod WITHOUT OVERLAPS);

There's a better way to do things than creating a table first and adding its primary key constraint later — instead, you can include the primary key constraint in the original CREATE statement. It might look like the following:

CREATE TABLE employee_atpt (

EmpID INTEGER NOT NULL,

EmpStart DATE NOT NULL,

EmpEnd DATE NOT NULL,

EmpDept VARCHAR(30),

PERIOD FOR EmpPeriod (EmpStart, EmpEnd)

PRIMARY KEY (EmpID, EmpPeriod WITHOUT OVERLAPS)

);

Now overlapping rows are prohibited. While I was at it, I added NOT NULL constraints to all the elements of the primary key. A null value in any of those fields would be a source of errors in the future. Normally, the DBMS will take care of this, but why take chances?

Applying referential integrity constraints to application-time period tables

Any database that is meant to maintain more than a simple list of items will probably require multiple tables. If a database has multiple tables, the relationships between the tables must be defined, and referential integrity constraints must be put into place.

In the example in this chapter, you have an employee application-time period table and a department application-time period table. There is a one-to-many relationship between the department table and the employee table, because a department may have multiple employees, but each employee belongs to one and only one department. This means that you need to put a foreign key into the employee table that references the primary key of the department table. With this in mind, create the employee table again, this time using a more complete CREATE statement, and create a department table in a similar manner:

CREATE TABLE employee_atpt (

EmpID INTEGER NOT NULL,

EmpStart DATE NOT NULL,

EmpEnd DATE NOT NULL,

EmpName VARACHAR (30),

EmpDept VARCHAR (30),

PERIOD FOR EmpPeriod (EmpStart, EmpEnd)

PRIMARY KEY (EmpID, EmpPeriod WITHOUT OVERLAPS)

FOREIGN KEY (EmpDept, PERIOD EmpPeriod)

REFERENCES dept_atpt (DeptID, PERIOD DeptPeriod)

);

CREATE TABLE dept_atpt (

DeptID VARCHAR (30) NOT NULL,

Manager VARCHAR (40) NOT NULL,

DeptStart DATE NOT NULL,

DeptEnd DATE NOT NULL,

PERIOD FOR DeptTime (DeptStart, DeptEnd),

PRIMARY KEY (DeptID, DeptTime WITHOUT OVERLAPS)

);

Querying application-time period tables

Now, detailed information can be retrieved from the database by using SELECT statements that make use of the temporal data.

One thing you might want to do is to list all the people who are currently employed by the organization. Even before SQL:2011, you could do it with a statement similar to the following:

SELECT *

FROM employee_atpt

WHERE EmpStart <= CURRENT_DATE()

AND EmpEnd > CURRENT_DATE();

With the new PERIOD syntax, you can get the same result a little more easily, like this:

SELECT *

FROM employee_atpt

WHERE EmpPERIOD CONTAINS CURRENT_DATE();

You can also retrieve employees who were employed during a specific period of time, like so:

SELECT *

FROM employee_atpt

WHERE EmpPeriod OVERLAPS

PERIOD (DATE ('2018-01-01'), DATE ('2018-09-16'));

Other predicates besides CONTAINS and OVERLAPS that you can use in this context include EQUALS, PRECEDES, SUCCEEDS, IMMEDIATELY PRECEDES, and IMMEDIATELY SUCCEEDS.

These predicates operate as follows:

  • If one period EQUALS another, they are exactly the same.
  • If one period PRECEDES another, it comes somewhere before it.
  • If one period SUCCEEDS another, it comes somewhere after it.
  • If one period IMMEDIATELY PRECEDES another, it comes just before and is contiguous with it.
  • If one period IMMEDIATELY SUCCEEDS another, it comes just after and is contiguous with it.

Working with System-Versioned Tables

System-versioned tables have a different purpose than application-time period tables, and consequently work differently. Application-time period tables enable you to define periods of time and operate on the data that falls within those periods. In contrast, system-versioned tables are designed to create an auditable record of exactly when a data item was added to, changed within, or deleted from a database. For example, it is important for a bank to know exactly when a deposit or withdrawal was made, and this information must be kept for a period of time designated by law. Similarly, stock brokers need to track exactly when a purchase transaction was made. There are a number of similar cases, where knowing when a particular event occurred, down to a fraction of a second, is important.

Applications such as the bank application or the stock broker application have strict requirements:

  • Any update or delete operation must preserve the original state of the row before performing the update or delete operation.
  • The system, rather than the user, maintains the start and end times of the periods of the rows.

    Original rows that have been subjected to an update or delete operation remain in the table and are henceforward referred to as historical rows. Users are prevented from modifying the contents of historical rows or the periods associated with any of the rows. Only the system, not the user, may update the periods of rows in a system-versioned table. This is done by updating the non-period columns of the table or as a result of row deletions.

    These constraints guarantee that the history of data changes is immune to tampering, thus meeting audit standards and complying with government regulations.

System-versioned tables are distinguished from application-time period tables by a couple of differences in the CREATE statements that create them:

  • Whereas in an application-time period table the user can give any name to the period, in a system-versioned table, the period name must be SYSTEM_TIME.
  • The CREATE statement must include the keywords WITH SYSTEM VERSIONING. Although SQL:2011 allowed the data type for the period start and period end to be either DATE type or one of the timestamp types, you will almost always want to use one of the timestamp types, which give you a level of precision much finer than a day. Of course, whatever type you choose for the start column must also be used for the end column.

To illustrate the use of system-versioned tables, I continue to use employee and department examples. You can create a system-versioned table with the following code:

CREATE TABLE employee_sys (

EmpID INTEGER,

Sys_Start TIMESTAMP(12) GENERATED ALWAYS AS ROW START,

Sys_End TIMESTAMP(12) GENERATED ALWAYS AS ROW END,

EmpName VARCHAR(30),

PERIOD FOR SYSTEM_TIME (SysStart, SysEnd)

) WITH SYSTEM VERSIONING;

A row in a system-versioned table is considered to be a current system row if the current time is contained in the system-time period. Otherwise it is considered to be a historical system row.

System-versioned tables are similar to application-time period tables in many respects, but there are also differences. Here are a few:

  • Users may not assign or change the values in the Sys_Start and Sys_End columns. These values are assigned and changed automatically by the DBMS. This situation is mandated by the keywords GENERATED ALWAYS.
  • When you use the INSERT operation to add something into a system-versioned table, the value in the Sys_Start column is automatically set to the transaction timestamp, which is associated with every transaction. The value assigned to the Sys_End column is the highest value of that column's data type.
  • In system-versioned tables, the UPDATE and DELETE operations operate only on current system rows. Users may not update or delete historical system rows.
  • Users may not modify the system-time period start or end time of either current or historical system rows.
  • Whenever you either use the UPDATE or DELETE operation on a current-system row, a historical system row is automatically inserted.

    An UPDATE statement on a system-versioned table first inserts a copy of the old row, with its system end time set to the transaction timestamp. This indicates that the row ceased to be current at that timestamp. Next, the DBMS performs the update, simultaneously changing the system-period start time to the transaction timestamp. Now the updated row is the current system row as of the transaction timestamp. UPDATE triggers for the rows in question will fire, but INSERT triggers will not fire even though historical rows are being inserted as a part of this operation. If you are wondering what triggers are, they are covered extensively in Chapter 23.

A DELETE operation on a system-versioned table doesn't actually delete the specified rows. Instead it changes the system-time period end time of those rows to the system timestamp. This indicates that those rows ceased to be current as of the transaction timestamp. Now those rows are part of the historical system rather than the current system. When you perform a DELETE operation, any DELETE triggers for the affected rows will fire.

Designating primary keys in system-versioned tables

Designating primary keys in system-versioned tables is a lot simpler than it is in application-time period tables. This is because you don't have to deal with time period issues. In system-versioned tables, the historical rows cannot be changed. Back when they were current rows, they were checked for uniqueness. Because they cannot be changed now, they don’t need to be checked for uniqueness now either.

If you add a primary key constraint to an existing system-versioned table with an ALTER statement, because it applies only to the current rows, you need not include period information in the statement. For example:

ALTER TABLE employee_sys

ADD PRIMARY KEY (EmpID);

That does the trick. Short and sweet.

Applying referential integrity constraints to system-versioned tables

Applying referential integrity constraints to system-versioned tables is also straightforward for the same reason. Here’s an example of that:

ALTER TABLE employee_sys

ADD FOREIGN KEY (EmpDept)

REFERENCES dept_sys (DeptID);

Because only current rows are affected, you don’t need to include the start and end of the period columns.

Querying system-versioned tables

Most queries of system-versioned tables are concerned with what was true at some point in time in the past or during some period of time in the past. To deal with these situations, SQL:2011 added some new syntax. To query a table for information about what was true at a specific point in time, the SYSTEM_TIME AS OF syntax is used. Suppose you want to know who was employed by the organization on July 15, 2017. You could find out with the following query:

SELECT EmpID, EmpName, Sys_Start, Sys_End

FROM employee_sys FOR SYSTEM_TIME AS OF

TIMESTAMP '2017-07-15 00:00:00';

This statement returns all rows whose start time is equal to or before the timestamp value and whose end time is later than the timestamp value.

To find what was true during a period of time, you can use a similar statement, with appropriate new syntax. Here's an example:

SELECT EmpID, EmpName, Sys_Start, Sys_End

FROM employee_sys FOR SYSTEM_TIME FROM

TIMESTAMP '2017-07-01 00:00:00' TO

TIMESTAMP '2017-08-01 00:00:00';

This retrieval will include all the rows starting at the first timestamp, up to but not including the second timestamp.

If a query on a system-versioned table does not include a timestamp specification, the default case is to return only the current system rows. This case would be coded similar to the following:

SELECT EmpID, EmpName, Sys_Start, Sys_End

FROM employee_sys;

If you want to retrieve all rows in a system-versioned table, both historical and current, you can do it with the following syntax:

SELECT EmpID, EmpName, Sys_Start, Sys_End

FROM employee_sys FOR SYSTEM_TIME FROM

TIMESTAMP '2017-07-01 00:00:00' TO

TIMESTAMP '9999-12-31 24:59:59';

Tracking Even More Time Data with Bitemporal Tables

Sometimes you want to know both when an event occurred in the real world and when that event was recorded in the database. For cases such as this, you may use a table that is both a system-versioned table and an application-time period table. Such tables are known as bitemporal tables.

There are a number of cases where a bitemporal table might be called for. Suppose, for example, that one of your employees moves her residence across the state line from Oregon to Washington. You must take account of the fact that her state income tax withholding must change as of the official date of the move. However, it is unlikely that the change to the database will be made on exactly that same day. Both times need to be recorded, and a bitemporal table can do that recording very well. The system-versioned time period records when the change became known to the database, and the application-time period records when the move legally went into effect. Here’s some example code to create such a table:

CREATE TABLE employee_bt (

EmpID INTEGER,

EmpStart DATE,

EmpEnd DATE,

EmpDept INTEGER,

PERIOD FOR EmpPeriod (EmpStart, EmpEnd),

Sys_Start TIMESTAMP (12) GENERATED ALWAYS AS ROW START,

Sys_End TIMESTAMP (12) GENERATED ALWAYS AS ROW END,

EmpName VARCHAR (30),

EmpStreet VARCHAR (40),

EmpCity VARCHAR (30),

EmpStateProv VARCHAR (2),

EmpPostalCode VARCHAR (10),

PERIOD FOR SYSTEM_TIME (Sys_Start, Sys_End),

PRIMARY KEY (EmpID, EPeriod WITHOUT OVERLAPS),

FOREIGN KEY (EDept, PERIOD EPeriod)

REFERENCES Dept (DeptID, PERIOD DPeriod)

) WITH SYSTEM VERSIONING;

Bitemporal tables serve the purposes of both system-versioned tables and application-time tables. The user supplies values for the application-time period start and end columns. An INSERT operation in such a table automatically sets the value of the system-time period to the transaction timestamp. The value of the system-time period end column is automatically set to the highest value permitted for that column’s data type.

UPDATE and DELETE operations work as they do for standard application-time period tables. As is true with system-versioned tables, UPDATE and DELETE operations affect only current rows, and with each such operation a historical row is automatically inserted.

A query made upon a bitemporal table can specify an application-time period, a system-versioned period, or both. Here's an example of the “both” case:

SELECT EmpID

FROM employee_bt FOR SYSTEM TIME AS OF

TIMESTAMP '2017-07-15 00:00:00'

WHERE EmpID = 314159 AND

EmpPeriod CONTAINS DATE '2017-06-20 00:00:00';

Formatting and Parsing Dates and Times

A language standard, such as the international SQL standard, should describe legal syntax for the language, and in addition, how to format language elements, such as dates and times. Amazingly, the SQL standard did not do this until SQL:2016. It’s possible to express dates and times in a variety of ways, and the standard format in the USA, for example, is different from the standard format in Europe.

SQL:2016 specifies how to represent units of time in a template, to show how actual dates and/or times should be represented. For example, a data of September 16, 2018 could be represented as ’09-16-2018’ in an SQL statement. A template that shows the expected format for a date could be of the form ‘MM-DD-YYYY’. This template tells the SQL programmer to express the data first as month, followed by day, and then by year. Alternatively, a template of ‘DD-MM-YYYY’ would tell the programmer to put the day first, followed by the month, and then the year. The MM, DD, and YYYY entries are placeholders that are to be replaced in an SQL statement by a specific month, day, and year.

In addition to MM, DD, and YYYY, there are several other placeholders that have specific meanings. Table 7-6 lists them, along with what they stand for.

TABLE 7-6 Template Placeholders and What They Stand For

Placeholder

Meaning

YYYY | YYY | YY | Y

Year

RRRR | RR

Rounded year

MM

Month

DD

Day of month

DDD

Day of year

HH | HH12

Hour, out of 12

HH24

Hour, out of 24

MI

Minute

SS

Second, out of minute

SSSS

Second, out of day

FF1 | FF2 | …. | FF9

Fraction of a second

A.M | P.M.

AM or PM

TZH

Time zone hour

TZM

Time zone minute

You can use format templates with these placeholders in the JSON path method datetime, as described in Chapter 19, and in a CAST expression, as described in Chapter 9.