Automating Common Routines

As an object-relational DBMS, PostgreSQL has helped pioneer several non-standard SQL extensions. Several of these are designed to aid in the automation of commonly executed database routines.

This section covers two such extensions: sequences and triggers.

A sequence in PostgreSQL is a database object that is essentially an automatically incrementing numeric value. For this reason, sequences are commonly known in other database products as auto-increment values. Sequences can be extremely useful in assigning non-random, unique identification numbers to tables that require such values. A sequence consists of a current numeric value, and a set of characteristics that determine how to automatically increment (or alternatively, decrement) that value upon use.

Along with its current value, a sequence also includes a minimum value, a maximum value, a starting value, and the amount to increment the sequence by. This increment is usually 1, but may be any whole integer.

In practice, sequences are not meant to be accessed directly. Instead, they are used through a set of functions built into PostgreSQL which either set, increment, or return the current value of the sequence.

Sequences are created with the CREATE SEQUENCE SQL command. The sequence can be specified to increment or decrement. The syntax for CREATE SEQUENCE is:

CREATE SEQUENCE sequencename
       [ INCREMENT increment ]
       [ MINVALUE minvalue ]
       [ MAXVALUE maxvalue ]
       [ START start ]
       [ CACHE cache ]
       [ CYCLE ]

In this syntax, sequencename is the name of the sequence to be created. This is the only required parameter. A sequence uses the integer data type, and it therefore shares its maximum and minimum limitations of 2147483647 and −2147483647, respectively.

The optional CREATE SEQUENCE clauses are as follows:

INCREMENT increment_val

Sets the numeric quantity with which to modify the sequence’s value to increment_val. This is used when the nextval( ) function is called on the sequence. Setting increment_val to a negative number results in a descending sequence. The default value is 1.

MINVALUE minvalue

Sets the fixed minimum value for the sequence to minvalue. Any attempt to lower a sequence below this value will result in an error, or in the value cycling to its maximum value (if the CYCLE keyword was used when the sequence was created).

The default value is 1 for ascending sequences, and –2147483647 for descending sequences.

MAXVALUE maxvalue

Sets the fixed maximum value for the sequence to maxvalue. Any attempt to raise a sequence above this value will result in an error, or in the value cycling to its minimum value.

The default value is 2147483647 for ascending sequences, and –1 for descending sequences.

START start_val

Sets the value that the sequence begins at. It may be any integer between the minimum and maximum values. The sequence defaults to start at its minimum value for ascending sequences, and its maximum value for descending sequences.

CACHE cache

Provides the ability for sequence values to be pre-allocated and stored in memory. This can result in faster access times to highly used sequences. The minimum and default value is 1; a higher value of cache results in more values being cached.

CYCLE

Enables the sequence to continue generating new values after it has reached its maximum or minimum value. When the limit is reached, the sequence starts over at the minimum value (for ascending sequences), or at the maximum value (descending sequences).

Example 7-28 creates a simple ascending sequence named shipments_ship_id_seq that starts at a value of 0, and will be incremented by the default increment of 1 until it reaches the default maximum limit of 2147483647. By not using the CYCLE keyword, the sequence is guaranteed to always return a unique value.

The output from the \d command within psql shows whether or not a database object is a sequence, table, view or index. More specifically, the \ds command can be used to view all sequences in the currently connected database. For example:

booktown=# \ds
             List of relations
          Name         |   Type   |  Owner
-----------------------+----------+---------
 book_ids              | sequence | manager
 shipments_ship_id_seq | sequence | manager
 subject_ids           | sequence | manager
(3 rows)

While not often necessary, sequences can be directly queried with SELECT statements, as if they were a table or view. When you query a sequence, you use the attributes of that sequence as columns in your select list. The attributes of a sequence are shown in Table 7-1.

Example 7-29 illustrates a query to the shipments_ship_id_seq sequence. This query selects the last_value attribute, which is the most currently selected value from the sequence, and the increment_by attribute, which is the amount the sequence is to be incremented each time the nextval( ) function is called.

Since the sequence in question has just been created, its last_value is still set to 0.

Sequences are typically not queried directly, but are instead used through functions. There are three functions in PostgreSQL which apply exclusively to sequences:

The most commonly used sequence function is nextval( ). This is the function that actually pushes the increment of the value. It requires the name of the sequence as the argument (bound by single quotes), and returns a value of type integer.

Example 7-30 selects a couple of incremented values from the sequence named shipments_ship_id_seq.

Sequences are commonly used as default values for tables which require unique integer identifiers. The shipments table within the booktown database, shown in Table 7-2, exemplifies this.

The syntax to create the table in Table 7-2, with the auto-incrementing DEFAULT and PRIMARY KEY constraint, is:

CREATE TABLE shipments
       (id integer DEFAULT nextval('shipments_ship_id_seq') PRIMARY KEY,
        customer_id integer, isbn text, ship_date timestamp)

The default value for the id column in Table 7-2 is set to the nextval( )’s result on the shipments_ship_id_seq sequence. Insertion of row data that does not specify a value for id will therefore choose its value from the result of this function call.

After the nextval( ) function has been called on a sequence in a given session (a connection to PostgreSQL), the currval( ) function may be used on that same sequence to return the most recently returned value from the sequence. Note that this function may only be called on a sequence that has been called through nextval( ) in the active session.

Example 7-31 inserts a new row into the shipments column, without specifying the value for the id column. This causes the default value to be used, which (as noted in Table 7-2) is the result of the shipments_ship_id_seq being incremented by the nextval( ) function. The currval( ) function is then used to access the row that was just inserted.

Finally, a sequence may also have its last_value attribute reset to an arbitrary numeric value (within its maximum and minimum value range) by using the setval( ) function. This requires the name of the sequence as a single-quote bound character string for the first argument and an integer constant representing the new value for last_value for the second argument.

There are two ways to go about this. By default, setval( ) assumes that the new setting is for an initialized sequence; this means that the next value returned by nextval( ) will actually be incremented once past the value set by setval( ).

Alternatively, an optional false value of type boolean may be added as the last argument to setval( ), de-initializing the sequence. This modifies the sequence so that the next value returned by nextval( ) will be the same numeric value passed to setval( ) (though the sequence will of course be incremented on the next call to nextval( )).

Example 7-32 sets the shipments_ship_id_seq’s last_value to 1010 through each method, and selects the nextval( ) on the same sequence to illustrate the effective result.

Often, anticipated SQL events should precede or follow a particular action. This action might be a consistency check on a set of values to be inserted, the formatting of supplied data before it is inserted, or a modification to a separate table following the removal or modification of a set of rows. Traditionally, such actions are handled at the programmatic level within an application connected to the database, rather than by the database software itself.

To ease the responsibility of the application’s database interaction, PostgreSQL supports a non-standard programmatic extension known as a trigger. A trigger defines a function which occurs before, or after, another action on a table. A trigger is implemented through C, Pl/pgSQL or any other functional language (with the exception of SQL) that PostgreSQL can use to define a function (see the section titled Extending PostgreSQL later in this chapter for more on creating functions, or Chapter 11 for more on PL/pgSQL).

Triggers may affect any of the following SQL events on a table:

In order to create a trigger, a function must first exist for it to execute. PostgreSQL supports many types of functions, including those defined by SQL, PL/pgSQL, and C. As of PostgreSQL 7.1.x, a trigger may use a function defined in any language, with the exception that the function cannot be defined as a purely SQL function.

Once a function is defined, a trigger may be defined to call that function either before or after an event on a specified table. Here is the syntax to create a trigger, followed by a description of its syntax:

CREATE TRIGGER name { BEFORE | AFTER } { event [ OR event ... ] }
               ON tablename
               FOR EACH { ROW | STATEMENT }
               EXECUTE PROCEDURE functionname ( arguments )
CREATE TRIGGER name

name is any arbitrary name for the new trigger. A trigger may have the same name as an existing trigger in a database provided that it is defined to operate on a different table. Also, like most other non-system database objects, triggers must only have a unique name (and table to operate on) within the database they are created in.

{ BEFORE | AFTER }

The BEFORE keyword instructs the defined function to be executed before the event is attempted, which also precedes any built-in constraint checking on the values involved in the case of an INSERT or DELETE event. Alternatively, the AFTER keyword causes the function to be called only after the attempted action has finished.

{ event [ OR event ... ] }

event is any one of the supported SQL events; multiple events may be listed, separated by the OR keyword.

ON tablename

tablename is the name of the table which, when modified by event, initiates this trigger.

FOR EACH { ROW | STATEMENT }

The keyword following the FOR EACH clause determines how many times the function should be called when the defined event is triggered. Use the ROW keyword to specify that the function is to be executed once for each affected row. Conversely, if the function should be executed only once for the calling statement, the STATEMENT keyword is used.

EXECUTE PROCEDURE functionname ( arguments )

functionname is the name of the existing function to be executed, with passed arguments.

While PostgreSQL tables support constraints to perform simple checks against static criteria, sometimes more involved procedures may be needed to validate input values. This is a typical example of where a trigger might be useful.

A trigger may be used to validate input values by preparing a validation function to be executed before values are inserted into a table, or before values in a table are updated. The function can then be made responsible for verifying that the values meet a complex set of restrictions, and even return an appropriate error through PostgreSQL’s error logging system.

Suppose that you have written a function in a procedural language that validates attempted INSERT or UPDATE values on the shipments table, and that then performs an update on the stock table to decrement the inventory for the shipment. This function could be written in any language that PostgreSQL supports (with the noted exception of pure SQL).

Specifically, suppose that this function verifies that both the provided customer_id and isbn exist in their respective customers and editions tables. If at least one is missing, a meaningful error is returned. Otherwise, the SQL statement is allowed to execute, and on a successful INSERT statement, the stock table is automatically decremented to reflect the drop in stock from the shipment.

Example 7-35 creates a trigger to be “fired” immediately before an INSERT or UPDATE statement is processed on the shipments table. The trigger invokes the check_shipment_addition( ) function once per each modified row.

Since the check_shipment trigger is configured to execute the check_shipment_addition( ) function for both INSERT and UPDATE statements, the integrity of the customer_id and isbn columns are fairly robustly maintained. Its use of the ROW keyword ensures that each added or modified row will be processed by the check_shipment_addition( ) validation function.

No arguments are passed to the check_shipment_addition( ) function, as it uses internal PL/pgSQL variables to check incoming rows. See Example 7-53, in Chapter 11, for the implementation of the check_shipment_addition( ) function, written in PL/pgSQL.

Triggers are stored in the pg_trigger PostgreSQL system table, and can have their characteristics queried after creation. The structure of the pg_trigger table is shown in Table 7-3.

Most of the columns in the Table 7-3 column are unlikely to be useful in a direct query. The most immediately relevant attributes of the pg_trigger system table are tgrelid and tgname.

The tgrelid value is the trigger’s relation identifier number. This value is of type oid, and corresponds to the relfilenode column in the pg_class system table. The tgname is the identifier which represents the name of the trigger, as specified in the CREATE TRIGGER command when the trigger was created.

The DROP TRIGGER command removes a trigger permanently from the database. Similar to the CREATE TRIGGER command, using this command requires you to be either the owner of the trigger, or a superuser.

Here is the syntax to remove an existing trigger:

DROP TRIGGER name ON table

Example 7-36 drops the check_shipment trigger placed on the shipments table.

The DROP statement indicates that the trigger was successfully dropped. Notice that you must specify not only the name of the trigger that you wish to remove, but also the table on which it is placed.

If you are unsure which table a particular trigger is placed on, you can derive this information from PostgreSQL’s system tables. For example, you can perform a join between the pg_trigger system table’s tgrelid column and the pg_class system table’s relfilenode column, comparing the name of the trigger against the tgname column. Example 7-37 demonstrates such a query to check the assigned relation (relname) associated with the trigger named check_shipment.