Chapter 6. Advanced DBI

This chapter covers some of the more advanced topics of using DBI, including the ability to alter the way in which the database and statement handles function on-the-fly, as well as how to use explicit transaction handling within your database. These topics are not strictly necessary for basic DBI usage, but they contain useful information that will allows you to maximize the potential of your DBI programs.

Handle Attributes and Metadata

In addition to methods associated with database and statement handles, the DBI also defines attributes for these handles that allow the developer to examine or fine-tune the environment in which the handles operate. Some attributes are unique to either database or statement handles, and some are common to both.

The attribute values of a handle can be thought of as a hash of key/value pairs, and can be manipulated in the same way as you would manipulate an ordinary hash via a reference. Here are a few examples using the AutoCommit attribute:

### Set the database handle attribute "AutoCommit" to 1 (e.g., on)
$dbh->{AutoCommit} = 1;

### Fetch the current value of "AutoCommit" from the handle
$foo = $dbh->{AutoCommit};

Fetching attributes as hash values, rather than as method calls, has the added bonus that the hash lookup can be interpolated inside double-quoted strings:

### Print the current value of "AutoCommit" from the handle
print "AutoCommit: $dbh->{AutoCommit}\n";

With AutoCommit enabled, that would print:

AutoCommit: 1

as you might expect. Actually, since AutoCommit is a boolean attribute, it would print 1 after any value that Perl considers true had been assigned to it.

After a false value was assigned, you may reasonably expect a 0 to be printed, but you might be surprised to see:

AutoCommit:

That’s because Perl uses an internal representation of false that is both a numeric zero and an empty string at the same time. When used in a string context, the empty string is printed. In a numeric context, the zero is used.

When getting or setting an attribute value, the DBI automatically checks that the attribute name you are using and generates an error if it’s not known.^[53] Similarly, any attempts to set a read-only attribute will result in an error. Be aware, however, that these errors are reported using die() regardless of the setting of the RaiseError attribute, and are thus potentially fatal. That’s another good reason to use eval {...} blocks, as we discussed in Chapter 4.

A statement handle is known as a child, or kid, of its parent database handle. Similarly, database handles are themselves children of their parent driver handle. Child handles inherit some attribute values from parent handles. The rules for this behavior are defined in a common-sense manner and are as follows:

A statement handle will inherit (copy) the current values of certain attributes from its parent database handle.
If that new statement handle then has its attribute values altered, this affects neither the parent database handle nor any other statement handles. The changes are contained entirely within the altered statement handle.
Changes to attributes within a database handle do not affect any of its existing child statement handles. The database handle attribute changes only affect future statement handles created from that database handle.

The DBI specification in Appendix A should be consulted for complete information on which attributes are inherited.

Passing Attributes to DBI Methods

Handles carry with them their set of current attribute values that methods often use to control how they behave. Many methods are defined to also accept an optional reference to a hash of attribute values.

This is primarily an escape mechanism for driver developers and their users, and so does not always work in the way you might think. For example, you might expect this code:

$dbh->{RaiseError} = 1;
...
$dbh->do( $sql_statement, undef, { RaiseError => 0 } ); # WRONG

to turn off RaiseError for the do() method call. But it doesn’t! Attribute parameters are ignored by the DBI on all database handle and statement handle method calls. You don’t even get a warning that the attribute has been ignored.

If they’re ignored, then what’s the point in having them? Well, the DBI itself ignores them, but the DBD driver that processed the method call may not. Or then again, it may! Attribute hash parameters to methods are hints to the driver and typically only usefully hold driver-specific attributes.^[54]

That doesn’t apply to the DBI->connect() method call because it’s not a driver method, it’s a DBI method. Its attribute hash parameter, \%attr , is used to set the attributes of the newly created database handle. We gave some examples using RaiseError in Chapter 4, and we give more in the following section.

Connecting with Attributes

One of Perl’s many catch phrases is "there’s more than one way to do it,” and the DBI is no exception. In addition to being able to set attributes on a handle by simple assignment and by the attribute parameter of the connect() method (as shown earlier), the DBI provides another way.

You can include attribute assignments in the data source name parameter of the connect() method. For example:

$dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" , {
    RaiseError => 1
});

can also be expressed as:

$dbh = DBI->connect( "dbi:Oracle(RaiseError=>1):archaeo", '', '');

You can’t have any space before the opening parenthesis or after the closing one before the colon, but you can have spaces within the parentheses. You can also use just = instead of => if you prefer. If you want to set more than one attribute then use a comma to separate each one.

The attribute settings in the data source name parameter take precedence over those in the attribute parameter. This can be very handy when you want to override a hardcoded attribute setting, such as PrintError. For example, this code will leave PrintError on:

$dbh = DBI->connect( "dbi:Oracle(PrintError=>1):archaeo", '', '', {
    PrintError => 0
});

But what’s the point of just hardcoding the attribute setting in two different places? This example is not very useful as it stands, but we could let the application accept the data source name parameter from the command line as an option, or leave it empty and use the DBI_DSN environment variable. That makes the application much more flexible.

The Significance of Case

You may have noticed that some attribute names use all uppercase letters, like NUM_OF_FIELDS, while others use mixed case letters, like RaiseError. If you’ve seen any descriptions of individual database drivers you may have also noticed some attribute names that use all lowercase letters, like ado_conn and ora_type.

There is a serious method behind the apparently inconsistent madness. The letter case used for attribute names is significant and plays an important part in the portability of DBI scripts and the extensibility of the DBI itself. The letter case of the attribute name is used to signify who defined the meaning of that name and its values, as follows:

UPPER_CASE

Attribute names that use only uppercase letters and underscores are defined by external standards, such as ISO SQL or ODBC.

The statement handle TYPE attribute is a good example here. It’s an uppercase attribute because the values it returns are the standard portable datatype numbers defined by ISO SQL and ODBC, and not the nonportable native database datatype numbers.

MixedCase

Attribute names that start with an uppercase letter but include lowercase letters are defined by the DBI specification.

lower_case

Attribute names that start with a lowercase letters are defined by individual database drivers. These are known as driver-specific attributes.

Because the meanings are assigned by driver authors without any central control, it’s important that two driver authors don’t pick the same name for attributes with different behaviors. To ensure this, driver-specific attributes all begin with a prefix that identifies the particular driver. For example, DBD::ADO attributes all begin with ado_, DBD::Informix attributes begin with ix_, etc.

For example, most drivers provide a driver-specific version of the statement handle TYPE attribute that returns the native database datatype numbers instead of the standard ones. DBD::Oracle calls it ora_type, DBD::Ingres calls it ing_ingtype, and DBD::mysql calls it mysql_type. The prefix also makes it easier to find driver-specific code in applications when maintaining them.

Driver-specific attributes play an important role in the DBI. They are an escape valve. They let drivers expose more of the special functionality and information that they have available without having to fit it inside the fairly narrow DBI straitjacket.

Common Attributes

Common attributes are those that can be queried and set within both database and statement handles. This section discusses some of the most commonly used attributes, including:

PrintError

The PrintError attribute, when enabled, will cause the DBI to issue a warning when a DBI method returns with an error status. This functionality is extremely useful for rapid debugging of your programs, as you may not have written explicit return value checking code after every DBI statement.

The printed error string lists the class of the database driver through which the DBI method was dispatched, the method that caused the error to occur, and the value of $DBI::errstr. The following message was generated when the prepare() method did not successfully execute against an Oracle7 database using the DBD::Oracle driver:

DBD::Oracle::db prepare failed: ORA-00904: 
   invalid column name (DBD: error possibly near <*> indicator at char 8 in '
           SELECT <*>nname, location, mapref
           FROM megaliths
     ') at /opt/WWW/apache/cgi-bin/megalith/megadump line 79.

PrintError uses the standard Perl function called warn() to render the error message. Therefore, you could use a $SIG{_ _WARN_ _} error handler or an error handling module such as CGI::ErrorWrap to re-route the error messages from PrintError.

This attribute is enabled by default.

RaiseError

The RaiseError attribute is similar in style to its PrintError cousin, but differs slightly in operation. Whereas PrintError simply displayed a message when the DBI detected an error had occurred, RaiseError usually kills the program stone-dead.

RaiseError uses the standard Perl function die() to throw the exception and exit. This means you can use eval to catch the exception and deal with it yourself.^[55] This is an important and valuable error handling strategy for larger applications and is highly recommended when using transactions.

The format of the error message printed by RaiseError is identical to that of PrintError. If both PrintError and RaiseError are defined, PrintError will be skipped if no $SIG{_ _DIE_ _} handler is installed.^[56]

RaiseError is disabled by default.

ChopBlanks

This attribute regulates the behavior of the underlying database driver regarding the CHAR datatype in fixed-width and blank-padded character columns. By setting this attribute to a true value, any CHAR columns returned by a SELECT statement will have any trailing blanks chopped off. No other datatypes are affected even when trailing blanks are present.

Setting ChopBlanks usually occurs when you simply want to remove trailing spaces from data without having to write some explicit truncation code either in the original SQL statement or in Perl.

This can be a very handy mechanism when dealing with old databases that tend to use fixed-width, blank-padded CHAR types more often than VARCHAR types. The blank padding added by the database tends to get in the way.

This attribute is currently disabled by default.

LongReadLen and LongTruncOk

Many databases support BLOB (binary large object), LONG, or similar datatypes for holding very long strings or large amounts of binary data in a single field. Some databases support variable-length long values over 2,000,000,000 bytes in length.

Since values of that size can’t usually be held in memory, and because databases can’t usually know in advance the length of the longest LONG that will be returned from a SELECT statement (unlike other datatypes), some special handling is required. In this situation, the value of the LongReadLen attribute is used to determine how much buffer space to allocate when fetching such fields.

LongReadLen typically defaults to or a small value like 80, which means that little or no LONG data will be fetched at all. If you plan to fetch any LONG datatypes, you should set LongReadLen within your application to slightly more than the length of the longest long column you expect to fetch. Setting it too high just wastes memory.^[57]

The LongTruncOk attribute is used to determine how to behave if a fetched value turns out to be larger than the buffer size defined by LongReadLen. For example, if LongTruncOk is set to a true value, (e.g., “truncation is okay”) the over-long value will be silently truncated to the length specified by LongReadLen, without an error.

On the other hand, if LongTruncOk is false then fetching a LONG data value larger than LongReadLen is treated as an error. If RaiseError is not enabled then the fetch call retrieving the data will appear to fail in the usual way, which looks like the end of data has been reached.

LongTruncOk is set to false by default, which causes overly long data fetches to fail. Be sure to enable RaiseError or check for errors after your fetch loops.

We’ll discuss handling LONG data in more detail in later in this chapter.

The DBI specification in Appendix A provides a complete list of all the common attributes defined within the DBI.

Database Handle Attributes

Database handle attributes are specific to database handles and are not valid for other types of handles. They include:

AutoCommit

The AutoCommit database handle attribute can be used to allow your programs to use fine-grained transaction behavior (as opposed to the default “commit everything” behavior).

The functionality of this attribute is closely tied into the way in which DBI defines transaction control. Therefore, a complete description of this parameter can be found later in this chapter.

Name

The Name database handle attribute holds the “name” of the database. Usually the same as the "dbi:DriverName:..." string used to connect to the database, but with the leading "dbi:DriverName:" removed.

The DBI Specification in Appendix A provides a complete list of all the database handle attributes defined within the DBI. We’ll discuss statement handle attributes in a moment, but first we’ll explore database metadata.

Database Metadata

Database metadata is high-level information, or “data about data,” stored within a database describing that database. This information is extremely useful for dynamically building SQL statements or even generating dynamic views of the database contents.

The metadata stored by a database, and the way in which it’s stored, varies widely between different database systems. Most major systems provide a system catalog , consisting of a set of tables and views that can be queried to get information about all the entities in the database, including tables and views. There are two common problems with trying to query the system catalog directly: they can be complex and difficult to query, and the queries are not portable to other types of database.

The DBI should provide a range of handy methods to access this information in a portable way, and one day it will. However, currently it only provides two methods that can be executed against a valid database handle to extract entity metadata from the database.

The first of these methods is called tables() , and simply returns an array containing the names of tables and views within the database defined by the relevant database handle. The following code illustrates the use of this method:

### Connect to the database
my $dbh = DBI->connect( 'dbi:Oracle:archaeo', 'stones', 'stones' );

### Get a list of tables and views
my @tables = $dbh->tables();

### Print 'em out
foreach my $table ( @tables ) {
    print "Table: $table\n";
}

Connecting to a MySQL database would generate:

Table: megaliths
Table: media
Table: site_types

However, connecting to an Oracle database would generate:

Table: STONES.MEGALITHS
Table: STONES.MEDIA
Table: STONES.SITE_TYPES

In both cases, if the database contains other tables, they’d be included in the output.

Oracle stores all names in uppercase by default, so that explains one of the differences, but what about the "STONES." that’s been prefixed to each table name?

Oracle, like most other big database systems, supports the concept of schemas . A schema is a way of grouping together related tables and other database objects into a named collection. In Oracle each user gets their own schema with the same name as the user. (Not all databases that support schemas take this approach.)

If an Oracle user other than stones wanted to refer to the media table then, by default, they would need to fully qualify the table name by adding the stones schema name, e.g., stones.media. If they didn’t then the database would think they were refering to a media table in their own schema.

So, the leading STONES in the output is the name of the schema that the tables are defined in. Returning the fully qualified table names is important because the tables() method will return the names of all the tables owned by all the users that it can discover.

The other method used to retrieve database metadata is called table_info() , and returns more detailed information about the tables and views stored within the database.

When invoked, table_info() returns a prepared and executed statement handle that can be used to fetch information on the tables and views in the database. Each row fetched from this statement handle contains at least the following fields in the order listed:^[58]

TABLE_QUALIFIER: This field contains the table qualifier identifier. In most cases this will be undef (NULL).
TABLE_OWNER: This field contains the name of the owner of the table. If your database does not support multiple schema or table owners, this field will contain undef (NULL).
TABLE_NAME: This field contains the name of the table and should never be undef.
TABLE_TYPE: This field contains the ``type'' of entity signified by this row. The possible values include TABLE, VIEW, SYSTEM TABLE, GLOBAL TEMPORARY, LOCAL TEMPORARY, ALIAS, SYNONYM, or some database driver-specific identifier.
REMARKS: This field contains a description or comment about the table. This field may be undef (NULL).

Therefore, if we wished to list some basic information on the tables contained within the current schema or database, we can write the following program that uses table_info() to retrieve all the table information, then formats the output:

#!/usr/bin/perl -w # # ch06/dbhdump: Dumps information about a SQL statement. use DBI; ### Connect to the database my $dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" , { RaiseError => 1 } ); ### Create a new statement handle to fetch table information my $tabsth = $dbh->table_info(); ### Print the header print "Qualifier Owner Table Name Type Remarks\n"; print "========= ======== =============================== ===== =======\n\n"; ### Iterate through all the tables... while ( my ( $qual, $owner, $name, $type, $remarks ) = $tabsth->fetchrow_array() ) { ### Tidy up NULL fields foreach ($qual, $owner, $name, $type, $remarks) { $_ = "N/A" unless defined $_; } ### Print out the table metadata... printf "%-9s %-9s %-32s %-6s %s\n", $qual, $owner, $name, $type, $remarks; } exit;
Running this program against our megalithic database on an Oracle database produces the following output:
Qualifier Owner Table Name Type Remarks ========= ======== =============================== ===== ======= N/A STONES MEDIA TABLE N/A N/A STONES MEGALITHS TABLE N/A N/A STONES SITE_TYPES TABLE N/A
This form of metadata is not tremendously useful, as it lists only metadata about the objects within the database, and not the structure of the objects themselves (such as table column names). Extracting the structure of each table or view within the database requires us to look to a different type of metadata, which is available via statement handles .

Statement Handle Attributes or Statement Metadata

Statement handle attributes are specific to statement handles, and inherit any inheritable attributes from their parent database handle. Many statement handle attributes are defined as being read-only because they simply describe the prepared statement or its results.

In theoretical terms, these attributes should be defined when the statement handle is prepared, but in practical terms, you should only rely on the attribute values after the statement handle has been both prepared and executed. Similarly, with a few drivers, fetching all the data from a SELECT statement or explicitly invoking the finish() method against a statement handle may cause the values of the statement handle attributes to be no longer available.

The DBI specification in Appendix A provides a complete list of all the statement handle attributes defined within the DBI.

Statement

This attribute contains the statement string passed to the prepare() method.

NUM_OF_FIELDS

This attribute is set to contain the number of columns that will be returned by a SELECT statement. For example:

$sth = $dbh->prepare( " SELECT name, location, mapref FROM megaliths " ); $sth->execute(); print "SQL statement contains $sth->{NUM_OF_FIELDS} columns\n"; Non-SELECT statements will contain the attribute value of zero. This allows you to quickly determine whether or not the statement is a SELECT statement.

NAME NAME_uc NAME_lc

The NAME attribute contains the names of the selected columns within the statement. The attribute value is actually a reference to an array, with length equal to the number of fields in the original statement.

For example, you can list all the column names of a table like this:

$sth = $dbh->prepare( "SELECT * FROM megaliths" ); $sth->execute(); for ( $i = 1 ; $i <= $sth->{NUM_OF_FIELDS} ; $i++ ) { print "Column $i is called $sth->{NAME}->[$i-1]\n"; } The names contained within the attribute array are the column names returned by the underlying database. There are two additional attributes relating to the column names. NAME_uc contains the same column names as the NAME attribute, but with any lowercase characters converted to uppercase. Similarly the NAME_lc attribute has any uppercase characters converted to lowercase. Generally these attributes should be used in preference to NAME.

TYPE

The TYPE attribute contains a reference to an array of integer values representing the international standard values for the respective datatypes. The array of integers has a length equal to the number of columns selected within the original statement, and can be referenced in a similar way to the NAME attribute example shown earlier.

The standard values for common types are:

SQL_CHAR 1 SQL_NUMERIC 2 SQL_DECIMAL 3 SQL_INTEGER 4 SQL_SMALLINT 5 SQL_FLOAT 6 SQL_REAL 7 SQL_DOUBLE 8 SQL_DATE 9 SQL_TIME 10 SQL_TIMESTAMP 11 SQL_VARCHAR 12 SQL_LONGVARCHAR -1 SQL_BINARY -2 SQL_VARBINARY -3 SQL_LONGVARBINARY -4 SQL_BIGINT -5 SQL_TINYINT -6 SQL_BIT -7 SQL_WCHAR -8 SQL_WVARCHAR -9 SQL_WLONGVARCHAR -10While these numbers are fairly standard,^[59] the way drivers map their native types to these standard types varies greatly. Native types that don’t correspond well to one of these types may be mapped into the range officially reserved for use by the Perl DBI: -9999 to -9000.

PRECISION

The PRECISION attribute contains a reference to an array of integer values that represent the defined length or size of the columns in the SQL statement.

There are two general ways in which the precision of a column is calculated. String datatypes, such as CHAR and VARCHAR, return the maximum length of the column. For example, a column defined within a table as:

location VARCHAR2(1000) would return a precision value of 1000.Numeric datatypes are treated slightly differently in that the number of significant digits is returned. This may have no direct relationship with the space used to store the number. Oracle, for example, stores numbers with 38 digits of precision but uses a variable length internal format of between 1 and 21 bytes. For floating-point types such as REAL, FLOAT, and DOUBLE, the maximum ``display size'' can be up to seven characters greater than the precision due to concatenated sign, decimal point, the letter ``E,'' a sign, and two or three exponent digits.

SCALE

The SCALE attribute contains a reference to an array of integer values that represents the number of decimal places in the column. This is obviously only of any real use with floating-point numbers. Integers and non-numeric datatypes will return zero.

NULLABLE

The NULLABLE attribute contains a reference to an array of integer values that tells us whether or not a column may contain a NULL value. The elements of the attribute array each contain one of three values:

0: The column cannot contain a NULL value.
1: The column can contain a NULL value.
2: It is unknown if the column can contain a null value.

NUM_OF_PARAMS: The NUM_OF_PARAMS attribute contains the number of parameters (placeholders) specified within the statement.

Common uses for these statement handle attributes are to format and display data fetched from queries dynamically and to find out information about the tables stored within the database.

The following script performs the latter operation by first creating a statement handle that fetches information on all tables, as discussed earlier in Section 6.1.6, and then iterating through each table listing the table structure via the statement metadata:

#!/usr/bin/perl -w # # ch06/tabledump: Dumps information about all the tables. use DBI; ### Connect to the database my $dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" , { RaiseError => 1 }); ### Create a new statement handle to fetch table information my $tabsth = $dbh->table_info(); ### Iterate through all the tables... while ( my ( $qual, $owner, $name, $type ) = $tabsth->fetchrow_array() ) { ### The table to fetch data for my $table = $name; ### Build the full table name with quoting if required $table = qq{"$owner"."$table"} if defined $owner; ### The SQL statement to fetch the table metadata my $statement = "SELECT * FROM $table"; print "\n"; print "Table Information\n"; print "=================\n\n"; print "Statement: $statement\n"; ### Prepare and execute the SQL statement my $sth = $dbh->prepare( $statement ); $sth->execute(); my $fields = $sth->{NUM_OF_FIELDS}; print "NUM_OF_FIELDS: $fields\n\n"; print "Column Name Type Precision Scale Nullable?\n"; print "------------------------------ ---- --------- ----- ---------\n\n"; ### Iterate through all the fields and dump the field information for ( my $i = 0 ; $i < $fields ; $i++ ) { my $name = $sth->{NAME}->[$i]; ### Describe the NULLABLE value my $nullable = ("No", "Yes", "Unknown")[ $sth->{NULLABLE}->[$i] ]; ### Tidy the other values, which some drivers don't provide my $scale = $sth->{SCALE}->[$i]; my $prec = $sth->{PRECISION}->[$i]; my $type = $sth->{TYPE}->[$i]; ### Display the field information printf "%-30s %5d %4d %4d %s\n", $name, $type, $prec, $scale, $nullable; } ### Explicitly deallocate the statement resources ### because we didn't fetch all the data $sth->finish(); } exit; When executed against our megalithic database, the following output is displayed:Table Information ================= Statement: SELECT * FROM STONES.MEDIA NUM_OF_FIELDS: 5 Column Name Type Precision Scale Nullable? ------------------------------ ---- --------- ----- --------- ID 3 38 0 No MEGALITH_ID 3 38 0 Yes URL 12 1024 0 Yes CONTENT_TYPE 12 64 0 Yes DESCRIPTION 12 1024 0 Yes Table Information ================= Statement: SELECT * FROM STONES.MEGALITHS NUM_OF_FIELDS: 6 Column Name Type Precision Scale Nullable? ------------------------------ ---- --------- ----- --------- ID 3 38 0 No NAME 12 512 0 Yes DESCRIPTION 12 2048 0 Yes LOCATION 12 2048 0 Yes MAPREF 12 16 0 Yes SITE_TYPE_ID 3 38 0 Yes Table Information ================= Statement: SELECT * FROM STONES.SITE_TYPES NUM_OF_FIELDS: 3 Column Name Type Precision Scale Nullable? ------------------------------ ---- --------- ----- --------- ID 3 38 0 No SITE_TYPE 12 512 0 Yes DESCRIPTION 12 2048 0 YesThis output shows the structural information of entities within our database. We could have achieved the same effect by querying our database’s underlying system tables. This would give us more information, but would not be portable.

^[53] Driver-specific attributes, e.g., those that start with a lowercase letter, are a special case. Any get or set of a driver-specific attribute that hasn’t been handled by the driver is handled by the DBI without error. That makes life easier for driver developers. On the other hand, you need to take extra care with the spelling. ^[54] It’s possible that a future version of the DBI may look for certain non-driver-specific attributes, such as RaiseError. ^[55]It also allows you to define a $SIG{_ _DIE_ _} handler, which handles the die() call instead of the Perl default behavior. ^[56]A future release may also skip PrintError if RaiseError is set and the current code is executing within an eval. ^[57]Using a value which is a power of two, such as 64 KB, 512 KB, 8 MB etc., can actually cause twice that amount to be taken on systems that have poor memory allocators. That’s because a few extra bytes are needed for housekeeping information and, because the dumb allocator only works with powers of two, it has to double the allocation to make room for it. ^[58]Database drivers are free to include additional columns of information in the result data. ^[59]Some are ISO standard, others are Microsoft ODBC de facto standard. See ftp://jerry.ece.umassd.edu/isowg3/dbl/SQL_Registry and search for “SQL Data Types,” or the types names of interest, on http://search.microsoft.com/us/dev/ and browse the results.