The DBI requires some additional information to allow you to query back LONG/LOB (long/large object) datatypes from a database. As we discussed earlier in the section on the LongReadLen and LongTruncOk attributes, the DBI is unable to determine how large a buffer to allocate when fetching columns containing LOB data. Therefore, we cannot simply issue a SELECT statement and expect it to work.
Selecting LOB data is straightforward and essentially identical to selecting any other column of any other datatype, with the important exception that you should set at least the LongReadLen attribute value prior to preparing the statement that will return the LOB. For example:
### We're not expecting binary data of more than 512 KB...
$dbh->{LongReadLen} = 512 * 1024;

### Select the raw media data from the database
$sth = $dbh->prepare( "
        SELECT mega.name, med.media_data
        FROM megaliths mega, media med
        WHERE mega.id = med.megaliths_id
" );
$sth->execute();
while ( ($name, $data) = $sth->fetchrow_array ) {
    ...
}
Without the all-important setting of LongReadLen, the fetchrow_array() call would likely fail when fetching the first row, because the default value for LongReadLen is very small, typically 80 or less.
What happens if there’s a rogue column in the database that is longer than LongReadLen? How would the code in the previous example cope?

When the length of the fetched LOB data exceeds the value of LongReadLen, an error occurs unless you have set the LongTruncOk attribute to a true value. The DBI defaults LongTruncOk to false to ensure that accidental truncation is an error.
But there’s a potential problem here if RaiseError is not enabled. How does the snippet of code above behave if it tries to fetch a row with a LOB field that exceeds the value of LongReadLen? The fetchrow_array() method returns an empty list if there’s an error when trying to fetch a row, but it also returns an empty list when there’s no more data to fetch. The while loop will simply end and any code following it will be executed. If the loop should have fetched 50 records, it might stop after 45 if the 46th record was too big. Without error checking, you may never realize that you’re missing some rows! The same applies to loops using other fetchrow methods such as fetchrow_hashref().
Few people remember to check for errors after fetch loops, and that’s a common cause of problems with code that handles LONG/LOB fields. Even when not handling special datatypes, it’s always a good idea to check for errors after fetch loops, or let the DBI do it for you by enabling RaiseError, as we discussed in Chapter 4.
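To make that check concrete, here’s a minimal sketch (reusing the statement handle from the example above) of testing for an error once the fetch loop has finished:

while ( ($name, $data) = $sth->fetchrow_array ) {
    ...
}
### If the loop ended because of an error rather than end-of-data,
### err() will be true and errstr() describes the problem
die "Fetch loop terminated early: " . $sth->errstr if $sth->err;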
Getting back to our little snippet of code, let’s assume that we are happy for values longer than LongReadLen to be silently truncated without causing an error. The following code stub would correctly handle this eventuality:
### We are interested in the first 512 KB of data
$dbh->{LongReadLen} = 512 * 1024;
$dbh->{LongTruncOk} = 1;    ### We're happy to truncate any excess

### Select the raw media data from the database
$sth = $dbh->prepare( "
        SELECT mega.name, med.media_data
        FROM megaliths mega, media med
        WHERE mega.id = med.megaliths_id
" );
$sth->execute();
while ( ($name, $data) = $sth->fetchrow_array ) {
    ...
}
The only change, apart from comments, is the addition of a line setting the LongTruncOk attribute to a true value.
The ability to truncate LOB data when overly large is quite useful for text and some forms of binary data, but not for others. Streaming media that is interpreted on a temporal basis doesn’t unduly suffer from being truncated, as you will be able to view or listen to the stream up until the point of truncation. However, binary files such as ZIP files that store a checksum at the very end will be rendered useless when truncated. With this sort of data, it’s not recommended that LongTruncOk be enabled, as it will allow truncated, and hence corrupted, data to be returned with no indication that there’s a problem. In that situation, you won’t be able to determine whether the column contains corrupted data or whether it has simply been chopped by the DBI. Caveat emptor!
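If you do enable LongTruncOk but still want some warning that data may have been lost, one rough heuristic (an assumption about how truncation behaves, not a DBI facility) is that a truncated value will be exactly LongReadLen bytes long:

### A value exactly LongReadLen bytes long *may* have been truncated,
### or may simply have fitted exactly -- treat this only as a warning sign
warn "Value for '$name' may have been truncated\n"
    if defined $data and length($data) >= $dbh->{LongReadLen};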
One thing to be aware of when writing portable code to fetch LOB data from a database is that the format of that data may vary on a per-database and per-datatype basis. For example, in Oracle, a column with a LONG RAW datatype, rather than a simple LONG type, is passed to and from the database encoded as a pair of hexadecimal digits for each byte. So after fetching the hex string, you’d need to decode it using pack("H*",...) to get the original binary value. For historical reasons, for these datatypes, the LongReadLen attribute refers to the length of the binary data, so hex-encoded strings up to twice that length may be fetched.
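For example, a LONG RAW value fetched from Oracle could be decoded like this (a minimal sketch, assuming $data holds the hex string returned by the fetch):

### Decode pairs of hex digits (e.g., "48656c6c6f") back into raw bytes ("Hello")
my $binary = pack( "H*", $data );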
The DBI currently defines no way to fetch LONG/LOB values piece-wise, in other words, piece by piece. That means you’re limited to fetching values that will fit into your available memory. It also means you can’t stream the data out while still fetching it from the database. Some drivers do implement an unofficial blob_read() method, so take a look at your driver documentation if you need piece-wise fetches.
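Where a driver does provide it, blob_read() is usually called on an executed statement handle with a field index, an offset, and a chunk length. The following is only a sketch of the general pattern, with the field number, offset handling, and chunk size chosen for illustration; whether this works at all, and exactly how, depends entirely on your driver:

### Hypothetical piece-wise fetch via the unofficial blob_read() method
my $offset     = 0;
my $chunk_size = 64 * 1024;
while ( defined( my $chunk = $sth->blob_read( 1, $offset, $chunk_size ) )
        and length $chunk ) {
    print $chunk;                 # stream each chunk out as it arrives
    $offset += length $chunk;
}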
Some databases let you insert into LONG/LOB columns using SQL statements with literal strings, like this:
INSERT INTO table_name (key_num, long_description) VALUES (42, '...')
Ignoring portability for the moment, that’s fine for simple short textual strings, but soon runs into problems for anything else. Firstly, most databases have a limit on the maximum length of an SQL statement, and it’s usually far shorter than the maximum length of a LONG/LOB column. Secondly, most databases have limits on which characters can be included in literal strings. The DBD driver’s quote() method will do its best, but it’s often not possible to put all possible binary data values into a string. Finally, coming back to portability, many databases are strict about data typing and just don’t let you assign literal strings to LONG/LOB columns.
So how do we avoid these problems? Here’s where placeholders come to our aid once again. We discussed placeholders in some detail in Chapter 5 so we’ll only cover LONG/LOB issues here.
To use placeholders, we’d implement the statement above using the DBI as:
use DBI qw(:sql_types);

$sth = $dbh->prepare( "
        INSERT INTO table_name (key_num, long_description)
        VALUES (?, ?)
" );
$sth->bind_param( 1, 42 );
$sth->bind_param( 2, $long_description, SQL_LONGVARCHAR );
$sth->execute();
Passing SQL_LONGVARCHAR as the optional TYPE parameter to bind_param() gives the driver a strong hint that you’re binding a LONG/LOB type. Some drivers don’t need the hint, but it’s always a good idea to include it.
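The same hint can also be passed using the more general attribute-hash form of bind_param(), which is equivalent to the shorthand above:

$sth->bind_param( 2, $long_description, { TYPE => SQL_LONGVARCHAR } );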
The DBI currently defines no way to insert or update LONG/LOB values piece-wise, in other words, piece by piece. That means you’re limited to handling values that will fit into your available memory.
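To round the example out, here’s a minimal sketch of inserting the contents of a file into the media table used earlier; the file name and key value are invented for illustration, and note that the whole file is slurped into memory first, so it has to fit:

use DBI qw(:sql_types);

### Slurp the entire media file into memory (it must fit!)
open my $fh, '<', 'stones.jpg' or die "Can't open stones.jpg: $!";
binmode $fh;
my $media_data = do { local $/; <$fh> };
close $fh;

### Insert it via a placeholder, hinting that it's a LONG/LOB binary value
$sth = $dbh->prepare( "
        INSERT INTO media (megaliths_id, media_data)
        VALUES (?, ?)
" );
$sth->bind_param( 1, 42 );
$sth->bind_param( 2, $media_data, SQL_LONGVARBINARY );
$sth->execute();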