One of the bigger problems when using a DBM file with the storage mechanism of DB_HASH is that the keys against which the data is stored must be unique. For example, if we stored two different values with the key of "Wiltshire," say for Stonehenge and Avebury, generally the last value inserted into the hash would get stored in the database. This is a bit problematic, to say the least.
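This overwriting behavior is easy to demonstrate. Here is a minimal sketch (the filename unique.dat and the values are purely illustrative) that stores two values against the same key in a DB_HASH database and shows that only the last one survives:

#!/usr/bin/perl -w

use DB_File;
use Fcntl;

### Open (or create) a database using the DB_HASH mechanism
my %database;
tie %database, 'DB_File', 'unique.dat', O_CREAT | O_RDWR, 0666, $DB_HASH
    or die "Can't initialize database: $!\n";

### Store two values against the same key
$database{'Wiltshire'} = 'Stonehenge';
$database{'Wiltshire'} = 'Avebury';     # silently replaces 'Stonehenge'

### Prints "Avebury" -- the first value is gone
print "$database{'Wiltshire'}\n";

untie %database;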
In a good database design, the primary key of any data structure should generally be unique in order to speed up searches. But quick-and-dirty databases, badly designed ones, or databases with suboptimal data quality may not be able to enforce this uniqueness. Similarly, using referential hashtables to provide non-primary-key searching of the database runs into the same problem.
Therefore, to solve this problem, we need to look at using a different Berkeley DB storage management method, DB_BTREE, which keeps its keys in sorted order. With this mechanism, it is possible to have duplicate keys, because the underlying DBM file stores its entries as a sorted sequence (a B-Tree) rather than as a hashtable in which each key maps to exactly one slot. Fortunately, you still reference the DBM file via a Perl hashtable, so DB_BTREE is no harder to use. The main downside to DB_BTREE storage is a performance penalty, since a B-Tree is generally slightly slower than a hashtable for data retrieval.
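To see the sorted-key behavior in isolation, here is a minimal sketch (the filename sorted.dat is just an example) that inserts keys in arbitrary order and reads them back sorted:

#!/usr/bin/perl -w

use DB_File;
use Fcntl;

my %database;
tie %database, 'DB_File', 'sorted.dat', O_CREAT | O_RDWR, 0666, $DB_BTREE
    or die "Can't initialize database: $!\n";

### Insert keys in no particular order
$database{'Wiltshire'} = 1;
$database{'Orkney'}    = 2;
$database{'Cornwall'}  = 3;

### The B-Tree hands the keys back in sorted order:
### Cornwall, Orkney, Wiltshire
print "$_\n" foreach keys %database;

untie %database;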
The following short program creates a Berkeley DB
using the DB_BTREE
storage mechanism and also
specifies a flag to indicate that duplicate keys are allowed. A
number of rows are inserted with duplicate keys, and finally the
database is dumped to show that the keys have been stored:
#!/usr/bin/perl -w
#
# ch02/DBM/dupkey1: Creates a Berkeley DB with the DB_BTREE mechanism and
#                   allows for duplicate keys. We then insert some test
#                   object data with duplicate keys and dump the final
#                   database.

use DB_File;
use Fcntl ':flock';

use Megalith;

### Set Berkeley DB BTree mode to handle duplicate keys
$DB_BTREE->{'flags'} = R_DUP;

### Remove any existing database files
unlink 'dupkey2.dat';

### Open the database up
my %database;
my $db = tie %database, 'DB_File', "dupkey2.dat",
             O_CREAT | O_RDWR, 0666, $DB_BTREE
    or die "Can't initialize database: $!\n";

### Exclusively lock the database to ensure no one accesses it
my $fd = $db->fd( );
open DATAFILE, "+<&=$fd"
    or die "Can't safely open file: $!\n";
print "Acquiring exclusive lock...";
flock( DATAFILE, LOCK_EX )
    or die "Unable to acquire lock: $!. Aborting";
print "Acquired lock. Ready to update database!\n\n";

### Create, pack and insert some rows with duplicate keys
$database{'Wiltshire'} =
    new Megalith( 'Avebury',
                  'Wiltshire',
                  'SU 103 700',
                  'Stone Circle and Henge',
                  'Largest stone circle in Britain' )->pack( );
$database{'Wiltshire'} =
    new Megalith( 'Stonehenge',
                  'Wiltshire',
                  'SU 123 400',
                  'Stone Circle and Henge',
                  'The most popularly known stone circle in the world' )->pack( );
$database{'Wiltshire'} =
    new Megalith( 'The Sanctuary',
                  'Wiltshire',
                  'SU 118 680',
                  'Stone Circle ( destroyed )',
                  'No description available' )->pack( );

### Dump the database
foreach my $key ( keys %database ) {

    ### Unpack the record into a new megalith object
    my $megalith = new Megalith( $database{$key} );

    ### And display the record
    $megalith->dump( );
}

### Close the database
undef $db;
untie %database;

### Close the filehandle to release the lock
close DATAFILE;

exit;
The output we get from running this program is not exactly what we’d hoped for:
Acquiring exclusive lock...Acquired lock. Ready to update database!

The Sanctuary ( Stone Circle ( destroyed ) )
============================================
Location: Wiltshire
Map Reference: SU 118 680
Description: No description available

The Sanctuary ( Stone Circle ( destroyed ) )
============================================
Location: Wiltshire
Map Reference: SU 118 680
Description: No description available

The Sanctuary ( Stone Circle ( destroyed ) )
============================================
Location: Wiltshire
Map Reference: SU 118 680
Description: No description available
It seems that we’ve managed to successfully store three copies of the same record instead of three different records!
Fortunately, this isn’t actually the case. We have correctly stored the three different records with the same key in the DBM file. The problem lies in the way we’ve tried to read these records back out of the DBM file. A basic dereference using the hash key obviously doesn’t work, since Perl stores only a single value for each key, as we already know.
To get around this limitation, we can use the seq() method declared within the DB_File module, which is used to traverse chained records stored within a single hash element. Figure 2.2 illustrates the principle of chained record traversal within a hash element.
The corrected record-dumping chunk is rewritten to use seq() in this way:
### Dump the database
my ( $key, $value, $status ) = ( '', '', 0 );
for ( $status = $db->seq( $key, $value, R_FIRST ) ;
      $status == 0 ;
      $status = $db->seq( $key, $value, R_NEXT ) ) {

    ### Unpack the record into a new megalith object
    my $megalith = new Megalith( $value );

    ### And display the record
    $megalith->dump();
}
Running this corrected version produces the output we expected, i.e., records for three different megalithic sites.
Similarly, we can traverse the records in reverse order by starting at the last key and stepping backward with the R_LAST and R_PREV flags:

for ( $status = $db->seq( $key, $value, R_LAST ) ;
      $status == 0 ;
      $status = $db->seq( $key, $value, R_PREV ) ) {
    ...
}
A quicker and easier utility method for querying duplicate values also exists: get_dup(). This method returns either the number of records with the given key, or an array or hash containing the appropriate records. For example, given that we have three records in our database with the key of "Wiltshire," we could verify that fact by writing:
### Displays the number of records inserted against
### the "Wiltshire" key
my $numRecords = $db->get_dup( 'Wiltshire' );
print "Number of Wiltshire records: $numRecords\n";
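get_dup() is also useful in list context. According to the DB_File documentation, calling it in list context returns the duplicate values themselves, and passing a true second argument returns a hash mapping each distinct value to its count. A short sketch of both forms, building on the database above:

### Fetch all three packed records stored against "Wiltshire"
### and display each one
my @records = $db->get_dup( 'Wiltshire' );
foreach my $packedRecord ( @records ) {
    my $megalith = new Megalith( $packedRecord );
    $megalith->dump( );
}

### Fetch a hash mapping each distinct value to its count
my %counts = $db->get_dup( 'Wiltshire', 1 );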