Read-only databases (that is, databases
that only allow you to SELECT
data from them) are
very useful. Data warehouses
are typically massive
read-only databases populated with
archived data mangled into a form suitable for reporting. However, in
the cut-and-thrust world of transaction-processing databases, the
ability to modify data within the database
quickly and efficiently is of paramount importance.
There are several core operations that comprise the broader definition of data modification, namely:
Inserting new data into the database
Deleting data from the database
Updating, or modifying, existing data within the database
Each of these operations falls into the grouping of Data Manipulation Language
commands, or
DMLs, alongside SELECT
.
We shall discuss each of these tasks in turn and apply the theory to our example database.
Before a database can be really of any use, data must be inserted into it by some means, either by manual data entry or with an automated batch loading program. The action of inserting data only applies to cases in which you wish to load a completely new record of information into the database. If the record already exists and merely requires modification of a column value, the update operation should be used instead.
Data inserts in the relational database model are done on a row-by-row basis: each record or item of information that you load into the database corresponds to a brand-new row within a given existing table. As each inserted record corresponds to a new row in one table, multitable inserts are not possible.[28]
The SQL INSERT
keyword
provides a simple mechanism for inserting new rows of data into the
database. For example, assuming the megaliths
table is already present in the database and and contains the six
columns shown earlier in Figure 3.1, a single row
of data can be inserted into it using the following SQL statement:
INSERT INTO megaliths VALUES ( 0, 'Callanish I', '"Stonehenge of the North"', 'Western Isles', 'NB 213 330', 1 )
If you then SELECT
back all the rows in the table,
you should be able to see the row that has just been inserted.
Just as the SELECT
statement could specify which
columns from a table should be returned in the query, it is also
possible (and good practice) to specify into which columns of the
table the values should be inserted. The unspecified columns will
take the default value, typically NULL
. For
example, if you wished to specify only the id
and
name
columns of the table, allowing
description
and location
to be
NULL
, the SQL statement would be:
INSERT INTO megaliths ( id, name ) VALUES ( 0, 'Callanish I' )
There must be an exact mapping between the number of columns and column values specified in the SQL statement. It is also essential to make sure that the datatypes of the supplied values and the corresponding columns match.
One
of the more sneaky uses for the
INSERT
keyword is to transfer data from one table
or column to another in one easy operation. This seems to fly in the
face of our previous assertion that only one row can be inserted with
each INSERT
statement, but in fact, follows the
rules correctly (in an underhand manner).
For example, if we wanted to make a quick copy of the
megaliths
table into a new table called
megaliths_copy
, the following SQL statement can be
used:
INSERT INTO megaliths_copy SELECT * FROM megaliths
This process inserts each row returned from the
SELECT
statement into the new table, row by row,
until an exact copy is created. This feature of
INSERT
is extremely useful for making quick copies
of tables if you need to do some destructive maintenance work on the
original, such as pruning redundant data. For this SQL to work, the
original table and destination table must have an identical
structure.
You can further refine this operation by specifying conditions that the rows to be transferred must meet before being inserted. For example, to copy across only the rows of data for megaliths located in Wiltshire:
INSERT INTO megaliths_copy SELECT * FROM megaliths WHERE location LIKE '%Wiltshire%'
Furthermore, you can make extracts of data from
tables into new tables by explicitly specifying the targeted columns
in the new table. This is useful when building large denormalized
tables for use within a data warehouse. Therefore, if we had a table
called megalocations
that contained two columns
called name
and location
, we
could populate this new table from the megaliths
table in the following way:
INSERT INTO megalocations SELECT name, location FROM megaliths
Or, we can even select data from multiple tables for inserting. A
denormalized table containing the rows coalesced from the
megaliths
and media
tables
might contain two columns, name
and
url
. Populating this table with an
INSERT
statement is easy:
INSERT INTO megamedia SELECT name, url FROM megaliths, media WHERE megaliths.id = media.megaliths_id
However, in general, table population via INSERT
statements is usually performed by batch-loading programs that
generate suitable SQL statements and execute them within the
database, such as Oracle’s SQL*Loader. Of course, Perl is a
good example of a programming language that makes loading data from a
file remarkably easy via the DBI.
[28] Well, this is not strictly true these days, as database servers get smarter. Oracle, for example, allows inserts into equi-join views and also supports “INSTEAD OF” triggers that make just about anything possible.