Modifying Data Within Tables

Read-only databases (that is, databases that only allow you to SELECT data from them) are very useful. Data warehouses are typically massive read-only databases populated with archived data mangled into a form suitable for reporting. However, in the cut-and-thrust world of transaction-processing databases, the ability to modify data within the database quickly and efficiently is of paramount importance.

There are several core operations that comprise the broader definition of data modification, namely:

Inserting new data into the database
Deleting data from the database
Updating, or modifying, existing data within the database

Each of these operations falls into the grouping of Data Manipulation Language commands, or DMLs, alongside SELECT.

We shall discuss each of these tasks in turn and apply the theory to our example database.

Inserting Data

Before a database can be really of any use, data must be inserted into it by some means, either by manual data entry or with an automated batch loading program. The action of inserting data only applies to cases in which you wish to load a completely new record of information into the database. If the record already exists and merely requires modification of a column value, the update operation should be used instead.

Data inserts in the relational database model are done on a row-by-row basis: each record or item of information that you load into the database corresponds to a brand-new row within a given existing table. As each inserted record corresponds to a new row in one table, multitable inserts are not possible.^[28]

The SQL INSERT keyword provides a simple mechanism for inserting new rows of data into the database. For example, assuming the megaliths table is already present in the database and and contains the six columns shown earlier in Figure 3.1, a single row of data can be inserted into it using the following SQL statement:

INSERT INTO megaliths VALUES ( 0, 'Callanish I', 
                               '"Stonehenge of the North"',
                               'Western Isles',
                               'NB 213 330', 1 )

If you then SELECT back all the rows in the table, you should be able to see the row that has just been inserted.

Just as the SELECT statement could specify which columns from a table should be returned in the query, it is also possible (and good practice) to specify into which columns of the table the values should be inserted. The unspecified columns will take the default value, typically NULL. For example, if you wished to specify only the id and name columns of the table, allowing description and location to be NULL, the SQL statement would be:

INSERT INTO megaliths ( id, name ) VALUES ( 0, 'Callanish I' )

There must be an exact mapping between the number of columns and column values specified in the SQL statement. It is also essential to make sure that the datatypes of the supplied values and the corresponding columns match.

Using INSERT for data transfers

One of the more sneaky uses for the INSERT keyword is to transfer data from one table or column to another in one easy operation. This seems to fly in the face of our previous assertion that only one row can be inserted with each INSERT statement, but in fact, follows the rules correctly (in an underhand manner).

For example, if we wanted to make a quick copy of the megaliths table into a new table called megaliths_copy, the following SQL statement can be used:

INSERT INTO megaliths_copy
    SELECT *
    FROM megaliths
This process inserts each row returned from the
SELECT statement into the new table, row by row,
until an exact copy is created. This feature of
INSERT is extremely useful for making quick copies
of tables if you need to do some destructive maintenance work on the
original, such as pruning redundant data. For this SQL to work, the
original table and destination table must have an identical
structure.
You can further refine this operation by specifying conditions that
the rows to be transferred must meet before being inserted. For
example, to copy across only the rows of data for megaliths located
in Wiltshire:INSERT INTO megaliths_copy
    SELECT *
    FROM megaliths
    WHERE location LIKE '%Wiltshire%'
Furthermore, you can make extracts of data from
tables into new tables by explicitly specifying the targeted columns
in the new table. This is useful when building large denormalized
tables for use within a data warehouse. Therefore, if we had a table
called megalocations that contained two columns
called name and location, we
could populate this new table from the megaliths
table in the following way:INSERT INTO megalocations
    SELECT name, location
    FROM megaliths
Or, we can even select data from multiple tables for inserting. A
denormalized table containing the rows coalesced from the
megaliths and media tables
might contain two columns, name and
url. Populating this table with an
INSERT statement is easy:INSERT INTO megamedia
    SELECT name, url
    FROM megaliths, media
    WHERE megaliths.id = media.megaliths_idHowever, in general, table population via INSERT
statements is usually performed by batch-loading programs that
generate suitable SQL statements and execute them within the
database, such as Oracle’s SQL*Loader. Of course, Perl is a
good example of a programming language that makes loading data from a
file remarkably easy via the DBI.


^[28] Well,
this is not strictly true these days, as database servers get
smarter. Oracle, for example, allows inserts into equi-join views and
also supports “INSTEAD OF” triggers that make just about
anything possible.