Modifying Data Within Tables

Read-only databases (that is, databases that only allow you to SELECT data from them) are very useful. Data warehouses are typically massive read-only databases populated with archived data mangled into a form suitable for reporting. However, in the cut-and-thrust world of transaction-processing databases, the ability to modify data within the database quickly and efficiently is of paramount importance.

There are several core operations that comprise the broader definition of data modification, namely:

Each of these operations falls into the grouping of Data Manipulation Language commands, or DMLs, alongside SELECT.

We shall discuss each of these tasks in turn and apply the theory to our example database.

Before a database can be really of any use, data must be inserted into it by some means, either by manual data entry or with an automated batch loading program. The action of inserting data only applies to cases in which you wish to load a completely new record of information into the database. If the record already exists and merely requires modification of a column value, the update operation should be used instead.

Data inserts in the relational database model are done on a row-by-row basis: each record or item of information that you load into the database corresponds to a brand-new row within a given existing table. As each inserted record corresponds to a new row in one table, multitable inserts are not possible.[28]

The SQL INSERT keyword provides a simple mechanism for inserting new rows of data into the database. For example, assuming the megaliths table is already present in the database and and contains the six columns shown earlier in Figure 3.1, a single row of data can be inserted into it using the following SQL statement:

INSERT INTO megaliths VALUES ( 0, 'Callanish I', 
                               '"Stonehenge of the North"',
                               'Western Isles',
                               'NB 213 330', 1 )

If you then SELECT back all the rows in the table, you should be able to see the row that has just been inserted.

Just as the SELECT statement could specify which columns from a table should be returned in the query, it is also possible (and good practice) to specify into which columns of the table the values should be inserted. The unspecified columns will take the default value, typically NULL. For example, if you wished to specify only the id and name columns of the table, allowing description and location to be NULL, the SQL statement would be:

INSERT INTO megaliths ( id, name ) VALUES ( 0, 'Callanish I' )

There must be an exact mapping between the number of columns and column values specified in the SQL statement. It is also essential to make sure that the datatypes of the supplied values and the corresponding columns match.

One of the more sneaky uses for the INSERT keyword is to transfer data from one table or column to another in one easy operation. This seems to fly in the face of our previous assertion that only one row can be inserted with each INSERT statement, but in fact, follows the rules correctly (in an underhand manner).

For example, if we wanted to make a quick copy of the megaliths table into a new table called megaliths_copy, the following SQL statement can be used:

INSERT INTO megaliths_copy
    SELECT *
    FROM megaliths

This process inserts each row returned from the SELECT statement into the new table, row by row, until an exact copy is created. This feature of INSERT is extremely useful for making quick copies of tables if you need to do some destructive maintenance work on the original, such as pruning redundant data. For this SQL to work, the original table and destination table must have an identical structure.

You can further refine this operation by specifying conditions that the rows to be transferred must meet before being inserted. For example, to copy across only the rows of data for megaliths located in Wiltshire:

INSERT INTO megaliths_copy
    SELECT *
    FROM megaliths
    WHERE location LIKE '%Wiltshire%'

Furthermore, you can make extracts of data from tables into new tables by explicitly specifying the targeted columns in the new table. This is useful when building large denormalized tables for use within a data warehouse. Therefore, if we had a table called megalocations that contained two columns called name and location, we could populate this new table from the megaliths table in the following way:

INSERT INTO megalocations
    SELECT name, location
    FROM megaliths

Or, we can even select data from multiple tables for inserting. A denormalized table containing the rows coalesced from the megaliths and media tables might contain two columns, name and url. Populating this table with an INSERT statement is easy:

INSERT INTO megamedia
    SELECT name, url
    FROM megaliths, media
    WHERE megaliths.id = media.megaliths_id

However, in general, table population via INSERT statements is usually performed by batch-loading programs that generate suitable SQL statements and execute them within the database, such as Oracle’s SQL*Loader. Of course, Perl is a good example of a programming language that makes loading data from a file remarkably easy via the DBI.



[28] Well, this is not strictly true these days, as database servers get smarter. Oracle, for example, allows inserts into equi-join views and also supports “INSTEAD OF” triggers that make just about anything possible.