Description

The INSERT command adds new rows to tables. An individual INSERT command can only insert rows into one table, and in most cases can only insert one row at a time. There are several variants of the INSERT command. The primary differences relate to how the columns and values are specified.

The basic format of the INSERT command starts with the command word INSERT, followed by an optional conflict resolution clause (OR ROLLBACK, etc.). The INSERT conflict resolution clause is identical to the one found in the UPDATE command. See UPDATE for more information on the different conflict resolution options and how they behave. The command word REPLACE can also be used, which is just a shortcut for INSERT OR REPLACE. This is discussed in more detail below.

After the conflict clause, the command declares which table it is acting upon. This is generally followed by a list of columns in parentheses. This column list defines which columns will have values set in the newly inserted row. If no column list is given, a default column list is assumed to include all of the table’s columns, in order, as they appear in the table definition. A default list will not include the raw ROWID column, but any ROWID alias (INTEGER PRIMARY KEY) column is included. If an explicit list of columns is given, the list may contain any column (including the ROWID column) in any order.

Following the column list is a description of the values to insert into the new row. The values are most commonly defined by the keyword VALUES, followed by an explicit list of values in parentheses. The values are matched to columns by position. No matter how the column list is defined, the column list and the value list must have the same number of entries so that one value can be matched to each column in the column list.

The INSERT values can also be defined with a subquery. Using a subquery is the only case when a single INSERT command can insert more than one row. The result set generated by the subquery must have the same number of columns as the column list. As with the value list, the values of the subquery result set are matched to the insert columns by position.

Any table column that does not appear in the insert column list is assigned a default value. Unless the table definition says otherwise, the default value is a NULL. If you have a ROWID alias column that you want assigned an automatic value, you must use an explicit column list, and that list must not include the ROWID alias. If a column is contained in the column list, either explicitly, or by default, a value must be provided for that column. There is no way to specify a default value except by leaving the column out of the column list, or knowing what the default value is and explicitly inserting it.

Alternatively, the column list and value list can be skipped all together. The DEFAULT VALUES variant provides neither a column list nor a source of values and can be used to insert a new row that consists entirely of default values.

Because a typical INSERT command only allows a single row to be inserted, it is not uncommon to have a string of INSERT commands that are used to bulk-load or otherwise populate a new table. Like any other command, each INSERT is normally wrapped in its own transaction. Committing and synchronizing this transaction can be quite expensive, often limiting the number of inserts to a few dozen a second.

Due of the expense associated with committing a transaction, it is very common to batch multiple inserts into a single transaction. Especially in the case of bulk inserts, it is not uncommon to batch 1,000 or even 10,000 or more INSERT commands into a single transaction, allowing a much higher insert rate. Just understand that if an error is encountered, there are situations where the whole transaction will be rolled back. While this may be acceptable for loading bulk data from a file (that can simply be rerun), it may not be acceptable for data that is inserted from a real-time data stream. Batch transactions greatly speed things up, but they can also make it significantly more difficult to recover from an error.

One final word on the INSERT OR REPLACE command. This type of command is frequently used in event-tracking systems where a “last seen” timestamp is required. When an event is processed, the system needs to either insert a new row (if this type of event has never been seen before) or it needs to update an existing record. While the INSERT OR REPLACE variant seems perfect for this, it has some specific limitations. Most importantly, the command truly is an “insert or replace” and not an “insert or update.” The REPLACE option fully deletes any old rows before processing the INSERT request, making it ineffective to update a subset of columns. In order to effectively use the INSERT OR REPLACE option, the INSERT needs to be capable of completely replacing the existing row, and not simply updating a subset of columns.