Maintaining a Database

Database maintenance is a broad subject. This section covers the physical maintenance of the system (pertaining to its disk usage), analytical maintenance (to increase performance), and database object documentation (to add to the maintainability and clarity of the schema).

The primary tool for physical and analytical database maintenance in PostgreSQL is the VACUUM SQL command, and its accompanying command-line script, vacuumdb. They each perform the same two general functions:

Remove any leftover data from rollbacks and other processes that can leave temporary data
Analyze activity in the database to assist PostgreSQL in designing efficient queries

It is good practice to perform a VACUUM nightly on a production database. While it can be run at the same time data is accessed, doing so will decrease the response time of the server. As such, it is generally preferable to schedule it at a time when you do not expect a great deal of database activity.

Any time an exceptionally large number of records are added or deleted, it is prudent to perform a VACUUM to analyze the database, which automatically updates the PostgreSQL query optimizer of major changes to the tables. By doing this you allow PostgreSQL to have a more up-to-date profile of the data within the database, providing a better set of information with which to plan the most efficient queries. All of these actions should result in a faster, more efficient response from the database.

Note

The VACUUM command locks tables in access exclusive mode. This means that any query involving a table being vacuumed will pause and wait until the vacuum of the affected table is complete before continuing.

Using VACUUM

The syntax for the VACUUM SQL command is as follows:

VACUUM [ VERBOSE ] [ ANALYZE ] [ table ]
VACUUM [ VERBOSE ] ANALYZE [ table [ ( column [, ...] ) ] ]

Used without any of the optional keywords or identifiers, a VACUUM statement will clean up each table in the presently connected database, one at a time, deleting temporary data and recovering disk space. This use of VACUUM is primarily to maximize free disk space.

An optional table identifier may be specified if you want the VACUUM to clean a single table in the connected database, rather than all tables. It will also update statistics in the system catalogs pertaining to the number of records and amount of data stored in each table. Example 9-14 shows the use of a VACUUM statement in the booktown database, on the books table.

Example 9-14. Using VACUUM on a table

booktown=# VACUUM books;
VACUUM

The VACUUM message returned in Example 9-14 indicates that the process finished successfully. If a specified table cannot be found, you will instead receive the following notice:

NOTICE: Vacuum: table not found

With the use of the optional ANALYZE keyword, PostgreSQL examines the allocation of data in each column for each table (or the specified table, if provided), and uses the information to prepare the query optimizer for better planning. With the use of the ANALYZE keyword, you also have the option to analyze only specified columns. Example 9-15 shows the use of the VACUUM ANALYZE command on the entire booktown database.

Example 9-15. Using VACUUM ANALYZE on a database

booktown=# VACUUM ANALYZE;
VACUUM

Finally, the optional VERBOSE keyword may be applied if you are interested in seeing a detailed internal report of the findings of the VACUUM statement. This is most likely not of interest to anyone not actively developing the PostgreSQL engine, or related software.

Using vacuumdb

As with many of the database management SQL commands, the VACUUM command has a command-line executable wrapper called vacuumdb. The vacuumdb script provides one significant added function to the normal use of the VACUUM SQL statement, in that you can instruct it to perform a VACUUM on each PostgreSQL database on your system.

Additionally, since it accepts connection parameters on how to connect to PostgreSQL, you may use vacuumdb remotely (i.e., without having to first connect to the machine via a terminal client, and then executing vacuumdb or psql from the remote machine). This is provided that your authentication scheme in PostgreSQL’s pg_hba.conf file is configured for outside access (see Chapter 8 for more information on this).

Here is the syntax for vacuumdb:

vacuumdb [ options ] [ dbname ]

Like the createdb and dropdb scripts, vacuumdb accepts both single-dashed and GNU-style double-dashed arguments from the command line. The only required option is the dbname (unless you specify - -all), which describes the database to be cleaned and analyzed. The options parameters describe which mode the VACUUM command should be invoked in. The following are the available options for the vacuumdb script:

-h HOSTNAME, - -host=HOSTNAME: Specifies that you are connected to HOSTNAME, rather than the localhost. Use this option when vacuuming a remote database.
-p PORT, - -port=PORT: Specifies that the database connection is made on port PORT, rather than the default port (usually 5432, though it may have been configured differently when PostgreSQL was compiled, by the - -with-pgport flag).
-U USERNAME, - -username=USERNAME: Specifies that the username USERNAME is the user who connects to PostgreSQL (rather than the name of the system user executing vacuumdb).
-W, - -password: Accepts no parameters, and causes a password prompt, which occurs automatically if the pg_hba.conf file on the target server is configured not to trust the requesting host.
-d DBNAME, - -dbname=DBNAME: Explicitly specifies the name of the database to perform the VACUUM statement on. This option is mutually exclusive to the - -all option.
-a, - -all: Applies the VACUUM command, with specified options, to all databases in the system catalog.
-z, - -analyze: Equivalent to the ANALYZE keyword for the VACUUM SQL command. Updates stored statistics about the data allocation between columns, which are used by the query optimizer to help guide internal query planning.
-t ‘TABLE [ ( column [, ...] ) ]’, - -table=’TABLE [ ( column [, ...] ) ]’: Targets a specific table TABLE (or specific columns within that table) to be affected. The - -analyze option is required to target specific columns.
-v, - -verbose

Example 9-16 demonstrates the use of the vacuumdb script from the Book Town server. The -U flag specifies that the connection should use the manager user to connect, while the - -all flag causes all databases in the system catalog to be affected in sequence.

Example 9-16. Using vacuumdb on all databases

[jworsley@booktown ~]$ vacuumdb -U manager --all
Vacuuming postgres
VACUUM
Vacuuming booktown
VACUUM
Vacuuming template1
VACUUM

As mentioned, because of the connectivity options available to the vacuumdb script, it can be easily executed from a remote server. Example 9-17 shows a similar process to the command used in Example 9-16, but with the addition of the -h flag to specify a remote server named booktown.commandprompt.com. Example 9-17 also targets the booktown database specifically, rather than all databases.

Example 9-17. Using vacuumdb on a remote database

[jworsley@cmd ~]$ vacuumdb -h booktown.commandprompt.com -U manager booktown
VACUUM

Documenting a Database

Using COMMENT

PostgreSQL offers a non-standard SQL command called COMMENT, which allows for documentation of any database object. By using COMMENT on a table, function, operator, or other database object, you can provide a description that is stored in the pg_description system table. Descriptions can be easily retrieved through a set of extended psql slash commands.

Many standard objects in the database have default descriptions, which can be perused (along with user-added descriptions) with the psql \dd slash command.

Here is the syntax for COMMENT:

COMMENT ON [ [ DATABASE | INDEX | RULE | SEQUENCE | TABLE | TYPE | VIEW ]
  { object_name |
    COLUMN table_name.column_name |
    AGGREGATE aggregate_name aggregate_type |
    FUNCTION function_name ( argument_type [, ...] ) |
    OPERATOR operator_name ( leftoperand_type , rightoperand_type ) |
    TRIGGER trigger_name ON table_name }
] IS 'description'

In this syntax, object_name is the name of the database object that you wish to add a comment or description to. The keywords for the major database objects are optional, but if you intend to place a comment on a column, function, aggregate function, operator, or trigger, you must specify the preceding keyword so that PostgreSQL knows what kind of syntax to expect, and where to look for the object name specified.

Note

Note that any comment added to a database is tied to both the database in which it is added, and the user who added it. You can only see those comments that you create.

The description string constant following the IS keyword is the literal comment to be placed in the database. Example 9-18 demonstrates placing a simple description on the id column of the booktown database.

Example 9-18. Commenting the books table

booktown=# COMMENT ON COLUMN books.id
booktown-#            IS 'An Internal Book Town Identifier';
COMMENT

The COMMENT server message returned in Example 9-18 indicates that the comment was successfully placed on the column.

Retrieving comments

You may retrieve comments from the database easily by using the psql slash-plus commands. These are as follows:

\d+: Displays the same information as the standard \d command (displaying all tables, views, sequences, and indices in the currently connected database), but adds a column for the comments as well.
\l+: Displays comments on all databases.
\df+ [ pattern ]: Displays descriptions for each function in the currently connected database (as well as the language and source of the function). You may wish to view this slash command in expanded mode for readability by first initiating the \x slash command (see Chapter 6 for more about this). You can optionally supply a regular expression pattern to compare against existing function names, allowing you to limit the number of functions displayed.
\dt+: Displays comments on all tables in the currently connected database.
\di+: Displays comments on all indices in the currently connected database.
\ds+: Displays comments on all sequences in the currently connected database.
\dv+: Displays comments on all views in the currently connected database.
\dS+: Displays comments on system tables. Note that comments placed on system tables are still tied to a particular database, even though they are accessible from all databases, and will therefore not be visible when performing a \dS+ slash command from another database connection.
\dd: Displays all descriptions for all database objects.

Example 9-19 demonstrates the retrieval of the comment that was placed on the books table in Example 9-18 with the use of the psql \d+ command.

Example 9-19. Retrieving a comment

booktown=# \d+ books
                           Table "books"
 Attribute  |  Type   | Modifier |           Description
------------+---------+----------+----------------------------------
 id         | integer | not null | An Internal Book Town Identifier
 title      | text    | not null |
 author_id  | integer |          |
 subject_id | integer |          |
Index: books_id_pkey

The COMMENT SQL command provides a very simple way of internally documenting your objects, from tables to functions. This can be of great help when working with large or complicated database schema. Even the best of naming conventions do not always result in database objects whose applications are self-evident; this is especially the case when working with multiple developers.