Database maintenance is a broad subject. This section covers the physical maintenance of the system (pertaining to its disk usage), analytical maintenance (to increase performance), and database object documentation (to add to the maintainability and clarity of the schema).
The primary tool for physical and analytical database maintenance in PostgreSQL is the
VACUUM
SQL command, and its accompanying command-line script,
vacuumdb. They each perform the same two general functions:
Remove any leftover data from rollbacks and other processes that can leave temporary data
Analyze activity in the database to assist PostgreSQL in designing efficient queries
It is good practice to perform a VACUUM
nightly on a production
database. While it can be run at the same time data is accessed, doing so will decrease the
response time of the server. As such, it is generally preferable to schedule it at a time when
you do not expect a great deal of database activity.
Any time an exceptionally large number of records are added or deleted, it is prudent to
perform a VACUUM
to analyze the database, which automatically updates the
PostgreSQL query optimizer of major changes to the tables. By doing this you allow PostgreSQL
to have a more up-to-date profile of the data within the database, providing a better set of
information with which to plan the most efficient queries. All of these actions should result
in a faster, more efficient response from the database.
The VACUUM
command locks tables in access exclusive
mode. This means that any query involving a table being vacuumed will pause and
wait until the vacuum of the affected table is complete before continuing.
The syntax for the VACUUM
SQL command is as follows:
VACUUM [ VERBOSE ] [ ANALYZE ] [ table ] VACUUM [ VERBOSE ] ANALYZE [ table [ ( column [, ...] ) ] ]
Used without any of the optional keywords or identifiers, a VACUUM
statement will clean up each table in the presently connected database, one at a time,
deleting temporary data and recovering disk space. This use of VACUUM
is
primarily to maximize free disk space.
An optional table
identifier may be specified if
you want the VACUUM
to clean a single table in the connected database,
rather than all tables. It will also update statistics in the system catalogs pertaining to
the number of records and amount of data stored in each table. Example 9-14 shows the use of a VACUUM
statement in
the booktown
database, on the books
table.
The VACUUM
message returned in Example 9-14 indicates that the process finished successfully. If a specified table cannot be found, you
will instead receive the following notice:
NOTICE: Vacuum: table not found
With the use of the optional ANALYZE
keyword, PostgreSQL examines the
allocation of data in each column for each table (or the specified table, if provided), and
uses the information to prepare the query optimizer for better planning. With the use of the
ANALYZE
keyword, you also have the option to analyze only specified
columns. Example 9-15 shows the use of the VACUUM ANALYZE
command on the entire booktown
database.
Finally, the optional VERBOSE
keyword may be applied if you are
interested in seeing a detailed internal report of the findings of the VACUUM
statement. This is most likely not of interest to anyone not actively
developing the PostgreSQL engine, or related software.
As with many of the database management SQL commands, the VACUUM
command has a command-line executable wrapper called vacuumdb. The
vacuumdb script provides one significant added function to the normal
use of the VACUUM
SQL statement, in that you can instruct it to perform a
VACUUM
on each PostgreSQL database on your system.
Additionally, since it accepts connection parameters on how to connect to PostgreSQL, you may use vacuumdb remotely (i.e., without having to first connect to the machine via a terminal client, and then executing vacuumdb or psql from the remote machine). This is provided that your authentication scheme in PostgreSQL’s pg_hba.conf file is configured for outside access (see Chapter 8 for more information on this).
Here is the syntax for vacuumdb:
vacuumdb [ options ] [ dbname ]
Like the createdb and dropdb scripts,
vacuumdb accepts both single-dashed and GNU-style double-dashed
arguments from the command line. The only required option is the dbname
(unless you specify - -all), which describes the database to be cleaned
and analyzed. The options parameters describe which mode the VACUUM
command should be invoked in. The following are the available options for
the vacuumdb script:
HOSTNAME
, - -host=HOSTNAME
Specifies that you are connected to HOSTNAME
,
rather than the localhost. Use this option when vacuuming a remote database.
PORT
, - -port=PORT
Specifies that the database connection is made on port PORT
, rather than the default port (usually 5432, though it may
have been configured differently when PostgreSQL was compiled, by the -
-with-pgport flag).
USERNAME
, - -username=USERNAME
Specifies that the username USERNAME
is the
user who connects to PostgreSQL (rather than the name of the system user executing
vacuumdb).
Accepts no parameters, and causes a password prompt, which occurs automatically if the pg_hba.conf file on the target server is configured not to trust the requesting host.
DBNAME
, - -dbname=DBNAME
Explicitly specifies the name of the database to perform the VACUUM
statement on. This option is mutually exclusive to the - -all
option.
Applies the VACUUM
command, with specified options, to all
databases in the system catalog.
Equivalent to the ANALYZE
keyword for the VACUUM
SQL command. Updates stored statistics about the data allocation between columns, which are
used by the query optimizer to help guide internal query planning.
TABLE
[ ( column
[, ...] ) ]’, -
-table=’TABLE
[ ( column
[, ...] )
]’Targets a specific table TABLE
(or specific
columns
within that table) to be affected. The
- -analyze option is required to target specific columns.
Equivalent
to the VERBOSE
keyword for the VACUUM
SQL command.
Causes a detailed internal report of the processing performed to be displayed.
Accepts no parameters, and causes the query sent to PostgreSQL to be displayed to the screen as it is executed by vacuumdb.
Accepts no parameters, and causes no output to be sent to stdout (though any errors will still be sent to stderr).
Example 9-16 demonstrates the use of the
vacuumdb script from the Book Town server. The -U
flag specifies that the connection should use the manager
user to connect,
while the - -all flag causes all databases in the system catalog to be
affected in sequence.
Example 9-16. Using vacuumdb on all databases
[jworsley@booktown ~]$ vacuumdb -U manager --all
Vacuuming postgres
VACUUM
Vacuuming booktown
VACUUM
Vacuuming template1
VACUUM
As mentioned, because of the connectivity options available to the
vacuumdb script, it can be easily executed from a remote server. Example 9-17 shows a similar process to the command used
in Example 9-16, but with the addition of the
-h flag to specify a remote server named
booktown.commandprompt.com. Example 9-17 also targets the booktown
database specifically, rather than all databases.
PostgreSQL offers a non-standard SQL command called COMMENT
, which
allows for documentation of any database object. By using COMMENT
on a
table, function, operator, or other database object, you can provide a description that is
stored in the pg_description
system table. Descriptions can be easily
retrieved through a set of extended psql slash commands.
Many standard objects in the database have default descriptions, which can be perused
(along with user-added descriptions) with the psql
\dd
slash command.
Here is the syntax for COMMENT
:
COMMENT ON [ [ DATABASE | INDEX | RULE | SEQUENCE | TABLE | TYPE | VIEW ] { object_name | COLUMN table_name.column_name | AGGREGATE aggregate_name aggregate_type | FUNCTION function_name ( argument_type [, ...] ) | OPERATOR operator_name ( leftoperand_type , rightoperand_type ) | TRIGGER trigger_name ON table_name } ] IS 'description'
In this syntax, object_name
is the name of the
database object that you wish to add a comment or description to. The
keywords for the major database objects are optional, but if you intend to place a comment on
a column, function, aggregate function, operator, or trigger, you must specify the preceding
keyword so that PostgreSQL knows what kind of syntax to expect, and where to look for the
object name specified.
Note that any comment added to a database is tied to both the database in which it is added, and the user who added it. You can only see those comments that you create.
The description
string constant following the
IS
keyword is the literal comment to be placed in the database. Example 9-18 demonstrates placing a simple description on the
id
column of the booktown
database.
Example 9-18. Commenting the books table
booktown=# COMMENT ON COLUMN books.id booktown-# IS 'An Internal Book Town Identifier'; COMMENT
The COMMENT
server message returned in Example 9-18 indicates that the comment was successfully placed on
the column.
You may retrieve comments from the database easily by using the psql slash-plus commands. These are as follows:
\d+
Displays the same information as the standard \d
command
(displaying all tables, views, sequences, and indices in the currently connected
database), but adds a column for the comments as well.
\l+
Displays comments on all databases.
\df+ [
pattern
]
Displays descriptions for each function in the currently connected database (as well
as the language and source of the function). You may wish to view this slash command in
expanded mode for readability by first initiating the \x
slash command
(see Chapter 6 for more about this). You can optionally supply
a regular expression pattern to compare against existing function
names, allowing you to limit the number of functions displayed.
\dt+
Displays comments on all tables in the currently connected database.
\di+
Displays comments on all indices in the currently connected database.
\ds+
Displays comments on all sequences in the currently connected database.
\dv+
Displays comments on all views in the currently connected database.
\dS+
Displays comments on system tables. Note that comments placed on system tables are
still tied to a particular database, even though they are accessible from all databases,
and will therefore not be visible when performing a \dS+
slash command
from another database connection.
\dd
Displays all descriptions for all database objects.
Example 9-19 demonstrates the retrieval of the comment that
was placed on the books
table in Example 9-18 with the use of the psql
\d+
command.
Example 9-19. Retrieving a comment
booktown=# \d+ books
Table "books"
Attribute | Type | Modifier | Description
------------+---------+----------+----------------------------------
id | integer | not null | An Internal Book Town Identifier
title | text | not null |
author_id | integer | |
subject_id | integer | |
Index: books_id_pkey
The COMMENT
SQL command provides a very simple way of internally
documenting your objects, from tables to functions. This can be of great help when working
with large or complicated database schema. Even the best of naming conventions do not always
result in database objects whose applications are self-evident; this is especially the case when working with
multiple developers.