If PostgreSQL crashes, there will be a message in the server log with the severity level set to PANIC. PostgreSQL will immediately restart and attempt to recover using the transaction log or Write-Ahead Log (WAL).
The WAL consists of a series of files written to the pg_wal subdirectory of the PostgreSQL data directory. Each change made to the database is recorded first in the WAL, hence the name write-ahead log, which is used as a synonym for transaction log. When a transaction commits, the default (and safe) behavior is to force the WAL records to disk. Should PostgreSQL crash, the WAL will be replayed, which returns the database to the state of the last committed transaction, and thus ensures the durability of any database changes.
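As a quick check, the parameters below govern whether and how WAL records are flushed to disk at commit time, and should normally be left at their defaults. This is a minimal sketch, assuming a reasonably recent PostgreSQL release:

SELECT name, setting
FROM pg_settings
WHERE name IN ('fsync', 'synchronous_commit', 'wal_sync_method');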
Crash recovery replays the WAL, but from what point does it start to recover? Recovery starts from points in the WAL known as checkpoints. The duration of a crash recovery depends on the number of changes in the transaction log since the last checkpoint. A checkpoint is a known safe starting point for recovery, since it guarantees that all the previous changes to the database have already been written to disk.
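If you want to see where the most recent checkpoint is, and therefore roughly where crash recovery would start from, the pg_control_checkpoint() function (available from PostgreSQL 9.6 onwards) exposes the relevant fields; a minimal example:

SELECT checkpoint_lsn, redo_lsn, checkpoint_time
FROM pg_control_checkpoint();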
A checkpoint can become a performance bottleneck on busy database servers because of the number of writes required. We will see a number of ways to fine-tune checkpoints, but you must also understand the effect that those tuning options may have on crash recovery.
A checkpoint can be either immediate or scheduled. Immediate checkpoints are triggered by some action of a superuser, such as the CHECKPOINT command. Scheduled checkpoints are decided automatically by PostgreSQL.
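For example, a superuser can request an immediate checkpoint with a single SQL command:

CHECKPOINT;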
Two parameters control the occurrence of scheduled checkpoints. The first is checkpoint_timeout, which is the number of seconds until the next checkpoint. While this parameter is time-based, the second parameter, max_wal_size, influences the amount of WAL data that will be written before a checkpoint is triggered; the actual limit is computed from that parameter, taking into account the fact that WAL files can only be deleted after two checkpoints. A checkpoint is triggered whenever either of these two limits is reached.
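To confirm what your server is currently using for these two limits, you can query pg_settings; this is a simple check, not a tuning recommendation:

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('checkpoint_timeout', 'max_wal_size');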
It's tempting to postpone checkpoints for as long as possible by setting the following parameters:
max_wal_size = 20GB
checkpoint_timeout = 3600
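If you did want to apply values like these without editing postgresql.conf by hand, both parameters can be changed with ALTER SYSTEM and take effect on a configuration reload; the following is only a sketch of the mechanism, not a recommendation of these values:

ALTER SYSTEM SET max_wal_size = '20GB';
ALTER SYSTEM SET checkpoint_timeout = '1h';
SELECT pg_reload_conf();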
However, if you do this, you should give some thought to how long crash recovery will take and whether you can accept that; you must also consider how many changes will accumulate before the next checkpoint and, more importantly, how much I/O those changes will generate.
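One way to judge how often checkpoints are actually happening, and whether they are scheduled or requested because of WAL volume, is to look at the cumulative statistics; on PostgreSQL versions up to 16 these counters live in pg_stat_bgwriter (in later releases they moved to pg_stat_checkpointer):

SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
FROM pg_stat_bgwriter;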
Also, you should make sure that the pg_wal directory is mounted on disks with enough free space; a query to check its current usage is shown after this list. By default, max_wal_size is set to 1 GB. The amount of disk space required by pg_wal might also be influenced by the following:
- Unexpected spikes in workload
- Failures in continuous archiving (see archive_command in the Hot physical backup and continuous archiving section)
- The wal_keep_segments setting (you will need 16 MB x wal_keep_segments of space)
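To check how much space pg_wal is using right now, the pg_ls_waldir() function (available from PostgreSQL 10 onwards) lists the WAL files with their sizes; for example:

SELECT count(*) AS wal_files,
       pg_size_pretty(sum(size)) AS total_size
FROM pg_ls_waldir();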
In contrast with max_wal_size, the min_wal_size parameter controls the minimum size allotted to WAL storage, allowing PostgreSQL to recycle existing WAL files instead of removing them.
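As with the other parameters, min_wal_size can be set in postgresql.conf; the value below is purely illustrative:

min_wal_size = 2GB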