Backup and restore performance can be improved in different ways, depending on the backup type:
- Physical backup: Improving the performance of a physical backup is mostly a matter of performing it in parallel, copying the files with more than one task; a sketch of a parallel copy follows this list. Bear in mind that the more tasks you use, the greater the impact on the running system. When backing up, you can also skip certain files. You won't need the following:
- Any files placed in the data directory by the DBA that shouldn't actually be there
- Any files in pg_wal
- Any old server log files in pg_log (even the current one)
Remember, it's safer not to try to exclude files at all because, if you miss something critical, you may end up with data loss. Also remember that your backup speed may be bottlenecked by your disks or your network. Some larger systems have dedicated networks in place, solely for backups.
- Logical backup: As explained in a previous recipe, if you want to back up all of the databases in a database server, then you should use multiple pg_dump tasks running in parallel; a sketch of this also follows this list. You may want to increase the dump speed of an individual pg_dump task, but there really isn't an easy way of doing that right now. If you're using compression, look at the There's more... section at the end of this recipe.
- Physical restore: Just as with physical backup, we can put everything back more quickly by copying the files in parallel. A copy tool that detects and reuses existing, unchanged files at the destination (such as rsync) can speed things up even further.
- Logical restore: Whether you use psql or pg_restore, you can speed up the restore by setting maintenance_work_mem to 128MB or more, either in postgresql.conf or on the user that will run the restore. If neither of those ways is easily possible, you can specify the option using the PGOPTIONS environment variable, as follows:
export PGOPTIONS="-c maintenance_work_mem=128MB"
This will then be used to set that option value for subsequent connections.
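If you would rather attach the setting to the database user that runs the restore, a minimal sketch looks like the following (the role name restore_user is an assumption):
psql -c "ALTER USER restore_user SET maintenance_work_mem = '128MB'"
The setting then applies to every new session opened by that role, including the one that performs the restore.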
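Going back to the physical backup item above, here is a minimal sketch of copying the data directory with several tasks while skipping pg_wal and old server logs. The paths, the four-way parallelism, and the pg_log exclude are assumptions to adjust for your own system; run the copy either against a cleanly stopped server or between the usual low-level backup start/stop calls:
SRC=/var/lib/postgresql/data     # source data directory (assumption)
DST=/backup/data                 # backup target (assumption)
mkdir -p "$DST/base"
# copy each database subdirectory under base/ with up to four rsync tasks at once
ls "$SRC/base" | xargs -P 4 -I{} rsync -a "$SRC/base/{}/" "$DST/base/{}/"
# copy the rest of the data directory, skipping WAL and server log files
# (newer releases log to a directory called log; check your log_directory setting)
rsync -a --exclude=/base --exclude='pg_wal/*' --exclude='pg_log/*' "$SRC/" "$DST/"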
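Similarly, for the logical backup item, here is a minimal sketch of dumping every database with several pg_dump tasks at once; the /backup path and the four-way parallelism are assumptions:
# list non-template databases and dump each one in custom format, four dumps at a time
psql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate" |
  xargs -P 4 -I{} pg_dump -Fc -f /backup/{}.dump {}
# global objects (roles, tablespaces) are not covered by pg_dump, so add them separately
pg_dumpall --globals-only > /backup/globals.sql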
If you are running archiving or streaming replication, then transaction log writes may become a bottleneck. This can be mitigated by increasing the size of the WAL buffers and making checkpoints less frequent for the duration of the restore. Set wal_buffers between 16MB and 64MB, and set max_wal_size to a large value, such as 20GB, so that the restore has room to breathe.
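As a minimal sketch, those two changes can be made with ALTER SYSTEM, if your version supports it (otherwise edit postgresql.conf directly). Note that wal_buffers only takes effect after a full server restart, whereas max_wal_size is picked up on reload; the data directory path is an assumption:
psql -c "ALTER SYSTEM SET wal_buffers = '64MB'"
psql -c "ALTER SYSTEM SET max_wal_size = '20GB'"
pg_ctl restart -D /var/lib/postgresql/data
Remember to revert both settings (ALTER SYSTEM RESET ...) once the restore has finished.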
If you aren't running archiving or streaming replication, or you've turned it off during the restore, then the server can minimize the amount of transaction log it writes; with wal_level = minimal, data loaded into a table created in the same transaction doesn't need to be written to WAL at all. In that case, you may wish to use the single-transaction option, as that will also help improve performance.
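As a minimal sketch, the option is spelled the same way for both restore tools; the targetdb name and the dump file names are assumptions, and note that pg_restore cannot combine --single-transaction with parallel jobs (-j):
psql --single-transaction -d targetdb -f dump.sql
pg_restore --single-transaction -d targetdb dump.custom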
If a pg_dump was made using -F c (custom format), then we can restore in parallel, as follows:
pg_restore -j NumJobs -d DatabaseName DumpFileName
You'll have to be careful about how you select the degree of parallelism to use. A good starting point is the number of CPU cores on the server. Be very careful that you don't exhaust the available memory when using parallel restore: each job can use memory up to the value of maintenance_work_mem, so the whole restore could begin swapping when it hits the larger indexes later on. Plan the size of shared_buffers and maintenance_work_mem according to the number of jobs specified.
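A minimal sketch of that starting point follows; the database and dump file names are assumptions, and nproc is the Linux way of counting CPU cores:
JOBS=$(nproc)          # start with one job per CPU core
# check that JOBS x maintenance_work_mem, plus shared_buffers, fits comfortably in RAM
pg_restore -j "$JOBS" -d targetdb dump.custom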
Whatever you do, make sure you run ANALYZE afterwards on every object that was created. This will happen automatically if autovacuum is enabled. It often helps to disable autovacuum completely while running a large restore, so double-check that you have it switched on again after the restore. The consequence of skipping this step will be extremely poor performance when you start your application again, which can easily make everybody panic.
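A minimal sketch of that last step, assuming you disabled autovacuum with ALTER SYSTEM before the restore:
# switch autovacuum back on and reload the configuration (a reload is enough for this setting)
psql -c "ALTER SYSTEM SET autovacuum = on"
psql -c "SELECT pg_reload_conf()"
# gather fresh statistics for every database, several tables at a time
vacuumdb --all --analyze-only --jobs=4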