How to do it…

On the standby node, changes from the master are read from the transaction log and applied to the standby database. Hot Standby works by emulating running transactions from the master so that queries on the standby have the visibility information they need to fully respect MVCC. This makes the Hot Standby mode particularly suitable for serving a large workload of short or fast SELECT queries. If the workload is consistently short, then few conflicts will delay the standby, and the server will run smoothly.

Queries that run on the standby node see a version of the database that is slightly behind the primary node. We describe this as eventually consistent. How long is eventually? That time is exactly the replication delay plus the apply delay, as discussed in the Replication concepts section. You may also request that standby servers delay applying changes. See the Delaying, pausing, and synchronizing replication recipe later on in this section.

Resource contention (CPU, I/O, and so on) may increase apply delay. If the server is busy applying changes from the master, then you will have fewer resources to use for queries. This means that if there are no changes arriving, then you'll get more query throughput. If there are predictable changes in the write workload on the master, then you may need to throttle back your query workload on the standby when they occur.

Replication apply may also generate conflicts with running queries. Conflict may cause the replay to pause, and eventually queries on the standby may be canceled or disconnected. There are three main types of conflicts that can occur between the master and queries on the standby, which are as follows:

Locks such as Access Exclusive locks
Cleanup records
Other special cases

If cancellations do occur, they will throw either error or fatal-level errors. These will be marked with SQLSTATE 40001 SERIALIZATION FAILURE. This could be trapped by an application, and the SQL can be resubmitted.

You can monitor the number of conflicts that occur in two places. The total number of conflicts in each database can be seen using this query:

SELECT datname, conflicts FROM pg_stat_database;

You can drill down further to look at the types of conflict seen using the following query:

    SELECT
  datname
 ,confl_tablespace
 ,confl_lock
 ,confl_snapshot
 ,confl_bufferpin
 ,confl_deadlock
 FROM pg_stat_database_conflicts;

Tablespace conflicts are the easiest to understand. If you try to drop a tablespace that someone is still using, then you're going to get a conflict. Don't do that!

Lock conflicts are also easy to understand. If you wish to run a command on the master, such as ALTER TABLE ... DROP COLUMN, then you must lock the table first to prevent all types of access. The lock request is sent to the standby server as well, which will then cancel standby queries that are currently accessing that table after a configurable delay.

On high-availability systems, making DDL changes to tables that cause long periods of locking on the master can be difficult. You may want the tables on the standby to stay available for reads during the period in which the changes are being made on the master. To do that, temporarily set these parameters on the standby: max_standby_streaming delay = -1 and max_standby_archive_delay = -1. Then, reload the server. As soon as the first lock record is seen on the standby, all further changes will be held. Once the locks on the master are released, you can reset the original parameter values on the standby, which will then allow the changes to be made there.

Setting the max_standby_streaming_delay and max_standby_archive_delay parameters to -1 is very timid and may not be useful for normal running if the standby is intended to provide High Availability. No user query will ever be canceled if it conflicts with applying changes. It will cause the apply process to wait indefinitely. As a result, the apply delay can increase significantly over time, depending on the frequency and duration of queries and the frequency of conflicts. To work out an appropriate setting for these parameters, you need to understand more about the other types of conflict, though there is also a simple way to avoid this problem entirely.

Snapshot conflicts require some understanding of the internal workings of MVCC, which many people find confusing. To avoid snapshot conflicts, you should set hot_standby_feedback = on in the standby's postgresql.conf file.

In some cases, this could cause table bloat on the master, so it is not set by default. If you don't wish to set hot_standby_feedback = on, then you have further options to consider. You can set an upper limit on the acceptable apply delay caused by conflicts by controlling two similar parameters: max_standby_streaming_delay and max_standby_archive_delay. As a last resort, you can also provide some protection against cancelled queries by setting vacuum_defer_cleanup_age to a value higher than 0. This parameter is fairly hard to set accurately, though I would suggest starting with a value of 1000 and then tune upwards. A vague and inaccurate assumption would be to say that each 1000 will be approximately one second of additional delay. This is probably helpful more often than it is wrong. Other conflict types (bufferpin, deadlocks, and so on) are possible, but they are rare.

If you want a completely static standby database with no further changes applied, then you can do this by stopping the server, modifying recovery.conf such that neither restore_command nor primary_conninfo are set but standby_mode is on, and then restarting the server. You can come back out of this mode, but only if the archive contains the required WAL files to catch up. Otherwise, you will need to reconfigure the standby from a base backup again.

If you attempt to run a non-read-only query, then you will receive an error marked with SQLSTATE 25006 READ ONLY TRANSACTION. That could be used to redirect SQL to the master, where it can execute successfully.