Monitoring and Diagnosis

In this chapter, you will find recipes for some common monitoring and diagnosis actions you will want to do inside your database. They are meant to answer specific questions that you often face when using PostgreSQL.

Databases are not isolated entities. They live on computer hardware using CPUs, RAM, and disk subsystems. Users access databases using networks. Depending on the setup, databases themselves may need network resources to function in any of the following ways: performing some authentication checks when users log in, using disks that are mounted over the network (not generally recommended), or making remote function calls to other databases.

This means that monitoring only the database is not enough. As a minimum, one should also monitor everything directly involved in using the database. This means knowing the following:

The two aspects of monitoring are collecting historical data to see how things have evolved, and getting alerts when things go seriously wrong. Tools based on a Round Robin Database Tool (RRDtool) such as Munin, or time series databases such as Graphite’s Carbon, and Prometheus, are quite popular for collecting the historical information on all aspects of the servers and presenting this information in an easy-to-follow graphical form, for which Grafana is a popular tool. Seeing several statistics on the same timescale can really help when trying to figure out why the system is behaving the way it is.

Another aspect of monitoring is getting alerts when something goes really wrong and needs (immediate) attention.

For alerting, one of the most widely used tools is Icinga (a fork of Nagios), an established solution. The aforementioned trending tools can integrate with it. check_postgres is a rather popular Icinga plugin for monitoring many standard aspects of a PostgreSQL database server.

Icinga is a stable and mature solution, based on the long-standing approach where each plugin decides whether a given measurement is cause of alarm, more complex to manage and maintain. A more recent tool is the aforementioned Prometheus, based on a design that separates data collection from the centralized alerting logic.

Should you need a solution for both the alerting and trending aspects of a monitoring tool, you might also want to look into Zabbix.

Furthermore, if you require integration with a system based on Simple Network Management Protocol (SNMP), the pgsnmpd project offers some basic support. However, as it is not actively maintained, we recommend switching to one of the other monitoring approaches if at all possible.

In this chapter, we will cover the following recipes:

  • Providing PostgreSQL information to monitoring tools
  • Real-time viewing using pgAdmin or OmniDB
  • Checking whether a user is connected
  • Checking whether a computer is connected
  • Repeatedly executing a query in psql
  • Checking which queries are running
  • Checking which queries are active or blocked
  • Knowing who is blocking a query
  • Killing a specific session
  • Detecting an in-doubt prepared transaction
  • Knowing whether anybody is using a specific table
  • Knowing when a table was last used
  • Usage of disk space by temporary data
  • Understanding why queries slow down
  • Investigating and reporting a bug
  • Producing a daily summary of log file errors
  • Analyzing the real-time performance of your queries