If you post a question to the pgsql-performance mailing list that suggests your system might be overloaded, the first thing you'll be asked for is a snapshot of vmstat data. It's the most valuable quick summary of what your system is doing. Because it displays a full system snapshot per line, it's even possible to extract short-term trends from staring at a screen full of data.
Since the output from vmstat is a bit too wide to fit on the page at once, we've broken it up into a left and right side for now; later examples will include just the interesting columns. Here's the left side showing a few seconds of heavy memory-limited pgbench work:
$ vmstat 1
procs -----------memory------------- ---swap--
 r  b   swpd    free   buff   cache   si   so
 8  0      0 2542248 386604 3999148    0    0
 3  0      0 2517448 386668 4023252    0    0
 1  0      0 2494880 386732 4043064    0    0
 7  1      0 2476404 386792 4060776    0    0
The explanations for these columns in the vmstat manual are:
r: The number of processes waiting for run time.
b: The number of processes in uninterruptible sleep.
swpd: The amount of virtual memory used.
free: The amount of idle memory.
buff: The amount of memory used as buffers.
cache: The amount of memory used as cache.
si: Amount of memory swapped in from disk (/s).
so: Amount of memory swapped to disk (/s).
Next, you'll see some examples of how to interpret the procs data, and what it looks like when the server runs low on RAM. One thing not shown here is what happens when the server starts using swap. On a database server, if you're using swap at all, you've probably made a configuration error and should reduce memory usage. Therefore, the main thing to watch in the swap figures is simple: any value other than zero for si or so is a likely problem. On Linux, the swappiness setting (covered in Chapter 4, Disk Setup) can have a major impact on how this works.
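If you want to catch that the moment it happens, a simple filter over the vmstat output is enough. Here's a minimal sketch, assuming the Linux column layout shown above, where si and so are the seventh and eighth fields:
$ vmstat -n 1 | awk 'NR > 3 && ($7 + $8) > 0 {print "swapping: si=" $7, "so=" $8}'
The -n switch makes vmstat print its header only once, and NR > 3 skips the two header lines plus the since-boot summary line discussed a little later, so only the per-interval samples are tested.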
The part of the vmstat data that's much more interesting for database performance is on the right side; here's the other half of the preceding four lines:
$ vmstat 1
-----io---- --system--- -----cpu------
   bi    bo    in    cs us sy id wa st
   24 38024  7975 73394 40 18 34  7  0
   48 57652 11701 93110 43 16 34  6  0
   36 75932 11936 86932 44 15 34  7  0
    4 96628 12423 77317 39 17 37  6  0
Here's what the manual has to say about these:
bi: Blocks received from a block device (blocks/s).
bo: Blocks sent to a block device (blocks/s).
in: The number of interrupts per second, including the clock.
cs: The number of context switches per second.
us: CPU Time spent running non-kernel code. (user time, including nice time)
sy: CPU Time spent running kernel code. (system time)
id: CPU Time spent idle.
wa: CPU Time spent waiting for IO.
st: CPU Time stolen from a virtual machine.
The various "CPU Time" figures are all given in percentages. By default, the Linux vmstat being used here counts blocks in units of 1,024 bytes, which means that the numbers given are in KB/s. Therefore, the first bo figure, 38,024, means approximately 38 MB/s of disk writes happened during that time. This may not be true on non-Linux systems; see the following iostat section for more background about block sizes.
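If you'd rather not do that conversion in your head while watching the data scroll by, it's easy to script. Here's a sketch that again assumes the Linux layout shown above, with bo as the tenth field and the 1,024-byte blocks just described:
$ vmstat -n 1 | awk 'NR > 3 {printf "%d blocks/s out = %.1f MB/s of writes\n", $10, $10 / 1024}'
For the first sample above this reports roughly 37 MB/s, the same figure just estimated by hand; the small difference is only 1,000 versus 1,024 rounding. Adjust the divisor if your platform reports a different block size.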
All of the vmstat examples here are produced using a one-second time interval, the parameter passed on the command line in the preceding examples. All of the counts in its data (as opposed to the percentages) are averages per second over the given time period, so the interpretation isn't impacted by the collection period; it just changes the resolution of the data you see.
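vmstat also accepts an optional count after the interval, which is handy when you want a fixed-length sample at a coarser resolution; for example:
$ vmstat 5 12    # twelve reports, five seconds apart: about a minute of data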
The other thing to note about vmstat and iostat is that the first line they output is a long-term one summarizing all activity since the server was started; the per-interval snapshots only start with the second line printed. If you're writing scripts to collect this data and process it, you'll typically need to throw away that first line.
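A minimal collection script along those lines might look like the following. It's only a sketch: it assumes GNU awk (for strftime and fflush) and a hypothetical vmstat.log output file:
#!/bin/sh
# Log timestamped per-interval vmstat samples, skipping the two header
# lines and the since-boot summary line that vmstat prints first.
vmstat -n 1 | awk 'NR > 3 { print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush() }' >> vmstat.log
The fflush() call pushes each sample into the log as soon as it's collected rather than letting it sit in a buffer, which matters if you're watching the file while a problem is in progress.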
As a first example of what bad data looks like, here's a snapshot from the preceding pgbench run showing a period where the system became less responsive for about two seconds:
procs ----io---- --system--- -----cpu------
 r  b  bi    bo    in    cs us sy id wa st
 2  2   4 93448 11747 84051 44 19 32  5  0
 0  3   0 54156  8888 47518 23 10 53 14  0
 0  2   0  6944  1259  1322  1  0 72 27  0
 0  2   0 12168  2025  2422  0  0 65 35  0
 8  0   0 26916  5090 41152 23  9 47 21  0
 2  0   4 57960  9802 54723 31 12 46 11  0
Note the dramatic drop in context switches (cs) for the middle two entries there. Since most work completed by the server and the pgbench client involves a context switch, those low entries represent a period where almost nothing happened: instead of tens of thousands of things getting done during each of those seconds, only a few thousand did. Also note how that corresponds with a jump in the waiting for I/O (wa) category and with the CPUs becoming less active. All of these are characteristics of what a badly performing stretch of time looks like, when the system is bottlenecked waiting for the disk drive(s).
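You can use those same characteristics to flag suspect seconds automatically. Here's a sketch that simply prints any sample where wa climbs above an arbitrary 20 percent threshold, assuming the full Linux vmstat layout where cs is the twelfth field and wa the sixteenth:
$ vmstat -n 1 | awk 'NR > 3 && $16 > 20 {print "likely disk bottleneck: wa=" $16 "% cs=" $12}'
The 20 percent figure is only a placeholder; substitute whatever value separates normal from troubled behavior on your hardware.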