Showing the number of observations

It is often useful to know the number of observations for each variable or group when comparing them on a box plot. Earlier, we conveyed this visually with the varwidth argument, which makes the width of each box proportional to the square root of its number of observations. In this recipe, we will learn how to display the actual number of observations on a box plot.
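For comparison, here is a minimal sketch of the varwidth approach. It assumes R's built-in airquality data as a stand-in dataset, because its Ozone and Solar.R columns contain missing values and so the boxes have different counts. The names dat and rel_width are illustrative, not part of the recipe.

```r
# Stand-in data (assumption): four columns of the built-in airquality data;
# NAs in Ozone and Solar.R give the columns different numbers of observations
dat <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# varwidth = TRUE makes box widths proportional to sqrt(number of observations)
b <- boxplot(dat, varwidth = TRUE, main = "Box widths proportional to sqrt(n)")

# Hypothetical helper: relative widths implied by varwidth (widest box = 1).
# The counts only show up as widths; the numbers themselves are not drawn.
rel_width <- sqrt(b$n) / max(sqrt(b$n))
print(round(rel_width, 3))
```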

We will continue using the base graphics functions, so we do not need to load any additional packages. We just need to run the recipe code at the R prompt, or save it as a script to reuse later. Here, we will use the metals.csv example dataset again.

How to do it...

Once again, let's use the metal concentrations box plot and display the number of observations for each metal below its label on the x axis:
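Because the recipe's own code and the metals.csv data are not reproduced here, the following is a minimal sketch of the same technique. It assumes the built-in airquality data as a stand-in; with the recipe's data you would read metals.csv with read.csv() and pass the metal columns instead. The styling arguments are illustrative choices.

```r
# Stand-in data (assumption): built-in airquality columns instead of metals.csv
dat <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# Draw the box plot without the default x axis (xaxt = "n") and keep the
# returned list, which holds the box statistics, counts (n) and names
b <- boxplot(dat, xaxt = "n", col = "grey80", ylab = "Value",
             main = "Box plot with number of observations")

# Two-line labels: the group name, then "(n=...)" on the line below
labs <- paste(b$names, "\n(n=", b$n, ")", sep = "")

# side = 1 is the bottom (x) axis; boxes sit at positions 1, 2, ...;
# mgp[2] = 2 moves the labels from margin line 1 to line 2 so the
# second label line does not collide with the tick marks
axis(side = 1, at = seq_along(b$names), labels = labs, mgp = c(3, 2, 0))
```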

How it works...

In the example, we first made the same stylized box plot as we did two recipes ago, but suppressed the default x axis by setting xaxt to "n". We then used the axis() function to draw a custom axis labeled with the metal names and their numbers of observations.

We set side to 1 to denote the x axis (the bottom axis). Note that we saved the object returned by boxplot() as b, a list containing useful information about the plot, such as the box statistics (stats), the number of observations in each box (n) and the box labels (names). You can inspect it by typing b at the R prompt and pressing Enter after running the boxplot() command. We combined the names and n components of b with paste() to construct the labels argument, and set the at argument to the integers from 1 to the number of metals, which are the positions at which boxplot() draws the boxes by default.

Finally, we used the mgp argument to place the axis labels on margin line 2 instead of the default line 1, so that the extra line holding the number of observations doesn't make the labels overlap the tick marks (you can see the overlap if you omit mgp).
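To look at the returned list without drawing anything, boxplot() accepts plot = FALSE. This sketch again assumes the built-in airquality data as a stand-in for metals.csv, and also prints the default mgp setting that the recipe overrides.

```r
# Stand-in data (assumption): built-in airquality columns instead of metals.csv
dat <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# plot = FALSE computes the box statistics but does not draw the plot
b <- boxplot(dat, plot = FALSE)

print(names(b))  # components of the returned list
print(b$n)       # non-missing observations per box (NAs are dropped)
print(b$names)   # labels used for the boxes

# Default margin lines for axis title, tick labels and axis line
print(par("mgp"))
```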

There's more...

Another way of displaying the number of observations on a box plot is to use the boxplot.n() function from the gplots package. First, let's make sure that the gplots package is installed and loaded:
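A minimal sketch of this step follows. The usual interactive approach is install.packages("gplots") followed by library(gplots); here the install is only suggested when gplots is missing, so the sketch still runs offline. The airquality data is again an assumed stand-in for metals.csv.

```r
# Check whether gplots is available (installing it needs a network connection)
has_gplots <- requireNamespace("gplots", quietly = TRUE)

if (!has_gplots) {
  message('gplots is not installed; run install.packages("gplots") first')
} else {
  library(gplots)
  # Stand-in data (assumption): built-in airquality columns instead of metals.csv
  dat <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]
  # boxplot.n() draws the box plot and writes "n=..." for each box,
  # by default along the bottom of the plot region
  boxplot.n(dat, main = "Counts added by boxplot.n()")
}
```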

The drawback of this function is that the number labels are cut off by the x axis. One way around this is to place the labels at the top of the plot region by setting the top argument to TRUE in the boxplot.n() call.
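A short sketch of the top = TRUE variant, under the same assumptions as before (airquality as stand-in data; the call is skipped if gplots is not installed):

```r
has_gplots <- requireNamespace("gplots", quietly = TRUE)

if (has_gplots) {
  # Stand-in data (assumption): built-in airquality columns instead of metals.csv
  dat <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]
  # top = TRUE writes the "n=" labels along the top of the plot region,
  # away from the x axis
  gplots::boxplot.n(dat, top = TRUE,
                    main = "Counts at the top of the plot region")
}
```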