It is often useful to know the number of observations for each variable or group when comparing them on a box plot. We did this earlier with the varwidth argument, which makes the width of each box proportional to the square root of the number of observations. In this recipe, we will learn how to display the number of observations on a box plot.
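As a reminder of how varwidth encodes group size, here is a small sketch on two made-up groups of 100 and 25 observations. The group names and random values are hypothetical, and pdf(NULL) only sends the drawing to a null device so the sketch also runs non-interactively (drop the pdf(NULL) and dev.off() lines at the R prompt to see the plot). Because widths follow the square root of n, a group with four times as many observations gets a box only twice as wide:

```r
# Two hypothetical groups with 100 and 25 observations (illustration only)
set.seed(1)
groups <- list(large = rnorm(100), small = rnorm(25))

pdf(NULL)                      # null device: nothing is written to disk
b <- boxplot(groups, varwidth = TRUE)
invisible(dev.off())

# varwidth makes box widths proportional to sqrt(n);
# express them relative to the widest box
rel_width <- sqrt(b$n) / max(sqrt(b$n))
print(b$n)
print(rel_width)
```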
We will continue using the base graphics functions, so we don't need to load any additional package. We just need to run the recipe code at the R prompt; we can also save the code as a script to use later. Here, we will use the metals.csv example dataset again:
metals<-read.csv("metals.csv")
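If metals.csv is not at hand, a stand-in with the same layout (a site column first, then one numeric column per metal, which is why the recipe uses metals[,-1]) lets you run the rest of the recipe. This is only a sketch under assumptions: the column names Site, Cu, Zn and Pb and all values are hypothetical, not the contents of the real file. Two Pb values are set to NA so that the per-metal counts differ:

```r
# Hypothetical stand-in for metals.csv: Site column, then one column per metal
set.seed(42)
n_sites <- 20
fake <- data.frame(
  Site = paste0("S", seq_len(n_sites)),
  Cu   = round(rlnorm(n_sites, meanlog = 1), 2),
  Zn   = round(rlnorm(n_sites, meanlog = 2), 2),
  Pb   = round(rlnorm(n_sites, meanlog = 0.5), 2)
)
fake$Pb[c(3, 7)] <- NA  # missing values, so Pb will have fewer observations

# Write it to a temporary CSV and read it back just like the recipe does
path <- tempfile(fileext = ".csv")
write.csv(fake, path, row.names = FALSE)
metals <- read.csv(path)
str(metals)
```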
Once again, let's use the metal concentrations box plot and display the number of observations for each metal below its label on the x axis:
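# Draw the styled box plot without the default x axis (xaxt = "n")
# and keep the returned summary list in b
b <- boxplot(metals[, -1], xaxt = "n", border = "white", col = "black",
             boxwex = 0.3, medlwd = 1, whiskcol = "black", staplecol = "black",
             outcol = "red", cex = 0.3, outpch = 19,
             main = "Summary of metal concentrations by Site")
# Custom x axis: metal name with "(n=...)" on a second line;
# mgp = c(3, 2, 0) moves the labels down to margin line 2
axis(side = 1, at = 1:length(b$names),
     labels = paste(b$names, "\n(n=", b$n, ")", sep = ""),
     mgp = c(3, 2, 0))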
In the example, we first made the same stylized box plot as we did two recipes ago, but we suppressed the default x axis by setting xaxt to "n". We then used the axis() command to create a custom axis with the metal names and number of observations as labels.
We set side to 1 to denote the x axis. Note that we saved the object returned by the boxplot() function as b, which is a list containing useful information about the box plot. You can inspect it by typing b at the R prompt and pressing Enter (after you've run the boxplot command). We combined the names and n (number of non-missing observations in each group) components of b using paste() to construct the labels argument. The at argument was set to the integers from 1 to the number of metals, which are the default x positions of the boxes. Finally, we used the mgp argument to place the axis labels on margin line 2 instead of the default line 1, so that the extra line with the number of observations doesn't make the labels overlap with the tick marks (you can see this if you omit mgp).
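The steps above can be wrapped into a reusable function. The sketch below is one possible way to do it; the function name boxplot_with_n() and the toy data frame are hypothetical, and like the recipe it assumes the first column holds labels rather than values. Extra arguments such as the styling options are passed on to boxplot(). The toy data also show that n counts only non-missing values:

```r
# Hypothetical helper: box plot of every column except the first,
# with "(n=...)" under each axis label
boxplot_with_n <- function(df, ...) {
  b <- boxplot(df[, -1, drop = FALSE], xaxt = "n", ...)
  labs <- paste(b$names, "\n(n=", b$n, ")", sep = "")
  # Boxes sit at x = 1, 2, ...; mgp[2] = 2 leaves room for two-line labels
  axis(side = 1, at = seq_along(b$names), labels = labs, mgp = c(3, 2, 0))
  invisible(list(stats = b, labels = labs))
}

# Made-up data: column B has one missing value
toy <- data.frame(Site = c("s1", "s2", "s3", "s4"),
                  A = c(1, 2, 3, 4),
                  B = c(2, NA, 5, 6))

pdf(NULL)                      # null device so the sketch runs anywhere
res <- boxplot_with_n(toy, main = "Toy example")
invisible(dev.off())
print(res$stats$n)             # expected: 4 3 (the NA in B is not counted)
```

With the recipe's data, boxplot_with_n(metals, border = "white", col = "black") would reproduce the first part of the styled plot.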
Another way of displaying the number of observations on a box plot is to use the boxplot.n() function from the gplots package. First, let's make sure that the gplots package is installed and loaded:
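# Install gplots (only needed once) and load it
install.packages("gplots")
library(gplots)
# Same styling as before; boxplot.n() adds the number of observations
boxplot.n(metals[, -1], border = "white", col = "black", boxwex = 0.3,
          medlwd = 1, whiskcol = "black", staplecol = "black",
          outcol = "red", cex = 0.3, outpch = 19,
          main = "Summary of metal concentrations by Site")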
The problem with this function is that the number labels are cut off by the axis. One way around this is to place the labels at the top of the plot region by setting the top argument to TRUE in the boxplot.n() call.
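With gplots, that is boxplot.n(metals[, -1], top = TRUE, ...) with the same styling arguments as before. If you would rather avoid the extra package, a similar effect can be sketched in base graphics by writing the counts into the top margin with mtext(). This swaps in a different technique from boxplot.n(); the helper name add_n_top(), its line and cex defaults, and the toy data are all hypothetical:

```r
# Hypothetical helper: write "n=..." above each box, in the top margin
add_n_top <- function(b, line = 0.2, cex = 0.8) {
  labs <- paste0("n=", b$n)
  # Boxes are at x = 1, 2, ...; side = 3 is the top margin
  mtext(labs, side = 3, at = seq_along(b$n), line = line, cex = cex)
  invisible(labs)
}

# Made-up groups; C contains one NA, which is not counted
set.seed(7)
toy <- list(A = rnorm(30), B = rnorm(12), C = c(rnorm(8), NA))

pdf(NULL)                      # null device so the sketch runs anywhere
b <- boxplot(toy, main = "Counts shown above each box")
labs <- add_n_top(b)
invisible(dev.off())
print(labs)
```

At the R prompt you would call add_n_top(b) right after the styled boxplot() call from the first example.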