Grouping over a variable

In this recipe, we will see how to summarize one variable with respect to another variable in the dataset. We will group over a variable so that a separate box plot is drawn for each group.

We will only use the base graphics functions for this recipe, so just open up the R prompt and type in the code as we go. We will use the metals.csv example dataset, which must be loaded into R first.
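The loading step can be sketched as follows. This is a minimal sketch that assumes metals.csv sits in the current working directory. The fallback data frame is a hypothetical stand-in so the code runs without the file: its column names (Source, Expt, Cu, Mn) come from the recipe text, but the site labels and values are made up.

```r
# Load the example dataset; assumes metals.csv is in the working directory.
if (file.exists("metals.csv")) {
  metals <- read.csv("metals.csv")
} else {
  # Hypothetical stand-in (NOT the real data): random values, invented site labels.
  set.seed(1)
  metals <- data.frame(
    Source = rep(c("SiteA", "SiteB", "SiteC"), each = 20),
    Expt   = rep(1:2, times = 30),
    Cu     = runif(60),
    Mn     = runif(60)
  )
}

# Inspect the column names and types before plotting.
str(metals)
```

Checking the structure with str() is a quick way to confirm that the grouping column was read as expected.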

Let's make a box plot that shows copper (Cu) concentrations grouped by measurement site (the Source variable).
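A grouped box plot of this kind can be sketched with the formula interface of boxplot(). The data frame below is a hypothetical stand-in for metals.csv (random values, invented site labels), and the title and axis labels are illustrative choices, so treat this as a sketch rather than the recipe's exact listing.

```r
# Hypothetical stand-in for metals.csv; replace with read.csv("metals.csv").
set.seed(1)
metals <- data.frame(
  Source = rep(c("SiteA", "SiteB", "SiteC"), each = 20),
  Cu     = runif(60)
)

# Cu ~ Source: draw one box per value of Source, each summarizing
# the Cu values that belong to that group.
bp <- boxplot(Cu ~ Source, data = metals,
              main = "Copper (Cu) concentrations by site",
              xlab = "Source", ylab = "Cu")

# boxplot() invisibly returns the statistics it plotted;
# bp$names holds the group labels in plotting order.
bp$names
```

Keeping the return value is optional, but it makes it easy to check which groups were drawn and how many observations fell into each (bp$n).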

How it works...

The preceding box plot works by using the formula notation, y~group, where y is the variable whose values are depicted as separate box plots, one for each value of group.

Grouping over a variable works well only when the group variable has a limited number of values, for example, when it is a category (a factor, in terms of R data types) such as Source in this example. Grouping over a numerical variable with lots of unique values (say, manganese (Mn) concentrations) would produce a graph with too many box plots, each based on very few observations, and would not tell us much about the data.
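One quick way to judge whether a variable is a sensible grouping variable is to count its distinct values before plotting. The sketch below uses a hypothetical stand-in for metals.csv (random values, invented site labels), so the counts only illustrate the contrast between a categorical and a continuous variable.

```r
# Hypothetical stand-in for metals.csv; replace with read.csv("metals.csv").
set.seed(1)
metals <- data.frame(
  Source = rep(c("SiteA", "SiteB", "SiteC"), each = 20),
  Mn     = runif(60)
)

# A categorical variable has only a handful of distinct values,
# so grouping over it yields a readable number of boxes.
n_source <- length(unique(metals$Source))

# A continuous measurement has close to one distinct value per row,
# so grouping over it would yield close to one box per observation.
n_mn <- length(unique(metals$Mn))

c(Source = n_source, Mn = n_mn)
```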

We can also group over more than one category. To group over Source and another variable, Expt (the experiment number), we combine both on the right-hand side of the formula, for example, Cu~Source*Expt.
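Grouping over two variables can be sketched by putting both on the right-hand side of the formula. As before, the data frame is a hypothetical stand-in for metals.csv (random values, invented site labels and experiment numbers), and the title is an illustrative choice.

```r
# Hypothetical stand-in for metals.csv; replace with read.csv("metals.csv").
set.seed(1)
metals <- data.frame(
  Source = rep(c("SiteA", "SiteB", "SiteC"), each = 20),
  Expt   = rep(1:2, times = 30),
  Cu     = runif(60)
)

# Cu ~ Source * Expt: draw one box for every combination of
# Source and Expt (3 sites x 2 experiments in this stand-in data).
bp <- boxplot(Cu ~ Source * Expt, data = metals,
              main = "Copper (Cu) concentrations by site and experiment")

# Labels combine the two grouping values, joined by "." by default.
bp$names
```

The number of boxes is the product of the number of values in each grouping variable, so this approach stays readable only when both variables have few categories.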

We will use grouped box plots as examples in the next few recipes.