Histograms in the margins of line and scatter plots

In this recipe, we will learn how to draw histograms in the top and right margins of a bivariate scatter plot.

We will use the airpollution.csv example dataset for this recipe. So, let's make sure it is loaded:
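The loading step itself is not shown here. A minimal sketch, assuming airpollution.csv sits in the current working directory and that we store it in a data frame called air (a name chosen here for illustration), might look like this:

```r
# Assumption: the file is in the working directory; adjust the path otherwise.
csv_path <- "airpollution.csv"

if (file.exists(csv_path)) {
  air <- read.csv(csv_path)
  str(air)  # check the column names before plotting
} else {
  message("Could not find ", csv_path, "; check getwd() or use the full path.")
}
```

Checking the output of str() first is worthwhile because the column names are needed in the plotting calls.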

Let's make a scatter plot showing the relationship between concentrations of respirable particles and nitrogen oxides with histograms of both the variables in the margins:
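The code for this figure is not reproduced here, so the following is only a sketch of one way to build it. It uses simulated stand-in data so that it runs on its own; the column names nox and particles, the axis limits, the bin widths, and the margin and layout proportions are all assumptions for illustration, not values taken from the dataset:

```r
# Stand-in data so the sketch runs on its own. With the real file, use
# air <- read.csv("airpollution.csv") and substitute its column names.
set.seed(42)
air <- data.frame(nox = runif(200, 0, 600))            # hypothetical column
air$particles <- pmin(80, pmax(0, 10 + 0.08 * air$nox + rnorm(200, sd = 10)))

xlim <- c(0, 600)   # assumed x axis limits
ylim <- c(0, 80)    # assumed y axis limits

# 2 x 2 grid: scatter plot bottom left (1), x histogram above it (2),
# y histogram to its right (3); 0 leaves the top-right cell empty.
layout(matrix(c(2, 0, 1, 3), 2, 2, byrow = TRUE),
       widths = c(3, 1), heights = c(1, 3), respect = TRUE)

# 1. Scatter plot with explicit axis limits
op <- par(mar = c(5.1, 4.1, 0.1, 0))   # op keeps the previous margins
plot(particles ~ nox, data = air, pch = 19, xlim = xlim, ylim = ylim,
     xlab = "Nitrogen oxides concentration",
     ylab = "Respirable particle concentration")

# 2. Histogram of the x variable in the top margin
par(mar = c(0, 4.1, 3, 0))
hist(air$nox, breaks = seq(xlim[1], xlim[2], by = 100),
     ann = FALSE, axes = FALSE, col = "black", border = "white")

# 3. Histogram of the y variable, drawn horizontally with barplot()
yhist <- hist(air$particles, breaks = seq(ylim[1], ylim[2], by = 10),
              plot = FALSE)
par(mar = c(5.1, 0, 0.1, 1))
barplot(yhist$density, horiz = TRUE, space = 0, axes = FALSE,
        col = "black", border = "white")

# Restore a single-figure layout and the previous margins
layout(1)
par(op)
```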

How it works...

This example is a bit more complex than the recipes we have seen so far. However, if we go through the code one line at a time, it is quite easy to understand.

First, we used the layout() function to divide the graph into separate regions for the scatter plot and the two histograms. We could also use the par() function with the mfrow argument instead, but layout() gives us finer control over the height and width of each cell of the graph. When we use par() with mfrow or mfcol to create a matrix layout, all cells have the same height and width.

The first argument to the layout() function is a matrix that specifies the number of rows and columns the graphics device should be divided into and the location of each figure. Run just the matrix command from the code at the R prompt to see the resultant matrix:
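The matrix is not reproduced here. A matrix consistent with the description of the figure positions, with its values inferred from that description rather than copied from the original code, can be built and printed like this:

```r
# Inferred from the description: figure numbers by cell, 0 = empty cell.
layout_matrix <- matrix(c(2, 0, 1, 3), nrow = 2, ncol = 2, byrow = TRUE)
print(layout_matrix)
#      [,1] [,2]
# [1,]    2    0
# [2,]    1    3
```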

The matrix values shown here mean that the first figure should be drawn in the second row and first column (scatter plot), the second figure in the first row and first column (a histogram of the X variable), and the third figure in the second row and second column (a histogram of the Y variable). The remaining cell, in the first row and second column, is set to 0, which leaves it empty.

The other arguments to layout() are widths and heights, which specify the widths of the columns and the heights of the rows, respectively, as vectors. The last argument, respect, is set to TRUE so that a unit column width is the same physical measurement on the device as a unit row height.
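To check a layout before drawing anything in it, the layout.show() function outlines and numbers each figure region. The sketch below assumes relative widths of 3 and 1 and relative heights of 1 and 3; these proportions are illustrative, not values confirmed by the text:

```r
# Assumed proportions: the scatter plot cell is three times as wide and
# three times as tall as the histogram cells.
n_figures <- layout(matrix(c(2, 0, 1, 3), 2, 2, byrow = TRUE),
                    widths = c(3, 1), heights = c(1, 3), respect = TRUE)

layout.show(n_figures)  # outline and number each figure region
layout(1)               # return to a single-figure layout
```

layout() returns the number of figures in the layout, which is why its result can be passed straight to layout.show().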

We have chosen this particular layout so that the scatter plot occupies most of the area of the graph and the histograms are plotted in a smaller area as they are only giving supplementary information.

Once the layout is created, we draw the plots one by one in the order given by the figure numbers in the layout matrix. So, first we made the scatter plot with explicit x and y axis limits so that we can reuse the same limits to set the correct breaks for the histograms.

Then, we made the histogram of nitrogen oxides in the top margin, just above the scatter plot. We first used the par() function with the mar argument to remove the bottom margin and to match the left and right margins to those of the scatter plot. We specified the breaks exactly, as a vector of values running from the lower to the upper x axis limit of the scatter plot, by using the seq() function. The axes and annotations are suppressed by setting the axes and ann arguments to FALSE, which gives the histogram a clean, minimal look.
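One practical reason to fix the axis limits first is that the breaks passed to hist() must cover every data value. The following sketch uses made-up values and assumed limits of 0 and 600 to show both the normal case and what happens when a value falls outside the breaks:

```r
x <- c(12, 140, 305, 590)         # made-up values for illustration
breaks <- seq(0, 600, by = 100)   # assumed x axis limits of 0 and 600

h <- hist(x, breaks = breaks, plot = FALSE)
h$counts                          # one count per 100-unit bin

# A value outside the breaks makes hist() stop with an error,
# which we capture here as a message string.
result <- tryCatch(hist(c(x, 650), breaks = breaks, plot = FALSE),
                   error = function(e) conditionMessage(e))
result
```

If the data can exceed the chosen limits, both the limits and the breaks need to be widened together so that the histogram stays aligned with the scatter plot.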

Next, we added the rotated histogram of respirable particle concentrations to the right of the scatter plot. We had to do this differently from the first histogram because the hist() function has no built-in way to draw the bars horizontally. As we have seen in earlier chapters, the barplot() function does have this capability. So, we first created a histogram object but suppressed its plotting by setting plot to FALSE. Just as with the x axis histogram, we set the breaks of the y histogram to a sequence matching the y limits of the scatter plot. Then, we set the margins so that the bottom and top margins match those of the scatter plot and the left margin is 0. Finally, we passed the density values from the histogram object to the barplot() function and set the horiz argument to TRUE to draw the bars horizontally. Note that we set the space argument to 0; otherwise, the bars are drawn with gaps between them by default.
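The hist() and barplot() combination can be tried on its own. The sketch below uses made-up values and assumed breaks from 0 to 80 in steps of 10; it inspects the histogram object before handing its densities to barplot():

```r
y <- c(5, 12, 18, 25, 33, 33, 47, 61)   # made-up values for illustration
yhist <- hist(y, breaks = seq(0, 80, by = 10), plot = FALSE)

yhist$density                            # one bar length per bin
sum(yhist$density * diff(yhist$breaks))  # bar areas add up to 1

# barplot() returns the bar midpoints. With space = 0 and the default
# width of 1 they fall at 0.5, 1.5, ..., so the bars touch each other.
mids <- barplot(yhist$density, horiz = TRUE, space = 0, axes = FALSE,
                col = "black", border = "white")
mids
```

Passing yhist$counts instead of yhist$density would give bars of the same shape here, because all bins have equal width; only the scale of the hidden axis would differ.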