While legends are the most commonly used method of providing a key to read multiple-variable graphs, they are often not the easiest to read. Labeling lines directly is one way of getting around this problem.
We will use the base graphics library for this recipe, along with the RColorBrewer package for the color palette, so once that package is installed all you need to do is run the recipe at the R prompt. It is good practice to save your code as a script for use again later.
Let's use the gdp.txt example dataset to look at the trends in annual GDP of five countries:
gdp<-read.table("gdp_long.txt",header=T)
library(RColorBrewer)
pal<-brewer.pal(5,"Set1")
par(mar=par()$mar+c(0,0,0,2),bty="l")
plot(Canada~Year,data=gdp,type="l",lwd=2,lty=1,ylim=c(30,60),
    col=pal[1],main="Percentage change in GDP",ylab="")
mtext(side=4,at=gdp$Canada[length(gdp$Canada)],text="Canada",
    col=pal[1],line=0.3,las=2)
lines(gdp$France~gdp$Year,col=pal[2],lwd=2)
mtext(side=4,at=gdp$France[length(gdp$France)],text="France",
    col=pal[2],line=0.3,las=2)
lines(gdp$Germany~gdp$Year,col=pal[3],lwd=2)
mtext(side=4,at=gdp$Germany[length(gdp$Germany)],text="Germany",
    col=pal[3],line=0.3,las=2)
lines(gdp$Britain~gdp$Year,col=pal[4],lwd=2)
mtext(side=4,at=gdp$Britain[length(gdp$Britain)],text="Britain",
    col=pal[4],line=0.3,las=2)
lines(gdp$USA~gdp$Year,col=pal[5],lwd=2)
mtext(side=4,at=gdp$USA[length(gdp$USA)]-2,
    text="USA",col=pal[5],line=0.3,las=2)
We first read the gdp.txt data file using the read.table() function. Next, we loaded the RColorBrewer color palette library and set our color palette, pal, to "Set1" (with five colors).
Before drawing the graph, we used the par() command to add extra space to the right margin, so that we have enough space for the labels. Depending on the size of the text labels, you might have to experiment with this margin until you get it right. Finally, we set the box type (bty) to an L shape ("l") so that there is no line on the right margin. We can also set it to "c" if we want to keep the top line.
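Settings made with par() persist for the life of the graphics device, so running the margin line from the recipe repeatedly keeps widening the right margin. One way to guard against this is to save the old settings and restore them afterwards. The following is a minimal sketch, using made-up data rather than the GDP dataset; the name old_par is just an illustration:

```r
# par() returns (invisibly) the previous values of the parameters it changes,
# so we can keep them and put them back later.
old_par <- par(mar = par("mar") + c(0, 0, 0, 2), bty = "l")

# Made-up data, only to have something to draw
plot(1:10, (1:10)^2, type = "l", lwd = 2, ylab = "")

# Restore the original margins and box type
par(old_par)
```

With this pattern, re-running the script on the same device always starts from the same margins.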
We used the mtext() function to label each of the lines individually in the right margin. The first argument we passed to the function is the side where we want the label to be placed. Sides (margins) are numbered starting from 1 for the bottom side and going round in a clockwise direction so that 2 is left, 3 is top, and 4 is right.
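To see the side numbering at a glance, the short sketch below writes a label into each of the four margins of an empty plot. The plot contents and the side_names vector are made up for illustration:

```r
# Empty plot with no axes or axis titles, so the margin labels stand out
plot(1:10, type = "n", axes = FALSE, xlab = "", ylab = "")
box()

side_names <- c("1 = bottom", "2 = left", "3 = top", "4 = right")
for (s in 1:4) {
  # Label each margin with its own side number
  mtext(side = s, text = side_names[s], line = 1)
}
```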
The at argument was used to specify the Y coordinate of the label. This is a bit tricky because we have to make sure we place the label as close to the corresponding line as possible. So, here we have used the last value of each line. For example, gdp$France[length(gdp$France)] picks the last value in the France vector by using its length as the index. Note that we had to adjust the value for USA by subtracting 2 from its last value so that it doesn't overlap the label for Canada.
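As a side note, the last element of a vector can also be picked with tail(), which some readers may find easier to scan than indexing by length. A minimal sketch, using a made-up vector in place of a column of the gdp data frame:

```r
# Made-up series standing in for one column of the gdp data frame
france <- c(41.2, 43.5, 44.1, 46.8)

last_by_index <- france[length(france)]  # the indexing used in the recipe
last_by_tail  <- tail(france, 1)         # same element, slightly shorter

identical(last_by_index, last_by_tail)   # TRUE
```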
We used the text argument to set the text of the labels as country names. We set the col argument to the appropriate element of the pal vector by using a number index. The line argument sets an offset in terms of margin lines, starting at 0 and counting outwards from the edge of the plot. Finally, setting las to 2 rotates the labels so they're perpendicular to the axis, instead of the default value of 0, which makes them parallel to the axis.
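The effect of las is easiest to judge side by side. The sketch below, on a made-up empty plot, places one label with las set to 0 and one with las set to 2 in the right margin; the positions passed to at are arbitrary:

```r
# Widen the right margin temporarily; par() returns the old value
old_par <- par(mar = c(5, 4, 4, 6) + 0.1)

# Empty plot to hold the two sample labels
plot(1:10, type = "n", xlab = "", ylab = "")

# las = 0 (the default): text runs parallel to the right axis
mtext(side = 4, at = 3, text = "las = 0", las = 0, line = 0.3)
# las = 2: text is perpendicular to the right axis, so it reads horizontally
mtext(side = 4, at = 7, text = "las = 2", las = 2, line = 0.3)

# Put the margins back
par(old_par)
```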
Sometimes, simply using the last value of a set of values might not work because the value might be missing. In that case, we can use the second-to-last value or visually choose a value that places the label closest to the line. Also, the size of the plot window and the proximity of the final values can cause labels to overlap. So, we might need to iterate a few times before we get the placement right. We can write functions to automate this process, but it is still good to visually inspect the outcome.
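As a rough sketch of what such automation might look like, the two helpers below pick the last non-missing value of a series and push label positions apart when they are closer than a chosen gap. Both function names (last_valid and spread_labels), the min_gap parameter, and the data are hypothetical, and the spacing rule (always nudging the higher label upwards) is just one simple choice; the recipe itself moved the USA label down by hand.

```r
# Hypothetical helper: last non-missing value of a numeric vector
last_valid <- function(x) {
  valid <- x[!is.na(x)]
  if (length(valid) == 0) return(NA_real_)
  valid[length(valid)]
}

# Hypothetical helper: move label positions apart so that neighbours are
# at least min_gap apart (in y-axis units), keeping their vertical order
spread_labels <- function(y, min_gap) {
  ord <- order(y)
  out <- y[ord]
  for (i in seq_along(out)[-1]) {
    if (out[i] - out[i - 1] < min_gap) out[i] <- out[i - 1] + min_gap
  }
  y[ord] <- out
  y
}

# Made-up end values for three series, one with a missing final value
ends <- c(a = last_valid(c(40, 45, NA)),
          b = last_valid(c(38, 44, 46)),
          c = 52)
spread_labels(ends, min_gap = 2)
```

Here the end values 45, 46, and 52 become 45, 47, and 52, which could then be passed to the at argument of mtext(). The result still deserves a visual check, since a sensible gap depends on the size of the plot window and the text.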