Using ggplot2

In the next few pages, you will see how to apply the principles of The Grammar of Graphics with the ggplot2 package. With this package, you can change many details of your graphics and create your own individual style. We will give you examples of some of its main settings.

You can find the ggplot2 package on CRAN, which makes it very easy to install it:
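The installation command itself is not reproduced here. A minimal sketch, assuming the standard install.packages() route, could look like the following; the mirror URL is our assumption, and any CRAN mirror can be used instead:

```r
# Install ggplot2 from CRAN only if it is not already available.
# The repos URL is an assumption; substitute your preferred CRAN mirror.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}
```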

You are then able to simply load it with the following:
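Loading an installed package is usually done with library(). The version check is optional and only included here so that you can see which release is attached:

```r
# Attach ggplot2 to the current R session
library(ggplot2)

# Optional: show which version of the package is in use
packageVersion("ggplot2")
```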

Or, you can just check the box in front of the package name in the packages pane of RStudio:

Installing the ggplot2 package

The ggplot2 package provides two functions to create graphic objects: qplot() and ggplot().

qplot simply stands for quick plot, and ggplot is an abbreviation of grammar of graphics plot, which shows its strong connection to the framework mentioned earlier.

qplot aims to be very similar to the base plot function and very simple to use, but it does not expose the full capacity of the framework and its elements.

For beginners, ggplot and its aspects are not easy to learn, but once you have familiarized yourself with the function, it is a very powerful way to create graphics.

ggplot always builds graphics from three basic components: the data, the aesthetics, and the geoms.

But ggplot offers a lot of different options for these components.

For our first graph, we will use the preinstalled iris dataset. This data will be loaded automatically when you open R or RStudio. You can look at it using the following line:
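The exact line is not shown here; as a sketch, either of the following base R calls gives a quick look at the dataset:

```r
# Show the first six rows of the built-in iris dataset
head(iris)

# Show the structure: 150 observations of 5 variables
str(iris)
```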

We can then use the ggplot() function and add a data argument and an aesthetics element:
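A minimal sketch of such a call is shown next. The choice of Sepal.Length and Sepal.Width matches the variables used later in this section; the object name p_empty is ours. The object is only created here, not printed, because a plot without layers behaves differently across ggplot2 versions (older releases raise an error, more recent ones draw an empty panel):

```r
library(ggplot2)

# Data and aesthetic mappings only; no geom layer has been added yet
p_empty <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))

# The plot object exists, but its list of layers is empty
length(p_empty$layers)
```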

If you execute this line, you will get the following error message:

This is telling you that you have to add a geom object to the function call, which actually defines the type of the chart:
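As a sketch, adding geom_point() to the same data and aesthetics turns the call into a scatterplot (the object name p is ours):

```r
library(ggplot2)

# Same data and aesthetics as before, now with a point layer
p <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# Printing the object draws the scatterplot
p
```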

Creating your first graph with ggplot2

After adding some further options, you can see the power of ggplot2, and how easy it is with one line of code to create a complex data visualization:
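The original line is not reproduced here, so the following is only an illustration of what "further options" might look like. The color mapping, the linear trend lines, and the labels are our own choices, not necessarily those used for the figure:

```r
library(ggplot2)

# One chained expression: points colored by species, a linear fit per
# species, and descriptive labels. All options are illustrative.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Iris sepal dimensions", x = "Sepal length", y = "Sepal width")

p
```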

Creating your first graph with ggplot2

We will now take a closer look at the elements that we used to create this graph. But before we can do this, we have to look at the most significant difference from the base plotting system: the plus (+) operator.
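To illustrate the + operator, here is a small sketch with object names of our own choosing. Each use of + returns a new plot object and leaves the original one unchanged, so a plot can be built up step by step:

```r
library(ggplot2)

# Start with data and aesthetics only
base_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))

# Adding a geom returns a new object with one layer
with_points <- base_plot + geom_point()

# Further components, such as labels, are added in the same way
with_labels <- with_points + labs(title = "Iris sepals")

with_labels
```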

The aesthetics function helps you to define what data values should be added to the geom and how variables in the data are mapped to visual properties. You can define the x and y locations, as well as additional parameters such as the color or the size. This depends on the geom function that you use for your visualization. Different visualization forms understand different aesthetics inputs.

Basically, you have to decide which geom to use, based on your dataset and what you want to visualize. Choosing a geom function means deciding how you want to represent data points and variables. Each geom function returns a layer that you can then add to your ggplot object with the + operator. So, to add a layer to our previous example, we could use the geom_point() function, which is typically used to visualize two variables as a scatterplot:
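A sketch consistent with the description in this section, where the ggplot object holds only the data and the aesthetics are defined on the layer itself (the object name p is ours):

```r
library(ggplot2)

# The ggplot object carries the data; the point layer carries the aesthetics
p <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width))

p
```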

This will create the following output in the Plots pane:

Adding layers using geoms

First, we created a ggplot object with the iris dataset. On its own, this does not display any data, as it does not include a layer. We add a layer to our object with the plus operator, choosing geom_point() in this case. For this layer, we define the aesthetics to be Sepal.Length on the x axis and Sepal.Width on the y axis.

Besides this, the geom_point() function understands seven different aesthetic inputs: x, y, alpha, colour, fill, shape, and size. (More recent ggplot2 releases also list group and stroke.)

So, we can add further parameters to the aesthetics function of the geom_point() layer. In this case, we define the color and shape of the data points to change according to the species they represent:
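A minimal sketch of such a call, mapping both color and shape to the Species column (the object name p is ours):

```r
library(ggplot2)

# Color and shape both vary with the Species factor
p <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width,
                 color = Species, shape = Species))

p
```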

Adding layers using geoms

Choosing the right geom is one of the most important tasks when you want to visualize your data. You need to have a vision of what you want to visualize and what it should look like. You should also know the variables that you want to display in your graphic.

ggplot2 offers a lot of different geoms, and you can choose one according to your needs. Basically, geoms are separated based on how many variables you want to visualize:

One Variable: Continuous Variable

  • geom_area
  • geom_density
  • geom_dotplot
  • geom_freqpoly
  • geom_histogram

One Variable: Discrete Variable

  • geom_bar

Two Variables: Continuous X, Continuous Y variable

  • geom_blank
  • geom_jitter
  • geom_point
  • geom_quantile
  • geom_rug
  • geom_smooth
  • geom_text

Two Variables: Discrete X, Continuous Y variable

  • geom_bar
  • geom_boxplot
  • geom_dotplot
  • geom_violin

Two Variables: Continuous Bivariate Distribution

  • geom_bin2d
  • geom_density2d
  • geom_hex

Two Variables: Continuous Function

  • geom_area
  • geom_line
  • geom_step

Two Variables: Visualizing error

  • geom_crossbar
  • geom_errorbar
  • geom_linerange
  • geom_pointrange

Three Variables

  • geom_contour
  • geom_raster
  • geom_tile

Graphical Primitives

  • geom_polygon
  • geom_path
  • geom_ribbon
  • geom_segment
  • geom_rect
You can get more information about the geom types by searching the ggplot2 package documentation for geom. In RStudio, you can do this by clicking on the ggplot2 package in the Packages pane. The Help pane will then open, and you can search for geom.

Choosing the right geom
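If you prefer the console to the Help pane, the same overview can be obtained programmatically. The following sketch lists every exported ggplot2 object whose name starts with geom_; the variable name geoms is ours:

```r
library(ggplot2)

# List all exported objects of the attached package that start with "geom_"
geoms <- ls("package:ggplot2", pattern = "^geom_")
head(geoms)

# The help page for a single geom can then be opened in the usual way,
# for example: ?geom_point
```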

ggplot offers a lot of ways to modify your graphics. We will now take a look at three options:

Another way to add a layer is by using the stats element. Stat layers do not just display your data; they also transform it. Some geoms already include a stat by default: in the ggplot2 version described here, geom_bar() uses the bin stat, whereas geom_area() uses identity. (In more recent releases, the default for geom_bar() is stat = "count".)

You can also include it on your own by adding a stat layer to your ggplot object:
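As a sketch of an explicit stat layer, the following call bins Sepal.Length with stat_bin(), which draws the result as bars by default. The choice of variable and of 10 bins is ours, purely for illustration:

```r
library(ggplot2)

# stat_bin() counts the observations per bin before drawing
p <- ggplot(iris, aes(x = Sepal.Length)) +
  stat_bin(bins = 10)

# layer_data() returns the transformed data that the layer actually draws
binned <- layer_data(p, 1)
head(binned[, c("x", "count")])

p
```

The counts in the transformed data add up to the number of rows in iris, which shows that the stat has summarized the raw values rather than plotting them directly.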

Using stats layers

As with the geoms, you can get a good overview of the available stats. This time, you search for stat in the ggplot2 package documentation:

Using stats layers

Exporting graphics from R can sometimes be very hard when you are working with the base plotting system, but ggplot2 offers you the ggsave() function. This function just needs a filename, including a file extension, to save your plot:
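A minimal sketch of such a call follows. The file name and the use of a temporary directory are our assumptions; in practice you would pass any path you like. The plot is printed first, because ggsave() without a plot argument saves the last plot that was displayed:

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# Display the plot so that it becomes the "last plot"
print(p)

# The file extension determines the output format (here: PDF)
out_file <- file.path(tempdir(), "iris_scatter.pdf")
ggsave(out_file)
```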

The ggsave function currently recognizes the following extensions: eps, ps, tex (pictex), pdf, jpeg, tiff, png, bmp, svg, and wmf (Windows only).

Besides the file format, you can also set the width and height of the output, a scaling factor, and the dpi to use for raster graphics.

When called, ggsave saves the last displayed plot. But you can also specify a plot with the plot argument.
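The following sketch puts these arguments together. The file name, the dimensions, and the scaling factor are illustrative values of our own. Note that dpi only affects raster formats such as png; it has no effect on the PDF written here:

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

out_file <- file.path(tempdir(), "iris_scatter_sized.pdf")

# Save an explicitly named plot with a fixed size; the plot does not
# need to have been displayed first
ggsave(filename = out_file, plot = p,
       width = 6, height = 4, units = "in",
       scale = 1.5, dpi = 300)
```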