Setting the bin size and the number of breaks

As we saw in the previous recipe, the hist() function automatically computes the number of breaks and the width of the bins into which the values of the variable are grouped. In this recipe, we will learn how to control this, either by specifying how many bins we want or by stating exactly where the breaks between bars should fall.

Getting ready

Once again, we will use the airpollution.csv example dataset, so make sure that you have loaded it:

air<-read.csv("airpollution.csv")

How to do it...

First, let's see how to specify the number of breaks. Let's make 20 breaks in the nitrogen oxides histogram instead of the default 11:

hist(air$Nitrogen.Oxides,
     breaks=20,
     xlab="Nitrogen Oxide Concentrations",
     main="Distribution of Nitrogen Oxide Concentrations")

How it works...

We used the breaks argument to request a number of bins for the histogram, setting it to 20. However, the graph does not necessarily show exactly 20 bars. When breaks is a single number, R treats it only as a suggestion: it passes the value to pretty(), which picks round, evenly spaced break points that cover the range of the data, with a bin count as close to the requested value as those round values allow.
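To see which breaks R actually chose, we can ask hist() for the histogram object instead of a plot. The following is a minimal sketch; the nox vector is synthetic data invented here as a stand-in for air$Nitrogen.Oxides, so the numbers it produces will not match the real dataset.

```r
# Hypothetical stand-in for air$Nitrogen.Oxides (synthetic, not the real data)
set.seed(42)
nox <- rexp(200, rate = 1/100)

# plot = FALSE returns the histogram object without drawing anything
h <- hist(nox, breaks = 20, plot = FALSE)

h$breaks             # the break points R actually used
length(h$counts)     # the number of bins, which may differ from 20
diff(h$breaks)       # the bin widths (equal, round values)
```

Comparing length(h$counts) with the value passed to breaks shows how far the suggestion was adjusted.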

There's more...

We can also specify the exact values at which we want the breaks to occur. In this case, R uses the values exactly as we supply them. Once again, we use the breaks argument, but this time we set it to a numeric vector that contains the break points. The breaks vector must cover the full range of values of the X variable; if any value falls outside it, hist() stops with an error instead of drawing the plot.

Let's say we want breaks at every 100 units of concentration:

hist(air$Nitrogen.Oxides,
     breaks=c(0,100,200,300,400,500,600),
     xlab="Nitrogen Oxide Concentrations",
     main="Distribution of Nitrogen Oxide Concentrations")
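Typing the break points by hand works when we already know the range of the data. As a hedged alternative, the sketch below derives them from the data with seq(), so that the vector always spans the full range. The nox vector is again synthetic, and bin_width, lower, upper, and brks are names made up for this illustration.

```r
# Hypothetical stand-in for air$Nitrogen.Oxides (synthetic, not the real data)
set.seed(42)
nox <- rexp(200, rate = 1/100)

# Round the data range outwards to multiples of the desired bin width
bin_width <- 100
lower <- floor(min(nox) / bin_width) * bin_width
upper <- ceiling(max(nox) / bin_width) * bin_width
brks  <- seq(lower, upper, by = bin_width)

# With a vector, hist() uses the break points exactly as given
h <- hist(nox, breaks = brks, plot = FALSE)
h$breaks

# Breaks that do not span the data make hist() fail; capture the message
too_narrow <- tryCatch(hist(nox, breaks = c(0, 50), plot = FALSE),
                       error = function(e) conditionMessage(e))
too_narrow
```

Rounding the limits outwards keeps the break points on round values while guaranteeing that the smallest and largest observations fall inside a bin.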

As you might have noticed, the breaks argument accepts different types of values: a single number that suggests how many bins to use, or a numeric vector that specifies the exact break points. In addition, breaks can take a function that is applied to the data and returns either the number of bins or the vector of break points.

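Finally, breaks can also take a character string that names an algorithm for calculating the number of bins. By default, it is set to "Sturges". The other names for which algorithms are supplied are "Scott" and "FD" (or "Freedman-Diaconis"). The names are not case sensitive, and the number each algorithm returns is still only a suggestion, just like a single number passed to breaks.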
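The three named algorithms are also available as ordinary functions in the grDevices package: nclass.Sturges(), nclass.scott(), and nclass.FD(). The sketch below, which again uses a synthetic nox vector rather than the real dataset, compares the bin counts they suggest and shows that passing the name or the function to breaks leads to the same histogram.

```r
# Hypothetical stand-in for air$Nitrogen.Oxides (synthetic, not the real data)
set.seed(42)
nox <- rexp(200, rate = 1/100)

# Suggested number of bins under each rule
nclass.Sturges(nox)
nclass.scott(nox)
nclass.FD(nox)

# The algorithm can be given by name or as a function
h_name <- hist(nox, breaks = "FD", plot = FALSE)
h_fun  <- hist(nox, breaks = nclass.FD, plot = FALSE)

# Both calls end up with the same break points
identical(h_name$breaks, h_fun$breaks)
```

Calling the nclass functions directly is a quick way to check what each rule would suggest before choosing one for the plot.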