Creating a normal Q-Q plot

We often want to compare an empirical distribution with a known theoretical distribution, and the most widely used theoretical distribution is the normal distribution. The standard tool for this comparison is the normal Q-Q plot, which plots the quantiles of the data against the corresponding quantiles of the normal distribution. In this recipe, we will see how to compare the distribution of a numeric variable with the theoretical normal distribution through a normal Q-Q plot.

Getting ready

The data for this recipe is generated using the following code:

# Set a seed value to make the data reproducible
set.seed(12345)
qqdata <- data.frame(disA=rnorm(n=100, mean=20, sd=3),
                     disB=rnorm(n=100, mean=25, sd=4),
                     disC=rnorm(n=100, mean=15, sd=1.5),
                     age=sample(c(1,2,3,4), size=100, replace=TRUE),
                     sex=sample(c("Male","Female"), size=100, replace=TRUE),
                     econ_status=sample(c("Poor","Middle","Rich"),
                                        size=100, replace=TRUE))

How to do it…

We can produce the plot with the following code:

# qqmath() is provided by the lattice package
library(lattice)
qqmath(~disA|sex, data=qqdata, f.value=ppoints(50), distribution=qnorm)

The preceding code will produce a Q-Q plot of the disA variable, with one panel for each level of the sex variable (shown in the following figure):

How it works…

The formula argument works the same way as in other lattice functions. Two arguments deserve attention in qqmath(): f.value and distribution. The f.value argument gives the probabilities at which quantiles are computed, either as a numeric vector or as a function of the sample size that returns such a vector. Here, ppoints(50) supplies 50 evenly spaced probabilities, so each panel displays 50 points. If f.value is left at its default of NULL, qqmath() plots one point per observation and uses the data values themselves. The distribution argument is the quantile function of the theoretical distribution; we have passed qnorm for the normal distribution, which is also the default.
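To make the role of f.value and distribution concrete, the following base R sketch computes roughly the coordinates that a single panel would show. It is an illustration under assumptions, not the internal code of qqmath(): the vector x is a stand-in for one panel's data, and the names p, theoretical, and empirical are ours.

```r
# Stand-in data for a single panel (assumption: not the recipe's qqdata)
set.seed(12345)
x <- rnorm(n = 100, mean = 20, sd = 3)

# f.value = ppoints(50): 50 probabilities strictly between 0 and 1
p <- ppoints(50)

# distribution = qnorm: theoretical quantiles for the horizontal axis
theoretical <- qnorm(p)

# Sample quantiles at the same probabilities for the vertical axis
empirical <- quantile(x, probs = p, names = FALSE)

# Each row is one point of the Q-Q plot
head(data.frame(p, theoretical, empirical))
```

If the data are close to normal, the points (theoretical, empirical) fall near a straight line whose intercept and slope approximate the mean and standard deviation of the data.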

There's more…

If we want to produce the same plot for multiple numeric variables, we can simply add the new variables to the formula, as in ~disA+disB. By default, lattice treats variables joined by + as groups and overlays them in the same panel. For example, to produce the plot for three variables, conditioned on the sex variable, we can use the following code:

qqmath(~disA+disB+disC|sex,data=qqdata,f.value=ppoints(50))
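Because the three variables are overlaid in each panel, it helps to be able to tell them apart. The sketch below shows two options using standard lattice arguments: auto.key=TRUE adds a legend, and outer=TRUE gives each variable its own panel. It uses a reduced stand-in for qqdata (only the columns needed here) so that it runs on its own; the object names p_overlay and p_panels are ours.

```r
library(lattice)

# Reduced stand-in for the recipe's data (assumption: only needed columns)
set.seed(12345)
qqdata <- data.frame(disA = rnorm(n = 100, mean = 20, sd = 3),
                     disB = rnorm(n = 100, mean = 25, sd = 4),
                     disC = rnorm(n = 100, mean = 15, sd = 1.5),
                     sex  = sample(c("Male", "Female"), size = 100,
                                   replace = TRUE))

# Option 1: keep the variables overlaid, but add a legend
p_overlay <- qqmath(~disA + disB + disC | sex, data = qqdata,
                    f.value = ppoints(50), auto.key = TRUE)

# Option 2: draw each variable in a separate panel
p_panels <- qqmath(~disA + disB + disC | sex, data = qqdata,
                   f.value = ppoints(50), outer = TRUE)

# Lattice plots are objects; print them to draw them
print(p_overlay)
print(p_panels)
```

The overlaid version makes it easy to compare location and spread across variables, while the separate panels are easier to read when the point clouds overlap.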