NA and NULL Values

Readers with a background in other scripting languages may be aware of “no such animal” values, such as None in Python and undef in Perl. R actually has two such values: NA and NULL.

In statistical data sets, we often encounter missing data, which we represent in R with the value NA. NULL, on the other hand, represents that the value in question simply doesn’t exist, rather than being existent but unknown. Let’s see how this comes into play in concrete terms.

Using NA

In many of R’s statistical functions, we can instruct the function to skip over any missing values, or NAs. Here is an example:

> x <- c(88,NA,12,168,13)
> x
[1]  88  NA  12 168  13
> mean(x)
[1] NA
> mean(x,na.rm=T)
[1] 70.25
> x <- c(88,NULL,12,168,13)
> mean(x)
[1] 70.25

In the first call, mean() refused to calculate, as one value in x was NA. But by setting the optional argument na.rm (NA remove) to true (T), we calculated the mean of the remaining elements. In the last call, by contrast, R automatically skipped over the NULL value, which we’ll look at in the next section.
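As a rough sketch of what na.rm is doing, the missing positions can also be located by hand with is.na(). The variable names missing, m1, m2, and w are chosen here purely for illustration:

```r
# Same data as in the example above, with one missing value
x <- c(88, NA, 12, 168, 13)

# is.na() returns TRUE at each position holding NA
missing <- is.na(x)
print(missing)
print(sum(missing))   # number of NAs in x

# Dropping the NAs by hand should match the na.rm=TRUE result
m1 <- mean(x[!missing])
m2 <- mean(x, na.rm = TRUE)
print(m1)
print(m2)

# NULL vanishes when combined into a vector, so nothing is left to skip
w <- c(88, NULL, 12, 168, 13)
print(length(w))
```

Note that w has only four elements, which is why mean() needed no special argument in the NULL case.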

There are multiple NA values, one for each mode:

> x <- c(5,NA,12)
> mode(x[1])
[1] "numeric"
> mode(x[2])
[1] "numeric"
> y <- c("abc","def",NA)
> mode(y[2])
[1] "character"
> mode(y[3])
[1] "character"
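A short sketch may make this more concrete. A bare NA is of logical mode and is coerced to match the vector it is placed in. R also supplies typed constants such as NA_integer_, NA_real_, and NA_character_. The variable names below are illustrative only:

```r
# A bare NA is logical until it joins a vector of another mode
bare <- mode(NA)
print(bare)                   # "logical"

in_numeric <- mode(c(5, NA, 12)[2])
print(in_numeric)             # "numeric"

# Explicitly typed NA constants
print(typeof(NA_integer_))    # "integer"
print(typeof(NA_real_))       # "double"
print(typeof(NA_character_))  # "character"

# All of them are recognized by is.na()
all_missing <- is.na(c(NA, NA_integer_, NA_real_))
print(all_missing)
```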

Using NULL

One use of NULL is to build up vectors in loops, in which each iteration adds another element to the vector. In this simple example, we build up a vector of even numbers:

# build up a vector of the even numbers in 1:10
> z <- NULL
> for (i in 1:10) if (i %%2 == 0) z <- c(z,i)
> z
[1]  2  4  6  8 10

Recall from Chapter 1 that %% is the modulo operator, giving remainders upon division. For example, 13 %% 4 is 1, as the remainder of dividing 13 by 4 is 1. (See Section 7.2 for a list of arithmetic and logic operators.) Thus the example loop starts with a NULL vector and then adds the element 2 to it, then 4, and so on.
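To watch the vector grow, here is a lightly instrumented version of the same loop. The cat() call is added only for tracing and is not part of the original example:

```r
# The modulo operator gives the remainder after division
print(13 %% 4)    # 1
print(10 %% 2)    # 0, so 10 is even

# Rebuild the vector, reporting its length each time it grows
z <- NULL
for (i in 1:10) {
  if (i %% 2 == 0) {
    z <- c(z, i)   # c(NULL, 2) is just 2, so no placeholder remains
    cat("after i =", i, "length(z) =", length(z), "\n")
  }
}
print(z)           # 2 4 6 8 10
```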

This is a very artificial example, of course, and there are much better ways to do this particular task. Here are two more ways to find the even numbers in 1:10:

> seq(2,10,2)
[1]  2  4  6  8 10
> 2*1:5
[1]  2  4  6  8 10

But the point here is to demonstrate the difference between NA and NULL. If we were to use NA instead of NULL in the preceding example, we would pick up an unwanted NA:

> z <- NA
> for (i in 1:10) if (i %%2 == 0) z <- c(z,i)
> z
[1] NA  2  4  6  8 10

NULL values really are counted as nonexistent, as you can see here:

> u <- NULL
> length(u)
[1] 0
> v <- NA
> length(v)
[1] 1

NULL is a special R object with none of the ordinary vector modes; mode(NULL) returns "NULL".
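The two values can also be told apart in code. The following is a minimal sketch using is.null() and is.na(), reusing the names u and v from above. Calling mode() on NULL reports the string "NULL" rather than one of the usual vector modes:

```r
u <- NULL
v <- NA

print(is.null(u))   # TRUE
print(is.null(v))   # FALSE
print(is.na(v))     # TRUE

# mode() reports "NULL" for NULL and "logical" for a bare NA
print(mode(u))
print(mode(v))

# NULL contributes no elements when combined; NA contributes one
len_u <- length(c(u, 1, 2))
len_v <- length(c(v, 1, 2))
print(len_u)        # 2
print(len_v)        # 3
```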