Chapter 2. Vectors


The fundamental data type in R is the vector. You saw a few examples in Chapter 1, and now you’ll learn the details. We’ll start by examining how vectors relate to some other data types in R. You’ll see that unlike in languages in the C family, individual numbers (scalars) do not have separate data types but instead are special cases of vectors. On the other hand, as in C family languages, matrices are special cases of vectors.

We’ll spend a considerable amount of time on the following topics:

Recycling: The automatic lengthening of vectors in certain settings

Filtering: The extraction of subsets of vectors

Vectorization: The application of functions element-wise to vectors

All of these operations are central to R programming, and you will see them referred to often in the remainder of the book.
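
As a quick preview of the three ideas, here is a minimal sketch. The variable names are illustrative only, and each operation is treated in detail later in the chapter.

```r
x <- c(1, 2, 3, 4)

# Recycling: the shorter vector c(10, 20) is reused to match the length of x,
# so this is the same as x + c(10, 20, 10, 20).
recycled <- x + c(10, 20)

# Filtering: keep only the elements of x that satisfy a condition.
filtered <- x[x > 2]

# Vectorization: sqrt() is applied to each element in turn.
vectorized <- sqrt(c(1, 4, 9))

print(recycled)    # 11 22 13 24
print(filtered)    # 3 4
print(vectorized)  # 1 2 3
```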

In many programming languages, vector variables are considered different from scalars, which are single-number variables. Consider the following C code, for example:

int x;
int y[3];

This asks the compiler to allocate space for a single integer named x and a three-element integer array (C terminology analogous to R’s vector type) named y. But in R, numbers are actually considered one-element vectors, and there is really no such thing as a scalar.
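
You can check this claim at the console. The following short sketch (with an arbitrary value of 8) shows that a single number behaves like any other vector: it has a length, it can be indexed, and it is identical to the result of building a one-element vector with c().

```r
x <- 8

is_vec <- is.vector(x)         # TRUE: a lone number is already a vector
len <- length(x)               # 1: it has exactly one element
first <- x[1]                  # 8: it can be indexed like any vector
same <- identical(x, c(8))     # TRUE: c(8) builds the very same object

print(c(is_vec, same))
print(len)
print(first)
```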

R variable types are called modes. Recall from Chapter 1 that all elements in a vector must have the same mode, which can be integer, numeric (floating-point number), character (string), logical (Boolean), complex, and so on. If your program needs to check the mode of a variable x, you can query it with the call typeof(x). Note that typeof() reports floating-point values as "double", while the related function mode() reports both integer and floating-point values as "numeric".
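
The following sketch shows what typeof() returns for a few common cases. The variable names are illustrative only. The last two lines show one consequence of the same-mode rule: when c() is given values of different modes, R coerces them to a common one, here character.

```r
t_int <- typeof(3L)       # "integer" (the L suffix marks an integer literal)
t_dbl <- typeof(3.5)      # "double"  (floating-point number)
t_chr <- typeof("abc")    # "character"
t_lgl <- typeof(TRUE)     # "logical"
t_cpx <- typeof(2+3i)     # "complex"

m_int <- mode(3L)         # "numeric": mode() does not separate integer from double

mixed <- c(1, "a")        # the number 1 is coerced to the string "1"
t_mixed <- typeof(mixed)  # "character"

print(c(t_int, t_dbl, t_chr, t_lgl, t_cpx, m_int, t_mixed))
```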

Unlike vector indices in ALGOL-family languages such as C and Python, which begin at 0, vector indices in R begin at 1.
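
A short sketch of what this means in practice, using an arbitrary three-element vector: the first element is at index 1, the last is at index length(y), and index 0 does not refer to any element. Indexing with 0 is not an error in R, but it returns an empty vector, which can hide off-by-one mistakes carried over from C-style habits.

```r
y <- c(10, 20, 30)

first <- y[1]           # 10: the first element lives at index 1
last <- y[length(y)]    # 30: the last element lives at index length(y)
zeroth <- y[0]          # numeric(0): an empty vector, not the first element

print(first)
print(last)
print(length(zeroth))   # 0
```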

You can obtain the length of a vector by using the length() function:

> x <- c(1,2,4)
> length(x)
[1] 3

In this example, we already know the length of x, so there really is no need to query it. But in writing general function code, you’ll often need to know the lengths of vector arguments.

For instance, suppose that we wish to have a function that determines the index of the first 1 value in the function’s vector argument (assuming we are sure there is such a value). Here is one (not necessarily efficient) way we could write the code:

first1 <- function(x) {
   for (i in 1:length(x)) {
      if (x[i] == 1) break  # break out of loop
   }
   return(i)
}

Without the length() function, we would have needed to add a second argument to first1(), say, named n, to specify the length of x.

Note that in this case, writing the loop as follows won’t work:

for (n in x)

The problem with this approach is that it doesn’t allow us to retrieve the index of the desired element. Thus, we need an explicit loop, which in turn requires calculating the length of x.
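
To make the difference concrete, here is a small sketch with an arbitrary vector. The names values_seen and indices_seen are hypothetical and exist only to record what each loop variable takes on.

```r
x <- c(7, 1, 4)

# Looping over the elements: n takes on the values 7, 1, 4, not their positions.
values_seen <- c()
for (n in x) {
  values_seen <- c(values_seen, n)
}

# Looping over the positions: i takes on the indices 1, 2, 3.
indices_seen <- c()
for (i in 1:length(x)) {
  indices_seen <- c(indices_seen, i)
}

print(values_seen)   # 7 1 4
print(indices_seen)  # 1 2 3
```

In the first loop, when n equals 1 we know that a 1 exists in x, but nothing in the loop tells us that it sits at position 2.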

One more point about that loop: For careful coding, you should worry that length(x) might be 0. In such a case, look at what happens to the expression 1:length(x) in our for loop:

> x <- c()
> x
NULL
> length(x)
[1] 0
> 1:length(x)
[1] 1 0

Our variable i in this loop takes on the value 1, then 0, which is certainly not what we want if the vector x is empty.

A safe alternative is to use the more advanced R function seq(), as we’ll discuss in Section 2.4.4.
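
As a preview, here is one possible sketch of a safer version. It uses seq_along(), a close relative of seq() in base R, which produces an empty sequence for an empty vector so the loop body never runs. The function name first1_safe and the choice to return NA when no 1 is found are assumptions made for this illustration, and the code assumes x contains no NA values.

```r
first1_safe <- function(x) {
  # seq_along(x) is 1, 2, ..., length(x), or empty when length(x) is 0
  for (i in seq_along(x)) {
    if (x[i] == 1) return(i)   # index of the first 1
  }
  NA_integer_                  # x is empty or contains no 1
}

found <- first1_safe(c(5, 1, 3, 1))
empty <- first1_safe(c())
n_iter <- length(seq_along(c()))

print(found)    # 2
print(empty)    # NA
print(n_iter)   # 0
```

A vectorized expression such as which(x == 1)[1] gives the same results for these two inputs without an explicit loop.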