The fundamental data type in R is the vector. You saw a few examples in Chapter 1, and now you’ll learn the details. We’ll start by examining how vectors relate to some other data types in R. You’ll see that unlike in languages in the C family, individual numbers (scalars) do not have separate data types but instead are special cases of vectors. On the other hand, as in C family languages, matrices are special cases of vectors.
We’ll spend a considerable amount of time on the following topics:
Recycling: the automatic lengthening of vectors in certain settings
Filtering: the extraction of subsets of vectors
Vectorization: the element-wise application of functions to vectors
All of these operations are central to R programming, and you will see them referred to often in the remainder of the book.
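As a quick preview, the three operations listed above (commonly called recycling, filtering, and vectorization in R) can be sketched as follows. The vectors and the name z are made up for illustration only.

```r
# Recycling: the shorter vector is reused to match the longer one
c(1, 2) + c(10, 20, 30, 40)    # 11 22 31 42

# Filtering: keep only the elements that satisfy a condition
z <- c(5, 2, -3, 8)            # z is a hypothetical example vector
z[z > 3]                       # 5 8

# Vectorization: the function is applied to each element in turn
sqrt(c(1, 4, 9))               # 1 2 3
```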
In many programming languages, vector variables are considered different from scalars, which are single-number variables. Consider the following C code, for example:
int x;
int y[3];
This requests the compiler to allocate space for a single integer named x and a three-element integer array (C terminology analogous to R’s vector type) named y. But in R, numbers are actually considered one-element vectors, and there is really no such thing as a scalar.
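A small check, using only base R functions, illustrates that a single number behaves like a vector of length 1. The variable name x here is just an example.

```r
x <- 8
length(x)            # 1
is.vector(x)         # TRUE
x[1]                 # 8; a "scalar" can be indexed like any other vector
identical(x, c(8))   # TRUE; 8 and c(8) are the same object
```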
R variable types are called modes. Recall from Chapter 1 that all elements in a vector must have the same mode, which can be integer, numeric (floating-point number), character (string), logical (Boolean), complex, and so on. If you need your program code to check the mode of a variable x, you can query it by the call typeof(x).
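The following sketch shows what typeof() returns for a few common cases. One detail worth knowing is that typeof() reports the internal storage type, so floating-point values come back as "double", while the related base function mode() reports "numeric" for both integers and floating-point values. The name is_floating is a made-up variable for illustration.

```r
typeof(3L)       # "integer"
typeof(3.5)      # "double"
typeof("abc")    # "character"
typeof(TRUE)     # "logical"
typeof(2+3i)     # "complex"

mode(3L)         # "numeric"
mode(3.5)        # "numeric"

# A simple type check inside program code
x <- c(1.5, 2, 4)
is_floating <- typeof(x) == "double"   # TRUE
```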
Unlike vector indices in ALGOL-family languages, such as C and Python, vector indices in R begin at 1.
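A brief illustration of 1-based indexing, using an arbitrary example vector:

```r
x <- c(88, 5, 12, 13)
x[1]            # 88, the first element
x[length(x)]    # 13, the last element
x[0]            # numeric(0); index 0 selects nothing rather than the first element
```

Programmers coming from C or Python should note the last line in particular: an index of 0 is not an error, so an off-by-one mistake can pass silently.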
Vectors are stored like arrays in C, contiguously, and thus you cannot insert or delete elements—something you may be used to if you are a Python programmer. The size of a vector is determined at its creation, so if you wish to add or delete elements, you’ll need to reassign the vector.
For example, let’s add an element to the middle of a four-element vector:
> x <- c(88,5,12,13)
> x <- c(x[1:3],168,x[4])  # insert 168 before the 13
> x
[1]  88   5  12 168  13
Here, we created a four-element vector and assigned it to x. To insert a new number 168 between the third and fourth elements, we strung together the first three elements of x, then the 168, then the fourth element of x. This creates a new five-element vector, leaving x intact for the time being. We then assigned that new vector to x.
In the result, it appears as if we had actually changed the vector stored in x, but really we created a new vector and stored that vector in x. This difference may seem subtle, but it has implications. For instance, in some cases, it may restrict the potential for fast performance in R, as discussed in Chapter 14.
Readers with a background in C may find it helpful to know that, internally, x is really a pointer, and the reassignment is implemented by pointing x to the newly created vector.
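Deleting an element follows the same create-and-reassign pattern. The sketch below uses a negative index, which excludes the element at that position, and the base R function append(), which builds the lengthened vector for you. In both cases a new vector is created and then assigned back to x.

```r
x <- c(88, 5, 12, 13)

x <- x[-2]                       # new vector without the second element
x                                # 88 12 13

x <- append(x, 168, after = 2)   # new vector with 168 after position 2
x                                # 88 12 168 13
```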
You can obtain the length of a vector by using the length() function:
> x <- c(1,2,4)
> length(x)
[1] 3
In this example, we already know the length of x, so there really is no need to query it. But in writing general function code, you’ll often need to know the lengths of vector arguments.
For instance, suppose that we wish to have a function that determines the index of the first 1 value in the function’s vector argument (assuming we are sure there is such a value). Here is one (not necessarily efficient) way we could write the code:
first1 <- function(x) {
   for (i in 1:length(x)) {
      if (x[i] == 1) break  # break out of loop
   }
   return(i)
}
Without the length() function, we would have needed to add a second argument to first1(), say naming it n, to specify the length of x.
Note that in this case, writing the loop as follows won’t work:
for (n in x)
The problem with this approach is that it doesn’t allow us to retrieve the index of the desired element. Thus, we need to loop over the indices explicitly, which in turn requires calculating the length of x.
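To see why, note that a loop of the form for (n in x) hands us each element's value, not its position. In this small sketch, the vector and the name seen are made up for illustration:

```r
x <- c(7, 1, 4)
seen <- numeric(0)
for (n in x) {
   seen <- c(seen, n)   # n is the element itself, not its index
}
seen                    # 7 1 4
```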
One more point about that loop: For careful coding, you should worry that length(x) might be 0. In such a case, look what happens to the expression 1:length(x) in our for loop:
> x <- c()
> x
NULL
> length(x)
[1] 0
> 1:length(x)
[1] 1 0
Our variable i in this loop takes on the value 1, then 0, which is certainly not what we want if the vector x is empty.
A safe alternative is to use the more advanced R function seq(), as we’ll discuss in Section 2.4.4.
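As a minimal sketch of the idea, the closely related base R function seq_along() returns an empty sequence for an empty vector, so the loop body is simply skipped. The function name first1_safe is hypothetical, and returning NA when no 1 is found is an assumption made for this example rather than something the original first1() does. The sketch also assumes x contains no NA values.

```r
seq_along(c())           # integer(0); nothing to loop over
seq_along(c(5, 1, 7))    # 1 2 3

# Hypothetical variant of first1() that tolerates an empty argument
first1_safe <- function(x) {
   for (i in seq_along(x)) {
      if (x[i] == 1) return(i)   # index of the first 1
   }
   NA_integer_                   # no 1 found, or x is empty
}

first1_safe(c(5, 1, 7, 1))   # 2
first1_safe(c())             # NA
```

For comparison, the vectorized expression which(x == 1)[1] yields the same index without an explicit loop.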
Arrays and matrices (and even lists, in a sense) are actually vectors too, as you’ll see. They merely have extra class attributes. For example, matrices have the number of rows and columns. We’ll discuss them in detail in the next chapter, but it’s worth noting now that arrays and matrices are vectors, and that means that everything we say about vectors applies to them, too.
Consider the following example:
> m
     [,1] [,2]
[1,]    1    2
[2,]    3    4
> m + 10:13
     [,1] [,2]
[1,]   11   14
[2,]   14   17
The 2-by-2 matrix m is stored as a four-element vector, column-wise, as (1,3,2,4). We then added (10,11,12,13) to it, yielding (11,14,14,17), but R remembered that we were working with matrices and thus gave the 2-by-2 result you see in the example.
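The column-wise storage can be inspected directly. The text does not show how m was created, so the matrix() call below is an assumption that reproduces the same values.

```r
m <- matrix(c(1, 3, 2, 4), nrow = 2)   # filled column by column

as.vector(m)   # 1 3 2 4; the underlying vector
length(m)      # 4
m[3]           # 2; third element in column order, the same as m[1,2]
dim(m)         # 2 2; the extra information that makes m a matrix

m + 10:13      # element-wise sum, still a 2-by-2 matrix
```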