Chapter 4. Vectors, Matrices, and Arrays

In Chapters 1 and 2, we saw several types of vectors for logical values, character strings, and of course numbers. This chapter shows you more manipulation techniques for vectors and introduces their multidimensional brethren, matrices and arrays.

After reading this chapter, you should:

So far, you have used the colon operator, :, for creating sequences from one number to another, and the c function for concatenating values and vectors to create longer vectors. To recap:

8.5:4.5                #sequence of numbers from 8.5 down to 4.5
## [1] 8.5 7.5 6.5 5.5 4.5
c(1, 1:3, c(5, 8), 13) #values concatenated into single vector
## [1]  1  1  2  3  5  8 13

The vector function creates a vector of a specified type and length. Each of the values in the result is zero, FALSE, or an empty string, or whatever the equivalent of “nothing” is:

vector("numeric", 5)
## [1] 0 0 0 0 0
vector("complex", 5)
## [1] 0+0i 0+0i 0+0i 0+0i 0+0i
vector("logical", 5)
## [1] FALSE FALSE FALSE FALSE FALSE
vector("character", 5)
## [1] "" "" "" "" ""
vector("list", 5)
## [[1]]
## NULL
##
## [[2]]
## NULL
##
## [[3]]
## NULL
##
## [[4]]
## NULL
##
## [[5]]
## NULL

In that last example, NULL is a special “empty” value (not to be confused with NA, which indicates a missing data point). We’ll look at NULL in detail in Chapter 5. For convenience, wrapper functions exist for each type to save you typing when creating vectors in this way. The following commands are equivalent to the previous ones:

numeric(5)
## [1] 0 0 0 0 0
complex(5)
## [1] 0+0i 0+0i 0+0i 0+0i 0+0i
logical(5)
## [1] FALSE FALSE FALSE FALSE FALSE
character(5)
## [1] "" "" "" "" ""

Oftentimes we may want to access only part of a vector, or perhaps an individual element. This is called indexing and is accomplished with square brackets, []. (Some people also call it subsetting or subscripting or slicing. All these terms refer to the same thing.) R has a very flexible system that gives us several choices of index:

Consider this vector:

x <- (1:5) ^ 2
## [1]  1  4  9 16 25

These three indexing methods return the same values:

x[c(1, 3, 5)]
x[c(-2, -4)]
x[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
## [1]  1  9 25

After naming each element, this method also returns the same values:

names(x) <- c("one", "four", "nine", "sixteen", "twenty five")
x[c("one", "nine", "twenty five")]
##         one        nine twenty five
##           1           9          25

Mixing positive and negative values is not allowed, and will throw an error:

x[c(1, -1)]      #This doesn't make sense!
## Error: only 0's may be mixed with negative subscripts

If you use positive numbers or logical values as the index, then missing indices correspond to missing values in the result:

x[c(1, NA, 5)]
##         one        <NA> twenty five
##           1          NA          25
x[c(TRUE, FALSE, NA, FALSE, TRUE)]
##         one        <NA> twenty five
##           1          NA          25

Missing values don’t make any sense for negative indices, and cause an error:

x[c(-2, NA)]     #This doesn't make sense either!
## Error: only 0's may be mixed with negative subscripts

Out of range indices, beyond the length of the vector, don’t cause an error, but instead return the missing value NA. In practice, it is usually better to make sure that your indices are in range than to use out of range values:

x[6]
## <NA>
##   NA

Noninteger indices are silently rounded toward zero. This is another case where R is arguably too permissive. If you find yourself passing fractions as indices, you are probably writing bad code:

x[1.9]   #1.9 rounded to 1
## one
##   1
x[-1.9]  #-1.9 rounded to -1
##        four        nine     sixteen twenty five
##           4           9          16          25

Not passing any index will return the whole of the vector, but again, if you find yourself not passing any index, then you are probably doing something odd:

x[]
##         one        four        nine     sixteen twenty five
##           1           4           9          16          25

The which function returns the locations where a logical vector is TRUE. This can be useful for switching from logical indexing to integer indexing:

which(x > 10)
##     sixteen twenty five
##           4           5

which.min and which.max are more efficient shortcuts for which(min(x)) and which(max(x)), respectively:

which.min(x)
## one
##   1
which.max(x)
## twenty five
##           5

So far, all the vectors that we have added together have been the same length. You may be wondering, “What happens if I try to do arithmetic on vectors of different lengths?”

If we try to add a single number to a vector, then that number is added to each element of the vector:

1:5 + 1
## [1] 2 3 4 5 6
1 + 1:5
## [1] 2 3 4 5 6

When adding two vectors together, R will recycle elements in the shorter vector to match the longer one:

1:5 + 1:15
##  [1]  2  4  6  8 10  7  9 11 13 15 12 14 16 18 20

If the length of the longer vector isn’t a multiple of the length of the shorter one, a warning will be given:

1:5 + 1:7
## Warning: longer object length is not a multiple of shorter object length
## [1]  2  4  6  8 10  7  9

It must be stressed that just because we can do arithmetic on vectors of different lengths, it doesn’t mean that we should. Adding a scalar value to a vector is okay, but otherwise we are liable to get ourselves confused. It is much better to explicitly create equal-length vectors before we operate on them.

The rep function is very useful for this task, letting us create a vector with repeated elements:

rep(1:5, 3)
##  [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
rep(1:5, each = 3)
##  [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
rep(1:5, times = 1:5)
##  [1] 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5
rep(1:5, length.out = 7)
## [1] 1 2 3 4 5 1 2

Like the seq function, rep has a simpler and faster variant, rep.int, for the most common case:

rep.int(1:5, 3)  #the same as rep(1:5, 3)
##  [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

Recent versions of R (since v3.0.0) also have rep_len, paralleling seq_len, which lets us specify the length of the output vector:

rep_len(1:5, 13)
##  [1] 1 2 3 4 5 1 2 3 4 5 1 2 3

The vector variables that we have looked at so far are one-dimensional objects, since they have length but no other dimensions. Arrays hold multidimensional rectangular data. “Rectangular” means that each row is the same length, and likewise for each column and other dimensions. Matrices are a special case of two-dimensional arrays.

To create an array, you call the array function, passing in a vector of values and a vector of dimensions. Optionally, you can also provide names for each dimension:

(three_d_array <- array(
  1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    c("one", "two", "three", "four"),
    c("ein", "zwei", "drei"),
    c("un", "deux")
  )
))
## , , un
##
##       ein zwei drei
## one     1    5    9
## two     2    6   10
## three   3    7   11
## four    4    8   12
##
## , , deux
##
##       ein zwei drei
## one    13   17   21
## two    14   18   22
## three  15   19   23
## four   16   20   24
class(three_d_array)
## [1] "array"

The syntax for creating matrices is similar, but rather than passing a dim argument, you specify the number of rows or the number of columns:

(a_matrix <- matrix(
  1:12,
  nrow = 4,            #ncol = 3 works the same
  dimnames = list(
    c("one", "two", "three", "four"),
    c("ein", "zwei", "drei")
  )
))
##       ein zwei drei
## one     1    5    9
## two     2    6   10
## three   3    7   11
## four    4    8   12
class(a_matrix)
## [1] "matrix"

This matrix could also be created using the array function. The following two-dimensional array is identical to the matrix that we just created (it even has class matrix):

(two_d_array <- array(
  1:12,
  dim = c(4, 3),
  dimnames = list(
    c("one", "two", "three", "four"),
    c("ein", "zwei", "drei")
  )
))
##       ein zwei drei
## one     1    5    9
## two     2    6   10
## three   3    7   11
## four    4    8   12
identical(two_d_array, a_matrix)
## [1] TRUE
class(two_d_array)
## [1] "matrix"

When you create a matrix, the values that you passed in fill the matrix column-wise. It is also possible to fill the matrix row-wise by specifying the argument byrow = TRUE:

matrix(
  1:12,
  nrow = 4,
  byrow = TRUE,
  dimnames = list(
    c("one", "two", "three", "four"),
    c("ein", "zwei", "drei")
  )
)
##       ein zwei drei
## one     1    2    3
## two     4    5    6
## three   7    8    9
## four   10   11   12

The standard arithmetic operators (+, -, \*, /) work element-wise on matrices and arrays, just they like they do on vectors:

a_matrix + another_matrix
##       ein zwei drei
## one     3   15   27
## two     6   18   30
## three   9   21   33
## four   12   24   36
a_matrix * another_matrix
##       ein zwei drei
## one     2   50  162
## two     8   72  200
## three  18   98  242
## four   32  128  288

When performing arithmetic on two arrays, you need to make sure that they are of an appropriate size (they must be “conformable,” in linear algebra terminology). For example, both arrays must be the same size when adding, and for multiplication the number of rows in the first matrix must be the same as the number of columns in the second matrix:

(another_matrix <- matrix(1:12, nrow = 2))
a_matrix + another_matrix   #adding nonconformable matrices throws an error

If you try to add a vector to an array, then the usual vector recycling rules apply, but the dimension of the results is taken from the array.

The t function transposes matrices (but not higher-dimensional arrays, where the concept isn’t well defined):

t(a_matrix)
##      one two three four
## ein    1   2     3    4
## zwei   5   6     7    8
## drei   9  10    11   12

For inner and outer matrix multiplication, we have the special operators %*% and %o%. In each case, the dimension names are taken from the first input, if they exist:

a_matrix %*% t(a_matrix)  #inner multiplication
##       one two three four
## one   107 122   137  152
## two   122 140   158  176
## three 137 158   179  200
## four  152 176   200  224
1:3 %o% 4:6               #outer multiplication
##      [,1] [,2] [,3]
## [1,]    4    5    6
## [2,]    8   10   12
## [3,]   12   15   18
outer(1:3, 4:6)           #same
##      [,1] [,2] [,3]
## [1,]    4    5    6
## [2,]    8   10   12
## [3,]   12   15   18

The power operator, ^, also works element-wise on matrices, so to invert a matrix you cannot simply raise it to the power of minus one. Instead, this can be done using the solve function:[16]

(m <- matrix(c(1, 0, 1, 5, -3, 1, 2, 4, 7), nrow = 3))
##      [,1] [,2] [,3]
## [1,]    1    5    2
## [2,]    0   -3    4
## [3,]    1    1    7
m ^ -1
##      [,1]    [,2]   [,3]
## [1,]    1  0.2000 0.5000
## [2,]  Inf -0.3333 0.2500
## [3,]    1  1.0000 0.1429
(inverse_of_m <- solve(m))
##      [,1] [,2] [,3]
## [1,]  -25  -33   26
## [2,]    4    5   -4
## [3,]    3    4   -3
m %*% inverse_of_m
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1


[15] Lengths are limited to 2^31-1 elements on 32-bit systems and versions of R prior to 3.0.0.

[16] qr.solve(m) and chol2inv(chol(m)) provide alternative algorithms for inverting matrices, but solve should be your first port of call.