Chapter 4. Vectors, Matrices, and Arrays

In Chapters 1 and 2, we saw several types of vectors for logical values, character strings, and of course numbers. This chapter shows you more manipulation techniques for vectors and introduces their multidimensional brethren, matrices and arrays.

Chapter Goals

After reading this chapter, you should:

Be able to create new vectors from existing vectors
Understand lengths, dimensions, and names
Be able to create and manipulate matrices and arrays

Vectors

So far, you have used the colon operator, :, for creating sequences from one number to another, and the c function for concatenating values and vectors to create longer vectors. To recap:

8.5:4.5                #sequence of numbers from 8.5 down to 4.5

## [1] 8.5 7.5 6.5 5.5 4.5

c(1, 1:3, c(5, 8), 13) #values concatenated into single vector

## [1]  1  1  2  3  5  8 13

The vector function creates a vector of a specified type and length. Each value in the result is zero, FALSE, an empty string, or whatever the equivalent of “nothing” is for that type:

vector("numeric", 5)

## [1] 0 0 0 0 0

vector("complex", 5)

## [1] 0+0i 0+0i 0+0i 0+0i 0+0i

vector("logical", 5)

## [1] FALSE FALSE FALSE FALSE FALSE

vector("character", 5)

## [1] "" "" "" "" ""

vector("list", 5)

## [[1]]
## NULL
##
## [[2]]
## NULL
##
## [[3]]
## NULL
##
## [[4]]
## NULL
##
## [[5]]
## NULL

In that last example, NULL is a special “empty” value (not to be confused with NA, which indicates a missing data point). We’ll look at NULL in detail in Chapter 5. For convenience, wrapper functions exist for each type to save you typing when creating vectors in this way. The following commands are equivalent to the previous ones:

numeric(5)

## [1] 0 0 0 0 0

complex(5)

## [1] 0+0i 0+0i 0+0i 0+0i 0+0i

logical(5)

## [1] FALSE FALSE FALSE FALSE FALSE

character(5)

## [1] "" "" "" "" ""
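To sanity-check that equivalence, identical can compare each wrapper with the corresponding vector call. This is a minimal sketch; the variable name wrappers_match is just for illustration:

```r
# TRUE for each type if the wrapper gives exactly the same result as vector()
wrappers_match <- c(
  numeric   = identical(numeric(5), vector("numeric", 5)),
  complex   = identical(complex(5), vector("complex", 5)),
  logical   = identical(logical(5), vector("logical", 5)),
  character = identical(character(5), vector("character", 5))
)
wrappers_match
```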

Note

As we’ll see in the next chapter, the list function does not work the same way. list(5) creates something a little different.
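Here is a small preview of that difference. It uses only length from this chapter, plus the [[ ]] list indexing that the next chapter covers in detail:

```r
# vector("list", 5) gives a list of five empty (NULL) elements
empty_slots <- vector("list", 5)
length(empty_slots)     # 5
# list(5) gives a list with a single element: the number 5
one_number <- list(5)
length(one_number)      # 1
one_number[[1]]         # 5
```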

Sequences

Beyond the colon operator, there are several functions for creating more general sequences. The seq function is the most general, and allows you to specify sequences in many different ways. In practice, though, you will rarely need to call it, since there are three other specialist sequence functions that are faster and easier to use, covering specific use cases.

seq.int lets us create a sequence from one number to another. With two inputs, it works exactly like the colon operator:

seq.int(3, 12)     #same as 3:12

##  [1]  3  4  5  6  7  8  9 10 11 12

seq.int is slightly more general than :, since it lets you specify how far apart intermediate values should be:

seq.int(3, 12, 2)

## [1]  3  5  7  9 11

seq.int(0.1, 0.01, -0.01)

##  [1] 0.10 0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01

seq_len creates a sequence from 1 up to its input, so seq_len(5) is just a clunkier way of writing 1:5. However, the function is extremely useful for situations when its input could be zero:

n <- 0
1:n        #not what you might expect!

## [1] 1 0

seq_len(n)

## integer(0)
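The difference matters most in for loops. With n equal to zero, 1:n silently runs the loop body twice, for i = 1 and then i = 0. The following sketch counts how many iterations each version performs:

```r
n <- 0
# Count how many times each loop body runs
iterations_colon <- 0
for (i in 1:n) iterations_colon <- iterations_colon + 1
iterations_seq_len <- 0
for (i in seq_len(n)) iterations_seq_len <- iterations_seq_len + 1
iterations_colon    # 2: the body ran for i = 1 and i = 0
iterations_seq_len  # 0: the body never ran
```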

seq_along creates a sequence from 1 up to the length of its input:

pp <- c("Peter", "Piper", "picked", "a", "peck", "of", "pickled", "peppers")
for(i in seq_along(pp)) print(pp[i])

## [1] "Peter"
## [1] "Piper"
## [1] "picked"
## [1] "a"
## [1] "peck"
## [1] "of"
## [1] "pickled"
## [1] "peppers"

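For most of the preceding examples, you can replace seq.int, seq_len, or seq_along with plain seq and get the same answer, though there is no need to do so. The exception is the zero case: like 1:0, seq(0) returns 1 0 rather than an empty vector, which is exactly the pitfall that seq_len avoids.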
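Plain seq does not always match seq_len. When the input is zero, the results differ, as this quick check shows:

```r
n <- 0
seq(n)       # behaves like 1:n, giving 1 0
seq_len(n)   # integer(0), an empty vector
```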

Lengths

I’ve just sneakily introduced a new concept related to vectors. That is, all vectors have a length, which tells us how many elements they contain. This is a nonnegative integer^[15] (yes, zero-length vectors are allowed), and you can access this value with the length function. Missing values still count toward the length:

length(1:5)

## [1] 5

length(c(TRUE, FALSE, NA))

## [1] 3

One possible source of confusion is character vectors. With these, the length is the number of strings, not the number of characters in each string. For that, we should use nchar:

sn <- c("Sheena", "leads", "Sheila", "needs")
length(sn)

## [1] 4

nchar(sn)

## [1] 6 5 6 5

It is also possible to assign a new length to a vector, but this is an unusual thing to do, and probably indicates bad code. If you shorten a vector, the values at the end will be removed, and if you extend a vector, missing values will be added to the end:

poincare <- c(1, 0, 0, 0, 2, 0, 2, 0)  #See http://oeis.org/A051629
length(poincare) <- 3
poincare

## [1] 1 0 0

length(poincare) <- 8
poincare

## [1]  1  0  0 NA NA NA NA NA

Names

A great feature of R’s vectors is that each element can be given a name. Labeling the elements can often make your code much more readable. You can specify names when you create a vector in the form name = value. If the name of an element is a valid variable name, it doesn’t need to be enclosed in quotes. You can name some elements of a vector and leave others blank:

c(apple = 1, banana = 2, "kiwi fruit" = 3, 4)

##      apple     banana kiwi fruit
##          1          2          3          4

You can add element names to a vector after its creation using the names function:

x <- 1:4
names(x) <- c("apple", "bananas", "kiwi fruit", "")
x

##      apple    bananas kiwi fruit
##          1          2          3          4

This names function can also be used to retrieve the names of a vector:

names(x)

## [1] "apple"      "bananas"    "kiwi fruit" ""

If a vector has no element names, then the names function returns NULL:

names(1:4)

## NULL

Indexing Vectors

Oftentimes we may want to access only part of a vector, or perhaps an individual element. This is called indexing and is accomplished with square brackets, []. (Some people also call it subsetting or subscripting or slicing. All these terms refer to the same thing.) R has a very flexible system that gives us several choices of index:

Passing a vector of positive numbers returns the slice of the vector containing the elements at those locations. The first position is 1 (not 0, as in some other languages).
Passing a vector of negative numbers returns the slice of the vector containing the elements everywhere except at those locations.
Passing a logical vector returns the slice of the vector containing the elements where the index is TRUE.
For named vectors, passing a character vector of names returns the slice of the vector containing the elements with those names.

Consider this vector:

(x <- (1:5) ^ 2)

## [1]  1  4  9 16 25

These three indexing methods return the same values:

x[c(1, 3, 5)]

x[c(-2, -4)]

x[c(TRUE, FALSE, TRUE, FALSE, TRUE)]

## [1]  1  9 25

After naming each element, this method also returns the same values:

names(x) <- c("one", "four", "nine", "sixteen", "twenty five")
x[c("one", "nine", "twenty five")]

##         one        nine twenty five
##           1           9          25

Mixing positive and negative values is not allowed, and will throw an error:

x[c(1, -1)]      #This doesn't make sense!

## Error: only 0's may be mixed with negative subscripts

If you use positive numbers or logical values as the index, then missing indices correspond to missing values in the result:

x[c(1, NA, 5)]

##         one        <NA> twenty five
##           1          NA          25

x[c(TRUE, FALSE, NA, FALSE, TRUE)]

##         one        <NA> twenty five
##           1          NA          25

Missing values don’t make any sense for negative indices, and cause an error:

x[c(-2, NA)]     #This doesn't make sense either!

## Error: only 0's may be mixed with negative subscripts

Out-of-range indices, beyond the length of the vector, don’t cause an error but instead return the missing value NA. In practice, it is usually better to make sure that your indices are in range than to rely on this behavior:

x[6]

## <NA>
##   NA

Noninteger indices are silently rounded toward zero. This is another case where R is arguably too permissive. If you find yourself passing fractions as indices, you are probably writing bad code:

x[1.9]   #1.9 rounded to 1

## one
##   1

x[-1.9]  #-1.9 rounded to -1

##        four        nine     sixteen twenty five
##           4           9          16          25

Not passing any index will return the whole of the vector, but again, if you find yourself not passing any index, then you are probably doing something odd:

x[]

##         one        four        nine     sixteen twenty five
##           1           4           9          16          25

The which function returns the locations where a logical vector is TRUE. This can be useful for switching from logical indexing to integer indexing:

which(x > 10)

##     sixteen twenty five
##           4           5
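You can check that the two kinds of index select the same elements by comparing a logical index with the integer locations that which returns. One difference is worth knowing: which treats NA as FALSE, so missing values simply drop out. A short sketch (the vector has_na is made up for illustration):

```r
x <- (1:5) ^ 2
names(x) <- c("one", "four", "nine", "sixteen", "twenty five")
big <- x > 10                  # logical index
big_locations <- which(big)    # integer index: 4 and 5
identical(x[big], x[big_locations])   # TRUE: same elements, same names
# which() ignores NA, so only the TRUE positions are returned
has_na <- c(TRUE, NA, FALSE, TRUE)
which(has_na)                  # 1 4
```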

which.min and which.max are more efficient shortcuts for which(x == min(x)) and which(x == max(x)), respectively, except that they return only the first location when there are ties:

which.min(x)

## one
##   1

which.max(x)

## twenty five
##           5
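Ties are where the behavior differs: which.min and which.max stop at the first matching location, whereas the which form returns all of them. A sketch with a made-up vector y that has a repeated minimum:

```r
y <- c(3, 1, 4, 1, 5)
which(y == min(y))   # every location of the minimum: 2 4
which.min(y)         # only the first location: 2
which.max(y)         # 5
```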

Vector Recycling and Repetition

So far, all the vectors that we have added together have been the same length. You may be wondering, “What happens if I try to do arithmetic on vectors of different lengths?”

If we try to add a single number to a vector, then that number is added to each element of the vector:

1:5 + 1

## [1] 2 3 4 5 6

1 + 1:5

## [1] 2 3 4 5 6

When adding two vectors together, R will recycle elements in the shorter vector to match the longer one:

1:5 + 1:15

##  [1]  2  4  6  8 10  7  9 11 13 15 12 14 16 18 20

If the length of the longer vector isn’t a multiple of the length of the shorter one, a warning will be given:

1:5 + 1:7

## Warning: longer object length is not a multiple of shorter object length

## [1]  2  4  6  8 10  7  9

It must be stressed that just because we can do arithmetic on vectors of different lengths, it doesn’t mean that we should. Adding a scalar value to a vector is okay, but otherwise we are liable to get ourselves confused. It is much better to explicitly create equal-length vectors before we operate on them.

The rep function is very useful for this task, letting us create a vector with repeated elements:

rep(1:5, 3)

##  [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

rep(1:5, each = 3)

##  [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5

rep(1:5, times = 1:5)

##  [1] 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5

rep(1:5, length.out = 7)

## [1] 1 2 3 4 5 1 2
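Following the earlier advice, rep can make the recycling in the 1:5 + 1:15 example explicit, so both operands have the same length before they are added. This sketch checks that the result is unchanged:

```r
implicit <- 1:5 + 1:15                          # relies on recycling
explicit <- rep(1:5, length.out = 15) + 1:15    # lengths match up front
identical(implicit, explicit)                   # TRUE
```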

Like the seq function, rep has a simpler and faster variant, rep.int, for the most common case:

rep.int(1:5, 3)  #the same as rep(1:5, 3)

##  [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

Recent versions of R (since v3.0.0) also have rep_len, paralleling seq_len, which lets us specify the length of the output vector:

rep_len(1:5, 13)

##  [1] 1 2 3 4 5 1 2 3 4 5 1 2 3

Matrices and Arrays

The vector variables that we have looked at so far are one-dimensional objects, since they have length but no other dimensions. Arrays hold multidimensional rectangular data. “Rectangular” means that each row is the same length, and likewise for each column and other dimensions. A matrix is simply the special case of a two-dimensional array.

Creating Arrays and Matrices

To create an array, you call the array function, passing in a vector of values and a vector of dimensions. Optionally, you can also provide names for each dimension:

(three_d_array <- array(
  1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    c("one", "two", "three", "four"),
    c("ein", "zwei", "drei"),
    c("un", "deux")
  )
))

## , , un
##
##       ein zwei drei
## one     1    5    9
## two     2    6   10
## three   3    7   11
## four    4    8   12
##
## , , deux
##
##       ein zwei drei
## one    13   17   21
## two    14   18   22
## three  15   19   23
## four   16   20   24

class(three_d_array)

## [1] "array"

The syntax for creating matrices is similar, but rather than passing a dim argument, you specify the number of rows or the number of columns:

(a_matrix <- matrix(
  1:12,
  nrow = 4,            #ncol = 3 works the same
  dimnames = list(
    c("one", "two", "three", "four"),
    c("ein", "zwei", "drei")
  )
))

##       ein zwei drei
## one     1    5    9
## two     2    6   10
## three   3    7   11
## four    4    8   12

class(a_matrix)

## [1] "matrix" "array"

This matrix could also be created using the array function. The following two-dimensional array is identical to the matrix that we just created (it even has class matrix):

(two_d_array <- array(
  1:12,
  dim = c(4, 3),
  dimnames = list(
    c("one", "two", "three", "four"),
    c("ein", "zwei", "drei")
  )
))

##       ein zwei drei
## one     1    5    9
## two     2    6   10
## three   3    7   11
## four    4    8   12

identical(two_d_array, a_matrix)

## [1] TRUE

class(two_d_array)

## [1] "matrix" "array"

When you create a matrix, the values that you passed in fill the matrix column-wise. It is also possible to fill the matrix row-wise by specifying the argument byrow = TRUE:

matrix(
  1:12,
  nrow = 4,
  byrow = TRUE,
  dimnames = list(
    c("one", "two", "three", "four"),
    c("ein", "zwei", "drei")
  )
)

##       ein zwei drei
## one     1    2    3
## two     4    5    6
## three   7    8    9
## four   10   11   12
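The column-wise fill rule can also be stated as arithmetic. With the default fill, the kth value lands in row ((k - 1) %% nrow) + 1 and column ((k - 1) %/% nrow) + 1. The sketch below checks this for one element using position_of, a hypothetical helper written just for this illustration. It also confirms that byrow = TRUE gives the same result as filling a matrix with the dimensions swapped and then transposing it with t (which appears later in this chapter):

```r
col_filled <- matrix(1:12, nrow = 4)
# Hypothetical helper: where does the kth value land in a column-filled matrix?
position_of <- function(k, nrow) {
  c(row = (k - 1) %% nrow + 1, col = (k - 1) %/% nrow + 1)
}
position_of(7, nrow = 4)    # row 3, column 2
col_filled[3, 2]            # 7
row_filled <- matrix(1:12, nrow = 4, byrow = TRUE)
identical(row_filled, t(matrix(1:12, nrow = 3)))   # TRUE
```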

Rows, Columns, and Dimensions

For both matrices and arrays, the dim function returns an integer vector giving the size of each dimension of the variable:

dim(three_d_array)

## [1] 4 3 2

dim(a_matrix)

## [1] 4 3

For matrices, the functions nrow and ncol return the number of rows and columns, respectively:

nrow(a_matrix)

## [1] 4

ncol(a_matrix)

## [1] 3

nrow and ncol also work on arrays, returning the first and second dimensions, respectively, but it is usually better to use dim for higher-dimensional objects:

nrow(three_d_array)

## [1] 4

ncol(three_d_array)

## [1] 3

The length function that we have previously used with vectors also works on matrices and arrays. In this case it returns the product of the dimensions, which is the total number of elements:

length(three_d_array)

## [1] 24

length(a_matrix)

## [1] 12
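Put another way, length always equals the product of dim. Here is a one-line check on a fresh 4-by-3-by-2 array, called arr here so that three_d_array is left alone:

```r
arr <- array(1:24, dim = c(4, 3, 2))
length(arr) == prod(dim(arr))   # TRUE: 4 * 3 * 2 = 24
```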

We can also reshape a matrix or array by assigning a new dimension with dim. This should be used with caution since it strips dimension names:

dim(a_matrix) <- c(6, 2)
a_matrix

##      [,1] [,2]
## [1,]    1    7
## [2,]    2    8
## [3,]    3    9
## [4,]    4   10
## [5,]    5   11
## [6,]    6   12

nrow, ncol, and dim return NULL when applied to vectors. The functions NROW and NCOL are counterparts to nrow and ncol that pretend vectors are matrices with a single column (that is, column vectors in the mathematical sense):

identical(nrow(a_matrix), NROW(a_matrix))

## [1] TRUE

identical(ncol(a_matrix), NCOL(a_matrix))

## [1] TRUE

recaman <- c(0, 1, 3, 6, 2, 7, 13, 20)
nrow(recaman)

## NULL

NROW(recaman)

## [1] 8

ncol(recaman)

## NULL

NCOL(recaman)

## [1] 1

dim(recaman)

## NULL

Row, Column, and Dimension Names

In the same way that vectors have names for their elements, matrices have rownames and colnames for their rows and columns. For historical reasons, there is also a function called row.names that does the same thing as rownames, but there is no corresponding col.names, so it is better to ignore it and use rownames instead. As with nrow, ncol, and dim, the equivalent function for arrays is dimnames, which returns a list (see Lists) of character vectors. In the following code chunk, a_matrix has been restored to its previous state, before its dimensions were changed:

rownames(a_matrix)

## [1] "one"   "two"   "three" "four"

colnames(a_matrix)

## [1] "ein"  "zwei" "drei"

dimnames(a_matrix)

## [[1]]
## [1] "one"   "two"   "three" "four"
##
## [[2]]
## [1] "ein"  "zwei" "drei"

rownames(three_d_array)

## [1] "one"   "two"   "three" "four"

colnames(three_d_array)

## [1] "ein"  "zwei" "drei"

dimnames(three_d_array)

## [[1]]
## [1] "one"   "two"   "three" "four"
##
## [[2]]
## [1] "ein"  "zwei" "drei"
##
## [[3]]
## [1] "un"   "deux"
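Like names, these functions have assignment forms, so you can label rows and columns after a matrix has been created. A minimal sketch using a small, hypothetical unnamed matrix m:

```r
m <- matrix(1:6, nrow = 2)
rownames(m) <- c("a", "b")
colnames(m) <- c("x", "y", "z")
m
dimnames(m)   # the same labels, gathered into a list
```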

Indexing Arrays

Indexing works just like it does with vectors, except that now we have to specify an index for more than one dimension. As before, we use square brackets to denote an index, and we still have four choices for specifying the index (positive integers, negative integers, logical values, and element names). It is perfectly permissible to specify the indices for different dimensions in different ways. The indices for each dimension are separated by commas:

a_matrix[1, c("zwei", "drei")] #elements in 1st row, 2nd and 3rd columns

## zwei drei
##    5    9

To include all of a dimension, leave the corresponding index blank:

a_matrix[1, ]                  #all of the first row

##  ein zwei drei
##    1    5    9

a_matrix[, c("zwei", "drei")]  #all of the second and third columns

##       zwei drei
## one      5    9
## two      6   10
## three    7   11
## four     8   12
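The same comma-separated syntax extends to arrays with more dimensions, and each dimension can use a different kind of index. This sketch recreates three_d_array so that it runs on its own:

```r
three_d_array <- array(
  1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    c("one", "two", "three", "four"),
    c("ein", "zwei", "drei"),
    c("un", "deux")
  )
)
three_d_array[2, "zwei", "deux"]   # a single element: 18
# Logical rows, negative columns, and a named layer, all at once
three_d_array[c(TRUE, FALSE, TRUE, FALSE), -1, "un"]
```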

Combining Matrices

The c function converts matrices to vectors before concatenating them:

(another_matrix <- matrix(
  seq.int(2, 24, 2),
  nrow = 4,
  dimnames = list(
    c("five", "six", "seven", "eight"),
    c("vier", "funf", "sechs")
  )
))

##       vier funf sechs
## five     2   10    18
## six      4   12    20
## seven    6   14    22
## eight    8   16    24

c(a_matrix, another_matrix)

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12  2  4  6  8 10 12 14 16 18 20 22
## [24] 24

More natural combining of matrices can be achieved by using cbind and rbind, which bind matrices together by columns and rows, respectively:

cbind(a_matrix, another_matrix)

##       ein zwei drei vier funf sechs
## one     1    5    9    2   10    18
## two     2    6   10    4   12    20
## three   3    7   11    6   14    22
## four    4    8   12    8   16    24

rbind(a_matrix, another_matrix)

##       ein zwei drei
## one     1    5    9
## two     2    6   10
## three   3    7   11
## four    4    8   12
## five    2   10   18
## six     4   12   20
## seven   6   14   22
## eight   8   16   24

Array Arithmetic

The standard arithmetic operators (+, -, *, /) work element-wise on matrices and arrays, just like they do on vectors:

a_matrix + another_matrix

##       ein zwei drei
## one     3   15   27
## two     6   18   30
## three   9   21   33
## four   12   24   36

a_matrix * another_matrix

##       ein zwei drei
## one     2   50  162
## two     8   72  200
## three  18   98  242
## four   32  128  288

When performing arithmetic on two arrays, you need to make sure that they are of an appropriate size (they must be “conformable,” in linear algebra terminology). For element-wise operations such as addition, both arrays must be the same size, and for matrix multiplication (with the %*% operator, described below) the number of columns in the first matrix must be the same as the number of rows in the second matrix:

(another_matrix <- matrix(1:12, nrow = 2))

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    3    5    7    9   11
## [2,]    2    4    6    8   10   12

a_matrix + another_matrix   #adding nonconformable matrices throws an error

## Error: non-conformable arrays
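Matrix multiplication with %*% follows a different rule from addition: only the inner dimensions have to agree. A sketch with two hypothetical matrices, left and right:

```r
left  <- matrix(1:12, nrow = 4)   # 4 rows, 3 columns
right <- matrix(1:6, nrow = 3)    # 3 rows, 2 columns
ncol(left) == nrow(right)         # TRUE, so left %*% right is allowed
dim(left %*% right)               # 4 2: rows of left, columns of right
# right %*% left would fail: ncol(right) is 2 but nrow(left) is 4
```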

If you try to add a vector to an array, then the usual vector recycling rules apply, but the dimension of the results is taken from the array.
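Here is a small sketch of that rule. A length-2 vector added to a 2-by-3 matrix is recycled down each column in turn, and the result keeps the matrix's dimensions:

```r
m <- matrix(1:6, nrow = 2)
(shifted <- m + c(10, 20))   # c(10, 20) is recycled down each column
dim(shifted)                 # 2 3: taken from the matrix
```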

The t function transposes matrices (but not higher-dimensional arrays, where the concept isn’t well defined):

t(a_matrix)

##      one two three four
## ein    1   2     3    4
## zwei   5   6     7    8
## drei   9  10    11   12

For inner and outer matrix multiplication, we have the special operators %*% and %o%. For %*%, the row names of the result are taken from the first input and the column names from the second, if they exist:

a_matrix %*% t(a_matrix)  #inner multiplication

##       one two three four
## one   107 122   137  152
## two   122 140   158  176
## three 137 158   179  200
## four  152 176   200  224

1:3 %o% 4:6               #outer multiplication

##      [,1] [,2] [,3]
## [1,]    4    5    6
## [2,]    8   10   12
## [3,]   12   15   18

outer(1:3, 4:6)           #same

##      [,1] [,2] [,3]
## [1,]    4    5    6
## [2,]    8   10   12
## [3,]   12   15   18
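To see where the dimension names of a %*% product come from, you can multiply two small matrices whose row and column names all differ. The matrices a and b here are hypothetical examples:

```r
a <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
b <- matrix(5:8, nrow = 2, dimnames = list(c("s1", "s2"), c("k1", "k2")))
ab <- a %*% b
rownames(ab)   # "r1" "r2": from the first input
colnames(ab)   # "k1" "k2": from the second input
```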

The power operator, ^, also works element-wise on matrices, so to invert a matrix you cannot simply raise it to the power of minus one. Instead, this can be done using the solve function:^[16]

(m <- matrix(c(1, 0, 1, 5, -3, 1, 2, 4, 7), nrow = 3))

##      [,1] [,2] [,3]
## [1,]    1    5    2
## [2,]    0   -3    4
## [3,]    1    1    7

m ^ -1

##      [,1]    [,2]   [,3]
## [1,]    1  0.2000 0.5000
## [2,]  Inf -0.3333 0.2500
## [3,]    1  1.0000 0.1429

(inverse_of_m <- solve(m))

##      [,1] [,2] [,3]
## [1,]  -25  -33   26
## [2,]    4    5   -4
## [3,]    3    4   -3

m %*% inverse_of_m

##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
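Because solve works in floating-point arithmetic, the identity matrix printed above may hide tiny rounding errors, so testing the product with identical could be fragile. A more forgiving check, sketched here, uses all.equal (which compares numbers up to a small tolerance) against diag(3), the 3-by-3 identity matrix:

```r
m <- matrix(c(1, 0, 1, 5, -3, 1, 2, 4, 7), nrow = 3)
inverse_of_m <- solve(m)
isTRUE(all.equal(m %*% inverse_of_m, diag(3)))   # TRUE, up to rounding
```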

Summary

seq and its variants let you create sequences of numbers.
Vectors have a length that can be accessed or set with the length function.
You can name elements of vectors, either when they are created or with the names function.
You can access slices of a vector by passing an index into square brackets.
The rep function creates a vector with repeated elements.
Arrays are multidimensional objects, with matrices being the special case of two-dimensional arrays.
nrow, ncol, and dim provide ways of accessing the dimensions of an array.
Likewise, rownames, colnames, and dimnames access the names of array dimensions.

Test Your Knowledge: Quiz

Question 4-1: How would you create a vector containing the values 0, 0.25, 0.5, 0.75, and 1?
Question 4-2: Describe two ways of naming elements in a vector.
Question 4-3: What are the four types of index for a vector?
Question 4-4: What is the length of a 3-by-4-by-5 array?
Question 4-5: Which operator would you use to perform an inner product on two matrices?

Test Your Knowledge: Exercises

Exercise 4-1

The nth triangular number is given by n * (n + 1) / 2. Create a sequence of the first 20 triangular numbers. R has a built-in constant, letters, that contains the lowercase letters of the Roman alphabet. Name the elements of the vector that you just created with the first 20 letters of the alphabet. Select the triangular numbers where the name is a vowel. [10]

Exercise 4-2

The diag function has several uses, one of which is to take a vector as its input and create a square matrix with that vector on the diagonal. Create a 21-by-21 matrix with the sequence 10 to 0 to 10 (i.e., 10, 9, … , 1, 0, 1, …, 10). [5]

Exercise 4-3

By passing two extra arguments to diag, you can specify the dimensions of the output. Create a 20-by-21 matrix with ones on the main diagonal. Now add a row of zeros above this to create a 21-by-21 square matrix, where the ones are offset a row below the main diagonal.

Create another matrix with the ones offset one up from the diagonal.

Add these two matrices together, then add the answer from Exercise 4-2. The resultant matrix is called a Wilkinson matrix.

The eigen function calculates eigenvalues and eigenvectors of a matrix. Calculate the eigenvalues for your Wilkinson matrix. What do you notice about them? [20]

^[15] Lengths are limited to 2^31-1 elements on 32-bit systems and versions of R prior to 3.0.0.

^[16] qr.solve(m) and chol2inv(chol(m)) provide alternative algorithms for inverting matrices, but solve should be your first port of call.