In Chapters 1 and 2, we saw several types of vectors for logical values, character strings, and of course numbers. This chapter shows you more manipulation techniques for vectors and introduces their multidimensional brethren, matrices and arrays.
After reading this chapter, you should:
So far, you have used the colon operator, :, for creating sequences from one number to another, and the c function for concatenating values and vectors to create longer vectors. To recap:
8.5:4.5 #sequence of numbers from 8.5 down to 4.5
## [1] 8.5 7.5 6.5 5.5 4.5
c(1, 1:3, c(5, 8), 13) #values concatenated into single vector
## [1] 1 1 2 3 5 8 13
The vector function creates a vector of a specified type and length. Each value in the result is zero, FALSE, an empty string, or whatever the equivalent of “nothing” is for that type:
vector("numeric", 5)
## [1] 0 0 0 0 0
vector("complex", 5)
## [1] 0+0i 0+0i 0+0i 0+0i 0+0i
vector("logical", 5)
## [1] FALSE FALSE FALSE FALSE FALSE
vector("character", 5)
## [1] "" "" "" "" ""
vector("list", 5)
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL
## 
## [[5]]
## NULL
In that last example, NULL is a special “empty” value (not to be confused with NA, which indicates a missing data point). We’ll look at NULL in detail in Chapter 5. For convenience, wrapper functions exist for each type to save you typing when creating vectors in this way. The following commands are equivalent to the previous ones:
numeric(5)
## [1] 0 0 0 0 0
complex(5)
## [1] 0+0i 0+0i 0+0i 0+0i 0+0i
logical(5)
## [1] FALSE FALSE FALSE FALSE FALSE
character(5)
## [1] "" "" "" "" ""
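To check the claim that the wrappers are equivalent, identical can compare each wrapper call with its vector counterpart. This is a small sketch covering only the four types shown above:

```r
# Each wrapper should return exactly the same object as the vector() call
wrappers_match <- c(
  numeric   = identical(numeric(5), vector("numeric", 5)),
  complex   = identical(complex(5), vector("complex", 5)),
  logical   = identical(logical(5), vector("logical", 5)),
  character = identical(character(5), vector("character", 5))
)
wrappers_match  # all four elements are TRUE
```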
As we’ll see in the next chapter, the list function does not work the same way: list(5) creates something a little different.
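As a quick preview of that difference, the following sketch compares the lengths of the two results. It uses [[ ]] to pull out a list element, which is covered properly alongside lists:

```r
# vector("list", 5) creates a list with five empty (NULL) elements
five_empty <- vector("list", 5)
length(five_empty)  # 5

# list(5) creates a one-element list whose only element is the number 5
one_five <- list(5)
length(one_five)    # 1
one_five[[1]]       # 5
```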
Beyond the colon operator, there are several functions for creating more general sequences. The seq function is the most general, and allows you to specify sequences in many different ways. In practice, though, you will rarely need to call it, since three specialist sequence functions, seq.int, seq_len, and seq_along, cover the common use cases and are faster and easier to use.
seq.int lets us create a sequence from one number to another. With two inputs, it works exactly like the colon operator:
seq.int(3, 12) #same as 3:12
## [1] 3 4 5 6 7 8 9 10 11 12
seq.int is slightly more general than :, since it lets you specify how far apart intermediate values should be:
seq.int(3, 12, 2)
## [1] 3 5 7 9 11
seq.int(0.1, 0.01, -0.01)
## [1] 0.10 0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01
seq_len creates a sequence from 1 up to its input, so seq_len(5) is just a clunkier way of writing 1:5. However, the function is extremely useful when its input could be zero:
n <- 0
1:n #not what you might expect!
## [1] 1 0
seq_len(n)
## integer(0)
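The difference matters most in loops. A minimal sketch, assuming n is zero as above, that counts how many times the loop body runs with each approach:

```r
n <- 0

# Count how many times a loop over 1:n runs
colon_runs <- 0
for (i in 1:n) colon_runs <- colon_runs + 1

# Count how many times a loop over seq_len(n) runs
seq_len_runs <- 0
for (i in seq_len(n)) seq_len_runs <- seq_len_runs + 1

colon_runs    # 2: the body ran for i = 1 and then for i = 0
seq_len_runs  # 0: the body never ran
```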
seq_along creates a sequence from 1 up to the length of its input:
pp <- c("Peter", "Piper", "picked", "a", "peck", "of", "pickled", "peppers")
for(i in seq_along(pp)) print(pp[i])
## [1] "Peter"
## [1] "Piper"
## [1] "picked"
## [1] "a"
## [1] "peck"
## [1] "of"
## [1] "pickled"
## [1] "peppers"
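For each of the preceding examples, you can replace seq.int, seq_len, or seq_along with the equivalent call to plain seq (seq(3, 12), seq(length.out = n), or seq(along.with = pp)) and get the same answer, though there is no need to do so. Be careful with the one-argument form: seq(n) behaves like 1:n, so seq(0) returns 1 0 rather than an empty vector.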
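The following sketch checks those equivalences, reusing the pp vector and a zero-valued n from the earlier examples:

```r
pp <- c("Peter", "Piper", "picked", "a", "peck", "of", "pickled", "peppers")
n <- 0

# seq with a named argument reproduces the specialist functions
identical(seq(length.out = n), seq_len(n))      # TRUE
identical(seq(along.with = pp), seq_along(pp))  # TRUE

# With a single unnamed number, seq behaves like 1:n instead
seq(n)                                          # 1 0
```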
I’ve just sneakily introduced a new concept related to vectors: all vectors have a length, which tells us how many elements they contain. This is a nonnegative integer[15] (yes, zero-length vectors are allowed), and you can access this value with the length function. Missing values still count toward the length:
length(1:5)
## [1] 5
length(c(TRUE, FALSE, NA))
## [1] 3
One possible source of confusion is character vectors. With these, the length is the number of strings, not the number of characters in each string. For that, we should use nchar:
sn <- c("Sheena", "leads", "Sheila", "needs")
length(sn)
## [1] 4
nchar(sn)
## [1] 6 5 6 5
It is also possible to assign a new length to a vector, but this is an unusual thing to do, and probably indicates bad code. If you shorten a vector, the values at the end will be removed, and if you extend a vector, missing values will be added to the end:
poincare <- c(1, 0, 0, 0, 2, 0, 2, 0) #See http://oeis.org/A051629
length(poincare) <- 3
poincare
## [1] 1 0 0
length(poincare) <- 8
poincare
## [1] 1 0 0 NA NA NA NA NA
A great feature of R’s vectors is that each element can be given a name. Labeling the elements can often make your code much more readable. You can specify names when you create a vector in the form name = value. If the name of an element is a valid variable name, it doesn’t need to be enclosed in quotes. You can name some elements of a vector and leave others blank:
c(apple = 1, banana = 2, "kiwi fruit" = 3, 4)
##      apple     banana kiwi fruit
##          1          2          3          4
You can add element names to a vector after its creation using the names function:
x <- 1:4
names(x) <- c("apple", "bananas", "kiwi fruit", "")
x
##      apple    bananas kiwi fruit
##          1          2          3          4
The names function can also be used to retrieve the names of a vector:
names(x)
## [1] "apple" "bananas" "kiwi fruit" ""
If a vector has no element names, then the names function returns NULL:
names(1:4)
## NULL
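A vector that is only partly named behaves differently: names returns an empty string for each unnamed element rather than NULL. A short sketch contrasting the two cases:

```r
# A partly named vector: the unnamed element gets an empty-string name
fruit <- c(apple = 1, banana = 2, "kiwi fruit" = 3, 4)
names(fruit)                # "apple" "banana" "kiwi fruit" ""

# A vector with no names at all gives NULL instead
is.null(names(c(1, 2, 3)))  # TRUE
```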
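Oftentimes we may want to access only part of a vector, or perhaps an individual element. This is called indexing and is accomplished with square brackets, []. (Some people also call it subsetting, subscripting, or slicing. All these terms refer to the same thing.) R has a very flexible system that gives us several choices of index: passing a vector of positive integers returns the elements at those positions (the first position is 1, not 0); passing a vector of negative integers returns every element except those at the given positions; passing a logical vector returns the elements where the index is TRUE; and, for named vectors, passing a character vector of names returns the elements with those names.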
Consider this vector:
(x <- (1:5)^2)
## [1] 1 4 9 16 25
These three indexing methods return the same values:
x[c(1, 3, 5)]
x[c(-2, -4)]
x[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
## [1] 1 9 25
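To confirm that the three forms really select the same elements, identical can compare the results directly. The sketch also shows that a logical index shorter than the vector is recycled, in line with the recycling rules discussed later in the chapter:

```r
x <- (1:5)^2

by_position  <- x[c(1, 3, 5)]                         # keep positions 1, 3, 5
by_exclusion <- x[c(-2, -4)]                          # drop positions 2 and 4
by_logical   <- x[c(TRUE, FALSE, TRUE, FALSE, TRUE)]  # keep where TRUE

identical(by_position, by_exclusion)  # TRUE
identical(by_position, by_logical)    # TRUE

# A shorter logical index is recycled: c(TRUE, FALSE) acts like
# c(TRUE, FALSE, TRUE, FALSE, TRUE) for this length-5 vector
x[c(TRUE, FALSE)]                     # 1 9 25
```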
After naming each element, this method also returns the same values:
names(x) <- c("one", "four", "nine", "sixteen", "twenty five")
x[c("one", "nine", "twenty five")]
##         one        nine twenty five
##           1           9          25
Mixing positive and negative values is not allowed, and will throw an error:
x[c(1, -1)] #This doesn't make sense!
## Error: only 0's may be mixed with negative subscripts
If you use positive numbers or logical values as the index, then missing indices correspond to missing values in the result:
x[c(1, NA, 5)]
##         one        <NA> twenty five
##           1          NA          25
x[c(TRUE, FALSE, NA, FALSE, TRUE)]
##         one        <NA> twenty five
##           1          NA          25
Missing values don’t make any sense for negative indices, and cause an error:
x[c(-2, NA)] #This doesn't make sense either!
## Error: only 0's may be mixed with negative subscripts
Out-of-range indices, beyond the length of the vector, don’t cause an error, but instead return the missing value NA. In practice, it is usually better to make sure that your indices are in range than to rely on out-of-range values:
x[6]
## <NA>
##   NA
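One way to make sure indices are in range is to check them before indexing. The helper below, checked_index, is hypothetical (it is not part of base R) and is only a minimal sketch: it accepts positive whole-number positions and deliberately rejects negative, missing, fractional, or out-of-range indices with an error instead of returning NA:

```r
# checked_index is a hypothetical helper, not part of base R
checked_index <- function(v, i) {
  # Insist on positive whole-number positions that exist in v
  if (!is.numeric(i) || anyNA(i) || any(i < 1) || any(i > length(v)) ||
      any(i != floor(i))) {
    stop("index out of range for a vector of length ", length(v))
  }
  v[i]
}

x <- (1:5)^2
checked_index(x, c(1, 5))  # 1 25
out_of_range <- tryCatch(checked_index(x, 6),
                         error = function(e) conditionMessage(e))
out_of_range               # "index out of range for a vector of length 5"
```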
Noninteger indices are silently rounded toward zero. This is another case where R is arguably too permissive. If you find yourself passing fractions as indices, you are probably writing bad code:
x[1.9] #1.9 rounded to 1
## one
##   1
x[-1.9] #-1.9 rounded to -1
##        four        nine     sixteen twenty five
##           4           9          16          25
Not passing any index will return the whole of the vector, but again, if you find yourself not passing any index, then you are probably doing something odd:
x[]
##         one        four        nine     sixteen twenty five
##           1           4           9          16          25
The which function returns the locations where a logical vector is TRUE. This can be useful for switching from logical indexing to integer indexing:
which(x > 10)
##     sixteen twenty five
##           4           5
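Switching to integer indexing with which usually gives the same values, but the two approaches part ways when the logical vector contains missing values: a logical index with NA yields an NA element (as shown earlier), whereas which simply skips it. A small sketch, using a made-up vector y:

```r
x <- (1:5)^2

# Without missing values, logical and integer indexing agree
identical(x[x > 10], x[which(x > 10)])  # TRUE

# With a missing value, they differ: which() skips NA comparisons
y <- c(1, NA, 20)
y[y > 10]         # NA 20 (the NA comparison gives an NA element)
y[which(y > 10)]  # 20
```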
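which.min and which.max are more efficient shortcuts for which(x == min(x)) and which(x == max(x)), respectively, except that they return only the first location when the minimum or maximum occurs more than once: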
which.min(x)
## one
##   1
which.max(x)
## twenty five
##           5
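The difference only shows up when there are ties. A minimal sketch with a repeated minimum:

```r
ties <- c(3, 1, 4, 1, 5)

which(ties == min(ties))  # 2 4: every location of the minimum
which.min(ties)           # 2: only the first location
which.max(ties)           # 5
```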
So far, all the vectors that we have added together have been the same length. You may be wondering, “What happens if I try to do arithmetic on vectors of different lengths?”
If we try to add a single number to a vector, then that number is added to each element of the vector:
1:5 + 1
## [1] 2 3 4 5 6
1 + 1:5
## [1] 2 3 4 5 6
When adding two vectors together, R will recycle elements in the shorter vector to match the longer one:
1:5 + 1:15
## [1] 2 4 6 8 10 7 9 11 13 15 12 14 16 18 20
If the length of the longer vector isn’t a multiple of the length of the shorter one, a warning will be given:
1:5 + 1:7
## Warning: longer object length is not a multiple of shorter object length
## [1] 2 4 6 8 10 7 9
It must be stressed that just because we can do arithmetic on vectors of different lengths, it doesn’t mean that we should. Adding a scalar value to a vector is okay, but otherwise we are liable to get ourselves confused. It is much better to explicitly create equal-length vectors before we operate on them.
The rep function is very useful for this task, letting us create a vector with repeated elements:
rep(1:5, 3)
## [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
rep(1:5, each = 3)
## [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
rep(1:5, times = 1:5)
## [1] 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5
rep(1:5, length.out = 7)
## [1] 1 2 3 4 5 1 2
Like the seq function, rep has a simpler and faster variant, rep.int, for the most common case:
rep.int(1:5, 3) #the same as rep(1:5, 3)
## [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
Recent versions of R (since v3.0.0) also have rep_len, paralleling seq_len, which lets us specify the length of the output vector:
rep_len(1:5, 13)
## [1] 1 2 3 4 5 1 2 3 4 5 1 2 3
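rep_len makes it easy to follow the earlier advice of creating equal-length vectors before operating on them. This sketch stretches the shorter vector explicitly and checks that the result matches what implicit recycling would have produced:

```r
short <- 1:5
long <- 1:15

# Make the recycling explicit by stretching the short vector first
explicit <- rep_len(short, length(long)) + long
implicit <- short + long

identical(explicit, implicit)  # TRUE
```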
The vector variables that we have looked at so far are one-dimensional objects, since they have length but no other dimensions. Arrays hold multidimensional rectangular data. “Rectangular” means that each row is the same length, and likewise for each column and other dimensions. Matrices are a special case of two-dimensional arrays.
To create an array, you call the array function, passing in a vector of values and a vector of dimensions. Optionally, you can also provide names for each dimension:
(three_d_array <- array(
  1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    c("one", "two", "three", "four"),
    c("ein", "zwei", "drei"),
    c("un", "deux")
  )
))
## , , un
## 
##       ein zwei drei
## one     1    5    9
## two     2    6   10
## three   3    7   11
## four    4    8   12
## 
## , , deux
## 
##       ein zwei drei
## one    13   17   21
## two    14   18   22
## three  15   19   23
## four   16   20   24
class(three_d_array)
## [1] "array"
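The printed slices show that array fills its values with the first dimension varying fastest, then the second, then the third. The sketch below makes that ordering explicit by computing the single vector position of element [2, 3, 2] and checking that plain vector indexing finds the same value (dimension names are omitted for brevity):

```r
three_d_array <- array(1:24, dim = c(4, 3, 2))
d <- dim(three_d_array)

# Column-major order: element [i, j, k] is stored at position
# i + (j - 1) * d[1] + (k - 1) * d[1] * d[2] of the underlying vector
i <- 2; j <- 3; k <- 2
position <- i + (j - 1) * d[1] + (k - 1) * d[1] * d[2]

three_d_array[i, j, k]   # 22
three_d_array[position]  # 22: the same element, via a single vector index
```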
The syntax for creating matrices is similar, but rather than passing a dim argument, you specify the number of rows or the number of columns:
(a_matrix <- matrix(
  1:12,
  nrow = 4, #ncol = 3 works the same
  dimnames = list(
    c("one", "two", "three", "four"),
    c("ein", "zwei", "drei")
  )
))
##       ein zwei drei
## one     1    5    9
## two     2    6   10
## three   3    7   11
## four    4    8   12
class(a_matrix)
## [1] "matrix" "array"
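This matrix could also be created using the array function. (In R 4.0.0 and later, class reports every matrix as both matrix and array; earlier versions reported only matrix.) The following two-dimensional array is identical to the matrix that we just created (it even has class matrix):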
(two_d_array <- array(
  1:12,
  dim = c(4, 3),
  dimnames = list(
    c("one", "two", "three", "four"),
    c("ein", "zwei", "drei")
  )
))
##       ein zwei drei
## one     1    5    9
## two     2    6   10
## three   3    7   11
## four    4    8   12
identical(two_d_array, a_matrix)
## [1] TRUE
class(two_d_array)
## [1] "matrix" "array"
When you create a matrix, the values that you pass in fill the matrix column-wise. It is also possible to fill the matrix row-wise by specifying the argument byrow = TRUE:
matrix(
  1:12,
  nrow = 4,
  byrow = TRUE,
  dimnames = list(
    c("one", "two", "three", "four"),
    c("ein", "zwei", "drei")
  )
)
##       ein zwei drei
## one     1    2    3
## two     4    5    6
## three   7    8    9
## four   10   11   12
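Another way to see what byrow = TRUE does: filling a matrix with the opposite shape column-wise and then transposing it (with the t function, introduced later in this chapter) gives the same layout. A short sketch without dimension names:

```r
by_row <- matrix(1:12, nrow = 4, byrow = TRUE)

# Filling a 3-by-4 matrix column-wise and transposing it gives the same layout
transposed <- t(matrix(1:12, nrow = 3))

identical(by_row, transposed)  # TRUE
```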
For both matrices and arrays, the dim function returns an integer vector of the dimensions of the variable:
dim(three_d_array)
## [1] 4 3 2
dim(a_matrix)
## [1] 4 3
For matrices, the functions nrow and ncol return the number of rows and columns, respectively:
nrow(a_matrix)
## [1] 4
ncol(a_matrix)
## [1] 3
nrow and ncol also work on arrays, returning the first and second dimensions, respectively, but it is usually better to use dim for higher-dimensional objects:
nrow(three_d_array)
## [1] 4
ncol(three_d_array)
## [1] 3
The length function that we have previously used with vectors also works on matrices and arrays. In this case it returns the product of the dimensions, which is the total number of elements:
length(three_d_array)
## [1] 24
length(a_matrix)
## [1] 12
We can also reshape a matrix or array by assigning a new dimension with dim. This should be used with caution since it strips dimension names:
dim(a_matrix) <- c(6, 2)
a_matrix
##      [,1] [,2]
## [1,]    1    7
## [2,]    2    8
## [3,]    3    9
## [4,]    4   10
## [5,]    5   11
## [6,]    6   12
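Because the names are lost, one cautious approach is to save dimnames before reshaping and reattach them afterward. A minimal sketch, using a fresh copy m of the matrix so that a_matrix is left alone:

```r
m <- matrix(1:12, nrow = 4,
            dimnames = list(c("one", "two", "three", "four"),
                            c("ein", "zwei", "drei")))
saved_names <- dimnames(m)

dim(m) <- c(6, 2)                # reshaping drops the dimnames
names_after_reshape <- dimnames(m)
is.null(names_after_reshape)     # TRUE

# Changing back does not bring the names back; reattach them explicitly
dim(m) <- c(4, 3)
dimnames(m) <- saved_names
m["two", "zwei"]                 # 6
```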
nrow, ncol, and dim return NULL when applied to vectors. The functions NROW and NCOL are counterparts to nrow and ncol that pretend vectors are matrices with a single column (that is, column vectors in the mathematical sense):
identical(nrow(a_matrix), NROW(a_matrix))
## [1] TRUE
identical(ncol(a_matrix), NCOL(a_matrix))
## [1] TRUE
recaman <- c(0, 1, 3, 6, 2, 7, 13, 20)
nrow(recaman)
## NULL
NROW(recaman)
## [1] 8
ncol(recaman)
## NULL
NCOL(recaman)
## [1] 1
dim(recaman)
## NULL
In the same way that vectors have names for the elements, matrices have rownames and colnames for the rows and columns. For historical reasons, there is also a function row.names, which does the same thing as rownames, but there is no corresponding col.names, so it is better to ignore it and use rownames instead. Just as dim generalizes nrow and ncol, the equivalent function for arrays is dimnames, which returns a list (see Lists) of character vectors. In the following code chunk, a_matrix has been restored to its previous state, before its dimensions were changed:
rownames(a_matrix)
## [1] "one" "two" "three" "four"
colnames(a_matrix)
## [1] "ein" "zwei" "drei"
dimnames(a_matrix)
## [[1]]
## [1] "one" "two" "three" "four"
## 
## [[2]]
## [1] "ein" "zwei" "drei"
rownames(three_d_array)
## [1] "one" "two" "three" "four"
colnames(three_d_array)
## [1] "ein" "zwei" "drei"
dimnames(three_d_array)
## [[1]]
## [1] "one" "two" "three" "four"
## 
## [[2]]
## [1] "ein" "zwei" "drei"
## 
## [[3]]
## [1] "un" "deux"
Indexing works just like it does with vectors, except that now we have to specify an index for more than one dimension. As before, we use square brackets to denote an index, and we still have four choices for specifying the index (positive integers, negative integers, logical values, and element names). It is perfectly permissible to specify the indices for different dimensions in different ways. The indices for each dimension are separated by commas:
a_matrix[1, c("zwei", "drei")] #elements in 1st row, 2nd and 3rd columns
## zwei drei
##    5    9
To include all of a dimension, leave the corresponding index blank:
a_matrix[1, ] #all of the first row
##  ein zwei drei
##    1    5    9
a_matrix[, c("zwei", "drei")] #all of the second and third columns
##       zwei drei
## one      5    9
## two      6   10
## three    7   11
## four     8   12
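Notice that selecting a single row printed a named vector rather than a one-row matrix: by default, indexing drops any dimension of length one. The drop argument to [ keeps the result as a matrix. A short sketch, recreating a_matrix in its 4-by-3 form:

```r
a_matrix <- matrix(1:12, nrow = 4,
                   dimnames = list(c("one", "two", "three", "four"),
                                   c("ein", "zwei", "drei")))

first_row <- a_matrix[1, ]                       # dimension dropped: a named vector
first_row_matrix <- a_matrix[1, , drop = FALSE]  # stays a 1-by-3 matrix

is.matrix(first_row)   # FALSE
dim(first_row_matrix)  # 1 3
```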
The c function converts matrices to vectors before concatenating them:
(another_matrix <- matrix(
  seq.int(2, 24, 2),
  nrow = 4,
  dimnames = list(
    c("five", "six", "seven", "eight"),
    c("vier", "funf", "sechs")
  )
))
##       vier funf sechs
## five     2   10    18
## six      4   12    20
## seven    6   14    22
## eight    8   16    24
c(a_matrix, another_matrix)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 2 4 6 8 10 12 14 16 18 20 22
## [24] 24
More natural combining of matrices can be achieved by using cbind and rbind, which bind matrices together by columns and by rows, respectively:
cbind(a_matrix, another_matrix)
##       ein zwei drei vier funf sechs
## one     1    5    9    2   10    18
## two     2    6   10    4   12    20
## three   3    7   11    6   14    22
## four    4    8   12    8   16    24
rbind(a_matrix, another_matrix)
##       ein zwei drei
## one     1    5    9
## two     2    6   10
## three   3    7   11
## four    4    8   12
## five    2   10   18
## six     4   12   20
## seven   6   14   22
## eight   8   16   24
The standard arithmetic operators (+, -, *, /) work element-wise on matrices and arrays, just like they do on vectors:
a_matrix + another_matrix
##       ein zwei drei
## one     3   15   27
## two     6   18   30
## three   9   21   33
## four   12   24   36
a_matrix * another_matrix
##       ein zwei drei
## one     2   50  162
## two     8   72  200
## three  18   98  242
## four   32  128  288
When performing arithmetic on two arrays, you need to make sure that they are of an appropriate size (they must be “conformable,” in linear algebra terminology). For example, both arrays must be the same size when adding, and for matrix multiplication with %*% the number of columns in the first matrix must be the same as the number of rows in the second matrix:
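(another_matrix <- matrix(1:12, nrow = 2))
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    3    5    7    9   11
## [2,]    2    4    6    8   10   12
a_matrix + another_matrix #adding nonconformable matrices throws an error
## Error: non-conformable arrays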
If you try to add a vector to an array, then the usual vector recycling rules apply, but the dimension of the results is taken from the array.
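For example, adding the length-4 vector 1:4 to a 4-by-3 matrix recycles the vector down each column, so each row gets its own row number added, and the result keeps the matrix's dimensions. A minimal sketch without dimension names:

```r
a_matrix <- matrix(1:12, nrow = 4)

# 1:4 is recycled down each column, so row r gets r added to it
result <- a_matrix + 1:4
dim(result)  # 4 3: the dimensions come from the matrix
result[2, ]  # 4 8 12
```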
The t function transposes matrices (but not higher-dimensional arrays, where the concept isn’t well defined):
t(a_matrix)
##      one two three four
## ein    1   2     3    4
## zwei   5   6     7    8
## drei   9  10    11   12
For inner and outer matrix multiplication, we have the special operators %*% and %o%. In each case, the dimension names of the result are taken from the inputs, if they exist; for %*%, the row names come from the first input and the column names from the second:
a_matrix %*% t(a_matrix) #inner multiplication
##       one two three four
## one   107 122   137  152
## two   122 140   158  176
## three 137 158   179  200
## four  152 176   200  224
1:3 %o% 4:6 #outer multiplication
##      [,1] [,2] [,3]
## [1,]    4    5    6
## [2,]    8   10   12
## [3,]   12   15   18
outer(1:3, 4:6) #same
##      [,1] [,2] [,3]
## [1,]    4    5    6
## [2,]    8   10   12
## [3,]   12   15   18
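A sketch with two small, made-up matrices x and y makes both the conformability rule and the naming rule visible: x has three columns and y has three rows, so x %*% y is defined, and the result takes its row names from x and its column names from y:

```r
x <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
y <- matrix(1:6, nrow = 3,
            dimnames = list(c("a", "b", "c"), c("p", "q")))

# Columns of x (3) must match rows of y (3)
product <- x %*% y
dim(product)       # 2 2
rownames(product)  # "r1" "r2" (from x)
colnames(product)  # "p" "q" (from y)
product[1, 1]      # 22, i.e. 1*1 + 3*2 + 5*3
```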
The power operator, ^, also works element-wise on matrices, so to invert a matrix you cannot simply raise it to the power of minus one. Instead, this can be done using the solve function:[16]
(m <- matrix(c(1, 0, 1, 5, -3, 1, 2, 4, 7), nrow = 3))
##      [,1] [,2] [,3]
## [1,]    1    5    2
## [2,]    0   -3    4
## [3,]    1    1    7
m ^ -1
##      [,1]    [,2]   [,3]
## [1,]    1  0.2000 0.5000
## [2,]  Inf -0.3333 0.2500
## [3,]    1  1.0000 0.1429
(inverse_of_m <- solve(m))
##      [,1] [,2] [,3]
## [1,]  -25  -33   26
## [2,]    4    5   -4
## [3,]    3    4   -3
m %*% inverse_of_m
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
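Because solve works in floating point, a product like m %*% inverse_of_m can in general differ from the identity by tiny rounding errors, so a tolerant comparison is safer than testing for exact equality. A minimal sketch using all.equal and diag(3), the 3-by-3 identity matrix:

```r
m <- matrix(c(1, 0, 1, 5, -3, 1, 2, 4, 7), nrow = 3)
inverse_of_m <- solve(m)

# Compare with the identity matrix using a tolerance rather than ==
all.equal(m %*% inverse_of_m, diag(3))  # TRUE
```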
seq and its variants let you create sequences of numbers.
Vectors have a length that can be accessed or set with the length function.
You can name the elements of a vector, either when it is created or with the names function.
The rep function creates a vector with repeated elements.
nrow, ncol, and dim provide ways of accessing the dimensions of an array.
rownames, colnames, and dimnames access the names of array dimensions.
The nth triangular number is given by n * (n + 1) / 2. Create a sequence of the first 20 triangular numbers.
R has a built-in constant, letters, that contains the lowercase letters of the Roman alphabet. Name the elements of the vector that you just created with the first 20 letters of the alphabet.
Select the triangular numbers where the name is a vowel. [10]
The diag function has several uses, one of which is to take a vector as its input and create a square matrix with that vector on the diagonal. Create a 21-by-21 matrix with the sequence 10 to 0 to 10 (i.e., 10, 9, … , 1, 0, 1, …, 10). [5]
By passing two extra arguments to diag, you can specify the dimensions of the output. Create a 20-by-21 matrix with ones on the main diagonal. Now add a row of zeros above this to create a 21-by-21 square matrix, where the ones are offset a row below the main diagonal.
Create another matrix with the ones offset one up from the diagonal.
Add these two matrices together, then add the answer from Exercise 4-2. The resultant matrix is called a Wilkinson matrix.
The eigen function calculates eigenvalues and eigenvectors of a matrix. Calculate the eigenvalues for your Wilkinson matrix. What do you notice about them? [20]