Avoiding Unintended Dimension Reduction

In the world of statistics, dimension reduction is a good thing, and many statistical procedures are designed to do it well. If we are working with, say, 10 variables and can reduce that number to 3 that still capture the essence of our data, we’re happy.

In R, however, there is something else that might merit the name dimension reduction, and it is something we may sometimes wish to avoid. Say we have a four-row matrix and extract a row from it:

> z
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    3    7
[4,]    4    8
> r <- z[2,]
> r
[1] 2 6

This seems innocuous, but note the format in which R has displayed r. It’s a vector format, not a matrix format. In other words, r is a vector of length 2, rather than a 1-by-2 matrix. We can confirm this in a couple of ways:

> attributes(z)
$dim
[1] 4 2
> attributes(r)
NULL
> str(z)
 int [1:4, 1:2] 1 2 3 4 5 6 7 8
> str(r)
 int [1:2] 2 6

Here, attributes() shows that z has a dim attribute (4 rows and 2 columns), while r has no attributes at all. Similarly, str() tells us that z has indices ranging over 1:4 and 1:2, for rows and columns, while r’s indices simply range over 1:2. There is no doubt about it: r is a vector, not a matrix.
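
The same check can be made more directly with dim() and is.matrix(). The following is a minimal sketch; it assumes z was built with matrix(1:8, nrow = 4), which matches the values printed above but is not stated in the text.

```r
# Assumed construction of z; it reproduces the values printed above.
z <- matrix(1:8, nrow = 4)
r <- z[2, ]

is.matrix(z)   # TRUE: z carries a dim attribute
is.matrix(r)   # FALSE: the dim attribute was dropped
dim(r)         # NULL
length(r)      # 2
```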

This behavior seems natural, but it can cause trouble in programs that do a lot of matrix operations. You may find that your code works fine in general but fails in a special case. For instance, suppose that your code extracts a submatrix from a given matrix and then does some matrix operations on the submatrix. If the submatrix has only one row (or only one column), R will make it a vector, which could ruin your computation.
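
To make the special case concrete, here is a small sketch of how such a failure might arise. The helper rows_above() and the construction of z via matrix(1:8, nrow = 4) are assumptions made for illustration only.

```r
# Assumed construction of z; it reproduces the values printed earlier.
z <- matrix(1:8, nrow = 4)

# Hypothetical helper: keep the rows whose first column exceeds cutoff.
rows_above <- function(m, cutoff) m[m[, 1] > cutoff, ]

several <- rows_above(z, 1)   # rows 2, 3 and 4 qualify
single  <- rows_above(z, 3)   # only row 4 qualifies

nrow(several)   # 3
nrow(single)    # NULL, because single is now a plain vector

# A matrix-only operation fails in the one-row case.
result <- try(rowSums(single), silent = TRUE)
inherits(result, "try-error")   # TRUE
```

The general case works, so a bug of this kind may surface only when the data happen to leave a single qualifying row.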

Fortunately, R has a way to suppress this dimension reduction: the drop argument. Here’s an example, using the matrix z from above:

> r <- z[2,, drop=FALSE]
> r
     [,1] [,2]
[1,]    2    6
> dim(r)
[1] 1 2

Now r is a 1-by-2 matrix, not a two-element vector.

For these reasons, you may find it useful to routinely include the drop=FALSE argument in all your matrix code.
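
As an illustration of that habit, the sketch below writes a row filter that always returns a matrix. The helper name rows_above() and the construction of z are hypothetical, chosen only for this example.

```r
# Assumed construction of z; it reproduces the values printed earlier.
z <- matrix(1:8, nrow = 4)

# Hypothetical helper: keep the rows whose first column exceeds cutoff.
# drop = FALSE keeps the result a matrix even if one row survives.
rows_above <- function(m, cutoff) m[m[, 1] > cutoff, , drop = FALSE]

single <- rows_above(z, 3)    # only row 4 qualifies
dim(single)                   # 1 2
totals <- rowSums(single)     # 12, the matrix operation still works

# The same idea applies to extracting a single column.
col2 <- z[, 2, drop = FALSE]
dim(col2)                     # 4 1
```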

Why can we speak of drop as an argument? Because [ is actually a function, just as operators like + are. Consider the following code:

> z[3,2]
[1] 7
> "["(z,3,2)
[1] 7
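
Since [ is a function, drop can be passed to it by name like any other argument. A brief sketch, again assuming z was built with matrix(1:8, nrow = 4):

```r
# Assumed construction of z; it reproduces the values printed earlier.
z <- matrix(1:8, nrow = 4)

s <- "+"(2, 3)                     # same as 2 + 3

# The empty slot after 2 is the omitted column index, as in z[2, ].
r <- "["(z, 2, , drop = FALSE)     # same as z[2, , drop = FALSE]
dim(r)                             # 1 2
```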

If you have a vector that you wish to be treated as a matrix, you can use the as.matrix() function, as follows:

> u
[1] 1 2 3
> v <- as.matrix(u)
> attributes(u)
NULL
> attributes(v)
$dim
[1] 3 1
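
Note that as.matrix() turns the vector into a one-column matrix. If a one-row matrix is wanted instead, matrix() with nrow = 1 or the transpose function t() are two possible routes. The sketch below assumes u holds the values 1, 2, 3 as printed above.

```r
# Assumed construction of u; it reproduces the values printed above.
u <- c(1, 2, 3)

v <- as.matrix(u)           # column matrix
dim(v)                      # 3 1

w <- matrix(u, nrow = 1)    # row matrix
dim(w)                      # 1 3

tu <- t(u)                  # t() treats a vector as a column
dim(tu)                     # 1 3
```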