R is a block-structured language in the manner of the ALGOL-descendant family, such as C, C++, Python, Perl, and so on. As you’ve seen, blocks are delineated by braces, though braces are optional if the block consists of just a single statement. Statements are separated by newline characters or, optionally, by semicolons.
Here, we cover the basic structures of R as a programming language. We’ll review some more details on loops and the like and then head straight into the topic of functions, which will occupy most of the chapter.
In particular, issues of variable scope will play a major role. As with many scripting languages, you do not “declare” variables in R. Programmers with a background in, say, C will find similarities in R at first but then will see that R has a richer scoping structure.
Control statements in R look very similar to those of the ALGOL-descendant languages mentioned above. Here, we’ll look at loops and if-else statements.
In Section 1.3, we defined the oddcount() function. In that function, the following line should have been instantly recognized by Python programmers:
for (n in x) {
It means that there will be one iteration of the loop for each component of the vector x, with n taking on the values of those components: in the first iteration, n = x[1]; in the second iteration, n = x[2]; and so on. For example, the following code uses this structure to output the square of every element in a vector:
> x <- c(5,12,13)
> for (n in x) print(n^2)
[1] 25
[1] 144
[1] 169
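To make the iteration concrete, here is a minimal sketch of an odd-element counter in the spirit of oddcount(). The name count_odd() is hypothetical, and this version may differ in detail from the function defined in Section 1.3. Note that n holds each element's value, not its index.

```r
# Hypothetical odd-element counter; may differ from the oddcount() of Section 1.3
count_odd <- function(x) {
  k <- 0                          # running count of odd elements
  for (n in x) {                  # n takes the value of each element of x in turn
    if (n %% 2 == 1) k <- k + 1
  }
  k                               # value of the last expression is returned
}
count_odd(c(1, 3, 5, 7, 9, 8))  # [1] 5
```

If x is empty, the loop body never runs and the function returns 0.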
C-style looping with while and repeat is also available, complete with break, a statement that causes control to leave the loop. Here is an example that uses all three:
> i <- 1
> while (i <= 10) i <- i+4
> i
[1] 13
>
> i <- 1
> while(TRUE) {  # similar loop to above
+    i <- i+4
+    if (i > 10) break
+ }
> i
[1] 13
>
> i <- 1
> repeat {  # again similar
+    i <- i+4
+    if (i > 10) break
+ }
> i
[1] 13
In the first code snippet, the variable i took on the values 1, 5, 9, and 13 as the loop went through its iterations. In that last case, the condition i <= 10 failed, so the loop ended.
This code shows three different ways of accomplishing the same thing, with break playing a key role in the second and third ways.
Note that repeat has no Boolean exit condition. You must use break (or something like return()).
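Here is a small sketch of repeat exiting through return() rather than break. The function smallest_pow2() is a hypothetical illustration, not part of R; it doubles p until p reaches the given limit.

```r
# Hypothetical helper: the smallest power of 2 that is >= limit.
# repeat has no exit condition of its own, so return() is what ends the loop.
smallest_pow2 <- function(limit) {
  p <- 1
  repeat {
    if (p >= limit) return(p)   # leaves both the loop and the function
    p <- p * 2
  }
}
smallest_pow2(100)  # [1] 128
```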
Of course, break can be used with for loops, too.
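For instance, this sketch (the data vector is made up for illustration) stops a for loop as soon as it finds an element greater than 10:

```r
x <- c(3, 8, 1, 15, 4, 22)
first_big <- NA              # stays NA if no element exceeds 10
for (n in x) {
  if (n > 10) {
    first_big <- n
    break                    # stop scanning once a match is found
  }
}
print(first_big)  # [1] 15
```

The remaining elements, 4 and 22, are never examined.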
Another useful statement is next, which instructs the interpreter to skip the remainder of the current iteration of the loop and proceed directly to the next one. This provides a way to avoid deeply nested if-else constructs, which can make the code confusing. Let’s look at an example that uses next. The following code comes from an extended example in Chapter 8:
1  sim <- function(nreps) {
2     commdata <- list()
3     commdata$countabsamecomm <- 0
4     for (rep in 1:nreps) {
5        commdata$whosleft <- 1:20
6        commdata$numabchosen <- 0
7        commdata <- choosecomm(commdata,5)
8        if (commdata$numabchosen > 0) next
9        commdata <- choosecomm(commdata,4)
10       if (commdata$numabchosen > 0) next
11       commdata <- choosecomm(commdata,3)
12    }
13    print(commdata$countabsamecomm/nreps)
14 }
There are next statements in lines 8 and 10. Let’s see how they work and how they improve on the alternatives. The two next statements occur within the loop that starts at line 4. Thus, when the if condition holds in line 8, lines 9 through 11 will be skipped, and control will transfer to line 4. The situation in line 10 is similar.
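The same mechanism works in a self-contained loop. In this sketch, with made-up data, negative values are skipped with next so that sqrt() is applied only to the nonnegative ones:

```r
x <- c(4, -2, 9, -7, 16)
roots <- numeric(0)
for (n in x) {
  if (n < 0) next              # skip the rest of this iteration for negatives
  roots <- c(roots, sqrt(n))
}
print(roots)  # [1] 2 3 4
```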
Without using next, we would need to resort to nested if statements, something like these:
sim <- function(nreps) {
   commdata <- list()
   commdata$countabsamecomm <- 0
   for (rep in 1:nreps) {
      commdata$whosleft <- 1:20
      commdata$numabchosen <- 0
      commdata <- choosecomm(commdata,5)
      if (commdata$numabchosen == 0) {
         commdata <- choosecomm(commdata,4)
         if (commdata$numabchosen == 0)
            commdata <- choosecomm(commdata,3)
      }
   }
   print(commdata$countabsamecomm/nreps)
}
Because this simple example has just two levels, it’s not too bad. However, nested if statements can become confusing when you have more levels.
The for construct works on any vector, regardless of mode. You can loop over a vector of filenames, for instance. Say we have a file named file1 with the following contents:
1 2 3 4 5 6
We also have a file named file2 with these contents:
5 12 13
The following loop reads and prints each of these files. We use the scan() function here to read in a file of numbers and store those values in a vector. We’ll talk more about scan() in Chapter 10.
> for (fn in c("file1","file2")) print(scan(fn))
Read 6 items
[1] 1 2 3 4 5 6
Read 3 items
[1]  5 12 13
So, fn is first set to file1, and the file of that name is read in and printed out. Then the same thing happens for file2.
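If you want to try this without preparing the files by hand, the following sketch writes equivalent files to R's temporary directory first and then loops over their full paths, summing each file's values. The use of tempdir() and the per-file sums are choices made for this illustration; quiet = TRUE simply suppresses the "Read n items" message from scan().

```r
tmp <- tempdir()
fns <- file.path(tmp, c("file1", "file2"))
writeLines(as.character(1:6), fns[1])      # six numbers, one per line
writeLines(c("5", "12", "13"), fns[2])
totals <- numeric(0)
for (fn in fns) {                          # fn is a character string each time
  vals <- scan(fn, quiet = TRUE)
  totals <- c(totals, sum(vals))
}
print(totals)  # [1] 21 30
```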
R does not directly support iteration over nonvector sets, but there are a couple of indirect yet easy ways to accomplish it:
Use lapply(), assuming that the iterations of the loop are independent of each other, thus allowing them to be performed in any order.
Use get(). As its name implies, this function takes as an argument a character string representing the name of some object and returns the object of that name. It sounds simple, but get() is a very powerful function.
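As a sketch of the lapply() approach, here is one way to fit a regression to each of the two matrices used in the get() example that follows, by placing them in a list. Each element of the result holds the intercept and slope returned by coef(); the list names u and v are just labels chosen here.

```r
u <- cbind(c(1, 2, 3), c(1, 2, 4))
v <- cbind(c(8, 12, 20), c(15, 10, 2))
# lapply() calls the anonymous function once per list element, in no guaranteed dependence order
fits <- lapply(list(u = u, v = v), function(z) coef(lm(z[, 2] ~ z[, 1])))
print(fits)
```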
Let’s look at an example of using get(). Say we have two matrices, u and v, containing statistical data, and we wish to apply R’s linear regression function lm() to each of them.
> u
     [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    3    4
> v
     [,1] [,2]
[1,]    8   15
[2,]   12   10
[3,]   20    2
> for (m in c("u","v")) {
+    z <- get(m)
+    print(lm(z[,2] ~ z[,1]))
+ }
Call:
lm(formula = z[, 2] ~ z[, 1])
Coefficients:
(Intercept)       z[, 1]
    -0.6667       1.5000
Call:
lm(formula = z[, 2] ~ z[, 1])
Coefficients:
(Intercept)       z[, 1]
     23.286       -1.071
Here, m was first set to u. Then these lines assign the matrix u to z, which allows the call to lm() on u:
z <- get(m)
print(lm(z[,2] ~ z[,1]))
The same then occurs with v.
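Because get() takes an ordinary character string, the name can also be computed. In this sketch, the objects x1, x2, and x3 are hypothetical, and paste0() builds their names inside the loop:

```r
# Hypothetical objects whose names share a common prefix
x1 <- c(1, 2)
x2 <- c(10, 20, 30)
x3 <- 100
total <- 0
for (nm in paste0("x", 1:3)) {   # nm is "x1", then "x2", then "x3"
  total <- total + sum(get(nm))  # get() retrieves the object with that name
}
print(total)  # [1] 163
```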
The syntax for if-else looks like this:
if (r == 4) {
   x <- 1
} else {
   x <- 3
   y <- 4
}
It looks simple, but there is an important subtlety here. The if section consists of just a single statement:
x <- 1
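So, you might guess that the braces around that statement are not necessary. However, they are indeed needed. The right brace before the else is what lets the R parser deduce that this is an if-else rather than just an if: because the else appears on the same line as that brace, the parser knows the statement is not yet finished. In interactive mode, without braces, the parser would treat the line if (r == 4) x <- 1 as a complete if statement and then reject the else on the following line as a syntax error, which is not what we want.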
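One way to see this rule in action is to hand code strings to parse(), which applies the same top-level rules as the interactive parser. The helper parses() is hypothetical and simply reports whether parsing succeeds. Note the third case: inside a pair of braces, such as a function body, the parser keeps reading, so an else at the start of a line is accepted there.

```r
# Hypothetical helper: TRUE if the code string parses, FALSE on a syntax error
parses <- function(code) {
  tryCatch({ parse(text = code); TRUE }, error = function(e) FALSE)
}
bad    <- "if (TRUE) x <- 1\nelse x <- 3"                   # else starts a new line
good   <- "if (TRUE) {\n  x <- 1\n} else {\n  x <- 3\n}"    # else follows the brace
inside <- "f <- function() {\n  if (TRUE) 1\n  else 2\n}"   # else on its own line, inside braces
parses(bad)     # [1] FALSE
parses(good)    # [1] TRUE
parses(inside)  # [1] TRUE
```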
An if-else statement works like a function call, and as such, it returns a value: the value of the last expression evaluated in whichever branch was taken.
v <- if (cond) expression1 else expression2
This will set v to the result of expression1 or expression2, depending on whether cond is true. You can use this fact to compact your code. Here’s a simple example:
> x <- 2
> y <- if(x == 2) x else x+1
> y
[1] 2
> x <- 3
> y <- if(x == 2) x else x+1
> y
[1] 4
Without taking this tack, the code
y <- if(x == 2) x else x+1
would instead consist of the somewhat more cluttered
if(x == 2) y <- x else y <- x+1
In more complex examples, expression1 and/or expression2 could be function calls. On the other hand, you probably should not let compactness take priority over clarity.
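For example, here is a sketch of two small functions (both hypothetical) whose bodies are simply if-else expressions, one with function calls in its branches and one with a chained else if. The guard in safe_mean() is there because mean() of an empty vector returns NaN.

```r
# Hypothetical: mean of x, or NA when x is empty
safe_mean <- function(x) if (length(x) > 0) mean(x) else NA_real_

# Hypothetical: the value of the chained if-else is the function's result
sign_label <- function(x) {
  if (x < 0) "negative" else if (x == 0) "zero" else "positive"
}
safe_mean(c(2, 4, 9))  # [1] 5
safe_mean(numeric(0))  # [1] NA
sign_label(-3)         # [1] "negative"
```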
When working with vectors, use the ifelse() function discussed in Chapter 2; it will likely produce faster code.
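As a sketch, the earlier y example can be vectorized with ifelse(), which evaluates the condition elementwise. By contrast, passing a length-4 condition to if is an error in R 4.2.0 and later.

```r
x <- c(2, 3, 5, 2)
y <- ifelse(x == 2, x, x + 1)   # elementwise version of: if (x == 2) x else x+1
print(y)  # [1] 2 4 6 2
```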