As in most programming languages, the heart of R programming consists of writing functions. A function is a group of instructions that takes inputs, uses them to compute other values, and returns a result.
As a simple introduction, let’s define a function named oddcount(), whose purpose is to count the odd numbers in a vector of integers. Normally, we would compose the function code using a text editor and save it in a file, but in this quick-and-dirty example, we’ll enter it line by line in R’s interactive mode. We’ll then call the function on a couple of test cases.
# counts the number of odd integers in x
> oddcount <- function(x) {
+    k <- 0  # assign 0 to k
+    for (n in x) {
+       if (n %% 2 == 1) k <- k+1  # %% is the modulo operator
+    }
+    return(k)
+ }
> oddcount(c(1,3,5))
[1] 3
> oddcount(c(1,2,3,7,9))
[1] 4
First, we told R that we wanted to define a function named oddcount with one argument, x. The left brace demarcates the start of the body of the function. We wrote one R statement per line.
Until the body of the function is finished, R reminds you that you’re still in the definition by using + as its prompt, instead of the usual >. (Actually, + is a line-continuation character, not a prompt for a new input.) R resumes the > prompt after you finally enter a right brace to conclude the function body.
After defining the function, we evaluated two calls to oddcount(). Since there are three odd numbers in the vector (1,3,5), the call oddcount(c(1,3,5)) returns the value 3. There are four odd numbers in (1,2,3,7,9), so the second call returns 4.
Notice that the modulo operator for remainder arithmetic is %% in R, as indicated by the comment. For example, 38 divided by 7 leaves a remainder of 3:
> 38 %% 7
[1] 3
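As a small illustrative sketch (the variable names r1, r2, and r3 are ours, not part of the original example), the following shows %% applied to a single value, to a whole vector, and to a negative number. In R the result of %% takes the sign of the divisor, so a negative odd number still leaves a remainder of 1 when divided by 2, which means the test n %% 2 == 1 counts negative odd numbers as well.

```r
# Remainder of a single division
r1 <- 38 %% 7            # 3

# %% works element by element on a vector
r2 <- c(5, 6, 7) %% 2    # 1 0 1

# The result takes the sign of the divisor, so this is 1, not -1
r3 <- (-3) %% 2          # 1
```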
To see how the for loop works, consider the following code:
for (n in x) {
   if (n %% 2 == 1) k <- k+1
}
First, it sets n to x[1], and then it tests that value for being odd or even. If the value is odd, which is the case in both of our test calls, the count variable k is incremented. Then n is set to x[2], tested for being odd or even, and so on.
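One way to watch this happen is to print n and k on every pass. The sketch below is only for illustration: it runs the loop at top level rather than inside oddcount(), and the cat() call is an addition of ours for tracing.

```r
x <- c(1, 2, 3, 7, 9)
k <- 0
for (n in x) {
  if (n %% 2 == 1) k <- k + 1
  # show the current element and the running count of odd values
  cat("n =", n, "k =", k, "\n")
}
# after the loop, k holds the number of odd elements of x
```

With this input, k ends at 4, matching the second test call above.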
By the way, C/C++ programmers might be tempted to write the preceding loop like this:
for (i in 1:length(x)) {
   if (x[i] %% 2 == 1) k <- k+1
}
Here, length(x) is the number of elements in x. Suppose there are 25 elements. Then 1:length(x) means 1:25, which in turn means 1,2,3,...,25. This code would also work unless x had length 0, in which case 1:length(x) would be 1:0, which counts down through 1,0 instead of being empty. But one of the major themes of R programming is to avoid loops if possible and, failing that, to keep loops simple. Look again at our original formulation:
for (n in x) {
   if (n %% 2 == 1) k <- k+1
}
It’s simpler and cleaner, as we do not need to resort to using the length() function and array indexing.
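The length-0 pitfall is easy to check directly. The sketch below compares 1:length(x) with the base function seq_along(x) on an empty vector, and also shows one possible loop-free way to do the count, using sum() on a logical vector. The names empty, idx_colon, idx_safe, and oddcount_vec are ours, and the vectorized version is offered only as an illustration of the "avoid loops" theme, not as the approach this section develops.

```r
empty <- numeric(0)

# 1:length(empty) is 1:0, which counts DOWN: it yields 1 0, not an empty sequence
idx_colon <- 1:length(empty)

# seq_along() returns an empty index vector for an empty input
idx_safe <- seq_along(empty)

# A loop-free count: (x %% 2 == 1) is a logical vector, and sum() counts its TRUEs
oddcount_vec <- function(x) sum(x %% 2 == 1)

oddcount_vec(c(1, 2, 3, 7, 9))  # 4
oddcount_vec(empty)             # 0
```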
At the end of the code, we use the return statement:
return(k)
This has the function return the computed value of k to the code that called it. However, simply writing the following also works:
k
R functions will return the last value computed if there is no explicit return() call. However, this approach must be used with care, as we will discuss in Section 7.4.1.
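For comparison, here is a sketch of the same function written without return(). The name oddcount_implicit is ours, chosen so that it does not overwrite the oddcount() defined earlier.

```r
# Same logic as oddcount(), but relying on the value of the last expression
oddcount_implicit <- function(x) {
  k <- 0
  for (n in x) {
    if (n %% 2 == 1) k <- k + 1
  }
  k  # the last expression evaluated becomes the return value
}

oddcount_implicit(c(1, 3, 5))        # 3
oddcount_implicit(c(1, 2, 3, 7, 9))  # 4
```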
In programming language terminology, x is the formal argument (or formal parameter) of the function oddcount(). In the first function call in the preceding example, c(1,3,5) is referred to as the actual argument. These terms allude to the fact that x in the function definition is just a placeholder, whereas c(1,3,5) is the value actually used in the computation. Similarly, in the second function call, c(1,2,3,7,9) is the actual argument.
A variable that is visible only within a function body is said to be local to that function. In oddcount(), k and n are local variables. They disappear after the function returns:
> oddcount(c(1,2,3,7,9))
[1] 4
> n
Error: object 'n' not found
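The same behavior can be checked without triggering an error by asking exists() whether a name is defined. In this sketch, count_local and tally_inside are hypothetical names of ours, and the result assumes a session in which no global variable called tally_inside has been created.

```r
count_local <- function() {
  tally_inside <- 42   # local to this call
  tally_inside
}

value <- count_local()                  # 42
still_there <- exists("tally_inside")   # FALSE: the local is gone once the call returns
```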
It’s very important to note that the formal parameters in an R function are local variables. Suppose we make the following function call:
> z <- c(2,6,7)
> oddcount(z)
Now suppose that the code of oddcount() changes x. Then z would not change. After the call to oddcount(), z would have the same value as before. To evaluate a function call, R copies each actual argument to the corresponding local parameter variable, and changes to that variable are not visible outside the function. Scoping rules such as these will be discussed in detail in Chapter 7.
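Since oddcount() itself never modifies x, a separate function is needed to see this in action. The sketch below uses a hypothetical function of ours, zero_first(), that overwrites the first element of its argument.

```r
# Hypothetical helper: changes its formal argument x
zero_first <- function(x) {
  x[1] <- 0   # affects only the local x
  x
}

z <- c(2, 6, 7)
result <- zero_first(z)

result  # 0 6 7
z       # 2 6 7, unchanged by the call
```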
Variables created outside functions are global and are available within functions as well. Here’s an example:
> f <- function(x) return(x+y)
> y <- 3
> f(5)
[1] 8
Here y is a global variable.
A global variable can be written to from within a function by using R’s superassignment operator, <<-. This is also discussed in Chapter 7.
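As a brief sketch of superassignment, the following uses a hypothetical global variable counter and a hypothetical function bump(), both names of ours. It assumes the code is entered at the top level, so that counter is a global.

```r
counter <- 0

bump <- function() {
  # <<- assigns to the existing counter outside the function,
  # whereas <- would create a new local variable instead
  counter <<- counter + 1
  invisible(NULL)
}

bump()
bump()
counter  # 2
```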
R also makes frequent use of default arguments. Consider a function definition like this:
> g <- function(x,y=2,z=T) { ... }
Here y will be initialized to 2 if the programmer does not specify y in the call. Similarly, z will have the default value TRUE.
Now consider this call:
> g(12,z=FALSE)
Here, the value 12 is the actual argument for x, and we accept the default value of 2 for y, but we override the default for z, setting its value to FALSE.
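Because the body of g() is not given, the sketch below uses a hypothetical function g_demo() with the same argument list and a body invented purely for illustration: it returns x*y when z is TRUE and x alone otherwise.

```r
# Hypothetical stand-in for g(); the body is made up for this illustration
g_demo <- function(x, y = 2, z = TRUE) {
  if (z) x * y else x
}

g_demo(12)             # 24: both defaults used
g_demo(12, z = FALSE)  # 12: default y kept, z overridden by name
g_demo(12, 3)          # 36: y supplied by position, default z kept
```

Naming z in the call is what lets us skip y; an unnamed second argument would be matched to y instead.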
The preceding example also demonstrates that, like many programming languages, R has a Boolean type; that is, it has the logical values TRUE and FALSE.
R allows TRUE and FALSE to be abbreviated to T and F. However, you may prefer not to abbreviate these values, to avoid trouble if you have a variable named T or F.
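The kind of trouble meant here can be sketched as follows. T and F are ordinary variable names that can be reassigned, while TRUE and FALSE are reserved words. The names shadowed, literal, and restored are ours, and the sketch assumes it is run in a session where T has not already been redefined.

```r
T <- 0                     # legal: T is just a variable name
shadowed <- isTRUE(T)      # FALSE, because T now holds 0
literal  <- isTRUE(TRUE)   # TRUE, since TRUE cannot be reassigned

rm(T)                      # remove our variable so T means TRUE again
restored <- isTRUE(T)      # TRUE
```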