Environment and Scope Issues

A function—formally referred to as a closure in the R documentation—consists not only of its arguments and body but also of its environment. The latter is made up of the collection of objects present at the time the function is created. An understanding of how environments work in R is essential for writing effective R functions.

The Top-Level Environment

Consider this example:

> w <- 12
> f <- function(y) {
+    d <- 8
+    h <- function() {
+       return(d*(w+y))
+    }
+    return(h())
+ }
> environment(f)
<environment: R_GlobalEnv>

Here, the function f() is created at the top level (that is, at the interpreter command prompt) and thus has the top-level environment. Somewhat confusingly, R output refers to this environment as R_GlobalEnv, whereas in R code you refer to it as .GlobalEnv. If you run an R program as a batch file, that is considered top level, too.
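The two names can be checked directly. The following is a minimal sketch that uses only base R functions (globalenv(), environmentName(), and identical()). It assumes the code is run at the top level of a session, and the one-line f() here is just a stand-in for any function created there.

```r
f <- function(y) y + 1     # stand-in for any function created at the top level

# .GlobalEnv (a variable) and globalenv() (a function) refer to the same environment
identical(.GlobalEnv, globalenv())

# the name R prints for that environment
environmentName(globalenv())

# TRUE when f() was created at the top level
identical(environment(f), globalenv())
```

The first and third calls return TRUE in this setting, and the second returns the string "R_GlobalEnv".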

The function ls() lists the objects of an environment. If you call it at the top level, you get the names of the objects in the top-level environment. Let’s try it with our example code:

> ls()
[1] "f" "w"

As you can see, the top-level environment here includes the variable w, which is actually used within f(). Note that f() is here too, as functions are indeed objects and we did create it at the top level. At levels other than the top, ls() works a little differently, as you’ll see in Section 7.6.3.

You get a bit more information from ls.str():

> ls.str()
f : function (y)
w :  num 12

Next, we’ll look at how w and other variables come into play within f().

The Scope Hierarchy

Let’s first get an intuitive overview of how scope works in R and then relate it to environments.

If we were working with the C language (as usual, background in C is not assumed), we would say that the variable w in the previous section is global to f(), while d is local to f(). Things are similar in R, but R is more hierarchical. In C, we would not have functions defined within functions, as we have with h() inside f() in our example. Yet, since functions are objects, it is possible—and sometimes desirable from the point of view of the encapsulation goal of object-oriented programming—to define a function within a function; we are simply creating an object, which we can do anywhere.

Here, we have h() being local to f(), just like d. In such a situation, it makes sense for scope to be hierarchical. Thus, R is set up so that d, which is local to f(), is in turn global to h(). The same is true for y, as arguments are considered locals in R.

In terms of environments, then, h()’s environment consists of whatever objects are defined at the time h() comes into existence; that is, at the time that this assignment is executed:

h <- function() {
   return(d*(w+y))
}

(If f() is called multiple times, h() will come into existence multiple times, going out of existence each time f() returns.)

What, then, will be in h()’s environment? Well, at the time h() is created, there are the objects d and y created within f(), plus f()’s environment (w). In other words, if one function is defined within another, then that inner function’s environment consists of the environment of the outer one, plus whatever locals have been created so far within the outer one. With multiple nesting of functions, you have a nested sequence of larger and larger environments, with the “root” consisting of the top-level objects.

Let’s try out the code:

> f(2)
[1] 112

What happened? The call f(2) resulted in setting the local variable d to 8, followed by the call h(). The latter evaluated d*(w+y)—that is, 8*(12+2)—giving us 112.

Note carefully the role of w. The R interpreter found that there was no local variable of that name, so it ascended to the next higher level—in this case, the top level—where it found a variable w with value 12.

Keep in mind that h() is local to f() and invisible at the top level.

> h
Error: object 'h' not found

It’s possible (though not desirable) to deliberately allow name conflicts in this hierarchy. In our example, for instance, we could have a local variable d within h(), conflicting with the one in f(). In such a situation, the innermost environment is used first. In this case, a reference to d within h() would refer to h()’s d, not f()’s.
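As a small illustration of such a conflict, here is a sketch in which h() is given its own d. The value 100 and the returned vector are assumptions made up for this illustration; they are not part of the original example.

```r
w <- 12
f <- function(y) {
   d <- 8
   h <- function() {
      d <- 100                 # hypothetical local d, shadowing f()'s d
      return(d*(w+y))          # uses h()'s d, so this is 100*(w+y)
   }
   result <- h()
   # f()'s own d is untouched by the assignment inside h()
   return(c(result=result, d_in_f=d))
}
f(2)    # result is 100*(12+2) = 1400, and d_in_f is still 8
```

The assignment inside h() creates a new local rather than changing f()'s d, which is why the second element is still 8.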

Environments created by inheritance in this manner are generally referred to by their memory locations. Here is what happened after adding a print statement to f() (using edit(), not shown here) and then running the code:

> f
function(y) {
   d <- 8
   h <- function() {
      return(d*(w+y))
   }
   print(environment(h))
   return(h())
}
> f(2)
<environment: 0x875753c>
[1] 112
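Since a memory address is not very informative by itself, it can help to inspect the environment with ls(), environment(), and parent.env(), all of which are base R functions. The sketch below is one way to do that; the variable henv and the returned list are additions made for illustration only.

```r
w <- 12
f <- function(y) {
   d <- 8
   h <- function() {
      return(d*(w+y))
   }
   henv <- environment(h)
   list(
      # names visible in h()'s environment: the locals of f() created so far
      locals = ls(henv),
      # h()'s environment is the frame of the current call to f()
      same_as_frame = identical(henv, environment()),
      # one level up is the environment in which f() itself was created
      parent_is_f_env = identical(parent.env(henv), environment(f))
   )
}
str(f(2))
```

Both comparisons come out TRUE, and the locals include d, y, and h, which matches the nested picture described above: h()'s environment is f()'s frame, and that frame's parent is f()'s own environment.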

Compare all this to the situation in which the functions are not nested:

> f
function(y) {
   d <- 8
   return(h())
}


> h
function() {
   return(d*(w+y))
}

The result is as follows:

> f(5)
Error in h() : object 'd' not found

This does not work, as d is no longer in the environment of h(), because h() is defined at the top level. Thus, an error is generated.

Worse, if by happenstance there had been some unrelated variable d in the top-level environment, we would not get an error message but instead would have incorrect results.
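Here is a sketch of that silent failure. The top-level values of d and y are made up for this illustration (y must also exist at the top level, since otherwise h() would fail on y instead).

```r
w <- 12
d <- 1000                 # hypothetical unrelated top-level variable
y <- -1                   # another hypothetical unrelated top-level variable

h <- function() {
   return(d*(w+y))        # d, w, and y are all looked up at the top level
}
f <- function(y) {
   d <- 8                 # not visible to h(), which was defined at the top level
   return(h())
}
f(5)    # 1000*(12-1) = 11000, rather than the intended 8*(12+5) = 136
```

No error or warning is raised, so nothing signals that the wrong d and y were used.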

You might wonder why R didn’t complain about the lack of y in the alternate definition of h() in the preceding example. As mentioned earlier, R doesn’t evaluate a variable until it needs it, under a policy called lazy evaluation. In this case, R had already encountered an error with d and thus never got to the point where it would try to evaluate y.

The fix is to pass d and y as arguments:

> f
function(y) {
   d <- 8
   return(h(d,y))
}
> h
function(dee,yyy) {
   return(dee*(w+yyy))
}
> f(2)
[1] 112

Okay, let’s look at one last variation:

> f
function(y,ftn) {
   d <- 8
   print(environment(ftn))
   return(ftn(d,y))
}
> h
function(dee,yyy) {
   return(dee*(w+yyy))
}


> w <- 12
> f(3,h)
<environment: R_GlobalEnv>
[1] 120

When f() executed, the formal argument ftn was matched by the actual argument h. Since arguments are treated as locals, you might guess that ftn could have a different environment than top level. But as discussed, a closure includes its environment, and thus ftn has h’s environment.
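The same point shows up more sharply when the function passed in was not created at the top level. In the sketch below, make_h() is a hypothetical helper introduced only for this illustration; the function it returns carries make_h()'s frame as its environment, so it sees that frame's w rather than the top-level one.

```r
make_h <- function() {         # hypothetical helper, not in the original example
   w <- 100                    # local to make_h(); captured by the function below
   function(dee,yyy) {
      return(dee*(w+yyy))
   }
}

w <- 12
f <- function(y,ftn) {
   d <- 8
   return(ftn(d,y))
}

h2 <- make_h()
f(3,h2)    # uses w = 100 from make_h()'s frame: 8*(100+3) = 824
```

Passing h2 as an argument does not change where its nonlocal variables are looked up.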

Note carefully that all the examples so far involving nonlocal variables are for reads, not writes. The case of writes is crucial, and it will be covered in Section 7.8.1.

More on ls()

Without arguments, a call to ls() from within a function returns the names of the current local variables (including arguments). With the envir argument, it returns the names of the locals of any frame in the call chain.

Here’s an example:

> f
function(y) {
   d <- 8
   return(h(d,y))
}
> h
function(dee,yyy) {
   print(ls())
   print(ls(envir=parent.frame(n=1)))
   return(dee*(w+yyy))
}



> f(2)
[1] "dee" "yyy"
[1] "d" "y"
[1] 112

With parent.frame(), the argument n specifies how many frames to go up in the call chain. Here, we were in the midst of executing h(), which had been called from f(), so specifying n = 1 gives us f()’s frame, and thus we get its locals.
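To see the effect of larger values of n, here is a sketch with a three-deep call chain. The function names outer_fn(), middle_fn(), and inner_fn() and their variables are hypothetical, chosen only for this illustration.

```r
outer_fn <- function() {
   o <- 1
   middle_fn()
}
middle_fn <- function() {
   m <- 2
   inner_fn()
}
inner_fn <- function() {
   i <- 3
   list(
      own     = ls(),                           # locals of inner_fn()
      caller  = ls(envir=parent.frame(n=1)),    # locals of middle_fn()
      caller2 = ls(envir=parent.frame(n=2))     # locals of outer_fn()
   )
}
str(outer_fn())
```

Each step up in n moves one frame further up the call chain, so the three elements are "i", "m", and "o", respectively.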

Functions Have (Almost) No Side Effects

Yet another influence of the functional programming philosophy is that functions do not change nonlocal variables; that is, generally, there are no side effects. Roughly speaking, the code in a function has read access to its nonlocal variables, but it does not have write access to them. Our code can appear to reassign those variables, but the action will affect only copies, not the variables themselves. Let’s demonstrate this by adding some more code to our previous example.

> w <- 12
> f
function(y) {
   d <- 8
   w <- w + 1
   y <- y - 2
   print(w)
   h <- function() {
      return(d*(w+y))
   }
   return(h())
}
> t <- 4
> f(t)
[1] 13
[1] 120
> w
[1] 12
> t
[1] 4

So, w at the top level did not change, even though it appeared to change within f(). Only a local copy of w, within f(), changed. Similarly, the top-level variable t didn’t change, even though its associated formal argument y did change.
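The same behavior holds for vectors and other larger objects passed as arguments. The following sketch uses a hypothetical function setfirst(), made up for this illustration, that changes one element of its argument.

```r
v <- c(1,2,3)
setfirst <- function(x) {     # hypothetical function for illustration
   x[1] <- 99                 # changes only the local x
   return(x)
}
setfirst(v)    # 99 2 3
v              # still 1 2 3
```

To change v itself, the caller would need to reassign the result, as in v <- setfirst(v).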

Note

More precisely, references to the local w actually go to the same memory location as the global one, until the value of the local changes. In that case, a new memory location is used.

An important exception to this read-only nature of globals arises with the superassignment operator, which we’ll discuss later in Section 7.8.1.

Extended Example: A Function to Display the Contents of a Call Frame

In single-stepping through your code in a debugging setting, you often want to know the values of the local variables in your current function. You may also want to know the values of the locals in the parent function—that is, the one from which the current function was called. Here, we will develop code to display these values, thereby further demonstrating access to the environment hierarchy. (The code is adapted from my edtdbg debugging tool in R’s CRAN code repository.)

For example, consider the following code:

f <- function() {
   a <- 1
   return(g(a)+a)
}

g <- function(aa) {
   b <- 2
   aab <- h(aa+b)
   return(aab)
}

h <- function(aaa) {
   c <- 3
   return(aaa+c)
}

When we call f(), it in turn calls g(), which then calls h(). In the debugging setting, say we are currently about to execute the return() within g(). We want to know the values of the local variables of the current function, namely aa, b, and aab. And while we’re in g(), we also wish to know the values of the locals in f() at the time of the call to g(), as well as the values of the global variables. Our function showframe() will do all this.

The showframe() function has one argument, upn, which is the number of frames to go up the call stack. A negative value of the argument signals that we want to view the globals—the top-level variables.

Here’s the code:

# shows the values of the local variables (including arguments) of the
# frame upn frames above the one from which showframe() is called; if
# upn < 0, the globals are shown; function objects are not shown
showframe <- function(upn) {
   # determine the proper environment
   if (upn < 0) {
      env <- .GlobalEnv
   } else {
      env <- parent.frame(n=upn+1)
   }
   # get the list of variable names
   vars <- ls(envir=env)
   # for each variable name, print its value
   for (vr in vars) {
      vrg <- get(vr,envir=env)
      if (!is.function(vrg)) {
         cat(vr,":\n",sep="")
         print(vrg)
      }
   }
}

Let’s try it out. Insert some calls into g():

> g
function(aa) {
   b <- 2
   showframe(0)
   showframe(1)
   aab <- h(aa+b)
   return(aab)
}

Now run it:

> f()
aa:
[1] 1
b:
[1] 2
a:
[1] 1
[1] 7

To see how this works, we’ll first look at the get() function, one of the most useful utilities in R. Its job is quite simple: Given the name of an object, it fetches the object itself. Here’s an example:

> m <- rbind(1:3,20:22)
> m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]   20   21   22
> get("m")
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]   20   21   22

This example with m involves the current call frame, but in our showframe() function, we deal with various levels in the environment hierarchy. So, we need to specify the level via the envir argument of get():

vrg <- get(vr,envir=env)
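As a small standalone illustration of the envir argument, the following sketch uses an environment created with new.env() as a stand-in for a call frame. The environment e and the strings stored in it are assumptions made up for this example.

```r
e <- new.env()                          # hypothetical stand-in for a call frame
assign("m","value stored in e",envir=e)
m <- "top-level value"

get("m")                # looks in the current environment first
get("m",envir=e)        # looks in e first

# exists() reports whether get() would succeed; inherits=FALSE restricts
# the search to e itself rather than continuing to enclosing environments
exists("zzz_not_defined",envir=e,inherits=FALSE)
```

The two get() calls return different values for the same name, and the exists() call returns FALSE.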

The level itself is determined largely by calling parent.frame():

if (upn < 0) {
   env <- .GlobalEnv
} else {
   env <- parent.frame(n=upn+1)
}

Note that ls() can also be called in the context of a particular level, thus enabling you to determine which variables exist at the level of interest and then inspect them. Here’s an example:

vars <- ls(envir=env)
for (vr in vars) {

This code picks up the names of all the local variables in the given frame and then loops through them, setting things up for get() to do its work.
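If you would rather have the values returned than printed, one possible variant replaces the loop with base R's mget(), which fetches several objects at once as a named list. This is only a sketch: framevalues() is a hypothetical name, and the small f() and g() below are simplified stand-ins for the earlier example.

```r
# returns a named list of the non-function variables in the frame upn
# frames above the caller; if upn < 0, the globals are used
framevalues <- function(upn) {
   if (upn < 0) {
      env <- .GlobalEnv
   } else {
      env <- parent.frame(n=upn+1)
   }
   vals <- mget(ls(envir=env),envir=env)   # one list element per variable
   Filter(Negate(is.function),vals)        # drop function objects, as showframe() does
}

g <- function(aa) {
   b <- 2
   return(list(mine=framevalues(0),callers=framevalues(1)))
}
f <- function() {
   a <- 1
   return(g(a))
}
res <- f()
str(res)
```

Here res$mine holds aa and b from g(), and res$callers holds a from f(), so the caller can inspect or compare them programmatically.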