Using R Debugging Facilities

The R base package includes a number of debugging facilities, and more functional debugging packages are also available. We’ll discuss both the base facilities and other packages, and our extended example will present a fully detailed debugging session.

The core of R’s debugging facility consists of the browser. It allows you to single-step through your code, line by line, taking a look around as you go. You can invoke the browser through a call to either the debug() or browser() function.

R’s debugging facility is specific to individual functions. If you believe there is a bug in your function f(), you can make the call debug(f) to set the debug status for the function f(). This means that from that point onward, each time you call the function, you will automatically enter the browser at the beginning of the function. Calling undebug(f) will unset the debug status of the function so that entry to the function will no longer invoke the browser.
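As a small illustration of the debug status being set and cleared, here is a minimal sketch. The function f() is a made-up stand-in, and the sketch uses base R's isdebugged() to inspect the flag so that the browser is never actually entered.

```r
# Hypothetical function standing in for the f() discussed above
f <- function(x) {
  y <- x^2
  y + 1
}

debug(f)                        # every later call to f() would open the browser
flag_set <- isdebugged(f)       # TRUE while the debug status is set

undebug(f)                      # clear the status; f() runs normally again
flag_cleared <- !isdebugged(f)  # TRUE once the status has been removed

result <- f(3)                  # no browser now; returns 10
```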

On the other hand, if you place a call to browser() at some line within f(), the browser will be invoked only when execution reaches that line. You then can single-step through your code until you exit the function. If you believe the bug’s location is not near the beginning of the function, you probably don’t want to be single-stepping from the beginning, so this approach is more direct.

Readers who have used C debuggers such as GDB (the GNU debugger) will find similarities here, but some aspects will come as a surprise. As noted, for instance, debug() is called at the function level, not at the level of the overall program. If you believe you have bugs in several of your functions, you’ll need to call debug() on each one.

It can become tedious to call debug(f) and then undebug(f) when you just want to go through one debugging session for f(). Starting with R 2.10, one can now call debugonce() instead; calling debugonce(f) puts f() into debugging status the first time you execute it, but that status is reversed immediately upon exit from the function.

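While you are in the browser, the prompt changes from > to Browse[d]>. (Here, d is the depth of the call chain.) You may submit any of the following commands at that prompt:

n (for next): Executes the next line and then pauses again. Hitting ENTER does the same.

c (for continue): Like n, except that several lines of code may be executed before the next pause. Inside a loop, the rest of the loop is run; otherwise, the rest of the function is run.

Any R command: You are still in R’s interactive mode, so you can query the value of a variable x, for instance, by simply typing x. A variable that shares its name with a browser command must be printed explicitly, as in print(n).

where: Prints a stack trace, showing the sequence of function calls that led to the current location.

Q: Quits the browser and returns you to R’s main interactive level.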

Calling debug(f) places a call to browser() at the beginning of f(). However, this may be too coarse a tool in some cases. If you suspect that the bug is in the middle of the function, it’s wasteful to trudge through all the intervening code.

The solution is to set breakpoints at certain key locations of your code—places where you want execution to be paused. How can this be done in R? You can call browser() directly or use the setBreakpoint() function (with R version 2.10 and later).
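A breakpoint placed with browser() can also be made conditional, so that execution pauses only at the spot of interest. The following is a minimal sketch, not a recommended pattern: the function col_ranges() and its pause argument are hypothetical names introduced here for illustration. It relies on the expr argument of browser(), which makes the call return immediately when the expression is FALSE.

```r
# Hypothetical function; the `pause` argument is an assumption added for
# illustration so that the breakpoint can be switched on only when wanted.
col_ranges <- function(m, pause = FALSE) {
  rng <- numeric(ncol(m))
  for (j in seq_len(ncol(m))) {
    # Enter the browser only in the last iteration, and only if pause is TRUE;
    # when expr is FALSE, browser() returns at once and execution continues.
    browser(expr = pause && j == ncol(m))
    rng[j] <- max(m[, j]) - min(m[, j])
  }
  rng
}

ranges <- col_ranges(matrix(1:6, nrow = 2))   # runs straight through
```

In an interactive session, calling col_ranges(m, pause = TRUE) would instead stop in the browser when the loop reaches the final column.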

The trace() function is flexible and powerful, though it takes some initial effort to learn. We will discuss some of the simpler usage forms here, beginning with the following:

> trace(f,t)

This call instructs R to call the function t() every time we enter the function f(). For instance, say we wish to set a breakpoint at the beginning of the function gy(). We could use this command:

> trace(gy,browser)

This has the same effect as placing the command browser() in our source code for gy(), but it’s quicker and more convenient than inserting such a line, saving the file, and rerunning source() to load in the new version of the file. Calling trace() does not change your source file; it changes only the copy of the function that R holds in memory. It is also quick to undo, by simply running untrace():

> untrace(gy)

You can turn tracing on or off globally by calling tracingState(), using the argument TRUE to turn it on or FALSE to turn it off.
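The tracer does not have to be browser(). As a rough sketch of how trace(), tracingState(), and untrace() interact, the example below counts calls instead of pausing, so it can run without any interaction. The function gy() and the counter n_calls are made up for illustration, and the code is assumed to be run at the top level of a session.

```r
# Hypothetical function and counter, defined at the top level
gy <- function(x) x + 1
n_calls <- 0

# The tracer is an unevaluated expression that runs on each entry to gy();
# print = FALSE suppresses the message normally printed on each traced call.
trace(gy, tracer = quote(n_calls <<- n_calls + 1), print = FALSE)
gy(1)
gy(2)                  # n_calls is now 2

tracingState(FALSE)    # suspend all tracing without removing it
gy(3)                  # not counted
tracingState(TRUE)     # resume tracing
gy(4)                  # n_calls is now 3

untrace(gy)            # restore the original gy()
gy(5)                  # not counted
```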

Say your R code crashes when you are not running the debugger. There is still a debugging tool available to you after the fact. You can do a “postmortem” by simply calling traceback(). It will tell you in which function the problem occurred and the call chain that led to that function.

You can get a lot more information if you set up R to dump frames in the event of a crash:

> options(error=dump.frames)

If you’ve done this, then after a crash, run this command:

> debugger()

You will then be presented with a choice of levels of function calls to view. For each one that you choose, you can take a look at the values of the variables there. After browsing through one level, you can return to the debugger() main menu by typing c at the browser prompt.

You can arrange to automatically enter the debugger by writing this code:

> options(error=recover)

Note, though, that if you do choose this automatic route, it will whisk you into the debugger, even if you simply have a syntax error (not a useful time to enter the debugger).

To turn off any of this behavior, type the following:

> options(error=NULL)

You’ll see a demonstration of this approach in the next section.

Now that we’ve looked at R’s debugging tools, let’s try using them to find and fix code problems. We’ll begin with a simple example and then move on to a more complicated one.

First recall our extended example of finding runs of 1s in Chapter 2. Here is a buggy version of the code:

1    findruns <- function(x,k) {
2       n <- length(x)
3       runs <- NULL
4       for (i in 1:(n-k)) {
5          if (all(x[i:i+k-1]==1)) runs <- c(runs,i)
6       }
7       return(runs)
8    }

Let’s try it on a small test case:

> source("findruns.R")
> findruns(c(1,0,0,1,1,0,1,1,1),2)
[1] 3 4 6 7

The function was supposed to report runs at indices 4, 7, and 8, but it found some indices that it shouldn’t have and missed some as well. Something is wrong. Let’s enter the debugger and take a look around.

> debug(findruns)
> findruns(c(1,0,0,1,1,0,1,1,1),2)
debugging in: findruns(c(1, 0, 0, 1, 1, 0, 1, 1, 1), 2)
debug at findruns.R#1: {
    n <- length(x)
    runs <- NULL
    for (i in 1:(n - k)) {
        if (all(x[i:i + k - 1] == 1))
            runs <- c(runs, i)
    }
    return(runs)
}
attr(,"srcfile")
findruns.R

So, according to the principle of confirmation, let’s first make sure our test vector was received correctly:

Browse[2]> x
[1] 1 0 0 1 1 0 1 1 1

So far, so good. Let’s step through the code a bit. We hit n a couple of times to single-step through the code.

Browse[2]> n
debug at findruns.R#2: n <- length(x)
Browse[2]> n
debug at findruns.R#3: runs <- NULL
Browse[2]> print(n)
[1] 9

Note that after each single step, R tells us which statement would be the next one to execute. In other words, at the time we executed print(n), we had not yet executed the assignment of NULL to runs.

Note, too, that although normally you can print out the value of a variable by simply typing its name, we could not do so here for our variable n, because n is also the abbreviation for the debugger’s next command. Thus, we needed print().

At any rate, we found that the length of our test vector was 9, confirming what we knew. Now, let’s single-step some more, getting into the loop.

Browse[2]> n
debug at findruns.R#4: for (i in 1:(n - k)) {
    if (all(x[i:i + k - 1] == 1))
        runs <- c(runs, i)
}
Browse[2]> n
debug at findruns.R#4: i
Browse[2]> n
debug at findruns.R#5: if (all(x[i:i + k - 1] == 1)) runs <- c(runs, i)

Since k is 2—that is, we are checking for runs of length 2— the if() statement should be checking the first two elements of x, which are (1,0). Let’s confirm:

Browse[2]> x[i:i + k - 1]
[1] 0

So, it did not confirm. Let’s check that we have the correct subscript range, which should be 1:2. Is it?

Browse[2]> i:i + k - 1
[1] 2

Also wrong. Well, how about i and k? They should be 1 and 2, respectively. Are they?

Browse[2]> i
[1] 1
Browse[2]> k
[1] 2

Well, those do confirm. Thus, our problem must be with the expression i:i + k - 1. After some thought, we realize there is an operator precedence problem there, and we correct it to i:(i + k - 1).
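The precedence issue can be reproduced outside the debugger. In this short sketch, with the same values of i and k, the colon operator binds more tightly than the arithmetic operators, so the unparenthesized form is read as (i:i) + k - 1.

```r
i <- 1
k <- 2

buggy <- i:i + k - 1     # parsed as (i:i) + k - 1, which is the single value 2
fixed <- i:(i + k - 1)   # the intended range, 1 2
```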

Is it okay now?

> source("findruns.R")
> findruns(c(1,0,0,1,1,0,1,1,1),2)
[1] 4 7

No, as mentioned, it should be (4,7,8).

Let’s set a breakpoint inside the loop and take a closer look.

> setBreakpoint("findruns.R",5)
/home/nm/findruns.R#5:
 findruns step 4,4,2 in <environment: R_GlobalEnv>
> findruns(c(1,0,0,1,1,0,1,1,1),2)
findruns.R#5
Called from: eval(expr, envir, enclos)
Browse[1]> x[i:(i+k-1)]
[1] 1 0

Good, we’re dealing with the first two elements of the vector, so our bug fix is working so far. Let’s look at the second iteration of the loop.

Browse[1]> c
findruns.R#5
Called from: eval(expr, envir, enclos)
Browse[1]> i
[1] 2
Browse[1]> x[i:(i+k-1)]
[1] 0 0

That’s right, too. We could go another iteration, but instead, let’s look at the last iteration, a place where bugs frequently arise in loops. So, let’s add a conditional breakpoint, as follows:

findruns <- function(x,k) {
   n <- length(x)
   runs <- NULL
   for (i in 1:(n-k)) {
      if (all(x[i:(i+k-1)]==1)) runs <- c(runs,i)
      if (i == n-k) browser()  # break in last iteration of loop
   }
   return(runs)
}

And now run it again.

> source("findruns.R")
> findruns(c(1,0,0,1,1,0,1,1,1),2)
Called from: findruns(c(1, 0, 0, 1, 1, 0, 1, 1, 1), 2)
Browse[1]> i
[1] 7

This shows the last iteration was for i = 7. But the vector is nine elements long, and k = 2, so our last iteration should be i = 8. Some thought then reveals that the range in the loop should have been written as follows:

for (i in 1:(n-k+1)) {
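Putting the two fixes together, the corrected function might look like the following sketch. It keeps the structure of the listing above and assumes that k is no larger than the length of x.

```r
# Corrected version: both the subscript range and the loop bound are fixed
findruns <- function(x, k) {
  n <- length(x)
  runs <- NULL
  for (i in 1:(n - k + 1)) {       # the last window of length k starts at n-k+1
    if (all(x[i:(i + k - 1)] == 1)) runs <- c(runs, i)
  }
  return(runs)
}

found <- findruns(c(1,0,0,1,1,0,1,1,1), 2)   # 4 7 8
```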

By the way, note that the breakpoint that we set using setBreakpoint() is no longer valid, now that we’ve replaced the old version of the object findruns.

Subsequent testing (not shown here) indicates the code now works. Let’s move on to a more complex example.

Recall our code in Section 3.4.2, which found the pair of cities with the closest distance between them. Here is a buggy version of that code:

1    # returns the minimum value of d[i,j], i != j, and the row/col attaining
2    # that minimum, for square symmetric matrix d; no special policy on
3    # ties;
4    # motivated by distance matrices
5    mind <- function(d) {
6       n <- nrow(d)
7       # add a column to identify row number for apply()
8       dd <- cbind(d,1:n)
9       wmins <- apply(dd[-n,],1,imin)
10      # wmins will be 2xn, 1st row being indices and 2nd being values
11      i <- which.min(wmins[1,])
12      j <- wmins[2,i]
13      return(c(d[i,j],i,j))
14   }
15
16   # finds the location, value of the minimum in a row x
17   imin <- function(x) {
18      n <- length(x)
19      i <- x[n]
20      j <- which.min(x[(i+1):(n-1)])
21      return(c(j,x[j]))
22   }

Let’s use R’s debugging tools to find and fix the problems.

We’ll run it first on a small test case:

> source("cities.R")
> m <- rbind(c(0,12,5),c(12,0,8),c(5,8,0))
> m
     [,1] [,2] [,3]
[1,]    0   12    5
[2,]   12    0    8
[3,]    5    8    0
> mind(m)
Error in mind(m) : subscript out of bounds

Not an auspicious start! Unfortunately, the error message doesn’t tell us where the code blew up. But the debugger will give us that information:

> options(error=recover)
> mind(m)
Error in mind(m) : subscript out of bounds

Enter a frame number, or 0 to exit

1: mind(m)

Selection: 1
Called from: eval(expr, envir, enclos)


Browse[1]> where
where 1: eval(expr, envir, enclos)
where 2: eval(quote(browser()), envir = sys.frame(which))
where 3 at cities.R#13: function ()
{
    if (.isMethodsDispatchOn()) {
        tState <- tracingState(FALSE)
...

Okay, so the problem occurred in mind() rather than imin() and in particular at line 13. It still could be the fault of imin(), but for now, let’s deal with the former.

Note

There is another way we could have determined that the blowup occurred on line 13. We would enter the debugger as before but probe the local variables. We could reason that if the subscript bounds error had occurred at line 9, then the variable wmins would not have been set, so querying it would give us an error message like Error: object 'wmins' not found. On the other hand, if the blowup occurred on line 13, even j would have been set.

Since the error occurred with d[i,j], let’s look at those variables:

Browse[1]> d
     [,1] [,2] [,3]
[1,]    0   12    5
[2,]   12    0    8
[3,]    5    8    0
Browse[1]> i
[1] 2
Browse[1]> j
[1] 12

This is indeed a problem—d only has three columns, yet j, a column subscript, is 12.

Let’s look at the variable from which we gleaned j, wmins:

Browse[1]> wmins
     [,1] [,2]
[1,]    2    1
[2,]   12   12

If you recall how the code was designed, column k of wmins is supposed to contain information about the minimum value in row k of d. So here wmins is saying that in the first row (k = 1) of d, which is (0,12,5), the minimum value is 12, occurring at index 2. But it should be 5 at index 3. So, something went wrong with this line:

wmins <- apply(dd[-n, ], 1, imin)

There are several possibilities here. But since ultimately imin() is called, we can check them all from within that function. So, let’s set the debug status of imin(), quit the debugger, and rerun the code.

Browse[1]> Q
> debug(imin)
> mind(m)
debugging in: FUN(newX[, i], ...)
debug at cities.R#17: {
    n <- length(x)
    i <- x[n]
    j <- which.min(x[(i + 1):(n - 1)])
    return(c(j, x[j]))
}
...

So, we’re in imin(). Let’s see if it properly received the first row of dd, which should be (0,12,5,1).

Browse[4]> x
[1]  0 12  5  1

It’s confirmed. This seems to indicate that the first two arguments to apply() were correct and that the problem is instead within imin(), though that remains to be seen.

Let’s single-step through, occasionally typing confirmational queries:

Browse[2]> n
debug at cities.R#18: n <- length(x)
Browse[2]> n
debug at cities.R#19: i <- x[n]
Browse[2]> n
debug at cities.R#20: j <- which.min(x[(i + 1):(n - 1)])
Browse[2]> n
debug at cities.R#21: return(c(j, x[j]))
Browse[2]> print(n)
[1] 4
Browse[2]> i
[1] 1
Browse[2]> j
[1] 2

Recall that we designed our call which.min(x[(i + 1):(n - 1)]) to look only at the above-diagonal portion of this row. This is because the matrix is symmetric and because we don’t want to consider the distance between a city and itself.

But the value j = 2 does not confirm. The minimum value in (0,12,5) is 5, which occurs at index 3 of that vector, not index 2. Thus, the problem is in this line:

j <- which.min(x[(i + 1):(n - 1)])

What could be wrong?

After taking a break, we realize that although the minimum value of (0,12,5) occurs at index 3 of that vector, that is not what we asked which.min() to find for us. Instead, that i + 1 term means we asked for the index of the minimum in (12,5), which is 2.

We did ask which.min() for the correct information, but we failed to use it correctly, because we do want the index of the minimum in (0,12,5). We need to adjust the output of which.min() accordingly, as follows:

j <- which.min(x[(i+1):(n-1)])
k <- i + j
return(c(k,x[k]))
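To see the offset concretely, the following sketch replays the computation for the first row of dd. The name above_diag is introduced here only for illustration.

```r
x <- c(0, 12, 5, 1)                 # first row of dd: the row of d plus its row number
n <- length(x)                      # 4
i <- x[n]                           # 1

above_diag <- x[(i + 1):(n - 1)]    # (12, 5), the above-diagonal part of the row
j <- which.min(above_diag)          # 2: a position within above_diag, not within x
k <- i + j                          # 3: the corresponding position within the full row
```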

We make the fix and try again.

> mind(m)
Error in mind(m) : subscript out of bounds

Enter a frame number, or 0 to exit

1: mind(m)

Selection:

Oh no, another bounds error! To see where the blowup occurred this time, we issue the where command as before, and we find it was at line 13 again. What about i and j now?

Browse[1]> i
[1] 1
Browse[1]> j
[1] 5

The value of j is still wrong; it cannot be larger than 3, as we have only three columns in this matrix. On the other hand, i is correct. The overall minimum off-diagonal value in d is 5, occurring in row 1, column 3.

So, let’s check the source of j again, the matrix wmins:

Browse[1]> wmins
     [,1] [,2]
[1,]    3    3
[2,]    5    8

Well, there are the 3 and 5 in column 1, just as should be the case. Remember, column 1 here contains the information for row 1 in d, so wmins is saying that the minimum value in row 1 is 5, occurring at index 3 of that row, which is correct.

After taking another break, though, we realize that while wmins is correct, our use of it isn’t. We have the rows and columns of that matrix mixed up. This code:

i <- which.min(wmins[1,])
j <- wmins[2,i]

should be like this:

i <- which.min(wmins[2,])
j <- wmins[1,i]

After making that change and re-sourcing our file, we try it out.

> mind(m)
[1] 5 1 3

This is correct, and subsequent tests with larger matrices worked, too.
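For reference, here is a sketch of the code with both fixes applied, followed by the same small test case. It keeps the structure of the buggy listing and assumes a square symmetric matrix with at least three rows, since dd[-n,] must remain a matrix for apply() to work.

```r
# finds the location and value of the minimum above-diagonal entry in a row x;
# the last element of x is the row number appended by mind()
imin <- function(x) {
  n <- length(x)
  i <- x[n]
  j <- which.min(x[(i+1):(n-1)])
  k <- i + j                       # convert to an index into the full row
  return(c(k, x[k]))
}

# returns the minimum value of d[i,j], i != j, and the row/col attaining it
mind <- function(d) {
  n <- nrow(d)
  dd <- cbind(d, 1:n)              # add a column to identify the row number
  wmins <- apply(dd[-n,], 1, imin)
  i <- which.min(wmins[2,])        # row 2 of wmins holds the minimum values
  j <- wmins[1,i]                  # row 1 of wmins holds the column indices
  return(c(d[i,j], i, j))
}

m <- rbind(c(0,12,5), c(12,0,8), c(5,8,0))
result <- mind(m)                  # 5 1 3
```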