Using R Debugging Facilities

Calling the browser directly, rather than entering the debugger via debug(), is very useful in situations in which you have a loop with many iterations and the bug surfaces only after, say, the 50th iteration. If the loop index is i, then you could write this:

if (i > 49) browser()

That way, you would avoid the tedium of stepping through the first 49 iterations!
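Here is a minimal sketch of the same pattern inside a complete function. The function name running_total and the argument break_after are hypothetical, and the interactive() guard is an addition so that the code also runs unattended (for example, under Rscript), where nobody is available to answer the browser prompt.

```r
# Hypothetical example: sum a vector, pausing only in late iterations.
running_total <- function(v, break_after = 49) {
   total <- 0
   for (i in seq_along(v)) {
      # Enter the browser only once the loop index passes break_after,
      # and only when a person is at the console.
      if (i > break_after && interactive()) browser()
      total <- total + v[i]
   }
   return(total)
}

running_total(1:60)
```

In an interactive session, this stops at iteration 50 and at every later iteration. At the browser prompt, c resumes execution until the next stop and Q quits.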

Using the setBreakpoint() Function

Starting with R 2.10, you can use setBreakpoint() in the format

setBreakpoint(filename,linenumber)

This will result in browser() being called at line linenumber of our source file filename.

This is especially useful when you are in the midst of using the debugger, single-stepping through code. Say you are currently at line 12 of your source file x.R and want to have a breakpoint at line 28. Instead of exiting the debugger, adding a call to browser() at line 28, and then re-entering the function, you could simply type this:

> setBreakpoint("x.R",28)

You could then resume execution within the debugger, say by issuing the c command.

The setBreakpoint() function works by calling the trace() function, discussed in the next section. Thus, to cancel the breakpoint, you cancel the trace. For instance, if we had called setBreakpoint() at a line in the function g(), we would cancel the breakpoint by typing the following:

> untrace(g)

You can call setBreakpoint() whether or not you are currently in the debugger. If you are not currently running the debugger and you execute the affected function and hit the breakpoint during that execution, you will be put into the browser automatically. This is similar to the case of browser(), but using this approach, you save yourself the trouble of changing your code via your text editor.
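The following sketch walks through the whole cycle on a throwaway file. The file name demo_sum.R, the function demo_sum(), and the counter hits are all hypothetical. One deliberate departure from the text is labeled in the code: we pass a custom tracer to setBreakpoint() so that the breakpoint counts arrivals instead of opening the browser, which lets the sketch run unattended. Omitting the tracer argument gives the browser() behavior described above.

```r
options(keep.source = TRUE)   # setBreakpoint() relies on source references

# Write a small hypothetical source file to a temporary directory.
src <- file.path(tempdir(), "demo_sum.R")
writeLines(c(
   "demo_sum <- function(v) {",
   "   total <- 0",
   "   for (val in v) total <- total + val",
   "   return(total)",
   "}"
), src)
source(src)

# Assumption for illustration: count arrivals at line 4 rather than
# opening the browser, so that no console input is needed.
hits <- 0
setBreakpoint(src, 4, tracer = quote(hits <<- hits + 1))

demo_sum(1:3)       # reaches line 4 once, so the tracer runs once
untrace(demo_sum)   # cancel the breakpoint by canceling the trace
demo_sum(1:3)       # no longer traced
hits
```

Note that the source file on disk is never modified. Only R's copy of demo_sum() changes, and untrace() restores it.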

Tracking with the trace() Function

The trace() function is flexible and powerful, though it takes some initial effort to learn. We will discuss some of the simpler usage forms here, beginning with the following:

> trace(f,t)

This call instructs R to call the function t() every time we enter the function f(). For instance, say we wish to set a breakpoint at the beginning of the function gy(). We could use this command:

> trace(gy,browser)

This has the same effect as placing the command browser() in our source code for gy(), but it’s quicker and more convenient than inserting such a line, saving the file, and rerunning source() to load in the new version of the file. Calling trace() does not change your source file, though it does change the copy of the function that R holds in memory for the current session. It is also quicker and more convenient to undo, by simply running untrace():

> untrace(gy)

You can turn tracing on or off globally by calling tracingState(), using the argument TRUE to turn it on or FALSE to turn it off.
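To see trace(), untrace(), and tracingState() working together without stopping in the browser, here is a small sketch that uses a counting function as the tracer. The names gy, count_call, and n_calls are hypothetical stand-ins. With browser in place of count_call, each counted call would instead open the browser.

```r
# Hypothetical function to trace.
gy <- function(x) x^2

# Hypothetical tracer: counts entries into gy() instead of opening the browser.
n_calls <- 0
count_call <- function() n_calls <<- n_calls + 1

trace(gy, count_call, print = FALSE)   # count_call() now runs on each entry to gy()
gy(2)
gy(3)

old_state <- tracingState(FALSE)   # suspend all tracing; returns the previous state
gy(4)                              # not counted while tracing is off
tracingState(old_state)            # restore the previous state

untrace(gy)                        # remove the trace entirely
gy(5)                              # not counted
n_calls
```

The print = FALSE argument only suppresses the message that trace() would otherwise print on each traced call.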

Performing Checks After a Crash with the traceback() and debugger() Functions

Say your R code crashes when you are not running the debugger. There is still a debugging tool available to you after the fact. You can do a “postmortem” by simply calling traceback(). It will tell you in which function the problem occurred and the call chain that led to that function.

You can get a lot more information if you set up R to dump frames in the event of a crash:

> options(error=dump.frames)

If you’ve done this, then after a crash, run this command:

> debugger()

You will then be presented with a choice of levels of function calls to view. For each one that you choose, you can take a look at the values of the variables there. After browsing through one level, you can return to the debugger() main menu by hitting N.

You can arrange to automatically enter the debugger by writing this code:

> options(error=recover)

Note, though, that if you do choose this automatic route, it will whisk you into the debugger, even if you simply have a syntax error (not a useful time to enter the debugger).

To turn off any of this behavior, type the following:

> options(error=NULL)

You’ll see a demonstration of this approach in the next section.
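Because debugger() and recover() are interactive, they are hard to show in a script. As a rough sketch of what a frame dump contains, the code below calls dump.frames() from an error handler and then inspects the result directly. This swaps in tryCatch() and withCallingHandlers() for the options(error=...) setting, purely so that the script can continue after the error. The functions pick_fifth() and run_pick() are hypothetical.

```r
# Hypothetical functions: pick_fifth() fails on vectors shorter than 5.
pick_fifth <- function(v) v[[5]]
run_pick <- function(v) pick_fifth(v)

# Dump the frames at the moment of the error, then let tryCatch() absorb
# the error so the script keeps running. dump.frames() stores its result
# in the global variable last.dump.
msg <- tryCatch(
   withCallingHandlers(run_pick(1:3), error = function(e) dump.frames()),
   error = function(e) conditionMessage(e)
)

msg                # the error message
names(last.dump)   # one label per call on the stack when the error occurred

# Look up the frame of pick_fifth() and read its local variable v.
idx <- which(endsWith(names(last.dump), "pick_fifth(v)"))[1]
get("v", envir = last.dump[[idx]])
```

This is the same kind of information that debugger() lets you browse level by level: the chain of calls, plus the local variables at each level.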

Extended Example: Two Full Debugging Sessions

Now that we’ve looked at R’s debugging tools, let’s try using them to find and fix code problems. We’ll begin with a simple example and then move on to a more complicated one.

Debugging Finding Runs of Ones

First recall our extended example of finding runs of 1s in Chapter 2. Here is a buggy version of the code:

1    findruns <- function(x,k) {
2       n <- length(x)
3       runs <- NULL
4       for (i in 1:(n-k)) {
5          if (all(x[i:i+k-1]==1)) runs <- c(runs,i)
6       }
7       return(runs)
8    }

Let’s try it on a small test case:

> source("findruns.R")
> findruns(c(1,0,0,1,1,0,1,1,1),2)
[1] 3 4 6 7

The function was supposed to report runs at indices 4, 7, and 8, but it found some indices that it shouldn’t have and missed some as well. Something is wrong. Let’s enter the debugger and take a look around.

> debug(findruns)
> findruns(c(1,0,0,1,1,0,1,1,1),2)
debugging in: findruns(c(1, 0, 0, 1, 1, 0, 1, 1, 1), 2)
debug at findruns.R#1: {
    n <- length(x)
    runs <- NULL
    for (i in 1:(n - k)) {
        if (all(x[i:i + k - 1] == 1))
            runs <- c(runs, i)
    }
    return(runs)
}
attr(,"srcfile")
findruns.R

So, according to the principle of confirmation, let’s first make sure our test vector was received correctly:

Browse[2]> x
[1] 1 0 0 1 1 0 1 1 1

So far, so good. Let’s step through the code a bit. We hit n a couple of times to single-step through the code.

Browse[2]> n
debug at findruns.R#2: n <- length(x)
Browse[2]> n
debug at findruns.R#3: runs <- NULL
Browse[2]> print(n)
[1] 9

Note that after each single step, R tells us which statement would be the next one to execute. In other words, at the time we executed print(n), we had not yet executed the assignment of NULL to runs.

Note, too, that although normally you can print out the value of a variable by simply typing its name, we could not do so here for our variable n, because n is also the abbreviation for the debugger’s next command. Thus, we needed print().

At any rate, we found that the length of our test vector was 9, confirming what we knew. Now, let’s single-step some more, getting into the loop.

Browse[2]> n
debug at findruns.R#4: for (i in 1:(n - k)) {
    if (all(x[i:i + k - 1] == 1))
        runs <- c(runs, i)
}
Browse[2]> n
debug at findruns.R#4: i
Browse[2]> n
debug at findruns.R#5: if (all(x[i:i + k - 1] == 1)) runs <- c(runs, i)

Since k is 2 (that is, we are checking for runs of length 2), the if() statement should be checking the first two elements of x, which are (1,0). Let’s confirm:

Browse[2]> x[i:i + k - 1]
[1] 0

So, it did not confirm. Let’s check that we have the correct subscript range, which should be 1:2. Is it?

Browse[2]> i:i + k - 1
[1] 2

Also wrong. Well, how about i and k? They should be 1 and 2, respectively. Are they?

Browse[2]> i
[1] 1
Browse[2]> k
[1] 2

Well, those do confirm. Thus, our problem must be with the expression i:i + k - 1. After some thought, we realize there is an operator precedence problem there, and we correct it to i:(i + k - 1).
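The precedence issue is easy to reproduce outside the debugger. In R, the colon operator binds more tightly than + and -, so i:i + k - 1 is evaluated as (i:i) + k - 1. A short sketch, using the same values as in the session (the names wrong and right are ours):

```r
i <- 1
k <- 2
x <- c(1,0,0,1,1,0,1,1,1)

wrong <- i:i + k - 1      # parsed as (i:i) + k - 1, a single index
right <- i:(i + k - 1)    # the intended range of k consecutive indices

x[wrong]
x[right]
```

The first expression picks out only x[2], while the second picks out the first two elements of x.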

Is it okay now?

> source("findruns.R")
> findruns(c(1,0,0,1,1,0,1,1,1),2)
[1] 4 7

No, as mentioned, it should be (4,7,8).

Let’s set a breakpoint inside the loop and take a closer look.

> setBreakpoint("findruns.R",5)
/home/nm/findruns.R#5:
 findruns step 4,4,2 in <environment: R_GlobalEnv>
> findruns(c(1,0,0,1,1,0,1,1,1),2)
findruns.R#5
Called from: eval(expr, envir, enclos)
Browse[1]> x[i:(i+k-1)]
[1] 1 0

Good, we’re dealing with the first two elements of the vector, so our bug fix is working so far. Let’s look at the second iteration of the loop.

Browse[1]> c
findruns.R#5
Called from: eval(expr, envir, enclos)
Browse[1]> i
[1] 2
Browse[1]> x[i:(i+k-1)]
[1] 0 0

That’s right, too. We could go another iteration, but instead, let’s look at the last iteration, a place where bugs frequently arise in loops. So, let’s add a conditional breakpoint, as follows:

findruns <- function(x,k) {
   n <- length(x)
   runs <- NULL
   for (i in 1:(n-k)) {
      if (all(x[i:(i+k-1)]==1)) runs <- c(runs,i)
      if (i == n-k) browser()  # break in last iteration of loop
   }
   return(runs)
}

And now run it again.

> source("findruns.R")
> findruns(c(1,0,0,1,1,0,1,1,1),2)
Called from: findruns(c(1, 0, 0, 1, 1, 0, 1, 1, 1), 2)
Browse[1]> i
[1] 7

This shows the last iteration was for i = 7. But the vector is nine elements long, and k = 2, so our last iteration should be i = 8. Some thought then reveals that the range in the loop should have been written as follows:

for (i in 1:(n-k+1)) {

By the way, note that the breakpoint that we set using setBreakpoint() is no longer valid, now that we’ve replaced the old version of the object findruns.
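For reference, here is a sketch of findruns() with both corrections applied and the temporary browser() call removed. The comments marking the two fixes are ours.

```r
# findruns() with both fixes applied.
findruns <- function(x,k) {
   n <- length(x)
   runs <- NULL
   for (i in 1:(n-k+1)) {                          # fix 2: include the last window
      if (all(x[i:(i+k-1)]==1)) runs <- c(runs,i)  # fix 1: parenthesize the range
   }
   return(runs)
}

findruns(c(1,0,0,1,1,0,1,1,1),2)
```

On our test vector, this reports the runs at indices 4, 7, and 8, as originally intended.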

Subsequent testing (not shown here) indicates the code now works. Let’s move on to a more complex example.

Debugging Finding City Pairs

Recall our code in Section 3.4.2, which found the pair of cities with the closest distance between them. Here is a buggy version of that code:

1    # returns the minimum value of d[i,j], i != j, and the row/col attaining
2    # that minimum, for square symmetric matrix d; no special policy on
3    # ties;
4    # motivated by distance matrices
5    mind <- function(d) {
6       n <- nrow(d)
7       # add a column to identify row number for apply()
8       dd <- cbind(d,1:n)
9       wmins <- apply(dd[-n,],1,imin)
10      # wmins will be 2xn, 1st row being indices and 2nd being values
11      i <- which.min(wmins[1,])
12      j <- wmins[2,i]
13      return(c(d[i,j],i,j))
14   }
15
16   # finds the location, value of the minimum in a row x
17   imin <- function(x) {
18      n <- length(x)
19      i <- x[n]
20      j <- which.min(x[(i+1):(n-1)])
21      return(c(j,x[j]))
22   }

Let’s use R’s debugging tools to find and fix the problems.

We’ll run it first on a small test case:

> source("cities.R")
> m <- rbind(c(0,12,5),c(12,0,8),c(5,8,0))
> m
     [,1] [,2] [,3]
[1,]    0   12    5
[2,]   12    0    8
[3,]    5    8    0
> mind(m)
Error in mind(m) : subscript out of bounds

Not an auspicious start! Unfortunately, the error message doesn’t tell us where the code blew up. But the debugger will give us that information:

> options(error=recover)
> mind(m)
Error in mind(m) : subscript out of bounds

Enter a frame number, or 0 to exit

1: mind(m)

Selection: 1
Called from: eval(expr, envir, enclos)


Browse[1]> where
where 1: eval(expr, envir, enclos)
where 2: eval(quote(browser()), envir = sys.frame(which))
where 3 at cities.R#13: function ()
{
    if (.isMethodsDispatchOn()) {
        tState <- tracingState(FALSE)
...

Okay, so the problem occurred in mind() rather than imin() and in particular at line 13. It still could be the fault of imin(), but for now, let’s deal with the former.

Note

There is another way we could have determined that the blowup occurred on line 13. We would enter the debugger as before but probe the local variables. We could reason that if the subscript bounds error had occurred at line 9, then the variable wmins would not have been set, so querying it would give us an error message like Error: object 'wmins' not found. On the other hand, if the blowup occurred on line 13, even j would have been set.

Since the error occurred with d[i,j], let’s look at those variables:

Browse[1]> d
     [,1] [,2] [,3]
[1,]    0   12    5
[2,]   12    0    8
[3,]    5    8    0
Browse[1]> i
[1] 2
Browse[1]> j
[1] 12

This is indeed a problem—d only has three columns, yet j, a column subscript, is 12.

Let’s look at the variable from which we gleaned j, wmins:

Browse[1]> wmins
     [,1] [,2]
[1,]    2    1
[2,]   12   12

If you recall how the code was designed, column k of wmins is supposed to contain information about the minimum value in row k of d. So here wmins is saying that in the first row (k = 1) of d, which is (0,12,5), the minimum value is 12, occurring at index 2. But it should be 5 at index 3. So, something went wrong with this line:

wmins <- apply(dd[-n, ], 1, imin)

There are several possibilities here. But since ultimately imin() is called, we can check them all from within that function. So, let’s set the debug status of imin(), quit the debugger, and rerun the code.

Browse[1]> Q
> debug(imin)
> mind(m)
debugging in: FUN(newX[, i], ...)
debug at cities.R#17: {
    n <- length(x)
    i <- x[n]
    j <- which.min(x[(i + 1):(n - 1)])
    return(c(j, x[j]))
}
...

So, we’re in imin(). Let’s see if it properly received the first row of dd, which should be (0,12,5,1).

Browse[4]> x
[1]  0 12  5  1

It’s confirmed. This seems to indicate that the first two arguments to apply() were correct and that the problem is instead within imin(), though that remains to be seen.

Let’s single-step through, occasionally typing confirmational queries:

Browse[2]> n
debug at cities.r#17: n <- length(x)
Browse[2]> n
debug at cities.r#18: i <- x[n]
Browse[2]> n
debug at cities.r#19: j <- which.min(x[(i + 1):(n - 1)])
Browse[2]> n
debug at cities.r#20: return(c(j, x[j]))
Browse[2]> print(n)
[1] 4
Browse[2]> i
[1] 1
Browse[2]> j
[1] 2

Recall that we designed our call which.min(x[(i + 1):(n - 1)]) to look only at the above-diagonal portion of this row. This is because the matrix is symmetric and because we don’t want to consider the distance between a city and itself.

But the value j = 2 does not confirm. The minimum value in (0,12,5) is 5, which occurs at index 3 of that vector, not index 2. Thus, the problem is in this line:

j <- which.min(x[(i + 1):(n - 1)])

What could be wrong?

After taking a break, we realize that although the minimum value of (0,12,5) occurs at index 3 of that vector, that is not what we asked which.min() to find for us. Instead, that i + 1 term means we asked for the index of the minimum in (12,5), which is 2.

We did ask which.min() for the correct information, but we failed to use it correctly, because we do want the index of the minimum in (0,12,5). We need to adjust the output of which.min() accordingly, as follows:

j <- which.min(x[(i+1):(n-1)])
k <- i + j
return(c(k,x[k]))
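The index adjustment can be checked by hand on the first row of dd. This sketch simply replays the three statements above on that row:

```r
x <- c(0,12,5,1)   # first row of dd: the distances plus the row number
n <- length(x)
i <- x[n]          # row number, here 1

j <- which.min(x[(i+1):(n-1)])   # position within the subvector (12,5)
k <- i + j                       # position within the full row
c(j, k, x[k])
```

Here j is 2, the position of 5 within (12,5), while k is 3, the position of 5 within the full row.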

We make the fix and try again.

> mind(m)
Error in mind(m) : subscript out of bounds

Enter a frame number, or 0 to exit

1: mind(m)

Selection:

Oh no, another bounds error! To see where the blowup occurred this time, we issue the where command as before, and we find it was at line 13 again. What about i and j now?

Browse[1]> i
[1] 1
Browse[1]> j
[1] 5

The value of j is still wrong; it cannot be larger than 3, as we have only three columns in this matrix. On the other hand, i is correct. The overall minimum off-diagonal value in d is 5, occurring in row 1, column 3.

So, let’s check the source of j again, the matrix wmins:

Browse[1]> wmins
     [,1] [,2]
[1,]    3    3
[2,]    5    8

Well, there are the 3 and 5 in column 1, just as should be the case. Remember, column 1 here contains the information for row 1 in d, so wmins is saying that the minimum value in row 1 is 5, occurring at index 3 of that row, which is correct.
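The column-per-row layout follows from how apply() assembles its results when the applied function returns a vector: each call contributes one column. A minimal sketch, using a hypothetical summary function row_max_info() (not part of the code being debugged) on our test matrix:

```r
m <- rbind(c(0,12,5),c(12,0,8),c(5,8,0))

# Hypothetical helper: location and value of the maximum in a row.
row_max_info <- function(r) c(which.max(r), max(r))

info <- apply(m, 1, row_max_info)
dim(info)    # 2 rows, and one column per row of m
info[, 1]    # the result for row 1 of m
info[1, ]    # all the locations
info[2, ]    # all the values
```

So the first subscript of such a result selects the kind of information (location or value), and the second selects the row of the original matrix.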

After taking another break, though, we realize that while wmins is correct, our use of it isn’t. We have the rows and columns of that matrix mixed up. This code:

i <- which.min(wmins[1,])
j <- wmins[2,i]

should be like this:

i <- which.min(wmins[2,])
j <- wmins[1,i]

After making that change and re-sourcing our file, we try it out.

> mind(m)
[1] 5 1 3

This is correct, and subsequent tests with larger matrices worked, too.
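For reference, here is a sketch of the two functions with both corrections from this session applied. The comments marking the fixes are ours.

```r
# imin(): location and value of the minimum above-diagonal entry in a row x,
# whose last element holds the row number.
imin <- function(x) {
   n <- length(x)
   i <- x[n]
   j <- which.min(x[(i+1):(n-1)])
   k <- i + j                  # fix 1: convert subvector position to row position
   return(c(k,x[k]))
}

# mind(): minimum value of d[i,j], i != j, and the row/col attaining it,
# for a square symmetric matrix d.
mind <- function(d) {
   n <- nrow(d)
   dd <- cbind(d,1:n)          # add a column to identify row number for apply()
   wmins <- apply(dd[-n,],1,imin)
   # wmins is 2 x (n-1): row 1 holds indices, row 2 holds values
   i <- which.min(wmins[2,])   # fix 2: search the values, not the indices
   j <- wmins[1,i]
   return(c(d[i,j],i,j))
}

m <- rbind(c(0,12,5),c(12,0,8),c(5,8,0))
mind(m)
```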