Programmers often find that they spend more time debugging a program than actually writing it. Good debugging skills are invaluable. In this chapter, we’ll discuss debugging in R.
Beware of bugs in the above code; I have only proved it correct, not tried it.
--Donald Knuth, pioneer of computer science
Though debugging is an art rather than a science, it involves some fundamental principles. Here, we’ll look at some debugging best practices.
As Pete Salzman and I said in our book on debugging, The Art of Debugging, with GDB, DDD, and Eclipse (No Starch Press, 2008), the principle of confirmation is the essence of debugging.
Fixing a buggy program is a process of confirming, one by one, that the many things you believe to be true about the code actually are true. When you find that one of your assumptions is not true, you have found a clue to the location (if not the exact nature) of a bug.
Another way of saying this is, “Surprises are good!” For example, say you have the following code:
x <- y^2 + 3*g(z,2)
w <- 28
if (w+q > 0) u <- 1 else v <- 10
Do you think the value of your variable x should be 3 after x is assigned? Confirm it! Do you think the else will be executed, not the if on that third line? Confirm it!
Eventually, one of these assertions that you are so sure of will turn out to be false. Then you will have pinpointed the likely location of the error, which lets you focus on its nature.
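As a minimal sketch of this confirmation process, the snippet below prints each quantity right after it is computed. The definition of g() and the values of y, z, and q are hypothetical stand-ins chosen for illustration; the text does not specify them.

```r
# Hypothetical stand-ins: g(), y, z, and q are not defined in the text.
g <- function(a, b) a - b
y <- 0
z <- 3
q <- -30

x <- y^2 + 3*g(z,2)
print(x)            # confirm: is x really 3?

w <- 28
print(w + q > 0)    # confirm: which branch is about to run?
if (w+q > 0) u <- 1 else v <- 10
print(exists("v"))  # confirm: did the else branch assign v?
```

If any printed value differs from what you believed, that line is where to start looking.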
At least at the beginning of the debugging process, stick to small, simple test cases. Working with large data objects may make it harder to think about the problem.
Of course, you should eventually test your code on large, complicated cases, but start small.
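A short sketch of starting small, assuming a hypothetical function maxgap() that is meant to return the largest gap between consecutive elements of a vector. The function name and the test data are illustrative only.

```r
# Hypothetical function under test: largest gap between consecutive elements.
maxgap <- function(v) max(diff(v))

# Start small: the answer can be checked by hand (the gaps are 3, 2, and 7).
small <- c(1, 4, 6, 13)
print(maxgap(small))

# Only after the small case behaves as expected, move on to larger input.
set.seed(1)                    # make the large case reproducible
big <- cumsum(runif(10000))
print(maxgap(big))
```

With the four-element vector, a wrong answer is obvious at a glance; with the 10,000-element vector, it would not be.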
Most good software developers agree that code should be written in a modular manner. Your first-level code should not be longer than, say, a dozen lines, with much of it consisting of function calls. And those functions should not be too lengthy and should call other functions if necessary. This makes the code easier to organize during the writing stage and easier for others to understand when it comes time for the code to be extended.
You should debug in a top-down manner, too. Suppose that you have set the debug status of your function f() (that is, you have called debug(f), to be explained shortly) and f() contains this line:
y <- g(x,8)
You should take an “innocent until proven guilty” approach to g(). Do not call debug(g) yet. Execute that line and see if g() returns the value you expect. If it does, then you’ve just avoided the time-consuming process of single-stepping through g(). If g() returns the wrong value, then now is the time to call debug(g).
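The top-down approach might look roughly like the following. The bodies of f() and g() are hypothetical stand-ins, and the interactive debug() steps appear as comments because they only make sense when typed at the R prompt.

```r
# Hypothetical stand-ins for the f() and g() discussed above.
g <- function(a, b) a * b
f <- function(x) {
  y <- g(x,8)
  y + 1
}

# Treat g() as innocent at first: just check the value it returns.
y <- g(2,8)
print(y)   # compare this against the value you expect

# In an interactive session, you would step through f() first:
#   debug(f)
#   f(2)            # step with n; the call to g(x,8) runs as a single step
# Only if g() returned a wrong value would you go one level deeper:
#   debug(g)
# When finished:
#   undebug(f); undebug(g)
```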
You may adopt some “antibugging” strategies as well. Suppose you have a section of code in which a variable x should be positive. You could insert this line:
stopifnot(x > 0)
If there is a bug earlier in the code that renders x equal to, say, −12, the call to stopifnot() will bring things to a halt right there, with an error message like this:
Error: x > 0 is not TRUE
(C programmers may notice the similarity to C’s assert statement.)
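To see the check fire without halting a script, the sketch below wraps stopifnot() in tryCatch() and captures the message. The wrapper is only for demonstration; in real code you would normally let stopifnot() stop execution.

```r
x <- -12   # simulate an earlier bug that made x negative

# tryCatch() is used here only so the script can continue and show the message.
msg <- tryCatch({
  stopifnot(x > 0)
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)   # the text of the error raised by stopifnot()
```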
After fixing a bug and testing the new code, you might want to keep that test code handy so you can check later that the bug did not somehow reappear.
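One way to keep such a check around is a small test function that reruns the input that exposed the bug. The function range_width() and its past bug (failing on vectors that contain NA) are hypothetical, invented for this sketch.

```r
# Hypothetical function: suppose an earlier version failed on vectors with NA.
range_width <- function(v) max(v, na.rm = TRUE) - min(v, na.rm = TRUE)

# Hypothetical regression test: rerun the input that exposed the bug.
test_range_width <- function() {
  stopifnot(range_width(c(2, NA, 9)) == 7)   # the case that used to fail
  stopifnot(range_width(c(5, 5, 5)) == 0)    # a simple sanity case
  invisible(TRUE)
}

test_range_width()   # silent if all checks pass; stops with an error otherwise
```

Rerunning test_range_width() after later changes gives a quick signal if the old bug has returned.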