Other Ways to Speed Up R

Sometimes you can cheat a little bit: you can make R run faster without tuning your code. This section shows two ways to do that.

The R Byte Code Compiler

Normally, R is an interpreted language.

But beginning in R 2.13.0, R has included a byte code compiler to speed up computations. As an example, let’s consider the vector of squares function that we used above:

> naive.vector.of.squares
function(n) {
  v <- 1:n
  for (i in 1:n)
    v[i] <- v[i]^2
}
> system.time(naive.vector.of.squares(1000000))
   user  system elapsed
  3.025   0.016   3.036

Now we’ll use the cmpfun function to create a compiled version of this function and then test its performance.

> library(compiler)
> compiled.naive.vector.of.squares <- cmpfun(naive.vector.of.squares)
> system.time(compiled.naive.vector.of.squares(1000000))
   user  system elapsed
  0.637   0.005   0.636

As you can see, the compiled version of this function runs much faster. Of course, it still runs more slowly than the vector operation:

> system.time(better.vector.of.squares(1000000))
   user  system elapsed
  0.008   0.000   0.008

And compiling the vector operation does not make a huge difference:

> better.vector.of.squares.compiled <- cmpfun(better.vector.of.squares)
> system.time(better.vector.of.squares.compiled(1000000))
   user  system elapsed
  0.007   0.000   0.007

But that doesn’t mean you shouldn’t try the compiler for your problem. It’s one of the simplest tricks for speeding up your code. (It’s even easier than ordering a new, faster server. And it’s cheaper.)
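If you want to repeat this comparison on your own machine, a minimal sketch follows. The names loop.squares and vector.squares are hypothetical stand-ins for the functions used above (the loop version here also returns its result so that the outputs can be compared). On recent R releases (3.4.0 and later), just-in-time compilation is enabled by default, so the gap between the plain and compiled loop may be much smaller than in the timings shown above.

```r
library(compiler)

# Hypothetical stand-ins for the loop-based and vectorized functions.
loop.squares <- function(n) {
  v <- 1:n
  for (i in 1:n)
    v[i] <- v[i]^2
  v
}
vector.squares <- function(n) (1:n)^2

# Byte-compile the loop version explicitly.
compiled.loop.squares <- cmpfun(loop.squares)

# All three versions should produce the same values.
n <- 100000
stopifnot(isTRUE(all.equal(loop.squares(n), vector.squares(n))),
          isTRUE(all.equal(compiled.loop.squares(n), vector.squares(n))))

# Timings depend on the machine and the R version, so none are quoted here.
print(system.time(loop.squares(n)))
print(system.time(compiled.loop.squares(n)))
print(system.time(vector.squares(n)))
```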

Manual compilation

Here’s a description of the compiler functions. To compile an R expression, use the compile function:

compile(e, env = .GlobalEnv, options = NULL)

If you have assigned a function to a variable, you can use the cmpfun function as a shorthand:

cmpfun(f, options = NULL)

If you have a large amount of code to compile, you can store it in a file and use cmpfile to compile everything at once:

cmpfile(infile, outfile, ascii = FALSE, env = .GlobalEnv,
        verbose = FALSE, options = NULL)
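As a rough illustration of compile and cmpfile, the sketch below compiles a single expression and then a small temporary source file. The file contents and the name square.all are made up for this example. Loading the compiled file uses loadcmp, a function in the compiler package that is not described above.

```r
library(compiler)

# compile() turns an expression into a byte code object; eval() runs it.
byte.code <- compile(quote(sum((1:10)^2)))
result <- eval(byte.code)
print(result)

# cmpfile() compiles a whole source file into a byte code file.
source.file <- tempfile(fileext = ".R")
compiled.file <- tempfile(fileext = ".Rc")
writeLines("square.all <- function(x) x^2", source.file)
cmpfile(source.file, compiled.file)

# loadcmp() evaluates the compiled file; a separate environment is used
# here so the example does not touch the global workspace.
sandbox <- new.env()
loadcmp(compiled.file, envir = sandbox)
print(sandbox$square.all(1:3))
```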

Each of these functions allows you to specify a list of options:

optimize: The level of optimization; the default is 2.
suppressAll: Disables printing messages; the default is FALSE.
suppressUndefined: Suppresses messages about undefined variables if set to TRUE. If set to a vector of character values, suppresses messages about the variables named in that vector. The default is c(".Generic", ".Method", ".Random.seed", ".self").

You can also set these options globally with the setCompilerOptions function, or find their current values with the getCompilerOption function. The level argument used for just-in-time compilation (described later in this section) is an integer between 0 and 3 that describes how much compilation you would like:

0: Disables compilation
1: Compiles closures before first use
2: Compiles closures before first use, and closures before they are duplicated
3: Compiles closures before first use, closures before they are duplicated, and loops before they are executed
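Returning to the compiler options, here is a minimal sketch of both ways to supply them: per call, through the options argument, and globally, through setCompilerOptions. The function slow.sum is hypothetical and exists only to give the compiler something to work on.

```r
library(compiler)

# Hypothetical example function (not from the text above).
slow.sum <- function(n) {
  total <- 0
  for (i in 1:n)
    total <- total + i
  total
}

# Per-call options: a named list passed through the options argument.
fast.sum <- cmpfun(slow.sum, options = list(optimize = 3, suppressAll = TRUE))
print(fast.sum(10))

# Global defaults: read the current value, change it, then restore it.
old.level <- getCompilerOption("optimize", options = NULL)
setCompilerOptions(optimize = 3)
new.level <- getCompilerOption("optimize", options = NULL)
setCompilerOptions(optimize = old.level)
```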

Inspecting byte code

Printing a compiled function will show the original R code and a reference to the byte code:

> compiled.naive.vector.of.squares
function(n) {
  v <- 1:n
  for (i in 1:n)
    v[i] <- v[i]^2
}
<bytecode: 0x117f7db90>

To see the byte code, you can use the disassemble function:

> disassemble(compiled.naive.vector.of.squares)
list(.Code, list(7L, GETBUILTIN.OP, 1L, PUSHCONSTARG.OP, 2L,
    GETVAR.OP, 3L, PUSHARG.OP, CALLBUILTIN.OP, 4L, SETVAR.OP,
    5L, POP.OP, GETBUILTIN.OP, 1L, PUSHCONSTARG.OP, 2L, GETVAR.OP,
    3L, PUSHARG.OP, CALLBUILTIN.OP, 4L, STARTFOR.OP, 7L, 6L,
    51L, GETVAR.OP, 5L, STARTSUBSET.OP, 8L, 35L, GETVAR_MISSOK.OP,
    6L, PUSHARG.OP, DFLTSUBSET.OP, LDCONST.OP, 9L, EXPT.OP, 10L,
    STARTASSIGN.OP, 5L, STARTSUBASSIGN.OP, 11L, 48L, GETVAR_MISSOK.OP,
    6L, PUSHARG.OP, DFLTSUBASSIGN.OP, ENDASSIGN.OP, 5L, POP.OP,
    STEPFOR.OP, 26L, ENDFOR.OP, INVISIBLE.OP, RETURN.OP), list(
    {
        v <- 1:n
        for (i in 1:n) v[i] <- v[i]^2
    }, `:`, 1, n, 1:n, v, i, for (i in 1:n) v[i] <- v[i]^2, v[i],
    2, v[i]^2, `[<-`(`*tmp*`, i, value = v[i]^2)))

Just-in-time compilation

If you want to compile all of your R code as you are using it, you can enable just-in-time compilation with the compiler package. To do this, execute the function enableJIT:

enableJIT(level)

The argument level is an integer between 0 and 3 that is described above. You can also set the environment variable R_ENABLE_JIT to your desired compilation level (1, 2, or 3) to enable the JIT for everything you do in R.

However, before you set the default to level 3 for all computation, you should remember two things. First, it takes time to compile code. For very simple operations on small data sets, it might take more time to compile your code than to execute it. Second, the compiler is still experimental. It’s possible that some code might execute differently after compilation, resulting in subtle and difficult-to-understand bugs. So use this feature carefully.
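One cautious way to experiment is to turn the JIT on for a limited stretch of code and then restore the earlier setting. The sketch below relies on enableJIT returning the level that was active before the call; the loop in the middle is only a placeholder workload.

```r
library(compiler)

# enableJIT() returns the previously active level, so save it for later.
previous.level <- enableJIT(3)   # compile closures and loops

# Placeholder workload: any loop-heavy code could go here.
squares <- numeric(1000)
for (i in seq_along(squares))
  squares[i] <- i^2

# Restore whatever level was in effect before the experiment.
enableJIT(previous.level)

# If R_ENABLE_JIT was set when R started, it is visible from within R
# (NA means the variable is not set).
print(Sys.getenv("R_ENABLE_JIT", unset = NA))
```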

High-Performance R Binaries

On some platforms (like Mac OS X), R is compiled with high-quality math libraries. However, the default libraries on other platforms (like Windows) can be sluggish. If you’re working with large data sets or complicated mathematical operations, you might find it worthwhile to build an optimized version of R with better math libraries.
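To get a feel for whether the math libraries matter for your workload, you can time a few dense linear algebra operations, which R hands off to its BLAS and LAPACK libraries. This is only a rough sketch: the matrix size is an arbitrary assumption, and the timings are meaningful only when compared across builds on the same machine.

```r
set.seed(1)
n <- 500                        # assumed size; increase it for a tougher test
m <- matrix(rnorm(n * n), n, n)

# Matrix multiplication and inversion are delegated to BLAS/LAPACK.
timing.multiply <- system.time(p <- m %*% m)
timing.solve <- system.time(s <- solve(m))

print(timing.multiply)
print(timing.solve)
```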

Revolution R

Revolution Computing is a software company that makes a high-performance version of R. It offers both free and commercial versions, including a 64-bit build of R for Windows. For the latest version, check out its website: http://www.revolution-computing.com/.

Revolution R looks a lot like the standard R binaries, although it is a little outdated: at the time I was writing this book, Revolution was shipping Revolution R 1.3.0, which included R 2.7.2, while the current version from CRAN was 2.10.0. The key difference is the addition of improved math libraries. These are multithreaded and can take advantage of multiple cores when available. Two helper functions included with Revolution R let you set and check the number of threads in use. To check the number of threads, use:

getMKLthreads()

Revolution R guesses the number of threads to use, but you can change the number yourself if it guesses wrong (or if you want to experiment). To set the number of threads explicitly, use:

setMKLthreads(n)

The help file suggests not setting the number of threads higher than the number of available cores.
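A small sketch of that advice follows. The helper cap.threads is hypothetical, and detectCores comes from the standard parallel package rather than from Revolution R. Because getMKLthreads and setMKLthreads exist only in Revolution R, the calls are guarded so that the code also runs (and simply does nothing) in a standard R build.

```r
library(parallel)   # provides detectCores()

# Hypothetical helper: cap a requested thread count at the core count.
cap.threads <- function(requested, cores = detectCores()) {
  if (is.na(cores)) cores <- 1L          # detectCores() can return NA
  max(1L, min(as.integer(requested), as.integer(cores)))
}

# Only call the MKL helpers when they are actually available.
if (exists("setMKLthreads") && exists("getMKLthreads")) {
  setMKLthreads(cap.threads(8))
  print(getMKLthreads())
}
```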

Building your own

Building your own R can be useful if you want to compile it to run more efficiently. For example, you can compile a 64-bit version of R if you want to work with data sets that require much more than 4 GB of memory. This section explains how to build R yourself.
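Before you rebuild, it can help to check what the R you are already running was built for. The following sketch reads the pointer size from .Machine: 8 bytes indicates a 64-bit build and 4 bytes a 32-bit build.

```r
# Pointer size in bytes: 8 on a 64-bit build, 4 on a 32-bit build.
pointer.bytes <- .Machine$sizeof.pointer
build.bits <- 8 * pointer.bytes

cat(R.version.string, "\n")
cat("This is a", build.bits, "bit build for", R.version$platform, "\n")
```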

Building on Microsoft Windows

The easiest way to build your own R binaries on Microsoft Windows is to use the Rtools software. The R compilation process is very sensitive to the tools that you use, so Rtools bundles together a set of tools that are known to work correctly with R. Even if you plan to use your own compiler, math libraries, or other components, you should probably start with the standard toolkit and incrementally modify it. That will help you isolate problems in the build process.

Here is how to successfully build your own R binaries (and installer!) on Microsoft Windows:

1. Download the R source code from http://cran.r-project.org/src/base/.
2. Download the “Rtools” software from http://www.murdoch-sutherland.com/Rtools/.
3. Run the Rtools installer application and follow the directions to install Rtools. You can accept most of the default options, but I do not suggest installing all components at this stage. As shown in Figure 24-1, you should select everything except “Extras to build R.” (The “Extras to build R” component needs to be installed in the source code directory to be useful, but the source code is not unpacked until steps 4 and 5. Unfortunately, you need other tools from Rtools in order to execute steps 4 and 5, so we can’t change the order of the steps to avoid running the installer twice.) We’ll install that component later, so don’t throw out the installer yet. Also, if you use Cygwin, be sure to read the notes about conflicts with Cygwin DLLs (dynamic-link libraries). Finally, be sure to select the option allowing Rtools to modify your PATH variable (or make sure to change it yourself).
4. Move the source code file to a build directory, open a command-line window (possibly with cmd), and change to the build directory. (Be sure to open the command shell after installing Rtools and modifying your PATH. This will guarantee that the commands in the next few steps are available.)
5. Run the following command to unpack the source code into the directory R-2.9.2:
```
$ tar xvfz R-2.9.2.tar.gz
```
(Note that I used R-2.9.2.tar.gz. Change the command as needed for the R version you are installing.)
6. Rerun the Rtools setup program. This time, select only the “Extras to build R” component, and no other components. Install the components into the source code directory that you just unpacked. (For example, if you unpacked the sources in C:\stuff\things, then select C:\stuff\things\R-2.9.2.)

Figure 24-1. Selecting components in Rtools

7. At this point, you may install several additional pieces of software:
   1. (Optional) If you want to build Microsoft HTML help files, then download and install the Microsoft HTML Help Workshop from http://www.microsoft.com/downloads/details.aspx?FamilyID=00535334-c8a6-452f-9aa0-d597d16580cc. Make sure the location where it is installed (for example, C:\Program Files\HTML Help Workshop) is included in the PATH.
   2. (Optional) If you want to build your own R installer, then download and install Inno Setup from http://www.jrsoftware.org/isinfo.php. After you have done this, edit the file src\gnuwin32\MkRules in the R-2.9.2 directory. Change ISDIR to the location where Inno Setup was installed. (By default, this location is C:\Program Files\Inno Setup 5.)
   3. (Optional) Download and install LaTeX if you want to build PDF versions of the help files. A suitable version is MiKTeX, from http://www.miktex.org/.
8. Return to the command window and change directories to the src\gnuwin32 directory in the R sources (for example, C:\stuff\things\R-2.9.2\src\gnuwin32). Run the following command to build R:
```
$ make all recommended
```
9. To check that the build was successful, you can run the command:
```
$ make check
```
Or for more comprehensive checks:
```
$ make check-all
```
I found that the checks failed due to a silly error. (The checks included testing examples in libraries, so the test application tried to open a network connection to http://foo.bar, a hostname that could not be resolved.) Use your own discretion about whether the tests were successful or not.
10. If everything worked correctly, you can now try your own build of R. The executables will be located in the R-2.9.2\bin directory. The full GUI version is named Rgui.exe; the command-line version is R.exe.
11. If you would like to build your own installer, then execute the following command in the src\gnuwin32 directory:
```
$ make distribution
```
(I got some errors late in the install process. The standard makefiles try to delete content when they’re done. If you don’t make it past building rinstaller, manually run make cran.) To check if the process worked, look for the installer in the gnuwin32\cran directory.

For more information about how to build R on Microsoft Windows platforms, see the directions in the R Installation and Administration Manual. (You can read the manual online at http://cran.r-project.org/doc/manuals/R-admin.html, or you can download a PDF from http://cran.r-project.org/doc/manuals/R-admin.pdf.)

Building R on Unix-like systems

Unix-like systems are by far the easiest systems on which to build R. Here is how to do it:

1. Install the standard development tools: gcc, make, perl, binutils, and LaTeX. (If you are unsure whether you have all the tools and are using a standard Linux distribution such as Fedora, you probably already have all the components you need. Unfortunately, it’s outside the scope of this book to explain how to find and install missing components. Try using the precompiled binaries, or find a good book on Unix system administration.)
2. Download the R source code from http://cran.r-project.org/src/base/.
3. Run the following command to unpack the source code into the directory R-2.10.0:
```
$ tar xvfz R-2.10.0.tar.gz
```
(Note that I used R-2.10.0.tar.gz. Change the command as needed for the R version you are installing.)
4. Change to the R-2.10.0 directory. Run the following commands to build R:
```
$ ./configure
$ make
```
5. To check that the build was successful, you can run the command:
```
$ make check
```
Or for more comprehensive checks:
```
$ make check-all
```
6. Finally, if everything is OK, run the following command to install R:
```
$ make install
```

These directions will work on Mac OS X if you want to build a command-line version of R or a version of R that works through the X Window System. They will not build the full Mac OS X GUI.

Building R on Mac OS X

Building R on Mac OS X is a little trickier than building it on Windows or Linux systems because you have to fetch more individual pieces. For directions on how to compile R on Mac OS X, see http://cran.r-project.org/doc/manuals/R-admin.html. You may also want to read the FAQ file at http://cran.cnr.Berkeley.edu/bin/macosx/RMacOSX-FAQ.html, which gives some hints on how to build.