Other Ways to Speed Up R

Sometimes you can cheat a little bit: you can make R run faster without tuning your code. This section shows two ways to do that.

Normally, R is an interpreted language, but since version 2.13.0 it has included a byte code compiler to speed up computations. As an example, let’s consider the vector of squares function that we used above:

> naive.vector.of.squares
function(n) {
  v <- 1:n
  for (i in 1:n)
    v[i] <- v[i]^2
}
> system.time(naive.vector.of.squares(1000000))
   user  system elapsed
  3.025   0.016   3.036

Now we’ll use the cmpfun function to create a compiled version of this function and then test its performance.

> library(compiler)
> compiled.naive.vector.of.squares <- cmpfun(naive.vector.of.squares)
> system.time(compiled.naive.vector.of.squares(1000000))
   user  system elapsed
  0.637   0.005   0.636

As you can see, the compiled version of this function runs much faster. Of course, it still runs more slowly than the vector operation:

> system.time(better.vector.of.squares(1000000))
   user  system elapsed
  0.008   0.000   0.008

And compiling the vector operation does not make a huge difference:

> better.vector.of.squares.compiled <- cmpfun(better.vector.of.squares)
> system.time(better.vector.of.squares.compiled(1000000))
   user  system elapsed
  0.007   0.000   0.007

But that doesn’t mean you shouldn’t try the compiler for your problem. It’s one of the simplest tricks for speeding up your code. (It’s even easier than ordering a new, faster server. And it’s cheaper.)
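Before relying on a compiled function, it is worth confirming that it returns the same values as the original. The sketch below is one way to do that. The names loop.squares and vector.squares are stand-ins invented for this example (unlike the listing above, loop.squares returns v so that the results can be compared). Timings are omitted because they depend heavily on your machine and R version. In particular, R 3.4.0 and later compile functions automatically by default, so the gap between the two loop versions may be much smaller than the one shown above.

```r
library(compiler)

# Loop-based version (hypothetical name); returns v so results can be compared
loop.squares <- function(n) {
  v <- 1:n
  for (i in 1:n)
    v[i] <- v[i]^2
  v
}

# Vectorized version (hypothetical name)
vector.squares <- function(n) (1:n)^2

# Byte-compile the loop version
compiled.loop.squares <- cmpfun(loop.squares)

# Compiling should change the speed, not the answer
stopifnot(identical(loop.squares(1000), compiled.loop.squares(1000)))
stopifnot(isTRUE(all.equal(loop.squares(1000), vector.squares(1000))))

# Compare timings on your own machine
print(system.time(loop.squares(1e6)))
print(system.time(compiled.loop.squares(1e6)))
print(system.time(vector.squares(1e6)))
```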

On some platforms (like Mac OS X), R is compiled with high-quality math libraries. However, the default libraries on other platforms (like Windows) can be sluggish. If you’re working with large data sets or complicated mathematical operations, you might find it worthwhile to build an optimized version of R with better math libraries.
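One rough way to judge whether the math libraries are a bottleneck on your system is to time a few dense linear algebra operations, since those are the operations that the math libraries handle. The following is only a sketch: the matrix size is an arbitrary choice, and the variable names are made up for this example. Run it on the builds you want to compare and look at the elapsed times.

```r
# Rough benchmark of dense linear algebra, where the math libraries
# (BLAS and LAPACK) do most of the work.
set.seed(42)                       # make the random matrix reproducible
n <- 500                           # matrix dimension (an arbitrary choice)
a <- matrix(rnorm(n * n), nrow = n, ncol = n)

# Matrix multiplication is delegated to the BLAS library
mult.time <- system.time(b <- a %*% a)

# Solving a linear system is delegated to LAPACK
solve.time <- system.time(x <- solve(a, rnorm(n)))

print(mult.time)
print(solve.time)
```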

Revolution Computing is a software company that makes a high-performance version of R. It offers both free and commercial versions, including a 64-bit build of R for Windows. For the latest version, check out its website: http://www.revolution-computing.com/.

Revolution R looks a lot like the standard R binaries (although a little outdated; at the time I was writing this book, Revolution was shipping Revolution R 1.3.0, which included R 2.7.2, while the current version from CRAN was 2.10.0). The key difference is the addition of improved math libraries. These are multithreaded and can take advantage of multiple cores when available. There are two helper functions included with Revolution R that can help you set and check the number of cores in use. To check the number of cores, use:

getMKLthreads()

Revolution R guesses the number of threads to use, but you can change the number yourself if it guesses wrong (or if you want to experiment). To set the number of cores explicitly, use:

setMKLthreads(n)

The help file suggests not setting the number of threads higher than the number of available cores.
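As an illustration, the sketch below follows that advice by capping the requested thread count at the number of cores. It rests on a few assumptions: cap.threads is a hypothetical helper written for this example, the request for four threads is arbitrary, and detectCores comes from the parallel package, which ships with R 2.14.0 and later. Because getMKLthreads and setMKLthreads exist only in Revolution R, the code checks for them first and does nothing harmful elsewhere.

```r
# Hypothetical helper: never ask for more threads than there are cores.
# If the core count is unknown (NA), leave the request unchanged.
cap.threads <- function(requested, cores) {
  if (is.na(cores)) requested else min(requested, cores)
}

# detectCores() may return NA on platforms where it cannot tell
cores <- parallel::detectCores()

# The MKL helpers exist only in Revolution R, so check before calling them
if (exists("getMKLthreads") && exists("setMKLthreads")) {
  cat("Threads before:", getMKLthreads(), "\n")
  setMKLthreads(cap.threads(4, cores))
  cat("Threads after: ", getMKLthreads(), "\n")
} else {
  cat("MKL thread helpers not found; this does not look like Revolution R.\n")
}
```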

Building your own R can be useful if you want to compile it to run more efficiently. For example, you can compile a 64-bit version of R if you want to work with data sets that require much more than 4 GB of memory. This section explains how to build R yourself.
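Before going to the trouble of building a 64-bit version, you may want to check what the R binary you already have was built for. A minimal sketch, using only values that base R reports about itself (the variable names are invented for this example):

```r
# Pointer size is 8 bytes on a 64-bit build of R and 4 bytes on a 32-bit build
pointer.bytes <- .Machine$sizeof.pointer
is.64bit <- pointer.bytes == 8

cat("Architecture:", R.version$arch, "\n")
cat("64-bit build:", is.64bit, "\n")
```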

The easiest way to build your own R binaries on Microsoft Windows is to use the Rtools software. The R compilation process is very sensitive to the tools that you use, so the Rtools software bundles together a set of tools that are known to work correctly with R. Even if you plan to use your own compiler, math libraries, or other components, you should probably start with the standard toolkit and incrementally modify it. That will help you isolate problems in the build process.

Here is how to successfully build your own R binaries (and installer!) on Microsoft Windows:

  1. Download the R source code from http://cran.r-project.org/src/base/.

  2. Download the “Rtools” software from http://www.murdoch-sutherland.com/Rtools/.

  3. Run the Rtools installer application and follow the directions to install Rtools. You can select most default options, but I do not suggest installing all components at this stage. (The “Extras to build R” component needs to be installed in the source code directory to be useful. However, we don’t create that directory until steps 4 and 5. Unfortunately, you need other tools from the Rtools software in order to execute steps 4 and 5, so we can’t change the order of the steps to avoid running the installer twice.) As shown in Figure 24-1, you should select everything except “Extras to build R.” We’ll install that component later, so don’t throw out the tools installer yet. Also, if you use Cygwin, be sure to read the notes about conflicts with Cygwin DLLs (dynamic-link libraries). Be sure to select the option allowing Rtools to modify your PATH variable (or make sure to change it yourself).

  4. Move the source code file to a build directory, open a command-line window (possibly with cmd), and change to the build directory. (Be sure to open the command shell after installing the Rtools and modifying your PATH. This will guarantee that the commands in the next few steps are available.)

  5. Run the following command to unpack the source code into the directory R-2.9.2:

    $ tar xvfz R-2.9.2.tar.gz

    (Note that I used R-2.9.2.tar.gz. Change the command as needed for the R version you are installing.)

  6. Rerun the Rtools setup program. This time, select only the “Extras to build R” component, and no other components. Install the components into the source code directory that you just unpacked. (For example, if your build directory is C:\stuff\things, then select C:\stuff\things\R-2.9.2.)

  7. At this point, you may install several additional pieces of software:

    1. (Optional) If you want to build Microsoft HTML help files, then download and install the Microsoft HTML Help Workshop from http://www.microsoft.com/downloads/details.aspx?FamilyID=00535334-c8a6-452f-9aa0-d597d16580cc. Make sure the location where it is installed (for example, C:\Program Files\HTML Help Workshop) is included in the PATH.

    2. (Optional) If you want to build your own R installer, then download and install Inno Setup from http://www.jrsoftware.org/isinfo.php. After you have done this, edit the file src\gnuwin32\MkRules in the R-2.9.2 directory. Change ISDIR to the location where Inno Setup was installed. (By default, this location is C:\Program Files\Inno Setup 5.)

    3. (Optional) Download and install LaTeX if you want to build PDF versions of the help files. A suitable version is MiKTeX, from http://www.miktex.org/.

  8. Return to the command window and change directories to the src\gnuwin32 directory in the R sources (for example, C:\stuff\things\R-2.9.2\src\gnuwin32). Run the following command to build R:

    $ make all recommended

  9. To check that the build was successful, you can run the command:

    $ make check

    Or for more comprehensive checks:

    $ make check-all

    I found that the checks failed due to a silly error. (The checks included testing examples in libraries, so the test application tried to open a network connection to http://foo.bar, a hostname that could not be resolved.) Use your own discretion about whether the tests were successful or not.

  10. If everything worked correctly, you can now try your own build of R. The executables will be located in the R-2.9.2\bin directory. The full GUI version is named Rgui.exe; the command-line version is R.exe.

  11. If you would like to build your own installer, then execute the following command in the src\gnuwin32 directory:

    $ make distribution

    (I got some errors late in the install process. The standard makefiles try to delete content when they’re done. If you don’t make it past building rinstaller, manually run make cran.) To check if the process worked, look for the installer in the gnuwin32\cran directory.
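Many build failures in the steps above come down to a tool that is not on the PATH. If you already have a working copy of R installed (for example, the standard binaries from CRAN), you can check for the tools from within R before you start. The sketch below is only an illustration: the list of tool names is an assumption based on the commands used in the steps (make and tar) plus the gcc compiler, and the variable names are invented for this example. Run it in an R session started after you modified your PATH.

```r
# Command-line tools used by the build steps (gcc is an assumed addition)
tools <- c("make", "tar", "gcc")

# Sys.which() returns the full path of each tool, or "" if it is not on the PATH
found <- Sys.which(tools)
missing.tools <- names(found)[found == ""]

if (length(missing.tools) == 0) {
  cat("All build tools were found on the PATH.\n")
} else {
  cat("Not found on the PATH:", paste(missing.tools, collapse = ", "), "\n")
}
```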

For more information about how to build R on Microsoft Windows platforms, see the directions in the R Installation and Administration Manual. (You can read the manual online at http://cran.r-project.org/doc/manuals/R-admin.html, or you can download a PDF from http://cran.r-project.org/doc/manuals/R-admin.pdf.)