In computer science curricula, a common theme is the trade-off between time and space. In order to have a fast-running program, you may need to use more memory space. On the other hand, in order to conserve memory space, you might need to settle for slower code. In the R language, this trade-off is of particular interest for the following reasons:
R is an interpreted language. Many of the commands are written in C and thus do run in fast machine code. But other commands, and your own R code, are pure R and thus interpreted. So, there is a risk that your R application may run more slowly than you would like.
All objects in an R session are stored in memory. More precisely, all objects are stored in R’s memory address space. R places a limit of 2^31 - 1 bytes on the size of any object, even on 64-bit machines and even if you have a lot of RAM. Yet some applications do encounter larger objects.
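As a small illustration of the figures involved, the sketch below checks that 2^31 - 1 is the largest value of R's 32-bit integer type (available as .Machine$integer.max) and uses object.size() to estimate how much memory an object occupies. The vector x is a hypothetical example, not taken from any application discussed here.

```r
# Largest value of R's integer type; this equals 2^31 - 1.
max_int <- .Machine$integer.max
print(max_int)
print(max_int == 2^31 - 1)

# Hypothetical example object: one million double-precision numbers.
x <- numeric(1e6)

# Estimate of the memory used by x (each double takes 8 bytes,
# plus a small fixed overhead for the vector header).
print(object.size(x), units = "MB")
```

Running object.size() on your own large objects is a quick way to see which ones dominate memory use in a session.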
This chapter will suggest ways that you can enhance the performance of your R code, taking into account the time/space trade-off.
What can be done to make R code faster? Here are the main tools available to you:
Optimize your R code through vectorization, use of byte-code compilation, and other approaches.
Write the key, CPU-intensive parts of your code in a compiled language such as C/C++.
Write your code in some form of parallel R.
The first approach is covered in this chapter; the other two are covered in Chapter 15 and Chapter 16.
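To give a first taste of the first approach, here is a minimal sketch comparing an explicit loop, a vectorized equivalent, and a byte-compiled version of the loop. The function names sum_sq_loop, sum_sq_vec, and sum_sq_cmp are hypothetical, chosen only for this illustration; cmpfun() comes from the compiler package that ships with R. Timings vary by machine, so none are shown.

```r
library(compiler)  # part of the standard R distribution; provides cmpfun()

# Hypothetical example: sum of squares computed with an explicit R loop.
sum_sq_loop <- function(x) {
  total <- 0
  for (xi in x) {
    total <- total + xi^2
  }
  total
}

# Vectorized equivalent: the squaring and the summation run in compiled C code.
sum_sq_vec <- function(x) sum(x^2)

# Byte-compiled version of the loop function.
sum_sq_cmp <- cmpfun(sum_sq_loop)

x <- runif(1e6)

# Compare elapsed times for the three versions.
print(system.time(sum_sq_loop(x)))
print(system.time(sum_sq_vec(x)))
print(system.time(sum_sq_cmp(x)))
```

In recent versions of R, the just-in-time compiler typically byte-compiles functions automatically, so the first and third timings may differ only slightly; the vectorized version is usually the fastest of the three.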
To optimize your R code, you need to understand R’s functional programming nature and the way R uses memory.