Using R from Python

Python is an elegant and powerful language, but it lacks built-in facilities for statistical analysis and data manipulation, two areas in which R excels. This section demonstrates how to call R from Python using RPy, one of the most popular interfaces between the two languages.

RPy is a Python module that allows access to R from Python. For extra efficiency, it can be used in conjunction with NumPy.

You can build the module from the source, available from http://rpy.sourceforge.net, or download a prebuilt version. If you are running Ubuntu, simply type this:

sudo apt-get install python-rpy

To load RPy from Python (whether in Python interactive mode or from code), execute the following:

from rpy import *

This will load a variable r, which is a Python class instance.

Running R from Python is in principle quite simple. Here is an example of a command you might run from the >>> Python prompt:

>>> r.hist(r.rnorm(100))

This will call the R function rnorm() to produce 100 standard normal variates and then pass those values to R’s histogram function, hist().

As you can see, R names are prefixed by r., reflecting the fact that Python wrappers for R functions are members of the class instance r.

The preceding code will, if not refined, produce ugly output, with your (possibly voluminous!) data appearing as the graph title and the x-axis label. You can avoid this by supplying a title and label, as in this example:

>>> r.hist(r.rnorm(100),main='',xlab='')

RPy syntax is sometimes less simple than these examples would lead you to believe. The problem is that R and Python syntax may clash. For instance, consider a call to the R linear model function lm(). In our example, we will predict b from a.

>>> a = [5,12,13]
>>> b = [10,28,30]
>>> lmout = r.lm('v2 ~ v1',data=r.data_frame(v1=a,v2=b))

This is somewhat more complex than it would have been if done directly in R. What are the issues here?

First, Python has no formula syntax (its ~ is only a unary bitwise operator), so we needed to specify the model formula as a string. Since lm() in R can also take its formula as a string, this is not a major departure.

Second, we needed a data frame to contain our data. We created one using R’s data.frame() function. In order to form a period in an R function name, we need to use an underscore on the Python end. Thus we called r.data_frame(). Note that in this call, we named the columns of our data frame v1 and v2 and then used these in our model formula.
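
The two points above can be illustrated without RPy itself. The following is a minimal sketch in plain Python: the helper names to_r_name and make_formula are hypothetical and are not part of RPy, and the translation applies only the simplified underscore-to-period rule described in the text.

```python
def to_r_name(python_name):
    """Translate a Python-side name to an R name.

    Simplified assumption: every underscore becomes a period,
    as described in the text (data_frame -> data.frame).
    """
    return python_name.replace("_", ".")


def make_formula(response, *predictors):
    """Build an R model formula string such as 'v2 ~ v1'."""
    if not predictors:
        raise ValueError("at least one predictor is required")
    return response + " ~ " + " + ".join(predictors)


# The column names used in the data frame and in the formula must agree.
columns = {"v1": [5, 12, 13], "v2": [10, 28, 30]}
formula = make_formula("v2", "v1")

print(to_r_name("data_frame"))   # data.frame
print(formula)                   # v2 ~ v1
```

Building the formula from the same names used as data frame columns is one way to keep the two in step; whether that is worth a helper depends on how many models you fit.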

The output object is a Python dictionary (analog of R’s list type), as you can see here (in part):

>>> lmout
{'qr': {'pivot': [1, 2], 'qr': array([[ -1.73205081, -17.32050808],
       [  0.57735027,  -6.164414  ],
       [  0.57735027,   0.78355007]]), 'qraux':

You should recognize the various attributes of lm() objects here. For example, the coefficients of the fitted regression line, which would be contained in lmout$coefficients if this were done in R, are here in Python as lmout['coefficients']. So, you can access those coefficients accordingly, for example like this:

>>> lmout['coefficients']
{'v1': 2.5263157894736841, '(Intercept)': -2.5964912280701729}
>>> lmout['coefficients']['v1']
2.5263157894736841
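
As a sanity check on these numbers, the same coefficients can be reproduced in plain Python with the textbook formulas for simple linear regression. This is only a sketch for the one-predictor case, not what lm() does internally (the output above shows that lm() works through a QR decomposition); the function name simple_least_squares is hypothetical.

```python
def simple_least_squares(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = float(len(x))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-products of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


a = [5, 12, 13]
b = [10, 28, 30]
intercept, slope = simple_least_squares(a, b)
print(intercept)
print(slope)
```

The two printed values should agree, up to floating-point rounding, with the '(Intercept)' and 'v1' entries shown above.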

You can also submit R commands to work on variables in R’s namespace, using the function r(). This is convenient if there are many syntax clashes. Here is how we could run the wireframe() example in Section 12.4 in RPy:

>>> r.library('lattice')
>>> r.assign('a',a)
>>> r.assign('b',b)
>>> r('g <- expand.grid(a,b)')
>>> r('g$Var3 <- g$Var1^2 + g$Var1 * g$Var2')
>>> r('wireframe(Var3 ~ Var1+Var2,g)')
>>> r('plot(wireframe(Var3 ~ Var1+Var2,g))')

First, we used r.assign() to copy the variables a and b from Python’s namespace to R’s. We then ran expand.grid() (with a period in the name instead of an underscore, since we are running in R’s namespace), assigning the result to g. Again, the latter is in R’s namespace. Note that the call to wireframe() did not automatically display the plot, so we needed to call plot().
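
To see what data the R commands above construct, here is a rough plain-Python equivalent of the expand.grid() step and the Var3 computation. It is a sketch only: the function expand_grid is a hypothetical stand-in that returns tuples rather than a data frame, and it assumes the values of a and b from the earlier lm() example.

```python
def expand_grid(first, second):
    """All combinations of the two inputs, as (Var1, Var2) tuples.

    The first input varies fastest, which is the row order
    R's expand.grid() uses.
    """
    return [(x, y) for y in second for x in first]


a = [5, 12, 13]
b = [10, 28, 30]

# Each row is (Var1, Var2, Var3) with Var3 = Var1^2 + Var1 * Var2
grid = [(v1, v2, v1 ** 2 + v1 * v2) for v1, v2 in expand_grid(a, b)]

for row in grid:
    print(row)
```

With three values in each input, the grid has nine rows, and these are the points the wireframe plot draws.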

The official documentation for RPy is at http://rpy.sourceforge.net/rpy/doc/rpy.pdf. Also, you can find a useful presentation, “RPy—R from Python,” at http://www.daimi.au.dk/~besen/TBiB2007/lecture-notes/rpy.html.