Since many R users have very large computational needs, various tools for running R in parallel have been devised. This chapter is devoted to parallel R.
Many a novice in parallel processing has, with great anticipation, written parallel code for some application only to find that the parallel version actually ran more slowly than the serial one. For reasons to be discussed in this chapter, this problem is especially acute with R.
Accordingly, understanding the nature of parallel-processing hardware and software is crucial to success in the parallel world. These issues will be discussed here in the context of common platforms for parallel R.
We’ll start with a few code examples and then move to general performance issues.
Consider a network graph of some kind, such as web links or links in a social network. Let A be the adjacency matrix of the graph, meaning that, say, A[3,8] is 1 or 0, depending on whether there is a link from node 3 to node 8.
For any two vertices, say any two websites, we might be interested in mutual outlinks—that is, outbound links that are common to two sites. Suppose that we want to find the mean number of mutual outlinks, averaged over all pairs of websites in our data set. This mean can be found using the following outline, for an n-by-n matrix:
1 sum = 0
2 for i = 0...n-1
3    for j = i+1...n-1
4       for k = 0...n-1 sum = sum + a[i][k]*a[j][k]
5 mean = sum / (n*(n-1)/2)
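As a rough illustration, the outline above might be written in serial R as follows. This is only a sketch: the function name mean_mutual_outlinks and the small test matrix are made up for this example, and because R indexes from 1 rather than 0, the loop bounds are shifted accordingly. The innermost loop over k is replaced by a vectorized product of two rows.

```r
# Mean number of mutual outlinks, averaged over all unordered pairs of nodes.
# a: n-by-n 0/1 adjacency matrix; a[i, k] is 1 if node i links to node k.
# (Hypothetical helper name, used for illustration only.)
mean_mutual_outlinks <- function(a) {
  n <- nrow(a)
  total <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # Sum over k of a[i, k] * a[j, k]: the outlinks shared by nodes i and j.
      total <- total + sum(a[i, ] * a[j, ])
    }
  }
  total / (n * (n - 1) / 2)
}

# Made-up 3-node graph: node 1 links to 2 and 3, node 2 to 1 and 3, node 3 to 2.
a <- matrix(c(0, 1, 1,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)
mean_mutual_outlinks(a)  # 2/3: pairs (1,2) and (1,3) share one outlink each
```

Even with the inner loop vectorized, the two outer loops still visit n(n-1)/2 pairs of rows, which is what makes the problem expensive for large n.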
Given that our graph could contain thousands—even millions—of websites, our task could entail quite large amounts of computation. A common approach to dealing with this problem is to divide the computation into smaller chunks and then process each of the chunks simultaneously, say on separate computers.
Let’s say that we have two computers at our disposal. We might have one computer handle all the odd values of i in the for i loop in line 2 and have the second computer handle the even values. Or, since dual-core computers are fairly standard these days, we could take this same approach on a single computer. This may sound simple, but a number of major issues can arise, as you’ll learn in this chapter.
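One possible way to sketch this odd/even split on a single multicore machine uses R's bundled parallel package. The helper names partial_sum and mean_mutual_outlinks_par, the random test matrix, and the choice of two local worker processes are all assumptions made for this illustration, not a recommended design. Indices are 1-based here, so "odd" and "even" refer to R's row numbers rather than those of the 0-based outline.

```r
library(parallel)

# Sum of mutual-outlink counts for the values of i listed in `rows`
# (hypothetical helper; each worker runs this on its own share of i).
partial_sum <- function(rows, a) {
  n <- nrow(a)
  total <- 0
  for (i in rows) {
    if (i < n) {
      for (j in (i + 1):n) total <- total + sum(a[i, ] * a[j, ])
    }
  }
  total
}

# Split the outer loop into odd and even i, one chunk per worker,
# then combine the two partial sums into the overall mean.
mean_mutual_outlinks_par <- function(cl, a) {
  n <- nrow(a)
  idx <- seq_len(n - 1)
  chunks <- list(idx[idx %% 2 == 1], idx[idx %% 2 == 0])
  parts <- clusterApply(cl, chunks, partial_sum, a = a)
  sum(unlist(parts)) / (n * (n - 1) / 2)
}

cl <- makeCluster(2)  # two local worker processes (an assumption)
set.seed(1)
a <- matrix(sample(0:1, 100, replace = TRUE), nrow = 10)  # made-up data
diag(a) <- 0
result <- mean_mutual_outlinks_par(cl, a)
stopCluster(cl)
```

Note that the whole matrix a is sent to each worker before any computation starts. For a small matrix like this one, that transfer and the cost of starting the workers can easily outweigh the time saved, which is one way a parallel version ends up slower than the serial one.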