R package parallel

First, let's look at the simple usage of an R function called lapply() (refer to the following code):

> lapply(1:3,function(x) c(sin(x),x^2)) 
[[1]] 
[1] 0.841471 1.000000 
[[2]] 
[1] 0.9092974 4.0000000 
[[3]] 
[1] 0.14112 9.00000 

The meaning is clear: the lapply() function takes the input values 1, 2, and 3 and applies the given function to each of them, returning a list with one element per input; here each element holds sin(x) and x^2. The following example is a slightly more complex one:

myFunctions<-list(function(x) sin(x),
                  function(x) x^2+2,
                  function(x) 4*x^2-x^3-2)
inputValue<-1:10
output<-lapply(inputValue,function(x) sapply(myFunctions,function(f) f(x)))

The first couple of elements of the output can be viewed by typing output[1:2].

The following example is borrowed from Gordon (2015):

library(parallel) 
n_cores <- detectCores() - 1 
cl <- makeCluster(n_cores) 
parLapply(cl, 2:4,function(exponent) 2^exponent) 
stopCluster(cl) 

In the preceding code, the makeCluster() function sets up the cluster, and parLapply() is the parallel version of lapply(): it splits the input across the cluster's workers. The output is shown here:

[[1]] 
[1] 4 

[[2]] 
[1] 8 

[[3]] 
[1] 16 
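Since parLapply() only changes how the work is scheduled, its result should match the serial lapply() call exactly; a minimal sketch to verify this (assuming at least two cores are available):

```r
library(parallel)

# The parallel result should be identical to the serial one
cl <- makeCluster(2)
par_res <- parLapply(cl, 2:4, function(exponent) 2^exponent)
stopCluster(cl)
ser_res <- lapply(2:4, function(exponent) 2^exponent)
identical(par_res, ser_res)   # TRUE
```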

For the following code, we will see an error message:

c2<-makeCluster(n_cores) 
base <- 2 
parLapply(c2, 2:4, function(exponent) base^exponent) 
stopCluster(c2) 

The error message is as follows:

Error in checkForRemoteErrors(val) :  
  3 nodes produced errors; first error: object 'base' not found 

To correct it, the base variable must be exported to the workers with the clusterExport() function (refer to the following code).

c3<-makeCluster(n_cores) 
base <- 2 
clusterExport(c3, "base") 
parLapply(c3, 2:4, function(exponent)  base^exponent) 
stopCluster(c3) 
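Each worker created by makeCluster() starts with an empty workspace, so anything it needs beyond the input data has to be shipped over explicitly: variables with clusterExport() and expressions such as package loading with clusterEvalQ(). A minimal sketch (the variable name scale_factor is just for illustration):

```r
library(parallel)

cl <- makeCluster(2)
scale_factor <- 3
clusterExport(cl, "scale_factor")   # copy the variable to every worker
clusterEvalQ(cl, library(stats))    # evaluate an expression (here, loading a package) on every worker
res <- parLapply(cl, 1:4, function(x) scale_factor * x)
stopCluster(cl)
unlist(res)   # 3 6 9 12
```

Without the clusterExport() call, the workers would raise the same "object not found" error shown earlier.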

To save space, the output is not shown here. The following example compares the running times of the lapply() and mclapply() functions:

library(parallel) 
detectCores() 
myFunction<- function(iter=1,n=5){ 
    # simulate y = 1 + 2x + noise, fit a linear model, and
    # return the coefficients with their confidence intervals
    x<- rnorm(n, mean=0, sd=1) 
    eps <- runif(n,-2,2) 
    y <- 1 + 2*x + eps 
    result<-lm( y ~ x ) 
    final<-cbind(coef(result),confint(result)) 
    return(final)  
} 
# 
m<-5000
n2<-5000
system.time(lapply(1:m,myFunction,n=n2))
system.time(mclapply(1:m,myFunction,n=n2))

The output is shown here:

> system.time(lapply(1:m,myFunction,n=n2)) 
   user  system elapsed  
  63.97    3.26   22.49
> system.time(mclapply(1:m,myFunction,n=n2)) 
   user  system elapsed  
  63.33    3.28   22.26

In the preceding code, the lapply() and mclapply() functions are used. The mclapply() function is a parallelized version of lapply(): it returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. Note that mclapply() relies on forking, which Windows does not provide, so on a PC it can only run serially; this may explain why the two timings above are nearly identical. The following program is borrowed from http://www.smart-stats.org/wiki/parallel-computing-cluster-using-r with minor modifications. Note that the program is run on a UNIX machine instead of a PC:

library(snow) 
library(parallel) 
#library(Rmpi) 
myFunction<-function(n) { 
    a<-rnorm(n) 
    final<-log(abs(a))+a^3+2*a; 
    return(final) 
} 
nCores <- 11 
# Using multicore (forking) 
system.time(mclapply(rep(5E6,11),myFunction,mc.cores=nCores)) 
# Serial baseline 
system.time(sapply(rep(5E6,11),myFunction)) 
# Using snow via a socket cluster 
#cl <- getMPIcluster() 
cl <- makeCluster(c("localhost","localhost"), type = "SOCK") 
system.time(parSapply(cl,rep(5E6,11),myFunction)) 
stopCluster(cl) 

The related output is shown here:

> system.time(mclapply(rep(5E6,11),myFunction,mc.cores=nCores)) 
   user  system elapsed 
  4.440   1.075   1.926 
> system.time(sapply(rep(5E6,11),myFunction)) 
   user  system elapsed 
 10.294   0.992  11.286 
> system.time(parSapply(cl,rep(5E6,11),myFunction)) 
   user  system elapsed 
  0.655   0.626   7.328 
> proc.time() 
   user  system elapsed 
 15.621   2.936  22.134
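Because mclapply() depends on forking while parLapply() works with socket clusters everywhere, a portable script may pick the back end at run time; a minimal sketch of such a wrapper (the name parallel_apply is just for illustration):

```r
library(parallel)

# Fork-based mclapply() on Unix-like systems, socket cluster on Windows
parallel_apply <- function(X, FUN, ...) {
  n_cores <- max(1, detectCores() - 1)
  if (.Platform$OS.type == "windows") {
    cl <- makeCluster(n_cores)
    on.exit(stopCluster(cl))     # always tear the cluster down
    parLapply(cl, X, FUN, ...)
  } else {
    mclapply(X, FUN, ..., mc.cores = n_cores)
  }
}

unlist(parallel_apply(2:4, function(exponent) 2^exponent))   # 4 8 16
```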