First, let's look at the simple usage of an R function called lapply() (refer to the following code):
> lapply(1:3, function(x) c(sin(x), x^2))
[[1]]
[1] 0.841471 1.000000

[[2]]
[1] 0.9092974 4.0000000

[[3]]
[1] 0.14112 9.00000
The meaning is clear: the inputs 1, 2, and 3 are each passed to the same anonymous function, which returns sin(x) and x^2 for every input value. The following example is slightly more complex: instead of a single expression, we evaluate three expressions at each input value.
myFunctions <- function(x) c(sin(x), x^2 + 2, 4*x^2 - x^3 - 2)
inputValue <- 1:10
output <- lapply(inputValue, myFunctions)
With this definition, the first couple of elements of output are determined directly by evaluating sin(x), x^2 + 2, and 4*x^2 - x^3 - 2 at x = 1 and x = 2; they are shown here:
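[[1]]
[1] 0.841471 3.000000 1.000000

[[2]]
[1] 0.9092974 6.0000000 6.0000000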
The following example is borrowed from Gordon (2015):
library(parallel)
n_cores <- detectCores() - 1
cl <- makeCluster(n_cores)
parLapply(cl, 2:4, function(exponent) 2^exponent)
stopCluster(cl)
In the preceding code, the makeCluster() function sets up the cluster, and parLapply() is the parallel version of lapply(): it distributes the calls over the cluster's worker processes. The output is shown here:
[[1]]
[1] 4

[[2]]
[1] 8

[[3]]
[1] 16
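Since parLapply() is a drop-in replacement for lapply() in this example, the parallel result should match the serial one exactly. The following quick check is only a sketch along those lines (the object names parallel_result and serial_result are illustrative, not taken from the original code):

library(parallel)
n_cores <- detectCores() - 1
cl <- makeCluster(n_cores)
parallel_result <- parLapply(cl, 2:4, function(exponent) 2^exponent)
stopCluster(cl)
serial_result <- lapply(2:4, function(exponent) 2^exponent)
# Both calls return a list of length 3 containing 4, 8, and 16
identical(parallel_result, serial_result)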
For the following code, we will see an error message:
c2 <- makeCluster(n_cores)
base <- 2
parLapply(c2, 2:4, function(exponent) base^exponent)
stopCluster(c2)
The error message is as follows:
Error in checkForRemoteErrors(val) : 3 nodes produced errors; first error: object 'base' not found
To correct it, the base variable has to be exported to the worker processes with clusterExport(), because each node runs in its own R session and does not see the master's workspace (refer to the following code).
c3 <- makeCluster(n_cores)
base <- 2
clusterExport(c3, "base")
parLapply(c3, 2:4, function(exponent) base^exponent)
stopCluster(c3)
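Exporting is not the only remedy: parLapply() forwards any extra arguments to the supplied function on every node, so base can also be passed explicitly. The following variant is a minimal sketch of that approach (the cluster object c4 is purely illustrative and not part of the original example):

c4 <- makeCluster(n_cores)
base <- 2
# base is handed to the function as an argument, so it does not need to
# exist in the workers' global environments
parLapply(c4, 2:4, function(exponent, base) base^exponent, base)
stopCluster(c4)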
To save space, the output will not be shown here. The following is another example to see the difference between calling the lapply() and mclapply() functions:
library(parallel)
detectCores()
myFunction <- function(iter=1, n=5){
  x <- rnorm(n, mean=0, sd=1)
  eps <- runif(n, -2, 2)
  y <- 1 + 2*x + eps
  result <- lm(y ~ x)
  final <- cbind(result$coef, confint(result))
  return(final)
}
m <- 5000
n2 <- 5000
system.time(lapply(1:m, myFunction, n=n2))
system.time(mclapply(1:m, myFunction, n=n2))
The output is shown here:
> system.time(lapply(1:m, myFunction, n=n2))
   user  system elapsed
  63.97    3.26   22.49
> system.time(mclapply(1:m, myFunction, n=n2))
   user  system elapsed
  63.33    3.28   22.26
In the preceding code, the lapply() and mclapply() functions are used. The mclapply() function is a parallelized version of lapply(): it returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. Note that the two timings above are almost identical, so mclapply() produced no speed-up in this particular run. The following program is borrowed from http://www.smart-stats.org/wiki/parallel-computing-cluster-using-r with minor modifications. Note that the program is run on a UNIX machine instead of a PC:
library(snow)
library(parallel)
#library(Rmpi)
myFunction <- function(n) {
  a <- rnorm(n)
  final <- log(abs(a)) + a^3 + 2*a
  return(final)
}
nCores <- 11
# Using multicore (forked processes)
system.time(mclapply(rep(5E6,11), myFunction, mc.cores=nCores))
# Serial version for comparison
system.time(sapply(rep(5E6,11), myFunction))
# Using snow via a socket cluster (an MPI cluster could be used instead)
#cl <- getMPIcluster()
cl <- makeCluster(c("localhost","localhost"), type = "SOCK")
system.time(parSapply(cl, rep(5E6,11), myFunction))
stopCluster(cl)
The related output is shown here:
> system.time(mclapply(rep(5E6,11), myFunction, mc.cores=nCores))
   user  system elapsed
  4.440   1.075   1.926
> system.time(sapply(rep(5E6,11), myFunction))
   user  system elapsed
 10.294   0.992  11.286
> system.time(parSapply(cl, rep(5E6,11), myFunction))
   user  system elapsed
  0.655   0.626   7.328
> proc.time()
   user  system elapsed
 15.621   2.936  22.134
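Reading these timings, the forked mclapply() run finishes in roughly 1.9 seconds of elapsed time against about 11.3 seconds for the serial sapply() call, close to a six-fold speed-up on the 11 cores used. The socket cluster falls in between at about 7.3 seconds of elapsed time, which is plausible given that it was created with only two localhost workers and has to send its results back over socket connections.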