First, let's look at the simple usage of an R function called lapply() (refer to the following code):
> lapply(1:3,function(x) c(sin(x),x^2))
[[1]]
[1] 0.841471 1.000000

[[2]]
[1] 0.9092974 4.0000000

[[3]]
[1] 0.14112 9.00000
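The meaning is clear: each of the input values 1, 2, and 3 is passed in turn to the anonymous function, which returns the pair sin(x) and x^2, and the results are collected in a list with one element per input. The following example is a slightly more complex one: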
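# myFunctions must be a function of x; a bare c(sin(x), ...) would be evaluated once, before lapply() runs
myFunctions<-function(x) c(sin(x),x^2+2,4*x^2-x^3-2)
inputValue<-1:10
output<-lapply(inputValue,myFunctions)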
The first couple of lines are shown here:
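A minimal sketch for reproducing them, assuming myFunctions is meant to be a function of x that evaluates the three expressions for each input value; head() is used here only to limit what is printed:

```r
# Assumption: the three expressions are wrapped in a function of x so that
# they are evaluated separately for every input value.
myFunctions <- function(x) c(sin(x), x^2 + 2, 4 * x^2 - x^3 - 2)
inputValue <- 1:10
output <- lapply(inputValue, myFunctions)

length(output)   # one list element per input value
head(output, 2)  # the results for x = 1 and x = 2
```

For x = 1 the three values are sin(1), 3, and 1; for x = 2 they are sin(2), 6, and 6.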
The following example is borrowed from Gordon (2015):
library(parallel)
n_cores <- detectCores() - 1
cl <- makeCluster(n_cores)
parLapply(cl, 2:4,function(exponent) 2^exponent)
stopCluster(cl)
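In the preceding code, detectCores() - 1 leaves one core free for the rest of the system, and the makeCluster() function sets up a cluster with that many worker processes. The parLapply() function is the parallel version of lapply(): it distributes the input values across the workers and returns the results as a list. Finally, stopCluster() shuts the workers down. The output is shown here: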
[[1]]
[1] 4

[[2]]
[1] 8

[[3]]
[1] 16
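If parLapply() fails before stopCluster() is reached, the worker processes are left running. One way to guard against this is to wrap the three steps in a function and register the cleanup with on.exit(). The following is only a sketch: withCluster() is a hypothetical helper name, not part of the parallel package, and the default of two workers is an arbitrary choice for illustration.

```r
library(parallel)

# Hypothetical helper (not part of the parallel package): runs parLapply()
# on a temporary cluster and always shuts the workers down afterwards.
withCluster <- function(X, fun, n_cores = 2) {
  cl <- makeCluster(n_cores)
  on.exit(stopCluster(cl), add = TRUE)  # runs even if parLapply() fails
  parLapply(cl, X, fun)
}

result <- withCluster(2:4, function(exponent) 2^exponent)
```

Here result holds the same three values as the output above.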
For the following code, we will see an error message:
c2<-makeCluster(n_cores)
base <- 2
parLapply(c2, 2:4, function(exponent) base^exponent)
stopCluster(c2)
The error message is as follows:
Error in checkForRemoteErrors(val) :
  3 nodes produced errors; first error: object 'base' not found
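The error occurs because each worker is a separate R process that does not see the variables defined in the main session. To correct it, the base variable is exported to the workers with clusterExport() (refer to the following code):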
c3<-makeCluster(n_cores)
base <- 2
clusterExport(c3, "base")
parLapply(c3, 2:4, function(exponent) base^exponent)
stopCluster(c3)
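An alternative to exporting the variable is to pass it to the function as an argument, since parLapply() forwards any additional arguments to the function it applies. This is a sketch only; the cluster size of two is chosen purely for illustration.

```r
library(parallel)

cl <- makeCluster(2)  # two workers, chosen only for illustration
base <- 2
# Extra named arguments to parLapply() are forwarded to the function,
# so the workers receive base as an argument instead of a global variable.
result <- parLapply(cl, 2:4,
                    function(exponent, base) base^exponent,
                    base = base)
stopCluster(cl)
```

This keeps the function self-contained, which can be convenient when only a few values need to reach the workers.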
To save space, the output will not be shown here. The following is another example to see the difference between calling the lapply() and mclapply() functions:
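library(parallel)
detectCores()
myFunction<- function(iter=1,n=5){
    x<- rnorm(n, mean=0, sd=1 )
    eps <- runif(n,-2,2)
    y <- 1 + 2*x + eps
    result<-lm( y ~ x )
    final<-cbind(result$coef,confint(result))
    return(final)
}
# m<-5000
# n (number of replications) and n2 (sample size), used in the timing calls, must be defined before running them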
The output is shown here:
> system.time(lapply(1:n,myFunction,n=n2))
   user  system elapsed
  63.97    3.26   22.49
> system.time(mclapply(1:n,myFunction,n=n2))
   user  system elapsed
  63.33    3.28   22.26
In the preceding code, the lapply() and mclapply() functions are used. The mclapply() function is a parallelized version of the lapply() function: it returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. The two timings are almost identical, which is consistent with a run on Windows, where mclapply() cannot fork and simply calls lapply(). The following program is borrowed with minor modifications. Note that the program is run on a UNIX machine instead of a PC:
library(snow)
library(parallel)
#library(Rmpi)
myFunction<-function(n) {
    a<-rnorm(n)
    final<-log(abs(a))+a^3+2*a
    return(final)
}
nCores=11
#Using multicore
system.time(mclapply(rep(5E6,11),myFunction,mc.cores=nCores))
#Serial baseline
system.time(sapply(rep(5E6,11),myFunction))
#Using snow via a socket cluster (the MPI variant is commented out)
#cl <- getMPIcluster()
cl <- makeCluster(c("localhost","localhost"), type = "SOCK")
system.time(parSapply(cl,rep(5E6,11),myFunction))
stopCluster(cl)
The related output is shown here:
> system.time(mclapply(rep(5E6,11),myFunction,mc.cores=nCores))
   user  system elapsed
  4.440   1.075   1.926
> system.time(sapply(rep(5E6,11),myFunction))
   user  system elapsed
 10.294   0.992  11.286
> system.time(parSapply(cl,rep(5E6,11),myFunction))
   user  system elapsed
  0.655   0.626   7.328
> proc.time()
   user  system elapsed
 15.621   2.936  22.134
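Because mclapply() relies on forking, it raises an error on Windows when mc.cores is greater than 1, so code that should run on both kinds of system needs a fallback. The following is a minimal sketch of one possible approach: portableLapply() is a hypothetical helper name, and the default of two cores is an assumption made only for illustration.

```r
library(parallel)

# Hypothetical helper: fork-based mclapply() where forking is available,
# a socket cluster with parLapply() on Windows.
portableLapply <- function(X, FUN, n_cores = 2) {
  if (.Platform$OS.type == "windows") {
    cl <- makeCluster(n_cores)
    on.exit(stopCluster(cl), add = TRUE)  # always release the workers
    parLapply(cl, X, FUN)
  } else {
    mclapply(X, FUN, mc.cores = n_cores)
  }
}

squares <- portableLapply(1:4, function(i) i^2)
```

Either branch returns a list with one element per input value, so the calling code does not need to know which backend was used.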