Parallel R

Authors
McCallum, Q. Ethan & Weston, Stephen
Publisher
O'Reilly Media
Tags
computers, programming, general
ISBN
9781449309923
Date
2011-11-02
Size
0.84 MB
Lang
en

It’s tough to argue with R as a high-quality, cross-platform, open source statistical software product—unless you’re in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You’ll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don’t.

With these packages, you can overcome R’s single-threaded nature by spreading work across multiple CPUs, or get around R’s memory barrier by offloading work to multiple machines.

Snow: works well in a traditional cluster environment

Multicore: popular for multiprocessor and multicore computers

Parallel: included with R as of the 2.14.0 release

R+Hadoop: provides low-level access to a popular form of cluster computing

RHIPE: uses Hadoop’s power with R’s language and interactive shell

Segue: lets you use Elastic MapReduce as a backend for lapply-style operations