
R High Performance Programming
Table of Contents
R High Performance Programming
Credits
About the Authors
About the Reviewers
www.PacktPub.com
  Support files, eBooks, discount offers, and more
    Why subscribe?
    Free access for Packt account holders
Preface
  What this book covers
  What you need for this book
  Who this book is for
  Conventions
  Reader feedback
  Customer support
    Downloading the example code
    Errata
    Piracy
    Questions
1. Understanding R's Performance – Why Are R Programs Sometimes Slow?
  Three constraints on computing performance – CPU, RAM, and disk I/O
  R is interpreted on the fly
  R is single-threaded
  R requires all data to be loaded into memory
  Algorithm design affects time and space complexity
  Summary
2. Profiling – Measuring Code's Performance
  Measuring total execution time
    Measuring execution time with system.time()
    Repeating time measurements with rbenchmark
    Measuring distribution of execution time with microbenchmark
  Profiling the execution time
    Profiling a function with Rprof()
    The profiling results
  Profiling memory utilization
  Monitoring memory utilization, CPU utilization, and disk I/O using OS tools
  Identifying and resolving bottlenecks
  Summary
3. Simple Tweaks to Make R Run Faster
  Vectorization
  Use of built-in functions
  Preallocating memory
  Use of simpler data structures
  Use of hash tables for frequent lookups on large data
  Seeking fast alternative packages in CRAN
  Summary
4. Using Compiled Code for Greater Speed
  Compiling R code before execution
    Compiling functions
    Just-in-time (JIT) compilation of R code
  Using compiled languages in R
    Prerequisites
    Including compiled code inline
    Calling external compiled code
    Considerations for using compiled code
      R APIs
      R data types versus native data types
      Creating R objects and garbage collection
      Allocating memory for non-R objects
  Summary
5. Using GPUs to Run R Even Faster
  General purpose computing on GPUs
  R and GPUs
    Installing gputools
  Fast statistical modeling in R with gputools
  Summary
6. Simple Tweaks to Use Less RAM
  Reusing objects without taking up more memory
  Removing intermediate data when it is no longer needed
  Calculating values on the fly instead of storing them persistently
  Swapping active and nonactive data
  Summary
7. Processing Large Datasets with Limited RAM
  Using memory-efficient data structures
    Smaller data types
    Sparse matrices
    Symmetric matrices
    Bit vectors
  Using memory-mapped files and processing data in chunks
    The bigmemory package
    The ff package
  Summary
8. Multiplying Performance with Parallel Computing
  Data parallelism versus task parallelism
  Implementing data parallel algorithms
  Implementing task parallel algorithms
    Running the same task on workers in a cluster
    Running different tasks on workers in a cluster
  Executing tasks in parallel on a cluster of computers
  Shared memory versus distributed memory parallelism
  Optimizing parallel performance
  Summary
9. Offloading Data Processing to Database Systems
  Extracting data into R versus processing data in a database
  Preprocessing data in a relational database using SQL
  Converting R expressions to SQL
    Using dplyr
    Using PivotalR
  Running statistical and machine learning algorithms in a database
  Using columnar databases for improved performance
  Using array databases for maximum scientific-computing performance
  Summary
10. R and Big Data
  Understanding Hadoop
  Setting up Hadoop on Amazon Web Services
  Processing large datasets in batches using Hadoop
    Uploading data to HDFS
    Analyzing HDFS data with RHadoop
    Other Hadoop packages for R
  Summary
Index