R High Performance Programming
Table of Contents
R High Performance Programming
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Understanding R's Performance – Why Are R Programs Sometimes Slow?
Three constraints on computing performance – CPU, RAM, and disk I/O
R is interpreted on the fly
R is single-threaded
R requires all data to be loaded into memory
Algorithm design affects time and space complexity
Summary
2. Profiling – Measuring Code's Performance
Measuring total execution time
Measuring execution time with system.time()
Repeating time measurements with rbenchmark
Measuring distribution of execution time with microbenchmark
Profiling the execution time
Profiling a function with Rprof()
The profiling results
Profiling memory utilization
Monitoring memory utilization, CPU utilization, and disk I/O using OS tools
Identifying and resolving bottlenecks
Summary
3. Simple Tweaks to Make R Run Faster
Vectorization
Use of built-in functions
Preallocating memory
Use of simpler data structures
Use of hash tables for frequent lookups on large data
Seeking fast alternative packages in CRAN
Summary
4. Using Compiled Code for Greater Speed
Compiling R code before execution
Compiling functions
Just-in-time (JIT) compilation of R code
Using compiled languages in R
Prerequisites
Including compiled code inline
Calling external compiled code
Considerations for using compiled code
R APIs
R data types versus native data types
Creating R objects and garbage collection
Allocating memory for non-R objects
Summary
5. Using GPUs to Run R Even Faster
General purpose computing on GPUs
R and GPUs
Installing gputools
Fast statistical modeling in R with gputools
Summary
6. Simple Tweaks to Use Less RAM
Reusing objects without taking up more memory
Removing intermediate data when it is no longer needed
Calculating values on the fly instead of storing them persistently
Swapping active and nonactive data
Summary
7. Processing Large Datasets with Limited RAM
Using memory-efficient data structures
Smaller data types
Sparse matrices
Symmetric matrices
Bit vectors
Using memory-mapped files and processing data in chunks
The bigmemory package
The ff package
Summary
8. Multiplying Performance with Parallel Computing
Data parallelism versus task parallelism
Implementing data parallel algorithms
Implementing task parallel algorithms
Running the same task on workers in a cluster
Running different tasks on workers in a cluster
Executing tasks in parallel on a cluster of computers
Shared memory versus distributed memory parallelism
Optimizing parallel performance
Summary
9. Offloading Data Processing to Database Systems
Extracting data into R versus processing data in a database
Preprocessing data in a relational database using SQL
Converting R expressions to SQL
Using dplyr
Using PivotalR
Running statistical and machine learning algorithms in a database
Using columnar databases for improved performance
Using array databases for maximum scientific-computing performance
Summary
10. R and Big Data
Understanding Hadoop
Setting up Hadoop on Amazon Web Services
Processing large datasets in batches using Hadoop
Uploading data to HDFS
Analyzing HDFS data with RHadoop
Other Hadoop packages for R
Summary
Index