Many use R mainly in an ad hoc way—to plot a histogram here, perform a regression analysis there, and carry out other discrete tasks involving statistical operations. But this book is for those who wish to develop software in R. The programming skills of our intended readers may range anywhere from those of a professional software developer to “I took a programming course in college,” but their key goal is to write R code for specific purposes. (Statistical knowledge will generally not be needed.)
Here are some examples of people who may benefit from this book:
Analysts employed by, say, a hospital or government agency who produce statistical reports on a regular basis and need to develop production programs for this purpose
Academic researchers developing statistical methodology that is either new or combines existing methods into integrated procedures who need to codify this methodology so that it can be used by the general research community
Specialists in marketing, litigation support, journalism, publishing, and so on who need to develop code to produce sophisticated graphical presentations of data
Professional programmers with experience in software development who have been assigned by their employers to projects involving statistical analysis
Students in statistical computing courses
Accordingly, this book is not a compendium of the myriad types of statistical methods that are available in the wonderful R package. It really is about programming and covers programming-related topics missing from most other books on R. I place a programming spin on even the basic subjects. Here are some examples of this approach in action:
Throughout the book, you’ll find “Extended Example” sections. These usually present complete, general-purpose functions rather than isolated code fragments based on specific data. Indeed, you may find some of these functions useful for your own daily R work. By studying these examples, you learn not only how individual R constructs work but also how to put them together into a useful program. In many cases, I’ve included a discussion of design alternatives, answering the question “Why did we do it this way?”
The material is approached with a programmer’s sensibilities in mind. For instance, in the discussion of data frames, I not only state that a data frame is an R list but also point out the programming implications of that fact. Comparisons of R to other languages are also brought in when useful, for those who happen to know other languages.
Debugging plays a key role when programming in any language, yet it is not emphasized in most R books. In this book, I devote an entire chapter to debugging techniques, using the “extended example” approach to present fully worked-out demonstrations of how actual programs are debugged.
Today, multicore computers are common even in the home, and graphics processing unit (GPU) programming is waging a quiet revolution in scientific computing. An increasing number of R applications involve very large amounts of computation, and parallel processing has become a major issue for R programmers. Thus, there is a chapter on this topic, which again presents not just the mechanics but also extended examples.
There is a separate chapter on how to take advantage of the knowledge of R’s internal behavior and other facilities to speed up R code.
A chapter discusses the interface of R to other languages, such as C and Python, again with emphasis on extended examples as well as tips on debugging.