Introduction


R is a scripting language for statistical data manipulation and analysis. It was inspired by, and is mostly compatible with, the statistical language S developed by AT&T. The name S, for statistics, was an allusion to another programming language with a one-letter name developed at AT&T: the famous C language. S was later sold to a small firm, which added a graphical user interface (GUI) and named the result S-Plus.

R has become more popular than S or S-Plus, both because it’s free and because more people are contributing to it. R is sometimes called GNU S, to reflect its open source nature. (The GNU Project is a major collection of open source software.)

As the Cantonese say, yauh peng, yauh leng, which means “both inexpensive and beautiful.” Why use anything else?

R has a number of virtues:

I should warn you at the outset that you typically submit commands to R by typing in a terminal window, rather than clicking a mouse in a GUI, and most R users do not use a GUI. This doesn’t mean that R doesn’t do graphics. On the contrary, it includes tools for producing graphics of great utility and beauty, but they are used for system output, such as plots, not for user input.

If you can’t live without a GUI, you can use one of the free GUIs that have been developed for R, such as the following open source or free tools:

The first three, RStudio, StatET and ESS, should be considered integrated development environments (IDEs), aimed more toward programming. StatET and ESS provide the R programmer with an IDE in the famous Eclipse and Emacs settings, respectively.

On the commercial side, another IDE is available from Revolution Analytics, an R service company (http://www.revolutionanalytics.com/).

Because R is a programming language rather than a collection of discrete commands, you can combine several commands, each using the output of the previous one. (Linux users will recognize the similarity to chaining shell commands using pipes.) The ability to combine R functions gives tremendous flexibility and, if used properly, is quite powerful. As a simple example, consider this (compound) command:

nrow(subset(x03, z == 1))

First, the subset() function takes the data frame x03 and extracts all records for which the variable z has the value 1. This results in a new data frame, which is then fed to the nrow() function. This function counts the number of rows in a data frame. The net effect is to report the number of records in the original data frame for which z equals 1.
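To see the composition at work, here is a minimal sketch that runs the same compound command on a small made-up data frame. The contents of x03 below (the id column and the particular z values) are assumptions for illustration only, as are the names matches, n_stepwise, and n_compound; the real x03 is not shown in the text.

```r
# Hypothetical stand-in for x03; the columns and values are invented.
x03 <- data.frame(id = 1:6, z = c(1, 0, 1, 1, 0, 2))

# Step by step: extract the records with z equal to 1, then count them.
matches <- subset(x03, z == 1)
n_stepwise <- nrow(matches)

# The same computation as one compound command: the data frame
# returned by subset() is passed directly to nrow().
n_compound <- nrow(subset(x03, z == 1))

print(n_compound)  # 3, since rows 1, 3, and 4 have z equal to 1
```

Both forms give the same count. The compound version simply skips naming the intermediate data frame, which is convenient when that intermediate result is not needed again.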

The terms object-oriented programming and functional programming were mentioned earlier. These topics pique the interest of computer scientists, and though they may be somewhat foreign to most other readers, they are relevant to anyone who uses R for statistical programming. The following sections provide an overview of both topics.