Let’s make a simple data set (in R parlance, a vector) consisting of the numbers 1, 2, and 4, and name it x:
> x <- c(1,2,4)
The standard assignment operator in R is <-. You can also use =, but this is discouraged, as it does not work in some special situations. Note that there are no fixed types associated with variables. Here, we’ve assigned a vector to x, but later we might assign something of a different type to it. We’ll look at vectors and the other types in Section 1.4.
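As a small illustration of the point that variables carry no fixed type, the following sketch reassigns x to a character string and then back to a numeric vector. The class() function appears here only to report what kind of object x currently holds; it is not part of the text above.

```r
x <- c(1,2,4)      # x holds a numeric vector
class(x)           # "numeric"
x <- "abc"         # the same name now holds a character string
class(x)           # "character"
x <- c(1,2,4)      # restore x for the examples that follow
class(x)           # "numeric" again
```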
The c stands for concatenate. Here, we are concatenating the numbers 1, 2, and 4. More precisely, we are concatenating three one-element vectors that consist of those numbers. This is because any number is also considered to be a one-element vector.
Now we can also do the following:
> q <- c(x,x,8)
which sets q to (1,2,4,1,2,4,8) (yes, including the duplicates).
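The claim that a single number is itself a one-element vector can be checked directly. The sketch below uses length(), which reports the number of elements in a vector, and shows that c() joins its arguments end to end into one flat vector rather than nesting them.

```r
length(8)            # 1: a lone number is a one-element vector
x <- c(1,2,4)
q <- c(x,x,8)        # the arguments are joined end to end
q                    # 1 2 4 1 2 4 8
length(q)            # 7
```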
Now let’s confirm that the data is really in x. To print the vector to the screen, simply type its name. If you type any variable name (or, more generally, any expression) while in interactive mode, R will print out the value of that variable (or expression). Programmers who have used other interactive languages, such as Python, will recognize this feature. For our example, enter this:
> x
[1] 1 2 4
Yep, sure enough, x consists of the numbers 1, 2, and 4.
Individual elements of a vector are accessed via [ ]. Here’s how we can print out the third element of x:
> x[3]
[1] 4
As in other languages, the selector (here, 3) is called the index or subscript. Those familiar with ALGOL-family languages, such as C and C++, should note that elements of R vectors are indexed starting from 1, not 0.
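For readers coming from zero-based languages, a brief sketch of what 1-based indexing means in practice may help. Note in particular that asking for index 0 is not an error in R; it quietly returns an empty vector, which can hide off-by-one mistakes.

```r
x <- c(1,2,4)
x[1]             # 1: the first element has index 1
x[length(x)]     # 4: the last element has index length(x)
x[0]             # numeric(0): an empty vector, not an error
```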
Subsetting is a very important operation on vectors. Here’s an example:
> x <- c(1,2,4)
> x[2:3]
[1] 2 4
The expression x[2:3] refers to the subvector of x consisting of elements 2 through 3, which are 2 and 4 here.
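It may help to see that the selector 2:3 is itself just a vector of positions. The sketch below evaluates 2:3 on its own and then, as an extra not covered in the text, uses c() to pick out positions that are not contiguous.

```r
x <- c(1,2,4)
2:3              # 2 3: the colon operator builds a vector of indices
x[2:3]           # 2 4
x[c(1,3)]        # 1 4: any vector of positions can serve as the selector
```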
We can easily find the mean and standard deviation of our data set, as follows:
> mean(x)
[1] 2.333333
> sd(x)
[1] 1.527525
This again demonstrates typing an expression at the prompt in order to print it. In the first line, our expression is the function call mean(x). The return value from that call is printed automatically, without requiring a call to R’s print() function.
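To see where those two numbers come from, the same quantities can be computed by hand. This is only a sketch for checking the arithmetic; the names n, m, and s are introduced here for illustration. Note that sd() computes the sample standard deviation, which divides by n - 1 rather than n.

```r
x <- c(1,2,4)
n <- length(x)                         # 3 observations
m <- sum(x) / n                        # 2.333333, same as mean(x)
s <- sqrt(sum((x - m)^2) / (n - 1))    # 1.527525, same as sd(x)
m
s
```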
If we want to save the computed mean in a variable instead of just printing it to the screen, we could execute this code:
> y <- mean(x)
Again, let’s confirm that y really does contain the mean of x:
> y
[1] 2.333333
As noted earlier, we use # to write comments, like this:
> y # print out y
[1] 2.333333
Comments are especially valuable for documenting program code, but they are useful in interactive sessions, too, since R records the command history (as discussed in Section 1.6). If you save your session and resume it later, the comments can help you remember what you were doing.
Finally, let’s do something with one of R’s internal data sets (these are used for demos). You can get a list of these data sets by typing the following:
> data()
One of the data sets is called Nile and contains data on the flow of the Nile River. Let’s find the mean and standard deviation of this data set:
> mean(Nile)
[1] 919.35
> sd(Nile)
[1] 169.2275
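Before summarizing an unfamiliar data set, it can be useful to glance at what it contains. The following sketch uses length(), class(), and head(), none of which are introduced in the text above, to inspect Nile. The data set is stored as a time series object holding 100 yearly readings, which is why its class is reported as "ts" rather than "numeric".

```r
length(Nile)     # 100 observations
class(Nile)      # "ts": a time series object
head(Nile)       # the first six values
```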
We can also plot a histogram of the data:
> hist(Nile)
A window pops up with the histogram in it, as shown in Figure 1-1. This graph is bare-bones simple, but R has all kinds of optional bells and whistles for plotting. For instance, you can change the number of bins by specifying the breaks argument. The call hist(z,breaks=12) would draw a histogram of the data set z with about 12 bins (R treats a single number as a suggestion and may adjust it so that the bin boundaries fall on round values). You can also create nicer labels, make use of color, and make many other changes to create a more informative and eye-appealing graph. When you become more familiar with R, you’ll be able to construct complex, rich color graphics of striking beauty.
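As a modest first step toward those nicer graphs, here is a sketch that combines breaks with a title, an axis label, and a fill color. The arguments main, xlab, and col are standard plotting arguments accepted by hist(); the label wording and the color chosen here are purely illustrative.

```r
hist(Nile,
     breaks = 12,                        # suggested number of bins
     main = "Annual flow of the Nile",   # plot title (illustrative wording)
     xlab = "Flow",                      # x-axis label (illustrative wording)
     col = "lightblue")                  # fill color for the bars
```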
Well, that’s the end of our first, five-minute introduction to R. Quit R by calling the q() function (or alternatively by pressing ctrl-D in Linux or cmd-D on a Mac):
> q()
Save workspace image? [y/n/c]: n
That last prompt asks whether you want to save your variables so that you can resume work later. If you answer y, then all those objects will be loaded automatically the next time you run R. This is a very important feature, especially when working with large or numerous data sets. Answering y here also saves the session’s command history. We’ll talk more about saving your workspace and the command history in Section 1.6.
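The save-and-restore cycle can also be tried by hand without quitting. The sketch below is not how the quit prompt is implemented; it simply uses save() and load() on a single object to show the idea. The file name demo.RData is hypothetical, and the file is written to a temporary directory so that nothing in your working directory is touched.

```r
x <- c(1,2,4)
ws <- file.path(tempdir(), "demo.RData")   # hypothetical file name
save(x, file = ws)     # write x to disk
rm(x)                  # remove x from the session
load(ws)               # read it back in
x                      # 1 2 4
```

If you already know that you do not want to save anything, calling q(save = "no") quits without showing the prompt.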