R is a freely available interactive computing environment. At its most basic, you can think of R as a fancy calculator, and you could limit yourself to using it that way. However, R offers much richer functionality, from the ability to generate graphs to a flexible programming language that incorporates most standard mechanisms for flow control. Indeed, R is a very flexible language; there are often many different ways to accomplish any given task.
Although a number of graphical user interfaces for R exist, we will be using the standard distribution, which has a command-based interface. This means that you need to communicate with the software by typing instructions into a command window. For the purposes of this book, we believe that such a stripped-down interface actually makes it easier for readers to get started. The R language is quite intuitive, so we hope that its use in this book will not prove an insurmountable obstacle even for readers with no programming experience.
This appendix provides readers with an introduction to the R language that covers the background needed to understand, and eventually extend, the simulations we have included in each of the book's chapters. Unless you have dabbled in R before, we strongly recommend that you familiarize yourself with this appendix. If you are interested in learning more about the R environment, there is a variety of books available online, aimed at students of all backgrounds.
R
You can obtain R distribution bundles for Windows, Mac OS, and Linux at the CRAN website, https://cran.r-project.org. Just click on the link that corresponds to the appropriate operating system and follow the instructions. Once R has been installed, you can launch it by clicking on its icon. On a Microsoft Windows or Mac OS X computer, you should then see an interactive command console that looks like Figure A.1.
Figure A.1 The R interactive command console on a Mac OS X computer. The symbol > is a prompt for users to provide instructions; these will be executed immediately after the user presses the RETURN key.
The last line in the command window is a prompt line that starts with the symbol >. This indicates that R is waiting for instructions. At this point, we could ask it, for example, to give us the sum of 5 and 7 by typing 5 + 7 at the prompt and then pressing the RETURN key. This is what your R console should show if you go ahead and do it:
Note that R provides the expected answer (the number 12) in the next line. (For now, ignore the [1] symbol at the beginning of the line; we will explain what it means later.) Similarly, you can perform many other arithmetic operations:
The text that appears after the symbol # is a comment. We have added comments to this code to explain what the different commands do. However, you do not need to type them yourself or worry about them: any text between # and the next RETURN is ignored by R. Furthermore, as the last command illustrates, incomplete expressions (which could happen, e.g., when you press RETURN too early by mistake) are continued on the next line. Continuation lines start with the + prompt instead of the usual >.
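As a minimal sketch of such a session (the particular numbers are our own illustrative choices), the following commands can be typed at the prompt one line at a time. The comments describe what each command does and what R should print.

```r
5 + 7      # addition; R prints [1] 12
10 - 3     # subtraction; prints [1] 7
4 * 2.5    # multiplication; prints [1] 10
9 / 2      # division; prints [1] 4.5
2^10       # exponentiation; prints [1] 1024
sqrt(16)   # square root; prints [1] 4

# An expression that ends with an operator is incomplete, so R keeps
# reading on the next line (the console shows the + prompt there).
5 +
  7        # the completed expression evaluates to 12
```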
All your standard functions (including trigonometric and exponential functions) are implemented in R. In addition to your regular arithmetic operations, you can also perform “integer” operations:
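A short sketch of the two integer operators, using numbers we picked for illustration: %/% gives the whole part of a division and %% gives the remainder.

```r
17 %/% 5    # integer division: 5 fits 3 whole times into 17
17 %% 5     # remainder of that division: 2

# The two pieces recombine into the original number.
5 * (17 %/% 5) + 17 %% 5   # 17
```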
You can get help with these functions and operators (as well as with any other R functionality) using the help() function. help() opens a separate window with the relevant information. For example, if you type help("%/%"), then a window that contains detailed help on how to carry out arithmetic operations will pop up.
The standard precedence of operations applies in R, with exponentiation being resolved before multiplications/divisions, and additions/subtractions being resolved last. However, you can use parentheses to change the order in which operations are carried out:
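The following sketch, with numbers of our own choosing, contrasts the default order of operations with the order forced by parentheses.

```r
2 + 3 * 4      # multiplication first: 14
(2 + 3) * 4    # parentheses first: 20
2 * 3^2        # exponentiation first: 18
(2 * 3)^2      # parentheses first: 36
-2^2           # exponentiation is resolved before the minus sign: -4
(-2)^2         # 4
```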
R can treat infinity as a number, which is represented by the symbol Inf. Similarly, undefined operations, such as dividing 0 by 0, return the NaN (“Not a Number”) symbol:
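A few illustrative commands (our own examples) that produce these special values:

```r
1 / 0        # Inf
-1 / 0       # -Inf
1 / Inf      # 0
Inf + 1      # Inf
0 / 0        # NaN
Inf - Inf    # NaN
is.nan(0 / 0)   # TRUE: a way to test for NaN
```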
You can store values in named variables that can later be used in expressions just like regular numbers. For example,
As you could guess, variable names cannot consist of only numbers. You should also avoid the names of existing functions or operators.
To check the current value of an object, you can simply type its name at the prompt.
Once an object has been created, it remains in memory until you remove it (or close your current R session), so you can reuse it multiple times. You can list all objects in memory by using the function ls() and remove an object from memory by using the function rm().
Some variables containing widely used constants (such as π, stored as pi) are already predefined:
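The sketch below pulls these ideas together. The variable names (stake, payout, radius, area) are hypothetical and chosen only for illustration. We use = for assignment to match the text; many R users write <- instead, which works the same way here.

```r
stake = 5                # store the value 5 under the name stake
payout = 2 * stake + 1   # use stake in an expression
payout                   # typing the name prints the value: [1] 11

ls()        # names of the objects currently in memory
rm(stake)   # remove stake from memory
ls()        # stake no longer appears in the list

pi                     # predefined constant, approximately 3.141593
radius = 2
area = pi * radius^2   # area of a circle of radius 2
area                   # approximately 12.56637
```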
A vector is just a list of values that share a common name but can be accessed independently of each other. You can think of a vector as a big box divided into many compartments organized sequentially, with each compartment containing a different value. You can either move the whole box around or, if needed, access the individual compartments (see Figure A.2). You can create arbitrary vectors using the c() function.
Figure A.2 A representation of a vector x of length 6 as a series of containers, each one of them corresponding to a different number.
Generally speaking, creating vectors by using the c() function is tedious. When the vectors follow regular patterns, you can use the rep() and seq() commands to simplify the process.
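As a brief sketch, with values and names that are purely illustrative, here is how c(), rep(), and seq() can be used to build vectors:

```r
x = c(3, 1, 4, 1, 5, 9)             # arbitrary values, typed one by one
zeros = rep(0, times = 5)           # 0 0 0 0 0
pattern = rep(c(1, 2), times = 3)   # 1 2 1 2 1 2
evens = seq(from = 2, to = 10, by = 2)             # 2 4 6 8 10
fractions = seq(from = 0, to = 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
1:6                                 # shorthand for the integers 1 to 6
```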
The meaning of the strings [1] and [36] in the first and second lines should now be clear: they tell you the index of the first element that appears in each line. This is meant to make it easier for you to read a vector off the screen.
You can access individual elements of a vector by using the subsetting operator [], and you can find the length of a vector by using the length() function.
You can also use the [] operator to create a subvector that only contains some of the entries in the original vector. Note that negative indexes remove entries.
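The following sketch illustrates these uses of [] and length() on a small vector of our own choosing:

```r
x = c(10, 20, 30, 40, 50, 60)
length(x)      # 6
x[1]           # first element: 10 (indexes start at 1)
x[6]           # last element: 60
x[c(2, 4)]     # second and fourth elements: 20 40
x[2:4]         # elements 2 through 4: 20 30 40
x[-1]          # everything except the first element
x[-c(1, 6)]    # everything except the first and last elements
x[3] = 0       # [] can also be used to replace an entry
x              # 10 20 0 40 50 60
```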
In many ways, vectors can be manipulated as if they were scalar variables. For example, you can add or multiply two vectors of the same length. If you do, operations are carried out elementwise, that is, the result is another vector of the same length whose first element is the sum/product of the first elements in each of the two original vectors, and so on:
If the lengths of the two vectors are not the same, R “recycles” the entries of the shorter vector until the sizes match. This might or might not lead to a warning, depending on whether the length of the longer vector is a multiple of the length of the shorter one. We recommend that you avoid recycling until you have gained substantial experience with R.
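A small sketch of elementwise operations and recycling follows; the vectors a and b are hypothetical. The last command is expected to produce a warning, because 4 is not a multiple of 3.

```r
a = c(1, 2, 3, 4)
b = c(10, 20, 30, 40)
a + b     # elementwise sum: 11 22 33 44
a * b     # elementwise product: 10 40 90 160

# Recycling: the shorter vector is reused as 100 200 100 200.
a + c(100, 200)             # 101 202 103 204, no warning

# Here the shorter vector is reused as 1 2 3 1, and R issues a warning.
mismatch = a + c(1, 2, 3)
mismatch                    # 2 4 6 5
```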
Many functions in R are vectorized, that is, if a vector is passed as the argument, then the function is applied individually to each element. This helps make R code easier to read.
Another example of a vectorized function is cumsum(), which provides cumulative sums of the elements of a vector. This is particularly useful if the entries of the original vector represent the payoffs of a repeated bet, in which case the cumulative sum represents the running profit/loss that the player has incurred.
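To make this concrete, here is a sketch with made-up payoffs for seven rounds of a bet (1 for a win, -1 for a loss), together with a simpler vectorized function, sqrt():

```r
x = c(1, 4, 9, 16)
sqrt(x)                  # applied to each element: 1 2 3 4

payoffs = c(1, -1, -1, 1, 1, 1, -1)   # hypothetical payoff of each round
running = cumsum(payoffs)             # running profit/loss after each round
running                               # 1 0 -1 0 1 2 1
```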
Some functions are not vectorized, but are instead designed to operate on all elements of the vector simultaneously. For example, the functions sum(), mean(), max(), and min() give you the sum, the average, the maximum, and the minimum of all the entries of a vector, respectively.
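A quick sketch of these summary functions, applied to a hypothetical vector of payoffs:

```r
payoffs = c(1, -1, -1, 1, 1, 1, -1)   # hypothetical payoff of each round
sum(payoffs)     # total profit: 1
mean(payoffs)    # average payoff per round: 1/7, about 0.1428571
max(payoffs)     # best single round: 1
min(payoffs)     # worst single round: -1
```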
Matrices are similar to vectors, but instead of storing elements sequentially they store them in a rectangular array. Hence, entries of a matrix are indexed by two numbers: the first corresponds to the row in which the entry is located, and the second corresponds to the column. Furthermore, each row or column of a matrix is simply a vector.
You can create a matrix by starting with a long vector and then using its elements to fill the matrix sequentially by either row or column.
Note that the strings [1,], [2,], and [3,] at the beginning of each line serve to identify the rows of the matrix, while the strings [,1] and [,2] identify the columns. As this suggests, the elements of the matrix can be accessed using the [] operator with two indexes separated by a comma. If you want to access a whole row or a whole column of the matrix, leave the other index empty (the result will be treated as a vector).
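The sketch below builds a matrix with 3 rows and 2 columns from the integers 1 to 6 (an arbitrary choice of ours) using the matrix() function, first filling by column and then by row, and then accesses some of its entries.

```r
values = 1:6
A = matrix(values, nrow = 3, ncol = 2)                # filled column by column (the default)
B = matrix(values, nrow = 3, ncol = 2, byrow = TRUE)  # filled row by row
A          # rows: (1, 4), (2, 5), (3, 6)
B          # rows: (1, 2), (3, 4), (5, 6)

A[2, 1]    # entry in row 2, column 1 of A: 2
B[2, 1]    # entry in row 2, column 1 of B: 3
A[3, ]     # whole third row of A, as a vector: 3 6
A[, 2]     # whole second column of A, as a vector: 4 5 6
dim(A)     # number of rows and columns: 3 2
```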
Sometimes, it is useful to compute rowwise or columnwise sums of the elements of a matrix. The functions rowSums() and colSums() allow you to do exactly that:
More general functions can be used on each row or column of the array through the apply() function.
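As a sketch on a small illustrative matrix, the second argument of apply() selects rows (1) or columns (2), and the third is the function to be applied:

```r
A = matrix(1:6, nrow = 3, ncol = 2)   # rows: (1, 4), (2, 5), (3, 6)
rowSums(A)          # 5 7 9
colSums(A)          # 6 15
apply(A, 1, max)    # maximum of each row: 4 5 6
apply(A, 2, max)    # maximum of each column: 3 6
apply(A, 1, sum)    # same result as rowSums(A)
```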
So far, we have only discussed variables that contain real numbers. However, R allows for variables that contain other types of objects. One example corresponds to logical variables, which take only two values (TRUE and FALSE) and are the centerpiece of Boolean algebra.
Logical values are often the result of comparisons between other types of objects:
Note that, while = is the assignment operator used to assign values to variables, == is the equal-to operator used in comparisons.
You can combine results from various comparisons using the and (&) and or (|) operators, which in Boolean algebra play a role similar to that of products and additions in standard algebra:
Just as multiplications are resolved before sums by convention, and operations are resolved before or operations. As before, you can use parentheses to change the order in which operations are carried out:
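The following sketch, using values of our own choosing, illustrates comparisons, the & (and), | (or), and ! (not) operators, and the effect of parentheses on the order of evaluation.

```r
3 < 5      # TRUE
3 == 5     # FALSE
3 != 5     # TRUE ("not equal to")
3 >= 3     # TRUE

a = TRUE
b = FALSE
a & b      # and: TRUE only if both are TRUE, so FALSE
a | b      # or: TRUE if at least one is TRUE, so TRUE
!a         # not: FALSE

TRUE | FALSE & FALSE     # & is resolved first: TRUE | FALSE gives TRUE
(TRUE | FALSE) & FALSE   # parentheses first: TRUE & FALSE gives FALSE
```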
Comparison operations are also vectorized:
We can check whether a variable takes one of the values in a list of possibilities by combining multiple comparisons using or operators:
However, this approach can be impractical if the number of options is large. As an alternative, we can use the %in% operator.
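As a sketch, suppose a hypothetical variable roll holds the result of a die roll and we want to know whether it is even. The same example also shows that comparisons and %in% work element by element on vectors.

```r
x = c(1, 5, 2, 8, 3)
x > 2.5       # one comparison per element: FALSE TRUE FALSE TRUE TRUE
x == 2        # FALSE FALSE TRUE FALSE FALSE

roll = 4      # hypothetical result of a die roll
roll == 2 | roll == 4 | roll == 6   # TRUE
roll %in% c(2, 4, 6)                # same check, written more compactly: TRUE

x %in% c(2, 4, 6, 8)   # FALSE FALSE TRUE TRUE FALSE
```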
The functions any() and all() provide convenient ways to check whether at least one, or all, of the elements of a logical vector are TRUE.
When arithmetic functions are used with logical vectors, TRUE values are treated as 1s and FALSE values are treated as 0s.
Logical vectors provide another way to select entries of a vector. For example, if we are interested in the sub-vector of x that contains the entries that are greater than 2.5:
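A minimal sketch, assuming an illustrative vector x of our own choosing, that combines any(), all(), arithmetic on logical values, and logical subsetting:

```r
x = c(1, 5, 2, 8, 3)
big = x > 2.5    # FALSE TRUE FALSE TRUE TRUE

any(big)     # is at least one entry greater than 2.5? TRUE
all(big)     # are all entries greater than 2.5? FALSE
sum(big)     # number of entries greater than 2.5: 3
mean(big)    # proportion of entries greater than 2.5: 0.6

x[big]       # entries of x where big is TRUE: 5 8 3
x[x > 2.5]   # the same selection in a single step
```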
Characters in R are distinguished by the fact that they are enclosed in quotation marks (either single or double quote delimiters can be used, but double quotes are generally preferred). You can create character vectors and perform comparisons with them just like you did with numeric vectors.
Arithmetic operations are not defined for character vectors, even if they only contain numbers:
However, you can use the as.numeric() function to coerce characters that only contain numbers into numerical objects, for which regular algebraic operations are defined.
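The sketch below uses hypothetical character vectors (suits and chars) to illustrate comparisons and coercion. The failing addition is left as a comment so that the remaining commands run without interruption.

```r
suits = c("hearts", "spades", "clubs")
suits == "spades"        # FALSE TRUE FALSE
suits[suits != "clubs"]  # "hearts" "spades"

chars = c("5", "7")      # characters, even though they look like numbers
# chars[1] + chars[2]    # would stop with an error: arithmetic is not defined

as.numeric(chars)        # numeric vector 5 7
as.numeric(chars[1]) + as.numeric(chars[2])   # 12
```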
You can use R to easily create plots. For example, suppose that we want to plot a parabola over a given interval. To do so, we first need to compute the value of the function over a fine grid of values in the interval of interest. The function plot() can then be used to generate a new window that contains a Cartesian coordinate system and a series of dots that represent the coordinates of each point in the grid and the corresponding value of the function (see Figure A.3).
Figure A.3 An example of a scatterplot in R.
Figure A.3 uses dots to represent the function. However, in this case, it would be more convenient to connect the values using lines. This can be easily achieved using the type option. Similarly, you can change the labels of the axes using the xlab (for the x-axis label) and the ylab (for the y-axis label) options (see Figure A.4).
Figure A.4 An example of a line plot in R.
The plot() function admits a number of additional parameters that are helpful in fine-tuning graphs. Examples include col (which allows you to change the color of the lines/points) and lty (which allows you to use dashed and dotted lines). A full discussion of all options, however, is beyond the scope of this introduction.
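As a sketch of the plotting steps described above, the code below assumes the parabola y = x^2 on the interval from -2 to 2; both the function and the interval are our own illustrative choices and are not necessarily those behind Figures A.3 and A.4.

```r
# Fine grid of 101 equally spaced values in the interval of interest.
x = seq(from = -2, to = 2, length.out = 101)
y = x^2            # value of the function at each point of the grid

plot(x, y)         # scatterplot: one dot per grid point

# Same data drawn as a line, with axis labels, a color, and a dashed line type.
plot(x, y, type = "l", xlab = "x", ylab = "x squared", col = "blue", lty = 2)
```

Each call to plot() starts a new graph, so the second plot replaces the first one in the graphics window.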
When creating plots, it is usually a good idea to add reference lines that help focus attention on the features of the graph that are most relevant for the discussion at hand, or to place multiple plots on a single graph. The function abline() allows you to add straight reference lines to an existing plot that was previously created using the plot() function. Similarly, the functions lines() and points() can be used to add additional plots to an existing one. Figure A.5 was created using the following code.
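The sketch below shows how these functions can be combined. It is an illustration under our own assumptions (two parabolas, y = x^2 and y = 2 - x^2, which cross at x = -1 and x = 1), not necessarily the exact code behind Figure A.5.

```r
x = seq(from = -2, to = 2, length.out = 101)

# First curve; ylim leaves room for the second curve.
plot(x, x^2, type = "l", xlab = "x", ylab = "y", ylim = c(-2, 4))

lines(x, 2 - x^2, lty = 2, col = "red")   # second curve added to the same graph
points(c(-1, 1), c(1, 1), pch = 19)       # mark the two crossing points
abline(h = 1, lty = 3)                    # horizontal reference line at y = 1
abline(v = 0, lty = 3)                    # vertical reference line at x = 0
```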
Figure A.5 Adding multiple plots and reference lines to a single graph.
One last type of graph that will be useful as you move along the book is a bar graph. As the name suggests, in a bar graph, a list of numerical values is represented by the heights of rectangles of equal width. The function barplot() can be used to create a bar chart in R (see Figure A.6):
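A minimal sketch with made-up counts for four outcomes labeled A to D; the data and labels are hypothetical.

```r
counts = c(12, 7, 9, 15)   # hypothetical number of times each outcome occurred
barplot(counts,
        names.arg = c("A", "B", "C", "D"),   # label placed under each bar
        xlab = "Outcome",
        ylab = "Frequency")
```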
Figure A.6 Example of a barplot in R.
When the same operation needs to be repeated a large number of times, sequentially inputting the commands by hand is impractical. Vectorization sometimes offers a way to deal with these situations, but it is not always possible or practical. For example, when the outcome of one iteration depends on the results from previous ones, vectorization is usually not helpful. Loops provide a flexible alternative to deal with iterated operations.
To motivate loops, consider creating a matrix with 10 rows, each containing a sequence of 6 integers, all with the same starting value but different increments (increments of 4 for the first row, increments of 5 for the second, etc.). This can be achieved using the following code:
Note that the 2nd to the 11th instructions are structurally identical. They differ in only two respects: the index of the row increases and the by argument changes to reflect the desired increment in the sequence. for loops allow you to accomplish the same task without having to write a separate instruction for each row of the matrix. for loops, which allow you to repeat the same set of instructions a fixed number of times, have the following syntax:
for(counter in vector){
block of instructions to be repeated
}
The counter, which is defined within the parentheses that follow the for instruction, is a variable that sequentially takes the values contained in vector. Roughly speaking, this is the variable that determines how many times the operations are going to be repeated. The set of instructions to be repeated, once for every value in vector, is located within the curly brackets that follow the parentheses.
As an example, the following code uses a for loop to complete the task of filling out the rows of a matrix with different sequences of numbers:
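A sketch of such a loop is shown below. The common starting value of 1 and the names first_value and M are our own assumptions; the increments (4 for the first row, 5 for the second, and so on) follow the description above.

```r
first_value = 1                       # assumed common starting value
M = matrix(0, nrow = 10, ncol = 6)    # empty 10 x 6 matrix to be filled

for (i in 1:10) {
  # Row i holds 6 terms starting at first_value with increment i + 3.
  M[i, ] = seq(from = first_value, by = i + 3, length.out = 6)
}

M[1, ]     # 1 5 9 13 17 21
M[10, ]    # 1 14 27 40 53 66
```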
Iterations of a loop can depend on the result of previous iterations. For example, consider computing the first 20 terms of the Fibonacci sequence:
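A minimal sketch of this computation, assuming the convention that the sequence starts with the terms 1 and 1 (the names n_terms and fib are ours):

```r
n_terms = 20
fib = rep(0, n_terms)   # vector that will hold the sequence
fib[1] = 1              # assumed first term
fib[2] = 1              # assumed second term

for (k in 3:n_terms) {
  # Each term is the sum of the two terms that precede it.
  fib[k] = fib[k - 1] + fib[k - 2]
}

fib        # 1 1 2 3 5 8 13 21 34 55 ... 6765
```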
while loops are an alternative to for loops. Rather than being executed a fixed number of times, while loops are executed repeatedly for as long as a given condition is satisfied. The syntax for a while loop is
while(condition){
block of instructions to be repeated
}
The expression that replaces the placeholder condition must result in a single logical value (while loops are not vectorized). As before, the block of instructions that will be repeated while the condition is satisfied is placed between curly brackets. The condition associated with a while loop is checked before each iteration is executed. Hence, if the condition is not satisfied before the loop starts, the instructions inside are never executed.
As an example of the use of while loops, consider the problem of generating the first term of the Fibonacci sequence that is greater than 1000 (recall from our previous example that the value of such a term is 1597). Since we do not necessarily know in advance how many terms will need to be computed, we use a while loop that checks the value of the current term of the Fibonacci sequence and stops as soon as that term is greater than 1000.
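A sketch of this loop, again assuming that the sequence starts with 1 and 1; the variable names are hypothetical. Note that the loop keeps running while the current term is at most 1000.

```r
previous = 1   # assumed first term
current = 1    # assumed second term

while (current <= 1000) {
  following = previous + current   # next term of the sequence
  previous = current
  current = following
}

current        # first term greater than 1000: 1597
```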
You might sometimes find that different pieces of your code need to be executed depending on whether specific conditions are satisfied. For example, you might want to set the value of a variable differently depending on whether another variable is positive or negative. if/else statements allow you to accomplish this goal. The syntax for an if/else statement is
if(condition){
block of instructions if condition is TRUE
}else{
block of instructions if condition is FALSE
}
As with a while loop, the expression that replaces the placeholder condition must result in a single logical value. Depending on whether condition is TRUE or FALSE, only the top or the bottom block of instructions will be executed. If an else statement is not included, then no instructions are executed when condition is FALSE.
The following code shows an example of conditional execution:
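As a minimal sketch, suppose a hypothetical variable payoff holds the result of a bet, and we want to record whether it was a win:

```r
payoff = -3    # hypothetical result of a bet

if (payoff > 0) {
  outcome = "win"           # executed only when payoff is positive
} else {
  outcome = "loss or tie"   # executed otherwise
}

outcome        # "loss or tie"
```

When typing this at the prompt, keep else on the same line as the closing curly bracket of the first block; otherwise R treats the if statement as already complete.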
if/else statements can be particularly useful in conjunction with for and while loops. The function ifelse() is a vectorized version of if/else, but we will rarely use it in this book.
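To illustrate both points, the sketch below labels each entry of a hypothetical vector of payoffs in two ways: with an if/else statement inside a for loop, and with a single call to ifelse(). The two approaches should give the same result.

```r
payoffs = c(2, -1, 0, 5, -3)   # hypothetical payoffs

# Version 1: if/else inside a for loop, one entry at a time.
labels_loop = rep("", length(payoffs))
for (i in 1:length(payoffs)) {
  if (payoffs[i] > 0) {
    labels_loop[i] = "win"
  } else {
    labels_loop[i] = "no win"
  }
}

# Version 2: ifelse() handles the whole vector at once.
labels_vec = ifelse(payoffs > 0, "win", "no win")

labels_vec                          # "win" "no win" "no win" "win" "no win"
identical(labels_loop, labels_vec)  # TRUE
```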
Once you have finished with your work, you can save all of it by using the option Save Workspace File… in the Workspace menu. This opens a window where you can type a name for the workspace and select a folder where it will be stored. To load the workspace at a later time, you can either double-click on the workspace file or use the option Load Workspace File… in the same Workspace menu.
One of the key features of R is its extensibility. A number of authors have developed groups of specialized functions that are distributed in the form of “packages”. A large number of packages are available from the CRAN website. In this book, we employ the “prob” package developed by G. Jay Kerns at Youngstown State University. To install the package, you can use the Package Installer option of the Packages & Data menu. Alternatively, you can use the install.packages() function from the command line.
In either case, you will see a number of messages associated with the installation appear in the command window. In most circumstances, you can ignore these messages. Once the package has been installed, you will need to load it at the beginning of every R session by using the library() function:
Failing to load the package before using any of its functions is a common source of errors and confusion. Please do not forget to do so!