Appendix A

A Brief Introduction to R

R is a freely available interactive computing environment. At its most basic, you can think of R as a fancy calculator, and you could limit yourself to using it that way. However, R offers much richer functionality, from the ability to generate graphs to a flexible programming language that incorporates most standard mechanisms for flow control. Indeed, R is a very flexible language; there are often many different ways to accomplish any given task.

Although a number of graphic user interfaces for R exist, we will be using the standard distribution, which has a command-based interface. This means that you need to communicate with the software by typing instructions into a command window. For the purpose of this book, we believe that such a stripped-down interface actually makes it easier for readers to get started. The R language is quite intuitive, so we hope that its deployment in this book will not prove an insurmountable obstacle even for readers with no programming experience.

This appendix provides readers with an introduction to the R language that covers the background needed to understand, and eventually extend, the simulations we have included in each of the book's chapters. Unless you have dabbled in R before, we strongly recommend that you familiarize yourself with this appendix. If you are interested in learning more about the R environment, there are a variety of books online aimed at students of all backgrounds.

A.1 Installing R

You can obtain R distribution bundles for Windows, Mac OS, and Linux at the CRAN website, https://cran.r-project.org. Just click on the link that corresponds to the appropriate operating system and follow the instructions. Once R has been installed, you can start it by clicking on its icon. On a Microsoft Windows or Mac OS X computer, you should then see an interactive command console that looks like Figure A.1.


Figure A.1 The R interactive command console on a Mac OS X computer. The symbol > is a prompt for users to provide instructions; these will be executed immediately after the user presses the RETURN key.

A.2 Simple Arithmetic

The last line in the command window is a prompt line that starts with the symbol >. This indicates that R is waiting for instructions. At this point, we could ask it, for example, to give us the sum of 5 and 7 by typing 5 + 7 in the prompt and then pressing the RETURN key. This is what your R console should show if you go ahead and do it:

b01uf001

Note that R provides the expected answer (the number 12) in the next line. (For now, ignore the [1] symbol at the beginning of the line; we will explain what it means later.) Similarly, you can perform many other arithmetic operations:

b01uf002

The text that appears after the # symbol is a comment. We have added comments to this code to explain what the different commands do. However, you do not need to type them yourself or worry about them: any text between # and the next RETURN is ignored by R. Furthermore, as the last command illustrates, incomplete expressions (which could happen, e.g., when you press RETURN too early by mistake) are continued on the next line. Continuation lines start with the + prompt instead of the usual >.
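As a rough illustration of comments and continuation lines, the following sketch shows a handful of arithmetic commands. The numbers are chosen purely for illustration, and the prompts (> and +) are omitted so that the lines can be pasted directly into the console.

```r
# Anything after the # symbol is ignored by R
5 + 7      # addition
9 - 4      # subtraction
3 * 6      # multiplication
20 / 8     # division
2^10       # exponentiation
sqrt(16)   # square root

# An incomplete expression is continued on the next line;
# at the console, the second line would start with the + prompt
5 *
  3
```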

All your standard functions (including trigonometric and exponential functions) are implemented in R. In addition to your regular arithmetic operations, you can also perform “integer” operations:

b01uf003
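A minimal sketch of the two "integer" operators, using illustrative values of our own choosing: %/% gives the integer part of a division, and %% gives the remainder.

```r
17 %/% 5   # integer division: 5 fits 3 whole times into 17
17 %% 5    # modulo: the remainder of that division, 2

# The two results are linked: (17 %/% 5) * 5 + (17 %% 5) recovers 17
(17 %/% 5) * 5 + (17 %% 5)
```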

You can get help with these functions and operators (as well as with any other R functionality) using the help() function, which opens a separate window with the relevant information. For example, if you type help("%/%"), then a window that contains detailed help on how to carry out arithmetic operations will pop up.

The standard precedence of operations applies in R: exponentiation is resolved first, then multiplications and divisions, and finally additions and subtractions. However, you can use parentheses to change the order in which operations are carried out:

b01uf004
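The effect of precedence and parentheses can be seen in a short sketch such as the following, where the expressions are our own illustrative choices:

```r
2 + 3 * 4^2        # 4^2 first, then times 3, then plus 2: 50
(2 + 3) * 4^2      # the sum is now computed before the product: 80
((2 + 3) * 4)^2    # the whole product is computed before the power: 400
```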

R can treat ∞ as a number, which is represented by the symbol Inf. Similarly, undefined operations, such as dividing 0 by 0, return the NaN (“Not a Number”) symbol:

b01uf005
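A few illustrative expressions (our own choices) that produce Inf and NaN:

```r
1 / 0        # Inf
-1 / 0       # -Inf
1 / Inf      # 0
0 / 0        # NaN
Inf - Inf    # NaN

# is.nan() checks whether a value is NaN
is.nan(0 / 0)
```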

A.3 Variables

You can store values in named variables that can later be used in expressions just like regular numbers. For example,

b01uf006

As you might guess, variable names cannot consist of only numbers (in fact, they cannot start with a digit). You should also avoid the names of existing functions or operators.

To check the current value of an object, you can simply type its name at the prompt.

b01uf007

Once an object has been created, it remains in memory until you remove it (or close your current R session), so you can reuse it multiple times. You can list all objects in memory by using the function ls() and remove an object from memory by using the function rm().

b01uf008
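The life cycle of a variable can be sketched as follows. The names x and y and their values are hypothetical, chosen only for illustration.

```r
x = 3          # store the value 3 under the name x
y = x^2 + 1    # variables can be used just like regular numbers
y              # typing a name prints its current value (10)

ls()           # lists the objects currently in memory
rm(x)          # removes x from memory; y is not affected
ls()           # x no longer appears in the list
```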

Some variables containing widely used constants (such as π) are already predefined:

b01uf009

A.4 Vectors

A vector is just a list of values that share a common name but can be accessed independently of each other. You can think of a vector as a big box divided into many compartments organized sequentially, with each compartment containing a different value. You can either move the whole box around or, if needed, access the individual compartments (see Figure A.2). You can create arbitrary vectors using the c() function.

b01uf010

Figure A.2 A representation of a vector x of length 6 as a series of containers, each one of them corresponding to a different number.

Generally speaking, creating vectors by using the c() function is tedious. When the vectors follow regular patterns, you can use the rep() and seq() commands to simplify the process.

b01uf011
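As a sketch of how c(), rep(), and seq() can be used (the particular values are our own illustrative choices):

```r
x = c(2, 7, 1, 8)                # an arbitrary vector typed by hand

rep(0, times = 5)                # 0 0 0 0 0
rep(c(1, 2), times = 3)          # 1 2 1 2 1 2
seq(from = 1, to = 9, by = 2)    # 1 3 5 7 9
1:50                             # shorthand for the integers 1 to 50
```

Printing a long vector such as 1:50 usually wraps over several lines on the screen, each one starting with the index of its first element in square brackets.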

The meaning of the strings [1] and [36] in the first and second lines should now be clear: they tell you what is the index of the first element that appears in each line. This is meant to make it easier for you to read a vector off the screen.

You can access individual elements of a vector by using the subsetting operator [], and you can find the length of a vector by using the length() function.

b01uf012

You can also use the [] operator to create a subvector that only contains some of the entries in the original vector. Note that negative indexes remove entries.

b01uf013
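A short sketch of subsetting with the [] operator, using a hypothetical vector x of length 6:

```r
x = c(10, 20, 30, 40, 50, 60)

length(x)        # 6
x[2]             # the second entry: 20
x[c(1, 3, 5)]    # entries 1, 3, and 5: 10 30 50
x[2:4]           # entries 2 through 4: 20 30 40
x[-1]            # everything except the first entry
x[-c(1, 2)]      # everything except the first two entries
```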

In many ways, vectors can be manipulated as if they were scalar variables. For example, you can add or multiply two vectors of the same length. If you do, operations are carried out elementwise, that is, the result is another vector of the same length whose first element is the sum/product of the first elements in each of the two original vectors and so on:

b01uf014

If the lengths of the two vectors are not the same, R “recycles” the entries of the shorter vector until the sizes match. This produces a warning only when the length of the longer vector is not a multiple of the length of the shorter one. Our recommendation is that you avoid recycling until you have gained substantial experience with R.

b01uf015
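Elementwise operations and recycling can be sketched as follows, with hypothetical vectors u, v, and w:

```r
u = c(1, 2, 3, 4)
v = c(10, 20, 30, 40)

u + v            # 11 22 33 44
u * v            # 10 40 90 160

# Recycling: w has length 2, so it is reused as 100 200 100 200.
# No warning is issued because 4 is a multiple of 2.
w = c(100, 200)
u + w            # 101 202 103 204

# Adding u to a vector of length 3 would also run, but with a warning,
# because 4 is not a multiple of 3.
```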

Many functions in R are vectorized, that is, if a vector is passed as the argument, then the function is applied individually to each element. This helps make the R code easier to read.

b01uf016

Another example of a vectorized function is cumsum(), which provides cumulative sums of the elements of a vector. This is particularly useful if the entries of the original vector represent the payoffs of a repeated bet, in which case the cumulative sum represents the running profit/loss that the player has incurred.

b01uf017
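As a sketch of the betting interpretation, suppose (hypothetically) that a player wins or loses one dollar in each of seven rounds:

```r
# Payoff of each round: +1 for a win, -1 for a loss (illustrative values)
payoffs = c(1, -1, -1, 1, 1, 1, -1)

# Running profit/loss after each round
running = cumsum(payoffs)
running          # 1 0 -1 0 1 2 1
```

The last entry of running is the player's total profit, which is the same as sum(payoffs).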

Some functions are not vectorized, but are instead designed to operate on all elements of the vector simultaneously. For example, the functions sum(), mean(), max(), and min() give you the sum, the average, the maximum, and the minimum of all the entries of a vector:

b01uf018

A.5 Matrices

Matrices are similar to vectors, but instead of storing elements sequentially they do so in a rectangular array. Hence, each entry of a matrix is indexed by two numbers: the first corresponds to the row in which the entry is located, and the second corresponds to the column. Furthermore, each row or column of a matrix is simply a vector.

You can create a matrix by starting with a long vector and then using its elements to fill the matrix sequentially by either row or column.

b01uf019

Note that the strings [1,], [2,], and [3,] at the beginning of each line serve to identify the rows of the matrix, while the strings [,1] and [,2] identify the columns. As this suggests, the elements of the matrix can be accessed using the [] operator with two indexes separated by a comma. If you want to access a whole row or a whole column of the matrix, leave the index empty (the result will be treated as a vector).

b01uf020
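A minimal sketch of creating a matrix and accessing its entries. The matrices A and B, and the use of 1:6 as the filling vector, are illustrative assumptions.

```r
# By default, matrix() fills the array column by column
A = matrix(1:6, nrow = 3, ncol = 2)

# byrow = TRUE fills it row by row instead
B = matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)

A[2, 1]    # row 2, column 1 of A: 2
B[2, 1]    # row 2, column 1 of B: 3
A[3, ]     # the whole third row of A, as a vector: 3 6
A[, 2]     # the whole second column of A, as a vector: 4 5 6
```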

Sometimes, it is useful to compute rowwise or columnwise sums of the elements of a matrix. The functions rowSums() and colSums() allow you to do exactly that:

b01uf021

More general functions can be used on each row or column of the array through the apply() function.

b01uf022
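As a sketch, using a small hypothetical matrix A, the second argument of apply() indicates whether the function is applied to each row (1) or to each column (2):

```r
A = matrix(1:6, nrow = 3, ncol = 2)   # columns are 1 2 3 and 4 5 6

rowSums(A)          # 5 7 9
colSums(A)          # 6 15

apply(A, 1, max)    # maximum of each row: 4 5 6
apply(A, 2, max)    # maximum of each column: 3 6
```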

A.6 Logical Objects and Operations

So far, we have only discussed variables that contain real numbers. However, R allows for variables that contain other types of objects. One example corresponds to logical variables, which take only two values (TRUE and FALSE) and are the centerpiece of Boolean algebra.

Logical values are often the result of comparisons between other types of objects:

b01uf023

Note that, while = is the assignment operator used to assign values to variables, == is the “equal to” operator used in comparisons:

b01uf024

You can combine results from various comparisons using the “and” (&) and “or” (|) operators, which in Boolean algebra play a role similar to that of products and sums in standard algebra:

b01uf025

Just as multiplications are resolved before sums by convention, “and” operations are resolved before “or” operations. As before, you can use parentheses to change the order in which operations are carried out:

b01uf026
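The following sketch illustrates comparisons and their combinations. It assumes the elementwise operators & (and) and | (or), and the value of x is purely illustrative.

```r
x = 4

x > 3                     # TRUE
x == 5                    # FALSE
(x > 3) & (x < 10)        # TRUE: both comparisons hold
(x < 3) | (x > 10)        # FALSE: neither comparison holds

# "and" is resolved before "or" unless parentheses say otherwise
TRUE | FALSE & FALSE      # TRUE, read as TRUE | (FALSE & FALSE)
(TRUE | FALSE) & FALSE    # FALSE
```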

Comparison operations are also vectorized:

b01uf027

We can check whether a variable takes one of the values in a list of possibilities by combining multiple comparisons using “or” operators:

b01uf028

However, this approach can be impractical if the number of options is large. As an alternative, we can use the %in% operator.

b01uf029
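A brief sketch comparing the two approaches, with a hypothetical variable x and list of options:

```r
x = 3

# Chaining comparisons with "or" operators
(x == 1) | (x == 3) | (x == 5)    # TRUE

# The same check written with %in%
x %in% c(1, 3, 5)                 # TRUE

# %in% is vectorized over its left-hand side
c(2, 3) %in% c(1, 3, 5)           # FALSE TRUE
```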

The functions any() and all() provide convenient ways to check if at least one or if all the elements of the vector are true.

b01uf030

When arithmetic functions are used with logical vectors, TRUE values are treated as 1s and FALSE values are treated as 0s.

b01uf031

Logical vectors provide another way to select entries of a vector. For example, we can extract the sub-vector of x that contains the entries that are greater than 2.5:

b01uf032
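The ideas in the last few paragraphs can be sketched together as follows. The vector x is hypothetical, and big is a helper name introduced only for this illustration.

```r
x = c(1.2, 3.4, 2.5, 4.1, 0.7)

big = x > 2.5    # FALSE TRUE FALSE TRUE FALSE

any(big)         # TRUE: at least one entry exceeds 2.5
all(big)         # FALSE: not every entry exceeds 2.5
sum(big)         # 2: TRUE counts as 1 and FALSE as 0
mean(big)        # proportion of entries that exceed 2.5

x[big]           # the entries of x greater than 2.5: 3.4 4.1
```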

A.7 Character Objects

Characters in R are distinguished by the fact that they are enclosed in quotation marks (either single or double quotes can be used as delimiters, but double quotes are generally preferred). You can create character vectors and perform comparisons with them just like you did with numeric vectors.

b01uf033

Arithmetic operations are not defined for character vectors, even if they only contain numbers:

b01uf034

However, you can use the as.numeric() function to coerce characters that only contain numbers into numerical objects, for which regular algebraic operations are defined.

b01uf035
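A minimal sketch of this coercion, with a hypothetical character vector s:

```r
s = c("1", "2.5", "10")    # characters that happen to look like numbers

# s + 1 would stop with an error, because s is not numeric

n = as.numeric(s)          # convert to a numeric vector
n + 1                      # 2 3.5 11
```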

A.8 Plots

You can use R to easily create plots. For example, suppose that we want to plot the parabola c0A-math-005 in the interval c0A-math-006. To do so, we need to first compute the value of c0A-math-007 over a fine grid of values in the interval of interest. The function plot() then can be used to generate a new window that contains a Cartesian coordinate system and a series of dots that represent the coordinates of each point in the grid and the corresponding value of c0A-math-008 (see Figure A.3).

b01uf036

Figure A.3 An example of a scatterplot in R.

Figure A.3 uses dots to represent the function. However, in this case, it would be more convenient to connect the values using lines. This can be easily achieved using the type option. Similarly, you can change the labels of the axes using the xlab (for the x-axis label) and the ylab (for the y-axis label) options (see Figure A.4).

b01uf037

Figure A.4 An example of a line plot in R.

The plot() function admits a number of additional parameters that are helpful in fine-tuning graphs. Examples include col (which allows you to change the color of the lines/points) and lty (which allows you to use dashed and dotted lines). A full discussion of all options, however, is beyond the scope of this introduction.

When creating plots, it is usually a good idea to add reference lines that help focus attention on the features of the graph that are most relevant for the discussion at hand, or to place multiple plots on a single graph. The function abline() allows you to add straight reference lines to an existing plot that was previously created using the plot() function. Similarly, the functions lines() and points() can be used to add further curves or points to an existing plot. Figure A.5 was created using the following code.

b01uf038

Figure A.5 Adding multiple plots and reference lines to a single graph.
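The plotting functions discussed in this section can be combined as in the sketch below. It is not meant to reproduce Figure A.5: the curves (x^2 and x^2/2), the interval from -2 to 2, and the colors and line types are all illustrative assumptions.

```r
# Evaluate an assumed parabola over a fine grid of values
x = seq(-2, 2, length.out = 81)
y = x^2

# Line plot with custom axis labels
plot(x, y, type = "l", xlab = "x", ylab = "y")

# Add a second curve and a couple of individual points to the same graph
lines(x, x^2 / 2, col = "red", lty = 2)
points(c(-1, 1), c(1, 1), col = "blue")

# Add horizontal and vertical reference lines
abline(h = 1, lty = 3)
abline(v = 0, lty = 3)
```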

One last type of graph that will be useful as you move along the book is the bar graph. As the name suggests, in a bar graph, a list of numerical values is represented by the heights of rectangles of equal width. The function barplot() can be used to create a bar graph in R (see Figure A.6):

b01uf039

Figure A.6 Example of a barplot in R.

A.9 Iterators

When the same operation needs to be repeated a large number of times, sequentially inputting the commands by hand is impractical. Vectorization sometimes offers a way to deal with these situations, but it is not always possible or practical. For example, when the outcome of one iteration depends on the results from previous ones, vectorization is usually not helpful. Loops provide a flexible alternative to deal with iterated operations.

To motivate loops, consider creating a matrix with 10 rows, each corresponding to sequences of 6 integers, all with the same starting value but different increments (increments of 4 for the first row, increments of 5 for the second, etc.). This can be achieved using the following code:

b01uf040

Note that the 2nd to the 11th instructions are structurally identical. They differ in only two features: the index of the row increases, and the by argument changes to reflect the desired increment in the sequence. A for loop allows you to accomplish the same task without having to write one separate instruction for each row of the matrix. for loops, which allow you to repeat the same set of instructions a fixed number of times, have the following syntax:

   for(counter in vector){
     block of instructions to be repeated
   }

The counter, which is defined within the parentheses that follow the for instruction, is a variable that sequentially takes the values contained in vector. Roughly speaking, this is the variable that tells you how many times the operations are going to be repeated. The set of instructions to be repeated, once for every value in vector, is located within the curly brackets that follow the parentheses.

As an example, the following code uses a for loop to complete the task of filling out the rows of a matrix with different sequences of numbers:

b01uf041
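A sketch of how such a loop might look is given below. The common starting value of 1 and the name M are assumptions made for illustration; the increments (4 for the first row, 5 for the second, and so on) follow the description above.

```r
# Empty 10 x 6 matrix to be filled row by row
M = matrix(0, nrow = 10, ncol = 6)

for(i in 1:10){
  # Row i holds 6 terms starting at 1 (assumed) with increment i + 3
  M[i, ] = seq(from = 1, by = i + 3, length.out = 6)
}

M[1, ]    # 1 5 9 13 17 21
M[2, ]    # 1 6 11 16 21 26
```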

Iterations of a loop can depend on the result of previous iterations. For example, consider computing the first 20 terms of the Fibonacci sequence:

b01uf042
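One way to write this computation is sketched below. It assumes the convention that the sequence starts with 1, 1, and the name fib is our own.

```r
fib = rep(0, 20)    # storage for the first 20 terms
fib[1] = 1          # assumed starting values
fib[2] = 1

for(i in 3:20){
  # Each term is the sum of the two previous ones
  fib[i] = fib[i - 1] + fib[i - 2]
}

fib[1:8]    # 1 1 2 3 5 8 13 21
```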

while loops are an alternative to for loops. Rather than being executed a fixed number of times, a while loop keeps executing for as long as a given condition is satisfied, and it stops the first time the condition is found to be false. The syntax for a while loop is

   while(condition){
     block of instructions to be repeated
   }

The expression that replaces the placeholder condition must result in a single logical value (while loops are not vectorized). As before, the block of instructions, which is repeated for as long as the condition remains TRUE, is placed between curly brackets. The condition associated with a while loop is checked before each iteration is executed. Hence, if the condition is not satisfied before the loop starts, the instructions inside are never executed.

As an example of the use of while loops, consider the problem of generating the first term of the Fibonacci sequence that is greater than 1000 (recall from our previous example that the value of such a term is 1597). Since we do not necessarily know in advance how many terms will need to be computed, we use a while loop that checks on the value of the Fibonacci sequence after each iteration and terminates if the current term is greater than 1000.

b01uf043
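A sketch of such a loop follows. The variable names are our own, and the sequence is again assumed to start with 1, 1. Note that the loop condition is the opposite of the stopping rule: the loop keeps running while the current term is at most 1000.

```r
previous = 1    # assumed starting values of the sequence
current = 1

while(current <= 1000){
  new_term = previous + current   # next Fibonacci term
  previous = current
  current = new_term
}

current    # 1597, the first term greater than 1000
```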

A.10 Selection and Forking

You might sometimes find that different pieces of your code need to be executed depending on whether specific conditions are satisfied. For example, you might want to set the value of a variable differently depending on whether another variable is positive or negative. if/else statements allow you to accomplish this goal. The syntax for an if/else statement is

   if(condition){
     block of instructions if condition is TRUE
   }else{
     block of instructions if condition is FALSE
   }

As with a while loop, the expression that replaces the placeholder condition must result in a single logical value. Depending on whether condition is TRUE or FALSE, only the top (or bottom) block of instructions will be executed. If an else statement is not included, then no instructions are executed when condition is FALSE.

The following code shows an example of conditional execution:

b01uf044
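As a sketch of conditional execution, the following code sets a label depending on the sign of x. The names x, label, and labels are hypothetical and used only for illustration.

```r
x = -3

if(x > 0){
  label = "positive"
}else{
  label = "not positive"
}
label    # "not positive"

# ifelse() applies the same kind of rule to every element of a vector
labels = ifelse(c(-3, 2, 5) > 0, "positive", "not positive")
labels   # "not positive" "positive" "positive"
```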

if/else statements can be particularly useful in conjunction with for and while loops. The function ifelse() is a vectorized version of the if/else, but we will rarely use it in this book.

A.11 Other Things to Keep in Mind

Once you have finished with your work, you can save all of it by using the option Save Workspace File… in the Workspace menu. This will prompt a window where you can type a name for the workspace and select a folder where it will be stored. To load the workspace at a later time, you can either double click on the workspace file or use the option Load Workspace File… in the same Workspace menu.

One of the key features of R is its extensibility. A number of authors have developed groups of specialized functions that are distributed in the form of “packages”. A large number of packages are available from the CRAN website. In this book, we employ the “prob” package developed by G. Jay Kerns at Youngstown State University. To install the package, you can use the Package Installer option of the Packages & Data menu. Alternatively, you can use the install.packages() function from the command line.

b01uf045

In either case, you will see a number of messages associated with the installation appear in the command window. In most circumstances, you can ignore these messages. Once the package has been installed, you will need to load it at the beginning of every R session by using the library() function:

b01uf046

Failing to load the package before using any of its functions is a common source of errors and confusion. Please do not forget to do so!