Armed with a suitable model of uncertainty, possibly one of those described in Chapter 3, whose parameters have been estimated as illustrated in Chapter 4, we are ready to run a set of Monte Carlo experiments. To do so, we must feed our simulation program with a stream of random variates mimicking uncertainty. Actually, the job has already been done, at least for R users trusting the rich library of random generators that is available. So, why should we bother with the internal workings of these generators and learn how random variates are generated? Indeed, we will not dig too deeply into sophisticated algorithms, and we will refrain from treating generators for too wide a class of probability distributions. Still, there are good reasons to get a glimpse of the underpinnings of random variate generation. The aims of this chapter are:
The overall structure of a Monte Carlo simulation is described in Section 5.1. Essentially, a Monte Carlo simulator is a function mapping a stream of (pseudo)-random numbers, i.e., a sequence of i.i.d. variables with uniform distribution on the unit interval (0, 1), into a set of estimates of interest. The generation of random numbers is described in Section 5.2. Then, random numbers are transformed in order to create a sample from another, more interesting, probability distribution. We describe a few standard techniques for random variate generation, like the inverse transform method (Section 5.3) and the acceptance–rejection method (Section 5.4). For certain specific distributions, it may be preferable to apply ad hoc methods. The ubiquitous normal distribution is dealt with in Section 5.5, where we cover both the univariate and the multivariate case; a few more ad hoc methods are outlined in Section 5.6. Finally, in Section 5.7 we discuss how to generate jointly distributed variables whose dependence structure is described by a copula. We defer to Chapter 6 the task of generating sample paths of stochastic processes, typically described by stochastic differential equations, since they involve additional issues about the discretization of continuous-time models.
A note on terminology. As we have already hinted at, there is nothing random in a Monte Carlo simulation carried out in R, as it typically involves the generation of pseudorandom numbers, rather than genuinely random ones. We will occasionally stress this point, but, more often than not, we will drop the “pseudo” for the sake of brevity. Indeed, in both output analysis and variance reduction methods, we behave as if we were truly dealing with random variates. We will use the term random variate in general but random number to refer to uniform variates only, as they are actually the primary input of a Monte Carlo simulation. Later, we will mostly avoid the term quasi–random numbers when referring to the deterministic low-discrepancy sequences that are the subject of Chapter 9, as this name is misleading. Of course, pseudorandom numbers are by necessity deterministic as well, but low-discrepancy sequences are deterministic by design.
A typical Monte Carlo simulation can be represented as the sequence of building blocks depicted in Fig. 5.1.
FIGURE 5.1 Structure of a Monte Carlo simulation.
Now we may grasp the link between multidimensional integration and Monte Carlo simulation: The structure of Fig. 5.1 points out quite clearly that a typical Monte Carlo simulation is, at least conceptually, a function mapping a vector of n uniform random numbers into a numeric output. More formally, it is a function g : (0, 1)^n → ℝ, where the domain is the n-fold Cartesian product of the unit interval,
i.e., the unit hypercube. When estimating an expected value, we are calculating the following integral:
E[g(U1, …, Un)] = ∫(0,1)^n g(u1, u2, …, un) du1 du2 ⋯ dun.    (5.1)
Actually, this view is valid whenever we know exactly the length of the input required to run one Monte Carlo replication, i.e., how many random numbers are needed to get one observation of the output random variable of interest. If so, we know the dimension of the space on which we are integrating. In other cases, as we shall see, we may not really know n. Then, the interpretation as an integral is still valid, but we cannot apply the methods of Chapter 9, which are based on deterministic low-discrepancy sequences of numbers replacing the stream of random numbers. This approach requires an integration over a domain of given dimension, and this in turn enforces a few requirements on how the block sequence of Fig. 5.1 is implemented in detail.
A pseudorandom number generator is a tool to generate a stream of numbers on the unit interval. To be more precise, some generators produce numbers in the [0, 1) interval, i.e., 0 is a possible output, but 1 is not. For some purposes, the value 0 can be troublesome; this happens, e.g., if we have to transform the stream of random numbers by taking a logarithm. In such cases, numbers in the open interval (0, 1) are preferred. In both cases, we will use the notation U ~ U(0, 1) to refer to the uniform distribution on the unit interval.
There is a rich literature on the generation of random numbers, which is often technically challenging. Fortunately, the average user need not be an expert, but some background knowledge is useful in order to use the available generators properly. In Section 5.2.1 we describe the basic textbook approach, i.e., linear congruential generators, and we point out some of their limitations in Section 5.2.2. Then, we outline a fairly general structure for random number generators in Section 5.2.3, and finally, in Section 5.2.4, we illustrate the principles that are the basis of some generators available in R.
The standard textbook method to generate U(0, 1) random numbers is the linear congruential generator (LCG). This approach was actually implemented in many generators included in commercial software, and it is still possible to choose an LCG. However, LCGs are not considered state-of-the-art anymore and should not be used for high-quality Monte Carlo experiments, especially if these are rather extensive and require a huge stream of input numbers. Nevertheless, LCGs are most useful as an introduction to the structure of more sophisticated generators and illustrate well the pitfalls of pseudorandom number generation.
An LCG actually generates a sequence of non-negative integer numbers Zi, which are transformed into uniform random numbers. The sequence of integer numbers starts from a user-selected seed Z0; then, given the previous integer number Zi−1, the next number in the sequence is generated as follows:
Zi = (a Zi−1 + c) mod m,    (5.2)
where a (the multiplier), c (the shift), and m (the modulus) are properly chosen integer numbers and “mod” denotes the remainder of integer division (e.g., 15 mod 6 = 3). Finally, the integer number Zi is transformed into a uniform number
Ui = Zi / m.    (5.3)
Since Zi ranges in the set {0, 1, 2, 3, …, m − 1}, Ui ranges in the unit interval [0, 1). Note that 0 is a possible output, but 1 is not. In Fig. 5.2 we display a function implementing an LCG. The function, given a choice of the parameters a, c, and m, as well as a seed, yields a list consisting of two vectors of length N, where USeq contains the uniform random numbers and ZSeq contains the integer numbers.
FIGURE 5.2 Function to generate random numbers by a linear congruential generator.
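The listing of Fig. 5.2 is not reproduced here; the following is a minimal sketch consistent with the interface described in the text (function name LCG, output vectors USeq and ZSeq). The small parameter values in the usage lines are hypothetical.

# linear congruential generator: returns integer states ZSeq and uniforms USeq = ZSeq/m
LCG <- function(a, c, m, seed, N) {
  ZSeq <- numeric(N)
  USeq <- numeric(N)
  Z <- seed
  for (i in 1:N) {
    Z <- (a * Z + c) %% m    # Eq. (5.2)
    ZSeq[i] <- Z
    USeq[i] <- Z / m         # Eq. (5.3)
  }
  list(USeq = USeq, ZSeq = ZSeq)
}
# usage with hypothetical small parameters
out <- LCG(a = 5, c = 3, m = 16, seed = 7, N = 20)
cbind(out$ZSeq, out$USeq)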
In order to check the function LCG and learn a couple of fundamental points, we may run the script of Fig. 5.3 and get the following output:
FIGURE 5.3 Script to check the linear congruential generator.
 1   6  0.3750
 2   1  0.0625
 3   8  0.5000
 4  11  0.6875
 5  10  0.6250
 6   5  0.3125
 7  12  0.7500
 8  15  0.9375
 9  14  0.8750
10   9  0.5625
11   0  0.0000
12   3  0.1875
13   2  0.1250
14  13  0.8125
15   4  0.2500
16   7  0.4375
17   6  0.3750
18   1  0.0625
19   8  0.5000
20  11  0.6875
On the basis of this simple run, we may observe what follows:
> set.seed(55555)
> runif(5)
[1] 0.9598632 0.6586607 0.3076999 0.9489697 0.1188039
> runif(5)
[1] 0.3415344 0.0415116 0.4067968 0.1073957 0.6403764
> set.seed(55555)
> runif(5)
[1] 0.9598632 0.6586607 0.3076999 0.9489697 0.1188039
The last observation has been one of the reasons behind the development of alternative generators. Nevertheless, a proper choice of a and c ensures that the period is maximized and that the sequence looks random enough. For instance, a generator proposed in [4] features the following parameters:
This LCG has zero shift, which is a rather common choice, as c does not provide too many advantages. Hence, we speak of a multiplicative generator; it is also clear that in such a case we must steer away from zero, as the LCG would get stuck there. But what is “random enough” exactly? There are quite sophisticated testing procedures to check random number generators, which are beyond the scope of this book. Anyway, we point out a few important issues in the next section.
The first requirement for a good uniform random number generator is that it is, rather unsurprisingly, uniform. A quick check of uniformity can be run by evaluating mean and variance of sample sequences. Since we know that, for a random variable U ~ U(0, 1), we have
E[U] = 1/2,  Var(U) = 1/12,
a comparison is easy to make. A less crude test would check whether, by dividing the unit interval into equally spaced bins, roughly the same number of observations falls into each one of them. In other words, we may draw a histogram or run something like a goodness-of-fit test. However, this is not quite enough to tell a good random number generator from a bad one. To see the point, consider a sequence like
1/m, 2/m, 3/m, …, (m − 1)/m, 0, 1/m, 2/m, …,
which is obtained by an LCG with a = c = 1. This is certainly uniform and has a maximum period. However, we want i.i.d. variables, where now we stress independence. From this point of view, the choice a = 1 is a very poor one in general, as we get a sequence like
Ui, Ui + c/m, Ui + 2c/m, Ui + 3c/m, …  (mod 1),
featuring long increasing subsequences.
Given the structure of LCGs, there is no hope of getting a genuinely independent stream, of course; yet, a good LCG should be able to trick statistical testing procedures into “believing” that it produces a sequence of independent observations from the uniform distribution. Without considering sophisticated tests, we can investigate how short sequences of random numbers are distributed in bidimensional or tridimensional spaces. More precisely, imagine that we draw points of the form
(Ui, Ui+1)
on the plane. These points, if the sequence is independent, should be uniformly distributed on the unit square. By the same token, points of the form
(Ui, Ui+1, Ui+2)
should be uniformly distributed on the unit cube. As an illustration, let us try the script in Fig. 5.4.1 The first part of the script uses an LCG with a = 65, c = 1, and m = 2048. We generate the whole sequence of 2048 points, corresponding to the maximum period, and plot points of the form (Ui, Ui+1) in the upper half of Fig. 5.5. If we look at the plot, we observe a fairly good filling of the unit square, which may be interpreted as a sign of good quality. However, if we only draw the first quarter of the sequence, as we do next, a different pattern emerges. The bottom half of Fig. 5.5 shows that the initial portion of the sequence does not cover the unit square uniformly, pointing out a deficiency of the generator.
FIGURE 5.4 Script to illustrate the lattice structure of LCGs.
FIGURE 5.5 Plots obtained by running the script of Fig. 5.4.
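The script of Fig. 5.4 is not listed here; a sketch of its first part, assuming it builds on an LCG function like the one sketched above (the seed value 0 is our assumption), could be:

# plot pairs (U_i, U_{i+1}) for an LCG with a = 65, c = 1, m = 2048
out <- LCG(a = 65, c = 1, m = 2048, seed = 0, N = 2048)
U <- out$USeq
par(mfrow = c(2, 1))
plot(U[1:2047], U[2:2048], pch = 20, cex = 0.5,
     xlab = "U(i)", ylab = "U(i+1)")     # whole sequence: fairly good filling
plot(U[1:511], U[2:512], pch = 20, cex = 0.5,
     xlab = "U(i)", ylab = "U(i+1)")     # first quarter only: poor coverage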
The second part of the script, where we set a = 1365, yields much worse results, however. In Fig. 5.6 we observe that points are arranged on parallel lines. This kind of pattern is known as lattice structure and is typical of LCGs. Actually, this also happens in Fig. 5.5. In general, similar structures can be observed in three dimensions when considering points with coordinates of the form (Ui, Ui+1, Ui+2). The problem is more evident here, as very few lines with large gaps are observed.
FIGURE 5.6 Lattice structure in LCGs.
Let us try to understand where the problem comes from, with reference to Fig. 5.6.2 A quick look at the figure suggests that points (Ui, Ui+1) are arranged on lines of the form
3Ui+1 + Ui = αj,    (5.4)
for just a few values of the intercept αj. In order to check this, let us investigate what happens in terms of the corresponding integer numbers Zi and Zi+1, by noting that
Zi+1 = (a Zi + c) mod m
implies
Zi+1 = a Zi + c − km
for some integer k. Hence, with the choice a = 1365, c = 1, and m = 2048, we find
3Zi+1 + Zi = (3 × 1365 + 1) Zi + 3 − 3 × 2048 k.
Now, if we divide by m = 2048 in order to get back to the uniform numbers, we notice that 3 × 1365 + 1 = 4096, an integer multiple of the modulus. Indeed, this is what makes this case peculiar: the right-hand side of the previous equation, when divided by 2048, turns out to be the intercept αj:
3Ui+1 + Ui = 2Zi − 3k + 3/2048 = βj + 3/2048 = αj,
where βj is integer. Therefore, we find just a few lines of the form of Eq. (5.4). To see how many, notice that, since Ui, Ui+1 ∈ [0, 1), we have
0 ≤ 3Ui+1 + Ui < 4,
suggesting that at most four parallel lines may intersect the unit square. Indeed, a look at Fig. 5.6 shows that there are four such lines, but one seems to boil down to an isolated point close to the origin. Using R, it is easy to check that we get back to the initial integer seed Z0 = 0 after Z2047 = 3, since (3 × 1365 + 1) mod 2048 = 0. Therefore, we obtain
3U2048 + U2047 = 0 + 3/2048 = 3/2048,
which corresponds to a line with βj = 0, almost passing through the origin.
Having pointed out some issues with LCGs, we should mention that they do have some advantages as well:
The last point deserves some comment. By “separate streams” we mean different sequences of “independent” random numbers that can be assigned either to model different sources of randomness in a simulation3 or to different CPUs when running parallel Monte Carlo. We could just assign a different seed to each stream, but with a naive choice there is a danger that the streams overlap, impairing the independence requirement. With a multiplicative LCG, i.e., when the increment c = 0, it is easy to skip ahead in the sequence in order to partition the whole sequence into nonoverlapping subsequences:
Zi+k = (a^k Zi) mod m = ((a^k mod m) Zi) mod m.
So, skipping ahead by k steps just requires computing a^k mod m once. All of these requirements should also be met by alternative and more sophisticated generators.
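As a small illustration (our sketch, not the book's code), the factor a^k mod m can be computed by repeated squaring, so that a jump of k steps costs a logarithmic number of multiplications; the arithmetic below is exact only for moduli small enough that (m − 1)² fits in a double.

# a^k mod m by repeated squaring (exact while (m-1)^2 < 2^53)
modpow <- function(a, k, m) {
  result <- 1
  base <- a %% m
  while (k > 0) {
    if (k %% 2 == 1) result <- (result * base) %% m
    base <- (base * base) %% m
    k <- k %/% 2
  }
  result
}
# jump k steps ahead in a multiplicative LCG: Z_{i+k} = (a^k mod m) * Z_i mod m
skipAhead <- function(Z, a, m, k) (modpow(a, k, m) * Z) %% m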
In the previous section we have shown that LCGs may have several limitations, and that careful testing is needed in order to select a suitable one. Indeed, LCGs were state of the art in the past, but they have been replaced by alternative generators. Nevertheless, all of these generators share the same conceptual structure as LCGs, in which we have: a state space S; a transition function mapping the current state into the next one; an output function mapping the current state into a number in the unit interval; and a seed, i.e., an initial state.
In the simple LCG case, the states are just integer numbers and the output space is the unit interval. More recent generators use more complex state spaces. We cannot cover all of them, but we do point out a couple of ideas that have actually been implemented in R.
One possibility is to devise combined generators. The idea is to combine multiple generators that, taken individually, would have quite some weaknesses, but improve a lot when combined. One such generator is the Wichmann–Hill generator, which combines three LCGs. For instance:
Xi = 171 Xi−1 mod 30269,
Yi = 172 Yi−1 mod 30307,
Zi = 170 Zi−1 mod 30323.
We see that S consists of three integer numbers. The output function is
Ui = (Xi/30269 + Yi/30307 + Zi/30323) mod 1.
This choice results in much longer periods, and other generators along this line have been proposed. It should be noted that an advantage of a combined generator over an LCG with large modulus m is that we avoid numerical issues with potential overflows.
An alternative approach is to use a richer state space S, as in multiple recursive generators, where the next state depends on a longer history of previous states:
Zi = (a1 Zi−1 + a2 Zi−2 + ⋯ + ak Zi−k) mod m.
A class of generators, called Tausworthe generators or linear feedback shift registers, implements this idea with m = 2. In other words, the state consists of bits, a choice that takes advantage of the binary operations of a computer. The output function maps a sequence of bits into a number in the unit interval:
Ui = Σj=1,…,L Zis+j−1 · 2^−j,
where s and L are positive integers playing the role of a step size and a word length, respectively. A natural choice for L is 32 or 64, depending on the word length of the machine. The advantage of exploiting the binary arithmetic of hardware, where operations modulo 2 correspond to a XOR (exclusive OR) operation, is obvious.4 Actually, variants of this scheme are used. Generalized feedback shift registers (GFSR) use k vectors Xi = (xi,1, …, xi,L) of L bits, rather than single bits:
Then, we may use an output function like
One very common generator, the Mersenne twister, is obtained by adding a further tweak, i.e., by interpreting a GFSR in matrix terms:
Yi = A Yi−1 mod 2,
where Yi is a vector of kL bits and A is a (kL) × (kL) matrix. In the case of the GFSR of Eq. (5.5), we have
where IL is the L × L identity matrix. Clearly, the upper part of the matrix implements a shift, and the new bit word is generated by the last line. Twisting is obtained by using a different last line in matrix A. A version of the Mersenne twister is widely used and, at the time of writing, it is the default generator in R. This version uses k = 624 and has the huge period
2^19937 − 1.
Incidentally, the name stems from the concept of Mersenne numbers, i.e., numbers of the form 2p − 1; for certain values of p, a Mersenne number is also a prime number. Indeed, both p = 19,937 and the corresponding Mersenne number are prime.
Random numbers can be generated in R by the function runif, which generates a vector of given size:
> set.seed(55555)
> runif(5)
[1] 0.9598632 0.6586607 0.3076999 0.9489697 0.1188039
Notice the use of set.seed to set the seed (better said, the state) of the generator. The variable .Random.seed can be inspected to see the current state of the generator. Its value depends on the kind of generator, which can be selected by the function RNGkind by setting the input parameter kind; possible values of this parameter include "Wichmann-Hill" and "Mersenne-Twister", the default.
The following snapshot shows that, with the default setting, the state of the generator is an integer vector consisting of 626 entries, which can be reset by passing a single number to set.seed:
> mode(.Random.seed)
[1] "numeric"
> class(.Random.seed)
[1] "integer"
> length(.Random.seed)
[1] 626
> set.seed(55555)
> cat(.Random.seed[1:6],"\n")
403 624 -734508337 397372100 1252553461 -911245518 
> runif(3)
[1] 0.9598632 0.6586607 0.3076999
> cat(.Random.seed[1:6],"\n")
403 3 -1423130422 -543017428 -426382801 -1354865947 
> set.seed(55555)
> cat(.Random.seed[1:6],"\n")
403 624 -734508337 397372100 1252553461 -911245518 
This corresponds to the state of a Mersenne twister, plus some additional information. The next snapshot shows how to change the generator from Mersenne twister to Wichmann–Hill, as well as the effect on the state:
> RNGkind(kind="Mersenne-Twister")
> set.seed(55555)
> cat(.Random.seed[1:6],"\n")
403 624 -734508337 397372100 1252553461 -911245518 
> RNGkind(kind="Wichmann-Hill")
> set.seed(55555)
> cat(.Random.seed,"\n")
400 15266 22906 19508 
> RNGkind(kind="default")
> set.seed(55555)
> cat(.Random.seed[1:6],"\n")
403 624 -734508337 397372100 1252553461 -911245518 
Indeed, the state of a combined Wichmann-Hill generator consists of three integer numbers.
Suppose that we are given the CDF FX(x) = P{X ≤ x}, and that we want to generate random variates distributed according to FX(·). If we are able to invert FX(·) easily, then we may apply the following inverse transform method:
1. Draw a random number U ~ U(0, 1).
2. Return X = FX^−1(U).
It is easy to see that the random variate X generated by this method is actually characterized by the CDF FX(·):
P{X ≤ x} = P{FX^−1(U) ≤ x} = P{U ≤ FX(x)} = FX(x),
where we have used the monotonicity of FX and the fact that U is uniformly distributed.5
A typical distribution that can be easily simulated by the inverse transform method is the exponential distribution. If X ~ exp(μ), where 1/μ is the expected value of X, its CDF is
FX(x) = 1 − e^(−μx),  x ≥ 0.
A direct application of the inverse transform method yields
X = −log(1 − U)/μ.
Since the distributions of U and (1 − U) are actually the same, it is customary to generate exponential variates by drawing a random number U and by returning − log(U)/μ. Generating exponential random variates is also one way to simulate a Poisson process, as we shall see in Chapter 6.
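As a quick check (a sketch of ours, not the book's code), we may compare the inverse transform with R's built-in rexp:

mu <- 2                          # rate of the exponential; expected value is 1/mu
set.seed(55555)
X <- -log(runif(100000)) / mu    # inverse transform, using U in place of 1 - U
Y <- rexp(100000, rate = mu)     # built-in generator
c(mean(X), mean(Y), 1/mu)        # both sample means should be close to 1/mu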
The inverse transform method is quite simple conceptually; however, we cannot apply it directly when FX is not invertible, as happens with discrete distributions. In this case the cumulative distribution function is piecewise constant, with jumps located where the probability mass is concentrated, i.e., at the possible values of the discrete random variable. Nevertheless, we may adapt the method with a little thought. Consider a discrete empirical distribution with a finite support:
P{X = xj} = pj,  j = 1, …, n.
Then we should generate a uniform random variate U and return X as
X = xj  if  FX(xj−1) < U ≤ FX(xj),
where FX(xj) = p1 + ⋯ + pj is the cumulative probability and FX(x0) ≡ 0.
It may be instructive to see how this idea may be implemented in a simple way (not the most efficient one, however). Suppose that we have a distribution defined by probabilities
for the values 1, 2, 3, 4, 5. First, we find the cumulative probabilities,
and then we draw a uniform random number, say U = 0.82. For each cumulative probability P, we check if U > P, which yields the vector
where 1 corresponds to “true” and 0 to “false.” To select the correct value to return, we must sum the ones in this vector (the total is 3 here) and add 1; in this case we should return the value 4. Using R, this may be accomplished by working directly on vectors, as shown in Fig. 5.7. The code also includes a script to check the results by plotting a histogram, shown in Fig. 5.8, for the specific numerical example that we are considering.
FIGURE 5.7 Sampling from an empirical discrete distribution.
FIGURE 5.8 Histogram produced by calling EmpiricalDrnd.
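The listing of Fig. 5.7 is not reproduced here; a sketch of a function with the name used in the text (EmpiricalDrnd), based on the vectorized comparison just described, follows. The probabilities are hypothetical, chosen to be consistent with the worked example (U = 0.82 yields the value 4).

# sample 'howmany' values from a discrete distribution on 1,...,length(probs)
EmpiricalDrnd <- function(probs, howmany) {
  cumProbs <- cumsum(probs)
  U <- runif(howmany)
  # for each U, count the cumulative probabilities it exceeds, then add 1
  sapply(U, function(u) sum(u > cumProbs) + 1)
}
probs <- c(0.10, 0.30, 0.20, 0.30, 0.10)   # hypothetical probabilities for 1,...,5
set.seed(55555)
X <- EmpiricalDrnd(probs, 10000)
hist(X, breaks = 0.5 + 0:5)                # compare the histogram with probs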
Sometimes, the CDF is invertible in principle, but this is quite costly computationally. In such a case, one possibility is to resort to the acceptance–rejection method we describe next.
Suppose that we must generate random variates according to a PDF fX(x), and that the difficulty in inverting the corresponding CDF makes the inverse transform method unattractive. To see a concrete example, consider the beta distribution with PDF
fX(x) = x^(α1−1) (1 − x)^(α2−1) / B(α1, α2),  x ∈ (0, 1),
for parameters α1, α2 > 1, where
B(α1, α2) = ∫0^1 x^(α1−1) (1 − x)^(α2−1) dx
is the beta function needed for normalization.6 To obtain the CDF we should integrate the PDF, resulting in a function that requires a costly numerical inversion by solving a nonlinear equation.
Assume that we can find a function t(x) such that
t(x) ≥ fX(x)  for all x ∈ I,
where I is the support of fX(·). The function t(·) is not a probability density, but the related function g(x) = t(x)/c is, provided that we choose
c = ∫I t(x) dx.
If the distribution g(·) is easy to sample from, it can be shown that the following acceptance–rejection method generates a random variate X distributed according to the density fX(·):
1. Generate Y according to the density g(·).
2. Generate U ~ U(0, 1), independent of Y.
3. If U ≤ fX(Y)/t(Y), return X = Y; otherwise, go back to step 1.
The PDF fX(·) is called the target density, and the “easy” density g(·) is called the instrumental or candidate density. If the support I is bounded, a natural choice of instrumental density is the uniform distribution on I, and we may set
t(x) ≡ maxy∈I fX(y).
Before proving the correctness of the method, let us get an intuitive grasp from Fig. 5.9, where we show a target PDF much like a beta distribution, whose support is the unit interval (0, 1). The upper bounding function t(·) is a constant corresponding to the maximum of the target density. Hence, the instrumental density is uniform, and the variables Y spread evenly over the unit interval. Consider point A; since fX(A) is close to t(A), A is likely to be accepted, as the ratio fX(A)/t(A) is close to 1. When we consider point B, where the value of the density fX(·) is small, we see that the ratio fX(B)/t(B) is small; hence, B is unlikely to be accepted, which is what we would expect. It can also be shown that the average number of iterations to terminate the procedure with an accepted value is c.
FIGURE 5.9 Graphical illustration of the acceptance–rejection method.
Consider the density
fX(x) = 30(x^2 − 2x^3 + x^4),  x ∈ (0, 1).
The reader is urged to verify that this is indeed a density (actually, it is the beta density with α1 = α2 = 3). Since the PDF is a fourth-degree polynomial, the CDF is a fifth-degree polynomial. If we apply the inverse transform method, we have to invert such a polynomial at each generation; this inconvenience suggests the application of the acceptance–rejection method. By straightforward calculus we see that
max x∈(0,1) fX(x) = fX(x*) = 1.875
for x* = 0.5. We may check this result by running the first lines of the script in Fig. 5.10:
FIGURE 5.10 Script to check acceptance–rejection for Example 5.2.
> f <- function(x) dbeta(x,shape1=3,shape2=3)
> out <- optimize(f,interval=c(0,1),maximum=TRUE)
> out
$maximum
[1] 0.5

$objective
[1] 1.875
Using the uniform density as the easy instrumental density g(·), we obtain the following algorithm: draw two independent uniform random numbers U1 and U2, and accept X = U1 if U2 ≤ fX(U1)/1.875; otherwise, reject and repeat.
In the script of Fig. 5.10 we also draw the points generated according to this scheme, with coordinates U1 and U2; this produces the plot in Fig. 5.11, where rejected points are depicted as empty circles, and accepted points as bullets. We observe that a larger fraction of points is accepted where the PDF is larger, as intuition requires. The average number of iterations to generate one random variate is c = 1.875.
FIGURE 5.11 The plot produced by the R script of Fig. 5.10.
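A sketch along the lines of the script of Fig. 5.10 (ours, not the original code) implementing the scheme above:

# acceptance-rejection for the beta(3,3) density, with a uniform candidate
# on (0,1) and the bound t(x) = 1.875 = max f(x)
f <- function(x) dbeta(x, shape1 = 3, shape2 = 3)
set.seed(55555)
N <- 1000
U1 <- runif(N)                         # candidate values Y
U2 <- runif(N)                         # uniforms used in the acceptance test
accept <- U2 <= f(U1) / 1.875          # accept when U2 <= f(Y)/t(Y)
X <- U1[accept]                        # accepted values follow the beta(3,3) density
mean(accept)                           # close to 1/c = 1/1.875
plot(U1, U2, pch = ifelse(accept, 19, 1))   # bullets accepted, circles rejected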
Now, let us prove the correctness of the method.
The plot of Fig. 5.11 is enlightening, as it shows that what we are doing is based on a common trick in mathematics: embedding a problem into a higher-dimensional space.7 In fact, our task is to generate a single random variable X according to the PDF fX(·), but this density may also be interpreted as
fX(x) = ∫0^fX(x) du,
i.e., as a marginal density within the joint distribution of two random variables X and U, uniformly distributed on the set {(x, u) | 0 ≤ u ≤ fX(x)}, i.e., with joint density
fX,U(x, u) = 1  for 0 ≤ u ≤ fX(x).
Thus, sampling in the one-dimensional case is equivalent to sampling the bidimensional distribution. The argument applies to the case in which the upper bounding function t(·) is a constant, as in Example 5.2, so that g(·) boils down to a uniform density and the larger domain is just a box. To generalize the idea to an arbitrary instrumental density, let us calculate the probability P{X ∈ A}, for an arbitrary set8 A, when X is generated by the acceptance–rejection scheme. Note that X is given by Y conditional on the event “accept”, which is equivalent to {U ≤ fX(Y)/t(Y)}. Let 𝒳 be the support of fX(·). We can write:
P{X ∈ A} = P{Y ∈ A | U ≤ fX(Y)/t(Y)} = P{Y ∈ A, U ≤ fX(Y)/t(Y)} / P{U ≤ fX(Y)/t(Y)},
where the numerator is
∫A P{U ≤ fX(y)/t(y)} g(y) dy = ∫A [fX(y)/t(y)] · [t(y)/c] dy = (1/c) ∫A fX(y) dy,
and the denominator is obtained in the same way by taking A = 𝒳, so that it equals 1/c. Hence
P{X ∈ A} = ∫A fX(y) dy.
This shows that the density of the accepted Y is indeed fX(·). Furthermore, we also see that the probability of the event “accept” is
P{accept} = P{U ≤ fX(Y)/t(Y)} = 1/c.
However, since the iterations in acceptance–rejection are independent and we stop at the first success, the number of iterations follows a geometric distribution with parameter p = 1/c. Since a geometric random variable with parameter p has expected value 1/p,9 it follows that the expected number of iterations to accept a proposed value is c.
We also observe that:
The second point implies that we do not know a priori the dimension of the space over which we are integrating when carrying out a Monte Carlo simulation. This does not prevent the application of Monte Carlo sampling, but it is a stumbling block when we use low-discrepancy sequences, as we shall see in Chapter 9. It is also worth mentioning that rejection sampling is not restricted to univariate distributions, and it can also be applied to sample from a PDF that is not completely known and is given only up to a normalization constant. This is quite relevant in computational Bayesian statistics, as we shall see in Chapter 14.
Sampling from a normal distribution is a common need in financial applications. Recall first that if X ~ N(0, 1), then μ + σX ~ N(μ, σ2); hence, we just need a method for generating a stream of independent standard normal variables. If correlation needs to be induced, this can be easily accomplished as we shall explain. Over the years, a number of methods have been proposed to generate standard normal variates. Some are obsolete, like the idea of summing a suitably large number of uniform variates, taking advantage of the central limit theorem. The inverse transform method was not considered efficient in the past, but it is quite practical by now, as efficient approximations have been developed to invert the CDF of the standard normal. In between, there is a set of ad hoc methods, and some are implemented in the R function rnorm.
The general-purpose inverse transform method may be applied by inverting the standard normal CDF. In R, this is accomplished, for a generic normal, by the function call
x <- qnorm(p, mu, sigma)
that returns the quantile at probability level p of a variable with expected value mu and standard deviation sigma. The following snapshot shows that there is little difference between calling rnorm and generating uniform variates and then inverting the normal CDF:
> system.time(qnorm(runif(50000000)))
   user  system elapsed 
   5.74    0.16    5.89 
> system.time(rnorm(50000000))
   user  system elapsed 
   5.80    0.05    5.85 
Indeed, inversion of the CDF is currently the default method in R. Nevertheless, there are faster ad hoc methods that can be selected as follows:
> set.seed(55555, normal.kind="Ahrens-Dieter")
> system.time(rnorm(50000000))
   user  system elapsed 
   3.43    0.05    3.48 
In the following, we will not cover the generation of normal variates too extensively, but it is nevertheless instructive to get a glimpse of the ingenuity that may be employed to come up with ad hoc methods, and of the dangers of bad random variate generators. A well-known approach is the Box–Muller method, which takes advantage of the representation of points on the plane by polar coordinates. Consider two independent variables X, Y ~ N(0, 1), and let (R, θ) be the polar coordinates of the point of Cartesian coordinates (X, Y) in the plane, so that
X = R cos θ,  Y = R sin θ,  R^2 = X^2 + Y^2.
Given independence, the joint density of X and Y is
fX,Y(x, y) = (1/2π) e^(−(x²+y²)/2) = (1/2π) e^(−d/2),
where d = x² + y². The last expression looks like a product of an exponential density for d and a uniform distribution; the term 1/2π may be interpreted as the uniform distribution for the angle θ ∈ (0, 2π). However, we are missing some constant term in order to obtain the exponential density. In fact, to properly express the density in terms of (d, θ), we should take the Jacobian of the transformation from (x, y) to (d, θ) into account.10 Some calculations yield
|∂(x, y)/∂(d, θ)| = 1/2,
and the correct density in the alternative coordinates is
fD,θ(d, θ) = (1/2) e^(−d/2) · (1/2π).
Hence, we may generate R² as an exponential variable with mean 2, θ as a uniformly distributed angle on (0, 2π), and then transform back into Cartesian coordinates to obtain two independent standard normal variates. The Box–Muller algorithm may be implemented as follows:
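The algorithm listing is not reproduced here; a minimal R sketch of the recipe just described might be:

# Box-Muller: each pair of uniforms yields two independent N(0,1) variates
BoxMuller <- function(n) {
  U1 <- runif(n)
  U2 <- runif(n)
  R <- sqrt(-2 * log(U1))      # R^2 is exponential with mean 2
  theta <- 2 * pi * U2         # uniform angle on (0, 2*pi)
  cbind(R * cos(theta), R * sin(theta))
}
set.seed(55555)
Z <- BoxMuller(50000)
colMeans(Z)                    # close to (0, 0)
apply(Z, 2, var)               # close to (1, 1)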
In practice, this algorithm may be improved by avoiding the costly evaluation of trigonometric functions and integrating the Box–Muller approach with the rejection approach. The idea results in the following polar rejection method:
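Again, the original listing is omitted; a hedged sketch of the polar rejection method (also known as Marsaglia's polar method) is the following, where candidate points falling outside the unit circle are rejected:

# polar rejection: one pair of N(0,1) variates per accepted point in the unit circle
PolarNormal <- function() {
  repeat {
    V1 <- runif(1, min = -1, max = 1)
    V2 <- runif(1, min = -1, max = 1)
    S <- V1^2 + V2^2
    if (S > 0 && S < 1) break          # reject points outside the unit circle
  }
  fac <- sqrt(-2 * log(S) / S)
  c(V1 * fac, V2 * fac)                # two independent standard normals
}
set.seed(55555)
Z <- t(replicate(10000, PolarNormal()))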
We refer the reader to [9, Section 5.3] for a justification of the polar rejection method.
We have seen that LCGs may exhibit a lattice structure. Since the Box–Muller transformation is nonlinear, one might wonder whether the composition of these two features may yield weird effects. We may check this in a somewhat peculiar case (see [7]), using the R script in Fig. 5.12. The script generates 2046 uniform random numbers with a generator of modulus m = 2048; we discard the last pair, because the generator has maximum period and reverts back to the seed, which is 0 and causes trouble with the logarithm. Vectors U1 and U2 contain the odd- and even-numbered random numbers in the sequence. The first part of the resulting plot, displayed in Fig. 5.13, shows poor coverage of the plane. The second part shows that swapping the pairs of random numbers may have a significant effect, whereas with truly random numbers the swap should be irrelevant. Of course, using better LCGs, or better random number generators altogether, prevents pathological behavior like this. Nevertheless, it may be preferable to use the inverse transform method to avoid undesirable effects.
FIGURE 5.12 Script to check the Box–Muller approach.
FIGURE 5.13 Effect of swapping random numbers in the Box–Muller transformation.
In many financial applications one has to generate variates according to a multivariate normal distribution with (vector) expected value μ and covariance matrix Σ. This task may be accomplished by finding the Cholesky factor for Σ, i.e., an upper triangular11 matrix U such that
Σ = UᵀU.
Then we apply the following algorithm:
1. Generate a vector Z of independent standard normal variates.
2. Return X = μ + UᵀZ.
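As a sketch of this algorithm (ours; the book's version is the function of Fig. 5.14 used below), a function with the interface MultiNormrnd(mu, Sigma, howmany) might be:

# sample 'howmany' observations from N(mu, Sigma) via the Cholesky factor;
# chol() returns an upper triangular U such that t(U) %*% U equals Sigma
MultiNormrnd <- function(mu, Sigma, howmany) {
  U <- chol(Sigma)
  n <- length(mu)
  Z <- matrix(rnorm(howmany * n), nrow = howmany, ncol = n)
  sweep(Z %*% U, 2, mu, "+")    # each row is mu + t(U) %*% z
}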
A rough code to simulate multivariate normal variables is illustrated in Fig. 5.14. The code builds a matrix, where columns correspond to variables and rows to observations, as usual. Assume that we have the following parameters:
FIGURE 5.14 Code to sample from a multivariate normal distribution.
> Sigma <- matrix(c(4, 1, -2, 1, 3, 1, -2, 1, 5), nrow=3)
> mu <- c(8,6,10)
> eigen(Sigma)$values
[1] 6.571201 4.143277 1.285521
Note that we make sure that the matrix Σ is positive definite, as it should be, by checking its eigenvalues with the function eigen. Now we may generate a sample and verify the results:
> set.seed(55555)
> Z <- MultiNormrnd(mu,Sigma,10000)
> colMeans(Z)
[1] 7.995686 5.981794 9.987316
> cov(Z)
           [,1]      [,2]      [,3]
[1,]  3.9930313 0.9789963 -2.003760
[2,]  0.9789963 2.9525912  1.034064
[3,] -2.0037602 1.0340636  5.021224
We should note that Cholesky decomposition is not the only one that we can use.12 An alternative is the square root matrix Σ^(1/2), i.e., a symmetric matrix such that
Σ^(1/2) Σ^(1/2) = Σ.
One way to find a symmetric square root is the following, taking advantage of matrix diagonalization:
> eig <- eigen(Sigma)
> V <- eig$vectors
> V %*% diag(eig$values) %*% t(V)
     [,1] [,2] [,3]
[1,]    4    1   -2
[2,]    1    3    1
[3,]   -2    1    5
> sqrtSigma <- V %*% diag(sqrt(eig$values)) %*% t(V)
> sqrtSigma %*% sqrtSigma
     [,1] [,2] [,3]
[1,]    4    1   -2
[2,]    1    3    1
[3,]   -2    1    5
A handier alternative is the function sqrtm from package expm, as we have seen in Section 3.3.4. Anyway, with R there is no need to carry out that amount of work by ourselves, as we can rely on the mvrnorm function from the MASS package:
> library(MASS)
> set.seed(55555)
> Z <- mvrnorm(n=10000,mu=mu,Sigma=Sigma)
> colMeans(Z)
[1]  7.967968  5.993851 10.001322
> cov(Z)
           [,1]      [,2]      [,3]
[1,]  3.9914023 0.9278669 -2.083500
[2,]  0.9278669 2.9171899  1.014537
[3,] -2.0835005 1.0145372  5.089220
The mvtnorm package offers an alternative function, rmvnorm, for this purpose.
Many ad hoc methods rely on relationships among distributions.13 For instance, we know that an Erlang variable Erl(n, λ) is the sum of n independent exponentials Xi with rate λ. Hence, we may use the inverse transform method as follows:
X = −(1/λ) Σi=1,…,n log Ui = −(1/λ) log(U1 U2 ⋯ Un).
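A small sketch of this idea (ours): since −log(Ui)/λ are independent exponentials, a single logarithm of a product of uniforms yields an Erlang variate.

# Erlang(n, lambda) variates as sums of n exponentials via the inverse transform
ErlangRnd <- function(n, lambda, howmany) {
  U <- matrix(runif(n * howmany), nrow = howmany)
  -log(apply(U, 1, prod)) / lambda
}
set.seed(55555)
X <- ErlangRnd(5, 2, 10000)
c(mean(X), 5/2)                # expected value of Erl(n, lambda) is n/lambda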
By a similar token, we could take advantage of the fact that a chi-square variable χ²n with n degrees of freedom is obtained by squaring and summing n independent standard normals. However, we have also seen that these distributions may be regarded as specific subcases of the gamma distribution. We recall the PDF of a Gamma(α, λ) random variable:
fX(x) = x^(α−1) e^(−x/λ) / (Γ(α) λ^α),  x ≥ 0,    (5.6)
where α is a shape parameter and λ is a scale parameter. Clearly, there is a notable difference between the cases α < 1 and α ≥ 1. Let us focus on the latter one. Different algorithms have been proposed for such a gamma variable. The following one, which we describe for illustrative purposes only, is based on acceptance–rejection:
The example also points out the fact that the boundary between ad hoc and general methods is a bit blurred, since we are using acceptance–rejection, but in a very peculiar way. Luckily, we may just rely on the rgamma function in R!
Another interesting case is the multivariate t distribution tν(μ, Σ) (see Section 3.3.4). We may take advantage of the fact that a t variable is related to the ratio of a standard normal and the square root of a chi-square scaled by its degrees of freedom, where the chi-square is in turn a special case of the gamma. To induce correlation, we may use Cholesky factors of the covariance matrix, just like in the case of the normal.
To deal with the multivariate t distribution, we may use R functions from the package mvtnorm, such as rmvt, which has been introduced in Section 3.3.4.
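For instance, a sketch of a call to rmvt is shown below; the argument names (sigma, df, delta) are those of the mvtnorm package, and the parameter values are only illustrative. Note that sigma is the scale matrix, so the covariance matrix is sigma · ν/(ν − 2) for ν > 2.

library(mvtnorm)
nu <- 5
Sigma <- matrix(c(4, 1, -2, 1, 3, 1, -2, 1, 5), nrow = 3)   # scale matrix
mu <- c(8, 6, 10)                                           # location vector
set.seed(55555)
X <- rmvt(10000, sigma = Sigma, df = nu, delta = mu)
colMeans(X)        # close to mu
cov(X)             # close to Sigma * nu/(nu - 2)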
Given a copula C(u1, …, un) and marginal CDFs Fi(·), i = 1, …, n, we may generate a vector X of n dependent variables with those marginals by the following scheme: first, sample a vector (U1, …, Un) from the copula, i.e., a vector with uniform marginals linked by the chosen dependence structure; then, return Xi = Fi^−1(Ui), i = 1, …, n.
As an instructive example, say that we want to generate a sample of n = 10,000 points (X1, X2), where X1 ~ gamma(2, 1) and X2 ~ N(0, 1), and the dependence is described by the Student's t copula
C(u1, …, un) = Tν,Σ(Tν^−1(u1), …, Tν^−1(un)),
where we denote by Tν,Σ(···) the CDF of a multivariate t distribution with ν degrees of freedom and by Tν(·) the CDF of the corresponding univariate distribution; note that the covariance matrix Σ is actually a correlation matrix, in the sense that its diagonal entries are set to 1. To sample from the copula, we need not invert the multidimensional CDF. We rely on the extension of the following property for univariate distributions:14 If X has CDF FX(·), then FX(X) ~ U(0, 1). Thus, we sample Y ~ tν(0, Σ) from the multidimensional t distribution, as we have seen in the previous section, and then we compute U = (Tν(Y1), …, Tν(Yn)). The vector U takes values in the unit hypercube and has uniform marginals, thanks to the aforementioned property; however, its components reflect the dependence structure associated with the copula. Finally, we generate X by inverting the marginals. The code in Fig. 5.15 accomplishes all of this when ν = 10 and
FIGURE 5.15 Code to sample from a multivariate t copula with gamma and normal marginals.
and it also produces the scatterplot and marginal histograms displayed in Fig. 5.16. Note the nonlinear association between the two variables, which cannot be accounted for by Pearson’s correlation. We urge the reader to play with the code in order to see the effect of changing the correlation in the copula.
FIGURE 5.16 Scatterplot and histograms of the marginals for a bidimensional t copula.
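The listing of Fig. 5.15 is not reproduced here; a hedged sketch of the procedure just described, assuming that the correlation in the copula is ρ = 0.8 (consistent with the fitted value reported below), is:

library(mvtnorm)
nu <- 10
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)   # correlation matrix (assumed rho = 0.8)
set.seed(55555)
N <- 10000
Y <- rmvt(N, sigma = Sigma, df = nu)           # sample from the multivariate t
U1 <- pt(Y[, 1], df = nu)                      # map to uniform marginals via the t CDF
U2 <- pt(Y[, 2], df = nu)
X1 <- qgamma(U1, shape = 2, rate = 1)          # gamma(2,1) marginal
X2 <- qnorm(U2)                                # standard normal marginal
plot(X1, X2, pch = 20, cex = 0.3)              # scatterplot as in Fig. 5.16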
All of the above can also be performed using the copula package. With the code in Fig. 5.17 we accomplish two tasks: First, we sample from a given copula; then, given the sample, we fit a copula to check whether we can recover the copula we started from. The details are as follows:
FIGURE 5.17 Using the package copula to create, sample from, and fit a t copula.
FIGURE 5.18 Transforming observation to fit a copula.
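The listing of Fig. 5.17 is not reproduced here; a sketch of a workflow along these lines, using functions from the copula package (tCopula, rCopula, pobs, fitCopula), with the same assumed parameter values as above, is:

library(copula)
# first task: define a bivariate t copula and sample from it
tc <- tCopula(param = 0.8, dim = 2, df = 10)
set.seed(55555)
U <- rCopula(10000, tc)
X <- cbind(qgamma(U[, 1], shape = 2, rate = 1), qnorm(U[, 2]))   # desired marginals
# second task: go back to pseudo-observations and fit a t copula to the sample
V <- pobs(X)
fitted.copula <- fitCopula(tCopula(dim = 2), V, method = "mpl")
fitted.copula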
Running the code, we obtain the following fitted copula, which is in good agreement with the one we have used to sample data:
> fitted.copula
fitCopula() estimation based on 'maximum pseudo-likelihood'
and a sample of size 10000.
      Estimate Std. Error z value Pr(>|z|)    
rho.1 0.805305   0.005384   149.6   <2e-16 ***
df    9.161193         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The maximized loglikelihood is 5277
Optimization converged
Number of loglikelihood evaluations:
function gradient 
      34       11 
1 J.H. Ahrens and U. Dieter. Computer methods for sampling from gamma, beta, Poisson and binomial distributions. Computing, 12:223–246, 1974.
2 D.P. Kroese, T. Taimre, and Z.I. Botev. Handbook of Monte Carlo Methods. Wiley, Hoboken, NJ, 2011.
3 A.M. Law and W.D. Kelton. Simulation Modeling and Analysis (3rd ed.). McGraw-Hill, New York, 1999.
4 P. L’Ecuyer. Efficient and portable combined random number generators. Communications of the ACM, 31:742–749, 1988.
5 C. Lemieux. Monte Carlo and Quasi–Monte Carlo Sampling. Springer, New York, 2009.
6 B.D. Ripley. The lattice structure of pseudo-random number generators. Proceedings of the Royal Society of London; Series A, Mathematical and Physical Sciences, 389:197–204, 1983.
7 B.D. Ripley. Stochastic Simulation. Wiley, New York, 1987.
8 C.P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer, New York, 2004.
9 S. Ross. Simulation (2nd ed.). Academic Press, San Diego, CA, 1997.
10 S. Ross. Introduction to Probability Models (8th ed.). Academic Press, San Diego, CA, 2002.
11 S. Tezuka. Uniform Random Numbers: Theory and Practice. Kluwer Academic Publishers, Boston, 1995.
1These examples are taken from [7, pp. 22–25].
2See, e.g., [6] for a deeper investigation.
3 This may be useful to apply certain variance reduction strategies.
4 The exclusive OR, denoted as ⊕, is defined as follows: (1 ⊕ 0) = (0 ⊕ 1) = 1; (0 ⊕ 0) = (1 ⊕ 1) = 0. In the inclusive OR case, denoted as ∨, we have (1 ∨ 1) = 1.
5 The reader may appreciate the similarity between this line of reasoning and that of Eq. (3.40), where we showed that if X has CDF FX(·), then FX(X) is a uniform random variable.
6 See Section 3.2.4. The definition of the beta distribution, per se, does not require the above condition on its parameters; however, the PDF would have a vertical asymptote for either α1 < 1 or α2 < 1, which would prevent the application of acceptance–rejection.
7 The treatment here follows [8, pp. 47–51].
8 Strictly speaking, the set should be measurable.
9 See Section 3.2.1 and Eq. (3.14).
10 See, e.g., [10] for details.
11 In many numerical analysis textbooks, Cholesky decomposition is expressed as A = LLᵀ, for a lower triangular matrix L; R returns an upper triangular matrix U, but the two forms are clearly equivalent.
12 More on this in Section 13.4.
13 See Sections 3.2.2, 3.2.3, and 3.2.5.