Chapter Five

Random Variate Generation

Armed with a suitable model of uncertainty, possibly one of those described in Chapter 3, whose parameters have been estimated as illustrated in Chapter 4, we are ready to run a set of Monte Carlo experiments. To do so, we must feed our simulation program with a stream of random variates mimicking uncertainty. Actually, the job has already been done, at least for R users trusting the rich library of random generators that is available. So, why should we bother with the internal workings of these generators and learn how random variates are generated? Indeed, we will not dig too deeply into sophisticated algorithms, and we will also refrain from treating generators for too wide a class of probability distributions. Still, there are quite good reasons to get a glimpse of the underpinnings of random variate generation. The aims of this chapter are:

To gain an understanding of the different options that R offers: for instance, there are alternative generators that can be set and controlled with the RNGkind function.
To understand how to manage seeds of random generators, which is essential for debugging and controlling experiments.
To have an idea of what may go wrong with bad generators that have been proposed in the past and may still be around.
To pave the way for some variance reduction strategies, to be described in Chapter 8.
To better understand the link between Monte Carlo simulation and integration, which in turn is essential to apply alternative approaches based on the low-discrepancy sequences dealt with in Chapter 9.
A bit of knowledge about random variate generation by acceptance–rejection is also useful to better appreciate certain methods for computational Bayesian statistics, like Markov chain Monte Carlo; see Chapter 14.

The overall structure of a Monte Carlo simulation is described in Section 5.1. Essentially, a Monte Carlo simulator is a function mapping a stream of (pseudo)-random numbers, i.e., a sequence of i.i.d. variables with uniform distribution on the unit interval (0, 1), into a set of estimates of interest. The generation of random numbers is described in Section 5.2. Then, random numbers are transformed in order to create a sample from another, more interesting, probability distribution. We describe a few standard techniques for random variate generation, like the inverse transform method (Section 5.3) and the acceptance–rejection method (Section 5.4). For certain specific distributions, it may be preferable to apply ad hoc methods. The ubiquitous normal distribution is dealt with in Section 5.5, where we cover both the univariate and the multivariate case; a few more ad hoc methods are outlined in Section 5.6. Finally, in Section 5.7 we discuss how to generate jointly distributed variables whose dependence structure is described by a copula. We defer to Chapter 6 the task of generating sample paths of stochastic processes, typically described by stochastic differential equations, since they involve additional issues about the discretization of continuous-time models.

A note on terminology. As we have already hinted at, there is nothing random in a Monte Carlo simulation carried out in R, as it typically involves the generation of pseudorandom numbers, rather than genuinely random ones. We will occasionally stress this point but, more often than not, we will drop the “pseudo” for the sake of brevity. Indeed, in both output analysis and variance reduction methods, we behave as if we were truly dealing with random variates. We will use the term random variate in general, but random number to refer to uniform variates only, as they are actually the primary input of a Monte Carlo simulation. Later, we will mostly avoid the term quasi–random numbers when referring to the deterministic low-discrepancy sequences that are the subject of Chapter 9, as this name is misleading. Of course, pseudorandom numbers are by necessity deterministic as well, but they try to mimic randomness, whereas low-discrepancy sequences are deterministic by conception and make no such attempt.

5.1 The structure of a Monte Carlo simulation

A typical Monte Carlo simulation can be represented as the sequence of building blocks depicted in Fig. 5.1.

FIGURE 5.1 Structure of a Monte Carlo simulation.

1. The starting point is a stream of random numbers {Ui}, i.e., a sequence of i.i.d. variables uniformly distributed on the unit interval (0, 1). Clearly, unless we resort to physical random number generators, which are not quite practical, we must rely on an algorithm that actually generates pseudorandom numbers. The reasons for doing so are:
Efficiency: To run a Monte Carlo simulation we may need a huge set of random numbers, and an efficient computer algorithm is the only way to achieve that, as physical generators are slower.
Repeatability: We may need to repeat the experiment, and program debugging would turn into a nightmare in the case of true randomness. In principle, we could generate random numbers by a physical device and then store them, but this would not be practical for very extensive Monte Carlo runs.
2. We do have some good strategies to emulate randomness, which are limited to uniform variates. Hence, even if we need, say, normal random variables, we have to use uniform random numbers as the primary input and then resort to some transformation to obtain more relevant distributions. In the second step, using an array of such methods, we transform the stream of uniform random numbers into a sequence of random variates with the desired distribution, resulting in a stream {Xj}. We use a different subscript, j rather than i, to stress the fact that more than one random number might be needed to generate one realization of the random variate of interest, as is the case with acceptance–rejection strategies described in the following. Furthermore, the random variate we need may be multidimensional, as in the case of a multivariate normal.
3. The third step is the actual simulation model, where domain-specific dynamics are represented and used to generate a sample path. It may be the case that a sample path consists of the draw of a single random variate, but in general we may need to generate a whole sample path of a possibly complicated stochastic process. Unlike the previous steps, this is highly problem dependent. For instance, we may generate a sample path of stock prices or interest rates and, finally, the payoff of an exotic option.
4. In the fourth and last step, all of the outcomes are collected in order to come up with an estimate, typically in the form of both a point estimate and a confidence interval. It is important, when applying standard inferential statistics methods, that the outcomes form a sample of i.i.d. random variables (see Chapter 7).

Now we may grasp the link between multidimensional integration and Monte Carlo simulation: The structure of Fig. 5.1 points out quite clearly that a typical Monte Carlo simulation is, at least conceptually, a function mapping a vector of n uniform random numbers into a numeric output. More formally, it is a function g : Iⁿ → ℝ, where the domain Iⁿ is the n-fold Cartesian product of the unit interval,

$$I^n \equiv (0,1) \times (0,1) \times \cdots \times (0,1),$$

i.e., the unit hypercube. When estimating an expected value, we are calculating the following integral:

(5.1) $$\mathrm{E}\bigl[g(U_1, \ldots, U_n)\bigr] = \int_{I^n} g(u_1, \ldots, u_n)\, du_1 \cdots du_n.$$

Actually, this view is valid whenever we know exactly the length of the input required to run one Monte Carlo replication, i.e., how many random numbers are needed to get one observation of the output random variable of interest. If so, we know the dimension of the space on which we are integrating. In other cases, as we shall see, we may not really know n. Then, the interpretation as an integral is still valid, but we cannot apply the methods of Chapter 9, which are based on deterministic low-discrepancy sequences of numbers replacing the stream of random numbers. This approach requires an integration over a domain of given dimension, and this in turn enforces a few requirements on how the block sequence of Fig. 5.1 is implemented in detail.

5.2 Generating pseudorandom numbers

A pseudorandom number generator is a tool to generate a stream of numbers on the unit interval. To be more precise, some generators produce numbers in the [0, 1) interval, i.e., 0 is a possible output, but 1 is not. For some purposes, the value 0 can be troublesome; this happens, e.g., if we have to transform the stream of random numbers by taking a logarithm. In such cases, numbers in the open interval (0, 1) are preferred. In both cases, we will use the notation U ~ U(0, 1) to refer to the uniform distribution on the unit interval.

There is a rich literature on the generation of random numbers, which is often technically challenging. Fortunately, the average user need not be an expert, but some background knowledge is useful in order to use the available generators properly. In Section 5.2.1 we describe the basic textbook approach, i.e., linear congruential generators, and we point out some of their limitations in Section 5.2.2. Then, we outline a fairly general structure for random number generators in Section 5.2.3, and finally, in Section 5.2.4, we illustrate the principles that are the basis of some generators available in R.

5.2.1 LINEAR CONGRUENTIAL GENERATORS

The standard textbook method to generate U(0, 1) random numbers is the linear congruential generator (LCG). This approach was actually implemented in many generators included in commercial software, and it is still possible to choose an LCG. However, LCGs are not considered state-of-the-art anymore and should not be used for high-quality Monte Carlo experiments, especially if these are rather extensive and require a huge stream of input numbers. Nevertheless, LCGs are most useful as an introduction to the structure of more sophisticated generators and illustrate well the pitfalls of pseudorandom number generation.

An LCG actually generates a sequence of non-negative integer numbers Zi, which are transformed into uniform random numbers. The sequence of integer numbers starts from a user-selected seed Z0; then, given the previous integer number Zi−1, the next number in the sequence is generated as follows:

(5.2) $$Z_i = (a Z_{i-1} + c) \bmod m,$$

where a (the multiplier), c (the shift), and m (the modulus) are properly chosen integer numbers and “mod” denotes the remainder of integer division (e.g., 15 mod 6 = 3). Finally, the integer number Zi is transformed into a uniform number

(5.3) $$U_i = \frac{Z_i}{m}.$$

Since Zi ranges in the set {0, 1, 2, 3, …, m − 1}, Ui ranges in the unit interval [0, 1). Note that 0 is a possible output, but 1 is not. In Fig. 5.2 we display a function implementing an LCG. The function, given a choice of the parameters a, c, and m, as well as a seed, yields a list consisting of two vectors of length N, where USeq contains the uniform random numbers and ZSeq contains the integer numbers.

FIGURE 5.2 Function to generate random numbers by a linear congruential generator.
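Since the code of the figure is not reproduced here, the following is a minimal sketch of a function with the same interface; the name LCG and the list components USeq and ZSeq follow the description above, while the details of the original code may differ:

# sketch of a linear congruential generator, along the lines of Fig. 5.2
LCG <- function(a, c, m, seed, N) {
  ZSeq <- numeric(N)
  USeq <- numeric(N)
  Z <- seed
  for (i in 1:N) {
    Z <- (a * Z + c) %% m   # Z_i = (a*Z_{i-1} + c) mod m
    ZSeq[i] <- Z
    USeq[i] <- Z / m        # map the integer state to the unit interval
  }
  list(USeq = USeq, ZSeq = ZSeq)
}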

In order to check the function LCG and learn a couple of fundamental points, we may run the script of Fig. 5.3 and get the following output:

FIGURE 5.3 Script to check the linear congruential generator.
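A short checking script consistent with the output below could read as follows; the parameters a = 5, c = 3, m = 16, and the seed 7 are those implied by the reported sequence, but the layout of the original script may differ:

out <- LCG(a = 5, c = 3, m = 16, seed = 7, N = 20)
print(cbind(1:20, out$ZSeq, out$USeq))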

1 6 0.3750
2 1 0.0625
3 8 0.5000
4 11 0.6875
5 10 0.6250
6 5 0.3125
7 12 0.7500
8 15 0.9375
9 14 0.8750
10 9 0.5625
11 0 0.0000
12 3 0.1875
13 2 0.1250
14 13 0.8125
15 4 0.2500
16 7 0.4375
17 6 0.3750
18 1 0.0625
19 8 0.5000
20 11 0.6875

On the basis of this simple run, we may observe the following:

It is clear that there is nothing random in the sequence generated by an LCG. The sequence is built from an initial number Z0, the seed of the sequence, which is the initial state of the generator. By starting the sequence from the same seed, we will always get the same sequence. We see quite clearly why we should speak of pseudorandom numbers. We may also check this point in R, which generates random numbers with the function runif. Actually, runif does not use an LCG (by default), but we may control the seed (better said, the initial state of the generator) as follows:
> set.seed(55555)
> runif(5)
[1] 0.9598632 0.6586607 0.3076999 0.9489697 0.1188039
> runif(5)
[1] 0.3415344 0.0415116 0.4067968 0.1073957 0.6403764
> set.seed(55555)
> runif(5)
[1] 0.9598632 0.6586607 0.3076999 0.9489697 0.1188039
The function set.seed receives an integer number and sets the initial state of the generator accordingly; we will see later what this state actually is when using more sophisticated generators.
Since we take a ratio of integer numbers, an LCG actually generates rational numbers in the set

$$\left\{0, \frac{1}{m}, \frac{2}{m}, \ldots, \frac{m-1}{m}\right\},$$

rather than real numbers. This may look like a significant limitation, but on second thought we should realize that whatever we do on a computer is based on rational numbers, as floating-point numbers are represented, and possibly truncated, by a finite number of bits. In practice, this is not a serious problem, provided that the modulus m is large enough.
A less obvious point is that the generator is necessarily periodic. As we have already observed, we may generate at most m distinct integer numbers Zi in the range from 0 to m − 1, and whenever we repeat a previously generated number, the sequence will be repeated as well (which is not very random at all!). A careful look at the previous output shows that, having started with the initial seed Z0 = 7, we get back there after 16 steps. Of course, an LCG with a period of 16 is not acceptable for any practical purpose; yet, this is not too bad, as 16 is the maximum possible period for a modulus m = 16. We do much worse if we select a = 11, c = 5, and m = 16. In this case, starting from Z0 = 3, we get the following sequence of integer numbers Zi:

$$3,\ 6,\ 7,\ 2,\ 11,\ 14,\ 15,\ 10,\ 3,\ 6,\ 7,\ 2,\ \ldots,$$

which has half the maximal period. Thus, since the maximum possible period is m, we should choose a large modulus in order to have a large period. Indeed, a common choice for generators proposed in the literature was

$$m = 2^{31} - 1 = 2{,}147{,}483{,}647 \approx 2.1 \times 10^9.$$

This may look like a very large period, but with faster and faster hardware we run longer and longer simulations, and the above period may not be good enough.

The last observation has been one of the reasons behind the development of alternative generators. Nevertheless, a proper choice of a and c ensures that the period is maximized and that the sequence looks random enough. For instance, a generator proposed in [4] features the following parameters:

equation

This LCG has zero shift, which is a rather common choice, as c does not provide too many advantages. Hence, we speak of a multiplicative generator; it is also clear that in such a case we must steer away from zero, as the LCG would get stuck there. But what is “random enough” exactly? There are quite sophisticated testing procedures to check random number generators, which are beyond the scope of this book. Anyway, we point out a few important issues in the next section.

5.2.2 DESIRABLE PROPERTIES OF RANDOM NUMBER GENERATORS

The first requirement for a good uniform random number generator is that it is, rather unsurprisingly, uniform. A quick check of uniformity can be run by evaluating mean and variance of sample sequences. Since we know that, for a random variable U ~ U(0, 1), we have

$$\mathrm{E}[U] = \frac{1}{2}, \qquad \mathrm{Var}(U) = \frac{1}{12},$$

a comparison is easy to make. A less crude test would check whether, by dividing the unit interval into equally spaced bins, roughly the same number of observations falls into each one of them. In other words, we may draw a histogram or run something like a goodness-of-fit test. However, this is not quite enough to tell a good random number generator from a bad one. To see the point, consider a sequence like

$$\frac{1}{m},\ \frac{2}{m},\ \frac{3}{m},\ \ldots,\ \frac{m-1}{m},\ 0,\ \frac{1}{m},\ \frac{2}{m},\ \ldots,$$

which is obtained by an LCG with a = c = 1. This is certainly uniform and has a maximum period. However, we want i.i.d. variables, where now we stress independence. From this point of view, the choice a = 1 is a very poor one in general, as we get a sequence like

$$U_{i+1} = \left(U_i + \frac{c}{m}\right) \bmod 1,$$

featuring long increasing subsequences.
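As a quick sanity check of the uniformity requirement discussed above, we may compare sample moments and bin counts of a runif sample with their theoretical counterparts; this is only a rough illustration, not a substitute for proper statistical testing:

set.seed(55555)
U <- runif(100000)
c(mean(U), var(U))                        # compare with 1/2 and 1/12
table(cut(U, breaks = seq(0, 1, 0.1)))    # bin counts should be roughly equal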

Given the structure of LCGs, there is no hope of getting a genuinely independent stream, of course; yet, a good LCG should be able to trick statistical testing procedures into “believing” that it produces a sequence of independent observations from the uniform distribution. Without considering sophisticated tests, we can investigate how short sequences of random numbers are distributed in bidimensional or tridimensional spaces. More precisely, imagine that we draw points of the form

$$(U_i, U_{i+1}), \qquad i = 1, 2, 3, \ldots,$$

on the plane. These points, if the sequence is independent, should be uniformly distributed on the unit square. By the same token, points of the form

$$(U_i, U_{i+1}, U_{i+2}), \qquad i = 1, 2, 3, \ldots,$$

should be uniformly distributed on the unit cube. As an illustration, let us try the script in Fig. 5.4.1 The first part of the script uses an LCG with a = 65, c = 1, and m = 2048. We generate the whole sequence of 2048 points, corresponding to the maximum period, and plot points of the form (Ui, Ui+1) in the upper half of Fig. 5.5. If we look at the plot, we observe a fairly good filling of the unit square, which may be interpreted as a sign of good quality. However, if we only draw the first quarter of the sequence, as we do next, a different pattern emerges. The bottom half of Fig. 5.5 shows that the initial portion of the sequence does not cover the unit square uniformly, pointing out a deficiency of the generator.

FIGURE 5.4 Script to illustrate the lattice structure of LCGs.
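The first part of that script might look like the following sketch; the seed and the graphical details are assumptions, since the original code is not reproduced here:

# LCG with a = 65, c = 1, m = 2048: full period, then the first quarter only
m <- 2048
out <- LCG(a = 65, c = 1, m = m, seed = 0, N = m)
U <- out$USeq
par(mfrow = c(2, 1))
plot(U[1:(m-1)], U[2:m], pch = 20, cex = 0.5,
     xlab = expression(U[i]), ylab = expression(U[i+1]))
n4 <- m/4
plot(U[1:(n4-1)], U[2:n4], pch = 20, cex = 0.5,
     xlab = expression(U[i]), ylab = expression(U[i+1]))
par(mfrow = c(1, 1))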

FIGURE 5.5 Plots obtained by running the script of Fig. 5.4.

The second part of the script, where we set a = 1365, yields much worse results. In Fig. 5.6 we observe that points are arranged on a few parallel lines. This kind of pattern is known as lattice structure and is typical of LCGs; actually, it is also present in Fig. 5.5, and similar structures can be observed in three dimensions when considering points with coordinates of the form (Ui, Ui+1, Ui+2). The problem is more evident in Fig. 5.6, however, as very few lines with large gaps are observed.

FIGURE 5.6 Lattice structure in LCGs.

Let us try to understand where the problem comes from, with reference to Fig. 5.6.2 A quick look at the figure suggests that points (Ui, Ui+1) are arranged in lines of the form

(5.4) $$U_{i+1} = \alpha_j - \frac{1}{3} U_i,$$

for just a few values of the intercept αj. In order to check this, let us investigate what happens in terms of the corresponding integer numbers Zi and Zi+1, by noting that

$$Z_{i+1} = (a Z_i + c) \bmod m$$

implies

$$Z_{i+1} = a Z_i + c - k m,$$

for some integer k. Hence, with the choice a = 1365, c = 1, and m = 2048, we find

$$Z_{i+1} = 1365\, Z_i + 1 - 2048\, k.$$

Now, if we divide by m = 2048 in order to get back to the uniform numbers, we notice that 3 × 1365 + 1 = 4096 is an integer multiple of the modulus. Indeed, this is what makes this case peculiar: multiplying $U_{i+1} = Z_{i+1}/2048$ by 3 yields

$$3\, U_{i+1} = \frac{4095\, Z_i + 3}{2048} - 3k = 2 Z_i - 3k - U_i + \frac{3}{2048},$$

so that

$$U_{i+1} = \alpha_j - \frac{1}{3} U_i, \qquad \alpha_j = \frac{1}{3}\left(\beta_j + \frac{3}{2048}\right),$$

where βj = 2Zi − 3k is an integer. Therefore, we find just a few lines of the form of Eq. (5.4). To see how many, notice that, since Ui, Ui+1 ∈ [0, 1), we have

$$0 \le \alpha_j = U_{i+1} + \frac{1}{3} U_i < 1 + \frac{1}{3} = \frac{4}{3},$$

suggesting that at most four parallel lines may intersect the unit square. Indeed, a look at Fig. 5.6 shows that there are four such lines, but one seems to boil down to an isolated point close to the origin. Using R, it is easy to check that we get back to the initial integer seed Z0 = 0 after Z2047 = 3, since (3 × 1365 + 1) mod 2048 = 0. Therefore, we obtain

$$(U_{2047}, U_{2048}) = \left(\frac{3}{2048},\ 0\right),$$

which corresponds to a line with βj = 0, almost passing through the origin.

Having pointed out some issues with LCGs, we should also mention that they do have some advantages as well:

They are efficient.
It is easy to come up with portable generators, i.e., algorithms that can work on different machines.
They allow for reproducibility by managing a simple seed.
Sequences may be partitioned into separate streams.

The last point deserves some comment. By “separate streams” we mean different sequences of “independent” random numbers that can either be assigned to model different sources of randomness in a simulation3 or to different CPUs when running parallel Monte Carlo. We could just assign a different seed to each stream, but with a naive choice there is a danger that they overlap, impairing the independence requirement. With a multiplicative LCG, i.e., when the increment c = 0, it is easy to skip ahead in the sequence in order to partition the whole sequence into nonoverlapping subsequences:

$$Z_{i+k} = (a^k Z_i) \bmod m = \left[(a^k \bmod m)\, Z_i\right] \bmod m.$$

So, skipping ahead by k steps just requires computing a^k mod m once. All of these requirements should also be met by alternative and more sophisticated generators.
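As an illustration, the power a^k mod m can be computed by repeated squaring; the sketch below only conveys the idea and, for large moduli, a careful implementation would use 64-bit or big-integer arithmetic to avoid losing precision:

modpow <- function(a, k, m) {
  # compute (a^k) mod m by repeated squaring
  result <- 1
  base <- a %% m
  while (k > 0) {
    if (k %% 2 == 1) result <- (result * base) %% m
    base <- (base * base) %% m
    k <- k %/% 2
  }
  result
}
# skip ahead k steps in a multiplicative LCG
skipAhead <- function(Z, a, k, m) (modpow(a, k, m) * Z) %% m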

5.2.3 GENERAL STRUCTURE OF RANDOM NUMBER GENERATORS

In the previous section we have shown that LCGs may have several limitations, and that careful testing is needed in order to select a suitable one. Indeed, LCGs were state of the art in the past, but they have been replaced by alternative generators. Nevertheless, all of these generators share the same conceptual structure as LCGs, in which we have:

A finite set S of states
A way to assign the initial state in S
A state transition function from S to S
An output space U
An output function from S to U

In the simple LCG case, the states are just integer numbers and the output space is the unit interval. More recent generators use more complex state spaces. We cannot cover all of them, but we do point out a couple of ideas that have actually been implemented in R.

One possibility is to devise combined generators. The idea is to combine multiple generators that, taken individually, would suffer from significant weaknesses, but perform much better when combined. One such generator is the Wichmann–Hill generator, which combines three LCGs. For instance:

$$\begin{aligned} Z_{1,i} &= 171\, Z_{1,i-1} \bmod 30269,\\ Z_{2,i} &= 172\, Z_{2,i-1} \bmod 30307,\\ Z_{3,i} &= 170\, Z_{3,i-1} \bmod 30323. \end{aligned}$$

We see that S consists of three integer numbers. The output function is

$$U_i = \left(\frac{Z_{1,i}}{30269} + \frac{Z_{2,i}}{30307} + \frac{Z_{3,i}}{30323}\right) \bmod 1.$$

This choice results in much longer periods, and other generators along these lines have been proposed. It should be noted that an advantage of a combined generator over an LCG with a large modulus m is that we avoid numerical issues with potential overflows.
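As an illustration, a sketch of a combined generator of this kind is given below, assuming the classical Wichmann–Hill multipliers and moduli (171/30269, 172/30307, 170/30323); the seed is arbitrary:

WHgen <- function(n, seed = c(123, 456, 789)) {
  s <- seed                  # the state S: three integers
  u <- numeric(n)
  for (i in 1:n) {
    s[1] <- (171 * s[1]) %% 30269
    s[2] <- (172 * s[2]) %% 30307
    s[3] <- (170 * s[3]) %% 30323
    u[i] <- (s[1]/30269 + s[2]/30307 + s[3]/30323) %% 1   # output function
  }
  u
}
WHgen(5)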

An alternative approach is to use a richer state space S, as in multiple recursive generators:

$$x_i = (a_1 x_{i-1} + a_2 x_{i-2} + \cdots + a_k x_{i-k}) \bmod m.$$

A class of generators, called Tausworthe generators or linear feedback shift registers, implements this idea with m = 2. In other words, the state consists of bits, a choice that takes advantage of the binary operations of a computer. The output function maps a sequence of bits into a number in the unit interval:

$$U_i = \sum_{l=1}^{L} x_{is+l-1}\, 2^{-l},$$

where s and L are positive integers playing the role of a step size and a word length, respectively. A natural choice for L is 32 or 64, depending on the word length of the machine. The advantage of exploiting the binary arithmetic of hardware, where operations modulo 2 correspond to an XOR (exclusive OR) operation, is obvious.4 Actually, variants of this scheme are used. Generalized feedback shift registers (GFSR) use k vectors Xi = (xi,1, …, xi,L) of L bits, rather than single bits:

(5.5) $$X_i = (a_1 X_{i-1} + a_2 X_{i-2} + \cdots + a_k X_{i-k}) \bmod 2,$$

Then, we may use an output function like

$$U_i = \sum_{l=1}^{L} x_{i,l}\, 2^{-l}.$$

One very common generator, the Mersenne twister, is obtained by adding a further tweak, i.e., by interpreting a GFSR in matrix terms:

$$Y_i = (A\, Y_{i-1}) \bmod 2,$$

where Yi is a vector of kL bits and A is a (kL) × (kL) matrix. In the case of the GFSR of Eq. (5.5), we have

$$A = \begin{pmatrix} 0 & I_L & 0 & \cdots & 0 \\ 0 & 0 & I_L & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & I_L \\ a_k I_L & a_{k-1} I_L & a_{k-2} I_L & \cdots & a_1 I_L \end{pmatrix},$$

where IL is the L × L identity matrix. Clearly, the upper part of the matrix implements a shift, and the new bit word is generated by the last line. Twisting is obtained by using a different last line in matrix A. A version of Mersenne twister is widely used and, at the time of writing, it is the default generator in R. This version uses k = 624 and has the huge period

$$2^{19937} - 1 \approx 4.3 \times 10^{6001}.$$

Incidentally, the name stems from the concept of Mersenne numbers, i.e., numbers of the form 2p − 1; for certain values of p, a Mersenne number is also a prime number. Indeed, both p = 19,937 and the corresponding Mersenne number are prime.

5.2.4 RANDOM NUMBER GENERATORS IN R

Random numbers can be generated in R by the function runif, which generates a vector of given size:

> set.seed(55555)
> runif(5)
[1] 0.9598632 0.6586607 0.3076999 0.9489697 0.1188039

Notice the use of set.seed to set the seed (better said, the state) of the generator. The variable .Random.seed can be used to inspect the current state of the generator. Its value depends on the kind of generator, which can be selected by the function RNGkind by setting the input parameter kind; possible values of this parameter include "Wichmann-Hill" and "Mersenne-Twister", the default.

The following snapshot shows that, with the default setting, the state of the generator is an integer vector consisting of 626 entries, which can be reset by passing a single number to set.seed:

> mode(.Random.seed)
[1] "numeric"
> class(.Random.seed)
[1] "integer"
> length(.Random.seed)
[1] 626
> set.seed(55555)
> cat(.Random.seed[1:6], "\n")
403 624 -734508337 397372100 1252553461 -911245518
> runif(3)
[1] 0.9598632 0.6586607 0.3076999
> cat(.Random.seed[1:6], "\n")
403 3 -1423130422 -543017428 -426382801 -1354865947
> set.seed(55555)
> cat(.Random.seed[1:6], "\n")
403 624 -734508337 397372100 1252553461 -911245518

This corresponds to the state of a Mersenne twister, plus some additional information. The next snapshot shows how to change the generator from Mersenne twister to Wichmann–Hill, as well as the effect on the state:

> RNGkind(kind="Mersenne-Twister")
> set.seed(55555)
> cat(.Random.seed[1:6], "\n")
403 624 -734508337 397372100 1252553461 -911245518
> RNGkind(kind="Wichmann-Hill")
> set.seed(55555)
> cat(.Random.seed, "\n")
400 15266 22906 19508
> RNGkind(kind="default")
> set.seed(55555)
> cat(.Random.seed[1:6], "\n")
403 624 -734508337 397372100 1252553461 -911245518

Indeed, the state of a combined Wichmann-Hill generator consists of three integer numbers (the first entry of .Random.seed encodes the kind of generator in use).

5.3 The inverse transform method

Suppose that we are given the CDF FX(x) = P{X ≤ x}, and that we want to generate random variates distributed according to FX(·). If we are able to invert FX(·) easily, then we may apply the following inverse transform method:

1. Draw a random number U ~ U(0, 1)
2. Return X = FX−1 (U)

It is easy to see that the random variate X generated by this method is actually characterized by the CDF FX (·):

$$P\{X \le x\} = P\{F_X^{-1}(U) \le x\} = P\{U \le F_X(x)\} = F_X(x),$$

where we have used the monotonicity of FX and the fact that U is uniformly distributed.5

Example 5.1 Generating exponential variates

A typical distribution that can be easily simulated by the inverse transform method is the exponential distribution. If X ~ exp(μ), where 1/μ is the expected value of X, its CDF is

$$F_X(x) = 1 - e^{-\mu x}, \qquad x \ge 0.$$

A direct application of the inverse transform method yields

$$X = -\frac{1}{\mu} \log(1 - U).$$

Since the distributions of U and (1 − U) are actually the same, it is customary to generate exponential variates by drawing a random number U and by returning − log(U)/μ. Generating exponential random variates is also one way to simulate a Poisson process, as we shall see in Chapter 6.
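As a quick illustration (not part of the original example), we may check the inverse transform for the exponential distribution against the theoretical mean:

set.seed(55555)
mu <- 2                         # rate: the expected value is 1/mu = 0.5
X <- -log(runif(100000)) / mu   # inverse transform, using U in place of 1-U
c(mean(X), 1/mu)                # sample mean vs. theoretical mean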

The inverse transform method is quite simple conceptually; however, we may not apply it when FX is not invertible, as happens with discrete distributions. In this case the cumulative distribution function is piecewise constant, with jumps located where the probability mass is concentrated, i.e., at the possible values of the discrete random variable. Nevertheless, we may adapt the method with a little thought. Consider a discrete empirical distribution with a finite support:

$$P\{X = x_j\} = p_j, \qquad j = 1, 2, \ldots, n, \qquad x_1 < x_2 < \cdots < x_n.$$

Then we should generate a uniform random variate U and return X as

$$X = x_j \quad \text{if} \quad \sum_{k=1}^{j-1} p_k < U \le \sum_{k=1}^{j} p_k.$$

It may be instructive to see how this idea may be implemented in a simple way (not the most efficient one, however). Suppose that we have a distribution defined by probabilities

equation

for the values 1, 2, 3, 4, 5. First, we find the cumulative probabilities,

equation

and then we draw a uniform random number, say U = 0.82. For each cumulative probability P, we check if U > P, which yields the vector

$$(1,\ 1,\ 1,\ 0,\ 0),$$

where 1 corresponds to “true” and 0 to “false.” To select the correct value to return, we must sum the ones in this vector (the total is 3 here) and add 1; in this case we should return the value 4. Using R, this may be accomplished by working directly on vectors, as shown in Fig. 5.7. The code also includes a script to check the results by plotting a histogram, shown in Fig. 5.8, for the specific numerical example that we are considering.

FIGURE 5.7 Sampling from an empirical discrete distribution.

FIGURE 5.8 Histogram produced by calling EmpiricalDrnd.
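The following sketch conveys the idea of the code in Fig. 5.7, which is not reproduced here; the function name EmpiricalDrnd follows the text, whereas the values and probabilities used in the usage example are hypothetical:

EmpiricalDrnd <- function(values, probs, howmany) {
  cumProbs <- cumsum(probs)
  out <- numeric(howmany)
  for (k in 1:howmany) {
    U <- runif(1)
    out[k] <- values[sum(U > cumProbs) + 1]  # count cumulative probs exceeded by U
  }
  out
}
# hypothetical numerical example
values <- 1:5
probs <- c(0.1, 0.2, 0.4, 0.2, 0.1)
X <- EmpiricalDrnd(values, probs, 10000)
hist(X, breaks = seq(0.5, 5.5, by = 1))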

Sometimes, the CDF is invertible in principle, but this is quite costly computationally. In such a case, one possibility is to resort to the acceptance–rejection method we describe next.

5.4 The acceptance–rejection method

Suppose that we must generate random variates according to a PDF fX(x), and that the difficulty in inverting the corresponding CDF makes the inverse transform method unattractive. To see a concrete example, consider the beta distribution with PDF

$$f_X(x) = \frac{x^{\alpha_1 - 1}(1 - x)^{\alpha_2 - 1}}{B(\alpha_1, \alpha_2)}, \qquad x \in (0, 1),$$

for parameters α1, α2 > 1, where

$$B(\alpha_1, \alpha_2) = \int_0^1 x^{\alpha_1 - 1}(1 - x)^{\alpha_2 - 1}\, dx$$

is the beta function needed for normalization.6 To obtain the CDF we should integrate the PDF, resulting in a function whose inversion requires the costly numerical solution of a nonlinear equation.

Assume that we can find a function t(x) such that

$$t(x) \ge f_X(x) \qquad \forall x \in I,$$

where I is the support of fX(·). The function t(·) is not a probability density, but the related function g(x) = t(x)/c is, provided that we choose

$$c = \int_I t(x)\, dx.$$

If the distribution g(·) is easy to sample from, it can be shown that the following acceptance–rejection method generates a random variate X distributed according to the density fX(·):

1. Generate Y ~ g.
2. Generate U ~ U(0, 1), independent of Y.
3. If UfX(Y)/t(Y), return X = Y; otherwise, repeat the procedure.

The PDF fX(·) is called the target density, and the “easy” density g(·) is called the instrumental or candidate density. If the support I is bounded, a natural choice of instrumental density is the uniform distribution on I, and we may set

$$t(x) \equiv \max_{x \in I} f_X(x).$$

Before proving the correctness of the method, let us get an intuitive grasp from Fig. 5.9, where we show a target PDF much like a beta distribution, whose support is the unit interval (0, 1). The upper bounding function t(·) is a constant corresponding to the maximum of the target density. Hence, the instrumental density is uniform, and the variables Y spread evenly over the unit interval. Consider point A; since fX(A) is close to t(A), A is likely to be accepted, as the ratio fX(A)/t(A) is close to 1. When we consider point B, where the value of the density fX(·) is small, we see that the ratio fX(B)/t(B) is small; hence, B is unlikely to be accepted, which is what we would expect. It can also be shown that the average number of iterations to terminate the procedure with an accepted value is c.

FIGURE 5.9 Graphical illustration of the acceptance–rejection method.

Example 5.2 Generating a beta variate

Consider the density

$$f_X(x) = 30\,(x^2 - 2x^3 + x^4), \qquad x \in [0, 1].$$

The reader is urged to verify that this is indeed a density (actually, it is the beta density with α1 = α2 = 3). Since the PDF is a fourth-degree polynomial, the CDF is a fifth-degree polynomial. If we apply the inverse transform method, we have to invert such a polynomial at each generation; this inconvenience suggests the application of the acceptance–rejection method. By straightforward calculus we see that

$$\max_{x \in [0,1]} f_X(x) = f_X(x^*) = 1.875$$

for x* = 0.5. We may check this result by running the first lines of the script in Fig. 5.10:

FIGURE 5.10 Script to check acceptance–rejection for Example 5.2.

> f <- function (x) dbeta(x,shape1=3,shape2=3)
> out <- optimize(f,interval=c(0,1),maximum=TRUE)
> out
$maximum
[1] 0.5
$objective
[1] 1.875

   Using the uniform density as the easy instrumental density g(·), we obtain the following algorithm:

1. Draw two independent and uniformly distributed random variables U1 and U2.
2. If U2 ≤ 16(U1² − 2U1³ + U1⁴), accept X = U1; otherwise, reject and go back to step 1.

In the script of Fig. 5.10 we also draw the points generated according to this scheme, with coordinates U1 and U2; this produces the plot in Fig. 5.11, where rejected points are depicted as empty circles, and accepted points as bullets. We observe that a larger fraction of points is accepted where the PDF is larger, as intuition requires. The average number of iterations to generate one random variate is c = 1.875.

FIGURE 5.11 The plot produced by the R script of Fig. 5.10.
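A vectorized sketch in the spirit of Fig. 5.10 is given below; rather than looping until a given number of variates is accepted, it generates a batch of proposals and keeps the accepted ones (the original script may differ in these details):

set.seed(55555)
N <- 10000
U1 <- runif(N)
U2 <- runif(N)
accepted <- U2 <= 16 * (U1^2 - 2*U1^3 + U1^4)   # U2 <= f(U1)/t(U1)
X <- U1[accepted]
mean(accepted)                                  # should be close to 1/c = 1/1.875
hist(X, freq = FALSE)
curve(dbeta(x, 3, 3), add = TRUE)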

Now, let us prove:

1. That the acceptance-rejection algorithm indeed is equivalent to sampling from the PDF fX(·).
2. That the expected number of iterations to accept Y is c.

The plot of Fig. 5.11 is enlightening, as it shows that what we are doing is based on a common trick in mathematics: embedding a problem into a higher-dimensional space.7 In fact, our task is to generate a single random variable X according to the PDF fX(·), but this density may also be interpreted as

$$f_X(x) = \int_0^{f_X(x)} du,$$

i.e., as a marginal density within the joint distribution of two random variables X and U, uniformly distributed on the set {(x, u) | 0 ≤ u ≤ fX(x)}, i.e.,

$$f_{X,U}(x, u) = \begin{cases} 1, & 0 \le u \le f_X(x),\\ 0, & \text{otherwise}. \end{cases}$$

Thus, sampling in the one-dimensional case is equivalent to sampling the bidimensional distribution. The argument applies to the case in which the upper bounding function t(·) is a constant, as in Example 5.2, so that g(·) boils down to a uniform density and the larger domain is just a box. To generalize the idea to an arbitrary instrumental density, let us calculate the probability P{X ∈ A}, for an arbitrary set8 A, when X is generated by the acceptance–rejection scheme:

sample Y ~ g;
sample U | Y = y ~ U(0, cg(y));
accept y if it satisfies the constraint u ≤ fX(y).

Note that X is given by Y conditional on the event “accept”, which is equivalent to {U ≤ fX(Y)}. Let us denote the support of fX(·) by 𝒳. We can write:

$$P\{Y \in A \mid U \le f_X(Y)\} = \frac{P\{Y \in A,\ U \le f_X(Y)\}}{P\{U \le f_X(Y)\}} = \frac{\displaystyle\int_A \left[\int_0^{f_X(y)} \frac{du}{c\, g(y)}\right] g(y)\, dy}{\displaystyle\int_{\mathcal{X}} \left[\int_0^{f_X(y)} \frac{du}{c\, g(y)}\right] g(y)\, dy} = \frac{\frac{1}{c}\int_A f_X(y)\, dy}{\frac{1}{c}\int_{\mathcal{X}} f_X(y)\, dy} = \int_A f_X(y)\, dy.$$

This shows that the density of the accepted Y is indeed fX(·). Furthermore, we also see that the probability of the event “accept” is

$$P\{U \le f_X(Y)\} = \int_{\mathcal{X}} \frac{f_X(y)}{c\, g(y)}\, g(y)\, dy = \frac{1}{c}.$$

Furthermore, since the iterations in acceptance–rejection are independent and we stop at the first success, the number of iterations follows a geometric distribution with parameter p = 1/c. Since a geometric random variable with parameter p has expected value 1/p,9 it follows that the expected number of iterations to accept a proposed value is c.

We also observe that:

1. A tight upper bounding function, which is reflected in a small value of c, is needed for efficiency.
2. A priori, we do not know how many random numbers are needed to generate a single variate by acceptance–rejection.

The second point implies that we do not know a priori the dimension of the space over which we are integrating when carrying out Monte Carlo simulation. This does not prevent the application of Monte Carlo sampling, but it is a stumbling block when we use low-discrepancy sequences, as we shall see in Chapter 9. It is also worth mentioning that rejection sampling is not restricted to univariate distributions, and it can also be applied to sample from a PDF that is not completely known and is given up to a normalization constant. This is quite relevant in computational Bayesian statistics, as we shall see in Chapter 14.

5.5 Generating normal variates

Sampling from a normal distribution is a common need in financial applications. Recall first that if X ~ N(0, 1), then μ + σX ~ N(μ, σ2); hence, we just need a method for generating a stream of independent standard normal variables. If correlation needs to be induced, this can be easily accomplished as we shall explain. Over the years, a number of methods have been proposed to generate standard normal variates. Some are obsolete, like the idea of summing a suitably large number of uniform variates, taking advantage of the central limit theorem. The inverse transform method was not considered efficient in the past, but it is quite practical by now, as efficient approximations have been developed to invert the CDF of the standard normal. In between, there is a set of ad hoc methods, and some are implemented in the R function rnorm.

5.5.1 SAMPLING THE STANDARD NORMAL DISTRIBUTION

The general-purpose inverse transform methods may be applied by inverting the standard normal CDF. In R, this is accomplished, for a generic normal, by the function call

x <- qnorm(p, mu, sigma)

that returns the quantile at probability level p of a variable with expected value mu and standard deviation sigma. The following snapshot shows that there is little difference between calling rnorm or generating uniform variates and then inverting the normal CDF:

> system.time(qnorm(runif(50000000)))
user system elapsed
5.74 0.16 5.89
> system.time(rnorm(50000000))
user system elapsed
5.80 0.05 5.85

Indeed, inversion of the CDF is currently the default method in R. Nevertheless, there are faster ad hoc methods that can be selected as follows:

> set.seed(55555, normal.kind="Ahrens-Dieter")
> system.time(rnorm(50000000))
user system elapsed
3.43 0.05 3.48

In the following, we will not cover the generation of normal variates too extensively, but it is nevertheless instructive to get a glimpse of the ingenuity that may be employed to come up with ad hoc methods, and the dangers of bad random variate generators. A well-known approach is the Box–Muller method, which takes advantage of the representation of points on the plane by polar coordinates. Consider two independent variables X, Y ~ N(0, 1), and let (R, θ) be the polar coordinates of the point of Cartesian coordinates (X, Y) in the plane, so that

$$X = R \cos\theta, \qquad Y = R \sin\theta.$$

Given independence, the joint density of X and Y is

$$f_{X,Y}(x, y) = \frac{1}{2\pi}\, e^{-(x^2 + y^2)/2} = \frac{1}{2\pi}\, e^{-d/2}, \qquad d \equiv x^2 + y^2 = r^2.$$

The last expression looks like a product of an exponential density for d and a uniform distribution; the term 1/2π may be interpreted as the uniform distribution for the angle θ ∈ (0, 2π). However, we are missing some constant term in order to obtain the exponential density. In fact, to properly express the density in terms of (d, θ), we should take the Jacobian of the transformation from (x, y) to (d, θ) into account.10 Some calculations yield

$$\left|\det \frac{\partial(x, y)}{\partial(d, \theta)}\right| = \frac{1}{2},$$

and the correct density in the alternative coordinates is

$$f_{D,\Theta}(d, \theta) = \frac{1}{2}\, e^{-d/2} \cdot \frac{1}{2\pi}, \qquad d \ge 0,\ \theta \in (0, 2\pi).$$

Hence, we may generate R2 as an exponential variable with mean 2, θ as a uniformly distributed angle, and then transform back into Cartesian coordinates to obtain two independent standard normal variates. The Box–Muller algorithm may be implemented as follows:

1. Generate two independent uniform variates U1, U2 ~ U(0, 1).
2. Set R2 = −2log U1 and θ = 2πU2.
3. Return X = R cos θ, Y = R sin θ.

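A minimal sketch of the Box–Muller algorithm, given for illustration only, is the following:

BoxMuller <- function(n) {
  U1 <- runif(n)
  U2 <- runif(n)
  R <- sqrt(-2 * log(U1))     # R^2 is exponential with mean 2
  theta <- 2 * pi * U2        # uniform angle on (0, 2*pi)
  c(R * cos(theta), R * sin(theta))   # 2n standard normal variates
}
Z <- BoxMuller(50000)
c(mean(Z), sd(Z))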
In practice, this algorithm may be improved by avoiding the costly evaluation of trigonometric functions and integrating the Box–Muller approach with the rejection approach. The idea results in the following polar rejection method:

1. Generate two independent uniform variates U1, U2 ~ U(0, 1).
2. Set V1 = 2U1 − 1, V2 = 2U2 − 1, S = V1² + V2².
3. If S > 1, return to step 1; otherwise, return the independent standard normal variates:

$$X = V_1 \sqrt{\frac{-2 \log S}{S}}, \qquad Y = V_2 \sqrt{\frac{-2 \log S}{S}}.$$

We refer the reader to [9, Section 5.3] for a justification of the polar rejection method.
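A sketch of the polar rejection method, again for illustration only (the guard against S = 0 is a small safety addition), could read:

PolarPair <- function() {
  repeat {
    V1 <- 2 * runif(1) - 1
    V2 <- 2 * runif(1) - 1
    S <- V1^2 + V2^2
    if (S <= 1 && S > 0) break     # reject points outside the unit disk
  }
  factor <- sqrt(-2 * log(S) / S)
  c(V1 * factor, V2 * factor)      # two independent N(0,1) variates
}
PolarPair()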

Example 5.3 A bad interaction of strategies

We have seen that LCGs may exhibit a lattice structure. Since the Box–Muller transformation is nonlinear, one might wonder if the composition of these two features may yield weird effects. We may check this in a somewhat peculiar case (see [7]), using the R script in Fig. 5.12. The script generates 2046 uniform random numbers for a sequence with modulus m = 2048; we discard the last pair, because the generator has maximum period and reverts back to the seed, which is 0 and causes trouble with the logarithm. Vectors U1 and U2 contain odd- and even-numbered random numbers in the sequence. The first part of the resulting plot, displayed in Fig. 5.13, shows poor coverage of the plane. The second part shows that swapping the pairs of random numbers may have a significant effect, whereas with truly random numbers the swap should be irrelevant. Of course, using better LCGs, or better random number generators altogether prevents pathological behavior like this. Nevertheless, it may be preferable to use the inverse transform method to avoid undesirable effects.

FIGURE 5.12 Script to check the Box–Muller approach.

FIGURE 5.13 Effect of swapping random numbers in the Box–Muller transformation.

5.5.2 SAMPLING A MULTIVARIATE NORMAL DISTRIBUTION

In many financial applications one has to generate variates according to a multivariate normal distribution with (vector) expected value μ and covariance matrix Σ. This task may be accomplished by finding the Cholesky factor for Σ, i.e., an upper triangular11 matrix U such that Σ = UᵀU. Then we apply the following algorithm:

1. Generate n independent standard normal variates Z1, …, Zn ~ N(0, 1).
2. Return X = μ + UᵀZ, where Z = (Z1, …, Zn)ᵀ.
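A sketch of this two-step procedure is given below; the function name MultiNormrnd matches its use in Example 5.4, but the code of Fig. 5.14 may differ in its details:

MultiNormrnd <- function(mu, Sigma, howmany) {
  n <- length(mu)
  U <- chol(Sigma)                              # Sigma = t(U) %*% U
  Z <- matrix(rnorm(howmany * n), ncol = n)     # one row per observation
  sweep(Z %*% U, 2, mu, FUN = "+")              # each row is mu + t(U) %*% z
}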

Example 5.4 Sampling using Cholesky factors

A rough code to simulate multivariate normal variables is illustrated in Fig. 5.14. The code builds a matrix, where columns correspond to variables and rows to observations, as usual. Assume that we have the following parameters:

FIGURE 5.14 Code to sample from a multivariate normal distribution.

> Sigma <- matrix(c(4, 1, -2, 1, 3, 1, -2, 1, 5), nrow=3)
> mu <- c(8,6,10)
> eigen(Sigma)$values
[1] 6.571201 4.143277 1.285521

Note that we make sure that the matrix Σ is positive definite, as it should be, by checking its eigenvalues with the function eigen. Now we may generate a sample and verify the results:

> set.seed(55555)
> Z <- MultiNormrnd(mu,Sigma,10000)
> colMeans(Z)
[1] 7.995686 5.981794 9.987316
> cov(Z)
[,1] [,2] [,3]
[1,] 3.9930313 0.9789963 −2.003760
[2,] 0.9789963 2.9525912 1.034064
[3,] −2.0037602 1.0340636   5.021224

We should note that Cholesky decomposition is not the only one that we can use.12 An alternative is the square root matrix Σ1/2, i.e., a symmetric matrix such that

$$\Sigma^{1/2}\, \Sigma^{1/2} = \Sigma.$$

One way to find a symmetric square root is the following, taking advantage of matrix diagonalization:

> eig <- eigen(Sigma)
> V <- eig$vectors
> V %*% diag(eig$values) %*% t(V)
      [,1] [,2] [,3]
[1,]   4   1   −2
[2,]   1   3   1
[3,]   −2   1   5
> sqrtSigma <- V %*% diag(sqrt(eig$values)) %*% t(V)
> sqrtSigma %*% sqrtSigma
      [,1] [,2] [,3]
[1,]   4   1   −2
[2,]   1   3   1
[3,]   −2   1   5

A handier alternative is the function sqrtm from package expm, as we have seen in Section 3.3.4. Anyway, with R there is no need to carry out that amount of work by ourselves, as we can rely on the mvrnorm function from the MASS package:

> library(MASS)
> set.seed(55555)
> Z <- mvrnorm(n=10000,mu=mu,Sigma=Sigma)
> colMeans(Z)
[1] 7.967968 5.993851 10.001322
> cov(Z)
[,1] [,2] [,3]
[1,] 3.9914023 0.9278669 −2.083500
[2,] 0.9278669 2.9171899 1.014537
[3,] −2.0835005 1.0145372 5.089220

The mvtnorm package offers an alternative function, rmvnorm, for this purpose.

5.6 Other ad hoc methods

Many ad hoc methods rely on relationships among distributions.13 For instance, we know that an Erlang variable Erl(n, λ) is the sum of n independent exponentials Xi with rate λ. Hence, we may use the inverse transform method as follows:

$$X = -\frac{1}{\lambda} \sum_{i=1}^{n} \log U_i = -\frac{1}{\lambda} \log \prod_{i=1}^{n} U_i.$$
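As a small illustration (an assumption on our part, not code from the text), the Erlang recipe can be vectorized as follows:

ErlangRnd <- function(n, lambda, howmany) {
  U <- matrix(runif(n * howmany), nrow = n)
  -log(apply(U, 2, prod)) / lambda     # -(1/lambda) * log of the product of n uniforms
}
mean(ErlangRnd(4, 2, 10000))           # should be close to n/lambda = 2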

By the same token, we could take advantage of the fact that a chi-square variable χ² with n degrees of freedom is obtained by squaring and summing n independent standard normals. However, we have also seen that these distributions may be regarded as specific subcases of the gamma distribution. We recall the PDF of a Gamma(α, λ) random variable:

(5.6) $$f_X(x) = \frac{\lambda^{\alpha}\, x^{\alpha - 1}\, e^{-\lambda x}}{\Gamma(\alpha)}, \qquad x \ge 0,$$

where α is a shape parameter and λ is a rate parameter. Clearly, there is a notable difference between the cases α < 1 and α ≥ 1. Let us focus on the latter one. Different algorithms have been proposed for such a gamma variable. The following one, which we describe for illustrative purposes only, is based on acceptance–rejection:

1. Set d = α − 1/3 and c = 1/√(9d).
2. Generate two independent variates U ~ U(0, 1) and Z ~ N(0, 1), and set V = (1 + cZ)³.
3. If Z > −1/c and log U < 0.5Z² + d − dV + d log V, then return X = dV; otherwise go back to step 2.
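A sketch of these steps, for λ = 1 and illustration only, is the following; in practice rgamma should be preferred:

GammaRnd1 <- function(alpha) {
  d <- alpha - 1/3
  c <- 1 / sqrt(9 * d)
  repeat {
    Z <- rnorm(1)
    U <- runif(1)
    V <- (1 + c * Z)^3
    if (Z > -1/c && log(U) < 0.5 * Z^2 + d - d * V + d * log(V))
      return(d * V)
  }
}
mean(replicate(10000, GammaRnd1(3)))   # the mean of a Gamma(3, 1) variable is 3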

The example also points out the fact that the boundary between ad hoc and general methods is a bit blurred, since we are using acceptance–rejection, but in a very peculiar way. Luckily, we may just rely on the rgamma function in R!

Another interesting case is the multivariate t distribution tν(μ, Σ) (see Section 3.3.4). We may take advantage of the fact that a t variable is the ratio of a standard normal and the square root of a chi-square divided by its degrees of freedom, and the chi-square in turn is a special case of the gamma. To induce correlation, we may use Cholesky factors of the covariance matrix, just as in the case of the normal.

1. Generate a column vector Z ~ N(0, I) of n independent standard normals.
2. Generate S ~ Gamma(ν/2, 1/2), i.e., a chi-square variable with ν degrees of freedom.
3. Compute Y = √(ν/S) Z and the Cholesky factor U of Σ.
4. Return X = μ + UᵀY.

To deal with the multivariate t distribution, we may use R functions from the package mvtnorm, such as rmvt, which has been introduced in Section 3.3.4.
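For illustration, a rough sketch of the above algorithm (not the implementation used by rmvt) might read:

MultiTrnd <- function(mu, Sigma, nu, howmany) {
  n <- length(mu)
  U <- chol(Sigma)
  out <- matrix(0, nrow = howmany, ncol = n)
  for (k in 1:howmany) {
    Z <- rnorm(n)                              # independent standard normals
    S <- rgamma(1, shape = nu/2, rate = 1/2)   # chi-square with nu degrees of freedom
    out[k, ] <- mu + as.vector(t(U) %*% (sqrt(nu/S) * Z))
  }
  out
}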

5.7 Sampling from copulas

Given a copula C(u1, …, un) and marginal CDFs Fi(·), i = 1, …, n, we may generate a vector X of n dependent variables with those marginals by the following scheme:

1. Generate U ~ C(u1, …, un).
2. Return X = (F1−1(U1), …, Fn−1(Un)).

As an instructive example, say that we want to generate a sample of n = 10,000 points (X1, X2), where X1 ~ gamma(2, 1) and X2 ~ N(0, 1), and the dependence is described by the Student’s t copula

$$C(u_1, \ldots, u_n) = T_{\nu, \Sigma}\bigl(T_\nu^{-1}(u_1), \ldots, T_\nu^{-1}(u_n)\bigr),$$

where we denote by Tν,Σ(···) the CDF of a multivariate t distribution with ν degrees of freedom and by Tν(·) the CDF of the corresponding univariate distribution; note that the covariance matrix Σ is actually a correlation matrix, in the sense that its diagonal entries are set to 1. To sample from the copula, we need not invert the multidimensional CDF. We rely on the extension of the following property for univariate distributions:14 If X has CDF FX(·), then FX(X) ~ U(0, 1). Thus, we sample Y ~ tν(0, Σ) from the multidimensional t distribution, as we have seen in the previous section, and then we compute U = (Tν(Y1), …, Tν(Yn)). The vector U takes values in the unit hypercube and has uniform marginals, thanks to the aforementioned property; however, its components reflect the dependence structure associated with the copula. Finally, we generate X by inverting the marginals. The code in Fig. 5.15 accomplishes all of this when ν = 10 and

FIGURE 5.15 Code to sample from a multivariate t copula with gamma and normal marginals.

$$\Sigma = \begin{pmatrix} 1 & 0.8 \\ 0.8 & 1 \end{pmatrix},$$

and it also produces the scatterplot and marginal histograms displayed in Fig. 5.16. Note the nonlinear association between the two variables, which cannot be accounted for by Pearson’s correlation. We urge the reader to play with the code in order to see the effect of changing the correlation in the copula.

FIGURE 5.16 Scatterplot and histograms of the marginals for a bidimensional t copula.
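A sketch of the sampling scheme of Fig. 5.15, under the assumption that the copula correlation is 0.8 as in the fitting exercise below, is the following:

library(mvtnorm)
set.seed(55555)
nu <- 10
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)   # correlation matrix of the copula
Y <- rmvt(10000, sigma = Sigma, df = nu)       # multivariate t sample
U <- pt(Y, df = nu)                            # uniform marginals, dependent components
X1 <- qgamma(U[, 1], shape = 2, rate = 1)      # invert the gamma(2,1) marginal
X2 <- qnorm(U[, 2])                            # invert the standard normal marginal
plot(X1, X2, pch = 20, cex = 0.3)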

All of the above can also be performed using the copula package. With the code in Fig. 5.17 we accomplish two tasks: First we sample from a given copula; then, given the sample, we fit a copula to see if we can get back the copula. The details are as follows:

FIGURE 5.17 Using the package copula to create, sample from, and fit a t copula.

1. We use tCopula to create a bidimensional t copula, with 10 degrees of freedom and correlation parameter 0.8; the copula is specified as exchangeable by dispstr="ex", i.e., the two dimensions behave in the same way. Note that we could also use ellipCopula, which generates an elliptical copula. Such a copula family includes the normal and Student's t, which can be specified by the family parameter.
2. We sample from the copula using rCopula; the output matrix of observations U is transformed into x by using the marginals we wish.
3. To go back and fit the copula, we need to transform the observations into pseudo-observations in the unit square; this may be accomplished by the function pobs. Alternatively, we might fit a marginal and use that. The observations and the pseudo-observations are plotted in Fig. 5.18.

FIGURE 5.18 Transforming observations to fit a copula.

4. Finally, using the pseudo-observations Uhat, we fit a bidimensional t copula.
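A sketch along the lines of Fig. 5.17, using functions from the copula package (the exact code in the figure may differ), is the following:

library(copula)
set.seed(55555)
cop <- tCopula(param = 0.8, dim = 2, df = 10, dispstr = "ex")   # exchangeable t copula
U <- rCopula(10000, cop)                                        # sample from the copula
x <- cbind(qgamma(U[, 1], shape = 2, rate = 1), qnorm(U[, 2]))  # apply the marginals
Uhat <- pobs(x)                                                 # pseudo-observations
fitted.copula <- fitCopula(tCopula(dim = 2, dispstr = "ex"), data = Uhat, method = "mpl")
fitted.copula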

Running the code, we obtain the following fitted copula, which is in good agreement with the one we have used to sample data:

> fitted.copula
fitCopula() estimation based on 'maximum pseudo-likelihood'
and a sample of size 10000.
      Estimate Std. Error z value Pr(>|z|)
rho.1 0.805305   0.005384   149.6   <2e-16 ***
df    9.161193         NA      NA       NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The maximized loglikelihood is 5277
Optimization converged
Number of loglikelihood evaluations:
function gradient
      34       11

For further reading

For an extensive treatment of random number generators, see [11].
Testing procedures for random number generators are described in [5].
Some books on simulation include extensive chapters on random variate generation. See, e.g., [3] or, at a more advanced level, [2].
Advanced methods for the generation of random variates are treated at a research level in [1].

References

1 J.H. Ahrens and U. Dieter. Computer methods for sampling from gamma, beta, Poisson and binomial distributions. Computing, 12:223–246, 1974.

2 D.P. Kroese, T. Taimre, and Z.I. Botev. Handbook of Monte Carlo Methods. Wiley, Hoboken, NJ, 2011.

3 A.M. Law and W.D. Kelton. Simulation Modeling and Analysis (3rd ed.). McGraw-Hill, New York, 1999.

4 P. L’Ecuyer. Efficient and portable combined random number generators. Communications of the ACM, 31:742–749, 1988.

5 C. Lemieux. Monte Carlo and Quasi–Monte Carlo Sampling. Springer, New York, 2009.

6 B.D. Ripley. The lattice structure of pseudo-random number generators. Proceedings of the Royal Society of London; Series A, Mathematical and Physical Sciences, 389:197–204, 1983.

7 B.D. Ripley. Stochastic Simulation. Wiley, New York, 1987.

8 C.P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer, New York, 2004.

9 S. Ross. Simulation (2nd ed.). Academic Press, San Diego, CA, 1997.

10 S. Ross. Introduction to Probability Models (8th ed.). Academic Press, San Diego, CA, 2002.

11 S. Tezuka. Uniform Random Numbers: Theory and Practice. Kluwer Academic Publishers, Boston, 1995.

1These examples are taken from [7, pp. 22–25].

2See, e.g., [6] for a deeper investigation.

3 This may be useful to apply certain variance reduction strategies.

4 The exclusive OR, denoted as ⊕, is defined as follows: (1 ⊕ 0) = (0 ⊕ 1) = 1; (0 ⊕ 0) = (1 ⊕ 1) = 0. In the inclusive OR case, denoted as ∨, we have (1 ∨ 1) = 1.

5 The reader may appreciate the similarity between this line of reasoning and that of Eq. (3.40), where we showed that if X has CDF FX(·), then FX(X) is a uniform random variable.

6 See Section 3.2.4. The definition of the beta distribution, per se, does not require the above condition on its parameters; however, the PDF would have a vertical asymptote for either α1 < 1 or α2 < 1, which would prevent the application of acceptance–rejection.

7 The treatment here follows [8, pp. 47–51].

8 Strictly speaking, the set should be measurable.

9 See Section 3.2.1 and Eq. (3.14).

10 See, e.g., [10] for details.

11 In many numerical analysis textbooks, Cholesky decomposition is expressed as A = LLᵀ, for a lower triangular matrix L; R returns an upper triangular matrix U, but the two forms are clearly equivalent.

12 More on this in Section 13.4.

13 See Sections 3.2.2, 3.2.3, and 3.2.5.

14See Eq. 3.40.