Chapter Two

Numerical Integration Methods

Numerical integration is a standard topic in numerical analysis, and in the previous chapter we have hinted at the link between integration and Monte Carlo methods. In this chapter we have a twofold objective:

On the one hand, we want to insist on the link between numerical integration and Monte Carlo methods, as this provides us with the correct framework to understand some variance reduction methods, such as importance sampling, as well as alternative approaches based on low-discrepancy sequences.
On the other hand, we want to outline classical and less classical approaches to numerical integration, which are deterministic in nature, to stress the fact that there are sometimes valuable alternatives to crude Monte Carlo; actually, stochastic and deterministic approaches to numerical integration should both be included in our bag of tricks and can sometimes be integrated (no pun intended).

In many financial problems we are interested in the expected value of a function of random variables. For instance, the fair price of a European-style option may be evaluated as the discounted expected value of its payoff under a risk-neutral probability measure. In the one-dimensional case, the expected value of a function g(·) of a single random variable X with probability density f_X(x) is given by the following integral:

E[g(X)] = \int_{-\infty}^{+\infty} g(x)\, f_X(x)\, dx

This is just an ordinary and quite familiar integral. We also recall that when we need to estimate the probability of an event A, which may occur or not depending on the realization of a random variable X, we are back to the case above:

P(A) = E[1_A(X)] = \int_{-\infty}^{+\infty} 1_A(x)\, f_X(x)\, dx

where 1_A(x) is the indicator function for event A (taking the value 1 if event A occurs when X = x, and 0 otherwise).

In lucky cases, an integral may be evaluated in analytical or semianalytical form. To illustrate the latter case, consider the Black–Scholes–Merton (BSM) formula for the price of a call option written on a non-dividend-paying stock share at time t:

C = S(t)\,\Phi(d_1) - K e^{-r(T-t)}\,\Phi(d_2)

In the formula, S(t) is the current spot price of the underlying asset, K is the strike price, T is the option maturity, i.e., when it can be exercised, and

d_1 = \frac{\log\bigl(S(t)/K\bigr) + (r + \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}, \qquad d_2 = d_1 - \sigma\sqrt{T-t}

where r is the continuously compounded risk-free rate, and σ is the annualized volatility of the underlying asset. Both r and σ are assumed constant in this model. Finally, Φ(z) is the cumulative distribution function of a standard normal variable:

\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-x^2/2}\, dx

The BSM formula is typically labeled as analytical, even though, strictly speaking, it is semianalytical, since there is no analytical expression for Φ(z); nevertheless, there are quite efficient and accurate ways to approximate this function, and we need not resort to general-purpose numerical integration. In other cases, we do need a numerical approach, typically based on quadrature formulas. In this chapter we will only give the basics of such numerical strategies, but quite sophisticated methods are available. Indeed, there is no need for Monte Carlo methods in low-dimensional problems. The difficulty arises when we deal with a vector random variable X, with support Ω ⊆ ℝ^p, and the corresponding multidimensional integral

E[g(\mathbf{X})] = \int_{\Omega} g(\mathbf{x})\, f_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x}

Product quadrature formulas quickly grow out of hand, in the sense that they require a huge number of points, and this is where Monte Carlo comes into play. Indeed, even sophisticated numerical computing environments only offer limited support for multidimensional integration via classical quadrature formulas.

However, we cannot afford to dismiss deterministic approaches to numerical integration. To see why, consider a stochastic optimization problem, whereby we need the expected value of a function h(·, ·) depending on both a vector of control variables x, modeling our decisions, and a vector of random variables ξ with joint density fξ(·), modeling what we cannot control:

H(\mathbf{x}) = E_{\boldsymbol{\xi}}\bigl[ h(\mathbf{x}, \boldsymbol{\xi}) \bigr] = \int h(\mathbf{x}, \boldsymbol{\xi})\, f_{\boldsymbol{\xi}}(\boldsymbol{\xi})\, d\boldsymbol{\xi}

In such a case, we do not just need a single number, i.e., an expected value estimated by a sample mean, but rather a way to approximate the function H(x), possibly obtained by selecting S scenarios characterized by realizations ξ^s and probabilities π_s, s = 1, …, S:

H(\mathbf{x}) \approx \sum_{s=1}^{S} \pi_s\, h(\mathbf{x}, \boldsymbol{\xi}^s)

As we have already pointed out, Monte Carlo methods may need a large sample size to achieve an acceptable degree of accuracy, and we may not be able to afford that when solving a difficult optimization problem. In this setting, alternative approaches may be valuable. Some of them are deterministic in nature and are best understood with reference to a unified view of numerical integration and random sampling.

We start in Section 2.1 with a brief overview of quadrature formulas, emphasizing the role of interpolation based on polynomials. Then, we outline more sophisticated Gaussian quadrature formulas in Section 2.2. These approaches are developed for one-dimensional integration but, at least in principle, they can be extended to the multidimensional case by product formulas; this idea and its limitations are discussed in Section 2.3. Alternative approaches for high-dimensional problems, such as good lattices, Monte Carlo, and quasi–Monte Carlo, are introduced in Section 2.4. In Section 2.5 we pause to illustrate some important links between statistics and numerical integration, by considering moment matching methods. We close the chapter by outlining some R functions for numerical integration in Section 2.6.

2.1 Classical quadrature formulas

Consider the problem of approximating the value of the one-dimensional definite integral

I[f] = \int_a^b f(x)\, dx

involving a function f(·) of a single variable, over a bounded interval [a, b]. Since integration is a linear operator, i.e.,

I[\alpha f + \beta g] = \alpha I[f] + \beta I[g]

for all functions f, g and real numbers α, β, it is natural to look for an approximation preserving linearity. A quadrature formula for the interval [a, b] is defined by two ingredients:

1. A set of n + 1 nodes xj, j = 0, 1, …, n, such that

a \le x_0 < x_1 < \cdots < x_n \le b

2. A corresponding set of weights wj, j = 0, 1, …, n.

Then, we approximate I[f] by

Q[f] = \sum_{j=0}^{n} w_j\, f(x_j)

To be precise, a quadrature formula like the one we are describing is called a closed formula, since the set of nodes includes the extreme points a and b of the interval. Open formulas are used when the function is not well-behaved near a or b, or when we are integrating on an infinite interval. In this book we will only consider closed formulas, as in most cases of interest for Monte Carlo simulation, as we shall see later, the integration domain is the unit interval [0, 1] or the unit hypercube [0, 1]^p.

2.1.1 THE RECTANGLE RULE

One intuitive way to choose nodes and weights is the rectangle rule. The idea, illustrated in Fig. 2.1, is to regard the integral as an area and to approximate it by a union of rectangles. If we are integrating over the unit interval [0, 1], the rectangle rule may be expressed as

FIGURE 2.1 Illustration of the rectangle rule.

(2.1)    I[f] \approx \frac{1}{n} \sum_{j=0}^{n-1} f\!\left( \frac{j}{n} \right)

Note that, in such a case, we divide the unit interval into n slices, and we take the function value at the left endpoint of each subinterval as the representative value for that slice; if we sum for j in the range from 1 to n, we are considering right endpoints instead.

For the case in the figure, i.e., f(x) = x, it is fairly easy to see that

\frac{1}{n} \sum_{j=0}^{n-1} \frac{j}{n} = \frac{n-1}{2n}

Hence, the absolute integration error is

\left| \int_0^1 x\, dx - \frac{n-1}{2n} \right| = \left| \frac{1}{2} - \frac{n-1}{2n} \right| = \frac{1}{2n}

As expected, the error goes to zero when n goes to infinity, but perhaps we may do better. By “better” we do not only mean faster convergence. One would expect that, at least for such an easy function as f(x) = x, a quadrature formula would be able to hand us the exact result. We shall see next how to accomplish this, but we should mention that the simple rectangle formula can be generalized to much more sophisticated rules based on good lattices, as we shall see later.
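To make the rule concrete, the following minimal R sketch (not taken from the book's figures) implements Eq. (2.1) with left endpoints and checks the 1/(2n) error for f(x) = x:

# rectangle rule on [0,1]: left endpoints of n subintervals, equal weights 1/n
rectangle <- function(f, n) mean(f((0:(n - 1)) / n))

f <- function(x) x
for (n in c(10, 100, 1000))
  cat("n =", n, " approx =", rectangle(f, n),
      " error =", abs(0.5 - rectangle(f, n)), "\n")

For n = 10 the error is 0.05 = 1/20, in agreement with the formula above.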

2.1.2 INTERPOLATORY QUADRATURE FORMULAS

Polynomials are arguably the easiest functions to integrate. We also know, from one of the many Weierstrass theorems, that a continuous function can be approximated on a bounded interval to arbitrary precision by a polynomial of suitably high order. So, it is natural to look for quadrature formulas such that the integration error

E[f] = I[f] - Q[f]

is zero for a wide class of polynomials.

DEFINITION 2.1 (Order of a quadrature formula) We say that a quadrature formula Q is of order m if the integration error is zero for all the polynomials of degree m or less, but there is a polynomial of degree m + 1 such that the error is not zero.

Now, how can we approximate a function by a polynomial? One possibility is represented by Lagrange polynomials, which we outline very briefly. Given a function f(·), say that we want to interpolate it at a given set of points (x_j, y_j), j = 0, 1, …, n, where y_j = f(x_j), and x_j ≠ x_k for j ≠ k. It is easy to find a polynomial P_n of degree (at most) n such that P_n(x_j) = y_j for all j. The Lagrange polynomial L_j(x) is defined as

(2.2)    L_j(x) = \prod_{k=0,\ k \ne j}^{n} \frac{x - x_k}{x_j - x_k}

Note that this is a polynomial of degree n and that

L_j(x_k) = \begin{cases} 1, & k = j \\ 0, & k \ne j \end{cases}

Then, an interpolating polynomial can be easily written as

P_n(x) = \sum_{j=0}^{n} y_j\, L_j(x)

Now, in order to define a quadrature formula, we have to define its nodes. The seemingly obvious way is to consider equally spaced nodes:

x_j = a + j h, \qquad j = 0, 1, \ldots, n

where h = (b − a)/n; also let f_j = f(x_j). As it turns out, this choice need not be the best one, but it is a natural starting point. Selecting equally spaced nodes yields the set of Newton–Cotes quadrature formulas. Given these n + 1 nodes and the corresponding interpolating polynomial P_n(x) of degree n, we compute the quadrature weights as follows:

w_j = \int_a^b L_j(x)\, dx

which yields

(2.3)    I[f] \approx \int_a^b P_n(x)\, dx = \sum_{j=0}^{n} f_j \int_a^b L_j(x)\, dx = \sum_{j=0}^{n} w_j f_j

Consider the case of two nodes only, x_0 = a and x_1 = b, with step h = x_1 − x_0. Here we are just interpolating f by a straight line:

P_1(x) = \frac{x_1 - x}{h}\, f_0 + \frac{x - x_0}{h}\, f_1

A straightforward calculation yields

\int_a^b f(x)\, dx \approx \int_a^b P_1(x)\, dx = \frac{h}{2} \bigl[ f(a) + f(b) \bigr]

A closer look shows that this formula approximates the area below the function using a trapezoid. We may reduce the integration error by applying the formula on subintervals, thus approximating the overall area by trapezoidal elements, as depicted in Fig. 2.2. Thus, we find the trapezoidal rule:

FIGURE 2.2 Illustrating the trapezoidal quadrature formula.

\int_a^b f(x)\, dx \approx h \left[ \frac{f(x_0)}{2} + \sum_{j=1}^{n-1} f(x_j) + \frac{f(x_n)}{2} \right]

This idea is general: Given any quadrature formula for an interval, we can build a composite formula by applying the same pattern to small subintervals of the integration domain.
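As a small illustration (a sketch, not the book's code), the composite trapezoidal rule is easily implemented in R; here it is applied to e^x on [0, 1] with n = 10 subintervals:

# composite trapezoidal rule on [a,b] with n equally spaced subintervals
trapez <- function(f, a, b, n) {
  x <- seq(a, b, length.out = n + 1)
  h <- (b - a) / n
  h * (sum(f(x)) - 0.5 * (f(a) + f(b)))   # interior weights h, endpoint weights h/2
}

trapez(exp, 0, 1, 10)   # about 1.7197
exp(1) - 1              # exact value: 1.718282...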

2.1.3 AN ALTERNATIVE DERIVATION

A quadrature formula based on n + 1 nodes, with the corresponding weights obtained by Lagrange polynomials, is by construction exact for polynomials of degree ≤ n. We may go the other way around and find the weights of a formula by requiring that it attains a given order. For the sake of simplicity, let us consider an integral on the unit interval and the three quadrature nodes 0, 0.5, 1:

\int_0^1 f(x)\, dx \approx w_0 f(0) + w_1 f(0.5) + w_2 f(1)

Since we have three nodes, we should be able to find a formula that is exact for polynomials of degree ≤ 2. Given linearity of integration, we may enforce this requirement by making sure that the formula is exact for the following monomials:

f(x) = 1, \qquad f(x) = x, \qquad f(x) = x^2

The corresponding weights can be found by solving the following system of linear equations:

(2.4)    w_0 + w_1 + w_2 = \int_0^1 1\, dx = 1, \qquad 0.5\, w_1 + w_2 = \int_0^1 x\, dx = \frac{1}{2}, \qquad 0.25\, w_1 + w_2 = \int_0^1 x^2\, dx = \frac{1}{3}

which yields w_0 = w_2 = 1/6 and w_1 = 2/3. Applying the same idea on the more general interval [a, b], we obtain the well-known Simpson’s rule:

\int_a^b f(x)\, dx \approx \frac{b-a}{6} \left[ f(a) + 4 f\!\left( \frac{a+b}{2} \right) + f(b) \right]

It is fairly easy to see that, somewhat surprisingly, this formula is actually exact for polynomials of degree ≤ 3. In fact, we have

\int_a^b x^3\, dx = \frac{b^4 - a^4}{4}

Applying Simpson’s rule we have, by some straightforward algebra:

\frac{b-a}{6} \left[ a^3 + 4 \left( \frac{a+b}{2} \right)^3 + b^3 \right] = \frac{b-a}{4} \left( a^3 + a^2 b + a b^2 + b^3 \right) = \frac{b^4 - a^4}{4}

In order to improve precision, Simpson’s rule may be applied to subintervals of (a, b), resulting in a composite formula.
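A quick R check (a sketch, not from the book) confirms this: Simpson's rule on a single interval reproduces the integral of x^3 exactly, while it fails for x^4:

simpson <- function(f, a, b)
  (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))

simpson(function(x) x^3, 0, 2)   # 4, equal to the exact value 2^4/4
simpson(function(x) x^4, 0, 2)   # 6.6667, while the exact value is 2^5/5 = 6.4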

What we have discussed only scratches the surface of numerical integration. We would also need:

A way to estimate the integration error in order to decide if a more refined partition of the interval is needed.
A way to adapt partitions to the variability of the function in different integration subintervals.
A non-naive way to choose nodes.

In the next section we give a partial answer to the last question. What is fundamental, however, is to start seeing the connection between the system of equations (2.4), which is motivated by a classical problem in numerical analysis, and the statistical concept of moment matching, to be discussed in Section 2.5.

2.2 Gaussian quadrature

In Newton–Cotes formulas, nodes are fixed and suitable weights are sought in order to squeeze the most in terms of quadrature order. With n + 1 nodes, we may find a quadrature formula with order n. However, we might improve the order by finding nodes and weights jointly, as this essentially means doubling the degrees of freedom. In fact, this is the rationale behind Gaussian quadrature. Gaussian quadrature may yield high-quality numerical integration, and it is particularly relevant for some problems in finance, statistics, and stochastic optimization, as it may be interpreted in terms of discretization of a continuous probability distribution. There are different Gaussian quadrature formulas, associated with a non-negative weight function w(x) and an integration interval:

(2.5)    \int_a^b w(x)\, f(x)\, dx \approx \sum_{i=1}^{n} w_i\, f(x_i)

for a weight function w(x) and an integration interval [a, b]. In this section, for the sake of convenience, we consider n nodes x_i, i = 1, …, n. The integration interval can also be (−∞, +∞), in which case we need a weight function w(x) that goes to zero quickly enough to guarantee, as far as possible, the existence of the integral. This is quite relevant for our purposes, as we will relate w(x) to a probability density, such as the density of a normal random variable.1 Indeed, for the integration interval (−∞, +∞), a weight function doing the job is w(x) = e^{−x²}. With such a choice of the weight function, we talk of Gauss–Hermite quadrature formulas. Actually, this weight function is not exactly the density of a normal variable, but it is easy to adjust the formula by a change of variable. Let Y be a random variable with normal distribution N(μ, σ²). Then, recalling the familiar density of a normal variable:

E[g(Y)] = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{+\infty} g(y)\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}\, dy

In order to use weights and nodes from a Gauss–Hermite formula, we need the following change of variable:

x = \frac{y - \mu}{\sqrt{2}\,\sigma}, \qquad \text{i.e.,} \qquad y = \mu + \sqrt{2}\,\sigma x

Hence, we find

(2.6)    E[g(Y)] = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{+\infty} g\bigl( \mu + \sqrt{2}\,\sigma x \bigr)\, e^{-x^2}\, dx \approx \frac{1}{\sqrt{\pi}} \sum_{i=1}^{n} w_i\, g\bigl( \mu + \sqrt{2}\,\sigma x_i \bigr)

In practice, we may regard Gauss–Hermite formulas as a way to approximate a normal distribution by a discrete random variable with realizations x_i and probabilities w_i, i = 1, …, n. In the next section we provide the interested reader with the theoretical background behind Gaussian quadrature, showing the role of orthogonal polynomials, together with some hints about computational issues. However, the practically oriented reader may safely skip it and proceed directly to Section 2.2.2, where we show how to use the corresponding R package.

2.2.1 THEORY OF GAUSSIAN QUADRATURE: THE ROLE OF ORTHOGONAL POLYNOMIALS

There is a rich theory behind Gaussian quadrature formulas, and the key concept is related to families of orthogonal polynomials. In fact, a quadrature formula with maximum order is found by choosing the nodes as the n roots of a polynomial of degree n, selected within a family of polynomials that are orthogonal with respect to the inner product

\langle f, g \rangle = \int_a^b w(x)\, f(x)\, g(x)\, dx

where w(·) is a weight function and the integration interval can be bounded or, with a suitable choice of the weight function, unbounded. The following properties, among many others, can be proved:2

A polynomial of degree n within such a family has n distinct real roots, all located within the interval (a, b).
These roots are interleaved, in the sense that each of the n − 1 roots of the polynomial of degree n − 1 lies in an interval defined by a pair of consecutive roots of the polynomial of degree n.

Using this choice of nodes, along with a proper choice of weights, yields a quadrature formula with order 2n − 1. To see this, let Π_n denote the set of polynomials of degree n, and let us consider a polynomial q ∈ Π_n that is orthogonal to all polynomials in Π_{n−1}, i.e.,

\langle q, p \rangle = \int_a^b w(x)\, q(x)\, p(x)\, dx = 0 \qquad \text{for all } p \in \Pi_{n-1}

One might wonder whether such a polynomial can always be found. The answer lies in how families of orthogonal polynomials can be built, on the basis of recursive formulas that we hint at later. We obtain a sequence of polynomials of increasing degree and, since orthogonal polynomials of degree up to n − 1 form a basis for Πn−1, the next polynomial in the sequence is of degree n and orthogonal to all polynomials in Πn−1.

Now, let us consider a function f ∈ Π_{2n−1}, i.e., a polynomial of degree 2n − 1. Any such polynomial can be divided by the aforementioned q, obtaining a quotient polynomial p and a remainder polynomial r:

f(x) = q(x)\, p(x) + r(x)

where p, r ∈ Π_{n−1}. Now let us integrate the (ordinary) function product w·f by a quadrature formula based on n nodes x_i, i = 1, …, n, chosen as the zeros of q:

\int_a^b w(x)\, f(x)\, dx = \underbrace{\int_a^b w(x)\, q(x)\, p(x)\, dx}_{=\,0,\ \text{since } p \in \Pi_{n-1}} + \int_a^b w(x)\, r(x)\, dx = \sum_{i=1}^{n} w_i\, r(x_i) = \sum_{i=1}^{n} w_i\, f(x_i),

where the second equality holds because the interpolatory formula is exact for r ∈ Π_{n−1}, and the last one because f(x_i) = q(x_i) p(x_i) + r(x_i) = r(x_i) at the zeros of q.

This shows that indeed the formula is exact for polynomials of degree 2n − 1. Using this property, it is possible to prove the following essential facts:

The sum of weights is 1.
Weights are positive.

These are essential facts in light of our probabilistic applications. In fact, different formulas can be adapted to different probability distributions, like we have shown for Gauss–Hermite formulas. The specific family of orthogonal polynomials to be used depends on the weight function and the integration interval. Other examples are:

Gauss–Legendre formulas, where the integration domain is (−1, 1) and the weight function is w(x) = 1.
Gauss–Laguerre formulas, where the integration domain is (0, +∞) and the weight function is w(x) = x^α e^{−x}.

After clarifying the link between orthogonal polynomials and Gaussian quadrature, how can we actually find nodes and weights? There are different approaches.

1. We may build the relevant sequence of orthogonal polynomials by a recursion of the form

p_{j+1}(x) = (x - a_j)\, p_j(x) - b_j\, p_{j-1}(x)

We omit further details, such as the form of the coefficients aj and bj, referring the reader to the references. We just stress two facts:
Since roots of successive polynomials in the sequence are interleaved, the roots of the previous polynomials are useful to initialize fast root-finding algorithms.
After determining the nodes, we may find the corresponding weights, e.g., by solving a system of linear equations or by using a set of formulas involving the underlying orthogonal polynomials and their derivatives; see, e.g., [10, Chapter 4].
2. Another kind of approach has the same starting point as the previous one, but the recursive formulas are used to build a tridiagonal and symmetric matrix, whose eigenvalues are, in fact, the roots we need. Indeed, the R functions we describe in the next section follow an approach in this vein; see [4].

2.2.2 GAUSSIAN QUADRATURE IN R

Gaussian quadrature in R is accomplished by the package statmod, which offers the following two functions:

gauss.quad(n, kind="legendre"), whose main input arguments are the number n of nodes and weights, and the kind of quadrature, which can be of many types, including "hermite"; "legendre" is the default setting of the parameter. If we choose this function, we have to use the transformations of Eq. (2.6) to deal with the normal distribution.
If we want a more direct approach, we may use
gauss.quad.prob(n, dist="uniform", l=0, u=1, mu=0, sigma=1, alpha=1, beta=1),
which allows us to select a distribution among "uniform", "normal", "beta", or "gamma", along with the relevant parameters.3

In both cases, the output is a list including weights and nodes. To illustrate, let us check a Gauss–Hermite quadrature with 5 nodes, for a normal random variable with parameters μ = 50 and σ = 20:

> mu <- 50
> sigma <- 20
> N <- 5
> out <- gauss.quad(N, kind="hermite")
> w1 <- out$weights/sqrt(pi)
> x1 <- sqrt(2)*sigma*out$nodes + mu
> out <- gauss.quad.prob(N, dist="normal", mu=mu, sigma=sigma)
> w2 <- out$weights
> x2 <- out$nodes
> rbind(w1,w2)
[,1] [,2] [,3] [,4] [,5]
w1 0.01125741 0.2220759 0.5333333 0.2220759 0.01125741
w2 0.01125741 0.2220759 0.5333333 0.2220759 0.01125741
> rbind(x1,x2)
[,1] [,2] [,3] [,4] [,5]
x1 −7.1394 22.88748 50 77.11252 107.1394
x2 −7.1394 22.88748 50 77.11252 107.1394

We immediately verify that the same results are obtained by using the two functions, provided that we apply the correct variable transformations. The nodes, as expected, are symmetrically centered around the expected value. Furthermore, we may also check that probabilities add up to one and we get the correct first- and second-order moments:

> sum(w1)
[1] 1
> sum(w1*x1)
[1] 50
> sqrt(sum(w1*x1^2)-sum(w1*x1)^2)
[1] 20

Readers familiar with the concepts of skewness and kurtosis will also appreciate the following results:4

> sum(w1*(x1-50)^3)/20^3
[1] −8.526513e-16
> sum(w1*(x1-50)^4)/20^4
[1] 3

Within numerical accuracy, the above results are correct for a normal distribution: We are able to match moments of order 3 and 4 as well, with only five nodes. We will come back to this in Section 2.5. All of the results above are related to the quadrature order of Gauss–Hermite formulas when applied to polynomials. With a more complicated function, say, an exponential, we cannot expect exact results; so, let us compute the expected value of an exponential function of a normal random variable. In this particular case we can even afford an instructive comparison with the exact result. Indeed, given the properties of the lognormal distribution,5 we know that if X ~ N(μ, σ2), then

E\bigl[ e^X \bigr] = e^{\mu + \sigma^2/2}

The script displayed in Fig. 2.3 computes the percentage error for μ = 4 and σ = 2, when 5, 10, 15, and 20 nodes are used. Running the script, we observe that a remarkable precision is achieved with a fairly modest number of nodes:

FIGURE 2.3 Script to check Gauss–Hermite quadrature.

N= 5 True= 403.4288 Approx= 398.6568 %error 1.182865
N= 10 True= 403.4288 Approx= 403.4286 %error 5.537709e-05
N= 15 True= 403.4288 Approx= 403.4288 %error 1.893283e-10
N= 20 True= 403.4288 Approx= 403.4288 %error 9.863052e-14
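The script of Fig. 2.3 is not reproduced here; the following sketch (my reconstruction, assuming the statmod package) produces output of this kind:

library(statmod)
mu <- 4; sigma <- 2
true <- exp(mu + sigma^2 / 2)   # exact value of E[exp(X)] for X ~ N(mu, sigma^2)
for (N in c(5, 10, 15, 20)) {
  out <- gauss.quad.prob(N, dist = "normal", mu = mu, sigma = sigma)
  approx <- sum(out$weights * exp(out$nodes))
  cat("N=", N, " True=", true, " Approx=", approx,
      " %error", 100 * abs(true - approx) / true, "\n")
}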

These results explain why Gaussian quadrature is used, e.g., when solving stochastic dynamic optimization problems by dynamic programming. However, there remains an issue: How does all of this scale to higher dimensions? We investigate the issue next; unfortunately, the answer is not quite positive.

2.3 Extension to higher dimensions: Product rules

Let us consider a function of p variables

f(x_1, x_2, \ldots, x_p)

How can we integrate this function over a domain Ω? We know from Fubini’s theorem that a multidimensional integral can be transformed into a sequence of nested one-dimensional integrals. If we consider the case in which the integration region is just the Cartesian product of subsets,

\Omega = \Omega_1 \times \Omega_2 \times \cdots \times \Omega_p

we have

\int_{\Omega} f(\mathbf{x})\, d\mathbf{x} = \int_{\Omega_1} \left[ \int_{\Omega_2} \cdots \left[ \int_{\Omega_p} f(x_1, x_2, \ldots, x_p)\, dx_p \right] \cdots\, dx_2 \right] dx_1

It is worth mentioning that this is indeed the relevant case for Monte Carlo simulation, where Ω = [0, 1]^p = [0, 1] × [0, 1] × ··· × [0, 1].

Therefore, it seems quite natural to extend quadrature formulas to multidimensional integration by using a product rule reflecting the above structure. More precisely, let us assume that we have weights and nodes for a Newton–Cotes quadrature formula along each dimension: For dimension k, k = 1, …, p, we have weights w_{ik} and nodes x_{ik}, i = 1, …, m_k. A product rule approximates the integral as

\int_{\Omega} f(\mathbf{x})\, d\mathbf{x} \approx \sum_{i_1=1}^{m_1} \sum_{i_2=1}^{m_2} \cdots \sum_{i_p=1}^{m_p} w_{i_1 1}\, w_{i_2 2} \cdots w_{i_p p}\, f(x_{i_1 1}, x_{i_2 2}, \ldots, x_{i_p p})

The idea behind a product rule is quite intuitive, and it is illustrated in Fig. 2.4. From the figure, we may immediately appreciate the drawbacks of the approach:

FIGURE 2.4 An illustration of product rules.

If N nodes per dimension are required to achieve a satisfactory precision, a total of N^p nodes is required; this exponential growth makes the approach impractical for high-dimensional integrals.
A subtler issue is that if we want to add a few points to improve precision, there is no obvious way to do it; adding a single point would disrupt the overall structure.
Furthermore, we also observe that the same coordinates are used repeatedly for each dimension: The grid of Fig. 2.4 consists of 11 × 11 = 121 points in the plane, but if we project them on each coordinate axis, we only find 11 points. One might wonder if there is a better way to gather information about the function.

Since such a regular grid is impractical for large p, we have to find alternative approaches, either deterministic or stochastic.
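To make product rules concrete, the following sketch (an illustration, not the book's code) integrates a function of two variables over the unit square by combining one-dimensional Gauss–Legendre nodes and weights from the statmod package introduced in Section 2.2.2:

library(statmod)
# product rule on [0,1]^2 with m nodes per dimension; weights sum to one
prod_rule <- function(f, m) {
  q <- gauss.quad.prob(m, dist = "uniform", l = 0, u = 1)
  total <- 0
  for (i in 1:m)
    for (j in 1:m)
      total <- total + q$weights[i] * q$weights[j] * f(q$nodes[i], q$nodes[j])
  total
}

prod_rule(function(x1, x2) exp(x1 + x2), 5)   # close to the exact value (e-1)^2

With m = 5 nodes per dimension we use 25 points; with, say, p = 20 dimensions the same accuracy would require 5^20 points, which is clearly out of reach.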

2.4 Alternative approaches for high-dimensional integration

In order to overcome the limitations of classical quadrature, alternative approaches have been developed. Needless to say, random sampling strategies collectively known as Monte Carlo methods play a prominent role in this respect, but we should not take for granted that random methods are the only alternative. In fact, there are deterministic strategies that work well in medium-sized problems:

Lattice methods
Low-discrepancy sequences

Here we just introduce Monte Carlo methods within the framework of numerical integration, leaving further development to later chapters. Low-discrepancy sequences are also just outlined here, as they are the subject of Chapter 9. We discuss lattice methods in some more detail, as they will not be dealt with elsewhere in this book.

2.4.1 MONTE CARLO INTEGRATION

To see how random sampling may be used to compute a definite integral, which is a deterministic quantity, let us consider a one-dimensional integral on the unit interval [0, 1]:

I = \int_0^1 g(u)\, du = \int_0^1 g(u) \cdot 1\, du

The last rewriting, where we multiply the function g(u) by 1, may not look like much of an achievement. However, let us recall that the density of a uniform random variable U on the unit interval [0, 1] is, in fact,

f_U(u) = \begin{cases} 1, & 0 \le u \le 1 \\ 0, & \text{otherwise} \end{cases}

Therefore, we may think of this integral as the expected value

I = E[g(U)]

for U ~ U(0, 1). Now it is natural to estimate the expected value by a sample mean. What we have to do is to generate a sequence {Ui} of independent random realizations from the uniform distribution, and then to estimate I as

\hat{I}_m = \frac{1}{m} \sum_{i=1}^{m} g(U_i)

The strong law of large numbers implies that, with probability 1,

\lim_{m \to \infty} \hat{I}_m = I

Genuine random sampling is not really feasible with a computer, but we can produce a sequence of pseudorandom numbers using generators provided by most programming languages and environments, including R, as we shall see in Chapter 5.

Example 2.1 A first Monte Carlo integration

Consider the trivial case

I = \int_0^1 e^u\, du = e - 1 \approx 1.718282

To sample the uniform distribution, we rely on the runif function, whose first argument is the sample size; we set the seed of the generator in order to allow replication of the experiment:

> set.seed(55555)
> mean(exp(runif(10)))
[1] 1.657509
> mean(exp(runif(10)))
[1] 1.860796
> mean(exp(runif(10)))
[1] 1.468096
> mean(exp(runif(100)))
[1] 1.701009
> mean(exp(runif(100)))
[1] 1.762114
> mean(exp(runif(100)))
[1] 1.706034
> mean(exp(runif(1000000)))
[1] 1.718733

Clearly, a sample size m = 10 does not yield a reliable estimate, and m = 100 is just a bit better. A large sample size is needed to get closer to the actual result. Monte Carlo looks extremely straightforward and flexible, but not necessarily quite efficient. It is useful to compare the above results with those obtained by Gaussian quadrature:

> require(statmod)
Loading required package: statmod
> out <- gauss.quad.prob(10, dist="uniform", l=0, u=1)
> sum(out$weights*exp(out$nodes))
[1] 1.718282

Needless to say, we do not know the exact result in practice, and we should wonder how to qualify the reliability of the estimate. This can be done by computing a confidence interval as follows:

> set.seed(55555)
> out <- t.test(exp(runif(100)))
> out$estimate
mean of x
1.717044
> out$conf.int
[1] 1.621329 1.812759
attr(,“conf.level”)
[1] 0.95

Note how a confidence interval for a mean is obtained in R by what may seem a weird route, using the t.test function, which returns a data structure including the estimate and, by default, a 95% confidence interval, among other things. This is due to the link between testing hypotheses about the mean of a normal population and confidence intervals for the mean. Even though the result may not look bad, the 95% confidence interval points out the issues with Monte Carlo. A large sample size is needed, unless suitable variance reduction strategies are employed, as we discuss at length in Chapter 8.

We will learn later why integrating on the unit interval is so relevant, but we may also estimate the expected value with respect to other distributions. For instance, we have already considered the expected value of the exponential function of a normal random variable. In Section 2.2.2 we have seen that E[e^X] = 403.4288 for X ~ N(4, 2²). Here we sample the normal distribution using rnorm:6

> set.seed(55555)
> X=exp(rnorm(100,mean=4,sd=2))
> t.test(X)
            One Sample t-test
data: X
t = 4.1016, df = 99, p-value = 8.426e-05
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
153.2929 440.5990
sample estimates:
mean of x
296.9459

Once again, the results are disappointing. Indeed, for one-dimensional integration, Monte Carlo is hardly competitive with deterministic quadrature, but when computing a multidimensional integral it may be the only viable option. In general, if we have an integral like

(2.7)    I[g] = \int_{\Omega} g(\mathbf{x})\, d\mathbf{x}

where Ω ⊂ ℝ^p, we may estimate I[g] by randomly sampling a sequence of points x_i ∈ Ω, i = 1, …, m, and building the estimator

(2.8)    \hat{I}_m[g] = \mathrm{vol}(\Omega)\, \frac{1}{m} \sum_{i=1}^{m} g(\mathbf{x}_i)

where vol(Ω) denotes the volume of the region Ω. To understand the formula, we should note that the sample mean of the function values estimates the average value of the function over Ω, which must be multiplied by the volume of the integration region in order to find the integral.

We will see that in practice we need only to integrate over the unit hypercube, i.e.,

\Omega = [0,1]^p = \bigl\{ \mathbf{x} \in \mathbb{R}^p : 0 \le x_j \le 1,\ j = 1, \ldots, p \bigr\}

hence vol(Ω) = 1. More generally, if we consider a vector random variable

\mathbf{X} = (X_1, X_2, \ldots, X_p)

with joint density function f_X(x_1, …, x_p), we have to sample X to estimate:

E[g(\mathbf{X})] = \int g(x_1, \ldots, x_p)\, f_{\mathbf{X}}(x_1, \ldots, x_p)\, dx_1 \cdots dx_p

The good news is that, as we know from elementary inferential statistics, the half-length of a standard confidence interval with confidence level 1 − α is

H = z_{1-\alpha/2}\, \frac{\sigma}{\sqrt{n}}

assuming that the sample size n is large enough to warrant use of quantiles z_{1−α/2} from the standard normal distribution. Here σ is the standard deviation of the estimator, and we see that H does not depend on the problem dimensionality7 p, whereas product rules based on classical quadrature are highly impractical for large p. The bad news is in the denominator, √n, which does not grow quickly enough. In fact, to reduce H by a factor of 10, we have to increase n by a factor of 100, which shows that Monte Carlo estimates may converge slowly in practice.
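The following small experiment (a sketch, not from the book) illustrates the point on the integral of Example 2.1: multiplying the sample size by 100 shrinks the confidence interval by a factor of about 10:

set.seed(55555)
for (n in c(100, 10000, 1000000)) {
  ci <- t.test(exp(runif(n)))$conf.int        # 95% confidence interval
  cat("n =", n, " half-length =", diff(ci) / 2, "\n")
}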

2.4.2 LOW-DISCREPANCY SEQUENCES

As we have seen in the previous sections, the performance of naive Monte Carlo is far from impressive for small sample sizes. We may also get a further intuition by plotting a bidimensional random sample on the unit square. The following R commands

> set.seed(55555)
> plot(runif(100),runif(100),pch=20)
> grid(nx=10)

produce the plot in Fig. 2.5. Here we are sampling a bidimensional uniform distribution, with independent marginals. We observe that the coverage is far from uniform. Using the grid(nx=10) function, we plot a grid of 10 × 10 small squares, many of which do not contain any point (ideally, each subsquare should contain exactly one). One possibility, of course, is to increase the sample size. In Chapter 9 we will see that we may define an objective measure of the lack of uniform coverage of the unit hypercube, namely, the star discrepancy. Good sets of points should have low discrepancy, and this leads to the concept of low-discrepancy sequences. These are deterministic sequences but, unlike grids produced by product quadrature rules, they do not display any apparent regularity. The Halton sequence, arguably the simplest low-discrepancy sequence, is included in the randtoolbox library along with other sequences, such as the Sobol sequence. To illustrate the idea, let us generate and plot a two-dimensional sequence of 100 points using the Halton sequence:

FIGURE 2.5 A uniform random sample within the unit square.

> library(randtoolbox)
> x <- halton(100,dim=2)
> plot(x[,1], x[,2], pch=20)
> grid(nx=10)

The resulting sequence, displayed in Fig. 2.6, looks more evenly distributed than the random one of Fig. 2.5. More formally, while Monte Carlo has an error of order O(1/√n), low-discrepancy sequences have an error of order O((log n)^p/n). This is bad news on the one hand, since the problem dimensionality p does play a role; on the other hand, we get rid of the square root, which is what makes Monte Carlo not quite efficient. In practice, it turns out that low-discrepancy sequences are efficient for problems with a moderate dimensionality. Another point worth mentioning is that if we want to add one point to a low-discrepancy sequence, we can do so without disrupting its structure, just like with random sequences; this is not true for regular grids. We should also note that deterministic and stochastic methods are not mutually exclusive, as scrambled low-discrepancy sequences can also be used, as we shall see in Chapter 9.

FIGURE 2.6 A bidimensional low-discrepancy sequence of points.
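As a quick comparison (a sketch, assuming the randtoolbox package), we may integrate a simple function over the unit square, whose exact value is (e − 1)², using both pseudorandom and Halton points:

library(randtoolbox)
set.seed(55555)
g <- function(x) exp(x[, 1] + x[, 2])   # integral over [0,1]^2 is (e-1)^2
n <- 1000
U <- matrix(runif(2 * n), ncol = 2)     # pseudorandom sample
H <- halton(n, dim = 2)                 # Halton low-discrepancy points
c(MC = mean(g(U)), Halton = mean(g(H)), exact = (exp(1) - 1)^2)

Typically, the low-discrepancy estimate turns out to be closer to the exact value for the same number of points, consistently with the error rates mentioned above.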

2.4.3 LATTICE METHODS

Another class of deterministic integration methods that is often discussed in conjunction with low-discrepancy sequences is based on lattices. Since the term “lattice” is used in diverse ways with different meanings, it is best to state clearly what is meant in this context.

DEFINITION 2.2 (Lattice) A p-dimensional lattice is a discrete subset of ℝ^p that is closed under addition and subtraction.

A familiar lattice is depicted in Fig. 2.4. Indeed, a regular grid associated with product rules is a lattice or, maybe more precisely, a lattice point set. Lattices may be associated with a basis, i.e., a linearly independent set of vectors {w1, …, wp}, called generators, such that the whole lattice is obtained by taking suitable linear combinations of the generators:

(2.9)    L = \left\{ \mathbf{x} \in \mathbb{R}^p : \mathbf{x} = \sum_{j=1}^{p} z_j \mathbf{w}_j,\ \mathbf{z} = (z_1, \ldots, z_p) \in \mathbb{Z}^p \right\}

where ℤ^p is the set of p-dimensional vectors with integer coordinates. Actually, different bases may generate the same lattice.

Example 2.2 A regular grid as a lattice

Consider the following basis:

\mathbf{w}_1 = (0.2,\ 0)^{\mathsf{T}}, \qquad \mathbf{w}_2 = (0,\ 0.2)^{\mathsf{T}}

where the superscript T denotes vector and matrix transposition. This basis generates a grid of points with vertical and horizontal spacing given by 0.2. If we take the intersection of this grid with the half-open square [0, 1)², we find points with coordinates in the set

\{ 0,\ 0.2,\ 0.4,\ 0.6,\ 0.8 \}

Clearly, there are N = 25 points in the set. Now, let us consider the matrix formed by lining up the vectors of the basis in columns:

W = \begin{bmatrix} 0.2 & 0 \\ 0 & 0.2 \end{bmatrix}

Clearly, 1/det(W) = 25, i.e., the determinant of W is related to the number N of points in the intersection between the lattice and the half-open unit square. As we shall see, this is no coincidence. More generally, if we consider a basis formed by vectors

\mathbf{w}_j = \frac{1}{n_j}\, \mathbf{e}_j, \qquad j = 1, \ldots, p

where e_j is the jth unit vector in ℝ^p and n_j is an integer associated with dimension j, we obtain a grid with

N = n_1 n_2 \cdots n_p

points. If we adopt the same spacing along each dimension, i.e., we consider a single value n, we obtain a grid with N = n^p points.

In the following, we will be interested in integrating a function on the half-open unit hypercube [0, 1)^p. One might wonder why we should consider [0, 1)^p rather than [0, 1]^p. From a theoretical point of view, lattice rules are designed for fairly smooth and periodic functions,8 such that f(x) = f(x + z) for any x ∈ ℝ^p and z ∈ ℤ^p. Thus, in lattice rules, the North–East border is ruled out, as there the function is supposed to take the same values as on the opposite border, and we do not want to double-count them. When applying lattice rules to more general functions, in principle, nothing changes, since we are ruling out a set of measure zero. From a numerical point of view, we should also note that many pseudorandom number generators employed in Monte Carlo integration actually produce numbers in the interval [0, 1).

Using Eq. (2.9), an integration lattice Lp is generated; the lattice point set is just the intersection

P_N = L_p \cap [0, 1)^p

which includes N points ui, i = 1, …, N. The integral of a function f(x) on the unit hypercube is approximated as

\int_{[0,1)^p} f(\mathbf{x})\, d\mathbf{x} \approx \frac{1}{N} \sum_{i=1}^{N} f(\mathbf{u}_i)

Note that all weights are equal to 1/N, rather than being derived from an interpolatory quadrature formula, and that this rule can be associated with sampling. We may also consider lattice rules as a generalization of the rectangle rule of Section 2.1.1. The key feature of lattices is that they can be generated in a more general way than a rectangular grid, improving the accuracy of numerical integration. To this aim, a proper choice of the basis is in order.

To get a better understanding of lattices, it is useful to associate a “unit cell” with each choice of generators. The unit cell is generated by taking linear combinations as follows:

\bigl\{ \lambda_1 \mathbf{w}_1 + \lambda_2 \mathbf{w}_2 + \cdots + \lambda_p \mathbf{w}_p : 0 \le \lambda_j < 1,\ j = 1, \ldots, p \bigr\}

It is easy to see that this is a parallelepiped. To illustrate, let us consider the basis

(2.10) equation

The corresponding lattice is displayed in Fig. 2.7, along with an arbitrarily selected shaded unit cell. We note the following:

FIGURE 2.7 Lattice generated by the basis in Eq. (2.10).

The unit shaded cell can be shifted repeatedly to generate the lattice.
The area of the parallelepiped is just the determinant of the matrix W whose columns are the generators.
This area is easily seen to be det(W) = 1/5.
If we imagine tiling the space by shifting the unit cell, we see that the average number of lattice points per unit volume is just 1/ det(W) = 5.
Indeed, in the unit square [0, 1)2, we have five points, the vertices of the shaded cell plus the origin.

When choosing the basis, we have to fix the number N of points that we want to include in the point set, and this in turn is influenced by the choice of generators. Actually, we need not select p generators. We may simplify the choice and the related analysis by selecting a number r < p of generators. This results in rank-r lattice rules, which may be rewritten as

Q[f] = \frac{1}{n_1 \cdots n_r} \sum_{i_1=0}^{n_1 - 1} \cdots \sum_{i_r=0}^{n_r - 1} f\!\left( \left( \frac{i_1}{n_1} \mathbf{w}_1 + \cdots + \frac{i_r}{n_r} \mathbf{w}_r \right) \bmod 1 \right)

Note that by taking a number x modulo 1 we obtain its fractional part. The simplest rank-r rule is obtained for r = 1, and an example of rank-1 rule is the Korobov point set, which is generated as follows:

\mathbf{u}_i = \frac{i}{N} \bigl( 1,\ a,\ a^2,\ \ldots,\ a^{p-1} \bigr) \bmod 1, \qquad i = 1, \ldots, N

This point set can be also generated by the basis

\mathbf{w}_1 = \frac{1}{N} \bigl( 1,\ a,\ a^2,\ \ldots,\ a^{p-1} \bigr)^{\mathsf{T}}, \qquad \mathbf{w}_2 = \mathbf{e}_2, \quad \ldots, \quad \mathbf{w}_p = \mathbf{e}_p

The determinant is clearly 1/N, and in fact the point set, with a proper choice of parameters, includes N points. An R function to generate a Korobov point set is illustrated in Fig. 2.8. In the code we take the increasing powers of a immediately modulo N, rather than computing possibly large powers of a and then applying the modulo operator; this may avoid issues with overflow or loss of precision. The following snapshot generates the point set displayed in Fig. 2.9.

FIGURE 2.8 Function to generate a Korobov point set.

FIGURE 2.9 Korobov point set for a = 3, d = 2, N = 10.
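Since the code of Fig. 2.8 is not reproduced here, the following sketch (my reconstruction, consistent with the snapshot below) generates a Korobov point set:

# Korobov point set: N points in dimension d, for a given multiplier a
korobov <- function(a, d, N) {
  z <- numeric(d)                   # generating vector (1, a, a^2, ...),
  z[1] <- 1                         # with powers reduced modulo N at each step
  if (d > 1)
    for (k in 2:d) z[k] <- (z[k - 1] * a) %% N
  X <- matrix(0, nrow = N, ncol = d)
  for (i in 1:N)
    X[i, ] <- ((i * z) %% N) / N    # point (i/N)*z mod 1
  X
}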

> a <- 3
> d <- 2
> N <- 10
> X <- korobov(a, d, N)
> X
      [,1] [,2]
 [1,]  0.1  0.3
 [2,]  0.2  0.6
 [3,]  0.3  0.9
 [4,]  0.4  0.2
 [5,]  0.5  0.5
 [6,]  0.6  0.8
 [7,]  0.7  0.1
 [8,]  0.8  0.4
 [9,]  0.9  0.7
[10,]  0.0  0.0
> plot(X[,1],X[,2],pch=20)

This choice of parameters is good in the sense that the resulting point set is projection regular. A point set consisting of N points is projection regular if the projection of its points onto each coordinate axis consists of N distinct points:

> sort(X[,1])
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
> sort(X[,2])
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

We observe that a regular grid does not enjoy such a property, as is clear in Fig. 2.4. For instance, if we build a 5 × 5 grid, we have 25 points on the grid, but only 5 along each projection. This “shadowing” effect may imply that we do not collect rich enough information about the function.9 Projection regularity is not to be taken for granted. For instance, if we take N = 12 points in the above sequence, the point set is less satisfactory:

> N <- 12
> X <- korobov(a,d,N)
> sort(X[,1])
[1] 0.00000000 0.08333333 0.16666667 0.25000000
0.33333333 0.41666667 0.50000000 0.58333333
[9] 0.66666667 0.75000000 0.83333333 0.91666667
> sort(X[,2])
[1] 0.00 0.00 0.00 0.25 0.25 0.25 0.50 0.50 0.50 0.75 0.75 0.75

In this case, we do not have 12 distinct points along the second dimension. There are theoretical conditions ensuring projection regularity, as well as tables suggesting parameters in order to generate good lattices.

Another important feature of lattice rules is that they are associated with a fixed number N of points. Unlike pseudorandom or low-discrepancy sequences, there is no obvious way to add a single point. However, extensible lattice rules have been proposed in the literature.

2.5 Relationship with moment matching

In Section 2.1.3, when developing quadrature formulas, we have considered matching the integral of a monomial exactly:

\int_0^1 x^k\, dx = \sum_{j} w_j\, x_j^k = \frac{1}{k+1}

If we interpret integrals as expectations and quadrature formulas as discretizations of continuous probability distributions, with realizations x_j and probabilities w_j, we may wonder if there is a link with matching moments E[X^k] of the continuous random variable with the corresponding moments of the discrete random variable approximating X. Indeed, the link is pretty evident for Gaussian quadrature, as we have observed in Section 2.2.2. In the following two sections we further illustrate the idea with reference to scenario generation. Again, it is important to understand that plain Monte Carlo scenario generation need not be the best option.

2.5.1 BINOMIAL LATTICES

A widely used approach to price financial options relies on a binomial lattice10 discretizing the continuous-time, continuous-state stochastic process followed by the underlying variable. The underlying variable could be the price S(t) of a stock share. Let S0 be the price of the asset at time t = 0, and consider the new price Sδt after a time period δt. We discretize the process with respect to both time and price, assuming that the new price can only take two values:

S_{\delta t} = \begin{cases} S_0 u, & \text{with probability } p \\ S_0 d, & \text{with probability } 1 - p \end{cases}

Here u is a multiplicative shock corresponding to an “up” move, whereas d corresponds to a “down” move. The advantage of using multiplicative shocks is that the lattice recombines: Since S0ud = S0du, an up-down sequence leads to the same price as a down-up sequence. The resulting lattice is illustrated in Fig. 2.10. Clearly, the size of a recombining lattice grows linearly in time, whereas a non-recombining tree explodes exponentially. If we assume that the parameters u, d, and p apply to the whole lattice, how can we choose them? Since we have three degrees of freedom, we may match two moments of the random variable Sδt. Let

FIGURE 2.10 A recombining binomial lattice.

m = E[S_{\delta t}], \qquad v = \mathrm{Var}(S_{\delta t})

We will see in Chapter 3 that if we assume a geometric Brownian motion, the resulting distribution is lognormal, and its expected value and variance are related to the drift and the volatility of the stochastic process; for now, let us just take the above values as given. The moment matching conditions are

p\, S_0 u + (1-p)\, S_0 d = m, \qquad p\, (S_0 u)^2 + (1-p)\, (S_0 d)^2 - m^2 = v

We are left with a residual degree of freedom that can be used to set additional conditions. One possible choice is p = 0.5; another widely used choice is u = 1/d. In general, we find a system of possibly nonlinear equations that can be solved exactly or approximately, depending on its complexity. For instance, if S(t) follows a geometric Brownian motion with drift μ and volatility σ, a common calibration of the lattice yields11

u = e^{\sigma\sqrt{\delta t}}, \qquad d = e^{-\sigma\sqrt{\delta t}} = \frac{1}{u}, \qquad p = \frac{e^{\mu\,\delta t} - d}{u - d}

For small values of δt, this discretized lattice does approximate very well the continuous-time, continuous-state stochastic process, allowing for efficient pricing of options, including those featuring early exercise opportunities. However, the approach can only be applied to low-dimensional problems. A subtler issue is path dependency: Some options, like Asian-style options, have a payoff depending on the whole path of the underlying variables. The recombining binomial lattice has a Markovian feature preventing its direct use in such a case. For high-dimensional and path-dependent options, Monte Carlo is the method of choice.
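As a quick sanity check (a sketch, not the book's code, assuming the calibration u = e^{σ√δt}, d = 1/u, p = (e^{μδt} − d)/(u − d) given above), we may verify in R that the lattice matches the lognormal mean exactly and the variance approximately for small δt:

S0 <- 100; mu <- 0.1; sigma <- 0.4; dt <- 0.01
u <- exp(sigma * sqrt(dt)); d <- 1 / u
p <- (exp(mu * dt) - d) / (u - d)
# first two moments of the discrete (lattice) distribution
m.latt <- p * S0 * u + (1 - p) * S0 * d
v.latt <- p * (S0 * u)^2 + (1 - p) * (S0 * d)^2 - m.latt^2
# moments of the lognormal price under geometric Brownian motion
m.true <- S0 * exp(mu * dt)
v.true <- S0^2 * exp(2 * mu * dt) * (exp(sigma^2 * dt) - 1)
c(m.latt, m.true)   # means coincide by construction
c(v.latt, v.true)   # variances are close for small dt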

2.5.2 SCENARIO GENERATION IN STOCHASTIC PROGRAMMING

We have introduced stochastic programming models in Section 1.4.2. One of the main issues when formulating such problems is scenario generation. Given, e.g., a multivariate normal variable Y, modeling the random data, with expected value μ and covariance matrix Σ, we might use Monte Carlo sampling to generate a discrete set of equiprobable scenarios. As an alternative, we may look for a set of scenarios matching moments and, possibly, a few more desirable properties. Since we are dealing with a multivariate normal, we know that for each marginal Yi skewness should be zero and kurtosis should be 3:

\frac{E[(Y_i - \mu_i)^3]}{\sigma_i^3} = 0, \qquad \frac{E[(Y_i - \mu_i)^4]}{\sigma_i^4} = 3

Say that we want to generate a scenario fan (the single stage of a tree) of size S and that each realization has probability 1/S, for the sake of simplicity. Let us denote by y_{is} the value of the random variable Y_i in scenario s. Then, we should have

\frac{1}{S} \sum_{s=1}^{S} y_{is} = \mu_i, \qquad \frac{1}{S} \sum_{s=1}^{S} (y_{is} - \mu_i)(y_{js} - \mu_j) = \sigma_{ij}, \qquad \frac{1}{S} \sum_{s=1}^{S} \frac{(y_{is} - \mu_i)^3}{\sigma_i^3} = 0, \qquad \frac{1}{S} \sum_{s=1}^{S} \frac{(y_{is} - \mu_i)^4}{\sigma_i^4} = 3

Approximate moment matching is obtained by solving the problem:

\min_{\mathbf{y}}\ w_1 \sum_{i} \left( \frac{1}{S} \sum_{s} y_{is} - \mu_i \right)^{\!2} + w_2 \sum_{i,j} \left( \frac{1}{S} \sum_{s} (y_{is} - \mu_i)(y_{js} - \mu_j) - \sigma_{ij} \right)^{\!2} + w_3 \sum_{i} \left( \frac{1}{S} \sum_{s} \frac{(y_{is} - \mu_i)^3}{\sigma_i^3} \right)^{\!2} + w_4 \sum_{i} \left( \frac{1}{S} \sum_{s} \frac{(y_{is} - \mu_i)^4}{\sigma_i^4} - 3 \right)^{\!2}

The objective function includes four weights w_k, which may be used to fine-tune performance. Again, this is a deterministic approach that should be compared against random sampling. We will consider scenario generation for stochastic programming models in Section 10.5.3.
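To give the flavor of the approach in the simplest univariate case, the following sketch (an illustration, not the book's code) uses optim to find S = 20 equiprobable scenarios approximately matching the first four moments of a standard normal; all weights w_k are set to 1:

# approximate moment matching for a univariate standard normal
S <- 20
obj <- function(y) {
  m <- mean(y); v <- mean((y - m)^2)
  sk <- mean((y - m)^3) / v^1.5       # skewness
  ku <- mean((y - m)^4) / v^2         # kurtosis
  m^2 + (v - 1)^2 + sk^2 + (ku - 3)^2
}
set.seed(55555)
y <- optim(rnorm(S), obj, method = "BFGS")$par
m <- mean(y); v <- mean((y - m)^2)
c(mean = m, var = v, skew = mean((y - m)^3) / v^1.5, kurt = mean((y - m)^4) / v^2)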

2.6 Numerical integration in R

Available functions to compute integrals in R are, e.g., integrate for one-dimensional integrals and adaptIntegrate for multidimensional integrals on hypercubes. Consider the integral

\int_0^{2\pi} e^{-x} \sin(10 x)\, dx

Integration by parts yields the exact result, which we may use to check the accuracy of numerical solutions:

\int_0^{2\pi} e^{-x} \sin(10 x)\, dx = \left[ -\frac{e^{-x} \bigl( \sin(10x) + 10 \cos(10x) \bigr)}{101} \right]_0^{2\pi} = \frac{10}{101} \bigl( 1 - e^{-2\pi} \bigr) \approx 0.09882501

The following R snapshot illustrates how to do the job:

> f <- function(x) exp(-x)*sin(10*x)
> integrate(f, lower=0, upper=2*pi)
0.09882501 with absolute error < 5.9e-08

It is worth noting that, because of its internal working, integrate needs a vectorized function, i.e., a function that can receive a vector input and return the corresponding vector output. The following example, involving a constant function, illustrates the issue:

> g <- function (x) return (1)
> integrate(g,0,10)
Error in integrate(g, 0, 10) :
evaluation of function gave a result of wrong length

One possibility to overcome the obstacle is to vectorize the function manually:

> g1 <- function(x) rep(1,length(x))
> g(1:5)
[1] 1
> g1(1:5)
[1] 1 1 1 1 1
> integrate(g1,0,10)
10 with absolute error < 1.1e-13

Alternatively, we may use the higher level function Vectorize:

> g2 <- Vectorize(g)
> g2(1:5)
[1] 1 1 1 1 1
> integrate(g2,0,10)
10 with absolute error < 1.1e-13

The integrate function can deal with integrals on unbounded domains, and it is advisable to take advantage of this feature, rather than using very small or very large values for the lower or upper integration limits, respectively. Let us try integrating the density of a standard normal distribution to see the point:

> integrate(dnorm,0,2)
0.4772499 with absolute error < 5.3e-15
> integrate(dnorm,0,2000)
0.5 with absolute error < 4.4e-06
> integrate(dnorm,0,20000000)
0 with absolute error < 0
> integrate(dnorm,0,Inf)
0.5 with absolute error < 4.7e-05

To deal with multidimensional integrals, we may use adaptIntegrate from the package cubature. As an example, let us consider the bidimensional integral:

\int_0^1 \!\! \int_0^1 e^{-x_1 x_2} \bigl( \sin(6\pi x_1) + \cos(8\pi x_2) \bigr)\, dx_1\, dx_2

> library(cubature)
> h <- function(x) exp(-x[1]*x[2])*(sin(6*pi*x[1])+cos(8*pi*x[2]))
> adaptIntegrate(h,lower=c(0,0),upper=c(1,1))$integral
[1] 0.01986377

For further reading

Numerical integration methods are covered in any textbook on numerical analysis. One such general reference is [7]; for numerical methods in economics see [6].
A more specific reference for numerical integration, with emphasis on Gaussian quadrature, is [2].
Computational approaches for Gaussian quadrature are discussed, e.g., in [10], where approaches based on finding roots of orthogonal polynomials are illustrated, and in [4], which deals with approaches based on matrix eigenvalues. A related interesting reading is [3], where the role of matrix computations is emphasized, as well as the relationship with moments, which is also interesting from a probabilistic viewpoint.
A deep treatment on lattice methods for numerical integration is [11]. You may also see [9] for a treatment related to Monte Carlo and quasi–Monte Carlo sampling, including randomized versions of these methods.
Scenario generation for stochastic programming is discussed, e.g., in [5] and [8].

References

1 P. Brandimarte. Numerical Methods in Finance and Economics: A MATLAB-Based Introduction (2nd ed.). Wiley, Hoboken, NJ, 2006.

2 P.J. Davis and P. Rabinowitz. Methods of Numerical Integration (2nd ed.). Academic Press, Orlando, FL, 1984.

3 G.H. Golub and G. Meurant. Matrices, Moments, and Quadrature with Applications. Princeton University Press, Princeton, NJ, 2010.

4 G.H. Golub and J.H. Welsch. Calculation of Gaussian quadrature rules. Mathematics of Computation, 23:221–230, 1969.

5 K. Hoyland and S.W. Wallace. Generating scenario trees for multistage decision problems. Management Science, 47:296–307, 2001.

6 K.L. Judd. Numerical Methods in Economics. MIT Press, Cambridge, MA, 1998.

7 D. Kincaid and W. Cheney. Numerical Analysis: Mathematics of Scientific Computing. Brooks/Cole Publishing Company, Pacific Grove, CA, 1991.

8 A.J. King and S.W. Wallace. Modeling with Stochastic Programming. Springer, Berlin, 2012.

9 C. Lemieux. Monte Carlo and Quasi–Monte Carlo Sampling. Springer, New York, 2009.

10 W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C (2nd ed.). Cambridge University Press, Cambridge, 1992.

11 I.H. Sloan and S. Joe. Lattice Methods for Multiple Integration. Oxford University Press, New York, 1994.

1 In case of need, the reader may find a refresher on the normal distribution in Section 3.2.3.

2 See, e.g., [2, Chapter 2].

3 The reader who needs a refresher on such probability distributions may have a look at Chapter 3.

4 Skewness and kurtosis for a random variable X with expected value μ and standard deviation σ are defined as E[(X − μ)^3]/σ^3 and E[(X − μ)^4]/σ^4, respectively. Skewness measures the lack of symmetry and is zero for a normal distribution, which is symmetric; kurtosis is related to the tails of the distribution and is 3 for any normal distribution. Distributions with fatter tails, which play a key role in risk management, have a larger kurtosis.

5 See Section 3.2.3.

6 Unlike the last example, we let R print the whole output from t.test; clearly, we are only interested in the estimate, and the test p-value is irrelevant here.

7 As we shall also remark later, this statement should be somewhat tempered. It is true that p does not play an explicit role, but it may be expected that a large p will influence σ implicitly, not to mention the computational effort.

8 Periodicity is required for the validity of certain arguments showing the precision of lattice rules, which rely on Fourier series. See [11] for details.

9 This is not only relevant in numerical integration but also in global optimization.

10 Needless to say, this kind of lattice has nothing to do with the lattices discussed in Section 2.4.3.

11 See, e.g., [1] for a derivation and an illustration of possible computational advantages of this choice.