Numerical integration is a standard topic in numerical analysis, and in the previous chapter we hinted at the link between integration and Monte Carlo methods. In this chapter we have a twofold objective:
In many financial problems we are interested in the expected value of a function of random variables. For instance, the fair price of a European-style option may be evaluated as the discounted expected value of its payoff under a risk-neutral probability measure. In the one-dimensional case, the expected value of a function g(·) of a single random variable X with probability density f_X(x) is given by the following integral:
E[g(X)] = ∫_{−∞}^{+∞} g(x) f_X(x) dx.
This is just an ordinary and quite familiar integral. We also recall that when we need to estimate the probability of an event A, which may occur or not depending on the realization of a random variable X, we are back to the case above:
P(A) = E[1_A(X)] = ∫_{−∞}^{+∞} 1_A(x) f_X(x) dx,
where 1_A(x) is the indicator function for event A (taking the value 1 if event A occurs when X = x, 0 otherwise).
In lucky cases, an integral may be evaluated in analytical or semianalytical form. To illustrate the latter case, consider the Black–Scholes–Merton (BSM) formula for the price of a call option written on a non-dividend-paying stock share at time t:
C(S(t), t) = S(t) Φ(d_1) − K e^{−r(T−t)} Φ(d_2).
In the formula, S(t) is the current spot price of the underlying asset, K is the strike price, T is the option maturity, i.e., when it can be exercised, and
d_1 = [ log(S(t)/K) + (r + σ²/2)(T − t) ] / ( σ √(T − t) ),   d_2 = d_1 − σ √(T − t),
where r is the continuously compounded risk-free rate, and σ is the annualized volatility of the underlying asset. Both r and σ are assumed constant in this model. Finally, Φ(z) is the cumulative distribution function of a standard normal variable:
Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−y²/2} dy.
The BSM formula is typically labeled as analytical, even though, strictly speaking, it is semianalytical, since there is no analytical expression for Φ(z); nevertheless, there are quite efficient and accurate ways to approximate this function, and we need not resort to general-purpose numerical integration. In other cases, we do need a numerical approach, typically based on quadrature formulas. In this chapter we will only give the basics of such numerical strategies, but quite sophisticated methods are available. Indeed, there is no need for Monte Carlo methods in low-dimensional problems. The difficulty arises when we deal with a vector random variable X, with support Ω ⊆ ℝ^p, and the corresponding multidimensional integral
E[g(X)] = ∫_Ω g(x) f_X(x) dx.
Product quadrature formulas quickly grow out of hand, in the sense that they require a huge number of points, and this is where Monte Carlo comes into play. Indeed, even sophisticated numerical computing environments only offer limited support for multidimensional integration via classical quadrature formulas.
However, we cannot afford to dismiss deterministic approaches to numerical integration. To see why, consider a stochastic optimization problem, whereby we need the expected value of a function h(·, ·) depending on both a vector of control variables x, modeling our decisions, and a vector of random variables ξ with joint density f_ξ(·), modeling what we cannot control:
H(x) = E_ξ[h(x, ξ)] = ∫ h(x, ξ) f_ξ(ξ) dξ.
In such a case, we do not need just a single number, i.e., an expected value estimated by a sample mean, but rather a way to approximate the function H(x), possibly obtained by selecting S scenarios characterized by realizations ξ^s and probabilities π_s, s = 1, …, S:
H(x) ≈ Σ_{s=1}^{S} π_s h(x, ξ^s).
As we have already pointed out, Monte Carlo methods may need a large sample size to achieve an acceptable degree of accuracy, and we may not afford that when solving a difficult optimization problem. In this setting, alternative approaches may be valuable. Some of them are deterministic in nature and are best understood within a unified view of numerical integration and random sampling.
We start in Section 2.1 with a brief overview of quadrature formulas, emphasizing the role of interpolation based on polynomials. Then, we outline more sophisticated Gaussian quadrature formulas in Section 2.2. These approaches are developed for one-dimensional integration but, at least in principle, they can be extended to the multidimensional case by product formulas; this idea and its limitations are discussed in Section 2.3. Alternative approaches for high-dimensional problems, such as good lattices, Monte Carlo, and quasi–Monte Carlo, are introduced in Section 2.4. In Section 2.5 we pause to illustrate some important links between statistics and numerical integration, by considering moment matching methods. We close the chapter by outlining some R functions for numerical integration in Section 2.6.
Consider the problem of approximating the value of the one-dimensional definite integral
I[f] = ∫_a^b f(x) dx,
involving a function f(·) of a single variable, over a bounded interval [a, b]. Since integration is a linear operator, i.e.,
I[α f + β g] = α I[f] + β I[g]
for all functions f, g and real numbers α, β, it is natural to look for an approximation preserving linearity. A quadrature formula for the interval [a, b] is defined by two ingredients: a set of n + 1 nodes a ≤ x_0 < x_1 < ⋯ < x_n ≤ b, and a corresponding set of weights w_0, w_1, …, w_n.
Then, we approximate I[f] by
Q[f] = Σ_{j=0}^{n} w_j f(x_j).
To be precise, a quadrature formula like the one we are describing is called a closed formula, since the set of nodes includes the extreme points a and b of the interval. Open formulas are used when the function is not well-behaved near a or b, or when we are integrating on an infinite interval. In this book we will only consider closed formulas, as in most cases of interest for Monte Carlo simulation, as we shall see later, the integration domain is the unit interval [0, 1] or the unit hypercube [0, 1]p.
One intuitive way to choose nodes and weights is the rectangle rule. The approach is illustrated in Fig. 2.1: the underlying idea is to regard the integral as an area and to approximate it by a union of rectangles. If we are integrating over the unit interval [0, 1], the rectangle rule may be expressed as
FIGURE 2.1 Illustration of the rectangle rule.
I[f] ≈ (1/n) Σ_{j=0}^{n−1} f(j/n).    (2.1)
Note that, in such a case, we divide the unit interval into n slices, and we take the function value at the left end of each subinterval as the representative value for that slice; if we summed for j ranging from 1 to n, we would be considering right endpoints.
For the case in the figure, i.e., f(x) = x, it is fairly easy to see that
(1/n) Σ_{j=0}^{n−1} (j/n) = (n − 1)/(2n).
Hence, the absolute integration error is
| 1/2 − (n − 1)/(2n) | = 1/(2n).
As expected, the error goes to zero when n goes to infinity, but perhaps we may do better. By “better” we do not only mean faster convergence. One would expect that, at least for such an easy function as f(x) = x, a quadrature formula would be able to hand us the exact result. We shall see next how to accomplish this, but we should mention that the simple rectangle formula can be generalized to much more sophisticated rules based on good lattices, as we shall see later.
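As a quick check, Eq. (2.1) is straightforward to implement in R; the following minimal sketch applies the left-endpoint rectangle rule to f(x) = x and confirms the 1/(2n) error:

# Left-endpoint rectangle rule on [0,1], Eq. (2.1)
rectangle <- function(f, n) sum(f((0:(n-1))/n))/n
f <- function(x) x
for (n in c(10, 100, 1000))
  cat("n =", n, " approx =", rectangle(f, n), " error =", 0.5 - rectangle(f, n), "\n")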
Polynomials are arguably the easiest functions to integrate. We also know, from one of the many Weierstrass theorems, that a continuous function can be approximated on an interval to arbitrary precision by a polynomial of suitably high order. So, it is natural to look for quadrature formulas such that the integration error
E[f] = I[f] − Q[f]
is zero for a wide class of polynomials.
DEFINITION 2.1 (Order of a quadrature formula) We say that a quadrature formula Q is of order m if the integration error is zero for all the polynomials of degree m or less, but there is a polynomial of degree m + 1 such that the error is not zero.
Now, how can we approximate a function by a polynomial? One possibility is represented by Lagrange polynomials, which we outline very briefly. Given a function f(·), say that we want to interpolate it at a given set of points (x_j, y_j), j = 0, 1, …, n, where y_j = f(x_j), and x_j ≠ x_k for j ≠ k. It is easy to find a polynomial P_n of degree (at most) n such that P_n(x_j) = y_j for all j. The Lagrange polynomial L_j(x) is defined as
L_j(x) = Π_{k=0, k≠j}^{n} (x − x_k)/(x_j − x_k).    (2.2)
Note that this is a polynomial of degree n and that L_j(x_j) = 1, whereas L_j(x_k) = 0 for k ≠ j.
Then, an interpolating polynomial can be easily written as
P_n(x) = Σ_{j=0}^{n} y_j L_j(x).
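Just to fix ideas, here is a compact R sketch of Lagrange interpolation (the function name and the test points are arbitrary choices):

# Evaluate the Lagrange interpolating polynomial through (xj, yj) at points x
lagrangeInterp <- function(x, xj, yj) {
  sapply(x, function(z) {
    L <- sapply(seq_along(xj), function(j)
      prod((z - xj[-j])/(xj[j] - xj[-j])))   # Lagrange polynomials L_j(z)
    sum(yj * L)
  })
}
xj <- c(0, 0.5, 1); yj <- exp(xj)
lagrangeInterp(c(0.25, 0.75), xj, yj)        # compare with exp(0.25), exp(0.75)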
Now, in order to define a quadrature formula, we have to define its nodes. The seemingly obvious way is to consider equally spaced nodes:
x_j = a + j h,   j = 0, 1, …, n,
where h = (b − a)/n; also let f_j = f(x_j). As it turns out, this choice need not be the best one, but it is a natural starting point. Selecting equally spaced nodes yields the set of Newton–Cotes quadrature formulas. Given these n + 1 nodes and the corresponding Lagrange polynomial P_n(x) of degree n, we compute the quadrature weights as follows:
w_j = ∫_a^b L_j(x) dx,   j = 0, 1, …, n,
which yields
I[f] ≈ ∫_a^b P_n(x) dx = Σ_{j=0}^{n} w_j f_j.    (2.3)
Consider the case of two nodes only, x_0 = a and x_1 = b, with step h = x_1 − x_0. Here we are just interpolating f by a straight line:
P_1(x) = f_0 (x_1 − x)/h + f_1 (x − x_0)/h.
A straightforward calculation yields
∫_a^b P_1(x) dx = (h/2) [ f(a) + f(b) ].
A closer look shows that this formula approximates the area below the function by a trapezoid. We may improve the integration error by applying the formula on subintervals, thus approximating the overall area by trapezoidal elements, as depicted in Fig. 2.2. Thus, we find the trapezoidal rule:
∫_a^b f(x) dx ≈ h [ f(x_0)/2 + Σ_{j=1}^{n−1} f(x_j) + f(x_n)/2 ],
where the nodes x_j = a + jh, j = 0, 1, …, n, are equally spaced with h = (b − a)/n.
FIGURE 2.2 Illustrating the trapezoidal quadrature formula.
This idea is general: Given any quadrature formula for an interval, we can build a composite formula by applying the same pattern to small subintervals of the integration domain.
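A minimal R sketch of the composite trapezoidal rule, assuming equally spaced nodes, is given below; we test it on ∫_0^1 x² dx = 1/3:

# Composite trapezoidal rule on [a,b] with n equally spaced subintervals
trapez <- function(f, a, b, n) {
  x <- seq(a, b, length.out = n + 1)
  h <- (b - a)/n
  h * (sum(f(x)) - (f(a) + f(b))/2)
}
trapez(function(x) x^2, 0, 1, 10)     # 0.335
trapez(function(x) x^2, 0, 1, 1000)   # approximately 0.3333335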
A quadrature formula based on n + 1 nodes and the corresponding weights obtained by Lagrange polynomials is by construction exact for polynomials of degree ≤ n. We may go the other way around, and find the weights of a formula by requiring that it features a given order. For the sake of simplicity, let us consider an integral on the unit interval, and the three quadrature nodes 0, 0.5, 1:
∫_0^1 f(x) dx ≈ w_0 f(0) + w_1 f(0.5) + w_2 f(1).
Since we have three nodes, we should be able to find a formula that is exact for polynomials of degree ≤ 2. Given linearity of integration, we may enforce this requirement by making sure that the formula is exact for the following monomials:
p_0(x) = 1,   p_1(x) = x,   p_2(x) = x².
The corresponding weights can be found by solving the following system of linear equations:
w_0 + w_1 + w_2 = ∫_0^1 1 dx = 1,
0·w_0 + 0.5·w_1 + 1·w_2 = ∫_0^1 x dx = 1/2,    (2.4)
0·w_0 + 0.25·w_1 + 1·w_2 = ∫_0^1 x² dx = 1/3,
which yields w_0 = w_2 = 1/6 and w_1 = 2/3. Applying the same idea on the more general interval [a, b], we obtain the well-known Simpson’s rule:
∫_a^b f(x) dx ≈ ((b − a)/6) [ f(a) + 4 f((a + b)/2) + f(b) ].
It is fairly easy to see that, somewhat surprisingly, this formula is actually exact for polynomials of degree ≤ 3. In fact, we have
∫_a^b x³ dx = (b⁴ − a⁴)/4.
Applying Simpson’s rule to f(x) = x³ we have, by some straightforward algebra:
((b − a)/6) [ a³ + 4 ((a + b)/2)³ + b³ ] = ((b − a)/4) (a³ + a²b + ab² + b³) = (b⁴ − a⁴)/4.
In order to improve precision, Simpson’s rule may be applied to subintervals of (a, b), resulting in a composite formula.
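The weights in Eq. (2.4) may also be obtained numerically; the following R sketch solves the linear system and checks that the resulting rule is indeed exact for x³ on [0, 1]:

# Solve the moment-matching system (2.4) for nodes 0, 0.5, 1
A <- rbind(c(1, 1,    1),   # exactness for 1
           c(0, 0.5,  1),   # exactness for x
           c(0, 0.25, 1))   # exactness for x^2
rhs <- c(1, 1/2, 1/3)
w <- solve(A, rhs)
w                           # 1/6, 2/3, 1/6
sum(w * c(0, 0.5, 1)^3)     # 0.25, i.e., exact for x^3 as well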
What we have discussed only scratches the surface of numerical integration. We would also need:
In the next section we give a partial answer to the last question. What is fundamental, however, is to start seeing the connection between the system of equations (2.4), which is motivated by a classical problem in numerical analysis, and the statistical concept of moment matching, to be discussed in Section 2.5.
In Newton–Cotes formulas, nodes are fixed and suitable weights are sought in order to squeeze the most in terms of quadrature order. With n + 1 nodes, we may find a quadrature formula with order n. However, we might improve the order by finding nodes and weights jointly, as this essentially means doubling the degrees of freedom. In fact, this is the rationale behind Gaussian quadrature. Gaussian quadrature may yield high-quality numerical integration, and it is particularly relevant for some problems in finance, statistics, and stochastic optimization, as it may be interpreted in terms of discretization of a continuous probability distribution. There are different Gaussian quadrature formulas, associated with a non-negative weight function w(x) and an integration interval:
∫_a^b w(x) f(x) dx ≈ Σ_{i=1}^{n} w_i f(x_i),    (2.5)
for a weight function w(x) and an integration interval [a, b]. In this section, for the sake of convenience, we consider n nodes x_i, i = 1, …, n. The integration interval can also be (−∞, +∞), in which case we need a weight function w(x) that goes to zero quickly enough to guarantee, as far as possible, the existence of the integral. This is quite relevant for our purposes, as we will relate w(x) to a probability density, such as the density of a normal random variable.1 Indeed, the integration interval is (−∞, +∞), and a weight function doing the job is w(x) = e^{−x²}. With such a choice of the weight function, we talk of Gauss–Hermite quadrature formulas. Actually, this weight function is not exactly the density of a normal variable, but it is easy to adjust the formula by a change of variable. Let Y be a random variable with normal distribution N(μ, σ²). Then, recalling the familiar density of a normal variable, the expected value of a function g(Y) is
E[g(Y)] = 1/(σ√(2π)) ∫_{−∞}^{+∞} g(y) e^{−(y−μ)²/(2σ²)} dy.
In order to use weights and nodes from a Gauss–Hermite formula, we need the following change of variable:
x = (y − μ)/(√2 σ),   i.e.,   y = μ + √2 σ x,   dy = √2 σ dx.
Hence, we find
E[g(Y)] = (1/√π) ∫_{−∞}^{+∞} e^{−x²} g(μ + √2 σ x) dx ≈ Σ_{i=1}^{n} (w_i/√π) g(μ + √2 σ x_i).
In practice, we may regard Gauss–Hermite formulas as a way to approximate a normal distribution by a discrete random variable with realizations xi and probabilities wi, i = 1, …, n. In the next two sections we provide the interested reader with the theoretical background behind Gaussian quadrature, showing the role of orthogonal polynomials, and some hints about computational issues. However, the practically oriented reader may safely skip these sections and proceed directly to Section 2.2.2, where we show how to use the corresponding R package.
There is a rich theory behind Gaussian quadrature formulas, and the key concept is related to families of orthogonal polynomials. In fact, a quadrature formula with maximum order is found by choosing the nodes as the n roots of a polynomial of order n, selected within a family of orthogonal polynomials with respect to the inner product
⟨f, g⟩ = ∫_a^b w(x) f(x) g(x) dx,
where w(·) is a weight function and the integration interval can be bounded or, with a suitable choice of the weight function, unbounded. The following properties, among many others, can be proved:2
Using this choice of nodes, along with a proper choice of weights, yields a quadrature formula with order 2n − 1. To see this, let Π_n denote the set of polynomials of degree up to n, and let us consider a polynomial q ∈ Π_n that is orthogonal to all polynomials in Π_{n−1}, i.e.,
⟨q, p⟩ = ∫_a^b w(x) q(x) p(x) dx = 0   for all p ∈ Π_{n−1}.
One might wonder whether such a polynomial can always be found. The answer lies in how families of orthogonal polynomials can be built, on the basis of recursive formulas that we hint at later. We obtain a sequence of polynomials of increasing degree and, since orthogonal polynomials of degree up to n − 1 form a basis for Πn−1, the next polynomial in the sequence is of degree n and orthogonal to all polynomials in Πn−1.
Now, let us consider a function f ∈ Π_{2n−1}, i.e., a polynomial of degree at most 2n − 1. Any such polynomial can be divided by the aforementioned q, obtaining a quotient polynomial p and a remainder polynomial r:
f(x) = q(x) p(x) + r(x),
where p, r ∈ Π_{n−1}. Now let us integrate the (ordinary) function product wf by a quadrature formula based on n nodes x_i, i = 1, …, n, chosen as the zeros of q:
∫_a^b w(x) f(x) dx = ∫_a^b w(x) q(x) p(x) dx + ∫_a^b w(x) r(x) dx = 0 + Σ_{i=1}^{n} w_i r(x_i) = Σ_{i=1}^{n} w_i f(x_i).
The first integral vanishes since q is orthogonal to p ∈ Π_{n−1}; the second one is integrated exactly by the formula, since r has degree at most n − 1; finally, r(x_i) = f(x_i) because q(x_i) = 0 at the quadrature nodes.
This shows that indeed the formula is exact for polynomials of degree 2n − 1. Using this property, it is possible to prove the following essential facts:
These are essential facts in light of our probabilistic applications. In fact, different formulas can be adapted to different probability distributions, as we have shown for Gauss–Hermite formulas. The specific family of orthogonal polynomials to be used depends on the weight function and the integration interval. Other examples are:
After clarifying the link between orthogonal polynomials and Gaussian quadrature, how can we actually find nodes and weights? There are different approaches.
Gaussian quadrature in R is accomplished by the package statmod, which offers the following two functions:
gauss.quad(n, kind="legendre", alpha=0, beta=0)
gauss.quad.prob(n, dist="uniform", l=0, u=1, mu=0, sigma=1, alpha=1, beta=1)
In both cases, the output is a list including weights and nodes. To illustrate, let us check a Gauss–Hermite quadrature with 5 nodes, for a normal random variable with parameters μ = 50 and σ = 20:
> mu <- 50
> sigma <- 20
> N <- 5
> out <- gauss.quad(N, kind="hermite")
> w1 <- out$weights/sqrt(pi)
> x1 <- sqrt(2)*sigma*out$nodes + mu
> out <- gauss.quad.prob(N, dist="normal", mu=mu, sigma=sigma)
> w2 <- out$weights
> x2 <- out$nodes
> rbind(w1,w2)
         [,1]      [,2]      [,3]      [,4]       [,5]
w1 0.01125741 0.2220759 0.5333333 0.2220759 0.01125741
w2 0.01125741 0.2220759 0.5333333 0.2220759 0.01125741
> rbind(x1,x2)
      [,1]     [,2] [,3]     [,4]     [,5]
x1 -7.1394 22.88748   50 77.11252 107.1394
x2 -7.1394 22.88748   50 77.11252 107.1394
We immediately verify that the same results are obtained by using the two functions, provided that we apply the correct variable transformations. The nodes, as expected, are symmetrically centered around the expected value. Furthermore, we may also check that probabilities add up to one and we get the correct first- and second-order moments:
> sum(w1)
[1] 1
> sum(w1*x1)
[1] 50
> sqrt(sum(w1*x1^2) - sum(w1*x1)^2)
[1] 20
Readers familiar with the concepts of skewness and kurtosis will also appreciate the following results:4
> sum(w1*(x1-50)^3)/20^3
[1] -8.526513e-16
> sum(w1*(x1-50)^4)/20^4
[1] 3
Within numerical accuracy, the above results are correct for a normal distribution: We are able to match moments of order 3 and 4 as well, with only five nodes. We will come back to this in Section 2.5. All of the results above are related to the quadrature order of Gauss–Hermite formulas when applied to polynomials. With a more complicated function, say, an exponential, we cannot expect exact results; so, let us compute the expected value of an exponential function of a normal random variable. In this particular case we can even afford an instructive comparison with the exact result. Indeed, given the properties of the lognormal distribution,5 we know that if X ~ N(μ, σ²), then
E[e^X] = e^{μ + σ²/2}.
The script displayed in Fig. 2.3 computes the percentage error for μ = 4 and σ = 2, when 5, 10, 15, and 20 nodes are used. Running the script, we observe that a remarkable precision is achieved with a fairly modest number of nodes:
FIGURE 2.3 Script to check Gauss–Hermite quadrature.
N= 5  True= 403.4288  Approx= 398.6568  %error 1.182865
N= 10  True= 403.4288  Approx= 403.4286  %error 5.537709e-05
N= 15  True= 403.4288  Approx= 403.4288  %error 1.893283e-10
N= 20  True= 403.4288  Approx= 403.4288  %error 9.863052e-14
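The script of Fig. 2.3 is not reproduced here, but a minimal sketch producing this kind of output, based on the same change of variable used above, could be:

library(statmod)
mu <- 4; sigma <- 2
trueValue <- exp(mu + sigma^2/2)       # exact E[exp(X)] for X ~ N(mu, sigma^2)
for (N in c(5, 10, 15, 20)) {
  out <- gauss.quad(N, kind="hermite")
  w <- out$weights/sqrt(pi)
  x <- sqrt(2)*sigma*out$nodes + mu
  approxValue <- sum(w*exp(x))
  cat("N=", N, " True=", trueValue, " Approx=", approxValue,
      " %error", 100*abs(approxValue - trueValue)/trueValue, "\n")
}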
These results explain why Gaussian quadrature is used, e.g., when solving stochastic dynamic optimization problems by dynamic programming. However, there remains an issue: How does all of this scale to higher dimensions? We investigate the issue next; unfortunately, the answer is not quite positive.
Let us consider a function of p variables
f(x_1, x_2, …, x_p).
How can we integrate this function over a domain Ω? We know from Fubini's theorem that a multidimensional integral can be transformed into a sequence of nested one-dimensional integrals. If we consider the case in which the integration region is just the Cartesian product of subsets,
Ω = Ω_1 × Ω_2 × ⋯ × Ω_p,
we have
∫_Ω f(x) dx = ∫_{Ω_1} ( ∫_{Ω_2} ⋯ ( ∫_{Ω_p} f(x_1, x_2, …, x_p) dx_p ) ⋯ dx_2 ) dx_1.
It is worth mentioning that this is indeed the relevant case for Monte Carlo simulation, where Ω = [0, 1]^p = [0, 1] × [0, 1] × ⋯ × [0, 1].
Therefore, it seems quite natural to extend quadrature formulas to multidimensional integration by using a product rule reflecting the above structure. More precisely, let us assume that we have weights and nodes for a Newton–Cotes quadrature formula along each dimension: For dimension k, k = 1, …, p, we have weights w_i^k and nodes x_i^k, i = 1, …, m_k. A product rule approximates the integral as
∫_Ω f(x) dx ≈ Σ_{i_1=1}^{m_1} Σ_{i_2=1}^{m_2} ⋯ Σ_{i_p=1}^{m_p} w_{i_1}^1 w_{i_2}^2 ⋯ w_{i_p}^p f(x_{i_1}^1, x_{i_2}^2, …, x_{i_p}^p).
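To make the structure of a product rule concrete, the following R sketch builds a two-dimensional rule from Gauss–Legendre nodes and weights on [0, 1], obtained with gauss.quad.prob from the statmod package, and applies it to a separable test function; the test function and the number of nodes are just illustrative choices:

library(statmod)
# nodes/weights on [0,1] along each dimension (weights sum to one)
q <- gauss.quad.prob(5, dist="uniform", l=0, u=1)
# product rule: all pairs of nodes, with products of the corresponding weights
grid <- expand.grid(x1=q$nodes, x2=q$nodes)
wts  <- as.vector(outer(q$weights, q$weights))
f <- function(x1, x2) exp(x1 + x2)   # exact integral over the unit square is (e-1)^2
sum(wts * f(grid$x1, grid$x2))
(exp(1) - 1)^2                       # 2.952492, for comparison

With 5 nodes per dimension we need 25 function evaluations; with p dimensions we would need 5^p, which is exactly the difficulty discussed below.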
The idea behind a product rule is quite intuitive, and it is illustrated in Fig. 2.4. From the figure, we may immediately appreciate the drawbacks of the approach:
FIGURE 2.4 An illustration of product rules.
Since such a regular grid is going to be impractical for large p, we have to find alternative approaches, either deterministic or stochastic.
In order to overcome the limitations of classical quadrature, alternative approaches have been developed. Needless to say, random sampling strategies collectively known as Monte Carlo methods play a prominent role in this respect, but we should not take for granted that random methods are the only alternative. In fact, there are deterministic strategies that work well in medium-sized problems:
Here we just introduce Monte Carlo methods within the framework of numerical integration, leaving further development to later chapters. Low-discrepancy sequences are also just outlined here, as they are the subject of Chapter 9. We discuss lattice methods in some more detail, as they will not be dealt with elsewhere in this book.
To see how random sampling may be used to compute a definite integral, which is a deterministic quantity, let us consider a one-dimensional integral on the unit interval [0, 1]:
I = ∫_0^1 g(u) du = ∫_0^1 g(u) · 1 du.
The last rewriting, where we multiply the function g(u) by 1, may not look much of an achievement. However, let us recall that the density of a uniform random variable U on the unit interval [0, 1] is, in fact,
f_U(u) = 1 for u ∈ [0, 1], and f_U(u) = 0 otherwise.
Therefore, we may think of this integral as the expected value
I = E[g(U)],
for U ~ U(0, 1). Now it is natural to estimate the expected value by a sample mean. What we have to do is to generate a sequence {U_i} of independent random realizations from the uniform distribution, and then to estimate I as
Î_m = (1/m) Σ_{i=1}^{m} g(U_i).
The strong law of large numbers implies that, with probability 1,
lim_{m→∞} Î_m = I.
Genuine random sampling is not really feasible with a computer, but we can produce a sequence of pseudorandom numbers using generators provided by most programming languages and environments, including R, as we shall see in Chapter 5.
Consider the trivial case
I = ∫_0^1 e^u du = e − 1 ≈ 1.718282.
To sample the uniform distribution, we rely on the runif function, whose first argument is the sample size, setting the seed of the generator in order to allow the replication of the experiment:
> set.seed(55555)
> mean(exp(runif(10)))
[1] 1.657509
> mean(exp(runif(10)))
[1] 1.860796
> mean(exp(runif(10)))
[1] 1.468096
> mean(exp(runif(100)))
[1] 1.701009
> mean(exp(runif(100)))
[1] 1.762114
> mean(exp(runif(100)))
[1] 1.706034
> mean(exp(runif(1000000)))
[1] 1.718733
Clearly, a sample size m = 10 does not yield a reliable estimate, and m = 100 is just a bit better. A large sample size is needed to get closer to the actual result. Monte Carlo looks extremely straightforward and flexible, but it is not necessarily efficient. It is useful to compare the above results with those obtained by Gaussian quadrature:
> require(statmod)
Loading required package: statmod
> out=gauss.quad.prob(10, dist="uniform", l=0, u=1)
> sum(out$weight*exp(out$nodes))
[1] 1.718282
Needless to say, we do not know the exact result in practice, and we should wonder how to qualify the reliability of the estimate. This can be done by computing a confidence interval as follows:
> set.seed(55555)
> out <- t.test(exp(runif(100)))
> out$estimate
mean of x 
 1.717044 
> out$conf.int
[1] 1.621329 1.812759
attr(,"conf.level")
[1] 0.95
Note how a confidence interval for a mean is obtained in R by what may seem a weird route, using the t.test function, which returns a data structure including the estimate and, by default, a 95% confidence interval, among other things. This is due to the link between testing hypotheses about the mean of a normal population and confidence intervals for the mean. Even though the result may not look bad, the 95% confidence interval points out the issues with Monte Carlo. A large sample size is needed, unless suitable variance reduction strategies are employed, as we discuss at length in Chapter 8.
We will learn later why integrating on the unit interval is so relevant, but we may estimate the expected value with respect to other distributions. For instance, we have already considered the expected value of the exponential function of a normal random variable. In Section 2.2.2 we have seen that E[e^X] = 403.4288 for X ~ N(4, 2²). Here we sample the normal distribution using rnorm:6
> set.seed(55555)
> X=exp(rnorm(100, mean=4, sd=2))
> t.test(X)

        One Sample t-test

data:  X
t = 4.1016, df = 99, p-value = 8.426e-05
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 153.2929 440.5990
sample estimates:
mean of x 
 296.9459 
Once again, the results are disappointing. Indeed, for one-dimensional integration, Monte Carlo is hardly competitive with deterministic quadrature, but when computing a multidimensional integral it may be the only viable option. In general, if we have an integral like
I[g] = ∫_Ω g(x) dx,    (2.7)
where Ω ⊂ ℝ^p, we may estimate I[g] by randomly sampling a sequence of points x_i ∈ Ω, i = 1, …, m, and building the estimator
Î_m[g] = vol(Ω) · (1/m) Σ_{i=1}^{m} g(x_i),    (2.8)
where vol(Ω) denotes the volume of the region Ω. To understand the formula, we should think that the ratio estimates the average value of the function, which must be multiplied by the volume of the integration region in order to find the integral.
We will see that in practice we need only to integrate over the unit hypercube, i.e.,
Ω = [0, 1]^p = [0, 1] × [0, 1] × ⋯ × [0, 1];
hence vol(Ω) = 1. More generally, if we consider a vector random variable
X = (X_1, X_2, …, X_p)
with joint density function f_X(x_1, ⋯, x_p), we have to sample X to estimate
E[g(X)] = ∫ g(x_1, …, x_p) f_X(x_1, …, x_p) dx_1 ⋯ dx_p.
The good news is that, as we know from elementary inferential statistics, the half-length of a standard confidence interval with confidence level 1 − α is
H = z_{1−α/2} σ/√n,
assuming that the sample size n is large enough to warrant use of quantiles z_{1−α/2} from the standard normal distribution. Here σ is the standard deviation of the random variable we are sampling, and we see that H does not depend on the problem dimensionality7 p, whereas product rules based on classical quadrature are highly impractical for large p. The bad news is in the denominator, √n, which does not grow quickly enough. In fact, to reduce H by a factor of 10, we have to increase n by a factor of 100, which shows that Monte Carlo estimates may converge slowly in practice.
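To make the point concrete, the following sketch estimates a bidimensional integral over the unit square (so that vol(Ω) = 1) by crude Monte Carlo, with a confidence interval obtained from t.test; the test function and the sample size are arbitrary illustrative choices:

set.seed(55555)
m <- 10000
U1 <- runif(m); U2 <- runif(m)       # uniform sample on the unit square
g <- function(u1, u2) exp(u1 + u2)   # exact integral is (e-1)^2 = 2.952492
out <- t.test(g(U1, U2))
out$estimate                         # sample mean, close to 2.95
out$conf.int                         # 95% confidence interval for the integral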
As we have seen in the previous sections, the performance of naive Monte Carlo is far from impressive for small sample sizes. We may also get a further intuition by plotting a bidimensional random sample on the unit square. The following R commands
> set.seed(55555)
> plot(runif(100), runif(100), pch=20)
> grid(nx=10)
produce the plot in Fig. 2.5. Here we are sampling a bidimensional uniform distribution, with independent marginals. We observe that the coverage is far from accurate and uniform. Using the grid(nx=10) function, we plot a grid of 10 × 10 small squares, many of which do not contain any point (ideally, each subsquare should contain exactly one). One possibility, of course, is to increase the sample size. In Chapter 9 we will see that we may define an objective measure of lack of uniform coverage of the unit hypercube, namely, the star discrepancy. A good set of points should have low discrepancy, and this leads to the concept of low-discrepancy sequences. They are deterministic sequences but, unlike grids produced by product quadrature rules, they do not display any apparent regularity. The Halton sequence is arguably the simplest low-discrepancy sequence; it is included in the library randtoolbox along with other sequences, such as the Sobol sequence. To illustrate the idea, let us generate and plot a two-dimensional sequence of 100 points using the Halton sequence:
FIGURE 2.5 A uniform random sample within the unit square.
> library(randtoolbox)
> x <- halton(100, dim=2)
> plot(x[,1], x[,2], pch=20)
> grid(nx=10)
The resulting sequence, displayed in Fig. 2.6, looks more evenly distributed than the random one of Fig. 2.5. More formally, while Monte Carlo has an error of order O(1/√n), low-discrepancy sequences have an error of order O((log n)^p/n). This is bad news on the one hand, since the problem dimensionality p does play a role; on the other hand, we get rid of the square root, which is what makes Monte Carlo not quite efficient. In practice, it turns out that low-discrepancy sequences are efficient for problems with a moderate dimensionality. Another point worth mentioning is that if we want to add one point to a low-discrepancy sequence, we can do so without disrupting its structure, just like with random sequences; this is not true for regular grids. We should also note that deterministic and stochastic methods are not mutually exclusive, as scrambled low-discrepancy sequences can also be used, as we shall see later in Chapter 9.
FIGURE 2.6 A bidimensional low-discrepancy sequence of points.
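A quick experiment along these lines compares plain Monte Carlo against Halton points on the trivial integral ∫_0^1 e^u du = e − 1 considered earlier; the sample size is an arbitrary choice:

library(randtoolbox)
set.seed(55555)
n <- 1000
trueValue <- exp(1) - 1
abs(mean(exp(runif(n))) - trueValue)          # plain Monte Carlo error
abs(mean(exp(halton(n, dim=1))) - trueValue)  # Halton sequence error, typically smaller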
Another class of deterministic integration methods that is often discussed in conjunction with low-discrepancy sequences is based on lattices. Since the term “lattice” is used in diverse ways with a different meaning, it is best to state clearly what is meant in this context.
DEFINITION 2.2 (Lattice) A p-dimensional lattice is a discrete subset of ℝ^p that is closed under addition and subtraction.
A familiar lattice is depicted in Fig. 2.4. Indeed, a regular grid associated with product rules is a lattice or, maybe more precisely, a lattice point set. Lattices may be associated with a basis, i.e., a linearly independent set of vectors {w_1, …, w_p}, called generators, such that the whole lattice is obtained by taking suitable linear combinations of the generators:
L_p = { z_1 w_1 + z_2 w_2 + ⋯ + z_p w_p : (z_1, z_2, …, z_p) ∈ ℤ^p },    (2.9)
where ℤ^p is the set of p-dimensional vectors with integer coordinates. Actually, different bases may generate the same lattice.
Consider the following basis:
w_1 = (0.2, 0)ᵀ,   w_2 = (0, 0.2)ᵀ,
where the superscript ᵀ denotes vector and matrix transposition. This basis generates a grid of points with vertical and horizontal spacing given by 0.2. If we take the intersection of this grid with the half-open square [0, 1)², we find points with coordinates in the set
{0, 0.2, 0.4, 0.6, 0.8}.
Clearly, there are N = 25 points in the set. Now, let us consider the matrix formed by lining up the vectors of the basis in columns:
W = [w_1  w_2] = diag(0.2, 0.2).
Clearly, 1/det(W) = 25, i.e., the determinant of W is related to the number N of points in the intersection between the lattice and the half-open unit square. As we shall see, this is no coincidence. More generally, if we consider a basis formed by vectors
w_j = e_j/n_j,   j = 1, …, p,
where e_j is the jth unit vector in ℝ^p and n_j is an integer number associated with dimension j, we obtain a grid with
N = n_1 n_2 ⋯ n_p
points. If we adopt the same spacing along each dimension, i.e., we consider a single value n, we obtain a grid with N = n^p points.
In the following, we will be interested in integrating a function on the half-open unit hypercube [0, 1)^p. One might wonder why we should consider [0, 1)^p rather than [0, 1]^p. From a theoretical point of view, lattice rules are designed for fairly smooth and periodic functions,8 such that f(x) = f(x + z) for any x ∈ ℝ^p and z ∈ ℤ^p. Thus, in lattice rules, the North–East border is ruled out as there the function is supposed to take the same values as on the opposite border, and we do not want to double-count them. When applying lattice rules to more general functions, in principle, nothing changes since we are ruling out a set of measure zero. From a numerical point of view, we should also note that many pseudorandom number generators employed in Monte Carlo integration actually produce numbers in the interval [0, 1).
Using Eq. (2.9), an integration lattice L_p is generated; the lattice point set is just the intersection
P_N = L_p ∩ [0, 1)^p,
which includes N points u_i, i = 1, …, N. The integral of a function f(x) on the unit hypercube is approximated as
∫_{[0,1)^p} f(x) dx ≈ (1/N) Σ_{i=1}^{N} f(u_i).
Note that the weights, all equal to 1/N, are not related to interpolatory quadrature formulas, and that this rule can be associated with sampling. We may also consider lattice rules as a generalization of the rectangle rule of Section 2.1.1. The key feature of lattices is that they can be generated in a more general way than a rectangular grid, improving the accuracy of numerical integration. To this aim, a proper choice of the basis is in order.
To get a better understanding of lattices, it is useful to associate a “unit cell” with each choice of generators. The unit cell is generated by taking linear combinations as follows:
{ λ_1 w_1 + λ_2 w_2 + ⋯ + λ_p w_p : 0 ≤ λ_j < 1, j = 1, …, p }.
It is easy to see that this is a parallelepiped. To illustrate, let us consider the basis
The corresponding lattice is displayed in Fig. 2.7, along with an arbitrarily selected shaded unit cell. We note the following:
FIGURE 2.7 Lattice generated by the basis in Eq. (2.10).
When choosing the basis, we have to fix the number N of points that we want to include in the point set, and this in turn is influenced by the choice of generators. Actually, we need not select p generators. We may simplify the choice and the related analysis by selecting a number r < p of generators. This results in rank-r lattice rules, whose point set may be written as
x_{j_1,…,j_r} = ( (j_1/n_1) z_1 + ⋯ + (j_r/n_r) z_r ) mod 1,   j_k = 0, 1, …, n_k − 1,
for suitable integer generator vectors z_1, …, z_r.
Note that by taking a number x modulo 1 we obtain its fractional part. The simplest rank-r rule is obtained for r = 1, and an example of a rank-1 rule is the Korobov point set, which is generated as follows:
u_i = ( (i/N) (1, a, a², …, a^{p−1}) ) mod 1,   i = 1, …, N,
for a given integer a.
This point set can also be generated by the basis
w_1 = (1/N) (1, a, a², …, a^{p−1})ᵀ,   w_2 = e_2,   …,   w_p = e_p.
The determinant is clearly 1/N, and in fact the point set, with a proper choice of parameters, includes N points. An R function to generate a Korobov point set is illustrated in Fig. 2.8. In the code we take the increasing powers of a immediately modulo N, rather than computing possibly large powers of a and then applying the modulo operator; this may avoid issues with overflow or loss of precision. The following snapshot generates the point set displayed in Fig. 2.9.
FIGURE 2.8 Function to generate a Korobov point set.
FIGURE 2.9 Korobov point set for a = 3, d = 2, N = 10.
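The code of Fig. 2.8 is not reproduced here; a minimal sketch consistent with the way the function is used below, and with the remark about taking powers of a immediately modulo N, might look as follows:

# Generate an N x d Korobov point set with parameter a (sketch; the actual
# function of Fig. 2.8 may differ in details)
korobov <- function(a, d, N) {
  # generator vector (1, a, a^2, ..., a^(d-1)), reduced modulo N at each step
  z <- numeric(d)
  z[1] <- 1
  if (d > 1) for (k in 2:d) z[k] <- (z[k-1] * a) %% N
  # point i is (i * z mod N)/N, for i = 1, ..., N
  X <- matrix(0, nrow=N, ncol=d)
  for (i in 1:N) X[i, ] <- ((i * z) %% N) / N
  X
}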
> a <- 3
> d <- 2
> N <- 10
> X <- korobov(a, d, N)
> X
      [,1] [,2]
 [1,]  0.1  0.3
 [2,]  0.2  0.6
 [3,]  0.3  0.9
 [4,]  0.4  0.2
 [5,]  0.5  0.5
 [6,]  0.6  0.8
 [7,]  0.7  0.1
 [8,]  0.8  0.4
 [9,]  0.9  0.7
[10,]  0.0  0.0
> plot(X[,1], X[,2], pch=20)
This choice of parameters is good in the sense that it is projection regular. A point set consisting of N points is projection regular if the projection of its points on each coordinate axis consists of N distinct points:
> sort(X[,1])
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
> sort(X[,2])
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
We observe that a regular grid does not enjoy such a property, as is clear from Fig. 2.4. For instance, if we build a 5 × 5 grid, we have 25 points on the grid, but only 5 along each projection. This “shadowing” effect may imply that we do not collect rich enough information about the function.9 Full projection is not to be taken for granted. For instance, if we take N = 12 points in the above sequence, the point set is less satisfactory:
> N <- 12
> X <- korobov(a, d, N)
> sort(X[,1])
 [1] 0.00000000 0.08333333 0.16666667 0.25000000 0.33333333 0.41666667 0.50000000 0.58333333
 [9] 0.66666667 0.75000000 0.83333333 0.91666667
> sort(X[,2])
 [1] 0.00 0.00 0.00 0.25 0.25 0.25 0.50 0.50 0.50 0.75 0.75 0.75
In this case, we do not have 12 distinct points along the second dimension. There are theoretical conditions ensuring projection regularity, as well as tables suggesting parameters in order to generate good lattices.
Another important feature of lattice rules is that they are associated with a fixed number N of points. Unlike pseudorandom or low-discrepancy sequences, there is no obvious way to add a single point. However, extensible lattice rules have been proposed in the literature.
In Section 2.1.3, when developing quadrature formulas, we have considered matching the integral of a monomial exactly:
Σ_j w_j x_j^k = ∫_0^1 x^k dx,   k = 0, 1, 2.
If we interpret integrals as expectations and quadrature formulas as discretizations of continuous probability distributions, with realizations x_j and probabilities w_j, we may wonder if there is a link with matching moments E[X^k] of the continuous random variable with the corresponding moments of the discrete random variable approximating X. Indeed, the link is pretty evident for Gaussian quadrature, as we have observed in Section 2.2.2. In the following two sections we further illustrate the idea with reference to scenario generation. Again, it is important to understand that plain Monte Carlo scenario generation need not be the best option.
A widely used approach to price financial options relies on a binomial lattice10 discretizing the continuous-time, continuous-state stochastic process followed by the underlying variable. The underlying variable could be the price S(t) of a stock share. Let S_0 be the price of the asset at time t = 0, and consider the new price S_δt after a time period δt. We discretize the process with respect to both time and price, assuming that the new price can only take two values:
S_δt = u S_0  with probability p,   S_δt = d S_0  with probability 1 − p.
Here u is a multiplicative shock corresponding to an “up” move, whereas d corresponds to a “down” move. The advantage of using multiplicative shocks is that the lattice recombines: Since S_0 u d = S_0 d u, an up–down sequence leads to the same price as a down–up sequence. The resulting lattice is illustrated in Fig. 2.10. Clearly, the size of a recombining lattice grows linearly in time, whereas a non-recombining tree explodes exponentially. If we assume that the parameters u, d, and p apply to the whole lattice, how can we choose them? Since we have three degrees of freedom, we may match two moments of the random variable S_δt. Let m = E[S_δt] and s² = Var(S_δt) denote the expected value and the variance of the new price.
FIGURE 2.10 A recombining binomial lattice.
We will see in Chapter 3 that if we assume a geometric Brownian motion, the resulting distribution is lognormal, and its expected value and variance are related to the drift and the volatility of the stochastic process; for now, let us just take the above values as given. The moment matching conditions are
p u S_0 + (1 − p) d S_0 = m,
p u² S_0² + (1 − p) d² S_0² − m² = s².
We are left with a residual degree of freedom that can be used to set additional conditions. One possible choice is p = 0.5; another widely used choice is u = 1/d. In general, we find a system of possibly nonlinear equations that can be solved exactly or approximately, depending on its complexity. For instance, if S(t) follows a geometric Brownian motion with drift μ and volatility σ, a common calibration of the lattice yields11
u = e^{σ√δt},   d = 1/u = e^{−σ√δt},   p = (e^{μ δt} − d)/(u − d).
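A small R sketch, with arbitrary parameter values, shows this calibration at work; the exact lognormal moments used for comparison are anticipated from Chapter 3:

# Calibrate a binomial lattice step for geometric Brownian motion
mu <- 0.1; sigma <- 0.4; dt <- 0.01; S0 <- 100
u <- exp(sigma*sqrt(dt)); d <- 1/u
p <- (exp(mu*dt) - d)/(u - d)
# first two moments of the price after one step under the lattice ...
m.lattice <- S0*(p*u + (1-p)*d)
v.lattice <- S0^2*(p*u^2 + (1-p)*d^2) - m.lattice^2
# ... and under the exact lognormal distribution
m.exact <- S0*exp(mu*dt)
v.exact <- S0^2*exp(2*mu*dt)*(exp(sigma^2*dt) - 1)
c(m.lattice, m.exact)   # means match exactly by construction
c(v.lattice, v.exact)   # variances match up to higher-order terms in dt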
For small values of δt, this discretized lattice does approximate very well the continuous-time, continuous-state stochastic process, allowing for efficient pricing of options, including those featuring early exercise opportunities. However, the approach can only be applied to low-dimensional problems. A subtler issue is path dependency: Some options, like Asian-style options, have a payoff depending on the whole path of the underlying variables. The recombining binomial lattice has a Markovian feature preventing its direct use in such a case. For high-dimensional and path-dependent options, Monte Carlo is the method of choice.
We have introduced stochastic programming models in Section 1.4.2. One of the main issues when formulating such problems is scenario generation. Given, e.g., a multivariate normal variable Y, modeling the random data, with expected value μ and covariance matrix Σ, we might use Monte Carlo sampling to generate a discrete set of equiprobable scenarios. As an alternative, we may look for a set of scenarios matching moments and, possibly, a few more desirable properties. Since we are dealing with a multivariate normal, we know that for each marginal Y_i skewness should be zero and kurtosis should be 3:
E[(Y_i − μ_i)³]/σ_i³ = 0,   E[(Y_i − μ_i)⁴]/σ_i⁴ = 3.
Say that we want to generate a scenario fan (the single stage of a tree) of size S and that each realization has probability 1/S, for the sake of simplicity. Let us denote by y_i^s the value of the random variable Y_i in scenario s. Then, we should have
(1/S) Σ_{s=1}^{S} y_i^s = μ_i,
(1/S) Σ_{s=1}^{S} (y_i^s − μ_i)(y_j^s − μ_j) = σ_ij,
(1/S) Σ_{s=1}^{S} (y_i^s − μ_i)³/σ_i³ = 0,
(1/S) Σ_{s=1}^{S} (y_i^s − μ_i)⁴/σ_i⁴ = 3,
where σ_ij are the entries of the covariance matrix Σ and σ_i² = σ_ii.
Approximate moment matching is obtained by solving the problem:
min_y  w_1 Σ_i [ (1/S) Σ_s y_i^s − μ_i ]² + w_2 Σ_{i,j} [ (1/S) Σ_s (y_i^s − μ_i)(y_j^s − μ_j) − σ_ij ]² + w_3 Σ_i [ (1/S) Σ_s (y_i^s − μ_i)³/σ_i³ ]² + w_4 Σ_i [ (1/S) Σ_s (y_i^s − μ_i)⁴/σ_i⁴ − 3 ]².
The objective function includes four weights w_k, which may be used to fine-tune performance. Again, this is a deterministic approach that should be compared against random sampling. We will consider scenario generation for stochastic programming models in Section 10.5.3.
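For a univariate toy case, a sketch of this kind of approximate moment matching with a general-purpose optimizer might look as follows; the use of optim with method "BFGS", the equal weights, and the parameter values are all illustrative assumptions:

# Approximate moment matching of S equiprobable scenarios to N(mu, sigma^2)
mu <- 0.08; sigma <- 0.2; S <- 10
obj <- function(y) {
  m1 <- mean(y)                     # first moment
  m2 <- mean((y - mu)^2)            # second moment about mu
  m3 <- mean((y - mu)^3)/sigma^3    # skewness-like term
  m4 <- mean((y - mu)^4)/sigma^4    # kurtosis-like term
  (m1 - mu)^2 + (m2 - sigma^2)^2 + m3^2 + (m4 - 3)^2
}
set.seed(55555)
out <- optim(rnorm(S, mu, sigma), obj, method="BFGS")
y <- out$par                        # scenario values, each with probability 1/S
c(mean(y), sqrt(mean((y - mu)^2)))  # close to mu and sigma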
Available functions to compute integrals in R are, e.g., integrate for one-dimensional integrals and adaptIntegrate for multidimensional integrals on hypercubes. Consider the integral
∫_0^{2π} e^{−x} sin(10x) dx.
Integration by parts yields the exact result, which we may use to check the accuracy of numerical solutions:
∫_0^{2π} e^{−x} sin(10x) dx = (10/101)(1 − e^{−2π}) ≈ 0.09882501.
The following R snapshot illustrates how to do the job:
> f <- function(x) exp(-x)*sin(10*x)
> integrate(f, lower=0, upper=2*pi)
0.09882501 with absolute error < 5.9e-08
It is worth noting that, because of its internal working, integrate needs a vectorized function, i.e., a function that can receive a vector input and return the corresponding vector output. The following example, involving a constant function, illustrates the issue:
> g <- function(x) return(1)
> integrate(g, 0, 10)
Error in integrate(g, 0, 10) : 
  evaluation of function gave a result of wrong length
One possibility to overcome the obstacle is to vectorize the function manually:
> g1 <- function(x) rep(1, length(x))
> g(1:5)
[1] 1
> g1(1:5)
[1] 1 1 1 1 1
> integrate(g1, 0, 10)
10 with absolute error < 1.1e-13
Alternatively, we may use the higher level function Vectorize:
> g2 <- Vectorize(g)
> g2(1:5)
[1] 1 1 1 1 1
> integrate(g2, 0, 10)
10 with absolute error < 1.1e-13
The integrate function can deal with integrals on unbounded domains, and it is always suggested to take advantage of that feature, rather than using very small or very large values for lower or upper integration limits, respectively. Let us try integrating the density of a standard normal distribution to see the point:
> integrate(dnorm, 0, 2)
0.4772499 with absolute error < 5.3e-15
> integrate(dnorm, 0, 2000)
0.5 with absolute error < 4.4e-06
> integrate(dnorm, 0, 20000000)
0 with absolute error < 0
> integrate(dnorm, 0, Inf)
0.5 with absolute error < 4.7e-05
To deal with multidimensional integrals, we may use adaptIntegrate from the package cubature. As an example, let us consider the bidimensional integral
∫_0^1 ∫_0^1 e^{−x_1 x_2} ( sin(6π x_1) + cos(8π x_2) ) dx_1 dx_2:
> library(cubature)
> h <- function(x) exp(-x[1]*x[2])*(sin(6*pi*x[1]) + cos(8*pi*x[2]))
> adaptIntegrate(h, lower=c(0,0), upper=c(1,1))$integral
[1] 0.01986377
1 P. Brandimarte. Numerical Methods in Finance and Economics: A MATLAB-Based Introduction (2nd ed.). Wiley, Hoboken, NJ, 2006.
2 P.J. Davis and P. Rabinowitz. Methods of Numerical Integration (2nd ed.). Academic Press, Orlando, FL, 1984.
3 G.H. Golub and G. Meurant. Matrices, Moments, and Quadrature with Applications. Princeton University Press, Princeton, NJ, 2010.
4 G.H. Golub and J.H. Welsch. Calculation of Gaussian quadrature rules. Mathematics of Computation, 23:221–230, 1969.
5 K. Hoyland and S.W. Wallace. Generating scenario trees for multistage decision problems. Management Science, 47:296–307, 2001.
6 K.L. Judd. Numerical Methods in Economics. MIT Press, Cambridge, MA, 1998.
7 D. Kincaid and W. Cheney. Numerical Analysis: Mathematics of Scientific Computing. Brooks/Cole Publishing Company, Pacific Grove, CA, 1991.
8 A.J. King and S.W. Wallace. Modeling with Stochastic Programming. Springer, Berlin, 2012.
9 C. Lemieux. Monte Carlo and Quasi–Monte Carlo Sampling. Springer, New York, 2009.
10 W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C (2nd ed.). Cambridge University Press, Cambridge, 1992.
11 I.H. Sloan and S. Joe. Lattice Methods for Multiple Integration. Oxford University Press, New York, 1994.
1 In case of need, the reader may find a refresher on the normal distribution in Section 3.2.3.
2 See, e.g., [2, Chapter 2].
3 The reader who needs a refresher on such probability distributions may have a look at Chapter 3.
4 Skewness and kurtosis for a random variable X with expected value μ and standard deviation σ are defined as E[(X − μ)3]/σ3 and E[(X − μ)4]/σ4, respectively. Skewness measures the lack of symmetry and is zero for a normal distribution, which is symmetric; kurtosis is related to the tails of the distribution and is 3 for any normal distribution. Distributions with fatter tails, which play a key role in risk management, have a larger kurtosis.
5 See Section 3.2.3.
6 Unlike the last example, we let R print the whole output from t.test; clearly, we are only interested in the estimate, and the test p-value is irrelevant here.
7 As we shall also remark later, this statement should be somewhat tempered. It is true that p does not play an explicit role, but it may be expected that a large p will influence σ implicitly, not to mention the computational effort.
8 Periodicity is required for the validity of certain arguments showing the precision of lattice rules, which rely on Fourier series. See [11] for details.
9 This is not only relevant in numerical integration but also in global optimization.
10 Needless to say, this kind of lattice has nothing to do with lattices discussed in Section 2.4.3.
11 See, e.g., [1] for a derivation and an illustration of possible computational advantages of this choice.