In the previous chapter we have reviewed the main variance reduction strategies that are used in financial and economic applications. Since those approaches are based on probabilistic concepts, they implicitly assume that random sampling in Monte Carlo methods is really random. However, the pseudorandom numbers produced by an LCG or by more sophisticated algorithms are not random at all. Hence, one could take a philosophical view and wonder about the very validity of variance reduction methods, and even the Monte Carlo approach itself. Taking a more pragmatic view, and considering the fact that Monte Carlo methods have proven their value over the years, we should conclude that this shows that there are some deterministic number sequences that work well in generating samples. It is also useful to remember that the aim of a Monte Carlo simulation is actually to estimate a multidimensional integral on the unit hypercube:

$$ I = \int_{(0,1)^m} h(\mathbf{u})\, d\mathbf{u}. $$
The function h(·) may be so complicated that we cannot express it analytically, but this is of no concern conceptually. We need a stream of i.i.d. random numbers to fill the integration domain in a satisfactory manner. When a regular grid derived from classical product rules for numerical integration (see Chapter 2) is not feasible, we may fill the domain with random numbers, but we could also resort to alternative deterministic sequences that are, in some sense, evenly distributed. This idea may be made more precise by defining the discrepancy of a sequence of numbers. In a sense, low-discrepancy sequences lie halfway between deterministic grid-based methods and Monte Carlo sampling: They are deterministic by design, whereas pseudorandom numbers are deterministic only for lack of a better alternative, yet unlike grids they do not look regular at all. Low-discrepancy sequences are the foundation of quasi–Monte Carlo methods; sometimes the term “quasi-random” sequence is used, but we will mostly avoid it, as it is a bit misleading.
We first introduce the concept of discrepancy in Section 9.1. Then we cover two well-known low-discrepancy sequences: Halton sequences in Section 9.2 and Sobol sequences in Section 9.3. These two sequences do not exhaust the list of possible approaches, yet they are adequate to grasp the basic concepts and understand the pitfalls of quasi–Monte Carlo simulation. The deterministic nature of low-discrepancy sequences has one major drawback with respect to random sampling: There is no obvious way to assess the quality of estimates, since the usual confidence intervals do not make any sense. There are theoretical bounds on the integration error, but they may be very weak and of little practical use. A possible remedy is the randomization of low-discrepancy sequences. Furthermore, there are scrambling strategies to improve the quality of standard low-discrepancy sequences. We outline these matters in Section 9.4, without entering into theoretical details; some scrambling strategies are offered in R functions, and we just aim at providing the reader with the minimal background for their use. Low-discrepancy sequences play the same role in quasi–Monte Carlo methods as pseudorandom sequences do in Monte Carlo methods. When we need observations from another distribution, most notably the normal, we resort to techniques that we have introduced in Chapter 5 for random variate generation. We illustrate such transformations and sample path generation by low-discrepancy sequences in Section 9.5.
In the first chapters of the book we have insisted on the fact that a Monte Carlo simulator is essentially a complicated function mapping a stream of independent uniform random variables into an estimator of a quantity of interest. Whenever the dimension m of the integration domain is fixed, we are actually computing an integral over the unit hypercube $I^m = (0, 1)^m \subset \mathbb{R}^m$. This is the case when we use, for instance, the inverse transform method to obtain random variates from uniform pseudorandom numbers. However, this does not hold when we resort to acceptance–rejection, because with this kind of method we do not know a priori how many uniform random variables are needed to generate a single observation of the distribution of interest. Indeed, this is the reason why acceptance–rejection strategies are not compatible with the methods of this chapter. A good integration strategy should “cover” the unit hypercube in the most even and uniform way; note that, in this framework, we need not be concerned with independence of random variates, since this is actually guaranteed by a good coverage of $I^m$. However, we need to clarify what a “good coverage” is, which in turn requires a well-defined way to measure its quality.
Assume that we want to generate a sequence of N points in $I^m$,

$$ \mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(N)}, $$
in order to cover that domain in a satisfactory way. If the points are well distributed, the number of them included in any subset G of $I^m$ should be roughly proportional to its volume vol(G). For instance, if N = 100, m = 2, and we consider the square $G = [0, 0.5] \times [0, 0.5] \subset I^2$, we would expect to find 25 points in set G, since its area is 0.5 × 0.5 = 0.25. Whatever the value of N, about 25% of the total number of points should fall there. However, this should apply to any subset of $I^2$. The concept of discrepancy aims at formalizing this intuition in a mathematically tractable way. Since dealing with arbitrary subsets of the unit hypercube is not quite feasible, we should restrict their form. Given a vector $\mathbf{x} = [x_1, x_2, \ldots, x_m]$, consider the rectangular subset $G_{\mathbf{x}}$ defined as

$$ G_{\mathbf{x}} = [0, x_1) \times [0, x_2) \times \cdots \times [0, x_m), \tag{9.1} $$
which has volume $x_1 x_2 \cdots x_m$. If we denote by $S(\{\mathbf{x}^{(i)}\}, G)$ a function counting the number of points in the sequence
that are contained in a subset $G \subset I^m$, a possible definition of discrepancy is

$$ D^*\bigl(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(N)}\bigr) = \sup_{\mathbf{x} \in I^m} \left| \frac{S(\{\mathbf{x}^{(i)}\}, G_{\mathbf{x}})}{N} - \operatorname{vol}(G_{\mathbf{x}}) \right|.
$$
To be precise, this is called the star discrepancy, and it measures the worst-case distance between the ideal and the actual fraction of points included in a rectangle of the type specified by Eq. (9.1). In Section 4.3.2 we have considered the Kolmogorov–Smirnov measure of fit for probability distributions: Star discrepancy can be interpreted as such a measure of fit for a multidimensional uniform distribution with independent marginals.
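Computing the star discrepancy exactly is computationally hard, but a naive approximation conveys the idea. The following sketch (the function starDiscr and its grid-based approximation are our own illustration, not a library routine) evaluates the deviation at a finite grid of corner points in two dimensions, yielding a lower bound on the supremum:

# Naive approximation of the star discrepancy in two dimensions:
# evaluate |S(G_x)/N - vol(G_x)| over a grid of corner points x.
# This only bounds the true supremum from below.
starDiscr <- function(points, gridSize = 100) {
  N <- nrow(points)
  worst <- 0
  for (x1 in seq(0, 1, length.out = gridSize))
    for (x2 in seq(0, 1, length.out = gridSize)) {
      count <- sum(points[,1] < x1 & points[,2] < x2)  # S(G_x)
      worst <- max(worst, abs(count/N - x1*x2))        # vol(G_x) = x1*x2
    }
  worst
}

For instance, comparing starDiscr(halton(1000, dim = 2)) with starDiscr(matrix(runif(2000), ncol = 2)) typically shows a smaller value for the Halton points introduced below.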
When computing a multidimensional integral on the unit hypercube, it is natural to look for low-discrepancy sequences. Some theoretical results suggest that low-discrepancy sequences may perform better than pseudorandom sequences obtained through an LCG or its variations. From the theory of confidence intervals, we know that the estimation error with Monte Carlo sampling is something like

$$ O\!\left( \frac{1}{\sqrt{N}} \right), $$
where N is the sample size. With certain point sequences, it can be shown that the error is something like

$$ O\!\left( \frac{(\ln N)^m}{N} \right), \tag{9.2} $$
where m is the dimension of the space in which we are integrating.¹ It is conjectured that the expression in Eq. (9.2) is a lower bound on what can be achieved, and the term “low-discrepancy sequence” is used when referring to sequences achieving this bound. Various such sequences have been proposed in the literature, and some are available in R. In the following, we illustrate the basic ideas behind two of them, namely, Halton and Sobol sequences.
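Before proceeding, it may be useful to get a rough feel for these rates. The following back-of-the-envelope comparison (ours; it ignores all the constants hidden in the bounds) tabulates the two factors for m = 5:

# Error rate factors: Monte Carlo 1/sqrt(N) vs. the
# low-discrepancy bound (ln N)^m / N, here for m = 5.
N <- 10^(2:8)
round(cbind(N, MC = 1/sqrt(N), QMC = log(N)^5/N), 6)

The logarithmic factor grows quickly with m, so the theoretical edge of low-discrepancy sequences may show up only for very large sample sizes in high-dimensional problems; this is consistent with the practical difficulties that we illustrate later for Halton sequences.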
The Halton sequence was the first sequence for which the attainment of the low-discrepancy bound of Eq. (9.2) was shown. This sequence relies on a generalization of the Van der Corput one-dimensional sequence, which is based on the following simple recipe:

- Represent the integer number n in base b, where b is a prime number.
- Reflect the digits about the radix point, obtaining a number in the unit interval.
More formally, if we represent an integer number n as

$$ n = \sum_{k=0}^{r} d_k b^k, \qquad d_k \in \{0, 1, \ldots, b-1\}, $$
the nth number in the Van der Corput sequence with base b is

$$ h(n, b) = \sum_{k=0}^{r} d_k b^{-(k+1)}. $$
The generation of a sequence with base 2 is quite instructive. Let us start with n = 1, which may be represented in base 2 as

$$ n = (1)_2. $$
Reflecting this number we find

$$ h(1, 2) = (0.1)_2 = \frac{1}{2}. $$
Applying the same drill to n = 2 yields

$$ n = (10)_2, \qquad h(2, 2) = (0.01)_2 = \frac{1}{4}. $$
Now the pattern should be clear enough:

$$ \frac{1}{2},\ \frac{1}{4},\ \frac{3}{4},\ \frac{1}{8},\ \frac{5}{8},\ \frac{3}{8},\ \frac{7}{8},\ \frac{1}{16},\ \frac{9}{16},\ \frac{5}{16}, \ldots $$
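Before turning to library functions, it is instructive to implement the digit-reversal recipe directly; the following is our own minimal sketch:

# n-th element of the Van der Corput sequence in base b:
# peel off the digits of n from the least significant one and
# accumulate them after the radix point.
vanDerCorput <- function(n, b = 2) {
  h <- 0
  f <- 1/b
  while (n > 0) {
    h <- h + (n %% b) * f   # next digit, reflected about the radix point
    n <- n %/% b
    f <- f / b
  }
  h
}
sapply(1:10, vanDerCorput)  # 0.5000 0.2500 0.7500 0.1250 0.6250 ...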
This sequence may be generated in R using the halton function from the randtoolbox package:
> library(randtoolbox)
> halton(10, dim = 1)
 [1] 0.5000 0.2500 0.7500 0.1250 0.6250 0.3750 0.8750 0.0625 0.5625 0.3125
We see from the example how a Van der Corput sequence works: It fills the unit interval by jumping around in such a way as to maintain a balanced coverage. A Halton sequence in multiple dimensions is obtained by associating a Van der Corput sequence with each dimension, using a distinct prime number as the base of each one. For instance, the snapshot
> halton(10, dim = 3)
        [,1]       [,2] [,3]
 [1,] 0.5000 0.33333333 0.20
 [2,] 0.2500 0.66666667 0.40
 [3,] 0.7500 0.11111111 0.60
 [4,] 0.1250 0.44444444 0.80
 [5,] 0.6250 0.77777778 0.04
 [6,] 0.3750 0.22222222 0.24
 [7,] 0.8750 0.55555556 0.44
 [8,] 0.0625 0.88888889 0.64
 [9,] 0.5625 0.03703704 0.84
[10,] 0.3125 0.37037037 0.08
shows that the base b = 2 is associated with dimension 1, whereas bases b = 3 and b = 5 are associated with dimensions 2 and 3, respectively.
It is instructive to compare how a pseudorandom sample and a Halton sequence cover the unit square (0, 1) × (0, 1). The following snapshot produces the plots in Figs. 9.1 and 9.2:
FIGURE 9.1 Pseudorandom sample in two dimensions.
FIGURE 9.2 Covering the bidimensional unit square with Halton sequences.
> set.seed(55555)
> haltonSeq <- halton(1000, dim = 2)
> windows()
> plot(haltonSeq[,1], haltonSeq[,2], pch=20)
> grid(nx=10, col='black')
> windows()
> plot(runif(1000), runif(1000), pch=20)
> grid(nx=10, col='black')
The judgment is a bit subjective here, but it could be argued that the coverage of the Halton sequence is more even. What is certainly true is that using non-prime numbers as bases may result in quite unsatisfactory patterns, such as the one shown in Fig. 9.3. This plot has been obtained using Van der Corput sequences with bases 2 and 4.
FIGURE 9.3 Bad choice of bases in Halton sequences.
Let us explore the use of Halton low-discrepancy sequences in a bidimensional integration context. Suppose that we want to compute

$$ I = \int_0^1 \!\! \int_0^1 e^{-x_1 x_2} \bigl( \sin 6\pi x_1 + \cos 8\pi x_2 \bigr)\, dx_1\, dx_2. $$
The surface corresponding to this function is illustrated in Fig. 9.4 and features a nasty oscillatory behavior. In R, we first write a function to compute the integrand and then use the adaptIntegrate function from the cubature package to get an estimate by traditional quadrature formulas:
FIGURE 9.4 Plot of the integrand function in Example 9.2.
> library(cubature)
> f <- function(x) exp(-x[1]*x[2]) * (sin(6*pi*x[1]) + cos(8*pi*x[2]))
> adaptIntegrate(f, rep(0,2), rep(1,2), tol=1e-4)
$integral
[1] 0.01986377

$error
[1] 1.983034e-06

$functionEvaluations
[1] 13957

$returnCode
[1] 0
It is easy to see that Monte Carlo estimates based on a sample of 10,000 pseudorandom points are not reliable:
> set.seed(55555)
> mean(apply(matrix(runif(20000), ncol=2), 1, f))
[1] 0.02368898
> mean(apply(matrix(runif(20000), ncol=2), 1, f))
[1] 0.01311604
> mean(apply(matrix(runif(20000), ncol=2), 1, f))
[1] 0.01870101
> mean(apply(matrix(runif(20000), ncol=2), 1, f))
[1] 0.01796109
> mean(apply(matrix(runif(20000), ncol=2), 1, f))
[1] 0.01680492
In the above snapshot we sample 20,000 observations from the uniform distribution, but they are arranged into a two-column matrix and correspond to 10,000 points; also note the use of apply. On the contrary, the result obtained by the bidimensional Halton sequence seems fairly satisfactory:
> library(randtoolbox)
> mean(apply(halton(10000, dim=2), 1, f))
[1] 0.01987514
The example above looks quite promising. Nevertheless, there are issues in using Halton sequences for high-dimensional problems. To see why, let us plot the Van der Corput sequence for a large base:
> X <- halton(n=1000, dim=30)
> plot(X[,30], pch=20)
> 1/X[1,30]
[1] 113
Here we generate a Halton sequence with dimension 30 and plot the last column of the output matrix, which corresponds to a Van der Corput sequence with base b = 113. The resulting plot, shown in Fig. 9.5, features long monotonic subsequences. This is not surprising, given the large base: For instance, we have to move from 1/113 to 112/113 before breaking the first subsequence. This results in a very bad coverage of the unit square when we use Halton sequences with high bases associated with high dimensions. The following command produces the plot in Fig. 9.6:
FIGURE 9.5 Long monotonic subsequences in a Van der Corput sequence with a large base.
FIGURE 9.6 Poor coverage of the unit square when large bases are used in Halton sequences.
> plot(X[,30], X[,29], pch=20)
This is just a projection of the 30-dimensional sequence on two margins, but it is reasonable to expect that such a pattern may have a strong adverse effect on the results.
With Halton sequences, we have to use Van der Corput sequences associated with large prime numbers for high-dimensional problems, and as we have seen this may result in poor performance. A way to overcome this issue is to generate only Van der Corput sequences associated with a suitably small base. This idea has been pursued and has resulted in two more families of low-discrepancy sequences:

- Faure sequences, where a single prime base, no smaller than the dimension m, is shared by all dimensions, and each coordinate is obtained by a suitable permutation of a Van der Corput sequence.
- Sobol sequences, where the base is always b = 2, and each coordinate is obtained by permuting a Van der Corput sequence in base 2 by means of so-called direction numbers.
Here we only outline the basic ideas behind Sobol sequences, which rely on more sophisticated concepts in the algebra of polynomials on a binary field. Before doing so, however, it is useful to get acquainted with this kind of sequence by using the function sobol provided by the package randtoolbox:
> sobol(n=10, dim=4)
        [,1]   [,2]   [,3]   [,4]
 [1,] 0.5000 0.5000 0.5000 0.5000
 [2,] 0.7500 0.2500 0.7500 0.2500
 [3,] 0.2500 0.7500 0.2500 0.7500
 [4,] 0.3750 0.3750 0.6250 0.1250
 [5,] 0.8750 0.8750 0.1250 0.6250
 [6,] 0.6250 0.1250 0.3750 0.3750
 [7,] 0.1250 0.6250 0.8750 0.8750
 [8,] 0.1875 0.3125 0.3125 0.6875
 [9,] 0.6875 0.8125 0.8125 0.1875
[10,] 0.9375 0.0625 0.5625 0.9375
It is easy to see that the four columns are all permutations of the Van der Corput sequence with base b = 2.
The uninterested reader may safely skip this section, which is more technical and not needed in the rest of the book. We will only give some hints at the theory behind Sobol sequences; the section can also be seen as an excuse to illustrate some R functions oriented toward bit manipulation. Let us focus on the generation of a one-dimensional sequence

$$ x^{(1)}, x^{(2)}, x^{(3)}, \ldots $$

in the unit interval [0, 1]. A Sobol sequence is generated on the basis of a set of direction numbers $v_1, v_2, \ldots$; we will see shortly how direction numbers are selected, but for now just think of them as numbers which are less than 1. To get the nth number in the Sobol sequence, consider the binary representation² of the integer n:

$$ n = (\cdots b_3 b_2 b_1)_2. $$
The corresponding number in the sequence is obtained by computing the bitwise exclusive OR of the direction numbers $v_i$ for which $b_i \neq 0$:

$$ x^{(n)} = b_1 v_1 \oplus b_2 v_2 \oplus b_3 v_3 \oplus \cdots. \tag{9.3} $$
If direction numbers are chosen properly, a low-discrepancy sequence will be generated [8]. A direction number may be thought of as a binary fraction:

$$ v_i = (0.v_{i1} v_{i2} v_{i3} \cdots)_2, $$
or as

$$ v_i = \frac{m_i}{2^i}, $$
where $m_i < 2^i$ is an odd integer. To generate direction numbers, we exploit primitive polynomials over the field $\mathbb{Z}_2$, i.e., polynomials with binary coefficients:

$$ P(x) = x^d + a_1 x^{d-1} + a_2 x^{d-2} + \cdots + a_{d-1} x + 1, \qquad a_k \in \{0, 1\}. $$
Irreducible polynomials are those polynomials which cannot be factored; primitive polynomials are a subset of the irreducible polynomials and are strongly linked to the theory of error-correcting codes, which is beyond the scope of this book. Some irreducible polynomials over the field $\mathbb{Z}_2$ are listed, e.g., in [7, Chapter 7], to which the reader is referred for further information. Given a primitive polynomial of degree d, the procedure for generating direction numbers is based on the recurrence formula:

$$ v_i = a_1 v_{i-1} \oplus a_2 v_{i-2} \oplus \cdots \oplus a_{d-1} v_{i-d+1} \oplus v_{i-d} \oplus \frac{v_{i-d}}{2^d}, \qquad i > d. $$
This is better implemented in integer arithmetic as

$$ m_i = 2 a_1 m_{i-1} \oplus 2^2 a_2 m_{i-2} \oplus \cdots \oplus 2^{d-1} a_{d-1} m_{i-d+1} \oplus 2^d m_{i-d} \oplus m_{i-d}. \tag{9.4} $$
Some numbers $m_1, \ldots, m_d$ are needed to initialize the recursion. They may be chosen arbitrarily, provided that each $m_i$ is odd and $m_i < 2^i$.
As an example, let us build the set of direction numbers on the basis of the primitive polynomial

$$ x^3 + x + 1. $$
In this polynomial we have coefficients $a_1 = 0$ and $a_2 = 1$. Therefore, the recursive scheme of Eq. (9.4) runs as follows:

$$ m_i = 4 m_{i-2} \oplus 8 m_{i-3} \oplus m_{i-3}, $$
which may be initialized with $m_1 = 1$, $m_2 = 3$, $m_3 = 7$. The reasons why this may be a good choice are given in [2]. We may carry out the necessary computations step by step in R, using the bitXor function:
> library(bitops)
> m <- c(1, 3, 7)
> i <- 4
> m[i] <- bitXor(4*m[i-2], bitXor(8*m[i-3], m[i-3]))
> i <- 5
> m[i] <- bitXor(4*m[i-2], bitXor(8*m[i-3], m[i-3]))
> i <- 6
> m[i] <- bitXor(4*m[i-2], bitXor(8*m[i-3], m[i-3]))
> m
[1]  1  3  7  5  7 43
Given the integer numbers mi, we may build the direction numbers vi. To implement the generation of direction numbers, we may use a function like GetDirNumbers, which is given in Fig. 9.7. The function requires a primitive polynomial p, a vector of initial numbers m, and the number n of direction numbers we want to generate. On exit, we obtain the direction numbers v and the integer numbers m.
FIGURE 9.7 R code to generate direction numbers for Sobol sequences.
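For the sake of readability, here is a bare-bones version of GetDirNumbers, consistent with Eq. (9.4) and with the output shown below; the code in Fig. 9.7 may differ in details:

# Generate n direction numbers from the coefficient vector p of a
# primitive polynomial of degree d and a vector m0 of d initial odd
# integers m_i < 2^i. No input checking is performed (see the remark
# below); we assume d >= 2 and length(m0) == d.
library(bitops)
GetDirNumbers <- function(p, m0, n) {
  d <- length(p) - 1                # degree of the polynomial
  a <- p[2:d]                       # inner coefficients a_1, ..., a_{d-1}
  m <- c(m0, rep(0, n - d))
  for (i in (d+1):n) {
    mi <- bitXor(2^d * m[i-d], m[i-d])       # terms 2^d m_{i-d} XOR m_{i-d}
    for (k in 1:(d-1))
      if (a[k] == 1) mi <- bitXor(mi, 2^k * m[i-k])
    m[i] <- mi
  }
  list(v = m / 2^(1:n), m = m)
}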
> p <- c(1,0,1,1)
> m0 <- c(1,3,7)
> GetDirNumbers(p, m0, 6)
$v
[1] 0.500000 0.750000 0.875000 0.312500 0.218750 0.671875

$m
[1]  1  3  7  5  7 43
The code is not optimized; for instance, the first and last coefficients of the input polynomial should be 1 by default, and no check is done on the congruence in size of the input vectors.
After computing the direction numbers, we could generate a Sobol sequence according to Eq. (9.3). However, an improved method was proposed by Antonov and Saleev [1], who proved that the discrepancy is not changed by using the Gray code representation of n. Gray codes are discussed, e.g., in [7, Chapter 20]; all we need to know is the following:

- The Gray code of n is obtained by computing the bitwise exclusive OR of the binary representation of n with the same representation shifted one position to the right.
- The Gray codes of consecutive numbers n and n + 1 differ in exactly one binary digit, namely, the one corresponding to the rightmost zero bit in the binary representation of n.
Computing a Gray code is easily accomplished in R. For instance, we may define a gray function and compute the Gray codes for the numbers i = 0, 1, …, 15 as follows:
> gray <- function(x) bitXor(x, bitShiftR(x, b=1))
> codes <- matrix(0, nrow=16, ncol=4)
> for (i in 1:16) codes[i,] <- as.numeric(intToBits(gray(i-1)))[c(4,3,2,1)]
> codes
      [,1] [,2] [,3] [,4]
 [1,]    0    0    0    0
 [2,]    0    0    0    1
 [3,]    0    0    1    1
 [4,]    0    0    1    0
 [5,]    0    1    1    0
 [6,]    0    1    1    1
 [7,]    0    1    0    1
 [8,]    0    1    0    0
 [9,]    1    1    0    0
[10,]    1    1    0    1
[11,]    1    1    1    1
[12,]    1    1    1    0
[13,]    1    0    1    0
[14,]    1    0    1    1
[15,]    1    0    0    1
[16,]    1    0    0    0
We have used the function bitShiftR of the package bitops to shift the binary representation of x one position to the right, and the function intToBits to obtain the binary representation of a number. We see that the Gray codes for consecutive numbers i and i + 1 indeed differ in one position; that position corresponds to the rightmost zero bit in the binary representation of i (adding leading zeros if necessary).
Using this feature of Gray codes, we may streamline the generation of a Sobol sequence: Given $x^{(n)}$, we have $x^{(n+1)} = x^{(n)} \oplus v_c$, where c is the index of the rightmost zero bit $b_c$ in the binary representation of n.
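Putting the pieces together, a one-dimensional Sobol generator based on this recurrence may be sketched as follows (our own minimal implementation, using the direction numbers computed above; no input checking is performed, and it works only for N smaller than 2 raised to the number of direction numbers):

# One-dimensional Sobol generation via the Antonov-Saleev recurrence
# x(n+1) = x(n) XOR v_c, where c is the position of the rightmost
# zero bit of n (n = 0, 1, 2, ...). Direction numbers are stored as
# integers v_i = m_i * 2^(L-i) for an L-bit representation.
library(bitops)
sobolScalar <- function(N, m = c(1, 3, 7, 5, 7, 43), L = 30) {
  v <- m * 2^(L - seq_along(m))
  x <- numeric(N)
  cur <- 0
  for (n in 0:(N-1)) {
    j <- 1
    while (bitAnd(bitShiftR(n, j-1), 1) == 1) j <- j + 1  # rightmost zero bit of n
    cur <- bitXor(cur, v[j])
    x[n+1] <- cur
  }
  x / 2^L
}
sobolScalar(8)  # 0.5 0.25 0.75 0.125 0.625 0.375 0.875 0.6875

The generated values are a permutation of the Van der Corput sequence in base 2, in agreement with what we observed for the columns produced by sobol.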
In Chapter 7 we have seen how confidence intervals may be used to measure the reliability of Monte Carlo estimates. With low-discrepancy sequences, we cannot apply that idea, as there is nothing random in quasi–Monte Carlo simulation. Actually, there is a result, known as the Koksma–Hlawka theorem, which provides us with a bound on the integration error we incur when approximating an integral on the unit hypercube:

$$ \left| \frac{1}{N} \sum_{i=1}^{N} f\bigl(\mathbf{x}^{(i)}\bigr) - \int_{I^m} f(\mathbf{u})\, d\mathbf{u} \right| \le V(f)\, D^*\bigl(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(N)}\bigr), $$
where $D^*(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(N)})$ is the star discrepancy of the point sequence $\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(N)}$, and V(f) is a quantity related to the integrand function, known as the Hardy–Krause variation. We will refrain from defining this form of variation, but let us say that, in general, variations try to capture the variability of a function, as this clearly affects the difficulty of approximating its integral numerically. Unfortunately, this bound is not quite useful, unless we have an operational way to evaluate the variation V(f). Another issue with such bounds is that they can be very weak. We may compare the above bound with the corresponding Monte Carlo probabilistic bound obtained by a pseudorandom sequence $\{\mathbf{U}^{(k)}\}$ in the unit hypercube:

$$ \left| \frac{1}{N} \sum_{k=1}^{N} f\bigl(\mathbf{U}^{(k)}\bigr) - \int_{I^m} f(\mathbf{u})\, d\mathbf{u} \right| \le z_{1-\alpha/2} \frac{\sigma_f}{\sqrt{N}}, $$
which holds with probability $1 - \alpha$, where $\sigma_f^2 = \operatorname{Var}[f(\mathbf{U})]$ and $z_{1-\alpha/2}$ is the usual standard normal quantile. It is true that the discrepancy bound in Eq. (9.2) is asymptotically more appealing than the $O(1/\sqrt{N})$ factor in Monte Carlo sampling, but the latter is related to a quite operational measure of estimate quality.
An array of methods has been designed, both to allow the construction of confidence intervals and to improve the quality of point sequences:
Random shift. This is a simple idea that lends itself to application with Korobov point sets (see Section 2.4.3). Given a point set $\{\mathbf{u}^{(i)}\}$, a uniform random vector $\mathbf{V}$ in $[0, 1)^m$ is sampled and used to shift the point set, yielding a sequence consisting of the points

$$ \tilde{\mathbf{u}}^{(i)} = \bigl( \mathbf{u}^{(i)} + \mathbf{V} \bigr) \bmod 1, $$

where the modulo operation is applied componentwise (a small R sketch of the random and digital shifts is given after this list).
Digital shift. The digital shift differs from the random shift in how the shift is applied, i.e., in arithmetic modulo b, where b is the base used to represent the numbers in the unit interval. Consider the expansion in base b of coordinate j of a point $\mathbf{u}^{(i)}$ in $[0, 1)^m$:

$$ u_j^{(i)} = \sum_{l=1}^{\infty} u_{ijl}\, b^{-l}. $$
A random shift $\mathbf{V}$ in $[0, 1)^m$ is sampled, with coordinates

$$ V_j = \sum_{l=1}^{\infty} v_{jl}\, b^{-l}, $$
and shifted points are generated as

$$ \tilde{u}_{ijl} = \bigl( u_{ijl} + v_{jl} \bigr) \bmod b. $$
Scrambling. Scrambling is a deeper readjustment of points in the sequence. A scrambling procedure was originally proposed by Tezuka to improve Faure sequences, and is therefore known as Faure–Tezuka scrambling; an alternative approach is due to Owen. Both of these scrambling approaches are available in R and can be applied to Sobol sequences.
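The two shift ideas are easily sketched in R. The following functions are our own illustrations, assuming that the points are stored in an N × m matrix and, for the digital shift, that b = 2 with the expansion truncated to L bits; they are not meant as production code:

library(bitops)

# Random shift: add the same uniform vector V to every point, modulo 1.
randomShift <- function(u) {
  V <- runif(ncol(u))
  (u + matrix(V, nrow(u), ncol(u), byrow = TRUE)) %% 1
}

# Digital shift in base 2: XOR the first L binary digits of each
# coordinate with those of a random vector V.
digitalShift <- function(u, L = 30) {
  V <- floor(runif(ncol(u)) * 2^L)
  ints <- floor(u * 2^L)
  for (j in 1:ncol(u)) ints[, j] <- bitXor(ints[, j], V[j])
  ints / 2^L
}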
Given a randomized shifting, we may generate a sample of sequences and come up with a confidence interval. The approach is justified by distributional results that we skip here. We just want to illustrate how scrambling of Sobol sequences is applied in R by setting the optional parameter scrambling in the function sobol:
> sobol(n=10, dim=4, scrambling=1)
           [,1]       [,2]       [,3]       [,4]
 [1,] 0.8897523 0.13510869 0.99200249 0.06669691
 [2,] 0.5694336 0.58325994 0.73452467 0.99942088
 [3,] 0.2349515 0.27649581 0.07017102 0.36903149
 [4,] 0.0633796 0.68916714 0.76618928 0.52431285
 [5,] 0.7416083 0.39046729 0.41350642 0.15471716
 [6,] 0.8119298 0.82890844 0.15676318 0.78641582
 [7,] 0.3836831 0.02996351 0.52293140 0.40991741
 [8,] 0.4830195 0.50730205 0.05096463 0.24351555
 [9,] 0.8369830 0.32138658 0.62871468 0.62484527
[10,] 0.6416652 0.89649355 0.87904912 0.44233483
By setting scrambling to 1, Owen scrambling is selected, whereas 2 corresponds to Faure–Tezuka scrambling. We may check the effect of scrambling on the integration problem of Example 9.2:
> adaptIntegrate(f, rep(0,2), rep(1,2), tol=1e-6)$integral
[1] 0.01986377
> mean(apply(sobol(10000, dim=2), 1, f))
[1] 0.01976074
> mean(apply(sobol(10000, dim=2, scrambling=1), 1, f))
[1] 0.01981418
> mean(apply(sobol(10000, dim=2, scrambling=2), 1, f))
[1] 0.01963135
In this specific example, Owen scrambling seems to work well.
Whatever low-discrepancy sequence we use, we need to transform the “uniform” numbers into something else. Since normal variables are ubiquitous in financial applications, let us consider how we may generate standard normal variables. To generate normal variates, we may either use the Box–Muller method, which we described in Section 5.5.1, or the inverse transform method. We cannot apply polar rejection, because when using low-discrepancy sequences we must integrate over a space with a well-defined dimensionality. We must know exactly how many quasi-random numbers we need, whereas with rejection-based methods we cannot anticipate that.
We recall the Box–Muller algorithm here for the sake of convenience. To generate two independent standard normal variates, we should first generate two independent random numbers $U_1$ and $U_2$, and then set

$$ X = \sqrt{-2 \ln U_1}\, \cos(2\pi U_2), \qquad Y = \sqrt{-2 \ln U_1}\, \sin(2\pi U_2). $$
Rather than generating pseudorandom numbers, we may use a bidimensional low-discrepancy sequence. Given the potentially weird effects of the Box–Muller transformation, which we have illustrated in Fig. 5.12, one could argue that the inverse transform method is a safer approach. Since fast and accurate methods to invert the CDF of the standard normal have been devised, we will only pursue the latter approach.
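In R, applying the inverse transform to a low-discrepancy sequence is a one-liner, since qnorm is vectorized; for instance:

library(randtoolbox)
Z <- qnorm(halton(1000, dim = 2))  # 1000 points, two N(0,1) coordinates each
colMeans(Z)                        # both close to 0
apply(Z, 2, sd)                    # both close to 1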
As an illustration, let us price a vanilla call option using Halton and Sobol sequences. The code in Fig. 9.8 receives a sequence of uniform numbers, possibly obtained by a pseudorandom generator, and transforms them to standard normals by inversion. The snapshot below shows that low-discrepancy sequences yield much more accurate answers than naive Monte Carlo, for a relatively small sample size.
FIGURE 9.8 Pricing a vanilla call option with low-discrepancy sequences.
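Since the code in Fig. 9.8 boils down to a few lines, we sketch its essence here (a bare-bones version consistent with the calls below, which may differ in details from the figure; EuVanillaCall, used below for the exact price, is assumed to be available from earlier chapters):

# Vanilla call pricing driven by an externally supplied sequence u of
# numbers in (0,1), which may be pseudorandom or low-discrepancy.
BlsLowD <- function(S0, K, T, r, sigma, u) {
  Z <- qnorm(u)                                          # inverse transform
  ST <- S0 * exp((r - sigma^2/2)*T + sigma*sqrt(T)*Z)    # terminal GBM prices
  exp(-r*T) * mean(pmax(ST - K, 0))                      # discounted mean payoff
}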
> S0 <- 50; K <- 50; sigma <- 0.4
> r <- 0.05; T <- 5/12
> EuVanillaCall(S0,K,T,r,sigma)
[1] 5.614995
> set.seed(55555)
> BlsLowD(S0,K,T,r,sigma,runif(5000))
[1] 5.673162
> BlsLowD(S0,K,T,r,sigma,halton(5000,dim=1))
[1] 5.592316
> BlsLowD(S0,K,T,r,sigma,sobol(5000,dim=1))
[1] 5.605094
> BlsLowD(S0,K,T,r,sigma,sobol(5000,dim=1,scrambling=1))
[1] 5.612801
In this case, too, it seems that scrambling a Sobol sequence works fairly well.
A final important note concerns the correct use of low-discrepancy sequences for sample path generation. Generating sample paths for geometric Brownian motion is rather easy, as we only need standard normal variables. Consider the following matrix of uniform numbers:
> halton(10, dim=5)
        [,1]       [,2] [,3]       [,4]       [,5]
 [1,] 0.5000 0.33333333 0.20 0.14285714 0.09090909
 [2,] 0.2500 0.66666667 0.40 0.28571429 0.18181818
 [3,] 0.7500 0.11111111 0.60 0.42857143 0.27272727
 [4,] 0.1250 0.44444444 0.80 0.57142857 0.36363636
 [5,] 0.6250 0.77777778 0.04 0.71428571 0.45454545
 [6,] 0.3750 0.22222222 0.24 0.85714286 0.54545455
 [7,] 0.8750 0.55555556 0.44 0.02040816 0.63636364
 [8,] 0.0625 0.88888889 0.64 0.16326531 0.72727273
 [9,] 0.5625 0.03703704 0.84 0.30612245 0.81818182
[10,] 0.3125 0.37037037 0.08 0.44897959 0.90909091
It must be realized that with these numbers we may generate ten sample paths with five time instants along each path (not including the initial value of the process). Each dimension of the sequence, i.e., each column, must correspond to a time instant, and each row corresponds to a sample path. A common mistake is to associate a dimension with a sample path, but by doing so we lose the intertemporal independence property of the standard Wiener process.
The function in Fig. 9.9 is quite similar to the vectorized code of Fig. 6.3. It can use either Halton or Sobol sequences and, for the latter ones, we may also select the type of scrambling. As a first check, let us generate and plot five sample paths, without any scrambling:
FIGURE 9.9 Generating sample paths of GBM by low-discrepancy sequences.
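The essence of the code in Fig. 9.9 is captured by the following sketch (our own version, consistent with the calls below; each row of the low-discrepancy matrix drives one path and each column one time step):

library(randtoolbox)
# Generate GBM sample paths driven by a low-discrepancy sequence:
# rows of the numRepl x numSteps uniform matrix are paths, columns
# are time steps; normals are obtained by inversion.
lowDiscrGBM <- function(S0, mu, sigma, T, numSteps, numRepl,
                        type = "sobol", scrambling = 0) {
  dt <- T / numSteps
  U <- if (type == "halton") halton(numRepl, dim = numSteps)
       else sobol(numRepl, dim = numSteps, scrambling = scrambling)
  Z <- qnorm(U)                                  # normal shocks
  increments <- (mu - sigma^2/2)*dt + sigma*sqrt(dt)*Z
  logPaths <- t(apply(increments, 1, cumsum))    # cumulate along each path
  cbind(S0, S0 * exp(logPaths))                  # prepend the initial price
}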
> S0 <- 50; mu <- 0.1; sigma <- 0.4
> T <- 1/12; numSteps <- 30; numRepl <- 5
> pathsH <- lowDiscrGBM(S0,mu,sigma,T,numSteps,numRepl,type="halton")
> pathsS <- lowDiscrGBM(S0,mu,sigma,T,numSteps,numRepl)
> par(mfrow=c(1,2))
> plot(pathsH[1,],type='l',ylim=range(pathsH),xlab="Halton",ylab="")
> for (k in 2:numRepl) lines(pathsH[k,])
> plot(pathsS[1,],type='l',ylim=range(pathsS),xlab="Sobol",ylab="")
> for (k in 2:numRepl) lines(pathsS[k,])
> par(mfrow=c(1,1))
The sample paths, displayed in Fig. 9.10, do not look quite satisfactory. The Halton paths are decreasing over the later time periods; one of the Sobol paths looks like a straight horizontal line. A bit of reflection explains why we observe such weird patterns. We have already noted that, for large bases, Van der Corput sequences suffer from long monotonically increasing subsequences. Thus, they start with small numbers, which correspond to negative normal shocks and to a monotonic decrease in the stochastic process, at least in the first steps. On the contrary, the first part of the sample paths seems less problematic, as small bases are used there. As to the Sobol sequence, since its first point consists of a repetition of the value 0.5, which corresponds to zero normal shocks, the corresponding sample path may look like a straight line, even though it is actually an exponential.
FIGURE 9.10 Five GBM paths by Halton and Sobol sequences.
If we sample more paths, as shown in Fig. 9.11, the results look a bit more reasonable. Indeed, skipping the initial part of a low-discrepancy sequence is a strategy that is sometimes adopted in order to break such nasty patterns, as is the use of the Brownian bridge. Another possibility, of course, is to use scrambling. The ten sample paths of Fig. 9.12 have been produced by Sobol sequences with Owen scrambling, and they definitely look more realistic.
FIGURE 9.11 More GBM paths by Halton and Sobol sequences.
FIGURE 9.12 GBM paths by scrambled Sobol sequences.
1 I.A. Antonov and V.M. Saleev. An economic method of computing LPτ-sequences. USSR Computational Mathematics and Mathematical Physics, 19:252–256, 1979.
2 P. Bratley and B.L. Fox. Algorithm 659: Implementing Sobol’s quasirandom sequence generator. ACM Transactions on Mathematical Software, 14:88–100, 1988.
3 P. Glasserman. Monte Carlo Methods in Financial Engineering. Springer, New York, 2004.
4 P. Jaeckel. Monte Carlo Methods in Finance. Wiley, Chichester, 2002.
5 C. Lemieux. Monte Carlo and Quasi–Monte Carlo Sampling. Springer, New York, 2009.
6 H. Niederreiter. Random Number Generation and Quasi–Monte Carlo Methods. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1992.
7 W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C (2nd ed.). Cambridge University Press, Cambridge, 1992.
8 I.M. Sobol. On the distribution of points in a cube and the approximate evaluation of integrals. USSR Computational Mathematics and Mathematical Physics, 7:86–112, 1967.
¹ This is stated in a slightly more precise form in Section 9.4. See [6] for a detailed and rigorous account of these results.
² In this section, the rightmost digit in the (binary) representation of an integer number is denoted by $b_1$ instead of $d_0$; this is just a matter of convenience, as we associate bit positions with direction numbers rather than powers of a generic base.