Distance doesn’t exist, in fact, and neither does time. Vibrations from love or music can be felt everywhere, at all times.
Yoko Ono (1933–), Japanese multimedia artist, singer, and peace activist, and widow of John Lennon of The Beatles.
Covering a hypercube in ℝ^E, with side length L, by boxes of side length s requires (L∕s)^E boxes. This complexity, exponential in E, makes dB expensive to compute for sets in a high-dimensional space. The correlation dimension was introduced in 1983 by Grassberger and Procaccia [Grassberger 83a, Grassberger 83b] and by Hentschel and Procaccia [Hentschel 83] as a computationally attractive alternative to dB. The correlation dimension has been enormously important to physicists [Baker 90], and even by 1999 the literature on the computation of the correlation dimension was huge [Hegger 99, Lai 98]. The Grassberger and Procaccia papers introducing the correlation dimension were motivated by the study of strange attractors, a topic in dynamical systems. Although this book is not focused on chaos and dynamical systems, so much of the literature on the correlation dimension is motivated by these topics that a very short review, based on [Ruelle 80, Ruelle 90], will provide a foundation for our discussion of the correlation dimension of a geometric object (in this chapter and in the next chapter) and of a network (in Chap. 11).
9.1 Dynamical Systems
A dynamical system (e.g., a physical, chemical, or biological dynamical system) is a system that changes over time. A deterministic dynamical system models the evolution of x(t), where t is time and x(t) is a vector with finite or infinite dimension. A continuous time dynamical system has the form dx(t)∕dt = F(x(t)) for some function F, while a discrete time dynamical system has the form x(t + 1) = F(x(t)) for some function F. A physical system can be described using its phase space, so that each point in phase space represents an instantaneous description of the system at a given point in time. We will be concerned only with finite dimensional systems for which the phase space is ℝ^E, and the state of the system is specified by x = (x1, x2, …, xE). For example, for a physical system the state might be the coordinates on a grid, and for a chemical system the state might be temperature, pressure, and concentration of the reactants. The E coordinates of x are assumed to vary with time, and their values at time t are x(t) = (x1(t), x2(t), …, xE(t)). We assume [Ruelle 80] the state at time t + 1 depends only on the state at time t:
xk(t + 1) = Fk(x1(t), x2(t), …, xE(t)) for k = 1, 2, …, E.
Define F = (F1, F2, …, FE), so x(t + 1) = F(x(t)). Given x(0), we can determine x(t) for each time t.
If an infinitesimal change δx(0) is made to the initial conditions (at time t = 0), there will be a corresponding change δx(t) at time t. If for some λ > 0 we have |δx(t)| ∝ |δx(0)|e^{λt} (i.e., if δx(t) grows exponentially with t), then the system is said to exhibit sensitive dependence on initial conditions. This means that small changes in the initial state become magnified, eventually becoming distinct trajectories [Mizrach 92]. The exponent λ is called a Lyapunov exponent.
A dynamical system is said to be chaotic if for all (or almost all) initial conditions x(0) and almost all δx(0) the system exhibits sensitive dependence on initial conditions. The evolution of chaotic systems is complex and essentially random by standard statistical tests. Only short-term prediction of chaotic systems is possible, even if the model is known with certainty [Barkoulas 98]. Chaos was first observed by J. Hadamard in 1898, and H. Poincaré in 1908 realized the implications for weather predictability, a subject addressed by Edward Lorenz [Chang 08] in a famous 1963 paper that revitalized interest in chaos.1 Lorenz and others have shown that theoretically it is not possible to make reasonable weather forecasts more than one or two weeks ahead. An example of a continuous time chaotic dynamical system is the system studied by Lorenz in 1963:
dx∕dt = σ(y − x), dy∕dt = x(ρ − z) − y, dz∕dt = xy − βz,
where Lorenz used the parameter values σ = 10, ρ = 28, and β = 8∕3.
These equations approximate the behavior of a horizontal fluid layer heated from below. As the fluid warms, it tends to rise, creating convection currents. If the heating is sufficiently intense, the convection currents become irregular and turbulent. Such behavior happens in the earth’s atmosphere, and since a sensitive dependence on initial conditions holds, this model provides some theoretical justification for the inability of meteorologists to make accurate long-range weather forecasts.2
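To make the preceding discussion concrete, here is a minimal Python sketch (assuming only numpy is available) that integrates the Lorenz system with the classic parameter values σ = 10, ρ = 28, β = 8∕3 and illustrates sensitive dependence on initial conditions; the step size, time horizon, and fourth-order Runge–Kutta scheme are illustrative choices, not part of the original model.

```python
import numpy as np

def lorenz(v, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    x, y, z = v
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_orbit(v0, dt=0.01, n_steps=10_000):
    """Integrate with a fourth-order Runge-Kutta step; returns the orbit."""
    orbit = np.empty((n_steps, 3))
    v = np.asarray(v0, dtype=float)
    for i in range(n_steps):
        k1 = lorenz(v)
        k2 = lorenz(v + 0.5 * dt * k1)
        k3 = lorenz(v + 0.5 * dt * k2)
        k4 = lorenz(v + dt * k3)
        v = v + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        orbit[i] = v
    return orbit

# Two nearby initial conditions diverge: sensitive dependence.
a = rk4_orbit([1.0, 1.0, 1.0])
b = rk4_orbit([1.0, 1.0, 1.0 + 1e-8])
# The separation grows to the scale of the attractor itself,
# despite the initial perturbation of only 1e-8.
print(np.linalg.norm(a[-1] - b[-1]))
```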
There are two types of dynamical physical systems. When the system loses energy due to friction, as happens in most natural physical systems, the system is dissipative. If there is no loss of energy due to friction, the system is conservative. A dissipative system always approaches, as t →∞, an asymptotic state [Chatterjee 92]. Consider a dissipative dynamical system x(t + 1) = F(x(t)), and suppose that at time 0, all possible states of the system lie in a set Ω0 having positive volume (i.e., positive Lebesgue measure, as defined in Sect. 5.1). Since the system is dissipative, Ω0 must be compressed due to loss of energy, and Ω0 converges to a compact set Ω as t →∞. The set Ω lives in a lower dimensional subspace of the phase space, and thus has Lebesgue measure zero in the phase space. The set Ω satisfies F(Ω) = Ω, and Ω can be extremely complicated, often possessing a fractal structure [Hegger 99]. Interestingly, Benford’s Law (Sect. 3.2) has been shown to apply to one-dimensional dynamical systems [Fewster 09].
The term strange attractor was introduced by Ruelle and Takens in 1971. Informally, an attractor of a dynamical system is the subset of phase space to which the system evolves, and a strange attractor is an attractor that is a fractal [Theiler 90]. More formally, a bounded set Ω ⊂ ℝ^E, with infinitely many points, is a strange attractor for the map F if there is a set U with the following four properties [Ruelle 80]:
1.
U is an E-dimensional neighborhood of Ω, i.e., for each point x ∈ Ω, there is a ball centered at x and entirely contained in U. In particular, Ω ⊂ U.
2.
For each initial point x(0) ∈ U, the point x(t) ∈ U for all t > 0, and x(t) is arbitrarily close to Ω for all sufficiently large t. (This means that Ω is attracting.)
3.
There is a sensitive dependence on initial conditions whenever x(0) ∈ U. (This makes Ω a strange attractor.)
4.
One can choose a point x(0) in Ω such that, for each other point y in Ω, there is a point x(t) that is arbitrarily close to y for some positive t. (This indecomposability condition says that Ω cannot be split into two different attractors.)
To see how a strange attractor arises in dissipative systems, imagine we begin with a small sphere. After a small time increment, the sphere may evolve into an ellipsoid, such that the ellipsoid volume is less than the sphere volume. However, the length of the longest principal axis of the ellipsoid can exceed the sphere diameter. Thus, over time, the sphere may compress in some directions and expand in other directions. For the system to remain bounded, expansion in a given direction cannot continue indefinitely, so folding occurs. If the pattern of compression, expansion, and folding continues, trajectories that are initially close can diverge rapidly over time, thus creating the sensitive dependence on initial conditions. The amounts of compression or stretching in the various dimensions are quantified by the Lyapunov exponents [Chatterjee 92].
A well-known example of a discrete time dynamical system was introduced by Hénon. For this two-dimensional system, we have x1(t + 1) = x2(t) + 1 − ax1(t)² and x2(t + 1) = bx1(t). With a = 1.4 and b = 0.3, the first 10⁴ points distribute themselves on a complex system of lines that are a strange attractor (see [Ruelle 80] for pictures); for a = 1.3 and b = 0.3 the lines disappear and instead we obtain 7 points of a periodic attractor.
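The Hénon orbits described above are easy to reproduce; the following Python sketch iterates the map for both parameter settings (the orbit lengths, the initial condition, and the rounding used to detect the periodic points are arbitrary choices).

```python
import numpy as np

def henon_orbit(n, a=1.4, b=0.3, x1=0.0, x2=0.0):
    """Iterate x1(t+1) = x2(t) + 1 - a*x1(t)^2, x2(t+1) = b*x1(t)."""
    pts = np.empty((n, 2))
    for i in range(n):
        x1, x2 = x2 + 1.0 - a * x1 * x1, b * x1
        pts[i] = (x1, x2)
    return pts

chaotic = henon_orbit(10_000)            # traces out the strange attractor
periodic = henon_orbit(10_000, a=1.3)    # should settle onto a periodic orbit
# After transients die out, count the distinct points in the orbit's tail:
print(len(np.unique(periodic[-700:].round(6), axis=0)))  # expect 7
```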
Chaos in human physiology was studied in many papers by Goldberger, e.g., [Goldberger 90]. In the early 1980s, when investigators began to apply chaos theory to physiological systems, they expected that chaos would be most apparent in diseased or aging systems (e.g., [Ruelle 80] stated that the normal cardiac regime is periodic). Later research found instead that the phase-space representation of the normal heartbeat contains a strange attractor, suggesting that the dynamics of the normal heartbeat may be chaotic. There are advantages to such dynamics [Goldberger 90]: “Chaotic systems operate under a wide range of conditions and are therefore adaptable and flexible. This plasticity allows systems to cope with the exigencies of an unpredictable and changing environment.”
Several other examples of strange attractors were provided in [Ruelle 80]. Strange attractors arise in studies of the earth’s magnetic field, which reverses itself at irregular intervals, with at least 16 reversals in the last 4 million years. Geophysicists have developed dynamo equations, which possess chaotic solutions, to describe this behavior. The Belousov–Zhabotinski chemical reaction turns ferroin from blue to purple to red; simultaneously the cerium ion changes from pale yellow to colorless, so all kinds of hues are produced (this reaction caused astonishment and some disbelief when it was discovered). For a biological example, discrete dynamical models of the population of multiple species yield strange attractors. Even for a single species, the equation x(t + 1) = Rx(t)(1 − x(t)), where R > 0, gives rise to non-periodic behavior. Ruelle observed that the non-periodic fluctuations of a dynamical system do not necessarily indicate an experiment that was spoiled by mysterious random forces, but rather may indicate the presence of a strange attractor. Chaos in ecology was studied in [Hastings 93].
Physicists studying dynamical systems want to distinguish behavior that is irregular, but low-dimensional, from behavior that is irregular because it is essentially stochastic, with many effective degrees of freedom [Theiler 90]. Calculating the fractal dimension of the attractor helps make this distinction, since the fractal dimension roughly estimates the number of independent variables involved in the process [Badii 85]. That is, the fractal dimension can be interpreted as the number of degrees of freedom of the system [Ruelle 90]. A finite non-integer fractal dimension indicates the presence of complex deterministic dynamics [Krakovska 95]. The connection between strange attractors and fractals was succinctly described in [Grassberger 83b]:
In a system with F degrees of freedom, an attractor is a subset of F-dimensional phase space toward which almost all sufficiently close trajectories get “attracted” asymptotically. Since volume is contracted in dissipative flows, the volume of an attractor is always zero, but this leaves still room for extremely complex structures.
Typically, a strange attractor arises when the flow does not contract a volume element in all directions, but stretches it in some. In order to remain confined to a bounded domain, the volume element gets folded at the same time, so that it has after some time a multisheeted structure. A closer study shows that it finally becomes (locally) Cantor-set like in some directions, and is accordingly a fractal in the sense of Mandelbrot [Mandelbrot 77].
We could determine the fractal dimension of an attractor by applying box counting. However, a major limitation of box counting is that it counts only the number of occupied boxes, and is insensitive to the number of points in an occupied box. Thus dB is more of a geometrical measure, and provides very limited information about the clumpiness of a distribution. Moreover, if the box size s is small enough to investigate the small-scale structure of a strange attractor, the relation BD(s) ∼ s^{−dB} (assuming diameter-based boxes) implies that the number BD(s) of occupied boxes becomes very large. The most interesting cases of dynamical systems occur when the dimension of the attractor exceeds 3. As observed in [Farmer 82], when dB = 3 and s = 10^{−2}, from BD(s) ∼ s^{−dB} we obtain one million boxes, which in 1982 taxed the memory
of a computer. While such a requirement would today be no burden, the point remains: box-counting methods in their basic form have time and space complexity exponential in dB. Moreover, for a dynamical system, some regions of the attractor have a very low probability, and it may be necessary to wait a very long time for a trajectory to visit all of the very low probability regions. These concerns were echoed in [Eckmann 85], who noted that the population of boxes is usually very uneven, so that it may take a long time for boxes that should be occupied to actually become occupied. For these reasons, box counting was not (as of 1985) used to study dynamical systems [Eckmann 85]. Nine years later, [Strogatz 94] also noted that dB is rarely used for studying high-dimensional dynamical processes.
An early (1981) alternative to box counting was proposed in [Froehling 81]. Given a set of points in ℝ^E, use regression to find the best hyperplane
of dimension m that fits the data, where m is an integer in the range 1 ≤ m ≤ E − 1. The goodness of fit of a given hyperplane is measured by χ², the sum of squared deviations from the hyperplane divided by the number of degrees of freedom. If m is too small, the fit will typically be poor, and χ² will be large. As m is increased, χ² will drop sharply. The m for which χ² is lowest is the approximate fractal dimension. A better approach, proposed two years later in 1983, is the correlation dimension. Our discussion of the correlation dimension begins with the natural invariant measure and the pointwise mass dimension.
9.2 Pointwise Mass Dimension
The pointwise mass dimension is based upon the measure of a subset of ℝ^E. The study of measures of sizes of sets dates at least as far back as Borel (1885), and was continued by Lebesgue (1904), Carathéodory (1914), and Hausdorff (1919) [Chatterjee 92]. Recall that the Lebesgue measure μL was defined in Chap. 5. Formalizing the qualitative definition of “dissipative” provided above, we say that a dynamical physical system g^t, t = 1, 2, …, where g^t: ℝ^E → ℝ^E, is dissipative if the Lebesgue measure μL of a bounded subset tends to 0 as t →∞, i.e., if for each bounded B ⊂ ℝ^E we have
lim_{t→∞} μL(g^t(B)) = 0. (9.1)
It can be shown that only dissipative systems have strange attractors [Theiler 90]. Suppose the dissipative system has the strange attractor Ω. Since by (9.1) we have μL(Ω) = 0, then to study Ω we require a new measure defined on subsets of Ω; the measure of a subset should reflect how much of the subset is contained in Ω. The new measure μ of a subset B ⊂ Ω should count how often and for how long a typical trajectory visits B, and an appropriate definition is
μ(B) ≡ lim_{T→∞} (1∕T) ∫_0^T IB[g^t(x0)] dt (9.2)
where x0 ∈ ℝ^E is the starting point of the system g^t, and IB[x] is the indicator function whose value is 1 if x ∈ B and 0 otherwise [Theiler 90]. The measure defined by (9.2) is the natural measure for almost all x0. For a discrete dynamical system, the natural measure of B is [Peitgen 92]
μ(B) ≡ lim_{T→∞} (1∕T) ∑_{t=1}^{T} IB[x(t)].
The natural measure is an invariant measure, which means [Peitgen 92] that for the system x(t + 1) = f(x(t)), if f^{−1}(B) is the set of points which after one iteration land in B, then μ(f^{−1}(B)) = μ(B). For x ∈ Ω and r > 0, define
Ω(x, r) ≡ {y ∈ ℝ^E : dist(x, y) ≤ r},
where the choice of norm determines the shape of Ω(x, r) (e.g., a hypersphere for the Euclidean norm, and a hypercube for the L∞ norm). Then μ(Ω(x, r)) is the natural measure of the ball Ω(x, r).
Definition 9.1 The pointwise dimension of μ at x, denoted by d(x), is given by
d(x) ≡ lim_{r→0} log μ(Ω(x, r))∕log r. (9.3)
□ Thus μ(Ω(x, r)) ∼ r^{d(x)} as r → 0. The pointwise dimension at x is also called the local Hölder exponent at x. In contrast to the Hausdorff and box counting dimensions, which are global quantities whose computation requires covering all of Ω, the pointwise dimension is a local quantity, defined at a single point. If d(x) is independent of x, then the probability (i.e., “mass”) of Ω(x, r) depends only on r but not on x; this holds, e.g., when points in Ω are uniformly distributed. If d(x) is a non-constant function of x, then the mass of Ω(x, r) depends on x and r; this holds, e.g., for the Gaussian distribution. (See [Simpelaere 98] for additional theory of the pointwise dimension.) The average pointwise dimension, denoted by 〈d〉, is given by
〈d〉 ≡ ∫_Ω d(x) dμ(x) (9.4)
where dμ(x) is the differential of μ(x).
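As a concrete illustration of (9.3), the following Python sketch estimates d(x) at a single point by regressing the log of the empirical ball mass against log r; the uniform sample of the unit square (for which d(x) ≈ 2 away from the boundary), the radii, and the sample size are all arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.random((20_000, 2))          # uniform sample of the unit square
x = np.array([0.5, 0.5])               # estimate d(x) at the center

radii = np.logspace(-1.7, -0.5, 10)    # r from about 0.02 to about 0.32
dist = np.linalg.norm(pts - x, axis=1)
mass = np.array([(dist <= r).mean() for r in radii])  # empirical mu(Omega(x, r))

# Per (9.3), d(x) is the slope of log mu(Omega(x, r)) versus log r as r -> 0.
slope = np.polyfit(np.log(radii), np.log(mass), 1)[0]
print(slope)                           # should be close to 2 here
```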
9.3 The Correlation Sum
Let Ω ⊂ ℝ^E, and suppose we have determined N points in Ω. Denote these points by x(n) for n = 1, 2, …, N. Denote the E coordinates of x(n) by x1(n), x2(n), …, xE(n). We can approximate the average pointwise dimension 〈d〉 defined by (9.4) by averaging d(x) over these points, that is, by computing (1∕N) ∑_{n=1}^{N} d(x(n)). Since by (9.3) each d(x(n)) is a limit, then (1∕N) ∑_{n=1}^{N} d(x(n)) is an average of N limits. An alternative method of estimating (9.4) is to first compute, for each r, the average
μ̄(r) ≡ (1∕N) ∑_{n=1}^{N} μ(Ω(x(n), r)) (9.5)
and then compute the limit
lim_{r→0} log μ̄(r)∕log r. (9.6)
Thus, instead of taking the average of N limits, in (9.6) we take the limit of the average of N natural measures. Writing (9.6) in a computationally useful form will lead us to the correlation dimension.
Let I0: ℝ → {0, 1} be the Heaviside step function, defined by
I0(z) ≡ 1 if z ≥ 0, and I0(z) ≡ 0 if z < 0. (9.7)
The subscript “0” in I0 is used to distinguish this step function from one that will be defined in a later chapter. Note that I0(0) = 1. Then
μ(Ω(x, r)) = lim_{N→∞} (1∕N) ∑_{n=1}^{N} I0(r − dist(x, x(n))). (9.8)
Typically, dist(x, y) is the Euclidean distance; we could alternatively use the L∞ (sup norm) and define dist(x, y) = max_{1≤k≤E} |xk − yk| [Frank 89]. Define
C(x(i), r) ≡ ∑_{j=1, j≠i}^{N} I0(r − dist(x(i), x(j))) (9.9)
so C(x(i), r) is the number of points, distinct from x(i), whose distance from x(i) does not exceed r.3 Although C(x(i), r) depends on N, for notational simplicity we write C(x(i), r) rather than, e.g., CN(x(i), r). We have 0 ≤ C(x(i), r) ≤ N − 1. We can approximate μ(Ω(x(i), r)) using
μ(Ω(x(i), r)) ≈ C(x(i), r)∕(N − 1). (9.10)
This is an approximation, and not an equality, since we only have a finite data set; the accuracy of the approximation increases as N →∞. Define the correlation sum C(r) by
C(r) ≡ [1∕(N(N − 1))] ∑_{i=1}^{N} ∑_{j=1, j≠i}^{N} I0(r − dist(x(i), x(j))). (9.11)
Although C(r) depends on N, for notational simplicity we write C(r) rather than, e.g., CN(r). If 0 < dist(x(i), x(j)) ≤ r, then both (i, j) and (j, i) contribute to C(r).4 In words, C(r) is the ratio
(number of ordered pairs (x(i), x(j)) of distinct points with dist(x(i), x(j)) ≤ r) ∕ (total number N(N − 1) of ordered pairs of distinct points).
Equivalently, C(r) is the fraction of all pairs of distinct points for which the distance between the pair does not exceed r. An equivalent definition of C(r) that invokes the natural measure μ defines C(r) to be the probability that a pair of distinct points, chosen randomly on the attractor with respect to μ, is separated by a distance not exceeding r [Ding 93]. From (9.5), (9.10), and (9.11) we have
μ̄(r) ≈ C(r). (9.12)
Grassberger stressed the importance of excluding i = j from the double sum defining C(r); see [Theiler 90]. However, taking the opposite viewpoint, [Smith 88] wrote that it is essential that the i = j terms be included in the correlation sum, “in order to distinguish the scaling of true noise at small r from fluctuations due to finite N”. The argument in [Smith 88] is that, when the i = j terms are omitted, log C(r) is not bounded as r → 0, and steep descents in the log C(r) versus log r curve may result in large estimates of dC, which may exceed E. In this chapter, we will use definition (9.9).
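For computational concreteness, here is a direct (O(N²) memory) Python implementation of (9.9) and (9.11), following this chapter's convention of excluding i = j; the sample data and radii are placeholder choices.

```python
import numpy as np

def correlation_sum(points, r):
    """C(r) of (9.11): fraction of ordered pairs (i, j), i != j,
    separated by a distance not exceeding r."""
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)     # n x n Euclidean distance matrix
    close = (dist <= r).sum() - n            # subtract the i == j diagonal
    return close / (n * (n - 1))

rng = np.random.default_rng(1)
pts = rng.random((1_000, 2))                 # points in the unit square
for r in (0.05, 0.1, 0.2):
    print(r, correlation_sum(pts, r))        # grows roughly like r^2 here
```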
Given the set of N points, let rmax ≡ max_{i≠j} dist(x(i), x(j)) and rmin ≡ min_{i≠j} dist(x(i), x(j)). For r ≥ rmax, the distance between each pair of distinct points does not exceed r, so C(r) = 1. For r < rmin, the distance between each pair of distinct points exceeds r, so C(r) = 0. Hence the useful range of r is the interval [rmin, rmax]. If there is a single pair of points at distance rmin, then C(rmin) = 2∕[N(N − 1)], and log C(rmax) = 0. Since log(2∕[N(N − 1)]) ≈ −2 log N, the range of log C(r) over the interval [rmin, rmax] is approximately 2 log N. However, for box counting with N points, the number of non-empty boxes (i.e., BR(r) for radius-based boxes and BD(s) for diameter-based boxes) ranges from 1 to N, so the range of log BR(r) or log BD(s) is log N, which is only half the range of log C(r). Similarly, for a single point x, over its useful range the pointwise mass dimension estimate (9.10) ranges from 1∕(N − 1) to 1, so the range of its logarithm is also approximately log N, half the range for log C(r). The greater range, by a factor of 2, for log C(r) is “the one advantage” the correlation sum has over the average pointwise dimension [Theiler 90].
Consider an alternate definition of the correlation sum, now using the double summation ∑_{i=1}^{N} ∑_{j=1}^{N} I0(r − dist(x(i), x(j))), which does not exclude the case i = j. With this alternative definition, each of the N pairs (x(i), x(i)) contributes 1 to the double summation. Following [Yadav 10],
Ĉ(r) ≡ (1∕N²) ∑_{n=1}^{N} Nn(r), (9.13)
where Nn(r) is the number of points whose distance from point n does not exceed r. For each n, since Nn(r) always counts n itself, then Nn(r) ≥ 1. For all sufficiently large r we have Nn(r) = N for each n, in which case Ĉ(r) = 1. In (9.13), we can interpret Nn(r)∕N as the expected fraction of points whose distance from n does not exceed r. Suppose this fraction is independent of n, and suppose we have a probability distribution P(k, r) providing the probability that k of N points are within a distance r from a random point. The expected value (with respect to k) of this probability distribution is the expected fraction of the N points within distance r of a random point, so
∑_k (k∕N) P(k, r) = (1∕N) ∑_{n=1}^{N} Nn(r)∕N, (9.14)
which with (9.13) provides yet another way to write :
Ĉ(r) = (1∕N) ∑_k k P(k, r). (9.15)
While (9.14) provides the mean (with respect to k) of the distribution P(k, r), to fully characterize the distribution we need its moments. So for each real q define the generalized correlation sum
Ĉq(r) ≡ (1∕N^{q−1}) ∑_k k^{q−1} P(k, r), (9.16)
which for q = 2 yields (9.15). For q ≫ 1 the terms dominating the sum in (9.16) are terms k^{q−1}P(k, r) for which k is “large” (i.e., many points in the ball of radius r centered at a random point), whereas for q ≪ 0 the terms dominating the sum in (9.16) are terms k^{q−1}P(k, r) for which k is “small” (i.e., few points in the ball of radius r). Thus studying the behavior of Ĉq(r) for −∞ < q < ∞ provides information about regions containing many points as well as regions with few points. We will explore this further in Chap. 16 on multifractals.
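A sample-based estimate of (9.16) replaces the expectation over P(k, r) by an average of (Nn(r)∕N)^{q−1} over the N points; the Python sketch below does this, with the data set, radius, and q values chosen arbitrarily (this empirical averaging is one natural reading of (9.16), offered as an illustration rather than a prescription from the references).

```python
import numpy as np

def generalized_correlation_sum(points, r, q):
    """Estimate C_q(r) = sum_k k^(q-1) P(k, r) / N^(q-1) by averaging
    (N_n(r)/N)^(q-1) over all points n; N_n(r) counts n itself."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    counts = (dist <= r).sum(axis=1)     # N_n(r), includes the point n itself
    return np.mean((counts / n) ** (q - 1))

rng = np.random.default_rng(2)
pts = rng.random((500, 2))
for q in (0, 2, 4):                      # q << 0 probes sparse regions, q >> 1 dense ones
    print(q, generalized_correlation_sum(pts, 0.1, q))
```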
9.4 The Correlation Dimension
Definition 9.2 If the limit
dC ≡ lim_{r→0} log C(r)∕log r (9.17)
exists, then dC is called the correlation dimension of Ω. □
Assume for the remainder of this section that dC exists. Then dC is independent of the norm used in (9.8) to define the distance between two points. Also, C(r) ∼ r^{dC} as r → 0, where “∼” means “tends to be approximately proportional to” (see Sect. 1.2 for a discussion of this symbol). However, the existence of dC does not imply that C(r) ∝ r^{dC}, since this proportionality may fail to hold, even as r → 0. For r > 0, define the function Φ by
Φ(r) ≡ C(r)∕r^{dC}. (9.18)
We cannot conclude that Φ(r) approaches a constant value as r → 0, but the existence of dC does imply lim_{r→0} log Φ(r)∕log r = 0 [Theiler 88].
One great advantage of computing dC rather than dB is that memory
requirements are greatly reduced, since there is no need to construct boxes covering Ω. With box counting it is also necessary to examine each box to determine if it contains any points; this may be impractical, particularly if r is very small. Another advantage, mentioned in Sect. 9.3 above, is that the range of log C(r) is twice the range of log BR(r) or log BD(s). In [Eckmann 85], the correlation dimension was deemed a “quite sound” and “highly successful” means of determining the dimension of a strange attractor, and it was observed that the method “is not entirely justified mathematically, but nevertheless quite sound”.5 In [Ruelle 90] it was noted that “... this algorithm has played a very important role in allowing us to say something about systems that otherwise defied analysis”. The importance of dC was echoed in [Schroeder 91], which noted that the correlation dimension is one of the most widely used fractal dimensions, especially if the fractal comes as “dust”, meaning isolated points, sparsely distributed over some region. Lastly, [Mizrach 92] observed that, in the physical sciences, the correlation dimension has become the preferred dimension to use with experimental data.
There are several possible approaches to computing dC from the N points x(n), n = 1, 2, …, N. First, we could pick the smallest r available, call it rmin, and estimate dC by log C(rmin)∕log rmin. The problem with this approach is that the convergence of log C(r)∕log r to dC is very slow [Theiler 90]; this is the same problem identified in Sect. 4.2 in picking the smallest s value available to estimate dB. A second possible approach [Theiler 88] is to take the local slope of the log C(r) versus log r curve, using
dC ≈ d log C(r)∕d log r. (9.19)
A third possible approach [Theiler 88] is to pick two values r1 and r2, where r1 < r2, and use the two-point secant estimate
dC ≈ [log C(r2) − log C(r1)]∕[log r2 − log r1]. (9.20)
While the third approach avoids the slow logarithmic convergence, it requires choosing r1 and r2, both of which should tend to zero. This approach was analyzed in [Theiler 93] and compared to other methods for estimating dC. (For a network, two-point secant estimates of fractal dimensions were utilized in [Rosenberg 16b, Rosenberg 17c, Wang 11].)
By far, the most popular method for estimating dC is the Grassberger–Procaccia method. The steps are (i) compute C(r) for a set of positive radii rm, m = 1, 2, …, M, (ii) determine a subset of these M values of rm over which the log C(r) versus log r curve is roughly linear, and (iii) estimate dC as the slope of the best fit line through the points over this subset of the M points. As observed in [Theiler 88], there is no clear criterion for how to determine the slope through the chosen subset of the M points. Finding the slope that minimizes the unweighted sum of squares has two drawbacks. First, the least squares model assumes a uniform error in the calculated log C(rm), which is incorrect since the statistics get worse as r decreases. Second, the least squares model assumes that errors in log C(rm) are independent of each other, which is incorrect since C(r + 𝜖) is equal to C(r) plus the fraction of distances between r and r + 𝜖; hence for small 𝜖 the error at r + 𝜖 is strongly correlated with the error at r.
Suppose C(rm) has been computed for m = 1, 2, …, M, where rm < rm+1. In studying the orbits of celestial bodies, [Freistetter 00] used the following iterative procedure to determine the range of r values over which the slope of the log C(r) versus log r curve should be computed. Since the curve is not linear in the region of very small and very large r, we trim the ends of the interval as follows. In each iteration we first temporarily remove the leftmost r value (which in the first iteration is r1) and use linear least squares to obtain dR, the slope corresponding to {r2, r3, …, rM}, and the associated correlation coefficient cR. (Here the correlation coefficient, not to be confused with the correlation dimension, is a byproduct of the regression indicating the goodness of fit.) Next we temporarily remove the rightmost r value (which in the first iteration is rM) and use linear least squares to obtain dL, the slope corresponding to {r1, r2, …, rM−1}, and the associated correlation coefficient cL. Since a higher correlation coefficient indicates a better fit (where c = 1 is a perfect linear relationship), if cR > cL we delete r1, while if cL > cR we delete rM. We begin the next iteration with either {r2, r3, …, rM} or {r1, r2, …, rM−1}. The iterations stop when we obtain a correlation coefficient higher than a desired value.
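The following Python sketch implements this trimming loop as paraphrased above from [Freistetter 00]; here log_r and log_c are numpy arrays holding the points (log rm, log C(rm)), and the stopping threshold, the minimum window width, and the use of numpy's least-squares fit and Pearson correlation coefficient are implementation choices.

```python
import numpy as np

def trim_and_fit(log_r, log_c, c_target=0.999, min_pts=4):
    """Iteratively delete the endpoint (smallest or largest r) whose removal
    gives the better linear fit, stopping once the correlation coefficient
    exceeds c_target. Returns the slope, i.e., the d_C estimate."""
    lo, hi = 0, len(log_r)                    # active window is log_r[lo:hi]

    def fit(a, b):
        slope = np.polyfit(log_r[a:b], log_c[a:b], 1)[0]
        corr = np.corrcoef(log_r[a:b], log_c[a:b])[0, 1]
        return slope, corr

    while hi - lo > min_pts:
        d_right, c_right = fit(lo + 1, hi)    # leftmost r value removed
        d_left, c_left = fit(lo, hi - 1)      # rightmost r value removed
        if c_right >= c_left:
            lo += 1                           # delete the leftmost radius
            if c_right >= c_target:
                return d_right
        else:
            hi -= 1                           # delete the rightmost radius
            if c_left >= c_target:
                return d_left
    return fit(lo, hi)[0]                     # fall back to the final window
```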
An alternative to the Grassberger–Procaccia method is the method of [Takens 85], which is based on maximum likelihood estimation. Suppose we randomly select N points from Ω. Assume that dC exists and that the function Φ defined by (9.18) takes the constant value k, so C(r) = k r^{dC}. Assume also that the distances between pairs of randomly selected points are independent and randomly distributed such that the probability P(x ≤ r) that the distance x between a random pair of points does not exceed r is given by P(x ≤ r) = k r^{dC}. Let X(r) be the number of pairwise distances less than r, and denote these X(r) distance values by ri for 1 ≤ i ≤ X(r). The value of dC that maximizes the probability of finding the X(r) observed distances {ri} is given by the unbiased minimal variance Takens estimator d̂C, defined by Guerrero and Smith [Guerrero 03]
d̂C ≡ −(X(r) − 1) ∕ ∑_{i=1}^{X(r)} ln(ri∕r), (9.21)
so d̂C uses all distances less than a specified upper bound r, but there is no lower bound on distances. As r → 0 and X(r) →∞ we have d̂C → dC. If we replace X(r) − 1 by X(r) (which introduces a slight bias [Guerrero 03]), we can write (9.21) more compactly as
d̂C = −1∕〈ln(ri∕r)〉,
where 〈 〉 denotes the average over all distances (between pairs of points) not exceeding r. It can be shown that [Theiler 90]
d̂C = C(r) ∕ ∫_0^r [C(s)∕s] ds = d log(∫_0^r [C(s)∕s] ds) ∕ d log r. (9.22)
As in [Theiler 88], the slightly more unwieldy form of the right-hand side of (9.22) is shown for comparison with (9.19). The estimate
T(r) ≡ C(r) ∕ ∫_0^r [C(s)∕s] ds (9.23)
(the middle term in (9.22) above) has been called the Takens-Theiler estimator [Hegger 99]. The advantage of this estimator is that, as r → 0, the Takens-Theiler estimate T(r) approaches dC much more rapidly than does log C(r)∕log r. However, for some fractal sets T(r) does not converge [Theiler 90]. The non-convergence is explained by considering the function Φ(r) defined by (9.18). A fractal has “periodic lacunarity” if for some positive number L we have Φ(r) = Φ(Lr). For example, the middle-thirds Cantor set exhibits periodic lacunarity with L = 1∕3. The Takens estimate T(r) does not converge for fractals with periodic lacunarity. Theiler also supplied a condition on Φ(r) under which T(r) does converge.
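In its compact form, the Takens estimator is a one-line computation over the pairwise distances below the cutoff r; the Python sketch below uses the slightly biased X(r) version of the average, with the sample data and cutoff arbitrary.

```python
import numpy as np

def takens_estimator(points, r):
    """Takens estimator dC ~ -1 / <ln(ri / r)>, averaged over all pairwise
    distances ri with 0 < ri <= r (the slightly biased X(r) form)."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    iu = np.triu_indices(len(points), k=1)    # each unordered pair once
    d = dist[iu]
    d = d[(d > 0) & (d <= r)]
    return -1.0 / np.mean(np.log(d / r))

rng = np.random.default_rng(3)
pts = rng.random((2_000, 2))                  # a sample whose dC is 2
print(takens_estimator(pts, 0.05))            # should be near 2
```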
Exercise 9.1 Show that the middle-thirds Cantor set exhibits periodic lacunarity with L = 1∕3. □
To evaluate (9.23), the method of [Hegger 99] can be used. Since C(r) is only evaluated at the discrete points {r1, r2, r3, …, rM}, we approximate log C(r) by the linear function ai log r + bi for r ∈ [ri−1, ri], which implies C(r) is approximated by the power-law function e^{bi} r^{ai} for r ∈ [ri−1, ri]. The values ai and bi can be computed from the M points (log ri, log C(ri)). Assume ri ≤ r for 1 ≤ i ≤ M. Since the integrand C(s)∕s ≈ e^{bi} s^{ai−1} is trivial to integrate,
∫_{r1}^{r} [C(s)∕s] ds ≈ ∑_{i=2}^{M} (e^{bi}∕ai)(ri^{ai} − r_{i−1}^{ai}). (9.24)
Considering the M points (log ri, log C(ri)) for i = 1, 2, …, M, suppose there are indices L and U, where 1 ≤ L < U ≤ M, such that a straight line is a good fit to the points for L ≤ i ≤ U. Then, using (9.23) and (9.24), a reasonable estimate of dC is
dC ≈ C(rU) ∕ ∑_{i=L+1}^{U} (e^{bi}∕ai)(ri^{ai} − r_{i−1}^{ai})
[Hegger 99]. Various other approaches to computing dC have also been proposed, e.g., [Shirer 97].
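A sketch of this chord-integration idea: given the computed points (log ri, log C(ri)), fit ai and bi on each interval, accumulate the integral via (9.24), and form the ratio (9.23). The helper below assumes the radii are sorted and is meant as an illustration of the method of [Hegger 99], not a faithful reproduction of their code; the C(r) values could come, e.g., from the correlation_sum sketch given earlier.

```python
import numpy as np

def takens_theiler(r, c):
    """Estimate dC = T(r_M) per (9.23), approximating the integral by (9.24):
    log C is taken piecewise linear in log r between the sample radii,
    so C(s) ~ e^{b_i} s^{a_i} on each interval [r_{i-1}, r_i]."""
    r, c = np.asarray(r, float), np.asarray(c, float)  # radii must be sorted
    a = np.diff(np.log(c)) / np.diff(np.log(r))        # chord slopes a_i
    b = np.log(c[1:]) - a * np.log(r[1:])              # intercepts b_i
    # Integral of e^{b_i} s^{a_i - 1} over [r_{i-1}, r_i], summed over i
    # (a flat chord, a_i = 0, would need special handling; omitted here):
    pieces = np.exp(b) / a * (r[1:] ** a - r[:-1] ** a)
    return c[-1] / pieces.sum()                        # C(r_M) / integral
```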
Exercise9.2 Use (9.24) to estimate dC for the middle-thirds Cantor set for r ∈ [0.001, 1]. □
When the Grassberger–Procaccia method is applied to a time series (Sect. 10.2), as in the calculation of dC of a strange attractor of a dynamical system, the upper bound
dC ≤ 2 log10 N (9.25)
holds [Ruelle 90]. To see why, let f(r) be the number of ordered pairs of points (including each pair (x(i), x(i))) separated by a distance not exceeding r. The range of r over which f(r) can be computed is [r⋆min, r⋆max], where r⋆min ≡ min_{i≠j} dist(x(i), x(j)) and r⋆max ≡ max_{i,j} dist(x(i), x(j)). For time series
applications, only part of this range is usable, since the lower part of the range suffers from statistical fluctuations, and the upper part suffers from nonlinearities. Let rmin be the smallest usable r for which we evaluate f(r), and let rmax be the largest usable r, where r⋆min ≤ rmin < rmax ≤ r⋆max. Suppose rmax∕rmin ≥ 10, so the range is at least one decade. The correlation dimension dC is approximately the slope m, where
m = [log10 f(rmax) − log10 f(rmin)] ∕ [log10 rmax − log10 rmin]. (9.26)
Since each point is within distance r of itself, then f(r) ≥ N for r > 0. Hence f(rmin) ≥ N, so log10 f(rmin) ≥ 0. Also, f(rmax) ≤ N². Since rmax∕rmin ≥ 10 then log10 rmax − log10 rmin ≥ 1. Hence
dC ≈ m ≤ (log10 N² − 0)∕1 = 2 log10 N.
Thus one should not believe a correlation dimension estimate that is not substantially less than 2log10N [Ruelle 90].
This bound was used in [Boon 08] to study the visual evoked potential. Unfortunately, numerous studies do violate this bound [Ruelle 90]. If the inequality rmax∕rmin ≥ 10 is replaced by the more general bound rmax∕rmin ≥ k, then we obtain dC ≤ 2logkN. Ruelle observed that it seems unreasonable to have the ratio rmax∕rmin be much less than 10.
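For example, N = 1000 points give 2 log10 N = 6, so no correlation dimension estimate of about 6 or larger obtained from 1000 points should be trusted; supporting an estimate of dC = 10 would require at least N = 10⁵ points.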
To conclude this section, we quote these words of caution in [Hegger 99]:
… dimensions characterize a set or an invariant measure whose support is the set, whereas any data set contains only a finite number of points representing the set or the measure. By definition, the dimension of a finite set of points is zero. When we determine the dimension of an attractor numerically, we extrapolate from finite length scales … to the infinitesimal scales, where the concept of dimensions is defined. This extrapolation can fail for many reasons …
9.5 Comparison with the Box Counting Dimension
In this section we present the proof of [Grassberger 83a, Grassberger 83b] that dC ≤ dB for a geometric object Ω.6 We first discuss notation. The pointwise mass measure μ(Ω(x, r)) was defined in Sect. 9.2 in terms of the box radius r. The correlation sum C(r) given by (9.11), and the correlation dimension dC given by (9.17), were similarly defined using r. Since C(r) counts pairs of points whose distance does not exceed r, we could equally well have defined the correlation sum and dC using the variable s, rather than r, to represent distance. The motivation for using s rather than r in this section is that we will compute dC by first covering Ω with diameter-based boxes, and we always denote the linear size of a diameter-based box by s. Therefore, in this section we will denote the correlation sum by C(s); its definition is unchanged.
For a given positive s ≪ 1, let 𝓑(s) be a minimal covering of Ω by diameter-based boxes of size s, so the distance between any two points in a given box does not exceed s. Let N points be drawn randomly from Ω according to its natural measure and let Nj be the number of points in box Bj ∈ 𝓑(s). Discard from 𝓑(s) each box for which Nj = 0, and, as usual, let BD(s) be the number of non-empty boxes in 𝓑(s).
Since diam(Bj) ≤ s, the Nj points in Bj yield Nj(Nj − 1) ordered pairs of points such that the distance between the two points does not exceed s. The sum ∑_{j=1}^{BD(s)} Nj(Nj − 1) counts all ordered pairs (x(i), x(j)) of points, separated by a distance not exceeding s, such that x(i) and x(j) lie in the same box. Since a given pair (x(i), x(j)) of points satisfying dist(x(i), x(j)) ≤ s may lie within some box Bj, or x(i) and x(j) may lie in different boxes, from (9.9) and (9.11),
C(s) ≥ [1∕(N(N − 1))] ∑_{j=1}^{BD(s)} Nj(Nj − 1) (9.27)
(recall that although C(s) depends on N, for notational simplicity we simply write C(s).) Let pj ≡ Nj∕N be the probability of box Bj, and define
μj ≡ lim_{N→∞} pj. (9.28)
Although pj depends on both N and s, for notational simplicity we write pj; similarly, although μj depends on s, for notational simplicity we write μj. Following [Chatterjee 92, Grassberger 83a] and using (9.11) and (9.27),
lim_{N→∞} C(s) ≥ lim_{N→∞} ∑_{j=1}^{BD(s)} [Nj(Nj − 1)]∕[N(N − 1)] = ∑_{j=1}^{BD(s)} μj². (9.29)
Relation (9.29) says that for large N and small s, the correlation sum C(s) is no less than the probability that two points of Ω are contained in some box of size s [Krakovska 95].
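The inequality (9.29) is easy to probe numerically; the Python sketch below compares the correlation sum (computed in the sup norm, so that a grid box of side s has diameter s) with ∑_j pj² for a uniform sample, with the sample size and box size arbitrary, and with the caveat that (9.29) is a statement about the N →∞ limit.

```python
import numpy as np

rng = np.random.default_rng(4)
pts = rng.random((2_000, 2))
n, s = len(pts), 0.1                             # box size s

# C(s) with the sup norm, so a grid box of side s has diameter s.
dist = np.abs(pts[:, None, :] - pts[None, :, :]).max(axis=-1)
c_s = ((dist <= s).sum() - n) / (n * (n - 1))    # exclude the i == j diagonal

# sum_j p_j^2 over the covering by grid boxes of side s.
idx = np.floor(pts / s).astype(int)              # box index of each point
_, counts = np.unique(idx, axis=0, return_counts=True)
p_sq = ((counts / n) ** 2).sum()

print(c_s >= p_sq, c_s, p_sq)                    # (9.29) holds here even at finite N
```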
A generalization [Peitgen 92] of the correlation sum considers q-tuples (x(1), x(2), …, x(q)) of distinct points, where q ≥ 2, such that the pairwise distance condition holds: for each pair of points x(i) and x(j) in the q-tuple we have dist(x(i), x(j)) ≤ s. With N, 𝓑(s), and BD(s) as defined in the previous paragraph, let Q(N, s, q) be the number of q-tuples of the N points such that the pairwise distance condition holds. Then
lim_{N→∞} Q(N, s, q)∕N^q ≥ ∑_{j=1}^{BD(s)} μj^q. (9.30)
The sum on the right-hand side of (9.30) is the probability that q points, randomly drawn from Ω according to its natural measure, all fall into the same box. From this generalized correlation sum we can define generalized dimensions, to be studied in Chap. 16.
Having shown that we can lower bound the correlation sum by covering Ω with boxes, we now present the proof [Grassberger 83a, Grassberger 83b] that dC ≤ dB. The proof in [Grassberger 83b] assumes C(s) ∼ s^{dC} and BD(s) ∼ s^{−dB}. Since the symbol ∼ is ambiguous, we recast these assumptions as Assumptions 9.1 and 9.2 below.
Assumption 9.1 For some positive number dC and for some function φC satisfying lim_{s→0} log φC(s)∕log s = 0, for all sufficiently small positive s we have
C(s) = φC(s) s^{dC}. (9.31)
□
Assumption 9.2 For some positive number dB and for some function φB satisfying lim_{s→0} log φB(s)∕log s = 0,
for all sufficiently small positive s we have
BD(s) = φB(s) s^{−dB}, (9.32)
where BD(s) is the minimal number of diameter-based boxes of size s needed to cover Ω. □
Jensen’s inequality says that if f is a convex function, Z is a random variable, and 𝔼 denotes expectation, then 𝔼[f(Z)] ≥ f(𝔼[Z]). In particular, for a random variable taking a finite set of values, if α1, α2, …, αJ are nonnegative weights summing to 1, if x1, x2, …, xJ are positive numbers, and if f(z) = z², then
∑_{j=1}^{J} αj xj² ≥ (∑_{j=1}^{J} αj xj)². (9.33)
Applying this to the covering of Ω, set J = BD(s), xj = μj, and αj = 1∕J for 1 ≤ j ≤ J. From (9.33),
(1∕J) ∑_{j=1}^{J} μj² ≥ ((1∕J) ∑_{j=1}^{J} μj)² = 1∕J², (9.34)
where the last equality holds since ∑_{j=1}^{J} μj = 1. Hence ∑_{j=1}^{J} μj² ≥ 1∕J = 1∕BD(s).
Theorem 9.1 If Assumptions 9.1 and 9.2 hold, then dC ≤ dB.
Proof For each s, let 𝓑(s) be a minimal s-covering of Ω and let BD(s) be the number of boxes in 𝓑(s). Randomly select from Ω, according to its natural measure, the N points x(1), x(2), …, x(N). For box Bj ∈ 𝓑(s), let Nj(s) be the number of these points that are contained in Bj, so ∑_{j=1}^{BD(s)} Nj(s) = N. From (9.29) and (9.34),
lim_{N→∞} C(s) ≥ ∑_{j=1}^{BD(s)} μj² ≥ 1∕BD(s). (9.35)
By Assumption 9.1, for s ≪ 1 we have C(s) = φC(s) s^{dC}. By Assumption 9.2, for s ≪ 1 we have BD(s) = φB(s) s^{−dB}, so 1∕BD(s) = s^{dB}∕φB(s). Hence, from (9.35) for s ≪ 1 we have
φC(s) s^{dC} ≥ s^{dB}∕φB(s).
Taking logarithms yields
log φC(s) + dC log s ≥ dB log s − log φB(s).
Dividing both sides of the inequality by log s, which is negative (thus reversing the inequality), and letting s → 0 yields dC ≤ dB. □
For strictly self-similar fractals such as the Sierpiński gasket, the correlation dimension dC, the Hausdorff dimension dH, and the similarity dimension dS are all equal ( [Schroeder 91], p. 216).7