Humankind has not woven the web of life. We are but one thread within it. Whatever we do to the web, we do to ourselves. All things are bound together. All things connect.
Chief Seattle (1786–1866), a leader of the Suquamish and Duwamish Native American tribes in what is now Washington state.
This chapter provides supplemental information on six topics: (i) rectifiable arcs and arc length, to supplement the material on jagged coastlines in the beginning of Chap. 3 and also used in Sect. 6.4 on the box counting dimension of a spiral, (ii) the Gamma function and hypersphere volumes, used in Sect. 5.1 on the Hausdorff dimension, (iii) Laplace’s method of steepest descent, used in Sect. 16.3 on the spectrum of the multifractal Cantor set, (iv) Stirling’s approximation, used in Sect. 16.3, (v) convex functions and the Legendre transform, used in Sect. 16.4 on multifractals, and (vi) graph spectra (the study of the eigenvectors and eigenvalues of the adjacency matrix of a graph), used in Sect. 20.1.
23.1 Rectifiable Arcs and Arc Length
In this section we compute the length of an arc, where “arc” refers not to a connection between two nodes in a graph, but rather to a curvy line segment. Consider the curve shown in Fig. 23.1 and the points A and B on the curve. As shown in this figure, divide the arc between A and B into K pieces by choosing K − 1 points P1, P2, …, PK−1 on the curve. Draw the K chords AP1, P1P2, P2P3, …, PK−1B. Let L(AP1) be the length of chord AP1. For k = 1, 2, …, K − 2, let L(PkPk+1) be the length of chord PkPk+1. Let L(PK−1B) be the length of chord PK−1B. Define LK ≡ L(AP1) + L(P1P2) + ⋯ + L(PK−1B), so LK is the sum of the K chord lengths. Let δ be the length of the longest of these K chords, and let the number of points dividing the arc AB be increased such that δ → 0 as K →∞. If the limit of LK as δ → 0 exists and is independent of the choice of the points P1, P2, ⋯ , PK−1, then arc AB is said to be rectifiable, and this limit is defined to be the length of the arc [Smail 53].
In Sect. 6.4 we need to calculate the length of an arc. Let the function f(x) be continuously differentiable on the interval (S, T). Then the length L of the arc of f(x) from S to T is
L = ∫_S^T √(1 + [f′(x)]²) dx    (23.1)
For an informal derivation of this formula, consider Fig. 23.2 and the two points x ∈ (S, T) and x + dx, where dx ≪ 1. The length of the arc from x to x + dx is approximated by the length L(x) of the chord connecting the points (x, f(x)) and (x + dx, f(x + dx)). We have

L(x) = √((dx)² + [f(x + dx) − f(x)]²) ≈ √((dx)² + [f′(x) dx]²) = √(1 + [f′(x)]²) dx.
Now divide the interval [S, T] into K pieces, each of size (T − S)∕K, by inserting points xk, k = 1, 2, …, K − 1 such that S < x1 < x2 < ⋯ < xK−1 < T. Define x0 = S and xK = T. Define dx = (T − S)∕K. Then the arc length of f(x) from S to T is approximated by the sum of the K chord lengths:

L ≈ ∑_{k=1}^{K} √(1 + [f′(x_{k−1})]²) dx.
Letting K →∞ yields the integral (23.1). A rigorous proof would invoke the mean value theorem for derivatives [Smail 53].
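The chord-sum construction above is easy to check numerically. The following sketch (the test curve f(x) = x² is chosen purely for illustration) compares the sum of K chord lengths with the closed-form arc length obtained from (23.1):

```python
import math

def chord_length_sum(f, S, T, K):
    """Sum the lengths of K chords joining equally spaced points on the curve y = f(x)."""
    dx = (T - S) / K
    xs = [S + k * dx for k in range(K + 1)]
    return sum(math.hypot(xs[k + 1] - xs[k], f(xs[k + 1]) - f(xs[k]))
               for k in range(K))

# For f(x) = x^2 on [0, 1], (23.1) gives L = sqrt(5)/2 + asinh(2)/4
exact = math.sqrt(5) / 2 + math.asinh(2) / 4
approx = chord_length_sum(lambda x: x * x, 0.0, 1.0, 1000)
print(exact, approx)
```

As K grows, refining the partition can only increase the chord sums, and they converge to the arc length, illustrating that this arc is rectifiable.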
To derive the arc length for a curve defined using polar coordinates, let y = f(x), so f′(x) = dy∕dx. Rewrite (23.1) as

L = ∫_S^T √(1 + (dy∕dx)²) dx = ∫ √((dx)² + (dy)²),

where the second integral is taken along the arc.
Since x = r cos θ and y = r sin θ, then dx = cos θ dr − r sin θ dθ and dy = sin θ dr + r cos θ dθ. From this we obtain, after some algebra,
(dx)² + (dy)² = (dr)² + r² (dθ)²    (23.2)
L = ∫_{θ(S)}^{θ(T)} √((dr∕dθ)² + r²) dθ    (23.3)
where the final equality uses (23.2), and θ(S) and θ(T) are the limits of integration for the integral over θ.
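Formula (23.3) can also be exercised numerically. The sketch below uses midpoint-rule quadrature; the unit circle and the Archimedean spiral r(θ) = θ are illustrative choices, the latter being the kind of spiral treated in Sect. 6.4:

```python
import math

def polar_arc_length(r, dr, theta0, theta1, n=100_000):
    """Approximate (23.3): integrate sqrt((dr/dθ)² + r²) dθ by the midpoint rule."""
    h = (theta1 - theta0) / n
    total = 0.0
    for k in range(n):
        t = theta0 + (k + 0.5) * h
        total += math.sqrt(dr(t) ** 2 + r(t) ** 2)
    return total * h

# Unit circle r(θ) = 1: the length over [0, 2π] is the circumference 2π
print(polar_arc_length(lambda t: 1.0, lambda t: 0.0, 0.0, 2 * math.pi))

# Archimedean spiral r(θ) = θ over [0, 4π]; closed form (θ√(1+θ²) + asinh θ)/2
T = 4 * math.pi
exact = (T * math.sqrt(1 + T ** 2) + math.asinh(T)) / 2
print(exact, polar_arc_length(lambda t: t, lambda t: 1.0, 0.0, T))
```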
23.2 The Gamma Function and Hypersphere Volumes
The gamma function Γ is defined for positive real x by
Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt    (23.4)
Integration by parts yields

Γ(x + 1) = ∫_0^∞ t^x e^{−t} dt = [−t^x e^{−t}]_0^∞ + x ∫_0^∞ t^{x−1} e^{−t} dt = x Γ(x).
Since Γ(1) = 1, it is easily verified that Γ(n + 1) = n! if n is a positive integer.
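Both the recurrence and the factorial identity can be confirmed with the standard-library gamma function:

```python
import math

# Γ(x + 1) = x Γ(x) for x > 0, and Γ(n + 1) = n! for positive integers n
for x in (0.5, 1.7, 3.2):
    assert math.isclose(math.gamma(x + 1), x * math.gamma(x))
for n in range(1, 8):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))
print("recurrence and factorial identities verified")
```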
The volume VE(r) of a hypersphere in ℝ^E with radius r is [Manin 06]

VE(r) = π^{E∕2} r^E ∕ Γ((E∕2) + 1).
To evaluate VE(r), we first evaluate Γ(1∕2). We have

Γ(1∕2) = ∫_0^∞ t^{−1∕2} e^{−t} dt.
With the change of variables u = √t, we have t = u², dt = 2u du, and

Γ(1∕2) = ∫_0^∞ u^{−1} e^{−u²} 2u du = 2 ∫_0^∞ e^{−u²} du,
yielding

Γ(1∕2) = 2 ∫_0^∞ e^{−u²} du = √π,

where the last equality follows as the area under the normal curve is √π, i.e., ∫_{−∞}^{∞} e^{−u²} du = √π. Using the identity Γ(x + 1) = xΓ(x),

Γ(3∕2) = (1∕2) Γ(1∕2) = √π∕2  and  Γ(5∕2) = (3∕2) Γ(3∕2) = 3√π∕4,
and the volume of a sphere in ℝ³ is

V3(r) = π^{3∕2} r³ ∕ Γ(5∕2) = π^{3∕2} r³ ∕ (3√π∕4) = (4∕3) π r³    (23.5)
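A direct implementation of the hypersphere-volume formula (the function name is illustrative; only the standard library is used):

```python
import math

def hypersphere_volume(E, r):
    """Volume of a ball of radius r in E dimensions: π^(E/2) r^E / Γ(E/2 + 1)."""
    return math.pi ** (E / 2) * r ** E / math.gamma(E / 2 + 1)

print(hypersphere_volume(2, 1.0))  # π, the area of the unit disk
print(hypersphere_volume(3, 1.0))  # 4π/3, matching (23.5)
```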
23.3 Laplace’s Method of Steepest Descent
Laplace’s method, also called the method of steepest descent, is a way to determine the leading order behavior of the integral

F(λ) = ∫_a^b f(x) e^{−λ g(x)} dx
as λ →∞. The essence of the method is that, for λ ≫ 1, the integral F(λ) can be closely approximated by a simpler integral whose limits of integration are a neighborhood [y − 𝜖, y + 𝜖] around a point y which minimizes g over [a, b], and the simpler integral itself can be easily approximated. Even if g is relatively flat over [a, b], when λ ≫ 1 the term e^{−λ g(x)} is much larger in the neighborhood of y than elsewhere on [a, b].
We require three assumptions. Assume that (i) F(λ) exists for all sufficiently large λ; (ii) the functions f and g are twice differentiable on (a, b); and (iii) the minimal value of g over [a, b] is attained at some y ∈ (a, b) satisfying g′′(y) > 0 and f(y) ≠ 0. Assumption (ii) allows us to replace f and g by their second-order Taylor series expansions. Assumption (iii) implies that g′(y) = 0 and that g is strictly convex at y, so y is an isolated local minimum of g.
Rewriting F(λ) and then applying several approximations, as λ →∞ we have

F(λ) = ∫_a^b f(x) e^{−λ g(x)} dx
 ≈ ∫_{y−𝜖}^{y+𝜖} f(x) e^{−λ g(x)} dx
 ≈ f(y) ∫_{y−𝜖}^{y+𝜖} e^{−λ [g(y) + (1∕2) g′′(y)(x−y)²]} dx  (using g′(y) = 0)
 = f(y) e^{−λ g(y)} ∫_{y−𝜖}^{y+𝜖} e^{−λ g′′(y)(x−y)²∕2} dx
 ≈ f(y) e^{−λ g(y)} ∫_{−∞}^{∞} e^{−λ g′′(y)(x−y)²∕2} dx
 = f(y) e^{−λ g(y)} √(2π∕(λ g′′(y))),
where the final equality follows from the area under a normal curve:

∫_{−∞}^{∞} e^{−u²∕(2σ²)} du = σ √(2π),  with σ² = 1∕(λ g′′(y)).
We thus have the final result that, as λ →∞,

F(λ) ≈ f(y) e^{−λ g(y)} √(2π∕(λ g′′(y)))    (23.6)
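To see (23.6) in action, the sketch below compares a brute-force evaluation of F(λ) with the Laplace approximation, for the illustrative choice f(x) = 1 and g(x) = x² on [−1, 1] (so y = 0, g(y) = 0, and g′′(y) = 2):

```python
import math

def F_quadrature(f, g, a, b, lam, n=200_000):
    """Midpoint-rule evaluation of F(λ) = ∫_a^b f(x) exp(-λ g(x)) dx."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) * math.exp(-lam * g(a + (k + 0.5) * h))
                   for k in range(n))

def laplace_approx(f_y, g_y, gpp_y, lam):
    """Right-hand side of (23.6): f(y) exp(-λ g(y)) sqrt(2π / (λ g''(y)))."""
    return f_y * math.exp(-lam * g_y) * math.sqrt(2 * math.pi / (lam * gpp_y))

lam = 50.0
exact = F_quadrature(lambda x: 1.0, lambda x: x * x, -1.0, 1.0, lam)
approx = laplace_approx(1.0, 0.0, 2.0, lam)
print(exact, approx)  # both near sqrt(π/50) ≈ 0.2507
```

Already at λ = 50 the two values agree to several decimal places, since the mass of e^{−λx²} is concentrated near the minimizer y = 0.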
23.4 Stirling’s Approximation
In Chap. 16 on multifractals, we made frequent use of the approximation ln n! ≈ n ln n − n for large n, and often this is referred to as Stirling’s approximation. The actual approximation credited to Stirling is
n! ≈ √(2πn) (n∕e)^n    (23.7)
We now derive (23.7). From (23.4), for x > 0 we have

Γ(x + 1) = ∫_0^∞ t^x e^{−t} dt,

so, making the change of variables t = xz (with dt = x dz and t^x = x^x e^{x ln z}),

Γ(x + 1) = x^{x+1} ∫_0^∞ e^{x ln z − xz} dz = x^{x+1} ∫_0^∞ e^{−x(z − ln z)} dz    (23.8)
Now apply Laplace’s approximation (23.6) to the integral in (23.8), with z taking the place of x, with x taking the place of λ, with f(z) = 1, and with g(z) = z − ln z. The point z = 1 is the unique global minimum of g(z) over {z | z > 0}, and g(1) = 1, g′(1) = 0, and g′′(1) = 1. Since all three assumptions of Sect. 23.3 are satisfied, by (23.6) we have, as x →∞,

Γ(x + 1) ≈ x^{x+1} e^{−x g(1)} √(2π∕(x g′′(1))) = x^{x+1} e^{−x} √(2π∕x) = √(2πx) (x∕e)^x.
If x is the positive integer n, then Γ(n + 1) = n!, which yields (23.7).
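The quality of (23.7) can be checked directly; a small illustration:

```python
import math

def stirling(n):
    """Stirling's approximation (23.7): n! ≈ sqrt(2πn) (n/e)^n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (1, 5, 10, 20):
    print(n, stirling(n) / math.factorial(n))  # the ratio approaches 1 as n grows
```

The relative error behaves like 1∕(12n), so even n = 10 is approximated to within about one percent.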
Exercise 23.1 Plot the error incurred by using (23.7) to approximate n! as a function of the positive integer n. □
23.5 Convex Functions and the Legendre Transform
In this section we present the Legendre transform of a convex function of a single variable, drawing on [Kennerly 11, Mangasarian 69, Rockafellar 74, Zia 09]. Let F : ℝ → ℝ be convex. For simplicity, assume for the remainder of this section that (i) F is twice differentiable and (ii) F′′(x) > 0 for all x. These two assumptions imply that F is strictly convex and that F′(x) is monotone increasing with x. For this discussion it is important to distinguish between the function F and the value F(x) of the function at the point x. Since F is differentiable and convex, then

F(y) ≥ F(x) + F′(x)(y − x)

for all x and y. As illustrated by Fig. 23.3, this inequality means that F lies above the line with slope F′(x) and vertical intercept F(x) − xF′(x) which is tangent to F at x. Since F lies above this line, and since F intersects the line, the line is called a supporting hyperplane for F. Figure 23.4 illustrates two supporting hyperplanes of a convex function. Each convex function (whether or not it is differentiable) can be characterized in terms of its supporting hyperplanes [Rockafellar 74].
Since F′(x) is monotone increasing with x, for each p there is a unique x such that F′(x) = p. For example, if F(x) = (x − 1)², setting F′(x) = 2(x − 1) = p yields x = 1 + (p∕2). Let x̄ : ℝ → ℝ be the function defined by

F′(x̄(p)) = p.
That is, for each p the value x̄(p) is the x value at which the derivative of F is p. For example, if F(x) = (x − 1)², then x̄(p) = 1 + (p∕2). Note that x is a point and x̄ is a function. Having defined the function x̄, we can write F(x̄(p)), which is the value of F at the point x̄(p). Finally, the Legendre transform of F is the function G : ℝ → ℝ defined by

G(p) ≡ p x̄(p) − F(x̄(p))    (23.10)
These definitions are illustrated in Fig. 23.5, which shows that G(p) is the negative of the y intercept of the line through the point (x̄(p), F(x̄(p))) with slope p. To see this algebraically, consider the equation (y − y₀) = m(x − x₀) describing a line with slope m through the point (x₀, y₀). The y-intercept of this line is y₀ − mx₀. Choosing m = p and (x₀, y₀) = (x̄(p), F(x̄(p))), the y-intercept is F(x̄(p)) − p x̄(p), which by (23.10) is − G(p).
Example 23.1 Consider the function F defined by F(x) = x⁶. We have F′(x) = 6x⁵. Setting 6x⁵ = p yields x̄(p) = (p∕6)^{1∕5}. From (23.10) we have G(p) = p (p∕6)^{1∕5} − [(p∕6)^{1∕5}]⁶ = [6^{−1∕5} − 6^{−6∕5}] p^{6∕5}. □
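The closed form in Example 23.1 can be verified numerically (a small check; the function names are for illustration only):

```python
def G(p):
    """Legendre transform of F(x) = x^6 via (23.10), using xbar(p) = (p/6)^(1/5)."""
    xbar = (p / 6) ** (1 / 5)
    return p * xbar - xbar ** 6

for p in (1.0, 6.0, 12.0):
    closed_form = (6 ** (-1 / 5) - 6 ** (-6 / 5)) * p ** (6 / 5)
    print(p, G(p), closed_form)  # the two values agree
```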
Exercise 23.2 Derive an expression for the Legendre transform of 1 + x⁴, and plot G(p) versus p. □
Exercise 23.3 Let G be the Legendre transform of F. Prove that the Legendre transform of G is F itself. □
Suppose F satisfies, in addition to assumptions (i) and (ii) above, the additional assumptions that F(x) →∞ as |x|→∞, and that F′(x) = 0 at some point x. Then the following informal argument suggests that G is convex. Under these additional assumptions, if x ≪ 0, then p = F′(x) ≪ 0, so the intercept − G(p) ≪ 0, which means G(p) ≫ 0. As x increases then p = F′(x) increases, so the intercept − G(p) increases, which means G(p) decreases. As x increases beyond the point where F′(x) = 0, the derivative p = F′(x) continues to increase, so the intercept − G(p) decreases, which means G(p) increases. The assumptions in the above informal argument are stronger than required: it was shown in [Rockafellar 74] that if F : ℝ^N → ℝ is a finite and differentiable convex function such that for each p ∈ ℝ^N the equation ∇F(x) = p has a unique solution x̄(p), then G is a finite and differentiable convex function.
The Legendre transform of F is a special case of the conjugate F⋆ of F, defined in [Rockafellar 74]:

F⋆(p) ≡ sup_{x ∈ ℝ} {px − F(x)}    (23.11)
Since the conjugate F⋆ is the supremum of an (infinite) collection of linear functions, where each linear function maps p to px − F(x), then F⋆ is convex. For a given p, to evaluate F⋆(p) we maximize px − F(x) over x ∈ ℝ. The definition of F⋆ does not require F to be differentiable. However, since in this section we do assume that F is twice differentiable, to maximize px − F(x) we take the derivative with respect to x and equate the derivative to 0. The maximum is achieved at the point x at which F′(x) = p; this point is, by definition, x̄(p). Since F is strictly convex, then x̄(p) is the unique point yielding a global maximum, over ℝ, of the concave function px − F(x). This is illustrated in Fig. 23.6, where p x̄(p) − F(x̄(p)) (the length of the green vertical bar) exceeds p x̂ − F(x̂) (the length of the red vertical bar), where x̂ is an arbitrary point not equal to x̄(p). From (23.11) we have F⋆(p) = p x̄(p) − F(x̄(p)), which agrees with (23.10), since G is a special case of F⋆. Thus G(p) = F⋆(p).
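Because F⋆(p) is defined by a maximization, it can be approximated without knowing x̄(p) in closed form, simply by searching a grid of x values. The sketch below does this for F(x) = x⁶ from Example 23.1 (grid bounds and spacing are arbitrary illustrative choices):

```python
def conjugate(F, p, xs):
    """F*(p) = sup over x of p*x - F(x), approximated over a finite grid xs."""
    return max(p * x - F(x) for x in xs)

F = lambda x: x ** 6
xs = [k / 1000.0 for k in range(-2000, 2001)]  # grid on [-2, 2]
# The supremum at p = 6 is attained at xbar(6) = 1, giving F*(6) = 6·1 - 1^6 = 5
print(conjugate(F, 6.0, xs))
```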
We have shown that, under assumptions (i) and (ii) on the convex function F, we can describe F in two different ways. The usual way is to specify the functional form of F. The Legendre transform provides a second way by specifying, for each possible value of F′(x), the vertical intercept of the line tangent to F with slope F′(x). While the above discussion considered only a function of a single variable, the theory of Legendre transforms and conjugate functions holds for multivariate functions [Rockafellar 74, Zia 09].
23.6 Graph Spectra
The spectrum of a graph 𝔾 is the set of eigenvectors and eigenvalues of the node-node adjacency matrix M of 𝔾 (Sect. 2.1). Since 𝔾 is undirected and has no self-loops, the diagonal elements of M are zero, and M is symmetric (Mij = Mji). There is a large body of work, including several books, on graph spectra. To motivate our brief discussion of this subject, we consider the 1-dimensional quadratic assignment problem studied in [Hall 70]. This is the problem of placing the nodes of 𝔾 on a line to minimize the sum of the squared distances between adjacent nodes. (Here “adjacent” refers to adjacency in 𝔾.) A single constraint is imposed to render infeasible the trivial solution in which all nodes are placed at the same point. Hall’s algorithm uses the method of Lagrange multipliers for constrained optimization problems, and it can be extended to the placement of the nodes of 𝔾 in two dimensions. Thus spectral graph analysis is a key technique in various methods for drawing graphs. Spectral methods are also used for community detection in networks, particularly in social networks.
Let x denote an N-dimensional column vector (i.e., a matrix with 1 column and N rows). The transpose x′ of x is an N-dimensional row vector (i.e., a matrix with 1 row and N columns). Consider the optimization problem of minimizing the objective function
z ≡ (1∕2) ∑_{i=1}^N ∑_{j=1}^N (x_i − x_j)² M_ij    (23.12)
subject to the single constraint ∑_{i=1}^N x_i² = 1. This constraint is compactly expressed as x′x = 1. Thus the problem is to place N nodes on the line (hence the adjective 1-dimensional) to minimize the sum of squared distances, where the sum is over all pairs of adjacent nodes (i.e., adjacent in 𝔾). For example, let N = 3 and let 𝔾 be the complete graph K3. Suppose the three nodes are placed at positions 1∕6, 1∕3, and 1∕2. Since each off-diagonal element Mij is 1, the objective function value is

(1∕6 − 1∕3)² + (1∕6 − 1∕2)² + (1∕3 − 1∕2)² = 1∕36 + 4∕36 + 1∕36 = 1∕6.
Clearly, x = (1∕√N)(1, 1, …, 1)′ solves the optimization problem of minimizing (23.12) subject to the constraint x′x = 1, since this x yields an objective function value of 0. Since this x places each node at the same position, we exclude this x and seek a non-trivial solution.
Let D be the N × N diagonal matrix defined by D_ii = ∑_{j=1}^N M_ij for i = 1, 2, …, N and D_ij = 0 for i ≠ j. Thus D_ii is the sum of the elements in row i of M. Since M is symmetric, D_ii is also the sum of the elements in column i of M. Compute the N × N matrix D − M, where (D − M)_ij = D_ij − M_ij.
Example 23.2 [Hall 70] Consider the 𝔾 illustrated in Fig. 23.7, whose edges join nodes a and d, nodes b and d, and nodes b and c. Suppose columns 1, 2, 3, and 4 of M correspond to nodes a, b, c, and d, respectively. Then

M = ⎡ 0 0 0 1 ⎤      D = ⎡ 1 0 0 0 ⎤      D − M = ⎡  1  0  0 −1 ⎤
    ⎢ 0 0 1 1 ⎥          ⎢ 0 2 0 0 ⎥              ⎢  0  2 −1 −1 ⎥
    ⎢ 0 1 0 0 ⎥          ⎢ 0 0 1 0 ⎥              ⎢  0 −1  1  0 ⎥
    ⎣ 1 1 0 0 ⎦          ⎣ 0 0 0 2 ⎦              ⎣ −1 −1  0  2 ⎦
□
Consider the objective function (23.12). For simplicity, we write ∑_i to denote ∑_{i=1}^N and ∑_j to denote ∑_{j=1}^N. We have

z = (1∕2) ∑_i ∑_j (x_i − x_j)² M_ij = (1∕2) ∑_i ∑_j (x_i² − 2 x_i x_j + x_j²) M_ij = ∑_i x_i² D_ii − ∑_i ∑_j x_i x_j M_ij = x′Dx − x′Mx = x′(D − M)x    (23.13)
Since ∑_i ∑_j (x_i − x_j)² M_ij ≥ 0 for all x, then from (23.13) we have x′(D − M)x ≥ 0 for all x; that is, the matrix D − M is positive semi-definite. Thus the N eigenvalues of D − M are nonnegative.
Let ϕ0 be the N-dimensional column vector each of whose components is 1. The transpose ϕ0′ of ϕ0 is the N-dimensional row vector (1, 1, …, 1). By definition of D we have ϕ0′(D − M) = 0, where the “0” on the right-hand side of this equality is the N-dimensional row vector each of whose components is the number zero. Hence ϕ0 is an eigenvector of D − M with associated eigenvalue 0. It was proved in [Hall 70] that for a connected graph the matrix D − M has rank N − 1, which implies that the other N − 1 eigenvalues of D − M are positive.
From (23.13), minimizing (23.12) subject to x′x = 1 is equivalent to minimizing x′(D − M)x subject to x′x = 1. We write x′x = 1 as x′Ix − 1 = 0, where I is the N × N identity matrix. Invoking the classical theory of constrained optimization, we associate the Lagrange multiplier λ with the constraint x′Ix − 1 = 0 and form the Lagrangian L(x, λ) defined by

L(x, λ) ≡ x′(D − M)x − λ(x′Ix − 1).
Each solution of the optimization problem must satisfy, for i = 1, 2, …, N,

∂L(x, λ)∕∂x_i = 0; in vector form, 2(D − M)x − 2λx = 0,
which implies (D − M − λI)x = 0, that is, (D − M)x = λx. Thus (x, λ) solves the optimization problem only if (i) x′x = 1 and (ii) x is an eigenvector of D − M and λ is the associated eigenvalue. If x does satisfy (i) and (ii), then

x′(D − M)x = x′(λx) = λ x′x = λ,
so the objective function value at x is equal to the associated eigenvalue. Since we want to minimize the objective function, we want the smallest eigenvalue. But, as we have shown, the smallest eigenvalue is 0 and corresponds to the trivial solution with all xi equal. Hence the desired solution is the eigenvector corresponding to the second smallest eigenvalue, and the minimal objective function value is the second smallest eigenvalue.
Example 23.2 (Continued) For the 𝔾 illustrated in Fig. 23.7, the four eigenvalues and eigenvectors are given in Table 23.1. The solution is x = ϕ2, which yields an objective function value of 0.586; this solution “unravels” 𝔾, placing node a to the left of node d, node d to the left of node b, and node b to the left of node c. □
Table 23.1 Eigenvalues and eigenvectors for the example

Eigenvalue     Eigenvector
λ1 = 0.0       ϕ1 = (0.5, 0.5, 0.5, 0.5)
λ2 = 0.586     ϕ2 = (−0.653, 0.270, 0.653, −0.270)
λ3 = 2.0       ϕ3 = (−0.5, 0.5, −0.5, 0.5)
λ4 = 3.414     ϕ4 = (0.270, 0.653, −0.270, −0.653)
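The eigenpairs in Table 23.1 can be verified by direct multiplication. The matrix below is a reconstruction of D − M consistent with Table 23.1 and the unraveling described above (edges a–d, b–d, b–c; columns ordered a, b, c, d); 0.586 and 3.414 are 2 − √2 and 2 + √2:

```python
import math

# D - M, assuming the graph of Fig. 23.7 has edges a-d, b-d, b-c (columns a, b, c, d)
L = [[ 1,  0,  0, -1],
     [ 0,  2, -1, -1],
     [ 0, -1,  1,  0],
     [-1, -1,  0,  2]]

def matvec(A, x):
    """Multiply matrix A by column vector x."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

# Eigenpairs from Table 23.1; verify (D - M) φ = λ φ for each
pairs = [
    (0.0,              [0.5, 0.5, 0.5, 0.5]),
    (2 - math.sqrt(2), [-0.653, 0.270, 0.653, -0.270]),
    (2.0,              [-0.5, 0.5, -0.5, 0.5]),
    (2 + math.sqrt(2), [0.270, 0.653, -0.270, -0.653]),
]
for lam, phi in pairs:
    residual = max(abs(y - lam * x) for y, x in zip(matvec(L, phi), phi))
    print(lam, residual)  # residuals are small (eigenvectors rounded to 3 decimals)
```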
Exercise 23.4 Apply Hall’s method to find the optimal 1-dimensional layout of 𝔾 when 𝔾 is the network of three nodes connected in a ring. □
Exercise 23.5 Apply Hall’s method to find the optimal 1-dimensional placement of nodes for the graph of Fig. 23.8. □