Humankind has not woven the web of life. We are but one thread within it. Whatever we do to the web, we do to ourselves. All things are bound together. All things connect.
Chief Seattle (1786–1866), a leader of the Suquamish and Duwamish Native American tribes in what is now Washington state.
This chapter provides supplemental information on six topics: (i) rectifiable arcs and arc length, to supplement the material on jagged coastlines in the beginning of Chap. 3 and also used in Sect. 6.4 on the box counting dimension of a spiral, (ii) the Gamma function and hypersphere volumes, used in Sect. 5.1 on the Hausdorff dimension, (iii) Laplace’s method of steepest descent, used in Sect. 16.3 on the spectrum of the multifractal Cantor set, (iv) Stirling’s approximation, used in Sect. 16.3, (v) convex functions and the Legendre transform, used in Sect. 16.4 on multifractals, and (vi) graph spectra (the study of the eigenvectors and eigenvalues of the adjacency matrix of a graph), used in Sect. 20.1.
23.1 Rectifiable Arcs and Arc Length
In this section we compute the length of an arc, where “arc” refers not to a connection between two nodes in a graph, but rather to a curvy line segment. Consider the curve shown in Fig. 23.1 and the points A and B on the curve. As shown in this figure, divide the arc between A and B into K pieces by choosing K − 1 points P1, P2, …, PK−1 on the curve. Draw the K chords AP1, P1P2, P2P3, …, PK−1B. Let L(AP1) be the length of chord AP1. For k = 1, 2, …, K − 2, let L(PkPk+1) be the length of chord PkPk+1. Let L(PK−1B) be the length of chord PK−1B. Define LK ≡ L(AP1) + L(P1P2) + ⋯ + L(PK−1B), so LK is the sum of the K chord lengths. Let δ be the length of the longest of these K chords, and let the number of points dividing the arc AB be increased such that δ → 0 as K →∞. If the limit of LK as δ → 0 exists and is independent of the choice of the points P1, P2, ⋯ , PK−1, then arc AB is said to be rectifiable, and this limit is defined to be the length of the arc [Smail 53].
In Sect. 6.4 we need to calculate the length of an arc. Let the function f(x) be continuously differentiable on the interval (S, T). Then the length L of the arc of f(x) from S to T is
L = ∫_S^T √(1 + [f′(x)]²) dx    (23.1)
For an informal derivation of this formula, consider Fig. 23.2 and the two points x ∈ (S, T) and x + dx, where dx ≪ 1. The length of the arc from x to x + dx is approximated by the length L(x) of the chord connecting the points (x, f(x)) and (x + dx, f(x + dx)). We have

L(x) = √((dx)² + [f(x + dx) − f(x)]²) ≈ √((dx)² + [f′(x) dx]²) = √(1 + [f′(x)]²) dx.
Now divide the interval [S, T] into K pieces, each of size (T − S)∕K, by inserting points xk, k = 1, 2, …, K − 1 such that S < x1 < x2 < ⋯ < xK−1 < T. Define x0 = S and xK = T. Define dx = (T − S)∕K. Then the arc length of f(x) from S to T is approximated by the sum of the K chord lengths:

L ≈ ∑_{k=1}^{K} √(1 + [f′(x_{k−1})]²) dx.
Letting K →∞ yields the integral (23.1). A rigorous proof would invoke the mean value theorem for derivatives [Smail 53].
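The chord-sum construction above is easy to check numerically. The following sketch (the test curve f(x) = x² is chosen purely for illustration) compares the sum of K chord lengths with the closed-form arc length obtained from (23.1):

```python
import math

def chord_length_sum(f, S, T, K):
    """Sum the lengths of K chords joining equally spaced points on the curve y = f(x)."""
    dx = (T - S) / K
    xs = [S + k * dx for k in range(K + 1)]
    return sum(math.hypot(xs[k + 1] - xs[k], f(xs[k + 1]) - f(xs[k]))
               for k in range(K))

# For f(x) = x^2 on [0, 1], (23.1) gives L = sqrt(5)/2 + asinh(2)/4
exact = math.sqrt(5) / 2 + math.asinh(2) / 4
approx = chord_length_sum(lambda x: x * x, 0.0, 1.0, 1000)
print(exact, approx)
```

As K grows, refining the partition can only increase the chord sums, and they converge to the arc length, illustrating that this arc is rectifiable.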
To derive the arc length for a curve defined using polar coordinates, let y = f(x), so f′(x) = dy∕dx. Rewrite (23.1) as

L = ∫_S^T √(1 + (dy∕dx)²) dx = ∫ √((dx)² + (dy)²),

where the second integral is taken along the arc.
Since x = r cos θ and y = r sin θ, then dx = cos θ dr − r sin θ dθ and dy = sin θ dr + r cos θ dθ. From this we obtain, after some algebra,
(dx)² + (dy)² = (dr)² + r² (dθ)²    (23.2)
L = ∫_{θ(S)}^{θ(T)} √((dr∕dθ)² + r²) dθ    (23.3)
where the final equality uses (23.2), and θ(S) and θ(T) are the limits of integration for the integral over θ.
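Formula (23.3) can also be exercised numerically. The sketch below uses midpoint-rule quadrature; the unit circle and the Archimedean spiral r(θ) = θ are illustrative choices, the latter being the kind of spiral treated in Sect. 6.4:

```python
import math

def polar_arc_length(r, dr, theta0, theta1, n=100_000):
    """Approximate (23.3): integrate sqrt((dr/dθ)² + r²) dθ by the midpoint rule."""
    h = (theta1 - theta0) / n
    total = 0.0
    for k in range(n):
        t = theta0 + (k + 0.5) * h
        total += math.sqrt(dr(t) ** 2 + r(t) ** 2)
    return total * h

# Unit circle r(θ) = 1: the length over [0, 2π] is the circumference 2π
print(polar_arc_length(lambda t: 1.0, lambda t: 0.0, 0.0, 2 * math.pi))

# Archimedean spiral r(θ) = θ over [0, 4π]; closed form (θ√(1+θ²) + asinh θ)/2
T = 4 * math.pi
exact = (T * math.sqrt(1 + T ** 2) + math.asinh(T)) / 2
print(exact, polar_arc_length(lambda t: t, lambda t: 1.0, 0.0, T))
```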
23.2 The Gamma Function and Hypersphere Volumes
The gamma function Γ is defined for positive real x by
Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt    (23.4)
Integration by parts yields

Γ(x + 1) = ∫_0^∞ t^x e^{−t} dt = [−t^x e^{−t}]_0^∞ + x ∫_0^∞ t^{x−1} e^{−t} dt = x Γ(x).
Since Γ(1) = 1, it is easily verified that Γ(n + 1) = n! if n is a positive integer.
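Both the recurrence and the factorial identity can be confirmed with the standard-library gamma function:

```python
import math

# Γ(x + 1) = x Γ(x) for x > 0, and Γ(n + 1) = n! for positive integers n
for x in (0.5, 1.7, 3.2):
    assert math.isclose(math.gamma(x + 1), x * math.gamma(x))
for n in range(1, 8):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))
print("recurrence and factorial identities verified")
```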
The volume VE(r) of a hypersphere in ℝ^E with radius r is [Manin 06]

VE(r) = π^{E∕2} r^E ∕ Γ((E∕2) + 1).
To evaluate VE(r), we first evaluate Γ(1∕2). We have

Γ(1∕2) = ∫_0^∞ t^{−1∕2} e^{−t} dt.
With the change of variables u = √t, we have t = u², dt = 2u du, and

Γ(1∕2) = ∫_0^∞ u^{−1} e^{−u²} 2u du = 2 ∫_0^∞ e^{−u²} du,
yielding

Γ(1∕2) = 2 ∫_0^∞ e^{−u²} du = √π,

where the last equality follows as the area under the normal curve is √π, i.e., ∫_{−∞}^{∞} e^{−u²} du = √π. Using the identity Γ(x + 1) = xΓ(x),

Γ(3∕2) = (1∕2) Γ(1∕2) = √π∕2  and  Γ(5∕2) = (3∕2) Γ(3∕2) = 3√π∕4,
and the volume of a sphere in ℝ³ is

V3(r) = π^{3∕2} r³ ∕ Γ(5∕2) = π^{3∕2} r³ ∕ (3√π∕4) = (4∕3) π r³    (23.5)
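A direct implementation of the hypersphere-volume formula (the function name is illustrative; only the standard library is used):

```python
import math

def hypersphere_volume(E, r):
    """Volume of a ball of radius r in E dimensions: π^(E/2) r^E / Γ(E/2 + 1)."""
    return math.pi ** (E / 2) * r ** E / math.gamma(E / 2 + 1)

print(hypersphere_volume(2, 1.0))  # π, the area of the unit disk
print(hypersphere_volume(3, 1.0))  # 4π/3, matching (23.5)
```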
23.3 Laplace’s Method of Steepest Descent
Laplace’s method, also called the method of steepest descent, is a way to determine the leading order behavior of the integral

F(λ) = ∫_a^b f(x) e^{−λ g(x)} dx
as λ →∞. The essence of the method is that, for λ ≫ 1, the integral F(λ) can be closely approximated by a simpler integral whose limits of integration are a neighborhood [y − 𝜖, y + 𝜖] around a point y which minimizes g over [a, b], and the simpler integral itself can be easily approximated. Even if g is relatively flat over [a, b], when λ ≫ 1 the term e^{−λ g(x)} is much larger in the neighborhood of y than elsewhere on [a, b].
We require three assumptions. Assume that (i) F(λ) exists for all sufficiently large λ; (ii) the functions f and g are twice differentiable on (a, b); and (iii) the minimal value of g over [a, b] is attained at some y ∈ (a, b) satisfying g′′(y) > 0 and f(y) ≠ 0. Assumption (ii) allows us to replace f and g by their second-order Taylor series expansions. Assumption (iii) implies that g′(y) = 0 and that g is strictly convex at y, so y is an isolated local minimum of g.
Rewriting F(λ) and then applying several approximations, as λ →∞ we have

F(λ) = ∫_a^b f(x) e^{−λ g(x)} dx
 ≈ ∫_{y−𝜖}^{y+𝜖} f(x) e^{−λ g(x)} dx
 ≈ f(y) ∫_{y−𝜖}^{y+𝜖} e^{−λ [g(y) + (1∕2) g′′(y)(x−y)²]} dx  (using g′(y) = 0)
 = f(y) e^{−λ g(y)} ∫_{y−𝜖}^{y+𝜖} e^{−λ g′′(y)(x−y)²∕2} dx
 ≈ f(y) e^{−λ g(y)} ∫_{−∞}^{∞} e^{−λ g′′(y)(x−y)²∕2} dx
 = f(y) e^{−λ g(y)} √(2π∕(λ g′′(y))),
where the final equality follows from the area under a normal curve:

∫_{−∞}^{∞} e^{−u²∕(2σ²)} du = σ √(2π),  with σ² = 1∕(λ g′′(y)).
We thus have the final result that, as λ →∞,

F(λ) ≈ f(y) e^{−λ g(y)} √(2π∕(λ g′′(y)))    (23.6)
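To see (23.6) in action, the sketch below compares a brute-force evaluation of F(λ) with the Laplace approximation, for the illustrative choice f(x) = 1 and g(x) = x² on [−1, 1] (so y = 0, g(y) = 0, and g′′(y) = 2):

```python
import math

def F_quadrature(f, g, a, b, lam, n=200_000):
    """Midpoint-rule evaluation of F(λ) = ∫_a^b f(x) exp(-λ g(x)) dx."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) * math.exp(-lam * g(a + (k + 0.5) * h))
                   for k in range(n))

def laplace_approx(f_y, g_y, gpp_y, lam):
    """Right-hand side of (23.6): f(y) exp(-λ g(y)) sqrt(2π / (λ g''(y)))."""
    return f_y * math.exp(-lam * g_y) * math.sqrt(2 * math.pi / (lam * gpp_y))

lam = 50.0
exact = F_quadrature(lambda x: 1.0, lambda x: x * x, -1.0, 1.0, lam)
approx = laplace_approx(1.0, 0.0, 2.0, lam)
print(exact, approx)  # both near sqrt(π/50) ≈ 0.2507
```

Already at λ = 50 the two values agree to several decimal places, since the mass of e^{−λx²} is concentrated near the minimizer y = 0.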
23.4 Stirling’s Approximation
In Chap. 16 on multifractals, we made frequent use of the approximation ln n! ≈ n ln n − n for large n, and often this is referred to as Stirling’s approximation. The actual approximation credited to Stirling is
n! ≈ √(2πn) (n∕e)^n    (23.7)
We now derive (23.7). From (23.4), for x > 0 we have

Γ(x + 1) = ∫_0^∞ t^x e^{−t} dt,

so, making the change of variables t = xz (with dt = x dz and t^x = x^x e^{x ln z}),

Γ(x + 1) = x^{x+1} ∫_0^∞ e^{x ln z − xz} dz = x^{x+1} ∫_0^∞ e^{−x(z − ln z)} dz    (23.8)
Now apply Laplace’s approximation (23.6) to the integral in (23.8), with z taking the place of x, with x taking the place of λ, with f(z) = 1, and with g(z) = z − ln z. The point z = 1 is the unique global minimum of g(z) over {z | z > 0}, and g(1) = 1, g′(1) = 0, and g′′(1) = 1. Since all three assumptions of Sect. 23.3 are satisfied, by (23.6) we have, as x →∞,

Γ(x + 1) ≈ x^{x+1} e^{−x g(1)} √(2π∕(x g′′(1))) = x^{x+1} e^{−x} √(2π∕x) = √(2πx) (x∕e)^x.
If x is the positive integer n, then Γ(n + 1) = n!, which yields (23.7).
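The quality of (23.7) can be checked directly; a small illustration:

```python
import math

def stirling(n):
    """Stirling's approximation (23.7): n! ≈ sqrt(2πn) (n/e)^n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (1, 5, 10, 20):
    print(n, stirling(n) / math.factorial(n))  # the ratio approaches 1 as n grows
```

The relative error behaves like 1∕(12n), so even n = 10 is approximated to within about one percent.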
Exercise 23.1 Plot the error incurred by using (23.7) to approximate n! as a function of the positive integer n. □
23.5 Convex Functions and the Legendre Transform
In this section we present the Legendre transform of a convex function of a single variable, drawing on [Kennerly 11, Mangasarian 69, Rockafellar 74, Zia 09]. Let F : ℝ → ℝ be convex. For simplicity, assume for the remainder of this section that (i) F is twice differentiable and (ii) F′′(x) > 0 for all x. These two assumptions imply that F is strictly convex and that F′(x) is monotone increasing with x. For this discussion it is important to distinguish between the function F and the value F(x) of the function at the point x. Since F is differentiable and convex, then

F(y) ≥ F(x) + F′(x)(y − x)

for all x and y. As illustrated by Fig. 23.3, this inequality means that F lies above the line with slope F′(x) and vertical intercept F(x) − xF′(x) which is tangent to F at x. Since F lies above this line, and since F intersects the line, the line is called a supporting hyperplane for F. Figure 23.4 illustrates two supporting hyperplanes of a convex function. Each convex function (whether or not it is differentiable) can be characterized in terms of its supporting hyperplanes [Rockafellar 74].
Since F′(x) is monotone increasing with x, for each p there is a unique x such that F′(x) = p. For example, if F(x) = (x − 1)², setting F′(x) = 2(x − 1) = p yields x = 1 + (p∕2). Let x̄ : ℝ → ℝ be the function defined by

F′(x̄(p)) = p.
That is, for each p the value x̄(p) is the x value at which the derivative of F is p. For example, if F(x) = (x − 1)², then x̄(p) = 1 + (p∕2). Note that x is a point and x̄ is a function. Having defined the function x̄, we can write F(x̄(p)), which is the value of F at the point x̄(p). Finally, the Legendre transform of F is the function G : ℝ → ℝ defined by

G(p) ≡ p x̄(p) − F(x̄(p))    (23.10)
These definitions are illustrated in Fig. 23.5, which shows that G(p) is the negative of the y intercept of the line through the point (x̄(p), F(x̄(p))) with slope p. To see this algebraically, consider the equation (y − y₀) = m(x − x₀) describing a line with slope m through the point (x₀, y₀). The y-intercept of this line is y₀ − mx₀. Choosing m = p and (x₀, y₀) = (x̄(p), F(x̄(p))), the y-intercept is F(x̄(p)) − p x̄(p), which by (23.10) is − G(p).
Example 23.1 Consider the function F defined by F(x) = x⁶. We have F′(x) = 6x⁵. Setting 6x⁵ = p yields x̄(p) = (p∕6)^{1∕5}. From (23.10) we have G(p) = p (p∕6)^{1∕5} − [(p∕6)^{1∕5}]⁶ = [6^{−1∕5} − 6^{−6∕5}] p^{6∕5}. □
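The closed form in Example 23.1 can be verified numerically (a small check; the function names are for illustration only):

```python
def G(p):
    """Legendre transform of F(x) = x^6 via (23.10), using xbar(p) = (p/6)^(1/5)."""
    xbar = (p / 6) ** (1 / 5)
    return p * xbar - xbar ** 6

for p in (1.0, 6.0, 12.0):
    closed_form = (6 ** (-1 / 5) - 6 ** (-6 / 5)) * p ** (6 / 5)
    print(p, G(p), closed_form)  # the two values agree
```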
Exercise 23.2 Derive an expression for the Legendre transform of 1 + x⁴, and plot G(p) versus p. □
Exercise 23.3 Let G be the Legendre transform of F. Prove that the Legendre transform of G is F itself. □
Suppose F satisfies, in addition to assumptions (i) and (ii) above, the additional assumptions that F(x) →∞ as |x|→∞, and that F′(x) = 0 at some point x. Then the following informal argument suggests that G is convex. Under these additional assumptions, if x ≪ 0, then p = F′(x) ≪ 0, so the intercept − G(p) ≪ 0, which means G(p) ≫ 0. As x increases then p = F′(x) increases, so the intercept − G(p) increases, which means G(p) decreases. As x increases beyond the point where F′(x) = 0, the derivative p = F′(x) continues to increase, so the intercept − G(p) decreases, which means G(p) increases. The assumptions in the above informal argument are stronger than required: it was shown in [Rockafellar 74] that if F : ℝ^N → ℝ is a finite and differentiable convex function such that for each p ∈ ℝ^N the equation ∇F(x) = p has a unique solution x̄(p), then G is a finite and differentiable convex function.
The Legendre transform of F is a special case of the conjugate F⋆ of F, defined in [Rockafellar 74]:

F⋆(p) ≡ sup_{x ∈ ℝ} {px − F(x)}    (23.11)
Since the conjugate F⋆ is the supremum of an (infinite) collection of linear functions, where each linear function maps p to px − F(x), then F⋆ is convex. For a given p, to evaluate F⋆(p) we maximize px − F(x) over x ∈ ℝ. The definition of F⋆ does not require F to be differentiable. However, since in this section we do assume that F is twice differentiable, to maximize px − F(x) we take the derivative with respect to x and equate the derivative to 0. The maximum is achieved at the point x at which F′(x) = p; this point is, by definition, x̄(p). Since F is strictly convex, then x̄(p) is the unique point yielding a global maximum, over ℝ, of the concave function px − F(x). This is illustrated in Fig. 23.6, where p x̄(p) − F(x̄(p)) (the length of the green vertical bar) exceeds p x̂ − F(x̂) (the length of the red vertical bar), where x̂ is an arbitrary point not equal to x̄(p). From (23.11) we have F⋆(p) = p x̄(p) − F(x̄(p)), which agrees with (23.10), since G is a special case of F⋆. Thus G(p) = F⋆(p).
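Because F⋆(p) is defined by a maximization, it can be approximated without knowing x̄(p) in closed form, simply by searching a grid of x values. The sketch below does this for F(x) = x⁶ from Example 23.1 (grid bounds and spacing are arbitrary illustrative choices):

```python
def conjugate(F, p, xs):
    """F*(p) = sup over x of p*x - F(x), approximated over a finite grid xs."""
    return max(p * x - F(x) for x in xs)

F = lambda x: x ** 6
xs = [k / 1000.0 for k in range(-2000, 2001)]  # grid on [-2, 2]
# The supremum at p = 6 is attained at xbar(6) = 1, giving F*(6) = 6·1 - 1^6 = 5
print(conjugate(F, 6.0, xs))
```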
We have shown that, under assumptions (i) and (ii) on the convex function F, we can describe F in two different ways. The usual way is to specify the functional form of F. The Legendre transform provides a second way by specifying, for each possible value of F′(x), the vertical intercept of the line tangent to F with slope F′(x). While the above discussion considered only a function of a single variable, the theory of Legendre transforms and conjugate functions holds for multivariate functions [Rockafellar 74, Zia 09].
23.6 Graph Spectra
The spectrum of a graph 𝔾 is the set of eigenvectors and eigenvalues of the node-node adjacency matrix M of 𝔾 (Sect. 2.1). Since 𝔾 is undirected and has no self-loops, the diagonal elements of M are zero, and M is symmetric (Mij = Mji). There is a large body of work, including several books, on graph spectra. To motivate our brief discussion of this subject, we consider the 1-dimensional quadratic assignment problem studied in [Hall 70]. This is the problem of placing the nodes of 𝔾 on a line to minimize the sum of the squared distances between adjacent nodes. (Here “adjacent” refers to adjacency in 𝔾.) A single constraint is imposed to render infeasible the trivial solution in which all nodes are placed at the same point. Hall’s algorithm uses the method of Lagrange multipliers for constrained optimization problems, and it can be extended to the placement of the nodes of 𝔾 in two dimensions. Thus spectral graph analysis is a key technique in various methods for drawing graphs. Spectral methods are also used for community detection in networks, particularly in social networks.
Let x denote an N-dimensional column vector (i.e., a matrix with 1 column and N rows). The transpose x′ of x is an N-dimensional row vector (i.e., a matrix with 1 row and N columns). Consider the optimization problem of minimizing the objective function
z ≡ (1∕2) ∑_{i=1}^N ∑_{j=1}^N (x_i − x_j)² M_ij    (23.12)
subject to the single constraint ∑_{i=1}^N x_i² = 1. This constraint is compactly expressed as x′x = 1. Thus the problem is to place N nodes on the line (hence the adjective 1-dimensional) to minimize the sum of squared distances, where the sum is over all pairs of adjacent nodes (i.e., adjacent in 𝔾). For example, let N = 3 and let 𝔾 be the complete graph K3. Suppose the three nodes are placed at positions 1∕6, 1∕3, and 1∕2. Since each off-diagonal element Mij is 1, the objective function value is

(1∕6 − 1∕3)² + (1∕6 − 1∕2)² + (1∕3 − 1∕2)² = 1∕36 + 4∕36 + 1∕36 = 1∕6.
Clearly, x = (1∕√N)(1, 1, …, 1)′ solves the optimization problem of minimizing (23.12) subject to the constraint x′x = 1, since this x yields an objective function value of 0. Since this x places each node at the same position, we exclude this x and seek a non-trivial solution.
Let D be the N × N diagonal matrix defined by D_ii = ∑_{j=1}^N M_ij for i = 1, 2, …, N and D_ij = 0 for i ≠ j. Thus D_ii is the sum of the elements in row i of M. Since M is symmetric, D_ii is also the sum of the elements in column i of M. Compute the N × N matrix D − M, where (D − M)_ij = D_ij − M_ij.
Example 23.2 [Hall 70] Consider the 𝔾 illustrated in Fig. 23.7, whose edges join nodes a and d, nodes b and d, and nodes b and c. Suppose columns 1, 2, 3, and 4 of M correspond to nodes a, b, c, and d, respectively. Then

M = ⎡ 0 0 0 1 ⎤      D = ⎡ 1 0 0 0 ⎤      D − M = ⎡  1  0  0 −1 ⎤
    ⎢ 0 0 1 1 ⎥          ⎢ 0 2 0 0 ⎥              ⎢  0  2 −1 −1 ⎥
    ⎢ 0 1 0 0 ⎥          ⎢ 0 0 1 0 ⎥              ⎢  0 −1  1  0 ⎥
    ⎣ 1 1 0 0 ⎦          ⎣ 0 0 0 2 ⎦              ⎣ −1 −1  0  2 ⎦
□
Consider the objective function (23.12). For simplicity, we write ∑_i to denote ∑_{i=1}^N and ∑_j to denote ∑_{j=1}^N. We have

z = (1∕2) ∑_i ∑_j (x_i − x_j)² M_ij = (1∕2) ∑_i ∑_j (x_i² − 2 x_i x_j + x_j²) M_ij = ∑_i x_i² D_ii − ∑_i ∑_j x_i x_j M_ij = x′Dx − x′Mx = x′(D − M)x    (23.13)
Since ∑_i ∑_j (x_i − x_j)² M_ij ≥ 0 for all x, then from (23.13) we have x′(D − M)x ≥ 0 for all x; that is, the matrix D − M is positive semi-definite. Thus the N eigenvalues of D − M are nonnegative.
Let ϕ0 be the N-dimensional column vector each of whose components is 1. The transpose ϕ0′ of ϕ0 is the N-dimensional row vector (1, 1, …, 1). By definition of D we have ϕ0′(D − M) = 0, where the “0” on the right-hand side of this equality is the N-dimensional row vector each of whose components is the number zero. Hence ϕ0 is an eigenvector of D − M with associated eigenvalue 0. It was proved in [Hall 70] that for a connected graph the matrix D − M has rank N − 1, which implies that the other N − 1 eigenvalues of D − M are positive.
From (23.13), minimizing (23.12) subject to x′x = 1 is equivalent to minimizing x′(D − M)x subject to x′x = 1. We write x′x = 1 as x′Ix − 1 = 0, where I is the N × N identity matrix. Invoking the classical theory of constrained optimization, we associate the Lagrange multiplier λ with the constraint x′Ix − 1 = 0 and form the Lagrangian L(x, λ) defined by

L(x, λ) ≡ x′(D − M)x − λ(x′Ix − 1).
Each solution of the optimization problem must satisfy, for i = 1, 2, …, N,

∂L(x, λ)∕∂x_i = 0; in vector form, 2(D − M)x − 2λx = 0,
which implies (D − M − λI)x = 0, that is, (D − M)x = λx. Thus (x, λ) solves the optimization problem only if (i) x′x = 1 and (ii) x is an eigenvector of D − M and λ is the associated eigenvalue. If x does satisfy (i) and (ii), then

x′(D − M)x = x′(λx) = λ x′x = λ,
so the objective function value at x is equal to the associated eigenvalue. Since we want to minimize the objective function, we want the smallest eigenvalue. But, as we have shown, the smallest eigenvalue is 0 and corresponds to the trivial solution with all xi equal. Hence the desired solution is the eigenvector corresponding to the second smallest eigenvalue, and the minimal objective function value is the second smallest eigenvalue.
Example 23.2 (Continued) For the 𝔾 illustrated in Fig. 23.7, the four eigenvalues and eigenvectors are given in Table 23.1. The solution is x = ϕ2, which yields an objective function value of 0.586; this solution “unravels” 𝔾, placing node a to the left of node d, node d to the left of node b, and node b to the left of node c. □
Table 23.1 Eigenvalues and eigenvectors for the example

Eigenvalue     Eigenvector
λ1 = 0.0       ϕ1 = (0.5, 0.5, 0.5, 0.5)
λ2 = 0.586     ϕ2 = (−0.653, 0.270, 0.653, −0.270)
λ3 = 2.0       ϕ3 = (−0.5, 0.5, −0.5, 0.5)
λ4 = 3.414     ϕ4 = (0.270, 0.653, −0.270, −0.653)
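The eigenpairs in Table 23.1 can be verified by direct multiplication. The matrix below is a reconstruction of D − M consistent with Table 23.1 and the unraveling described above (edges a–d, b–d, b–c; columns ordered a, b, c, d); 0.586 and 3.414 are 2 − √2 and 2 + √2:

```python
import math

# D - M, assuming the graph of Fig. 23.7 has edges a-d, b-d, b-c (columns a, b, c, d)
L = [[ 1,  0,  0, -1],
     [ 0,  2, -1, -1],
     [ 0, -1,  1,  0],
     [-1, -1,  0,  2]]

def matvec(A, x):
    """Multiply matrix A by column vector x."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

# Eigenpairs from Table 23.1; verify (D - M) φ = λ φ for each
pairs = [
    (0.0,              [0.5, 0.5, 0.5, 0.5]),
    (2 - math.sqrt(2), [-0.653, 0.270, 0.653, -0.270]),
    (2.0,              [-0.5, 0.5, -0.5, 0.5]),
    (2 + math.sqrt(2), [0.270, 0.653, -0.270, -0.653]),
]
for lam, phi in pairs:
    residual = max(abs(y - lam * x) for y, x in zip(matvec(L, phi), phi))
    print(lam, residual)  # residuals are small (eigenvectors rounded to 3 decimals)
```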
Exercise 23.4 Apply Hall’s method to find the optimal 1-dimensional layout of 𝔾 when 𝔾 is the network of three nodes connected in a ring. □
Exercise 23.5 Apply Hall’s method to find the optimal 1-dimensional placement of nodes for the graph of Fig. 23.8. □