7.9   THE METHOD OF ITERATION FOR DOMINANT EIGENVALUES

In this section we shall present a method for computing the dominant eigenvalue or eigenvalues of an n × n matrix A. To take the most useful case, we shall assume that A is real. In scientific and engineering applications, we are typically interested in finding the real and the complex eigenvalues of a real matrix. Our analysis could readily be extended to matrices with complex components. In Section 7.10 we will show how to compute the smaller eigenvalues.

A real matrix A has a characteristic polynomial det (λI − A) with real coefficients. Therefore, the eigenvalues of A occur as real numbers or as complex-conjugate pairs, λ and λ̄. For clarity, we shall suppose that the eigenvalues λ1, λ2, . . . , λn of A are distinct, so that A has n linearly independent eigenvectors u1, u2, . . . , un. (Without difficulty the reader will be able to generalize this analysis, using either the Jordan form of A or an expansion in principal vectors, as in Section 7.6.)

Let image. The dominant eigenvalues are defined to be the eigenvalues of greatest absolute value. We shall consider only two cases.

Real case. Here we suppose that A has a single real eigenvalue λ1 of maximum absolute value. We allow λ1 to be either positive or negative. Thus,

 

image

Complex case. Here we suppose that A has the complex-conjugate dominant eigenvalues

 

image

Other cases are possible but unusual, and we ignore them. For example, A could have the dominant eigenvalues +1 and −1. Or A could have the five dominant eigenvalues λk = −7 exp (2kπi/5) (k = 1, . . . , 5). Observe that, in both of these examples, the translation λk → λk + 1, caused by adding 1 to each diagonal element of A, produces eigenvalues λk + 1 which fall into the real case (1), or the complex case (2).

In the real case (1), we pick an initial vector x0 and form the iterates x1 = Ax0, x2 = Ax1, . . . . If x0 has the expansion

 

image

then the iterate xk has the expansion

 

image

 

image

where, since image for all image.

 

image

We suppose that ξ1 ≠ 0. This supposition is almost surely correct if x0 is picked in some bizarre way, e.g.,

 

image

Then for large k, as (5) shows,

 

image

For large k, successive vectors are approximately multiples of each other, i.e., they are approximately eigenvectors:

 

image
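As a small numerical illustration (a sketch only; the matrix and starting vector below are hypothetical examples, and the systematic computing procedure is given later in this section), the iterates can be formed and the ratio of corresponding components watched:

```python
import numpy as np

# hypothetical symmetric matrix with one dominant real eigenvalue
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
x = np.array([1.0, 1.0, 1.0])            # initial vector x0

for k in range(15):
    y = A @ x                            # next iterate
    print(k, y[0] / x[0])                # this ratio tends to lambda_1
    x = y / np.linalg.norm(y)            # rescale to avoid overflow
```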

In the complex case (2), the iterates xk = Akx0 do not approximate eigenvectors as k → ∞.

EXAMPLE 1. Let

 

image

The eigenvalues are λ1 = 1 + i, λ2 = 1 − i. The iterates x0, x1,. . . are

 

image

The vector xk+1 is found by rotating xk through 45° and multiplying the length by √2.

In general, let x0 have an eigenvector expansion (3). Even though we are computing complex eigenvalues λ1 and image we will use only real numbers until the last step. We pick for x0 a real vector such as the vector defined by (7). The iterate xk has the eigenvector expansion (4). Let

 

image

Since image for v > 2, the expansion (4) yields

 

image

where rk → 0 as k → ∞. Since A and xk are real, we may assume that image and that the eigenvectors u1 and u2 are complex conjugates. We assume that ξ1 ≠ 0. Note that, for all k, since u1 and u2 are independent,

 

image

where image. Therefore, we have the asymptotic form

 

image

In the complex case for large k, successive vectors xk do not become multiples of each other. For large k, the vectors xk lie approximately in the plane spanned by the independent eigenvectors u1 and u2. Therefore, successive triples xk, xk+1, xk+2 are almost dependent. Explicitly, if α and β are the real coefficients of the quadratic

 

image

then, for all k, image, and hence

 

image

The asymptotic form (14) now yields the approximate equation

 

image

If we can find the coefficients of dependence α and β we can compute λ1 and λ2 as the roots of the quadratic (15):

 

image

Since λ1 and λ2 are complex, α² − 4β < 0.

EXAMPLE 2. In Example 1, since λ1 = 1 + i and λ2 = 1 − i are the only eigenvalues, the iterates (11) satisfy

 

image

Here α = −2, β = 2, (λ − λ1)(λ − λ2) = λ² − 2λ + 2.

Computing procedure. In practice, we are given an n × n real matrix A; we assume that we have either the real case (1) or the complex case (2), but we do not know which. If we have the real case, we must compute λ1; if we have the complex case, we must compute λ1 and λ2 = λ̄1.

Begin with the vector x = x0 defined, say, by (7). Compute the first iterate y = Ax. Now make a least-squares fit of x to y, i.e., choose the real number λ to minimize

 

image

Setting the derivative with respect to λ equal to zero, we find the value

 

image

Define a small tolerance image > 0, for example, image = 10−5. If the quantities x, y, and λ satisfy

 

image

we conclude that we have the real case, and we assign the value λ to λ1.

If the test (21) is failed, we compute the third iterate, z = Ay. Pick the real numbers α and β which minimize

 

image

Calculus gives the necessary conditions

 

image

The optimal values α and β are found from (23) to be

 

image

If the quantities x,y, z, α, β satisfy

 

image

we conclude that we have the complex case, and we assign the root values (18) to λ1 and λ2.

Suppose that both the real test (21) and the complex test (25) are failed. Then we must iterate further. If we form the iterates Akx without some scaling, we may eventually obtain numbers too large or too small for the floating-point representation. The last computed iterate was z = A2x. We scale z by dividing z by the component zm of largest absolute value. We then define

 

image

We now repeat the preceding process: We form y = Ax and make the real test (21). If that test is failed, we form z = Ay and make the complex test (25). If the complex test is also failed, we define a new vector x by (26), etc. If image is not too small, one of the tests ultimately will be passed. Scaling does not affect the asymptotic equation for xk = x, xk+1 = y, and xk+2 = z:

 

image

because the equation is homogeneous. The sequence of scaled vectors x; y, z; x, y, z;. . . is simply

 

image

where σ1, σ2,. . . are nonzero scalars.

The method of iteration suffers very little from rounding error. If a new x is slightly in error, we may simply regard this x as a new initial vector which is, no doubt, better than the previous initial vector. Rounding error affects the accuracy of the computed eigenvalues only in the last stage of the computation, when x is fitted to y, or when x and y are fitted to z.
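The whole procedure can be summarized in a short program. The following Python sketch follows the steps above; since the displays for the starting vector (7) and for the tests (21) and (25) are not reproduced here, an all-ones starting vector and relative residual tests are used in their place.

```python
import numpy as np

def dominant_eigenvalues(A, tol=1e-5, max_steps=500):
    """Method of iteration: real test, then complex test, then rescale
    by the largest component (cf. (26)) and repeat."""
    A = np.asarray(A, dtype=float)
    x = np.ones(A.shape[0])                       # stand-in for the vector (7)
    for _ in range(max_steps):
        y = A @ x
        lam = (x @ y) / (x @ x)                   # least-squares fit, cf. (20)
        if np.linalg.norm(y - lam * x) <= tol * np.linalg.norm(y):
            return (lam,)                         # real case: dominant eigenvalue
        z = A @ y
        # alpha, beta minimizing ||z + alpha*y + beta*x||, cf. (22)-(24)
        (alpha, beta), *_ = np.linalg.lstsq(np.column_stack((y, x)), -z, rcond=None)
        if np.linalg.norm(z + alpha * y + beta * x) <= tol * np.linalg.norm(z):
            root = np.sqrt(complex(alpha * alpha - 4.0 * beta))
            return ((-alpha + root) / 2.0, (-alpha - root) / 2.0)   # roots of (15)
        x = z / z[np.argmax(np.abs(z))]           # rescale the last iterate
    raise RuntimeError("neither test passed within max_steps")
```

Applied, for instance, to the rotation-and-scaling matrix with rows (1, −1) and (1, 1), whose eigenvalues are 1 ± i, the complex test succeeds after one pass and the conjugate pair is returned.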

The reader may have wondered why, if we are computing eigenvalues, we do not simply find the coefficients of the characteristic polynomial φ(λ) = det (λI − A). Then the eigenvalues λj could be found by some numerical method for solving the polynomial equation φ(λ) = 0. Indeed, this is sometimes done. But unless an extremely efficient method such as that of Householder and Bauer (see Section 7.12) is used, there is a large rounding error from the extensive computations needed to obtain the n coefficients of φ(λ) from the n² components of A. Even for n = 5 or 6 the rounding error may be quite large.

Moreover, a root λj of φ(λ) = 0 depends unstably on the coefficients if φ′(λj) is small. Infinitesimal variations dc1, . . . , dcn in the coefficients produce a variation dλj according to the formula of calculus:

 

image

Since image we find

 

image

PROBLEMS

    1.  Show what happens when the method of iteration is applied to

 

image

What are the coordinates of the vectors xk for k = 760, 761, 762, 763 ?

    2.  Suppose that A has a simple eigenvalue λi which is known to be approximately equal to −1.83. Form the matrix B = (A + 1.83I)−1, and apply the method of iteration to B. What is the dominant eigenvalue of B ? How can you compute λi?

    3.  Suppose that the real matrix A has distinct eigenvalues. Suppose that two of the eigenvalues, λi and image are known to be approximately equal to 1 ± i. Form the real matrix

 

image

How can the method of iteration be used to compute λj and image ?

    4.  Let image. Define the rate of convergence to be the reciprocal of the asymptotic value ve for the number of iterations needed to reduce the error by the factor 1/e. Show that, for the method of iteration,

 

image

    5.  Let A be a real, symmetric, positive-definite matrix with the eigenvalues λ1 > λ2 > ... > λn. Suppose that we have computed numerical values for λ1 and for a corresponding unit eigenvector u1. If y ≠ a multiple of u1, form

 

image

The vector x0 is nonzero and is (apart from rounding error) orthogonal to u1. If there were no rounding error and if u1 were exact, show that the method of iteration, applied to x0, would produce the second eigenvalue, λ2. Explain why rounding error causes the method of iteration, applied to x0, to produce the dominant eigenvalue, λ1, again.

    6.  In formula (24), the denominator

 

image

equals zero if and only if the vectors x and y are dependent, as we saw in the proof of the Cauchy-Schwarz inequality. Why are the vectors x and y independent whenever formula (24) is applied in the method of iteration ?

    7.  Let A be a real, symmetric matrix, with eigenvalues image.

Let λ be any number satisfying the test for a real eigenvalue:

 

image

Prove that, for some eigenvalue λj,

 

image

Thus, image represents an upper bound for the proportional error. (Expand x in terms of unit orthogonal eigenvectors u1, . . . , un.)

   8.* Let A be an n × n matrix with independent unit eigenvectors u1, . . . , un. Let T be the matrix with columns u1, . . .,un. If

 

image

prove that, for some eigenvalue λ j,

 

image

Use the inequality

 

image

7.10   REDUCTION TO OBTAIN THE SMALLER EIGENVALUES

In the last section we supposed that A was an n × n real matrix with real and complex eigenvalues λ1, . . . , λn. We showed how to compute either a dominant real eigenvalue, λ1, satisfying

 

image

or a dominant complex-conjugate pair, image, satisfying

 

image

We will now show how to obtain from A a reduced matrix R whose eigenvalues are the remaining eigenvalues of A. The method of iteration for dominant eigenvalues, when applied to R, yields one or two smaller eigenvalues of A. The matrix R then can be reduced further to a matrix R′, etc., until all the eigenvalues of the original matrix A have been computed.

In the real case (1) the method of iteration yielded quantities λ = λ1, x, and y = Ax satisfying

 

image

apart from a negligible error. Suppose for the moment that

 

image

Form the nonsingular n × n matrix

 

image

We will show that

 

image

where the (n − 1) × (n − 1) matrix R has the required property

 

image

The numbers * in the top row of (6) are irrelevant.

To prove (6), multiply (5) on the left by A. Since Ax = λ1x and Aej = aj, the jth column of A, we find

 

image

The matrix B = T−1AT satisfies the equation TB = AT. If the columns of B are b1, . . . , bn, then the columns of TB are Tb1, . . . , Tbn. By (8), the equation TB = AT implies

 

image

Since x is the first column of T, the solution of the equation Tb1 = λ1x is

 

image

Therefore, B = T−1AT has the form (6). The identity (7) follows by taking the characteristic determinants of the matrices on both sides of (6):

 

image

Recalling the invariance (see Theorem 1, Section 4.2)

 

image

we find from (11) the identity

 

image

Division of both sides of (13) by λ − λ1 yields the required result (7). Note that I in the expression λI − R is the identity of order n − 1.

To compute R, we verify that

 

image

Indeed, since x1 = 1, the matrix (14) times the matrix (5) equals I. Since R comprises the elements in rows 2, . . . , n and columns 2, . . . , n of the product T−1AT, we have

 

image

where X consists of the rows 2, . . . , n of T−1, and Y consists of the columns 2, . . . , n of T. Explicitly,

 

image

Therefore,

 

image

which yields R = (rij) in the explicit form

 

image

Now we shall eliminate the assumption that 1 = x1 = max | xi |. This assumption was convenient because it ensured that the matrix T, defined by (5), was nonsingular and because it produced no large numbers xi+1 in the final formula (18). Given A, λ1, and x ≠ 0, we will find a new A and a new x satisfying Ax = λ1x such that the new A has the same eigenvalues as the old A, and such that the new x does satisfy 1 = x1 = max | xi |. Let

 

image

Since x ≠ 0, we have xm ≠ 0. The vector u satisfies

 

image

Let Pij be the matrix which permutes components i and j of a vector. For example, if n = 3,

 

image

Pij is the matrix formed by interchanging rows i and j of the identity matrix. Note that

 

image

Note also that APij is formed by interchanging columns i and j in A; PijA is formed by interchanging rows i and j in A.

Referring to (20), we see that υ = P1mu is a vector with maximal component υ1 = 1. Further,

 

image

But, by (21), PAP = P−1AP, which has the same eigenvalues as A. Thus, we define

 

image

The new matrix A and the new vector x have all the required properties, including (4). Note that the new A is formed from A by interchanging rows 1 and m and interchanging columns 1 and m. The reduced matrix R is then computed directly from formula (18).
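The computation of R is brief. The display for (18) is not reproduced here; the sketch below rederives the entries from T in (5) and its inverse (14), after the interchanges just described: r_ij = a_{i+1,j+1} − x_{i+1}a_{1,j+1}.

```python
import numpy as np

def deflate_real(A, x):
    """Reduced (n-1)x(n-1) matrix R for a computed real eigenvalue with
    eigenvector x; a sketch of the reduction described above."""
    A = np.array(A, dtype=float)
    x = np.array(x, dtype=float)
    m = int(np.argmax(np.abs(x)))
    x = x / x[m]                              # now x_m = 1 = max |x_i|, cf. (19)
    A[[0, m], :] = A[[m, 0], :]               # interchange rows 1 and m
    A[:, [0, m]] = A[:, [m, 0]]               # interchange columns 1 and m
    x[[0, m]] = x[[m, 0]]                     # interchange components 1 and m
    return A[1:, 1:] - np.outer(x[1:], A[0, 1:])
```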

In the complex case (2), the method of iteration yielded real quantities α, β, x, y = Ax, and z = Ay satisfying

 

image

We know that x and y are independent because in the testing procedure of the preceding section, the numbers α and β were computed only after the failure of an attempt to express y as a scalar multiple of x ≠ 0. For the moment assume that x and y have the special forms

 

image

The coefficients α, β satisfy

 

image

We will find an (n − 2) × (n − 2) matrix R whose eigenvalues are λ3, . . ., λn.

Define the n × n matrix

 

image

Since Ax = y and Ay = z = −βx − αy,

 

image

The matrix B = T−1AT satisfies TB = AT. If b1, ... are the columns of B, then by (28)

 

image

By (27) we conclude that b1 and b2 have the forms

 

image

Therefore, B = T−1AT has the form

 

image

The numbers * are irrelevant. Because of the zeros below the 2 × 2 block in the upper-left corner,

 

image

Because B is similar to A, B has the same characteristic determinant. Therefore (32) implies

 

image

The identity λ² + αλ + β = (λ − λ1)(λ − λ2) now gives

 

image

To compute R, we verify that

 

image

Indeed, since x and y have the special forms (25), the matrix (35) times the matrix (27) equals I. Since R consists of the elements in rows 3, . . . , n and columns 3, . . . , n of the product T−1AT, we have

 

image

where X consists of the rows 3, . . . , n of T−1, and Y consists of the columns 3, . . . , n of T. Explicitly,

 

image

Therefore,

 

image

which yields R = (rij) in the explicit form

 

image
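When x and y already have the special forms (25), the entries of R follow directly from T in (27) and its inverse (35). The display for (25) is not reproduced here; the sketch below assumes, consistently with the construction (58) later in this section, that x = col (1, 0, x3, . . . , xn) and y = col (0, 1, y3, . . . , yn), and computes r_ij = a_{i+2,j+2} − x_{i+2}a_{1,j+2} − y_{i+2}a_{2,j+2}.

```python
import numpy as np

def deflate_complex_special(A, x, y):
    """Reduced (n-2)x(n-2) matrix for a complex-conjugate pair, assuming
    x and y are already in the special form described in the lead-in."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return A[2:, 2:] - np.outer(x[2:], A[0, 2:]) - np.outer(y[2:], A[1, 2:])
```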

Suppose now that x, y = Ax, and z = Ay satisfy the dependence (24) but that x and y do not have the special form (25). Then x and y are independent vectors satisfying

 

image

Here [xy] represents the n × 2 matrix with columns x and y. Let N and Q be nonsingular n × n and 2 × 2 matrices. By (40),

 

image

Set

 

image

Suppose further that the vectors x′, y′ have the special form

 

image

In analogy to (27), let T′ be the n × n matrix

 

image

Equation (41) states that

 

image

where the 2 × 2 matrix S is defined in (42). Note that

 

image

By analogy to (31), B′ = (T′)−1A′T′ has the form

 

image

Therefore,

 

image

But B′ is similar to A′, which is similar to A. Therefore,

 

image

Dividing (48) by (λ − λ1)(λ − λ2), we find

 

image

Therefore, R′ is the required reduced matrix. By analogy to (39), we have the explicit form R′ = (rij), with

 

image

It remains only to find an n × n matrix N and a 2 × 2 matrix Q such that N[xy]Q = [x′y′], where x′, y′ have the form (43). Let

 

image

Define image. Define N1 = P1m, the matrix which permutes the first and mth components. Then we have the form

 

image

Let Q2 be a 2 × 2 matrix which subtracts η times the first column from the second column:

 

image

Then, by (53), we have the form

 

image

Let image; this maximum is positive because the two columns (55) remain independent. Let N2 = P2k, and let N3 divide row 2 by ζk. Then we have the form

 

image

Explicitly, N3 = diag image. Let Q3 subtract ω times column 2 from column 1:

 

image

Then we obtain x′, y′:

 

image

There is no need to obtain N = N3N2N1 and Q = Q1Q2Q3 explicitly, but we must obtain A′ = NAN−1 explicitly to compute the reduced matrix R′ from (51). We have

 

image

since image. Formula (59) gives the following procedure. Begin with A. Interchange columns 1 and m. Interchange rows 1 and m. Interchange columns 2 and k. Interchange rows 2 and k. Divide row 2 by ζk. Multiply column 2 by ζk. The result is A′. Since the vectors x′,y′ were given by (58), we can now compute the reduced matrix from (51).

Although the analysis has been long, the explicit computations (58), (59), (51) are very brief. Therefore, double precision can be used with a negligible increase in computing time. In any case, rounding error is reduced by the choice of the maximal pivot-elements xm and ζk.

PROBLEMS

    1.  Let

 

image

This matrix has the simple eigenvalue λ1 = 4, with the eigenvector x = col (1, 1, 1). Find the 2 × 2 reduced matrix R.

    2.  Let

 

image

This matrix has the simple eigenvalue λ1 = −1, with the eigenvector x = col (0, 3, 10). Find the 2 × 2 reduced matrix R.

    3.  Let

 

image

Verify that A²x − 4Ax + 5x = 0. Compute two complex-conjugate eigenvalues. Using formula (51), compute the 1 × 1 reduced matrix R. First perform the transformations (58) and (59).

    4.  Let x and Ax be independent; let x, Ax, and A²x be dependent:

 

image

If λ² + αλ + β ≡ (λ − λ1)(λ − λ2), show that (A − λ2I)x is an eigenvector belonging to λ1.

    5.  Let

 

image

Verify that x, Ax, and A²x are dependent. Compute two complex eigenvalues, and form the 2 × 2 reduced matrix, R. Then compute the other two eigenvalues of A.

7.11   EIGENVALUES AND EIGENVECTORS OF TRIDIAGONAL AND HESSENBERG MATRICES

Some methods for computing the eigenvalues of an n × n matrix A depend upon finding the coefficients cj of the characteristic polynomial

 

image

The eigenvalues are then found by some numerical method for computing the roots of polynomial equations.

In the method of Householder and Bauer, to be presented in the next section, given A, we construct a unitary matrix U such that the similar matrix U* AU has the special form

 

image

This is the Hessenberg form, with

 

image

In particular, if A is Hermitian, then H is Hermitian. A Hermitian Hessenberg matrix H is tridiagonal, with image.

The reduction (2) is useful if there is an easy way to find the coefficients of the characteristic polynomial of a Hessenberg matrix, since the identities U*AU = H, U*U = I imply

 

image

The purpose of this section is to show how to find the coefficients cj of the characteristic polynomial of a Hessenberg matrix H. As a by-product, we shall show how to find an eigenvector belonging to a known eigenvalue of H. We also will show, in the Hermitian case, that the eigenvalues can be located in intervals without any numerical evaluation of the roots of the characteristic equation.

Given H = Hn, construct the sequence of Hessenberg matrices

 

image

Let Δ1 = det H1, . . . , Δn = det Hn. We will develop a recursion formula for these determinants.

EXAMPLE 1. For n = 5 we have

 

image

We shall expand Δ5 = det H5 by the last row.

 

image

Thus, if we define Δ0 = 1,

 

image

In the general case, expanding Δn = det Hn by the last row, we find

 

image

To find the characteristic polynomial image, we only need to replace the diagonal elements hvv by hvv − λ. Then (7) yields, for image,

 

image

where image and image. This is a formula for computing, successively, image, . . . . If Hn is tridiagonal, (8) takes the simpler form

 

image
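In computation the recursion is used to evaluate the characteristic polynomial at a given value of λ rather than to find its coefficients. The sketch below evaluates det (Hk − λI) successively for k = 1, . . . , n by the last-row expansion illustrated in Example 1, writing αj for the subdiagonal elements h_{j+1,j}; since the display for (8) is not reproduced here, the recursion is written out from that expansion.

```python
import numpy as np

def hessenberg_charpoly_value(H, lam):
    """Evaluate det(H - lam*I) for an upper Hessenberg matrix H by the
    last-row expansion, with alpha_j = H[j, j-1] the subdiagonal elements.
    For a tridiagonal matrix only the j = k-1 term of the inner loop survives."""
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    d = [1.0]                                  # d[k] = det(H_k - lam*I), d[0] = 1
    for k in range(1, n + 1):
        value = (H[k - 1, k - 1] - lam) * d[k - 1]
        prod, sign = 1.0, 1.0
        for j in range(k - 1, 0, -1):          # j = k-1, ..., 1
            prod *= H[j, j - 1]                # running product of the alphas
            sign = -sign
            value += sign * prod * H[j - 1, k - 1] * d[j - 1]
        d.append(value)
    return d[n]
```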

Next we shall show that, if λ is an eigenvalue of Hn, an eigenvector x = col (x1, . . . , xn) is given by

 

image

provided that at least one of these numbers xi is nonzero. Since x1 = ±α1 . . . αn−1, we have at least x1 ≠ 0 if all αv ≠ 0.

To show that (10) provides an eigenvector, we must show that, if image, then

 

image

If y = (Hn − λI)x, then

 

image

which equals image, by (8). For k < n we have

 

image

But this expression equals 0, as we see by replacing n by k in the recursion formula (8).

If H is Hermitian, with all αj ≠ 0, we will show how the eigenvalues can be located in intervals. Since image, the polynomials image, . . . , image defined by (9) satisfy

 

image

Let a < λ < b be a given interval. We suppose that neither a nor b is a zero of any of the polynomials image. We will count the number of zeros of image in the interval (a, b). Note the following properties:

      (i)   image in (a, b). [In fact, image.]

     (ii)   If image, then image and image are nonzero and of opposite signs.

Proof. By (13), if either adjacent polynomial, image or image, is zero, the other one must also be zero. But then image and image have a common zero, and the recursion formula shows that image, etc. Finally, we could conclude image, which is impossible. Therefore, image implies both image and image. By (13),

 

image

Therefore, image and image have opposite signs.

    (iii)   As λ passes through a zero of image, the quotient image changes from positive to negative.

Proof: By (ii), the adjacent polynomials image and image cannot have a common zero. By the inclusion principle (Section 6.4), the zeros μj of image separate the zeros λj of image:

 

image

Furthermore,

 

image

Therefore, image changes sign exactly once, at the pole λ = μj, between consecutive zeros λj+1, λj; and we have Fig. 7.6 as a picture of the algebraic signs of image. We see that the sign changes from + to − as λ crosses each λj.

image

Figure 7.6

Sequences of polynomials image with properties (i), (ii), (iii) are called Sturm sequences. They have this remarkable property:

Theorem 1. Let image be a Sturm sequence on the interval (a, b). Let σ(λ) be the number of sign changes in the ordered array of numbers image. Then the number of zeros of the function image in the interval (a, b) equals σ(b) − σ(a).

EXAMPLE. Let

 

image

If image and image for image, then

 

image

For λ = 0 we have the signs

 

image

Hence, σ(0) = 1. For λ = 10 we have the signs

 

image

Hence, σ(10) = 3. The number of zeros of image in the interval (0, 10) is, therefore, σ(10) − σ(0) = 3 − 1 = 2.

Proof of Sturm’s Theorem. The function σ(λ) can change only when λ passes through a zero of one of the functions image. By property (i), image never changes sign in (a, b). If image, consider the triple

 

image

If image, property (ii) states that image and image have opposite signs. Therefore, the triple (14) yields exactly one sign change for all λ in the neighborhood of λ0. Assume image. Then σ(λ) cannot change as λ crosses λ0.

If λ crosses a zero λv of image, property (iii) states that the signs of

 

image

change from + + to + −, or from − − to − +. Therefore, exactly one sign change is added in the sequence of ordered values image. In other words, σ(λ) increases by 1. In summary, σ(λ) increases by 1 each time λ crosses a zero of image; otherwise, σ(λ) remains constant. Therefore, σ(b) − σ(a) is the total number of zeros of image in the interval (a, b).

In practice, Sturm’s theorem can be used with Gershgorin’s theorem (Section 6.8). For tridiagonal H = H*, we know from Gershgorin’s theorem that every eigenvalue of H satisfies one of the inequalities

 

image

where h01 = hn, n+1 = 0. The union of the possibly-overlapping intervals (15) is a set of disjoint intervals

 

image

These intervals can be subdivided, and Sturm’s theorem can be applied quickly to each of the resulting subintervals (a, b).

When Sturm’s theorem is used numerically to locate the eigenvalues of tridiagonal H = H*, the coefficients of the characteristic polynomials image should not be computed. Instead, the recursion formula (13) should be used directly to evaluate successively the numbers image and hence the integer σ(λ).
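A sketch of that computation for a Hermitian tridiagonal matrix follows; the recursion (13) is assumed here to take the usual tridiagonal form φ̂k(λ) = (hkk − λ)φ̂k−1(λ) − |αk−1|²φ̂k−2(λ), and λ is assumed not to be a zero of any φ̂k.

```python
import numpy as np

def sturm_count(diag, alpha, lam):
    """sigma(lam): number of sign changes in the Sturm sequence
    phi_0, phi_1, ..., phi_n for the Hermitian tridiagonal matrix with
    diagonal entries diag[0..n-1] and subdiagonal entries alpha[0..n-2]."""
    phi_prev, phi = 1.0, diag[0] - lam
    changes = 1 if phi < 0 else 0              # phi_0 = 1 is positive
    for k in range(1, len(diag)):
        phi_prev, phi = phi, (diag[k] - lam) * phi - abs(alpha[k - 1]) ** 2 * phi_prev
        if phi * phi_prev < 0:
            changes += 1
    return changes

# By Theorem 1, the number of eigenvalues in (a, b) is
#     sturm_count(diag, alpha, b) - sturm_count(diag, alpha, a).
```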

In the proof of Sturm’s theorem, we assumed that

 

image

We can show that it was only necessary to assume

 

image

If (18) is true but (17) is false, then for all sufficiently small image > 0 we have

 

image

By what we have already proved, the number of zeros of image in (a + image, bimage) is σ(bimage) − σ(a + image). But

 

image

because image has no zeros for image. Thus, the number of zeros of image in (a, b) equals the number of zeros of image in (a + image, bimage), which equals σ(b) − σ(a).

For a precise computation of the eigenvalues, Newton’s method should be used after the eigenvalues have been separated and approximated by the theorems of Gershgorin and Sturm. Newton’s method for computing the real roots of an equation image is discussed in texts on numerical analysis. This method converges extremely rapidly if the first approximation is fairly accurate.

PROBLEMS

    1.  Using the recurrence formula (8), evaluate det (H − λI) for the n × n Hessenberg matrix

image

    2.  Evaluate successively

 

image

   3.* Let

 

image

Define Δ0 = 1, Δ1 = c1,

 

image

Prove that

 

image

[Show that (7) implies image; as in Problem 2, evaluate Δn for all n.]

    4.  Using formula (10), calculate an eigenvector for the matrix

 

image

belonging to the eigenvalue λ = 0.

    5.  Let H be a Hessenberg matrix, as in formula (2). Show that H is similar to a Hessenberg matrix H′ with image if image if αj = 0. (Form image, where A is a suitable diagonal matrix.)

    6.  Let

 

image

Let image. Using the recursion formula (13), calculate numerical values for the image for λ = 0 and for λ = 4. How many eigenvalues of A lie between 0 and 4?

   7.* Let Hn be the n × n matrix

 

image

(The matrix A in the last problem is H5.) Let image and, for image, let image. Making the change of variable λ = 2 − 2 cos θ, prove that

 

image

Hence find all the eigenvalues of Hn. [Use the recursion formula (13).]

7.12   THE METHOD OF HOUSEHOLDER AND BAUER

Let A be a real, symmetric n × n matrix. We will show how A can be reduced to tridiagonal form by n − 2 real, unitary similarity-transformations. For n = 5 the process is illustrated as follows:

 

image

From the tridiagonal form, the characteristic polynomial can be found by the recursion formula (9) of the preceding section, or the eigenvalues can be located in intervals by Sturm’s theorem (Theorem 1, Section 7.11).

If A is real but not symmetric, the method will produce zeros in the upper right corner but not in the lower left corner. For n = 5, the process would appear as follows:

 

image

The final matrix is in the Hessenberg form, and the characteristic polynomial can be found by the recursion formula (8) of the last section. If A is complex, the method is modified in the obvious way, by replacing transposes xT by their complex conjugates x*.

In the real, symmetric case, the first transformation A → B = UTAU is required to produce a matrix of the form

 

image

If all a1k already image, we set B = A. Otherwise, we will use a matrix of the form

 

image

Explicitly, if the real column-vector u has components ui, then U = (δij − 2uiuj). Since we require u to have unit length, U is unitary.

 

image

U is also real and symmetric: UT = U. The requirements

 

image

place n − 1 constraints on the n-component vector u. Since there is one degree of freedom, we set u1 = 0. Note that b1k = 0 implies bk1 = 0, since B is symmetric.

The first row of B = UAU is the first row of UA times U. Since u1 = 0, the first row of U = I − 2uuT is just (1,0, . . ., 0). Therefore, the first row of UA is just the first row of A. Hence, the first row of B is

 

image

Since U = I − 2uuT, we have aTU = aT − 2(aTu)uT, or

 

image

The unknowns u2, . . . , un must solve the n − 1 equations

 

image

To solve these equations, we observe that the rows aT and bT = aTU must have equal lengths: bTb = aTUUTa = aTa. Thus, we require

 

image

By (8), b11 = a11. Then (11) takes the form

 

image

If image, equations (8) and (12) yield

 

image

We now multiply (13) by u2, (9) by uj (j = 3, . . . , n), and add to obtain

 

image

or

 

image

or, since we require uTu = 1,

 

image

This is a relationship between the unknown inner product aTu and the unknown component u2. Using (14) in the identity (13) for b12, we find

 

image

We have α > 0 because at least one image. Therefore, (15) has the positive solution

 

image

We choose the sign ± so that image. Since α > | a12 |, we have image image. Having computed u2, we compute u3, . . . , un from (9) and (14).

 

image

Exercise If ± is chosen so that image, and if we define

 

image

verify directly that the numbers u2, . . . , un defined by (16) and (17), solve the required equations (9) and (10).

We now transform A into B = UAU, with b1k = bk1 = 0 for image. If n > 3, we wish next to transform B into a matrix C = VBV with

 

image

We shall require VTV = I and V = VT.

Observe the matrix B in (3). We can partition B as follows:

 

image

where A′ is the (n − 1) × (n − 1) matrix image. By analogy to (16) and (17), we can reduce A′ to the form

 

image

by a transformation B′ = U′A′U′, where

 

image

If A′ does not already have the form (21), we simply define

 

image

and define, in analogy to (16) and (17),

 

image

The sign ± is chosen to make image.

We now assert that the required matrix V has the form

 

image

Proof: Since image, the matrix V has the form

 

image

The first row of VBV is the first row of V times BV. Since the first row of V is (1, 0, . . . , 0), the first row of VBV is just the first row of B times V, or

 

image

Here we have used the form (25) for V. Thus, the first row and, by symmetry, the first column are unaffected. The partitioned form (24) now shows that

 

image

But U′A′U′ = B′ of the form (21). Therefore, VBV = C of the required form

 

image

where A″ is an (n − 2) × (n − 2) symmetric matrix.

We note parenthetically that V in (24) has the form

 

image

where υ is the unit vector col image.

The process now continues. Let U″ be the (n − 2) × (n − 2) matrix of the form I″ − 2u″(u″)T which reduces A″ to the form

 

image

We then define

 

image

Since image, this matrix has the form

 

image

Therefore, as (28) shows, the first two rows and the first two columns are unaffected by the transformation C → WCW. Therefore,

 

image

After k reductions we obtain a symmetric n × n matrix, say Z = (zij), with

 

image

We call A(k) the lower-right part.

 

image

If k < n − 2, we will reduce Z further. If at least one of the numbers image is nonzero, we form image from (16) and (17), replacing A by A(k), and n by n − k. Setting image we have the form

 

image

where

 

image

If we define the n × n matrix

 

image

we must show that the first k rows and the first k columns of Z are unaffected by the transformation Z → MZM = Z′. Now

 

image

Let image Then by (38), image and

 

image

By (34), image. Therefore, (39) yields

 

image

But (38) shows that

 

image

Then (40) implies image if image. By symmetry, we also have image if image.

Finally, we must show that Z′ = MZM implies

 

image

Now the form (38) shows that

 

image

Therefore,

 

image

But (44) is equivalent to (42), since

 

image

COMPUTING ALGORITHM

The preceding three paragraphs give a rigorous justification of the method. In practice, the matrix M in (38) is not formed explicitly, nor is the matrix U(k).

At each stage k = 0, 1, . . . , n − 3 our task is to replace the (n − k) × (n − k) matrix A(k) by the matrix

 

image

The number θ = image is unchanged. If, by chance,

 

image

then no computation is performed at stage k, since A(k) already has the form of the right-hand side of (46). Otherwise, we do compute the numbers image by (16) and (17), replacing A by A(k), and n by n − k. Dropping the superscript k, we observe that

 

image

Define the quadratic form

 

image

Compute Au = b, i.e., compute

 

image

By (48), the i, j-component of U(k) A(k) U(k) is

 

image

Replace image by

 

image

Replace image by 0, . . . , 0. For image replace image by the following equivalent of (51):

 

image

where

 

image

The components with i > j are now formed by symmetry.
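The following sketch carries out the whole reduction for a real symmetric matrix. Since the displays for (16), (17) and (50)-(53) are not reproduced here, the standard Householder vector and block update are used in their place; the stages k = 0, 1, . . . , n − 3 and the treatment of an already-reduced column are as described above.

```python
import numpy as np

def householder_tridiagonalize(A):
    """Reduce a real symmetric matrix to tridiagonal form by successive
    Householder similarity transformations (a sketch of the method of
    Householder and Bauer described in this section)."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n - 2):                     # stages k = 0, 1, ..., n-3
        c = A[k + 1:, k]                       # column below the diagonal
        if np.all(c[1:] == 0.0):
            continue                           # already of the required form
        alpha = np.linalg.norm(c)
        if c[0] > 0:
            alpha = -alpha                     # sign chosen to avoid cancellation
        u = c.copy()
        u[0] -= alpha                          # u proportional to c - alpha*e1
        u /= np.linalg.norm(u)                 # unit vector, so I - 2uu^T is unitary
        B = A[k + 1:, k + 1:]                  # trailing block, updated in place
        w = B @ u
        B -= 2.0 * np.outer(u, w) + 2.0 * np.outer(w, u) - 4.0 * (u @ w) * np.outer(u, u)
        A[k + 1:, k] = 0.0                     # column k becomes (alpha, 0, ..., 0)
        A[k, k + 1:] = 0.0
        A[k + 1, k] = A[k, k + 1] = alpha
    return A
```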

COUNT OF OPERATIONS

We will count the number of multiplications and the number of square roots. At stage k = 0,1, . . . , n − 3, the calculations

 

image

 

image

require 2 square roots and approximately 2(n − k) multiplications. To form b in (50) requires (n − k)(n − k − 1) multiplications. The calculation of c requires n − k − 1 multiplications. The calculations (53) require

 

image

which is equal to (n − k)(n − k − 1) multiplications. In summary, at stage k there are

 

image

Summing for k = 0, . . . , n − 3, we find a total of

 

image

Thus, there are very few multiplications. After all, just to square an n × n matrix requires n3 multiplications. Since the number of multiplications is small, it will usually be economically feasible to use double precision.

PROBLEMS

    1.  Reduce the matrix

 

image

to tridiagonal form by the method of Householder and Bauer.

    2.  Reduce the matrix

 

image

to Hessenberg form by the method of Householder and Bauer.

   3.* Reduce the matrix

 

image

to tridiagonal form by the method of Householder and Bauer.

   4.* According to formula (55), the H-B reduction of an n × n symmetric matrix requires “about” 2n3/3 multiplications. What is the exact number of multiplications?

    5.  If A is real but not symmetric, and if A is to be reduced to Hessenberg form by the H-B method, how should formulas (50)-(53) be modified?

    6.  If a real matrix A is reduced to a tridiagonal or Hessenberg matrix X by the H-B method, show that image. (This identity can be used as a check of rounding error.)

7.13   NUMERICAL IDENTIFICATION OF STABLE MATRICES

In the theory of automatic control, in circuit theory, and in many other contexts the following problem arises: Given an n × n matrix A, determine whether all of its eigenvalues lie in the left half-plane Re λ < 0. The matrix A is called stable if all of its eigenvalues lie in the left half-plane. The importance of this concept is shown by the following theorem:

Theorem 1. An n × n matrix A is stable if and only if all solutions x(t) of the differential equation

 

image

tend to zero as t → ∞.

Proof. Suppose that A has an eigenvalue λ with Re image. If c is an associated eigenvector, then the solution

 

image

does not tend to zero as t → ∞ because image.

Suppose, instead, that A is stable. Let λ1, . . . , λs be the different eigenvalues of A, with multiplicities m1, . . . , ms. In formula (17) of Section 5.1, we showed that every component xv(t) of every solution x(t) of the differential equation dx/dt = Ax is a linear combination of polynomials in t times exponential functions exp (λit). Since all λi have negative real parts, every such expression tends to zero as t → ∞. This completes the proof.

One way to show that a matrix is stable is to compute all its eigenvalues and to observe that they have negative real parts. But the actual computation of the eigenvalues is not necessary. By results of Routh and Hurwitz, it is only necessary to compute the coefficients ci of the characteristic polynomial

 

image

Assume that A is real, and hence that the ci are real. Form the fraction

 

image

If c1 ≠ 0, this fraction can be rewritten in the form

 

image

where α1 = 1/c1 and

 

image

with

 

image

Similarly, we write

 

image

and so on until we obtain the full continued fraction

 

image

Theorem 2. The zeros of the polynomial λn + c1λn−1 + ... + cn all lie in the left half-plane if and only if the coefficients αi in the continued fraction (7) are all positive.

EXAMPLE 1. Consider the polynomial

 

image

We have

 

image

with

 

image

The full continued fraction is

 

image

Here

 

image

Since all αi are positive, Routh’s theorem states that the zeros λv of the polynomial (8) are in the left half-plane. In fact, the zeros are

 

image

We will not prove Routh’s theorem, which is not a theorem about matrices, but a theorem about polynomials. A proof and a thorough discussion appear in E. A. Guillemin’s book, The Mathematics of Circuit Analysis (John Wiley and Sons, Inc., 1949) pp. 395–407. If the n × n matrix A is real, and if n is small, Routh’s theorem provides a practical criterion for the stability of the matrix A. But if n is large, Routh’s theorem is less useful because it is hard to compute accurately the coefficients of the characteristic polynomial. If the coefficients ci are very inaccurate, there is no use in applying Routh’s theorem to the wrong polynomial.
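For small n the continued-fraction coefficients are easy to generate mechanically. The sketch below assumes that the fraction (4), whose display is not reproduced here, is the ratio of the even part of the polynomial to its odd part; each αi is then the ratio of leading coefficients, and the remainder drops the degree by two.

```python
import numpy as np

def routh_alphas(c):
    """Continued-fraction coefficients alpha_1, ..., alpha_n for
    lambda^n + c[0]*lambda^(n-1) + ... + c[n-1]; by Theorem 2 the zeros
    lie in the left half-plane if and only if every alpha is positive."""
    coeffs = np.concatenate(([1.0], np.asarray(c, dtype=float)))
    f_prev = coeffs[0::2].copy()     # coefficients of lambda^n, lambda^(n-2), ...
    f_curr = coeffs[1::2].copy()     # coefficients of lambda^(n-1), lambda^(n-3), ...
    alphas = []
    while f_curr.size > 0 and f_curr[0] != 0.0:
        alpha = f_prev[0] / f_curr[0]
        alphas.append(alpha)
        tail = np.zeros(f_prev.size - 1)       # align lambda*f_curr with f_prev
        tail[:f_curr.size - 1] = f_curr[1:]
        f_prev, f_curr = f_curr, f_prev[1:] - alpha * tail
    return alphas        # fewer than n values means some coefficient vanished

# lambda^3 + 2 lambda^2 + 2 lambda + 1 = (lambda + 1)(lambda^2 + lambda + 1):
# routh_alphas([2.0, 2.0, 1.0]) gives [0.5, 1.333..., 1.5], all positive.
```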

A second criterion, which applies directly to the matrix A, and which does not require that A be real, is due to Lyapunov.

Theorem 3. The n × n matrix A is stable if and only if there is a positive definite Hermitian matrix P solving the equation

 

image

If n is large, this theorem is hard to use numerically. The system (9) presents a large number, n(n + 1)/2, of linear equations for the unknown components image. One must then verify that P is positive definite by the determinant criterion (see Section 6.5)

 

image
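For moderate n the Lyapunov test of Theorem 3 can nevertheless be carried out directly. The sketch below writes the n² linear equations (9) with Kronecker products and then checks positive definiteness through the eigenvalues of P rather than through the determinant criterion (10).

```python
import numpy as np

def lyapunov_stability_test(A):
    """Solve A*P + PA = -I and report whether P is positive definite
    (a sketch of Theorem 3; suitable only for modest n)."""
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    I = np.eye(n)
    # column-stacking: vec(A* P) = (I kron A*) vec P, vec(P A) = (A^T kron I) vec P
    M = np.kron(I, A.conj().T) + np.kron(A.T, I)
    p = np.linalg.solve(M, -I.flatten(order="F"))
    P = p.reshape((n, n), order="F")
    P = (P + P.conj().T) / 2.0                 # symmetrize against rounding
    return bool(np.all(np.linalg.eigvalsh(P) > 0)), P
```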

Because Lyapunov’s theorem has great theoretical interest, we shall give the proof.

Proof. First assuming PA + A*P = –I, we shall prove that A is stable. By Theorem 1, it will be sufficient to show that every solution of the differential equation dx/dt = Ax tends to zero as t → ∞. We have, for the solution x (t),

 

image

Since P is positive definite, there is a number ρ > 0 for which

 

image

Then (11) yields

 

image

Therefore,

 

image

If x(0) = b, we conclude that image, and hence

 

image

But image for some σ >0. Therefore,

 

image

which shows that x → 0 as t → ∞.

Conversely, supposing that A is stable, we shall construct the positive-definite matrix P solving PA + A*P = –I. Let the n × n matrix X (t) solve

 

image

As in the proof of Theorem 1, we know that every component of the matrix X (t) is a linear combination of polynomials in t (possibly constants) times exponential functions exp image, where Re image. We bear in mind that the adjoint matrix A* has the eigenvalues image if A has the eigenvalues λv. Taking adjoints in (17), we find

 

image

Now the product X (t)X*(t) satisfies

 

image

Define the integral

 

image

This integral converges because every component of the integrand has the form

 

image

which is a linear combination of polynomials in t times decaying exponential functions. By integrating equation (19) from t = 0 to t = ∞, we find

 

image

The left-hand side, – I, appears because XX* equals I when t = 0 and equals zero when t = ∞. The matrix P is positive definite because the quadratic form belonging to a vector z ≠ 0 equals

 

image

This completes the proof.

Lyapunov’s theorem shows that, if a matrix A is stable, then there is a family of ellipsoidal surfaces (Pz, z) = const relative to which every solution of dx/dt = Ax is constantly moving inward. In fact, (11) shows that

 

image

A PRACTICAL NUMERICAL CRITERION

Let us suppose that A is an n × n matrix, not necessarily real, with n large. We will give a criterion for the stability of A which

      (a)  does not require explicit computation of the eigenvalues;

      (b)  does not require explicit computation of the coefficients of the characteristic polynomial; and

      (c)  does not require the solution of systems of N linear equations with image

If A has eigenvalues λ1, . . . , λn, and if τ is any positive number, the numbers

 

image

all lie in the unit circle | μ | < 1 if and only if all the numbers λj lie in the left half-plane. This is evident from Figure 7.7.

image

Figure 7.7

The absolute value | μj | equals the distance of λj from – τ divided by the distance of λj from τ. In the figure Re λj > 0; therefore | μj | > 1.

The numbers μj are the eigenvalues of the n × n matrix

 

image

unless τ = some λj. Proof:

 

image

Assume that τ > 0 has been chosen. The problem now is to determine whether all the eigenvalues μj of the matrix B lie inside the unit circle | μj | < 1. By Theorem 3 in Section 4.9, all | μj | < 1 if and only if

 

image

In principle, we could form the powers Bk, but we prefer not to do so because each multiplication of two n × n matrices requires n3 scalar multiplications.

We prefer to form the iterates Bkx(0) = x(k), where x(0) is some initial vector. As the vectors x(0), x(1), x(2), . . . are computed, we look at the successive lengths squared:

 

image

If these numbers appear to be tending to zero, we infer that A is stable; if these numbers appear to be tending to infinity, we infer that A is unstable; if the numbers || x(k) ||2 appear to be tending neither to 0 nor to ∞, we infer that A is stable or unstable by a very small margin. We will justify this procedure in Theorem 4.
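A sketch of this procedure follows. The matrix B is one form of the transformation (26), and the value of τ used below is a stand-in for the heuristic (30), whose display is not reproduced here; both are assumptions of the sketch rather than the text's exact formulas.

```python
import numpy as np

def stability_by_iteration(A, steps=200, seed=None):
    """Iterate x <- Bx with B = (A - tau*I)^{-1}(A + tau*I) and return the
    successive squared lengths ||x(k)||^2; tending to zero suggests that A
    is stable, tending to infinity that it is unstable."""
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    tau = np.abs(A).sum() / n                  # rough size of an eigenvalue modulus
    I = np.eye(n)
    B = np.linalg.solve(A - tau * I, A + tau * I)
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # random start
    lengths = []
    for _ in range(steps):
        x = B @ x
        lengths.append(np.vdot(x, x).real)     # ||x(k)||^2, cf. (28)
    return lengths
```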

If A is a stable normal matrix—e.g., a stable Hermitian matrix—the numbers (28) will form a monotone decreasing sequence. We see this from the expansion

 

image

of x(0) in the unit orthogonal eigenvectors uj of A. The vectors uj are also eigenvectors of B. Then

 

image

where all | μj | < 1. If A is not a normal matrix, the sequence (28) will not usually be monotone.

How large should the number τ > 0 be chosen? If τ is too small or too large, all of the eigenvalues

 

image

of B are near ±1. If any λj has Re λj > 0, we would like the modulus | μj | to be as large as possible. We assert that, if Re λ > 0, then

 

image

Proof. If λ = ρ exp (iθ), with | θ | < π/2, we must show that

 

image

This is true if

 

image

or if

 

image

which is true, with equality only if τ = ρ. If A is given, a reasonable first guess for the modulus, ρ, of an eigenvalue is

 

image

We find this value by summing

 

image

for i = 1, . . . , n, and by falsely replacing all aij by | aij | and all xi by 1. The number τ defined by (30) is of the right general size.

Lemma. Let B be an n × n matrix. Let p be a principal vector of grade image belonging to the eigenvalue μ. Then for large k we have the asymptotic forms

 

image

where v is an eigenvector belonging to μ and where the remainder r(k) equals 0 if g = 1 or equals

 

image

Proof. By the notation (32) we mean that, for some constant γ > 0,

 

image

If g = 1, then p is an eigenvector, p = υ, and r(k) = 0 in (31). If image, we have for image

 

image

In the expansion by the binomial theorem, we have

 

image

for the principal vector p. The vector

 

image

is an eigenvector because p has grade g. Since g is fixed as k → ∞, we have

 

image

For the other terms in (33) we have image in the formula

 

image

The last three formulas yield the required asymptotic form (31).

In the following theorem, we shall use the phrase “with probability one,” and we wish now to define the sense in which this phrase will be used. Consider an arbitrary nonzero n-component vector x. Let B be an n × n matrix with the different eigenvalues μ1 . . . , μs with multiplicities m1 . . . , ms, where Σ mi = n. As we showed in Section 5.2, the vector x has a unique expansion

 

image

in principal vectors p(j) belonging to the eigenvalues μj. In the last paragraph of Section 5.3 we concluded that the space of principal vectors p(j) has dimension mj. Let Zj be the linear space of vectors x for which p(j) = 0 in the representation (36). Then the dimension of Zj is n − mj < n. Thus, Zj has n-dimensional measure zero. The set Z of vectors x for which at least one p(j) = 0 in the expansion (36), is the union

 

image

of the sets Zj, all of which have measure zero. Therefore, Z has measure zero. In this sense we say that, with probability one, all of the principal vectors p(j) are nonzero in the expansion (36) of a random vector x.

Theorem 4. Let B be an n × n matrix. Let x = x(0) be a random vector.

Let

 

x(1) = Bx(0), x(2) = Bx(1), x(3) = Bx(2), . . .

Let μ1, μ2, . . . be the eigenvalues of B

          Case 1.  If all | μj | < 1, then x(k) → 0 as k → ∞.

          Case 2.  If all image, and if at least one | μj | = 1, then with probability one the sequence x(k) does not tend to zero.

          Case 3.  If at least one | μj | > 1, then with probability one || x(k) || → ∞ as k → ∞.

Proof. In Case 1 we know that Bk → 0 as k → ∞, and therefore

 

image

Let μ1, . . . , μs be the different eigenvalues of B, with multiplicities m1 . . . , ms. Let

 

image

where Σ mi = n and image. Form the expansion (36) of x as a sum Σ p(j) of principal vectors of B. With probability one all p(j) ≠ 0. In other words, with probability one every p(j) in the expansion x = Σ p(j) has grade image. Then, by the preceding lemma,

 

image

where υ(1), . . . , υ(s) are eigenvectors corresponding to μ1 . . . , μs. Let

 

image

Let J be the set of integers j such that image and such that gj = g. Then (38) yields

 

image

where ρ = | μ1 | = . . . = | μr |. By the notation o(kg−1ρk) we mean any function of k such that

 

image

For all jJ we have | μj | = ρ. Let

 

image

where the angles θj are distinct numbers in the interval image. In Cases 2 and 3 we assume image. From (40) we have

 

image

Eigenvectors υ(j) belonging to different eigenvalues μj are linearly independent. Therefore, the minimum length

 

image

is a positive number δ > 0. Letting αj = exp (ikθj) in (43), we find

 

image

In Case 2 we have ρ = 1, and || x(k) || → ∞ if g > 1. If g = 1 and ρ = 1, then (43) shows that x(k) remains bounded but (45) shows that x(k) does not tend to zero. In Case 3 we have ρ > 1, and (45) shows that || x(k) || → ∞.

PROBLEMS

    1.  Let A be an n × n matrix. Let image and ψ(λ) be polynomials with real or complex coefficients. Suppose that ψ(λj) ≠ 0 for all eigenvalues λj of A. Prove that the n × n matrix ψ(A) has an inverse. Then prove that the eigenvalues of the matrix image are imageimage. (First prove the result for triangular matrices T. Then use the canonical triangularization A = UTU*.)

    2.  Find the Lyapunov ellipses (Px, x) = const for the stable matrix

 

image

    3.  Let A be any stable n × n matrix and let Q be any n × n positive-definite Hermitian matrix. Show that there is a positive-definite matrix P solving the equation − Q = A*P + PA.

    4.  Let A be any stable n × n matrix. Let C be an arbitrary n × n matrix. Show that the equation −C = A*M + MA has a solution M. Since every equation −C = A*M + MA has a solution, M, deduce that the particular equation −I = A*P + PA has a unique solution, P.

    5.  Let λ3 + c1λ2 + c2λ + c3 have real coefficients. According to the Routh-Hurwitz criterion, what inequalities are necessary and sufficient for the zeros of the polynomial to have negative real parts?

    6.  For the stable matrix A in Problem 2, compute the number τ in formula (30). Compute the matrix B in formula (26). What are the eigenvalues of B?

    7.  Let

 

image

Compute the number τ and the matrix B. For which vectors x do the iterates Bkx tend, in length, to infinity as k → ∞? What is the probability, for a random vector x, that || Bkx || → ∞ as k → ∞?

7.14   ACCURATE UNITARY REDUCTION TO TRIANGULAR FORM

In the QR method of computing eigenvalues, we shall need an accurate way of reducing an n × n matrix A to a right-triangular matrix R by a transformation

 

image

Here we assume that A, U, and R are real or complex matrices, with

 

image

In principle, we could perform this transformation by the Gram-Schmidt process described in Section 4.5. If the columns a1, . . . , an of A were independent, we could form unit vectors υ1, . . ., υn such that ak would be a linear combination of υ1, . . . , υk.

 

image

In other words,

 

image

or

 

A = VR

Then we could take U = V* in (1). Unfortunately, the Gram-Schmidt process has been found to be highly inaccurate in digital computation. This inaccuracy is demonstrated in J. H. Wilkinson’s book.2

In this section, we shall show how to perform the reduction UA = R by a sequence of reflections

 

image

where each Uj has the form I − 2ww*, or where Uj = I. Here w is a unit vector depending on j. The matrices Uj are both unitary and Hermitian:

 

image

The matrix Uj will produce zeros below the diagonal in column j; and Uj will not disturb the zeros which shall already have been produced in columns 1, . . . , j – 1. Thus, if n = 4,

image

Let A1 = A, and for j > 1 let

 

Aj = Uj−1 . . . U1A

Assume that Aj has zeros below the diagonal in columns 1, . . . , j − 1. Let the jth column of Aj have the form

 

image

Let

 

image

If β = 0, then Aj already has zeros below the diagonal in column j, and we set Uj = I.

If β > 0, we look for Uj in the form

 

image

We shall have

 

image

For this vector to have zeros in components j + 1, . . . , n, we must have wj+1, . . . , wn proportional to cj+1, . . . , cn, since 2(w*c) is simply a scalar.

Thus, we look for w in the form

 

image

Then

 

image

and Ujc will have zeros in components j + 1, . . . , n if

 

image

For w to be a unit vector, we must have

 

image

Elimination of | λ |2 from (11) and (12) gives

 

image

If cj = | cj | exp (iθ), we will look for μ in the form μ = | μ | exp (iθ). Then (13) becomes

 

image

which has the positive solution

 

image

Then

 

image

Now (11) yields

 

image

and hence

 

image

Then Uj = I – 2ww*.

We must show, finally, that multiplication by Uj does not disturb the zeros below the diagonal in columns 1, . . . , j − 1. If z is any of the first j − 1 columns of Aj, then zj = zj+1 = . . . = zn = 0. Therefore,

 

image

Now (15) shows that the first j − 1 columns of Aj are not changed in the multiplication UjAj.

We also observe that, since 2ww* has zeros in rows 1, . . . , j − 1, the first j − 1 rows of Aj are, likewise, not changed in the multiplication UjAj.

To perform the multiplication UjAj, we do not first form the matrix Uj. If x is the kth column of image, then the kth column of the product UjAj is

 

image

In real Euclidean space, the matrix I − 2ww* represents the easily-visualized geometric transformation in Figure 7.8. Through the origin there is a plane, π, to which w is a unit normal. The vector (w*x)w is the projection of x in the direction of w. If we subtract twice this projection from x, we obtain the vector

image

Figure 7.8

 

image

which is the reflection of x through the plane π.
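The whole reduction is summarized in the sketch below. It follows the construction of this section, with the phase of μ matched to that of cj and the sign chosen to avoid cancellation; the intermediate normalizations are not reproduced literally, and the product of the reflections is accumulated so that UA = R can be checked.

```python
import numpy as np

def reflect_to_triangular(A):
    """Reduce A to right-triangular form R by reflections U_j = I - 2ww*
    applied on the left; returns U and R with U A = R (a sketch of the
    reduction described in this section)."""
    A = np.array(A, dtype=complex)
    n = A.shape[0]
    U = np.eye(n, dtype=complex)
    for j in range(n - 1):
        c = A[j:, j]                             # column j from the diagonal down
        if np.all(c[1:] == 0.0):
            continue                             # beta = 0: take U_j = I
        sigma = np.linalg.norm(c)
        phase = c[0] / abs(c[0]) if c[0] != 0 else 1.0
        w = c.copy()
        w[0] += phase * sigma                    # phase matched to c_j, avoiding cancellation
        w /= np.linalg.norm(w)
        A[j:, j:] -= 2.0 * np.outer(w, w.conj() @ A[j:, j:])   # earlier columns stay zero
        U[j:, :] -= 2.0 * np.outer(w, w.conj() @ U[j:, :])
    return U, A
```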

PROBLEMS

    1.  Let α, β, γ, δ be real. Let

 

image

Reduce A to right-triangular form by a transformation (I − 2ww*)A = R.

    2.  Using the method described in this section, reduce the matrix

 

image

to right-triangular form.

7.15   THE QR METHOD FOR COMPUTING EIGENVALUES

In this section we will present an introduction to the method of Francis and Kublanovskaya, known as the QR method. We wish to compute the eigenvalues of an n × n real or complex matrix A. By the method of the preceding section, we factor A in the form

 

image

where Q1 Q1* = I and where R1 is right-triangular. Setting A = A1, we now form

 

image

We now factor A2 in the form A2 = Q2 R2, and we form A3 = R2 Q2. In general,

 

image

Under certain conditions on A, as s → ∞ the matrix As becomes a right-triangular matrix with the eigenvalues of A on the diagonal. We will prove this convergence in the simplest case; we shall assume that A has eigenvalues with distinct moduli:

 

image

This proof and more general proofs have been given by Wilkinson.3
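In outline the iteration is very short. The sketch below uses numpy's built-in QR factorization in place of the reflection method of Section 7.14; under the hypotheses of Theorem 1 below, the diagonal of As tends to the eigenvalues in order of decreasing modulus.

```python
import numpy as np

def qr_iteration(A, steps=100):
    """Basic (unshifted) QR method: factor A_s = Q_s R_s, form
    A_{s+1} = R_s Q_s, and repeat (a sketch of (1)-(3))."""
    A_s = np.array(A, dtype=complex)
    for _ in range(steps):
        Q, R = np.linalg.qr(A_s)
        A_s = R @ Q
    return A_s    # approximately right-triangular; eigenvalues on the diagonal
```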

First, we discuss the transition from As to As+1. By the method of the preceding section, we can find unitary, Hermitian matrices U1, . . ., Un − 1, depending on s, such that

 

image

Thus, As = Qs Rs, where Qs = U1 . . . Un − 1. Then

 

image

The matrix Qs is never computed explicitly, nor are the matrices Uj. We know that each Uj has the form

 

image

as in formula (15) of the preceding section. We do compute the scalars and vectors ζj and q(j), and by means of them we compute successively

 

image

Then we compute successively

 

image

Every product X(I – ζqq*) occurring in (8) is computed as X – ζyq*, where y = Xq.

We shall now study the uniqueness and the continuity of QR factorizations.

Lemma 1. Let A be a nonsingular matrix with two QR factorizations

 

image

where the Q’s are unitary and the R’s are right-triangular. Then there is a diagonal matrix of the form

 

image

such that

 

image

Proof. First suppose that Q = R, where

 

image

Then I = Q*Q = R*R. The first row of R*R is

 

r11(r11,r12, . . . , r1n)

Therefore,

 

image

Since r12 = 0, the second row of R*R is

 

image

Therefore,

 

image

Continuing in this way, we verify that rij = δij(i, j = 1, . . . , n). Therefore, Q = R = I.

If A = Q1 R1 = Q2 R2, where A is nonsingular, then R1−1 exists and

 

image

If the diagonal elements of the right-triangular matrix R2 R1−1 have arguments θ1, . . . , θn, and if E is defined by (10), then the matrices

 

image

satisfy the conditions (12), and Q = R. Therefore, Q = R = I, and the identities Q2 = Q1E*, R2 = ER1 follow.

Lemma 2. Let Ak → I as k → ∞. Let Ak = QkRk, where Qk is unitary, and where Rk is right-triangular with positive-diagonal elements. Then Qk → I and Rk → I as k → ∞.

Proof. We have

 

image

Let rij(k) be the i, j component of Rk. The first row of image is

 

image

Hence,

 

image

The second row of image is

 

image

Using the first result (15), we conclude that

 

image

Continuing by successive rows in this way, we conclude that Rk → I as k → ∞. The explicit form of the inverse image (cofactor matrix)T now shows that image, and hence also image as k → ∞.

Lemma 3. If Q1, Q2, . . . are unitary, and if R1, R2, . . . are right-triangular, and if

 

image

then, for s = 2, 3, . . . ,

 

image

and

 

image

Proof. The identities (17) imply that

 

image

Therefore, if (18) is true for any s, we have

 

image

Since (18) is true for s = 1, it is true for all s.

Formula (19) also follows by induction from (17), since

 

image

Theorem 1. Let A1 be an n × n matrix with eigenvalues λj such that |λ1| > |λ2| > · · · > |λn| > 0. Let A1 have a canonical diagonalization

 

image

where D = diag (λ1, . . . , λn). Assume that all of the n principal minors of the matrix Y = X−1 are nonsingular:

 

image

Let matrices A2, A3, . . . be formed by the QR method (17). Then there is a sequence of diagonal matrices E1, E2, . . . of the form (10) such that

 

image

where the matrix Rx is right-triangular.

In this sense, As tends to a right-triangular matrix with the eigenvalues of A, in order, appearing on the main diagonal.

Proof. As we proved in Section 7.3, the condition (21) is necessary and sufficient for the matrix Y = X−1 to have a decomposition Y = LY RY, where LY is left-triangular, where RY is right-triangular, and where LY has 1’s on the main diagonal. Then

 

image

But

 

image

since the elements above the diagonal are zero, while the elements on the diagonal are 1’s, and the elements below the diagonal satisfy

 

image

Let X have the factorization X = Qx Rx where Qx is unitary and Rx is right-triangular. Then

 

image

where

 

image

Since Js → I as s → ∞, Js has a QR factorization

 

image

where image is unitary and image is right-triangular; that follows from Lemma 2. Then image has the QR decomposition

 

image

But Lemma 3 gives the QR decomposition

 

image

Now Lemma 1 states that there is a matrix Es of the form (10) such that

 

image

Now the identity (19) for As yields

 

image

But

 

image

Therefore,

 

image

and

 

image

This is the justification of the QR method in the simplest case. It is possible to remove the restriction that X−1 have nonsingular principal minors. It is even possible to remove the restriction that the eigenvalues of A have distinct moduli. If the moduli are not distinct, the matrices As tend to block-right-triangular matrices, where each m × m block centered on the diagonal is an m × m matrix whose m eigenvalues are eigenvalues of equal moduli belonging to A. The theory underlying this method is surprisingly difficult.

The idea for the QR method came, no doubt, from an older method due to Rutishauser:

 

image

where L1, L2, . . . are left-triangular and R1, R2, . . . are right-triangular.

PROBLEMS

In the following problems, assume that all matrices L are left-triangular matrices with 1’s on the main diagonal, and that all matrices R are right-triangular matrices.

    1.  If A = L1R1 = L2R2, and if A is nonsingular, prove that L1 = L2 and R1 = R2.

    2.  If Ak = LkRk → I as k → ∞, prove that Lk → I and Rk → I.

    3.  If A1 is a nonsingular matrix, and if

 

image

prove that

 

image

   4.* Let A1 satisfy the conditions of Theorem 1. Assume that matrices A2, A3, . . . can be formed as in the last problem. Show that, as s → ∞, As tends to a right-triangular matrix whose diagonal elements are the eigenvalues of A1.


2 J. H. Wilkinson, Rounding Errors in Algebraic Processes. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1963, pp. 86–91.

3 J. H. Wilkinson, “Convergence of the LR, QR, and related algorithms,” The Computer Journal, vol. 8 (1965), 77–84.