The ideas in this chapter can be discussed from two points of view.
Suppose an n-square matrix A is given. The matrix A is said to be diagonalizable if there exists a nonsingular matrix P such that D = P−1AP is diagonal. This chapter discusses the diagonalization of a matrix A. In particular, an algorithm is given to find the matrix P when it exists.
Suppose a linear operator T: V → V is given. The linear operator T is said to be diagonalizable if there exists a basis S of V such that the matrix representation of T relative to the basis S is a diagonal matrix D. This chapter discusses conditions under which the linear operator T is diagonalizable.
The above two concepts are essentially the same. Specifically, a square matrix A may be viewed as a linear operator F defined by F(X) = AX, where X is a column vector, and B = P−1AP represents F relative to a new coordinate system (basis) S whose elements are the columns of P. On the other hand, any linear operator T can be represented by a matrix A relative to one basis and, when a second basis is chosen, T is represented by the matrix B = P−1AP, where P is the change-of-basis matrix.
Most theorems will be stated in two ways: one in terms of matrices A and again in terms of linear mappings T.
The underlying number field K did not play any special role in our previous discussions on vector spaces and linear mappings. However, the diagonalization of a matrix A or a linear operator T will depend on the roots of a polynomial Δ(t) over K, and these roots do depend on K. For example, suppose Δ(t) = t2 + 1. Then Δ(t) has no roots if K = R, the real field; but Δ(t) has roots ±i if K = C, the complex field. Furthermore, finding the roots of a polynomial with degree greater than two is a subject unto itself (frequently discussed in numerical analysis courses). Accordingly, our examples will usually lead to those polynomials Δ(t) whose roots can be easily determined.
Consider a polynomial f(t) = aₙtⁿ + · · · + a1t + a0 over a field K. Recall (Section 2.8) that if A is any square matrix, then we define
f(A) = aₙAⁿ + · · · + a1A + a0I
where I is the identity matrix. In particular, we say that A is a root of f(t) if f(A) = 0, the zero matrix.
Computing f(A) and g(A) directly for the given matrix A and polynomials f and g, one finds g(A) = 0. Thus, A is a zero of g(t).
The following theorem (proved in Problem 9.7) applies.
THEOREM 9.1: Let f and g be polynomials. For any square matrix A and scalar k,
(i) (f + g)(A) = f (A) + g(A)
(ii) (fg)(A) = f (A)g(A)
(iii) (kf)(A) = kf (A)
(iv) f (A)g(A) = g(A) f (A).
Observe that (iv) tells us that any two polynomials in A commute.
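The identities in Theorem 9.1 are easy to check numerically. The following sketch assumes NumPy; the matrix A and the polynomials f and g are arbitrary illustrative choices, not taken from the text. It evaluates a polynomial at a matrix by Horner's method and confirms that f(A) and g(A) commute.

```python
import numpy as np

def poly_of_matrix(coeffs, A):
    """Evaluate a polynomial at a square matrix A by Horner's method.
    coeffs lists the coefficients from the highest power down."""
    n = A.shape[0]
    result = np.zeros((n, n))
    for c in coeffs:
        result = result @ A + c * np.eye(n)
    return result

A = np.array([[1.0, 2.0], [3.0, 4.0]])   # arbitrary illustrative matrix
f = [1.0, -5.0, 6.0]                     # f(t) = t^2 - 5t + 6
g = [2.0, 0.0, -1.0]                     # g(t) = 2t^2 - 1

fA, gA = poly_of_matrix(f, A), poly_of_matrix(g, A)
print(np.allclose(fA @ gA, gA @ fA))     # Theorem 9.1(iv): polynomials in A commute
```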
Now suppose that T: V → V is a linear operator on a vector space V. Powers of T are defined by the composition operation: T² = T ° T, T³ = T² ° T, and so on.
Also, for any polynomial f(t) = aₙtⁿ + · · · + a1t + a0, we define f(T) in the same way as we did for matrices:
f(T) = aₙTⁿ + · · · + a1T + a0I
where I is now the identity mapping. We also say that T is a zero or root of f (t) if f (T) = 0, the zero mapping. We note that the relations in Theorem 9.1 hold for linear operators as they do for matrices.
Remark: Suppose A is a matrix representation of a linear operator T. Then f(A) is the matrix representation of f(T), and, in particular, f(T) = 0 if and only if f(A) = 0.
Let A = [aij] be an n-square matrix. The matrix M = A − tIn, where In is the n-square identity matrix and t is an indeterminate, may be obtained by subtracting t down the diagonal of A. The negative of M is the matrix tIn − A, and its determinant
Δ(t) = det(tIn − A) = |tIn − A|
is a polynomial in t of degree n and is called the characteristic polynomial of A.
We state an important theorem in linear algebra (proved in Problem 9.8).
THEOREM 9.2: (Cayley–Hamilton) Every matrix A is a root of its characteristic polynomial.
Remark: Suppose A = [aij] is a triangular matrix. Then tI − A is a triangular matrix with diagonal entries t − aii; hence,
Δ(t) = (t − a11)(t − a22) · · · (t − ann)
Observe that the roots of Δ(t) are the diagonal elements of A.
EXAMPLE 9.2 Let A be the given matrix. Its characteristic polynomial is
As expected from the Cayley–Hamilton theorem, A is a root of Δ(t); that is,
Now suppose A and B are similar matrices, say B = P−1AP, where P is invertible. We show that A and B have the same characteristic polynomial. Using tI = P−1(tI)P, we have
|tI − B| = |P−1(tI)P − P−1AP| = |P−1(tI − A)P| = |P−1| |tI − A| |P|
Using the fact that determinants are scalars and commute and that det(P−1) det(P) = 1, we finally obtain |tI − B| = |tI − A|.
Thus, we have proved the following theorem.
THEOREM 9.3: Similar matrices have the same characteristic polynomial.
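Both results can be sanity-checked numerically. The sketch below assumes NumPy; the matrices A and P are illustrative choices. It verifies that Δ(A) is the zero matrix (Cayley–Hamilton) and that A and P−1AP share the same characteristic polynomial.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 2.0]])           # illustrative matrix
coeffs = np.poly(A)                              # coefficients of Delta(t), highest power first

# Cayley-Hamilton (Theorem 9.2): Delta(A) should be the zero matrix.
Delta_A = np.zeros_like(A)
for c in coeffs:
    Delta_A = Delta_A @ A + c * np.eye(2)
print(np.allclose(Delta_A, 0))                   # True (up to round-off)

# Theorem 9.3: similar matrices have the same characteristic polynomial.
P = np.array([[2.0, 1.0], [1.0, 1.0]])           # any invertible matrix
B = np.linalg.inv(P) @ A @ P
print(np.allclose(np.poly(A), np.poly(B)))       # True
```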
There are simple formulas for the characteristic polynomials of matrices of orders 2 and 3.
(a) Suppose A is a 2 × 2 matrix. Then
Δ(t) = t² − tr(A) t + |A|
Here tr(A) denotes the trace of A—that is, the sum of the diagonal elements of A.
(b) Suppose A = [aij] is a 3 × 3 matrix. Then
Δ(t) = t³ − tr(A) t² + (A11 + A22 + A33)t − |A|
(Here A11, A22, A33 denote, respectively, the cofactors of a11, a22, a33.)
EXAMPLE 9.3 Find the characteristic polynomial of each of the following matrices:
(a) We have tr(A) = 5 + 10 = 15 and |A| = 50 − 6 = 44; hence, Δ(t) = t² − 15t + 44.
(b) We have tr(B) = 7 + 2 = 9 and |B| = 14 + 6 = 20; hence, Δ(t) = t2 − 9t + 20.
(c) We have tr(C) = 5 − 4 = 1 and |C| = −20 + 8 = −12; hence, Δ(t) = t2 − t − 12.
EXAMPLE 9.4 Find the characteristic polynomial of the given 3 × 3 matrix A.
We have tr(A) = 1 + 3 + 9 = 13. The cofactors of the diagonal elements are as follows:
Thus, A11 + A22 + A33 = 31. Also, |A| = 27 + 2 + 0 − 6 − 6 − 0 = 17. Accordingly,
Δ(t) = t³ − 13t² + 31t − 17
Remark: The coefficients of the characteristic polynomial Δ(t) of the 3-square matrix A are, with alternating signs, as follows:
S1 = tr(A),  S2 = A11 + A22 + A33,  S3 = det(A)
We note that each Sk is the sum of all principal minors of A of order k.
The next theorem, whose proof lies beyond the scope of this text, tells us that this result is true in general.
THEOREM 9.4: Let A be an n-square matrix. Then its characteristic polynomial is
Δ(t) = tⁿ − S1 tⁿ⁻¹ + S2 tⁿ⁻² − · · · + (−1)ⁿ Sn
where Sk is the sum of the principal minors of order k.
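Theorem 9.4 can be checked directly by summing principal minors. The sketch below assumes NumPy; the 3 × 3 matrix is an illustrative choice. It builds the coefficients (−1)^k S_k and compares them with the characteristic polynomial returned by np.poly.

```python
import numpy as np
from itertools import combinations

def char_poly_via_minors(A):
    """Coefficients of Delta(t) = t^n - S1 t^(n-1) + S2 t^(n-2) - ...,
    where S_k is the sum of the principal minors of A of order k (Theorem 9.4)."""
    n = A.shape[0]
    coeffs = [1.0]
    for k in range(1, n + 1):
        S_k = sum(np.linalg.det(A[np.ix_(idx, idx)])
                  for idx in combinations(range(n), k))
        coeffs.append((-1) ** k * S_k)
    return np.array(coeffs)

A = np.array([[1.0, 1.0, 2.0],
              [0.0, 3.0, 2.0],
              [1.0, 3.0, 9.0]])                            # illustrative 3x3 matrix
print(np.allclose(char_poly_via_minors(A), np.poly(A)))    # True
```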
Now suppose T: V → V is a linear operator on a vector space V of finite dimension. We define the characteristic polynomial Δ(t) of T to be the characteristic polynomial of any matrix representation of T. Recall that if A and B are matrix representations of T, then B = P−1AP, where P is a change-of-basis matrix. Thus, A and B are similar, and by Theorem 9.3, A and B have the same characteristic polynomial. Accordingly, the characteristic polynomial of T is independent of the particular basis in which the matrix representation of T is computed.
Because f(T) = 0 if and only if f(A) = 0, where f(t) is any polynomial and A is any matrix representation of T, we have the following analogous theorem for linear operators.
THEOREM 9.2′: (Cayley–Hamilton) A linear operator T is a zero of its characteristic polynomial.
Let A be any n-square matrix. Then A can be represented by (or is similar to) a diagonal matrix D = diag(k1, k2, …, kn) if and only if there exists a basis S consisting of (column) vectors u1, u2, …, un such that Au1 = k1u1, Au2 = k2u2, …, Aun = knun. In such a case, A is said to be diagonalizable. Furthermore, D = P−1AP, where P is the nonsingular matrix whose columns are, respectively, the basis vectors u1, u2, …, un.
The above observation leads us to the following definition.
DEFINITION: Let A be any square matrix. A scalar λ is called an eigenvalue of A if there exists a nonzero (column) vector υ such that Aυ = λυ. Any vector satisfying this relation is called an eigenvector of A belonging to the eigenvalue λ.
We note that each scalar multiple kυ of an eigenvector υ belonging to λ is also such an eigenvector, because A(kυ) = k(Aυ) = k(λυ) = λ(kυ).
The set Eλ of all such eigenvectors is a subspace of V (Problem 9.19), called the eigenspace of λ. (If dim Eλ = 1, then Eλ is called an eigenline and λ is called a scaling factor.)
The terms characteristic value and characteristic vector (or proper value and proper vector) are sometimes used instead of eigenvalue and eigenvector.
The above observation and definitions give us the following theorem.
THEOREM 9.5: An n-square matrix A is similar to a diagonal matrix D if and only if A has n linearly independent eigenvectors. In this case, the diagonal elements of D are the corresponding eigenvalues and D = P−1AP, where P is the matrix whose columns are the eigenvectors.
Suppose a matrix A can be diagonalized as above, say P−1AP = D, where D is diagonal. Then A has the extremely useful diagonal factorization A = PDP−1.
Using this factorization, the algebra of A reduces to the algebra of the diagonal matrix D, which can be easily calculated. Specifically, suppose D = diag(k1, k2, …, kn). Then, for any positive integer m,
Aᵐ = (PDP−1)ᵐ = PDᵐP−1 = P diag(k1ᵐ, …, knᵐ) P−1
More generally, for any polynomial f(t),
f(A) = P f(D) P−1 = P diag(f(k1), …, f(kn)) P−1
Furthermore, if the diagonal entries of D are nonnegative, let
B = P diag(√k1, …, √kn) P−1
Then B is a nonnegative square root of A; that is, B² = A and the eigenvalues of B are nonnegative.
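These formulas translate directly into a few lines of code. The sketch below assumes NumPy; the matrix A, the polynomial f, and the exponent are illustrative choices. It obtains P and D from np.linalg.eig and uses the factorization A = PDP−1 to compute a power of A, f(A), and a square root B.

```python
import numpy as np

A = np.array([[3.0, 1.0], [2.0, 2.0]])     # illustrative diagonalizable matrix (eigenvalues 1 and 4)
evals, P = np.linalg.eig(A)                # columns of P are eigenvectors
D = np.diag(evals)
P_inv = np.linalg.inv(P)

print(np.allclose(A, P @ D @ P_inv))                        # A = P D P^-1
print(np.allclose(np.linalg.matrix_power(A, 5),
                  P @ np.diag(evals ** 5) @ P_inv))         # A^5 = P D^5 P^-1

f = np.poly1d([1.0, -5.0, 3.0, 6.0])        # f(t) = t^3 - 5t^2 + 3t + 6
fA = P @ np.diag(f(evals)) @ P_inv          # f(A) = P diag(f(k_i)) P^-1

B = P @ np.diag(np.sqrt(evals)) @ P_inv     # eigenvalues are nonnegative, so B^2 = A
print(np.allclose(B @ B, A))
```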
EXAMPLE 9.5 Let A, υ1, and υ2 be given, with Aυ1 = υ1 and Aυ2 = 4υ2.
Thus, υ1 and υ2 are eigenvectors of A belonging, respectively, to the eigenvalues λ1 = 1 and λ2 = 4. Observe that υ1 and υ2 are linearly independent and hence form a basis of R2. Accordingly, A is diagonalizable. Furthermore, let P be the matrix whose columns are the eigenvectors υ1 and υ2. That is, let
Then A is similar to the diagonal matrix D = P−1AP = diag(1, 4).
As expected, the diagonal elements 1 and 4 in D are the eigenvalues corresponding, respectively, to the eigenvectors υ1 and υ2, which are the columns of P. In particular, A has the factorization
Accordingly,
Moreover, suppose f(t) = t3 − 5t2 + 3t + 6; hence, f (1) = 5 and f (4) = 2. Then
Last, we obtain a “positive square root” of A. Specifically, using √1 = 1 and √4 = 2, we obtain the matrix B = P diag(1, 2) P−1, where B² = A and where B has positive eigenvalues 1 and 2.
Remark: Throughout this chapter, we use the following fact: If P = [a, b; c, d], then P−1 = (1/|P|)[d, −b; −c, a]. That is, P−1 is obtained by interchanging the diagonal elements a and d of P, taking the negatives of the nondiagonal elements b and c, and dividing each element by the determinant |P|.
Example 9.5 indicates the advantages of a diagonal representation (factorization) of a square matrix. In the following theorem (proved in Problem 9.20), we list properties that help us to find such a representation.
THEOREM 9.6: Let A be a square matrix. Then the following are equivalent.
(i) A scalar λ is an eigenvalue of A.
(ii) The matrix M = A − λI is singular.
(iii) The scalar λ is a root of the characteristic polynomial Δ(t) of A.
The eigenspace Eλ of an eigenvalue λ is the solution space of the homogeneous system MX = 0, where M = A − λI; that is, M is obtained by subtracting λ down the diagonal of A.
Some matrices have no eigenvalues and hence no eigenvectors. However, using Theorem 9.6 and the Fundamental Theorem of Algebra (every polynomial over the complex field C has a root), we obtain the following result.
THEOREM 9.7: Let A be a square matrix over the complex field C. Then A has at least one eigenvalue.
The following theorems will be used subsequently. (The theorem equivalent to Theorem 9.8 for linear operators is proved in Problem 9.21, and Theorem 9.9 is proved in Problem 9.22.)
THEOREM 9.8: Suppose υ1, υ2, …, υn are nonzero eigenvectors of a matrix A belonging to distinct eigenvalues λ1, λ2, …, λn. Then υ1, υ2, …, υn are linearly independent.
THEOREM 9.9: Suppose the characteristic polynomial Δ(t) of an n-square matrix A is a product of n distinct factors, say, Δ(t) = (t − a1)(t − a2) · · · (t − an). Then A is similar to the diagonal matrix D = diag(a1, a2, …, an).
If λ is an eigenvalue of a matrix A, then the algebraic multiplicity of λ is defined to be the multiplicity of λ as a root of the characteristic polynomial of A, and the geometric multiplicity of λ is defined to be the dimension of its eigenspace, dim Eλ. The following theorem (whose equivalent for linear operators is proved in Problem 9.23) holds.
THEOREM 9.10: The geometric multiplicity of an eigenvalue λ of a matrix A does not exceed its algebraic multiplicity.
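Both multiplicities are easy to compute numerically: the algebraic multiplicity by counting repeated roots of Δ(t), and the geometric multiplicity as the dimension of the null space of A − λI. The sketch below assumes NumPy; the matrix is an illustrative non-diagonalizable example, and the function name and tolerances are choices of this sketch.

```python
import numpy as np

def multiplicities(A, lam, tol=1e-9):
    """Algebraic multiplicity of lam (as a root of Delta(t)) and
    geometric multiplicity (dimension of the null space of A - lam*I)."""
    evals = np.linalg.eigvals(A)
    algebraic = int(np.sum(np.abs(evals - lam) < 1e-6))
    n = A.shape[0]
    geometric = n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)
    return algebraic, geometric

B = np.array([[5.0, -1.0], [1.0, 3.0]])    # illustrative matrix with Delta(t) = (t - 4)^2
print(multiplicities(B, 4.0))              # (2, 1): geometric <= algebraic (Theorem 9.10)
```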
Consider a linear operator T: V → V. Then T is said to be diagonalizable if it can be represented by a diagonal matrix D. Thus, T is diagonalizable if and only if there exists a basis S = {u1, u2, …, un} of V for which
T(u1) = k1u1, T(u2) = k2u2, …, T(un) = knun
In such a case, T is represented by the diagonal matrix D = diag(k1, k2, …, kn) relative to the basis S.
The above observation leads us to the following definitions and theorems, which are analogous to the definitions and theorems for matrices discussed above.
DEFINITION: Let T be a linear operator. A scalar λ is called an eigenvalue of T if there exists a nonzero vector υ such that T(υ) = λ υ. Every vector satisfying this relation is called an eigenvector of T belonging to the eigenvalue λ.
The set Eλ of all eigenvectors belonging to an eigenvalue λ is a subspace of V, called the eigenspace of λ. (Alternatively, λ is an eigenvalue of T if λI − T is singular, and, in this case, Eλ is the kernel of λI − T.) The algebraic and geometric multiplicities of an eigenvalue λ of a linear operator T are defined in the same way as those of an eigenvalue of a matrix A.
The following theorems apply to a linear operator T on a vector space V of finite dimension.
THEOREM 9.5′: T can be represented by a diagonal matrix D if and only if there exists a basis S of V consisting of eigenvectors of T. In this case, the diagonal elements of D are the corresponding eigenvalues.
THEOREM 9.6′: Let T be a linear operator. Then the following are equivalent:
(i) A scalar λ is an eigenvalue of T.
(ii) The linear operator λI − T is singular.
(iii) The scalar λ is a root of the characteristic polynomial Δ(t) of T.
THEOREM 9.7′: Suppose V is a complex vector space. Then T has at least one eigenvalue.
THEOREM 9.8′: Suppose υ1, υ2, …, υn are nonzero eigenvectors of a linear operator T belonging to distinct eigenvalues λ1, λ2, …, λn. Then υ1, υ2, …, υn are linearly independent.
THEOREM 9.9′: Suppose the characteristic polynomial Δ(t) of T is a product of n distinct factors, say, Δ(t) = (t − a1)(t − a2) · · · (t − an). Then T can be represented by the diagonal matrix D = diag(a1, a2, …, an).
THEOREM 9.10′: The geometric multiplicity of an eigenvalue λ of T does not exceed its algebraic multiplicity.
Remark: The following theorem reduces the investigation of the diagonalization of a linear operator T to the diagonalization of a matrix A.
THEOREM 9.11: Suppose A is a matrix representation of T. Then T is diagonalizable if and only if A is diagonalizable.
This section gives an algorithm for computing eigenvalues and eigenvectors for a given square matrix A and for determining whether or not a nonsingular matrix P exists such that P−1 AP is diagonal.
ALGORITHM 9.1: (Diagonalization Algorithm) The input is an n-square matrix A.
Step 1. Find the characteristic polynomial Δ(t) of A.
Step 2. Find the roots of Δ(t) to obtain the eigenvalues of A.
Step 3. Repeat (a) and (b) for each eigenvalue λ of A.
(a) Form the matrix M = A − λI by subtracting λ down the diagonal of A.
(b) Find a basis for the solution space of the homogeneous system MX = 0. (These basis vectors are linearly independent eigenvectors of A belonging to λ.)
Step 4. Consider the collection S = {υ1, υ2, …, υm} of all eigenvectors obtained in Step 3.
(a) If m ≠ n, then A is not diagonalizable.
(b) If m = n, then A is diagonalizable. Specifically, let P be the matrix whose columns are the eigenvectors υ1, υ2, …, υn. Then D = P−1AP = diag(λ1, λ2, …, λn), where λi is the eigenvalue corresponding to the eigenvector υi.
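A minimal sketch of Algorithm 9.1, assuming NumPy; the helper name and tolerances are choices of this sketch, not part of the algorithm as stated. The null space in Step 3(b) is obtained from the singular value decomposition.

```python
import numpy as np

def diagonalize(A, tol=1e-6):
    """Sketch of Algorithm 9.1.  Returns (P, D) with D = P^-1 A P diagonal,
    or None if A has fewer than n independent eigenvectors."""
    n = A.shape[0]
    coeffs = np.poly(A)                               # Step 1: characteristic polynomial
    eigenvalues = np.real_if_close(np.roots(coeffs))  # Step 2: its roots
    columns, diag_entries = [], []
    for lam in np.unique(np.round(eigenvalues, 8)):
        M = A - lam * np.eye(n)                       # Step 3(a): subtract lam down the diagonal
        _, s, Vh = np.linalg.svd(M)                   # Step 3(b): null-space basis from the SVD
        for v in Vh[s < tol]:
            columns.append(v)
            diag_entries.append(lam)
    if len(columns) != n:                             # Step 4(a): not diagonalizable
        return None
    P = np.array(columns).T                           # Step 4(b): eigenvectors as columns
    return P, np.diag(diag_entries)

A = np.array([[4.0, 2.0], [3.0, -1.0]])               # illustrative matrix with eigenvalues 5 and -2
P, D = diagonalize(A)
print(np.allclose(np.linalg.inv(P) @ A @ P, D))        # True
```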
EXAMPLE 9.6 The diagonalization algorithm is applied to the given matrix A.
(1) The characteristic polynomial Δ(t) of A is computed:
(2) Set Δ(t) = t² − 3t − 10 = (t − 5)(t + 2) = 0. The roots λ1 = 5 and λ2 = −2 are the eigenvalues of A.
(3) (i) We find an eigenvector v1 of A belonging to the eigenvalue λ1 = 5. Subtract λ1 = 5 down the diagonal of A to obtain the matrix
The system has only one independent solution, for example, v1 = (2, 1).
(ii) We find an eigenvector v2 of A belonging to the eigenvalue λ2 = −2. Subtract −2 (or add 2) down the diagonal of A to obtain the matrix
The system has only one independent solution, for example, v2 = (−1, 3).
(4) Let P be the matrix whose columns are the eigenvectors v1 and v2. Then D = P−1AP is the diagonal matrix whose diagonal entries are the corresponding eigenvalues; that is, D = diag(5, −2).
Remark: Find a 2 × 2 matrix A with eigenvalues λ1 = 2 and λ2 = 3 and corresponding eigenvectors v1 = (1, 3) and v2 = (1, 4). We know that P−1AP = D, where P is the matrix whose columns are v1 and v2 and D = diag(2, 3); hence, A = PDP−1.
EXAMPLE 9.7 Consider the given matrix B. We have
Accordingly, λ = 4 is the only eigenvalue of B.
Subtract λ = 4 down the diagonal of B to obtain the matrix
The system has only one independent solution; for example, x = 1, y = 1. Thus, υ = (1, 1) and its multiples are the only eigenvectors of B. Accordingly, B is not diagonalizable, because there does not exist a basis consisting of eigenvectors of B.
EXAMPLE 9.8 Consider the given matrix A. Here tr(A) = 3 − 3 = 0 and |A| = −9 + 10 = 1. Thus, Δ(t) = t² + 1 is the characteristic polynomial of A. We consider two cases:
(a) A is a matrix over the real field R. Then Δ(t) has no (real) roots. Thus, A has no eigenvalues and no eigenvectors, and so A is not diagonalizable.
(b) A is a matrix over the complex field C. Then Δ(t) = (t − i)(t + i) has two roots, i and −i. Thus, A has two distinct eigenvalues i and −i, and hence, A has two independent eigenvectors. Accordingly, there exists a nonsingular matrix P over the complex field C for which P−1AP = diag(i, −i). Therefore, A is diagonalizable (over C).
There are many real matrices A that are not diagonalizable. In fact, some real matrices may not have any (real) eigenvalues. However, if A is a real symmetric matrix, then these problems do not exist. Namely, we have the following theorems.
THEOREM 9.12: Let A be a real symmetric matrix. Then each root λ of its characteristic polynomial is real.
THEOREM 9.13: Let A be a real symmetric matrix. Suppose u and υ are eigenvectors of A belonging to distinct eigenvalues λ1 and λ2. Then u and υ are orthogonal; that is, 〈u, υ〉 = 0.
The above two theorems give us the following fundamental result.
THEOREM 9.14: Let A be a real symmetric matrix. Then there exists an orthogonal matrix P such that D = P−1AP is diagonal.
The orthogonal matrix P is obtained by normalizing a basis of orthogonal eigenvectors of A as illustrated below. In such a case, we say that A is “orthogonally diagonalizable.”
EXAMPLE 9.9 Let A be the given real symmetric matrix. Find an orthogonal matrix P such that P−1AP is diagonal.
First we find the characteristic polynomial Δ(t) of A. We have
Accordingly, λ1 = 6 and λ2 = 1 are the eigenvalues of A.
(a) Subtracting λ1 = 6 down the diagonal of A yields the matrix
A nonzero solution is u1 = (1, −2).
(b) Subtracting λ2 = 1 down the diagonal of A yields the matrix
(The second equation drops out, because it is a multiple of the first equation.) A nonzero solution is u2 = (2, 1).
As expected from Theorem 9.13, u1 and u2 are orthogonal. Normalizing u1 and u2 yields the orthonormal vectors
Finally, let P be the matrix whose columns are the unit vectors û1 and û2, respectively. Then
As expected, the diagonal entries of P−1AP are the eigenvalues corresponding to the columns of P.
The procedure in the above Example 9.9 is formalized in the following algorithm, which finds an orthogonal matrix P such that P−1AP is diagonal.
ALGORITHM 9.2: (Orthogonal Diagonalization Algorithm) The input is a real symmetric matrix A.
Step 1. Find the characteristic polynomial Δ(t) of A.
Step 2. Find the eigenvalues of A, which are the roots of Δ(t).
Step 3. For each eigenvalue λ of A in Step 2, find an orthogonal basis of its eigenspace.
Step 4. Normalize all eigenvectors in Step 3; together they form an orthonormal basis of Rn.
Step 5. Let P be the matrix whose columns are the normalized eigenvectors in Step 4.
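For a real symmetric matrix, np.linalg.eigh already returns an orthonormal set of eigenvectors, so Steps 3–5 collapse into one call. A minimal sketch, assuming NumPy; the matrix is an illustrative choice:

```python
import numpy as np

A = np.array([[2.0, -2.0], [-2.0, 5.0]])   # illustrative real symmetric matrix
evals, P = np.linalg.eigh(A)               # eigh returns an orthonormal set of eigenvectors
D = np.diag(evals)

print(np.allclose(P.T @ P, np.eye(2)))     # P is orthogonal, so P^-1 = P^T
print(np.allclose(P.T @ A @ P, D))         # D = P^-1 A P is diagonal
```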
Let q be a real polynomial in variables x1, x2, …, xn such that every term in q has degree two; that is,
q(x1, …, xn) = c1x1² + · · · + cnxn² + Σi<j dij xi xj
Then q is called a quadratic form. If there are no cross-product terms xixj (i.e., all dij = 0), then q is said to be diagonal.
The above quadratic form q determines a real symmetric matrix A = [aij], where aii = ci and aij = aji = ½dij. Namely, q can be written in the matrix form q(X) = XTAX,
where X = [x1, x2, …, xn]T is the column vector of the variables. Furthermore, suppose X = PY is a linear substitution of the variables. Then substitution in the quadratic form yields q(Y) = (PY)TA(PY) = YT(PTAP)Y. Thus, PTAP is the matrix representation of q in the new variables.
We seek an orthogonal matrix P such that the orthogonal substitution X = PY yields a diagonal quadratic form for which PTAP is diagonal. Because P is orthogonal, PT = P−1, and hence, PTAP = P−1AP. The above theory yields such an orthogonal matrix P.
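A short sketch of this substitution, assuming NumPy; the quadratic form and the test point Y are illustrative choices. It checks that q(PY) equals the diagonal form Σ λi yi².

```python
import numpy as np

# Illustrative quadratic form q(x, y) = 2x^2 - 4xy + 5y^2
A = np.array([[2.0, -2.0], [-2.0, 5.0]])   # a_ii = c_i and a_ij = a_ji = d_ij / 2

evals, P = np.linalg.eigh(A)               # orthogonal P with P^T A P diagonal

def q(x):
    return x @ A @ x                       # q(X) = X^T A X

Y = np.array([1.3, -0.7])                  # arbitrary point in the new coordinates
# Under the substitution X = PY the form becomes diagonal: sum of evals[i] * y_i^2.
print(np.isclose(q(P @ Y), np.sum(evals * Y ** 2)))   # True
```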
EXAMPLE 9.10 Consider the quadratic form
By Example 9.9,
Let Y = [s, t]T. Then the matrix P corresponds to the following linear orthogonal substitution X = PY of the variables x and y in terms of the variables s and t:
This substitution in q(x, y) yields the diagonal quadratic form q(s, t) = 6s2 + t2.
Let A be any square matrix. Let J(A) denote the collection of all polynomials f(t) for which A is a root—that is, for which f(A) = 0. The set J(A) is not empty, because the Cayley–Hamilton Theorem 9.2 tells us that the characteristic polynomial ΔA(t) of A belongs to J(A). Let m(t) denote the monic polynomial of lowest degree in J(A). (Such a polynomial m(t) exists and is unique.) We call m(t) the minimal polynomial of the matrix A.
Remark: A polynomial f(t) ≠ 0 is monic if its leading coefficient equals one.
The following theorem (proved in Problem 9.33) holds.
THEOREM 9.15: The minimal polynomial m(t) of a matrix (linear operator) A divides every polynomial that has A as a zero. In particular, m(t) divides the characteristic polynomial Δ(t) of A.
There is an even stronger relationship between m(t) and Δ(t).
THEOREM 9.16: The characteristic polynomial Δ(t) and the minimal polynomial m(t) of a matrix A have the same irreducible factors.
This theorem (proved in Problem 9.35) does not say that m(t) = Δ(t), only that any irreducible factor of one must divide the other. In particular, because a linear factor is irreducible, m(t) and Δ(t) have the same linear factors. Hence, they have the same roots. Thus, we have the following theorem.
THEOREM 9.17: A scalar λ is an eigenvalue of the matrix A if and only if λ is a root of the minimal polynomial of A.
EXAMPLE 9.11 Find the minimal polynomial m(t) of the given matrix A.
First find the characteristic polynomial Δ(t) of A. We have
Hence,
The minimal polynomial m(t) must divide Δ(t). Also, each irreducible factor of Δ(t) (i.e., t − 1 and t − 3) must also be a factor of m(t). Thus, m(t) is exactly one of the following:
We know, by the Cayley–Hamilton theorem, that g(A) = Δ(A) = 0. Hence, we need only test f(t). We have
Thus, f(t) = m(t) = (t − 1)(t − 3) = t2 − 4t + 3 is the minimal polynomial of A.
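The testing strategy of Example 9.11—try the candidate divisors of Δ(t) in order of increasing degree—can be automated when Δ(t) splits into known linear factors. The sketch below assumes NumPy; the function name, its dictionary argument, and the 2 × 2 test matrix are all hypothetical choices of this sketch.

```python
import numpy as np
from itertools import product

def minimal_polynomial(A, roots, tol=1e-8):
    """Find m(t) when Delta(t) splits into known linear factors.
    `roots` maps each distinct eigenvalue to its algebraic multiplicity,
    e.g. {1.0: 1, 3.0: 2}.  Candidates (t - r)^e with 1 <= e <= multiplicity
    are tested in order of increasing degree, as in Example 9.11."""
    n = A.shape[0]
    candidates = [list(zip(roots.keys(), exps))
                  for exps in product(*[range(1, m + 1) for m in roots.values()])]
    candidates.sort(key=lambda fs: sum(e for _, e in fs))   # lowest degree first
    for factors in candidates:
        M = np.eye(n)
        for r, e in factors:
            M = M @ np.linalg.matrix_power(A - r * np.eye(n), e)
        if np.allclose(M, 0, atol=tol):
            return factors                                  # m(t) = product of (t - r)^e
    return None

A = np.array([[3.0, 1.0], [0.0, 3.0]])       # hypothetical matrix with Delta(t) = (t - 3)^2
print(minimal_polynomial(A, {3.0: 2}))       # [(3.0, 2)], i.e. m(t) = (t - 3)^2
```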
EXAMPLE 9.12
(a) Consider the following two r-square matrices, where a ≠ 0:
The first matrix, called a Jordan block J(λ, r), has λ’s on the diagonal, 1’s on the superdiagonal (consisting of the entries above the diagonal entries), and 0’s elsewhere. The second matrix A has λ’s on the diagonal, a’s on the superdiagonal, and 0’s elsewhere. [Thus, A is a generalization of J(λ, r).] One can show that
f(t) = (t − λ)ʳ
is both the characteristic polynomial and the minimal polynomial of J(λ, r) and of A.
(b) Consider an arbitrary monic polynomial
f(t) = tⁿ + an−1tⁿ⁻¹ + · · · + a1t + a0
Let C(f) be the n-square matrix with 1’s on the subdiagonal (consisting of the entries below the diagonal entries), the negatives of the coefficients in the last column, and 0’s elsewhere as follows:
Then C(f) is called the companion matrix of the polynomial f(t). Moreover, the minimal polynomial m(t) and the characteristic polynomial Δ(t) of the companion matrix C(f) are both equal to the original polynomial f(t).
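A sketch of the companion-matrix construction, assuming NumPy; the function name is hypothetical. It builds C(f) for a monic polynomial given by its lower-order coefficients and checks that np.poly recovers f(t). (The same polynomial reappears in Problem 9.32 below.)

```python
import numpy as np

def companion(coeffs):
    """Companion matrix C(f) of the monic polynomial
    f(t) = t^n + a_{n-1} t^{n-1} + ... + a_1 t + a_0,
    where coeffs = [a_{n-1}, ..., a_1, a_0]: 1's on the subdiagonal,
    negatives of the coefficients in the last column, 0's elsewhere."""
    n = len(coeffs)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)               # 1's on the subdiagonal
    C[:, -1] = -np.array(coeffs[::-1])       # last column: -a_0, -a_1, ..., -a_{n-1}
    return C

# f(t) = t^3 - 8t^2 + 5t + 7
C = companion([-8.0, 5.0, 7.0])
print(np.allclose(np.poly(C), [1.0, -8.0, 5.0, 7.0]))   # characteristic polynomial of C(f) is f(t)
```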
The minimal polynomial m(t) of a linear operator T is defined to be the monic polynomial of lowest degree for which T is a root. However, for any polynomial f(t), we have f(T) = 0 if and only if f(A) = 0, where A is any matrix representation of T. Accordingly, T and A have the same minimal polynomial. Thus, the above theorems on the minimal polynomial of a matrix also hold for the minimal polynomial of a linear operator. That is, we have the following theorems.
THEOREM 9.15′: The minimal polynomial m(t) of a linear operator T divides every polynomial that has T as a root. In particular, m(t) divides the characteristic polynomial Δ(t) of T.
THEOREM 9.16′: The characteristic and minimal polynomials of a linear operator T have the same irreducible factors.
THEOREM 9.17′: A scalar λ is an eigenvalue of a linear operator T if and only if λ is a root of the minimal polynomial m(t) of T.
This section discusses the relationship of the characteristic polynomial and the minimal polynomial to certain (square) block matrices.
Suppose M is a block triangular matrix whose diagonal blocks A1 and A2 are square matrices. Then tI − M is also a block triangular matrix, with diagonal blocks tI − A1 and tI − A2. Thus,
|tI − M| = |tI − A1| |tI − A2|
That is, the characteristic polynomial of M is the product of the characteristic polynomials of the diagonal blocks A1 and A2.
By induction, we obtain the following useful result.
THEOREM 9.18: Suppose M is a block triangular matrix with diagonal blocks A1, A2, …, Ar. Then the characteristic polynomial of M is the product of the characteristic polynomials of the diagonal blocks Ai; that is,
ΔM(t) = Δ1(t) Δ2(t) · · · Δr(t)
EXAMPLE 9.13 Consider the given matrix M, a block triangular matrix with diagonal blocks A1 and A2. Here
Accordingly, the characteristic polynomial of M is the product
The following theorem (proved in Problem 9.36) holds.
THEOREM 9.19: Suppose M is a block diagonal matrix with diagonal blocks A1, A2, …, Ar. Then the minimal polynomial of M is equal to the least common multiple (LCM) of the minimal polynomials of the diagonal blocks Ai.
Remark: We emphasize that this theorem applies to block diagonal matrices, whereas the analogous Theorem 9.18 on characteristic polynomials applies to block triangular matrices.
EXAMPLE 9.14 Find the characteristic polynomal Δ(t) and the minimal polynomial m(t) of the block diagonal matrix:
Then Δ(t) is the product of the characteristic polynomials Δ1(t), Δ2(t), Δ3(t) of A1, A2, A3, respectively. One can show that
Thus, Δ(t) = (t − 2)3(t − 7)2. [As expected, deg Δ(t) = 5.]
The minimal polynomials m1(t), m2(t), m3(t) of the diagonal blocks A1, A2, A3, respectively, are equal to the characteristic polynomials; that is,
But m(t) is equal to the least common multiple of m1(t), m2(t), m3(t). Thus, m(t) = (t − 2)2(t − 7).
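Both block results can be illustrated numerically. In the sketch below (NumPy assumed; the blocks are illustrative Jordan-type blocks for the eigenvalue 2), the characteristic polynomial of M is the product of the blocks' characteristic polynomials, while the minimal polynomial is only (t − 2)², the LCM.

```python
import numpy as np

A1 = np.array([[2.0, 1.0], [0.0, 2.0]])          # block with minimal polynomial (t - 2)^2
A2 = np.array([[2.0]])                           # block with minimal polynomial (t - 2)
M = np.block([[A1, np.zeros((2, 1))],
              [np.zeros((1, 2)), A2]])

# Theorem 9.18: Delta_M(t) is the product of the blocks' characteristic polynomials.
print(np.allclose(np.poly(M), np.convolve(np.poly(A1), np.poly(A2))))    # True

# Theorem 9.19: m_M(t) = LCM of the blocks' minimal polynomials = (t - 2)^2,
# so (M - 2I)^2 = 0 although M - 2I != 0.
print(np.allclose(np.linalg.matrix_power(M - 2 * np.eye(3), 2), 0))      # True
print(np.allclose(M - 2 * np.eye(3), 0))                                 # False
```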
SOLVED PROBLEMS
9.2. Find the characteristic polynomial Δ(t) of each of the following matrices:
Use the formula Δ(t) = t² − tr(M) t + |M| for a 2 × 2 matrix M:
(a) tr(A) = 2 + 1 = 3, |A| = 2 − 20 = −18, so Δ(t) = t2 − 3t − 18
(b) tr(B) = 7 − 2 = 5, |B| = −14 + 15 = 1, so Δ(t) = t2 − 5t + 1
(c) tr(C) = 3 − 3 = 0, |C| = −9 + 18 = 9, so Δ(t) = t² + 9
9.3. Find the characteristic polynomial Δ(t) of each of the following matrices:
Use the formula Δ(t) = t³ − tr(A)t² + (A11 + A22 + A33)t − |A|, where Aii is the cofactor of aii in the 3 × 3 matrix A = [aij].
(a) tr(A) = 1 + 0 + 5 = 6,
Thus,
(b) tr(B) = 1 + 2 − 4 = −1
Thus,
9.4. Find the characteristic polynomial Δ(t) of each of the following matrices:
(a) A is block triangular with diagonal blocks
(b) Because B is triangular, Δ(t) = (t − 1)(t − 3)(t − 5)(t − 6).
9.5. Find the characteristic polynomial Δ(t) of each of the following linear operators:
(a) F: R2 → R2 defined by F(x, y) = (3x + 5y, 2x − 7y).
(b) D: V → V defined by D(f) = df/dt, where V is the space of functions with basis S = {sin t, cos t}.
The characteristic polynomial Δ(t) of a linear operator is equal to the characteristic polynomial of any matrix A that represents the linear operator.
(a) Find the matrix A that represents F relative to the usual basis of R2. We have
(b) Find the matrix A representing the differential operator D relative to the basis S. We have
9.6. Show that a matrix A and its transpose AT have the same characteristic polynomial.
By the transpose operation, (tI − A)T = tIT − AT = tI − AT. Because a matrix and its transpose have the same determinant, |tI − A| = |(tI − A)T| = |tI − AT|. Thus, A and AT have the same characteristic polynomial.
9.7. Prove Theorem 9.1: Let f and g be polynomials. For any square matrix A and scalar k,
(i) (f + g)(A) = f(A) + g(A),
(ii) (fg)(A) = f(A)g(A),
(iii) (kf)(A) = kf(A),
(iv) f(A)g(A) = g(A)f(A).
Suppose f = aₙtⁿ + · · · + a1t + a0 and g = bₘtᵐ + · · · + b1t + b0. Then, by definition,
f(A) = aₙAⁿ + · · · + a1A + a0I and g(A) = bₘAᵐ + · · · + b1A + b0I
(i) Suppose m ≤ n and let bi = 0 if i > m. Then f + g = (aₙ + bₙ)tⁿ + · · · + (a1 + b1)t + (a0 + b0). Hence,
(f + g)(A) = (aₙ + bₙ)Aⁿ + · · · + (a1 + b1)A + (a0 + b0)I = f(A) + g(A)
(ii) By definition, fg = cn+mtⁿ⁺ᵐ + · · · + c1t + c0, where ck = a0bk + a1bk−1 + · · · + akb0 = Σi aibk−i.
Hence, (fg)(A) = Σk ckAᵏ and f(A)g(A) = (Σi aiAⁱ)(Σj bjAʲ) = Σk ckAᵏ = (fg)(A)
(iii) By definition, kf = kaₙtⁿ + · · · + ka1t + ka0, and so
(kf)(A) = kaₙAⁿ + · · · + ka1A + ka0I = k(aₙAⁿ + · · · + a1A + a0I) = kf(A)
(iv) By (ii), g(A)f(A) = (gf)(A) = (fg)(A) = f(A)g(A).
9.8. Prove the Cayley–Hamilton Theorem 9.2: Every matrix A is a root of its characteristic polynomial Δ(t).
Let A be an arbitrary n-square matrix and let Δ(t) be its characteristic polynomial, say,
Δ(t) = |tI − A| = tⁿ + an−1tⁿ⁻¹ + · · · + a1t + a0
Now let B(t) denote the classical adjoint of the matrix tI − A. The elements of B(t) are cofactors of the matrix tI − A and hence are polynomials in t of degree not exceeding n − 1. Thus,
B(t) = Bn−1tⁿ⁻¹ + · · · + B1t + B0
where the Bi are n-square matrices over K which are independent of t. By the fundamental property of the classical adjoint (Theorem 8.9), (tI − A)B(t) = |tI − A|I, or
Removing the parentheses and equating corresponding powers of t yields
Multiplying the above equations by An, An − 1, …, A, I, respectively, yields
Adding the above matrix equations yields 0 on the left-hand side and Δ(A) on the right-hand side; that is,
Therefore, Δ(A) = 0, which is the Cayley–Hamilton theorem.
9.9. For the given 2-square matrix A:
(a) Find all eigenvalues and corresponding eigenvectors.
(b) Find matrices P and D such that P is nonsingular and D = P−1AP is diagonal.
(a) First find the characteristic polynomial Δ(t) of A:
The roots λ = 2 and λ = −5 of Δ(t) are the eigenvalues of A. We find corresponding eigenvectors.
(i) Subtract λ = 2 down the diagonal of A to obtain the matrix M = A − 2I, where the corresponding homogeneous system MX = 0 yields the eigenvectors corresponding to λ = 2. We have
The system has only one free variable, and υ1 = (4, 1) is a nonzero solution. Thus, υ1 = (4, 1) is an eigenvector belonging to (and spanning the eigenspace of) λ = 2.
(ii) Subtract λ = −5 (or, equivalently, add 5) down the diagonal of A to obtain
The system has only one free variable, and υ2 = (1, 2) is a nonzero solution. Thus, υ2 = (1, 2) is an eigenvector belonging to λ = −5.
(b) Let P be the matrix whose columns are υ1 and υ2. Then D = P−1AP = diag(2, −5).
Note that D is the diagonal matrix whose diagonal entries are the eigenvalues of A corresponding to the eigenvectors appearing in P.
Remark: Here P is the change-of-basis matrix from the usual basis of R2 to the basis S = {υ1, υ2}, and D is the matrix that represents (the matrix function) A relative to the new basis S.
9.10. For the given 2-square matrix A:
(a) Find all eigenvalues and corresponding eigenvectors.
(b) Find a nonsingular matrix P such that D = P−1AP is diagonal, and P−1.
(c) Find A⁶ and f(A), where f(t) = t⁴ − 3t³ − 6t² + 7t + 3.
(d) Find a “real cube root” of A—that is, a matrix B such that B³ = A and B has real eigenvalues.
(a) First find the characteristic polynomial Δ(t) of A:
The roots λ = 1 and λ = 4 of Δ(t) are the eigenvalues of A. We find corresponding eigenvectors.
(i) Subtract λ = 1 down the diagonal of A to obtain the matrix M = A − λI, where the corresponding homogeneous system MX = 0 yields the eigenvectors belonging to λ = 1. We have
The system has only one independent solution; for example, x = 2, y = −1. Thus, υ1 = (2, −1) is an eigenvector belonging to (and spanning the eigenspace of) λ = 1.
(ii) Subtract λ = 4 down the diagonal of A to obtain
The system has only one independent solution; for example, x = 1, y = 1. Thus, υ2 = (1, 1) is an eigenvector belonging to λ = 4.
(b) Let P be the matrix whose columns are υ1 and υ2. Then D = P−1AP = diag(1, 4).
(c) Using the diagonal factorization A = PDP−1, and 1⁶ = 1 and 4⁶ = 4096, we get
Also, f (1) = 2 and f (4) = 1. Hence,
(d) Here diag(1, ∛4) is the real cube root of D = diag(1, 4). Hence, the real cube root of A is B = P diag(1, ∛4) P−1.
9.11. Each of the following real matrices defines a linear transformation on R2:
Find, for each matrix, all eigenvalues and a maximum set S of linearly independent eigenvectors. Which of these linear operators are diagonalizable—that is, which can be represented by a diagonal matrix?
(a) First find Δ(t) = t2 − 3t − 28 = (t − 7)(t + 4). The roots λ = 7 and λ = −4 are the eigenvalues of A. We find corresponding eigenvectors.
(i) Subtract λ = 7 down the diagonal of A to obtain
Here υ1 = (3, 1) is a nonzero solution.
(ii) Subtract λ = −4 (or add 4) down the diagonal of A to obtain
Here υ2 = (2, −3) is a nonzero solution.
Then S = {υ1, υ2} = {(3, 1), (2, −3)} is a maximal set of linearly independent eigenvectors. Because S is a basis of R2, A is diagonalizable. Using the basis S, A is represented by the diagonal matrix D = diag(7, −4).
(b) First find the characteristic polynomial Δ(t) = t2 + 1. There are no real roots. Thus B, a real matrix representing a linear transformation on R2, has no eigenvalues and no eigenvectors. Hence, in particular, B is not diagonalizable.
(c) First find Δ(t) = t2 − 8t + 16 = (t − 4)2. Thus, λ = 4 is the only eigenvalue of C. Subtract λ = 4 down the diagonal of C to obtain
The homogeneous system has only one independent solution; for example, x = 1, y = 1. Thus, υ = (1, 1) is an eigenvector of C. Furthermore, as there are no other eigenvalues, the singleton set S = {υ} = {(1, 1)} is a maximal set of linearly independent eigenvectors of C. Furthermore, because S is not a basis of R2, C is not diagonalizable.
9.12. Suppose the matrix B in Problem 9.11 represents a linear operator on complex space C2. Show that, in this case, B is diagonalizable by finding a basis S of C2 consisting of eigenvectors of B.
The characteristic polynomial of B is still Δ(t) = t2 + 1. As a polynomial over C, Δ(t) does factor; specifically, Δ(t) = (t − i)(t + i). Thus, λ = i and λ = −i are the eigenvalues of B.
(i) Subtract λ = i down the diagonal of B to obtain the homogeneous system
The system has only one independent solution; for example, x = 1, y = 1 − i. Thus, υ1 = (1, 1 − i) is an eigenvector that spans the eigenspace of λ = i.
(ii) Subtract λ = −i (or add i) down the diagonal of B to obtain the homogeneous system
The system has only one independent solution; for example, x = 1, y = 1 + i. Thus, υ2 = (1, 1 + i) is an eigenvector that spans the eigenspace of λ = −i.
As a complex matrix, B is diagonalizable. Specifically, S = {υ1, υ2} = {(1, 1 − i), (1, 1 + i)} is a basis of C2 consisting of eigenvectors of B. Using this basis S, B is represented by the diagonal matrix D = diag(i, −i).
9.13. Let L be the linear transformation on R2 that reflects each point P across the line y = kx, where k > 0. (See Fig. 9-1.)
(a) Show that υ1 = (k, 1) and υ2 = (1, −k) are eigenvectors of L.
(b) Show that L is diagonalizable, and find a diagonal representation D.
(a) The vector υ1 = (k, 1) lies on the line y = kx, and hence is left fixed by L; that is, L(υ1) = υ1. Thus, υ1 is an eigenvector of L belonging to the eigenvalue λ1 = 1.
The vector υ2 = (1, −k) is perpendicular to the line y = kx, and hence, L reflects υ2 into its negative; that is, L(υ2) = −υ2. Thus, υ2 is an eigenvector of L belonging to the eigenvalue λ2 = −1.
(b) Here S = {υ1, υ2} is a basis of R2 consisting of eigenvectors of L. Thus, L is diagonalizable, with the diagonal representation D = diag(1, −1) relative to the basis S.
9.14. For the given matrix A: (a) Find all eigenvalues of A.
(b) Find a maximum set S of linearly independent eigenvectors of A.
(c) Is A diagonalizable? If yes, find P such that D = P−1AP is diagonal.
(a) First find the characteristic polynomial Δ(t) of A. We have
Also, find each cofactor Aii of aii in A:
Hence, Δ(t) = t³ − 11t² + 39t − 45.
Assuming Δ(t) has a rational root, it must be among ±1, ±3, ±5, ±9, ±15, ±45. Testing, by synthetic division, we get
Thus, t = 3 is a root of Δ(t), so t − 3 is a factor and t² − 8t + 15 is the other factor. Hence,
Δ(t) = (t − 3)(t² − 8t + 15) = (t − 3)²(t − 5)
Accordingly, λ = 3 and λ = 5 are eigenvalues of A.
(b) Find linearly independent eigenvectors for each eigenvalue of A.
(i) Subtract λ = 3 down the diagonal of A to obtain the matrix
Here u = (1, −1, 0) and υ = (1, 0, 1) are linearly independent solutions.
(ii) Subtract λ = 5 down the diagonal of A to obtain the matrix
Only z is a free variable. Here w = (1, 2, 1) is a solution.
Thus, S = {u, υ, w} = {(1, −1, 0), (1, 0, 1), (1, 2, 1)} is a maximal set of linearly independent eigenvectors of A.
Remark: The vectors u and υ were chosen so that they were independent solutions of the system x + y − z = 0. On the other hand, w is automatically independent of u and υ because w belongs to a different eigenvalue of A. Thus, the three vectors are linearly independent.
(c) A is diagonalizable, because it has three linearly independent eigenvectors. Let P be the matrix with columns u, υ, w. Then D = P−1AP = diag(3, 3, 5).
9.15. Repeat Problem 9.14 for the given matrix B.
(a) First find the characteristic polynomial Δ(t) of B. We have
Therefore, Δ(t) = t³ − 12t + 16 = (t − 2)²(t + 4). Thus, λ1 = 2 and λ2 = −4 are the eigenvalues of B.
(b) Find a basis for the eigenspace of each eigenvalue of B.
(i) Subtract λ1 = 2 down the diagonal of B to obtain
The system has only one independent solution; for example, x = 1, y = 1, z = 0. Thus, u = (1, 1, 0) forms a basis for the eigenspace of λ1 = 2.
(ii) Subtract λ2 = −4 (or add 4) down the diagonal of B to obtain
The system has only one independent solution; for example, x = 0, y = 1, z = 1. Thus, υ = (0, 1, 1) forms a basis for the eigenspace of λ2 = −4.
Thus S = {u, υ} is a maximal set of linearly independent eigenvectors of B.
(c) Because B has at most two linearly independent eigenvectors, B is not similar to a diagonal matrix; that is, B is not diagonalizable.
9.16. Find the algebraic and geometric multiplicities of the eigenvalue λ1 = 2 of the matrix B in Problem 9.15.
The algebraic multiplicity of λ1 = 2 is 2, because t − 2 appears with exponent 2 in Δ(t). However, the geometric multiplicity of λ1 = 2 is 1, because dim Eλ1 = 1 (where Eλ1 is the eigenspace of λ1).
9.17. Let T: R3 → R3 be defined by T(x, y, z) = (2x + y − 2z, 2x + 3y − 4z, x + y − z). Find all eigenvalues of T, and find a basis of each eigenspace. Is T diagonalizable? If so, find the basis S of R3 that diagonalizes T, and find its diagonal representation D.
First find the matrix A that represents T relative to the usual basis of R3 by writing down the coefficients of x, y, z as rows, and then find the characteristic polynomial of A (and T). We have
Therefore, Δ(t) = t3 − 4t2 + 5t − 2 = (t − 1)2(t − 2), and so λ = 1 and λ = 2 are the eigenvalues of A (and T). We next find linearly independent eigenvectors for each eigenvalue of A.
(i) Subtract λ = 1 down the diagonal of A to obtain the matrix
Here y and z are free variables, and so there are two linearly independent eigenvectors belonging to λ = 1. For example, u = (1, −1, 0) and υ = (2, 0, 1) are two such eigenvectors.
(ii) Subtract λ = 2 down the diagonal of A to obtain
Only z is a free variable. Here w = (1, 2, 1) is a solution.
Thus, T is diagonalizable, because it has three independent eigenvectors. Specifically, choosing S = {u, υ, w} = {(1, −1, 0), (2, 0, 1), (1, 2, 1)} as a basis, T is represented by the diagonal matrix D = diag(1, 1, 2).
9.18. Prove the following for a linear operator (matrix) T:
(a) The scalar 0 is an eigenvalue of T if and only if T is singular.
(b) If λ is an eigenvalue of T, where T is invertible, then λ−1 is an eigenvalue of T−1.
(a) We have that 0 is an eigenvalue of T if and only if there is a vector υ ≠ 0 such that T(υ) = 0 υ—that is, if and only if T is singular.
(b) Because T is invertible, it is nonsingular; hence, by (a), λ ≠ 0. By definition of an eigenvalue, there exists υ ≠ 0 such that T(υ) = λ υ. Applying T−1 to both sides, we obtain
Therefore, λ−1 is an eigenvalue of T−1.
9.19. Let λ be an eigenvalue of a linear operator T: V → V, and let Eλ consist of all the eigenvectors belonging to λ (called the eigenspace of λ). Prove that Eλ is a subspace of V. That is, prove
(a) If u ∈ Eλ, then ku ∈ Eλ for any scalar k. (b) If u, υ ∈ Eλ, then u + υ ∈ Eλ.
(a) Because u ∈ Eλ, we have T(u) = λu. Then T(ku) = kT(u) = k(λu) = λ(ku), and so ku ∈ Eλ. (We view the zero vector 0 ∈ V as an “eigenvector” of λ in order for Eλ to be a subspace of V.)
(b) As u, υ ∈ Eλ, we have T(u) = λu and T(υ) = λυ. Then T(u + υ) = T(u) + T(υ) = λu + λυ = λ(u + υ); and so u + υ ∈ Eλ.
9.20. Prove Theorem 9.6: The following are equivalent: (i) The scalar λ is an eigenvalue of A.
(ii) The matrix λI − A is singular.
(iii) The scalar λ is a root of the characteristic polynomial Δ(t) of A.
The scalar λ is an eigenvalue of A if and only if there exists a nonzero vector υ such that Aυ = λυ, or (λI − A)υ = 0—that is, if and only if λI − A is singular. In such a case, λ is a root of Δ(t) = |tI − A|. Also, υ is in the eigenspace Eλ of λ if and only if the above relations hold. Hence, υ is a solution of (λI − A)X = 0.
9.21. Prove Theorem 9.8′: Suppose υ1, υ2, …, υn are nonzero eigenvectors of T belonging to distinct eigenvalues λ1, λ2, …, λn. Then υ1, υ2, …, υn are linearly independent.
Suppose the theorem is not true. Let υ1, υ2, …, υs be a minimal set of vectors for which the theorem is not true. We have s > 1, because υ1 ≠ 0. Also, by the minimality condition, υ2, …, υs are linearly independent. Thus, υ1 is a linear combination of υ2, …, υs, say,
υ1 = a2υ2 + a3υ3 + · · · + asυs     (1)
(where some ak ≠ 0). Applying T to (1) and using the linearity of T yields
T(υ1) = a2T(υ2) + a3T(υ3) + · · · + asT(υs)     (2)
Because υj is an eigenvector of T belonging to λj, we have T(υj) = λjυj. Substituting in (2) yields
λ1υ1 = a2λ2υ2 + a3λ3υ3 + · · · + asλsυs     (3)
Multiplying (1) by λ1 yields
λ1υ1 = a2λ1υ2 + a3λ1υ3 + · · · + asλ1υs     (4)
Setting the right-hand sides of (3) and (4) equal to each other, or, equivalently, subtracting (3) from (4), yields
a2(λ1 − λ2)υ2 + a3(λ1 − λ3)υ3 + · · · + as(λ1 − λs)υs = 0     (5)
Because υ2, υ3, …, υs are linearly independent, the coefficients in (5) must all be zero. That is,
a2(λ1 − λ2) = 0,  a3(λ1 − λ3) = 0,  …,  as(λ1 − λs) = 0
However, the λi are distinct. Hence, λ1 − λj ≠ 0 for j > 1. Hence, a2 = 0, a3 = 0, …, as = 0. This contradicts the fact that some ak ≠ 0. The theorem is proved.
9.22. Prove Theorem 9.9. Suppose Δ(t) = (t − a1)(t − a2) … (t − an) is the characteristic polynomial of an n-square matrix A, and suppose the n roots ai are distinct. Then A is similar to the diagonal matrix D = diag(a1, a2, …, an).
Let υ1, υ2, …, υn be (nonzero) eigenvectors corresponding to the eigenvalues ai. Then the n eigenvectors υi are linearly independent (Theorem 9.8), and hence form a basis of Kn. Accordingly, A is diagonalizable (i.e., A is similar to a diagonal matrix D), and the diagonal elements of D are the eigenvalues ai.
9.23. Prove Theorem 9.10′: The geometric multiplicity of an eigenvalue λ of T does not exceed its algebraic multiplicity.
Suppose the geometric multiplicity of λ is r. Then its eigenspace Eλ contains r linearly independent eigenvectors υ1, …, υr. Extend the set {υi} to a basis of V, say, {υ1, …, υr, w1, …, ws}. We have
T(υ1) = λυ1, T(υ2) = λυ2, …, T(υr) = λυr
T(wj) = aj1υ1 + · · · + ajrυr + bj1w1 + · · · + bjsws   (j = 1, …, s)
Then the matrix of T in the above basis is the block triangular matrix M = [λIr, A; 0, B], where A = [aij]T and B = [bij]T.
Because M is block triangular, the characteristic polynomial (t − λ)ʳ of the block λIr must divide the characteristic polynomial of M and hence of T. Thus, the algebraic multiplicity of λ for T is at least r, as required.
9.24. For the given symmetric matrix A, find an orthogonal matrix P such that D = P−1AP is diagonal.
First find the characteristic polynomial Δ(t) of A. We have
Thus, the eigenvalues of A are λ = 8 and λ = −2. We next find corresponding eigenvectors.
Subtract λ = 8 down the diagonal of A to obtain the matrix
A nonzero solution is u1 = (3, 1).
Subtract λ = −2 (or add 2) down the diagonal of A to obtain the matrix
A nonzero solution is u2 = (1, −3).
As expected, because A is symmetric, the eigenvectors u1 and u2 are orthogonal. Normalize u1 and u2 to obtain, respectively, the unit vectors
Finally, let P be the matrix whose columns are the unit vectors û1 and û2, respectively. Then
As expected, the diagonal entries in D are the eigenvalues of A.
9.25. For the given symmetric matrix B: (a) Find all eigenvalues of B.
(b) Find a maximal set S of nonzero orthogonal eigenvectors of B.
(c) Find an orthogonal matrix P such that D = P−1BP is diagonal.
(a) First find the characteristic polynomial of B. We have
Hence, Δ(t) = t³ − 6t² − 135t − 400. If Δ(t) has an integer root, it must divide 400. Testing t = −5 by synthetic division yields
Thus, t + 5 is a factor of Δ(t), and t² − 11t − 80 is the other factor. Thus,
Δ(t) = (t + 5)(t² − 11t − 80) = (t + 5)²(t − 16)
The eigenvalues of B are λ = −5 (multiplicity 2) and λ = 16 (multiplicity 1).
(b) Find an orthogonal basis for each eigenspace. Subtract λ = −5 (or add 5) down the diagonal of B to obtain the homogeneous system
That is, 4x − 2y + z = 0. The system has two independent solutions. One solution is υ1 = (0, 1, 2). We seek a second solution υ2 = (a, b, c), which is orthogonal to υ1, such that 4a − 2b + c = 0 and b + 2c = 0.
One such solution is υ2 = (−5, −8, 4).
Subtract λ = 16 down the diagonal of B to obtain the homogeneous system
This system yields a nonzero solution υ3 = (4, −2, 1). (As expected from Theorem 9.13, the eigenvector υ3 is orthogonal to υ1 and υ2.)
Then υ1, υ2, υ3 form a maximal set of nonzero orthogonal eigenvectors of B.
(c) Normalize υ1, υ2, υ3 to obtain the orthonormal basis:
Then P is the matrix whose columns are these orthonormal vectors. Thus,
9.26. Let q(x, y) = x2 + 6xy − 7y2. Find an orthogonal substitution that diagonalizes q.
Find the symmetric matrix A that represents q and its characteristic polynomial Δ(t). We have
The eigenvalues of A are λ = 2 and λ = −8. Thus, using s and t as new variables, a diagonal form of q is q(s, t) = 2s² − 8t².
The corresponding orthogonal substitution is obtained by finding an orthogonal set of eigenvectors of A.
(i) Subtract λ = 2 down the diagonal of A to obtain the matrix
A nonzero solution is u1 = (3, 1).
(ii) Subtract λ = −8 (or add 8) down the diagonal of A to obtain the matrix
A nonzero solution is u2 = (−1, 3).
As expected, because A is symmetric, the eigenvectors u1 and u2 are orthogonal.
Now normalize u1 and u2 to obtain, respectively, the unit vectors
Finally, let P be the matrix whose columns are the unit vectors û1 and û2, respectively, and then [x, y]T = P[s, t]T is the required orthogonal change of coordinates. That is,
One can also express s and t in terms of x and y by using P−1 = PT. That is,
9.27. Consider the two given matrices A and B, whose common characteristic polynomial is Δ(t) = (t − 2)(t − 1)². Find the minimal polynomial m(t) of each matrix.
The minimal polynomial m(t) must divide Δ(t). Also, each factor of Δ(t) (i.e., t − 2 and t − 1) must also be a factor of m(t). Thus, m(t) must be exactly one of the following:
(a) By the Cayley–Hamilton theorem, g(A) = Δ(A) = 0, so we need only test f(t). We have
Thus, m(t) = f(t) = (t − 2)(t − 1) = t2 − 3t + 2 is the minimal polynomial of A.
(b) Again g(B) = Δ(B) = 0, so we need only test f(t). We get
Thus, m(t) ≠ f(t). Accordingly, m(t) = g(t) = (t − 2)(t − 1)2 is the minimal polynomial of B. [We emphasize that we do not need to compute g(B); we know g(B) = 0 from the Cayley–Hamilton theorem.]
9.28. Find the minimal polynomial m(t) of each of the following matrices:
(a)
(a) The characteristic polynomial of A is Δ(t) = t2 − 12t + 32 = (t − 4)(t − 8). Because Δ(t) has distinct factors, the minimal polynomial m(t) = Δ(t) = t2 − 12t + 32.
(b) Because B is triangular, its eigenvalues are the diagonal elements 1, 2, 3; and so its characteristic polynomial is Δ(t) = (t − 1)(t − 2)(t − 3). Because Δ(t) has distinct factors, m(t) = Δ(t).
(c) The characteristic polynomial of C is Δ(t) = t² − 6t + 9 = (t − 3)². Hence, the minimal polynomial of C is either f(t) = t − 3 or g(t) = (t − 3)². However, f(C) ≠ 0; that is, C − 3I ≠ 0. Hence, m(t) = g(t) = (t − 3)².
9.29. Suppose S = {u1, u2, …, un} is a basis of V, and suppose F and G are linear operators on V such that [F] has 0’s on and below the diagonal, and [G] has a ≠ 0 on the superdiagonal and 0’s elsewhere. That is,
Show that (a) Fn = 0, (b) Gn−1 ≠ 0, but Gn = 0. (These conditions also hold for [F] and [G].)
(a) We have F(u1) = 0 and, for r > 1, F(ur) is a linear combination of vectors preceding ur in S. That is,
Hence, F2(ur) = F(F(ur)) is a linear combination of vectors preceding ur−1, and so on. Hence, Fr(ur) = 0 for each r. Thus, for each r, Fn(ur) = Fn−r (0) = 0, and so Fn = 0, as claimed.
(b) We have G(u1) = 0 and, for each k > 1, G(uk) = auk−1. Hence, Gr(uk) = aruk−r for r < k. Because a ≠ 0, an−1 ≠ 0. Therefore, Gn−1 (un) = an−1 u1 ≠ 0, and so Gn−1 ≠ 0. On the other hand, by (a), Gn = 0.
9.30. Let A be the matrix in Example 9.12(a) that has λ’s on the diagonal, a’s on the superdiagonal, where a ≠ 0, and 0’s elsewhere. Show that f(t) = (t − λ)ⁿ is both the characteristic polynomial Δ(t) and the minimal polynomial m(t) of A.
Because A is triangular with λ’s on the diagonal, Δ(t) = f(t) = (t − λ)ⁿ is its characteristic polynomial. Thus, m(t) is a power of t − λ. By Problem 9.29, (A − λI)ⁿ⁻¹ ≠ 0. Hence, m(t) = Δ(t) = (t − λ)ⁿ.
9.31. Find the characteristic polynomial Δ(t) and minimal polynomial m(t) of each matrix:
(a) M is block diagonal with diagonal blocks A and B.
The characteristic and minimal polynomial of A is f(t) = (t − 4)³, and the characteristic and minimal polynomial of B is g(t) = (t − 4)². Then
Δ(t) = f(t)g(t) = (t − 4)⁵  but  m(t) = LCM[f(t), g(t)] = (t − 4)³
(where LCM means least common multiple). We emphasize that the exponent in m(t) is the size of the largest block.
(b) Here M′ is block diagonal with diagonal blocks A′ and B′.
The characteristic and minimal polynomial of A′ is f(t) = (t − 2)². The characteristic polynomial of B′ is g(t) = t² − 5t + 6 = (t − 2)(t − 3), which has distinct factors. Hence, g(t) is also the minimal polynomial of B′. Accordingly,
Δ(t) = f(t)g(t) = (t − 2)³(t − 3)  but  m(t) = LCM[f(t), g(t)] = (t − 2)²(t − 3)
9.32. Find a matrix A whose minimal polynomial is f(t) = t3 − 8t2 + 5t + 7.
Simply let A = C(f), the companion matrix of f(t) [defined in Example 9.12(b)].
9.33. Prove Theorem 9.15: The minimal polynomial m(t) of a matrix (linear operator) A divides every polynomial that has A as a zero. In particular (by the Cayley–Hamilton theorem), m(t) divides the characteristic polynomial Δ(t) of A.
Suppose f(t) is a polynomial for which f(A) = 0. By the division algorithm, there exist polynomials q(t) and r(t) for which f(t) = m(t)q(t) + r(t) and r(t) = 0 or deg r(t) < deg m(t). Substituting t = A in this equation, and using that f(A) = 0 and m(A) = 0, we obtain r(A) = 0. If r(t) ≠ 0, then r(t) is a polynomial of degree less than m(t) that has A as a zero. This contradicts the definition of the minimal polynomial. Thus, r(t) = 0, and so f(t) = m(t)q(t); that is, m(t) divides f(t).
9.34. Let m(t) be the minimal polynomial of an n-square matrix A. Prove that the characteristic polynomial Δ(t) of A divides [m(t)]ⁿ.
Suppose m(t) = tʳ + c1tʳ⁻¹ + · · · + cr−1t + cr. Define matrices Bj as follows:
Then
Set
Then
Taking the determinant of both sides gives |tI − A| |B(t)| = |m(t)I| = [m(t)]ⁿ. Because |B(t)| is a polynomial, |tI − A| divides [m(t)]ⁿ; that is, the characteristic polynomial of A divides [m(t)]ⁿ.
9.35. Prove Theorem 9.16: The characteristic polynomial Δ(t) and the minimal polynomial m(t) of A have the same irreducible factors.
Suppose f(t) is an irreducible polynomial. If f(t) divides m(t), then f(t) also divides Δ(t) [because m(t) divides Δ(t)]. On the other hand, if f(t) divides Δ(t), then by Problem 9.34, f(t) also divides [m(t)]ⁿ. But f(t) is irreducible; hence, f(t) also divides m(t). Thus, m(t) and Δ(t) have the same irreducible factors.
9.36. Prove Theorem 9.19: The minimal polynomial m(t) of a block diagonal matrix M with diagonal blocks Ai is equal to the least common multiple (LCM) of the minimal polynomials of the diagonal blocks Ai.
We prove the theorem for the case r = 2. The general theorem follows easily by induction. Suppose M = diag(A, B), where A and B are square matrices. We need to show that the minimal polynomial m(t) of M is the LCM of the minimal polynomials g(t) and h(t) of A and B, respectively.
Because m(t) is the minimal polynomial of M, we have m(M) = diag(m(A), m(B)) = 0, and hence m(A) = 0 and m(B) = 0. Because g(t) is the minimal polynomial of A, g(t) divides m(t). Similarly, h(t) divides m(t). Thus, m(t) is a multiple of g(t) and h(t).
Now let f(t) be another multiple of g(t) and h(t). Then f(A) = 0 and f(B) = 0, and so f(M) = diag(f(A), f(B)) = 0. But m(t) is the minimal polynomial of M; hence, m(t) divides f(t). Thus, m(t) is the LCM of g(t) and h(t).
9.37. Suppose m(t) = tʳ + ar−1tʳ⁻¹ + · · · + a1t + a0 is the minimal polynomial of an n-square matrix A. Prove the following:
(a) A is nonsingular if and only if the constant term a0 ≠ 0.
(b) If A is nonsingular, then A−1 is a polynomial in A of degree r − 1 < n.
(a) The following are equivalent: (i) A is nonsingular, (ii) 0 is not a root of m(t), (iii) a0 ≠ 0. Thus, the statement is true.
(b) Because A is nonsingular, a0 ≠ 0 by (a). We have
m(A) = Aʳ + ar−1Aʳ⁻¹ + · · · + a1A + a0I = 0
Thus,
−(1/a0)(Aʳ⁻¹ + ar−1Aʳ⁻² + · · · + a1I)A = I
Accordingly,
A−1 = −(1/a0)(Aʳ⁻¹ + ar−1Aʳ⁻² + · · · + a1I)
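The formula in (b) gives a practical way to invert A once its minimal (or characteristic) polynomial is known. A minimal sketch, assuming NumPy; the function name and the test matrix are hypothetical, and the characteristic polynomial is used for m(t) since its factors are distinct in this example.

```python
import numpy as np

def inverse_from_minimal_poly(A, m_coeffs):
    """Given m(t) = t^r + a_{r-1} t^{r-1} + ... + a_1 t + a_0 with a_0 != 0
    (coefficients listed highest power first), return
    A^-1 = -(1/a_0)(A^{r-1} + a_{r-1} A^{r-2} + ... + a_1 I)."""
    a0 = m_coeffs[-1]
    n = A.shape[0]
    B = np.zeros((n, n))
    for c in m_coeffs[:-1]:          # Horner evaluation of (m(t) - a_0)/t at A
        B = B @ A + c * np.eye(n)
    return -B / a0

A = np.array([[2.0, 2.0], [1.0, 3.0]])       # illustrative nonsingular matrix
m = np.poly(A)                               # Delta(t) = t^2 - 5t + 4 = m(t) here (distinct roots)
print(np.allclose(inverse_from_minimal_poly(A, m) @ A, np.eye(2)))   # True
```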
SUPPLEMENTARY PROBLEMS
9.38. For the given matrices A and B, find f(A), g(A), f(B), g(B), where f(t) = 2t2 − 5t + 6 and g(t) = t3 − 2t2 + t + 3.
9.39. For the given matrix A, find A², A³, Aⁿ (where n > 3), and A−1.
9.40. For the given matrix B, find a real matrix A such that B = A³.
9.41. For each matrix, find a polynomial having the following matrix as a root:
9.42. Let A be any square matrix and let f(t) be any polynomial. Prove (a) (P−1AP)ⁿ = P−1AⁿP,
(b) f(P−1AP) = P−1f(A)P, (c) f(AT) = [f(A)]T, (d) If A is symmetric, then f(A) is symmetric.
9.43. Let M = diag[A1, …, Ar] be a block diagonal matrix, and let f(t) be any polynomial. Show that f(M) is block diagonal and f(M) = diag[f(A1), …, f(Ar)]:
9.44. Let M be a block triangular matrix with diagonal blocks A1, …, Ar, and let f(t) be any polynomial. Show that f(M) is also a block triangular matrix, with diagonal blocks f(A1), …, f(Ar).
9.45. For each of the following matrices, find all eigenvalues and corresponding linearly independent eigenvectors:
When possible, find the nonsingular matrix P that diagonalizes the matrix.
9.46. For the given matrix A:
(a) Find eigenvalues and corresponding eigenvectors.
(b) Find a nonsingular matrix P such that D = P−1AP is diagonal.
(c) Find A8 and f(A) where f(t) = t4 − 5t3 + 7t2 − 2t + 5.
(d) Find a matrix B such that B2 = A.
9.47. Repeat Problem 9.46 for the given matrix A.
9.48. For each of the following matrices, find all eigenvalues and a maximum set S of linearly independent eigenvectors:
Which matrices can be diagonalized, and why?
9.49. Find: (a) a 2 × 2 matrix A with eigenvalues λ1 = 1 and λ2 = −2 and corresponding eigenvectors v1 = (1, 2) and v2 = (3, 7);
(b) a 3 × 3 matrix A with eigenvalues λ1 = 1, λ2 = 2, λ3 = 3 and corresponding eigenvectors v1 = (1, 0, 1), v2 = (1, 1, 2), v3 = (1, 2, 4). [Hint: See Problem 2.19.]
9.50. Let A = [a, b; c, d] be a real matrix. Find necessary and sufficient conditions on a, b, c, d so that A is diagonalizable—that is, so that A has two (real) linearly independent eigenvectors.
9.51. Show that matrices A and AT have the same eigenvalues. Give an example of a 2 × 2 matrix A where A and AT have different eigenvectors.
9.52. Suppose υ is an eigenvector of linear operators F and G. Show that υ is also an eigenvector of the linear operator kF + k′G, where k and k′ are scalars.
9.53. Suppose υ is an eigenvector of a linear operator T belonging to the eigenvalue λ. Prove
(a) For n > 0, υ is an eigenvector of Tn belonging to λn.
(b) f (λ) is an eigenvalue of f(T) for any polynomial f(t).
9.54. Suppose λ ≠ 0 is an eigenvalue of the composition F ° G of linear operators F and G. Show that λ is also an eigenvalue of the composition G ° F. [Hint: Show that G(υ) is an eigenvector of G ° F.]
9.55. Let E: V → V be a projection mapping; that is, E2 = E. Show that E is diagonalizable and, in fact, can be represented by the diagonal matrix A = diag(1, …, 1, 0, …, 0) with r ones, where r is the rank of E.
9.56. For each of the following symmetric matrices A, find an orthogonal matrix P and a diagonal matrix D such that D = P−1AP:
9.57. For each of the following symmetric matrices B, find its eigenvalues, a maximal orthogonal set S of eigenvectors, and an orthogonal matrix P such that D = P−1BP is diagonal:
9.58. Using variables s and t, find an orthogonal substitution that diagonalizes each of the following quadratic forms:
9.59. For each of the following quadratic forms q(x, y, z), find an orthogonal substitution expressing x, y, z in terms of variables r, s, t, and find q(r, s, t):
9.60. Find a real 2 × 2 symmetric matrix A with eigenvalues:
(a) λ = 1 and λ = 4 and eigenvector u = (1, 1) belonging to λ = 1;
(b) λ = 2 and λ = 3 and eigenvector u = (1, 2) belonging to λ = 2.
In each case, find a matrix B for which B2 = A.
9.61. Find the characteristic and minimal polynomials of each of the following matrices:
9.62. Find the characteristic and minimal polynomials of each of the following matrices:
9.63. Show that the two given matrices A and B have different characteristic polynomials (and so are not similar) but have the same minimal polynomial. Thus, nonsimilar matrices may have the same minimal polynomial.
9.64. Let A be an n-square matrix for which Ak = 0 for some k > n. Show that An = 0.
9.65. Show that a matrix A and its transpose AT have the same minimal polynomial.
9.66. Suppose f(t) is an irreducible monic polynomial for which f(A) = 0 for a matrix A. Show that f(t) is the minimal polynomial of A.
9.67. Show that A is a scalar matrix kI if and only if the minimal polynomial of A is m(t) = t − k.
9.68. Find a matrix A whose minimal polynomial is (a) t3 − 5t2 + 6t + 8, (b) t4 − 5t3 − 2t2 + 7t + 4.
9.69. Let f(t) and g(t) be monic polynomials (leading coefficient one) of minimal degree for which A is a root. Show f(t) = g(t): [Thus, the minimal polynomial of A is unique.]
ANSWERS TO SUPPLEMENTARY PROBLEMS
Notation: M = [R1; R2;...] denotes a matrix M with rows R1, R2, ….
9.38. f(A) = [26, −3; 5, −27], g(A) = [40, 39; −65, −27], f(B) = [3, 6; 0, 9], g(B) = [3, 12; 0, 15]
9.39. A² = [1, 4; 0, 1], A³ = [1, 6; 0, 1], Aⁿ = [1, 2n; 0, 1], A−1 = [1, −2; 0, 1]
9.40. Let A = [2, a, b; 0, 2, c; 0, 0, 2]. Set B = A³ and solve for a, b, c by comparing corresponding entries.
9.41. Find Δ(t): (a) t2 + t − 11, (b) t2 + 2t + 13, (c) t3 − 7t2 + 6t − 1
9.45. (a) λ = 1, u = (3, 1); λ = −4, υ = (1, 2), (b) λ = 4, u = (2, 1),
(c) λ = −1, u = (2, 1); λ = 5, υ = (2, 3). Only A and C can be diagonalized; use P = [u, υ].
9.46. (a) λ = 1, u = (1, 1); λ = 4, υ = (1, −2),
(b) P = [u, υ],
(c) f(A) = [3, 1; 2, 1], A8 = [21 846, −21 845; −43 690, 43 691],
(d)
9.47. (a) λ = 1, u = (3, −2); λ = 2, υ = (2, −1), (b) P = [u, υ],
(c) f(A) = [2, −6; 2, 9], A8 = [1021, 1530; −510, −764],
(d)
9.48. (a) λ = 2, u = (1, 1, 0), υ = (1, 0, −1); λ = 4, w = (1, 1, 2),
(b) λ = 2, u = (1, 1, 0); λ = 4, υ = (0, 1, 1),
(c) λ = 3, u = (1, 1, 0), υ = (1, 0, 1); λ = 1, w = (2, −1, 1). Only A and C can be diagonalized; use P = [u, υ, w].
9.49. (a) [19, −9; 42, −20], (b) [1, 1, 0; −2, 0, 2; −4, −2, 5]
9.50. We need [tr(A)]² − 4 det(A) ≥ 0, or equivalently (a − d)² + 4bc ≥ 0.
9.57. (a) λ = 1, u = (1, −1, 0), υ = (1, 1, −2); λ = 2, w = (1, 1, 1),
(b) λ = 1; u = (2, 1, −1), υ = (2, −3, 1); λ = 22; w = (1, 2, 4);
Normalize u, υ, w to unit vectors and let P be the matrix whose columns are these unit vectors. (Remark: u and υ are not unique.)
(b)
9.61. (a) Δ(t) = m(t) = (t − 2)2(t − 6),
(b) Δ(t) = (t − 2)2(t − 6), m(t) = (t − 2)(t − 6)
9.62. (a) Δ(t) = (t − 2)3(t − 7)2, m(t) = (t − 2)2(t − 7),
(b) Δ(t) = (t − 3)5, m(t) = (t − 3)3,
(c) Δ(t) = (t − 2)2(t − 4)2(t − 5); m(t) = (t − 2)(t − 4)(t − 5)
9.68. Let A be the companion matrix [Example 9.12(b)] with last column: (a) [8, −6, 5]T, (b) [4, −7, 2, 5]T
9.69. Hint: A is a root of h(t) = f(t) − g(t), where h(t) = 0 or the degree of h(t) is less than the degree of f(t).