3

Linear Algebraic Equations, Determinants, and Matrices

1 INTRODUCTION

This chapter will be devoted to the discussion of the solution of linear algebraic equations and the related topics, determinants and matrices. These subjects are of extreme importance in applied mathematics, since a great many physical phenomena are expressed in terms of linear differential equations. By appropriate transformations the solution of a set of linear differential equations with constant coefficients may be reduced to the solution of a set of algebraic equations. Such problems as the determination of the transient behavior of an electrical circuit or the determination of the amplitudes and modes of oscillation of a dynamical system lead to the solution of a set of algebraic equations.

It is therefore important that the algebraic processes useful in the solution and manipulation of these equations should be known and clearly understood by the student.

2 SIMPLE DETERMINANTS

Before considering the properties of determinants in general, let us consider the solution of the following two linear equations:

images

If we multiply the first equation by a22 and the second by — a12 and add, we obtain

images

The expression (a11a22 — a21a12) may be represented by the symbol

images

This symbol is called a determinant of the second order. The solution of (2.2) is

images

Similarly, if we multiply the first equation of (2.1) by — a21 and the second one by a11 and add, we obtain

images

The expression for the solution of Eqs. (2.1) in terms of the determinants is very convenient and may be generalized to a set of n linear equations in n unknowns. We must first consider the fundamental definitions and rules of operation of determinants and matrices.
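
To make the formulas of this section concrete, the short Python sketch below solves a pair of equations of the form a11x1 + a12x2 = k1, a21x1 + a22x2 = k2 by the second-order determinants just described. The numerical coefficients, and the symbols k1 and k2 for the right-hand members, are illustrative assumptions, since the displayed equations (2.1) and (2.2) are not reproduced above.

```python
# A minimal sketch of Sec. 2, assuming the system has the form
#   a11*x1 + a12*x2 = k1
#   a21*x1 + a22*x2 = k2
# (the symbols k1, k2 are an assumption; the displayed equations are not shown).

def det2(a11, a12, a21, a22):
    """Second-order determinant: a11*a22 - a21*a12."""
    return a11 * a22 - a21 * a12

# invented example:  2*x1 + 3*x2 = 8,   x1 - x2 = -1
a11, a12, k1 = 2.0, 3.0, 8.0
a21, a22, k2 = 1.0, -1.0, -1.0

D = det2(a11, a12, a21, a22)        # determinant of the system
x1 = det2(k1, a12, k2, a22) / D     # replace the first column by the k's
x2 = det2(a11, k1, a21, k2) / D     # replace the second column by the k's
print(x1, x2)                       # -> 1.0 2.0
```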

3 FUNDAMENTAL DEFINITIONS

Consider the square array of n2 quantities aij, where the subscripts i and j run from 1 to n, written in the form

images

This square array of quantities, bordered by vertical bars, is a symbolical representation of a certain homogeneous polynomial of the nth order in the quantities aij to be defined later and constructed from the rows and columns of |a| in a certain manner. This symbolical representation is called a determinant. The n2 quantities aij are called the elements of the determinant.

In this brief treatment we cannot go into a detailed exposition of the fundamental theorems concerning the homogeneous polynomial that the determinant represents; only the essential theorems that are important to the solution of sets of linear equations will be considered. Before giving explicit rules concerning the construction of the homogeneous polynomial from the symbolic array of rows and columns, it will be necessary to define some terms that are of paramount importance in the theory of determinants.

a MINORS

If in the determinant |a| of (3.1) we delete the ith row and the jth column and form a determinant from all the elements remaining, we shall have a new determinant of n — 1 rows and columns. This new determinant is defined to be the minor of the element aij. For example, if |a| is a determinant of the fourth order,

images

the minor of the element a32 is denoted by M32 and is given by

images

b COFACTORS

The cofactor of an element aij of a determinant is the minor of that element with a sign attached to it, determined by the numbers i and j which fix the position of aij in the determinant |a|. The sign is chosen by the equation

images

where Aij is the cofactor of the element aij and Mij is the minor of the element aij.

4 THE LAPLACE EXPANSION

We come now to a consideration of what may be regarded as the definition of the homogeneous polynomial of the nth order that the symbolical array of elements of the determinant represents. Let us, for simplicity, first consider the second-order determinant

images

By definition this symbolical array represents the second-order homogeneous polynomial

images

The third-order determinant

images

represents the unique third-order homogeneous polynomial defined by any one of the six equivalent expressions

images

where the elements aij in (4.4) must be taken from a single row or a single column of a. The Aij’s are the cofactors of the corresponding elements aij as defined in Sec. 3. As an example of this definition, we see that the third-order determinant may be expanded into the proper third-order homogeneous polynomial that it represents, in the following manner:

images

images

This expansion was obtained by applying the fundamental rule (4.4) and expanding in terms of the first row. Since one has the alternative of using any row or any column, it may be seen that (4.3) could be expanded in six different ways by the fundamental rule (4.4). It is easy to show that all six ways lead to the same third-order homogeneous polynomial (4.6).

The definition (4.4) may be generalized to the nth-order determinant (3.1), and this symbol is defined to represent the nth-order homogeneous polynomial given by

images

where the aij quantities must be taken from a single row or a single column. In this case the cofactors Aij are determinants of the (n – 1)st order, but they may in turn be expanded by the rule (4.7), and so on, until the result is a homogeneous polynomial of the nth order.
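
The rule (4.7) translates directly into a recursive computation. The following Python sketch expands a determinant along its first row, computing each cofactor from the corresponding minor; it is intended only as a transcription of the definition, not as an efficient procedure.

```python
# A didactic Python transcription of the Laplace expansion (4.7):
# expand along the first row, with each cofactor obtained recursively
# from the corresponding minor.

def minor(a, i, j):
    """Delete row i and column j (0-based) from the square array a."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j in range(len(a)):
        cofactor = (-1) ** j * det(minor(a, 0, j))   # (-1)**(i + j) with i = 0
        total += a[0][j] * cofactor
    return total

print(det([[1, 2], [3, 4]]))                    # -> -2
print(det([[2, 0, 1], [3, 1, 0], [1, 2, 1]]))   # -> 7
```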

It is easy to demonstrate that, in general,

images

5 FUNDAMENTAL PROPERTIES OF DETERMINANTS

From the basic definition (4.7), the following properties of determinants may be deduced:

    a.    If all the elements in a row or in a column are zero, the determinant is equal to zero. This may be seen by expanding in terms of that row or column, in which case each term of the expansion contains a factor of zero.

    b.    If all elements but one in a row or column are zero, the determinant is equal to the product of that element and its cofactor.

    c.    The value of a determinant is not altered when the rows are changed to columns and the columns to rows. This may be proved by developing the determinant by (4.7).

    d.    The interchange of any two columns or two rows of a determinant changes the sign of the determinant.

    e.    If two columns or two rows of a determinant are identical, the determinant is equal to zero.

    f.    If all the elements in any row or column are multiplied by any factor, the determinant is multiplied by that factor.

    g.    If each element in any column or any row of a determinant is expressed as the sum of two quantities, the determinant can be expressed as the sum of two determinants of the same order.

    h.    It is possible, without changing the value of a determinant, to multiply the elements of any row (or any column) by the same constant and add the products to any other row (or column). For example, consider the third-order determinant

images

With the aid of property g we have

images

Since the second determinant is zero because the first and third columns are identical, we have

images
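
Properties d, f, and h are easy to verify numerically. The short NumPy sketch below checks them on an arbitrarily chosen third-order determinant; the numbers are illustrative only.

```python
# Numerical check of properties d, f, and h of Sec. 5 on an arbitrary
# third-order determinant.
import numpy as np

a = np.array([[2., 1., 3.],
              [0., 4., 1.],
              [5., 2., 2.]])
d0 = np.linalg.det(a)

swapped = a[[1, 0, 2], :]           # interchange the first two rows (property d)
scaled = a.copy()
scaled[0] *= 7.0                    # multiply the first row by 7 (property f)
sheared = a.copy()
sheared[2] += 3.0 * a[0]            # add 3 times the first row to the third (property h)

print(np.isclose(np.linalg.det(swapped), -d0))       # True
print(np.isclose(np.linalg.det(scaled), 7.0 * d0))   # True
print(np.isclose(np.linalg.det(sheared), d0))        # True
```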

6 THE EVALUATION OF NUMERICAL DETERMINANTS

The evaluation of determinants whose elements are numbers is a task of frequent occurrence in applied mathematics. This evaluation may be carried out by a direct application of the fundamental Laplacian expansion (4.7). This process, however, is most laborious for high-order determinants, and the expansion may be more easily effected by the application of the fundamental properties outlined in Sec. 5 and by the use of two theorems that will be mentioned in this section.

As an example, let it be required to evaluate the numerical determinant

images

This determinant may be transformed by the use of property b of Sec. 5. The procedure is to make all elements but one in some row or column equal to zero. The presence of the factor —1 in the second column suggests that we add four times the first row to the second, then add two times the first row to the third row, and finally add the first row to the fourth row. By Sec. 5h these operations do not change the value of the determinant. Hence we have

images

If we now subtract the elements of the last column from the first column, we have

images

We now add two times the third row to the first row and two times the third row to the second row and obtain

images

Now we may subtract the second row from the first row and obtain

images

This procedure is much shorter than a direct application of the Laplacian expansion rule.

A USEFUL THEOREM

We now turn to a consideration of a theorem of great power for evaluating numerical determinants. The method of evaluation, based on the theorem to be considered, was found most successful at the Mathematical Laboratory of the University of Edinburgh and was due originally to F. Chiò.

To deduce the theorem in question, consider the fourth-order determinant

images

We now notice whether any element is equal to unity. If not, we prepare the determinant in such a manner that one of the elements is unity. This may be done by dividing some row or column by a suitable number, say m, that will make one of the elements equal to unity and then placing the number m as a factor outside the determinant. The same result may also be effected in many cases with the aid of Sec. 5h. For simplicity we shall suppose that this has been done and that, in this case, the element b23 is equal to unity.

Now let us divide the various columns of the determinant |b| by b21, b22, b23, and b24, respectively; then in view of the fact that the element b23 has been made unity, we have

images

Now, if we subtract the elements of the third column from those of the other columns, we obtain

images

where

images

Expanding in terms of the second row, we obtain

images

Substituting this value of |a| into (6.8) and multiplying the various columns by the factors outside, we finally obtain

images

The theorem may be formulated by means of the following rule. The element that has been made unity at the start of the process is called the pivotal element. In this case it is the element b23. The rule says:

The row and column intersecting in the pivotal element of the original determinant, say the rth row and the sth column, are deleted; then every element u is diminished by the product of the elements which stand where the eliminated row and column are met by perpendiculars from u, and the whole determinant is multiplied by (–1)r+s.

By the application of this theorem we reduce the order of the determinant by one unit. Repeated application of this theorem reduces the determinant to one of the second order, and then its value is immediately written down.
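
A Python sketch of a single step of this pivotal condensation is given below. The fourth-order matrix is an invented one whose element in the second row and third column is already unity; the routine uses 0-based indices, for which the sign factor (–1)^(r+s) has the same parity as in the rule above.

```python
# One step of pivotal condensation, following the rule stated above.
# Indices here are 0-based; the sign (-1)**(r + s) has the same parity
# as with 1-based row and column numbers.  The matrix is invented.
import numpy as np

def chio_step(b, r, s):
    """Reduce an nth-order determinant whose pivot b[r, s] is unity to order n - 1."""
    assert np.isclose(b[r, s], 1.0), "the pivotal element must first be made unity"
    rows = [i for i in range(b.shape[0]) if i != r]
    cols = [j for j in range(b.shape[1]) if j != s]
    reduced = np.array([[b[i, j] - b[i, s] * b[r, j] for j in cols] for i in rows])
    return (-1) ** (r + s), reduced

b = np.array([[3., 2., 1., 4.],
              [2., 1., 1., 3.],
              [1., 4., 1., 2.],
              [2., 3., 1., 1.]])
sign, c = chio_step(b, 1, 2)                       # pivot: second row, third column
print(sign * np.linalg.det(c), np.linalg.det(b))   # the two values agree
```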

Before turning to a consideration of linear algebraic equations and to the application of the theory of determinants to their solution, a brief discussion of matrix algebra and the fundamental operations involving matrices is necessary. In this chapter the fundamental definitions and the most important properties of matrices will be outlined.

7 DEFINITION OF A MATRIX

By a square matrix [a] of order n is meant a system of elements that may be real or complex numbers arranged in a square formation of n rows and columns. That is, the symbol [a] stands for the array

images

where aij denotes the element standing in the ith row and jth column. The determinant having the same elements as the matrix [a] is denoted by |a| and is called the determinant of the matrix.

Besides square arrays like (7.1) we shall have occasion to use rectangular arrays or matrices of m rows and n columns. Such arrays will be called matrices of order (m × n). When there is only one row, so that m = 1, the matrix will be termed a vector of the first kind, or a prime. As an example, we have

images

On the other hand, a matrix of a single column of n elements will be termed a vector of the second kind, or a point. To save space, it will be printed horizontally and not vertically and will be denoted by parentheses thus:

images

TRANSPOSITION OF MATRICES

The accented matrix [a]′ = [aji] obtained by a complete interchange of rows and columns in [a] is called the transposed matrix of [a]. The ith row of [a]′ is identical with the ith column of [a]. For vectors we have

images

For rectangular matrices the transpose is

images

8 SPECIAL MATRICES

    a.    Square matrix: If the numbers of rows and of columns of a matrix are equal to n, then such a matrix is said to be a square matrix of order n, or simply a matrix of order n.

    b.    Diagonal matrix: If all the elements other than those in the principal diagonal are zero, then the matrix is called a diagonal matrix.

    c.    The unit matrix: The unit matrix of order n is defined to be the diagonal matrix of order n which has 1’s for all its diagonal elements. It is denoted by Un or simply by U when its order is apparent.

    d.    Symmetrical matrices: If aij = aji for all i and j, the matrix [a] is said to be symmetrical and it is identical to its transposed matrix. That is, if [a] is symmetrical, then [a]′ = [a], and conversely.

    e.    Skew and skew-symmetric matrices: If aij = —aji for all unequal i and j, but the elements aii are not all zero, then the matrix is called a skew matrix.

If aij = —aji for all i and j so that all aii = 0, the matrix is called a skew-symmetric matrix. It may be noted that both symmetrical and skew matrices are necessarily square.

    f.    Null matrices: If a matrix has all its elements equal to zero, it is called a null matrix and is represented by [0].

9 EQUALITY OF MATRICES; ADDITION AND SUBTRACTION

It is apparent from what has been said concerning matrices that a matrix is entirely different from a determinant. A determinant is a symbolic representation of a certain homogeneous polynomial formed from the elements of the determinant as described in Sec. 4. A matrix, on the other hand, is merely a square or rectangular array of quantities. By defining certain rules of operation that prescribe the manner in which these arrays are to be manipulated, a certain algebra may be developed that has a formal similarity to ordinary algebra but involves certain operations that are performed on the elements of the matrices. It is to a consideration of these fundamental definitions and rules of operation that we now turn.

a EQUALITY OF MATRICES

The concept of equality is fundamental in algebra and is likewise of fundamental importance in matrix algebra. In matrix algebra two matrices [a] and [b] of the same order are defined to be equal if and only if their corresponding elements are identical; that is, we have

images

provided that

images

b ADDITION AND SUBTRACTION

If [a] and [b] are matrices of the same order, then the sum of [a] and [b] is defined to be a matrix [c], the typical element of which is cij = aij + bij. In other words, by definition

images

provided

images

In a similar manner we have

images

provided that

images

10 MULTIPLICATION OF MATRICES

a SCALAR MULTIPLICATION

By definition, multiplication of a matrix [a] by an ordinary number or scalar k results in a new matrix [b] defined by

images

where

images

That is, by definition, the multiplication of a matrix by a scalar quantity yields a new matrix whose elements are obtained by multiplying the elements of the original matrix by the scalar multiplier.

b MATRIX MULTIPLICATION

The definition of the operation of multiplication of matrices by matrices differs in important respects from ordinary scalar multiplication. The rule of multiplication is such that two matrices can be multiplied only when the number of columns of the first is equal to the number of rows of the second. Matrices that satisfy this condition are termed conformable matrices.

Definition The product of a matrix [a] by a matrix [b] is defined by the equation

images

where

images

and the orders of the matrices [a], [b], and [c] are (m × p), (p × n), and (m × n), respectively. As an example of this definition, let us consider the multiplication of the matrix

images

by

images

By applying the definition we obtain

images

If the matrices are square and each of order n, then the corresponding relation |a| |b| = |c| is true for the determinants of the matrices [a], [b], and [c]. The proof of this frequently useful result will be omitted.

Since matrices are regarded as equal only when they are element for element identical, it follows that, since a row-by-column rule will, in general, give different elements than a column-by-row rule, the product [b][a], when it exists, is usually different from [a][b]. Therefore it is necessary to distinguish between premultiplication, as when [b] is premultiplied by [a] to yield the product [a][b], and postmultiplication, as when [b] is postmultiplied by [a] to yield the product [b][a]. If we have the equality

images

the matrices [a] and [b] are said to commute or to be permutable. The unit matrix U, it may be noted, commutes with any square matrix of the same order. That is, we have for all such [a]

images
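
The following brief NumPy sketch, with arbitrarily chosen matrices, illustrates both statements: the products [a][b] and [b][a] are in general different, while the unit matrix commutes with any square matrix of the same order.

```python
# Matrix products generally do not commute, but the unit matrix commutes
# with any square matrix of the same order.
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])
U = np.eye(2, dtype=int)

print(a @ b)     # [[2 1], [4 3]]
print(b @ a)     # [[3 4], [1 2]]  -- different from a @ b
print(np.array_equal(U @ a, a) and np.array_equal(a @ U, a))   # True
```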

c CONTINUED PRODUCTS OF MATRICES

Except for the noncommutative law of multiplication (and therefore of division, which is defined as the inverse operation), all the ordinary laws of algebra apply to matrices. Of particular importance is the associative law of continued products,

images

which allows one to dispense with parentheses and to write [a][b][c] without ambiguity, since the double summation

images

can be carried out in either of the orders indicated.

It must be noted, however, that the product of a chain of matrices will have meaning only if the adjacent matrices of the chain are conformable.

d POSITIVE POWERS OF A SQUARE MATRIX

If n factors of a square matrix [a] are multiplied together, the resultant matrix is defined as [a]n. That is,

images

11 MATRIX DIVISION; THE INVERSE MATRIX

If the determinant |a| of a square matrix [a] does not vanish, [a] is said to be nonsingular and possesses a reciprocal, or inverse, matrix [R] such that

images

where U is the unit matrix of the same order as [a].

a THE ADJOINT MATRIX OF A MATRIX

Let Aij denote the cofactor of the element aij in the determinant |a| of the matrix [a]. Then the matrix [Aji] is called the adjoint of the matrix [a]. This matrix exists whether [a] is singular or not. Now by (4.8) we have

images

It is thus seen that the product of [a] and its adjoint is a special type of diagonal matrix called a scalar matrix. Each diagonal element (i = j) is equal to the determinant |a|, and the other elements are zero.

If |a| ≠ 0, we may divide through by the scalar |a| and hence obtain at once the required form of [R], the inverse of [a]. From (11.2) we thus have

images

Therefore, comparing this with (11.1) we see that

images

The notation [a]–1 is introduced to denote the inverse of [a]. By actual multiplication it may be proved that

images

so that the name reciprocal and the notation [a]–1 are justified.

If a square matrix is nonsingular, it possesses a reciprocal, and multiplication by the reciprocal is in many ways analogous to division in ordinary algebra. As an illustration, let it be required to obtain the inverse of the matrix

images

Let |a| denote the determinant of [a]. The next step in the process is to form the transpose of [a].

images

We now replace the elements of [a]′ by their corresponding cofactors and obtain

images

This inverse [a]–1 is therefore

images
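
The adjoint-over-determinant construction just illustrated may be sketched in Python as follows, using an assumed 3 × 3 matrix (the numerical example of the text is not reproduced above).

```python
# Inverse by the adjoint: replace each element by its cofactor, transpose,
# and divide by the determinant, as in Eq. (11.4).  The matrix is assumed.
import numpy as np

def cofactor_matrix(a):
    """Array whose (i, j) entry is the cofactor A_ij of a[i, j]."""
    n = a.shape[0]
    C = np.empty_like(a, dtype=float)
    for i in range(n):
        for j in range(n):
            m = np.delete(np.delete(a, i, axis=0), j, axis=1)   # the minor M_ij
            C[i, j] = (-1) ** (i + j) * np.linalg.det(m)
    return C

a = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
adjoint = cofactor_matrix(a).T          # adjoint = transposed matrix of cofactors
a_inv = adjoint / np.linalg.det(a)      # valid only when |a| != 0

print(np.allclose(a @ a_inv, np.eye(3)))   # True
```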

As has been mentioned, multiplication by an inverse matrix plays the same role in matrix algebra that division plays in ordinary algebra. That is, if we have

images

where [a] is a nonsingular matrix, then, on premultiplying by [a]-1, the inverse of [a], we obtain

images

or

images

b NEGATIVE POWERS OF A SQUARE MATRIX

If [a] is a nonsingular matrix, then negative powers of [a] are defined by raising the inverse matrix of [a], [a]–1, to positive powers. That is,

images

12 THE REVERSAL LAW IN TRANSPOSED AND RECIPROCATED PRODUCTS

One of the fundamental consequences of the noncommutative law of matrix multiplication is the reversal law exemplified in transposing and reciprocating a continued product of matrices.

a TRANSPOSITION

Let [a] be a (p × n) matrix, that is, one having p rows and n columns, and let [b] be an (m × p) matrix. Then the product [c] = [b][a] is an (m × n) matrix of which the typical element is

images

When transposed, [a] and [b] become (n ×p) and (p × m) matrices; they are now conformable when multiplied in the order [a]′[b]′. This product is an (n × m) matrix, which may be readily seen to be the transpose of [c] since its typical element is

images

It thus follows that when a matrix product is transposed the order of the matrices forming the product must be reversed. That is,

images

and similarly

images

b RECIPROCATION

Let us suppose that in the equation [c] = [b][a] the matrices are square and nonsingular. If we premultiply both sides of the equation by [a]–1[b]–1 and postmultiply by [c]–1, then we obtain [a]–1[b]–1 = [c]–1. We thus get the rule that

images

Similarly

images

13 PROPERTIES OF DIAGONAL AND UNIT MATRICES

Suppose that [a] is a square matrix of order n and [b] is a diagonal matrix, that is, a matrix that has all its elements zero with the exception of the elements in the main diagonal, and is of the same order as [a]. Then, if [c] = [b][a], we have

images

since bir = 0 unless r = i.

It is thus seen that premultiplication by a diagonal matrix multiplies every element of the ith row of [a] by the corresponding diagonal element bii. It can be similarly shown that postmultiplication by a diagonal matrix multiplies every element of a given column of a matrix by the corresponding diagonal element. The unit matrix plays the same role in matrix algebra that the number 1 does in ordinary scalar algebra.

14 MATRICES PARTITIONED INTO SUBMATRICES

It is sometimes convenient to extend the use of the fundamental laws of combinations of matrices to the case where a matrix is regarded as constructed from elements that are submatrices or minor matrices of elements. As an example, consider

images

This can be written in the form

images

where

images

In this case the diagonal submatrices [P] and [S] are square, and the partitioning is diagonally symmetrical. Let [b] be a square matrix of the third order that is similarly partitioned.

images

Then, by addition and multiplication, we have

images

images

as may be easily verified.

In each case the resulting matrix is of the same order and is partitioned in the same way as the original matrix factors. As has been stated in Sec. 10, a rectangular matrix [b] may be premultiplied by another rectangular matrix [a] provided the two matrices are conformable, that is, the number of rows of [b] is equal to the number of columns of [a]. Now, if [a] and [b] are both partitioned into submatrices such that the grouping of columns in [a] agrees with the grouping of rows in [b], it can be shown that the product [a][b] may be obtained by treating the submatrices as elements and proceeding according to the multiplication rule.

15 MATRICES OF SPECIAL TYPES

    a.    Conjugate matrices: To the operations [a]′ and [a]–1, defined by transposition and inversion, may be added another one. This operation is denoted by [ā] and implies that if the elements of [a] are complex numbers, the corresponding elements of [ā] are their respective complex conjugates. The matrix [ā] is called the conjugate of [a].

    b.    The associate of [a]: The transposed conjugate of [a], that is, ([ā])′, is called the associate of [a].

    c.    Symmetric matrix: If [a]′ = [a], the matrix [a] is symmetric.

    d.    Involutory matrix: If [a] = [a]–1, the matrix [a] is involutory.

    e.    Real matrix: If [a] = [ā], [a] is a real matrix.

    f.    Orthogonal matrix: If [a]′ = [a]–1, [a] is an orthogonal matrix.

    g.    Hermitian matrix: If [a] = ([ā])′, [a] is a Hermitian matrix.

    h.    Unitary matrix: If ([ā])′ = [a]–1, [a] is unitary.

    i.    Skew-symmetric matrix: If [a]′ = —[a], [a] is skew symmetric.

    j.    Pure imaginary: If [ā] = —[a], [a] is pure imaginary.

    k.    Skew Hermitian: If ([ā])′ = —[a], [a] is skew Hermitian.

16 THE SOLUTION OF LINEAR ALGEBRAIC EQUATIONS

Prior to discussing the solution of a system of linear algebraic equations, the rank of a matrix must be defined. The rank of a matrix is equal to the order of the largest nonvanishing determinant contained within the matrix. Thus the ranks of

images

and

images

are 2, 0, 2, 3, and 2, respectively.

The concept of rank is not restricted to square matrices, as can be seen in the third example above. For the case of square matrices, it is easily seen that if the rank equals the order of the matrix, then it necessarily has an inverse.

Turning now to the subject of this section, we shall consider a nonhomogeneous system of m linear algebraic equations in n unknowns:

images

Using matrix notation, the above system reduces to

images

where [A] = m × n coefficient matrix

images

Another useful matrix is the m × (n + 1) augmented matrix defined by

images

This particular matrix is formed by attaching the vector of known constants as an additional column on the coefficient matrix.

The system of equations defined by (16.2) is said to be consistent if a solution vector {x} exists which satisfies the system. If no solution exists, then the system is said to be inconsistent. We are now at a point where we may present the conditions, without proof, under which the system (16.2) will be consistent. These conditions may be grouped into the following three categories:

    1.    Rank of [A] = rank of [A,b] = images: This situation yields a solution for any r unknowns in terms of the n — r remaining ones.

    2.    Rank of [A] = rank of [A,b] = r = n: This situation yields a unique solution for the n unknowns.

    3.    Rank of [A] = rank of [A,b] = r < n: A solution always exists in this case.

It should be noted that in each of the above conditions the ranks of the coefficient and augmented matrices were equal. This in fact is the basic and only requirement for the existence of a solution to (16.2). Reiterating this statement, it can be proved that a system of linear algebraic equations is consistent if and only if the rank of the coefficient matrix equals the rank of the augmented matrix.
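
The consistency criterion just stated is easy to apply numerically. The following Python sketch, with an invented coefficient matrix, compares the rank of [A] with the rank of the augmented matrix [A,b] for a consistent and an inconsistent right-hand side.

```python
# Consistency test: the system [A]{x} = {b} is consistent exactly when
# rank [A] equals the rank of the augmented matrix [A, b].  Invented numbers.
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])              # the second row is twice the first: rank 1
b_consistent = np.array([[1.], [2.]])
b_inconsistent = np.array([[1.], [5.]])

def is_consistent(A, b):
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.hstack([A, b]))

print(is_consistent(A, b_consistent))     # True  (infinitely many solutions)
print(is_consistent(A, b_inconsistent))   # False (no solution)
```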

17 THE SPECIAL CASE OF n EQUATIONS AND n UNKNOWNS

When m = n in (16.2), we have the special case of a system of n linear algebraic equations and n unknowns. As in any case, a solution will exist only when the ranks of the coefficient and augmented matrices are equal. For convenience we shall assume that the ranks of both of these matrices equal n. This allows us to premultiply by the inverse of the coefficient matrix to obtain the solution.

Suppose we have the set of n linear algebraic equations in n unknowns,

images

The quantities ej, Zij, and ij may be real or complex numbers. If the relation (17.1) exists between the variables ej and ij, then the set of variables ej is said to be derived from the set ij by a linear transformation. The whole set of Eqs. (17.1) may be represented by the single matrix equation

images

where {e} and {i} are column matrices whose elements are the variables ej and ij. The square matrix [Z] is called the matrix of the transformation. When [Z] is nonsingular, Eq. (17.2) may be premultiplied by [Z]–1 and we obtain

images

We thus have a very convenient explicit solution for the unknowns (i1, i2, . . . ,in). The elements of the matrix [Z]–1 may be calculated by Eq. (11.4). As a simple example, let us solve the system of Eqs. (2.1) by using matrix notation. If we introduce the matrices

images

then Eqs. (2.1) may be written in the form

images

and if [a] is nonsingular, the solution is

images

But by (11.4) we have

images

Hence, carrying out the matrix multiplication expressed in (17.6), we obtain

images

Although the solution of a set of linear equations is very elegantly and concisely expressed in terms of the inverse matrix of the coefficients, it frequently happens in practice that it is necessary to solve for only one or two of the unknowns.

In this case it is frequently more convenient to use Cramer’s rule to obtain the solution. The proof of Cramer’s rule follows from the properties of the determinant expressed by Eq. (4.8) and will not be given. Let us consider the set of n equations in the unknowns (x1, x2, . . . ,xn),

images

Then if the determinant of the system

images

the system of Eqs. (17.9) has a unique solution given by

images

where Dr is the determinant formed by replacing the elements

images

of the rth column of |a| by (k1, k2, . . . , kn), respectively.

For example, let us solve the system

images

We have

images

Hence

images
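
A compact Python sketch of Cramer’s rule (17.11) follows. Since the numerical system of the example above is not reproduced here, an assumed 3 × 3 system is used instead; the rth determinant Dr is formed by replacing the rth column of the coefficient array by the constants.

```python
# Cramer's rule (17.11) for an assumed system:
#   2*x1 +   x2 -   x3 = 3
#     x1 + 3*x2 + 2*x3 = 4
#     x1         +  x3 = 1
import numpy as np

a = np.array([[2., 1., -1.],
              [1., 3.,  2.],
              [1., 0.,  1.]])
k = np.array([3., 4., 1.])

D = np.linalg.det(a)                 # must not vanish
x = []
for r in range(3):
    Dr = a.copy()
    Dr[:, r] = k                     # replace the rth column by the constants
    x.append(np.linalg.det(Dr) / D)

print(x)                             # -> [1.0, 1.0, 0.0] (within rounding)
print(np.allclose(a @ np.array(x), k))   # True
```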

At the risk of repeating what has already been stated in Sec. 16, we shall review some special cases that may arise when working with a system of n equations and n unknowns.

    1.    If the determinant |a| ≠ 0, there exists a unique solution that is given by Cramer’s rule, that is, (17.11).

    2.    If |a| is of rank r < n and any of the determinants D1, D2, . . . , Dn of (17.11) is of rank greater than r, there is no solution.

    3.    If |a| is of rank r < n and the rank of none of D1, D2, . . . , Dn exceeds r, then there exist infinitely many sets of solutions.

18 SYSTEMS OF HOMOGENEOUS LINEAR EQUATIONS

If in Eq. (16.2) the vector {b} is a null vector, that is {b} = {0}, then (16.2) reduces to the following system of homogeneous linear equations:

images

Regardless of the order of [A], this system is consistent; hence a solution always exists. By inspection it is easy to see that the solution vector {x} = {0} satisfies (18.1) identically. This particular solution is called the trivial solution. In most cases this solution is of little interest to the analyst, and other solutions called nontrivial solutions are sought.

In the most general case the matrix [A] is of order m × n; however, it usually is n × n. In either case it may be stated that if the rank of [A] is less than n, then a nontrivial solution to (18.1) exists. In the case when the rank of the coefficient matrix is n, the only solution that exists is the trivial one.

A few examples of solutions to homogeneous systems are given below.

Example 1

images

The rank of [A] in this case is 2 and n = 2, hence only the trivial solution exists.

Example 2

images

In this case the rank of the coefficient matrix is 2, but n = 3; thus a nontrivial solution exists. To obtain the solution we proceed as follows. Since the rank of [A] is 2 and there are three unknowns, any two of the unknowns may be solved for in terms of the third. This may be accomplished in a variety of ways, one of which is the following.

By partitioning, write (18.3) as

images

If we premultiply this equation through by

images

and solve for the vector

images

we find the following solution:

images

or

images

The complete solution may be written in vector form as

images

where c represents an arbitrary constant.
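
The same procedure can be carried out numerically. The sketch below uses an invented 2 × 3 coefficient matrix of rank 2 (not the matrix of Example 2, which is not reproduced above) and solves for x1 and x2 in terms of x3 by the partitioning just described.

```python
# Nontrivial solution of a homogeneous system with an invented 2 x 3
# coefficient matrix of rank 2: solve for x1, x2 in terms of x3.
import numpy as np

A = np.array([[1., 2., 1.],
              [2., 1., 4.]])

A1 = A[:, :2]                        # square block multiplying (x1, x2)
A2 = A[:, 2:]                        # column multiplying x3
# Partitioned form:  A1 {x1, x2} + A2 x3 = {0}  =>  {x1, x2} = -A1^(-1) A2 x3
coeff = -np.linalg.inv(A1) @ A2
print(coeff.ravel())                 # x1 and x2 per unit of x3
# Complete solution: {x} = c * (coeff[0], coeff[1], 1) for an arbitrary constant c.
```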

19 THE CHARACTERISTIC MATRIX AND THE CHARACTERISTIC EQUATION OF A MATRIX

If μ is a scalar parameter and [M] is a square matrix of order n, the matrix

images

where U is the nth-order unit matrix, is called the characteristic matrix [K] of the matrix [M]. The determinant of the characteristic matrix, det [K], is called the characteristic function of [M]. That is,

images

is the characteristic function of the matrix [M].

The equation

images

is the characteristic equation of the matrix [M]. In general, if [M] is a square matrix of the nth order, its characteristic equation is an equation of the nth degree in μ.

images

The n roots of this equation are called the characteristic roots, or eigenvalues, of the matrix [M].

20 EIGENVALUES AND THE REDUCTION OF A MATRIX TO DIAGONAL FORM

In matrix algebra, when a matrix contains only one column, it is called a column vector. If {x} is a column vector of the nth order and [M] is an nth-order square matrix, the product

images

expresses the fact that the vector {y} is derived from the vector {x} by a linear transformation. The square matrix [M] is called the transformation matrix. If it is required that

images

then (20.1) reduces to

images

and the only effect of the transformation matrix [M] on the vector {x} is to multiply it by the scalar factor μ. Equation (20.3) may be written in the form

images

This equation represents a set of homogeneous linear equations in the unknowns x1, x2, . . . , xn. Except in the trivial case where all the elements of {x} are zero, the determinant of [K] must vanish. Conversely, if |K| = 0, there exist nontrivial solutions. In order for this to be the case, the scalar μ can take only the values μ1, μ2, . . . , μn which are the roots of the characteristic equation (19.3). The possible values of μ which satisfy (20.3) are thus the eigenvalues of [M]. The nontrivial solutions of (20.4) associated with these eigenvalues are called eigenvectors of [M].

To each eigenvalue μi there corresponds an equation of the form (20.3) which can be written as follows:

images

where {x}i is any eigenvector associated with μi.

If (20.5) is premultiplied by [M], it becomes

images

By repeated multiplications by [M] it can be shown that

images

Let the ith eigenvector be denoted by

images

and let [x] be the square matrix formed by grouping the column eigenvectors so that

images

If [d] denotes the square diagonal matrix

images

it can be seen that the set of n equations (20.5) may be written in the form

images

If the n eigenvalues of [M] are distinct, it can be shown that the matrix [x] is nonsingular and possesses an inverse [x]–1. Hence (20.11) may be written in the form

images

or

images

It is thus seen that a matrix [x] which diagonalizes [M] may be found by grouping the eigenvectors of [M] into a square matrix. This may be done when the eigenvalues of [M] are all different. When two or more of the eigenvalues of [M] are equal to each other, reduction to the diagonal form is not always possible.
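
The reduction to diagonal form is easily checked numerically. In the NumPy sketch below, the columns of [x] are the eigenvectors returned by the library routine, and [x]–1[M][x] is verified to be the diagonal matrix of the (distinct) eigenvalues; the matrix [M] is an assumed example.

```python
# Diagonalization check: with the eigenvectors of [M] grouped as the columns
# of [x], the product [x]^(-1) [M] [x] is the diagonal matrix of eigenvalues.
import numpy as np

M = np.array([[4., 1.],
              [2., 3.]])             # distinct eigenvalues 5 and 2
eigvals, x = np.linalg.eig(M)        # columns of x are the eigenvectors

d = np.linalg.inv(x) @ M @ x
print(np.round(d, 10))               # diagonal, with the eigenvalues on the diagonal
print(eigvals)
```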

21 THE TRACE OF A MATRIX

The sum of all the diagonal elements of a square matrix is called the trace of the matrix and is denoted by the following expression:

images

There exist some interesting relations between the trace of a matrix and the characteristic equation of the matrix. The characteristic equation of [M] has the form

images

It can be shown that

images

If μ1, μ2, . . . , μn are the eigenvalues of [M], then as a result of (21.3) we have

images

It can be shown that the product of the eigenvalues is given by the equation

images

Maxime Bôcher has shown that there is an intimate connection between the traces of powers of a matrix and the coefficients of its characteristic equation (21.2). These results are of importance in obtaining the characteristic equations of matrices and in the computation of the inverses of matrices. The results of Bôcher are deduced in his book on higher algebra and are given here without proof.

The characteristic equation of a square matrix [M] of the nth order is, in general, an equation of the nth degree as given by (21.2). Bôcher’s results relate the coefficients ci of the characteristic equation of [M] to the traces of powers of [M]. Let

images

Then the coefficients c1, c2, . . . , cn of the characteristic equation are

images

These equations enable one to obtain the characteristic equation of a matrix in a more efficient manner than by the direct expansion of det [μU — M]. If the determinant of [M] itself is desired, it can be calculated by the use of the following equation:

images

where cn is as given above.
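
A Python sketch of the trace method follows. It uses the Newton-identity recursion, which is the standard form of these trace relations; whether it matches the exact arrangement of (21.7) cannot be checked here, since that display is not reproduced, and it assumes the characteristic polynomial is written μ^n + c1μ^(n–1) + · · · + cn.

```python
# Characteristic-polynomial coefficients from traces of powers of [M],
# using the Newton-identity recursion; the polynomial is taken in the form
#   mu**n + c1*mu**(n-1) + ... + cn,
# and det [M] = (-1)**n * cn.
import numpy as np

def char_poly_coeffs(M):
    n = M.shape[0]
    traces = []
    P = np.eye(n)
    for _ in range(n):
        P = P @ M
        traces.append(np.trace(P))                    # T_k = tr(M**k)
    c = []
    for k in range(1, n + 1):
        s = traces[k - 1] + sum(c[i] * traces[k - 2 - i] for i in range(k - 1))
        c.append(-s / k)                              # c_k = -(T_k + c1*T_{k-1} + ...)/k
    return c

M = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
c = char_poly_coeffs(M)
print(c)                                              # [c1, c2, c3]
print(np.allclose(c, np.poly(M)[1:]))                 # agrees with NumPy's result
print((-1) ** 3 * c[-1], np.linalg.det(M))            # both give det [M]
```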

22 THE CAYLEY-HAMILTON THEOREM

One of the most important theorems of the theory of matrices is the Cayley-Hamilton theorem. By the use of this theorem many properties of matrices may be deduced, and functions of matrices can be simplified. The Cayley-Hamilton theorem is frequently stated in the following terse manner: Every square matrix satisfies its own characteristic equation in a matrix sense. This statement means that if the characteristic equation of the nth-order matrix [M] is given by

images

the Cayley-Hamilton theorem states that

images

where U is the nth-order unit matrix.

A heuristic proof of this theorem may be given by the use of the following result of Sec. 20:

images

where μi is the ith eigenvalue of [M] and {x}i is the corresponding eigenvector. The characteristic function of [M] is given by

images

If in (22.4) the scalar μ is replaced by the square matrix [M] and the constant term cn is multiplied by the nth-order unit matrix U, the result may be written in the following form:

images

If (22.5) is postmultiplied by the eigenvector {x}i, the result is

images

As a consequence of (22.3), this may be written in the following form:

images

However, since μi is one of the eigenvalues of [M], we have

images

Hence (22.7) reduces to

images

The set of equations (22.9) may be written as the single equation

images

where [x] is the square matrix (20.9) and [0] is an nth-order square matrix of zeros. If [M] has n distinct eigenvalues, the matrix [x] is nonsingular and has an inverse. If (22.10) is postmultiplied by [x]–1, the result is

images

This important result is the Cayley-Hamilton theorem. It may be shown that the Cayley-Hamilton theorem holds for singular matrices and for matrices that have repeated eigenvalues.

If the matrix [M] is nonsingular, its inverse may be computed by the use of the Cayley-Hamilton theorem in the following manner: Expand (22.11) in the form

images

If this equation is premultiplied by [M]–1, the result is

images

Hence, if this equation is solved for [M]–1, the result is

images

This result expresses the inverse of [M] in terms of the n — 1 powers of [M] and has been suggested as a practical method for the computation of the inverses of large matrices. Hotelling gives the following application of (22.14) to the computation of the inverse of a numerical matrix: Let the matrix to be inverted be

images

Then

images

images

images

From the diagonals of the above matrices, the following traces may be calculated:

images

By means of (21.7) the following coefficients of the characteristic equation of [M] can be computed:

images

The characteristic equation of [M] is therefore

images

The inverse of [M] is now computed by the use of (22.14) and is given by

images

If the determinant of [M] is desired, it may be obtained from the equation

images

This method for the computation of the inverse of a matrix is straightforward and easily checked and is ideally adapted to matrix multiplication by means of punched cards. As a by-product of the computation the characteristic equation and the determinant of the matrix are obtained.
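
The inversion formula (22.14) is easy to program once the coefficients ci are known. The Python sketch below assumes the characteristic polynomial is written μ^n + c1μ^(n–1) + · · · + cn, obtains the coefficients from NumPy’s characteristic-polynomial routine rather than from the traces, and uses an invented matrix in place of Hotelling’s example.

```python
# Cayley-Hamilton inversion: with the characteristic polynomial
#   mu**n + c1*mu**(n-1) + ... + cn,
# Eq. (22.14) gives  [M]^(-1) = -([M]^(n-1) + c1*[M]^(n-2) + ... + c_{n-1}*U) / cn.
import numpy as np

def cayley_hamilton_inverse(M, coeffs):
    """coeffs = [c1, ..., cn]; requires cn != 0, i.e. [M] nonsingular."""
    n = M.shape[0]
    acc = np.eye(n)                        # builds M^(n-1) + c1*M^(n-2) + ... + c_{n-1}*U
    for ck in coeffs[:-1]:
        acc = acc @ M + ck * np.eye(n)     # Horner form of the matrix polynomial
    return -acc / coeffs[-1]

M = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
c = np.poly(M)[1:]                         # [c1, c2, c3] of the monic characteristic polynomial
print(np.allclose(cayley_hamilton_inverse(M, c) @ M, np.eye(3)))   # True
```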

23 THE INVERSION OF LARGE MATRICES

With the development of many powerful modern instruments of calculation such as punched-card machines, electronic digital-computing machines, analogue computers, etc., the field of application of mathematical analysis has vastly increased. Many diversified problems in various fields may be reduced to one mathematical operation: that of the solution of systems of linear algebraic equations. A few of the problems that may be reduced to this mathematical common denominator are the following:

    1.    The distribution of electrical currents in multimesh electrical circuits.

    2.    The distribution of the rate of flow of water in complex hydraulic systems.

    3.    The approximate solution of potential problems.

    4.    The solution of the normal equations in the application of the method of least squares to various physical investigations.

    5.    The applications of statistical studies to psychology, sociology, and economics.

    6.    The determination of the amplitudes of motion in the forced vibrations of mechanical systems.

    7.    The approximate solution of certain problems of electromagnetic radiation.

The above list is far from complete, but it gives an indication of the diversified classes of problems that may be reduced to the solution of a system of linear equations.

It may be thought that the problem of solving systems of linear equations is an old one and that it can be effected by means of well-known techniques such as Cramer’s rule (Sec. 17). However, in studying problems of the type enumerated above, it is frequently necessary to solve 150 or 200 simultaneous equations. This task presents practically insurmountable difficulties if undertaken by well-known elementary methods. In the past few years many powerful and apparently unrelated methods have been devised to perform this task. A very complete survey of these methods, with a bibliography containing 131 references, will be found in the paper by Forsythe.

The magnitude of the task of solving a large system of linear equations by conventional elementary methods and the spectacular reduction of effort that may be effected by the use of modern techniques are vividly presented by Forsythe. He points out that the conventional Cramer’s-rule procedure for solving n simultaneous equations requires the evaluation of n + 1 determinants. If these determinants are evaluated by the Laplace expansion (Sec. 4), this requires n! multiplications per determinant. It is thus apparent that the conventional determinantal method of solution requires (n + 1)! multiplications. Therefore, if it is required to solve 26 equations, it would, in general, require (n + 1)! = 27! ≈ 10^28 multiplications. A typical high-speed digital machine such as the IBM/360 model 75 automatic computer can perform 133,000 multiplications per second. Hence, it would take this machine approximately 10^14 years to effect the solution of this problem by the use of the conventional Cramer’s-rule technique. By more efficient techniques the computational procedure may be arranged so that only about n^3/3 multiplications are required to solve the system of n equations. If this efficient method is used, 6,000 multiplications would be required, and the IBM machine can solve the problem in about 0.05 sec.

The detailed theory of the various methods that have been developed in recent years will be found in the above survey article by Forsythe and should be consulted by those interested in the solution of large numbers of simultaneous linear equations. A simple method that has been found valuable and that makes use of partitioned matrices will now be described.

Let the matrix whose inverse is desired be the matrix [M], of order n. Then, by the methods of Sec. 14, the matrix [M] can be partitioned into four submatrices in the following manner:

images

The orders of the various matrices are indicated by the letters in parentheses. The first letter denotes the number of rows and the second letter the number of columns. Since the original matrix is of order n, we have the relation

images

Now let the reciprocal of [M] be the matrix [B], and let it be correspondingly partitioned so that we have

images

Since [B] is the inverse of [M], we have

images

where Un is the unit matrix of order n. If the multiplication indicated by (23.4) is carried out, we have

images

These equations may, in general, be solved to give the submatrices of the required matrix [B] explicitly. The results may be expressed as follows: Let

images

We then have explicitly

images

These equations determine [B] = [M]–1 provided that the reciprocals images and θ–1 exist. If images is known, then the matrices X, Y, and θ can be calculated by matrix multiplication and [B] can be determined by the use of (23.7). The choice of the manner in which the original matrix [M] is partitioned depends on the particular matrix to be inverted. From the above description of the method it is obvious that it lends itself to various modifications. The method of Sec. 22 may advantageously be applied to find images and θ–1. By this procedure the inversion of a large matrix may be reduced to the inversion of several smaller matrices.

CHOLESKI’S METHOD

Another very effective method for the inversion of matrices is Choleski’s method. In its simplest form it applies to the inversion of symmetric matrices and is simple in theory and straightforward in practice. This method is sometimes called the square-root method, and an excellent exposition of this method is given in the paper by Fox, Huskey, and Wilkinson. Choleski’s method consists simply in expressing the symmetric matrix [M] whose inverse is desired in the form

images

where [L] is a lower triangular matrix and [L]t, its transpose, an upper triangular matrix. Then the desired inverse is given by

images

and the problem is solved. To fix the fundamental ideas involved, consider the case of a third-order symmetric matrix [M]. We then have to find the third-order matrices [L] and [L]t by means of the equation

images

By matrix multiplication of [L][L]t the following result is obtained:

images

By equating the elements of [L][L]t to those of [M] and solving the resulting equations, one finds a = 1, b = 2, c = 3, d = j, e = 2j, f = j. In order to determine [M]–1 by (23.9), it is necessary to invert [L]t. This is easily done because ([L]t)–1 is also an upper triangular matrix of the form

images

The elements of ([L]t)–1 may be determined by the equation

images

From (23.13) we find that P = 1, Q = 2j, R = –j, S = –j, T = 2j, U = –j. The required inverse of [M] is given by

images

The work involved in the Choleski method appears formidable in the above example but in practice can be greatly reduced. The method can be extended to the computations of inverses of unsymmetric matrices.
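
A minimal NumPy sketch of the square-root idea is given below for an assumed symmetric positive definite matrix: the library supplies the triangular factor [L], only the transposed factor is inverted, and the inverse of [M] is assembled from it, in the spirit of (23.8) and (23.9).

```python
# The square-root (Choleski) idea for a symmetric positive definite [M]:
# factor [M] = [L][L]^t, invert only the transposed (upper triangular)
# factor, and assemble the inverse of [M].  The matrix is an assumed one.
import numpy as np

M = np.array([[4., 2., 2.],
              [2., 5., 3.],
              [2., 3., 6.]])

L = np.linalg.cholesky(M)                  # lower triangular, M = L @ L.T
Lt_inv = np.linalg.inv(L.T)                # inverse of the upper triangular factor
M_inv = Lt_inv @ Lt_inv.T                  # ([L]^t)^(-1) times its own transpose

print(np.allclose(M @ M_inv, np.eye(3)))   # True
```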

24 SYLVESTER’S THEOREM

A very important and useful theorem of matrix algebra is known as Sylvester’s theorem. This theorem states that, if the n eigenvalues μ1, μ2, . . . , μn of the square matrix [M] are all distinct and P([M]) is a polynomial in [M] of the following form:

images

where U is the nth-order unit matrix and the ck’s are constants, then the polynomial P([M]) may be expressed in the following form:

images

where

images

In (24.3) Δ(μ) is the characteristic function of [M] as defined by (19.2); [A(μ)] is the adjoint matrix of the characteristic matrix of [M] defined by

images

and

images

A proof of Sylvester’s theorem as well as the confluent form of the theorem in the case that the matrix [M] does not have distinct eigenvalues will be found in the book by Frazer, Duncan, and Collar.

Sylvester’s theorem is very useful for simplifying the calculations of powers and functions of matrices. For example, let it be required to find the value of [M]256, where

images

To do this, consider the special polynomial

images

and apply the results of Sylvester’s theorem. In this special case we have

images

and

images

The eigenvalues of [M] are μ1 = 1 and μ2 = 3. [A(μ)] is now given by

images

and

images

Hence we have

images

Therefore, by Sylvester’s theorem, we have

images

As the last step above suggests, Sylvester’s theorem is very useful in obtaining the approximate value of a matrix raised to a high power. Thus, if μm is the largest eigenvalue of the square matrix [M], then, if k is very large, the term P(μm)Z0(μm) will dominate the right-hand member of (24.2) and we have the following approximate value of [M]k:

images

This expression is useful in certain applications of matrix algebra to physical problems.

25 POWER SERIES OF MATRICES; FUNCTIONS OF MATRICES

A series of the type

images

where [M] is a square matrix of order n, the coefficients ci are ordinary numbers, and [M]0 = U, the nth-order unit matrix, is called a power series of matrices. The general question of the convergence of such series will not be discussed here; however, if [M] is a square matrix of order n, it is clear that by the use of the Cayley-Hamilton theorem (Sec. 22) it is possible to reduce any matrix polynomial of [M] to a polynomial of maximum degree n — 1 regardless of the degree of the original polynomial. Hence, if the series (25.1) converges, it may be expressed in the following form:

images

where

images

as a consequence of Sylvester’s theorem (Sec. 24). For example, it can be shown that the series

images

converges for any square matrix [M]. In particular, suppose that the matrix [M] is the following second-order matrix:

images

The characteristic equation of this matrix is

images

so that the eigenvalues of [M] are μ1 = 1, μ2 = 2. The matrix [A(μ)] is now

images

Since Δ′(1) = –1 and Δ′(2) = +1, we have as a consequence of (25.2)

images

Other functions of matrices such as trigonometric, hyperbolic, and logarithmic functions are defined by the same power series that define these functions for scalar values of their arguments and by replacing these scalar arguments by square matrices. For example, we have

images

images

images

images

Although every scalar power series has its matrix analogue, the matrix power series have, in general, more complex properties. The properties of the matrix functions may be deduced from their series expansions. For example, if t is a scalar variable and [u] is a square matrix of constants, we have

images

The series obtained by the term-by-term differentiation of (25.13) is

images

In the same way, it may be shown that, if

images

then

images

26 ALTERNATE METHOD OF EVALUATING FUNCTIONS OF MATRICES

In Sec. 25 we saw how Sylvester’s theorem could be used to evaluate functions of matrices. This process, though useful, is sometimes tedious in its application. In this section an alternate method is presented which is generally more convenient to use.

In Sec. 22 it was seen how any polynomial in a matrix of order n could be reduced to a polynomial of degree n — 1 by applying the Cayley-Hamilton theorem. Since any function of a matrix of order n may be represented by an infinite series, the Cayley-Hamilton theorem allows us to assert that any function of such a matrix may be expressed in terms of a polynomial of degree n — 1; that is,

images

where [A] is of order n.

The coefficients ai in (26.1) are unknown, and the problem is to determine them. Resorting again to the Cayley-Hamilton theorem, we assert that for each eigenvalue of [A], μi, we must have

images

Equation (26.2) yields n equations for the n unknowns ai. Employing matrix notation, (26.2) may be expressed as

images

or more simply

images

where the element Fi denotes F(μi) and [V] is the Vandermonde matrix of eigenvalues.

As long as all the eigenvalues are distinct, the matrix [V] has an inverse and hence we may solve for the unknowns ai:

images

As a first example let us evaluate

images

which was done in Sec. 25 by application of Sylvester’s theorem. According to (26.1), we may write

images

The eigenvalues of [M] are 1 and 2; thus by (26.2) we have

images

or

images

Inverting and solving for a0 and a1 yields

images

Substituting into (26.6) gives

images

As a second example consider the problem of evaluating

images

The reader may verify that the eigenvalues of [A] are 1, 2, and 3, respectively. By (26.1) we may write

images

Substituting into (26.2) yields

images

or in the form of (26.3)

images

Solving for the vector {a} gives

images

from which a0, a1 and a2 may be found. Substituting these values into (26.12) yields the final result:

images
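
The procedure of this section is easily mechanized. The Python sketch below evaluates a matrix function by solving the Vandermonde system for the coefficients ai; the function chosen is the exponential and the matrix is an assumed one with distinct eigenvalues.

```python
# Evaluating a matrix function by the method of this section: write
# F([A]) = a0*U + a1*[A] + ... + a_{n-1}*[A]^(n-1), impose F(mu_i) at each
# (distinct) eigenvalue, and solve the Vandermonde system for the a_i.
# The function and matrix below are assumed for illustration.
import numpy as np

def matrix_function(A, f):
    n = A.shape[0]
    mu = np.linalg.eigvals(A)                  # assumed distinct
    V = np.vander(mu, n, increasing=True)      # rows 1, mu_i, mu_i**2, ...
    a = np.linalg.solve(V, f(mu))              # coefficients a0, ..., a_{n-1}
    out = np.zeros_like(A, dtype=complex)
    P = np.eye(n)
    for ak in a:
        out = out + ak * P                     # accumulate a_k * A**k
        P = P @ A
    return out.real                            # real for a real matrix and real-analytic f

A = np.array([[1., 1.],
              [0., 2.]])                       # eigenvalues 1 and 2
print(np.round(matrix_function(A, np.exp), 6))
# -> [[e, e**2 - e], [0, e**2]], the exact exponential of this triangular matrix
```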

27 DIFFERENTIATION AND INTEGRATION OF MATRICES

Up to this point we have considered the elements of the matrices as constants, which is the most common situation. However, in the theory of matrix differential equations, cases frequently arise wherein the elements of the matrices are functions of the independent variable, e.g., time.

If, for example, we consider a matrix whose elements are functions of time, then we can define the derivative and integral of such a matrix as follows:

images

images

In other words, the derivative or integral of a matrix is the matrix of the derivatives or integrals of the individual elements. The requirement for the existence of the derivative or integral of a matrix is simply that the derivative or integral of every element must exist.

28 ASSOCIATION OF MATRICES WITH LINEAR DIFFERENTIAL EQUATIONS

A very important field of application of the theory of matrices is the solution of systems of linear differential equations. Matrix notation and operations with matrices lead to concise formulations of otherwise cumbersome procedures. This conciseness is very important in many practical problems that involve several dependent variables. Such problems arise in dynamics, the theory of vibrations of structures and machinery, the oscillations of electrical circuits, electric filters, and many other branches of physics and engineering.

In order to apply matrix notation to a set of differential equations, it is advisable to express the equations in forms that involve only the first-order differential coefficients. Any set of differential equations can be easily changed into another set that involves only first-order differential coefficients by writing the highest-order coefficient as the differential coefficient of a new dependent variable. All the other differential coefficients are regarded as separate dependent variables, and, for each new variable introduced, another equation must be added to the set. As a very simple example, consider the following equation:

images

In order to express this as a set of first-order equations, let x = x1 and

images

With this notation (28.1) becomes

images

Therefore the original second-order differential equation (28.1) may be written in the following matrix form:

images

This is a special case of the first-order matrix differential equation

images

where {x} is in general a column vector of the n elements x1, x2, . . . , xn and [A] is a square matrix of the nth order. A general set of homogeneous linear differential equations may be expressed in the form (28.5). In general the elements aij of the matrix [A] are functions of the independent variable t; however, in many special cases of practical importance the elements of [A] are constants.

It will be left as an exercise for the reader to show that the general linear nth-order differential equation

images

where

images

and its associated set of initial conditions

images

may be transformed into the linear first-order matrix differential equation

images

by the transformation

images

In (28.8) we have

images

For those cases where [A] is a constant matrix there are two convenient forms for the solution of (28.8). By differentiation one may verify that the solution may be expressed in the convenient matrix form

images

This solution is particularly convenient in cases where numerical solutions are desired.
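
As a sketch of this first form of the solution, the following Python fragment evaluates what is presumably {x}(t) = e^[A]t{x0} (Sec. 29 confirms this form for constant [A]) for an assumed second-order system, using SciPy’s matrix-exponential routine.

```python
# Sketch of the first form of the solution for constant [A], presumably
# {x}(t) = e^[A]t {x0} (cf. Sec. 29); SciPy's expm supplies the matrix
# exponential.  The system and initial vector are invented.
import numpy as np
from scipy.linalg import expm

A = np.array([[0., 1.],
              [-2., -3.]])           # x'' + 3x' + 2x = 0 in first-order form
x0 = np.array([1., 0.])              # unit initial displacement, zero initial velocity

for t in (0.0, 0.5, 1.0):
    print(t, expm(A * t) @ x0)

# analytical check for this case: x1(t) = 2*exp(-t) - exp(-2*t)
```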

The second form of the solution is commonly called an eigenvector expansion and may be derived in the following way. Consider the linear transformation from {x} to a new dependent vector {y} by means of the modal matrix of [A], [M]; that is, let

images

Substituting this relation into (28.8) and premultiplying by [M]–1 yields

images

Since [M] is the modal matrix of [A], the term [M]–1[A][M] is a diagonal matrix with the eigenvalues of [A] on the main diagonal:

images

The result is that the transformed system is uncoupled; that is, the kth row of (28.14) may be represented by the single scalar differential equation

images

whose general solution is

images

where ck is an arbitrary constant. Returning to (28.14), the general solution may be written in the form

images

From (28.12) the general solution to (28.8) now becomes

images

From Sec. 20 the modal matrix is defined as

images

where {x}k is the kth eigenvector of [A]. Substituting (28.19) into (28.18) and expanding gives the general solution to (28.8) in terms of an eigenvector expansion:

images

This equation shows that the general solution may be expressed as a linear combination of the eigenvectors. By applying the initial conditions the n constants ck may be evaluated from (28.20):

images

where

images

and therefore

images

As an example of this method consider (28.1) subject to the initial conditions:

images

The matrix form of this equation is given in (28.5), and the initial condition vector is

images

The eigenvalues and eigenvectors of [A] are

images

Using (28.20), the general solution has the form

images

Applying (28.22) to determine the arbitrary constants yields

images

or

images

and the particular solution is

images

Since x1 = x, the solution to (28.1) is obtained from the first row of (28.25):

images

The only restriction on (28.20) is that the coefficient matrix must have distinct eigenvalues. Whenever multiple eigenvalues occur, a modified approach must be employed since not all of the eigenvectors associated with the multiple root can be determined by the usual methods. When this situation occurs, the alternate method is based on the method of undetermined coefficients.

For convenience, consider that the first eigenvalue of an nth-order system has a multiplicity of 3; that is, consider

images

where [A] is nth order and

images

To solve (28.27) assume a solution in the form

images

Substituting this assumed solution into (28.27) gives

images

By employing the definition of eigenvectors and eigenvalues, the summations on each side of the equality may be canceled. The remaining expression becomes

images

Equating like powers of t yields

images

which may be solved sequentially for the three unknown eigenvectors.

Once the unknown eigenvectors have been evaluated, the general solution becomes

images

for which the n arbitrary constants may be evaluated in the usual manner by application of the initial conditions.

29 METHOD OF PEANO-BAKER

When the matrix [A] is a function of the independent variable t, the set of differential equations (28.7) may be integrated by means of the Peano-Baker method. The principle of this method of integration is quite simple. Let the initial conditions of the system be such that

images

where {x0} is a column vector of initial values of the elements of {x}. Direct integration of (28.8) gives the following integral equation:

images

where t′ is a subsidiary variable of integration. This integral equation may be solved by repeated substitutions of {x} from the left-hand member of (29.2) into the integral. If the integral operator is introduced

images

then (29.2) may be written in the following form:

images

By repeated substitution, the following equation is obtained:

images

Let

images

This expression has been called the matrizant of [A]. In the series (29.6) [U] is the unit matrix of order n and the second term is the integral of [A] taken between the limits 0 and t. To obtain Q[A]Q[A], [A] and Q[A] are multiplied in the order [A] Q[A] and the product is integrated between 0 and t. The remaining terms are formed in the same manner. If the elements of the matrix [A(t)] remain bounded in the range 0 to t, it may be shown that the series (29.6) is absolutely and uniformly convergent and that it defines the square matrix G[A] in a unique manner.

The solution of the set of differential equations (28.8) is given by

images

If the elements of the matrix [A] are constants, we have, by direct integration of the series (29.6),

images

Therefore, if [A] is a matrix of constants, the solution of (28.8) is

images
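The series for the matrizant lends itself to direct numerical approximation. The sketch below iterates the integral operator Q on a time grid with trapezoidal quadrature and, as a check, verifies that for a constant [A] the result agrees with e^[A]t; the routine is illustrative only, not a prescribed algorithm:

import numpy as np
from scipy.linalg import expm

def matrizant(A_of_t, t_grid, n_terms=12):
    """Approximate G[A](t) = U + Q[A] + Q[A]Q[A] + ... on t_grid by
    repeatedly applying the integral operator Q (trapezoidal quadrature).
    A_of_t(t) must return the n x n matrix [A(t)]."""
    A = np.array([A_of_t(t) for t in t_grid])          # shape (m, n, n)
    m, n = len(t_grid), A.shape[1]
    term = np.tile(np.eye(n), (m, 1, 1))               # zeroth term: [U] at every t
    G = term.copy()
    for _ in range(n_terms):
        integrand = A @ term                           # [A(t')] times the previous term
        new = np.zeros_like(term)
        for k in range(1, m):                          # cumulative trapezoid from 0 to t
            dt = t_grid[k] - t_grid[k - 1]
            new[k] = new[k - 1] + 0.5 * dt * (integrand[k] + integrand[k - 1])
        term = new
        G = G + term
    return G

# Check: for constant [A] the matrizant reduces to e^{[A]t}.
A_const = np.array([[0.0, 1.0], [-1.0, 0.0]])
ts = np.linspace(0.0, 1.0, 201)
G = matrizant(lambda t: A_const, ts)
print(np.max(np.abs(G[-1] - expm(A_const * ts[-1]))))  # small residual expected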

The interpretation of e[A]t, for purposes of calculation, may be made by the use of Sylvester’s theorem (24.2). To illustrate the general procedure, consider the second-order case in which the matrix [A] has the form

images

In this case the characteristic equation of [A] is given by

images

Let the two roots of (29.11) or the eigenvalues of (29.10) be given by

images

By the use of Sylvester’s theorem (24.2), the function may be expressed in the following form:

images

where

images

To illustrate this method of solution by an example of technical importance, consider the differential equations

images

These differential equations govern the oscillations of an electric circuit having a resistance R, an inductance L, and a capacitance C, all in series (see Chap. 5). The variable q is the charge on the capacitance, and i is the current that flows in the circuit. The free oscillations of such a circuit are completely determined when the initial charge q0 and the initial current i0 of the circuit at t = 0 are specified. If we let x1 = q and x2 = i, Eqs. (29.15) may be written in the standard matrix form:

images

The characteristic equation of [A] is, in this case,

images

In this case

images

The oscillations of the circuit are given by the following matrix equation:

images

where the elements Aij are given by (29.14). If b is imaginary, so that b = jω, then we have

images

so that the hyperbolic functions are replaced by trigonometric functions and the circuit oscillates. If b = 0, the circuit is critically damped and sinh(bt)/b becomes indeterminate; when it is evaluated by L’Hospital’s rule for indeterminate forms, it is seen that sinh(bt)/b must be replaced by t. The result (29.19) is thus a convenient expression that gives the current and charge of the circuit in terms of the initial conditions in concise form.
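This example may be checked numerically. The sketch below assumes the standard series-circuit state equations dq/dt = i and di/dt = -(1/LC)q - (R/L)i (the element values are illustrative), forms e^[A]t from Sylvester's theorem for two distinct eigenvalues, and compares the result with a library matrix exponential:

import numpy as np
from scipy.linalg import expm

# Assumed series-circuit state equations:  dq/dt = i,  di/dt = -(1/LC)q - (R/L)i.
R, L, C = 2.0, 1.0, 0.25            # illustrative element values
A = np.array([[0.0, 1.0],
              [-1.0 / (L * C), -R / L]])

# Sylvester's theorem for a 2 x 2 matrix with distinct eigenvalues l1, l2:
#   e^{[A]t} = e^{l1 t}([A] - l2 U)/(l1 - l2) + e^{l2 t}([A] - l1 U)/(l2 - l1)
l1, l2 = np.linalg.eigvals(A)
U = np.eye(2)
def exp_At(t):
    return (np.exp(l1 * t) * (A - l2 * U) / (l1 - l2)
            + np.exp(l2 * t) * (A - l1 * U) / (l2 - l1))

q0, i0 = 1.0, 0.0                   # initial charge and current
t = 1.5
print("Sylvester:", (exp_At(t) @ np.array([q0, i0])).real)
print("expm     :", expm(A * t) @ np.array([q0, i0]))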

DIFFERENTIAL EQUATIONS WITH VARIABLE COEFFICIENTS

The Peano-Baker method is a convenient method for the integration of homogeneous linear differential equations with variable coefficients. As an example of the general procedure, consider the differential equation

images

This equation is known as Stokes’s equation in the mathematical literature of the diffraction and reflection of electromagnetic waves and in quantum mechanics.

In order to reduce this equation to the form of (28.8), let

images

The system

images

replaces the original equation (29.21). Let the initial condition vector at t = 0 be

images

The matrizant G[A] is now computed by means of the series (29.6):

images

images

images

The first few terms of the matrizant are, by (29.6),

images

The elements of the matrizant G[A] are series expansions of the Bessel functions of order one-third. The solution of Stokes’s equation may be expressed in terms of the initial conditions in the form

images
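A direct numerical integration provides a check on this example. The sketch below assumes Stokes's equation in the form d2y/dt2 + ty = 0 (the sign convention is an assumption here), rewrites it as the first-order system above, and integrates it with a standard solver:

import numpy as np
from scipy.integrate import solve_ivp

# Assumed form of Stokes's equation: y'' + t*y = 0.  With x1 = y and
# x2 = dy/dt this becomes d{x}/dt = [A(t)]{x}, [A(t)] = [[0, 1], [-t, 0]].
def rhs(t, x):
    return [x[1], -t * x[0]]

x0 = [1.0, 0.0]                              # y(0) = 1, dy/dt(0) = 0
sol = solve_ivp(rhs, (0.0, 5.0), x0, dense_output=True, rtol=1e-9, atol=1e-12)

for t in np.linspace(0.0, 5.0, 6):
    y, ydot = sol.sol(t)
    print(f"t = {t:3.1f}   y = {y:+.5f}   dy/dt = {ydot:+.5f}")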

30 ADJOINT METHOD

Besides the homogeneous first-order matrix differential equation considered in the last section, systems of nonhomogeneous equations are also of considerable interest. A typical equation of this type may be written in the following form:

images

where [A] = coefficient matrix

{f} = vector of forcing functions

{x} = unknown vector

As previously considered, both [A] and {f} may be functions of the independent variable.

The method of solving the system (30.1) presented in this section is the adjoint method. This method yields a solution to (30.1) by seeking a solution to the following set of adjoint equations:

images

where [Y] is an unknown matrix. The reasons for choosing the particular form for the adjoint system given in (30.2) will become clear as the process of solution unfolds.

The method proceeds as follows: premultiply (30.1) by [Y] and post-multiply the transpose of (30.2) by {x}. Adding the two resulting equations together yields

images

However, the two terms on the left are the derivative of a product and therefore (30.3) may be rewritten as

images

Integrating this equation between the limits of 0 and t and recalling that

images

gives

images

Hence, the solution to (30.1) is

images

where [Y] is the solution to the adjoint equation (30.2), which is homogeneous with convenient initial conditions.

If the coefficient matrix [A] is a constant matrix, the solution to (30.1) as given by (30.6) may be reduced to the following result:

images
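For constant [A] the solution may also be evaluated numerically in the variation-of-parameters form x(t) = e^[A]t{x(0)} plus the integral from 0 to t of e^[A](t-s){f(s)} ds; the following sketch (illustrative matrix and forcing) approximates the convolution integral by trapezoidal quadrature:

import numpy as np
from scipy.linalg import expm

def solve_forced(A, f, x0, t, n_quad=400):
    """x(t) = e^{[A]t} x0 + integral_0^t e^{[A](t-s)} f(s) ds for constant [A],
    with the convolution integral evaluated by trapezoidal quadrature."""
    s = np.linspace(0.0, t, n_quad)
    vals = np.array([expm(A * (t - sk)) @ f(sk) for sk in s])
    return expm(A * t) @ x0 + np.trapz(vals, s, axis=0)

# Illustrative damped system driven by a constant forcing vector.
A  = np.array([[0.0, 1.0], [-2.0, -3.0]])
f  = lambda s: np.array([0.0, 1.0])
x0 = np.array([0.0, 0.0])
print(solve_forced(A, f, x0, 2.0))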

As an example consider the equation

images

For this equation

images

Applying the method of Sec. 26 yields

images

Substituting this result into (30.11) gives

images

31 EXISTENCE AND UNIQUENESS OF SOLUTIONS TO MATRIX DIFFERENTIAL EQUATIONS

In this section we shall simply state the conditions required for the existence and uniqueness of solutions to matrix differential equations without showing the proof.

Before presenting these conditions, further notation will be introduced in order to simplify the equations. Instead of writing the matrix differential equations as we have thus far, the following notation will be used:

images

where [L] = an operator matrix

{f} = a matrix of forcing functions

This notation may be used to represent (30.1) as follows:

images

Likewise (28.10) may be represented by (31.1) if

images

and

images

Hence any set of simultaneous linear differential equations may be written in the compact form of (31.1).

Besides (31.1) the corresponding homogeneous equation must be considered; that is,

images

The conditions for the existence and uniqueness of a solution to (31.1) may be stated as follows:

    1.    A solution to (31.1) is unique if and only if the corresponding homogeneous equation (31.2) possesses only a trivial solution.

    2.    A solution to (31.1) exists if and only if the vector {f} is orthogonal to all solutions of the homogeneous adjoint equation [L]′{x} = {0}.

32 LINEAR EQUATIONS WITH PERIODIC COEFFICIENTS

The mathematical analysis of a considerable variety of physical problems leads to a formulation involving a differential equation that may be reduced to the following form:

images

where F(t) is a single-valued periodic function of fundamental period T. An equation of this type is known as Hill’s equation in the mathematical literature. If a change of variable of the form

images

is introduced into the Hill equation (32.1), the equation is transformed into

images

as a consequence of the periodicity of F(t). Hence Hill’s equation is invariant under the change of variable (32.2), and it is thus apparent that it is necessary only to obtain a solution of (32.1) in a fundamental interval images The final values of y and images at the end of one interval of the variation of F(t) are the initial values of y and images in the following interval.

Let u1(t) and u2(t) be two linearly independent solutions of Hill’s equation (32.1) in the fundamental interval images Let y = x1 and images Then we have

images

where K1 and K2 are arbitrary constants. Equations (32.4) may be written in the following convenient matrix form:

images

The following notation will be introduced in order to simplify the writing:

images

In this notation (32.5) may be expressed in the following concise form:

images

The determinant

images

is called the Wronskian of the two solutions u1(t) and u2(t). It is a constant in the fundamental interval images as may be shown by differentiation. Since the two solutions u1(t) and u2(t) are linearly independent, W0 ≠ 0. The matrix [u(t)] is therefore nonsingular and has the following inverse:

images

At t = 0, (32.7) reduces to

images

If this is premultiplied by [u(0)]–1, the result may be written in the following form:

images

This equation determines the column of arbitrary constants in terms of the initial conditions. If (32.11) is now substituted into (32.7), the result is

images

At the end of the fundamental interval, t = T, and the solution is given in terms of the initial conditions by the equation

images

CASCADING THE SOLUTION

Since, during the second fundamental interval, Hill’s equation has the same form as it has during the first interval, the relation expressing the initial and final values of [x(t)] in this interval has the same form as that given in (32.13). Therefore, at the end of the second cycle of the variation of F(t) we have

images

Introduce the following notation:

images

It is easy to see that, at the end of n periods of the variation of F(t), the solution of Hill’s equation is given by

images

and the solution within the (n + 1)st interval is given by

images

This gives the solution of Hill’s equation at any time t > 0 in terms of the initial conditions and two linearly independent solutions in the fundamental interval images The solution involves the computation of positive integral powers of the matrix [M].

COMPUTATION OF INTEGRAL POWERS OF [M]

As a consequence of the constancy of the Wronskian determinant W0 given by (32.8), the determinant of the matrix [M] is equal to unity. We thus have

images

The characteristic equation of [M] is

images

If A + D ≠ ±2, the eigenvalues of [M] are given by

images

and in this case [M] has distinct eigenvalues. By Sylvester’s theorem (24.2) the nth power of [M] may easily be computed. The result is

images

where

images

Equation (32.21) gives the proper form for [M]n only when A + D ≠ ±2. When A + D = ±2, the two eigenvalues of [M] are not distinct and a confluent form of Sylvester’s theorem must be used to compute [M]n.
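These steps may be carried out numerically. The sketch below takes Hill's equation in the standard form d2y/dt2 + F(t)y = 0, integrates two solutions whose initial vectors are the columns of the unit matrix (so that [u(0)] = [U] and [M] = [u(T)][u(0)]-1 = [u(T)]), and then forms powers of [M]; the periodic coefficient chosen is purely illustrative:

import numpy as np
from scipy.integrate import solve_ivp

def matrix_M(F, T):
    """[M] = [u(T)][u(0)]^{-1} for Hill's equation y'' + F(t)y = 0, obtained by
    integrating two solutions whose initial vectors are the columns of the unit
    matrix, so that [u(0)] = [U] and [M] = [u(T)]."""
    def rhs(t, x):
        return [x[1], -F(t) * x[0]]
    cols = [solve_ivp(rhs, (0.0, T), x0, rtol=1e-10, atol=1e-12).y[:, -1]
            for x0 in ([1.0, 0.0], [0.0, 1.0])]
    return np.array(cols).T

# Illustrative periodic coefficient with fundamental period T = 2*pi.
F = lambda t: 1.0 + 0.3 * np.cos(t)
T = 2.0 * np.pi
M = matrix_M(F, T)

print("det [M] =", np.linalg.det(M))                 # ~1, the constant Wronskian
print("A + D   =", np.trace(M))                      # |A + D| < 2 -> bounded solutions
print("[M]^3   =\n", np.linalg.matrix_power(M, 3))   # solution after three periods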

33 MATRIX SOLUTION OF THE HILL-MEISSNER EQUATION

images

Fig. 33.1

If the function F(t) has the form of the rectangular ripple of Fig. 33.1, Eq. (32.1) is known as the Hill-Meissner equation. The Hill-Meissner equation is of importance in the study of the oscillations of the driving rods of electric locomotives, and its solution is easily effected by the use of matrix algebra. In this case Eq. (32.1) reduces to two equations of the form

images

The solution of Eqs. (33.1) at t = T may be expressed in terms of the initial values of y and images at t = 0 by the equation

images

where images and

images

If the indicated matrix multiplication in (33.2) is performed, the result may be written in the following form:

images

In this case the matrix [M] has the form

images

where

images

The eigenvalues of [M] are given by

images

where

images

The values of y and images after the end of n complete periods of the square-wave variation are given by

images

THE STABILITY OF THE SOLUTION

The nature of the solution of the Hill-Meissner equation depends on the form taken by the powers of the matrix [M], as given by (32.22). There are two separate cases to be considered. If |A + D| > 2, a must have a positive real part and the elements of [M]n increase exponentially with the number of periods n; this leads to an unstable solution in physical applications. If, on the other hand, |A + D| < 2, a must be a pure imaginary number and the elements of [M]n oscillate with n; this leads to stable solutions in physical applications. The dividing line is the case |A + D| = 2. In this case the eigenvalues of [M] are repeated, both being equal to +1 when A + D = 2 or to -1 when A + D = -2. The form (32.22) is no longer applicable and must be replaced by one obtained from a confluent form of Sylvester’s theorem. In general the condition |A + D| = 2 leads to instability in physical problems. The question of stability may be summarized by the following expressions:

images

In any special case the conditions for stability or instability may be deduced from (33.8).
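A sketch of this stability test for the square-wave case is given below. It assumes that F(t) takes the constant value ω1² over the first half period and ω2² over the second half (one plausible reading of Fig. 33.1; the parameter values are illustrative), so that [M] is the product of two constant-coefficient propagators whose trace A + D is then compared with 2:

import numpy as np

def propagator(omega, tau):
    """State-transition matrix of y'' + omega^2 y = 0 over a time tau,
    acting on the state (y, dy/dt)."""
    c, s = np.cos(omega * tau), np.sin(omega * tau)
    return np.array([[c, s / omega],
                     [-omega * s, c]])

def hill_meissner_M(omega1, omega2, T):
    """[M] over one full period, assuming F(t) = omega1^2 on the first half
    period and omega2^2 on the second half (an assumed reading of Fig. 33.1)."""
    return propagator(omega2, T / 2) @ propagator(omega1, T / 2)

# Illustrative parameter values and the trace test for stability.
M = hill_meissner_M(omega1=1.0, omega2=2.0, T=3.0)
trace = np.trace(M)
print("det [M] =", np.linalg.det(M))                  # = 1 (constant Wronskian)
print("A + D   =", trace, "-> stable" if abs(trace) < 2.0 else "-> unstable")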

34 THE USE OF MATRICES TO DETERMINE THE ROOTS OF ALGEBRAIC EQUATIONS

The properties of matrices may be used to determine the roots of algebraic equations. As a simple example, consider the equation

images

Let the following third-order matrix be constructed:

images

This matrix is called the companion matrix of the cubic equation (34.1). It is easy to show that the characteristic equation of the companion matrix [C] is the cubic equation. Therefore the eigenvalues of [C] are the roots of (34.1). The general algebraic equation of the nth degree

images

has a companion matrix of the nth order

images

The eigenvalues of [C], μ1, μ2, μ3, . . . , μn, are the roots of the equation (34.3). If the various roots of the nth-degree equation (34.3) are real and distinct and if μ1 is the root having the largest absolute value, then it can be shown by Sylvester’s theorem (24.2) that for a sufficiently large m we have

images

That is, for a sufficiently large power m, a further multiplication of [C]m by [C] yields a new matrix whose elements are μ1 times the elements of [C]m. This multiple, μ1, is the dominant root of (34.3). A practical procedure for extracting the dominant root of an equation is to choose an arbitrary column vector {x} of the nth order and set up the following sequence:

images

Then, for a sufficiently large m, we have

images

so that after a sufficient number of multiplications one more multiplication merely multiplies all the elements of the vector {x}m by a common factor. This common factor μ1 is the largest root of the equation and the largest eigenvalue of the companion matrix. As a simple numerical example, consider the following cubic equation:

images

The companion matrix of this equation is

images

We now choose the arbitrary column vector

images

By direct multiplication we obtain

images

The factor 28 may be discarded since we are interested only in the ratios of similar elements after successive multiplication. Let images denote the fact that a factor has been discarded, and proceed.

images

Further multiplications only multiply the elements of the vectors by a factor of 10. Hence the dominant root is μ1 = 10. Once the dominant root has been determined, it may be divided out by dividing the polynomial f(z) = z3 – 16z2 + 68z – 80 by z – 10. This division may be performed synthetically:

images

Since the remainder is zero, the dominant root is exactly μ1 = 10. The polynomial f(z) is seen to have the factors f(z) = (z – 10)(z2 – 6z + 8). The roots of the quadratic z2 – 6z + 8 = 0 are μ2 = 4 and μ3 = 2. The eigenvalues of the companion matrix (34.9) are therefore μ1 = 10, μ2 = 4, μ3 = 2. If only the smallest root of the equation is desired, the substitution z = 1/y may be made in Eq. (34.3) and the companion matrix of the resulting equation in y may be constructed. The iteration scheme (34.7) then yields the largest root in y and therefore the smallest root of the original equation.
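The iteration above is easily reproduced numerically. The sketch below uses one standard arrangement of the companion matrix (the layout of (34.4) may place the coefficients differently, but the eigenvalues, and hence the roots, are the same) and carries out the power iteration, discarding the common factor at each step:

import numpy as np

def companion_matrix(coeffs):
    """One standard companion matrix of z^n + c_{n-1}z^{n-1} + ... + c_0 = 0,
    with coeffs = (c_0, c_1, ..., c_{n-1}); its eigenvalues are the roots."""
    n = len(coeffs)
    C = np.zeros((n, n))
    C[:-1, 1:] = np.eye(n - 1)
    C[-1, :] = -np.asarray(coeffs)
    return C

# z^3 - 16 z^2 + 68 z - 80 = 0, i.e. (c_0, c_1, c_2) = (-80, 68, -16).
C = companion_matrix([-80.0, 68.0, -16.0])

x = np.array([1.0, 1.0, 1.0])                # arbitrary starting vector
for _ in range(30):
    y = C @ x
    k = int(np.argmax(np.abs(x)))
    mu = y[k] / x[k]                          # ratio of corresponding elements
    x = y / np.max(np.abs(y))                 # discard the common factor
print("dominant root ~", mu)                  # -> 10; deflation then yields 4 and 2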

The method of matrix iteration may be used to determine repeated and complex roots of algebraic equations.

PROBLEMS

1. Solve the following system of equations by the use of determinants:

images

2. Evaluate the determinant

images

3. Solve the following system of equations:

images

4. Prove Cramer’s rule for the general case of n equations.

5. Given

images

compute [a]10.

6. Given

images

compute [a]–2.

7. Find the characteristic equation and the eigenvalues of the following matrix:

images

8. Show that the matrix

images

is transformed to the diagonal form

images

where Tθ is the following matrix:

images

and tan2θ = 2h/(ab) (Jacobi’s transformation).

9. Show that

images

10. Show that the matrix

images

has properties similar to those of the imaginary number images Show that [J]2 = –U, where U is the second-order unit matrix, and that [J]3 = –[J], [J]4 = U, [J]5 = [J], etc.

11. Determine the eigenvalues and eigenvectors of the matrix

images

12. Determine the roots of the equation z3 – 16z2 + 65z – 50 = 0 by the matrix iteration method.

13. Solve the differential equation d2y/dt2 + w2y = 0, subject to the conditions y(0) = y0 and images by the Peano-Baker method.

14. Show that (18.1) is always consistent.

15. Using the matrix method illustrated in Sec. 18, solve (18.3) for x1 and x3 in terms of x2.

16. Find the value of the constant c for which the system

images

is consistent, and find the solution.

17. Derive (28.7).

18. Given (30.2) and (30.7), derive (30.8).

19. If [A] is a constant matrix, show that (30.10) reduces to (30.11).

20. If [A] is a diagonal matrix, show that f([A]) must equal [f(aii)]; for example,

images

21. Verify that (30.14) is the solution to (30.12).

22. Solve the following ([A] is the matrix in Prob. 7 above):

images

23. Solve the following systems:

images

(Hochstadt, 1964)

24. Solve the following differential equations by reducing them to first-order matrix equations:

images

(Hochstadt, 1964)

25. Find the eigenvalues and eigenvectors of the following matrices:

images

(Kreyszig, 1962)

REFERENCES

1931. Bôcher, M.: “Introduction to Higher Algebra,” The Macmillan Company, New York.

1932. Turnbull, H. W., and A. C. Aitken: “An Introduction to the Theory of Canonical Matrices,” Blackie and Son, Ltd., Glasgow.

1937. Pipes, L. A.: Matrices in Engineering, Electrical Engineering, vol. 56, pp. 1177–1190.

1938. Frazer, R. A., W. J. Duncan, and A. R. Collar: “Elementary Matrices,” Cambridge University Press, New York.

1946. MacDuffee, C. C.: “The Theory of Matrices,” Chelsea Publishing Co., New York.

1949. Wedderburn, J. H. M.: “Lectures on Matrices,” J. W. Edwards, Publisher, Incorporated, Ann Arbor, Mich.

1950. Zurmuhl, R.: “Matrizen,” Springer-Verlag OHG, Berlin.

1951. Kaufmann, A., and M. Denis-Papin: “Cours de calcul matriciel,” Editions Albin Michel, Paris.

1953. Paige, L. J., and Olga Taussky (eds.): “Simultaneous Linear Equations and the Determination of Eigenvalues,” National Bureau of Standards, Applied Mathematics Series 29, August.

1956. Friedman, B.: “Principles and Techniques of Applied Mathematics,” John Wiley & Sons, Inc., New York.

1958. Hohn, F. E.: “Elementary Matrix Algebra,” The Macmillan Company, New York.

1960. Bellman, R.: “Introduction to Matrix Analysis,” McGraw-Hill Book Company, New York.

1962. Kreyszig, E.: “Advanced Engineering Mathematics,” John Wiley & Sons, Inc., New York.

1963. Pipes, L. A.: “Matrix Methods for Engineers,” Prentice-Hall, Inc., Englewood Cliffs, N.J.

1964. Hochstadt, H.: “Differential Equations,” Holt, Rinehart and Winston, Inc., New York.

Rank may also be defined as the order of the largest nonvanishing minor contained in the matrix.

For an excellent and thorough theoretical discussion of the subject of linear algebraic equations, see Hohn, 1958, chap. 5 (see References).

Ibid., for a detailed proof.

See Hohn, 1958, chap. 5, for a detailed proof (see References).

This restriction may be expanded to include multiple eigenvalues as long as there are n linearly independent eigenvectors associated with [M].

M. Bôcher, “Introduction to Higher Algebra,” The Macmillan Company, New York, 1931.

M. Bôcher, “Introduction to Higher Algebra,” p. 296, The Macmillan Company, New York, 1931.

H. Hotelling, New Methods in Matrix Calculation, The Annals of Mathematical Statistics, vol. 14, no. 1, pp. 1–34, March, 1943.

G. E. Forsythe, Solving Linear Equations Can Be Interesting, Bulletin of the American Mathematical Society, vol. 59, no. 4, pp. 299–329, July, 1953.

L. Fox, H. D. Huskey, and J. H. Wilkinson, Notes on the Solution of Algebraic Linear Simultaneous Equations, Quarterly Journal of Mechanics and Applied Mathematics, vol. 1, pp.149–173, 1948.

R. A. Frazer, W. J. Duncan, and A. R. Collar, “Elementary Matrices,” pp. 78–79, Cambridge University Press, New York, 1938.

Convergence is guaranteed provided S(μr) converges for each eigenvalue. Cf. Frazer, Duncan, and Collar, 1957, p. 81 (see References).

Arbitrary constants have been omitted from the terms involving the unknown eigenvectors since at most an eigenvector is known up to an arbitrary constant.

See E. L. Ince, “Ordinary Differential Equations,” pp. 408–415, Longmans, Green & Co., Ltd., London, 1927.

See “Tables of the Modified Hankel Functions of Order One-third,” Harvard University Press, Cambridge, Mass., 1945.

To be precise, the operator matrix must have a closed range. Cf. Friedman, 1956 (see References).

See L. A. Pipes, Matrix Solutions of Equations of the Mathieu-Hill Type, Journal of Applied Physics, vol. 24, no. 7, pp. 902–910, July, 1953.

See L. A. Pipes, Matrix Solution of Polynomial Equations, Journal of the Franklin Institute, vol. 225, no. 4, pp. 437–454, April, 1938.