CHAPTER 20

NUMERICAL CALCULATIONS AND FINITE DIFFERENCES

    20.1. Introduction

      20.1-1. Survey

      20.1-2. Errors

    20.2. Numerical Solution of Equations

      20.2-1. Introduction

      20.2-2. Iteration Methods. Newton-Raphson Method and Regula Falsi

      20.2-3. Numerical Solution of Algebraic Equations: Computation of Polynomial Values

(a) Successive Multiplications

(b) Horner's Scheme

      20.2-4. Solution of Algebraic Equations: Iteration Methods

(a) General Methods

(b) Muller's Method

(c) The Bairstow Method

      20.2-5. Additional Methods for Solving Algebraic Equations

(a) The Quotient-difference Algorithm

(b) Graeffe's Root-squaring Process

(c) A Matrix Method

(d) Horner's Method

      20.2-6. Systems of Equations and the Problem of Finding Maxima and Minima

(a) Problem Statement. Iteration Methods

(b) Varying One Unknown at a Time

(c) Search Methods for Finding Maxima and Minima

(d) Treatment of Constraints

      20.2-7. Steepest-descent Methods (Gradient Methods)

(a) Method of Steepest Descent

(b) Descent with Computed Gradient Components

      20.2-8. The Newton-Raphson Method and the Kantorovich Theorem

    20.3. Linear Simultaneous Equations, Matrix Inversion, and Matrix Eigenvalue Problems

      20.3-1. “Direct” or Elimination Methods for Linear Simultaneous Equations

(a) Introduction

(b) Basic Elimination Procedure (Gauss's Pivotal-condensation Method)

(c) An Improved Elimination Scheme (Banachiewicz-Cholesky-Crout Method)

(d) Direct Methods in Matrix Form

      20.3-2. Iteration Methods for Linear Equations

(a) Introduction

(b) Gauss-Seidel-type Iteration

(c) Relaxation Methods

(d) Systematic Overrelaxation

(e) Steepest-descent Methods (Gradient Methods)

(f) A Conjugate-gradient Method

      20.3-3. Matrix Inversion

      20.3-4. A Partitioning Scheme for the Solution of Simultaneous Linear Equations or for Matrix Inversion

      20.3-5. Eigenvalues and Eigenvectors of Matrices

(a) The Characteristic Equation

(b) An Iteration Method for Hermitian Matrices

(c) A Relaxation Method

(d) Jacobi-Von Neumann-Goldstine Rotation Process for Real Symmetric Matrices

(e) Other Methods

    20.4. Finite Differences and Difference Equations

      20.4-1. Finite Differences and Central Means

      20.4-2. Operator Notation

      20.4-3. Difference Equations

      20.4-4. Linear Ordinary Difference Equations

(a) Superposition Theorems

(b) The Method of Variation of Constants

      20.4-5. Linear Ordinary Difference Equations with Constant Coefficients: Method of Undetermined Coefficients

      20.4-6. Transform Methods for Linear Difference Equations with Constant Coefficients

(a) The z-transform Method

(b) Sampled-data Representation in Terms of Impulse Trains and Jump Functions. Laplace-transform Method

      20.4-7. Systems of Ordinary Difference Equations (State Equations). Matrix Notation

      20.4-8. Stability

    20.5. Approximation of Functions by Interpolation

      20.5-1. Introduction

      20.5-2. General Formulas for Polynomial Interpolation (Argument Values Not Necessarily Equally Spaced)

(a) Lagrange's Interpolation Formula

(b) Divided Differences and Newton's Interpolation Formula

(c) Aitken's Iterated-interpolation Method

(d) The Remainder

      20.5-3. Interpolation Formulas for Equally Spaced Argument Values. Lozenge Diagrams

(a) Newton-Gregory Interpolation Formulas

(b) Symmetric Interpolation Formulas

(c) Use of Lozenge Diagram

      20.5-4. Inverse Interpolation

      20.5-5. “Optimum-interval” Interpolation

      20.5-6. Multiple Polynomial Interpolation

      20.5-7. Reciprocal Differences and Rational-fraction Interpolation

    20.6. Approximation by Orthogonal Polynomials, Truncated Fourier Series, and Other Methods

      20.6-1. Introduction

      20.6-2. Least-squares Approximation over an Interval

      20.6-3. Least-squares Polynomial Approximation over a Set of Points

(a) Optimum-interval Tabulation

(b) Evenly Spaced Data

      20.6-4. Least-maximum-absolute-error Approximations

      20.6-5. Economization of Power Series

      20.6-6. Numerical Harmonic Analysis and Trigonometric Interpolation

      20.6-7. Miscellaneous Approximations

    20.7. Numerical Differentiation and Integration

      20.7-1. Numerical Differentiation

(a) Use of Difference Tables

(b) Use of Divided Differences

(c) Numerical Differentiation after Smoothing

(d) Approximate Differentiation of Truncated Fourier Series

      20.7-2. Numerical Integration Using Equally Spaced Argument Values

(a) Newton-Cotes Quadrature Formulas

(b) Gregory's Formula

(c) Use of Euler-MacLaurin Formula

      20.7-3. Gauss and Chebyshev Quadrature Formulas

      20.7-4. Derivation and Comparison of Integration Formulas

      20.7-5. Numerical Evaluation of Multiple Integrals

    20.8. Numerical Solution of Ordinary Differential Equations

      20.8-1. Introduction and Notation

      20.8-2. One-step Methods for Initial-value Problems: Euler and Runge-Kutta-type Methods

      20.8-3. Multistep Methods for Initial-value Problems

(a) Starting the Solution

(b) Simple Extrapolation Schemes

(c) Predictor-corrector Methods and Step-size Changes

      20.8-4. Improved Multistep Methods

      20.8-5. Discussion of Different Solution Methods, Step-size Control, and Stability

      20.8-6. Ordinary Differential Equations of Order Higher Than the First and Systems of Ordinary Differential Equations

      20.8-7. Special Formulas for Second-order Equations

      20.8-8. Frequency-response Analysis

    20.9. Numerical Solution of Boundary-value Problems, Partial Differential Equations, and Integral Equations

      20.9-1. Introduction

      20.9-2. Two-point Boundary-value Problems Involving Ordinary Differential Equations: Difference Technique

      20.9-3. The Generalized Newton-Raphson Method (Quasilinearization)

      20.9-4. Finite-difference Methods for Numerical Solution of Partial Differential Equations with Two Independent Variables

      20.9-5. Two-dimensional Difference Operators

      20.9-6. Representation of Boundary Conditions

      20.9-7. Problems Involving Three or More Independent Variables

      20.9-8. Validity of Finite-difference Approximations. Some Stability Conditions

      20.9-9. Approximation-function Methods for Numerical Solution of Boundary-value Problems

(a) Approximation by Functions Which Satisfy the Boundary Conditions Exactly

(b) Approximation by Functions Which Satisfy the Differential Equation Exactly

      20.9-10. Numerical Solution of Integral Equations

    20.10. Monte-Carlo Techniques

      20.10-1. Monte-Carlo Methods

      20.10-2. Two Variance-reduction Techniques

(a) Stratified Sampling

(b) Use of Correlated Samples

      20.10-3. Use of A Priori Information: Importance Sampling

      20.10-4. Some Random-number Generators. Tests for Randomness

    20.11. Related Topics, References, and Bibliography

      20.11-1. Related Topics

      20.11-2. References and Bibliography

20.1. INTRODUCTION

20.1-1. Chapter 20 outlines a number of the best-known numerical computation schemes, with emphasis on mathematical methods rather than on detailed layout or programming. Sections 20.4-1 to 20.5-7 introduce the calculus of finite differences and the solution of difference equations. Finite-difference methods are of great interest not only for the numerical solution of differential equations but also in connection with many mathematical models whose variables change in discrete steps by nature rather than by necessity.

20.1-2. Errors. Aside from possible mistakes (blunders) in the course of numerical computations, there may be errors in the initial data, round-off errors due to the use of a finite number of digits, and truncation errors due to finite approximations of limiting processes. Approximate effects of small errors Δxi or fractional errors Δxi/xi on a result f(x1, x2, . . .) can often be studied through differentiation (Sec. 4.5-3); thus

image

The generation and propagation of errors in more complicated computations are subject to continuing studies; most analytical results of such studies are restricted to very specialized classes of computations (Refs. 20.4 to 20.7, 20.10, 20.44, 20.49, 20.58, and 20.59). It is desirable to check on mistakes and errors by various checking routines (e.g., resubstitution of approximate solutions into given equations) carried along between appropriate steps of the main computation. As a very rough rule of thumb, one may carry two significant figures in excess of those justified by the precision of the initial data or by the required accuracy. Convergent iteration schemes (Secs. 20.2-2, 20.2-4, 20.3-2, 20.8-2c, and 20.9-3) will tend to check and reduce the effects of errors, unless the errors affect the convergence.

A computing scheme (program of numerical operations, algorithm) is said to be numerically stable if it does not increase the (absolute or relative) effects of roundoff errors in the given data and in the calculations. More precise definitions of numerical stability can often be related to the asymptotic stability of solutions of difference equations (recurrence relations) in the manner of Sec. 20.8-5.

20.2. NUMERICAL SOLUTION OF EQUATIONS

20.2-1. Introduction. The numerical solution of any equation

image

should be preceded by a rough (often graphical) survey yielding information regarding the existence and position of real roots, estimates for trial solutions, etc. (see also Secs. 1.6-6 and 7.6-9). Solutions may be checked by resubstitution.

20.2-2. Iteration Methods. Newton-Raphson Method and Regula Falsi. The following procedures apply, in particular, to the solution of transcendental equations.

(a) Rewrite the given equation (1) in the form

image

(see also Sec. 20.2-2b). Starting with a trial solution z[0], compute successive approximations

image

The convergence of such approximations to a desired solution z requires separate investigation. The Contraction-mapping Theorem of Sec. 12.5-6 can often be applied to establish convergence and rate of convergence. The iteration is terminated when the change |z[j+1] − z[j]| is sufficiently small.

Collatz's Convergence Criterion and Error Estimate (Ref. 20.3). If it is possible to choose a complex-plane region D such that (1) there exists a positive real number M < 1 with the property

image

and (2) D contains z[0], z[1], and every point z having the property

image

for some value of j > 0, then the approximations (3) converge to a solution z of Eq. (1) which satisfies Eq. (4) and is unique in D. Note that the right side of Eq. (4) is an upper bound for the error of the jth approximation z[j] (see also Sec. 12.5-6).
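The iteration (3) is simple to program. The following Python sketch is an illustration only (the function names, the stopping rule, and the sample equation z = cos z are our assumptions, not the handbook's):

```python
import math

def fixed_point(phi, z0, tol=1e-12, max_iter=200):
    """Compute successive approximations z[j+1] = phi(z[j]),
    stopping when the change between iterates is sufficiently small."""
    z = z0
    for _ in range(max_iter):
        z_new = phi(z)
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    raise RuntimeError("iteration did not converge")

# Sample equation z = cos z; phi is a contraction near the root.
root = fixed_point(math.cos, 1.0)
```

Since |d(cos z)/dz| < 1 near the root, the contraction-mapping conditions above are satisfied and the sequence converges (linearly) for this example.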

(b) In general, there are many possible ways of rewriting the given equation (1) in the form (2). Particular choices of φ(z) yield the following special forms of the iteration formula (3):

image

These iteration schemes are especially useful for the computation of real roots. To find complex roots of real equations, one must start with a complex trial solution z[0].

The convergence considerations and error criterion of Sec. 20.2-2a apply in each case. Equation (6) is a special case of the general Newton-Raphson-Kantorovich scheme of Sec. 20.2-8. If f′(z) exists and is continuously differentiable in the region under consideration, and one can find positive real bounds A, B, C such that

image

then the desired root z satisfies the last inequality, and the convergence rate of the Newton-Raphson iteration is given by

image

Note the relatively rapid convergence under these circumstances (see also Sec. 20.2-8 and Refs. 20.43 and 20.47).

USEFUL EXAMPLES: Application of the Newton-Raphson scheme (6) to 1/z − a = 0, z2 − a = 0, 1/z2 − 1/a2 = 0, and zn − a = 0 yields iteration routines for image

image

The iteration (10) converges for all z[0] between 0 and 2/a; (11), (12), and (13) require only z[0] > 0.
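Two of these routines can be sketched in Python as follows (the correspondence to the reciprocal and square-root iterations, the function names, and the tolerances are our assumptions for illustration):

```python
def newton_sqrt(a, z0=1.0, tol=1e-12):
    """Newton-Raphson for z**2 - a = 0: z[j+1] = (z[j] + a/z[j]) / 2."""
    z = z0
    while abs(z * z - a) > tol:
        z = 0.5 * (z + a / z)
    return z

def newton_reciprocal(a, z0, tol=1e-12):
    """Newton-Raphson for 1/z - a = 0: z[j+1] = z[j]*(2 - a*z[j]);
    converges for 0 < z0 < 2/a and uses no division at all."""
    z = z0
    while abs(1.0 - a * z) > tol:
        z = z * (2.0 - a * z)
    return z
```

The division-free reciprocal iteration was historically important on machines without a hardware divide; both schemes converge quadratically once close to the root.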

(c) The Regula Falsi. The following iteration scheme applies particularly well to real equations and real roots and is used when f′(z) is not easily computed. Given Eq. (1), start with two trial values z = z[0], z = z[1] and obtain successive approximations

image

For continuous real functions f(z), one attempts to bracket each real root between approximations z[j] and z[k] such that f(z[j]) and f(z[k]) have opposite signs. Each z[j] can then be obtained by graphical straight-line interpolation; the scale of the graph should be appropriately increased at each step.
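A minimal bracketing implementation of the regula falsi might look as follows (Python; the bracketing bookkeeping, stopping test, and sample equation are our choices, not the handbook's):

```python
def regula_falsi(f, a, b, tol=1e-10, max_iter=500):
    """Keep a real root bracketed between two approximations where f
    has opposite signs; each new approximation is the straight-line
    (secant) interpolation between the bracketing points."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "root must be bracketed"
    c = a
    for _ in range(max_iter):
        c = b - fb * (b - a) / (fb - fa)
        fc = f(c)
        if abs(fc) < tol:
            break
        if fa * fc < 0:     # root lies between a and c
            b, fb = c, fc
        else:               # root lies between c and b
            a, fa = c, fc
    return c
```

Note that one endpoint may stay fixed, so the convergence is only linear, but no derivative of f is ever needed.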

(d) Aitken-Steffensen Convergence Acceleration. If f(z) is real and three times continuously differentiable for real z in a neighborhood of a real root where f′(z) ≠ 1, then one can improve on the convergence of the iteration sequence (3) by the following iteration scheme

image

The iteration is terminated when one of the denominators turns out to be zero (in general, the desired accuracy is obtained before this if the sequence converges; see also Ref. 20.5). This method, like the regula falsi, may substitute for the fast-converging Newton-Raphson scheme if computation of f′(z) is too cumbersome.
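The delta-squared acceleration can be sketched as follows (Python; this is the common Steffensen form of the scheme, with names and the sample equation z = cos z supplied by us):

```python
import math

def steffensen(phi, z0, tol=1e-12, max_iter=100):
    """Aitken delta-squared acceleration of z[j+1] = phi(z[j]);
    terminated when a denominator turns out to be zero."""
    z = z0
    for _ in range(max_iter):
        z1 = phi(z)
        z2 = phi(z1)
        denom = z2 - 2.0 * z1 + z
        if denom == 0.0:            # denominator vanished: stop here
            return z2
        z_acc = z - (z1 - z) ** 2 / denom
        if abs(z_acc - z) < tol:
            return z_acc
        z = z_acc
    return z

root = steffensen(math.cos, 1.0)    # sample equation z = cos z
```

Compared with the plain iteration of (3), the accelerated sequence converges quadratically for simple roots, without requiring any derivative.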

(e) Multiple Roots. Iteration schemes based on Eq. (6) (Newton-Raphson method) and Eq. (7) will not converge in the neighborhood of a multiple root of the given equation. Note that multiple zeros of f(z) are zeros of f′(z); for algebraic equations, the greatest common divisor of f(z) and f′(z) can be obtained by the method of Sec. 1.7-3.

(f) Interval Halving. If a single simple real root of a real equation (1) is known to lie between z = a and z = b, compute f((a + b)/2). If this has the same sign as, say, f(b), the root lies between a and (a + b)/2; compute f((3a + b)/4), etc.
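Interval halving is the simplest bracketing scheme; a Python sketch (argument conventions, tolerance, and sample equation are ours):

```python
def interval_halving(f, a, b, tol=1e-12):
    """Bisection for a single simple real root known to lie in [a, b],
    with f(a) and f(b) of opposite signs."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if fm == 0.0:
            return m
        if fm * fb > 0:        # f(m) has the same sign as f(b)
            b, fb = m, fm
        else:
            a, fa = m, fm
    return 0.5 * (a + b)
```

Each step halves the bracketing interval, so the error is guaranteed to shrink by a factor of 2 per function evaluation regardless of the shape of f.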

20.2-3. Numerical Solution of Algebraic Equations: Computation of Polynomial Values. (a) Successive Multiplications. To compute values of a polynomial

image

for use in the iteration methods of Sec. 20.2-4, compute successively a0z + a1, z(a0z + a1) + a2, . . . ; or obtain the desired quantities f(c), f′(c), f″(c)/2!, . . . by Horner's scheme.

(b) Horner's Scheme. Long division of f(z) by (z − c) yields a new polynomial f1(z) and the remainder f(c) (Sec. 1.7-2); long division of f1(z) by (z − c) in turn yields a new polynomial f2(z) and the remainder f′(c). Continuing in this manner, one obtains successive remainders f″(c)/2!, f‴(c)/3!, . . . . Note that these remainders are the coefficients of the polynomial

image

Horner's Scheme for Complex Arguments. Given a real polynomial f(z), let c = a + ib. Then f(c) = Ac + B = (Aa + B) + iAb, where Az + B is the (real) remainder of the quotient f(z)/(z2 − 2az + a2 + b2) (Ref. 20.3).
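Both procedures of this section fit in a few lines of Python (the function names are ours; coefficients are taken in descending powers of z, which is an assumption about the calling convention):

```python
def horner_value(coeffs, z):
    """f(z) by successive multiplications, with
    f(z) = coeffs[0]*z**n + coeffs[1]*z**(n-1) + ... + coeffs[n]."""
    value = 0.0
    for a in coeffs:
        value = value * z + a
    return value

def horner_scheme(coeffs, c):
    """Repeated synthetic division by (z - c); returns the successive
    remainders [f(c), f'(c), f''(c)/2!, ...]."""
    b = list(coeffs)
    n = len(b) - 1
    remainders = []
    for k in range(n + 1):
        for i in range(1, n - k + 1):
            b[i] += b[i - 1] * c       # one synthetic-division pass
        remainders.append(b[n - k])    # remainder of this pass
    return remainders
```

For f(z) = z2 − 3z + 2 and c = 1 the second routine returns f(1) = 0, f′(1) = −1, and f″(1)/2! = 1, i.e., the coefficients of the shifted polynomial.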

20.2-4. Solution of Algebraic Equations: Iteration Methods. (a) General Methods. To compute single simple roots of an algebraic equation

image

one may

1. Use the Newton-Raphson method of Eq. (6).

2. Employ Eq. (5) with k = 1/an-1, calculating successive values of the polynomial

image

by Horner's scheme. If an-1 = 0, introduce u = z − c and use Eq. (17) to rewrite Eq. (18) as F(u) = 0.

3. Attempt to bracket the root between argument values yielding function values of opposite signs, or use the regula falsi (Sec. 20.2-2c).

Convergence of these and the following iteration methods depends on the initial trial value. This may be based on a priori information, on approximate solutions obtained by one of the methods of Sec. 20.2-5, or on a root of a second-order polynomial approximating f(z). In the absence of such information, the trial value may simply be zero or a small real or complex number. In the latter case, Horner's scheme for complex arguments may be employed if f(z) is a real polynomial.

If f(z) is a real polynomial, one may simply attempt to find trial values a + ib so as to minimize the quantities A and B in Horner's scheme for complex arguments. See also Ref. 20.3.

As successive roots zi are found, the corresponding factors z − zi are divided out (Sec. 1.7-2). zi or zi(1 + i) may be tried as a starting value for iteration on the next root.

The Muller and Bairstow methods simplify calculations for complex roots and tend to converge better than the Newton-Raphson method when roots are close together.

(b) Muller's Method (Ref. 20.5). Substitution of parabolic for linear interpolation in the derivation of the regula falsi (Sec. 20.2-2c) produces the iteration algorithm

image

where the sign in the denominator is chosen so as to make its absolute value as large as possible, and

image

The solution may be started with z[0] = −1, z[1] = 1, and z[2] = 0 (hence q2 = −1/2), and f0, f1, f2 respectively replaced with an − an-1 + an-2, an + an-1 + an-2, an.

Muller's method applies also to nonalgebraic equations.
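The parabolic-interpolation step can be sketched in Python as follows; complex arithmetic is used throughout so that real equations can yield complex roots. The detailed bookkeeping (divided-difference form of the parabola) is our choice of formulation, not the handbook's:

```python
import cmath

def muller(f, z0=-1.0, z1=1.0, z2=0.0, tol=1e-12, max_iter=100):
    """Muller's method: fit a parabola through the last three iterates;
    the sign in the denominator is chosen so as to make its absolute
    value as large as possible."""
    for _ in range(max_iter):
        f0, f1, f2 = f(z0), f(z1), f(z2)
        h1, h2 = z1 - z0, z2 - z1
        d1, d2 = (f1 - f0) / h1, (f2 - f1) / h2
        a = (d2 - d1) / (h2 + h1)          # parabola coefficients
        b = a * h2 + d2
        disc = cmath.sqrt(b * b - 4.0 * f2 * a)
        denom = b + disc if abs(b + disc) >= abs(b - disc) else b - disc
        if denom == 0:
            return z2
        dz = -2.0 * f2 / denom             # nearer root of the parabola
        z0, z1, z2 = z1, z2, z2 + dz
        if abs(dz) < tol:
            return z2
    return z2
```

Starting from real trial values, the complex square root lets the iterates leave the real axis; applied to z2 + 1 = 0 the routine converges to one of the roots ±i.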

(c) The Bairstow Method (Refs. 20.5, 20.9, and 20.54). Assume that the given equation (16) admits a quadratic factor z2 − uz − v defining two distinct simple roots, and that one has trial values u[0], v[0] which approximate u and v sufficiently well. Then a sequence of improved approximations converging to u, v is obtained from

image

where the bk[j], ck[j] are consecutively obtained for each j from

image

This method has been found to work best for polynomials of even degree.

20.2-5. Additional Methods for Solving Algebraic Equations. (a) The Quotient-difference Algorithm (Ref. 20.5). The following scheme (a generalization of the classical method of Bernoulli, Refs. 20.5 and 20.9) can produce approximations to all roots of suitable algebraic equations (16) in one sequence of calculations. The method may be useful for finding trial values to be used in iteration methods.

For a (suitable) given equation (16), write the tabular arrangement of Fig. 20.2-1, where successive entries are obtained from

image

This procedure breaks down if a Qi[k] or Ei[k] equals zero. The quotient-difference scheme exists, however, in many useful special cases, in particular if all n roots z1, z2, . . . , zn of the given equation (16) are positive, or if they are simple roots with distinct nonzero absolute values. In the latter case,

image

image

FIG. 20.2-1. A quotient-difference table.

for each k, so that each Q column in Fig. 20.2-1 yields an approximation to a root.

More generally, let |z1| ≥ |z2| ≥ . . . ≥ |zn| > 0. Assuming that the quotient-difference scheme does not break down, Eq. (23) still yields every root zk which differs in absolute value from its neighbors in the above sequence; and Eq. (24) still holds whenever |zk| > |zk+1|. This helps to identify roots with equal absolute values, such as complex-conjugate roots; see Ref. 20.5 for a refined procedure which actually approximates such roots.

(b) Graeffe's Root-squaring Process. Given the real algebraic equation

image

obtain the coefficients ai(1) of

image

by writing the array

image

Repeat this process, obtaining successively the coefficients ai(j) of

image

As j increases, the array usually assumes a definite pattern: (1) the double products in a column may become negligible, so that successive column entries become squares with equal signs (regular column*); (2) all entries of a column may have equal signs and absolute values equal to a definite fraction of the squared entry above (fractionally regular column); (3) column entries may have regularly alternating signs (fluctuating column); and (4) a column may be entirely irregular.

Each pair of regular columns (say, k and k − r) separated by r − 1 nonregular columns corresponds to a set of r roots z of equal magnitude such that

image

These r roots are all either real or pure imaginary if the r − 1 separating columns are fractionally regular. Specifically,

      1. Two adjacent regular columns (say k and k − 1) yield a simple real root z such that

image

      2. Two regular columns (k and k − 2) separated by a fluctuating column indicate a pair of simple complex-conjugate roots z such that

image

In practice, one first finds the real and purely imaginary roots; signs are determined by substitution or with the aid of Sec. 1.6-6. Lehmer's method, a refined version of Graeffe's scheme, permits direct determination of signs.

(c) A Matrix Method. It is possible to compute the roots of an nth-degree algebraic equation (16) with a0 = 1 as eigenvalues of the n × n “companion matrix”

image

by one of the methods outlined in Sec. 20.3-5.

* M. B. Reed and G. B. Reed, Mathematical Methods in Electrical Engineering, Harper, New York, 1951.

(d) Horner's Method. Horner's method (for real roots) evaluates the coefficients of successive polynomials F1(u) ≡ f(u + c1), F2(u) ≡ F1(u + c2), . . . by Horner's scheme, where c1, c2, . . . are chosen so as to reduce the absolute values of the remainders. If one succeeds in obtaining Fi(ci) ≈ 0, then the desired root is approximated by c1 + c2 + . . . + ci.

20.2-6. Systems of Equations and the Problem of Finding Maxima and Minima (see also Secs. 11.1-1, 12.5-6, 20.2-2, and 20.2-3). (a) Problem Statement. Iteration Methods. The problem of solving n simultaneous equations

image

for n unknowns x1, x2, . . . , xn is equivalent to the problem of minimizing the function

image

or some other increasing real function of the absolute values |fi| of the n residuals (errors) fi = fi(x1, x2, . . . , xn). The problem of minimizing (or maximizing) a given function of n variables is of great practical importance in its own right.

A useful class of iteration methods starts with a trial solution xi[0] (i = 1, 2, . . . , n) and attempts to construct successive approximations

image

which converge to a solution xi as j → ∞.

Once the ratios image (“direction” of the jth step) are chosen, one may minimize image as a function F(λ[j]) of the parameter λ[j] which determines the step size. For this purpose, F(λ[j]) may be approximated by a Taylor series or by an interpolation polynomial (Sec. 20.5-1) based on three to five trial values of λ[j]. The latter method also applies to the computation of maxima and minima of a tabulated function F(x1, x2, . . . , xn).

The convergence of such iteration methods can, again, be investigated with the aid of the Contraction-mapping Theorem of Sec. 12.5-6; the trial solutions (x1, x2, . . . , xn) are regarded as vectors with the norm (x12 + x22 + . . . + xn2)1/2 or |x1| + |x2| + . . . + |xn| (see also Ref. 20.5).

(b) Varying One Unknown at a Time. Attempt to minimize F by varying only one of the xi at each step, either cyclically or so as to reduce the largest absolute residual (see also Sec. 20.3-2c, relaxation). Special one-dimensional search methods are discussed in Refs. 20.9 and 20.57.

(c) Search Methods for Finding Maxima and Minima. Pure random searches, which merely select the largest or smallest value F(x1, x2, . . . , xn) for a number of randomly chosen points (x1, x2, . . . , xn), are useful mainly for finding starting approximations for iteration methods (Ref. 20.9). Random-perturbation methods start with a parameter point (x1, x2, . . . , xn) and proceed to apply sets of random perturbations Δx1, Δx2, . . . , Δxn to all unknowns at once until an improvement in the criterion function F is obtained. The resulting perturbed parameter point constitutes the next approximation, and the search continues. This technique, which, unlike a pure random search, can take advantage of the continuity of a criterion function, is a highly useful hill-climbing method when gradient methods are frustrated by adverse features of the multidimensional terrain, such as “ridges,” “canyons,” “flat spots,” and multiple maxima and minima. Random-perturbation methods have been considerably refined by strategies involving step-size changes and preferential treatment of certain directions in parameter space depending on past successes or failures (Refs. 20.9 and 20.57).
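A rudimentary random-perturbation search might be sketched as follows (Python; the particular step-size-reduction strategy, all constants, and the sample criterion function are our assumptions, standing in for the refinements cited above):

```python
import random

def random_perturbation_search(F, x, step=0.5, shrink=0.995,
                               iters=4000, seed=1):
    """Perturb all unknowns at once; keep the perturbed point only if
    the criterion function F improves.  The step size is reduced after
    each failure (one simple strategy among many refinements)."""
    rng = random.Random(seed)
    best = F(x)
    for _ in range(iters):
        trial = [xi + step * rng.uniform(-1.0, 1.0) for xi in x]
        value = F(trial)
        if value < best:
            x, best = trial, value     # accept the improvement
        else:
            step *= shrink             # shrink the search step
    return x, best

# Hypothetical criterion function with minimum 0 at (1, -2):
x_min, F_min = random_perturbation_search(
    lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2, [0.0, 0.0])
```

Because acceptance requires only an improvement in F, no derivatives or continuity assumptions enter the algorithm itself; continuity merely makes small perturbations likely to succeed.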

(d) Treatment of Constraints (see also Secs. 11.3-4, 11.4-1, and 11.4-3). Maxima and minima with constraints φi(x1, x2, . . . , xn) = 0 (i = 1, 2, . . . , r) can be treated by a penalty-function technique which adds a term of the form image to the given criterion function F(x1, x2, . . . , xn), where each Kk is a large positive constant for minimization (negative for maximization), and m is an even positive integer. Another technique projects the computed gradient vector on the constraint surface, so that the gradient search proceeds along the latter (Refs. 20.9 and 20.57). The most useful digital-computer routines for optimization include options for changing the search strategy when convergence is slow.

20.2-7. Steepest-descent Methods (Gradient Methods). (a) Method of Steepest Descent. Choose vi[j] = −∂F/∂xi, where all derivatives are computed for xi = xi[j], and reduce the step size λ[j] as the minimum of F is approached (Fig. 20.2-2).

For analytical functions F and small fi, a Taylor-series approximation for F(λ[j]) yields the optimum step size

image

where all derivatives are computed for xi = xi[j]. Interpolation-polynomial approximations for F(λ[j]) may be more convenient.

(b) Descent with Computed Gradient Components. Frequently, derivation of the gradient components ∂F/∂xk needed for a gradient descent is impossible or impractical. The required derivatives can then be approximated by difference coefficients ΔF/Δxk obtained through perturbations of one unknown xk at a time. Since this takes n gradient-measuring steps for each “working step” in the gradient direction, one may prefer to continue in this gradient direction until the criterion function F is no longer improved and remeasure the gradient only then. A number of refined gradient-descent techniques are discussed in Refs. 20.9 and 20.57.
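A sketch of such a descent with difference-quotient gradient components follows (Python; the constant step size λ and the sample criterion function are our simplifying assumptions, whereas the text recommends adjusting λ[j] as the minimum is approached):

```python
def descent_with_measured_gradient(F, x, lam=0.1, h=1e-6, iters=400):
    """Steepest descent in which each gradient component dF/dx_k is
    measured as a difference quotient, perturbing one unknown at a
    time (n gradient-measuring steps per working step)."""
    x = list(x)
    for _ in range(iters):
        f0 = F(x)
        grad = []
        for k in range(len(x)):
            xp = list(x)
            xp[k] += h                       # perturb one unknown
            grad.append((F(xp) - f0) / h)    # difference quotient
        x = [xk - lam * gk for xk, gk in zip(x, grad)]
    return x

# Hypothetical criterion function with minimum at (1, -3):
x_min = descent_with_measured_gradient(
    lambda v: (v[0] - 1.0) ** 2 + 2.0 * (v[1] + 3.0) ** 2, [0.0, 0.0])
```

The forward difference introduces a bias of order h in each measured component, which limits the final accuracy; this is the price of not deriving ∂F/∂xk analytically.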

image

FIG. 20.2-2. Criterion-function surface for a two-variable minimum problem, showing three minima, level lines, and gradient lines. In this example, all three minima of F(x1,x2) have the same value, zero. (Based on L. D. Kovach and H. Meissinger: Solution of Algebraic Equations, Linear Programming, and Parameter Optimization, in H. D. Huskey and G. A. Korn, Computer Handbook. McGraw-Hill, New York, 1962.)

20.2-8. The Newton-Raphson Method and the Kantorovich Theorem (see also Secs. 20.2-2 and 20.9-3). (a) The Newton-Raphson Method. Start with a trial solution xi[0], and obtain successive approximations xi[j+1] by solving the simultaneous linear equations

image

with xk = xk[j] (j = 0, 1, . . .).
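For n = 2 the scheme reads as follows in Python (the sample equations and the use of Cramer's rule, Sec. 1.9-2, for the small linear system are our choices for illustration):

```python
def newton_raphson_2(f, jac, x, y, tol=1e-12, max_iter=50):
    """Newton-Raphson for two simultaneous equations f1 = f2 = 0:
    each step solves the linear equations (31), here a 2x2 system,
    for the corrections dx, dy."""
    for _ in range(max_iter):
        f1, f2 = f(x, y)
        (a, b), (c, d) = jac(x, y)       # matrix of partial derivatives
        det = a * d - b * c
        dx = (-f1 * d + f2 * b) / det    # Cramer's rule for the 2x2 system
        dy = (-f2 * a + f1 * c) / det
        x, y = x + dx, y + dy
        if abs(dx) + abs(dy) < tol:
            break
    return x, y

# Hypothetical example: x**2 + y**2 = 4 together with x = y.
sol = newton_raphson_2(
    lambda x, y: (x * x + y * y - 4.0, x - y),
    lambda x, y: ((2.0 * x, 2.0 * y), (1.0, -1.0)),
    1.0, 1.0)
```

For larger n the linear system (31) would be solved by the elimination methods of Sec. 20.3-1 instead of Cramer's rule.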

(b) Kantorovich's Convergence Theorem (see also Ref. 20.52). Solution of the linear equations (31) implies that the matrix [∂fi/∂xk] has an inverse (Sec. 13.2-3) image for the xk values in question. Let A, B, C be positive real bounds (matrix norms, Sec. 13.2-1) such that

image

if the initial trial values xi = xi[0] are substituted in the image. Also, let the fi(x1, x2, . . . , xn) be twice continuously differentiable and satisfy

image

for all (x1, x2, . . . , xn) in the cubical region defined by

image

Then the given system (27) has a solution (x1, x2, . . . , xn) in the region defined by Eq. (33), and the rate of convergence is given by

image

which again indicates relatively rapid convergence.

20.3. LINEAR SIMULTANEOUS EQUATIONS, MATRIX INVERSION, AND MATRIX EIGENVALUE PROBLEMS

20.3-1. “Direct” or Elimination Methods for Linear Simultaneous Equations. (a) Introduction. The solution of a suitable system of linear equations

image

by Cramer's rule (Sec. 1.9-2) requires too many multiplications for practical use if n > 4. The following procedures solve a given system (1) by successive elimination of unknowns.

(b) Basic Elimination Procedure (Gauss's Pivotal-condensation Method). Let aIJ be the coefficient having the largest absolute value. To eliminate xJ from the ith equation (i ≠ I), multiply the Ith equation by aiJ/aIJ and subtract from the ith equation. Repeat the process to eliminate a second unknown from the remaining n − 1 equations, etc.
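A Python sketch of elimination with back substitution follows; note that, for brevity, it pivots on the largest available coefficient in the current column only (partial pivoting), a common simplification of the full search described above:

```python
def gauss_eliminate(A, b):
    """Gauss elimination with back substitution; pivots on the largest
    available coefficient in each column (partial pivoting)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]                    # bring pivot row up
        for i in range(k + 1, n):                  # eliminate x_k below
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                 # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x
```

The pivot search is what keeps the multipliers m bounded by 1 in absolute value and thus limits the growth of round-off errors.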

(c) An Improved Elimination Scheme (Banachiewicz-Cholesky-Crout Method). Obtain xn, xn-1, . . . , x1, in that order, by “back substitution” from the equations

image

after calculating successive elements of the (n + 1) × n array

image

with the aid of the recursion formulas

image

For the computation of determinants, note

image

The Banachiewicz-Cholesky-Crout method requires less intermediate recording than the Gauss method and becomes particularly simple if the matrix [aik] is symmetric (aki = aik). In this important special case,

image

(d) Direct Methods in Matrix Form. Gaussian elimination (Sec. 20.3-1b) can be interpreted as a transformation of the given system (1), i.e.,

image

by successive nonsingular transformation matrices T1, T2, . . . , Tn chosen so that the resulting system matrix TnTn-1 . . . T1A has a triangular form similar to that of Eq. (2).

Again, the solution scheme of Sec. 20.3-1c amounts to the definition of a single transformation matrix T such that

image

has the triangular form (2). Various alternative methods employ postmultiplication or similarity transformations instead of the premultiplication used here.

Several modified triangularization schemes associated with the names of Crout, Doolittle, Givens, Householder, and Schmidt are described in Refs. 20.9, 20.11, 20.15, 20.22, and 20.25. Some of the modifications help to reduce round-off-error effects (Ref. 20.58). Wilkinson (Ref. 20.9) gives a comparison of methods on the basis of numerical stability and number of multiplications.

20.3-2. Iteration Methods for Linear Equations. (a) Introduction. Iteration methods (see also Secs. 20.2-6 and 20.2-7) are usually preferred to direct methods for solving linear equations only if the system matrix [aik] is “sparse,” i.e., if most of the aik, especially those not near the main diagonal, are equal to zero. This is precisely the case for the (possibly very large) systems of linear equations generated by partial-differential-equation problems (Sec. 20.9-4). Given a system of linear equations (1) with real coefficients, each of the following iteration methods approximates the solution xi (i = 1, 2, . . . , n) by successive approximations of the form (20.2-20). The residuals obtained at each step of the iteration will be denoted by

image

NOTE: Many iteration schemes require the given coefficient matrix A ≡ [aik] to be normal with a positive-definite symmetric part (Sec. 13.3-4), or to be symmetric and positive-definite (Sec. 13.5-2). If this is not true for the given coefficient matrix, one may attempt to rearrange or recombine the given equations so as to obtain relatively large positive diagonal coefficients, or one may solve the equivalent system

image

whose coefficient matrix is necessarily symmetric and positive-definite whenever the system (1) is nonsingular.

(b) Gauss-Seidel-type Iteration. Rearrange the given system (1) (possibly by recombining equations and/or multiplication by constants) to obtain as large positive diagonal coefficients aii as practicable. Starting with a trial solution xi[0] (i = 1, 2, . . . , n), compute successive approximations

image

Both schemes are simple, but convergence is assured only if the matrix [aik] is positive definite (Sec. 13.5-2); and convergence may be slow.
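A Gauss-Seidel sweep takes only a few lines (Python; the fixed sweep count and the sample positive-definite system are our assumptions for illustration):

```python
def gauss_seidel(A, b, x=None, sweeps=100):
    """Gauss-Seidel iteration: each updated x_i is used immediately
    in the remaining updates of the same sweep."""
    n = len(b)
    x = [0.0] * n if x is None else list(x)
    for _ in range(sweeps):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

# Positive-definite sample system 4x + y = 1, x + 3y = 2:
x = gauss_seidel([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

In practice one would stop when the residuals (6) fall below a tolerance rather than after a fixed number of sweeps.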

(c) Relaxation methods depend on the (manual) computer's judgment to obtain rapid convergence of an iteration process. Starting with a trial solution xi[0] (frequently simply x1[0] = 1, x2[0] = x3[0] = . . . = xn[0] = 0), one attempts to introduce successive approximations xi[j] in a loosely systematic fashion so as to reduce the n residuals (6) to zero. One tabulates the residuals fi[j] at each step and combines the following procedures:

      1. Basic Relaxation Procedure. At each step, “liquidate” the residual fI[j] having the greatest absolute value by adjusting xI alone:

image

Only rough values of xI[j+1] are required for the initial steps.

      2. Block Relaxation and Group Relaxation. Apply equal increments xi[j+1] − xi[j] to a set (“block”) of xi[j]'s so as to liquidate one of the residuals fi[j+1], or so as to reduce the sum of all residuals to zero. The latter procedure is useful particularly if all the initial residuals fi[0] are of equal sign.

Group relaxation applies different increments to a chosen set of xi[j]'s for similar purposes.

      3. Overrelaxation. One can often improve the convergence of a relaxation process by changing the sign of the residual operated on while reducing its absolute value, without liquidating it entirely at this stage.

Relaxation methods are particularly useful for the (manual) solution of the large sets of simple linear difference equations frequently used to approximate a partial differential equation (Sec. 20.9-3). Relaxation methods at their best exploit the manual computer's skill, experience, and knowledge of the underlying physical situation. Many special tricks apply to particular applications (Refs. 20.21 to 20.30).

(d) Systematic overrelaxation is employed in an important class of iteration methods for solving the “sparse” equation systems generated by partial-differential-equation problems. Consider a system of equations (1) rearranged so that all diagonal coefficients equal unity (aii = 1), and modify the iteration (9) to read

image

where ωj is a real overrelaxation factor > 1 chosen (either once and for all or at each step) so as to speed up the convergence. The choice of ωj is discussed in Refs. 20.15, 20.22, 20.23, and 20.30; Ref. 20.9 contains a large bibliography.
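The modified iteration (10) is not legible above; the sketch below assumes it is the standard successive-overrelaxation (SOR) update, in which each Gauss-Seidel correction is scaled by the factor ω (ω = 1 recovers Gauss-Seidel). The example system and names are mine:

```python
def sor(a, b, x0, omega=1.25, sweeps=50):
    """Successive overrelaxation: a Gauss-Seidel sweep whose correction
    is multiplied by omega; converges for SPD matrices when 0 < omega < 2."""
    n = len(b)
    x = list(x0)
    for _ in range(sweeps):
        for i in range(n):
            s = sum(a[i][k] * x[k] for k in range(n) if k != i)
            x[i] += omega * ((b[i] - s) / a[i][i] - x[i])
    return x

a = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = sor(a, b, [0.0, 0.0])
```

For the large sparse systems mentioned in the text, a well-chosen ω can reduce the number of sweeps by an order of magnitude compared with ω = 1.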

(e) Steepest-descent methods (gradient methods) minimize a positive function like image in the manner of Sec. 20.2-7. If the given coefficient matrix [aik] is symmetric and positive definite, one may use

image

If the convergence is of an oscillatory nature, it may be possible to accelerate the convergence by multiplying the last term in Eq. (11) by 0.9 for some values of j.

(f) A Conjugate-gradient Method. Given a system of linear equations (1) with a real, symmetric, and positive-definite coefficient matrix [aik], start with a trial solution xi[0]. Let vi[0] = fi[0], and compute successively

image

The resulting fi[j] satisfy Eq. (6). Without round-off errors, this procedure yields the exact solution xi = xi[n] after n steps; in addition, the method has all the advantages of an iteration scheme (Sec. 20.1-2), but it requires many multiplications.

It is possible to write conjugate-gradient algorithms directly applicable to any given coefficient matrix [aik]; such schemes amount essentially to a solution of Eq. (7) by the method of Eq. (12).
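The recursion (12) is not reproduced above, but the standard conjugate-gradient algorithm it describes can be sketched as follows (a sketch only; the test system and all variable names are mine). Note the finite-termination property claimed in the text: without round-off, the exact solution appears after n steps.

```python
def conjugate_gradient(a, b, x0):
    """Conjugate-gradient method for a x = b, with [a_ik] real,
    symmetric, and positive definite; exact (up to round-off)
    after n steps."""
    n = len(b)
    x = list(x0)
    matvec = lambda v: [sum(a[i][k] * v[k] for k in range(n)) for i in range(n)]
    r = [b[i] - matvec(x)[i] for i in range(n)]   # residuals f_i
    v = list(r)                                    # search directions v_i
    rs = sum(ri * ri for ri in r)
    for _ in range(n):
        av = matvec(v)
        alpha = rs / sum(v[i] * av[i] for i in range(n))
        x = [x[i] + alpha * v[i] for i in range(n)]
        r = [r[i] - alpha * av[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new == 0.0:
            break
        v = [r[i] + (rs_new / rs) * v[i] for i in range(n)]
        rs = rs_new
    return x

x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0])
```

For this 2 × 2 system the loop terminates with the exact solution x = 1/11, y = 7/11 after exactly two steps, as the theory predicts.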

20.3-3. Matrix Inversion (see also Secs. 13.2-3 and 14.5-3). (a) The methods of Secs. 20.3-1 and 20.3-2 apply directly to the numerical inversion of a given nonsingular n × n matrix A ≡ [aik], and also to the calculation of det [aik].

(b) An Iteration Scheme for Matrix Inversion. To find the inverse A−1 of a given n × n matrix A, start with an n × n trial matrix X[0] (say X[0] = I) and compute successive approximations

image

If the sequence converges, the limit is A−1 (see also Sec. 20.2-2b).
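The iteration (14) is not legible above; the sketch below assumes it is the familiar Schulz iteration X[j+1] = X[j](2I − A X[j]), together with the standard convergent starting matrix X[0] = Aᵀ/(‖A‖1 ‖A‖∞). Both the starting choice and all names are mine:

```python
def schulz_inverse(a, steps=30):
    """Iterative matrix inversion, assumed form X <- X (2I - A X);
    converges (quadratically) when the spectral radius of I - A X[0]
    is less than 1, which the scaled-transpose start guarantees."""
    n = len(a)
    norm1 = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))
    norminf = max(sum(abs(a[i][j]) for j in range(n)) for i in range(n))
    # X[0] = A-transpose / (||A||_1 ||A||_inf)
    x = [[a[j][i] / (norm1 * norminf) for j in range(n)] for i in range(n)]

    def matmul(p, q):
        return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    for _ in range(steps):
        ax = matmul(a, x)
        t = [[(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
             for i in range(n)]
        x = matmul(x, t)
    return x

x = schulz_inverse([[2.0, 1.0], [1.0, 3.0]])
# exact inverse is (1/5) [[3, -1], [-1, 2]]
```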

(c) (See also Sec. 13.4-7.) For every nonsingular n × n matrix A,

image

20.3-4. A Partitioning Scheme for the Solution of Simultaneous Linear Equations or for Matrix Inversion (see also Sec. 13.2-8). (a) A system of linear equations (1) can be written in matrix form as

image

(Sec. 14.5-3). Equation (15) can be rewritten in the partitioned form

image

The resulting matrix equations

image

which, once the (n − m) × (n − m) matrix A22 has been inverted, furnish m linear equations for the first m unknowns x1, x2, . . . , xm. This procedure may be useful, in particular, if only the first m unknowns are of interest.

(b) The inverse matrix A-1 is obtained in the partitioned form

image

so that the inversion of the n × n matrix A is reduced to the inversion of smaller m × m and (n − m) × (n − m) matrices.

20.3-5. Eigenvalues and Eigenvectors of Matrices (see also Secs. 13.4-2 to 13.4-6, 14.8-5, and 14.8-9). (a) The Characteristic Equation. The eigenvalues λk of a given n × n matrix A ≡ [aik] can be obtained as roots of the characteristic equation

image

by one of the methods outlined in Secs. 20.2-1 to 20.2-3. To avoid direct expansion of the determinant, one may evaluate FA(λ) for n + 1 selected values of λ (say λ = 0, 1, 2, . . . , n), so that the nth-degree polynomial FA(λ) is given (exactly) by one of the interpolation formulas of Sec. 20.5-3.

(b) An Iteration Method for Hermitian Matrices. Let A be a hermitian matrix, so that all eigenvalues are real; in many applications A is real and symmetric. Assuming that the eigenvalue λM with the largest absolute value (dominant eigenvalue) is nondegenerate, start with a trial column matrix x[0] such as {1, 0, 0, . . . , 0} and compute successive matrix products

image

where αj is a convenient numerical factor chosen, for instance, so that the component of x[j+1] largest in absolute value equals 1. As j increases, x[j] will approximate an eigenvector associated with the dominant eigenvalue λM; the latter may then be obtained from

image

This process converges more rapidly if |λM| is substantially larger than the absolute values of all other eigenvalues of A, and if the direction of x[0] is already close to that of the desired eigenvector; if x[0] = {1, 0, 0, . . . , 0} does not work well, try {0, 1, 0, 0, . . . ,0}, etc. One can accelerate the convergence by using A2 or A3 instead of A in Eq. (21).
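The iteration (21) is the power method; a minimal sketch follows, with the normalizing factor αj chosen exactly as the text suggests (largest component scaled to 1) and the Rayleigh quotient used as one common estimate of λM. The test matrix is mine:

```python
def power_iteration(a, x0, steps=100):
    """Power method for a real symmetric matrix: x <- alpha A x,
    with alpha normalizing the largest-magnitude component to 1."""
    n = len(a)
    x = list(x0)
    lam = 0.0
    for _ in range(steps):
        y = [sum(a[i][k] * x[k] for k in range(n)) for i in range(n)]
        # Rayleigh-quotient estimate of the dominant eigenvalue
        lam = sum(y[i] * x[i] for i in range(n)) / sum(xi * xi for xi in x)
        m = max(abs(c) for c in y)
        x = [c / m for c in y]
    return lam, x

lam, x = power_iteration([[2.0, 1.0], [1.0, 3.0]], [1.0, 0.0])
```

For this matrix the dominant eigenvalue is (5 + √5)/2 ≈ 3.618, with eigenvector components in the ratio (√5 − 1)/2 ≈ 0.618 : 1, and the iteration converges at the geometric rate |λ2/λM| ≈ 0.382 per step.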

Additional eigenvectors and eigenvalues are found after reduction of the given matrix (Sec. 14.8-6); see also Refs. 20.7 and 20.48 for other procedures.

If λM is m-fold degenerate, the vectors described by Eq. (21) will not in general converge, but for large j they will tend to stay in a subspace spanned by the eigenvectors associated with λM. Hence m linearly independent matrices x[j] will yield m mutually orthogonal eigenvectors by orthogonalization (Sec. 14.7-4b).

(c) A Relaxation Method. Starting with a trial vector represented by the column matrix x, compute y = Ax and adjust the matrix elements ξi of x so that the greatest absolute difference between two of the quotients η1/ξ1, η2/ξ2, . . . , ηn/ξn becomes as small as possible; the ηi are the matrix elements of y.

(d) Jacobi-Von Neumann-Goldstine Rotation Process for Real Symmetric Matrices. Given a real symmetric matrix A ≡ [aik] ≡ A[0], begin by eliminating the nondiagonal element aIK having the largest absolute value through the orthogonal transformation

image

(rotation through an angle ν1 in the “plane” spanned by eI and eK; see also Secs. 2.1-6 and 14.10-2b). If aII = aKK, take ν1 = ±π/4, the sign being that of aIK. Apply an analogous procedure to A[1] to obtain A[2], and repeat the process. The product T1T2T3 . . . of the transformation matrices converges to an orthogonal matrix which diagonalizes the given matrix A even if there are multiple eigenvalues. Variations of the Jacobi method simply reduce all off-diagonal elements aik with absolute values greater than a predetermined threshold in succession (Ref. 20.9).
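The rotation process above can be sketched compactly; the angle formula below uses atan2, which automatically yields the ±π/4 angle in the equal-diagonal case mentioned in the text. The test matrix and all names are mine:

```python
import math

def jacobi_eigenvalues(a, tol=1e-12):
    """Jacobi rotation process: repeatedly annihilate the off-diagonal
    element of largest absolute value by a plane rotation, until all
    off-diagonal elements are below tol; returns sorted eigenvalues."""
    n = len(a)
    a = [row[:] for row in a]              # work on a copy
    for _ in range(100 * n * n):           # generous iteration cap
        p, q = 0, 1                        # locate largest off-diagonal
        for i in range(n):
            for j in range(i + 1, n):
                if abs(a[i][j]) > abs(a[p][q]):
                    p, q = i, j
        if abs(a[p][q]) < tol:
            break
        theta = 0.5 * math.atan2(2.0 * a[p][q], a[p][p] - a[q][q])
        c, s = math.cos(theta), math.sin(theta)
        for k in range(n):                 # rows p and q of T'A
            apk, aqk = a[p][k], a[q][k]
            a[p][k], a[q][k] = c * apk + s * aqk, -s * apk + c * aqk
        for k in range(n):                 # columns p and q of (T'A)T
            akp, akq = a[k][p], a[k][q]
            a[k][p], a[k][q] = c * akp + s * akq, -s * akp + c * akq
    return sorted(a[i][i] for i in range(n))

eigs = jacobi_eigenvalues([[2.0, 1.0], [1.0, 3.0]])
```

For a 2 × 2 matrix a single rotation already diagonalizes A; here the eigenvalues are (5 ∓ √5)/2.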

(e) Other Methods. A number of other numerical methods for solving matrix eigenvalue problems, including those involving nonhermitian matrices, will be found in Refs. 20.9, 20.22 to 20.26, and 20.31.

20.4. FINITE DIFFERENCES AND DIFFERENCE EQUATIONS

20.4-1. Finite Differences and Central Means. (a) Let y = y(x) be a function of the real variable x. Given a set of equally spaced argument values xk = x0 + k Δx (k = 0, ±1, ±2, . . . ; Δx = h > 0) and a corresponding set or table of function values yk = y(xk) = y(x0 + k Δx), one defines the forward differences

image

and the backward differences

image

(b) Even though the function values yk = y(x0 + k Δx) may not be known for half-integral values of k, one can calculate the central differences

image

and the central means

image

(c) Finite differences are conveniently tabulated in arrays like

image

Note that (1) the computation of an rth-order difference requires r + 1 function values, and (2) the nth-order differences of an nth-degree polynomial y(x) are constant.
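The two notes above are easy to check numerically with a forward-difference table (the tabulation of Sec. 20.4-1c); the function and names below are mine:

```python
def difference_table(y):
    """Forward-difference table: row r holds the rth-order differences
    of the tabulated values, each computed as the difference of two
    adjacent entries of row r - 1 (so r + 1 function values yield one
    rth-order difference)."""
    table = [list(y)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([prev[k + 1] - prev[k] for k in range(len(prev) - 1)])
    return table

# nth-order differences of an nth-degree polynomial are constant:
y = [x**3 for x in range(6)]       # 0, 1, 8, 27, 64, 125
t = difference_table(y)
```

Here t[3] is the constant sequence 6, 6, 6 (for y = x³ with h = 1, the third difference is 3! h³ = 6), and t[4] vanishes identically.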

20.4-2. Operator Notation. (a) Definitions. Given a suitably defined function y = y(x) of the real variable x and a fixed increment Δx = h of x, one defines the displacement operator (shift operator) E by

image

where r is any real number.

The difference operators Δ, ∇, δ and the central-mean operator μ are defined by

image

(b) Operator Relations (see also Sec. 14.3-1 and Ref. 20.36).

image

together with ∇ = E-1Δ and δ = E-1/2Δ. Note also

image

If r is a positive integer, the series (15) terminates; otherwise, its convergence requires investigation.

(c) For suitably differentiable operands,

image

(operator notation for Taylor's series, Sec. 4.10-4; see also Table 8.3-1).

20.4-3. Difference Equations. (a) An ordinary difference equation of order r is an equation

image

relating values yk = y(xk) = y(x0 + k Δx) of a function y = y(x) defined on a discrete set of values x = xk = x0 + k Δx, where Δx is a fixed increment. It is often convenient to introduce k = (x − x0)/Δx = 0, ±1, ±2, . . . as a new independent variable. An ordinary difference equation of order r relates yk and a set of finite differences Δiyk, ∇iyk, or δiyk of order up to and including r; or the difference equation may relate yk and difference coefficients Δiyk/(Δx)i, ∇iyk/(Δx)i, or δiyk/(Δx)i of order i up to and including r.

To solve a difference equation (17) means to find solutions y = y(x) such that the sequence of the yk satisfies the given equation for some range of values of k. The complete solution of an ordinary difference equation of order r will, in general, involve r arbitrary constants; the latter must be determined by accessory conditions on the yk, such as initial conditions or boundary conditions. The solution of a difference equation over any finite range amounts, in principle, to that of a set of simultaneous equations.

Difference equations are used (1) to approximate differential equations (Secs. 20.9-2 and 20.9-4) and (2) to deal with situations represented by discrete-variable models.

A SIMPLE EXAMPLE: SUMMATION OF SERIES. The problem of solving a first-order difference equation (recurrence relation) of the form

image

with a given initial value y0 = α0 is identical with the problem of summation of series (Sec. 4.8-5):

image

The problem is analogous to the integration of a differential equation y′ = f(x). The summation operator and Δ are inverse operators; note Eq. (12) and

image

(b) A partial difference equation relates values Φij... = Φ(x0 + i Δx, y0 + j Δy, . . .) (i, j, . . . = 0, ±1, ±2, . . .) of a function Φ = Φ(x, y, . . .); the order of the partial difference equation is the largest difference between i values, j values, . . . occurring in the equation. Refer to Sec. 20.9-5 for formulas expressing various partial-difference operators in terms of function values Φij..., and for the use of partial difference equations to approximate solutions of partial differential equations.

20.4-4. Linear Ordinary Difference Equations. (a) Superposition Theorems (see also Secs. 9.3-1 and 15.4-2). A linear difference equation is linear in the values and differences of the unknown function. Thus a linear ordinary difference equation of order r has the form

image

where the αi(k) and f(k) are given functions of k = 0, ±1, ±2, . . . . As in Sec. 9.3-1, solutions yk corresponding to f(k) = αf1(k) + βf2(k) are the corresponding linear combinations of solutions corresponding to the individual forcing functions f1(k) and f2(k) (Superposition Principle). The complete solution of Eq. (21) can be expressed as the sum of any particular solution and the complete solution of the linear and homogeneous “complementary equation”

image

Again, every linear combination of solutions of a linear homogeneous difference equation (22) is itself a solution of Eq. (22).

The theory of ordinary difference equations parallels that of ordinary differential equations in many details (see also Ref. 20.67). In particular, a homogeneous linear difference equation (22) admits at most r solutions y(1)k, y(2)k, . . . linearly independent on the set k = 0, 1, 2, . . . ; r such solutions are linearly independent if and only if the Casoratian determinant

image

is not identically zero for k = 0, 1, 2, ... . The Casoratian is analogous to the Wronskian in Sec. 9.3-2.

(b) The Method of Variation of Constants (see also Sec. 9.3-3). Assuming that r linearly independent solutions y(1)k, y(2)k . . . , y(r)k of the complementary equation (22) are known, the complete solution of the nonhomogeneous linear difference equation (21) is given by

image

After solving the r simultaneous linear equations (24b) for the ΔCh(k), obtain each Ch(k) by summation as in Eq. (19).

20.4-5. Linear Ordinary Difference Equations with Constant Coefficients: Method of Undetermined Coefficients (see also Secs. 9.4-1 to 9.4-8). (a) The complete solution of the linear homogeneous difference equation

image

with given constant coefficients a0, a1, . . . , ar can be expressed in terms of normal-mode terms completely analogous to those of Sec. 9.4-1c, viz.,

image

where λ1, λ2, . . . , λr are the roots of the algebraic equation

image

provided that all r roots are different. If a root, say λ1, is an m-fold root, then the corresponding term in the solution (27) becomes (C1 + kC2 + k2C3 + . . . + km−1Cm)λ1k. Two terms corresponding to complex conjugate roots λ = α ± iβ may be combined into |λ|k(A cos kφ + B sin kφ) (φ = arctan (β/α)). The coefficients Cj, A, B, . . . must be chosen so that the solution matches suitably given initial or boundary conditions on the yk.
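The normal-mode solution can be checked on the classic example yk+2 − yk+1 − yk = 0 (the Fibonacci recurrence, an example of my choosing): the characteristic equation λ² − λ − 1 = 0 has roots (1 ± √5)/2, and fitting C1, C2 to the initial values y0 = 0, y1 = 1 gives C1 = −C2 = 1/√5.

```python
import math

# yk+2 - yk+1 - yk = 0; characteristic equation lambda^2 - lambda - 1 = 0
lam1 = (1 + math.sqrt(5)) / 2
lam2 = (1 - math.sqrt(5)) / 2
# Fit C1, C2 to y0 = 0, y1 = 1:
#   C1 + C2 = 0,  C1*lam1 + C2*lam2 = 1  ->  C1 = 1/sqrt(5), C2 = -C1
c1 = 1 / math.sqrt(5)
c2 = -c1

def y(k):
    """Closed-form normal-mode solution (27) for this recurrence."""
    return c1 * lam1**k + c2 * lam2**k

# Compare with direct recursion:
seq = [0, 1]
for k in range(2, 15):
    seq.append(seq[-1] + seq[-2])
```

The closed form reproduces the recursively computed sequence to rounding accuracy.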

The solution of the nonhomogeneous difference equation

image

can often be derived in the manner of Sec. 20.4-4, but the special methods of Secs. 20.4-6 and 20.4-7a may be more convenient.

20.4-6. Transform Methods for Linear Difference Equations with Constant Coefficients. (a) The z-transform Method (see also Secs. 8.7-3 and 9.4-5). Given a difference equation (28), z transformation of both sides with the aid of the shift theorem 2 of Table 8.7-2 yields the z transform

image

of the unknown solution sequence y0, y1, y2, . . . in the form

image

where the first term on the right, as in Sec. 9.4-5, represents a “normal response” to the given forcing sequence f(0), f(1), f(2), . . . , and the second term represents the effect of r given initial values y0, y1, y2, . . . , yr−1. Specifically,

image

The unknown yk may be obtained as coefficients of 1/zk in the power-series expansion of Yz(z), or one may utilize a table of z-transform pairs (Table 20.4-1). As in the case of Laplace transforms, Yz(z) can be reduced to a sum of simpler forms by a partial-fraction expansion, which yields normal modes corresponding to the roots of the characteristic equation (27).

(b) Sampled-data Representation in Terms of Impulse Trains and Jump Functions. Laplace-transform Method. If one formally admits the asymmetric impulse function δ+(t) and step-function differentiation in the sense of Sec. 21.9-6, a sampled-data sequence u0, u1, u2, . . . can be represented, on a reciprocal one-to-one basis, by a corresponding impulse train*

image

where t is a real variable, and T a real positive constant (sampling interval).

If the sampled-data sequences y0, y1, y2, . . . and f(0), f(1), f(2), . . . satisfy any difference equation (28), the corresponding functions y†(t), f†(t) satisfy the functional equation (difference equation for functions)

image

with appropriate initial conditions. Unlike Eq. (28), this relation admits Laplace transformation, with

image

(see also Sec. 8.2-2). The resulting transform method is analogous to the z-transform method, with z = esT = esΔt, where Δt = T.

To avoid the use of symbolic functions, one can, instead, represent the sequence u0, u1, u2,. . . by the corresponding jump function

image

* u†(t) is usually denoted by u*(t) in the literature on sampled-data systems.

Table 20.4-1. A Short Table of z Transforms

image

which can be physically interpreted as the output of a “zero-order data-hold” device. The jump functions image satisfy the same functional equation as y†(t) and f†(t), so that Laplace transformation is again possible.

20.4-7. Systems of Ordinary Difference Equations (State Equations). Matrix Notation. Just as in the case of differential equations, one may be given a system of ordinary difference equations involving two or more unknown sequences y(xk) = yk, z(xk) = zk, . . . . One can reduce any rth-order difference equation (17) to a set of r first-order equations by introducing the Eiyk or Δiyk (i = 1, 2, . . . , r − 1) as new variables (state variables, see also Sec. 13.6-1). Again, as in Sec. 13.6-1, a given system of linear first-order difference equations (recurrence relations, state equations)

image

with constant coefficients aij may be rewritten as a matrix equation

image

(see also Sec. 14.5-3), where Yk is the column matrix {yk, zk, . . .}, F(k) is the column matrix {f1(k), f2(k), . . .}, and A ≡ [aij]. Given Y0 ≡ {y0, z0, . . .}, the solution is

image

where Ak, in analogy with Sec. 13.6-2b, is called the state-transition matrix for the system (36). The powers Ak required for the solution can be computed with the aid of Sylvester's theorem (Sec. 13.4-7b), so that each eigenvalue of A, i.e., each root λ of the characteristic equation,

image

again corresponds to a normal mode (see also Secs. 13.6-2 and 20.4-5).

This method applies, in particular, to the solution of the rth-order linear difference equation (28) if it is first reduced to a system of the form (36) through the introduction of yk+1, yk+2, . . . , yk+r−1 as new variables, so that

image

20.4-8. Stability. (a) By analogy with Sec. 9.4-4, a linear difference equation (28) or a linear system (36) with constant coefficients will be called completely stable if and only if all roots of the corresponding characteristic equation (27) or (38) have absolute values less than unity; this ensures that the effects of small changes in the initial conditions tend to zero as k increases.

Reference 20.63 presents Jury's version of the Schur-Cohn test for stability, which is analogous to the Routh-Hurwitz criterion (Sec. 1.6-6b) for roots with negative real parts.
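The unit-circle criterion can be illustrated by direct simulation of two second-order examples (both equations are mine, chosen so the root magnitudes are easy to verify by hand):

```python
def is_decaying(a1, a0, y0, y1, steps=200):
    """Simulate yk+2 + a1 yk+1 + a0 yk = 0 and report whether the
    solution has decayed essentially to zero after many steps."""
    y_prev, y_cur = y0, y1
    for _ in range(steps):
        y_prev, y_cur = y_cur, -a1 * y_cur - a0 * y_prev
    return abs(y_cur) < 1e-6

# yk+2 - yk+1 + 0.5 yk = 0: roots (1 +/- i)/2, |lambda| = 1/sqrt(2) < 1
stable = is_decaying(-1.0, 0.5, 1.0, 1.0)
# yk+2 - yk+1 - yk = 0 (Fibonacci): |lambda1| = 1.618... > 1
unstable = not is_decaying(-1.0, -1.0, 1.0, 1.0)
```

The first solution shrinks like (1/√2)^k regardless of the initial values, while the second grows like 1.618^k, exactly as the criterion predicts.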

(b) Lyapunov's definitions (Sec. 13.6-5) and the related theorems (Sec. 13.6-6) on the stability and asymptotic stability of solutions are readily extended to the solution sequences Y0, Y1, Y2, . . . of linear and nonlinear autonomous difference-equation systems

image

Generally speaking, conditions like dV/dt ≤ 0 for continuous-system Lyapunov functions V(y) simply translate into analogous conditions ΔV(Yk) ≤ 0 for discrete-parameter Lyapunov functions V(Yk) (Ref. 20.63).

20.5. APPROXIMATION OF FUNCTIONS BY INTERPOLATION

20.5-1. Introduction (see also Secs. 12.5-4b and 15.2-5). Given n + 1 function values y(xk) = yk, an interpolation formula approximates the function y(x) by a suitable known function Y(x) = Y(x; α0, α1, α2, . . . , αn) depending on n + 1 parameters αj chosen so that Y(xk) = y(xk) = yk for the given set of n + 1 argument values xk. In particular, Secs. 20.5-2 to 20.5-6 deal with polynomial interpolation. Other interpolation methods are discussed in Secs. 20.5-7 and 20.6-6.

The use of interpolation-type approximations (and, in particular, of polynomial interpolation) is not always justified. Thus, in the case of empirical functions, one may wish to smooth fluctuations in the y(xk) due to random errors (Sec. 20.6-3).

20.5-2. General Formulas for Polynomial Interpolation (Argument Values Not Necessarily Equally Spaced). An nth-order polynomial interpolation formula approximates the function y(x) by an nth-degree polynomial Y(x) such that Y(xk) = y(xk) = yk for a given set of n + 1 argument values xk.

(a) Lagrange's Interpolation Formula. Given y0 = y(x0), y1 = y(x1), y2 = y(x2), . . . , yn = y(xn),

image
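Lagrange's formula translates directly into code (a sketch; the test points are mine). Each term is yk multiplied by the product of (x − xj)/(xk − xj) over all j ≠ k, so Y(xk) = yk exactly:

```python
def lagrange(xs, ys, x):
    """Evaluate the Lagrange interpolation polynomial Y(x) through
    the points (xs[k], ys[k])."""
    n = len(xs)
    total = 0.0
    for k in range(n):
        term = ys[k]
        for j in range(n):
            if j != k:
                term *= (x - xs[j]) / (xs[k] - xs[j])
        total += term
    return total

xs = [0.0, 1.0, 2.0, 3.0]
ys = [xk**2 for xk in xs]       # a quadratic: interpolation is exact
```

Since the interpolated function is itself a polynomial of degree ≤ n, the remainder (Sec. 20.5-2d) vanishes and Y(x) reproduces x² everywhere, not just at the nodes.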

(b) Divided Differences and Newton's Interpolation Formula. One defines the divided differences

image

Then

image

Unlike Eq. (1), Newton's formula requires merely the addition of an extra term when a new pair of values xn+1, yn+1 is added. It is convenient to tabulate the divided differences (2) for use in Eq. (3) in the manner of Eq. (20.4-5).

Taking a divided difference is a linear operation (Sec. 15.2-7) on the function y(x). Each function (2) is completely symmetric in its arguments.
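A short sketch of the divided-difference table and Newton evaluation follows (names mine); the test also verifies the extra-term property: the coefficients built from the first three points are unchanged when a fourth point is appended.

```python
def divided_differences(xs, ys):
    """coef[k] = y[x0, x1, ..., xk], the leading divided differences
    of Eq. (2), built column by column."""
    n = len(xs)
    table = list(ys)
    coef = [table[0]]
    for order in range(1, n):
        table = [(table[k + 1] - table[k]) / (xs[k + order] - xs[k])
                 for k in range(n - order)]
        coef.append(table[0])
    return coef

def newton_eval(xs, coef, x):
    """Y(x) = y0 + (x-x0) y[x0,x1] + (x-x0)(x-x1) y[x0,x1,x2] + ..."""
    total, prod = 0.0, 1.0
    for k, c in enumerate(coef):
        total += c * prod
        prod *= x - xs[k]
    return total

xs = [0.0, 0.5, 2.0, 3.0]        # unequal spacing is allowed
ys = [1.0, 2.0, 5.0, 10.0]
coef = divided_differences(xs, ys)
```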

(c) Aitken's Iterated-interpolation Method. The following scheme may be useful if one desires values of the interpolation polynomial Y(x) rather than a simple expression for Y(x). Let Yijk... be the interpolation polynomial through (xi, yi), (xj, yj), (xk, yk), . . . , so that Y012...n = Y(x). Interpolation polynomials of increasing order are then obtained successively as follows:

image

The process may be terminated whenever two interpolation polynomials of successive orders agree to the desired number of significant figures.

(d) The Remainder. If y(x) is suitably differentiable, the remainder (error) Rn+1(x) involved in the use of any polynomial-interpolation formula based on the n + 1 function values y0, y1, y2, . . . , yn may be estimated from

image

where ξ lies in the smallest interval I containing x0, x1, x2, . . . , xn, and x (see also Sec. 4.10-4; ξ will, in general, depend on x), and

image

image

FIG. 20.5-1. Lozenge diagram for interpolation formulas. The abbreviated notation image is used. To convert a path through the lozenge to an interpolation formula, the following rules are formulated:

      1. Each time a difference column is crossed from left to right a term is added.

      2. If a path enters a difference column (from the left) at a positive slope, the term added is the product of the difference, say Δky−p, at which the column is crossed and the factorial (u + p − 1)k lying just below that difference.

      3. If a path enters a difference column (from the left) at a negative slope, the term added is the product of the difference, say Δky−p, at which the column is crossed and the factorial (u + p)k lying just above that difference.

      4. If a path enters a difference column horizontally (from the left), the term added is the product of the difference, say Δky−p, at which the column is crossed and the average of the two factorials (u + p)k and (u + p − 1)k lying, respectively, just above and just below that difference.

      5. If a path crosses a difference column (from left to right) between two differences, say Δky−(p+1) and Δky−p, then the added term is the product of the average of these two differences and the factorial (u + p)k at which the column is crossed.

      6. Any portion of a path traversed from right to left gives rise to the same terms as would arise from going along this portion from left to right except that the sign of each term is changed.

      7. The zero-difference column corresponding to the tabulated values may be treated by the same rules as any other difference column provided one thinks of the lozenge as being entered just to the left of this column. Thus this column can be crossed by a path making a positive, negative, or zero slope, just as is true of the other columns. (Reprinted by permission from K. S. Kunz, Numerical Analysis, McGraw-Hill, New York, 1957.)

20.5-3. Interpolation Formulas for Equally Spaced Argument Values. Lozenge Diagrams. Let yk = y(xk), xk = x0 + k Δx (k = 0, ±1, ±2, . . .), where Δx is a fixed increment as in Sec. 20.4-1, and introduce the abbreviation (x − x0)/Δx = u.

(a) Newton-Gregory Interpolation Formulas. Given y0, y1, y2, . . . or y0, y−1, y−2, . . . , Eq. (3) becomes, respectively,

image

(b) Symmetric Interpolation Formulas. More frequently, one is given y0, y1, y2, . . . and y−1, y−2, . . . ; Table 20.5-1 lists the most useful interpolation formulas for this case (see also Fig. 20.5-1). Note that Everett's and Steffensen's formulas are of particular interest for use with printed tables, since only even-order or only odd-order differences need be tabulated.

Table 20.5-1. Symmetric Interpolation Formulas

image

The formulas are given in simplified polynomial form, including the effect of sixth-order differences (see also Refs. 20.13 and 20.16).

(c) Use of Lozenge Diagrams. Many interpolation formulas can be obtained with the aid of a lozenge diagram (Fraser diagram) like that shown in Fig. 20.5-1. One can derive more sophisticated interpolation formulas (e.g., Everett's formula) by averaging two or more equivalent interpolation polynomials.

20.5-4. Inverse Interpolation. (a) Given xk = x0 + k Δx, yk = y(xk) for k = 0, 1, 2, . . . , n, it is desired to find a value ξ of x between x0 and x1 such that y(x) assumes a given value η; Δx is assumed to be so small that ξ is unique. One may apply the general interpolation formulas (1) or (3) by reversing the roles of x and y. Alternatively, one may apply one of the iteration schemes of Sec. 20.2-2 to solve the equation

image

where Y(x) is a suitable interpolation polynomial approximating y(x); one uses linear interpolation for the first step of the iteration, second-order interpolation for the second step, etc.

(b) Inverse Interpolation by Reversion of Series. Use any desired interpolation formula and write it as a power series Y(x) = a0 + a1x + a2x2 + . . . ≈ y. Then

image

20.5-5. “Optimal-interval” Interpolation (see also Sec. 20.6-3). The nth-degree polynomial Y(x) which equals y(x) at n + 1 points x = xj (j = 0, 1, 2, . . . , n) of [a, b] and minimizes image will approximately minimize the maximum absolute value of the interpolation error (4) in [a, b]. The required polynomial Y(x) is given by

image

where image is the kth-degree Chebyshev polynomial (Sec. 21.7-4), and

image
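The benefit of interpolating at the Chebyshev abscissas xj = cos [(2j + 1)π/(2n + 2)] on [−1, 1] (the roots of Tn+1) can be demonstrated on Runge's function 1/(1 + 25x²), a standard example of my choosing where equidistant interpolation fails badly:

```python
import math

def lagrange(xs, ys, x):
    """Lagrange interpolation polynomial through (xs[k], ys[k])."""
    total = 0.0
    for k in range(len(xs)):
        term = ys[k]
        for j in range(len(xs)):
            if j != k:
                term *= (x - xs[j]) / (xs[k] - xs[j])
        total += term
    return total

def max_error(nodes, f, n_test=400):
    """Largest |Y(x) - f(x)| over a fine grid on [-1, 1]."""
    ys = [f(t) for t in nodes]
    grid = [-1 + 2 * k / n_test for k in range(n_test + 1)]
    return max(abs(lagrange(nodes, ys, x) - f(x)) for x in grid)

def f(x):
    return 1.0 / (1.0 + 25.0 * x * x)   # Runge's function

n = 10                                   # polynomial degree
cheb = [math.cos((2 * j + 1) * math.pi / (2 * n + 2)) for j in range(n + 1)]
equi = [-1 + 2 * j / n for j in range(n + 1)]
```

For n = 10 the equidistant nodes give a maximum error near 2, while the Chebyshev nodes keep it close to 0.1, illustrating the near-minimax property claimed above.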

20.5-6. Multiple Polynomial Interpolation. To approximate z = z(x, y) by a polynomial Z(x, y) such that Z(x, y) = z(x, y) for a given set of “points” (xi, yk), one may first interpolate with respect to x to approximate z(x, yk) for a set of values of k; interpolation with respect to y will then yield Z(x, y).

Alternatively, one can substitute an interpolation formula for y into a formula for interpolation with respect to x. Thus, if Δx = Δy = h is a given fixed increment, and

image

Table 20.5-2. Interpolation Coefficients*

LAGRANGE FIVE-POINT INTERPOLATION

image

NOTE: All coefficients become exact if each terminal 8 is replaced by 75, and each terminal 3 by 25.

* From F. B. Hildebrand, Introduction to Numerical Analysis, McGraw-Hill. New York, 1956,

NEWTON INTERPOLATION

image

STIRLING INTERPOLATION

image

BESSEL INTERPOLATION

image

EVERETT INTERPOLATION

image

STEFFENSEN INTERPOLATION

image

one may use Bessel's interpolation formula (Table 20.5-1) twice to obtain

image

Analogous methods apply to functions of three or more variables.

20.5-7. Reciprocal Differences and Rational-fraction Interpolation. Given y(xk) = yk (k = 0, 1, 2, . . .), where x0, x1, x2, . . . are not necessarily equally spaced, one defines the reciprocal differences

image

Then

image

Successive substitution of

image

yields a continued-fraction expansion approximating y(x) by a rational function Y(x) which assumes the given function values y0, y1, y2, . . . for x = x0, x1, x2, . . . (Thiele's Interpolation Formula; see also Ref. 20.36). The continued fraction terminates whenever y(x) is a rational function.

20.6. APPROXIMATION BY ORTHOGONAL POLYNOMIALS, TRUNCATED FOURIER SERIES, AND OTHER METHODS

20.6-1. Introduction. In practice, polynomial interpolation is best suited for analytic functions uncorrupted by noise or random errors, since high-order interpolation polynomials tend to follow the latter too closely, and low-order interpolation polynomials may waste essential information. In such cases, it is preferable to employ “smoothed” polynomial or rational-fraction approximations designed to minimize either a weighted mean-square approximation error or the maximum absolute approximation error over a specified interval a < x < b. By contrast, Taylor-series approximations, which approximate an analytic function closely in the immediate neighborhood of one given point, are in general useful for numerical approximation only if convergence is extraordinarily rapid.

20.6-2. Least-squares Polynomial Approximation over an Interval (see also Sec. 15.2-6). It is desired to approximate the given function f(x) by

image

so as to minimize the weighted mean-square error

image

over the expansion interval (a, b), where γ(x) is a given nonnegative weighting function. If the φk(x) are mutually orthogonal real functions such that

image

then the desired coefficients ai are given by

image

Orthogonal-function approximations, exemplified by orthogonal-polynomial expansions (Secs. 20.6-2 to 20.6-4) and truncated Fourier series (Sec. 20.6-6), have the striking advantage that an improvement of the approximation through addition of an extra term an+1φn+1(x) does not affect the previously computed coefficients a0, a1, a2, . . . , an. Substitution of x = αz + β, dx = α dz in Eqs. (1) to (4) yields rescaled and/or shifted expansion intervals. Note that computation of the coefficients (4) requires one to know f(x) throughout the entire expansion interval (a, b).
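As a worked instance of Eqs. (3) and (4) (sketched under the assumption that ai = ∫γ f φi dx / ∫γ φi² dx, the standard least-squares formula), take the Legendre polynomials, which are orthogonal on (−1, 1) with unit weight γ(x) = 1; trapezoidal quadrature stands in for the exact integrals, and the target function f(x) = x⁴ is my choice:

```python
def inner(u, v, n=2000):
    """Trapezoidal approximation of the integral of u(x) v(x) over (-1, 1)."""
    h = 2.0 / n
    s = 0.5 * (u(-1.0) * v(-1.0) + u(1.0) * v(1.0))
    for k in range(1, n):
        x = -1.0 + k * h
        s += u(x) * v(x)
    return s * h

# First three Legendre polynomials (orthogonal, unit weight):
phis = [lambda x: 1.0, lambda x: x, lambda x: 1.5 * x * x - 0.5]

def f(x):
    return x**4                    # approximate x^4 by a parabola

a = [inner(f, phi) / inner(phi, phi) for phi in phis]

def F(x):
    return sum(ak * phi(x) for ak, phi in zip(a, phis))
```

Exact values are a0 = 1/5, a1 = 0, a2 = 4/7, so F(x) = (6/7)x² − 3/35; adding a P4 term later would leave these three coefficients untouched, as the text promises.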

20.6-3. Least-squares Polynomial Approximation over a Set of Points. A different type of least-squares approximation of the form (1) requires f(x) to be given only at a discrete set of m + 1 points x0, x1, x2, . . . , xm. One minimizes the weighted mean-square error

image

where the γk are given positive weights. This is again relatively simple if the φk(x) are chosen to be polynomials of degree k and are mutually orthogonal in the sense that

image

Such polynomials can be obtained from 1, x, x2, . . . through the Gram-Schmidt orthogonalization procedure of Sec. 14.7-4. The coefficients ai are given by

image

The resulting polynomial will be an interpolation polynomial if n = m; if n < m, addition of extra terms aiφi(x) again leaves the earlier terms unchanged. Two special cases are of particular interest.

(a) Optimum-interval Tabulation. If one is free to choose the m + 1 argument values x0, x1, x2, . . . , xm, say in the interval (−1, 1), one can pick the m + 1 roots of the (m + 1)st Chebyshev polynomial Tm+1(x) (Sec. 21.7-4), i.e.,

image

The resulting approximation polynomials defined by Eq. (6) for unit weights γk = 1 will be the Chebyshev polynomials Ti(x) (see also Sec. 20.5-5).

(b) Evenly Spaced Data (see also Ref. 20.6). If one has m + 1 = 2M + 1 arguments xk evenly spaced in an expansion interval (a, b), so that

image

then the polynomials φi(x) satisfying Eq. (7) for unit weights γk = 1 are given by

image

where the pi(t, 2M) are the Gram polynomials*

image

* The normalization (12) chosen is that of Ref. 20.6. Other normalization factors are used by different authors.

with

image

The first five Gram polynomials are

image

In particular, for M = 2 (five data points),

image

If we denote the given function values f(xk) by fk, Gram-polynomial approximation with M = 2 yields the following smoothed approximation-polynomial values Fi ≡ F(xi) ≡ Gi(f0, f1, f−1, . . .):

image

(“smoothing by fourth differences”; δkf0 is defined in Sec. 20.4-1).

20.6-4. Least-maximum-absolute-error Approximations. (a) The computation of the coefficients ai for an approximation polynomial (1) which minimizes the maximum absolute error |F(x) − f(x)| on (a, b), for f(x) given either on the entire interval or on a discrete set of points, requires a fairly laborious iterative procedure. Chebyshev's classical method is described in Ref. 20.61, while Hastings (Ref. 20.41) has evolved a heuristic method relying heavily on the investigator's intuition in relocating the zero-error points in successive plots of the error F(x) − f(x) by iterative parameter changes. Hastings' method is also applicable to more general nonpolynomial approximations of the form F(x) = F(x; α1, α2, . . . , αn), where α1, α2, . . . , αn are suitably adjusted parameters.

In practice, approximations of the form (1) in terms of Chebyshev polynomials (shifted and rescaled as needed, see also Sec. 20.6-4b), although really derived from least-squares considerations, are so close to the ideal least-absolute-error polynomial approximation that most texts on numerical analysis omit the laborious derivation of the latter. This is easily seen whenever it is reasonable to assume that the error of a Chebyshev-polynomial approximation

image

(b) It is often convenient to employ the shifted Chebyshev polynomials T*n(x) defined by

image

(Refs. 20.4 and 20.12).

(c) Table 20.6-1 lists the first few polynomials Tn(x) and T*n(x) together with expressions for 1, x, x2, . . . in terms of these polynomials (see also Sec. 21.7-4). Tables 20.6-2 to 20.6-4 list some useful polynomial approximations.

20.6-5. Economization of Power Series (Refs. 20.4 and 20.12). If computation of orthogonal-function-expansion coefficients is laborious, but a truncated-power-series approximation to f(x) is readily available, we can reduce its degree with little loss of accuracy by expressing higher powers of x in terms of lower powers and a Chebyshev polynomial. For

Table 20.6-1. Chebyshev Polynomials Tn(x) and Tn*(x), and Powers of x

image

example, the left-hand side of Table 20.6-1 yields

image

The last term fluctuates with the relatively small amplitude 1/256 in (— 1, 1). We omit the last term and substitute the rest for x9, leaving a seventh-degree polynomial; the procedure may then be repeated for x7, etc.

Instead, we could express all powers in terms of Chebyshev polynomials and neglect the higher-degree polynomials, which will tend to have small coefficients. The resulting expansions on (0, 1) or (−1, 1) are, in general, not identical with the true least-squares polynomial expansion of f(x), but the method is convenient.
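The manipulations above can be carried out mechanically. The following Python sketch (using numpy's Chebyshev conversion routines; the choice of cos x and of degree 8 is ours) economizes a truncated power series on (−1, 1):

```python
import numpy as np
from math import factorial, cos
from numpy.polynomial import chebyshev as C

# Economization on (-1, 1): express a truncated power series in Chebyshev
# polynomials, drop the high-degree terms (whose coefficients are small),
# and convert back to a lower-degree ordinary polynomial.

# Degree-8 Taylor polynomial of cos x about 0 (coefficients, lowest first).
p = np.zeros(9)
p[0::2] = [(-1)**k / factorial(2*k) for k in range(5)]

c = C.poly2cheb(p)             # Chebyshev-expansion coefficients
c_econ = c[:5].copy()          # keep T_0 .. T_4 only
p_econ = C.cheb2poly(c_econ)   # economized degree-4 polynomial

x = np.linspace(-1, 1, 1001)
err = np.max(np.abs(np.polyval(p_econ[::-1], x) - np.cos(x)))
print(f"degree-4 economized polynomial, max error on (-1,1): {err:.2e}")
```

The economized degree-4 polynomial is far more accurate on (−1, 1) than the degree-4 Taylor polynomial, whose maximum error there is about 1/720.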

20.6-6. Numerical Harmonic Analysis and Trigonometric Interpolation (see also Secs. 4.11-4b and 20.6-1). (a) Given m function values y(xk) = yk for xk = kT/m (k = 0, 1, 2, . . . , m − 1), it is desired to approximate y(x) within the interval (0, T) by a trigonometric polynomial

image

so as to minimize the mean-square error image. The required

Table 20.6-2. Some Polynomial Approximations

image

image

image

Table 20.6-3. Some Approximations for Cylinder Functions*

image

Table 20.6-4. Chebyshev-polynomial Approximations*

image

coefficients Aj, Bj are

image

In the special case n = m/2, Eqs. (18) and (19) together with

image

yield Y(xk) = y(xk) (trigonometric interpolation) for arbitrary Bn.

See Ref. 20.62 for similar formulas applicable when the Xk are not equidistant. Numerical methods for multidimensional Fourier analyses and syntheses are discussed in Ref. 20.2. Reference 20.70 discusses “fast” Fourier-analysis routines.
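As a concrete sketch of the coefficient sums (normalization conventions differ between texts, so the convention below is an assumption), in Python:

```python
import numpy as np

# Least-squares trigonometric-polynomial coefficients from m equidistant
# samples y_k = y(kT/m).  With the common convention
#   Y(x) = A_0/2 + sum_{j=1}^{n} (A_j cos 2*pi*j*x/T + B_j sin 2*pi*j*x/T),
# the coefficients are the discrete Fourier sums below.

def trig_coeffs(y, n):
    m = len(y)
    k = np.arange(m)
    A = [2.0/m * np.sum(y * np.cos(2*np.pi*j*k/m)) for j in range(n + 1)]
    B = [2.0/m * np.sum(y * np.sin(2*np.pi*j*k/m)) for j in range(n + 1)]
    return np.array(A), np.array(B)

# Check: samples of a known trigonometric polynomial are recovered exactly.
m, T = 12, 1.0
x = np.arange(m) * T / m
y = 3.0 + 2.0*np.cos(2*np.pi*x/T) - 0.5*np.sin(4*np.pi*x/T)
A, B = trig_coeffs(y, 3)
print(np.allclose(A, [6.0, 2.0, 0.0, 0.0]))   # True (A_0/2 = 3)
print(np.allclose(B, [0.0, 0.0, -0.5, 0.0]))  # True
```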

(b) The 12-ordinate Scheme. The calculation of the sums (20b) is simplified whenever m is divisible by 4. Table 20.6-5 shows a convenient computation scheme for m = 12.

If no harmonics higher than the third are required, note the simpler formulas

image

The four additional function values required for Eq. (22) can often be read directly from a graph of y(x) vs. x.

See Ref. 20.62 for a 24-ordinate scheme.

(c) Determination of Unknown Periodic Components (Prony's Method). Given N values f0 = f(0), f1 = f(1), f2 = f(2), . . . , fN−1 = f(N − 1) of an empirical function f(u) assumed to have the form

image

then cos ω1, cos ω2, . . . , cos ωm are the m roots of the (algebraic) equation

image

whose coefficients αk must satisfy the linear equations

image

For the best least-squares fit, solve the m linear equations

image

Table 20.6-5. 12-ordinate Scheme for Harmonic Analysis

image

for the m coefficients αk. Once the ωk are known, it is relatively easy to find the Ak, Bk for the best least-squares fit in the manner indicated in Sec. 20.6-6a.

The frequencies of sinusoidal components in statistical time series can often be identified by inspection of the empirically determined autocorrelation function (Secs. 18.10-9 and 19.8-3c).
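For orientation, the single-component case reduces to one linear equation, since samples of one sinusoid satisfy fk+1 + fk−1 = 2 cos ω · fk. A Python sketch of this least-squares estimate (the function name is ours):

```python
import numpy as np

# One-component Prony fit: samples f_k = A cos(omega*k) + B sin(omega*k)
# satisfy the recurrence f_{k+1} + f_{k-1} = 2 cos(omega) * f_k, so a
# least-squares estimate of cos(omega) follows from one linear equation.

def prony_cos_omega(f):
    lhs = f[2:] + f[:-2]      # f_{k+1} + f_{k-1}
    mid = f[1:-1]             # f_k
    return np.sum(mid * lhs) / (2.0 * np.sum(mid * mid))

omega = 0.7
k = np.arange(50)
f = 1.3*np.cos(omega*k) - 0.4*np.sin(omega*k)
print(np.arccos(prony_cos_omega(f)))  # recovers omega = 0.7 to round-off
```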

20.6-7. Miscellaneous Approximations. (a) More general approximation methods are not restricted to sums of the form (1) but employ a rational or otherwise readily computable approximation function F(x) ≡ F(x; α1, α2, . . . , αn) with parameters α1, α2, . . . , αn determined so as to minimize a weighted mean-square error or the maximum absolute approximation error.

(b) The Padé method (Refs. 20.15 and 20.16) produces rational-fraction approximations (Padé-table entries)

image

to a given suitably differentiable function f(x) by matching the m + n + 1 polynomial coefficients of the identity

image

The resulting equations determine the m + n + 1 coefficients ai and bk, if a solution exists. It is understood that the numerator and denominator in Eq. (23) have no common factor. The Rmo(x) are simply truncated MacLaurin series (Sec. 4.10-4).

EXAMPLE: For f(x) = ex, Rnm(x) = 1/Rmn(−x), and

image
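The matching conditions can be solved mechanically. The following Python sketch (helper name and indexing conventions are ours) determines the coefficients ai, bk and reproduces the [1/1] approximant for ex:

```python
import numpy as np
from math import factorial

# Pade [m/n] approximant from Maclaurin coefficients c_0, c_1, ...:
# determine Q(x) = 1 + b_1 x + ... + b_n x^n from the n linear equations
# that make the coefficients of x^(m+1) .. x^(m+n) of Q(x)*f(x) - P(x)
# vanish, then read off P's coefficients a_0 .. a_m.

def pade(c, m, n):
    c = np.asarray(c, dtype=float)
    # linear system for b_1 .. b_n:  sum_k b_k c_{m+j-k} = -c_{m+j}
    M = np.array([[c[m + j - k] if 0 <= m + j - k else 0.0
                   for k in range(1, n + 1)] for j in range(1, n + 1)])
    b = np.concatenate(([1.0], np.linalg.solve(M, -c[m + 1:m + n + 1])))
    a = np.array([sum(b[k] * c[i - k] for k in range(min(i, n) + 1))
                  for i in range(m + 1)])
    return a, b   # numerator, denominator coefficients, lowest degree first

# exp(x), [1/1]:  (1 + x/2) / (1 - x/2)
c = [1 / factorial(i) for i in range(3)]
a, b = pade(c, 1, 1)
print(a.tolist(), b.tolist())   # [1.0, 0.5] [1.0, -0.5]
```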

(c) Padé approximations, like truncated Taylor or MacLaurin series, suffer from errors increasing with distance away from a specific expansion point. By contrast, Maehly's method (Refs. 20.15, 20.42, and 20.43) derives rational-fraction approximations as ratios of two truncated Chebyshev series. As another alternative, Ref. 20.41 describes a quite generally applicable heuristic iteration method for finding minimum-absolute-error approximations of the general form F(x; α1, α2, . . . , αn).

Further discussions of the use of approximation methods in connection with digital computation will be found in Refs. 20.38 to 20.46. The optimal form of the approximation chosen depends not only on the function to be approximated but also on the hardware and software configuration of the computer used. A number of examples of various types are presented in Table 20.6-6. In general, approximations computed for successive expansion intervals are “pieced together,” and it is again possible to trade expansion-interval size for arithmetic complication. For frequently used, well-known functions, an approximation routine is, in general, substantially faster as well as more economical of computer storage than interpolation from a stored table. For digital computers incorporating fast division hardware, rational-function approximations are, in general, superior to polynomial approximations.

Table 20.6-6. Miscellaneous Approximations*

image

20.7. NUMERICAL DIFFERENTIATION AND INTEGRATION

20.7-1. Numerical Differentiation. Numerical differentiation is subject to errors due to insufficient data, truncation, etc., and should be used with caution.

(a) Use of Difference Tables (equidistant argument values, see also Sec. 20.4-1). For suitably differentiable operands y(x),

image

Differentiation of Stirling's and Bessel's interpolation formulas (Table 20.5-1) yields, respectively,

image

Many similar formulas, and also formulas for approximating higher-order derivatives, may be derived by differentiation of suitable interpolation formulas (see also Fig. 20.9-1).

Note also the following explicit three-point differentiation formulas with error estimates

image

where x−1 < ξ < x1. See Ref. 20.17 for analogous five-point formulas.
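The standard three-point formulas (one central, one one-sided, both with O(h2) error) can be sketched in Python (the function names are ours):

```python
import numpy as np

# Three-point differentiation formulas for equally spaced samples:
#   central:   y'_0 ≈ (y_1 - y_-1) / (2h),             error O(h^2)
#   one-sided: y'_0 ≈ (-3 y_0 + 4 y_1 - y_2) / (2h),   error O(h^2)

def d_central(ym1, yp1, h):
    return (yp1 - ym1) / (2.0 * h)

def d_forward(y0, y1, y2, h):
    return (-3.0*y0 + 4.0*y1 - y2) / (2.0 * h)

h = 1e-3
x = 1.0
print(d_central(np.sin(x - h), np.sin(x + h), h))               # ≈ cos(1)
print(d_forward(np.sin(x), np.sin(x + h), np.sin(x + 2*h), h))  # ≈ cos(1)
```

The one-sided formula is useful at the ends of a table, where central differences are unavailable.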

(b) Use of Divided Differences (see also Sec. 20.5-2b). Given the function values y(xk) for a set of not necessarily equally spaced argument values x0, x1, x2, . . . , differentiation of Newton's interpolation formula (20.5-3) yields

image

See Ref. 20.6 for a discussion of the errors due to the interpolation-polynomial approximation.

(c) Numerical Differentiation after Smoothing. The following formulas are based on differentiation of smoothed polynomial approximations rather than on interpolation polynomials and may thus be less affected by random errors in empirical data (Ref. 20.12).

image

For n = 2, this yields

image

Some other formulas are

image

(d) Approximate Differentiation of Truncated Fourier Series. If y(x) is approximated by an (n + 1)-term truncated Fourier series (20.6-18), Lanczos (Ref. 20.12) suggests estimation of y′(x) from

image

20.7-2. Numerical Integration Using Equally Spaced Argument Values. (a) Newton-Cotes Quadrature Formulas. Quadrature formulas of the closed Newton-Cotes type (Table 20.7-1) use the approximation

image

where the yk = y(xk) are given function values for n + 1 equally spaced argument values xk = x0 + k Δx (k = 0, 1, 2, . . . , n); the resulting error vanishes if y(x) is a polynomial of degree not greater than n. Instead of using values of n > 6, one adds m sums (6) of n ≤ 6 terms for successive subintervals:

image
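For example, the n = 2 member of the family, repeated over successive subinterval pairs (the composite Simpson's rule), reads in Python (the function name is ours):

```python
import numpy as np

# Composite application of the n = 2 closed Newton-Cotes formula
# (Simpson's rule) over successive subinterval pairs:
#   integral ≈ (h/3) (y_0 + 4 y_1 + 2 y_2 + ... + 4 y_{2m-1} + y_{2m})

def simpson(y, h):
    if len(y) % 2 == 0:
        raise ValueError("need an odd number of ordinates (even number of panels)")
    return h/3.0 * (y[0] + y[-1] + 4.0*np.sum(y[1:-1:2]) + 2.0*np.sum(y[2:-1:2]))

x = np.linspace(0.0, np.pi, 101)
print(simpson(np.sin(x), x[1] - x[0]))   # ≈ 2, the exact value of the integral of sin over (0, π)
```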

(b) Gregory's Formula. The “symmetrical” formula

image

adds correction terms to the trapezoidal rule of Table 20.7-1. The

Table 20.7-1. Quadrature Formulas of the Closed Newton-Cotes Type

image

formula is derived in the manner of Sec. 20.7-4 and is suitably truncated in each case. The formula is exact for polynomials y(x) of up to the (2m + 1)st degree if up to 2mth-order differences are included (m = 0, 1, 2, . . .).

In particular, omission of first-order and higher differences yields the trapezoidal-rule formula

image

which is exact for first-degree polynomials y(x). Omission of third-order and higher differences produces

image

which is exact for third-degree polynomials y(x).

(c) Use of Euler-MacLaurin Formula. The Euler-MacLaurin summation formula (4.8-10) yields a number of quadrature formulas involving values y′k = y′(xk), y″k = y″(xk), ... of the derivatives of y(x) as well as of y(x). Thus,

image

again adds correction terms to the trapezoidal rule.

20.7-3. Gauss and Chebyshev Quadrature Formulas. (a) Rewrite the given definite integral image with the aid of the transformation

image

and approximate the latter integral by

image

where the n argument values ξk are the n zeros of the nth-degree Legendre polynomial Pn(ξ) (Sec. 21.7-1). Table 20.7-2 lists the ξk and ak for a number of values of n.

The error due to the use of the Gauss quadrature formula (18) is

image
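A Python sketch of the Gauss quadrature formula (18); the nodes ξk and weights ak are tabulated by numpy, and the helper name is ours:

```python
import numpy as np

# Gauss quadrature (18): the abscissas xi_k are the zeros of P_n(xi),
# with associated weights a_k; the change of variable (9) maps (a, b)
# onto (-1, 1).

def gauss_quad(f, a, b, n):
    xi, w = np.polynomial.legendre.leggauss(n)  # nodes and weights on (-1, 1)
    x = 0.5*(b - a)*xi + 0.5*(b + a)            # transformation to (a, b)
    return 0.5*(b - a)*np.sum(w * f(x))

# Exact (to round-off) for polynomials of degree <= 2n - 1:
print(gauss_quad(lambda x: x**5 - 2*x**2 + 1, 0.0, 2.0, 3))  # 22/3, since 5 = 2n - 1
```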

(b) A simpler class of quadrature formulas is obtained with the aid of the transformation (9) and an approximation of the form

image

Table 20.7-2 lists the ξ′k for a number of values of n. The use of equal weights minimizes the probable error if y(x) is affected by normally distributed random errors. The derivation of the ξ′k is discussed in Refs. 20.6 and 20.10.

Table 20.7-2. Abscissas and Weights for Gauss and Chebyshev Quadrature Formulas (Sec. 20.7-3; adapted from Ref. 20.11)

(a) Abscissas ξk and Weights ak for the Gauss Quadrature Formula (18)

image

(b) Abscissas ξ′k for the Chebyshev Quadrature Formula (20)

image

(c) Abscissas ξk and Weights ak for the Gauss-Laguerre Quadrature Formula (21)

image

For n = 3, the error due to the use of the Chebyshev quadrature formula (20) is image

(c) Note also

image

where the ξk are the zeros of the nth-degree Laguerre polynomial Ln(ξ) (Sec. 21.7-1); and

image

where the ξk are the zeros of the nth-degree Hermite polynomial Hn(ξ) (Sec. 21.7-1).

image

The ξk are seen to be zeros of the nth Chebyshev polynomial (Sec. 21.7-4).

Many similar formulas exist (Refs. 20.6 and 20.56). Table 20.7-2 lists some of the ξk and αk for Eqs. (21) and (22).

20.7-4. Derivation and Comparison of Integration Formulas. (a) In general, one can derive unknown coefficients ak and/or abscissas xk in an integration formula

image

as follows (Ref. 20.4):

      1. The formula is required to hold exactly for y(x) = 1, x, x2, . . . , xm (m < n). This yields m equations

image

for the unknown ak and xk.

      2. One may prescribe some or all of the xk (Newton-Cotes formulas, Gregory's formula) and/or the ak (Chebyshev quadrature).

      3. One may impose constraints on (relations between) the ak either for symmetry (Gregory's formula) or to minimize round-off-error effects. For the latter purpose, all ak should be positive (subtraction increases round-off-error effects, Sec. 20.1-2).

The relative desirability of these measures depends on the application. The xk may or may not be in the integration interval. The same method also applies to formulas containing added derivative terms like bky′(xk), as in Sec. 20.7-2c.
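Step 1 of this scheme can be sketched in Python for prescribed abscissas (the function name is ours); with equidistant nodes it reproduces the Simpson weights:

```python
import numpy as np

# Derivation step 1: prescribe the abscissas x_k and require the formula
#   integral_a^b y dx ≈ sum_k a_k y(x_k)
# to be exact for y = 1, x, x^2, ..., which gives linear (moment)
# equations for the weights a_k.

def weights(nodes, a, b):
    n = len(nodes)
    V = np.vander(nodes, n, increasing=True).T               # row j: x_k^j
    moments = [(b**(j + 1) - a**(j + 1)) / (j + 1) for j in range(n)]
    return np.linalg.solve(V, moments)

# Equidistant nodes 0, 1/2, 1 on (0, 1) reproduce Simpson's rule:
print(weights(np.array([0.0, 0.5, 1.0]), 0.0, 1.0))  # [1/6, 4/6, 1/6]
```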

(b) The Gauss-type integration formulas, like Eq. (18), are exact if η(ξ) is a polynomial of degree ≤ 2n − 1, while Newton-Cotes formulas are exact only for polynomials of degree ≤ n. In this sense, Gaussian formulas are best for functions η(ξ) possessing derivatives of high order. If η(ξ) has only a piecewise continuous first-order derivative, then repeated use of the simple trapezoidal rule is about as good as any quadrature formula. See Ref. 20.56 for modified Gaussian quadrature formulas exact for trigonometric polynomials and other special functions.

20.7-5. Numerical Evaluation of Multiple Integrals. Multiple integrals may be evaluated by repeated application of the procedures outlined in Secs. 20.7-2 and 20.7-3; or divide the domain of integration into subregions separated by coordinate lines and use the approximations

image

image

where fijk = f(i Δx, j Δy, k Δz), Δx = Δy = Δz = h (i, j, k = 0, ± 1).

image

A simple two-dimensional Gauss-type integration formula (Ref. 20.56) is

image

Reference 20.56 also gives integration formulas for s-dimensional cubes, spheres, and other regions. Monte-Carlo methods (Sec. 20.10-1) are also of interest for multidimensional integration.

20.8. NUMERICAL SOLUTION OF ORDINARY DIFFERENTIAL EQUATIONS

20.8-1. Introduction and Notation. Refer to Chap. 9 for ordinary methods of solution. A rough graphical solution (Sec. 9.5-2) may precede the numerical solution for orientation purposes. Sections 20.8-2 to 20.8-8 deal with initial-value problems; the solution of boundary-value problems is discussed in Secs. 20.9-1 and 20.9-3.

To solve the first-order differential equation

image

for a given initial value y(x0) = y0, consider fixed increments Δx = h of the independent variable x. Let xk = x0 + k Δx (k = 0, 1, 2, . . .), and denote successive samples of the computed (in general, approximate) solution y(x) and its derivative y′(x) by

image

In the absence of round-off errors, the difference yk+1 − yTRUE(xk+1) is the truncation error due to a stepwise approximation of continuous integration. If exact solution values yTRUE(xk), yTRUE(xk−1), . . . are substituted for yk, yk−1, . . . , then yk+1 − yTRUE(xk+1) is the local truncation error for the given integration formula. The true truncation error, however, is affected by propagated errors from earlier solution steps as well as by the local truncation error (Sec. 20.8-5).

20.8-2. One-step Methods for Initial-value Problems: Euler and Runge-Kutta-type Methods. (a) For sufficiently small increments Δx = h, the following simple recursion formulas produce stepwise approximations yk (k = 0, 1, 2, . . .) to the solution y = y(x) of Eq. (1):

image

image

image

(b) Table 20.8-1 lists Runge-Kutta-type routines for the numerical solution of Eq. (1). Methods (a) and (b) in Table 20.8-1 are usually called “third-order” methods,* because the formulas for yk+1 are exact for f(x, y) = 1, x, x2, x3; for suitably differentiable f(x, y), the local truncation error is O[(Δx)4] as Δx → 0 (Sec. 4.4-3). Methods (c), (d), and (e) are, by an analogous definition, “fourth-order” methods. See Refs. 20.11 and 20.64 for higher-order Runge-Kutta formulas (see also Sec. 20.8-5).
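Table 20.8-1 itself is not reproduced above; as an illustrative member of this family, the classical fourth-order Runge-Kutta step can be sketched in Python (the function name is ours):

```python
import numpy as np

# Classical fourth-order Runge-Kutta step for y' = f(x, y);
# the local truncation error is O[(Δx)^5].

def rk4_step(f, x, y, h):
    k1 = h * f(x, y)
    k2 = h * f(x + h/2, y + k1/2)
    k3 = h * f(x + h/2, y + k2/2)
    k4 = h * f(x + h, y + k3)
    return y + (k1 + 2*k2 + 2*k3 + k4) / 6.0

# y' = y, y(0) = 1, so y(1) = e:
x, y, h = 0.0, 1.0, 0.1
for _ in range(10):
    y = rk4_step(lambda x, y: y, x, y, h)
    x += h
print(y)   # ≈ e = 2.71828...
```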

20.8-3. Multistep Methods for Initial-value Problems. (a) Starting the Solution. Given the initial value y0, each of the following solution schemes requires one to compute the first three to five function values y1, y2, . . . by one of the methods of Secs. 9.2-5, 20.8-2, or 20.8-4.

* Some authors call (a) and (b) “fourth-order” methods, and (c) and (d) “fifth-order” methods instead.

This “starting solution” should be computed more accurately than the required solution by at least a factor of 10. If the Runge-Kutta method is used to start the solution, employ a step size Δx = h smaller than that required for the subsequent difference scheme.

Table 20.8-1. Some Runge-Kutta-type Methods for Ordinary Differential Equations (Sec. 20.8-2) or Systems of Differential Equations (Sec. 20.8-5)

image

(b) Simple Extrapolation Schemes. Given yk, yk-1, yk-2, . . . , one approximates successive solution values

image

by integrating an extrapolation polynomial through fk, fk-1, fk-2, . . . instead of f(x, y). Using the second interpolation formula (20.5-6), one obtains

image

Truncation of the general “open” integration formula (7) after successively higher-order difference terms yields first the Euler formula (3) and the open trapezoidal rule

image

both used in digital differential analyzers, and then the third-order formula

image

and the fourth-order Adams-Bashforth predictor of Table 20.8-2.

(c) Predictor-corrector Methods and Step-size Changes. Denoting the “predicted” value (7) of yk+1 by yPREDk+1, one can improve the approximation of Sec. 20.8-3b by requiring the interpolation polynomial to assume the value fPREDk+1 = f(xk+1, yPREDk+1). The resulting “corrected” yk+1 is obtained by using the predicted fk+1 in the “closed” integration formula

image

The corrector (10) is truncated like the predictor (7). The resulting difference yCORRECTEDk+1 − yPREDICTEDk+1 measures the local truncation error of the corrected approximation and is reduced to some preassigned value by suitable selection of the increment Δx. To halve the step size for fourth-order formulas, use the interpolation formulas

image

To double the step size, one may restart, or use past solution values stored for this purpose (see also Ref. 20.4).

A corrected solution value yk+1 can be successively improved further if it is resubstituted into the corrector formula as a new predicted value. In practice, step-size reduction and/or the use of modifiers (Sec. 20.8-4b) is often preferred to such iteration.
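An illustrative predictor-corrector sketch in Python follows (the fourth-order Adams-Bashforth predictor is named in the text; pairing it with the Adams-Moulton corrector is our assumption here, and the function names are ours):

```python
import numpy as np

# Fourth-order Adams-Bashforth predictor with an Adams-Moulton corrector.
# |y_corr - y_pred| serves as the local-truncation-error measure for
# step-size control; the corrector is applied once per step below.

def abm4(f, x0, y0, h, steps):
    xs = [x0 + i*h for i in range(4)]
    ys = [y0]
    for i in range(3):                       # start with Runge-Kutta steps
        k1 = h*f(xs[i], ys[i]); k2 = h*f(xs[i] + h/2, ys[i] + k1/2)
        k3 = h*f(xs[i] + h/2, ys[i] + k2/2); k4 = h*f(xs[i] + h, ys[i] + k3)
        ys.append(ys[i] + (k1 + 2*k2 + 2*k3 + k4)/6)
    fs = [f(xi, yi) for xi, yi in zip(xs, ys)]
    x, y = xs[-1], ys[-1]
    for _ in range(steps - 3):
        y_pred = y + h/24*(55*fs[-1] - 59*fs[-2] + 37*fs[-3] - 9*fs[-4])
        f_pred = f(x + h, y_pred)            # predicted derivative value
        y = y + h/24*(9*f_pred + 19*fs[-1] - 5*fs[-2] + fs[-3])
        x += h
        fs.append(f(x, y))
    return x, y

x, y = abm4(lambda x, y: -y, 0.0, 1.0, 0.05, 20)
print(x, y)   # y(1) ≈ exp(-1) ≈ 0.36788
```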

20.8-4. Improved Multistep Methods. (a) More general “open” integration formulas (useful as predictors) and “closed” integration formulas (useful as correctors) can be written in the respective forms

image

image

Instead of determining all the coefficients by making each formula exact for f(x, y) = 1, x, x2, . . . , one usually prefers to require only “fourth-order” matching (up to and including x4); the coefficients thus left undetermined are chosen so as to improve error propagation and/or simplify computation (Sec. 20.8-5). Table 20.8-2 lists a number of useful fourth-order predictor and corrector formulas.

Table 20.8-2. Some Fourth-order Predictor-corrector Methods

Each predictor-corrector scheme may be used with or without modifiers, and fMODk+1 = f(xk+1, yMODk+1). In each case, the magnitude of the correction on the last line is an upper bound for the local truncation error.

image

(b) Use of Modifiers (Ref. 20.4). In each predictor-corrector method, the difference yCORRECTEDk+1 − yPREDICTEDk+1 is roughly proportional to the local truncation error. Hence, one may improve the solution by adding a fraction α(yCORRECTEDk − yPREDICTEDk) of the preceding difference to yPREDICTEDk+1 before substitution in the corrector, and subtracting (1 − α)(yCORRECTEDk+1 − yPREDICTEDk+1) from yCORRECTEDk+1 to obtain yk+1 (Table 20.8-2).

(c) The fourth-order method of Clippinger and Dimsdale (Ref. 20.11) requires iteration of

image

starting with a trial value for yk+2, say yk+2 ≈ yk + 2fk Δx, and permits step-size changes by simple substitution of new values of Δx. The method is self-starting, but once the solution is started, one can save iterations by employing extrapolation to predict yk+2.

20.8-5. Discussion of Different Solution Methods, Step-size Control, and Stability. (a) Integration formulas specifically selected for low local truncation error can emphasize error propagation over successive solution steps. The design of differential-equation-solving routines requires a compromise between local truncation error, stability, and computation time. In addition, formula coefficients which produce summation terms of equal sign and not too different absolute values are preferred, so as to reduce fractional errors due to round-off. The final choice depends on the application and on the computer used. Double-precision accumulation of dependent variables is frequently indicated.

If the given f(x, y) is at all complicated, the principal cost (time) of the computation is associated with calculations of derivative values f(x, y). For differential-equation problems with reasonably smooth (repeatedly differentiable) integrator inputs f(x, y), multistep integration schemes require relatively few derivative calculations and conveniently permit timesaving automatic step-size control in terms of |yk+1CORRECTEDyk+1PREDICTED|. Runge-Kutta-type methods tend to be very stable (Sec. 20.8-5b) and do not require a separate starting routine; they are, therefore, often preferred for problems involving frequent step inputs. Runge-Kutta schemes require relatively more derivative computations per step, and efficient step-size control is more difficult (see also Refs. 20.4, 20.15, and 20.52).

To estimate the local truncation error for step-size control in Runge-Kutta routines, one can compare results obtained for different step sizes (using stored derivative values whenever possible), or investigate a suitable function of the ki. For the frequently used Runge-Kutta method (c) in Table 20.8-1, the quantity

image

is a rough measure of the local truncation error preferable to |k2 − k3|/|k1 − k2| and |k1 + k4 − 2k3|, which are also used (Ref. 20.67).

(b) The stability of the approximate solution y0, y1, y2 . . . of Eq. (1) computed, say, with the aid of a general multipoint formula

image

depends on the stability of the corresponding linearized difference equation for the approximate error sequence eo, e1, e2, . . . , viz.,

image

(Sec. 20.4-8). The associated characteristic equation

image

will have a root image; if |z1| > 1, the corresponding normal mode of the error sequence is unstable, but so is the true solution y(x) near x = xk, so that the fractional error may be acceptable for small Δx. For r = 0 (simple one-step method), z1 is the only root. For r > 0, Eq. (18) will have additional roots corresponding to spurious modes in the computed solution because of the higher-order difference approximation. Relative stability of the computed solution requires that all these extra roots lie within the unit circle |z| = 1 (Sec. 20.4-8) for the values of Δx of interest. The worst hazards exist for step inputs (which, like the round-off noise, may excite spurious modes) and near stability boundaries of the original differential equation. Such situations, if suspected, may be tested by artificially introduced perturbations.

The stability of a predictor-corrector scheme will depend on both the predictor and the corrector formula but is affected more strongly by the latter if the correction is small. Integration routines of order higher than the fourth require careful stability investigation, but their use may be economical in trajectory computations with large step sizes (Ref. 20.64).

20.8-6. Ordinary Differential Equations of Order Higher Than the First and Systems of Ordinary Differential Equations. (a) Each ordinary differential equation of the second or higher order is equivalent to a system of first-order equations (Sec. 9.1-3). If the latter is written in the matrix form of Sec. 13.6-1, each solution method of Secs. 20.8-2 to 20.8-4 yields an analogous method for the numerical solution of systems of differential equations.

(b) Specifically, consider a system of first-order differential equations

image

with solution y = y(x), z = z(x), . . . . Note that solution by the Taylor-series and Picard methods is essentially analogous to the procedures outlined in Secs. 9.2-5a and 9.2-5b.

The Runge-Kutta method is analogous to the scheme of Sec. 20.8-2b:

image

with

image

Any one of the multistep schemes of Sec. 20.8-3 may be applied to each equation (19); one writes

image

(c) The stability of the correct solution y = y(x) of a system (matrix equation)

image

will depend on the matrix [∂f/∂y] (Sec. 13.6-5). Equation (17) becomes a matrix difference equation, whose characteristic roots must be compared to the eigenvalues of the matrix [∂f/∂y]xk to determine relative stability (Refs. 20.4, 20.6, and 20.52).

20.8-7. Special Formulas for Second-order Equations. Because of the practical importance of second-order differential equations, the following schemes for the numerical solution of the differential equation

image

with given initial values y(x0) = y0, y′(x0) = y′0 are of interest; in each case,

image

These methods may be extended to apply to the solution of systems of two or more second-order equations in the manner of Sec. 20.8-6.

(a) A Runge-Kutta Method for Eq. (22)

image

(b) An Interpolation-iteration Scheme. Starting with a trial value for fk+1, iterate

image

(c) Prediction-correction Schemes.

image

If f(x, y, y′) does not contain y′ explicitly, transform the given differential equation in the manner of Sec. 9.1-5b, or use the following prediction-correction scheme:

image

Reference 20.4 gives modifier formulas for this scheme.

(d) Numerov's Method for Linear Equations (Refs. 20.4 and 20.11). Linear differential equations of the form

image

can be solved with the corrector of Eq. (27) alone; substitution of Eq. (28) yields

image
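Since the displayed equations (28) and (29) are not reproduced above, the following Python sketch assumes the homogeneous special case y″ = g(x)y of Numerov's recursion (the function name is ours):

```python
import numpy as np

# Numerov recursion for the linear equation y'' = g(x) y (homogeneous
# case assumed for illustration).  Each step has local truncation
# error O(h^6):
#   (1 - h^2 g_{k+1}/12) y_{k+1}
#       = (2 + 5 h^2 g_k / 6) y_k - (1 - h^2 g_{k-1}/12) y_{k-1}

def numerov(g, x0, y0, y1, h, steps):
    xs = x0 + h*np.arange(steps + 1)
    gv = g(xs)
    w = 1.0 - h*h*gv/12.0
    y = np.empty(steps + 1)
    y[0], y[1] = y0, y1
    for k in range(1, steps):
        y[k+1] = ((2.0 + 5.0*h*h*gv[k]/6.0)*y[k] - w[k-1]*y[k-1]) / w[k+1]
    return xs, y

# y'' = -y with y(0) = 0, y(h) = sin h, so y = sin x:
h = 0.1
xs, y = numerov(lambda x: -np.ones_like(x), 0.0, 0.0, np.sin(h), h, 30)
print(np.max(np.abs(y - np.sin(xs))))   # very small (high-order accuracy)
```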

20.8-8. Frequency-response Analysis. Given a stable integration formula (12), a complex-sinusoid input fk = eiωxk will eventually produce a steady-state sinusoidal solution yk = H(iω)eiωxk (just as in Sec. 9.4-6). Substitution yields the sampled-data frequency-response function

image

Integration formulas can now be designed so as to approximate the ideal integrator response 1/iω as closely as possible in amplitude and phase. To reduce round-off-noise propagation, it is desirable that the error |H(iω) − 1/iω| decrease with frequency (Ref. 20.4). The same type of analysis applies also to double-integration formulas (Sec. 20.8-7).
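As a sketch, the Euler rule yk+1 = yk + hfk gives H(iω) = h/(eiωh − 1) by the substitution described above, which a few lines of Python (variable names are ours) compare with the ideal response 1/iω:

```python
import numpy as np

# Sampled-data frequency response of a simple integration formula.
# For the Euler rule y_{k+1} = y_k + h f_k, substituting f_k = e^{i w x_k}
# and y_k = H(iw) e^{i w x_k} gives  H(iw) = h / (e^{i w h} - 1),
# to be compared with the ideal integrator response 1/(i w).

h = 0.01
w = np.linspace(0.1, 10.0, 100)
H = h / (np.exp(1j*w*h) - 1.0)
ideal = 1.0 / (1j*w)
print(np.max(np.abs(H - ideal)))   # ≈ h/2 for small w*h
```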

20.9. NUMERICAL SOLUTION OF BOUNDARY-VALUE PROBLEMS, PARTIAL DIFFERENTIAL EQUATIONS, AND INTEGRAL EQUATIONS

20.9-1. Introduction. This section describes numerical methods applicable to the solution of boundary-value problems involving ordinary or partial differential equations and to the solution of hyperbolic and parabolic partial differential equations. Alternative numerical-solution methods outlined in this handbook include

      Reduction of partial differential equations to ordinary differential equations by separation of variables (Secs. 10.1-3 and 10.4-9), solution of characteristic equations (Secs. 10.2-2 and 10.2-4), and the method of characteristics (Sec. 10.3-2)

      Reduction to a variation problem (Secs. 11.1-1, 11.7-1, and 15.4-7) and solution by direct methods such as the Rayleigh-Ritz method (Sec. 11.7-2)

      Reduction to an integral equation (Sec. 15.5-2) which may be solved directly or by means of one of the approximation methods of Sec. 20.9-5

Perturbation methods for the solution of eigenvalue problems are described in Sec. 15.4-11. Many combinations of the various solution methods are possible, and vast numbers of specialized procedures have been developed for special applications (Refs. 20.9, 20.21 to 20.31, and 20.50).

The quasilinearization method for solving two-point boundary-value problems is outlined in Sec. 20.9-3.

20.9-2. Two-point Boundary-value Problems Involving Ordinary Differential Equations: Difference Technique (see also Secs. 9.3-3, 15.4-1, and 15.5-2). Two-point boundary-value problems requiring the solution of a given ordinary differential equation subject to boundary conditions at the end points of an interval (a, b) can often be reduced to initial-value problems by the method of Sec. 9.3-4. The following finite-difference method may be more convenient.

Divide the given interval (a, b) into subintervals of equal length by a net

image

and replace each derivative in the given differential equation and in the boundary conditions by a corresponding difference approximation (Sec. 20.7-1a, Fig. 20.9-1, and Fig. 20.9-6) of equal (or higher) order. The given ordinary differential equation is thus approximated by a difference equation. The numerical solution of the difference equation amounts to the solution of a set of simultaneous equations for the unknown function values yk = y(a + k Δx) (Secs. 20.2-5 to 20.3-2). In typical linear problems, the system matrix is usually “sparse” (Sec. 20.3-2a), and the problem is thus suitable for the iteration methods of Sec. 20.3-2. In particular, relaxation methods (Sec. 20.3-2c) are useful for “manual” computation.

image

image

FIG. 20.9-1. Operators for central-difference approximation (rectangular cartesian coordinates: Δx = Δy = h). Percentage errors are of the order of h2.

EXAMPLE: To solve image subject to the boundary conditions y(0) = 1, y(1) = 0, divide the interval (0, 1) into n subintervals of length Δx = 1/n and use d2y/dx2 ≈ Δ2y/Δx2 to obtain the difference equation

image

with y0 = 1, yn = 0. For n = 4 (k = 1, 2, 3), the simultaneous equations to be solved for y1, y2, y3 are

image
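The equation of this example is not reproduced above; the following Python sketch applies the same difference technique to the illustrative problem y″ − y = 0, y(0) = 1, y(1) = 0 (our choice), and compares with the exact solution sinh(1 − x)/sinh 1:

```python
import numpy as np

# Finite-difference solution of a two-point boundary-value problem.
# Illustrative problem:  y'' - y = 0,  y(0) = 1, y(1) = 0.
# Replacing d2y/dx2 by the second difference quotient gives the
# tridiagonal system
#   y_{k-1} - (2 + h^2) y_k + y_{k+1} = 0,   k = 1, ..., n-1.

n = 50
h = 1.0 / n
A = np.zeros((n - 1, n - 1))
b = np.zeros(n - 1)
for k in range(n - 1):
    A[k, k] = -(2.0 + h*h)
    if k > 0:
        A[k, k - 1] = 1.0
    if k < n - 2:
        A[k, k + 1] = 1.0
b[0] -= 1.0          # known boundary value y_0 = 1 moved to the right side
                     # (y_n = 0 contributes nothing)
y = np.linalg.solve(A, b)

x = np.linspace(h, 1.0 - h, n - 1)
exact = np.sinh(1.0 - x) / np.sinh(1.0)
print(np.max(np.abs(y - exact)))   # O(h^2) discretization error
```

A general-purpose routine would exploit the tridiagonal ("sparse") structure of the system matrix instead of a dense solve.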

20.9-3. The Generalized Newton-Raphson Method (Quasilinearization). While both nonlinear and linear two-point boundary-value problems can be treated numerically by the methods of Secs. 9.3-4 and 20.9-2, both these methods are most easily applied to linear differential equations. One can often find solutions of the important class of nonlinear two-point boundary-value problems of the form

image

by the following so-called quasilinearization method. Start with a trial solution y[0](x) which satisfies the given boundary conditions, and obtain successive approximations y[1](x), y[2](x), . . . by solving the linear problems

image

where the subscripts denote partial differentiation. This technique constitutes a generalization of the Newton-Raphson method of Sec. 20.2-8 and, like the latter, may converge quite rapidly; it is readily extended to systems of differential equations if one introduces the matrix notation of Sec. 13.6-1. While fairly general convergence conditions can be formulated, convergence is commonly tested by trial and error (Refs. 20.47 and 20.60).

20.9-4. Finite-difference Methods for Numerical Solution of Partial Differential Equations with Two Independent Variables. Methods analogous to that of Sec. 20.9-2 apply to the solution of problems involving partial differential equations. Introduce an appropriate net of coordinate values xi = x0 + i Δx, yk = y0 + k Δy (i, k = 0, ±1, ±2, . . .); the unknown function Φ(x, y) will be represented by a discrete set of function values Φ(xi, yk) = Φik. Approximate each differential operator by a corresponding difference operator, so that every derivative is approximated by a corresponding partial difference coefficient. The resulting difference equation will yield a system of simultaneous equations to be satisfied by the unknown function values Φik. The most important problems lead to the following situations.

      1. A boundary-value problem for an elliptic partial differential equation (e.g., a Dirichlet problem for ∇2Φ = 0, Sec. 15.6-2) produces a set of N simultaneous equations for N unknowns Φik.

The large number of equations and unknowns associated with such difference techniques makes many practical partial-differential-equation problems a challenge for even the largest digital computers. As the only redeeming feature, the nature of the difference operators resulting from the usual second-order and fourth-order linear partial differential equations (Table 10.4-1) again leads to systems of linear equations whose system matrices are “sparse,” i.e., which have few nonvanishing terms except near the main diagonal. Such problems are, then, well suited for solution by the iteration methods of Sec. 20.3-2.

      2. A linear eigenvalue problem (e.g., ∇2Φ = λΦ with suitable boundary conditions, Sec. 15.4-1) yields a matrix eigenvalue problem.

      3. Initial-value problems involving a parabolic or hyperbolic partial differential equation (Sec. 10.3-4) also lead to sets of simultaneous equations if the differencing scheme chosen relates each unknown Φik to other unknown as well as to known function values (implicit methods for solving initial-value problems).

      4. If forward-difference approximations (Sec. 20.7-1) are used for the time derivatives in an initial-value problem, one obtains a set of recursion formulas for successive Φik's, starting with the given initial values (explicit methods for solving initial-value problems).

In general, explicit methods will require less computation than implicit methods for solving initial-value problems; but the recursion schemes tend to involve error-propagation or stability problems similar to those associated with the numerical solution of ordinary differential equations. As in the latter case, approximations may be improved through predictor-corrector methods analogous to those of Sec. 20.8-3 (Refs. 20.9, 20.16, and 20.20; see also Sec. 20.9-8).

A vast number of special techniques, including coordinate transformations reducing suitable boundaries and differencing nets to more convenient shapes, will be found in Refs. 20.9, 20.16, 20.17, and 20.20 to 20.31. Reference 20.9, in particular, contains an extensive bibliography.
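As an illustration of situation 1 and the sparse-matrix iteration noted above, the following pure-Python sketch (not from the text; grid size, boundary values, and sweep count are illustrative assumptions) applies Gauss-Seidel iteration (Sec. 20.3-2) to the five-point difference equations for a Dirichlet problem on a unit square:

```python
# Illustrative sketch (assumed setup): Gauss-Seidel iteration for the
# five-point difference approximation to Laplace's equation on a unit
# square, with phi = 1 on one edge and phi = 0 on the other three.
n = 21                                  # interior points per side
phi = [[0.0] * (n + 2) for _ in range(n + 2)]
for j in range(n + 2):
    phi[0][j] = 1.0                     # boundary values on one edge

for sweep in range(500):                # each update touches only four
    for i in range(1, n + 1):           # neighbors: the system matrix is sparse
        for k in range(1, n + 1):
            phi[i][k] = 0.25 * (phi[i + 1][k] + phi[i - 1][k]
                                + phi[i][k + 1] + phi[i][k - 1])

# By superposing the four rotated problems (whose boundary values sum to 1
# everywhere), the value at the center of the square is exactly 1/4.
center = phi[(n + 1) // 2][(n + 1) // 2]
```

Each Gauss-Seidel sweep costs only a few operations per net point, which is why iteration is attractive for these sparse systems.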

20.9-5. Two-dimensional Difference Operators. Figures 20.9-1 to 20.9-6 list the most frequently needed linear difference operators. In

image

particular, each diagram of Figs. 20.9-1 to 20.9-5 yields a specified central-difference expression for the center of the “star” or “molecule” as the weighted sum of the function values Φik “above,” “below,” “to the right,” etc., of the center, each weighting coefficient being indicated in its proper position. Thus, Fig. 20.9-1 yields

image

where h = Δx = Δy is the mesh width used for both x and y.
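The five-point star of Fig. 20.9-1 approximates the Laplacian by (Φi+1,k + Φi−1,k + Φi,k+1 + Φi,k−1 − 4Φik)/h². A small Python sketch (illustrative, not from the text) applies this stencil to two test functions:

```python
# Five-point "star" of Fig. 20.9-1 for the Laplacian at (x, y), mesh width h.
def five_point_laplacian(phi, x, y, h):
    return (phi(x + h, y) + phi(x - h, y)
            + phi(x, y + h) + phi(x, y - h) - 4.0 * phi(x, y)) / h**2

h = 0.01
# phi = x^2 - y^2 is harmonic; the star is exact for quadratics, so r1 ~ 0
r1 = five_point_laplacian(lambda x, y: x * x - y * y, 0.3, 0.7, h)
# phi = x^2 + y^2 has Laplacian 4 everywhere, so r2 ~ 4
r2 = five_point_laplacian(lambda x, y: x * x + y * y, 0.3, 0.7, h)
```

For general smooth functions the percentage error of this star is of the order of h², as stated for Fig. 20.9-1.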

image

FIG. 20.9-3. A central-difference operator suitable for use with graded nets, or for transition to odd mesh widths near a boundary. The percentage error is of the order of h.

image

FIG. 20.9-4. A central-difference operator for use with oblique cartesian-coordinate nets. The percentage error is of the order of h2.

Figure 20.9-1 applies to rectangular cartesian coordinates and equally spaced points. Figure 20.9-3 is useful if one wishes to change mesh width either (1) to accommodate irregular boundaries or (2) to increase the accuracy of the computation in some region of special interest (use of graded nets). In any case, the net used can be refined after an initial rough computation.

Figures 20.9-2, 20.9-4, and 20.9-5 show difference operators for nets other than rectangular cartesian. Figure 20.9-6 shows a few forward- and backward-difference operators.

For computation purposes, the net points (xi, yk) are often labeled in some simple sequence 1, 2, . . . ; the corresponding function values Φik are then denoted by Φ1, Φ2, . . . (see, for example, Fig. 20.9-7). In “manual” relaxation-type computations (Sec. 20.3-2) involving problems of type 1, it is customary to use a large plan of the region and to enter

image

FIG. 20.9-5. A central-difference operator for use with polar coordinates r, φ. Percentage errors are of the order of Δr Δφ; a useful graded net is given by Δr = r Δφ.

image

FIG. 20.9-6. Forward- and backward-difference approximations (rectangular cartesian coordinates, Δx = Δy = h). Percentage errors are of the order of h2.

function values and residuals directly at each net point. Function values and residuals from earlier steps of the relaxation process are simply crossed out or erased.

NOTE: In problems of types 1 and 2, one cannot always replace a given differential operator by a difference operator of higher order, since the resulting difference-equation solution might contain spurious oscillation modes. On the other hand, finite-difference solutions of problems of types 3 and 4 (initial-value problems associated with hyperbolic or parabolic differential equations) often employ difference operators of higher orders than those of the corresponding derivatives, just as was done in Secs. 20.9-1 to 20.9-5 (see also Refs. 20.17, 20.23, and 20.48 for a number of special methods).

20.9-6. Representation of Boundary Conditions (Fig. 20.9-7). Approximate the given boundary by mesh lines of the net used; introduce a graded net (Fig. 20.9-3 and Sec. 20.9-36) if necessary. Then

      1. If boundary values of the unknown function are given, enter them at the appropriate boundary points.

      2. If boundary values of derivatives (such as ∂Φ/∂n, Sec. 5.6-16) are given, extend the net beyond the given boundary, and approximate the boundary conditions by corresponding difference relations. The resulting equations, as well as the difference relations approximating the given differential equation at points near the boundary, will involve function values at points outside the boundary; but all such function values can be eliminated by algebraic manipulation.

image

FIG. 20.9-7. Representation of boundary conditions for ∇²Φ = 0 or ∇⁴Φ = 0. The net is continued to the left of the given boundary, and “image values” Φ′2 = Φ2, Φ′4 = Φ4 are introduced to yield difference equations representing ∂Φ/∂n = 0 at the boundary points shown. The boundary condition Φ = 0, ∇²Φ = 0 (e.g., free edge of an elastic plate) is similarly represented by Φ1 = Φ3 = 0, Φ′2 = −Φ2, Φ′4 = −Φ4.

EXAMPLE: Figure 20.9-7b shows how the boundary condition ∂Φ/∂n = 0 is approximated through “reflection” of function values in the boundary, so that Φ′2 − Φ2 = 0, Φ′4 − Φ4 = 0. For a given differential equation, say ∇²Φ = 0, apply the first difference operator of Fig. 20.9-2 to obtain

image

or, since the boundary condition implies Φ′2 = Φ2, Φ′4 = Φ4,

image

The last equation involves only points inside or on the boundary.
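The elimination of exterior “image values” can be sketched in Python for a one-dimensional analogue (illustrative, not from the text): φ″ = 0 on [0, 1] with the Neumann condition φ′(0) = 0 and φ(1) = 1. Reflection gives the ghost value φ(−h) = φ(h), so the boundary equation reduces to one involving only interior points:

```python
# Sketch: Neumann condition by "reflection" in 1-D.  phi'' = 0 on [0,1],
# phi'(0) = 0, phi(1) = 1.  The ghost value phi[-1] = phi[1] turns the
# central-difference equation at node 0 into simply phi[0] = phi[1].
n = 10
phi = [0.0] * (n + 1)
phi[n] = 1.0                        # Dirichlet value at the right end
for sweep in range(2000):           # Gauss-Seidel sweeps
    phi[0] = phi[1]                 # boundary eq. after ghost-point elimination
    for j in range(1, n):
        phi[j] = 0.5 * (phi[j - 1] + phi[j + 1])
# The exact solution is phi(x) = 1: the insulated end forces a constant.
```

Only points inside or on the boundary appear in the final equations, exactly as in the two-dimensional example above.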

20.9-7. Problems Involving Three or More Independent Variables. Analogous methods apply to problems involving three or more variables. In particular,

image

The number of independent variables can often be reduced through separation of variables (Sec. 10.1-3).

20.9-8. Validity of Finite-difference Approximations. Some Stability Conditions. (a) Every difference-approximation solution requires an investigation of its validity. The error propagation resulting from two different difference-equation approximations to the same differential equation may differ radically even though the same mesh widths are used. It is, moreover, not always possible to improve the approximation by decreasing the mesh widths, even if exact solutions of the difference equations (with zero relaxation residuals) are available. For a proper finite-difference approximation, the solution of the difference equation should converge to that of the given differential equation in a nonoscillatory manner. Implicit-solution schemes for linear partial differential equations (Sec. 20.9-4) rarely lead to difficulties of this type, but in explicit methods for parabolic and hyperbolic differential equations, the various coordinate increments may have to satisfy special stability conditions.

(b) In particular, the solution of the difference equation

(Φi,k+1 − Φi,k)/Δt = γ²(Φi+1,k − 2Φi,k + Φi−1,k)/(Δx)²

or (in recursion-relation form)

Φi,k+1 = Φi,k + [γ² Δt/(Δx)²](Φi+1,k − 2Φi,k + Φi−1,k)

converges to the solution of the partial differential equation (heat-conduction equation, Sec. 10.4-1)

∂Φ/∂t = γ² ∂²Φ/∂x²

if Δt → 0, Δx → 0 so that γ² Δt/(Δx)² ≤ 1/2. The choice γ² Δt/(Δx)² = 1/6 is a particularly useful combination, which will also improve the approximation for finite increments (Ref. 20.9).
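The explicit recursion and its stability restriction can be sketched in Python (illustrative, not from the text; unit diffusion constant assumed, so the ratio is r = Δt/(Δx)²):

```python
# Illustrative sketch: the explicit (forward-difference) scheme for the
# heat-conduction equation with unit diffusion constant.  The ratio
# r = dt/dx**2 must satisfy r <= 1/2; for r > 1/2 small perturbations grow.
import math

def ftcs(r, nx=20, steps=200):
    dx = 1.0 / nx
    # initial condition sin(pi x), boundaries 0, plus a tiny alternating
    # perturbation that seeds the fastest-growing mode
    phi = [math.sin(math.pi * i * dx) + 1e-6 * (-1) ** i
           for i in range(nx + 1)]
    for _ in range(steps):
        phi = ([0.0]
               + [phi[i] + r * (phi[i + 1] - 2 * phi[i] + phi[i - 1])
                  for i in range(1, nx)]
               + [0.0])
    return phi

stable = ftcs(0.4)      # r < 1/2: solution decays smoothly
unstable = ftcs(0.6)    # r > 1/2: the perturbation is amplified each step
```

With r = 0.4 the computed values decay roughly like the exact solution; with r = 0.6 the alternating component is multiplied by a factor exceeding 1 in magnitude at every step and soon swamps the solution.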

By contrast, the implicit method based on the Crank-Nicolson approximation

(Φi,k+1 − Φi,k)/Δt = (γ²/2)[(Φi+1,k+1 − 2Φi,k+1 + Φi−1,k+1) + (Φi+1,k − 2Φi,k + Φi−1,k)]/(Δx)²

will yield approximations which converge to the correct solution of the heat-conduction equation whenever Δx → 0, Δt → 0.
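A Python sketch of the implicit scheme (illustrative, not from the text; unit diffusion constant assumed) shows that the ratio r = Δt/(Δx)² may safely exceed 1/2; each time step requires one tridiagonal solve:

```python
# Illustrative sketch: Crank-Nicolson step for the heat-conduction equation
# (unit diffusion constant).  r = dt/dx**2 = 2 > 1/2, yet the scheme is stable.
import math

def thomas(a, b, c, d):
    """Solve a tridiagonal system: a = sub-, b = main, c = super-diagonal."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

nx, r, steps = 20, 2.0, 50
dx = 1.0 / nx
phi = [math.sin(math.pi * i * dx) for i in range(nx + 1)]
for _ in range(steps):
    rhs = [phi[i] + 0.5 * r * (phi[i + 1] - 2 * phi[i] + phi[i - 1])
           for i in range(1, nx)]
    sol = thomas([-0.5 * r] * (nx - 1), [1 + r] * (nx - 1),
                 [-0.5 * r] * (nx - 1), rhs)
    phi = [0.0] + sol + [0.0]
# the solution stays bounded and decays smoothly despite the large time step
```

The tridiagonal structure reflects the sparseness discussed in Sec. 20.9-4, so the implicit step costs only O(nx) operations.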

(c) Again, the solution of the difference equation

(Φi,k+1 − 2Φi,k + Φi,k−1)/(Δt)² = c²(Φi+1,k − 2Φi,k + Φi−1,k)/(Δx)²

converges to the solution of the wave equation

∂²Φ/∂t² = c² ∂²Φ/∂x²

if Δt → 0, Δx → 0 so that Δt ≤ Δx/c (Courant condition).
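The Courant condition can be demonstrated with a Python sketch (illustrative, not from the text; unit propagation speed c = 1 assumed, with s = (cΔt/Δx)²):

```python
# Illustrative sketch: explicit central-difference scheme for the wave
# equation with c = 1.  The computation stays bounded for c*dt <= dx
# (s <= 1) and blows up for c*dt > dx (s > 1).
import math

def wave(s, nx=40, steps=80):
    dx = 1.0 / nx
    phi_old = [math.sin(math.pi * i * dx) + 1e-6 * (-1) ** i
               for i in range(nx + 1)]
    # first step from the initial condition phi_t(x, 0) = 0
    phi = ([0.0]
           + [phi_old[i] + 0.5 * s * (phi_old[i + 1] - 2 * phi_old[i]
                                      + phi_old[i - 1])
              for i in range(1, nx)]
           + [0.0])
    for _ in range(steps - 1):
        phi, phi_old = (
            [0.0]
            + [2 * phi[i] - phi_old[i]
               + s * (phi[i + 1] - 2 * phi[i] + phi[i - 1])
               for i in range(1, nx)]
            + [0.0]), phi
    return phi

bounded = wave(0.81)    # c dt < dx: standing wave, amplitude stays near 1
blowup = wave(1.21)     # c dt > dx: perturbations grow without bound
```

For s ≤ 1 the scheme reproduces the standing wave sin πx cos πct with small phase error; for s > 1 the high-frequency perturbation is amplified at every step.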

(d) Reference 20.9 lists numerous other approximation formulas and stability conditions for the heat-conduction and wave equations.

20.9-9. Approximation-function Methods for Numerical Solution of Boundary-value Problems. (a) Approximation by Functions Which Satisfy the Boundary Conditions Exactly. Let x be either a one-dimensional variable or a multidimensional variable x = (x1, x2, . . . , xn) (Sec. 15.4-1). It is desired to solve an ordinary or partial differential equation

image

subject to a boundary condition

image

on the boundary S of a given region V (Sec. 15.4-1). Approximate the desired solution φ(x) by an approximation function

image

which satisfies the given boundary conditions and depends on m parameters α1, α2, . . . , αm. In many applications, the equations (4) are linear, and one approximates Φ(x) by a linear combination of known functions

image

where φ1(x) satisfies the given boundary condition (4b), and φ2(x), φ3(x), . . . , φm(x) satisfy the associated homogeneous boundary condition for x in S (see also Sec. 15.4-2). The error due to the use of the approximation (5a) or (5b) is a function of the αk, given by

image

One determines the unknown parameters α1, α2,. . . , αm in Eq. (5a) or (5b) by one of the following schemes:

      1. Collocation. φ(x; α1, α2, . . . , αm) is made to satisfy the given differential equation (4a) exactly at m chosen points x = X1, x = X2, . . . , x = Xm; the resulting m equations

image

are solved for α1, α2, . . . , αm.

      2. Least-squares Approximations (see also Sec. 20.6-2). Choose the αk so as to minimize the mean-square error

image

or the cruder mean-square error

image

where the bh are suitably determined weighting coefficients associated with N chosen points X1, X2, . . . , XN in V. One may, for instance, space the points Xh evenly and let

image

The αk are determined by the m conditions

image

      3. Galerkin's Weighting-function Method. Choose m linearly independent “weighting functions” Ψ1(x), Ψ2(x), . . . ,Ψm(x) (one frequently uses Ψk = φk), and determine the αk so that

image

If the given equations (4) are linear (linear boundary-value problem), then an approximation of the form (5b) will yield linear equations (7), (10), or (11).
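A small Python sketch of the collocation scheme (method 1) for a hypothetical linear boundary-value problem, not an example from the text: φ″ + φ + x = 0, φ(0) = φ(1) = 0, whose exact solution is φ = sin x/sin 1 − x. The trial functions x(1 − x) and x²(1 − x) satisfy the homogeneous boundary conditions:

```python
# Collocation sketch (hypothetical example): phi'' + phi + x = 0,
# phi(0) = phi(1) = 0.  Trial function phi = a1*x(1-x) + a2*x^2(1-x);
# the residual is forced to vanish at the collocation points x = 1/4, 3/4.
import math

def residual_terms(x):
    t1 = -2.0 + x * (1 - x)                # from a1 * x(1-x):  phi'' + phi
    t2 = 2.0 - 6.0 * x + x * x * (1 - x)   # from a2 * x^2(1-x)
    return t1, t2

X = [0.25, 0.75]
rows = [residual_terms(a) for a in X]
rhs = [-a for a in X]                      # move the forcing term +x across
det = rows[0][0] * rows[1][1] - rows[0][1] * rows[1][0]
a1 = (rhs[0] * rows[1][1] - rows[0][1] * rhs[1]) / det
a2 = (rows[0][0] * rhs[1] - rhs[0] * rows[1][0]) / det

phi_half = a1 * 0.25 + a2 * 0.125          # approximation at x = 1/2
exact = math.sin(0.5) / math.sin(1.0) - 0.5
```

Because the problem is linear, the collocation conditions reduce, as stated above, to a set of linear equations for the αk.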

(b) Approximation by Functions Which Satisfy the Differential Equation Exactly. It is frequently preferable to use an approximation function (5) which satisfies the given differential equation (4a) exactly for all x in V and to determine the parameters αk so as to match the given boundary conditions in the sense of the collocation method (boundary collocation), the least-squares method, or the Galerkin method.
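A corresponding sketch of Galerkin's weighting-function condition (method 3 of (a) above), again for a hypothetical problem not taken from the text: φ″ + φ + x = 0, φ(0) = φ(1) = 0, with the single trial and weighting function ψ1 = φ1 = x(1 − x):

```python
# Galerkin sketch (hypothetical example): phi'' + phi + x = 0,
# phi(0) = phi(1) = 0, phi ~ alpha1 * x(1-x).  The condition
# integral of R(x)*psi1(x) dx = 0 over (0,1) determines alpha1.
def simpson(f, a, b, n=200):               # composite Simpson rule, n even
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h)
                          for i in range(1, n))
    return s * h / 3.0

phi1 = lambda x: x * (1 - x)
res1 = lambda x: -2.0 + phi1(x)            # residual term multiplying alpha1
force = lambda x: x                        # inhomogeneous term of the residual

A = simpson(lambda x: res1(x) * phi1(x), 0.0, 1.0)   # = -3/10
b = simpson(lambda x: force(x) * phi1(x), 0.0, 1.0)  # = 1/12
alpha1 = -b / A                            # from A*alpha1 + b = 0
phi_half = alpha1 * phi1(0.5)              # compare with exact ~ 0.0697
```

Even this one-parameter Galerkin approximation reproduces the exact midpoint value to about three decimal places, again via a linear equation for α1.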

20.9-10. Numerical Solution of Integral Equations (see also Sec. 15.3-2). (a) The solution φ(x) of a linear integral equation

image

can often be approximated by the iteration method of Sec. 15.3-8a; or one may approximate the given kernel K(x, ξ) by a polynomial or other degenerate kernel (Sec. 15.3-16) to simplify the solution. Eigenvalue problems (F ≡ 0) can often be treated as variation problems (Sec. 15.3-6).

(b) The approximation-function methods of Sec. 20.9-9 also apply directly to the numerical solution of a linear integral equation (12). Use an approximation of the form (5b) and compute the m functions

image

Then

      1. The collocation method yields the m linear equations

image

      2. The least-squares method yields the m linear equations

image

      3. Galerkin's method yields the m linear equations

image

20.10. MONTE-CARLO TECHNIQUES

20.10-1. Monte-Carlo Methods (Refs. 20.42 and 20.46). In principle, every Monte-Carlo computation may be considered as the estimation of a suitable integral

image

by a random-sample average

image

where (x1, x2, . . . , xN) is a (generally multidimensional, N > 1) random variable with known distribution function Φ(x1, x2, . . . , xN).

Monte-Carlo techniques are widely useful in investigations of random-process phenomena too complicated for explicit solution by probability theory, viz., neutron-diffusion problems, detection and communication problems, and a vast variety of operations-research studies. In addition, it often pays to recast other types of problems, especially those involving complicated multidimensional integrals, in a form suitable for Monte-Carlo solution.

For simplicity, the following discussion is based on Monte-Carlo estimation of the one-dimensional integral

image

image

The variance of the estimate (4) of (3), on the basis of a random sample (1x, 2x, . . . , nx), is

image

so that the rms fluctuation decreases only as 1/√n with increasing n (Sec. 19.2-3). The estimate variance is due to the random fluctuation in the distribution of different samples (1x, 2x, . . . , nx).
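The 1/√n behavior is easy to observe numerically. A Python sketch (illustrative; the integrand, seed, and sample sizes are assumptions): estimate I = ∫0¹ x² dx = 1/3 and compare rms errors at two sample sizes:

```python
# Illustrative sketch: Monte-Carlo estimate of the integral of x**2 over
# (0,1), showing the rms error decreasing roughly like 1/sqrt(n).
import random

rng = random.Random(12345)                 # fixed seed for reproducibility

def mc_estimate(n):
    return sum(rng.random() ** 2 for _ in range(n)) / n

def rms_error(n, trials=200):
    return (sum((mc_estimate(n) - 1.0 / 3.0) ** 2
                for _ in range(trials)) / trials) ** 0.5

e_small, e_large = rms_error(100), rms_error(10000)
ratio = e_small / e_large                  # expect roughly sqrt(100) = 10
```

Increasing n by a factor of 100 reduces the rms fluctuation by only about a factor of 10, which is what motivates the variance-reduction techniques below.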

20.10-2. Two Variance-reduction Techniques (Ref. 20.46). The following techniques attempt to “doctor” the sample (1x, 2x, . . . , nx) so as to reduce the variance of the sample mean while still preserving the relation

image

i.e., without biasing the estimate.

(a) Stratified Sampling. One divides the range of the random variable x into a number of suitably chosen class intervals ξi−1 < x ≤ ξi and agrees to fix the number ni of otherwise independent sample values falling into the ith class interval. Assuming a priori knowledge of the probabilities

image

associated with the class intervals (e.g., on the basis of symmetry, uniform distribution, etc.), one can employ the stratified-sample average

image

as an unbiased estimate of I, with

image

Note that repeated stratified samples will differ only within class intervals. The variance (9) can be smaller than the random-sample variance Var {f(x)}/n with n = n1 + n2 + · · · if a priori information permits a favorable choice of the ξi and ni. In principle, it would be best to choose class intervals for equal variances

image

and then to assign the theoretically correct number of samples to each class interval, i.e.,

image

In this ideal case, one should have the relatively small estimate variance

image

As the class intervals are decreased, stratified sampling produces results analogous to those of an integration formula, but ordinarily the class intervals are larger. Practical applications are usually multidimensional, so that simple symmetry relations may yield favorable class intervals.
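A Python sketch of stratified sampling (illustrative; the integrand, strata, and sample sizes are assumptions): estimate I = ∫0¹ x² dx with x uniform, using ten equal class intervals with pi = 1/10 and ni = n pi:

```python
# Illustrative sketch: stratified sampling vs. plain random sampling for
# the integral of x**2 over (0,1), with 10 equal strata (p_i = 1/10).
import random

rng = random.Random(2024)
f = lambda x: x * x
n, m = 100, 10                             # n samples, m equal strata

def plain():
    return sum(f(rng.random()) for _ in range(n)) / n

def stratified():
    per = n // m                           # n_i = n * p_i samples per stratum
    total = 0.0
    for j in range(m):
        lo = j / m                         # stratum (lo, lo + 1/m]
        total += (1.0 / m) * sum(f(lo + rng.random() / m)
                                 for _ in range(per)) / per
    return total

trials = 300
var_plain = sum((plain() - 1/3) ** 2 for _ in range(trials)) / trials
var_strat = sum((stratified() - 1/3) ** 2 for _ in range(trials)) / trials
```

For this smooth integrand, fixing the ni removes the between-strata fluctuation and cuts the estimate variance by well over an order of magnitude.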

(b) Use of Correlated Samples. If individual sample values kx are not statistically independent (as they would be in a true random sample), the expression (9) for the estimate variance is replaced by

image

(see also Sec. 19.8-1). Judiciously introduced negative correlation between selected sample-value pairs jx, kx will produce negative covariance terms in Eq. (13) and may reduce the variance well below the random-sample variance Var {f(x)}/n without biasing the estimate.

As a simple example (Ref. 20.42), let x be uniformly distributed between x = 0 and x = 1, and let f(x) be the monotonic function (ex − 1)/(e − 1). One designs the sample so that n is even, and 2x = 1 − 1x, 4x = 1 − 3x, . . . , nx = 1 − n−1x, with sample values otherwise independent. Since f(x) and f(1 − x) are negatively correlated,

image

so that the rms fluctuation is reduced by a factor of about 5.6. In addition, the correlated sample requires one to generate fewer random numbers. More interesting applications are, again, to multidimensional problems. Note that stratified sampling, in effect, also introduces negative correlation between sample values: k+1x can no longer fall into a given class interval if kx has filled the latter.
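The example above is easily reproduced in Python (illustrative sketch; seed, sample size, and trial count are assumptions):

```python
# Illustrative sketch of the correlated-sample example: f(x) = (e**x - 1)/(e - 1),
# x uniform on (0,1), with antithetic pairs 2x = 1 - 1x, 4x = 1 - 3x, ...
import math, random

rng = random.Random(7)
f = lambda x: (math.exp(x) - 1.0) / (math.e - 1.0)
I = (math.e - 2.0) / (math.e - 1.0)        # exact value of the integral
n, trials = 100, 400

def plain():
    return sum(f(rng.random()) for _ in range(n)) / n

def antithetic():
    total = 0.0
    for _ in range(n // 2):
        u = rng.random()
        total += f(u) + f(1.0 - u)         # negatively correlated pair
    return total / n

v_plain = sum((plain() - I) ** 2 for _ in range(trials)) / trials
v_anti = sum((antithetic() - I) ** 2 for _ in range(trials)) / trials
# v_anti is roughly 30 times smaller, i.e., the rms fluctuation drops by
# about the factor 5.6 quoted in the text.
```

Note also that the correlated sample needs only half as many generated random numbers per estimate.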

20.10-3. Use of A Priori Information: Importance Sampling. As a matter of principle, Monte-Carlo computations often can and should be simplified through judicious application of partial a priori knowledge of results. As a case in point, importance-sampling techniques attempt to estimate an integral (1) by a sample average image, where y is a random variable with probability density

image

The estimate is easily seen to be unbiased. The function g(y) is chosen so that

image

is small, subject to the constraint ∫g(y) dy = 1. In particular, g(y) = f(y)/I would reduce the variance (15) to zero, but this would require knowledge of the unknown quantity I. Importance sampling permits one to “concentrate” sampling near values of y of special interest, e.g., where f(y) varies rapidly.
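A Python sketch of importance sampling (hypothetical example, not from the text): estimate I = ∫0¹ x² dx = 1/3 with sampling density g(y) = 2y, obtained by the inverse transform y = √u for uniform u:

```python
# Illustrative sketch: importance sampling for the integral of x**2 over
# (0,1).  Sampling density g(y) = 2y concentrates samples where f is large;
# the unbiased estimate averages f(y)/g(y) = y/2.
import random

rng = random.Random(99)
n, trials = 200, 300

def plain():
    return sum(rng.random() ** 2 for _ in range(n)) / n

def importance():
    total = 0.0
    for _ in range(n):
        y = rng.random() ** 0.5            # y has density g(y) = 2y on (0,1)
        total += y / 2.0                   # f(y)/g(y) = y**2 / (2*y) = y/2
    return total / n

v_plain = sum((plain() - 1/3) ** 2 for _ in range(trials)) / trials
v_imp = sum((importance() - 1/3) ** 2 for _ in range(trials)) / trials
# the importance-sampling variance is several times smaller
```

Here g is only roughly proportional to f, yet the variance drops by a factor of about 6; g(y) = f(y)/I = 3y² would, as noted above, reduce it to zero.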

20.10-4. Some Random-number Generators. Tests for Randomness (Refs. 20.9, 20.51, and 20.55). Congruential methods for generating pseudo-random numbers xi less than a given positive modulus m start with any nonnegative x0 < m and compute successive values

image

where modulo-m addition is defined in the manner of Sec. 12.2-10. On a binary computing machine, the modulus m is conveniently chosen equal to 2 raised to the computer word length. For c = 0, the generator is called a multiplicative congruential generator; otherwise, it is a mixed congruential generator.
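A Python sketch of a mixed congruential generator (illustrative; the particular multiplier and increment are well-known published constants for a 32-bit word, assumed here for demonstration):

```python
# Illustrative sketch: mixed congruential generator x_{i+1} = (a*x_i + c) mod m,
# with m = 2**32 and commonly published 32-bit constants a, c.
def lcg(seed, a=1664525, c=1013904223, m=2 ** 32):
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m                        # scaled to the unit interval [0, 1)

gen = lcg(1)
sample = [next(gen) for _ in range(10000)]
mean = sum(sample) / len(sample)           # near 1/2 for a uniform sequence
```

Such a sequence is, of course, completely deterministic; the tests mentioned below only check whether it behaves statistically like a uniform random sequence.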

Sequences obtained in this manner are not truly random but may have “pseudorandom” properties, such as a uniform distribution between 0 and m, zero correlation between different xi, random-appearing runs of odd and even numbers, etc. The uniform distribution may be tested with a χ2 test (Sec. 19.6-7), and serial correlation may be tested in the manner of Sec. 19.7-4. Even with zero correlation, samples (x1, x2, . . . , xn) taken from a pseudo-random-number sequence will not be statistically independent, a fact which, depending on the specific application, can result in disagreeable surprises. It may well be wise to obtain true random samples for Monte-Carlo computations by analog-to-digital conversion of true analog noise (Ref. 20.55).

Pseudo-random numbers having other than uniform distributions are readily obtained as functions F(xi) of uniformly distributed pseudo-random variables. A number of other methods are discussed in Ref. 20.9.

20.11. RELATED TOPICS, REFERENCES, AND BIBLIOGRAPHY

20.11-1. Related Topics. The following topics related to the study of numerical computations are treated in other chapters of this handbook.

      Theory of equations Chap. 1

      Ordinary differential equations Chap. 9, Chap. 13

      Partial differential equations Chap. 10

      Linear and nonlinear programming, dynamic programming, calculus of variations Chap. 11

      Matrices, diagonalization Chap. 13, Chap. 14

      Eigenvalue problems Chap. 14, Chap. 15

      Boundary-value problems Chap. 15

      Regression Chap. 18, Chap. 19

20.11-2. References and Bibliography.

General

      20.1. Abramowitz, M., and I. A. Stegun (eds.): Handbook of Mathematical Functions, National Bureau of Standards, Washington, D.C., 1964.

      20.2. Booth, A. D.: Numerical Methods, Butterworth, London, 1955.

      20.3. Collatz, L.: Numerical Methods, in Handbuch der Physik, vol. 2, Springer, Berlin, 1955.

      20.4. Hamming, R. W.: Numerical Methods for Engineers and Scientists, McGraw-Hill, New York, 1962.

      20.5. Henrici, P.: Elements of Numerical Analysis, Wiley, New York, 1964.

      20.6. Hildebrand, F. B.: Introduction to Numerical Analysis, McGraw-Hill, New York, 1956.

      20.7. Householder, A. S.: Principles of Numerical Analysis, McGraw-Hill, New York, 1953.

      20.8. Jennings, W.: First Course in Numerical Methods, Macmillan, New York, 1964.

      20.9. Klerer, M., and G. A. Korn (eds.): Digital Computer User's Handbook, McGraw-Hill, New York, 1967.

      20.10. Kopal, Z.: Numerical Analysis, Wiley, New York, 1955.

      20.11. Kunz, K. S.: Numerical Analysis, McGraw-Hill, New York, 1957.

      20.12. Lanczos, C.: Applied Analysis, Prentice-Hall, Englewood Cliffs, N.J., 1956.

      20.13. Milne, W. E.: Numerical Solution of Differential Equations, Wiley, New York, 1953.

      20.14. Noble, B.: Numerical Methods, Oliver & Boyd, London, 1964.

      20.15. Ralston, A.: A First Course in Numerical Analysis, McGraw-Hill, New York, 1964.

      20.16. Ralston, A., and H. S. Wilf: Mathematical Methods for Digital Computers, 2 vols., Wiley, New York, 1960 and 1967.

      20.17. Salvadori, M. G., and M. L. Baron: Numerical Methods in Engineering, 2d ed., Prentice-Hall, Englewood Cliffs, N.J., 1961.

      20.18. Scarborough, J. B.: Numerical Mathematical Analysis, 5th ed., Johns Hopkins, Baltimore, 1962.

      20.19. Stiefel, E. L.: An Introduction to Numerical Mathematics, Academic, New York, 1963.

      20.20. Todd, J.: Survey of Numerical Analysis, McGraw-Hill, New York, 1962.

Linear Equations, Matrix Problems, and Partial Differential Equations

      20.21. Allen, D. N.: Relaxation Methods, McGraw-Hill, New York, 1954.

      20.22. Fadeev, D. K., and V. N. Fadeeva: Computational Methods in Linear Algebra, Freeman, San Francisco, 1963.

      20.23. Forsythe, G. E., and W. R. Wasow: Finite-difference Methods for Partial Differential Equations, Wiley, New York, 1960.

      20.24. Fox, L.: An Introduction to Numerical Linear Algebra, Oxford, Fair Lawn, N.J., 1964.

      20.25. Householder, A. S.: The Theory of Matrices in Numerical Analysis, Blaisdell, New York, 1964.

      20.26. Paige, L. J., and O. Taussky: Simultaneous Linear Equations and the Determination of Eigenvalues, National Bureau of Standards Applied Mathematics Series 29, 1953.

      20.27. Shaw, F. S.: An Introduction to Relaxation Methods, Dover, New York, 1953.

      20.28. Southwell, R. V.: Relaxation Methods in Engineering Science, Oxford, Fair Lawn, N.J., 1940.

      20.29. Southwell, R. V.: Relaxation Methods in Theoretical Physics, Oxford, Fair Lawn, N.J., 1946.

      20.30. Varga, R. S.: Matrix Iterative Analysis, Prentice-Hall, Englewood Cliffs, N.J., 1962.

      20.31. Wilkinson, J. H.: The Algebraic Eigenvalue Problem, Oxford, Fair Lawn, N.J., 1965.

(See also the articles by J. H. Wilkinson and by W. K. Karplus and V. Vemuri in Ref. 20.9.)

Finite Differences and Difference Equations

      20.32. Goldberg, S.: Introduction to Difference Equations, Wiley, New York, 1958.

      20.33. Jolley, L. B.: Summation of Series, Chapman & Hall, London, 1925; reprinted, Dover, New York, 1960.

      20.34. Jordan, C.: Calculus of Finite Differences, Chelsea, New York, 1947.

      20.35. Jury, E. I.: Theory and Application of the Z-transform Method, Wiley, New York, 1964.

      20.36. Milne-Thomson, L. N.: The Calculus of Finite Differences, Macmillan, London, 1951.

      20.37. Ragazzini, J. R., and G. F. Franklin; Sampled-data Control Systems, McGraw-Hill, New York, 1958.

Approximation Methods

      20.38. Carlson, B., and M. Goldstein: Rational Approximation of Functions, Los Alamos Scientific Laboratory, Rept. LA-1943, 1955.

      20.39. Davis, P. J.: Interpolation and Approximation, Blaisdell, New York, 1963.

      20.40. Dunham, C. B.: Convergence Problems in Maehly's Second Method, J. ACM, April, 1965.

      20.41. Hastings, C.: Approximations for Digital Computers, Princeton, Princeton, N.J., 1955.

      20.42. Maehly, H.: First Interim Progress Report on Rational Approximations, Project NR-044-196, Princeton University, 1958.

      20.43. Maehly, H.: Methods for Fitting Rational Approximations, J. ACM, 10: 257 (1963).

      20.44. Meinardus, G.: Approximation von Funktionen und ihre numerische Behandlung, Springer, Berlin, 1964.

      20.45. Rice, J. R.: The Approximation of Functions, Addison-Wesley, Reading, Mass., 1964.

      20.46. Snyder, M. A.: Chebyshev Methods in Numerical Approximation, Prentice-Hall, Englewood Cliffs, N.J., 1966.

Miscellaneous

      20.47. Bellman, R. E., and R. E. Kalaba: Quasilinearization and Nonlinear Boundary-value Problems, Elsevier, New York, 1965.

      20.48. Collatz, L.: Eigenwertaufgaben mit technischen Anwendungen, Akademische Verlagsgesellschaft m.b.H., Leipzig, 1949.

      20.49. Fox, L.: Numerical Solution of Two-point Boundary-value Problems, Oxford, Fair Lawn, N.J., 1957.

      20.50. Fox, L.: Numerical Solution of Ordinary and Partial Differential Equations, Pergamon Press, New York, 1962.

      20.51. Hammersley, J. M., and D. C. Handscomb: Monte Carlo Methods, Wiley, New York, 1964.

      20.52. Henrici, P.: Discrete-variable Methods in Ordinary Differential Equations, Wiley, New York, 1963.

      20.53. Henrici, P.: Error Propagation for Difference Methods, Wiley, New York, 1963.

      20.54. Ostrowski, A. M.: Solution of Equations and Systems of Equations, Academic, New York, 1960.

      20.55. Schreider, Y. A.: Method of Statistical Testing (Monte Carlo Method), Elsevier, New York, 1964.

      20.56. Stroud, A. H., and D. Secrest: Gaussian Quadrature Formulas, Prentice-Hall, Englewood Cliffs, N.J., 1966.

      20.57. Wilde, D. J.: Optimum-seeking Methods, Prentice-Hall, Englewood Cliffs, N.J., 1964.

      20.58. Wilkinson, J. H.: Rounding Errors in Algebraic Processes, Prentice-Hall, Englewood Cliffs, N.J., 1964.

      20.59. Rall, L. B. (ed.): Error in Digital Computation, Wiley, New York, 1965.

      20.60. Roberts, S. M., and J. S. Shipman: The Kantorovich Theorem and Two-point Boundary-value Problems, IBM J. Res., September, 1966.

      20.61. Remez, E.: General Computation Methods for Chebychev Approximation, Izv. Akad. Nauk Ukrainsk. SSR, Kiev, 1957.

      20.62. Whittaker, E., and G. Robinson: The Calculus of Observations, 4th ed., Blackie, Glasgow, 1944.

      20.63. Lindorff, D. P.: Theory of Sampled-data Control Systems, Wiley, New York, 1965.

      20.64. Fehlberg, E.: New One-step Integration Methods of High-order Accuracy, NASA TR R-248, George C. Marshall Space Flight Center, Huntsville, Ala., 1966.

      20.65. Huskey, H. D., and G. A. Korn (eds.): Computer Handbook, McGraw-Hill, New York, 1962.

      20.66. Korn, G. A., and T. M. Korn: Electronic Analog and Hybrid Computers, McGraw-Hill, New York, 1964.

      20.67. Brand, L.: Differential and Difference Equations, Wiley, New York, 1966.

      20.68. Warten, R. M.: Automatic Step-size Control for Runge-Kutta Integration, IBM J. Res., October, 1963.

      20.69. Beckett, R., and J. Hurt: Numerical Calculations and Algorithms, McGraw-Hill, New York, 1967.

      20.70. Cooley, J. W., and J. W. Tukey: An Algorithm for the Machine Calculation of Complex Fourier Series, Math. Comp., 19:297, April, 1965.