1
From Delta Functions to Distributions
ARTHUR ERDÉLYI
PROFESSOR OF MATHEMATICS
CALIFORNIA INSTITUTE OF TECHNOLOGY
1.1 Introduction
In mathematical physics, one often encounters “impulsive” forces acting for a short time only. A unit impulse would be described by a function p(t) that vanishes outside a short interval and is such that

∫_(−∞)^∞ p(t) dt = 1
It is convenient to idealize such forces as “instantaneous” and to attempt to describe them by a function δ(t) that vanishes except for a single value of t which we take to be t = 0, is undefined for t = 0, and for which

∫_(−∞)^∞ δ(t) dt = 1
Such a function, one convinces oneself, should possess the “sifting property”

∫_(−∞)^∞ δ(t)ϕ(t) dt = ϕ(0)    (1.1)

for every continuous function ϕ, and the corresponding property (obtained by integration by parts)

∫_(−∞)^∞ δ^(k)(t)ϕ(t) dt = (−1)^k ϕ^(k)(0)    (1.2)

for every k times continuously differentiable function ϕ.
Unfortunately, it can be proved that no function, in the sense of the mathematical definition of this term, possesses the sifting property. Nevertheless, “impulse functions” postulated to have these or other similar properties are being used with great success in applied mathematics and mathematical physics.
The use of such improper functions can be defended as a kind of short-hand, or else as a heuristic means; it can also be justified by an appropriate mathematical theory. In Sec. 1.2 we shall indicate briefly some theories that can be employed to justify the use of the delta function. In order to provide a theoretical framework accommodating the great variety of improper functions occurring in contemporary investigations of partial differential equations, it seems necessary to widen the traditional concept of a mathematical function. The new concept, that of a “generalized function,” is abstract and cannot reproduce all aspects of the older concept of a function. In particular, it is not possible to ascribe a definite value to a generalized function at a point. Nevertheless, we shall see that in some sense such generalized functions can be described. In particular, it makes perfectly good sense to say that δ(t), which is a generalized function, vanishes on any open interval not containing t = 0.
In this chapter we shall outline two theories of generalized functions. One, essentially algebraic in nature, is restricted to generalized functions on a half line; the other, more closely related to functional analysis, places less restrictions on the independent variable. We shall also mention briefly other theories of generalized functions.
DELTA FUNCTIONS AND OTHER GENERALIZED FUNCTIONS
1.2 The Delta Function
Since the delta function is the idealization of functions that vanish outside a short interval, it is plausible to try to approximate the delta function by such functions. Let s(t) be a function on (− ∞, ∞) satisfying the following conditions:

a. s(t) ≥ 0 for all t
b. s(t) = 0 for |t| ≥ 1
c. ∫_(−∞)^∞ s(t) dt = 1
Then the function
sn (t) = ns(nt)
satisfies the conditions a and c and vanishes outside (−1/n, 1/n), and it may be regarded as “approaching” the delta function as n → ∞. Indeed, it can easily be proved that

lim_(n→∞) ∫_(−∞)^∞ sn(t)ϕ(t) dt = ϕ(0)    (1.3)

for any continuous function ϕ, or even for any function ϕ that is integrable over some interval containing 0 and is continuous at 0. Furthermore,

∫_(−∞)^∞ sn(t − τ)ϕ(τ) dτ → ϕ(t)

uniformly in t over any finite interval, provided ϕ(t) is continuous over some larger interval. If s(t) is k times continuously differentiable, we also have

∫_(−∞)^∞ sn^(k)(t)ϕ(t) dt → (−1)^k ϕ^(k)(0)
As a matter of fact, it is not necessary that s have the property b. If s has the properties a and c, then the condition (1.3) will hold for all functions ϕ that are bounded and continuous for − ∞ < t < ∞. Some examples of such approximations to the delta function that have been used by the great analysts of the last century are the following:
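By way of illustration, the sifting limit (1.3) can be checked numerically. In the Python sketch below, the tent and Gaussian kernels are merely convenient choices of s(t): the tent kernel satisfies conditions a to c, while the Gaussian lacks the compact support of condition b yet still works for a bounded continuous ϕ.

```python
import math

def tent(t):
    # s(t) = 1 - |t| on (-1, 1), zero outside: satisfies conditions a, b, c
    return max(0.0, 1.0 - abs(t))

def gauss(t):
    # s(t) = exp(-t^2)/sqrt(pi): satisfies a and c, but not the support condition b
    return math.exp(-t * t) / math.sqrt(math.pi)

def sift(s, phi, n, lo=-20.0, hi=20.0, m=100_000):
    # midpoint-rule approximation of the integral of s_n(t) phi(t) dt,
    # where s_n(t) = n s(n t)
    h = (hi - lo) / m
    total = 0.0
    for k in range(m):
        t = lo + (k + 0.5) * h
        total += n * s(n * t) * phi(t)
    return total * h

for s in (tent, gauss):
    print([round(sift(s, math.cos, n), 4) for n in (1, 10, 100)])
    # both rows tend to phi(0) = cos 0 = 1 as n grows
```

The quadrature grid and the integration window are illustrative; any sufficiently fine grid exhibits the same limit.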
For a history of the delta function, see Ref. 16, Chap. V.
It may be remarked that we clearly have

∫_(−∞)^t sn(τ) dτ → U(t)    (t ≠ 0)

showing that in some sense δ(t) is the derivative of the unit function

U(t) = 0 for t < 0    U(t) = 1 for t > 0

and indicating some connection between the theory of the delta function and that of generalized differentiation of discontinuous functions.
An entirely different kind of theory of the delta function was adumbrated by Heaviside (see Chaps. 2 and 3 and also Ref. 16, page 65) and more clearly pinpointed by Dirac (Ref. 2, pages 71 to 77); it was not carried out in detail, however, until more recently. According to this theory, the delta function is defined by its action on continuous functions, this action being given by the sifting property (1.1) or (1.3); any analytical operation that, acting on a continuous function ϕ, produces ϕ(0) is then a representation of the delta function.
We have seen that it is impossible to construct such an analytical operation in the form of a Riemann (or Lebesgue) integral; but it is possible to express it as a Stieltjes integral. Indeed,

∫_(−∞)^∞ ϕ(t) dU(t) = ϕ(0)

for all continuous functions ϕ. If U were differentiable, we should have

∫_(−∞)^∞ ϕ(t) dU(t) = ∫_(−∞)^∞ ϕ(t)U′(t) dt
so that here too the delta function appears as a generalized derivative of the unit function.
The two theories are not as far from each other as they might at first appear to be. Although the delta function cannot be expressed as an integral operation, it can be approximated by such operations, namely, the integral operations defined by means of the sn. Indeed, this is exactly the burden of Eq. (1.3).
1.3 Other Generalized Functions
We have indicated theories of the delta function, the basic impulse function appropriate to functions on the line − ∞ < t < ∞. Clearly there are corresponding basic impulse functions on an arbitrary finite or infinite interval; functions of two variables in a plane, where an impulse function may be concentrated at a point, along a curve, or on a more general set of points; functions of several variables; functions of a point on a curved surface or, more generally, on a manifold; and so on. While it should be possible to devise an appropriate theory for each of these impulse functions, it is clearly preferable to seek a general theory embracing all of them.
Other generalized functions occur in connection with Fourier analysis, the modern theory of partial differential equations, etc., and one should like these subjects to be included in any useful theory of generalized functions. We shall give a simple example to indicate the application of generalized functions to partial differential equations.
This example concerns the hyperbolic partial differential wave equation
uxx − uyy = 0
Clearly f(x − y) + g(x + y) is a solution of this equation if f and g are twice continuously differentiable functions. Now, in many problems—e.g., problems of discontinuous wave motion—one should like to regard f(x − y) + g(x + y) as a solution of the wave equation even if f or g fails to be twice continuously differentiable. There are several ways of considering such “weak” or “generalized” solutions. From the point of view adopted here, the most natural approach is that of a generalized theory of differentiation, according to which every function has generalized derivatives that are generalized functions and satisfy the partial differential equation.
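A symbolic check of this statement is immediate; the sketch below (using sympy, with f and g left as arbitrary smooth functions) verifies that u = f(x − y) + g(x + y) annihilates the wave operator.

```python
import sympy as sp

x, y = sp.symbols('x y')
f, g = sp.Function('f'), sp.Function('g')

# u(x, y) = f(x - y) + g(x + y) with f and g arbitrary smooth functions
u = f(x - y) + g(x + y)

# the wave operator applied to u should vanish identically
residual = sp.diff(u, x, 2) - sp.diff(u, y, 2)
print(sp.simplify(residual))   # 0
```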
We shall outline in this chapter two theories of generalized functions. The first of these is algebraic in nature; indeed, it closely imitates the widening of the concept of number from integers to rational numbers. It is most successful with functions of a single nonnegative variable, although it has been extended to functions of several such variables and to functions of a single variable on a finite interval. It has the further advantage of providing a very natural approach to operational calculus as well as to generalized functions and generalized differentiation. Its greatest drawback at present seems to be its inability to cope with functions of unrestricted real variables or with functions of several variables ranging over an arbitrary region.
The second theory belongs more to the domain of functional analysis. In a sense, it might be compared with the extension of the concept of number from rational to real numbers, but the comparison is somewhat farfetched. The principal advantage of this theory is its ability to cope with all generalized functions needed at present. There are several approaches to this theory; we shall outline one of them in the simplest case of functions of a single real variable and briefly mention some others. The considerable number of different approaches to this theory is partly due to an endeavor to remove a basic difficulty remaining in it—the difficulty in defining the product of two generalized functions—and largely due to a desire to make this concept of generalized functions more easily accessible to applied mathematicians and engineers.
An entirely different attempt to cope with the problem of the delta function may be mentioned here. Schmieden and Laugwitz20 have enlarged the concept of real numbers. Their system of numbers contains infinitesimally small and infinitely large numbers, and the analysis based on this system leads to functions, in the mathematical sense of this word, that behave like the delta function. Moreover, in this system the multiplication of two functions presents no difficulties.
MIKUSIŃSKI’S THEORY OF OPERATIONAL CALCULUS AND GENERALIZED FUNCTIONS
1.4 The Definition of Operators
In Secs. 1.4 to 1.10, which are based largely on Ref. 12, t is a nonnegative variable, f = {f(t)} denotes a function of this variable, f(t) is the value of f at t (and hence a number), the scalars are the (real or complex) numbers and will be denoted by small Greek letters, C is the set of all continuous functions of t, a ∈ C will indicate that a is an element of C, with similar notation for other sets, Θ will tentatively denote the function vanishing identically (later we shall see that we may replace this notation by 0), and l the function having a value equal to unity for every t ≥ 0 [except for the value 1 at t = 0, this is the restriction of U(t) to t ≥ 0]. The sifting property of the delta function appropriate to the interval 0 ≤ t ≤ ∞ may be expressed as

∫_0^∞ δ(t)ϕ(t) dt = ϕ(0)
Addition and multiplication by a scalar are defined in the obvious way, namely, for a and b in C and scalars α and β, αa + βb is the function whose value at t is αa(t) + βb(t). These operations have the familiar properties. The convolution a * b, or simply ab, of two functions is defined by Duhamel’s integral

ab = {∫_0^t a(t − τ)b(τ) dτ}

This operation has all the properties of multiplication, and it commutes with multiplication by a scalar; that is, ab = ba, a(bc) = (ab)c and hence may be written as abc, (a + b)c = ac + bc, (αa)b = α(ab), etc.
The set C with addition and multiplication by scalars forms a vector space. The same set with addition and convolution forms a commutative ring, which will be called the convolution ring.
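Commutativity of the convolution product can be observed numerically. In the sketch below (Python, with a and b chosen arbitrarily for illustration) the Duhamel integral is approximated by a midpoint rule, and the two orderings agree to within rounding.

```python
import math

def duhamel(a, b, t, m=2000):
    # midpoint-rule approximation of the Duhamel integral
    #   (a b)(t) = integral from 0 to t of a(t - tau) b(tau) d tau
    h = t / m
    return sum(a(t - (k + 0.5) * h) * b((k + 0.5) * h) for k in range(m)) * h

a = math.cos
b = lambda t: math.exp(-t)

# ab = ba: the two orderings agree at every sample point
pts = [0.2 * k for k in range(1, 11)]
diff = max(abs(duhamel(a, b, t) - duhamel(b, a, t)) for t in pts)
print(diff)   # of the order of rounding error
```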
Integral operations can be expressed in terms of convolutions. In fact, la is the function having value at t equal to

∫_0^t a(τ) dτ

We also have

ll = l^2 = {t}

and by induction,

l^n = {t^(n−1)/(n − 1)!}

for n = 1, 2, 3, … , so that convolution with the latter function expresses the effect of n successive integrations with fixed limit 0. More generally, for any complex α with Re α > 0, where Re α denotes the real part of α, we may set

l^α = {t^(α−1)/Γ(α)}

and call convolution with l^α fractional integration of order α.
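The identification of l^n with {t^(n−1)/(n − 1)!} is Cauchy’s formula for repeated integration, and it can be checked numerically; in the Python sketch below the test function is an illustrative choice.

```python
import math

def l_power(n, f, t, m=4000):
    # convolution with l^n = {t^(n-1)/(n-1)!}: Cauchy's formula for
    # n repeated integrations of f from 0, computed as a single integral
    h = t / m
    return sum((t - tau) ** (n - 1) / math.factorial(n - 1) * f(tau)
               for tau in (h * (k + 0.5) for k in range(m))) * h

# repeated integration of cos: once gives sin t, twice gives 1 - cos t
t = 1.3
print(l_power(1, math.cos, t), math.sin(t))
print(l_power(2, math.cos, t), 1 - math.cos(t))
```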
The convolution ring has no unit element; i.e., there is no u ∈ C such that au = a for all a ∈ C. To see this, it is sufficient to note that lu = l means

∫_0^t u(τ) dτ = 1

for all t ≥ 0, which is clearly impossible. This means that the delta function appropriate to this case is certainly not a continuous function; actually, it is not any function.
A very important property of the convolution ring is contained in Titchmarsh’s theorem: For a and b in C we have ab = Θ if and only if a = Θ or b = Θ (or both these equations hold). An elementary proof of this theorem was given by Mikusiński in Ref. 12, Chap. 2, and is reproduced in Ref. 3, Sec. 2.1. Because of this property of C, division is a meaningful, although not always a feasible, operation in C. The equations bu = a, bv = a imply b(u − v) = Θ, and if b ≠ Θ, then it follows that u = v; that is, the convolution equation bu = a has, with b ≠ Θ, at most one solution u. This solution may then be regarded as a/b. But, of course, bu = a may have no solution in C. Clearly, bu(0) = 0, and hence bu = a will certainly have no solution if a(0) ≠ 0; the equation may fail to have solutions in other cases as well.
The situation encountered here is very similar to that met upon the introduction of division of integers. There the feasibility of division (with the exception of division by zero) is ensured by the extension of the number concept from integers to rational numbers, and similarly here we shall ensure the existence of a unique solution of bu = a with b ≠ Θ by enlarging the convolution ring to a field of convolution quotients. Almost any construction of rational numbers from integers can be imitated; we shall follow the construction in terms of classes of equivalent ordered pairs of integers.
We shall consider ordered pairs (a,b) of elements of C, always assuming that the second element ≠ Θ. We call (a,b) and (c,d) equivalent if and only if ad = bc, denote by a/b the class of all ordered pairs equivalent to (a,b), call a/b a convolution quotient, and denote by Q the set of all convolution quotients. Clearly the cancellation law (ac)/(bc) = a/b holds in Q.
The elements of Q are abstract entities of which it is difficult to form a definite picture. It is, however, possible to point out that in a sense Q contains numbers, functions, and also the delta function and its derivatives. Thus, convolution quotients may be regarded as generalized functions, and they include the more common impulse functions on the half line t ≥ 0. We embed C in Q by identifying a ∈ C with the convolution quotient (ab)/b for any b ≠ Θ. Since

(ab)d = (ad)b

this embedding is independent of b. We shall call a function f integrable if it is absolutely integrable over every finite interval 0 ≤ t ≤ t0. For such a function f and any b ∈ C, the convolution fb is defined and is a continuous function. It is thus natural to identify f with the convolution quotient (fb)/b for any b ≠ Θ. As in the previous case, the embedding is independent of b. Lastly, we embed the scalars in Q by an identification of the number α with the convolution quotient (αb)/b, an embedding that is independent of b. Now b/b, the image of the number 1, is the unit in Q, and we shall see later that it acts as the delta function. Two integrable functions that differ only at a finite number of points (or, more generally, on a set of measure zero) will give the same convolution integral and hence will correspond to the same convolution quotient, thus being indistinguishable in Q. This already shows that it is impossible to ascribe definite “values” to convolution quotients at a point t.
We now define the operations of addition, multiplication by a scalar, and (convolution) multiplication in Q by the equations

a/b + c/d = (ad + bc)/(bd)    α(a/b) = (αa)/b    (a/b)(c/d) = (ac)/(bd)

It is necessary to verify that these definitions are independent of the ordered pair used in the representation of the convolution quotients involved. Now, if ab′ = a′b and cd′ = c′d, then

(ad + bc)b′d′ = (a′b)dd′ + (c′d)bb′ = (a′d′ + b′c′)bd

Hence the definition of addition is meaningful as an operation on convolution quotients. Similarly, the other two definitions can be proved meaningful, and the operations can be shown to have all the usual properties. Moreover, it is easy to verify that the embedding of C and of the scalars in Q preserves all these operations. For this reason, we may write f in place of (fb)/b and α in place of (αb)/b; in particular, we may write 1 in place of b/b. We also see that multiplication by 1 (which may be interpreted either as multiplication by the scalar 1 or as convolution multiplication by the convolution quotient corresponding to this number) reproduces f; hence we have the identification of the convolution quotient 1 with the delta function. Furthermore, the scalar 0 and the function Θ are mapped into the same convolution quotient, so that from now on we may write 0 indiscriminately for any of these three entities, which are conceptually entirely different yet operationally indistinguishable in Q.
The set Q of convolution quotients is an algebra; i.e., it is a vector space under addition and multiplication by scalars and a field under addition and convolution multiplication. The set Q is closed under all these operations, and it is also closed under convolution division with the single exception of division by 0, which is prohibited. The rules of ordinary algebra hold in Q.

From now on, elements of Q will be denoted by single letters, and in case of doubt it will be indicated whether an entity belongs to C or to Q.

The elements of Q will primarily be considered here as generalized functions, but we shall see in the next section that they may act as operators, thus providing a convenient approach to Heaviside’s operational calculus.
1.5 Differential and Integral Operators
We have seen that convolution multiplication of a function by l means integration of that function. It is a plausible conjecture that multiplication by the inverse element in Q, that is to say by 1/b with b ∈ C, b ≠ 0, means differentiation. We shall see that this is not quite the case.
Let

a = {a(t)}

be a differentiable function and

a′ = {a′(t)}

an integrable function. Then

la′ = {∫_0^t a′(τ) dτ} = {a(t) − a(0)}

or la′ = a − a(0)l. On multiplying by s = 1/l, we obtain

sa = {a′(t)} + a(0)    (1.4)

and thus see that, even in the case of a differentiable function a, the product sa represents the derivative function only if a vanishes at t = 0 (Sec. 2.1). This may be explained to some extent by interpreting all our functions as vanishing for t < 0, so that there is a jump of a(0) at t = 0 that contributes a(0)δ(t) to the derivative. This also explains why, even for a differentiable function a, the product sa is in general not a function; further, it shows some of the difficulties encountered in the early applications of Heaviside’s operational calculus. By applying Eq. (1.4) several times, we obtain by induction
s^n a = {a^(n)(t)} + a^(n−1)(0) + a^(n−2)(0)s + ⋯ + a(0)s^(n−1)

for a function a that is n times differentiable and has an integrable nth derivative a^(n). For such a function, s^n a is in general a generalized function. On the other hand, s^n a exists as a convolution quotient for any (not necessarily differentiable) function a, or any convolution quotient. The generalized function s^n a may be called the extended or generalized nth derivative of a.
Since 1 corresponds to δ(t), sn is the extended nth derivative of the delta function, and a polynomial in s with constant, i.e., scalar, coefficients is an impulse function.
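Under the correspondence with the Laplace transform (made precise in Example 1.8), Eq. (1.4) mirrors the familiar rule L{a′} = sL{a} − a(0). A quick sympy check for the illustrative choice a = {cos t}:

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
a = sp.cos(t)

# Laplace-transform counterpart of Eq. (1.4):  L{a'} = s L{a} - a(0)
lhs = sp.laplace_transform(sp.diff(a, t), t, s, noconds=True)
rhs = s * sp.laplace_transform(a, t, s, noconds=True) - a.subs(t, 0)
print(sp.simplify(lhs - rhs))   # 0
```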
Next we investigate simple rational functions of s. From Eq. (1.4) we have

1/(s − α) = {e^(αt)}

From this, it can be proved by induction that

1/(s − α)^n = {t^(n−1)e^(αt)/(n − 1)!}    (1.5)
We are now ready to interpret any rational function of s. Such a function can be decomposed into a sum of a polynomial and partial fractions of the form (1.5), and every term of this decomposition can then be interpreted.
EXAMPLE 1.1. To decompose the rational function s^3/(s^2 + 1).

Since

s^3/(s^2 + 1) = s − s/(s^2 + 1)    s/(s^2 + 1) = (1/2)[1/(s − j) + 1/(s + j)]

where j^2 = −1, we have

s^3/(s^2 + 1) = s − {cos t}
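The decomposition in Example 1.1 can be reproduced with a computer-algebra system; the following sympy sketch performs the partial-fraction step, after which each term is interpreted as an operator as in the text.

```python
import sympy as sp

s = sp.symbols('s')

# polynomial part plus partial fractions of s^3/(s^2 + 1)
expr = s**3 / (s**2 + 1)
decomp = sp.apart(expr, s)
print(decomp)   # s - s/(s**2 + 1)
```

The polynomial part s is an impulse term, while s/(s^2 + 1) is then read as the function {cos t}.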
The operational calculus so developed may be used to solve ordinary linear differential equations with constant coefficients, and also to solve systems of such equations. It will be sufficient to illustrate the process by a simple example.
EXAMPLE 1.2. To solve the differential equation
By applying Eq. (1.4) twice, we have
and hence
Now,
Hence we have the solution
Such equations can be solved even if the right-hand sides are generalized functions, for instance, delta functions, as with the differential equation satisfied by the Green’s function. For instance, the solution of
obtained in a manner similar to that for the above Example 1.2, is
We note that Eq. (1.5) is in agreement with the Laplace transform of eαttn−1/(n − 1)! in case s denotes a complex variable. We shall see in Example 1.8 that this is not a coincidence. Thus, tables of Laplace transforms may be used in interpreting rational functions of s.
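Since the statement of Example 1.2 is not reproduced above, here is a stand-in worked in the same operational manner: for y″ + y = {1} with y(0) = y′(0) = 0 we have {y″} = s^2 y, so (s^2 + 1)y = l, whence y = 1/s − s/(s^2 + 1) = {1 − cos t}. The Python sketch below merely confirms this solution numerically.

```python
import math

# Stand-in problem: y'' + y = {1}, y(0) = y'(0) = 0.  Operationally
# {y''} = s^2 y here, so (s^2 + 1) y = l, i.e.
#   y = 1/(s (s^2 + 1)) = 1/s - s/(s^2 + 1)  ->  y = {1 - cos t}
def y(t):
    return 1 - math.cos(t)

# verify the differential equation by central differences
h = 1e-4
for t in (0.5, 1.0, 2.0):
    ypp = (y(t + h) - 2 * y(t) + y(t - h)) / h**2
    print(round(ypp + y(t), 6))   # ~ 1 at every t
```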
1.6 Limits of Convolution Quotients
It is fairly clear that in C we should use a notion of convergence of continuous functions under which the limit of a convergent sequence of such functions is again continuous. Uniform convergence on every finite interval offers itself as the simplest notion of convergence that preserves continuity. It is much less clear how convergence of a sequence of convolution quotients should be defined, and no simple notion of convergence in Q that has all the desirable properties is known. We shall follow Mikusiński in introducing a notion of convergence that is at any rate simple, has many of the desirable features of convergence, and appears to be adequate for the applications of convolution quotients to operational calculus and to partial differential equations. According to this concept of convergence, a sequence of convolution quotients is regarded as convergent if it has a common denominator and if the numerators, which are continuous functions, are convergent in the sense outlined above. Thus, we shall say that a sequence of convolution quotients an converges to a, in symbols

an → a    or    lim an = a

if there is a q ≠ 0 such that, for each n, qan ∈ C and qa ∈ C, and if furthermore the sequence of continuous functions qan converges to qa uniformly over every finite interval 0 ≤ t ≤ t0. Clearly a itself is then a convolution quotient.
It is fairly easy to prove that the limit, if it exists, is unique and has most of the usual properties. In particular, the sequence a, a, a, … converges to a; the sum (product) of convergent sequences is convergent and tends to the sum (product) of the limits; and a sequence of scalars is convergent in the ordinary sense if and only if the corresponding sequence of convolution quotients converges in the sense outlined here. However,
lim (an/bn) = (lim an)/(lim bn)

does not necessarily hold even if it is assumed that bn ≠ 0 and lim bn ≠ 0.
We shall now give some examples and comment on them.
EXAMPLE 1.3. To prove that {sin nt} → 0.
We have

l{sin nt} = {(1 − cos nt)/n}

and the latter function converges to 0 uniformly (in this case over the entire nonnegative axis). This example shows that convergence in Q, even for ordinary functions, demands much less than ordinary convergence. It thus allows us to ascribe limits to sequences of functions that would ordinarily be regarded as divergent, and it also opens the way to a representation of some convolution quotients as limits, in this sense, of ordinary functions. (See also Example 1.5.)
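Example 1.3 is easy to confirm numerically: one integration replaces {sin nt} by {(1 − cos nt)/n}, whose supremum is at most 2/n. A Python sketch (grid and range chosen for illustration):

```python
import math

# l{sin nt} = {(1 - cos nt)/n}: after one integration the sequence
# converges to 0 uniformly, hence {sin nt} -> 0 in the sense of Sec. 1.6
for n in (1, 10, 100, 1000):
    sup = max(abs(1 - math.cos(n * t)) / n for t in (0.01 * k for k in range(5000)))
    print(n, sup)   # sup is at most 2/n
```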
EXAMPLE 1.4. To prove that for c ∈ C, c^n → 0 as n → ∞.

For a fixed t0, there exists an M > 0 such that |c(t)| ≤ M for 0 ≤ t ≤ t0. We shall prove by induction that, for n = 1, 2, …,

|c^n(t)| ≤ M^n t^(n−1)/(n − 1)!    0 ≤ t ≤ t0    (1.6)

This relationship clearly holds when n = 1. If it holds for a given n, then

|c^(n+1)(t)| = |∫_0^t c(t − τ)c^n(τ) dτ| ≤ M ∫_0^t M^n τ^(n−1)/(n − 1)! dτ = M^(n+1) t^n/n!

and this completes the proof by induction of the inequality (1.6). Since the right-hand side of (1.6) converges to 0 uniformly for 0 ≤ t ≤ t0, we have c^n → 0 in the sense of convergence in Q.
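The bound (1.6) can be watched at work numerically; in the Python sketch below the function c(t) = cos 3t (an arbitrary illustrative choice with M = 1) has its convolution powers computed by a trapezoid rule.

```python
import math

def conv(u, v, h):
    # trapezoid-rule convolution of two functions sampled on a uniform grid
    out = [0.0]
    for k in range(1, len(u)):
        s = 0.5 * (u[k] * v[0] + u[0] * v[k]) \
            + sum(u[k - j] * v[j] for j in range(1, k))
        out.append(s * h)
    return out

h, t0 = 0.01, 1.0
grid = [k * h for k in range(int(t0 / h) + 1)]
c = [math.cos(3 * t) for t in grid]        # |c(t)| <= M = 1 on [0, t0]

cn = c[:]
for n in range(2, 7):
    cn = conv(cn, c, h)                    # convolution power c^n
    bound = t0 ** (n - 1) / math.factorial(n - 1)
    print(n, max(abs(v) for v in cn), bound)
    # the maxima stay below M^n t0^(n-1)/(n-1)!, which tends to 0
```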
EXAMPLE 1.5. To prove that if f(t) is absolutely integrable over 0 ≤ t ≤ ∞ and

∫_0^∞ f(t) dt = 1

then

{nf(nt)} → 1

We set

A = ∫_0^∞ |f(τ)| dτ

Now,

l^2{nf(nt)} = {∫_0^t (t − τ)nf(nτ) dτ}

and we shall prove that this function approaches {t} = l^2 uniformly over every finite interval. Set

gn(t) = ∫_0^t (t − τ)nf(nτ) dτ − t
First assume that 0 ≤ t ≤ δ. Then
For δ ≤ t ≤ t0,
In either case, for any δ between 0 and t0,
Given t0 > 0 and ε > 0, we first choose δ so that 0 < δ < ε/(2A) and then choose N so large that the bound obtained for δ ≤ t ≤ t0 is less than ε whenever n ≥ N. We then have |gn(t)| < ε for 0 ≤ t ≤ t0 and n ≥ N, showing that gn(t) converges to 0 uniformly for 0 ≤ t ≤ t0. Thus, l^2{nf(nt)} converges to l^2 uniformly on every finite interval.
We have accordingly found a family of approximations to the delta function. This should be compared with the approximations discussed in Sec. 1.2. The result suggests that many other convolution quotients might be represented as limits of functions.
1.7 Operator Functions
We shall now consider convolution quotients that depend on parameters. For the sake of simplicity, we shall consider a single parameter x varying over a closed and bounded interval I: α ≤ x ≤ β, and we shall denote the domain α ≤ x ≤ β, t ≥ 0 of the xt plane by D. An operator function a(x) assigns to each x ∈ I a convolution quotient a(x). Mikusiński calls such a function a parametric function if each a(x) ∈ C, so that a(x) = {a(x,t)}, and considers a(x) as a continuous operator function if there exists a q ≠ 0 in Q such that b(x) = qa(x) is a parametric function and b(x,t) is continuous in D; he says a(x) is k times continuously differentiable with respect to x if there exists a q ≠ 0 in Q such that b(x) = qa(x) is a parametric function that is k times continuously differentiable with respect to x; and he sets

a^(k)(x) = (1/q){∂^k b(x,t)/∂x^k}
Continuous and differentiable operator functions have many of the usual properties, and differentiation obeys the familiar rules. It is unnecessary for us to go into further details here. Instead of this, let us consider some examples.
EXAMPLE 1.6. To discuss the function a(x) = {cos (x − t)}.

This is a continuous parametric function for any interval I. By virtue of the results in Example 1.2, this operator function can be expressed as

a(x) = (s cos x + sin x)/(s^2 + 1)
The function is indefinitely differentiable with respect to x, and the reader may easily verify that its derivatives can be computed by differentiating the explicit form. The function a(x) satisfies the operator differential equation
a″(x) + a(x) = 0
and the initial conditions
a(0) = {cos t}    a′(0) = {sin t}
EXAMPLE 1.7. To discuss the function hα(x), defined as follows: For x ≥ 0 let
h(x,t) = 0 if 0 ≤ t < x    h(x,t) = 1 if x ≤ t
and set
h(x) = {h(x,t)}    hα(x) = l^α h(x)
If Re α > 0, then hα(x) is a parametric function, and

hα(x,t) = 0 for 0 ≤ t < x    hα(x,t) = (t − x)^α/Γ(α + 1) for x ≤ t

Thus hα(x) is a continuous parametric function if Re α > 0, and it is k times continuously differentiable if Re α > k; further,

hα^(k)(x) = (−1)^k hα−k(x)

if Re α > k. Since
lβhα(x) = hα+β(x)
it follows that hα(x) is infinitely differentiable, in the sense of differentiation of operator functions, with respect to x; and

hα′(x) = −hα−1(x)

for all α. This differential equation, together with
hα(0) = lαh(0) = lα+1
suggests writing
hα(x) = lα+1e−sx
We shall justify this later.
Of particular importance is
h−1(x) = e−sx
For f ∈ C, we set

h(x)f = {g(x,t)}

and find

g(x,t) = 0 for 0 ≤ t < x    g(x,t) = ∫_0^(t−x) f(τ) dτ for x ≤ t

Consequently, for

h^(−1)(x)f = sg(x) = g1(x)

we have

g1(x,t) = 0 for 0 ≤ t < x    g1(x,t) = f(t − x) for x ≤ t
Thus, g1(x, t) is simply the function f(t) shifted by x, and e−sx is the shift operator.
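The shift property can be verified numerically; the Python sketch below (with f = cos as an illustrative choice) computes h(x)f by quadrature, obtaining ∫_0^(t−x) f, whose t-derivative is the shifted function f(t − x).

```python
import math

def h_shift_conv(x, f, t, m=2000):
    # (h(x) f)(t) = integral from 0 to t of h(x, tau) f(t - tau) d tau
    #             = integral from 0 to t - x of f(u) du, for t >= x (0 otherwise)
    if t <= x:
        return 0.0
    step = (t - x) / m
    return sum(f((k + 0.5) * step) for k in range(m)) * step

f = math.cos
x, t = 0.4, 1.5
# g(x, t) agrees with sin(t - x); its t-derivative, i.e. s g(x) = h^(-1)(x) f,
# is the shifted function cos(t - x)
print(h_shift_conv(x, f, t), math.sin(t - x))
```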
By a direct computation, it can be verified that
h(x)h(y) = lh(x + y)
for x ≥ 0, y ≥ 0, and this relationship holds for all real x, y, provided that we define h(x) for negative values of x by the equation
h(x)h(−x) = l2
We have thus defined hα(x) for all complex α and for all real x. In particular,
h−1(x)h−1(−x) = 1
We now turn to integration of operator functions with respect to x. Let ϕ(x) be absolutely integrable over I = [α,β], and let a(x) be a continuous operator function and q ≠ 0 such that qa(x) = b(x) is a continuous parametric function. We then set

∫_α^β ϕ(x)a(x) dx = (1/q){∫_α^β ϕ(x)b(x,t) dx}

It can be proved that this definition is independent of q and that the integral has all the usual properties. Infinite integrals may then be defined as limits of finite integrals.
EXAMPLE 1.8. To prove that for any integrable function f, the integral ∫_0^∞ f(x)e^(−sx) dx exists and is equal to f.
This result is the background for the coincidence noted at the end of Sec. 1.5. It should be remarked, though, that here s is an operator, so that the integral is not a Laplace integral; also, it might be noted that the result to be proved holds without any restriction on the growth of f(x). Since
le−sx = h(x)
we have

l ∫_0^β f(x)e^(−sx) dx = ∫_0^β f(x)h(x) dx = {∫_0^min(β,t) f(x) dx}

As β → ∞, the last function approaches lf uniformly over every finite interval 0 ≤ t ≤ t0. Thus

lim_(β→∞) l ∫_0^β f(x)e^(−sx) dx

exists and is equal to lf, and on dividing by l we obtain the desired result.
1.8 Exponential Functions
Suppose that, for a certain w ∈ Q, there exists an interval I containing x = 0 and a differentiable operator function e(x) on I that satisfies the differential equation
e′(x) = we(x)
and the initial condition
e(0) = 1
We then say that w is a logarithm and set
e(x) = exw
It is fairly easy to prove that in this case e(x) exists for all real x, is infinitely differentiable, is uniquely defined by the differential equation and the initial conditions, and has the properties
e^(xw) ≠ 0 for all x    (e^(xw))^(−1) = e^(−xw)    e^(xw)e^(yw) = e^((x+y)w)
EXERCISE
Verify that s is a logarithm and that (see Example 1.7)
e−xs = h−1(x)
Some but not all elements of Q are logarithms. The element s is a logarithm, and so are real multiples of s, but it can be proved that js is not a logarithm. All elements of C are logarithms, and for w ∈ C the exponential function is given by the convergent series

e^(xw) = 1 + xw + x^2w^2/2! + x^3w^3/3! + ⋯

in which the powers of w are convolution powers. The series representation holds also for other w; thus it holds for integrable (rather than continuous) functions, or for w = 1. But it does not hold for all logarithms; for instance, the series fails to converge for w = s, which is a logarithm. If u and v are logarithms, then αu + βv is a logarithm for real, but not necessarily for complex, α and β.
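The semigroup law e^(xw)e^(yw) = e^((x+y)w) can be tested numerically for w = l = {1}: writing e^(xl) = 1 + {f(x,t)} with f(x,t) = Σ_(n≥1) x^n t^(n−1)/(n!(n − 1)!), the law amounts to f_x + f_y + f_x * f_y = f_(x+y). A Python sketch (the truncation and grid are illustrative choices):

```python
import math

def f(x, t, terms=40):
    # function part of e^{x l} = 1 + {f(x, t)} for w = l = {1}:
    # f(x, t) = sum over n >= 1 of x^n t^(n-1) / (n! (n-1)!)
    return sum(x**n * t**(n - 1) / (math.factorial(n) * math.factorial(n - 1))
               for n in range(1, terms))

def conv_ff(x, y, t, m=2000):
    # midpoint rule for the convolution (f_x * f_y)(t)
    h = t / m
    return sum(f(x, t - tau) * f(y, tau)
               for tau in (h * (k + 0.5) for k in range(m))) * h

x, y, t = 0.7, 1.1, 2.0
# (1 + f_x)(1 + f_y) = 1 + f_{x+y} reduces to f_x + f_y + f_x * f_y = f_{x+y}
lhs = f(x, t) + f(y, t) + conv_ff(x, y, t)
print(lhs, f(x + y, t))   # the two agree closely
```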
Exponential functions arise in the solution of partial differential equations, in which they often correspond to fundamental solutions.
EXAMPLE 1.9. To prove that s½ is a logarithm.
Let us set

Q(x) = {Q(x,t)}    R(x) = {R(x,t)}

where, for t > 0,

Q(x,t) = (x/(2√(πt^3)))e^(−x^2/(4t))    R(x,t) = (1/√(πt))e^(−x^2/(4t))

with Q(x,0) = R(x,0) = 0.
Clearly, Q(x) = −R′(x). Now the function Q(x), although a parametric function, fails to be continuous; but the function lQ(x) is continuously differentiable with respect to x when x > 0, and

(lQ(x))′ = −R(x) = −l^(1/2)Q(x)
On the other hand, we have

lQ(x) = {∫_0^t (x/(2√(πτ^3)))e^(−x^2/(4τ)) dτ}

and upon introducing a new variable of integration v by

v = x/(2√τ)

we obtain

lQ(x) = {∫_(x/(2√t))^∞ (2/√π)e^(−v^2) dv} = {1 − erf (x/(2√t))}

so that

Q′(x) = −s^(1/2)Q(x)
Moreover, l^2Q(x) approaches {t} = l^2 uniformly in every interval 0 ≤ t ≤ t0 as x → 0+, and hence

Q(x) → 1    as x → 0+

In the course of this work we have also seen that

e^(−xs^(1/2)) = Q(x) = {(x/(2√(πt^3)))e^(−x^2/(4t))}

and

le^(−xs^(1/2)) = {1 − erf (x/(2√t))}

where erf denotes the error function. For fixed x > 0, Q(x,t) as a function of t increases for 0 < t < x^2/6 and decreases thereafter. Thus,

0 ≤ Q(x,t) ≤ Q(x, x^2/6) = (6√6/(2√π))e^(−3/2)x^(−2)

Since this expression approaches zero, uniformly in t, as x → ∞, we see that exp (−xs^(1/2)) → 0 as x → ∞. It follows from this that, if for some a in Q, a exp (xs^(1/2)) is a bounded continuous parametric function for x ≥ 0, then a = 0; and if for a and b in Q, the function
a exp (xs½) + b exp (−xs½)
is a bounded continuous parametric function for all real x, then a = b = 0.
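The identification of Q(x) with exp (−xs^(1/2)) is consistent with the classical Laplace-transform pair L{(x/(2√(πt^3)))e^(−x²/(4t))} = e^(−x√s); a numerical check (the quadrature parameters are illustrative):

```python
import math

def Q(x, t):
    # Q(x, t) = (x / (2 sqrt(pi t^3))) exp(-x^2 / (4 t)) for t > 0
    return x / (2 * math.sqrt(math.pi * t**3)) * math.exp(-x * x / (4 * t))

def laplace_of_Q(x, s, T=50.0, m=200_000):
    # midpoint-rule Laplace integral over [0, T]; the tail beyond T is negligible
    h = T / m
    return sum(math.exp(-s * t) * Q(x, t)
               for t in (h * (k + 0.5) for k in range(m))) * h

x, s = 1.0, 2.0
val = laplace_of_Q(x, s)
print(val, math.exp(-x * math.sqrt(s)))   # the two agree closely
```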
1.9 The Diffusion Equation
Let us briefly indicate the application of the technique developed here to the diffusion equation
uxx(x,t) = ut(x,t)
in the half plane − ∞ < x < ∞, 0 ≤ t < ∞ (subscripts indicate partial derivatives). If
u(x) = {u(x,t)}
is a parametric function possessing continuous partial derivatives with respect to x and t, and a continuous second partial derivative with respect to x, our partial differential equation may be replaced by the operator differential equation

u″(x) = su(x) − ϕ(x)    (1.7)

where ϕ(x) = u(x,0).
If ϕ(x) is an integrable function, this differential equation may be solved by the method of variation of parameters. Two solutions differ by an operator function of the form
a exp (xs½) + b exp (−xs½)
where a and b are in Q; and, by the remark at the end of the preceding section, a solution that is a bounded continuous parametric function is unique within the class of such functions. It may be verified that

u(x) = (1/2) ∫_(−∞)^∞ ϕ(ξ)l^(1/2)e^(−|x−ξ|s^(1/2)) dξ
is a bounded continuous parametric solution of Eq. (1.7) if ϕ is a bounded measurable function. In this sense, the function

u(x,t) = (1/(2√(πt))) ∫_(−∞)^∞ ϕ(ξ)e^(−(x−ξ)^2/(4t)) dξ

may be regarded as a (unique) generalized solution of our boundary-value problem if ϕ(x) = u(x,0) is a bounded measurable function. Actually, this solution is differentiable, indeed analytic, for t > 0 and for all x; and although it is not a continuous function for t ≥ 0, it satisfies the “initial condition” in the generalized sense that
u(x,t) → ϕ(x)    as t → 0, t > 0
at least for all those x at which ϕ is continuous. By a more refined analysis, this result can be extended to measurable functions that, instead of being bounded, are assumed to satisfy an inequality
|ϕ(x)| ≤ A exp (B|x|α)
where A, B, and α are constants and 0 ≤ α < 2.
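For step initial data ϕ = U the integral above can be evaluated in closed form, u(x,t) = (1/2)[1 + erf (x/(2√t))], and the claims can then be checked numerically; a Python sketch:

```python
import math

def u(x, t):
    # heat-kernel solution for step initial data phi = U:
    # u(x, t) = (1/2)(1 + erf(x / (2 sqrt(t))))
    return 0.5 * (1.0 + math.erf(x / (2.0 * math.sqrt(t))))

# u_xx = u_t, checked by central differences
h = 1e-3
for (x, t) in ((0.3, 0.2), (-0.5, 0.7), (1.0, 0.05)):
    uxx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
    ut = (u(x, t + h) - u(x, t - h)) / (2 * h)
    print(round(uxx - ut, 4))   # ~ 0

# the initial condition at a continuity point of phi: u(0.3, t) -> 1 as t -> 0+
print([round(u(0.3, t), 4) for t in (0.1, 0.01, 0.001)])
```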
Other problems involving parabolic equations, and problems involving the wave equation and other hyperbolic equations, can be solved by means of this operational calculus, but so far no significant and successful applications to elliptic partial differential equations, such as Laplace’s equation, are known.
1.10 Extensions and Other Theories
Mikusiński14 has extended this theory to functions of several variables, t1, …, tn, ranging over the “cone”
t1 ≥ 0, …, tn ≥ 0
He13 has also developed the corresponding theory for convolution quotients of functions on a finite interval, α ≤ t ≤ β.
An alternative theory has been proposed by J. D. Weston,26,27 whose generalized functions are operators acting on certain “perfect functions” rather than convolution quotients of functions.
DISTRIBUTIONS
1.11 Testing Functions
We now turn to generalized functions of an entirely different type, to distributions.21 There are several different approaches to this theory, most of them resembling the theories of the delta function indicated in Sec. 1.2 in that distributions appear either as generalized limits of functions or else as characterized by their action on certain classes of functions. We shall present here the second point of view for generalized functions of a single real variable t ranging over the entire real line (−∞, ∞). Alternative approaches and extensions will be mentioned in Sec. 1.18.
Since distributions will be defined in terms of their action on certain classes of functions, the resulting notion of generalized functions will depend on the class of testing functions on which distributions act. We shall use two spaces of testing functions: One has proved useful in applications to Fourier analysis, and the other has been employed in connection with partial differential equations. Other classes of testing functions have also been used.5
Let 𝒮 be the set of infinitely differentiable functions decreasing rapidly as t → ±∞. More precisely, ϕ is in 𝒮 if all derivatives ϕ(k) exist and if for any integer k and any polynomial P(t), P(t)ϕ(k)(t) → 0 as t → ±∞. The set 𝒮 is a vector space in the sense that, for any two elements ϕ1 and ϕ2 of 𝒮 and any two real or complex numbers c1 and c2, the function c1ϕ1 + c2ϕ2 is defined (in the obvious way) and is again in 𝒮. We shall use 0 indiscriminately to denote the number 0 and the function identically equal to zero for all values of t.

We now introduce a notion of convergence in 𝒮. A sequence of functions ϕn in 𝒮 is said to converge to 0 if, for any fixed k and fixed polynomial P(t), P(t)ϕn(k)(t) → 0 uniformly for all real t as n → ∞. A sequence of functions ϕn is said to converge to ϕ if ϕn − ϕ converges to 0. We shall indicate this by writing ϕn → ϕ as n → ∞.

Let cn → c (in the sense of convergence of numbers) and let ϕn → ϕ and θn → θ (in the sense of convergence in 𝒮) as n → ∞; then it is easy to see that also cnϕn → cϕ and ϕn + θn → ϕ + θ, in the sense of convergence in 𝒮, as n → ∞. Thus multiplication by a number and addition of functions are continuous operations. From now on, we shall usually omit the qualifying phrases appearing in the parentheses above, since the nature of the entities involved will indicate which space we are in and which notion of convergence should be used.
Secondly, let 𝒟 be the set of infinitely differentiable functions ϕ vanishing outside some finite interval, the interval depending on ϕ and varying from element to element of 𝒟. There are such functions. As an example, let us define

where a and b are real numbers, a < b, and c, d, α, and β are positive numbers. Clearly, ϕ vanishes outside a finite interval and is infinitely differentiable except possibly at a and b, and it is easy enough to show that ϕ is also infinitely differentiable at these two points.

We now introduce a notion of convergence in 𝒟. A sequence of functions ϕn in 𝒟 is said to converge to 0 if there is a finite interval I such that each ϕn vanishes outside I, and if, for each fixed nonnegative integer k, ϕn(k)(t) → 0 uniformly for all real t (or all t in I) as n → ∞. A sequence of functions ϕn is said to converge to ϕ if ϕn − ϕ converges to 0, and we shall write ϕn − ϕ → 0 and ϕn → ϕ as n → ∞. Clearly, 𝒟 is a vector space, and the operations of multiplication of a function by a number and addition of two functions are continuous. Equally clearly, every element of 𝒟 belongs also to 𝒮, and convergence in 𝒟 implies convergence in 𝒮.

Elements of 𝒮 or 𝒟 will be called testing functions.
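A concrete function of this kind can be sketched as follows; the exponents used here (−α/(t−a) − β/(b−t)) are a standard two-parameter choice, an assumption, since the book's displayed formula with its constants c and d is not reproduced above:

```python
import math

def bump(t, a=-1.0, b=1.0, alpha=1.0, beta=1.0):
    # infinitely differentiable; identically zero outside (a, b)
    if t <= a or t >= b:
        return 0.0
    return math.exp(-alpha / (t - a) - beta / (b - t))

inside  = bump(0.0)      # strictly positive in the interior
outside = bump(1.5)      # exactly zero outside [a, b]
edge    = bump(-0.999)   # decays faster than any power as t approaches a
```

The extremely rapid decay near the endpoints is what makes all one-sided derivatives vanish there, so the two pieces join infinitely smoothly.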
In studying the action of other functions on testing functions, we shall start with continuous functions of t and proceed to functions that are merely locally integrable in the sense that they are integrable over every finite interval. In this context a function of t will be said to be of slow growth if its growth is dominated by that of some polynomial; in other words, a function f is of slow growth if (1 + t2)−Nf(t) is bounded for some N.
1.12 The Definition of Distributions
For continuous functions f of slow growth, the integral

f〈ϕ〉 = ∫−∞∞ f(t)ϕ(t) dt (1.8)

converges for each ϕ in 𝒮 and defines an “evaluation” of f on all elements of 𝒮. In classical analysis, we think of a function f as characterized by its values f(t) for all real t. We now claim that, alternatively, we can characterize such a function by its evaluations f〈ϕ〉 on all elements of 𝒮. In order to substantiate this claim, we have to show that two continuous functions of slow growth possessing the same evaluations on all elements of 𝒮 also possess the same values for all t and hence are identical. It will be sufficient to show that f〈ϕ〉 = 0 for all ϕ in 𝒮 entails f(t) = 0 for all t. Indeed, suppose f(t0) ≠ 0 for some t0, say f(t0) > 0. Since f is a continuous function, there is some interval I around t0 on which f is positive. Take any interval (a,b) in the interior of I and define ϕab to be a testing function positive on (a,b) and vanishing outside (a,b), such as the one constructed in Sec. 1.11. Then clearly f〈ϕab〉 > 0, and accordingly the assumption f(t0) ≠ 0 for some t0 is inconsistent with f〈ϕ〉 = 0 for all ϕ in 𝒮. Incidentally, we see that a continuous function of slow growth is completely characterized by its evaluations on the ϕab for rational a and b, but we shall continue to think of it in terms of its evaluations on all ϕ in 𝒮.
Many discontinuous functions—namely, all locally integrable functions—of slow growth also possess evaluations on 𝒮, and the proof given above shows that at points of continuity the values of such a function are completely determined by the evaluations of the function. On the other hand, the values at points of discontinuity are not at all determined. For instance, the two functions f and g defined by

f(t) = 0 for t ≤ 0, f(t) = 1 for t > 0;   g(t) = 0 for t < 0, g(t) = 1 for t ≥ 0 (1.9)

clearly have the same evaluations on all ϕ in 𝒮, but their values at t = 0 differ. More generally, if N is a null function—that is, N is locally integrable and

∫ab N(t) dt = 0 for all a and b

(or, N vanishes almost everywhere)—then f and f + N will have the same evaluations on all elements of 𝒮. The situation is not unlike the one encountered in connection, say, with Fourier series or Laplace transforms, where f and f + N have the same Fourier series or Laplace transforms, as the case may be, and are thus indistinguishable. It so happens that in many situations the distinction between two functions differing by a null function is unimportant, and in such situations the evaluations of a function on 𝒮 characterize it as far as a characterization is meaningful—i.e., up to a null function. In many problems in applied mathematics, the functions that most naturally arise in the data or in the final results either are continuous or else have simple types of discontinuities, and in the latter case the right-hand and left-hand limits of a function at a discontinuity do matter, while the value of the function itself at the discontinuity is usually artificial and has no physical significance. In such problems, then, f〈ϕ〉 is as satisfactory as f(t) for a characterization of f.
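The evaluation of Eq. (1.8) can be sketched numerically. The bump-probe formula below is an illustrative choice (the book's constants are not reproduced above); the sketch shows that a probe supported where f is positive detects it, and that two functions differing only on a null set have the same evaluations:

```python
import math

def bump(t, a, b):
    # smooth testing function, positive on (a, b), zero elsewhere
    if t <= a or t >= b:
        return 0.0
    return math.exp(-1.0 / (t - a) - 1.0 / (b - t))

def evaluate(f, phi, lo=-5.0, hi=5.0, n=2000):
    # f<phi> = integral of f(t) phi(t) dt, by midpoint quadrature
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) * phi(lo + (i + 0.5) * h)
               for i in range(n)) * h

f = lambda t: 1.0 if t > 0 else 0.0    # f(0) = 0
g = lambda t: 1.0 if t >= 0 else 0.0   # g(0) = 1: differs from f on a null set

right_probe = lambda t: bump(t, 0.5, 1.5)    # supported where f is positive
left_probe  = lambda t: bump(t, -1.5, -0.5)  # supported where f vanishes
```

The midpoint grid never samples t = 0, mirroring the fact that the single point where f and g differ contributes nothing to any integral.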
The function f(t) indicates a mapping of the real line into the space of real and complex numbers; similarly, f〈ϕ〉 indicates a mapping of 𝒮 into the space of numbers in that it assigns a number, defined by Eq. (1.8), to each element of 𝒮. Now, a mapping of a vector space into a space of numbers is called a functional, and in this sense we may say that the function f is characterized by the functional on 𝒮 that it generates. We shall henceforth use f indiscriminately for either the function or the functional.
The functional f〈ϕ〉 defined by Eq. (1.8) is clearly linear in the sense that, for any two numbers c1 and c2 and any two testing functions ϕ1 and ϕ2, we have
f〈c1ϕ1 + c2ϕ2〉 = c1f〈ϕ1〉 + c2f〈ϕ2〉
We claim that the functional is also continuous in the sense that ϕn → ϕ in 𝒮 entails f〈ϕn〉 → f〈ϕ〉, in the sense of convergence of numbers, as n → ∞. On account of the linearity of the functional, it will be sufficient to show this in the special case ϕ = 0. Now, f being of slow growth, there exist numbers A and N such that
|f(t)| ≤ A(1 + t2)N
Moreover, by the definition of convergence in 𝒮,
(1 + t2)N+1ϕn(t) → 0
uniformly for all real t as n → ∞, and consequently for any positive ε we have

(1 + t2)N+1|ϕn(t)| < ε
for all t and all sufficiently large n. But then
for all sufficiently large n, showing that f〈ϕn〉 → 0 as n → ∞.
Thus we see that every locally integrable function of slow growth determines uniquely a continuous linear functional on 𝒮, and conversely such a function is determined up to a null function by its continuous linear functional.

There are many continuous linear functionals on 𝒮 that are not generated by functions. For instance, for all nonnegative integers k, the equations

δ(k)〈ϕ〉 = (−1)kϕ(k)(0), k = 0, 1, 2, … (1.10)

assign numbers to each testing function and thus determine functionals on 𝒮 that are easily seen to be linear and continuous. Now, a process similar to the one carried out above for continuous functions shows that if such a functional were generated by a function, then that function would have to vanish at all of its points of continuity other than the origin. Thus, it would vanish identically if it were continuous, and a familiar process of approximation of integrable functions by continuous ones shows that, in any event, it would have to be a null function. But a null function fails to generate the functional δ(k), showing that this functional cannot be generated by a locally integrable function. Indeed, comparison with Eqs. (1.1) and (1.2) shows that it is the functional corresponding to the delta function for k = 0 and to its derivatives for k ≥ 1.

These considerations suggest that we regard every continuous linear functional on 𝒮 as defining a generalized function. The space of these continuous linear functionals will be denoted by 𝒮′; it clearly contains all locally integrable functions of slow growth, and it also contains the delta function and its derivatives.
In all the foregoing considerations, we could restrict the testing functions to 𝒟 and thus obtain the space 𝒟′ of continuous linear functionals on 𝒟. (Note in this connection that ϕab is in 𝒟.) Every continuous linear functional on 𝒮 is also a continuous linear functional on 𝒟 and hence in 𝒟′, and there are functionals in 𝒟′ that are not in 𝒮′. The integral in Eq. (1.8) converges for any continuous or locally integrable function, not necessarily of slow growth, when ϕ is in 𝒟, thus showing that 𝒟′ generalizes functions of arbitrary growth in the same sense in which 𝒮′ generalizes functions of slow growth.
We shall call the elements both of 𝒮′ and of 𝒟′ distributions; in particular, we shall call δ the delta distribution. If it is necessary to distinguish between them, we shall call the elements of 𝒮′ distributions of slow growth, and those of 𝒟′ we shall call distributions of arbitrary growth or, shortly, distributions. The name “distribution” was introduced by L. Schwartz. It is suggested by a consideration of an integrable function f(t) as the density of a continuous distribution of mass, while the delta distribution corresponds to the unit mass concentrated at t = 0. Although it fails to offer a description of other generalized functions such as δ′ and is not in accordance with the use of the same term in probability theory, the name “distribution” will be retained here as a convenient means for distinguishing the generalized functions introduced here from those that appear, say, in Mikusiński’s theory of operators.
1.13 Operations with Distributions
We shall generally denote distributions by capital letters, such as R, S, and T. Two distributions S and T are equal if S〈ϕ〉 = T〈ϕ〉 for all testing functions ϕ. We define multiplication of a distribution T by a number c and addition of two distributions S and T by
(cT)〈ϕ〉 = cT〈ϕ〉
(S + T)〈ϕ〉 = S〈ϕ〉 + T〈ϕ〉
These algebraic operations have the usual properties, and the set of distributions equipped with these operations is a vector space.
The product of two distributions, to correspond to multiplication of the values of two functions, cannot be defined in general. In particular cases, however, such a definition is possible. For instance, let θ(t) be infinitely differentiable for all real t. Then the function θ corresponds to a distribution, which we shall also denote by θ. Given an arbitrary distribution T, we can define the product θT by
(θT)〈ϕ〉 = T〈θϕ〉
noting that θϕ is a testing function whenever ϕ is. Moreover, if both the function θ and the distribution T are of slow growth, then θT will also be of slow growth.
The convolution of two distributions can also be defined, but its definition requires a consideration of distributions in two variables and will be considered only briefly, in Sec. 1.15.
If we think of distributions as generalized functions, we sometimes write T(t), and in this spirit occasionally write symbolically

T〈ϕ〉 = ∫T(t)ϕ(t) dt

For instance, in this sense we may write

∫δ(t)ϕ(t) dt = δ〈ϕ〉 = ϕ(0)
Distributions do not in general have definite “values at points t.” Nevertheless, a distribution may have “values” on an interval in the following sense. We shall say that the distribution T vanishes on the open interval (a,b) if T〈ϕ〉 = 0 for all testing functions vanishing outside (a,b). In such a case we also write
T(t) = 0, a < t < b
If the distribution T is generated by a locally integrable function f, then T will vanish on (a,b) in the sense of our definition if and only if f vanishes almost everywhere on (a,b).
This definition can be extended to a local comparison of a distribution T with a function f. Let f be integrable on (a,b), and extend f to all values of t, for instance by setting it equal to zero outside (a,b). The function f so extended defines a distribution, which we shall again denote by f. We then say

T(t) = f(t), a < t < b (1.11)

or T = f on (a,b), if the distribution T − f vanishes on (a,b) in the sense described above. Note that the vanishing or otherwise of T − f on (a,b) is independent of the manner of extending f.
In this sense, the delta distribution clearly vanishes on any open interval not containing the origin, and this circumstance may be expressed by stating δ(t) = 0 for t ≠ 0, without implying the existence of “values” of δ(t) for individual values of t. For the Heaviside distribution, defined by

H〈ϕ〉 = ∫0∞ ϕ(t) dt

we clearly have H(t) = 0 for t < 0 and H(t) = 1 for t > 0, so that in this sense we can attribute values to H everywhere except at the origin. As a matter of fact, H can be generated by either of the two functions defined in Eqs. (1.9).
The open set consisting of all open intervals on which a distribution T vanishes (i.e., the union of all such intervals) is called the null set of T. The collection of values of t not in the null set (i.e., in the complement of the null set) is called the support of T. The support is the set of points on which T(t) is essentially different from zero; it is a closed set, and it can be characterized either as the smallest closed set outside of which T(t) vanishes or as the collection of points t not contained in any open interval on which T vanishes.
The support of the delta distribution is clearly the single point t = 0, and the support of the Heaviside distribution H is the nonnegative half t ≥ 0 of the real line.
We now come to the differentiation of distributions. If f is continuously differentiable, then f and f′ generate distributions, and we have, by integration by parts,
and similarly for higher derivatives. This suggests the following definition for the derivative T′ of an arbitrary distribution T:
T′〈ϕ〉 = T〈−ϕ′〉 (1.13)
and for the derivatives of higher order,

T(k)〈ϕ〉 = (−1)kT〈ϕ(k)〉
In this connection, it should be noted that the derivative of a testing function (rapidly decreasing testing function) is again a testing function (rapidly decreasing testing function).
EXAMPLE 1.10. To determine the derivative H′ of Heaviside’s distribution H.
According to the definition, we have
for all testing functions, and hence H′ = δ. It can be verified similarly that the derivatives of the delta distribution are the distributions defined in Eqs. (1.10).
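The identity H′〈ϕ〉 = H〈−ϕ′〉 = ϕ(0) of Example 1.10 can be checked numerically for a concrete testing function (the particular ϕ below is an illustrative choice; any rapidly decreasing ϕ would serve):

```python
import math

# a rapidly decreasing testing function and its derivative
phi  = lambda t: (1.0 + t) * math.exp(-t * t)
dphi = lambda t: (1.0 - 2.0 * t * (1.0 + t)) * math.exp(-t * t)

def H_prime(phi_deriv, hi=10.0, n=20000):
    # H'<phi> = H<-phi'> = -integral_0^inf phi'(t) dt
    h = hi / n
    return -sum(phi_deriv((i + 0.5) * h) for i in range(n)) * h

sifted = H_prime(dphi)   # should equal phi(0) = 1: the sifting property of delta
```

The quadrature recovers ϕ(0) because −∫0∞ ϕ′(t) dt telescopes to ϕ(0) − ϕ(∞) = ϕ(0).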
According to our definition, every distribution is infinitely differentiable, and if the distribution is of slow growth, its derivatives will also be of slow growth. Since every locally integrable function generates a distribution, every such function possesses “distribution derivatives” of arbitrary order, but these derivatives need not be functions. Thus, Heaviside’s function and its derivative, the delta distribution, exemplify this. If, on an interval (a,b), f possesses a locally integrable derivative, however, then the distribution derivative and the derivative in the usual sense are equal on (a,b). To show this, denote for the moment the distribution derivative of f by T′, so that
T′〈ϕ〉 = f〈−ϕ′〉
For testing functions vanishing outside (a,b), the computation (1.12) remains true and shows that
T′(t) = f′(t)
on (a,b). For this reason, we may write f′ for the distribution derivative without ambiguity. In particular,
H′(t) = δ(t)
is both meaningful and correct in this sense.
EXAMPLE 1.11. To investigate the derivatives of tα, α > −1.
Since the values of this function fail to be real for negative t when α is fractional, we shall first discuss the function
f(t) = tαH(t)
and the distribution T generated by it. If k is a positive integer and α − k > −1, then T(k) is everywhere equal to the locally integrable function
and this equality holds for all positive integers k if α is a nonnegative integer. If α is not an integer and α − k < −1, then T(k) will no longer be a function; nevertheless, the distribution T(k) will be equal to the function fk when t ≠ 0.
The following definition is then suggested. If [tαH(t)] represents the distribution generated by tαH(t) when α > −1, then the formula
in which the superscript (k) denotes the kth distribution derivative, can be proved for α > −1, k = 1, 2, …, and it may be taken as the definition of the left-hand side when α < −1, k is a positive integer, and α + k > −1. It is easy to see that this definition is independent of k and that, with [tβH(t)] so defined, the formula holds for all nonintegral α and all positive integers k.
The definition of [tα] for a negative integer α involves logarithms and will not be given here.
In order to extend fractional powers also to negative t, we introduce
and obtain the further formulas
the discussion of which is left to the reader.
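For a value of α where both sides are still locally integrable functions, the relation between [tαH(t)] and its distribution derivative can be checked weakly: with α = 3/2 (an illustrative choice), the kth = first derivative satisfies 〈t3/2H, −ϕ′〉 = (3/2)〈t1/2H, ϕ〉 for every testing function ϕ.

```python
import math

ALPHA = 1.5
phi  = lambda t: math.exp(-t * t)          # rapidly decreasing testing function
dphi = lambda t: -2.0 * t * math.exp(-t * t)

def integral(f, hi=10.0, n=100000):
    # midpoint quadrature over (0, hi); the tail beyond hi is negligible here
    h = hi / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

lhs = integral(lambda t: t ** ALPHA * (-dphi(t)))            # [t^a H]' <phi>
rhs = ALPHA * integral(lambda t: t ** (ALPHA - 1) * phi(t))  # a t^(a-1) H <phi>
```

Agreement of the two integrals is exactly the statement that the distribution derivative coincides with the classical derivative α tα−1H(t) when the latter is locally integrable.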
The differentiation of distributions obeys the usual rules. If c is a constant, if θ(t) is an infinitely differentiable function and θ the distribution that it generates, if S and T are distributions, and if θT is the product defined at the beginning of this section, then
(cT)′ = cT′
(S + T)′ = S′ + T′
(θT)′ = θ′T + θT′
It will be sufficient to prove the last of these statements:
Finally, we can define primitives or antiderivatives of distributions, corresponding to indefinite integrals of functions. The distribution T1 is called a primitive of T if it satisfies T1′ = T. It is clear from the corresponding situation with regard to functions that a distribution may possess several primitives; we shall presently pursue this question further. Meanwhile, let us prove that every distribution possesses at least one primitive. In order to do this, let us fix a ϕ0 in 𝒮 or in 𝒟 for which

∫−∞∞ ϕ0(t) dt = 1

For this ϕ0, we shall arbitrarily set T1〈ϕ0〉 = 0.
For this, we shall arbitrarily set T1 〈ϕ0〉 = 0. Corresponding to every testing function ϕ, we define a function ϕ1 by the equation
It is easy to verify that ϕ1 is again a testing function and that every testing function can be represented in the form ϕ1. We further define a new distribution T1 by
T1〈ϕ〉 = T〈−ϕ1〉
Clearly, T1〈ϕ0〉 = 0, and since
so that T1 is a primitive of T.
It may be noted that in this discussion ϕ may be either in 𝒮 or in 𝒟; ϕ1 will be in the same class of testing functions, so that for distributions of slow growth we have demonstrated the existence of a primitive of slow growth.
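The construction of ϕ1 can be sketched numerically. The book's displayed formula is not reproduced above; one standard choice consistent with the surrounding argument (an assumption) is ϕ1(t) = ∫−∞t [ϕ(u) − I(ϕ)ϕ0(u)] du, where I(ϕ) = ∫ϕ, so that ϕ1′ = ϕ − I(ϕ)ϕ0 and ϕ1 vanishes outside a finite interval whenever ϕ and ϕ0 do:

```python
import math

def bump(t, a, b):
    # smooth function positive on (a, b), zero elsewhere (illustrative choice)
    if t <= a or t >= b:
        return 0.0
    return math.exp(-1.0 / (t - a) - 1.0 / (b - t))

LO, HI, N = -4.0, 4.0, 8000
H = (HI - LO) / N
mid = lambda i: LO + (i + 0.5) * H

def integral(f):
    return sum(f(mid(i)) for i in range(N)) * H

c0 = integral(lambda t: bump(t, -1.0, 1.0))
phi0 = lambda t: bump(t, -1.0, 1.0) / c0     # fixed phi0 with total integral 1
phi  = lambda t: t * bump(t, 0.0, 3.0)       # an arbitrary testing function
I_phi = integral(phi)

def phi1(t):
    # running integral of phi - I(phi) * phi0 from LO up to t
    n = max(0, min(N, round((t - LO) / H)))
    return sum(phi(mid(i)) - I_phi * phi0(mid(i)) for i in range(n)) * H
```

Because the integrand has total integral zero, ϕ1 vanishes identically to the left of both supports and again to the right of them, so ϕ1 is a testing function as claimed.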
A constant c is an infinitely differentiable function and hence generates a distribution, which we shall again denote by c; thus,

c〈ϕ〉 = c∫ϕ(t) dt
Such a distribution will be called a constant distribution. Clearly, the constant distribution defined by c is equal to c everywhere in the sense of the definition (1.11), and cT may be used unambiguously for either the constant multiple of the distribution T or the product of the two distributions.
The derivative of a constant distribution is zero, since
We shall now prove that, conversely, the condition T′ = 0 or, equivalently, T〈ϕ′〉 = 0 for all testing functions entails that T is a constant distribution. To prove this, fix ϕ0, and for every testing function ϕ define the testing function ϕ1, as above. Since T〈ϕ1′〉 = 0, it follows from Eq. (1.14) that
Hence T is the constant distribution that is equal to T〈ϕ0〉 everywhere.
If T1 is a primitive of T and c is any constant distribution, then T1 + c is also a primitive of T, and it follows from the result of the preceding paragraph that any two primitives of a given distribution differ by a constant distribution.
These considerations can be extended to repeated primitives. A distribution generated by a polynomial of degree n being called a polynomial distribution of degree n, it is clear that derivatives and primitives of polynomial distributions are again polynomial distributions and have the appropriate degrees. In particular, the kth derivative of a polynomial distribution of degree k − 1 or less is zero; if Tk is a kth primitive of T and if p is any polynomial distribution of degree k − 1 or less, then Tk + p is also a kth primitive of T; conversely, any two kth primitives of a given distribution differ at most by a polynomial distribution of degree k − 1 or less.
1.14 Convergence of Distributions
We have seen that distributions form a vector space. In this vector space we introduce a notion of convergence, saying that the sequence of distributions Tn converges provided that, for every testing function ϕ, the sequence of numbers Tn〈ϕ〉 converges; further, we say that Tn converges to T, in symbols Tn → T as n → ∞, provided that, for every testing function ϕ, we have Tn〈ϕ〉 → T〈ϕ〉. Let Tn be a convergent sequence of distributions; then, for each ϕ, lim Tn〈ϕ〉 exists and defines a functional T〈ϕ〉. This functional is clearly linear, and it can also be proved to be continuous; thus, a convergent sequence always has a limit.
The notion of convergence defined above is consistent with the vector-space structure in that it makes the addition of two distributions and the multiplication of a distribution by a number continuous operations; that is, if cn → c, Sn → S, and Tn → T as n → ∞, then also cn Tn → cT and Sn + Tn → S + T as n → ∞. Moreover, a sequence of constant distributions Tn = cn converges in the sense of distributions if and only if the sequence of numbers cn converges in the sense of convergence of numbers, and the two limits correspond.
Since every locally integrable function determines a distribution, convergence of distributions may be used to define generalized limits of sequences of functions, it being understood that the generalized limit, if it exists, is in general a distribution rather than a function. The connection between ordinary pointwise limits and generalized limits of sequences of functions is by no means simple. Either of these limits may exist when the other does not, and even if both exist, they need not be equal. We shall exemplify these points.
EXAMPLE 1.12. To show that if fn(t) = sin nt, then fn → 0 in the sense of convergence of distributions.
Since every testing function ϕ is absolutely integrable, we have
by the Riemann-Lebesgue lemma. Note that every derivative of a testing function is again absolutely integrable and that therefore it can be proved through integrations by parts that the sequence of functions nᵃ sin nt, for any value of the constant a, also tends to 0 in the sense of convergence of distributions. Note also that for a fixed t that is not an integer multiple of π, lim fn(t) fails to exist as n → ∞.
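The Riemann-Lebesgue decay of Example 1.12 can be observed directly; the testing function below is an illustrative choice (deliberately not even in t, so that the evaluations are not trivially zero by symmetry):

```python
import math

phi = lambda t: (1.0 + t) * math.exp(-t * t)   # a rapidly decreasing function

def against_sin(n, lo=-8.0, hi=8.0, m=40000):
    # <sin(nt), phi> = integral of sin(n t) phi(t) dt
    h = (hi - lo) / m
    return sum(math.sin(n * (lo + (i + 0.5) * h)) * phi(lo + (i + 0.5) * h)
               for i in range(m)) * h

slow = against_sin(2)    # moderate n: the evaluation is not yet small
fast = against_sin(40)   # large n: the oscillations average the evaluation away
```

Even though sin nt has no pointwise limit at most points, its evaluations against every testing function tend to zero, which is all that distribution convergence requires.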
EXAMPLE 1.13. To prove that the function fn(t), defined by
does not converge in the sense of distributions.
Here, clearly, fn(t) → 0, for every fixed t, as n → ∞, but
fails to have a limit as n → ∞ if ϕ(0) ≠ 0; thus, we have pointwise limits everywhere and yet no distribution limit.
EXAMPLE 1.14. To prove that for the function fn(t), defined by
we have fn → δ as n → ∞.
it follows that fn → δ, in the sense of convergence of distributions, as n → ∞. On the other hand, for each fixed t, fn(t) → 0 as n → ∞. Thus both limits exist, though they do not agree.
Note that in the foregoing example the discrepancy was caused by the nonuniformity of the pointwise limit at t = 0; indeed the very existence of the limit at t = 0 was enforced only by an artificial definition of the function fn(t) at that point. Under more stringent conditions, for instance under the condition that fn(t) → f(t) uniformly for all t as n → ∞ or that
it can be proved that the existence of the pointwise limit entails that of the distribution limit, and that the two limits are equal.
EXERCISE
Show that if f(t) is absolutely integrable over (− ∞, ∞), and fn(t) = nf(nt), then fn → δ as n → ∞. (Use Example 1.2 as a hint.)
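The exercise can be sketched numerically with a concrete absolutely integrable f normalized so that its total integral is 1 (choices of f and ϕ below are illustrative):

```python
import math

f = lambda t: math.exp(-t * t) / math.sqrt(math.pi)    # total integral 1
phi = lambda t: math.cos(t) * math.exp(-t * t / 10.0)  # testing function; phi(0) = 1

def smeared(n, lo=-10.0, hi=10.0, m=100000):
    # <n f(n t), phi> by midpoint quadrature
    h = (hi - lo) / m
    return sum(n * f(n * (lo + (i + 0.5) * h)) * phi(lo + (i + 0.5) * h)
               for i in range(m)) * h

near  = smeared(50)   # strongly concentrated: close to phi(0)
crude = smeared(5)    # less concentrated: visibly off
```

As n grows, n f(nt) concentrates its unit mass near t = 0 and the evaluations approach ϕ(0), which is precisely δ〈ϕ〉.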
One of the more remarkable properties of the convergence of distributions is its insensitivity to differentiation. If Tn → T as n → ∞, then
and hence also Tn′ → T′ as n → ∞.
As an application of this property, convergent series of distributions (and likewise infinite integrals of distributions) may always be differentiated term by term. The differentiated series are then always convergent in the sense of distributions. This causes a very wide class of trigonometric series, in particular all Fourier series, to converge in the sense of distributions even though they may fail to converge in the ordinary sense. A trigonometric series such as
always converges in the sense of distributions provided that cn = O(nk) for some fixed integer k as n → ∞; for in this case the series
converges uniformly for all t and its differentiation k + 2 times produces the given series. In particular, for a Fourier series the cn are bounded, and this process may be used with k = 0, so that the distribution sum of a Fourier series is at worst the second distribution derivative of a continuous periodic function.
In the definition of the relation Tn → T, the integer-valued parameter n may be replaced by a continuously varying parameter tending to some limit, and we shall take this extension for granted.
The notion of the distribution limit may be used in order to interpret divergent sums and integrals.
EXAMPLE 1.15. To show that
exists and is equal to the delta distribution.
Set
Then
by Fourier’s single-integral theorem, and hence fa → πδ as a → ∞. This proves the result and shows that the formula
frequently used in applied mathematics is correct in the sense of distribution convergence.
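The convergence fa → πδ of Example 1.15, with fa(t) = sin(at)/t, can be checked against a concrete testing function (the Gaussian below is an illustrative choice):

```python
import math

phi = lambda t: math.exp(-t * t)   # testing function with phi(0) = 1

def f_a(a, lo=-8.0, hi=8.0, m=160000):
    # <sin(at)/t, phi>; the midpoint grid never samples t = 0,
    # where the integrand extends continuously to the value a
    h = (hi - lo) / m
    total = 0.0
    for i in range(m):
        t = lo + (i + 0.5) * h
        total += math.sin(a * t) / t * phi(t) * h
    return total

val = f_a(40.0)   # approaches pi * phi(0) as a grows
```

This is the numerical face of Fourier's single-integral theorem: the evaluations tend to πϕ(0) = πδ〈ϕ〉.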
The notion of the distribution limit may also be used to establish
as a meaningful formula. Here T(t) is the distribution T, and T′(t) is the distribution T′ defined by Eq. (1.13). The interpretation of T(t + h) as a distribution Th is suggested by the formula
which is certainly valid when T(t) is a function. For any testing function ϕ let us define a new testing function ϕh by
ϕh(t) = ϕ(t − h)
and then define, for any distribution T, a new distribution Th by
Th〈ϕ〉 = T〈ϕh〉
We then wish to prove that
Now,
and thus we have to show that
in the sense of convergence in 𝒮 or in 𝒟. Now, if ϕ is in 𝒟, then, for all values of h such that |h| ≤ h0, θh vanishes outside a finite interval that is independent of h, and we need to show that for each nonnegative integer k we have θh(k)(t) → 0, uniformly for all t, as h → 0. We note that
and by the mean-value theorem of differential calculus we thus obtain
where ξ is between u and t. The function ϕ(k+2)(t) is bounded, say

|ϕ(k+2)(t)| ≤ Ak+2
for all t. It then follows that
so that, for each k, θh(k)(t) → 0 uniformly for all t as h → 0, and the result is established. This consideration shows that the definition of T′ is consistent with the definition more closely resembling that of the derivative of a function.
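The limit just established can be observed in the concrete case T = H: the difference quotient (H(t + h) − H(t))/h is the function equal to 1/h on (−h, 0) and 0 elsewhere, and its evaluations tend to ϕ(0) = δ〈ϕ〉 = H′〈ϕ〉 as h → 0 (the testing function below is an illustrative choice):

```python
import math

phi = lambda t: (2.0 + math.sin(t)) * math.exp(-t * t)   # phi(0) = 2

def quotient(step, lo=-5.0, hi=5.0, m=100000):
    # <(H(t + step) - H(t)) / step, phi>; the quotient is 1/step on (-step, 0)
    w = (hi - lo) / m
    total = 0.0
    for i in range(m):
        t = lo + (i + 0.5) * w
        if -step < t < 0.0:
            total += phi(t) / step * w
    return total

fine   = quotient(0.001)   # close to phi(0) = delta<phi>
coarse = quotient(0.5)     # an average of phi over a wider interval
```

The evaluation is simply the average of ϕ over (−h, 0), so the limit h → 0 produces ϕ(0), consistent with H′ = δ.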
1.15 Further Properties of Distributions
Let us consider a distribution depending on a parameter u, and let us denote such a distribution by Tu. Here u may be a real or complex parameter, or a parameter of a more general kind. The theory of convergence of distributions makes it possible to speak of continuous dependence of Tu on u, of the partial derivative of Tu with respect to u, of integrating Tu with respect to u, and so on. We shall take these developments for granted without enlarging upon them here.
The concept of distributions can easily be extended to functions of several variables, together with the notions of equality, convergence of distributions in several variables, partial derivatives of such distributions with respect to these variables, and so on. In particular, mixed partial derivatives of distributions are always independent of the order of differentiation, since mixed partial derivatives of testing functions have this property.
It will be sufficient to make a few remarks on distributions in two variables, s and t, based on testing functions ϕ(s,t). In this case the notation of generalized functions, R(s,t), S(s), T(t), is especially useful in that it indicates whether we are considering distributions in one or the other variable or in both variables. Similarly, for the testing functions we shall write ρ(s,t), σ(s), τ(t). Partial derivatives may then be defined by
We shall say that the distribution R(s,t) is independent of s, or depends only on t, if there exists a fixed distribution T(t) such that R = T on all open sets of the st plane, or, alternatively, if
whenever σ is a testing function of s and τ is a testing function of t (and hence στ is a testing function of s and t).
It can easily be proved that a distribution R is independent of s if and only if ∂R/∂s = 0. In a similar manner we may speak of distributions depending only on s + t, say, and prove that a distribution depends only on s + t if and only if ∂R/∂s − ∂R/∂t = 0. These concepts have applications in the theory of partial differential equations. For example, if f and g are two locally integrable functions of one variable, then f(s + t) and g(s − t) generate distributions in the two variables s and t. The first of these distributions depends only on s + t, the second only on s − t, and both satisfy the differential equation
so that their sum is also a solution of this differential equation in the sense of distributions. Conversely, it can be shown that any distribution that satisfies this equation is a sum of a distribution depending only on s + t and one depending only on s − t. Thus, the general distribution solution of the one-dimensional wave equation has been obtained, and we are in a position to handle discontinuous wave motions (see Sec. 1.3).
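A non-smooth solution of this kind can be tested weakly. Take f(x) = |x|, a continuous but non-differentiable profile, so that u(s,t) = f(s + t) is not a classical solution; the sketch below pairs u with the wave operator applied to a rapidly decreasing ϕ (a Gaussian stand-in for a testing function, an assumption, with ϕ_ss − ϕ_tt computed analytically):

```python
import math

# u(s,t) = |s + t|.  For phi(s,t) = exp(-s^2 - 2 t^2):
#   phi_ss - phi_tt = (4 s^2 - 16 t^2 + 2) * phi
def pairing(weight, m=400, lo=-5.0, hi=5.0):
    h = (hi - lo) / m
    total = 0.0
    for i in range(m):
        s = lo + (i + 0.5) * h
        for j in range(m):
            t = lo + (j + 0.5) * h
            total += abs(s + t) * weight(s, t) * math.exp(-s * s - 2.0 * t * t) * h * h
    return total

wave = pairing(lambda s, t: 4.0 * s * s - 16.0 * t * t + 2.0)  # <u, phi_ss - phi_tt>
mass = pairing(lambda s, t: 1.0)                               # <u, phi>, for contrast
```

The pairing with ϕ_ss − ϕ_tt is (numerically) zero while the pairing with ϕ itself is not, which is the distribution-sense statement that u solves the wave equation.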
Now two distributions S(s) and T(t), each in a single variable, determine a distribution
R(s,t) = S(s)T(t)
in two variables, which is known as the direct product of S and T. It may be defined by
R〈στ〉 = S〈σ〉T〈τ〉
where σ is a testing function of s and τ a testing function of t. It may also be shown that, for any testing function ϕ = ϕ(s,t) and any fixed s (which makes ϕ a testing function of t), T(t)〈ϕ(s,t)〉 exists; as s varies, T(t)〈ϕ(s,t)〉 may be shown to be a testing function of s, so that S〈T(t)〈ϕ(s,t)〉〉 exists and may be used as a definition of R〈ϕ〉.
These concepts may be used to define the convolution of two distributions. To motivate this definition, consider two functions f and g of a single variable and define their convolution h = f * g by
Now let ϕ be a testing function in one variable. At least formally,
and if we set u = s + t, we get
Now let S and T be two distributions in a single variable and let S(s)T(t) be their direct product in the sense explained above. We then define the convolution S * T as that distribution of a single variable for which
S * T〈ϕ〉 = S(s)T(t)〈ϕ(s + t)〉
Already in the case of locally integrable functions f and g a difficulty arises in that h need not be locally integrable (indeed the integral defining h need not exist), so that Eq. (1.15) does not really define a distribution. The corresponding difficulty in the definition of S * T shows up in the form that ϕ(s + t) is constant along the lines s + t = const; it certainly does not vanish at infinity (unless it vanishes identically), and hence it is not a testing function. The situation can be saved if suitable restrictions are placed on S and T, for instance if the support of one of these distributions, say that of T, is contained in a finite interval. In this case it can be shown that T(t)〈ϕ(s + t)〉 is a testing function of s, so that S(s)〈T(t)〈ϕ(s + t)〉〉 is a valid definition of S * T〈ϕ〉.
EXAMPLE 1.16. To show that for any distribution T(t), T * δ exists and is equal to T.
Here
δ(t)〈ϕ(s + t)〉 = ϕ(s)
is clearly a testing function, and
T(s)〈δ(t)〈ϕ(s + t)〉〉 = T〈ϕ〉
Similarly, it can be proved that
T * δ(k) = T(k),   k = 1, 2, …
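The classical counterpart of T * δ = T can be observed numerically by convolving an ordinary function with a narrow unit-area kernel approximating δ. A sketch in Python (the kernel width, the function F, and the sample points are arbitrary illustrative choices):

```python
# Illustrative check: convolving with a narrow approximation to delta
# nearly reproduces the original function, mirroring T * delta = T.
import math

def quad(func, a=-10.0, b=10.0, n=4000):
    h = (b - a) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) * h

def delta_eps(t, eps=1e-2):
    # Gaussian of unit area concentrating at t = 0 as eps -> 0
    return math.exp(-t * t / (2 * eps * eps)) / (eps * math.sqrt(2 * math.pi))

F = lambda t: math.sin(t) + 0.5 * math.cos(2 * t)   # generates the distribution T

# (F * delta_eps)(x) = ∫ F(x - t) delta_eps(t) dt should be close to F(x)
err = max(abs(quad(lambda t: F(x - t) * delta_eps(t)) - F(x))
          for x in (-1.0, 0.0, 0.7, 2.0))
print(err < 1e-3)
```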
The convolution S * T is a continuous function of each of its two factors.
We have seen in Example 1.14 that the delta distribution can be represented as a distribution limit of functions. It can be shown that this is true of every distribution and, moreover, that the approximating functions may be chosen as infinitely differentiable. If in the exercise following Example 1.14 we take f as a testing function α, it is seen that there are sequences of testing functions αn such that αn → δ as n → ∞. We then consider the distributions generated by the αn, denoting them again by αn, and with these distributions we consider the convolutions T * αn. By the continuity of the convolution, T * αn → T * δ = T. Moreover, the distribution T * αn is equal to the function T(u)〈αn(t − u)〉, and it is not difficult to prove that this function of t is infinitely differentiable. Thus we see that every distribution is the limit, in the sense of distributions, of infinitely differentiable functions.
Every locally integrable function has distribution derivatives of all orders, and some distributions can thus be represented as distribution derivatives of locally integrable functions. Such distributions are known as distributions of finite order, and the least integer r for which T = f(r) for some locally integrable f is called the order of T. In this sense, locally integrable functions are distributions of order zero, the delta distribution is of order one, and so on. Not all distributions are of finite order; for instance, the distribution T defined by
T = Σ δ(n)(t − n), the sum extending over n = 0, 1, 2, …,
is not of finite order. Nevertheless, it can be proved that, given a distribution T and any finite interval I, there exists an integrable function f and an integer r such that T = f(r) on I. Thus, locally—i.e., on every finite interval—distributions are of finite order, but the order may increase indefinitely as the interval I is made to expand.
The results briefly described in the last two paragraphs give us an added understanding of the nature and structure of distributions, and they suggest alternative approaches to the theory of distributions. These will be taken up in Sec. 1.18.
APPLICATIONS AND EXTENSIONS
1.16 Application to Fourier Transforms
In this section, “testing function” will mean a rapidly decreasing infinitely differentiable function, i.e., an element of 𝒮, and “distribution” will mean a distribution of slow growth, i.e., an element of 𝒮′.
For a function f that is absolutely integrable over (−∞, ∞) the Fourier transform is defined by
f̂(t) = ∫eⁱˢᵗf(s) ds    (1.16)
the integral being taken over (−∞, ∞). As a result of the uniform convergence of the infinite integral, f̂ is a continuous function of t, and by the appraisal
|f̂(t)| ≤ ∫|f(s)| ds
f̂ is bounded.
is bounded. For two absolutely integrable functions f and g and their Fourier transforms
and
we have Parseval’s relationship
Since f and g are absolutely integrable and and
are bounded, the infinite integrals on both sides of this equation converge.
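Parseval’s relationship can be checked numerically for concrete integrable functions. The sketch below is an editorial illustration; it assumes the convention f̂(t) = ∫eⁱˢᵗf(s) ds, the convention under which the transform of the constant 1 is 2πδ(t), and the two Gaussians are arbitrary choices.

```python
# Illustrative check of Parseval's relationship
#   ∫ f_hat(t) g(t) dt = ∫ f(t) g_hat(t) dt
# with the convention f_hat(t) = ∫ e^{ist} f(s) ds.
import cmath
import math

def quad(func, a=-8.0, b=8.0, n=600):
    h = (b - a) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) * h

def ft(f, t):
    # Fourier transform by direct quadrature
    return quad(lambda s: cmath.exp(1j * s * t) * f(s))

f = lambda s: math.exp(-s * s)
g = lambda s: math.exp(-0.5 * (s - 1.0) ** 2)

lhs = quad(lambda t: ft(f, t) * g(t))
rhs = quad(lambda t: f(t) * ft(g, t))

print(abs(lhs - rhs) < 1e-6)
```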
For functions that fail to be integrable on (− ∞, ∞), this definition of Fourier transform does not apply. Nevertheless, the infinite integral evaluated in Example 1.15 suggests that the Fourier transform of the function f, for which f(t) = 1 for all t, exists in the distribution sense and is equal to 2πδ(t). We shall show that Fourier transforms can be defined for all distributions of slow growth, and that they are in general again distributions of slow growth. It may be remarked here that Fourier transforms of functions of slow growth were investigated (Ref. 1, Chap. VI) before the development of a general theory of distributions, and it might be noted further that a very elegant and elementary presentation9 of Fourier transforms of distributions of slow growth has recently been given.
Let us start with a testing function ϕ and its Fourier transform
ϕ̂(t) = ∫eⁱˢᵗϕ(s) ds
Formal differentiation of this relationship leads to
ϕ̂(k)(t) = ∫(is)ᵏeⁱˢᵗϕ(s) ds    (1.17)
and formal repeated integrations by parts to
∫eⁱˢᵗϕ(k)(s) ds = (−it)ᵏϕ̂(t)    (1.18)
Since ϕ is rapidly decreasing, repeated differentiation of ϕ̂ may be justified by the uniform convergence of all integrals involved, and it shows that the Fourier transform ϕ̂ is again infinitely differentiable. The rapid decrease of all derivatives of ϕ shows that repeated integrations by parts are legitimate, that the integrated parts vanish, and that Eq. (1.18) holds. By combining Eqs. (1.17) and (1.18), we have
tʲϕ̂(k)(t) = iʲ⁺ᵏ∫eⁱˢᵗ(d/ds)ʲ[sᵏϕ(s)] ds
If P(t) is any polynomial, we can now express P(t)ϕ̂(k)(t) as a Fourier integral involving a sum of polynomials multiplied by derivatives of ϕ, and by estimating this Fourier integral we can show that ϕ̂ is rapidly decreasing. Thus, the Fourier transform of a testing function is again a testing function.
Since ϕ̂ is again absolutely integrable, Fourier’s inversion formula
ϕ(s) = (2π)⁻¹∫e⁻ⁱˢᵗϕ̂(t) dt
applies in this case and shows that 2πϕ(−s) is the Fourier transform of ϕ̂. Thus, every testing function is the Fourier transform of some testing function, so that the Fourier transformation is a one-to-one mapping of the space of testing functions onto itself.
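The stability of the Gaussian under the Fourier transformation illustrates this mapping concretely: a rapidly decreasing testing function goes over into a rapidly decreasing function. A numerical sketch (illustrative; it assumes the convention ϕ̂(t) = ∫eⁱˢᵗϕ(s) ds, under which e^(−s²/2) transforms into √(2π)e^(−t²/2)):

```python
# Illustrative check: the Fourier transform of exp(-s^2/2) is
# sqrt(2*pi) * exp(-t^2/2) under the convention phi_hat(t) = ∫ e^{ist} phi(s) ds.
import cmath
import math

def ft(phi, t, a=-10.0, b=10.0, n=2000):
    # midpoint rule for the Fourier integral; very accurate for Gaussians
    h = (b - a) / n
    return sum(cmath.exp(1j * s * t) * phi(s)
               for s in (a + (i + 0.5) * h for i in range(n))) * h

phi = lambda s: math.exp(-0.5 * s * s)

err = max(abs(ft(phi, t) - math.sqrt(2 * math.pi) * math.exp(-0.5 * t * t))
          for t in (0.0, 1.0, 2.5))
print(err < 1e-6)
```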
This mapping is clearly linear, and it is continuous; that is, ϕ̂n → 0 if and only if ϕn → 0 as n → ∞. Because of the essential symmetry of the relationship between ϕ and ϕ̂, it will be sufficient to indicate the proof going one way. Since ϕn → 0 in the sense of convergence in 𝒮, given any ε > 0 we have
(1 + t²)|ϕn(t)| < ε
for all sufficiently large n, and hence
|ϕ̂n(t)| ≤ ∫|ϕn(s)| ds < ε∫(1 + s²)⁻¹ ds = πε
for all sufficiently large n. This shows that ϕ̂n → 0 uniformly for all t as n → ∞. To show that ϕ̂n → 0 in the sense of convergence in 𝒮, we have to show that for any integer k and any polynomial P(t)
P(t)ϕ̂n(k)(t) → 0
uniformly for all t as n → ∞. Now, it has been pointed out above that P(t)ϕ̂n(k)(t) is the Fourier transform of a finite sum of expressions of the form Q(t)ϕn(m)(t), where Q(t) again denotes a polynomial. Each of these expressions can be made less than ε/(1 + t²) by making n sufficiently large, thus proving as above that P(t)ϕ̂n(k)(t) → 0 uniformly for all t as n → ∞.
We now return to Fourier transforms of locally integrable functions. If f is absolutely integrable, then its Fourier transform is continuous and bounded; hence it is locally integrable and of slow growth. Thus, both f and f̂ can be evaluated on testing functions, and Parseval’s relationship shows that
f̂〈ϕ〉 = f〈ϕ̂〉
This suggests that we define the Fourier transform T̂ of a distribution of slow growth T by
T̂〈ϕ〉 = T〈ϕ̂〉
Clearly, T̂ as thus defined is a linear functional on testing functions. This linear functional is continuous, since ϕn → ϕ entails ϕ̂n → ϕ̂ and consequently
T̂〈ϕn〉 = T〈ϕ̂n〉 → T〈ϕ̂〉 = T̂〈ϕ〉
According to this definition, every distribution of slow growth possesses a Fourier transform that is again a distribution of slow growth. The relationship carrying the inversion formula over to distributions, the Fourier transform of T̂ being 2πT(−t), holds and shows that, conversely, every distribution of slow growth is the Fourier transform of some such distribution. Using the same symbol ˆ for the Fourier transformation of distributions and for the Fourier transformation of functions, we may write the definition given above also as
T̂〈ϕ〉 = T〈ϕ̂〉
The relationships analogous to Eqs. (1.17) and (1.18), namely that the transform of (is)ᵏT(s) is T̂(k)(t) and that the transform of T(k) is (−it)ᵏT̂(t), follow from those equations; their proof is left as an exercise for the reader.
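The assertion quoted earlier from Example 1.15, that the transform of the constant function 1 is 2πδ(t), can be tested numerically under this definition: evaluating the transform of 1 on ϕ amounts to integrating ϕ̂, which Fourier inversion identifies with 2πϕ(0). A Python sketch (illustrative; convention ϕ̂(t) = ∫eⁱˢᵗϕ(s) ds, and the testing function is an arbitrary choice):

```python
# Illustrative check that 1_hat = 2*pi*delta in the distributional sense:
# <1_hat, phi> = <1, phi_hat> = ∫ phi_hat(t) dt, which should equal 2*pi*phi(0).
import cmath
import math

def quad(func, a=-10.0, b=10.0, n=800):
    h = (b - a) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) * h

def ft(phi, t):
    return quad(lambda s: cmath.exp(1j * s * t) * phi(s))

phi = lambda s: math.exp(-(s - 0.4) ** 2)  # arbitrary testing function

# the imaginary part of phi_hat is odd (phi is real), so its integral vanishes
one_hat_phi = quad(lambda t: ft(phi, t).real)
print(abs(one_hat_phi - 2 * math.pi * phi(0)) < 1e-6)
```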
EXAMPLE 1.17. To determine the Fourier transforms of the slowly increasing functions tn, n = 0, 1, ….
The expression
ϕ̂(0) = ∫ϕ(s) ds
gives the value at t = 0 of the Fourier transform, in the sense of Eq. (1.16), of the integrable function ϕ. By Eq. (1.18),
tⁿϕ̂(t) = iⁿ∫eⁱˢᵗϕ(n)(s) ds
and by Fourier’s inversion formula,
∫ϕ̂(t) dt = 2πϕ(0)
so that
(tⁿ)ˆ〈ϕ〉 = ∫tⁿϕ̂(t) dt = iⁿ · 2πϕ(n)(0) = 2π(−i)ⁿδ(n)〈ϕ〉
that is, the Fourier transform of tⁿ is 2π(−i)ⁿδ(n)(t).
EXERCISE
Given two transform formulas, one valid for 0 < ν < 1 and one for 0 < ν < 2, the reader should verify that these formulas hold for all nonintegral values of ν. (See Example 1.11 for the definitions involved here.)
The interested reader will find further material on Fourier transforms in Refs. 9 and 5; the latter reference also gives applications to partial differential equations.
1.17 Application to Differential Equations
In this section, “testing functions” will mean elements of 𝒟 and “distributions” will mean elements of 𝒟′.
Consider a linear ordinary differential equation
x(n) + a1x(n−1) + ⋯ + anx = f    (1.19)
in which we assume the ai, i = 1, …, n, to be infinitely differentiable functions of t and f to be a locally integrable function of t. It is known from classical analysis that this differential equation possesses an infinity of solutions all of which are n − 1 times continuously differentiable, with x(n−1) absolutely continuous, so that x(n) exists and is locally integrable. It is also known that suitable initial conditions, for instance the values of x, x′, …, x(n−1) at a fixed point t0, determine a unique solution.
We may now regard Eq. (1.19) as a differential equation in distributions. If x is any distribution, x(k) is again a distribution; since an−k is infinitely differentiable, an−kx(k) is defined so that x may be substituted in the left-hand side of Eq. (1.19). In this sense, every function that satisfies the differential equation in the classical sense is also a distribution solution of that equation. In this context it is natural to ask whether Eq. (1.19) has distribution solutions that are not included among the classical solutions either because x, although a function, lacks the appropriate differentiability properties or because x itself is a distribution. The answer to this question is an emphatic no: As long as f is locally integrable, the classical solutions are the only distribution solutions of Eq. (1.19).
To prove this, consider Eq. (1.19) on a fixed closed finite interval. On this interval, the distribution x(n) is of a fixed order r; that is, it can be represented as the rth distribution derivative of some locally integrable function. If r ≥ 1, then x(n−1), x(n−2), … are of order r − 1 at most, and so is
f − a1x(n−1) − ⋯ − anx
But this last expression, being equal to x(n), is of order r; this contradiction shows that r cannot be ≥ 1 and hence must be 0. Thus, x(n) is of order zero on every finite interval and hence is a locally integrable function.
The situation is different if we now consider the same differential equation with a right-hand side f that is itself a distribution. In this case, every solution is necessarily a distribution solution. If f is of order r on an interval, then the argument of the preceding paragraph shows that x(n) is also of order r; if r ≥ n, then x itself is of order r − n; and if r < n, then x is n − r − 1 times continuously differentiable and x(n−r−1) is absolutely continuous so that x(n−r) is locally integrable. The difference of two solutions satisfies the homogeneous equation
x(n) + a1x(n−1) + ⋯ + anx = 0
and is a solution in the classical sense, so that the general distribution solution of Eq. (1.19) is the sum of a particular distribution solution and of the classical general solution of the homogeneous equation.
Such considerations are relevant with regard to Green’s functions (see, for instance, Ref. 4, Chap. 3) that satisfy differential equations of the form
x(n) + a1x(n−1) + ⋯ + anx = δ(t − τ)
and also satisfy appropriate boundary conditions. Since δ(t − τ) is of order zero and infinitely differentiable on any interval not including τ, and δ(t − τ) is of order 1 on any interval including that point, it follows that, under our assumptions on a1, …, an, the Green’s function is infinitely differentiable except at t = τ; at this point, it possesses n − 2 continuous derivatives while x(n−1) has a unit jump.
Similar statements hold for the differential equation
a0x(n) + a1x(n−1) + ⋯ + anx = f
in which a0, …, an are infinitely differentiable functions, as long as a0(t) ≠ 0. Zeros of a0(t) are singularities of the differential equation, and at such singularities a different behavior may arise.
As an example, we shall consider the differential equation of the first order (see Ref. 6, Sec. 8)
tx′ + x = 0
The classical solution x = ct⁻¹, where c is an arbitrary constant, exists on (0, ∞) and on (−∞, 0) but fails to be integrable in any neighborhood of t = 0. The distribution suggested by the classical solution is c[log |t|]′, where the prime indicates a distribution derivative. To prove that this is indeed a solution, we evaluate
(tx′ + x)〈ϕ〉 = x′〈tϕ〉 + x〈ϕ〉 = x〈ϕ − (tϕ)′〉 = −x〈tϕ′〉 = c log |t|〈(tϕ′)′〉
and obtain, through integration by parts,
c∫log |t| (tϕ′(t))′ dt = −c∫ϕ′(t) dt = 0
So far, the only complication caused by the singularity was the necessity of replacing t⁻¹ by a distribution that, although equal to t⁻¹ for all t ≠ 0, is not generated by it. But more is to come. The integrated form of the differential equation is (tx)′ = 0, or tx = c, and it can be verified that c[log |t|]′ satisfies this equation. However, the solution of the equation is not unique, for
tδ〈ϕ〉 = δ〈tϕ〉 = 0
for all testing functions ϕ, showing that tδ = 0. Thus, c[log |t|]′ + aδ(t) satisfies tx′ + x = 0 for arbitrary values of a and c, so that this differential equation of the first order possesses a two-parameter family of distribution solutions.
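The identity tδ = 0, and with it the extra one-parameter family aδ(t), can be seen numerically by replacing δ with narrow unit-area kernels δ_ε: the action of tδ_ε on a testing function shrinks with ε, while that of δ_ε itself tends to ϕ(0). (A Python sketch; kernel and testing function are arbitrary illustrative choices.)

```python
# Illustrative check: <t * delta_eps, phi> -> 0 as eps -> 0, while
# <delta_eps, phi> -> phi(0); this is the numerical shadow of t*delta = 0.
import math

def quad(func, a=-1.0, b=1.0, n=40000):
    h = (b - a) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) * h

def delta_eps(t, eps):
    # unit-area Gaussian concentrating at the origin
    return math.exp(-t * t / (2 * eps * eps)) / (eps * math.sqrt(2 * math.pi))

phi = lambda t: math.exp(-t * t) * (1.0 + t)   # arbitrary testing function

vals = [abs(quad(lambda t: t * delta_eps(t, eps) * phi(t))) for eps in (0.1, 0.01)]
d_val = quad(lambda t: delta_eps(t, 0.01) * phi(t))

print(vals[1] < vals[0], abs(d_val - phi(0)) < 1e-3)
```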
Distributions are even more important in the theory of partial differential equations, but this topic is not within the scope of the present consideration.
1.18 Extensions and Alternative Theories
Different classes of generalized functions may be obtained by using different classes of testing functions. For instance, by using infinitely differentiable testing functions of a fixed period, one obtains generalizations of periodic integrable functions; by using k times differentiable testing functions, one obtains k times (rather than infinitely) differentiable distribution functions, etc. Distributions have also been defined on finite intervals, on arbitrary regions of n-dimensional space, on surfaces, and more generally on manifolds. For all these variants and extensions, the reader may consult Refs. 21 and 5. Vector-valued distributions and distributions in more abstract situations have also been studied, but these lie outside the scope of the present introduction. We mention only Ref. 15, which envisages applications to quantum mechanics.
In this chapter, we have chosen to follow Schwartz in presenting distributions as functionals on classes of testing functions. There are several alternative theories more or less equivalent to the one outlined here. The largest single group of these is inspired by the theorem that states that every distribution is the distribution limit of some sequence of functions (see Sec. 1.15) and indicates that it might be possible to construct distributions as generalized limits of functions.
Temple23 attributes a very general method of defining generalized functions as “weak limits” of ordinary functions to Mikusiński.11 This method was taken up by Ravetz;17 from the point of view of applied mathematics, it was treated by Temple24,25 and Saltzer;18 and from the point of view of Fourier analysis, it was discussed by Lighthill.9 A somewhat more direct approach is inspired by a combination of approximation to distributions through functions and the identification of distributions on finite intervals with generalized derivatives of functions; this was the viewpoint adopted by Sikorski22 and Korevaar.8 The latter author defines distributions by sequences of integrable functions that are convergent in the generalized sense that on any given finite interval the sequence may be made uniformly convergent through a suitable number of integrations, the number of integrations depending on the interval.
A rather different approach is based on the identification of distributions, on finite intervals, with generalized derivatives of locally integrable functions. Here distributions are represented by formal series
Σn Dⁿfn
in which Dⁿ is the symbol of generalized differentiation of order n and the fn are locally integrable functions of which all but a finite number vanish identically on any given finite interval. This approach was briefly indicated by Halperin6 and elaborated by König.7 It was adopted by Sauer,19 who used it in the solution of boundary-value problems.
Some of the major difficulties encountered in studying the mathematical theory of distributions are due to the circumstance that no norm exists for distributions, since none exists for the vector space of testing functions. There are several attempts at overcoming this difficulty, if necessary by restricting attention to a smaller class of generalized functions, and by basing the theory of distributions on the comparatively well-known and accessible theory of Banach spaces. We mention in this connection the work of E. R. Love10 and an unpublished theory of M. Riesz.
REFERENCES
1. Bochner, S., “Vorlesungen über Fouriersche Integrale,” Akademie-Verlag G.m.b.H., Berlin, 1932.
2. Dirac, P. A. M., “The Principles of Quantum Mechanics,” 2d ed., Oxford University Press, New York, 1935.
3. Erdélyi, A., “Operational Calculus,” Mathematics Department, California Institute of Technology, Pasadena, Calif., 1955.
4. Friedman, B., “Principles and Techniques of Applied Mathematics,” John Wiley & Sons, Inc., New York, 1956. “Operational Calculus and Generalized Functions,” California Institute of Technology, Pasadena, Calif., 1959.
5. Gelfand, I. M., and G. E. Šilov, Fourier Transforms of Rapidly Increasing Functions and Questions of Uniqueness of the Solution of Cauchy’s Problem, Uspehi Mat. Nauk, n.s., vol. 8, no. 6 (58), pp. 3–54, 1953; Amer. Math. Soc. Transl., ser. 2, vol. 5, pp. 221–274, 1957.
6. Halperin, I., “Introduction to the Theory of Distributions,” University of Toronto Press, Toronto, Canada, 1952. Based on lectures by Laurent Schwartz.
7. König, H., Neue Begründung der Theorie der “Distributionen” von L. Schwartz, Math. Nachr., vol. 9, pp. 129–148, 1953.
8. Korevaar, J., Distributions Defined from the Point of View of Applied Mathematics, Nederl. Akad. Wetensch. Proc., ser. A, vol. 58, pp. 368–389, 483–503, 663–674, 1955.
9. Lighthill, M. J., “Introduction to Fourier Analysis and Generalized Functions,” Cambridge University Press, New York, 1958.
10. Love, E. R., A Banach Space of Distributions, J. London Math. Soc., vol. 32, pp. 483–498, 1957; vol. 33, pp. 288–306, 1958.
11. Mikusiński, J. G., Sur la méthode de généralisation de M. Laurent Schwartz et sur la convergence faible, Fund. Math., vol. 35, pp. 235–239, 1948.
12. ——, “Rachunek Operatorów” [The Calculus of Operators], Monografie Matematyczne, Tom XXX, Polskie Towarzystwo Matematyczne, Warszawa, 1953. “Operational Calculus,” Pergamon Press, Inc., New York, 1959.
13. ——, Le Calcul opérationnel d’intervalle fini, Studia Math., vol. 15, pp. 225–251, 1956.
14. —— and C. Ryll-Nardzewski, Un théorème sur le produit de composition des fonctions de plusieurs variables, Studia Math., vol. 13, pp. 62–68, 1953.
15. Nikodým, O. M., Summation of Quasi-vectors on Boolean Tribes and Its Applications to Quantum Theories. I. Mathematically Precise Theory of the Genuine P. A. M. Dirac’s Delta Function, Rend. Sem. Mat., Univ. Padova (in preparation).
16. Pol, Balth. van der, and H. Bremmer, “Operational Calculus, Based on the Two-sided Laplace Integral,” Cambridge University Press, New York, 1950.
17. Ravetz, J. R., Distributions Defined as Limits, Proc. Cambridge Phil. Soc., vol. 53, pp. 76–92, 1957.
18. Saltzer, C., The Theory of Distributions, Advances in Appl. Mech., vol. 5, pp. 91–110, 1958.
19. Sauer, R., “Anfangswertprobleme bei partiellen Differentialgleichungen,” 2d ed., Springer-Verlag OHG, Berlin, 1958.
20. Schmieden, C., and D. Laugwitz, Eine Erweiterung des Infinitesimalkalküls, Math. Z., vol. 69, pp. 1–39, 1958.
21. Schwartz, L., “Théorie des distributions,” 2 vols., Hermann & Cie, Paris, 1950, 1951.
22. Sikorski, R., A Definition of the Notion of Distribution, Bull. Acad. Polon. Sci., Cl. III, vol. 2, pp. 209–211, 1954.
23. Temple, G., Theories and Applications of Generalized Functions, J. London Math. Soc., vol. 28, pp. 134–148, 1953.
24. ——, La Théorie de la convergence généralisée et les fonctions généralisées et leur application à la physique mathématique, Rend. Mat. e Appl., ser. 5, vol. 11, pp. 113–122, 1953.
25. ——, The Theory of Generalized Functions, Proc. Roy. Soc. London, ser. A, vol. 228, pp. 175–190, 1955.
26. Weston, J. D., An Extension of the Laplace-transform Calculus, Rend. Circ. Mat. Palermo, ser. 2, vol. 6, pp. 1–9, 1957.
27. ——, Operational Calculus and Generalized Functions, Proc. Roy. Soc. London, ser. A, vol. 250, pp. 460–471, 1959.