
From Delta Functions to Distributions

ARTHUR ERDÉLYI

PROFESSOR OF MATHEMATICS
CALIFORNIA INSTITUTE OF TECHNOLOGY

1.1    Introduction

In mathematical physics, one often encounters “impulsive” forces acting for a short time only. A unit impulse would be described by a function p(t) that vanishes outside a short interval and is such that

∫_{−∞}^{∞} p(t) dt = 1

It is convenient to idealize such forces as “instantaneous” and to attempt to describe them by a function δ(t) that vanishes except for a single value of t which we take to be t = 0, is undefined for t = 0, and for which

δ(t) = 0   for t ≠ 0,   ∫_{−∞}^{∞} δ(t) dt = 1

Such a function, one convinces oneself, should possess the “sifting property”

∫_{−∞}^{∞} δ(t)ϕ(t) dt = ϕ(0)   (1.1)

for every continuous function ϕ, and the corresponding property (obtained by integration by parts)

∫_{−∞}^{∞} δ^{(k)}(t)ϕ(t) dt = (−1)^k ϕ^{(k)}(0)

for every k times continuously differentiable function ϕ.

Unfortunately, it can be proved that no function, in the sense of the mathematical definition of this term, possesses the sifting property. Nevertheless, “impulse functions” postulated to have these or other similar properties are being used with great success in applied mathematics and mathematical physics.

The use of such improper functions can be defended as a kind of short-hand, or else as a heuristic means; it can also be justified by an appropriate mathematical theory. In Sec. 1.2 we shall indicate briefly some theories that can be employed to justify the use of the delta function. In order to provide a theoretical framework accommodating the great variety of improper functions occurring in contemporary investigations of partial differential equations, it seems necessary to widen the traditional concept of a mathematical function. The new concept, that of a “generalized function,” is abstract and cannot reproduce all aspects of the older concept of a function. In particular, it is not possible to ascribe a definite value to a generalized function at a point. Nevertheless, we shall see that in some sense such generalized functions can be described. In particular, it makes perfectly good sense to say that δ(t), which is a generalized function, vanishes on any open interval not containing t = 0.

In this chapter we shall outline two theories of generalized functions. One, essentially algebraic in nature, is restricted to generalized functions on a half line; the other, more closely related to functional analysis, places fewer restrictions on the independent variable. We shall also mention briefly other theories of generalized functions.

DELTA FUNCTIONS AND OTHER GENERALIZED FUNCTIONS

1.2    The Delta Function

Since the delta function is the idealization of functions that vanish outside a short interval, it is plausible to try to approximate the delta function by such functions. Let s(t) be a function on (− ∞, ∞) satisfying the following conditions:

(a) s(t) ≥ 0 for all t
(b) s(t) = 0 for |t| ≥ 1
(c) ∫_{−∞}^{∞} s(t) dt = 1

Then the function

sn (t) = ns(nt)

satisfies the conditions a and c and vanishes outside (−1/n, 1/n), and it may be regarded as “approaching” the delta function as n → ∞. Indeed, it can easily be proved that

∫_{−∞}^{∞} sn(t)ϕ(t) dt → ϕ(0)   as n → ∞   (1.3)

for any continuous function ϕ, or even for any function ϕ that is integrable over some interval containing 0 and is continuous at 0. Furthermore,

∫_{−∞}^{∞} sn(t − τ)ϕ(τ) dτ → ϕ(t)   as n → ∞

uniformly in t over any finite interval, provided ϕ(t) is continuous over some larger interval. If s(t) is k times continuously differentiable, we also have

∫_{−∞}^{∞} sn^{(k)}(t)ϕ(t) dt → (−1)^k ϕ^{(k)}(0)   as n → ∞

As a matter of fact, it is not necessary that s have the property b. If s has the properties a and c, then the condition (1.3) will hold for all functions ϕ that are bounded and continuous for − ∞ < t < ∞. Some examples of such approximations to the delta function that have been used by the great analysts of the last century are the following:

image

For a history of the delta function, see Ref. 16, Chap. V.
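The limit (1.3) is easy to check numerically. In the sketch below the triangular bump is one assumed choice of s satisfying the conditions a to c, and the integrals against ϕ(t) = cos t approach ϕ(0) = 1 as n grows.

```python
import numpy as np

def trapz(y, x):
    # Trapezoidal rule, kept local so the sketch is self-contained.
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def s(t):
    # Triangular bump: s(t) >= 0, s(t) = 0 for |t| >= 1, total integral 1.
    return np.maximum(0.0, 1.0 - np.abs(t))

def sift(n, phi, m=20001):
    # integral of s_n(t) phi(t) dt over (-1/n, 1/n), with s_n(t) = n s(n t)
    t = np.linspace(-1.0 / n, 1.0 / n, m)
    return trapz(n * s(n * t) * phi(t), t)

vals = [sift(n, np.cos) for n in (1, 10, 100)]
```

With this particular bump the integral can also be computed in closed form, 2n²(1 − cos (1/n)), which makes the O(n⁻²) convergence explicit.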

It may be remarked that we clearly have

∫_{−∞}^{t} sn(τ) dτ → U(t)   as n → ∞   (t ≠ 0)

showing that in some sense δ(t) is the derivative of the unit function

U(t) = 0   for t < 0,   U(t) = 1   for t > 0

and indicating some connection between the theory of the delta function and that of generalized differentiation of discontinuous functions.

An entirely different kind of theory of the delta function was adumbrated by Heaviside (see Chaps. 2 and 3 and also Ref. 16, page 65) and more clearly pinpointed by Dirac (Ref. 2, pages 71 to 77); it was not carried out in detail, however, until more recently. According to this theory, the delta function is defined by its action on continuous functions, this action being given by the sifting property (1.1) or (1.3); any analytical operation that, acting on a continuous function ϕ, produces ϕ(0) is then a representation of the delta function.

We have seen that it is impossible to construct such an analytical operation in the form of a Riemann (or Lebesgue) integral; but it is possible to express it as a Stieltjes integral. Indeed,

∫_{−∞}^{∞} ϕ(t) dU(t) = ϕ(0)

for all continuous functions ϕ. If U were differentiable, we should have

∫_{−∞}^{∞} ϕ(t) dU(t) = ∫_{−∞}^{∞} ϕ(t)U′(t) dt

so that here too the delta function appears as a generalized derivative of the unit function.

The two theories are not as far from each other as they might at first appear to be. Although the delta function cannot be expressed as an integral operation, it can be approximated by such operations, namely, the integral operations defined by means of the sn. Indeed, this is exactly the burden of Eq. (1.3).

1.3    Other Generalized Functions

We have indicated theories of the delta function, the basic impulse function appropriate to functions on the line − ∞ < t < ∞. Clearly there are corresponding basic impulse functions on an arbitrary finite or infinite interval; functions of two variables in a plane, where an impulse function may be concentrated at a point, along a curve, or on a more general set of points; functions of several variables; functions of a point on a curved surface or, more generally, on a manifold; and so on. While it should be possible to devise an appropriate theory for each of these impulse functions, it is clearly preferable to seek a general theory embracing all of them.

Other generalized functions occur in connection with Fourier analysis, the modern theory of partial differential equations, etc., and one should like these subjects to be included in any useful theory of generalized functions. We shall give a simple example to indicate the application of generalized functions to partial differential equations.

This example concerns the hyperbolic partial differential wave equation

uxx − uyy = 0

Clearly f(x − y) + g(x + y) is a solution of this equation if f and g are twice continuously differentiable functions. Now, in many problems—e.g., problems of discontinuous wave motion—one should like to regard f(x − y) + g(x + y) as a solution of the wave equation even if f or g fails to be twice continuously differentiable. There are several ways of considering such “weak” or “generalized” solutions. From the point of view adopted here, the most natural approach is that of a generalized theory of differentiation, according to which every function has generalized derivatives that are generalized functions and satisfy the partial differential equation.
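For twice continuously differentiable f and g this can be confirmed symbolically; the particular profiles below are arbitrary smooth samples, not taken from the text.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(-x**2)          # arbitrary twice-differentiable profile
g = sp.sin(3 * x)          # another arbitrary profile
u = f.subs(x, x - y) + g.subs(x, x + y)
# u_xx - u_yy should vanish identically
residual = sp.diff(u, x, 2) - sp.diff(u, y, 2)
```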

We shall outline in this chapter two theories of generalized functions. The first of these is algebraic in nature; indeed, it closely imitates the widening of the concept of number from integers to rational numbers. It is most successful with functions of a single nonnegative variable, although it has been extended to functions of several such variables and to functions of a single variable on a finite interval. It has the further advantage of providing a very natural approach to operational calculus as well as to generalized functions and generalized differentiation. Its greatest drawback at present seems to be its inability to cope with functions of unrestricted real variables or with functions of several variables ranging over an arbitrary region.

The second theory belongs more to the domain of functional analysis. In a sense, it might be compared with the extension of the concept of number from rational to real numbers, but the comparison is somewhat farfetched. The principal advantage of this theory is its ability to cope with all generalized functions needed at present. There are several approaches to this theory; we shall outline one of them in the simplest case of functions of a single real variable and briefly mention some others. The considerable number of different approaches to this theory is partly due to an endeavor to remove a basic difficulty remaining in it—the difficulty in defining the product of two generalized functions—and largely due to a desire to make this concept of generalized functions more easily accessible to applied mathematicians and engineers.

An entirely different attempt to cope with the problem of the delta function may be mentioned here. Schmieden and Laugwitz20 have enlarged the concept of real numbers. Their system of numbers contains infinitesimally small and infinitely large numbers, and the analysis based on this system leads to functions, in the mathematical sense of this word, that behave like the delta function. Moreover, in this system the multiplication of two functions presents no difficulties.

MIKUSIŃSKI’S THEORY OF OPERATIONAL CALCULUS AND GENERALIZED FUNCTIONS

1.4    The Definition of Operators

In Secs. 1.4 to 1.10, which are based largely on Ref. 12, t is a nonnegative variable, f = {f(t)} denotes a function of this variable, f(t) is the value of f at t (and hence a number), 𝒩 is the set of all (real or complex) numbers, 𝒞 the set of all continuous functions of t, small Greek letters will denote scalars (numbers), a ∈ 𝒞 will indicate that a is an element of 𝒞, with similar notation for other sets, Θ will tentatively denote the function vanishing identically (later we shall see that we may replace this notation by 0), and l the function having a value equal to unity for every t ≥ 0 [except for the value 1 at t = 0, this is the restriction of U(t) to t ≥ 0]. The sifting property of the delta function appropriate to the interval 0 ≤ t ≤ ∞ may be expressed as

∫_0^∞ δ(t)ϕ(t) dt = ϕ(0)

In 𝒞 addition and multiplication by a scalar are defined in the obvious way, namely, for a and b in 𝒞 and α and β in 𝒩, αa + βb is the function whose value at t is αa(t) + βb(t). These operations have the familiar properties. The convolution a * b, or simply ab, of two functions is defined by Duhamel’s integral

(ab)(t) = ∫_0^t a(t − τ)b(τ) dτ

This operation has all the properties of multiplication, and it commutes with multiplication by a scalar; that is, ab = ba, a(bc) = (ab)c and hence may be written as abc, (a + b)c = ac + bc, (αa)b = α(ab), etc.

The set 𝒞 with addition and multiplication by scalars forms a vector space. The same set with addition and convolution forms a commutative ring, which will be called the convolution ring.

Integral operations can be expressed in terms of convolutions. In fact, la is the function having value at t equal to

∫_0^t a(τ) dτ

We also have

ll = l2 = {t}

and by induction,

l^n = {t^{n−1}/(n − 1)!}

for n = 1, 2, 3, …, so that convolution with the latter function expresses the effect of n successive integrations with fixed limit 0. More generally, for any complex α with Re α > 0, where Re α denotes the real part of α, we may set

l^α = {t^{α−1}/Γ(α)}

and call convolution with lα fractional integration of order α.
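These facts lend themselves to a direct numerical check. The sketch below discretizes the Duhamel integral on [0, 1] and verifies that ll = {t} and that l{cos t} = {sin t}; grid size and tolerances are arbitrary choices.

```python
import numpy as np

def conv(a, b, T=1.0, m=2001):
    # Duhamel integral (a b)(t) = integral_0^t a(t - tau) b(tau) d tau,
    # evaluated on a uniform grid by the trapezoidal rule.
    t = np.linspace(0.0, T, m)
    h = t[1] - t[0]
    out = np.zeros(m)
    for i in range(1, m):
        y = a(t[i] - t[:i + 1]) * b(t[:i + 1])
        out[i] = np.sum(y[1:] + y[:-1]) * h / 2.0
    return t, out

one = lambda t: np.ones_like(t)   # the function l
t, l2 = conv(one, one)            # should reproduce {t}
t, lc = conv(one, np.cos)         # integration of cos: should reproduce {sin t}
```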

The convolution ring has no unit element; i.e., there is no u ∈ 𝒞 such that au = a for all a ∈ 𝒞. To see this, it is sufficient to note that lu = l means

∫_0^t u(τ) dτ = 1

for all t ≥ 0, which is clearly impossible. This means that the delta function appropriate to this case is certainly not a continuous function; actually, it is not any function.

A very important property of the convolution ring is contained in Titchmarsh’s theorem: For a, b ∈ 𝒞 we have ab = Θ if and only if a = Θ or b = Θ (or both these equations hold). An elementary proof of this theorem was given by Mikusiński in Ref. 12, Chap. 2, and is reproduced in Ref. 3, Sec. 2.1. Because of this property of 𝒞, division is a meaningful, although not always a feasible, operation in 𝒞. The equations bu = a, bv = a imply b(u − v) = Θ, and if b ≠ Θ, then it follows that u = v; that is, the convolution equation bu = a has, with b ≠ Θ, at most one solution u. This solution may then be regarded as a/b. But, of course, bu = a may have no solution in 𝒞. Clearly, (bu)(0) = 0, and hence bu = a will certainly have no solution if a(0) ≠ 0; the equation may fail to have solutions in other cases as well.

The situation encountered here is very similar to that met upon the introduction of division of integers. There the feasibility of division (with the exception of division by zero) is ensured by the extension of the number concept from integers to rational numbers, and similarly here we shall ensure the existence of a unique solution of bu = a with b ≠ Θ by enlarging the convolution ring to a field of convolution quotients. Almost any construction of rational numbers from integers can be imitated; we shall follow the construction in terms of classes of equivalent ordered pairs of integers.

We shall consider ordered pairs (a,b) of elements of 𝒞, always assuming that the second element ≠ Θ. We call (a,b) and (c,d) equivalent if and only if ad = bc, denote by a/b the class of all ordered pairs equivalent to (a,b), call a/b a convolution quotient, and denote by 𝒬 the set of all convolution quotients. Clearly the cancellation law (ac)/(bc) = a/b holds in 𝒬.

The elements of 𝒬 are abstract entities of which it is difficult to form a definite picture. It is, however, possible to point out that in a sense 𝒬 contains numbers, functions, and also the delta function and its derivatives. Thus, convolution quotients may be regarded as generalized functions, and they include the more common impulse functions on the half line t ≥ 0. We embed 𝒞 in 𝒬 by identifying a ∈ 𝒞 with the convolution quotient (ab)/b for any b ≠ Θ. Since

(ab)c = (ac)b   that is, (ab)/b = (ac)/c

this embedding is independent of b. We shall call a function f integrable if it is absolutely integrable over every finite interval 0 ≤ t ≤ t0. For such a function f, and any b ∈ 𝒞, the convolution fb is defined and is a continuous function. It is thus natural to identify f with the convolution quotient (fb)/b for any b ≠ Θ. As in the previous case, the embedding is independent of b. Lastly, we embed 𝒩 in 𝒬 by identifying α ∈ 𝒩 with the convolution quotient (αb)/b, an embedding that is independent of b. Now b/b, the image of the number 1, is the unit in 𝒬, and we shall see later that it acts as the delta function. Two integrable functions that differ only at a finite number of points (or, more generally, on a set of measure zero) will give the same convolution integral and hence will correspond to the same convolution quotient, thus being indistinguishable in 𝒬. This already shows that it is impossible to ascribe definite “values” to convolution quotients at a point t.

We now define the operations of addition, multiplication by a scalar, and (convolution) multiplication in 𝒬 by the equations

a/b + c/d = (ad + bc)/(bd)      α(a/b) = (αa)/b      (a/b)(c/d) = (ac)/(bd)

It is necessary to verify that these definitions are independent of the ordered pair used in the representation of the convolution quotients involved. Now, if a/b = a′/b′ and c/d = c′/d′, that is, if ab′ = a′b and cd′ = c′d, then

(ad + bc)b′d′ = (ab′)dd′ + (cd′)bb′ = (a′b)dd′ + (c′d)bb′ = (a′d′ + b′c′)bd

Hence the definition of addition is meaningful as an operation on convolution quotients. Similarly, the other two definitions can be proved meaningful, and the operations can be shown to have all the usual properties. Moreover, it is easy to verify that the embedding of 𝒞 and 𝒩 in 𝒬 preserves all these operations. For this reason, we may write f in place of (fb)/b and α in place of (αb)/b; in particular, we may write 1 in place of b/b. We also see that multiplication by 1 (which may be interpreted either as multiplication by the scalar 1 or as convolution multiplication by the convolution quotient corresponding to this number) reproduces f; hence we have the identification of the convolution quotient 1 with the delta function. Furthermore, the scalar 0 and the function Θ are mapped into the same convolution quotient, so that from now on we may write 0 indiscriminately for any of these three entities, which are conceptually entirely different yet operationally indistinguishable in 𝒬.

The set 𝒬 of convolution quotients is an algebra; i.e., it is a vector space under addition and multiplication by scalars and a field under addition and convolution multiplication. The set 𝒬 is closed under all these operations, and it is also closed under convolution division with the single exception of division by 0, which is prohibited. The rules of ordinary algebra hold in 𝒬.

From now on, elements of 𝒬 will be denoted by single letters, and in case of doubt it will be indicated whether an entity belongs to 𝒞 or to 𝒬.

The elements of 𝒬 will primarily be considered here as generalized functions, but we shall see in the next section that they may act as operators, thus providing a convenient approach to Heaviside’s operational calculus.

1.5    Differential and Integral Operators

We have seen that convolution multiplication of a function by l means integration of that function. It is a plausible conjecture that multiplication by the inverse element in 𝒬, that is to say by

s = 1/l = b/(lb)

with b ∈ 𝒞, b ≠ 0, means differentiation. We shall see that this is not quite the case.

Let

a = {a(t)}

be a differentiable function and

a′ = {a′(t)}

an integrable function. Then

a = a(0)l + la′

or la′ = a − a(0)l. On multiplying by s, we obtain

sa = {a′(t)} + a(0)   (1.4)

and thus see that, even in the case of a differentiable function a, the product sa represents the derivative function only if a vanishes at t = 0 (Sec. 2.1). This may be explained to some extent by interpreting all our functions as vanishing for t < 0, so that there is a jump of a(0) at t = 0 that contributes a(0)δ(t) to the derivative. This also explains why, even for a differentiable function a, the product sa is in general not a function; further, it shows some of the difficulties encountered in the early applications of Heaviside’s operational calculus. By applying Eq. (1.4) several times, we obtain by induction

s^n a = {a^{(n)}(t)} + a(0)s^{n−1} + a′(0)s^{n−2} + ⋯ + a^{(n−1)}(0)

for a function a that is n times differentiable and has an integrable nth derivative a(n). For such a function, sna is in general a generalized function. On the other hand, sna exists as a convolution quotient for any (not necessarily differentiable) function a, or any convolution quotient. The generalized function sna may be called the extended or generalized nth derivative of a.

Since 1 corresponds to δ(t), sn is the extended nth derivative of the delta function, and a polynomial in s with constant, i.e., scalar, coefficients is an impulse function.

Next we investigate simple rational functions of s. From Eq. (1.4) we have

1/(s − α) = {e^{αt}}

From this, it can be proved by induction that

1/(s − α)^n = {t^{n−1}e^{αt}/(n − 1)!}   (1.5)

We are now ready to interpret any rational function of s. Such a function can be decomposed into a sum of a polynomial and partial fractions of the form (1.5), and every term of this decomposition can then be interpreted.

EXAMPLE 1.1.   To decompose the rational function s3/(s2 + 1).

Since

s³/(s² + 1) = s − s/(s² + 1) = s − (1/2)[1/(s − j) + 1/(s + j)]

where j2 = −1, we have

s³/(s² + 1) = s − (1/2){e^{jt} + e^{−jt}} = s − {cos t}
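The decomposition in Example 1.1 can be reproduced mechanically with a computer-algebra partial-fraction expansion (a sketch using SymPy, which keeps the expansion over the reals):

```python
import sympy as sp

s = sp.symbols('s')
# Split s**3/(s**2 + 1) into its polynomial part and a proper fraction.
decomp = sp.apart(s**3 / (s**2 + 1), s)
```

The proper fraction s/(s² + 1) is then interpreted as {cos t}.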

The operational calculus so developed may be used to solve ordinary linear differential equations with constant coefficients, and also to solve systems of such equations. It will be sufficient to illustrate the process by a simple example.

EXAMPLE 1.2.   To solve the differential equation

image

By applying Eq. (1.4) twice, we have

image

and hence

image

Now,

image

Hence we have the solution

image

Such equations can be solved even if the right-hand sides are generalized functions, for instance, delta functions, as with the differential equation satisfied by the Green’s function. For instance, the solution of

image

obtained in a manner similar to that for the above Example 1.2, is

image

We note that Eq. (1.5) is in agreement with the Laplace transform of e^{αt}t^{n−1}/(n − 1)! in case s denotes a complex variable. We shall see in Example 1.8 that this is not a coincidence. Thus, tables of Laplace transforms may be used in interpreting rational functions of s.
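As a spot check of this correspondence, the pair in Eq. (1.5) can be fed to a computer-algebra Laplace transform; n = 3 and α = 2 below are arbitrary sample values.

```python
import sympy as sp

s, t = sp.symbols('s t', positive=True)
n, alpha = 3, 2
# The function {t**(n-1) exp(alpha t)/(n-1)!} from Eq. (1.5)
f = t**(n - 1) * sp.exp(alpha * t) / sp.factorial(n - 1)
F = sp.laplace_transform(f, t, s, noconds=True)   # expect 1/(s - alpha)**n
```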

1.6    Limits of Convolution Quotients

It is fairly clear that in 𝒞 we should use a notion of convergence of continuous functions under which the limit of a convergent sequence of such functions is again continuous. Uniform convergence on every finite interval offers itself as the simplest notion of convergence that preserves continuity. It is much less clear how convergence of a sequence of convolution quotients should be defined, and no simple notion of convergence in 𝒬 that has all the desirable properties is known. We shall follow Mikusiński in introducing a notion of convergence that is at any rate simple, has many of the desirable features of convergence, and appears to be adequate for the applications of convolution quotients to operational calculus and to partial differential equations. According to this concept of convergence, a sequence of convolution quotients is regarded as convergent if it has a common denominator and if the numerators, which are continuous functions, are convergent in the sense outlined above. Thus, we shall say that a sequence of convolution quotients an converges to a, in symbols

an → a   or   lim an = a

if there is a q ≠ 0 such that, for each n, qan ∈ 𝒞, and if furthermore the sequence of continuous functions qan converges to qa uniformly over every finite interval 0 ≤ t ≤ t0. Clearly a itself is then a convolution quotient.

It is fairly easy to prove that the limit, if it exists, is unique and has most of the usual properties. In particular, the sequence a, a, a, … converges to a; the sum (product) of convergent sequences is convergent and tends to the sum (product) of the limits; and a sequence of scalars is convergent in the ordinary sense if and only if the corresponding sequence of convolution quotients converges in the sense outlined here. However,

lim (an/bn) = (lim an)/(lim bn)

does not necessarily hold even if it is assumed that bn ≠ 0 and lim bn ≠ 0.

We shall now give some examples and comment on them.

EXAMPLE 1.3.   To prove that {sin nt} → 0.

We have

l{sin nt} = {(1 − cos nt)/n}

and the latter function converges to 0 uniformly (in this case over the entire nonnegative axis). This example shows that convergence in 𝒬, even for ordinary functions, demands much less than ordinary convergence. It thus allows us to ascribe limits to sequences of functions that would ordinarily be regarded as divergent, and it also opens the way to a representation of some convolution quotients as limits, in this sense, of ordinary functions. (See also Example 1.5.)
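A quick numerical look at l{sin nt} = {(1 − cos nt)/n} shows the uniform bound 2/n in action; the grid below is an arbitrary choice.

```python
import numpy as np

t = np.linspace(0.0, 100.0, 200001)
sup = []
for n in (1, 10, 100):
    primitive = (1.0 - np.cos(n * t)) / n        # l {sin nt}
    sup.append(float(np.max(np.abs(primitive)))) # uniformly <= 2/n
```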

EXAMPLE 1.4.   To prove that for c ∈ 𝒞, cn → 0 as n → ∞.

For a fixed t0, there exists an M > 0 such that |c(t)| ≤ M for 0 ≤ tt0.

We shall prove by induction that, for n = 1, 2, …,

|c^n(t)| ≤ M^n t^{n−1}/(n − 1)!   for 0 ≤ t ≤ t0   (1.6)

This relationship clearly holds when n = 1. If it holds for a given n, then

|c^{n+1}(t)| ≤ ∫_0^t |c(t − τ)| |c^n(τ)| dτ ≤ M^{n+1} ∫_0^t τ^{n−1}/(n − 1)! dτ = M^{n+1}t^n/n!

and this completes the proof by induction of the inequality (1.6). Since the right-hand side of (1.6) converges to 0 uniformly for 0 ≤ t ≤ t0, we have cn → 0 in the sense of convergence in 𝒬.
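For the constant function c = {M} the bound (1.6) is attained with equality, cⁿ = {Mⁿt^{n−1}/(n − 1)!}, which makes it a convenient symbolic test case (M = 3 is an arbitrary sample value):

```python
import sympy as sp

t, tau = sp.symbols('t tau', nonnegative=True)
M = sp.Integer(3)

def conv(a, b):
    # Duhamel product (a b)(t) = integral_0^t a(t - tau) b(tau) d tau
    return sp.integrate(a.subs(t, t - tau) * b.subs(t, tau), (tau, 0, t))

power = M              # c = {M}, the constant function
for _ in range(3):
    power = conv(power, M)   # convolution powers c**2, c**3, c**4
```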

EXAMPLE 1.5.   To prove that if f(t) is absolutely integrable over 0 ≤ t ≤ ∞ and

∫_0^∞ f(t) dt = 1

then

{nf(nt)} → 1   as n → ∞

We set

A = ∫_0^∞ |f(t)| dt

Now,

l²{nf(nt)} = {∫_0^t (t − τ)nf(nτ) dτ}

and we shall prove that this function approaches {t} = l2 uniformly over every finite interval. Set

gn(t) = ∫_0^t (t − τ)nf(nτ) dτ − t

First assume that 0 ≤ tδ. Then

image

For δtt0,

image

In either case, for any δ between 0 and t0,

image

Given t0 > 0 and ε > 0, we first choose δ so that 0 < δ < ε/(2A) and then choose N so that

image

We then have |gn(t)| < ε for 0 ≤ t ≤ t0 and n ≥ N, showing that gn(t) converges to 0 uniformly for 0 ≤ t ≤ t0. Thus, l2{nf(nt)} converges to l2 uniformly on every finite interval.

We have accordingly found a family of approximations to the delta function. This should be compared with the approximations discussed in Sec. 1.2. The result suggests that many other convolution quotients might be represented as limits of functions.
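A numerical sketch of this family with the assumed choice f(t) = e⁻ᵗ (integrable over [0, ∞) with integral 1): acting on ϕ = cos, the integrals have the exact values n²/(n² + 1) and tend to ϕ(0) = 1.

```python
import numpy as np

def trapz(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def act(n, phi, T=60.0, m=600001):
    # integral_0^infinity n f(n t) phi(t) dt with f(t) = exp(-t)
    t = np.linspace(0.0, T, m)
    return trapz(n * np.exp(-n * t) * phi(t), t)

vals = [act(n, np.cos) for n in (1, 10, 100)]
```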

1.7    Operator Functions

We shall now consider convolution quotients that depend on parameters. For the sake of simplicity, we shall consider a single parameter x varying over a closed and bounded interval I: α ≤ x ≤ β, and we shall denote the domain α ≤ x ≤ β, t ≥ 0 of the xt plane by D. An operator function a(x) assigns to each x ∈ I a convolution quotient a(x). Mikusiński calls such a function a parametric function if each a(x) ∈ 𝒞, so that a(x) = {a(x,t)}, and considers a(x) as a continuous operator function if there exists a q ≠ 0 in 𝒬 such that b(x) = qa(x) is a parametric function and b(x,t) is continuous in D; he says a(x) is k times continuously differentiable with respect to x if there exists a q ≠ 0 in 𝒬 such that b(x) = qa(x) is a parametric function that is k times continuously differentiable with respect to x; and he sets

a^{(k)}(x) = {∂^k b(x,t)/∂x^k}/q

Continuous and differentiable operator functions have many of the usual properties, and differentiation obeys the familiar rules. It is unnecessary for us to go into further details here. Instead of this, let us consider some examples.

EXAMPLE 1.6.   To discuss the function a(x) = {cos (x − t)}.

This is a continuous parametric function for any interval I. By virtue of the results in Example 1.2, this operator function can be expressed as

a(x) = {cos t} cos x + {sin t} sin x

The function is indefinitely differentiable with respect to x, and the reader may easily verify that its derivatives can be computed by differentiating the explicit form. The function a(x) satisfies the operator differential equation

a″(x) + a(x) = 0

and the initial conditions

a(0) = {cos t}   a′(0) = {sin t}

EXAMPLE 1.7.   To discuss the function hα(x), defined as follows: For x ≥ 0 let

h(x,t) = 0 if 0 ≤ t < x,   h(x,t) = 1 if x ≤ t

and set

h(x) = {h(x,t)},   hα(x) = lαh(x)

If Re α > 0, then hα(x) is a parametric function, and

hα(x,t) = 0 for 0 ≤ t < x,   hα(x,t) = (t − x)^α/Γ(α + 1) for x ≤ t

Thus hα(x) is a continuous parametric function if Re α > 0, and it is k times continuously differentiable if Re α > k; further,

hα^{(k)}(x) = (−1)^k h_{α−k}(x)

if Re α > k. Since

lβhα(x) = hα+β(x)

it follows that hα(x) is infinitely differentiable, in the sense of differentiation of operator functions, with respect to x; and

h′α(x) = −h_{α−1}(x) = −shα(x)

for all α. This differential equation, together with

hα(0) = lαh(0) = lα+1

suggests writing

hα(x) = lα+1e−sx

We shall justify this later.

Of particular importance is

h−1(x) = e−sx

For f ∈ 𝒞, we set

h(x)f = {g(x,t)}

and find

g(x,t) = 0   for 0 ≤ t < x,   g(x,t) = ∫_0^{t−x} f(τ) dτ   for x ≤ t

Consequently, for

h−1(x)f = sg(x) = g1(x)

we have

g1(x,t) = 0   for 0 ≤ t < x,   g1(x,t) = f(t − x)   for x ≤ t

Thus, g1(x, t) is simply the function f(t) shifted by x, and e−sx is the shift operator.
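On the region t ≥ x the two steps above (convolve with h(x), then multiply by s, i.e., differentiate) can be carried out symbolically; f = sin t is an arbitrary sample.

```python
import sympy as sp

t, tau, x = sp.symbols('t tau x', positive=True)
f = sp.sin(t)
# g(x, t) = integral_0^{t-x} f(tau) d tau, valid for t >= x
g = sp.integrate(f.subs(t, tau), (tau, 0, t - x))
g1 = sp.diff(g, t)   # multiplying by s differentiates, since g(x, 0) = 0
```

Here g1 comes out as sin(t − x), the shifted function.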

By a direct computation, it can be verified that

h(x)h(y) = lh(x + y)

for x ≥ 0, y ≥ 0, and this relationship holds for all real x, y, provided that we define h(x) for negative values of x by the equation

h(x)h(−x) = l2

We have thus defined hα(x) for all complex α and for all real x. In particular,

h−1(x)h−1(−x) = 1

We now turn to integration of operator functions with respect to x. Let ϕ(x) be absolutely integrable over I = [α,β], and let a(x) be a continuous operator function and q ≠ 0 such that qa(x) = b(x) is a continuous parametric function. We then set

∫_α^β ϕ(x)a(x) dx = {∫_α^β ϕ(x)b(x,t) dx}/q

It can be proved that this definition is independent of q and that the integral has all the usual properties. Infinite integrals may then be defined as limits of finite integrals.

EXAMPLE 1.8.   To prove that for any integrable function f, the integral ∫_0^∞ e−sxf(x) dx exists and is equal to f.

This result is the background for the coincidence noted at the end of Sec. 1.5. It should be remarked, though, that here s is an operator, so that the integral is not a Laplace integral; also, it might be noted that the result to be proved holds without any restriction on the growth of f(x). Since

le−sx = h(x)

we have

l ∫_0^β e−sxf(x) dx = ∫_0^β h(x)f(x) dx = {∫_0^{min(β,t)} f(x) dx}

As β → ∞, the last function approaches lf uniformly over every finite interval 0 ≤ tt0. Thus

lim_{β→∞} l ∫_0^β e−sxf(x) dx

exists and is equal to lf.

1.8    Exponential Functions

Suppose that, for a certain w ∈ 𝒬, there exists an interval I containing x = 0 and a differentiable operator function e(x) on I that satisfies the differential equation

e′(x) = we(x)

and the initial condition

e(0) = 1

We then say that w is a logarithm and set

e(x) = exw

It is fairly easy to prove that in this case e(x) exists for all real x, is infinitely differentiable, is uniquely defined by the differential equation and the initial conditions, and has the properties

exw ≠ 0 for all x,   (exw)−1 = e−xw,   exweyw = e(x+y)w

EXERCISE

Verify that s is a logarithm and that (see Example 1.7)

e−xs = h−1(x)

Some but not all elements of 𝒬 are logarithms. The element s is a logarithm, and so are real multiples of s, but it can be proved that js is not a logarithm. All elements of 𝒞 are logarithms, and

e^{xw} = Σ_{n=0}^{∞} x^n w^n/n!

The series representation holds also for other w; thus it holds for integrable (rather than continuous) functions, or for w = 1. But it does not hold for all logarithms; for instance, the series fails to converge for w = s, which is a logarithm. If u and v are logarithms, then αu + βv is a logarithm for real, but not necessarily for complex, α and β.

Exponential functions arise in the solution of partial differential equations, in which they often correspond to fundamental solutions.

EXAMPLE 1.9.   To prove that s½ is a logarithm.

Let us set

Q(x) = {Q(x,t)},   R(x) = {R(x,t)}

where

Q(x,t) = (x/(2√(πt³))) e^{−x²/(4t)}      R(x,t) = (1/√(πt)) e^{−x²/(4t)}

Clearly, Q(x) = −R′(x). Now the function

R(x)

although a parametric function, fails to be continuous; but the function

lR(x) = {∫_0^t R(x,τ) dτ}

is continuously differentiable with respect to x when x > 0, and

(lR(x))′ = −lQ(x)

On the other hand, we have

lQ(x) = {∫_0^t Q(x,τ) dτ}

and upon introducing a new variable of integration v by

v = x/(2√τ)

we obtain

lQ(x) = {(2/√π) ∫_{x/(2√t)}^{∞} e^{−v²} dv} = {1 − erf (x/(2√t))}

so that

Q′(x) = −s½Q(x)

Moreover, l2Q(x) approaches {t} = l2 uniformly in every interval 0 ≤ tt0, and hence

Q(x) → 1   as x → 0+,   so that Q(x) = exp (−xs½)

In the course of this work we have also seen that

exp (−xs½) = Q(x) = {(x/(2√(πt³))) e^{−x²/(4t)}}

and

l exp (−xs½) = {1 − erf (x/(2√t))}

where erf denotes the error function. For fixed x > 0, Q(x,t) as a function of t increases for 0 < t < x2/6 and decreases thereafter. Thus,

0 ≤ Q(x,t) ≤ Q(x, x²/6) = (3√6/√π) e^{−3/2} x^{−2}

Since this expression approaches zero, uniformly in t, as x → ∞, we see that exp (−xs½) → 0 as x → ∞. It follows from this that, if for some a ∈ 𝒬, a exp (xs½) is a bounded continuous parametric function for x ≥ 0, then a = 0; and if for a ∈ 𝒬 and b ∈ 𝒬, the function

a exp (xs½) + b exp (−xs½)

is a bounded continuous parametric function for all real x, then a = b = 0.
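Two quantitative facts about Q used above, the location of the maximum at t = x²/6 and the value of ∫_0^t Q(x,τ) dτ = 1 − erf (x/(2√t)), can be checked numerically, assuming the standard form Q(x,t) = x e^{−x²/(4t)}/(2√(πt³)):

```python
import math
import numpy as np

def Q(x, t):
    # assumed form: Q(x, t) = x exp(-x**2 / (4 t)) / (2 sqrt(pi t**3))
    return x * math.exp(-x * x / (4.0 * t)) / (2.0 * math.sqrt(math.pi) * t**1.5)

x, T = 1.0, 2.0
tau = np.linspace(1e-8, T, 400001)
q = np.array([Q(x, u) for u in tau])
h = tau[1] - tau[0]
integral = float(np.sum(q[1:] + q[:-1]) * h / 2.0)   # integral_0^T Q(x, tau) d tau
closed = 1.0 - math.erf(x / (2.0 * math.sqrt(T)))
t_peak = float(tau[np.argmax(q)])                    # should be near x**2 / 6
```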

1.9    The Diffusion Equation

Let us briefly indicate the application of the technique developed here to the diffusion equation

uxx(x,t) = ut(x,t)

in the half plane − ∞ < x < ∞, 0 ≤ t < ∞ (subscripts indicate partial derivatives). If

u(x) = {u(x,t)}

is a parametric function possessing continuous partial derivatives with respect to x and t, and a continuous second partial derivative with respect to x, our partial differential equation may be replaced by the operator differential equation

u″(x) − su(x) = −ϕ(x)   (1.7)

where ϕ(x) = u(x,0).

If ϕ(x) is an integrable function, this differential equation may be solved by the method of variation of parameters. Two solutions differ by an operator function of the form

a exp (xs½) + b exp (−xs½)

where a ∈ 𝒬 and b ∈ 𝒬; and, by the remark at the end of the preceding section, a solution that is a bounded continuous parametric function is unique within the class of such functions. It may be verified that

u(x) = (1/2)s^{−½} ∫_{−∞}^{∞} exp (−|x − ξ|s½) ϕ(ξ) dξ

is a bounded continuous parametric solution of Eq. (1.7) if ϕ is a bounded measurable function. In this sense, the function

u(x,t) = (4πt)^{−½} ∫_{−∞}^{∞} exp (−(x − ξ)²/(4t)) ϕ(ξ) dξ

may be regarded as a (unique) generalized solution of our boundary-value problem if ϕ(x) = u(x,0) is a bounded measurable function. Actually, this solution is differentiable, indeed analytic, for t > 0 and for all x; and although it is not a continuous function for t ≥ 0, it satisfies the “initial condition” in the generalized sense that

u(x,t) → ϕ(x)   as t → 0, t > 0

at least for all those x at which ϕ is continuous. By a more refined analysis, this result can be extended to measurable functions that, instead of being bounded, are assumed to satisfy an inequality

|ϕ(x)| ≤ A exp (B|x|α)

where A, B, and α are constants and 0 ≤ α < 2.
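Numerically, the boundary-value behavior can be sketched with the classical heat-kernel form of the bounded solution; the initial function ϕ(ξ) = 1/(1 + ξ²), a bounded continuous illustrative choice not taken from the text, is reproduced by u(x,t) as t → 0:

```python
import math

def phi(xi):
    # a bounded continuous (in fact smooth) initial function
    return 1.0 / (1.0 + xi * xi)

def heat_solution(x, t, steps=1200):
    # u(x,t) = (4*pi*t)^(-1/2) * Integral exp(-(x-xi)^2/(4t)) phi(xi) dxi,
    # evaluated after the substitution xi = x + sqrt(4t) z:
    # u(x,t) = pi^(-1/2) * Integral exp(-z^2) phi(x + sqrt(4t) z) dz
    a, b = -6.0, 6.0                      # exp(-36) is negligible beyond this
    h = (b - a) / steps
    total = 0.0
    for j in range(steps):
        z = a + (j + 0.5) * h             # midpoint rule
        total += math.exp(-z * z) * phi(x + math.sqrt(4.0 * t) * z)
    return total * h / math.sqrt(math.pi)

# u(x,t) -> phi(x) as t -> 0 at every point of continuity of phi
for t in (1e-1, 1e-2, 1e-4):
    print(t, heat_solution(0.3, t), phi(0.3))
```

The substitution ξ = x + (4t)½z turns the kernel into a fixed Gaussian weight, so one quadrature grid serves for every t.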

Other problems involving parabolic equations, and problems involving the wave equation and other hyperbolic equations, can be solved by means of this operational calculus, but so far no significant and successful applications to elliptic partial differential equations, such as Laplace’s equation, are known.

1.10    Extensions and Other Theories

Mikusiński14 has extended this theory to functions of several variables, t1, …, tn, ranging over the “cone”

t1 ≥ 0, …, tn ≥ 0

He13 has also developed the corresponding theory for convolution quotients of functions on a finite interval, α ≤ t ≤ β.

An alternative theory has been proposed by J. D. Weston,26,27 whose generalized functions are operators acting on certain “perfect functions” rather than convolution quotients of functions.

DISTRIBUTIONS

1.11    Testing Functions

We now turn to generalized functions of an entirely different type, to distributions.21 There are several different approaches to this theory, most of them resembling the theories of the delta function indicated in Sec. 1.2 in that distributions appear either as generalized limits of functions or else as characterized by their action on certain classes of functions. We shall present here the second point of view for generalized functions of a single real variable t ranging over the entire real line (−∞, ∞). Alternative approaches and extensions will be mentioned in Sec. 1.18.

Since distributions will be defined in terms of their action on certain classes of functions, the resulting notion of generalized functions will depend on the class of testing functions on which distributions act. We shall use two spaces of testing functions: One has proved useful in applications to Fourier analysis, and the other has been employed in connection with partial differential equations. Other classes of testing functions have also been used.5

Let 𝒮 be the set of infinitely differentiable functions decreasing rapidly as t → ± ∞. More precisely, ϕ is in 𝒮 if all derivatives ϕ(k) exist and if for any integer k and any polynomial P(t), P(t)ϕ(k)(t) → 0 as t → ± ∞. The set 𝒮 is a vector space in the sense that, for any two elements ϕ1 and ϕ2 of 𝒮 and any two real or complex numbers c1 and c2, the function c1ϕ1 + c2ϕ2 is defined (in the obvious way) and is again in 𝒮. We shall use 0 indiscriminately to denote the number 0 and the function identically equal to zero for all values of t. We now introduce a notion of convergence in 𝒮. A sequence of functions ϕn in 𝒮 is said to converge to 0 if, for any fixed k and fixed polynomial P(t), P(t)ϕn(k)(t) → 0 uniformly for all real t as n → ∞. A sequence of functions ϕn is said to converge to ϕ if ϕn − ϕ converges to 0. We shall indicate this by writing ϕn → ϕ as n → ∞.

Let cn → c (in the sense of convergence of numbers) and let ϕn → ϕ and θn → θ, in the sense of convergence in 𝒮, as n → ∞; then it is easy to see that also cnϕn → cϕ and ϕn + θn → ϕ + θ, in the sense of convergence in 𝒮, as n → ∞. Thus multiplication by a number and addition of functions are continuous operations. From now on, we shall usually omit the qualifying phrases appearing in the parentheses above, since the nature of the entities involved will indicate which space we are in and which notion of convergence should be used.

Secondly, let 𝒟 be the set of infinitely differentiable functions ϕ vanishing outside some finite interval, the interval depending on ϕ and varying from element to element of 𝒟. There are such functions. As an example, let us define

ϕ(t) = exp [−c(t − a)−α − d(b − t)−β]   for a < t < b,   ϕ(t) = 0 otherwise

where a and b are real numbers, a < b, and c, d, α, and β are positive numbers. Clearly, ϕ vanishes outside a finite interval and is infinitely differentiable except possibly at a and b, and it is easy enough to show that ϕ is also infinitely differentiable at these two points. We now introduce a notion of convergence in 𝒟. A sequence of functions ϕn in 𝒟 is said to converge to 0 if there is a finite interval I such that each ϕn vanishes outside I, and if, for each fixed nonnegative integer k, ϕn(k)(t) → 0 uniformly for all real t (or all t in I) as n → ∞. A sequence of functions ϕn is said to converge to ϕ if ϕn − ϕ converges to 0, and we shall write ϕn − ϕ → 0 and ϕn → ϕ as n → ∞. Clearly, 𝒟 is a vector space, and the operations of multiplication of a function by a number and addition of two functions are continuous. Equally clearly, every element of 𝒟 belongs also to 𝒮, and convergence in 𝒟 implies convergence in 𝒮.
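A numeric sketch of such a testing function, taking a = 0, b = 1 and c = d = α = β = 1 (one possible choice of the constants named in the text):

```python
import math

A, B = 0.0, 1.0   # the interval (a, b)

def bump(t):
    # exp(-c/(t-a)^alpha - d/(b-t)^beta) with c = d = alpha = beta = 1;
    # defined to be 0 outside (a, b), where the exponent would blow up
    if t <= A or t >= B:
        return 0.0
    return math.exp(-1.0 / (t - A) - 1.0 / (B - t))

# vanishes outside a finite interval ...
print(bump(-0.5), bump(1.5))          # both 0
# ... and the one-sided difference quotients at the endpoint tend to 0,
# consistent with infinite differentiability there
for h in (1e-1, 1e-2, 1e-3):
    print(h, bump(A + h) / h)
```

Since exp (−1/h) goes to zero faster than any power of h, every difference quotient at the endpoints vanishes in the limit.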

Elements of 𝒮 or 𝒟 will be called testing functions.

In studying the action of other functions on testing functions, we shall start with continuous functions of t and proceed to functions that are merely locally integrable in the sense that they are integrable over every finite interval. In this context a function of t will be said to be of slow growth if its growth is dominated by that of some polynomial; in other words, a function f is of slow growth if (1 + t2)−Nf(t) is bounded for some N.

1.12    The Definition of Distributions

For continuous functions f of slow growth,

f〈ϕ〉 = ∫−∞∞ f(t)ϕ(t) dt    (1.8)

converges for each ϕ in 𝒮 and defines an “evaluation” of f on all elements of 𝒮. In classical analysis, we think of a function f as characterized by its values f(t) for all real t. We now claim that, alternatively, we can characterize such a function by its evaluations f〈ϕ〉 on all elements of 𝒮. In order to substantiate this claim, we have to show that two continuous functions of slow growth possessing the same evaluations on all elements of 𝒮 also possess the same values for all t and hence are identical. It will be sufficient to show that f〈ϕ〉 = 0 for all ϕ in 𝒮 entails f(t) = 0 for all t. Indeed, suppose f(t0) ≠ 0 for some t0, say f(t0) > 0. Since f is a continuous function, there is some interval I around t0 on which f is positive. Take any interval (a,b) in the interior of I and define

ϕab(t) = exp [−(t − a)−1 − (b − t)−1]   for a < t < b,   ϕab(t) = 0 otherwise

Then clearly f〈ϕab〉 > 0, and accordingly the assumption f(t0) ≠ 0 for some t0 is inconsistent with f〈ϕ〉 = 0 for all ϕ in 𝒮. Incidentally, we see that a continuous function of slow growth is completely characterized by its evaluations on the ϕab for rational a and b, but we shall continue to think of it in terms of its evaluations on all ϕ in 𝒮.

Many discontinuous functions—namely, all locally integrable functions—of slow growth also possess evaluations on 𝒮, and the proof given above shows that at points of continuity the values of such a function are completely determined by the evaluations of the function. On the other hand, the values at points of discontinuity are not at all determined. For instance, the two functions f and g defined by

f(t) = 0 for t ≤ 0,   f(t) = 1 for t > 0
g(t) = 0 for t < 0,   g(t) = 1 for t ≥ 0        (1.9)

clearly have the same evaluations on all ϕ in image, but their values at t = 0 differ. More generally, if N is a null function—that is, N is locally integrable and

∫ab N(t) dt = 0   for all a and b

(or, N vanishes almost everywhere)—then f and f + N will have the same evaluations on all elements of 𝒮. The situation is not unlike the one encountered in connection, say, with Fourier series or Laplace transforms, where f and f + N have the same Fourier series or Laplace transforms, as the case may be, and are thus indistinguishable. It so happens that in many situations the distinction between two functions differing by a null function is unimportant, and in such situations the evaluations of a function on 𝒮 characterize it as far as a characterization is meaningful—i.e., up to a null function. In many problems in applied mathematics, the functions that most naturally arise in the data or in the final results either are continuous or else have simple types of discontinuities, and in the latter case the right-hand and left-hand limits of a function at a discontinuity do matter, while the value of the function itself at the discontinuity is usually artificial and has no physical significance. In such problems, then, f〈ϕ〉 is as satisfactory as f(t) for a characterization of f.
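The evaluation (1.8) is easy to sketch numerically: with the slow-growth function f(t) = t2 and the rapidly decreasing testing function ϕ(t) = exp (−t2) (a Gaussian, an element of 𝒮), the integral converges, and its exact value √π/2 serves as a check on the quadrature:

```python
import math

def f(t):               # a continuous function of slow growth
    return t * t

def phi(t):             # a rapidly decreasing testing function (Gaussian)
    return math.exp(-t * t)

def evaluation(f, phi, a=-10.0, b=10.0, steps=200000):
    # f<phi> = Integral f(t) phi(t) dt  (midpoint rule; the tail beyond
    # |t| = 10 is smaller than exp(-100) and is ignored)
    h = (b - a) / steps
    return sum(f(a + (j + 0.5) * h) * phi(a + (j + 0.5) * h)
               for j in range(steps)) * h

print(evaluation(f, phi))        # compare with sqrt(pi)/2
```

The rapid decrease of ϕ is what pays for the polynomial growth of f; with ϕ merely bounded the integral would diverge.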

The function f(t) indicates a mapping of the real line into the space of real and complex numbers; similarly, f〈ϕ〉 indicates a mapping of 𝒮 into the space of numbers in that it assigns a number, defined by Eq. (1.8), to each element of 𝒮. Now, a mapping of a vector space into a space of numbers is called a functional, and in this sense we may say that the function f is characterized by the functional on 𝒮 that it generates. We shall henceforth use f indiscriminately for either the function or the functional.

The functional f〈ϕ〉 defined by Eq. (1.8) is clearly linear in the sense that, for any two numbers c1 and c2 and any two testing functions ϕ1 and ϕ2, we have

f〈c1ϕ1 + c2ϕ2〉 = c1f〈ϕ1〉 + c2f〈ϕ2〉

We claim that the functional is also continuous in the sense that ϕn → ϕ in 𝒮 entails f〈ϕn〉 → f〈ϕ〉, in the sense of convergence of numbers, as n → ∞. On account of the linearity of the functional, it will be sufficient to show this in the special case ϕ = 0. Now, f being of slow growth, there exist numbers A and N such that

|f(t)| ≤ A(1 + t2)N

Moreover, by the definition of convergence in 𝒮,

(1 + t2)N+1ϕn(t) → 0

uniformly for all real t as n → ∞, and consequently for any positive ε we have

(1 + t2)N+1|ϕn(t)| < ε

for all t and all sufficiently large n. But then

|f〈ϕn〉| ≤ ∫−∞∞ |f(t)| |ϕn(t)| dt ≤ Aε ∫−∞∞ (1 + t2)−1 dt = πAε

for all sufficiently large n, showing that f〈ϕn〉 → 0 as n → ∞.

Thus we see that every locally integrable function of slow growth determines uniquely a continuous linear functional on 𝒮, and conversely such a function is determined up to a null function by its continuous linear functional.

There are many continuous linear functionals on 𝒮 that are not generated by functions. For instance, for all nonnegative integers k, the equations

δ(k)〈ϕ〉 = (−1)kϕ(k)(0)   k = 0, 1, 2, …    (1.10)

assign numbers to each testing function and thus determine functionals on 𝒮 that are easily seen to be linear and continuous. Now, a process similar to the one carried out above for continuous functions shows that if this functional were generated by a function, then that function would have to vanish at all of its points of continuity other than the origin. Thus, it would vanish identically if it were continuous, and a familiar process of approximation of integrable functions by continuous ones shows that, in any event, it would have to be a null function. But a null function fails to generate the functional δ(k), showing that this functional cannot be generated by a locally integrable function. Indeed, comparison with Eqs. (1.1) and (1.2) shows that it is the functional corresponding to the delta function for k = 0 and to its derivatives for k ≥ 1.
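The sifting behavior behind these functionals can be sketched numerically by letting a narrow normalized Gaussian (an illustrative stand-in, not from the text) play the role of δ: its evaluation approaches ϕ(0), and the evaluation of its derivative approaches −ϕ′(0), the familiar values (−1)kϕ(k)(0):

```python
import math

def phi(t):                      # a testing function with phi(0) = 1, phi'(0) = 1
    return math.exp(-t * t) * (1.0 + t)

def g(t, eps):                   # normalized Gaussian approximating delta
    return math.exp(-(t / eps) ** 2) / (eps * math.sqrt(math.pi))

def dg(t, eps):                  # its derivative, approximating delta'
    return -2.0 * t / (eps * eps) * g(t, eps)

def pair(kernel, eps, a=-0.2, b=0.2, steps=100000):
    # Integral kernel(t) phi(t) dt by the midpoint rule
    h = (b - a) / steps
    return sum(kernel(a + (j + 0.5) * h, eps) * phi(a + (j + 0.5) * h)
               for j in range(steps)) * h

eps = 1e-2
print(pair(g, eps))      # approaches  phi(0)  =  1
print(pair(dg, eps))     # approaches -phi'(0) = -1
```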

These considerations suggest that we regard every continuous linear functional on 𝒮 as defining a generalized function. The space of these continuous linear functionals will be denoted by 𝒮′; it clearly contains all locally integrable functions of slow growth, and it also contains the delta function and its derivatives.

In all the foregoing considerations, we could restrict the testing functions to 𝒟 and thus obtain the space 𝒟′ of continuous linear functionals on 𝒟. (Note in this connection that ϕab is in 𝒟.) Every continuous linear functional on 𝒮 is also a continuous linear functional on 𝒟 and hence in 𝒟′, and there are functionals in 𝒟′ that are not in 𝒮′. The integral in Eq. (1.8) converges for any continuous or locally integrable function, not necessarily of slow growth, when ϕ is in 𝒟, thus showing that 𝒟′ generalizes functions of arbitrary growth in the same sense in which 𝒮′ generalizes functions of slow growth.

We shall call the elements both of 𝒮′ and of 𝒟′ distributions; in particular, we shall call δ the delta distribution. If it is necessary to distinguish between them, we shall call the elements of 𝒮′ distributions of slow growth, and those of 𝒟′ we shall call distributions of arbitrary growth or, shortly, distributions. The name “distribution” was introduced by L. Schwartz. It is suggested by a consideration of an integrable function f(t) as the density of a continuous distribution of mass, while the delta distribution corresponds to the unit mass concentrated at t = 0. Although it fails to offer a description of other generalized functions such as δ′ and is not in accordance with the use of the same term in probability theory, the name “distribution” will be retained here as a convenient means for distinguishing generalized functions introduced here from those that appear, say, in Mikusiński’s theory of operators.

1.13    Operations with Distributions

We shall generally denote distributions by capital letters, such as R, S, and T. Two distributions S and T are equal if S〈ϕ〉 = T〈ϕ〉 for all testing functions ϕ. We define multiplication of a distribution T by a number c and addition of two distributions S and T by

(cT)〈ϕ〉 = cTϕ(S + T)〈ϕ〉 = Sϕ〉 + Tϕ

These algebraic operations have the usual properties, and the set of distributions equipped with these operations is a vector space.

The product of two distributions, to correspond to multiplication of the values of two functions, cannot be defined in general. In particular cases, however, such a definition is possible. For instance, let θ(t) be infinitely differentiable for all real t. Then the function θ corresponds to a distribution, which we shall also denote by θ. Given an arbitrary distribution T, we can define the product θT by

(θT)〈ϕ〉 = Tθϕ

noting that θϕ is in 𝒟 if ϕ is in 𝒟. Moreover, if both the function θ and the distribution T are of slow growth, then θT will also be of slow growth.
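The definition can be sketched directly in code by modeling a distribution as a functional, a callable acting on testing functions; with T = δ one gets θδ = θ(0)δ, so that in particular tδ(t) is the zero distribution:

```python
import math

# A distribution is modeled as a functional: a callable taking a testing
# function phi and returning a number.
delta = lambda phi: phi(0.0)                 # delta<phi> = phi(0)

def product(theta, T):
    # (theta T)<phi> = T<theta phi>, theta an infinitely differentiable function
    return lambda phi: T(lambda t: theta(t) * phi(t))

phi = lambda t: math.exp(-t * t)             # a testing function, phi(0) = 1

print(product(math.cos, delta)(phi))         # cos(0) * phi(0) = 1.0
print(product(lambda t: t, delta)(phi))      # t * delta is the zero distribution: 0.0
```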

The convolution of two distributions can also be defined, but its definition requires a consideration of distributions in two variables and will be considered only briefly, in Sec. 1.15.

If we think of distributions as generalized functions, we sometimes write T(t), and in this spirit occasionally write symbolically

T〈ϕ〉 = ∫−∞∞ T(t)ϕ(t) dt

For instance, in this sense we may write

∫−∞∞ δ(t)ϕ(t) dt = ϕ(0)

Distributions do not in general have definite “values at points t.” Nevertheless, a distribution may have “values” on an interval in the following sense. We shall say that the distribution T vanishes on the open interval (a,b) if T〈ϕ〉 = 0 for all testing functions vanishing outside (a,b). In such a case we also write

T(t) = 0   a < t < b

If the distribution T is generated by a locally integrable function f, then T will vanish on (a,b) in the sense of our definition if and only if f vanishes almost everywhere on (a,b).

This definition can be extended to a local comparison of a distribution T with a function f. Let f be integrable on (a,b), and extend f to all values of t, for instance by setting it equal to zero outside (a,b). The function f so extended defines a distribution, which we shall again denote by f. We then say

T(t) = f(t)   a < t < b    (1.11)

or T = f on (a,b) if the distribution Tf vanishes on (a,b) in the sense described above. Note that the vanishing or otherwise of Tf on (a,b) is independent of the manner of extending f.

In this sense, the delta distribution clearly vanishes on any open interval not containing the origin, and this circumstance may be expressed by stating δ(t) = 0 for t ≠ 0, without implying the existence of “values” of δ(t) for individual values of t. For the Heaviside distribution, defined by

H〈ϕ〉 = ∫0∞ ϕ(t) dt

we clearly have H(t) = 0 for t < 0 and H(t) = 1 for t > 0, so that in this sense we can attribute values to H everywhere except at the origin. As a matter of fact, H can be generated by either of the two functions defined in Eqs. (1.9).

The open set consisting of all open intervals on which a distribution T vanishes (i.e., the union of all such intervals) is called the null set of T. The collection of values of t not in the null set (i.e., in the complement of the null set) is called the support of T. The support is the set of points on which T(t) is essentially different from zero; it is a closed set, and it can be characterized either as the smallest closed set outside of which T(t) vanishes or as the collection of points t not contained in any open interval on which T vanishes.

The support of the delta distribution is clearly the single point t = 0, and the support of the Heaviside distribution H is the nonnegative half t ≥ 0 of the real line.

We now come to the differentiation of distributions. If f is continuously differentiable, then f and f′ generate distributions, and we have, by integration by parts,

f′〈ϕ〉 = ∫−∞∞ f′(t)ϕ(t) dt = −∫−∞∞ f(t)ϕ′(t) dt = f〈−ϕ′〉    (1.12)

and similarly for higher derivatives. This suggests the following definition for the derivative T′ of an arbitrary distribution T:

T′〈ϕ〉 = T〈−ϕ′〉    (1.13)

and for the derivatives of higher order,

T(k)〈ϕ〉 = T〈(−1)kϕ(k)〉

In this connection, it should be noted that the derivative of a testing function (rapidly decreasing testing function) is again a testing function (rapidly decreasing testing function).

EXAMPLE 1.10.   To determine the derivative H′ of Heaviside’s distribution H.

According to the definition, we have

H′〈ϕ〉 = H〈−ϕ′〉 = −∫0∞ ϕ′(t) dt = ϕ(0) = δ〈ϕ〉

for all testing functions, and hence H′ = δ. It can be verified similarly that the derivatives of the delta distribution are the distributions defined in Eqs. (1.10).

According to our definition, every distribution is infinitely differentiable, and if the distribution is of slow growth, its derivatives will also be of slow growth. Since every locally integrable function generates a distribution, every such function possesses “distribution derivatives” of arbitrary order, but these derivatives need not be functions; Heaviside’s function, whose distribution derivative is the delta distribution, exemplifies this. If, on an interval (a,b), f possesses a locally integrable derivative, however, then the distribution derivative and the derivative in the usual sense are equal on (a,b). To show this, denote for the moment the distribution derivative of f by T′, so that

T′〈ϕ〉 = f〈−ϕ′〉

For testing functions vanishing outside (a,b), the computation (1.12) remains true and shows that

T′(t) = f′(t)

on (a,b). For this reason, we may write f′ for the distribution derivative without ambiguity. In particular,

H′(t) = δ(t)

is both meaningful and correct in this sense.
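Example 1.10 can be checked numerically: H〈−ϕ′〉 = −∫0∞ ϕ′(t) dt should reproduce ϕ(0) for any testing function ϕ (a Gaussian-times-cosine is used below as an illustrative ϕ):

```python
import math

def phi(t):                       # a rapidly decreasing testing function
    return math.exp(-t * t) * math.cos(t)

def dphi(t):                      # its derivative, computed analytically
    return math.exp(-t * t) * (-2.0 * t * math.cos(t) - math.sin(t))

def H_deriv_pair(steps=200000, b=10.0):
    # H'<phi> = H<-phi'> = -Integral_0^inf phi'(t) dt  (midpoint rule,
    # truncated at t = 10, where the Gaussian tail is negligible)
    h = b / steps
    return -sum(dphi((j + 0.5) * h) for j in range(steps)) * h

print(H_deriv_pair(), phi(0.0))   # the two numbers agree: H' = delta
```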

EXAMPLE 1.11.   To investigate the derivatives of tα, α > −1.

Since the values of this function fail to be real for negative t when α is fractional, we shall first discuss the function

f(t) = tαH(t)

and the distribution T generated by it. If k is a positive integer and αk > −1, then T(k) is everywhere equal to the locally integrable function

fk(t) = α(α − 1) ⋯ (α − k + 1)tα−kH(t)

and this equality holds for all positive integers k if α is a nonnegative integer. If α is not an integer and αk < −1, then T(k) will no longer be a function; nevertheless, the distribution T(k) will be equal to the function fk when t ≠ 0.

The following definition is then suggested. If [tαH(t)] represents the distribution generated by tαH(t) when α > −1, then the formula

[tαH(t)] = [(α + 1)(α + 2) ⋯ (α + k)]−1[tα+kH(t)](k)

in which the superscript (k) denotes the kth distribution derivative, can be proved for α > −1, k = 1, 2, …, and it may be taken as the definition of the left-hand side when α < −1, k is a positive integer, and α + k > −1. It is easy to see that this definition is independent of k and that, with [tβH(t)] so defined, the formula holds for all nonintegral α and all positive integers k.
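Example 1.11 can be spot-checked numerically for α = ½ and k = 1: the distribution derivative T′〈ϕ〉 = −∫0∞ t½ϕ′(t) dt should equal ∫0∞ ½t−½ϕ(t) dt, where the singular integrand is tamed by the substitution t = u2:

```python
import math

def phi(t):                         # testing function
    return math.exp(-t * t)

def dphi(t):                        # its derivative
    return -2.0 * t * math.exp(-t * t)

def midpoint(g, a, b, steps=200000):
    h = (b - a) / steps
    return sum(g(a + (j + 0.5) * h) for j in range(steps)) * h

# distribution derivative of t^(1/2) H(t):
#   T'<phi> = -Integral_0^inf sqrt(t) phi'(t) dt
lhs = -midpoint(lambda t: math.sqrt(t) * dphi(t), 0.0, 8.0)

# classical derivative (1/2) t^(-1/2) H(t), with t = u^2 removing the singularity:
#   Integral_0^inf (1/2) t^(-1/2) phi(t) dt = Integral_0^inf phi(u^2) du
rhs = midpoint(lambda u: phi(u * u), 0.0, 8.0)

print(lhs, rhs)                     # the two evaluations agree
```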

The definition of [tα] for a negative integer α involves logarithms and will not be given here.

In order to extend fractional powers also to negative t, we introduce

image

and obtain the further formulas

image

the discussion of which is left to the reader.

The differentiation of distributions obeys the usual rules. If c is a constant, if θ(t) is an infinitely differentiable function and θ the distribution that it generates, if S and T are distributions, and if θT is the product defined at the beginning of this section, then

(cT)′ = cT′    (S + T)′ = S′ + T′    (θT)′ = θ′T + θT′

It will be sufficient to prove the last of these statements:

(θT)′〈ϕ〉 = (θT)〈−ϕ′〉 = T〈−θϕ′〉 = T〈θ′ϕ − (θϕ)′〉 = T〈θ′ϕ〉 + T′〈θϕ〉 = (θ′T)〈ϕ〉 + (θT′)〈ϕ〉

Finally, we can define primitives or antiderivatives of distributions, corresponding to indefinite integrals of functions. The distribution T1 is called a primitive of T if it satisfies T1′ = T. It is clear from the corresponding situation with regard to functions that a distribution may possess several primitives; we shall presently pursue this question further. Meanwhile, let us prove that every distribution possesses at least one primitive. In order to do this, let us fix a ϕ0 in 𝒟 for which

∫−∞∞ ϕ0(t) dt = 1

For this ϕ0, we shall arbitrarily set T1〈ϕ0〉 = 0. Corresponding to every testing function ϕ, we define a function ϕ1 by the equation

ϕ1(t) = ∫−∞t [ϕ(u) − cϕ0(u)] du   c = ∫−∞∞ ϕ(u) du    (1.14)

It is easy to verify that ϕ1 is again a testing function and that every testing function can be represented in the form ϕ1. We further define a new distribution T1 by

T1ϕ〉 = T〈−ϕ1

Clearly, T1〈ϕ0〉 = 0; and since the function ϕ1 corresponding to the testing function −ϕ′ is −ϕ, we have

T1′〈ϕ〉 = T1〈−ϕ′〉 = T〈ϕ〉

so that T1 is a primitive of T.

It may be noted that in this discussion ϕ may be either in 𝒮 or in 𝒟; ϕ1 will be in the same class of testing functions, so that for distributions of slow growth we have demonstrated the existence of a primitive of slow growth.

A constant c is an infinitely differentiable function and hence generates a distribution, which we shall again denote by c; thus,

c〈ϕ〉 = c ∫−∞∞ ϕ(t) dt

Such a distribution will be called a constant distribution. Clearly, the constant distribution defined by c is equal to c everywhere in the sense of the definition (1.11), and cT may be used unambiguously for either the constant multiple of the distribution T or the product of the two distributions.

The derivative of a constant distribution is zero, since

c′〈ϕ〉 = c〈−ϕ′〉 = −c ∫−∞∞ ϕ′(t) dt = 0

We shall now prove that, conversely, the condition T′ = 0 or, equivalently, T〈ϕ′〉 = 0 for all testing functions entails that T is a constant distribution. To prove this, fix ϕ0, and for every testing function ϕ define the testing function ϕ1, as above. Since T〈ϕ1′〉 = 0, it follows from Eq. (1.14) that

T〈ϕ〉 = T〈ϕ1′ + cϕ0〉 = cT〈ϕ0〉 = T〈ϕ0〉 ∫−∞∞ ϕ(t) dt

Hence T is the constant distribution that is equal to Tϕ0〉 everywhere.

If T1 is a primitive of T and c is any constant distribution, then T1 + c is also a primitive of T, and it follows from the result of the preceding paragraph that any two primitives of a given distribution differ by a constant distribution.

These considerations can be extended to repeated primitives. A distribution generated by a polynomial of degree n being called a polynomial distribution of degree n, it is clear that derivatives and primitives of polynomial distributions are again polynomial distributions and have the appropriate degrees. In particular, the kth derivative of a polynomial distribution of degree k − 1 or less is zero; if Tk is a kth primitive of T and if p is any polynomial distribution of degree k − 1 or less, then Tk + p is also a kth primitive of T; conversely, any two kth primitives of a given distribution differ at most by a polynomial distribution of degree k − 1 or less.

1.14    Convergence of Distributions

We have seen that distributions form a vector space. In this vector space we introduce a notion of convergence, saying that the sequence of distributions Tn converges provided that, for every testing function ϕ, the sequence of numbers Tn〈ϕ〉 converges; further, we say that Tn converges to T, in symbols Tn → T as n → ∞, provided that, for every testing function ϕ, we have Tn〈ϕ〉 → T〈ϕ〉. Let Tn be a convergent sequence of distributions; then, for each ϕ, lim Tn〈ϕ〉 exists and defines a functional T〈ϕ〉. This functional is clearly linear, and it can also be proved to be continuous; thus, a convergent sequence always has a limit.

The notion of convergence defined above is consistent with the vector-space structure in that it makes the addition of two distributions and the multiplication of a distribution by a number continuous operations; that is, if cn → c, Sn → S, and Tn → T as n → ∞, then also cnTn → cT and Sn + Tn → S + T as n → ∞. Moreover, a sequence of constant distributions Tn = cn converges in the sense of distributions if and only if the sequence of numbers cn converges in the sense of convergence of numbers, and the two limits correspond.

Since every locally integrable function determines a distribution, convergence of distributions may be used to define generalized limits of sequences of functions, it being understood that the generalized limit, if it exists, is in general a distribution rather than a function. The connection between ordinary pointwise limits and generalized limits of sequences of functions is by no means simple. Either of these limits may exist when the other does not, and even if both exist, they need not be equal. We shall exemplify these points.

EXAMPLE 1.12.   To show that if fn(t) = sin nt, then fn → 0 in the sense of convergence of distributions.

Since every testing function ϕ is absolutely integrable, we have

fn〈ϕ〉 = ∫−∞∞ ϕ(t) sin nt dt → 0   as n → ∞

by the Riemann-Lebesgue lemma. Note that every derivative of a testing function is again absolutely integrable and that therefore it can be proved through integrations by parts that the sequence of functions na sin nt, for any value of the constant a, also tends to 0 in the sense of convergence of distributions. Note also that, for a fixed t that is not an integral multiple of π, lim fn(t) fails to exist as n → ∞.
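Numerically (a sketch with the compactly supported testing function ϕ(t) = (1 + t) exp [−1/(1 − t2)] on (−1, 1), an illustrative choice): the evaluations fn〈ϕ〉 shrink rapidly as n grows, even though sin nt itself has no pointwise limit:

```python
import math

def phi(t):
    # a testing function in D: (1 + t) times the standard bump on (-1, 1)
    if abs(t) >= 1.0:
        return 0.0
    return (1.0 + t) * math.exp(-1.0 / (1.0 - t * t))

def eval_sin(n, steps=20000):
    # fn<phi> = Integral sin(nt) phi(t) dt over (-1, 1), midpoint rule
    h = 2.0 / steps
    return sum(math.sin(n * (-1.0 + (j + 0.5) * h)) * phi(-1.0 + (j + 0.5) * h)
               for j in range(steps)) * h

for n in (1, 10, 100):
    print(n, eval_sin(n))           # tends to 0 as n grows
```

Two integrations by parts bound the evaluation by a constant times n−2, which is what makes the decay so visible.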

EXAMPLE 1.13.   To prove that the function fn(t), defined by

fn(t) = n2   for 0 < t < 1/n,   fn(t) = 0 otherwise

does not converge in the sense of distributions.

Here, clearly, fn(t) → 0, for every fixed t, as n → ∞, but

fn〈ϕ〉 = n2 ∫01/n ϕ(t) dt

fails to have a limit as n → ∞ if ϕ(0) ≠ 0; thus, we have pointwise limits everywhere and yet no distribution limit.

EXAMPLE 1.14.   To prove that for the function fn(t), defined by

fn(t) = n   for 0 < |t| < 1/2n,   fn(0) = 0,   fn(t) = 0 for |t| ≥ 1/2n

we have fnδ as n → ∞.

Since

fn〈ϕ〉 = n ∫|t|<1/2n ϕ(t) dt → ϕ(0) = δ〈ϕ〉

it follows that fnδ, in the sense of convergence of distributions, as n → ∞. On the other hand, for each fixed t, fn(t) → 0 as n → ∞. Thus both limits exist, though they do not agree.

Note that in the foregoing example the discrepancy was caused by the nonuniformity of the pointwise limit at t = 0; indeed the very existence of the limit at t = 0 was enforced only by an artificial definition of the function fn(t) at that point. Under more stringent conditions, for instance under the condition that fn(t) → f(t) uniformly for all t as n → ∞ or that

∫−∞∞ |fn(t) − f(t)| dt → 0   as n → ∞

it can be proved that the existence of the pointwise limit entails that of the distribution limit, and that the two limits are equal.

EXERCISE

Show that if f(t) is absolutely integrable over (− ∞, ∞), ∫−∞∞ f(t) dt = 1, and fn(t) = nf(nt), then fn → δ as n → ∞. (Use Example 1.2 as a hint.)
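The exercise can be explored numerically (an illustration, not a proof) with f(t) = π−½ exp (−t2), whose total integral is 1; the evaluations of fn(t) = nf(nt) approach ϕ(0):

```python
import math

def f(t):                           # absolutely integrable, total integral 1
    return math.exp(-t * t) / math.sqrt(math.pi)

def phi(t):                         # testing function with phi(0) = 1
    return math.exp(-t * t) * (1.0 + t)

def eval_fn(n, steps=100000, L=8.0):
    # fn<phi> = Integral n f(nt) phi(t) dt; the substitution z = nt gives
    # Integral f(z) phi(z/n) dz, evaluated here over |z| <= 8
    h = 2.0 * L / steps
    return sum(f(-L + (j + 0.5) * h) * phi((-L + (j + 0.5) * h) / n)
               for j in range(steps)) * h

for n in (1, 10, 100):
    print(n, eval_fn(n))            # approaches phi(0) = 1
```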

One of the more remarkable properties of the convergence of distributions is its insensitivity to differentiation. If TnT as n → ∞, then

Tn′〈ϕ〉 = Tn〈−ϕ′〉 → T〈−ϕ′〉 = T′〈ϕ〉

and hence also Tn′ → T′ as n → ∞.

As an application of this property, convergent series of distributions (and likewise infinite integrals of distributions) may always be differentiated term by term. The differentiated series are then always convergent in the sense of distributions. This causes a very wide class of trigonometric series, in particular all Fourier series, to converge in the sense of distributions even though they may fail to converge in the ordinary sense. A trigonometric series such as

∑n=−∞∞ cneint

always converges in the sense of distributions provided that cn = O(nk) for some fixed integer k as n → ∞; for in this case the series

c0tk+2/(k + 2)! + ∑n≠0 (in)−k−2cneint

converges uniformly for all t and its differentiation k + 2 times produces the given series. In particular, for a Fourier series the cn are bounded, and this process may be used with k = 0, so that the distribution sum of a Fourier series is at worst the second distribution derivative of a continuous periodic function.
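The mechanism—pairing each term against ϕ and moving the k + 2 derivatives onto the testing function—can be checked for a single term: two integrations by parts give ∫ cos (nt)ϕ(t) dt = −n−2 ∫ cos (nt)ϕ″(t) dt, and the two sides can be compared numerically (with ϕ″ obtained by central differences):

```python
import math

def bump(t):                         # testing function in D, supported on (-1, 1)
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))

def bump2(t, d=1e-4):                # second derivative by central differences
    return (bump(t + d) - 2.0 * bump(t) + bump(t - d)) / (d * d)

def midpoint(g, a=-1.0, b=1.0, steps=40000):
    h = (b - a) / steps
    return sum(g(a + (j + 0.5) * h) for j in range(steps)) * h

n = 7
direct  = midpoint(lambda t: math.cos(n * t) * bump(t))
shifted = -midpoint(lambda t: math.cos(n * t) * bump2(t)) / (n * n)
print(direct, shifted)               # equal: the derivatives moved onto phi
```

Since −cos (nt)/n2 is exactly the nth term of the twice-integrated series, this identity is the term-by-term content of the statement above.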

In the definition of the relation TnT, the integer-valued parameter n may be replaced by a continuously varying parameter tending to some limit, and we shall take this extension for granted.

The notion of the distribution limit may be used in order to interpret divergent sums and integrals.

EXAMPLE 1.15.   To show that

(2π)−1 ∫−∞∞ eiωt dω

exists and is equal to the delta distribution.

Set

fa(t) = ∫0a cos ωt dω = (sin at)/t

Then

fa〈ϕ〉 = ∫−∞∞ ϕ(t)(sin at)/t dt → πϕ(0)   as a → ∞

by Fourier’s single-integral theorem, and hence fa → πδ as a → ∞. This proves the result and shows that the formula

δ(t) = (2π)−1 ∫−∞∞ eiωt dω

frequently used in applied mathematics is correct in the sense of distribution convergence.
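A numeric sketch of this limit, with fa(t) = (sin at)/t tested against a bump function (the evaluations approach πϕ(0); the slow O(1/a) approach reflects the oscillatory tail of the sine integral):

```python
import math

def bump(t):                          # testing function supported on (-1, 1)
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))

def eval_fa(a, steps=100000):
    # Integral of (sin at)/t * phi(t) over (-1, 1); the midpoint grid
    # never lands exactly on t = 0, so no special case is needed
    h = 2.0 / steps
    total = 0.0
    for j in range(steps):
        t = -1.0 + (j + 0.5) * h
        total += math.sin(a * t) / t * bump(t)
    return total * h

target = math.pi * bump(0.0)          # pi * phi(0)
for a in (10.0, 100.0, 500.0):
    print(a, eval_fa(a), target)
```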

The notion of the distribution limit may also be used to establish

T′(t) = limh→0 h−1[T(t + h) − T(t)]

as a meaningful formula. Here T(t) is the distribution T, and T′(t) is the distribution T′ defined by Eq. (1.13). The interpretation of T(t + h) as a distribution Th is suggested by the formula

∫−∞∞ T(t + h)ϕ(t) dt = ∫−∞∞ T(t)ϕ(t − h) dt

which is certainly valid when T(t) is a function. For any testing function ϕ let us define a new testing function ϕh by

ϕh(t) = ϕ(th)

and then define, for any distribution T, a new distribution Th by

Thϕ〉 = Tϕh

We then wish to prove that

h−1(Th − T) → T′   as h → 0

Now,

h−1(Th − T)〈ϕ〉 = T〈h−1(ϕh − ϕ)〉   and   T′〈ϕ〉 = T〈−ϕ′〉

and thus we have to show that

θh(t) = h−1[ϕ(t − h) − ϕ(t)] + ϕ′(t) → 0   as h → 0

in the sense of convergence in 𝒟. Now, if ϕ is in 𝒟, then, for all values of h such that |h| ≤ h0, θh vanishes outside a finite interval that is independent of h, and we need to show that for each nonnegative integer k we have θh(k)(t) → 0, uniformly for all t, as h → 0. We note that

θh(k)(t) = h−1[ϕ(k)(t − h) − ϕ(k)(t)] + ϕ(k+1)(t) = h−1 ∫t−ht [ϕ(k+1)(t) − ϕ(k+1)(u)] du

and by the mean-value theorem of differential calculus we thus obtain

θh(k)(t) = h−1 ∫t−ht (t − u)ϕ(k+2)(ξ) du

where ξ is between u and t. The function ϕ(k+2)(t) is bounded, say

|ϕ(k+2)(t)| ≤ Ak+2

for all t. It then follows that

|θh(k)(t)| ≤ ½|h|Ak+2

so that, for each k, θh(k)(t) → 0 uniformly for all t as h → 0, and the result is established. This consideration shows that the definition of T′ is consistent with the definition more closely resembling that of the derivative of a function.
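Modeling distributions as functionals once more, the translation Th and the difference quotient can be written out directly; with T = δ the quotient h−1(Th − T)〈ϕ〉 = [ϕ(−h) − ϕ(0)]/h approaches −ϕ′(0) = δ′〈ϕ〉:

```python
import math

delta = lambda phi: phi(0.0)             # delta<phi> = phi(0)

def translate(T, h):
    # T_h<phi> = T<phi_h>  with  phi_h(t) = phi(t - h)
    return lambda phi: T(lambda t: phi(t - h))

def quotient(T, h):
    # h^-1 (T_h - T), applied to a testing function
    return lambda phi: (translate(T, h)(phi) - T(phi)) / h

phi = lambda t: math.exp(-t * t) * (1.0 + t)    # phi'(0) = 1

for h in (1e-1, 1e-2, 1e-4):
    print(h, quotient(delta, h)(phi))    # tends to -phi'(0) = -1
```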

1.15    Further Properties of Distributions

Let us consider a distribution depending on a parameter u, and let us denote such a distribution by Tu. Here u may be a real or complex parameter, or a parameter of a more general kind. The theory of convergence of distributions makes it possible to speak of continuous dependence of Tu on u, of the partial derivative of Tu with respect to u, of integrating Tu with respect to u, and so on. We shall take these developments for granted without enlarging upon them here.

The concept of distributions can easily be extended to functions of several variables, together with the notions of equality, convergence of distributions in several variables, partial derivatives of such distributions with respect to these variables, and so on. In particular, mixed partial derivatives of distributions are always independent of the order of differentiation, since mixed partial derivatives of testing functions have this property.

It will be sufficient to make a few remarks on distributions in two variables, s and t, based on testing functions ϕ(s,t). In this case the notation of generalized functions, R(s,t), S(s), T(t), is especially useful in that it indicates whether we are considering distributions in one or the other variable or in both variables. Similarly, for the testing functions we shall write ρ(s,t), σ(s), τ(t). Partial derivatives may then be defined by

(∂R/∂s)〈ρ〉 = R〈−∂ρ/∂s〉   (∂R/∂t)〈ρ〉 = R〈−∂ρ/∂t〉

We shall say that the distribution R(s,t) is independent of s, or depends only on t, if there exists a fixed distribution T(t) such that R = T on all open sets of the st plane, or, alternatively, if

R〈στ〉 = T〈τ〉 ∫−∞∞ σ(s) ds

whenever σ is a testing function of s and τ is a testing function of t (and hence στ is a testing function of s and t).

It can easily be proved that a distribution R is independent of s if and only if ∂R/∂s = 0. In a similar manner we may speak of distributions depending only on s + t, say, and prove that a distribution depends only on s + t if and only if ∂R/∂s − ∂R/∂t = 0. These concepts have applications in the theory of partial differential equations. For example, if f and g are two locally integrable functions of one variable, then f(s + t) and g(st) generate distributions in the two variables s and t. The first of these distributions depends only on s + t, the second only on st, and both satisfy the differential equation

∂2R/∂s2 − ∂2R/∂t2 = 0

so that their sum is also a solution of this differential equation in the sense of distributions. Conversely, it can be shown that any distribution that satisfies this equation is a sum of a distribution depending only on s + t and one depending only on st. Thus, the general distribution solution of the one-dimensional wave equation has been obtained, and we are in a position to handle discontinuous wave motions (see Sec. 1.3).

Next, two distributions S(s) and T(t), each of a single variable, determine a distribution

R(s,t) = S(s)T(t)

in two variables, which is known as the direct product of S and T. It may be defined by

R〈στ〉 = S〈σ〉T〈τ〉

where σ is a testing function of s and τ a testing function of t. It may also be shown that, for any testing function ϕ = ϕ(s,t) and any fixed s (which makes ϕ a testing function of t), T〈ϕ〉 exists; as s varies, T〈ϕ〉 may be shown to be a testing function of s, so that S〈T〈ϕ〉〉 exists and may be used as a definition of R〈ϕ〉.

These concepts may be used to define the convolution of two distributions. To motivate this definition, consider two functions f and g of a single variable and define their convolution h = f * g by

h(t) = ∫ f(s)g(t − s) ds        (1.15)

Now let ϕ be a testing function in one variable. At least formally,

h〈ϕ〉 = ∫ h(u)ϕ(u) du = ∫∫ f(s)g(u − s)ϕ(u) ds du

and if we set u = s + t, we get

h〈ϕ〉 = ∫∫ f(s)g(t)ϕ(s + t) ds dt = f(s)g(t)〈ϕ(s + t)〉

Now let S and T be two distributions in a single variable and let S(s)T(t) be their direct product in the sense explained above. We then define the convolution S * T as that distribution of a single variable for which

S * T〈ϕ〉 = S(s)T(t)〈ϕ(s + t)〉

Already in the case of locally integrable functions f and g a difficulty arises in that h need not be locally integrable (indeed the integral defining h need not exist), so that Eq. (1.15) does not really define a distribution. The corresponding difficulty in the definition of S * T shows up in the form that ϕ(s + t) is constant along the lines s + t = const; it certainly does not vanish at infinity (unless it vanishes identically), and hence it is not a testing function. The situation can be saved if suitable restrictions are placed on S and T, for instance if the support of one of these distributions, say that of T, is contained in a finite interval. In this case it can be shown that T(t)〈ϕ(s + t)〉 is a testing function of s, so that S(s)〈T(t)〈ϕ(s + t)〉〉 is a valid definition of S * T〈ϕ〉.

EXAMPLE 1.16.   To show that for any distribution T(t), T * δ exists and is equal to T.

Here

δ(t)〈ϕ(s + t)〉 = ϕ(s)

is clearly a testing function, and

T(s)〈δ(t)〈ϕ(s + t)〉〉 = T〈ϕ〉

Similarly, it can be proved that

T * δ(k) = T(k),        k = 1, 2, …
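Example 1.16 can be made tangible numerically by replacing δ with a "nascent delta," a tall narrow normalized bump; the Gaussian shape and all numerical parameters below are illustrative assumptions, not taken from the text.

```python
import math

# Sketch: convolving a continuous function with a narrow normalized bump
# nearly reproduces the function, mirroring T * delta = T.
def delta_eps(t, eps=0.05):
    return math.exp(-(t / eps) ** 2) / (eps * math.sqrt(math.pi))

def f(t):
    return math.cos(t)

def convolve_at(t, lo=-6.0, hi=6.0, n=4000):
    # midpoint-rule approximation of (f * delta_eps)(t) = ∫ f(s) delta_eps(t - s) ds
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) * delta_eps(t - (lo + (k + 0.5) * h))
               for k in range(n)) * h

err = max(abs(convolve_at(t) - f(t)) for t in (-1.0, 0.0, 0.7))
print(err)  # small; shrinks further as eps -> 0
```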

The convolution S * T is a continuous function of each of its two factors.

We have seen in Example 1.14 that the delta distribution can be represented as a distribution limit of functions. It can be shown that this is true of every distribution and, moreover, that the approximating functions may be chosen as infinitely differentiable. If in the exercise following Example 1.14 we take f as a testing function α, it is seen that there are sequences of testing functions αn such that αn → δ as n → ∞. We then consider the distributions generated by the αn, denoting them again by αn, and with these distributions we consider the convolutions T * αn. By the continuity of the convolution, T * αn → T * δ = T. Moreover, the distribution T * αn is equal to the function T(u)〈αn(t − u)〉, and it is not difficult to prove that this function of t is infinitely differentiable. Thus we see that every distribution is the limit, in the sense of distributions, of infinitely differentiable functions.
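This regularization admits a closed form in a simple case. Assuming a Gaussian approximate identity (an illustrative choice), convolving it with the discontinuous Heaviside step H yields infinitely differentiable functions converging to H.

```python
import math

# Sketch: with the mollifier alpha_eps(t) = exp(-(t/eps)^2) / (eps*sqrt(pi)),
# the smoothed step (H * alpha_eps)(t) = (1 + erf(t / eps)) / 2 is infinitely
# differentiable and tends to H in the distribution sense as eps -> 0.
def smoothed_step(t, eps):
    return 0.5 * (1.0 + math.erf(t / eps))

for eps in (0.5, 0.1, 0.02):
    # away from the jump, the smooth approximations approach H pointwise
    print(eps, smoothed_step(-0.25, eps), smoothed_step(0.25, eps))
```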

Every locally integrable function has distribution derivatives of all orders, and some distributions can thus be represented as distribution derivatives of locally integrable functions. Such distributions are known as distributions of finite order, and the least integer r for which T = f(r) for some locally integrable f is called the order of T. In this sense, locally integrable functions are distributions of order zero, the delta distribution is of order one, and so on. Not all distributions are of finite order; for instance, the distribution T defined by

T〈ϕ〉 = ϕ(0) + ϕ′(1) + ϕ″(2) + ⋯ + ϕ^(n)(n) + ⋯

is not of finite order. Nevertheless, it can be proved that, given a distribution T and any finite interval I, there exists an integrable function f and an integer r such that T = f(r) on I. Thus, locally—i.e., on every finite interval—distributions are of finite order, but the order may increase indefinitely as the interval I is made to expand.

The results briefly described in the last two paragraphs give us an added understanding of the nature and structure of distributions, and they suggest alternative approaches to the theory of distributions. These will be taken up in Sec. 1.18.

APPLICATIONS AND EXTENSIONS

1.16    Application to Fourier Transforms

In this section, “testing function” will mean a rapidly decreasing infinitely differentiable function, i.e., an element of 𝒮, and “distribution” will mean a distribution of slow growth, i.e., an element of 𝒮′.

For a function f that is absolutely integrable over (−∞, ∞) the Fourier transform f̂ is defined by

f̂(t) = ∫ f(u)e^(−itu) du        (1.16)

As a result of the uniform convergence of the infinite integral, f̂ is a continuous function of t, and by the appraisal

|f̂(t)| ≤ ∫ |f(u)| du

f̂ is bounded. For two absolutely integrable functions f and g and their Fourier transforms f̂ and ĝ we have Parseval’s relationship

∫ f̂(t)g(t) dt = ∫ f(t)ĝ(t) dt

Since f and g are absolutely integrable and f̂ and ĝ are bounded, the infinite integrals on both sides of this equation converge.
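Parseval's relationship can be checked numerically. The sketch below assumes the transform convention f̂(t) = ∫ f(u)e^(−itu) du and uses crude midpoint sums on a truncated line; the Gaussian choices of f and g are arbitrary.

```python
import cmath
import math

# Sketch: both sides of ∫ fhat(t) g(t) dt = ∫ f(t) ghat(t) dt, computed
# with the same midpoint grid, for two rapidly decaying (hence absolutely
# integrable) functions f and g.
LO, HI, N = -8.0, 8.0, 600
H = (HI - LO) / N
NODES = [LO + (k + 0.5) * H for k in range(N)]

def fourier(func, t):
    # midpoint-rule approximation of ∫ func(u) e^(-itu) du
    return sum(func(u) * cmath.exp(-1j * t * u) for u in NODES) * H

def f(u):
    return math.exp(-u * u)

def g(u):
    return math.exp(-2.0 * (u - 0.5) ** 2)

lhs = sum(fourier(f, t) * g(t) for t in NODES) * H   # ∫ fhat(t) g(t) dt
rhs = sum(f(t) * fourier(g, t) for t in NODES) * H   # ∫ f(t) ghat(t) dt
print(abs(lhs - rhs))  # negligibly small
```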

For functions that fail to be integrable on (− ∞, ∞), this definition of Fourier transform does not apply. Nevertheless, the infinite integral evaluated in Example 1.15 suggests that the Fourier transform of the function f, for which f(t) = 1 for all t, exists in the distribution sense and is equal to 2πδ(t). We shall show that Fourier transforms can be defined for all distributions of slow growth, and that they are in general again distributions of slow growth. It may be remarked here that Fourier transforms of functions of slow growth were investigated (Ref. 1, Chap. VI) before the development of a general theory of distributions, and it might be noted further that a very elegant and elementary presentation9 of Fourier transforms of distributions of slow growth has recently been given.

Let us start with a testing function ϕ and its Fourier transform

ϕ̂(t) = ∫ ϕ(u)e^(−itu) du

Formal differentiation of this relationship leads to

ϕ̂^(k)(t) = ∫ (−iu)^k ϕ(u)e^(−itu) du        (1.17)

and formal repeated integrations by parts to

(it)^k ϕ̂(t) = ∫ ϕ^(k)(u)e^(−itu) du        (1.18)

Since ϕ is rapidly decreasing, repeated differentiation of ϕ̂ under the integral sign may be justified by the uniform convergence of all integrals involved, and it shows that the Fourier transform ϕ̂ is again infinitely differentiable. The rapid decrease of all derivatives of ϕ shows that repeated integrations by parts are legitimate, that the integrated parts vanish, and that Eq. (1.18) holds. By combining Eqs. (1.17) and (1.18), we have

(it)^k ϕ̂^(m)(t) = ∫ [(−iu)^m ϕ(u)]^(k) e^(−itu) du

If P(t) is any polynomial, we can now express P(t)ϕ̂^(k)(t) as a Fourier integral involving a sum of polynomials multiplied by derivatives of ϕ, and by estimating this Fourier integral we can show that ϕ̂ is rapidly decreasing. Thus, the Fourier transform of a testing function is again a testing function.

Since ϕ̂ is again absolutely integrable, Fourier’s inversion formula

f(u) = (1/2π) ∫ f̂(t)e^(iut) dt

applies in this case and shows that

2πϕ(−u) = ∫ ϕ̂(t)e^(−iut) dt

Thus, every testing function is the Fourier transform of some testing function, so that the Fourier transformation is a one-to-one mapping of the space 𝒮 of testing functions onto itself.

This mapping is clearly linear, and it is continuous; that is, ϕ̂n → 0 if and only if ϕn → 0 as n → ∞. Because of the essential symmetry of the relationship between ϕ and ϕ̂, it will be sufficient to indicate the proof going one way. Since ϕn → 0 in the sense of convergence in 𝒮, given any ε > 0 we have

(1 + t²)|ϕn(t)| < ε

for all sufficiently large n, and hence

|ϕ̂n(t)| ≤ ∫ |ϕn(u)| du < ε ∫ du/(1 + u²) = πε

for all sufficiently large n. This shows that ϕ̂n → 0 uniformly for all t as n → ∞. To show that ϕ̂n → 0 in the sense of convergence in 𝒮, we have to show that for any integer k and any polynomial P(t)

P(t)ϕ̂n^(k)(t) → 0

uniformly for all t as n → ∞. Now, it has been pointed out above that P(t)ϕ̂n^(k)(t) is the Fourier transform of a finite sum of expressions of the form Q(t)ϕn^(m)(t), where Q(t) again denotes a polynomial. Each of these expressions can be made less than ε/(1 + t²) by making n sufficiently large, thus proving as above that

P(t)ϕ̂n^(k)(t) → 0

uniformly for all t as n → ∞.

We now return to Fourier transforms of locally integrable functions. If f is absolutely integrable, then its Fourier transform is continuous and bounded; hence it is locally integrable and of slow growth. Thus, both f and f̂ can be evaluated on testing functions, and Parseval’s relationship shows that

f̂〈ϕ〉 = f〈ϕ̂〉

This suggests that we define the Fourier transform T̂ of a distribution of slow growth T by

T̂〈ϕ〉 = T〈ϕ̂〉

Clearly, T̂ as thus defined is a linear functional on testing functions. This linear functional is continuous, since ϕn → ϕ entails ϕ̂n → ϕ̂ and consequently

T̂〈ϕn〉 = T〈ϕ̂n〉 → T〈ϕ̂〉 = T̂〈ϕ〉

According to this definition, every distribution of slow growth possesses a Fourier transform that is again a distribution of slow growth. The relationship

T̂〈ϕ̂〉 = 2πT〈ϕ(−t)〉

holds and shows that, conversely, every distribution of slow growth is the Fourier transform of some such distribution. Using the same symbol ℱ for the Fourier transformation of distributions and for the Fourier transformation of functions, we may write the definition given above also as

ℱT〈ϕ〉 = T〈ℱϕ〉

or more briefly as

ℱT = Tℱ
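A quick numerical illustration of this definition: under the same assumed convention f̂(t) = ∫ f(u)e^(−itu) du, the Fourier transform of the normalized Gaussian nascent delta is exp(−(εt)²/4), which tends to the constant function 1 as ε → 0, so pairing it with a testing function tends to 〈1, ϕ〉 — consistent with the delta distribution having Fourier transform 1. The Gaussian stand-in for ϕ is an assumption of the sketch.

```python
import math

# Sketch: <(delta_eps)hat, phi> -> ∫ phi as eps -> 0, i.e. deltahat = 1.
LO, HI, N = -10.0, 10.0, 4000

def riemann(func):
    h = (HI - LO) / N
    return sum(func(LO + (k + 0.5) * h) for k in range(N)) * h

def phi(t):
    return math.exp(-t * t / 2.0)    # Gaussian stand-in for a testing function

def pairing(eps):
    # <(delta_eps)hat, phi> with (delta_eps)hat(t) = exp(-(eps*t)^2 / 4)
    return riemann(lambda t: math.exp(-(eps * t) ** 2 / 4.0) * phi(t))

target = riemann(phi)                # <1, phi> = ∫ phi(t) dt
errors = [abs(pairing(eps) - target) for eps in (1.0, 0.3, 0.1)]
print(errors)  # decreasing toward 0 as eps shrinks
```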

The relationships

(ℱT)^(k) = ℱ[(−it)^k T]        (it)^k ℱT = ℱ[T^(k)]

are analogous to and follow from Eqs. (1.17) and (1.18); their proof is left as an exercise for the reader.

EXAMPLE 1.17.   To determine the Fourier transforms of the slowly increasing functions tn, n = 0, 1, ….

The expression

image

gives the value at t = 0 of the Fourier transform, in the sense of Eq. (1.16), of the integrable function image. By Eq. (1.18),

image

and by Fourier’s inversion formula,

image

so that

ℱ[t^n] = 2πi^n δ^(n)(t),        n = 0, 1, 2, …

EXERCISE

Given that image for 0 < ν < 1 and image for 0 < ν < 2, the reader should verify that these formulas hold for all nonintegral values of ν. (See Example 1.11 for the definitions involved here.)

The interested reader will find further material on Fourier transforms in Refs. 9 and 5; the latter reference also gives applications to partial differential equations.

1.17    Application to Differential Equations

In this section, “testing functions” will mean elements of 𝒟 and “distributions” will mean elements of 𝒟′.

Consider a linear ordinary differential equation

x^(n) + a1x^(n−1) + ⋯ + anx = f        (1.19)

in which we assume the ai, i = 1, …, n, to be infinitely differentiable functions of t and f to be a locally integrable function of t. It is known from classical analysis that this differential equation possesses an infinity of solutions, all of which are n − 1 times continuously differentiable, with x(n−1) absolutely continuous, so that x(n) exists and is locally integrable. It is also known that suitable initial conditions, for instance the values of x, x′, …, x(n−1) at a fixed point t0, determine a unique solution.

We may now regard Eq. (1.19) as a differential equation in distributions. If x is any distribution, x(k) is again a distribution; since an−k is infinitely differentiable, an−kx(k) is defined, so that x may be substituted in the left-hand side of Eq. (1.19). In this sense, every function that satisfies the differential equation in the classical sense is also a distribution solution of that equation. In this context it is natural to ask whether Eq. (1.19) has distribution solutions that are not included among the classical solutions, either because x, although a function, lacks the appropriate differentiability properties or because x itself is a distribution. The answer to this question is an emphatic no: As long as f is locally integrable, the classical solutions are the only distribution solutions of Eq. (1.19).

To prove this, consider Eq. (1.19) on a fixed closed finite interval. On this interval, the distribution x(n) is of a fixed order r; that is, it can be represented as the rth distribution derivative of some locally integrable function. If r ≥ 1, then x(n−1), x(n−2), … are of order r − 1 at most, and so is

f − a1x^(n−1) − a2x^(n−2) − ⋯ − anx

But this last expression, being equal to x(n), is of order r; this contradiction shows that r cannot be ≥ 1 and hence must be 0. Thus, x(n) is of order zero on every finite interval and hence is a locally integrable function.

The situation is different if we now consider the same differential equation with a right-hand side f that is itself a distribution. In this case, every solution is necessarily a distribution solution. If f is of order r on an interval, then the argument of the preceding paragraph shows that x(n) is also of order r; if r ≥ n, then x itself is of order r − n; and if r < n, then x is n − r − 1 times continuously differentiable and x(n−r−1) is absolutely continuous so that x(n−r) is locally integrable. The difference of two solutions satisfies the homogeneous equation

x^(n) + a1x^(n−1) + ⋯ + anx = 0

and is a solution in the classical sense, so that the general distribution solution of Eq. (1.19) is the sum of a particular distribution solution and of the classical general solution of the homogeneous equation.

Such considerations are relevant with regard to Green’s functions (see, for instance, Ref. 4, Chap. 3) that satisfy differential equations of the form

x^(n) + a1x^(n−1) + ⋯ + anx = δ(t − τ)

and also satisfy appropriate boundary conditions. Since δ(tτ) is of order zero and infinitely differentiable on any interval not including τ, and δ(tτ) is of order 1 on any interval including that point, it follows that, under our assumptions on a1, …, an, the Green’s function is infinitely differentiable except at t = τ; at this point, it possesses n − 2 continuous derivatives while x(n−1) has a unit jump.

Similar statements hold for the differential equation

a0x^(n) + a1x^(n−1) + ⋯ + anx = f

in which a0, …, an are infinitely differentiable functions, as long as a0(t) ≠ 0. Zeros of a0(t) are singularities of the differential equation, and at such singularities a different behavior may arise.

As an example, we shall consider the differential equation of the first order (see Ref. 6, Sec. 8)

tx′ + x = 0

The classical solution x = ct−1, where c is an arbitrary constant, exists on (0, ∞) and on (−∞, 0) but fails to be integrable in any neighborhood of t = 0. The distribution suggested by the classical solution is c[log |t|]′, where the prime indicates a distribution derivative. To prove that this is indeed a solution, we evaluate

image

and obtain, through integration by parts,

image

So far, the only complication caused by the singularity was the necessity of replacing t−1 by a distribution that, although equal to t−1 for all t ≠ 0, is not generated by it. But more is to come. The integrated form of the differential equation is (tx)′ = 0, or tx = c, and it can be verified that c[log |t|]′ satisfies this equation. However, the solution of the equation is not unique, for

tδ〈ϕ〉 = δ〈tϕ〉 = 0

for all testing functions ϕ, showing that tδ = 0. Thus, c[log |t|]′ + aδ(t) satisfies tx′ + x = 0 for arbitrary values of a and c, so that this differential equation of the first order possesses a two-parameter family of distribution solutions.
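The identity tδ = 0 invoked above can also be watched numerically: with a nascent delta δε, the pairing ∫ t δε(t)ϕ(t) dt tends to 0 as ε → 0. The particular ϕ below is an arbitrary smooth choice for the sketch.

```python
import math

# Sketch: <t delta_eps, phi> -> 0 as eps -> 0, mirroring t*delta = 0.
def delta_eps(t, eps):
    return math.exp(-(t / eps) ** 2) / (eps * math.sqrt(math.pi))

def phi(t):
    return math.cos(t) + 0.5 * t    # arbitrary smooth choice

def pairing(eps, lo=-5.0, hi=5.0, n=20000):
    # midpoint-rule approximation of ∫ t delta_eps(t) phi(t) dt
    h = (hi - lo) / n
    s = 0.0
    for k in range(n):
        t = lo + (k + 0.5) * h
        s += t * delta_eps(t, eps) * phi(t)
    return s * h

vals = [abs(pairing(eps)) for eps in (0.5, 0.1, 0.02)]
print(vals)  # decreasing toward 0
```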

Distributions are even more important in the theory of partial differential equations, but this topic is not within the scope of the present consideration.

1.18    Extensions and Alternative Theories

Different classes of generalized functions may be obtained by using different classes of testing functions. For instance, by using infinitely differentiable testing functions of a fixed period, one obtains generalizations of periodic integrable functions; by using k times differentiable testing functions, one obtains k times (rather than infinitely) differentiable distribution functions, etc. Distributions have also been defined on finite intervals, on arbitrary regions of n-dimensional space, on surfaces, and more generally on manifolds. For all these variants and extensions, the reader may consult Refs. 21 and 5. Vector-valued distributions and distributions in more abstract situations have also been studied, but these lie outside the scope of the present introduction. We mention only Ref. 15, which envisages applications to quantum mechanics.

In this chapter, we have chosen to follow Schwartz in presenting distributions as functionals on classes of testing functions. There are several alternative theories more or less equivalent to the one outlined here. The largest single group of these is inspired by the theorem that states that every distribution is the distribution limit of some sequence of functions (see Sec. 1.15) and indicates that it might be possible to construct distributions as generalized limits of functions.

Temple23 attributes a very general method of defining generalized functions as “weak limits” of ordinary functions to Mikusiński.11 This method was taken up by Ravetz;17 from the point of view of applied mathematics, it was treated by Temple24,25 and Saltzer;18 and from the point of view of Fourier analysis, it was discussed by Lighthill.9 A somewhat more direct approach is inspired by a combination of approximation to distributions through functions and the identification of distributions on finite intervals with generalized derivatives of functions; this was the viewpoint adopted by Sikorski22 and Korevaar.8 The latter author defines distributions by sequences of integrable functions that are convergent in the generalized sense that on any given finite interval the sequence may be made uniformly convergent through a suitable number of integrations, the number of integrations depending on the interval.

A rather different approach is based on the identification of distributions, on finite intervals, with generalized derivatives of locally integrable functions. Here distributions are represented by formal series

f0 + Df1 + D2f2 + ⋯ + Dnfn + ⋯

in which Dn is the symbol of generalized differentiation of order n and the fn are locally integrable functions of which all but a finite number vanish identically on any given finite interval. This approach was briefly indicated by Halperin6 and elaborated by König.7 It was adopted by Sauer,19 who used it in the solution of boundary-value problems.

Some of the major difficulties encountered in studying the mathematical theory of distributions are due to the circumstance that no norm exists for distributions, since none exists for the vector space of testing functions. There are several attempts at overcoming this difficulty, if necessary by restricting attention to a smaller class of generalized functions, and by basing the theory of distributions on the comparatively well-known and accessible theory of Banach spaces. We mention in this connection the work of E. R. Love10 and an unpublished theory of M. Riesz.

REFERENCES

1.   Bochner, S., “Vorlesungen über Fouriersche Integrale,” Akademie-Verlag G.m.b.H., Berlin, 1932.

2.   Dirac, P. A. M., “The Principles of Quantum Mechanics,” 2d ed., Oxford University Press, New York, 1935.

3.   Erdélyi, A., “Operational Calculus,” Mathematics Department, California Institute of Technology, Pasadena, Calif., 1955.

4.   Friedman, B., “Principles and Techniques of Applied Mathematics,” John Wiley & Sons, Inc., New York, 1956. “Operational Calculus and Generalized Functions,” California Institute of Technology, Pasadena, Calif., 1959.

5.   Gelfand, I. M., and G. E. Šilov, Fourier Transforms of Rapidly Increasing Functions and Questions of Uniqueness of the Solution of Cauchy’s Problem, Uspehi Mat. Nauk, n.s., vol. 8, no. 6 (58), pp. 3–54, 1953; Amer. Math. Soc. Transl., ser. 2, vol. 5, pp. 221–274, 1957.

6.   Halperin, I., “Introduction to the Theory of Distributions,” University of Toronto Press, Toronto, Canada, 1952. Based on lectures by Laurent Schwartz.

7.   König, H., Neue Begründung der Theorie der “Distributionen” von L. Schwartz, Math. Nachr., vol. 9, pp. 129–148, 1953.

8.   Korevaar, J., Distributions Defined from the Point of View of Applied Mathematics, Nederl. Akad. Wetensch. Proc., ser. A, vol. 58, pp. 368–389, 483–503, 663–674, 1955.

9.   Lighthill, M. J., “Introduction to Fourier Analysis and Generalized Functions,” Cambridge University Press, New York, 1958.

10.   Love, E. R., A Banach Space of Distributions, J. London Math. Soc., vol. 32, pp. 483–498, 1957; vol. 33, pp. 288–306, 1958.

11.   Mikusiński, J. G., Sur la méthode de généralisation de M. Laurent Schwartz et sur la convergence faible, Fund. Math., vol. 35, pp. 235–239, 1948.

12.   ——, “Rachunek Operatorów” [The Calculus of Operators], Monografie Matematyczne, Tom XXX, Polskie Towarzystwo Matematycne, Warszawa, 1953. “Operational Calculus,” Pergamon Press, Inc., New York, 1959.

13.   ——, Le Calcul opérationnel d’intervalle fini, Studia Math., vol. 15, pp. 225–251, 1956.

14.   —— and C. Ryll-Nardzewski, Un théorème sur le produit de composition des fonctions de plusieurs variables, Studia Math., vol. 13, pp. 62–68, 1953.

15.   Nikodým, O. M., Summation of Quasi-vectors on Boolean Tribes and Its Applications to Quantum Theories. I. Mathematically Precise Theory of the Genuine P. A. M. Dirac’s Delta Function, Rend. Sem. Mat., Univ. Padova (in preparation).

16.   Pol, Balth. van der, and H. Bremmer, “Operational Calculus, Based on the Two-sided Laplace Integral,” Cambridge University Press, New York, 1950.

17.   Ravetz, J. R., Distributions Defined as Limits, Proc. Cambridge Phil. Soc., vol. 53, pp. 76–92, 1957.

18.   Saltzer, C., The Theory of Distributions, Advances in Appl. Mech., vol. 5, pp. 91–110, 1958.

19.   Sauer, R., “Anfangswertprobleme bei partiellen Differentialgleichungen,” 2d ed., Springer-Verlag OHG, Berlin, 1958.

20.   Schmieden, C., and D. Laugwitz, Eine Erweiterung des Infinitesimalkalküls, Math. Z., vol. 69, pp. 1–39, 1958.

21.   Schwartz, L., “Théorie des distributions,” 2 vols., Hermann & Cie, Paris, 1950, 1951.

22.   Sikorski, R., A Definition of the Notion of Distribution, Bull. Acad. Polon. Sci., Cl. III, vol. 2, pp. 209–211, 1954.

23.   Temple, G., Theories and Applications of Generalized Functions, J. London Math. Soc., vol. 28, pp. 134–148, 1953.

24.   ——, La Théorie de la convergence généralisée et les fonctions généralisées et leur application á la physique mathématique, Rend. Mat. e Appl., ser. 5, vol. 11, pp. 113–122, 1953.

25.   ——, The Theory of Generalized Functions, Proc. Roy. Soc. London, ser. A, vol. 228, pp. 175–190, 1955.

26.   Weston, J. D., An Extension of the Laplace-transform Calculus, Rend. Circ. Mat. Palermo, ser. 2, vol. 6, pp. 1–9, 1957.

27.   ——, Operational Calculus and Generalized Functions, Proc. Roy. Soc. London, ser. A, vol. 250, pp. 460–471, 1959.