CHAPTER 18

PROBABILITY THEORY AND RANDOM PROCESSES

    18.1. Introduction

      18.1-1. Introductory Remarks

    18.2. Definition and Representation of Probability Models

      18.2-1. Algebra of Events Associated with a Given Experiment

      18.2-2. Mathematical Definition of Probabilities. Conditional Probabilities

      18.2-3. Statistical Independence

      18.2-4. Compound Experiments, Independent Experiments, and Independent Repeated Trials

      18.2-5. Combination Rules

      18.2-6. Bayes's Theorem

      18.2-7. Representation of Events as Sets in a Sample Space

      18.2-8. Random Variables

      18.2-9. Representation of Probability Models in Terms of Numerical Random Variables and Distribution Functions

    18.3. One-dimensional Probability Distributions

      18.3-1. Discrete One-dimensional Probability Distributions

      18.3-2. Continuous One-dimensional Probability Distributions

      18.3-3. Expected Values and Variance. Characteristic Parameters of One-dimensional Probability Distributions

      18.3-4. Normalization

      18.3-5. Chebyshev's Inequality and Related Formulas

      18.3-6. Improved Description of Probability Distributions: Use of Stieltjes Integrals

      18.3-7. Moments of a One-dimensional Probability Distribution

      18.3-8. Characteristic Functions and Generating Functions

      18.3-9. Semi-invariants

      18.3-10. Computation of Moments and Semi-invariants from χx(q), Mx(s), and γx(s). Relations between Moments and Semi-invariants.

    18.4. Multidimensional Probability Distributions

      18.4-1. Joint Distributions

      18.4-2. Two-dimensional Probability Distributions. Marginal Distributions

      18.4-3. Discrete and Continuous Two-dimensional Probability Distributions

      18.4-4. Expected Values, Moments, Covariance, and Correlation Coefficient

      18.4-5. Conditional Probability Distributions Involving Two Random Variables

      18.4-6. Regression

      18.4-7. n-dimensional Probability Distributions

      18.4-8. Expected Values and Moments

      18.4-9. Regression. Multiple and Partial Correlation Coefficients

      18.4-10. Characteristic Functions

      18.4-11. Statistically Independent Random Variables

      18.4-12. Entropy of a Probability Distribution, and Related Topics

    18.5. Functions of Random Variables. Change of Variables

      18.5-1. Introduction

      18.5-2. Functions (or Transformations) of a One-dimensional Random Variable

      18.5-3. Linear Functions (or Linear Transformations) of a One-dimensional Random Variable

      18.5-4. Functions and Transformations of Multidimensional Random Variables

      18.5-5. Linear Transformations

      18.5-6. Mean and Variance of a Sum of Random Variables

      18.5-7. Sums of Statistically Independent Random Variables

      18.5-8. Compound Distributions

    18.6. Convergence in Probability and Limit Theorems

      18.6-1. Sequences of Probability Distributions. Convergence in Probability

      18.6-2. Limits of Distribution Functions, Characteristic Functions, and Generating Functions. Continuity Theorems

      18.6-3. Convergence in Mean

      18.6-4. Asymptotically Normal Probability Distributions

      18.6-5. Limit Theorems

    18.7. Special Techniques for Solving Probability Problems

      18.7-1. Introduction

      18.7-2. Problems Involving Discrete Probability Distributions: Counting of Simple Events and Combinatorial Analysis

      18.7-3. Problems Involving Discrete Probability Distributions: Successes and Failures in Component Experiments

    18.8. Special Probability Distributions

      18.8-1. Discrete One-dimensional Probability Distributions

      18.8-2. Discrete Multidimensional Probability Distributions

      18.8-3. Continuous Probability Distributions: the Normal (Gaussian) Distribution

      18.8-4. Normal Random Variables: Distribution of Deviations from the Mean

      18.8-5. Miscellaneous Continuous One-dimensional Probability Distributions

      18.8-6. Two-dimensional Normal Distributions

      18.8-7. Circular Normal Distributions

      18.8-8. n-dimensional Normal Distributions

      18.8-9. Addition Theorems for Special Probability Distributions

    18.9. Mathematical Description of Random Processes

      18.9-1. Random Processes

      18.9-2. Mathematical Description of Random Processes

      18.9-3. Ensemble Averages

(a) General Definitions

(b) Ensemble Correlation Functions and Mean Squares

(c) Characteristic Functions

(d) Ensemble Averages of Integrals and Derivatives

      18.9-4. Processes Defined by Random Parameters

      18.9-5. Orthogonal-function Expansions

    18.10. Stationary Random Processes. Correlation Functions and Spectral Densities

      18.10-1. Stationary Random Processes

      18.10-2. Ensemble Correlation Functions

      18.10-3. Ensemble Spectral Densities

      18.10-4. Correlation Functions and Spectra of Real Processes

      18.10-5. Spectral Decomposition of Mean “Power” for Real Processes

      18.10-6. Some Alternative Ensemble Spectral Densities

      18.10-7. t Averages and Ergodic Processes. (a) t Averages. (b) Ergodic Processes

      18.10-8. Non-ensemble Correlation Functions and Spectral Densities

      18.10-9. Functions with Periodic Components

      18.10-10. Generalized Fourier Transforms and Integrated Spectra

    18.11. Special Classes of Random Processes. Examples

      18.11-1. Processes with Constant and Periodic Sample Functions

(a) Constant Sample Functions

(b) Random-phase Sine Waves

(c) More General Periodic Processes

      18.11-2. Band-limited Functions and Processes. Sampling Theorems

      18.11-3. Gaussian Random Processes

      18.11-4. Markov Processes and the Poisson Process

(a) Random Processes of Order n

(b) Purely Random Processes

(c) Markov Processes

(d) The Poisson Process

      18.11-5. Some Random Processes Generated by a Poisson Process

(a) Random Telegraph Wave

(b) Process Generated by Poisson Sampling

(c) Impulse Noise and Campbell's Theorem

      18.11-6. Random Processes Generated by Periodic Sampling

    18.12. Operations on Random Processes

      18.12-1. Correlation Functions and Spectra of Sums

      18.12-2. Input-Output Relations for Linear Systems

      18.12-3. The Stationary Case

      18.12-4. Relations for t Correlation Functions and Non-ensemble Spectra

      18.12-5. Nonlinear Operations

      18.12-6. Nonlinear Operations on Gaussian Processes

(a) Price's Theorem

(b) Series Expansion

    18.13. Related Topics, References, and Bibliography

      18.13-1. Related Topics

      18.13-2. References and Bibliography

18.1. INTRODUCTION

18.1-1. Introductory Remarks. Mathematical probabilities are values of a real numerical function defined on a class of idealized events, which represent results of an experiment or observation. Mathematical probabilities are not defined directly in terms of “likelihood” or relative frequency of occurrence; they are introduced by a set of defining postulates (Sec. 18.2-2; see also Sec. 12.1-1) which abstract essential properties of statistical relative frequencies (Sec. 19.2-1). The concept of probability can, then, often be related to reality by the assumption that, in practically every sequence of independently repeated experiments, the relative frequency of each event tends to a limit represented by the corresponding probability (Sec. 19.2-1).* Theories based on the probability concept may, however, be useful even if they are not subject to direct statistical interpretation.

Probability theory deals with the definition and description of models involving the probability concept. The theory is especially concerned with methods for calculating the probability of an event from the known or postulated probabilities of other events which are logically related to the first event. Most applications of probability theory may be interpreted as special cases of random processes (Secs. 18.8-1 to 18.11-5).

* Whenever this proposition is justified, it must be regarded as a law of nature; it should not be confused with mathematical theorems like Bernoulli's theorem or the mathematical law of large numbers (Sec. 18.6-5).

18.2. DEFINITION AND REPRESENTATION OF PROBABILITY MODELS

18.2-1. Algebra of Events Associated with a Given Experiment. Each probability model describes a specific idealized experiment or observation having a class δ† of theoretically possible results (events, states) E permitting the following definitions.

The union (logical sum) E1 ∪ E2 ∪ . . . (or E1 + E2 + . . .) of a countable (finite or infinite) set of events E1, E2, . . . is the event of realizing at least one of the events E1, E2, . . . .

The intersection (logical product) E1 ∩ E2 (or E1E2) of two events E1 and E2 is the joint event of realizing both E1 and E2.

The (logical) complement of an event E is the event of not realizing E (“opposite” or complementary event of E).

I is the certain event of realizing at least one of the events of δ†.

0 is the impossible event of realizing no one of the events of δ†.

In each case, the class δ of events comprising δ† and 0 is to constitute a completely additive Boolean algebra (algebra of events associated with the given experiment or observation) having all the properties outlined in Secs. 12.8-1 and 12.8-4. Either E1 ∪ E2 = E1 or E1 ∩ E2 = E2 implies the logical inclusion relation E2 ⊂ E1 (E2 implies E1); note 0 ⊂ E ⊂ I. E1 and E2 are mutually exclusive (disjoint) if and only if E1 ∩ E2 = 0. The set δ1 of joint events E ∩ E1 is the algebra of events associated with the given experiment under the hypothesis that E1 occurs; I ∩ E1 = E1 is the certain event in δ1 (see also Sec. 12.8-3).

18.2-2. Mathematical Definition of Probabilities. Conditional Probabilities. It is possible to assign a (mathematical) probability P[E] (probability of E, probability of realizing the event E) to each event E of the class δ (event algebra, Sec. 18.2-1) associated with a given experiment if and only if one can define a single-valued real function P[E] on δ so that

1. For every event E of δ, P[E] is a real number, and P[E] ≥ 0.

2. P[I] = 1.

3. For every countable set of mutually exclusive events E1, E2, . . . , P[E1 ∪ E2 ∪ . . .] = P[E1] + P[E2] + · · ·.

Postulates 1 to 3 imply 0 ≤ P[E] ≤ 1; in particular, P[E] = 0 if E is an impossible event. Note carefully that P[E] = 1 or P[E] = 0 do not necessarily imply that E is, respectively, certain or impossible.

A fourth defining postulate relates the “absolute” probability P[E] associated with the given experiment to the “conditional” probabilities P[E|E1] referring to a “simpler” experiment restricted by the hypothesis that E1 occurs. The conditional probability P[E|E1] of E on (relative to) the hypothesis that the event E1 occurs is defined by the postulate

P[E|E1] = P[E ∩ E1]/P[E1]

P[E|E1] is not defined if P[E1] = 0.

In the context of the restricted experiment, the quantities P[E|E1] are ordinary probabilities associated with the joint events EE1 constituting the event algebra δ1 of the restricted experiment (Sec. 18.2-1). In practice, every probability can be interpreted as a conditional probability relative to some hypothesis implied by the experiment under consideration.

18.2-3. Statistical Independence.  Two events E1 and E2 are statistically independent (stochastically independent) if and only if

(18.2-1)    P[E1 ∩ E2] = P[E1]P[E2]

so that P[E1|E2] = P[E1] if P[E2] ≠ 0, and P[E2|E1] = P[E2] if P[E1] ≠ 0.

N events E1, E2, . . . , EN are statistically independent if and only if not only each pair of events Ei, Ek but also each pair of possible joint events is statistically independent:

(18.2-2)    P[Ei ∩ Ek] = P[Ei]P[Ek]    P[Ei ∩ Ek ∩ El] = P[Ei]P[Ek]P[El]    . . .    P[E1 ∩ E2 ∩ . . . ∩ EN] = P[E1]P[E2] · · · P[EN]    (i ≠ k ≠ l ≠ . . .)

18.2-4. Compound Experiments, Independent Experiments, and Independent Repeated Trials. Frequently an experiment appears as a combination of component experiments (see also Secs. 18.7-3 and 18.8-1). Let E′, E″, E′″, . . . denote any result associated, respectively, with the first, second, third, . . . component experiment. The results of the compound experiment can be described as joint events E = E′ ∩ E″ ∩ E′″ ∩ . . . ; their probabilities will, in general, depend on the nature and interaction of all component experiments. The probability P[E′] of realizing the component result E′ in the course of a given compound experiment is, in general, different from the probability associated with E′ in an independently performed component experiment.

Two or more component experiments of a given compound experiment are independent if and only if their respective results E′, E″, E′″, . . . obtained in the course of the compound experiment are statistically independent, i.e.,

P[E′ ∩ E″ ∩ E′″ ∩ . . .] = P[E′]P[E″]P[E′″] · · ·

for all E′, E″, E′″, . . . (Sec. 18.2-3). If a component experiment is independent of all others, the probability of realizing each of its results in the course of the given compound experiment is equal to the corresponding probability for the independently performed component experiment.

Repeated independent trials are independent experiments each having the same set of possible results E and the same set of associated probabilities P[E]. The probability of obtaining the sequence of results E1, E2, . . . , En in the compound experiment corresponding to a sequence of n repeated independent trials is

(18.2-3)    P[E1 ∩ E2 ∩ . . . ∩ En] = P[E1]P[E2] · · · P[En]
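The product rule (18.2-3) is easy to check numerically. The following Python sketch is an editorial illustration, not part of the handbook's text; the fair die and the particular sequence are assumed examples.

```python
from functools import reduce

# Probability of realizing a specific sequence of results in n repeated
# independent trials: the product of the single-trial probabilities
# (Eq. 18.2-3). A fair die (each result has probability 1/6) is assumed.
def sequence_probability(trial_probs):
    """Multiply the single-trial probabilities P[E1], P[E2], ..., P[En]."""
    return reduce(lambda a, b: a * b, trial_probs, 1.0)

# Probability of throwing the exact sequence 6, 6, 6 with a fair die:
p_three_sixes = sequence_probability([1/6, 1/6, 1/6])
```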

18.2-5. Combination Rules (see also Secs. 18.7-1 to 18.7-3). Each of the theorems in Table 18.2-1 expresses the probability of an event in terms of the (possibly already known) probabilities of other events logically related to the first event.

More generally, the probability of realizing at least m and exactly m of N (not necessarily statistically independent) events E1, E2, . . . , EN is, respectively,

(18.2-4)    P[m or more of the Ei occur] = Σ_{k=m}^{N} (−1)^{k−m} (k−1 choose m−1) Sk

(18.2-5)    Sk = Σ_{i1<i2<· · ·<ik} P[Ei1 ∩ Ei2 ∩ . . . ∩ Eik]    (k = 1, 2, . . . , N)

(18.2-6)    P[exactly m of the Ei occur] = Σ_{k=m}^{N} (−1)^{k−m} (k choose m) Sk

If E1, E2, . . . , EN are statistically independent, the quantities (5) reduce to the symmetric functions (1.4-9) of the P[Ei] (Table 18.2-1b).

EXAMPLES: If the probability of each of the six possible results of a throw with a die is 1/6, then

The probability of throwing either 1 or 6 is 1/6 + 1/6 = 1/3

The probability of throwing 6 at least once in two throws is 1/6 + 1/6 – 1/36 = 11/36

The probability of throwing 6 exactly once in two throws is 1/3 – 2/36 = 5/18

The probability of throwing 6 twice in two throws is 1/36; etc.
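These quoted probabilities can be verified by exhaustive enumeration of the 36 equally likely outcomes of two throws. The following Python sketch is an editorial check, not part of the handbook's text:

```python
from itertools import product
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of two throws of a fair die
# and count the outcomes realizing each event of interest.
outcomes = list(product(range(1, 7), repeat=2))
total = len(outcomes)  # 36

p_at_least_one_six = Fraction(sum(1 for a, b in outcomes if a == 6 or b == 6), total)
p_exactly_one_six  = Fraction(sum(1 for a, b in outcomes if (a == 6) != (b == 6)), total)
p_two_sixes        = Fraction(sum(1 for a, b in outcomes if a == 6 and b == 6), total)
```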

18.2-6. Bayes's Theorem (see also Sec. 18.4-5b). Let H1, H2, . . . be a set of mutually exclusive events such that H1 ∪ H2 ∪ . . . = I. Then, for each pair of events Hi, E,

(18.2-7)    P[Hi|E] = P[Hi]P[E|Hi] / Σ_j P[Hj]P[E|Hj]    (P[E] ≠ 0)

Equation (7) can be used to relate the “a priori” probability P[Hi] of a hypothetical cause Hi of the event E to the “a posteriori” probability P[Hi|E] if (and only if) the Hi are “random” events permitting the definition of probabilities P[Hi].
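A short numerical illustration of Eq. (7) follows; the Python sketch and all numerical values (two hypothetical causes H1, H2 with assumed a priori and conditional probabilities) are editorial assumptions, not from the handbook.

```python
# Bayes's theorem (Eq. 18.2-7) for two mutually exclusive "causes".
prior = {"H1": 0.3, "H2": 0.7}        # assumed a priori probabilities P[Hi]
likelihood = {"H1": 0.9, "H2": 0.2}   # assumed conditional probabilities P[E|Hi]

p_e = sum(prior[h] * likelihood[h] for h in prior)              # P[E]
posterior = {h: prior[h] * likelihood[h] / p_e for h in prior}  # P[Hi|E]
```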

Table 18.2-1. Probabilities of Logically Related Events

image

18.2-7. Representation of Events as Sets in a Sample Space. Every class S of events E permitting the definition of probabilities P[E] can be described in terms of a set T of mutually exclusive events Ê ≠ 0 such that each event E is the union of a corresponding subset of T. T is called a sample space or fundamental probability set associated with the given experiment; each set of sample points (simple events, elementary events, phases) Ê of T corresponds to an event E. In particular, T itself corresponds to the certain event, and the empty subset of T corresponds to the impossible event.

The probabilities P[E] can then be regarded as values of a set function, the probability function defining the probability distribution of the sample space. Each probability P[E] is the sum of the probabilities attached to the simple events included in the event E.

The event algebra S is thus represented isomorphically by an algebra of measurable sets (see also Secs. 4.6-17b and 12.8-4). The fundamental probability set associated with the conditional probabilities P[E|E1] is the subset of I representing E1. Conversely, a sample space associated with any given experiment may be regarded as a subset “embedded” in a space of events associated with a more general experiment (see also Secs. 18.2-1 and 18.2-2).

18.2-8. Random Variables. A random variable (stochastic variable, chance variable, variate) is any (not necessarily numerical)* variable x whose “values” x = X constitute a fundamental probability set (sample space, Sec. 18.2-7) of simple events [x = X], or whose values label the points of a sample space on a reciprocal one-to-one basis. The associated probability distribution is the distribution of the random variable x.  The definition of any random variable must specify its distribution.

Every single-valued measurable function (Sec. 4.6-14c) x defined on any fundamental probability set T is a random variable; its distribution is defined by the probabilities of the events (measurable subsets of T, Sec. 18.2-7) corresponding to each set of values of x.

18.2-9. Representation of Probability Models in Terms of Numerical Random Variables and Distribution Functions. The simple events (sample points) Ê of the fundamental probability set associated with a given problem are frequently labeled with corresponding values (sample values) X of a real numerical random variable x. Each sample value of x may, for instance, correspond to the result of a measurement defining a simple event. Compound events, like [x ≤ a], [sin x > 0.5], or [x = arctan 2], correspond to measurable sets of values of x (see also Sec. 18.2-8).

More generally, each simple event may be labeled by a corresponding (ordered) set X ≡ (X1, X2, . . .) of real numbers X1, X2, . . . which

* The boldface type used to denote a multidimensional random variable x does not necessarily imply that x is a vector.

constitutes a “value” of a multidimensional random variable x ≡ (x1, x2, . . .). Each of the real variables x1, x2, . . . is itself a random variable (see also Sec. 18.4-1).

Given a random variable x or x ≡ (x1, x2, . . .) labeling the simple events of the given fundamental probability set on a one-to-one basis, the probabilities associated with the corresponding experiment are uniquely described by the probability distribution of the random variable.

Throughout this handbook, all real numerical random variables are understood to range from – ∞ to + ∞ ; values of a numerical random variable which do not label a possible simple event Ê are treated as impossible events and are assigned the probability zero.

The distribution (or the probability function, Sec. 18.2-7) of any real numerical random variable x is uniquely described by its (cumulative) distribution function

Φx(X) ≡ Φ(X) = P[x ≤ X]    (−∞ < X < ∞)

Similarly, the distribution of a multidimensional random variable x ≡ (x1, x2, . . .) is uniquely described by its (cumulative) distribution function

Φx(X1, X2, . . .) ≡ Φ(X1, X2, . . .) = P[x1 ≤ X1, x2 ≤ X2, . . .]

Conversely, the distribution function corresponding to a given probability distribution is uniquely defined for all values of the random variable in question. Every distribution function is a nondecreasing function of each of its arguments, and

Φ(−∞) = 0    Φ(+∞) = 1

18.3. ONE-DIMENSIONAL PROBABILITY DISTRIBUTIONS

18.3-1. Discrete One-dimensional Probability Distributions (see Tables 18.8-1 to 18.8-7 for examples). The real numerical random variable x is a discrete random variable (has a discrete probability distribution) if and only if the probability

(1)    px(X) ≡ p(X) = P[x = X]

is different from zero only on a countable set of spectral values X = X(1), X(2), . . . (spectrum of the discrete random variable x). Each discrete probability distribution is defined by the function (1), or by the corresponding (cumulative) distribution function (Sec. 18.2-9)

(2)    Φx(X) ≡ Φ(X) = P[x ≤ X] = Σ_{X(i) ≤ X} p(X(i))

Throughout this handbook, the notation Σ_x y(x) will be used to signify summation of a function y(x) over all spectral values X(i) of a discrete random variable x (see also Sec. 18.3-6). Note

p(x) ≥ 0    Σ_x p(x) = 1

18.3-2. Continuous One-dimensional Probability Distributions (see Table 18.8-8 for examples). The real numerical random variable x is a continuous random variable (has a continuous probability distribution) if and only if its (cumulative) distribution function Φx(X) ≡ Φ(X) is continuous and has a piecewise continuous derivative, the frequency function (probability density, differential distribution function) of x

φx(X) ≡ φ(X) = dΦ(X)/dX

for all X.† P[X < x ≤ X + dx] = dΦ = φ(X) dx is called a probability element (probability differential). Note

φ(X) ≥ 0    ∫_{−∞}^{∞} φ(x) dx = 1    Φ(X) = ∫_{−∞}^{X} φ(x) dx    P[a < x ≤ b] = Φ(b) − Φ(a) = ∫_{a}^{b} φ(x) dx
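As a computational sketch of the relation between a frequency function and its distribution function (an editorial illustration, with the exponential distribution φ(x) = λe^{−λx}, x ≥ 0, and the rate λ = 2 chosen as assumptions):

```python
import math

lam = 2.0  # assumed rate parameter of an exponential distribution

def phi(x):
    """Frequency function (probability density) of the exponential law."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def Phi(X):
    """Closed-form distribution function: the integral of phi up to X."""
    return 1.0 - math.exp(-lam * X) if X >= 0 else 0.0

# Numerical check that Phi(X) equals the integral of phi (trapezoidal rule):
X = 1.5
n = 20000
h = X / n
integral = sum(0.5 * (phi(i * h) + phi((i + 1) * h)) * h for i in range(n))
```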

If x is a continuous random variable, each event [x = X] has the probability zero but is not necessarily impossible. The spectrum of a continuous random variable x is the set of values x = X where φ(X) ≠ 0.

* In terms of the step function U−(t) [U−(t) = 0 if t < 0, U−(t) = 1 if t ≥ 0, Sec. 21.9-1],

Φ(X) = Σ_x p(x)U−(X − x)

† Some authors call a probability distribution continuous whenever its distribution function is continuous.

NOTE: A random variable can be continuous (i.e., have a piecewise continuous frequency function) over part of its range, while it is discrete elsewhere (see also Sec. 18.3-6).

18.3-3. Expected Values and Variance. Characteristic Parameters of One-dimensional Probability Distributions (see also Sec. 18.3-6). (a)  The expected value (mean, mean value, mathematical expectation) of a function y(x) of a discrete or continuous random variable x is

E{y(x)} = Σ_x y(x)p(x)    (discrete case)
E{y(x)} = ∫_{−∞}^{∞} y(x)φ(x) dx    (continuous case)

if this expression exists in the sense of absolute convergence (see also Secs. 4.6-2 and 4.8-1).

(b) In particular, the expected value (mean, mean value, mathematical expectation) E{x} = ξ and the variance Var {x} = σ² of a discrete or continuous one-dimensional random variable x are defined by

ξ = E{x} = Σ_x xp(x) or ∫_{−∞}^{∞} xφ(x) dx
σ² = Var {x} = E{(x − ξ)²} = Σ_x (x − ξ)²p(x) or ∫_{−∞}^{∞} (x − ξ)²φ(x) dx

For computation purposes note (see also Sec. 18.3-10)

Var {x} = E{x²} − [E{x}]² = α2 − ξ²

Whenever E{x} and Var {x} exist, the mean square deviation

E{(x − X)²} = σ² + (ξ − X)²

of the random variable x from one of its values X is least (and equal to σ2) for X = ξ.
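These relations can be verified exactly for a small discrete distribution. The Python sketch below is an editorial illustration (the fair die and the reference value X0 = 2 are assumed):

```python
from fractions import Fraction

# Fair-die distribution: p(x) = 1/6 for x = 1, ..., 6 (assumed example).
p = {x: Fraction(1, 6) for x in range(1, 7)}

xi  = sum(x * px for x, px in p.items())              # E{x}
var = sum((x - xi) ** 2 * px for x, px in p.items())  # Var{x}
ex2 = sum(x ** 2 * px for x, px in p.items())         # E{x^2}

X0 = Fraction(2)                                      # an arbitrary value X
msd = sum((x - X0) ** 2 * px for x, px in p.items())  # E{(x - X0)^2}
```

The assertions below confirm Var {x} = E{x²} − ξ² and the shift formula E{(x − X)²} = σ² + (ξ − X)².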

(c) E{x} and Var {x} are not functions of x; they are functionals (Sec. 12.1-4) describing properties of the distribution of x. E{x} is a measure of location, and Var {x} is a measure of dispersion (or concentration) of the probability distribution of x. A number of other numerical “characteristic parameters” describing specific properties of one-dimensional probability distributions are defined in Table 18.3-1 and in Secs. 18.3-7 and 18.3-9. Note that one or more parameters like E{x}, Var {x}, E{|x − ξ|}, . . . may not exist for a given probability distribution.

(d)  Tables 18.8-1 to 18.8-8 list mean values and variances for a number of frequently used probability distributions.

Table 18.3-1. Numerical Parameters Describing Properties of One-dimensional Probability Distributions (see also Secs. 18.3-3, 18.3-7, and 18.3-9)

image

18.3-4. Normalization.  Given a function ψ(x) ≥ 0 known to be proportional to the function p(x) associated with a discrete random variable x (Sec. 18.3-1),*

p(x) = kψ(x)    with    k = 1 / Σ_x ψ(x)

Given a function ψ(x) ≥ 0 known to be proportional to the frequency function φ(x) of a continuous random variable x (Sec. 18.3-2),

φ(x) = kψ(x)    with    k = 1 / ∫_{−∞}^{∞} ψ(x) dx

In either case, k is called the normalization factor.  Analogous procedures apply to multidimensional distributions (Sec. 18.4-1).

18.3-5. Chebyshev's Inequality and Related Formulas.  The following formulas specify upper bounds for the probability that a random variable x, or its absolute deviation |x — ξ| from the mean value ξ = E{x}, exceeds a given value a > 0.

P[x ≥ a] ≤ E{x}/a    (x a nonnegative random variable)
P[|x − ξ| ≥ a] ≤ σ²/a²    (Chebyshev's inequality)

If x has a continuous distribution with a single mode (Table 18.3-1) ξmode, one has the stronger inequality

image

where s is Pearson's measure of skewness (Table 18.3-1); note that s = 0 if the distribution is symmetrical about the mode.
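Chebyshev's inequality can be checked directly against an exact tail probability. The following Python sketch is an editorial illustration; the fair die and the threshold a = 2 are assumed choices:

```python
from fractions import Fraction

# Fair-die distribution (assumed example), its mean and variance.
p = {x: Fraction(1, 6) for x in range(1, 7)}
xi  = sum(x * px for x, px in p.items())
var = sum((x - xi) ** 2 * px for x, px in p.items())

a = Fraction(2)
# Exact tail probability P[|x - xi| >= a] versus the Chebyshev bound.
p_tail = sum(px for x, px in p.items() if abs(x - xi) >= a)
bound = var / a ** 2
```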

18.3-6. Improved Description of Probability Distributions: Use of Stieltjes Integrals.  The treatment of discrete and continuous probability distributions is unified if one expresses the probability of each event [X – ΔX < x ≤ X + ΔX] as a Lebesgue-Stieltjes integral (Sec. 4.6-17)

(15)    P[X − ΔX < x ≤ X + ΔX] = ∫_{X−ΔX}^{X+ΔX} dΦ(x) = Φ(X + ΔX) − Φ(X − ΔX)

* In order to conform with the notation used in many textbooks, the values x = X of a random variable x will be denoted simply by x whenever this notation does not lead to ambiguities.

where Φ(X) ≡ P[x ≤ X] is the cumulative distribution function (Secs. 18.2-9, 18.3-1, and 18.3-2) defining the distribution of the random variable x. For continuous distributions the Stieltjes integral (15) reduces to a Riemann integral. For a discrete distribution, Φ(X) is given by Eq. (2), and P[X − ΔX < x ≤ X + ΔX] reduces to the function p(X) defined in Sec. 18.3-1.

In terms of the Stieltjes-integral notation,

E{y(x)} = ∫_{−∞}^{∞} y(x) dΦ(x)

for both discrete and continuous distributions. The Stieltjes-integral notation applies also to probability distributions which are partly discrete and partly continuous. An analogous notation is used for multidimensional distributions (Secs. 18.4-4 and 18.4-8).

Discrete distributions may be formally represented in terms of a “probability density” involving impulse functions δ−(X − X(i)) (see also Secs. 18.3-1 and 21.9-6).

18.3-7. Moments of a One-dimensional Probability Distribution (see also Secs. 18.3-6 and 18.3-10).  (a)  The moment of order r (rth moment) about x = X of a given random variable x is the mean value E{(x − X)^r}, if this quantity exists in the sense of absolute convergence (Sec. 18.3-3).

(b) In particular, the rth moment of x about X = 0 is

αr = E{x^r}

and the rth moment of x about its mean value ξ (central moment of order r) is

μr = E{(x − ξ)^r}

The existence of αr or μr implies the existence of all moments αk and μk of order k ≤ r; the divergence of αr or μr implies the divergence of all moments αk and μk of order k ≥ r.

If the probability distribution is symmetric about its mean, all (existing) central moments µr of odd order r are equal to zero.

(c) The rth factorial moment of x about X = 0 is

α[r] = E{x[r]} = E{x(x − 1)(x − 2) · · · (x − r + 1)}

The rth central factorial moment of x is E{(x − ξ)[r]}. The rth absolute moment of x about X = 0 is βr = E{|x|^r}. Note

image

(d) A one-dimensional probability distribution is uniquely defined by its moments α0, α1, α2, . . . if they all exist and are such that the series Σ_{r=0}^{∞} αr s^r/r! converges absolutely for some |s| > 0 [see also Eq. (28) and the footnote to Sec. 18.3-8b].

(e) Refer to Tables 18.8-1 to 18.8-7 for examples, and to Sec. 18.3-10 for relations connecting the αr, µr, and α[r].
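The three kinds of moments introduced above can be computed exactly for a small discrete distribution. The Python sketch below is an editorial illustration (the binomial distribution with n = 3, p = 1/2 is an assumed example):

```python
from fractions import Fraction

# Binomial(3, 1/2): p(0..3) = 1/8, 3/8, 3/8, 1/8 (assumed example).
p = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def alpha(r):
    """Moment about the origin, alpha_r = E{x^r}."""
    return sum(x ** r * px for x, px in p.items())

xi = alpha(1)  # the mean

def mu(r):
    """Central moment, mu_r = E{(x - xi)^r}."""
    return sum((x - xi) ** r * px for x, px in p.items())

def falling(x, r):
    """x^[r] = x(x-1)...(x-r+1)."""
    out = 1
    for k in range(r):
        out *= x - k
    return out

def alpha_fact(r):
    """Factorial moment, alpha_[r] = E{x^[r]}."""
    return sum(falling(x, r) * px for x, px in p.items())
```

The assertions confirm μ2 = α2 − α1² and that the odd central moment μ3 of this symmetric distribution vanishes.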

18.3-8. Characteristic Functions and Generating Functions (see also Sec. 18.3-6; refer to Tables 18.8-1 to 18.8-8 for examples).*

(a)The probability distribution of any one-dimensional random variable x uniquely defines its (generally complex-valued) characteristic function

(21)    χx(q) = E{e^{iqx}}

where q is a real variable ranging between – ∞ and ∞.

(b) The probability distribution of a random variable x uniquely defines its moment-generating function

(22)    Mx(s) = E{e^{sx}}

and its generating function (see also Sec. 8.6-5)

* See footnote to Sec. 18.3-4.

(23)    γx(s) = E{s^x}

for each value of the complex variable s such that the function in question exists in the sense of absolute convergence.

(c) The characteristic function χx(q) defines the probability distribution of x uniquely.* The same is true for each of the functions Mx(s) and γx(s) if it exists, in the sense of absolute convergence, throughout an interval of the real axis including s = 0 in the case of Mx(s), and s = 1 in the case of γx(s). Specifically, if x is a discrete or continuous random variable,

(24)    φ(x) = (1/2π) ∫_{−∞}^{∞} e^{−iqx}χx(q) dq    (continuous case)
        p(X) = lim_{Q→∞} (1/2Q) ∫_{−Q}^{Q} e^{−iqX}χx(q) dq    (discrete case)

Eq. (24) also yields p(x) or φ(x) in terms of Mx(s), since

χx(q) = Mx(iq)

(d) In many problems it is much easier to obtain a description of a probability distribution in terms of χx(q), Mx(s), or γx(s) than to compute Φ(x), p(x), or φ(x) directly (Secs. 18.5-3b, 18.5-7, and 18.5-8). Again, the methods of Sec. 18.3-10 permit one to compute mean values, variances, and moments by simple differentiations if χx(q), Mx(s), or γx(s) are known. The linear integral transformations (21) to (24) can often be made with the aid of tables of Fourier or Laplace transform pairs (Appendix D).

(e) The generating function γx(s) is particularly useful in problems involving discrete distributions with spectral values 0, 1, 2, . . . , for then

γx(s) = Σ_{k=0}^{∞} p(k)s^k    p(k) = (1/k!)[d^kγx(s)/ds^k]_{s=0}    (k = 0, 1, 2, . . .)

whenever the series converges (see also Sec. 18.8-1; see Ref. 18.4 for a number of interesting applications).
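For a distribution with spectral values 0, 1, 2, . . . , the generating function is a power series in s whose coefficients are the probabilities p(k), and E{x} = γx′(1) (Sec. 18.3-10). The Python sketch below is an editorial illustration; the binomial distribution with n = 3, p = 1/2, whose generating function is ((1 + s)/2)³, is an assumed example:

```python
from fractions import Fraction

# p(0..3) = 1/8, 3/8, 3/8, 1/8: coefficients of the generating function
# gamma_x(s) = sum p(k) s^k (binomial n = 3, p = 1/2, assumed example).
coeffs = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

def gamma(s):
    """gamma_x(s) = sum_k p(k) s^k."""
    return sum(c * s ** k for k, c in enumerate(coeffs))

def gamma_prime(s):
    """d gamma_x/ds, computed term by term from the coefficients."""
    return sum(k * c * s ** (k - 1) for k, c in enumerate(coeffs) if k > 0)

mean = gamma_prime(1)  # E{x} = gamma_x'(1)
```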

18.3-9. Semi-invariants (see also Sec. 18.3-10).  Given a one-dimensional probability distribution such that the rth moment αr exists, the first r semi-invariants (cumulants) k1, k2, . . . , kr of the distribution exist and are defined by

log_e χx(q) = k1(iq) + k2(iq)²/2! + · · · + kr(iq)^r/r! + o(q^r)    (q → 0)

* Φ(x) is, then, uniquely defined except possibly on a set of measure zero; Φ(x) is unique wherever it is continuous (see also Sec. 18.2-9).

Under the conditions of Sec. 18.3-7d all semi-invariants k1, k2, . . . exist and define the distribution uniquely.

18.3-10. Computation of Moments and Semi-invariants from χx(q), Mx(s), and γx(s).  Relations between Moments and Semi-invariants. Many properties of a distribution can be computed directly from χx(q), Mx(s), or γx(s) without previous computation of Φ(x), φ(x), or p(x). If the quantities in question exist,

(28)    αr = [(1/i^r) d^rχx(q)/dq^r]_{q=0} = [d^rMx(s)/ds^r]_{s=0}
        α[r] = [d^rγx(s)/ds^r]_{s=1}
        kr = [(1/i^r) d^r log_e χx(q)/dq^r]_{q=0} = [d^r log_e Mx(s)/ds^r]_{s=0}

Note

Mx(s) = Σ_{r=0}^{∞} αr s^r/r!    log_e Mx(s) = Σ_{r=1}^{∞} kr s^r/r!    γx(1 + s) = Σ_{r=0}^{∞} α[r] s^r/r!

provided that the function on the left (respectively the moment generating function, the semi-invariant-generating function, and the factorial-moment-generating function of x) is analytic throughout a neighborhood of s = 0.

Equations (28) yield E{x} and Var {x} with the aid of the relations

E{x} = ξ = α1 = k1    Var {x} = σ² = α2 − α1² = k2 = μ2

Table 18.3-1 lists other parameters which can be expressed in terms of moments.

The following additional formulas relate moments and semi-invariants:

k1 = α1 = ξ
k2 = α2 − α1² = μ2
k3 = α3 − 3α1α2 + 2α1³ = μ3
k4 = α4 − 4α1α3 − 3α2² + 12α1²α2 − 6α1⁴ = μ4 − 3μ2²

18.4. MULTIDIMENSIONAL PROBABILITY DISTRIBUTIONS

18.4-1. Joint Distributions (see also Sec. 18.2-9).  The probability distribution of a multidimensional random variable x ≡ (x1, x2, . . .) is described as a joint distribution of real numerical random variables x1, x2, . . . . Each simple event (point of the multidimensional sample space) [x = X] ≡ [x1 = X1, x2 = X2, . . .] may be regarded as a result of a compound experiment in which each of the variables x1, x2, . . . is measured. Each joint distribution is completely defined by its (cumulative) joint distribution function.

18.4-2. Two-dimensional Probability Distributions. Marginal Distributions.  The joint distribution of two random variables x1, x2 is defined by its (cumulative) distribution function

Φ(X1, X2) = P[x1 ≤ X1, x2 ≤ X2]

The distributions of x1 and x2 (marginal distributions derived from the joint distribution of x1 and x2) are described by the corresponding marginal distribution functions

Φ1(X1) = P[x1 ≤ X1] = Φ(X1, +∞)    Φ2(X2) = P[x2 ≤ X2] = Φ(+∞, X2)

18.4-3. Discrete and Continuous Two-dimensional Probability Distributions.  (a) A two-dimensional random variable x ≡ (x1, x2) is a discrete random variable (has a discrete probability distribution) if and only if the joint probability

p(X1, X2) = P[x1 = X1, x2 = X2]

is different from zero only for a countable set (spectrum) of “points” (X1, X2), i.e., if and only if both x1 and x2 are discrete random variables (Sec. 18.3-1). The marginal probabilities respectively associated with the marginal distributions of x1 and x2 (Sec. 18.4-2) are

p1(X1) = P[x1 = X1] = Σ_{X2} p(X1, X2)    p2(X2) = P[x2 = X2] = Σ_{X1} p(X1, X2)

(b) A two-dimensional random variable x = (x1, x2) is a continuous random variable (has a continuous probability distribution) if and only if (1) Φ(X1, X2) is continuous for all X1, X2, and (2) the joint frequency function (probability density)

φ(X1, X2) = ∂²Φ(X1, X2)/∂X1∂X2

exists and is piecewise continuous everywhere.* φ(X1, X2) dX1 dX2 is called a probability element.  The spectrum of a continuous two-dimensional probability distribution is the set of “points” (X1, X2) where the frequency function (5) is different from zero. The marginal frequency functions respectively associated with the (necessarily continuous) marginal distributions of x1 and x2 (Sec. 18.4-2) are

φ1(x1) = ∫_{−∞}^{∞} φ(x1, x2) dx2    φ2(x2) = ∫_{−∞}^{∞} φ(x1, x2) dx1

(c) Note

Σ_{X1} Σ_{X2} p(X1, X2) = 1    (discrete case)    ∫_{−∞}^{∞} ∫_{−∞}^{∞} φ(x1, x2) dx1 dx2 = 1    (continuous case)
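The marginal probabilities of a discrete joint distribution are obtained by summing the joint probabilities over the other variable (Secs. 18.4-2 and 18.4-3a). The Python sketch below is an editorial illustration with a hypothetical joint table:

```python
from fractions import Fraction

# A hypothetical discrete joint distribution p(X1, X2) on {0, 1} x {0, 1}.
p = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
     (1, 0): Fraction(3, 8), (1, 1): Fraction(1, 8)}

p1 = {}  # marginal distribution of x1: p1(X1) = sum over X2 of p(X1, X2)
p2 = {}  # marginal distribution of x2: p2(X2) = sum over X1 of p(X1, X2)
for (x1, x2), pr in p.items():
    p1[x1] = p1.get(x1, Fraction(0)) + pr
    p2[x2] = p2.get(x2, Fraction(0)) + pr

total = sum(p.values())  # normalization: must equal 1
```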

18.4-4. Expected Values, Moments, Covariance, and Correlation Coefficient.  (a) The expected value (mean value, mathematical expectation) of a function y = y(x1, x2) of two random variables x1, x2 with respect to their joint distribution is

image

* See footnote to Sec. 18.3-2.

if this expression exists in the sense of absolute convergence (see also Sec. 18.3-3).

NOTE: If y is a function of x1 alone, the mean value (8) is identical with the mean value (marginal expected value) with respect to the marginal distribution of x1.

(b) The mean values E{x1} = ξ1, E{x2} = ξ2 define a “point” (ξ1, ξ2) called the center of gravity of the joint distribution. The quantities E{(x1 − X1)^{r1}(x2 − X2)^{r2}} are called moments of order r1 + r2 about the “point” (X1, X2).  In particular, the quantities

α_{r1r2} = E{x1^{r1} x2^{r2}}    μ_{r1r2} = E{(x1 − ξ1)^{r1}(x2 − ξ2)^{r2}}

are, respectively, the moments about the origin and the moments about the center of gravity (central moments) of order r1 + r2 (see also Sec. 18.3-7b).

(c) The second-order central moments are of special interest and warrant a special notation.  Note the following definitions:

image

(see also Sec. 18.4-8). Note −1 ≤ ρ12 ≤ 1, and

image
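These second-order central moments are easy to check numerically. The following is a minimal sketch for a small discrete joint distribution; the probabilities are illustrative, not taken from the text.

```python
# Second-order central moments lambda11, lambda22, lambda12 and the
# correlation coefficient rho12 (Sec. 18.4-4c) for a 2 x 2 discrete
# joint distribution with illustrative probabilities.
import math

p = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def E(f):
    """Expected value of f(x1, x2) with respect to the joint distribution."""
    return sum(w * f(a, b) for (a, b), w in p.items())

xi1 = E(lambda a, b: a)                        # E{x1}
xi2 = E(lambda a, b: b)                        # E{x2}
lam11 = E(lambda a, b: (a - xi1) ** 2)         # Var{x1} = sigma1^2
lam22 = E(lambda a, b: (b - xi2) ** 2)         # Var{x2} = sigma2^2
lam12 = E(lambda a, b: (a - xi1) * (b - xi2))  # Cov{x1, x2}
rho12 = lam12 / math.sqrt(lam11 * lam22)       # correlation coefficient

print(rho12)
```

The bound −1 ≤ ρ12 ≤ 1 holds for any choice of joint probabilities.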

18.4-5. Conditional Probability Distributions Involving Two Random Variables.  (a) The joint distribution of two random variables x1, x2 defines a conditional distribution of x1 relative to the hypothesis that x2 = X2 for each value X2 of x2 and a conditional distribution of x2 relative to each hypothesis x1 = X1.  The conditional distributions of x1 and x2 derived from a discrete joint distribution (Sec. 18.4-3a) are discrete and may be described by the respective conditional probabilities (Sec. 18.2-2)

image

The conditional distributions of x1 and x2 derived from a continuous joint distribution (Sec. 18.4-3b) are continuous and may be described by the respective conditional frequency functions

image

(b) Note

image

(c) Given a discrete or continuous joint distribution of two random variables x1 and x2, the conditional expected value of a function y(x1, x2) relative to the hypothesis that x1 = X1 is

image

if this expression exists in the sense of absolute convergence.  Note that E{y(x1, x2)|X1} is a function of X1.

EXAMPLE: The conditional variances of x1 and x2 are the respective functions

image

18.4-6. Regression (see also Secs. 18.4-9 and 19.7-2).  (a) Given the joint distribution of two random variables x1 and x2, a regression of x2 on x1 is any function g2(x1) used to approximate the statistical dependence of x2 on x1 by a deterministic relation x2 ≈ g2(x1).  More specifically, x2 is written as a sum of two random variables,

image

where h2(x1, x2) is regarded as a correction term.  In particular, the function

image

often simply called the regression of x2 on x1, minimizes the mean-square deviation

image

The corresponding curve x2 = E{x2|x1} is the (theoretical) mean-square regression curve of x2.

(b) It is often sufficient to approximate the regression (19) by the linear function

image

Equation (21) describes a straight line, the mean-square regression line of x2; β21 is the regression coefficient of x2 on x1.  Equation (21) represents the linear function ax1 + b whose coefficients a, b minimize the mean-square deviation

image

The resulting minimum mean-square deviation is σ22(1 − ρ122); the correlation coefficient ρ12 is seen to measure the quality of the “best” linear approximation.
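The regression line and its minimum deviation can be verified numerically; this sketch uses illustrative probabilities (not from the text) and checks that the slope λ12/σ1² does yield the deviation σ2²(1 − ρ12²).

```python
# Mean-square regression line of x2 on x1 for a small discrete joint
# distribution: slope beta21 = lambda12 / sigma1^2, and the resulting
# mean-square deviation equals sigma2^2 * (1 - rho12^2).
import math

p = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
E = lambda f: sum(w * f(a, b) for (a, b), w in p.items())

xi1, xi2 = E(lambda a, b: a), E(lambda a, b: b)
var1 = E(lambda a, b: (a - xi1) ** 2)
var2 = E(lambda a, b: (b - xi2) ** 2)
cov = E(lambda a, b: (a - xi1) * (b - xi2))
rho = cov / math.sqrt(var1 * var2)

beta21 = cov / var1          # regression coefficient of x2 on x1
# mean-square deviation of x2 about the regression line
msd = E(lambda a, b: (b - (xi2 + beta21 * (a - xi1))) ** 2)

print(msd, var2 * (1 - rho ** 2))   # the two numbers agree
```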

(c) The mean-square regression (19) may be approximated more closely by a polynomial of degree m (parabolic mean-square regression of order m) or by other approximating functions, with coefficients or parameters chosen so as to minimize (20).

(d) If x2 is regarded as the independent variable, one has similarly

image

Note that in general neither (19) and (22) nor (21) and (23) are inverse functions. All mean-square regression curves and mean-square regression lines pass through the center of gravity (ξ1, ξ2) of the joint distribution.

The above definitions apply, in particular, if either of the two random variables, say x1 = t, becomes a given independent variable, and x2(t) describes a random process (Sec. 18.9-1).

18.4-7. n-dimensional Probability Distributions.  (a) The joint distribution of n random variables x1, x2, . . . , xn is uniquely described by its (cumulative) joint distribution function

image

(Sec. 18.2-9). The joint distribution of m < n of the variables x1, x2, . . . , xn is an m-dimensional marginal distribution derived from the original joint distribution.  One obtains the corresponding marginal distribution function from the joint distribution function (24) by substituting Xj = ∞ for each of the n − m arguments Xj which do not occur in the marginal distribution, e.g.,

image

(b) An n-dimensional random variable x ≡ (x1, x2, . . . , xn) is a discrete random variable (has a discrete probability distribution) if and only if the joint probability

image

differs from zero only for a countable set (spectrum) of “points” (X1, X2, . . . , Xn), i.e., if and only if each of the n random variables x1, x2, . . . , xn is discrete (see also Secs. 18.3-1 and 18.4-3a).

Marginal probabilities and conditional probabilities are defined in the manner of Secs. 18.4-3a and 18.4-5a, e.g.,

image

(c) An n-dimensional random variable x ≡ (x1, x2, . . . , xn) is a continuous random variable (has a continuous probability distribution) if and only if (1) Φ(X1, X2, . . . , Xn) is continuous for all X1, X2, . . . , Xn and (2) the joint frequency function (probability density)

image

exists and is piecewise continuous everywhere.* φ(X1, X2, . . . , Xn) dx1 dx2 . . . dxn is called a probability element (see also Secs. 18.3-2 and 18.4-3b). The spectrum of a continuous probability distribution is the set of “points” (X1, X2, . . . , Xn) where the frequency function (26) is different from zero.

(d) Note

image

(e) The frequency functions associated with the (necessarily continuous) marginal and conditional distributions derived from a continuous n-dimensional probability distribution are defined in the manner of Secs. 18.4-3b and 18.4-5a, e.g.,

image

(f) The joint distribution of two or more multidimensional random variables x = (x1, x2, . . .), y = (y1, y2, . . .), . . . is the joint distribution of the random variables x1, x2, . . . ; y1, y2, . . . ; . . . .

NOTE: A joint distribution may be discrete with respect to one or more of the random variables involved, and continuous with respect to one or more of the others; and each random variable may be partly discrete and partly continuous.

18.4-8. Expected Values and Moments (see also Sec. 18.4-4).  (a) The expected value (mean value, mathematical expectation) of a function y = y(x1, x2, . . . , xn) of n random variables x1, x2, . . . , xn with respect to their joint distribution is

* See footnote to Sec. 18.3-2.

image

if this expression exists in the sense of absolute convergence.

NOTE: If y is a function of only m < n of the n random variables x1, x2, . . . , xn, then the mean value (28) is identical with the mean value of y with respect to the joint distribution (marginal distribution, Sec. 18.4-7) of the m variables in question.

(b) The n mean values E{x1} = ξ1, E{x2} = ξ2, . . . , E{xn} = ξn define a “point” (ξ1, ξ2, . . . , ξn) called the center of gravity of the joint distribution. The quantities E{(x1 − X1)r1(x2 − X2)r2 . . . (xn − Xn)rn} are the moments of order r1 + r2 + . . . + rn about the “point” (X1, X2, . . . , Xn). In particular, the quantities

image

are, respectively, the moments about the origin and the moments about the center of gravity (central moments).

(c) The second-order central moments are again of special interest and warrant a special notation; the quantities

image

define the moment matrix [λik] ≡ Λ and its reciprocal (Sec. 13.2-3)*

image

det [λik] is the generalized variance of the joint distribution.  The (total) correlation coefficients

* Note that some authors denote the cofactor matrix [λik]−1 det [λik] by [Λik].  The notation chosen here simplifies some expressions.

image

(see also Sec. 18.4-4c) define the correlation matrix [ρik] of the joint distribution. image is sometimes called the scatter coefficient.

The matrices [λik] and [ρik] are real, symmetric, and nonnegative (Secs. 13.3-2 and 13.5-2). Their common rank (Sec. 13.2-7) r is the rank of the joint distribution. The ellipsoid of concentration corresponding to a given n-dimensional probability distribution is the n-dimensional “ellipsoid”

image

defined so that a uniform distribution of a unit probability “mass” inside the hypersurface has the moment matrix [λik]. The ellipsoid of concentration illustrates the “concentration” of the distribution in different “directions”; the “volume” of the ellipsoid is proportional to the square root of the generalized variance. For r < n, the probability distribution is singular: its spectrum (Sec. 18.4-7) is restricted to an r-dimensional linear manifold (straight line, plane, hyperplane) in the n-dimensional space of “points” (x1, x2, . . . , xn), and the same is true for its ellipsoid of concentration. Thus the spectrum of a two-dimensional probability distribution is restricted to a straight line if r = 1, and to a point if r = 0.

18.4-9. Regression. Multiple and Partial Correlation Coefficients (see also Secs. 18.4-6 and 19.7-2). (a) Given the joint distribution of n random variables x1, x2, . . . , xn, one may study the dependence of one of the variables, say x1, on the remaining n − 1 variables by writing

image

where h1(x1, x2, . . . , xn) is regarded as a correction term. The function

image

(mean-square regression of x1 on x2, x3, . . . , xn) minimizes the mean-square deviation E{[x1 − g1(x2, x3, . . . , xn)]2}; E{x1|X2, X3, . . . , Xn} is the conditional mean of x1 relative to the hypothesis that x2 = X2, x3 = X3, . . . , xn = Xn (see also Sec. 18.4-5c).

(b) The mean-square regression of any variable xi on the remaining n − 1 variables is often approximated by the linear function

image

(see also Sec. 18.4-6). * The regression coefficients βik are uniquely determined if the distribution is nonsingular (Sec. 18.4-8). The multiple correlation coefficient

image

is a measure of the correlation between xi and the remaining n - 1 variables.

(c) The random variable hi(1) ≡ xi − gi(1) (difference between xi and its “linear estimate” gi(1)) is the residual of xi with respect to the remaining n − 1 variables. Note

image

(d) Regressions and residuals may be similarly defined in connection with a suitable marginal distribution (Sec. 18.4-7a) of m < n variables, say x1, x2, . . . , xm. The quantities analogous to β12, β13, . . . ; h1(1), h2(1), . . . are then respectively denoted by β12.34...m, β13.24...m, . . . ; h(1)1.23...m, h(1)2.13...m, . . .; in each case, there is a subscript corresponding to each variable of the marginal distribution.

(e) The partial correlation coefficient of x1 and x2 with respect to x3, x4, . . . , xn

image

measures the correlation of x1 and x2 after removal of the linearly approximated effects of x3, x4, . . . , xn. In particular, for n = 3,

image
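The n = 3 formula can be checked against the residual definition of (c): the partial correlation coefficient equals the ordinary correlation coefficient of the residuals of x1 and x2 after the linear effect of x3 is removed. A sketch on an illustrative three-dimensional discrete distribution (numbers not from the text):

```python
# Check: partial correlation rho12.3 from the n = 3 formula equals the
# correlation of the residuals of x1 and x2 with respect to x3.
import math

p = {(0, 0, 0): 0.2, (1, 0, 0): 0.1, (0, 1, 0): 0.1,
     (1, 1, 1): 0.3, (0, 0, 1): 0.15, (1, 1, 0): 0.15}

def E(f):
    return sum(w * f(*x) for x, w in p.items())

m = [E(lambda *x: x[k]) for k in range(3)]        # means

def lam(i, j):                                    # central moments
    return E(lambda *x: (x[i] - m[i]) * (x[j] - m[j]))

def rho(i, j):                                    # correlation coefficients
    return lam(i, j) / math.sqrt(lam(i, i) * lam(j, j))

# partial correlation coefficient from the n = 3 formula
rho12_3 = (rho(0, 1) - rho(0, 2) * rho(1, 2)) / math.sqrt(
    (1 - rho(0, 2) ** 2) * (1 - rho(1, 2) ** 2))

# the same number from the residuals with respect to x3
def resid(k, x):
    return (x[k] - m[k]) - lam(k, 2) / lam(2, 2) * (x[2] - m[2])

c12 = E(lambda *x: resid(0, x) * resid(1, x))
v1 = E(lambda *x: resid(0, x) ** 2)
v2 = E(lambda *x: resid(1, x) ** 2)
print(rho12_3, c12 / math.sqrt(v1, v2) if False else c12 / math.sqrt(v1 * v2))
```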

18.4-10. Characteristic Functions (see also Sec. 18.3-8). The probability distribution of an n-dimensional random variable x ≡ (x1, x2, . . . , xn) uniquely defines the corresponding characteristic function (joint characteristic function of x1, x2, . . . , xn)

* See footnote to Sec. 18.4-8b.

image

and conversely. For continuous distributions,

image

The joint characteristic function corresponding to the marginal distribution of m < n of the n variables x1, x2, . . . , xn is obtained by substitution of qk = 0 in Eq. (39) whenever xk does not occur in the marginal distribution; thus χ12(q1, q2) ≡ χx(q1, q2, 0, . . . , 0).

Moments and semi-invariants of suitable multidimensional probability distributions can be obtained as coefficients in multiple series expansions of χx and loge χx in a manner analogous to that of Sec. 18.3-10.

18.4-11. Statistically Independent Random Variables (see also Secs. 18.2-3 and 18.5-7). * (a) A set of random variables x1, x2, . . . , xn is statistically independent if and only if the events [x1 ∈ S1], [x2 ∈ S2], . . . , [xn ∈ Sn] are statistically independent for every collection of real-number sets S1, S2, . . . , Sn. This is true if and only if

image

or, in the respective cases of discrete and continuous random variables, if and only if

image

The joint distribution of statistically independent random variables is completely defined by their individual marginal distributions. Statistically independent random variables x1, x2, . . . are uncorrelated, i.e., ρik = 0 for all i ≠ k (Sec. 18.4-8c), but the converse is not necessarily true (see also Sec. 18.8-8).
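The converse failure is worth a concrete check. In this standard illustrative example, x1 takes the values −1, 0, 1 with equal probability and x2 = x1²; the pair is uncorrelated but clearly not independent.

```python
# Uncorrelated but statistically dependent: x1 uniform on {-1, 0, 1}
# and x2 = x1**2.
p = {(-1, 1): 1/3, (0, 0): 1/3, (1, 1): 1/3}
E = lambda f: sum(w * f(a, b) for (a, b), w in p.items())

cov = E(lambda a, b: a * b) - E(lambda a, b: a) * E(lambda a, b: b)
print(cov)   # 0: the variables are uncorrelated

# but the joint probability does not factor, so they are not independent
p1_0 = sum(w for (a, b), w in p.items() if a == 0)   # P[x1 = 0]
p2_0 = sum(w for (a, b), w in p.items() if b == 0)   # P[x2 = 0]
print(p.get((0, 0), 0.0), p1_0 * p2_0)               # 1/3 versus 1/9
```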

(b) Statistical independence of multidimensional random variables x1, x2, . . . is defined by Eqs. (41) or (42) on substitution of the multidimensional variables for the one-dimensional variables x1, x2, . . . .

* See footnote to Sec. 18.3-4.

EXAMPLE: The multidimensional random variables (x1, x2) and (x3, x4, x5) are statistically independent if and only if

image

Note that Eq. (43) implies the statistical independence of x2 and x5, x1 and (x3, x4), (x1, x2) and (x3, x5), etc.

(c) Given a joint distribution of n discrete or continuous random variables x1, x2, . . . , xn such that (x1, x2, . . . , xm) is statistically independent of (xm+1, xm+2, . . . , xn), note

image

(d) Two random variables x1 and x2 are statistically independent if and only if their joint characteristic function is the product of their individual (marginal) characteristic functions (Sec. 18.4-10), i.e.,

image

An analogous theorem applies for multidimensional random variables (see also Sec. 18.5-7).

(e) If the random variables x1, x2, . . . are statistically independent, the same is true for the random variables y1(x1), y2(x2), . . . . An analogous theorem holds for multidimensional random variables.

18.4-12. Entropy of a Probability Distribution, and Related Topics.  (a) The entropy associated with the probability distribution of a one-dimensional random variable x is defined as

image

H{x} (entropy of x) is a measure of the expected uncertainty involved in a measurement of x. In the case of discrete probability distributions, H{x} ≥ 0, with H{x} = 0 if and only if x has a causal distribution (Table 18.8-1). The continuous distribution having the largest entropy for a given variance σ2 is the normal distribution (Sec. 18.8-3), with H{x} = log2 image.

(b) In connection with the discrete or continuous joint distribution of two random variables x1, x2, one defines the joint entropy

image

and the conditional entropies

image

and Hx1{x2} (these are not conditional expected values, Sec. 18.4-5c), so that

image

The equality on the right applies if and only if x1 and x2 are statistically independent (Sec. 18.4-11). The nonnegative quantity

image

is a measure of the “statistical dependence” of x1 and x2. The functionals (46), (47), (48), and (50) have intuitive significance in statistical mechanics and in the theory of communications.
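The entropy relations above can be exercised on a small discrete joint distribution; a minimal sketch with illustrative probabilities (entropies in bits):

```python
# Joint, marginal, and conditional entropies for a 2 x 2 discrete joint
# distribution, plus the nonnegative dependence measure (50).
import math

p = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def H(probs):
    """Entropy -sum p log2 p of a list of probabilities."""
    return -sum(w * math.log2(w) for w in probs if w > 0)

p1 = [0.5, 0.5]            # marginal distribution of x1
p2 = [0.4, 0.6]            # marginal distribution of x2
H12 = H(p.values())        # joint entropy H{x1, x2}
H1, H2 = H(p1), H(p2)

Hc = H12 - H1              # conditional entropy of x2 given x1
I = H1 + H2 - H12          # dependence measure (50); zero iff independent
print(H12, Hc, I)
```

Here I > 0, consistent with the fact that the chosen joint probabilities do not factor into the product of the marginals.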

18.5. FUNCTIONS OF RANDOM VARIABLES.

CHANGE OF VARIABLES

18.5-1. Introduction. The following relations permit one to calculate probability distributions of suitable functions of random variables and, in particular, to change the random variables employed to describe a given set of events.

18.5-2. Functions (or Transformations) of a One-dimensional Random Variable.  (a) Given a transformation y = y(x) associating a unique value of a random variable y with each value of the random variable x, the probability distribution of y is uniquely determined by that of x [see also Sec. 18.2-8; y(x) must be a measurable function].

(b) Let the random variables x and y be related by a reciprocal one-to-one transformation y = y(x), with x = x(y). Then

1. If y(x) is an increasing function,

image

Note that either y(x) or -y(x) is necessarily an increasing function. In either case, the medians x½ and y½ are related by y½ = y(x½).

2. If x and y are continuous random variables,

image

for all values Y of y such that dx/dy exists and is continuous.

NOTE: If x(y) is multiple-valued, one writes φy(Y) = φ1(Y) + φ2(Y) + . . . , where φ1(Y), φ2(Y), . . . are the frequency functions obtained from Eq. (2) for the respective single-valued “branches” x1(y), x2(y), . . . of x(y). EXAMPLE: If

image
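A numerical illustration of Eq. (2) (this is a supplementary example, not the book's): for x uniform on (0, 1) and y = x², the branch x = √y gives φy(Y) = φx(√Y)|dx/dy| = 1/(2√Y) on 0 < Y < 1, and both densities must yield the same E{y}.

```python
# Check by midpoint sums that phi_y(Y) = 1/(2 sqrt(Y)) and phi_x(X) = 1
# give the same expected value of y = x**2 (namely 1/3).
n = 200000
h = 1.0 / n
mid = [(k + 0.5) * h for k in range(n)]

# E{y} integrated against phi_y(Y) = 1 / (2 sqrt(Y))
Ey_from_phi_y = sum(Y * (1.0 / (2.0 * Y ** 0.5)) * h for Y in mid)

# E{y(x)} integrated against phi_x(X) = 1, as in Eq. (4) with f(y) = y
Ey_from_phi_x = sum(X ** 2 * h for X in mid)

print(Ey_from_phi_y, Ey_from_phi_x)   # both near 1/3
```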

(c) For single-valued, measurable y(x), f(y),

image

whenever this expected value exists; note that neither reciprocal one-to-one correspondence nor differentiability has been assumed for y(x). In particular, substitution of f(y) = esy in Eq. (4) yields the moment-generating function My(s) = E{esy}, and substitution of f(y) = eiqy produces the characteristic function χy(q) ≡ E{eiqy} (Sec. 18.3-8). If the integrals can be calculated, one may then use Eq. (18.3-25) to find φy(y) or py(y).

EXAMPLE: Let

image

where a is a constant, and x is uniformly distributed between 0 and 2π. Then

image

where we have used the symmetry properties of sin x and the fact that dy =

image. It follows that

image

(see also Sec. 18.11-1b).

(d) By an extension of the convolution theorem of Sec. 8.3-3 to bilateral Laplace transforms (Sec. 8.6-2), Eq. (4) can be rewritten as

image

where the integration contour parallels the imaginary axis in a suitable absolute-convergence strip; the quantity in square brackets is seen to be the bilateral Laplace transform of f[y(x)] (see also Sec. 8.6-2 and Table 8.6-1). The complex contour integral (5) may be easier to compute than the integral (4).

(e) Note that, in general, E{y(x)} ≠ y(E{x}) (see also Sec. 18.5-3).

18.5-3. Linear Functions (or Linear Transformations) of a One-dimensional Random Variable.  (a) If x is a continuous random variable, and y = ax + b, then

image

(b) If the mean values in question exist,

image

image

The semi-invariants (Sec. 18.3-9) κi′ of y = ax + b are related to the semi-invariants κi of x by image.

(c) Of particular interest is the linear transformation to standard units

image

is called a standardized random variable (see also Sec. 18.8-3).

(d) If y = y(x) is approximately linear throughout most of the spectrum of x, it is sometimes permissible to use the approximations

image

where y´(x) = dy/dx.
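The quality of the linearized approximations (10) is easy to gauge numerically. A sketch with illustrative numbers: y = exp(x), with x tightly concentrated about ξ = 1, so that y(x) is nearly linear over the spectrum of x.

```python
# Linearized approximations E{y} ~ y(xi) and Var{y} ~ y'(xi)^2 Var{x}
# for y = exp(x) and a narrowly concentrated discrete x.
import math

xs = [(0.98, 0.25), (1.00, 0.50), (1.02, 0.25)]   # (value, probability)
xi = sum(w * x for x, w in xs)                     # E{x} = 1.0
var = sum(w * (x - xi) ** 2 for x, w in xs)        # Var{x} = 0.0002

Ey_exact = sum(w * math.exp(x) for x, w in xs)
Vy_exact = sum(w * (math.exp(x) - Ey_exact) ** 2 for x, w in xs)

Ey_approx = math.exp(xi)                 # y(E{x})
Vy_approx = math.exp(xi) ** 2 * var      # y'(xi)^2 Var{x}, since y' = exp

print(Ey_exact, Ey_approx)
print(Vy_exact, Vy_approx)
```

Both approximations agree with the exact values to several decimal places here; for a widely spread x the agreement degrades.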

18.5-4. Functions and Transformations of Multidimensional Random Variables. (a) If the random variables

image

are single-valued measurable functions of the n random variables x1, x2, . . . , xn for all x1, x2, . . . , xn, then the probability distribution of each random variable yi is uniquely determined by the joint distribution of x1, x2, . . . , xn, and the same is true for each joint or conditional distribution involving a finite set of random variables yi.

Thus the distribution function of yi and the joint distribution function of yi and yk are, respectively,

image

(b) If x ≡ (x1, x2, . . . , xn) and y ≡ (y1, y2, . . . , yn) are continuous random variables related by a reciprocal one-to-one (nonsingular) transformation (11), their respective frequency functions φx(X1, X2, . . . , Xn) and φy(Y1, Y2, . . . , Yn) are related by

image

for all Y1, Y2, . . . , YN such that the Jacobian exists and is continuous.

If x(y) is multiple-valued, φy(Y1, Y2, . . . , YN) may be computed in a manner analogous to that outlined in Sec. 18.5-2b.

(c) For single-valued, measurable yi = yi(x1, x2, . . . , xn) (i = 1, 2, . . . , m) and f(y1, y2, . . . , ym),

image

whenever this expected value exists. As in Sec. 18.5-2c, neither reciprocal one-to-one correspondence nor differentiability has been assumed.

Table 18.5-1. Distribution of the Sum x = x1 + x2 + . . . + xn of n Independent Random Variables (see also Secs. 18.5-7, 18.6-5, and 19.3-3)

image

Substitution of f = exp (s1y1 + s2y2 + . . .+ smym) yields the joint moment-generating function of y1, y2 . . . , ym, and substitution of f = exp (iq1y1 + iq2y2 + . . . + iqmym) yields the joint characteristic function. Transform methods analogous to Eq. (5) may be useful. Such methods have been successfully applied to special random-process problems (Sec. 18.12-5).

(d) For any two random variables x1, x2,

image

if this quantity exists. If x1, x2, . . . , xN are statistically independent, then

image

if this quantity exists.
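The factoring of the product expectation for independent variables can be confirmed directly; a sketch with illustrative two-point distributions:

```python
# For statistically independent x1, x2 the expectation of the product
# factors: E{x1 x2} = E{x1} E{x2}.
p1 = {1: 0.5, 2: 0.5}          # distribution of x1
p2 = {3: 0.4, 5: 0.6}          # distribution of x2

E1 = sum(x * w for x, w in p1.items())
E2 = sum(x * w for x, w in p2.items())

# joint probabilities of independent variables multiply (Sec. 18.4-11)
E12 = sum(a * b * wa * wb
          for a, wa in p1.items() for b, wb in p2.items())
print(E12, E1 * E2)
```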

(e) If y = x1x2, and φx1(x1) = 0 for x1 < 0, then

image

(Sec. 18.5-4b), and

image

Other suitable functions y = y(x1, x2) can be treated in a similar manner.

18.5-5. Linear Transformations (see also Secs. 14.5-1 and 14.6-1). For every nonsingular linear transformation

image

the respective joint distributions of x1, x2, . . . , xn and y1, y2, . . . , yn are of equal rank (Sec. 18.4-8c), and

image

if the quantities in question exist. Λ′ ≡ [λ′ik] is the moment matrix (Sec. 18.4-8c) of (y1, y2, . . . , yn). The methods of Sec. 13.5-5 make it possible to find

      1. An orthogonal transformation (18) such that the new moment matrix [λ′ik] (and hence also the correlation matrix [ρ′ik]) is diagonal (transformation to uncorrelated variables yi).

      2. A transformation (18) such that η1 = η2 = . . . = ηn = 0 and λ′ik = δik (transformation to uncorrelated standardized variables yi; see also Secs. 18.8-6b and 18.8-8). The matrix [E{xi*xk}] must be nonsingular.

18.5-6. Mean and Variance of a Sum of Random Variables.  (a) For any two (not necessarily statistically independent) random variables x1, x2,

image

if the quantities in question exist.

(b) More generally,

image

(c) If y = y(x1, x2, . . . , xN) is approximately linear throughout most of the joint spectrum of (x1, x2, . . . , xN), it may be permissible to use the approximation

image

and to compute approximate values of E{y} and Var {y} by means of Eqs. (19) and (20) (see also Sec. 18.5-7).

18.5-7. Sums of Statistically Independent Random Variables (refer to Sec. 18.8-9 for examples).  (a) If x1 and x2 are statistically independent random variables, then

image

where the subscripts 1 and 2 refer to the respective distributions of x1 and x2 as in Secs. 18.4-2, 18.4-3, and 18.4-7 (see also Table 18.5-1).

(b) More generally, if x = x1 + x2 + . . . + xn is the sum of n < ∞ statistically independent random variables x1, x2, . . . , xn,

image

and, if the quantities in question exist,

image

where Kr(i) is the rth-order semi-invariant of xi. Equations (24) and (26) permit the computation of higher-order moments with the aid of the relations given in Sec. 18.3-10.
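The convolution rule and the additivity of means and variances can be checked on the classic two-dice example (a supplementary illustration, not from the text):

```python
# Distribution of x = x1 + x2 for two independent fair dice, built by
# discrete convolution; means and variances of independent summands add.
from collections import defaultdict

die = {k: 1/6 for k in range(1, 7)}

conv = defaultdict(float)
for a, pa in die.items():
    for b, pb in die.items():
        conv[a + b] += pa * pb            # convolution of the two pmfs

def mean(d):
    return sum(k * w for k, w in d.items())

def var(d):
    m = mean(d)
    return sum((k - m) ** 2 * w for k, w in d.items())

print(mean(conv), 2 * mean(die))          # means add: 7.0 and 7.0
print(var(conv), 2 * var(die))            # variances add: 35/6 and 35/6
```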

(c) The distribution of the sum z ≡ (z1, z2, . . .) = x + y of two suitable statistically independent multidimensional random variables x ≡ (x1, x2, . . .) and y ≡ (y1, y2, . . .) is described by

image

18.5-8. Compound Distributions. Let x1, x2, . . . be independent random variables each having the same probability distribution, and let k be a discrete random variable with spectral values 0, 1, 2, . . . ; let k be statistically independent of x1, x2, . . . . If the generating functions γx1(s) and γk(s) exist, the distribution of the sum x = x1 + x2 + . . . + xk is given by its generating function

image
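The composition rule γx(s) = γk(γx1(s)) can be exercised on a standard special case (a supplementary illustration): with k Poisson with parameter λ and each xi Bernoulli with parameter ϑ, the compound sum is Poisson with parameter λϑ ("thinning").

```python
# Generating-function check of the compound-distribution rule:
# gamma_x(s) = gamma_k(gamma_x1(s)) for Poisson k and Bernoulli x_i.
import math

lam, theta = 3.0, 0.4
g_k = lambda s: math.exp(lam * (s - 1))          # Poisson generating function
g_x1 = lambda s: 1 - theta + theta * s           # Bernoulli generating function
g_thin = lambda s: math.exp(lam * theta * (s - 1))  # Poisson(lam*theta)

for s in (0.0, 0.3, 0.7, 1.0):
    print(s, g_k(g_x1(s)), g_thin(s))            # the two columns agree
```

The agreement is exact here because γk(γx1(s)) = exp(λ(1 − ϑ + ϑs − 1)) = exp(λϑ(s − 1)) identically.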

18.6. CONVERGENCE IN PROBABILITY AND LIMIT THEOREMS

18.6-1. Sequences of Probability Distributions. Convergence in Probability (see also Sec. 18.6-2). A sequence of random variables y1, y2, . . . converges in probability to the random variable y (yn converges in probability to y as n → ∞) if and only if the probability that yn differs from y by any finite amount converges to zero as n → ∞, or

image

An m-dimensional random variable yn converges in probability to the m-dimensional random variable y as n → ∞ if and only if each component variable of yn converges in probability to the corresponding component variable of y.

If the m random variables yn1, yn2, . . . , ynm converge in probability to the respective constants α1, α2, . . . , αm as n → ∞, then any function g(yn1, yn2, . . . , ynm) expressible as a positive power of a rational function of yn1, yn2, . . . , ynm converges in probability to g(α1, α2, . . . , αm), provided that this quantity is finite.

18.6-2. Limits of Distribution Functions, Characteristic Functions, and Generating Functions. Continuity Theorems.  (a) yn converges in probability to y as n → ∞ if and only if the sequence of distribution functions Φyn(Y) converges to the limit Φy(Y) for all Y such that Φy(Y) is continuous.

(b) yn converges in probability to y as n → ∞ if and only if the sequence of characteristic functions χyn(q) converges to a limit continuous for q = 0; in this case image (Continuity Theorem for Characteristic Functions).

(c) A sequence of discrete random variables y1, y2, . . . converges in probability to the discrete random variable y as n → ∞ if and only if

image

If the random variables y1, y2, . . . all have nonnegative integral spectral values 0, 1, 2, . . . and possess generating functions γy1(s), γy2(s), . . . , then Eq. (2) holds if and only if lim γyn(s) = γy(s) for all real s such that 0 ≤ s ≤ 1 (Continuity Theorem for Generating Functions). Note that a sequence of discrete random variables may converge in probability to a random variable which is not discrete (see, for example, Table 18.8-3).

(d) Analogous definitions apply if y(n) converges in probability as a function of a continuous parameter n.

(e) Analogous theorems apply to multidimensional probability distributions.

18.6-3. Convergence in Mean (see also Sec. 12.5-3). Given a random variable y having a finite mean and variance and a sequence of random variables y1, y2, . . . all having finite mean values and variances, yn converges in mean (in mean square) to y as n → ∞ if and only if

image

Convergence in mean implies convergence in probability, but the converse is not true; image as n → ∞ does not even imply that E{y} or Var {y} exists.

18.6-4. Asymptotically Normal Probability Distributions (refer to Table 18.8-3 and Sec. 19.5-3 for examples). The (probability distribution of a) random variable yn with the distribution function Φ(Y, n) is asymptotically normal with mean ηn and variance σn2 if and only if there exists a sequence of pairs of real numbers ηn, σn2 such that the random variable (yn − ηn)/σn converges in probability to a standardized normal variable (Sec. 18.8-3). This is true if and only if for all a, b > a

image

Equation (4) permits one to approximate the probability distribution of yn by a normal distribution with mean ηn and variance σn2 for sufficiently large n. Note that Eq. (4) does not imply that ηn and σn2 are the mean and variance of yn, that the sequence y1, y2, . . . converges in probability, or that E{yn} and ηn or Var {yn} and σn2 converge to the same limits; indeed, these limits may not exist.

18.6-5. Limit Theorems.  (a) For every class of events E permitting the definition of probabilities P[E] (Sec. 18.2-2):

The relative frequency h[E] = nE/n (Sec. 19.2-1) of realizing the event E in n independent repeated trials (Sec. 18.2-4) is a random variable which converges to P[E] in mean, and thus also in probability, as n → ∞ (Bernoulli's Theorem).

h[E] is asymptotically normal with mean P[E] and variance (1/n)P[E]{1 − P[E]} (see also Table 18.8-3).

Note that (see also Table 18.8-3) *

image

(b) Let x1, x2, . . . be a sequence of statistically independent random variables all having the same probability distribution with (finite) mean value ξ. Then, as n → ∞,

The random variable image converges in probability to ξ (Khinchin's Theorem, Law of Large Numbers).

x̄ is asymptotically normal with mean ξ and variance σ2/n, provided that the common variance σ2 of x1, x2, . . . exists (Lindeberg-Lévy Theorem, Central Limit Theorem; see also Secs. 19.2-3 and 19.5-2).

(c) Let x1, x2, . . . be any sequence of statistically independent random variables having (finite) mean values ξ1, ξ2, . . . and variances σ12, σ22, . . . . Then, as n → ∞,

1. σn2 → 0 implies image (Chebyshev's Theorem).

* See footnote to Sec. 18.3-4.

image

(Central Limit Theorem, Lindeberg conditions).

The Lindeberg conditions are satisfied, in particular, if there exist two positive real numbers a and b such that E{|xi|2+a} exists and is less than bσi2 for i = 1, 2, . . . (Lyapunov's Condition). See also Table 18.5-1.
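The normal approximation asserted by these theorems can be checked exactly in the binomial special case (the de Moivre-Laplace situation of Table 18.8-3); a sketch comparing an exact standardized binomial probability with the corresponding normal integral:

```python
# Exact P(a < (k - mu)/sd <= b) for k binomial(n, theta), compared with
# the normal integral Phi(b) - Phi(a); the two approach each other as n grows.
import math

def binom_window(n, theta, a, b):
    mu = n * theta
    sd = math.sqrt(n * theta * (1 - theta))
    total = 0.0
    pk = (1 - theta) ** n                 # P(k = 0)
    for k in range(n + 1):
        if a < (k - mu) / sd <= b:
            total += pk
        # recurrence P(k+1) = P(k) (n-k)/(k+1) theta/(1-theta)
        pk *= (n - k) / (k + 1) * theta / (1 - theta)
    return total

Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))

exact = binom_window(1000, 0.3, -1.0, 1.0)
print(exact, Phi(1.0) - Phi(-1.0))        # both near 0.683
```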

NOTE: The limit theorems are of special importance in statistics (Secs. 19.2-1 and 19.2-3).

18.7. SPECIAL TECHNIQUES FOR SOLVING PROBABILITY PROBLEMS

18.7-1. Introduction. Most probability problems require one to compute the distribution of a random variable x (or the distributions of several random variables) from given conditions specifying the distributions of other random variables x1, x2, . . . . As a rule, the simple events labeled by values of x are compound events corresponding to various logical combinations of values of x1, x2, . . . . The first step in the solution of any such problem must be the unequivocal definition of the fundamental probability set labeled by each random variable. The probabilities of compound events may then be computed by the methods of Secs. 18.2-2 to 18.2-6 and 18.5-1 to 18.7-3. Equation (18.3-3), (18.3-6), (18.4-7), or (18.4-27) may be used to check computations.

18.7-2. Problems Involving Discrete Probability Distributions: Counting of Simple Events and Combinatorial Analysis. Each fundamental probability set labeled by the spectral values of a discrete random variable (Sec. 18.3-1) is a countable set of simple events. The following relations (either alone or in combination with the relations of Secs. 18.2-2 to 18.2-6) aid in computing probabilities of compound events:

(a) If, as in many games of chance, equal probabilities are assigned to each of the N simple events of a given finite fundamental probability set, then the probability of realizing a compound event (“success”) defined as the union (Sec. 18.2-1) of N1 specified simple events (“favorable” simple events) can be computed as

image

(b) Given a countable (finite or infinite) fundamental probability set, let an event E be defined as the union of N1 simple events each having the probability p1, N2 simple events each having the probability p2, . . . ; then

image

N1 + N2 + . . . need not be finite.

(c) Given N1 simple events E′, N2 simple events E″, . . . , and Nn simple events E(n) respectively associated with n independent component experiments (Sec. 18.2-4), there exist exactly N1N2 . . . Nn simple events [E′ ∩ E″ ∩ . . . ∩ E(n)].

(d) In many problems, the simple events under consideration are various possible arrangements of a given set or sets of elements, so that the numbers N1, N2, . . . in (a), (b), and (c) above are numbers of permutations, combinations, etc. The most important relevant definitions and formulas are given in Appendix C.
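A minimal sketch of rule (a) with a supplementary card-drawing example (not from the text): with all two-card hands from a 52-card deck equally likely, the probability that both cards are aces is N1/N.

```python
# Equal-likelihood rule: P = N1 / N with combinatorial counts.
from math import comb

N = comb(52, 2)      # number of equally likely two-card hands
N1 = comb(4, 2)      # "favorable" hands: both cards aces
P = N1 / N
print(P)             # 6/1326 = 1/221
```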

18.7-3. Problems Involving Discrete Probability Distributions: Successes and Failures in Component Experiments. Compound events are often described in terms of the results obtained in component experiments each admitting only two possible outcomes (“success” and “failure”). The probabilities of various compound events can be computed by the methods of Secs. 18.2-2 to 18.2-6 from the respective probabilities ϑ1, ϑ2, . . . of success in the first, second, . . . component experiment.

The methods of Secs. 18.5-6 to 18.5-8 may become applicable if one labels the events “success” and “failure” in the kth component experiment with the respective spectral values 1 and 0 of a discrete random variable xk whose distribution is described by

image

Successes in two or more independent experiments are, by definition, statistically independent events (Sec. 18.2-4). Repeated independent trials (Sec. 18.2-4) each having only two possible outcomes are called Bernoulli trials (ϑ1 = ϑ2 = . . . = ϑ). The probability of realizing exactly x = x1 + x2 + . . . + xn successes in n Bernoulli trials is given by the binomial distribution (Table 18.8-3). If the trials are independent, but the ϑk are not all equal, one obtains the generalized binomial distribution of Poisson.
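The binomial probabilities for Bernoulli trials can be generated directly; a sketch for n = 10 trials with ϑ = 1/2 (illustrative parameters):

```python
# Binomial distribution: probability of exactly x successes in n
# Bernoulli trials, P = C(n, x) theta^x (1 - theta)^(n - x).
from math import comb

def binomial_pmf(n, theta):
    return [comb(n, x) * theta ** x * (1 - theta) ** (n - x)
            for x in range(n + 1)]

pmf = binomial_pmf(10, 0.5)
print(pmf[5])          # the most probable success count for theta = 1/2
print(sum(pmf))        # probabilities sum to 1
```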

A subsequence of r successes or failures in any sequence of n trials is called a run of length r of successes or failures (see also Ref. 18.4, Chap. 13).

18.8. SPECIAL PROBABILITY DISTRIBUTIONS

18.8-1. Discrete One-dimensional Probability Distributions.*

Tables 18.8-1 to 18.8-7 describe a number of discrete one-dimensional distributions of interest, for instance, in connection with sampling problems and games of chance. The generating function rather than the characteristic function or the moment-generating function is tabulated: the latter two functions are easily obtained from

image

(see also Sec. 18.3-8). Moments not tabulated are also easily derived by the methods of Sec. 18.3-10.

Table 18.8-1. The Causal Distribution (see also Table 18.8-8)

image

Table 18.8-2. The Hypergeometric Distribution

image

Table 18.8-3. The Binomial Distribution (Fig. 18.8-1; see also Sec. 18.7-3)

image

image

18.8-2. Discrete Multidimensional Probability Distributions (see also Sec. 18.4-2). (a) A multinomial distribution is described by

image

where ϑ1, ϑ2, . . . , ϑn are positive real numbers such that

image

image

FIG. 18.8-2. The Poisson distribution. (From Goode, H. H., and R. E. Machol, System Engineering, McGraw-Hill, New York, 1957.)

image

Given an experiment having n mutually exclusive results E1, E2, . . . , En with respective probabilities ϑ1, ϑ2, . . . , ϑn such that ϑ1 + ϑ2 + . . . + ϑn = 1, the expression (1) is the probability that the respective events E1, E2, . . . , En occur exactly x1, x2, . . . , xn times in N independent repeated trials (see also Sec. 18.7-3). In classical statistical mechanics, x1, x2, . . . , xn are the occupation numbers of n independent states with respective a priori probabilities ϑ1, ϑ2, . . . , ϑn.
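The multinomial probability (1) can be evaluated directly from factorials; the sketch below (function name and example values are ours) computes the probability that six throws of a fair die show each face exactly once:

```python
from math import factorial

def multinomial_pmf(counts, thetas):
    # Probability that events E1, ..., En occur exactly
    # counts[0], ..., counts[n-1] times in N = sum(counts) trials
    N = sum(counts)
    p = float(factorial(N))
    for x, th in zip(counts, thetas):
        p *= th**x / factorial(x)
    return p

# Fair die thrown 6 times, each face appearing exactly once:
p = multinomial_pmf([1] * 6, [1 / 6] * 6)   # 6!/6**6
```

For n = 2 the formula reduces to the binomial distribution of Table 18.8-3.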

Table 18.8-5. The Geometric Distribution

image

Table 18.8-6. Pascal's Distribution

image

Table 18.8-7. Polya's Distribution (Negative Binomial Distribution)

image

(b) A multiple Poisson Distribution is described by

image

18.8-3. Continuous Probability Distributions: The Normal (Gaussian) Distribution. A continuous random variable x is normally distributed (normal) with mean ξ and variance σ2 [or normal with parameters ξ, σ2; normal with parameters ξ, σ; normal (ξ, σ2); normal (ξ, σ)] if

image

The distribution of the standardized normal variable (normal deviate) image (see also Sec. 18.5-3c) is given by

image

(see also Fig. 18.8-3 and Sec. 18.8-4). erf z is the frequently tabulated error function (normal error integral, probability integral; see also Sec. 21.3-2)

image

φ(X) has points of inflection for X = ξ ± σ. Note

image

where Hk(z) is the kth Hermite polynomial (Sec. 21.7-1).

Every normal distribution is symmetric about its mean value ξ; ξ is the median and the (single) mode. The coefficients of skewness and excess are zero, and

image

The moments αr about the origin may be computed by the methods of Sec. 18.3-10.

The normal distribution is of particular importance in many applications, especially in statistics (Secs. 19.3-1 and 19.5-3).
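For numerical work, the normal frequency and distribution functions are conveniently expressed through the tabulated error function erf (Sec. 21.3-2); a minimal Python sketch (function names are ours):

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, xi, sigma):
    # frequency function of a variable normal with mean xi, variance sigma**2
    return exp(-(x - xi)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def normal_cdf(x, xi, sigma):
    # distribution function, written in terms of erf
    return 0.5 * (1 + erf((x - xi) / (sigma * sqrt(2))))
```

For the standardized normal variable, normal_cdf(1, 0, 1) reproduces the familiar tabulated value of about 0.8413.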

18.8-4. Normal Random Variables: Distribution of Deviations from the Mean. (a) For any normal random variable x with mean ξ and variance σ2,

image

image

FIG. 18.8-3. (a) The normal frequency function

image

and (b) the normal distribution function

image

(From Burington, R. S., and D. C. May, Handbook of Probability and Statistics, McGraw-Hill, New York, 1953.)

image

Table 18.8-8. Continuous Probability Distributions

image

are often referred to as tolerance limits of the normal deviate u or as α values of the normal deviate (see also Sec. 19.6-4). Note

image

(c) Note the following measures of dispersion for normal distributions (see also Table 18.3-1):

The mean deviation (m.a.e.) image

The probable deviation (p.e., median of |x – ξ|) image

One-half the half width image

image

The precision measure image (see also Secs. 18.8-3, 19.3-4, 19.3-5, and 19.5-3)

image

18.8-5. Miscellaneous Continuous One-dimensional Probability Distributions. Table 18.8-8 describes a number of continuous one-dimensional probability distributions (see also Secs. 19.3-4, 19.3-5, and 19.5-3).

18.8-6. Two-dimensional Normal Distributions. (a) A two-dimensional normal distribution is a continuous probability distribution described by a frequency function of the form

image

The marginal distributions of x1 and x2 are both normal with respective mean values ξ1, ξ2 and variances σ12, σ22; ρ12 is the correlation coefficient of x1 and x2. The five parameters ξ1, ξ2, σ1, σ2, ρ12 define the distribution completely.

The conditional distributions of x1 and x2 are both normal, with

image

so that the regression curves are identical with the mean-square regression lines (Sec. 18.4-6). x1 and x2 are statistically independent if and only if they are uncorrelated (ρ12 = 0; see also Sec. 18.4-11). Note

image

(b) Every two-dimensional normal distribution (16) can be described in terms of standardized normal variables u1, u2 with the correlation coefficient ρ12, or in terms of statistically independent standardized normal variables (Sec. 18.5-5). Thus

image

(c) The distribution (16) is represented graphically by the contour ellipses φ(x1, x2) = constant, or

image

The probability that the “point” (x1, x2) is inside the contour ellipse (22) is

image

i.e., λ2 = χP2(2) (Table 19.5-1). The two mean-square regression lines respectively defined by Eqs. (17) and (18) bisect all contour-ellipse chords in the x1 and x2 directions, respectively (see also Sec. 2.4-6).

18.8-7. Circular Normal Distributions. Equation (16) represents a circular normal distribution with dispersion σ about the center of gravity (ξ1, ξ2) if and only if ρ12 = 0, σ1 = σ2 = σ. The contour ellipses (22) become circles corresponding to fractiles of the radial deviation (radial error) image The distribution of r is given by

image

(see also Sec. 18.11-16 and Table 19.5-1).

Circular normal distributions are of particular interest in problems related to gunnery; circular probability paper shows contour circles for equal increments of Φr(R). Note

image
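Assuming the usual Rayleigh form of the radial-error distribution, Φr(R) = 1 − e^−R²/2σ², the fractiles mentioned above are easy to compute; the sketch below (names and values are ours) also finds the 50 per cent circle (circular probable error) of interest in gunnery:

```python
from math import exp, log, sqrt

def radial_cdf(R, sigma):
    # P{r <= R} for the radial error of a circular normal distribution,
    # assuming the Rayleigh form 1 - exp(-R**2 / (2 sigma**2))
    return 1.0 - exp(-R**2 / (2.0 * sigma**2))

# Radius of the contour circle containing half the probability
# (circular probable error):
sigma = 1.0
cep = sigma * sqrt(2.0 * log(2.0))   # about 1.1774 * sigma
```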

18.8-8. n-Dimensional Normal Distributions.* The joint distribution of n random variables x1, x2, . . . , xn is an n-dimensional normal distribution if and only if it is a continuous probability distribution having a frequency function of the form

image

* See footnote to Sec. 18.4-8.

Each normal distribution is completely defined by its center of gravity (ξ1, ξ2, . . . , ξn) and its moment matrix [λjk] ≡ [Λjk]−1, or by the corresponding variances and correlation coefficients (Sec. 18.4-8). The characteristic function is

image

Each marginal and conditional distribution derived from a normal distribution is normal. All mean-square regression hypersurfaces are identical with the corresponding mean-square regression hyperplanes (Sec. 18.4-9). n random variables x1, x2, . . . , xn having a normal joint distribution are statistically independent if and only if they are uncorrelated (see also Sec. 18.4-11).

Each n-dimensional normal distribution can be described as the joint distribution of n statistically independent standardized normal variables related to the original variables by a linear transformation (18.5-15).
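The last statement suggests a standard sampling scheme: generate independent standardized normal variables and apply a linear transformation whose matrix is a Cholesky factor of the desired covariance (moment) matrix. A sketch with illustrative (made-up) parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-dimensional case: assumed covariance (moment) matrix
# and center of gravity.
cov = np.array([[2.0, 0.6],
                [0.6, 1.0]])
xi = np.array([1.0, -2.0])

L = np.linalg.cholesky(cov)            # cov = L @ L.T
u = rng.standard_normal((2, 100_000))  # independent standardized normals
x = xi[:, None] + L @ u                # samples of the 2-dimensional normal

sample_mean = x.mean(axis=1)
sample_cov = np.cov(x)
```

The sample mean and sample covariance matrix approximate ξ and the prescribed matrix to within sampling error.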

18.8-9. Addition Theorems for Special Probability Distributions * (see also Sec. 18.5-7 and Table 19.5-1). (a) The binomial distribution (Table 18.8-3), the Poisson distribution (Table 18.8-4), and the Cauchy distribution (Table 18.8-8) “reproduce themselves” on addition of independent variables. If the random variable x is defined as the sum

image

of n statistically independent random variables x1, x2, . . . , xn, then

image

(b) The sum x = x1 + x2 + . . . + xn of n statistically independent random variables x1, x2, . . . , xn is a normal variable if and only if x1, x2, . . . , xn are normal variables. In this case,

image

If x1, x2, . . . , xn are (not necessarily statistically independent) normal variables, then x = a1x1 + a2x2 + . . . + anxn is a normal variable whose mean and variance are given by Eq. (18.5-19).

* See footnote to Sec. 18.3-4.
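The normal addition theorem of (b) is easy to verify by simulation; the following sketch (parameter values and numpy usage are ours) sums independent normal (1, 4) and normal (−2, 9) samples and checks that the result behaves as normal (−1, 13):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

x1 = rng.normal(1.0, 2.0, N)    # normal with mean 1, variance 4
x2 = rng.normal(-2.0, 3.0, N)   # normal with mean -2, variance 9
x = x1 + x2                     # should be normal with mean -1, variance 13

mean, var = x.mean(), x.var()
```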

18.9. MATHEMATICAL DESCRIPTION OF RANDOM PROCESSES

18.9-1. Random Processes.  Consider a variable x capable of assuming different values x(t) for different values of an independent variable t. A random process (stochastic process) selects a specific sample function x(t) from a given theoretical population (Sec. 19.1-2) or ensemble of possible sample functions. More specifically, the functions x(t) are said to describe a random process if and only if the sample values x1 = x(t1), x2= x(t2), . . . are random variables admitting definition of a joint probability distribution for every finite set of values (sampling times) t1, t2, . . . (Fig. 19.8-1). The random process is discrete or continuous if the joint distribution of x(t1), x(t2), . . . is, respectively, discrete or continuous for every finite set t1, t2, . . . . The process is a random series if the independent variable t assumes only a countable set of values. More generally, a random process may be described by a multidimensional variable x(t) ≡ [x(t), y(t), . . .].

The definition of a random process implies the existence of a probability distribution on the (in general, infinite-dimensional) sample space (Sec. 18.2-7) of possible functions x(t). Each particular function x(t) ≡ X(t) constitutes a simple event [sample point, “value” of the multidimensional random variable x(t)].

In most applications the independent variable t is the time, and the variable x(t) or x(t) labels the state of a physical system.  EXAMPLES: Results of successive observations, states of dynamical systems in Gibbsian statistical mechanics or quantum mechanics, messages and noise in communications systems, economic time series.

18.9-2. Mathematical Description of Random Processes.  (a) To describe a random process, one must specify the distribution of x(t1) and the respective joint distributions of [x(t1), x(t2)], [x(t1), x(t2), x(t3)], . . . for every finite set of values t1, t2, t3, . . . (first, second, third, . . . probability distributions associated with the random process). These distributions are described by the corresponding first, second, ... (or first-order, second-order, . . .) distribution functions (see also Sec. 18.4-7)

image

or, respectively for discrete and continuous random processes, by the corresponding probabilities and frequency functions

image

NOTE: The sequence of distribution functions (1a) describes the random process in increasing detail, since each distribution function Φ(n) completely defines all preceding ones as marginal distribution functions (Sec. 18.4-7). The same is true for each sequence (1b). Each of the functions (1) is symmetric with respect to (unaffected by) interchanges of pairs Xi, ti and Xk, tk.

(b) Conditional probability distributions descriptive of the random process are related to the functions (1b) in the manner of Sec. 18.4-7; thus

image

NOTE: The functions (2) are not in general symmetric with respect to interchanges of pairs Xi, ti and Xk, tk separated by the bar.

(c) A multidimensional random process, say one generating a pair of sample functions x(t), y(t), is similarly defined in terms of joint distributions of sample values x(ti), y(tk). In particular,

image

18.9-3. Ensemble Averages.  (a) General Definitions. The ensemble average (statistical average, mathematical expectation) of a suitable function f[x(t1), x(t2), . . . , x(tn)] of n sample values x(t1), x(t2), . . . , x(tn) (statistic, see also Sec. 19.1-1) is the expected value (Sec. 18.4-8a)

image

if this limit exists in the sense of absolute convergence.  Integration in Eq. (4) is over X1, X2, . . . , Xn; E{f} is a function of t1, t2, . . . , tn.

Similarly, for a multidimensional random process described by x(t), y(t),

image

if the limit exists in the sense of absolute convergence.

(b) Ensemble Correlation Functions and Mean Squares.  The ensemble averages E{x(t1)} = ξ(t1), E{x2(t1)}, and

image

are of special interest.  They abstract important properties of the random process and are frequently all that is known about the process: note that

image

The definitions (6) and Eq. (7) apply to real x(t), y(t). If x(t) and/or y(t) is a complex variable (really a two-dimensional random variable), then one defines

image

which includes (6) as a special case; Rxy is necessarily real for real x and y.

Note that, for real or complex x, y,

image

Existence of the quantities on the right implies that of the correlation functions on the left.

(c) Characteristic Functions.  The nth characteristic function corresponding to the nth distribution function (1a) of the random process (see also Sec. 18.4-10) is

image

Joint characteristic functions for x(t), y(t), . . . are similarly defined. Characteristic functions can yield moments like E{x(t1)}, E{x2(t1)}, Rxx(t1, t2), . . . by differentiation in the manner of Secs. 18.3-10 and 18.4-10.

(d) Ensemble Averages of Integrals and Derivatives (see also Sec. 18.6-3).  Random integrals of the form

image

are defined in the sense of convergence in probability (Sec. 18.6-1) or, if possible, in the mean-square sense of Sec. 18.6-3. The integral converges in mean (in the sense of Sec. 18.6-3) if and only if

image

exists.  If image dt exists, then the integral (12) exists in the sense of absolute convergence for each sample function x(t), except possibly for a set of probability 0, and

image

The important relation (14) is needed, in particular, to derive the input-output relations of Sec. 18.12-2 (see also Refs. 18.13 to 18.17).

The random process generating x(t) is continuous in the mean (mean-square continuous) at t = t0 in the sense of Sec. 18.6-3 if and only if

image

this is true if and only if Rxx(t1,t2) exists and is continuous for t1 = t2 = t0. The random process generating image will be called the mean-square derivative of a random process generating x(t) if and only if

image

This is true if and only if ∂2Rxx(t1, t2)/∂t1 ∂t2 exists and equals ∂2Rxx(t1, t2)/∂t2 ∂t1 for all t1 = t2.  It follows that

image

(see also Sec. 18.12-2).

18.9-4. Processes Defined by Random Parameters.  It is often possible to represent each sample function of a random process as a deterministic function x = x(t; η1, η2, . . .) of t and a set of random parameters η1, η2, . . . . The process is then defined by the joint distribution of η1, η2, . . . ; in this case,

image

In particular, each probability distribution of such a random process is uniquely defined by its characteristic function (Sec. 18.4-10)

image

18.9-5. Orthonormal-function Expansions.  Given a real or complex random process x(t) with E{x(t)} finite and Rxx(t1, t2) bounded and continuous on the closed observation interval [a, b], there exist complete orthonormal sets of functions u1(t), u2(t), . . . (Sec. 15.2-4) such that

image

where the series and the integral for each ck converge in mean in the sense of Sec. 18.6-3 (see also Sec. 18.9-3d). The random process is, then, represented by the set of random coefficients c1, c2, . . . ; the first n coefficients may give a useful approximate representation. In particular, there exists a complete orthonormal set uk(t) ≡ Ψk(t) such that all the ck are uncorrelated standardized random variables, i.e.,

image

(Karhunen-Loève Theorem).  Specifically, the required Ψk(t) are the eigenfunctions of the integral equation

image

(see also Sec. 15.3-3).  The corresponding eigenvalues λk are nonnegative and have at most a finite degree of degeneracy (by Mercer's theorem, Sec. 15.3-4), and

image

The Karhunen-Loève theorem constitutes a generalization of the theorem of Sec. 18.5-5.

EXAMPLES: Periodic random processes (Sec. 18.11-1), band-limited flat-spectrum noise (Sec. 18.11-2b).  Although explicit analytical solution of the integral equation (14) is rarely possible, the theorem is useful in detection theory (Ref. 19.24).
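A discrete analogue of the Karhunen-Loève construction can be sketched numerically: sample the kernel Rxx(t1, t2) on a grid and take eigenvectors of the resulting matrix as approximate eigenfunctions. The exponential kernel below is an assumed example, and the simple Riemann-sum discretization is for illustration only:

```python
import numpy as np

# Assumed autocorrelation kernel Rxx(t1, t2) = exp(-|t1 - t2|) on [0, 1]:
t = np.linspace(0.0, 1.0, 200)
h = t[1] - t[0]
R = np.exp(-np.abs(t[:, None] - t[None, :]))

# Discretized integral equation: eigenvalues/eigenvectors of h * R
# approximate the lambda_k and psi_k(t) of the integral equation.
lam, psi = np.linalg.eigh(h * R)
lam, psi = lam[::-1], psi[:, ::-1]     # largest eigenvalues first

# Consistent with Mercer's theorem, the eigenvalues are nonnegative, and
# their sum approximates the integral of Rxx(t, t) over [0, 1], here 1.
```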

18.10. STATIONARY RANDOM PROCESSES. CORRELATION FUNCTIONS AND SPECTRAL DENSITIES

18.10-1. Stationary Random Processes.  A random process, or the corresponding ensemble of functions x(t), is stationary if and only if each of its probability distributions is unchanged when t is replaced by t + t0, so that

image

i.e., the nth probability distribution depends only on a set of n − 1 differences

image

of sampling times tk.  Similarly, two or more random processes generating x(t), y(t), . . . are jointly stationary if and only if their joint probability distributions are unchanged when t is replaced by t + t0.

For stationary and jointly stationary random processes, each ensemble average (18.9-4) or (18.9-5) depends only on n — 1 differences (2):

image

for every t1 (see also Sec. 18.10-2).

18.10-2. Ensemble Correlation Functions(see also Sec. 18.9-3b). (a) For stationary x(t) [and jointly stationary x(t), y(t)], the expected values

E{x(t)} ≡ E{x} = ξ,   E{|x(t)|2} ≡ E{|x|2},   E{y(t)} ≡ E{y} = η, . . .

are constant, and the ensemble correlation functions (18.9-8) reduce to functions of the delay τ = t2t1 separating t1 and t2.  In this case,

image

image

image

Again, existence of the quantities on the right implies existence of the correlation functions on the left.  If Rxx(τ) is continuous for τ = 0, it is continuous for all τ (Ref. 18.17).

[Rxx(titk)] is a positive-semidefinite hermitian matrix (Sec. 13.5-3) for every finite set t1, t2, . . . , tn.

(b) Normalized ensemble correlation functions are defined by

image

Note |ρxx| ≤ 1, |ρxy| ≤ 1.  For real stationary x, y, ρxx and ρxy are real correlation coefficients (Sec. 18.4-4), and Eq. (4) implies

image

Random processes which are not stationary or jointly stationary but have constant E{x(t)}, E{y(t)} and “stationary correlation functions” satisfying Eq. (4) are often called stationary, or jointly stationary, in the wide sense.

18.10-3. Ensemble Spectral Densities.  If x(t) is generated by a stationary random process, and x(t), y(t) by jointly stationary random processes, the ensemble power spectral density Φxx(ω) and the ensemble cross-spectral density Φxy(ω) are defined by

image

Assuming suitable convergence, this implies

image

The Fourier transforms (9) are introduced, essentially, to simplify the relations between input and output correlation functions in linear time-invariant systems (Sec. 18.12-3).  Existence of the transforms (9) requires, besides the existence of E{|x|2} and E{|y|2} (Sec. 18.9-3b), that Rxx(τ) or Rxy(τ) decay sufficiently quickly as τ → ∞.  In the case of periodic and d-c processes, one extends the definitions of spectral densities to include delta-function terms chosen so that Eq. (10) is satisfied (Sec. 18.10-9).
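As a numerical illustration of the transform pair, and assuming the 1/2π-in-front convention for Eq. (9), the exponential autocorrelation Rxx(τ) = e^−|τ| has power spectral density Φxx(ω) = 1/[π(1 + ω²)]; the sketch checks this by direct quadrature (grid and kernel are our own choices):

```python
import numpy as np

tau = np.linspace(-50.0, 50.0, 400_001)
dt = tau[1] - tau[0]
Rxx = np.exp(-np.abs(tau))     # assumed ensemble autocorrelation function

def phi_xx(omega):
    # (1/2pi) * integral of Rxx(tau) e^{-i omega tau} d tau, by Riemann sum
    return (np.sum(Rxx * np.exp(-1j * omega * tau)) * dt).real / (2.0 * np.pi)

numeric = phi_xx(1.0)
exact = 1.0 / (np.pi * (1.0 + 1.0**2))   # closed-form transform at omega = 1
```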

18.10-4. Correlation Functions and Spectra of Real Processes. The relations (9) and (10) apply to both real and complex random processes x(t), y(t).  Note that the power spectral density Φxx(ω) is always real, even if x is complex; but the cross-spectral density Φxy(ω) may be a complex function even for real x, y.  If x and y are real, the same is true for the correlation functions Rxx(τ), Rxy(τ).  In this case,

image

image

image

Note again that Eqs. (11) to (13) apply to real x, y.

18.10-5. Spectral Decomposition of Mean “Power” for Real Processes.  For real x(t), substitution of τ = 0 in Eqs. (11) and (12) yields

image

This is interpreted as a spectral decomposition of E{x2} (mean “power”). In the first integral, contributions to E{x2} are “distributed” over both positive and negative frequencies with density Φxx(ω) (“two-sided” power spectral density), measured in (x units)2/cps, since ω/2π is frequency in cps. Alternatively, we can consider E{x2} as distributed only over nonnegative (“real”) frequencies with the “one-sided” power spectral density 2Φxx(ω) (x units)2/cps.

Intuitive interpretation of the—in general complex—cross-spectral density Φxy(ω) is not quite so simple.  For real x(t), y(t), substitution of τ = 0 in Eq. (10) yields

image

Re Φxy(ω) is often called a cross-power spectral density.  Im Φxy(ω) (cross-quadrature spectral density) does not contribute to the mean “power” (15).

18.10-6. Some Alternative Ensemble Spectral Densities.  Other spectral-density functions found in the literature are

image

(v = ω/2π; two-sided spectral density in x units2/cps)

image

(two-sided spectral density in x units2/radian/sec)

and the one-sided spectral densities

image

Note that Γxx(v) and Gxx(ω) are defined only for nonnegative frequencies. Similar definitions also apply to cross-spectral densities. Note that symbols and definitions vary greatly in the literature; the correct definition should be restated and referred to in each case.

18.10-7. t Averages and Ergodic Processes.  (a) t Averages.  Given any function x(t), the t average (average over t, frequently a time average) of a measurable function f[x(t1), x(t2), . . . , x(tn)] is defined as

image

if the limit exists.*  If x(t) describes a random process, then <f> is (like f, but unlike E{f}) a random variable (statistic) for each given set of values t1, t2, . . . , tn.  Note that

image

whenever the integrals exist.

(b) Ergodic Processes.  A (necessarily stationary) random process generating x(t) is ergodic if and only if the probability associated with every stationary subensemble is either 0 or 1. Every ergodic process has the ergodic property: the t average (20) of every measurable function f[x(t1), x(t2), . . . , x(tn)] equals its ensemble average (18.9-4) with probability one, i.e.,

image

whenever these averages exist. Any one of the functions x(t) will then define the random process uniquely with probability one, e.g., in terms of the characteristic functions (18.9-11) computed from x(t) by means of Eq. (21). Each t average, such as <x>, <x2>, or Rxx(τ), will then

* The notation image is sometimes used instead of <f>, as well as instead of E{f}; but the symbol image is preferably reserved for the sample average

image

where kf is the value of f obtained from one of an empirical random sample of n sample functions x(t) = kx(t) (k = 1, 2, . . . , n; see also Sec. 19.8-4).

represent, with probability one, a property common to the entire ensemble of functions x(t).

Two or more jointly stationary random processes are jointly ergodic if and only if the probability associated with every stationary joint sub-ensemble is either 0 or 1.  The ergodic theorem applies to averages computed from sample values of jointly ergodic processes.
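The ergodic property can be illustrated numerically with the random-phase sine wave of Sec. 18.11-1b, which is ergodic: the t average <x2> computed from a single sample function approximates the ensemble average E{x2} = a2/2 (the parameter values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
a, omega = 2.0, 5.0
alpha = rng.uniform(0.0, 2.0 * np.pi)   # one sample of the random phase

t = np.linspace(0.0, 200.0, 2_000_001)  # long observation interval
x = a * np.sin(omega * t + alpha)       # a single sample function

time_avg = np.mean(x**2)       # <x^2> from this one sample function
ensemble_avg = a**2 / 2.0      # E{x^2} over the ensemble
```

The discrepancy shrinks as the observation interval grows, in accordance with Eq. (21).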

18.10-8. Non-ensemble Correlation Functions and Spectral Densities.  Given the real or complex functions x(t), y(t) (which may or may not be sample functions of a random process) such that

image

exist, the t averages

image

exist.  These correlation functions satisfy all the relations listed in Sec. 18.10-2, if each ensemble average (expected value) is replaced by the corresponding t average.  Again, the (non-ensemble) power spectral density Ψxx(ω) and the cross-spectral density Ψxy(ω) are introduced through the Wiener-Khinchine relations

image

If these “individual” spectral densities exist (one formally admits delta-function terms, Sec. 18.10-9), they satisfy relations analogous to those listed in Secs. 18.10-3 to 18.10-5.  Alternative non-ensemble spectral densities can be defined in the manner of Sec. 18.10-6.

If x(t), y(t) are sample functions of jointly stationary random processes, then the correlation functions (24), (25) and the spectral densities (26) are random variables whose expected values equal the corresponding ensemble functions whenever they exist.  If x(t), y(t) are jointly ergodic, then the correlation functions (24), (25) and the spectral densities (26) are identical to the corresponding ensemble quantities with probability one.

As an alternative definition, spectral densities are sometimes introduced by the formal relation

image

where aT(ω) and bT(ω) are Fourier transforms of the “truncated” functions xT(t), yT(t) respectively equal to x(t), y(t) for |t| < T and zero for |t| > T:

image

The corresponding ensemble spectral density Φxy(ω) may then be defined by Φxy(ω) = E{Ψxy(ω)}, and the Wiener-Khinchine relations (26) follow from Borel's convolution theorem (Table 4.11-1).  In general, however, Eq. (27) is valid only if both sides appear in an integral over ω (in particular, spectral densities often contain delta-function terms, Secs. 18.10-9 and 18.11-5; see also Sec. 18.10-10).

18.10-9. Functions with Periodic Components (see also Sec. 18.11-1).  Like other t averages, non-ensemble correlation functions and spectra are of interest mainly if they happen to equal the corresponding ensemble quantities with probability one (this is true for all t averages in the case of ergodic processes, Sec. 18.10-7b). When this is true, the single integrals (24), (25) may be easier to compute than the double integrals (4).  The ergodic property also permits interpretation of, say, Φxx(ω) in terms of the “frequency content” of a single “typical” sample function x(t), since Φxx(ω) = Ψxx(ω) with probability one.

Without recourse to probability theory, non-ensemble correlation functions and spectra can be computed only for functions x(t), y(t) representable as sums of periodic components (except for the trivial case that the correlation function or spectral density is identically zero).  In particular, for

image

More generally, let x(t) be a real function of bounded variation in every finite interval and such that <|x(t)|2> exists.  Then x(t) can be represented almost everywhere (Sec. 4.6-14b) as the sum of its average value <x(t)> = c0, a countable set of periodic components, and an aperiodic component* p(t):

image

* The aperiodic component p(t) may be expressible as a Fourier integral [<|p(t)|2> = 0], or <|p(t)|2> may be different from zero (“random” component); or p(t) may be a sum of both types of terms.

image

Let y(t) be another real function satisfying the same conditions as x(t), so that

image

The set of circular frequencies ω1, ω2, . . . is understood to include the periodic-component frequencies of both x(t) and y(t).  Then

image

The cross correlation function Rxy(τ) measures the “coherence” of x(t) and y(t) or the “serial correlation” between the function values x(t) and y(t + τ) separated by a delay τ.  x(t) and y(t) are uncorrelated if and only if Rxy(τ) ≡ 0.

NOTE: The (real) functions x(t), y(t) belong to a complex unitary vector space with inner product (u, v) = <u*(t)v(t)> (Sec. 14.2-6).  Note the useful orthogonality relations

image

18.10-10. Generalized Fourier Transforms and Integrated Spectra.  (a) To avoid the difficulties associated with delta-function terms in the Fourier transforms and spectral densities of periodic functions, one may introduce the generalized or integrated Fourier transform XINT(iω) of x(t), defined (to within an additive constant) by

image

The corresponding inversion integral is the Stieltjes integral (Sec. 4.6-17)

image

If the Fourier transform XF(iω) of x(t) exists, then

image

If x(t) can be represented as image (this is, in particular, true for periodic functions; see also Sec. 18.11-1), then XINT(iω) is a step function (Sec. 21.9-1).

(b) The integrated power spectrum ΦINT(ω) of a stationary or wide-sense stationary random process generating x(t) is the generalized Fourier transform of its autocorrelation function:

image

Analogous relations can be written for non-ensemble correlation functions and spectra.

(c) Note the following generalizations of the Wiener-Khinchine relations (9) and (26) for real stationary (or wide-sense stationary) x(t).

image

For τ = 0, Eq. (40) yields Wiener's Quadratic-variation Theorem

image

If the non-ensemble power spectral density Ψxx(ω) exists, Eq. (40) reduces to the Wiener-Khinchine relation (26), with

image

18.11. SPECIAL CLASSES OF RANDOM PROCESSES. EXAMPLES

18.11-1. Processes with Constant and Periodic Sample Functions. (a) Constant Sample Functions (Fig. 18.11-1a).  If each sample function x(t) is identically equal to a constant random parameter a with given probability distribution, the latter determines the resulting random process uniquely.  The process is stationary; but it is not ergodic.  If E{a2} exists,

image

(b) Random-phase Sine Waves.   Let

image

(Fig. 18.11-1b) where a is a given constant, and the phase angle α is a random variable uniformly distributed between 0 and 2π.  The process is stationary and ergodic, with

image

If the amplitude a of the random-phase sine wave is not a constant, but is itself a (positive) random variable independent of α (as in amplitude modulation), the process is stationary but not in general ergodic.

Now

image

If, in particular, the amplitude a has a Rayleigh distribution defined by

image

(circular normal distribution with σ2 = 1, Sec. 18.8-7), then the random process is Gaussian (Sec. 18.11-3).

If the phase angle α is not uniformly distributed between 0 and 2π, then the process is nonstationary even if the amplitude a is fixed.

image

FIG. 18.11-1. Sample functions x(t) for five examples of random processes.  In Fig. 18.11-le, x(t) is the sum of the individual pulses akv(ttk) shown.

(c) More General Periodic Processes (see also Sec. 18.10-9). The random-phase sine wave is a special case of the general random-phase periodic process represented by

image

where α is uniformly distributed between 0 and 2π; it is assumed that the series converges in mean square in the sense of Sec. 18.6-3.  The

image

FIG. 18.11-2. Autocorrelation function and power spectrum for a random telegraph wave (a) and a coin-tossing sample-hold process (b) having equal mean count rates α = 1/2Δt, both with zero mean and mean square α2.  Note that different ω scales are used in (a) and (b). (From G. A. Korn, Random-process Simulation and Measurements, McGraw-Hill, New York, 1966.)

process is stationary and ergodic, with

image

A still more general periodic process is defined by the Fourier series

image

with real random coefficients c0, ak, bk, assuming that the series converges in mean square.  Such a process is wide-sense stationary if and only if

image

In this case, Eq. (8) is an orthogonal-series expansion in the sense of Sec. 18.9-5, and

image

18.11-2. Band-limited Functions and Processes.  Sampling Theorems.  (a) A function x(t) is band-limited between ω = 0 and ω = 2πB if and only if its Fourier transform XF(iω) (Sec. 4.11-3) exists and equals zero for |ω| > 2πB; B (measured in cycles per second if t is measured in seconds) is the bandwidth associated with x(t). For every band-limited x(t)

image

i.e., x(t) is uniquely determined for all t by samples x(tk) spaced 1/2B t-units apart (Nyquist-Kotelnikov-Shannon Sampling Theorem).

The functions (Fig. 18.11-3)

image

FIG. 18.11-3. The sampling function sinc image (see also Table F-21).

constitute a complete orthonormal set for the space of functions x(t) band-limited between ω = 0 and ω = 2πB (Sec. 15.2-4); note

image
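A truncated form of the sampling expansion (11) is easy to test numerically; numpy's sinc is the normalized sinc(u) = sin(πu)/(πu) appearing in the expansion, and the band-limited test signal below is our own example:

```python
import numpy as np

B = 1.0                                  # bandwidth in cps

def x_true(t):
    # band-limited test signal (components at 0.3 and 0.7 cps, both < B)
    return np.sin(2 * np.pi * 0.3 * t) + 0.5 * np.cos(2 * np.pi * 0.7 * t)

k = np.arange(-200, 201)                 # truncated index range
xk = x_true(k / (2.0 * B))               # samples spaced 1/(2B) apart

def reconstruct(t):
    # x(t) ~= sum_k x(k/2B) sinc(2Bt - k), truncated to |k| <= 200
    return float(np.sum(xk * np.sinc(2.0 * B * t - k)))

err = abs(reconstruct(0.37) - x_true(0.37))
```

At a sample instant t = k/2B the truncated sum is exact; between samples the truncation error decays as more terms are kept.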

(b) A stationary or wide-sense stationary random process with sample functions x(t) is band-limited between ω = 0 and ω = 2πB if and only if its ensemble power spectral density Φxx(ω) exists and equals zero for |ω| > 2πB.  In this case, the expansion (11) applies in the sense of mean-square convergence (Sec. 18.6-3), i.e.,

image

and Eq. (11) represents each sample function x(t) in terms of its sample values xk = x(k/2B) with probability one.

NOTE: In the special case of a stationary band-limited “flat-spectrum” process with

image

the sample values xk = x(k/2B) have zero mean and are uncorrelated.

18.11-3. Gaussian Random Processes (see also Secs. 18.8-3 to 18.8-8 and 18.12-6).  A real random process is Gaussian if and only if all its probability distributions are normal distributions for all t1, t2, . . . . Every Gaussian process is uniquely defined by its (necessarily normal) second-order probability distribution, and hence by the ensemble autocorrelation function Rxx(t1, t2) ≡ E{x(t1)x(t2)} together with ξ(t) ≡ E{x(t)}. Specifically, the joint distribution of every set of sample values x1 = x(t1), x2 = x(t2), . . . , xn = x(tn) is a normal distribution with probability density

image

Processes obtained through addition of Gaussian processes and/or linear operations on their sample functions are Gaussian (Sec. 18.12-2). Coefficients in orthogonal-function expansions of a Gaussian process (Sec. 18.9-5) are jointly Gaussian random variables.

18.11-4. Markov Processes and the Poisson Process.  (a) Random Process of Order n.  A random process of order n is a random process completely specified by its nth (nth-order) distribution function Φ(n) (Sec. 18.9-2), but not by Φ(n−1).

(b) Purely Random Processes.  A random process described by x(t) is a purely random process if and only if the random variables x(t1), x(t2), . . . are statistically independent for every finite set t1, t2, . . . . A purely random process is completely specified by Φ(1)(X1, t1), p(1)(X1, t1), or φ(1)(X1, t1).

EXAMPLES: Successive independent observations, Bernoulli trials, and random samples in statistics (Sec. 19.1-2) represent purely random series.  Purely random continuous-parameter processes imply sample functions of unlimited bandwidth and cannot, strictly speaking, describe real physical phenomena.

(c) Markov Processes.  A discrete or continuous random process described by x(t) is a (simple) Markov process if and only if, for every finite set t1 < t2 < . . . < tn−1 < tn,

image

respectively.  If x(tn−1) = Xn−1 is given, knowledge of x(tn−2), x(tn−3), . . . contributes nothing to one's knowledge of the distribution of x(tn). A Markov process is completely specified by its second-order probability distribution and hence by its first-order probability distribution together with the “transition probabilities” given by

image

A Markovian random series is often called a Markov chain.  Every purely random process is a Markov process.

Many physical processes can be described as Markov processes.  An important class of problems involves the determination of the functions (21) from their given “initial values” specified for t = t1.  The defining property (20) of a Markov process implies the Chapman-Kolmogorov-Smoluchovski equation

image

Equation (22) is a first-order difference equation (Sec. 20.4-3) which may be solved for the unknown function (21) of the independent variable t whenever p(x, t|X1, t1) or φ(x, t|X1, t1) is suitably given.  If p(1)(X1, t1) or φ(1)(X1, t1) is known, the Markov process is then completely determined for all t > t1.
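For a Markov chain with finitely many states, the Chapman-Kolmogorov-Smoluchovski equation reduces to multiplication of transition matrices: two-step transition probabilities are obtained by summing over the intermediate state. A Monte-Carlo check in Python (the 3-state transition matrix is a hypothetical example):

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical 3-state Markov chain; row i holds p(next = j | current = i)
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.1, 0.1, 0.8]])
cum = P.cumsum(axis=1)           # cumulative rows for inverse-CDF sampling

# Monte Carlo: two successive transitions starting from state 0
n = 200000
s1 = (rng.random(n)[:, None] > cum[0]).sum(axis=1)      # first step
s2 = (rng.random(n)[:, None] > cum[s1]).sum(axis=1)     # second step
freq = np.bincount(s2, minlength=3) / n

# Chapman-Kolmogorov: p(X3 | X1) = sum over X2 of p(X3 | X2) p(X2 | X1),
# i.e., the matrix product P @ P
print(freq, (P @ P)[0])
```

The simulated two-step frequencies agree with the row of P @ P, the discrete form of summing the transition probabilities over all intermediate states.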

(d) The Poisson Process.  In many problems involving random searches, waiting lines, radioactive decay, etc., x(t) is a discrete random variable capable of assuming the spectral values 0, 1, 2, . . . (“counting process”; number of “successes,” telephone calls, disintegrations, etc.). A frequently useful model assumes the Markov property (20a) and

image

where o(Δt) denotes a term such that o(Δt)/Δt becomes negligible as Δt → 0 (Sec. 4.4-3). To find

image

substitute the given transition probabilities (23) into the Smoluchovski equation (22a) for t2 = t + Δt to obtain the difference equation

image

with P(−1, T) ≡ 0. As Δt → 0, this reduces to an ordinary differential equation

image

for each K.  These differential equations are solved successively for P(0, T), P(1, T), P(2, T), . . . , with initial conditions given by

image

It follows that

image

Thus, once the process is started, the number K of state changes in every time interval of length T has the Poisson distribution (Table 18.8-4).  α is called the mean count rate of the Poisson process.

The probability that no state changes take place is

image

so that the probability that at least one state change takes place is

image

The time interval T1 between successive state changes is a random variable with probability density

image

and expected value 1/α.
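These properties are easy to verify numerically: generating exponential interarrival times with mean 1/α and counting arrivals in [0, T] reproduces the Poisson count statistics and P(0, T) = e−αT. A Python sketch with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, T, n_runs = 3.0, 2.0, 100000   # mean count rate, window length, trials

# interarrival times are exponential with mean 1/alpha; 40 gaps per run is
# far more than ever needed for a window with alpha*T = 6 expected events
gaps = rng.exponential(1.0 / alpha, size=(n_runs, 40))
arrival_times = np.cumsum(gaps, axis=1)
K = (arrival_times <= T).sum(axis=1)          # state changes in [0, T]

print(K.mean(), K.var())                      # both approach alpha*T = 6
print((K == 0).mean(), np.exp(-alpha * T))    # P(0, T) = e^{-alpha T}
```

The sample mean and variance of K both approach αT, the signature of the Poisson distribution, and the fraction of runs with no state change approaches e−αT.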

Within any finite time interval of length T, a Poisson process is also uniquely defined by the joint distribution of the K + 1 statistically independent random variables K, t1, t2, . . . , tK, where K is the number of state changes during the time T, and t1, t2, . . . , tK are now the respective times of the 1st, 2nd, . . . , Kth state change during this time interval.  One has

image

(e) See Refs. 18.15, 18.16, and 18.17 for treatments of more general Markov processes.

18.11-5. Some Random Processes Generated by a Poisson Process. (a) Random Telegraph Wave (Fig. 18.11-1c).  x(t) equals either a or −a, with sign changes generated by the state changes of a Poisson process of mean count rate α (Sec. 18.11-4d).  The process is stationary and ergodic if started at t = −∞, and

image
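The autocorrelation Rxx(τ) = a2e−2α|τ| of the random telegraph wave follows because x(t)x(t + τ) = a2(−1)N, where N is the Poisson number of sign changes in an interval of length |τ|, and E{(−1)N} = e−2ατ. A Monte-Carlo check in Python (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
a, alpha, tau = 1.0, 2.0, 0.4

# x(t) x(t + tau) = a^2 (-1)^N, with N the Poisson count of sign changes
N = rng.poisson(alpha * tau, size=500000)
R_hat = (a ** 2) * ((-1.0) ** N).mean()

# compare with Rxx(tau) = a^2 exp(-2 alpha |tau|)
print(R_hat, a ** 2 * np.exp(-2 * alpha * tau))
```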

(b) Process Generated by Poisson Sampling (Fig. 18.11-1d). x(t) changes value at each state change of a Poisson process with mean count rate α; between state changes, x(t) is constant and takes continuously distributed random values x with given mean ξ and variance σ2. The process is stationary and ergodic if started at t = −∞, and

image

(c) Impulse Noise and Campbell's Theorem (Fig. 18.11-1e). x(t) is the sum of many similarly shaped transient pulses,

image

whose shape is given by v = v(t), with

image

while the pulse amplitude ak is a random variable with finite variance, and the times tk are random incidence times determined by the state changes of a Poisson process with mean count rate α.  The process is stationary and ergodic if started at t = −∞; it approximates a Gaussian random process if many pulses overlap.  One has

image

In the special case where ak is a fixed constant, the formulas (36) are known as Campbell's theorem.
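Campbell's theorem can be checked by direct simulation of shot noise. In the sketch below, the pulse shape v(t) = e−t/θ (t ≥ 0), the fixed amplitude, and all parameter values are illustrative assumptions; with these choices the predictions are E{x} = αa∫v dt = αaθ and Var{x} = αa2∫v2 dt = αa2θ/2:

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, a0, theta = 5.0, 1.0, 0.3     # rate, fixed amplitude, pulse time constant

def v(t):
    # assumed pulse shape: one-sided exponential decay, zero for t < 0
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0.0, np.exp(-np.clip(t, 0.0, None) / theta), 0.0)

trials = 20000
W = 20 * theta                       # window long enough that older pulses die out
x_vals = np.empty(trials)
for i in range(trials):
    n = rng.poisson(alpha * W)                 # number of pulses in the window
    tk = rng.uniform(0.0, W, n)                # their incidence times
    x_vals[i] = a0 * v(W - tk).sum()           # observe x at t = W

mean_pred = alpha * a0 * theta                 # Campbell: alpha * a * integral of v
var_pred = alpha * a0 ** 2 * theta / 2         # Campbell: alpha * a^2 * integral of v^2
print(x_vals.mean(), mean_pred)
print(x_vals.var(), var_pred)
```

The sample mean and variance of the simulated shot noise match the Campbell predictions to within Monte-Carlo error.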

18.11-6. Random Processes Generated by Periodic Sampling. Certain measuring devices sample a stationary and ergodic random variable q(t) periodically and then hold their output x(t) for a constant sampling interval Δt.  The resulting random process is stationary and ergodic if the timing of the periodic sampling commands is random and uniformly distributed between 0 and Δt.  A sample function x(t) will be similar to Fig. 18.11-1d except that state changes must be separated by integral multiples of Δt.  If q is a binary random variable capable of assuming only the values a and −a with probabilities 1/2, 1/2, then x(t) will resemble the random telegraph wave of Fig. 18.11-1c, except that state changes are, again, separated by integral multiples of Δt (“coin-tossing” sample-hold process).

If different samples of q are statistically independent, then

image

and hence

image

Figure 18.11-2 compares Rxx(τ) and Φxx(ω) for a random telegraph wave and a coin-tossing sample-hold process with equal mean count rates α = 1/(2Δt), zero mean, and E{x2} = a2.

18.12. OPERATIONS ON RANDOM PROCESSES

18.12-1. Correlation Functions and Spectra of Sums.  Let x(t), y(t) be generated by real or complex random processes.  For

image

with real or complex α, β, the correlation functions Rxz(t1, t2), Rzx(t1, t2), Rzz(t1, t2) are given by

image

These relations also apply to the correlation functions Rxz(τ), Rzx(τ), Rzz(τ) of stationary random processes; the corresponding spectral densities are

image

18.12-2. Input-Output Relations for Linear Systems.  (a) Consider a real linear system with real input x(t) and output

image

where the weighting function (Green's function, Secs. 9.3-3 and 9.4-3) is the system response to a unit-impulse input δ(t − λ) (impulse applied at t = λ), and h(t, ζ) ≡ w(t, t − ζ).

In the most important applications, t represents time, and w(t, λ) = 0 for t < λ, since physically realizable systems cannot respond to future inputs (see also Sec. 9.4-3).

(b) If x(t) is generated by a real random process, and if E{x2(t)} and E{y2(t)} exist, then

image

If x(t) is Gaussian, y(t) is also Gaussian and completely determined by Eqs. (5) to (7).

18.12-3. The Stationary Case.  (a) If the input x(t) is stationary, and

image

(time-invariant linear system, see also Sec. 9.4-3), then the system output y(t) is also stationary; y(t) will be ergodic (Sec. 18.10-7b) if this is true for x(t).  The input-output relations (4) to (7) for real x(t), y(t) reduce to

image

In most applications, physical realizability requires h(ζ) = 0 for ζ < 0 (see also Sec. 9.4-3).

(b) The important input-output relations (11) are greatly simplified if they are expressed in terms of spectral densities (Sec. 18.10-3):

image
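In particular, these spectral relations include the input-output formula Φyy(ω) = |H(iω)|2Φxx(ω), where H is the system frequency-response function. A discrete-time numerical sketch: white noise with unit spectral density is passed through an assumed FIR weighting function, and averaged periodograms of the output are compared with |H|2 (filter taps and lengths are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
h = np.array([0.5, 0.3, 0.2])            # illustrative FIR weighting function
nfft, n_runs = 256, 2000

H2 = np.abs(np.fft.rfft(h, nfft)) ** 2   # |H|^2 on the DFT frequency grid

psd = np.zeros(nfft // 2 + 1)
for _ in range(n_runs):
    x = rng.standard_normal(nfft + len(h) - 1)   # white noise, Phi_xx = 1
    y = np.convolve(x, h, mode="valid")          # steady-state output segment
    psd += np.abs(np.fft.rfft(y, nfft)) ** 2 / nfft   # periodogram of y
psd /= n_runs

print(np.max(np.abs(psd - H2)))   # averaged periodogram ~ |H|^2 * Phi_xx
```

Averaging over many realizations tames the periodogram's variance; the residual discrepancy is sampling error plus a small finite-window bias.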

(c) Note also

image

In the special case of stationary white-noise input with Rxx(τ) ≡ Φ0δ(τ) (Sec. 18.11-4b), note

image

18.12-4. Relations for t Correlation Functions and Non-ensemble Spectra.  The relations (2), (4), and (10) to (17) all hold if each ensemble average, correlation function, and spectral density is replaced by the corresponding t average, t correlation function, and non-ensemble spectral density (Secs. 18.10-7 to 18.10-9), whenever these quantities exist.

18.12-5. Nonlinear Operations.  Given a random process generating x(t) and a single-valued, measurable function y = y(x), the functions

image

represent a new random process produced by a (generally nonlinear) zero-memory operation on x(t); y(x) does not depend explicitly on t. Distributions and ensemble averages of the y process are obtained by the methods of Secs. 18.5-2 and 18.5-4.  In particular, the autocorrelation function of the “output” y is, for real variables,

image

where x1 = x(t1), x2 = x(t2); y1 = y(x1), y2 = y(x2).

Where this is more convenient, Ryy(t1, t2) can be obtained in the form

image

where the integration contours C1, C2 parallel the imaginary axis in suitable absolute-convergence strips (Ref. 18.15). The “transform method” is especially useful in connection with certain practically important transfer characteristics y(x), e.g., limiters, half-wave detectors, quantizers, etc. (Refs. 18.13 and 18.15).

18.12-6. Nonlinear Operations on Gaussian Processes.  (a) Price's Theorem (Ref. 18.17). Given two jointly normal random variables x1, x2 with covariance λ12 and a function f(x1, x2) such that

image

for some real a > 0, b < 2, then

image

Price's theorem yields ensemble averages (and, in particular, correlation functions) in the form

image

where C is the value of E{f(x1, x2)} for λ12 = 0, i.e., for uncorrelated x1, x2.

Price's theorem also leads to the useful recursion formula

image

In particular,

image
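For example, applying Price's theorem to f = x12x22 gives ∂E{f}/∂λ12 = E{4x1x2} = 4λ12, whence E{x12x22} = σ12σ22 + 2λ122 for zero-mean jointly normal x1, x2. A Monte-Carlo check in Python (variances and covariance are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(6)
var1, var2, lam = 1.0, 2.25, 0.8
C = np.array([[var1, lam], [lam, var2]])   # covariance of jointly normal x1, x2

x = rng.multivariate_normal([0.0, 0.0], C, size=1000000)
lhs = np.mean(x[:, 0] ** 2 * x[:, 1] ** 2)   # sample estimate of E{x1^2 x2^2}

# integrating Price's theorem in lam from 0 (uncorrelated case) gives
# E{x1^2 x2^2} = var1 * var2 + 2 * lam^2
rhs = var1 * var2 + 2 * lam ** 2
print(lhs, rhs)
```

The constant of integration C in (the uncorrelated case λ12 = 0) is var1 · var2 here, since E{x12}E{x22} factors for independent normals.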

(b) Series Expansion.  Given a stationary Gaussian process x(t) with E{x} = 0, Rxx(τ) = σ2ρxx(τ) and a function y = y(x) such that Ryy(τ) exists, then

image

where the Hk(v) are the Hermite polynomials defined in Table 21.7-1.

18.13. RELATED TOPICS, REFERENCES, AND BIBLIOGRAPHY

18.13-1. Related Topics.  The following topics related to the study of probability theory and random processes are treated in other chapters of this handbook:

Measure, Lebesgue integrals, Stieltjes integrals, Fourier analysis  Chap. 4

Construction of mathematical models, abstract spaces, Boolean algebras  Chap. 12

Orthogonal-function expansions  Chap. 15

Mathematical statistics, random-process measurements and tests  Chap. 19

Permutations and combinations Appendix C

18.13-2. References and Bibliography (see also Sec. 19.9-2).

      18.1. Arley, N., and K. R. Buch: Introduction to the Theory of Probability and Statistics, Wiley, New York, 1950.

      18.2. Burington, R. S., and D. C. May: Handbook of Probability and Statistics, 2d ed., McGraw-Hill, New York, 1967.

      18.3. Cramér, H.: Mathematical Methods of Statistics, Princeton, Princeton, N.J., 1951.

      18.4.———: The Elements of Probability Theory and Some of Its Applications, Wiley, New York, 1955.

      18.5. Feller, W.: An Introduction to Probability Theory and Its Applications, vol. I, 2d ed., Wiley, New York, 1958; vol. II, 1966.

      18.6. Gnedenko, B. V.: Theory of Probability, Chelsea, New York, 1962.

      18.7.——— and A. I. Khinchine: An Elementary Introduction to the Theory of Probability, Dover, New York, 1961.

      18.8. Loève, M. M.: Probability Theory, 3d ed., Van Nostrand, Princeton, N.J., 1963.

      18.9. Parzen, E.: Modern Probability Theory and Its Applications, Wiley, New York, 1960.

      18.10. Richter, H.: Wahrscheinlichkeitstheorie, 2d ed., Springer, Berlin, 1967.

Random Processes

      18.11. Bailey, N. T. J.: The Elements of Stochastic Processes with Applications to the Natural Sciences, Wiley, New York, 1964.

      18.12. Bharucha-Reid, A. J.: Elements of the Theory of Markov Processes and Their Applications, McGraw-Hill, New York, 1960.

      18.13. Davenport, W. B., Jr., and W. L. Root: Introduction to Random Signals and Noise, McGraw-Hill, New York, 1958.

      18.14. Doob, J. L.: Stochastic Processes, Wiley, New York, 1953.

      18.15. Middleton, D.: An Introduction to Statistical Communication Theory, McGraw-Hill, New York, 1960.

      18.16. Parzen, E.: Stochastic Processes, Holden-Day, San Francisco, 1962.

      18.17. Papoulis, A.: Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 1965.

      18.18. Rosenblatt, M.: Random Processes, Oxford, New York, 1962.

      18.19. Saaty, T. L.: Elements of Queueing Theory with Applications, McGraw-Hill, New York, 1961.