CHAPTER 18
PROBABILITY THEORY AND RANDOM PROCESSES
18.2. Definition and Representation of Probability Models
18.2-1. Algebra of Events Associated with a Given Experiment
18.2-2. Mathematical Definition of Probabilities. Conditional Probabilities
18.2-3. Statistical Independence
18.2-4. Compound Experiments, Independent Experiments, and Independent Repeated Trials
18.2-7. Representation of Events as Sets in a Sample Space
18.3. One-dimensional Probability Distributions
18.3-1. Discrete One-dimensional Probability Distributions
18.3-2. Continuous One-dimensional Probability Distributions
18.3-5. Chebyshev's Inequality and Related Formulas
18.3-6. Improved Description of Probability Distributions: Use of Stieltjes Integrals
18.3-7. Moments of a One-dimensional Probability Distribution
18.3-8. Characteristic Functions and Generating Functions
18.4. Multidimensional Probability Distributions
18.4-2. Two-dimensional Probability Distributions. Marginal Distributions
18.4-3. Discrete and Continuous Two-dimensional Probability Distributions
18.4-4. Expected Values, Moments, Covariance, and Correlation Coefficient
18.4-5. Conditional Probability Distributions Involving Two Random Variables
18.4-7. n-dimensional Probability Distributions
18.4-8. Expected Values and Moments
18.4-9. Regression. Multiple and Partial Correlation Coefficients
18.4-10. Characteristic Functions
18.4-11. Statistically Independent Random Variables
18.4-12. Entropy of a Probability Distribution, and Related Topics
18.5. Functions of Random Variables. Change of Variables
18.5-2. Functions (or Transformations) of a One-dimensional Random Variable
18.5-3. Linear Functions (or Linear Transformations) of a One-dimensional Random Variable
18.5-4. Functions and Transformations of Multidimensional Random Variables
18.5-5. Linear Transformations
18.5-6. Mean and Variance of a Sum of Random Variables
18.5-7. Sums of Statistically Independent Random Variables
18.5-8. Compound Distributions
18.6. Convergence in Probability and Limit Theorems
18.6-1. Sequences of Probability Distributions. Convergence in Probability
18.6-4. Asymptotically Normal Probability Distributions
18.7. Special Techniques for Solving Probability Problems
18.8. Special Probability Distributions
18.8-1. Discrete One-dimensional Probability Distributions
18.8-2. Discrete Multidimensional Probability Distributions
18.8-3. Continuous Probability Distributions: the Normal (Gaussian) Distribution
18.8-4. Normal Random Variables: Distribution of Deviations from the Mean
18.8-5. Miscellaneous Continuous One-dimensional Probability Distributions
18.8-6. Two-dimensional Normal Distributions
18.8-7. Circular Normal Distributions
18.8-8. n-dimensional Normal Distributions
18.8-9. Addition Theorems for Special Probability Distributions
18.9. Mathematical Description of Random Processes
18.9-2. Mathematical Description of Random Processes
(b) Ensemble Correlation Functions and Mean Squares
(d) Ensemble Averages of Integrals and Derivatives
18.9-4. Processes Defined by Random Parameters
18.9-5. Orthogonal-function Expansions
18.10. Stationary Random Processes. Correlation Functions and Spectral Densities
18.10-1. Stationary Random Processes
18.10-2. Ensemble Correlation Functions
18.10-3. Ensemble Spectral Densities
18.10-4. Correlation Functions and Spectra of Real Processes
18.10-5. Spectral Decomposition of Mean “Power” for Real Processes
18.10-6. Some Alternative Ensemble Spectral Densities
18.10-7. t Averages and Ergodic Processes (a) t Averages (b) Ergodic Processes
18.10-8. Non-ensemble Correlation Functions and Spectral Densities
18.10-9. Functions with Periodic Components
18.10-10. Generalized Fourier Transforms and Integrated Spectra
18.11. Special Classes of Random Processes. Examples
18.11-1. Processes with Constant and Periodic Sample Functions
(c) More General Periodic Processes
18.11-2. Band-limited Functions and Processes. Sampling Theorems
18.11-3. Gaussian Random Processes
18.11-4. Markov Processes and the Poisson Process
(a) Random Processes of Order n
18.11-5. Some Random Processes Generated by a Poisson Process
(b) Process Generated by Poisson Sampling
(c) Impulse Noise and Campbell's Theorem
18.11-6. Random Processes Generated by Periodic Sampling
18.12. Operations on Random Processes
18.12-1. Correlation Functions and Spectra of Sums
18.12-2. Input-Output Relations for Linear Systems
18.12-4. Relations for t Correlation Functions and Non-ensemble Spectra
18.12-6. Nonlinear Operations on Gaussian Processes
18.13. Related Topics, References, and Bibliography
18.13-2. References and Bibliography
18.1-1. Mathematical probabilities are values of a real numerical function defined on a class of idealized events, which represent results of an experiment or observation. Mathematical probabilities are not defined directly in terms of “likelihood” or relative frequency of occurrence; they are introduced by a set of defining postulates (Sec. 18.2-2; see also Sec. 12.1-1) which abstract essential properties of statistical relative frequencies (Sec. 19.2-1). The concept of probability can, then, often be related to reality by the assumption that, in practically every sequence of independently repeated experiments, the relative frequency of each event tends to a limit represented by the corresponding probability (Sec. 19.2-1).* Theories based on the probability concept may, however, be useful even if they are not subject to direct statistical interpretation.
Probability theory deals with the definition and description of models involving the probability concept. The theory is especially concerned with methods for calculating the probability of an event from the known or postulated probabilities of other events which are logically related to the first event. Most applications of probability theory may be interpreted as special cases of random processes (Secs. 18.8-1 to 18.11-5).
* Whenever this proposition is justified, it must be regarded as a law of nature; it should not be confused with mathematical theorems like Bernoulli's theorem or the mathematical law of large numbers (Sec. 18.6-5).
18.2. DEFINITION AND REPRESENTATION OF PROBABILITY MODELS
18.2-1. Algebra of Events Associated with a Given Experiment. Each probability model describes a specific idealized experiment or observation having a class δ† of theoretically possible results (events, states) E permitting the following definitions.
The union (logical sum) E1 ∪ E2 ∪ . . . (or E1 + E2 + . . .) of a countable (finite or infinite) set of events E1, E2, . . . is the event of realizing at least one of the events E1, E2, . . . .
The intersection (logical product) E1∩ E2 (or E1E2) of two events E1 and E2 is the joint event of realizing both E1 and E2.
The (logical) complement Ẽ of an event E is the event of not realizing E (“opposite” or complementary event of E).
I is the certain event of realizing at least one of the events of δ†.
0 is the impossible event of realizing no one of the events of δ†.
In each case, the class δ of events comprising δ† and 0 is to constitute a completely additive Boolean algebra (algebra of events associated with the given experiment or observation) having all the properties outlined in Secs. 12.8-1 and 12.8-4. Either E1 ∪ E2 = E1 or E1 ∩ E2 = E2 implies the logical inclusion relation E2 ⊂ E1 (E2 implies E1); note 0 ⊂ E ⊂ I. E1 and E2 are mutually exclusive (disjoint) if and only if E1 ∩ E2 = 0. The set δ1 of joint events E ∩ E1 is the algebra of events associated with the given experiment under the hypothesis that E1 occurs; E1 ∩ E1 = E1 is the certain event in δ1 (see also Sec. 12.8-3).
18.2-2. Mathematical Definition of Probabilities. Conditional Probabilities. It is possible to assign a (mathematical) probability P[E] (probability of E, probability of realizing the event E) to each event E of the class δ (event algebra, Sec. 18.2-1) associated with a given experiment if and only if one can define a single-valued real function P[E] on δ so that
Postulates 1 to 3 imply 0 ≤ P[E] ≤ 1; in particular, P[E] = 0 if E is an impossible event. Note carefully that P[E] = 1 or P[E] = 0 do not necessarily imply that E is, respectively, certain or impossible.
A fourth defining postulate relates the “absolute” probability P[E] associated with the given experiment to the “conditional” probabilities P[E|E1] referring to a “simpler” experiment restricted by the hypothesis that E1 occurs. The conditional probability P[E|E1] of E on (relative to) the hypothesis that the event E1 occurs is defined by the postulate
P[E|E1] is not defined if P[E1] = 0.
In the context of the restricted experiment, the quantities P[E|E1] are ordinary probabilities associated with the joint events E ∩ E1 constituting the event algebra δ1 of the restricted experiment (Sec. 18.2-1). In practice, every probability can be interpreted as a conditional probability relative to some hypothesis implied by the experiment under consideration.
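The definition P[E|E1] = P[E ∩ E1]/P[E1] can be checked on any finite sample space with a simple enumeration. The sketch below uses an assumed two-dice experiment (36 equally likely simple events, an illustration not taken from the text):

```python
from fractions import Fraction

# Assumed finite sample space: two fair dice, each of the 36 simple
# events carrying probability 1/36 (illustrative choice).
space = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    """P[E] as the sum of simple-event probabilities (Sec. 18.2-7)."""
    return Fraction(sum(1 for s in space if event(s)), len(space))

def cond_prob(event, hypothesis):
    """P[E|E1] = P[E ∩ E1] / P[E1]; undefined when P[E1] = 0."""
    p_h = prob(hypothesis)
    if p_h == 0:
        raise ValueError("P[E1] = 0: conditional probability undefined")
    return prob(lambda s: event(s) and hypothesis(s)) / p_h

doubles = lambda s: s[0] == s[1]
sum_ge_10 = lambda s: s[0] + s[1] >= 10

# Two of the six outcomes with sum >= 10 are doubles.
print(cond_prob(doubles, sum_ge_10))  # 1/3
```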
18.2-3. Statistical Independence. Two events E1 and E2 are statistically independent (stochastically independent) if and only if
so that P[E1|E2] = P[E1] if P[E2] ≠ 0, and P[E2|E1] = P[E2] if P[E1] ≠ 0.
N events E1, E2, . . . , EN are statistically independent if and only if not only each pair of events Ei, Ek but also each pair of possible joint events is statistically independent:
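Pairwise independence does not suffice: the classic Bernstein example (an assumed illustration, not from the text) exhibits three events that are independent in pairs but not mutually independent.

```python
from fractions import Fraction
from itertools import combinations

# Bernstein's example: four equally likely sample points; E1, E2, E3
# are pairwise independent, yet P[E1∩E2∩E3] != P[E1]P[E2]P[E3].
space = {1, 2, 3, 4}
E1, E2, E3 = {1, 2}, {1, 3}, {1, 4}

def P(event):
    return Fraction(len(event & space), len(space))

pairwise = all(P(A & B) == P(A) * P(B)
               for A, B in combinations([E1, E2, E3], 2))
mutual = P(E1 & E2 & E3) == P(E1) * P(E2) * P(E3)
print(pairwise, mutual)  # True False
```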
18.2-4. Compound Experiments, Independent Experiments, and Independent Repeated Trials. Frequently an experiment appears as a combination of component experiments (see also Secs. 18.7-3 and 18.8-1). Let E′, E″, E′″, . . . denote any result associated, respectively, with the first, second, third, . . . component experiment. The results of the compound experiment can be described as joint events E = E′ ∩ E″ ∩ E′″ ∩ . . . ; their probabilities will, in general, depend on the nature and interaction of all component experiments. The probability P[E′] of realizing the component result E′ in the course of a given compound experiment is, in general, different from the probability associated with E′ in an independently performed component experiment.
Two or more component experiments of a given compound experiment are independent if and only if their respective results E′, E″, E′″, . . . obtained in the course of the compound experiment are statistically independent, i.e.,
for all E′, E″, E′″, . . . (Sec. 18.2-3). If a component experiment is independent of all others, the probability of realizing each of its results in the course of the given compound experiment is equal to the corresponding probability for the independently performed component experiment.
Repeated independent trials are independent experiments each having the same set of possible results E and the same set of associated probabilities P[E]. The probability of obtaining the sequence of results E1, E2, . . . , En in the compound experiment corresponding to a sequence of n repeated independent trials is
18.2-5. Combination Rules (see also Secs. 18.7-1 to 18.7-3). Each of the theorems in Table 18.2-1 expresses the probability of an event in terms of the (possibly already known) probabilities of other events logically related to the first event.
More generally, the probability of realizing at least m, and that of realizing exactly m, of N (not necessarily statistically independent) events E1, E2, . . . , EN is, respectively
If E1, E2, . . . , EN are statistically independent, the quantities (5) reduce to the symmetric functions (1.4-9) of the P[Ei] (Table 18.2-1b).
EXAMPLES: If the probability of each result of a throw with a die is 1/6, then
The probability of throwing either 1 or 6 is 1/6 + 1/6 = 1/3
The probability of throwing 6 at least once in two throws is 1/6 + 1/6 – 1/36 = 11/36
The probability of throwing 6 exactly once in two throws is 1/3 – 2/36 = 5/18
The probability of throwing 6 twice in two throws is 1/36; etc.
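The worked dice probabilities above can be confirmed by direct enumeration of the 36 equally likely outcomes of two throws; a minimal sketch:

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two throws of a fair die.
throws = list(product(range(1, 7), repeat=2))

def P(pred):
    return Fraction(sum(1 for t in throws if pred(t)), len(throws))

p_at_least_once = P(lambda t: 6 in t)                    # 11/36
p_exactly_once  = P(lambda t: (t[0] == 6) != (t[1] == 6))  # 5/18
p_twice         = P(lambda t: t == (6, 6))               # 1/36
print(p_at_least_once, p_exactly_once, p_twice)  # 11/36 5/18 1/36
```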
18.2-6. Bayes's Theorem (see also Sec. 18.4-5b). Let H1, H2, . . . be a set of mutually exclusive events such that H1 ∪ H2 ∪ . . . = I. Then, for each pair of events Hi, E,
Equation (7) can be used to relate the “a priori” probability P[Hi] of a hypothetical cause Hi of the event E to the “a posteriori” probability P[Hi|E] if (and only if) the Hi are “random” events permitting the definition of probabilities P[Hi].
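The a-posteriori probabilities P[Hi|E] follow from the priors P[Hi] and the conditional probabilities P[E|Hi] by Eq. (7); the two-hypothesis numbers below are an assumed illustration, not from the text.

```python
from fractions import Fraction

# Bayes's theorem (7): P[Hi|E] = P[E|Hi]P[Hi] / sum_k P[E|Hk]P[Hk].
def bayes(priors, likelihoods):
    """priors: P[Hi]; likelihoods: P[E|Hi]. Returns the list P[Hi|E]."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)          # P[E], by the total-probability rule
    return [j / total for j in joint]

priors      = [Fraction(1, 100), Fraction(99, 100)]   # P[H1], P[H2]
likelihoods = [Fraction(95, 100), Fraction(5, 100)]   # P[E|H1], P[E|H2]
posterior = bayes(priors, likelihoods)
print(posterior[0])  # 19/118
```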
Table 18.2-1. Probabilities of Logically Related Events
18.2-7. Representation of Events as Sets in a Sample Space. Every class S of events E permitting the definition of probabilities P[E] can be described in terms of a set T of mutually exclusive events Ê ≠ 0 such that each event E is the union of a corresponding subset of T. T is called a sample space or fundamental probability set associated with the given experiment; each set of sample points (simple events, elementary events, phases) Ê of T corresponds to an event E. In particular, T itself corresponds to a certain event, and an empty subset of T corresponds to an impossible event.
The probabilities P[E] can then be regarded as values of a set function, the probability function defining the probability distribution of the sample space. Each probability P[E] is the sum of the probabilities attached to the simple events included in the event E.
The event algebra S is thus represented isomorphically by an algebra of measurable sets (see also Secs. 4.6-17b and 12.8-4). The fundamental probability set associated with the conditional probabilities P[E|E1] is the subset of T representing E1. Conversely, a sample space associated with any given experiment may be regarded as a subset “embedded” in a space of events associated with a more general experiment (see also Secs. 18.2-1 and 18.2-2).
18.2-8. Random Variables. A random variable (stochastic variable, chance variable, variate) is any (not necessarily numerical)* variable x whose “values” x = X constitute a fundamental probability set (sample space, Sec. 18.2-7) of simple events [x = X], or whose values label the points of a sample space on a reciprocal one-to-one basis. The associated probability distribution is the distribution of the random variable x. The definition of any random variable must specify its distribution.
Every single-valued measurable function (Sec. 4.6-14c) x defined on any fundamental probability set T is a random variable; its distribution is defined by the probabilities of the events (measurable subsets of T, Sec. 18.2-7) corresponding to each set of values of x.
18.2-9. Representation of Probability Models in Terms of Numerical Random Variables and Distribution Functions. The simple events (sample points) Ê of the fundamental probability set associated with a given problem are frequently labeled with corresponding values (sample values) X of a real numerical random variable x. Each sample value of x may, for instance, correspond to the result of a measurement defining a simple event. Compound events, like [x ≤ a], [sin x > 0.5], or [x = arctan 2], correspond to measurable sets of values of x (see also Sec. 18.2-8).
More generally, each simple event may be labeled by a corresponding (ordered) set X ≡ (X1, X2, . . .) of real numbers X1, X2, . . . which
* The boldface type used to denote a multidimensional random variable x does not necessarily imply that x is a vector.
constitutes a “value” of a multidimensional random variable x ≡ (x1, x2, . . .). Each of the real variables x1, x2, . . . is itself a random variable (see also Sec. 18.4-1).
Given a random variable x; or x labeling the simple events of the given fundamental probability set on a one-to-one basis, the probabilities associated with the corresponding experiment are uniquely described by the probability distribution of the random variable.
Throughout this handbook, all real numerical random variables are understood to range from –∞ to +∞; values of a numerical random variable which do not label a possible simple event Ê are treated as impossible events and are assigned the probability zero.
The distribution (or the probability function, Sec. 18.2-7) of any real numerical random variable x is uniquely described by its (cumulative) distribution function
Similarly, the distribution of a multidimensional random variable x ≡ (x1, x2, . . .) is uniquely described by its (cumulative) distribution function
Conversely, the distribution function corresponding to a given probability distribution is uniquely defined for all values of the random variable in question. Every distribution function is a nondecreasing function of each of its arguments, and
18.3. ONE-DIMENSIONAL PROBABILITY DISTRIBUTIONS
18.3-1. Discrete One-dimensional Probability Distributions (see Tables 18.8-1 to 18.8-7 for examples). The real numerical random variable x is a discrete random variable (has a discrete probability distribution) if and only if the probability
is different from zero only on a countable set of spectral values X = X(1), X(2), . . . (spectrum of the discrete random variable x). Each discrete probability distribution is defined by the function (1), or by the corresponding (cumulative) distribution function (Sec. 18.2-9)
Throughout this handbook, the notation will be used to
signify summation of a function y(x) over all spectral values X(i) of a discrete random variable x (see also Sec. 18.3-6). Note
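The step distribution function (2) is simply the running sum of the probabilities (1) over spectral values X(i) ≤ X. A minimal sketch, using an assumed three-point spectrum (an illustration, not from the text):

```python
from fractions import Fraction

# Assumed discrete distribution: spectrum {0, 1, 2} with probabilities
# p(X) as in Eq. (1); Eq. (3) requires the probabilities to sum to 1.
p = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
assert sum(p.values()) == 1   # normalization

def Phi(X):
    """Cumulative distribution function Phi(X) = P[x <= X], Eq. (2)."""
    return sum(prob for xi, prob in p.items() if xi <= X)

print(Phi(0), Phi(1.5), Phi(5))  # 1/4 3/4 1
```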
18.3-2. Continuous One-dimensional Probability Distributions (see Table 18.8-8 for examples). The real numerical random variable x is a continuous random variable (has a continuous probability distribution) if and only if its (cumulative) distribution function Φx(X) ≡ Φ(X) is continuous and has a piecewise continuous derivative, the frequency function (probability density, differential distribution function) of x
for all X.† P[X < x ≤ X + dx] = dΦ = φ(X) dx is called a probability element (probability differential). Note
If x is a continuous random variable, each event [x = X] has the probability zero but is not necessarily impossible. The spectrum of a continuous random variable x is the set of values x = X where φ(X) ≠ 0.
* In terms of the step function U−(t) [U−(t) = 0 if t < 0, U−(t) = 1 if t ≥ 0, Sec. 21.9-1],
† Some authors call a probability distribution continuous whenever its distribution function is continuous.
NOTE: A random variable can be continuous (i.e., have a piecewise continuous frequency function) over part of its range, while it is discrete elsewhere (see also Sec. 18.3-6).
18.3-3. Expected Values and Variance. Characteristic Parameters of One-dimensional Probability Distributions (see also Sec. 18.3-6). (a) The expected value (mean, mean value, mathematical expectation) of a function y(x) of a discrete or continuous random variable x is
if this expression exists in the sense of absolute convergence (see also Secs. 4.6-2 and 4.8-1).
(b) In particular, the expected value (mean, mean value, mathematical expectation) E{x} = ξ and the variance Var {x} = σ2 of a discrete or continuous one-dimensional random variable x are defined by
For computation purposes note (see also Sec. 18.3-10)
Whenever E{x} and Var {x} exist, the mean square deviation
of the random variable x from one of its values X is least (and equal to σ2) for X = ξ.
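The identity behind this statement is E{(x − X)2} = Var {x} + (X − ξ)2, so the mean-square deviation is least, and equal to σ2, exactly at X = ξ. A numerical check on an assumed two-point distribution (an illustration, not from the text):

```python
from fractions import Fraction

# Assumed two-point distribution: P[x=0] = 1/3, P[x=3] = 2/3.
p = {0: Fraction(1, 3), 3: Fraction(2, 3)}
xi = sum(x * w for x, w in p.items())               # E{x} = 2
var = sum((x - xi) ** 2 * w for x, w in p.items())  # Var{x} = 2

def msd(X):
    """Mean-square deviation E{(x - X)^2}."""
    return sum((x - X) ** 2 * w for x, w in p.items())

assert msd(xi) == var                               # minimum at X = xi
assert all(msd(X) >= var for X in (-1, 0, 1, 3, 10))
print(xi, var, msd(xi))  # 2 2 2
```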
(c) E{x} and Var {x} are not functions of x; they are functionals (Sec. 12.1-4) describing properties of the distribution of x. E{x} is a measure of location, and Var {x} is a measure of dispersion (or concentration) of the probability distribution of x. A number of other numerical “characteristic parameters” describing specific properties of one-dimensional probability distributions are defined in Table 18.3-1 and in Secs. 18.3-7 and 18.3-9. Note that one or more parameters like E{x}, Var {x}, E{|x — ξ|}, . . . may not exist for a given probability distribution.
(d) Tables 18.8-1 to 18.8-8 list mean values and variances for a number of frequently used probability distributions.
Table 18.3-1. Numerical Parameters Describing Properties of One-dimensional Probability Distributions (see also Secs. 18.3-3, 18.3-7, and 18.3-9)
18.3-4. Normalization. Given a function ψ(x) ≥ 0 known to be proportional to the function p(x) associated with a discrete random variable x (Sec. 18.3-1),*
Given a function ψ(x) ≥ 0 known to be proportional to the frequency function φ(x) of a continuous random variable x (Sec. 18.3-2),
In either case, k is called the normalization factor. Analogous procedures apply to multidimensional distributions (Sec. 18.4-1).
18.3-5. Chebyshev's Inequality and Related Formulas. The following formulas specify upper bounds for the probability that a random variable x, or its absolute deviation |x — ξ| from the mean value ξ = E{x}, exceeds a given value a > 0.
If x has a continuous distribution with a single mode (Table 18.3-1) ξmode, one has the stronger inequality
where s is Pearson's measure of skewness (Table 18.3-1); note that s = 0 if the distribution is symmetrical about the mode.
18.3-6. Improved Description of Probability Distributions: Use of Stieltjes Integrals. The treatment of discrete and continuous probability distributions is unified if one expresses the probability of each event [X – ΔX < x ≤ X + ΔX] as a Lebesgue-Stieltjes integral (Sec. 4.6-17)
* In order to conform with the notation used in many textbooks, the values x = X of a random variable x will be denoted simply by x whenever this notation does not lead to ambiguities.
where Φ(X) ≡ P[x ≤ X] is the cumulative distribution function (Secs. 18.2-9, 18.3-1, and 18.3-2) defining the distribution of the random variable x. For continuous distributions the Stieltjes integral (15) reduces to a Riemann integral. For a discrete distribution, Φ(X) is given by Eq. (2), and P[X — ΔX < x ≤ X + ΔX] reduces to the function p(X) defined in Sec. 18.3-1.
In terms of the Stieltjes-integral notation,
for both discrete and continuous distributions. The Stieltjes-integral notation applies also to probability distributions which are partly discrete and partly continuous. An analogous notation is used for multidimensional distributions (Secs. 18.4-4 and 18.4-8).
Discrete distributions may be formally represented in terms of a “probability density” involving impulse functions δ(X — X(i)) (see also Secs. 18.3-1 and 21.9-6).
18.3-7. Moments of a One-dimensional Probability Distribution (see also Secs. 18.3-6 and 18.3-10). (a) The moment of order r ≥ 0 (rth moment) about x = X of a given random variable x is the mean value E{(x — X)r}, if this quantity exists in the sense of absolute convergence (Sec. 18.3-3).
(b) In particular, the rth moment of x about X = 0 is
and the rth moment of x about its mean value ξ (central moment of order r) is
The existence of αr or µr implies the existence of all moments αk and µk of order k ≤ r; the divergence of αr or µr implies the divergence of all moments αk and µk of order k ≥ r.
If the probability distribution is symmetric about its mean, all (existing) central moments µr of odd order r are equal to zero.
(c) The rth factorial moment of x about X = 0 is
The rth central factorial moment of x is E{(x — ξ)[r]}. The rth absolute moment of x about X = 0 is βr = E{|x|r}. Note
(d) A one-dimensional probability distribution is uniquely defined by its moments α0, α1, α2, . . . if they all exist and are such that the series α0 + α1s + α2s2/2! + · · · + αrsr/r! + · · · converges absolutely for some |s| > 0 [see also Eq. (28) and the footnote to Sec. 18.3-8b].
(e) Refer to Tables 18.8-1 to 18.8-7 for examples, and to Sec. 18.3-10 for relations connecting the αr, µr, and α[r].
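The moments (17) and (18) and the familiar relation µ2 = α2 − α12 can be checked numerically; the three-point distribution below is an assumed illustration.

```python
from fractions import Fraction

# Assumed discrete distribution on the spectrum {1, 2, 6}.
p = {1: Fraction(1, 2), 2: Fraction(1, 3), 6: Fraction(1, 6)}

def alpha(r):
    """rth moment about the origin, alpha_r = E{x^r}."""
    return sum(x ** r * w for x, w in p.items())

xi = alpha(1)                      # mean, 13/6

def mu(r):
    """rth central moment, mu_r = E{(x - xi)^r}."""
    return sum((x - xi) ** r * w for x, w in p.items())

assert mu(1) == 0                  # first central moment vanishes
assert mu(2) == alpha(2) - alpha(1) ** 2
print(xi, mu(2))  # 13/6 113/36
```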
18.3-8. Characteristic Functions and Generating Functions (see also Sec. 18.3-6; refer to Tables 18.8-1 to 18.8-8 for examples).*
(a) The probability distribution of any one-dimensional random variable x uniquely defines its (generally complex-valued) characteristic function
where q is a real variable ranging between – ∞ and ∞.
(b) The probability distribution of a random variable x uniquely defines its moment-generating function
and its generating function (see also Sec. 8.6-5)
* See footnote to Sec. 18.3-4.
for each value of the complex variable s such that the function in question exists in the sense of absolute convergence.
(c) The characteristic function χx(q) defines the probability distribution of x uniquely.* The same is true for each of the functions Mx(s) and γx(s) if it exists, in the sense of absolute convergence, throughout an interval of the real axis including s = 0 in the case of Mx(s), and s = 1 in the case of γx(s). Specifically, if x is a discrete or continuous random variable,
Eq. (24) also yields p(x) or φ(x) in terms of Mx(s), since
(d) In many problems it is much easier to obtain a description of a probability distribution in terms of χx(q), Mx(s), or γx(s) than to compute Φ(x), p(x), or φ(x) directly (Secs. 18.5-3b, 18.5-7, and 18.5-8). Again, the methods of Sec. 18.3-10 permit one to compute mean values, variances, and moments by simple differentiations if χx(q), Mx(s), or γx(s) are known. The linear integral transformations (21) to (24) can often be made with the aid of tables of Fourier or Laplace transform pairs (Appendix D).
(e) The generating function γx(s) is particularly useful in problems involving discrete distributions with spectral values 0, 1, 2, . . . , for then
whenever the series converges (see also Sec. 18.8-1; see Ref. 18.4 for a number of interesting applications).
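For a distribution on 0, 1, 2, . . . , the generating function is the polynomial (or power series) γx(s) = Σk p(k)sk, and γx′(1) = E{x}. A minimal sketch on an assumed two-coin-toss distribution:

```python
from fractions import Fraction

# Assumed distribution of the number of heads in two fair coin tosses:
# p(0), p(1), p(2).
p = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

def gamma(s):
    """Generating function gamma_x(s) = sum_k p(k) s^k."""
    return sum(pk * s ** k for k, pk in enumerate(p))

def gamma_prime(s):
    """Derivative gamma_x'(s); gamma_x'(1) = E{x}."""
    return sum(k * pk * s ** (k - 1) for k, pk in enumerate(p) if k >= 1)

assert gamma(1) == 1        # normalization
mean = gamma_prime(1)
print(mean)  # 1
```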
18.3-9. Semi-invariants (see also Sec. 18.3-10). Given a one-dimensional probability distribution such that the rth moment αr exists, the first r semi-invariants (cumulants) k1, k2, . . . , kr of the distribution exist and are defined by
* Φ(x) is, then, uniquely defined except possibly on a set of measure zero; Φ(x) is unique wherever it is continuous (see also Sec. 18.2-9).
Under the conditions of Sec. 18.3-7d all semi-invariants k1, k2, . . . exist and define the distribution uniquely.
18.3-10. Computation of Moments and Semi-invariants from χx(q), Mx(s), and γx(s). Relations between Moments and Semi-invariants. Many properties of a distribution can be computed directly from χx(q), Mx(s), or γx(s) without previous computation of Φ(x), φ(x), or p(x). If the quantities in question exist,
Note
provided that the function on the left (respectively the moment generating function, the semi-invariant-generating function, and the factorial-moment-generating function of x) is analytic throughout a neighborhood of s = 0.
Equations (28) yield E{x} and Var {x} with the aid of the relations
Table 18.3-1 lists other parameters which can be expressed in terms of moments.
The following additional formulas relate moments and semi-invariants:
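As a numerical check of the lowest-order relations, k1 = α1, k2 = α2 − α12, and k3 = α3 − 3α1α2 + 2α13 (standard formulas, stated here as an assumed restatement of the relations above), computed on an assumed two-point distribution:

```python
from fractions import Fraction

# Assumed symmetric two-point distribution: P[x=0] = P[x=2] = 1/2.
p = {0: Fraction(1, 2), 2: Fraction(1, 2)}

def a(r):
    """Moment alpha_r about the origin."""
    return sum(x ** r * w for x, w in p.items())

k1 = a(1)
k2 = a(2) - a(1) ** 2
k3 = a(3) - 3 * a(1) * a(2) + 2 * a(1) ** 3
print(k1, k2, k3)  # 1 1 0  (symmetric about the mean, so k3 = 0)
```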
18.4. MULTIDIMENSIONAL PROBABILITY DISTRIBUTIONS
18.4-1. Joint Distributions (see also Sec. 18.2-9). The probability distribution of a multidimensional random variable x ≡ (x1, x2, . . .) is described as a joint distribution of real numerical random variables x1, x2, . . . . Each simple event (point of the multidimensional sample space) [x = X] ≡ [x1 = X1, x2 = X2, . . .] may be regarded as a result of a compound experiment in which each of the variables x1, x2, . . . is measured. Each joint distribution is completely defined by its (cumulative) joint distribution function.
18.4-2. Two-dimensional Probability Distributions. Marginal Distributions. The joint distribution of two random variables x1, x2 is defined by its (cumulative) distribution function
The distributions of x1 and x2 (marginal distributions derived from the joint distribution of x1 and x2) are described by the corresponding marginal distribution functions
18.4-3. Discrete and Continuous Two-dimensional Probability Distributions. (a) A two-dimensional random variable x ≡ (x1, x2) is a discrete random variable (has a discrete probability distribution) if and only if the joint probability
is different from zero only for a countable set (spectrum) of “points” (X1, X2), i.e., if and only if both x1 and x2 are discrete random variables (Sec. 18.3-1). The marginal probabilities respectively associated with the marginal distributions of x1 and x2 (Sec. 18.4-2) are
(b) A two-dimensional random variable x = (x1, x2) is a continuous random variable (has a continuous probability distribution) if and only if (1) Φ(X1, X2) is continuous for all X1, X2, and (2) the joint frequency function (probability density)
exists and is piecewise continuous everywhere.* φ(X1, X2) dx1 dx2 is called a probability element. The spectrum of a continuous two-dimensional probability distribution is the set of “points” (X1, X2) where the frequency function (5) is different from zero. The marginal frequency functions respectively associated with the (necessarily continuous) marginal distributions of x1 and x2 (Sec. 18.4-2) are
(c) Note
18.4-4. Expected Values, Moments, Covariance, and Correlation Coefficient. (a) The expected value (mean value, mathematical expectation) of a function y = y(x1, x2) of two random variables x1, x2 with respect to their joint distribution is
* See footnote to Sec. 18.3-2.
if this expression exists in the sense of absolute convergence (see also Sec. 18.3-3).
NOTE: If y is a function of x1 alone, the mean value (8) is identical with the mean value (marginal expected value) with respect to the marginal distribution of x1.
(b) The mean values E{x1} = ξ1, E{x2} = ξ2 define a “point” (ξ1, ξ2) called the center of gravity of the joint distribution. The quantities E{(x1 — X1)r1(x2 — X2)r2} are called moments of order r1 + r2 about the “point” (X1, X2). In particular, the quantities
are, respectively, the moments about the origin and the moments about the center of gravity (central moments) of order r1 + r2 (see also Sec. 18.3-7b).
(c) The second-order central moments are of special interest and warrant a special notation. Note the following definitions:
(see also Sec. 18.4-8). Note —1 ≤ ρ12 ≤ 1, and
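The covariance λ11 = E{(x1 − ξ1)(x2 − ξ2)} and correlation coefficient ρ12 = λ11/(σ1σ2) can be computed directly from a discrete joint distribution; the four-point distribution below is an assumed illustration.

```python
from fractions import Fraction
from math import sqrt

# Assumed discrete joint distribution p(X1, X2) on {0,1} x {0,1}.
joint = {(0, 0): Fraction(3, 8), (0, 1): Fraction(1, 8),
         (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8)}

def E(f):
    """Expected value E{f(x1, x2)} with respect to the joint distribution."""
    return sum(f(a, b) * w for (a, b), w in joint.items())

xi1, xi2 = E(lambda a, b: a), E(lambda a, b: b)
cov  = E(lambda a, b: (a - xi1) * (b - xi2))    # lambda11
var1 = E(lambda a, b: (a - xi1) ** 2)           # sigma1^2
var2 = E(lambda a, b: (b - xi2) ** 2)           # sigma2^2
rho  = float(cov) / sqrt(float(var1) * float(var2))
assert -1.0 <= rho <= 1.0
print(cov, rho)  # 1/8 0.5
```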
18.4-5. Conditional Probability Distributions Involving Two Random Variables. (a) The joint distribution of two random variables x1, x2 defines a conditional distribution of x1 relative to the hypothesis that x2 = X2 for each value X2 of x2 and a conditional distribution of x2 relative to each hypothesis x1 = X1. The conditional distributions of x1 and x2 derived from a discrete joint distribution (Sec. 18.4-3a) are discrete and may be described by the respective conditional probabilities (Sec. 18.2-2)
The conditional distributions of x1 and x2 derived from a continuous joint distribution (Sec. 18.4-3b) are continuous and may be described by the respective conditional frequency functions
(b) Note
(c) Given a discrete or continuous joint distribution of two random variables x1 and x2, the conditional expected value of a function y(x1, x2) relative to the hypothesis that x1 = X1 is
if this expression exists in the sense of absolute convergence. Note that E{y(x1, x2)|X1} is a function of X1.
EXAMPLE: The conditional variances of x1 and x2 are the respective functions
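For a discrete joint distribution, E{x2|X1} is the mean of x2 under the conditional probabilities p(X2|X1) = p(X1, X2)/p1(X1); a minimal sketch with an assumed three-point joint distribution:

```python
from fractions import Fraction

# Assumed discrete joint distribution p(X1, X2).
joint = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 4),
         (1, 1): Fraction(1, 4)}

def cond_mean_x2(X1):
    """Conditional expected value E{x2 | x1 = X1}."""
    pX1 = sum(w for (a, b), w in joint.items() if a == X1)   # marginal p1(X1)
    return sum(b * w for (a, b), w in joint.items() if a == X1) / pX1

print(cond_mean_x2(0), cond_mean_x2(1))  # 1/3 1
```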
18.4-6. Regression (see also Secs. 18.4-9 and 19.7-2). (a) Given the joint distribution of two random variables x1 and x2, a regression of x2 on x1 is any function g2(x1) used to approximate the statistical dependence of x2 on x1 by a deterministic relation x2 ≈ g2(x1). More specifically, x2 is written as a sum of two random variables,
where h2(x1, x2) is regarded as a correction term. In particular, the function
often simply called the regression of x2 on x1, minimizes the mean-square deviation
The corresponding curve x2 = E{x2|x1} is the (theoretical) mean-square regression curve of x2.
(b) It is often sufficient to approximate the regression (19) by the linear function
Equation (21) describes a straight line, the mean-square regression line of x2; β21 is the regression coefficient of x2 on x1. Equation (21) represents the linear function ax1 + b whose coefficients a, b minimize the mean-square deviation
The resulting minimum mean-square deviation is σ2²(1 — ρ12²); the correlation coefficient ρ12 is seen to measure the quality of the “best” linear approximation.
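The regression line x2 = ξ2 + β21(x1 − ξ1) with β21 = λ11/σ1², and the stated minimum mean-square deviation σ2²(1 − ρ12²), can be verified on an assumed discrete joint distribution:

```python
from fractions import Fraction

# Assumed discrete joint distribution p(X1, X2) on {0,1} x {0,1}.
joint = {(0, 0): Fraction(3, 8), (0, 1): Fraction(1, 8),
         (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8)}

def E(f):
    return sum(f(a, b) * w for (a, b), w in joint.items())

xi1, xi2 = E(lambda a, b: a), E(lambda a, b: b)
var1 = E(lambda a, b: (a - xi1) ** 2)
var2 = E(lambda a, b: (b - xi2) ** 2)
cov  = E(lambda a, b: (a - xi1) * (b - xi2))

beta21 = cov / var1   # regression coefficient of x2 on x1
mse = E(lambda a, b: (b - (xi2 + beta21 * (a - xi1))) ** 2)
# Minimum mean-square deviation equals sigma2^2 (1 - rho12^2):
assert mse == var2 - cov ** 2 / var1
print(beta21, mse)  # 1/2 3/16
```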
(c) The mean-square regression (19) may be approximated more closely by a polynomial of degree m (parabolic mean-square regression of order m) or by other approximating functions, with coefficients or parameters chosen so as to minimize (20).
(d) If x2 is regarded as the independent variable, one has similarly
Note that in general neither (19) and (22) nor (21) and (23) are inverse functions. All mean-square regression curves and mean-square regression lines pass through the center of gravity (ξ1, ξ2) of the joint distribution.
The above definitions apply, in particular, if either of the two random variables, say x1 = t, becomes a given independent variable, and x2(t) describes a random process (Sec. 18.9-1).
18.4-7. n-dimensional Probability Distributions. (a) The joint distribution of n random variables x1, x2, . . . , xn is uniquely described by its (cumulative) joint distribution function
(Sec. 18.2-9). The joint distribution of m < n of the variables x1, x2, . . . , xn is an m-dimensional marginal distribution derived from the original joint distribution. One obtains the corresponding marginal distribution function from the joint distribution function (24) by substituting Xj = ∞ for each of the n − m arguments Xj which do not occur in the marginal distribution, e.g.,
(b) An n-dimensional random variable x ≡ (x1, x2, . . . , xn) is a discrete random variable (has a discrete probability distribution) if and only if the joint probability
differs from zero only for a countable set (spectrum) of “points” (X1, X2, . . . , Xn), i.e., if and only if each of the n random variables x1, x2, . . . , xn is discrete (see also Secs. 18.3-1 and 18.4-3a).
Marginal probabilities and conditional probabilities are defined in the manner of Secs. 18.4-3a and 18.4-5a, e.g.,
(c) An n-dimensional random variable x ≡ (x1, x2, . . . , xn) is a continuous random variable (has a continuous probability distribution) if and only if (1) Φ(X1, X2, . . . , Xn) is continuous for all X1, X2, . . . , Xn and (2) the joint frequency function (probability density)
exists and is piecewise continuous everywhere.* φ(X1, X2, . . . , Xn) dx1 dx2 . . . dxn is called a probability element (see also Secs. 18.3-2 and 18.4-3b). The spectrum of a continuous probability distribution is the set of “points” (X1, X2, . . . , Xn) where the frequency function (26) is different from zero.
(d) Note
(e) The frequency functions associated with the (necessarily continuous) marginal and conditional distributions derived from a continuous n-dimensional probability distribution are defined in the manner of Secs. 18.4-3b and 18.4-5a, e.g.,
(f) The joint distribution of two or more multidimensional random variables x = (x1, x2, . . .), y = (y1, y2, . . .), . . . is the joint distribution of the random variables x1, x2, . . . ; y1, y2, . . . ; . . . .
NOTE: A joint distribution may be discrete with respect to one or more of the random variables involved, and continuous with respect to one or more of the others; and each random variable may be partly discrete and partly continuous.
18.4-8. Expected Values and Moments (see also Sec. 18.4-4). (a) The expected value (mean value, mathematical expectation) of a function y = y(x1, x2, . . . , xn) of n random variables x1, x2, . . . , xn with respect to their joint distribution is
* See footnote to Sec. 18.3-2.
if this expression exists in the sense of absolute convergence.
NOTE: If y is a function of only m < n of the n random variables x1, x2, . . . , xn, then the mean value (28) is identical with the mean value of y with respect to the joint distribution (marginal distribution, Sec. 18.4-7) of the m variables in question.
(b) The n mean values E{x1} = ξ1, E{x2} = ξ2, . . . , E{xn} = ξn define a “point” (ξ1, ξ2, . . . , ξn) called the center of gravity of the joint distribution. The quantities E{(x1 − X1)r1(x2 − X2)r2 . . . (xn − Xn)rn} are the moments of order r1 + r2 + . . . + rn about the “point” (X1, X2, . . . , Xn). In particular, the quantities
are, respectively, the moments about the origin and the moments about the center of gravity (central moments).
(c) The second-order central moments are again of special interest and warrant a special notation; the quantities
define the moment matrix [λik] ≡ Δ and its reciprocal (Sec. 13.2-3)*
det [λik] is the generalized variance of the joint distribution. The (total) correlation coefficients
* Note that some authors denote the cofactor matrix [λik]-1 det [λik] by [Δik]. The notation chosen here simplifies some expressions.
(see also Sec. 18.4-4c) define the correlation matrix [ρik] of the joint distribution. det [ρik] is sometimes called the scatter coefficient.
The matrices [λik] and [ρik] are real, symmetric, and nonnegative (Secs. 13.3-2 and 13.5-2). Their common rank (Sec. 13.2-7) r is the rank of the joint distribution. The ellipsoid of concentration corresponding to a given n-dimensional probability distribution is the n-dimensional “ellipsoid”
defined so that a uniform distribution of a unit probability “mass” inside the hypersurface has the moment matrix [λik]. The ellipsoid of concentration illustrates the “concentration” of the distribution in different “directions”; the “volume” of the ellipsoid is proportional to the square root of the generalized variance. For r < n, the probability distribution is singular: its spectrum (Sec. 18.4-7) is restricted to an r-dimensional linear manifold (straight line, plane, hyperplane) in the n-dimensional space of “points” (x1, x2, . . . , xn), and the same is true for its ellipsoid of concentration. Thus the spectrum of a two-dimensional probability distribution is restricted to a straight line if r = 1, and to a point if r = 0.
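The quantities of Sec. 18.4-8c are direct matrix computations. As a numerical sketch (the moment matrix below is hypothetical), one may compute the generalized variance, the correlation matrix, and the rank of the distribution with NumPy:

```python
import numpy as np

# Hypothetical moment (covariance) matrix [lambda_ik] of a 3-dimensional distribution.
L = np.array([[4.0, 2.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

gen_var = np.linalg.det(L)            # generalized variance det[lambda_ik]
sig = np.sqrt(np.diag(L))             # standard deviations sigma_i
rho = L / np.outer(sig, sig)          # correlation matrix [rho_ik]
rank = np.linalg.matrix_rank(L)       # rank r of the joint distribution

print(gen_var, rank)                  # 12.0 and 3 for this matrix
```

A rank r < n (singular moment matrix, zero generalized variance) signals the degenerate case discussed above, with the spectrum confined to an r-dimensional linear manifold.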
18.4-9. Regression. Multiple and Partial Correlation Coefficients (see also Secs. 18.4-6 and 19.7-2). (a) Given the joint distribution of n random variables x1, x2, . . . , xn, one may study the dependence of one of the variables, say x1, on the remaining n − 1 variables by writing
where h1(x1, x2, . . . , xN) is regarded as a correction term. The function
(mean-square regression of x1 on x2, x3, . . . , xn) minimizes the mean-square deviation E{[x1 − g1(x2, x3, . . . , xn)]²}; E{x1|X2, X3, . . . , Xn} is the conditional mean of x1 relative to the hypothesis that x2 = X2, x3 = X3, . . . , xn = Xn (see also Sec. 18.4-5c).
(b) The mean-square regression of any variable xi on the remaining n − 1 variables is often approximated by the linear function
(see also Sec. 18.4-6).* The regression coefficients βik are uniquely determined if the distribution is nonsingular (Sec. 18.4-8). The multiple correlation coefficient
is a measure of the correlation between xi and the remaining n - 1 variables.
(c) The random variable hi(1) ≡ xi − gi(1) (difference between xi and its “linear estimate” gi(1)) is the residual of xi with respect to the remaining n − 1 variables. Note
(d) Regressions and residuals may be similarly defined in connection with a suitable marginal distribution (Sec. 18.4-7a) of m < n variables, say x1, x2, . . . , xm. The quantities analogous to β12, β13, . . . ; h1(1), h2(1), . . . are then respectively denoted by β12.34...m, β13.24...m, . . . ; h(1)1.23...m, h(1)2.13...m, . . .; in each case, there is a subscript corresponding to each variable of the marginal distribution.
(e) The partial correlation coefficient of x1 and x2 with respect to x3, x4, . . . , xn
measures the correlation of x1 and x2 after removal of the linearly approximated effects of x3, x4, . . . , xN. In particular, for n = 3,
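For n = 3 the standard form of this coefficient is ρ12.3 = (ρ12 − ρ13ρ23)/√[(1 − ρ13²)(1 − ρ23²)]. The following Python sketch (numbers hypothetical) implements it and illustrates the defining property: if x1 and x2 are correlated only through x3, the partial correlation vanishes.

```python
import numpy as np

def partial_corr_12_3(rho12, rho13, rho23):
    """rho_{12.3}: correlation of x1 and x2 after removal of the
    linearly approximated effect of x3 (the n = 3 case)."""
    return (rho12 - rho13 * rho23) / np.sqrt((1 - rho13**2) * (1 - rho23**2))

# If rho12 = rho13 * rho23 (dependence only through x3), the partial
# correlation is zero:
print(partial_corr_12_3(0.6 * 0.5, 0.6, 0.5))   # 0.0
```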
18.4-10. Characteristic Functions (see also Sec. 18.3-8). The probability distribution of an n-dimensional random variable x ≡ (x1, x2, . . . , xn) uniquely defines the corresponding characteristic function (joint characteristic function of x1, x2, . . . , xn)
* See footnote to Sec. 18.4-8b.
and conversely. For continuous distributions,
The joint characteristic function corresponding to the marginal distribution of m < n of the n variables x1, x2, . . . , xn is obtained by substitution of qk = 0 in Eq. (39) whenever xk does not occur in the marginal distribution; thus χ12(q1, q2) ≡ χx(q1, q2, 0, . . . , 0).
Moments and semi-invariants of suitable multidimensional probability distributions can be obtained as coefficients in multiple series expansions of χx and loge χx in a manner analogous to that of Sec. 18.3-10.
18.4-11. Statistically Independent Random Variables (see also Secs. 18.2-3 and 18.5-7).* (a) A set of random variables x1, x2, . . . , xn are statistically independent if and only if the events [x1 ∊ S1], [x2 ∊ S2], . . . , [xn ∊ Sn] are statistically independent for every collection of real-number sets S1, S2, . . . , Sn. This is true if and only if
or, in the respective cases of discrete and continuous random variables, if and only if
The joint distribution of statistically independent random variables is completely defined by their individual marginal distributions. Statistically independent random variables x1, x2, . . . are uncorrelated, i.e., ρik = 0 for all i ≠ k (Sec. 18.4-8c), but the converse is not necessarily true (see also Sec. 18.8-8).
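The one-way nature of the last statement is easily exhibited. In the Python sketch below (a standard counterexample, not from the tables here), x takes the values −1, 0, 1 with equal probability and y = x²; the covariance vanishes, yet x and y are clearly dependent:

```python
# x takes the values -1, 0, 1 with probability 1/3 each; y = x**2.
vals = [-1, 0, 1]
p = 1.0 / 3.0
Ex  = sum(v * p for v in vals)        # E{x}  = 0
Ey  = sum(v**2 * p for v in vals)     # E{y}  = 2/3
Exy = sum(v**3 * p for v in vals)     # E{xy} = E{x**3} = 0
cov = Exy - Ex * Ey                   # covariance = 0: uncorrelated

# ...but not independent:
p_joint = 1.0 / 3.0                   # P[x = 1, y = 1]
p_prod  = (1.0 / 3.0) * (2.0 / 3.0)   # P[x = 1] * P[y = 1]
print(cov, p_joint, p_prod)           # 0.0, yet p_joint != p_prod
```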
(b) Statistical independence of multidimensional random variables x1, x2, . . . is defined by Eqs. (41) or (42) on substitution of x1, x2, . . . for x1, x2, . . . .
* See footnote to Sec. 18.3-4.
EXAMPLE: The multidimensional random variables (x1, x2) and (x3, x4, x5) are statistically independent if and only if
Note that Eq. (43) implies the statistical independence of x2 and x5, x1 and (x3, x4), (x1, x2) and (x3, x5), etc.
(c) Given a joint distribution of n discrete or continuous random variables x1, x2, . . . , xn such that (x1, x2, . . . , xm) is statistically independent of (xm+1, xm+2, . . . , xn), note
(d) Two random variables x1 and x2 are statistically independent if and only if their joint characteristic function is the product of their individual (marginal) characteristic functions (Sec. 18.4-10), i.e.,
An analogous theorem applies for multidimensional random variables (see also Sec. 18.5-7).
(e) If the random variables x1, x2, . . . are statistically independent, the same is true for the random variables y1(x1), y2(x2), . . . . An analogous theorem holds for multidimensional random variables.
18.4-12. Entropy of a Probability Distribution, and Related Topics. (a) The entropy associated with the probability distribution of a one-dimensional random variable x is defined as
H{x} (entropy of x) is a measure of the expected uncertainty involved in a measurement of x. In the case of discrete probability distributions, H{x} ≥ 0, with H{x} = 0 if and only if x has a causal distribution (Table 18.8-1). The continuous distribution having the largest entropy for a given variance σ² is the normal distribution (Sec. 18.8-3), with H{x} = log2 (σ√(2πe)).
(b) In connection with the discrete or continuous joint distribution of two random variables x1, x2, one defines the joint entropy
and the conditional entropies
and Hx1{x2} (these are not conditional expected values, Sec. 18.4-5c), so that
The equality on the right applies if and only if x1 and x2 are statistically independent (Sec. 18.4-11). The nonnegative quantity
is a measure of the “statistical dependence” of x1 and x2. The functionals (46), (47), (48), and (50) have intuitive significance in statistical mechanics and in the theory of communications.
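For a discrete joint distribution these functionals reduce to finite sums. The Python sketch below (joint probabilities hypothetical) computes H{x1}, H{x2}, and the joint entropy in bits, and checks the inequalities H{x1, x2} ≤ H{x1} + H{x2} and I ≥ 0 for the quantity (50):

```python
import numpy as np

# Hypothetical joint pmf of (x1, x2); rows index x1 values, columns x2 values.
p = np.array([[0.25, 0.25],
              [0.40, 0.10]])

def H(q):
    """Entropy in bits of a pmf given as an array of probabilities."""
    q = q[q > 0]                      # 0 * log 0 is taken as 0
    return -(q * np.log2(q)).sum()

p1, p2 = p.sum(axis=1), p.sum(axis=0)   # marginal pmfs of x1 and x2
H1, H2, H12 = H(p1), H(p2), H(p)
I = H1 + H2 - H12                       # the nonnegative quantity (50)
print(H1, H2, H12, I)
```

I vanishes exactly when the joint pmf factors, i.e., when x1 and x2 are statistically independent (Sec. 18.4-11).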
18.5. FUNCTIONS OF RANDOM VARIABLES. CHANGE OF VARIABLES
18.5-1. Introduction. The following relations permit one to calculate probability distributions of suitable functions of random variables and, in particular, to change the random variables employed to describe a given set of events.
18.5-2. Functions (or Transformations) of a One-dimensional Random Variable. (a) Given a transformation y = y(x) associating a unique value of a random variable y with each value of the random variable x, the probability distribution of y is uniquely determined by that of x [see also Sec. 18.2-8; y(x) must be a measurable function].
(b) Let the random variables x and y be related by a reciprocal one-to-one transformation y = y(x), with x = x(y). Then
1. If y(x) is an increasing function, Φy(Y) = Φx[x(Y)].
Note that either y(x) or -y(x) is necessarily an increasing function. In either case, the medians x½ and y½ are related by y½ = y(x½).
2. If x and y are continuous random variables,
for all values Y of y such that dx/dy exists and is continuous.
NOTE: If x(y) is multiple-valued, one writes φy(Y) = φ1(Y) + φ2(Y) + . . . , where φ1(Y), φ2(Y), . . . are the frequency functions obtained from Eq. (2) for the respective single-valued “branches” x1(y), x2(y), . . . of x(y). EXAMPLE: If
(c) For single-valued, measurable y(x), f(y),
whenever this expected value exists; note that neither reciprocal one-to-one correspondence nor differentiability has been assumed for y(x). In particular, substitution of f(y) = esy in Eq. (4) yields the moment-generating function My(s) = E{esy}, and substitution of f(y) = eiqy produces the characteristic function χy(q) ≡ E{eiqy} (Sec. 18.3-8). If the integrals can be calculated, one may then use Eq. (18.3-25) to find φy(y) or py(y).
EXAMPLE: Let y = a sin x, where a is a constant, and x is uniformly distributed between 0 and 2π. Using the symmetry properties of sin x, one finds φy(Y) = 1/[π√(a² − Y²)] for |Y| < a, and φy(Y) = 0 for |Y| > a (see also Sec. 18.11-1b).
(d) By an extension of the convolution theorem of Sec. 8.3-3 to bilateral Laplace transforms (Sec. 8.6-2), Eq. (4) can be rewritten as
where the integration contour parallels the imaginary axis in a suitable absolute-convergence strip; the quantity in square brackets is seen to be the bilateral Laplace transform of f[y(x)] (see also Sec. 8.6-2 and Table 8.6-1). The complex contour integral (5) may be easier to compute than the integral (4).
(e) Note that, in general, E{y(x)} ≠ y(E{x}) (see also Sec. 18.5-3).
18.5-3. Linear Functions (or Linear Transformations) of a One-dimensional Random Variable. (a) If x is a continuous random variable, and y = ax + b, then
(b) If the mean values in question exist,
The semi-invariants (Sec. 18.3-9) αi † of y = ax + b are related to the semi-invariants αi of x by .
(c) Of particular interest is the linear transformation to standard units
x´ is called a standardized random variable (see also Sec. 18.8-3).
(d) If y = y(x) is approximately linear throughout most of the spectrum of x, it is sometimes permissible to use the approximations
where y´(x) = dy/dx.
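The quality of this linearization can be judged against a case with known exact moments. In the Python sketch below (parameter values hypothetical), y = x² with x normal (ξ, σ²); the exact moments E{y} = ξ² + σ² and Var{y} = 4ξ²σ² + 2σ⁴ are standard results for a normal variable, and for small σ the linearized values E{y} ≈ y(ξ), Var{y} ≈ [y′(ξ)]²σ² are close:

```python
# Linearization check for y = x**2 with x normal(xi = 1, sigma = 0.05).
xi, sigma = 1.0, 0.05

# Exact moments of y = x**2 for a normal x (standard results):
E_exact = xi**2 + sigma**2
Var_exact = 4 * xi**2 * sigma**2 + 2 * sigma**4

# Linearized approximations: E{y} ~ y(xi), Var{y} ~ (y'(xi))**2 * sigma**2
E_approx = xi**2                     # y(xi)
Var_approx = (2 * xi)**2 * sigma**2  # y'(x) = 2x evaluated at xi

print(E_exact, E_approx, Var_exact, Var_approx)
```

The discrepancy in the mean is exactly σ² and that in the variance is 2σ⁴, both negligible when the spectrum of x is narrow, as the approximation requires.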
18.5-4. Functions and Transformations of Multidimensional Random Variables. (a) If the random variables
are single-valued measurable functions of the n random variables x1, x2, . . . , xn for all x1, x2, . . . , xn, then the probability distribution of each random variable yi is uniquely determined by the joint distribution of x1, x2, . . . , xn, and the same is true for each joint or conditional distribution involving a finite set of random variables yi.
Thus the distribution function of yi and the joint distribution function of yi and yk are, respectively,
(b) If x ≡ (x1, x2, . . . , xn) and y ≡ (y1, y2, . . . , yn) are continuous random variables related by a reciprocal one-to-one (nonsingular) transformation (11), their respective frequency functions φx(X1, X2, . . . , Xn) and φy(Y1, Y2, . . . , Yn) are related by
for all Y1, Y2, . . . , YN such that the Jacobian exists and is continuous.
If x(y) is multiple-valued, φy(Y1, Y2, . . . , YN) may be computed in a manner analogous to that outlined in Sec. 18.5-2b.
(c) For single-valued, measurable yi = yi(x1, x2, . . . , xN) (i = 1, 2, . . . , m) and f(y1, y2 . . . , ym),
whenever this expected value exists. As in Sec. 18.5-2c, neither reciprocal one-to-one correspondence nor differentiability has been assumed.
Table 18.5-1. Distribution of the Sum x = x1 + x2 + . . . + xn of n Independent Random Variables (see also Secs. 18.5-7, 18.6-5, and 19.3-3)
Substitution of f = exp (s1y1 + s2y2 + . . .+ smym) yields the joint moment-generating function of y1, y2 . . . , ym, and substitution of f = exp (iq1y1 + iq2y2 + . . . + iqmym) yields the joint characteristic function. Transform methods analogous to Eq. (5) may be useful. Such methods have been successfully applied to special random-process problems (Sec. 18.12-5).
(d) For any two random variables x1, x2,
if this quantity exists. If x1, x2, . . . , xN are statistically independent, then
if this quantity exists.
(e) If y = x1x2, and φx1(x1) = 0 for x1 < 0, then
(Sec. 18.5-4b), and
Other suitable functions y = y(x1, x2) can be treated in a similar manner.
18.5-5. Linear Transformations (see also Secs. 14.5-1 and 14.6-1). For every nonsingular linear transformation
the respective joint distributions of x1, x2, . . . , xN and y1, y2, . . . , yN are of equal rank (Sec. 18.4-8c), and
if the quantities in question exist. Λ′ ≡ [λ′ik] is the moment matrix (Sec. 18.4-8c) of (y1, y2, . . . , yn). The methods of Sec. 13.5-5 make it possible to find
1. An orthogonal transformation (18) such that the new moment matrix [λ′ik] (and hence also the correlation matrix) is diagonal (transformation to uncorrelated variables yi).
2. A transformation (18) such that η1 = η2 = . . . = ηn = 0 and λ′ik = δik (transformation to uncorrelated standardized variables yi; see also Secs. 18.8-6b and 18.8-8). The matrix [E{xi*xk}] must be nonsingular.
18.5-6. Mean and Variance of a Sum of Random Variables. (a) For any two (not necessarily statistically independent) random variables x1, x2,
if the quantities in question exist.
(b) More generally,
(c) If y = y(x1, x2, . . . , xN) is approximately linear throughout most of the joint spectrum of (x1, x2, . . . , xN), it may be permissible to use the approximation
and to compute approximate values of E{y} and Var {y} by means of Eqs. (19) and (20) (see also Sec. 18.5-7).
18.5-7. Sums of Statistically Independent Random Variables (refer to Sec. 18.8-9 for examples). (a) If x1 and x2 are statistically independent random variables, then
where the subscripts 1 and 2 refer to the respective distributions of x1 and x2 as in Secs. 18.4-2, 18.4-3, and 18.4-7 (see also Table 18.5-1).
(b) More generally, if x = x1 + x2 + . . . + xn is the sum of n < ∞ statistically independent random variables x1, x2, . . . , xn,
and, if the quantities in question exist,
where Kr(i) is the rth-order semi-invariant of xi. Equations (24) and (26) permit the computation of higher-order moments with the aid of the relations given in Sec. 18.3-10.
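For discrete independent summands the distribution of the sum is obtained by convolving the individual probability tables, and the additivity of means and variances is immediate. A Python sketch (the two pmfs are hypothetical):

```python
import numpy as np

# pmfs of two independent nonnegative-integer random variables.
p1 = np.array([0.2, 0.5, 0.3])    # P[x1 = 0, 1, 2]
p2 = np.array([0.6, 0.4])         # P[x2 = 0, 1]
px = np.convolve(p1, p2)          # pmf of x = x1 + x2

def mean_var(p):
    """Mean and variance of a pmf on 0, 1, 2, ..."""
    k = np.arange(len(p))
    m = (k * p).sum()
    return m, ((k - m) ** 2 * p).sum()

m1, v1 = mean_var(p1)
m2, v2 = mean_var(p2)
m, v = mean_var(px)
print(m, m1 + m2, v, v1 + v2)     # means add; variances add
```

Repeated convolution extends this to any finite number of independent summands, in agreement with Table 18.5-1.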
(c) The distribution of the sum z = (z1, z2, . . .) = x + y of two suitable statistically independent multidimensional random variables x = (x1, x2, . . .) and y ≡ (y1,y2, . . .) is described by
18.5-8. Compound Distributions. Let x1, x2, . . . be independent random variables each having the same probability distribution, and let k be a discrete random variable with spectral values 0, 1, 2, . . . ; let k be statistically independent of x1, x2, . . . . If the generating functions γx1(s) and γk(s) exist, the distribution of the sum x = x1 + x2 + . . . + xk is given by its generating function
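The standard compound-distribution result is γx(s) = γk[γx1(s)]. As a Python sketch (parameter values hypothetical), take k Poisson with parameter λ, so that γk(s) = exp[λ(s − 1)], and Bernoulli summands with γx1(s) = 1 − ϑ + ϑs; composition gives exp[λϑ(s − 1)], i.e., x is again Poisson with parameter λϑ (a “thinned” Poisson variable):

```python
import math

lam, theta = 3.0, 0.4   # hypothetical Poisson and Bernoulli parameters

def gamma_k(s):         # generating function of a Poisson(lam) count
    return math.exp(lam * (s - 1.0))

def gamma_x1(s):        # generating function of a Bernoulli(theta) summand
    return 1.0 - theta + theta * s

def gamma_x(s):         # compound distribution: gamma_k composed with gamma_x1
    return gamma_k(gamma_x1(s))

s = 0.7
print(gamma_x(s), math.exp(lam * theta * (s - 1.0)))   # identical values
```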
18.6. CONVERGENCE IN PROBABILITY AND LIMIT THEOREMS
18.6-1. Sequences of Probability Distributions. Convergence in Probability (see also Sec. 18.6-2). A sequence of random variables y1, y2, . . . converges in probability to the random variable y (yn converges in probability to y as n → ∞) if and only if the probability that yn differs from y by any finite amount converges to zero as n → ∞, or
An m-dimensional random variable yN converges in probability to the m-dimensional random variable y as n → ∞ if and only if each component variable of yN converges in probability to the corresponding component variable of y.
If the m random variables yn1, yn2, . . . , ynm converge in probability to the respective constants α1, α2, . . . , αm as n → ∞, then any function g(yn1, yn2, . . . , ynm) expressible as a positive power of a rational function of yn1, yn2, . . ., ynm converges in probability to g(α1, α2. . . , αm), provided that this quantity is finite.
18.6-2. Limits of Distribution Functions, Characteristic Functions, and Generating Functions. Continuity Theorems. (a) yn converges in probability to y as n → ∞ if and only if the sequence of distribution functions Φyn(Y) converges to the limit Φy(Y) for all Y such that Φy(Y) is continuous.
(b) yn converges in probability to y as n → ∞ if and only if the sequence of characteristic functions χyn(q) converges to a limit continuous for q = 0; in this case (Continuity Theorem for Characteristic Functions).
(c) A sequence of discrete random variables y1, y2, . . . converges in probability to the discrete random variable y as n → ∞ if and only if
If the random variables y1, y2, . . . all have nonnegative integral spectral values 0, 1, 2, . . . and possess generating functions γy1(s), γy2(s), . . . , then Eq. (2) holds if and only if lim γyn(s) = γy(s) for all real s such that 0 ≤ s ≤ 1 (Continuity Theorem for Generating Functions). Note that a sequence of discrete random variables may converge in probability to a random variable which is not discrete (see, for example, Table 18.8-3).
(d) Analogous definitions apply if y(n) converges in probability as a function of a continuous parameter n.
(e) Analogous theorems apply to multidimensional probability distributions.
18.6-3. Convergence in Mean (see also Sec. 12.5-3). Given a random variable y having a finite mean and variance and a sequence of random variables y1, y2, . . . all having finite mean values and variances, yN converges in mean (in mean square) to y as n → ∞ if and only if
Convergence in mean implies convergence in probability, but the converse is not true; convergence in probability as n → ∞ does not even imply that E{y} or Var {y} exists.
18.6-4. Asymptotically Normal Probability Distributions (refer to Table 18.8-3 and Sec. 19.5-3 for examples). The (probability distribution of a) random variable yn with the distribution function Φ(Y, n) is asymptotically normal with mean ηn and variance σn² if and only if there exists a sequence of pairs of real numbers ηn, σn² such that the random variable (yn − ηn)/σn converges in probability to a standardized normal variable (Sec. 18.8-3). This is true if and only if, for all a, b > a,
Equation (4) permits one to approximate the probability distribution of yn by a normal distribution with mean ηn and variance σn² for sufficiently large n. Note that Eq. (4) does not imply that ηn and σn² are the mean and variance of yn, that the sequence y1, y2, . . . converges in probability, or that E{yn} and ηn or Var {yn} and σn² converge to the same limits; indeed, these limits may not exist.
18.6-5. Limit Theorems. (a) For every class of events E permitting the definition of probabilities P[E] (Sec. 18.2-2)
The relative frequency h[E] = nE/n (Sec. 19.2-1) of realizing the event E in n independent repeated trials (Sec. 18.2-4) is a random variable which converges to P[E] in mean, and thus also in probability, as n → ∞ (Bernoulli's Theorem).
h[E] is asymptotically normal with mean P[E] and variance P[E]{1 − P[E]}/n (see also Table 18.8-3).
Note that (see also Table 18.8-3) *
(b) Let x1, x2, . . . be a sequence of statistically independent random variables all having the same probability distribution with (finite) mean value ξ. Then, as n → ∞,
The random variable x̄ ≡ (x1 + x2 + . . . + xn)/n converges in probability to ξ (Khinchine's Theorem, Law of Large Numbers).
x̄ is asymptotically normal with mean ξ and variance σ²/n, provided that the common variance σ² of x1, x2, . . . exists (Lindeberg-Lévy Theorem, Central Limit Theorem; see also Secs. 19.2-3 and 19.5-2).
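The Lindeberg-Lévy theorem is readily checked by simulation. In the Python sketch below (sample sizes and seed arbitrary), x1, x2, . . . are independent uniform variables on (0, 1), so ξ = 1/2 and σ² = 1/12; the standardized sample mean is compared with the standardized normal distribution function at one point:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
n, trials = 50, 100_000
x = rng.uniform(0.0, 1.0, (trials, n))     # uniform: xi = 1/2, sigma**2 = 1/12
xbar = x.mean(axis=1)

z = (xbar - 0.5) / sqrt((1.0 / 12.0) / n)  # standardized sample mean
emp = (z <= 1.0).mean()                    # empirical P[z <= 1]
Phi = 0.5 * (1.0 + erf(1.0 / sqrt(2.0)))   # standard normal cdf at 1 (~0.8413)
print(emp, Phi)
```

Already at n = 50 the empirical probability matches the normal value to roughly the Monte Carlo sampling error.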
(c) Let x1, x2, . . . be any sequence of statistically independent random variables having (finite) mean values ξ1, ξ2, . . . and variances σ1², σ2², . . . . Then, as n → ∞,
1. σn² → 0 implies (Chebyshev's Theorem).
* See footnote to Sec. 18.3-4.
(Central Limit Theorem, Lindeberg conditions).
The Lindeberg conditions are satisfied, in particular, if there exist two positive real numbers a and b such that E{|xi|2+a} exists and is less than bσi² for i = 1, 2, . . . (Lyapunov's Condition). See also Table 18.5-1.
NOTE: The limit theorems are of special importance in statistics (Secs. 19.2-1 and 19.2-3).
18.7. SPECIAL TECHNIQUES FOR SOLVING PROBABILITY PROBLEMS
18.7-1. Introduction. Most probability problems require one to compute the distribution of a random variable x (or the distributions of several random variables) from given conditions specifying the distributions of other random variables x1, x2, . . . . As a rule, the simple events labeled by values of x are compound events corresponding to various logical combinations of values of x1, x2, . . . . The first step in the solution of any such problem must be the unequivocal definition of the fundamental probability set labeled by each random variable. The probabilities of compound events may then be computed by the methods of Secs. 18.2-2 to 18.2-6 and 18.5-1 to 18.7-3. Equation (18.3-3), (18.3-6), (18.4-7), or (18.4-27) may be used to check computations.
18.7-2. Problems Involving Discrete Probability Distributions: Counting of Simple Events and Combinatorial Analysis. Each fundamental probability set labeled by the spectral values of a discrete random variable (Sec. 18.3-1) is a countable set of simple events. The following relations (either alone or in combination with the relations of Secs. 18.2-2 to 18.2-6) aid in computing probabilities of compound events:
(a) If, as in many games of chance, equal probabilities are assigned to each of the N simple events of a given finite fundamental probability set, then the probability of realizing a compound event (“success”) defined as the union (Sec. 18.2-1) of N1 specified simple events (“favorable” simple events) can be computed as
(b) Given a countable (finite or infinite) fundamental probability set, let an event E be defined as the union of N1 simple events each having the probability p1, N2 simple events each having the probability p2, . . . ; then
N1 + N2 + . . . need not be finite.
(c) Given N1 simple events E', N2 simple events E", . . . , and Nn simple events E(n) respectively associated with n independent component experiments (Sec. 18.2-4), there exist exactly N1N2 . . . Nn simple events [E' ∩ E" ∩ . . . ∩ E(n)] ≡ [E', E", . . . , E(n)].
(d) In many problems, the simple events under consideration are various possible arrangements of a given set or sets of elements, so that the numbers N1, N2, . . . in (a), (b), and (c) above are numbers of permutations, combinations, etc. The most important relevant definitions and formulas are given in Appendix C.
18.7-3. Problems Involving Discrete Probability Distributions: Successes and Failures in Component Experiments. Compound events are often described in terms of the results obtained in component experiments each admitting only two possible outcomes (“success” and “failure”). The probabilities of various compound events can be computed by the methods of Secs. 18.2-2 to 18.2-6 from the respective probabilities ϑ1, ϑ2, . . . of success in the first, second, . . . component experiment.
The methods of Secs. 18.5-6 to 18.5-8 may become applicable if one labels the events “success” and “failure” in the kth-component experiment with the respective spectral values 1 and 0 of a discrete random variable xk whose distribution is described by
Successes in two or more independent experiments are, by definition, statistically independent events (Sec. 18.2-4). Repeated independent trials (Sec. 18.2-4) each having only two possible outcomes are called Bernoulli trials (ϑ1 = ϑ2 = . . . = ϑ). The probability of realizing exactly x = x1 + x2 + . . . + xn successes in n Bernoulli trials is given by the binomial distribution (Table 18.8-3). If the trials are independent, but the ϑk are not all equal, one obtains the generalized binomial distribution of Poisson.
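The binomial probabilities of Table 18.8-3 follow directly from this success/failure labeling. A short Python sketch (n and ϑ hypothetical) evaluates the pmf and confirms that the probabilities sum to one and that the mean equals nϑ:

```python
from math import comb

def binom_pmf(x, n, theta):
    """P[exactly x successes in n Bernoulli trials with success prob. theta]."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

n, theta = 10, 0.3
probs = [binom_pmf(x, n, theta) for x in range(n + 1)]
mean = sum(x * p for x, p in enumerate(probs))
print(sum(probs), mean)     # 1.0 and n*theta = 3.0
```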
A subsequence of r successes or failures in any sequence of n trials is called a run of length r of successes or failures (see also Ref. 18.4, Chap. 13).
18.8. SPECIAL PROBABILITY DISTRIBUTIONS
18.8-1. Discrete One-dimensional Probability Distributions.*
Tables 18.8-1 to 18.8-7 describe a number of discrete one-dimensional distributions of interest, for instance, in connection with sampling problems and games of chance. The generating function rather than the characteristic function or the moment-generating function is tabulated; the latter two functions are easily obtained from Mx(s) = γx(es) and χx(q) = γx(eiq)
(see also Sec. 18.3-8). Moments not tabulated are also easily derived by the methods of Sec. 18.3-10.
Table 18.8-1. The Causal Distribution (see also Table 18.8-8)
Table 18.8-2. The Hypergeometric Distribution
Table 18.8-3. The Binomial Distribution (Fig. 18.8-1; see also Sec. 18.7-3)
18.8-2. Discrete Multidimensional Probability Distributions (see also Sec. 18.4-2). (a) A multinomial distribution is described by
where ϑ1, ϑ2, . . . , ϑn are positive real numbers such that ϑ1 + ϑ2 + . . . + ϑn = 1.
FIG. 18.8-2. The Poisson distribution. (From Goode, H. H., and R. E. Machol, System Engineering, McGraw-Hill, New York, 1957.)
Given an experiment having n mutually exclusive results E1, E2, . . . , En with respective probabilities ϑ1, ϑ2, . . . , ϑn such that ϑ1 + ϑ2 + . . . + ϑn = 1, the expression (1) is the probability that the respective events E1, E2, . . . , En occur exactly x1, x2, . . . , xn times in N independent repeated trials (see also Sec. 18.7-3). In classical statistical mechanics, x1, x2, . . . , xn are the occupation numbers of n independent states with respective a priori probabilities ϑ1, ϑ2, . . . , ϑn.
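The multinomial probability (1) is N!/(x1! x2! . . . xn!) · ϑ1^x1 ϑ2^x2 . . . ϑn^xn. A Python sketch (the three-outcome experiment and its probabilities are hypothetical):

```python
from math import factorial

def multinomial_pmf(counts, thetas):
    """N!/(x1!...xn!) * theta1**x1 * ... * thetan**xn for counts (x1,...,xn)."""
    N = sum(counts)
    p = float(factorial(N))
    for x, th in zip(counts, thetas):
        p = p * th**x / factorial(x)
    return p

# Hypothetical experiment with three mutually exclusive results.
print(multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2]))   # 12 * 0.25*0.3*0.2 = 0.18
```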
Table 18.8-5. The Geometric Distribution
Table 18.8-6. Pascal's Distribution
Table 18.8-7. Polya's Distribution (Negative Binomial Distribution)
(b) A multiple Poisson Distribution is described by
18.8-3. Continuous Probability Distributions: The Normal (Gaussian) Distribution. A continuous random variable x is normally distributed (normal) with mean ξ and variance σ² [or normal with parameters ξ, σ²; normal with parameters ξ, σ; normal (ξ, σ²); normal (ξ, σ)] if
The distribution of the standardized normal variable (normal deviate) (see also Sec. 18.5-3c) is given by
(see also Fig. 18.8-3 and Sec. 18.8-4). erf z is the frequently tabulated error function (normal error integral, probability integral; see also Sec. 21.3-2)
φ(X) has points of inflection for X = ξ ± σ. Note
where Hk(z) is the kth Hermite polynomial (Sec. 21.7-1).
Every normal distribution is symmetric about its mean value ξ; ξ is the median and the (single) mode. The coefficients of skewness and excess are zero, and
The moments αr about the origin may be computed by the methods of Sec. 18.3-10.
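The central moments themselves have the well-known closed form (a standard result, stated here for illustration): odd central moments vanish, and μ2k = σ^2k (2k)!/(2^k k!), so that μ2 = σ², μ4 = 3σ⁴, μ6 = 15σ⁶. A Python sketch:

```python
from math import factorial

def central_moment(r, sigma):
    """Central moments of a normal distribution (standard result):
    mu_r = 0 for odd r; mu_{2k} = sigma**(2k) * (2k)! / (2**k * k!)."""
    if r % 2:
        return 0.0
    k = r // 2
    return sigma**r * factorial(2 * k) / (2**k * factorial(k))

sigma = 1.5
print(central_moment(2, sigma), central_moment(4, sigma))  # sigma**2, 3*sigma**4
```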
The normal distribution is of particular importance in many applications, especially in statistics (Secs. 19.3-1 and 19.5-3).
18.8-4. Normal Random Variables: Distribution of Deviations from the Mean. (a) For any normal random variable x with mean ξ and variance σ2,
FIG. 18.8-3. (a) The normal frequency function
and (b) the normal distribution function
(From Burington, R. S., and D. C. May, Handbook of Probability and Statistics, McGraw-Hill, New York, 1953.)
Table 18.8-8. Continuous Probability Distributions
are often referred to as tolerance limits of the normal deviate u or as α values of the normal deviate (see also Sec. 19.6-4). Note
(c) Note the following measures of dispersion for normal distributions (see also Table 18.3-1):
The mean deviation (m.a.e.)
The probable deviation (p.e., median of |x – ξ|)
One-half the half width
The precision measure (see also Secs. 18.8-3, 19.3-4, 19.3-5, and 19.5-3)
18.8-5. Miscellaneous Continuous One-dimensional Probability Distributions. Table 18.8-8 describes a number of continuous one-dimensional probability distributions (see also Secs. 19.3-4, 19.3-5, and 19.5-3).
18.8-6. Two-dimensional Normal Distributions. (a) A two-dimensional normal distribution is a continuous probability distribution described by a frequency function of the form
The marginal distributions of x1 and x2 are both normal with respective mean values ξ1, ξ2 and variances σ12, σ22; ρ12 is the correlation coefficient of x1 and x2. The five parameters ξ1, ξ2, σ1, σ2, ρ12 define the distribution completely.
The conditional distributions of x1 and x2 are both normal, with
so that the regression curves are identical with the mean-square regression lines (Sec. 18.4-6). x1 and x2 are statistically independent if and only if they are uncorrelated (ρ12 = 0, see also Sec. 18.4-11). Note
(b) Every two-dimensional normal distribution (16) can be described in terms of standardized normal variables u1, u2 with the correlation coefficient ρ12, or in terms of statistically independent standardized normal variables (Sec. 18.5-5). Thus
(c) The distribution (16) is represented graphically by the contour ellipses φ(x1, x2) = constant, or
The probability that the “point” (x1, x2) is inside the contour ellipse (22) is
i.e., λ2 = χP2(2) (Table 19.5-1). The two mean-square regression lines respectively defined by Eqs. (17) and (18) bisect all contour-ellipse chords in the x1 and x2 directions, respectively (see also Sec. 2.4-6).
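The linear construction of Sec. 18.8-6b can be illustrated numerically. The following Python sketch (an illustrative Monte Carlo check, with all names, seeds, and tolerances chosen for the example, not taken from the handbook) builds standardized normal variables x1, x2 with prescribed correlation coefficient ρ12 from independent standardized normals u1, u2 and verifies their sample moments:

```python
import random, math

# Illustrative sketch: correlated standardized normals from independent ones,
# per the linear-transformation idea of Sec. 18.5-5.
random.seed(1)
rho12 = 0.6
N = 200_000
sx1 = sx2 = sx1x2 = sx1sq = sx2sq = 0.0
for _ in range(N):
    u1 = random.gauss(0.0, 1.0)
    u2 = random.gauss(0.0, 1.0)
    x1 = u1                                          # standardized normal
    x2 = rho12 * u1 + math.sqrt(1 - rho12**2) * u2   # standardized, corr rho12
    sx1 += x1; sx2 += x2
    sx1x2 += x1 * x2; sx1sq += x1 * x1; sx2sq += x2 * x2
m1, m2 = sx1 / N, sx2 / N
cov = sx1x2 / N - m1 * m2
var1 = sx1sq / N - m1 * m1
var2 = sx2sq / N - m2 * m2
corr = cov / math.sqrt(var1 * var2)   # sample estimate of rho12
```

With ρ12 = 0 the same construction yields uncorrelated, and hence (for normal variables) statistically independent, x1, x2.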
18.8-7. Circular Normal Distributions. Equation (16) represents a circular normal distribution with dispersion σ about the center of gravity (ξ1, ξ2) if and only if ρ12 = 0, σ1 = σ2 = σ. The contour ellipses (22) become circles corresponding to fractiles of the radial deviation (radial error). The distribution of r is given by
(see also Sec. 18.11-16 and Table 19.5-1).
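The Rayleigh law for the radial deviation can be checked by simulation. The sketch below assumes ρ12 = 0 and σ1 = σ2 = σ = 1 (an illustrative choice) and verifies that P{r ≤ R} = 1 − exp(−R²/2σ²) at the median radial error R = σ√(2 ln 2):

```python
import random, math

# Illustrative sketch of Sec. 18.8-7: for a circular normal distribution the
# radial deviation r is Rayleigh-distributed, P{r <= R} = 1 - exp(-R^2/(2 sigma^2)).
random.seed(2)
sigma = 1.0
R = sigma * math.sqrt(2.0 * math.log(2.0))   # median radial error
N = 100_000
hits = sum(1 for _ in range(N)
           if math.hypot(random.gauss(0, sigma), random.gauss(0, sigma)) <= R)
frac = hits / N                                        # empirical P{r <= R}
theory = 1.0 - math.exp(-R * R / (2.0 * sigma * sigma))  # = 0.5 at the median
```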
Circular normal distributions are of particular interest in problems related to gunnery; circular probability paper shows contour circles for equal increments of Φr(R). Note
18.8-8. n-Dimensional Normal Distributions.* The joint distribution of n random variables x1, x2, . . . , xn is an n-dimensional normal distribution if and only if it is a continuous probability distribution having a frequency function of the form
* See footnote to Sec. 18.4-8.
Each normal distribution is completely defined by its center of gravity (ξ1, ξ2, . . . , ξn) and its moment matrix [λjk] ≡ [Λjk]−1, or by the corresponding variances and correlation coefficients (Sec. 18.4-8). The characteristic function is
Each marginal and conditional distribution derived from a normal distribution is normal. All mean-square regression hypersurfaces are identical with the corresponding mean-square regression hyperplanes (Sec. 18.4-9). n random variables x1, x2, . . . , xn having a normal joint distribution are statistically independent if and only if they are uncorrelated (see also Sec. 18.4-11).
Each n-dimensional normal distribution can be described as the joint distribution of n statistically independent standardized normal variables related to the original variables by a linear transformation (18.5-15).
18.8-9. Addition Theorems for Special Probability Distributions* (see also Sec. 18.5-7 and Table 19.5-1). (a) The binomial distribution (Table 18.8-3), the Poisson distribution (Table 18.8-4), and the Cauchy distribution (Table 18.8-8) “reproduce themselves” on addition of independent variables. If the random variable x is defined as the sum
of n statistically independent random variables x1, x2, . . . , xn, then
(b) The sum x = x1 + x2 + . . . + xn of n statistically independent random variables x1, x2, . . . , xn is a normal variable if and only if x1, x2, . . . , xn are normal variables. In this case,
If x1, x2, . . . , xn are (not necessarily statistically independent) normal variables, then x = a1x1 + a2x2 + . . . + anxn is a normal variable whose mean and variance are given by Eq. (18.5-19).
* See footnote to Sec. 18.3-4.
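The “reproduction” property of Sec. 18.8-9a can be verified exactly for the binomial distribution by convolving probability mass functions; the following Python sketch (parameters illustrative) checks that the sum of independent binomial variables with the same p is again binomial:

```python
from math import comb

# Exact check: Binomial(n1, p) + Binomial(n2, p), independent, = Binomial(n1+n2, p).
def binom_pmf(n, p):
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def convolve(a, b):
    # probability mass function of the sum of two independent discrete variables
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

p, n1, n2 = 0.3, 4, 6
lhs = convolve(binom_pmf(n1, p), binom_pmf(n2, p))
rhs = binom_pmf(n1 + n2, p)
max_err = max(abs(x - y) for x, y in zip(lhs, rhs))
```

The same convolution check applies, with suitable (truncated) mass functions, to the Poisson distribution.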
18.9. MATHEMATICAL DESCRIPTION OF RANDOM PROCESSES
18.9-1. Random Processes. Consider a variable x capable of assuming different values x(t) for different values of an independent variable t. A random process (stochastic process) selects a specific sample function x(t) from a given theoretical population (Sec. 19.1-2) or ensemble of possible sample functions. More specifically, the functions x(t) are said to describe a random process if and only if the sample values x1 = x(t1), x2 = x(t2), . . . are random variables admitting definition of a joint probability distribution for every finite set of values (sampling times) t1, t2, . . . (Fig. 19.8-1). The random process is discrete or continuous if the joint distribution of x(t1), x(t2), . . . is, respectively, discrete or continuous for every finite set t1, t2, . . . . The process is a random series if the independent variable t assumes only a countable set of values. More generally, a random process may be described by a multidimensional variable x(t) ≡ [x(t), y(t), . . .].
The definition of a random process implies the existence of a probability distribution on the (in general, infinite-dimensional) sample space (Sec. 18.2-7) of possible functions x(t). Each particular function x(t) ≡ X(t) constitutes a simple event [sample point, “value” of the multidimensional random variable x(t)].
In most applications the independent variable t is the time, and the variable x(t) or x(t) labels the state of a physical system. EXAMPLES: Results of successive observations, states of dynamical systems in Gibbsian statistical mechanics or quantum mechanics, messages and noise in communications systems, economic time series.
18.9-2. Mathematical Description of Random Processes. (a) To describe a random process, one must specify the distribution of x(t1) and the respective joint distributions of [x(t1), x(t2)], [x(t1), x(t2), x(t3)], . . . for every finite set of values t1, t2, t3, . . . (first, second, third, . . . probability distributions associated with the random process). These distributions are described by the corresponding first, second, ... (or first-order, second-order, . . .) distribution functions (see also Sec. 18.4-7)
or, respectively for discrete and continuous random processes, by the corresponding probabilities and frequency functions
NOTE: The sequence of distribution functions (1a) describes the random process in increasing detail, since each distribution function Φ(n) completely defines all preceding ones as marginal distribution functions (Sec. 18.4-7). The same is true for each sequence (1b). Each of the functions (1) is symmetric with respect to (unaffected by) interchanges of pairs Xi, ti and Xk, tk.
(b) Conditional probability distributions descriptive of the random process are related to the functions (1b) in the manner of Sec. 18.4-7; thus
NOTE: The functions (2) are not in general symmetric with respect to interchanges of pairs Xi, ti and Xk, tk separated by the bar.
(c) A multidimensional random process, say one generating a pair of sample functions x(t), y(t), is similarly defined in terms of joint distributions of sample values x(ti), y(tk). In particular,
18.9-3. Ensemble Averages. (a) General Definitions. The ensemble average (statistical average, mathematical expectation) of a suitable function f[x(t1), x(t2), . . . , x(tn)] of n sample values x(t1), x(t2), . . . , x(tn) (statistic, see also Sec. 19.1-1) is the expected value (Sec. 18.4-8a)
if this limit exists in the sense of absolute convergence. Integration in Eq. (4) is over X1, X2, . . . , Xn; E{f} is a function of t1, t2, . . . , tn.
Similarly, for a multidimensional random process described by x(t), y(t),
if the limit exists in the sense of absolute convergence.
(b) Ensemble Correlation Functions and Mean Squares. The ensemble averages E{x(t1)} = ξ(t1), E{x2(t1)}, and
are of special interest. They abstract important properties of the random process and are frequently all that is known about the process: note that
The definitions (6) and Eq. (7) apply to real x(t), y(t). If x(t) and/or y(t) is a complex variable (really a two-dimensional random variable), then one defines
which includes (6) as a special case; Rxy is necessarily real for real x and y.
Note that, for real or complex x, y,
Existence of the quantities on the right implies that of the correlation functions on the left.
(c) Characteristic Functions. The nth characteristic function corresponding to the nth distribution function (1a) of the random process (see also Sec. 18.4-10) is
Joint characteristic functions for x(t), y(t), . . . are similarly defined. Characteristic functions can yield moments like E{x(t1)}, E{x2(t1)}, Rxx(t1, t2), . . . by differentiation in the manner of Secs. 18.3-10 and 18.4-10.
(d) Ensemble Averages of Integrals and Derivatives (see also Sec. 18.6-3). Random integrals of the form
are defined in the sense of convergence in probability (Sec. 18.6-1) or, if possible, in the mean-square sense of Sec. 18.6-3. The integral converges in mean (in the sense of Sec. 18.6-3) if and only if
exists. If ∫ab E{|x(t)|} dt exists, then the integral (12) exists in the sense of absolute convergence for each sample function x(t), except possibly for a set of probability 0, and
The important relation (14) is needed, in particular, to derive the input-output relations of Sec. 18.12-2 (see also Refs. 18.13 to 18.17).
The random process generating x(t) is continuous in the mean (mean-square continuous) at t = t0 in the sense of Sec. 18.6-3 if and only if
this is true if and only if Rxx(t1, t2) exists and is continuous for t1 = t2 = t0. The random process generating dx/dt will be called the mean-square derivative of the random process generating x(t) if and only if
This is true if and only if ∂2Rxx(t1, t2)/∂t1 ∂t2 exists and equals ∂2Rxx(t1, t2)/∂t2∂t1 for all t1 = t2. It follows that
(see also Sec. 18.12-2).
18.9-4. Processes Defined by Random Parameters. It is often possible to represent each sample function of a random process as a deterministic function x = x(t; η1, η2, . . .) of t and a set of random parameters η1, η2, . . . . The process is then defined by the joint distribution of η1, η2, . . . ; in this case,
In particular, each probability distribution of such a random process is uniquely defined by its characteristic function (Sec. 18.4-10)
18.9-5. Orthonormal-function Expansions. Given a real or complex random process x(t) with E{x(t)} finite and Rxx(t1, t2) bounded and continuous on the closed observation interval [a, b], there exist complete orthonormal sets of functions u1(t), u2(t), . . . (Sec. 15.2-4) such that
where the series and the integrals for the ck converge in mean in the sense of Sec. 18.6-3 (see also Sec. 18.9-3d). The random process is, then, represented by the set of random coefficients c1, c2, . . . ; the first n coefficients may give a useful approximate representation. In particular, there exists a complete orthonormal set uk(t) ≡ Ψk(t) such that all the ck are uncorrelated standardized random variables, i.e.,
(Karhunen-Loève Theorem). Specifically, the required Ψk(t) are the eigenfunctions of the integral equation
(see also Sec. 15.3-3). The corresponding eigenvalues λk are nonnegative and have at most a finite degree of degeneracy (by Mercer's theorem, Sec. 15.3-4), and
The Karhunen-Loève theorem constitutes a generalization of the theorem of Sec. 18.5-5.
EXAMPLES: Periodic random processes (Sec. 18.11-1), band-limited flat-spectrum noise (Sec. 18.11-2b). Although explicit analytical solution of the integral equation (14) is rarely possible, the theorem is useful in detection theory (Ref. 19.24).
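A finite-dimensional analogue may clarify the mechanics of the Karhunen-Loève expansion. The Python sketch below (a two-sample-time discrete analogue, not the continuous theorem; the covariance values are illustrative) uses the analytic eigenvectors of the 2 × 2 covariance matrix [[1, r], [r, 1]] and checks by simulation that the expansion coefficients are uncorrelated, with mean squares equal to the eigenvalues:

```python
import random, math

# Discrete analogue of the Karhunen-Loeve expansion at two sample times:
# the covariance matrix [[1, r], [r, 1]] has orthonormal eigenvectors
# (1,1)/sqrt(2), (1,-1)/sqrt(2) with eigenvalues 1+r, 1-r, and the
# coefficients c_k = sum_i x(t_i) psi_k(i) satisfy E{c_j c_k} = lambda_k delta_jk.
random.seed(3)
r = 0.5
s = 1.0 / math.sqrt(2.0)
psi1, psi2 = (s, s), (s, -s)          # orthonormal eigenvectors
lam1, lam2 = 1.0 + r, 1.0 - r         # eigenvalues
N = 200_000
s11 = s22 = s12 = 0.0
for _ in range(N):
    g1, g2 = random.gauss(0, 1), random.gauss(0, 1)
    x1, x2 = g1, r * g1 + math.sqrt(1 - r * r) * g2   # cov(x1, x2) = r
    c1 = x1 * psi1[0] + x2 * psi1[1]
    c2 = x1 * psi2[0] + x2 * psi2[1]
    s11 += c1 * c1; s22 += c2 * c2; s12 += c1 * c2
E11, E22, E12 = s11 / N, s22 / N, s12 / N   # estimates of lam1, lam2, 0
```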
18.10. STATIONARY RANDOM PROCESSES. CORRELATION FUNCTIONS AND SPECIAL DENSITIES
18.10-1. Stationary Random Processes. A random process, or the corresponding ensemble of functions x(t), is stationary if and only if each of its probability distributions is unchanged when t is replaced by t + t0, so that
i.e., the nth probability distribution depends only on a set of n — 1 differences
of sampling times tk. Similarly, two or more random processes generating x(t), y(t), . . . are jointly stationary if and only if their joint probability distributions are unchanged when t is replaced by t + t0.
For stationary and jointly stationary random processes, each ensemble average (18.9-4) or (18.9-5) depends only on n — 1 differences (2):
for every t1 (see also Sec. 18.10-2).
18.10-2. Ensemble Correlation Functions (see also Sec. 18.9-3b). (a) For stationary x(t) [and jointly stationary x(t), y(t)], the expected values
E{x(t)} ≡ E{x} = ξ, E{|x(t)|2} ≡ E{|x|2}, E{y(t)} ≡ E{y} = η, . . .
are constant, and the ensemble correlation functions (18.9-8) reduce to functions of the delay τ = t2 − t1 separating t1 and t2. In this case,
Again, existence of the quantities on the right implies existence of the correlation functions on the left. If Rxx(τ) is continuous for τ = 0, it is continuous for all τ (Ref. 18.17).
[Rxx(ti — tk)] is a positive-semidefinite hermitian matrix (Sec. 13.5-3) for every finite set t1, t2, . . . , tn.
(b) Normalized ensemble correlation functions are defined by
Note |ρxx| ≤ 1, |ρxy| ≤ 1. For real stationary x, y, ρxx and ρxy are real correlation coefficients (Sec. 18.4-4), and Eq. (4) implies
Random processes which are not stationary or jointly stationary but have constant E{x(t)}, E{y(t)} and “stationary correlation functions” satisfying Eq. (4) are often called stationary, or jointly stationary, in the wide sense.
18.10-3. Ensemble Spectral Densities. If x(t) is generated by a stationary random process, and x(t), y(t) by jointly stationary random processes, the ensemble power spectral density Φxx(ω) and the ensemble cross-spectral density Φxy(ω) are defined by
Assuming suitable convergence, this implies
The Fourier transforms (9) are introduced, essentially, to simplify the relations between input and output correlation functions in linear time-invariant systems (Sec. 18.12-3). Existence of the transforms (9) requires, besides the existence of E{|x|2} and E{|y|2} (Sec. 18.9-3b), that Rxx(τ) or Rxy(τ) decay sufficiently quickly as |τ| → ∞. In the case of periodic and d-c processes, one extends the definitions of spectral densities to include delta-function terms chosen so that Eq. (10) is satisfied (Sec. 18.10-9).
18.10-4. Correlation Functions and Spectra of Real Processes. The relations (9) and (10) apply to both real and complex random processes x(t), y(t). Note that the power spectral density Φxx(ω) is always real, even if x is complex; but the cross-spectral density Φxy(ω) may be a complex function even for real x, y. If x and y are real, the same is true for the correlation functions Rxx(τ), Rxy(τ). In this case,
Note again that Eqs. (11) to (13) apply to real x, y.
18.10-5. Spectral Decomposition of Mean “Power” for Real Processes. For real x(t), substitution of τ = 0 in Eqs. (11) and (12) yields
This is interpreted as a spectral decomposition of E{x2} (mean “power”). In the first integral, contributions to E{x2} are “distributed” over both positive and negative frequencies with density Φxx(ω) (“two-sided” power spectral density), measured in (x units)2/cps, since ω/2π is frequency in cps. Alternatively, we can consider E{x2} as distributed only over nonnegative (“real”) frequencies with the “one-sided” power spectral density 2Φxx(ω) (x units)2/cps.
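The relation between Rxx(τ) and Φxx(ω) can be checked numerically. The Python sketch below assumes the two-sided convention Φxx(ω) = (1/2π)∫Rxx(τ)e^(−iωτ) dτ and the illustrative exponential autocorrelation Rxx(τ) = e^(−β|τ|), for which Φxx(ω) = β/[π(β² + ω²)] exactly:

```python
import math

# Numerical check of the Wiener-Khinchine transform for R(tau) = exp(-beta|tau|)
# under the two-sided convention Phi(omega) = (1/2pi) * FourierTransform{R}.
beta = 2.0
R = lambda tau: math.exp(-beta * abs(tau))
phi_exact = lambda w: beta / (math.pi * (beta * beta + w * w))

# Trapezoidal evaluation at one frequency; since R is even, the transform
# reduces to (1/pi) * integral_0^T R(tau) cos(w0 tau) d tau (tail beyond T tiny).
w0, T, n = 1.5, 40.0, 200_000
h = T / n
acc = 0.5 * (R(0.0) + R(T) * math.cos(w0 * T))
for k in range(1, n):
    acc += R(k * h) * math.cos(w0 * k * h)
phi_num = (1.0 / math.pi) * acc * h
```

Integrating Φxx over all ω then recovers Rxx(0) = E{x²} = 1, the spectral decomposition of mean “power.”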
Intuitive interpretation of the (in general complex) cross-spectral density Φxy(ω) is not quite so simple. For real x(t), y(t), substitution of τ = 0 in Eq. (10) yields
Re Φxy(ω) is often called a cross-power spectral density. Im Φxy(ω) (cross-quadrature spectral density) does not contribute to the mean “power” (15).
18.10-6. Some Alternative Ensemble Spectral Densities. Other spectral-density functions found in the literature are
(ν = ω/2π; two-sided spectral density in (x units)2/cps)
(two-sided spectral density in (x units)2 per radian/second)
and the one-sided spectral densities
Note that Γxx(ν) and Gxx(ω) are defined only for nonnegative frequencies. Similar definitions also apply to cross-spectral densities. Note that symbols and definitions vary greatly in the literature; the definition in use should be restated and referenced in each case.
18.10-7. t Averages and Ergodic Processes. (a) t Averages. Given any function x(t), the t average (average over t, frequently a time average) of a measurable function f[x(t1), x(t2), . . . , x(tn)] is defined as
if the limit exists.* If x(t) describes a random process, then <f> is (like f, but unlike E{f}) a random variable (statistic) for each given set of values t1, t2, . . . , tn. Note that
whenever the integrals exist.
(b) Ergodic Processes. A (necessarily stationary) random process generating x(t) is ergodic if and only if the probability associated with every stationary subensemble is either 0 or 1. Every ergodic process has the ergodic property: the t average (20) of every measurable function f[x(t1), x(t2), . . . , x(tn)] equals its ensemble average (18.9-4) with probability one, i.e.,
whenever these averages exist. Any one of the functions x(t) will then define the random process uniquely with probability one, e.g., in terms of the characteristic functions (18.9-11) computed from x(t) by means of Eq. (21). Each t average, such as <x>, <x2>, or Rxx(τ), will then represent, with probability one, a property common to the entire ensemble of functions x(t).
* The bar notation f̄ is sometimes used instead of <f>, as well as instead of E{f}; but that symbol is preferably reserved for the sample average
where kf is the value of f obtained from one of an empirical random sample of n sample functions x(t) = kx(t) (k = 1, 2, . . . , n; see also Sec. 19.8-4).
Two or more jointly stationary random processes are jointly ergodic if and only if the probability associated with every stationary joint sub-ensemble is either 0 or 1. The ergodic theorem applies to averages computed from sample values of jointly ergodic processes.
18.10-8. Non-ensemble Correlation Functions and Spectral Densities. Given the real or complex functions x(t), y(t) (which may or may not be sample functions of a random process) such that
exist, the t averages
exist. These correlation functions satisfy all the relations listed in Sec. 18.10-2, if each ensemble average (expected value) is replaced by the corresponding t average. Again, the (non-ensemble) power spectral density Ψxx(ω) and the cross-spectral density Ψxy(ω) are introduced through the Wiener-Khinchine relations
If these “individual” spectral densities exist (one formally admits delta-function terms, Sec. 18.10-9), they satisfy relations analogous to those listed in Secs. 18.10-3 to 18.10-5. Alternative non-ensemble spectral densities can be defined in the manner of Sec. 18.10-6.
If x(t), y(t) are sample functions of jointly stationary random processes, then the correlation functions (24), (25) and the spectral densities (26) are random variables whose expected values equal the corresponding ensemble functions whenever they exist. If x(t), y(t) are jointly ergodic, then the correlation functions (24), (25) and the spectral densities (26) are identical to the corresponding ensemble quantities with probability one.
As an alternative definition, spectral densities are sometimes introduced by the formal relation
where aT(ω) and bT(ω) are Fourier transforms of the “truncated” functions xT(t), yT(t) respectively equal to x(t), y(t) for |t| < T and zero for |t| > T:
The corresponding ensemble spectral density Φxy(ω) may then be defined by Φxy(ω) = E{Ψxy(ω)}, and the Wiener-Khinchine relations (26) follow from Borel's convolution theorem (Table 4.11-1). In general, however, Eq. (27) is valid only if both sides appear in an integral over ω (in particular, spectral densities often contain delta-function terms, Secs. 18.10-9 and 18.11-5; see also Sec. 18.10-10).
18.10-9. Functions with Periodic Components (see also Sec. 18.11-1). Like other t averages, non-ensemble correlation functions and spectra are of interest mainly if they happen to equal the corresponding ensemble quantities with probability one (this is true for all t averages in the case of ergodic processes, Sec. 18.10-7b). When this is true, the single integrals (24), (25) may be easier to compute than the double integrals (4). The ergodic property also permits interpretation of, say, Φxx(ω) in terms of the “frequency content” of a single “typical” sample function x(t), since Φxx(ω) = Ψxx(ω) with probability one.
Without recourse to probability theory, non-ensemble correlation functions and spectra can be computed only for functions x(t), y(t) representable as sums of periodic components (except for the trivial case that the correlation function or spectral density is identically zero). In particular, for
More generally, let x(t) be a real function of bounded variation in every finite interval and such that <|x(0)|2> exists. Then x(t) can be represented almost everywhere (Sec. 4.6-14b) as the sum of its average value <x(0)> = c0, a countable set of periodic components, and an aperiodic component* p(t):
* The aperiodic component p(t) may be expressible as a Fourier integral [<|p(0)|2> = 0], or <|p(0)|2> may be different from zero (“random” component); or p(t) may be a sum of both types of terms.
Let y(t) be another real function y(t) satisfying the same conditions as x(t), so that
The set of circular frequencies ω1, ω2, . . . is understood to include the periodic-component frequencies of both x(t) and y(t). Then
The cross correlation function Rxy(τ) measures the “coherence” of x(t) and y(t) or the “serial correlation” between the function values x(t) and y(t + τ) separated by a delay τ. x(t) and y(t) are uncorrelated if and only if Rxy(τ) ≡ 0.
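These t averages are easily computed numerically for sinusoidal components. The Python sketch below (amplitudes, frequencies, and delay all illustrative) checks Rxy(τ) = (ab/2) cos (ωτ + φ) for x(t) = a cos ωt, y(t) = b cos (ωt + φ), and verifies that components of different frequencies average to zero:

```python
import math

# t-average cross-correlation of periodic components: R_xy(tau) = <x(t) y(t+tau)>.
a, b, w, phi = 2.0, 3.0, 1.0, 0.7
tau = 0.4

def t_average(f, T=2000.0, n=400_000):
    # Riemann-sum approximation of (1/T) * integral_0^T f(t) dt
    h = T / n
    return sum(f(k * h) for k in range(n)) * h / T

Rxy = t_average(lambda t: a * math.cos(w * t) * b * math.cos(w * (t + tau) + phi))
Rxy_theory = 0.5 * a * b * math.cos(w * tau + phi)
# components at different frequencies are "orthogonal" (average to zero):
cross = t_average(lambda t: math.cos(1.0 * t) * math.cos(2.0 * t))
```

Only approximate agreement is expected, since the averaging interval T is finite.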
NOTE: The (real) functions x(t), y(t) belong to a complex unitary vector space with inner product (u, v) = <u*(0)v(0)> (Sec. 14.2-6). Note the useful orthogonality relations
18.10-10. Generalized Fourier Transforms and Integrated Spectra. (a) To avoid the difficulties associated with delta-function terms in the Fourier transforms and spectral densities of periodic functions, one may introduce the generalized or integrated Fourier transform XINT(iω) of x(t), defined (to within an additive constant) by
The corresponding inversion integral is the Stieltjes integral (Sec. 4.6-17)
If the Fourier transform XF(iω) of x(t) exists, then
If x(t) can be represented as a sum of periodic components (this is, in particular, true for periodic functions; see also Sec. 18.11-1), then XINT(iω) is a step function (Sec. 21.9-1).
(b)The integrated power spectrum ΦINT(ω) of a stationary or wide-sense stationary random process generating x(t) is the generalized Fourier transform of its autocorrelation function:
Analogous relations can be written for non-ensemble correlation functions and spectra.
(c)Note the following generalizations of the Wiener-Khinchine relations (9) and (26) for real stationary (or wide-sense stationary) x(t).
For τ = 0, Eq. (40) yields Wiener's Quadratic-variation Theorem
If the non-ensemble power spectral density Ψxx(ω) exists, Eq. (40) reduces to the Wiener-Khinchine relation (26), with
18.11. SPECIAL CLASSES OF RANDOM PROCESSES. EXAMPLES
18.11-1. Processes with Constant and Periodic Sample Functions. (a) Constant Sample Functions (Fig. 18.11-1a). If each sample function x(t) is identically equal to a constant random parameter a with given probability distribution, the latter determines the resulting random process uniquely. The process is stationary; but it is not ergodic. If E{a2} exists,
(b) Random-phase Sine Waves. Let
(Fig. 18.11-1b) where a is a given constant, and the phase angle α is a random variable uniformly distributed between 0 and 2π. The process is stationary and ergodic, with
If the amplitude a of the random-phase sine wave is not a constant, but is itself a (positive) random variable independent of α (as in amplitude modulation), the process is stationary but not in general ergodic.
Now
If, in particular, the amplitude a has a Rayleigh distribution defined by
(circular normal distribution with σ2 = 1, Sec. 18.8-7), then the random process is Gaussian (Sec. 18.11-3).
If the phase angle α is not uniformly distributed between 0 and 2π, then the process is nonstationary even if the amplitude a is fixed.
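The ergodic property of the fixed-amplitude random-phase sine wave can be illustrated numerically: the t average of x² along one sample function and the ensemble average of x² at one fixed t should both equal a²/2. A Python sketch (all parameters illustrative):

```python
import random, math

# Ergodicity sketch for x(t) = a cos(w0 t + alpha), alpha uniform on [0, 2pi).
random.seed(4)
a, w0 = 2.0, 3.0

# t average of x^2 along a single sample function (one fixed alpha):
alpha = random.uniform(0.0, 2.0 * math.pi)
T, n = 1000.0, 200_000
h = T / n
t_avg = sum((a * math.cos(w0 * k * h + alpha)) ** 2 for k in range(n)) * h / T

# ensemble average of x^2 at one fixed t, over many independent phases:
t_fixed, M = 0.123, 100_000
e_avg = sum((a * math.cos(w0 * t_fixed + random.uniform(0, 2 * math.pi))) ** 2
            for _ in range(M)) / M
# both should approach a^2 / 2 = 2
```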
FIG. 18.11-1. Sample functions x(t) for five examples of random processes. In Fig. 18.11-1e, x(t) is the sum of the individual pulses akv(t — tk) shown.
(c) More General Periodic Processes (see also Sec. 18.10-9). The random-phase sine wave is a special case of the general random-phase periodic process represented by
where α is uniformly distributed between 0 and 2π; it is assumed that the series converges in mean square in the sense of Sec. 18.6-3. The process is stationary and ergodic, with
FIG. 18.11-2. Autocorrelation function and power spectrum for a random telegraph wave (a) and a coin-tossing sample-hold process (b) having equal mean count rates α = 1/2Δt, both with zero mean and mean square a2. Note that different ω scales are used in (a) and (b). (From G. A. Korn, Random-process Simulation and Measurements, McGraw-Hill, New York, 1966.)
A still more general periodic process is defined by the Fourier series
with real random coefficients c0, ak, bk, assuming that the series converges in mean square. Such a process is wide-sense stationary if and only if
In this case, Eq. (8) is an orthogonal-series expansion in the sense of Sec. 18.9-5, and
18.11-2. Band-limited Functions and Processes. Sampling Theorems. (a) A function x(t) is band-limited between ω = 0 and ω = 2πB if and only if its Fourier transform XF(iω) (Sec. 4.11-3) exists and equals zero for |ω| > 2πB; B (measured in cycles per second if t is measured in seconds) is the bandwidth associated with x(t). For every band-limited x(t)
i.e., x(t) is uniquely determined for all t by samples x(tk) spaced 1/2B t-units apart (Nyquist-Kotelnikov-Shannon Sampling Theorem).
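The sampling-theorem reconstruction can be sketched numerically. The following Python fragment (bandwidth, test signal, evaluation point, and truncation limit all illustrative; the infinite series is truncated, so only approximate agreement is expected) rebuilds a band-limited cosine from its samples x(k/2B):

```python
import math

# Shannon reconstruction: x(t) = sum_k x(k/2B) sinc(2B t - k),
# with sinc(u) = sin(pi u) / (pi u).
def sinc(u):
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)

B = 1.0                                        # bandwidth in cps
x = lambda t: math.cos(2.0 * math.pi * 0.3 * t)  # band-limited: 0.3 cps < B

def reconstruct(t, K=20_000):
    # truncated cardinal series over samples x(k / 2B)
    return sum(x(k / (2.0 * B)) * sinc(2.0 * B * t - k) for k in range(-K, K + 1))

t0 = 0.37                                      # a point between samples
err = abs(reconstruct(t0) - x(t0))
```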
The functions (Fig. 18.11-3)
FIG. 18.11-3. The sampling function sinc (see also Table F-21).
constitute a complete orthonormal set for the space of functions x(t) band-limited between ω = 0 and ω = 2πB (Sec. 15.2-4); note
(b) A stationary or wide-sense stationary random process with sample functions x(t) is band-limited between ω = 0 and ω = 2πB if and only if its ensemble power spectral density Φxx(ω) exists and equals zero for |ω| > 2πB. In this case, the expansion (11) applies in the sense of mean-square convergence (Sec. 18.6-3), i.e.,
and Eq. (11) represents each sample function x(t) in terms of its sample values xk = x(k/2B) with probability one.
NOTE: In the special case of a stationary band-limited “flat-spectrum” process with
the sample values xk = x(k/2B) have zero mean and are uncorrelated.
18.11-3. Gaussian Random Processes (see also Secs. 18.8-3 to 18.8-8, and 18.12-6). A real random process is Gaussian if and only if all its probability distributions are normal distributions for all t1, t2, . . . . Every Gaussian process is uniquely defined by its (necessarily normal) second-order probability distribution, and hence by the ensemble autocorrelation function Rxx(t1, t2) ≡ E{x(t1)x(t2)} together with ξ(t) ≡ E{x(t)}. Specifically, the joint distribution of every set of sample values x1 = x(t1), x2 = x(t2), . . . , xn = x(tn) is a normal distribution with probability density
Processes obtained through addition of Gaussian processes and/or linear operations on their sample functions are Gaussian (Sec. 18.12-2). Coefficients in orthogonal-function expansions of a Gaussian process (Sec. 18.9-5) are jointly Gaussian random variables.
18.11-4. Markov Processes and the Poisson Process. (a) Random Process of Order n. A random process of order n is a random process completely specified by its nth (nth-order) distribution function Φ(n) (Sec. 18.9-2), but not by Φ(n−1).
(b) Purely Random Processes. A random process described by x(t) is a purely random process if and only if the random variables x(t1), x(t2), . . . are statistically independent for every finite set t1, t2, . . . . A purely random process is completely specified by Φ(1)(X1, t1), p(1)(X1, t1), or φ(1)(X1, t1).
EXAMPLES: Successive independent observations, Bernoulli trials, and random samples in statistics (Sec. 19.1-2) represent purely random series. Purely random continuous-parameter processes imply sample functions of unlimited bandwidth and cannot, strictly speaking, describe real physical phenomena.
(c) Markov Processes. A discrete or continuous random process described by x(t) is a (simple) Markov process if and only if, for every finite set t1 < t2 < . . . < tn−1 < tn,
respectively. If x(tn−1) = Xn−1 is given, knowledge of x(tn−2), x(tn−3), . . . contributes nothing to one's knowledge of the distribution of x(tn). A Markov process is completely specified by its second-order probability distribution and hence by its first-order probability distribution together with the “transition probabilities” given by
A Markovian random series is often called a Markov chain. Every purely random process is a Markov process.
Many physical processes can be described as Markov processes. An important class of problems involves the determination of the functions (21) from their given “initial values” specified for t = t1. The defining property (20) of a Markov process implies the Chapman-Kolmogorov-Smoluchovski equation
Equation (22) is a first-order difference equation (Sec. 20.4-3) which may be solved for the unknown function (21) of the independent variable t whenever p(x, t|X1, t1) or φ(x, t|X1, t1) is suitably given. If p(1)(X1, t1) or φ(1)(X1, t1) is known, the Markov process is now completely determined for all t > t1.
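For a finite-state Markov chain the Chapman-Kolmogorov-Smoluchovski equation reduces to a matrix product, which makes it easy to verify directly. A Python sketch with illustrative two-state transition matrices:

```python
# Discrete-state, discrete-time analogue of the Chapman-Kolmogorov-Smoluchovski
# equation: summing over the intermediate state gives the matrix product
# P[t1 -> t3] = P[t1 -> t2] P[t2 -> t3].
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P12 = [[0.9, 0.1],
       [0.2, 0.8]]          # transition probabilities t1 -> t2 (rows sum to 1)
P23 = [[0.7, 0.3],
       [0.4, 0.6]]          # transition probabilities t2 -> t3
P13 = matmul(P12, P23)      # two-step transition probabilities t1 -> t3

row_sums = [sum(row) for row in P13]   # must again sum to 1 in each row
```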
(d) The Poisson Process. In many problems involving random searches, waiting lines, radioactive decay, etc., x(t) is a discrete random variable capable of assuming the spectral values 0, 1, 2, . . . (“counting process”; number of “successes,” telephone calls, disintegrations, etc.). A frequently useful model assumes the Markov property (20a) and
where o(Δt) denotes a term such that o(Δt)/Δt becomes negligible as Δt → 0 (Sec. 4.4-3). To find
substitute the given transition probabilities (23) into the Smoluchovski equation (22a) for t2 = t + Δt to obtain the difference equation
with P(−1, T) ≡ 0. As Δt → 0, this reduces to an ordinary differential equation
for each K. These differential equations are solved successively for P(0, T), P(1, T), P(2, T), . . . , with initial conditions given by
It follows that
Thus, once the process is started, the number K of state changes in every time interval of length T has the Poisson distribution (Table 18.8-4). α is called the mean count rate of the Poisson process.
The probability that no state changes take place is
so that the probability that at least one state change takes place is
The time interval T1 between successive state changes is a random variable with probability density
and expected value 1/α.
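These properties are easily checked by simulation, since the times between state changes are independent exponential variables with mean 1/α. A Python sketch (rate, interval length, and trial count illustrative):

```python
import random, math

# Simulation of the Poisson counting process: exponential interarrival times
# with mean 1/alpha; the count K in an interval of length T is Poisson,
# so E{K} = alpha*T and P{K = 0} = exp(-alpha*T).
random.seed(5)
alpha, T, trials = 2.0, 1.0, 100_000

def count_in(T):
    t, k = random.expovariate(alpha), 0
    while t <= T:
        k += 1
        t += random.expovariate(alpha)
    return k

counts = [count_in(T) for _ in range(trials)]
mean_count = sum(counts) / trials        # should approach alpha * T = 2
p_zero = counts.count(0) / trials        # should approach exp(-2)
```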
Within any finite time interval of length T, a Poisson process is also uniquely defined by the joint distribution of the K + 1 statistically independent random variables K, t1, t2 . . . , tK, where K is the number of state changes during the time T, and t1, t2, . . . , tK are now the respective times of the 1st, 2nd, . . . , Kth state change during this time interval. One has
(e) See Refs. 18.15, 18.16, and 18.17 for treatments of more general Markov processes.
18.11-5. Some Random Processes Generated by a Poisson Process. (a) Random Telegraph Wave (Fig. 18.11-1c). x(t) equals either a or −a, with sign changes generated by the state changes of a Poisson process of mean count rate α (Sec. 18.11-4d). The process is stationary and ergodic if started at t = −∞, and
(b) Process Generated by Poisson Sampling (Fig. 18.11-1d). x(t) changes value at each state change of a Poisson process with mean count rate α; between state changes, x(t) is constant and takes continuously distributed random values x with given mean ξ and variance σ2. The process is stationary and ergodic if started at t = −∞, and
(c) Impulse Noise and Campbell's Theorem (Fig. 18.11-1e). x(t) is the sum of many similarly shaped transient pulses,
whose shape is given by v = v(t), with
while the pulse amplitude ak is a random variable with finite variance, and the times tk are random incidence times determined by the state changes of a Poisson process with mean count rate α. The process is stationary and ergodic if started at t = — ∞; it approximates a Gaussian random process if many pulses overlap. One has
In the special case where ak is a fixed constant, the formulas (36) are known as Campbell's theorem.
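Campbell's theorem can be checked by simulation. The Python sketch below assumes the illustrative exponential pulse shape v(t) = e^(−t/θ) for t ≥ 0 and fixed amplitude A, for which E{x} = αA∫v dt = αAθ and Var{x} = αA²∫v² dt = αA²θ/2:

```python
import random, math

# Monte Carlo check of Campbell's theorem for impulse noise with fixed
# amplitude A and pulse shape v(t) = exp(-t/theta), t >= 0.
random.seed(7)
alpha, A, theta = 5.0, 1.0, 0.5
t0, trials = 10.0 * theta, 20_000   # observe at t0; pulses before t=0 negligible

s = s2 = 0.0
for _ in range(trials):
    x, t = 0.0, random.expovariate(alpha)   # Poisson pulse incidence times
    while t <= t0:
        x += A * math.exp(-(t0 - t) / theta)   # pulse started at t, seen at t0
        t += random.expovariate(alpha)
    s += x; s2 += x * x
mean_est = s / trials
var_est = s2 / trials - mean_est ** 2
mean_theory = alpha * A * theta             # 2.5
var_theory = alpha * A * A * theta / 2.0    # 1.25
```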
18.11-6. Random Processes Generated by Periodic Sampling. Certain measuring devices sample a stationary and ergodic random variable q(t) periodically and then hold their output x(t) for a constant sampling interval Δt. The resulting random process is stationary and ergodic if the timing of the periodic sampling commands is random and uniformly distributed between 0 and Δt. A sample function x(t) will be similar to Fig. 18.11-1d except that state changes must be separated by integral multiples of Δt. If q is a binary random variable capable of assuming only the values a and −a with probabilities 1/2, 1/2, then x(t) will resemble the random telegraph wave of Fig. 18.11-1c, except that state changes are, again, separated by integral multiples of Δt (“coin-tossing” sample-hold process).
If different samples of q are statistically independent, then

Rxx(τ) = ξ² + σ²(1 − |τ|/Δt)    (|τ| ≤ Δt)        Rxx(τ) = ξ²    (|τ| > Δt)

where ξ = E{q}, σ² = Var{q}, and hence

Φxx(ω) = 2πξ²δ(ω) + σ²Δt [sin (ωΔt/2)/(ωΔt/2)]²
Figure 18.11-2 compares Rxx(τ) and Φxx(ω) for a random telegraph wave and a coin-tossing sample-hold process with equal mean count rates α = 1/(2Δt), zero mean, and E{x²} = a².
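The triangular autocorrelation of the coin-tossing sample-hold process can also be verified by simulation. In the sketch below (parameter values are illustrative assumptions), the random sampling phase makes the process stationary; two times separated by τ < Δt share the same held sample with probability 1 − τ/Δt and otherwise see independent samples.

```python
import numpy as np

# Monte Carlo check of the triangular autocorrelation of the
# "coin-tossing" sample-hold process with zero mean:
#   Rxx(tau) = a^2 * (1 - |tau|/dt)   for |tau| <= dt, zero outside.
rng = np.random.default_rng(2)
a, dt, tau, trials = 1.0, 1.0, 0.4, 200_000

phase = rng.uniform(0.0, dt, trials)            # time since last sampling command
q = a * rng.choice([-1.0, 1.0], (trials, 2))    # two consecutive independent samples
same = phase + tau < dt                         # both times in the same hold interval
x_t = q[:, 0]
x_tau = np.where(same, q[:, 0], q[:, 1])
R_est = np.mean(x_t * x_tau)
R_theory = a**2 * (1 - tau / dt)                # = 0.6 here
```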
18.12. OPERATIONS ON RANDOM PROCESSES
18.12-1. Correlation Functions and Spectra of Sums. Let x(t), y(t) be generated by real or complex random processes. For

z(t) = αx(t) + βy(t)    (1)

with real or complex α, β, the correlation functions Rxz(t1, t2), Rzx(t1, t2), Rzz(t1, t2) are given by

Rxz = αRxx + βRxy        Rzx = α*Rxx + β*Ryx
Rzz = |α|²Rxx + α*βRxy + β*αRyx + |β|²Ryy    (2)

[with Ruv(t1, t2) ≡ E{u*(t1)v(t2)}; asterisks denote complex conjugates]. These relations also apply to the correlation functions Rxz(τ), Rzx(τ), Rzz(τ) of stationary random processes; the corresponding spectral densities are

Φxz = αΦxx + βΦxy        Φzx = α*Φxx + β*Φyx
Φzz = |α|²Φxx + α*βΦxy + β*αΦyx + |β|²Φyy    (3)
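The sum relations follow from linearity of the expectation, so they hold exactly for any finite ensemble. The following Python sketch (ensemble size and coefficient values are illustrative assumptions, and the convention Ruv = E{u*v} is assumed) verifies the Rzz relation for complex α, β.

```python
import numpy as np

# Finite-ensemble check of the sum relation for z = alpha*x + beta*y,
# with the convention R_uv(t1, t2) = E{u*(t1) v(t2)}.  Because the
# identity follows from linearity of E{.}, it holds to rounding error.
rng = np.random.default_rng(3)
n = 1000                                   # ensemble members
x1, x2 = rng.normal(size=(2, n)) + 1j * rng.normal(size=(2, n))
y1, y2 = rng.normal(size=(2, n)) + 1j * rng.normal(size=(2, n))
alpha, beta = 2.0 - 1.0j, 0.5 + 3.0j

R = lambda u, v: np.mean(np.conj(u) * v)   # ensemble-average correlation
z1, z2 = alpha * x1 + beta * y1, alpha * x2 + beta * y2

lhs = R(z1, z2)
rhs = (abs(alpha)**2 * R(x1, x2) + np.conj(alpha) * beta * R(x1, y2)
       + np.conj(beta) * alpha * R(y1, x2) + abs(beta)**2 * R(y1, y2))
```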
18.12-2. Input-Output Relations for Linear Systems. (a) Consider a real linear system with real input x(t) and output

y(t) = ∫ w(t, λ)x(λ) dλ = ∫ h(t, ζ)x(t − ζ) dζ    (4)

where the weighting function w(t, λ) (Green's function, Secs. 9.3-3 and 9.4-3) is the system response to a unit-impulse input δ(t − λ) (impulse applied at t = λ), and h(t, ζ) ≡ w(t, t − ζ).
In the most important applications, t represents time, and w(t, λ) = 0 for t < λ, since physically realizable systems cannot respond to future inputs (see also Sec. 9.4-3).
(b) If x(t) is generated by a real random process, and if E{x²(t)} and E{y²(t)} exist, then

E{y(t)} = ∫ w(t, λ)E{x(λ)} dλ    (5)

Rxy(t1, t2) = ∫ w(t2, λ)Rxx(t1, λ) dλ    (6)

Ryy(t1, t2) = ∫∫ w(t1, λ1)w(t2, λ2)Rxx(λ1, λ2) dλ1 dλ2    (7)

(all integrals extending from −∞ to ∞).
If x(t) is Gaussian, y(t) is also Gaussian and completely determined by Eqs. (5) to (7).
18.12-3. The Stationary Case. (a) If the input x(t) is stationary, and

w(t, λ) ≡ h(t − λ)

(time-invariant linear system, see also Sec. 9.4-3), then the system output y(t) is also stationary; y(t) will be ergodic (Sec. 18.10-7b) if this is true for x(t). The input-output relations (4) to (7) for real x(t), y(t) reduce to

y(t) = ∫ h(ζ)x(t − ζ) dζ    (10)

E{y} = E{x} ∫ h(ζ) dζ        Rxy(τ) = ∫ h(ζ)Rxx(τ − ζ) dζ
Ryy(τ) = ∫∫ h(ζ1)h(ζ2)Rxx(τ + ζ1 − ζ2) dζ1 dζ2    (11)
In most applications, physical realizability requires h(ζ) = 0 for ζ < 0 (see also Sec. 9.4-3).
(b) The important input-output relations (11) are greatly simplified if they are expressed in terms of spectral densities (Sec. 18.10-3):

Φxy(ω) = H(iω)Φxx(ω)    (12)
Φyx(ω) = H(−iω)Φxx(ω)    (13)
Φyy(ω) = |H(iω)|²Φxx(ω)    (14)

where H(iω) ≡ ∫ h(ζ)e^(−iωζ) dζ is the frequency-response function of the system.
(c) Note also

E{y²} = Ryy(0) = (1/2π) ∫ Φyy(ω) dω    (15)

In the special case of stationary white-noise input with Rxx(τ) ≡ Φ0δ(τ) (Sec. 18.11-4b), note

Ryy(τ) = Φ0 ∫ h(ζ)h(ζ + τ) dζ    (16)        E{y²} = Φ0 ∫ h²(ζ) dζ    (17)
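The stationary input-output relations have an exact discrete-time, periodic analogue that can be verified numerically. The Python sketch below (sequence length, weighting-function taps, and the auxiliary spectral density are illustrative assumptions) checks that the correlation-domain form of (11) and the frequency-domain form Φyy = |H|²Φxx agree, with all indices taken modulo N.

```python
import numpy as np

# Discrete-time, periodic check of the stationary input-output relations:
# with y[n] = sum_k h[k] x[n-k] (indices mod N), one has
#   Ryy[m] = sum_{j,k} h[j] h[k] Rxx[m + j - k]   and   Phi_yy = |H|^2 Phi_xx,
# where Phi denotes the DFT of the correlation sequence.
rng = np.random.default_rng(4)
N = 64
h = np.zeros(N)
h[:4] = [1.0, -0.5, 0.25, 0.1]                 # causal weighting function

g = rng.normal(size=N)                         # auxiliary sequence
Phi_xx = np.abs(np.fft.fft(g))**2              # a valid (nonnegative) spectral density
Rxx = np.fft.ifft(Phi_xx).real                 # the corresponding correlation sequence

# correlation-domain relation (11), discrete form
Ryy = np.array([sum(h[j] * h[k] * Rxx[(m + j - k) % N]
                    for j in range(4) for k in range(4)) for m in range(N)])
# frequency-domain relation (14), discrete form
Phi_yy = np.abs(np.fft.fft(h))**2 * Phi_xx
Ryy_from_spectrum = np.fft.ifft(Phi_yy).real
```

Both routes give the same Ryy to rounding error, since the two relations are DFT transforms of one another.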
18.12-4. Relations for t Correlation Functions and Non-ensemble Spectra. The relations (2), (4), and (10) to (17) all hold if each ensemble average, correlation function, and spectral density is replaced by the corresponding t average, t correlation function, and non-ensemble spectral density (Secs. 18.10-7 to 18.10-9), whenever these quantities exist.
18.12-5. Nonlinear Operations. Given a random process generating x(t) and a single-valued, measurable function y = y(x), the functions

y(t) ≡ y[x(t)]    (18)

represent a new random process produced by a (generally nonlinear) zero-memory operation on the x(t); y(x) does not depend explicitly on t. Distributions and ensemble averages of the y process are obtained by the methods of Secs. 18.5-2 and 18.5-4. In particular, the autocorrelation function of the “output” y is, for real variables,

Ryy(t1, t2) = E{y1y2} = ∫∫ y(x1)y(x2)φ(x1, x2; t1, t2) dx1 dx2    (19)

where x1 = x(t1), x2 = x(t2); y1 = y(x1), y2 = y(x2), and φ(x1, x2; t1, t2) is the second-order probability density of the x process.
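For a Gaussian input the double integral (19) can often be evaluated in closed form. A classical example is the hard limiter y = a sgn x, for which the integral over the joint normal density yields the arcsine law Ryy = (2a²/π) arcsin ρxx. The Python sketch below (correlation value and sample size are illustrative assumptions) checks this by Monte Carlo.

```python
import numpy as np

# Monte Carlo check of a zero-memory nonlinearity on a Gaussian process:
# for y = sgn(x) with unit-variance jointly normal x1, x2 of correlation
# rho, Eq. (19) gives the arcsine law  Ryy = (2/pi) * arcsin(rho).
rng = np.random.default_rng(5)
rho, trials = 0.6, 400_000
cov = [[1.0, rho], [rho, 1.0]]
x1, x2 = rng.multivariate_normal([0, 0], cov, trials).T
R_est = np.mean(np.sign(x1) * np.sign(x2))
R_theory = (2 / np.pi) * np.arcsin(rho)
```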
If this turns out to be more convenient, Ryy(t1, t2) can be obtained in the form

Ryy(t1, t2) = −(1/4π²) ∫C1 ∫C2 f(s1)f(s2)E{e^(s1x1 + s2x2)} ds1 ds2    (20)

where f(s) is a (suitably generalized) Laplace transform of the transfer characteristic y(x), and the integration contours C1, C2 parallel the imaginary axis in suitable absolute-convergence strips (Ref. 18.15). The “transform method” is especially useful in connection with certain practically important transfer characteristics y(x), e.g., limiters, half-wave detectors, quantizers, etc. (Refs. 18.13 and 18.15).
18.12-6. Nonlinear Operations on Gaussian Processes. (a) Price's Theorem (Ref. 18.17). Given two jointly normal random variables x1, x2 with covariance λ12 and a function f(x1, x2) such that

|f(x1, x2)| ≤ ae^(|x1|^b + |x2|^b)

for some real a > 0, b < 2, then

∂E{f(x1, x2)}/∂λ12 = E{∂²f(x1, x2)/∂x1∂x2}
Price's theorem yields ensemble averages (and, in particular, correlation functions) in the form

E{f(x1, x2)} = ∫ (0 to λ12) E{∂²f/∂x1∂x2} dλ + C

where C is the value of E{f(x1, x2)} for λ12 = 0, i.e., for uncorrelated x1, x2.
Price's theorem also leads to the useful recursion formula

∂E{x1^m x2^n}/∂λ12 = mnE{x1^(m−1) x2^(n−1)}

In particular, for E{x1} = E{x2} = 0,

E{x1²x2²} = E{x1²}E{x2²} + 2λ12²
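The special case above follows by applying Price's theorem to f = x1²x2²: ∂E{f}/∂λ12 = E{4x1x2} = 4λ12, and integrating from the uncorrelated case (C = E{x1²}E{x2²}) gives E{x1²x2²} = E{x1²}E{x2²} + 2λ12². The Python sketch below (variances, covariance, and sample size are illustrative assumptions) checks this consequence by Monte Carlo.

```python
import numpy as np

# Monte Carlo check of a consequence of Price's theorem for f = x1^2 x2^2:
#   d E{f}/d lambda12 = E{4 x1 x2} = 4 lambda12,
# so integrating from lambda12 = 0 gives
#   E{x1^2 x2^2} = sigma1^2 sigma2^2 + 2 lambda12^2.
rng = np.random.default_rng(6)
s1, s2, lam, trials = 1.5, 0.8, 0.9, 400_000
cov = [[s1**2, lam], [lam, s2**2]]
x1, x2 = rng.multivariate_normal([0, 0], cov, trials).T
lhs = np.mean(x1**2 * x2**2)
rhs = s1**2 * s2**2 + 2 * lam**2
```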
(b) Series Expansion. Given a stationary Gaussian process x(t) with E{x} = 0, Rxx(τ) = σ²ρxx(τ) and a function y = y(x) such that Ryy(τ) exists, then

Ryy(τ) = Σ (k = 0 to ∞) (ak²/k!) ρxx^k(τ)        ak = (2π)^(−1/2) ∫ y(σv)Hk(v)e^(−v²/2) dv

where the Hk(v) are Hermite polynomials orthogonal with respect to the weight e^(−v²/2) (see also Table 21.7-1).
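The series can be checked exactly for a polynomial nonlinearity. For y = x³, only a1 = 3σ³ and a3 = 6σ³ are nonzero, and the series sums to 9σ⁶ρ + 6σ⁶ρ³, which agrees with the direct Gaussian-moment (Isserlis) evaluation of E{x1³x2³}. The Python sketch below (σ and ρ values are illustrative assumptions) computes the coefficients ak by Gauss-Hermite quadrature with the weight e^(−v²/2) and compares the truncated series with the exact value.

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e as He

# Check of the Hermite-series expansion for the zero-memory output
# y = x^3 of a stationary Gaussian x(t) with E{x} = 0, Rxx = sigma^2*rho:
#   Ryy = sum_k (a_k^2 / k!) rho^k,   a_k = E{ y(sigma*v) He_k(v) },
# v standard normal.  For y = x^3 the exact (Isserlis) result is
#   Ryy = 9 sigma^6 rho + 6 sigma^6 rho^3.
sigma, rho = 1.3, 0.7
v, w = He.hermegauss(40)                 # nodes/weights for weight exp(-v^2/2)
a = [np.sum(w * (sigma * v)**3 * He.hermeval(v, [0] * k + [1])) / np.sqrt(2 * np.pi)
     for k in range(6)]                  # a_k by Gauss-Hermite quadrature
series = sum(a[k]**2 / factorial(k) * rho**k for k in range(6))
exact = 9 * sigma**6 * rho + 6 * sigma**6 * rho**3
```

The quadrature is exact here (the integrands are polynomials of degree at most 8), so the two values agree to rounding error.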
18.13. RELATED TOPICS, REFERENCES, AND BIBLIOGRAPHY
18.13-1. Related Topics. The following topics related to the study of probability theory and random processes are treated in other chapters of this handbook:
Measure, Lebesgue integrals, Stieltjes integrals, Fourier analysis Chap. 4
Construction of mathematical models, abstract spaces, Boolean algebras Chap. 12
Orthogonal-function expansions Chap. 15
Mathematical statistics, random-process measurements and tests Chap. 19
Permutations and combinations Appendix C
18.13-2. References and Bibliography (see also Sec. 19.9-2).
18.1. Arley, N., and K. R. Buch: Introduction to the Theory of Probability and Statistics, Wiley, New York, 1950.
18.2. Burington, R. S., and D. C. May: Handbook of Probability and Statistics, 2d ed., McGraw-Hill, New York, 1967.
18.3. Cramér, H.: Mathematical Methods of Statistics, Princeton University Press, Princeton, N.J., 1951.
18.4.———: The Elements of Probability Theory and Some of Its Applications, Wiley, New York, 1955.
18.5. Feller, W.: An Introduction to Probability Theory and Its Applications, vol. I, 2d ed., Wiley, New York, 1958; vol. II, 1966.
18.6. Gnedenko, B. V.: Theory of Probability, Chelsea, New York, 1962.
18.7.——— and A. I. Khinchine: An Elementary Introduction to the Theory of Probability, Dover, New York, 1961.
18.8. Loève, M. M.: Probability Theory, 3d ed., Van Nostrand, Princeton, N.J., 1963.
18.9. Parzen, E.: Modern Probability Theory and Its Applications, Wiley, New York, 1960.
18.10. Richter, H.: Wahrscheinlichkeitstheorie, 2d ed., Springer, Berlin, 1967.
Random Processes
18.11. Bailey, N. T. J.: The Elements of Stochastic Processes with Applications to the Natural Sciences, Wiley, New York, 1964.
18.12. Bharucha-Reid, A. J.: Elements of the Theory of Markov Processes and Their Applications, McGraw-Hill, New York, 1960.
18.13. Davenport, W. B., Jr., and W. L. Root: Introduction to Random Signals and Noise, McGraw-Hill, New York, 1958.
18.14. Doob, J. L.: Stochastic Processes, Wiley, New York, 1953.
18.15. Middleton, D.: An Introduction to Statistical Communication Theory, McGraw-Hill, New York, 1960.
18.16. Parzen, E.: Stochastic Processes, Holden-Day, San Francisco, 1962.
18.17. Papoulis, A.: Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 1965.
18.18. Rosenblatt, M.: Random Processes, Oxford, New York, 1962.
18.19. Saaty, T. L.: Elements of Queueing Theory with Applications, McGraw-Hill, New York, 1961.