CHAPTER 18
PROBABILITY THEORY AND RANDOM PROCESSES
18.2. Definition and Representation of Probability Models
18.2-1. Algebra of Events Associated with a Given Experiment
18.2-2. Mathematical Definition of Probabilities. Conditional Probabilities
18.2-3. Statistical Independence
18.2-4. Compound Experiments, Independent Experiments, and Independent Repeated Trials
18.2-7. Representation of Events as Sets in a Sample Space
18.3. One-dimensional Probability Distributions
18.3-1. Discrete One-dimensional Probability Distributions
18.3-2. Continuous One-dimensional Probability Distributions
18.3-5. Chebyshev's Inequality and Related Formulas
18.3-6. Improved Description of Probability Distributions: Use of Stieltjes Integrals
18.3-7. Moments of a One-dimensional Probability Distribution
18.3-8. Characteristic Functions and Generating Functions
18.4. Multidimensional Probability Distributions
18.4-2. Two-dimensional Probability Distributions. Marginal Distributions
18.4-3. Discrete and Continuous Two-dimensional Probability Distributions
18.4-4. Expected Values, Moments, Covariance, and Correlation Coefficient
18.4-5. Conditional Probability Distributions Involving Two Random Variables
18.4-7. n-dimensional Probability Distributions
18.4-8. Expected Values and Moments
18.4-9. Regression. Multiple and Partial Correlation Coefficients
18.4-10. Characteristic Functions
18.4-11. Statistically Independent Random Variables
18.4-12. Entropy of a Probability Distribution, and Related Topics
18.5. Functions of Random Variables. Change of Variables
18.5-2. Functions (or Transformations) of a One-dimensional Random Variable
18.5-3. Linear Functions (or Linear Transformations) of a One-dimensional Random Variable
18.5-4. Functions and Transformations of Multidimensional Random Variables
18.5-5. Linear Transformations
18.5-6. Mean and Variance of a Sum of Random Variables
18.5-7. Sums of Statistically Independent Random Variables
18.5-8. Compound Distributions
18.6. Convergence in Probability and Limit Theorems
18.6-1. Sequences of Probability Distributions. Convergence in Probability
18.6-4. Asymptotically Normal Probability Distributions
18.7. Special Techniques for Solving Probability Problems
18.8. Special Probability Distributions
18.8-1. Discrete One-dimensional Probability Distributions
18.8-2. Discrete Multidimensional Probability Distributions
18.8-3. Continuous Probability Distributions: the Normal (Gaussian) Distribution
18.8-4. Normal Random Variables: Distribution of Deviations from the Mean
18.8-5. Miscellaneous Continuous One-dimensional Probability Distributions
18.8-6. Two-dimensional Normal Distributions
18.8-7. Circular Normal Distributions
18.8-8. n-dimensional Normal Distributions
18.8-9. Addition Theorems for Special Probability Distributions
18.9. Mathematical Description of Random Processes
18.9-2. Mathematical Description of Random Processes
(b) Ensemble Correlation Functions and Mean Squares
(d) Ensemble Averages of Integrals and Derivatives
18.9-4. Processes Defined by Random Parameters
18.9-5. Orthogonal-function Expansions
18.10. Stationary Random Processes. Correlation Functions and Spectral Densities
18.10-1. Stationary Random Processes
18.10-2. Ensemble Correlation Functions
18.10-3. Ensemble Spectral Densities
18.10-4. Correlation Functions and Spectra of Real Processes
18.10-5. Spectral Decomposition of Mean “Power” for Real Processes
18.10-6. Some Alternative Ensemble Spectral Densities
18.10-7. t Averages and Ergodic Processes (a) t Averages (b) Ergodic Processes
18.10-8. Non-ensemble Correlation Functions and Spectral Densities
18.10-9. Functions with Periodic Components
18.10-10. Generalized Fourier Transforms and Integrated Spectra
18.11. Special Classes of Random Processes. Examples
18.11-1. Processes with Constant and Periodic Sample Functions
(c) More General Periodic Processes
18.11-2. Band-limited Functions and Processes. Sampling Theorems
18.11-3. Gaussian Random Processes
18.11-4. Markov Processes and the Poisson Process
(a) Random Processes of Order n
18.11-5. Some Random Processes Generated by a Poisson Process
(b) Process Generated by Poisson Sampling
(c) Impulse Noise and Campbell's Theorem
18.11-6. Random Processes Generated by Periodic Sampling
18.12. Operations on Random Processes
18.12-1. Correlation Functions and Spectra of Sums
18.12-2. Input-Output Relations for Linear Systems
18.12-4. Relations for t Correlation Functions and Non-ensemble Spectra
18.12-6. Nonlinear Operations on Gaussian Processes
18.13. Related Topics, References, and Bibliography
18.13-2. References and Bibliography
18.1-1. Mathematical probabilities are values of a real numerical function defined on a class of idealized events, which represent results of an experiment or observation. Mathematical probabilities are not defined directly in terms of “likelihood” or relative frequency of occurrence; they are introduced by a set of defining postulates (Sec. 18.2-2; see also Sec. 12.1-1) which abstract essential properties of statistical relative frequencies (Sec. 19.2-1). The concept of probability can, then, often be related to reality by the assumption that, in practically every sequence of independently repeated experiments, the relative frequency of each event tends to a limit represented by the corresponding probability (Sec. 19.2-1).* Theories based on the probability concept may, however, be useful even if they are not subject to direct statistical interpretation.
Probability theory deals with the definition and description of models involving the probability concept. The theory is especially concerned with methods for calculating the probability of an event from the known or postulated probabilities of other events which are logically related to the first event. Most applications of probability theory may be interpreted as special cases of random processes (Secs. 18.8-1 to 18.11-5).
* Whenever this proposition is justified, it must be regarded as a law of nature; it should not be confused with mathematical theorems like Bernoulli's theorem or the mathematical law of large numbers (Sec. 18.6-5).
18.2. DEFINITION AND REPRESENTATION OF PROBABILITY MODELS
18.2-1. Algebra of Events Associated with a Given Experiment. Each probability model describes a specific idealized experiment or observation having a class δ† of theoretically possible results (events, states) E permitting the following definitions.
The union (logical sum) E1 ∪ E2 ∪ . . . (or E1 + E2 + . . .) of a countable (finite or infinite) set of events E1, E2, . . . is the event of realizing at least one of the events E1, E2, . . . .
The intersection (logical product) E1∩ E2 (or E1E2) of two events E1 and E2 is the joint event of realizing both E1 and E2.
The (logical) complement Ẽ of an event E is the event of not realizing E (“opposite” or complementary event of E).
I is the certain event of realizing at least one of the events of δ†.
0 is the impossible event of realizing no one of the events of δ†.
In each case, the class δ of events comprising δ† and 0 is to constitute a completely additive Boolean algebra (algebra of events associated with the given experiment or observation) having all the properties outlined in Secs. 12.8-1 and 12.8-4. Either E1 ∪ E2 = E1 or E1 ∩ E2 = E2 implies the logical inclusion relation E2 ⊂ E1 (E2 implies E1); note 0 ⊂ E ⊂ I. E1 and E2 are mutually exclusive (disjoint) if and only if E1 ∩ E2 = 0. The set δ1 of joint events E ∩ E1 is the algebra of events associated with the given experiment under the hypothesis that E1 occurs; E1 ∩ E1 = E1 is the certain event in δ1 (see also Sec. 12.8-3).
18.2-2. Mathematical Definition of Probabilities. Conditional Probabilities. It is possible to assign a (mathematical) probability P[E] (probability of E, probability of realizing the event E) to each event E of the class δ (event algebra, Sec. 18.2-1) associated with a given experiment if and only if one can define a single-valued real function P[E] on δ so that
Postulates 1 to 3 imply 0 ≤ P[E] ≤ 1; in particular, P[E] = 0 if E is an impossible event. Note carefully that P[E] = 1 or P[E] = 0 do not necessarily imply that E is, respectively, certain or impossible.
A fourth defining postulate relates the “absolute” probability P[E] associated with the given experiment to the “conditional” probabilities P[E|E1] referring to a “simpler” experiment restricted by the hypothesis that E1 occurs. The conditional probability P[E|E1] of E on (relative to) the hypothesis that the event E1 occurs is defined by the postulate
P[E|E1] is not defined if P[E1] = 0.
In the context of the restricted experiment, the quantities P[E|E1] are ordinary probabilities associated with the joint events E ∩ E1 constituting the event algebra δ1 of the restricted experiment (Sec. 18.2-1). In practice, every probability can be interpreted as a conditional probability relative to some hypothesis implied by the experiment under consideration.
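The definition P[E|E1] = P[E ∩ E1]/P[E1] can be checked on any finite sample space with a simple enumeration. The sketch below uses an assumed two-dice experiment (36 equally likely simple events, an illustration not taken from the text):

```python
from fractions import Fraction

# Assumed finite sample space: two fair dice, each of the 36 simple
# events carrying probability 1/36 (illustrative choice).
space = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    """P[E] as the sum of simple-event probabilities (Sec. 18.2-7)."""
    return Fraction(sum(1 for s in space if event(s)), len(space))

def cond_prob(event, hypothesis):
    """P[E|E1] = P[E ∩ E1] / P[E1]; undefined when P[E1] = 0."""
    p_h = prob(hypothesis)
    if p_h == 0:
        raise ValueError("P[E1] = 0: conditional probability undefined")
    return prob(lambda s: event(s) and hypothesis(s)) / p_h

doubles = lambda s: s[0] == s[1]
sum_ge_10 = lambda s: s[0] + s[1] >= 10

# Two of the six outcomes with sum >= 10 are doubles.
print(cond_prob(doubles, sum_ge_10))  # 1/3
```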
18.2-3. Statistical Independence. Two events E1 and E2 are statistically independent (stochastically independent) if and only if
so that P[E1|E2] = P[E1] if P[E2] ≠ 0, and P[E2|E1] = P[E2] if P[E1] ≠ 0.
N events E1, E2, . . . , EN are statistically independent if and only if not only each pair of events Ei, Ek but also each pair of possible joint events is statistically independent:
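Pairwise independence does not suffice: the classic Bernstein example (an assumed illustration, not from the text) exhibits three events that are independent in pairs but not mutually independent.

```python
from fractions import Fraction
from itertools import combinations

# Bernstein's example: four equally likely sample points; E1, E2, E3
# are pairwise independent, yet P[E1∩E2∩E3] != P[E1]P[E2]P[E3].
space = {1, 2, 3, 4}
E1, E2, E3 = {1, 2}, {1, 3}, {1, 4}

def P(event):
    return Fraction(len(event & space), len(space))

pairwise = all(P(A & B) == P(A) * P(B)
               for A, B in combinations([E1, E2, E3], 2))
mutual = P(E1 & E2 & E3) == P(E1) * P(E2) * P(E3)
print(pairwise, mutual)  # True False
```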
18.2-4. Compound Experiments, Independent Experiments, and Independent Repeated Trials. Frequently an experiment appears as a combination of component experiments (see also Secs. 18.7-3 and 18.8-1). Let E′, E″, E′″, . . . denote any result associated, respectively, with the first, second, third, . . . component experiment. The results of the compound experiment can be described as joint events E = E′ ∩ E″ ∩ E′″ ∩ . . . ; their probabilities will, in general, depend on the nature and interaction of all component experiments. The probability P[E′] of realizing the component result E′ in the course of a given compound experiment is, in general, different from the probability associated with E′ in an independently performed component experiment.
Two or more component experiments of a given compound experiment are independent if and only if their respective results E′, E″, E′″, . . . obtained in the course of the compound experiment are statistically independent, i.e.,
for all E′, E″, E′″, . . . (Sec. 18.2-3). If a component experiment is independent of all others, the probability of realizing each of its results in the course of the given compound experiment is equal to the corresponding probability for the independently performed component experiment.
Repeated independent trials are independent experiments each having the same set of possible results E and the same set of associated probabilities P[E]. The probability of obtaining the sequence of results E1, E2, . . . , En in the compound experiment corresponding to a sequence of n repeated independent trials is
18.2-5. Combination Rules (see also Secs. 18.7-1 to 18.7-3). Each of the theorems in Table 18.2-1 expresses the probability of an event in terms of the (possibly already known) probabilities of other events logically related to the first event.
More generally, the probability of realizing at least m, and that of realizing exactly m, of N (not necessarily statistically independent) events E1, E2, . . . , EN is, respectively
If E1, E2, . . . , EN are statistically independent, the quantities (5) reduce to the symmetric functions (1.4-9) of the P[Ei] (Table 18.2-1b).
EXAMPLES: If the probability of each result of a throw with a die is 1/6, then
The probability of throwing either 1 or 6 is 1/6 + 1/6 = 1/3
The probability of throwing 6 at least once in two throws is 1/6 + 1/6 – 1/36 = 11/36
The probability of throwing 6 exactly once in two throws is 1/3 – 2/36 = 5/18
The probability of throwing 6 twice in two throws is 1/36; etc.
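The worked dice probabilities above can be confirmed by direct enumeration of the 36 equally likely outcomes of two throws; a minimal sketch:

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two throws of a fair die.
throws = list(product(range(1, 7), repeat=2))

def P(pred):
    return Fraction(sum(1 for t in throws if pred(t)), len(throws))

p_at_least_once = P(lambda t: 6 in t)                    # 11/36
p_exactly_once  = P(lambda t: (t[0] == 6) != (t[1] == 6))  # 5/18
p_twice         = P(lambda t: t == (6, 6))               # 1/36
print(p_at_least_once, p_exactly_once, p_twice)  # 11/36 5/18 1/36
```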
18.2-6. Bayes's Theorem (see also Sec. 18.4-5b). Let H1, H2, . . . be a set of mutually exclusive events such that H1 ∪ H2 ∪ . . . = I. Then, for each pair of events Hi, E,
Equation (7) can be used to relate the “a priori” probability P[Hi] of a hypothetical cause Hi of the event E to the “a posteriori” probability P[Hi|E] if (and only if) the Hi are “random” events permitting the definition of probabilities P[Hi].
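The a-posteriori probabilities P[Hi|E] follow from the priors P[Hi] and the conditional probabilities P[E|Hi] by Eq. (7); the two-hypothesis numbers below are an assumed illustration, not from the text.

```python
from fractions import Fraction

# Bayes's theorem (7): P[Hi|E] = P[E|Hi]P[Hi] / sum_k P[E|Hk]P[Hk].
def bayes(priors, likelihoods):
    """priors: P[Hi]; likelihoods: P[E|Hi]. Returns the list P[Hi|E]."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)          # P[E], by the total-probability rule
    return [j / total for j in joint]

priors      = [Fraction(1, 100), Fraction(99, 100)]   # P[H1], P[H2]
likelihoods = [Fraction(95, 100), Fraction(5, 100)]   # P[E|H1], P[E|H2]
posterior = bayes(priors, likelihoods)
print(posterior[0])  # 19/118
```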
Table 18.2-1. Probabilities of Logically Related Events
18.2-7. Representation of Events as Sets in a Sample Space. Every class S of events E permitting the definition of probabilities P[E] can be described in terms of a set T of mutually exclusive events Ê ≠ 0 such that each event E is the union of a corresponding subset of T. T is called a sample space or fundamental probability set associated with the given experiment; each set of sample points (simple events, elementary events, phases) Ê of T corresponds to an event E. In particular, T itself corresponds to a certain event, and an empty subset of T corresponds to an impossible event.
The probabilities P[E] can then be regarded as values of a set function, the probability function defining the probability distribution of the sample space. Each probability P[E] is the sum of the probabilities attached to the simple events included in the event E.
The event algebra S is thus represented isomorphically by an algebra of measurable sets (see also Secs. 4.6-17b and 12.8-4). The fundamental probability set associated with the conditional probabilities P[E|E1] is the subset of T representing E1. Conversely, a sample space associated with any given experiment may be regarded as a subset “embedded” in a space of events associated with a more general experiment (see also Secs. 18.2-1 and 18.2-2).
18.2-8. Random Variables. A random variable (stochastic variable, chance variable, variate) is any (not necessarily numerical)* variable x whose “values” x = X constitute a fundamental probability set (sample space, Sec. 18.2-7) of simple events [x = X], or whose values label the points of a sample space on a reciprocal one-to-one basis. The associated probability distribution is the distribution of the random variable x. The definition of any random variable must specify its distribution.
Every single-valued measurable function (Sec. 4.6-14c) x defined on any fundamental probability set T is a random variable; its distribution is defined by the probabilities of the events (measurable subsets of T, Sec. 18.2-7) corresponding to each set of values of x.
18.2-9. Representation of Probability Models in Terms of Numerical Random Variables and Distribution Functions. The simple events (sample points) Ê of the fundamental probability set associated with a given problem are frequently labeled with corresponding values (sample values) X of a real numerical random variable x. Each sample value of x may, for instance, correspond to the result of a measurement defining a simple event. Compound events, like [x ≤ a], [sin x > 0.5], or [x = arctan 2], correspond to measurable sets of values of x (see also Sec. 18.2-8).
More generally, each simple event may be labeled by a corresponding (ordered) set X ≡ (X1, X2, . . .) of real numbers X1, X2, . . . which
* The boldface type used to denote a multidimensional random variable x does not necessarily imply that x is a vector.
constitutes a “value” of a multidimensional random variable x ≡ (x1, x2, . . .). Each of the real variables x1, x2, . . . is itself a random variable (see also Sec. 18.4-1).
Given a random variable x; or x labeling the simple events of the given fundamental probability set on a one-to-one basis, the probabilities associated with the corresponding experiment are uniquely described by the probability distribution of the random variable.
Throughout this handbook, all real numerical random variables are understood to range from –∞ to +∞; values of a numerical random variable which do not label a possible simple event Ê are treated as impossible events and are assigned the probability zero.
The distribution (or the probability function, Sec. 18.2-7) of any real numerical random variable x is uniquely described by its (cumulative) distribution function
Similarly, the distribution of a multidimensional random variable x ≡ (x1, x2, . . .) is uniquely described by its (cumulative) distribution function
Conversely, the distribution function corresponding to a given probability distribution is uniquely defined for all values of the random variable in question. Every distribution function is a nondecreasing function of each of its arguments, and
18.3. ONE-DIMENSIONAL PROBABILITY DISTRIBUTIONS
18.3-1. Discrete One-dimensional Probability Distributions (see Tables 18.8-1 to 18.8-7 for examples). The real numerical random variable x is a discrete random variable (has a discrete probability distribution) if and only if the probability
is different from zero only on a countable set of spectral values X = X(1), X(2), . . . (spectrum of the discrete random variable x). Each discrete probability distribution is defined by the function (1), or by the corresponding (cumulative) distribution function (Sec. 18.2-9)
Throughout this handbook, the notation will be used to
signify summation of a function y(x) over all spectral values X(i) of a discrete random variable x (see also Sec. 18.3-6). Note
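The step distribution function (2) is simply the running sum of the probabilities (1) over spectral values X(i) ≤ X. A minimal sketch, using an assumed three-point spectrum (an illustration, not from the text):

```python
from fractions import Fraction

# Assumed discrete distribution: spectrum {0, 1, 2} with probabilities
# p(X) as in Eq. (1); Eq. (3) requires the probabilities to sum to 1.
p = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
assert sum(p.values()) == 1   # normalization

def Phi(X):
    """Cumulative distribution function Phi(X) = P[x <= X], Eq. (2)."""
    return sum(prob for xi, prob in p.items() if xi <= X)

print(Phi(0), Phi(1.5), Phi(5))  # 1/4 3/4 1
```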
18.3-2. Continuous One-dimensional Probability Distributions (see Table 18.8-8 for examples). The real numerical random variable x is a continuous random variable (has a continuous probability distribution) if and only if its (cumulative) distribution function Φx(X) ≡ Φ(X) is continuous and has a piecewise continuous derivative, the frequency function (probability density, differential distribution function) of x
for all X.† P[X < x ≤ X + dx] = dΦ = φ(X) dx is called a probability element (probability differential). Note
If x is a continuous random variable, each event [x = X] has the probability zero but is not necessarily impossible. The spectrum of a continuous random variable x is the set of values x = X where φ(X) ≠ 0.
* In terms of the step function U−(t) [U−(t) = 0 if t < 0, U−(t) = 1 if t ≥ 0, Sec. 21.9-1],
† Some authors call a probability distribution continuous whenever its distribution function is continuous.
NOTE: A random variable can be continuous (i.e., have a piecewise continuous frequency function) over part of its range, while it is discrete elsewhere (see also Sec. 18.3-6).
18.3-3. Expected Values and Variance. Characteristic Parameters of One-dimensional Probability Distributions (see also Sec. 18.3-6). (a) The expected value (mean, mean value, mathematical expectation) of a function y(x) of a discrete or continuous random variable x is
if this expression exists in the sense of absolute convergence (see also Secs. 4.6-2 and 4.8-1).
(b) In particular, the expected value (mean, mean value, mathematical expectation) E{x} = ξ and the variance Var {x} = σ2 of a discrete or continuous one-dimensional random variable x are defined by
For computation purposes note (see also Sec. 18.3-10)
Whenever E{x} and Var {x} exist, the mean square deviation
of the random variable x from one of its values X is least (and equal to σ2) for X = ξ.
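The identity behind this statement is E{(x − X)2} = Var {x} + (X − ξ)2, so the mean-square deviation is least, and equal to σ2, exactly at X = ξ. A numerical check on an assumed two-point distribution (an illustration, not from the text):

```python
from fractions import Fraction

# Assumed two-point distribution: P[x=0] = 1/3, P[x=3] = 2/3.
p = {0: Fraction(1, 3), 3: Fraction(2, 3)}
xi = sum(x * w for x, w in p.items())               # E{x} = 2
var = sum((x - xi) ** 2 * w for x, w in p.items())  # Var{x} = 2

def msd(X):
    """Mean-square deviation E{(x - X)^2}."""
    return sum((x - X) ** 2 * w for x, w in p.items())

assert msd(xi) == var                               # minimum at X = xi
assert all(msd(X) >= var for X in (-1, 0, 1, 3, 10))
print(xi, var, msd(xi))  # 2 2 2
```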
(c) E{x} and Var {x} are not functions of x; they are functionals (Sec. 12.1-4) describing properties of the distribution of x. E{x} is a measure of location, and Var {x} is a measure of dispersion (or concentration) of the probability distribution of x. A number of other numerical “characteristic parameters” describing specific properties of one-dimensional probability distributions are defined in Table 18.3-1 and in Secs. 18.3-7 and 18.3-9. Note that one or more parameters like E{x}, Var {x}, E{|x — ξ|}, . . . may not exist for a given probability distribution.
(d) Tables 18.8-1 to 18.8-8 list mean values and variances for a number of frequently used probability distributions.
Table 18.3-1. Numerical Parameters Describing Properties of One-dimensional Probability Distributions (see also Secs. 18.3-3, 18.3-7, and 18.3-9)
18.3-4. Normalization. Given a function ψ(x) ≥ 0 known to be proportional to the function p(x) associated with a discrete random variable x (Sec. 18.3-1),*
Given a function ψ(x) ≥ 0 known to be proportional to the frequency function φ(x) of a continuous random variable x (Sec. 18.3-2),
In either case, k is called the normalization factor. Analogous procedures apply to multidimensional distributions (Sec. 18.4-1).
18.3-5. Chebyshev's Inequality and Related Formulas. The following formulas specify upper bounds for the probability that a random variable x, or its absolute deviation |x — ξ| from the mean value ξ = E{x}, exceeds a given value a > 0.
If x has a continuous distribution with a single mode (Table 18.3-1) ξmode, one has the stronger inequality
where s is Pearson's measure of skewness (Table 18.3-1); note that s = 0 if the distribution is symmetrical about the mode.
18.3-6. Improved Description of Probability Distributions: Use of Stieltjes Integrals. The treatment of discrete and continuous probability distributions is unified if one expresses the probability of each event [X – ΔX < x ≤ X + ΔX] as a Lebesgue-Stieltjes integral (Sec. 4.6-17)
* In order to conform with the notation used in many textbooks, the values x = X of a random variable x will be denoted simply by x whenever this notation does not lead to ambiguities.
where Φ(X) ≡ P[x ≤ X] is the cumulative distribution function (Secs. 18.2-9, 18.3-1, and 18.3-2) defining the distribution of the random variable x. For continuous distributions the Stieltjes integral (15) reduces to a Riemann integral. For a discrete distribution, Φ(X) is given by Eq. (2), and P[X — ΔX < x ≤ X + ΔX] reduces to the function p(X) defined in Sec. 18.3-1.
In terms of the Stieltjes-integral notation,
for both discrete and continuous distributions. The Stieltjes-integral notation applies also to probability distributions which are partly discrete and partly continuous. An analogous notation is used for multidimensional distributions (Secs. 18.4-4 and 18.4-8).
Discrete distributions may be formally represented in terms of a “probability density” involving impulse functions δ(X — X(i)) (see also Secs. 18.3-1 and 21.9-6).
18.3-7. Moments of a One-dimensional Probability Distribution (see also Secs. 18.3-6 and 18.3-10). (a) The moment of order r ≥ 0 (rth moment) about x = X of a given random variable x is the mean value E{(x — X)r}, if this quantity exists in the sense of absolute convergence (Sec. 18.3-3).
(b) In particular, the rth moment of x about X = 0 is
and the rth moment of x about its mean value ξ (central moment of order r) is
The existence of αr or µr implies the existence of all moments αk and µk of order k ≤ r; the divergence of αr or µr implies the divergence of all moments αk and µk of order k ≥ r.
If the probability distribution is symmetric about its mean, all (existing) central moments µr of odd order r are equal to zero.
(c) The rth factorial moment of x about X = 0 is
The rth central factorial moment of x is E{(x — ξ)[r]}. The rth absolute moment of x about X = 0 is βr = E{|x|r}. Note
(d) A one-dimensional probability distribution is uniquely defined by its moments α0, α1, α2, . . . if they all exist and are such that the series α0 + α1s + α2s2/2! + · · · + αrsr/r! + · · · converges absolutely for some |s| > 0 [see also Eq. (28) and the footnote to Sec. 18.3-8b].
(e) Refer to Tables 18.8-1 to 18.8-7 for examples, and to Sec. 18.3-10 for relations connecting the αr, µr, and α[r].
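The moments (17) and (18) and the familiar relation µ2 = α2 − α12 can be checked numerically; the three-point distribution below is an assumed illustration.

```python
from fractions import Fraction

# Assumed discrete distribution on the spectrum {1, 2, 6}.
p = {1: Fraction(1, 2), 2: Fraction(1, 3), 6: Fraction(1, 6)}

def alpha(r):
    """rth moment about the origin, alpha_r = E{x^r}."""
    return sum(x ** r * w for x, w in p.items())

xi = alpha(1)                      # mean, 13/6

def mu(r):
    """rth central moment, mu_r = E{(x - xi)^r}."""
    return sum((x - xi) ** r * w for x, w in p.items())

assert mu(1) == 0                  # first central moment vanishes
assert mu(2) == alpha(2) - alpha(1) ** 2
print(xi, mu(2))  # 13/6 113/36
```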
18.3-8. Characteristic Functions and Generating Functions (see also Sec. 18.3-6; refer to Tables 18.8-1 to 18.8-8 for examples).*
(a) The probability distribution of any one-dimensional random variable x uniquely defines its (generally complex-valued) characteristic function
where q is a real variable ranging between – ∞ and ∞.
(b) The probability distribution of a random variable x uniquely defines its moment-generating function
and its generating function (see also Sec. 8.6-5)
* See footnote to Sec. 18.3-4.
for each value of the complex variable s such that the function in question exists in the sense of absolute convergence.
(c) The characteristic function χx(q) defines the probability distribution of x uniquely.* The same is true for each of the functions Mx(s) and γx(s) if it exists, in the sense of absolute convergence, throughout an interval of the real axis including s = 0 in the case of Mx(s), and s = 1 in the case of γx(s). Specifically, if x is a discrete or continuous random variable,
Eq. (24) also yields p(x) or φ(x) in terms of Mx(s), since
(d) In many problems it is much easier to obtain a description of a probability distribution in terms of χx(q), Mx(s), or γx(s) than to compute Φ(x), p(x), or φ(x) directly (Secs. 18.5-3b, 18.5-7, and 18.5-8). Again, the methods of Sec. 18.3-10 permit one to compute mean values, variances, and moments by simple differentiations if χx(q), Mx(s), or γx(s) are known. The linear integral transformations (21) to (24) can often be made with the aid of tables of Fourier or Laplace transform pairs (Appendix D).
(e) The generating function γx(s) is particularly useful in problems involving discrete distributions with spectral values 0, 1, 2, . . . , for then
whenever the series converges (see also Sec. 18.8-1; see Ref. 18.4 for a number of interesting applications).
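For a distribution on 0, 1, 2, . . . , the generating function is the polynomial (or power series) γx(s) = Σk p(k)sk, and γx′(1) = E{x}. A minimal sketch on an assumed two-coin-toss distribution:

```python
from fractions import Fraction

# Assumed distribution of the number of heads in two fair coin tosses:
# p(0), p(1), p(2).
p = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

def gamma(s):
    """Generating function gamma_x(s) = sum_k p(k) s^k."""
    return sum(pk * s ** k for k, pk in enumerate(p))

def gamma_prime(s):
    """Derivative gamma_x'(s); gamma_x'(1) = E{x}."""
    return sum(k * pk * s ** (k - 1) for k, pk in enumerate(p) if k >= 1)

assert gamma(1) == 1        # normalization
mean = gamma_prime(1)
print(mean)  # 1
```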
18.3-9. Semi-invariants (see also Sec. 18.3-10). Given a one-dimensional probability distribution such that the rth moment αr exists, the first r semi-invariants (cumulants) k1, k2, . . . , kr of the distribution exist and are defined by
* Φ(x) is, then, uniquely defined except possibly on a set of measure zero; Φ(x) is unique wherever it is continuous (see also Sec. 18.2-9).
Under the conditions of Sec. 18.3-7d all semi-invariants k1, k2, . . . exist and define the distribution uniquely.
18.3-10. Computation of Moments and Semi-invariants from χx(q), Mx(s), and γx(s). Relations between Moments and Semi-invariants. Many properties of a distribution can be computed directly from χx(q), Mx(s), or γx(s) without previous computation of Φ(x), φ(x), or p(x). If the quantities in question exist,
Note
provided that the function on the left (respectively the moment generating function, the semi-invariant-generating function, and the factorial-moment-generating function of x) is analytic throughout a neighborhood of s = 0.
Equations (28) yield E{x} and Var {x} with the aid of the relations
Table 18.3-1 lists other parameters which can be expressed in terms of moments.
The following additional formulas relate moments and semi-invariants:
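As a numerical check of the lowest-order relations, k1 = α1, k2 = α2 − α12, and k3 = α3 − 3α1α2 + 2α13 (standard formulas, stated here as an assumed restatement of the relations above), computed on an assumed two-point distribution:

```python
from fractions import Fraction

# Assumed symmetric two-point distribution: P[x=0] = P[x=2] = 1/2.
p = {0: Fraction(1, 2), 2: Fraction(1, 2)}

def a(r):
    """Moment alpha_r about the origin."""
    return sum(x ** r * w for x, w in p.items())

k1 = a(1)
k2 = a(2) - a(1) ** 2
k3 = a(3) - 3 * a(1) * a(2) + 2 * a(1) ** 3
print(k1, k2, k3)  # 1 1 0  (symmetric about the mean, so k3 = 0)
```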
18.4. MULTIDIMENSIONAL PROBABILITY DISTRIBUTIONS
18.4-1. Joint Distributions (see also Sec. 18.2-9). The probability distribution of a multidimensional random variable x ≡ (x1, x2, . . .) is described as a joint distribution of real numerical random variables x1, x2, . . . . Each simple event (point of the multidimensional sample space) [x = X] ≡ [x1 = X1, x2 = X2, . . .] may be regarded as a result of a compound experiment in which each of the variables x1, x2, . . . is measured. Each joint distribution is completely defined by its (cumulative) joint distribution function.
18.4-2. Two-dimensional Probability Distributions. Marginal Distributions. The joint distribution of two random variables x1, x2 is defined by its (cumulative) distribution function
The distributions of x1 and x2 (marginal distributions derived from the joint distribution of x1 and x2) are described by the corresponding marginal distribution functions
18.4-3. Discrete and Continuous Two-dimensional Probability Distributions. (a) A two-dimensional random variable x ≡ (x1, x2) is a discrete random variable (has a discrete probability distribution) if and only if the joint probability
is different from zero only for a countable set (spectrum) of “points” (X1, X2), i.e., if and only if both x1 and x2 are discrete random variables (Sec. 18.3-1). The marginal probabilities respectively associated with the marginal distributions of x1 and x2 (Sec. 18.4-2) are
(b) A two-dimensional random variable x = (x1, x2) is a continuous random variable (has a continuous probability distribution) if and only if (1) Φ(X1, X2) is continuous for all X1, X2, and (2) the joint frequency function (probability density)
exists and is piecewise continuous everywhere.* φ(X1, X2) dx1 dx2 is called a probability element. The spectrum of a continuous two-dimensional probability distribution is the set of “points” (X1, X2) where the frequency function (5) is different from zero. The marginal frequency functions respectively associated with the (necessarily continuous) marginal distributions of x1 and x2 (Sec. 18.4-2) are
(c) Note
18.4-4. Expected Values, Moments, Covariance, and Correlation Coefficient. (a) The expected value (mean value, mathematical expectation) of a function y = y(x1, x2) of two random variables x1, x2 with respect to their joint distribution is
* See footnote to Sec. 18.3-2.
if this expression exists in the sense of absolute convergence (see also Sec. 18.3-3).
NOTE: If y is a function of x1 alone, the mean value (8) is identical with the mean value (marginal expected value) with respect to the marginal distribution of x1.
(b) The mean values E{x1} = ξ1, E{x2} = ξ2 define a “point” (ξ1, ξ2) called the center of gravity of the joint distribution. The quantities E{(x1 — X1)r1(x2 — X2)r2} are called moments of order r1 + r2 about the “point” (X1, X2). In particular, the quantities
are, respectively, the moments about the origin and the moments about the center of gravity (central moments) of order r1 + r2 (see also Sec. 18.3-7b).
(c) The second-order central moments are of special interest and warrant a special notation. Note the following definitions:
(see also Sec. 18.4-8). Note —1 ≤ ρ12 ≤ 1, and
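The covariance λ11 = E{(x1 − ξ1)(x2 − ξ2)} and correlation coefficient ρ12 = λ11/(σ1σ2) can be computed directly from a discrete joint distribution; the four-point distribution below is an assumed illustration.

```python
from fractions import Fraction
from math import sqrt

# Assumed discrete joint distribution p(X1, X2) on {0,1} x {0,1}.
joint = {(0, 0): Fraction(3, 8), (0, 1): Fraction(1, 8),
         (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8)}

def E(f):
    """Expected value E{f(x1, x2)} with respect to the joint distribution."""
    return sum(f(a, b) * w for (a, b), w in joint.items())

xi1, xi2 = E(lambda a, b: a), E(lambda a, b: b)
cov  = E(lambda a, b: (a - xi1) * (b - xi2))    # lambda11
var1 = E(lambda a, b: (a - xi1) ** 2)           # sigma1^2
var2 = E(lambda a, b: (b - xi2) ** 2)           # sigma2^2
rho  = float(cov) / sqrt(float(var1) * float(var2))
assert -1.0 <= rho <= 1.0
print(cov, rho)  # 1/8 0.5
```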
18.4-5. Conditional Probability Distributions Involving Two Random Variables. (a) The joint distribution of two random variables x1, x2 defines a conditional distribution of x1 relative to the hypothesis that x2 = X2 for each value X2 of x2 and a conditional distribution of x2 relative to each hypothesis x1 = X1. The conditional distributions of x1 and x2 derived from a discrete joint distribution (Sec. 18.4-3a) are discrete and may be described by the respective conditional probabilities (Sec. 18.2-2)
The conditional distributions of x1 and x2 derived from a continuous joint distribution (Sec. 18.4-3b) are continuous and may be described by the respective conditional frequency functions
(b) Note
(c) Given a discrete or continuous joint distribution of two random variables x1 and x2, the conditional expected value of a function y(x1, x2) relative to the hypothesis that x1 = X1 is
if this expression exists in the sense of absolute convergence. Note that E{y(x1, x2)|X1} is a function of X1.
EXAMPLE: The conditional variances of x1 and x2 are the respective functions
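For a discrete joint distribution, E{x2|X1} is the mean of x2 under the conditional probabilities p(X2|X1) = p(X1, X2)/p1(X1); a minimal sketch with an assumed three-point joint distribution:

```python
from fractions import Fraction

# Assumed discrete joint distribution p(X1, X2).
joint = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 4),
         (1, 1): Fraction(1, 4)}

def cond_mean_x2(X1):
    """Conditional expected value E{x2 | x1 = X1}."""
    pX1 = sum(w for (a, b), w in joint.items() if a == X1)   # marginal p1(X1)
    return sum(b * w for (a, b), w in joint.items() if a == X1) / pX1

print(cond_mean_x2(0), cond_mean_x2(1))  # 1/3 1
```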
18.4-6. Regression (see also Secs. 18.4-9 and 19.7-2). (a) Given the joint distribution of two random variables x1 and x2, a regression of x2 on x1 is any function g2(x1) used to approximate the statistical dependence of x2 on x1 by a deterministic relation x2 ≈ g2(x1). More specifically, x2 is written as a sum of two random variables,
where h2(x1, x2) is regarded as a correction term. In particular, the function
often simply called the regression of x2 on x1, minimizes the mean-square deviation
The corresponding curve x2 = E{x2|x1} is the (theoretical) mean-square regression curve of x2.
(b) It is often sufficient to approximate the regression (19) by the linear function
Equation (21) describes a straight line, the mean-square regression line of x2; β21 is the regression coefficient of x2 on x1. Equation (21) represents the linear function ax1 + b whose coefficients a, b minimize the mean-square deviation
The resulting minimum mean-square deviation is σ2²(1 — ρ12²); the correlation coefficient ρ12 is seen to measure the quality of the “best” linear approximation.
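The regression line x2 = ξ2 + β21(x1 − ξ1) with β21 = λ11/σ1², and the stated minimum mean-square deviation σ2²(1 − ρ12²), can be verified on an assumed discrete joint distribution:

```python
from fractions import Fraction

# Assumed discrete joint distribution p(X1, X2) on {0,1} x {0,1}.
joint = {(0, 0): Fraction(3, 8), (0, 1): Fraction(1, 8),
         (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8)}

def E(f):
    return sum(f(a, b) * w for (a, b), w in joint.items())

xi1, xi2 = E(lambda a, b: a), E(lambda a, b: b)
var1 = E(lambda a, b: (a - xi1) ** 2)
var2 = E(lambda a, b: (b - xi2) ** 2)
cov  = E(lambda a, b: (a - xi1) * (b - xi2))

beta21 = cov / var1   # regression coefficient of x2 on x1
mse = E(lambda a, b: (b - (xi2 + beta21 * (a - xi1))) ** 2)
# Minimum mean-square deviation equals sigma2^2 (1 - rho12^2):
assert mse == var2 - cov ** 2 / var1
print(beta21, mse)  # 1/2 3/16
```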
(c) The mean-square regression (19) may be approximated more closely by a polynomial of degree m (parabolic mean-square regression of order m) or by other approximating functions, with coefficients or parameters chosen so as to minimize (20).
(d) If x2 is regarded as the independent variable, one has similarly
Note that in general neither (19) and (22) nor (21) and (23) are inverse functions. All mean-square regression curves and mean-square regression lines pass through the center of gravity (ξ1, ξ2) of the joint distribution.
The above definitions apply, in particular, if either of the two random variables, say x1 = t, becomes a given independent variable, and x2(t) describes a random process (Sec. 18.9-1).
18.4-7. n-dimensional Probability Distributions. (a) The joint distribution of n random variables x1, x2, . . . , xn is uniquely described by its (cumulative) joint distribution function
(Sec. 18.2-9). The joint distribution of m < n of the variables x1, x2, . . . , xn is an m-dimensional marginal distribution derived from the original joint distribution. One obtains the corresponding marginal distribution function from the joint distribution function (24) by substituting Xj = ∞ for each of the n − m arguments Xj which do not occur in the marginal distribution, e.g.,
(b) An n-dimensional random variable x ≡ (x1, x2, . . . , xn) is a discrete random variable (has a discrete probability distribution) if and only if the joint probability
differs from zero only for a countable set (spectrum) of “points” (X1, X2, . . . , Xn), i.e., if and only if each of the n random variables x1, x2, . . . , xn is discrete (see also Secs. 18.3-1 and 18.4-3a).
Marginal probabilities and conditional probabilities are defined in the manner of Secs. 18.4-3a and 18.4-5a, e.g.,
(c) An n-dimensional random variable x ≡ (x1, x2, . . . , xn) is a continuous random variable (has a continuous probability distribution) if and only if (1) Φ(X1, X2, . . . , Xn) is continuous for all X1, X2, . . . , Xn and (2) the joint frequency function (probability density)
exists and is piecewise continuous everywhere.* φ(X1, X2, . . . , Xn) dx1 dx2 . . . dxn is called a probability element (see also Secs. 18.3-2 and 18.4-3b). The spectrum of a continuous probability distribution is the set of “points” (X1, X2, . . . , Xn) where the frequency function (26) is different from zero.
(d) Note
(e) The frequency functions associated with the (necessarily continuous) marginal and conditional distributions derived from a continuous n-dimensional probability distribution are defined in the manner of Secs. 18.4-3b and 18.4-5a, e.g.,
(f) The joint distribution of two or more multidimensional random variables x = (x1, x2, . . .), y = (y1, y2, . . .), . . . is the joint distribution of the random variables x1, x2, . . . ; y1, y2, . . . ; . . . .
NOTE: A joint distribution may be discrete with respect to one or more of the random variables involved, and continuous with respect to one or more of the others; and each random variable may be partly discrete and partly continuous.
18.4-8. Expected Values and Moments (see also Sec. 18.4-4). (a) The expected value (mean value, mathematical expectation) of a function y = y(x1, x2, . . . , xn) of n random variables x1, x2, . . . , xn with respect to their joint distribution is
* See footnote to Sec. 18.3-2.
if this expression exists in the sense of absolute convergence.
NOTE: If y is a function of only m < n of the n random variables x1, x2, . . . , xn, then the mean value (28) is identical with the mean value of y with respect to the joint distribution (marginal distribution, Sec. 18.4-7) of the m variables in question.
(b) The n mean values E{x1} = ξ1, E{x2} = ξ2, . . . , E{xn} = ξn define a “point” (ξ1, ξ2, . . . , ξn) called the center of gravity of the joint distribution. The quantities E{(x1 − X1)r1(x2 − X2)r2 . . . (xn − Xn)rn} are the moments of order r1 + r2 + . . . + rn about the “point” (X1, X2, . . . , Xn). In particular, the quantities
are, respectively, the moments about the origin and the moments about the center of gravity (central moments).
(c) The second-order central moments are again of special interest and warrant a special notation; the quantities
define the moment matrix [λik] ≡ Δ and its reciprocal (Sec. 13.2-3)*
det [λik] is the generalized variance of the joint distribution. The (total) correlation coefficients
* Note that some authors denote the cofactor matrix [λik]-1 det [λik] by [Δik]. The notation chosen here simplifies some expressions.
(see also Sec. 18.4-4c) define the correlation matrix [ρik] of the joint distribution. det [ρik] is sometimes called the scatter coefficient.
The matrices [λik] and [ρik] are real, symmetric, and nonnegative (Secs. 13.3-2 and 13.5-2). Their common rank (Sec. 13.2-7) r is the rank of the joint distribution. The ellipsoid of concentration corresponding to a given n-dimensional probability distribution is the n-dimensional “ellipsoid”
defined so that a uniform distribution of a unit probability “mass” inside the hypersurface has the moment matrix [λik]. The ellipsoid of concentration illustrates the “concentration” of the distribution in different “directions”; the “volume” of the ellipsoid is proportional to the square root of the generalized variance. For r < n, the probability distribution is singular: its spectrum (Sec. 18.4-7) is restricted to an r-dimensional linear manifold (straight line, plane, hyperplane) in the n-dimensional space of “points” (x1, x2, . . . , xn), and the same is true for its ellipsoid of concentration. Thus the spectrum of a two-dimensional probability distribution is restricted to a straight line if r = 1, and to a point if r = 0.
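The quantities of Sec. 18.4-8c are direct matrix computations. As a numerical sketch (the moment matrix below is hypothetical), one may compute the generalized variance, the correlation matrix, and the rank of the distribution with NumPy:

```python
import numpy as np

# Hypothetical moment (covariance) matrix [lambda_ik] of a 3-dimensional distribution.
L = np.array([[4.0, 2.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

gen_var = np.linalg.det(L)            # generalized variance det[lambda_ik]
sig = np.sqrt(np.diag(L))             # standard deviations sigma_i
rho = L / np.outer(sig, sig)          # correlation matrix [rho_ik]
rank = np.linalg.matrix_rank(L)       # rank r of the joint distribution

print(gen_var, rank)                  # 12.0 and 3 for this matrix
```

A rank r < n (singular moment matrix, zero generalized variance) signals the degenerate case discussed above, with the spectrum confined to an r-dimensional linear manifold.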
18.4-9. Regression. Multiple and Partial Correlation Coefficients (see also Secs. 18.4-6 and 19.7-2). (a) Given the joint distribution of n random variables x1, x2, . . . , xn, one may study the dependence of one of the variables, say x1, on the remaining n − 1 variables by writing
where h1(x1, x2, . . . , xN) is regarded as a correction term. The function
(mean-square regression of x1 on x2, x3, . . . , xn) minimizes the mean-square deviation E{[x1 − g1(x2, x3, . . . , xn)]²}; E{x1|X2, X3, . . . , Xn} is the conditional mean of x1 relative to the hypothesis that x2 = X2, x3 = X3, . . . , xn = Xn (see also Sec. 18.4-5c).
(b) The mean-square regression of any variable xi on the remaining n − 1 variables is often approximated by the linear function
(see also Sec. 18.4-6).* The regression coefficients βik are uniquely determined if the distribution is nonsingular (Sec. 18.4-8). The multiple correlation coefficient
is a measure of the correlation between xi and the remaining n - 1 variables.
(c) The random variable hi(1) ≡ xi − gi(1) (difference between xi and its “linear estimate” gi(1)) is the residual of xi with respect to the remaining n − 1 variables. Note
(d) Regressions and residuals may be similarly defined in connection with a suitable marginal distribution (Sec. 18.4-7a) of m < n variables, say x1, x2, . . . , xm. The quantities analogous to β12, β13, . . . ; h1(1), h2(1), . . . are then respectively denoted by β12.34...m, β13.24...m, . . . ; h(1)1.23...m, h(1)2.13...m, . . .; in each case, there is a subscript corresponding to each variable of the marginal distribution.
(e) The partial correlation coefficient of x1 and x2 with respect to x3, x4, . . . , xn
measures the correlation of x1 and x2 after removal of the linearly approximated effects of x3, x4, . . . , xN. In particular, for n = 3,
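For n = 3 the standard form of this coefficient is ρ12.3 = (ρ12 − ρ13ρ23)/√[(1 − ρ13²)(1 − ρ23²)]. The following Python sketch (numbers hypothetical) implements it and illustrates the defining property: if x1 and x2 are correlated only through x3, the partial correlation vanishes.

```python
import numpy as np

def partial_corr_12_3(rho12, rho13, rho23):
    """rho_{12.3}: correlation of x1 and x2 after removal of the
    linearly approximated effect of x3 (the n = 3 case)."""
    return (rho12 - rho13 * rho23) / np.sqrt((1 - rho13**2) * (1 - rho23**2))

# If rho12 = rho13 * rho23 (dependence only through x3), the partial
# correlation is zero:
print(partial_corr_12_3(0.6 * 0.5, 0.6, 0.5))   # 0.0
```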
18.4-10. Characteristic Functions (see also Sec. 18.3-8). The probability distribution of an n-dimensional random variable x ≡ (x1, x2, . . . , xn) uniquely defines the corresponding characteristic function (joint characteristic function of x1, x2, . . . , xn)
* See footnote to Sec. 18.4-8b.
and conversely. For continuous distributions,
The joint characteristic function corresponding to the marginal distribution of m < n of the n variables x1, x2, . . . , xn is obtained by substitution of qk = 0 in Eq. (39) whenever xk does not occur in the marginal distribution; thus χ12(q1, q2) ≡ χx(q1, q2, 0, . . . , 0).
Moments and semi-invariants of suitable multidimensional probability distributions can be obtained as coefficients in multiple series expansions of χx and loge χx in a manner analogous to that of Sec. 18.3-10.
18.4-11. Statistically Independent Random Variables (see also Secs. 18.2-3 and 18.5-7).* (a) A set of random variables x1, x2, . . . , xn are statistically independent if and only if the events [x1 ∊ S1], [x2 ∊ S2], . . . , [xn ∊ Sn] are statistically independent for every collection of real-number sets S1, S2, . . . , Sn. This is true if and only if
or, in the respective cases of discrete and continuous random variables, if and only if
The joint distribution of statistically independent random variables is completely defined by their individual marginal distributions. Statistically independent random variables x1, x2, . . . are uncorrelated, i.e., ρik = 0 for all i ≠ k (Sec. 18.4-8c), but the converse is not necessarily true (see also Sec. 18.8-8).
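The one-way nature of the last statement is easily exhibited. In the Python sketch below (a standard counterexample, not from the tables here), x takes the values −1, 0, 1 with equal probability and y = x²; the covariance vanishes, yet x and y are clearly dependent:

```python
# x takes the values -1, 0, 1 with probability 1/3 each; y = x**2.
vals = [-1, 0, 1]
p = 1.0 / 3.0
Ex  = sum(v * p for v in vals)        # E{x}  = 0
Ey  = sum(v**2 * p for v in vals)     # E{y}  = 2/3
Exy = sum(v**3 * p for v in vals)     # E{xy} = E{x**3} = 0
cov = Exy - Ex * Ey                   # covariance = 0: uncorrelated

# ...but not independent:
p_joint = 1.0 / 3.0                   # P[x = 1, y = 1]
p_prod  = (1.0 / 3.0) * (2.0 / 3.0)   # P[x = 1] * P[y = 1]
print(cov, p_joint, p_prod)           # 0.0, yet p_joint != p_prod
```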
(b) Statistical independence of multidimensional random variables x1, x2, . . . is defined by Eqs. (41) or (42) on substitution of x1, x2, . . . for x1, x2, . . . .
* See footnote to Sec. 18.3-4.
EXAMPLE: The multidimensional random variables (x1, x2) and (x3, x4, x5) are statistically independent if and only if
Note that Eq. (43) implies the statistical independence of x2 and x5, x1 and (x3, x4), (x1, x2) and (x3, x5), etc.
(c) Given a joint distribution of n discrete or continuous random variables x1, x2, . . . , xn such that (x1, x2, . . . , xm) is statistically independent of (xm+1, xm+2, . . . , xn), note
(d) Two random variables x1 and x2 are statistically independent if and only if their joint characteristic function is the product of their individual (marginal) characteristic functions (Sec. 18.4-10), i.e.,
An analogous theorem applies for multidimensional random variables (see also Sec. 18.5-7).
(e) If the random variables x1, x2, . . . are statistically independent, the same is true for the random variables y1(x1), y2(x2), . . . . An analogous theorem holds for multidimensional random variables.
18.4-12. Entropy of a Probability Distribution, and Related Topics. (a) The entropy associated with the probability distribution of a one-dimensional random variable x is defined as
H{x} (entropy of x) is a measure of the expected uncertainty involved in a measurement of x. In the case of discrete probability distributions, H{x} ≥ 0, with H{x} = 0 if and only if x has a causal distribution (Table 18.8-1). The continuous distribution having the largest entropy for a given variance σ² is the normal distribution (Sec. 18.8-3), with H{x} = log2 (σ√(2πe)).
(b) In connection with the discrete or continuous joint distribution of two random variables x1, x2, one defines the joint entropy
and the conditional entropies
and Hx1{x2} (these are not conditional expected values, Sec. 18.4-5c), so that
The equality on the right applies if and only if x1 and x2 are statistically independent (Sec. 18.4-11). The nonnegative quantity
is a measure of the “statistical dependence” of x1 and x2. The functionals (46), (47), (48), and (50) have intuitive significance in statistical mechanics and in the theory of communications.
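For a discrete joint distribution these functionals reduce to finite sums. The Python sketch below (joint probabilities hypothetical) computes H{x1}, H{x2}, and the joint entropy in bits, and checks the inequalities H{x1, x2} ≤ H{x1} + H{x2} and I ≥ 0 for the quantity (50):

```python
import numpy as np

# Hypothetical joint pmf of (x1, x2); rows index x1 values, columns x2 values.
p = np.array([[0.25, 0.25],
              [0.40, 0.10]])

def H(q):
    """Entropy in bits of a pmf given as an array of probabilities."""
    q = q[q > 0]                      # 0 * log 0 is taken as 0
    return -(q * np.log2(q)).sum()

p1, p2 = p.sum(axis=1), p.sum(axis=0)   # marginal pmfs of x1 and x2
H1, H2, H12 = H(p1), H(p2), H(p)
I = H1 + H2 - H12                       # the nonnegative quantity (50)
print(H1, H2, H12, I)
```

I vanishes exactly when the joint pmf factors, i.e., when x1 and x2 are statistically independent (Sec. 18.4-11).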
18.5. FUNCTIONS OF RANDOM VARIABLES. CHANGE OF VARIABLES
18.5-1. Introduction. The following relations permit one to calculate probability distributions of suitable functions of random variables and, in particular, to change the random variables employed to describe a given set of events.
18.5-2. Functions (or Transformations) of a One-dimensional Random Variable. (a) Given a transformation y = y(x) associating a unique value of a random variable y with each value of the random variable x, the probability distribution of y is uniquely determined by that of x [see also Sec. 18.2-8; y(x) must be a measurable function].
(b) Let the random variables x and y be related by a reciprocal one-to-one transformation y = y(x), with x = x(y). Then
1. If y(x) is an increasing function, Φy(Y) = Φx[x(Y)].
Note that either y(x) or -y(x) is necessarily an increasing function. In either case, the medians x½ and y½ are related by y½ = y(x½).
2. If x and y are continuous random variables,
for all values Y of y such that dx/dy exists and is continuous.
NOTE: If x(y) is multiple-valued, one writes φy(Y) = φ1(Y) + φ2(Y) + . . . , where φ1(Y), φ2(Y), . . . are the frequency functions obtained from Eq. (2) for the respective single-valued “branches” x1(y), x2(y), . . . of x(y). EXAMPLE: If
(c) For single-valued, measurable y(x), f(y),
whenever this expected value exists; note that neither reciprocal one-to-one correspondence nor differentiability has been assumed for y(x). In particular, substitution of f(y) = esy in Eq. (4) yields the moment-generating function My(s) = E{esy}, and substitution of f(y) = eiqy produces the characteristic function χy(q) ≡ E{eiqy} (Sec. 18.3-8). If the integrals can be calculated, one may then use Eq. (18.3-25) to find φy(y) or py(y).
EXAMPLE: Let y = a sin x, where a is a constant, and x is uniformly distributed between 0 and 2π. Using the symmetry properties of sin x, one finds φy(Y) = 1/[π√(a² − Y²)] for |Y| < a, and φy(Y) = 0 for |Y| > a (see also Sec. 18.11-1b).
(d) By an extension of the convolution theorem of Sec. 8.3-3 to bilateral Laplace transforms (Sec. 8.6-2), Eq. (4) can be rewritten as
where the integration contour parallels the imaginary axis in a suitable absolute-convergence strip; the quantity in square brackets is seen to be the bilateral Laplace transform of f[y(x)] (see also Sec. 8.6-2 and Table 8.6-1). The complex contour integral (5) may be easier to compute than the integral (4).
(e) Note that, in general, E{y(x)} ≠ y(E{x}) (see also Sec. 18.5-3).
18.5-3. Linear Functions (or Linear Transformations) of a One-dimensional Random Variable. (a) If x is a continuous random variable, and y = ax + b, then
(b) If the mean values in question exist,
The semi-invariants (Sec. 18.3-9) αi † of y = ax + b are related to the semi-invariants αi of x by .
(c) Of particular interest is the linear transformation to standard units
x´ is called a standardized random variable (see also Sec. 18.8-3).
(d) If y = y(x) is approximately linear throughout most of the spectrum of x, it is sometimes permissible to use the approximations
where y´(x) = dy/dx.
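The quality of this linearization can be judged against a case with known exact moments. In the Python sketch below (parameter values hypothetical), y = x² with x normal (ξ, σ²); the exact moments E{y} = ξ² + σ² and Var{y} = 4ξ²σ² + 2σ⁴ are standard results for a normal variable, and for small σ the linearized values E{y} ≈ y(ξ), Var{y} ≈ [y′(ξ)]²σ² are close:

```python
# Linearization check for y = x**2 with x normal(xi = 1, sigma = 0.05).
xi, sigma = 1.0, 0.05

# Exact moments of y = x**2 for a normal x (standard results):
E_exact = xi**2 + sigma**2
Var_exact = 4 * xi**2 * sigma**2 + 2 * sigma**4

# Linearized approximations: E{y} ~ y(xi), Var{y} ~ (y'(xi))**2 * sigma**2
E_approx = xi**2                     # y(xi)
Var_approx = (2 * xi)**2 * sigma**2  # y'(x) = 2x evaluated at xi

print(E_exact, E_approx, Var_exact, Var_approx)
```

The discrepancy in the mean is exactly σ² and that in the variance is 2σ⁴, both negligible when the spectrum of x is narrow, as the approximation requires.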
18.5-4. Functions and Transformations of Multidimensional Random Variables. (a) If the random variables
are single-valued measurable functions of the n random variables x1, x2, . . . , xn for all x1, x2, . . . , xn, then the probability distribution of each random variable yi is uniquely determined by the joint distribution of x1, x2, . . . , xn, and the same is true for each joint or conditional distribution involving a finite set of random variables yi.
Thus the distribution function of yi and the joint distribution function of yi and yk are, respectively,
(b) If x ≡ (x1, x2, . . . , xn) and y ≡ (y1, y2, . . . , yn) are continuous random variables related by a reciprocal one-to-one (nonsingular) transformation (11), their respective frequency functions φx(X1, X2, . . . , Xn) and φy(Y1, Y2, . . . , Yn) are related by
for all Y1, Y2, . . . , YN such that the Jacobian exists and is continuous.
If x(y) is multiple-valued, φy(Y1, Y2, . . . , YN) may be computed in a manner analogous to that outlined in Sec. 18.5-2b.
(c) For single-valued, measurable yi = yi(x1, x2, . . . , xN) (i = 1, 2, . . . , m) and f(y1, y2 . . . , ym),
whenever this expected value exists. As in Sec. 18.5-2c, neither reciprocal one-to-one correspondence nor differentiability has been assumed.
Table 18.5-1. Distribution of the Sum x = x1 + x2 + . . . + xn of n Independent Random Variables (see also Secs. 18.5-7, 18.6-5, and 19.3-3)
Substitution of f = exp (s1y1 + s2y2 + . . .+ smym) yields the joint moment-generating function of y1, y2 . . . , ym, and substitution of f = exp (iq1y1 + iq2y2 + . . . + iqmym) yields the joint characteristic function. Transform methods analogous to Eq. (5) may be useful. Such methods have been successfully applied to special random-process problems (Sec. 18.12-5).
(d) For any two random variables x1, x2,
if this quantity exists. If x1, x2, . . . , xN are statistically independent, then
if this quantity exists.
(e) If y = x1x2, and φx1(x1) = 0 for x1 < 0, then
(Sec. 18.5-4b), and
Other suitable functions y = y(x1, x2) can be treated in a similar manner.
18.5-5. Linear Transformations (see also Secs. 14.5-1 and 14.6-1). For every nonsingular linear transformation
the respective joint distributions of x1, x2, . . . , xN and y1, y2, . . . , yN are of equal rank (Sec. 18.4-8c), and
if the quantities in question exist. Λ′ ≡ [λ′ik] is the moment matrix (Sec. 18.4-8c) of (y1, y2, . . . , yn). The methods of Sec. 13.5-5 make it possible to find
1. An orthogonal transformation (18) such that the new moment matrix [λ′ik] (and hence also the correlation matrix) is diagonal (transformation to uncorrelated variables yi).
2. A transformation (18) such that η1 = η2 = . . . = ηn = 0 and λ′ik = δik (transformation to uncorrelated standardized variables yi; see also Secs. 18.8-6b and 18.8-8). The matrix [E{xi*xk}] must be nonsingular.
18.5-6. Mean and Variance of a Sum of Random Variables. (a) For any two (not necessarily statistically independent) random variables x1, x2,
if the quantities in question exist.
(b) More generally,
(c) If y = y(x1, x2, . . . , xN) is approximately linear throughout most of the joint spectrum of (x1, x2, . . . , xN), it may be permissible to use the approximation
and to compute approximate values of E{y} and Var {y} by means of Eqs. (19) and (20) (see also Sec. 18.5-7).
18.5-7. Sums of Statistically Independent Random Variables (refer to Sec. 18.8-9 for examples). (a) If x1 and x2 are statistically independent random variables, then
where the subscripts 1 and 2 refer to the respective distributions of x1 and x2 as in Secs. 18.4-2, 18.4-3, and 18.4-7 (see also Table 18.5-1).
(b) More generally, if x = x1 + x2 + . . . + xn is the sum of n < ∞ statistically independent random variables x1, x2, . . . , xn,
and, if the quantities in question exist,
where Kr(i) is the rth-order semi-invariant of xi. Equations (24) and (26) permit the computation of higher-order moments with the aid of the relations given in Sec. 18.3-10.
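For discrete independent summands the distribution of the sum is obtained by convolving the individual probability tables, and the additivity of means and variances is immediate. A Python sketch (the two pmfs are hypothetical):

```python
import numpy as np

# pmfs of two independent nonnegative-integer random variables.
p1 = np.array([0.2, 0.5, 0.3])    # P[x1 = 0, 1, 2]
p2 = np.array([0.6, 0.4])         # P[x2 = 0, 1]
px = np.convolve(p1, p2)          # pmf of x = x1 + x2

def mean_var(p):
    """Mean and variance of a pmf on 0, 1, 2, ..."""
    k = np.arange(len(p))
    m = (k * p).sum()
    return m, ((k - m) ** 2 * p).sum()

m1, v1 = mean_var(p1)
m2, v2 = mean_var(p2)
m, v = mean_var(px)
print(m, m1 + m2, v, v1 + v2)     # means add; variances add
```

Repeated convolution extends this to any finite number of independent summands, in agreement with Table 18.5-1.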
(c) The distribution of the sum z = (z1, z2, . . .) = x + y of two suitable statistically independent multidimensional random variables x = (x1, x2, . . .) and y ≡ (y1,y2, . . .) is described by
18.5-8. Compound Distributions. Let x1, x2, . . . be independent random variables each having the same probability distribution, and let k be a discrete random variable with spectral values 0, 1, 2, . . . ; let k be statistically independent of x1, x2, . . . . If the generating functions γx1(s) and γk(s) exist, the distribution of the sum x = x1 + x2 + . . . + xk is given by its generating function
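The standard compound-distribution result is γx(s) = γk[γx1(s)]. As a Python sketch (parameter values hypothetical), take k Poisson with parameter λ, so that γk(s) = exp[λ(s − 1)], and Bernoulli summands with γx1(s) = 1 − ϑ + ϑs; composition gives exp[λϑ(s − 1)], i.e., x is again Poisson with parameter λϑ (a “thinned” Poisson variable):

```python
import math

lam, theta = 3.0, 0.4   # hypothetical Poisson and Bernoulli parameters

def gamma_k(s):         # generating function of a Poisson(lam) count
    return math.exp(lam * (s - 1.0))

def gamma_x1(s):        # generating function of a Bernoulli(theta) summand
    return 1.0 - theta + theta * s

def gamma_x(s):         # compound distribution: gamma_k composed with gamma_x1
    return gamma_k(gamma_x1(s))

s = 0.7
print(gamma_x(s), math.exp(lam * theta * (s - 1.0)))   # identical values
```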
18.6. CONVERGENCE IN PROBABILITY AND LIMIT THEOREMS
18.6-1. Sequences of Probability Distributions. Convergence in Probability (see also Sec. 18.6-2). A sequence of random variables y1, y2, . . . converges in probability to the random variable y (yn converges in probability to y as n → ∞) if and only if the probability that yn differs from y by any finite amount converges to zero as n → ∞, or
An m-dimensional random variable yN converges in probability to the m-dimensional random variable y as n → ∞ if and only if each component variable of yN converges in probability to the corresponding component variable of y.
If the m random variables yn1, yn2, . . . , ynm converge in probability to the respective constants α1, α2, . . . , αm as n → ∞, then any function g(yn1, yn2, . . . , ynm) expressible as a positive power of a rational function of yn1, yn2, . . ., ynm converges in probability to g(α1, α2. . . , αm), provided that this quantity is finite.
18.6-2. Limits of Distribution Functions, Characteristic Functions, and Generating Functions. Continuity Theorems. (a) yn converges in probability to y as n → ∞ if and only if the sequence of distribution functions Φyn(Y) converges to the limit Φy(Y) for all Y such that Φy(Y) is continuous.
(b) yn converges in probability to y as n → ∞ if and only if the sequence of characteristic functions χyn(q) converges to a limit continuous for q = 0; in this case (Continuity Theorem for Characteristic Functions).
(c) A sequence of discrete random variables y1, y2, . . . converges in probability to the discrete random variable y as n → ∞ if and only if
If the random variables y1, y2, . . . all have nonnegative integral spectral values 0, 1, 2, . . . and possess generating functions γy1(s), γy2(s), . . . , then Eq. (2) holds if and only if lim γyn(s) = γy(s) for all real s such that 0 ≤ s ≤ 1 (Continuity Theorem for Generating Functions). Note that a sequence of discrete random variables may converge in probability to a random variable which is not discrete (see, for example, Table 18.8-3).
(d) Analogous definitions apply if y(n) converges in probability as a function of a continuous parameter n.
(e) Analogous theorems apply to multidimensional probability distributions.
18.6-3. Convergence in Mean (see also Sec. 12.5-3). Given a random variable y having a finite mean and variance and a sequence of random variables y1, y2, . . . all having finite mean values and variances, yN converges in mean (in mean square) to y as n → ∞ if and only if
Convergence in mean implies convergence in probability, but the converse is not true; convergence in probability as n → ∞ does not even imply that E{y} or Var {y} exists.
18.6-4. Asymptotically Normal Probability Distributions (refer to Table 18.8-3 and Sec. 19.5-3 for examples). The (probability distribution of a) random variable yn with the distribution function Φ(Y, n) is asymptotically normal with mean ηn and variance σn² if and only if there exists a sequence of pairs of real numbers ηn, σn² such that the random variable (yn − ηn)/σn converges in probability to a standardized normal variable (Sec. 18.8-3). This is true if and only if, for all a, b > a,
Equation (4) permits one to approximate the probability distribution of yn by a normal distribution with mean ηn and variance σn² for sufficiently large n. Note that Eq. (4) does not imply that ηn and σn² are the mean and variance of yn, that the sequence y1, y2, . . . converges in probability, or that E{yn} and ηn or Var {yn} and σn² converge to the same limits; indeed, these limits may not exist.
18.6-5. Limit Theorems. (a) For every class of events E permitting the definition of probabilities P[E] (Sec. 18.2-2)
The relative frequency h[E] = nE/n (Sec. 19.2-1) of realizing the event E in n independent repeated trials (Sec. 18.2-4) is a random variable which converges to P[E] in mean, and thus also in probability, as n → ∞ (Bernoulli's Theorem).
h[E] is asymptotically normal with mean P[E] and variance P[E]{1 − P[E]}/n (see also Table 18.8-3).
Note that (see also Table 18.8-3) *
(b) Let x1, x2, . . . be a sequence of statistically independent random variables all having the same probability distribution with (finite) mean value ξ. Then, as n → ∞,
The random variable x̄ ≡ (x1 + x2 + . . . + xn)/n converges in probability to ξ (Khinchine's Theorem, Law of Large Numbers).
x̄ is asymptotically normal with mean ξ and variance σ²/n, provided that the common variance σ² of x1, x2, . . . exists (Lindeberg-Lévy Theorem, Central Limit Theorem; see also Secs. 19.2-3 and 19.5-2).
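The Lindeberg-Lévy theorem is readily checked by simulation. In the Python sketch below (sample sizes and seed arbitrary), x1, x2, . . . are independent uniform variables on (0, 1), so ξ = 1/2 and σ² = 1/12; the standardized sample mean is compared with the standardized normal distribution function at one point:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
n, trials = 50, 100_000
x = rng.uniform(0.0, 1.0, (trials, n))     # uniform: xi = 1/2, sigma**2 = 1/12
xbar = x.mean(axis=1)

z = (xbar - 0.5) / sqrt((1.0 / 12.0) / n)  # standardized sample mean
emp = (z <= 1.0).mean()                    # empirical P[z <= 1]
Phi = 0.5 * (1.0 + erf(1.0 / sqrt(2.0)))   # standard normal cdf at 1 (~0.8413)
print(emp, Phi)
```

Already at n = 50 the empirical probability matches the normal value to roughly the Monte Carlo sampling error.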
(c) Let x1, x2, . . . be any sequence of statistically independent random variables having (finite) mean values ξ1, ξ2, . . . and variances σ1², σ2², . . . . Then, as n → ∞,
1. σn² → 0 implies (Chebyshev's Theorem).
* See footnote to Sec. 18.3-4.
(Central Limit Theorem, Lindeberg conditions).
The Lindeberg conditions are satisfied, in particular, if there exist two positive real numbers a and b such that E{|xi|2+a} exists and is less than bσi² for i = 1, 2, . . . (Lyapunov's Condition). See also Table 18.5-1.
NOTE: The limit theorems are of special importance in statistics (Secs. 19.2-1 and 19.2-3).
18.7. SPECIAL TECHNIQUES FOR SOLVING PROBABILITY PROBLEMS
18.7-1. Introduction. Most probability problems require one to compute the distribution of a random variable x (or the distributions of several random variables) from given conditions specifying the distributions of other random variables x1, x2, . . . . As a rule, the simple events labeled by values of x are compound events corresponding to various logical combinations of values of x1, x2, . . . . The first step in the solution of any such problem must be the unequivocal definition of the fundamental probability set labeled by each random variable. The probabilities of compound events may then be computed by the methods of Secs. 18.2-2 to 18.2-6 and 18.5-1 to 18.7-3. Equation (18.3-3), (18.3-6), (18.4-7), or (18.4-27) may be used to check computations.
18.7-2. Problems Involving Discrete Probability Distributions: Counting of Simple Events and Combinatorial Analysis. Each fundamental probability set labeled by the spectral values of a discrete random variable (Sec. 18.3-1) is a countable set of simple events. The following relations (either alone or in combination with the relations of Secs. 18.2-2 to 18.2-6) aid in computing probabilities of compound events:
(a) If, as in many games of chance, equal probabilities are assigned to each of the N simple events of a given finite fundamental probability set, then the probability of realizing a compound event (“success”) defined as the union (Sec. 18.2-1) of N1 specified simple events (“favorable” simple events) can be computed as
(b) Given a countable (finite or infinite) fundamental probability set, let an event E be defined as the union of N1 simple events each having the probability p1, N2 simple events each having the probability p2, . . . ; then
N1 + N2 + . . . need not be finite.
(c) Given N1 simple events E', N2 simple events E", . . . , and Nn simple events E(n) respectively associated with n independent component experiments (Sec. 18.2-4), there exist exactly N1N2 . . . Nn simple events [E' ∩ E" ∩ . . . ∩ E(n)] ≡ [E', E", . . . , E(n)].
(d) In many problems, the simple events under consideration are various possible arrangements of a given set or sets of elements, so that the numbers N1, N2, . . . in (a), (b), and (c) above are numbers of permutations, combinations, etc. The most important relevant definitions and formulas are given in Appendix C.
18.7-3. Problems Involving Discrete Probability Distributions: Successes and Failures in Component Experiments. Compound events are often described in terms of the results obtained in component experiments each admitting only two possible outcomes (“success” and “failure”). The probabilities of various compound events can be computed by the methods of Secs. 18.2-2 to 18.2-6 from the respective probabilities ϑ1, ϑ2, . . . of success in the first, second, . . . component experiment.
The methods of Secs. 18.5-6 to 18.5-8 may become applicable if one labels the events “success” and “failure” in the kth-component experiment with the respective spectral values 1 and 0 of a discrete random variable xk whose distribution is described by
Successes in two or more independent experiments are, by definition, statistically independent events (Sec. 18.2-4). Repeated independent trials (Sec. 18.2-4) each having only two possible outcomes are called Bernoulli trials (ϑ1 = ϑ2 = . . . = ϑ). The probability of realizing exactly x = x1 + x2 + . . . + xn successes in n Bernoulli trials is given by the binomial distribution (Table 18.8-3). If the trials are independent, but the ϑk are not all equal, one obtains the generalized binomial distribution of Poisson.
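The binomial probabilities of Table 18.8-3 follow directly from this success/failure labeling. A short Python sketch (n and ϑ hypothetical) evaluates the pmf and confirms that the probabilities sum to one and that the mean equals nϑ:

```python
from math import comb

def binom_pmf(x, n, theta):
    """P[exactly x successes in n Bernoulli trials with success prob. theta]."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

n, theta = 10, 0.3
probs = [binom_pmf(x, n, theta) for x in range(n + 1)]
mean = sum(x * p for x, p in enumerate(probs))
print(sum(probs), mean)     # 1.0 and n*theta = 3.0
```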
A subsequence of r successes or failures in any sequence of n trials is called a run of length r of successes or failures (see also Ref. 18.4, Chap. 13).
18.8. SPECIAL PROBABILITY DISTRIBUTIONS
18.8-1. Discrete One-dimensional Probability Distributions.*
Tables 18.8-1 to 18.8-7 describe a number of discrete one-dimensional distributions of interest, for instance, in connection with sampling problems and games of chance. The generating function rather than the characteristic function or the moment-generating function is tabulated; the latter two functions are easily obtained from Mx(s) = γx(es) and χx(q) = γx(eiq)
(see also Sec. 18.3-8). Moments not tabulated are also easily derived by the methods of Sec. 18.3-10.
Table 18.8-1. The Causal Distribution (see also Table 18.8-8)
Table 18.8-2. The Hypergeometric Distribution
Table 18.8-3. The Binomial Distribution (Fig. 18.8-1; see also Sec. 18.7-3)
18.8-2. Discrete Multidimensional Probability Distributions (see also Sec. 18.4-2). (a) A multinomial distribution is described by
where ϑ1, ϑ2, . . . , ϑn are positive real numbers such that ϑ1 + ϑ2 + . . . + ϑn = 1.
FIG. 18.8-2. The Poisson distribution. (From Goode, H. H., and R. E. Machol, System Engineering, McGraw-Hill, New York, 1957.)
Given an experiment having n mutually exclusive results E1, E2, . . . , En with respective probabilities ϑ1, ϑ2, . . . , ϑn such that ϑ1 + ϑ2 + . . . + ϑn = 1, the expression (1) is the probability that the respective events E1, E2, . . . , En occur exactly x1, x2, . . . , xn times in N independent repeated trials (see also Sec. 18.7-3). In classical statistical mechanics, x1, x2, . . . , xn are the occupation numbers of n independent states with respective a priori probabilities ϑ1, ϑ2, . . . , ϑn.
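The multinomial probability (1) is N!/(x1! x2! . . . xn!) · ϑ1^x1 ϑ2^x2 . . . ϑn^xn. A Python sketch (the three-outcome experiment and its probabilities are hypothetical):

```python
from math import factorial

def multinomial_pmf(counts, thetas):
    """N!/(x1!...xn!) * theta1**x1 * ... * thetan**xn for counts (x1,...,xn)."""
    N = sum(counts)
    p = float(factorial(N))
    for x, th in zip(counts, thetas):
        p = p * th**x / factorial(x)
    return p

# Hypothetical experiment with three mutually exclusive results.
print(multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2]))   # 12 * 0.25*0.3*0.2 = 0.18
```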
Table 18.8-5. The Geometric Distribution
Table 18.8-6. Pascal's Distribution
Table 18.8-7. Polya's Distribution (Negative Binomial Distribution)
(b) A multiple Poisson Distribution is described by
18.8-3. Continuous Probability Distributions: The Normal (Gaussian) Distribution. A continuous random variable x is normally distributed (normal) with mean ξ and variance σ² [or normal with parameters ξ, σ²; normal with parameters ξ, σ; normal (ξ, σ²); normal (ξ, σ)] if
The distribution of the standardized normal variable (normal deviate) (see also Sec. 18.5-3c) is given by
(see also Fig. 18.8-3 and Sec. 18.8-4). erf z is the frequently tabulated error function (normal error integral, probability integral; see also Sec. 21.3-2)
φ(X) has points of inflection for X = ξ ± σ. Note
where Hk(z) is the kth Hermite polynomial (Sec. 21.7-1).
Every normal distribution is symmetric about its mean value ξ; ξ is the median and the (single) mode. The coefficients of skewness and excess are zero, and
The moments αr about the origin may be computed by the methods of Sec. 18.3-10.
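The central moments themselves have the well-known closed form (a standard result, stated here for illustration): odd central moments vanish, and μ2k = σ^2k (2k)!/(2^k k!), so that μ2 = σ², μ4 = 3σ⁴, μ6 = 15σ⁶. A Python sketch:

```python
from math import factorial

def central_moment(r, sigma):
    """Central moments of a normal distribution (standard result):
    mu_r = 0 for odd r; mu_{2k} = sigma**(2k) * (2k)! / (2**k * k!)."""
    if r % 2:
        return 0.0
    k = r // 2
    return sigma**r * factorial(2 * k) / (2**k * factorial(k))

sigma = 1.5
print(central_moment(2, sigma), central_moment(4, sigma))  # sigma**2, 3*sigma**4
```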
The normal distribution is of particular importance in many applications, especially in statistics (Secs. 19.3-1 and 19.5-3).
18.8-4. Normal Random Variables: Distribution of Deviations from the Mean. (a) For any normal random variable x with mean ξ and variance σ2,
FIG. 18.8-3. (a) The normal frequency function
and (b) the normal distribution function
(From Burington, R. S., and D. C. May, Handbook of Probability and Statistics, McGraw-Hill, New York, 1953.)
Table 18.8-8. Continuous Probability Distributions
are often referred to as tolerance limits of the normal deviate u or as α values of the normal deviate (see also Sec. 19.6-4). Note
(c) Note the following measures of dispersion for normal distributions (see also Table 18.3-1):
The mean deviation (m.a.e.)
The probable deviation (p.e., median of |x – ξ|)
One-half the half width
The precision measure (see also Secs. 18.8-3, 19.3-4, 19.3-5, and 19.5-3)
18.8-5. Miscellaneous Continuous One-dimensional Probability Distributions. Table 18.8-8 describes a number of continuous one-dimensional probability distributions (see also Secs. 19.3-4, 19.3-5, and 19.5-3).
18.8-6. Two-dimensional Normal Distributions. (a) A two-dimensional normal distribution is a continuous probability distribution described by a frequency function of the form
The marginal distributions of x1 and x2 are both normal with respective mean values ξ1, ξ2 and variances σ12, σ22; ρ12 is the correlation coefficient of x1 and x2. The five parameters ξ1, ξ2, σ1, σ2, ρ12 define the distribution completely.
The conditional distributions of x1 and x2 are both normal, with
so that the regression curves are identical with the mean-square regression lines (Sec. 18.4-6). x1 and x2 are statistically independent if and only if they are uncorrelated (ρ12 = 0, see also Sec. 18.4-11). Note
(b) Every two-dimensional normal distribution (16) can be described in terms of standardized normal variables u1, u2 with the correlation coefficient ρ12, or in terms of statistically independent standardized normal variables (Sec. 18.5-5). Thus
(c) The distribution (16) is represented graphically by the contour ellipses φ(x1, x2) = constant, or
The probability that the “point” (x1, x2) is inside the contour ellipse (22) is
i.e., λ2 = χP2(2) (Table 19.5-1). The two mean-square regression lines respectively defined by Eqs. (17) and (18) bisect all contour-ellipse chords in the x1 and x2 directions, respectively (see also Sec. 2.4-6).
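The linear construction of Sec. 18.8-6b can be illustrated numerically. The following Python sketch (an illustrative Monte Carlo check, with all names, seeds, and tolerances chosen for the example, not taken from the handbook) builds standardized normal variables x1, x2 with prescribed correlation coefficient ρ12 from independent standardized normals u1, u2 and verifies their sample moments:

```python
import random, math

# Illustrative sketch: correlated standardized normals from independent ones,
# per the linear-transformation idea of Sec. 18.5-5.
random.seed(1)
rho12 = 0.6
N = 200_000
sx1 = sx2 = sx1x2 = sx1sq = sx2sq = 0.0
for _ in range(N):
    u1 = random.gauss(0.0, 1.0)
    u2 = random.gauss(0.0, 1.0)
    x1 = u1                                          # standardized normal
    x2 = rho12 * u1 + math.sqrt(1 - rho12**2) * u2   # standardized, corr rho12
    sx1 += x1; sx2 += x2
    sx1x2 += x1 * x2; sx1sq += x1 * x1; sx2sq += x2 * x2
m1, m2 = sx1 / N, sx2 / N
cov = sx1x2 / N - m1 * m2
var1 = sx1sq / N - m1 * m1
var2 = sx2sq / N - m2 * m2
corr = cov / math.sqrt(var1 * var2)   # sample estimate of rho12
```

With ρ12 = 0 the same construction yields uncorrelated, and hence (for normal variables) statistically independent, x1, x2.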
18.8-7. Circular Normal Distributions. Equation (16) represents a circular normal distribution with dispersion σ about the center of gravity (ξ1, ξ2) if and only if ρ12 = 0, σ1 = σ2 = σ. The contour ellipses (22) become circles corresponding to fractiles of the radial deviation (radial error). The distribution of r is given by
(see also Sec. 18.11-16 and Table 19.5-1).
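The Rayleigh law for the radial deviation can be checked by simulation. The sketch below assumes ρ12 = 0 and σ1 = σ2 = σ = 1 (an illustrative choice) and verifies that P{r ≤ R} = 1 − exp(−R²/2σ²) at the median radial error R = σ√(2 ln 2):

```python
import random, math

# Illustrative sketch of Sec. 18.8-7: for a circular normal distribution the
# radial deviation r is Rayleigh-distributed, P{r <= R} = 1 - exp(-R^2/(2 sigma^2)).
random.seed(2)
sigma = 1.0
R = sigma * math.sqrt(2.0 * math.log(2.0))   # median radial error
N = 100_000
hits = sum(1 for _ in range(N)
           if math.hypot(random.gauss(0, sigma), random.gauss(0, sigma)) <= R)
frac = hits / N                                        # empirical P{r <= R}
theory = 1.0 - math.exp(-R * R / (2.0 * sigma * sigma))  # = 0.5 at the median
```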
Circular normal distributions are of particular interest in problems related to gunnery; circular probability paper shows contour circles for equal increments of Φr(R). Note
18.8-8. n-Dimensional Normal Distributions.* The joint distribution of n random variables x1, x2, . . . , xn is an n-dimensional normal distribution if and only if it is a continuous probability distribution having a frequency function of the form
* See footnote to Sec. 18.4-8.
Each normal distribution is completely defined by its center of gravity (ξ1, ξ2, . . . , ξn) and its moment matrix [λjk] ≡ [Λjk]−1, or by the corresponding variances and correlation coefficients (Sec. 18.4-8). The characteristic function is
Each marginal and conditional distribution derived from a normal distribution is normal. All mean-square regression hypersurfaces are identical with the corresponding mean-square regression hyperplanes (Sec. 18.4-9). n random variables x1, x2, . . . , xn having a normal joint distribution are statistically independent if and only if they are uncorrelated (see also Sec. 18.4-11).
Each n-dimensional normal distribution can be described as the joint distribution of n statistically independent standardized normal variables related to the original variables by a linear transformation (18.5-15).
18.8-9. Addition Theorems for Special Probability Distributions* (see also Sec. 18.5-7 and Table 19.5-1). (a) The binomial distribution (Table 18.8-3), the Poisson distribution (Table 18.8-4), and the Cauchy distribution (Table 18.8-8) “reproduce themselves” on addition of independent variables. If the random variable x is defined as the sum
of n statistically independent random variables x1, x2, . . . , xn, then
(b) The sum x = x1 + x2 + . . . + xn of n statistically independent random variables x1, x2, . . . , xn is a normal variable if and only if x1, x2, . . . , xn are normal variables. In this case,
If x1, x2, . . . , xn are (not necessarily statistically independent) normal variables, then x = a1x1 + a2x2 + . . . + anxn is a normal variable whose mean and variance are given by Eq. (18.5-19).
* See footnote to Sec. 18.3-4.
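The “reproduction” property of Sec. 18.8-9a can be verified exactly for the binomial distribution by convolving probability mass functions; the following Python sketch (parameters illustrative) checks that the sum of independent binomial variables with the same p is again binomial:

```python
from math import comb

# Exact check: Binomial(n1, p) + Binomial(n2, p), independent, = Binomial(n1+n2, p).
def binom_pmf(n, p):
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def convolve(a, b):
    # probability mass function of the sum of two independent discrete variables
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

p, n1, n2 = 0.3, 4, 6
lhs = convolve(binom_pmf(n1, p), binom_pmf(n2, p))
rhs = binom_pmf(n1 + n2, p)
max_err = max(abs(x - y) for x, y in zip(lhs, rhs))
```

The same convolution check applies, with suitable (truncated) mass functions, to the Poisson distribution.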
18.9. MATHEMATICAL DESCRIPTION OF RANDOM PROCESSES
18.9-1. Random Processes. Consider a variable x capable of assuming different values x(t) for different values of an independent variable t. A random process (stochastic process) selects a specific sample function x(t) from a given theoretical population (Sec. 19.1-2) or ensemble of possible sample functions. More specifically, the functions x(t) are said to describe a random process if and only if the sample values x1 = x(t1), x2 = x(t2), . . . are random variables admitting definition of a joint probability distribution for every finite set of values (sampling times) t1, t2, . . . (Fig. 19.8-1). The random process is discrete or continuous if the joint distribution of x(t1), x(t2), . . . is, respectively, discrete or continuous for every finite set t1, t2, . . . . The process is a random series if the independent variable t assumes only a countable set of values. More generally, a random process may be described by a multidimensional variable x(t) ≡ [x(t), y(t), . . .].
The definition of a random process implies the existence of a probability distribution on the (in general, infinite-dimensional) sample space (Sec. 18.2-7) of possible functions x(t). Each particular function x(t) ≡ X(t) constitutes a simple event [sample point, “value” of the multidimensional random variable x(t)].
In most applications the independent variable t is the time, and the variable x(t) or x(t) labels the state of a physical system. EXAMPLES: Results of successive observations, states of dynamical systems in Gibbsian statistical mechanics or quantum mechanics, messages and noise in communications systems, economic time series.
18.9-2. Mathematical Description of Random Processes. (a) To describe a random process, one must specify the distribution of x(t1) and the respective joint distributions of [x(t1), x(t2)], [x(t1), x(t2), x(t3)], . . . for every finite set of values t1, t2, t3, . . . (first, second, third, . . . probability distributions associated with the random process). These distributions are described by the corresponding first, second, ... (or first-order, second-order, . . .) distribution functions (see also Sec. 18.4-7)
or, respectively for discrete and continuous random processes, by the corresponding probabilities and frequency functions
NOTE: The sequence of distribution functions (1a) describes the random process in increasing detail, since each distribution function Φ(n) completely defines all preceding ones as marginal distribution functions (Sec. 18.4-7). The same is true for each sequence (1b). Each of the functions (1) is symmetric with respect to (unaffected by) interchanges of pairs Xi, ti and Xk, tk.
(b) Conditional probability distributions descriptive of the random process are related to the functions (1b) in the manner of Sec. 18.4-7; thus
NOTE: The functions (2) are not in general symmetric with respect to interchanges of pairs Xi, ti and Xk, tk separated by the bar.
(c) A multidimensional random process, say one generating a pair of sample functions x(t), y(t), is similarly defined in terms of joint distributions of sample values x(ti), y(tk). In particular,
18.9-3. Ensemble Averages. (a) General Definitions. The ensemble average (statistical average, mathematical expectation) of a suitable function f[x(t1), x(t2), . . . , x(tn)] of n sample values x(t1), x(t2), . . . , x(tn) (statistic, see also Sec. 19.1-1) is the expected value (Sec. 18.4-8a)
if this limit exists in the sense of absolute convergence. Integration in Eq. (4) is over X1, X2, . . . , Xn; E{f} is a function of t1, t2, . . . , tn.
Similarly, for a multidimensional random process described by x(t), y(t),
if the limit exists in the sense of absolute convergence.
(b) Ensemble Correlation Functions and Mean Squares. The ensemble averages E{x(t1)} = ξ(t1), E{x2(t1)}, and
are of special interest. They abstract important properties of the random process and are frequently all that is known about the process: note that
The definitions (6) and Eq. (7) apply to real x(t), y(t). If x(t) and/or y(t) is a complex variable (really a two-dimensional random variable), then one defines
which includes (6) as a special case; Rxy is necessarily real for real x and y.
Note that, for real or complex x, y,
Existence of the quantities on the right implies that of the correlation functions on the left.
(c) Characteristic Functions. The nth characteristic function corresponding to the nth distribution function (1a) of the random process (see also Sec. 18.4-10) is
Joint characteristic functions for x(t), y(t), . . . are similarly defined. Characteristic functions can yield moments like E{x(t1)}, E{x2(t1)}, Rxx(t1, t2), . . . by differentiation in the manner of Secs. 18.3-10 and 18.4-10.
(d) Ensemble Averages of Integrals and Derivatives (see also Sec. 18.6-3). Random integrals of the form
are defined in the sense of convergence in probability (Sec. 18.6-1) or, if possible, in the mean-square sense of Sec. 18.6-3. The integral converges in mean (in the sense of Sec. 18.6-3) if and only if
exists. If ∫ab E{|x(t)|} dt exists, then the integral (12) exists in the sense of absolute convergence for each sample function x(t), except possibly for a set of probability 0, and
The important relation (14) is needed, in particular, to derive the input-output relations of Sec. 18.12-2 (see also Refs. 18.13 to 18.17).
The random process generating x(t) is continuous in the mean (mean-square continuous) at t = t0 in the sense of Sec. 18.6-3 if and only if
this is true if and only if Rxx(t1, t2) exists and is continuous for t1 = t2 = t0. The random process generating dx/dt will be called the mean-square derivative of the random process generating x(t) if and only if
This is true if and only if ∂2Rxx(t1, t2)/∂t1 ∂t2 exists and equals ∂2Rxx(t1, t2)/∂t2∂t1 for all t1 = t2. It follows that
(see also Sec. 18.12-2).
18.9-4. Processes Defined by Random Parameters. It is often possible to represent each sample function of a random process as a deterministic function x = x(t; η1, η2, . . .) of t and a set of random parameters η1, η2, . . . . The process is then defined by the joint distribution of η1, η2, . . . ; in this case,
In particular, each probability distribution of such a random process is uniquely defined by its characteristic function (Sec. 18.4-10)
18.9-5. Orthonormal-function Expansions. Given a real or complex random process x(t) with E{x(t)} finite and Rxx(t1, t2) bounded and continuous on the closed observation interval [a, b], there exist complete orthonormal sets of functions u1(t), u2(t), . . . (Sec. 15.2-4) such that
where the series and the integrals for the ck converge in mean in the sense of Sec. 18.6-3 (see also Sec. 18.9-3d). The random process is, then, represented by the set of random coefficients c1, c2, . . . ; the first n coefficients may give a useful approximate representation. In particular, there exists a complete orthonormal set uk(t) ≡ Ψk(t) such that all the ck are uncorrelated standardized random variables, i.e.,
(Karhunen-Loève Theorem). Specifically, the required Ψk(t) are the eigenfunctions of the integral equation
(see also Sec. 15.3-3). The corresponding eigenvalues λk are nonnegative and have at most a finite degree of degeneracy (by Mercer's theorem, Sec. 15.3-4), and
The Karhunen-Loève theorem constitutes a generalization of the theorem of Sec. 18.5-5.
EXAMPLES: Periodic random processes (Sec. 18.11-1), band-limited flat-spectrum noise (Sec. 18.11-2b). Although explicit analytical solution of the integral equation (14) is rarely possible, the theorem is useful in detection theory (Ref. 19.24).
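A finite-dimensional analogue may clarify the mechanics of the Karhunen-Loève expansion. The Python sketch below (a two-sample-time discrete analogue, not the continuous theorem; the covariance values are illustrative) uses the analytic eigenvectors of the 2 × 2 covariance matrix [[1, r], [r, 1]] and checks by simulation that the expansion coefficients are uncorrelated, with mean squares equal to the eigenvalues:

```python
import random, math

# Discrete analogue of the Karhunen-Loeve expansion at two sample times:
# the covariance matrix [[1, r], [r, 1]] has orthonormal eigenvectors
# (1,1)/sqrt(2), (1,-1)/sqrt(2) with eigenvalues 1+r, 1-r, and the
# coefficients c_k = sum_i x(t_i) psi_k(i) satisfy E{c_j c_k} = lambda_k delta_jk.
random.seed(3)
r = 0.5
s = 1.0 / math.sqrt(2.0)
psi1, psi2 = (s, s), (s, -s)          # orthonormal eigenvectors
lam1, lam2 = 1.0 + r, 1.0 - r         # eigenvalues
N = 200_000
s11 = s22 = s12 = 0.0
for _ in range(N):
    g1, g2 = random.gauss(0, 1), random.gauss(0, 1)
    x1, x2 = g1, r * g1 + math.sqrt(1 - r * r) * g2   # cov(x1, x2) = r
    c1 = x1 * psi1[0] + x2 * psi1[1]
    c2 = x1 * psi2[0] + x2 * psi2[1]
    s11 += c1 * c1; s22 += c2 * c2; s12 += c1 * c2
E11, E22, E12 = s11 / N, s22 / N, s12 / N   # estimates of lam1, lam2, 0
```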
18.10. STATIONARY RANDOM PROCESSES. CORRELATION FUNCTIONS AND SPECIAL DENSITIES
18.10-1. Stationary Random Processes. A random process, or the corresponding ensemble of functions x(t), is stationary if and only if each of its probability distributions is unchanged when t is replaced by t + t0, so that
i.e., the nth probability distribution depends only on a set of n — 1 differences
of sampling times tk. Similarly, two or more random processes generating x(t), y(t), . . . are jointly stationary if and only if their joint probability distributions are unchanged when t is replaced by t + t0.
For stationary and jointly stationary random processes, each ensemble average (18.9-4) or (18.9-5) depends only on n — 1 differences (2):
for every t1 (see also Sec. 18.10-2).
18.10-2. Ensemble Correlation Functions (see also Sec. 18.9-3b). (a) For stationary x(t) [and jointly stationary x(t), y(t)], the expected values
E{x(t)} ≡ E{x} = ξ, E{|x(t)|2} ≡ E{|x|2}, E{y(t)} ≡ E{y} = η, . . .
are constant, and the ensemble correlation functions (18.9-8) reduce to functions of the delay τ = t2 − t1 separating t1 and t2. In this case,
Again, existence of the quantities on the right implies existence of the correlation functions on the left. If Rxx(τ) is continuous for τ = 0, it is continuous for all τ (Ref. 18.17).
[Rxx(ti — tk)] is a positive-semidefinite hermitian matrix (Sec. 13.5-3) for every finite set t1, t2, . . . , tn.
(b) Normalized ensemble correlation functions are defined by
Note |ρxx| ≤ 1, |ρxy| ≤ 1. For real stationary x, y, ρxx and ρxy are real correlation coefficients (Sec. 18.4-4), and Eq. (4) implies
Random processes which are not stationary or jointly stationary but have constant E{x(t)}, E{y(t)} and “stationary correlation functions” satisfying Eq. (4) are often called stationary, or jointly stationary, in the wide sense.
18.10-3. Ensemble Spectral Densities. If x(t) is generated by a stationary random process, and x(t), y(t) by jointly stationary random processes, the ensemble power spectral density Φxx(ω) and the ensemble cross-spectral density Φxy(ω) are defined by
Assuming suitable convergence, this implies
The Fourier transforms (9) are introduced, essentially, to simplify the relations between input and output correlation functions in linear time-invariant systems (Sec. 18.12-3). Existence of the transforms (9) requires, besides the existence of E{|x|2} and E{|y|2} (Sec. 18.9-3b), that Rxx(τ) or Rxy(τ) decay sufficiently quickly as |τ| → ∞. In the case of periodic and d-c processes, one extends the definitions of spectral densities to include delta-function terms chosen so that Eq. (10) is satisfied (Sec. 18.10-9).
18.10-4. Correlation Functions and Spectra of Real Processes. The relations (9) and (10) apply to both real and complex random processes x(t), y(t). Note that the power spectral density Φxx(ω) is always real, even if x is complex; but the cross-spectral density Φxy(ω) may be a complex function even for real x, y. If x and y are real, the same is true for the correlation functions Rxx(τ), Rxy(τ). In this case,
Note again that Eqs. (11) to (13) apply to real x, y.
18.10-5. Spectral Decomposition of Mean “Power” for Real Processes. For real x(t), substitution of τ = 0 in Eqs. (11) and (12) yields
This is interpreted as a spectral decomposition of E{x2} (mean “power”). In the first integral, contributions to E{x2} are “distributed” over both positive and negative frequencies with density Φxx(ω) (“two-sided” power spectral density), measured in (x units)2/cps, since ω/2π is frequency in cps. Alternatively, we can consider E{x2} as distributed only over nonnegative (“real”) frequencies with the “one-sided” power spectral density 2Φxx(ω) (x units)2/cps.
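The relation between Rxx(τ) and Φxx(ω) can be checked numerically. The Python sketch below assumes the two-sided convention Φxx(ω) = (1/2π)∫Rxx(τ)e^(−iωτ) dτ and the illustrative exponential autocorrelation Rxx(τ) = e^(−β|τ|), for which Φxx(ω) = β/[π(β² + ω²)] exactly:

```python
import math

# Numerical check of the Wiener-Khinchine transform for R(tau) = exp(-beta|tau|)
# under the two-sided convention Phi(omega) = (1/2pi) * FourierTransform{R}.
beta = 2.0
R = lambda tau: math.exp(-beta * abs(tau))
phi_exact = lambda w: beta / (math.pi * (beta * beta + w * w))

# Trapezoidal evaluation at one frequency; since R is even, the transform
# reduces to (1/pi) * integral_0^T R(tau) cos(w0 tau) d tau (tail beyond T tiny).
w0, T, n = 1.5, 40.0, 200_000
h = T / n
acc = 0.5 * (R(0.0) + R(T) * math.cos(w0 * T))
for k in range(1, n):
    acc += R(k * h) * math.cos(w0 * k * h)
phi_num = (1.0 / math.pi) * acc * h
```

Integrating Φxx over all ω then recovers Rxx(0) = E{x²} = 1, the spectral decomposition of mean “power.”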
Intuitive interpretation of the (in general complex) cross-spectral density Φxy(ω) is not quite so simple. For real x(t), y(t), substitution of τ = 0 in Eq. (10) yields
Re Φxy(ω) is often called a cross-power spectral density. Im Φxy(ω) (cross-quadrature spectral density) does not contribute to the mean “power” (15).
18.10-6. Some Alternative Ensemble Spectral Densities. Other spectral-density functions found in the literature are
(ν = ω/2π; two-sided spectral density in (x units)2/cps)
(two-sided spectral density in (x units)2 per radian/second)
and the one-sided spectral densities
Note that Γxx(ν) and Gxx(ω) are defined only for nonnegative frequencies. Similar definitions also apply to cross-spectral densities. Note that symbols and definitions vary greatly in the literature; the definition in use should be restated and referenced in each case.
18.10-7. t Averages and Ergodic Processes. (a) t Averages. Given any function x(t), the t average (average over t, frequently a time average) of a measurable function f[x(t1), x(t2), . . . , x(tn)] is defined as
if the limit exists.* If x(t) describes a random process, then <f> is (like f, but unlike E{f}) a random variable (statistic) for each given set of values t1, t2, . . . , tn. Note that
whenever the integrals exist.
(b) Ergodic Processes. A (necessarily stationary) random process generating x(t) is ergodic if and only if the probability associated with every stationary subensemble is either 0 or 1. Every ergodic process has the ergodic property: the t average (20) of every measurable function f[x(t1), x(t2), . . . , x(tn)] equals its ensemble average (18.9-4) with probability one, i.e.,
whenever these averages exist. Any one of the functions x(t) will then define the random process uniquely with probability one, e.g., in terms of the characteristic functions (18.9-11) computed from x(t) by means of Eq. (21). Each t average, such as <x>, <x2>, or Rxx(τ), will then represent, with probability one, a property common to the entire ensemble of functions x(t).
* The bar notation f̄ is sometimes used instead of <f>, as well as instead of E{f}; but that symbol is preferably reserved for the sample average
where kf is the value of f obtained from one of an empirical random sample of n sample functions x(t) = kx(t) (k = 1, 2, . . . , n; see also Sec. 19.8-4).
Two or more jointly stationary random processes are jointly ergodic if and only if the probability associated with every stationary joint sub-ensemble is either 0 or 1. The ergodic theorem applies to averages computed from sample values of jointly ergodic processes.
18.10-8. Non-ensemble Correlation Functions and Spectral Densities. Given the real or complex functions x(t), y(t) (which may or may not be sample functions of a random process) such that
exist, the t averages
exist. These correlation functions satisfy all the relations listed in Sec. 18.10-2, if each ensemble average (expected value) is replaced by the corresponding t average. Again, the (non-ensemble) power spectral density Ψxx(ω) and the cross-spectral density Ψxy(ω) are introduced through the Wiener-Khinchine relations
If these “individual” spectral densities exist (one formally admits delta-function terms, Sec. 18.10-9), they satisfy relations analogous to those listed in Secs. 18.10-3 to 18.10-5. Alternative non-ensemble spectral densities can be defined in the manner of Sec. 18.10-6.
If x(t), y(t) are sample functions of jointly stationary random processes, then the correlation functions (24), (25) and the spectral densities (26) are random variables whose expected values equal the corresponding ensemble functions whenever they exist. If x(t), y(t) are jointly ergodic, then the correlation functions (24), (25) and the spectral densities (26) are identical to the corresponding ensemble quantities with probability one.
As an alternative definition, spectral densities are sometimes introduced by the formal relation
where aT(ω) and bT(ω) are Fourier transforms of the “truncated” functions xT(t), yT(t) respectively equal to x(t), y(t) for |t| < T and zero for |t| > T:
The corresponding ensemble spectral density Φxy(ω) may then be defined by Φxy(ω) = E{Ψxy(ω)}, and the Wiener-Khinchine relations (26) follow from Borel's convolution theorem (Table 4.11-1). In general, however, Eq. (27) is valid only if both sides appear in an integral over ω (in particular, spectral densities often contain delta-function terms, Secs. 18.10-9 and 18.11-5; see also Sec. 18.10-10).
18.10-9. Functions with Periodic Components (see also Sec. 18.11-1). Like other t averages, non-ensemble correlation functions and spectra are of interest mainly if they happen to equal the corresponding ensemble quantities with probability one (this is true for all t averages in the case of ergodic processes, Sec. 18.10-7b). When this is true, the single integrals (24), (25) may be easier to compute than the double integrals (4). The ergodic property also permits interpretation of, say, Φxx(ω) in terms of the “frequency content” of a single “typical” sample function x(t), since Φxx(ω) = Ψxx(ω) with probability one.
Without recourse to probability theory, non-ensemble correlation functions and spectra can be computed only for functions x(t), y(t) representable as sums of periodic components (except for the trivial case that the correlation function or spectral density is identically zero). In particular, for
More generally, let x(t) be a real function of bounded variation in every finite interval and such that <|x(0)|2> exists. Then x(t) can be represented almost everywhere (Sec. 4.6-14b) as the sum of its average value <x(0)> = c0, a countable set of periodic components, and an aperiodic component* p(t):
* The aperiodic component p(t) may be expressible as a Fourier integral [<|p(0)|2> = 0], or <|p(0)|2> may be different from zero (“random” component); or p(t) may be a sum of both types of terms.
Let y(t) be another real function y(t) satisfying the same conditions as x(t), so that
The set of circular frequencies ω1, ω2, . . . is understood to include the periodic-component frequencies of both x(t) and y(t). Then
The cross correlation function Rxy(τ) measures the “coherence” of x(t) and y(t) or the “serial correlation” between the function values x(t) and y(t + τ) separated by a delay τ. x(t) and y(t) are uncorrelated if and only if Rxy(τ) ≡ 0.
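These t averages are easily computed numerically for sinusoidal components. The Python sketch below (amplitudes, frequencies, and delay all illustrative) checks Rxy(τ) = (ab/2) cos (ωτ + φ) for x(t) = a cos ωt, y(t) = b cos (ωt + φ), and verifies that components of different frequencies average to zero:

```python
import math

# t-average cross-correlation of periodic components: R_xy(tau) = <x(t) y(t+tau)>.
a, b, w, phi = 2.0, 3.0, 1.0, 0.7
tau = 0.4

def t_average(f, T=2000.0, n=400_000):
    # Riemann-sum approximation of (1/T) * integral_0^T f(t) dt
    h = T / n
    return sum(f(k * h) for k in range(n)) * h / T

Rxy = t_average(lambda t: a * math.cos(w * t) * b * math.cos(w * (t + tau) + phi))
Rxy_theory = 0.5 * a * b * math.cos(w * tau + phi)
# components at different frequencies are "orthogonal" (average to zero):
cross = t_average(lambda t: math.cos(1.0 * t) * math.cos(2.0 * t))
```

Only approximate agreement is expected, since the averaging interval T is finite.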
NOTE: The (real) functions x(t), y(t) belong to a complex unitary vector space with inner product (u, v) = <u*(0)v(0)> (Sec. 14.2-6). Note the useful orthogonality relations
18.10-10. Generalized Fourier Transforms and Integrated Spectra. (a) To avoid the difficulties associated with delta-function terms in the Fourier transforms and spectral densities of periodic functions, one may introduce the generalized or integrated Fourier transform XINT(iω) of x(t), defined (to within an additive constant) by
The corresponding inversion integral is the Stieltjes integral (Sec. 4.6-17)
If the Fourier transform XF(iω) of x(t) exists, then
If x(t) can be represented as a sum of periodic components (this is, in particular, true for periodic functions; see also Sec. 18.11-1), then XINT(iω) is a step function (Sec. 21.9-1).
(b)The integrated power spectrum ΦINT(ω) of a stationary or wide-sense stationary random process generating x(t) is the generalized Fourier transform of its autocorrelation function:
Analogous relations can be written for non-ensemble correlation functions and spectra.
(c)Note the following generalizations of the Wiener-Khinchine relations (9) and (26) for real stationary (or wide-sense stationary) x(t).
For τ = 0, Eq. (40) yields Wiener's Quadratic-variation Theorem
If the non-ensemble power spectral density Ψxx(ω) exists, Eq. (40) reduces to the Wiener-Khinchine relation (26), with
18.11. SPECIAL CLASSES OF RANDOM PROCESSES. EXAMPLES
18.11-1. Processes with Constant and Periodic Sample Functions. (a) Constant Sample Functions (Fig. 18.11-1a). If each sample function x(t) is identically equal to a constant random parameter a with given probability distribution, the latter determines the resulting random process uniquely. The process is stationary; but it is not ergodic. If E{a2} exists,
(b) Random-phase Sine Waves. Let
(Fig. 18.11-1b) where a is a given constant, and the phase angle α is a random variable uniformly distributed between 0 and 2π. The process is stationary and ergodic, with
If the amplitude a of the random-phase sine wave is not a constant, but is itself a (positive) random variable independent of α (as in amplitude modulation), the process is stationary but not in general ergodic.
Now
If, in particular, the amplitude a has a Rayleigh distribution defined by
(circular normal distribution with σ2 = 1, Sec. 18.8-7), then the random process is Gaussian (Sec. 18.11-3).
If the phase angle α is not uniformly distributed between 0 and 2π, then the process is nonstationary even if the amplitude a is fixed.
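The ergodic property of the fixed-amplitude random-phase sine wave can be illustrated numerically: the t average of x² along one sample function and the ensemble average of x² at one fixed t should both equal a²/2. A Python sketch (all parameters illustrative):

```python
import random, math

# Ergodicity sketch for x(t) = a cos(w0 t + alpha), alpha uniform on [0, 2pi).
random.seed(4)
a, w0 = 2.0, 3.0

# t average of x^2 along a single sample function (one fixed alpha):
alpha = random.uniform(0.0, 2.0 * math.pi)
T, n = 1000.0, 200_000
h = T / n
t_avg = sum((a * math.cos(w0 * k * h + alpha)) ** 2 for k in range(n)) * h / T

# ensemble average of x^2 at one fixed t, over many independent phases:
t_fixed, M = 0.123, 100_000
e_avg = sum((a * math.cos(w0 * t_fixed + random.uniform(0, 2 * math.pi))) ** 2
            for _ in range(M)) / M
# both should approach a^2 / 2 = 2
```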
FIG. 18.11-1. Sample functions x(t) for five examples of random processes. In Fig. 18.11-1e, x(t) is the sum of the individual pulses akv(t — tk) shown.
(c) More General Periodic Processes (see also Sec. 18.10-9). The random-phase sine wave is a special case of the general random-phase periodic process represented by
where α is uniformly distributed between 0 and 2π; it is assumed that the series converges in mean square in the sense of Sec. 18.6-3. The process is stationary and ergodic, with
FIG. 18.11-2. Autocorrelation function and power spectrum for a random telegraph wave (a) and a coin-tossing sample-hold process (b) having equal mean count rates α = 1/2Δt, both with zero mean and mean square a2. Note that different ω scales are used in (a) and (b). (From G. A. Korn, Random-process Simulation and Measurements, McGraw-Hill, New York, 1966.)
A still more general periodic process is defined by the Fourier series
with real random coefficients c0, ak, bk, assuming that the series converges in mean square. Such a process is wide-sense stationary if and only if
In this case, Eq. (8) is an orthogonal-series expansion in the sense of Sec. 18.9-5, and
18.11-2. Band-limited Functions and Processes. Sampling Theorems. (a) A function x(t) is band-limited between ω = 0 and ω = 2πB if and only if its Fourier transform XF(iω) (Sec. 4.11-3) exists and equals zero for |ω| > 2πB; B (measured in cycles per second if t is measured in seconds) is the bandwidth associated with x(t). For every band-limited x(t)
i.e., x(t) is uniquely determined for all t by samples x(tk) spaced 1/2B t-units apart (Nyquist-Kotelnikov-Shannon Sampling Theorem).
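The sampling-theorem reconstruction can be sketched numerically. The following Python fragment (bandwidth, test signal, evaluation point, and truncation limit all illustrative; the infinite series is truncated, so only approximate agreement is expected) rebuilds a band-limited cosine from its samples x(k/2B):

```python
import math

# Shannon reconstruction: x(t) = sum_k x(k/2B) sinc(2B t - k),
# with sinc(u) = sin(pi u) / (pi u).
def sinc(u):
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)

B = 1.0                                        # bandwidth in cps
x = lambda t: math.cos(2.0 * math.pi * 0.3 * t)  # band-limited: 0.3 cps < B

def reconstruct(t, K=20_000):
    # truncated cardinal series over samples x(k / 2B)
    return sum(x(k / (2.0 * B)) * sinc(2.0 * B * t - k) for k in range(-K, K + 1))

t0 = 0.37                                      # a point between samples
err = abs(reconstruct(t0) - x(t0))
```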
The functions (Fig. 18.11-3)
FIG. 18.11-3. The sampling function sinc (see also Table F-21).
constitute a complete orthonormal set for the space of functions x(t) band-limited between ω = 0 and ω = 2πB (Sec. 15.2-4); note
(b) A stationary or wide-sense stationary random process with sample functions x(t) is band-limited between ω = 0 and ω = 2πB if and only if its ensemble power spectral density Φxx(ω) exists and equals zero for |ω| > 2πB. In this case, the expansion (11) applies in the sense of mean-square convergence (Sec. 18.6-3), i.e.,
and Eq. (11) represents each sample function x(t) in terms of its sample values xk = x(k/2B) with probability one.
NOTE: In the special case of a stationary band-limited “flat-spectrum” process with
the sample values xk = x(k/2B) have zero mean and are uncorrelated.
18.11-3. Gaussian Random Processes (see also Secs. 18.8-3 to 18.8-8, and 18.12-6). A real random process is Gaussian if and only if all its probability distributions are normal distributions for all t1, t2, . . . . Every Gaussian process is uniquely defined by its (necessarily normal) second-order probability distribution, and hence by the ensemble autocorrelation function Rxx(t1, t2) ≡ E{x(t1)x(t2)} together with ξ(t) ≡ E{x(t)}. Specifically, the joint distribution of every set of sample values x1 = x(t1), x2 = x(t2), . . . , xn = x(tn) is a normal distribution with probability density
Processes obtained through addition of Gaussian processes and/or linear operations on their sample functions are Gaussian (Sec. 18.12-2). Coefficients in orthogonal-function expansions of a Gaussian process (Sec. 18.9-5) are jointly Gaussian random variables.
18.11-4. Markov Processes and the Poisson Process. (a) Random Process of Order n. A random process of order n is a random process completely specified by its nth (nth-order) distribution function Φ(n) (Sec. 18.9-2), but not by Φ(n−1).
(b) Purely Random Processes. A random process described by x(t) is a purely random process if and only if the random variables x(t1), x(t2), . . . are statistically independent for every finite set t1, t2, . . . . A purely random process is completely specified by Φ(1)(X1, t1), p(1)(X1, t1), or φ(1)(X1, t1).
EXAMPLES: Successive independent observations, Bernoulli trials, and random samples in statistics (Sec. 19.1-2) represent purely random series. Purely random continuous-parameter processes imply sample functions of unlimited bandwidth and cannot, strictly speaking, describe real physical phenomena.
(c) Markov Processes. A discrete or continuous random process described by x(t) is a (simple) Markov process if and only if, for every finite set t1 < t2 < . . . < tn−1 < tn,
respectively. If x(tn−1) = Xn−1 is given, knowledge of x(tn−2), x(tn−3), . . . contributes nothing to one's knowledge of the distribution of x(tn). A Markov process is completely specified by its second-order probability distribution and hence by its first-order probability distribution together with the “transition probabilities” given by
A Markovian random series is often called a Markov chain. Every purely random process is a Markov process.
Many physical processes can be described as Markov processes. An important class of problems involves the determination of the functions (21) from their given “initial values” specified for t = t1. The defining property (20) of a Markov process implies the Chapman-Kolmogorov-Smoluchovski equation
Equation (22) is a first-order difference equation (Sec. 20.4-3) which may be solved for the unknown function (21) of the independent variable t whenever p(x, t|X1, t1) or φ(x, t|X1, t1) is suitably given. If p(1)(X1, t1) or φ(1)(X1, t1) is known, the Markov process is now completely determined for all t > t1.
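For a finite-state Markov chain the Chapman-Kolmogorov-Smoluchovski equation reduces to a matrix product, which makes it easy to verify directly. A Python sketch with illustrative two-state transition matrices:

```python
# Discrete-state, discrete-time analogue of the Chapman-Kolmogorov-Smoluchovski
# equation: summing over the intermediate state gives the matrix product
# P[t1 -> t3] = P[t1 -> t2] P[t2 -> t3].
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P12 = [[0.9, 0.1],
       [0.2, 0.8]]          # transition probabilities t1 -> t2 (rows sum to 1)
P23 = [[0.7, 0.3],
       [0.4, 0.6]]          # transition probabilities t2 -> t3
P13 = matmul(P12, P23)      # two-step transition probabilities t1 -> t3

row_sums = [sum(row) for row in P13]   # must again sum to 1 in each row
```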
(d) The Poisson Process. In many problems involving random searches, waiting lines, radioactive decay, etc., x(t) is a discrete random variable capable of assuming the spectral values 0, 1, 2, . . . (“counting process”; number of “successes,” telephone calls, disintegrations, etc.). A frequently useful model assumes the Markov property (20a) and
where o(Δt) denotes a term such that o(Δt)/Δt becomes negligible as Δt → 0 (Sec. 4.4-3). To find
substitute the given transition probabilities (23) into the Smoluchovski equation (22a) for t2 = t + Δt to obtain the difference equation
with P(−1, T) ≡ 0. As Δt → 0, this reduces to an ordinary differential equation
for each K. These differential equations are solved successively for P(0, T), P(1, T), P(2, T), . . . , with initial conditions given by
It follows that
Thus, once the process is started, the number K of state changes in every time interval of length T has the Poisson distribution (Table 18.8-4). α is called the mean count rate of the Poisson process.
The probability that no state changes take place is
so that the probability that at least one state change takes place is
The time interval T1 between successive state changes is a random variable with probability density
and expected value 1/α.
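These properties are easily checked by simulation, since the times between state changes are independent exponential variables with mean 1/α. A Python sketch (rate, interval length, and trial count illustrative):

```python
import random, math

# Simulation of the Poisson counting process: exponential interarrival times
# with mean 1/alpha; the count K in an interval of length T is Poisson,
# so E{K} = alpha*T and P{K = 0} = exp(-alpha*T).
random.seed(5)
alpha, T, trials = 2.0, 1.0, 100_000

def count_in(T):
    t, k = random.expovariate(alpha), 0
    while t <= T:
        k += 1
        t += random.expovariate(alpha)
    return k

counts = [count_in(T) for _ in range(trials)]
mean_count = sum(counts) / trials        # should approach alpha * T = 2
p_zero = counts.count(0) / trials        # should approach exp(-2)
```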
Within any finite time interval of length T, a Poisson process is also uniquely defined by the joint distribution of the K + 1 statistically independent random variables K, t1, t2 . . . , tK, where K is the number of state changes during the time T, and t1, t2, . . . , tK are now the respective times of the 1st, 2nd, . . . , Kth state change during this time interval. One has
(e) See Refs. 18.15, 18.16, and 18.17 for treatments of more general Markov processes.
18.11-5. Some Random Processes Generated by a Poisson Process. (a) Random Telegraph Wave (Fig. 18.11-1c). x(t) equals either a or −a, with sign changes generated by the state changes of a Poisson process of mean count rate α (Sec. 18.11-4d). The process is stationary and ergodic if started at t = −∞, and
(b) Process Generated by Poisson Sampling (Fig. 18.11-1d). x(t) changes value at each state change of a Poisson process with mean count rate α; between state changes, x(t) is constant and takes continuously distributed random values x with given mean ξ and variance σ2. The process is stationary and ergodic if started at t = −∞, and
(c) Impulse Noise and Campbell's Theorem (Fig. 18.11-1e). x(t) is the sum of many similarly shaped transient pulses,
whose shape is given by v = v(t), with
while the pulse amplitude ak is a random variable with finite variance, and the times tk are random incidence times determined by the state changes of a Poisson process with mean count rate α. The process is stationary and ergodic if started at t = — ∞; it approximates a Gaussian random process if many pulses overlap. One has
In the special case where ak is a fixed constant, the formulas (36) are known as Campbell's theorem.
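Campbell's theorem can be checked by simulation. The Python sketch below assumes the illustrative exponential pulse shape v(t) = e^(−t/θ) for t ≥ 0 and fixed amplitude A, for which E{x} = αA∫v dt = αAθ and Var{x} = αA²∫v² dt = αA²θ/2:

```python
import random, math

# Monte Carlo check of Campbell's theorem for impulse noise with fixed
# amplitude A and pulse shape v(t) = exp(-t/theta), t >= 0.
random.seed(7)
alpha, A, theta = 5.0, 1.0, 0.5
t0, trials = 10.0 * theta, 20_000   # observe at t0; pulses before t=0 negligible

s = s2 = 0.0
for _ in range(trials):
    x, t = 0.0, random.expovariate(alpha)   # Poisson pulse incidence times
    while t <= t0:
        x += A * math.exp(-(t0 - t) / theta)   # pulse started at t, seen at t0
        t += random.expovariate(alpha)
    s += x; s2 += x * x
mean_est = s / trials
var_est = s2 / trials - mean_est ** 2
mean_theory = alpha * A * theta             # 2.5
var_theory = alpha * A * A * theta / 2.0    # 1.25
```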
18.11-6. Random Processes Generated by Periodic Sampling. Certain measuring devices sample a stationary and ergodic random variable q(t) periodically and then hold their output x(t) for a constant sampling interval Δt. The resulting random process is stationary and ergodic if the timing of the periodic sampling commands is random and uniformly distributed between 0 and Δt. A sample function x(t) will be similar to Fig. 18.11-1d except that state changes must be separated by integral multiples of Δt. If q is a binary random variable capable of assuming only the values a and −a with probabilities 1/2, 1/2, then x(t) will resemble the random telegraph wave of Fig. 18.11-1c, except that state changes are, again, separated by integral multiples of Δt (“coin-tossing” sample-hold process).
If different samples of q are statistically independent, then

Rxx(τ) = ξ² + σ²(1 − |τ|/Δt)    (|τ| ≤ Δt)        Rxx(τ) = ξ²    (|τ| > Δt)

where ξ = E{q}, σ² = Var{q}, and hence

Φxx(ω) = 2πξ²δ(ω) + σ²Δt [sin (ωΔt/2)/(ωΔt/2)]²
Figure 18.11-2 compares Rxx(τ) and Φxx(ω) for a random telegraph wave and a coin-tossing sample-hold process with equal mean count rates α = 1/(2Δt), zero mean, and E{x²} = a².
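The triangular autocorrelation of the coin-tossing sample-hold process can also be verified by simulation. In the sketch below (parameter values are illustrative assumptions), the random sampling phase makes the process stationary; two times separated by τ < Δt share the same held sample with probability 1 − τ/Δt and otherwise see independent samples.

```python
import numpy as np

# Monte Carlo check of the triangular autocorrelation of the
# "coin-tossing" sample-hold process with zero mean:
#   Rxx(tau) = a^2 * (1 - |tau|/dt)   for |tau| <= dt, zero outside.
rng = np.random.default_rng(2)
a, dt, tau, trials = 1.0, 1.0, 0.4, 200_000

phase = rng.uniform(0.0, dt, trials)            # time since last sampling command
q = a * rng.choice([-1.0, 1.0], (trials, 2))    # two consecutive independent samples
same = phase + tau < dt                         # both times in the same hold interval
x_t = q[:, 0]
x_tau = np.where(same, q[:, 0], q[:, 1])
R_est = np.mean(x_t * x_tau)
R_theory = a**2 * (1 - tau / dt)                # = 0.6 here
```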
18.12. OPERATIONS ON RANDOM PROCESSES
18.12-1. Correlation Functions and Spectra of Sums. Let x(t), y(t) be generated by real or complex random processes. For

z(t) = αx(t) + βy(t)    (1)

with real or complex α, β, the correlation functions Rxz(t1, t2), Rzx(t1, t2), Rzz(t1, t2) are given by

Rxz = αRxx + βRxy        Rzx = α*Rxx + β*Ryx
Rzz = |α|²Rxx + α*βRxy + β*αRyx + |β|²Ryy    (2)

[with Ruv(t1, t2) ≡ E{u*(t1)v(t2)}; asterisks denote complex conjugates]. These relations also apply to the correlation functions Rxz(τ), Rzx(τ), Rzz(τ) of stationary random processes; the corresponding spectral densities are

Φxz = αΦxx + βΦxy        Φzx = α*Φxx + β*Φyx
Φzz = |α|²Φxx + α*βΦxy + β*αΦyx + |β|²Φyy    (3)
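The sum relations follow from linearity of the expectation, so they hold exactly for any finite ensemble. The following Python sketch (ensemble size and coefficient values are illustrative assumptions, and the convention Ruv = E{u*v} is assumed) verifies the Rzz relation for complex α, β.

```python
import numpy as np

# Finite-ensemble check of the sum relation for z = alpha*x + beta*y,
# with the convention R_uv(t1, t2) = E{u*(t1) v(t2)}.  Because the
# identity follows from linearity of E{.}, it holds to rounding error.
rng = np.random.default_rng(3)
n = 1000                                   # ensemble members
x1, x2 = rng.normal(size=(2, n)) + 1j * rng.normal(size=(2, n))
y1, y2 = rng.normal(size=(2, n)) + 1j * rng.normal(size=(2, n))
alpha, beta = 2.0 - 1.0j, 0.5 + 3.0j

R = lambda u, v: np.mean(np.conj(u) * v)   # ensemble-average correlation
z1, z2 = alpha * x1 + beta * y1, alpha * x2 + beta * y2

lhs = R(z1, z2)
rhs = (abs(alpha)**2 * R(x1, x2) + np.conj(alpha) * beta * R(x1, y2)
       + np.conj(beta) * alpha * R(y1, x2) + abs(beta)**2 * R(y1, y2))
```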
18.12-2. Input-Output Relations for Linear Systems. (a) Consider a real linear system with real input x(t) and output

y(t) = ∫ w(t, λ)x(λ) dλ = ∫ h(t, ζ)x(t − ζ) dζ    (4)

where the weighting function w(t, λ) (Green's function, Secs. 9.3-3 and 9.4-3) is the system response to a unit-impulse input δ(t − λ) (impulse applied at t = λ), and h(t, ζ) ≡ w(t, t − ζ).
In the most important applications, t represents time, and w(t, λ) = 0 for t < λ, since physically realizable systems cannot respond to future inputs (see also Sec. 9.4-3).
(b) If x(t) is generated by a real random process, and if E{x²(t)} and E{y²(t)} exist, then

E{y(t)} = ∫ w(t, λ)E{x(λ)} dλ    (5)

Rxy(t1, t2) = ∫ w(t2, λ)Rxx(t1, λ) dλ    (6)

Ryy(t1, t2) = ∫∫ w(t1, λ1)w(t2, λ2)Rxx(λ1, λ2) dλ1 dλ2    (7)

(all integrals extending from −∞ to ∞).
If x(t) is Gaussian, y(t) is also Gaussian and completely determined by Eqs. (5) to (7).
18.12-3. The Stationary Case. (a) If the input x(t) is stationary, and

w(t, λ) ≡ h(t − λ)

(time-invariant linear system, see also Sec. 9.4-3), then the system output y(t) is also stationary; y(t) will be ergodic (Sec. 18.10-7b) if this is true for x(t). The input-output relations (4) to (7) for real x(t), y(t) reduce to

y(t) = ∫ h(ζ)x(t − ζ) dζ    (10)

E{y} = E{x} ∫ h(ζ) dζ        Rxy(τ) = ∫ h(ζ)Rxx(τ − ζ) dζ
Ryy(τ) = ∫∫ h(ζ1)h(ζ2)Rxx(τ + ζ1 − ζ2) dζ1 dζ2    (11)
In most applications, physical realizability requires h(ζ) = 0 for ζ < 0 (see also Sec. 9.4-3).
(b) The important input-output relations (11) are greatly simplified if they are expressed in terms of spectral densities (Sec. 18.10-3):

Φxy(ω) = H(iω)Φxx(ω)    (12)
Φyx(ω) = H(−iω)Φxx(ω)    (13)
Φyy(ω) = |H(iω)|²Φxx(ω)    (14)

where H(iω) ≡ ∫ h(ζ)e^(−iωζ) dζ is the frequency-response function of the system.
(c) Note also

E{y²} = Ryy(0) = (1/2π) ∫ Φyy(ω) dω    (15)

In the special case of stationary white-noise input with Rxx(τ) ≡ Φ0δ(τ) (Sec. 18.11-4b), note

Ryy(τ) = Φ0 ∫ h(ζ)h(ζ + τ) dζ    (16)        E{y²} = Φ0 ∫ h²(ζ) dζ    (17)
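The stationary input-output relations have an exact discrete-time, periodic analogue that can be verified numerically. The Python sketch below (sequence length, weighting-function taps, and the auxiliary spectral density are illustrative assumptions) checks that the correlation-domain form of (11) and the frequency-domain form Φyy = |H|²Φxx agree, with all indices taken modulo N.

```python
import numpy as np

# Discrete-time, periodic check of the stationary input-output relations:
# with y[n] = sum_k h[k] x[n-k] (indices mod N), one has
#   Ryy[m] = sum_{j,k} h[j] h[k] Rxx[m + j - k]   and   Phi_yy = |H|^2 Phi_xx,
# where Phi denotes the DFT of the correlation sequence.
rng = np.random.default_rng(4)
N = 64
h = np.zeros(N)
h[:4] = [1.0, -0.5, 0.25, 0.1]                 # causal weighting function

g = rng.normal(size=N)                         # auxiliary sequence
Phi_xx = np.abs(np.fft.fft(g))**2              # a valid (nonnegative) spectral density
Rxx = np.fft.ifft(Phi_xx).real                 # the corresponding correlation sequence

# correlation-domain relation (11), discrete form
Ryy = np.array([sum(h[j] * h[k] * Rxx[(m + j - k) % N]
                    for j in range(4) for k in range(4)) for m in range(N)])
# frequency-domain relation (14), discrete form
Phi_yy = np.abs(np.fft.fft(h))**2 * Phi_xx
Ryy_from_spectrum = np.fft.ifft(Phi_yy).real
```

Both routes give the same Ryy to rounding error, since the two relations are DFT transforms of one another.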
18.12-4. Relations for t Correlation Functions and Non-ensemble Spectra. The relations (2), (4), and (10) to (17) all hold if each ensemble average, correlation function, and spectral density is replaced by the corresponding t average, t correlation function, and non-ensemble spectral density (Secs. 18.10-7 to 18.10-9), whenever these quantities exist.
18.12-5. Nonlinear Operations. Given a random process generating x(t) and a single-valued, measurable function y = y(x), the functions

y(t) ≡ y[x(t)]    (18)

represent a new random process produced by a (generally nonlinear) zero-memory operation on the x(t); y(x) does not depend explicitly on t. Distributions and ensemble averages of the y process are obtained by the methods of Secs. 18.5-2 and 18.5-4. In particular, the autocorrelation function of the “output” y is, for real variables,

Ryy(t1, t2) = E{y1y2} = ∫∫ y(x1)y(x2)φ(x1, x2; t1, t2) dx1 dx2    (19)

where x1 = x(t1), x2 = x(t2); y1 = y(x1), y2 = y(x2), and φ(x1, x2; t1, t2) is the second-order probability density of the x process.
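For a Gaussian input the double integral (19) can often be evaluated in closed form. A classical example is the hard limiter y = a sgn x, for which the integral over the joint normal density yields the arcsine law Ryy = (2a²/π) arcsin ρxx. The Python sketch below (correlation value and sample size are illustrative assumptions) checks this by Monte Carlo.

```python
import numpy as np

# Monte Carlo check of a zero-memory nonlinearity on a Gaussian process:
# for y = sgn(x) with unit-variance jointly normal x1, x2 of correlation
# rho, Eq. (19) gives the arcsine law  Ryy = (2/pi) * arcsin(rho).
rng = np.random.default_rng(5)
rho, trials = 0.6, 400_000
cov = [[1.0, rho], [rho, 1.0]]
x1, x2 = rng.multivariate_normal([0, 0], cov, trials).T
R_est = np.mean(np.sign(x1) * np.sign(x2))
R_theory = (2 / np.pi) * np.arcsin(rho)
```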
If this turns out to be more convenient, Ryy(t1, t2) can be obtained in the form

Ryy(t1, t2) = −(1/4π²) ∫C1 ∫C2 f(s1)f(s2)E{e^(s1x1 + s2x2)} ds1 ds2    (20)

where f(s) is a (suitably generalized) Laplace transform of the transfer characteristic y(x), and the integration contours C1, C2 parallel the imaginary axis in suitable absolute-convergence strips (Ref. 18.15). The “transform method” is especially useful in connection with certain practically important transfer characteristics y(x), e.g., limiters, half-wave detectors, quantizers, etc. (Refs. 18.13 and 18.15).
18.12-6. Nonlinear Operations on Gaussian Processes. (a) Price's Theorem (Ref. 18.17). Given two jointly normal random variables x1, x2 with covariance λ12 and a function f(x1, x2) such that

|f(x1, x2)| ≤ ae^(|x1|^b + |x2|^b)

for some real a > 0, b < 2, then

∂E{f(x1, x2)}/∂λ12 = E{∂²f(x1, x2)/∂x1∂x2}
Price's theorem yields ensemble averages (and, in particular, correlation functions) in the form

E{f(x1, x2)} = ∫ (0 to λ12) E{∂²f/∂x1∂x2} dλ + C

where C is the value of E{f(x1, x2)} for λ12 = 0, i.e., for uncorrelated x1, x2.
Price's theorem also leads to the useful recursion formula

∂E{x1^m x2^n}/∂λ12 = mnE{x1^(m−1) x2^(n−1)}

In particular, for E{x1} = E{x2} = 0,

E{x1²x2²} = E{x1²}E{x2²} + 2λ12²
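The special case above follows by applying Price's theorem to f = x1²x2²: ∂E{f}/∂λ12 = E{4x1x2} = 4λ12, and integrating from the uncorrelated case (C = E{x1²}E{x2²}) gives E{x1²x2²} = E{x1²}E{x2²} + 2λ12². The Python sketch below (variances, covariance, and sample size are illustrative assumptions) checks this consequence by Monte Carlo.

```python
import numpy as np

# Monte Carlo check of a consequence of Price's theorem for f = x1^2 x2^2:
#   d E{f}/d lambda12 = E{4 x1 x2} = 4 lambda12,
# so integrating from lambda12 = 0 gives
#   E{x1^2 x2^2} = sigma1^2 sigma2^2 + 2 lambda12^2.
rng = np.random.default_rng(6)
s1, s2, lam, trials = 1.5, 0.8, 0.9, 400_000
cov = [[s1**2, lam], [lam, s2**2]]
x1, x2 = rng.multivariate_normal([0, 0], cov, trials).T
lhs = np.mean(x1**2 * x2**2)
rhs = s1**2 * s2**2 + 2 * lam**2
```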
(b) Series Expansion. Given a stationary Gaussian process x(t) with E{x} = 0, Rxx(τ) = σ²ρxx(τ) and a function y = y(x) such that Ryy(τ) exists, then

Ryy(τ) = Σ (k = 0 to ∞) (ak²/k!) ρxx^k(τ)        ak = (2π)^(−1/2) ∫ y(σv)Hk(v)e^(−v²/2) dv

where the Hk(v) are Hermite polynomials orthogonal with respect to the weight e^(−v²/2) (see also Table 21.7-1).
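The series can be checked exactly for a polynomial nonlinearity. For y = x³, only a1 = 3σ³ and a3 = 6σ³ are nonzero, and the series sums to 9σ⁶ρ + 6σ⁶ρ³, which agrees with the direct Gaussian-moment (Isserlis) evaluation of E{x1³x2³}. The Python sketch below (σ and ρ values are illustrative assumptions) computes the coefficients ak by Gauss-Hermite quadrature with the weight e^(−v²/2) and compares the truncated series with the exact value.

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e as He

# Check of the Hermite-series expansion for the zero-memory output
# y = x^3 of a stationary Gaussian x(t) with E{x} = 0, Rxx = sigma^2*rho:
#   Ryy = sum_k (a_k^2 / k!) rho^k,   a_k = E{ y(sigma*v) He_k(v) },
# v standard normal.  For y = x^3 the exact (Isserlis) result is
#   Ryy = 9 sigma^6 rho + 6 sigma^6 rho^3.
sigma, rho = 1.3, 0.7
v, w = He.hermegauss(40)                 # nodes/weights for weight exp(-v^2/2)
a = [np.sum(w * (sigma * v)**3 * He.hermeval(v, [0] * k + [1])) / np.sqrt(2 * np.pi)
     for k in range(6)]                  # a_k by Gauss-Hermite quadrature
series = sum(a[k]**2 / factorial(k) * rho**k for k in range(6))
exact = 9 * sigma**6 * rho + 6 * sigma**6 * rho**3
```

The quadrature is exact here (the integrands are polynomials of degree at most 8), so the two values agree to rounding error.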
18.13. RELATED TOPICS, REFERENCES, AND BIBLIOGRAPHY
18.13-1. Related Topics. The following topics related to the study of probability theory and random processes are treated in other chapters of this handbook:
Measure, Lebesgue integrals, Stieltjes integrals, Fourier analysis Chap. 4
Construction of mathematical models, abstract spaces, Boolean algebras Chap. 12
Orthogonal-function expansions Chap. 15
Mathematical statistics, random-process measurements and tests Chap. 19
Permutations and combinations Appendix C
18.13-2. References and Bibliography (see also Sec. 19.9-2).
18.1. Arley, N., and K. R. Buch: Introduction to the Theory of Probability and Statistics, Wiley, New York, 1950.
18.2. Burington, R. S., and D. C. May: Handbook of Probability and Statistics, 2d ed., McGraw-Hill, New York, 1967.
18.3. Cramér, H.: Mathematical Methods of Statistics, Princeton University Press, Princeton, N.J., 1951.
18.4.———: The Elements of Probability Theory and Some of Its Applications, Wiley, New York, 1955.
18.5. Feller, W.: An Introduction to Probability Theory and Its Applications, vol. I, 2d ed., Wiley, New York, 1958; vol. II, 1966.
18.6. Gnedenko, B. V.: Theory of Probability, Chelsea, New York, 1962.
18.7.——— and A. I. Khinchine: An Elementary Introduction to the Theory of Probability, Dover, New York, 1961.
18.8. Loève, M. M.: Probability Theory, 3d ed., Van Nostrand, Princeton, N.J., 1963.
18.9. Parzen, E.: Modern Probability Theory and Its Applications, Wiley, New York, 1960.
18.10. Richter, H.: Wahrscheinlichkeitstheorie, 2d ed., Springer, Berlin, 1967.
Random Processes
18.11. Bailey, N. T. J.: The Elements of Stochastic Processes with Applications to the Natural Sciences, Wiley, New York, 1964.
18.12. Bharucha-Reid, A. J.: Elements of the Theory of Markov Processes and Their Applications, McGraw-Hill, New York, 1960.
18.13. Davenport, W. B., Jr., and W. L. Root: Introduction to Random Signals and Noise, McGraw-Hill, New York, 1958.
18.14. Doob, J. L.: Stochastic Processes, Wiley, New York, 1953.
18.15. Middleton, D.: An Introduction to Statistical Communication Theory, McGraw-Hill, New York, 1960.
18.16. Parzen, E.: Stochastic Processes, Holden-Day, San Francisco, 1962.
18.17. Papoulis, A.: Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 1965.
18.18. Rosenblatt, M.: Random Processes, Oxford, New York, 1962.
18.19. Saaty, T. L.: Elements of Queueing Theory with Applications, McGraw-Hill, New York, 1961.