6

Chance Processes and Fluctuations

WILLIAM FELLER

PROFESSOR OF MATHEMATICS
PRINCETON UNIVERSITY

6.1    Introduction

Chance processes play a role in almost all phases of modern life: The efficient organization of traffic and communication systems, the provision of an adequate power supply and of storage and emergency facilities, the setting of tolerance limits, etc., all depend on an understanding of the phenomena of chance fluctuations and on an estimation of their future magnitude. Accordingly, probability theory has to cope with a large variety of problems and methods, and it would obviously be impossible here to present a fair sample of them. We shall therefore limit our attention essentially to two simple but methodologically interesting problems.

First we move in the direction of the classical central limit theorem of probability theory. It is well known that the normal (or Gaussian) distribution appears almost every time when we have to deal with the summation of many small chance effects. We shall attempt to explain this phenomenon, approaching it by the method of random walks.1 This method is useful in many connections, and it has the advantage of showing that the normal distribution enters the theory in its role as fundamental solution of the heat equation. This leads us naturally to the more general Fokker-Planck, or diffusion, equation and permits us to deal with important variations on the theme, such as the ruin problem.

In the second part, we use the random-walk approach to discuss the simplest model for fluctuations in a waiting line or queue.3,4 We discuss processes characterized by a lack of memory, or aftereffect, and the role of the exponential law for decay and for holding times. An effort is made to elucidate the operational meaning of the so-called steady state (or state of statistical equilibrium), which, together with the law of large numbers,1 is the subject of widespread and dangerous misconceptions.

SUMS OF RANDOM VARIABLES

6.2    Cumulative Effects

We begin with an analysis of the probable fluctuations of a quantity that may be considered as the resultant of a huge number of small—i.e., individually negligible—chance effects. For example, the electricity consumption in a city at any given time is the sum of the demands of a great number of individual households, offices, factories, etc. Similarly, the supply and the demand in a variety of economic processes are due to many contributing sources, and certain heritable characteristics are the cumulative effect of many individually small genetic contributions. The error of measurement in a physical experiment is composed of many unobservably small errors, and the whole theory to be presented traces its origin to this example. The thickness of washers produced in a factory and the total damage due to fire accidents may serve as further familiar examples.

In each case, we deal with a large (not necessarily known) number of random variables X1, X2, …, Xn and their sum

Sn = X1 + X2 + ⋯ + Xn   (6.1)

Perhaps the most important and intuitively simplest case arises in connection with stochastic processes, i.e., processes with a time-dependent random variable S(t). Typical examples are the amount of water in a reservoir, the total damage due to accidents up to time t, and the electricity consumption in a city. These quantities may be the cumulative effect of many contributing factors, but there is another way in which we may treat them as a sum of the form (6.1). In fact, we may split the time interval from 0 to t into n intervals of length τ = t/n and consider the increment S(t) − S(0) as the sum of n corresponding increments S(kτ) − S(kτ − τ), k = 1, 2, …, n. Here the number n of components is arbitrarily large, and by letting n → ∞ we obtain correct laws rather than approximations.

6.3    The Simplest Random-walk Model

For simplicity, we consider only sums of the form (6.1) in which the components Xk are statistically independent. In the case of a stochastic process S(t), this means that the increment S(t″) − S(t′) in any time interval (t′,t″) is in no way influenced by the past of the process. (“Processes with independent increments.” A more general scheme of Markovian processes will be considered in Sec. 6.6.)

The next restriction will at first appear much harder, but will turn out to be rather harmless (see Sec. 6.6). For simplicity of exposition, we shall restrict our attention to the case where the components Xk can assume only three values, namely 0, h, and −h, where h > 0 is a constant. The probabilities that Xk equals h, −h, and 0 are p, q, and r, respectively, with p + q + r = 1.

In the tritest example, Xk represents the gain or loss of a gambler at the kth trial (with r the probability of a tie), and Sn the accumulated gain in the first n trials. It is convenient to imagine that the trials take place at a uniform rate at times τ, 2τ, 3τ, …, so that the index n may be interpreted as a time variable, t = nτ.

In Brownian motion, or diffusion, one considers a particle exposed to random molecular collisions. These, of course, are neither uniformly spaced in time nor of equal magnitude. In first approximation, however, we can work with averages and assume that the shocks can occur only at times τ, 2τ, 3τ, … and that the resulting displacement is always 0, h, or −h. In this case, Sn represents the position of the moving particle at time t = nτ, and if at some time t the position happens to be x = kh, then the position of the particle at the next moment t + τ is x + h, x − h, or x, with probabilities p, q, and r, respectively.

In other words, we are concerned with a simple random walk on the x axis starting from the origin and such that all steps have the same length h. This terminology is so convenient that it is used in general for sums Sn of the type described, whatever the empirical meaning (if any) of the random variables Xk. The “particle,” of course, becomes a symbolic indicator for Sn.

Our first task is to calculate the probability

un,k = prob {Sn = kh}

that at time t = nτ the particle occupies the position x = kh; here n = 1, 2, 3, …, and k = 0, ±1, ±2, …. Now Sn+1 can equal kh in only three mutually exclusive cases: either Sn = (k − 1)h and Xn+1 = h; or Sn = (k + 1)h and Xn+1 = −h; or, finally, Sn = kh and Xn+1 = 0. By adding the corresponding probabilities, we find that, for n ≥ 1,

un+1,k = pun,k−1 + qun,k+1 + run,k   (6.2)

This equation remains valid also for n = 0 if we put

u0,0 = 1   u0,k = 0 for k ≠ 0   (6.3)

From Eq. (6.2), we may calculate recursively all the required probabilities un,k by substituting successively n = 0, 1, 2, …. Thus the difference equation (6.2) together with the initial conditions (6.3) furnishes the solution to our problem. Instead of deriving the rather unwieldy exact solution, however, we prefer to approximate Eq. (6.2) by a differential equation. This method has the advantage of leading to an important class of more general diffusion processes.
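The recursion (6.2) with the initial conditions (6.3) is easy to carry out mechanically; the following sketch is ours (the function name and the dictionary representation of the probabilities {un,k} are illustrative conveniences, not part of the text):

```python
def walk_distribution(n, p, q, r):
    """u[k] = prob{Sn = kh}: recursion (6.2) started from conditions (6.3)."""
    u = {0: 1.0}                                          # u_{0,0} = 1
    for _ in range(n):
        nxt = {}
        for k, prob in u.items():
            nxt[k + 1] = nxt.get(k + 1, 0.0) + p * prob   # step +h
            nxt[k - 1] = nxt.get(k - 1, 0.0) + q * prob   # step -h
            nxt[k] = nxt.get(k, 0.0) + r * prob           # no displacement
        u = nxt
    return u
```

With p = q = ½ and r = 0, two steps give probabilities ¼, ½, ¼ for the positions −2h, 0, 2h, as direct enumeration confirms.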

6.4    The Fokker-Planck Equation

It is natural, for our problem, to introduce such units of measurement that the probable fluctuations of Sn will be numerically neither very small nor very large. Since we are interested in large n, this means that h and τ will be numerically very small. One would then expect that for a fixed time t = nτ the probability distribution {un,k} may be interpolated by a probability density u(t,x). This means that, for any integers ν < μ, the probability

prob {νh < Sn ≤ μh} = un,ν+1 + un,ν+2 + ⋯ + un,μ

may be approximated by the integral of u(t,x) over the interval νh < x ≤ μh. Then un,k corresponds to the integral of u(t,x) over an interval of length h centered at x = kh; that is, we should have

un,ku(n,kh)h

This leads us simply to put

u(nτ,kh) = h−1un,k

and to assume that u(t,x) may be defined for all t > 0 and x as a function with two continuous derivatives. This step can be justified rigorously, but we shall proceed purely heuristically.

The difference equation (6.2) now takes on the form

u(t + τ, x) = pu(t, x − h) + qu(t, x + h) + ru(t,x)

or

u(t + τ, x) − u(t,x) = p[u(t, x − h) − u(t,x)] + q[u(t, x + h) − u(t,x)]   (6.4)

By using Taylor’s formula, we conclude that

∂u/∂t ≈ [(p + q)h²/2τ] ∂²u/∂x² − [(p − q)h/τ] ∂u/∂x   (6.5)

The coefficients occurring on the right have a simple physical significance. In fact, (p − q)h = E(Xk) is the mean displacement of the particle at each individual step. Since each step takes a time τ, the ratio

b = (p − q)h/τ

is the mean displacement per time unit. Similarly,

a = (p + q)h²/τ

is the mean-square displacement per time unit. By keeping these quantities fixed, we obtain in the limit as h → 0 the Fokker-Planck (or general diffusion, or heat) equation

∂u/∂t = (a/2) ∂²u/∂x² − b ∂u/∂x   (6.6)

For the initial conditions (6.3), we require the particular solution of Eq. (6.6) that, as t → 0, approaches the so-called Dirac function, that is, u(t,x) → 0 for each x ≠ 0 and u(t,0) → ∞ in such a way that the integral of u equals 1 for all t > 0 (see Chap. 1). With reasonable coefficients a and b, there exists exactly one solution of Eq. (6.6) satisfying this condition, and we may use it as an approximation to the probability distribution of Sn for large n.

The advantage of this method is that our problem has been reduced to the analysis of an equation that plays an important role in many theories. In Sec. 6.6, we shall indicate how it may be derived under less restrictive conditions and in a more general form. It will be seen in Sec. 6.7 that the same equation with appropriate boundary conditions serves several additional purposes.

6.5    Example

Consider a coin-tossing game in which at each trial the gambler wins or loses the amount h > 0, each with probability ½. Here p = q = ½, r = 0, and hence b = 0. The Fokker-Planck equation reduces to the ordinary heat equation

∂u/∂t = (a/2) ∂²u/∂x²   (6.7)

where a is a nonzero constant that may be chosen arbitrarily since this amounts only to a choice of the scale. The required solution of Eq. (6.7) is given by the famous normal (or Gaussian) probability density

u(t,x) = [1/√(2πat)] exp (−x²/2at)

To apply our approximation formula, we have to put τ = h²/a. To calculate approximately the probability distribution of the gambler’s accumulated gain after n trials, we must put t = nτ, that is, at = nh². Thus,

prob {x1 < Sn ≤ x2} ≈ [1/√(2πnh²)] ∫_{x1}^{x2} exp (−x²/2nh²) dx   (6.8)

The quantity Sn/h is, of course, independent of h, and it is more natural to rewrite the relationship (6.8) in the form

prob {x1 < Sn/(h√n) ≤ x2} ≈ [1/√(2π)] ∫_{x1}^{x2} exp (−y²/2) dy

This approximation formula was given by De Moivre in 1718 and is a special case of the central limit theorem of probability theory.1,2 It shows in particular that the probable fluctuations of Sn will be of the order of magnitude h√n. For example, for large n the probability that |Sn| > 3h√n is about 0.0027, and the probability that |Sn| > 4h√n is smaller than 0.000065.
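These tail values can be checked numerically; a minimal sketch (the normal tail is evaluated by the error function, and the coin-tossing probability by an exact binomial sum; the sample size n = 1,000 in the usage below is an arbitrary choice of ours):

```python
import math

def normal_tail(c):
    """prob{|Z| > c} for a standard normal variable Z."""
    return math.erfc(c / math.sqrt(2.0))

def coin_tail(n, c):
    """Exact prob{|Sn| > c*sqrt(n)} for n fair coin tosses with stakes h = 1."""
    bound = c * math.sqrt(n)
    hits = sum(math.comb(n, heads) for heads in range(n + 1)
               if abs(2 * heads - n) > bound)       # Sn = heads - tails
    return hits / 2**n
```

For n = 1,000 the exact binomial tail beyond 3√n already agrees with the limiting value 0.0027 to within a few units in the fifth decimal place.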

If a > 0 and b are arbitrary constants, the required solution of the Fokker-Planck equation (6.6) becomes

u(t,x) = [1/√(2πat)] exp [−(x − bt)²/2at]

which is simply a normal distribution with a moving center (modal point) at x = bt. This explains why so many different phenomena are subject to the normal law.

6.6    Generalizations

The Fokker-Planck equation (6.6) actually may be derived from much more general assumptions than those introduced in Sec. 6.3, and its validity is much wider than is indicated by the above discussion. For example, the restriction that the Xk assume only the values 0 and ± h was introduced only for notational convenience. To illustrate this point, it will suffice to consider another special case, namely, that each Xk ranges over the five possible values −2h, −h, 0, h, 2h, which are assumed, respectively, with probabilities p−2, p−1, p0, p1, p2. In this case, the difference equation (6.4) takes the form

u(t + τ, x) − u(t,x) = p2[u(t, x − 2h) − u(t,x)] + p1[u(t, x − h) − u(t,x)] + p−1[u(t, x + h) − u(t,x)] + p−2[u(t, x + 2h) − u(t,x)]

and the use of Taylor’s formula leads to a differential equation of the form (6.5) except that the coefficients on the right are replaced by

b = (2p2 + p1 − p−1 − 2p−2)h/τ   a = (4p2 + p1 + p−1 + 4p−2)h²/τ

Now these coefficients again represent, respectively, the mean displacement and mean-square displacement per time unit, and so the Fokker-Planck equation (6.6) remains valid without change in the physical (or probabilistic) meaning of the coefficients.

This consideration should render plausible the fact that the differential equation (6.6) represents a rather universal law governing the fluctuation of random variables obtained by a summation of many independent components. In fact, it applies with little change to the much more general case of random walks in which the probability distribution of each step depends on the actual position of the particle, i.e., instead of independent and equally distributed Xk we may consider the case in which Xn+1 and its distribution depend on Sn. The essential new feature is that the coefficients a and b are no longer constants but may depend on x. The derivation outlined above leads in this case to the general Fokker-Planck equation

∂u/∂t = ½ ∂²[a(x)u]/∂x² − ∂[b(x)u]/∂x   (6.9)

6.7    The “Ruin” Problem

In many applications, the problem of estimating the probable fluctuations of Sn or S(t) for a fixed n or t is of secondary interest; what we really require is the probability that Sn or S(t) will remain within preassigned bounds α and β for all times, or at least for all t of a certain period. In technical applications, the setting of tolerance limits and the provision of adequate power supplies, safety measures, or storage facilities are likely to lead to questions of this type. The estimation of the probable extremes of various chance variables in the past or future provides another example. Even in ordinary gambling, the De Moivre formula (6.8) does not give the pertinent information. In fact, if the two players have respectively the finite capitals α and β, then the first gambler’s gain cannot fall outside the interval (−α,β). Once one of the limits is reached, one of the gamblers is “ruined” and the game necessarily ends. A picturesque terminology is desirable, and it has therefore become customary to refer to the event that Sn reaches a preassigned limit as “ruin.” The example of Sec. 6.11 will show, however, that in technical applications the “ruin” may stand for the most desirable event.

The random-walk method remains applicable with almost no change. With the conventions of Sec. 6.3, we now let un,k be the probability that Sn = kh under the additional restriction that none of the variables S1, S2, …, Sn has reached either the level −α or the level β. The probability ρn that such “ruin” does not occur within the first n trials is then given by

ρn = un,−α+1 + un,−α+2 + ⋯ + un,β−1

The quantities un,k are, of course, undefined for k ≤ −α and k ≥ β. The difference equation (6.2) obviously remains valid when −α + 1 < k < β − 1, but it must be modified for the extreme values k = −α + 1 and k = β − 1. For example, when k = β − 1, the event Sn+1 = kh can occur only if Sn equals either kh or (k − 1)h, and so Eq. (6.2) in this case reduces to un+1,β−1 = pun,β−2 + run,β−1; that is, the term qun,β is missing. Since un,β is undefined, we can take care of this exception by defining un,β = 0. The required probabilities un,k will then satisfy the difference equations (6.2) for all k in the pertinent interval −α < k < β, but we have to impose the two boundary conditions

un,−α = 0   un,β = 0

The argument leading to the Fokker-Planck equation (6.6) remains valid, but this equation will now be restricted to the given interval and supplemented by the boundary condition that u vanish at the two end points for all times.
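The modified recursion with absorbing barriers is just as easy to program as the free walk; here is a minimal sketch under the conventions above, with positions measured in units of h (the function name and the dictionary representation are illustrative choices of ours):

```python
def survival_prob(n, p, q, r, alpha, beta):
    """prob that the walk stays strictly between -alpha and beta for the
    first n steps: recursion (6.2) with absorbing boundary conditions."""
    u = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for k, prob in u.items():
            for step, w in ((1, p), (-1, q), (0, r)):
                j = k + step
                if -alpha < j < beta:      # mass reaching a barrier is "ruined"
                    nxt[j] = nxt.get(j, 0.0) + w * prob
        u = nxt
    return sum(u.values())                 # this sum is rho_n
```

For the symmetric walk with barriers at ±2, the survival probability after two steps is ½: the first step is always safe, and the second returns to 0 or hits a barrier with equal probability.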

The general applicability and usefulness of this method will be illustrated in Sec. 6.11.

QUEUEING PROBLEMS

6.8    Holding and Waiting Times; Discipline

Consider a counter, telephone trunk line, port facility, or any other device that at any given moment either is “free” or else is “serving a customer.” If a new customer arrives when the counter is free, he is served without delay. If the counter is occupied, the new customer joins a waiting line or queue. The number W(t) of customers in the waiting line (always including the customer being served) is a time-dependent random variable of a nature totally different from the random variables studied above.

In focusing our attention on the queue length, we are neglecting features of the process that from other points of view may be of great importance. If I am standing in a queue, the point of interest to me is my waiting time, and this depends only on the number of customers to be served before me, not on the total number of customers in the queue. Here the queue discipline determines the precedence. Under the first come, first served rule, the queue is ordered and I am interested only in the number of customers ahead of me. Unfortunately, this order of things is not as natural or as common as one might expect. Thus in an automatic telephone exchange, a (usually short) queue may be formed with every dialing. Here, as in other technical devices of a similar nature, the prevailing system is based on random choice: at the termination of each service time, in case of an existing queue, one of its members is selected at random irrespective of the order of arrivals. Finally, for tracking- and servomechanisms using information that is fed to them continuously, the latest information is naturally most important, and accordingly their queue discipline is based on the last come, first served rule.

As long as we are interested only in the length of the queue, the queue discipline plays no role, but we still have to specify the laws governing the duration of the service time and the incoming traffic. A great variety of models is of theoretical interest and of importance in applications, but we shall limit our discussion to the simplest and most important case of a “Markovian process.” It is characterized by a complete lack of memory.

To introduce this notion in the simplest way, let us begin by considering a waiting time T; this random variable may stand for the lifetime of a radioactive atom, a piece of mechanical equipment, or an individual. Alternatively, T may stand for the duration of a service time in our queue, for the length of a telephone conversation, or for the time between the arrivals of two customers.

In principle, this random variable T can have an arbitrary probability distribution, but we consider only the case in which no aging occurs: given that the waiting time T, starting at 0, has lasted for a time t, the probability that it ends during the time interval (t, t + τ) is independent of the present age t and depends only on τ. A typical example is a radioactive atom since, according to accepted and tested theories, the probability of its disintegrating during the next minute remains constant throughout its lifetime. The law of exponential decay for radioactive matter is well known, and we now proceed to show that it is a simple consequence of the assumed lack of aging or memory.

Let us first discretize time; i.e., let us assume that the waiting time T ranges only over integral multiples of a fixed time, say τ sec. Let us denote by qn the probability that T exceeds nτ, that is,

qn = prob {T > nτ}

Then 1 − q1 is the probability that T expires at time τ; therefore, given that T has surpassed nτ, the probability that T will expire at time (n + 1)τ is again 1 − q1. This means that qn+1 = qnq1, and hence by induction that qn = q1^n. The probability that the waiting time T exceeds nτ is accordingly subject to the geometric distribution {q1^n}, and the latter characterizes the lack of aging. Of course, q1 is an arbitrary constant, 0 < q1 < 1. For example, in throwing a die the probability that the waiting time for the first ace exceeds n trials is (5/6)^n. For a waiting time subject to the geometric distribution {qn}, the probability that T lasts exactly n time units is qn−1 − qn, and therefore the expected (or mean) duration is given by

E(T) = τ[(q0 − q1) + 2(q1 − q2) + 3(q2 − q3) + ⋯] = τ(q0 + q1 + q2 + ⋯) = τ/(1 − q1)
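Both formulas are easy to verify numerically for the die example; a small sketch (the cutoff at 500 terms is an arbitrary choice of ours, at which point the neglected tail is negligible):

```python
from fractions import Fraction

def ace_wait_tail(n):
    """prob{T > n}: no ace (probability 1/6 per throw) in the first n throws."""
    return Fraction(5, 6) ** n

def ace_wait_mean(cutoff=500):
    """E(T) as the sum of prob{T > n} over n >= 0, which equals the series
    in the text; the truncated tail is below 1e-39."""
    return sum((5.0 / 6.0) ** n for n in range(cutoff))
```

The mean comes out as 6 throws, in agreement with E(T) = τ/(1 − q1) for q1 = 5/6 and τ = 1.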

To obtain the corresponding formulas for a continuous time parameter, we have only to pass to the limit as τ → 0. As we shorten the intervals of subdivision, the probability q will vary in such a way that the life expectancy E(T) remains constant, say

E(T) = a−1

This means that

q = 1 − aτ

and hence the probability prob {T > t} that T lasts longer than time t = nτ becomes

q^n = (1 − aτ)^(t/τ)

This quantity approaches exp (−at) as τ → 0, and we see thus that our assumption of absence of aging leads to the probability distribution

prob {T > t} = exp (−at)

which is the law of exponential decay.
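The passage to the limit can be observed numerically; a minimal sketch (with q = 1 − aτ, so that E(T) = τ/(1 − q) = a−1):

```python
import math

def survival_discrete(a, t, tau):
    """prob{T > t} in the discretized model: q**n with q = 1 - a*tau, n = t/tau."""
    n = round(t / tau)
    return (1.0 - a * tau) ** n

# As tau -> 0 this approaches the exponential law exp(-a*t).
```

Already for τ = 10⁻⁵ the discrete survival probability agrees with exp (−at) to about five decimal places.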

In many applications there exists a theoretical justification for the assumption that the waiting and service times are subject to this law, but frequently this assumption is introduced as an approximation merely because without it the theory becomes unmanageable. Such a procedure may appear rough, but it is accompanied by an astounding success. Practically the whole theory of telephone communication uses this assumption even though it cannot be correct; for example, long-distance conversations are more likely to last merely 3, 6, 9, … min, which contradicts the assumption that the past has no aftereffect. For further details and applications, see Ref. 2.

6.9    Random-walk Model; the Differential Equations

We begin by describing our queueing model with a discretized time; i.e., we assume that changes can occur only at times τ, 2τ, 3τ, …, where τ is a fixed small fraction of the time unit. For a continuous model, we shall then pass to the limit as τ → 0.

Our model is based on the assumption that, independently of the past history of the system, at any time t = nτ the probability of an arrival of a new customer is ατ, and, independently, if at that instant the counter is occupied, the probability that the service time ends at the next moment t + τ is βτ. We do not consider the possibility that several customers arrive simultaneously. It will be noted that both the service times and the waiting times between consecutive arrivals of customers lack memory and therefore have geometric distributions. The expected duration of a service time is β−1, and the expected duration between two consecutive arrivals is α−1. It follows that the average number of arrivals per time unit is α.

We may describe the process in random-walk terminology, with the position of the “particle” at any time indicating the number of customers in the queue (including the one actually being served). The possible positions are now restricted to the integers k ≥ 0. If at any time there are k ≥ 1 customers in the line, the next step takes the particle to k + 1 if a new customer arrives before the service time ends; the step leads to the left if the service time ends before a new customer arrives; in the remaining cases the particle stays fixed. In terms of the notations of Sec. 6.3, the probabilities p, q, and r of a step to the right, to the left, or no change, respectively, are therefore given by

p = ατ(1 − βτ)   q = βτ(1 − ατ)   r = 1 − p − q   (6.10)

An exceptional situation arises for k = 0, since no service time can end when none is going on. For k = 0, we therefore have p = ατ, q = 0.

If un,k stands for the probability that at time t = nτ there are exactly k customers in the line, we have again the difference equation (6.2), valid for k ≥ 2; for k = 0, 1, the equation takes the forms

un+1,0 = (1 − ατ)un,0 + qun,1
un+1,1 = ατun,0 + run,1 + qun,2

If the counter is free at time 0, we again have the initial conditions (6.3).

For a queueing model with continuous time, we have only to pass to the limit as τ → 0. Let uk(t) be the probability that at time t the line contains k ≥ 0 customers. If in our difference equations we subtract un,k from both sides and divide by τ, we obtain on the left the difference ratios [uk(t + τ) − uk(t)]/τ. It follows that the required probabilities uk(t) satisfy the differential equations

u0′(t) = −αu0(t) + βu1(t)
uk′(t) = αuk−1(t) − (α + β)uk(t) + βuk+1(t)   for k ≥ 1   (6.11)

This is a system of infinitely many differential equations, and its explicit solution requires the use of Laplace transforms or other tricks. However, much information can be obtained directly from Eqs. (6.11). As an example, in the next section we shall consider the so-called “steady state.”
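Although the explicit solution requires transforms, Eqs. (6.11) can be integrated numerically without difficulty. A sketch using Euler’s method on a truncated system (the truncation level kmax and the step dt are arbitrary choices of ours; when α < β the probability mass beyond kmax is negligible):

```python
def queue_probs(alpha, beta, t, kmax=60, dt=0.005):
    """Euler integration of the truncated system (6.11):
    u0' = -alpha*u0 + beta*u1,
    uk' = alpha*u[k-1] - (alpha+beta)*u[k] + beta*u[k+1]  for 1 <= k < kmax.
    Counter initially free: u0(0) = 1."""
    u = [0.0] * (kmax + 1)
    u[0] = 1.0
    for _ in range(int(round(t / dt))):
        du = [0.0] * (kmax + 1)
        du[0] = -alpha * u[0] + beta * u[1]
        for k in range(1, kmax):
            du[k] = alpha * u[k - 1] - (alpha + beta) * u[k] + beta * u[k + 1]
        u = [x + dt * d for x, d in zip(u, du)]
    return u
```

With α = 1 and β = 2 the probabilities settle, for large t, near p0 = ½, p1 = ¼, p2 = ⅛, …, anticipating the steady-state result of the next section.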

6.10    Steady State

It is not difficult to show that for the solutions uk(t) of Eqs. (6.11), and for all similar systems, the limits

uk(t) → pk   as t → ∞   (6.12)

always exist. Moreover, only two possibilities arise: either (a) pk = 0 for all k or (b) the pk are a probability distribution—that is,

p0 + p1 + p2 + ⋯ = 1

In the first case, the probability of finding fewer than N customers in the line tends to 0, and hence the waiting line is bound to increase indefinitely as time goes on.

In the second case, we obtain a solution of Eqs. (6.11) if we put

uk(t) = pk

for all times. This means that if initially, at time 0, the number of customers in the line is randomly distributed in accordance with the probability distribution {pk}, then the same probability distribution will prevail at all times: we have a “steady state,” or a state of “statistical equilibrium.” Much harm has been done by this misleading term; its true significance will be discussed in Sec. 6.12.

The probabilities pk must obviously satisfy the system of equations obtained on putting duk/dt = 0 in Eqs. (6.11), that is,

0 = −αp0 + βp1
0 = αpk−1 − (α + β)pk + βpk+1   for k ≥ 1

Thus p1 = p0(α/β); by substituting into the second equation for k = 1, we find p2 = p0(α/β)²; and, continuing in this manner by recursion, we get pk = p0(α/β)^k. But the series Σpk must converge and its sum is unity whenever the pk represent steady-state probabilities. Hence we have the result that

pk = (1 − α/β)(α/β)^k   provided that α < β

In other words, a “steady state” exists only if α < β. The average queue length in the steady state is

p1 + 2p2 + 3p3 + ⋯ = (α/β)/(1 − α/β) = α/(β − α)   (6.13)
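A quick numerical check of these formulas (the truncation at kmax is ours; with α < β the neglected tail is utterly negligible):

```python
def steady_state(alpha, beta, kmax=1000):
    """Steady-state probabilities p_k = (1 - alpha/beta)(alpha/beta)**k for
    alpha < beta, truncated at kmax, and the mean queue length of (6.13)."""
    rho = alpha / beta
    p = [(1 - rho) * rho**k for k in range(kmax)]
    return p, sum(k * pk for k, pk in enumerate(p))
```

For instance, with α = 10 and β = 12 (arbitrary illustrative rates) the distribution sums to 1, p0 = 1/6, and the mean queue length is α/(β − α) = 5.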

Before discussing (see Sec. 6.12) the significance of this concept and the frequent misconceptions concerning it, we turn to a related notion.

6.11    Busy Periods

Suppose that at some time t, which we choose as t = 0, there are i customers in the waiting line. How long will it take for the counter to become free for the first time? How will the queue length vary in the meantime? Many important questions of this type may be raised, and they are all related to the ruin problem as discussed in Sec. 6.7. Indeed, if we restrict our consideration to a time during which the counter remains constantly busy, the queue length 0 does not occur and thus 0 becomes the “ruin”: the busy period ends when 0 is reached. For example, consider first the model with discrete time and let vn,k be the probability at time t = nτ, during a busy period, of finding k customers in the line. That is, we have v0,i = 1, and vn,k is the probability of the compound event that the counter is not free at times τ, 2τ, …, (n − 1)τ and that there are exactly k customers at time nτ. As we have seen in Sec. 6.7, for k ≥ 1 these vn,k will satisfy the same difference equations as the un,k, namely,

vn+1,k = pvn,k−1 + qvn,k+1 + rvn,k

But we have to impose the boundary condition vn,0 = 0. Similarly, in the continuous model, the probability Qk(t) that a busy period starting with i customers lasts longer than t, and that at time t there are exactly k customers in the line, satisfies the differential equations

Qk′(t) = αQk−1(t) − (α + β)Qk(t) + βQk+1(t)

for k ≥ 1, with the initial and boundary conditions

Qi(0) = 1   Qk(0) = 0 for k ≠ i   Q0(t) = 0

These equations uniquely determine the probabilities Qk(t). The sum ΣQk(t) is the probability that the busy period extends beyond t.

We shall here calculate only the expected or average duration Li of this busy period. For that purpose, let t1, t2, t3, …, ti be the moments when the queue length (initially of size i) first decreases to i − 1, i − 2, i − 3, …, 0. These moments break up the busy period into i subintervals of length t1, t2 − t1, t3 − t2, …, ti − ti−1, each of which is characterized by the condition that at the end (and only at the end) the queue is shorter than at the beginning. It follows that the durations of the several subintervals are random variables with a common distribution and the common expected value L; therefore, Li = iL. To calculate L = L1, consider the discrete-time model at an instant when the queue length is 1. At the next moment the queue length is 0, 1, or 2, with respective probabilities q, r, and p. In the first case, the busy period lasted one trial (time τ). In the second case, the situation remained unchanged and, on the average, it will take another L trials before 0 is reached. Finally, in the last case, the expected continued duration of the busy period is 2L. Thus,

L = τ + q · 0 + r · L + p · 2L

or

L = τ/(q − p)   (6.14)

This, of course, makes sense only if q > p; if qp, there is a chance that the busy period never ends, and we cannot speak of its expected duration.

By using the values (6.10) and letting τ → 0, we find that in the continuous-time model, corresponding to the differential equations (6.11), the expected duration of a busy period starting with i customers is i/(β − α), provided, of course, that β > α.

For example, let the average service time be 5 min and suppose that the average frequency of arrivals is 10 per hr. Measuring time in minutes, this corresponds to α = 1/6, β = 1/5. The average queue length (6.13) in the “steady state” is 5; the average duration of a busy period starting with i customers is 30i min; the steady-state probability p0 of finding the counter unoccupied is 1 − α/β = 1/6.
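The figure 30i min can also be checked by simulation: between events, the times to the next arrival and to the end of the current service are exponential, so the next event occurs after an exponentially distributed time with rate α + β and is an arrival with probability α/(α + β). A Monte Carlo sketch (the sample size and seed are arbitrary choices of ours):

```python
import random

def mean_busy_period(alpha, beta, trials=100_000, seed=1):
    """Monte Carlo mean of the busy period starting with one customer;
    the theory of Sec. 6.11 gives 1/(beta - alpha)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        queue, t = 1, 0.0
        while queue:
            t += rng.expovariate(alpha + beta)       # time to the next event
            if rng.random() < alpha / (alpha + beta):
                queue += 1                            # an arrival joins the queue
            else:
                queue -= 1                            # a service is completed
        total += t
    return total / trials
```

With α = 1/6 and β = 1/5 per minute the estimate comes out near 30 min, though, as Sec. 6.12 will stress, the individual busy periods scatter enormously about this mean.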

6.12    Fluctuations in the Individual Process vs. Ensemble Averages

What are the practical or operational implications of the “steady state,” i.e., of the existence of the limits (6.12)? It is a widespread but dangerous fallacy to assume that these relationships refer to the fluctuations in an individual process or the successive stages of the queue at any one particular counter. In reality, the relationships (6.12) and the “statistical equilibrium” refer to a large ensemble of similar counters. If we had a large number N of counters operating independently for a long time, as described by our differential equations (6.11), then it would be reasonable to expect that, at any particular moment, Np0 of the counters would be found free, Np1 with exactly one customer, etc. Averaging over this ensemble of counters shows that the average number of customers is approximately α/(β − α), etc. Once this steady-state distribution {pk} is attained, it will automatically maintain itself—at least ideally in an infinitely large ensemble.

At each individual counter, however, the fluctuations will continue unabated, without the slightest attenuation. There is no tendency of the queue length to converge in any sense of the word or to settle down near the average value. In our last example, the average queue length was 5. If at any time the queue length happens to be 5, however, then with probability about 0.287 the queue will reach the length 10 before the counter ever gets free again, and from that moment on it will, on the average, take another 5 hr to find the counter free for the first time. In more than 10 per cent of all situations the queue will, starting from a length 5, reach the length 15 before disappearing, and in such cases the busy period will be extraordinarily long.

We saw that the expected duration of a busy period (lifetime of a queue) starting with one customer is 1/(β − α). Unfortunately, this statement has almost no practical significance because of the exceedingly large fluctuations. One common measure of the probable magnitude of such fluctuations is the variance. For the busy period, this variance is (α + β)/(β − α)³, which can be much larger than the mean. In our example the mean is 30 while the variance is 9,900. In practice, our counter would be “jammed” at frequent intervals.

It is very dangerous to rely—as is usually done—on expectations as an indication of what will really happen and to believe that the individual process shows similarity to the whole ensemble. This point may be illustrated by the example given in the next section.

6.13    The Example of D. G. Kendall’s Taxicab Stand3

At a perfectly balanced taxicab stand, either customers wait for taxis or taxis wait for customers. Customers and taxis arrive with equal frequencies. If customers are counted + and taxis −, the queue length may be any integer 0, ±1, ±2, …. From a queue of length k, the next change leads with equal probabilities to k + 1 or k − 1. Thus the successive changes are represented by a symmetric random walk. The expected queue length is 0, but it is easily seen that at each individual stand the queue length is bound to grow to +∞ or −∞. The zero expectation says nothing about the fluctuations at an individual stand; it assures us merely that, in a large ensemble, for any stand with thousands of taxis waiting in despair for customers there is somewhere a stand with equally many customers waiting vainly for a taxi.
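The behavior is easy to exhibit numerically: the exact distribution after n changes has mean 0, while the mean absolute queue length grows like √n. A sketch (function name ours):

```python
def taxi_queue(n):
    """Exact queue-length distribution for the balanced taxicab stand after n
    changes, starting empty; returns (mean, mean absolute queue length)."""
    u = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for k, prob in u.items():
            nxt[k + 1] = nxt.get(k + 1, 0.0) + 0.5 * prob   # a customer arrives
            nxt[k - 1] = nxt.get(k - 1, 0.0) + 0.5 * prob   # a taxi arrives
        u = nxt
    mean = sum(k * p for k, p in u.items())
    mean_abs = sum(abs(k) * p for k, p in u.items())
    return mean, mean_abs
```

After 100 changes the mean is 0 but the mean absolute length is already about 8; quadrupling n roughly doubles it, in accordance with the √n law.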

It should be borne in mind that in building taxi stands, elevators, etc., we are interested in the fluctuations in time at one particular counter, not in large ensembles balanced in the manner described. Statistical equilibrium is good where it is really meaningful—e.g., in an ensemble of many telephone trunk lines. But little satisfaction can be derived from a judicial statistical equilibrium where for each innocently condemned person we find a felon running free.

REFERENCES

1.   Brown, George W., Monte Carlo Methods, chap. 12 in “Modern Mathematics for the Engineer,” First Series, edited by E. F. Beckenbach, McGraw-Hill Book Company, Inc., New York, 1956.

2.   Feller, William, “An Introduction to Probability Theory and Its Applications,” vol. 1, 2d ed., John Wiley & Sons, Inc., New York, 1957.

3.   Kendall, D. G., Some Problems in the Theory of Queues, J. Roy. Statist. Soc., ser. B, vol. 13, pp. 151–185, 1951.

4.   Morse, P. M., “Queues, Inventories, and Maintenance,” John Wiley & Sons, Inc., New York, 1958.

We have refrained from the natural norming a = 1 in order to exhibit the general form (6.9) of the Fokker-Planck equation.

See Ref. 2, Chap. III, for a discussion and experiment concerning the accumulated gains S1, S2, S3, …, Sn in an individual coin-tossing game. A majority of coins are what our psychologists and sociologists would call maladjusted; i.e., they behave quite differently from the ensemble of coins. For example, the sums Sk are likely to remain of the same sign for unbelievably long periods: they show what many economists would call a trend.