Chapter 5
Probability and Stochastic Processes
5.1. Introduction
HERE we pass from deterministic models of optimization to models which use techniques from the theory of probability and from the quantitative study of inductive inference. By assigning a number, a probability, between zero and unity to the occurrence of an event, one has a measure of the likelihood of its occurrence. For example, unity indicates that the event will occur with certainty.
Probability theory provides a variety of useful models suitable for the description and interpretation of complex problems. As in other applications of mathematics, there is usually sufficient closeness between the ideal and the real to make the results useful. Repetition of a process with varying outcomes enables the detection of order, the formulation of a model, and subsequently the development of a theory with selected measures of effectiveness. This representation should be and usually is checked against actual data. When success, within prescribed limits, has been attained, the theory may then be used for prediction purposes.
In any case an application of the theory of probability generally derives its concepts from the underlying physical situation and proceeds to develop, as any nonabstract correct theory does, a logically sound system from which reliable answers may be deduced.
In our models, we shall assume that the reader is familiar with the basic concepts of probability.
The physical situations which prompted the development of the theory of stochastic processes are numerous. Time-ordered sequences of related random events occur commonly. The stock market averages for the coming days may be considered as random variables. If we wish to project several days ahead, we must determine the joint distribution of the stock market averages for those days, knowing the past history. Situations arise in which the relationship among random events depends both on time and on space: a simple example is that of causing a ripple in a still pond. The water level at any fixed point depends both on the time after the initial disturbance and the position of the point. The general model, a set of well-defined random variables with a known joint distribution, is again a useful construct.
While knowledge of the set of random variables and their joint distribution (or the joint distributions of all finite subsets if the number of random variables is infinite) is equivalent to complete information about any reasonable process, it is often not practical or possible to derive the joint distributions mathematically. To obtain an adequate resolution frequently requires much less information. Thus, a solution to a stochastic process does not always mean total knowledge of the joint distributions. Much of the work in stochastic processes has been directed towards the derivation of characteristics of a process without resort to the joint distributions.
In preparation for understanding the general ideas, probability measures and random variables need to be defined. The starting point is the basic space U called the sample space. The elements of U are the possible “occurrences,” “elementary events,” “outcomes,” or “effects.” Associated with U is a collection of measurable subsets B of U. The collection B forms a special kind of set called a sigma-field. A measure P, called a probability measure, is defined on the elements of B. A real-valued function x from U to the real numbers R, measurable with respect to B, is called a random variable. x is B-measurable if and only if {ω : x(ω) ≤ a} ∈ B for all a ∈ R. Associated with each random variable is a distribution function F, where

F(a) = P({ω : x(ω) ≤ a}).

F(a) is well-defined since the set {ω : x(ω) ≤ a} ∈ B by the measurability of x. The introduction of a random variable and its distribution function allows us to translate the actual physical events into numerical form, in much the same way as a functional equation gives an explicit algebraic form for a cause–effect relationship. In fact, once the random variable is well defined, the underlying sigma-field and probability measure may be dropped. The range of the random variable will replace the set of physical events as the sample space, and all mathematical development will be performed at this level.
The joint distribution of a family of random variables (x_1, x_2, …, x_n) is a multivariate function defined as follows:

F(a_1, a_2, …, a_n) = P(x_1 ≤ a_1, x_2 ≤ a_2, …, x_n ≤ a_n).
In order to characterize stochastic processes, we start with a family of random variables. The elements which distinguish the various stochastic processes are the state space, the index set, and the set of all finite dimensional distribution functions. The state space is the subset of Euclidean n -space over which the values of the random variable range; the index set merely indexes and, in some cases, orders the random variables; the distributions give the relations between the random variables. These three elements are used to classify all stochastic processes.
We shall characterize four major types of stochastic processes:
(i) Independent increments. Given any finite subset of the index set, say (t_1, t_2, …, t_n), such that t_1 < t_2 < … < t_n, then if the random variables x_{t_2} − x_{t_1}, x_{t_3} − x_{t_2}, …, x_{t_n} − x_{t_{n−1}} are independent, the result is a stochastic process with independent increments. In such a process, the joint distributions of any finite set of the x_{t_i} can be found easily, for x_{t_i} = z_{t_1} + z_{t_2} + … + z_{t_i}, where z_{t_i} = x_{t_i} − x_{t_{i−1}}, and the z_{t_i} are independent.
(ii) Martingales. Given any finite subset of the index set, say (t_1, t_2, …, t_n, t_{n+1}), such that t_1 < t_2 < … < t_n < t_{n+1}; then if E(x_{t_{n+1}} | x_{t_1} = a_1, x_{t_2} = a_2, …, x_{t_n} = a_n) = a_n, the stochastic process is called a martingale. (This is the basic condition for a fair game.)
(iii) Markov processes. Given any finite subset of the index set, say (t_1, t_2, …, t_n), such that t_1 < t_2 < … < t_n, then if P(x_{t_n} ≤ a_n | x_{t_1} = a_1, x_{t_2} = a_2, …, x_{t_{n−1}} = a_{n−1}) = P(x_{t_n} ≤ a_n | x_{t_{n−1}} = a_{n−1}), the stochastic process is called a Markov process.
(iv) Stationary processes. Given any finite subset of the index set, say (t_1, t_2, …, t_n), and any h > 0, then if the joint distributions of x_{t_1}, x_{t_2}, …, x_{t_n} and x_{t_1 + h}, x_{t_2 + h}, …, x_{t_n + h} are the same, the stochastic process is a stationary process.
In practice, the existence of some stable limiting distribution may be important to us. The occurrence or recurrence of a particular state may be of interest. Sometimes one may wish to find the expected first passage times or recurrence times for given states. For specialized processes, existence theorems and methods of analysis exist for determining these characteristics without the necessity of deriving the complete joint distribution.
The use of stochastic processes lends itself easily to the following major areas of application: (i) queueing theory, (ii) inventory theory, (iii) reliability theory, (iv) renewal theory, and (v) stochastic optimization.
5.2. Applications of the Theory of Probability
We now give some elementary examples which make use of probability concepts. Our first two examples deal with very simple circuit and switching problems.
Example 1. Circuits and Switches
In the circuit in Fig. 5.1, what is the probability that the bulb will be lit (i.e., the circuit closed), given that each of the switches A, B, C, D is equally likely to be open or closed?
FIG . 5.1.
The bulb will be lit if both switches A and B are closed or if either switch C or switch D is closed; i.e., if the event AB ∪ C ∪ D occurs. Thus the desired probability is given by

P(AB ∪ C ∪ D) = P(AB) + P(C) + P(D) − P(ABC) − P(ABD) − P(CD) + P(ABCD).

The probability of any switch being closed is 1/2; and thus P(C) = P(D) = 1/2. We also have P(AB) = P(CD) = 1/4, P(ABC) = P(ABD) = 1/8, and P(ABCD) = 1/16 since, because of independence, the probabilities are given by the products of the corresponding single-switch probabilities. Hence the desired probability is 1/4 + 1/2 + 1/2 − 1/8 − 1/8 − 1/4 + 1/16 = 13/16.
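As a sanity check, the 16 equally likely switch configurations can be enumerated exhaustively; this short sketch (ours, not part of the text) confirms the inclusion–exclusion count of 13/16.

```python
import itertools

# Enumerate all 2^4 equally likely switch configurations exactly.
lit = 0
for a, b, c, d in itertools.product([0, 1], repeat=4):
    # Bulb lights if A and B are both closed, or C is closed, or D is closed.
    if (a and b) or c or d:
        lit += 1

p_lit = lit / 16
print(p_lit)  # 13/16 = 0.8125
```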
Example 2. Craps
In this game the “passer” rolls two dice and wins if he makes 7 or 11 on the first roll (these are called “naturals”). He loses if he makes 2, 3, or 12. If he makes 4, 5, 6, 8, 9, or 10 on the first roll, he continues to roll until he makes the same point again or gets a 7. If he rolls a 7 before repeating his point, he loses; otherwise, he wins. What is the probability of winning?
First, consider the probability of obtaining a particular sum on a single roll of two dice. The probability of turning up a particular point on each die is clearly 1/6. Then the probability of obtaining a 2 is 1/6 × 1/6 = 1/36, by the compound-probability property of independent events. Now for a 3, one can have a 1 and 2 or a 2 and 1. The probability is then 1/36 + 1/36 = 1/18, applying the properties of compound and total probability. For 4, one can have a 1 and 3, 3 and 1, or 2 and 2, for a probability of 1/12. By considering all possible sums in this fashion, one has the following probabilities:

Sum:          2     3     4     5     6     7     8     9    10    11    12
Probability: 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
The passer wins if he obtains one of the following:
  1. A 7 or an 11 on the first roll, with probability P_A.
  2. A repetition of any other number (a 4, 5, 6, 8, 9, or 10) appearing on the first roll, before a 7 is rolled. Let this probability be P_B. Now

P_A = P(7) + P(11) = 6/36 + 2/36 = 8/36 = 2/9.
To calculate P_B, note that, having obtained a 4 on the first roll, the probability of obtaining another 4 before casting a 7 is the sum of the probabilities of obtaining a 4 on the next roll; of obtaining neither a 4 nor a 7 on the next roll and a 4 on the one after; and so on. This gives

P(4 before 7) = 3/36 + (27/36)(3/36) + (27/36)²(3/36) + ….

Simplification of this geometric series gives

P(4 before 7) = (3/36)/(1 − 27/36) = 3/9 = 1/3,

so that the probability of winning through the point 4 is (3/36)(1/3) = 1/36. An alternative approach is as follows. There are nine ways of obtaining a 4 or a 7, three of which give 4; since the probability of rolling a 4 on the first roll is 3/36, one has

P(win on 4) = (3/36)(3/9) = 1/36.
This argument may be applied to the other numbers: the probability of winning on 5 is

(4/36)(4/10) = 2/45;

the probability of winning on 6 is

(5/36)(5/11) = 25/396;

the probability of winning on 8 is the same as on 6; the probability of winning on 9 is the same as on 5; the probability of winning on 10 is the same as on 4.
Thus the probability of winning at craps is

P_A + P_B = 2/9 + 2(1/36 + 2/45 + 25/396) = 244/495 ≈ 0.493,

which is less than one-half.
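The exact winning probability, 244/495 ≈ 0.493, can be verified by enumeration with rational arithmetic; the variable names below are ours.

```python
from fractions import Fraction

# Probability of each sum on one roll of two fair dice.
ways = {s: sum(1 for i in range(1, 7) for j in range(1, 7) if i + j == s)
        for s in range(2, 13)}
p = {s: Fraction(w, 36) for s, w in ways.items()}

win = p[7] + p[11]                      # naturals on the first roll
for point in (4, 5, 6, 8, 9, 10):
    # P(make the point) = P(point first) * P(point before 7)
    win += p[point] * p[point] / (p[point] + p[7])

print(win, float(win))  # 244/495 ≈ 0.4929
```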
Example 3. What is Random?—The Subway Paradox
The Red Line Subway in Boston runs from Harvard Square to Quincy Center via Park Street (Fig. 5.2 ). Each day you arrive at the Park Street Station at a random time, go down
FIG . 5.2.
to the central platform, and get on the first Red Line train to arrive. Nine days out of ten you end up at Harvard Square. How can this be?
Most people's intuition tells them that one should go to Quincy Center as often as one goes to Harvard Square. This is not necessarily the case. The error results from considering the wrong property as being random.
Insight : notice that it is you, and not the trains, which arrives at random. Imagine, for example, that trains from Harvard Square to Quincy Center run every 10 minutes on a regular schedule. Also, to avoid having a build-up of trains at one end of the line or the other, trains also run from Quincy Center to Harvard Square every 10 minutes.
Solution : to resolve the paradox, imagine that each train bound for Quincy Center arrives 1 minute after the train bound for Harvard Square.
FIG . 5.3.
Clearly, if your random arrival time falls anywhere between the time of a Quincy Center bound train and a Harvard Square bound train (9 minutes) you will end up at Harvard Square; while if your arrival falls between the time of a Harvard Square bound train and a Quincy Center bound train (1 minute), you will end up at Quincy Center (Fig. 5.3 ).
Example 4. Meeting of Friends
Two friends, X and Y , plan to meet downtown between 12:00 noon and 1:00 p.m. Their arrival times are random variables. They agree that each will wait 10 minutes, and that if the other has not arrived during these 10 minutes he will leave. What is the probability that they will meet? Assume that if either arrives after 12:50 he will leave at 1:00 even though he has not waited the full 10 minutes.
We may show the situation graphically as in Fig. 5.4 .
FIG . 5.4.
Let x be the arrival time for X , 0 ≤ x ≤ 1. Let y be the arrival time for Y , 0 ≤ y ≤ 1.
Then y = x represents the event that both arrive together. The line

y = x + 1/6

gives a bound to the situation in which X arrives first and waits 10 minutes (1/6 hour); the line

x = y + 1/6

gives a bound to the situation in which Y arrives first and waits 10 minutes. The region between these two lines represents the situation in which the two meet. Since they arrive at random over the hour, the probability is represented by the area of this belt, which is

1 − (5/6)² = 11/36.
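A Monte Carlo sketch of the meeting problem (our own illustration; the 10-minute wait is 1/6 hour, and the belt between the two lines has area 11/36 ≈ 0.306):

```python
import random

random.seed(1)

# Both arrival times are uniform on [0, 1] hour; the friends meet
# if they arrive within 1/6 hour (10 minutes) of each other.
trials = 200_000
meet = sum(abs(random.random() - random.random()) <= 1/6 for _ in range(trials))
p_meet = meet / trials
print(p_meet)  # ≈ 11/36 ≈ 0.306
```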
We now consider a risk problem.
Example 5. Risk of Flying on Two-engine and on Four-engine Planes
Is it safer to fly on a plane which has two engines or on a plane with four engines?
Let q be the probability of failure of a single engine and let p = 1 − q be the probability of successful operation of a single engine. We assume that a four-engine plane can fly on four or on three engines but that a two-engine plane needs both engines. Assuming independence, the probability of four successful engines is p⁴, and the probability of exactly three successful engines is 4p³(1 − p). For the other plane, the probability of two successful engines is p². We wish to determine the range of values of p such that p⁴ + 4p³(1 − p) ≥ p²; i.e., we seek the range of values of p for which the four-engine plane would be preferred to the two-engine plane. The condition is equivalent to

p²(3p − 1)(1 − p) ≥ 0.
Since p ≤ 1 we have 3p ≥ 1 or p ≥ 1/3. Thus the four-engine plane is safer if 1/3 ≤ p ≤ 1 and the two-engine plane is safer if 0 ≤ p ≤ 1/3. (At p = 1/3, each is equally safe.)
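The comparison can be tabulated for a few values of p; the function names below are ours.

```python
# Compare P(four-engine plane flies) = p^4 + 4 p^3 (1 - p)
# with P(two-engine plane flies) = p^2 over a range of engine reliabilities p.
def four_ok(p):
    return p**4 + 4 * p**3 * (1 - p)

def two_ok(p):
    return p**2

for p in (0.2, 1/3, 0.5, 0.9):
    print(p, four_ok(p), two_ok(p))

# The two expressions cross exactly at p = 1/3: below it the
# two-engine plane is safer, above it the four-engine plane is.
```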
These results may seem contrary to intuition. The higher the probability of successful operation of a single engine, the more one should prefer the four-engine plane; the smaller that probability, the more one should prefer the two-engine plane. Intuitively, one might have expected the opposite.
One must remember that this result depends on our assumptions about safety. The problem might be modified by considering the four-engine plane as safe if at least one engine on each side is operating successfully; in this case it turns out that the four-engine plane is always safer.
Example 6. Bayes’ Theorem—Causes from effects or a priori from a posteriori
For our next example we use Bayes’ theorem, which obtains the probability of causes, given the probability of the effects. It often happens that one may wish to isolate the most likely cause of an observed effect when all the possible causes are known and, for each cause, the conditional probability that the effect follows it is also known. These conditional probabilities are often determined empirically.
Suppose that an event (a cause) B consists of a finite or countably infinite collection of mutually exclusive and exhaustive events B_i. (The B_i may be regarded as the different ways in which the event can occur.) Then P(B) = ∑P(B_i) = 1. Let A be an event (an effect), let P(B_i) be the a priori probability of the cause B_i, and let P(A | B_i) be the conditional probability of the occurrence of A, given that B_i has occurred. By the definition of conditional probability, the probability of the joint occurrence of A and B_i may be written in two ways:

P(A ∩ B_i) = P(B_i) P(A | B_i) = P(A) P(B_i | A), where P(A) = ∑_j P(B_j) P(A | B_j).

Thus the last two expressions give

P(B_i | A) P(A) = P(B_i) P(A | B_i).

Thus the probability of the prior occurrence of cause B_i, given that the effect A has occurred, is

P(B_i | A) = P(B_i) P(A | B_i) / ∑_j P(B_j) P(A | B_j),

which is known as Bayes’ theorem.
Example. A ball was transferred from an urn containing two white and two black balls to another containing three white and two black balls. A white ball was then drawn from the second urn. What is the probability that the transferred ball was white?
Let B_1 and B_2 be the two events of transferring a white ball and a black ball, respectively, and let A be the event of drawing a white ball from the second urn. Then P(B_1) = P(B_2) = 1/2; after the transfer the second urn contains four white and two black balls if B_1 occurred, and three white and three black balls if B_2 occurred, so P(A | B_1) = 4/6 and P(A | B_2) = 3/6. By Bayes’ theorem,

P(B_1 | A) = (1/2)(4/6) / [(1/2)(4/6) + (1/2)(3/6)] = 4/7.
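The urn computation can be sketched with exact rational arithmetic (the variable names are ours):

```python
from fractions import Fraction

# Hypotheses: B1 = white ball transferred, B2 = black ball transferred.
prior = {"B1": Fraction(1, 2), "B2": Fraction(1, 2)}  # urn 1 has 2 white, 2 black

# Likelihood of drawing white from urn 2 (3 white, 2 black before the transfer):
like = {"B1": Fraction(4, 6),   # transfer white -> 4 white of 6 balls
        "B2": Fraction(3, 6)}   # transfer black -> 3 white of 6 balls

evidence = sum(prior[b] * like[b] for b in prior)
posterior_white = prior["B1"] * like["B1"] / evidence
print(posterior_white)  # 4/7
```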
5.3. Applications of the Theory of Stochastic Processes
We now give some elementary examples which make use of the theory of stochastic processes. Our first example is better known as Random Walk. We give it here as relating to a drunkard, but it has also been applied to a young man killing time while waiting for his date to show up.
Example 1. The Drunkard
Suppose we find a drunken old man wandering his way up and down the street after leaving his favorite tavern. The street runs east–west. His possible transitions would include taking a step in an easterly direction, taking a step in a westerly direction, or falling flat on his face, and thus not going in any direction. Our street is illustrated in Fig. 5.5 . The tavern is at the center.
FIG . 5.5.
Now we wish to determine the probability of his being at a position k steps away after s transitions. If we assume his probability of movement is the same regardless of his position, we may let p be the probability of an eastward step and q and r be the probabilities of a westward step and of falling down, respectively. Let P k,s be the probability of being at position k after s transitions. There are only three ways we can get there in the sth transition, as shown in Table 5.1 .
TABLE 5.1

Position after          Transition in the     Contribution to P_{k,s}
s − 1 transitions       sth step
k − 1                   Step east (p)         p P_{k−1,s−1}
k + 1                   Step west (q)         q P_{k+1,s−1}
k                       Falls down (r)        r P_{k,s−1}
Thus we have the relationship

P_{k,s} = p P_{k−1,s−1} + q P_{k+1,s−1} + r P_{k,s−1}.
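The three cases of Table 5.1 give the recurrence P_{k,s} = p P_{k−1,s−1} + q P_{k+1,s−1} + r P_{k,s−1}, which can be iterated numerically; the probabilities p, q, r below are assumed values for illustration.

```python
# Dynamic-programming evaluation of P[k, s] for the drunkard's walk,
# with assumed probabilities p (east), q (west), r (fall), p + q + r = 1.
p, q, r = 0.4, 0.4, 0.2
steps = 10

dist = {0: 1.0}  # he starts at the tavern, position 0
for _ in range(steps):
    new = {}
    for k, prob in dist.items():
        new[k + 1] = new.get(k + 1, 0.0) + p * prob  # step east
        new[k - 1] = new.get(k - 1, 0.0) + q * prob  # step west
        new[k] = new.get(k, 0.0) + r * prob          # falls down, stays put
    dist = new

print(sum(dist.values()))  # probabilities sum to 1
print(dist[0])             # probability of being back at the tavern
```

With p = q the distribution is symmetric about the tavern, so the mean position stays at zero.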
We now give a very general example which has many specific uses.
Example 2. Point Processes
Consider a system which can be in any one of n states E 1E n . Chance may enter the system in two ways: first, by fixing the times at which the system changes its states (such changes are called transitions), and, second, by fixing the order of the transition among the states. The probabilistic selection of instants of time at which the transitions take place suggests the use of the name “process” and because of the discrete instants involved it is called a point process.
Suppose that the transition events are independent of each other and of the time at which they occur; i.e., the probability of a transition during an interval (t, t + h) does not depend on other transitions (and thus does not depend on what occurred before the instant t) nor on the time t. The generation of telephone calls at a switchboard and the arrivals of ships in a harbor satisfy this assumption. Thus, the probability that no event (transition) occurs during the time interval (t, t + h) depends only on h. Denote this probability by p(h). The probability that no event occurs during the time interval (t, t + h + k) is then p(h + k). The probability that no event occurs in (t, t + h) followed by no event occurring in (t + h, t + h + k) is p(h)p(k), by our assumption that these two events are independent. Thus, for any h and k we have the functional equation describing the probability of nonoccurrence of transitions:

p(h + k) = p(h) p(k).
The solution of this functional equation is given by p(t) = e^{at}. Since 0 ≤ p(t) ≤ 1, we must have a = −λ, with λ positive. If t_i and t_{i+1} are the successive instants of transition to states E_i and E_{i+1}, respectively, then

P(t_{i+1} − t_i > t) = e^{−λt}.
For n intervals separating n + 1 consecutive transition times, we obtain the probability as follows. Since P(t_{i+1} − t_i > t) = e^{−λt}, the density of the length of a single interval is

f_1(t) = λe^{−λt}.

Then, for the sum of two successive intervals, convolution gives

f_2(t) = ∫₀^t λe^{−λ(t−s)} λe^{−λs} ds = λ²t e^{−λt},

and for

t = (t_{i+1} − t_i) + (t_{i+2} − t_{i+1})

we have the density of the total elapsed time. Extending this to n successive time intervals, where

t = t_{i+n} − t_i,

yields

f_n(t) = λe^{−λt} (λt)^{n−1} / (n − 1)!,

which may be proved by induction.

Now let u = λt. Then

f_n(t) dt = e^{−u} u^{n−1} / (n − 1)! du.

Now if p_n = probability of exactly n events in time T and π_n = probability of at least n events in time T, then

π_n = ∫₀^{λT} e^{−u} u^{n−1} / (n − 1)! du.

Since p_n = π_n − π_{n+1}, we obtain, on integrating by parts,

p_n = e^{−λT} (λT)^n / n!,

which is the Poisson process with parameter λT.
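A simulation sketch (the rate and horizon below are assumed values): generating event times from independent exponential gaps should reproduce the Poisson count law e^{−λT}(λT)^n/n!.

```python
import math
import random

random.seed(7)

# Build event times from independent exponential gaps (rate lam) and count
# how many fall in [0, T]; compare with the Poisson law e^-lam*T (lam*T)^n / n!.
lam, T, runs = 1.5, 2.0, 100_000
counts = {}
for _ in range(runs):
    t, n = 0.0, 0
    while True:
        t += random.expovariate(lam)
        if t > T:
            break
        n += 1
    counts[n] = counts.get(n, 0) + 1

mean_count = sum(n * c for n, c in counts.items()) / runs
p0_hat = counts.get(0, 0) / runs
print(mean_count, p0_hat, math.exp(-lam * T))  # mean ≈ λT = 3, p0 ≈ e^-3
```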
There are more general definitions of point processes than the definition which we have given, but this is adequate for our purpose here.
Note that a Poisson process is an example of a continuous-time (T = [0, ∞)) stochastic process and is a special case of a Markov process, which we discuss next. The sample function x_t counts the number of times a specified event occurs during the time period from 0 to t.
Concrete examples of such processes are the number of X-rays emitted by a substance undergoing radioactive decay, the number of telephone calls originating in a given locality, the occurrences of accidents at a given intersection, the occurrence of errors in a page of typing, breakdowns of a machine, and the arrival of customers for service. The justification for viewing these examples as Poisson processes is based on the law of rare events: we have a large number of Bernoulli trials, each with a small probability of success, where the expected number of successes remains constant. Under these conditions the actual number of events follows a Poisson law.
The Poisson process has the following characteristics:
  1. Suppose t_0 < t_1 < … < t_n. The increments X_{t_1} − X_{t_0}, …, X_{t_n} − X_{t_{n−1}} are mutually independent random variables.
  2. The probability distribution of X_{t_2} − X_{t_1}, t_2 > t_1, depends only on t_2 − t_1 and not, for example, on t_1.
  3. The probability of at least one event occurring in an interval of time t is given by λt + o (t ).
A more general derivation of the Poisson process (not necessarily homogeneous) may be obtained by assuming that the transitions in fact depend on the state; i.e., the probability of a transition in the interval (t, t + Δt) from state E_n to state E_{n+1} is λ_n Δt + o(Δt).
Let P_n(t) be the probability of being in E_n at time t. Then

P_n(t + Δt) = P_n(t)(1 − λ_n Δt) + P_{n−1}(t) λ_{n−1} Δt + o(Δt), n ≥ 1,

and

P_0(t + Δt) = P_0(t)(1 − λ_0 Δt) + o(Δt).

From the above expressions, on dividing by Δt and letting Δt → 0, we have

dP_n(t)/dt = −λ_n P_n(t) + λ_{n−1} P_{n−1}(t), n ≥ 1,

and

dP_0(t)/dt = −λ_0 P_0(t).
We may now let λ_n = λ for all n; i.e., the probability of change is assumed constant regardless of the state. The solution of this system again gives the Poisson law

P_n(t) = e^{−λt} (λt)^n / n!.
We now consider another general example with many specific uses.
Example 3. Markov Processes and Markov Chains; Semi-Markov Processes
Suppose we have a system which can be in any of n possible states E_1, …, E_n. We are interested in state changes which occur in time according to some probability law. Denote the state of the process at time t by X(t). If a transition from state X(t) to state X(t + 1) depends only on X(t) and not on any of the previous states, i.e., if

P(X(t + 1) = j | X(t) = i, X(t − 1) = i_{t−1}, …, X(0) = i_0) = P(X(t + 1) = j | X(t) = i) = P_ij(t),

then the process is called a Markov process, as we have already noted. If both the state variable X and the time variable t are discrete valued, the process is called a Markov chain.
If the transition probabilities are time-independent, they are said to be stationary, and P_ij(t) = P_ij. We may write the one-step transition probability matrix as

P = (p_ij), i, j = 1, 2, …, n,

where p_ij is the probability of going from state i to state j in one step.
The two-step probability matrix is given by P², since the probability of going from E_i to E_j in two steps is ∑_k p_ik p_kj, and, from inductive reasoning, the n-step probability matrix is P^n.
Markov processes have been used extensively in modeling voting behavior, queue size, and other phenomena. However, there are some processes which are slightly but significantly different from those described above. In a Markov process, the length of time which a process spends in a given state is a random variable whose distribution depends at most on that state, and not on the state to which the process moves next. Many common processes, however, do not have this underlying property. For example, everyone is familiar with the type of queue which forms in front of the box office of a popular movie. Initially, when the queue is short, its size increases rapidly since people are willing to wait. However, after the queue has extended down the block and around the corner, people who arrive and wish to see the movie decide that they would rather not wait, and they go elsewhere. The line then moves from a state in which a few people wait before a box office with a constant service rate, say, to a state in which the rate at which people join the queue depends on how many people are already there.
The treatment of an illness is another example of such a queue. The time which the patient spends in a particular stage of an illness is often a function of that stage and also of the stage which he is going to next.
To summarize, the salient points of these processes, called semi-Markovian processes, are as follows. The transitions between states and their probabilities are those of a Markov process, but the time spent in each stage is a random variable which is a function of both the state the process is in and the state to which the process is changing. For both Markov and semi-Markov processes, we are interested in such quantities as the expected time of first passage to state j , starting in state i , the number of visits to state j in n state changes, the number of visits to state j between visits to state i , the mean time between recurrences of state i , the mean time to absorption, given there are one or more states which absorb the process (machine failure, etc.). Absorption occurs at a state when the process is trapped there and cannot get out of that state. We illustrate the differences between the two types of process by looking at the so-called first passage time, the expected length of time for the process to get to state j for the first time, given that it starts out in state i .
Let m ik be the Markov mean first passage time from i to k ; l ik be the semi-Markov mean first passage time from i to k ; F ik (t ) be the distribution function, for the semi-Markov process, for the “wait” of the process in state i , given that the next transition will be to state k ; and h ik be the mean of the distribution function F ik (t ).
Now there are two possible ways to get from i to j: (i) directly from i to j in one transition, or (ii) from i to some state k ≠ j in one transition, and then from k to j in an unknown number of transitions, which will take either m_kj or l_kj time units, depending on whether it is a Markov or a semi-Markov process. This leads to the following sets of equations, one set for each process, which can be solved for the m_ij or l_ij, respectively.
Markovian:

m_ij = 1 + ∑_{k≠j} p_ik m_kj.

Semi-Markovian:

l_ij = p_ij h_ij + ∑_{k≠j} p_ik (h_ik + l_kj).
The calculations are similar, once the h ik ’s have been determined (in practice, this may not be a trivial task).
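For the Markovian case, the first-passage equations m_ij = 1 + ∑_{k≠j} p_ik m_kj can be solved by fixed-point iteration; the 3-state transition matrix below is an assumed illustration, not taken from the text.

```python
# Mean first passage times to a target state j, from the Markov equations
# m_ij = 1 + sum_{k != j} p_ik * m_kj, solved by fixed-point iteration
# for an assumed 3-state chain (illustrative values only).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
n, j = 3, 2  # target state j = 2

m = [0.0] * n
for _ in range(10_000):
    m = [1 + sum(P[i][k] * m[k] for k in range(n) if k != j) for i in range(n)]

print([round(x, 6) for x in m])  # m -> [5.0, 5.0, 4.0]; m[j] is the mean recurrence time
```

The iteration converges because the sub-matrix obtained by deleting column j is substochastic; a direct linear solve would work equally well.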
We give a very simple example of a Markov process: weather forecasting. Let the three types of weather in a specific city be N (nice), R (rain), and S (snow). We have been given reason to assume that the transition probability matrix from one day to the next (rows and columns in the order N, R, S) is:

        N     R     S
  N    1/2   1/4   1/4
  R    1/2    0    1/2
  S    1/4   1/4   1/2
We want to know the probabilities over a two-day period, and also in the long run.
Over two days, the probabilities are given by the two-step transition matrix P².
Thus, given that it is raining today, there is zero probability of rain tomorrow, but there is a 25% chance of rain the day after tomorrow.
The transition probability matrix for n days is found by P^n = P · P^{n−1}. Eventually P^n converges to a limit, which gives the long-range behavior. Denote the stationary probabilities of each type of weather by π_N, π_R, π_S. We may either seek the limit of P^n as n → ∞ or we may solve the equation πP = π, where π = (π_N, π_R, π_S), since this expresses the fact that the probabilities do not change from one day to the next.
This equation gives

π_j = ∑_i π_i p_ij for each state j, together with π_N + π_R + π_S = 1.

In our example, this becomes a system of linear equations in π_N, π_R, and π_S, which may be solved to give

π_N = 0.4, π_R = 0.2, π_S = 0.4.
This means that someone coming on a randomly chosen day has a 40% chance for nice weather, a 20% chance for a rainy day, and a 40% chance for a snowy day.
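The figures quoted in this example (no two rainy days in a row, a 25% chance of rain two days after rain, and long-run probabilities 0.4, 0.2, 0.4) are consistent with the following assumed matrix, used here only to illustrate computing P² and the limit of P^n; it is our reconstruction, not necessarily the book's exact table.

```python
# States in order N (nice), R (rain), S (snow); matrix assumed consistent
# with the text's figures, not taken verbatim from it.
P = [[0.5, 0.25, 0.25],
     [0.5, 0.0,  0.5],
     [0.25, 0.25, 0.5]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

P2 = matmul(P, P)
print(P2[1][1])  # P(rain two days after rain) = 0.25

# Iterate P^n until the rows converge to the stationary distribution.
Pn = P
for _ in range(50):
    Pn = matmul(Pn, P)
print([round(x, 3) for x in Pn[0]])  # ≈ [0.4, 0.2, 0.4]
```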
We now proceed to another very general class of models, known as birth and death processes.
Example 4. Birth and Death Processes
Consider a continuous-time, discrete-state Markov chain. Suppose further that the following assumptions are satisfied: (i) the probability of going from state E_n to state E_{n+1} in the interval (t, t + h) is (λ_n h)(1 − μ_n h) + o(h), where o(h) denotes a function such that lim_{h→0} o(h)/h = 0; (ii) the probability of a transition from state E_n to state E_{n−1} in the interval (t, t + h) is μ_n h(1 − λ_n h) + o(h); (iii) the probability of a transition from state E_n to any state other than E_n, E_{n−1}, or E_{n+1} in (t, t + h) is o(h); (iv) the probability of remaining in state E_n in (t, t + h) is (1 − λ_n h)(1 − μ_n h) + o(h); (v) the transition probabilities are time-independent. Then the above process is called a birth–death process.
We may, for example, be interested in knowing how many people will be living in a city at some future date. Thus, we may be concerned with the probability P n (t ) of having n persons (elements) in this city (system) at some time t . Our state E n would then denote the state where there are n persons or elements.
Now, we may arrive at a system with n elements at time t + h in several ways: there may be n + 1 elements at time t, and a death occurs in the interval (t, t + h); there may be n − 1 elements at time t, and a birth occurs in the interval (t, t + h); there may be n elements at time t, and neither a birth nor a death occurs in the interval (t, t + h); or there may be n ± c, c ≥ 2, elements at time t, and some combination of two or more events (births and/or deaths) occurs in the interval (t, t + h). We may summarize as in Table 5.2.
TABLE 5.2

No. of elements    Event(s) in    Probability of n elements
at time t          (t, t + h)     at time t + h

n                  No birth,      P_n(t)(1 − λ_n h)(1 − μ_n h) + o(h)
                   no death         = P_n(t)(1 − λ_n h − μ_n h) + o(h)

n − 1              Birth,         P_{n−1}(t)(λ_{n−1} h)(1 − μ_{n−1} h) + o(h)
                   no death         = P_{n−1}(t) λ_{n−1} h + o(h)

n + 1              Death,         P_{n+1}(t)(1 − λ_{n+1} h) μ_{n+1} h + o(h)
                   no birth         = P_{n+1}(t) μ_{n+1} h + o(h)

n ± c, c ≥ 2       More than      o(h)
                   one event
Thus

P_n(t + h) = P_n(t)(1 − λ_n h − μ_n h) + P_{n−1}(t) λ_{n−1} h + P_{n+1}(t) μ_{n+1} h + o(h),

or

[P_n(t + h) − P_n(t)]/h = −(λ_n + μ_n) P_n(t) + λ_{n−1} P_{n−1}(t) + μ_{n+1} P_{n+1}(t) + o(h)/h.

Therefore, letting h → 0,

dP_n(t)/dt = −(λ_n + μ_n) P_n(t) + λ_{n−1} P_{n−1}(t) + μ_{n+1} P_{n+1}(t), n ≥ 1;

i.e., together with, for n = 0,

dP_0(t)/dt = −λ_0 P_0(t) + μ_1 P_1(t).
If we assume that all elements in the population have an equal probability of giving birth to a new element, and similarly for deaths, then λ_n = nλ and μ_n = nμ. Thus

dP_n(t)/dt = −n(λ + μ) P_n(t) + (n − 1)λ P_{n−1}(t) + (n + 1)μ P_{n+1}(t).
A process may be absorbed or trapped in a state, or it may be reflected to a neighboring state. Thus E_N is an absorbing barrier if λ_n = μ_n = 0 for n ≥ N. The state E_0, which is otherwise a reflecting state, becomes an absorbing state if λ_0 = 0. Applications of these ideas have been made to the spread of epidemics.
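As a check on the birth–death equations, the pure-birth special case (λ_n = λ, μ_n = 0) can be integrated numerically; it should reproduce the Poisson law P_n(t) = e^{−λt}(λt)^n/n!. The rate, step size, and truncation below are assumed values.

```python
import math

# Euler integration of the pure-birth equations dP_n/dt = -lam*P_n + lam*P_{n-1}
# (mu_n = 0), which should reproduce the Poisson law P_n(t) = e^-lam*t (lam*t)^n / n!.
lam, t_end, dt, nmax = 1.0, 1.0, 1e-4, 25
P = [0.0] * (nmax + 1)
P[0] = 1.0  # start in state E_0
for _ in range(int(t_end / dt)):
    new = P[:]
    for n in range(nmax + 1):
        inflow = lam * P[n - 1] if n > 0 else 0.0
        new[n] = P[n] + dt * (-lam * P[n] + inflow)
    P = new

for n in range(4):
    exact = math.exp(-lam * t_end) * (lam * t_end) ** n / math.factorial(n)
    print(n, round(P[n], 5), round(exact, 5))
```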
The following is an illustration of embedded Markov chains in a birth–death context.
*Example 5. The Pollaczek–Khintchine Queue [Saaty]
Suppose that arrivals occur at random by a Poisson process with the rate λ per unit time, to a waiting line, in statistical equilibrium, before a single-service facility. Also suppose that they are served by an arbitrary service-time distribution at the rate of μ per unit time, first come, first served. We assume that λ/μ < 1. Suppose that a departing customer leaves q in line, including the one in service, whose service time is t. Let r customers arrive during this time t. If the next departing customer leaves q′ customers behind, we can relate q and q′ as follows:

q′ = q − 1 + r + δ, where δ = 1 if q = 0 and δ = 0 if q > 0.

Note that by introducing δ we avoid using the expression q′ = max(q − 1, 0) + r.
It is assumed that equilibrium values for the first and second moments E[q] and E[q²] of the queue exist. Note that q is treated as a random variable. Now, from the definition we have δ² = δ and q(1 − δ) = q. Also, E[q′] = E[q] and E[q′²] = E[q²], since both q and q′ are assumed to have the same equilibrium distribution. Note that, because the system is in equilibrium, there is no difference among the queues left behind by departing customers; i.e., they all have the same probability distribution, independent of time.
Taking the expected value of the equation in the different variables, we have

E[q′] = E[q] − 1 + E[r] + E[δ],

from which we have

E[δ] = 1 − E[r].
But during a service time of length t we have

E[r | t] = λt.

Thus, on taking expectations with respect to the service time t, one has E[r] = λE[t] = λ/μ = ρ. For example, if the service distribution is exponential, f(t) = μe^{−μt}, we have

E[r] = ∫₀^∞ λt μe^{−μt} dt = λ/μ,

since the average value of the service-time distribution is 1/μ.
Note that E [r ] is a number which is unaffected by taking averages. This gives E [δ ] = 1 – ρ . Now the probability of r arrivals is independent of q , the length of the queue, and of δ , which assumes values that depend only on q which is independent of r. Consequently, if we take the expected value over the variables r, q , and δ , we may take the expected value of r and of q separately wherever we encounter their product. This is also true for r and δ.
Also, averaging r² over the service time yields

E[r²] = E[λt + λ²t²] = λE[t] + λ²E[t²] = ρ + ρ² + λ² var(t).
Again, by squaring the equation relating q and q′ and using the facts that δ² = δ and δq ≡ 0, we obtain

q′² = q² + r² + 1 + δ + 2qr − 2q − 2r + 2rδ − 2δ.

Hence, because of equilibrium, E[q′²] = E[q²], and taking expectations gives

0 = E[r²] + 1 + E[δ] + 2E[q]E[r] − 2E[q] − 2E[r] + 2E[r]E[δ] − 2E[δ],

or, simplifying and using the foregoing relations, we have the Pollaczek–Khintchine formula:

E[q] = ρ + (ρ² + λ² var(t)) / (2(1 − ρ)).
Thus, once we know the variance of the service time t from its given distribution, the average number in the system is determined. It is important to observe that the foregoing average is taken over instants just following departures and is not the time average value of the number in the system. In fact, if E t (q ) is the time average, all we can say without further argument is that E [q ] ≤ E t (q ) < E [q ] + 1.
Note the fact, which holds in general (i.e., even if one has several channels in parallel), that the average number in the system equals the sum of the average number of busy channels (in this case ρ, the traffic intensity) and the average number in line.
Now to obtain the average waiting time we argue as follows: If we write E[w] for the average waiting time in the queue (not including service), λ[E(w) + 1/μ] is the expected number of arrivals during the total waiting-plus-service time of one customer, i.e., during his stay in the system. But this must be just the number in the system immediately after his departure, namely E[q]. Thus

λ[E(w) + 1/μ] = E[q],  so that  E[w] = E[q]/λ − 1/μ = (ρ² + λ²var(t)) / [2λ(1 − ρ)].
The smaller var(t), the shorter the waiting time; e.g., if the service time is constant we have var(t) = 0. Note that if we consider the birth–death equations and put λ_n = λ and μ_n = μ, μ₀ = 0, and equate the derivatives to zero, we obtain the steady-state equations of a single-channel first-come first-served queue with Poisson input and exponential service time:

0 = −(λ + μ)p_n + λp_{n−1} + μp_{n+1},  n ≥ 1,
0 = −λp₀ + μp₁,
from which we have

p_n = (1 − ρ)ρⁿ,  n ≥ 0,  and hence  E[q] = Σ n p_n = ρ/(1 − ρ)

for the number in the system. Check the Pollaczek–Khintchine formula for this result by putting f(t) = μe^{−μt}.
Another general process of wide application follows.
Example 6. Gaussian (Normal) Process
The importance of the Gaussian distribution in probability theory arises from the fact that many random variables may be considered as normally distributed and the normal distribution is tractable and convenient to handle.
A stochastic process {X(t), t ∈ T} is said to be a Gaussian process if, for any integer n and any subset {t₁, t₂, …, t_n} of T, the n random variables X(t₁), X(t₂), …, X(t_n) are jointly normally distributed. An example of a Gaussian process is the Wiener process, which also provides a mathematical model for Brownian motion.
Consider the motion of a particle on a line as a result of numerous random impacts with other particles. Take the origin as its position at time t = 0. If we assume that the times between successive impacts are independently and exponentially distributed with mean 1/μ, then the number N(t) of impacts in time t is a Poisson process with intensity μ. Assume that the effect of an impact on the particle is to change its position by ±a, each with probability ½. The position of the particle at time t may be represented by X(t) = Y₁ + Y₂ + … + Y_{N(t)}, where Y_n is the particle's change of position resulting from the nth impact. It can be shown that {X(t), t ≥ 0} is a stochastic process with stationary independent increments. Its characteristic function is

ϕ_{X(t)}(x) = E[e^{ixX(t)}] = exp[μt(cos ax − 1)].
Defining σ² = a²μ, the logarithm of the characteristic function is given by

log ϕ_{X(t)}(x) = μt(cos ax − 1) = −½σ²tx² + μt·O(a⁴x⁴).

Now let μ → ∞ and a → 0 in such a way that μa² = σ² (the total mean square displacement of the particle per unit time) remains constant. Then ϕ_{X(t)}(x) → exp(−½σ²tx²).
Thus, in the limit, X(t) is normally distributed with mean zero and variance σ²t, and the process has stationary independent increments. This is the Wiener process, which is a special case of the Gaussian process with E[X(t)] = 0.
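The limiting behavior can be seen numerically. The sketch below builds the particle position from exponential inter-impact times and ±a jumps exactly as described, and checks that the sample mean and variance approach 0 and σ²t; the parameter values are illustrative choices of ours.

```python
import random

def impact_position(t, mu, a, rng):
    """Position at time t: impacts arrive with exponential gaps (rate mu),
    each displacing the particle by +a or -a with probability 1/2."""
    x, clock = 0.0, rng.expovariate(mu)
    while clock <= t:
        x += a if rng.random() < 0.5 else -a
        clock += rng.expovariate(mu)
    return x

rng = random.Random(7)
t, mu, a = 2.0, 100.0, 0.1           # sigma^2 = mu * a^2 = 1
xs = [impact_position(t, mu, a, rng) for _ in range(20_000)]
mean = sum(xs) / len(xs)
var = sum(x * x for x in xs) / len(xs) - mean**2
print(f"mean {mean:+.3f} (expect 0), variance {var:.3f} (expect sigma^2 t = {mu*a*a*t:.1f})")
```

Making μ larger and a smaller while holding μa² fixed brings the empirical distribution of X(t) closer to the normal limit.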
The methodology has been applied to noise currents in a vacuum tube.
If each element of a system produces a number of new elements at each stage of a process according to a probability law, we have a growth phenomenon known as a branching process.
Example 7. Bacterial Growth: A Branching Process
For example, consider bacteria which reproduce only at discrete intervals. Let p_k be the probability that a single organism produces k new organisms in one time period. We shall assume that all organisms reproduce independently of the others. In this particular problem, we make the simplifying assumption on X_t, the number in the system at time t, that X₀ = 1. Then X₁ has the probability distribution {p_k}, and P(s) = Σ_k p_k s^k is the probability generating function of X₁. The generating function of X₂, P₂(s), is given by

P₂(s) = Σ_n P(X₁ = n)[P(s)]ⁿ = P(P(s)),

since, if X₁ = n, P₂(s | X₁ = n) = [P(s)]ⁿ. Similarly,

P_{t+1}(s) = P_t(P(s)) = P(P_t(s)).
From this we can derive the extinction probabilities P(X_t = 0) for each stage. If p₀ = 0, the extinction probabilities are obviously zero, so we shall assume that 0 < p₀ < 1. If x_t is the extinction probability at the tth stage, then x_t = P(X_t = 0).
Consider the probability generating function P_t(s), whose value at s = 0 gives x_t. We have

x₁ = P(X₁ = 0) = p₀ = P(0).

Further,

x₂ = P₂(0) = P(P(0)) = P(x₁),

since all organisms act independently. Recursively, we have

x_{t+1} = P(x_t).
Since P (s ) is a monotone increasing function for 0 < s < 1, we have x 1 < x 2 <…< x n <….
Further, x_n increases to some number τ such that τ = P(τ). If u is any positive root of u = P(u), then x_n < P(u) = u for all n and hence τ ≤ u. Therefore, τ is the smallest positive root of u = P(u). Now, u = P(u) has a positive root less than unity if and only if P′(1) > 1, since P(s) is a convex monotone function.
Let μ = P′(1) = Σ_k k p_k be the expected number of descendants of a particular organism. Then if μ ≤ 1, the probability of eventual extinction tends to one; that is, τ = 1. If μ > 1, there exists some ξ < 1 such that ξ = P(ξ), which is the probability of extinction in finite time. Further, we call 1 − ξ the probability of an infinitely prolonged process, or of the establishment of the process.
Now if X₀ = r, the extinction probability is ξ^r and the probability of establishment is 1 − ξ^r, which is near unity if r is large. Further, by induction, we obtain the result E[X_t] = μ^t (for X₀ = 1). Thus, either the process explodes or expires, and a stable position is highly improbable.
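The recursion x_{t+1} = P(x_t) makes the extinction probability easy to compute. The sketch below iterates it for one illustrative offspring distribution (our choice, not the text's) and recovers the smallest positive root of u = P(u):

```python
def extinction_prob(p, iters=200):
    """Iterate x_{t+1} = P(x_t) from x_1 = P(0) = p_0, where
    P(s) = sum_k p_k s^k is the offspring generating function."""
    P = lambda s: sum(pk * s**k for k, pk in enumerate(p))
    x = 0.0
    for _ in range(iters):
        x = P(x)
    return x

# Illustrative offspring distribution p_0 = 0.25, p_1 = 0.25, p_2 = 0.5:
# here mu = P'(1) = 1.25 > 1, and u = P(u) has roots 0.5 and 1.
p = [0.25, 0.25, 0.5]
xi = extinction_prob(p)
mu = sum(k * pk for k, pk in enumerate(p))
print(f"mu = {mu}, extinction probability xi = {xi:.4f}")   # xi -> 0.5000
print(f"establishment probability 1 - xi = {1 - xi:.4f}")
```

Since the iterates x_t increase monotonically to the smallest fixed point, the loop converges rapidly whenever 0 < p₀ < 1.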
Example 8. Availability
Let F (t ) be the failure density function at time t of a component in an electronic system and P (t ) the cumulative probability that maintenance will not be completed by time t. The unavailability U (t ) of the component at time t given that it was put into operation at time t = 0 is given by
Let
We define the availability A(t) as A(t) = 1 − U(t).
If
The same steady-state result can be obtained by using arbitrary distributions for F(t) and P(t). [Note that if we define availability as a random variable A = X/(X + Y), the ratio of the time between failures X to the sum of X and the time Y needed for replacement, then the expected value of A cannot be written as the ratio of the expected values. If Y/X is always very small, we may expand the right side in powers of Y/X as follows:

A = 1/(1 + Y/X) = 1 − (Y/X) + (Y/X)² − (Y/X)³ + …,

and then take the expected value on both sides. The first neglected term of the alternating series provides an estimate of the error.]
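The bracketed remark can be illustrated numerically. The distributions below (uniform up time X and uniform replacement time Y, chosen by us so that Y/X stays small) show that the sample average of X/(X + Y) is bracketed by the first two partial sums of the expansion and differs from the ratio of the means:

```python
import random

rng = random.Random(3)
# Illustrative distributions: up time X uniform on [50, 150],
# replacement time Y uniform on [0, 2], so Y/X <= 2/50 is always small.
samples = [(rng.uniform(50, 150), rng.uniform(0, 2)) for _ in range(100_000)]

exact  = sum(x / (x + y) for x, y in samples) / len(samples)
first  = sum(1 - y/x for x, y in samples) / len(samples)            # 1 - E[Y/X]
second = sum(1 - y/x + (y/x)**2 for x, y in samples) / len(samples)
ratio  = 100 / 101        # E[X] / (E[X] + E[Y]) -- not E[X/(X+Y)] in general
print(f"E[X/(X+Y)]        = {exact:.5f}")
print(f"ratio of means    = {ratio:.5f}")
print(f"1 - E[Y/X]        = {first:.5f}")
print(f"+ E[(Y/X)^2] term = {second:.5f}")
```

For each sample, the alternating partial sums bracket X/(X + Y), so the averages satisfy the same inequalities and the truncation error is bounded by the first neglected term.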
In our next chapters we consider a number of examples classified by area of application rather than by type of model.
Chapter 5—Problems
1. A shot is fired from each of three guns A, B, C at a single target, the probability of a hit being 0.1, 0.2, 0.3, respectively. Find the probability distribution for the number of hits.
2. In the circuit of Fig. 5.1 (p. 81), given that the bulb is lit, what is the probability that switches A and B are both closed? [ ].
3. If two integers A and B, B > A, are selected at random, what is the probability that they have no common divisor greater than one?
4. A player X plays one game with one of two players A and B , a second game with the other and finally, a third game with the first one. The probability that X wins a game against A is 1/3; that against B is 2/3. X wins if he wins two consecutive games. Should he play the sequence ABA or BAB to improve his chances of winning? [ABA .]
5. Consider an organism whose only purpose is searching for food on a branching system of paths. It has no maximization objective but only needs to maintain a certain average rate of food intake (a satisficing objective). The food occurs at random on any of the nodes of the branching system. The probability that food is located at any given node is p, and the probability that it is not is q = 1 − p. From any node there are d branches to d nodes. The organism can see all nodes that are n moves away from its position and hence can follow a definite path if food is seen ahead. (A move is a transition along a branch from one node to an adjacent one.) If food has not been sighted, the organism is indifferent as to which node to approach next. The maximum number of moves the organism can make between meals without starvation is N. Determine P, the probability that the organism will survive from meal to meal.
6. A needle of length l is thrown at random on a board on which parallel lines are drawn d units apart. If d > l , what is the probability that the needle touches one of the parallel lines?
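Problem 6 is Buffon's classical needle problem; for l ≤ d the crossing probability is 2l/(πd). A Monte Carlo sketch (our own construction, offered as a check rather than a solution) estimates it by sampling the needle's center and angle:

```python
import math, random

def buffon(l, d, n=200_000, seed=11):
    """Estimate the probability that a needle of length l crosses one of
    the parallel lines spaced d apart (assumes l <= d)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        y = rng.uniform(0, d / 2)         # center's distance to the nearest line
        theta = rng.uniform(0, math.pi)   # needle's angle with the lines
        if y <= (l / 2) * math.sin(theta):
            hits += 1
    return hits / n

l, d = 1.0, 2.0
print(f"estimated {buffon(l, d):.4f}, exact 2l/(pi d) = {2 * l / (math.pi * d):.4f}")
```

Historically the experiment was run in reverse, using the observed crossing frequency to estimate π.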
7. In a study of past records, it has been found that 25% of all shirts manufactured had an imperfection in them. Two persons are hired to inspect the shirts before they are shipped from the factory. The probability that either inspector will misclassify a shirt is 10% and their decisions are independent. (a) If it is decided to class as imperfect any shirt which either or both inspectors reject, what is the probability that if a shirt is classified as good, it is actually good; actually imperfect? (b) If it is decided to class as imperfect only the shirts that both inspectors reject, what is the probability that a shirt classified as good is actually good; actually imperfect? Also, if it is classified as imperfect, what is the probability it is actually good; actually imperfect? [Hint : use Bayes’ theorem.]
Bibliography
Bailey, N. T. J., The Elements of Stochastic Processes with Applications to the Natural Sciences, Wiley, New York, 1964.
Bartlett, M. S., Stochastic Processes , Cambridge University Press, 1962.
Benes, V. E., General Stochastic Processes in the Theory of Queues , Addison-Wesley, Reading, Massachusetts, 1963.
Bharucha-Reid, A. T., Elements of the Theory of Markov Processes and Their Applications , McGraw-Hill, New York, 1960 (Chapter 9, The Theory of Queues).
Bodino, G. A., and F. Brambilla, Teoria delle Code , Cisalpino, Milano, 1959.
Chung, K. L., Markov Chains with Stationary Transition Probabilities , Springer-Verlag, Berlin, 1960.
Cox, D. R. and W. L. Smith, Queues , Wiley, New York, 1961.
Doob, J. L., Stochastic Processes , Wiley, New York, 1953.
Dynkin, E. B., Markov Processes (2 volumes), Academic Press and Springer-Verlag, 1965.
Dynkin, E. B., Theory of Markov Processes , Prentice-Hall, Englewood Cliffs, New Jersey, 1961.
Feller, W., An Introduction to Probability Theory and Its Applications, Vol. II, Wiley, New York, 1966.
Feller, W., An Introduction to Probability Theory and Its Applications (2nd edn.), Wiley, New York, 1957.
Girault, M., Stochastic Processes , Springer-Verlag, Berlin, 1966.
Harris, T. E., The Theory of Branching Processes , Prentice-Hall, Englewood Cliffs, New Jersey (Springer-Verlag), 1963.
Howard, R. A., Dynamic Programming and Markov Processes , Wiley, New York, 1960.
Ito, K., and H. P. McKean, Jr., Diffusion Processes and Their Sample Paths, Springer-Verlag, Berlin, 1965.
Karlin, S., A First Course in Stochastic Processes , Academic Press, New York, 1968.
Kemeny, J. G., and J. L. Snell, Finite Markov Chains , van Nostrand, New Jersey, 1960.
Kemperman, J. H. B., The Passage Problem for a Stationary Markov Chain, University of Chicago Press, Chicago, 1961.
Khintchine, A. Y., Mathematical Methods in the Theory of Queueing , Griffin, London, 1960.
LeGall, P., Les Systemes Avec ou sans Attente , Editions Dunod, Paris, 1962.
Loeve, M., Probability Theory , 3rd edn., van Nostrand, New Jersey, 1963.
Meyer, P. A., Probability and Potentials , Blaisdell, New York, 1966.
Morse, P. M., Queues, Inventories and Maintenance , ORSA, Wiley, New York, 1958.
Neveu, J., Mathematical Foundations of the Calculus of Probability, Holden-Day, San Francisco, California, 1965.
Parzen, E., Stochastic Processes , Holden-Day, San Francisco, California, 1962.
Pollaczek, F., Théorie Analytique des Problèmes Stochastiques Relatifs à un Groupe de Lignes Téléphoniques avec Dispositif d'Attente, Mém. Sci. Math., Gauthier-Villars, Paris, 1961.
Pollaczek, F., Problèmes Stochastiques Posés par le Phénomène de Formation d'une Queue d'Attente à un Guichet et par des Phénomènes Apparentés, Mém. Sci. Math., Gauthier-Villars, Paris, 1957.
Prabhu, N. U., Queues and Inventories , Wiley, New York, 1965.
Prabhu, N. U., Stochastic Processes , Macmillan, New York, 1965.
Riordan, J., Stochastic Service Systems , Wiley, New York, 1962.
Rosenblatt, M., Random Processes , Oxford University Press, Oxford, 1962.
Runnenburg, J. Th., On the Use of Markov Processes in One Server Waiting Time Problems and Renewal Theory, University of Amsterdam, 1960.
Saaty, T. L., Elements of Queueing Theory With Applications , McGraw-Hill, New York, 1961.
Spitzer, F., Principles of Random Walk , van Nostrand, New Jersey, 1964.
Syski, R., Introduction to Congestion Theory in Telephone Systems , Oliver & Boyd, Edinburgh, 1960.
Syski, R., Stochastic differential equations, Chapter 8 , in T. Saaty, Modern Non-linear Equations , McGraw-Hill, New York, 1967.
Takacs, L., Introduction to the Theory of Queues , Oxford University Press, Oxford, 1962.
Takacs, L., Stochastic Processes ( Problems and Solutions ), Methuen, London, 1960.