$$\operatorname{Var}(W) = \operatorname{Var}(g(X)) + a^2 \operatorname{Var}(f(X)) + 2a \operatorname{Cov}(f(X), g(X))$$

Simple calculus shows that the preceding is minimized when

$$a = -\frac{\operatorname{Cov}(f(X), g(X))}{\operatorname{Var}(f(X))}$$

and, for this value of $a$,

$$\operatorname{Var}(W) = \operatorname{Var}(g(X)) - \frac{[\operatorname{Cov}(f(X), g(X))]^2}{\operatorname{Var}(f(X))}$$

Because $\operatorname{Var}(f(X))$ and $\operatorname{Cov}(f(X), g(X))$ are usually unknown, the simulated data should be used to estimate these quantities.

Dividing the preceding equation by $\operatorname{Var}(g(X))$ shows that

$$\frac{\operatorname{Var}(W)}{\operatorname{Var}(g(X))} = 1 - \operatorname{Corr}^2(f(X), g(X))$$

where $\operatorname{Corr}(X, Y)$ is the correlation between $X$ and $Y$. Consequently, the use of a control variate will greatly reduce the variance of the simulation estimator whenever $f(X)$ and $g(X)$ are strongly correlated.
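The following minimal Python sketch (an illustration of mine, not from the text) shows the procedure: it estimates $E[g(U)]$ for $g(u) = e^u$ with $U$ uniform on $(0,1)$, using $f(U) = U$, whose mean $1/2$ is known, as the control variate; the coefficient $a$ is estimated from the simulated data as discussed above. The choice of $g$ and $f$ here is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

u = rng.random(n)
g = np.exp(u)          # g(U): quantity whose mean we want; E[e^U] = e - 1
f = u                  # f(U): control variate with known mean E[U] = 1/2

# Estimate a = -Cov(f, g) / Var(f) from the simulated data.
a_hat = -np.cov(f, g)[0, 1] / np.var(f, ddof=1)

raw_estimate = g.mean()
cv_estimate = (g + a_hat * (f - 0.5)).mean()

print("raw estimate:        ", raw_estimate)
print("control-variate est.:", cv_estimate)
print("sample correlation:  ", np.corrcoef(f, g)[0, 1])
```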

Example 11.20

Consider a continuous-time Markov chain that, upon entering state $i$, spends an exponential time with rate $v_i$ in that state before making a transition into some other state, with the transition being into state $j$ with probability $P_{i,j}$, $i \geq 0$, $j \neq i$. Suppose that costs are incurred at rate $C(i) \geq 0$ per unit time whenever the chain is in state $i$, $i \geq 0$. With $X(t)$ equal to the state at time $t$, and $\alpha$ being a constant such that $0 < \alpha < 1$, the quantity

$$W = \int_0^\infty e^{-\alpha t}\, C(X(t))\, dt$$

represents the total discounted cost. For a given initial state, suppose we want to use simulation to estimate $E[W]$. Whereas at first it might seem that we cannot obtain an unbiased estimator without simulating the continuous-time Markov chain for an infinite amount of time (which is clearly impossible), we can make use of the results of Example 5.1, which gives the equivalent expression for $E[W]$:

$$E[W] = E\left[\int_0^T C(X(t))\, dt\right]$$

where $T$ is an exponential random variable with rate $\alpha$ that is independent of the continuous-time Markov chain. Therefore, we can first generate the value of $T$, then generate the states of the continuous-time Markov chain up to time $T$, to obtain the unbiased estimator $\int_0^T C(X(t))\, dt$. Because all the cost rates are nonnegative, this estimator is strongly positively correlated with $T$, which will thus make an effective control variate. ■
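As a concrete sketch (not from the text), the Python code below simulates this estimator for a small, arbitrarily chosen two-state chain, using $T$ as the control variate with known mean $E[T] = 1/\alpha$; the rates, transition probabilities, cost rates, and discount rate are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative two-state continuous-time Markov chain (assumed parameters).
v = np.array([1.0, 2.0])            # exponential holding-time rates v_i
P = np.array([[0.0, 1.0],           # transition probabilities P_{i,j}
              [1.0, 0.0]])
C = np.array([3.0, 1.0])            # cost rates C(i)
alpha = 0.5                         # discount rate
x0 = 0                              # initial state

def discounted_cost_run():
    """One run: returns (integral of C(X(t)) dt on [0, T], T)."""
    T = rng.exponential(1.0 / alpha)     # exponential horizon with rate alpha
    t, state, cost = 0.0, x0, 0.0
    while True:
        hold = rng.exponential(1.0 / v[state])
        if t + hold >= T:
            cost += C[state] * (T - t)
            return cost, T
        cost += C[state] * hold
        t += hold
        state = rng.choice(len(v), p=P[state])

runs = np.array([discounted_cost_run() for _ in range(10_000)])
cost, T = runs[:, 0], runs[:, 1]

a_hat = -np.cov(T, cost)[0, 1] / np.var(T, ddof=1)
print("raw estimate of E[W]:            ", cost.mean())
print("control-variate estimate of E[W]:", (cost + a_hat * (T - 1.0 / alpha)).mean())
```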

Example 11.21

A Queueing System

Let $D_{n+1}$ denote the delay in queue of the $(n+1)$st customer in a queueing system in which the interarrival times are independent and identically distributed (i.i.d.) with distribution $F$ having mean $\mu_F$ and are independent of the service times, which are i.i.d. with distribution $G$ having mean $\mu_G$. If $X_i$ is the interarrival time between arrivals $i$ and $i+1$, and if $S_i$ is the service time of customer $i$, $i \geq 1$, we may write

$$D_{n+1} = g(X_1, \ldots, X_n, S_1, \ldots, S_n)$$

To take into account the possibility that the simulated variables $X_i, S_i$ may by chance be quite different from what might be expected, we can let

$$f(X_1, \ldots, X_n, S_1, \ldots, S_n) = \sum_{i=1}^n (S_i - X_i)$$

As $E[f(\mathbf{X}, \mathbf{S})] = n(\mu_G - \mu_F)$, we could use

$$g(\mathbf{X}, \mathbf{S}) + a\,[\,f(\mathbf{X}, \mathbf{S}) - n(\mu_G - \mu_F)\,]$$

as an estimator of $E[D_{n+1}]$. Since $D_{n+1}$ and $f$ are both increasing functions of $S_i$, $-X_i$, $i = 1, \ldots, n$, it follows from Theorem 11.1 that $f(\mathbf{X}, \mathbf{S})$ and $D_{n+1}$ are positively correlated, and so the simulated estimate of $a$ should turn out to be negative.
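For illustration only, the following Python sketch applies this control variate under assumed exponential interarrival and service distributions, computing the delays with the standard Lindley recursion $D_{k+1} = \max(D_k + S_k - X_k, 0)$; the means, the value of $n$, and the number of runs are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 20                    # estimate E[D_{n+1}]
mu_F, mu_G = 1.0, 0.8     # assumed means of interarrival (F) and service (G) times
n_runs = 10_000

def one_run():
    """Simulate X_1..X_n, S_1..S_n and return (D_{n+1}, sum(S_i - X_i))."""
    X = rng.exponential(mu_F, n)
    S = rng.exponential(mu_G, n)
    d = 0.0
    for k in range(n):                      # Lindley's recursion
        d = max(d + S[k] - X[k], 0.0)
    return d, (S - X).sum()

runs = np.array([one_run() for _ in range(n_runs)])
delay, control = runs[:, 0], runs[:, 1]

a_hat = -np.cov(control, delay)[0, 1] / np.var(control, ddof=1)   # should be negative
estimate = (delay + a_hat * (control - n * (mu_G - mu_F))).mean()
print("estimated a:", a_hat)
print("control-variate estimate of E[D_{n+1}]:", estimate)
```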

If we wanted to estimate the expected sum of the delays in queue of the first $N(T)$ arrivals, then we could use $\sum_{i=1}^{N(T)} S_i$ as our control variable. Indeed, as the arrival process is usually assumed independent of the service times, it follows that

$$E\left[\sum_{i=1}^{N(T)} S_i\right] = E[S]\, E[N(T)]$$

where $E[N(T)]$ can either be computed by the method suggested in Section 7.8 or estimated from the simulation as in Example 11.18. This control variable could also be used if the arrival process were a nonhomogeneous Poisson process with rate $\lambda(t)$; in this case,

$$E[N(T)] = \int_0^T \lambda(t)\, dt$$

11.6.4 Importance Sampling

Let $\mathbf{X} = (X_1, \ldots, X_n)$ denote a vector of random variables having a joint density function (or joint mass function in the discrete case) $f(\mathbf{x}) = f(x_1, \ldots, x_n)$, and suppose that we are interested in estimating

$$\theta = E[h(\mathbf{X})] = \int h(\mathbf{x})\, f(\mathbf{x})\, d\mathbf{x}$$

where the preceding is an $n$-dimensional integral. (If the $X_i$ are discrete, then interpret the integral as an $n$-fold summation.)

Suppose that a direct simulation of the random vector $\mathbf{X}$, so as to compute values of $h(\mathbf{X})$, is inefficient, possibly because (a) it is difficult to simulate a random vector having density function $f(\mathbf{x})$, or (b) the variance of $h(\mathbf{X})$ is large, or (c) a combination of (a) and (b).

Another way in which we can use simulation to estimate $\theta$ is to note that if $g(\mathbf{x})$ is another probability density such that $f(\mathbf{x}) = 0$ whenever $g(\mathbf{x}) = 0$, then we can express $\theta$ as

$$\theta = \int \frac{h(\mathbf{x})\, f(\mathbf{x})}{g(\mathbf{x})}\, g(\mathbf{x})\, d\mathbf{x} = E_g\!\left[\frac{h(\mathbf{X})\, f(\mathbf{X})}{g(\mathbf{X})}\right] \tag{11.14}$$

where we have written $E_g$ to emphasize that the random vector $\mathbf{X}$ has joint density $g(\mathbf{x})$.

It follows from Equation (11.14) that $\theta$ can be estimated by successively generating values of a random vector $\mathbf{X}$ having density function $g(\mathbf{x})$ and then using as the estimator the average of the values of $h(\mathbf{X}) f(\mathbf{X})/g(\mathbf{X})$. If a density function $g(\mathbf{x})$ can be chosen so that the random variable $h(\mathbf{X}) f(\mathbf{X})/g(\mathbf{X})$ has a small variance, then this approach—referred to as importance sampling—can result in an efficient estimator of $\theta$.
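As a simple illustration (my own example, not from the text), the sketch below estimates $\theta = P\{Z > 3\}$ for a standard normal $Z$, so $h(x)$ equals 1 if $x > 3$ and 0 otherwise. Sampling from $g$, a normal density with mean 3 and variance 1, places most of the simulated points where $h$ is nonzero, and each sample is weighted by the likelihood ratio $f(x)/g(x)$.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Target: theta = P{Z > 3} for Z ~ N(0, 1); exact value via the complementary error function.
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))

# Importance sampling: draw X from g = N(3, 1) and weight by the likelihood ratio
# f(x)/g(x) = exp(-x^2/2) / exp(-(x-3)^2/2) = exp(-3x + 9/2).
x = rng.normal(loc=3.0, scale=1.0, size=n)
ratio = np.exp(-3.0 * x + 4.5)
estimates = (x > 3.0) * ratio            # h(X) f(X) / g(X)

print("importance sampling estimate:", estimates.mean())
print("standard error:              ", estimates.std(ddof=1) / math.sqrt(n))
print("exact value:                 ", exact)
```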

Let us now try to obtain a feel for why importance sampling can be useful. To begin, note that $f(\mathbf{X})$ and $g(\mathbf{X})$ represent the respective likelihoods of obtaining the vector $\mathbf{X}$ when $\mathbf{X}$ is a random vector with respective densities $f$ and $g$. Hence, if $\mathbf{X}$ is distributed according to $g$, then it will usually be the case that $f(\mathbf{X})$ will be small in relation to $g(\mathbf{X})$, and thus when $\mathbf{X}$ is simulated according to $g$ the likelihood ratio $f(\mathbf{X})/g(\mathbf{X})$ will usually be small in comparison to 1. However, it is easy to check that its mean is 1:

$$E_g\!\left[\frac{f(\mathbf{X})}{g(\mathbf{X})}\right] = \int \frac{f(\mathbf{x})}{g(\mathbf{x})}\, g(\mathbf{x})\, d\mathbf{x} = \int f(\mathbf{x})\, d\mathbf{x} = 1$$

Thus we see that even though $f(\mathbf{X})/g(\mathbf{X})$ is usually smaller than 1, its mean is equal to 1, implying that it is occasionally large and so will tend to have a large variance. So how can $h(\mathbf{X}) f(\mathbf{X})/g(\mathbf{X})$ have a small variance? The answer is that we can sometimes arrange to choose a density $g$ such that those values of $\mathbf{x}$ for which $f(\mathbf{x})/g(\mathbf{x})$ is large are precisely the values for which $h(\mathbf{x})$ is exceedingly small, and thus the ratio $h(\mathbf{X}) f(\mathbf{X})/g(\mathbf{X})$ is always small. Since this will require that $h(\mathbf{x})$ sometimes be small, importance sampling seems to work best when estimating a small probability; for in this case the function $h(\mathbf{x})$ is equal to 1 when $\mathbf{x}$ lies in some set and is equal to 0 otherwise.

We will now consider how to select an appropriate density $g$. We will find that the so-called tilted densities are useful. Let $M(t) = E_f[e^{tX}] = \int e^{tx} f(x)\, dx$ be the moment generating function corresponding to a one-dimensional density $f$.

Definition 11.2

A density function

$$f_t(x) = \frac{e^{tx} f(x)}{M(t)}$$

is called a tilted density of $f$, $-\infty < t < \infty$.

A random variable with density $f_t$ tends to be larger than one with density $f$ when $t > 0$ and tends to be smaller when $t < 0$.

In certain cases the tilted distributions $f_t$ have the same parametric form as does $f$.

Example 11.22

If $f$ is the exponential density with rate $\lambda$ then

$$f_t(x) = C\, e^{tx}\, \lambda e^{-\lambda x} = \lambda C\, e^{-(\lambda - t)x}$$

where $C = 1/M(t)$ does not depend on $x$. Therefore, for $t < \lambda$, $f_t$ is an exponential density with rate $\lambda - t$.

If $f$ is a Bernoulli probability mass function with parameter $p$, then

$$f(x) = p^x (1-p)^{1-x}, \qquad x = 0, 1$$

Hence, $M(t) = E_f[e^{tX}] = p e^t + 1 - p$, and so

$$f_t(x) = \frac{1}{M(t)}\, (p e^t)^x (1-p)^{1-x} = \left(\frac{p e^t}{p e^t + 1 - p}\right)^{\!x} \left(\frac{1-p}{p e^t + 1 - p}\right)^{\!1-x} \tag{11.15}$$

That is, $f_t$ is the probability mass function of a Bernoulli random variable with parameter

$$p_t = \frac{p e^t}{p e^t + 1 - p}$$

We leave it as an exercise to show that if $f$ is a normal density with parameters $\mu$ and $\sigma^2$ then $f_t$ is a normal density with mean $\mu + \sigma^2 t$ and variance $\sigma^2$. ■
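As a quick numerical check of the exponential case (an illustration of mine, not part of the text), we can sample from the original exponential density $f$, reweight by $e^{tx}/M(t)$, and confirm that the weighted sample mean matches the mean $1/(\lambda - t)$ of an exponential with rate $\lambda - t$; the values of $\lambda$ and $t$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
lam, t = 2.0, 0.5                  # assumed rate and tilt parameter, with t < lam
n = 200_000

x = rng.exponential(1.0 / lam, n)  # samples from f, exponential with rate lam
M_t = lam / (lam - t)              # moment generating function of f at t
weights = np.exp(t * x) / M_t      # f_t(x) / f(x)

# The weighted sample mean estimates the mean under the tilted density f_t,
# which by the example should equal 1 / (lam - t).
print("estimated mean under f_t:", np.average(x, weights=weights))
print("1 / (lam - t)           :", 1.0 / (lam - t))
```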

In certain situations the quantity of interest is the sum of the independent random variables $X_1, \ldots, X_n$. In this case the joint density $f$ is the product of one-dimensional densities. That is,

$$f(x_1, \ldots, x_n) = f_1(x_1) \cdots f_n(x_n)$$

where $f_i$ is the density function of $X_i$. In this situation it is often useful to generate the $X_i$ according to their tilted densities, with a common choice of $t$ employed.

Example 11.23

Let $X_1, \ldots, X_n$ be independent random variables having respective probability density (or mass) functions $f_i$, for $i = 1, \ldots, n$. Suppose we are interested in approximating the probability that their sum is at least as large as $a$, where $a$ is much larger than the mean of the sum. That is, we are interested in

$$\theta = P\{S \geq a\}$$

where $S = \sum_{i=1}^n X_i$, and where $a > \sum_{i=1}^n E[X_i]$. Letting $I\{S \geq a\}$ equal 1 if $S \geq a$ and letting it be 0 otherwise, we have that

$$\theta = E_{\mathbf{f}}\big[\,I\{S \geq a\}\,\big]$$

where $\mathbf{f} = (f_1, \ldots, f_n)$. Suppose now that we simulate $X_i$ according to the tilted mass function $f_{i,t}$, $i = 1, \ldots, n$, with the value of $t$, $t > 0$, left to be determined. The importance sampling estimator of $\theta$ would then be

$$\hat{\theta} = I\{S \geq a\} \prod_{i=1}^n \frac{f_i(X_i)}{f_{i,t}(X_i)}$$

Now,

$$\frac{f_i(X_i)}{f_{i,t}(X_i)} = M_i(t)\, e^{-t X_i}$$

and so

$$\hat{\theta} = I\{S \geq a\}\, M(t)\, e^{-tS}$$

where $M(t) = \prod_{i=1}^n M_i(t)$ is the moment generating function of $S$. Since $t > 0$ and $I\{S \geq a\}$ is equal to 0 when $S < a$, it follows that

$$I\{S \geq a\}\, e^{-tS} \leq e^{-ta}$$

and so

$$\hat{\theta} \leq M(t)\, e^{-ta}$$

To make the bound on the estimator as small as possible we thus choose $t$, $t > 0$, to minimize $M(t) e^{-ta}$. In doing so, we will obtain an estimator whose value on each iteration is between 0 and $\min_t M(t) e^{-ta}$. It can be shown that the minimizing $t$, call it $t^*$, is such that

$$E_{t^*}[S] = E_{t^*}\!\left[\sum_{i=1}^n X_i\right] = a$$

where, in the preceding, we mean that the expected value is to be taken under the assumption that the distribution of $X_i$ is $f_{i,t^*}$ for $i = 1, \ldots, n$.

For instance, suppose that $X_1, \ldots, X_n$ are independent Bernoulli random variables having respective parameters $p_i$, for $i = 1, \ldots, n$. Then, if we generate the $X_i$ according to their tilted mass functions $p_{i,t}$, $i = 1, \ldots, n$, the importance sampling estimator of $\theta = P\{S \geq a\}$ is

$$\hat{\theta} = I\{S \geq a\}\, e^{-tS} \prod_{i=1}^n \left(p_i e^t + 1 - p_i\right)$$

Since $p_{i,t}$ is the mass function of a Bernoulli random variable with parameter $p_i e^t/(p_i e^t + 1 - p_i)$, it follows that

$$E_t\!\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n \frac{p_i e^t}{p_i e^t + 1 - p_i}$$

The value of $t$ that makes the preceding equal to $a$ can be numerically approximated and then utilized in the simulation.

As an illustration, suppose that $n = 20$, $p_i = 0.4$, and $a = 16$. Then

$$E_t[S] = 20\, \frac{0.4\, e^t}{0.4\, e^t + 0.6}$$

Setting this equal to 16 yields, after a little algebra,

$$e^t = 6$$

Thus, if we generate the Bernoullis using the parameter

$$\frac{0.4\, e^t}{0.4\, e^t + 0.6} = 0.8$$

then because

$$M(t) = (0.4\, e^t + 0.6)^{20} \qquad \text{and} \qquad e^{-tS} = (1/6)^S$$

we see that the importance sampling estimator is

$$\hat{\theta} = I\{S \geq 16\}\, (1/6)^S\, 3^{20}$$

It follows from the preceding that

$$\hat{\theta} \leq (1/6)^{16}\, 3^{20} = 81/2^{16} = 0.001236$$

That is, on each iteration the value of the estimator is between 0 and 0.001236. Since, in this case, $\theta$ is the probability that a binomial random variable with parameters 20, 0.4 is at least 16, it can be explicitly computed with the result $\theta = 0.000317$. Hence, the raw simulation estimator $I$, which on each iteration takes the value 0 if the sum of the Bernoullis with parameter 0.4 is less than 16 and takes the value 1 otherwise, will have variance

$$\operatorname{Var}(I) = \theta(1 - \theta) = 3.169 \times 10^{-4}$$

On the other hand, it follows from the fact that $0 \leq \hat{\theta} \leq 0.001236$ that (see Exercise 33)

$$\operatorname{Var}(\hat{\theta}) \leq 2.9131 \times 10^{-7}$$

Example 11.24

Consider a single-server queue in which the times between successive customer arrivals have density function $f$ and the service times have density $g$. Let $D_n$ denote the amount of time that the $n$th arrival spends waiting in queue and suppose we are interested in estimating $\alpha = P\{D_n \geq a\}$ when $a$ is much larger than $E[D_n]$. Rather than generating the successive interarrival and service times according to $f$ and $g$, respectively, they should be generated according to the densities $f_{-t}$ and $g_t$, where $t$ is a positive number to be determined. Note that using these distributions as opposed to $f$ and $g$ will result in smaller interarrival times (since $-t < 0$) and larger service times. Hence, there will be a greater chance that $D_n > a$ than if we had simulated using the densities $f$ and $g$. The importance sampling estimator of $\alpha$ would then be

$$\hat{\alpha} = I\{D_n > a\}\, e^{t(S_n - Y_n)}\, \big[M_f(-t)\, M_g(t)\big]^n$$

where $S_n$ is the sum of the first $n$ interarrival times, $Y_n$ is the sum of the first $n$ service times, and $M_f$ and $M_g$ are the moment generating functions of the densities $f$ and $g$, respectively. The value of $t$ used should be determined by experimenting with a variety of different choices. ■
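To make this concrete, here is a sketch of mine assuming exponential interarrival times (rate $\lambda$) and exponential service times (rate $\mu$), so that $M_f(-t) = \lambda/(\lambda + t)$ and $M_g(t) = \mu/(\mu - t)$ for $0 < t < \mu$, and the tilted densities are again exponential with rates $\lambda + t$ and $\mu - t$; the parameter values and the choice of $t$ are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(6)
lam, mu = 1.0, 2.0          # assumed arrival rate (f) and service rate (g)
n, a = 25, 5.0              # estimate alpha = P{D_n >= a}
t = 0.5                     # tilt parameter, 0 < t < mu; tune by experimentation
n_runs = 20_000

# Moment generating functions of the exponential densities.
Mf_minus_t = lam / (lam + t)      # M_f(-t)
Mg_t = mu / (mu - t)              # M_g(t)

def one_run():
    """Simulate under the tilted densities and return the weighted indicator."""
    X = rng.exponential(1.0 / (lam + t), n)   # f_{-t}: exponential, rate lam + t
    S = rng.exponential(1.0 / (mu - t), n)    # g_t:   exponential, rate mu - t
    d = 0.0
    for k in range(n - 1):                    # Lindley: D_{k+1} = max(D_k + S_k - X_k, 0)
        d = max(d + S[k] - X[k], 0.0)
    weight = np.exp(t * (X.sum() - S.sum())) * (Mf_minus_t * Mg_t) ** n
    return (d >= a) * weight

estimates = np.array([one_run() for _ in range(n_runs)])
print("importance sampling estimate of P{D_n >= a}:", estimates.mean())
```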

11.7 Determining the Number of Runs

Suppose that we are going to use simulation to generate $r$ independent and identically distributed random variables $Y^{(1)}, \ldots, Y^{(r)}$ having mean $\mu$ and variance $\sigma^2$. We are then going to use

$$\bar{Y}_r = \frac{Y^{(1)} + \cdots + Y^{(r)}}{r}$$

as an estimate of $\mu$. The precision of this estimate can be measured by its variance

$$\operatorname{Var}(\bar{Y}_r) = E[(\bar{Y}_r - \mu)^2] = \sigma^2/r$$

Hence, we would want to choose $r$, the number of necessary runs, large enough so that $\sigma^2/r$ is acceptably small. However, the difficulty is that $\sigma^2$ is not known in advance. To get around this, you should initially simulate $k$ runs (where $k \geq 30$) and then use the simulated values $Y^{(1)}, \ldots, Y^{(k)}$ to estimate $\sigma^2$ by the sample variance

$$\sum_{i=1}^k \big(Y^{(i)} - \bar{Y}_k\big)^2 \big/ (k - 1)$$

Based on this estimate of $\sigma^2$ the value of $r$ that attains the desired level of precision can now be determined and an additional $r - k$ runs can be generated.
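A minimal sketch of this two-stage procedure, with an assumed target standard error and an arbitrary example run in which $Y$ is the maximum of ten random numbers:

```python
import math
import numpy as np

rng = np.random.default_rng(7)

def one_run():
    """One simulation run; here Y is the maximum of ten uniforms (illustrative only)."""
    return rng.random(10).max()

target_se = 0.001                       # desired standard error of the final estimate
k = 100                                 # pilot runs (at least about 30)

pilot = np.array([one_run() for _ in range(k)])
s2 = pilot.var(ddof=1)                  # sample-variance estimate of sigma^2

# Choose r so that sigma^2 / r is at most target_se^2, then do the extra runs.
r = max(k, math.ceil(s2 / target_se ** 2))
extra = np.array([one_run() for _ in range(r - k)])

all_runs = np.concatenate([pilot, extra])
print("total runs r:", r)
print("estimate of mu:", all_runs.mean())
print("estimated standard error:", all_runs.std(ddof=1) / math.sqrt(r))
```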

11.8 Generating from the Stationary Distribution of a Markov Chain

11.8.1 Coupling from the Past

Consider an irreducible Markov chain with states $1, \ldots, m$ and transition probabilities $P_{i,j}$, and suppose we want to generate the value of a random variable whose distribution is that of the stationary distribution of this Markov chain. Whereas we could approximately generate such a random variable by arbitrarily choosing an initial state, simulating the resulting Markov chain for a large fixed number of time periods, and then choosing the final state as the value of the random variable, we will now present a procedure that generates a random variable whose distribution is exactly that of the stationary distribution.

If, in theory, we generated the Markov chain starting at time $-\infty$ in any arbitrary state, then the state at time $0$ would have the stationary distribution. So imagine that we do this, and suppose that a different person is to generate the next state at each of these times. Thus, if $X(-n)$, the state at time $-n$, is $i$, then person $-n$ would generate a random variable that is equal to $j$ with probability $P_{i,j}$, $j = 1, \ldots, m$, and the value generated would be the state at time $-(n-1)$. Now suppose that person $-1$ wants to do his random variable generation early. Because he does not know what the state at time $-1$ will be, he generates a sequence of random variables $N_{-1}(i)$, $i = 1, \ldots, m$, where $N_{-1}(i)$, the next state if $X(-1) = i$, is equal to $j$ with probability $P_{i,j}$, $j = 1, \ldots, m$. If it results that $X(-1) = i$, then person $-1$ would report that the state at time $0$ is

$$S_{-1}(i) = N_{-1}(i), \qquad i = 1, \ldots, m$$

(That is, $S_{-1}(i)$ is the simulated state at time $0$ when the simulated state at time $-1$ is $i$.)

Now suppose that person $-2$, hearing that person $-1$ is doing his simulation early, decides to do the same thing. She generates a sequence of random variables $N_{-2}(i)$, $i = 1, \ldots, m$, where $N_{-2}(i)$ is equal to $j$ with probability $P_{i,j}$, $j = 1, \ldots, m$. Consequently, if it is reported to her that $X(-2) = i$, then she will report that $X(-1) = N_{-2}(i)$. Combining this with the early generation of person $-1$ shows that if $X(-2) = i$, then the simulated state at time $0$ is

$$S_{-2}(i) = S_{-1}(N_{-2}(i)), \qquad i = 1, \ldots, m$$

Continuing in the preceding manner, suppose that person $-3$ generates a sequence of random variables $N_{-3}(i)$, $i = 1, \ldots, m$, where $N_{-3}(i)$ is to be the generated value of the next state when $X(-3) = i$. Consequently, if $X(-3) = i$ then the simulated state at time $0$ would be

$$S_{-3}(i) = S_{-2}(N_{-3}(i)), \qquad i = 1, \ldots, m$$

Now suppose we continue the preceding, and so obtain the simulated functions

$$S_{-1}(i),\; S_{-2}(i),\; S_{-3}(i), \ldots, \qquad i = 1, \ldots, m$$

Going backward in time in this manner, we will at some time, say $-r$, have a simulated function $S_{-r}(i)$ that is a constant function. That is, for some state $j$, $S_{-r}(i)$ will equal $j$ for all states $i = 1, \ldots, m$. But this means that no matter what the simulated values from time $-\infty$ to $-r$, we can be certain that the simulated value at time $0$ is $j$. Consequently, $j$ can be taken as the value of a generated random variable whose distribution is exactly that of the stationary distribution of the Markov chain.
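Here is a minimal Python sketch of this coupling-from-the-past procedure for a small chain with an assumed transition matrix (states are labeled 0, 1, 2 in the code): each step back in time generates a fresh function $N_{-n}$ and composes it with the map already built, stopping once the composition is a constant function.

```python
import numpy as np

rng = np.random.default_rng(8)

# Assumed 3-state transition matrix (row i gives the probabilities P_{i,.}).
P = np.array([[0.2, 0.5, 0.3],
              [0.6, 0.2, 0.2],
              [0.3, 0.3, 0.4]])
m = P.shape[0]

def coupling_from_the_past():
    """Return one draw whose distribution is exactly the stationary distribution."""
    S = np.arange(m)               # S(i): simulated state at time 0 given state i at time -n
    while len(set(S)) > 1:         # stop when S is a constant function
        # Go one step further back: generate N(i) ~ P_{i,.} for every state i, then compose.
        N = np.array([rng.choice(m, p=P[i]) for i in range(m)])
        S = S[N]                   # new map: i -> S(N(i))
    return S[0]

draws = np.array([coupling_from_the_past() for _ in range(20_000)])
print("empirical frequencies:  ", np.bincount(draws, minlength=m) / len(draws))

# For comparison, the stationary distribution solved directly from pi P = pi.
A = np.vstack([P.T - np.eye(m), np.ones(m)])
b = np.append(np.zeros(m), 1.0)
print("stationary distribution:", np.linalg.lstsq(A, b, rcond=None)[0])
```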

Example 11.25

Consider a Markov chain with states 1, 2, 3 and suppose that simulation yielded the values

$$N_{-1}(i) = \begin{cases} 3, & \text{if } i = 1 \\ 2, & \text{if } i = 2 \\ 2, & \text{if } i = 3 \end{cases}$$

and

$$N_{-2}(i) = \begin{cases} 1, & \text{if } i = 1 \\ 3, & \text{if } i = 2 \\ 1, & \text{if } i = 3 \end{cases}$$

Then

$$S_{-2}(i) = \begin{cases} 3, & \text{if } i = 1 \\ 2, & \text{if } i = 2 \\ 3, & \text{if } i = 3 \end{cases}$$

If

$$N_{-3}(i) = \begin{cases} 3, & \text{if } i = 1 \\ 1, & \text{if } i = 2 \\ 1, & \text{if } i = 3 \end{cases}$$

then

$$S_{-3}(i) = \begin{cases} 3, & \text{if } i = 1 \\ 3, & \text{if } i = 2 \\ 3, & \text{if } i = 3 \end{cases}$$

Therefore, no matter what the state is at time $-3$, the state at time $0$ will be $3$. ■

Remark

The procedure developed in this section for generating a random variable whose distribution is the stationary distribution of the Markov chain is called coupling from the past.

11.8.2 Another Approach

Consider a Markov chain whose state space is the nonnegative integers. Suppose the chain has stationary probabilities, and denote them by $\pi_i$, $i \geq 0$. We now present another way of simulating a random variable whose distribution is given by the $\pi_i$, $i \geq 0$, which can be utilized if the chain satisfies the following property. Namely, that for some state, which we will call state 0, and some positive number $\alpha$,

$$P_{i,0} \geq \alpha > 0$$

for all states $i$. That is, whatever the current state, the probability that the next state will be 0 is at least some positive value $\alpha$.

To simulate a random variable distributed according to the stationary probabilities, start by simulating the Markov chain in the obvious manner. Namely, whenever the chain is in state $i$, generate a random variable that is equal to $j$ with probability $P_{i,j}$, $j \geq 0$, and then set the next state equal to the generated value of this random variable. In addition, however, whenever a transition into state 0 occurs a coin, whose probability of coming up heads depends on the state from which the transition occurred, is flipped. Specifically, if the transition into state 0 was from state $i$, then the coin flipped has probability $\alpha/P_{i,0}$ of coming up heads. Call such a coin an $i$-coin, $i \geq 0$. If the coin comes up heads then we say that an event has occurred. Consequently, each transition of the Markov chain results in an event with probability $\alpha$, implying that events occur at rate $\alpha$. Now say that an event is an $i$-event if it resulted from a transition out of state $i$; that is, an event is an $i$-event if it resulted from the flip of an $i$-coin. Because $\pi_i$ is the proportion of transitions that are out of state $i$, and each such transition will result in an $i$-event with probability $\alpha$, it follows that the rate at which $i$-events occur is $\alpha \pi_i$. Therefore, the proportion of all events that are $i$-events is $\alpha \pi_i/\alpha = \pi_i$, $i \geq 0$.

Now, suppose that $X_0 = 0$. Fix $i$, and let $I_j$ equal 1 if the $j$th event that occurs is an $i$-event, and let $I_j$ equal 0 otherwise. Because an event always leaves the chain in state 0, it follows that $I_j$, $j \geq 1$, are independent and identically distributed random variables. Because the proportion of the $I_j$ that are equal to 1 is $\pi_i$, we see that

$$\pi_i = \lim_{n \to \infty} \frac{I_1 + \cdots + I_n}{n} = E[I_1] = P(I_1 = 1)$$

where the second equality follows from the strong law of large numbers. Hence, if we let

$$T = \min\{n > 0 : \text{an event occurs at time } n\}$$

denote the time of the first event, then it follows from the preceding that

$$\pi_i = P(I_1 = 1) = P(X_{T-1} = i)$$

As the preceding is true for all states $i$, it follows that $X_{T-1}$, the state of the Markov chain at time $T-1$, has the stationary distribution.
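A small Python sketch of this second procedure, again with an assumed transition matrix in which every row reaches state 0 with probability at least $\alpha$: whenever the chain moves into state 0 from state $i$, a coin with heads probability $\alpha/P_{i,0}$ is flipped, and the state occupied just before the first heads is returned.

```python
import numpy as np

rng = np.random.default_rng(9)

# Assumed transition matrix; every entry of column 0 is at least alpha.
P = np.array([[0.3, 0.4, 0.3],
              [0.2, 0.5, 0.3],
              [0.5, 0.1, 0.4]])
alpha = P[:, 0].min()          # alpha = 0.2 for this matrix
m = P.shape[0]

def stationary_draw():
    """Run the chain from state 0 until the first 'event'; return the prior state."""
    state = 0
    while True:
        prev = state
        state = rng.choice(m, p=P[prev])
        # A transition into state 0 from state `prev` triggers a coin flip
        # with heads probability alpha / P_{prev,0}; heads means an event occurred.
        if state == 0 and rng.random() < alpha / P[prev, 0]:
            return prev        # X_{T-1}: the state from which the event transition was made

draws = np.array([stationary_draw() for _ in range(20_000)])
print("empirical frequencies:  ", np.bincount(draws, minlength=m) / len(draws))

# Stationary distribution computed directly, for comparison.
A = np.vstack([P.T - np.eye(m), np.ones(m)])
b = np.append(np.zeros(m), 1.0)
print("stationary distribution:", np.linalg.lstsq(A, b, rcond=None)[0])
```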

Exercises

*1. Suppose it is relatively easy to simulate from the distributions $F_i$, $i = 1, 2, \ldots, n$. If $n$ is small, how can we simulate from

$$F(x) = \sum_{i=1}^n P_i F_i(x), \qquad P_i \geq 0, \quad \sum_i P_i = 1?$$


Give a method for simulating from

$$F(x) = \begin{cases} \dfrac{1 - e^{-2x} + 2x}{3}, & 0 < x < 1 \\[2mm] \dfrac{3 - e^{-2x}}{3}, & 1 < x < \infty \end{cases}$$

2. Give a method for simulating a negative binomial random variable.

*3. Give a method for simulating a hypergeometric random variable.

4. Suppose we want to simulate a point located at random in a circle of radius $r$ centered at the origin. That is, we want to simulate $X, Y$ having joint density

$$f(x, y) = \frac{1}{\pi r^2}, \qquad x^2 + y^2 \leq r^2$$

(a) Let $R = \sqrt{X^2 + Y^2}$, $\theta = \tan^{-1}(Y/X)$ denote the polar coordinates. Compute the joint density of $R, \theta$ and use this to give a simulation method. Another method for simulating $X, Y$ is as follows:

Step 1: Generate independent random numbers $U_1, U_2$ and set $Z_1 = 2rU_1 - r$, $Z_2 = 2rU_2 - r$. Then $Z_1, Z_2$ is uniform in the square whose sides are of length $2r$ and which encloses the circle of radius $r$ (see Figure 11.6).

Step 2: If $(Z_1, Z_2)$ lies in the circle of radius $r$—that is, if $Z_1^2 + Z_2^2 \leq r^2$—set $(X, Y) = (Z_1, Z_2)$. Otherwise return to step 1.

(b) Prove that this method works, and compute the distribution of the number of random numbers it requires.

5. Suppose it is relatively easy to simulate from $F_i$ for each $i = 1, \ldots, n$. How can we simulate from

(a) $F(x) = \prod_{i=1}^n F_i(x)$?

(b) $F(x) = 1 - \prod_{i=1}^n (1 - F_i(x))$?

(c) Give two methods for simulating from the distribution $F(x) = x^n$, $0 < x < 1$.

*6. In Example 11.4 we simulated the absolute value of a standard normal by using the Von Neumann rejection procedure on exponential random variables with rate 1. This raises the question of whether we could obtain a more efficient algorithm by using a different exponential density—that is, we could use the density $g(x) = \lambda e^{-\lambda x}$. Show that the mean number of iterations needed in the rejection scheme is minimized when $\lambda = 1$.

7. Give an algorithm for simulating a random variable having density function

$$f(x) = 30(x^2 - 2x^3 + x^4), \qquad 0 < x < 1$$

8. Consider the technique of simulating a gamma $(n, \lambda)$ random variable by using the rejection method with $g$ being an exponential density with rate $\lambda/n$.

(a) Show that the average number of iterations of the algorithm needed to generate a gamma is $n^n e^{1-n}/(n-1)!$.

(b) Use Stirling's approximation to show that for large $n$ the answer to part (a) is approximately equal to $e[(n-1)/(2\pi)]^{1/2}$.

(c) Show that the procedure is equivalent to the following:

Step 1: Generate $Y_1$ and $Y_2$, independent exponentials with rate 1.

Step 2: If $Y_1 < (n-1)[Y_2 - \log(Y_2) - 1]$, return to step 1.

Step 3: Set $X = nY_2/\lambda$.

(d) Explain how to obtain an independent exponential along with a gamma from the preceding algorithm.

9. Set up the alias method for simulating from a binomial random variable with parameters $n = 6$, $p = 0.4$.

10. Explain how we can number the $\mathbf{Q}^{(k)}$ in the alias method so that $k$ is one of the two points that $\mathbf{Q}^{(k)}$ gives weight.
Hint: Rather than giving the initial $\mathbf{Q}$ the name $\mathbf{Q}^{(1)}$, what else could we call it?

11. Complete the details of Example 11.10.

12. Let $X_1, \ldots, X_k$ be independent with

$$P\{X_i = j\} = \frac{1}{n}, \qquad j = 1, \ldots, n, \quad i = 1, \ldots, k$$


If $D$ is the number of distinct values among $X_1, \ldots, X_k$ show that

$$E[D] = n\left[1 - \left(\frac{n-1}{n}\right)^{\!k}\right] \approx k - \frac{k^2}{2n} \qquad \text{when } \frac{k^2}{n} \text{ is small}$$

13. The Discrete Rejection Method: Suppose we want to simulate $X$ having probability mass function $P\{X = i\} = P_i$, $i = 1, \ldots, n$, and suppose we can easily simulate from the probability mass function $Q_i$, $\sum_i Q_i = 1$, $Q_i \geq 0$. Let $C$ be such that $P_i \leq C Q_i$, $i = 1, \ldots, n$. Show that the following algorithm generates the desired random variable:

Step 1: Generate $Y$ having mass function $Q$ and $U$ an independent random number.

Step 2: If $U \leq P_Y/CQ_Y$, set $X = Y$. Otherwise return to step 1.

14. The Discrete Hazard Rate Method: Let $X$ denote a nonnegative integer valued random variable. The function $\lambda(n) = P\{X = n \mid X \geq n\}$, $n \geq 0$, is called the discrete hazard rate function.

(a) Show that $P\{X = n\} = \lambda(n) \prod_{i=0}^{n-1} (1 - \lambda(i))$.

(b) Show that we can simulate $X$ by generating random numbers $U_1, U_2, \ldots$, stopping at

$$X = \min\{n : U_n \leq \lambda(n)\}$$

(c) Apply this method to simulating a geometric random variable. Explain, intuitively, why it works.

(d) Suppose that $\lambda(n) \leq p < 1$ for all $n$. Consider the following algorithm for simulating $X$ and explain why it works: Simulate $X_i, U_i$, $i \geq 1$, where $X_i$ is geometric with mean $1/p$ and $U_i$ is a random number. Set $S_k = X_1 + \cdots + X_k$ and let

$$X = \min\{S_k : U_k \leq \lambda(S_k)/p\}$$

15. Suppose you have just simulated a normal random variable $X$ with mean $\mu$ and variance $\sigma^2$. Give an easy way to generate a second normal variable with the same mean and variance that is negatively correlated with $X$.

*16. Suppose $n$ balls having weights $w_1, w_2, \ldots, w_n$ are in an urn. These balls are sequentially removed in the following manner: At each selection, a given ball in the urn is chosen with a probability equal to its weight divided by the sum of the weights of the other balls that are still in the urn. Let $I_1, I_2, \ldots, I_n$ denote the order in which the balls are removed—thus $I_1, \ldots, I_n$ is a random permutation with weights.

(a) Give a method for simulating $I_1, \ldots, I_n$.

(b) Let $X_i$ be independent exponentials with rates $w_i$, $i = 1, \ldots, n$. Explain how the $X_i$ can be utilized to simulate $I_1, \ldots, I_n$.

17. Order Statistics: Let $X_1, \ldots, X_n$ be i.i.d. from a continuous distribution $F$, and let $X_{(i)}$ denote the $i$th smallest of $X_1, \ldots, X_n$, $i = 1, \ldots, n$. Suppose we want to simulate $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$. One approach is to simulate $n$ values from $F$, and then order these values. However, this ordering, or sorting, can be time consuming when $n$ is large.

(a) Suppose that $\lambda(t)$, the hazard rate function of $F$, is bounded. Show how the hazard rate method can be applied to generate the $n$ variables in such a manner that no sorting is necessary.

Suppose now that $F^{-1}$ is easily computed.

(b) Argue that $X_{(1)}, \ldots, X_{(n)}$ can be generated by simulating $U_{(1)} < U_{(2)} < \cdots < U_{(n)}$—the ordered values of $n$ independent random numbers—and then setting $X_{(i)} = F^{-1}(U_{(i)})$. Explain why this means that $X_{(i)}$ can be generated from $F^{-1}(\beta_i)$ where $\beta_i$ is beta with parameters $i, n+1-i$.

(c) Argue that $U_{(1)}, \ldots, U_{(n)}$ can be generated, without any need for sorting, by simulating i.i.d. exponentials $Y_1, \ldots, Y_{n+1}$ and then setting

$$U_{(i)} = \frac{Y_1 + \cdots + Y_i}{Y_1 + \cdots + Y_{n+1}}, \qquad i = 1, \ldots, n$$


Hint: Given the time of the $(n+1)$st event of a Poisson process, what can be said about the set of times of the first $n$ events?

(d) Show that if $U_{(n)} = y$ then $U_{(1)}, \ldots, U_{(n-1)}$ has the same joint distribution as the order statistics of a set of $n-1$ uniform $(0, y)$ random variables.

(e) Use part (d) to show that $U_{(1)}, \ldots, U_{(n)}$ can be generated as follows:

Step 1: Generate random numbers $U_1, \ldots, U_n$.

Step 2: Set

$$U_{(n)} = U_1^{1/n}, \qquad U_{(n-1)} = U_{(n)} (U_2)^{1/(n-1)}, \qquad U_{(j-1)} = U_{(j)} (U_{n-j+2})^{1/(j-1)}, \quad j = 2, \ldots, n-1$$

18. Let $X_1, \ldots, X_n$ be independent exponential random variables each having rate 1. Set

$$W_1 = X_1/n, \qquad W_i = W_{i-1} + \frac{X_i}{n - i + 1}, \quad i = 2, \ldots, n$$


Explain why $W_1, \ldots, W_n$ has the same joint distribution as the order statistics of a sample of $n$ exponentials each having rate 1.

19. Suppose we want to simulate a large number $n$ of independent exponentials with rate 1—call them $X_1, X_2, \ldots, X_n$. If we were to employ the inverse transform technique we would require one logarithmic computation for each exponential generated. One way to avoid this is to first simulate $S_n$, a gamma random variable with parameters $(n, 1)$ (say, by the method of Section 11.3.3). Now interpret $S_n$ as the time of the $n$th event of a Poisson process with rate 1 and use the result that given $S_n$ the set of the first $n-1$ event times is distributed as the set of $n-1$ independent uniform $(0, S_n)$ random variables. Based on this, explain why the following algorithm simulates $n$ independent exponentials:

Step 1: Generate $S_n$, a gamma random variable with parameters $(n, 1)$.

Step 2: Generate $n-1$ random numbers $U_1, U_2, \ldots, U_{n-1}$.

Step 3: Order the $U_i$, $i = 1, \ldots, n-1$, to obtain $U_{(1)} < U_{(2)} < \cdots < U_{(n-1)}$.

Step 4: Let $U_{(0)} = 0$, $U_{(n)} = 1$, and set $X_i = S_n (U_{(i)} - U_{(i-1)})$, $i = 1, \ldots, n$.

When the ordering (step 3) is performed according to the algorithm described in Section 11.5, the preceding is an efficient method for simulating $n$ exponentials when all $n$ are simultaneously required. If memory space is limited, however, and the exponentials can be employed sequentially, discarding each exponential from memory once it has been used, then the preceding may not be appropriate.

20. Consider the following procedure for randomly choosing a subset of size $k$ from the numbers $1, 2, \ldots, n$: Fix $p$ and generate the first $n$ time units of a renewal process whose interarrival distribution is geometric with mean $1/p$—that is, $P\{\text{interarrival time} = k\} = p(1-p)^{k-1}$, $k = 1, 2, \ldots$. Suppose events occur at times $i_1 < i_2 < \cdots < i_m \leq n$. If $m = k$, stop; $i_1, \ldots, i_m$ is the desired set. If $m > k$, then randomly choose (by some method) a subset of size $k$ from $i_1, \ldots, i_m$ and then stop. If $m < k$, take $i_1, \ldots, i_m$ as part of the subset of size $k$ and then select (by some method) a random subset of size $k - m$ from the set $\{1, 2, \ldots, n\} - \{i_1, \ldots, i_m\}$. Explain why this algorithm works. As $E[N(n)] = np$, a reasonable choice of $p$ is to take $p \approx k/n$. (This approach is due to Dieter.)

21. Consider the following algorithm for generating a random permutation of the elements $1, 2, \ldots, n$. In this algorithm, $P(i)$ can be interpreted as the element in position $i$.

Step 1: Set $k = 1$.

Step 2: Set $P(1) = 1$.

Step 3: If $k = n$, stop. Otherwise, let $k = k + 1$.

Step 4: Generate a random number $U$, and let

$$P(k) = P([kU] + 1), \qquad P([kU] + 1) = k.$$
Go to step 3.

(a) Explain in words what the algorithm is doing.

(b) Show that at iteration $k$—that is, when the value of $P(k)$ is initially set—$P(1), P(2), \ldots, P(k)$ is a random permutation of $1, 2, \ldots, k$.


Hint: Use induction and argue that

$$P_k\{i_1, i_2, \ldots, i_{j-1}, k, i_j, \ldots, i_{k-2}, i\} = P_{k-1}\{i_1, i_2, \ldots, i_{j-1}, i, i_j, \ldots, i_{k-2}\}\, \frac{1}{k} = \frac{1}{k!} \qquad \text{by the induction hypothesis}$$

The preceding algorithm can be used even if $n$ is not initially known.

22. Verify that if we use the hazard rate approach to simulate the event times of a nonhomogeneous Poisson process whose intensity function $\lambda(t)$ is such that $\lambda(t) \leq \lambda$, then we end up with the approach given in method 1 of Section 11.5.

*23. For a nonhomogeneous Poisson process with intensity function $\lambda(t)$, $t \geq 0$, where $\int_0^\infty \lambda(t)\, dt = \infty$, let $X_1, X_2, \ldots$ denote the sequence of times at which events occur.

(a) Show that $\int_0^{X_1} \lambda(t)\, dt$ is exponential with rate 1.

(b) Show that $\int_{X_{i-1}}^{X_i} \lambda(t)\, dt$, $i \geq 1$, are independent exponentials with rate 1, where $X_0 = 0$.

In words, independent of the past, the additional amount of hazard that must be experienced until an event occurs is exponential with rate 1.

24. Give an efficient method for simulating a nonhomogeneous Poisson process with intensity function

$$\lambda(t) = b + \frac{1}{t + a}, \qquad t \geq 0$$

25. Let $(X, Y)$ be uniformly distributed in a circle of radius $r$ about the origin. That is, their joint density is given by

$$f(x, y) = \frac{1}{\pi r^2}, \qquad 0 \leq x^2 + y^2 \leq r^2$$

Let $R = \sqrt{X^2 + Y^2}$ and $\theta = \arctan(Y/X)$ denote their polar coordinates. Show that $R$ and $\theta$ are independent with $\theta$ being uniform on $(0, 2\pi)$ and $P\{R < a\} = a^2/r^2$, $0 < a < r$.

26. Let $R$ denote a region in the two-dimensional plane. Show that for a two-dimensional Poisson process, given that there are $n$ points located in $R$, the points are independently and uniformly distributed in $R$—that is, their density is $f(x, y) = c$, $(x, y) \in R$, where $c$ is the inverse of the area of $R$.

27. Let $X_1, \ldots, X_n$ be independent random variables with $E[X_i] = \theta$, $\operatorname{Var}(X_i) = \sigma_i^2$, $i = 1, \ldots, n$, and consider estimates of $\theta$ of the form $\sum_{i=1}^n \lambda_i X_i$ where $\sum_{i=1}^n \lambda_i = 1$. Show that $\operatorname{Var}\left(\sum_{i=1}^n \lambda_i X_i\right)$ is minimized when

$$\lambda_i = \frac{1/\sigma_i^2}{\sum_{j=1}^n 1/\sigma_j^2}, \qquad i = 1, \ldots, n.$$


Possible Hint: If you cannot do this for general $n$, try it first when $n = 2$.
The following two problems are concerned with the estimation of $\int_0^1 g(x)\, dx = E[g(U)]$ where $U$ is uniform $(0, 1)$.

28. The Hit–Miss Method: Suppose $g$ is bounded in $[0, 1]$—for instance, suppose $0 \leq g(x) \leq b$ for $x \in [0, 1]$. Let $U_1, U_2$ be independent random numbers and set $X = U_1$, $Y = bU_2$—so the point $(X, Y)$ is uniformly distributed in a rectangle of length 1 and height $b$. Now set

$$I = \begin{cases} 1, & \text{if } Y < g(X) \\ 0, & \text{otherwise} \end{cases}$$


That is, accept $(X, Y)$ if it falls in the shaded area of Figure 11.7.

(a) Show that $E[bI] = \int_0^1 g(x)\, dx$.

(b) Show that $\operatorname{Var}(bI) \geq \operatorname{Var}(g(U))$, and so hit–miss has larger variance than simply computing $g$ of a random number.

29. Stratified Sampling: Let $U_1, \ldots, U_n$ be independent random numbers and set $\bar{U}_i = (U_i + i - 1)/n$, $i = 1, \ldots, n$. Hence, $\bar{U}_i$, $i \geq 1$, is uniform on $((i-1)/n, i/n)$. The quantity $\sum_{i=1}^n g(\bar{U}_i)/n$ is called the stratified sampling estimator of $\int_0^1 g(x)\, dx$.

(a) Show that $E\left[\sum_{i=1}^n g(\bar{U}_i)/n\right] = \int_0^1 g(x)\, dx$.

(b) Show that $\operatorname{Var}\left[\sum_{i=1}^n g(\bar{U}_i)/n\right] \leq \operatorname{Var}\left[\sum_{i=1}^n g(U_i)/n\right]$.


Hint: Let $U$ be uniform $(0, 1)$ and define $N$ by $N = i$ if $(i-1)/n < U < i/n$, $i = 1, \ldots, n$. Now use the conditional variance formula to obtain

$$\operatorname{Var}(g(U)) = E[\operatorname{Var}(g(U) \mid N)] + \operatorname{Var}(E[g(U) \mid N]) \geq E[\operatorname{Var}(g(U) \mid N)] = \sum_{i=1}^n \frac{\operatorname{Var}(g(U) \mid N = i)}{n} = \sum_{i=1}^n \frac{\operatorname{Var}[g(\bar{U}_i)]}{n}$$

30. If $f$ is the density function of a normal random variable with mean $\mu$ and variance $\sigma^2$, show that the tilted density $f_t$ is the density of a normal random variable with mean $\mu + \sigma^2 t$ and variance $\sigma^2$.

31. Consider a queueing system in which each service time, independent of the past, has mean $\mu$. Let $W_n$ and $D_n$ denote, respectively, the amounts of time customer $n$ spends in the system and in queue. Hence, $D_n = W_n - S_n$ where $S_n$ is the service time of customer $n$. Therefore,

$$E[D_n] = E[W_n] - \mu$$

If we use simulation to estimate $E[D_n]$, should we

(a) use the simulated data to determine $D_n$, which is then used as an estimate of $E[D_n]$; or

(b) use the simulated data to determine $W_n$ and then use this quantity minus $\mu$ as an estimate of $E[D_n]$?

Repeat for when we want to estimate $E[W_n]$.

*32. Show that if $X$ and $Y$ have the same distribution then

$$\operatorname{Var}((X + Y)/2) \leq \operatorname{Var}(X)$$


Hence, conclude that the use of antithetic variables can never increase variance (though it need not be as efficient as generating an independent set of random numbers).

33. If $0 \leq X \leq a$, show that

(a) $E[X^2] \leq a E[X]$,

(b) $\operatorname{Var}(X) \leq E[X](a - E[X])$,

(c) $\operatorname{Var}(X) \leq a^2/4$.

34. Suppose in Example 11.19 that no new customers are allowed in the system after time $t_0$. Give an efficient simulation estimator of the expected additional time after $t_0$ until the system becomes empty.

35. Suppose we are able to simulate independent random variables $X$ and $Y$. If we simulate $2k$ independent random variables $X_1, \ldots, X_k$ and $Y_1, \ldots, Y_k$, where the $X_i$ have the same distribution as does $X$, and the $Y_j$ have the same distribution as does $Y$, how would you use them to estimate $P(X < Y)$?

36. If $U_1, U_2, U_3$ are independent uniform $(0, 1)$ random variables, find $P\left\{\prod_{i=1}^3 U_i > 0.1\right\}$.

Hint: Relate the desired probability to one about a Poisson process.

Figure 11.6

Figure 11.7