11
Monte Carlo Calculations in Problems of Mathematical Physics
STANISLAW M. ULAM
RESEARCH ADVISOR
LOS ALAMOS SCIENTIFIC LABORATORY
11.1 Introduction
In this chapter we shall outline the general idea and summarize certain applications of the procedures constituting the so-called Monte Carlo method, and we shall discuss briefly some mathematical problems connected with these procedures. The Monte Carlo method is used in problems for which explicit solutions cannot be obtained by methods of classical analysis and also in cases such that the numerical work in solving the corresponding differential or integrodifferential equations would take a prohibitively long time even on the fast modern electronic computing machines. The chapter by George W. Brown, Monte Carlo Methods, in the first volume of “Modern Mathematics for the Engineer,”1 gives an excellent description of the scope of this approach and of the basic statistical concepts. Some of the different applications will be mentioned here, and the role of branching processes will be discussed by means of examples.
The development of electronic computers has made it possible to perform statistical experiments, not with the physical situations in question, but rather—so to say—“on paper,” mocking up the physical problem rather directly on the computing machines themselves. Viewed rather broadly, when the problem involves many (often more than 2!) independent variables or when the combinatorial complexities and the geometrical complications of the problem seem to preclude obtaining solutions in closed form, the set of procedures that came to be called the Monte Carlo method may often be used advantageously.
The idea of this approach may be illustrated, in a very simple case, by the following example:
Suppose a region R is defined in the n-dimensional euclidean space by the inequalities

$$F_i(x_1, x_2, \ldots, x_n) \le 0 \qquad i = 1, 2, \ldots, m \tag{11.1}$$
The problem is to find the volume of the region R.
Suppose that R is contained in the unit cube and that we know a priori that this volume is not very small compared with 1. If the computation of the definite integral cannot be performed explicitly, an idea of the volume of R can be obtained, actually with arbitrary accuracy, from numerical work corresponding to the definition of the integral. That is to say, the unit cube is subdivided into small cubes, and by counting the number of those cubes that lie in the region R we obtain an approximation to its volume. If n is, say, 10, by subdividing each axis into r intervals we obtain $r^{10}$ small cubes. The proportion of those among them that lie wholly in R gives us an approximation to the volume. Even if the functions $F_i$ are analytically well behaved, in order to obtain an answer to our question within 1 per cent or so, we have to subdivide each axis into some 10 to 100 subintervals. Assume that r is of the order of 100 and that the number of subcubes is therefore $100^{10}$. It is obviously impractical to consider all of these. Instead of employing the foregoing "deterministic" procedure, we could do the following: Suppose we select N points at random in the unit cube in our ten-dimensional space and count among them those that belong to R, that is to say, those points having coordinates that satisfy all the given inequalities (11.1). Suppose we know in advance that the volume of R is of the order of 1, i.e., that it is about ½ or so. Then if N is of the order of $100^2$, that is, 10,000, we could hope that the fraction of the number of points lying in R will give us an idea of the value of the volume that should differ from the true value by less than 1 per cent or so. This is due to the fact that we have here a situation similar to the simple "Bernoulli case" problem in the elementary theory of probabilities. If the true value of the volume is v, we perform an experiment N times by selecting a point in the unit cube, with the probability v that it will lie in R. The theorem of Bernoulli states that, as N → ∞, the probability that the frequency of successes will differ from v by less than any preassigned amount tends to 1. What is more, we can estimate, for a finite N, the mean-square error of the difference between the frequency and v. If N is of the order of 10,000, the probability is close to 1 that this error will not surpass a small percentage. By making N several times 10,000, we can, with high probability, obtain an answer to our question within 1 per cent. The number N of experiments is obviously much smaller than the above number $100^{10}$ of subcubes. In this way, by performing a stochastic experiment, we obtain a much quicker idea of the true volume—to be sure not with certainty, but only with high probability.
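The procedure just described is easily sketched in modern terms. The following Python fragment is a minimal illustration, not the computation of the text: the two inequality functions standing for the $F_i$ of (11.1) are hypothetical, chosen only so that the example runs.

```python
import random

def estimate_volume(inequalities, n_dims, n_points, seed=12345):
    """Monte Carlo estimate of the volume of a region R inside the unit
    cube, R being given by functions F_i with F_i(x) <= 0 (cf. (11.1))."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_points):
        x = [rng.random() for _ in range(n_dims)]   # a random point of the cube
        if all(F(x) <= 0.0 for F in inequalities):  # does it satisfy (11.1)?
            hits += 1
    return hits / n_points

# Two hypothetical inequalities in 10 dimensions (an ellipsoidal bound
# and a half-space); their intersection has no simple closed-form volume.
Fs = [
    lambda x: sum((xi - 0.5) ** 2 for xi in x) - 2.0,
    lambda x: x[0] + x[1] - 1.6,
]
print(estimate_volume(Fs, n_dims=10, n_points=10_000))
```

With N = 10,000 points the standard deviation of such an estimate is $\sqrt{v(1 - v)/N}$, about 0.005 when v is near ½, in agreement with the 1 per cent figure above.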
It remains for us to explain how one selects points "at random," with uniform probability, in a unit cube. It is immediately apparent that it suffices to be able to select randomly, with uniform probability, real numbers between 0 and 1, independent of each other in succession. Ten such numbers will define a point in the unit cube. Therefore, in order to obtain 10,000 random points in space of 10 dimensions, it suffices to produce 100,000 real numbers between 0 and 1. We shall discuss briefly how one can, on a computing machine—by a mathematical operation—produce such sequences of real numbers possessing certain properties of randomness. One could think of producing such real numbers by a physical process corresponding somehow to "tossing a coin." Indeed, one could use counts of radioactive disintegrations in electronic tubes, etc., but the problem is to produce these numbers very quickly without any auxiliary physical equipment in the computing machine itself. Many methods have been devised for achieving this aim. One of them is to start with a suitably selected real number written in the binary expansion as

$$x_1 = 0.\alpha_1 \alpha_2 \cdots \alpha_{40}$$

where $\alpha_i = 0$ or 1. To obtain $x_2$, we square $x_1$. This number will have 80 binary digits, and we select the middle 40 of them. That is to say, the α's in $x_1^2$ having indices from 21 to 60 are used to define $x_2$. By iteration of this procedure we obtain $x_3$ from $x_2$, and so on. Obviously, the sequence of numbers thus obtained will not be truly random, since for one thing it has to start repeating, the process being a finite one if we have only a finite number—40 in our case—of digits; however, very long sequences of the $x_i$, of the order of 1,000,000 terms or so, can be obtained. The properties of uniformity of position in the unit interval, the lack of correlation between each term and its immediate successors, and so on, can be tested. Once this is done ahead of time, these numbers can be used very quickly, since each step requires only a multiplication and a shift on the machine to produce our "random" sequences.
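A sketch of this "middle-square" rule in Python, using 40 binary digits as in the text; the seed below is arbitrary, and, as remarked above, a poorly chosen one may make the sequence degenerate quickly.

```python
MASK40 = (1 << 40) - 1        # keeps the low 40 bits of an integer

def middle_square(x):
    """One step of the rule: square the 40-bit number x (giving up to
    80 binary digits) and keep the middle 40 of them."""
    return (x * x >> 20) & MASK40

def random_fractions(seed, n):
    """n successive 'random' reals in (0,1) from the middle squares."""
    x = seed & MASK40
    out = []
    for _ in range(n):
        x = middle_square(x)
        out.append(x / float(1 << 40))  # rescale the 40 bits to (0,1)
    return out

print(random_fractions(0x5DEECE66D, 5))   # an arbitrary starting value
```

Before such a sequence is trusted, tests of uniformity and of the independence of successive terms, of the kind mentioned above, must of course be applied.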
There are other methods for obtaining such sequences. An account of these is given, for example, in the article by O. Taussky and J. Todd.13
11.2 A Combinatorial Problem
To give an example of how a sampling method may be of value in obtaining deterministically defined quantities by a random procedure, we shall discuss the following combinatorial problem.
A well-known theorem asserts that, given $n^2 + 1$ integers in any order, it is always possible to find among them a subsequence of $n + 1$ that is either increasing or decreasing. So, for example, with n = 10, if we have an arbitrary permutation of 101 integers, it is possible to find among them a subsequence of 11 in increasing order or 11 in decreasing order. Suppose we want to find, however, not the minimum length but rather, considering all possible permutations, the average length of the maximal increasing or decreasing subsequences in the permutations. The average is meant with respect to all possible permutations of the $n^2 + 1$ integers. If n = 10, the number of permutations is 101!, and it is obviously impossible to examine them all. Imagine now that we examine only a few thousand out of the $(n^2 + 1)!$ permutations, these few thousand being chosen "at random" with uniform probability among all possible ones. The average computed over these should give us a fair idea of the true value of the average. An electronic computing machine like the IBM 704 will process several thousand permutations of the 101 integers in a few minutes.
We should explain here how one may obtain random permutations of N integers. One way might be the following: To obtain a "random" permutation of N integers, we produce a sequence of N random real numbers on the interval (0,1). If we write 1 at the position determined by the smallest among them, 2 at the position occupied by the next smallest, and so on, finally writing N at the position occupied by the largest, we obtain one permutation of the N integers. To produce K permutations, we perform this process K times. For various values of n, E. Neighbor examined the mean length of the maximal increasing or decreasing subsequence (using a few thousand permutations for each case), starting with n = 4 and going to n = 10. The theorem guarantees that the minimum length of a monotone subsequence for n = 4 is 5 (for a permutation of 17 integers). The average length of a maximal monotone subsequence in a permutation of 17 integers turned out for his sample to be 6.69. For n = 5 the average was 8.46, for n = 8 it was 14.02, and for n = 10 it came out 17.85. The averages formed, to good approximation, a linear function of n, about 1.7 times n. Another question of interest would be to find the distribution of the length of the maximal monotone subsequence around this average. It turned out to have a Gaussian form, starting at the guaranteed minimum, having its maximum at the average, and becoming vanishingly small at about 2.2 times the minimum. We mention this example to show how, in problems of combinatorics, a statistical approach may help to obtain an idea of regularities inherent in complicated situations. We shall mention later a Monte Carlo study of some problems of statistical mechanics.
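The experiment is easy to repeat on a modern machine. The sketch below builds random permutations by exactly the ranking device just described and measures maximal monotone subsequences by the standard "patience" method; the number of trials is illustrative.

```python
import bisect
import random

def random_permutation(n, rng):
    """A random permutation of 1..n by the ranking trick of the text:
    draw n reals in (0,1) and replace each by its rank."""
    reals = [rng.random() for _ in range(n)]
    order = sorted(range(n), key=reals.__getitem__)
    perm = [0] * n
    for rank, position in enumerate(order, start=1):
        perm[position] = rank
    return perm

def longest_increasing(seq):
    """Length of the longest increasing subsequence of seq."""
    tails = []                       # tails[i]: least tail of an increasing
    for v in seq:                    # subsequence of length i + 1
        i = bisect.bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)

def mean_max_monotone(n, trials=2000, seed=1):
    """Average maximal length of a monotone (increasing or decreasing)
    subsequence over random permutations of n*n + 1 integers."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        p = random_permutation(n * n + 1, rng)
        total += max(longest_increasing(p),
                     longest_increasing([-v for v in p]))  # decreasing case
    return total / trials

print(mean_max_monotone(4))   # compare with the sample value 6.69 quoted above
```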
11.3 Branching Processes
One of the largest areas of application of the Monte Carlo method lies in the problems involving branching or multiplicative processes. Schematically, these problems can be described as follows: Imagine a collection of particles lying in a space E. This could be the ordinary three-dimensional space, but we can consider a much more general case, the particles being rather abstract entities located in a space of many dimensions. Each of the particles, in addition to its position in the space E, may have some other characteristics described by indices running from 1 to K. Let us imagine further that we consider time proceeding in a discrete manner by generations, one “second” apart. The particles perform a random walk in space during each generation and, in addition, they may multiply or demultiply; in particular, they may disappear altogether by absorption. They may also reflect on certain given boundaries, etc. The random walk may be isotropic—that is to say, the particles go with equal probability in any direction—or some directions may be preferred. The multiplication of the particle may depend, in an explicit fashion, on its position in space and on its index.
The simplest illustration is the elementary random-walk problem with no multiplication. This is a possible interpretation of problems of diffusion.1 If we consider the steps to be very short and the intervals of time to tend to zero, the analytical description of the density u of such diffusing particles as a function of space and time is determined by the differential equation

$$\frac{\partial u}{\partial t} = a\,\nabla^2 u$$

where

$$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$$
If the particles, in addition to performing a random walk, should multiply in number by a factor V(x,y,z) at the place (x,y,z), the equation would be

$$\frac{\partial u}{\partial t} = a\,\nabla^2 u + V(x,y,z)\,u$$

a Schrödinger equation. Here, V is a given function. More generally, we think of equations of the type

$$\frac{\partial u}{\partial t} = \Phi(u)$$

where Φ(u) is an operator on the function u, not involving time explicitly. If we consider time as increasing by a constant amount from generation to generation, the difference version of such an equation is

$$u_n = \Psi^n(u_0)$$

where $\Psi^n$ is the nth iterate of the operator Ψ. A solution of the equation may be obtained approximately by numerical work, through iteration of a given operator, starting from an initial distribution u(x,y,z;0). For numerical work, the space of the variables x, y, z may be considered as being divided into a finite number of zones. The operator Φ will involve, in some problems, not only differential expressions, as in the continuous model of diffusion or the random walk, but integral parts corresponding to "large" displacements, etc. A stochastic or Monte Carlo approach is often useful in problems of this sort. As we shall see throughout this chapter, the procedure involves an iteration, in a timelike variable, of a given process. We intend now to outline some properties of the iterates of such operators, restricting our attention to a discrete time and space.
Consider the branching or multiplicative features of the processes on a simple prototype as follows: We start with a single particle in the zeroth generation. This particle may now do one of the following things. With probability $p_0$ it may disappear altogether; with probability $p_1$ it reproduces itself; with probability $p_2$ it produces two particles; …; with probability $p_n$ it produces n particles like itself. For the simplest case, assume that the new particles, if any, are in the next generation independently subject to the same rules; their progeny form the following generation, and the process may continue. The problem is to determine the properties of such a process—in particular, to discuss the probability distribution of the population as a function of time, that is to say, of the generation index.
A useful tool for the study of these questions is the generating function of Laplace. Let

$$f(x) = \sum_{n=0}^{\infty} p_n x^n$$

be a function of the real variable x. We have f(1) = 1, since

$$\sum_{n=0}^{\infty} p_n = 1$$

The first derivative at x = 1,

$$f'(1) = \sum_{n=1}^{\infty} n\,p_n = \nu$$

gives us the expected value ν of the number of offspring from one particle. The second and higher moments of the probability distribution for the number of offspring from each particle can be obtained from f(x) by successive differentiations. So, for example,

$$f''(1) = \sum_{n=2}^{\infty} n(n-1)\,p_n$$
It is very easily proved that, if f(x) is the generating function of the known probability distribution for the number of particles in the first generation, then the generating function for the second generation is given by f[f(x)]. For the third generation the generating function is given by f{f[f(x)]}. For the kth generation it is $f_k(x)$, where

$$f_k(x) = f[f_{k-1}(x)]$$
Note that, for the generating functions at least, the solution to our problem is obtained by iterating a given function. From the generating function we can obtain explicitly the moments of the probability distribution for the number of particles in the kth generation. So, for example, the expected value is readily obtained through the equation

$$f_k'(1) = f'[f_{k-1}(1)]\,f_{k-1}'(1) = \nu\,f_{k-1}'(1) = \nu^k$$

from the rule for differentiation of composite functions. The second derivative $[f_k(x)]''$ is simply

$$f_k''(x) = f''[f_{k-1}(x)]\,[f_{k-1}'(x)]^2 + f'[f_{k-1}(x)]\,f_{k-1}''(x)$$
In general, by representing the value at x = 1 of a given derivative of $f_k(x)$ by $x_k$, we obtain difference equations in which the results for the lower derivatives enter as known coefficients. Each is of the form

$$x_k = A_{k-1} + M_1 x_{k-1}$$

where $M_1 = f'(1) = \nu$, with general solution

$$x_k = M_1^{\,k-1} x_1 + \sum_{j=2}^{k} M_1^{\,k-j} A_{j-1}$$

Solutions for the first three derivatives at x = 1 are obtained in this way, one after another; for example, $f_k'(1) = \nu^k$ and, for $\nu \ne 1$, $f_k''(1) = f''(1)\,\nu^{k-1}(\nu^k - 1)/(\nu - 1)$.
The probability of having no particles left in the kth generation is given by the constant term of its generating function. It is easy to find the asymptotic probability for the "death" of the system, that is to say, the limit of the value of this constant term of the kth generation as k → ∞. This is done as follows: In the first generation, $f(0) = p_0$. We want to know the asymptotic value of $f_k(0)$. The function f(x) is monotone increasing. As we iterate f, the values at x = 0 increase and approach a limit that is equal to the first x such that f(x) = x. Therefore, the probability of the system dying out at some time is given by the root of this equation. Three cases are to be distinguished: ν > 1, ν = 1, and ν < 1. In the first case the function f(x) has a slope at x = 1 that is greater than 1, and since all its coefficients in the power-series development are non-negative, it is convex upward. Since, as we may assume, we have $p_0 > 0$, it must cross the diagonal inside the interval (0,1), and the smallest root of the equation f(x) = x is less than 1. Therefore, the probability of the system dying out is less than 1, and there is a positive chance for immortality of the system. The same argument shows that in the other cases the first root of the equation is x = 1, and therefore the system will die out with probability 1.
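The iteration $f_k(0) = f[f_{k-1}(0)]$ is immediate to carry out numerically. A small sketch, with a hypothetical offspring distribution chosen to be supercritical (ν = 1.15):

```python
def extinction_probability(p, max_iter=1000, tol=1e-12):
    """Iterate f at x = 0: the values f_k(0) increase to the smallest
    root of f(x) = x in [0,1], the probability of the system's death.
    p[n] is the probability of n offspring from one particle."""
    f = lambda x: sum(pn * x ** n for n, pn in enumerate(p))
    x = 0.0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            break
        x = x_next
    return x

# Hypothetical law: p0 = 0.25, p1 = 0.35, p2 = 0.40, so nu = 1.15 > 1.
print(extinction_probability([0.25, 0.35, 0.40]))   # 0.625, the smaller root
```

Here the equation f(x) = x reads $0.40x^2 - 0.65x + 0.25 = 0$, with roots 0.625 and 1; the iteration converges to the smaller, so the chance of immortality is 0.375.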
The problem of finding the probability distribution itself, and not merely its moments as above, is much more difficult. It can be solved explicitly only in some simple cases. For example, if f(x) is a power series of a form that can be written as

$$f(x) = \frac{a + bx}{c + dx}$$

where a, b, c, and d are constants so chosen as to ensure that all the $p_i$ are nonnegative with their sum equal to 1, then we have a three-parameter family of functions that can be explicitly iterated. This is due to the fact that fractional linear functions of this form constitute a group under composition. The iterates have the same form, and the constants can be obtained by a matrix-multiplication algorithm.
Suppose we are interested in the probability distribution of the sum of all particles in the system produced from the first through the kth generation. If we want the generating function for the probabilities of having a total of n particles from the first through the kth generation, we proceed as follows.
The total of n particles can be obtained in any one of the following mutually exclusive ways: We can have 1 in the first generation and n − 1 in the remaining k − 1, or 2 in the first generation and n − 2 in the remaining k − 1; in general, we can have r in the first and n − r in the remaining k − 1 generations. The required probability is therefore the sum

$$q_n^{(k)} = \sum_r p_r\, P_r(n - r;\, k - 1)$$

Here $P_r(n - r;\, k - 1)$ denotes the probability that, starting from r particles in the first generation, we shall attain from these r a total of n − r in k − 1 generations. But the r particles are independent of each other. The probability of getting the total of n − r from them is therefore the probability of the value n − r for the sum of the r corresponding variables. The generating function for the sum of independent variables is the product of the generating functions corresponding to each of them; in our case it is the rth power of $u_{k-1}(x)$, the generating function for the total progeny of one particle through k − 1 generations. We are looking, then, for the coefficient of $x^{n-r}$ in $[u_{k-1}(x)]^r$. Our required probability $q_n^{(k)}$ equals, therefore, the sum with respect to r of the coefficients of $x^{n-r}$ in $p_r[u_{k-1}(x)]^r$, or the coefficient of $x^n$ in the sum

$$\sum_r p_r\, x^r\, [u_{k-1}(x)]^r \tag{11.2}$$

But the coefficient of $x^n$ in the expression (11.2) is the same as this coefficient in $f[x\,u_{k-1}(x)]$, and this is true for all n. Therefore the generating function for the time sum satisfies the recursion

$$u_k(x) = f[x\,u_{k-1}(x)]$$
If we count the original particle, this multiplies the generating function by x; expressing this slightly modified form recursively, we obtain the more convenient expression
$$u_k(x) = x\,f[u_{k-1}(x)]$$
We have, in general, a relation between the moments of nth order of a distribution and the nth derivative of its generating function. We shall now show how one can compute the derivatives of $u_k(x)$ for any k in an explicit manner.
Since, as was shown above,

$$u_k(x) = x\,f[u_{k-1}(x)]$$

we may obtain the desired results by repeated differentiations and by solving the resulting finite-difference equations. But if k is allowed to approach infinity, and if the system is subcritical, then the limit

$$u(x) = \lim_{k \to \infty} u_k(x)$$

exists. Hence, for the distribution of the total number produced, we have

$$u(x) = x\,f[u(x)]$$
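The recursion $u_k(x) = x\,f[u_{k-1}(x)]$ can also be carried out numerically on truncated power series, giving the distribution of the total number produced directly. A sketch, for a hypothetical subcritical law with ν = 0.7:

```python
import numpy as np

def total_progeny_distribution(p, deg=60, max_iter=500):
    """Iterate u -> x * f(u) on coefficient arrays truncated at degree
    `deg`; the iterates increase to the solution of u(x) = x f(u(x)),
    whose coefficients are the probabilities P(total = 0, 1, 2, ...),
    the initial particle included. p[n] = probability of n offspring."""
    u = np.zeros(deg + 1)
    for _ in range(max_iter):
        acc = np.zeros(deg + 1)
        for pn in reversed(p):                    # Horner: acc = f(u), truncated
            acc = np.convolve(acc, u)[: deg + 1]
            acc[0] += pn
        new_u = np.zeros(deg + 1)
        new_u[1:] = acc[:-1]                      # multiplication by x
        if np.allclose(new_u, u, atol=1e-14):
            break
        u = new_u
    return u

dist = total_progeny_distribution([0.5, 0.3, 0.2])  # nu = 0.3 + 0.4 = 0.7 < 1
print(dist[:5])      # P(total = 1) = p0 = 0.5, and so on
print(dist.sum())    # close to 1, the system being subcritical
```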
Suppose that we do not start with a single particle but that there exists a source introducing particles in each generation according to probabilities given by a generating function s(x). Then the generating functions for the zeroth, first, and second generations are given by $s(x)$, $s(x)\,s[f(x)]$, and $s(x)\,s[f(x)]\,s[f_2(x)]$, respectively.
In general, the generating function for the distribution of our multiplicative system in the kth generation is
$$g_k(x) = s(x)\,g_{k-1}[f(x)]$$
If the system is subcritical—that is to say, if ν < 1—but is sustained by the source, the generating functions will approach a limit. This limiting generating function F(x) will be a solution of the functional equation

$$F(x) = s(x)\,F[f(x)]$$
Even without solving this equation, one can at once obtain useful statistical information about the moments of F(x) by differentiating it. Thus, setting x = 1 after one differentiation,

$$F'(1) = s'(1) + \nu\,F'(1) \qquad\text{or}\qquad F'(1) = \frac{s'(1)}{1 - \nu}$$

so that the expected number of particles in the limiting distribution is $s'(1)/(1 - \nu)$.
11.4 Multidimensional Branching Processes
So far we have described a branching process starting from a particle producing other particles of the same type. If the particles differ in kind and have different characteristics, such as unequal spatial positions or momenta, we have a branching process of greater generality. By assuming that the positions or momenta, for example, are able to assume discrete values, we are led to the following N-dimensional branching process:
Suppose that a system of particles consists of N different types, that a particle of type i can transform into one or more particles of the different types, and that the probability of a particle of type i producing

$$j_1 + j_2 + \cdots + j_N$$

particles, $j_k$ of type k, is $p_1(i;\, j_1, \ldots, j_N)$, i = 1, 2, …, N. We assume that, for every set of nonnegative integers $j_1, \ldots, j_N$,

$$p_1(i;\, j_1, \ldots, j_N) \ge 0 \qquad\text{and}\qquad \sum_{j_1, \ldots, j_N} p_1(i;\, j_1, \ldots, j_N) = 1$$
As before, we imagine now that, starting with one particle, we have a continuing branching process and that, among other things, we are interested in the probability of having so many particles of each type in future generations. We again consider a generating function

$$g_i(x_1, \ldots, x_N) = \sum_{j_1, \ldots, j_N} p_1(i;\, j_1, \ldots, j_N)\, x_1^{j_1} x_2^{j_2} \cdots x_N^{j_N}$$

which defines the probabilities of progeny at the end of one generation from one particle of type i. Hence $x' = G(x)$; explicitly,

$$x_i' = g_i(x_1, \ldots, x_N) \qquad i = 1, \ldots, N$$

with $0 \le x_i \le 1$, defines a generating transformation of the unit cube $I^N$ of the euclidean N space into itself. Moreover, abbreviating $(1, \ldots, 1)$ as 1, we see that G(1) = 1, so that the point 1 is a fixed point of the transformation G.
If, in a given generation, the generating function is

$$h(x_1, \ldots, x_N) = \sum_{k_1, \ldots, k_N} q(k)\, x_1^{k_1} \cdots x_N^{k_N}$$

[i.e., the coefficient q(k) is the probability in this generation of the state: $k_i$ particles of type i, i = 1, …, N], then the generating function of the next generation is

$$h[g_1(x_1, \ldots, x_N), \ldots, g_N(x_1, \ldots, x_N)]$$
If we begin with one particle of type i, then the generating functions are the following:
For the first generation, $g_i(x_1, \ldots, x_N)$
For the second generation, $g_i(g_1, \ldots, g_N)$
For the third generation, $g_i[g_1(g_1, \ldots, g_N), \ldots, g_N(g_1, \ldots, g_N)]$
and so on. Hence, adopting the notation $G_k(x)$ for the kth iterate of G(x),

$$G_k(x) = G[G_{k-1}(x)] = \left(g_1^{(k)}(x), \ldots, g_N^{(k)}(x)\right)$$
we see that $g_i^{(k)}(x)$ is the generating function for the kth generation of progeny from one particle of type i, for i = 1, 2, …, N.
First Moments; Jacobian. Let

$$h(x_1, \ldots, x_N) = \sum_{k_1, \ldots, k_N} q(k)\, x_1^{k_1} \cdots x_N^{k_N}$$

be the generating function for a particular generation. Then

$$\left.\frac{\partial h}{\partial x_1}\right|_{x=1} = \sum_{k_1, \ldots, k_N} k_1\, q(k)$$

and

$$\sum_{k_1, \ldots, k_N} k_1\, q(k) = \sum_{j_1} j_1\, P(j_1)$$

where $P(j_1)$ is the probability of $j_1$ particles of type 1 in this generation. Hence we define $\partial h/\partial x_1$ at x = 1 as the first moment for particles of type 1; similarly, we define

$$\left.\frac{\partial h}{\partial x_j}\right|_{x=1}$$

as the first moment for particles of type j.
We adopt the notation

$$m_{ij}^{(k)} = \left.\frac{\partial g_i^{(k)}}{\partial x_j}\right|_{x=1}$$

for the first moment of particles of type j in the kth generation of progeny from one particle of type i.
Recall that, for a transformation G(x) with components $g_i(x_1, \ldots, x_N)$, the Jacobian matrix is

$$J(G) = \left[\frac{\partial g_i}{\partial x_j}\right]$$

where we have exhibited the element in the ith row and jth column. If H(x) is a second transformation, with components $h_i(x_1, \ldots, x_N)$, then the Jacobian of the composite transformation H[G(x)], with components $h_i(g_1, \ldots, g_N)$, is

$$J(H[G])_x = J(H)_{G(x)}\, J(G)_x$$

It follows, since $G_k = G(G_{k-1})$, that

$$J(G_k)_x = J(G)_{G_{k-1}(x)}\, J(G_{k-1})_x$$

and, for a fixed point $x_0 = G(x_0)$,

$$J(G_k)_{x_0} = \left[J(G)_{x_0}\right]^k$$

That is, the Jacobian of the kth iterate of G at a fixed point is the kth power of the Jacobian matrix of G at that point. In particular, since 1 = G(1), we have

$$J(G_k)_1 = \left[J(G)_1\right]^k$$

Now

$$J(G_k)_1 = \left[m_{ij}^{(k)}\right] \qquad\text{and}\qquad J(G)_1 = \left[m_{ij}^{(1)}\right]$$

so that the relationship

$$\left[m_{ij}^{(k)}\right] = \left[m_{ij}^{(1)}\right]^k$$
exists between the first moments of the kth generation and those of the first.
Since

$$(S^{-1} A S)^k = S^{-1} A^k S$$

for any nonsingular matrix S, we have

$$\left[m_{ij}^{(k)}\right] = S\,(S^{-1} M^k S)\,S^{-1} = S\,(S^{-1} M S)^k\, S^{-1}$$

where $S^{-1} M S$ is the canonical form of

$$M = \left[m_{ij}^{(1)}\right]$$

This permits more rapid computation of the $m_{ij}^{(k)}$.
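On a modern machine, the reduction to canonical form is a one-line affair; the following sketch, with an invented two-type moment matrix, merely verifies the identity above numerically.

```python
import numpy as np

M = np.array([[0.6, 0.3],        # a hypothetical first-moment matrix
              [0.2, 0.7]])       # [m_ij(1)] for two types of particles

lam, S = np.linalg.eig(M)        # columns of S: eigenvectors, M S = S diag(lam)
k = 25
Mk = S @ np.diag(lam ** k) @ np.linalg.inv(S)        # [m_ij(k)] via canonical form
print(np.allclose(Mk, np.linalg.matrix_power(M, k)))  # True: the two agree
```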
Again the study of properties of a branching system of this general type can be reduced to an investigation of the iterates of a transformation in N dimensions, giving us a probability flow in time on the N-dimensional cube. The expected values of the numbers of particles of each type are given by a linear transformation defined by the first derivatives of our functions. This matrix M has all its elements nonnegative ($m_{ij}$ may be interpreted as the expected number of particles of type j produced by a particle of type i), and we recall a theorem of Frobenius and Perron on the properties of iterates of such a matrix. This theorem guarantees, among other things, that, if the elements are actually all positive and if we start with an arbitrary vector in the positive octant of the space (a vector with all its coordinates nonnegative) and apply the matrix to this vector successively, then the iterated images will converge, in direction, to a unique vector w with nonnegative coordinates. Actually, it is sufficient for this purpose that some power of the matrix have all its elements positive. There is a considerable amount of literature on strengthened forms and ramifications of this theorem. The limiting vector w has the property that

$$Mw = \lambda w$$

That is to say, it is a characteristic vector (eigenvector); λ is the characteristic root (eigenvalue) of highest absolute value for the matrix M.
The applications of this theorem are obviously important. As an illustration, we may consider a system of neutrons in a spatial assembly that has been divided into N regions. If $m_{ij}$ is interpreted as the average number of neutrons produced, in unit time, in region j by a neutron issuing from region i, the above theorem can be interpreted as asserting that a steady state—that is to say, an asymptotic distribution of neutrons—will obtain. In other words, the ratios of the numbers of neutrons in the various zones will approach fixed values. These values can be obtained by computing the direction cosines of the vector w; the number of neutrons in the system will increase geometrically in time, the ratio in successive generations being given by λ.
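The convergence asserted by the Frobenius-Perron theorem is easy to exhibit by direct iteration. In the sketch below the matrix entries are invented for illustration; the product vM carries the expected numbers of neutrons in each zone through one generation.

```python
import numpy as np

# Hypothetical moment matrix: M[i, j] is the average number of neutrons
# produced in zone j by a neutron issuing from zone i.
M = np.array([[0.50, 0.30, 0.05],
              [0.25, 0.60, 0.25],
              [0.05, 0.30, 0.55]])

v = np.array([1.0, 0.0, 0.0])     # start all neutrons in zone 0 (arbitrary)
for _ in range(200):
    w = v @ M                      # expected numbers by zone, one generation later
    lam = w.sum() / v.sum()        # growth ratio of successive generations
    v = w / w.sum()                # renormalize to avoid under- or overflow

print("lambda =", lam)             # the Perron root of M
print("zone ratios =", v)          # limiting proportions (direction of w above)
print("check  =", max(np.linalg.eigvals(M).real))   # agrees with lambda
```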
The foregoing discussion refers to the expected values or first moments of the numbers in successive generations. In reality, we are given a probability flow and need an analogue of the law of large numbers asserting that with very great probability the actual numbers of particles of each type will approach those given by the theorem. If the system is supercritical, this law can be demonstrated.3,4
We can see now that a Monte Carlo procedure may be justified, to some extent, by the foregoing theorem. Suppose a problem involves the determination of the asymptotic distribution of neutrons and their growth in time in a given medium. We may subdivide the medium into a large number N of zones, compute the expected values $m_{ij}$, and iterate the matrix. An alternative procedure would be, instead of dealing with all the regions, to proceed in a stochastic manner (in analogy with our attempt to compute definite integrals). We start with a particle given in a zone and select its direction at random with the correct distribution. We then determine at random again, but with the correct chances, the length of the path that the particle traverses before "producing" another particle (or several of them) in a fixed region j. This latter distribution is an exponential, the chances of the particle being absorbed or producing other particles varying from region to region as it traverses them. In actual problems the situation is even more general. Not only do the different positions of the particle require separate indices, but the energy of the particle also assumes different values; thus our N is actually the product of two numbers: the number of regions and the number of energy intervals for the particle. In many cases, a stochastic calculation of the properties of such a system is more practical than a computation of the eigenvalues and eigenvectors by iteration of very large matrices. A large number of examples illustrating the Monte Carlo approach can be found in the book by Cashwell and Everett.2 The problems treated there involve the distribution of the number of particles starting from given sources in systems with a given geometry, undergoing various processes, and terminating in a number of different categories. The Monte Carlo problem involves following a large number of particles undergoing, with probabilities that are supposed to be known, events of different types. By a final census of the numbers produced, we obtain an idea of the terminal distribution.
In actual computations, it is often advantageous to use, instead of the literal imitation of the physical process as indicated above, certain special tricks. For example, instead of considering a neutron as producing an integral number of particles 0, 1, 2, … (with given probabilities $p_0, p_1, p_2, \ldots$), one may use a weight representing an average of the numbers that it may produce and follow the change and growth of the weights deposited by the particle in the various regions of space as the branching process continues.
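A toy version of such a weighted game, in one dimension: three unit-thickness zones, hypothetical collision rates and mean numbers of secondaries, and a weight multiplied at each collision instead of an integer number of offspring. It is a deliberately crude sketch (in particular, each path length is sampled with the cross section of the zone where the flight begins), not the scheme of any production code.

```python
import math
import random

sigma = [1.0, 2.0, 1.5]   # hypothetical collision rates per unit length, by zone
c     = [0.9, 1.1, 0.8]   # hypothetical mean number of secondaries per collision
LENGTH = 3.0              # zones are [0,1), [1,2), [2,3)

def zone(x):
    return min(int(x), 2)

def track(rng, cutoff=1e-3, max_collisions=1000):
    """Follow one particle from the left face, replacing the integral
    number of secondaries at each collision by its mean: the particle's
    weight is scaled by c[i], and the weight present at each collision
    is tallied by zone."""
    x, mu, wt = 0.0, 1.0, 1.0            # position, direction cosine, weight
    deposits = [0.0, 0.0, 0.0]
    for _ in range(max_collisions):
        s = -math.log(1.0 - rng.random()) / sigma[zone(x)]  # exponential free path
        x += mu * s
        if x < 0.0 or x >= LENGTH or wt < cutoff:
            break                                 # escape, or negligible weight
        i = zone(x)
        deposits[i] += wt
        wt *= c[i]                                # expected multiplication
        mu = rng.uniform(-1.0, 1.0)               # isotropic re-emission
    return deposits

rng = random.Random(7)
N = 20_000
totals = [0.0, 0.0, 0.0]
for _ in range(N):
    totals = [t + d for t, d in zip(totals, track(rng))]
print([t / N for t in totals])   # mean weight deposited per zone per source particle
```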
11.5 Statistical Sampling Methods
Even with fast electronic computing machines, there is a premium on the efficacy of statistical sampling methods, since some problems require, for moderate accuracy, enormous numbers of "particles" to be processed. This is especially true when one deals with systems that are just about critical multiplicatively, or when one requires estimates of the very small fraction of particles that succeed in penetrating given shields involving many mean free paths. Such cases are handled by more sophisticated techniques, the so-called importance sampling being widely used.5,6,8
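The flavor of importance sampling can be conveyed by the simplest shielding caricature: the chance that a single free path exceeds a shield of T mean free paths, which is $e^{-T}$, about $2 \times 10^{-9}$ for T = 20, and hence invisible to direct sampling. The sketch below stretches the sampled exponential and corrects by the likelihood ratio; the model and the choice of stretched rate are illustrative only, not the general schemes of the references.

```python
import math
import random

def penetration_naive(T, n, rng):
    """Direct (analog) estimate of P(free path > T): practically
    always 0 when T is many mean free paths."""
    return sum(rng.expovariate(1.0) > T for _ in range(n)) / n

def penetration_importance(T, n, rng):
    """The same probability by importance sampling: draw the path from
    a stretched exponential of rate 1/T and weight each success by the
    likelihood ratio f(s)/g(s)."""
    lam = 1.0 / T
    total = 0.0
    for _ in range(n):
        s = rng.expovariate(lam)
        if s > T:
            total += math.exp(-s) / (lam * math.exp(-lam * s))
    return total / n

rng = random.Random(3)
T = 20.0
print(penetration_naive(T, 100_000, rng))        # almost surely 0.0
print(penetration_importance(T, 100_000, rng))   # near exp(-20) = 2.06e-9
print(math.exp(-T))                              # the exact value
```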
We might indicate, by an example, the kind of practical problem computed by the statistical sampling method. This is Problem 10 discussed in the book of Cashwell and Everett.2 A cylinder of cadmium contains a gaseous boron compound and a vacuum space. This is surrounded by a hydrocarbon in a cylindrical outer container. A parallel beam of monoenergetic neutrons impinges on the base of the outer container. The neutrons can be captured in the hydrogen, in the cadmium, or in the boron; they can be scattered by the hydrogen nuclei according to a known scattering law. One is interested in the number of escapes from the base and out of the cylinder through the opposite face, the amount of capture, and the energy losses by collisions. The purpose is to estimate the efficiency of capture by boron as a function of the energy of the source and its distance from the axis. The system is to be used as a neutron counter. Each neutron is characterized by many parameters.
It is apparent that in a problem of this sort, in such a complicated geometry, an analytical study through an attempt to solve the integro-differential equations by ordinary numerical methods would be quite impractical.
For a description of another case, a calculation of resonance-capture probability of neutrons in the reactor lattice, we refer the reader to the article by R. D. Richtmyer.10
11.6 Reactions in a Heavy Nucleus
An instructive case of a branching process showing considerable combinatorial and analytical complexities is provided by the problem of reactions produced in a heavy nucleus by an incident particle of high energy. A nucleus of a heavy atom—say copper or silver—is composed of individual nucleons bound together. Imagine now that a cosmic-ray proton, or a neutron, or a high-energy particle produced in, e.g., the Brookhaven machine—a proton or a π meson—strikes this assembly of nucleons. One of the nucleons constituting the target may be knocked off, with some of the energy of the incident particle lost, and the incident particle may continue inside the assembly of other nucleons, exciting them by elastic collisions or producing new particles on the way. The variety of the new particles produced is quite great. They may be π mesons, K mesons, many of the so-called strange particles, and γ rays. Such production of "strange" particles will occur if the incident particle has enough energy—over a few billion electron volts. We have here a great variety of possible branching processes characterized by transmutation of the indices denoting the various particles, their positions inside the nucleus, their directions, and their energies. For high energies, the mechanics of the processes has to be treated relativistically. For lower energies of the incident particle, the problem deals mainly with the question of "evaporation" of constituent nucleons. A 100-Mev proton may knock off several nucleons after exciting "thermally" the whole nucleus. M. L. Goldberger7 treated this problem by a Monte Carlo method. A study of phenomena at higher energies involving pion production was made by N. Metropolis and A. Turkevich.9 The most recent work, including the study of strange-particle production, is that of L. Sartori, A. Werbrouck, and J. Wooten, to appear in Physical Review.
11.7 The Petit Canonical Ensemble
An interesting study of a problem in statistical mechanics by a Monte Carlo method was undertaken in recent years to obtain statistical-mechanical averages in the so-called petit canonical ensemble. The problem is to obtain thermodynamic properties of dense fluids. We consider N spherical molecules occupying, say, a cubic volume and having a given temperature, and we try to obtain an insight into the spatial distributions of these molecules, which are close to a hard packing, by letting them move from one position in configuration space to another by performing, one at a time, allowed displacements of each molecule. We assume various forces between the molecules. One limiting case is that of hard spheres, but we may consider forces between pairs of these molecules such as, e.g., the Lennard-Jones potentials. Various thermodynamic properties, e.g., distribution functions or pressures, may be obtained by averaging over the class of possible configurations. Equations of state and phase transitions could be obtained from such an empirical study on computing machines, with rather surprisingly good indications of the real behavior even when such small numbers as 128 or even 32 are used for N. In reality, the number of molecules is of the order of $10^{23}$ per cubic centimeter.
The problem involved, of course, hundreds of thousands of configurations, each obtained from the preceding one by an allowed elementary displacement, and computations of the thermodynamic quantities determined by these configurations. It is impossible here to summarize the results and the procedure. The reader is referred to the articles by M. Rosenbluth, Metropolis, Teller, et al.,9 and also the more recent work by Z. W. Salsburg, W. W. Wood, et al.11,15,16
In all these problems, the branching or multiplicative feature does not appear. The problems involve, rather, a Markovian process of transitions in a space of a high number of variables from one point to another, and instead of multiplication of the matrices, they employ a sampling process on individual chains of such transitions, sampled by many runs.
11.8 Iterates of Transformations, Ergodic Properties, and Time Averages
We have seen how the Monte Carlo method can be used to obtain information concerning the behavior of problems that involve iteration of operators—either for the study of a steady state, if it is established, or for an investigation of the history of these processes in time. (The random numbers used in the method are themselves also obtained by an iteration procedure!) The theorems regarding the existence of a steady state can often be obtained from generalizations of the Frobenius-Perron theorem concerning the iterates of linear transformations with nonnegative coefficients. The probabilistic branching process can be described by the iteration of the generating transformation so chosen that its first moments form this linear-transformation matrix.
In case the linear operator to be iterated is not positive in the sense of Frobenius and Perron, there will in general be no convergence of the vectors, under iteration, to a fixed direction. Nevertheless, the ergodic properties of the iterates of an arbitrary linear transformation allow us to obtain information concerning the properties of time averages of the sequence of iterates.14 Suppose A is an arbitrary linear transformation of the n-dimensional euclidean space. We consider $A^n(r)$, where r is any vector. If A represents a rotation of the plane through an angle incommensurable with 2π, the sequence of vectors $A^n(r)$ will not converge. On the contrary, these vectors will become uniformly dense on the circumference of a circle; this is a theorem of Weyl. In a higher number of dimensions, the theorem of Kronecker asserts that, in case A is a rotation in n space with the angles of rotation in the several planes rationally independent, the iterates of the vector will lie densely on the surface of an n-dimensional torus. Weyl's theorem permits us to assert that they will be uniformly dense, in the sense that, if we take an arbitrary region on the torus and normalize the entire area of the torus to 1, then the fraction of the first N iterates $A^i(r)$ that fall into this region will tend, as N → ∞, to the area of the region. The behavior of the iterates of a rotation is therefore in a sense diametrically opposite to that of the iterates of a matrix with positive coefficients. One can prove, however, for an arbitrary matrix A, the following property: Consider the iterates $A^i(r)$ and the points on the unit sphere defined, for i = 1, 2, …, by

$$p_i = \frac{A^i(r)}{\|A^i(r)\|}$$
Let R be any region on the surface of the unit sphere. The sojourn time of the $p_i$ in the region R—that is, the limiting fraction of the first N of these points that fall into R—exists for almost every starting vector r.
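Both phenomena are easy to observe numerically. In the sketch below, A is a rotation of the plane through an angle incommensurable with 2π, and B is the same rotation followed by a stretching; the normalized iterates $p_i$ of B visit a fixed arc with limiting frequency equal to the length of the arc (here 1/6 of the circle). The particular angle, arc, and matrix are illustrative.

```python
import math
import numpy as np

theta = math.sqrt(2.0) * math.pi            # theta / (2 pi) is irrational
A = np.array([[math.cos(theta), -math.sin(theta)],
              [math.sin(theta),  math.cos(theta)]])
B = 1.1 * A                                  # no longer norm-preserving

p = np.array([1.0, 0.0])                     # the starting vector r
N = 100_000
in_arc = 0
for _ in range(N):
    p = B @ p
    p = p / np.linalg.norm(p)                # the normalized iterate p_i
    angle = math.atan2(p[1], p[0]) % (2.0 * math.pi)
    if angle < math.pi / 3.0:                # an arc of length 1/6 of the circle
        in_arc += 1

print(in_arc / N)                            # tends to 1/6 by Weyl's theorem
```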
One might be interested in the properties of iterates of transformations that are more general than the linear ones, in particular, in case A is a quadratic transformation—that is, a transformation such that the coordinates of the transformed point are quadratic functions of the original point. Such transformations occur in describing processes involving binary reactions between particles. In the problems discussed above, each particle behaved, in a given generation, independently of the other particles; i.e., the chances determining the fate of a neutron in a given generation were assumed to be independent of the fate of the coexisting neutrons. This is reasonable if the time can be chosen sufficiently short. In problems encountered in genetic studies and in biological processes, this is not so; quite to the contrary, the production of the new generation may be determined by pairs of particles. A mathematical study of the evolution of such processes could then necessitate a study of iterates of quadratic transformations.
As an illustration of such a problem, consider a system of a large number N of particles, each of which is of one of k possible "colors." Assume that in a given generation our particles combine in pairs at random and that each pair produces exactly two particles with colors that are given functions of the colors of the parents. The original particles, we may assume, disappear in the generation. In the next generation this process continues. We might be interested in the population ratios of the particles of each kind—in particular, whether these will tend to a steady distribution. Assume, for example, that k = 3 and that we have the following rule determining the index of the offspring as a function of the indices of its parents. A pair of particles of type 1 and 1 produces particles of type 1. Particles of type 1 with particles of type 2 produce again particles of type 1. Particles of type 1 with particles of type 3 produce particles of type 2. Two particles of type 2 produce together particles of type 3. Particles of type 2 with particles of type 3 produce particles of type 3. Finally, two particles of type 3 produce particles of type 1. If $x_1$, $x_2$, $x_3$ denote the fractions of each kind in the total population originally present among the N particles, then the new proportions, $x_1'$, $x_2'$, $x_3'$, in the second generation will be given by the formulas

$$x_1' = x_1^2 + 2x_1x_2 + x_3^2$$
$$x_2' = 2x_1x_3$$
$$x_3' = x_2^2 + 2x_2x_3$$
These formulas define a quadratic transformation of the three-dimensional space into itself. Actually, since $x_1 + x_2 + x_3 = 1$ and, in virtue of our special convention, $x_1' + x_2' + x_3' = 1$ as well, the transformation is really one of a two-dimensional triangular region into itself. The proportions of the population of each kind will be given in subsequent generations by iteration of this transformation. The particular transformation corresponds to our particular rule. There are many different rules of this kind possible—actually, for k = 3, 97 nonequivalent (by permutation of indices) ones—and a study of the properties of the iterates of all of these was made by P. Stein and the author.12 Under iteration of some of the transformations of the above type, any nondegenerate initial vector converges to a unique limiting value. In some other cases, the behavior approaches a periodic permutation among a finite number of limiting vectors. (In one case, for k = 4, there is convergence of initial vectors to one of 12 fixed vectors permuting successively among themselves!)
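Both the deterministic iteration of these formulas and the finite-N sampled game they idealize are immediate to program. In the sketch below, the initial fractions, the population size, and the number of generations are arbitrary choices; for this particular rule and starting point, the iterates appear to approach a fixed point.

```python
import random

def step(x1, x2, x3):
    """One generation of the three-color rule in the text."""
    return (x1 * x1 + 2 * x1 * x2 + x3 * x3,   # from pairs (1,1), (1,2), (3,3)
            2 * x1 * x3,                       # from pairs (1,3)
            x2 * x2 + 2 * x2 * x3)             # from pairs (2,2), (2,3)

x = (0.2, 0.3, 0.5)
for _ in range(50):                            # expected-value (first-moment) flow
    x = step(*x)
print("deterministic iterates:", x)

# The finite-population version, to exhibit the fluctuations discussed below:
OFFSPRING = {(1, 1): 1, (1, 2): 1, (1, 3): 2,
             (2, 2): 3, (2, 3): 3, (3, 3): 1}

def generation(pop, rng):
    """Pair the N particles at random; each pair leaves two offspring of
    the type dictated by the rule, the parents disappearing."""
    rng.shuffle(pop)
    out = []
    for a, b in zip(pop[0::2], pop[1::2]):
        t = OFFSPRING[tuple(sorted((a, b)))]
        out += [t, t]
    return out

rng = random.Random(5)
pop = [1] * 200 + [2] * 300 + [3] * 500        # N = 1000, same initial fractions
for _ in range(50):
    pop = generation(pop, rng)
print("sampled fractions:", [pop.count(t) / len(pop) for t in (1, 2, 3)])
```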
For k > 3, the number of different cases increases rapidly. The ergodic properties of such transformations are not well known, and a study of them by sampling methods would have heuristic value: The transformation that we wrote above refers to the expected values of the fractions of particles of the different types. For a finite—even quite large—N, fluctuations would make the process deviate from the behavior predicted by means of the first moments.
It may be conjectured that, at least for transformations of the above type, the time-average limits exist for almost every initial condition. That is to say, if we denote by $x_i^n$ the fraction of the population of the ith type in the nth generation, then

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} x_i^n$$

exists for each i, i = 1, …, k, and for almost every initial vector

$$x = (x_1, x_2, \ldots, x_k)$$

with

$$x_i \ge 0 \qquad \sum_{i=1}^{k} x_i = 1$$
Transformations like the above are, of course, still of a very special form. The assumption that each pair produces exactly two particles with identical indices may be generalized to allow a variable number of offspring with possibly different characteristics. The equations describing the expected values of the numbers of particles of each kind will still be quadratic, but with nonintegral coefficients. If a system of this type should be supercritical—that is to say, if the expected value of the total number of particles increases to infinity—an analogue of the theorem on branching processes mentioned above may hold: In the space of all possible branching processes based on binary reactions as above, those processes that lead to population ratios approaching the ratios predicted by the "deterministic" method—that is to say, the limiting values given by the iterates of the transformations of expected values—have measure equal to the measure of all processes. In other words, with great probability the stochastic processes will converge to the states given by the iterates of the transformation of the first moments.
REFERENCES
1. Brown, George W., Monte Carlo Methods, chap. 12 in “Modern Mathematics for the Engineer,” First Series, edited by E. F. Beckenbach, McGraw-Hill Book Company, Inc., New York, 1956.
2. Cashwell, E. D., and C. J. Everett, “A Practical Manual on the Monte Carlo Method,” Pergamon Press, New York, 1959.
3. Everett, C. J., and S. Ulam, “Multiplicative Systems,” I, AECD-2164, II, AECD-2165, III, AECD-2532, Technical Information Branch, Oak Ridge, Tenn., 1948.
4. —— and ——, Multiplicative Systems, I, Proc. Nat. Acad. Sci. U.S.A., vol. 34, pp. 403–405, 1948.
5. Goad, W., and R. Johnston, A Monte Carlo Method for Criticality Problems, Nucl. Sci. Eng., vol. 5, pp. 371–375, 1959.
6. Goertzel, G., “Quota Sampling and Importance Functions in Stochastic Solution of Particle Problems,” AECD-2793, Technical Information Branch, Oak Ridge, Tenn., 1949.
7. Goldberger, M. L., The Interaction of High Energy Neutrons and Heavy Nuclei, Phys. Rev., vol. 74, pp. 1269–1277, 1948.
8. Kahn, H., Random Sampling (Monte Carlo) Techniques in Neutron Attenuation Problems. I, II, Nucleonics, vol. 6, pp. 27–33, 37, 60–65, 1950.
9. Metropolis, N., et al., Monte Carlo Calculations on Intranuclear Cascades I. Low Energy Studies, Phys. Rev., vol. 110, pp. 185–203, 1958; II. High Energy Studies and Pion Processes, Phys. Rev., vol. 110, pp. 204–219, 1958.
10. Richtmyer, R. D., Monte Carlo Methods, in “Symposium on Nuclear Reactor Theory,” edited by Garrett Birkhoff, American Mathematical Society (to appear).
11. Salsburg, Z. W., et al., Application of the Monte Carlo Method to the Lattice Gas Model, J. Chem. Phys., vol. 30, pp. 65–72, 1959.
12. Stein, P., and S. Ulam, “Quadratic Transformations, I,” LA-2305, Office of Technical Services, U.S. Department of Commerce, 1959.
13. Taussky, Olga, and John Todd, Generation and Testing of Pseudo-random Numbers, in “Symposium on Monte Carlo Methods,” John Wiley & Sons, Inc., New York, 1956.
14. Wiener, Norbert, The Theory of Prediction, chap. 8 in “Modern Mathematics for the Engineer,” First Series, edited by E. F. Beckenbach, McGraw-Hill Book Company, Inc., New York, 1956.
15. Wood, W. W., and F. R. Parker, Monte Carlo Equation of State of Molecules Interacting with the Lennard-Jones Potential. I. A Supercritical Isotherm at about Twice the Critical Temperature, J. Chem. Phys., vol. 27, pp. 720–733, 1957.
16. —— et al., Recent Monte Carlo Calculations of the Equation of State of Lennard-Jones and Hard Sphere Molecules, Nuovo Cimento, ser. 10, vol. 9, pp. 133–143, 1958.