Many computer techniques are well-defined sets of procedures leading to definite outcomes. In contrast, some computational techniques are trial-and-error algorithms in which decisions on what path to follow are made based on the current values of variables, and the program quits only when it thinks it has solved the problem. (We already did some of this when we summed a power series until the terms became small.) Writing this type of program is usually interesting because we must foresee how to have the computer act intelligently in all possible situations, and running one is very much like an experiment in which it is hard to predict what the computer will come up with.
Problem Probably the most standard problem in quantum mechanics1) is to solve for the energies of a particle of mass m bound within a 1D square well of radius a:

V(x) = −V0 for |x| ≤ a,   V(x) = 0 for |x| > a.
As shown in quantum mechanics texts (Gottfried, 1966), the energies of the bound states within this well are solutions of the transcendental equations

√(V0 − EB) tan(√(V0 − EB) a) = √EB   (even),
√(V0 − EB) cot(√(V0 − EB) a) = −√EB   (odd),

where even and odd refer to the symmetry of the wave function. Here we have chosen units such that ħ = 2m = 1. Your problem is to solve these transcendental equations numerically for several bound-state energies EB.
Trial-and-error root finding looks for a value of x for which

f(x) = 0,

where the 0 on the right-hand side is conventional (an equation such as g(x) = h(x) can easily be written as f(x) = g(x) − h(x) = 0). The search procedure starts with a guessed value for x, substitutes that guess into f(x) (the “trial”), and then sees how far the LHS is from zero (the “error”). The program then changes x based on the error and tries out the new guess in f(x). The procedure continues until f(x) ≃ 0 to some desired level of precision, or until the changes in x are insignificant, or if the search seems endless.
The most elementary trial-and-error technique is the bisection algorithm. It is reliable but slow. If you know some interval in which f(x) changes sign, then the bisection algorithm will always converge to the root by finding progressively smaller and smaller intervals within which the zero lies. Other techniques, such as the Newton–Raphson method we describe next, may converge more quickly, but if the initial guess is not close to a root, they may become unstable and fail completely.
Figure 7.1 A graphical representation of the steps involved in solving for a zero of f(x) using the bisection algorithm. The bisection algorithm takes the midpoint of the interval as the new guess for x, and so each step reduces the interval size by one-half. Four steps are shown for the algorithm.
The basis of the bisection algorithm is shown in Figure 7.1. We start with two values of x between which we know that a zero occurs. (You can determine these by making a graph or by stepping through different x values and looking for a sign change.) To be specific, let us say that f(x) is negative at x− and positive at x+:
(Note that it may well be that x− > x+ if the function changes from positive to negative as x increases.) Thus, we start with the interval x+ ≤ x ≤ x− within which we know a zero occurs. The algorithm (implemented as in Listing 7.1) then picks the new x as the bisection of the interval,

x = (x+ + x−)/2,

and selects as its new interval the half in which the sign change occurs: if f(x) > 0, the zero lies between x− and x, and so x becomes the new x+; otherwise, x becomes the new x−.
This process continues until the value of f(x) is less than a predefined level of precision or until a predefined (large) number of subdivisions occurs.
Listing 7.1 Bisection.py is a simple implementation of the bisection algorithm for finding a zero of a function, in this case 2 cos x − x.
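A minimal sketch of such a bisection search (our own illustration, not a reproduction of Listing 7.1), applied to the same test function 2 cos x − x:

from math import cos

def f(x):
    return 2*cos(x) - x                      # the function whose zero we seek

def bisection(f, x_minus, x_plus, eps=1e-8, nmax=100):
    # x_minus and x_plus bracket the zero: f(x_minus) < 0 and f(x_plus) > 0
    for _ in range(nmax):
        x = (x_plus + x_minus)/2.0           # bisect the current interval
        if f(x) > 0.0:
            x_plus = x                       # zero lies between x_minus and x
        else:
            x_minus = x                      # zero lies between x and x_plus
        if abs(f(x)) < eps:                  # converged to desired precision
            break
    return x

print(bisection(f, x_minus=4.0, x_plus=0.0))   # zero of 2 cos x - x near x = 1.03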
The example in Figure 7.1 shows the first interval extending from x+ = x+1 to x− = x−1, where the second part of the subscript labels the step. We bisect that interval at x, and because f(x) < 0 at the midpoint, we set x− ≡ x−2 = x and label it x−2 to indicate the second step. We then use x+2 ≡ x+1 and x−2 as the next interval and continue the process. We see that only x− changes for the first three steps in this example, but for the fourth step x+ finally changes. The changes then become too small for us to show.
The Newton–Raphson algorithm finds approximate roots of the equation

f(x) = 0

more quickly than the bisection method. As we see graphically in Figure 7.2, this algorithm is equivalent to drawing a straight line tangent to the curve at an x value close to the root, and then using the intercept of that line with the x-axis as an improved guess for the root. If the “curve” were a straight line, the answer would be exact; otherwise, it is a good approximation if the guess is close enough to the root for f(x) to be nearly linear. The process continues until some set level of precision is reached. If a guess is in a region where f(x) is nearly linear (Figure 7.2), then the convergence is much more rapid than for the bisection algorithm.
The analytic formulation of the Newton–Raphson algorithm starts with an old guess x0 and expresses a new guess x as a correction Δx to the old guess:

x = x0 + Δx.
Figure 7.2 A graphical representation of the steps involved in solving for a zero of f(x) using the Newton–Raphson method. The Newton–Raphson method takes the new guess as the zero of the line tangent to f(x) at the old guess. Two guesses are shown.
We next expand the known function f(x) in a Taylor series around x0 and keep only the linear terms:

f(x) ≃ f(x0) + (df/dx)|x0 Δx.
We then determine the correction Δx by calculating the point at which this linear approximation to f(x) crosses the x-axis:

f(x0) + (df/dx)|x0 Δx = 0   ⟹   Δx = −f(x0)/(df/dx)|x0.   (7.12)
The procedure is repeated starting at the improved x until some set level of precision is obtained.
The Newton–Raphson algorithm (7.12) requires evaluation of the derivative df/dx at each value of x0. In many cases, you may have an analytic expression for the derivative and can build it into the algorithm. However, especially for more complicated problems, it is simpler and less error-prone to use a numerical forward-difference approximation to the derivative2):

df/dx ≃ [f(x + δx) − f(x)]/δx,
where δx is some small change in x that you choose (different from the Δx used for searching in (7.12)). While a central-difference approximation for the derivative would be more accurate, it would require additional evaluations of the f’s, and once you find a zero, it does not matter how you got there. In Listing 7.2, we give the program NewtonCD.py, which implements the search with a central-difference derivative.
Listing 7.2 NewtonCD.py uses the Newton–Raphson method to search for a zero of the function f(x). A central-difference approximation is used to determine df/dx.
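Again as an illustration rather than a reproduction of the book's listing, a Newton–Raphson search with a central-difference derivative might look like this:

from math import cos

def f(x):
    return 2*cos(x) - x                        # same test function as before

def newton_cd(f, x, h=1e-4, eps=1e-8, nmax=100):
    for _ in range(nmax):
        fx = f(x)
        if abs(fx) < eps:                      # converged
            break
        dfdx = (f(x + h/2) - f(x - h/2))/h     # central-difference df/dx
        x -= fx/dfdx                           # Newton correction: Delta x = -f/f'
    return x

print(newton_cd(f, x=2.0))                     # converges to the zero near x = 1.03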
Two examples of possible problems with the Newton–Raphson algorithm are shown in Figure 7.3. In Figure 7.3a, we see a case where the search takes us to an x value where the function has a local minimum or maximum, that is, where df/dx = 0. Because the tangent there is horizontal and Δx = −f/f′, the correction involves division by zero, and so the next guess is x = ∞, from which it is hard to return. When this happens, you need to start your search with a different guess and pray that you do not fall into this trap again. In cases where the correction is very large but maybe not infinite, you may want to try backtracking (described below) and hope that by taking a smaller step you will not get into as much trouble.
In Figure 7.3b, we see a case where a search falls into an infinite loop surrounding the zero without ever getting there. A solution to this problem is called backtracking. As the name implies, in cases where the new guess x0 + Δx leads to an increase in the magnitude of the function, |f(x0 + Δx)| > |f(x0)|, you can backtrack somewhat and try a smaller guess, say, x0 + Δx/2. If the magnitude of f still increases, then you just need to backtrack some more, say, by trying x0 + Δx/4 as your next guess, and so forth. Because you know that the tangent line leads to a local decrease in |f|, eventually an acceptably small step should be found.
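One way to graft backtracking onto the Newton step is sketched below (an illustration only; the halving strategy and tolerances are our choices):

from math import cos

def f(x):
    return 2*cos(x) - x                        # same test function as above

def newton_backtrack(f, x, h=1e-4, eps=1e-8, nmax=100):
    fx = f(x)
    for _ in range(nmax):
        if abs(fx) < eps:
            break
        dfdx = (f(x + h/2) - f(x - h/2))/h     # numerical derivative
        step = -fx/dfdx                        # full Newton correction
        while abs(f(x + step)) > abs(fx) and abs(step) > eps:
            step /= 2.0                        # backtrack: halve the step
        x += step
        fx = f(x)
    return x

print(newton_backtrack(f, x=2.0))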
The problem in both these cases is that the initial guesses were not close enough to the regions where f(x) is approximately linear. So again, a good plot may help produce a good first guess. Alternatively, you may want to start your search with the bisection algorithm and then switch to the faster Newton–Raphson algorithm when you get closer to the zero.
Figure 7.3 Two examples of how the Newton–Raphson algorithm may fail if the initial guess is not in the region where f(x) can be approximated by a straight line. (a) A guess lands at a local minimum/maximum, that is, a place where the derivative vanishes, and so the next guess ends up at x = ∞. (b) The search has fallen into an infinite loop. The technique known as “backtracking” could eliminate this problem.
Problem Determine the magnetization M(T) as a function of temperature for simple magnetic materials.
A collection of N spin-1/2 particles, each with magnetic moment µ, is at temperature T. The collection has an external magnetic field B applied to it and comes to equilibrium with NL particles in the lower energy state (spins aligned with the magnetic field) and NU particles in the upper energy state (spins opposed to the magnetic field). The Boltzmann distribution law tells us that the relative probability of a state with energy E is proportional to exp(−E/kB T), where kB is Boltzmann’s constant. For a dipole with moment µ, its energy in a magnetic field is given by the dot product E = −µ · B. Accordingly, spin-up particles have lower energy in a magnetic field than spin-down particles, and thus are more probable.
Applying the Boltzmann distribution to our spin problem, we have that the number of particles in the lower energy level (spin up) is

NL ∝ e+µB/kB T,

while the number of particles in the upper energy level (spin down) is

NU ∝ e−µB/kB T,

with NL + NU = N.
As discussed in (Kittel, 2005), we now assume that the molecular magnetic field B = λM is much larger than the applied magnetic field and so replace B by the molecular field. This permits us to eliminate B from the preceding equations. The magnetization M(T) is given by the individual magnetic moment µ times the net number of particles pointing in the direction of the magnetic field:

M(T) = µ(NL − NU) = Nµ tanh(µλM(T)/kB T).   (7.17)
Note that this expression appears to make sense because as the temperature approaches zero, all spins will be aligned along the direction of B and so M(T = 0) = Nµ.
Solution via Searching Equation 7.17 relates the magnetization and the temperature. However, it is not really a solution to our problem because M appears on the LHS of the equation as well as within the hyperbolic function on the RHS. Generally, a transcendental equation of this sort does not have an analytic solution that would give M as simply a function of the temperature T. But by working backward, we can find a numerical solution. To do that we first express (7.17) in terms of the reduced magnetization m, the reduced temperature t, and the Curie temperature Tc:

m(t) = tanh(m(t)/t),   m = M/(Nµ),   t = T/Tc,   Tc = Nµ²λ/kB.   (7.18)
While it is no easier to find an analytic solution to (7.18) than it was to (7.17), the simpler form of (7.18) makes the programming easier as we search for values of t and m that satisfy (7.18).
Figure 7.4 A function of the reduced magnetization m at three reduced temperatures t. A zero of this function determines the value of the magnetization at a particular value of t.
One approach to a trial-and-error solution is to define a function

f(m, t) = m − tanh(m/t),
and then, for a variety of fixed t = ti values, search for those m values at which f(m, ti) = 0. (One could just as well fix the value of m to mj and search for the value of t for which f(mj, t) = 0; once you have a solution, you have a solution.) Each zero so found gives us a single value of m(ti). A plot or a table of these values for a range of ti values then provides the best we can do as the desired solution m(t).
Figure 7.4 shows three plots of f(m,t) as a function of the reduced magnetization m, each plot for a different value of the reduced temperature. As you can see, other than the uninteresting solution at m = 0, there is only one solution (a zero) near m = 1 for t = 0.5, and no solution at other temperatures.
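Assuming the reduced form m = tanh(m/t) reconstructed above, a bisection search for the nonzero root of f(m, t) at several temperatures might be sketched as follows:

from math import tanh

def f(m, t):
    return m - tanh(m/t)          # assumes the reduced form (7.18)-(7.19) above

def reduced_magnetization(t, eps=1e-8, nmax=100):
    m_lo, m_hi = 1e-6, 1.0        # f < 0 just above m = 0, f > 0 at m = 1 for t < 1
    for _ in range(nmax):
        m = (m_lo + m_hi)/2.0     # bisect, keeping f(m_lo) < 0 and f(m_hi) > 0
        if f(m, t) > 0.0:
            m_hi = m
        else:
            m_lo = m
        if abs(f(m, t)) < eps:
            break
    return m

for t in (0.25, 0.5, 0.75):       # reduced temperatures below t = 1
    print(t, reduced_magnetization(t))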
Problem The cross sections measured for the resonant scattering of neutrons from a nucleus are given in Table 7.1. Your problem is to determine values for the cross sections at energy values lying between those in the table.
You can solve this problem in a number of ways. The simplest is to numerically interpolate between the values of the experimental g(Ei) given in Table 7.1. This is direct and easy but does not account for there being experimental noise in the data. A more appropriate solution (discussed in Section 7.7) is to find the best fit of a theoretical function to the data. We start with what we believe to be the “correct” theoretical description of the data,

f(E) = fr / [(E − Er)² + Γ²/4],   (7.21)
Table 7.1 Experimental values for a scattering cross section (f(E) in the theory), each with absolute error ±σi, as a function of energy (xi in the theory).
i = | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Ei (MeV) | 0 | 25 | 50 | 75 | 100 | 125 | 150 | 175 | 200 |
g(Ei) (mb) | 10.6 | 16.0 | 45.0 | 83.5 | 52.8 | 19.9 | 10.8 | 8.25 | 4.7 |
Error (mb) | 9.34 | 17.9 | 41.5 | 85.5 | 51.5 | 21.5 | 10.8 | 6.29 | 4.14 |
where fr, Er, and Γ are unknown parameters. We then adjust the parameters to obtain the best fit. This is a best fit in a statistical sense but in fact may not pass through all (or any) of the data points. For an easy, yet effective, introduction to statistical data analysis, we recommend (Bevington and Robinson, 2002).
These two techniques of interpolation and least-squares fitting are powerful tools that let you treat tables of numbers as if they were analytic functions and sometimes let you deduce statistically meaningful constants or conclusions from measurements. In general, you can view data fitting as global or local. In global fits, a single function in x is used to represent the entire set of numbers in a table like Table 7.1. While it may be spiritually satisfying to find a single function that passes through all the data points, if that function is not the correct function for describing the data, the fit may show nonphysical behavior (such as large oscillations) between the data points. The rule of thumb is that if you must interpolate, keep it local and view global interpolations with a critical eye.
Consider Table 7.1 as ordered data that we wish to interpolate. We call the independent variable x and its tabulated values xi (i = 1, 2, …), and assume that the dependent variable is the function g(x), with the tabulated values gi = g(xi). We assume that g(x) can be approximated as an (n − 1)-degree polynomial in each interval i:

g(x) ≃ a0 + a1x + a2x² + ⋯ + an−1xⁿ⁻¹,   x ≃ xi.   (7.22)
Because our fit is local, we do not assume that one g(x) can fit all the data in the table but instead use a different polynomial, that is, a different set of ai values, for each interval. While each polynomial is of low degree, multiple polynomials are needed to span the entire table. If some care is taken, the set of polynomials so obtained will behave well enough to be used in further calculations without introducing much unwanted noise or discontinuities.
The classic interpolation formula was created by Lagrange. He figured out a closed-form expression that directly fits the (n − 1)-order polynomial (7.22) to n values of the function g(x) evaluated at the points xi. The formula for each interval is written as the sum of polynomials:

g(x) ≃ g1λ1(x) + g2λ2(x) + ⋯ + gnλn(x),   λi(x) = ∏j≠i (x − xj)/(xi − xj).   (7.23)
For three points, (7.23) provides a second-degree polynomial, while for eight points it gives a seventh-degree polynomial. For example, assume that we are given the points and function values
With four points, the Lagrange formula determines a third-order polynomial that reproduces each of the tabulated values:
As a check, we see that
If the data contain little noise, this polynomial can be used with some confidence within the range of the data, but with risk beyond the range of the data.
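A direct transcription of the Lagrange sum (7.23) into code is short; as an illustration we evaluate it with the nine cross-section points of Table 7.1 (which, as discussed below, amounts to an eighth-degree polynomial):

def lagrange(x, xs, gs):
    # Evaluate the Lagrange polynomial through the points (xs[i], gs[i]) at x
    total = 0.0
    for i in range(len(xs)):
        term = gs[i]
        for j in range(len(xs)):
            if j != i:
                term *= (x - xs[j])/(xs[i] - xs[j])
        total += term
    return total

E = [0, 25, 50, 75, 100, 125, 150, 175, 200]               # energies (MeV)
g = [10.6, 16.0, 45.0, 83.5, 52.8, 19.9, 10.8, 8.25, 4.7]  # cross sections (mb)

print(lagrange(70.0, E, g))   # interpolated value between the tabulated points
print(lagrange(75.0, E, g))   # reproduces the tabulated value 83.5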
Notice that Lagrange interpolation has no restriction that the points xi be evenly spaced. Usually, the Lagrange fit is made to only a small region of the table with a small value of n, despite the fact that the formula works perfectly well for fitting a high-degree polynomial to the entire table. The difference between the value of the polynomial evaluated at some x and that of the actual function can be shown to be the remainder

Rn ≃ (x − x1)(x − x2)⋯(x − xn) g⁽ⁿ⁾(ζ)/n!,
where ζ lies somewhere in the interpolation interval. What is significant here is that if g(x) has significant high derivatives, then it cannot be approximated well by a polynomial. For example, a table of noisy data would have significant high derivatives.
Consider the experimental neutron scattering data in Table 7.1. The expected theoretical functional form that describes these data is (7.21), and our empirical fits to these data are shown in Figure 7.5.
This example shows how easy it is to go wrong with a high-degree-polynomial fit to data with errors. Although the polynomial is guaranteed to pass through all the data points, the representation of the function away from these points can be quite unrealistic. Using a low-order interpolation formula, say, n = 2 or 3, in each interval usually eliminates the wild oscillations, but may not have any theoretical justification. If these local fits are matched together, as we discuss in the next section, a rather continuous curve results. Nonetheless, you must recall that if the data contain errors, a curve that actually passes through them may lead you astray. We discuss how to do this properly with least-squares fitting in Section 7.7.
If you tried to interpolate the resonant cross section with Lagrange interpolation, then you saw that fitting parabolas (three-point interpolation) within a table may avoid the erroneous and possibly catastrophic deviations of a high-order formula. (A two-point interpolation, which connects the points with straight lines, may not lead you far astray, but it is rarely pleasing to the eye or precise.) A sophisticated variation of an n = 4 interpolation, known as cubic splines, often leads to surprisingly eye-pleasing fits. In this approach (Figure 7.5), cubic polynomials are fit to the function in each interval, with the additional constraint that the first and second derivatives of the polynomials be continuous from one interval to the next. This continuity of slope and curvature is what makes the spline fit particularly eye-pleasing. It is analogous to what happens when you use the flexible spline drafting tool (a lead wire within a rubber sheath) from which the method draws its name.
Figure 7.5 Three fits to data. Dashed: Lagrange interpolation using an eighth-degree polynomial; short dashes: cubic-spline fit; long dashes: least-squares parabola fit.
The series of cubic polynomials obtained by spline-fitting a table of data can be integrated and differentiated and is guaranteed to have well-behaved derivatives. The existence of meaningful derivatives is an important consideration. As a case in point, if the interpolated function is a potential, you can take the derivative to obtain the force. The complexity of simultaneously matching polynomials and their derivatives over all the interpolation points leads to many simultaneous linear equations to be solved. This makes splines unattractive for hand calculation, yet easy for computers and, not surprisingly, popular in both calculations and computer drawing programs. To illustrate, the smooth solid curve in Figure 7.5 is a spline fit.
The basic approximation of splines is the representation of the function g(x) in the subinterval [xi, xi+1] with a cubic polynomial:

g(x) ≃ gi(x) = gi + gi′(x − xi) + (1/2) gi″(x − xi)² + (1/6) gi‴(x − xi)³.   (7.30)
This representation makes it clear that the coefficients in the polynomial equal the values of g(x) and its first, second, and third derivatives at the tabulated points xi. Derivatives beyond the third vanish for a cubic. The computational chore is to determine these derivatives in terms of the N tabulated gi values. The matching of gi at the nodes that connect one interval to the next provides the equations

gi(xi+1) = gi+1(xi+1),   i = 1, …, N − 1.
The matching of the first and second derivatives at each interval’s boundaries provides the equations

gi′(xi+1) = gi+1′(xi+1),   gi″(xi+1) = gi+1″(xi+1).
The additional equations needed to determine all constants are obtained by matching the third derivatives at adjacent nodes. Values for the third derivatives are found by approximating them in terms of the second derivatives:

gi‴ ≃ (gi+1″ − gi″)/(xi+1 − xi).   (7.33)
As discussed in Chapter 5, a central-difference approximation would be more accurate than a forward-difference approximation, yet (7.33) keeps the equations simpler.
It is straightforward yet complicated to solve for all the parameters in (7.30). We leave that to the references (Thompson, 1992; Press et al., 1994). We can see, however, that matching at the boundaries of the intervals results in only (N – 2) linear equations for N unknowns. Further input is required. It usually is taken to be the boundary conditions at the endpoints a = x1 and b = xN, specifically, the second derivatives g″(a) and g″(b). There are several ways to determine these second derivatives:
Natural spline: Set g″(a) = g″(b) = 0; that is, permit the function to have a slope at the endpoints but no curvature. This is “natural” because the derivative vanishes for the flexible spline drafting tool (its ends being free).
Input values for g′ at the boundaries: The computer uses g′(a) to approximate g″(a). If you do not know the first derivatives, you can calculate them numerically from the table of gi values.
Input values for g″ at the boundaries: Knowing values is of course better than approximating values, but it requires the user to input information. If the values of g″ are not known, they can be approximated by applying a forward-difference approximation to the tabulated values:
A powerful integration scheme is to fit an integrand with splines and then integrate the cubic polynomials analytically. If the integrand g(x) is known only at its tabulated values, then this is about as good an integration scheme as is possible; if you have the ability to calculate the function directly for arbitrary x, Gaussian quadrature may be preferable. We know that the spline fit to g in each interval is the cubic (7.30). It is easy to integrate this cubic to obtain the integral of g for each interval and then to sum over all intervals:

∫ from xi to xi+1 g(x) dx ≃ gi hi + (1/2) gi′ hi² + (1/6) gi″ hi³ + (1/24) gi‴ hi⁴,   hi = xi+1 − xi,   (7.36)

∫ from x1 to xN g(x) dx = Σi ∫ from xi to xi+1 g(x) dx.
Making the intervals smaller does not necessarily increase precision, as subtractive cancelations in (7.36) may get large.
Spline Fit of Cross Section (Implementation) Fitting a series of cubics to data is a little complicated to program yourself, so we recommend using a library routine. While we have found quite a few Java-based spline applications available on the Internet, none seemed appropriate for interpolating a simple set of numbers. That being the case, we have adapted the splint.c and spline.c functions from (Press et al., 1994) to produce the SplineInteract.py program shown in Listing 7.3 (there is also an applet). Your problem for this section is to carry out the assessment in Section 7.5.1 using cubic spline interpolation rather than Lagrange interpolation.
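If you would rather not adapt the Numerical Recipes routines, SciPy's CubicSpline provides the same kind of cubic-spline interpolation; the sketch below (not the book's SplineInteract.py) applies a natural spline to the Table 7.1 data:

import numpy as np
from scipy.interpolate import CubicSpline

E = np.array([0., 25., 50., 75., 100., 125., 150., 175., 200.])       # MeV
g = np.array([10.6, 16.0, 45.0, 83.5, 52.8, 19.9, 10.8, 8.25, 4.7])   # mb

spl = CubicSpline(E, g, bc_type='natural')   # natural spline: g'' = 0 at both ends

print(spl(70.0))                 # interpolated cross section at 70 MeV
print(spl(70.0, 1))              # first derivative of the fit at 70 MeV
print(spl.integrate(0., 200.))   # the piecewise cubics integrated analytically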
Problem Figure 7.6 presents actual experimental data on the number of decays ΔN of the π meson as a function of time (Stetz et al., 1973). Notice that the time has been “binned” into Δt = 10 ns intervals and that the smooth curve is the theoretical exponential decay expected for very large numbers of pions (which is not the case here). Your problem is to deduce the lifetime τ of the π meson from these data (the tabulated lifetime of the pion is 2.6 × 10–8 s).
Theory Assume that we start with N0 particles at time t = 0 that can decay to other particles.3) If we wait a short time Δt, then a small number ΔN of the particles will decay spontaneously, that is, with no external influences. This decay is a stochastic process, which means that there is an element of chance involved in just when a decay will occur, and so no two experiments are expected to give exactly the same results. The basic law of nature for spontaneous decay is that the number of decays ΔN in a time interval Δt is proportional to the number of particles N(t) present at that time and to the time interval:

ΔN(t) = −(1/τ) N(t) Δt   ⟹   ΔN(t)/Δt = −N(t)/τ.   (7.38)
Listing 7.3 SplineInteract.py performs a cubic spline fit to data and permits interactive control. The arrays x[] and y[] are the data to fit, and the values of the fit at Nfit points are output.
Figure 7.6 A reproduction of the experimental measurement by Stetz et al. (1973) giving the number of decays of a π meson as a function of time since its creation. Measurements were made during time intervals (box sizes) of 10-ns width. The dashed curve is the result of a linear least-squares fit to the log N(t).
Here τ = 1/λ is the lifetime of the particle, with λ the rate parameter. The actual decay rate is given by the second equation in (7.38). If the number of decays ΔN is very small compared to the number of particles N, and if we look at vanishingly small time intervals, then the difference equation (7.38) becomes the differential equation

dN(t)/dt ≃ −(1/τ) N(t).
This differential equation has an exponential solution for the number as well as for the decay rate:

N(t) = N0 e−t/τ,   dN(t)/dt = −(N0/τ) e−t/τ.   (7.40)
Equation 7.40 is the theoretical formula we wish to “fit” to the data in Figure 7.6. The output of such a fit is a “best value” for the lifetime τ.
Books have been written and careers have been spent discussing what is meant by a “good fit” to experimental data. We cannot do justice to the subject here and refer the reader to Bevington and Robinson (2002); Press et al. (1994); Thompson (1992). However, we will emphasize three points:
Imagine that you have measured ND data values of the dependent variable y as a function of the independent variable x:

(xi, yi ± σi),   i = 1, …, ND,
where ±σi is the experimental uncertainty in the ith value of y. (For simplicity we assume that all the errors σi occur in the dependent variable, although this is hardly ever true (Thompson, 1992).) For our problem, y is the number of decays as a function of time, and xi are the times. Our goal is to determine how well a mathematical function y = g(x) (also called a theory or a model) can describe these data. Additionally, if the theory contains some parameters or constants, our goal can be viewed as determining the best values for these parameters. We assume that the theory function g(x) contains, in addition to the functional dependence on x, an additional dependence upon MP parameters {a1, a2, …, aMP}. Notice that the parameters {am} are not variables, in the sense of numbers read from a meter, but rather are parts of the theoretical model, such as the size of a box, the mass of a particle, or the depth of a potential well. For the exponential decay function (7.40), the parameters are the lifetime τ and the initial decay rate dN(0)/dt. We include the parameters as

g(x) = g(x; {a1, a2, …, aMP}) = g(x; {am}),
where the ai’s are parameters and x the independent variable. We use the chi-square (χ²) measure (Bevington and Robinson, 2002) as a gauge of how well a theoretical function g reproduces data:

χ² = Σi [(yi − g(xi; {am}))/σi]²,   (7.43)
where the sum is over the ND experimental points (xi, yi ±σi). The definition (7.43) is such that smaller values of χ2 are better fits, with χ2 = 0 occurring if the theoretical curve went through the center of every data point. Notice also that the weighting means that measurements with larger errors4) contribute less to χ2.
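In code, the definition (7.43) takes only a few lines; a minimal sketch (the function and argument names are ours):

def chi2(g, params, xs, ys, sigmas):
    # Chi-square (7.43) for data (x_i, y_i +- sigma_i) and a model g(x, params)
    return sum(((y - g(x, params))/s)**2 for x, y, s in zip(xs, ys, sigmas))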
Least-squares fitting refers to adjusting the parameters in the theory until a minimum in χ2 is found, that is, finding a curve that produces the least value for the summed squares of the deviations of the data from the function g(x). In general, this is the best fit possible and the best way to determine the parameters in a theory. The MP parameters {am, m = 1, …, MP} that make χ2 an extremum are found by solving the MP equations:

∂χ²/∂am = 0   ⟹   Σi [(yi − g(xi))/σi²] ∂g(xi)/∂am = 0,   m = 1, …, MP.   (7.44)
Often, the function g(x; {am}) has a sufficiently complicated dependence on the am values for (7.44) to produce MP simultaneous nonlinear equations in the am values. In these cases, solutions are found by a trial-and-error search through the MP-dimensional parameter space, as we do in Section 7.8.2. To be safe, when such a search is completed, you need to check that the minimum χ2 you found is global and not local. One way to do that is to repeat the search for a whole grid of starting values, and if different minima are found, to pick the one with the lowest χ2.
When the deviations from theory are a result of random errors and when these errors are described by a Gaussian distribution, there are some useful rules of thumb to remember (Bevington and Robinson, 2002). You know that your fit is good if the value of χ2 calculated via the definition (7.43) is approximately equal to the number of degrees of freedom, χ² ≃ ND − MP, where ND is the number of data points and MP is the number of parameters in the theoretical function. If your χ2 is much less than ND − MP, it does not mean that you have a “great” theory or a really precise measurement; instead, you probably have too many parameters or have assigned errors (σi values) that are too large. In fact, too small a χ2 may indicate that you are fitting the random scatter in the data rather than missing approximately one-third of the error bars, as would be expected if the errors are random. If your χ2 is significantly greater than ND − MP, the theory may not be good, you may have significantly underestimated your errors, or you may have errors that are not random.
The MP simultaneous equations (7.44) can be simplified considerably if the functions g(x; {am}) depend linearly on the parameter values ai, for example,

g(x; {a1, a2}) = a1 + a2x.
Figure 7.7 A linear least-squares best fit of a straight line to data. The deviation of theory from experiment is greater than would be expected from statistics, which means that a straight line is not a good theory to describe these data.
In this case (also known as linear regression or a straight-line fit), as shown in Figure 7.7, there are MP = 2 parameters, the slope a2 and the y-intercept a1. Notice that while there are only two parameters to determine, there still may be an arbitrary number ND of data points to fit. Remember, a unique solution is not possible unless the number of data points is equal to or greater than the number of parameters. For this linear case, there are just two derivatives,

∂g(xi)/∂a1 = 1,   ∂g(xi)/∂a2 = xi,
and after substitution, the χ2 minimization equations (7.44) can be solved (Press et al., 1994):

a1 = (Sxx Sy − Sx Sxy)/Δ,   a2 = (S Sxy − Sx Sy)/Δ,
S = Σi 1/σi²,   Sx = Σi xi/σi²,   Sy = Σi yi/σi²,   Sxx = Σi xi²/σi²,   Sxy = Σi xi yi/σi²,   Δ = S Sxx − Sx².   (7.47)–(7.49)
Statistics also gives you an expression for the variance or uncertainty in the deduced parameters:

σ²a1 = Sxx/Δ,   σ²a2 = S/Δ.
This is a measure of the uncertainties in the values of the fitted parameters arising from the uncertainties σi in the measured yi values. A measure of the dependence of the parameters on each other is given by the correlation coefficient:

ρ(a1, a2) = cov(a1, a2)/(σa1 σa2),   cov(a1, a2) = −Sx/Δ.
Here cov(a1, a2) is the covariance of a1 and a2 and vanishes if a1 and a2 are independent. The correlation coefficient ρ(a1, a2) lies in the range −1 ≤ ρ ≤ 1, with a positive ρ indicating that the errors in a1 and a2 are likely to have the same sign, and a negative ρ indicating opposite signs.
The preceding analytic solutions for the parameters are of the form found in statistics books but are not optimal for numerical calculations because subtractive cancelation can make the answers unstable. As discussed in Chapter 3, a rearrangement of the equations can decrease this type of error. For example, Thompson (1992) gives improved expressions that measure the data relative to their averages:
In Fit.py in Listing 7.4 we give a program that fits a parabola to some data. You can use it as a model for fitting a line to data, although you can use our closed-form expressions for a straight-line fit. In Fit.py on the instructor’s site, we give a program for fitting to the decay data.
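A minimal sketch of the closed-form straight-line fit, written in terms of the S sums defined with (7.47)–(7.49) above (our own illustration, not the book's Fit.py):

def linear_fit(xs, ys, sigmas):
    # Closed-form weighted fit of g(x) = a1 + a2*x to the data (xs, ys +- sigmas)
    S   = sum(1.0/s**2 for s in sigmas)
    Sx  = sum(x/s**2 for x, s in zip(xs, sigmas))
    Sy  = sum(y/s**2 for y, s in zip(ys, sigmas))
    Sxx = sum(x*x/s**2 for x, s in zip(xs, sigmas))
    Sxy = sum(x*y/s**2 for x, y, s in zip(xs, ys, sigmas))
    Delta = S*Sxx - Sx*Sx
    a1 = (Sxx*Sy - Sx*Sxy)/Delta          # intercept
    a2 = (S*Sxy - Sx*Sy)/Delta            # slope
    return a1, a2, Sxx/Delta, S/Delta     # parameters and their variances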
Taking the logarithm of the second equation in (7.40) gives ln |dN/dt| = ln(N0/τ) − t/τ. This means that if we treat ln |ΔN(t)/Δt| as the dependent variable and the time t as the independent variable, we can use our linear-fit results. Plot ln |ΔN/Δt| vs. t.
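One way to set that fit up, reusing the linear_fit sketch above (the binned counts themselves must be read off Figure 7.6, and the Poisson error estimate is our own assumption):

import numpy as np

def pion_lifetime(t, dN, dt=10.0):
    # t: bin centers (ns); dN: decays per bin of width dt, read from Figure 7.6
    y = np.log(np.abs(dN/dt))                 # dependent variable ln|dN/dt|
    sigma_y = 1.0/np.sqrt(np.abs(dN))         # rough Poisson estimate of the error in y
    a1, a2, var1, var2 = linear_fit(t, y, sigma_y)
    return -1.0/a2                            # slope = -1/tau, so tau = -1/slope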
Table 7.2 Temperature vs. distance as measured along a metal rod.
xi (cm) | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 7.0 | 8.0 | 9.0 |
Ti (C) | 14.6 | 18.5 | 36.6 | 30.8 | 59.2 | 60.1 | 62.2 | 79.4 | 99.9 |
Table 7.3 Distance vs. radial velocity for 24 extragalactic nebulae.
Object | r (Mpc) | v (km/s) | Object | r (Mpc) | v (km/s) |
0.032 | 170 | 3627 | 0.9 | 650 | |
0.034 | 290 | 4826 | 0.9 | 150 | |
6822 | 0.214 | –130 | 5236 | 0.9 | 500 |
598 | 0.263 | –70 | 1068 | 1.0 | 920 |
221 | 0.275 | –185 | 5055 | 1.1 | 450 |
224 | 0.275 | –220 | 7331 | 1.1 | 500 |
5457 | 0.45 | 200 | 4258 | 1.4 | 500 |
4736 | 0.5 | 290 | 4141 | 1.7 | 960 |
5194 | 0.5 | 270 | 4382 | 2.0 | 500 |
4449 | 0.63 | 200 | 4472 | 2.0 | 850 |
4214 | 0.8 | 300 | 4486 | 2.0 | 800 |
3031 | 0.9 | –30 | 4649 | 2.0 | 1090 |
As indicated earlier, as long as the function being fitted depends linearly on the unknown parameters ai, the condition of minimum χ2 leads to a set of simultaneous linear equations for the a’s that can be solved by hand or on the computer using matrix techniques. To illustrate, suppose we want to fit the quadratic polynomial

g(x) = a1 + a2x + a3x²
to the experimental measurements (xi, yi), i = 1, …, ND (Figure 7.8). Because this g(x) is linear in all the parameters ai, we can still make a linear fit even though x is raised to the second power. (However, if we tried to fit a function of the form g(x) = (a1 + a2x) exp(−a3x) to the data, then we would not be able to make a linear fit because there is no linear dependence on a3.)
Figure 7.8 A linear least-squares best fit of a parabola to data. Here we see that the fit misses approximately one-third of the points, as expected from the statistics for a good fit.
The best fit of this quadratic to the data is obtained by applying the minimum χ2 condition (7.44) for MP = 3 parameters and ND (still arbitrary) data points. A solution represents the maximum likelihood that the deduced parameters provide a correct description of the data for the theoretical function g(x). Equation 7.44 leads to the three simultaneous equations for a1, a2, and a3:

Σi [yi − (a1 + a2xi + a3xi²)]/σi² = 0,
Σi [yi − (a1 + a2xi + a3xi²)] xi/σi² = 0,
Σi [yi − (a1 + a2xi + a3xi²)] xi²/σi² = 0.   (7.58)–(7.60)
Note: Because the derivatives are independent of the parameters (the a’s), the a dependence arises only from the term in square brackets in the sums, and because that term has only a linear dependence on the a’s, these equations are linear in the a’s.
Exercise Show that after some rearrangement, (7.58)–(7.60) can be written as

S a1 + Sx a2 + Sxx a3 = Sy,
Sx a1 + Sxx a2 + Sxxx a3 = Sxy,
Sxx a1 + Sxxx a2 + Sxxxx a3 = Sxxy.   (7.61)
Here the definitions of the S’s are simple extensions of those used in (7.47)–(7.49) (for example, Sxxx = Σi xi³/σi² and Sxxy = Σi xi²yi/σi²) and are programmed in Fit.py shown in Listing 7.4. After placing the three unknown parameters into a vector a and the three known RHS terms in (7.61) into a vector s, these equations assume the matrix form

S a = s,   S = [[S, Sx, Sxx], [Sx, Sxx, Sxxx], [Sxx, Sxxx, Sxxxx]],   a = (a1, a2, a3)ᵀ,   s = (Sy, Sxy, Sxxy)ᵀ.

The solution for the parameter vector a is obtained by solving the matrix equations. Although for 3 × 3 matrices we can write out the solution in closed form, for larger problems the numerical solution requires matrix methods.
Listing 7.4 Fit.py performs a least-squares fit of a parabola to data using the NumPy linalg package to solve the set of linear equations Sa = s.
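As a sketch of the same idea (not the book's Fit.py), the matrix S and vector s can be built and solved with NumPy; here we assume equal weights σi = 1 and borrow the Table 7.2 numbers purely as illustration data:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])           # cm (Table 7.2)
y = np.array([14.6, 18.5, 36.6, 30.8, 59.2, 60.1, 62.2, 79.4, 99.9])  # temperatures
w = np.ones_like(y)                       # weights 1/sigma_i**2, here all taken as 1

# Build the 3x3 matrix of S sums and the right-hand-side vector s of (7.61)
S = np.array([[np.sum(w*x**(m + n)) for m in range(3)] for n in range(3)])
s = np.array([np.sum(w*y*x**n) for n in range(3)])

a = np.linalg.solve(S, s)                 # parameters of a1 + a2*x + a3*x**2
print(a)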
Linear Quadratic Fit Assessment
Problem Remember how earlier in this chapter we interpolated the values in Table 7.1 in order to obtain the experimental cross section as a function of energy. Although we did not use it, we also gave the theory describing these data, namely, the Breit–Wigner resonance formula (7.21):

f(E) = fr / [(E − Er)² + Γ²/4].   (7.64)
Your problem is to determine what values for the parameters (Er, fr, and Γ) in (7.64) provide the best fit to the data in Table 7.1.
Because (7.64) is not a linear function of the parameters (Er, fr, Γ), the three equations that result from minimizing χ2 are not linear equations and so cannot be solved by the techniques of linear algebra (matrix methods). However, in our study of the masses-on-a-string problem, we showed how to use the Newton–Raphson algorithm to search for solutions of simultaneous nonlinear equations. That technique involved expansion of the equations about the previous guess to obtain a set of linear equations and then solving the linear equations with the matrix libraries. We now use this same combination of fitting, trial-and-error searching, and matrix algebra to conduct a nonlinear least-squares fit of (7.64) to the data in Table 7.1.
Recall that the condition for a best fit is to find values of the MP parameters am in the theory g(x; am) that minimize χ². This leads to the MP equations (7.44) to solve:

Σi [(yi − g(xi; a))/σi²] ∂g(xi; a)/∂am = 0,   m = 1, …, MP.   (7.65)
To find the form of these equations appropriate to our problem, we rewrite our theory function (7.64) in the notation of (7.65):

x = E,   a1 = fr,   a2 = Er,   a3 = Γ²/4,   g(x; {a1, a2, a3}) = a1/[(x − a2)² + a3].
The three derivatives required in (7.65) are then

∂g/∂a1 = 1/[(x − a2)² + a3],   ∂g/∂a2 = 2a1(x − a2)/[(x − a2)² + a3]²,   ∂g/∂a3 = −a1/[(x − a2)² + a3]².
Substitution of these derivatives into the best-fit condition (7.65) yields three simultaneous equations in a1, a2, and a3 that we need to solve in order to fit the ND = 9 data points (xi, yi) in Table 7.1:

Σi [(yi − g(xi; a))/σi²] ∂g(xi; a)/∂am = 0,   m = 1, 2, 3.   (7.69)
Even without the substitution of (7.64) for g(x; a), it is clear that these three equations depend on the a’s in a nonlinear fashion. That is okay because in Section 6.1.2 we derived the N-dimensional Newton–Raphson search for the roots of

fi(a1, a2, …, aN) = 0,   i = 1, …, N,
where we have made the change of variable yi → ai for the present problem. We use that same formalism here for the N = 3 equations (7.69) by writing them as

fm(a1, a2, a3) = 0,   m = 1, 2, 3,

where fm denotes the LHS of (7.69).
Because a1 ≡ fr sets the height of the peak, a2 ≡ Er is the energy at which the peak occurs, and Γ = 2√a3 is the full width of the peak at half-maximum, good guesses for the a’s can be extracted from a graph of the data. To obtain the nine derivatives of the three f’s with respect to the three unknown a’s, we use two nested loops over i and j, along with the forward-difference approximation for the derivative

∂fi/∂aj ≃ [fi(aj + Δaj) − fi(aj)]/Δaj,
where Δaj corresponds to a small, say ≤1%, change in the parameter value.
Nonlinear Fit Implementation Use the Newton–Raphson algorithm as outlined in Section 7.8.2 to conduct a nonlinear search for the best-fit parameters of the Breit–Wigner theory (7.64) to the data in Table 7.1. Compare the deduced values of (fr, Er, Γ) to those obtained by inspection of the graph.
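As a possible starting point, here is a sketch of the search under the parameterization g(x; a) = a1/((x − a2)² + a3) assumed above, with rough starting guesses read off the data; if the search wanders, backtracking or different guesses may be needed:

import numpy as np

E   = np.array([0., 25., 50., 75., 100., 125., 150., 175., 200.])        # x_i (MeV)
y   = np.array([10.6, 16.0, 45.0, 83.5, 52.8, 19.9, 10.8, 8.25, 4.7])    # g(E_i) (mb)
sig = np.array([9.34, 17.9, 41.5, 85.5, 51.5, 21.5, 10.8, 6.29, 4.14])   # errors (mb)

def g(x, a):                                   # assumed Breit-Wigner parameterization
    return a[0]/((x - a[1])**2 + a[2])

def dg(x, a):                                  # analytic derivatives of g wrt a1, a2, a3
    D = (x - a[1])**2 + a[2]
    return np.array([1.0/D, 2.0*a[0]*(x - a[1])/D**2, -a[0]/D**2])

def f_vec(a):                                  # the three best-fit conditions (7.69)
    r = (y - g(E, a))/sig**2
    return np.array([np.sum(r*dg(E, a)[m]) for m in range(3)])

a = np.array([6.0e4, 75.0, 750.0])             # rough guesses read off the data
for _ in range(100):
    F = f_vec(a)
    J = np.empty((3, 3))
    for j in range(3):                         # forward-difference Jacobian
        da = np.zeros(3)
        da[j] = 0.01*a[j]                      # ~1% change in parameter a_j
        J[:, j] = (f_vec(a + da) - F)/da[j]
    delta = np.linalg.solve(J, -F)             # Newton step: J (Delta a) = -f
    a = a + delta
    if np.max(np.abs(delta/a)) < 1e-6:         # parameters no longer changing
        break

fr, Er, Gamma = a[0], a[1], 2.0*np.sqrt(a[2])
print(fr, Er, Gamma)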