7
Trial-and-Error Searching and Data Fitting

7.1 Problem 1: A Search for Quantum States in a Box

Many computer techniques are well-defined sets of procedures leading to definite outcomes. In contrast, some computational techniques are trial-and-error algorithms in which decisions on what path to follow are made based on the current values of variables, and the program quits only when it thinks it has solved the problem. (We already did some of this when we summed a power series until the terms became small.) Writing this type of program is usually interesting because we must foresee how to have the computer act intelligently in all possible situations, and running them is very much like an experiment in which it is hard to predict what the computer will come up with.

Problem Probably the most standard problem in quantum mechanics1) is to solve for the energies of a particle of mass m bound within a 1D square well of radius a:

(7.1)  V(x) = −V0,  for |x| ≤ a;   V(x) = 0,  for |x| > a.

As shown in quantum mechanics texts (Gottfried, 1966), the energies of the bound states EB within this well are solutions of the transcendental equations

(7.2)  √(10 − EB) tan(√(10 − EB)) = √EB   (even),
(7.3)  √(10 − EB) cot(√(10 − EB)) = −√EB   (odd),

where even and odd refer to the symmetry of the wave function. Here we have chosen units such that ℏ = 2m = a = 1, so that the strength of the well enters the equations as the dimensionless number 10. Your problem is to

  1. Find several bound-state energies EB for even wave functions, that is, the solution of (7.2).
  2. See if making the potential deeper, say, by changing the 10 to a 20 or a 30, produces a larger number of, or deeper, bound states.

7.2 Algorithm: Trial-and-Error Roots via Bisection

Trial-and-error root finding looks for a value of x for which

(7.4)  f(x) = 0,

where the 0 on the right-hand side is conventional (an equation such as 10 sin x = 3x³ can easily be written as f(x) = 10 sin x − 3x³ = 0). The search procedure starts with a guessed value for x, substitutes that guess into f(x) (the “trial”), and then sees how far the LHS is from zero (the “error”). The program then changes x based on the error and tries out the new guess in f(x). The procedure continues until f(x) ≃ 0 to some desired level of precision, or until the changes in x are insignificant, or if the search seems endless.

The most elementary trial-and-error technique is the bisection algorithm. It is reliable but slow. If you know some interval in which f(x) changes sign, then the bisection algorithm will always converge to the root by finding progressively smaller and smaller intervals within which the zero lies. Other techniques, such as the Newton–Raphson method we describe next, may converge more quickly, but if the initial guess is not close, it may become unstable and fail completely.


Figure 7.1 A graphical representation of the steps involved in solving for a zero of f(x) using the bisection algorithm. The bisection algorithm takes the midpoint of the interval as the new guess for x, and so each step reduces the interval size by one-half. Four steps are shown for the algorithm.

The basis of the bisection algorithm is shown in Figure 7.1. We start with two values of x between which we know that a zero occurs. (You can determine these by making a graph or by stepping through different x values and looking for a sign change.) To be specific, let us say that f(x) is negative at x− and positive at x+:

(7.5)  f(x−) < 0,   f(x+) > 0.

(Note that it may well be that x− > x+ if the function changes from positive to negative as x increases.) Thus, we start with the interval between x− and x+ within which we know a zero occurs. The algorithm (implemented as in Listing 7.1) then picks the new x as the bisection of the interval and selects as its new interval the half in which the sign change occurs:

x = (x+ + x−)/2;   if f(x) f(x+) > 0, set x+ = x; otherwise set x− = x.

This process continues until the value of f(x) is less than a predefined level of precision or until a predefined (large) number of subdivisions occurs.

Listing 7.1 Bisection.py is a simple implementation of the bisection algorithm for finding a zero of a function, in this case 2 cos x − x.

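The original listing is not reproduced here; as a stand-in, the following is a minimal sketch of such a bisection search. The bracketing interval, tolerance, and iteration limit are illustrative choices, not the book's exact code.

```python
# Sketch of a bisection search for a zero of f(x) = 2 cos x - x (not the book's Bisection.py).
from math import cos

def f(x):
    return 2.0 * cos(x) - x

def bisection(xminus, xplus, eps=1e-6, nmax=100):
    """Return x with |f(x)| < eps, given f(xminus) < 0 and f(xplus) > 0."""
    for it in range(nmax):
        x = (xplus + xminus) / 2.0          # bisect the current interval
        if f(xplus) * f(x) > 0.0:           # same sign as f(x+): zero lies toward x-
            xplus = x
        else:                               # opposite sign: zero lies toward x+
            xminus = x
        if abs(f(x)) < eps:                 # converged to the desired precision
            return x, it
    return x, nmax

# f(2) < 0 and f(0) > 0, so the zero of 2 cos x - x lies between them.
root, steps = bisection(2.0, 0.0)
print("root =", root, " f(root) =", f(root), " bisections =", steps)
```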

The example in Figure 7.1 shows the first interval extending from x− = x−1 to x+ = x+1. We bisect that interval at x, and because f(x) < 0 at the midpoint, we set x− ← x and label it x−2 to indicate the second step. We then use x+2 ≡ x+1 and x−2 as the next interval and continue the process. We see that only x− changes for the first three steps in this example, but for the fourth step x+ finally changes. The changes then become too small for us to show.

7.2.1 Implementation: Bisection Algorithm

  1. The first step in implementing any search algorithm is to get an idea of what your function looks like. For the present problem, you do this by making a plot of f(EB) = √(10 − EB) tan(√(10 − EB)) − √EB versus EB. Note from your plot some approximate values at which f(EB) = 0. Your program should be able to find more exact values for these zeros.
  2. Write a program that implements the bisection algorithm and use it to find some solutions of (7.2). (A sketch of such a search is given after this list.)
  3. Warning: Because the tan function has singularities, you have to be careful. In fact, your graphics program (or Maple) may not function accurately near these singularities. One cure is to use a different but equivalent form of the equation. Show that an equivalent form of (7.2) is √EB cot(√(10 − EB)) = √(10 − EB) (7.6).
  4. Make a second plot of (7.6), which also has singularities but at different places. Choose some approximate locations for zeros from this plot.
  5. Evaluate f(EB) and thus determine directly the precision of your solution.
  6. Compare the roots you find with those given by Maple or Mathematica.
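As promised in item 2, here is a minimal sketch of a bisection search applied to the even-state equation (7.2). The scan step, the cutoff used to skip the tan singularity, and the tolerance are illustrative choices.

```python
# Sketch: find the even bound-state energies from f(E) = sqrt(10-E) tan(sqrt(10-E)) - sqrt(E).
import numpy as np

def f(E):
    k = np.sqrt(10.0 - E)
    return k * np.tan(k) - np.sqrt(E)

def bisect(a, b, eps=1e-9):
    while abs(b - a) > eps:
        m = 0.5 * (a + b)
        if f(a) * f(m) > 0.0:    # same sign at a and m: zero lies in [m, b]
            a = m
        else:
            b = m
    return 0.5 * (a + b)

roots = []
Es = np.arange(0.001, 9.999, 0.01)          # scan the allowed range 0 < EB < 10
for a, b in zip(Es[:-1], Es[1:]):
    fa, fb = f(a), f(b)
    # Accept a sign change only when |f| is moderate at both ends, so the jump
    # across the tan singularity is not mistaken for a zero.
    if fa * fb < 0.0 and abs(fa) < 5.0 and abs(fb) < 5.0:
        roots.append(bisect(a, b))

print("even bound-state energies EB =", roots)
```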

7.3 Improved Algorithm: Newton–Raphson Searching

The Newton–Raphson algorithm finds approximate roots of the equation

(7.7)  f(x) = 0

more quickly than the bisection method. As we see graphically in Figure 7.2, this algorithm is the equivalent of drawing a straight line f(x) ≃ mx + b tangent to the curve at an x value close to where f(x) ≃ 0, and then using as an improved guess for the root the intercept of that line with the x axis, where mx + b = 0. If the “curve” were a straight line, the answer would be exact; otherwise, it is a good approximation if the guess is close enough to the root for f(x) to be nearly linear. The process continues until some set level of precision is reached. If a guess is in a region where f(x) is nearly linear (Figure 7.2), then the convergence is much more rapid than for the bisection algorithm.

The analytic formulation of the Newton–Raphson algorithm starts with an old guess x0 and expresses a new guess x as a correction Δx to the old guess:

(7.8)  x0 = old guess,   Δx = unknown correction,
(7.9)  x ≃ x0 + Δx   (new guess).

Figure 7.2 A graphical representation of the steps involved in solving for a zero of f(x) using the Newton–Raphson method. The Newton–Raphson method takes the new guess as the zero of the line tangent to f(x) at the old guess. Two guesses are shown.

We next expand the known function f(x) in a Taylor series around x0 and keep only the linear terms:

(7.10)  f(x) ≃ f(x0) + (df/dx)|x0 Δx.

We then determine the correction Δx by calculating the point at which this linear approximation to f(x) crosses the x-axis:

(7.11)  f(x0) + (df/dx)|x0 Δx ≃ 0,
(7.12)  Δx = −f(x0) / (df/dx)|x0.

The procedure is repeated starting at the improved x until some set level of precision is obtained.

The Newton–Raphson algorithm (7.12) requires evaluation of the derivative df / dx at each value of x0. In many cases, you may have an analytic expression for the derivative and can build it into the algorithm. However, especially for more complicated problems, it is simpler and less error-prone to use a numerical forward-difference approximation to the derivative2):

(7.13)  df/dx ≃ [f(x + δx) − f(x)] / δx,

where δx is some small change in x that you choose (different from the Δx used for searching in (7.12)). While a central-difference approximation for the derivative would be more accurate, it would require additional evaluations of the f’s, and, once you find a zero, it does not matter how you got there. In Listing 7.2, we give a program NewtonCD.py that implements the search with a central-difference derivative.

Listing 7.2 NewtonCD.py uses the Newton–Raphson method to search for a zero of the function f(x). A central-difference approximation is used to determine df/dx.

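The original listing is not reproduced here; the following is a minimal sketch of such a Newton–Raphson search with a central-difference derivative. The test function 2 cos x − x, the starting guess, and the tolerances are illustrative assumptions.

```python
# Sketch of a Newton-Raphson search with a central-difference df/dx (not the book's NewtonCD.py).
from math import cos

def f(x):
    return 2.0 * cos(x) - x

x = 2.0            # initial guess
dx = 1e-4          # step used for the numerical derivative
eps = 1e-8         # desired precision in |f(x)|

for it in range(100):
    fx = f(x)
    if abs(fx) <= eps:                                  # converged
        break
    df = (f(x + dx / 2) - f(x - dx / 2)) / dx           # central-difference derivative
    x = x - fx / df                                     # Newton-Raphson correction
print("root =", x, " f(root) =", f(x), " iterations =", it)
```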

7.3.1 Newton–Raphson with Backtracking

Two examples of possible problems with the Newton–Raphson algorithm are shown in Figure 7.3. In Figure 7.3a, we see a case where the search takes us to an x value where the function has a local minimum or maximum, that is, where df/dx = 0. Because Δx = −f/f′, a vanishing derivative (horizontal tangent) means division by zero, and so the next guess is x = ∞, from where it is hard to return. When this happens, you need to start your search with a different guess and pray that you do not fall into this trap again. In cases where the correction is very large, but maybe not infinite, you may want to try backtracking (described below) and hope that by taking a smaller step you will not get into as much trouble.

In Figure 7.3b, we see a case where a search falls into an infinite loop surrounding the zero without ever getting there. A solution to this problem is called backtracking. As the name implies, in cases where the new guess x0 + Δx leads to an increase in the magnitude of the function, |f(x0 + Δx)| > |f(x0)|, you can backtrack somewhat and try a smaller step, say, x0 + Δx/2. If the magnitude of f still increases, then you just need to backtrack some more, say, by trying x0 + Δx/4 as your next guess, and so forth. Because you know that the tangent line leads to a local decrease in |f|, eventually an acceptably small step should be found.

The problem in both these cases is that the initial guesses were not close enough to the regions where f(x) is approximately linear. So again, a good plot may help produce a good first guess. Alternatively, you may want to start your search with the bisection algorithm and then switch to the faster Newton–Raphson algorithm when you get closer to the zero.
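As a rough illustration of the backtracking idea just described, the following sketch halves the Newton–Raphson step whenever the full correction would increase |f|. The test function, starting guess, and limits are illustrative assumptions, not code from the text.

```python
# Sketch of Newton-Raphson with backtracking: the step is halved until |f| decreases.
from math import cos

def f(x):
    return 2.0 * cos(x) - x

def newton_backtrack(x, eps=1e-8, dx=1e-4, nmax=100):
    for _ in range(nmax):
        fx = f(x)
        if abs(fx) <= eps:
            break
        df = (f(x + dx / 2) - f(x - dx / 2)) / dx    # numerical derivative
        step = -fx / df                              # full Newton-Raphson correction
        while abs(f(x + step)) > abs(fx) and abs(step) > 1e-12:
            step *= 0.5                              # backtrack: try half the step
        x = x + step
    return x

print(newton_backtrack(2.0))
```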


Figure 7.3 Two examples of how the Newton–Raphson algorithm may fail if the initial guess is not in the region where f(x) can be approximated by a straight line. (a) A guess lands at a local minimum/maximum, that is, a place where the derivative vanishes, and so the next guess ends up at x = ∞. (b) The search has fallen into an infinite loop. The technique known as “backtracking” could eliminate this problem.

7.3.2 Implementation: Newton–Raphson Algorithm

  1. Use the Newton–Raphson algorithm to find some energies EB that are solutions of (7.2). Compare these solutions with the ones found with the bisection algorithm.
  2. Again, notice that the 10 in (7.2) is proportional to the strength of the potential that causes the binding. See if making the potential deeper, say, by changing the 10 to a 20 or a 30, produces more or deeper bound states. (Note that in contrast to the bisection algorithm, your initial guess must be closer to the answer for the Newton–Raphson algorithm to work.)
  3. Modify your algorithm to include backtracking and then try it out on some difficult cases.
  4. Evaluate f(EB) and thus determine directly the precision of your solution.

7.4 Problem 2: Temperature Dependence of Magnetization

Problem Determine M(T) the magnetization as a function of temperature for simple magnetic materials.

A collection of N spin-1/2 particles, each with magnetic moment µ, is at temperature T. The collection has an external magnetic field B applied to it and comes to equilibrium with NL particles in the lower energy state (spins aligned with the magnetic field) and NU particles in the upper energy state (spins opposed to the magnetic field). The Boltzmann distribution law tells us that the relative probability of a state with energy E is proportional to exp(−E/kBT), where kB is Boltzmann’s constant. For a dipole with moment µ, its energy in a magnetic field is given by the dot product E = −µ · B. Accordingly, spin-up particles have lower energy in a magnetic field than spin-down particles, and thus are more probable.

Applying the Boltzmann distribution to our spin problem, we have that the number of particles in the lower energy level (spin up) is

(7.14)  NL = N exp(µB/kBT) / [exp(µB/kBT) + exp(−µB/kBT)],

while the number of particles in the upper energy level (spin down) is

(7.15)  NU = N exp(−µB/kBT) / [exp(µB/kBT) + exp(−µB/kBT)].

As discussed by (Kittel, 2005), we now assume that the molecular magnetic field B = λ M is much larger than the applied magnetic field and so replace B by the molecular field. This permits us to eliminate B from the preceding equations. The magnetization M(T) is given by the individual magnetic moment µ times the net number of particles pointing in the direction of the magnetic field:

(7.16)  M(T) = µ(NL − NU) = Nµ tanh(µB/kBT),
(7.17)  M(T) = Nµ tanh(λµM(T)/kBT),

where (7.17) follows from the replacement B → λM.

Note that this expression appears to make sense because, as the temperature approaches zero, all spins will be aligned along the direction of B and so M(T = 0) = Nµ.

Solution via Searching Equation 7.17 relates the magnetization and the temperature. However, it is not really a solution to our problem because M appears on the LHS of the equation as well as within the hyperbolic function on the RHS. Generally, a transcendental equation of this sort does not have an analytic solution that would give M as simply a function of the temperature T. But by working backward, we can find a numerical solution. To do that we first express (7.17) in terms of the reduced magnetization m, the reduced temperature t, and the Curie temperature Tc:

(7.18)  m(t) = tanh(m(t)/t),
(7.19)  m = M/(Nµ),   t = T/Tc,   Tc = Nµ²λ/kB.

While it is no easier to find an analytic solution to (7.18) than it was to (7.17), the simpler form of (7.18) makes the programming easier as we search for values of t and m that satisfy (7.18).


Figure 7.4 A function of the reduced magnetization m at three reduced temperatures t. A zero of this function determines the value of the magnetization at a particular value of t.

One approach to a trial-and-error solution is to define a function

(7.20)  f(m, t) = m − tanh(m/t)

and then, for a variety of fixed t = ti values, search for those m values at which f(m, ti) = 0. (One could just as well fix the value of m to mj and search for the value of t for which f(mj, t) = 0; once you have a solution, you have a solution.) Each zero so found gives us a single value of m(ti). A plot or a table of these values for a range of ti values then provides the best we can do as the desired solution m(t).

Figure 7.4 shows three plots of f(m,t) as a function of the reduced magnetization m, each plot for a different value of the reduced temperature. As you can see, other than the uninteresting solution at m = 0, there is only one solution (a zero) near m = 1 for t = 0.5, and no solution at other temperatures.
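A minimal sketch of this search is given below: for each fixed reduced temperature t below the Curie point, the nonzero root of f(m, t) is bracketed and found by bisection. The t grid, the bracket, and the tolerance are illustrative choices.

```python
# Sketch: build m(t) by fixing t and bisecting f(m, t) = m - tanh(m/t) for its nonzero root.
import numpy as np

def f(m, t):
    return m - np.tanh(m / t)

def m_of_t(t, eps=1e-6):
    a, b = 1e-6, 1.0                 # for t < 1, f changes sign between these ends
    while b - a > eps:
        mid = 0.5 * (a + b)
        if f(a, t) * f(mid, t) > 0.0:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)

for t in np.arange(0.1, 1.0, 0.1):   # only t < 1 has a nonzero solution
    print(f"t = {t:.1f}   m(t) = {m_of_t(t):.6f}")
```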

7.4.1 Searching Exercise

  1. Find the root of (7.20) to six significant figures for t = 0.5 using the bisection algorithm.
  2. Find the root of (7.20) to six significant figures for t = 0.5 using the Newton–Raphson algorithm.
  3. Compare the time it takes to find the solutions for the bisection and Newton–Raphson algorithms.
  4. Construct a plot of the reduced magnetization m(t) as a function of the reduced temperature t.

7.5 Problem 3: Fitting An Experimental Spectrum

Problem The cross sections measured for the resonant scattering of neutrons from a nucleus are given in Table 7.1. Your problem is to determine values for the cross sections at energy values lying between those in the table.

You can solve this problem in a number of ways. The simplest is to numerically interpolate between the values of the experimental f(Ei) given in Table 7.1. This is direct and easy but does not account for there being experimental noise in the data. A more appropriate solution (discussed in Section 7.7) is to find the best fit of a theoretical function to the data. We start with what we believe to be the “correct” theoretical description of the data,

Table 7.1 Experimental values for a scattering cross section (f(E) in the theory), each with absolute error ±σi, as a function of energy (xi in the theory).

i = 1 2 3 4 5 6 7 8 9
Ei (MeV) 0 25 50 75 100 125 150 175 200
g(Ei) (mb) 10.6 16.0 45.0 83.5 52.8 19.9 10.8 8.25 4.7
Error (mb) 9.34 17.9 41.5 85.5 51.5 21.5 10.8 6.29 4.14
(7.21)  f(E) = fr / [(E − Er)² + Γ²/4],

where fr, Er, and Γ are unknown parameters. We then adjust the parameters to obtain the best fit. This is a best fit in a statistical sense but in fact may not pass through all (or any) of the data points. For an easy, yet effective, introduction to statistical data analysis, we recommend (Bevington and Robinson, 2002).

These two techniques of interpolation and least-squares fitting are powerful tools that let you treat tables of numbers as if they were analytic functions and sometimes let you deduce statistically meaningful constants or conclusions from measurements. In general, you can view data fitting as global or local. In global fits, a single function in x is used to represent the entire set of numbers in a table like Table 7.1. While it may be spiritually satisfying to find a single function that passes through all the data points, if that function is not the correct function for describing the data, the fit may show nonphysical behavior (such as large oscillations) between the data points. The rule of thumb is that if you must interpolate, keep it local and view global interpolations with a critical eye.

Consider Table 7.1 as ordered data that we wish to interpolate. We call the independent variable x and its tabulated values xi (i = 1, 2, …), and assume that the dependent variable is the function g(x), with the tabulated values gi = g(xi). We assume that g(x) can be approximated as an (n − 1)-degree polynomial in each interval i:

(7.22)  gi(x) ≃ a0 + a1x + a2x² + ⋯ + an−1 x^(n−1),   x ≃ xi.
Because our fit is local, we do not assume that one g(x) can fit all the data in the table but instead use a different polynomial, that is, a different set of ai values, for each interval. While each polynomial is of low degree, multiple polynomials are needed to span the entire table. If some care is taken, the set of polynomials so obtained will behave well enough to be used in further calculations without introducing much unwanted noise or discontinuities.

The classic interpolation formula was created by Lagrange. He figured out a closed-form expression that directly fits the (n – 1)-order polynomial (7.22) to n values of the function g(x) evaluated at the points xi . The formula for each interval is written as the sum of polynomials:

(7.23)  g(x) ≃ g1λ1(x) + g2λ2(x) + ⋯ + gnλn(x),
(7.24)  λi(x) = ∏ over j ≠ i of (x − xj)/(xi − xj).

For three points, (7.23) provides a second-degree polynomial, while for eight points it gives a seventh-degree polynomial. For example, assume that we are given the points and function values

(7.25)  xi = (0, 1, 2, 4),   gi = (−12, −12, −24, −60).

With four points, the Lagrange formula determines a third-order polynomial that reproduces each of the tabulated values:

(7.26)  g(x) = (−12)λ1(x) + (−12)λ2(x) + (−24)λ3(x) + (−60)λ4(x) = x³ − 9x² + 8x − 12.

As a check, we see that

(7.27)  g(4) = 4³ − 9(4)² + 8(4) − 12 = −60,   g(0.5) = −10.125.

If the data contain little noise, this polynomial can be used with some confidence within the range of the data, but with risk beyond the range of the data.

Notice that Lagrange interpolation has no restriction that the points xi be evenly spaced. Usually, the Lagrange fit is made to only a small region of the table with a small value of n, despite the fact that the formula works perfectly well for fitting a high-degree polynomial to the entire table. The difference between the value of the polynomial evaluated at some x and that of the actual function can be shown to be the remainder

(7.28)  Rn ≃ [(x − x1)(x − x2)⋯(x − xn) / n!] g^(n)(ζ),

where ζ lies somewhere in the interpolation interval. What is significant here is that if g(x) has significant high derivatives, then it cannot be approximated well by a polynomial. For example, a table of noisy data would have significant high derivatives.
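As a concrete illustration of (7.23) and (7.24), here is a minimal sketch of an n-point Lagrange interpolation. The use of the first three entries of Table 7.1 as test data is an illustrative choice, not part of the text.

```python
# Sketch of n-point Lagrange interpolation, (7.23)-(7.24).
def lagrange(x, xs, gs):
    """Evaluate the Lagrange polynomial through the points (xs[i], gs[i]) at x."""
    total = 0.0
    n = len(xs)
    for i in range(n):
        li = 1.0
        for j in range(n):
            if j != i:
                li *= (x - xs[j]) / (xs[i] - xs[j])   # build lambda_i(x)
        total += gs[i] * li
    return total

E   = [0.0, 25.0, 50.0]         # first three energies of Table 7.1 (MeV)
sig = [10.6, 16.0, 45.0]        # corresponding cross sections (mb)
print(lagrange(25.0, E, sig))   # reproduces the tabulated value 16.0
print(lagrange(35.0, E, sig))   # interpolated value between table entries
```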

7.5.1 Lagrange Implementation, Assessment

Consider the experimental neutron scattering data in Table 7.1. The expected theoretical functional form that describes these data is (7.21), and our empirical fits to these data are shown in Figure 7.5.

  1. Write a subroutine to perform an n-point Lagrange interpolation using (7.23). Treat n as an arbitrary input parameter. (You may also perform this exercise with the spline fits discussed in Section 7.5.2.)
  2. Use the Lagrange interpolation formula to fit the entire experimental spectrum with one polynomial. (This means that you must fit all nine data points with an eighth-degree polynomial.) Then use this fit to plot the cross section in steps of 5 MeV.
  3. Use your graph to deduce the resonance energy Er (your peak position) and Γ (the full-width at half-maximum). Compare your results with those predicted by a theorist friend, (Er, Γ) = (78, 55) MeV.
  4. A more realistic use of Lagrange interpolation is for local interpolation with a small number of points, such as three. Interpolate the preceding cross-sectional data in 5-MeV steps using three-point Lagrange interpolation for each interval. (Note that the end intervals may be special cases.)
  5. We deliberately have not discussed extrapolation of data because it can lead to serious systematic errors; the answer you get may well depend more on the function you assume than on the data you input. Add some adventure to your life and use the programs you have written to extrapolate to values outside Table 7.1. Compare your results to the theoretical Breit–Wigner shape (7.21).

This example shows how easy it is to go wrong with a high-degree-polynomial fit to data with errors. Although the polynomial is guaranteed to pass through all the data points, the representation of the function away from these points can be quite unrealistic. Using a low-order interpolation formula, say, n = 2 or 3, in each interval usually eliminates the wild oscillations, but may not have any theoretical justification. If these local fits are matched together, as we discuss in the next section, a rather continuous curve results. Nonetheless, you must recall that if the data contain errors, a curve that actually passes through them may lead you astray. We discuss how to do this properly with least-squares fitting in Section 7.7.

7.5.2 Cubic Spline Interpolation (Method)

If you tried to interpolate the resonant cross section with Lagrange interpolation, then you saw that fitting parabolas (three-point interpolation) within a table may avoid the erroneous and possibly catastrophic deviations of a high-order formula. (A two-point interpolation, which connects the points with straight lines, may not lead you far astray, but it is rarely pleasing to the eye or precise.) A sophisticated variation of an n = 4 interpolation, known as cubic splines, often leads to surprisingly eye-pleasing fits. In this approach (Figure 7.5), cubic polynomials are fit to the function in each interval, with the additional constraint that the first and second derivatives of the polynomials be continuous from one interval to the next. This continuity of slope and curvature is what makes the spline fit particularly eye-pleasing. It is analogous to what happens when you use the flexible spline drafting tool (a lead wire within a rubber sheath) from which the method draws its name.


Figure 7.5 Three fits to data. Dashed: Lagrange interpolation using an eighth-degree polynomial; short dashes: cubic spline fit; long dashes: least-squares parabola fit.

The series of cubic polynomials obtained by spline-fitting a table of data can be integrated and differentiated and is guaranteed to have well-behaved derivatives. The existence of meaningful derivatives is an important consideration. As a case in point, if the interpolated function is a potential, you can take the derivative to obtain the force. The complexity of simultaneously matching polynomials and their derivatives over all the interpolation points leads to many simultaneous linear equations to be solved. This makes splines unattractive for hand calculation, yet easy for computers and, not surprisingly, popular in both calculations and computer drawing programs. To illustrate, the smooth solid curve in Figure 7.5 is a spline fit.

The basic approximation of splines is the representation of the function g(x) in the subinterval [xi, xi+1] with a cubic polynomial:

(7.29)  g(x) ≃ gi(x),   xi ≤ x ≤ xi+1,
(7.30)  gi(x) = gi + g′i (x − xi) + (1/2) g″i (x − xi)² + (1/6) g‴i (x − xi)³.

This representation makes it clear that the coefficients in the polynomial equal the values of g(x) and its first, second, and third derivatives at the tabulated points xi. Derivatives beyond the third vanish for a cubic. The computational chore is to determine these derivatives in terms of the N tabulated gi values. The matching of gi at the nodes that connect one interval to the next provides the equations

(7.31)  gi−1(xi) = gi(xi) = gi,   i = 2, …, N − 1.

The matching of the first and second derivatives at each interval’s boundaries provides the equations

(7.32)  g′i−1(xi) = g′i(xi),   g″i−1(xi) = g″i(xi).

The additional equations needed to determine all constants are obtained by matching the third derivatives at adjacent nodes. Values for the third derivatives are found by approximating them in terms of the second derivatives:

(7.33)  g‴i ≃ (g″i+1 − g″i) / (xi+1 − xi).
As discussed in Chapter 5, a central-difference approximation would be more accurate than a forward-difference approximation, yet (7.33) keeps the equations simpler.

It is straightforward yet complicated to solve for all the parameters in (7.30). We leave that to the references (Thompson, 1992; Press et al., 1994). We can see, however, that matching at the boundaries of the intervals results in only (N – 2) linear equations for N unknowns. Further input is required. It usually is taken to be the boundary conditions at the endpoints a = x1 and b = xN, specifically, the second derivatives g″(a) and g″(b). There are several ways to determine these second derivatives:

Natural spline: Set g″(a) = g″(b) = 0; that is, permit the function to have a slope at the endpoints but no curvature there. This is “natural” because the second derivative vanishes for the flexible spline drafting tool (its ends being free).

Input values for g′ at the boundaries: The computer uses g′(a) to approximate g″(a). If you do not know the first derivatives, you can calculate them numerically from the table of gi values.

Input values for g″ at the boundaries: Knowing values is of course better than approximating values, but it requires the user to input information. If the values of g″ are not known, they can be approximated by applying a forward-difference approximation to the tabulated values:

(7.34)  g″(x1) ≃ (g3 − 2g2 + g1)/h²   (uniform spacing h), and similarly at the other endpoint.

7.5.2.1 Cubic Spline Quadrature (Exploration)

A powerful integration scheme is to fit an integrand with splines and then integrate the cubic polynomials analytically. If the integrand g(x) is known only at its tabulated values, then this is about as good an integration scheme as is possible; if you have the ability to calculate the function directly for arbitrary x, Gaussian quadrature may be preferable. We know that the spline fit to g in each interval is the cubic (7.30)

(7.35)  gi(x) = gi + g′i (x − xi) + (1/2) g″i (x − xi)² + (1/6) g‴i (x − xi)³,   xi ≤ x ≤ xi+1.

It is easy to integrate this to obtain the integral of g for this interval and then to sum over all intervals:

(7.36)  ∫ from xi to xi+1 of g(x) dx ≃ gi h + (1/2) g′i h² + (1/6) g″i h³ + (1/24) g‴i h⁴,   h = xi+1 − xi,
(7.37)  ∫ from x1 to xN of g(x) dx = Σ over i of ∫ from xi to xi+1 of g(x) dx.

Making the intervals smaller does not necessarily increase precision, as subtractive cancelations in (7.36) may get large.

Spline Fit of Cross Section (Implementation) Fitting a series of cubics to data is a little complicated to program yourself, so we recommend using a library routine. While we have found quite a few Java-based spline applications available on the Internet, none seemed appropriate for interpolating a simple set of numbers. That being the case, we have adapted the splint.c and the spline.c functions from (Press et al., 1994) to produce the SplineInteract.py program shown in Listing 7.3 (there is also an applet). Your problem for this section is to carry out the assessment in Section 7.5.1 using cubic spline interpolation rather than Lagrange interpolation.

7.6 Problem 4: Fitting Exponential Decay

Problem Figure 7.6 presents actual experimental data on the number of decays ΔN of the π meson as a function of time (Stetz et al., 1973). Notice that the time has been “binned” into Δt = 10 ns intervals and that the smooth curve is the theoretical exponential decay expected for very large numbers of pions (which is not the case here). Your problem is to deduce the lifetime τ of the π meson from these data (the tabulated lifetime of the pion is 2.6 × 10⁻⁸ s).

Theory Assume that we start with N0 particles at time t = 0 that can decay to other particles.3) If we wait a short time Δt, then a small number ΔN of the particles will decay spontaneously, that is, with no external influences. This decay is a stochastic process, which means that there is an element of chance involved in just when a decay will occur, and so no two experiments are expected to give exactly the same results. The basic law of nature for spontaneous decay is that the number of decays ΔN in a time interval Δt is proportional to the number of particles N(t) present at that time and to the time interval

(7.38)  ΔN(t) = −N(t) Δt/τ,   ΔN(t)/Δt = −N(t)/τ.

Listing 7.3 SplineInteract.py performs a cubic spline fit to data and permits interactive control. The arrays x[] and y[] are the data to fit, and the values of the fit at Nfit points are output.

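The listing itself is not reproduced here. As a stand-in for the adapted spline/splint routines, the following sketch uses SciPy's CubicSpline (an assumption; the book's program does not rely on SciPy) with natural boundary conditions to interpolate the Table 7.1 cross sections at Nfit points.

```python
# Sketch of a natural cubic-spline fit to the Table 7.1 cross sections (not the book's SplineInteract.py).
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0., 25., 50., 75., 100., 125., 150., 175., 200.])       # energies (MeV)
y = np.array([10.6, 16.0, 45.0, 83.5, 52.8, 19.9, 10.8, 8.25, 4.7])   # cross sections (mb)

cs = CubicSpline(x, y, bc_type='natural')   # natural spline: g''(a) = g''(b) = 0

Nfit = 41
xfit = np.linspace(x[0], x[-1], Nfit)       # evaluate the spline at Nfit points
for xi, gi in zip(xfit, cs(xfit)):
    print(f"{xi:7.1f}  {gi:8.3f}")

# The spline can also be integrated analytically over the table (cf. Section 7.5.2.1).
print("integral over the table:", cs.integrate(x[0], x[-1]))
```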

Figure 7.6 A reproduction of the experimental measurement by Stetz et al. (1973) giving the number of decays of a π meson as a function of time since its creation. Measurements were made during time intervals (box sizes) of 10-ns width. The dashed curve is the result of a linear least-squares fit to the log N(t).

Here τ = 1/λ is the lifetime of the particle, with λ the rate parameter. The actual decay rate is given by the second equation in (7.38). If the number of decays ΔN is very small compared to the number of particles N, and if we look at vanishingly small time intervals, then the difference equation (7.38) becomes the differential equation

(7.39)  dN(t)/dt = −N(t)/τ.

This differential equation has an exponential solution for the number as well as for the decay rate:

(7.40)  N(t) = N(0) e^(−t/τ),   dN(t)/dt = −(N(0)/τ) e^(−t/τ) = (dN/dt)(0) e^(−t/τ).

Equation 7.40 is the theoretical formula we wish to “fit” to the data in Figure 7.6. The output of such a fit is a “best value” for the lifetime τ.

7.7 Least-Squares Fitting (Theory)

Books have been written and careers have been spent discussing what is meant by a “good fit” to experimental data. We cannot do justice to the subject here and refer the reader to Bevington and Robinson (2002); Press et al. (1994); Thompson (1992). However, we will emphasize three points:

  1. If the data being fit contain errors, then the “best fit” in a statistical sense should not pass through all the data points.
  2. If the theory is not an appropriate one for the data (e.g., the parabola in Figure 7.5), then its best fit to the data may not be a good fit at all. This is good, for this is how we know that the theory is not right.
  3. Only for the simplest case of a linear least-squares fit can we write down a closed-form solution to evaluate and obtain the fit. More realistic problems are usually solved by trial-and-error search procedures, sometimes using sophisticated subroutine libraries. However, in Section 7.8.2 we show how to conduct such a nonlinear search using familiar tools.

Imagine that you have measured ND data values of the dependent variable y as a function of the independent variable x:

(7.41)  (xi, yi ± σi),   i = 1, …, ND,

where ±σi is the experimental uncertainty in the ith value of y. (For simplicity we assume that all the errors σi occur in the dependent variable, although this is hardly ever true (Thompson, 1992).) For our problem, y is the number of decays as a function of time, and xi are the times. Our goal is to determine how well a mathematical function y = g(x) (also called a theory or a model) can describe these data. Additionally, if the theory contains some parameters or constants, our goal can be viewed as determining the best values for these parameters. We assume that the theory function g(x) contains, in addition to the functional dependence on x, an additional dependence upon MP parameters {a1, a2, …, aMP}. Notice that the parameters {am} are not variables, in the sense of numbers read from a meter, but rather are parts of the theoretical model, such as the size of a box, the mass of a particle, or the depth of a potential well. For the exponential decay function (7.40), the parameters are the lifetime τ and the initial decay rate dN(0)/dt. We include the parameters as

(7.42)  g(x) = g(x; {a1, a2, …, aMP}) = g(x; {am}),

where the ai’s are parameters and x the independent variable. We use the chi-square (χ²) measure (Bevington and Robinson, 2002) as a gauge of how well a theoretical function g reproduces data:

(7.43)  χ² = Σ from i = 1 to ND of [(yi − g(xi; {am})) / σi]²,

where the sum is over the ND experimental points (xi, yi ± σi). The definition (7.43) is such that smaller values of χ² are better fits, with χ² = 0 occurring if the theoretical curve went through the center of every data point. Notice also that the 1/σi² weighting means that measurements with larger errors4) contribute less to χ².

Least-squares fitting refers to adjusting the parameters in the theory until a minimum in χ² is found, that is, finding a curve that produces the least value for the summed squares of the deviations of the data from the function g(x). In general, this is the best fit possible and the best way to determine the parameters in a theory. The MP parameters {am, m = 1, …, MP} that make χ² an extremum are found by solving the MP equations:

(7.44)  ∂χ²/∂am = 0   ⇒   Σ from i = 1 to ND of [(yi − g(xi)) / σi²] ∂g(xi)/∂am = 0,   m = 1, …, MP.

Often, the function g(x; {am}) has a sufficiently complicated dependence on the am values for (7.44) to produce MP simultaneous nonlinear equations in the am values. In these cases, solutions are found by a trial-and-error search through the MP-dimensional parameter space, as we do in Section 7.8.2. To be safe, when such a search is completed, you need to check that the minimum χ2 you found is global and not local. One way to do that is to repeat the search for a whole grid of starting values, and if different minima are found, to pick the one with the lowest χ2.

7.7.1 Least-Squares Fitting: Theory and Implementation

When the deviations from theory are a result of random errors described by a Gaussian distribution, there are some useful rules of thumb to remember (Bevington and Robinson, 2002). You know that your fit is good if the value of χ² calculated via the definition (7.43) is approximately equal to the number of degrees of freedom, χ² ≈ ND − MP, where ND is the number of data points and MP is the number of parameters in the theoretical function. If your χ² is much less than ND − MP, it does not mean that you have a “great” theory or a really precise measurement; instead, you probably have too many parameters or have assigned errors (σi values) that are too large. In fact, a too-small χ² may indicate that you are fitting the random scatter in the data rather than missing approximately one-third of the error bars, as expected for random errors. If your χ² is significantly greater than ND − MP, the theory may not be good, you may have significantly underestimated your errors, or you may have errors that are not random.

The MP simultaneous equations (7.44) can be simplified considerably if the functions g(x; {am}) depend linearly on the parameter values ai, for example,

(7.45)  g(x; {a1, a2}) = a1 + a2x.

Figure 7.7 A linear least-squares best fit of a straight line to data. The deviation of theory from experiment is greater than would be expected from statistics, which means that a straight line is not a good theory to describe these data.

In this case (also known as linear regression or a straight-line fit), as shown in Figure 7.7, there are MP = 2 parameters, the slope a2 and the y intercept a1. Notice that while there are only two parameters to determine, there still may be an arbitrary number ND of data points to fit. Remember, a unique solution is not possible unless the number of data points is equal to or greater than the number of parameters. For this linear case, there are just two derivatives,

(7.46)  ∂g(xi)/∂a1 = 1,   ∂g(xi)/∂a2 = xi,

and after substitution, the χ2 minimization equations (7.44) can be solved (Press et al., 1994):

(7.47)  a1 = (Sxx Sy − Sx Sxy)/Δ,   a2 = (S Sxy − Sx Sy)/Δ,
(7.48)  S = Σ 1/σi²,   Sx = Σ xi/σi²,   Sy = Σ yi/σi²,
(7.49)  Sxx = Σ xi²/σi²,   Sxy = Σ xi yi/σi²,   Δ = S Sxx − Sx².

Statistics also gives you an expression for the variance or uncertainty in the deduced parameters:

(7.50)  σ²a1 = Sxx/Δ,   σ²a2 = S/Δ.

This is a measure of the uncertainties in the values of the fitted parameters arising from the uncertainties σi in the measured yi values. A measure of the dependence of the parameters on each other is given by the correlation coefficient:

(7.51)  ρ(a1, a2) = cov(a1, a2)/(σa1 σa2),   cov(a1, a2) = −Sx/Δ.

Here cov(a1, a2) is the covariance of a1 and a2 and vanishes if a1 and a2 are independent. The correlation coefficient ρ(a1, a2) lies in the range −1 ≤ ρ ≤ 1, with a positive ρ indicating that the errors in a1 and a2 are likely to have the same sign, and a negative ρ indicating opposite signs.

The preceding analytic solutions for the parameters are of the form found in statistics books but are not optimal for numerical calculations because subtractive cancelation can make the answers unstable. As discussed in Chapter 3, a rearrangement of the equations can decrease this type of error. For example, Thompson (1992) gives improved expressions that measure the data relative to their averages:

(7.52)  a2 = [Σ (xi − ⟨x⟩)(yi − ⟨y⟩)/σi²] / [Σ (xi − ⟨x⟩)²/σi²],   a1 = ⟨y⟩ − a2⟨x⟩,   ⟨x⟩ = Sx/S,   ⟨y⟩ = Sy/S.
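Before moving on, here is a minimal sketch of the closed-form straight-line fit (7.47)–(7.50) in code form; the small synthetic data set is purely illustrative.

```python
# Sketch of the closed-form weighted straight-line fit, (7.47)-(7.50). Data are illustrative only.
import numpy as np

x   = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y   = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
sig = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

w = 1.0 / sig**2
S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
Sxx, Sxy  = (w * x * x).sum(), (w * x * y).sum()
Delta     = S * Sxx - Sx**2

a1 = (Sxx * Sy - Sx * Sxy) / Delta          # intercept
a2 = (S * Sxy - Sx * Sy) / Delta            # slope
var_a1, var_a2 = Sxx / Delta, S / Delta     # variances of the fitted parameters

chi2 = (((y - a1 - a2 * x) / sig)**2).sum()
print(f"a1 = {a1:.3f} +- {np.sqrt(var_a1):.3f}, a2 = {a2:.3f} +- {np.sqrt(var_a2):.3f}, chi2 = {chi2:.2f}")
```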

In Fit.py in Listing 7.4 we give a program that fits a parabola to some data. You can use it as a model for fitting a line to data, although you can use our closed-form expressions for a straight-line fit. In Fit.py on the instructor’s site, we give a program for fitting to the decay data.

7.8 Exercises: Fitting Exponential Decay, Heat Flow and Hubble’s Law

  1. Fit the exponential decay law (7.40) to the data in Figure 7.6. This means finding values for τ and ΔN(0)/Δt that provide a best fit to the data, and then judging how good the fit is.
    1. a) Construct a table of approximate values for (ΔNi/Δt, ti), for i = 1, …, ND, as read from Figure 7.6. Because time was measured in bins, ti should correspond to the middle of a bin.
    2. b) Add an estimate of the error σi to obtain a table of the form (ΔNi/Δt ± σi, ti). You can estimate the errors by eye, say, by estimating how much the histogram values appear to fluctuate about a smooth curve, or you can take σi ≃ √(ΔNi). (This last approximation is reasonable for large numbers, which this is not.)
    3. c) In the limit of very large numbers, we would expect a plot of ln |dN/dt| vs. t to be a straight line:
      (7.53)  ln |ΔN(t)/Δt| ≃ ln |ΔN(0)/Δt| − t/τ.

      This means that if we treat ln |ΔN(t)/Δt| as the dependent variable and the time t as the independent variable, we can use our linear-fit results. Plot ln |ΔN/Δt| vs. t.

    4. d) Make a least-squares fit of a straight line to your data and use it to determine the lifetime τ of the π meson. Compare your deduction to the tabulated lifetime of 2.6 × 10−8 s and comment on the difference.
    5. e) Plot your best fit on the same graph as the data and comment on the agreement.
    6. f) Deduce the goodness of fit of your straight line and the approximate error in your deduced lifetime. Do these agree with what your “eye” tells you?
    7. g) Now that you have a fit, look at the data again and estimate what a better value for the errors in the ordinates might be.
  2. Table 7.2 gives the temperature T along a metal rod whose ends are kept at a fixed constant temperature. The temperature is a function of the distance x along the rod.
    1. a) Plot the data in Table 7.2 to verify the appropriateness of a linear relation
      (7.54)  T(x) ≃ a + bx.
    2. b) Because you are not given the errors for each measurement, assume that the least significant figure has been rounded off and so σ ≥ 0.05.
    3. c) Use that to compute a least-squares straight-line fit to these data.
    4. d) Plot your best a + bx on the curve with the data.
    5. e) After fitting the data, compute the variance and compare it to the deviation of your fit from the data. Verify that about one-third of the points miss the σ error band (that is what is expected for a normal distribution of errors).
    6. f) Use your computed variance to determine the χ2 of the fit. Comment on the value obtained.
    7. g) Determine the variances σa and σb and check whether it makes sense to use them as the errors in the deduced values for a and b.
  3. In 1929, Edwin Hubble examined the data relating the radial velocity v of 24 extra galactic nebulae to their distance r from our galaxy (Hubble, 1929). Although there was considerable scatter in the data, he fit them with a straight line:
    (7.55)  v = Hr,
    where H is now called the Hubble constant. Table 7.3 contains the distances and velocities used by Hubble.
    1. a) Plot the data to verify the appropriateness of a linear relation
      (7.56)  v ≃ a + Hr.
    2. b) Because you are not given the errors for each measurement, you may assume that the least significant figure has been rounded off and so σ ≥ 1. Or, you may assume that astronomical measurements are hard to make and that there are at least 10% errors in the data.

      Table 7.2 Temperature vs. distance as measured along a metal rod.

      xi (cm) 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
      Ti (C) 14.6 18.5 36.6 30.8 59.2 60.1 62.2 79.4 99.9

      Table 7.3 Distance vs. radial velocity for 24 extragalactic nebulae.

      Object r (Mpc) v (km/s) Object r (Mpc) v (km/s)
      S. Mag. 0.032 170 3627 0.9 650
      L. Mag. 0.034 290 4826 0.9 150
      6822 0.214 –130 5236 0.9 500
      598 0.263 –70 1068 1.0 920
      221 0.275 –185 5055 1.1 450
      224 0.275 –220 7331 1.1 500
      5457 0.45 200 4258 1.4 500
      4736 0.5 290 4141 1.7 960
      5194 0.5 270 4382 2.0 500
      4449 0.63 200 4472 2.0 850
      4214 0.8 300 4486 2.0 800
      3031 0.9 –30 4649 2.0 1090
    3. c) Compute a least-squares straight-line fit to these data.
    4. d) Plot your best a + Hr on the curve with the data.
    5. e) After fitting the data, compute the variance and compare it to the deviation of your fit from the data. Verify that about one-third of the points miss the σ error band (that is what is expected for a normal distribution of errors).
    6. f) Use your computed variance to determine the χ2 of the fit. Comment on the value obtained.
    7. g) Determine the variances σa and σb and check whether it makes sense to use them as the errors in the deduced values for a and b.
    8. h) Now that you have a fit, look at the data again and estimate what a better value for the errors in the ordinates might be.

7.8.1 Linear Quadratic Fit

As indicated earlier, as long as the function being fitted depends linearly on the unknown parameters ai, the condition of minimum χ² leads to a set of simultaneous linear equations for the a’s that can be solved by hand or on the computer using matrix techniques. To illustrate, suppose we want to fit the quadratic polynomial

(7.57)  g(x) = a1 + a2x + a3x²

to the experimental measurements (xi, yi), i = 1, …, ND (Figure 7.8). Because this g(x) is linear in all the parameters ai, we can still make a linear fit although x is raised to the second power. (However, if we tried to fit a function of the form g(x) = (a1 + a2x) exp(−a3x) to the data, then we would not be able to make a linear fit because there is not a linear dependence on a3.)


Figure 7.8 A linear least-squares best fit of a parabola to data. Here we see that the fit misses approximately one-third of the points, as expected from the statistics for a good fit.

The best fit of this quadratic to the data is obtained by applying the minimum χ2 condition (7.44) for Mp = 3 parameters and ND (still arbitrary) data points. A solution represents the maximum likelihood that the deduced parameters provide a correct description of the data for the theoretical function g(x). Equation 7.44 leads to the three simultaneous equations for a1, a2,and a3:

(7.58)  Σ [yi − a1 − a2xi − a3xi²]/σi² = 0,
(7.59)  Σ [yi − a1 − a2xi − a3xi²] xi/σi² = 0,
(7.60)  Σ [yi − a1 − a2xi − a3xi²] xi²/σi² = 0.

Note: Because the derivatives are independent of the parameters (the a’s), the a dependence arises only from the term in square brackets in the sums, and because that term has only a linear dependence on the a’s, these equations are linear in the a’s.

Exercise Show that after some rearrangement, (7.58)–(7.60) can be written as

(7.61)  S a1 + Sx a2 + Sxx a3 = Sy,
        Sx a1 + Sxx a2 + Sxxx a3 = Sxy,
        Sxx a1 + Sxxx a2 + Sxxxx a3 = Sxxy.

Here the definitions of the S’s are simple extensions of those used in (7.47)–(7.49) and are programmed in Fit.py shown in Listing 7.4. After placing the three unknown parameters into a vector a and the known three RHS terms in (7.61) into a vector s, these equations assume the matrix form

(7.62)  [ S    Sx    Sxx   ] [a1]   [ Sy  ]
        [ Sx   Sxx   Sxxx  ] [a2] = [ Sxy ]
        [ Sxx  Sxxx  Sxxxx ] [a3]   [ Sxxy]

The solution for the parameter vector a is obtained by solving the matrix equation. Although for 3 × 3 matrices we can write out the solution in closed form, for larger problems the numerical solution requires matrix methods.

Listing 7.4 Fit.py performs a least-squares fit of a parabola to data using the NumPy linalg package to solve the set of linear equations Sa = s.

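The listing itself is not reproduced here; the following is a minimal sketch of such a parabola fit, applied to the Table 7.1 data. Variable names and the output format are illustrative, not the book's exact Fit.py.

```python
# Sketch of a least-squares parabola fit: build and solve S a = s, (7.61)-(7.62), with NumPy.
import numpy as np

x   = np.array([0., 25., 50., 75., 100., 125., 150., 175., 200.])
y   = np.array([10.6, 16.0, 45.0, 83.5, 52.8, 19.9, 10.8, 8.25, 4.7])
sig = np.array([9.34, 17.9, 41.5, 85.5, 51.5, 21.5, 10.8, 6.29, 4.14])

w = 1.0 / sig**2
# S[m][n] = sum of w * x^(m+n) gives the matrix of (7.62); s[m] = sum of w * y * x^m is its RHS.
S = np.array([[(w * x**(m + n)).sum() for n in range(3)] for m in range(3)])
s = np.array([(w * y * x**m).sum() for m in range(3)])

a = np.linalg.solve(S, s)                                   # parameters a1, a2, a3
chi2 = (((y - (a[0] + a[1] * x + a[2] * x**2)) / sig)**2).sum()
print("a =", a, " chi2 =", chi2)
```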

Linear Quadratic Fit Assessment

  1. Fit the quadratic (7.57) to the following data sets [given as (x1, y1), (x2, y2), …]. In each case, indicate the values found for the a’s, the number of degrees of freedom, and the value of χ².
    1. a) (0,1)
    2. b) (0,1), (1,3)
    3. c) (0,1), (1,3), (2, 7)
    4. d) (0,1), (1,3), (2, 7), (3,15)
  2. Find a fit to the last set of data to the function
    (7.63)  g(x) = b1 e^(b2 x).
    Hint: A judicious change of variables will permit you to convert this to a linear fit. Does a minimum χ2 still have meaning here?

7.8.2 Problem 5: Nonlinear Fit to a Breit–Wigner

Problem Remember how earlier in this chapter we interpolated the values in Table 7.1 in order to obtain the experimental cross section as a function of energy. Although we did not use it, we also gave the theory describing these data, namely, the Breit–Wigner resonance formula (7.21):

(7.64)  f(E) = fr / [(E − Er)² + Γ²/4].

Your problem is to determine what values for the parameters (Er, fr, Γ) in (7.64) provide the best fit to the data in Table 7.1.

Because (7.64) is not a linear function of the parameters (Er, fr, Γ), the three equations that result from minimizing χ² are not linear equations and so cannot be solved by the techniques of linear algebra (matrix methods). However, in our study of the masses-on-a-string problem, we showed how to use the Newton–Raphson algorithm to search for solutions of simultaneous nonlinear equations. That technique involved expansion of the equations about the previous guess to obtain a set of linear equations and then solving the linear equations with the matrix libraries. We now use this same combination of fitting, trial-and-error searching, and matrix algebra to conduct a nonlinear least-squares fit of (7.64) to the data in Table 7.1.

Recall that the condition for a best fit is to find values of the MP parameters am in the theory g(x, am) that minimize χ² = Σ [(yi − g(xi))/σi]². This leads to the MP equations (7.44) to solve

(7.65)  Σ from i = 1 to ND of [(yi − g(xi)) / σi²] ∂g(xi)/∂am = 0,   m = 1, …, MP.

To find the form of these equations appropriate to our problem, we rewrite our theory function (7.64) in the notation of (7.65):

(7.66)  g(x) = g(x; {a1, a2, a3}) = a1 / [(x − a2)² + a3],
(7.67)  a1 = fr,   a2 = Er,   a3 = Γ²/4.

The three derivatives required in (7.65) are then

(7.68)  ∂g/∂a1 = 1/[(x − a2)² + a3],   ∂g/∂a2 = 2a1(x − a2)/[(x − a2)² + a3]²,   ∂g/∂a3 = −a1/[(x − a2)² + a3]².

Substitution of these derivatives into the best-fit condition (7.65) yields three simultaneous equations in a1, a2, and a3 that we need to solve in order to fit the ND = 9 data points (xi, yi) in Table 7.1:

(7.69)  Σ from i = 1 to 9 of [(yi − g(xi; a)) / σi²] ∂g(xi)/∂am = 0,   m = 1, 2, 3.
Even without the substitution of (7.64) for g(x, a), it is clear that these three equations depend on the a’s in a nonlinear fashion. That is okay because in Section 6.1.2 we derived the N-dimensional Newton–Raphson search for the roots of

(7.70)  fi(a1, a2, …, aN) = 0,   i = 1, …, N,

where we have made the change of variable yi → ai for the present problem. We use that same formalism here for the N = 3 equations (7.69) by writing them as

(7.71)  f1(a1, a2, a3) ≡ Σ [(yi − g(xi)) / σi²] ∂g(xi)/∂a1 = 0,
(7.72)  f2(a1, a2, a3) ≡ Σ [(yi − g(xi)) / σi²] ∂g(xi)/∂a2 = 0,
(7.73)  f3(a1, a2, a3) ≡ Σ [(yi − g(xi)) / σi²] ∂g(xi)/∂a3 = 0.

Because a1 = fr determines the height of the peak in the cross section, a2 = Er is the energy at which the peak occurs, and 2√a3 = Γ is the full width of the peak at half-maximum, good guesses for the a’s can be extracted from a graph of the data. To obtain the nine derivatives of the three f’s with respect to the three unknown a’s, we use two nested loops over i and j, along with the forward-difference approximation for the derivative

(7.74)  ∂fi/∂aj ≃ [fi(aj + Δaj) − fi(aj)] / Δaj,

where Δaj corresponds to a small, say ≤1%, change in the parameter value.

Nonlinear Fit Implementation Use the Newton–Raphson algorithm as outlined in Section 7.8.2 to conduct a nonlinear search for the best-fit parameters of the Breit–Wigner theory (7.64) to the data in Table 7.1. Compare the deduced values of (fr, Er, Γ) to those obtained by inspection of the graph. A sketch of such a search follows.
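The following is a minimal sketch of such a search, using the Table 7.1 data. The starting guesses (read off the graph), the 1% derivative steps, and the iteration limit are illustrative assumptions and may need tuning for the search to converge.

```python
# Sketch of a Newton-Raphson search for the Breit-Wigner parameters a = (a1, a2, a3) of (7.66),
# solving the equations (7.69) with a forward-difference Jacobian as in (7.74).
import numpy as np

x   = np.array([0., 25., 50., 75., 100., 125., 150., 175., 200.])
y   = np.array([10.6, 16.0, 45.0, 83.5, 52.8, 19.9, 10.8, 8.25, 4.7])
sig = np.array([9.34, 17.9, 41.5, 85.5, 51.5, 21.5, 10.8, 6.29, 4.14])

def g(xv, a):
    return a[0] / ((xv - a[1])**2 + a[2])

def dgda(xv, a):                      # analytic derivatives of (7.68)
    D = (xv - a[1])**2 + a[2]
    return np.array([1.0 / D, 2.0 * a[0] * (xv - a[1]) / D**2, -a[0] / D**2])

def F(a):                             # the three best-fit equations (7.69)
    resid = (y - g(x, a)) / sig**2
    return np.array([(resid * dgda(x, a)[m]).sum() for m in range(3)])

# Guesses from the graph: peak near 78 MeV, Gamma ~ 55 MeV, peak height ~ 85 mb (illustrative).
a = np.array([85.0 * (55.0 / 2)**2, 78.0, (55.0 / 2)**2])

for it in range(100):
    Fa = F(a)
    if np.max(np.abs(Fa)) < 1e-8:     # all three equations satisfied
        break
    J = np.empty((3, 3))
    for j in range(3):                # forward-difference Jacobian with ~1% parameter steps
        da = 0.01 * a[j]
        ap = a.copy()
        ap[j] += da
        J[:, j] = (F(ap) - Fa) / da
    a = a + np.linalg.solve(J, -Fa)   # Newton-Raphson update of the parameter vector

fr, Er, Gamma = a[0], a[1], 2.0 * np.sqrt(a[2])
chi2 = (((y - g(x, a)) / sig)**2).sum()
print("fr =", fr, " Er =", Er, " Gamma =", Gamma, " chi2 =", chi2)
```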