This chapter introduces the terminology and notation used by theoreticians and practitioners alike to model the computational performance and resource usage of algorithms. When evaluating the runtime performance of your software program, you might be perfectly satisfied, in which case you can continue to use the application as is. But if you want to improve runtime performance, this book shows you where to start: with the program's data structures and algorithms. You are faced with some specific questions:

- Are there other algorithms that would significantly improve performance?
- Are there hidden performance costs that can be eliminated?

The exact same program will have different runtime performance based on the computer on which it runs. In this chapter, I explain how computer scientists have developed analysis techniques that account for regular hardware improvements.
I start by showing how to model the runtime performance of a program on ever-increasing problem instance sizes. The runtime performance of an algorithm on small problem instance sizes can be hard to measure accurately because it could be sensitive to the actual values in the problem instance or to the resolution of the computer timers. Once your program processes a sufficiently large problem instance, you can develop empirical models to classify its runtime behavior.
I’d like to start with an example that shows how theoretical analysis is decidedly practical in real software systems. Imagine you are in charge of designing an application as part of a nightly batch job to process a large data set each night; the task is launched at midnight and must complete before 6:00 a.m. The data set contains several million values and is expected to double in size over the next five years.
You have built a working prototype but have only tested it on multiple small data sets, with 100, 1,000, and 10,000 values each. Table 2-1 presents the runtime performance of your prototype on these data sets.
N | Time (seconds) |
---|---|
100 | 0.063 |
1,000 | 0.565 |
10,000 | 5.946 |
Can these preliminary results predict the performance of your prototype on larger problem instances, such as 100,000 or even 1,000,000? Let's build mathematical models from this data alone to define a function T(N) that predicts the runtime performance for a given problem instance size. An accurate model will compute a T(N) that is close to the three values in Table 2-1, as well as predict higher values of N, as shown in Table 2-2 (which repeats these three time results in brackets).
You may have used a software tool, such as Maple or Microsoft Excel, to compute a trendline (also called a line of best fit) for sample data. The popular SciPy library for mathematics, science, and engineering can develop these trendline models. Listing 2-1 uses scipy to try to find a linear model, TL(N) = a × N + b, where a and b are constants. curve_fit() will return the (a, b) coefficients to use with the linear model based on the available empirical data encoded in lists xs and ys.
```python
import numpy as np
from scipy.optimize import curve_fit

def linear_model(n, a, b):
    return a*n + b

# Sample data
xs = [100, 1000, 10000]
ys = [0.063, 0.565, 5.946]

# Coefficients are returned as first argument
[(a, b), _] = curve_fit(linear_model, np.array(xs), np.array(ys))
print('Linear = {}*N + {}'.format(a, b))
```
The resulting model is the formula TL(N) = 0.000596 × N − 0.012833. As you can see from Table 2-2, this model is inaccurate because as the problem size increases, it significantly underestimates the actual runtime performance of the prototype. Another possible model is a quadratic polynomial, where N is raised to the power of 2:
```python
def quadratic_model(n, a, b):
    return a*n*n + b*n
```
With quadratic_model, the goal is to find TQ(N) = a × N² + b × N, where a and b are constants. Using the approach in Listing 2-1, the formula is TQ(N) = 0.000000003206 × N² + 0.000563 × N. Table 2-2 shows that as the problem size increases, this model significantly overestimates the actual runtime performance, so it is also not accurate.

Many of these constants are quite small, like 0.000000003206, which is 3.206 × 10⁻⁹. The reason is that the problems solved by algorithms involve problem instances where N = 1,000,000 or higher. Note that (1,000,000)² = 10¹², so be prepared to see both very small and very large constants.
The final column in Table 2-2 contains the predicted result using a third mathematical model, TN(N) = a × N × log(N), which uses the logarithm function (in base 2) and in which a is a constant. The result is TN(N) = 0.0000448 × N × log(N). For N = 10,000,000, the estimate computed by TN(N) is within about 5% of the actual value.
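You can check the fitted N log N model against the measurements directly. The sketch below hardcodes the coefficient 0.0000448 reported above rather than rerunning curve_fit, so small rounding differences against the table values are expected:

```python
import math

def TN(n, a=0.0000448):
    # N log N model from the text: TN(N) = a * N * log2(N)
    return a * n * math.log2(n)

# close to the measured 5.946 seconds at N = 10,000
assert abs(TN(10_000) - 5.946) < 0.05

# prediction for N = 10,000,000 is roughly 10,421 seconds (Table 2-2)
assert abs(TN(10_000_000) - 10_421) < 50
```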
N | Time (seconds) | TL | TQ | TN |
---|---|---|---|---|
100 | [0.063] | … | … | … |
1,000 | [0.565] | … | … | … |
10,000 | [5.946] | … | … | … |
100,000 | … | … | … | … |
1,000,000 | … | … | … | … |
10,000,000 | 9,879 | 5,957 | 326,300 | 10,421 |
The linear model, TL
, underestimates the total time, while the quadratic
model, TQ
, overestimates it. For N = 10,000,000, TL
declares it will
take 5,957 seconds (about 100 minutes), but TQ
declares it will take
326,300 seconds (about 91 hours). TN
does a better job in predicting
the performance, estimating 10,421 seconds (about 2.9 hours) against
the actual performance of 9,879 seconds (2.75 hours).
The prototype completes its processing overnight—that’s a relief!—but you must review the code for your prototype to see the algorithms and data structures it employs, so you can guarantee this result regardless of the problem instance being solved.
Why does the formula a
× N × log
(N) model the behavior so well? It has
to do with the fundamental algorithms used within your prototype
application. These three kinds of models—linear, quadratic, and N
log
N—appear regularly when analyzing algorithms. Let’s try one more
example to demonstrate a surprising result discovered about fifty years
ago.2
Consider two examples, shown in Listing 2-2, for multiplying two N-digit integers using an algorithm most of us learned in grade school. While I do not precisely define this algorithm, you see that it creates N products, listed below the original numbers, which are totaled to compute the final answer.
```
    456
  x 712
  -----
    912
   456
  3192
 ------
 324672

      123456
    x 712835
    --------
      617280
     370368
    987648
   246912
  123456
 864192
 -----------
 88003757760
```
When multiplying two 3-digit integers, you need 9 single-digit multiplications. For 6-digit integers, you need 36 single-digit multiplications. Using this algorithm with two N-digit integers requires N² single-digit multiplications. Another observation is that when you double the number of digits in the integers being multiplied, you need four times as many single-digit multiplications. I'm not even counting all the other work to do (like additions) because single-digit multiplication is the key operation.
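To make the counting argument concrete, here is a small sketch (my own illustration, not one of the book's listings) of grade-school multiplication that tallies its single-digit multiplications:

```python
def grade_school_multiply(x, y):
    # Multiply two nonnegative integers digit by digit, counting
    # the single-digit multiplications performed.
    xs = [int(d) for d in str(x)][::-1]   # digits, least significant first
    ys = [int(d) for d in str(y)][::-1]
    count = 0
    total = 0
    for j, dy in enumerate(ys):
        partial = 0
        for i, dx in enumerate(xs):
            partial += dx * dy * 10**i    # one single-digit multiplication
            count += 1
        total += partial * 10**j          # shifted partial product
    return total, count

assert grade_school_multiply(456, 712) == (324672, 9)
assert grade_school_multiply(123456, 712835) == (88003757760, 36)
```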
A computer’s CPU provides efficient operations to multiply fixed-size 32-bit or 64-bit integers, but it has no ability to deal with larger integers. Python automatically upgrades large integer values to a Bignum structure, which allows integers to grow to any size necessary. This means you can measure the runtime performance when multiplying two N-digit numbers. Table 2-3 derives three models based on the first five rows of timing results of multiplying two N-digit integers (shown in brackets).
N | Time (seconds) | TL | TQ | Karatsuba | TKN |
---|---|---|---|---|---|
256 | … | … | … | … | … |
512 | … | … | … | … | … |
1,024 | … | … | … | … | … |
2,048 | … | … | … | … | … |
4,096 | … | … | … | … | … |
8,192 | … | … | … | … | … |
16,384 | … | … | … | … | … |
32,768 | … | … | … | … | … |
65,536 | … | … | … | … | … |
131,072 | … | … | … | … | … |
262,144 | … | … | … | … | … |
TL is a linear model, while TQ is a quadratic model. Karatsuba is the unusual formula a × N^1.585, and the improved model is TKN(N) = a × N^1.585 + b × N, where a and b are constants.3 TL significantly underestimates the time. TQ significantly overestimates the time, which is surprising, since earlier intuition suggests that when N doubles, the time to perform should increase fourfold, an essential characteristic of quadratic models. These other models more accurately predict the performance of multiplying N-digit integers in Python, which uses an advanced, more efficient Karatsuba multiplication algorithm for large integers.
The approach used to generate Table 2-2 and Table 2-3 is a good start, but it is limited since it is indirect, based only on observed runtime performance rather than on a review of the code. Throughout this book, I will describe the implementations of the algorithms, and based on the structure of the code, I can identify the appropriate formula to model the performance of an algorithm.
When different algorithms solve the exact same problem, it is sometimes possible to identify which one will be the most efficient simply by classifying its performance using mathematical models. Often algorithms are described using phrases like "Complexity is O(N²)" or "Worst-case performance is O(N log N)." To explain this terminology, I want to start with the image in Figure 2-1, which might be a familiar one if you have read a book, or an online resource, that discusses algorithm analysis.
The goal is to find a model that predicts the worst runtime performance for a given problem instance, N. In mathematics, this is known as an upper bound—think of the phrase “the algorithm will never work harder than this.” A corresponding concept, the lower bound, represents the minimum runtime performance—in other words, “the algorithm must always work at least this much.”
To explain the concepts of lower and upper bounds, consider how a car's speedometer is a model that computes an indicated speed as an approximation of the car's true speed. The indicated speed must never be less than the true speed, so the driver can abide by speed limits; the true speed thus acts as a lower bound on what the speedometer reports. On the high end, the indicated speed is allowed to be up to 110% of the true speed plus 4 kilometers per hour,4 which acts as an upper bound. The indicated speed must always fall between this lower bound and this upper bound.
In Figure 2-1, the three curves (TL, TQ, and TKN) represent the model predictions, while black squares represent the individual actual performance when multiplying two N-digit integers. While TQ(N) is an upper bound on the actual performance (since TQ(N) > Time for all values of N), it is highly inaccurate, as you can see from Figure 2-1.
If you go back and review Table 2-3, observe that for all N values greater than or equal to the threshold problem instance size of 8,192, TKN(N) is greater than the actual performance reported for N, while remaining much closer to the actual values. This evidence points to a common pattern: the real behavior stabilizes once N is "large enough," where the threshold depends on each algorithm and the way it is implemented.
It might seem that TL(N) models the lower bound on the actual performance, since TL(N) < Time for all values of N. However, as N increases, it falls further and further below the runtime performance, essentially rendering it useless as a model of the algorithm's runtime performance. The Karatsuba formula a × N^1.585, whose values appear in Table 2-3, provides a more accurate lower bound.
If you run these programs on different computers, the numeric details shown in Table 2-3 will change: the runtime performance can be slower or faster; the a and b coefficients of TKN() will change; the threshold problem instance size above which TKN(N) stabilizes could be lower or higher. What would remain unchanged is the exponent 1.585 in the model, since the structure of the Karatsuba fast multiplication algorithm determines how it will perform. No supercomputer will somehow make the Karatsuba implementation suddenly behave in a way modeled by the linear TL(N) model.
We can now tackle asymptotic analysis, an approach that lets us eliminate any knowledge about the actual computer when evaluating the performance of an algorithm. Powerful computers can make code run faster, but they cannot bend the laws of asymptotic analysis.
The concept of an additive constant is common in many real-world scenarios, like the speedometer I just discussed. It’s what we mean when we say, “I’ll be there in 40 minutes, give or take 5 minutes.”
Asymptotic analysis takes this idea further and introduces the notion of a multiplicative constant to analyze algorithms. If you've heard of Moore's Law, you will find this concept familiar. Gordon Moore, the CEO and cofounder of Intel Corporation, predicted in 1965 that the number of components per integrated circuit would double every year for a decade; in 1975 he revised this prediction to doubling every two years. This prediction was valid for over 40 years and explains why the speed of computers essentially doubles every two years. A multiplicative constant applied to computing means you can find an older computer where the same program runs one thousand times slower (or even worse) than on a modern computer.
Consider two algorithms that solve the same problem. Using the techniques I have already shown, assume that algorithm X requires 5N operations on problems of size N, while algorithm Y requires 2020 × log(N) operations to solve the same problem. Is algorithm X more efficient than Y?
You have two computers on which you execute implementations of these algorithms: computer Cfast is two times faster than Cslow. Figure 2-2 reports the number of operations for each algorithm on a problem instance of size N. It also shows performance of X and Y on Cfast (i.e., columns labeled Xfast and Yfast) and the performance of X on Cslow (i.e., column labeled Xslow).
While initially X requires fewer operations than Y on problem instances of the same size, once N is 8,192 or larger, Y requires far fewer operations, and it’s not even close. The graphs in Figure 2-3 visualize the crossover point between 4,096 and 8,192, when Y begins to outperform X in terms of the number of operations required. When you run the exact same implementation of X on two different computers, you can see that Xfast (running on Cfast) outperforms Xslow (running on Cslow).
If you found a supercomputer, Cfastest, that was 500 times faster than Cslow, you could eventually find a problem instance size for which the efficient algorithm Y, running on Cslow, outperforms the inefficient algorithm X, running on Cfastest. This is an “apples vs. oranges” comparison in a way, because the programs are running on different computers; nonetheless, in this specific case, the crossover occurs on problem instance sizes between 4,194,304 and 8,388,608. Even with a supercomputer, eventually the more efficient algorithm will outperform it on a slower computer, once the problem instances are large enough.
You can try to throw advanced computing hardware at a problem, but eventually the more efficient algorithm will be faster for large-enough problem instances.
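A few lines of code can locate both crossover points described above; the operation-count functions come directly from the definitions of X and Y:

```python
import math

def ops_X(n):
    return 5 * n                  # algorithm X: 5N operations

def ops_Y(n):
    return 2020 * math.log2(n)    # algorithm Y: 2020 log2(N) operations

# crossover: smallest power of two where Y needs fewer operations than X
n = 2
while ops_X(n) <= ops_Y(n):
    n *= 2
assert n == 8192                  # between 4,096 and 8,192

# even on a computer 500x faster, X eventually loses to Y on a slow computer
n = 2
while ops_X(n) / 500 <= ops_Y(n):
    n *= 2
assert n == 8_388_608             # between 4,194,304 and 8,388,608
```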
Computer scientists use Big O notation to classify algorithms based on their runtime performance on best case (or worst case) problem instances of size N. The letter O is used because the growth rate of a function is called the "order of a function." For example, the formula 4N² + 3N − 5 is an "order 2" function, also called quadratic, since the largest exponent for N is 2.

To estimate the runtime performance for an algorithm on a problem instance of size N, start by counting the number of operations. It's assumed that each operation takes a fixed amount of time to execute, which turns this count into an estimate for the runtime performance.
T(N) is the time required for an algorithm to process a problem instance of size N. There can be different T(N) defined for best case and worst case problem instances for the same algorithm. The time unit does not matter (whether milliseconds or seconds).

S(N) is the storage required for an algorithm to process a problem instance of size N. There can be different S(N) defined for best case and worst case problem instances for the same algorithm. The space unit does not matter (whether bits or gigabytes).
The goal is to estimate the time for an algorithm to process any problem instance of size N. Because this estimate must be accurate for all problem instances, try to find a worst case problem instance that will force the algorithm to work the most.
First, determine K(N), the count of how many times a key operation executes on a worst case problem instance of size N. Next, estimate that the number of machine instructions executed in total would be a multiple of this count, that is, c × K(N). This is a safe assumption because each statement in a modern programming language can be compiled into tens or hundreds of machine instructions. You don't even have to compute c; rather, you can determine it empirically, based on the individual performance on a computer, as I have done.
The notation clearly classifies the trajectory of the performance (or storage) as a function of N. Each performance class O(f(N)) is described by some f(N). The terminology can, at first, be confusing. When classifying the algorithm, you will use a formula represented as a function, f, based on N. We have seen four performance classes so far: O(N), O(N log N), O(N^1.585), and O(N²).
To conduct an accurate analysis, you must inspect the source code to see the structure of the algorithm. In the following code example, how many times does the key operation ct = ct + 1 execute?
```python
for i in range(100):
    for j in range(N):
        ct = ct + 1
```
The outer i loop executes 100 times, and for each of these loops, the inner j loop executes N times. In total, ct = ct + 1 executes 100 × N times. The total time T(N) to execute the preceding code on a problem instance of size N is smaller than c × N for some suitably chosen c. If you execute this code on an actual computer, you will be able to determine the exact c. More precisely, using Big O notation, we can state that the performance of this code is O(N).
Run this code thousands of times on different computing systems, and each time you would be able to compute a different c; this fact remains true and is the reason we can classify the code performance as O(N). There are theoretical details I sidestep in this discussion, but you only need to know that when you have identified a function, f(N), that represents the count of the operations in your algorithm, you have its algorithm classification, O(f(N)).
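You can verify the 100 × N count empirically by wrapping the loops in a function:

```python
def run(N):
    # the nested loops from the example above
    ct = 0
    for i in range(100):
        for j in range(N):
            ct = ct + 1
    return ct

# the key operation executes exactly 100 * N times
assert run(1_000) == 100 * 1_000

# doubling N doubles the count: linear, O(N), behavior
assert run(2_000) == 2 * run(1_000)
```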
You can perform a similar analysis to determine the space complexity required for an algorithm on a problem instance of size N. When an algorithm dynamically allocates additional storage space, it invariably increases the running time because of the costs associated with dynamic memory management.
The following Python statements require a different amount of space:
range(N) uses a fixed amount of space because in Python 3, range is a lazy sequence object that produces the numbers one at a time, without allocating a whole list (as it did in Python 2).

list(range(N)) constructs a list storing N integers from 0 to N − 1. The size of the required memory grows in direct proportion to N.
Quantifying the space for a statement is hard because there is no universally agreed-upon unit for space. Should we count the bytes of memory used? The bits? Does it matter if an integer requires 32 bits of storage or 64 bits of storage? Imagine a future computer that allowed for 128-bit representations of integers. Has the space complexity changed? In Python, sys.getsizeof(...) determines the size in bytes for an object. Python 3's lazy range() significantly reduces the storage needs for Python programs. If you type the following statements into a Python interpreter, you will see the corresponding storage requirements:
```
>>> import sys
>>> sys.getsizeof(range(100))
48
>>> sys.getsizeof(range(10000))
48
>>> sys.getsizeof(list(range(100)))
1008
>>> sys.getsizeof(list(range(1000)))
9112
>>> sys.getsizeof(list(range(10000)))
90112
>>> sys.getsizeof(list(range(100000)))
900112
```
These results show that the byte storage for list(range(10000)) is about 100 times larger than for list(range(100)). And when you review the other numbers, you can classify this storage requirement as O(N).
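Here is a sketch that re-checks this classification; the exact byte counts vary across Python versions and platforms, so only the ratio and the constant-size property are asserted:

```python
import sys

# list storage grows in rough proportion to N (about 100x for 100x the values)
small = sys.getsizeof(list(range(100)))
large = sys.getsizeof(list(range(10000)))
assert 80 < large / small < 120

# ...while a range object needs the same space regardless of N
assert sys.getsizeof(range(100)) == sys.getsizeof(range(10000))
```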
In contrast, the number of bytes required for range(100) and range(10000) is identical (48 bytes). Since the storage is constant, we need to introduce another complexity class, known as the constant complexity class:
I've covered a lot of theoretical material in this chapter, and it's time to put these concepts to practical use. I now present an optimal searching algorithm in computer science called Binary Array Search. In explaining why it is so efficient, I will introduce a new complexity class, O(log N).
I have written a sequence of seven different numbers in increasing order from left to right, and hidden each one behind a door, as shown in Figure 2-4. Try this challenge: what is the fewest number of doors you need to open—one at a time—to either find a target value of 643 or prove it is not hidden behind one of these doors? You could start from the left and open each door—one at a time—until you find 643 or a larger number (which means it wasn’t behind a door in the first place). But with bad luck, you might have to open all seven doors. This search strategy doesn’t take advantage of the fact that you know the numbers behind the doors are in ascending order. Instead, you can solve the challenge by opening no more than three doors. Start with door 4 in the middle and open it.
The number it was hiding is 173; since you are searching for 643, you can ignore all of the doors to the left of door 4 (since the numbers behind those doors will all be smaller than 173). Now open door 6 to reveal the number 900. OK, so now you know that you can ignore the doors to the right of door 6. Only door 5 can be hiding 643, so open it now to determine whether the original series contained 643. I leave it to your imagination whether the number was behind that door.
If you repeat this process on any ascending list of seven numbers using any target value, you will never need to open more than three doors. Did you notice that 2³ − 1 = 7? What if you had 1,000,000 doors covering an ascending list of numbers? Would you accept a $10,000 challenge to determine whether a specific number is hidden behind some door if you are only able to open 20 doors? You should! Since 2²⁰ − 1 = 1,048,575, you can always locate a number in an ascending list of 1,048,575 numbers after opening 20 or fewer doors. Even better, if there were suddenly twice as many doors, 2,097,151 in fact, you would never need to open more than 21 doors to find a number; that's just one additional door to open. That seems astonishingly efficient! You have just discovered Binary Array Search.
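The door-count arithmetic can be summarized in a couple of lines:

```python
# with k door-openings (probes) you can always handle a sorted
# collection of up to 2**k - 1 values
def max_doors(k):
    return 2**k - 1

assert max_doors(3) == 7
assert max_doors(20) == 1_048_575
assert max_doors(21) == 2_097_151   # doubling the doors costs one more probe
```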
Binary Array Search is a fundamental algorithm in computer science because of its time complexity. Listing 2-3 contains an implementation that searches for target in an ordered list, A.

```python
def binary_array_search(A, target):
    lo = 0
    hi = len(A) - 1

    while lo <= hi:
        mid = (lo + hi) // 2

        if target < A[mid]:
            hi = mid - 1
        elif target > A[mid]:
            lo = mid + 1
        else:
            return True

    return False
```
1. Set lo and hi to be inclusive within list index positions of 0 and len(A)−1.
2. Continue as long as there is at least one value to explore.
3. Find midpoint value, A[mid], of remaining range A[lo .. hi].
4. If target is smaller than A[mid], continue looking to the left of mid.
5. If target is larger than A[mid], continue looking to the right of mid.
6. If target is found, return True.
7. Once lo is greater than hi, there are no values remaining to search. Report that target is not in A.
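As a quick check of this behavior, here is the Listing 2-3 implementation applied to a small sorted list (the sample values are my own; they mirror the kind of seven-value list shown in Figure 2-5):

```python
def binary_array_search(A, target):
    lo = 0
    hi = len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if target < A[mid]:
            hi = mid - 1
        elif target > A[mid]:
            lo = mid + 1
        else:
            return True
    return False

A = [2, 14, 15, 19, 25, 53, 90]              # seven values in ascending order
assert binary_array_search(A, 53) is True    # found on the second pass
assert binary_array_search(A, 17) is False   # three passes, then crossover
```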
Initially lo and hi are set to the lowest and highest indices of A. While there is a sublist to explore, find the midpoint, mid, using integer division. If A[mid] is target, your search is over; otherwise you have learned whether to repeat the search in the sublist to the left, A[lo .. mid-1], or to the right, A[mid+1 .. hi].
The notation A[lo .. mid] means the sublist from lo up to and including mid. If lo > mid, then the sublist is empty.
This algorithm determines whether a value exists in a sorted list of N values. As the loop iterates, eventually either the target will be found or hi crosses over to become smaller than lo, which ends the loop.
Consider using Binary Array Search to find a target value of 53 in the list shown in Figure 2-5. First, set lo and hi to the boundary index positions of A. In the while loop, mid is computed. Since A[mid] is 19, which is smaller than the target, 53, the code takes the elif case, setting lo to mid + 1 to refine the search on the sublist A[mid+1 .. hi]. The grayed-out values are no longer in consideration. The size of the sublist being explored after this iteration is reduced by half (from 7 values to 3).
In the second pass through the while loop, mid is recomputed, and it turns out that A[mid] is 53, which is the target value, so the function returns True.
Before conducting Binary Array Search, is it worth checking that target ≥ A[0] and target ≤ A[-1]? Doing so would prevent a fruitless search for a target that couldn't possibly be present in an ordered list. The short answer is no. This adds up to two comparisons to every search, which are unnecessary if the searched-for values are always within the range of the extreme values in A.
Now let's search for a value that is not in the list. To search for the target value of 17 in Figure 2-6, initialize lo and hi as before. A[mid] is 19, which is larger than the target, 17, so take the if case and focus the search on A[lo .. mid-1]. The grayed-out values are no longer in consideration. In the second pass, mid is recomputed; the target, 17, is greater than A[mid] = 14, so take the elif case and try searching A[mid+1 .. hi].
In the third time through the while loop, A[mid] is 15, which is smaller than the target value of 17. Once again, take the elif case, which sets lo to be larger than hi; this is the "crossover," shown at the bottom of Figure 2-6. The condition of the while loop is false, and so False is returned, declaring that A does not contain the target value.
What if you want to know the exact location of target in A, instead of just confirming that target is contained in A? Binary Array Search currently returns True or False. Modify the code, as shown in Listing 2-4, to return the index position, mid, where target is found.

```python
def binary_array_search(A, target):
    lo = 0
    hi = len(A) - 1

    while lo <= hi:
        mid = (lo + hi) // 2

        if target < A[mid]:
            hi = mid - 1
        elif target > A[mid]:
            lo = mid + 1
        else:
            return mid

    return -(lo + 1)
```
What should be returned when target is not in A? You could just return −1 (which is an invalid index location), but there is an opportunity to return more information. What if we could tell the caller "target is not in A, but if you wanted to insert target into A, it would go in this location"?
Look at Figure 2-6 again. When searching for a target value of 17 (which doesn't exist in A), the final value of lo is actually where 17 would be inserted. You could return −lo as the result, and this would work for all index locations except for the first one, which is zero. Instead, return the negation of (lo + 1). The calling function that receives a negative value, x, has learned that target would be placed at location −(x + 1). When a nonnegative value is returned, that is the location of target in A.
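To see the negative-return convention in action, here is the Listing 2-4 variant together with a caller that decodes the insertion point (the sample list is my own):

```python
def binary_array_search(A, target):
    lo = 0
    hi = len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if target < A[mid]:
            hi = mid - 1
        elif target > A[mid]:
            lo = mid + 1
        else:
            return mid
    return -(lo + 1)

A = [2, 14, 15, 19, 25, 53, 90]
assert binary_array_search(A, 53) == 5        # found at index 5

x = binary_array_search(A, 17)                # not found: negative result
assert x == -4
A.insert(-(x + 1), 17)                        # decode the insertion point
assert A == [2, 14, 15, 17, 19, 25, 53, 90]
```

Python's built-in bisect module offers the same insertion-point information for sorted lists via bisect_left().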
One final optimization remains. In Listing 2-4, there are two comparisons between target and A[mid]. When both values are numeric, Listing 2-5 shows how to compute their difference just once instead of invoking this key operation twice; this also ensures you only access A[mid] once.
```python
diff = target - A[mid]
if diff < 0:
    hi = mid - 1
elif diff > 0:
    lo = mid + 1
else:
    return mid
```
If target is smaller than A[mid], then diff < 0, which is logically equivalent to checking whether target < A[mid]. If diff is positive, then you know target was greater than A[mid]. Even when the values are not numeric, some programming languages offer a compareTo() function that returns a negative number, zero, or a positive number based on the relative ordering of two values. Using this operation leads to more efficient code when comparison operations are costly.
If the values in a list appear in descending order, you can still use Binary Array Search: just switch the way lo and hi are updated in the while loop.
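As a sketch of that adjustment (my own variant, not one of the book's listings): in a descending list, a target larger than A[mid] must lie in the left (larger) half, so the two update branches swap roles:

```python
def binary_search_descending(A, target):
    # Binary Array Search over a list sorted in DESCENDING order
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if target > A[mid]:       # target is in the left (larger) half
            hi = mid - 1
        elif target < A[mid]:     # target is in the right (smaller) half
            lo = mid + 1
        else:
            return True
    return False

assert binary_search_descending([90, 53, 25, 19, 15, 14, 2], 53) is True
assert binary_search_descending([90, 53, 25, 19, 15, 14, 2], 17) is False
```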
How efficient is Binary Array Search on a problem instance of size N? To answer this question, I have to compute, in the worst case, how many times the while loop is forced to execute. The mathematical concept of logarithms will tell us the answer.5
To see how logarithms work, consider this question: how many times do you need to double the number 1 until the value equals 33,554,432? Well, you could start computing this manually: 1, 2, 4, 8, 16, and so on, but this is truly tedious. Mathematically, you are looking for a value, x, such that 2^x = 33,554,432.
Note that 2^x involves exponentiation of a base (the value 2) and an exponent (the value x). A logarithm is the opposite of exponentiation, in the same way that division is the opposite of multiplication. To find an x such that 2^x = 33,554,432, compute log₂(33,554,432), resulting in the value 25.0. If you type 2^25 into a calculator, the result is 33,554,432.
This computation also answers the question of how many times you can divide 2 into 33,554,432: you get to 1 after 25 divisions. log() computes a floating point result; for example, log₂(137) is about 7.098032. This makes sense since 2⁷ = 128, and 137 requires a slightly higher exponent for base 2.
The Binary Array Search algorithm will repeat the while loop as long as lo ≤ hi, or in other words, while there are values to search. The first iteration starts with N values to search, and in the second iteration, this is reduced to no more than N/2 (if N is odd, it is (N − 1)/2). To determine the maximum number of successive iterations, you need to know how many times you can divide N by 2 until you reach 1. This quantity is exactly k = log₂(N), so the total number of times through the while loop is 1 + k, counting 1 for the first time for N values plus k for the successive iterations. Because log() can return a floating point value, and we need an integer number of iterations, use the floor(x) mathematical operation, which computes the largest integer that is smaller than or equal to x.
No handheld calculator has a button to compute log₂(x), and most calculator apps don't either. Don't panic! You can always compute log₂ easily. For example, log₂(16) = 4. On your calculator, enter 16 and then press the log button (which is either in base 10 or the natural base e). Your display should read something awful like 1.20411998. Now press the divide button, press the 2 button, and finally press the log button again. Your display should read 0.301029995. Now that all hope seems lost, press the equals button. Magically the value 4 appears. This sequence of operations demonstrates that log₂(x) = log₁₀(x)/log₁₀(2).
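These logarithm facts are easy to confirm in Python:

```python
import math

# doubling 1 twenty-five times reaches 33,554,432
assert math.log2(33_554_432) == 25.0
assert 2**25 == 33_554_432

# the calculator trick: log2 via base-10 logarithms
assert abs(math.log10(16) / math.log10(2) - 4.0) < 1e-9

# log2(137) is slightly above 7, since 2**7 = 128
assert abs(math.log2(137) - 7.098032) < 1e-5
```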
For Binary Array Search, the while loop iterates no more than floor(log₂(N)) + 1 times. This behavior is truly extraordinary! With one million values in sorted order, you can locate any value in just 20 passes through the while loop.
To provide some quick evidence for this formula, count the number of
iterations through the while
loop for problem instances whose size, N,
ranges from 8 to 15: you only need 4 in all cases. For example,
starting with 15 values in the first iteration, the second iteration
explores a sublist with 7 values, the third iteration explores a sublist with
3 values, and the fourth and final iteration explores a sublist with just
1 value. If you started with 10 values, the number of explored values
in each iteration would be 10 → 5 → 2 → 1, which
also means four iterations in total.
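You can verify this count with a short simulation (a sketch; the repeated halving mirrors how Binary Array Search discards at least half of the remaining values on each pass):

```python
import math

def iteration_count(N):
    """Count loop passes when a search range of N values is repeatedly halved."""
    count = 0
    while N >= 1:
        count += 1
        N = N // 2    # discard at least half the values; reaching 0 ends the loop
    return count

# Every N from 8 to 15 needs exactly 4 passes, matching floor(log2(N)) + 1.
for N in range(8, 16):
    print(N, iteration_count(N), math.floor(math.log2(N)) + 1)
```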
The experience with Binary Array Search leads to a new complexity class, O(log N), called the logarithmic complexity class, where f(N) = log(N).
To summarize, when you analyze an algorithm and state that its time complexity is O(log N), you are claiming that once the problem instance size is larger than some threshold size, the runtime performance, T(N), of the algorithm is always smaller than c × log(N) for some constant, c. Your claim is most precise when you cannot make the same claim using a complexity class of lower complexity.
All complexity classes are arranged in order of dominance, as shown in Figure 2-7. While there are an infinite number of complexity classes, the eight classes in this figure are the most commonly used ones. The constant time class, O(1), has the least amount of complexity and reflects constant work that is independent of the size of the problem instance. The next-higher complexity class, O(log N), is logarithmic, and you have seen how Binary Array Search falls into this category. Both of these classes are sub-linear and result in extremely efficient algorithms.
The linear, O(N), complexity class means complexity is directly proportional to the size of the problem instance. A series of polynomial classes are all of increasing complexity—O(N²), O(N³), and so on—up to any fixed constant, O(Nᶜ). Sandwiched between O(N) and O(N²) is O(N log N), which is often identified as the ideal complexity class for algorithm designers.
This dominance hierarchy can also help identify how to classify an algorithm when there is a mixed combination of time complexities. For example, if an algorithm contains two substeps—the first with time complexity of O(N log N) and the second with O(N²)—what is its overall complexity? The overall classification of the algorithm is O(N²) because the complexity of substep 2 has the dominant impact on the overall complexity. In practical terms, if you have modeled T(N) for an algorithm to be 5N² + 10,000,000 × N × log(N), then T(N) has complexity of O(N²).
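A quick computation shows why the N² term eventually dominates even against such a huge constant on the other term (a sketch; I use log base 2 here, since the base only changes the constant, not the classification):

```python
import math

def T(N):
    # Modeled runtime from the text: 5N^2 + 10,000,000 * N * log(N)
    return 5 * N**2 + 10_000_000 * N * math.log2(N)

# The N^2 term grows from a tiny fraction of the total to the majority.
for N in (10**4, 10**6, 10**8):
    share = (5 * N**2) / T(N)    # fraction of T(N) due to the N^2 term
    print(f"N={N:>11,}  N^2 share of total: {share:.1%}")
```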
The final two complexity classes—exponential and factorial—are inefficient, and algorithms with these time complexities can only solve very small problem instances. Check out the challenge exercises at the end of the chapter that address these complexity classes.
Table 2-4 presents the computation of f(N) for each of the designated complexity classes. Imagine that each of these numbers represents the estimated time in seconds for an algorithm with the rated time complexity (in a column) to process a problem instance of size N (as identified by the row). 4,096 seconds is about one hour and eight minutes, so in just over an hour of computation time, you could likely solve:
An O(1) algorithm, since its performance is independent of the problem instance size
An O(log N) algorithm for problem instances of size 2⁴⁰⁹⁶ or smaller
An O(N) algorithm for problem instances of size 4,096 or smaller
An O(N log N) algorithm for problem instances of size 462 or smaller
An O(N²) algorithm for problem instances of size 64 or smaller
An O(N³) algorithm for problem instances of size 16 or smaller
An O(2ᴺ) algorithm for problem instances of size 12 or smaller
An O(N!) algorithm for problem instances of size 7 or smaller
N | log₂(N) | N | N log N | N² | N³ | 2ᴺ | N! |
---|---|---|---|---|---|---|---|
2 | 1 | 2 | 2 | 4 | 8 | 4 | 2 |
4 | 2 | 4 | 8 | 16 | 64 | 16 | 24 |
8 | 3 | 8 | 24 | 64 | 512 | 256 | 40,320 |
16 | 4 | 16 | 64 | 256 | 4,096 | 65,536 | ~2.1×10¹³ |
32 | 5 | 32 | 160 | 1,024 | 32,768 | ~4.3×10⁹ | ~2.6×10³⁵ |
64 | 6 | 64 | 384 | 4,096 | 262,144 | ~1.8×10¹⁹ | ~1.3×10⁸⁹ |
128 | 7 | 128 | 896 | 16,384 | 2,097,152 | ~3.4×10³⁸ | ~3.9×10²¹⁵ |
256 | 8 | 256 | 2,048 | 65,536 | 16,777,216 | ~1.2×10⁷⁷ | ~8.6×10⁵⁰⁶ |
512 | 9 | 512 | 4,608 | 262,144 | ~1.3×10⁸ | ~1.3×10¹⁵⁴ | ~10¹¹⁶⁶ |
1,024 | 10 | 1,024 | 10,240 | 1,048,576 | ~1.1×10⁹ | ~1.8×10³⁰⁸ | ~10²⁶⁴⁰ |
2,048 | 11 | 2,048 | 22,528 | 4,194,304 | ~8.6×10⁹ | ~3.2×10⁶¹⁶ | ~10⁵⁸⁹⁴ |
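Every entry in the table above can be regenerated mechanically. Here is a short sketch (I stop at N = 1,024 because printing the decimal expansion of 2,048! exceeds Python's default integer-to-string digit limit; the sci helper is my own, not from the text):

```python
import math

def sci(x):
    """Format an integer compactly; truncates (does not round) the mantissa."""
    if x < 10**6:
        return f"{x:,}"
    s = str(x)
    return f"{s[0]}.{s[1:3]}e{len(s) - 1}"

header = ["N", "log2(N)", "N", "N log N", "N^2", "N^3", "2^N", "N!"]
print(" | ".join(header))
for k in range(1, 11):                        # N = 2, 4, ..., 1024
    N = 2 ** k
    row = [N, k, N, N * k, N**2, N**3, 2**N, math.factorial(N)]
    print(" | ".join(sci(v) for v in row))
```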
The reason to investigate algorithms with the lowest complexity rating is that the problems you want to solve are simply too large to handle otherwise, even on the fastest computer. With higher complexity classes, the time to solve even small problem instances is essentially infinite (as Figure 2-8 shows).
A common way to visualize these extremely large numbers is shown in Figure 2-8. The x-axis represents the size of the problem instance being solved. The y-axis represents the total estimated runtime performance for one of the algorithms labeled in the chart. As the complexity of the algorithm increases, the size of the problem instance that can be solved “in reasonable time” decreases.
Consider the following, more complex scenarios:
If someone classifies an algorithm as O(N² + N), how should you respond? The dominance hierarchy in Figure 2-7 shows that N² has a greater complexity than N, and so this can be simplified to be O(N²). Similarly, O(2ᴺ + N⁸) would be simplified to become O(2ᴺ).
If an algorithm is classified as O(50 × N³), you can simplify this to become O(N³) because multiplicative constants can be eliminated.
Sometimes the behavior of an algorithm can depend on properties other than just the size of the problem instance, N. For example, consider an algorithm that processes N numeric values, with a primary task whose runtime performance is directly proportional to N. Now assume this algorithm has a subtask that processes all even values in the problem instance. The runtime performance of this subtask is directly proportional to E², where E represents the number of even values. You might want to carefully specify that the runtime performance of the algorithm is O(N + E²). If, for example, you could eliminate all even values from the input set, this performance would be rated as O(N), and that might be noteworthy. Naturally, in the worst case where all numbers are even, then the overall classification of the algorithm would become O(N²) since E ≤ N.
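Here is a small sketch of such an algorithm (hypothetical code illustrating the structure described above, not a listing from the text): one linear pass over all N values, followed by a nested pass over only the E even values.

```python
def process(values):
    """O(N + E^2): a linear main task plus a quadratic subtask over even values."""
    evens = [v for v in values if v % 2 == 0]    # O(N) main task
    pair_count = 0
    for x in evens:                              # O(E^2) subtask: all pairs
        for y in evens:
            pair_count += 1
    return pair_count                            # equals E * E
```

With no even values the subtask does no work, so the whole function is O(N); when every value is even, E = N and the function degrades to O(N²).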
The curve_fit() function provided by SciPy uses a nonlinear least squares method to fit a model function, f, to existing data. I use data based on different problem instances of size N and the respective runtime performance of an algorithm in solving those instances. The results of curve_fit()—as you have seen in this chapter—are coefficients to use with a model function to predict future runtime performance. With these coefficients applied to f, the resulting model minimizes the sum of the squares of the errors between the actual data and the values predicted by the model.
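As a concrete sketch of the call itself (the timing measurements below are hypothetical, chosen only to illustrate a roughly linear algorithm), curve_fit() returns the coefficients that minimize the squared error between the model and the data:

```python
import numpy as np
from scipy.optimize import curve_fit

def linear_model(n, a, b):
    # Model function f: predicted runtime for a problem instance of size n
    return a * n + b

# Hypothetical (N, runtime-in-seconds) measurements, for illustration only.
N = np.array([100, 1_000, 10_000, 100_000], dtype=float)
T = np.array([0.002, 0.021, 0.197, 2.040])

(a, b), _ = curve_fit(linear_model, N, T)
print(f"T(N) ~= {a:.2e}*N + {b:.2e}")
```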
These models are useful for getting a ballpark estimate of the inner behavior of an algorithm’s implementation on specific problem instances. By itself, such a model is neither a proven upper bound nor a proven lower bound on the complexity of an algorithm. You need to review the algorithm’s implementation to develop a model that counts the number of key operations, which directly affects the runtime performance of the algorithm.
When you have an accurate model, f(N), that reflects the count of the key operations in the worst case for an algorithm on a problem instance of size N, then you have classified the algorithm as O(f(N)) in the worst case. This is the upper bound. A corresponding lower bound can be similarly derived to model the effort that the algorithm must at least expend in the worst case. The notation Ω(f(N)) is used to describe the classification of the lower bound of an algorithm.6
In our earlier discussion of Binary Array Search, I demonstrated that the while loop iterates no more than floor(log₂(N)) + 1 times. This means that, in the worst case, you can use f(N) = log(N) to formally classify Binary Array Search as O(log N). What is the best case for Binary Array Search? If the target value is found in A[mid], the function returns after just one pass through the while loop. Since this is a constant independent of the size of the problem instance, Binary Array Search is classified as O(1) in the best case. The big O notation can be used for both best case and worst case analysis, although many programmers assume it is meant only for worst case.
You may occasionally see the time complexity for an algorithm rated as Θ(N log N), using the capitalized Greek symbol theta. This notation is typically used to analyze the average case for an algorithm. It means that the upper bound is O(N log N) and the lower bound is Ω(N log N). This is known, mathematically, as a tight bound, and it provides the best evidence that the runtime performance of the algorithm is quite predictable.
We have covered a lot of ground in these first two chapters, but there is so much more you can learn about analyzing algorithms. I presented a number of examples that describe the way algorithms behave independent of how they are implemented. In the mid-20th century, while researchers were discovering new algorithms, advances in computing technology dramatically improved the performance of the computers executing these algorithms. Asymptotic analysis provides the foundation for independently assessing the performance of algorithms in a way that eliminates any dependence on the computing platform. I defined several time (or storage) complexity classes, visualized in Figure 2-8, to explain the behavior of an algorithm in terms of the size of the problem instances. These complexity classes appear throughout the book as a notation to quickly summarize an algorithm’s behavior.
Rate the time complexity of each code fragment in Table 2-5.
(Table 2-5 lists five code fragments, Fragment-1 through Fragment-5; the fragments themselves are not reproduced here.)
Use the techniques described in this chapter to model the value of ct
returned by the f4
function in Listing 2-6.
```python
def f4(N):
    ct = 1
    while N >= 2:
        ct = ct + 1
        N = N ** 0.5
    return ct
```
You will find that none of the models used in this chapter is accurate. Instead, develop one based on a × log(log(N)), in base 2. Generate a table up to N = 2⁵⁰ containing actual results as compared with the model. An algorithm with this behavior would be classified as O(log(log(N))).
One way to sort a list of values is to generate each permutation until you find one that is sorted, as shown in Listing 2-7.
```python
from itertools import permutations
from scipy.special import factorial

def factorial_model(n, a):
    return a*factorial(n)

def check_sorted(a):
    for i, val in enumerate(a):
        if i > 0 and val < a[i-1]:
            return False
    return True

def permutation_sort(A):
    for attempt in permutations(A):
        if check_sorted(attempt):
            A[:] = attempt[:]        # copy back into A
            return
```
Generate a table of results for sorting a worst case problem instance
(i.e., the values are in descending order) of up to 12 elements using
permutation_sort()
. Use the factorial_model()
to curve fit the
preliminary results and see how accurate the model is in predicting runtime
performance. Based on these results, what is your estimate (in years) for
the runtime performance on a worst case problem instance of size 20?
Generate empirical evidence on 50,000 random trials of Binary Array Search for N in the range of 2⁵ through 2²¹. Each trial should use random.sample() to randomly select N values from the range 0 .. 4N and place these values in sorted order. Then each trial should search for a random target value in the same range.
Using the results I have outlined in this chapter, use curve_fit() to develop a log N model that models the runtime performance results for N in the range 2⁵ through 2¹². Determine the threshold problem instance size above which the behavior stabilizes. Create a visual plot of the data to see whether the computed model accurately models the empirical data.
We are normally concerned with time complexity, but consider the sorting algorithm in Listing 2-8:
```python
def max_sort(A):
    result = []
    while len(A) > 1:
        index_max = max(range(len(A)), key=A.__getitem__)
        result.insert(0, A[index_max])
        A = list(A[:index_max]) + list(A[index_max+1:])
    return A + result
```
Using the results I have outlined in this chapter, assess the storage
complexity of max_sort
.
A galactic algorithm is an algorithm whose time complexity is better than that of any known algorithm when the problem instance size is “sufficiently large.” For example, the N-digit multiplication algorithm (published November 2020) by David Harvey and Joris Van Der Hoeven has O(N log N) runtime performance once N is larger than 2 raised to the power Z, where Z is 1729¹²; this exponent, Z, is already an astronomically large number, about 7 × 10³⁸. Consider now raising 2 to this incredibly large number!
Do some research on other galactic algorithms. While these algorithms are
not practical, they do offer hope that breakthroughs are possible on some
really challenging problems.
Table 2-1 contains three rows of performance measurements on three data sets of different sizes. If you only had two rows of performance measurements, would it be possible to predict the performance of a quadratic time algorithm? In general, if you have K rows of performance measurements, what is the highest polynomial you can effectively use in a model?
1 Pronounced using three syllables, like en-log-en.
2 A fast multiplication algorithm was discovered in 1960 by a 23-year-old student at Moscow State University named Anatoly Karatsuba. Python uses this algorithm when multiplying very large integers.
3 The exponent, 1.585, is the approximate value of log₂(3), which is 1.58496250072116.
4 This is a European Union requirement; in the United States, it is sufficient for the speedometer to be accurate to within plus or minus 5 miles per hour.
5 It is purely a coincidence that the word logarithm is an anagram of algorithm.
6 Ω is the capitalized Greek character omega.