This chapter introduces the terminology and notation used by theoreticians and practitioners alike to model the computational performance and resource usage of algorithms. When evaluating the runtime performance of your software program, you might be perfectly satisfied, in which case you can continue to use the application as is. But if you want to improve runtime performance, this book shows you where to start: with the program's data structures and algorithms. You are faced with some specific questions:

- Are there other algorithms that would significantly improve performance?
- Are there hidden performance costs that can be eliminated?

The exact same program will have different runtime performance based on the computer on which it runs. In this chapter, I explain how computer scientists have developed analysis techniques that account for regular hardware improvements.
I start by showing how to model the runtime performance of a program on ever-increasing problem instance sizes. The runtime performance of an algorithm on small problem instance sizes can be hard to measure accurately because it could be sensitive to the actual values in the problem instance or to the resolution of the computer timers. Once your program processes a sufficiently large problem instance, you can develop empirical models to classify its runtime behavior.
I’d like to start with an example that shows how theoretical analysis is decidedly practical in real software systems. Imagine you are in charge of designing an application as part of a nightly batch job to process a large data set each night; the task is launched at midnight and must complete before 6:00 a.m. The data set contains several million values and is expected to double in size over the next five years.
You have built a working prototype but have only tested it on multiple small data sets, with 100, 1,000, and 10,000 values each. Table 2-1 presents the runtime performance of your prototype on these data sets.
N | Time (seconds) |
---|---|
100 | 0.063 |
1,000 | 0.565 |
10,000 | 5.946 |
Can these preliminary results predict the performance of your prototype on larger problem instances, such as 100,000 or even 1,000,000? Let's build mathematical models from this data alone to define a function T(N) that predicts the runtime performance for a given problem instance size. An accurate model will compute a T(N) that is close to the three values in Table 2-1, as well as predict higher values of N, as shown in Table 2-2 (which repeats these three time results in brackets).
You may have used a software tool, such as Maple or Microsoft Excel, to compute a trendline (also called a line of best fit) for sample data. The popular SciPy library for mathematics, science, and engineering can develop these trendline models. Listing 2-1 uses scipy to try to find a linear model, TL(N) = a × N + b, where a and b are constants. curve_fit() will return the (a, b) coefficients to use with the linear model based on the available empirical data encoded in lists xs and ys.
```python
import numpy as np
from scipy.optimize import curve_fit

def linear_model(n, a, b):
    return a*n + b

# Sample data
xs = [100, 1000, 10000]
ys = [0.063, 0.565, 5.946]

# Coefficients are returned as first argument
[(a, b), _] = curve_fit(linear_model, np.array(xs), np.array(ys))
print('Linear = {}*N + {}'.format(a, b))
```
The resulting model is the formula TL(N) = 0.000596 × N − 0.012833. As you can see from Table 2-2, this model is inaccurate because as the problem size increases, it significantly underestimates the actual runtime performance of the prototype. Another possible model is a quadratic polynomial, where N is raised to the power of 2:
```python
def quadratic_model(n, a, b):
    return a*n*n + b*n
```
With quadratic_model, the goal is to find TQ(N) = a × N² + b × N, where a and b are constants. Using the approach in Listing 2-1, the formula is TQ(N) = 0.000000003206 × N² + 0.000563 × N. Table 2-2 shows that as the problem size increases, this model significantly overestimates the actual runtime performance, so it is also not accurate.

Many of these constants are quite small, like 0.000000003206, which is 3.206 × 10⁻⁹. The reason is that the problems solved by algorithms involve problem instances where N = 1,000,000 or higher. Note that (1,000,000)² = 10¹², so be prepared to see both very small and very large constants.
The final column in Table 2-2 contains the predicted result using a third mathematical model, TN(N) = a × N × log(N), which uses the logarithm function (in base 2) and in which a is a constant. The result is TN(N) = 0.0000448 × N × log(N). For N = 10,000,000, the estimate computed by TN(N) is within about 5% of the actual value.
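You can check the fitted N log N model against the measurements directly. The sketch below hardcodes the coefficient 0.0000448 reported above rather than rerunning curve_fit, so small rounding differences against the table values are expected:

```python
import math

def TN(n, a=0.0000448):
    # N log N model from the text: TN(N) = a * N * log2(N)
    return a * n * math.log2(n)

# close to the measured 5.946 seconds at N = 10,000
assert abs(TN(10_000) - 5.946) < 0.05

# prediction for N = 10,000,000 is roughly 10,421 seconds (Table 2-2)
assert abs(TN(10_000_000) - 10_421) < 50
```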
N | Time (seconds) | TL | TQ | TN |
---|---|---|---|---|
100 | [0.063] | … | … | … |
1,000 | [0.565] | … | … | … |
10,000 | [5.946] | … | … | … |
100,000 | … | … | … | … |
1,000,000 | … | … | … | … |
10,000,000 | 9,879 | 5,957 | 326,300 | 10,421 |
The linear model, TL
, underestimates the total time, while the quadratic
model, TQ
, overestimates it. For N = 10,000,000, TL
declares it will
take 5,957 seconds (about 100 minutes), but TQ
declares it will take
326,300 seconds (about 91 hours). TN
does a better job in predicting
the performance, estimating 10,421 seconds (about 2.9 hours) against
the actual performance of 9,879 seconds (2.75 hours).
The prototype completes its processing overnight—that’s a relief!—but you must review the code for your prototype to see the algorithms and data structures it employs, so you can guarantee this result regardless of the problem instance being solved.
Why does the formula a
× N × log
(N) model the behavior so well? It has
to do with the fundamental algorithms used within your prototype
application. These three kinds of models—linear, quadratic, and N
log
N—appear regularly when analyzing algorithms. Let’s try one more
example to demonstrate a surprising result discovered about fifty years
ago.2
Consider two examples, shown in Listing 2-2, for multiplying two N-digit integers using an algorithm most of us learned in grade school. While I do not precisely define this algorithm, you see that it creates N products, listed below the original numbers, which are totaled to compute the final answer.
```
    456
  x 712
  -----
    912
   456
  3192
 ------
 324672

      123456
    x 712835
    --------
      617280
     370368
    987648
   246912
  123456
 864192
 -----------
 88003757760
```
When multiplying two 3-digit integers, you need 9 single-digit multiplications. For 6-digit integers, you need 36 single-digit multiplications. Using this algorithm with two N-digit integers requires N² single-digit multiplications. Another observation is that when you double the number of digits in the integers being multiplied, you need four times as many single-digit multiplications. I'm not even counting all the other work to do (like additions) because single-digit multiplication is the key operation.
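To make the counting argument concrete, here is a small sketch (my own illustration, not one of the book's listings) of grade-school multiplication that tallies its single-digit multiplications:

```python
def grade_school_multiply(x, y):
    # Multiply two nonnegative integers digit by digit, counting
    # the single-digit multiplications performed.
    xs = [int(d) for d in str(x)][::-1]   # digits, least significant first
    ys = [int(d) for d in str(y)][::-1]
    count = 0
    total = 0
    for j, dy in enumerate(ys):
        partial = 0
        for i, dx in enumerate(xs):
            partial += dx * dy * 10**i    # one single-digit multiplication
            count += 1
        total += partial * 10**j          # shifted partial product
    return total, count

assert grade_school_multiply(456, 712) == (324672, 9)
assert grade_school_multiply(123456, 712835) == (88003757760, 36)
```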
A computer’s CPU provides efficient operations to multiply fixed-size 32-bit or 64-bit integers, but it has no ability to deal with larger integers. Python automatically upgrades large integer values to a Bignum structure, which allows integers to grow to any size necessary. This means you can measure the runtime performance when multiplying two N-digit numbers. Table 2-3 derives three models based on the first five rows of timing results of multiplying two N-digit integers (shown in brackets).
N | Time (seconds) | TL | TQ | Karatsuba | TKN |
---|---|---|---|---|---|
256 | … | … | … | … | … |
512 | … | … | … | … | … |
1,024 | … | … | … | … | … |
2,048 | … | … | … | … | … |
4,096 | … | … | … | … | … |
8,192 | … | … | … | … | … |
16,384 | … | … | … | … | … |
32,768 | … | … | … | … | … |
65,536 | … | … | … | … | … |
131,072 | … | … | … | … | … |
262,144 | … | … | … | … | … |
TL is a linear model, while TQ is a quadratic model. Karatsuba is the unusual formula a × N^1.585, and the improved model is TKN(N) = a × N^1.585 + b × N, where a and b are constants.3 TL significantly underestimates the time. TQ significantly overestimates the time, which is surprising, since earlier intuition suggests that when N doubles, the time to perform should increase fourfold, an essential characteristic of quadratic models. These other models more accurately predict the performance of multiplying N-digit integers in Python, which uses an advanced, more efficient Karatsuba multiplication algorithm for large integers.
The approach used to generate Table 2-2 and Table 2-3 is a good start, but it is limited since it is indirect, based only on observed runtime performance rather than on a review of the code. Throughout this book, I will describe the implementations of the algorithms, and based on the structure of the code, I can identify the appropriate formula to model the performance of an algorithm.
When different algorithms solve the exact same problem, it is sometimes possible to identify which one will be the most efficient simply by classifying its performance using mathematical models. Often algorithms are described using phrases like "Complexity is O(N²)" or "Worst-case performance is O(N log N)." To explain this terminology, I want to start with the image in Figure 2-1, which might be a familiar one if you have read a book, or an online resource, that discusses algorithm analysis.
The goal is to find a model that predicts the worst runtime performance for a given problem instance, N. In mathematics, this is known as an upper bound—think of the phrase “the algorithm will never work harder than this.” A corresponding concept, the lower bound, represents the minimum runtime performance—in other words, “the algorithm must always work at least this much.”
To explain the concepts of lower and upper bounds, consider how a car's speedometer is a model that computes an indicated speed as an approximation of the car's true speed. The indicated speed must never be less than the true speed, so the driver can abide by speed limits; the true speed thus acts as a lower bound on what the speedometer reports. On the high end, the indicated speed is allowed to be up to 110% of the true speed plus 4 kilometers per hour,4 which acts as an upper bound. The indicated speed must always fall between this lower bound and this upper bound.
In Figure 2-1, the three curves (TL, TQ, and TKN) represent the model predictions, while black squares represent the individual actual performance when multiplying two N-digit integers. While TQ(N) is an upper bound on the actual performance (since TQ(N) > Time for all values of N), it is highly inaccurate, as you can see from Figure 2-1.
If you go back and review Table 2-3, observe that for all N values greater than or equal to the threshold problem instance size of 8,192, TKN(N) is greater than the actual performance reported for N, while remaining much closer to the actual values. This evidence points to a common pattern: the real behavior stabilizes once N is "large enough," where the threshold depends on each algorithm and the way it is implemented.
It might seem that TL(N) models the lower bound on the actual performance, since TL(N) < Time for all values of N. However, as N increases, it falls further and further below the runtime performance, essentially rendering it useless as a model of the algorithm's runtime performance. The Karatsuba formula a × N^1.585, whose values appear in Table 2-3, provides a more accurate lower bound.
If you run these programs on different computers, the numeric details shown in Table 2-3 will change: the runtime performance can be slower or faster; the a and b coefficients of TKN() will change; the threshold problem instance size above which TKN(N) stabilizes could be lower or higher. What would remain unchanged is the exponent 1.585 in the model, since the structure of the Karatsuba fast multiplication algorithm determines how it will perform. No supercomputer will somehow make the Karatsuba implementation suddenly behave in a way modeled by the linear TL(N) model.
We can now tackle asymptotic analysis, an approach that lets us eliminate any knowledge about the actual computer when evaluating the performance of an algorithm. Powerful computers can make code run faster, but they cannot bend the laws of asymptotic analysis.
The concept of an additive constant is common in many real-world scenarios, like the speedometer I just discussed. It’s what we mean when we say, “I’ll be there in 40 minutes, give or take 5 minutes.”
Asymptotic analysis takes this idea further and introduces the notion of a multiplicative constant to analyze algorithms. If you've heard of Moore's Law, you will find this concept familiar. Gordon Moore, the CEO and cofounder of Intel Corporation, predicted in 1965 that the number of components per integrated circuit would double every year for a decade; in 1975 he revised this prediction to doubling every two years. This prediction was valid for over 40 years and explains why the speed of computers essentially doubles every two years. A multiplicative constant applied to computing means you can find an older computer where the same program runs one thousand times slower (or even worse) than on a modern computer.
Consider two algorithms that solve the same problem. Using the techniques I have already shown, assume that algorithm X requires 5N operations on problems of size N, while algorithm Y requires 2020 × log(N) operations to solve the same problem. Is algorithm X more efficient than Y?
You have two computers on which you execute implementations of these algorithms: computer Cfast is two times faster than Cslow. Figure 2-2 reports the number of operations for each algorithm on a problem instance of size N. It also shows performance of X and Y on Cfast (i.e., columns labeled Xfast and Yfast) and the performance of X on Cslow (i.e., column labeled Xslow).
While initially X requires fewer operations than Y on problem instances of the same size, once N is 8,192 or larger, Y requires far fewer operations, and it’s not even close. The graphs in Figure 2-3 visualize the crossover point between 4,096 and 8,192, when Y begins to outperform X in terms of the number of operations required. When you run the exact same implementation of X on two different computers, you can see that Xfast (running on Cfast) outperforms Xslow (running on Cslow).
If you found a supercomputer, Cfastest, that was 500 times faster than Cslow, you could eventually find a problem instance size for which the efficient algorithm Y, running on Cslow, outperforms the inefficient algorithm X, running on Cfastest. This is an “apples vs. oranges” comparison in a way, because the programs are running on different computers; nonetheless, in this specific case, the crossover occurs on problem instance sizes between 4,194,304 and 8,388,608. Even with a supercomputer, eventually the more efficient algorithm will outperform it on a slower computer, once the problem instances are large enough.
You can try to throw advanced computing hardware at a problem, but eventually the more efficient algorithm will be faster for large-enough problem instances.
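A few lines of code can locate both crossover points described above; the operation-count functions come directly from the definitions of X and Y:

```python
import math

def ops_X(n):
    return 5 * n                  # algorithm X: 5N operations

def ops_Y(n):
    return 2020 * math.log2(n)    # algorithm Y: 2020 log2(N) operations

# crossover: smallest power of two where Y needs fewer operations than X
n = 2
while ops_X(n) <= ops_Y(n):
    n *= 2
assert n == 8192                  # between 4,096 and 8,192

# even on a computer 500x faster, X eventually loses to Y on a slow computer
n = 2
while ops_X(n) / 500 <= ops_Y(n):
    n *= 2
assert n == 8_388_608             # between 4,194,304 and 8,388,608
```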
Computer scientists use Big O notation to classify algorithms based on their runtime performance on best case (or worst case) problem instances of size N. The letter O is used because the growth rate of a function is called the "order of a function." For example, the formula 4N² + 3N − 5 is an "order 2" function, also called quadratic, since the largest exponent for N is 2.

To estimate the runtime performance for an algorithm on a problem instance of size N, start by counting the number of operations. It's assumed that each operation takes a fixed amount of time to execute, which turns this count into an estimate for the runtime performance.
T(N) is the time required for an algorithm to process a problem instance of size N. There can be different T(N) defined for best case and worst case problem instances for the same algorithm. The time unit does not matter (whether milliseconds or seconds).

S(N) is the storage required for an algorithm to process a problem instance of size N. There can be different S(N) defined for best case and worst case problem instances for the same algorithm. The space unit does not matter (whether bits or gigabytes).
The goal is to estimate the time for an algorithm to process any problem instance of size N. Because this estimate must be accurate for all problem instances, try to find a worst case problem instance that will force the algorithm to work the most.
First, determine K(N), the count of how many times a key operation executes on a worst case problem instance of size N. Next, estimate that the number of machine instructions executed in total would be a multiple of this count, that is, c × K(N). This is a safe assumption because each statement in a modern programming language can be compiled into tens or hundreds of machine instructions. You don't even have to compute c; rather, you can determine it empirically, based on the individual performance on a computer, as I have done.
The notation clearly classifies the trajectory of the performance (or storage) as a function of N. Each performance class O(f(N)) is described by some f(N). The terminology can, at first, be confusing. When classifying the algorithm, you will use a formula represented as a function, f, based on N. We have seen four performance classes so far: O(N), O(N log N), O(N^1.585), and O(N²).
To conduct an accurate analysis, you must inspect the source code to see the structure of the algorithm. In the following code example, how many times does the key operation ct = ct + 1 execute?
```python
for i in range(100):
    for j in range(N):
        ct = ct + 1
```
The outer i loop executes 100 times, and for each of these loops, the inner j loop executes N times. In total, ct = ct + 1 executes 100 × N times. The total time T(N) to execute the preceding code on a problem instance of size N is smaller than c × N for some suitably chosen c. If you execute this code on an actual computer, you will be able to determine the exact c. More precisely, using Big O notation, we can state that the performance of this code is O(N).
Run this code thousands of times on different computing systems, and each time you would be able to compute a different c; this fact remains true and is the reason we can classify the code performance as O(N). There are theoretical details I sidestep in this discussion, but you only need to know that when you have identified a function, f(N), that represents the count of the operations in your algorithm, you have its algorithm classification, O(f(N)).
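You can verify the 100 × N count empirically by wrapping the loops in a function:

```python
def run(N):
    # the nested loops from the example above
    ct = 0
    for i in range(100):
        for j in range(N):
            ct = ct + 1
    return ct

# the key operation executes exactly 100 * N times
assert run(1_000) == 100 * 1_000

# doubling N doubles the count: linear, O(N), behavior
assert run(2_000) == 2 * run(1_000)
```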
You can perform a similar analysis to determine the space complexity required for an algorithm on a problem instance of size N. When an algorithm dynamically allocates additional storage space, it invariably increases the running time because of the costs associated with dynamic memory management.
The following Python statements require a different amount of space:
range(N) uses a fixed amount of space because in Python 3, range is a lazy sequence object that produces the numbers one at a time, without allocating a whole list (as it did in Python 2).

list(range(N)) constructs a list storing N integers from 0 to N − 1. The size of the required memory grows in direct proportion to N.
Quantifying the space for a statement is hard because there is no universally agreed-upon unit for space. Should we count the bytes of memory used? The bits? Does it matter if an integer requires 32 bits of storage or 64 bits of storage? Imagine a future computer that allowed for 128-bit representations of integers. Has the space complexity changed? In Python, sys.getsizeof(...) determines the size in bytes for an object. Python 3's lazy range() significantly reduces the storage needs for Python programs. If you type the following statements into a Python interpreter, you will see the corresponding storage requirements:
```
>>> import sys
>>> sys.getsizeof(range(100))
48
>>> sys.getsizeof(range(10000))
48
>>> sys.getsizeof(list(range(100)))
1008
>>> sys.getsizeof(list(range(1000)))
9112
>>> sys.getsizeof(list(range(10000)))
90112
>>> sys.getsizeof(list(range(100000)))
900112
```
These results show that the byte storage for list(range(10000)) is about 100 times larger than for list(range(100)). And when you review the other numbers, you can classify this storage requirement as O(N).
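Here is a sketch that re-checks this classification; the exact byte counts vary across Python versions and platforms, so only the ratio and the constant-size property are asserted:

```python
import sys

# list storage grows in rough proportion to N (about 100x for 100x the values)
small = sys.getsizeof(list(range(100)))
large = sys.getsizeof(list(range(10000)))
assert 80 < large / small < 120

# ...while a range object needs the same space regardless of N
assert sys.getsizeof(range(100)) == sys.getsizeof(range(10000))
```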
In contrast, the number of bytes required for range(100) and range(10000) is identical (48 bytes). Since the storage is constant, we need to introduce another complexity class, known as the constant complexity class:
I've covered a lot of theoretical material in this chapter, and it's time to put these concepts to practical use. I now present an optimal searching algorithm in computer science called Binary Array Search. In explaining why it is so efficient, I will introduce a new complexity class, O(log N).
I have written a sequence of seven different numbers in increasing order from left to right, and hidden each one behind a door, as shown in Figure 2-4. Try this challenge: what is the fewest number of doors you need to open—one at a time—to either find a target value of 643 or prove it is not hidden behind one of these doors? You could start from the left and open each door—one at a time—until you find 643 or a larger number (which means it wasn’t behind a door in the first place). But with bad luck, you might have to open all seven doors. This search strategy doesn’t take advantage of the fact that you know the numbers behind the doors are in ascending order. Instead, you can solve the challenge by opening no more than three doors. Start with door 4 in the middle and open it.
The number it was hiding is 173; since you are searching for 643, you can ignore all of the doors to the left of door 4 (since the numbers behind those doors will all be smaller than 173). Now open door 6 to reveal the number 900. OK, so now you know that you can ignore the doors to the right of door 6. Only door 5 can be hiding 643, so open it now to determine whether the original series contained 643. I leave it to your imagination whether the number was behind that door.
If you repeat this process on any ascending list of seven numbers using any target value, you will never need to open more than three doors. Did you notice that 2³ − 1 = 7? What if you had 1,000,000 doors covering an ascending list of numbers? Would you accept a $10,000 challenge to determine whether a specific number is hidden behind some door if you are only able to open 20 doors? You should! Since 2²⁰ − 1 = 1,048,575, you can always locate a number in an ascending list of 1,048,575 numbers after opening 20 or fewer doors. Even better, if there were suddenly twice as many doors, 2,097,151 in fact, you would never need to open more than 21 doors to find a number; that's just one additional door to open. That seems astonishingly efficient! You have just discovered Binary Array Search.
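The door-count arithmetic can be summarized in a couple of lines:

```python
# with k door-openings (probes) you can always handle a sorted
# collection of up to 2**k - 1 values
def max_doors(k):
    return 2**k - 1

assert max_doors(3) == 7
assert max_doors(20) == 1_048_575
assert max_doors(21) == 2_097_151   # doubling the doors costs one more probe
```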
Binary Array Search is a fundamental algorithm in computer science because of its time complexity. Listing 2-3 contains an implementation that searches for target in an ordered list, A.

```python
def binary_array_search(A, target):
    lo = 0
    hi = len(A) - 1

    while lo <= hi:
        mid = (lo + hi) // 2

        if target < A[mid]:
            hi = mid - 1
        elif target > A[mid]:
            lo = mid + 1
        else:
            return True

    return False
```
1. Set lo and hi to be inclusive within list index positions of 0 and len(A)−1.
2. Continue as long as there is at least one value to explore.
3. Find midpoint value, A[mid], of remaining range A[lo .. hi].
4. If target is smaller than A[mid], continue looking to the left of mid.
5. If target is larger than A[mid], continue looking to the right of mid.
6. If target is found, return True.
7. Once lo is greater than hi, there are no values remaining to search. Report that target is not in A.
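As a quick check of this behavior, here is the Listing 2-3 implementation applied to a small sorted list (the sample values are my own; they mirror the kind of seven-value list shown in Figure 2-5):

```python
def binary_array_search(A, target):
    lo = 0
    hi = len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if target < A[mid]:
            hi = mid - 1
        elif target > A[mid]:
            lo = mid + 1
        else:
            return True
    return False

A = [2, 14, 15, 19, 25, 53, 90]              # seven values in ascending order
assert binary_array_search(A, 53) is True    # found on the second pass
assert binary_array_search(A, 17) is False   # three passes, then crossover
```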
Initially lo and hi are set to the lowest and highest indices of A. While there is a sublist to explore, find the midpoint, mid, using integer division. If A[mid] is target, your search is over; otherwise you have learned whether to repeat the search in the sublist to the left, A[lo .. mid-1], or to the right, A[mid+1 .. hi].
The notation A[lo .. mid] means the sublist from lo up to and including mid. If lo > mid, then the sublist is empty.
This algorithm determines whether a value exists in a sorted list of N values. As the loop iterates, eventually either the target will be found or hi crosses over to become smaller than lo, which ends the loop.
Consider using Binary Array Search to find a target value of 53 in the list shown in Figure 2-5. First, set lo and hi to the boundary index positions of A. In the while loop, mid is computed. Since A[mid] is 19, which is smaller than the target, 53, the code takes the elif case, setting lo to mid + 1 to refine the search on the sublist A[mid+1 .. hi]. The grayed-out values are no longer in consideration. The size of the sublist being explored after this iteration is reduced by half (from 7 values to 3).
In the second pass through the while loop, mid is recomputed, and it turns out that A[mid] is 53, which is the target value, so the function returns True.
Before conducting Binary Array Search, is it worth checking that target ≥ A[0] and target ≤ A[-1]? Doing so would prevent a fruitless search for a target that couldn't possibly be present in an ordered list. The short answer is no. This adds up to two comparisons to every search, which are unnecessary if the searched-for values are always within the range of the extreme values in A.
Now let's search for a value that is not in the list. To search for the target value of 17 in Figure 2-6, initialize lo and hi as before. A[mid] is 19, which is larger than the target, 17, so take the if case and focus the search on A[lo .. mid-1]. The grayed-out values are no longer in consideration. In the second pass, mid is recomputed; the target, 17, is greater than A[mid] = 14, so take the elif case and try searching A[mid+1 .. hi].
In the third time through the while loop, A[mid] is 15, which is smaller than the target value of 17. Once again, take the elif case, which sets lo to be larger than hi; this is the "crossover," shown at the bottom of Figure 2-6. The condition of the while loop is false, and so False is returned, declaring that A does not contain the target value.
What if you want to know the exact location of target in A, instead of just confirming that target is contained in A? Binary Array Search currently returns True or False. Modify the code, as shown in Listing 2-4, to return the index position, mid, where target is found.

```python
def binary_array_search(A, target):
    lo = 0
    hi = len(A) - 1

    while lo <= hi:
        mid = (lo + hi) // 2

        if target < A[mid]:
            hi = mid - 1
        elif target > A[mid]:
            lo = mid + 1
        else:
            return mid

    return -(lo + 1)
```
What should be returned when target is not in A? You could just return −1 (which is an invalid index location), but there is an opportunity to return more information. What if we could tell the caller "target is not in A, but if you wanted to insert target into A, it would go in this location"?
Look at Figure 2-6 again. When searching for a target value of 17 (which doesn't exist in A), the final value of lo is actually where 17 would be inserted. You could return −lo as the result, and this would work for all index locations except for the first one, which is zero. Instead, return the negation of (lo + 1). The calling function that receives a negative value, x, has learned that target would be placed at location −(x + 1). When a nonnegative value is returned, that is the location of target in A.
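To see the negative-return convention in action, here is the Listing 2-4 variant together with a caller that decodes the insertion point (the sample list is my own):

```python
def binary_array_search(A, target):
    lo = 0
    hi = len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if target < A[mid]:
            hi = mid - 1
        elif target > A[mid]:
            lo = mid + 1
        else:
            return mid
    return -(lo + 1)

A = [2, 14, 15, 19, 25, 53, 90]
assert binary_array_search(A, 53) == 5        # found at index 5

x = binary_array_search(A, 17)                # not found: negative result
assert x == -4
A.insert(-(x + 1), 17)                        # decode the insertion point
assert A == [2, 14, 15, 17, 19, 25, 53, 90]
```

Python's built-in bisect module offers the same insertion-point information for sorted lists via bisect_left().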
One final optimization remains. In Listing 2-4, there are two comparisons between target and A[mid]. When both values are numeric, Listing 2-5 shows how to compute their difference just once instead of invoking this key operation twice; this also ensures you only access A[mid] once.
```python
diff = target - A[mid]
if diff < 0:
    hi = mid - 1
elif diff > 0:
    lo = mid + 1
else:
    return mid
```
If target is smaller than A[mid], then diff < 0, which is logically equivalent to checking whether target < A[mid]. If diff is positive, then you know target was greater than A[mid]. Even when the values are not numeric, some programming languages offer a compareTo() function that returns a negative number, zero, or a positive number based on the relative ordering of two values. Using this operation leads to more efficient code when comparison operations are costly.
If the values in a list appear in descending order, you can still use Binary Array Search: just switch the way lo and hi are updated in the while loop.
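As a sketch of that adjustment (my own variant, not one of the book's listings): in a descending list, a target larger than A[mid] must lie in the left (larger) half, so the two update branches swap roles:

```python
def binary_search_descending(A, target):
    # Binary Array Search over a list sorted in DESCENDING order
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if target > A[mid]:       # target is in the left (larger) half
            hi = mid - 1
        elif target < A[mid]:     # target is in the right (smaller) half
            lo = mid + 1
        else:
            return True
    return False

assert binary_search_descending([90, 53, 25, 19, 15, 14, 2], 53) is True
assert binary_search_descending([90, 53, 25, 19, 15, 14, 2], 17) is False
```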
How efficient is Binary Array Search on a problem instance of size N? To answer this question, I have to compute, in the worst case, how many times the while loop is forced to execute. The mathematical concept of logarithms will tell us the answer.5
To see how logarithms work, consider this question: how many times do you need to double the number 1 until the value equals 33,554,432? Well, you could start computing this manually: 1, 2, 4, 8, 16, and so on, but this is truly tedious. Mathematically, you are looking for a value, x, such that 2^x = 33,554,432.
Note that 2^x involves exponentiation of a base (the value 2) and an exponent (the value x). A logarithm is the opposite of exponentiation, in the same way that division is the opposite of multiplication. To find an x such that 2^x = 33,554,432, compute log₂(33,554,432), resulting in the value 25.0. If you type 2^25 into a calculator, the result is 33,554,432.
This computation also answers the question of how many times you can divide 2 into 33,554,432: you get to 1 after 25 divisions. log() computes a floating point result; for example, log₂(137) is about 7.098032. This makes sense since 2⁷ = 128, and 137 requires a slightly higher exponent for base 2.
The Binary Array Search algorithm will repeat the while loop as long as lo ≤ hi, or in other words, while there are values to search. The first iteration starts with N values to search, and in the second iteration, this is reduced to no more than N/2 (if N is odd, it is (N − 1)/2). To determine the maximum number of successive iterations, you need to know how many times you can divide N by 2 until you reach 1. This quantity is exactly k = log₂(N), so the total number of times through the while loop is 1 + k, counting 1 for the first time for N values plus k for the successive iterations. Because log() can return a floating point value, and we need an integer number of iterations, use the floor(x) mathematical operation, which computes the largest integer that is smaller than or equal to x.
No handheld calculator has a button to compute log₂(x), and most calculator apps don't either. Don't panic! You can always compute log₂ easily. For example, log₂(16) = 4. On your calculator, enter 16 and then press the log button (which is either in base 10 or the natural base e). Your display should read something awful like 1.20411998. Now press the divide button, press the 2 button, and finally press the log button again. Your display should read 0.301029995. Now that all hope seems lost, press the equals button. Magically the value 4 appears. This sequence of operations demonstrates that log₂(x) = log₁₀(x)/log₁₀(2).
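These logarithm facts are easy to confirm in Python:

```python
import math

# doubling 1 twenty-five times reaches 33,554,432
assert math.log2(33_554_432) == 25.0
assert 2**25 == 33_554_432

# the calculator trick: log2 via base-10 logarithms
assert abs(math.log10(16) / math.log10(2) - 4.0) < 1e-9

# log2(137) is slightly above 7, since 2**7 = 128
assert abs(math.log2(137) - 7.098032) < 1e-5
```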
For Binary Array Search, the while loop iterates no more than floor(log₂(N)) + 1 times. This behavior is truly extraordinary! With one million values in sorted order, you can locate any value in just 20 passes through the while loop.
To provide some quick evidence for this formula, count the number of
iterations through the while
loop for problem instances whose size, N,
ranges from 8 to 15: you only need 4 in all cases. For example,
starting with 15 values in the first iteration, the second iteration
explores a sublist with 7 values, the third iteration explores a sublist with
3 values, and the fourth and final iteration explores a sublist with just
1 value. If you started with 10 values, the number of explored values
in each iteration would be 10 → 5 → 2 → 1, which
also means four iterations in total.
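You can verify this count with a short simulation (a sketch; the repeated halving mirrors how Binary Array Search discards at least half of the remaining values on each pass):

```python
import math

def iteration_count(N):
    """Count loop passes when a search range of N values is repeatedly halved."""
    count = 0
    while N >= 1:
        count += 1
        N = N // 2    # discard at least half the values; reaching 0 ends the loop
    return count

# Every N from 8 to 15 needs exactly 4 passes, matching floor(log2(N)) + 1.
for N in range(8, 16):
    print(N, iteration_count(N), math.floor(math.log2(N)) + 1)
```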
The experience with Binary Array Search leads to a new complexity class, O(log N), called the logarithmic complexity class, where f(N) = log(N).
To summarize, when you analyze an algorithm and state that its time complexity is O(log N), you are claiming that once the problem instance size is larger than some threshold size, the runtime performance, T(N), of the algorithm is always smaller than c × log(N) for some constant, c. Your claim is most precise when you cannot make the same claim using a complexity class of lower complexity.
All complexity classes are arranged in order of dominance, as shown in Figure 2-7. While there are an infinite number of complexity classes, the eight classes in this figure are the most commonly used ones. The constant time class, O(1), has the least amount of complexity and reflects constant work that is independent of the size of the problem instance. The next-higher complexity class, O(log N), is logarithmic, and you have seen how Binary Array Search falls into this category. Both of these classes are sub-linear and result in extremely efficient algorithms.
The linear, O(N), complexity class means complexity is directly proportional to the size of the problem instance. A series of polynomial classes are all of increasing complexity—O(N²), O(N³), and so on—up to any fixed constant, O(Nᶜ). Sandwiched between O(N) and O(N²) is O(N log N), which is often identified as the ideal complexity class for algorithm designers.
This dominance hierarchy can also help identify how to classify an algorithm when there is a mixed combination of time complexities. For example, if an algorithm contains two substeps—the first with time complexity of O(N log N) and the second with O(N²)—what is its overall complexity? The overall classification of the algorithm is O(N²) because the complexity of substep 2 has the dominant impact on the overall complexity. In practical terms, if you have modeled T(N) for an algorithm to be 5N² + 10,000,000 × N × log(N), then T(N) has complexity of O(N²).
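A quick computation shows why the N² term eventually dominates even against such a huge constant on the other term (a sketch; I use log base 2 here, since the base only changes the constant, not the classification):

```python
import math

def T(N):
    # Modeled runtime from the text: 5N^2 + 10,000,000 * N * log(N)
    return 5 * N**2 + 10_000_000 * N * math.log2(N)

# The N^2 term grows from a tiny fraction of the total to the majority.
for N in (10**4, 10**6, 10**8):
    share = (5 * N**2) / T(N)    # fraction of T(N) due to the N^2 term
    print(f"N={N:>11,}  N^2 share of total: {share:.1%}")
```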
The final two complexity classes—exponential and factorial—are inefficient, and algorithms with these time complexities can only solve very small problem instances. Check out the challenge exercises at the end of the chapter that address these complexity classes.
Table 2-4 presents the computation of f(N) for each of the designated complexity classes. Imagine that each of these numbers represents the estimated time in seconds for an algorithm with the rated time complexity (in a column) to process a problem instance of size N (as identified by the row). 4,096 seconds is about one hour and eight minutes, so in just over an hour of computation time, you could likely solve:
An O(1) algorithm, since its performance is independent of the problem instance size
An O(log N) algorithm for problem instances of size 2⁴⁰⁹⁶ or smaller
An O(N) algorithm for problem instances of size 4,096 or smaller
An O(N log N) algorithm for problem instances of size 462 or smaller
An O(N²) algorithm for problem instances of size 64 or smaller
An O(N³) algorithm for problem instances of size 16 or smaller
An O(2ᴺ) algorithm for problem instances of size 12 or smaller
An O(N!) algorithm for problem instances of size 7 or smaller
N | log₂(N) | N | N log N | N² | N³ | 2ᴺ | N! |
---|---|---|---|---|---|---|---|
2 | 1 | 2 | 2 | 4 | 8 | 4 | 2 |
4 | 2 | 4 | 8 | 16 | 64 | 16 | 24 |
8 | 3 | 8 | 24 | 64 | 512 | 256 | 40,320 |
16 | 4 | 16 | 64 | 256 | 4,096 | 65,536 | ~2.1×10¹³ |
32 | 5 | 32 | 160 | 1,024 | 32,768 | ~4.3×10⁹ | ~2.6×10³⁵ |
64 | 6 | 64 | 384 | 4,096 | 262,144 | ~1.8×10¹⁹ | ~1.3×10⁸⁹ |
128 | 7 | 128 | 896 | 16,384 | 2,097,152 | ~3.4×10³⁸ | ~3.9×10²¹⁵ |
256 | 8 | 256 | 2,048 | 65,536 | 16,777,216 | ~1.2×10⁷⁷ | ~8.6×10⁵⁰⁶ |
512 | 9 | 512 | 4,608 | 262,144 | ~1.3×10⁸ | ~1.3×10¹⁵⁴ | ~10¹¹⁶⁶ |
1,024 | 10 | 1,024 | 10,240 | 1,048,576 | ~1.1×10⁹ | ~1.8×10³⁰⁸ | ~10²⁶⁴⁰ |
2,048 | 11 | 2,048 | 22,528 | 4,194,304 | ~8.6×10⁹ | ~3.2×10⁶¹⁶ | ~10⁵⁸⁹⁴ |
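Every entry in the table above can be regenerated mechanically. Here is a short sketch (I stop at N = 1,024 because printing the decimal expansion of 2,048! exceeds Python's default integer-to-string digit limit; the sci helper is my own, not from the text):

```python
import math

def sci(x):
    """Format an integer compactly; truncates (does not round) the mantissa."""
    if x < 10**6:
        return f"{x:,}"
    s = str(x)
    return f"{s[0]}.{s[1:3]}e{len(s) - 1}"

header = ["N", "log2(N)", "N", "N log N", "N^2", "N^3", "2^N", "N!"]
print(" | ".join(header))
for k in range(1, 11):                        # N = 2, 4, ..., 1024
    N = 2 ** k
    row = [N, k, N, N * k, N**2, N**3, 2**N, math.factorial(N)]
    print(" | ".join(sci(v) for v in row))
```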
The reason to investigate algorithms with the lowest complexity rating is that the problems you want to solve are simply too large to handle otherwise, even on the fastest computer. With higher complexity classes, the time to solve even small problem instances is essentially infinite (as Figure 2-8 shows).
A common way to visualize these extremely large numbers is shown in Figure 2-8. The x-axis represents the size of the problem instance being solved. The y-axis represents the total estimated runtime performance for one of the algorithms labeled in the chart. As the complexity of the algorithm increases, the size of the problem instance that can be solved “in reasonable time” decreases.
Consider the following, more complex scenarios:
If someone classifies an algorithm as O(N² + N), how should you respond? The dominance hierarchy in Figure 2-7 shows that N² has a greater complexity than N, and so this can be simplified to be O(N²). Similarly, O(2ᴺ + N⁸) would be simplified to become O(2ᴺ).
If an algorithm is classified as O(50 × N³), you can simplify this to become O(N³) because multiplicative constants can be eliminated.
Sometimes the behavior of an algorithm can depend on properties other than just the size of the problem instance, N. For example, consider an algorithm that processes N numeric values, with a primary task whose runtime performance is directly proportional to N. Now assume this algorithm has a subtask that processes all even values in the problem instance. The runtime performance of this subtask is directly proportional to E², where E represents the number of even values. You might want to carefully specify that the runtime performance of the algorithm is O(N + E²). If, for example, you could eliminate all even values from the input set, this performance would be rated as O(N), and that might be noteworthy. Naturally, in the worst case where all numbers are even, then the overall classification of the algorithm would become O(N²) since E ≤ N.
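Here is a small sketch of such an algorithm (hypothetical code illustrating the structure described above, not a listing from the text): one linear pass over all N values, followed by a nested pass over only the E even values.

```python
def process(values):
    """O(N + E^2): a linear main task plus a quadratic subtask over even values."""
    evens = [v for v in values if v % 2 == 0]    # O(N) main task
    pair_count = 0
    for x in evens:                              # O(E^2) subtask: all pairs
        for y in evens:
            pair_count += 1
    return pair_count                            # equals E * E
```

With no even values the subtask does no work, so the whole function is O(N); when every value is even, E = N and the function degrades to O(N²).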
The curve_fit() function provided by SciPy uses a nonlinear least squares method to fit a model function, f, to existing data. I use data based on different problem instances of size N and the respective runtime performance of an algorithm in solving those instances. The results of curve_fit()—as you have seen in this chapter—are coefficients to use with a model function to predict future runtime performance. With these coefficients applied to f, the resulting model minimizes the sum of the squares of the errors between the actual data and the values predicted by the model.
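As a concrete sketch of the call itself (the timing measurements below are hypothetical, chosen only to illustrate a roughly linear algorithm), curve_fit() returns the coefficients that minimize the squared error between the model and the data:

```python
import numpy as np
from scipy.optimize import curve_fit

def linear_model(n, a, b):
    # Model function f: predicted runtime for a problem instance of size n
    return a * n + b

# Hypothetical (N, runtime-in-seconds) measurements, for illustration only.
N = np.array([100, 1_000, 10_000, 100_000], dtype=float)
T = np.array([0.002, 0.021, 0.197, 2.040])

(a, b), _ = curve_fit(linear_model, N, T)
print(f"T(N) ~= {a:.2e}*N + {b:.2e}")
```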
These models are useful for getting a ballpark estimate of the inner behavior of an algorithm’s implementation on specific problem instances. By itself, such a model is neither a proven upper bound nor a proven lower bound on the complexity of an algorithm. You need to review the algorithm’s implementation to develop a model that counts the number of key operations, which directly affects the runtime performance of the algorithm.
When you have an accurate model, f(N), that reflects the count of the key operations in the worst case for an algorithm on a problem instance of size N, then you have classified the algorithm as O(f(N)) in the worst case. This is the upper bound. A corresponding lower bound can be similarly derived to model the effort that the algorithm must at least expend in the worst case. The notation Ω(f(N)) is used to describe the classification of the lower bound of an algorithm.6
In our earlier discussion of Binary Array Search, I demonstrated that the while loop iterates no more than floor(log₂(N)) + 1 times. This means that, in the worst case, you can use f(N) = log(N) to formally classify Binary Array Search as O(log N). What is the best case for Binary Array Search? If the target value is found in A[mid], the function returns after just one pass through the while loop. Since this is a constant independent of the size of the problem instance, Binary Array Search is classified as O(1) in the best case. The big O notation can be used for both best case and worst case analysis, although many programmers assume it is meant only for worst case.
You may occasionally see the time complexity for an algorithm rated as Θ(N log N), using the capitalized Greek symbol theta. This notation is typically used to analyze the average case for an algorithm. It means that the upper bound is O(N log N) and the lower bound is Ω(N log N). This is known, mathematically, as a tight bound, and it provides the best evidence that the runtime performance of the algorithm is quite predictable.
We have covered a lot of ground in these first two chapters, but there is so much more you can learn about analyzing algorithms. I presented a number of examples that describe the way algorithms behave independent of how they are implemented. In the mid-20th century, while researchers were discovering new algorithms, advances in computing technology dramatically improved the performance of the computers executing these algorithms. Asymptotic analysis provides the foundation for independently assessing the performance of algorithms in a way that eliminates any dependence on the computing platform. I defined several time (or storage) complexity classes, visualized in Figure 2-8, to explain the behavior of an algorithm in terms of the size of the problem instances. These complexity classes appear throughout the book as a notation to quickly summarize an algorithm’s behavior.
Rate the time complexity of each code fragment in Table 2-5.
(Table 2-5 lists five code fragments, Fragment-1 through Fragment-5; the fragments themselves are not reproduced here.)
Use the techniques described in this chapter to model the value of ct
returned by the f4
function in Listing 2-6.
```python
def f4(N):
    ct = 1
    while N >= 2:
        ct = ct + 1
        N = N ** 0.5
    return ct
```
You will find that none of the models used in this chapter is accurate. Instead, develop one based on a × log(log(N)), in base 2. Generate a table up to N = 2⁵⁰ containing actual results as compared with the model. An algorithm with this behavior would be classified as O(log(log(N))).
One way to sort a list of values is to generate each permutation until you find one that is sorted, as shown in Listing 2-7.
```python
from itertools import permutations
from scipy.special import factorial

def factorial_model(n, a):
    return a*factorial(n)

def check_sorted(a):
    for i, val in enumerate(a):
        if i > 0 and val < a[i-1]:
            return False
    return True

def permutation_sort(A):
    for attempt in permutations(A):
        if check_sorted(attempt):
            A[:] = attempt[:]        # copy back into A
            return
```
Generate a table of results for sorting a worst case problem instance
(i.e., the values are in descending order) of up to 12 elements using
permutation_sort()
. Use the factorial_model()
to curve fit the
preliminary results and see how accurate the model is in predicting runtime
performance. Based on these results, what is your estimate (in years) for
the runtime performance on a worst case problem instance of size 20?
Generate empirical evidence on 50,000 random trials of Binary Array Search for N in the range of 2⁵ through 2²¹. Each trial should use random.sample() to randomly select N values from the range 0 .. 4N and place these values in sorted order. Then each trial should search for a random target value in the same range.
Using the results I have outlined in this chapter, use curve_fit() to develop a log N model that models the runtime performance results for N in the range 2⁵ through 2¹². Determine the threshold problem instance size above which the behavior stabilizes. Create a visual plot of the data to see whether the computed model accurately models the empirical data.
We are normally concerned with time complexity, but consider the sorting algorithm in Listing 2-8:
```python
def max_sort(A):
    result = []
    while len(A) > 1:
        index_max = max(range(len(A)), key=A.__getitem__)
        result.insert(0, A[index_max])
        A = list(A[:index_max]) + list(A[index_max+1:])
    return A + result
```
Using the results I have outlined in this chapter, assess the storage
complexity of max_sort
.
A galactic algorithm is an algorithm whose time complexity is better than that of any known algorithm when the problem instance size is “sufficiently large.” For example, the N-digit multiplication algorithm (published November 2020) by David Harvey and Joris Van Der Hoeven has O(N log N) runtime performance once N is larger than 2 raised to the power Z, where Z is 1729¹²; this exponent, Z, is already an astronomically large number, about 7 × 10³⁸. Consider now raising 2 to this incredibly large number!
Do some research on other galactic algorithms. While these algorithms are
not practical, they do offer hope that breakthroughs are possible on some
really challenging problems.
Table 2-1 contains three rows of performance measurements on three data sets of different sizes. If you only had two rows of performance measurements, would it be possible to predict the performance of a quadratic time algorithm? In general, if you have K rows of performance measurements, what is the highest polynomial you can effectively use in a model?
1 Pronounced using three syllables, like en-log-en.
2 A fast multiplication algorithm was discovered in 1960 by a 23-year-old student at Moscow State University named Anatoly Karatsuba. Python uses this algorithm when multiplying very large integers.
3 The exponent, 1.585, is the approximate value of log₂(3), which is 1.58496250072116.
4 This is a European Union requirement; in the United States, it is sufficient for the speedometer to be accurate to within plus or minus 5 miles per hour.
5 It is purely a coincidence that the word logarithm is an anagram of algorithm.
6 Ω is the capitalized Greek character omega.