In this chapter, I present algorithms that rearrange the N values in an array so they are in ascending order. Organizing a collection of values in sorted order is an essential first step to improve the efficiency of many programs. Sorting is also necessary for many real-world applications, such as printing staff directories for a company with the names and phone numbers of employees, or displaying flight departure times on an airport display.
With an unordered array, searching for a value, in the worst case, is
O(N). When the array is sorted, Binary Array Search, in the worst case,
can locate a target value in O(log
N) performance.
Try sorting the values in the array, A
, at the top of Figure 5-1.
Use a pencil to copy the values from Figure 5-1 onto a piece of
paper (or bring out a pen and just write on these pages!). I challenge you
to sort these values in ascending order by repeatedly swapping the
location of two values in the array. What is the fewest number of swaps
that you need? Also, count the number of times you compare two values
together. I have sorted these values with five swaps. Is it possible to use
fewer?1
Figure 5-1. The array, A, to sort
While it's important to count the number of swaps, you also need to count
the number of comparisons between two values. To start, you can determine
that 2 is the smallest value in A
, with just seven comparisons,
something I showed in Chapter 1. The smallest value is found at A[3]
, so
it is swapped with A[0]
. This moves the smallest value to the front of
the array where it belongs. In Figure 5-1, I highlight values when
they are swapped. I use bold borders to mark the values that are guaranteed
to be in their final location; these will not be swapped again. All values
outside of the bolded borders remain to be sorted.
I scan the remaining values to find the largest value, 24 (using six
comparisons) and swap A[5]
and A[7]
to move the largest value to the
end of the array. I then locate the smallest remaining value, 5 (using
five comparisons) and swap A[1]
and A[6]
to move 5 into its proper
place. It looks like 21 is in its right spot, which takes four
comparisons to validate; no need for a swap here!
With three comparisons, I find that 15 is the smallest remaining value,
and I choose to swap the second occurrence of 15, A[4]
, with
A[2]
. With two comparisons, you can validate that 15 belongs in index
position 3, which leaves just one more comparison to swap A[4]
and
A[5]
, moving 19 into its proper spot. In the final step shown in
Figure 5-1, the value 20 is in the right location, since it is
larger than or equal to all values to its left and is smaller than or equal
to all values to its right. With five exchanges and 28
comparisons, I have sorted this array.
I didn’t follow a specific algorithm to sort this small group of values; sometimes I looked for the smallest value, while other times I looked for the largest. The number of comparisons is reduced after each swap, and there are far more comparisons than swaps. I now define a sorting algorithm that works on any array of N values and evaluate its runtime performance.
Selection Sort is named because it incrementally sorts an array from left
to right, repeatedly selecting the smallest value remaining and swapping
it into its proper location. To sort N values, find the smallest value and
swap it with A[0]
. Now only N – 1 values remain to be sorted, since A[0]
is in its final spot. Find the location of the smallest remaining value and
swap it with A[1]
, which leaves N – 2 values to sort. Repeat this process
until all values are in place.
What happens when the smallest remaining value is already in its proper
place, that is, when i
is equal to min_index
when the for
loop over j
completes? The code will attempt to swap A[i]
with A[min_index]
, and nothing in the array will change. You might think to add an if
statement to only swap when i
and min_index
are different, but it will not noticeably improve performance.
In Listing 5-1, there is an outer for
loop over i
that
iterates through nearly every index position in the array, from 0 to N – 2. The
inner for
loop over j
iterates through the remaining index positions in
the array, from i+1
up to N – 1 to find the smallest remaining value. At
the end of the for loop over j, the value at index position i is
swapped with the smallest value found at index position min_index.
def selection_sort(A):
    N = len(A)
    for i in range(N-1):
        min_index = i
        for j in range(i+1, N):
            if A[j] < A[min_index]:
                min_index = j
        A[i], A[min_index] = A[min_index], A[i]
Before each pass through the i
for
loop, you know A[0 .. i-1]
is sorted.
min_index
is the index location of the smallest value in A[i .. N-1]
.
If any A[j]
< A[min_index]
, then update min_index
to remember the index location of this newly discovered smallest value.
Swap A[i]
with A[min_index]
to ensure that A[0 .. i]
is sorted.
At a high level, Selection Sort starts with a problem of size N and reduces it one step at a time, first to a problem of size N – 1, then to a problem of size N – 2, until the whole array is sorted. As shown in Figure 5-2, it takes N – 1 swaps to sort an array.
After
these swaps have properly placed N – 1 values into their final location, the
value at A[N–1]
is the largest remaining unsorted value, which means it
is already in its final location. Counting the number of comparisons is
more complicated. In Figure 5-2, it was 28, which is the
sum of the numbers from 1 through 7.
Mathematically, the sum of the numbers from 1 to K is equal to K × (K + 1)/2; Figure 5-3 shows a visualization to provide the intuition behind this formula. The number 28 is called a triangle number, from the shape formed by the arrangement of cells.
If you make a second triangle equal in size to the first and rotate it 180 degrees, the two triangles combine to form a K by K + 1 rectangle. The count of the squares in each triangle is half the number of squares in the 7 × 8 rectangle. In this figure, K = 7. When sorting N values, K = N – 1 since that is the number of comparisons in the first step to find the smallest value: the total number of comparisons is (N – 1) × N/2 or ½ × N² – ½ × N.
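To see this formula in action, here is a small instrumented sketch (my own illustration; only the Selection Sort logic from Listing 5-1 is taken from the chapter) that counts both operations:

def selection_sort_count(A):
    # Selection Sort from Listing 5-1, instrumented to count operations.
    N = len(A)
    comparisons = swaps = 0
    for i in range(N-1):
        min_index = i
        for j in range(i+1, N):
            comparisons += 1                     # one comparison per inner-loop pass
            if A[j] < A[min_index]:
                min_index = j
        A[i], A[min_index] = A[min_index], A[i]
        swaps += 1                               # one swap per outer-loop pass
    return (comparisons, swaps)

# Any eight-value array yields (8 - 1) * 8 / 2 = 28 comparisons and 7 swap statements.
print(selection_sort_count([15, 21, 20, 2, 15, 24, 5, 19]))   # (28, 7)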
The analysis for Selection Sort shows that the number of comparisons is dominated by the N² term, which means its performance will be O(N²) since that is the dominant operation. To explain why, look at how Selection Sort has N – 1 distinct steps when sorting N values. In the first step, it finds the smallest value in N – 1 comparisons, and only one value is moved into its proper location. In each of the subsequent N – 2 steps, the number of comparisons will (ever so slowly) decrease until there is no work done in the final step. Can something be done to reduce the number of comparisons?
Insertion Sort is a different sorting algorithm that also uses N – 1 distinct
steps to sort an array from left to right. It starts by assuming that
A[0]
is in its proper location (hey, it could be the smallest value in
the array, right?). In its first step, it checks if A[1]
is smaller than
A[0]
and swaps these two values as needed to sort in ascending order. In
the second step, it tries to insert the A[2]
value into its proper
sorted location when just considering the first three values. There are
three possibilities: either A[2]
is in its proper spot, or it should be
inserted between A[0]
and A[1]
, or it should be inserted before
A[0]
. However, since you cannot insert a value between two array
positions, you must repeatedly swap values to make room for the value to
be inserted.
Figure 5-4 shows the array at the end of each step; to complete each step, Insertion Sort repeatedly swaps neighboring out-of-order values.
All swapped values are highlighted, and bold borders identify the sorted values in the array. Unlike Selection Sort, values within the bold borders may continue to be swapped, as you can see in the figure. At times (like when the value 5 is inserted) there is a sequence of cascading swaps to move that value into its proper place because the value to insert is smaller than most of the already sorted values. At other times (like when 21 or 24 is inserted), no swaps are needed because the value to insert is larger than all of the already-sorted values. In this example, there are 20 comparisons and 14 swaps. For Insertion Sort, the number of comparisons will always be greater than or equal to the number of swaps. On this problem instance, Insertion Sort uses fewer comparisons than Selection Sort but more swaps. Its implementation, in Listing 5-2, is surprisingly brief.
def insertion_sort(A):
    N = len(A)
    for i in range(1, N):
        for j in range(i, 0, -1):
            if A[j-1] <= A[j]:
                break
            A[j], A[j-1] = A[j-1], A[j]
Insertion Sort works the hardest when each value to be inserted is smaller than all already-sorted values. The worst case for Insertion Sort occurs when the values are in descending order. In each successive step, the number of comparisons (and swaps) increases by one, summing in total to the triangle numbers mentioned earlier.
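To make the worst case concrete, here is a similarly instrumented sketch (my own illustration based on Listing 5-2) run on values in descending order:

def insertion_sort_count(A):
    # Insertion Sort from Listing 5-2, instrumented to count operations.
    N = len(A)
    comparisons = swaps = 0
    for i in range(1, N):
        for j in range(i, 0, -1):
            comparisons += 1
            if A[j-1] <= A[j]:
                break
            A[j], A[j-1] = A[j-1], A[j]
            swaps += 1
    return (comparisons, swaps)

# Descending input of N = 8 values: both counts reach the triangle number 28.
print(insertion_sort_count([8, 7, 6, 5, 4, 3, 2, 1]))   # (28, 28)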
Selection Sort will always have ½ × N² – ½ × N comparisons and N – 1 swaps when sorting N values. Counting the operations for Insertion Sort is more complicated because its performance depends on the order of the values themselves. On average, Insertion Sort should outperform Selection Sort. In the worst case for Insertion Sort, the values appear in descending order, and the number of comparisons and swaps is ½ × N² – ½ × N. No matter what you do, both Insertion Sort and Selection Sort require on the order of N² comparisons, which leads to the runtime performance visualized in Figure 5-5. Another way to explain this poor behavior is to observe that the problem instance size 524,288 is 512 times as large as 1,024, yet the runtime performance for both Selection Sort and Insertion Sort takes about 275,000 times longer.2 Sorting 524,288 values takes about two hours for Insertion Sort and nearly four hours for Selection Sort. To solve larger problems, you would need to measure the completion times in days or weeks. This is what a quadratic, or O(N²), algorithm will do to you, and it is simply unacceptable performance.
What if you wanted to sort an array in descending order? Or what if the values have a complex structure and there is no default less-than operation defined? Each of the sorting algorithms in this chapter can be extended with a parameter for a comparator function to determine how values are to be ordered, as shown in Listing 5-3. For simplicity, the implementations of the remaining algorithms assume the values are sorted in ascending order.
def insertion_sort_cmp(A, less=lambda one, two: one <= two):
    N = len(A)
    for i in range(1, N):
        for j in range(i, 0, -1):
            if less(A[j-1], A[j]):
                break
            A[j], A[j-1] = A[j-1], A[j]
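For example, to sort in descending order, pass a comparator that treats a pair as "in order" when the first value is at least as large as the second (a usage sketch I am adding; insertion_sort_cmp itself is Listing 5-3):

A = [15, 21, 20, 2, 15, 24, 5, 19]
insertion_sort_cmp(A, less=lambda one, two: one >= two)   # stop swapping once already in descending order
print(A)   # [24, 21, 20, 19, 15, 15, 5, 2]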
Both Selection Sort and Insertion Sort use N – 1 steps to sort an array of N values, where each step reduces the problem size by just one. A different strategy, known as divide and conquer, breaks a problem up into two sub-problems to be solved.
The concept of recursion has existed in mathematics for centuries—it occurs when a function calls itself.
The Fibonacci series starts with two integers, 0 and 1. The next integer in the series is the sum of the two prior numbers. The next few integers in the series are 1, 2, 3, 5, 8, 13, and so on. The recursive formula for the nth integer in the series is F(n) = F(n–1) + F(n–2). As you can see, F(n) is defined by calling itself twice.
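A direct translation of that formula into Python (a sketch of my own, not one of the chapter's listings) looks like the following; note that this naive version recomputes the same values many times, a point revisited in the challenge exercises:

def fibonacci(n):
    # Recursive formula F(n) = F(n-1) + F(n-2), with base cases F(0) = 0 and F(1) = 1.
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

print([fibonacci(n) for n in range(8)])   # [0, 1, 1, 2, 3, 5, 8, 13]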
The factorial of an integer, N, is the product of all positive integers less than or equal to N. It is written as “N!”; thus 5! = 5 × 4 × 3 × 2 × 1 = 120. Another way to represent this operation is to state that N! = N × (N – 1)! For example, 120 = 5 × 4!, where 4! = 24. A recursive implementation is shown in Listing 5-4.
def fact(N):
    if N <= 1:
        return 1
    return N * fact(N-1)
It may seem odd to see a function calling itself—how can you be sure
that it will not do so forever? Each recursive function has a base case
that prevents this infinite behavior. fact(1)
will return 1
and not
call itself.3 In the recursive case, fact(N)
calls itself with an argument of
N – 1 and multiplies the returned computation by N to produce its final
result.
Figure 5-6 visualizes the execution of the statement y = fact(3)
as time advances downward. Each box represents an invocation of fact()
with the given argument (whether 3, 2, or 1). Invoking fact(3)
recursively calls fact(2)
. When that happens, the original fact(3)
function will be “paused” (grayed out in the figure) until the value of
fact(2)
is known. When fact(2)
is invoked, it also must recursively
call fact(1)
, so it is paused (and grayed out in the figure) until the
value of fact(1)
is known. Finally at this point, the base case stops the
recursion, and fact(1)
returns 1 as its value, shown inside a dashed circle; this resumes the paused
execution of fact(2)
, which returns 2 × 1 = 2 as its value. Finally, the
original fact(3)
resumes, returning 3 × 2 = 6, which sets the value of y
to 6.
During recursion, any number of
fact(N)
invocations can be paused until the base case is reached.4 Then,
the recursion “unwinds” one function call at a time until the original
invocation is complete.
In reviewing this algorithm, it still solves a problem of size N by reducing it into a smaller problem of size N – 1. What if the problem of size N could be divided into two problems of, more or less, size N/2? It might seem like this computation could go on forever, since each of these two sub-problems is further subdivided into four sub-problems of size N/4. Fortunately the base case will ensure that—at some point—the computations will complete.
Consider a familiar problem, trying to find the largest value in an
unordered array of N values. In Listing 5-5, find_max(A)
invokes a recursive helper function,5 rmax(0, len(A)-1)
, to properly set up the initial values for
lo
= 0 and hi
= N – 1, where N is the length of A
. The base case in
rmax()
stops the recursion once lo = hi
because this represents looking
for the largest value in a range containing just a single value. Once the
largest values are determined for the left and right sub-problems, rmax()
returns the larger of these two values as the largest value in A[lo
.. hi]
.
def find_max(A):
    def rmax(lo, hi):
        if lo == hi:
            return A[lo]

        mid = (lo+hi) // 2
        L = rmax(lo, mid)
        R = rmax(mid+1, hi)
        return max(L, R)

    return rmax(0, len(A)-1)
Invoke the initial recursive call with proper arguments for lo
and hi
.
Base case: when lo
== hi
, the range A[lo .. hi]
contains a single value; return it as the largest value.
Find midpoint index location in the range A[lo .. hi]
. Use integer division //
in case the range has an odd number of values.
L
is the largest value in the range A[lo .. mid]
.
R
is the largest value in the range A[mid+1 .. hi]
.
The largest value in A[lo .. hi]
is the maximum of L
and R
.
The function rmax(lo, hi)
solves this problem recursively by dividing a
problem of size N into two problems of half the size.
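A quick usage sketch (added here for illustration) using the four values shown in Figure 5-7, plus an eight-value array consistent with the chapter's running example:

print(find_max([15, 21, 20, 2]))                  # 21
print(find_max([15, 21, 20, 2, 15, 24, 5, 19]))   # 24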
Figure 5-7 visualizes the execution of
rmax(0,3)
on the given array, A
, with four values. To solve this
problem, it solves two sub-problems: rmax(0,1)
finds the largest value in
the left-hand side of A
, and rmax(2,3)
finds the largest value in the
right-hand side of A
. Since rmax()
makes two recursive calls within its
function, I introduce a new visualization to describe where, in rmax()
,
the execution is paused. I still use a gray background to show that
rmax()
is paused when it makes a recursive call: in addition, the lines
highlighted with a black background will execute once the recursive call
returns.
In Figure 5-7, I only have space to show the
three recursive calls that complete to determine that 21 is the largest
value in the left-hand side of A
. As you can see, the final two lines in
the invocation box for rmax(0,3)
are highlighted in black to remind you
that the rest of the computation will resume with the recursive call to
rmax(2,3)
. A similar sequence of three additional recursive calls would
complete the right-hand sub-problem, ultimately allowing the original
recursive invocation rmax(0,3)
to return max(21,20)
as its answer.
Figure 5-8 visualizes the full recursive behavior
of rmax(0,7)
. Similar to my explanation for fact()
, this figure shows
how the invocation of rmax(0,3)
is paused while it recursively computes
the first sub-problem, rmax(0,1)
. The original problem is repeatedly
subdivided until rmax()
is invoked where its parameters lo
and hi
are
equal; this will happen eight different times in the figure, since there
are N = 8 values. Each of these eight cases represents a base case, which
stops the recursion. As you can see in
Figure 5-8, the maximum value is 24, and I have
highlighted the rmax()
recursive calls that return this value.
Figure 5-7. rmax(0,3) on A = [15,21,20,2]
Figure 5-8. rmax(0,7)
Inspired by these examples, we can now ask, “Is there a recursive divide-and-conquer approach to sort an array?” Listing 5-6 contains the gist of an idea: to sort an array, recursively sort its left half, and recursively sort its right half; then somehow merge the partial results to ensure the whole array is sorted.
def sort(A):
    def rsort(lo, hi):
        if hi <= lo:
            return

        mid = (lo+hi) // 2
        rsort(lo, mid)
        rsort(mid+1, hi)
        merge(lo, mid, hi)

    rsort(0, len(A)-1)
The structure of Listing 5-6 is identical to the find_max(A) function described in Listing 5-5. Completing this implementation leads to Merge Sort, a recursive sorting algorithm that sorts the array in place but requires extra storage; it provides the breakthrough we were looking for, namely an O(N log N) sorting algorithm.
The key to Merge Sort is the merge
function that merges in place the
sorted left half of an array with the sorted right half of an array. The
mechanics of merge()
might be familiar if you’ve ever had two sorted
stacks of paper that you want to merge into one final sorted stack, as
shown in Figure 5-9.
To merge these two stacks into one stack, look at the topmost remaining value in each stack and choose the smallest one. In the first two steps, 2 is removed from the left stack, and then 5 is removed from the right. When faced with two values that are the same, arbitrarily take the value from the left stack, first removing 15 from the left stack, then removing 15 from the right stack. Repeat this process until one of the stacks is exhausted (which happens in the final eighth step). When only one stack remains, just take all those values as a group, since they are already sorted.
The merge process sketched in Figure 5-9 works because of the extra storage into which the values are placed. The most efficient way to implement Merge Sort is to initially allocate extra storage equal to the size of the original array being sorted, as shown in Listing 5-7.
def merge_sort(A):
    aux = [None] * len(A)

    def rsort(lo, hi):
        if hi <= lo:
            return

        mid = (lo + hi) // 2
        rsort(lo, mid)
        rsort(mid+1, hi)
        merge(lo, mid, hi)

    def merge(lo, mid, hi):
        aux[lo:hi+1] = A[lo:hi+1]
        left = lo
        right = mid + 1
        for i in range(lo, hi+1):
            if left > mid:
                A[i] = aux[right]
                right += 1
            elif right > hi:
                A[i] = aux[left]
                left += 1
            elif aux[right] < aux[left]:
                A[i] = aux[right]
                right += 1
            else:
                A[i] = aux[left]
                left += 1

    rsort(0, len(A)-1)
Allocate auxiliary storage equal in size to original array.
Base case: with 1 or fewer values, there is nothing to sort.
Recursive case: sort left and right sub-arrays and then merge.
Copy sorted sub-arrays from A
into aux
to prepare for merge.
Set left
and right
to be the starting index positions of the corresponding
sub-arrays.
When left sub-array is exhausted, take value from right sub-array.
When right sub-array is exhausted, take value from left sub-array.
When right value is smaller than left value, take value from right sub-array.
When left value is smaller than or equal to right value, take value from left sub-array.
Invoke the initial recursive call.
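A short usage sketch (my own, with values consistent with the chapter's running example):

A = [15, 21, 20, 2, 15, 24, 5, 19]
merge_sort(A)
print(A)   # [2, 5, 15, 15, 19, 20, 21, 24]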
Figure 5-10 visualizes the dynamic behavior of
merge()
. The first step of merge(lo,mid,hi)
is to copy the elements
from A[lo .. hi]
into aux[lo .. hi]
since this is the sub-problem range
being sorted.
The for
loop over i
will execute 8 times, because that is the total
size of the two sub-problems being merged. Starting in the third row of
Figure 5-10, the variables left
, right
, and i
each
keep track of specific locations:
left
is the index position of the next value in the left sub-array to be
merged.
right
is the index position of the next value in the right sub-array to be
merged.
i
is the index position in A
where successively larger values are
copied until, by the last step, all values in A[lo .. hi]
are in sorted
order.
Within the for
loop, up to two values in aux
(highlighted in
Figure 5-10) are compared to find the lower value, which
is then copied into A[i]
. With each step, i
is incremented, while left
and right
advance only when the value at aux[left]
or aux[right]
is
found to be the next smallest one to be copied into A
. The time to
complete merge()
is directly proportional to the combined size of the
sub-problems (or hi – lo + 1
).
Merge Sort is a great example of a divide-and-conquer algorithm that
guarantees
O(N log
N) performance. If you have a problem that satisfies
the following checklist, then an O(N log
N) algorithm exists:
If you can subdivide a problem of size N into two independent sub-problems of size N/2; it is perfectly fine for one sub-problem to be slightly larger than the other.
If you have a base case that either does nothing (like with Merge Sort) or performs some operations in constant time.
If you have a processing step (either before the problem is subdivided or
afterward as a post-processing step) that requires time directly
proportional to the number of values in the sub-problem. For example, the
for
loop in merge()
repeats a number of times equal to the size of
the sub-problem being solved.
Another sorting algorithm that follows divide-and-conquer is Quicksort,
one of the most heavily studied and efficient sorting algorithms ever
designed.6 It recursively sorts an array by selecting an element in A
to use as a pivot value, p
, and then it inserts p
into its proper
location in the final sorted array. To do this, it rearranges the contents
of A[lo .. hi]
such that there is a left sub-array with values that are
≤ p
, and a right sub-array with values that are ≥ p
. You can
confirm in Figure 5-11 that the partitioned array has this
property.
Figure 5-11. partition(A,0,7,0) using A[0] as pivot
This amazing feat may at first seem impossible—how do you know where p
exists in
the final sorted array without actually sorting the entire array? It turns
out that partitioning doesn’t sort all elements in A
but rearranges just
a few based on p
. In the challenge exercises found in Chapter 1, you can
find the implementation of partition()
. After partition()
completes in
Figure 5-11, the left sub-array to be sorted contains two
values, while the right sub-array contains five values. Each of these
sub-arrays is recursively sorted using Quicksort, as shown in Listing 5-8.
def quick_sort(A):
    def qsort(lo, hi):
        if hi <= lo:
            return

        pivot_idx = lo
        location = partition(A, lo, hi, pivot_idx)

        qsort(lo, location-1)
        qsort(location+1, hi)

    qsort(0, len(A)-1)
Base case: with 1 or fewer values, there is nothing to sort.
Choose A[lo]
as the pivot value, p
.
Return location
in A
such that:
A[location] = p
All values in the left sub-array A[lo .. location–1] are ≤ p
All values in the right sub-array A[location+1 .. hi] are ≥ p
Recursive case: sort in place left and right sub-arrays, since p
is already in its proper sorted location, A[location]
.
Invoke the initial recursive call.
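The partition() helper itself is not shown in this chapter (the text points back to the challenge exercises in Chapter 1). As a hedged sketch, the following implementation satisfies the contract described above; the book's own version may differ in detail:

def partition(A, lo, hi, pivot_idx):
    # Move the chosen pivot value to the front of the range.
    A[pivot_idx], A[lo] = A[lo], A[pivot_idx]
    p = A[lo]

    # Grow a region of values smaller than p immediately after the pivot.
    location = lo
    for i in range(lo+1, hi+1):
        if A[i] < p:
            location += 1
            A[location], A[i] = A[i], A[location]

    # Place the pivot into its final sorted position and return that index.
    A[lo], A[location] = A[location], A[lo]
    return location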
Quicksort presents an elegant recursive solution whose success depends on
the partitioning function. For example, if partition()
is invoked on a
sub-array A[lo .. hi]
containing N values and the smallest value in
that sub-array is used as the pivot, then the resulting left sub-array is
empty, whereas the right sub-array contains N – 1 values. Reducing a
sub-problem by 1 is exactly how Insertion Sort and Selection Sort
performed, leading to inefficient O(N²) sorting. The top of Figure 5-12 summarizes the key steps of Quicksort applied to the array from Figure 5-11. The bottom of Figure 5-12 shows the full recursive execution. On the right side of the figure, you can see A
, the array being sorted, and how its values change in response to the recursive execution. For each partition of a range A[lo .. hi]
, the selected pivot is always A[lo]
, which is why each box reads partition(lo,hi,lo)
. As time moves vertically down the figure, you can see how each partition()
invocation leads to 1 or 2 recursive calls to qsort()
. For example, partition(0,7,0)
on A
places 15 into its final index location (which is why it is grayed out on the right), leading to two subsequent recursive invocations: qsort(0,1)
on the left sub-array and qsort(3,7)
on the right sub-array. The invocation of qsort(3,7)
does not start until qsort(0,1)
has completed its work.
Each time partition
is invoked, a different value is placed into its
proper index location and grayed out. When qsort(lo,hi)
is invoked on a
range where lo = hi
, that value is in its proper location, and it is also
grayed out.
When a partition(lo,hi,lo)
produces only a single recursive call to
qsort()
, it is because the pivot value is placed in either A[lo]
or
A[hi]
, thus reducing the problem size by just 1. For example, given the implementation in Listing 5-8, Quicksort will degrade its performance to O(N2) when called on an array of already-sorted values! To avoid this behavior, Quicksort is often modified to choose the pivot value randomly from within the range A[lo .. hi]
by replacing pivot_idx = lo
in Listing 5-8 with pivot_idx = random.randint(lo, hi)
. Decades of research have confirmed that there is always a theoretical possibility that in the worst case, Quicksort will have a runtime performance of O(N2). Despite this weakness, Quicksort is often the sorting algorithm of choice because, unlike Merge Sort, it does not require any extra storage. In reviewing the structure for Quicksort, you can see that it conforms to the checklist for O(N log
N) algorithms.
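That one-line change, together with the import, gives the randomized variant sketched below (my own illustration; partition() is assumed to behave as described above):

import random

def quick_sort_random(A):
    def qsort(lo, hi):
        if hi <= lo:
            return

        # Choose the pivot index at random from the range being partitioned.
        pivot_idx = random.randint(lo, hi)
        location = partition(A, lo, hi, pivot_idx)

        qsort(lo, location-1)
        qsort(location+1, hi)

    qsort(0, len(A)-1)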
Another way to achieve O(N log
N) is to have N steps where the runtime
performance of each step is O(log
N). Using the heap data structure
introduced in the last chapter, I now present Heap Sort, whose runtime
performance is O(N log
N).
To see why a max binary heap can help sort an array, consider
Figure 5-13, which presents the array storage for the heap
from Figure 4-17. The largest value in A
is
found in A[1]
. When this max value is dequeued, the underlying array
storage is updated to reflect the modified max binary heap containing one
less value. More importantly, the index position A[18]
is not only
unused, it is exactly the index position that should contain the maximum
value if the array were sorted. Simply place the dequeued value there.
Perform another dequeue, and this value (the second-largest value in the
heap) can be placed in index position A[17]
, which is now unused.
To make this promising approach work, I need to address the following issues:
The heap data structure ignores the value in index position 0 to simplify its computations using an array of size N + 1 to store N values.
The heap is initially empty, and new values are enqueued one at a time. When starting with N values to sort initially, there needs to be an efficient way to “bulk upload” all values.
Let’s fix how index positions are calculated. The original heap with 18 elements (as shown in Figure 5-13) was stored in an array with 19 elements. Any reference to A[i]
uses 1-based indexing, meaning that A[1]
stored the first value in the heap, and A[18] stored the last. In Listing 5-9, the less(i,j)
and swap(i,j)
functions all subtract 1 from i
and j
whenever accessing A[i]
or A[j]
. This allows 1-based indexing to work with 0-based array storage. The largest value in the heap is now in A[0]
. When swap(1, N)
appears in the sort()
function, it actually swaps the values in A[0]
and A[N–1]
. With this small adjustment, the sink()
method remains the same. Note that Heap Sort never uses swim()
.
class HeapSort:
    def __init__(self, A):
        self.A = A
        self.N = len(A)
        for k in range(self.N // 2, 0, -1):
            self.sink(k)

    def sort(self):
        while self.N > 1:
            self.swap(1, self.N)
            self.N -= 1
            self.sink(1)

    def less(self, i, j):
        return self.A[i-1] < self.A[j-1]

    def swap(self, i, j):
        self.A[i-1], self.A[j-1] = self.A[j-1], self.A[i-1]
To ensure that i
// 2 computes the parent index location for i
, both less()
and swap()
subtract 1 from i
and j
, as if they were using 1-based
indexing.
Convert array to be sorted, A
, into a max binary heap in bottom-up fashion, starting at N
//2, the highest index position that has at least one child.
The while
loop continues as long as there are values to sort.
Dequeue maximum value by swapping with last value in heap.
Reduce size of heap by one for upcoming sink()
to work.
Sink the newly swapped value into its proper location, which reestablishes the heap-ordered property.
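Listing 5-9 reuses sink() from Chapter 4 rather than repeating it. For completeness, here is a sketch of a sink() method consistent with the 1-based less() and swap() helpers above (an assumption on my part; the Chapter 4 version should be equivalent):

    def sink(self, parent):
        # Push the value at 1-based index position parent down until the
        # heap-ordered property holds again.
        while 2*parent <= self.N:
            child = 2*parent
            # Pick the larger of the two children, when a right child exists.
            if child < self.N and self.less(child, child+1):
                child += 1
            # Stop once the parent is at least as large as its larger child.
            if not self.less(parent, child):
                break
            self.swap(parent, child)
            parent = child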
The most important step in Heap Sort is constructing the initial max binary
heap from the original array to be sorted. The for
loop in HeapSort
completes this task, and the result is shown in Figure 5-14, which
required only 23 total comparisons and 5 swaps. This for
loop
constructs a heap from the bottom to the top by starting at index position
N
//2, the highest index position that has at least one child. In
reverse order, the for
loop calls sink()
on the kth index position to
ultimately ensure that all values in the array satisfy the heap-ordered
property. These index positions are drawn with a bold border in Figure 5-14.
Through a rather unexpected theoretical analysis, the total number of
comparisons required to convert an arbitrary array into a max binary heap
is no more than 2N in the worst case. The intuition behind this result
can be seen in the running total of comparisons in Figure 5-14,
which shows a steady, but slow, growth rate. I continue to alternately
shade the index positions within A
by the computed level of the max
binary heap to show how values are swapped between levels.
The final row in Figure 5-14 represents a max binary heap—in
fact, the exact same one depicted in Figure 4-16, now offset by one
index position to use all N index positions. The sort()
function in
Listing 5-9 now repeatedly swaps the largest value in the heap
with the last value in the heap (using the trick hinted at in
Figure 5-13), which has the effect of placing that value
in exactly its proper location in the final sorted array. sort()
then
reduces the size of the heap by one, and sink()
properly re-establishes
the heap-ordered property with runtime performance of O(log
N), as
described in Chapter 4.
How does the runtime performance of these different sorting algorithms—all classified as O(N log
N)—compare with each other? Let’s start with
some empirical results, as shown in Table 5-1. Reading the
numbers down in a column reports the timing results of an algorithm as the
problem size doubles; you can see that each timing value is a bit more than
twice as large as its previous value. This relative performance is the
signature behavior of an O(N log
N) algorithm.
N | Merge Sort | Quicksort | Heap Sort | Tim Sort | Python Sort
---|---|---|---|---|---
1,024 | | | | |
2,048 | | | | |
4,096 | | | | |
8,192 | | | | |
16,384 | | | | |
32,768 | | | | |
65,536 | | | | |
131,072 | | | | |
262,144 | | | | |
524,288 | | | | |
1,048,576 | | | | |
Now in each row, the absolute runtime performance of each algorithm is different. In Chapter 2, I discussed how different behaviors within a classification can vary by a multiplicative constant. This table provides evidence of this observation. Once the problem size is large enough, Quicksort is about 15% faster than Merge Sort, while Heap Sort is more than four times slower.
The last two columns in Table 5-1 report on the performance
of a new sorting algorithm, Tim Sort, invented by Tim Peters for Python in
2002. This algorithm is quickly becoming the standard sorting algorithm
used by major programming languages, such as Java, Python, and
Swift. Column “Tim Sort” represents the runtime performance for a
simplified Tim Sort implementation, which also exhibits O(N log
N)
behavior. The final column, labeled “Python Sort,” represents the runtime
performance using the built-in sort()
method in the list
data
type. Because it is
implemented internally, it will naturally be the most
efficient—as you can see, it is around 15 times faster than
Quicksort. It is worthwhile to investigate Tim Sort because it mixes
together two different sorting algorithms to achieve its outstanding
performance.
Tim Sort combines Insertion Sort and the merge()
helper function from
Merge Sort in a novel way to provide a fast sorting algorithm that
outperforms other sorting algorithms on real-world data. In particular, Tim
Sort dynamically takes advantage of long sequences of partially sorted data
to deliver truly outstanding results.
As shown in Listing 5-10, Tim Sort first partially sorts N/size
sub-arrays of a computed size
, based on compute_min_run()
. size
will typically be an integer between
32 and 64, which means we can treat this number as a constant that is
independent of N. This stage ensures there are sequences of partially
sorted data, which improves the behavior of merge()
, the helper function
from Merge Sort that merges two sorted sub-arrays into one.
def tim_sort(A):
    N = len(A)
    if N < 64:
        insertion_sort(A, 0, N-1)
        return

    size = compute_min_run(N)
    for lo in range(0, N, size):
        insertion_sort(A, lo, min(lo+size-1, N-1))

    aux = [None] * N
    while size < N:
        for lo in range(0, N, 2*size):
            mid = min(lo + size - 1, N-1)
            hi = min(lo + 2*size - 1, N-1)
            merge(A, lo, mid, hi, aux)
        size = 2 * size
Small arrays are sorted instead using Insertion Sort.
Compute size
—a value typically between 32 and 64—to use for the length of the sub-arrays to be sorted.
Use Insertion Sort to sort each sub-array A[lo .. lo+size–1]
, handling special case when final sub-array is smaller.
merge()
uses extra storage equal in size to the original array.
Compute index positions for two sub-arrays to be merged, A[lo .. mid]
and A[mid+1 .. hi]
. Take special care with partial sub-arrays.
Merge sub-arrays together to sort A[lo .. hi]
using aux
for auxiliary storage.
Once all sub-arrays of length size are merged with a neighbor, prepare for the next iteration through the while loop to merge sub-arrays that are twice as large.
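The compute_min_run() helper is not shown in this chapter. A sketch consistent with its description (a value typically between 32 and 64), modeled on the approach used by CPython's Tim Sort, could be:

def compute_min_run(N):
    # Repeatedly halve N until it drops below 64, remembering whether any
    # bits were shifted off; the result lands in the range 32 to 64.
    r = 0
    while N >= 64:
        r |= N & 1
        N >>= 1
    return N + r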
The auxiliary storage, aux
, is allocated once and used by each invocation
of merge()
. The actual implementation of Tim Sort has more complicated
logic that looks for ascending or strictly descending sub-arrays; it also
has a more sophisticated merge function that can merge groups of values
“all at once,” where the merge()
function I’ve shown operates one value
at a time. The simplified implementation whose behavior is shown in
Figure 5-15 contains the essential structure. Given the extensive
study of sorting algorithms, it is rather amazing that a new sorting
algorithm—discovered this century—has proven to be so effective when
working with real-world data sets.
Figure 5-15 demonstrates how Tim Sort works, using a min_run
of 4 just to make it easier to visualize. In the first step, four sub-arrays of
size 4 are sorted using Insertion Sort; the final two values, 2 and 8, form a partial sub-array of length 2. These
sorted sub-arrays are visualized using alternating bands of shaded and
non-shaded regions. There will be N/size
sorted sub-arrays (possibly one
more, if the length of the original array is not divisible by size
). I
showed earlier that the runtime performance of sorting size
values is
directly proportional to size
× (size
– 1)/2—since this occurs N/size
times, the total runtime performance is directly proportional to
N × (size
– 1)/2. Because size
can be considered a constant, this initial phase is classified as O(N).
In the second phase, pairs of neighboring runs are merged together. The
total accumulated time for the merge()
invocations is
proportional to N (as I explained earlier in Merge Sort). After the first
pass through the while
loop, the size of the sorted sub-arrays has
doubled to 8, as you can see by the shaded regions in Figure 5-15.
In this example, there are three iterations, as size
repeatedly doubles
from 4 to 32 (which is greater than N). In general, starting with
sorted sub-arrays of size size
, the while
loop iterates k
times until
size × 2^k > N; rewrite this computation as 2^k > N/size.
To find k
, take the logarithm of both sides, which reveals that k
>
log
(N/size
). Because log
(a
/b
) = log
(a
) – log
(b
), I can say
that k
> log
(N) – log
(size
); since size
is a constant, I only
need to focus on the fact that k
is equal to the smallest integer greater than or equal to log
(N) minus a small constant value.
To summarize, the first phase of Tim Sort—which applies Insertion Sort—can be classified as O(N), and the second phase—which performs
repeated merge()
requests—is O(k
× N), where k
is no greater than
log
(N), resulting in a total overall performance of O(N log
N).
Sorting is a fundamental problem in computer science and has been extensively studied. An array containing primitive values can be sorted because these values can be compared with each other by default. More complex data types (such as strings or two-dimensional points) can be sorted using custom ordering functions to allow the same sorting algorithms to work.
In this chapter, you learned:
How some basic sorting algorithms have O(N²) performance, making them completely unsuitable for sorting large data sets.
The concept of recursion as a key strategy to solve problems by dividing them into smaller sub-problems.
That Merge Sort and Heap Sort, in different ways, achieve O(N log
N)
performance.
That Quicksort achieves O(N log
N) performance without requiring
additional storage, as Merge Sort does.
Tim Sort, the default sorting algorithm used by Python and an increasing number of other programming languages.
Write a recursive method count(A,t)
that returns the number of times
that the value t
appears within A
. Your implementation must have a
recursive structure, similar to find_max(A)
.
You are given an array containing a permutation of the N distinct integers from 0 to N – 1. Determine the fewest number of swaps needed to sort the values in ascending order. Write a function, num_swaps(A)
, that takes such an array as input and returns an integer value. Note that you do not actually have to sort the array; just determine the number of swaps.
Extend the problem to work with an array of N distinct values, using the symbol table from Chapter 3, and confirm that five swaps are needed for Figure 5-1.
What is the total number of comparisons needed for the recursive
find_max(A)
to determine the largest value in an unordered array of N
values? Is this total less than (or greater than) the total number of
comparisons used by largest(A)
presented in Chapter 1?
In the merge()
step in Merge Sort, it can happen that one side (left
or right) is exhausted. Currently, the merge()
function continues to
iterate one step at a time. Replace this logic using Python’s ability to
copy entire slices of an array, like was done in aux[lo:hi+1] = A[lo:hi+1]
. Replace the logic in the first two cases of merge()
using
slice assignment. Conduct empirical trials to try to measure the
performance improvement, if any.
Complete a recursive implementation, recursive_two(A)
, that returns the
two largest values in A
. Compare its runtime performance against the
other approaches from Chapter 1; also compare the number of times
less-than is invoked.
The Fibonacci series is defined using the recursive formula F(N) = F(N–1) + F(N–2), with base cases of F(0) = 0 and F(1) = 1. A related series, Lucas Numbers, is defined as L(N) = L(N–1) + L(N–2), with base cases of L(0) = 2 and L(1) = 1. Implement fibonacci(n)
and lucas(n)
using a standard recursive approach and measure the time it takes to compute both FN and LN up to N = 40; depending on the speed of your computer, you might have to increase or decrease N to allow the code to terminate. Now implement a new fib_with_lucas(n)
method that takes advantage of the following two identities:
1 No. See the challenge exercises at the end of the chapter.
2 275,000 is about 512 squared.
3 To avoid crashing the Python interpreter because of infinite recursion, this code returns 1 when given any integer less than or equal to 1.
4 In Python, the recursion limit is technically less than 1,000 to prevent crashing the Python interpreter.
5 rmax
stands for recursive max.
6 Invented by Tony Hoare in 1959, Quicksort is well over 50 years old!