Sorting is a classic computational problem. During the first few decades of computers, a large fraction of computer time was devoted to sorting. Many sorting algorithms have been developed. It is useful to know a number of them, because sorting needs to be done in many different situations. Some call for low time complexity, others for small memory, others for simplicity. Throughout the book, we consider a number of sorting algorithms because they are simple yet provide a rich selection of examples for demonstrating different algorithmic techniques. We have already looked at selection, insertion, and bubble sort in Section 1.4. In this chapter we start with a simple version of bucket sort and then look at counting sort. Radix sort, another surprising sort, is considered next. Finally, counting and radix sort are combined to give radix counting sort.
Most sorting algorithms are said to be comparison-based, because the only way they access the input values is by comparing pairs of them, i.e., testing whether a_i ≤ a_j. Radix counting sort manipulates the elements in other ways. Another strange thing about this algorithm is that its loop invariants are rather unexpected.
In Section 9.1, we consider merge sort and quick sort, which is a recursive and randomized version of bucket sort. We look at heap sort in Section 10.4.
Specifications: As a professor, I often have to sort a large stack of students' papers by last name. The algorithm that I use is an iterative version of quick sort and bucket sort. See Section 9.1.
Basic Steps:
Partitioning into Five Buckets: Computers are good at using a single comparison to determine whether an element is greater than the pivot value or not. Humans, on the other hand, tend to be good at quickly determining which of five buckets an element belongs in. I first partition the papers based on which of the following ranges the first letter of the name is within: [A–E], [F–K], [L–O], [P–T], or [U–Z]. Then I partition the [A–E] bucket into the subbuckets [A], [B], [C], [D], and [E]. Then I partition the [A] bucket based on the second letter of the name. This works for this application because the list to be sorted consists of names whose first letters are fairly predictably distributed through the alphabet.
A Stack of Buckets: One difficulty with this algorithm is keeping track of all the buckets. For example, after the second partition, we will have nine buckets: [A], [B], [C], [D], [E], [F–K], [L–O], [P–T], and [U–Z]. After the third, we will have 13. On a computer, the recursion of the algorithm is implemented with a stack of stack frames. Correspondingly, when I sort the students' papers, I have a stack of buckets.
The Loop Invariant: I use the following loop invariant to keep track of what I am doing. The papers are split between a pile of already sorted papers and a stack of piles of partially sorted papers. The papers in the sorted pile (initially empty) come before all the partially sorted papers. Within the partially sorted stack of piles, the papers within each pile are out of order. However, each paper in a pile belongs before each paper in a later pile. For example, at some point in the algorithm, the papers starting with [A–C] will be sorted, and the piles in my stack will consist of [D], [E], [F–K], [L–O], [P–T], and [U–Z].
Maintain Loop Invariant: I make progress while maintaining this loop invariant as follows. I take the top pile off the stack, here the [D]. If it only contains a half dozen or so papers, I sort them using insertion sort. These are then added to the top of the sorted pile, [A–C], giving [A–D]. On the other hand, if the pile [D] taken off the stack is larger than this, I partition it into five piles, [DA–DE], [DF–DK], [DL–DO], [DP–DT], and [DU–DZ], which I push back onto the stack. Either way, my loop invariant is maintained.
Exit Condition: When the last bucket has been removed from the stack, the papers are sorted.
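Before the exercises, here is a minimal Python sketch of this stack-of-buckets algorithm. The function name, the roughly equal five-way split of each letter range, and the use of Python's built-in sort on small piles in place of insertion sort are my own choices, not the text's; it assumes the names are uppercase strings.

    def bucket_stack_sort(names):
        sorted_pile = []                      # already sorted; comes before the rest
        stack = [(list(names), 0, 'A', 'Z')]  # (pile, letter position, letter range)
        while stack:                          # exit: last bucket removed from stack
            pile, pos, lo, hi = stack.pop()
            # names that run out of letters equal the pile's common prefix,
            # so they come before every longer name in the pile
            sorted_pile.extend(n for n in pile if len(n) <= pos)
            pile = [n for n in pile if len(n) > pos]
            if len(pile) <= 6:                       # half a dozen or so papers:
                sorted_pile.extend(sorted(pile))     # sort them directly
            elif lo == hi:                           # a single-letter pile like [A]:
                stack.append((pile, pos + 1, 'A', 'Z'))  # partition on next letter
            else:
                # partition the letter range lo..hi into five roughly equal subranges
                letters = [chr(c) for c in range(ord(lo), ord(hi) + 1)]
                cuts = [letters[len(letters) * b // 5] for b in range(5)]
                cuts.append(chr(ord(hi) + 1))        # sentinel just past the range
                buckets = [[n for n in pile if cuts[b] <= n[pos] < cuts[b + 1]]
                           for b in range(5)]
                # push later ranges first so the earliest range ends up on top,
                # as the loop invariant's ordering of piles requires
                for b in reversed(range(5)):
                    if buckets[b]:
                        stack.append((buckets[b], pos, cuts[b],
                                      chr(ord(cuts[b + 1]) - 1)))
        return sorted_pile

For example, bucket_stack_sort(["DAVIS", "BROWN", "ADAMS", "CLARK"]) returns ["ADAMS", "BROWN", "CLARK", "DAVIS"].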
EXERCISE 5.1.1 Try sorting a deck of cards using this algorithm.
EXERCISE 5.1.2 Give code for this algorithm.
The counting sort algorithm is only useful in the special case where the elements to be sorted have very few possible values.
Specifications:
Preconditions: The input is a list of N values a_0, ..., a_{N−1}, each within the range 0, ..., k − 1.
Postconditions: The output is a list consisting of the same N values in nondecreasing order. The sort is stable, meaning that if two elements have the same value, then they must appear in the same order in the output as in the input. (This is important when extra data is carried with each element.)
Basic Steps:
Where an Element Goes: Consider any element of the input. By counting, we will determine where this element belongs in the output, and then we simply put it there. Where it belongs is determined by the number of elements that must appear before it. To simplify the argument, let's index the locations with [0, N − 1]. This way, the element in the location indexed by 0 has no elements before it, and the element in location N − 1 has N − 1 elements before it.
Suppose that the element a_i has the value v. Every element that has a strictly smaller value must go before it. Let's denote this count with c′_v, that is, c′_v = |{j | a_j < v}|. The only other elements that go before a_i are elements with exactly the same value. Because the sort must be stable, the number of these that go before it is the same as the number that appear before it in the input. If this number happens to be r, then element a_i belongs in location c′_v + r. In particular, the first element in the input with value v goes in location c′_v.
Example: Suppose the input contains five elements with value 0, three with value 1, and some with value 2.
The first element to appear in the input with value 0 goes into location 0, because there are no elements with smaller values. The next such element goes into location 1, the next into 2, and so on.
The first element to appear in the input with value 1 goes into location 5, because there are c′_1 = 5 elements with smaller values. The next such element goes into location 6, and the next into 7.
Similarly, the first element with value 2 goes into location c′_2 = 5 + 3 = 8.
Computing c′_v: We could compute c′_v by making a pass through the input, counting the number of elements that have values smaller than v. Doing this separately for each value v ∈ [0..k − 1], however, would take O(kN) time, which is too much.
Instead, let's first count how many times each value occurs in the input. For each v ∈ [0..k − 1], let c_v = |{i | a_i = v}|. This count can be computed with one pass through the input. For each element, if the element has value v, increment the counter c_v. This requires only O(N) addition and indexing operations.
Given the c_v values, we could compute c′_v = c_0 + c_1 + ··· + c_{v−1}. Computing one such c′_v would require O(k) additions, and computing all of them would take O(k²) additions, which is too much.
Alternatively, note that c′_0 = 0 and c′_v = c′_{v−1} + c_{v−1}. Of course, we must have computed the previous values before computing the next. Now computing one such c′_v takes O(1) additions, and computing all of them takes only O(k) additions.
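For example, with the counts from the example above, c_0 = 5 and c_1 = 3, this gives c′_0 = 0, c′_1 = c′_0 + c_0 = 5, and c′_2 = c′_1 + c_1 = 8, matching the locations computed earlier.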
Put in Place: The main loop in the algorithm considers the input elements one at a time in the order a_0, ..., a_{N−1} that they appear in the input and places them in the output array where they belong.
The Loop Invariant:
1. The input elements that have already been considered have been put in their correct places in the output.
2. For each v ∈ [0..k − 1], c′_v gives the index in the output array where the next input element with value v goes.
Establishing the Loop Invariant: Compute the counts as described above. This establishes the loop invariant before any input elements are considered, because this c′_v value gives the location where the first element with value v goes.
Main Step: Take the next input element. If it has value v, place it in the output location indexed by c′_v. Then increment c′_v.
Maintain Loop Invariant: ⟨loop-invariant′⟩ & not ⟨exit-cond⟩ & code_loop ⇒ ⟨loop-invariant″⟩. By the loop invariant, we know that if the next input element has value v, then it belongs in the output location indexed by c′_v. Hence, it is being put in the correct place. The next input element with value v will then go immediately after this current one in the output, i.e., into location c′_v + 1. Hence, incrementing c′_v maintains the second part of the loop invariant.
Exit Condition: Once all the input elements have been considered, the first part of the loop invariant establishes that the list has been sorted.
Code:
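What follows is a Python rendering (not the book's own pseudocode) of the algorithm exactly as described above, with arrays standing in for the text's c_v and c′_v counts.

    def counting_sort(a, k):
        # stable-sorts the list a of N integers, each in the range 0..k-1
        N = len(a)
        c = [0] * k                # c[v] = number of elements with value v
        for v in a:
            c[v] += 1
        c_prime = [0] * k          # c_prime[v] = number of elements with value < v,
        for v in range(1, k):      # i.e., where the first element with value v goes
            c_prime[v] = c_prime[v - 1] + c[v - 1]
        out = [None] * N
        for v in a:                # consider elements in input order (stability)
            out[c_prime[v]] = v    # loop invariant: c_prime[v] is the location
            c_prime[v] += 1        # where the next element with value v goes
        return out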
Running Time: The total time is O(N + k) addition and indexing operations. If the input can only contain k = O(N) possible values, then this algorithm works in linear time. It does not work well if the number of possible values is much higher.
The radix sort is a useful algorithm that dates back to the days of card-sorting machines, now found only in computer museums.
Specifications:
Preconditions: The input is a list of N values. Each value is an integer with d digits. Each digit is a value from 0 to k − 1, i.e., the value is viewed as an integer base k.
Postconditions: The output is a list consisting of the same N values in nondecreasing order.
Basic Steps: For some digit i ∈ [1..d], sort the input according to the ith digit, ignoring the other digits. Use a stable sort, such as counting sort.
Examples: Old computer punch cards were organized into d = 80 columns, and in each column a hole could be punched in one of k = 12 places. A card-sorting machine could mechanically examine each card in a deck and distribute the card into one of 12 bins, depending on which hole had been punched in a specified column.
A “value” might consist of a year, a month, and a day. You could then sort the elements by the year, by the month, or by the day.
Order in Which to Consider the Digits: It is most natural to sort with respect to the most significant digit first. The final sort, after all, has all the elements with a 0 as the first digit at the beginning, followed by those with a 1.
If the operator of the card-sorting machine sorted first by the most significant digit, he would get 12 piles. Each of these piles would then have to be sorted separately, according to the remaining digits. Sorting the first pile according to the second digit would produce 12 more piles. Sorting the first of those piles according to the third digit would produce 12 more piles. The whole process would be a nightmare.
Sorting with respect to the least significant digit seems silly at first. Sorting 79, 94, 25 gives 94, 25, 79, which is completely wrong. Even so, this is what the algorithm does.
The Algorithm: Loop through the digits from low to high order. For each, use a stable sort to sort the elements according to the current digit, ignoring the other digits.
Sorted by first 3 digits | Considering 4th digit | Stably sorted by 4th digit
184                      | 3184                  | 1195
192                      | 5192                  | 1243
195                      | 1195                  | 1311
243                      | 1243                  | 3184
271                      | 3271                  | 3271
311                      | 1311                  | 5192
The result is sorted by the first four digits.
Loop Invariant: After sorting with respect to (wrt) the first i low-order digits, the elements are sorted wrt the value formed from these i digits.
Establishing the Loop Invariant: The loop invariant is initially trivially true, because initially no digits have been considered.
Maintain Loop Invariant: ⟨loop-invariant′⟩ & not ⟨exit-cond⟩ & code_loop ⇒ ⟨loop-invariant″⟩. Suppose that the elements are sorted wrt the value formed from the lowest i − 1 digits. For the elements to be sorted wrt the value formed from the lowest i digits, all the elements with a 0 in the ith digit must come first, followed by those with a 1, and so on. This can be accomplished by sorting the elements wrt the ith digit while ignoring the other digits. Moreover, the block of elements with a 0 in the ith digit must be sorted wrt the lowest i − 1 digits. By the loop invariant, they were in this order, and because the sorting wrt the ith digit was stable, these elements will remain in the same relative order. The same is true for the block of elements with a 1 or 2 or . . . in the ith digit.
Ending: ⟨loop-invariant⟩ & ⟨exit-cond⟩ & code_post-loop ⇒ ⟨post-cond⟩. When i = d, they are sorted wrt the value formed from all d digits, and hence are sorted.
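As a compact illustration (not the book's code), here is the whole algorithm in Python, with Python's built-in stable sort standing in for counting sort; digit i of a value v base k is (v // k^i) mod k.

    def radix_sort(values, d, k):
        # sort integers having d digits base k
        for i in range(d):  # loop through the digits from low order to high order
            # sorted() is stable, as maintaining the loop invariant requires
            values = sorted(values, key=lambda v: (v // k ** i) % k)
        return values

For example, radix_sort([3184, 5192, 1195, 1243, 3271, 1311], d=4, k=10) returns [1195, 1243, 1311, 3184, 3271, 5192], reproducing the last column of the table above.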
I will now combine the radix and counting sorts. The resulting algorithm is said to run in linear Θ(n) time, whereas merge, quick, and heap sort are said to run in Θ(n log n) time. This makes radix counting appear to be faster, but this is confusing and misleading. Radix counting requires Θ(n) bit operations, where n is the total number of bits in the input instance. Merge, quick, and heap sort require Θ(N log N) comparisons, where N is the number of numbers in the list. Assuming that the N numbers to be sorted are distinct, each needs Θ(log N) bits to be represented, for a total of n = Θ(N log N) bits. Hence, merge, quick, and heap sort are also linear time in that they require Θ(n) bit operations, where n is the total number of bits in the input instance.
In practice, the radix counting algorithm may be a little faster than the other algorithms. However, quick and heap sort have the advantage of being done “in place” in memory, while the radix counting sort requires an auxiliary array of memory to transfer the data to.
Specifications:
Preconditions: The input is a list of N values. Each value is an l-bit integer.
Postconditions: The output is a list consisting of the same N values in nondecreasing order.
The Algorithm: The algorithm is to use radix sort with counting sort to sort each digit. To do this, we need to view each l-bit value as an integer with d digits, where each digit is a value from 0 to k − 1. This is done by splitting the l bits into d blocks of l/d bits each and treating each such block as a digit between 0 and k − 1, where k = 2^(l/d). Here d is a parameter to be set later.
Example: Consider sorting the numbers 30, 41, 28, 40, 31, 26, 47, 45. Here N = 8 and l = 6. Let's set d = 2 and split the l = 6 bits into d = 2 blocks of l/d = 3 bits each. Treat each of these blocks as a digit between 0 and k − 1, where k = 2³ = 8. For example, 30 = 011110₂ gives the blocks 011₂ = 3 and 110₂ = 6.
Sorting stably by the low-order digit gives 40, 41, 26, 28, 45, 30, 31, 47; sorting that stably by the high-order digit gives 26, 28, 30, 31, 40, 41, 45, 47. This is sorted.
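Combining the two sorts, here is a Python sketch of radix counting sort. It assumes, as in the example, that d divides l evenly; the inner counting sort mirrors the one given earlier, keyed on the current block of bits.

    def radix_counting_sort(values, l, d):
        bits = l // d                # bits per digit
        k = 1 << bits                # k = 2^(l/d) possible digit values
        for i in range(d):           # low-order digit first
            shift = i * bits
            # stable counting sort keyed on digit i = (v >> shift) & (k - 1)
            count = [0] * k
            for v in values:
                count[(v >> shift) & (k - 1)] += 1
            total = 0
            for digit in range(k):   # convert counts to starting locations
                count[digit], total = total, total + count[digit]
            out = [0] * len(values)
            for v in values:
                digit = (v >> shift) & (k - 1)
                out[count[digit]] = v
                count[digit] += 1
            values = out
        return values

For example, radix_counting_sort([30, 41, 28, 40, 31, 26, 47, 45], l=6, d=2) returns [26, 28, 30, 31, 40, 41, 45, 47].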
Running Time: Using the counting sort to sort with respect to one of the d digits takes Θ(N + k) operations. Hence, the entire algorithm takes Θ(d · (N + k)) operations. We have d = l/log k, giving Θ((l/log k) · (N + k)) operations.
The parameter k (like d) is not dictated by the specifications of the problem, but can be chosen freely by the algorithm. Exercise 23.1.4 sets k = O(N) in order to minimize the running time to Θ((l/log N) · N) operations.
Formally, time complexity measures the number of bit operations performed as a function of the number of bits needed to represent the input. When we say that counting sort takes Θ(N + k) operations, a single operation must be able to add two values with magnitude Θ(N) or to index into arrays of size N (or k). Each of these takes Θ(log N) bit operations. Hence, the total time to sort is Θ((l/log N) · N) operations × Θ(log N) bit operations per operation = Θ(l · N) bit operations. The input, consisting of N l-bit values, requires n = l · N bits to represent it. Hence, the running time Θ(l · N) = Θ(n) is linear in the size of the input.
One example is when you are sorting N values in the range 0 to N^r. Each value requires l = log(N^r) = r log N bits to represent it, for a total of n = N · r log N bits. Our settings would then be k = N and d = l/log k = r, giving T = Θ(d · N) = Θ(r N) operations, which is Θ(r N log N) = Θ(n) bit operations.
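As a concrete instance of this analysis (the numbers are illustrative, not from the text): sorting N = 1,024 values in the range 0 to N³ gives r = 3 and l = r log N = 30 bits per value. Setting k = N = 1,024 yields d = l/log k = 3, so the algorithm makes three counting-sort passes over 1,024 buckets, Θ(d · (N + k)) = Θ(r N) operations in all, i.e., Θ(r N log N) = Θ(n) bit operations on the n = 30 · 1,024 input bits.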