4

Sorting

The US Constitution stipulates that a decennial census take place in order to apportion taxes and representatives among the several states of the union. The first census following the American Revolution took place in 1790, and a census has been conducted every ten years since.

In the ninety years after 1790, the United States grew rapidly—from a bit less than 4 million people in the first census to more than 50 million in 1880. And therein lay a problem: it took eight years to count these people. When the next census year came, in 1890, the population was even bigger. If the count were taken in the same way, it would probably not have been completed before the following census of 1900.

At that time, Herman Hollerith, a young graduate from Columbia University’s School of Mines (he graduated in 1879, when he was 19), was working for the US Census Bureau. Aware of the pressing timing problem, he tried to find a way to speed up the census process using machines. Hollerith was inspired by the way conductors used holes punched in railway tickets to record traveler details; he invented a way in which punched cards could be used to record census details. These cards could then be processed by tabulating machines, electromechanical devices that could read the punched cards and use the data stored in them to make a tally.

Hollerith’s tabulating machine was used in the 1890 census and brought down the time required to complete it to six years—when it came out that the US population had grown to approximately 63 million people. Hollerith presented his tabulating machines to the Royal Statistical Society, noting that “it must not be considered that this system is still in an experimental stage. Over 100,000,000 punched cards have been counted several times over on these machines, and this has afforded ample opportunity to test its capabilities.”1 Following the census, Hollerith started a business, called the Hollerith Electric Tabulating System. This company, via a series of renames and amalgamations, evolved into International Business Machines (IBM) in 1924.

Today sorting is so ubiquitous that it is largely invisible. Just a few decades ago, offices were full of file cabinets containing labeled folders, and corporate office personnel took care to keep them in the required order, alphabetical or chronological. By contrast, we can sort the messages in our mailboxes with a click, ordering them by different criteria such as subject, date, and sender. Our contacts are kept sorted in our digital devices without our taking notice; again, just a few years ago we would take pains to make sure we had our contacts organized in our diaries.

Going back to the US census, sorting was one of the first examples of office automation; it is not surprising, then, that it was one of the first applications of digital computers. Many different sorting algorithms have been developed. Some of them are not used in practice, but a number of them remain popular with programmers because they offer different comparative advantages and disadvantages. Sorting is such a fundamental part of what computers do that any book on algorithms will devote some part to it, and precisely because there are many different sorting algorithms, exploring them allows us to appreciate an important aspect of the work of computer scientists and programmers. Like toolsmiths, they have a whole toolbox at their disposal, and there may be different tools for the same task. Think of different types of screwdrivers. We have slot, Phillips, Allen, and Robertson drivers, to name but a few. Although all of them have the same objective, particular screws require particular drivers. Sometimes we can make do using a slot driver on a cross screw; in general, though, we must use the proper tool for the job. The same goes for sorting. While all sorting algorithms put things in order, each is more suitable for particular uses.

Before we start exploring these algorithms, let’s pin down what exactly they do. Sure, they sort stuff, but that raises the question, What exactly do we mean by sorting data?

We assume that we have a group of related data—usually called records—that contains some information that is of interest to us. For example, such data could be the emails in our in-box. We want to rearrange these data so that they appear in a specific order that is useful to us. The rearrangement has to take place using some specific feature or features of the data. In our email illustration, we may want to order our messages by delivery date, chronologically, or by the sender’s name, alphabetically. The order may be ascending, from earlier messages to more recent ones, or descending, from recent messages going back in time. The output of the sorting process must be the same data as the input; in technical terms, it must be a permutation of the original data—that is, the original data in a different order, but not changed in any other way.

The feature we are using to sort our data is usually called a key. A key may be atomic, when we consider that we cannot decompose it into parts, or it may be composite, when the key consists of more than a single feature. If we want to sort our emails by delivery date, this is an atomic key (we do not care that a date can be broken up into year, month, and day, and may also contain the exact time of delivery). But we may want to sort our emails by the sender’s name, and then, for all the messages from the same sender, order them by delivery date. The combination of sender and date forms the composite key of our sort.
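
To make keys concrete, here is a minimal sketch in Python; the email records and addresses are invented for illustration. Python’s built-in sorting accepts a key function, and a composite key can be expressed as a tuple, which is compared element by element: first by sender, and for equal senders, by delivery date.

```python
from datetime import date

# Hypothetical email records: (sender, delivery date).
emails = [
    ("carol@example.com", date(2024, 5, 2)),
    ("alice@example.com", date(2024, 5, 7)),
    ("carol@example.com", date(2024, 4, 30)),
]

# Composite key: order by sender first, then by delivery date.
emails.sort(key=lambda record: (record[0], record[1]))

for sender, delivered in emails:
    print(sender, delivered)
```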

Any kind of feature can be used as a key for sorting, as long as its values can be ordered. Obviously this holds true for numbers: if we want to sort sales data by the number of sales per item, the number of sales is an integer. When our keys are textual, such as senders’ emails, the ordering that we usually want is lexicographical. Sorting algorithms need to know how to compare our data so as to deduce their order, but any valid way to compare will do.

We’ll start our exploration of sorting methods with two algorithms that may be familiar because they are probably the most intuitive; they are even used by people with no knowledge of algorithms when they have to sort a pile of stuff.

Simple Sorting Methods

Our task is to sort the following items:

Admittedly, if you take a look at the task, it’s pretty trivial; these are the numbers from one to ten. But keeping things simple will allow us to concentrate on the logic of the sorting task.

First, we go through all the items and find the minimum among them. We take it from where we found it and place it first. The minimum of the items is 1, so this must be put into the first position. As this position is already taken, we have to do something with 4, which is currently at the first position; we cannot just throw it away. What we can do is swap it with the minimum: move the minimum item to the first position and move the item previously in the first position to the place left vacant by the minimum. So we go from here, where the minimum is painted black,

to here,

where the minimum is painted white, to indicate that it is in its correct, ordered position.

Now we do exactly the same thing with all the numbers, save for the minimum we found—that is, all the numbers from the second position onward (the gray numbers). We find their minimum, which is 2, and swap it again with the first of the unsorted numbers, 6:

Again we do the same. We deal with the items from the third one onward; we find the minimum, which is 3, and swap it with the item currently in the third place, 10:

If we continue this way, item 4 will stay put because it is already in its correct place and we’ll go on to place 5 in its sorted position:

At each point we go through fewer and fewer items to find their minimum. In the end, we’ll find the minimum of the last two items, and once we’ve done that, all our items will be sorted.

This sorting method is called selection sort because each time, we select the minimum of the unsorted items and place it where it should be. Like all the sorting algorithms that we will examine, selection sort has no problem with ties—that is, elements that have the same order. If we find more than one minimum when we examine the unsorted items, we just pick any one of them as our working minimum. We’ll find the tied item next time around and put it next to its equal.
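
Selection sort is short enough to write down in full. Here is a minimal sketch in Python; the starting arrangement of the numbers one to ten is chosen arbitrarily for illustration.

```python
def selection_sort(items):
    """Sort a list in place by repeatedly selecting the minimum
    of the unsorted items and swapping it into position."""
    n = len(items)
    for i in range(n - 1):
        # Find the minimum of the unsorted part, items[i:].
        min_index = i
        for j in range(i + 1, n):
            if items[j] < items[min_index]:
                min_index = j
        # Swap it with the first unsorted item.
        items[i], items[min_index] = items[min_index], items[i]
    return items

print(selection_sort([4, 6, 10, 1, 5, 3, 2, 9, 7, 8]))
```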

Selection sort is a straightforward algorithm. Is it also a good one? If we pay attention to what we are doing, we are going from the beginning to the end of the items that we want to sort, and each time we try to find the minimum of the remaining unsorted items. If we have n items, the complexity of selection sort is O(n²). This is not bad in itself; such complexity is not prohibitive, and we can tackle large problems (read: sort a lot of items) in a reasonable amount of time.

The thing is, precisely because sorting is so important, algorithms do exist that are faster than that. So although selection sort is not inherently bad, we usually prefer other, more advanced algorithms when we have a lot of items at hand. At the same time, selection sort is not only easy for humans to understand but also easy to implement efficiently on a computer. So it is clearly not just of academic interest; it is really used in practice.

The same can be said for another simple sorting algorithm that we’ll describe now. Like selection sort, it is a sorting method that is easy to understand without reference to computers. In fact, this is the way we may sort our hand in a card game.

Imagine that you play a game of cards in which you are dealt ten cards (for example, you could be playing Rummy). As you take one card after the other, you want to sort them in your hand. We assume that the card rank, from lowest to highest, is A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K.

In fact, in many games (including Rummy), the ace can be the lowest- and highest-ranking card, but we’ll assume that there is a single order only.

You are dealt the cards one by one, so you start with one card in your hand and nine cards to follow:

Now you get a second card; it is a six:

Six is fine next to four, so you leave it there and take another card, which turns out to be two:

This time, so as to keep your hand in order, you need to move two to the left of four, thus pushing four and six one position to the right. You do that before you are dealt another card, a three:

You insert the three between the two and four, and see the next card, a nine. This is already in the right place in your hand.

You continue in the same way with the rest of your hand—for instance, 7, Q, J, 8, and 5. In the end, you will have a sorted hand.

Each new card was inserted in the right place in relation to the previous cards that had been dealt. This way of sorting is called insertion sort for that reason, and it works for any kind of objects, not just playing cards.

Like selection sort, insertion sort is straightforward to implement. It turns out that it has the same complexity: O(n²). It does have a distinct characteristic, though: as in our playing cards example, you don’t need to know all the items in advance before you sort them. In effect, you sort them as you get them. That means that you can use insertion sort when the items to be sorted are somehow streamed to you live. We met this kind of algorithm, which works live as it were, when we discussed the tournament problem in graphs in chapter 2, and we called it an online algorithm. If we have to sort an unknown number of items, or if we must be able to stop immediately and provide a sorted list whenever we are suddenly called to do so, then insertion sort is the way to go.2
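
A minimal sketch of insertion sort in Python follows; the hand is the one from our card example, with J and Q encoded as the numbers 11 and 12. Each new item is inserted among the already sorted items by shifting larger ones one position to the right.

```python
def insertion_sort(items):
    """Sort a list in place by inserting each new item into its
    correct place among the items already seen."""
    for i in range(1, len(items)):
        card = items[i]            # the newly dealt "card"
        j = i - 1
        # Shift larger items one position to the right.
        while j >= 0 and items[j] > card:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = card        # insert the new card in its place
    return items

# The hand from the text: 4, 6, 2, 3, 9, 7, Q, J, 8, 5 (J = 11, Q = 12).
print(insertion_sort([4, 6, 2, 3, 9, 7, 12, 11, 8, 5]))
```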

Radix Sort

Let us now return to Hollerith. His tabulating machines used neither selection sort nor insertion sort. They actually used a precursor of a method still in use today, called radix sort. As a tribute to the first machine-enabled sorting application, it is worth spending some time on how radix sort works. It is also interesting because this is a sorting method in which the items to be sorted are not really compared to each other—at least not entirely, as we will see. What’s more, radix sort is not just of historical interest, as it performs fantastically. What’s not to like in a venerable yet practical algorithm?3

The easiest way to see a radix sort is by using playing cards again. Suppose that we have a full deck of cards that has been shuffled and want to sort it. One way to do it is to form 13 piles, one for each rank value. We go through the deck, taking each card and placing it in the respective pile. We’ll get 13 piles of four cards each: a pile containing all the aces, another one containing all the twos, and so on.

Then we collect the cards, pile by pile, taking care to put each pile we pick at the bottom of the cards we are collecting. In this way we’ll have all the cards in our hands, partially sorted. The first four cards will be aces, the next four twos, and all the way to the kings.

We now create four new piles, one for each suit. We’ll go through the cards, taking each card and putting it into the corresponding pile. We’ll get four piles of suits. Because the values were already sorted, in each pile we will have all cards of a single suit, in rank order.

To finish sorting our cards, we only need to collect them pile by pile.

This is the essence of radix sort. We did not sort the cards by fully comparing them with one another. We performed partial comparisons, first by rank and then by suit.

Of course, if radix sort were applicable only to cards, it would not merit our attention here. We can see how radix sort works with integer numbers. Suppose that we have the following group of integers:

We make sure that all the integers have the same number of digits. So we pad the numbers with zeros on the left if necessary, turning 5 to 005, 97 to 097, and 53 to 053. We go through all our numbers and triage them by their rightmost digit. We use that digit to place them in ten piles:

We lightened up the numbers’ fill color to indicate that they are partially sorted; each pile contains the numbers with the same rightmost digit. All the numbers in the first pile end in zero, and in the second pile they end in one, up to the last pile, where they end in nine. We now collect the ten piles, starting from the first on the left and adding piles at the bottom (taking care not to shuffle the numbers in any way). Then we redistribute them into ten piles using the second digit from the right and get:

This time all the numbers in the first pile have their second digit from the right equal to zero; in the second pile, the second digit from the right is equal to one, and similarly for the other piles. At the same time, the items in each pile are sorted by their last digit because that’s what we did when we piled them the first time.

We finish by collecting the piles and redistributing the numbers, using the third digit from the right this time:

Now the items in each pile start with the same digit and are sorted by their second digit, as a result of the previous piling, and their last digit, as a result of the first piling. To get our sorted numbers, we just collect the piles one final time.
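
In code, the piling and collecting translate into a few lines. Here is a minimal sketch of radix sort in Python for non-negative integers, padding them to equal width and distributing them by one digit per round, rightmost digit first; the input numbers are invented for illustration.

```python
def radix_sort(numbers):
    """Sort non-negative integers by distributing them into piles,
    one digit at a time, from the rightmost digit to the leftmost."""
    width = len(str(max(numbers)))               # pad: 5 -> "005"
    strings = [str(n).zfill(width) for n in numbers]
    for position in range(width - 1, -1, -1):    # rightmost digit first
        piles = [[] for _ in range(10)]          # one pile per digit
        for s in strings:
            piles[int(s[position])].append(s)
        # Collect the piles in order, never shuffling a pile's contents.
        strings = [s for pile in piles for s in pile]
    return [int(s) for s in strings]

print(radix_sort([329, 457, 657, 839, 436, 720, 355, 5, 97, 53]))
```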

Radix sort can work with words or any sequence of alphanumeric characters as well as integers. In computer science, we call a sequence of alphanumeric characters and symbols a string. Radix sort works with strings; the strings may be composed of digits, as in our example, but they may be any kind of strings. The number of piles for alphabetic strings will be equal to the number of distinct characters in the alphabet (for instance, 26 piles for English), but the operations will be exactly the same. What is distinctive in radix sort is that even when the strings are composed entirely of digits, we treat them as alphanumeric sequences, not as numbers. If you check how we worked, we did not care about the values of the numbers; we worked each time with one particular digit from each number, in the same way that we would extract characters from a word, going from right to left. That is why radix sort is sometimes called a string sorting method.

Do not let this fool you and lead you to think that radix sort can order strings while the other sorting methods we present here cannot. All of them can. We can sort strings, as long as the symbols that compose them can themselves be ordered. Human names are strings to computers, and we can sort them because letters are ordered alphabetically and names can be compared lexicographically. The appellation “string sorting” refers to the fact that radix sort treats all keys, even numbers, as strings. The other sorting methods in this chapter treat numbers as numbers and strings as strings, and work by comparing numbers or strings, as is appropriate. It is only for convenience that we use numbers as keys in our examples of the different sorting algorithms.

The way radix sort works, processing the items to be sorted digit by digit (or character by character), makes it efficient. If we have n items to sort, and the items consist of w digits or characters, then the complexity of the algorithm is O(wn). That is much better than the O(n²) complexity required by selection and insertion sorts.

And so we come full circle to tabulating machines. A tabulating machine worked in a similar way, sorting punched cards. Imagine that we have a deck of cards where each card has ten columns; punched holes in each column indicate a digit. The machine could recognize the holes in each column, thus figuring out the corresponding digit. An operator put the cards in the machine, and the machine placed the cards in ten output bins depending on their last column—that is, the least significant digit. The operator collected the cards from the output bins, being careful not to mix them in any way, and fed them again into the machine, which this time distributed them into the output bins using the next-to-last column, the digit next to the least significant one. After repeating the process ten times, the operator could collect an ordered pile of cards. Voilà.

Quicksort

Suppose we have a group of kids milling around in a yard (perhaps at school) and want to put them in line, from the shortest to tallest. Initially we ask them to get in line, which they will do, in whatever order they want:

Now we pick a kid at random:

We tell the kids to move around so that all kids who are shorter than the chosen one should move to the left of them and all the rest should move to their right. In the following figure we show where the kid we picked ended up, and you can check that those kids who are taller are to the right and those who are shorter are to the left:

We did not ask the kids to put themselves in the right order. We only asked them to move relative to the kid we chose. So they formed two groups, on the left and right of the chosen one. The kids in these groups are not in any shorter-to-taller sequence. We do know, however, that one kid is certainly in the final position in the line we are trying to form: the very kid we picked. All the kids on the left are shorter, and all the kids on the right are at least as tall. We call the kid we picked the pivot because the rest of the kids have moved around them.

As a visual aid, we will follow the convention of painting white the kids who are put in the right position. When we select a kid as a pivot, we will paint them black; when we have moved the rest of the kids around the pivot, we will use a small black hat to indicate the final position of the pivot (it’s white because it’s in the right position, with a black top, to indicate that it was the pivot).

Now we shift our attention to one of the two groups, left or right—say the left. Again we pick a pivot in that group at random:

We ask the kids in that group to do the same thing as before: move so that if they are shorter, they go to the left of the pivot; otherwise, they end up to the right. We will again have two new, smaller groups, which you can see below. One of them is a group of one, so that kid is in their right place in that trivial group. Then we have the rest of the kids to the right of the second pivot. The second pivot is in the right place, with all the shorter kids to the left and all the rest to the right. This group on the right extends to the first pivot. We then pick a new, third pivot from that group.

When we tell the kids in the group to move as before, according to how tall they are with respect to the third pivot, two smaller groups will be formed. We focus our attention on the one on the left. We do as before: we pick a pivot, our fourth, and ask the kids in this group of three to move around them.

When they do, the pivot ends up being the first of the three, so we have a remaining group of two kids on the pivot’s right. We pick one of the pair as a pivot, and the other kid will move, if needed, to their right.

It turns out that this kid does not need to move at all. Right now, we have managed to put about half the kids in order; there remain two groups that we left aside when we were dealing with previous pivots. We go back to the first of these two groups from the left in order to pick a pivot there and repeat the process.

Again, no movement around was necessary and so we go to the last group of unsorted kids to pick a pivot.

We get a group of one, on the pivot’s right, and a group of two, on the pivot’s left. We focus on the left group and select one of the two as our last pivot.

We are done. All the kids are in order of height.

Let’s take stock of what we did. We managed to put the kids in order by putting one kid in their right place each time. To do that, we only needed to ask the rest of the kids to move around them. This will always work, of course, not just with kids but also with anything that we may want to sort. If we have a group of numbers that we can sort, we can follow a similar process, picking up a number at random and moving around the rest of the numbers so that those that are smaller end up before our chosen number, and the rest end up after it. We’ll repeat the process in the smaller groups that are formed; in the end, we’ll have all the numbers in the right order. This is the process that underlies the quicksort algorithm.
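
Here is a minimal sketch of quicksort in Python, written in the simplest possible way: it builds new lists instead of moving items around in place, as practical implementations do, but the logic of picking a random pivot and partitioning around it is the same.

```python
import random

def quicksort(items):
    """Sort by picking a random pivot and partitioning around it."""
    if len(items) <= 1:        # a group of one (or none) is already sorted
        return items
    pivot = random.choice(items)
    shorter = [x for x in items if x < pivot]    # go to the pivot's left
    equal = [x for x in items if x == pivot]     # ties stay with the pivot
    taller = [x for x in items if x > pivot]     # go to the pivot's right
    return quicksort(shorter) + equal + quicksort(taller)

print(quicksort([95, 59, 15, 27, 82, 56, 35, 51, 21, 79]))
```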

Quicksort is based on the observation that if we manage to position one element in the correct position with respect to all the rest, whatever that position might be, and then repeat this with the remaining elements, we’ll end up with all the elements in their correct positions. If we think back on what we did with selection sort, there we also took each element and positioned it correctly with respect to all the rest, but the element we took was always the minimum of the remaining ones. This is a crucial difference: in quicksort, we should not pick the minimum of the remaining elements as our pivot. Let’s see what happens if we do so.

If we start again with the same group of kids, we’ll get the shortest of all kids as our pivot. That one will go to the beginning of the line, and all the rest will move behind the pivot.

Then we’ll get the kid who is immediately taller than the first one and put them second in line. All the rest of the kids will go, again, behind the pivot.

Doing the same thing with the third kid gets us to this point:

But notice how this looks eerily like a selection sort, as we are filling in the line from the left to the right with the shortest of the remaining kids.

We have not said how we choose an element as a pivot each time. We now see that we should not choose the minimum of the elements. First, choosing the minimum requires effort; we would really have to go and find the minimum each time. Second, the algorithm would then behave like one we already know, so there would not be much point in doing it.

The truth is that quicksort is better than selection sort because “normally” (we’ll see what normally means shortly) we’ll pick as our pivot something that partitions our data in some more equitable way. Choosing the minimum element creates the most unequal partition: nothing on the left of the pivot, and all the rest to the right of the pivot. Each time, then, we just manage to position the pivot itself.

If the partition is better, then we do not just manage to position the pivot. We also manage to position all the elements to the left of the pivot in their correct positions with respect to the elements to the right of the pivot. Yes, they are not in their final positions yet. But overall, they are in better positions than before. So we have one element, the pivot, in the best position possible, and the other elements better positioned than before.

This has an important effect on the performance of quicksort: its expected complexity is O(n log n), which is way better than O(n²). If we want to sort 1 million items, n² works out to 10¹², a trillion, while n log n is about 20 million.

It all hinges on picking the proper pivot. Searching for a pivot that would partition our data in the best possible way each time does not make sense; it would require searching to find the right pivot, so this would add complexity to the process. A good strategy, then, is to leave it to chance. Just pick a pivot at random and use what you picked to partition the data.

To see why this is a good strategy, let us see why it is not a bad one. It would be a bad one if it led to a behavior like the one we just saw, where quicksort degenerates to selection sort. This would happen if each time we pick as a pivot an item that does not really partition the elements. This can happen if we pick each time the minimum or maximum of the items (the two situations are exactly the same). The overall probability of all this happening can be found to be 2ⁿ⁻¹/n!.

A probability such as 2ⁿ⁻¹/n! is hard to grasp because it is abysmally low. To put it into context, if you take a deck of 52 playing cards and shuffle it randomly, the probability that the deck will end up being in order is 1/52!. This is about the same as flipping a coin and coming up heads 226 times in a row. When you multiply by 2⁵¹, things are not improved much. The number 2⁵¹/52! is approximately equal to 10⁻⁵³. To put the matter in cosmic perspective, the earth is composed of about 10⁵⁰ atoms. If you and a friend of yours were to pick independently an atom from the earth, the probability that you would pick the same atom would be 10⁻⁵⁰, actually greater than 2⁵¹/52!—the probability of pathological quicksort on a deck of cards.4

That explains the “normally” in what we said above: we expect to pick a pivot that partitions our data in a more equitable way. Excepting a streak of bad luck of cosmic proportions, we do not expect to pick the worst pivot possible each time. The odds actually work in our favor: it is by picking pivots at random that we expect to get a complexity of O(n log n). It is theoretically possible to do worse than that, but the possibility is only of academic interest. Quicksort will be as fast as we expect it to be for all practical purposes.

Quicksort was developed by the British computer scientist Tony Hoare in 1959–1960.5 It is probably the most popular sorting algorithm today because when implemented correctly, it outruns all others. It is also the first algorithm that we see whose behavior is not entirely deterministic. Although it will always sort correctly, we cannot guarantee that it will always have the same runtime performance. We can guarantee that it is extremely unlikely that it will exhibit pathological behavior. This is an important concept, which brings us to the so-called randomized algorithms: those algorithms that use an element of chance in their operation. This runs contrary to our intuition; we expect algorithms to be the ultimate deterministic beasts, slavishly following the instructions we lay down for them on a preordained path. And yet randomized algorithms have blossomed in recent years, as it has turned out that chance can help us solve problems that remain intractable to more standard approaches.6

Merge Sort

We’ve met radix sort, which essentially sorts items by distribution: in each round through the data, it places each item in the correct pile. Now we’ll meet another sorting method, one that sorts items by merging things together instead of splitting them apart. The method is called merge sort.

Merge sort starts by admitting to a limited capability for sorting; imagine that we are unable to sort our items if they are given to us in any random arrangement. We are only able to do the following: if we are given two groups of items, and each group is already sorted, we can merge them together and get a single, sorted group.

For example, say we have the following two groups, one per row (although in our example the two groups have the same number of items, there is no need for the groups to be equal in size):

As you can see, each of the two groups is already sorted. We want to merge them in order to create a single sorted group. This is really simple. We check the first item of both groups. We see that 15 is smaller than 21, so this will be the first item of our third group:

We examine again the first elements of the two groups, and this time 21 from the second group is smaller than 27 from the first group. So we take it and append it to the third group.

If we continue in this way, we’ll take 27 from the first group and then 35 from the second group, adding them to the end of the third group:

Now 51 is smaller than 59, and 56 is smaller than 59. As we have already moved 35 from the second group to the third, we will end up having moved three items in a row from the second group to the third. That is fine because in this way we keep the items in the third group sorted. There is no reason why the first two groups should diminish at the same rate.

We return to the first group, as 59 is smaller than 69, so we add it to the third group:

Next, by moving 69 to the third group we empty the second group completely:

We finish by moving the last remaining elements of the first group to the third group—they are definitely larger than the last element of the third group or otherwise we would not have moved it there previously. Our items are completely sorted now:
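
The merging procedure is easy to express in code. Here is a minimal sketch in Python; the two sorted groups passed to it at the end are chosen for illustration.

```python
def merge(first, second):
    """Merge two already sorted lists into one sorted list."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller of the two front items.
    while i < len(first) and j < len(second):
        if first[i] <= second[j]:
            merged.append(first[i])
            i += 1
        else:
            merged.append(second[j])
            j += 1
    # One group is exhausted; the other's leftovers are all larger.
    merged.extend(first[i:])
    merged.extend(second[j:])
    return merged

print(merge([15, 27, 59, 82, 95], [21, 35, 51, 56, 79]))
```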

It’s nice to have a way of producing a sorted group from two sorted groups, but this does not seem to solve our problem of sorting a single group of unsorted items. It is true it does not, yet it is an important component of the solution.

Imagine now that we have a group of people. We give to one of them a group of items to sort. That person does not know how to sort them, but they do know that if somehow they had two sorted parts of the items, they could produce a final sorted group. So what they do is this: they split the group in two and pass it on to two other people. They say to the first of them, “Take this group and sort it. Once you are done, return it to me.” They say the same thing to the second person. Then they wait.

Although our first point of contact does not know how to sort the items, if the two new contacts manage somehow to sort their own parts and return them, then the first person would return to us the final, completely sorted group. But the two other contacts know no more than our initial contact—they don’t know how to sort but rather only how to merge sorted stuff using the algorithm above—so has anything really been achieved?

The answer is yes, provided that they do the same: they split their part in two, and each delegates their part to two other persons, waiting for them to do their bidding and provide them with two sorted parts.

This seems like the ultimate pass-the-buck game, but look at what happens if we try to see it unfold with an example. We start with the numbers 95, 59, 15, 27, 82, 56, 35, 51, 21, and 79. We give them to Alice (A), who splits them in two, and passes them to Bob (B) and Carol (C). You can see that in the first level of the upside-down tree below:

Then Bob splits his numbers into two, and passes them on to Dave (D) and Eve (E). Similarly, Carol splits her numbers, and passes them on to Frank (F) and Grace (G). Our cast of characters continues passing the buck. Dave divides his numbers between Heidi (H) and Ivan (I); Eve distributes her two numbers to Judy (J) and Karen (K); Frank splits his numbers between Leo (L) and Mallory (M), and Grace hers between Nick (N) and Olivia (O). Finally, Heidi splits her pair between Peggy (P) and Quentin (Q), while Leo splits his between Robert (R) and Sybil (S).

The people at the leaves of the tree have really nothing to do. Peggy and Quentin receive a number each, and they are told to sort it. But a single number is sorted by definition: it is in order with itself. So Peggy and Quentin just give their number back to Heidi. Also, Ivan, Judy, Karen, Robert, Sybil, Mallory, Nick, and Olivia return the numbers they received.

Now let’s move to the tree on the next page. In this tree we’ll move from the leaves, at the top (so this looks like a normal tree, not upside down), to the root at the bottom. Let’s concentrate on Heidi. She gets back two numbers, each one of which is (trivially) sorted. Heidi knows how to merge two sorted groups to produce a single group so she can use 95 and 59 to make 59, 95. She then returns this sorted group of two to Dave. Leo will act the same: he will get 35 and 56, which are already sorted (by themselves), and knows how to put these two in order and create 35, 56, which he returns to Frank.

Dave, who was clueless about the numbers 95, 59, 15 that he had initially received, now gets 59, 95 from Heidi and 15 from Ivan. Both of these groups are already sorted, which means that Dave can merge them to create 15, 59, 95. In the same way, Frank gets 35, 56 from Leo and 51 from Mallory, and can produce 35, 51, 56.

If everybody acts in the same way, when the numbers reach Alice, she will get two sorted lists, one from Carol and one from Bob. She will merge them to create the final sorted list.

These two trees are the essence behind merge sort. We delegate the sorting as much as we can, to the point that no sorting can take place because lone items are already sorted by definition. Then we merge larger and larger groups, until we absorb all elements in a single, final, sorted group.
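
The whole pass-the-buck scheme fits in a few lines. Here is a minimal sketch in Python, reusing the merge function from the sketch above; each recursive call plays the role of one of our characters, and the numbers are those of our example.

```python
def merge_sort(items):
    """Split the items in two, delegate each half to a recursive
    call, and merge the two sorted halves that come back."""
    if len(items) <= 1:        # a lone item is sorted by definition
        return items
    middle = len(items) // 2
    left = merge_sort(items[:middle])     # Bob's share
    right = merge_sort(items[middle:])    # Carol's share
    return merge(left, right)             # merge the sorted halves

print(merge_sort([95, 59, 15, 27, 82, 56, 35, 51, 21, 79]))
```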

The smarts that we require from our characters are minimal. You can see in the first tree that Eve got from Bob a group of numbers that, as it happened, was already sorted: 27, 82. It does not matter. She does not stop to check whether they need sorting or not—and we don’t want her to because such a check would take time. She just splits the numbers and passes them down. She will get them back and merge them to produce what she already had. That’s all right; in the grand scheme of things, this gratuitous pas de trois between Eve, Judy, and Karen won’t affect the performance of the algorithm.

The complexity of merge sort is as good as that of quicksort: O(n log n). That means that we have two algorithms with the same performance. In practice, programmers may choose one or the other depending on additional factors. Usually quicksort programs run faster than merge sort ones because their concrete implementation in a programming language is faster. Merge sort, however, splits the data before merging them, which means that the work can be parallelized, so that vast amounts of data can be sorted by a computer cluster, where each computer acts like one of our human sorters above.

Merge sort is as old as computers. Its inventor was a Hungarian American, Neumann János Lajos, better known under his American name, John von Neumann (1903–1957). In 1945, he wrote a manuscript, in ink, 23 pages long, for one of the first digital computers, the Electronic Discrete Variable Automatic Computer, or EDVAC for short. At the top of the first page, the phrase “TOP SECRET” was penciled in (and later erased), as work on computers was classified in 1945 due to its connections with the military. The subject of the paper was a nonnumerical application of computers: sorting. The method that von Neumann described was what we now call merge sort.7