Unless you tell it otherwise, sort divides each line into fields at whitespace (blanks or tabs), and sorts the lines by field, from left to right.
That is, it sorts on the basis of field 0 (leftmost), but when the leftmost fields are the same, it sorts on the basis of field 1, and so on. This is hard to put into words, but it's really just common sense. Suppose your office inventory manager created a file like this:
supplies pencils 148
furniture chairs 40
kitchen knives 22
kitchen forks 20
supplies pens 236
furniture couches 10
furniture tables 7
supplies paper 29
You'd want all the supplies sorted into categories, and within each category, you'd want them sorted alphabetically:
% sort supplies
furniture chairs 40
furniture couches 10
furniture tables 7
kitchen forks 20
kitchen knives 22
supplies paper 29
supplies pencils 148
supplies pens 236
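To make the left-to-right rule concrete, here is a small Python model of it. It is a sketch, not how sort is actually implemented. It assumes plain ASCII text and byte-order (C locale) comparison, and it treats every field as text. The data is the inventory file above, and `default_key` is a hypothetical name chosen for illustration.

```python
# A model of sort's default ordering as described above: compare the
# whitespace-separated fields from left to right (ASCII, C-locale order).
inventory = [
    "supplies pencils 148",
    "furniture chairs 40",
    "kitchen knives 22",
    "kitchen forks 20",
    "supplies pens 236",
    "furniture couches 10",
    "furniture tables 7",
    "supplies paper 29",
]

def default_key(line):
    # Field 0 decides first; later fields only break ties.
    # Python compares lists element by element, left to right.
    return line.split()

for line in sorted(inventory, key=default_key):
    print(line)
```

Running it prints the same eight lines, in the same order, as the sort transcript above.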
Of course, you don't always want to sort from left to right. The command-line option +n tells sort to start sorting on field n; -n tells sort to stop sorting when it reaches field n, so field n itself is not part of the key. Remember (again) that sort counts fields from left to right, starting with 0.[1] Here's an example. We want to sort a list of telephone numbers of authors, presidents, and blues singers:
Robert M Johnson 344-0909
Lyndon B Johnson 933-1423
Samuel H Johnson 754-2542
Michael K Loukides 112-2535
Jerry O Peek 267-2345
Timothy F O'Reilly 443-2434
According to standard "telephone book rules," we want these names sorted by last name, first name, and middle initial. We don't want the phone number to play a part in the sorting. So we want to start sorting on field 2, stop sorting on field 3, continue sorting on field 0, sort on field 1, and (just to make sure) stop sorting on field 2 (the last name). We can code this as follows:
% sort +2 -3 +0 -2 phonelist
Lyndon B Johnson 933-1423
Robert M Johnson 344-0909
Samuel H Johnson 754-2542
Michael K Loukides 112-2535
Timothy F O'Reilly 443-2434
Jerry O Peek 267-2345
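Read this way, each +m -n pair names a half-open range of zero-based fields, much like the Python slice fields[m:n]: start at field m, stop just before field n. The following Python sketch models the command above under that reading. It assumes single blanks between fields and C-locale ordering; `key_range` and `phonebook_key` are hypothetical helpers written for illustration.

```python
# Model of: sort +2 -3 +0 -2 phonelist
# An old-style key "+m -n" covers zero-based fields m .. n-1,
# the same range as the Python slice fields[m:n].
phonelist = [
    "Robert M Johnson 344-0909",
    "Lyndon B Johnson 933-1423",
    "Samuel H Johnson 754-2542",
    "Michael K Loukides 112-2535",
    "Jerry O Peek 267-2345",
    "Timothy F O'Reilly 443-2434",
]

def key_range(line, start, stop=None):
    """Hypothetical helper: fields start..stop-1, or to the end of the line if stop is None."""
    return " ".join(line.split()[start:stop])

def phonebook_key(line):
    # Primary key +2 -3 (last name); secondary key +0 -2 (first name, initial).
    return (key_range(line, 2, 3), key_range(line, 0, 2))

for line in sorted(phonelist, key=phonebook_key):
    print(line)
```

On current systems the same keys are usually written with -k, which counts fields from 1 and names the last field to include: sort -k 3,3 -k 1,2 phonelist.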
A few notes:
We need the -3 option to prevent sort from sorting on the telephone number after sorting on the last name. Without -3, the "Robert Johnson" entry would appear before "Lyndon Johnson" because it has a lower phone number.
We don't need to state +1 explicitly. A key runs from its start field up to (but not including) its stop field, so +0 -2 already covers both field 0 and field 1.
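If two names are completely identical, we probably don't care what happens next. However, just to be sure that something unexpected doesn't take place, we end the option list with -2, which says, "After sorting on the middle initial, don't use any more fields for this key." (When every key ties, sort still falls back on comparing the entire lines, so two identical names end up ordered by their phone numbers, compared as text.)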
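The first note can be checked with a Python model of the keys. This is an illustration only: it assumes single blanks between fields, and `key_range` is a hypothetical helper. Without the stop field, the key that starts at field 2 runs to the end of the line and drags the phone number along with it:

```python
# Compare the last-name key with and without the -3 stop field.
robert = "Robert M Johnson 344-0909"
lyndon = "Lyndon B Johnson 933-1423"

def key_range(line, start, stop=None):
    # Hypothetical helper: zero-based fields start..stop-1, or to the end of the line.
    return " ".join(line.split()[start:stop])

# +2 -3: only the last name, so the two Johnsons tie and +0 -2 decides.
print(key_range(robert, 2, 3) == key_range(lyndon, 2, 3))   # True
# +2 alone: the phone number is part of the key, so Robert sorts first.
print(key_range(robert, 2) < key_range(lyndon, 2))          # True
```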
There are a couple of variations that are worth mentioning. You may never need them unless you're really serious about sorting data files, but it's good to keep them in the back of your mind. First, you can add any "collation" operations (discard blanks, numeric sort, etc.) to the end of a field specifier to describe how you want that field sorted. Using our previous example, let's say that if two names are identical, you want them sorted in numeric phone number order. The following command does the trick:
% sort +2 -3 +0 -2 +3n phonelist
The +3n option says "do a numeric sort on the fourth field" (field 3, counting from 0). If you're worried about initial blanks (perhaps some of the phone numbers have area codes), use +3nb.
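Here is a rough Python model of what the n modifier changes. It is only an approximation: `numeric_value` reads optional blanks, an optional minus sign, digits, and a decimal point from the start of the text and stops at the first other character, roughly what a numeric sort does (so 344-0909 compares as 344). The two Peek lines are made-up entries, chosen so that their numbers have different lengths and text order disagrees with numeric order; `key_range`, `numeric_value`, and `key` are hypothetical helpers.

```python
import re

# Hypothetical sample: two made-up entries for the same name whose numbers
# have different lengths, plus one line from phonelist.
lines = [
    "Jerry O Peek 10",
    "Jerry O Peek 9",
    "Lyndon B Johnson 933-1423",
]

def key_range(line, start, stop=None):
    """Hypothetical helper: zero-based fields start..stop-1 (to end of line if stop is None)."""
    return " ".join(line.split()[start:stop])

def numeric_value(text):
    """Rough model of the n modifier: the number at the start of the text.

    Reads optional blanks, an optional minus sign, digits, and a decimal
    point; the first other character ends the number. No number means 0.
    """
    match = re.match(r"[ \t]*(-?\d*(?:\.\d*)?)", text)
    try:
        return float(match.group(1))
    except ValueError:  # "", "-", or "." on its own
        return 0.0

def key(line):
    # +2 -3 +0 -2 +3n: last name, then first name and initial,
    # then field 3 onward, compared as a number.
    return (key_range(line, 2, 3), key_range(line, 0, 2), numeric_value(key_range(line, 3)))

for line in sorted(lines, key=key):
    print(line)
```

As text, "10" sorts before "9"; with the numeric key, the Peek entry ending in 9 comes first.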
Second, you can specify individual columns within any field for sorting, using the notation +n.c, where n is a field number and c is a character position within the field (also counted from 0). Likewise, the notation -n.c says "stop sorting at the character before character c." If you're counting characters, be sure to use the -b (ignore leading blanks) option. Otherwise, the blanks that separate a field from the one before it count as characters of that field, and it will be very difficult to figure out which character you're counting.
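The following Python sketch suggests why counting characters without -b is confusing. It models sort's default field splitting, in which each field keeps the blanks that precede it, and then builds a key that starts at a given character of a field, as +n.c does, with both numbers counted from 0. The helper names are hypothetical, and the model ignores stop positions.

```python
import re

def fields_with_blanks(line):
    # Default field splitting: each field is a run of non-blanks
    # together with the blanks (spaces or tabs) in front of it.
    return re.findall(r"[ \t]*[^ \t]+", line)

def key_from_char(line, field, char, ignore_blanks=False):
    """Hypothetical model of a +field.char key with no stop position.

    field and char are both counted from 0. ignore_blanks models -b by
    dropping the starting field's leading blanks before counting.
    """
    fields = fields_with_blanks(line)
    start = fields[field]
    if ignore_blanks:
        start = start.lstrip(" \t")
    # The key runs from that character through the last field of the line.
    return start[char:] + "".join(fields[field + 1:])

line = "Robert M Johnson 344-0909"
print(repr(key_from_char(line, 3, 4)))                      # '-0909'
print(repr(key_from_char(line, 3, 4, ignore_blanks=True)))  # '0909'
```

Without -b, character 4 of field 3 is the hyphen, because the leading blank is character 0. With -b, it is the first digit of 0909.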
— ML