Here is a grab bag of useful, if not exactly interesting, sort features. The utility will actually do quite a bit, if you let it.
sort -u sorts the file and eliminates duplicate lines. It's more powerful than uniq (Section 21.20) because:
It sorts the file for you; uniq assumes that the file is already sorted and won't do you any good if it isn't.
It is much more flexible. sort -u considers lines "unique" if the sort fields (Section 22.2) you've selected don't match. So the lines don't even have to be (strictly speaking) unique; differences outside of the sort fields are ignored.
In return, there are a few things that uniq does that sort won't do — such as print only those lines that aren't repeated, or count the number of times each line is repeated. But on the whole, I find sort -u more useful.
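To make the trade-off concrete, here is a minimal sketch. The sample data is made up for illustration, the -k1,1 key syntax is the POSIX way of selecting the first field (an assumption; older systems may want the +pos form), and LC_ALL=C is set only so the ordering is predictable.

```shell
#!/bin/sh
# Assumption: force the C locale so the ordering is plain byte order.
LC_ALL=C; export LC_ALL

# Hypothetical sample data: "user host" pairs, one of them repeated.
data='alice hostA
bob hostB
alice hostC
bob hostB'

# Whole-line uniqueness: only the exact duplicate "bob hostB" is dropped.
whole=$(printf '%s\n' "$data" | sort -u)

# Uniqueness on field 1 only: one line survives per user, because
# differences outside the sort field are ignored.
by_user=$(printf '%s\n' "$data" | sort -u -k1,1)

# Something uniq does that sort does not: count the repetitions.
# The input has to be sorted first.
counts=$(printf '%s\n' "$data" | sort | uniq -c)

printf '%s\n' "$whole" "--" "$by_user" "--" "$counts"
```

Which of the two alice lines survives the keyed sort -u may depend on your version of sort, so don't rely on it.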
Here's one idea for using sort -u. When I was writing a manual, I often needed to make tables of error messages. The easiest way to do this was to grep the source code for printf statements, write some Emacs (Section 19.1) macros to eliminate junk that I didn't care about, use sort -u to put the messages in order and get rid of duplicates, and write some more Emacs macros to format the error messages into a table. All I had to do then was write the descriptions.
One important option (that I've mentioned a number of times) is -b; this tells sort to ignore extra whitespace at the beginning of each field. This is absolutely essential; otherwise, your sorts will have rather strange results. In my opinion, -b should be the default. But they didn't ask me.
Another thing to remember about -b: it works only if you explicitly specify which fields you want to sort. By itself, sort -b is the same as sort: whitespace characters are counted. I call this a bug, don't you?
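A small sketch of the "strange results" you get without -b. The two-line input is hypothetical, the -k key syntax is the POSIX form (an assumption about your system), and LC_ALL=C is set so a space reliably sorts before any letter.

```shell
#!/bin/sh
# Assumption: force the C locale so a space sorts before any letter.
LC_ALL=C; export LC_ALL

# Hypothetical input: the second column is padded with a varying
# number of spaces.
data='a   zebra
b apple'

# Without -b, the leading blanks are part of field 2, so the heavily
# padded "zebra" line sorts ahead of "apple".
plain=$(printf '%s\n' "$data" | sort -k 2 | head -n 1)

# With the b modifier on the key, the blanks are skipped and the
# comparison is "zebra" against "apple".
with_b=$(printf '%s\n' "$data" | sort -k 2b | head -n 1)

echo "first line without -b: $plain"
echo "first line with -b:    $with_b"
```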
If you don't care about the difference between uppercase and lowercase letters, invoke sort with the -f (case-fold) option. This folds lowercase letters into uppercase. In other words, it treats all letters as uppercase.
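Here is a quick illustration of case folding, using a made-up three-word list. I set LC_ALL=C (an assumption, so that the unfolded sort puts all uppercase letters ahead of lowercase ones); in other locales the plain sort may already ignore case to some degree.

```shell
#!/bin/sh
# Assumption: C locale, where uppercase letters sort before lowercase.
LC_ALL=C; export LC_ALL

# Hypothetical word list with mixed case.
data='apple
Banana
cherry'

# Plain sort: "Banana" jumps to the front because "B" precedes "a".
plain=$(printf '%s\n' "$data" | sort)

# Folded sort: case is ignored, so the list is in alphabetical order.
folded=$(printf '%s\n' "$data" | sort -f)

printf '%s\n' "$plain" "--" "$folded"
```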
The -d option tells sort to ignore all characters except for letters, digits, and whitespace. In particular, sort -d ignores punctuation.
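The effect is easiest to see on lines that begin with punctuation. This is only a sketch with invented data, and LC_ALL=C is again an assumption made to keep the plain ordering predictable.

```shell
#!/bin/sh
# Assumption: C locale, where "-" and "." sort before letters.
LC_ALL=C; export LC_ALL

# Hypothetical entries, two of them starting with punctuation.
data='.gamma
-beta
alpha'

# Plain sort: the punctuation counts, so "-beta" and ".gamma" come first.
plain=$(printf '%s\n' "$data" | sort)

# Dictionary sort: punctuation is skipped, leaving alpha, beta, gamma.
dict=$(printf '%s\n' "$data" | sort -d)

printf '%s\n' "$plain" "--" "$dict"
```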
The -M option tells sort to treat the first three nonblank characters of a field as a three-letter month abbreviation and to sort accordingly. That is, JAN comes before FEB, which comes before MAR. This option isn't available on all versions of Unix.
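If your sort does have -M, a sketch like the following shows the difference. The data is invented, and the script assumes the C locale so that the month names are the English abbreviations.

```shell
#!/bin/sh
# Assumption: C locale, so month names are JAN, FEB, MAR, and so on.
LC_ALL=C; export LC_ALL

# Hypothetical lines, each starting with a month abbreviation.
data='MAR third
JAN first
FEB second'

# Plain sort: alphabetical, so FEB comes before JAN.
plain=$(printf '%s\n' "$data" | sort)

# Month sort: calendar order. Requires a sort that supports -M.
by_month=$(printf '%s\n' "$data" | sort -M)

printf '%s\n' "$plain" "--" "$by_month"
```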
The -r option tells sort to "reverse" the order of the sort; i.e., Z comes before A, 9 comes before 1, and so on. You'll find that this option is really useful. For example, imagine you have a program running in the background that records the number of free blocks in the filesystem at midnight each night. Your log file might look like this:
    Jan 1 2001: 108 free blocks
    Jan 2 2001: 308 free blocks
    Jan 3 2001: 1232 free blocks
    Jan 4 2001: 76 free blocks
    ...
The script below finds the smallest and largest number of free blocks in your log file (head is covered in Section 12.12):

    #!/bin/sh
    echo "Minimum free blocks"
    sort -t: +1nb logfile | head -1
    echo "Maximum free blocks"
    sort -t: +1nbr logfile | head -1
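Some newer versions of sort may not accept the old +1 way of naming a field. As a hedged sketch, here is the same idea written with the POSIX -k option, where fields are counted from 1, so +1 becomes -k 2. The log data is fed from a shell variable (my own shortcut so the example is self-contained) rather than from a file named logfile.

```shell
#!/bin/sh
# Assumption: C locale, to keep the behavior predictable.
LC_ALL=C; export LC_ALL

# The sample log lines from the text, held in a variable instead of a file.
log='Jan 1 2001: 108 free blocks
Jan 2 2001: 308 free blocks
Jan 3 2001: 1232 free blocks
Jan 4 2001: 76 free blocks'

# -t: splits on colons; -k 2nb sorts numerically on the second field,
# ignoring leading blanks. The smallest number ends up on the first line.
min=$(printf '%s\n' "$log" | sort -t: -k 2nb | head -n 1)

# Adding r reverses the order, so the largest number comes first.
max=$(printf '%s\n' "$log" | sort -t: -k 2nbr | head -n 1)

echo "Minimum free blocks"
echo "$min"
echo "Maximum free blocks"
echo "$max"
```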
It's not profound, but it's an example of what you can do.
— ML