The sort and uniq commands

The sort command reads its input and writes it out again with the lines sorted; by default, it compares whole lines alphabetically, starting from the first character. Like most Unix filter commands, it can take its input from a pipe, from files, or from data typed at the terminal. The first two forms are the most useful:

$ sort /etc/shells
$ sort ~/file1 ~/file2
$ printf '%s\n' 'Line 2' 'Line 1' | sort

sort can sort data according to a particular column of its input data. For example, to sort all the entries in /etc/passwd by home directory (column 6), we could write:

$ sort -t: -k6,6 /etc/passwd

The options used here are:

-t: Specifies the delimiter for the data that separates the columns, in this case a colon.
-k6,6: Specifies which column to sort by, counting from one; in this case, the sixth. The ,6 ends the sort key at the sixth column; without it, the key would run from the sixth column to the end of the line.
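
To see what the ,6 in -k6,6 changes, here is a minimal sketch on made-up passwd-style records; the user names and paths are hypothetical. LC_ALL=C is set only to make the comparison order predictable across systems, which is an assumption of this example rather than something the commands above require.

```shell
# Hypothetical passwd-style records; amy and ben share a home directory
records='amy:x:1000:1000::/home/shared:/bin/zsh
ben:x:1001:1001::/home/shared:/bin/bash
cal:x:1002:1002::/home/cal:/bin/sh'

# Key is column 6 only; amy and ben tie on it, and sort falls back to
# comparing the whole line, so amy comes before ben
key_only=$(printf '%s\n' "$records" | LC_ALL=C sort -t: -k6,6 | cut -d: -f1)

# Key runs from column 6 to the end of the line, so the shell in
# column 7 breaks the tie and ben (/bin/bash) comes before amy (/bin/zsh)
key_to_end=$(printf '%s\n' "$records" | LC_ALL=C sort -t: -k6 | cut -d: -f1)

printf -- '-k6,6: %s\n' "$(printf '%s' "$key_only" | tr '\n' ' ')"
printf -- '-k6:   %s\n' "$(printf '%s' "$key_to_end" | tr '\n' ' ')"
```

Both commands put cal first; they differ only in how the two records with the same home directory are ordered.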

Some useful options to modify how each column is sorted:

-n: Performs a numeric rather than alphabetical sort, so that "10" sorts after "9," not before. Forgetting this is a common source of sorting errors!
-r: Reverses the sort order; shows the results that would be last first.
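
The difference between the alphabetical and numeric orderings is easiest to see on a few numbers. The following is a small sketch with made-up input; LC_ALL=C is an assumption added here to keep the alphabetical ordering the same on every system.

```shell
# Made-up sample data: three numbers, one per line
numbers=$(printf '%s\n' 10 9 100)

# Default sort compares character by character, so "10" and "100"
# both come before "9"
alpha=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# -n compares the values as numbers
numeric=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n)

# -nr is the same numeric comparison, largest first
reversed=$(printf '%s\n' "$numbers" | LC_ALL=C sort -nr)

printf 'alphabetical: %s\n' "$(printf '%s' "$alpha" | tr '\n' ' ')"
printf 'numeric:      %s\n' "$(printf '%s' "$numeric" | tr '\n' ' ')"
printf 'reversed:     %s\n' "$(printf '%s' "$reversed" | tr '\n' ' ')"
```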

You can combine these options to specify very precisely how you'd like to sort:

$ sort -t: -k7,7 -k3,3nr -k4,4nr /etc/passwd

This sorts the users file by login shell first and, for users with the same login shell, sorts them numerically by user ID and then by group ID, both in descending order.
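
Because the contents of /etc/passwd vary from system to system, the effect of each key is easier to follow on a small fixed input. This sketch applies the same options to hypothetical records in which ann and eve deliberately share a user ID, so that the third key has something to decide; LC_ALL=C is an added assumption to keep the shell-name ordering predictable.

```shell
# Hypothetical passwd-style records; ann and eve share user ID 1000
records='ann:x:1000:100::/home/ann:/bin/bash
bob:x:1002:100::/home/bob:/bin/bash
dan:x:1001:200::/home/dan:/bin/sh
eve:x:1000:300::/home/eve:/bin/bash
fay:x:1003:200::/home/fay:/bin/bash'

# Key 1: login shell, alphabetical (/bin/bash before /bin/sh)
# Key 2: user ID, numeric, descending
# Key 3: group ID, numeric, descending (only consulted when keys 1 and 2 tie)
sorted=$(printf '%s\n' "$records" | LC_ALL=C sort -t: -k7,7 -k3,3nr -k4,4nr)

# Show just the name, user ID, group ID, and shell of each record
printf '%s\n' "$sorted" | cut -d: -f1,3,4,7

# Names only, in sorted order
names=$(printf '%s\n' "$sorted" | cut -d: -f1)
```

The /bin/bash users come first, ordered fay, bob, eve, ann; eve precedes ann because the two tie on user ID and eve has the higher group ID. dan, the only /bin/sh user, comes last.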

You can also use the -c option simply to check whether the data is already sorted; sort exits with a non-zero status if it is not. Some Unix filter tools, such as uniq and comm, require sorted data to work correctly: uniq only compares adjacent lines, and comm may print a warning if the data arrives in an order it didn't expect. In practice, it's usually easier just to sort the data on the fly and then pass it into these tools.
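
As a rough illustration of both points, the sketch below checks some made-up unsorted input with sort -c, and then shows how uniq behaves with and without a sort in front of it. The variable names are only for this example, and LC_ALL=C is an added assumption for predictable ordering.

```shell
# Made-up input with a repeated line that is not adjacent to its twin
unsorted=$(printf '%s\n' pear apple pear)

# sort -c exits non-zero when the input is out of order; any diagnostic
# it prints goes to standard error, which is discarded here
if printf '%s\n' "$unsorted" | LC_ALL=C sort -c 2>/dev/null; then
    check=sorted
else
    check=unsorted
fi

# uniq only merges adjacent duplicates, so both "pear" lines survive
raw_count=$(( $(printf '%s\n' "$unsorted" | uniq | wc -l) ))

# Sorting on the fly makes the duplicates adjacent, so uniq merges them
sorted_count=$(( $(printf '%s\n' "$unsorted" | LC_ALL=C sort | uniq | wc -l) ))

printf 'sort -c says: %s\n' "$check"
printf 'lines from uniq alone: %d\n' "$raw_count"
printf 'lines from sort | uniq: %d\n' "$sorted_count"
```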

Another useful option to note here is the -u option, which filters the data uniquely, discarding any line whose key it has already seen. This surprises some shell script programmers who are accustomed to using uniq to do this; sort -u is more flexible, because you can apply it to individual columns:

$ cat lyrics
02 the
03 air
02 later
02 often
01 in
04 tonight
04 for
00 something
$ sort -k1,1n -u lyrics
00 something
01 in
02 the
03 air
04 tonight
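
To compare the two approaches side by side, this sketch runs the same lyrics data through sort -u and through sort | uniq. It checks only which key values survive, not which of the lines sharing a key is kept, since that choice may depend on the sort implementation. LC_ALL=C is an added assumption for predictable ordering.

```shell
# The lyrics data from the example above
lyrics='02 the
03 air
02 later
02 often
01 in
04 tonight
04 for
00 something'

# On whole lines, sort -u and sort | uniq give the same result;
# every line here is distinct, so nothing is dropped
whole_u=$(printf '%s\n' "$lyrics" | LC_ALL=C sort -u)
whole_uniq=$(printf '%s\n' "$lyrics" | LC_ALL=C sort | uniq)

# With a key, sort -u keeps one line per distinct value of column 1
keys=$(printf '%s\n' "$lyrics" | LC_ALL=C sort -k1,1n -u | cut -d' ' -f1)

printf 'whole-line results match: '
[ "$whole_u" = "$whole_uniq" ] && echo yes || echo no
printf 'keys kept by -k1,1n -u: %s\n' "$(printf '%s' "$keys" | tr '\n' ' ')"
```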

Despite the availability of the -u flag for sort, the uniq tool does still have some uses; perhaps the most useful one is with the -c flag, which allows us to count occurrences of each line of sorted data:

$ cat ips
192.2.0.1
192.2.0.1
192.2.0.25
192.2.0.1
192.2.0.24
192.2.0.1
192.2.0.25
$ sort ips | uniq -c
  4 192.2.0.1
  1 192.2.0.24
  2 192.2.0.25

The preceding output shows us that the IP address 192.2.0.1 occurred in the data four times, 192.2.0.25 twice, and 192.2.0.24 only once. If you looked at the output and thought, "we could sort by the first column," you're thinking like a shell programmer:

$ sort ips | uniq -c | sort -k1,1nr
  4 192.2.0.1
  2 192.2.0.25
  1 192.2.0.24

In this example, -k1,1 is optional. Can you see why?