Especially if you use an ASCII-based terminal, files can have characters that your terminal can't display. Some characters will lock up your communications software or hardware, make your screen look strange, or cause other weird problems. So if you'd like to look at a file and you aren't sure what's in there, it's not a good idea to just cat the file!
Instead, try cat -v. It shows an ASCII ("printable") representation of unprintable and non-ASCII characters. In fact, although most manual pages don't explain how, you can read the output and see what's in the file. Another utility for displaying nonprintable files is od. I usually use its -c option when I need to look at a file character by character.
Let's look at a file that's almost guaranteed to be unprintable: a directory file. This example is on a standard V7 (Unix Version 7) filesystem. (Unfortunately, some Unix systems won't let you read a directory. If you want to follow along on one of those systems, try a compressed file (Section 15.6) or an executable program from /bin.) A directory usually has some long lines, so it's a good idea to pipe cat's output through fold:
% ls -fa
.       ..      comp
% cat -v . | fold -62
M-^?^N.^@^@^@^@^@^@^@^@^@^@^@^@^@>^G..^@^@^@^@^@^@^@^@^@^@^@^@
M-a
comp^@^@^@^@^@^@^@^@^@^@^@^@MassAveFood^@^@^@^@^@hist^@^@^
@^@^@^@^@^@^@^@
% od -c .
0000000 377 016   .  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000020   > 007   .   .  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040 341  \n   c   o   m   p  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000060  \0  \0   M   a   s   s   A   v   e   F   o   o   d  \0  \0  \0
0000100  \0  \0   h   i   s   t  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000120
Each entry in a V7-type directory is 16 bytes long (that's also 16 characters, in the ASCII system). The od -c command starts each line with the number of bytes, in octal, shown since the start of the file. The first line starts at byte 0. The second line starts at byte 20 octal (that's byte 16 in decimal, the way most people count). And so on. Enough about od for now, though. We'll come back to it in a minute. Time to dissect the cat -v output:
You've probably seen sequences like ^N and ^G. Those are control characters. Another character like this is ^@, the character NUL (ASCII 0). There are a lot of NULs in the directory; more about that later. A DEL character (ASCII 177 octal) is shown as ^?. Check an ASCII chart.
cat -v has its own symbol for characters outside the ASCII range with their high bits set, also called metacharacters. cat -v prints those as M- followed by another character. There are two of them in the cat -v output: M-^? and M-a.
To get a metacharacter, you add 200 octal. For example, let's look at M-a. The octal value of the letter a is 141. When cat -v prints M-a, it means the character you get by adding 141+200, or 341 octal.
You can decode the character that cat -v prints as M-^? in the same way. The ^? stands for the DEL character, which is octal 177. Add 200+177 to get 377 octal.
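The same arithmetic can be written down as a short Python sketch. This is only an illustration: the function name decode_cat_v is made up here, and the rule for control characters (subtract 100 octal from the letter after the ^) is standard ASCII practice rather than something spelled out in the text above.

```python
def decode_cat_v(token):
    """Return the byte value for one cat -v token such as 'a', '^G', '^?', 'M-a' or 'M-^?'."""
    value = 0
    if token.startswith("M-"):
        value += 0o200                      # metacharacter: high bit set
        token = token[2:]
    if len(token) == 2 and token[0] == "^":
        if token[1] == "?":
            value += 0o177                  # ^? is DEL
        else:
            value += ord(token[1]) - 0o100  # ^@ is 0, ^G is 7, ^N is 16 octal
    else:
        value += ord(token)                 # regular printable character
    return value

for tok in ("M-a", "M-^?", "^G", "^N", "^@"):
    print(tok, format(decode_cat_v(tok), "03o"))
```

Run as is, the loop prints 341 for M-a and 377 for M-^?, the same octal values that od -c shows for those two bytes.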
If a character isn't M-something or ^something, it's a regular printable character. The entries in the directory (., .., comp, MassAveFood, and hist) are all made of regular ASCII characters.
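Going the other way, here is a rough Python sketch of how a byte could be turned into cat -v notation. It is an approximation written for this discussion, not the source of any real cat; the name cat_v is hypothetical, and it assumes (as common implementations do) that plain TAB and newline bytes pass through unchanged.

```python
def cat_v(data):
    """Render a bytes object roughly the way cat -v would display it."""
    pieces = []
    for byte in data:
        if byte in (0o11, 0o12):            # TAB and newline are left alone
            pieces.append(chr(byte))
            continue
        text = ""
        if byte >= 0o200:                   # high bit set: M- prefix, then the low 7 bits
            text = "M-"
            byte -= 0o200
        if byte < 0o40:                     # control character: ^ plus the letter 100 octal higher
            text += "^" + chr(byte + 0o100)
        elif byte == 0o177:                 # DEL
            text += "^?"
        else:                               # regular printable character
            text += chr(byte)
        pieces.append(text)
    return "".join(pieces)

# The first 16-byte entry from the example directory: 377 016 . and 13 NULs
entry = bytes([0o377, 0o016]) + b"." + bytes(13)
print(cat_v(entry))
```

For that entry the sketch prints M-^?^N. followed by thirteen ^@ pairs, which matches the start of the cat -v output above.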
If you're wondering where the entries MassAveFood and hist are in the ls listing, the answer is that they aren't. Those entries have been deleted from the directory. Unix puts two NUL (ASCII 0, or ^@) bytes in front of the names of deleted V7 directory entries.
cat has two options, -t and -e, for displaying whitespace in a line. The -v option doesn't convert TAB and trailing-space characters to a visible form without those options. See Section 12.5.
Next, od -c. It's easier to explain than cat -v:
od -c shows some characters starting with a backslash (\). It uses the standard Unix and C abbreviations for control characters where it can. For instance, \n stands for a newline character, \t for a tab, etc. There's a newline at the start of the comp entry; see it in the od -c output? That explains why the cat -v output was broken onto a new line at that place: cat -v doesn't translate newlines when it finds them.
The \0 is a NUL character (ASCII 0). It's used to pad the ends of entries in V7 directories when a name isn't the full 14 characters long.
od -c shows the octal value of other characters as three digits. For instance, the 007 means "the character 7 octal." cat -v shows this as ^G (CTRL-g).
Metacharacters, the ones with octal values 200 and higher, are shown as M-something by cat -v. In od -c, you'll see their octal values, such as 341.
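The od -c rules above fit in a few lines of Python. Treat this as a sketch under assumptions: the names ESCAPES, od_c_cell, and od_c_line are invented for illustration, and the abbreviation table is limited to the ones consistent with the example output (where byte 7 appears as 007); some od versions abbreviate more characters than this.

```python
# Backslash abbreviations used in this sketch; other bytes fall through to the rules below.
ESCAPES = {0o0: r"\0", 0o10: r"\b", 0o11: r"\t", 0o12: r"\n", 0o14: r"\f", 0o15: r"\r"}

def od_c_cell(byte):
    """Return the od -c style text for a single byte."""
    if byte in ESCAPES:
        return ESCAPES[byte]
    if 0o40 <= byte < 0o177:                # regular printable character, shown as is
        return chr(byte)
    return format(byte, "03o")              # everything else: three octal digits

def od_c_line(offset, chunk):
    """Format one output line: a 7-digit octal offset, then one 4-wide cell per byte."""
    cells = "".join(od_c_cell(b).rjust(4) for b in chunk)
    return format(offset, "07o") + cells

# The comp entry starts at byte 32 decimal, which is 40 octal.
print(od_c_line(32, bytes([0o341, 0o12]) + b"comp" + bytes(10)))
```

The printed line begins with the offset 0000040, then 341 and \n, then the letters of comp and ten \0 cells, like the third line of the od -c output.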
Each directory entry on a Unix Version 7 filesystem starts with a two-byte "pointer" to its location in the disk's inode table. When you type a filename, Unix uses this pointer to find the actual file information on the disk. The entry for this directory (named .) is 377 016. Its parent (named ..) is at > 007. And comp's entry is 341 \n. Find those in the cat -v output, if you want, and compare the two outputs.
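To tie the pieces together, here is a hedged Python sketch that splits the 80 bytes from the example into 16-byte entries. Several things in it are assumptions rather than facts from the text: the name parse_v7_dir is hypothetical, the two-byte pointer is read as a little-endian unsigned number (the byte order of the PDP-11 machines V7 usually ran on), and a pointer of zero is taken to mean a deleted entry, which is how the two leading NULs described earlier would read.

```python
import struct

ENTRY_SIZE = 16  # 2-byte inode pointer + 14-byte name, NUL-padded

def parse_v7_dir(data):
    """Split raw directory bytes into (inode_number, name) pairs."""
    entries = []
    for start in range(0, len(data) - ENTRY_SIZE + 1, ENTRY_SIZE):
        # "<H" = little-endian unsigned 16-bit value (assumed byte order); "14s" = the name field
        inode, raw_name = struct.unpack_from("<H14s", data, start)
        name = raw_name.rstrip(b"\0").decode("ascii")
        entries.append((inode, name))
    return entries

# The same bytes shown by od -c in the example above ('>' is 076 octal)
sample = (
    bytes([0o377, 0o016]) + b".".ljust(14, b"\0")
    + bytes([0o076, 0o007]) + b"..".ljust(14, b"\0")
    + bytes([0o341, 0o012]) + b"comp".ljust(14, b"\0")
    + bytes(2) + b"MassAveFood".ljust(14, b"\0")
    + bytes(2) + b"hist".ljust(14, b"\0")
)

for inode, name in parse_v7_dir(sample):
    status = "deleted" if inode == 0 else f"inode {inode}"
    print(f"{name:<14} {status}")
```

Under those assumptions, ., .., and comp come out with nonzero inode numbers, while MassAveFood and hist are reported as deleted, which is why ls doesn't list them.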
Like cat -v, od -c shows regular printable characters as is.
The strings (Section 13.15) program finds printable strings of characters (such as filenames) inside mostly nonprintable files (such as executable binaries).
— JP