Better Living with strace and ltrace

It is important to understand the difference between library functions and system calls. Library functions are higher level, run completely in user space, and provide a more convenient interface for the programmer to the functions that do the real work—system calls. System calls do work in kernel mode on the user's behalf and are provided by the kernel of the operating system itself. The library function printf() may look like a very general printing function, but all it really does is format the data you give it into strings and write the string data using the low-level system call write(), which then sends the data to a file associated with your terminal's standard output.
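To make the division of labor concrete, here is a minimal sketch of a printf()-like routine. The name tiny_printf and the fixed 256-byte buffer are assumptions made for illustration; the real printf() buffers its output through stdio and has no such length limit. The library part (formatting) happens in user space, and only the final write() is a system call.

```c
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not part of any library: format the arguments into
 * a fixed buffer in user space, then hand the bytes to write(). */
ssize_t tiny_printf(int fd, const char *fmt, ...)
{
    char buf[256];
    va_list ap;

    va_start(ap, fmt);
    int len = vsnprintf(buf, sizeof buf, fmt, ap);  /* library call: formatting only */
    va_end(ap);

    if (len < 0)
        return -1;                    /* formatting failed */
    if ((size_t)len >= sizeof buf)
        len = (int)sizeof buf - 1;    /* output was truncated to fit the buffer */

    /* System call: this is the part that strace would report. */
    return write(fd, buf, (size_t)len);
}
```

Calling tiny_printf(1, "hello %s\n", "world") from main() should produce a single write() of 12 bytes to file descriptor 1, much like the write() line in the strace output discussed in this section.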

The strace utility prints out each system call that your program makes, along with its arguments and return value. Would you like to see what system calls are made by printf()? It's easy! Write a "Hello, world!" program, but run it like this:

$ strace ./a.out

Aren't you impressed by how hard your computer works just to print something to the screen?

Each line of the strace output corresponds to one system call. Most of the strace output shows calls to mmap() and open() with filenames like ld.so and libc. This has to do with system-level things like mapping disk files into memory and loading shared libraries. You most likely don't care about all of that. For our purposes, there are exactly two lines of interest, toward the end of the output:

write(1, "hello world\n", 12) = 12
_exit(0) = ?

These lines illustrate the general format of strace output:

The name of the system call that is being made
The arguments to the system call between parentheses
The return value of the system call^[22] following the = symbol

That's all there is to it, but what a wealth of information! You also may see errors. On our system, for example, we get the following lines:

open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)   = 3

The first call to open() tries to open a file named /etc/ld.so.preload. The return value of the call to open() (which should be a non-negative file descriptor) is -1, indicating an error of some sort. strace helpfully tells us that the error that caused open() to fail is ENOENT: the file /etc/ld.so.preload doesn't exist.

strace tells us that the second call to open() returned the value 3. This is a valid file descriptor, so apparently the call to open the file /etc/ld.so.cache succeeded. Accordingly, there are no error codes on the second line of the strace output.
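The same information is available to the program itself. The sketch below shows roughly what a C program sees for the two cases: a return value of -1 with errno set to ENOENT, or a non-negative file descriptor. The function name try_open and the message format are assumptions made for illustration; the message only imitates strace's layout.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to open path read-only and report the result.
 * Returns 0 on success, or the errno value if open() failed. */
int try_open(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1) {
        int err = errno;   /* save errno before another call can change it */
        fprintf(stderr, "open(\"%s\", O_RDONLY) = -1 (%s)\n",
                path, strerror(err));
        return err;
    }

    /* A non-negative return value is a valid file descriptor. */
    fprintf(stderr, "open(\"%s\", O_RDONLY) = %d\n", path, fd);
    close(fd);
    return 0;
}
```

For a path that does not exist, try_open() returns ENOENT, which is the same error code strace prints next to the -1.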

By the way, don't worry about errors like this. What you see is related to dynamic library loading and is not really an error per se. The file ld.so.preload can be used to override the system's default shared libraries. Since we have no desire to fiddle with such things, the file simply doesn't exist on our system. As you gain experience with strace, you'll get better and better at filtering out this kind of "noise" and concentrating on the parts of the output you're really interested in.

strace has a few options that you'll need to use at some point or another, so we'll describe them briefly here. If you looked at the complete strace output of a "Hello, world!" program, you might have noticed that strace can be a bit … verbose. It's much more convenient to save all that output in a file than to try to look at it on the screen. One way, of course, is to redirect stderr, but you can also use the -o logfile switch to make strace write all its output to a logfile. Also, strace normally truncates strings to 32 characters, which can sometimes hide important information. To change the limit so that strings are truncated at N characters instead, use the -s N option. Lastly, if you're running strace on a program that forks child processes, you can capture strace output for the individual children to a file named LOG.xxx with the -o LOG -ff switch, where xxx is the child's process ID.
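If you want something small to try the -o LOG -ff switch on, a sketch like the following may do. The function name spawn_children and the message text are assumptions made for illustration. Each child makes one write() call and exits, so every LOG.xxx file should contain a short, recognizable trace.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork n children that each make one write() call,
 * then wait for all of them. Returns the number of children that exited
 * with status 0, or -1 if fork() fails. */
int spawn_children(int n)
{
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            return -1;                  /* fork failed; give up */
        if (pid == 0) {                 /* child process */
            char msg[64];
            int len = snprintf(msg, sizeof msg, "child %d, pid %ld\n",
                               i, (long)getpid());
            if (write(STDOUT_FILENO, msg, (size_t)len) != len)
                _exit(1);
            _exit(0);
        }
    }

    int ok = 0, status;
    while (wait(&status) > 0)           /* parent: reap every child */
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    return ok;
}
```

Call spawn_children(3) from main() and run the resulting program as strace -o LOG -ff ./a.out; the trace for each child should then land in its own LOG.xxx file, with xxx matching the process ID that the child printed.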

There's also a utility named ltrace which is like strace, but shows library calls rather than system calls. ltrace and strace have many options in common, so knowing how to use one of them will take you far toward learning the other.

The strace and ltrace utilities are very useful when you want to send bug reports and diagnostic information to maintainers of programs for which you don't have the source code, and even if you do have the source files, using these tools can sometimes be faster than digging through the code.

One of the authors first stumbled across the usefulness of these tools when trying to install and run a poorly documented proprietary application on his system. When launched, the application immediately returned to the shell, apparently without doing anything. He wanted to send the company something more informative than just the observation, "Your program immediately exits." Running strace on the application yielded a clue:

open(umovestr: Input/output error 0, O_RDONLY) = -1 EFAULT (Bad address)

and running ltrace yielded even more clues:

fopen(NULL, "r")                                 = 0

According to the output, this was the very first call to fopen(). The application presumably wanted to open some kind of configuration file, but fopen() was passed NULL. The application had some kind of internal fault handler that exited but produced no error message. The author was able to write a detailed bug report to the company, and as it turned out, the problem was that the application shipped with a faulty global configuration file that pointed to a non-existent local configuration file. A patch was issued the next day.
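We never saw the application's source code, so the following is only a guess at the kind of check that would have turned the silent exit into a useful message. The function name open_config and the wording of the messages are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open a configuration file for reading, but refuse a
 * NULL path up front and say why, instead of passing it on to fopen(). */
FILE *open_config(const char *path)
{
    if (path == NULL) {
        fprintf(stderr, "config: no configuration file path was set\n");
        errno = EINVAL;
        return NULL;
    }

    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        int err = errno;   /* save errno; fprintf() may overwrite it */
        fprintf(stderr, "config: cannot open %s: %s\n", path, strerror(err));
        errno = err;
    }
    return fp;
}
```

With a check like this, a missing or unset configuration path produces a one-line diagnostic on stderr rather than an unexplained return to the shell.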

Since then the authors have found strace and ltrace to be immensely useful for tracking down bugs and figuring out stubborn, mysterious program behavior that can cause lots of head scratching.

^[22] You may be surprised by the question mark return value of _exit(). All strace is saying here is that _exit() never returns to its caller, so there is no return value to show.