Threaded programming has become quite popular. For Unix, the most widespread threads package is the POSIX standard Pthreads, so we will use it for our example in this section. The principles are similar for other thread packages.
Modern operating systems use timesharing to manage multiple running programs in such a way that they appear to the user to execute simultaneously. Of course, if the machine has more than one CPU, more than one program actually can run simultaneously, but for simplicity we will assume just one processor, in which case the simultaneity is only apparent.
Each instance of a running program is represented by the OS as a process (in Unix terminology) or a task (in Windows). Thus, multiple invocations of a single program that execute at the same time (e.g., simultaneous sessions of the vi text editor) are distinct processes. Processes have to "take turns" on a machine with one CPU. For concreteness, let's assume that the "turns," called timeslices, are of length 30 milliseconds.
After a process has run for 30 milliseconds, a hardware timer emits an interrupt, which causes the OS to run. We say that the process has been preempted. The OS saves the current state of the interrupted process so it can be resumed later, then selects the next process to give a timeslice to. This is known as a context switch, because the CPU's execution environment has switched from one process to another. This cycle repeats indefinitely.
A turn may end early. For example, when a process needs to perform input/output, it ultimately calls a function in the OS that carries out low-level hardware operations; for instance, a call to the C library function scanf()
results in a call to the Unix OS read()
system call, which interfaces with the keyboard driver. In this manner the process relinquishes its turn to the OS, and the turn ends early.
One implication of this is that scheduling of timeslices for a given process is rather random. The time it takes for the user to think and then hit a key is random, so the time its next timeslice starts is unpredictable. Moreover, if you are debugging a threaded program, you do not know the order in which the threads will be scheduled; this may make debugging more difficult.
Here is a bit more detail: The OS maintains a process table that lists information about all current processes. Roughly speaking, each process is marked in the table as being in either the Run state or the Sleep state. Let's consider an example in which a running program reaches a point at which it needs to read input from the keyboard. As just noted, this will end the process's turn. Because the process is now waiting for the I/O to complete, the OS marks it as being in the Sleep state, making it ineligible for timeslices. Thus, being in Sleep state means that the process is blocked, waiting for some event to occur. When this event finally occurs later on, the OS will then change its state in the process table back to Run.
Non-I/O events can trigger a transition to Sleep state as well. For instance, if a parent process creates a child process and calls wait()
, the parent will block until the child finishes its work and terminates. Again, exactly when this happens is usually unpredictable.
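For concreteness, here is a minimal standalone sketch (not part of the sieve example discussed below) of a parent process that blocks in wait() until its child terminates:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
   pid_t pid = fork();        // create a child process
   if (pid == 0) {            // child: pretend to do some work, then exit
      sleep(2);
      exit(0);
   }
   wait(NULL);                // parent: placed in Sleep state until the child terminates
   printf("child %d has finished\n", (int) pid);
   return 0;
}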
Furthermore, being in the Run state does not mean that the process is actually executing on the CPU; rather, it merely means that it is ready to run—that is, eligible for a processor timeslice. Upon a context switch, the OS chooses the process that is next given a turn on the CPU from among those currently in the Run state, according to the process table. The scheduling procedure used by the OS to select the new context guarantees that any given process will keep getting timeslices, and so eventually finish, but there is no promise of which timeslices it will receive. Thus, exactly when a sleeping process actually "awakens" once the event that it awaits has occurred is random, as is the exact rate of the process's progress toward completion.
A thread is much like a process, except that it is designed to occupy less memory and to take less time to create and switch between than processes do. Indeed, threads are sometimes called "lightweight" processes and, depending on the thread system and run-time environment, they may even be implemented as operating system processes. Like programs that spawn processes to get work done, a multithreaded application will generally execute a main()
procedure that creates one or more child threads. The parent, main()
, is also a thread.
A major difference between processes and threads is that although each thread has its own local variables, just as is the case for a process, the global variables of the parent program in a threaded environment are shared by all threads and serve as the main method of communication between the threads. (It is possible to share globals among Unix processes, but inconvenient to do so.)
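As a minimal illustration (a standalone sketch, separate from the sieve program presented below), both threads created here read the same global variable n, while each has its own private copies of the local variables i and sum. Writes to shared globals require the synchronization discussed later in this section.

#include <pthread.h>
#include <stdio.h>

int n = 10;                        // global: every thread sees this same variable

void *sum_to_n(void *arg)
{
   int i, sum = 0;                 // locals: each thread gets its own i and sum
   for (i = 1; i <= n; i++)        // both threads read the shared global n
      sum += i;
   printf("sum computed by this thread: %d\n", sum);
   return NULL;
}

int main(void)
{
   pthread_t t1, t2;
   pthread_create(&t1, NULL, sum_to_n, NULL);
   pthread_create(&t2, NULL, sum_to_n, NULL);
   pthread_join(t1, NULL);
   pthread_join(t2, NULL);
   return 0;
}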
On a Linux system, you can view all the processes and threads currently on the system by running the command ps axH
.
There are nonpreemptive thread systems, but Pthreads uses a preemptive thread management policy, and a thread in a program can be interrupted at any time by another thread. Thus, the element of randomness described above for processes in a timeshared system also arises in the behavior of a threaded program. As a result, some bugs in applications developed using Pthreads are not readily reproducible.
We'll keep things simple and use the following code for finding prime numbers as an example. The program uses the classic Sieve of Eratosthenes. To find all the primes from 2 through n, we first list all the numbers, then cross out all the multiples of 2, then all the multiples of 3, and so on. Whatever numbers remain at the end are prime numbers.
 1   // finds the primes between 2 and n; uses the Sieve of Eratosthenes,
 2   // deleting all multiples of 2, all multiples of 3, all multiples of 5,
 3   // etc.; not efficient, e.g. each thread should do deleting for a whole
 4   // block of values of base before going to nextbase for more
 5
 6   // usage:  sieve n nthreads
 7   // where nthreads is the number of worker threads
 8
 9   #include <stdio.h>
10   #include <math.h>
11   #include <pthread.h>
12
13   #define MAX_N 100000000
14   #define MAX_THREADS 100
15
16   // shared variables
17   int nthreads,  // number of threads (not counting main())
18       n,  // upper bound of range in which to find primes
19       prime[MAX_N+1],  // in the end, prime[i] = 1 if i prime, else 0
20       nextbase;  // next sieve multiplier to be used
21
22   int work[MAX_THREADS];  // to measure how much work each thread does,
23                           // in terms of number of sieve multipliers checked
24
25   // lock index for the shared variable nextbase
26   pthread_mutex_t nextbaselock = PTHREAD_MUTEX_INITIALIZER;
27
28   // ID structs for the threads
29   pthread_t id[MAX_THREADS];
30
31   // "crosses out" all multiples of k, from k*k on
32   void crossout(int k)
33   {  int i;
34
35      for (i = k; i*k <= n; i++)  {
36         prime[i*k] = 0;
37      }
38   }
39
40   // worker thread routine
41   void *worker(int tn)  // tn is the thread number (0,1,...)
42   {  int lim,base;
43
44      // no need to check multipliers bigger than sqrt(n)
45      lim = sqrt(n);
46
47      do  {
48         // get next sieve multiplier, avoiding duplication across threads
49         pthread_mutex_lock(&nextbaselock);
50         base = nextbase += 2;
51         pthread_mutex_unlock(&nextbaselock);
52         if (base <= lim)  {
53            work[tn]++;  // log work done by this thread
54            // don't bother with crossing out if base is known to be
55            // composite
56            if (prime[base])
57               crossout(base);
58         }
59         else return;
60      }  while (1);
61   }
62
63   main(int argc, char **argv)
64   {  int nprimes,  // number of primes found
65          totwork,  // number of base values checked
66          i;
67      void *p;
68
69      n = atoi(argv[1]);
70      nthreads = atoi(argv[2]);
71      for (i = 2; i <= n; i++)
72         prime[i] = 1;
73      crossout(2);
74      nextbase = 1;
75      // get threads started
76      for (i = 0; i < nthreads; i++)  {
77         pthread_create(&id[i],NULL,(void *) worker,(void *) i);
78      }
79
80      // wait for all done
81      totwork = 0;
82      for (i = 0; i < nthreads; i++)  {
83         pthread_join(id[i],&p);
84         printf("%d values of base done\n",work[i]);
85         totwork += work[i];
86      }
87      printf("%d total values of base done\n",totwork);
88
89      // report results
90      nprimes = 0;
91      for (i = 2; i <= n; i++)
92         if (prime[i]) nprimes++;
93      printf("the number of primes found was %d\n",nprimes);
94
95   }
There are two command-line arguments in this program, the upper bound n
of the range to be checked for primes, and nthreads
, the number of worker threads we wish to create.
Here main()
creates the worker threads, each of which is an invocation of the function worker()
. The workers share three data items: the upper bound variable, n; the variable specifying the next number whose multiples are to be eliminated from the range 2..n, nextbase
; and the array prime[]
that records, for each number in the range 2..n, whether or not it has been eliminated. Each invocation repeatedly fetches a yet-to-be-processed elimination multiplicand, base
, and then eliminates all multiples of base
from the range 2..n. After spawning the workers, main()
uses pthread_join()
to wait for all these threads to finish their work before resuming itself, at which point it counts the primes that are left and issues its report. The report includes not only the prime count, but also information on how much work each worker thread did. This assessment is useful for load balancing and performance optimization purposes on a multiprocessor system.
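Incidentally, the listing passes the thread number to worker() by casting the function and argument pointers in the pthread_create() call. That works on the platform used in this chapter but is not strictly portable. A standalone sketch of the standard idiom, in which the thread routine has the signature void *(*)(void *) and the argument is passed through a pointer, might look like the following (the names NTHREADS and tn_arg are illustrative, not from the sieve program):

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 2

int tn_arg[NTHREADS];            // one argument slot per thread

void *worker(void *arg)          // the standard thread-routine signature
{
   int tn = *(int *) arg;        // recover this thread's number
   printf("worker %d running\n", tn);
   return NULL;
}

int main(void)
{
   pthread_t id[NTHREADS];
   int i;
   for (i = 0; i < NTHREADS; i++) {
      tn_arg[i] = i;
      pthread_create(&id[i], NULL, worker, &tn_arg[i]);
   }
   for (i = 0; i < NTHREADS; i++)
      pthread_join(id[i], NULL);
   return 0;
}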
Each instance of worker()
fetches the next value of base
by executing the following code (lines 49–51):
pthread_mutex_lock(&nextbaselock);
base = nextbase += 2;
pthread_mutex_unlock(&nextbaselock);
Here, the global variable nextbase
is updated and used to initialize the value of the worker()
instance's local variable base
; the worker then crosses out multiples of base
in the array prime[]
. (Note that we started by eliminating all multiples of 2 at the beginning of main()
, and thereafter only need to consider odd values for base
.)
Once the worker knows the value of base
to use, it can safely cross out the multiples of base
from the shared array prime[]
, because no other worker will use that value of base
. However, we have to place guard statements around the update operation to the shared variable nextbase
that base
depends upon (line 50). Recall that any worker thread can be preempted, at an unpredictable time, by another worker thread, which will be at an unpredictable place in the code for worker()
. In particular, it might just happen that the current worker is interrupted in the midst of the statement
base = nextbase += 2;
and the next timeslice is given to another thread that is also executing the same statement. In this case, there are two workers trying to modify the shared variable nextbase
at once, which can lead to insidious and hard-to-reproduce bugs.
Bracketing the code that manipulates the shared variable—known as a critical section—with the guard statements prevents this from happening. The calls to pthread_mutex_lock()
and pthread_mutex_unlock()
ensure that at most one thread at a time executes the enclosed program fragment. A thread is allowed to enter the critical section only when no other thread holds the lock; any thread that calls pthread_mutex_lock() while the lock is held is blocked there until the lock is released by pthread_mutex_unlock(). (The lock variable nextbaselock
is used internally by the thread system to ensure this "mutual exclusion.")
Unfortunately, it's all too easy to fail to recognize and/or properly protect critical sections in threaded code. Let's see how GDB can be used to debug this sort of error in a Pthreads program. Suppose we had forgotten to include the unlock statement,
pthread_mutex_unlock(&nextbaselock);
This of course causes the program to hang once the critical section is first entered by a worker thread, as the other worker threads will wait forever for the lock to be relinquished. But let's pretend you don't already know this. How do you track down the culprit using GDB?
Compile the program, making sure to include the flags -lpthread -lm in order to link in the Pthreads and math libraries (the latter is needed for the call to sqrt()).
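For example, assuming the source file is named Primes.c (the name that appears in the GDB backtraces below) and that you want an executable called primes, a suitable command might be the following; the -g flag gives GDB the symbol information it needs:

gcc -g -o primes Primes.c -lpthread -lm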
Then run the code in GDB, with n = 100 and nthreads = 2:
(gdb) r 100 2
Starting program: /debug/primes 100 2
[New Thread 16384 (LWP 28653)]
[New Thread 32769 (LWP 28676)]
[New Thread 16386 (LWP 28677)]
[New Thread 32771 (LWP 28678)]
Each time a new thread is created, GDB announces it, as shown here. We'll look into which thread is which in a moment.
The program hangs, and you interrupt it by pressing CTRL-C. The GDB session now looks like this:
(gdb) r 100 2
Starting program: /debug/primes 100 2
[New Thread 16384 (LWP 28653)]
[New Thread 32769 (LWP 28676)]
[New Thread 16386 (LWP 28677)]
[New Thread 32771 (LWP 28678)]

Program received signal SIGINT, Interrupt.
[Switching to Thread 32771 (LWP 28678)]
0x4005ba35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
At a point like this it's crucial to know what each thread is doing, which you can determine via GDB's info threads
command:
(gdb) info threads
* 4 Thread 32771 (LWP 28678)  0x4005ba35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
  3 Thread 16386 (LWP 28677)  0x4005ba35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
  2 Thread 32769 (LWP 28676)  0x420db1a7 in poll () from /lib/i686/libc.so.6
  1 Thread 16384 (LWP 28653)  0x4005ba35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
The asterisk means that you are currently in thread 4. Let's see what's going on with that thread:
(gdb) bt
#0  0x4005ba35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x4005adb8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x4005d190 in __pthread_alt_lock () from /lib/i686/libpthread.so.0
#3  0x40059d77 in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4  0x0804855f in worker (tn=1) at Primes.c:49
#5  0x40059881 in pthread_start_thread () from /lib/i686/libpthread.so.0
#6  0x40059985 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
(This works under the LinuxThreads implementation of Pthreads but may not be possible on some other platforms.)
Aha—you see in frames 3 and 4 that this thread is on line 49 of the source code and is trying to acquire the lock and enter the critical section:
pthread_mutex_lock(&nextbaselock);
Note also from frame 0 above that the thread is apparently suspended pending the lock's being relinquished by another thread. It will not get any timeslices until this happens and the thread manager arranges for it to acquire the lock.
What are the other threads doing? You can inspect any thread's stack by switching to that thread and then issuing the bt
command:
(gdb) thread 3
[Switching to thread 3 (Thread 16386 (LWP 28677))]#0  0x4005ba35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
(gdb) bt
#0  0x4005ba35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x4005adb8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x4005d190 in __pthread_alt_lock () from /lib/i686/libpthread.so.0
#3  0x40059d77 in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4  0x0804855f in worker (tn=0) at Primes.c:49
#5  0x40059881 in pthread_start_thread () from /lib/i686/libpthread.so.0
#6  0x40059985 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
Recall that we created two worker threads. You saw above that thread 4 was one of them (frame 4 from its bt
output), and now you see from frame 4 of the output here that thread 3 is the other one. You also see that thread 3 is trying to acquire the lock as well (frame 3).
There shouldn't be any other worker threads, but one of the fundamental principles of debugging is that nothing is taken on faith, and everything must be checked. We do this now by inspecting the status of the remaining threads. You'll find that the other two threads are nonworker threads, as follows:
(gdb) thread 2
[Switching to thread 2 (Thread 32769 (LWP 28676))]#0  0x420db1a7 in poll () from /lib/i686/libc.so.6
(gdb) bt
#0  0x420db1a7 in poll () from /lib/i686/libc.so.6
#1  0x400589de in __pthread_manager () from /lib/i686/libpthread.so.0
#2  0x4005962b in __pthread_manager_event () from /lib/i686/libpthread.so.0
So thread 2 is the thread manager, which is internal to the Pthreads package. It is certainly not a worker thread, partially confirming our expectation that there are only two worker threads. Checking thread 1,
(gdb) thread 1
[Switching to thread 1 (Thread 16384 (LWP 28653))]#0  0x4005ba35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
(gdb) bt
#0  0x4005ba35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x4005adb8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x40058551 in pthread_join () from /lib/i686/libpthread.so.0
#3  0x080486aa in main (argc=3, argv=0xbfffe7b4) at Primes.c:83
#4  0x420158f7 in __libc_start_main () from /lib/i686/libc.so.6
you find it executes main()
and can thus confirm that there are only two worker threads.
However, both of the workers are stalled, each waiting for the lock to be relinquished. No wonder the program is hanging! This is enough to pinpoint the location and nature of the bug, and we quickly realize that we forgot the call to the unlocking function.
What if you hadn't realized the necessity of guarding the update of the shared variable nextbase
in the first place? What would have happened in the previous example if you'd left out both the unlock and the lock operations?
A naive look at this question might lead to the guess that there would have been no harm in terms of correct operation of the program (i.e., getting an accurate count of the number of primes), albeit possibly with a slowdown due to duplicate work (i.e., using the same value of base
more than once). It would seem that some threads may duplicate the work of others, namely when two workers happen to grab the same value of nextbase
to initialize their local copies of base
. Some composite numbers might then end up being crossed out twice, but the results (i.e., the count of the number of primes) would still be correct.
But let's take a closer look. The statement
base = nextbase += 2;
compiles to at least two machine language instructions. For instance, using the GCC compiler on a Pentium machine running Linux, the C statement above translates to the following assembly language instructions (obtained by running GCC with the -S
option and then viewing the resulting .s file):
addl $2, nextbase
movl nextbase, %eax
movl %eax, -8(%ebp)
This code increments nextbase
by 2, then copies the value of nextbase
to the register EAX, and finally, copies the value of EAX to the place in the worker's stack where its local variable base
is stored.
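Should you wish to reproduce this yourself, running a command along the lines of

gcc -S Primes.c

will leave the generated assembly in Primes.s, which you can then search for the instructions corresponding to line 50. (The exact instructions you see will vary with the compiler version, the optimization level, and the target CPU.)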
Suppose you have only two worker threads and the value of nextbase
is, say, 9, and the currently running worker()
invocation's timeslice ends just after it executes the machine instruction
addl $2, nextbase
which sets the shared global variable nextbase
to 11. Suppose the next timeslice goes to another invocation of worker()
, which happens to be executing those same instructions. The second worker now increments nextbase
to 13, uses this to set its local variable base
, and starts to eliminate all multiples of 13. Eventually, the first invocation of worker()
will get another timeslice, and it will then pick up where it left off, executing the machine instructions
movl nextbase, %eax
movl %eax, -8(%ebp)
Of course, the value of nextbase
is now 13. The first worker thus sets the value of its local variable base
to 13 and proceeds to eliminate multiples of this value, not the value 11 that it set during its last timeslice. Neither worker does anything with the multiples of 11. You end up not only duplicating work unnecessarily, but also skipping necessary work!
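This kind of lost update is easy to demonstrate outside the sieve program. In the following standalone sketch (not from this chapter's example; compile it with -lpthread), two threads increment an unprotected shared counter. On most systems the final total will often be less than the expected 2,000,000 and will vary from run to run, which is exactly the sort of irreproducible misbehavior described above.

#include <pthread.h>
#include <stdio.h>

#define REPS 1000000

int counter = 0;                     // shared, deliberately unprotected

void *adder(void *arg)
{
   int i;
   for (i = 0; i < REPS; i++)
      counter++;                     // load/increment/store: not atomic
   return NULL;
}

int main(void)
{
   pthread_t t1, t2;
   pthread_create(&t1, NULL, adder, NULL);
   pthread_create(&t2, NULL, adder, NULL);
   pthread_join(t1, NULL);
   pthread_join(t2, NULL);
   printf("counter = %d (expected %d)\n", counter, 2 * REPS);
   return 0;
}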
How might you discover such an error using GDB? Presumably the "symptom" that surfaced was that the number of primes reported was too large. Thus you might suspect that values of base
are somehow sometimes skipped. To check this hypothesis, you could place a breakpoint right after the line
base = nextbase += 2;
By repeatedly issuing the GDB continue (c)
command and displaying the value of base
,
(gdb) disp base
you might eventually verify that a value of base
is indeed skipped.
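Concretely, a session along the following lines might do the job. (The breakpoint line number is only illustrative; with the lock and unlock calls removed, the update statement no longer falls on line 50 of the listing above, so use whatever line immediately follows it in your copy of the source.)

(gdb) break 51
(gdb) run 100 2
(gdb) display base
(gdb) continue

Each time a thread reaches the breakpoint, GDB prints the current value of base for that thread; if, say, 9 and 13 show up but 11 never does, you have verified that a value was skipped. Be aware, though, that this might take many repetitions of continue, or might not happen at all on a given run.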
The key word here is might. Recall our earlier discussion that threaded programs run in a somewhat random manner. In the context here, it may be the case that on some runs of the program the bug surfaces (i.e., too many primes are reported), but on other runs you may get correct answers!
There is, unfortunately, no good solution to this problem. Debugging threaded code often requires extra patience and creativity.
Here is a summary of the usage of GDB's thread-related commands:
info threads
(Gives information on all current threads)
thread 3
(Changes to thread 3)
break 88 thread 3
(Stops execution when thread 3 reaches source line 88)
break 88 thread 3 if x == y
(Stops execution when thread 3 reaches source line 88 and the variables x and y are equal)
In DDD, select Status | Threads, and a window will pop up, displaying all threads in the manner of GDB's info threads
, as seen in Figure 5-1. You can click a thread to switch the debugger's focus to it.
You will probably want to keep this pop-up window around, rather than using it once and then closing it. This way you don't have to keep reopening it every time you want to see which thread is currently running or switch to a different thread.
There appears to be no way to make a breakpoint thread-specific in DDD, as you did with the GDB command break 88 thread 3
above. Instead, you issue such a command to GDB via the DDD Console.
Turning now to Eclipse, note first that the default makefile created by Eclipse will not include the -lpthread
command-line argument for GCC (nor will it include the arguments for any other special libraries you need). You can alter the makefile directly if you wish, but it is easier to tell Eclipse to do it for you. While in the C/C++ perspective, right-click your project name, and select Properties; point the triangle next to C/C++ Build downward; select Settings | Tool Settings; point the triangle next to GCC C Linker downward and select Libraries | Add (the latter is the green plus sign icon); and fill in your library flags minus the -l
(e.g., filling in m
for -lm
). Then build your project.
Recall from Chapter 1 that Eclipse constantly displays your thread list, as opposed to having to request it, as in DDD. Moreover, you do not need to ask for a backtrace kind of operation as you do in DDD; the call stack is shown in the thread list. This is depicted in Figure 5-2. As above, we ran the program for a while then interrupted it by clicking the Suspend icon to the right of Resume. The thread list is in the Debug view, which normally is in the upper-left portion of the screen but appears here in expanded form since we clicked Maximize in the Debug tab. (We can click Restore to return to the standard layout.)
We see that thread 3 had been running at the time of the interruption; it had received a SIGINT signal, which is the interruption (CTRL-C) signal. We see also that the associated system call had been invoked by pthread_join()
, which in turn had been called by main()
. From what we've seen about this program earlier, we know that this indeed is the main thread.
To view the information for another thread, we merely click the triangle next to the thread to point it downward. To change to another thread, we click its entry in the list.
We may wish to set a breakpoint that applies only to a specific thread. To do so, we must first wait until the thread is created. Then when execution pauses via a previous setting of a breakpoint, or an interruption as above, we right-click the breakpoint symbol in the same manner as we would to make a breakpoint conditional, but this time, select Filtering. A pop-up window like the one in Figure 5-3 will appear. We see that this breakpoint currently applies to all three threads. If we wish it to apply only to thread 2, for instance, we would uncheck the boxes next to the entries for the other two threads.