Extended Example

This section presents an example of debugging a shared-memory application developed using OpenMP. The necessary knowledge of OpenMP will be explained below. All that is needed is a basic understanding of threads.

OpenMP is essentially a higher-level parallel programming interface to thread-management operations. The number of threads is set via the environment variable OMP_NUM_THREADS. In the C shell, for instance, you type

% setenv OMP_NUM_THREADS 4

at the shell prompt to arrange to have four threads.

Application code consists of C interspersed with OpenMP directives. Each directive applies to the block that follows it, delimited by left and right braces. The most basic directive is

#pragma omp parallel

This sets up OMP_NUM_THREADS threads, each of which concurrently executes the block of code following the pragma. There will typically be other directives embedded within this block.

Another very common OpenMP directive is

#pragma omp barrier

This specifies a "meeting point" for all the threads. When any thread reaches this point, it will block until all the other threads have arrived there.

You may often wish to have just one thread execute a certain block, while the other threads skip it. This is accomplished by writing

#pragma omp single

There is an implied barrier immediately following such a block.

There are many other OpenMP directives, but the only other one used in the example here is

#pragma omp critical

As the name implies, this creates a critical section, in which only one thread is allowed at any given time.

We implement the famous Dijkstra algorithm for determining minimum distances between pairs of vertices in a weighted graph. Say we are given distances between adjacent vertices (if two vertices are not adjacent, the distance between them is set to infinity). The goal is to find the minimum distances between vertex 0 and all other vertices.

Following is the source file, dijkstra.c. It generates random edge lengths among a specified number of vertices and then finds the minimum distances from vertex 0 to each of the other vertices.

  1  // dijkstra.c
  2
  3  // OpenMP example program:  Dijkstra shortest-path finder in a
  4  // bidirectional graph; finds the shortest path from vertex 0 to all 
  5  // others
  6
  7  // usage:  dijkstra nv print
  8
  9  // where nv is the size of the graph, and print is 1 if graph and min
 10  // distances are to be printed out, 0 otherwise
 11
 12  #include <omp.h>  // required
 13  #include <values.h>
 14
 15  // including stdlib.h and stdio.h seems to cause a conflict with the
 16  // Omni compiler, so declare directly
 17  extern void *malloc();
 18  extern int printf(char *,...);
 19
 20  // global variables, shared by all threads
 21  int nv,  // number of vertices
 22      *notdone, // vertices not checked yet
 23      nth,  // number of threads
 24      chunk,  // number of vertices handled by each thread
 25      md,  // current min over all threads
 26      mv;  // vertex which achieves that min
 27
 28  int *ohd,  // 1-hop distances between vertices; "ohd[i][j]" is
 29             // ohd[i*nv+j]
 30      *mind;  // min distances found so far
 31
 32  void init(int ac, char **av)
 33  {  int i,j,tmp;
 34     nv = atoi(av[1]);
 35     ohd = malloc(nv*nv*sizeof(int));
 36     mind = malloc(nv*sizeof(int));
 37     notdone = malloc(nv*sizeof(int));
 38     // random graph
 39     for (i = 0; i < nv; i++)
 40        for (j = i; j < nv; j++)  {
 41           if (j == i) ohd[i*nv+i] = 0;
 42           else  {
 43              ohd[nv*i+j] = rand() % 20;
 44              ohd[nv*j+i] = ohd[nv*i+j];
 45           }
 46        }
 47     for (i = 1; i < nv; i++)  {
 48        notdone[i] = 1;
 49        mind[i] = ohd[i];
 50    }
 51  }
 52
 53  // finds closest to 0 among notdone, among s through e; returns min
 54  // distance in *d, closest vertex in *v
 55  void findmymin(int s, int e, int *d, int *v)
 56  {  int i;
 57     *d = MAXINT;
 58     for (i = s; i <= e; i++)
 59        if (notdone[i] && mind[i] < *d)  {
 60           *d = mind[i];
 61           *v = i;
 62       }
 63  }
 64
 65  // for each i in {s,...,e}, ask whether a shorter path to i exists, through
 66  // mv
 67  void updatemind(int s, int e)
 68  {  int i;
 69     for (i = s; i <= e; i++)
 70        if (notdone[i])
 71           if (mind[mv] + ohd[mv*nv+i] < mind[i])
 72              mind[i] = mind[mv] + ohd[mv*nv+i];
 73  }
 74
 75  void dowork()
 76  {
 77     #pragma omp parallel
 78     {  int startv,endv,  // start, end vertices for this thread
 79            step,  // whole procedure goes nv steps
 80            mymv,  // vertex which attains that value
 81            me = omp_get_thread_num(),
 82            mymd;  // min value found by this thread
 83        #pragma omp single
 84        {  nth = omp_get_num_threads();  chunk = nv/nth;
 85           printf("there are %d threads\n",nth);  }
 86        startv = me * chunk;
 87        endv = startv + chunk - 1;
 88        // the algorithm goes through nv iterations
 89        for (step = 0; step < nv; step++)  {
 90           // find closest vertex to 0 among notdone; each thread finds
 91           // closest in its group, then we find overall closest
 92           #pragma omp single
 93           {  md = MAXINT;
 94              mv = 0;
 95           }
 96           findmymin(startv,endv,&mymd,&mymv);
 97           // update overall min if mine is smaller
 98           #pragma omp critical
 99          {  if (mymd < md)
100               {  md = mymd;  }
101          }
102          #pragma omp barrier
103          // mark new vertex as done
104          #pragma omp single
105          {  notdone[mv] = 0;  }
106          // now update my section of ohd
107          updatemind(startv,endv);
108        }
109     }
110  }
111
112  int main(int argc, char **argv)
113  {  int i,j,print;
114     init(argc,argv);
115     // start parallel
116     dowork();
117     // back to single thread
118     print = atoi(argv[2]);
119     if (print)  {
120        printf("graph weights:\n");
121        for (i = 0; i < nv; i++)  {
122           for (j = 0; j < nv; j++)
123              printf("%u  ",ohd[nv*i+j]);
124           printf("\n");
125        }
126        printf("minimum distances:\n");
127        for (i = 1; i < nv; i++)
128           printf("%u\n",mind[i]);
129     }
130  }

Let's review how the algorithm works. Start with all vertices except vertex 0, which in this case are vertices 1 through 5, in a "not done" set. In each iteration of the algorithm, do the following:

The iteration continues until the "not done" set is empty.

Since OpenMP directives require preprocessing, there is always the potential problem that we will lose our original line numbers and variable and function names. To see how to address this, we will discuss two different compilers. First we'll look at the Omni compiler (http://www.hpcc.jp/Omni/), and then at GCC (version 4.2 or later is required).

We compile our code under Omni as follows:

$ omcc -g -o dij dijkstra.c

After compiling the program and running it with four threads, we find that it fails to work properly:

$ dij 6 1
there are 4 threads
graph weights:
0  3  6  17  15  13
3  0  15  6  12  9
6  15  0  1  2  7
17  6  1  0  10  19
15  12  2  10  0  3
13  9  7  19  3  0
minimum distances:
3
6
17
15
13

Analyzing the graph by hand shows that the correct minimum distances should be 3, 6, 7, 8, and 11.

Next, we run the program in GDB. Here it is very important to understand the consequences of the fact that OpenMP works via directives. Although line numbers, function names, and so on are mostly retained by the two compilers discussed here, there are some discrepancies between them. Look at what happens when we try to set a breakpoint in the executable, dij, at the outset of the GDB session:

(gdb) tb main
Breakpoint 1 at 0x80492af
(gdb) r 6 1
Starting program: /debug/dij 6 1
[Thread debugging using libthread_db enabled]
[New Thread -1208490304 (LWP 11580)]
[Switching to Thread -1208490304 (LWP 11580)]
0x080492af in main ()
(gdb) l
1       /tmp/omni_C_11486.c: No such file or directory.
        in /tmp/omni_C_11486.c

We discover that the breakpoint is not in the source file. Instead, it is in Omni's OpenMP infrastructure code. In other words, main() here is Omni's main(), not your own. The Omni compiler mangled the name of our main() to _ompc_main().

To set a breakpoint at main(), we type

(gdb) tb _ompc_main
Breakpoint 2 at 0x80491b3: file dijkstra.c, line 114.

and check it by continuing:

(gdb) c
Continuing.
[New Thread -1208493152 (LWP 11614)]
[New Thread -1218983008 (LWP 11615)]
[New Thread -1229472864 (LWP 11616)]
_ompc_main (argc=3, argv=0xbfab6314) at dijkstra.c:114
114        init(argc,argv);

Okay, there's the familiar init() line. Of course, we could have issued the command

(gdb) b dijkstra.c:114

Note the creation of the three new threads, making four in all.

However we choose to set our breakpoints, we must do a bit more work here than normal, so it's particularly important to stay within a single GDB session between runs of the program, even when we change our source code and recompile, so that we retain the breakpoints, conditions, and so on. That way we only have to go to the trouble of setting these things up once.

Now, how do you track down the bug(s)? It is natural to approach the debugging of this program by checking the results at the end of each iteration. The main results are in the "not done" set (in the array notdone[]) and in the current list of best-known distances from 0 to the other vertices, that is, the array mind[]. For example, after the first iteration, the "not done" set should consist of vertices 2, 3, 4, and 5, vertex 1 having been selected in that iteration.

Armed with this information, let's apply the Principle of Confirmation and check notdone[] and mind[] after each iteration of the for loop in dowork().

We have to be careful as to exactly where we set our breakpoints. Although a natural spot for this seems to be line 108, at the very end of the algorithm's main loop, this may not be so good, as GDB will stop there for each thread. Instead, opt for placing a breakpoint inside an OpenMP single block, so that it will stop for only one thread.

So, instead we check the results after each iteration by stopping at the beginning of the loop, starting with the second iteration:

(gdb) b 92 if step >= 1
Breakpoint 3 at 0x80490e3: file dijkstra.c, line 92.
(gdb) c
Continuing.
there are 4 threads

Breakpoint 3, __ompc_func_0 () at dijkstra.c:93
93               {  md = MAXINT;

Let's confirm that the first iteration did choose the correct vertex (vertex 1) to be moved out of the "not done" set:

(gdb) p mv
$1 = 0

The hypothesis is not confirmed, after all. Inspection of the code shows that on line 100 we forgot to set mv. We fix it to read

{  md = mymd; mv = mymv;  }

So, we recompile and run the program again. As noted earlier in this section (and elsewhere in this book), it is very helpful to not exit GDB when you rerun the program. We could run the program in another terminal window, but just for variety let's take a different approach here. We temporarily disable our breakpoints by issuing the dis command, then run the recompiled program from within GDB, and then re-enable the breakpoints using ena:

(gdb) dis
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
`/debug/dij' has changed; re-reading symbols.
Starting program: /debug/dij 6 1
[Thread debugging using libthread_db enabled]
[New Thread -1209026880 (LWP 11712)]
[New Thread -1209029728 (LWP 11740)]
[New Thread -1219519584 (LWP 11741)]
[New Thread -1230009440 (LWP 11742)]
there are 4 threads
graph weights:
0  3  6  17  15  13
3  0  15  6  12  9
6  15  0  1  2  7
17  6  1  0  10  19
15  12  2  10  0  3
13  9  7  19  3  0
minimum distances:
3
6
17
15
13

Program exited with code 06.
(gdb) ena

We're still getting wrong answers. Let's check things at that breakpoint again:

(gdb) r
Starting program: /debug/dij 6 1
[Thread debugging using libthread_db enabled]
[New Thread -1209014592 (LWP 11744)]
[New Thread -1209017440 (LWP 11772)]
[New Thread -1219507296 (LWP 11773)]
[New Thread -1229997152 (LWP 11774)]
there are 4 threads
[Switching to Thread -1209014592 (LWP 11744)]

Breakpoint 3, __ompc_func_0 () at dijkstra.c:93
93               {  md = MAXINT;
(gdb) p mv
$2 = 1

At least mv now has the right value. Let's check mind[]:

(gdb) p *mind@6
$3 = {0, 3, 6, 17, 15, 13}

Note that because we constructed the mind[] array dynamically via malloc(), we could not use GDB's print command in its usual form. Instead, we used GDB's artificial array feature.

At any rate, mind[] is still incorrect. For instance, mind[3] should be 3 + 6 = 9, yet it is 17. Let's check the code that updates mind[]:

(gdb) b 107 if me == 1
Breakpoint 4 at 0x8049176: file dijkstra.c, line 107.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /debug/dij 6 1
[Thread debugging using libthread_db enabled]
[New Thread -1209039168 (LWP 11779)] 
[New Thread -1209042016 (LWP 11807)]
[New Thread -1219531872 (LWP 11808)]
[New Thread -1230021728 (LWP 11809)]
there are 4 threads
[Switching to Thread -1230021728 (LWP 11809)]

Breakpoint 4, __ompc_func_0 () at dijkstra.c:107
107              updatemind(startv,endv);

First, confirm that startv and endv have sensible values:

(gdb) p startv
$4 = 1
(gdb) p endv
$5 = 1

The chunk size is only 1? Let's see:

(gdb) p chunk
$6 = 1

After checking the computation for chunk, we realize that we need the number of threads to evenly divide nv. The latter has the value 6, which is not divisible by our thread count, 4. We make a note to ourselves to insert some error-catching code later, and reduce our thread count to 3 for now.

Once again, we do not want to exit GDB to do this. GDB inherits the environment variables when it is first invoked, but the values of those variables can also be changed or set within GDB, and that is what we do here:

(gdb) set environment OMP_NUM_THREADS = 3

Now let's run again:

(gdb) dis
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /debug/dij 6 1
[Thread debugging using libthread_db enabled]
[New Thread -1208707392 (LWP 11819)]
[New Thread -1208710240 (LWP 11847)]
[New Thread -1219200096 (LWP 11848)]
there are 3 threads
graph weights:
0  3  6  17  15  13
3  0  15  6  12  9
6  15  0  1  2  7
17  6  1  0  10  19
15  12  2  10  0  3
13  9  7  19  3  0
minimum distances:
3
6
7
15
12

Program exited with code 06.
(gdb) ena

Aiyah, still the same wrong answers! We continue to check the updating process for mind[]:

(gdb) r
Starting program: /debug/dij 6 1
[Thread debugging using libthread_db enabled]
[New Thread -1208113472 (LWP 11851)]
[New Thread -1208116320 (LWP 11879)]
[New Thread -1218606176 (LWP 11880)]
there are 3 threads
[Switching to Thread -1218606176 (LWP 11880)]

Breakpoint 4, __ompc_func_0 () at dijkstra.c:107
107              updatemind(startv,endv);
(gdb) p startv
$7 = 2
(gdb) p endv
$8 = 3

All right, those are the correct values for startv and endv in the case of me = 1. So, we enter the function:

(gdb) s
[Switching to Thread -1208113472 (LWP 11851)]

Breakpoint 3, __ompc_func_0 () at dijkstra.c:93
93               {  md = MAXINT;
(gdb) c
Continuing.
[Switching to Thread -1218606176 (LWP 11880)]
updatemind (s=2, e=3) at dijkstra.c:69
69         for (i = s; i <= e; i++)

Note that due to context switches among the threads, we did not enter updatemind() immediately. Now we check the case i = 3:

(gdb) tb 71 if i == 3
Breakpoint 5 at 0x8048fb2: file dijkstra.c, line 71.
(gdb) c
Continuing.
updatemind (s=2, e=3) at dijkstra.c:71
71               if (mind[mv] + ohd[mv*nv+i] < mind[i])

As usual, we apply the Principle of Confirmation:

(gdb) p mv
$9 = 0

Well, that's a big problem. Recall that in the first iteration, mv turns out to be 1. Why is it 0 here?

After a while we realize that those context switches should have been a big hint. Take a look at the GDB output above again. The thread whose system ID is 11851 was already on line 93—in other words, it was already in the next iteration of the algorithm's main loop. In fact, when we hit c to continue, it even executed line 94, which is

mv = 0;

This thread overwrote mv's previous value of 1, so that the thread that updates mind[3] is now relying on the wrong value of mv. The solution is to add another barrier:

updatemind(startv,endv);
#pragma omp barrier

After this fix, the program runs correctly.

The foregoing was based on the Omni compiler. As mentioned, beginning with version 4.2, GCC handles OpenMP code as well. All you have to do is add the -fopenmp flag to the GCC command line.

Unlike Omni, GCC generates code in such a way that GDB's focus is in your own source file from the beginning. Thus, issuing a command

(gdb) b main

at the very outset of a GDB session really will cause a breakpoint to be set in one's own main(), unlike what we saw for the Omni compiler.

However, at this writing, a major shortcoming of GCC is that the symbols for local variables that are inside an OpenMP parallel block (called private variables in OpenMP terminology) will not be visible within GDB. For example, the command

(gdb) p mv

that you issued for the Omni-generated code above will work for GCC-generated code, but the command

(gdb) p startv

will fail on GCC-generated code.

There are ways to work around this, of course. For instance, if you wish to know the value of startv, you can query the value of s within updatemind(). Hopefully this issue will be resolved in the next version of GCC.