This section presents an example of debugging a shared-memory application developed using OpenMP. The necessary knowledge of OpenMP will be explained below. All that is needed is a basic understanding of threads.
OpenMP is essentially a higher-level parallel programming interface to thread-management operations. The number of threads is set via the environment variable OMP_NUM_THREADS
. In the C shell, for instance, you type
% setenv OMP_NUM_THREADS 4
at the shell prompt to arrange to have four threads.
Application code consists of C interspersed with OpenMP directives. Each directive applies to the block that follows it, delimited by left and right braces. The most basic directive is
#pragma omp parallel
This sets up OMP_NUM_THREADS
threads, each of which concurrently executes the block of code following the pragma. There will typically be other directives embedded within this block.
Another very common OpenMP directive is
#pragma omp barrier
This specifies a "meeting point" for all the threads. When any thread reaches this point, it will block until all the other threads have arrived there.
You may often wish to have just one thread execute a certain block, while the other threads skip it. This is accomplished by writing
#pragma omp single
There is an implied barrier immediately following such a block.
There are many other OpenMP directives, but the only other one used in the example here is
#pragma omp critical
As the name implies, this creates a critical section, in which only one thread is allowed at any given time.
We implement the famous Dijkstra algorithm for determining minimum distances between pairs of vertices in a weighted graph. Say we are given distances between adjacent vertices (if two vertices are not adjacent, the distance between them is set to infinity). The goal is to find the minimum distances between vertex 0 and all other vertices.
Following is the source file, dijkstra.c. It generates random edge lengths among a specified number of vertices and then finds the minimum distances from vertex 0 to each of the other vertices.
1 // dijkstra.c 2 3 // OpenMP example program: Dijkstra shortest-path finder in a 4 // bidirectional graph; finds the shortest path from vertex 0 to all 5 // others 6 7 // usage: dijkstra nv print 8 9 // where nv is the size of the graph, and print is 1 if graph and min 10 // distances are to be printed out, 0 otherwise 11 12 #include <omp.h> // required 13 #include <values.h> 14 15 // including stdlib.h and stdio.h seems to cause a conflict with the 16 // Omni compiler, so declare directly 17 extern void *malloc(); 18 extern int printf(char *,...); 19 20 // global variables, shared by all threads 21 int nv, // number of vertices 22 *notdone, // vertices not checked yet 23 nth, // number of threads 24 chunk, // number of vertices handled by each thread 25 md, // current min over all threads 26 mv; // vertex which achieves that min 27 28 int *ohd, // 1-hop distances between vertices; "ohd[i][j]" is 29 // ohd[i*nv+j] 30 *mind; // min distances found so far 31 32 void init(int ac, char **av) 33 { int i,j,tmp; 34 nv = atoi(av[1]); 35 ohd = malloc(nv*nv*sizeof(int)); 36 mind = malloc(nv*sizeof(int)); 37 notdone = malloc(nv*sizeof(int)); 38 // random graph 39 for (i = 0; i < nv; i++) 40 for (j = i; j < nv; j++) { 41 if (j == i) ohd[i*nv+i] = 0; 42 else { 43 ohd[nv*i+j] = rand() % 20; 44 ohd[nv*j+i] = ohd[nv*i+j]; 45 } 46 } 47 for (i = 1; i < nv; i++) { 48 notdone[i] = 1; 49 mind[i] = ohd[i]; 50 } 51 } 52 53 // finds closest to 0 among notdone, among s through e; returns min 54 // distance in *d, closest vertex in *v 55 void findmymin(int s, int e, int *d, int *v) 56 { int i; 57 *d = MAXINT; 58 for (i = s; i <= e; i++) 59 if (notdone[i] && mind[i] < *d) { 60 *d = mind[i]; 61 *v = i; 62 } 63 } 64 65 // for each i in {s,...,e}, ask whether a shorter path to i exists, through 66 // mv 67 void updatemind(int s, int e) 68 { int i; 69 for (i = s; i <= e; i++) 70 if (notdone[i]) 71 if (mind[mv] + ohd[mv*nv+i] < mind[i]) 72 mind[i] = mind[mv] + ohd[mv*nv+i]; 73 } 74 75 void dowork() 76 { 77 #pragma omp parallel 78 { int startv,endv, // start, end vertices for this thread 79 step, // whole procedure goes nv steps 80 mymv, // vertex which attains that value 81 me = omp_get_thread_num(), 82 mymd; // min value found by this thread 83 #pragma omp single 84 { nth = omp_get_num_threads(); chunk = nv/nth; 85 printf("there are %d threads\n",nth); } 86 startv = me * chunk; 87 endv = startv + chunk - 1; 88 // the algorithm goes through nv iterations 89 for (step = 0; step < nv; step++) { 90 // find closest vertex to 0 among notdone; each thread finds 91 // closest in its group, then we find overall closest 92 #pragma omp single 93 { md = MAXINT; 94 mv = 0; 95 } 96 findmymin(startv,endv,&mymd,&mymv); 97 // update overall min if mine is smaller 98 #pragma omp critical 99 { if (mymd < md) 100 { md = mymd; } 101 } 102 #pragma omp barrier 103 // mark new vertex as done 104 #pragma omp single 105 { notdone[mv] = 0; } 106 // now update my section of ohd 107 updatemind(startv,endv); 108 } 109 } 110 } 111 112 int main(int argc, char **argv) 113 { int i,j,print; 114 init(argc,argv); 115 // start parallel 116 dowork(); 117 // back to single thread 118 print = atoi(argv[2]); 119 if (print) { 120 printf("graph weights:\n"); 121 for (i = 0; i < nv; i++) { 122 for (j = 0; j < nv; j++) 123 printf("%u ",ohd[nv*i+j]); 124 printf("\n"); 125 } 126 printf("minimum distances:\n"); 127 for (i = 1; i < nv; i++) 128 printf("%u\n",mind[i]); 129 } 130 }
Let's review how the algorithm works. Start with all vertices except vertex 0, which in this case are vertices 1 through 5, in a "not done" set. In each iteration of the algorithm, do the following:
Find the "not done" vertex v
that is closest to vertex 0, along paths known so far. This checking is shared by all the threads, with each thread checking an equal number of vertices. The function that does this work is findmymin()
.
Then move v
to the "done" set.
For all remaining vertices i
in the "not done" set, check whether going first from 0 to v
along the best known path so far, and then from v
to i
in one hop, is shorter than the current shortest distance from 0 to i
. If so, update that distance accordingly. The function that performs these actions is updatemind()
.
The iteration continues until the "not done" set is empty.
Since OpenMP directives require preprocessing, there is always the potential problem that we will lose our original line numbers and variable and function names. To see how to address this, we will discuss two different compilers. First we'll look at the Omni compiler (http://www.hpcc.jp/Omni/), and then at GCC (version 4.2 or later is required).
We compile our code under Omni as follows:
$ omcc -g -o dij dijkstra.c
After compiling the program and running it with four threads, we find that it fails to work properly:
$ dij 6 1 there are 4 threads graph weights: 0 3 6 17 15 13 3 0 15 6 12 9 6 15 0 1 2 7 17 6 1 0 10 19 15 12 2 10 0 3 13 9 7 19 3 0 minimum distances: 3 6 17 15 13
Analyzing the graph by hand shows that the correct minimum distances should be 3, 6, 7, 8, and 11.
Next, we run the program in GDB. Here it is very important to understand the consequences of the fact that OpenMP works via directives. Although line numbers, function names, and so on are mostly retained by the two compilers discussed here, there are some discrepancies between them. Look at what happens when we try to set a breakpoint in the executable, dij
, at the outset of the GDB session:
(gdb) tb main Breakpoint 1 at 0x80492af (gdb) r 6 1 Starting program: /debug/dij 6 1 [Thread debugging using libthread_db enabled] [New Thread -1208490304 (LWP 11580)] [Switching to Thread -1208490304 (LWP 11580)] 0x080492af in main () (gdb) l 1 /tmp/omni_C_11486.c: No such file or directory. in /tmp/omni_C_11486.c
We discover that the breakpoint is not in the source file. Instead, it is in Omni's OpenMP infrastructure code. In other words, main()
here is Omni's main()
, not your own. The Omni compiler mangled the name of our main()
to _ompc_main()
.
To set a breakpoint at main()
, we type
(gdb) tb _ompc_main Breakpoint 2 at 0x80491b3: file dijkstra.c, line 114.
and check it by continuing:
(gdb) c Continuing. [New Thread -1208493152 (LWP 11614)] [New Thread -1218983008 (LWP 11615)] [New Thread -1229472864 (LWP 11616)] _ompc_main (argc=3, argv=0xbfab6314) at dijkstra.c:114 114 init(argc,argv);
Okay, there's the familiar init()
line. Of course, we could have issued the command
(gdb) b dijkstra.c:114
Note the creation of the three new threads, making four in all.
However we choose to set our breakpoints, we must do a bit more work here than normal, so it's particularly important to stay within a single GDB session between runs of the program, even when we change our source code and recompile, so that we retain the breakpoints, conditions, and so on. That way we only have to go to the trouble of setting these things up once.
Now, how do you track down the bug(s)? It is natural to approach the debugging of this program by checking the results at the end of each iteration. The main results are in the "not done" set (in the array notdone[]
) and in the current list of best-known distances from 0 to the other vertices, that is, the array mind[]
. For example, after the first iteration, the "not done" set should consist of vertices 2, 3, 4, and 5, vertex 1 having been selected in that iteration.
Armed with this information, let's apply the Principle of Confirmation and check notdone[]
and mind[]
after each iteration of the for
loop in dowork()
.
We have to be careful as to exactly where we set our breakpoints. Although a natural spot for this seems to be line 108, at the very end of the algorithm's main loop, this may not be so good, as GDB will stop there for each thread. Instead, opt for placing a breakpoint inside an OpenMP single
block, so that it will stop for only one thread.
So, instead we check the results after each iteration by stopping at the beginning of the loop, starting with the second iteration:
(gdb) b 92 if step >= 1 Breakpoint 3 at 0x80490e3: file dijkstra.c, line 92. (gdb) c Continuing. there are 4 threads Breakpoint 3, __ompc_func_0 () at dijkstra.c:93 93 { md = MAXINT;
Let's confirm that the first iteration did choose the correct vertex (vertex 1) to be moved out of the "not done" set:
(gdb) p mv $1 = 0
The hypothesis is not confirmed, after all. Inspection of the code shows that on line 100 we forgot to set mv
. We fix it to read
{ md = mymd; mv = mymv; }
So, we recompile and run the program again. As noted earlier in this section (and elsewhere in this book), it is very helpful to not exit GDB when you rerun the program. We could run the program in another terminal window, but just for variety let's take a different approach here. We temporarily disable our breakpoints by issuing the dis
command, then run the recompiled program from within GDB, and then re-enable the breakpoints using ena
:
(gdb) dis (gdb) r The program being debugged has been started already. Start it from the beginning? (y or n) y `/debug/dij' has changed; re-reading symbols. Starting program: /debug/dij 6 1 [Thread debugging using libthread_db enabled] [New Thread -1209026880 (LWP 11712)] [New Thread -1209029728 (LWP 11740)] [New Thread -1219519584 (LWP 11741)] [New Thread -1230009440 (LWP 11742)] there are 4 threads graph weights: 0 3 6 17 15 13 3 0 15 6 12 9 6 15 0 1 2 7 17 6 1 0 10 19 15 12 2 10 0 3 13 9 7 19 3 0 minimum distances: 3 6 17 15 13 Program exited with code 06. (gdb) ena
We're still getting wrong answers. Let's check things at that breakpoint again:
(gdb) r Starting program: /debug/dij 6 1 [Thread debugging using libthread_db enabled] [New Thread -1209014592 (LWP 11744)] [New Thread -1209017440 (LWP 11772)] [New Thread -1219507296 (LWP 11773)] [New Thread -1229997152 (LWP 11774)] there are 4 threads [Switching to Thread -1209014592 (LWP 11744)] Breakpoint 3, __ompc_func_0 () at dijkstra.c:93 93 { md = MAXINT; (gdb) p mv $2 = 1
At least mv
now has the right value. Let's check mind[]
:
(gdb) p *mind@6 $3 = {0, 3, 6, 17, 15, 13}
Note that because we constructed the mind[]
array dynamically via malloc()
, we could not use GDB's print
command in its usual form. Instead, we used GDB's artificial array feature.
At any rate, mind[]
is still incorrect. For instance, mind[3]
should be 3 + 6 = 9, yet it is 17. Let's check the code that updates mind[]
:
(gdb) b 107 if me == 1 Breakpoint 4 at 0x8049176: file dijkstra.c, line 107. (gdb) r The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: /debug/dij 6 1 [Thread debugging using libthread_db enabled] [New Thread -1209039168 (LWP 11779)] [New Thread -1209042016 (LWP 11807)] [New Thread -1219531872 (LWP 11808)] [New Thread -1230021728 (LWP 11809)] there are 4 threads [Switching to Thread -1230021728 (LWP 11809)] Breakpoint 4, __ompc_func_0 () at dijkstra.c:107 107 updatemind(startv,endv);
First, confirm that startv
and endv
have sensible values:
(gdb) p startv $4 = 1 (gdb) p endv $5 = 1
The chunk size is only 1? Let's see:
(gdb) p chunk $6 = 1
After checking the computation for chunk
, we realize that we need the number of threads to evenly divide nv
. The latter has the value 6, which is not divisible by our thread count, 4. We make a note to ourselves to insert some error-catching code later, and reduce our thread count to 3 for now.
Once again, we do not want to exit GDB to do this. GDB inherits the environment variables when it is first invoked, but the values of those variables can also be changed or set within GDB, and that is what we do here:
(gdb) set environment OMP_NUM_THREADS = 3
Now let's run again:
(gdb) dis (gdb) r The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: /debug/dij 6 1 [Thread debugging using libthread_db enabled] [New Thread -1208707392 (LWP 11819)] [New Thread -1208710240 (LWP 11847)] [New Thread -1219200096 (LWP 11848)] there are 3 threads graph weights: 0 3 6 17 15 13 3 0 15 6 12 9 6 15 0 1 2 7 17 6 1 0 10 19 15 12 2 10 0 3 13 9 7 19 3 0 minimum distances: 3 6 7 15 12 Program exited with code 06. (gdb) ena
Aiyah, still the same wrong answers! We continue to check the updating process for mind[]
:
(gdb) r Starting program: /debug/dij 6 1 [Thread debugging using libthread_db enabled] [New Thread -1208113472 (LWP 11851)] [New Thread -1208116320 (LWP 11879)] [New Thread -1218606176 (LWP 11880)] there are 3 threads [Switching to Thread -1218606176 (LWP 11880)] Breakpoint 4, __ompc_func_0 () at dijkstra.c:107 107 updatemind(startv,endv); (gdb) p startv $7 = 2 (gdb) p endv $8 = 3
All right, those are the correct values for startv
and endv
in the case of me = 1
. So, we enter the function:
(gdb) s [Switching to Thread -1208113472 (LWP 11851)] Breakpoint 3, __ompc_func_0 () at dijkstra.c:93 93 { md = MAXINT; (gdb) c Continuing. [Switching to Thread -1218606176 (LWP 11880)] updatemind (s=2, e=3) at dijkstra.c:69 69 for (i = s; i <= e; i++)
Note that due to context switches among the threads, we did not enter updatemind()
immediately. Now we check the case i = 3
:
(gdb) tb 71 if i == 3 Breakpoint 5 at 0x8048fb2: file dijkstra.c, line 71. (gdb) c Continuing. updatemind (s=2, e=3) at dijkstra.c:71 71 if (mind[mv] + ohd[mv*nv+i] < mind[i])
As usual, we apply the Principle of Confirmation:
(gdb) p mv $9 = 0
Well, that's a big problem. Recall that in the first iteration, mv
turns out to be 1. Why is it 0 here?
After a while we realize that those context switches should have been a big hint. Take a look at the GDB output above again. The thread whose system ID is 11851 was already on line 93—in other words, it was already in the next iteration of the algorithm's main loop. In fact, when we hit c
to continue, it even executed line 94, which is
mv = 0;
This thread overwrote mv
's previous value of 1, so that the thread that updates mind[3]
is now relying on the wrong value of mv
. The solution is to add another barrier:
updatemind(startv,endv); #pragma omp barrier
After this fix, the program runs correctly.
The foregoing was based on the Omni compiler. As mentioned, beginning with version 4.2, GCC handles OpenMP code as well. All you have to do is add the -fopenmp
flag to the GCC command line.
Unlike Omni, GCC generates code in such a way that GDB's focus is in your own source file from the beginning. Thus, issuing a command
(gdb) b main
at the very outset of a GDB session really will cause a breakpoint to be set in one's own main()
, unlike what we saw for the Omni compiler.
However, at this writing, a major shortcoming of GCC is that the symbols for local variables that are inside an OpenMP parallel
block (called private variables in OpenMP terminology) will not be visible within GDB. For example, the command
(gdb) p mv
that you issued for the Omni-generated code above will work for GCC-generated code, but the command
(gdb) p startv
will fail on GCC-generated code.
There are ways to work around this, of course. For instance, if you wish to know the value of startv
, you can query the value of s
within updatemind()
. Hopefully this issue will be resolved in the next version of GCC.