In this section we present a detailed example of debugging seg faults.
Below is some C code that might be part of an implementation of a managed string type similar to C++ strings. The code, contained in the source file cstring.c, implements a type called CString; however, it's riddled with bugs, both obvious and subtle. Our goal is to find all these bugs and correct them.
CString is a typedef alias for a structure containing a pointer to storage for a char string together with a variable that stores the string's length. Some utility functions useful for string handling have been implemented:
Init_CString()
Takes an old-style C string as an argument and uses it to initialize a new CString.
Delete_CString()
CStrings are allocated on the heap, and their memory must be freed when it is no longer needed. This function takes care of releasing that memory.
Chomp()
Removes and returns the last character of a CString.
Append_Chars_To_CString()
Appends a C-style string to a CString.
Lastly, main() is our driver function to test the CString implementation.
Our code makes use of an extremely useful library function, snprintf(). In case you haven't encountered this function yet, it's almost like printf(), except that it writes its output into a character array instead of to stdout. To help prevent buffer overflows (which may occur with any function that copies null-terminated strings, if the null character is left out of the source string), snprintf() also lets you specify the maximum number of bytes to write, including the trailing null character:
    #include <stdio.h>

    /* The slogan is 22 characters long, plus one byte for the trailing null. */
    #define STRSIZE 23

    int main(void)
    {
        char s1[] = "brake";
        char *s2 = "breakpoints";
        char logo[STRSIZE];

        snprintf(logo, STRSIZE, "%c %s %d %s.", 'I', s1, 2+2, s2);
        puts(logo);

        return 0;
    }
This program will write the string "I brake 4 breakpoints." into the character array logo, ready for printing onto a bumper sticker.
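Because snprintf() silently cuts off output that does not fit, it can help to check its return value. The following is a minimal sketch, not part of cstring.c; the helper name format_slogan() and its parameters are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper, not part of cstring.c: formats a bumper-sticker
 * slogan into buf and reports whether the complete text fit. */
bool format_slogan(char *buf, size_t size, const char *verb, const char *noun)
{
    assert(buf != NULL && size > 0);

    /* snprintf() returns the length the complete output would have,
     * excluding the terminating null, or a negative value on error. */
    int needed = snprintf(buf, size, "I %s 4 %s.", verb, noun);

    /* The output fit only if there was room for every character
     * plus the terminating null. */
    return needed >= 0 && (size_t)needed < size;
}
```

The slogan "I brake 4 breakpoints." is 22 characters long, so a call such as format_slogan(buf, sizeof buf, "brake", "breakpoints") should report success only when buf holds at least 23 bytes.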
Now here's the implementation of our CString:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char *str;
        int len;
    } CString;


    CString *Init_CString(char *str)
    {
        CString *p = malloc(sizeof(CString));
        p->len = strlen(str);
        strncpy(p->str, str, strlen(str) + 1);
        return p;
    }


    void Delete_CString(CString *p)
    {
        free(p);
        free(p->str);
    }


    // Removes the last character of a CString and returns it.
    //
    char Chomp(CString *cstring)
    {
        char lastchar = *( cstring->str + cstring->len);

        // Shorten the string by one
        *( cstring->str + cstring->len) = '0';
        cstring->len = strlen( cstring->str );

        return lastchar;
    }


    // Appends a char * to a CString
    //
    CString *Append_Chars_To_CString(CString *p, char *str)
    {
        char *newstr = malloc(p->len + 1);
        p->len = p->len + strlen(str);

        // Create the new string to replace p->str
        snprintf(newstr, p->len, "%s%s", p->str, str);

        // Free old string and make CString point to the new string
        free(p->str);
        p->str = newstr;

        return p;
    }


    int main(void)
    {
        CString *mystr;
        char c;

        mystr = Init_CString("Hello!");
        printf("Init:\n str: `%s' len: %d\n", mystr->str, mystr->len);

        c = Chomp(mystr);
        printf("Chomp '%c':\n str:`%s' len: %d\n", c, mystr->str, mystr->len);

        mystr = Append_Chars_To_CString(mystr, " world!");
        printf("Append:\n str: `%s' len: %d\n", mystr->str, mystr->len);

        Delete_CString(mystr);

        return 0;
    }
Study the code and try to guess what the output should be. Then compile and run it.
    $ gcc -g -W -Wall cstring.c -o cstring
    $ ./cstring
    Segmentation fault (core dumped)
Oops! The first thing we need to do is find out where the seg fault happened. Then we can try to figure out why it happened.
Before continuing, we'd like to mention that our office colleague in the next cubicle, Milton, is also trying to fix the bugs in this program. Unlike us, Milton doesn't know how to use GDB, so he's about to open up WordPad, insert calls to printf() all over the code, and recompile the program in an attempt to figure out where the seg fault happened. Let's see if we can debug the program faster than Milton.
While Milton opens up Wordpad, we'll use GDB to examine the core file:
    $ gdb cstring core
    Core was generated by `cstring'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x400a9295 in strncpy () from /lib/tls/libc.so.6
    (gdb) backtrace
    #0  0x400a9295 in strncpy () from /lib/tls/libc.so.6
    #1  0x080484df in Init_CString (str=0x80487c5 "Hello!") at cstring.c:15
    #2  0x080485e4 in main () at cstring.c:62
According to the backtrace output, the seg fault happened at line 15, in Init_CString(), during the call to strncpy(). Without even looking at the code, we already know that the chances are pretty good that we passed a NULL pointer to strncpy() on line 15.
At this point, Milton is still trying to decide where to insert the first of many calls to printf().
GDB told us that the seg fault occurred at line 15 in Init_CString(), so we'll change the current frame to the one for the invocation of Init_CString().
    (gdb) frame 1
    #1  0x080484df in Init_CString (str=0x80487c5 "Hello!") at cstring.c:15
    15      strncpy(p->str, str, strlen(str) + 1);
We'll apply the Principle of Confirmation by looking at each of the pointers involved in the call to strncpy() (namely str, p, and p->str) and verifying that their values are what we think they should be. We first print the value of str:
    (gdb) print str
    $1 = 0x80487c5 "Hello!"
Since str is a pointer, GDB gives us its value as the hex address 0x80487c5. And since str is a pointer to char, and thus the address of a character string, GDB helpfully tells us the value of the string as well: "Hello!" This was also clear in the backtrace output we saw above, but we should check anyway. So, str is not NULL and points to a valid string, and everything is okay so far.
Now let's turn our attention to the other pointers, p and p->str:
    (gdb) print *p
    $2 = { str = 0x0, len = 6 }
The problem is now clear: p->str, which is also a pointer to a string, is NULL. That explains the seg fault: We tried to write to location 0 in memory, which is off limits to us.
But what could cause p->str (the string pointer in the CString under construction) to be NULL? Well, taking a look at the code,
    (gdb) list Init_CString
    5       typedef struct {
    6           char *str;
    7           int len;
    8       } CString;
    9
    10
    11      CString *Init_CString(char *str)
    12      {
    13          CString *p = malloc(sizeof(CString));
    14          p->len = strlen(str);
    15          strncpy(p->str, str, strlen(str) + 1);
    16          return p;
    17      }
    18
we see that there are only two lines of code before the line in which the seg fault occurred, and between them, line 13 is much more likely to be the culprit.
We'll re-run the program from within GDB, set a temporary breakpoint at the entry into Init_CString(), and step through this function line by line, looking at the value of p->str.
    (gdb) tbreak Init_CString
    Breakpoint 1 at 0x804849b: file cstring.c, line 13.
    (gdb) run
    Breakpoint 1, Init_CString (str=0x80487c5 "Hello!") at cstring.c:13
    13      CString *p = malloc(sizeof(CString));
    (gdb) step
    14      p->len = strlen(str);
    (gdb) print p->str
    $4 = 0x0
    (gdb) step
    15      strncpy(p->str, str, strlen(str) + 1);
Here's the trouble: We're about to cause a seg fault because the next line of code dereferences p->str, and p->str is still NULL. Now we use the little gray cells to figure out what happened.
When we allocated memory for p we got enough memory for our struct: a pointer to hold the address of the string and an int to hold the string length, but we did not allocate memory to hold the string itself. We made the common error of declaring a pointer and not making it point to anything! What we need to do is first allocate enough memory to hold str, and then make p->str point to that newly allocated memory. Here's how we can do that (we need to add one to the length of the string because strlen() doesn't count the terminating '\0'):
    CString *Init_CString(char *str)
    {
        // Allocate for the struct
        CString *p = malloc(sizeof(CString));
        p->len = strlen(str);

        // Allocate for the string
        p->str = malloc(p->len + 1);
        strncpy(p->str, str, strlen(str) + 1);

        return p;
    }
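The fix above still assumes that malloc() always succeeds. As a sketch only, a more defensive variant might check both allocations; the name Init_CString_Checked() and the convention of returning NULL on failure are assumptions for illustration, not part of the original program.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *str;
    int len;
} CString;

/* Hypothetical variant of Init_CString() that returns NULL on failure
 * rather than dereferencing a pointer that malloc() could not provide. */
CString *Init_CString_Checked(const char *str)
{
    assert(str != NULL);

    /* Allocate for the struct */
    CString *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;

    /* Allocate for the string, with one extra byte for the '\0' */
    size_t n = strlen(str);
    p->str = malloc(n + 1);
    if (p->str == NULL) {
        free(p);                 /* do not leak the struct */
        return NULL;
    }

    memcpy(p->str, str, n + 1);  /* copies the text and its '\0' */
    p->len = (int)n;
    return p;
}
```

A caller would then test the result before using it, for example with if (mystr == NULL) followed by whatever error handling suits the program.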
By the way, Milton has just finished putting printf() calls into his code and is about to recompile. If he's lucky, he'll find out where the seg fault happened. If not, he'll have to add even more printf() calls.
During the debugging session, we don't ever exit GDB while we make changes to our code. As discussed earlier, this way we avoid the time-consuming startups, we retain our breakpoints, and so on.
Similarly, we keep the text editor open. By staying in the same editor session between compilations while debugging, we can make good use of our editor's "undo" capability. For example, a common strategy in the debugging process is to temporarily remove parts of the code, so as to focus on the remaining sections, where you think a bug lies. After you finish that inspection, you can simply use the undo capability of the editor to restore the deleted lines.
So, on the screen we'll typically have one window for GDB (or DDD) and one window for the editor. We'd also either have a third window open for issuing compiler commands or, better yet, execute them through the editor. For instance, if you use the Vim editor, you can issue the following command, which will save your editing changes and recompile the program in one step:
    :make
(We're assuming you've set Vim's autowrite option, using set autowrite, in your Vim startup file. This feature of Vim will also move your cursor to the first reported compilation warning or error, if there are any, and you can go back and forth within the compilation error list via Vim's :cnext and :cprev commands. Of course, all of this is made easier if you put short aliases for these commands in your Vim startup file.)
After fixing the first bug, we run the program again from within GDB (remember that when GDB notices you've recompiled the program, it will automatically load the new executable, so that again there's no need to quit and restart GDB):
    (gdb) run
    The program being debugged has been started already.
    Start it from the beginning? (y or n) y
    `cstring' has changed; re-reading symbols.
    Starting program: cstring
    Init:
     str: `Hello!' len: 6
    Chomp '':
     str:`Hello!0' len: 7
    Append:
     str: `Hello!0 world' len: 14

    Program exited normally.
    (gdb)
There appear to be two problems with Chomp(). First, it should have chomped a '!', but it looks like it chomped a nonprinting character. Second, a zero character appears at the end of our string. Since Chomp() is an obvious place to look for these bugs, we'll start the program and place a temporary breakpoint at the entry of Chomp().
    (gdb) tbreak Chomp
    Breakpoint 2 at 0x8048523: file cstring.c, line 32.
    (gdb) run
    Starting program: cstring
    Init:
     str: `Hello!' len: 6

    Breakpoint 1, Chomp (cstring=0x804a008) at cstring.c:32
    32      char lastchar = *( cstring->str + cstring->len);
    (gdb)
The last character of the string ought to be '!'. Let's confirm this.
    (gdb) print lastchar
    $1 = 0 '\0'
We were expecting lastchar to be '!', but instead it's the null character. This looks like it's probably an "off by one" error. Let's figure it out. We can visualize the string like so:
    pointer offset:   0   1   2   3   4   5   6
    cstring->str:     H   e   l   l   o   !   \0
    string length:    1   2   3   4   5   6
The last character of the string is stored at the address cstring->str + 5, but because the string length is a character count, rather than an index, the address cstring->str + cstring->len points to one array location past the last character, where the terminating null character is, instead of where we wanted it to point. We can fix this problem by changing line 31 from
char lastchar = *( cstring->str + cstring->len);
to
char lastchar = *( cstring->str + cstring->len - 1);
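The relationship between a string's length and its last valid index can be captured in a small helper. This is an illustrative sketch; last_char() is a hypothetical function, not part of cstring.c, and returning '\0' for an empty string is an assumed convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the last character of s, or '\0' if s is
 * empty. For a string of length n, the characters occupy indexes
 * 0 through n - 1, and index n holds the terminating null. */
char last_char(const char *s)
{
    assert(s != NULL);

    size_t n = strlen(s);
    return n > 0 ? s[n - 1] : '\0';
}
```

For "Hello!", strlen() reports 6, so the last character lives at index 5 and index 6 holds the terminator.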
There is a third bug hiding in this part of the code. After the call to Chomp(), the string "Hello!" became "Hello!0" (instead of "Hello"). The next line to execute in GDB, line 33, is where we wanted to shorten the string by replacing its last character with a terminating null character:
*( cstring->str + cstring->len) = '0';
Immediately we see that this line contains the same problem that we just fixed in line 31: We're referencing the last character of the string incorrectly. Furthermore, now that our eyes are trained on this line of code, it appears that we're storing the character '0' at that location, which is not the null character. We meant to place '\0' at the end of the string. After making these two corrections, line 33 reads
*( cstring->str + cstring->len - 1) = '\0';
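Putting the corrections to Chomp() together gives something like the following sketch. The name Chomp_Fixed(), the guard for an empty string, and decrementing len directly instead of calling strlen() again are assumptions added for illustration; the original function has no such guard.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    char *str;
    int len;
} CString;

/* Sketch of Chomp() with both corrections applied. Returning '\0' for an
 * empty string is an assumed convention, not part of the original code. */
char Chomp_Fixed(CString *cstring)
{
    assert(cstring != NULL && cstring->str != NULL);

    if (cstring->len <= 0)
        return '\0';

    /* The last character is at index len - 1, not len. */
    char lastchar = cstring->str[cstring->len - 1];

    /* Shorten the string by one: '\0' is the null character, '0' is a digit. */
    cstring->str[cstring->len - 1] = '\0';
    cstring->len--;

    return lastchar;
}
```

Called on a CString holding "Hello!", this should return '!' and leave "Hello" with a length of 5.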
At this point, Milton, our printf()-using colleague, has found the first seg fault and is just now correcting the memory allocation issue in Init_CString(). Instead of moving on to the bugs we just fixed in Chomp(), he'll have to remove all the calls to printf() and recompile the program. How inconvenient!
Let's make the corrections discussed in the previous section, recompile the code, and run the program again:
    (gdb) run
    `cstring' has changed; re-reading symbols.
    Starting program: cstring
    Init:
     str: `Hello!' len: 6
    Chomp '!':
     str:`Hello' len: 5
    Append:
     str: `Hello world' len: 12

    Program received signal SIGSEGV, Segmentation fault.
    0xb7f08da1 in free () from /lib/tls/libc.so.6
    (gdb)
Another seg fault. Judging by the missing exclamation point after the append operation, it appears that the next bug might be hiding in Append_Chars_To_CString(). A simple backtrace should confirm or refute this hypothesis:
    1  (gdb) backtrace
    2  #0  0xb7f08da1 in free () from /lib/tls/libc.so.6
    3  #1  0x0804851a in Delete_CString (p=0x804a008) at cstring3.c:24
    4  #2  0x08048691 in main () at cstring3.c:70
    5  (gdb)
According to line 3 of the backtrace output, our assumption is wrong: The program actually crashed in Delete_CString(). That doesn't mean that we don't also have a bug in Append_Chars_To_CString(), but our immediate bug, the one that caused the seg fault, is in Delete_CString(). This is exactly why we use GDB here to check our expectations: It completely removes any guesswork in finding where a seg fault occurred. Once our printf()-using friend catches up to this point in his debugging, he'll be putting trace code in the wrong function!
Luckily, Delete_CString() is short, so we should be able to find what's wrong quickly.
    (gdb) list Delete_CString
    20
    21      void Delete_CString(CString *p)
    22      {
    23          free(p);
    24          free(p->str);
    25      }
    26
We first free p, then we free p->str. This counts as a not-so-subtle bug. After p is freed, the struct no longer belongs to us, so reading p->str is undefined behavior: There's no guarantee that p->str points to the correct location in memory anymore; it can point anywhere. In this case, that "anywhere" was memory we couldn't access, causing the seg fault. The fix is to reverse the order of the calls to free():
    void Delete_CString(CString *p)
    {
        free(p->str);
        free(p);
    }
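One way to guard against using a CString after it has been deleted is to have the delete function clear the caller's pointer as well. The following is a sketch under that assumption; Delete_CString_Safe() and its pointer-to-pointer parameter are hypothetical and differ from the interface used in this program.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    char *str;
    int len;
} CString;

/* Hypothetical variant of Delete_CString(): takes the address of the
 * caller's pointer so that it can be reset to NULL after the free. */
void Delete_CString_Safe(CString **pp)
{
    assert(pp != NULL);   /* the caller must pass the address of its pointer */

    if (*pp == NULL)      /* nothing to free, much like free(NULL) */
        return;

    free((*pp)->str);     /* read the member while the struct is still valid */
    free(*pp);            /* then release the struct itself */
    *pp = NULL;           /* later misuse now fails at a NULL pointer */
}
```

A call would look like Delete_CString_Safe(&mystr), after which mystr is NULL and a second call does nothing.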
By the way, Milton got too frustrated trying to track down the off-by-one error in Chomp() that we so easily fixed. That's him calling us on the phone now for help.
We correct, recompile, and re-run the code once more.
    (gdb) run
    `cstring' has changed; re-reading symbols.
    Starting program: cstring
    Init:
     str: `Hello!' len: 6
    Chomp '!':
     str:`Hello' len: 5
    Append:
     str: `Hello world' len: 12

    Program exited normally.
    (gdb)
After the append operation, we're missing the exclamation point in the string, which should be "Hello world!". Curiously, the reported string length of 12 is correct even though the string is incorrect. The most logical place to look for this bug is in Append_Chars_To_CString(), so we'll place a breakpoint there:
    (gdb) tbreak Append_Chars_To_CString
    Breakpoint 3 at 0x8048569: file cstring.c, line 45.
    (gdb) run
    Starting program: cstring
    Init:
     str: `Hello!' len: 6
    Chomp '!':
     str:`Hello' len: 5

    Breakpoint 1, Append_Chars_To_CString (p=0x804a008, str=0x8048840 " world!") at cstring.c:45
    45      char *newstr = malloc(p->len + 1);
The C string newstr needs to be big enough to hold both p->str and str. We see that the call to malloc() on line 45 doesn't allocate enough memory; it only allocates enough space for p->str and a terminating null. Line 45 should be changed to
char *newstr = malloc(p->len + strlen(str) + 1);
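The sizing rule (both lengths plus one byte for the terminating null) can be isolated in a small function. This is a sketch only; concat_alloc() is a hypothetical helper rather than part of cstring.c, and it uses memcpy() in place of snprintf().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated string holding a followed
 * by b, or NULL if the allocation fails. The caller frees the result. */
char *concat_alloc(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t la = strlen(a);
    size_t lb = strlen(b);

    /* Room for both texts plus a single terminating '\0' */
    char *out = malloc(la + lb + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);   /* the + 1 copies b's '\0' as well */
    return out;
}
```

For "Hello" and " world!", the lengths are 5 and 7, so the buffer needs 13 bytes to hold the 12 characters of "Hello world!" and the terminator.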
After making this correction and recompiling, we get the following output:
    (gdb) run
    `cstring' has changed; re-reading symbols.
    Starting program: cstring
    Init:
     str: `Hello!' len: 6
    Chomp '!':
     str:`Hello' len: 5
    Append:
     str: `Hello world' len: 12
Our correction didn't fix the bug we had in mind. What we found and fixed was a "silent bug." Make no mistake: It was a bug, and the fact that it didn't manifest itself as a seg fault was purely a matter of luck. It's highly probable that the remaining bug is still in Append_Chars_To_CString(), so we'll set another breakpoint there:
    (gdb) tbreak Append_Chars_To_CString
    Breakpoint 4 at 0x8048569: file cstring.c, line 45.
    (gdb) run
    Starting program: cstring
    Init:
     str: `Hello!' len: 6
    Chomp '!':
     str:`Hello' len: 5

    Breakpoint 1, Append_Chars_To_CString (p=0x804a008, str=0x8048840 " world!") at cstring.c:45
    45      char *newstr = malloc(p->len + strlen(str) + 1);
    (gdb) step
    46      p->len = p->len + strlen(str);
Line 46 shows why the string length is correct even though the string itself is incorrect: The addition correctly calculates the length of p->str concatenated with str. No problem here, so we'll step forward.
    (gdb) step
    49      snprintf(newstr, p->len, "%s%s", p->str, str);
The next line of code, line 49, is where we form the new string. We expect that newstr will contain "Hello world!" after this step. Let's apply the Principle of Confirmation and verify this.
    (gdb) step
    51      free(p->str);
    (gdb) print newstr
    $2 = 0x804a028 "Hello world"
The exclamation point is missing from the string we see on reaching line 51 of the code, so the bug probably occurs at line 49, but what can it be? In the call to snprintf(), we requested that at most p->len bytes be copied into newstr. The value of p->len was confirmed to be 12, and the text "Hello world!" has 12 characters. We didn't tell snprintf() to copy the terminating null character in the source string. But then shouldn't we have gotten a malformed string, with an exclamation point in the last position and no null?
This is one of the great things about snprintf(). As long as the size you give it is nonzero, it always copies a terminating null character into the target. If you goof and specify a maximum number of characters to copy that is less than the actual number of characters in the source (as we did here), snprintf() will copy however many characters it can, but the last character written to the target is guaranteed to be a null character. To fix our mistake we need to tell snprintf() to copy enough bytes to hold the source string's text and the terminating null.
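The effect of the size argument can be seen by joining the same two strings with a limit of 12 bytes and then 13 bytes. This is a minimal sketch; join_limited() is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: joins a and b into dst, writing at most size bytes
 * including the terminating null, and returns the length of the text that
 * was actually stored. */
size_t join_limited(char *dst, size_t size, const char *a, const char *b)
{
    assert(dst != NULL && size > 0);

    snprintf(dst, size, "%s%s", a, b);

    /* Safe to call strlen(): snprintf() null-terminated dst because
     * size is nonzero. */
    return strlen(dst);
}
```

With a limit of 12, joining "Hello" and " world!" stores the 11 characters of "Hello world" plus the null; with a limit of 13, all 12 characters of "Hello world!" fit.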
So line 49 needs to be changed. Here's the complete, fixed function:
    CString *Append_Chars_To_CString(CString *p, char *str)
    {
        char *newstr = malloc(p->len + strlen(str) + 1);
        p->len = p->len + strlen(str);

        // Create the new string to replace p->str
        snprintf(newstr, p->len + 1, "%s%s", p->str, str);

        // Free old string and make CString point to the new string
        free(p->str);
        p->str = newstr;

        return p;
    }
Let's recompile the fixed code and run the program:
    (gdb) run
    `cstring' has changed; re-reading symbols.
    Starting program: cstring
    Init:
     str: `Hello!' len: 6
    Chomp '!':
     str:`Hello' len: 5
    Append:
     str: `Hello world!' len: 12

    Program exited normally.
    (gdb)
Looks good!
We've covered quite a bit of territory and encountered some difficult concepts, but it was well worth it. Even if our buggy CString implementation was a little contrived, our debugging session was fairly realistic and encompassed many aspects of debugging:
The Principle of Confirmation
Using core files for post-mortem analysis of a process that crashed
Correcting, compiling, and re-running a program without ever leaving GDB
The inadequacy of printf()-style debugging
Using good old fashioned brain power—there's no substitute for that
If you come from a printf()-style debugging background, you may want to think about how much more difficult it would have been to track down some of these bugs using printf(). Diagnostic code using printf() has its place in debugging, but as a general-purpose "tool," it is woefully inadequate and inefficient for tracking down most of the bugs that occur in real code.