Assembly Language

GDB and DDD can be extremely useful in debugging assembly language code, but there are a number of special considerations to keep in mind. This section describes them.

Take as an example the code in file testff.s:

# the subroutine findfirst(v,w,b) finds the first instance of a value v
# in a block of w consecutive words of memory beginning at b, returning
# either the index of the word where v was found (0, 1, 2, ...) or -1 if
# v was not found; beginning with _start, we have a short test of the
# subroutine

.data # data segment
x:
      .long  1
      .long  5
      .long  3
      .long  168
      .long  8888
.text # code segment
.globl _start # required
_start: # required to use this label unless special action taken
      # push the arguments on the stack, then make the call
      push $x+4 # start search at the 5
      push $168 # search for 168 (deliberately out of order)
      push $4 # search 4 words
      call findfirst
done:
      movl %edi, %edi # dummy instruction for breakpoint
findfirst:
      # finds first instance of a specified value in a block of words
      # EBX will contain the value to be searched for
      # ECX will contain the number of words to be searched
      # EAX will point to the current word to search
      # return value (EAX) will be index of the word found (-1 if not found)
      # fetch the arguments from the stack
      movl 4(%esp), %ebx
      movl 8(%esp), %ecx
      movl 12(%esp), %eax
      movl %eax, %edx # save block start location
      # top of loop; compare the current word to the search value
top:  cmpl (%eax), %ebx
      jz found
      decl %ecx # decrement counter of number of words left to search
      jz notthere # if counter has reached 0, the search value isn't there
      addl $4, %eax # otherwise, go on to the next word
      jmp top
found:
      subl %edx, %eax # get offset from start of block
      shrl $2, %eax # divide by 4, to convert from byte offset to index
      ret
notthere:
      movl $-1, %eax
      ret

This is Linux Intel assembly language, using the AT&T syntax, but users familiar with other Intel syntaxes should find the code easy to follow. (The GDB command set disassembly-flavor intel causes GDB to display the output of its disassemble command in Intel syntax, which is like the syntax used by the NASM assembler, for example. By the way, since this is a Linux platform, the program runs in the Intel CPU's 32-bit flat mode.)

As indicated by the comments, the subroutine findfirst finds the first occurrence of a specified value within a specified block of consecutive words of memory. The return value of the subroutine is the index (0, 1, 2, ...) of the word in which the value was found, or -1 if it was not found.
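For readers who find it easier to check the intended behavior in a higher-level language, here is a minimal C sketch of what findfirst is meant to compute. The name findfirst_ref is hypothetical and is not part of testff.s; the sketch models only the return value, not the register usage or the calling sequence.

```c
#include <stdint.h>

/* Hypothetical reference model of findfirst(v, w, b): return the index
   of the first word equal to v among the w words starting at b, or -1
   if v does not occur there.
   Note one difference from the assembly loop: the assembly version
   always examines at least one word, whereas this sketch returns -1
   immediately when w is 0. */
int32_t findfirst_ref(int32_t v, int32_t w, const int32_t *b)
{
    for (int32_t i = 0; i < w; i++) {
        if (b[i] == v)
            return i;   /* index relative to the start of the block */
    }
    return -1;          /* v was not found in the block */
}
```

With the data from testff.s, a correct call that searches for 168 in the four words beginning at the 5 should yield index 2, since the block is 5, 3, 168, 8888.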

The subroutine expects the arguments to be placed on the stack, so that the stack looks like this upon entry:

address of the start of the block to be searched
number of words in the block
value to be searched for
return address

Note

Intel stacks grow downward, that is, toward address 0 in memory. Words with smaller addresses appear further down in the picture.
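The relationship between push order and the offsets used inside the subroutine can be sketched with a toy model in C. This is only an illustration under simplifying assumptions: the names toy_stack, toy_init, toy_push, and toy_at are hypothetical, the stack is a small array of 32-bit words, and no overflow checking is done.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical toy model of a downward-growing stack of 32-bit words. */
struct toy_stack {
    uint32_t mem[STACK_WORDS];
    int sp;   /* index of the word at the top of the stack;
                 STACK_WORDS when the stack is empty */
};

/* Start with an empty stack: sp is just past the highest word. */
void toy_init(struct toy_stack *s)
{
    s->sp = STACK_WORDS;
}

/* Like push: move the stack pointer toward index 0, then store. */
void toy_push(struct toy_stack *s, uint32_t word)
{
    s->sp -= 1;
    s->mem[s->sp] = word;
}

/* Read the word at byte offset off from the stack pointer,
   in the manner of the operand off(%esp). */
uint32_t toy_at(const struct toy_stack *s, int off)
{
    return s->mem[s->sp + off / 4];
}
```

In this model, pushing the block address, then the number of words, then the search value, and finally the return address (which is what call does) leaves the return address at offset 0, the search value at offset 4, the number of words at offset 8, and the block address at offset 12. Those are the offsets from which findfirst fetches its arguments.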

To introduce a bug that we can use GDB to find, we deliberately scrambled the elements in the calling sequence in the "main" program:

push $x+4  # start search at the 5
push $168  # search for 168 (deliberately out of order)
push $4  # search 4 words

The instructions preceding the call should instead be

push $x+4  # start search at the 5
push $4  # search 4 words
push $168  # search for 168

Just as you use the -g option when compiling C/C++ code for use with GDB/DDD, here at the assembly level you use --gstabs when assembling:

$ as -a --gstabs -o testff.o testff.s

This produces an object file testff.o. Because of the -a option, the assembler also prints a listing that shows the assembly source code side by side with the corresponding machine code. The listing also shows offsets of data items and other information that is potentially useful for the debugging process.

We then link:

$ ld testff.o

This produces an executable with the default name a.out.

Let's run this code under GDB:

(gdb) b done
Breakpoint 1 at 0x8048085: file testff.s, line 18.
(gdb) r
Starting program: /debug/a.out
Breakpoint 1, done () at testff.s:18
18              movl %edi, %edi  # dummy for breakpoint
Current language:  auto; currently asm
(gdb) p $eax
$1 = -1

As you can see here, the registers can be referred to via dollar sign prefixes, in this case $eax for the EAX register. Unfortunately, the value in that register is -1, indicating that the desired value, 168, was not found in the specified block.

When debugging assembly language programs, one of the first things to do is to check the stack for accuracy. So, let's set a breakpoint at the subroutine and then inspect the stack when we get there:

(gdb) b findfirst
Breakpoint 2 at 0x8048087: file testff.s, line 25.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /debug/a.out
Breakpoint 2, findfirst () at testff.s:25
25              movl 4(%esp), %ebx
(gdb) x/4w $esp
0xbfffd9a0:   0x08048085   0x00000004   0x000000a8   0x080490b4

The stack is of course part of memory, so to inspect it you must use the GDB x command, which examines memory. Here, we asked GDB to display the four words starting at the location indicated by the stack pointer ESP (note that the picture of the stack above shows four words). The x command displays memory in order of increasing addresses. That is exactly what you want: since on the Intel architecture, as on many others, the stack grows toward 0, the word at the top of the stack is displayed first, followed by the words that were pushed before it.

You see from the picture of the stack shown above that the first word should be the return address. This expectation could be checked in various ways. One approach would be to use GDB's disassemble command, which lists assembly language instructions (reverse-translated from the machine code) and their addresses. You could then check whether the contents of the first word on the stack match the address of the instruction that follows the call. (Here the label done immediately follows the call, and GDB reported its address as 0x8048085 when the first breakpoint was set.)

You'll see that it does. However, you'll find that the second word, which ought to be the value to be searched for (168), is in fact the size of the search block (4), while the third word, 0x000000a8, is 168. From this information you'll quickly realize that we accidentally switched the two push instructions before the call.
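The same check can be written down as a short C sketch that labels the four words displayed by x/4w $esp according to the layout findfirst expects. The names findfirst_frame, decode_frame, and args_look_swapped are hypothetical helpers introduced only for illustration; the four input words are meant to be copied by hand from the GDB output.

```c
#include <stdint.h>

/* Hypothetical view of the four words findfirst expects on entry. */
struct findfirst_frame {
    uint32_t return_address;  /* 0(%esp)  */
    uint32_t value;           /* 4(%esp): value to be searched for */
    uint32_t nwords;          /* 8(%esp): number of words in the block */
    uint32_t block_start;     /* 12(%esp): address of the start of the block */
};

/* Label four words as displayed by x/4w $esp, lowest address first. */
struct findfirst_frame decode_frame(const uint32_t words[4])
{
    struct findfirst_frame f;
    f.return_address = words[0];
    f.value          = words[1];
    f.nwords         = words[2];
    f.block_start    = words[3];
    return f;
}

/* Return nonzero if the value and word-count slots hold each other's
   intended contents, which is the symptom of swapped push instructions. */
int args_look_swapped(const struct findfirst_frame *f,
                      uint32_t intended_value, uint32_t intended_nwords)
{
    return f->value == intended_nwords && f->nwords == intended_value;
}
```

Feeding in the words 0x08048085, 0x00000004, 0x000000a8, and 0x080490b4 from the session above gives a value slot of 4 and a word-count slot of 168, so args_look_swapped reports the swap when told that the intended value was 168 and the intended count was 4.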