The Almighty Breakpoint

Now that we have a functional debugging core, it's time to add breakpoints. Using the information from Chapter 2, we will implement soft breakpoints, hardware breakpoints, and memory breakpoints. We will also develop special handlers for each type of breakpoint and show how to cleanly resume the process after a breakpoint has been hit.

In order to place soft breakpoints, we need to be able to read and write into a process's memory. This is done via the ReadProcessMemory()[16] and WriteProcessMemory()[17] functions. They have similar prototypes:

BOOL WINAPI ReadProcessMemory(
    HANDLE hProcess,
    LPCVOID lpBaseAddress,
    LPVOID lpBuffer,
    SIZE_T nSize,
    SIZE_T* lpNumberOfBytesRead
);

BOOL WINAPI WriteProcessMemory(
    HANDLE hProcess,
    LPVOID lpBaseAddress,
    LPCVOID lpBuffer,
    SIZE_T nSize,
    SIZE_T* lpNumberOfBytesWritten
);

Both of these calls allow the debugger to inspect and alter the debuggee's memory. The parameters are straightforward; lpBaseAddress is the address where you wish to start reading or writing. The lpBuffer parameter is a pointer to the data that you are either reading or writing, and the nSize parameter is the total number of bytes you wish to read or write.

Using these two function calls, we can enable our debugger to use soft breakpoints quite easily. Let's modify our core debugging class to support the setting and handling of soft breakpoints.

...
class debugger():

    def __init__(self):
        self.h_process         =     None
        self.pid               =     None
        self.debugger_active   =     False
        self.h_thread          =     None
        self.context           =     None
        self.breakpoints       =     {}
...
    def read_process_memory(self,address,length):
        data         = ""
        read_buf     = create_string_buffer(length)
        count        = c_ulong(0)


        if not kernel32.ReadProcessMemory(self.h_process,
                                          address,
                                          read_buf,
                                          length,
                                          byref(count)):
            return False

        else:

            data    += read_buf.raw
            return data

    def write_process_memory(self,address,data):

        count  = c_ulong(0)
        length = len(data)

        c_data = c_char_p(data[count.value:])

        if not kernel32.WriteProcessMemory(self.h_process,
                                           address,
                                           c_data,
                                           length,
                                           byref(count)):
            return False
        else:
            return True

    def bp_set(self,address):

        if not self.breakpoints.has_key(address):
            try:
                # store the original byte so it can be restored later
                original_byte = self.read_process_memory(address, 1)
                if original_byte is False:
                    return False

                # write the INT3 opcode
                if not self.write_process_memory(address, "\xCC"):
                    return False

                # register the breakpoint in our internal list
                self.breakpoints[address] = (address, original_byte)
            except:
                return False

        return True
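
The bookkeeping behind bp_set() can be tried without a live debuggee. What follows is only a minimal sketch: it assumes a bytearray standing in for the target's memory, so the "addresses" are just offsets into that buffer, and the names set_soft_bp, clear_soft_bp, and fake_memory are hypothetical and not part of my_debugger.py.

```python
# Stand-alone model of soft-breakpoint bookkeeping.
# Assumption: a bytearray stands in for the debuggee's address space, so
# "addresses" here are offsets into that buffer, not real virtual addresses.

INT3 = 0xCC  # one-byte x86 breakpoint opcode


def set_soft_bp(memory, breakpoints, address):
    """Save the original byte at address, then overwrite it with INT3."""
    if address in breakpoints:
        return True  # already set; keep the byte saved the first time
    if not 0 <= address < len(memory):
        return False
    breakpoints[address] = memory[address]
    memory[address] = INT3
    return True


def clear_soft_bp(memory, breakpoints, address):
    """Put the saved byte back and forget the breakpoint."""
    if address not in breakpoints:
        return False
    memory[address] = breakpoints.pop(address)
    return True


# push ebp; mov ebp, esp; ret
fake_memory = bytearray(b"\x55\x8b\xec\xc3")
breakpoints = {}

set_soft_bp(fake_memory, breakpoints, 0)
patched = bytes(fake_memory)    # first byte is now 0xCC

clear_soft_bp(fake_memory, breakpoints, 0)
restored = bytes(fake_memory)   # original bytes are back
```

The early return for an address that is already registered mirrors the has_key() check in bp_set(): reading the byte a second time would save 0xCC as the "original" and make the breakpoint impossible to remove cleanly.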

Now that we have support for soft breakpoints, we need to find a good place to put one. In general, breakpoints are set on a function call of some type; for the purpose of this exercise we will use our good friend printf() as the target function we wish to trap. Windows gives us a very clean method for determining the virtual address of a function in the form of GetProcAddress(),[18] which is also exported from kernel32.dll. Its main requirement is a handle to the module (a .dll or .exe file) that contains the function we are interested in; we obtain this handle by using GetModuleHandle().[19] The function prototypes for GetProcAddress() and GetModuleHandle() look like this:

FARPROC WINAPI GetProcAddress(
    HMODULE hModule,
    LPCSTR lpProcName
);

HMODULE WINAPI GetModuleHandle(
    LPCSTR lpModuleName
);

This is a pretty straightforward chain of events: We obtain a handle to the module and then search for the address of the exported function we want. Let's add a helper function in our debugger to do just that. Again back to my_debugger.py.
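
The body of the helper is not reproduced in this section, so here is a minimal sketch of how the two calls might be chained. It is an assumption, not the listing from my_debugger.py: the k32 parameter and load_kernel32() are hypothetical names introduced so the lookup logic can be exercised without Windows, and the explicit restype declarations are there because both calls return pointer-sized values.

```python
import ctypes


def load_kernel32():
    """Windows only: load kernel32 and declare pointer-sized return types."""
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.GetModuleHandleA.argtypes = [ctypes.c_char_p]
    k32.GetModuleHandleA.restype = ctypes.c_void_p
    k32.GetProcAddress.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    k32.GetProcAddress.restype = ctypes.c_void_p
    return k32


def func_resolve(k32, dll, function):
    """Return the address of an exported function, or None on failure.

    k32 is any object exposing GetModuleHandleA and GetProcAddress
    (hypothetical parameter; the harness in the text calls the method
    on the debugger object instead).
    """
    # GetModuleHandle only finds modules already loaded in the CALLING
    # process, so a missing module yields a NULL handle.
    handle = k32.GetModuleHandleA(dll)
    if not handle:
        return None

    address = k32.GetProcAddress(handle, function)
    return address or None


# On Windows this could be used as:
#   func_resolve(load_kernel32(), b"msvcrt.dll", b"printf")
```

Note that the lookup happens inside the debugger's own process. Using the result as a breakpoint address in the debuggee relies on the module being mapped at the same base address in both processes, which is an assumption worth keeping in mind. The test harness that follows calls the helper as debugger.func_resolve("msvcrt.dll","printf").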

import my_debugger

debugger = my_debugger.debugger()

pid = raw_input("Enter the PID of the process to attach to: ")

debugger.attach(int(pid))

printf_address = debugger.func_resolve("msvcrt.dll","printf")

print "[*] Address of printf: 0x%08x" % printf_address

debugger.bp_set(printf_address)

debugger.run()

To test this, fire up printf_loop.py in a command-line console. Take note of the python.exe PID using Windows Task Manager. Now run your my_test.py script, and enter the PID. You should see output like that shown in Example 3-3.


We can first see that printf() resolves to 0x77c4186a, and so we set our breakpoint on that address. The first exception that is caught is the Windows-driven breakpoint, and when the second exception comes along, we see that the exception address is 0x77c4186a, the address of printf(). After the breakpoint is handled, the process should resume its loop. Our debugger now supports soft breakpoints, so let's move on to hardware breakpoints.

The second type of breakpoint is the hardware breakpoint, which involves setting certain bits in the CPU's debug registers. We covered this process extensively in the previous chapter, so let's get to the implementation details. The important thing to remember when managing hardware breakpoints is tracking which of the four available debug registers are free for use and which are already being used. We have to ensure that we always use an empty slot, or we can run into problems where breakpoints aren't hit where we expect them to be.

Let's start by enumerating all of the threads in the process and obtaining a CPU context record for each of them. Using the retrieved context record, we then modify one of the registers between DR0 and DR3 (depending on which are free) to contain the desired breakpoint address. We then flip the appropriate bits in the DR7 register to enable the breakpoint and set its type and length.
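
The DR7 bit arithmetic described above can be worked through on plain integers before touching a real thread context. The sketch below is illustrative only: the function names are hypothetical, and the HW_* values are assumed to match the x86 R/W encoding (00 for execute, 01 for write, 11 for read or write), since my_debugger_defines.py is not shown in this section.

```python
# DR7 layout used here (x86):
#   bits 0-7   : enable bits, two per slot (local enable at bit slot * 2)
#   bits 16-31 : four bits per slot, R/W in the low two, LEN in the high two
# Assumption: these constants mirror the values in my_debugger_defines.py.
HW_EXECUTE = 0x0
HW_WRITE = 0x1
HW_ACCESS = 0x3

# Length in bytes -> LEN field encoding (same effect as "length -= 1").
LEN_ENCODING = {1: 0x0, 2: 0x1, 4: 0x3}


def dr7_with_breakpoint(dr7, slot, length, condition):
    """Return dr7 with the given slot enabled and configured."""
    if slot not in (0, 1, 2, 3):
        raise ValueError("slot must be 0-3")
    if length not in LEN_ENCODING:
        raise ValueError("length must be 1, 2, or 4 bytes")
    if condition not in (HW_EXECUTE, HW_WRITE, HW_ACCESS):
        raise ValueError("unknown condition")

    # Clear stale condition/length bits left over from an earlier use.
    dr7 &= ~(0xF << (16 + slot * 4))
    # Set the local-enable bit for this slot.
    dr7 |= 1 << (slot * 2)
    # Set the condition (R/W) and length (LEN) fields.
    dr7 |= condition << (16 + slot * 4)
    dr7 |= LEN_ENCODING[length] << (18 + slot * 4)
    return dr7


def dr7_without_breakpoint(dr7, slot):
    """Return dr7 with the slot disabled and its fields cleared."""
    dr7 &= ~(1 << (slot * 2))
    dr7 &= ~(0xF << (16 + slot * 4))
    return dr7


# A 1-byte execute breakpoint in slot 0, then a 4-byte write
# breakpoint in slot 1 on top of it.
dr7 = dr7_with_breakpoint(0, 0, 1, HW_EXECUTE)
dr7 = dr7_with_breakpoint(dr7, 1, 4, HW_WRITE)
print("DR7 = 0x%08x" % dr7)
```

Execute breakpoints are expected to use a length of one byte, which is why the printf() harness later in this section passes 1 together with HW_EXECUTE.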

Once we have created the routine to set the breakpoint, we need to modify our main debug event loop so that it can appropriately handle the exception that is thrown by a hardware breakpoint. We know that a hardware breakpoint triggers an INT1 (or single-step event), so we simply add another exception handler to our debug loop. Let's start with setting the breakpoint.

...
class debugger():
    def __init__(self):
        self.h_process       =     None
        self.pid             =     None
        self.debugger_active =     False
        self.h_thread        =     None
        self.context         =     None
        self.breakpoints     =     {}
        self.first_breakpoint=     True
        self.hardware_breakpoints = {}
...
    def bp_set_hw(self, address, length, condition):

        # Check for a valid length value
        if length not in (1, 2, 4):
            return False
        else:
            length -= 1

        # Check for a valid condition
        if condition not in (HW_ACCESS, HW_EXECUTE, HW_WRITE):
            return False

        # Check for available slots
        if not self.hardware_breakpoints.has_key(0):
            available = 0
        elif not self.hardware_breakpoints.has_key(1):
            available = 1
        elif not self.hardware_breakpoints.has_key(2):
            available = 2
        elif not self.hardware_breakpoints.has_key(3):
            available = 3
        else:
            return False

        # We want to set the debug register in every thread
        for thread_id in self.enumerate_threads():
            context = self.get_thread_context(thread_id=thread_id)

            # Enable the appropriate flag in the DR7
            # register to set the breakpoint
            context.Dr7 |= 1 << (available * 2)

            # Save the address of the breakpoint in the
            # free register that we found
            if   available == 0:
                context.Dr0 = address
            elif available == 1:
                context.Dr1 = address
            elif available == 2:
                context.Dr2 = address
            elif available == 3:
                context.Dr3 = address

            # Set the breakpoint condition
            context.Dr7 |= condition << ((available * 4) + 16)

            # Set the length
            context.Dr7 |= length << ((available * 4) + 18)

            # Set thread context with the break set
            h_thread = self.open_thread(thread_id)
            kernel32.SetThreadContext(h_thread,byref(context))

        # update the internal hardware breakpoint array at the used
        # slot index.
        self.hardware_breakpoints[available] = (address,length,condition)

        return True

You can see that we select an open slot to store the breakpoint by checking the debugger's hardware_breakpoints dictionary. Once we have obtained a free slot, we then assign the breakpoint address to the slot and update the DR7 register with the appropriate flags that will enable the breakpoint. Now that we have the mechanism to support setting the breakpoints, let's update our event loop and add an exception handler to support the INT1 interrupt.

...
class debugger():
...
    def get_debug_event(self):

        if self.exception == EXCEPTION_ACCESS_VIOLATION:
            print "Access Violation Detected."
        elif self.exception == EXCEPTION_BREAKPOINT:
            continue_status = self.exception_handler_breakpoint()
        elif self.exception == EXCEPTION_GUARD_PAGE:
            print "Guard Page Access Detected."
        elif self.exception == EXCEPTION_SINGLE_STEP:
            continue_status = self.exception_handler_single_step()
        ...
    def exception_handler_single_step(self):

        # Comment from PyDbg:
        # determine if this single step event occurred in reaction to a
        # hardware breakpoint and grab the hit breakpoint.
        # according to the Intel docs, we should be able to check for
        # the BS flag in Dr6. but it appears that Windows
        # isn't properly propagating that flag down to us.
        if self.context.Dr6 & 0x1 and self.hardware_breakpoints.has_key(0):
            slot = 0
        elif self.context.Dr6 & 0x2 and self.hardware_breakpoints.has_key(1):
            slot = 1
        elif self.context.Dr6 & 0x4 and self.hardware_breakpoints.has_key(2):
            slot = 2
        elif self.context.Dr6 & 0x8 and self.hardware_breakpoints.has_key(3):
            slot = 3
        else:
            # This wasn't an INT1 generated by a hw breakpoint,
            # so there is no slot to remove
            return DBG_EXCEPTION_NOT_HANDLED

        # Now let's remove the breakpoint from the list
        continue_status = DBG_EXCEPTION_NOT_HANDLED
        if self.bp_del_hw(slot):
            continue_status = DBG_CONTINUE
            print "[*] Hardware breakpoint removed."

        return continue_status

    def bp_del_hw(self,slot):

        # Disable the breakpoint for all active threads
        for thread_id in self.enumerate_threads():

            context = self.get_thread_context(thread_id=thread_id)

            # Reset the flags to remove the breakpoint
            context.Dr7 &= ~(1 << (slot * 2))

            # Zero out the address
            if   slot == 0:
                context.Dr0 = 0x00000000
            elif slot == 1:
                context.Dr1 = 0x00000000
            elif slot == 2:
                context.Dr2 = 0x00000000
            elif slot == 3:
                context.Dr3 = 0x00000000

            # Remove the condition flag
            context.Dr7 &= ~(3 << ((slot * 4) + 16))

            # Remove the length flag
            context.Dr7 &= ~(3 << ((slot * 4) + 18))

            # Reset the thread's context with the breakpoint removed
            h_thread = self.open_thread(thread_id)
            kernel32.SetThreadContext(h_thread,byref(context))

        # remove the breakpoint from the internal list.
        del self.hardware_breakpoints[slot]

        return True

This process is fairly straightforward; when an INT1 is fired we check to see if any of the debug registers are set up with a hardware breakpoint. If the debugger detects that there is a hardware breakpoint at the exception address, it zeros out the flags in DR7 and resets the debug register that contains the breakpoint address. Let's see this process in action by modifying our my_test.py script to use hardware breakpoints on our printf() call.
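
The slot check in the handler boils down to testing the low four bits of DR6 against the slots the debugger itself has claimed. As a minimal sketch on plain integers (hit_slots is a hypothetical name, and the sample DR6 values are made up for illustration):

```python
def hit_slots(dr6, active_slots):
    """Return the slots flagged in DR6 that the debugger actually owns.

    dr6          -- value of the DR6 status register
    active_slots -- iterable of slot numbers in use (for example, the
                    keys of a hardware_breakpoints-style dictionary)
    """
    # Bits 0-3 of DR6 (B0-B3) report which of DR0-DR3 triggered.
    return [slot for slot in sorted(active_slots) if dr6 & (1 << slot)]


# Slot 0 and slot 2 are in use; only B0 is set in this sample value.
hardware_breakpoints = {0: (0x77c4186a, 0, 0), 2: (0x00401000, 3, 1)}
print(hit_slots(0xFFFF0FF1, hardware_breakpoints))
```

Filtering against the slots we own matters because a bit in DR6 can also be set for a debug register that another component configured; an empty result corresponds to the DBG_EXCEPTION_NOT_HANDLED branch of the handler.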

import my_debugger
from my_debugger_defines import *

debugger = my_debugger.debugger()

pid = raw_input("Enter the PID of the process to attach to: ")

debugger.attach(int(pid))

printf = debugger.func_resolve("msvcrt.dll","printf")
print "[*] Address of printf: 0x%08x" % printf

debugger.bp_set_hw(printf,1,HW_EXECUTE)
debugger.run()

This harness simply sets a breakpoint on the printf() call whenever it gets executed. The length of the breakpoint is only a single byte. You will notice that in this harness we imported the my_debugger_defines.py file; this is so we can access the HW_EXECUTE constant, which provides a little code clarity. When you run the script you should see output similar to Example 3-4.


You can see from the order of events that an exception gets thrown, and our handler removes the breakpoint. The loop should continue to execute after the handler is finished. Now that we have support for soft and hardware breakpoints, let's wrap up our lightweight debugger with memory breakpoints.

The final feature that we are going to implement is the memory breakpoint. First, we are simply going to query a section of memory to determine where its base address is (where the page starts in virtual memory). Once we have determined the page size, we will set the permissions of that page so that it acts as a guard page. When the CPU attempts to access this memory, an EXCEPTION_GUARD_PAGE exception will be thrown. Using a specific handler for this exception, we revert to the original page permissions and continue execution.

In order for us to properly calculate the size of the page we are manipulating, we have to first query the operating system itself to retrieve the default page size. This is done by executing the GetSystemInfo()[20] function, which populates a SYSTEM_INFO[21] structure. This structure contains a dwPageSize member, which gives us the correct page size for the system. We will implement this first step when our debugger() class is first instantiated.

...
class debugger():

    def __init__(self):
        self.h_process       =     None
        self.pid             =     None
        self.debugger_active =     False
        self.h_thread        =     None
        self.context         =     None
        self.breakpoints     =     {}
        self.first_breakpoint=     True
        self.hardware_breakpoints = {}

        # Here let's determine and store
        # the default page size for the system
        system_info = SYSTEM_INFO()
        kernel32.GetSystemInfo(byref(system_info))
        self.page_size = system_info.dwPageSize
    ...

Now that we have captured the default page size, we are ready to begin querying and manipulating page permissions. The first step is to query the page that contains the address of the memory breakpoint we wish to set. This is done by using the VirtualQueryEx()[22] function call, which populates a MEMORY_BASIC_INFORMATION[23] structure with the characteristics of the memory page we queried. Following are the definitions for both the function and the resulting structure:

SIZE_T WINAPI VirtualQueryEx(
    HANDLE hProcess,
    LPCVOID lpAddress,
    PMEMORY_BASIC_INFORMATION lpBuffer,
    SIZE_T dwLength
);

typedef struct _MEMORY_BASIC_INFORMATION {
    PVOID BaseAddress;
    PVOID AllocationBase;
    DWORD AllocationProtect;
    SIZE_T RegionSize;
    DWORD State;
    DWORD Protect;
    DWORD Type;
} MEMORY_BASIC_INFORMATION, *PMEMORY_BASIC_INFORMATION;

Once the structure has been populated, we will use the BaseAddress value as the starting point to begin setting the page permission. The function that actually sets the permission is VirtualProtectEx(),[24] which has the following prototype:

BOOL WINAPI VirtualProtectEx(
  HANDLE hProcess,
  LPVOID lpAddress,
  SIZE_T dwSize,
  DWORD flNewProtect,
  PDWORD lpflOldProtect
);

So let's get down to code. We are going to keep a list of guard pages that we have explicitly set, as well as a dictionary of memory breakpoint addresses that our exception handler will use when the EXCEPTION_GUARD_PAGE exception gets thrown. Then we set the permissions on the address and surrounding memory pages (if the address straddles two or more memory pages).
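
Working out which pages a breakpoint touches is plain address arithmetic, so it can be checked in isolation. The sketch below is a stand-alone illustration: pages_for_range is a hypothetical helper, and the 4096-byte page size is assumed only for the example (the debugger reads the real value from dwPageSize).

```python
def pages_for_range(address, size, page_size):
    """Return the base address of every page touched by the bytes
    [address, address + size)."""
    if size <= 0:
        return []
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page_size must be a positive power of two")

    # Round the first and last byte down to their page boundaries.
    first = address & ~(page_size - 1)
    last = (address + size - 1) & ~(page_size - 1)
    return list(range(first, last + page_size, page_size))


PAGE_SIZE = 4096  # assumed for this example only

# A 32-byte breakpoint that starts 16 bytes before a page boundary
# straddles two pages.
for base in pages_for_range(0x01000FF0, 0x20, PAGE_SIZE):
    print("guard page at 0x%08x" % base)
```

A range that ends exactly on a page boundary does not touch the following page, which is why the last byte (address + size - 1) rather than address + size is rounded down.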

...
class debugger():

    def __init__(self):
        ...
        self.guarded_pages      = []
        self.memory_breakpoints = {}
    ...

    def bp_set_mem (self, address, size):

        mbi = MEMORY_BASIC_INFORMATION()

        # If our VirtualQueryEx() call doesn't return
        # a full-sized MEMORY_BASIC_INFORMATION
        # then return False
        if kernel32.VirtualQueryEx(self.h_process,
                                   address,
                                   byref(mbi),
                                   sizeof(mbi)) < sizeof(mbi):

            return False

        current_page = mbi.BaseAddress

        # We will set the permissions on all pages that are
        # affected by our memory breakpoint. The breakpoint covers
        # the bytes [address, address + size), so we stop before any
        # page that starts at or after address + size.
        while current_page < address + size:

            # Add the page to the list; this will
            # differentiate our guarded pages from those
            # that were set by the OS or the debuggee process
            self.guarded_pages.append(current_page)

            # Guard exactly one page per iteration
            old_protection = c_ulong(0)
            if not kernel32.VirtualProtectEx(self.h_process,
                                             current_page,
                                             self.page_size,
                                             mbi.Protect | PAGE_GUARD,
                                             byref(old_protection)):

                return False

            # Increase our range by the size of the
            # default system memory page size
            current_page += self.page_size

        # Add the memory breakpoint to our global list
        self.memory_breakpoints[address] = (address, size, mbi)

        return True

Now you have the ability to set a memory breakpoint. If you try it out in its current state by using our printf() looper, you should get output that simply says Guard Page Access Detected. The nice thing is that when a guard page is accessed and the exception is thrown, the operating system actually removes the protection on that page of memory and allows you to continue execution. This saves you from creating a specific handler to deal with it; however, you could build logic into the existing debug loop to perform certain actions when the breakpoint is hit, such as restoring the breakpoint, reading memory at the location where the breakpoint is set, pouring you a fresh coffee, or whatever you please.
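
Because guard protection applies to whole pages and is removed after the first hit, any extra logic in the debug loop first has to decide what a given guard-page exception actually means. The following is a minimal sketch of that decision, under the assumption that the handler already knows the faulting address (how it is obtained is outside this sketch); classify_guard_hit and its return labels are hypothetical.

```python
def classify_guard_hit(fault_address, guarded_pages, memory_breakpoints,
                       page_size):
    """Decide what a guard-page exception at fault_address means.

    guarded_pages      -- page base addresses the debugger guarded itself
    memory_breakpoints -- dict of address -> (address, size, ...) entries
    Returns "foreign", "breakpoint", or "same_page".
    """
    page = fault_address & ~(page_size - 1)
    if page not in guarded_pages:
        # A guard page set by the OS or by the debuggee, not by us.
        return "foreign"

    for entry in memory_breakpoints.values():
        address, size = entry[0], entry[1]
        if address <= fault_address < address + size:
            # The access landed inside the watched bytes.
            return "breakpoint"

    # Our page, but the access was outside the watched bytes; a handler
    # would typically just re-arm the guard and continue.
    return "same_page"


guarded_pages = [0x01000000]
memory_breakpoints = {0x01000010: (0x01000010, 4, None)}

for addr in (0x01000012, 0x01000800, 0x02000000):
    result = classify_guard_hit(addr, guarded_pages,
                                memory_breakpoints, 4096)
    print("0x%08x -> %s" % (addr, result))
```

In both the "breakpoint" and "same_page" cases the guard has already been cleared by the operating system, so restoring the breakpoint means applying the guard protection to that page again.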



[16] See MSDN ReadProcessMemory Function (http://msdn2.microsoft.com/en-us/library/ms680553.aspx).

[17] See MSDN WriteProcessMemory Function (http://msdn2.microsoft.com/en-us/library/ms681674.aspx).

[18] See MSDN GetProcAddress Function (http://msdn2.microsoft.com/en-us/library/ms683212.aspx).

[19] See MSDN GetModuleHandle Function (http://msdn2.microsoft.com/en-us/library/ms683199.aspx).

[20] See MSDN GetSystemInfo Function (http://msdn2.microsoft.com/en-us/library/ms724381.aspx).

[22] See MSDN VirtualQueryEx Function (http://msdn2.microsoft.com/en-us/library/aa366907.aspx).

[23] See MSDN MEMORY_BASIC_INFORMATION Structure (http://msdn2.microsoft.com/en-us/library/aa366775.aspx).