Chapter 4. PYDBG—A PURE PYTHON WINDOWS DEBUGGER

If you've made it this far, then you should have a good understanding of how to use Python to construct a user-mode debugger for Windows. We'll now move on to learning how to harness the power of PyDbg, an open source Python debugger for Windows. PyDbg was released by Pedram Amini at Recon 2006 in Montreal, Quebec, as a core component in the PaiMei^[25] reverse engineering framework. PyDbg has been used in quite a few tools, including the popular proxy fuzzer Taof and a Windows driver fuzzer that I built called ioctlizer. We will start with extending breakpoint handlers and then move into more advanced topics such as handling application crashes and taking process snapshots. Some of the tools we'll build in this chapter can be used later on to support some of the fuzzers we are going to develop. Let's get on with it.

Extending Breakpoint Handlers

In the previous chapter we covered the basics of using event handlers to handle specific debugging events. With PyDbg it is quite easy to extend this basic functionality by implementing user-defined callback functions. With a user-defined callback, we can implement custom logic when the debugger receives a debugging event. The custom code can do a variety of things such as read certain memory offsets, set further breakpoints, or manipulate memory. Once the custom code has run, we return control to the debugger and allow it to resume the debuggee.

The PyDbg function to set soft breakpoints has the following prototype:

bp_set(address, description="",restore=True,handler=None)

The address parameter is the address where the soft breakpoint should be set; the description parameter is optional and can be used to uniquely name each breakpoint. The restore parameter determines whether the breakpoint should automatically be reset after it's handled, and the handler parameter specifies which function to call when this breakpoint is encountered. Breakpoint callback functions take only one parameter, which is an instance of the pydbg() class. All context, thread, and process information will already be populated in this class when it is passed to the callback function.
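
Before wiring a callback into a real target, it can help to see the dispatch pattern in isolation. The following is a toy model only: it is not PyDbg's implementation, and the names ToyDebugger, hit, current_address, and log_hit are hypothetical, invented for this illustration. It assumes nothing beyond the bp_set() prototype shown above and the fact that a callback receives the debugger instance as its single argument.

```python
# Toy model of breakpoint handler dispatch. This is NOT PyDbg's code; every
# name here (ToyDebugger, hit, current_address, log_hit) is hypothetical.
DBG_CONTINUE = 0x00010002  # the value Windows defines for DBG_CONTINUE


class ToyDebugger(object):
    def __init__(self):
        self.breakpoints = {}        # address -> (description, restore, handler)
        self.current_address = None  # stands in for the populated context

    def bp_set(self, address, description="", restore=True, handler=None):
        # Same parameter list as the prototype above
        self.breakpoints[address] = (description, restore, handler)

    def hit(self, address):
        """Simulate the debuggee reaching `address`."""
        if address not in self.breakpoints:
            return None
        description, restore, handler = self.breakpoints[address]
        if not restore:
            # One-shot breakpoint: forget it once it has fired
            del self.breakpoints[address]
        self.current_address = address
        if handler is None:
            return DBG_CONTINUE
        # The callback receives the debugger instance as its only argument
        return handler(self)


hits = []


def log_hit(dbg):
    hits.append(dbg.current_address)
    return DBG_CONTINUE


dbg = ToyDebugger()
dbg.bp_set(0x1000, description="persistent", handler=log_hit)
dbg.bp_set(0x2000, description="one-shot", restore=False, handler=log_hit)

for address in (0x1000, 0x2000, 0x1000, 0x2000):
    dbg.hit(address)

print([hex(a) for a in hits])
```

The second visit to 0x2000 is not logged because that breakpoint was set with restore=False, whereas the breakpoint at 0x1000 fires every time it is reached.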

Using our printf_loop.py script, let's implement a user-defined callback function. For this exercise, we will read the value of the counter that is used in the printf loop and replace it with a random number between 1 and 100. One neat thing to remember is that we are actually observing, recording, and manipulating live events inside the target process. This is truly powerful! Open a new Python script, name it printf_random.py, and enter the following code.

printf_random.py

from pydbg import *
from pydbg.defines import *

import struct
import random

# This is our user defined callback function
def printf_randomizer(dbg):

    # Read in the value of the counter at ESP + 0x8 as a DWORD
    parameter_addr = dbg.context.Esp + 0x8
    counter = dbg.read_process_memory(parameter_addr,4)

    # When we use read_process_memory, it returns a packed binary
    # string. We must first unpack it before we can use it further.
    counter = struct.unpack("L",counter)[0]
    print "Counter: %d" % int(counter)

    # Generate a random number and pack it into binary format
    # so that all four bytes of the DWORD are written back
    # into the process
    random_counter = random.randint(1,100)
    random_counter = struct.pack("L",random_counter)

    # Now swap in our random number and resume the process
    dbg.write_process_memory(parameter_addr,random_counter)

    return DBG_CONTINUE

# Instantiate the pydbg class
dbg = pydbg()

# Now enter the PID of the printf_loop.py process
pid = raw_input("Enter the printf_loop.py PID: ")

# Attach the debugger to that process
dbg.attach(int(pid))

# Set the breakpoint with the printf_randomizer function
# defined as a callback
printf_address = dbg.func_resolve("msvcrt","printf")
dbg.bp_set(printf_address,description="printf_address",handler=printf_randomizer)

# Resume the process
dbg.run()
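
The packing and unpacking steps in the callback are easy to get subtly wrong, so here is a minimal sketch of the round trip on its own, with no debugger involved. It assumes a 32-bit little-endian DWORD and therefore uses the explicit "<L" format; the native "L" format used in the listing is also 4 bytes on Windows, but it is 8 bytes on most 64-bit Linux builds. The values 4 and 77 are arbitrary stand-ins for the real and replacement counters.

```python
import struct

# A DWORD as it would come back from a raw memory read: four bytes,
# little-endian on x86. The value 4 stands in for the loop counter.
raw = struct.pack("<L", 4)
counter = struct.unpack("<L", raw)[0]   # unpack returns a tuple, hence [0]

# Packing a replacement value produces all four bytes of the DWORD
replacement = struct.pack("<L", 77)

# Slicing or indexing the packed result keeps only part of it. Writing a
# single byte back only gives the intended value when the replacement fits
# in one byte and the upper three bytes of the target are already zero.
first_byte_only = replacement[0:1]

print(counter)
print(len(replacement), len(first_byte_only))
```

Note that unpack() always returns a tuple, so indexing its result is correct, while pack() returns the byte string itself, so indexing its result discards data.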

Now run both the printf_loop.py and the printf_random.py scripts. The output should look similar to what is shown in Table 4-1.

Table 4-1. Output from the Debugger and the Manipulated Process

Output from Debugger	Output from Debugged Process
Enter the printf_loop.py PID: 3466	Loop iteration 0!
…	Loop iteration 1!
…	Loop iteration 2!
…	Loop iteration 3!
Counter: 4	Loop iteration 32!
Counter: 5	Loop iteration 39!
Counter: 6	Loop iteration 86!
Counter: 7	Loop iteration 22!
Counter: 8	Loop iteration 70!
Counter: 9	Loop iteration 95!
Counter: 10	Loop iteration 60!

You can see that the breakpoint was first hit when the loop counter was 4, because that is the first counter value the debugger recorded. You will also notice that the printf_loop.py script ran normally up to that point; instead of printing the number 4, it printed the number 32! The debugger records the real value of the counter and then overwrites it with a random number before the debugged process prints it. This is a simple yet powerful example of how you can extend a scriptable debugger to perform additional actions when debugging events occur. Now let's take a look at handling application crashes with PyDbg.

Access Violation Handlers

An access violation occurs when a process attempts to access memory it does not have permission to access, or attempts to access memory in a way that is not allowed (for example, writing to a read-only page). The faults that lead to access violations range from buffer overflows to improperly handled null pointers. From a security perspective, every access violation should be reviewed carefully, as the violation might be exploitable.

When an access violation occurs within a debugged process, the debugger is responsible for handling it. It is crucial that the debugger trap all relevant information, such as the stack frame, the registers, and the instruction that caused the violation. You can then use this information as a starting point for writing an exploit or creating a binary patch.

PyDbg has an excellent method for installing an access violation handler, as well as utility functions to output all of the pertinent crash information. Let's first create a test harness that will use the dangerous C function strcpy() to create a buffer overflow. Following the test harness, we will write a brief PyDbg script to attach to and handle the access violation. Let's start with the test script. Open a new file called buffer_overflow.py, and enter the following code.

buffer_overflow.py

from ctypes import *

msvcrt = cdll.msvcrt

# Give the debugger time to attach; raw_input() waits for Enter
raw_input("Once the debugger is attached, press Enter.")

# Create the 5-byte destination buffer
buffer = c_char_p("AAAAA")

# The overflow string
overflow = "A" * 100

# Run the overflow
msvcrt.strcpy(buffer, overflow)

Now that we have the test case built, open a new file called access_violation_handler.py, and enter the following code.

access_violation_handler.py

from pydbg import *
from pydbg.defines import *

# Utility libraries included with PyDbg
import utils

# This is our access violation handler
def check_accessv(dbg):

    # We skip first-chance exceptions so that the debuggee gets
    # a chance to handle the exception itself
    if dbg.dbg.u.Exception.dwFirstChance:
        return DBG_EXCEPTION_NOT_HANDLED

    crash_bin = utils.crash_binning.crash_binning()
    crash_bin.record_crash(dbg)
    print crash_bin.crash_synopsis()

    dbg.terminate_process()

    return DBG_EXCEPTION_NOT_HANDLED

pid = raw_input("Enter the Process ID: ")

dbg = pydbg()
dbg.attach(int(pid))
dbg.set_callback(EXCEPTION_ACCESS_VIOLATION,check_accessv)
dbg.run()

Now run the buffer_overflow.py file and take note of its PID; it will pause until you are ready to let it run. Execute the access_violation_handler.py file, and enter the PID of the test harness. Once you have the debugger attached, press Enter in the console where the harness is running, and you will see output similar to Example 4-1.

Example 4-1. Crash output using PyDbg crash binning utility

 python25.dll:1e071cd8 mov ecx,[eax+0x54] from thread 3376 caused access
  violation when attempting to read from 0x41414195

 CONTEXT DUMP
    EIP: 1e071cd8 mov ecx,[eax+0x54]
    EAX: 41414141 (1094795585) -> N/A
    EBX: 00b055d0 (  11556304) -> @U`" B`Ox,`O )Xb@|V`"L{O+H]$6 (heap)
    ECX: 0021fe90 (   2227856) -> !$4|7|4|@%,\!$H8|!OGGBG)00S\o (stack)
    EDX: 00a1dc60 (  10607712) -> V0`w`W (heap)
    EDI: 1e071cd0 ( 503782608) -> N/A
    ESI: 00a84220 (  11026976) -> AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA (heap)
    EBP: 1e1cf448 ( 505214024) -> enable() -> NoneEnable automa (stack)
    ESP: 0021fe74 (   2227828) -> 2? BUH` 7|4|@%,\!$H8|!OGGBG) (stack)
    +00: 00000000 (         0) -> N/A
    +04: 1e063f32 ( 503725874) -> N/A
    +08: 00a84220 (  11026976) -> AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA (heap)
    +0c: 00000000 (         0) -> N/A
    +10: 00000000 (         0) -> N/A
    +14: 00b055c0 (  11556288) -> @F@U`" B`Ox,`O )Xb@|V`"L{O+H]$ (heap)

 disasm around:
          0x1e071cc9 int3
          0x1e071cca int3
          0x1e071ccb int3
          0x1e071ccc int3
          0x1e071ccd int3
          0x1e071cce int3
          0x1e071ccf int3
          0x1e071cd0 push esi
          0x1e071cd1 mov esi,[esp+0x8]
          0x1e071cd5 mov eax,[esi+0x4]
          0x1e071cd8 mov ecx,[eax+0x54]
          0x1e071cdb test ch,0x40
          0x1e071cde jz 0x1e071cff
          0x1e071ce0 mov eax,[eax+0xa4]
          0x1e071ce6 test eax,eax
          0x1e071ce8 jz 0x1e071cf4
          0x1e071cea push esi
          0x1e071ceb call eax
          0x1e071ced add esp,0x4
          0x1e071cf0 test eax,eax
          0x1e071cf2 jz 0x1e071cff

 SEH unwind:
          0021ffe0 -> python.exe:1d00136c jmp [0x1d002040]
          ffffffff -> kernel32.dll:7c839aa8 push ebp

The output reveals many pieces of useful information. The first portion tells you which instruction caused the access violation as well as which module that instruction lives in. This information is useful for writing an exploit or if you are using a static analysis tool to determine where the fault is. The second portion is the context dump of all the registers; of particular interest is that we have overwritten EAX with 0x41414141 (0x41 is the hexadecimal value of the capital letter A). We can also see that the ESI register points to a string of A characters, as does the stack entry at ESP+08. The third section is a disassembly of the instructions before and after the faulting instruction, and the final section is the list of structured exception handling (SEH) handlers that were registered at the time of the crash.
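
As a quick sanity check, we can tie the faulting address in the first line of the crash output back to the register dump. The sketch below uses only the values printed in Example 4-1 (EAX and the 0x54 displacement from the faulting instruction) and assumes 32-bit little-endian byte order; it does not touch PyDbg at all.

```python
import struct

eax = 0x41414141       # EAX from the context dump
displacement = 0x54    # from the faulting instruction: mov ecx,[eax+0x54]

# The address the instruction tried to read from
fault_address = eax + displacement
print(hex(fault_address))

# The same register value viewed as four little-endian bytes
print(struct.pack("<L", eax))
```

The computed address is 0x41414195, which matches the address reported in the crash synopsis, and the packed register value is four capital A characters, confirming that EAX was loaded from our overflow string.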

You can see how simple it is to set up a crash handler using PyDbg. It is an incredibly useful feature that enables you to automate the crash handling and postmortem of a process that you are analyzing. Next we are going to use PyDbg's internal process snapshotting capability to build a process rewinder.

^[25] The PaiMei source tree, documentation, and development roadmap can be found at http://code.google.com/p/paimei/.