If you've made it this far, then you should have a good understanding of how to use Python to construct a user-mode debugger for Windows. We'll now move on to learning how to harness the power of PyDbg, an open source Python debugger for Windows. PyDbg was released by Pedram Amini at Recon 2006 in Montreal, Quebec, as a core component in the PaiMei[25] reverse engineering framework. PyDbg has been used in quite a few tools, including the popular proxy fuzzer Taof and a Windows driver fuzzer that I built called ioctlizer. We will start by extending breakpoint handlers and then move into more advanced topics such as handling application crashes and taking process snapshots. Some of the tools we'll build in this chapter can be used later on to support some of the fuzzers we are going to develop. Let's get on with it.
In the previous chapter we covered the basics of using event handlers to handle specific debugging events. With PyDbg it is quite easy to extend this basic functionality by implementing user-defined callback functions. With a user-defined callback, we can implement custom logic when the debugger receives a debugging event. The custom code can do a variety of things such as read certain memory offsets, set further breakpoints, or manipulate memory. Once the custom code has run, we return control to the debugger and allow it to resume the debuggee.
The PyDbg function to set soft breakpoints has the following prototype:
```python
bp_set(address, description="", restore=True, handler=None)
```
The address parameter is the address where the soft breakpoint should be set; the description parameter is optional and can be used to uniquely name each breakpoint. The restore parameter determines whether the breakpoint should automatically be reset after it's handled, and the handler parameter specifies which function to call when this breakpoint is encountered. Breakpoint callback functions take only one parameter, which is an instance of the pydbg() class, and return a status such as DBG_CONTINUE that tells the debugger how to resume the debuggee. All context, thread, and process information will already be populated in this class when it is passed to the callback function.
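To make the bp_set() contract concrete, here is a toy model that runs without PyDbg or Windows. ToyDebugger, its hit() method, and the 0x1000 address are hypothetical stand-ins invented for this sketch; only the parameter names mirror the prototype above. It shows the three points that matter: the handler is stored per address, it receives the debugger object with its context already filled in, and restore=False turns the breakpoint into a one-shot.

```python
# Toy model of PyDbg's bp_set()/handler contract (hypothetical, not PyDbg code).
DBG_CONTINUE = 0x00010002  # DBG_CONTINUE as defined in the Windows SDK


class ToyDebugger(object):
    def __init__(self):
        self.breakpoints = {}  # address -> (description, restore, handler)
        self.context = None    # stands in for the populated thread context

    def bp_set(self, address, description="", restore=True, handler=None):
        self.breakpoints[address] = (description, restore, handler)

    def hit(self, address, context):
        """Simulate the debuggee executing the INT3 planted at `address`."""
        description, restore, handler = self.breakpoints[address]
        self.context = context          # state is populated before the callback
        status = DBG_CONTINUE
        if handler is not None:
            status = handler(self)      # the callback receives the debugger itself
        if not restore:
            del self.breakpoints[address]  # one-shot: do not re-arm
        return status


def my_handler(dbg):
    print("hit with Esp=0x%08x" % dbg.context["Esp"])
    return DBG_CONTINUE


dbg = ToyDebugger()
dbg.bp_set(0x1000, description="demo", restore=False, handler=my_handler)
```

Calling dbg.hit(0x1000, {"Esp": 0x0021FE74}) runs my_handler once and, because restore=False, removes the breakpoint afterward; with the default restore=True it would stay armed for the next hit.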
Using our printf_loop.py script, let's implement a user-defined callback function. For this exercise, we will read the value of the counter that is used in the printf loop and replace it with a random number between 1 and 100. Because the counter is printf's second argument, it sits at ESP+0x8 when the breakpoint on printf's first instruction fires: the return address is at ESP and the pointer to the format string is at ESP+0x4. One neat thing to remember is that we are actually observing, recording, and manipulating live events inside the target process. This is truly powerful! Open a new Python script, name it printf_random.py, and enter the following code.
```python
from pydbg import *
from pydbg.defines import *

import struct
import random

# This is our user-defined callback function
def printf_randomizer(dbg):

    # Read in the value of the counter at ESP + 0x8 as a DWORD
    parameter_addr = dbg.context.Esp + 0x8
    counter = dbg.read_process_memory(parameter_addr, 4)

    # When we use read_process_memory, it returns a packed binary
    # string. We must first unpack it before we can use it further.
    counter = struct.unpack("<L", counter)[0]
    print "Counter: %d" % int(counter)

    # Generate a random number and pack it into binary format
    # so that it is written correctly back into the process.
    # Keep all 4 packed bytes; indexing with [0] would keep only one.
    random_counter = random.randint(1, 100)
    random_counter = struct.pack("<L", random_counter)

    # Now swap in our random number and resume the process
    dbg.write_process_memory(parameter_addr, random_counter)

    return DBG_CONTINUE

# Instantiate the pydbg class
dbg = pydbg()

# Now enter the PID of the printf_loop.py process
pid = raw_input("Enter the printf_loop.py PID: ")

# Attach the debugger to that process
dbg.attach(int(pid))

# Set the breakpoint with the printf_randomizer function
# defined as a callback
printf_address = dbg.func_resolve("msvcrt", "printf")
dbg.bp_set(printf_address, description="printf_address", handler=printf_randomizer)

# Resume the process
dbg.run()
```
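The read-unpack-pack-write sequence in printf_randomizer can be rehearsed offline. In this sketch a bytearray stands in for the debuggee's stack at printf's entry, and read_dword/write_dword are hypothetical helpers playing the role of read_process_memory and write_process_memory; the ESP value, return address, and format-string pointer are made up. It also shows why the packed value must be written whole: struct.pack("<L", n) always yields four bytes, and keeping only the first would silently drop the upper three.

```python
import random
import struct

# Made-up stack snapshot at printf's first instruction (32-bit cdecl):
#   [ESP+0x0] return address
#   [ESP+0x4] pointer to the format string
#   [ESP+0x8] loop counter (first variadic argument)
ESP = 0x0021FE74
stack = bytearray(struct.pack("<LLL", 0x1D001234, 0x00A84220, 4))


def read_dword(addr):
    """Hypothetical stand-in for read_process_memory(addr, 4) + unpack."""
    off = addr - ESP
    return struct.unpack("<L", bytes(stack[off:off + 4]))[0]


def write_dword(addr, value):
    """Hypothetical stand-in for write_process_memory with a packed DWORD."""
    off = addr - ESP
    stack[off:off + 4] = struct.pack("<L", value)  # write all 4 bytes


parameter_addr = ESP + 0x8
counter = read_dword(parameter_addr)      # the real counter the process passed
new_value = random.randint(1, 100)
write_dword(parameter_addr, new_value)    # what printf will now print
```

After this runs, counter holds 4 while the simulated stack slot holds the random replacement, which is exactly the split between the two columns of Table 4-1.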
Now start printf_loop.py and note its PID, then run printf_random.py and enter that PID when prompted. The output should look similar to what is shown in Table 4-1.
Table 4-1. Output from the Debugger and the Manipulated Process
Output from Debugger | Output from Debugged Process |
---|---|
Enter the printf_loop.py PID: | Loop iteration 0! |
… | Loop iteration 1! |
… | Loop iteration 2! |
… | Loop iteration 3! |
Counter: 4 | Loop iteration 32! |
Counter: 5 | Loop iteration 39! |
Counter: 6 | Loop iteration 86! |
Counter: 7 | Loop iteration 22! |
Counter: 8 | Loop iteration 70! |
Counter: 9 | Loop iteration 95! |
Counter: 10 | Loop iteration 60! |
You can see that the debugger attached while the loop was already running: its breakpoint on printf first fired when the counter was 4, that is, on the fifth pass through the infinite printf loop (iterations 0 through 3 had already printed). You will also notice that the printf_loop.py script ran normally until it reached iteration 4; instead of outputting the number 4, it output the number 32! You can see how our debugger records the real value of the counter and replaces it with a random number before the debugged process prints it. This is a simple yet powerful example of how you can easily extend a scriptable debugger to perform additional actions when debugging events occur.
Now let's take a look at handling application crashes with PyDbg.
An access violation occurs inside a process when it attempts to access memory it doesn't have permission to access, or to access memory in a way that is not allowed (for example, writing to a read-only page). The faults that lead to access violations range from buffer overflows to improperly handled null pointers. From a security perspective, every access violation should be reviewed carefully, as the underlying bug might be exploitable.
When an access violation occurs within a debugged process, the debugger is responsible for handling it. It is crucial that the debugger capture all relevant information, such as the stack frame, the registers, and the instruction that caused the violation. You can then use this information as a starting point for writing an exploit or creating a binary patch.
PyDbg has an excellent method for installing an access violation handler, as well as utility functions to output all of the pertinent crash information. Let's first create a test harness that will use the dangerous C function strcpy() to create a buffer overflow. Following the test harness, we will write a brief PyDbg script to attach to and handle the access violation. Let's start with the test script. Open a new file called buffer_overflow.py, and enter the following code.
```python
from ctypes import *

msvcrt = cdll.msvcrt

# Give the debugger time to attach, then press Enter
raw_input("Once the debugger is attached, press Enter.")

# Create the 5-byte destination buffer. c_char_p points at the Python
# string's own storage, so the overflow tramples interpreter heap memory.
buffer = c_char_p("AAAAA")

# The overflow string
overflow = "A" * 100

# Run the overflow
msvcrt.strcpy(buffer, overflow)
```
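Why does copying 100 bytes into a 5-character string end in a crash somewhere else entirely? This offline model (plain Python, no ctypes calls) mimics an unbounded strcpy() writing into a bytearray "heap". The layout is invented for illustration: a 6-byte slot holding "AAAAA" plus its NUL terminator, followed at offset 8 by a hypothetical neighbouring object whose first field is a pointer. The real layout inside the interpreter is not shown in this chapter.

```python
import struct


def naive_strcpy(memory, dest, src):
    """Copy src plus a NUL terminator with no bounds check, like strcpy()."""
    memory[dest:dest + len(src)] = src
    memory[dest + len(src)] = 0
    return len(src) + 1  # bytes written, including the terminator


heap = bytearray(128)
DEST = 0          # "AAAAA\0" occupies offsets 0..5
SLOT_SIZE = 6
NEIGHBOUR = 8     # hypothetical adjacent object holding a pointer
heap[NEIGHBOUR:NEIGHBOUR + 4] = struct.pack("<L", 0x00112233)

written = naive_strcpy(heap, DEST, b"A" * 100)
overflowed_by = written - SLOT_SIZE
pointer = struct.unpack("<L", bytes(heap[NEIGHBOUR:NEIGHBOUR + 4]))[0]
```

The copy writes 101 bytes, 95 of them past the slot, and the neighbour's pointer becomes 0x41414141. Any later code that follows that pointer faults, which is the pattern the crash dump in Example 4-1 shows.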
Now that we have the test case built, open a new file called access_violation_handler.py, and enter the following code.
```python
from pydbg import *
from pydbg.defines import *

# Utility libraries included with PyDbg
import utils

# This is our access violation handler
def check_accessv(dbg):

    # We skip first-chance exceptions
    if dbg.dbg.u.Exception.dwFirstChance:
        return DBG_EXCEPTION_NOT_HANDLED

    crash_bin = utils.crash_binning.crash_binning()
    crash_bin.record_crash(dbg)
    print crash_bin.crash_synopsis()

    dbg.terminate_process()

    return DBG_EXCEPTION_NOT_HANDLED

pid = raw_input("Enter the Process ID: ")

dbg = pydbg()
dbg.attach(int(pid))
dbg.set_callback(EXCEPTION_ACCESS_VIOLATION, check_accessv)
dbg.run()
```
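The first-chance check in check_accessv matters because Windows reports each exception to the debugger twice: once before the debuggee's own exception handlers run (first chance) and again only if none of them handled it (second chance). The sketch below models that decision with plain arguments instead of a pydbg instance and files second-chance crashes by faulting address. It is a simplified illustration of the idea behind crash binning, written for this example; it is not PaiMei's crash_binning implementation, and the event list is made up (as if the harness had been run more than once).

```python
from collections import defaultdict

DBG_EXCEPTION_NOT_HANDLED = 0x80010001  # value from the Windows SDK

bins = defaultdict(list)  # faulting address -> list of crash summaries


def check_accessv(first_chance, exception_address, synopsis):
    """Hypothetical stand-in for the PyDbg callback above."""
    if first_chance:
        # Pass it on so the debuggee's own SEH handlers get a chance.
        return DBG_EXCEPTION_NOT_HANDLED
    # Second chance: nobody handled it, so this is a real crash. Record it.
    bins[exception_address].append(synopsis)
    return DBG_EXCEPTION_NOT_HANDLED


events = [
    (True, 0x1E071CD8, "first chance, ignored"),
    (False, 0x1E071CD8, "read from 0x41414195"),
    (False, 0x1E071CD8, "read from 0x41414195"),
]
for event in events:
    check_accessv(*event)
```

Grouping by faulting address means that two runs that die on the same instruction land in one bin, so you review one bug instead of two identical dumps.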
Now run buffer_overflow.py and take note of its PID (for example, from Task Manager); it will pause until you are ready to let it run. Execute access_violation_handler.py, and enter the PID of the test harness. Once the debugger is attached, press Enter in the console where the harness is running, and you will see output similar to Example 4-1.
Example 4-1. Crash output using PyDbg crash binning utility
```
python25.dll:1e071cd8 mov ecx,[eax+0x54] from thread 3376 caused access violation when attempting to read from 0x41414195

CONTEXT DUMP
  EIP: 1e071cd8 mov ecx,[eax+0x54]
  EAX: 41414141 (1094795585) -> N/A
  EBX: 00b055d0 ( 11556304) -> @U`" B`Ox,`O )Xb@|V`"L{O+H]$6 (heap)
  ECX: 0021fe90 ( 2227856) -> !$4|7|4|@%,\!$H8|!OGGBG)00S\o (stack)
  EDX: 00a1dc60 ( 10607712) -> V0`w`W (heap)
  EDI: 1e071cd0 ( 503782608) -> N/A
  ESI: 00a84220 ( 11026976) -> AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA (heap)
  EBP: 1e1cf448 ( 505214024) -> enable() -> NoneEnable automa (stack)
  ESP: 0021fe74 ( 2227828) -> 2? BUH` 7|4|@%,\!$H8|!OGGBG) (stack)
  +00: 00000000 ( 0) -> N/A
  +04: 1e063f32 ( 503725874) -> N/A
  +08: 00a84220 ( 11026976) -> AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA (heap)
  +0c: 00000000 ( 0) -> N/A
  +10: 00000000 ( 0) -> N/A
  +14: 00b055c0 ( 11556288) -> @F@U`" B`Ox,`O )Xb@|V`"L{O+H]$ (heap)

disasm around:
        0x1e071cc9 int3
        0x1e071cca int3
        0x1e071ccb int3
        0x1e071ccc int3
        0x1e071ccd int3
        0x1e071cce int3
        0x1e071ccf int3
        0x1e071cd0 push esi
        0x1e071cd1 mov esi,[esp+0x8]
        0x1e071cd5 mov eax,[esi+0x4]
        0x1e071cd8 mov ecx,[eax+0x54]
        0x1e071cdb test ch,0x40
        0x1e071cde jz 0x1e071cff
        0x1e071ce0 mov eax,[eax+0xa4]
        0x1e071ce6 test eax,eax
        0x1e071ce8 jz 0x1e071cf4
        0x1e071cea push esi
        0x1e071ceb call eax
        0x1e071ced add esp,0x4
        0x1e071cf0 test eax,eax
        0x1e071cf2 jz 0x1e071cff

SEH unwind:
        0021ffe0 -> python.exe:1d00136c jmp [0x1d002040]
        ffffffff -> kernel32.dll:7c839aa8 push ebp
```
The output reveals many pieces of useful information. The first portion tells you which instruction caused the access violation as well as which module that instruction lives in. This information is useful when writing an exploit, or when using a static analysis tool to locate the fault. The second portion is the context dump of all the registers; of particular interest is that we have overwritten EAX with 0x41414141 (0x41 is the hexadecimal value of the capital letter A). Because the faulting instruction reads from [eax+0x54], the bad address is 0x41414141 + 0x54 = 0x41414195, which is exactly the address reported in the first line. We can also see that the ESI register points to a string of A characters, as does the stack slot at ESP+0x08, which holds the same heap address (00a84220). The third section is a disassembly of the instructions before and after the faulting instruction, and the final section is the list of structured exception handling (SEH) handlers that were registered at the time of the crash.
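The arithmetic linking EAX to the reported fault address is easy to check in plain Python. The sketch assumes, as the dump suggests, that the pointer in EAX was loaded from four bytes of our "A" * 100 overflow string, read little-endian as x86 does.

```python
import struct

overflow = b"A" * 100

# Any four bytes of the overflow, read as a little-endian DWORD
eax = struct.unpack("<L", overflow[:4])[0]

# The faulting instruction is: mov ecx,[eax+0x54]
fault_address = eax + 0x54

print("%#x %#x" % (eax, fault_address))  # prints 0x41414141 0x41414195
```

Seeing your own input bytes in a register, offset by a small structure-field displacement, is a strong hint that the overflow gives you control over a pointer.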
You can see how simple it is to set up a crash handler using PyDbg. It is an incredibly useful feature that enables you to automate crash handling and postmortem analysis of a process that you are analyzing. Next we are going to use PyDbg's internal process snapshotting capability to build a process rewinder.
[25] The PaiMei source tree, documentation, and development roadmap can be found at http://code.google.com/p/paimei/.