Chapter 8. FUZZING

Fuzzing has been a hot topic for some time, mostly because it's one of the most effective techniques for finding bugs in software. Fuzzing is nothing more than creating malformed or semi-malformed data to send to an application in an attempt to cause faults. We will discuss the different types of fuzzers and the bug classes that represent the faults we are looking for; then we'll create a file fuzzer for our own use. In later chapters, we'll cover the Sulley fuzzing framework and a fuzzer designed to break Windows-based drivers.

First it's important to understand the two basic styles of fuzzers: generation and mutation fuzzers. Generation fuzzers create from scratch the data they send to the target, whereas mutation fuzzers take pieces of existing data and alter them. An example of a generation fuzzer is one that creates a set of malformed HTTP requests and sends them to a target web server daemon. A mutation fuzzer might instead take a packet capture of legitimate HTTP requests and mutate those requests before delivering them to the web server.
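To make the distinction concrete, here is a minimal sketch of the mutation style in Python (the `mutate` helper and the sample HTTP capture are illustrative, not part of any real framework):

```python
import random

def mutate(data, num_flips=4, seed=1):
    # Copy the original sample and overwrite a handful of
    # randomly chosen bytes with random values.
    rng = random.Random(seed)      # fixed seed keeps a run reproducible
    buf = bytearray(data)
    for _ in range(num_flips):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

# A captured request becomes the seed for many mutated test cases.
capture = b"GET /index.html HTTP/1.1\r\nHost: target\r\n\r\n"
test_cases = [mutate(capture, seed=n) for n in range(100)]
```

A generation fuzzer, by contrast, would build each request field by field from a model of the HTTP grammar rather than corrupting an existing sample.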

To understand how to create an effective fuzzer, you must first take a quick stroll through a sampling of the different bug classes that offer favorable conditions for exploitation. This is not an exhaustive list[37] but rather a very high-level tour through some of the common faults present in applications today, and we'll show you how to hit them with your own fuzzers.

When analyzing a software application for faults, a hacker or reverse engineer looks for particular bugs that will let him take control of code execution within that application. Whether the target operates as an independent process or as a web application driven by a scripting language, fuzzers can provide an automated way of finding bugs that help a hacker take control of the host system, escalate privileges, or steal information the application has access to. We are going to focus on the bugs typically found in software that runs as an independent process on the host operating system, as these are the most likely to result in a successful host compromise.

Buffer overflows are the most common type of software vulnerability. All kinds of innocuous memory-management functions, string-manipulation routines, and even intrinsic functionality that is part of the programming language itself can cause software to fail because of buffer overflows.

In short, a buffer overflow occurs when a quantity of data is stored in a region of memory that is too small to hold it. A metaphor to explain this concept would be to think of a buffer as a bucket that can hold a gallon of water. It's fine to pour in two drops of water or half a gallon, or even fill the bucket to the top. But we all know what happens when you pour two gallons of water into the bucket: water spills out onto the floor, and you have a mess to clean up. Essentially the same thing happens in software applications; when there is too much water (data), it spills out of the bucket (buffer) and covers the surrounding floor (memory). When an attacker can control the way the memory is overwritten, he is on his way to getting full code execution and ultimately a compromise in some form or another. There are two primary buffer overflow types: stack-based overflows and heap-based overflows. These types behave quite differently but still produce the same result: attacker-controlled code execution.

A stack overflow is characterized by a buffer overflow that subsequently overwrites data on the stack, which can be used as a means to control execution flow. Code execution can be obtained from a stack overflow by the attacker overwriting a function's return address, changing function pointers, altering variables, or changing the execution chain of exception handlers within the application. Stack overflows generally throw access violations as soon as the bad data is accessed; this makes them relatively easy to track down after a fuzzing run.

A heap overflow occurs within the executing process's heap segment, where the application dynamically allocates memory at runtime. A heap is composed of chunks that are tied together by metadata stored in the chunk itself. When a heap overflow occurs, the attacker overwrites the metadata in the chunk that's adjacent to the region that overflowed. When this occurs, the attacker controls writes to arbitrary memory locations, which can include variables, function pointers, security tokens, or any number of important data structures stored in the heap at the time of the overflow. Heap overflows can be difficult to track down initially because the affected chunks may not be used until sometime later in the application's lifetime. This delay before an access violation is triggered can pose some challenges when you're trying to track down a crash during a fuzzing run.

In order to target buffer overflows from a fuzzing perspective, we simply try to pass very large amounts of data to the target application in the hope that it will make its way into a routine that is not correctly checking the length before copying it around.
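One cheap way to produce that data is to generate strings at escalating lengths that straddle common buffer sizes. A sketch (the particular lengths chosen here are arbitrary, not canonical):

```python
def overflow_strings():
    # Powers of two straddle typical fixed buffer sizes (64, 256, 1024...),
    # and the +1/-1 variants help catch off-by-one length checks.
    cases = []
    for power in range(4, 17):           # 16 bytes up to 64KB
        size = 2 ** power
        for length in (size - 1, size, size + 1):
            cases.append(b"A" * length)
    return cases

payloads = overflow_strings()
```

Each payload would then be dropped into a length-prefixed field, filename, header value, or whatever slot of the target format the fuzzer is exercising.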

We will now look at integer overflows, which are another common bug class found in software applications.

Integer overflows are an interesting class of bugs that involve the way a compiler sizes signed integers and the way the processor handles arithmetic operations on them. A 32-bit signed integer is 4 bytes in length and can hold a value from –2,147,483,648 to 2,147,483,647. An integer overflow occurs when an attempt is made to store a value outside this range. Since the value is too large to be stored in a 32-bit register, the processor drops the high-order bits in order to store the value at all. At first glance this doesn't sound like a big deal, but let's take a look at a contrived example of how an integer overflow can result in allocating far too little space, possibly resulting in a buffer overflow down the road:

MOV EAX, [ESP + 0x8]
LEA EDI, [EAX + 0x24]
PUSH EDI
CALL msvcrt.malloc

The first instruction takes a parameter off the stack [ESP + 0x8] and loads it into EAX. The next instruction adds 0x24 to EAX and stores the result in EDI. We then use that result as the single parameter (the requested allocation size) to the memory allocation routine malloc. This all seems fairly innocuous, right? But if EAX contains a value near the top of the 32-bit range and we add 0x24 to it, the addition overflows 32 bits, and we end up with a very small positive value. Take a peek at Example 8-1 to see how this would play out, assuming the parameter on the stack is under our control and we can hand it a high value of 0xFFFFFFF5.

Example 8-1. Integer wraparound when 0x24 is added to 0xFFFFFFF5

    0xFFFFFFF5 + 0x24 = 0x100000019
    0x100000019 truncated to 32 bits = 0x00000019

If this happens, then malloc will allocate only 0x19 bytes, which could be a much smaller portion of memory than what the developer intended to allocate. If this small buffer is supposed to hold a large portion of user-supplied input, then a buffer overflow occurs. To target integer overflows with a fuzzer, we need to make sure we are passing both high positive numbers and low negative values in an attempt to achieve an integer overflow, which could lead to undesired behavior in the target application or even a full buffer overflow condition.
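Because Python integers are arbitrary precision, the 32-bit truncation has to be made explicit, but that makes the arithmetic above easy to verify (a quick sketch, not code from the chapter's fuzzer):

```python
MASK32 = 0xFFFFFFFF

def add32(a, b):
    # Emulate a 32-bit register add: keep only the low 32 bits,
    # just as the processor drops the high-order bits.
    return (a + b) & MASK32

# The allocation size computed by the disassembly above:
size = add32(0xFFFFFFF5, 0x24)    # 0x19 -- only 25 bytes get allocated

# Boundary values worth feeding to a fuzzer for this bug class:
boundaries = [0x0, 0x7FFFFFFF, 0x80000000, 0xFFFFFFF5, 0xFFFFFFFF]
```

Sweeping values on either side of these boundaries is usually enough to shake loose any arithmetic that isn't checked before an allocation or copy.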

Now let's take a quick peek at format string attacks, which are another common bug found in applications today.

Format string attacks involve an attacker passing input that gets treated as the format specifier in certain string-manipulation routines, such as the C function printf. Let's first examine the prototype of the printf function:

int printf( const char * format, ... );

The first parameter is the format string, which we combine with any number of additional parameters that represent the values to be formatted. An example of this would be:

int test = 10000;
printf("We have written %d lines of code so far.", test);

Output:

We have written 10000 lines of code so far.

The %d is the format specifier, and if a clumsy programmer passes user-controllable data directly as the format string instead of using a specifier in her calls to printf, then you'll see something like this:

char* test = "%x";
printf(test);

Output:

5a88c318

This looks a lot different. When we pass a string containing a format specifier to a printf call that doesn't supply its own format string, printf parses the specifier we hand it and assumes that the next value on the stack is the variable to be formatted. In this case you are seeing 0x5a88c318, which is either a piece of data stored on the stack or a pointer to data in memory. Two specifiers of particular interest are %s and %n. The %s specifier tells the string function to scan memory for a string until it encounters a NULL byte signifying the end of the string. This is useful for reading in large amounts of data, either to discover what's stored at a particular address or to crash the application by making it read memory it is not supposed to access. The %n specifier is unique in that it enables you to write data to memory instead of just formatting it. This enables an attacker to overwrite the return address or a function pointer to an existing routine, which in either case will lead to arbitrary code execution. In terms of fuzzing, we just need to make sure that the test cases we generate include some of these format specifiers in an attempt to exercise a misused string function that accepts our input as its format string.
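For instance, a fuzzer's string test cases might mix format specifiers into otherwise benign filler data; a minimal sketch (the payload list is illustrative, not exhaustive):

```python
SPECIFIERS = ["%s", "%x", "%n"]

def format_string_cases(filler=b"AAAA", repeats=(1, 8, 64)):
    # Repeating a specifier walks further up the stack, so longer
    # runs of %s or %n are more likely to hit an unmapped address
    # or a pointer an attacker can influence.
    cases = []
    for spec in SPECIFIERS:
        for count in repeats:
            cases.append(filler + spec.encode() * count)
    return cases

payloads = format_string_cases()
```

Any crash on one of these payloads is a strong hint that user input is reaching a format string unescaped.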

Now that we have cruised through some high-level bug classes, it's time to begin building our first fuzzer. It will be a simple mutation-based file fuzzer that can generically mangle any file format. We are also going to be revisiting our good friend PyDbg, which will control the target application and track crashes in it. Onward!



[37] An excellent reference book, and one you should definitely add to your bookshelf, is Mark Dowd, John McDonald, and Justin Schuh's The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities (Addison-Wesley Professional, 2006).