Chapter 23. Binary Reverse Engineering

Finding and exploiting software flaws is always one of the most effective ways to breach the security of any computer system or network. Likewise, if you are going to defend against this sort of attack, you need to find these bugs first so they can be fixed. Finding bugs is relatively straightforward if you have the source code, but what about closed source binary programs? This class of software is very common on networks today. So how do you find these bugs even if the code is not available to you? You reverse engineer the binary.

I've always loved to play video games; only trouble is, I'm not very good at them. I've learned to make up for my gaming inadequacies by reverse engineering the games so I can cheat. Odd as it might seem, I think this is the best way to practice reverse engineering skills or learn how to use a new tool. For the rest of this chapter, I'm going to reverse engineer an open source video game, FreeCiv 2.0.8. You can get a copy of this game, complete with source code, from http://www.freeciv.org. To get the most out of this chapter, you might want to install FreeCiv and follow along as I rip it apart from the binary. Because it's open source, you can even check your work as you go.

Interactive Disassembler

The Interactive Disassembler (IDA) is easily the most powerful disassembler in widespread use today. In my opinion, it is nothing short of amazing. It will disassemble almost anything you put in front of it, from run-of-the-mill x86 binaries to ARM and PDP-11 code (if you can find any). It has powerful automatic analysis abilities, and when those fail, it comes equipped with a rich scripting environment. For all its great features, IDA does have some pitfalls that you need to watch out for if you're going to get by.

Tip

If you're not very comfortable working with assembly code, IDA has a special feature just for you. From the top menu, select Options → General and then put a check beside Auto Comments. You will now have most of the instructions labeled on the disassembly window with a brief description of what they do. I've used this feature to teach myself various assembly languages before.

Let's start things off by opening up civclient.exe. The first thing you will see when you open a binary in IDA is the menu in Figure 23-1. IDA will present you with a number of options here, but unless you know that IDA has definitely misidentified the binary type, it's best to leave it with defaults. The next thing you see is IDA doing its auto analysis phase. It might take a while to finish, so this is a good time to get a cup of coffee and read a blog or two while IDA does its thing.

Figure 23-1. IDA menu

Special cases

There will be some times when IDA is either wrong in its assumptions about a file or when it is just not capable of parsing the file headers. This can happen because the file type is too new or obscure to be supported, or perhaps more commonly because we are reverse engineering a firmware image. In this case, it may be necessary to manually set the file type to Binary Image as well as set the segment and loading offset manually. I have also seen one occasion when a bug in IDA caused it to crash immediately after starting auto-analysis on a file. In this case, the solution was to disable auto-analysis.

Warning

IDA is not the most stable software ever created. This means that sooner or later it is going to crash on you. When it does and you reopen it, it will ask you if you would like to attempt to repair the old database. In my experience, IDA rarely repairs the old database successfully, which leaves you having to start over. To prevent this, never choose to repair the old database.

Searching in IDA

IDA supports a variety of search options, including searching for text strings, immediate values, or a sequence of bytes. These are useful when you are looking for a specific place in a binary when you know what you're looking for. In the past, I've been able to find cryptography sections of a program by searching for the specific byte sequences used to initialize the cryptographic functions. IDA also gives you the ability to search for the next code section, data section, errors, void operands, and unexplored sections. This can be helpful if you are trying to clean up a disassembly image quickly to make it as close to a completely accurate disassembly as possible.

Searching for text strings

Searching for text in the disassembly window is the most basic search you can do with IDA. This search will look for the occurrence of the given text anywhere on the disassembly window. This includes things like function names and comments added to the code. Later we will find more powerful ways to search for strings in the binary that will not include text on the disassembly window that is not from the binary (e.g., user comments). For now though, let's use this feature to find something interesting in the game we're reverse engineering.

To get started, I'd like to find where within the FreeCiv client (civclient.exe) it manipulates money for our player. So let's do a string text search for the word "gold" and see what we find. After searching for a few seconds, we find the following string:

Game: %s costs %d gold and you only have %d gold.

Paydirt! This is obviously a format string, and the third argument is probably the player's gold count. The code using this string looks like this:

.text:004081DD    mov dword ptr [esp], offset aGameSCostsDGol
.text:004081E4    call ds:libintl_gettext
.text:004081EA    mov edx, ds:dword_540A88
.text:004081F0    mov edx, [edx+2E0h]
.text:004081F6    mov [esp+0Ch], ebx
.text:004081FA    lea ebx, [ebp+var_408]
.text:00408200    mov [esp+8], eax
.text:00408204    mov eax, 200h
.text:00408209    mov [esp+14h], edx
.text:0040820D    mov [esp+10h], esi
.text:00408211    mov [esp+4], eax
.text:00408215    mov [esp+0], ebx
.text:00408218    call sub_42D5F0

If you look at the instructions at .text:004081F0 and .text:00408209, you will see that the third argument passed to the subroutine after the format string is [edx+2E0h]. The offset 2E0h points into some structure in memory, and the value stored there is probably our current player's gold count.
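
One way to read the call setup in the listing above is to sort the stack writes by offset, because the slot at [esp+0] is the first argument and each later slot is the next one. The Python sketch below does that for the instructions shown. Reading sub_42D5F0 as an snprintf-style routine (buffer, size, format string, then the format arguments) is my assumption from the argument pattern; the binary does not tell us that directly.

```python
# Stack slots written before "call sub_42D5F0", copied from the listing.
# Keys are offsets from esp; values describe what was stored there.
stack_writes = {
    0x0C: "ebx (unknown value, lines up with %s)",
    0x08: "eax (translated format string from libintl_gettext)",
    0x14: "edx = [edx+2E0h]",
    0x10: "esi (unknown value, lines up with the first %d)",
    0x04: "eax = 200h (looks like a buffer size)",
    0x00: "ebx = address of var_408 (looks like an output buffer)",
}

def ordered_arguments(writes):
    """Return the stored values in argument order (lowest stack offset first)."""
    return [writes[offset] for offset in sorted(writes)]

def format_argument_index(offset, format_slot=0x08, slot_size=4):
    """1-based position of a stack slot among the arguments after the format string."""
    return (offset - format_slot) // slot_size

args = ordered_arguments(stack_writes)
for position, value in enumerate(args):
    print(f"arg{position}: {value}")

# The slot at esp+14h is the third argument after the format string,
# which lines up with the second %d ("you only have %d gold").
print(format_argument_index(0x14))
```

The instructions are scheduled out of order by the compiler, so sorting by offset rather than by address is what makes the argument list readable.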

Searching for immediate values

Now that we know the offset in this structure, we can begin to search for other places this might be used. Let's try to find a useful place for us to modify some code to add a cheat. From the top menu, select Search → "immediate value..." and enter 0x2E0. This will search for the next place in the binary where some code is using this same offset, and thus is probably referencing our player's gold count. If you select "Find all occurrences," it will give you a list of almost all the places that gold values are read or modified. Right away we can see two locations where our gold value is modified; the most interesting is the following:

.text:0041C181    mov [esi+2E0h], eax

This line is interesting for two reasons. First, we want to find where our gold value is set in the client; second, because of what we find in the cross-reference graph (more on this in a bit).
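
To get a feel for what an immediate-value search is doing, here is a rough Python sketch that scans raw bytes for the little-endian 32-bit encoding of 0x2E0. It is much cruder than IDA's search, which works on decoded operands, so on a real binary it would also report coincidental matches inside data. The sample bytes are hand-assembled by me for illustration and are not a dump of civclient.exe.

```python
import struct

def find_immediate(code, value):
    """Return every offset where `value` appears as a little-endian dword."""
    needle = struct.pack("<I", value)
    hits = []
    start = code.find(needle)
    while start != -1:
        hits.append(start)
        start = code.find(needle, start + 1)
    return hits

# Hand-assembled sample bytes (an assumption for illustration):
#   8B 92 E0 02 00 00    mov edx, [edx+2E0h]
#   90                   nop
#   89 86 E0 02 00 00    mov [esi+2E0h], eax
sample = bytes.fromhex("8b92e0020000" "90" "8986e0020000")

for offset in find_immediate(sample, 0x2E0):
    print(f"0x2E0 found at byte offset {offset}")
```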

Defining Data Types

All binary reverse engineering tools are designed with the goal of making the very low-level assembly code that makes up an executable into something a little easier to read and understand. One of the ways that IDA does this is by allowing us to define various high-level data types that we can overlay with structures in memory to help make sense of what the binary is doing. The three basic data types supported by IDA are structures, unions, and enumerated types.

Annotating the Code

We've already found some locations of interest within this binary, so it's useful to be able to keep some notes on what we're doing. IDA provides us with a number of ways to keep notes on what's going on in a program and to mark interesting parts of code for later review.

Setting comments

The most basic way to make a note of what's going on at a specific location in the disassembly is to insert a comment. There are two basic types of comments in IDA: nonrepeatable and repeatable. Nonrepeatable comments are set by pressing the colon key (:) on the line of interest; this type will only be displayed on the line in which they are inserted. Repeatable comments are inserted by pressing the semicolon key (;) on the line of interest. This type of comment is useful because it will display the comment on the selected line just like a nonrepeatable comment, but if this line of code is referenced in some other place in the executable, then a copy of the comment will be placed there as well.

Marking positions

Another way that IDA lets us mark points of interest in a disassembly is by allowing us to mark special locations. They basically work like bookmarks, so we can mark all the locations we know we'll need to come back to later and then easily find them when we need them.

An example

To best see how all this works in action, let's set a repeatable comment in our FreeCiv disassembly. We already found that in sub_41C130, the program changes the amount of gold we have. Go to the top of that function and set a repeatable comment. Now that we've done that, we want to see how our comment is repeated on lines where this function is called. But before we leave this function, let's mark this position so we will be able to easily find it again. With the cursor on the top line of this function, press Alt-M and type in the name for this position. We can now go to a place where this function is called to see how our comments are repeated. To the right of this function, you will see an auto-generated comment with a list of functions that call the current function; it should look like this:

; CODE XREF: sub_41EE90+28E↓p

Double-click on this and you will be taken to the place where the current function is called. When you do, you will see that your repeatable comment gives you the notes you made about the function it is calling. Now go back to the position we marked earlier by either right-clicking on the addresses on the right side of the screen and selecting the marked position there or by selecting Jump → "Jump to marked position..." from the top menu and then choosing it from the list.

Code Navigation

IDA provides an interface for navigating code that will be familiar to anyone who has used a web browser. It supports going forward and back as well as opening up a new window to follow a memory address. IDA lets us dynamically travel through a binary by treating memory addresses and symbolic names for variables as hyperlinks in the disassembly window. This is why double-clicking on the function name earlier took us to that location in the disassembly window. These addresses and symbolic names can even be in user-supplied comments. In older versions of IDA, there was a limit on the number of marked positions we could set. I used the fact that I could put addresses in comments to get around this limitation.

IDA will also let us jump directly to parts of a file by specifying the address, symbolic name, or file offset. To get a complete list of these options, see the Jump menu.

Tip

To get a quick view of what you will see at a given address, hold your mouse over the address for a second or two and IDA will drop down a small window displaying what is at that location.

Tracking the Flow of Execution

Now that we know how to manually traverse the code in the disassembly window, we could use this to manually compile information about a given function or variable by following all its function calls—but this would take way too much time. Luckily, while IDA was performing its auto-analysis of the file when we first opened it, it already figured out this information for us. This starts with something as simple as a call graph for a function, but current versions of IDA will also give us complete cross-references to and from a function as well as some pretty graphs and charts to help visualize all this data.

Cross-reference

We've already seen one of the ways that IDA provides us with cross-reference information. Every function and variable has beside it an auto-generated comment with a list of memory locations where that function or variable is referenced. This is useful, but due to limitations on display space, this list is cut off after a few entries. You can increase the number of displayed cross-references by selecting Options → General..., clicking the tab labeled "Cross-references", and then increasing the value for "Number of displayed xrefs." You can get a complete list of cross-references by selecting the symbolic name and pressing the x key to bring up the cross-references window for this variable or function.

In addition to displaying lists of cross-references to a function or variable, IDA will give us a graph of all the cross-references to or from a given function or variable. This can be very helpful when we want to get an idea of how a function is used in relation to the rest of the program. Often you can tell almost all you need to know about a function simply by taking a peek at what functions call it and what functions it calls.
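
As a mental model of what the cross-reference views hold, here is a small Python sketch that indexes a list of calls in both directions and then walks upward to collect indirect callers. Only the sub_41EE90 to sub_41C130 edge comes from the disassembly discussed in this chapter; the other names are made up for illustration, and IDA's own database is of course far richer than this.

```python
from collections import defaultdict

# (caller, callee) pairs. Only the first pair comes from the disassembly
# discussed here; the other names are hypothetical.
calls = [
    ("sub_41EE90", "sub_41C130"),
    ("hypothetical_packet_loop", "sub_41EE90"),
    ("sub_41C130", "hypothetical_update_ui"),
]

def build_xrefs(call_pairs):
    """Index the call list both ways: who calls X, and whom X calls."""
    xrefs_to = defaultdict(set)
    xrefs_from = defaultdict(set)
    for caller, callee in call_pairs:
        xrefs_to[callee].add(caller)
        xrefs_from[caller].add(callee)
    return xrefs_to, xrefs_from

def callers_transitive(xrefs_to, function):
    """Walk xrefs-to upward to collect every direct and indirect caller."""
    seen = set()
    pending = [function]
    while pending:
        current = pending.pop()
        for caller in xrefs_to.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                pending.append(caller)
    return seen

xrefs_to, xrefs_from = build_xrefs(calls)
print(sorted(xrefs_to["sub_41C130"]))
print(sorted(callers_transitive(xrefs_to, "sub_41C130")))
```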

Earlier I did just that when searching for where civclient.exe modified the player's gold values. We found that in sub_41C130 there was some interesting code. Looking at the cross-reference graph, we can see what made me interested in this function. To get this graph, select View → Graphs → "Xrefs to". With a little bit of checking, we can see that the sub_41C130 subroutine is being called from the packet-processing code in the client. The more you look at it, the more it looks like the only place we adjust our gold is when we are processing packets from the server. I'll cut to the chase here and tell you the bad news. The client program takes all the information about our player, including how much gold we have, from the server. This is a major problem for our efforts to cheat, but don't despair, because with the right tools we'll be able to fix this.

Flow charts

One of the ways that I help make sense of a seemingly chaotic disassembly is to make a flow chart of the execution path for a given function. This can be time-consuming to do by hand, especially when you're dealing with a very complex program. IDA helps us out here by including a feature to automatically generate execution flow charts, which include the assembly code right in them. We can even print these out if we want to. To generate these charts, select View → Graphs → "Flow chart".

Tracking function calls

In addition to flow charts and cross-references, it is often useful to simply track the function calls that a given function makes. There are two ways to get this information. The first is to select View → "Open subviews" → "Function calls". This will give us a list of function calls made by the currently selected function. Or, select View → Graphs → "Function calls", which will give us a graph of the same data similar to the cross-reference graphs.

Using Subview Windows

One reason that IDA is such a powerful reverse engineering tool is that it is able to clearly display massive amounts of information to the user with an interface that makes the data easy to digest quickly. It does this by breaking down the data into a number of data subview windows, which allows you to concentrate on different types of data separately.

Functions window

The functions window displays the names of all the functions it has identified in the binary. If the function's true name can't be determined by looking at the symbol table, then IDA will autogenerate a name for the function based on its memory location. By default, these function names look like sub_4A93F0, but you can change the function-naming conventions by selecting Options → "Name representation...". If the program we are reverse engineering is kind enough to provide us with symbols, this is the best place to start a search for a specific section of code. Otherwise, we will need to rely on some of the other subviews at first.

Strings window

We've already seen how useful searching for strings can be when reverse engineering. When we were doing a text search before, it was a bit tedious because we had to search through every string match one at a time. This is where the strings window really comes in handy. Using the strings window instead of doing a text search allows us to more easily browse for string matches and, because the strings window contains only strings found in the binary (and not also text from the disassembly window), we will get only what we want.

Names window

The names window is sort of a combination of the functions, imports, exports, and strings windows; however, it displays only those names with useful meanings behind them. This means that it will include all function and variable names it gets from the symbol tables, including the imports and exports, as well as automatically generated string names, which are based on the strings' content. When I am trying to find something specific in a program, it is often fastest to start here because it gives us good access to those symbols that we know the most about.

Imports and exports windows

IDA provides an easy way to see which routines a program is exporting and which it is importing. This is amazingly helpful because it lets us quickly locate where a program is calling library functions. Many of these libraries will be well documented, which means we can read the assembly code from a high level and find what we're looking for faster.

Let's use the Imports Window to help us figure out a way to cheat. So open civserver.exe in IDA and we'll see if we can't find a bug in the server that will let us manipulate our player's gold values. The first step will be to find out where packet processing is done in the server process. The best way I know to do this is to check the Imports Window for socket routines and see who calls them. When we track back calls to recv, we find sub_4A93F0, which appears to be the top-level routine for packet parsing in the server, as shown in Figure 23-2.

Figure 23-2. Sample flow chart

Debugging with IDA

We can learn a lot about a program by looking at its image on disk, but sometimes it can be really helpful to see the program as it is running. This is especially true for programs like malware that intentionally modify their code at runtime so that the image in memory is very different from the one on the disk. Debugging can also make it easier to follow data through an application. This is very important when doing vulnerability research because we spend most of our time tracking data to see how we can make it affect a system.

Now that we've located the place where packets are parsed in the server, let's attach to it with a debugger and see if we can find a useful bug. From the top menu, select Debugger → Start process to start the server in the debugger. Once you do this, you should take notice of the new memory layout that the system shows you. This is because we now have access to see all the runtime segments created by this process including the code and data for all the DLLs that the process links with.

Warning

In my experience, the IDA debugger tends to be the least stable part of the program. I have seen it crash more than a few times when the process I am debugging crashes. So always remember to save your work often.

Initial configuration

When we start the server in the debugger, notice that it immediately starts running without breaking first. If we would like the debugger to stop when the process starts or at other places such as when libraries are loaded, select Debugger → "Debugger options..." from the top menu and choose from a list of initial breakpoint and logging options.

Setting breakpoints and watchpoints

A debugger isn't very useful without breakpoints or watchpoints, so let's set up some useful breakpoints in the server to track how packets are parsed. Go back to sub_4A93F0 and at the top of this function select Debugger → Breakpoints → "Add breakpoint". If we have a variable that we want to watch for read or write access, we can set a watchpoint by selecting Debugger → Watches → "Add watch".

Stepping through the program

Now that we have a breakpoint set, let's fire up a FreeCiv client and connect to our server. This should trigger our breakpoint and let us walk through the packet parsing. If all goes well, we should see the red line at the breakpoint turn purple as the breakpoint is reached. As with all debuggers, we have the standard options of "Step into," "Step over," and "Run until return." IDA also allows us to "Run to cursor." This sets a temporary breakpoint at the line where the cursor is, which will be removed after the breakpoint is triggered. All these options are available from the Debugger menu.

Examining data

Now let's use these to find some useful bugs in the server. As we single step through the packet parsing, we are obviously going to need to examine the data being manipulated. IDA provides us with some extra subview windows while in debugger mode to help us. The two I find most useful are the General registers window and the IDA View-ESP, or the stack view. IDA will also display register contents if you mouse over a register name for a second or so. As you walk through the packet-parsing function, pay careful attention to how it responds to the data in our packet. I've identified two bugs that are of use to us, one in this function and another deeper in this function's call graph. See if you can find them on your own (or skip ahead to see my description of them).

Tracing

Sometimes we want to follow the execution of a program without actually stopping the program and manually single stepping. Sometimes we just don't want to spend time single stepping. And sometimes we need to do this because programs have timing-related issues that require faster execution than we can get single stepping. I've seen more than a few pieces of malware that use timing specifically to detect whether they are being debugged. To set tracepoints in IDA, select Debugger → Tracing, then select from the options given for what conditions will cause the tracepoint to trigger. When a tracepoint triggers, you will see a log message in IDA's log display during debugging.

Taking a memory snapshot

A process will have much more information in the memory image used during debugging than it will in the binary image on disk. In some cases these images can be completely different. It is common for copy protection systems and malware to intentionally modify their memory segments when the program starts up to make reverse engineering difficult. In the later section "Making Scripts with IDC," we will discuss how to get around the best of these systems, but for now we can get around most of them by waiting for the system to decrypt its memory image for us and then saving a memory snapshot of it. To do this, select Debugger → "Take memory snapshot" from the top menu while in debugger mode.

Remote debugging

There are a number of occasions where we either can't run—or very much do not want to run—the application on the same machine that we use for our reverse engineering. In the case of IDA, the best example for this is when reverse engineering malware. For obvious reasons, it is always preferable to run the malware on a machine more disposable than your primary workstation. This is where remote debugging comes in very handy.

Configuring the client

On the client side, open the application you wish to debug in IDA and select Debugger → "Process options...". This will bring up the dialog shown in Figure 23-3. Make sure that the application and input file paths are correct; then be sure to set the hostname, port, and password appropriately.

Figure 23-3. Debug application setup

Configuring the remote host

From the command line, run the following command:

C:\win32_remote -Pmypassword

Replace mypassword with the desired password. By default, it uses port 23946; if you need to change this, use the command-line option -p to select a new port number.

Finding the Bugs

Browsing around the packet parsing, I've found a couple of bugs that are of use to us here. The first is a denial-of-service attack in which we can cause the server to exit because it believes the system is out of memory. The idea for this first bug is to trick the server into letting us supply an argument to malloc that is way too large. I have annotated the disassembly for function sub_4A93F0 to show you how we trigger this bug:

.text:004A93F0    push ebp
.text:004A93F1    xor  ecx, ecx
.text:004A93F3    mov  ebp, esp
.text:004A93F5    push edi
.text:004A93F6    push esi
.text:004A93F7    push ebx
.text:004A93F8    sub  esp, 5Ch
.text:004A93FB    mov  eax, [ebp+arg_8]
.text:004A93FE    mov  [ebp+decompress_packet], 0
.text:004A9402    mov  edi, [ebp+arg_0]
.text:004A9405    mov  [ebp+header_len], 0
.text:004A940C    mov  byte ptr [eax], 0
.text:004A940F    cmp  byte ptr [edi+8], 0 ; is argument NULL?
.text:004A9413    jz   loc_4A94AA      ; if so jump to return
.text:004A9419    mov  edx, [edi+14h]  ; total length of the data
.text:004A941C    mov  eax, [edx]
.text:004A941E    cmp  eax, 2          ; do we have at least 3 bytes?
.text:004A9421    jle  loc_4A94AA      ; if not jump to return
.text:004A9427    mov  [esp+68h+var_60], eax
.text:004A942B    lea  ebx, [ebp+var_28]
.text:004A942E    mov  eax, [edx+0Ch]
.text:004A9431    mov  [esp+68h+var_68], ebx
.text:004A9434    mov  [esp+68h+var_64], eax
.text:004A9438    call sub_4A82A0      ; init length info
.text:004A943D    mov  [esp+68h+var_68], ebx
.text:004A9440    lea  eax, [ebp+length]
.text:004A9443    mov  [esp+68h+var_64], eax
.text:004A9447    call sub_4A8860      ; read 16bit short from packet
.text:004A944C    mov  eax, [ebp+length]
.text:004A944F    cmp  eax, 0FFFFh     ; is length == 0xFFFF?
.text:004A9454    jz   loc_4A95D0
.text:004A945A    cmp  eax, 4000h      ; is length > 0x4000?
.text:004A945F    jg   short loc_4A94B4 ; if so we take this branch

From examining data while debugging, we can determine that [edi+14h] is the structure offset that leads to the length of the packet (as returned by recv). The first check we have to pass to get any length value past the validation code is at .text:004A941E: the total packet length must be greater than 2. The next point of interest is at .text:004A9447, where the function reads in two octets and converts them to host order. We know that sub_4A8860 reads a short integer from the network and converts it to host order from analyzing the flow chart for that function, as seen in Figure 23-4. As we see a few lines later, this value is being used as a length. At .text:004A945A, the function checks to see whether this length value is greater than 0x4000. As we'll see in a minute, this is the most interesting code path to take, so let's make sure we send a length value larger than 0x4000. This takes us to the following code after the last jump instruction:

.text:004A94B4    mov [ebp+decompress_packet], 1
.text:004A94B8    sub eax, 4001h
.text:004A94BD    mov [ebp+header_len], 2
.text:004A94C4    jmp short loc_4A9461

Figure 23-4. sub_4A8860

The first line sets what appears to be some Boolean value to true. The next line is very interesting because it subtracts 0x4001 from the length we give it. I am always interested in places where I can get something to allocate zero bytes; even if we can't get that value to malloc, it is likely to do something interesting. The next line sets a local stack variable to 2. By looking at cross-references for this stack variable, we see that the possible values it is set with are always equal to the amount of length information the function has already read. This to me looks like it is a header length field. Experience tells me that protocols with length fields that include the length of the header are prone to errors. We have already shown that we can get a modified length value to be zero by setting the length field to be 0x4001, so this might get interesting. Let's see where it goes next:

.text:004A9461    mov [ebp+length2], eax
.text:004A9464    mov eax, [edi+14h]
.text:004A9467    xor ecx, ecx
.text:004A9469    mov edx, [ebp+length2]
.text:004A946C    cmp edx, [eax]      ; is length2 > packet length?
.text:004A946E    ja  short loc_4A94AA ; if so jump to return
.text:004A9470    cmp [ebp+decompress_packet], 0
.text:004A9474    jnz loc_4A9520      ; we take this branch

Right away they save the modified length value to the local stack variable that I have named length2. Remember that if we are able to set the length field in the packet to 0x4001, then length2 will be set to 0. At .text:004A946C, they check to make sure that this new length is not larger than the total number of bytes in the packet. We know we can make this new length zero, and zero is definitely less than the number of bytes in this packet, so we are able to evade this bounds check. Next, if you will remember, the Boolean variable decompress_packet was set to true in the last code block, so we take the jump at the bottom here:

.text:004A9520    mov eax, [ebp+header_len]
.text:004A9523    mov ebx, edx    ; edx holds the modified length
.text:004A9525    sub ebx, eax    ; alloc_len = length2 - header_len
.text:004A9527    lea eax, [ebx+ebx*4] ; do math to get alloc_len
.text:004A952A    lea eax, [eax+eax*4]
.text:004A952D    lea ecx, ds:0[eax*4]
.text:004A9534    mov [ebp+alloc_len], ecx
.text:004A9537    mov eax, offset aPackets_c ; "packets.c"
.text:004A953C    mov [esp+68h+var_5C], eax
.text:004A9540    mov eax, 179h
.text:004A9545    mov [esp+68h+var_60], eax
.text:004A9549    mov eax, offset aMalloc_27 ; "malloc"
.text:004A954E    mov [esp+68h+var_68], ecx ; alloc_len
.text:004A9551    mov [esp+68h+var_64], eax
.text:004A9555    call sub_405530

Just as expected, at .text:004A9525 they subtract the header length (in this case, 2) from the modified length value (in this case, 0). This means that there is an integer underflow here causing the allocation length to be much larger than the packet length. The three lines after the subtraction are an optimized way to multiply the allocation length by 100, thus making our allocation length even larger. The next interesting line is .text:004A954E, where they set the alloc_len to be the first argument to sub_405530.
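
The arithmetic is easy to check by hand. The Python sketch below mirrors the sub and the three lea instructions, masking to 32 bits after each step the way a register would. The function names are mine; the register names and constants come from the listing.

```python
MASK32 = 0xFFFFFFFF

def lea_times_100(ebx):
    """Mirror the three lea instructions, wrapping to 32 bits at each step."""
    eax = (ebx + ebx * 4) & MASK32   # lea eax, [ebx+ebx*4]  -> ebx * 5
    eax = (eax + eax * 4) & MASK32   # lea eax, [eax+eax*4]  -> ebx * 25
    ecx = (eax * 4) & MASK32         # lea ecx, ds:0[eax*4]  -> ebx * 100
    return ecx

def alloc_len(length2, header_len):
    """alloc_len as the server computes it, in 32-bit register arithmetic."""
    ebx = (length2 - header_len) & MASK32   # sub ebx, eax
    return lea_times_100(ebx)

size = alloc_len(0, 2)
print(hex(size), size)
```

With length2 = 0 and header_len = 2, the subtraction wraps to 0xFFFFFFFE, and multiplying by 100 leaves 0xFFFFFF38, which is just under 4 GB when treated as an unsigned size.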

.text:00405530    push ebp
.text:00405531    mov ebp, esp
.text:00405533    sub esp, 18h
.text:00405536    mov [ebp+var_C], ebx
.text:00405539    mov eax, [ebp+arg_8]
.text:0040553C    mov ebx, [ebp+alloc_len]
.text:0040553F    mov [ebp+var_4], edi
.text:00405542    mov edx, [ebp+arg_4]
.text:00405545    mov edi, [ebp+arg_C]
.text:00405548    mov [esp+18h+var_18], eax
.text:0040554B    mov eax, ebx
.text:0040554D    mov [ebp+var_8], esi
.text:00405550    mov [esp+18h+var_14], edi
.text:00405554    call sub_4054F0
.text:00405559    mov [esp+18h+var_18], ebx ; alloc_len
.text:0040555C    call malloc

Following alloc_len through this last code block (it is loaded into ebx at .text:0040553C and stored as the argument for malloc at .text:00405559), you can see that our negative allocation length, which becomes a very large value when it is treated as a size, is passed to malloc. If you were to follow the code path that handles malloc failures, you would see that they exit any time that malloc fails because they think the system is out of memory. Here's the payload needed to trigger this bug:

40 01 00

You can test this out by sending that payload (which you can create in a hex editor) using a tool such as Netcat.
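
Before sending anything, you can sanity-check the payload against a model of the checks we just walked through. The Python sketch below is my simplified reading of the path through sub_4A93F0; it models only the branch where the length is greater than 0x4000 and ignores everything else the real function does.

```python
import struct

MASK32 = 0xFFFFFFFF

def requested_allocation(packet):
    """Follow the checks in sub_4A93F0 and return the size passed to malloc,
    or None if the packet leaves through a path that is not modeled here."""
    if len(packet) <= 2:                              # cmp eax, 2 / jle
        return None
    (length,) = struct.unpack_from(">H", packet, 0)   # 16-bit, network order
    if length == 0xFFFF:                              # separate branch, not modeled
        return None
    if length <= 0x4000:                              # only the jg path is modeled
        return None
    header_len = 2
    length2 = length - 0x4001                         # sub eax, 4001h
    if length2 > len(packet):                         # cmp edx, [eax] / ja
        return None
    return ((length2 - header_len) * 100) & MASK32    # the sub and lea sequence

payload = bytes.fromhex("400100")
print(hex(requested_allocation(payload)))
```

The third byte of the payload is only there to get the total length past the "greater than 2" check; its value does not matter on this path.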

We might be able to keep from losing by crashing the server whenever we want, but that isn't going to help us win any games. The second bug is a little more complicated, but it will let us cheat enough to ensure an unfair advantage leading to a victory. Like the last bug, this one is an integer-handling flaw where we are able to trick the server into accepting invalid input.

I would still like to find a way to cheat and give ourselves a huge amount of gold. After finding the last integer-handling flaw, I decided to check for similar integer overflows in gold handling. This means I wanted to find actions that manipulate my gold amount and for which the client chooses the amount. One place where we get to name our own price is in diplomatic negotiations. I decided to play a game in which I made some valid diplomatic treaty offers, including gold gifts, and followed the packet processing until I saw the transfer of this gold take place. This led me to the following code:

.text:004123A4    mov eax, [edx+2E0h] ; giver's gold amount
.text:004123AA    mov ebx, [ecx+8]    ; amount of gold we offered
.text:004123AD    sub eax, ebx
.text:004123AF    mov [edx+2E0h], eax ; giver's gold amount
.text:004123B5    mov edx, [ebp+var_2C]
.text:004123B8    mov eax, [ecx+8]    ; amount of gold we offered
.text:004123BB    add [edx+2E0h], eax ; receiver's gold amount

Remember from our first attempts at modifying our gold count that the offset 0x2E0 probably refers to the gold value for a player. Here I had two ideas: first, we might get lucky and the server might simply trust the client not to offer more gold than it has; second, we might be able to offer negative gold. The code above subtracts the offer from the giver and adds it to the receiver, so if we offered a negative amount of gold as part of a treaty and the other player accepted it, we would in fact be taking money from them and giving it to ourselves. Let's see what the restrictions are on the amounts of gold we can offer. To do this, I set a watchpoint on my player's gold amount that would trigger on reads. This led me to the following code:

.text:00411BF0    mov edx, [ebp+var_10]; case 0x1
.text:00411BF3    mov eax, [edi+8]
.text:00411BF6    cmp [edx+2E0h], eax
.text:00411BFC    jge loc_4117E1      ; default
.text:00411C02    mov [esp+48h+var_48], offset aGameYouDonTH_0

If we follow aGameYouDonTH_0, we find the following string:

Game: You don't have enough gold, you can't accept treaty.

This is a pretty strong indication that this is in fact the code that validates the gold amounts in treaty offers. The server does check that we don't offer more money than we have, but jge is a signed comparison, so any negative offer is smaller than our gold total and passes the check. In other words, nothing verifies that we are not offering a negative amount of gold.
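The effect of the signed check and the transfer code can be modeled in a few lines. This is a minimal sketch under stated assumptions: the function names treaty_gold_ok and transfer and the gold totals are hypothetical, and only the signed comparison and the sub/add pair are taken from the disassembly.

```python
def to_s32(value: int) -> int:
    """Wrap a Python int to a signed 32-bit value, as a 32-bit register would."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def treaty_gold_ok(giver_gold: int, offer: int) -> bool:
    # Models cmp [edx+2E0h], eax followed by jge: a signed "gold >= offer" test.
    return to_s32(giver_gold) >= to_s32(offer)

def transfer(giver_gold: int, receiver_gold: int, offer: int):
    # Models the sub/add pair: the giver loses the offer, the receiver gains it.
    return to_s32(giver_gold - offer), to_s32(receiver_gold + offer)

# Hypothetical totals: we hold 50 gold, the other player holds 1000.
print(treaty_gold_ok(50, 500))    # False: we cannot give what we do not have
print(treaty_gold_ok(50, -500))   # True: a negative offer always passes
print(transfer(50, 1000, -500))   # (550, 500): the "gift" flows toward us
```

A check that also rejected offers below zero would close the hole.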

Putting this to the test, I tweaked the packets I was sending to the server so that I could offer negative amounts of gold to a player. Sure enough, just as expected, it lets me offer negative amounts of gold, and it does in fact end up operating just as if they gave me gold. This is an interesting result, but there is still one major problem with it. No player would accept a gift of negative gold unless they were very stupid, so offering this to a human player would not work. Most games include at least a few computer players, and unlike most humans, computers are all completely stupid. Let's see if we can't dupe the AI players into accepting this sort of offer. Looking at the cross-references for the function where the gold check was done, we find that there are two places it is called: sub_42D470 and sub_4B58D0. Let's set breakpoints at both, then get the client to send a treaty offer to an AI player. When we do that, the breakpoint triggers here only if the AI player accepts the treaty. When it does, it leads us to the following code in sub_4B58D0:

.text:004B5A0F    mov eax, [ebp+accept_treaty]
.text:004B5A12    test eax, eax ; is accept_treaty < 0 ?
.text:004B5A14    js short loc_4B5A47
.text:004B5A16    mov edx, [ebp+var_10]
.text:004B5A19    mov eax, [edx]
.text:004B5A1B    mov [ebp+arg_0], edi
.text:004B5A1E    mov [ebp+arg_4], eax
.text:004B5A21    add esp, 2Ch
.text:004B5A24    pop ebx
.text:004B5A25    pop esi
.text:004B5A26    pop edi
.text:004B5A27    pop ebp
.text:004B5A28    jmp sub_411740 ; This calls the checking code

We know from the breakpoints that this code reaches sub_411740 only if the AI player has accepted the treaty, and the disassembly shows that it jumps to sub_411740 only if accept_treaty is greater than or equal to zero. This means we can assume that the local stack variable, accept_treaty, is the indicator for when the AI accepts the treaty. So let's look at the cross-reference for accept_treaty and see what code affects this value:

.text:004B5950    call sub_4B5020
.text:004B5955    add [ebp+accept_treaty], eax

This code appears in a loop. Every time we call sub_4B5020, the return value is added to accept_treaty. So, we want to see what sub_4B5020 returns:

.text:004B5100    mov byte_4DB940, 1
.text:004B5107    mov eax, esi
.text:004B5109    mov ebx, [ebp+var_C]
.text:004B510C    mov esi, [ebp+var_8]
.text:004B510F    mov edi, [ebp+var_4]
.text:004B5112    mov esp, ebp
.text:004B5114    pop ebp
.text:004B5115    retn

This is the code that runs when sub_4B5020 returns, so we can see that whatever is in esi when it goes to return is what gets added to the accept_treaty value.

Looking at the top of this function, we find a big switch statement. Single stepping through this shows us that each case in the switch statement is for different clauses in the treaty. For gold, we see the following code:

.text:004B5250    cmp [ebp+var_11], 0; case 0x1
.text:004B5254    jz loc_4B53FE
.text:004B525A    mov ecx, [ebp+arg_0]
.text:004B525D    mov esi, [ecx+8]    ; gold offer amount
.text:004B5260    neg esi
.text:004B5262    jmp loc_4B5100      ; jump to return

This combines with the following code:

.text:004B53FE    mov ebx, [ebp+arg_0]
.text:004B5401    mov esi, [ebx+8]    ; gold offer amount
.text:004B5404    jmp loc_4B5100      ; jump to return

On the surface, what we see here is bad news. It seems that the amount of gold offered is directly added to the accept_treaty value. This means that a negative gold gift will effectively subtract from the value that must be non-negative for the AI player to accept. Thinking further on this, I noticed a way to get around this problem. There is no check here for integer wrapping when the return value is added to accept_treaty. This means that if we can find any other clause that subtracts a predictable amount from accept_treaty, we can send a gold value so close to the minimum 32-bit integer that the further subtraction wraps the total around to a large positive number, and the AI will accept the treaty. One way that I know we can get the computer to further subtract from accept_treaty is to declare war on the player.
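The wraparound can be illustrated with a small model of the accumulation loop. This is only a sketch: the function name ai_accepts is hypothetical, the war penalty of -1000 is an assumed figure chosen for illustration (the real amount would have to be read from the binary), and the gold clause is simplified to add the offer directly.

```python
INT32_MIN = -0x80000000

def to_s32(value: int) -> int:
    """Wrap a Python int to a signed 32-bit value, as a 32-bit register would."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def ai_accepts(clause_values) -> bool:
    accept = 0
    for value in clause_values:
        # Models add [ebp+accept_treaty], eax with no overflow check.
        accept = to_s32(accept + value)
    # Models test eax, eax / js: the treaty is rejected only if the sum is negative.
    return accept >= 0

WAR_PENALTY = -1000            # assumed value, for illustration only
huge_debt = INT32_MIN + 100    # a gold offer just above the 32-bit minimum

print(ai_accepts([-500]))                    # False: small negative gift is refused
print(ai_accepts([huge_debt]))               # False: still negative on its own
print(ai_accepts([huge_debt, WAR_PENALTY]))  # True: the sum wraps to a positive value
```

The only requirement is that the extra subtraction is larger than the distance between the gold offer and the minimum 32-bit value.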

I tested this, and sure enough, the AI player changed its mind and decided to accept my gift of negative gold. The final procedure for this is to declare war on an AI player, and then immediately offer them a peace treaty along with a large negative amount of gold. They will accept, you will generously give them a huge debt, and you will get a huge treasury, as shown in Figure 23-5.

Figure 23-5. AI screenshot

Making Scripts with IDC

IDA's scripting feature is the most interactive part of interactive disassembler. By writing IDC scripts (IDA's internal scripting language), we can add hotkeys, automate large tedious tasks, and even add new features to the disassembler. This feature makes it relatively easy to take on difficult reverse engineering tasks such as unpacking obfuscated or encrypted binaries. It also makes it easy to clean up large areas of disassembly to account for shortcomings in IDA.

IDC Hello World

IDC has syntax very similar to C, so most IDA users should be able to pick it up quickly. The biggest difference between IDC and C is how variables are handled. IDC is not rigidly typed, so all variables are automatic and are local in scope. Another limitation is that only scalar variables are supported, so we can't use structs, unions, or standard arrays. Also, every variable can be used as a signed long integer or as a string, similar to scripting languages such as Perl. You will pick up the other differences as we go along. Let's take a look at an example:

1 #include <idc.idc>
2
3 //This is a comment
4 static main (  ) {
5     Message("Hello World!\n");
6 }

The first line includes the IDC file that contains most of the core library functions. It's important to include this file if you're using any of the standard IDC functions. You can also include your own IDC files to add functionality to your own scripts. On line 3, you see that single-line comments are identical to the way C++ handles them. Line 4 shows us how to declare the main function. Like C, this is the first function to be called when the IDC file is loaded. The built-in function, Message, will print a formatted string to the messages window at the bottom of the IDA user interface.

Functions and variables

Let's take a look at some more examples of how to write scripts using functions and variables in IDC:

#include <idc.idc>

static foo (a, b, c) {
    auto var1;
    auto var2;

    var1 = a+b;
    var2 = "This is a string.";
    Message("var1=%i, var2=%s, c=%s\n", var1, var2, c);
    return(var1);
}

static main (  ) {
    auto bar;

    bar = foo(2, 3, "another string");
    Message("bar=%i\n", bar);
    return;
}

Defining functions is similar to the way it's done in C except that no type information is needed for return values or arguments. The needed conversions for these types are done automatically for us. Variables are all automatic locals, and they can all hold either a 32-bit signed integer, a character string (no longer than 255 characters), or a floating-point value. Note that a variable cannot be initialized on the same line on which it is declared. Functions are called the same way as in C. Another useful feature of variables is that if you are running the script while IDA is in debugging mode, you can reference machine registers by name as if they were variables in your script.

Expressions and statements

Almost all operators in C are supported in IDC with the exception of complex assignment operators such as +=. IDC also supports almost all the same statement types that C does with the exception of switch case statements. There are some operators that are unique to IDC, like the calculate linear address operator:

[s, o]

Here s is the segment, or a string constant with the segment name, and o is the offset from the segment base. The calculation [s, o] is equivalent to the expression:

(s << 4) + o
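As a quick worked example of that formula, the sketch below computes a linear address from a segment value and an offset. The function name linear_address and the sample values are assumptions chosen for illustration; only the expression (s << 4) + o comes from the text.

```python
def linear_address(segment: int, offset: int) -> int:
    """Mirror IDC's [s, o] operator for a numeric segment: (s << 4) + o."""
    return (segment << 4) + offset

# Segment 0x1000 starts at linear 0x10000; adding offset 0x10 gives 0x10010.
print(hex(linear_address(0x1000, 0x0010)))
```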

IDC also has some expressions for converting types, such as:

long(expr)
char(expr)
float(expr)

Just like when doing type casting in C, these can truncate some values depending on which way the conversion is going.

Interacting with the IDA database

Up until now I haven't told you how to do anything useful. To be of any help, the IDC scripts need to be able to access the IDA database. IDA comes with a large set of built-in functions for reading and writing to the database. Just about anything you can do in the user interface, you can script in IDC. Table 23-1 provides a small list of the most common functions for accessing the database. For a more complete list, see the online documentation for IDC built-in functions.

Table 23-1. Functions for accessing the database

Built-in function name	Description
`AddHotkey`	Assigns an IDC routine to a hotkey.
`AnalyseArea`	Performs full analysis on a region.
`AskStr`	Prompts the user for a string.
`AskFile`	Prompts the user for a filename.
`AskAddr`	Prompts the user for an address.
`AskLong`	Prompts the user for a long integer value.
`AskYN`	Prompts the user for "yes" or "no."
`Byte`	Gets the value of a program byte.
`ChooseFunction`	Prompts the user for a function.
`Comment`	Adds a regular comment.
`CreateArray`	Creates an array.
`DelArrayElement`	Removes an array element.
`Dword`	Gets the value of a program double word.
`Exec`	Executes an OS command.
`ExtLinA`	Adds a comment above a line.
`ExtLinB`	Adds a comment below a line.
`FindText`	Searches for text in the disassembly window.
`FindBinary`	Searches for binary strings in the image.
`FindImmediate`	Searches for immediate values.
`FindFuncEnd`	Finds the end of a function.
`GetCurrentLine`	Gets the disassembly line at the cursor.
`GetFirstIndex`	Gets the index of the first existing array element.
`GetFuncOffset`	Gets the function offset of an address.
`GetFunctionName`	Gets the name of a function at a given address.
`GetInputFile`	Gets the input filename.
`GetMnem`	Gets the mnemonic for an instruction at the given address.
`GetNextIndex`	Gets the index of the next existing array element.
`GetOpType`	Gets the type of an instruction operand.
`GetOperandValue`	Gets the value used in an instruction operand.
`GetOpnd`	Gets a string of an instruction operand.
`GetOriginalByte`	Gets the original value of a byte before patching is applied.
`GetPrevIndex`	Gets the index of the previous existing array element.
`ItemEnd`	Gets the address of the end of the item at a specified address.
`MakeArray`	Marks data at an address as an array type.
`MakeByte`	Marks data at an address as a byte.
`MakeCode`	Marks data at an address as code.
`MakeFunction`	Marks data or code at an address as a function.
`MaxEA`	Gets the largest valid program offset.
`Message`	Prints a message in the status window.
`MinEA`	Gets the smallest valid program offset.
`ScreenEA`	Gets the program address at the cursor.
`SelEnd`	Gets the end address of a selected region.
`SelStart`	Gets the start address of a selected region.
`Wait`	Waits for the end of auto-analysis.

Adding graphical interfaces

IDC lets you get information from the user interactively using a series of functions:

#include <idc.idc>

static main (  ) {
    auto s;
    auto addr;
    auto yn;
    auto ynString;
    auto i;

    s = "";
    addr = 0;
    yn = 0;
    i = 0;
    // Ask for a string; fall back to a default message if none is given
    s = AskStr("default string", "Enter A String");
    if(s == 0) {
        s = "You didn't give me a string.";
    }
    // Ask for an address; treat an invalid answer as address 0
    if((addr = AskAddr(0, "Give me an address.")) == BADADDR) {
        addr = 0;
    }
    // AskYN returns 1 for yes, 0 for no, and another value on cancel
    yn = AskYN(1, "Yes or no?");
    if(yn == 1) {
        ynString = "Yes";
    } else if(yn == 0) {
        ynString = "No";
    } else {
        ynString = "You canceled";
    }
    i = AskLong(0, "Give me a long int.");
    // Show everything we collected in a single dialog box
    Warning("%s\n0x%08X\n%s\n%i\n", s, addr, ynString, i);
    return;
}

These are just a few of the built-in functions for interacting with the user. To get a full list, look in IDA's online help.

Faking global variables with arrays

There are times when writing particularly large IDC scripts where it becomes useful to keep persistent global data. Unfortunately, IDC does not allow for global variables of the normal type, but we can fake it. The standard way to make persistent global data in an IDC script is to use arrays. By creating arrays and then accessing them by their identifiers from another scope, we are able to keep the data persistent and then copy it into a local variable for use:

#include <idc.idc>

static main (  ) {
    auto test1;
    auto test2;
    auto id;

    test1 = 42;
    test2 = "test data";
    CreateArray("globalData");
    id = GetArrayId("globalData");
    SetArrayLong(id, 0, test1);
    SetArrayString(id, 1, test2);
    return;
}

Each index in the array can store either a 32-bit integer or a string value, and the index itself is 32 bits long. To access this data later from another scope, you would use code like this:

static foo (  ) {
    auto test1;
    auto test2;
    auto id;

    id = GetArrayId("globalData");
    test1 = GetArrayElement(AR_LONG, id, 0);
    test2 = GetArrayElement(AR_STR, id, 1);

    Message("%i\n%s\n", test1, test2);
    return;
}

In this way, you can access your global data from any scope.

Making hotkeys

If you work on a disassembly project long enough, you will eventually find yourself doing the same operations over and over again. For example, sometimes programs will mangle their string values in an effort to obfuscate things and make our job difficult. When I see this, I will usually write a script to descramble these strings and then bind that function call to a hotkey. Here is an example of such a function being bound to a hotkey:

1 #include <idc.idc>
2
3 static descrambleString (  ) {
4    auto s;
5    auto off;
6    auto start;
7    auto length;
8    auto left;
9    auto c;
10    auto last;
11
12    start = ScreenEA(  );
13    off = start;
14    length = Dword(off);
15    if(length > 4096) {
16        Message("length too large\n");
17        return;
18    }
19    left = length;
20    s = "";
21    off = start + 4;
22    c = 0;
23    last = 0;
24    while(left > 0) {
25        left = left - 1;
26        c = Byte(off+left)%256;
27        c = (c^last)%256;
28        last = c;
29        s = form("%c%s", c, s);
30    }
31    MakeRptCmt(start, s);
32    return;
33 }
34
35 static main (  ) {
36    AddHotkey("z", "descrambleString");
37 }

The format of the strings in this example is a four-byte length followed by a string where each successive character is XORed with the character after it in the string. Let's break down this code. We define our function, descrambleString, to have no arguments because the hotkey will not provide us with any. We get the input for this function on line 12 by calling ScreenEA. This function returns the address at which the cursor is currently set, and we use this value on line 14 when we call Dword to get the length value starting at that address. We then loop through the string from its last byte back to its first, XORing each byte with the previously decoded character and prepending the result to build the string. On line 31, we make a repeatable comment on the line containing the string using the MakeRptCmt function. I could have used PatchByte to modify the string in the database instead of just documenting what the string stands for, but whenever possible I prefer to keep my changes to the binary to a minimum. Finally, in the main routine we register a hotkey with IDA bound to the z key. Don't forget to put the function name we want this hotkey to trigger in quotes.
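To make the string format concrete, here is a standalone sketch of the same scheme outside IDA, with a matching scrambler so the round trip can be checked. The function names scramble and descramble are hypothetical, and the little-endian length field is an assumption based on Dword reading an x86 image; the decoding loop follows the IDC script.

```python
import struct

def scramble(plain: bytes) -> bytes:
    """Encode: 4-byte length, then each byte XORed with the byte after it
    (the final byte is XORed with 0, so it is stored unchanged)."""
    body = bytes(
        c ^ (plain[i + 1] if i + 1 < len(plain) else 0)
        for i, c in enumerate(plain)
    )
    return struct.pack("<I", len(plain)) + body

def descramble(blob: bytes) -> bytes:
    """Decode by walking backward from the last byte, as the IDC script does."""
    (length,) = struct.unpack_from("<I", blob, 0)
    out = bytearray(length)
    last = 0
    for i in range(length - 1, -1, -1):
        last = blob[4 + i] ^ last   # undo the XOR with the following character
        out[i] = last
    return bytes(out)

blob = scramble(b"secret")
print(blob.hex())
print(descramble(blob))   # b'secret'
```

Decoding has to start at the end because the last character is the only one stored in the clear; every earlier character depends on the one decoded just before it.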

Automating large tasks

My final example is a script that I use to clean up large disassemblies by searching for all void types in the data segment and heuristically detecting pointer values. When it finds these probable pointers, it sets their type appropriately. I find that this not only makes the data segment far more useful, but it also aids IDA in its auto-analysis so we get a better overall disassembly.

1 #include <idc.idc>
2
3 static fixDataOffsets (  ) {
4    auto off;
5    auto end;
6    auto val;
7
8    off = ScreenEA(  );
9    end = SegEnd(off);
10    off = SegStart(off);
11    while(off < end) {
12        // only fixup data offset...
13        if((hasValue(GetFlags(off))) && (GetMnem(off) == "")) {
14            val = Dword(off);
15            if((val >= 0x400000) && (val <= 0x500000)) {
16                OpOffEx(off, 0, 2, -1, 0, 0x0);
17            }
18            Wait(  );
19            off = ItemEnd(off);
20        } else if(GetMnem(off) != "") {
21            off = ItemEnd(off);
22        } else if(!hasValue(GetFlags(off))) {
23            Message("Complete.\n");
24            return;
25        } else {
26            off++;
27        }
28    }
29    Message("Complete.\n");
30    return;
31 }

On line 8, we get the current address of the cursor on the screen by calling ScreenEA. We then find the beginning and end of the segment this address is in by calling SegStart and SegEnd, respectively. The idea here is that we want to loop through this segment (which is presumably a data segment) and find Dwords that look like they might be pointers. On line 13, we call hasValue to make sure that there is a value^[32] at this address, and we check to make sure that there is not a machine instruction at this address by calling GetMnem. If the item at a given address is an assembly instruction, GetMnem returns its mnemonic as a string; otherwise, it returns an empty string. If this check fails because the address holds an assembly instruction, we skip past the instruction by setting the offset to the end of the current instruction, which we get by calling ItemEnd on line 21. If this check fails because there was no value located at this address, then we assume we have hit the end of usable data in this segment and we exit early. If the address has a value and is not an instruction, then we do a quick check to see whether its value falls within a practical range for it to be a pointer. If it does, we call OpOffEx to set the type at this address to be an offset. On line 18, we call Wait to give IDA a chance to catch up with the changes we have made to the database and to perform auto-analysis based on the new type information we have given it. We could have left this call out, but then IDA would appear to be unresponsive while our script runs and would almost lock up doing auto-analysis when the script finishes.
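The range test at the heart of this script is easy to try outside IDA. The sketch below scans a raw buffer for little-endian dwords that fall inside the same 0x400000 to 0x500000 window. The function name probable_pointers, the fixed 4-byte stride, and the sample data are assumptions for illustration; the IDC script advances with ItemEnd and marks offsets in the database instead of returning a list.

```python
import struct

def probable_pointers(data: bytes, base: int, lo: int = 0x400000, hi: int = 0x500000):
    """Return (address, value) pairs for dwords that look like pointers
    into the image, using the same range test as the IDC script."""
    hits = []
    for off in range(0, len(data) - 3, 4):
        (val,) = struct.unpack_from("<I", data, off)
        if lo <= val <= hi:
            hits.append((base + off, val))
    return hits

# Hypothetical data segment contents loaded at a hypothetical base address.
segment = struct.pack("<III", 0x00405539, 0xDEADBEEF, 0x00000010)
for addr, val in probable_pointers(segment, base=0x4D0000):
    print(hex(addr), "->", hex(val))
```

A range check like this will produce false positives whenever ordinary data happens to fall inside the window, so the bounds should be set as tightly as the target image allows.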

You may notice that we have not supplied a main routine with this script. I've left it out here because I don't want fixDataOffsets to run automatically when we load the file. To start this function running, you will need to select File → "IDC command..." from the top menu, enter fixDataOffsets( );, and press OK.

Warning

Be careful when using IDC routines that include infinite loops. If you don't provide for a way to exit the loop then it will never return from your script and you may be forced to terminate IDA without saving. To be safe, you should always save your work before testing out an IDC script for the first time.

Using IDA Plug-ins

IDA supports the use of plug-ins to add functionality to the IDA engine. The interface for creating your own plug-ins in C++ is available from the IDA Pro web site, and a number of community-created plug-ins are freely downloadable. To load a plug-in, simply place it in the plugins subdirectory of the IDA program folder. Alternatively, you can edit the plugins.cfg file to add another location.

^[32] An example of when there is no value at an address is an address in the stack segment when a program is not being debugged.