Analyzing switch Statements

switch statements are used by programmers (and malware authors) to make a decision based on a character or integer. For example, backdoors commonly select from a series of actions using a single byte value. switch statements are compiled in two common ways: using the if style or using jump tables.

Example 6-20 shows a simple switch statement that uses the variable i. Depending on the value of i, the code under the corresponding case value will be executed.
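A switch statement of this kind might look like the following minimal C sketch. It is reconstructed from the assembly in Example 6-21 and is not the original listing. The function name switch_example, its return value, and the trailing newline in the format string are assumptions added for illustration; the case bodies mirror the add and printf calls in the disassembly.

```c
#include <stdio.h>

/* Hypothetical reconstruction: switch_example and its return value are
   illustrative only and are not part of the original program. */
int switch_example(int i)
{
    int matched = 1;                 /* assume one of the cases will match */

    switch (i) {
    case 1:
        printf("i = %d\n", i + 1);   /* corresponds to add ecx, 1 */
        break;
    case 2:
        printf("i = %d\n", i + 2);   /* corresponds to add edx, 2 */
        break;
    case 3:
        printf("i = %d\n", i + 3);   /* corresponds to add eax, 3 */
        break;
    default:
        matched = 0;                 /* no case matched, nothing is printed */
        break;
    }
    return matched;
}
```

Calling switch_example(2) prints one line and returns 1, while a value outside 1 through 3 prints nothing and returns 0.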

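This switch statement has been compiled into the assembly code shown in Example 6-21. It contains a series of conditional jumps (the jz instructions at 00401017, 0040101D, and 00401023), followed by an unconditional jmp at 00401025 that is taken when no case matches. The conditional jump determination is made by the comparison that occurs directly before each jump.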

The switch statement has three options, shown at loc_401027, loc_40103D, and loc_401053. These code sections are independent of each other: the first two end with an unconditional jump to loc_401067 at the end of the listing, and the third falls through to that same address. (You’ll probably find that switch statements are easier to understand using the graph shown in Figure 6-3.)

Example 6-21. Assembly code for the switch statement example in Example 6-20

00401013        cmp     [ebp+var_8], 1
00401017        jz      short loc_401027 
00401019        cmp     [ebp+var_8], 2
0040101D        jz      short loc_40103D
0040101F        cmp     [ebp+var_8], 3
00401023        jz      short loc_401053
00401025        jmp     short loc_401067 
00401027 loc_401027:
00401027        mov     ecx, [ebp+var_4] 
0040102A        add     ecx, 1
0040102D        push    ecx
0040102E        push    offset unk_40C000 ; i = %d
00401033        call    printf
00401038        add     esp, 8
0040103B        jmp     short loc_401067
0040103D loc_40103D:
0040103D        mov     edx, [ebp+var_4] 
00401040        add     edx, 2
00401043        push    edx
00401044        push    offset unk_40C004 ; i = %d
00401049        call    printf
0040104E        add     esp, 8
00401051        jmp     short loc_401067
00401053 loc_401053:
00401053        mov     eax, [ebp+var_4] 
00401056        add     eax, 3
00401059        push    eax
0040105A        push    offset unk_40C008 ; i = %d
0040105F        call    printf
00401064        add     esp, 8

Figure 6-3 breaks down each of the switch options by splitting up the code to be executed from the next decision to be made. Three of the boxes in the figure correspond directly to the case statement’s three different options (the blocks at loc_401027, loc_40103D, and loc_401053). Notice that all of these boxes terminate at the bottom box, which is the end of the function. You should be able to use this graph to see the three checks the code must go through when var_8 is greater than 3.

From this disassembly, it is difficult, if not impossible, to know whether the original code was a switch statement or a sequence of if statements. In both cases, the compiled code looks the same: a series of cmp and Jcc instructions. When performing your disassembly, you may not always be able to get back to the original source code, because there may be multiple ways to represent the same code constructs in assembly, all of which are valid and equivalent.
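To make the ambiguity concrete, here is a small sketch of two hypothetical functions, one written with switch and one with an if chain. They behave identically, and a compiler may emit the same sequence of comparisons and conditional jumps for both, although the exact output depends on the compiler and its optimization settings. The names with_switch and with_if are assumptions, and the functions return a value rather than printing so that they can be compared directly.

```c
/* Both functions are hypothetical illustrations. */

/* switch form: one case per value. */
int with_switch(int i)
{
    switch (i) {
    case 1:  return i + 1;
    case 2:  return i + 2;
    case 3:  return i + 3;
    default: return 0;
    }
}

/* if form: the same decisions written as a chain of comparisons. */
int with_if(int i)
{
    if (i == 1)
        return i + 1;
    else if (i == 2)
        return i + 2;
    else if (i == 3)
        return i + 3;
    return 0;
}
```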

The next disassembly example is commonly found with large, contiguous switch statements. The compiler optimizes the code to avoid needing to make so many comparisons. For example, if in Example 6-20 the value of i were 3, three different comparisons would take place before the third case was executed. In Example 6-22, we add one case to Example 6-20 (as you can see by comparing the listings), but the assembly code generated is drastically different.

The more efficient assembly code in Example 6-23 uses a jump table, shown at off_401088, which holds the addresses of the code blocks for each case. The switch variable is used as an index into the jump table.

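In Example 6-23, ecx contains the switch variable, and 1 is subtracted from it in the first line. In the C code, the case values range from 1 through 4, and the assembly code must adjust them to 0 through 3 so that the jump table can be properly indexed. The cmp and ja instructions that follow send any adjusted value above 3 straight to loc_401082; because ja is an unsigned comparison, this also catches values that were below 1 before the subtraction. The jump instruction at 00401025 then selects its target from the jump table.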
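The index adjustment and bounds check can be mimicked in C with an array of function pointers. This is only a sketch of the idea, not the source the compiler was given: the handler functions, their return values, and the name dispatch are hypothetical, since the bodies of the four cases are elided in Example 6-23.

```c
/* Hypothetical handlers standing in for the four elided case blocks.
   Each simply reports which case ran. */
typedef int (*case_handler)(void);

static int handle_case_1(void) { return 1; }
static int handle_case_2(void) { return 2; }
static int handle_case_3(void) { return 3; }
static int handle_case_4(void) { return 4; }

/* Emulates the jump-table dispatch. Returns 0 when no case matches. */
int dispatch(int i)
{
    /* Plays the role of off_401088: one entry per case, in order. */
    static const case_handler jump_table[4] = {
        handle_case_1, handle_case_2, handle_case_3, handle_case_4
    };

    /* sub ecx, 1: shift the range 1..4 down to 0..3. Unsigned
       arithmetic makes values below 1 wrap around to very large numbers. */
    unsigned int index = (unsigned int)i - 1u;

    /* cmp [ebp+var_8], 3 / ja: an unsigned "above" test, so a single
       comparison rejects both too-large and wrapped-around values. */
    if (index > 3u)
        return 0;

    /* jmp ds:off_401088[edx*4]: indexed transfer through the table. */
    return jump_table[index]();
}
```

However many cases there are, the dispatch costs one subtraction, one comparison, and one indexed jump, which is why compilers favor this form for large, contiguous ranges.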

In this jump instruction (jmp ds:off_401088[edx*4]), edx is multiplied by 4 and added to the base of the jump table (0x401088) to determine which case code block to jump to. It is multiplied by 4 because each entry in the jump table is an address that is 4 bytes in size.
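The address arithmetic can be checked by hand with a short sketch. The base address and entry size come from the text above; the function name jump_table_slot and the macro names are assumptions made for illustration.

```c
#include <stdint.h>

#define JUMP_TABLE_BASE 0x401088u   /* address of off_401088 */
#define ENTRY_SIZE      4u          /* each entry is a 4-byte address */

/* Returns the address of the table entry read for a given adjusted
   index (0 through 3), that is, base + index * 4. */
uint32_t jump_table_slot(uint32_t index)
{
    return JUMP_TABLE_BASE + index * ENTRY_SIZE;
}
```

For the adjusted indexes 0 through 3, this yields 0x401088, 0x40108C, 0x401090, and 0x401094, which are the four dd entries at the end of Example 6-23.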

Example 6-23. Assembly code for the switch statement example in Example 6-22

00401016        sub     ecx, 1
00401019        mov     [ebp+var_8], ecx
0040101C        cmp     [ebp+var_8], 3
00401020        ja      short loc_401082
00401022        mov     edx, [ebp+var_8]
00401025        jmp     ds:off_401088[edx*4] 
0040102C   loc_40102C:
              ...
00401040        jmp     short loc_401082
00401042   loc_401042:
              ...
00401056        jmp     short loc_401082
00401058   loc_401058:
              ...
0040106C        jmp     short loc_401082
0040106E   loc_40106E:
              ...
00401082   loc_401082:
00401082        xor     eax, eax
00401084        mov     esp, ebp
00401086        pop     ebp
00401087        retn
00401087   _main   endp
00401088  off_401088  dd offset loc_40102C
0040108C               dd offset loc_401042
00401090               dd offset loc_401058
00401094               dd offset loc_40106E

The graph in Figure 6-4 for this type of switch statement is clearer than the standard disassembly view.

As you can see, each of the four cases is broken down clearly into separate assembly code chunks. These chunks appear one after another in a column after the jump table determines which one to use. Notice that all of these boxes and the initial box terminate at the right box, which is the end of the function.