Shellcode is a sequence of machine instructions (opcodes), which can be represented in any format, and is generally the payload executed once an exploit succeeds. Because it is a list of raw instructions that the CPU understands, it is architecture-specific (and, since it makes system calls, operating-system-specific as well), so x86 Linux shellcode will not work on SPARC Solaris. Example 10-2 is a simple piece of shellcode for Linux x86 platforms. Its only role is to call the execve( ) system call with enough arguments to execute the /bin/sh program.
Example 10-2. Linux x86 shellcode that executes /bin/sh
\x31\xc0\x68\x2f\x73\x68\xaa\x88\x44\x24\x03\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\x33\xd2\xcd\x80\xcc
This example shows how compact a simple shellcode can be. At 29 bytes, it is small and robust enough to send as part of an HTTP request or in the payload of a custom packet. While it makes little obvious sense to a human, we will see how it makes perfect sense to a computer.
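The size claim is easy to check. The following is a minimal sketch, assuming Python 3, that holds Example 10-2 as a bytes object and confirms two things: its length and the absence of NUL (0x00) bytes. The NUL check matters because exploit payloads are often copied by string-handling code that stops at the first NUL. The sketch also writes the raw bytes to /tmp/shellcode, the file the objdump example later in this section reads. The helper names is_null_free and save are hypothetical.

```python
# Example 10-2 as a Python bytes object
SHELLCODE = (
    b"\x31\xc0\x68\x2f\x73\x68\xaa\x88\x44\x24\x03\x68\x2f\x62\x69\x6e"
    b"\x89\xe3\x50\x53\x89\xe1\xb0\x0b\x33\xd2\xcd\x80\xcc"
)


def is_null_free(code: bytes) -> bool:
    """Return True if the code contains no 0x00 byte."""
    return b"\x00" not in code


def save(code: bytes, path: str) -> None:
    """Write the raw bytes to a file, with no header or encoding."""
    with open(path, "wb") as f:
        f.write(code)


if __name__ == "__main__":
    print(len(SHELLCODE), is_null_free(SHELLCODE))  # 29 True
    save(SHELLCODE, "/tmp/shellcode")
```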
Understanding a piece of existing shellcode begins with translating its machine instructions into something more human-readable. The best tool for this is a disassembler, an application that translates raw machine code into assembly language. The ndisasm program from the Netwide Assembler (nasm) suite of tools is well suited to this, and it can read raw binary from standard input. Here is the result of disassembling the shellcode from Example 10-2:
$ echo -ne "\x31\xc0\x68\x2f\x73\x68\xaa\x88\x44\x24\x03\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\x33\xd2\xcd\x80\xcc" | ndisasm -u -
00000000  31C0              xor eax,eax
00000002  682F7368AA        push dword 0xaa68732f
00000007  88442403          mov [esp+0x3],al
0000000B  682F62696E        push dword 0x6e69622f
00000010  89E3              mov ebx,esp
00000012  50                push eax
00000013  53                push ebx
00000014  89E1              mov ecx,esp
00000016  B00B              mov al,0xb
00000018  33D2              xor edx,edx
0000001A  CD80              int 0x80
0000001C  CC                int3
The shellcode in Example 10-2 is raw code with no file format to identify its processor mode, and ndisasm assumes 16-bit code by default. We therefore use the -u parameter to tell ndisasm that the input is 32-bit code.
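The listing also shows how the shellcode builds its argument without embedding a NUL byte. It pushes 0xaa68732f ("/sh" plus a placeholder byte), then overwrites the placeholder with the zero held in al. It then pushes "/bin", which leaves "/bin/sh\0" at esp. The remaining instructions set up the execve( ) call:

- ebx points at the string.
- The next two pushes build argv = {"/bin/sh", NULL}, and ecx points at it.
- edx is zeroed for envp.
- al = 0xb selects system call 11, execve, before int 0x80.

The sketch below replays only the string-building steps, using a Python list as a stand-in for the stack. This stack model is a simplification for illustration, not an emulator.

```python
import struct

stack = []  # the top of the stack is the end of the list


def push_dword(value: int) -> None:
    # x86 is little-endian: the low byte lands at the lowest address
    stack.append(struct.pack("<I", value))


def memory_from_esp() -> bytes:
    # Bytes starting at esp: the most recently pushed dword comes first
    return b"".join(reversed(stack))


push_dword(0xAA68732F)          # push dword 0xaa68732f -> b"/sh\xaa"
top = bytearray(stack[-1])
top[3] = 0x00                   # mov [esp+0x3],al (al is 0 after xor eax,eax)
stack[-1] = bytes(top)
push_dword(0x6E69622F)          # push dword 0x6e69622f -> b"/bin"

path = memory_from_esp()        # mov ebx,esp: ebx points here
print(path)                     # b'/bin/sh\x00'
```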
Unfortunately, ndisasm is tied to the x86 architecture. Another tool with disassembly capabilities that works on many platforms is objdump, from the GNU binutils package. It supports[29] many popular architectures, such as i386, MIPS, and SPARC, and it also handles many binary formats, such as ELF, PE, and Mach-O. Like ndisasm, it can work with raw instructions devoid of any structure, which makes it well suited to shellcode:
$ objdump -m i386 -b binary -D /tmp/shellcode
/tmp/shellcode: file format binary
Disassembly of section .data:
0000000000000000 <.data>:
0: 31 c0 xor %eax,%eax
2: 68 2f 73 68 aa push $0xaa68732f
7: 88 44 24 03 mov %al,0x3(%esp)
b: 68 2f 62 69 6e push $0x6e69622f
10: 89 e3 mov %esp,%ebx
12: 50 push %eax
13: 53 push %ebx
14: 89 e1 mov %esp,%ecx
16: b0 0b mov $0xb,%al
18: 33 d2 xor %edx,%edx
1a: cd 80 int $0x80
1c: cc int3
The -b binary switch tells objdump to treat the file as raw binary data with no object-file format. Because the file then carries no information about which platform its instructions are for, we also need to specify the architecture, hence -m i386. The two listings also differ in syntax: objdump uses AT&T syntax by default, whereas ndisasm uses Intel syntax. If you prefer Intel syntax, pass the -M intel parameter to objdump. As we can see here, objdump works the same way for other platforms, such as MIPS:
$ objdump -m mips -b binary -D /tmp/shellcode_mips
/tmp/shellcode_mips: file format binary
Disassembly of section .data:
0000000000000000 <.data>:
0: ffff1004 bltzal zero,0x0
4: ab0f0224 li v0,4011
8: 55f04620 addi a2,v0,-4011
c: 6606ff23 addi ra,ra,1638
10: c2f9ec23 addi t4,ra,-1598
14: 6606bd23 addi sp,sp,1638
18: 9af9acaf sw t4,-1638(sp)
1c: 9ef9a6af sw a2,-1634(sp)
20: 9af9bd23 addi sp,sp,-1638
24: 21208001 move a0,t4
28: 2128a003 move a1,sp
2c: cccd4403 syscall 0xd1337
30: 2f62696e ldr t1,25135(s3)
34: 2f736800 0x68732f
Here we can see that the last eight bytes have been decoded as instructions, although they are data and should have been read as the /bin/sh string. Because raw shellcode has no structure, the disassembler has no way to tell code and data apart.
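Reading the hex digits of those last two words back as bytes, in the order objdump printed them, recovers the string. The following is a short illustrative check in Python 3. The word values are copied from the listing above.

```python
# The last two words of the MIPS listing, as printed by objdump
words = ["2f62696e", "2f736800"]

# Join the hex digits and turn them back into the raw bytes
raw = bytes.fromhex("".join(words))

print(raw)                                   # b'/bin/sh\x00'
print(raw.rstrip(b"\x00").decode("ascii"))   # /bin/sh
```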
Sometimes the packaged disassemblers just will not do what you need. Whether you want to process instructions programmatically or write a custom disassembler for a more advanced project, there is no need to start completely from scratch, because libraries exist to help. One such library is libopcode, which GNU binutils programs use to handle assembly language on the supported architectures. It is tightly linked with libbfd, which handles binary formats for binutils. Both libopcode and libbfd can be complicated to use. Still, it is convenient to have a mainstream library that handles many architectures, and using these libraries is simpler than writing a disassembler from scratch.
Example 10-3 is a program that uses libopcode to disassemble the Linux /bin/sh shellcode from Example 10-2.
Example 10-3. The my_disas.c program
#include <stdio.h>
#include <bfd.h>
#include <dis-asm.h>

/* Disassemble size bytes of buffer, printing one instruction per line.
   Returns the number of bytes decoded. */
int my_disas(unsigned char *buffer, bfd_size_type size, bfd_vma vma, FILE *ostream)
{
    bfd_size_type bytes;
    disassembler_ftype disassemble_fn;
    disassemble_info info;

    INIT_DISASSEMBLE_INFO(info, ostream, fprintf);

    /* Set up the target */
    info.flavour = bfd_target_unknown_flavour;
    info.arch = bfd_arch_i386;          /* enum bfd_architecture from bfd.h */
    info.mach = bfd_mach_i386_i386;
    info.endian = BFD_ENDIAN_LITTLE;
    disassemble_fn = print_insn_i386;

    /* Describe the raw code to disassemble */
    info.buffer = buffer;
    info.buffer_length = size;
    info.buffer_vma = vma;

    bytes = 0;
    while (bytes < size) {
        int len;

        fprintf(ostream, "%8lX : ", (unsigned long)(vma + bytes));
        len = (*disassemble_fn)(vma + bytes, &info);  /* length of this insn */
        fprintf(ostream, "\n");
        if (len <= 0)                   /* stop on a decoding error */
            break;
        bytes += len;
    }
    return (int) bytes;
}

unsigned char shellcode[] =
    "\x31\xc0\x68\x2f\x73\x68\xaa\x88\x44\x24\x03\x68\x2f\x62\x69\x6e\x89\xe3\x50"
    "\x53\x89\xe1\xb0\x0b\x33\xd2\xcd\x80\xcc";

int main(int argc, char *argv[])
{
    /* sizeof - 1 skips the NUL that terminates the string literal */
    my_disas(shellcode, sizeof(shellcode) - 1, 0, stdout);
    return 0;
}
Here is the result of compiling and running such a program:
$ gcc -g -o my_disas my_disas.c -lopcodes -lbfd
$ ./my_disas
       0 : xor %eax,%eax
       2 : push $0xaa68732f
       7 : mov %al,0x3(%esp)
       B : push $0x6e69622f
      10 : mov %esp,%ebx
      12 : push %eax
      13 : push %ebx
      14 : mov %esp,%ecx
      16 : mov $0xb,%al
      18 : xor %edx,%edx
      1A : int $0x80
      1C : int3
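Like objdump, libopcode prints i386 code in AT&T syntax by default. In AT&T syntax, operands appear in source, destination order, registers carry a % prefix, immediates carry a $ prefix, and memory references are written as disp(%base). The toy converter below handles just these simple forms. It can help line this output up against the Intel-syntax ndisasm listing. It is a hypothetical helper, not part of any library, and it ignores several features:

- scaled-index addressing
- segment prefixes
- size suffixes such as movl
- the size keywords (such as dword) that ndisasm adds

```python
import re

# A simple AT&T memory operand: optional displacement, then (%base)
_MEM = re.compile(r"^(-?(?:0x[0-9a-fA-F]+|\d+))?\(%(\w+)\)$")


def operand_to_intel(op: str) -> str:
    """Convert one simple AT&T operand to Intel notation."""
    m = _MEM.match(op)
    if m:
        disp, base = m.groups()
        if disp is None:
            return "[%s]" % base
        if disp.startswith("-"):
            return "[%s%s]" % (base, disp)
        return "[%s+%s]" % (base, disp)
    if op.startswith("%") or op.startswith("$"):  # register or immediate
        return op[1:]
    return op


def att_to_intel(line: str) -> str:
    """Reverse operand order and drop AT&T sigils (simple forms only)."""
    parts = line.split(None, 1)
    if len(parts) == 1:
        return parts[0]  # no operands, e.g. int3
    mnemonic, rest = parts
    # Assumes no commas inside an operand, i.e. no (%base,%index,scale)
    ops = [operand_to_intel(o.strip()) for o in rest.split(",")]
    return "%s %s" % (mnemonic, ",".join(reversed(ops)))


for insn in ["xor %eax,%eax", "mov %al,0x3(%esp)", "push $0x6e69622f",
             "mov $0xb,%al", "int $0x80", "int3"]:
    print(att_to_intel(insn))
```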
Even though the libopcode library supports many architectures and is widely distributed through the GNU binutils package, other programs that need opcode disassembly rarely use it. There are several reasons for this. It is difficult to use, and it requires many structures to be initialized even for simple disassembly. Last but not least, it provides no metadata with the disassembled opcodes, only formatted text.
Another project worth mentioning is mammon's libdisasm library. It is a standalone library from the authors of the Bastard disassembly environment. libdisasm is more than just a disassembly library: it also provides metadata on the disassembled instructions (e.g., their operands and whether each operand is read or written). This makes it easy to perform complex analyses such as data propagation or determining whether two instructions can be swapped.
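The value of per-operand metadata shows up in analyses like the instruction-swap check just mentioned. The following is a minimal sketch, assuming the read/write sets have already been extracted. They are written by hand here, standing in for what a library like libdisasm reports. Two adjacent instructions can be reordered if neither writes something the other reads or writes. The Insn class and can_swap function are hypothetical and are not libdisasm's API. Sub-registers are folded into their 32-bit parent (al counts as eax), which is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Insn:
    text: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def can_swap(a: Insn, b: Insn) -> bool:
    """True if neither instruction writes a location the other reads or
    writes (no read-after-write, write-after-read, or write-after-write)."""
    return not (a.writes & (b.reads | b.writes) or b.writes & a.reads)


# Hand-written metadata for adjacent instructions from Example 10-2;
# "mem[esp]" stands for the stack slot a push writes.
mov_al = Insn("mov al,0xb", reads=set(), writes={"eax"})
xor_edx = Insn("xor edx,edx", reads={"edx"}, writes={"edx", "eflags"})
push_eax = Insn("push eax", reads={"eax", "esp"}, writes={"esp", "mem[esp]"})
push_ebx = Insn("push ebx", reads={"ebx", "esp"}, writes={"esp", "mem[esp]"})

print(can_swap(mov_al, xor_edx))     # True: no location in common
print(can_swap(push_eax, push_ebx))  # False: both update esp and the stack
```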
The libdisasm library can be used from multiple languages. The following example uses it from Python. Disassembly is done on special buffers, DisasmBuffer objects, that hold the machine code. After disassembly, the buffer's instructions( ) method returns a list of address/opcode pairs, so we only have to iterate over it and print its elements. The operands( ) method returns the instruction's operand list along with the metadata associated with each operand.
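#! /usr/bin/env python
import sys
from libdisasm import disasm, disasmbuf

# Wrap the raw machine code read from standard input in a disassembly buffer
dbuf = disasmbuf.DisasmBuffer(sys.stdin.read())

# Decode the buffer with a linear sweep, one instruction after another
d = disasm.LinearDisassembler()
d.disassemble(dbuf)

for rva, opcode in dbuf.instructions():
    # Show each operand as its access flags (e.g. r-- or rw-) and its value
    operands = ["%s %-13s" % (x.access(), "[%s]" % str(x))
                for x in opcode.operands()]
    print("%08x: %-20s %s" % (rva, str(opcode), "".join(operands)))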
When applied to another /bin/sh shellcode, stored in /tmp/binsh.egg, this small program produces output like this:
$ ./eggdis.py < /tmp/binsh.egg
00000000: push 11              r-- [11]         rw- [esp]
00000002: pop eax              -w- [eax]        rw- [esp]
00000003: cdq                  rw- [eax]        -w- [edx]
00000004: push edx             r-- [edx]        rw- [esp]
00000005: push 0x68732F6E      r-- [0x68732F6E] rw- [esp]
0000000a: push 0x69622F2F      r-- [0x69622F2F] rw- [esp]
0000000f: mov ebx, esp         -w- [ebx]        r-- [esp]
00000011: push edx             r-- [edx]        rw- [esp]
00000012: push ebx             r-- [ebx]        rw- [esp]
00000013: mov ecx, esp         -w- [ecx]        r-- [esp]
00000015: int -128             r-- [-128]
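The last line is int 0x80, the same system-call gate used in Example 10-2 (opcode 0xcd followed by the byte 0x80). It prints as int -128 because the tool appears to treat the 8-bit immediate as a signed value. The byte is identical either way, as this short check shows:

```python
import struct

# int 0x80 is encoded as cd 80: opcode 0xcd, then an 8-bit immediate
encoding = bytes([0xCD, 0x80])
imm = encoding[1:]

print(struct.unpack("B", imm)[0])  # 128: the byte read as unsigned (0x80)
print(struct.unpack("b", imm)[0])  # -128: the same byte read as signed
```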
[29] The objdump program found on most platforms is usually built for that platform only. However, GNU objdump can also be compiled to handle many other architectures (for example, by configuring binutils with --enable-targets=all). Debian users can install the binutils-multiarch package.