Message Attachments

While the direct content of a message is displayed clearly in our mail readers, to be read or deleted as we see fit, an attachment poses a dilemma. We cannot easily determine its contents without examining it, but that process alone can expose us to any computer virus that it might contain. This section will explain how you can safely extract the contents of a suspicious attachment and determine their function. Consider this email as an example:

    From: support@symantec.com
    To: XYZ@craic.com
    Subject: Re: Submit a Virus Sample
    Date: Sat, 15 Jan 2005 23:58:39 +0800

    The sample file you sent contains a new virus version of mydoom.j.
    Please clean your system with the attached signature.

    Sincerly,
     Robert Ferrew

    +++ Attachment: No Virus found
    +++ MessageLabs AntiVirus - www.messagelabs.com

Although that sounds vaguely convincing, I’m not going to trust an email from an antivirus company, Symantec, which appears to screen its messages with software from its competitor, MessageLabs. We can assume that the attached file, datfiles.zip, contains a virus or something equally nasty. How can we isolate the payload and figure out what it represents?

Warning

It should go without saying that you should not attempt any extraction or analysis of viruses, worms, or spyware on any Windows system.

On a Unix system, download the entire email message into a new directory and look at the text. Here are the relevant lines from our example. It has three parts: the mail headers, the text of the message, and a large block of encoded text.

    From: support@symantec.com
    To: XYZ@craic.com
    Subject: Re: Submit a Virus Sample
    Date: Sat, 15 Jan 2005 23:58:39 +0800
    MIME-Version: 1.0
    Content-Type: multipart/mixed;
            boundary="----=_NextPart_000_0016----=_NextPart_000_0016"

    This is a multi-part message in MIME format.

    ------=_NextPart_000_0016----=_NextPart_000_0016
    Content-Type: text/plain;
            charset="Windows-1252"
    Content-Transfer-Encoding: 7bit

    The sample file you sent contains a new virus version of mydoom.j.
    [...]

    ------=_NextPart_000_0016----=_NextPart_000_0016
    Content-Type: application/octet-stream;
            name="datfiles.zip"
    Content-Transfer-Encoding: base64
    Content-Disposition: attachment;
            filename="datfiles.zip"

    UEsDBAoAAAAAAEtqLzKjiB3egHMAAIBzAABTAAAAZG9jdW1lbnQudHh0ICAg
    ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg
    [...]
    ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIC5l
    eGVQSwUGAAAAAAEAAQCBAAAA8XMAAAAA

    ------=_NextPart_000_0016----=_NextPart_000_0016--

The Content-Type header line tells us that the message is in MIME format with multiple parts in potentially different formats:

    Content-Type: multipart/mixed;
            boundary="----=_NextPart_000_0016----=_NextPart_000_0016"

It also tells us the string that is used to mark the boundaries between the different parts. It doesn’t matter what the string is, as long as it doesn’t occur in the real text of any part. Typically these are long cryptic strings such as the one used here:

    ----=_NextPart_000_0016----=_NextPart_000_0016

Looking through the message, we can see three lines that contain this string. Each is the boundary string preceded by two extra dashes, which is how MIME marks a delimiter line. These lines bound the two parts of this email: the text of the message, followed by the encoded attachment. The third instance is slightly different: it also ends with two dashes, which signifies that no more parts follow.
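The way the boundary string carves up a message can be sketched in a few lines of Python. This is only an illustration of the mechanism, not a full MIME parser; the function name split_mime_parts and the shortened sample body are assumptions made for the example.

```python
def split_mime_parts(body: str, boundary: str) -> list[str]:
    """Split a multipart body into its parts using the boundary string."""
    delimiter = "--" + boundary   # every part is introduced by "--" + boundary
    closing = delimiter + "--"    # the final marker ends with two more dashes
    parts, current = [], None
    for line in body.splitlines():
        marker = line.rstrip()
        if marker in (delimiter, closing):
            if current is not None:
                parts.append("\n".join(current))
            if marker == closing:
                return parts
            current = []
        elif current is not None:
            current.append(line)
        # Lines before the first delimiter (the preamble) are ignored.
    return parts  # closing marker missing: an unterminated part is dropped


boundary = "----=_NextPart_000_0016----=_NextPart_000_0016"
sample = "\n".join([
    "This is a multi-part message in MIME format.",
    "",
    "--" + boundary,
    "Content-Type: text/plain;",
    "",
    "The sample file you sent contains a new virus version of mydoom.j.",
    "",
    "--" + boundary,
    "Content-Type: application/octet-stream;",
    "",
    "UEsDBAoA",
    "",
    "--" + boundary + "--",
])
parts = split_mime_parts(sample, boundary)
print(len(parts))  # 2
```

Each returned part still begins with its own header lines, which are discussed next.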

Each part of the message, defined by these strings, has its own header lines that tell us what format it is in. The headers for the message part are:

    Content-Type: text/plain;
            charset="Windows-1252"
    Content-Transfer-Encoding: 7bit

These tell us that this block of content is plain text in the Windows-1252 character set, a common choice for English and other Western European languages, and that it is sent as is, in 7-bit form, with no further encoding. The character set would be different if the text used, say, Japanese characters. More interesting are the headers for the attachment:

    Content-Type: application/octet-stream;
            name="datfiles.zip"
    Content-Transfer-Encoding: base64
    Content-Disposition: attachment;
            filename="datfiles.zip"

Here the content type is application/octet-stream, the generic type for arbitrary binary data, which tells us nothing about what the file really is. Binary data, such as executables or images, must be encoded as simple ASCII text before it can be transmitted via email. The particular encoding used here is given in the Content-Transfer-Encoding header and is Base64, which is perhaps the most common type. I talk a bit more about Base64 in Chapter 4 in the context of disguising information. The Content-Disposition header tells us the filename that should be used if and when the attachment block is saved to disk in the recipient’s email client. These headers are followed by a large block of indecipherable characters, which represents the encoded attachment.
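Even a single line of that block can be decoded on its own at no risk, because the bytes are only inspected and never executed. The sketch below decodes the first encoded line of the example message and reads it as the start of a zip local file header. The field layout is the standard 30-byte zip header; the variable names are my own.

```python
import base64
import struct

# First line of the encoded block, copied from the example message.
first_line = "UEsDBAoAAAAAAEtqLzKjiB3egHMAAIBzAABTAAAAZG9jdW1lbnQudHh0ICAg"

raw = base64.b64decode(first_line)  # 60 Base64 characters become 45 bytes

# A zip local file header is 30 fixed bytes (little-endian) followed by
# the filename.
(signature, version, flags, method, mod_time, mod_date,
 crc32, packed_size, unpacked_size, name_len, extra_len) = struct.unpack(
    "<4sHHHHHIIIHH", raw[:30])

print(signature)       # b'PK\x03\x04'
print(unpacked_size)   # 29568
print(name_len)        # 83
print(raw[30:])        # b'document.txt   '
```

The PK signature already hints at a zip archive, and the header claims an 83-character filename, far longer than the 12 characters of document.txt that are visible at this point.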

To reveal what this contains, you need to decode this block. Your email client will do this for you but, as that is the way in which the payload of a virus is normally installed, you need to take a more cautious approach.

A simple and effective tool for this purpose is munpack, which was written by John G. Myers at Carnegie Mellon University. It can be downloaded, along with its partner mpack, from ftp://ftp.andrew.cmu.edu/pub/mpack/. The tools are compiled and installed on a Unix or Mac OS X system in a default location by the commands make and make install. Windows users will find binary executables at a number of download sites.

munpack is very easy to use. Given the name of the file containing your email, it will extract the attachment and report the name of the file it saved its contents to.

    % munpack virus_sample.eml
    datfiles.zip (application/octet-stream)

It actually creates two files: datfiles.zip and one called datfiles.desc. The latter contains the text of the message part of the email.
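If munpack is not available, a similar extraction can be sketched with the email package in the Python standard library. This is not how munpack itself works, only a rough equivalent; the function name extract_attachments and the fallback name partN.bin are assumptions. As with munpack, the payload is written to disk but never opened or run.

```python
import email
import os
from email import policy


def extract_attachments(eml_path: str, out_dir: str) -> list[str]:
    """Decode every attachment in eml_path and save it under out_dir."""
    with open(eml_path, "rb") as handle:
        message = email.message_from_binary_file(handle, policy=policy.default)

    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, part in enumerate(message.iter_attachments()):
        data = part.get_payload(decode=True)  # undoes Base64; None if nested
        if data is None:
            continue
        # Never trust the sender's filename: keep only its last component.
        claimed = (part.get_filename() or "").replace("\\", "/")
        name = os.path.basename(claimed) or f"part{index}.bin"
        target = os.path.join(out_dir, name)
        with open(target, "wb") as out:
            out.write(data)
        saved.append(target)
    return saved
```

Calling extract_attachments("virus_sample.eml", "quarantine") would be expected to leave datfiles.zip in a new quarantine directory, which also keeps it away from any files of the same name.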

Having successfully extracted the payload from its delivery mechanism, you can now focus on what it contains. The .zip suffix suggests that it is a zip archive containing one or more files. But why should you trust that? The standard Unix command file can help us here. It knows about a wide range of file types and uses several approaches to make a best guess. You simply pass it the filename:

    % file datfiles.zip
    datfiles.zip: Zip archive data, at least v1.0 to extract
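One of the approaches file relies on is matching the first few bytes of a file against known signatures, often called magic numbers. The sketch below shows the idea with a tiny, hand-picked table. The real file command consults a far larger database and applies other tests as well, and the names SIGNATURES, sniff, and sniff_file are assumptions made for this example.

```python
# A few well-known leading byte signatures; nowhere near a complete list.
SIGNATURES = {
    b"PK\x03\x04": "Zip archive",
    b"MZ": "DOS/Windows executable",
    b"\x7fELF": "ELF executable",
    b"%PDF": "PDF document",
}


def sniff(data: bytes) -> str:
    """Guess a file type from its leading bytes, ignoring its name."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 8 bytes of a file and guess its type."""
    with open(path, "rb") as handle:
        return sniff(handle.read(8))


print(sniff(b"PK\x03\x04\x0a\x00\x00\x00"))  # Zip archive
```

Because only the leading bytes are read, the suffix chosen by the sender has no influence on the answer.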

This does indeed appear to be a zip file, so let’s unpack it and see what’s inside. unzip is a standard Unix program that will take care of this. Windows users can use an equivalent tool, such as WinZip or pkunzip. If you want to play it safe, create a new directory, move the zip file into it, and unpack it there so as not to overwrite any other files that might have the same names. To be especially cautious, you can have unzip list the files first without extracting them using the -l option:

    % unzip -l datfiles.zip
    Archive:  datfiles.zip
      Length     Date   Time    Name
     --------    ----   ----    ----
        29568  01-15-05 13:18   document.txt
                               .exe
     --------                   -------
        29568                   1 file

This tells us the archive contains a single file called document.txt...or does it? Actually it is a single file called document.txt .exe, where the .txt and .exe are separated by 67 spaces. This trick is often used in virus or spyware attachments. By padding out the filename with whitespace, the creator hopes that you will not notice the .exe suffix that indicates that it is an executable. For the sake of readability, I’ve renamed the file to document.txt.exe in the following paragraphs.
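Padding of this kind is easy to overlook in a listing, but it stands out once each name is printed in quoted form. Here is a minimal sketch using the zipfile module from the Python standard library, which reads the archive's directory without extracting anything. The function name list_members and the rule used to flag a name (two or more consecutive spaces, or leading or trailing whitespace) are assumptions chosen for illustration.

```python
import zipfile


def list_members(zip_path: str) -> list[tuple[str, int, bool]]:
    """Return (name, size, suspicious) for each member, without extracting."""
    report = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            name = info.filename
            # Flag runs of spaces inside the name, or whitespace at either end.
            suspicious = "  " in name or name != name.strip()
            report.append((name, info.file_size, suspicious))
    return report
```

Printing repr(name) for each entry returned by list_members("datfiles.zip") would show the whole padded name inside quotes, so a trailing .exe cannot hide off the edge of the screen.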

Now let’s throw caution to the wind and actually unzip the file and then run file on its product:

    % unzip datfiles.zip
    Archive:  datfiles.zip
     extracting: document.txt.exe
    % file document.txt.exe
    document.txt                                                                   .
    exe: MS-DOS executable (EXE), OS/2 or MS Windows

This confirms the suspicion that this is a Windows executable file. Now, we’re getting pretty close to what is most likely a virus. While it may have no effect on a Linux or Mac OS X system, I just don’t want to push my luck by trying to run the program and seeing what happens. And, of course, if you are doing this on a Windows system then don’t run it! Not only that, but if you use Samba to share filesystems between Unix and Windows, then make sure no one is able to run it from the Windows side by accident!

We can go a bit further without risking any damage. Although most of the content of an executable program is binary, there are often text strings embedded therein. These represent things such as error messages, library names, and so forth. We can look for these using another standard Unix program called strings. This will interpret a binary file as text and output any strings of at least four printable characters that it finds. You will want to pipe the output into more as it produces a lot of garbage, but hidden in there are real words and, sometimes, complete sentences. To see what it can reveal about a regular program, try it out on a standard Unix program:

    % strings /bin/sh | more

Running it on our suspect file produces a large amount of output, of which a sampling is shown here:

    % strings document.txt.exe | more
    !Windows Program
    KERNEL32.dll
    LoadLibraryA
    GetProcAddress
    bAZ\D$
    +;_+
    RyR
    [...]
    CU'l
    nfig9x.dql
    Protec
    KERN`L32.dql
    [...]

There is not a lot of recognizable text, but there are a few interesting things. The first few lines refer to KERNEL32.dll, a core Windows library, and to LoadLibraryA and GetProcAddress, two functions it provides for loading other libraries at run time. Then we get into all the gobbledygook. But down near the bottom is the word “Protec”. That looks out of place and is worth running through Google to see what is known about it. Sure enough, there is a worm called Protec.B listed on the web sites of antivirus companies, so perhaps this is an instance of that payload.
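The core of what strings does, finding runs of at least four printable characters in binary data, can be sketched in a few lines of Python. This is a simplification: it treats only the ASCII characters from space to tilde as printable, and the function name extract_strings is an assumption made for the example.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least min_len printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


sample = b"\x00\x01KERNEL32.dll\x00ab\x00LoadLibraryA\xff"
print(extract_strings(sample))  # ['KERNEL32.dll', 'LoadLibraryA']
```

The short run "ab" is dropped because it falls below the four-character minimum, which is what keeps most random byte sequences out of the output.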

Windows users do not have the tools file or strings built in to their operating system. This can be addressed by installing the Cygwin package (available at http://www.cygwin.com/), which provides Windows equivalents of most common Unix command-line tools.

Delving any deeper into the dissection of viruses and worms would be beyond the scope of this book. But you can learn a lot by applying these simple Unix commands to the attachments that you come across in your Inbox. Look at a few examples of viruses or worms and you will notice similar approaches taken by their authors to their packaging and the naming of files. Even more interesting can be attachments that attempt to install spyware. Dissecting these can lead to a series of files that would, if they got the chance, install themselves on a Windows system and seriously impact its performance. To learn more about the disassembly of binary executables and similar techniques, you might want to look at Security Warrior by Cyrus Peikari and Anton Chuvakin (O’Reilly).