While the direct content of a message is displayed clearly in our mail readers, to be read or deleted as we see fit, an attachment poses a dilemma. We cannot easily determine its contents without examining it, but that process alone can expose us to any computer virus that it might contain. This section will explain how you can safely extract the contents of a suspicious attachment and determine their function. Consider this email as an example:
From: support@symantec.com
To: XYZ@craic.com
Subject: Re: Submit a Virus Sample
Date: Sat, 15 Jan 2005 23:58:39 +0800

The sample file you sent contains a new virus version of mydoom.j.
Please clean your system with the attached signature.

Sincerly,
Robert Ferrew

+++ Attachment: No Virus found
+++ MessageLabs AntiVirus - www.messagelabs.com
Although that sounds vaguely convincing, I’m not going to trust an email from an antivirus company, Symantec, which appears to screen its messages with software from its competitor, MessageLabs. We can assume that the attached file, datfiles.zip, contains a virus or something equally nasty. How can we isolate the payload and figure out what it represents?
It should go without saying that you should not attempt any extraction or analysis of viruses, worms, or spyware on any Windows system.
On a Unix system, download the entire email message into a new directory and look at the text. Here are the relevant lines from our example. The message has three parts: the mail headers, the text of the message, and a large block of encoded text.
From: support@symantec.com
To: XYZ@craic.com
Subject: Re: Submit a Virus Sample
Date: Sat, 15 Jan 2005 23:58:39 +0800
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0016----=_NextPart_000_0016"

This is a multi-part message in MIME format.

------=_NextPart_000_0016----=_NextPart_000_0016
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: 7bit

The sample file you sent contains a new virus version of mydoom.j.
[...]

------=_NextPart_000_0016----=_NextPart_000_0016
Content-Type: application/octet-stream; name="datfiles.zip"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="datfiles.zip"

UEsDBAoAAAAAAEtqLzKjiB3egHMAAIBzAABTAAAAZG9jdW1lbnQudHh0ICAg
ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg
[...]
ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIC5l
eGVQSwUGAAAAAAEAAQCBAAAA8XMAAAAA

------=_NextPart_000_0016----=_NextPart_000_0016--
The Content-Type header line tells us that the message is in MIME format with multiple parts in potentially different formats:
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0016----=_NextPart_000_0016"
It also tells us the string that is used to mark the boundaries between the different parts. It doesn’t matter what the string is, as long as it doesn’t occur in the real text of any part. Typically these are long cryptic strings such as the one used here:
----=_NextPart_000_0016----=_NextPart_000_0016
Looking through the message, we can see three lines that contain this string, each one prefixed with two extra dashes. These mark the boundaries of the two parts of this email: the text of the message, followed by the encoded attachment. The third instance is slightly different: it also ends with two dashes, which signifies that there are no more parts to the message after it.
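As a rough illustration of how the boundary string divides a message, the following Python sketch parses a cut-down copy of the example with the standard library's email package. The shortened message text and the names BOUNDARY and RAW are assumptions made for the example; only the boundary string and the header lines are taken from the message above, and the attachment body is a harmless placeholder.

```python
from email import message_from_string

# A shortened stand-in for the message above. The boundary string matches
# the original; the attachment body is a harmless placeholder.
BOUNDARY = "----=_NextPart_000_0016----=_NextPart_000_0016"
RAW = "\n".join([
    "From: support@symantec.com",
    "MIME-Version: 1.0",
    f'Content-Type: multipart/mixed; boundary="{BOUNDARY}"',
    "",
    "This is a multi-part message in MIME format.",
    "",
    f"--{BOUNDARY}",
    'Content-Type: text/plain; charset="Windows-1252"',
    "Content-Transfer-Encoding: 7bit",
    "",
    "The sample file you sent contains a new virus version of mydoom.j.",
    "",
    f"--{BOUNDARY}",
    'Content-Type: application/octet-stream; name="datfiles.zip"',
    "Content-Transfer-Encoding: base64",
    'Content-Disposition: attachment; filename="datfiles.zip"',
    "",
    "UEsDBAoA",
    "",
    f"--{BOUNDARY}--",
    "",
])

msg = message_from_string(RAW)
boundary = msg.get_boundary()

# Each delimiter line is the boundary string prefixed with two dashes.
delimiters = [line for line in RAW.splitlines()
              if line.startswith("--" + boundary)]

# For multipart content, get_payload() returns a list of sub-messages.
parts = msg.get_payload()

print(boundary)
print(len(delimiters), "delimiter lines,", len(parts), "parts")
for part in parts:
    print(part.get_content_type(), part.get_filename())
```

Parsing a message this way only reads its text; nothing in the attachment is decoded to disk or executed.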
Each part of the message, defined by these strings, has its own header lines that tell us what format it is in. The headers for the message part are:
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: 7bit
These tell us this block of content is plain text in the Windows-1252 character set, the Windows code page for Western European languages such as English. This would be different if the text used, say, Japanese characters. More interesting are the headers for the attachment:
Content-Type: application/octet-stream; name="datfiles.zip"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="datfiles.zip"
Here the content type is application/octet-stream, which means that the part is arbitrary binary data rather than text. Binary data, such as executables or images, cannot be sent through email as it stands, so it is encoded as simple ASCII text for transmission. The particular encoding used here is given in the Content-Transfer-Encoding header and is Base64, which is perhaps the most common type. I talk a bit more about Base64 in Chapter 4 in the context of disguising information. The Content-Disposition header tells us the filename that should be used if and when the attachment block is saved to disk in the recipient’s email client. These headers are followed by a large block of indecipherable characters, which represents the encoded attachment.
To reveal what this contains, you need to decode this block. Your email client will do this for you but, as that is the way in which the payload of a virus is normally installed, you need to take a more cautious approach.
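Decoding Base64 is purely mechanical and never runs the data it produces. As a small sketch, the Python below decodes only the first and last lines of the encoded block shown earlier, using the standard base64 module. The variable names are my own, and the offsets used (a 30-byte local header with the filename length stored at bytes 26 and 27) come from the general layout of the zip format, not from anything specific to this message.

```python
import base64

# First and last lines of the encoded block in the example message.
first_line = "UEsDBAoAAAAAAEtqLzKjiB3egHMAAIBzAABTAAAAZG9jdW1lbnQudHh0ICAg"
last_line = "eGVQSwUGAAAAAAEAAQCBAAAA8XMAAAAA"

head = base64.b64decode(first_line)
tail = base64.b64decode(last_line)

# A zip local file header starts with the bytes "PK\x03\x04".
signature = head[:4]

# Bytes 26-27 of the header hold the filename length, little-endian.
name_length = int.from_bytes(head[26:28], "little")

# The filename itself begins right after the 30-byte header.
name_start = head[30:]

print(signature)    # b'PK\x03\x04'
print(name_length)  # 83
print(name_start)   # b'document.txt   '
print(tail[:6])     # b'xePK\x05\x06'
```

Even these two lines hint at what is to come: the data carries a zip signature, and the stored filename is 83 bytes long, far more than a name like document.txt needs.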
A simple and effective tool for this purpose is munpack, which was written by John G. Myers at Carnegie Mellon University. It can be downloaded, along with its partner mpack, from ftp://ftp.andrew.cmu.edu/pub/mpack/. The tools are compiled and installed in a default location on a Unix or Mac OS X system by the commands make and make install. Windows users will find binary executables at a number of download sites.
munpack is very easy to use. Given the name of the file containing your email, it will extract the attachment and report the name of the file to which it saved the contents.
% munpack virus_sample.eml
datfiles.zip (application/octet-stream)
munpack actually creates two files: datfiles.zip and one called datfiles.desc. The latter contains the text of the message part of the email.
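If munpack is not available, a similar extraction can be sketched with Python's standard email package. This is not how munpack itself is implemented; the function name save_attachments, the fallback name attachment.bin, and the shortened SAMPLE message with its XXXX boundary are all assumptions made for illustration. The sketch only writes the decoded bytes to disk and never opens or runs them.

```python
import os
import tempfile
from email import message_from_bytes


def save_attachments(raw_message, out_dir):
    """Decode every part that carries a filename and write it into out_dir."""
    saved = []
    msg = message_from_bytes(raw_message)
    for part in msg.walk():
        name = part.get_filename()
        if not name:
            continue  # message text and multipart containers have no filename
        # Keep only the last path component so that a hostile name
        # cannot point outside out_dir.
        safe_name = os.path.basename(name.replace("\\", "/")) or "attachment.bin"
        data = part.get_payload(decode=True)  # undoes the Base64 encoding
        path = os.path.join(out_dir, safe_name)
        with open(path, "wb") as fh:
            fh.write(data)
        saved.append(path)
    return saved


# Hypothetical, shortened message; the attachment body is a placeholder.
SAMPLE = b"""\
From: support@symantec.com
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XXXX"

--XXXX
Content-Type: text/plain

The sample file you sent contains a new virus version of mydoom.j.

--XXXX
Content-Type: application/octet-stream; name="datfiles.zip"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="datfiles.zip"

UEsDBAoA

--XXXX--
"""

with tempfile.TemporaryDirectory() as tmp:
    paths = save_attachments(SAMPLE, tmp)
    names = [os.path.basename(p) for p in paths]
    sizes = [os.path.getsize(p) for p in paths]
    print(names, sizes)
```

Writing into a fresh directory mirrors the advice to work in a new directory, so an extracted file cannot overwrite anything you care about.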
Having successfully extracted the payload from its delivery mechanism, you can now focus on what it contains. The .zip suffix suggests that it is a zip archive containing one or more files. But why should you trust that? The standard Unix command file can help us here. It knows about a wide range of file types and uses several approaches to make a best guess. You simply pass it the filename:
% file datfiles.zip
datfiles.zip: Zip archive data, at least v1.0 to extract
This does indeed appear to be a zip file, so let’s unpack it and see what’s inside. unzip is a standard Unix program that will take care of this. Windows users can use an equivalent tool, such as winzip or pkunzip. If you want to play it safe, then create a new directory, move the zip file into that, and unpack it there so as not to overwrite any other files that might have the same names. To be especially cautious, you can have unzip list the files first without extracting them using the -l option:
% unzip -l datfiles.zip
Archive:  datfiles.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
    29568  01-15-05 13:18   document.txt
.exe
 --------                   -------
    29568                   1 file
This tells us the archive contains a single file called document.txt...or does it? Actually, the name of that file is document.txt followed by 67 spaces and then .exe. This trick is often used in virus or spyware attachments. By padding out the filename with whitespace, the creator hopes that you will not notice the .exe suffix that indicates that it is an executable. For the sake of readability, I’ve renamed the file to document.txt.exe in the following paragraphs.
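Padding of this kind is easier to spot when the name is printed in quoted form. The Python sketch below uses the standard zipfile module to list member names without extracting anything. It builds its own harmless in-memory archive that imitates the trick, so the archive contents, the helper name describe_members, and the rule of flagging any run of two spaces are all assumptions made for the example.

```python
import io
import zipfile

# Build a harmless in-memory archive whose single member imitates the
# trick: "document.txt", then 67 spaces, then ".exe".
padded_name = "document.txt" + " " * 67 + ".exe"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(padded_name, b"placeholder, not a real program")


def describe_members(archive):
    """List (quoted name, real extension, padded?) without extracting."""
    report = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            name = info.filename
            extension = name.rsplit(".", 1)[-1] if "." in name else ""
            padded = "  " in name  # a run of spaces is a warning sign
            report.append((repr(name), extension, padded))
    return report


buf.seek(0)
for shown, extension, padded in describe_members(buf):
    print(shown)
    print("real extension:", extension, "| padded:", padded)
```

Because repr() wraps the name in quotes, the trailing .exe stays attached to the name no matter how wide the terminal is.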
Now let’s throw caution to the wind and actually unzip the file and then run file on its product:
% unzip datfiles.zip
Archive:  datfiles.zip
 extracting: document.txt.exe
% file document.txt.exe
document.txt.exe: MS-DOS executable (EXE), OS/2 or MS Windows
This confirms the suspicion that this is a Windows executable file. Now, we’re getting pretty close to what is most likely a virus. While it may have no effect on a Linux or Mac OS X system, I just don’t want to push my luck by trying to run the program and seeing what happens. And, of course, if you are doing this on a Windows system then don’t run it! Not only that, but if you use Samba to share filesystems between Unix and Windows, then make sure no one is able to run it from the Windows side by accident!
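One of the approaches that file relies on is looking at the first few bytes of a file rather than its name. The Python sketch below shows only that one idea, with a handful of well-known signatures. The table, the labels, and the function name guess_type are my own and do not reproduce the output or the much larger database of the real file command.

```python
# A tiny table of well-known leading byte sequences ("magic numbers").
# The labels are informal descriptions, not the wording file prints.
SIGNATURES = [
    (b"PK\x03\x04", "Zip archive data"),
    (b"MZ", "MS-DOS or Windows executable"),
    (b"\x7fELF", "ELF executable"),
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image data"),
]


def guess_type(data):
    """Guess a file type from its leading bytes, ignoring any filename."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "data"  # nothing recognized


print(guess_type(b"PK\x03\x04\n\x00"))
print(guess_type(b"MZ\x90\x00\x03"))
print(guess_type(b"just some text"))
```

A check like this reads bytes and compares them; it never executes the file, which is why it is safe to apply to a suspect attachment.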
We can go a bit further without risking any damage. Although most of the content of an executable program is binary, there are often text strings embedded therein. These represent things such as error messages, library names, and so forth. We can look for these using another standard Unix program called strings. This will interpret a binary file as text and output any strings of at least four printable characters that it finds. You will want to pipe the output into more as it produces a lot of garbage, but hidden in there are real words and, sometimes, complete sentences. To see what it can reveal about a regular program, try it out on a standard Unix program:
% strings /bin/sh | more
Running it on our suspect file produces a large amount of output, of which a sampling is shown here:
% strings document.txt.exe | more
!Windows Program
KERNEL32.dll
LoadLibraryA
GetProcAddress
bAZ\D$
+;_+
RyR
[...]
CU'l
nfig9x.dql
Protec
KERN`L32.dql
[...]
There is not a lot of recognizable text, but there are a few interesting things. The first few lines refer to the Windows system library KERNEL32.dll and two of its functions, LoadLibraryA and GetProcAddress; then we get into all the gobbledygook. But down near the bottom is the word “Protec”. That looks out of place and worth running through Google to see what is known about it. Sure enough, there is a worm called Protec.B listed on the web sites of antivirus companies, so perhaps this is an instance of that payload.
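The core of what strings does can be sketched in a few lines of Python: scan the raw bytes for runs of at least four printable ASCII characters. The function name extract_strings and the sample bytes are made up for the example, and the real program has more options and slightly different rules about which characters count as printable.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII characters in data."""
    # Bytes 0x20 through 0x7e are the printable ASCII characters.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Made-up binary data with a few names embedded between non-printable bytes.
sample = (b"\x00\x01KERNEL32.dll\x00LoadLibraryA\x00\xff\x10"
          b"\x02ab\x03GetProcAddress\x00")

for text in extract_strings(sample):
    print(text)
```

The two-character fragment in the sample is skipped because it falls below the minimum length, which is how the four-character threshold filters out much of the noise.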
Windows users do not have the tools file or strings built in to their operating system. This can be addressed by installing the Cygwin package (available at http://www.cygwin.com/), which provides Windows equivalents of most common Unix command-line tools.
Delving any deeper into the dissection of viruses and worms would be beyond the scope of this book. But you can learn a lot by applying these simple Unix commands to the attachments that you come across in your Inbox. Look at a few examples of viruses or worms and you will notice similar approaches taken by their authors to their packaging and the naming of files. Even more interesting can be attachments that attempt to install spyware. Dissecting these can lead to a series of files that would, if they got the chance, install themselves on a Windows system and seriously impact its performance. To learn more about the disassembly of binary executables and similar techniques, you might want to look at Security Warrior by Cyrus Peikari and Anton Chuvakin (O’Reilly).