Introducing the struct module

The struct module is part of the Python standard library and is incredibly useful. It converts between Python values and C structures represented as binary data. Full documentation for this module can be found at http://docs.python.org/3/library/struct.html.

For forensic purposes, the most important function in the struct module is unpack(). This function takes a format string describing the objects to be extracted, followed by the binary data itself. The size dictated by the format string must match the size of the binary data supplied to the function exactly.

The format string informs the unpack() function of what kind of data is in the binary object and how it should be interpreted. If we misidentify the types of data, the values returned will be wrong, and if we try to unpack more or less data than is provided, the struct module will raise an exception. The following is a table of the most common characters we use to build our format strings. The standard size column indicates the expected size of the binary object in bytes:

Character	Python object	Standard size (bytes)
`h`	Integer	2
`i`	Integer	4
`q`	Integer	8
`s`	Bytes	1
`x`	None (pad byte)	1

There are additional characters that can be used in format strings. For example, other characters can interpret binary data as floats, Booleans, and various other C types. The x character is simply a padding character that can be used to ignore bytes we're not interested in.
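
As a quick check of the sizes in the table, the following minimal sketch uses struct.calcsize(), which reports how many bytes a format string covers. The < prefix is used so that standard sizes apply, and the names sizes and tag are illustrative only:

```python
import struct

# Map each format character from the table to its standard size in bytes.
# The "<" prefix forces standard sizes regardless of the host platform.
sizes = {code: struct.calcsize('<' + code) for code in 'hiqsx'}
print(sizes)

# A count before "s" gives the length of a single bytes object,
# while "x" consumes one byte without producing a value.
(tag,) = struct.unpack('<4sx', b'ABCD\x00')
print(tag)
```

Note that a count in front of s behaves differently from a count in front of the integer characters: 4s yields one 4-byte object, not four 1-byte objects.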

Additionally, an optional starting character can be used to define byte order, size, and alignment. The default is native byte order, size, and alignment. As we cannot predict the environment the script might be running on, it is often not advisable to use any native option. Instead, we can specify little or big endian byte order with standard sizes using the < and > symbols, respectively. Let's practice with a few examples.
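
To see why the native option is risky, the following sketch compares the size of the same format string in standard and native modes, and then reads the same two bytes in both byte orders. The variable names are illustrative, and the native size is deliberately not stated because it depends on the platform:

```python
import struct
import sys

# Standard sizes ignore C alignment rules: a 2-byte plus a 4-byte integer is 6 bytes.
standard_size = struct.calcsize('<hi')
# Native mode ("@", the default) may insert padding; the result varies by platform.
native_size = struct.calcsize('@hi')
print(sys.byteorder, standard_size, native_size)

# The same two bytes yield different integers depending on the byte order prefix.
two_bytes = b'\x01\x00'
little = struct.unpack('<h', two_bytes)[0]
big = struct.unpack('>h', two_bytes)[0]
print(little, big)
```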

First, open an interactive prompt and import struct. Next, we assign the four bytes 0x01000000 to a variable. In a Python 3 bytes literal, each byte is written as an escape character and an x followed by two hexadecimal digits. The length of our hex data is four bytes, and to interpret this as an integer, we use the i character. Interpreting the hex as a little endian integer returns a value of 1:

>>> import struct 
>>> raw_data = b'\x01\x00\x00\x00' # Integer (1) 
>>> print(struct.unpack('<i', raw_data)) # Little-Endian 
(1,)

The <i and >i strings are the format strings. We are telling the unpack() function to interpret raw_data as a four-byte integer in little or big endian byte ordering. The struct module returns the unpacked data as a tuple. By default, Python will print a single-element tuple in parentheses with a trailing comma, as seen in the following output:

>>> print(struct.unpack('>i', raw_data)) # Big-Endian 
(16777216,) 
>>> print(type(struct.unpack('>i', raw_data))) 
<class 'tuple'>
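
Because unpack() always returns a tuple, even for a single value, we usually need to pull the value out before using it. Two common ways of doing so are sketched here; the names value and same_value are illustrative:

```python
import struct

raw_data = b'\x01\x00\x00\x00'

# Option 1: index into the returned tuple.
value = struct.unpack('<i', raw_data)[0]

# Option 2: tuple unpacking, which also fails loudly if the
# format string yields more than one element.
(same_value,) = struct.unpack('<i', raw_data)

print(value, same_value)
```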

Let's look at another example. We can interpret rawer_data as three 4-byte integers by using three i characters. Alternatively, we can prepend a number to the format character to parse multiple values in a row. In both cases, when interpreted as little endian, we receive the integers 1, 5, and 4. If we aren't interested in the middle integer, we can skip it with the 4x format code:

>>> rawer_data = b'\x01\x00\x00\x00\x05\x00\x00\x00\x04\x00\x00\x00' 
>>> print(struct.unpack('<iii', rawer_data)) 
(1, 5, 4) 
>>> print(struct.unpack('<3i', rawer_data)) 
(1, 5, 4) 
>>> print(struct.unpack('<i4xi', rawer_data)) # "skip" 4 bytes 
(1, 4)
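
Padding is not the only way to skip bytes. As an alternative sketch, the struct.unpack_from() function accepts a byte offset, so we can read a value from the middle of a buffer without describing everything before it. The names last_value, step, and values are illustrative:

```python
import struct

rawer_data = b'\x01\x00\x00\x00\x05\x00\x00\x00\x04\x00\x00\x00'

# Read one 4-byte little endian integer starting at byte offset 8.
(last_value,) = struct.unpack_from('<i', rawer_data, 8)
print(last_value)

# Walk the buffer one integer at a time, using calcsize() as the stride.
step = struct.calcsize('<i')
values = [struct.unpack_from('<i', rawer_data, offset)[0]
          for offset in range(0, len(rawer_data), step)]
print(values)
```

Unlike unpack(), unpack_from() only requires that the buffer be at least as large as the format from the given offset, so it does not complain about trailing bytes.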

We raised the possibility of errors with struct earlier in this section. Now, let's purposely create errors with struct to understand what they mean. We receive an error for the following two examples because we tried to unpack() more or fewer values than were actually present in the rawer_data variable used previously. This can cause some initial frustration when trying to unpack a large amount of binary data. Always be sure to check the math, the byte order, and whether the size is standard or native:

>>> print(struct.unpack('<4i', rawer_data)) 
struct.error: unpack requires a buffer of 16 bytes
>>> print(struct.unpack('<2i', rawer_data)) 
struct.error: unpack requires a buffer of 8 bytes
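
One way to avoid these errors is to compare the size of the format string with the size of the data before unpacking. The following is a minimal sketch of that idea; check_and_unpack() is a hypothetical helper written for this illustration, not a function provided by the struct module:

```python
import struct

rawer_data = b'\x01\x00\x00\x00\x05\x00\x00\x00\x04\x00\x00\x00'


def check_and_unpack(fmt, data):
    """Unpack data with fmt, or return None when the sizes differ."""
    expected = struct.calcsize(fmt)
    if expected != len(data):
        print('Format {!r} needs {} bytes, but {} were supplied'.format(
            fmt, expected, len(data)))
        return None
    return struct.unpack(fmt, data)


good = check_and_unpack('<3i', rawer_data)     # sizes match
too_many = check_and_unpack('<4i', rawer_data)  # format is larger than the data
too_few = check_and_unpack('<2i', rawer_data)   # format is smaller than the data

# The exception itself can also be caught directly.
try:
    struct.unpack('<4i', rawer_data)
    caught = False
except struct.error:
    caught = True
```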

Let's take it one step further and parse a UserAssist value using the struct module. We will parse a Windows XP value, which represents the easiest scenario as it is only 16 bytes in length. The byte offsets of a Windows XP UserAssist value are recorded in the following table:

Byte offset	Value	Object
0-3	Session ID	Integer
4-7	Count	Integer
8-15	FILETIME	Integer

The following hex dump is saved into the file Neguhe Qrag.bin. The file is packaged with the code bundle that can be downloaded from https://packtpub.com/books/content/support:

0000: 0300 0000 4800 0000  |....H...
0008: 01D1 07C4 FA03 EA00  |........

When unpacking data from a file object, we need to open it in the rb mode rather than the default r mode to ensure that we can read the data as bytes. Once we have the raw data, we can parse it using our format string. We know that the first 8 bytes are two 4-byte integers (2i), followed by one 8-byte integer (q) representing the FILETIME of the UserAssist value. We can use indexing on the returned tuple to print out each extracted integer:

>>> rawest_data = open('Neguhe Qrag.bin', 'rb').read()
>>> parsed_data = struct.unpack('<2iq', rawest_data)
>>> print('Session ID: {}, Count: {}, FILETIME: {}'.format(parsed_data[0], parsed_data[1], parsed_data[2]))
...
Session ID: 3, Count: 72, FILETIME: 65869520115847425
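
If the file from the code bundle is not at hand, the same 16 bytes can be built in memory from the hex dump. The sketch below does that and wraps the result in a namedtuple so that fields can be accessed by name rather than by index. UserAssistXP and its field names are hypothetical, chosen to mirror the byte offset table:

```python
import struct
from collections import namedtuple

# Hypothetical container mirroring the three fields in the byte offset table.
UserAssistXP = namedtuple('UserAssistXP', ['session_id', 'count', 'filetime'])

# The 16 bytes shown in the hex dump, built in memory instead of read from disk.
raw_value = bytes.fromhex('03000000 48000000 01D107C4FA03EA00')

# Two 4-byte integers followed by one 8-byte integer, all little endian.
record = UserAssistXP._make(struct.unpack('<2iq', raw_value))
print('Session ID: {}, Count: {}, FILETIME: {}'.format(
    record.session_id, record.count, record.filetime))
```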

Once we have parsed the UserAssist values in our script, we will present the results in a report-ready format. In the past, we have used CSV and HTML for output reports. Reports are often reviewed in spreadsheet format using software such as Microsoft Excel. To provide reports that fully leverage this software, we will learn how to create XLSX-formatted spreadsheets as an output of our script.