Parsing Outlook PST Containers

Electronic mail (email) continues to be one of the most common methods of communication in the workplace, persisting despite the many new communication services available today. Emails can be sent from computers, websites, and the phones found in so many pockets across the globe. This medium allows for the reliable transmission of information in the form of text, HTML, attachments, and more. It's no wonder, then, that emails can play a large part in investigations, especially for cases involving the workplace. In this chapter, we're going to work with a common email format, Personal Storage Table (PST), used by Microsoft Outlook to store email content in a single file.

The script we'll develop in this chapter introduces us to a series of operations available through the libpff library developed by Joachim Metz. This library allows us to open PST files and explore their contents in a Pythonic manner. Additionally, the code we build demonstrates how to create dynamic, HTML-based graphics to provide additional context to spreadsheet-based reports. For these reports, we'll leverage the Jinja2 module, introduced in Chapter 5, Databases in Python, and the D3.js framework to generate our dynamic HTML-based charts.
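To give a feel for what exploring a PST looks like, here is a minimal sketch that uses pypff, the Python bindings for libpff. It is not the script developed in this chapter. The helper names open_pst, walk_folders, and main are our own illustrations, and the "<root>" label for an unnamed root folder is an assumption made for readability.

```python
from __future__ import print_function


def open_pst(pst_path):
    """Open a PST file with pypff and return the file object."""
    # pypff is imported here, not at module level, so the folder walker
    # below can be used and tested without the library installed.
    import pypff
    pst = pypff.file()
    pst.open(pst_path)
    return pst


def walk_folders(folder, path=""):
    """Yield (folder_path, message_count) for a folder and its descendants."""
    # The root folder of a PST may have no name; "<root>" is our placeholder.
    name = folder.name or "<root>"
    current = path + "/" + name
    yield current, folder.number_of_sub_messages
    for child in folder.sub_folders:
        # Plain loop instead of "yield from" to stay Python 2.7 compatible.
        for item in walk_folders(child, current):
            yield item


def main(pst_path):
    """Print every folder in the PST along with its message count."""
    pst = open_pst(pst_path)
    try:
        for folder_path, count in walk_folders(pst.get_root_folder()):
            print("{}: {} messages".format(folder_path, count))
    finally:
        pst.close()
```

Calling main() with the path to a test PST prints one line per folder. Because walk_folders only relies on the name, number_of_sub_messages, and sub_folders attributes, it can be exercised with simple stand-in objects before it is pointed at real evidence.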

The D3.js project is a JavaScript framework that allows us to design informative and dynamic charts without much effort. The charts used in this chapter are open source examples of the framework shared with the community at https://github.com/d3/d3. Since this book doesn't focus on JavaScript, nor does it introduce the language, we won't cover the implementation details to create these charts. Instead, we'll demonstrate how to add our Python results to a pre-existing template.
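As a rough illustration of that idea, the following sketch counts email senders in Python, serializes the result as JSON, and drops it into a template with Jinja2. The template string, the chart_data variable, and the sender_counts and render_chart helpers are hypothetical stand-ins, not the templates or functions used later in this chapter.

```python
import json
from collections import Counter


def sender_counts(senders, top_n=5):
    """Return the top_n most frequent senders as a list of dicts."""
    counts = Counter(senders)
    # Sort by count (descending), then by name, so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"label": name, "count": count} for name, count in ranked[:top_n]]


# Hypothetical stand-in for a pre-existing D3.js chart template.
CHART_TEMPLATE = """<script>
var data = {{ chart_data }};
// ... existing D3.js drawing code would consume `data` here ...
</script>"""


def render_chart(senders):
    """Fill the template with JSON-encoded sender counts."""
    # Jinja2 is a third-party module; imported here so the data
    # preparation above works without it installed.
    from jinja2 import Template
    payload = json.dumps(sender_counts(senders))
    return Template(CHART_TEMPLATE).render(chart_data=payload)
```

The JavaScript in the template stays untouched; Python only supplies the data. This separation is what lets us reuse community chart examples without needing to write the charting code ourselves.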

Finally, we'll use a sample PST file, which has a large variety of data across time, to test our script. As always, we recommend running any code against test files before using it in casework to validate the logic and feature coverage. The library used in this chapter is in active development and is labeled experimental by the developer.

The following are the topics covered in this chapter:

The code for this chapter is developed and tested using Python 2.7.15.