Summarizing data in the folder_report() function

At this point, we've collected a fair amount of information about messages and folders. We use this code block to export that data into a simple report for review. To create this report, we require the message_list and folder_name variables. On line 146, we check whether there are any entries in message_list. If there are none, we log a warning and return from the function so that none of the remaining code runs.

If message_list has content, we start to create a CSV report. We first build the report filename and pass it to the make_path() function, which returns the absolute path of that file within the output directory. On line 152, we open this path in wb mode to write our CSV file; binary mode also prevents a bug that would otherwise add an extra blank line between the rows of our report. On the following line, we define the list of headers for the output document.
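The script itself targets Python 2 with unicodecsv, where wb mode is the fix for the blank-row bug. As a point of comparison, here is a minimal Python 3 sketch using the standard csv module, where the equivalent fix is opening the file in text mode with newline=''. The make_path() body and the OUTPUT_DIR location are assumptions for illustration; the script's real make_path() is defined elsewhere.

```python
import csv
import os
import tempfile

# Hypothetical output directory; the real script defines its own location.
OUTPUT_DIR = tempfile.mkdtemp()


def make_path(file_name):
    """Assumed behavior: return an absolute path for file_name in OUTPUT_DIR."""
    return os.path.abspath(os.path.join(OUTPUT_DIR, file_name))


def write_rows(file_name, rows):
    path = make_path(file_name)
    # newline='' keeps the csv module's '\r\n' row endings from being
    # translated a second time on Windows, which is what causes blank rows.
    with open(path, 'w', newline='', encoding='utf-8') as fout:
        csv.writer(fout).writerows(rows)
    return path
```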

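The header list on lines 153 and 154 defines the report's columns, in order. Feel free to modify these lines to change the column order or to add columns. Any column you add should be a key present in the dictionaries within message_list; with the standard csv module's DictWriter, a missing key is written as the restval value (an empty string by default), so a misspelled key silently produces an empty column rather than an error.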
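The following Python 3 sketch, using the standard csv module and made-up sample messages, shows both points: the fieldnames list controls column order, and a dictionary lacking one of those keys gets the restval value in that column.

```python
import csv
import io

# Illustrative sample data; the second message has no attachment_count key.
rows = [
    {'sender': 'alice@example.com', 'subject': 'Status', 'attachment_count': 2},
    {'sender': 'bob@example.com', 'subject': 'Re: Status'},
]

# Reordering this list reorders the output columns.
header = ['subject', 'sender', 'attachment_count']

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=header, restval='')
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```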

Following our headers, we instantiate the csv.DictWriter class on line 155. Recall that at the start of our script we imported the unicodecsv library to handle Unicode characters when writing to a CSV, using the as keyword to rename the module from unicodecsv to csv. Because this module provides the same methods as the standard library csv module, we can keep using the familiar function calls. When initializing DictWriter(), we pass the open file object, the field names, and an argument telling the class what to do with unused information within the message_list dictionaries. Since we are not writing every key in these dictionaries, we set extrasaction='ignore' so that DictWriter() skips the extra keys instead of raising an error, as follows:

138 def folder_report(message_list, folder_name):
139     """
140     The folder_report function generates a report per PST folder
141     :param message_list: A list of messages discovered
142         during scans
143     :param folder_name: The name of an Outlook folder within a PST
144     :return: None
145     """
146     if not len(message_list):
147         logger.warning("Empty message not processed")
148         return
149
150     # CSV Report
151     fout_path = make_path("folder_report_" + folder_name + ".csv")
152     fout = open(fout_path, 'wb')
153     header = ['creation_time', 'submit_time', 'delivery_time',
154               'sender', 'subject', 'attachment_count']
155     csv_fout = csv.DictWriter(fout, fieldnames=header,
156                               extrasaction='ignore')
157     csv_fout.writeheader()
158     csv_fout.writerows(message_list)
159     fout.close()
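To see why extrasaction matters, this Python 3 sketch (standard csv module, illustrative message data) writes the same dictionary twice: once with the default extrasaction='raise', which rejects keys that are not in fieldnames, and once with extrasaction='ignore', which drops them.

```python
import csv
import io

# Illustrative message: 'body' is not one of the report columns.
message = {'sender': 'alice@example.com', 'subject': 'Status',
           'body': 'Full message text we do not want in the CSV'}
header = ['sender', 'subject']

# Default extrasaction='raise': keys missing from fieldnames cause ValueError.
raised = False
try:
    csv.DictWriter(io.StringIO(), fieldnames=header).writerow(message)
except ValueError as err:
    raised = True
    print('raise mode:', err)

# extrasaction='ignore': the extra 'body' key is silently skipped.
buf = io.StringIO()
csv.DictWriter(buf, fieldnames=header, extrasaction='ignore').writerow(message)
print('ignore mode:', buf.getvalue().strip())
```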

With csv_fout initialized and configured, we write the header row using the writeheader() method on line 157. Next, we write the dictionary fields of interest to the file using the writerows() method. Once all the rows are written, we close fout on line 159, which flushes the data to disk and releases the file handle.

On lines 162 through 190, we prepare the dictionaries from message_list for use in generating the HTML report statistics. On line 162, we use the global statement to indicate that we are editing the date_list global variable. (Strictly, Python requires global only when a function assigns to the name itself; subscripted updates such as date_list[day_of_week][hour_of_day] += 1 modify the existing list in place.) We then open two text files to record a raw list of all message bodies and sender names. A later section uses these files to generate our statistics, and writing the data to disk rather than holding it in memory avoids consuming large amounts of memory. These two files, opened on lines 163 and 164, use append ('a') mode, which creates the file if it doesn't exist or appends to the end of it if it does.
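Append mode is what lets every call to folder_report() add to the same files. This Python 3 sketch calls a hypothetical record_senders() helper twice, the way the script processes one folder after another; the output directory is a temporary stand-in.

```python
import os
import tempfile

OUTPUT_DIR = tempfile.mkdtemp()  # hypothetical output location


def record_senders(senders):
    # 'a' creates the file on first use and appends on every later call.
    path = os.path.join(OUTPUT_DIR, 'senders_names.txt')
    with open(path, 'a', encoding='utf-8') as out:
        for sender in senders:
            out.write(sender + '\n')


record_senders(['alice@example.com'])                     # first folder
record_senders(['bob@example.com', 'alice@example.com'])  # second folder
```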

On line 165, we start a for loop to iterate through each message, m, in message_list. If the message's body key has a value, we write that value to the output file followed by two line breaks to separate it from the next message. On lines 168 and 169, we perform a similar process for the sender key. In this instance, we use only one line break so that a later function can iterate through the senders more easily, one per line:

162     global date_list  # Allow access to edit global variable
163     body_out = open(make_path("message_body.txt"), 'a')
164     senders_out = open(make_path("senders_names.txt"), 'a')
165     for m in message_list:
166         if m['body']:
167             body_out.write(m['body'] + "\n\n")
168         if m['sender']:
169             senders_out.write(m['sender'] + '\n')
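The two delimiters pay off when the files are read back. The later statistics function is not shown in this section, so the following Python 3 reader is only a sketch of one way to consume the files, using in-memory stand-ins for their contents. Note that splitting bodies on a blank line would also split a message whose body itself contains blank lines.

```python
import collections
import io

# Stand-ins for the contents of message_body.txt and senders_names.txt.
body_text = "First message\nsecond line\n\nSecond message\n\n"
senders_text = "alice@example.com\nbob@example.com\nalice@example.com\n"

# Senders: one per line, so plain line-by-line iteration is enough.
sender_counts = collections.Counter(
    line.strip() for line in io.StringIO(senders_text) if line.strip())

# Bodies: separated by a blank line, so split on '\n\n' and drop empties.
bodies = [b for b in body_text.split('\n\n') if b]
```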

After collecting the message content and senders, we accumulate the date information. To generate our heat map, we combine all three activity timestamps (creation, submission, and delivery) into a single count so they form one chart. After checking that a valid date value is available, we take its day of the week to determine which of the dictionaries within the date_list list to update.

Python's datetime.datetime class has a weekday() method and an hour attribute, which give us these values as integers and handle the messy conversions for us. The weekday() method returns an integer from 0 to 6, where 0 represents Monday and 6 represents Sunday. The hour attribute is an integer from 0 to 23 representing the time in 24-hour format; however, the JavaScript we're using for the heat map requires an integer between 1 and 24 to process correctly. Because of this, we add 1 to each hour value, as seen on lines 175, 181, and 187.
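A quick check of that mapping: January 1, 2024 fell on a Monday and January 7, 2024 on a Sunday, so these two timestamps land at the two corners of the heat map. heatmap_keys() is a hypothetical helper name.

```python
from datetime import datetime


def heatmap_keys(ts):
    """Map a datetime to (day index 0=Monday..6=Sunday, hour key 1..24)."""
    return ts.weekday(), ts.hour + 1


print(heatmap_keys(datetime(2024, 1, 1, 0, 15)))   # → (0, 1)
print(heatmap_keys(datetime(2024, 1, 7, 23, 59)))  # → (6, 24)
```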

We now have the correct weekday and time of day keys we need to update the value in the date_list. Upon completing the loop, we can close the two file objects on lines 189 and 190:

171         # Creation Time
172         c_time = m['creation_time']
173         if c_time is not None:
174             day_of_week = c_time.weekday()
175             hour_of_day = c_time.hour + 1
176             date_list[day_of_week][hour_of_day] += 1
177         # Submit Time
178         s_time = m['submit_time']
179         if s_time is not None:
180             day_of_week = s_time.weekday()
181             hour_of_day = s_time.hour + 1
182             date_list[day_of_week][hour_of_day] += 1
183         # Delivery Time
184         d_time = m['delivery_time']
185         if d_time is not None:
186             day_of_week = d_time.weekday()
187             hour_of_day = d_time.hour + 1
188             date_list[day_of_week][hour_of_day] += 1
189     body_out.close()
190     senders_out.close()
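The three near-identical blocks on lines 171 through 188 can be collapsed into one loop over the timestamp keys. The sketch below assumes date_list is a list of seven dictionaries (index 0 is Monday), each mapping hour keys 1 through 24 to counts; the script initializes date_list elsewhere, so that layout is an assumption. It also uses message.get(), which tolerates a missing key where m['creation_time'] would raise KeyError, and it takes date_list as a parameter so no global statement is needed.

```python
from datetime import datetime

# Assumed layout: one dict per weekday, hour keys 1..24 initialized to 0.
date_list = [{hour: 0 for hour in range(1, 25)} for _ in range(7)]


def count_message_times(message, date_list):
    """Add one heat-map hit per timestamp present on the message."""
    for key in ('creation_time', 'submit_time', 'delivery_time'):
        ts = message.get(key)
        if ts is not None:
            date_list[ts.weekday()][ts.hour + 1] += 1
```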