Tuning the parse_setupapi() function

The parse_setupapi() function accepts the path of the setupapi.dev.log file as its only input. Before opening the file, we must initialize the device_list variable on line 68 so that we can store extracted device records in a list:

060 def parse_setupapi(setup_log):
061     """
062     Read data from provided file for Device Install Events for
063     USB Devices
064     :param setup_log: str - Path to valid setup api log
065     :return: list of tuples - Tuples contain device name and date
066     in that order
067     """
068     device_list = list()

Starting on line 69, we open the input file in a new way: the with statement opens the file as in_file and closes it automatically when the block ends, so we do not have to close it ourselves. Inside this with block, a for loop iterates over the file one line at a time, which keeps memory use low. In the previous iteration, we used the .readlines() method to read the entire file into a list of lines; although this is barely noticeable on small files, it can cause performance problems with larger files on systems with limited resources:

069     with open(setup_log) as in_file:
070         for line in in_file:
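
The difference between the two approaches can be seen in a small standalone sketch. The sample content and the temporary file are assumptions made purely for illustration; a real setupapi.dev.log is far larger:

```python
import os
import tempfile

# Hypothetical stand-in for a log file, written to a temporary path.
sample = "first line\nsecond line\nthird line\n"
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

line_count = 0
with open(sample_path) as in_file:
    # The file object is its own iterator, so lines are read as the loop
    # asks for them rather than being loaded into a list up front.
    for line in in_file:
        line_count += 1

# Leaving the with block has already closed the file for us.
file_closed = in_file.closed

os.remove(sample_path)
print(line_count, file_closed)
```

Because the loop never builds a list of every line, its memory use stays roughly constant regardless of the size of the file.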

Within the for loop, we use similar logic to determine whether the line contains our device installation indicators. If it does, we extract the device information in the same manner as discussed previously.

By defining the lower_line variable on line 71, we shorten the remaining code by avoiding repeated calls to the .lower() method. Please note that lines 73 through 75 are a single statement wrapped across three lines.

On line 73, the backslash (\) character tells Python to ignore the newline character and continue reading on the next line. At the end of line 74, we can break the line without a backslash, because that part of the conditional is inside parentheses:

071             lower_line = line.lower()
072             # if 'Device Install (Hardware initiated)' in line:
073             if 'device install (hardware initiated)' in \
074                     lower_line and ('ven' in lower_line or
075                                     'vid' in lower_line):
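
The two ways of wrapping a long condition can be compared side by side. This is a minimal sketch; the sample log line is a made-up placeholder, not an entry from a real log:

```python
# Hypothetical log line used only for illustration.
line = ">>>  [Device Install (Hardware initiated) - USB\\VID_0000&PID_0000\\0000]"
lower_line = line.lower()

# Explicit continuation: the backslash must be the last character on the line.
matched_backslash = 'device install (hardware initiated)' in \
    lower_line and 'vid' in lower_line

# Implicit continuation: inside parentheses, a line may break anywhere
# between tokens without a backslash.
matched_parens = ('device install (hardware initiated)' in lower_line
                  and ('ven' in lower_line or
                       'vid' in lower_line))

print(matched_backslash, matched_parens)
```

Wrapping the whole condition in parentheses, as in the second form, removes the need for the backslash altogether.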

As noted in the first iteration, a fair number of false positives were displayed in our output. That's because this log contains information related to many types of hardware devices, including those interfacing with PCI, and not just USB devices. To remove the noise, we will check to see what type of device it is.

We can split on the backslash character, as shown on lines 78 and 79, and check whether the first element of the split device_name variable contains the usb string. As mentioned in Chapter 1, Now for Something Completely Different, we need to escape a single backslash with another backslash so that Python treats it as a literal backslash character. This check matches devices labeled as USB and USBSTOR in the file. Some false positives will remain, since mice, keyboards, and hubs will likely appear as USB devices; however, we do not want to over-filter and miss relevant artifacts. If the entry does not contain the usb string, we execute the continue statement, which tells Python to skip the rest of the loop body and move on to the next iteration of the for loop:

078                 if 'usb' not in device_name.split(
079                         '\\')[0].lower():
080                     continue
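
To see the filter in isolation, the following sketch applies the same test to a few device names. The names are hypothetical placeholders chosen to resemble USB, USBSTOR, and PCI entries:

```python
# Hypothetical device names; '\\' in source code is one literal backslash.
device_names = [
    "USB\\VID_0000&PID_0000\\0000]",
    "USBSTOR\\Disk&Ven_Example&Prod_Drive\\0000]",
    "PCI\\VEN_0000&DEV_0000\\0000]",
]

usb_only = []
for device_name in device_names:
    # Everything before the first backslash names the bus or enumerator.
    if 'usb' not in device_name.split('\\')[0].lower():
        continue  # skip this entry and move on to the next one
    usb_only.append(device_name)

print(len('\\'), usb_only)
```

Only the first two names survive, because the substring test matches both USB and USBSTOR while rejecting PCI.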

To retrieve the date, we need a different way to reach the next line, since we are not using the enumerate() function. We use the next() function on line 87 to advance to the next line in the file, and then process that line in the same fashion as before:

087                 date = next(in_file).split('start')[1].strip()
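
The interaction between the for loop and next() is easier to follow in a small example. This sketch uses io.StringIO as a stand-in for an open file, and the header and detail lines are invented for illustration:

```python
import io

# io.StringIO behaves like an open text file for iteration purposes.
in_file = io.StringIO(
    "header A\n"
    "detail for A\n"
    "header B\n"
    "detail for B\n"
)

pairs = []
for line in in_file:
    if line.startswith("header"):
        # next() advances the same iterator the for loop is using, so the
        # loop will not see this following line again.
        following = next(in_file)
        pairs.append((line.strip(), following.strip()))

print(pairs)
```

One caveat worth keeping in mind: if a matching line were the last line of the file, next() would raise StopIteration. Passing a default value, as in next(in_file, ''), is one way to guard against that.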

With the device's name and date processed, we append them to the device_list as a tuple, where the device's name is the first value and the date is the second. We need the double parentheses here to ensure that our data is appended properly. The outer set belongs to the call to the .append() method. The inner parentheses build a tuple so that it is appended as one value. If we omitted the inner parentheses, we would be passing the two elements as separate arguments to the .append() method, which accepts exactly one argument and would raise a TypeError. Once all of the lines have been processed in the for loop, the with block ends and the file is closed. On line 90, the device_list is returned and the function exits:

088                 device_list.append((device_name, date))
089
090     return device_list
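
The role of the inner parentheses can be checked with a short experiment. The device name and date below are placeholder values, not output from a real log:

```python
device_list = list()

# Placeholder values used only for illustration.
device_name = "USB\\VID_0000&PID_0000\\0000"
date = "2000/01/01 00:00:00.000"

# Inner parentheses build one tuple; append() receives a single argument.
device_list.append((device_name, date))

# Without the inner parentheses, append() receives two arguments and fails.
try:
    device_list.append(device_name, date)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(device_list)
print(error_message)
```

After the failed call, the list still holds only the single tuple added by the first call.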