The Metadata_Parser framework overview

Now that we understand the concept of frameworks and what kind of data we're dealing with, we can examine the specifics of our framework implementation. Rather than a flow diagram, we use a high-level diagram to show how the scripts interact with each other:

The metadata_parser.py script controls the framework. It launches our three plugin scripts and then passes the returned data to the appropriate writer plugins. During processing, the plugins call the processors to validate data or perform other shared processing tasks. We have two writer plugins: one for CSV output and another that plots geotagged data using Google Earth's KML format.

Each plugin takes an individual file as its input and stores the parsed metadata tags in a dictionary. The plugin returns this dictionary to metadata_parser.py, which appends it to a list. Once all of our input files are processed, we send the resulting lists of dictionaries to the writers. We use the DictWriter class from the csv module to write our dictionary output to a CSV file.
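The data flow described above can be sketched in a few lines. This is a minimal illustration rather than the framework's actual code: the hypothetical_parser() and write_csv() functions, the file names, and the column names are all assumptions made for the example, and an in-memory buffer stands in for the output file.

```python
import csv
import io


def hypothetical_parser(path):
    """Stand-in for a plugin (assumption): return one dict of tags per file."""
    return {"Source": path, "Author": "unknown", "Title": path.upper()}


def write_csv(rows, fieldnames, handle):
    """Stand-in for a CSV writer plugin (assumption): dump dicts with DictWriter."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


# The controller collects one dictionary per input file in a list...
parsed = []
for path in ["a.docx", "b.docx"]:  # hypothetical input files
    parsed.append(hypothetical_parser(path))

# ...and hands the whole list to a writer once every file is processed.
buffer = io.StringIO()  # in-memory stand-in for an open CSV file
write_csv(parsed, ["Source", "Author", "Title"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the plugins limited to returning dictionaries means that a new output format only needs a new writer, with no change to the parsing code.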

As in Chapter 6, Extracting Artifacts from Binary Files, we organize our code into multiple directories in a logical manner. To use these directories as packages, we place an __init__.py file in each one, which marks it as a package, and then import the package in the code:

  |-- metadata_parser.py 
  |-- plugins 
      |-- __init__.py 
      |-- exif_parser.py 
      |-- id3_parser.py 
      |-- office_parser.py 
  |-- processors 
      |-- __init__.py 
      |-- utility.py 
  |-- writers 
      |-- __init__.py 
      |-- csv_writer.py 
      |-- kml_writer.py
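To see how this layout becomes importable, the following sketch recreates the tree in a temporary directory and imports a module from each package. It is only a demonstration under stated assumptions: the module bodies are one-line placeholders (the NAME variable is invented for the example), and the scratch directory is added to sys.path by hand because there is no real metadata_parser.py sitting beside the packages.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Package layout copied from the directory tree above.
LAYOUT = {
    "plugins": ["exif_parser", "id3_parser", "office_parser"],
    "processors": ["utility"],
    "writers": ["csv_writer", "kml_writer"],
}

root = Path(tempfile.mkdtemp())
for package, modules in LAYOUT.items():
    package_dir = root / package
    package_dir.mkdir()
    # An empty __init__.py is enough to mark the directory as a package.
    (package_dir / "__init__.py").write_text("")
    for module in modules:
        # Placeholder body (assumption): the real modules hold parsing code.
        (package_dir / (module + ".py")).write_text("NAME = %r\n" % module)

# When a script is run directly, its own directory is placed on sys.path;
# here the scratch directory is added manually to mimic that.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

exif_parser = importlib.import_module("plugins.exif_parser")
csv_writer = importlib.import_module("writers.csv_writer")
utility = importlib.import_module("processors.utility")
print(exif_parser.NAME, csv_writer.NAME, utility.NAME)
```

Inside a script located next to these directories, the equivalent statement would be an ordinary import such as `from plugins import exif_parser`.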