Developing the get_tags() function

The get_tags() function, with the help of the PIL module, parses EXIF metadata tags from our JPEG image. On line 72, we create a list of headers for our CSV output. This list contains all of the possible keys that might be created in our EXIF dictionary, in the order we want them to be displayed in a CSV file. Because not every JPEG image has the same embedded EXIF tags, or any at all, some dictionaries will have more keys than others. By supplying the writer with the list of ordered keys, we ensure that the fields are written in the appropriate order and columns:

062 def get_tags(filename):
063     """
064     The get_tags function extracts the EXIF metadata from the data
065     object.
066     :param filename: the path and name to the data object.
067     :return: tags and headers, tags is a dictionary containing
068     EXIF metadata and headers are the order of keys for the
069     CSV output.
070     """
071     # Set up CSV headers
072     headers = ['Path', 'Name', 'Size', 'Filesystem CTime',
073     'Filesystem MTime', 'Original Date', 'Digitized Date', 'Make',
074     'Model', 'Software', 'Latitude', 'Latitude Reference',
075     'Longitude', 'Longitude Reference', 'Exif Version', 'Height',
076     'Width', 'Flash', 'Scene Type']
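
To illustrate why the ordered header list matters, here is a minimal sketch using csv.DictWriter from the standard library. The shortened header list, the sample rows, and the rows_to_csv() helper are assumptions made for illustration; they aren't part of the script's own writer code.

```python
import csv
import io

# A shortened, illustrative header list (the real list is on line 72).
headers = ['Path', 'Name', 'Make', 'Model']

# Two hypothetical tag dictionaries: the second has no EXIF keys at all
# and its keys are in a different order.
rows = [
    {'Path': '/evidence/a.jpg', 'Name': 'a.jpg',
     'Make': 'Canon', 'Model': 'EOS'},
    {'Name': 'b.jpg', 'Path': '/evidence/b.jpg'},
]


def rows_to_csv(rows, headers):
    """Write dictionaries to CSV text in header order; missing keys
    become empty cells (restval)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=headers, restval='',
                            lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows, headers)
print(csv_text)
```

Each row lands in the same columns regardless of which keys the dictionary holds or the order in which they were added.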

On line 77, we open the JPEG file using the Image.open() function. We then perform one final validation step with the verify() method on line 80. This method checks the file for corruption and raises an exception if it finds any. Otherwise, on line 84, we call the _getexif() method, which returns a dictionary of EXIF metadata:

077     image = Image.open(filename)
078 
079     # Detects if the file is corrupt without decoding the data
080     image.verify()
081 
082     # Descriptions and values of EXIF tags
083     # http://www.exiv2.org/tags.html
084     exif = image._getexif()

On line 86, we create our dictionary, tags, which will store metadata about our file object. On lines 87 through 94, we populate the dictionary with some filesystem metadata: the full path, name, size, and the filesystem CTime and MTime timestamps. The os.path.basename() function takes the full pathname and returns the filename. For example, os.path.basename('Users/LPF/Desktop/myfile.txt') would simply return myfile.txt.

The os.path.getsize() function returns the file size in bytes. The larger the number, the harder it is for a human to read. We're more accustomed to seeing sizes with common prefixes, such as MB, GB, and TB. The convert_size() processor function performs this conversion to make the data more useful for the human analyst.
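
The following is a minimal sketch of how such a conversion could work. The convert_size_sketch() name, the 1,024-byte unit base, and the two-decimal output format are assumptions for illustration; the actual convert_size() in processors.utility may behave differently.

```python
def convert_size_sketch(size_bytes):
    """Hypothetical stand-in for processors.utility.convert_size().

    Repeatedly divides by 1024 (an assumed base) until the value
    fits the largest unit that keeps it below 1024.
    """
    units = ['B', 'KB', 'MB', 'GB', 'TB']
    size = float(size_bytes)
    index = 0
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    return '{:.2f} {}'.format(size, units[index])


print(convert_size_sketch(1536))           # 1.50 KB
print(convert_size_sketch(5 * 1024 ** 2))  # 5.00 MB
```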

On lines 91 and 93, we convert the values returned by os.path.getctime() and os.path.getmtime(), which are expressed in seconds since the epoch. Note that getctime() reports the creation time on Windows but the time of the last metadata change on Unix-like systems. The epoch, 01/01/1970 00:00:00, can be confirmed by calling time.gmtime(0). We use the gmtime() function to convert these seconds into a struct_time object in UTC, and then the strftime() function to format that object into our desired date string:

086     tags = {}
087     tags['Path'] = filename
088     tags['Name'] = os.path.basename(filename)
089     tags['Size'] = processors.utility.convert_size(
090         os.path.getsize(filename))
091     tags['Filesystem CTime'] = strftime('%m/%d/%Y %H:%M:%S',
092         gmtime(os.path.getctime(filename)))
093     tags['Filesystem MTime'] = strftime('%m/%d/%Y %H:%M:%S',
094         gmtime(os.path.getmtime(filename)))
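
The timestamp conversion can be tried in isolation. This short sketch wraps the same gmtime() and strftime() calls in a helper; the format_epoch() name and the sample value are assumptions chosen for illustration.

```python
from time import gmtime, strftime


def format_epoch(seconds):
    """Format seconds since the epoch as a UTC date string."""
    return strftime('%m/%d/%Y %H:%M:%S', gmtime(seconds))


# Zero seconds is the epoch itself.
epoch_string = format_epoch(0)
# A float, as os.path.getmtime() returns; the fraction is dropped.
later_string = format_epoch(1447237935.75)

print(epoch_string)  # 01/01/1970 00:00:00
print(later_string)  # 11/11/2015 10:32:15
```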

On line 95, we check whether the exif object contains any data, because _getexif() returns None when an image has no EXIF metadata. If it does, we iterate through each key and check its value. The values we're querying for are from the EXIF tags described at http://www.exiv2.org/tags.html. There are many potential EXIF tags, but we're going to query for only some of the more forensically relevant ones.

If a particular tag exists in the exif dictionary, we transfer its value to our tags dictionary. Some tags require additional processing, such as the timestamp, scene, flash, and GPS tags. The timestamp tags are stored in a format that's inconsistent with how we're representing other timestamps. For example, the time from tag 36867 on line 99 uses colons as date separators and puts the year first:

2015:11:11 10:32:15

On line 100, we use the strptime() function to convert our existing time string into a datetime object. On line 102, we use the strftime() function to convert it into our desired date string format:

095     if exif:
096         for tag in exif.keys():
097             if tag == 36864:
098                 tags['Exif Version'] = exif[tag]
099             elif tag == 36867:
100                 dt = datetime.strptime(exif[tag],
101                 '%Y:%m:%d %H:%M:%S')
102                 tags['Original Date'] = dt.strftime(
103                 '%m/%d/%Y %H:%M:%S')
104             elif tag == 36868:
105                 dt = datetime.strptime(exif[tag],
106                 '%Y:%m:%d %H:%M:%S')
107                 tags['Digitized Date'] = dt.strftime(
108                 '%m/%d/%Y %H:%M:%S')
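
The round trip through strptime() and strftime() can be checked on its own with the sample value shown earlier. The reformat_exif_timestamp() helper and its handling of malformed values are assumptions added for illustration; the listing above doesn't guard against bad input.

```python
from datetime import datetime


def reformat_exif_timestamp(value):
    """Convert an EXIF 'YYYY:MM:DD HH:MM:SS' string into
    'MM/DD/YYYY HH:MM:SS', or return None if it can't be parsed."""
    try:
        parsed = datetime.strptime(value, '%Y:%m:%d %H:%M:%S')
    except (TypeError, ValueError):
        # Missing or malformed timestamps are skipped, not fatal.
        return None
    return parsed.strftime('%m/%d/%Y %H:%M:%S')


print(reformat_exif_timestamp('2015:11:11 10:32:15'))
# 11/11/2015 10:32:15
```

Returning None for unparsable values is one possible design; it lets a caller leave the column empty instead of halting on a single damaged tag.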

The scene (41990) and flash (37385) tags have an integer value rather than a string. As we mentioned previously, the online documentation (http://www.exiv2.org/tags.html) explains what these integers represent. In these two scenarios, we create a dictionary containing the potential integers as keys and their descriptions as values. We check whether the tag's value is a key in our dictionary. If it's present, we store the description in the tags dictionary rather than the integer. Again, this is for the purpose of making analysis easier on the examiner. Seeing a string explanation of the scene or flash tag is more valuable than a number representing that explanation:

109             elif tag == 41990:
110                 # Scene tags
111                 # http://www.awaresystems.be/imaging/tiff/tifftags/privateifd/exif/scenecapturetype.html
112                 scenes = {0: 'Standard', 1: 'Landscape',
113                 2: 'Portrait', 3: 'Night Scene'}
114                 if exif[tag] in scenes:
115                     tags['Scene Type'] = scenes[exif[tag]]
116                 else:
117                     pass
118             elif tag == 37385:
119                 # Flash tags
120                 # http://www.awaresystems.be/imaging/tiff/tifftags/privateifd/exif/flash.html
121                 flash = {0: 'Flash did not fire',
122                 1: 'Flash fired',
123                 5: 'Strobe return light not detected',
124                 7: 'Strobe return light detected',
125                 9: 'Flash fired, compulsory flash mode',
126                 13: 'Flash fired, compulsory flash mode, return light not detected',
127                 15: 'Flash fired, compulsory flash mode, return light detected',
128                 16: 'Flash did not fire, compulsory flash mode',
129                 24: 'Flash did not fire, auto mode',
130                 25: 'Flash fired, auto mode',
131                 29: 'Flash fired, auto mode, return light not detected',
132                 31: 'Flash fired, auto mode, return light detected',
133                 32: 'No flash function',
134                 65: 'Flash fired, red-eye reduction mode',
135                 69: 'Flash fired, red-eye reduction mode, return light not detected',
136                 71: 'Flash fired, red-eye reduction mode, return light detected',
137                 73: 'Flash fired, compulsory flash mode, red-eye reduction mode',
138                 77: 'Flash fired, compulsory flash mode, red-eye reduction mode, return light not detected',
139                 79: 'Flash fired, compulsory flash mode, red-eye reduction mode, return light detected',
140                 89: 'Flash fired, auto mode, red-eye reduction mode',
141                 93: 'Flash fired, auto mode, return light not detected, red-eye reduction mode',
142                 95: 'Flash fired, auto mode, return light detected, red-eye reduction mode'}
143                 if exif[tag] in flash:
144                     tags['Flash'] = flash[exif[tag]]
145             elif tag == 271:
146                 tags['Make'] = exif[tag]
147             elif tag == 272:
148                 tags['Model'] = exif[tag]
149             elif tag == 305:
150                 tags['Software'] = exif[tag]
151             elif tag == 40962:
152                 tags['Width'] = exif[tag]
153             elif tag == 40963:
154                 tags['Height'] = exif[tag]
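
As an alternative to a long elif chain, the tags that are copied without modification could be driven by a lookup table. This is only a sketch of that idea, not the approach the script takes; the SIMPLE_TAGS name, the extract_simple_tags() helper, and the sample data are assumptions.

```python
# EXIF tag IDs whose values are copied as-is, mapped to CSV headers.
SIMPLE_TAGS = {271: 'Make', 272: 'Model', 305: 'Software',
               40962: 'Width', 40963: 'Height'}

# Integer scene values mapped to their descriptions.
SCENES = {0: 'Standard', 1: 'Landscape', 2: 'Portrait',
          3: 'Night Scene'}


def extract_simple_tags(exif):
    """Copy known tags into a new dictionary, translating the
    scene integer into its description where one is defined."""
    tags = {}
    for tag_id, value in exif.items():
        if tag_id in SIMPLE_TAGS:
            tags[SIMPLE_TAGS[tag_id]] = value
        elif tag_id == 41990 and value in SCENES:
            tags['Scene Type'] = SCENES[value]
    return tags


# Hypothetical EXIF dictionary; 99999 is an unrecognized tag ID.
sample_exif = {271: 'Apple', 272: 'iPhone 6', 41990: 1,
               99999: 'ignored'}
print(extract_simple_tags(sample_exif))
```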

Finally, on line 155, we look for the GPS tags, which are stored as a nested dictionary under the key 34853. If the latitude and longitude tags exist, we pass them to the dms_to_decimal() function to convert them into a format more suitable for the KML writer:

155             elif tag == 34853:
156                 for gps in exif[tag]:
157                     if gps == 1:
158                         tags['Latitude Reference'] = exif[tag][gps]
159                     elif gps == 2:
160                         tags['Latitude'] = dms_to_decimal(
161                         exif[tag][gps])
162                     elif gps == 3:
163                         tags['Longitude Reference'] = exif[tag][gps]
164                     elif gps == 4:
165                         tags['Longitude'] = dms_to_decimal(
166                         exif[tag][gps])
167                     else:
168                         pass
169     return tags, headers
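
For reference, the arithmetic behind a degrees, minutes, and seconds conversion can be sketched as follows. This is not the script's dms_to_decimal() function, whose implementation isn't shown in this section. The dms_to_decimal_sketch() name, the optional reference parameter, and the assumption that each component can be passed to float() are all illustrative.

```python
def dms_to_decimal_sketch(dms, reference=None):
    """Convert (degrees, minutes, seconds) to decimal degrees.

    Assumes each component can be passed to float(). South and
    west references produce negative values.
    """
    degrees, minutes, seconds = (float(part) for part in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    if reference in ('S', 'W'):
        decimal = -decimal
    return decimal


latitude = dms_to_decimal_sketch((40, 26, 46), 'N')
longitude = dms_to_decimal_sketch((79, 58, 56), 'W')
print(round(latitude, 6), round(longitude, 6))
```

Depending on the PIL version, GPS components may arrive as numerator and denominator pairs instead of plain numbers, in which case each pair would need to be divided before applying this calculation.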