Developing the get_tags() function

The get_tags() function, with the help of the PIL module, parses EXIF metadata tags from our JPEG image. On line 72, we create a list of headers for our CSV output. This list contains every key that might be created in our EXIF dictionary, in the order we want them displayed in the CSV file. Because not all JPEG images carry the same embedded EXIF tags, and some carry none at all, some dictionaries will have more tags than others. By supplying the writer with the ordered list of keys, we ensure that the fields are written in the appropriate order and columns:

062 def get_tags(filename):
063     """
064     The get_tags function extracts the EXIF metadata from the data
065     object.
066     :param filename: the path and name to the data object.
067     :return: tags and headers, tags is a dictionary containing
068     EXIF metadata and headers are the order of keys for the
069     CSV output.
070     """
071     # Set up CSV headers
072     headers = ['Path', 'Name', 'Size', 'Filesystem CTime',
073         'Filesystem MTime', 'Original Date', 'Digitized Date', 'Make',
074         'Model', 'Software', 'Latitude', 'Latitude Reference',
075         'Longitude', 'Longitude Reference', 'Exif Version', 'Height',
076         'Width', 'Flash', 'Scene Type']
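
To illustrate why the ordered header list matters, here is a minimal sketch using csv.DictWriter from the standard library. The shortened header list, the sample rows, and the in-memory buffer are assumptions made for illustration; the script's actual CSV writer may be set up differently.

```python
import csv
import io

# Shortened header list, for illustration only
headers = ['Path', 'Name', 'Make', 'Model']

# Hypothetical rows: the second image had no EXIF tags at all
rows = [
    {'Path': '/evidence/a.jpg', 'Name': 'a.jpg',
     'Make': 'Canon', 'Model': 'EOS'},
    {'Path': '/evidence/b.jpg', 'Name': 'b.jpg'},
]

buffer = io.StringIO()
# fieldnames fixes the column order; restval fills in missing keys
writer = csv.DictWriter(buffer, fieldnames=headers, restval='')
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The row with fewer keys still lines up with the header because the writer emits an empty field for every key that's missing from the dictionary.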

On line 77, we open the JPEG file using the Image.open() function. Once again, we perform one final validation step, this time by calling the verify() function on line 80. This function checks for file corruption and raises an exception if any is found. Otherwise, on line 84, we call the _getexif() function, which returns a dictionary of EXIF metadata, or None if the image has no EXIF data:

077     image = Image.open(filename)
078
079     # Detects if the file is corrupt without decoding the data
080     image.verify()
081
082     # Descriptions and values of EXIF tags
083     # http://www.exiv2.org/tags.html
084     exif = image._getexif()

On line 86, we create our dictionary, tags, which will store metadata about our file object. On lines 87 through 94, we populate the dictionary with some filesystem metadata, such as the full path, name, size, and create and modify timestamps. The os.path.basename() function takes the full pathname and returns the filename. For example, os.path.basename('Users/LPF/Desktop/myfile.txt') would simply return myfile.txt.
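
As a quick check of the basename behavior described above, the following sketch runs the same example path through os.path.basename(). The variable names are ours, chosen for illustration.

```python
import os

# The example path from the text; only the final component is returned
full_path = 'Users/LPF/Desktop/myfile.txt'
name = os.path.basename(full_path)
print(name)
```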

The getsize() function returns the file size in bytes. Large byte counts are difficult to read at a glance; we're more accustomed to seeing sizes with common prefixes, such as MB, GB, and TB. The convert_size() processor function performs this conversion to make the data more useful for the human analyst.
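
The convert_size() function itself isn't shown in this section. The following is only a rough sketch of what such a helper might look like; the name convert_size_sketch, the 1,024-byte step between units, and the two-decimal output format are all assumptions, not the script's actual implementation.

```python
def convert_size_sketch(size_in_bytes):
    """Hypothetical helper: render a byte count with a unit suffix."""
    units = ['B', 'KB', 'MB', 'GB', 'TB']
    size = float(size_in_bytes)
    for unit in units:
        # Stop once the value fits the unit, or we run out of units
        if size < 1024 or unit == units[-1]:
            return '{:.2f} {}'.format(size, unit)
        size /= 1024  # assumes binary (1,024-byte) steps


print(convert_size_sketch(1536))
print(convert_size_sketch(5 * 1024 ** 2))
```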

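On lines 91 through 94, we convert the floating-point values returned by os.path.getctime() and os.path.getmtime(), which are timestamps expressed in seconds since the epoch. Note that getctime() reports the creation time on Windows but the time of the last metadata change on Unix systems; getmtime() reports the last modification time. The epoch, 01/01/1970 00:00:00, can be confirmed by calling time.gmtime(0). We use the gmtime() function to convert these seconds into a time-structured object (similar to datetime). We then use the strftime() function to format the time object into our desired date string: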

086     tags = {}
087     tags['Path'] = filename
088     tags['Name'] = os.path.basename(filename)
089     tags['Size'] = processors.utility.convert_size(
090         os.path.getsize(filename))
091     tags['Filesystem CTime'] = strftime('%m/%d/%Y %H:%M:%S',
092         gmtime(os.path.getctime(filename)))
093     tags['Filesystem MTime'] = strftime('%m/%d/%Y %H:%M:%S',
094         gmtime(os.path.getmtime(filename)))
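
The epoch conversion can be tried in isolation. This short sketch applies the same gmtime() and strftime() calls to zero seconds, which should yield the epoch itself; the variable names are ours.

```python
from time import gmtime, strftime

# Zero seconds since the epoch is the epoch itself, in UTC
epoch_struct = gmtime(0)
epoch_string = strftime('%m/%d/%Y %H:%M:%S', epoch_struct)
print(epoch_string)
```

Because gmtime() always works in UTC, the resulting strings don't depend on the examiner's local time zone.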

On line 95, we check whether there are any keys in the exif dictionary. If there are, we iterate through each key and check its value. The values we're querying for are from the EXIF tags described at http://www.exiv2.org/tags.html. There are many potential EXIF tags, but we're going to query for only some of the more forensically relevant ones.

If a particular tag exists in the exif dictionary, we transfer its value to our tags dictionary. Some tags require additional processing, such as the timestamp, scene, flash, and GPS tags. The timestamp tags are stored in a format that's inconsistent with how we're representing other timestamps. For example, the time from tag 36867 on line 99 uses colons to separate the date fields and puts the year first:

2015:11:11 10:32:15

On line 100, we use the strptime() function to convert the existing time string into a datetime object. On line 102, we use the strftime() function to convert it into our desired date string format:

095     if exif:
096         for tag in exif.keys():
097             if tag == 36864:
098                 tags['Exif Version'] = exif[tag]
099             elif tag == 36867:
100                 dt = datetime.strptime(exif[tag],
101                     '%Y:%m:%d %H:%M:%S')
102                 tags['Original Date'] = dt.strftime(
103                     '%m/%d/%Y %H:%M:%S')
104             elif tag == 36868:
105                 dt = datetime.strptime(exif[tag],
106                     '%Y:%m:%d %H:%M:%S')
107                 tags['Digitized Date'] = dt.strftime(
108                     '%m/%d/%Y %H:%M:%S')
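
The timestamp conversion can be checked on its own with the sample value shown earlier. This sketch uses the same two format strings as the listing; the variable names are ours.

```python
from datetime import datetime

# Sample EXIF timestamp from the text (year first, colon-separated)
exif_time = '2015:11:11 10:32:15'

# Parse the EXIF layout, then re-emit it in the report's layout
dt = datetime.strptime(exif_time, '%Y:%m:%d %H:%M:%S')
formatted = dt.strftime('%m/%d/%Y %H:%M:%S')
print(formatted)
```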

The scene (41990) and flash (37385) tags have an integer value rather than a string. As we mentioned previously, the online documentation (http://www.exiv2.org/tags.html) explains what these integers represent. In these two scenarios, we create a dictionary containing the potential integers as keys and their descriptions as values. We check whether the tag's value is a key in our dictionary. If it's present, we store the description in the tags dictionary rather than the integer. Again, this is for the purpose of making analysis easier on the examiner. Seeing a string explanation of the scene or flash tag is more valuable than a number representing that explanation:

109             elif tag == 41990:
110                 # Scene tags
111                 # http://www.awaresystems.be/imaging/tiff/tifftags/privateifd/exif/scenecapturetype.html
112                 scenes = {0: 'Standard', 1: 'Landscape',
113                     2: 'Portrait', 3: 'Night Scene'}
114                 if exif[tag] in scenes:
115                     tags['Scene Type'] = scenes[exif[tag]]
116                 else:
117                     pass
118             elif tag == 37385:
119                 # Flash tags
120                 # http://www.awaresystems.be/imaging/tiff/tifftags/privateifd/exif/flash.html
121                 flash = {0: 'Flash did not fire',
122                     1: 'Flash fired',
123                     5: 'Strobe return light not detected',
124                     7: 'Strobe return light detected',
125                     9: 'Flash fired, compulsory flash mode',
126                     13: 'Flash fired, compulsory flash mode, return light not detected',
127                     15: 'Flash fired, compulsory flash mode, return light detected',
128                     16: 'Flash did not fire, compulsory flash mode',
129                     24: 'Flash did not fire, auto mode',
130                     25: 'Flash fired, auto mode',
131                     29: 'Flash fired, auto mode, return light not detected',
132                     31: 'Flash fired, auto mode, return light detected',
133                     32: 'No flash function',
134                     65: 'Flash fired, red-eye reduction mode',
135                     69: 'Flash fired, red-eye reduction mode, return light not detected',
136                     71: 'Flash fired, red-eye reduction mode, return light detected',
137                     73: 'Flash fired, compulsory flash mode, red-eye reduction mode',
138                     77: 'Flash fired, compulsory flash mode, red-eye reduction mode, return light not detected',
139                     79: 'Flash fired, compulsory flash mode, red-eye reduction mode, return light detected',
140                     89: 'Flash fired, auto mode, red-eye reduction mode',
141                     93: 'Flash fired, auto mode, return light not detected, red-eye reduction mode',
142                     95: 'Flash fired, auto mode, return light detected, red-eye reduction mode'}
143                 if exif[tag] in flash:
144                     tags['Flash'] = flash[exif[tag]]
145             elif tag == 271:
146                 tags['Make'] = exif[tag]
147             elif tag == 272:
148                 tags['Model'] = exif[tag]
149             elif tag == 305:
150                 tags['Software'] = exif[tag]
151             elif tag == 40962:
152                 tags['Width'] = exif[tag]
153             elif tag == 40963:
154                 tags['Height'] = exif[tag]
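
The lookup pattern used for the scene and flash tags can be shown in isolation. In this sketch, the helper add_described_tag() and the sample values are hypothetical; only the scenes dictionary comes from the listing above.

```python
scenes = {0: 'Standard', 1: 'Landscape',
          2: 'Portrait', 3: 'Night Scene'}


def add_described_tag(tags, name, value, lookup):
    """Hypothetical helper: store a description only if one is known."""
    if value in lookup:
        tags[name] = lookup[value]
    return tags


tags = {}
add_described_tag(tags, 'Scene Type', 1, scenes)   # known value
add_described_tag(tags, 'Scene Type 2', 99, scenes)  # unknown, skipped
print(tags)
```

As in the listing, an unrecognized integer leaves the key out of the dictionary entirely, so the corresponding CSV column would simply be empty.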

Finally, on line 155, we look for the GPS tags, which are stored as a nested dictionary under the key 34853. If the latitude and longitude tags exist, we pass them to the dms_to_decimal() function to convert them from degrees, minutes, and seconds into decimal degrees, a format better suited to the KML writer:

155             elif tag == 34853:
156                 for gps in exif[tag]:
157                     if gps == 1:
158                         tags['Latitude Reference'] = exif[tag][gps]
159                     elif gps == 2:
160                         tags['Latitude'] = dms_to_decimal(
161                             exif[tag][gps])
162                     elif gps == 3:
163                         tags['Longitude Reference'] = exif[tag][gps]
164                     elif gps == 4:
165                         tags['Longitude'] = dms_to_decimal(
166                             exif[tag][gps])
167                     else:
168                         pass
169     return tags, headers
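
The dms_to_decimal() function isn't shown in this section. The sketch below illustrates only the underlying arithmetic and shouldn't be taken as the script's implementation. It assumes the three components arrive as plain numbers (EXIF often stores them as rationals instead), and the optional ref parameter, which negates southern and western coordinates, is our own addition.

```python
def dms_to_decimal_sketch(dms, ref=None):
    """Hypothetical helper: degrees, minutes, seconds to decimal."""
    # Assumes each component can be converted with float()
    degrees, minutes, seconds = (float(part) for part in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # Assumption: southern and western references become negative
    if ref in ('S', 'W'):
        decimal = -decimal
    return decimal


decimal_latitude = dms_to_decimal_sketch((40, 26, 46.32), 'N')
decimal_longitude = dms_to_decimal_sketch((79, 58, 56), 'W')
print(decimal_latitude, decimal_longitude)
```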