Practical Programming, Third Edition

Storing Data Using Dictionaries

Here is the same bird-watching observation file that we saw in Set Example: Arctic Birds:

	canada goose
	canada goose
	long-tailed jaeger
	canada goose
	snow goose
	canada goose
	long-tailed jaeger
	canada goose
	northern fulmar

Suppose we want to know how often each species was seen. Our first attempt uses a list of lists, in which each inner list has two items. The item at index 0 of the inner list contains the species, and the item at index 1 contains the number of times it has been seen so far. To build this list, we iterate over the lines of the observations file. For each line, we search the outer list, looking for the species on that line. If we find that the species occurs in the list, we add one to the number of times it has been observed; if we do not find it, we add a new entry for the species:

	from typing import TextIO, List, Any
	from io import StringIO

	def count_birds(observations_file: TextIO) -> List[List[Any]]:
	    """Return a list of [species, count] pairs for the bird species listed
	    in observations_file, which has one bird species per line.

	    >>> infile = StringIO('bird 1\\nbird 2\\nbird 1\\n')
	    >>> count_birds(infile)
	    [['bird 1', 2], ['bird 2', 1]]
	    """
	    bird_counts = []
	    for line in observations_file:
	        bird = line.strip()
	        found = False
	        # Find bird in the list of bird counts.
	        for entry in bird_counts:
	            if entry[0] == bird:
	                entry[1] = entry[1] + 1
	                found = True
	        if not found:
	            bird_counts.append([bird, 1])

	    return bird_counts

	if __name__ == '__main__':
	    with open('observations.txt') as observations_file:
	        bird_counts = count_birds(observations_file)

	    # Print each bird and the number of times it was seen
	    for entry in bird_counts:
	        print(entry[0], entry[1])

Here is the output:

	canada goose 5
	long-tailed jaeger 2
	snow goose 1
	northern fulmar 1

This code uses a Boolean variable, found. Once a species is read from the file, found is assigned False. The program then iterates over the list, looking for that species at index 0 of one of the inner lists. If the species occurs in an inner list, found is assigned True. At the end of the loop over the list, if found still refers to False it means that this species is not yet present in the list and so it is added, along with the number of observations of it, which is currently 1.

Our code works, but there are two things wrong with it. The first is that it is complex. The more nested loops our programs contain, the harder they are to understand, fix, and extend. The second is that it is inefficient. Suppose we were interested in beetles instead of birds and that we had millions of observations of tens of thousands of species. Scanning the list of names each time we want to add one new observation would take a long, long time, even on a fast computer (a topic we will return to in Chapter 13, Searching and Sorting).

Can you use a set to solve both problems at once? Sets can look up values in a single step; why not combine each bird’s name and the number of times it has been seen into a two-valued tuple and put those tuples in a set?

The problem with this idea is that you can look for values only if you know what those values are. In this case, you won’t. You will know only the name of the species, but not how many times it has already been seen.
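A small sketch makes the problem concrete. The names sightings and count_for are hypothetical, chosen only for this illustration. A membership test on a set of (name, count) tuples succeeds only when the count is already known. Finding the count from the name alone means scanning every tuple, which is exactly what the list version did.

```python
from typing import Set, Tuple

# Each element pairs a species name with its count so far.
sightings: Set[Tuple[str, int]] = {('canada goose', 3), ('northern fulmar', 1)}

# A membership test succeeds only if we already know the whole tuple.
print(('canada goose', 3) in sightings)   # True
print(('canada goose', 4) in sightings)   # False


def count_for(species: str, pairs: Set[Tuple[str, int]]) -> int:
    """Return the count recorded for species in pairs, or 0 if it is absent."""
    # Knowing only the name, we must scan every pair, just as with the list.
    for name, count in pairs:
        if name == species:
            return count
    return 0


print(count_for('canada goose', sightings))   # 3
```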

The right approach is to use another data structure called a dictionary. Also known as a map, a dictionary is a mutable collection of key/value pairs. In plain English, Python’s dictionaries are like dictionaries that map words to definitions. They associate a key (like a word) with a value (such as a definition). The keys form a set: any particular key can appear at most once in a dictionary. Like the elements in sets, keys must be immutable (though the values associated with them don’t have to be).
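The immutability rule for keys is easy to check in the shell. This sketch uses hypothetical names and placeholder site labels ('site 1' and so on). A tuple of strings is accepted as a key, and a list is accepted as a value and can be changed in place. A list used as a key is rejected with a TypeError.

```python
# Immutable keys, such as strings and tuples of strings, are accepted.
location_to_count = {('canada goose', 'site 1'): 4}

# Values may be mutable: here the value is a list we can append to.
bird_to_sites = {'canada goose': ['site 1', 'site 2']}
bird_to_sites['canada goose'].append('site 3')

# A mutable key, such as a list, is rejected when the dictionary is built.
list_key_rejected = False
try:
    {['canada goose']: 1}
except TypeError:
    list_key_rejected = True

print(list_key_rejected)   # True
```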

Dictionaries are created by putting key/value pairs inside braces (each key is followed by a colon and then by its value):

	>>> bird_to_observations = {'canada goose': 3, 'northern fulmar': 1}
	>>> bird_to_observations
	{'canada goose': 3, 'northern fulmar': 1}

We chose the variable name bird_to_observations since this variable refers to a dictionary where each key is a bird and each value is the number of observations of that bird. In other words, the dictionary maps birds to observations. Here is a picture of the resulting dictionary:

Figure: bird_to_observations, with key 'canada goose' mapped to value 3 and key 'northern fulmar' mapped to value 1

To get the value associated with a key, we put the key in square brackets, much like indexing into a list:

	>>> bird_to_observations['northern fulmar']
	1

Indexing a dictionary with a key it doesn’t contain produces an error, just like an out-of-range index for a list does:

	>>> bird_to_observations['canada goose']
	3
	>>> bird_to_observations['long-tailed jaeger']
	Traceback (most recent call last):
	  File "<stdin>", line 1, in <module>
	KeyError: 'long-tailed jaeger'

The empty dictionary is written {} (this is why we can’t use this notation for the empty set). It doesn’t contain any key/value pairs, so indexing into it always results in an error.
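A quick check of the two empty collections (variable names are illustrative): {} builds a dictionary, so an empty set must be written set(). Indexing the empty dictionary raises KeyError, as the paragraph above says.

```python
empty_dict = {}     # braces with nothing inside: an empty dictionary
empty_set = set()   # the only way to write an empty set

print(type(empty_dict))   # <class 'dict'>
print(type(empty_set))    # <class 'set'>

# Any key is missing from an empty dictionary, so indexing fails.
try:
    empty_dict['canada goose']
except KeyError as error:
    print('KeyError:', error)   # KeyError: 'canada goose'
```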

As with sets, the order in which the pairs are written doesn’t matter when dictionaries are compared:

	>>> dict1 = {'canada goose': 3, 'northern fulmar': 1}
	>>> dict2 = {'northern fulmar': 1, 'canada goose': 3}
	>>> dict1 == dict2
	True

Updating and Checking Membership

To update the value associated with a key, you use the same notation as for lists, except you use a key instead of an index. If the key is already in the dictionary, this assignment statement changes the value associated with it. If the key isn’t present, the key/value pair is added to the dictionary:

	>>> bird_to_observations = {}
	>>>
	>>> # Add a new key/value pair, 'snow goose': 33.
	>>> bird_to_observations['snow goose'] = 33
	>>>
	>>> # Add a new key/value pair, 'eagle': 999.
	>>> bird_to_observations['eagle'] = 999
	>>> bird_to_observations
	{'snow goose': 33, 'eagle': 999}
	>>>
	>>> # Change the value associated with key 'eagle' to 9.
	>>> bird_to_observations['eagle'] = 9
	>>> bird_to_observations
	{'snow goose': 33, 'eagle': 9}

To remove an entry from a dictionary, use del d[k], where d is the dictionary and k is the key being removed. Only entries that are present can be removed; trying to remove one that isn’t there results in an error:

	>>> bird_to_observations = {'snow goose': 33, 'eagle': 9}
	>>> del bird_to_observations['snow goose']
	>>> bird_to_observations
	{'eagle': 9}
	>>> del bird_to_observations['gannet']
	Traceback (most recent call last):
	  File "<stdin>", line 1, in <module>
	KeyError: 'gannet'

To test whether a key is in a dictionary, we can use the in operator:

	>>> bird_to_observations = {'eagle': 999, 'snow goose': 33}
	>>> 'eagle' in bird_to_observations
	True
	>>> if 'eagle' in bird_to_observations:
	...     print('eagles have been seen')
	...
	eagles have been seen
	>>> del bird_to_observations['eagle']
	>>> 'eagle' in bird_to_observations
	False
	>>> if 'eagle' in bird_to_observations:
	...     print('eagles have been seen')
	...
	>>>

The in operator only checks the keys of a dictionary. In this example, 33 in bird_to_observations evaluates to False, since 33 is a value, not a key.
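To ask whether a value is present, the test can be applied to the dictionary's values instead, using the values method covered under Dictionary Operations. A minimal sketch follows. Note that checking values this way looks at them one at a time, so for a large dictionary it is slower than checking a key.

```python
bird_to_observations = {'eagle': 999, 'snow goose': 33}

print(33 in bird_to_observations)                 # False: 33 is not a key
print(33 in bird_to_observations.values())        # True: 33 is a value
print('eagle' in bird_to_observations.keys())     # True: same as 'eagle' in bird_to_observations
```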

Looping Over Dictionaries

Like the other collections you’ve seen, you can loop over dictionaries. The general form of a for loop over a dictionary is as follows:

	for variable in dictionary:
	    block

For dictionaries, the loop variable is assigned each key from the dictionary in turn:

	>>> bird_to_observations = {'canada goose': 183, 'long-tailed jaeger': 71,
	...                         'snow goose': 63, 'northern fulmar': 1}
	>>> for bird in bird_to_observations:
	...     print(bird, bird_to_observations[bird])
	...
	canada goose 183
	long-tailed jaeger 71
	snow goose 63
	northern fulmar 1

When Python loops over a dictionary, it assigns each key to the loop variable. (It’s a lot easier to go from a dictionary key to the associated value than it is to take the value and find the associated key.)
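The reverse direction shows why. There is no lookup from a value back to its key, so a hypothetical helper such as birds_seen (not part of the book's code) has to visit every key and compare its value. It returns a list because several birds may share the same count.

```python
from typing import Dict, List


def birds_seen(bird_to_observations: Dict[str, int], count: int) -> List[str]:
    """Return the birds in bird_to_observations seen exactly count times."""
    matches = []
    # No value-to-key lookup exists, so check the value for every key.
    for bird in bird_to_observations:
        if bird_to_observations[bird] == count:
            matches.append(bird)
    return matches


counts = {'canada goose': 183, 'long-tailed jaeger': 71,
          'snow goose': 63, 'northern fulmar': 1}
print(birds_seen(counts, 1))   # ['northern fulmar']
```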

Dictionary Operations

Like lists, tuples, and sets, dictionaries are objects. Their methods are described in Table 16, Dictionary Methods. The following code shows how the methods can be used:

	>>> scientist_to_birthdate = {'Newton': 1642, 'Darwin': 1809,
	...                           'Turing': 1912}
	>>> scientist_to_birthdate.keys()
	dict_keys(['Newton', 'Darwin', 'Turing'])
	>>> scientist_to_birthdate.values()
	dict_values([1642, 1809, 1912])
	>>> scientist_to_birthdate.items()
	dict_items([('Newton', 1642), ('Darwin', 1809), ('Turing', 1912)])
	>>> scientist_to_birthdate.get('Newton')
	1642
	>>> scientist_to_birthdate.get('Curie', 1867)
	1867
	>>> scientist_to_birthdate
	{'Newton': 1642, 'Darwin': 1809, 'Turing': 1912}
	>>> researcher_to_birthdate = {'Curie': 1867, 'Hopper': 1906,
	...                            'Franklin': 1920}
	>>> scientist_to_birthdate.update(researcher_to_birthdate)
	>>> scientist_to_birthdate
	{'Newton': 1642, 'Darwin': 1809, 'Turing': 1912, 'Curie': 1867,
	'Hopper': 1906, 'Franklin': 1920}
	>>> researcher_to_birthdate
	{'Curie': 1867, 'Hopper': 1906, 'Franklin': 1920}
	>>> researcher_to_birthdate.clear()
	>>> researcher_to_birthdate
	{}

Table 16. Dictionary Methods

Method	Description
D.clear()	Removes all key/value pairs from dictionary D.
D.get(k)	Returns the value associated with key k, or None if the key isn’t present. (Usually you’ll want to use D[k] instead.)
D.get(k, v)	Returns the value associated with key k, or a default value v if the key isn’t present.
D.keys()	Returns dictionary D’s keys as a set-like object; entries are guaranteed to be unique.
D.items()	Returns dictionary D’s (key, value) pairs as a set-like object.
D.pop(k)	Removes key k from dictionary D and returns the value that was associated with k; if k isn’t in D, a KeyError is raised.
D.pop(k, v)	Removes key k from dictionary D and returns the value that was associated with k; if k isn’t in D, returns v.
D.setdefault(k)	Returns the value associated with key k in D; if k isn’t a key in D, adds the key k with the value None to D and returns None.
D.setdefault(k, v)	Returns the value associated with key k in D; if k isn’t a key in D, adds the key k with the value v to D and returns v.
D.values()	Returns dictionary D’s values as a list-like object; entries may or may not be unique.
D.update(other)	Updates dictionary D with the contents of dictionary other: for each key in other that is also a key in D, replaces D’s value for that key with the value from other; for each key in other that isn’t in D, adds that key/value pair to D.
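The shell session above doesn't exercise pop or setdefault, so here is a short sketch of both, using a made-up dictionary of bird counts. It shows the difference between pop with and without a default, and the fact that setdefault only inserts when the key is missing.

```python
bird_to_observations = {'canada goose': 5, 'snow goose': 1}

# pop removes the key and returns its value.
removed = bird_to_observations.pop('snow goose')        # 1
# With a default, pop returns it instead of raising KeyError.
missing = bird_to_observations.pop('gannet', 0)         # 0

# setdefault returns an existing value and leaves it unchanged...
existing = bird_to_observations.setdefault('canada goose', 0)    # 5
# ...or inserts the default when the key is absent.
added = bird_to_observations.setdefault('northern fulmar', 0)    # 0
# Without a default, a missing key is inserted with the value None.
none_value = bird_to_observations.setdefault('eagle')            # None

print(bird_to_observations)
# {'canada goose': 5, 'northern fulmar': 0, 'eagle': None}
```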

As you can see from this output, the keys and values methods return the dictionary’s keys and values, respectively, while items returns the (key, value) pairs. Like the range object that you learned about previously, these are virtual sequences over which we can loop. The built-in function list can also be applied to them to create a list of keys, a list of values, or a list of (key, value) tuples.
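This sketch applies list to each of the three views. It also shows one property the paragraph doesn't spell out: a view stays connected to its dictionary and reflects later changes, while a list built from the view is a snapshot taken at that moment.

```python
scientist_to_birthdate = {'Newton': 1642, 'Darwin': 1809, 'Turing': 1912}

names = list(scientist_to_birthdate.keys())     # a list of keys
years = list(scientist_to_birthdate.values())   # a list of values
pairs = list(scientist_to_birthdate.items())    # a list of (key, value) tuples

# Keep a view, then change the dictionary.
keys_view = scientist_to_birthdate.keys()
scientist_to_birthdate['Curie'] = 1867

print('Curie' in keys_view)   # True: the view sees the new key
print('Curie' in names)       # False: the list was built earlier
```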

Because dictionaries usually map values from one concept (scientists, in our example) to another (birthdates), it’s common to use variable names linking the two, hence scientist_to_birthdate.

One common use of items is to loop over the keys and values in a dictionary together:

	for key, value in dictionary.items():
	    # Do something with the key and value

For example, the same format can be used to loop over the scientists and their birth years:

	>>> scientist_to_birthdate = {'Newton': 1642, 'Darwin': 1809,
	...                           'Turing': 1912}
	>>> for scientist, birthdate in scientist_to_birthdate.items():
	...     print(scientist, 'was born in', birthdate)
	...
	Newton was born in 1642
	Darwin was born in 1809
	Turing was born in 1912

Instead of a single loop variable, there are two. Each two-item tuple produced by the method items is unpacked into those two variables: scientist refers to the first item in the tuple, which is the key, and birthdate refers to the second item, which is the value.
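One way to see what the two loop variables do is to compare the loop with an equivalent version that takes each tuple whole and indexes into it. The variable names in this sketch are illustrative. Both versions produce the same lines in the same order.

```python
scientist_to_birthdate = {'Newton': 1642, 'Darwin': 1809, 'Turing': 1912}

# Version 1: unpack each (key, value) tuple in the for statement itself.
lines_unpacked = []
for scientist, birthdate in scientist_to_birthdate.items():
    lines_unpacked.append(scientist + ' was born in ' + str(birthdate))

# Version 2: take each tuple as a whole, then index into it.
lines_indexed = []
for pair in scientist_to_birthdate.items():
    scientist = pair[0]   # the key
    birthdate = pair[1]   # the value
    lines_indexed.append(scientist + ' was born in ' + str(birthdate))

print(lines_unpacked == lines_indexed)   # True
```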

In versions of Python prior to 3.6, the order in which a loop visited a dictionary’s keys was arbitrary and could differ from one run of a program to the next. Consider this program:

	items = {'first': 1, 'second': 2, 'third': 3}
	for key, value in items.items():
	    print(key, value)

We ran it three times using Python 3.5. Notice that each run printed the items in a different order:

Run 1

	first 1
	third 3
	second 2

Run 2

	second 2
	third 3
	first 1

Run 3

	third 3
	first 1
	second 2

In Python 3.6, a change in the way Python stores dictionaries had a side effect: the keys come out in the order in which they were inserted. At the time, the language designers warned that we should not rely on this. Starting with Python 3.7, preserving insertion order became an official guarantee of the language.

In keeping with this advice, none of the examples in this book rely on dictionary key order.
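When a program does need a predictable order, for example so that its output is easy to compare between runs, one option is to loop over the keys in sorted order. Calling the built-in sorted on a dictionary returns a new list of its keys in ascending order, so this sketch prints the same lines on every run and every Python version.

```python
items = {'first': 1, 'second': 2, 'third': 3}

# sorted(items) returns a list of the keys in alphabetical order.
ordered_keys = sorted(items)
for key in ordered_keys:
    print(key, items[key])
# first 1
# second 2
# third 3
```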

Dictionary Example

Back to birdwatching once again. Like before, we want to count the number of times each species has been seen. To do this, we create a dictionary that is initially empty. Each time we read an observation from a file, we check to see whether we have encountered that bird before—that is, whether the bird is already a key in our dictionary. If it is, we add 1 to the value associated with it. If it isn’t, we add the bird as a key to the dictionary with the value 1. Here is the program that does this. Notice the type annotation for dictionaries:

	from typing import TextIO, Dict
	from io import StringIO

	def count_birds(observations_file: TextIO) -> Dict[str, int]:
	    """Return a dictionary that maps each bird species listed in
	    observations_file, which has one bird species per line, to the
	    number of times that species appears.

	    >>> infile = StringIO('bird 1\\nbird 2\\nbird 1\\n')
	    >>> count_birds(infile)
	    {'bird 1': 2, 'bird 2': 1}
	    """
	    bird_to_observations = {}
	    for line in observations_file:
	        bird = line.strip()
	        if bird in bird_to_observations:
	            bird_to_observations[bird] = bird_to_observations[bird] + 1
	        else:
	            bird_to_observations[bird] = 1

	    return bird_to_observations

	if __name__ == '__main__':
	    with open('observations.txt') as observations_file:
	        bird_to_observations = count_birds(observations_file)
	    for bird, observations in bird_to_observations.items():
	        print(bird, observations)
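The program reads observations.txt. To try it without creating that file, you can pass the nine observations from the start of this section through StringIO, as the docstring does with its smaller example. The function is repeated here so that the sketch runs on its own. The variable names sample and result are illustrative. The counts match the output of the list-of-lists version.

```python
from io import StringIO
from typing import Dict, TextIO


def count_birds(observations_file: TextIO) -> Dict[str, int]:
    """Return a dictionary mapping each bird species in observations_file
    (one species per line) to the number of times it appears."""
    bird_to_observations = {}
    for line in observations_file:
        bird = line.strip()
        if bird in bird_to_observations:
            bird_to_observations[bird] = bird_to_observations[bird] + 1
        else:
            bird_to_observations[bird] = 1
    return bird_to_observations


# The observations from the start of this section, held in memory.
sample = StringIO('canada goose\ncanada goose\nlong-tailed jaeger\n'
                  'canada goose\nsnow goose\ncanada goose\n'
                  'long-tailed jaeger\ncanada goose\nnorthern fulmar\n')
result = count_birds(sample)
for bird, observations in result.items():
    print(bird, observations)
```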

The function body can be shortened by using the method dict.get, which saves three lines:

	def count_birds(observations_file: TextIO) -> Dict[str, int]:
	    """Return a dictionary that maps each bird species listed in
	    observations_file, which has one bird species per line, to the
	    number of times that species appears.

	    >>> infile = StringIO('bird 1\\nbird 2\\nbird 1\\n')
	    >>> count_birds(infile)
	    {'bird 1': 2, 'bird 2': 1}
	    """
	    bird_to_observations = {}
	    for line in observations_file:
	        bird = line.strip()
	        # get returns 0 for a bird we haven't seen yet.
	        bird_to_observations[bird] = bird_to_observations.get(bird, 0) + 1

	    return bird_to_observations

Using the get method makes the program shorter, but some programmers find it harder to understand at a glance. If bird is not yet a key in the dictionary, get returns its second argument, 0. Otherwise it returns the value associated with that key. Then 1 is added to that value, and the dictionary is updated to associate the sum with the key that bird refers to.
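Splitting the one-line update into two steps shows both cases. The dictionary contents here are made up for illustration. For a bird already in the dictionary, get returns the stored count. For a new bird, it returns the default 0 instead of raising KeyError.

```python
bird_to_observations = {'canada goose': 2}

# A bird seen before: get returns the stored count, 2.
current = bird_to_observations.get('canada goose', 0)
bird_to_observations['canada goose'] = current + 1    # now 3

# A bird not seen before: get returns the default, 0.
current = bird_to_observations.get('snow goose', 0)
bird_to_observations['snow goose'] = current + 1      # now 1

print(bird_to_observations)   # {'canada goose': 3, 'snow goose': 1}
```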