Here is the same bird-watching observation file that we saw in Set Example: Arctic Birds:
| canada goose |
| canada goose |
| long-tailed jaeger |
| canada goose |
| snow goose |
| canada goose |
| long-tailed jaeger |
| canada goose |
| northern fulmar |
Suppose we want to know how often each species was seen. Our first attempt uses a list of lists, in which each inner list has two items. The item at index 0 of the inner list contains the species, and the item at index 1 contains the number of times it has been seen so far. To build this list, we iterate over the lines of the observations file. For each line, we search the outer list, looking for the species on that line. If we find that the species occurs in the list, we add one to the number of times it has been observed; if we do not find it, we add a new entry for the species:
| from typing import TextIO, List, Any |
| from io import StringIO |
| |
| def count_birds(observations_file: TextIO) -> List[List[Any]]: |
|     """Return a list of [species, count] pairs for the bird species listed in |
|     observations_file, which has one bird species per line. |
| |
|     >>> infile = StringIO('bird 1\\nbird 2\\nbird 1\\n') |
|     >>> count_birds(infile) |
|     [['bird 1', 2], ['bird 2', 1]] |
|     """ |
|     bird_counts = [] |
|     for line in observations_file: |
|         bird = line.strip() |
|         found = False |
|         # Find bird in the list of bird counts. |
|         for entry in bird_counts: |
|             if entry[0] == bird: |
|                 entry[1] = entry[1] + 1 |
|                 found = True |
|         if not found: |
|             bird_counts.append([bird, 1]) |
| |
|     return bird_counts |
| |
| if __name__ == '__main__': |
|     with open('observations.txt') as observations_file: |
|         bird_counts = count_birds(observations_file) |
| |
|     # Print each bird and the number of times it was seen |
|     for entry in bird_counts: |
|         print(entry[0], entry[1]) |
Here is the output:
| canada goose 5 |
| long-tailed jaeger 2 |
| snow goose 1 |
| northern fulmar 1 |
This code uses a Boolean variable, found. Once a species is read from the file, found is assigned False. The program then iterates over the list, looking for that species at index 0 of one of the inner lists. If the species occurs in an inner list, found is assigned True. At the end of the loop over the list, if found still refers to False it means that this species is not yet present in the list and so it is added, along with the number of observations of it, which is currently 1.
Our code works, but there are two things wrong with it. The first is that it is complex. The more nested loops our programs contain, the harder they are to understand, fix, and extend. The second is that it is inefficient. Suppose we were interested in beetles instead of birds and that we had millions of observations of tens of thousands of species. Scanning the list of names each time we want to add one new observation would take a long, long time, even on a fast computer (a topic we will return to in Chapter 13, Searching and Sorting).
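To get a rough feel for that cost, the following sketch counts how many times the comparison in the inner loop runs. The helper name count_comparisons and the synthetic species names are made up for this illustration; the loop structure is copied from the list-of-lists program.

```python
from typing import Any, List


def count_comparisons(observations: List[str]) -> int:
    """Return how many times entry[0] == bird is evaluated when the
    list-of-lists approach processes observations.
    """
    bird_counts: List[List[Any]] = []
    comparisons = 0
    for bird in observations:
        found = False
        # Same scan as in count_birds, with a counter added.
        for entry in bird_counts:
            comparisons += 1
            if entry[0] == bird:
                entry[1] = entry[1] + 1
                found = True
        if not found:
            bird_counts.append([bird, 1])
    return comparisons


# Hypothetical data: 100 species, each observed 10 times.
observations = ['species ' + str(i % 100) for i in range(1000)]
total_comparisons = count_comparisons(observations)
print(len(observations), 'observations needed', total_comparisons, 'comparisons')
```

For these 1,000 observations the count comes to 94,950, or roughly 95 comparisons per observation. The work grows with both the number of observations and the number of species.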
Can you use a set to solve both problems at once? Sets can look up values in a single step; why not combine each bird’s name and the number of times it has been seen into a two-valued tuple and put those tuples in a set?
The problem with this idea is that you can look for values only if you know what those values are. In this case, you won’t. You will know only the name of the species, but not how many times it has already been seen.
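A short sketch makes the difficulty concrete. The set contents and variable names below are made up for this illustration.

```python
# A set of (species, count) tuples, as proposed above.
sightings = {('canada goose', 5), ('snow goose', 1)}

# Membership works only when the whole tuple, count included, is known.
knows_count = ('canada goose', 5) in sightings
# The name alone is not an element of the set.
name_only = 'canada goose' in sightings

# Without the count, we are back to scanning every element.
matches = [count for (species, count) in sightings
           if species == 'canada goose']
print(knows_count, name_only, matches)   # True False [5]
```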
The right approach is to use another data structure called a dictionary. Also known as a map, a dictionary is a mutable collection of key/value pairs in which each value is looked up by its key rather than by its position. In plain English, Python’s dictionaries are like dictionaries that map words to definitions: they associate a key (like a word) with a value (such as a definition). The keys form a set: any particular key can appear at most once in a dictionary. Like the elements of sets, keys must be immutable (though the values associated with them don’t have to be).
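The restriction on keys can be seen in a small sketch. The coordinates, dates, and variable names here are invented for illustration only.

```python
# Strings, numbers, and tuples of immutable values can all be keys.
location_to_species = {(69.4, -133.0): 'snow goose'}   # made-up coordinates
location_to_species[(70.1, -125.5)] = 'northern fulmar'

# A list is mutable, so Python rejects it as a key.
try:
    location_to_species[[69.4, -133.0]] = 'canada goose'
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

# Values, on the other hand, may be mutable.
species_to_dates = {'snow goose': ['June 3', 'June 9']}
species_to_dates['snow goose'].append('June 12')
print(list_key_allowed, species_to_dates)
```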
Dictionaries are created by putting key/value pairs inside braces (each key is followed by a colon and then by its value):
| >>> bird_to_observations = {'canada goose': 3, 'northern fulmar': 1} |
| >>> bird_to_observations |
| {'canada goose': 3, 'northern fulmar': 1} |
We chose variable name bird_to_observations since this variable refers to a dictionary where each key is a bird and each value is the number of observations of that bird. In other words, the dictionary maps birds to observations. Here is a picture of the resulting dictionary:
To get the value associated with a key, we put the key in square brackets, much like indexing into a list:
| >>> bird_to_observations['northern fulmar'] |
| 1 |
Indexing a dictionary with a key it doesn’t contain produces an error, just like an out-of-range index for a list does:
| >>> bird_to_observations['canada goose'] |
| 3 |
| >>> bird_to_observations['long-tailed jaeger'] |
| Traceback (most recent call last): |
| File "<stdin>", line 1, in <module> |
| KeyError: 'long-tailed jaeger' |
The empty dictionary is written {} (this is why we can’t use this notation for the empty set). It doesn’t contain any key/value pairs, so indexing into it always results in an error.
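The difference between the two empty collections can be checked directly. This is a minimal sketch; the variable names are chosen only for this example.

```python
empty_dict = {}
empty_set = set()

print(type(empty_dict))   # <class 'dict'>
print(type(empty_set))    # <class 'set'>

# Indexing into the empty dictionary always fails with a KeyError.
try:
    empty_dict['canada goose']
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)      # True
```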
As with sets, the order in which the key/value pairs are written does not affect whether two dictionaries are equal:
| >>> dict1 = {'canada goose': 3, 'northern fulmar': 1} |
| >>> dict2 = {'northern fulmar': 1, 'canada goose': 3} |
| >>> dict1 == dict2 |
| True |
To update the value associated with a key, you use the same notation as for lists, except you use a key instead of an index. If the key is already in the dictionary, this assignment statement changes the value associated with it. If the key isn’t present, the key/value pair is added to the dictionary:
| >>> bird_to_observations = {} |
| >>> |
| >>> # Add a new key/value pair, 'snow goose': 33. |
| >>> bird_to_observations['snow goose'] = 33 |
| >>> |
| >>> # Add a new key/value pair, 'eagle': 999. |
| >>> bird_to_observations['eagle'] = 999 |
| >>> bird_to_observations |
| {'snow goose': 33, 'eagle': 999} |
| >>> |
| >>> # Change the value associated with key 'eagle' to 9. |
| >>> bird_to_observations['eagle'] = 9 |
| >>> bird_to_observations |
| {'snow goose': 33, 'eagle': 9} |
To remove an entry from a dictionary, use del d[k], where d is the dictionary and k is the key being removed. Only entries that are present can be removed; trying to remove one that isn’t there results in an error:
| >>> bird_to_observations = {'snow goose': 33, 'eagle': 9} |
| >>> del bird_to_observations['snow goose'] |
| >>> bird_to_observations |
| {'eagle': 9} |
| >>> del bird_to_observations['gannet'] |
| Traceback (most recent call last): |
| File "<stdin>", line 1, in <module> |
| KeyError: 'gannet' |
To test whether a key is in a dictionary, we can use the in operator:
| >>> bird_to_observations = {'eagle': 999, 'snow goose': 33} |
| >>> 'eagle' in bird_to_observations |
| True |
| >>> if 'eagle' in bird_to_observations: |
| ...     print('eagles have been seen') |
| ... |
| eagles have been seen |
| >>> del bird_to_observations['eagle'] |
| >>> 'eagle' in bird_to_observations |
| False |
| >>> if 'eagle' in bird_to_observations: |
| ...     print('eagles have been seen') |
| ... |
| >>> |
The in operator only checks the keys of a dictionary. In this example, 33 in bird_to_observations evaluates to False, since 33 is a value, not a key.
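If you do need to ask whether something occurs as a value, one option is to test against the result of the values method, which is described later in this section. The sketch below assumes the same two-entry dictionary as the session above.

```python
bird_to_observations = {'eagle': 999, 'snow goose': 33}

key_found = 'snow goose' in bird_to_observations       # checks the keys
value_as_key = 33 in bird_to_observations              # 33 is not a key
value_found = 33 in bird_to_observations.values()      # checks the values
print(key_found, value_as_key, value_found)            # True False True
```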
As with the other collections you’ve seen, you can loop over dictionaries. The general form of a for loop over a dictionary is as follows:
| for variable in dictionary: |
|     block |
For dictionaries, the loop variable is assigned each key from the dictionary in turn:
| >>> bird_to_observations = {'canada goose': 183, 'long-tailed jaeger': 71, |
| ...                         'snow goose': 63, 'northern fulmar': 1} |
| >>> for bird in bird_to_observations: |
| ...     print(bird, bird_to_observations[bird]) |
| ... |
| canada goose 183 |
| long-tailed jaeger 71 |
| snow goose 63 |
| northern fulmar 1 |
The loop variable is given keys rather than values because it’s a lot easier to go from a dictionary key to the associated value than it is to take the value and find the associated key.
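The asymmetry shows up as soon as you try to go in the other direction. In this sketch, the helper birds_seen is a hypothetical function written for illustration: going from a key to its value is a single indexing operation, while going from a value to its keys means examining every entry.

```python
from typing import Dict, List


def birds_seen(bird_to_observations: Dict[str, int], count: int) -> List[str]:
    """Return the birds in bird_to_observations seen exactly count times."""
    matches = []
    for bird in bird_to_observations:
        if bird_to_observations[bird] == count:
            matches.append(bird)
    return matches


bird_to_observations = {'canada goose': 5, 'long-tailed jaeger': 2,
                        'snow goose': 1, 'northern fulmar': 1}
print(bird_to_observations['snow goose'])     # key to value: one lookup
print(birds_seen(bird_to_observations, 1))    # value to keys: a full scan
```

Note that several keys can share a value, so the reverse lookup has to return a list rather than a single bird.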
Like lists, tuples, and sets, dictionaries are objects. Their methods are described in Table 16, Dictionary Methods. The following code shows how the methods can be used:
| >>> scientist_to_birthdate = {'Newton': 1642, 'Darwin': 1809, |
| ...                           'Turing': 1912} |
| >>> scientist_to_birthdate.keys() |
| dict_keys(['Newton', 'Darwin', 'Turing']) |
| >>> scientist_to_birthdate.values() |
| dict_values([1642, 1809, 1912]) |
| >>> scientist_to_birthdate.items() |
| dict_items([('Newton', 1642), ('Darwin', 1809), ('Turing', 1912)]) |
| >>> scientist_to_birthdate.get('Newton') |
| 1642 |
| >>> scientist_to_birthdate.get('Curie', 1867) |
| 1867 |
| >>> scientist_to_birthdate |
| {'Newton': 1642, 'Darwin': 1809, 'Turing': 1912} |
| >>> researcher_to_birthdate = {'Curie': 1867, 'Hopper': 1906, |
| ...                            'Franklin': 1920} |
| >>> scientist_to_birthdate.update(researcher_to_birthdate) |
| >>> scientist_to_birthdate |
| {'Newton': 1642, 'Darwin': 1809, 'Turing': 1912, 'Curie': 1867, |
|  'Hopper': 1906, 'Franklin': 1920} |
| >>> researcher_to_birthdate |
| {'Curie': 1867, 'Hopper': 1906, 'Franklin': 1920} |
| >>> researcher_to_birthdate.clear() |
| >>> researcher_to_birthdate |
| {} |
Method | Description |
---|---|
D.clear() | Removes all key/value pairs from dictionary D. |
D.get(k) | Returns the value associated with key k, or None if the key isn’t present. (Usually you’ll want to use D[k] instead.) |
D.get(k, v) | Returns the value associated with key k, or a default value v if the key isn’t present. |
D.keys() | Returns dictionary D’s keys as a set-like object; entries are guaranteed to be unique. |
D.items() | Returns dictionary D’s (key, value) pairs as a set-like object. |
D.pop(k) | Removes key k from dictionary D and returns the value that was associated with k; if k isn’t in D, an error is raised. |
D.pop(k, v) | Removes key k from dictionary D and returns the value that was associated with k; if k isn’t in D, returns v. |
D.setdefault(k) | Returns the value associated with key k in D; if k isn’t a key in D, adds the key k with the value None to D and returns None. |
D.setdefault(k, v) | Returns the value associated with key k in D; if k isn’t a key in D, adds the key k with the value v to D and returns v. |
D.values() | Returns dictionary D’s values as a list-like object; entries may or may not be unique. |
D.update(other) | Updates dictionary D with the contents of dictionary other: for each key in other, if that key is already in D, replaces its value in D with the value from other; otherwise, adds that key/value pair to D. |
As you can see from this output, the keys and values methods return the dictionary’s keys and values, respectively, while items returns the (key, value) pairs. Like the range object that you learned about previously, these are virtual sequences over which we can loop. Similarly, function list can be applied to them to create lists of keys/values or key/value tuples.
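As a minimal sketch of that last point, the following code builds ordinary lists from the three kinds of view and also passes a view to sorted. The list variable names are chosen for this example only.

```python
scientist_to_birthdate = {'Newton': 1642, 'Darwin': 1809, 'Turing': 1912}

# list builds a real list from each view.
scientists = list(scientist_to_birthdate.keys())
birthdates = list(scientist_to_birthdate.values())
pairs = list(scientist_to_birthdate.items())

# sorted also accepts a view and returns a new list.
for scientist in sorted(scientist_to_birthdate.keys()):
    print(scientist, scientist_to_birthdate[scientist])
```

Unlike the views, these lists are independent copies: adding a key to the dictionary afterward does not change them.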
Because dictionaries usually map values from one concept (scientists, in our example) to another (birth years), it’s common to use variable names linking the two; hence, scientist_to_birthdate.
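The methods pop and setdefault appear in the table of dictionary methods but are not exercised in the session above. The following sketch shows how they behave on a small bird dictionary; the data and variable names are illustrative.

```python
bird_to_observations = {'snow goose': 33, 'eagle': 9}

# pop removes a key and hands back its value.
eagle_count = bird_to_observations.pop('eagle')
# With a second argument, a missing key yields that default, not an error.
gannet_count = bird_to_observations.pop('gannet', 0)

# setdefault returns the existing value if the key is present...
snow_goose_count = bird_to_observations.setdefault('snow goose', 0)
# ...and otherwise adds the key with the given value and returns that value.
fulmar_count = bird_to_observations.setdefault('northern fulmar', 1)

print(eagle_count, gannet_count, snow_goose_count, fulmar_count)  # 9 0 33 1
print(bird_to_observations)
```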
One common use of items is to loop over the keys and values in a dictionary together:
| for key, value in dictionary.items(): |
|     # Do something with the key and value |
For example, the same format can be used to loop over the scientists and their birth years:
| >>> scientist_to_birthdate = {'Newton': 1642, 'Darwin': 1809, |
| ...                           'Turing': 1912} |
| >>> for scientist, birthdate in scientist_to_birthdate.items(): |
| ...     print(scientist, 'was born in', birthdate) |
| ... |
| Newton was born in 1642 |
| Darwin was born in 1809 |
| Turing was born in 1912 |
Instead of a single loop variable, there are two. Each of the two-item tuples returned by the method items is unpacked into these variables: variable scientist refers to the first item in the tuple, which is the key, and birthdate refers to the second item, which is the value.
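To see that the two-variable form is only a convenience, the sketch below compares it with a loop that uses one variable and indexes into each tuple. The list names unpacked and by_index are made up for this comparison.

```python
scientist_to_birthdate = {'Newton': 1642, 'Darwin': 1809, 'Turing': 1912}

# Two loop variables: each tuple is unpacked automatically.
unpacked = []
for scientist, birthdate in scientist_to_birthdate.items():
    unpacked.append((scientist, birthdate))

# One loop variable: pair is a two-item tuple, (key, value).
by_index = []
for pair in scientist_to_birthdate.items():
    by_index.append((pair[0], pair[1]))

print(unpacked == by_index)   # True
```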
Back to birdwatching once again. As before, we want to count the number of times each species has been seen. To do this, we create a dictionary that is initially empty. Each time we read an observation from a file, we check to see whether we have encountered that bird before; that is, whether the bird is already a key in our dictionary. If it is, we add 1 to the value associated with it. If it isn’t, we add the bird as a key to the dictionary with the value 1. Here is the program that does this. Notice the type annotation for dictionaries:
| from typing import TextIO, Dict |
| from io import StringIO |
| |
| def count_birds(observations_file: TextIO) -> Dict[str, int]: |
|     """Return a dictionary that maps each bird species listed in |
|     observations_file, which has one bird species per line, to the number of |
|     times that species appears. |
| |
|     >>> infile = StringIO('bird 1\\nbird 2\\nbird 1\\n') |
|     >>> count_birds(infile) |
|     {'bird 1': 2, 'bird 2': 1} |
|     """ |
|     bird_to_observations = {} |
|     for line in observations_file: |
|         bird = line.strip() |
|         if bird in bird_to_observations: |
|             bird_to_observations[bird] = bird_to_observations[bird] + 1 |
|         else: |
|             bird_to_observations[bird] = 1 |
| |
|     return bird_to_observations |
| |
| if __name__ == '__main__': |
|     with open('observations.txt') as observations_file: |
|         bird_to_observations = count_birds(observations_file) |
|     for bird, observations in bird_to_observations.items(): |
|         print(bird, observations) |
The function body can be shortened by using the method dict.get, which saves three lines:
| def count_birds(observations_file: TextIO) -> Dict[str, int]: |
|     """Return a dictionary that maps each bird species listed in |
|     observations_file, which has one bird species per line, to the number of |
|     times that species appears. |
| |
|     >>> infile = StringIO('bird 1\\nbird 2\\nbird 1\\n') |
|     >>> count_birds(infile) |
|     {'bird 1': 2, 'bird 2': 1} |
|     """ |
|     bird_to_observations = {} |
|     for line in observations_file: |
|         bird = line.strip() |
|         bird_to_observations[bird] = bird_to_observations.get(bird, 0) + 1 |
| |
|     return bird_to_observations |
Using the get method makes the program shorter, but some programmers find it harder to understand at a glance. If the first argument to get is not a key in the dictionary, get returns its second argument, which here is 0; otherwise it returns the value associated with that key. After that, 1 is added to the returned value, and the dictionary is updated to associate that sum with the key that bird refers to.
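If the one-line version is hard to follow, it can help to split it into its two steps and trace two sightings of the same bird. This is only a sketch; the names first_lookup and second_lookup are introduced for the trace and are not part of the program.

```python
bird_to_observations = {}

# First sighting: 'snow goose' is not a key yet, so get returns the default, 0.
first_lookup = bird_to_observations.get('snow goose', 0)
bird_to_observations['snow goose'] = first_lookup + 1

# Second sighting: the key now exists, so get returns its current value, 1.
second_lookup = bird_to_observations.get('snow goose', 0)
bird_to_observations['snow goose'] = second_lookup + 1

print(first_lookup, second_lookup, bird_to_observations)
# 0 1 {'snow goose': 2}
```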