Techniques for Reading Files

As we mentioned at the beginning of the chapter, Python provides several techniques for reading files. You’ll learn about them in this section.

Each of these techniques starts reading at the current file cursor and moves the cursor past whatever it reads. That allows us to combine the techniques as we need to.

The Read Technique

Use this technique when you want to read the contents of a file into a single string, or when you want to specify exactly how many characters to read. This technique was introduced in Opening a File; here is the same example:

 with open('file_example.txt', 'r') as file:
     contents = file.read()
 
 print(contents)

When called with no arguments, method read reads everything from the current file cursor all the way to the end of the file and moves the file cursor to the end of the file. When called with one integer argument, it reads that many characters and moves the file cursor after the characters that were just read. Here is a version of the same program in a file called file_reader_with_10.py; it reads ten characters and then the rest of the file:

 with open('file_example.txt', 'r') as example_file:
     first_ten_chars = example_file.read(10)
     the_rest = example_file.read()
 
 print("The first 10 characters:", first_ten_chars)
 print("The rest of the file:", the_rest)

Method call example_file.read(10) moves the file cursor, so the next call, example_file.read(), reads everything from character 11 to the end of the file.
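Once the file cursor reaches the end of the file, read returns an empty string, so the one-argument form can be called repeatedly to work through a file in fixed-size pieces. Here is a minimal sketch of that idea. It assumes an in-memory stand-in for the file (io.StringIO, so the example runs without file_example.txt) and a helper named read_in_chunks, which is hypothetical and not part of Python:

```python
import io


def read_in_chunks(reader, size):
    """Return a list of strings, each at most size characters long,
    read one after another starting at reader's current file cursor."""
    chunks = []
    chunk = reader.read(size)
    # read returns '' once the file cursor is at the end of the file.
    while chunk != '':
        chunks.append(chunk)
        chunk = reader.read(size)
    return chunks


# io.StringIO behaves like a file opened for reading (assumed stand-in).
example_file = io.StringIO('First line of text.\nSecond line of text.\n')
first_ten_chars = example_file.read(10)
remaining_chunks = read_in_chunks(example_file, 10)

print(repr(first_ten_chars))
print(remaining_chunks)
```

Because the first call to read has already moved the file cursor, read_in_chunks picks up at character 11; joining the chunks gives back the rest of the text unchanged.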

The Readlines Technique

Use this technique when you want to get a Python list of strings containing the individual lines from a file. Method readlines works much like method read, except that it splits up the lines into a list of strings. As with read, the file cursor is moved to the end of the file.

This example reads the contents of a file into a list of strings and then prints that list:

 with open('file_example.txt', 'r') as example_file:
     lines = example_file.readlines()
 
 print(lines)

Here is the output:

 ['First line of text.\n', 'Second line of text.\n', 'Third line of text.\n']

Take a close look at that list; you’ll see that each string ends with a \n character. Python does not remove any characters from what is read; it only splits the text into separate strings.

The last line of a file may or may not end with a newline character, as you learned in Exploring String Methods.
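To see the effect of a missing final newline, the sketch below reads the same two lines twice, once with and once without a newline at the end. It uses io.StringIO as an in-memory stand-in for an open file (an assumption made so the example is self-contained); the variable names are ours:

```python
import io

# Two stand-ins for files: the second lacks the final newline.
with_newline = io.StringIO('First line of text.\nSecond line of text.\n')
without_newline = io.StringIO('First line of text.\nSecond line of text.')

lines_a = with_newline.readlines()
lines_b = without_newline.readlines()

print(repr(lines_a[-1]))
print(repr(lines_b[-1]))

# Removing only a trailing newline gives the same text either way.
cleaned_a = []
for line in lines_a:
    cleaned_a.append(line.rstrip('\n'))

cleaned_b = []
for line in lines_b:
    cleaned_b.append(line.rstrip('\n'))

print(cleaned_a == cleaned_b)
```

Calling rstrip('\n') is safe in both cases: it removes the newline when there is one and leaves the string alone when there is not.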

Assume file planets.txt contains the following text:

 Mercury
 Venus
 Earth
 Mars

This example prints the lines in planets.txt backward, from the last line to the first (here, we use built-in function reversed, which returns the items in the list in reverse order):

 >>> with open('planets.txt', 'r') as planets_file:
 ...     planets = planets_file.readlines()
 ...
 >>> planets
 ['Mercury\n', 'Venus\n', 'Earth\n', 'Mars\n']
 >>> for planet in reversed(planets):
 ...     print(planet.strip())
 ...
 Mars
 Earth
 Venus
 Mercury

We can use the Readlines technique to read the file, sort the lines, and print the planets alphabetically (here, we use built-in function sorted, which returns the items in the list in order from smallest to largest):

 >>> with open('planets.txt', 'r') as planets_file:
 ...     planets = planets_file.readlines()
 ...
 >>> planets
 ['Mercury\n', 'Venus\n', 'Earth\n', 'Mars\n']
 >>> for planet in sorted(planets):
 ...     print(planet.strip())
 ...
 Earth
 Mars
 Mercury
 Venus

The “For Line in File” Technique

Use this technique when you want to do the same thing to every line from the file cursor to the end of a file. On each iteration, the file cursor is moved to the beginning of the next line.

This code opens file planets.txt and prints the length of each line in that file:

 >>> with open('planets.txt', 'r') as data_file:
 ...     for line in data_file:
 ...         print(len(line))
 ...
 8
 6
 6
 5

Take a close look at the last line of output. There are only four characters in the word Mars, but our program is reporting that the line is five characters long. The reason for this is the same as for method readlines: each of the lines we read from the file has a newline character at the end. We can get rid of it using string method strip, which returns a copy of a string that has leading and trailing whitespace characters (spaces, tabs, and newlines) stripped away:

 >>> with open('planets.txt', 'r') as data_file:
 ...     for line in data_file:
 ...         print(len(line.strip()))
 ...
 7
 5
 5
 4
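Because the loop starts at the current file cursor rather than at the top of the file, it can follow a call on read. The sketch below illustrates this under one assumption: io.StringIO stands in for planets.txt so that the example is self-contained. It reads the first eight characters (the word Mercury and its newline) and then loops over whatever is left:

```python
import io

# In-memory stand-in for planets.txt (an assumption for this sketch).
data_file = io.StringIO('Mercury\nVenus\nEarth\nMars\n')

# 'Mercury\n' is eight characters long, so this consumes the first line.
skipped = data_file.read(8)

# The loop begins where read left the file cursor.
remaining = []
for line in data_file:
    remaining.append(line.strip())

print(repr(skipped))
print(remaining)
```

Only Venus, Earth, and Mars are seen by the loop, since the cursor had already moved past the first line.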

The Readline Technique

This technique reads one line at a time, unlike the Readlines technique. Use this technique when you want to read only part of a file.

For example, you might want to treat lines differently depending on context; perhaps you want to process a file that has a header section followed by a series of records, either one record per line or with multiline records.

The following data, taken from the Time Series Data Library [Hyn06], describes the number of colored fox fur pelts produced in Hopedale, Labrador, in the years 1834–1842. (The full data set has values for the years 1834–1925.)

 Coloured fox fur production, HOPEDALE, Labrador, 1834-1842
 #Source: C. Elton (1942) "Voles, Mice and Lemmings", Oxford Univ. Press
 #Table 17, p.265--266
   22
   29
    2
   16
   12
   35
    8
   83
  166

The first line contains a description of the data. The next two lines contain comments about the data, each of which begins with a # character. Each piece of actual data appears on a single line.

We’ll use the Readline technique to skip the header, and then we’ll use the For Line in File technique to process the data in the file, counting how many fox fur pelts were produced.

 with open('hopedale.txt', 'r') as hopedale_file:
 
     # Read and skip the description line.
     hopedale_file.readline()
 
     # Keep reading and skipping comment lines until we read the first piece
     # of data.
     data = hopedale_file.readline().strip()
     while data.startswith('#'):
         data = hopedale_file.readline().strip()
 
     # Now we have the first piece of data. Accumulate the total number of
     # pelts.
     total_pelts = int(data)
 
     # Read the rest of the data.
     for data in hopedale_file:
         total_pelts = total_pelts + int(data.strip())
 
     print("Total number of pelts:", total_pelts)

And here is the output:

 Total number of pelts: 373

Each call on method readline moves the file cursor to the beginning of the next line.
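When the file cursor is already at the end of the file, readline returns an empty string. A blank line in the middle of a file is different: it comes back as '\n'. That makes the empty string a reliable signal that there is nothing left to read. Here is a minimal sketch of a loop built on that fact; io.StringIO is used as an in-memory stand-in for planets.txt (an assumption, so the example runs on its own):

```python
import io

# In-memory stand-in for planets.txt (an assumption for this sketch).
data_file = io.StringIO('Mercury\nVenus\nEarth\nMars\n')

names = []
line = data_file.readline()
# readline returns '' only when there are no more lines to read.
while line != '':
    names.append(line.strip())
    line = data_file.readline()

print(names)
```

For a plain pass over every line, the For Line in File technique is shorter; a readline loop like this one is mostly useful when the loop body itself needs to read additional lines.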

Sometimes leading whitespace is important and you’ll want to preserve it. In the Hopedale data, for example, the integers are right-justified to make them line up nicely. In order to preserve this, you can use rstrip instead of strip to remove the trailing newline; here is a program that prints the data from that file, preserving the whitespace:

 with open('hopedale.txt', 'r') as hopedale_file:
 
     # Read and skip the description line.
     hopedale_file.readline()
 
     # Keep reading and skipping comment lines until we read the first piece
     # of data.
     data = hopedale_file.readline().rstrip()
     while data.startswith('#'):
         data = hopedale_file.readline().rstrip()
 
     # Now we have the first piece of data.
     print(data)
 
     # Read the rest of the data.
     for data in hopedale_file:
         print(data.rstrip())

And here is the output:

   22
   29
    2
   16
   12
   35
    8
   83
  166
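Both Hopedale programs begin with the same header-skipping steps, so those steps can be gathered into a function. The following is one possible sketch, not the only way to organize the code. The names skip_header and count_pelts are hypothetical, and the data is held in a string wrapped in io.StringIO (an assumption, so the example runs without hopedale.txt). It also assumes that the file contains at least one line of data after the header:

```python
import io

# The Hopedale data from above, held in a string for this sketch.
HOPEDALE_TEXT = '''Coloured fox fur production, HOPEDALE, Labrador, 1834-1842
#Source: C. Elton (1942) "Voles, Mice and Lemmings", Oxford Univ. Press
#Table 17, p.265--266
 22
 29
  2
 16
 12
 35
  8
 83
166
'''


def skip_header(reader):
    """Skip the description line and the comment lines in the open file
    reader, and return the first line of data, unmodified."""
    # Read and discard the description line.
    reader.readline()

    # Keep reading until we find a line that is not a comment.
    line = reader.readline()
    while line.startswith('#'):
        line = reader.readline()
    return line


def count_pelts(reader):
    """Return the sum of the data values in the open file reader."""
    total = int(skip_header(reader).strip())
    # The loop continues from wherever skip_header left the file cursor.
    for line in reader:
        total = total + int(line.strip())
    return total


hopedale_file = io.StringIO(HOPEDALE_TEXT)
print("Total number of pelts:", count_pelts(hopedale_file))
```

Because skip_header returns the line exactly as it was read, the caller decides whether to use strip (to convert the value to an integer) or rstrip (to keep the leading whitespace for printing).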