Why do we keep data on our computers in “files”? Like many other computing terms, the word began its life within quotation marks, as an analogy to an existing system for organizing information: in this case, the filing cabinet.
In use from the start of the 1950s, the notion of a computer “file” initially described the physical medium on which information was stored. In the earliest computers, that medium was punched cards rather than magnetic disks.
It was only during the 1960s that computer files began to achieve their present sense, of virtual objects contained within computer storage media. Even then, however, the legacy of the word itself exerted a certain pressure on the medium, embodying the concept of each digital file as a distinct object, to be stored, copied, and edited much like the physical items from which such files took their name.
As computer scientists such as Jaron Lanier have pointed out, the very idea of using a “filing system” to store data on a computer embodies limitations that are by no means inevitable in an electronic medium, and whose continuation owes something to the metaphorical force of the words first used to describe them.[149]
Modern computer files come in many different kinds, each performing different functions and matched to different kinds of software. In this process of identification and matching, file “extensions” are crucial. An extension is essentially a qualifying suffix appended after a full stop (the letters “pdf” in a filename such as “Book.pdf,” for example), and these extensions and their origins must rank among the most-used and least-understood aspects of digital vocabulary. It’s a mysteriousness abetted by the fact that they’re usually only three letters long: a limitation that was once built into early file systems, but that today persists mainly because of inertia and compatibility concerns.
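To make the idea of a suffix after the final full stop concrete, here is a minimal sketch in Python. The helper name `extension_of` is invented for illustration; it relies on the standard library's `pathlib`, which treats whatever follows the last full stop in a filename as the suffix.

```python
from pathlib import PurePath


def extension_of(filename: str) -> str:
    """Return the extension in lowercase, without the leading full stop.

    Returns an empty string when the filename has no extension.
    (Hypothetical helper, named here only for illustration.)
    """
    # PurePath.suffix gives the final ".xyz" part of the name, or "".
    return PurePath(filename).suffix.lstrip(".").lower()


for name in ["Book.pdf", "Photo.JPEG", "README"]:
    print(repr(name), "->", repr(extension_of(name)))
```

Note that nothing in this sketch enforces a three-letter limit: “Photo.JPEG” is handled just as readily as “Book.pdf,” which reflects the point that the limit is now a matter of habit rather than necessity.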
Some file extensions have virtually entered into daily vocabulary, at least around the office. These include “.doc” (standing for “document” and used in many word processing packages), “.xls” (used in Microsoft’s Excel spreadsheet package, and standing for “Excel spreadsheet”), and “.pdf” (which stands for “portable document format” and was created by Adobe in 1993 as a way of preserving the exact layout of a printable document across different computers).
Other extensions can seem obscure but make considerably more sense once spelled out. A “.csv” file, for example, will consist of data in a “comma-separated values” format: that is, a list of values separated by commas, typically with one record per line. Similarly, understanding the nature of a “.gif” file becomes a little easier once you know it means a “Graphics Interchange Format” file, designed to store a relatively simple image. By contrast, the “.jpeg” extension takes its name from the “Joint Photographic Experts Group,” which created the format in order to establish an international standard for storing high-quality photographs as computer files.
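The “comma-separated values” idea is simple enough to show directly. The sketch below uses Python's standard `csv` module to read a few lines of made-up data; the sample text and the function name `read_rows` are assumptions for illustration, not taken from any real file.

```python
import csv
import io

# Hypothetical contents of a small .csv file: a header line,
# then one record per line, with values separated by commas.
SAMPLE = (
    "extension,meaning\n"
    "pdf,portable document format\n"
    "csv,comma-separated values\n"
)


def read_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of dictionaries keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE)
for row in rows:
    print(row["extension"], "means", row["meaning"])
```

The format's plainness is its appeal: the same text can be opened in a spreadsheet package or read line by line in a text editor.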
There are many hundreds of file types in the world today, some of which have even become words in their own right. “.MP3,” for example, is one of the world’s most popular digital forms of audio encoding; its name is short for “MPEG Audio Layer III,” where MPEG is the Moving Picture Experts Group, responsible for many of the world’s standard audio and video formats.
Most of the time, we don’t even notice file extensions are there. And yet they form a vital layer of our verbal world—not least because it’s these few letters that tell both us and our machines exactly what kind of information we’re dealing with.
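As a rough illustration of how a machine can read those few letters, the sketch below uses Python's standard `mimetypes` module, which guesses a file's kind from its name alone, without opening the file. The wrapper name `guess_kind` is hypothetical, and the exact answers can vary with the system's configuration, so this is a sketch of the principle rather than a description of how any particular operating system does it.

```python
import mimetypes


def guess_kind(filename: str) -> str:
    """Guess a media type from the filename's extension alone.

    Returns "unknown" when the extension is missing or unrecognized.
    (Hypothetical wrapper around the standard mimetypes.guess_type.)
    """
    media_type, _encoding = mimetypes.guess_type(filename)
    return media_type or "unknown"


for name in ["Book.pdf", "photo.gif", "song.mp3", "README"]:
    print(name, "->", guess_kind(name))
```

Because the guess depends only on the name, renaming a file changes the answer even though the contents stay the same: the extension is a label, not a guarantee.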