1
ELECTRONICALLY UNDRESSED
YOU’VE PROBABLY DECIDED that in order to save a buck on a bunch of grapes, you’ll let the supermarket compile data about your eating habits, and that in order to avoid a long line at the tollbooth, you’ll let some contractor for the state know how often and at what times of day you cross that bridge or drive on that highway. Maybe you even pay for a cup of coffee with a credit card. We are what we eat—and what we buy, and who we know, and where we live, and what we look at, and where we go. All of us—not just young people—give this information away freely because it’s convenient and often enjoyable to do so. Forty-five percent of Facebook users are twenty-six years old or older, and the fastest-growing segment of that group is women over fifty-five.1 Older people may not use Twitter, but they do use credit cards, pay tolls automatically on the interstate, bank online, and buy cars equipped with GPS.
In the last twenty years the ready availability of inexpensive, increasingly powerful, and ever-smaller networked computers has revolutionized how—and how fast—we create, process, store, and transmit information. These developments have changed our lives so pervasively, and accelerated the speed of change in our lives so sharply, that it’s hard to recall what the world was like before we were so happily and relentlessly connected. We think of letters written on paper as relics of a bygone era, but even dial-up modems or waiting thirty seconds for a Web page to load now seems quaint. In 2010 a six-year-old watching a 1980s movie asked her father, “Why is the phone attached to the wall?” If you can answer this question, you’re getting old. Manual typewriters, carbon paper, party-line telephones—these are incomprehensible phrases to the majority of people now living.
Computing power has doubled every year and a half since the mid-1960s.2 To grasp what this means, consider that in 1978 it cost about nine hundred dollars to fly from New York to Paris, and the flight took seven hours. If airline travel had accelerated at the same rate as computing power, you could now make the trip for about a penny, in less than a second.3 Our machines have sped up—today’s game processors can do at least a billion operations per second—but we haven’t. We can’t keep up with our own machines. So our machines have begun to talk to one another, making decisions for us, exchanging information about us. They apply the brakes in our cars when we’re too slow to do it, land huge aircraft unassisted, trade enormous volumes of securities, adjust the flow of electricity on the grid, and share data about us that we think of as private. And these machines are everywhere. The “personal digital assistant” in your pocket is more powerful than the 1960s IBM mainframe computer that occupied an entire room.
The movement of technology between government and the private sector is not a one-way street. Just as GPS has migrated from fighter planes to your car, so has technology moved from your living room to the front lines of conventional and cyber warfare. Most of the government’s computing systems are developed in the private sector now, and gaming consoles have directly influenced the design of instrumentation for weapons systems. The Cyber Crimes Center in the Department of Homeland Security has even dumped the eight-thousand-dollar consoles it once used to crack the passwords of seized computers—then replaced them with Sony’s PlayStation 3s for “brute force” password attacks that run through every conceivable password until they find the right one.4 The difference between electronic toys and business applications is vanishing.
The border between commerce and government wasn’t always so porous. In the beginning the Internet and its precursors were federally funded links between universities and government researchers, and it was illegal to use them for commercial purposes. Congress didn’t change that law until 1992.5 Even so, many university users were furious at the thought that an educational tool would be polluted by commerce, and as recently as the mid-1990s the Internet was still essentially a research tool and the plaything of a few. In 1995, the idea of buying and selling on the Internet aroused more suspicion than enthusiasm, but by January 2008 there were 1.3 billion Internet users.6 By 2011 the number of users had climbed to nearly 2 billion, and many of them were buying and selling online.7 No wonder that by 2009 information technology stocks had become the single largest sector in the U.S. economy.8 By 2015, the number of Internet hosts is expected to exceed the planet’s human population.9 Mobile data traffic is doubling every year, and all that data leaves a trail.
Going from rummaging in a file drawer to searching electronic data and images was dramatic; so was clicking a mouse instead of traveling to the library. In these cases, however, we were fetching things we wanted; new technology merely allowed us to fetch faster. Now data comes to us unbidden, based on choices we made in the past, who our electronic friends are, and where we live. Or it comes based on where we are, like a coupon for a latte that shows up on our mobile phone when we walk past the coffee shop.10 We now live in a sea of ambient data. Or rather, each of us increasingly lives in his or her own customized virtual sea of ambient data. And wherever we swim in that sea, each of us leaves electronic evidence of where we’ve been and strong indicators of where we are likely to go next—of which we are often unaware.

The Data Market

Data is a commodity, and the market for it is measured in billions of dollars—trillions if we include electronic banking and credit card issuers. Reed Elsevier PLC, one of the world’s biggest data aggregation companies, has reported a steady 10 percent annual growth of online traffic since 1999.11 Reed Elsevier owns LexisNexis, the largest source of online legal and periodical information. It also owns ChoicePoint, which does background and public records searches; it’s the outfit that checks the accuracy of the résumé you sent to a prospective employer or graduate school.12 These companies and others like them make fortunes based on information that has always been publicly available. They aggregate that information, sort it, reformat it, and make it instantaneously available to public- and private-sector users willing to pay for it. Other firms occupy other niches: financial data for investors; medical records for hospitals, doctors, and insurance companies; credit scoring for any business that gives you credit; and of course banks and credit card issuers. MasterCard alone made $5.5 billion in net revenue by managing $567 billion in charges, transactions, and settlements worldwide.13
Aggregated data tells a merchant what goods to stock and how to target advertising. Do you like fish but avoid red meat? Fine, we’ll send you an ad when we have a fish special, but we won’t waste our money sending you ads for roast beef. You prefer SUVs to convertibles, or casual clothing to suits, or certain kinds of movies or music? Great—we won’t waste our money or your time telling you about products you won’t buy anyway. This is good for the merchant; arguably it’s good for you, too. Aggregated data is also good for insurance companies, because without it they can’t calculate premiums for groups of people that represent different levels of risk. Whether that’s good for you depends on which risk pool the insurance companies put you in—and whether the data is accurate.
A single set of fingerprints probably has no value, but a bank of such prints helps the police identify criminals; DNA databases do the same. The federal DNA database holds 4.6 million profiles, or 1.5 percent of the U.S. population—mostly convicted criminals. Across the ocean, two thirds of Britons favor a law that would require everyone’s DNA to be stored.14 And if you visit central London you’re being photographed every time you walk down the street or enter the Underground, and if you drive, your license plate is being photographed wherever you go. The better the database, the more crimes will be solved. Whether that benefit is worth the privacy loss is another question. But however you feel about that, the benefit of aggregated data in solving crime is beyond dispute.
Data are valuable in all these cases because the aggregator can link an identity with a history or a pattern of behavior. But aggregated data also have enormous social importance even without links to individuals. Without that information, public health officials don’t know what diseases need more or less attention and resources, and predicting what kind of flu will strike next year would be even harder than it is. Having this data in real time, or near real time (as opposed to getting it a month or a year later), is also valuable, because it can warn that a new epidemic is breaking out right now, when we may be able to prevent or slow down its spread, and this is true whether the epidemic is natural or the result of a terrorist attack. Add personally identifying information back in, and you add more value. It helps find victims and, in some cases, the source of infection or attack.
The amount of information available about you is startling: your date of birth, driving record, medical history, credit rating, shopping patterns (including where you shop and what you buy), mortgage and property records, political contributions, vacation patterns (including the route you drive), whether you drive, telephone numbers (even if unlisted), the names of your spouse and children and business partners, your grades in school, your criminal record (if you have one)—and much else besides. In order to get that information about you, someone used to have to stand in line in several different buildings, not necessarily in the same city, just to request it, and he probably had to wait around or come back a second time to pick it up. Now, with several mouse clicks, the information is often available to anyone anywhere in the world who wants it.15 Your medical records are supposed to be under lock and key, but who keeps them? Your doctor, to start with, but so does the software provider that your doctor pays to store those records, and the doctor’s outside laboratory, and the insurance company that covers you, and their database administrator, who may work for someone else. Information you may think is confidential, sensitive, and private is sent to many different places automatically, sometimes in different countries, and each leg of its transmission over public communications networks represents a potential vulnerability. There is rarely such a thing as a single, unique record anymore. There are multiple copies of every record, stored in multiple places, in databases whose level of security is a mystery to most users, and sometimes even to company officials.
 
 
“We never don’t know anything about someone”
 
How did all this data become so readily available? Because you and I have given it away. In many cases we’ve had little choice about it—not if we want a mortgage or lease, a marriage or driver’s license, or health insurance; not if we want to enter the hospital, send our children to school, contribute to a political candidate, or buy a snack on the many airlines that no longer accept cash in flight.16 In other cases we don’t even know it’s happening. If you click on the credit card page of Capital One Financial’s Web site, for example, thousands of lines of code representing information about your education and income level and residence will be sucked up by the company in a fraction of a second. Your machine is talking to their machine, and in that fraction of a second your machine is working for them, not you. And so aggregated data snowballs. As Capital One’s data contractor quipped, “We never don’t know anything about someone.”17 That contractor probably doesn’t think he’s in the personal espionage business, but he is.
But do we care that we’re being spied upon? Is this new type of commercial spy collecting secrets—or simply gathering information we freely spread around? Many people find it comforting that someone else knows where they are at all times. Mobile phone companies and location services for your car, like OnStar, advertise their ability to do just that for you. Your mobile phone or PDA makes a constant record of where it goes. Mobile devices in the United States generate about 600 billion events per day.18 These events don’t just include the calls, text messages, and Internet connections you know you’re making. They also include the silent “pings” between each cell phone and a nearby tower whether you’re using the phone or not. They are the heartbeat of cell phone service and typically occur every few minutes. Each ping is tagged with geospatial location information,19 and if you have GPS or use Wi-Fi that record is very precise—less than eleven yards. If not, the record is still pretty accurate, because the cell towers that handle your calls are constantly pinging your phone and pinpointing your location to within about a block. According to Jeff Jonas, a data expert at IBM, this information will soon warn us about events that haven’t even occurred yet. Your free Gmail account, he surmises, will advise you “that your buddy Ken is going to be 15 minutes late to the pool hall this coming Thursday, unless he leaves work 15 minutes early . . . which he has done only twice in seven years.”20 With enough information about your past movements, scientists can predict your movements with about 94 percent accuracy. Or forecast traffic congestion. By examining patterns of communication and movement, they can detect flu symptoms before you know you’re getting sick. 
The emotional content of Twitter lets researchers predict the moves of the Dow Jones index with about 88 percent accuracy.21 Jonas also points out that the police could use the same kind of information to watch crowds form and disperse—a powerful tool for crowd control. Or for discouraging political expression. As of mid-2011, law enforcement officials in most jurisdictions can get geolocation data from mobile carriers simply by issuing a subpoena. Most jurisdictions don’t require a warrant signed by a judge—at least not unless the surveillance is long-term or involves movements inside a dwelling.22
And so, little by little, we find ourselves living in a glass house. Last year’s novelty is this year’s necessity; your friends wonder why you don’t have one. The price of last month’s expensive electronic luxury just fell by half. Your kid wants one for her birthday. It’s not simply adults who are being watched and marketed to, identified, and classified. Sure, we give up some privacy with each little step, but we get something back: convenience, peace of mind, whatever. We may be electronically naked, but we demand it. This is the world we now live in, and let’s face it: We find that world irresistible.23
 
 
THIS IRRESISTIBLE WORLD in which we are all numbered and accounted for crept up on us largely unnoticed, but its roots are old. Americans and Western Europeans have been counting, classifying, and identifying ourselves for several hundred years, but the initial steps were slow. The first modern European census was taken in Prussia (naturally) in 1719; the United States took its first census in 1790; Britain and France24 followed in 1801. Indeed, representative democracy required counting in order to achieve fairness in taxation and legislative representation.25 And then, in the last quarter of the nineteenth century, two things happened that dramatically accelerated this business of identifying everybody. First, the state pension was born in Germany, and therefore the state was required to know who was owed money and at what age. Keeping increasingly exacting records was one essential result. National identity cards began to appear. Second, in big cities like London, Paris, and Buenos Aires, the police began to wonder, Who is this man we have arrested really? Haven’t we arrested him before?
Enter Alphonse Bertillon, a low-ranking functionary in the Paris sûreté who in 1882 showed that he could reliably reidentify anyone by measuring his or her head and body and making a careful record of tattoos, scars, and other quirks. Bertillonage was the beginning of systematic biometrics but it was soon superseded by fingerprinting, which quickly became the identification method of choice for law enforcement agencies worldwide. We fingerprint not only those charged with crimes, but also everyone in the military, everyone who applies for a security clearance, and welfare recipients. In the UK and in some parts of continental Europe, the fingerprinting of schoolchildren is widespread, and it is used in place of library cards.26 There are now fingerprint scanners, fingerprint door locks, safes with fingerprint locks, fingerprint time clocks, and fingerprint kits for your favorite niece or nephew.27 These devices are not being foisted on people; there’s a market for them. Fingerprinting is so ubiquitous that it has become a metaphor for any system of positive identification, like “DNA fingerprinting.”
The credit markets we take for granted are another aspect of this irresistible world. We could hardly live without them. Pioneered in the United States during the Great Depression,28 these markets require instant, accurate information about potential borrowers. Without them a home buyer could not “prequalify” for a mortgage on the phone or get instant credit to buy a refrigerator, as we now do as a matter of course. Before that, creditors didn’t lend to people they didn’t know, or they took property as security, like a pawnshop.
The overlapping and ever-expanding appetite of government and commerce to keep tabs on us—and our own appetite for keeping tabs on one another—means that it’s virtually impossible to elude our own autobiographical trail of purchasing habits, property ownership, employment history, credit scores, educational records, and in my case, a security clearance record a mile long. In India, everyone’s personal data record will be a mile long, because the government there has launched a project to assign a unique twelve-digit identification number to every one of its 1.2 billion inhabitants, and to link that number to their fingerprints and iris scans. The idea is to ensure that welfare payments reach the right people and to permit India’s vast impoverished population to gain access to online banking and other services.29 Critics worry that the information will not be guarded adequately. They should also worry that aggregating data to prevent fraud enables fraud as well, because gathering huge quantities of data in one place means it can all be stolen from one place.
Meanwhile, the shelf life of all this data gets longer and longer, because the cost of storing it has fallen like a stone. In 1990 a one-terabyte storage drive would have cost $1 million to buy. (A terabyte is a trillion bytes, each made up of eight 1s and 0s, or a thousand times more than a gigabyte. Sixty-four terabytes is half the size of a big university library.) Now you can buy that drive for under $100. The price has fallen by a factor of ten thousand. Information about us doesn’t disappear with time. It can be saved forever, cheaply, in lots of places. That’s worrisome, but we also find it appealing. For about $175 you can buy a USB stick—also called a flash drive or a thumb drive—with sixty-four gigabytes (or “gigs”) of memory. That’s thirty-two million pages of text dangling at the end of a key chain. As we will see later, the ease of transporting huge amounts of data has dramatically changed the espionage trade.
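The storage figures in this paragraph are easy to check. The short Python sketch below redoes the arithmetic, taking a terabyte as a thousand gigabytes, as the parenthesis does. The variable names are illustrative, and the bytes-per-page figure is simply what the stated numbers imply, not a standard measure.

```python
# Rough check of the storage arithmetic, using decimal units.
GIGABYTE = 10**9             # bytes
TERABYTE = 1000 * GIGABYTE   # a thousand times more than a gigabyte

price_1990 = 1_000_000       # dollars for a one-terabyte drive in 1990 (from the text)
price_now = 100              # dollars, upper bound given in the text

# How many times cheaper the drive has become.
price_drop_factor = price_1990 // price_now

# The sixty-four-gigabyte USB stick and the page count quoted for it.
stick_bytes = 64 * GIGABYTE
pages_on_stick = 32_000_000

# Implied size of one page of plain text, in bytes.
bytes_per_page = stick_bytes // pages_on_stick

print(price_drop_factor)     # factor by which the price fell
print(bytes_per_page)        # bytes of text assumed per page
```

On these numbers the price fell by a factor of ten thousand, and each page is taken to hold about two thousand characters of plain text, which is roughly one typed page.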
 
 
FOLLOWING A SUDDEN explosion of Internet-enabled crime and the growing sense that our personal lives were exposed to strangers, Americans and Europeans began to search for ways to protect ourselves. Chiefly we have sought to regulate “personally identifiable information.”30 The European Community defines this somewhat more broadly (and vaguely) than do U.S. laws, but the phrase generally means data such as your postal and e-mail addresses and Social Security and credit card numbers, which can readily be associated with you. Companies like to tell you how carefully they protect this kind of information. But are these efforts effective?
The answer depends on the objective. The rules have undoubtedly forced companies and government agencies to tighten their handling of information about their own customers and employees. Whether they have had any effect in reducing fraud is doubtful, however, because the amount of personal information legally available is burgeoning, and the black market in such information is vast, as we’ll see in the next chapter. But if the objective is to protect your anonymity, these laws have little effect—and may soon have none at all. That’s because each of us can now be easily identified without reference to any of the usual categories of personally identifiable information. Strip it away—that’s called “anonymizing” it—and data aggregators can put it back almost instantly.
Let’s suppose I know your zip code and gender. If twenty thousand people live in your zip code, I can eliminate ten thousand of them by gender. Add in your age, and I can reduce the number much further. If I know what kind of car you drive, I can identify you with near certainty. Researchers at the University of Texas were able to reidentify people by their Netflix viewing habits simply by comparing the company’s carefully anonymized viewer ratings with publicly posted ratings on other Web sites that rated the same movies. Essentially, they showed the emptiness of the promises that Netflix and others make that you can do, watch, or buy whatever you like anonymously on their Web sites. Information scientists say they need only thirty-three “bits” of information—mundane things like your zip code or the make of your car—to identify you, and the information may have nothing to do with the legal definition of personally identifiable information. There are two possibilities for each bit (each bit must be a 1 or a 0), and 2^33 is a very large number—about 8.6 billion, which is more than the Earth’s human population.31 A firm called PeekYou has filed a patent application for a “computerized distributed personal information aggregator” that matches real names with pseudonyms used on blogs and social network sites like Facebook and Twitter.32 It’s becoming almost impossible to be anonymous anymore.
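The thirty-three-bit claim can be made concrete with a few lines of Python. What follows is a minimal sketch, not anyone's actual reidentification method. The function names are hypothetical, the world population figure is a rough assumption, and treating each fact as an independent filter is a simplification.

```python
import math
from typing import Iterable

WORLD_POPULATION = 7_000_000_000  # assumption: rough 2011 estimate, not from the text


def bits_needed(population: int) -> int:
    """Fewest yes/no facts that can give every person a distinct label."""
    return math.ceil(math.log2(population))


def bits_revealed(matching: int, population: int) -> float:
    """Bits learned from a fact shared by only `matching` of `population` people."""
    return math.log2(population / matching)


def remaining_candidates(population: int, fractions: Iterable[float]) -> float:
    """Expected pool size after filters that each keep the given fraction.

    Assumes the filters are independent, which real attributes rarely are.
    """
    pool = float(population)
    for fraction in fractions:
        pool *= fraction
    return pool


print(2 ** 33)                              # distinct labels that 33 bits can express
print(bits_needed(WORLD_POPULATION))        # bits required to single out one person
print(bits_revealed(10_000, 20_000))        # gender within a zip code of 20,000
print(remaining_candidates(20_000, [0.5]))  # people left after filtering by gender
```

Gender is worth exactly one bit, because it halves the pool. A zip code of twenty thousand people is worth far more, because it cuts billions down to thousands in a single step. That is why a handful of mundane facts is enough to single a person out.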
Many of us are uncomfortable with the proliferation and transparency of personal data, and some would like Congress to pass laws to stop it. But the law is chasing reality, not shaping it. (This is another theme we’ll see cropping up repeatedly throughout this book.) The law is not fundamentally altering the direction or speed of our society’s movement toward the instant, universal availability of massive amounts of information that can be sliced, diced, and analyzed in microseconds—nor can it. You will have such data if you want it; your friends, enemies, and bank will have it too, and the government will either have it or be allowed to get it under certain legally defined conditions. We will write some rules that make access somewhat more difficult, but those rules won’t be able to hold back an overwhelming tide.
There are aspects of this that I find wonderful; others strike me as distasteful or worse. It has led to massive increases in productivity and wealth. It has also created vulnerabilities of staggering proportions—vulnerabilities that now generate billions of dollars in criminal revenue, are exploitable and exploited by foreign intelligence and military services as well as by criminals, and that—if not better understood and mitigated—put our communications, our economy, and even our military at risk of failure. For both good and ill, this is what’s happening. This is the glass house we live in.