5
Will Obfuscation Work?
How can obfuscation succeed? How can the efforts of a few individuals generating extraneous data work against well-funded, determined institutions, let alone against such behemoths of data as Google, Facebook, Acxiom, and the National Security Agency? Encountering these doubts again and again, we have come to see that when people ask of particular instantiations of obfuscation, or of obfuscation generally, “But does it work?” the reasonable answer is “Yes, but it depends.” It depends on the goals, the obfuscator, the adversary, the resources available, and more. These, in turn, suggest means, methods, and principles for design and execution.
The typical scenario we imagined earlier involves individuals functioning within information ecosystems often not of their own making or choosing. Against the designers, operators, managers, and owners of these ecosystems, individual data subjects stand in an asymmetric relation of knowledge, power, or both. Although these individuals are aware that information about them or produced by them is necessary for the relationship, there is much that they don’t know. How much is taken? What is done with it? How will they be affected? They may grasp enough about the ecosystems in which they are wittingly or unwittingly enrolled, from Web searching to facial recognition, to believe or recognize that the practices of those ecosystems are inappropriate, but, at the same time, recognize that they aren’t capable of reasonably functioning outside them, or of reasonably inducing change within them.
Whether obfuscation works—whether unilateral shifting of terms of engagement over personal information is fulfilled by a particular obfuscation project—may seem to be a straightforward question about a specific problem-solving technique, but upon closer scrutiny it is actually several questions. Whether obfuscation works depends on characteristics of the existing circumstances, the desired alteration in terms, what counts as fulfillment of these desires, and the architecture and features of the particular obfuscation project under consideration. This is why answering the question “Does it work?” with “It depends” isn’t facetious; instead it is an invitation to consider in systematic terms what characteristics of an information ecosystem make it one in which obfuscation could work. Beyond these considerations, we seek to map design possibilities for obfuscation projects into an array of diverse goals that the instigators and users of such projects may have.
Therefore, we have to answer two questions in this chapter. We can take the question “Will obfuscation work?” in the sense “How can obfuscation work for me and my particular situation?” or in the sense “Does obfuscation work in general?” We will respond to both questions. The overall answer is straightforward: Yes, obfuscation can work, but whether it does and to what extent depends on how it is implemented to respond to a threat, fulfill a goal, and meet other specific parameters. This chapter presents a set of questions that we think should be addressed if obfuscation is to be applied well.
5.1 Obfuscation is about goals
In the world of security and privacy theory, it is by now well established that the answer to every “Does it work?” question is “It depends.” To secure something, to make it private or safe or secret, entails tradeoffs, many of which we have already discussed. Securing things requires time, money, effort, and attention, and adds organizational and personal friction while diminishing convenience and access to many tools and services. Near-total freedom from digital surveillance for an individual is simple, after all: just lead the life of an undocumented migrant laborer of the 1920s, with no Internet, no phones, no insurance, no assets, riding the rails, being paid off the books for illegal manual work. Simple, but with a very high cost, because the threat model of “everything” is ludicrously broad. When we think of organizational security tradeoffs, we can think of the “Cone of Silence” in the spy-movie-parody television series Get Smart.1 Used for conducting top secret meetings, the Cone works so well that the people in it can’t hear one another—it is perfectly private and amusingly useless.2
Threat models lower the costs of security and privacy by helping us understand what our adversaries are looking for and what they are capable of finding, so that we can defend against those dangers specifically.3 If you know that your organization faces a danger that includes sophisticated attacks on its information security, you should fill in all the USB ports on the organization’s computers with rubber cement and keep sensitive information on “airgapped” machines that are never connected to the network. But if you don’t believe that your organization faces such a danger, why deprive people of the utility of USB sticks? Obfuscation in general is useful in relation to a specific type of threat, shaped by necessary visibility. As we have emphasized throughout, the obfuscator is already exposed to some degree—visible to radar, to people scrutinizing public legal filings, to security cameras, to eavesdropping, to Web search providers, and generally to data collection defined by the terms of service. Furthermore, he or she is exposed, to a largely unknown degree, on the wrong side of the information asymmetry, and this unknown exposure is further aggravated by time—by future circulation of data and systems of analysis. We take this visibility as a starting point for working out the role that obfuscation can play.
To put that another way, we don’t have a best-practices threat model available—in fact, an obfuscator may not have sufficient resources, research, or training to put such a model together. We are operating from a position of weakness, obligated to accept choices we should probably refuse. If this is the case, we have to make do (more on that below) and we must have a clear sense of what we want to accomplish. Consider danah boyd’s research on American teenagers’ use of social media. Teens in the United States are subject to an enormous amount of scrutiny, almost all of it without their consent or control (parents, school, other authority figures). Social media would seem to make them subject to even more. They are exposed to scrutiny by default—in fact, it is to their benefit, from a privacy perspective, to appear to be visible to everyone. “As teens encounter particular technologies, they make decisions based on what they’re trying to achieve,” boyd writes,4 and what they are trying to achieve is often to share content without sharing meaning. They can’t necessarily create secret social spaces for their community—parents can and do demand passwords to their social-network accounts and access to their phones. Instead, they use a variety of practices that assume everyone can see what they do, and then behave so that only a few people can understand the meaning of their actions. “Limiting access to meaning,” boyd writes, “can be a much more powerful tool for achieving privacy than trying to limit access to the content itself.”5 Their methods don’t necessarily use obfuscation (they lean heavily on subtle social cues, references, and nuance to create material that reads differently to different audiences, a practice of “social steganography”), but they emphasize the importance of understanding goals. The goal is not to disappear or to maintain total informational control (which may be impossible); it is to limit and shape the community that can accurately interpret actions that everyone can see.
Much the same is true of obfuscation. Many instances and practices that we have gathered under that heading are expressions of particular goals that take discovery, visibility, or vulnerability as a starting point. For all the reasons we have already discussed, people now can’t escape certain kinds of data collection and analysis, so the question then becomes “What does the obfuscator want to do with obfuscation?” The answer to that question gives us a set of parameters (choices, constraints, mechanisms) that we can use to shape our approach to obfuscation.
5.2 I want to use obfuscation …
A safe that can’t be cracked does not exist. Safes are rated in hours—in how long it would take an attacker (given various sets of tools) to open them.6 A safe is purchased as a source of security in addition to other elements of security, including locked doors, alarms, guards, and law-enforcement personnel. A one-hour safe with an alarm probably is adequate in a precinct where the police reliably show up in twenty minutes. If we abstract this a little bit, we can use it to characterize the goals of obfuscation. The strength of an obfuscation approach isn’t measured by a single objective standard (as safes are) but in relation to a goal and a context: to be strong enough. It may be used on its own or in concert with other privacy techniques. The success of obfuscation is always relative to its purposes, and to consideration of constraints, obstacles, and the un-level playing field of epistemic and power asymmetries.
When gathering different obfuscation examples, we observed that there was convergence around general aims and purposes that cropped up numerous times, even though a single system could be associated with several ends or purposes and even though intermediate ends sometimes served as means to achieve other ends. There are subtler distinctions, too, but we have simplified and unified purposes and ends into goals to make them more readily applicable to design and practice. They are arranged roughly in order of inclusion, from buying time to expressing protest. Interfering with profiling, the fifth goal, can include some of the earlier goals, such as providing cover, within it, and can be in turn contained by expressing protest (the sixth goal). (Since virtually all obfuscation contributes to the difficulty of rapidly analyzing and processing data for surveillance purposes, all the higher-order goals include the first goal: buying time.) As you identify the goal suited to your project, you ascend a ladder of complexity and diversity of possible types of obfuscation.
Skeptical readers—and we all should be skeptical—will notice that we are no longer relying heavily on examples of obfuscation used by powerful groups for malign ends, such as the use of Twitter bots to hamper election protests, the use of like-farming in social-network scams, or inter-business corporate warfare. We want this section to focus on how obfuscation can be used for positive purposes.
If you can answer the questions in the previous chapter to your satisfaction, then this chapter is intended for you. We begin with the possibility that you want to use obfuscation to buy some time.
… to buy some time
Did radar chaff “work”? After all, it fluttered to the ground in minutes, leaving the sky again open for the sweep of the beam—but of course by then the plane was already out of range.
The ephemeral obfuscation systems meant to buy time are, in a sense, elegantly simple, but they require a deep appreciation of intricate physical, scientific, technical, social, and cultural surroundings. Success doesn’t require that one buy a particular amount of time, or the longest time possible; it requires only that one buy just enough time. Using identical confederates, or even just slowing the process of going through documents, dealing with bureaucracy, or sorting true from false information, can work toward this end. Most obfuscation strategies work best in concert with other techniques of privacy protection or protest, but this is particularly true of time-buying approaches, which rely on other means of evasion and resistance already being in place—and a very clear sense of the adversary. (See the questions in section 5.3.)
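To make the idea of buying just enough time concrete, here is a back-of-envelope sketch in Python (the function, the per-item times, and the scenarios are our own illustrative assumptions, not figures from any real case): if an adversary must spend a fixed amount of time ruling out each plausible decoy, the number of decoys needed scales directly with the delay required.

```python
# Back-of-envelope sketch: how many plausible decoys buy a given delay?
# All numbers below are illustrative assumptions, not measurements.
import math

def decoys_needed(delay_needed_min: float, minutes_per_item: float) -> int:
    """Decoys required so that checking them all costs at least the needed delay."""
    return math.ceil(delay_needed_min / minutes_per_item)

# An analyst spends roughly 3 minutes ruling out each forged document,
# and the obfuscator needs a two-hour head start: 40 decoys.
print(decoys_needed(delay_needed_min=120, minutes_per_item=3))

# Radar-chaff analogue: each false echo costs the operator ~30 seconds,
# and the plane needs ten minutes to leave the coverage area: 20 echoes.
print(decoys_needed(delay_needed_min=10, minutes_per_item=0.5))
```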
… to provide cover
This subsection and the next are related but distinct, with a substantial overlap. They approach the same problem from different sides: keeping an adversary from definitively connecting particular activities, outcomes, or objects to an actor. Obfuscation for cover involves concealing the action in the space of other actions. Some approaches can be implemented to withstand scrutiny; others rely on the cover provided by context to escape observation. Think of babble tapes, which bury a message in dozens of channels of voices: we know that the speaker is speaking, but we don’t know what is being said. Or think of the approach that Operation Vula ultimately settled on: not simply encrypted email, but encrypted email that would perfectly fit the profile of banal international business. The communications of ANC operatives could take on cover as an additional layer of protection (along with crypto and superb operational security) by using the traffic of other messages similar to theirs to avoid observation. One method assumes scrutiny, and the other strives to be ignored; each is suited to its situation.
… for deniability
If providing cover hides the action in the space of other actions, providing deniability hides the decision, making it more difficult to connect an action and an actor with certainty. One of the benefits of running a Tor relay is the additional layer of confusion it creates: is this traffic starting with you, or are you just passing it along for someone else? (TrackMeNot has a similar mechanism; we will discuss it in greater detail in the subsection on interference with profiling.) Likewise, consider the use of simulated uploads to leak sites, which make it harder to determine definitively that a certain file was uploaded during a session by some particular IP address. Finally, think of something as simple as shuffling SIM cards around: it doesn’t conceal the activity of carrying phones and placing calls, but makes it more difficult to be certain that it’s this person with this phone at any time. Though providing deniability blurs a bit with providing cover and with preventing individual observation, it is particularly useful when you know that your adversary wants to be sure that it has the right person.
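A minimal sketch of timing-based deniability, in the spirit of simulated uploads to leak sites (the scheduler below is our own toy, not any real site's protocol, and it generates no actual network traffic): because every session emits indistinguishable decoy upload events, observing an upload in a given session no longer establishes that the user submitted anything real.

```python
"""Toy decoy-upload scheduler (illustrative only; no real network traffic).

Real and decoy uploads produce identical-looking events, so an observer of
the event stream cannot tell which sessions carried a genuine submission.
"""
import random

def session_events(real_upload: bool) -> list[str]:
    # Every session emits a random number of decoy uploads (1-4 here).
    events = ["UPLOAD"] * random.randint(1, 4)
    if real_upload:
        events.append("UPLOAD")      # indistinguishable from the decoys
    random.shuffle(events)
    return events

# Ten sessions, only one of which carries a real file; every session still
# shows upload activity, so no single session is incriminating on its own.
for i in range(10):
    print(i, session_events(real_upload=(i == 3)))
```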
… to prevent individual observation
This somewhat unusual goal may at first sound generic (don’t all obfuscation approaches want to prevent individual observation?), but we mean something very specific by it. Certain obfuscation approaches are well suited to achieving the positive social outcome of enabling individuals, companies, institutions, and governments to use aggregate data while keeping the data from being used to observe any particular person. Privacy-preserving participatory sensing can collect valuable aggregate data about traffic flows without revealing anything reliable about one particular vehicle. CacheCloak retains the significant social utility of location-based mobile services while preventing the providers of those services from tracking the users (and leaving open other avenues to making money). Pools for the swapping of loyalty cards give grocery and retail chains most of the benefits they were hoping for (the cards are driving business their way and providing useful demographic data, postal codes, or data on purchases) but prevent them from compiling dossiers on specific shoppers.
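A toy simulation of a card-swapping pool may help make the trade-off concrete (the shoppers, products, and weekly swap schedule below are invented for illustration): the aggregate sales totals the retailer wants survive intact, while the history attached to any single card becomes a blend of many different shoppers.

```python
"""Toy loyalty-card swap pool: aggregates survive, per-card dossiers don't."""
import random
from collections import Counter, defaultdict

shoppers = ["ada", "bea", "cal", "dee"]
cards = {name: f"card-{i}" for i, name in enumerate(shoppers)}  # initial issue
products = ["milk", "beer", "diapers", "coffee"]

aggregate = Counter()                 # what the retailer wants: sales totals
per_card = defaultdict(Counter)       # what the retailer records per card

for week in range(52):
    # Each shopper's purchase is recorded against whatever card they hold.
    for name in shoppers:
        item = random.choice(products)
        aggregate[item] += 1
        per_card[cards[name]][item] += 1
    # Weekly swap meet: cards are pooled and redistributed at random.
    pool = list(cards.values())
    random.shuffle(pool)
    cards = dict(zip(shoppers, pool))

print("aggregate sales:", dict(aggregate))              # intact and accurate
print("history of card-0:", dict(per_card["card-0"]))   # a blend of everyone
```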
… to interfere with profiling
Another rung up the ladder of comprehensiveness, anti-profiling obfuscation may interfere with observation of individuals or with analysis of a group, may provide cover or deniability, or may raise the cost (in time and money) of the business of data. It may leave aggregate useful data intact or may pack it with ambiguity, reasonable lies, and nonsense.
Vortex was a cookie-swapping system that enabled users to hop between identities and profiles. Had it been widely implemented beyond the prototype stage, it would have rendered online profiling for advertising purposes useless. The various “cloning” and disinformation services we have described offer similar tools for making profiling less reliable. TrackMeNot provides search-query deniability (e.g., was that query about “Tea Party join” or “fluffy sex toys” from you, or not?) under the larger goal of rendering search profiles in general less reliable. Which queries can you trust? Which queries define the cluster into which the searcher fits? Against which queries should you serve ads, and what user activity and identities should you provide in response to a subpoena?
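A minimal sketch of the kind of query-stream obfuscation TrackMeNot performs (this is not TrackMeNot’s code; the decoy list, batch size, and logging are stand-ins of our own): each real search goes out buried among decoy searches drawn from an innocuous seed list, and the resulting log carries no marker of which query was the user’s.

```python
"""Toy TrackMeNot-style query obfuscation (an illustrative stand-in, not the real add-on)."""
import random

# Decoy seeds; the real system draws and evolves these from sources such as RSS feeds.
DECOY_QUERIES = [
    "weather tomorrow", "pasta recipes", "used bicycles",
    "local election results", "how to tie a bowline",
]

def issue(query: str, log: list[str]) -> None:
    """Pretend to send a query; here we only append it to an unlabeled log."""
    log.append(query)

def search(real_query: str, log: list[str], decoys_per_query: int = 3) -> None:
    """Send the real query buried among decoys, in random order."""
    batch = [real_query] + random.sample(DECOY_QUERIES, decoys_per_query)
    random.shuffle(batch)
    for q in batch:
        issue(q, log)

observed_log: list[str] = []
search("symptoms of measles", observed_log)
search("cheap flights berlin", observed_log)
print(observed_log)   # the observer sees eight queries, none labeled as real
```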
… to express protest
Of course, TrackMeNot is a gesture of protest, as are many of our other examples, such as card-swapping activists and crowds in Guy Fawkes masks. Many obfuscation strategies can meet or contribute to goals already mentioned while also serving to register discontent or refusal. A pertinent question to ask of your obfuscation approach is whether it is intended to keep you unnoticed, to make you seem innocuous, or to make your dissent known.
5.3 Is my obfuscation project …
Now that you have a sense of your goals, we can turn to four remaining questions that build on the goals and shape the components of an obfuscation project. As was true of the six goals, there is some overlap between these questions. They will determine how an obfuscation system works, but they are not perfectly distinct, and they have some effect on each other. We have separated them according to the roles they play in implementing obfuscation.
… individual, or collective?
Can your obfuscation project be carried out effectively by one person, or does it require collective action? One person wearing a mask is more easily identified and tracked than someone not wearing a mask, but a hundred people wearing the same mask become a crowd of collective identity, and that makes individual attribution of actions difficult. Some obfuscation projects can be used by an individual or by a small group but will become more effective as more people join in. The reverse could also be true (see “known or unknown,” below): a technique that relies on blending in and not being noticed—that functions by avoiding scrutiny—will become far more vulnerable if widely adopted.
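The arithmetic behind the mask example is simple but worth making explicit. Under the simplifying assumption that an adversary with no side information must pick one member of an identically masked crowd at random, the chance of correct attribution falls to 1/k for a crowd of size k (a toy calculation of ours, not a formal anonymity metric):

```python
# Naive attribution odds when k people present identical signals (e.g., the same mask).
# Simplifying assumption: the adversary has no side information and must guess.
def attribution_probability(crowd_size: int) -> float:
    return 1.0 / crowd_size

for k in (1, 10, 100, 1000):
    print(f"crowd of {k:>4}: P(correct attribution) = {attribution_probability(k):.3f}")
# A lone mask wearer is identified with certainty; a crowd of a hundred
# dilutes the odds of a correct guess to one percent.
```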
Two consequences will follow from your answer to the question this subsection asks.
First, an obfuscation technique that builds on collective action can spur adoption through the “network effect.” If the technique becomes more reliable or more robust for all existing users as more users join, you can think about the design from the perspective of crossing that threshold where significant gains for joining become apparent and you can spark widespread use. Does your technique require some number of users before it will be really effective? If it does, how will you get it to that point? This is an opportunity to think about whether the technique can “scale”—whether it can continue to provide utility once it is being rapidly taken up in large numbers. This also bears on usability: a technique that requires a number of users to succeed calls for careful thought about how immediately usable, understandable, and friendly it is. If your obfuscation requires a number of users, then the plan must include how to get them. The Tor project, for example, has recognized the need for greater accessibility to non-expert users.
Second, a technique that relies on relative obscurity—on not being widely adopted, or on not being something that an adversary is looking for—benefits from exclusivity.
… known, or unknown?
Some obfuscation methods use their ability to blend into the innocuous data they generate to withstand scrutiny; others use that blending to escape scrutiny altogether. For the goals you want to accomplish, can your method work if your adversary knows it is being employed, or if your adversary is familiar in detail with how it works?
For many techniques that merely buy time, the answer doesn’t matter. For example, it makes no difference whether the adversary’s radar operator believes that the large number of dots represents real airplanes; either way, the real plane cannot be picked out and countered in time. As long as the radar operator is slowed down for ten minutes, the obfuscation provided by chaff is a success. More complex obfuscation methods can accomplish different goals depending on whether or not the adversary knows they are being used. For example, if AdNauseam activity isn’t known to the adversary, it works to foil profiling, filling the record of advertising clicks with indiscriminate, meaningless activity. If it is known, it both frustrates the work of profiling the individual and develops a protest role—a known gesture of mocking refusal. (Build a surveillance machine to get me to click a few ads? I’ll click all of them!)
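A toy version of the profiling problem that indiscriminate clicking creates (our own illustration, not a description of AdNauseam’s implementation or of any real ad network’s model): a profiler that infers interests from click shares gets a sharply peaked, informative distribution from a selective clicker and a flat, maximum-entropy one from someone who clicks everything.

```python
"""Toy profile entropy: selective clicks are informative, indiscriminate clicks are not."""
import math

def click_entropy(clicks: dict[str, int]) -> float:
    """Shannon entropy (in bits) of the click distribution across ad categories."""
    total = sum(clicks.values())
    probs = [c / total for c in clicks.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)

categories = ["travel", "fitness", "loans", "gardening", "electronics", "pets"]

selective = {c: 0 for c in categories}
selective["travel"] = 18
selective["fitness"] = 2                       # peaked: clearly a travel enthusiast

indiscriminate = {c: 20 for c in categories}   # "I'll click all of them!"

print(f"selective clicker:      {click_entropy(selective):.2f} bits")
print(f"indiscriminate clicker: {click_entropy(indiscriminate):.2f} bits "
      f"(maximum possible is {math.log2(len(categories)):.2f})")
```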
However, in some cases the distinction matters and must be accounted for. If your goal is to render a database less effective or less valuable in the long term, so that your adversary continues to use it and thus is acting on misleading or false information, you want sources of plausible obfuscation to remain unknown so they can’t be selected and expunged or countered. Forms of obfuscation that function primarily as acts of public protest need their obfuscating nature to be made explicit so they can stand as refusal rather than compliance.
… selective, or general?
This is the most complex of the questions, with four different implications that must be considered.
Each of the goals discussed above, to one degree or another, relies on an understanding of the adversary against which obfuscation is directed. Often this understanding—whether it is formalized as a threat model or whether it is informed guesswork—is fragmentary, missing important components, or otherwise compromised. What first interested us in obfuscation was its use by people who often lacked precise mastery of the challenge their privacy faced: it was proprietary, or classified, or it relied on technologies and techniques they could not comprehend, or the “adversaries” included other people freely giving up their data, or the problem existed both in the present and in possible future vulnerabilities. In addition to having a clear understanding of the limits of obfuscation—that is, of knowing one’s adversary—we must bear in mind what we don’t know, and beware of relying on any one technique alone to protect sensitive information. This raises the question of how directed a particular obfuscation strategy is. Is it a general attempt at covering one’s tracks, or is the obfuscating noise that you produce tailored to a particular threat about which you have some knowledge? A few further questions follow from your answer to this.
First, is your obfuscation approach directed at a specific adversary, or is it directed at anyone who might be gathering and making use of data about you? Is there a specific point of analysis you are delaying or preventing, or are you just trying to kick up as much dust as you can? The strategy outlined in the “cloning” patent that Apple acquired is an example of the latter: producing many variants of the user, all generating plausible data, for anyone who might be collecting. If you know your adversary and know your adversary’s techniques and goals, you can be much more precise in your obfuscation.
If you know your adversary, a second question arises: Is that adversary targeting you (or a select group), or are you subject to a more general aggregation and analysis of data? If the former, you must find ways to selectively misrepresent your data. The latter possibility offers a different task for the obfuscator: the production of misleading data can take a significantly wider-ranging form, resembling data on what may be many individuals.
This, in turn, raises a third question: Is your technique supposed to provide selective benefit, or general benefit? In view of how much of the work of data surveillance is not about scrutinizing individuals but rather is about using inferences derived from larger groups, your method might work to obfuscate only your own tracks, or it might work to render overall profiles and models less reliable. Each of those possibilities presents its own distinct difficulties. For example, if TrackMeNot functions effectively, it has the capacity to cast doubt not only on the obfuscator’s profile but also on the profiles of others in the dataset.
Thinking about beneficiaries raises a fourth question: Is your goal to produce data of general illegibility, so no one knows or needs to know what is real and what is obfuscation? Or is it to produce obfuscated data that an adversary can’t get any value from (or can get only diminished value from), but that tell the truth to those who need to know what is real? Think of FaceCloak, a system that keeps Facebook from gaining access to personal data by providing it with meaningless noise while keeping the actual, salient personal and social data available to one’s friends. Or consider a system designed to preserve socially valuable classes of data (data derived from the census, for example, used to allocate resources effectively or to govern efficiently) while preventing the identification of individual data subjects within them. Creating a selectively readable system is far more challenging than simply making generally plausible lies, but a selectively readable system offers wider benefits along with privacy protection, and the difficulties involved in creating it are a challenge that should be accounted for at the outset of a project.
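A minimal sketch of the selective-readability idea that FaceCloak exemplifies (this is not FaceCloak’s protocol; the side store, the key handling, and the use of the third-party cryptography package’s Fernet cipher are stand-in choices of ours): the platform receives a plausible fake, the real value is encrypted and parked elsewhere, and only holders of the shared key can swap the truth back in.

```python
"""Selective-readability sketch (illustrative; not FaceCloak's actual protocol).

Requires the third-party 'cryptography' package for the Fernet cipher.
"""
from cryptography.fernet import Fernet

friend_key = Fernet.generate_key()      # shared out of band with trusted friends
cipher = Fernet(friend_key)

platform_profile: dict[str, str] = {}   # what the social-network operator stores
side_store: dict[str, bytes] = {}       # ciphertexts keyed by field name

def publish(field: str, real_value: str, plausible_fake: str) -> None:
    """Give the platform a believable fake; park the encrypted truth elsewhere."""
    platform_profile[field] = plausible_fake
    side_store[field] = cipher.encrypt(real_value.encode())

def view_as_friend(field: str) -> str:
    """Friends holding the key recover the real value; everyone else sees the fake."""
    if field in side_store:
        return cipher.decrypt(side_store[field]).decode()
    return platform_profile[field]

publish("birthday", real_value="1991-04-12", plausible_fake="1987-09-30")
print("platform sees:", platform_profile["birthday"])   # the fake
print("friend sees:  ", view_as_friend("birthday"))     # the truth
```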
… short-term, or long-term?
Finally, over how long a time span should your project be effective? The goal of buying time is a starting place for answering this question. If you want to confuse the situation for only ten minutes, that’s one thing; if you want to render some database permanently unreliable, untrustworthy, and valueless for inference or prediction, that’s much harder. A major component of the information asymmetry that obfuscation helps to address is temporal—the “time-traveling robots from the future” problem we discussed in chapter 3. Certain data may be innocuous now, but a change in context, in ownership, in tools, or in laws can make the same data dangerous. Does your technique have to work only for now, and only for one outrage, one company, and one technique of collection and analysis, or does it have to ruin the data so that they can’t be trusted in the future or for other purposes? The former isn’t easy but is relatively straightforward. The latter involves a much broader set of challenges. It is worthwhile to consider this question now, at the development stage, so as not to be caught out after a technique has been widely adopted and you realize that it was provisional, or that it was particular to a company bound by certain national laws that no longer apply.
With these six goals and four questions in mind, we can assess the fundamentals—and some of the pitfalls—of putting together an obfuscation strategy. Of course, the questions won’t end with these. As viable practice, as a powerful and credible response to oppressive data regimes, obfuscation will be well served by conditions that will enable it to develop and thrive. These include the following:
• Progress in relevant sciences and engineering: Develop methods in statistics, cryptography, systems engineering, machine learning, system security, networking, and threat modeling that address such questions as how much noise to generate, what kind of noise, how to tailor it to its target, how to protect it against attack, and for which specific problems obfuscation is the right solution.
• Progress in relevant social sciences, theory, and ethics: Address questions about what individuals want and need in their uses of obfuscating systems, and engage in sound normative assessments of proposed systems.
• Progress in technology policy and regulation: Safeguard open and public standards and protocols that allow developers of obfuscating systems access to and engagement with critical infrastructure; encourage large, public-facing systems to offer open APIs to developers of obfuscating systems; and refuse enforcement of Terms of Service that prohibit reasonable obfuscating systems.
Obfuscation, in its humble, provisional, better-than-nothing, socially contingent way, is deeply entangled with the context of use. Are you creating a personal act of refusal, designed to stand on its own as a gesture of protest, whether or not it actually makes data collection less useful? Are you using obfuscation as one element in a larger suite of privacy-protection tools tailored to a group and an adversary—obfuscation that has to work verifiably in relation to a specific data-analysis strategy? Perhaps you are applying obfuscation at the level of policy, or to data collection that requires more effort to misuse, so as to increase the cost of indiscriminate surveillance. Or perhaps you are developing or contributing to software that can provide a service with a layer of obfuscation that makes it difficult to do anything but provide the service. You may have access to considerable technical, social, political, and financial resources, or you may be filling out forms, dealing with institutions, or interacting online without much choice in the matter. With all of those different possibilities, however, the issues raised by our goals and questions are general to obfuscation projects across different domains, and working through them provides a starting point for getting your obfuscation work out into the world, where it can begin doing good by making noise.