As I discuss in the introduction, one of the definitions of the Dark Web I reject is the Deep Web characterization, which holds that it comprises everything Google hasn’t indexed. This definition implies that there are layers to the Internet, each more impenetrable than the last, navigable only by those with elite computer networking skills. It also implies that there are no Dark Web search engines.
Actually, quite a few search engines specialize in crawling, indexing, and sorting Freenet freesites, Tor hidden services, and I2P eepsites. One of them, onion.link, even uses a Google Custom Search engine, meaning—despite all the news stories and academic articles that say otherwise—Google has crawled significant parts of the Dark Web.1
The concept of a “web beyond Google” might give the Dark Web a lot of mystique: the idea that there is a web beyond the web, out of reach of those of us who rely on Google searches, only vaguely aware that there is more to the network than what Google caches. It also gives the makers of Dark Web search engines a lot of motivation: What if I could beat Google to these new networks? What if my search engine becomes, to use an often repeated phrase, the Google of the Dark Web? What if it becomes a legitimate portal into the Dark Web?
This chapter takes up Dark Web search engines and considers them through the specific lens of legitimacy as propriety, or the sort of legitimacy that corporations and nonprofit organizations seek. I first elaborate on what I mean by propriety and, following that, briefly lay out a theoretical framework for the chapter. Then, drawing on interviews with Dark Web search engine operators, archival research, and participant observation (including using multiple Dark Web search engines as well as installing and running distributed search engine software), I consider how Dark Web search engines seek propriety by aligning the interests and perceptions of various other entities, including Dark Web users and nonusers, humans and nonhuman entities. In doing so, I hope to answer the call for research on search engines that focuses on their “gatekeeping” capacities and how they technically function.2 I also hope to highlight the importance of search engines as entry points into specialized networks, such as the Dark Web; this focus on search can illustrate a portal by which Dark Web nonusers can become users. To emphasize this, I conclude by considering how these search engines seek to gain a big inheritance: the title of “Google of the Dark Web,” thus inheriting Google’s legitimacy.
Organizational and managerial communication literature reveals two facets of legitimacy. First, legitimacy is about perceptions. Second, legitimacy is about resources.
For the first facet, I return to organizational sociologist Mark Suchman’s definition of legitimacy. Legitimacy is “a generalized perception or assumption that the actions of an entity are desirable, proper, or appropriate within some socially constructed system of norms, values, beliefs, and definitions.”3 This places organizational legitimacy firmly within the bounds of strategic communication and perception management. For example, writing about challenges to organizational legitimacy, Myria Watkins Allen and Rachel H. Caillouet argue, “Corporate actors, especially those whose legitimate right to operate is being challenged, embed self-presentation strategies in their external discourse to control perceptions within their organizational field.”4 For scholars of start-ups, such as Monica A. Zimmerman and Gerald J. Zeitz, legitimacy is so important that its acquisition should be the first goal of the new venture, even ahead of becoming profitable: “New ventures need resources from their environment, and, in the end, the motivating factor for external actors to give such resources is their belief or feeling that the venture is indeed competent, efficient, effective, worthy, appropriate, and/or needed.”5 In other words, if a start-up is perceived to be legitimate, it is more likely to attract venture capital.
Second, many organizational and managerial communication scholars note that organizational legitimacy has a relationship to the command of resources. To put it simply, organizations are perceived to be legitimate by dint of the fact that they have resources.6 If organizational legitimacy is about perceptions, it stands to reason that organizations with the resources to engage in advertising, sponsorship, or lobbying enjoy increased odds of being perceived as desirable, proper, and appropriate.7 To use a term from the symbolic economy of legitimacy, organizations with resources can simply purchase legitimacy through ad campaigns. Furthermore, in contemporary capitalism, a firm’s legitimacy is often measured in terms of its resources. In this view, a firm’s profitability reflects the fact that consumers have chosen it over others when making purchasing decisions. More profits means more legitimacy. In fact, an extreme example of this resource-first-legitimacy-later approach can be seen when a previously illegal business “goes legit.” As R. T. Naylor caustically notes, “Today’s [criminals] … may well be tomorrow’s free-market pioneers. Someday Bogotá may well host the Pablo Escobar School of Business to vie with certain North American institutions bearing the names of notorious tobacco barons or booze smugglers.”8 Nonprofits pursue resources such as government grants not only to fund their organizations, but also as a marker of their legitimacy.9 The more grants and donations the nonprofit can attract, the more legitimate it is.
Indeed, these two facets of legitimacy—perceptions and resources—present a causal quandary. Does the perception that an organization is legitimate give it access to more resources? Or does the command of resources give an organization the perception of legitimacy?10 Rather than seeking the causal factor, however, I instead turn to organizational studies scholars who argue that we must attend to the relationships between social, technical, economic, and political factors as organizations take shape. My suggestion that this form of legitimacy is best understood as propriety, in the dual sense of something being proper as well as commanding resources and property (as in proprietorship), is meant to emphasize that we must simultaneously examine how organizations command both respect and resources. In this sense, legitimacy as propriety echoes the Weberian theory of state legitimacy, which holds that the state seeks a monopoly on the material tools of violence (weapons, militaries, police forces) as well as the perception that the state is the rightful master of these tools. Organizations seek something similar: command of resources (employees, profits, property) as well as the perception that they are the rightful masters of these resources.
Thus, to investigate Dark Web search engines through the lens of organizational legitimacy, we have to consider how search engines simultaneously develop the “generalized perception” of their propriety as well as their capacities to command resources. Moreover, this has to be done within the specific environment of the anonymizing networks Tor, I2P, and Freenet. As Zimmerman and Zeitz argue, any new organization “will face a different set of relevant environmental forces” as it seeks to command respect and resources. “No organization can be consistent with all environments; the point is for the new venture to be clear about the particular mix of environmental factors that is important to its survival.”11 Likewise, any investigation of the legitimacy of Dark Web search engines must take into account the specific environmental factors in which they operate.
If propriety is a perception of the appropriateness of the organization as well as its ability to command resources, we are already dealing with heterogeneity. We’re dealing with feelings and things. The problem is compounded when we consider the specific environment in which Dark Web search engines must operate, as well as the perceptions and resources Dark Web search engines must bring together, a list that includes
Any successful Dark Web search engine must balance these and other perceptions and resources to gain status as a legitimated point of passage into the Dark Web. Again, it would be difficult to tell whether the perception of a Dark Web search engine’s legitimacy would precede its command of resources, or whether resources, such as a large index of sites, would precede the perception. It is best, therefore, to think of all these elements in relation to one another.
For a framework to investigate these elements, I turn to a school of organizational studies that draws on actor-network theory and whose members include Barbara Czarniawska, Susan Leigh Star, John Law, and Michel Callon, scholars who have written extensively on how heterogeneous elements can be brought together into organizations that, to paraphrase Barbara Czarniawska and Tor Hernes, have legitimacy emerge through their organizing.12 Callon, a key figure in this field, has argued that organizations establish themselves as “obligatory points of passage” by naming various sets of actors with whom they want to have a relationship; defining relations between them; defining their interests; and presenting themselves as mediators between all the other actors on the network.13 Law similarly speaks of organizations as capable of harnessing the power of other elements in a network:
Actors (including collectivities) struggle to impose versions of reality on others which define a) the number of those others, both natural and social, that may be said to exist in the world, b) their characteristics, c) the nature of their interrelations, d) their respective sizes and e) their positions with respect to the actor attempting [to impose its version of reality].14
In other words, if an organization can “impose its reality” on others, it is far more likely to be legitimated. Importantly, for Callon and others in this school, “actors” includes human and nonhuman elements, discourses, and materials, any of which can be drawn on to support the legitimacy of the organization. Respect and resources must both be attended to.
Callon’s observation that successful organizations present themselves as obligatory points of passage is especially important in considering Dark Web search engines. After all, a basic relationship mediated by search is between user and information. In fact, multiple relationships are mediated by search engines, a point I explore in depth below. Here, I want to point to another organizational scholar, Susan Leigh Star, who has argued that we need to pay attention to infrastructures as we do our analysis. Infrastructure, which she defines as background, connective processes, and systems, is “part of human organizing.”15 Operating in the background, infrastructural systems are taken for granted. As sociotechnical systems, however, they can shape a great deal of our daily lives. Her key question is, “What values and ethical principles do we inscribe in the inner depths of the built information environment?”16 Star’s question is an important one to keep in mind as we analyze Dark Web search engines, since their legitimacy hinges in part on their ability to become infrastructural, essentially backgrounded parts of Dark Web interaction.
Finally, and in a related vein, this school of organizational studies reminds us to pay attention to elements that the organization hides or simplifies. John Law, Annemarie Mol, Gail T. Fairhurst, and François Cooren have all written about the relationships between presence and absence and simplification and complexity.17 In terms of presence and absence, Fairhurst and Cooren consider how political or corporate leaders work to establish their authority and power, arguing that successful leaders are able to highlight elements that make them look favorable and suppress others that do not.18 To put this into the terms of the symbolic economy of legitimacy, this latter practice is delegitimation. Yet, going so far as to hide other elements is often unnecessary; as Law argues, organizations can simply tell “ordering stories.” “When we tell ordering stories we simplify and ‘punctualize.’”19 This is because
not everything can crowd into a single place, and implosion, or, perhaps better, condensation, is impracticable. Perhaps this is a general principle, but, linked to concern with design and control, it’s what the actor-network theorists point to when they tell of “punctualization.” That which is complicated comes in simple packages … that can be used to make sense.20
In other words, instead of hiding unwanted elements, sometimes they can be hidden in plain sight by simplifying them, ordering them, or organizing them. To put this in terms of the symbolic economy of legitimacy, this can be appropriation (if the simplification is exploitative) or exchange (if the simplification is mutually beneficial).
Organizational studies scholars working in the actor-network theory tradition sound a note of caution, however, about hiding and simplification: these processes are not guaranteed. Hidden or simplified elements often resist the ordering stories and attempt to reassert themselves rather than be silenced. This is a point hammered home in Michel Serres’s book The Parasite, a work that has been influential on actor-network theory.21
Thus, if legitimacy at the organizational level is about aligning perceptions and marshaling resources, with these activities done in specific environments, the organizational studies scholars who attend to the relationships between heterogeneous elements are useful guides. All of these scholars highlight elements in organizing that can play roles in constructing an organization’s command of respect and command of resources. Taking up these organizational studies scholars, I want to suggest key ideas:
If a Dark Web search engine can achieve all of these goals, it can gain legitimacy as propriety. Aligning interests can aid search engine developers in achieving the perception of appropriateness. Becoming infrastructural will give them access to and influence over informational resources. Successfully dealing with elements that resist their ordering stories will further solidify their positions. The interaction between perceptions and resources can continue to strengthen the search engine in the network, to the point where Dark Web users agree that the search engine is legit. Legitimacy as propriety thus becomes a multiply caused effect of these alignments.
In this section, I consider a range of Dark Web search engines (table 5.1), some still running, others no longer available.
Table 5.1 Search engines across various Dark Web networks
| Name | Network(s) |
|---|---|
| not Evil | Tor |
| Direct | I2P (inactive) |
| elgoog | I2P, Tor (inactive) |
| Ahmia | Tor, I2P |
| MoniTOR | Tor (inactive) |
| Freegle | Freenet (inactive) |
| Grams | Tor |
| Enzo’s Search | Freenet |
| Seeker | I2P |
| Beast | Tor |
| Eepsites.i2p | I2P (inactive) |
| Epsilon | I2P (inactive) |
| Onion.link | Tor |
| Candle | Tor |
I am arguing not that any of these search engines have succeeded in legitimating themselves, but that they are (or were) engaged in a great deal of work to achieve that status. To explore this, I first consider their naming of other actors and their relations. I next consider how they mediate between various points on the network, seeking to become infrastructural. I finally consider how Dark Web search engines attempt to hide or simplify other elements in the network, even as some of those elements resist such attempts.
First, to legitimate themselves, makers of these various Tor, I2P, and Freenet search engines must name relevant actors, identify relations among them, and discover their interests. Based on interviews with Dark Web search engine operators and analysis of developer mailing list archives, IRC chat logs, technical papers, archived Dark Web sites, and grant funding applications, I summarize the relevant actors and interests in table 5.2.
Table 5.2 Actors and interest relations potentially mediated by Dark Web search engines
| Actor | Interests |
|---|---|
| The network | Maintain bandwidth and anonymizing capacities |
| Sites | Be found (although some sites want to hide) |
| Vendors | Be found; gain and maintain reputation as legit |
| Dark Web users | Find sites or new content |
| Law enforcement | Identify sites and subjects; arrest lawbreakers and seize servers |
| Spiders | Access, duplicate, and store Dark Web pages |
| Protocols | Retain anonymity; condition access |
| Other search engines | Become the “Google of the Dark Web” |
| Network builders | Gain organizational legitimacy; maintain the viability of the network |
| Nonusers | Read about the Dark Web in news; opine about the necessity of the Dark Web |
Second, after establishing the relevant actors, Dark Web search engines seek to align interests by mediating between as many of these actors as possible, inserting themselves into (or even constituting) relations between these actors. I consider users and networks, network builders and nonusers, Dark Web sites and law enforcement, users and law enforcement, vendors and buyers, and finally, software and protocols.
The relationship between Dark Web users and the Tor, I2P, or Freenet networks is the controlling relationship. Search engines seek to become the main channel for this crucial pair of actors. On the one side, we have the extreme heterogeneity of users, who may seek any number of things, from music, pornography, or conspiracy theories to mental health support or cat facts.22 Viewed through Daniel E. Rose and Danny Levinson’s conceptual framework, they might be navigating (e.g., trying to find a specific web page), information seeking (e.g., researching a specific topic), or resource seeking (e.g., trying to find software or be entertained).23 On the other side, we have an almost equally heterogeneous collection of websites hosted on Tor, I2P, or Freenet: social networks, blogs, forums, home pages, and markets, all covering a wide range of topics. Some of these sites’ operators want them to be found; others seek to remain hidden.
Between these two sit a host of potential mediators, including directories, wikis, knowledgeable users who share links, Reddit subreddits, publications such as Deep Dot Web, myriad trails of links between sites, and search engines, my focus here.
Across a range of mailing lists and IRC chats, Dark Web users have called repeatedly for reliable search. As Matthew Toseland of Freenet noted in 2005, “Every user sooner or later asks ‘why isn’t freenet searchable?’”24 When search engine operators present their work, they often argue that their engines are services that will benefit users most of all. For example, in an interview, the administrator of Tor’s not Evil search engine likened a good search engine to parents: “They serve as a guide. You’re supposed to be able to trust them with your questions.”25 Another Tor search engine operator argues that “the more of us who build engines, the more detailed and differing information will be available to the average user.”26 Likely because of the influence of Google, many users expect Dark Web networks to be navigable via search engines: as Juha Nurmi (founder of the search engine Ahmia) puts it, a “Google-like search site is the most user-friendly solution” to the problem of navigating the Dark Web.27
In addition, search engine operators argue that the networks themselves will benefit from their engines. As Enzo argued in an interview with me, “I would say the one thing Freenet needs the most is users. Users make Freenet interesting. Some of those users will become contributors, providing new and interesting content, code, documentation, translations, bug reports, or feedback.”28 Enzo suggested that Enzo’s Search would contribute to the goal of adding more users to Freenet. Given the structure of Tor, I2P, and Freenet, which relies on network traffic to obfuscate the identities of users, increasing numbers of users on these networks generally translates into stronger anonymity. More traffic could also translate into more heterogeneous content; as new users access Dark Web sites, they might decide to host their own to fill perceived gaps. Indeed, in addition to calling for search engines, users also call for more content to be hosted on Tor, I2P, or Freenet.
As one I2P developer notes, however, a poorly implemented search engine could actually discourage users and thus harm the networks: “Because if its not a service providing in depth information and a good overview about I2P content, it might actually hurt us. Someone using I2P first time might be disappointed, that the results wont keep up with google etc and assume theres actually no good content on I2P.”29 Likewise, Matthew Toseland notes that with inadequate search in place, “One obvious disadvantage is that users will search for something, won’t find it, and will assume freenet is crap. :)”30 Thus, if these networks introduce search engines, the stakes are high: they have to satisfy users’ heterogeneous search queries and return “good content” or they risk their networks being perceived as “crap.”
This basic relationship between users and networks, mediated by search engines, helps structure many other relationships and interests, including between the network builders and nonusers, law enforcement and hidden website operators, users and surveillance systems (both corporate and government), vendors and buyers, and software and Dark Web protocols.
In chapter 3, I trace the practices of the group I call the network builders—the coders, developers, and promoters of Tor, I2P, and Freenet. While a great deal of that development is technical (as in the development of protocols, networking schemes, and encryption practices), a significant part of the work is social: Tor, I2P, and Freenet developers also work to construct the reputation of their projects for the general public of nonusers. By “nonusers” I mean any consumers of news stories about the Dark Web who do not use the Dark Web themselves. This is obviously a heterogeneous group, and although it does not include current Dark Web users, it nonetheless has significant influence on the viability of these projects. Nonusers can react to what Wendy Hui-Kyong Chun calls the “extramedial representation” of the Dark Web, “the representation of networked media in other media and/or its functioning in larger economic and political systems,” by calling for or consenting to these projects being shut down, made illegal, or starved of funding.31
Network builders’ efforts to present their work as contributing to general communications welfare have been largely overshadowed by journalistic coverage of the taboo activities of Dark Web users and site operators. Tor in particular has been associated with the Silk Road drug market, Freedom Hosting’s child exploitation images (CEI), and stories of hackers for hire. Freenet and I2P have had less coverage, but negative stories about both have also been published. To combat this image, the developers at the Tor Project, the Invisible Internet Project, and Freenet have called for adding more mainstream services that may be recognizable to nonusers. This entreaty is directly tied to the perception of how appropriate these networks are.
To better present Dark Web networks to the general public, a central service that network builders call for is a search engine. As Freenet developer Arne Babenhauserheide argues, “For [Freenet to be] *more* socially acceptable we need more actively spidering [indexes] which only include what the creator deems acceptable.”32 In other words, to be legible to nonusers, Freenet needs more search engines (built in part through spiders that index Freenet) capable of highlighting “acceptable” content.
Perhaps the best example of a search engine mediating between network builders and nonusers is Juha Nurmi’s Ahmia, which indexes Tor and I2P. Nurmi and his engine operate as ambassadors for the Tor Project. He frames his search engine as a transparency tool, bringing Tor hidden services and I2P eepsites to light. Rather than framing Ahmia as a system that makes largely taboo activities visible (as the Electronic Frontier Foundation’s Jeremy Malcolm does), Nurmi uses statistical data produced through his crawler to claim that only a tiny number of Tor hidden services are dedicated to CEI or trade in illegal goods, and thus the majority of Tor services are appropriate and acceptable.33 As Nurmi argues in a slide presentation,
Unfortunately, many times the popular news about Tor are telling about drugs, guns and child porn … [which is] bad for Tor’s reputation. In reality, there are only [a] few [of] these kind of sites. Ahmia has the real statistics:
- Less than 20 child porn sites
- Less than 10 black markets
- A few scamming sites.34
In a Knight Foundation News Challenge application, Nurmi contends,
We are solving a key problem with hidden services. The problem is that it is hard to find content published anonymously using Tor. We are making Tor network accessible in many different ways: listing Hidden Services, gathering their descriptions and providing full text search to the content. We can also provide cached text versions of the pages. …
We are building good reputation to Tor network along with other online anonymity systems, such as Globaleaks and Tor2web, software project originally made by Aaron Swartz now maintained by Hermes Center for Transparency and Digital Human Rights. We have plans to integrate Globaleaks and Tor2web to our search engine.35
Here, Nurmi associates transparency and “good reputation,” and he links Ahmia with other acceptable practices and sites, such as the whistleblowing site GlobaLeaks. He also invokes Aaron Swartz, the activist whose suicide came after what many in the free information community saw as brutal treatment by U.S. federal law enforcement.
Ahmia, as Nurmi told me in an interview, is meant “to support human rights, such as privacy and freedom of speech. … It’s like Google search for onion sites.”36 Nurmi best exemplifies a mediator between the Tor Project and the general (non-Dark-Web-using) public, and his rewards have included sponsorship by the Tor Project at the Google Summer of Code in 2014. This is a legitimacy exchange: as Nurmi seeks to improve the reputation of anonymizing networks, he builds his reputation as a skilled computer scientist who supplies a needed search engine service to network builders, users, and a general public largely wary of the Dark Web.
In a story in Digital News Asia, Jeremy Malcolm of the Electronic Frontier Foundation argues that criminal activity moving onto the Dark Web will make crime more visible, not less:
The advantage of criminals using hidden services is that at least it provides transparency about the problem. Often law enforcement agencies will spout made-up figures about how much crime is conducted online, which others have no way of verifying. … But with hidden services, it is possible to get a better idea of what previously happened under wraps. This is the first step towards catching and prosecuting those criminals using conventional investigation methods.37
This argument is echoed by Dark Web market researchers James Martin and Nicolas Christin, who note that
in contrast to the secretive and opaque world of conventional drug markets, the online drugs trade takes place largely in the open. Protected by anonymizing technologies, online drug vendors freely advertise their products, including prices, quantities and the regions to which goods may be sent.38
In other words, Dark Web crime is far from hidden: it is made visible, with evidence being collected automatically as Bitcoins move from one wallet to another and as forum posts are recorded. Indeed, my analysis of the politics of Dark Web markets in chapter 4 was aided a great deal by scholars who have built archives of market activities, many of whom are keenly interested in the scale and scope of criminal activities on the Dark Web.39
Search engines can be part of this, creating new relationships between Dark Web sites and law enforcement. Given search engines’ capacities to aggregate and organize data on websites, it is not surprising that they could be seen as tools for law enforcement investigations. The best example is the Memex search engine of the Defense Advanced Research Projects Agency (DARPA). Named after Vannevar Bush’s famous 1945 thought experiment, DARPA Memex
will not only scrape content from the millions of regular web pages that get ignored by commercial search engines but will also chronicle thousands of sites on the so-called Dark Web—such as sites like the former Silk Road drug emporium that are part of the TOR network’s Hidden Services.40
Beyond indexing a large range of websites, including Tor hidden services, Memex is being designed to return detailed and linked results on specific search tasks. The initial task used to introduce Memex to the public was combating human trafficking:
DARPA plans to develop Memex to address a key Defense Department mission: fighting human trafficking. Human trafficking is a factor in many types of military, law enforcement and intelligence investigations and has a significant web presence to attract customers. The use of forums, chats, advertisements, job postings, hidden services, etc., continues to enable a growing industry of modern slavery. An index curated for the counter-trafficking domain, along with configurable interfaces for search and analysis, would enable new opportunities to uncover and defeat trafficking enterprises.41
Thus, Dark Web search engines can be deployed as tools for law enforcement. Obviously, beyond Memex, there is no reason that law enforcement agencies cannot use any Tor, I2P, or Freenet search engines to research and track Dark Web sites and activities.
Dark Web development has been driven, in part, by the perceived overreach of government agencies, including DARPA, the NSA, the UK’s Government Communications Headquarters (GCHQ), and Communications Security Establishment Canada (CSEC). Government agencies monitor Clear Web users’ search patterns or take warrants to corporations such as Google to gather data on users. In contrast, while search engines built for the Dark Web may make sites more visible, they tend to deny law enforcement easy access to users’ search records. For example, Ahmia seeks to keep law enforcement at a distance with its legal policy:
We take your privacy seriously: we absolutely do not maintain any IP address logs. We have no information to share to any third parties regarding usage of the Ahmia service.
We do not allow backdoors into our services for access by authorities or anyone else. Officials who want information for criminal investigations must contact the Ahmia Project Leader with a warrant. If this happens, we will publish the warrant and challenge it.42
Although other Dark Web search engines do not post such detailed legal policies, in interviews, their operators claim that their services will help protect end users against government surveillance. Given that Tor, I2P, and Freenet were developed in part to protect the anonymity of users, search engine operators are delegitimating the state’s attempts to deanonymize and track user activities.
Considering the claims of the developers of DARPA Memex alongside those of the developers of Ahmia, Seeker, or Enzo’s Search, the broad picture is that Dark Web search will in fact make hidden sites more accessible to all users, including law enforcement agents. But those who make these search engines are attempting to prevent users’ search habits from being monitored.
A major part of the Dark Web political economy includes the sale of drugs, counterfeit goods, and stolen information, so it is not surprising that Dark Web search engines can become channels of commerce. The Tor-based Grams (now defunct) specialized in this area. Echoing previous Clear Web efforts to connect buyers and sellers (such as Google’s Froogle), Grams offered Tor hidden service search specifically of markets. It also aggregated user-generated reviews of vendors from the markets as well as reviews on Reddit, offering a rating system not unlike Amazon’s. Grams was not useful for finding general Tor hidden services, but as the magazine Deep Dot Web proclaims, it became a hub for drug market sellers and buyers.43 Vendors could build a stable reputation (tied to a pseudonym and a PGP key) across multiple markets by being included in the Grams database. Beyond this, Grams also offered Bitcoin tumbling services and an advertising network where markets and vendors could advertise their offerings.
Because Tor, I2P, and Freenet are anonymizing networks designed to obfuscate the IP addresses of both users and site hosts, search engine operators face technical challenges quite different from standard World Wide Web search. Dark Web search engine operators alter existing software packages, such as Yacy (a peer-to-peer search engine), Apache Lucene, or custom Perl or Python scripts, in order to crawl Tor hidden services, eepsites, or Freesites. Many of these web search packages were made for the World Wide Web, which is older, faster, has far more content and links, and has a centralized naming system (the domain name system, or DNS). Documentation for adjusting search engine software packages for Tor, I2P, or Freenet tends to be sparse or nonexistent. Nurmi, the founder of Ahmia, notes these problems in terms of searching Tor sites: “First, the linking between onion sites is thin; as a result, algorithms using the backlinks aren’t working very well. Second, it takes time to crawl everything because Tor is slow. Lastly, onion sites are replacing their addresses all the time.”44 Search engine software, especially preexisting packages, must be heavily modified for these specific problems.
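To give a concrete sense of what such modification involves, here is a minimal sketch of how a crawler might fetch a single Tor hidden service page for indexing. It assumes a locally running Tor client exposing its standard SOCKS proxy on port 9050, the third-party requests and PySocks packages, and an invented onion address; it stands in for the general technique operators describe, not for any particular engine’s code.

```python
# Minimal sketch: fetch one Tor hidden service page and extract the metadata
# a crawler might index. Assumes a local Tor client with its SOCKS proxy on
# 127.0.0.1:9050 and the requests + PySocks packages. The address is invented.
import re
import requests

TOR_PROXIES = {
    # "socks5h" (rather than "socks5") so that hostname resolution happens
    # inside Tor, which is necessary for .onion addresses.
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

def fetch_page(url: str, timeout: int = 60) -> dict:
    """Fetch one page over Tor and return the metadata a crawler might store."""
    response = requests.get(url, proxies=TOR_PROXIES, timeout=timeout)
    response.raise_for_status()
    html = response.text
    title = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
    return {
        "url": url,
        "title": title.group(1).strip() if title else "",
        "excerpt": re.sub(r"<[^>]+>", " ", html)[:500],  # crude text extraction
    }

if __name__ == "__main__":
    print(fetch_page("http://exampleonion2345.onion/"))  # hypothetical address
```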
Even after doing so, however, Dark Web search engines must balance between crawling these networks enough to produce an up-to-date index while not overwhelming network bandwidth. Too little crawling of the network results in obsolete indexes; too much and bandwidth that would otherwise go to network users goes instead to the search engine software. This is especially tricky in Dark Web environments where hidden sites appear and disappear frequently. As MoniTOR explains, to see if a site is available on the Tor network, a search engine could “ping” a server (essentially asking the server, “Are you available?”). Although “every ping is tiny, … if enough people use your service, it could potentially increase your bandwidth use dramatically.”45 Given that bandwidth on the Tor network is at a premium, pinging could slow down the search engine or even result in an inadvertent denial-of-service attack on another server. But, as MoniTOR explains, without pinging (or other methods to see if a server is online), new content could be hidden from the search engine: “As for server content, if a server[’]s data changes, it may not be spidered for a period of time, or until it responds to the ping.”46
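The bandwidth tradeoff MoniTOR describes can be sketched as a simple throttle: the engine rechecks a site’s availability only after a set interval has passed, so that liveness “pings” do not crowd out ordinary network traffic. The proxy settings and the six-hour interval below are illustrative assumptions.

```python
# Sketch of a throttled liveness check: revisit a site only after a set
# interval so that availability "pings" do not consume scarce Tor bandwidth.
# The proxy settings and six-hour interval are illustrative assumptions.
import time
import requests

TOR_PROXIES = {"http": "socks5h://127.0.0.1:9050",
               "https": "socks5h://127.0.0.1:9050"}
RECHECK_SECONDS = 6 * 60 * 60          # minimum gap between checks of one site
_last_checked: dict[str, float] = {}   # url -> unix time of the last check

def is_probably_online(url: str) -> bool | None:
    """True/False if we actually checked; None if we skipped to save bandwidth."""
    now = time.time()
    if now - _last_checked.get(url, 0.0) < RECHECK_SECONDS:
        return None                    # too soon; rely on the cached index entry
    _last_checked[url] = now
    try:
        # HEAD keeps the "ping" small: headers only, no page body.
        return requests.head(url, proxies=TOR_PROXIES, timeout=60).ok
    except requests.RequestException:
        return False
```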
Moreover, even with careful tuning, some website developers may not want part or even all of their sites indexed and may protest if their content ends up in a search index. On Tor and I2P networks, site owners can use a two-decades-old standard, the robots exclusion standard (commonly called robots.txt). By using a text file at the root of their servers, site administrators can declare whether and how they want their sites indexed by crawlers. But this standard was built for the World Wide Web, where user agent strings are commonly used. User agent strings identify the operating systems, applications, and IP addresses of visitors to a server. This works well for the World Wide Web, where a search engine can use a standard identification. Google’s, for example, is “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html).” In contrast, Dark Web systems obfuscate this information by default. The I2P system uses “MYOB/6.66 (AN/ON)” as a default (a series of jokes: mind your own business, the number of the beast, anonymous).47 Search engine operators in I2P have to modify this default to provide unique identifying information; they also have to write guides for site administrators to properly configure robots.txt to work. In Freenet, the situation is worse: there is no support for robots.txt. In all of these systems, if a site administrator wants to be excluded from a search engine index, they are best off contacting the search engine operators or hiding material behind password protections.
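The accommodations described above can be sketched as follows: the crawler announces itself with a distinctive user agent string (in place of defaults such as I2P’s MYOB/6.66) and fetches and honors a site’s robots.txt before crawling it. The agent name, proxy settings, and policy choices are illustrative assumptions, not a description of any real engine.

```python
# Sketch of honoring robots.txt with a distinctive crawler user agent.
# The agent string, proxy settings, and policy choices are assumptions.
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser
import requests

PROXIES = {"http": "socks5h://127.0.0.1:9050",
           "https": "socks5h://127.0.0.1:9050"}
CRAWLER_AGENT = "ExampleDarkWebSpider/0.1"  # hypothetical; replaces defaults like MYOB/6.66

def allowed_to_crawl(page_url: str) -> bool:
    """Fetch the site's robots.txt over the anonymizing network and apply it."""
    robots_url = urljoin(page_url, "/robots.txt")
    parser = RobotFileParser()
    try:
        response = requests.get(robots_url, proxies=PROXIES, timeout=60,
                                headers={"User-Agent": CRAWLER_AGENT})
    except requests.RequestException:
        return False                 # unreachable: skip rather than hammer the site
    if response.status_code != 200:
        return True                  # no robots.txt published: treat as crawlable
    parser.parse(response.text.splitlines())
    return parser.can_fetch(CRAWLER_AGENT, page_url)
```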
Finally, a special problem appears in the case of Freenet. Recall that one of the technical aspects of Freenet is the network’s ability to “forget” unpopular content. If a file (HTML, PDF, MP3, JPG, or otherwise) is not accessed often enough, the network is designed to overwrite it. But this is predicated on the assumption that humans would be the arbiters of popularity. What about software processes? As David McNab notes in his description of his Freenet search engine, “The very act of spidering these pages will change their routing within Freenet—and will have a tendency to bring to life a lot of the more rarely-visited freesites.”48 In other words, as Freenet crawlers request freesites, they ensure that the network will not overwrite them, even if few humans actually want to visit them.
Even after all the work of tuning search engine software to operate in the unique protocological environments of Tor, I2P, or Freenet, the sheer resources required for search engines—and the purpose of Dark Web searchers—may drive search engine operators away. A key example of this is the highly successful, but now shuttered, Direct search on the I2P network. Direct search administrator I2Phreak explained why Direct was shut down:
The router consumes too much system resources. The crawler is also written in Java and consumes the rest:) All the data we need to store (fetched pages, even compressed, search index, URLs database), in order to run the service, takes about 25 Gb of disk space. And about 90% of all search requests were about child pornography. There is no reason to spend so much resources to serve such kind of requests.49
Given that the searches were largely for content I2Phreak was morally opposed to, running Direct was not worth the high amounts of RAM and hard disk space required.
Thus, in addition to identifying actors (such as users, nonusers, law enforcement, vendors, buyers, and network builders) and mediating in various ways among them, Dark Web search engine operators have to consider the problems and interests of the networks themselves: their specific protocols, topologies, and capacities. They must align the capacities of off-the-shelf search engine software with the specific Dark Web networks they seek to index. They also must consider the demands on their own equipment of search and networking. To become a legitimated point of passage and command respect and resources, Dark Web search must provide channels between software and network protocols. Finally, even after all this work, Dark Web search engine operators may consider hiding sites from their results, as the case of Direct implies.
So far, I have shown how Dark Web search engine developers identify relevant actors and attempt to align these actors’ interests, effectively becoming a channel between entities such as users, networks, law enforcement, and software. These developers seek to balance both senses of propriety, commanding respect (by aligning other actors’ interests) and commanding resources (by channeling informational flows). But it is too simple to suggest that Dark Web search engines merely provide open channels between pairs of actors. They must also engage in hiding and simplification.
Recall Babenhauserheide’s argument about Freenet search: “For [Freenet to be] *more* socially acceptable we need more actively spidering [indexes] which only include what the creator deems acceptable.”50 Only including “what the creator deems acceptable” is the key phrase here. A more complex description comes from George Kadianakis, who describes an ideal search engine for Tor hidden services:
If I could automagically generate secure technologies on a whim, I would say that some kind of decentralized reputation-based fair search engine for hidden services might squarify our Zooko’s triangle a bit. “decentralized” so that no entities have pure control over the results. “reputation-based” so that legit hidden services flow on top. “fair” so that no scammers with lots of boxes can game the system. Unfortunately, “fair” and “reputation-based” usually contradict each other.51
Here is a more complex set of design goals: a decentralized search system that would return only reputable sites and prevent scamming. Regardless of the differences in design goals, both Babenhauserheide and Kadianakis are calling for search engines that filter results so that only an acceptable class of Dark Web sites are accessible. Another way to understand this: these developers are calling for searches that hide certain classes (disreputable, unacceptable) of Dark Web sites from view, thus delegitimating them.
In mainstream press coverage, all Dark Web sites are portrayed as inherently and equally hidden. In other words, popular press coverage tends to present Tor hidden services, eepsites, and freesites as equally hidden from technically inept web users. When journalists report on Dark Web search, they deploy the metaphor of “bringing light to the dark,” implying that search engines (as well as directories) can bring all hidden Dark Web sites into view. Actually, search engine operators often hide sites from their results by either deleting sites from their indexes or preventing them from being indexed. This is in addition to sites that use robots.txt or password protection to avoid being indexed at all. Thus, all Dark Web sites are not equally hidden or equally accessible. Those sites that are not visible through a Dark Web search are, in a sense, “deeper,” or more hidden, a little Deep Web (in the original sense of this term) within the Dark Web, so to speak.52
Dark Web search engine operators do this by preventing classes of sites from being included in their indexes. For example, FreenetUser, the creator of the AFKindex, explains their “banned” criteria:
First indexing/publishing any found freesite, i quickly got disgusted by child porn content, and added some filtering capabilities to AFKindex to completely ignore those freesites (no more crawling of those filtered keys).
Unfortunately, lots of adult freesites provides links to indices or other freesites that points to child porn content after 1 or two “hops” ; this is why you shouldn’t be able to find any porn here.53
In other words, to ban CEI, FreenetUser filtered out all pornography from AFKindex after finding that Freenet porn sites were strongly linked to CEI sites.
Enzo, a Freenet search engine operator, describes that index’s selection criteria:
My index allows you to browse Freenet without the need to worry about what links you are clicking on. I wouldn’t say that I censor content, as it’s still available on Freenet. It can still be reached from other index sites, which I do include in my index. I hide any freesite that contains child pornography, bestiality or hate speech.54
Nurmi took similar steps with Ahmia (the Tor and I2P search engine):
If there is any images/videos where is naked children we will filter the site out. According to the law of Finland I am not obligated to filter out anything. However, I don’t want to maintain public search for child porn.55
As should be clear, CEI sites are the single most filtered class of Dark Web sites. Through a range of practices, including using basic heuristics (i.e., pornography sites often link to child abuse image sites), soliciting reports from users, or building indexing algorithms that can distinguish between CEI and non-CEI sites, these search engine operators attempt to hide CEI sites from view, delegitimating them while legitimating the material that is returned by the engines.
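In the abstract, this curation amounts to a filter applied between crawling and indexing, as in the sketch below: a site is dropped if it appears on a manually maintained deny list (built, for instance, from user reports) or if its metadata matches operator-chosen heuristic terms. The lists, record format, and function names are placeholders rather than any engine’s actual rules.

```python
# Abstract sketch of pre-index curation: drop a site if it is on a manually
# maintained deny list (e.g., built from user reports) or if its metadata
# matches operator-chosen heuristic terms. All values here are placeholders.
BANNED_ADDRESSES: set[str] = set()   # populated from user reports / manual review
BANNED_TERMS: set[str] = set()       # operator-chosen heuristic keywords

def should_index(site: dict) -> bool:
    """Return False for sites the operator chooses to hide from results."""
    if site["url"] in BANNED_ADDRESSES:
        return False
    text = (site.get("title", "") + " " + site.get("excerpt", "")).lower()
    return not any(term in text for term in BANNED_TERMS)

def build_index(crawled_sites: list[dict]) -> list[dict]:
    # The crawler may still have "collected" these sites; curation happens
    # here, at the moment the public index is assembled.
    return [site for site in crawled_sites if should_index(site)]
```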
I2P developer zzz refers to search engines that hide CEI (and other taboo material) as “curated.”56 Curation can involve the care of an archive (in this case, an index of hidden sites), but it also refers to the selection of artifacts from an archive for presentation. This museum metaphor is apt, since many museums keep the majority of their artifacts in archives out of public view and exhibit only a small proportion of them. Even if a Dark Web search engine “collects” all Dark Web sites in its index, the search engine operator does not need to allow all of them to be accessible to the end user. In contrast, critics of such filtered or curated search refer to it as “censorship,” suggesting that search engine operators are only showing classes of Dark Web sites they approve of. Regardless of the language, such filtered search engines do in fact hide sites from view, even as they make other hidden sites more accessible, visible, and legible to Dark Web users.
Although most Dark Web search engines seek to hide classes of sites—especially CEI sites—they must provide at least some degree of access to the legit (i.e., authentic) Dark Web. Otherwise, their basic relationship to the end user would break down: if a search engine does not return results that map onto the end user’s perception of what the Dark Web contains, then that engine is not legit. Indeed, some of the search engine operators I’ve interviewed opted not to filter their search results. MoniTOR is one example:
My personal feeling is that MoniTOR needs to stay neutral. … MoniTOR does indeed index CP [child pornography] sites and communities. It will also provide search results to those who look for it. Do I like it? No. However, my place [is not] to judge what a person thinks or feels. Further to this, MoniTOR does not cache any of the content, just the URLs; headers and subject lines/meta tags. This keeps MoniTOR legal, as it does not host any of the material it spiders.57
Here, MoniTOR promises access to simplified (i.e., URL, headers, and meta tag) overviews of all the Tor sites the Yacy-based search will index, regardless of the operator’s judgment of the sites. Another search engine, I2P’s Direct search, likewise declined to filter. Note that both search engines are, as of this writing, offline.
Filtered or not, all the engines engage in simplification, or the reduction of complexity. Following the practices of mainstream search engines, the results from querying a Dark Web search engine tend to be composed of four technical elements:

- the title of the site;
- the site’s URL;
- a short excerpt or description drawn from the site’s content; and
- an indication of when the site was last indexed.
Although these elements provide a great deal of information about sites—title, content, and an indication of how “fresh” the content is—these are simplifications of the sites. They are not the sites themselves but rather metadata about the sites culled from the search engine’s index. With this simplification, the engine can present Dark Web sites to users in small batches (about ten at a time). Moreover, the placement of the sites on the results page is based on the search engine’s relevance algorithms. Finally, most—if not all—of the Dark Web search sites are in English; this of course “simplifies” things insofar as it presents web pages using non-English languages in an English metadata frame.
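A minimal sketch of this simplification might represent each result as a metadata record like the following, served in small batches in the order the engine’s relevance ranking dictates; the field and function names are illustrative assumptions, not any engine’s schema.

```python
# Sketch of the simplification a results page performs: the user sees this
# metadata record drawn from the index, not the site itself. Names are
# illustrative assumptions.
from dataclasses import dataclass
from datetime import date

@dataclass
class SearchResult:
    title: str          # title shown to the user as a clickable link
    url: str            # the onion/eepsite/freesite address
    excerpt: str        # short snippet culled from the indexed copy
    last_indexed: date  # signals how "fresh" the cached information is

def results_page(ranked: list[SearchResult], page: int = 1,
                 per_page: int = 10) -> list[SearchResult]:
    """Serve results in small batches, ordered by the engine's relevance ranking."""
    start = (page - 1) * per_page
    return ranked[start:start + per_page]
```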
John Law’s point about “ordering stories” applies to search engine simplification. Dark Web search engines tell ordering stories about Tor hidden services, I2P, or Freenet by privileging some sites over others in their indexes, blocking others entirely, and presenting a simplified interface to end users. The result is the introduction of the politics of search to the Dark Web: the engines promise access to the legit Dark Web, but this access is algorithmically curated. Rather than simply building a smooth and open channel, search engines introduce mediation that structures the connection between relevant actors, hiding some elements, simplifying others, and above all, laying claim to legitimacy as propriety.
Although Dark Web search engines attempt to hide and simplify other elements in the networks, as Michel Serres has convincingly argued, hidden or simplified elements will assert themselves, irrupting into view.58 These elements challenge the legitimacy of search engines.
For example, a key set of actors on the Tor network are cloners. Tor hidden services use URLs that are 16 alphanumeric characters, followed by the pseudo-top-level domain .onion, as in Ahmia’s URL on the Tor network: msydqstlz2kzerdg.onion/. Clearly, these URLs are not easily memorized by humans. Phishing sites that act as proxies, performing “man-in-the-middle” attacks, are a major problem on the Tor network. As of this writing, there are at least four clones of Ahmia:
Cloned Tor hidden services are a security risk: if a user visits a cloned site and enters a password or Bitcoin information, that information will be stolen by the cloner. Because of the non-human-readable onion URLs, accidentally going to a cloned site is very easy to do.
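A short sketch shows why inspecting the address itself cannot help the user: any sixteen-character string drawn from the base32 alphabet these addresses use, followed by .onion, is a syntactically plausible address, so a clone’s URL looks exactly as legitimate as the original’s. The second address below is invented for illustration, not a real clone.

```python
# Why clones are hard to spot by eye: any 16 characters from the base32
# alphabet (a-z, 2-7) followed by .onion form a plausible-looking address.
# The second address below is invented for illustration, not a real clone.
import re

ONION_PATTERN = re.compile(r"^[a-z2-7]{16}\.onion$")

def looks_like_onion(host: str) -> bool:
    """Check only the format of an address; this says nothing about authenticity."""
    return bool(ONION_PATTERN.match(host.lower()))

print(looks_like_onion("msydqstlz2kzerdg.onion"))  # True: Ahmia's address cited above
print(looks_like_onion("msydqstlz2kzerd7.onion"))  # True: a hypothetical look-alike
```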
Tor hidden service search engines struggle with this challenge, because these cloners undermine the ordering stories their search results tell. The end user can’t tell the legit site from the clone. As MoniTOR explained to me,
Some enterprising people/sites/engines have come up with the idea of putting or omitting certain information in site headers to indicate it is the legitimate page. The main problem is that anyone can edit their headers to match. So while this may work temporarily, it’s not an ideal solution.59
Another Tor-based search engine, not Evil, uses machine learning to label the real Tor hidden sites “official sites.” This is meant as a means for searchers to distinguish between authentic and cloned onion sites. Nonetheless, cloned sites continue to be a major problem on the Tor network.60
Finally, even with search engine operators hiding CEI sites from their indexes, search engine users report disturbing finds, as one frustrated I2P user lamented on the now-defunct I2P forum:
Then you browse search engine results, sieve through 80 pages flooded with a hundred variations of “11 yo *beep* pedo girls,” just mindlessly mirrored content from the clearnet, dead sites, improperly configured sites, all kinds of illegal *beep*, sites in Russian or Polish and other foreign languages … to find maybe one link that is just “facts about cats” or something. … In 20 minutes, you eyeballed 800 pages of mostly disgusting *beep*, i.e. sex with children, only to spent 2 minutes on a site about cat facts, that you didn’t even search for in the first place. … Again, you begin asking yourself, if there is anything relevant to be found in I2P at all. And if you shouldn’t just stop looking for it.
Thus, even if search engine operators attempt to hide classes of sites—especially CEI sites—those sites find their way into search engine indexes, as this I2P user describes. Although Dark Web search engines seek to hide and simplify the topologies of anonymity, would-be hidden elements emerge: cloners and CEI purveyors work to defeat algorithms and filters, exploiting the new channel that Dark Web search engines introduce.
Histories of the web often present search engines as key technologies of accessibility; engines such as Google have moved from being seen as bandwidth hogs that steal intellectual property to legitimated billion-dollar companies traded on stock markets.61 As René König and Miriam Rasch argue, mainstream search engines such as Google and Bing have become infrastructural: “Just as we expect water running from the tap, electricity coming from the plug, and roads to drive on, we take for granted that there are search engines to give us the information we need.”62 Google especially has become part of our “collective ‘techno-unconsciousness,’” invisibly structuring much of our daily lives, at least insofar as our lives are mediated by the Internet.63 Moreover, our hindsight gives us a safe vantage point to make the argument that Google is legitimate: it commands respect. For example, when someone with the title Google engineer makes a social proclamation, it is widely reported on.64 And Google also commands resources: billions of dollars and a global network of technologies from databases to operating systems to self-driving cars. Overall, Google’s propriety is not often questioned.65
What this chapter on Dark Web search reminds us, however, is that search engine legitimacy is never guaranteed. Any potential source of legitimacy should be exploited. In addition to mediating all the relationships described above, and in addition to hiding and simplifying, Dark Web search engines repeatedly lay claim to an inheritance: the legitimacy of previous search engines, especially Google.
This is immediately apparent when we consider claims to becoming the “Google of the Dark Web.” Chris McNaughton made this explicit for his search engine TorSearch.66 Nurmi also explicitly likened Ahmia to Google. Similarly, multiple search engines use visual and textual signals to stake their claim as inheritors of Google’s legitimacy. Freegle, a short-lived Freenet search, echoed Google in its name, as does I2P’s elgoog (“Google” spelled backward). The search engine not Evil playfully cites Google’s “Don’t be evil” motto and for several years used a primary color–based logo that echoed Google’s. I2P search engine Seeker and Tor engine the Beast also emulate Google’s stripped-down design (white background, textbox in the center of the screen). Tor’s Candle does the same, but with a black background. Grams also borrowed some of Google’s aesthetics, including the primary color logo on a white field, the “I’m feeling lucky” button, and the empire-building aspects of Google (including advertising networks and shopping services). Finally, Onion.link uses Google Custom Search to index Tor sites via the Tor2Web proxy.
Much as Google has done with multiple browsers and now the Android operating system, Dark Web search engines seek a prime position in relation to their respective networks: they seek to be on the networks’ home pages. Some have been successful in achieving this position. Enzo’s Search is included in Freenet’s default settings, signaling that the Freenet Project believes that Enzo’s Search is an appropriate window into the network. Others have not. I2P sought to do something similar; for a brief period, Eepsites.i2p and Epsilon were considered for the I2P Router Console page, the first page an I2P user sees.67 But as I2P developer zzz recalls,
We had a search box on I2P. We added it, and then immediately hid it, because we couldn’t find any search site existing now that could really hide all the worst of the worst. You know, you want to give people a good impression of I2P, and it is almost all clean and wonderful, helpful stuff, so we want to put that in front of people. And if we can’t find a search engine that can competently filter out the ugly stuff, we’re not going to enable that.68
In other words, the I2P developers were willing to include a default search engine if that search engine hid “the ugly stuff” (presumably CEI); if an engine is able to do so, it can enter into a legitimacy exchange with the I2P developers, gaining an “official” designation from the network builders and in turn providing users with a passage point into I2P. Selecting search engines as defaults provides the network builders with a tool that users often call for, and it consecrates those search engines as official.
The struggle to be the “Google of the Dark Web” is not settled, but it is telling that Dark Web search engine operators continue to make a claim to inheriting Google’s legitimacy. Success would bring respect and resources. It would be a means for new Dark Web users to enter the networks and find content. It would also mean that a legit Dark Web site builder found a way to provide such a portal into these networks.
And failure might have worse consequences than shutting down and giving up: it could lead to an external search behemoth coming to the Dark Web. As Virgil Griffith, developer of Onion.link, argues,
Respectfully, we lost [the war for Internet freedom]. However, a substantial fraction of the Tor community feels they can still win if they encircle the wagons tightly enough. And they see things like mainstream search engines as a finger by which mainstream attention and regulation will come to impact them more.69
In other words—setting aside the question of the war for Internet freedom—if a Dark Web user cannot build a “Google for the Dark Web,” the fear is that some external entity (Google? DARPA?) will do it instead, thus bringing corporate or state surveillance to these obscure corners of the Internet. To head off such an invasion, those who build Dark Web search engines are attempting to port mainstream search engine practices and software into the unique protocols of Tor, I2P, and Freenet. These practices include indexing, simplification, silencing, hierarchization, and gatekeeping. König and Rasch’s observation holds just as much for Dark Web search engines as it does for Google, Bing, and Baidu:
Search engines function as gatekeepers, channeling information by exclusion and inclusion as well as hierarchization. Their algorithms determine what part of the web we get to see and their omnipresence fundamentally shapes our thinking and access to the world. Whatever their bias may look like, it is obvious that man-made decisions are inscribed into the algorithms, leading unavoidably to favoring certain types of information while discriminating against others.70
Such techniques are necessary for building a legitimate, respectable, proper Dark Web search engine. No search engine can avoid creating hierarchies, gatekeeping, or shaping our interactions with a network; in fact, to be legitimate in the sense I am exploring here, it must do these things. Calls for a “Google of the Dark Web,” for a curated index of links and a web application with which to query it, are calls to make the Dark Web a bit more like the Clear Web, including replicating a de facto monopoly on a means for users to find content. By commanding respect and resources, such a search engine could even make the Dark Web itself legitimate.