Brandon Wiley, Freenet
In my travails as a Freenet developer, I often hear a vision of a file-sharing Utopia. They say, “Let’s combine all of the best features of Freenet, Gnutella, Free Haven, Mojo Nation, Publius, Jabber, Blocks, Jungle Monkey, IRC, FTP, HTTP, and POP3. We can use XML and create an ÜberNetwork which will do everything. Then we can IPO and rule the world.”
When I hear this vision, I shake my head sadly and walk slowly away. I have a different vision for solving the world’s file-sharing problems. I envision a heterogeneous mish-mash of existing peer-to-peer applications creating one network with a thousand faces—what you might call an OmniNetwork.
Every network has its flaws. As a Freenet developer, I never miss an opportunity to give Gnutella’s scalability and anonymity a good-natured ribbing. At the same time, Freenet is constantly criticized because (unlike with Gnutella) you have to donate your personal hard drive space to a bunch of strangers that may very well use it to host content that you disapprove of.
The obvious solution is to use only the network that best suits your needs. If you want anonymity, Freenet is a good choice. If you also want to be absolutely sure that you are not assisting the forces of evil (can you ever really be absolutely sure?) use Gnutella.
Ah, but what if you want Freenet’s “smart routing” and yet you also want Gnutella’s fast integration of new nodes into the network?
The answer is obvious: build an ÜberNetwork with anonymity and smart routing and fast node integration and a micropayment system and artist compensation and scalability to a large number of nodes and anti-spam safeguards and instant messaging capability, etc. It is ideas such as this that make me want to cast off the life of a peer-to-peer network developer in exchange for the gentle ways of a Shao-lin monk.
The problem with an ÜberNetwork is simple: it’s impossible. The differences in file-sharing networks are not merely which combinations of features are included in each particular one. While many file-sharing networks differ only in choice of features, there are also distinct and mutually exclusive categories of systems. Several optimization decisions are made during the design of a network that cause it to fall into one of these categories. You can’t optimize for everything simultaneously. An ÜberNetwork can’t exist because there are always trade-offs.
The idea of an ÜberClient is similar to that of an ÜberNetwork: To create a single application that does everything. An example of such an application in the client/server world is the ubiquitous web browser. These days, web browsers can be used for much more than just browsing the Web. They are integrated web, news, email, and FTP clients. The majority of your client/server needs can be serviced by a single application. Unlike the ÜberNetwork, the ÜberClient need not force everyone to convert to a new system. An ÜberClient would be compatible with all of the current systems, allowing you to pick which networks you wanted to retrieve information from.
The problem with the ÜberClient is that it is a client, and clients belong in the client/server world, not the world of peer-to-peer. Furthermore, the ÜberClient that already exists—the web browser—can serve as a kind of gateway to peer-to-peer applications. Many file-sharing networks either act as miniature web servers or are developing browser plugins. Someday you will probably be able to access all of the file-sharing networks from your web browser.
However, there is a catch: you will have to be running a node on each file-sharing network that you want to access. To do otherwise would not be peer-to-peer, but client/server. Also, the advantages of files crossing over between networks are lost. Files on Free Haven will still take a long time to load and unpopular files on Freenet will still disappear.
The next most popular solution after the creation of an ÜberClient is to link all of the existing networks together using an interoperable protocol, such as something based on XML, like XML-RPC or SOAP. The problem with this approach is that it doesn’t solve the right problem. The beauty of XML is that it’s a single syntax that can be used for many different purposes. It’s a step in the right direction for cross-platform, language-independent object serialization and a universal syntax for configuration files. However, the problem of interoperability between file-sharing networks is not the lack of a shared syntax. The syntax parsers for all existing file-sharing networks are minor components of the code base. A message parser for any existing system could be written in a weekend if a clear specification were available.
The problem of interoperability is one of semantics. The protocols are not interoperable because they carry different information. You wouldn’t expect Eliza to be a very good chess player or Deep Blue to be a good conversationalist even if they both used an XML-based protocol for communicating with you. Similarly, you should not expect a free system such as Gnutella to understand a micropayment transaction or an anonymous system such as Freenet to understand user trust ratings.
The solution to the problem, then, is not an ÜberNetwork, but the integration of all the different types of networks into a single, interoperable OmniNetwork. This has some advantages over the current state of many non-interoperable networks. Each person could use his or her network of choice and still get content from all of the other networks. This means that everyone gets to choose in what way they want to participate, but the data itself reflects the cumulative benefits of all systems.
To clarify this statement, I will describe how a gateway might work between Freenet and Free Haven. What are the advantages of a gateway? They pertain to the relative strengths and weaknesses of the systems. Data on Freenet can be retrieved quickly, whereas speed is recognized as a problem on Free Haven. However, data can disappear at unpredictable times on Freenet, whereas the person who publishes data on Free Haven specifies when it expires. Combine the two systems and you have readily available, potentially permanent data.
Suppose a user can insert information into either Free Haven or Freenet, depending on her preference. Then a second user can request the same information from either Free Haven or Freenet, depending on his preference. If the users are on the same network, the normal protocols are used for that network. What we’re interested in here are the two possibilities left: either the information is on Free Haven and the requester is on Freenet, or the information is on Freenet and the requester is on Free Haven. In either case, the information should still be retrievable:
Suppose first that the information is on Free Haven and the requester is on Freenet. Requesting data through Freenet guarantees the anonymity of the requester even if he distrusts the Free Haven node. Additionally, every request of the file through Freenet causes the information to migrate to Freenet, lending the caching ability of Freenet to future requests. While the first request has to go all the way to Free Haven to fetch the information, subsequent requests need only traverse Freenet and will therefore be faster. If the information expires from Freenet, a copy still exists in Free Haven.
Now suppose that the information is on Freenet and the requester is on Free Haven. In this case, the information is retrieved from Freenet and cached in Free Haven. Since the information was fetched from a Freenet node, the anonymity of the requester is guaranteed even if he mistrusts the Freenet node. Additionally, requesting the data from Free Haven will cause it to be cached in Free Haven, so a copy with a guaranteed lifetime will now exist. If it should expire from Freenet, a copy still exists in Free Haven.
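The migration described in these two cases can be sketched in a few lines. This is only a toy model under loud assumptions: the `Network` class and the `request_via_gateway` function are hypothetical names of mine, each network is reduced to an in-memory dictionary, and nothing here models anonymity, routing, or expiry.

```python
class Network:
    """Toy stand-in for a file-sharing network: a key -> data mapping."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def insert(self, key, data):
        self.store[key] = data

    def request(self, key):
        # Returns None when the key is not in this network.
        return self.store.get(key)


def request_via_gateway(key, home, foreign):
    """Ask the requester's own network first, then the foreign one.

    On a hit in the foreign network, a copy is inserted into the home
    network, so later requests no longer need to cross the gateway.
    Returns (data, name of the network that answered).
    """
    data = home.request(key)
    if data is not None:
        return data, home.name
    data = foreign.request(key)
    if data is not None:
        home.insert(key, data)  # the copy migrates to the home network
        return data, foreign.name
    return None, None


freenet = Network("Freenet")
free_haven = Network("Free Haven")
free_haven.insert("doc", b"published on Free Haven")

first = request_via_gateway("doc", freenet, free_haven)
second = request_via_gateway("doc", freenet, free_haven)
print(first[1], "then", second[1])
```

The first request is answered by Free Haven and the second by Freenet. Swapping the `home` and `foreign` arguments gives the opposite case, in which a Free Haven requester pulls a copy out of Freenet.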
This is just one example of the ways that the synergy of systems with opposing designs can create a richer whole. Each of the major types of file-sharing systems adds its own benefits to the network and has its own deficiencies that are compensated for. We’ll look at some details in the next section.
In this section I’ll list the characteristics that distinguish each of five popular networks—Freenet, Gnutella, Mojo Nation, Free Haven, and Publius—so we can evaluate the strengths each would offer to an all-encompassing OmniNetwork.
While the world of peer-to-peer is already large at quite a young age, I’ve chosen to focus here just on file storage and distribution systems. That’s because they already have related goals, so comparisons are easy. There are also several such systems that have matured far enough to be good subjects for examination.
Freenet adds several things to the OmniNetwork. Its niche is in the efficient and anonymous distribution of files. It is designed to find a file in the minimum number of node-to-node transactions. Additionally, it is designed to protect the privacy of the publisher of the information, the requester of the information, and all intervening nodes through which the information passes.
However, because of these design goals, Freenet is deficient in some other aspects. Since it is designed for file distribution and not fixed storage, it has no way to ensure the availability of a file. If the file is requested, it will stay in the network. If it is not requested, it will be eliminated to make room for other files. Freenet, then, is not an ideal place to store your important data for the rest of eternity.
Second, Freenet does not yet have a search system, because designing a search system which is sufficiently efficient and anonymous is very difficult. That particular part of the system just hasn’t been implemented yet.
A final problem with Freenet is that in order to assure that the node operators cannot be held accountable for what is passing through their nodes, the system makes it very difficult for a node operator to determine what is being stored on his hard drive. For some this is fine, but some people want to know exactly what is being stored on their computers at all times.
Gnutella offers an interesting counterpoint to Freenet. It is also designed for file distribution. However, each node holds only what the node operator desires it to hold. Everything being served by a Gnutella node was either put there by the node operator or else has been requested from the network by the node operator. The node operator has complete control over what she serves to the network.
Additionally, this provides for a form of permanent storage. The Gnutella request propagation model allows that if a single node wants to act as a permanent storage facility for some data, it need do nothing more than keep the files it is serving. Requests with a high enough time-to-live (TTL) will eventually search the entire network, finding the information that they are looking for. Also, Gnutella provides searching and updating of files.
However, the Gnutella design, too, has some deficiencies. For instance, it does not provide support for any sort of verification of information to avoid tampering, spamming, squatting, or general maliciousness from evil nodes and users. It also does not have optimized routing or caching to correct load imbalances. In short, it does not scale as well as Freenet. Nor does it provide much anonymity or deniability for publishers, requesters, or node operators. By linking Freenet and Gnutella, those who wish to remain anonymous and those who wish to retain control over their computers can share information.
What Mojo Nation adds to the peer-to-peer file-sharing world is a micropayment system, and a rich and complex one at that. A micropayment system adds the following advantages to the OmniNetwork: Reciprocity of contribution of resources, compensation for the producer of content, and monetary commerce.
Reciprocity of contribution simply means that somebody has to give something in order to get something. Both Freenet and Gnutella must deal with a lack of reciprocity, an instance of the archetypal problem called the tragedy of the commons. Actually, this has been a problem for file-sharing systems throughout the ages, including BBSs, anonymous FTP sites, and Hotline servers. Now, in the peer-to-peer age, there is no centralized administrator to kick out the leeches. In an anonymous system, it’s impossible even to tell who the leeches are (unless the providers of content want to voluntarily give up their anonymity, which they generally don’t).
Micropayments solve the reciprocity of contribution problem by enforcing a general karmic balance. You might not give me a file for every file you get from me (after all, I might not want your files), but all in all you will have to upload a byte to someone for every byte you download. Otherwise, you will run out of electronic currency. This is indeed a boon for those who fear the network will be overrun by leeches and collapse under its own weight.[110]
Solving reciprocity is particularly important for controlling spam and denial of service attacks. For every piece of junk someone asks you to serve to the network, you receive some currency. If you are flooded with requests from a single host, you receive currency for each request. The attacker may be able to monopolize all of your time, effectively rendering your node inoperable to the rest of the network, but he will have to pay a high price in order to do so. With micropayments, you are, in effect, being paid to be attacked. Also, the attackers must have some way of generating the currency for the attack, which limits the attackers to those with enough motivation and resources.
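To make the karmic balance concrete, here is a minimal sketch of a byte-for-byte ledger. It is not Mojo Nation's actual accounting: the `Ledger` class, the starting balance, and the one-credit-per-byte price are all assumptions chosen for illustration.

```python
class Ledger:
    """Hypothetical credit ledger: downloading costs, uploading earns."""

    def __init__(self, starting_balance):
        self.starting_balance = starting_balance
        self.balances = {}

    def balance(self, node):
        # Nodes that have never traded still hold the starting balance.
        return self.balances.get(node, self.starting_balance)

    def transfer(self, downloader, uploader, num_bytes, price_per_byte=1):
        """Move credits from downloader to uploader; refuse if too poor."""
        cost = num_bytes * price_per_byte
        if self.balance(downloader) < cost:
            return False  # out of currency: no more downloads
        self.balances[downloader] = self.balance(downloader) - cost
        self.balances[uploader] = self.balance(uploader) + cost
        return True


ledger = Ledger(starting_balance=100)
print(ledger.transfer("leech", "server", 60))  # affordable
print(ledger.transfer("leech", "server", 60))  # refused, only 40 credits left
print(ledger.balance("leech"), ledger.balance("server"))
```

In this toy run the leech is cut off after its first download, while the server ends up richer for having been asked. The same arithmetic is what makes a flood of requests expensive for an attacker.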
There are other uses for micropayments besides reciprocity, particularly the ability to engage in actual commerce through a file-sharing network. If you trade other people’s content in order to gain some currency, the handy tip button, a feature of the Mojo Nation interface, allows people to send some currency to the producers of the content as well as to the servers.
Also, the system could someday perhaps be used to exchange electronic currency for not just information, but things like food and rent. I can already see the kids dreaming of supporting themselves through savvy day trading of the latest underground indie tunes (the artists being supported by the tip button). Hipness can metamorphose from something that gets you ops on an IRC channel to a way to make mad cash.
However, not everyone wants to exchange currency for information. Even exchanges are certainly one mode of interaction, but it is very different from the Freenet/Gnutella philosophy of sharing information with everyone. Freenet and Gnutella serve a useful role in the Mojo Nation framework when a single node does not have the resources to make a transaction. If you don’t have any resources (you have a slow machine, slow connection, and small hard drive) it is hard to get currency, since you get currency by contributing resources. Without currency you can’t request anything. However, if a lot of low-resource nodes decided to get together and act as a single, pooled node, they would have significant resources. This is exactly how Freenet and Gnutella work. One node is the same as a whole network as far as your node is concerned. Thus, low-resource Mojo Nation nodes can form “syndicates” so that they will not be excluded from having a presence on the Mojo Nation network.
By combining the two types of networks, the free and communal networks of Freenet and Gnutella with the commercial network of Mojo Nation, people can choose whether to share freely or charge for resources as they see fit. Different people will choose differently on the matter, but they can still share content with each other.
Free Haven and Publius are in an entirely different category from other file-sharing networks. While the other networks concentrate on the distribution of content people want (reader-centric systems), these systems concentrate on anonymously preserving information (publisher-centric systems). The flaw people point out most often in Freenet is that data disappears if no one requests it for a long enough time. Luckily, Free Haven and Publius are optimized to provide for just that eventuality. They are conceptually derived from the mythical “Eternity Service” in which once you add a file it will be there forever. While it may be possible to delete a file from a Free Haven or Publius node, these networks are specifically designed to be resistant to the removal of content.
File storage networks have problems when viewed as file distribution networks. They are generally much slower to retrieve content from (because they are optimized for storage and not distribution). Additionally, they do not deal well with nodes fluttering on and off the network rapidly. To lose a node in a file storage network is problematic. To lose a node in a good file distribution network is unnoticeable.
There are great possibilities with the combination of reader-centric distribution networks with publisher-centric storage networks. It would be ideal to know that your information will always be available to everyone using a file-sharing network anywhere. People can choose to share, trade, buy, and sell your information, anonymously or non-anonymously, with all the benefits of distributed caching and a locationless namespace, and with no maintenance or popularity required to survive. Once the information is inserted in the network, it will live on without the publisher needing to provide a server to store it. Unfortunately, making gateways between networks actually work is somewhat problematic.
The problem with creating gateways is finding a path. Each piece of information is inserted into a single network. From there it must either find its way into every connected network, or else a request originating in another network must find its way to the information. Both of these are very difficult. In short, the problem is that an insert or request must find its way to a node that serves as a gateway to the separate network where the information is stored.
The problem with finding a path to another network during an insert is that the paths of inserts are generally very short and directed. Each network routes its inserts using a different method:
Freenet takes the “best” path to the “epicenter” for a given key. The length of the path is specified by the user. A longer path means that there is a greater chance for an insert to happen upon a gateway. However, longer paths also mean that you have to wait longer for the insert to complete.
Gnutella doesn’t have inserts.
Mojo Nation splits a file into multiple parts and inserts each part into a node. The nodes are chosen by comparing the file part’s hash to the range of hash values that a node advertises as serving.
Free Haven splits up the file using k-of-n file splitting and inserts each part to a node. The nodes are chosen by asking trusted nodes if they want to trade their own data for that particular file part.
Publius sends a file to a static list of nodes and gives each node part of the key.
Some of these techniques could be extended to put material on a gateway node (for instance, Free Haven and Publius choose which nodes to use), but techniques that depend on randomizing the use of nodes are inimical to using gateways.
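As an illustration of hash-range placement of the kind described for Mojo Nation, consider the following sketch. The details are assumptions, not Mojo Nation's real scheme: I use the first byte of a SHA-1 digest as a toy hash space of 0 to 255, and the names `part_hash`, `nodes_for_part`, and the node labels are hypothetical. The gateway that advertises the entire range is likewise just one conceivable way to steer inserts toward a gateway.

```python
import hashlib


def part_hash(part):
    """Map a file part to a value in 0..255 (a toy hash space)."""
    return hashlib.sha1(part).digest()[0]


def nodes_for_part(part, advertised):
    """Return the nodes whose advertised hash range covers this part."""
    value = part_hash(part)
    return [name for name, (low, high) in advertised.items()
            if low <= value <= high]


# Each node advertises an inclusive range of hash values it will serve.
advertised = {
    "node-a": (0, 127),
    "node-b": (128, 255),
    "gateway": (0, 255),  # assumption: a gateway advertising everything
}

for part in (b"part one", b"part two", b"part three"):
    print(part, nodes_for_part(part, advertised))
```

Whatever the hash of a part turns out to be, the full-range node is always among the candidates, which is the property a gateway would need.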
The problem with finding a path on a request is that the networks do not take into account the presence of gateways when routing a request message. Therefore, it is unlikely that a request message will randomly happen upon a gateway. The easy solution, of course, is to have everyone running a node on any network also run a gateway to all of the other networks. It’s an ideal solution, but probably infeasible.
The following sections describe the routing techniques used by each of the systems we’re looking at.
Freenet requests are routed just like inserts, using the “best” path to the “epicenter.” The length of the path is set by the user. A short path is enough if the information is in Freenet (and if it is, we don’t need a gateway), but longer paths take longer to fail if the key is not in the network. You could, of course, find a gateway by searching all of Freenet, assuming that the number of nodes in the network is less than or equal to the maximum path length. That would almost certainly take longer than you would care to wait. Freenet is designed so that if the file is in the network, the path to the file is usually short. Consequently, Freenet is not optimized for long paths. Long paths are therefore very slow.
Gnutella messages are broadcast to all nodes within a certain time-to-live, so choosing a path is not an issue. You can’t choose a path even if you want to. The issue with Gnutella is that a gateway has to be within the maximum path radius, which is usually seven hops away. Fortunately, Gnutella is generally a very shallow network in which your node knows of a whole lot of other nodes. Generally, a gateway out of Gnutella to another system would have a high probability of being reached, since every request will potentially search a large percentage of the network. If there is a gateway node anywhere in the reachable network, it will be found. This is good if you want to access the whole world through Gnutella. Of course, it doesn’t help at all if you want to gateway into Gnutella from another system.
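The reachability argument can be sketched as a breadth-first flood with a hop limit. This is a simplification that assumes each node forwards a given message only once. The `flood` function and the tiny example graph are hypothetical, and a real network would of course be far larger and shallower.

```python
from collections import deque


def flood(graph, start, ttl):
    """Return the set of nodes a broadcast from `start` reaches in `ttl` hops."""
    reached = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == ttl:
            continue  # time-to-live exhausted, do not forward further
        for neighbor in graph.get(node, ()):
            if neighbor not in reached:
                reached.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return reached


# A deliberately deep chain: me - b - c - gateway
graph = {
    "me": ["b"],
    "b": ["me", "c"],
    "c": ["b", "gateway"],
    "gateway": ["c"],
}

print("gateway" in flood(graph, "me", ttl=3))
print("gateway" in flood(graph, "me", ttl=2))
```

With a TTL of 3 the gateway is found, and with a TTL of 2 it is not. The gateway only helps those nodes that have it inside their horizon.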
Mojo Nation requests are somewhat complicated. First, you must find the content you want on a content tracker that keeps a list of content and who has a copy of it. From the content tracker, you retrieve the address of a node that has a copy of the file part that you want. Then, you request the file part from the node. You do this until you have all of the parts needed to reconstruct the file.
This process actually lends itself quite well to gatewaying. As long as the gateways know what files are in Freenet, they can advertise for those keys. Unfortunately, gateways can’t know what files are in Freenet. A gateway can only know what files have passed through it, which is only a fraction of the total content of the network.
However, if gateways also act as content trackers, they can translate requests for unknown keys into Freenet requests and place any keys found into the Mojo Nation content tracker index. In this way, you can access content from Freenet as long as you are willing to use a content tracker that is also a Freenet gateway. While it would be nice just to ask the network in general for a key and have it be found in Freenet (if appropriate), that is not how Mojo Nation works. In Mojo Nation, you ask a particular content tracker for content.
One way to integrate gatewayed and non-gatewayed content trackers in Mojo Nation would be to have a proxy node that acts as a Freenet gateway. Using that, any content tracker that functions as a gateway and a proxy could be used. The content tracker would be searched first, and if it failed, the gateway could be searched.
Gatewaying Publius is an interesting problem. Each file is split into a number of parts, each of which is sent to a different server. In order to reconstruct the file, you need a certain number of parts. It is therefore necessary for at least that number of parts to make it into gateways.
The length of the path for each part of the file is only 1 because the file goes directly to a single node and then stops. That means that if you need k parts of the file, k of the nodes contacted must be gateways in order for the file to be able to be reconstructed in the other network. The only solution, therefore, is to make most Publius nodes gateways.
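A rough calculation shows why nothing short of "most nodes" will do. The sketch below assumes, purely for illustration, that each of the n nodes on the static list is independently a gateway with probability p. Neither that independence nor the example numbers come from Publius itself, and `p_enough_gateways` is a hypothetical helper.

```python
from math import comb


def p_enough_gateways(n, k, p):
    """Probability that at least k of n contacted nodes are gateways,
    if each node is independently a gateway with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))


# Hypothetical numbers: 10 nodes hold parts, any 3 parts rebuild the file.
for p in (0.1, 0.5, 0.9):
    print(p, round(p_enough_gateways(10, 3, p), 4))
```

Under these assumptions, if one node in ten is a gateway, the file crosses over only about 7 percent of the time, and the odds become comfortable only once gateways are the norm rather than the exception.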
Making a gateway out of Free Haven is not quite as difficult as making one out of Publius, because parts of files get routinely traded between nodes. Every time a trade is made, the file part could potentially find a gateway, thus reaching the other network. However, when and how often files are traded is unknown and unpredictable. Thus, file trading cannot be counted on to propagate files, although it certainly will increase the probability of propagation by a nontrivial amount.
There is much theoretical work to be done in the area of making actual, working gateways between the different networks. However, even once that has been worked out, there is still the issue of implementation specifics.
There are a couple of ways that this could be approached. The first is to make each gateway between networks X and Y a hybrid X-Y node, speaking the protocols of both networks. This is undesirable because it leads to a combinatorial explosion of customized nodes, each of which has to be updated if the protocol changes for one of the networks.
A preferable solution would be to define a simple and universal interface that one node can use to query another for a file. Then, a gateway would consist merely of a cluster of nodes running on different networks and speaking different protocols, but talking to each other via a common interface mechanism. Using a common interface mechanism, gateway nodes would not even have to know what foreign networks they were talking to.
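The difference between the two approaches is easy to put in numbers. The following is plain counting rather than a description of any existing gateway software, and the function names are mine: hybrid nodes need one custom implementation per pair of networks, whereas the common interface needs one adapter per network.

```python
from math import comb


def hybrid_gateway_types(num_networks):
    """One custom X-Y node is needed for every pair of networks."""
    return comb(num_networks, 2)


def common_interface_adapters(num_networks):
    """One adapter per network, each speaking the shared interface."""
    return num_networks


for n in (5, 10, 20):
    print(n, hybrid_gateway_types(n), common_interface_adapters(n))
```

For the five networks discussed here, that is ten kinds of hybrid node against five adapters, and the gap widens quickly as networks are added.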
There are different possible interface mechanisms: CORBA,[111] RMI,[112] XML-RPC,[113] SOAP,[114] etc. The mechanism that I would recommend is HTTP. It is a standard protocol for requesting a file from a particular location (in this case a particular node, which represents a particular network). Also, some file-sharing networks already have support for HTTP. Freenet and Gnutella support interfacing through HTTP, for instance.
Modification of the code base for each system to make a normal node into a gateway would be minor. The node need merely keep a list of gateways and, upon the failure of the network to find a requested file, query the gateways. If a file is found on a gateway, it is transferred by HTTP to the local node and then treated exactly as if it were found in the node’s local data storage.
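A minimal sketch of that modification might look like the following. Everything specific in it is an assumption: the `gateway_url/key` URL layout, the function names, and the idea of passing the network search in as a callable are mine, not part of any of these systems. The HTTP call uses Python's standard `urllib`, and the demonstration substitutes a stub so that it runs without a network.

```python
from urllib.parse import quote
from urllib.request import urlopen


def http_fetch(gateway_url, key):
    """Fetch `key` from a gateway over HTTP; None if it cannot be had.

    Assumes (hypothetically) that a gateway serves files at <url>/<key>.
    """
    try:
        with urlopen(f"{gateway_url}/{quote(key, safe='')}", timeout=10) as response:
            return response.read()
    except OSError:  # covers URLError and HTTPError (e.g. 404)
        return None


def request_file(key, local_store, search_network, gateways, fetch=http_fetch):
    """Try local storage, then the node's own network, then each gateway."""
    data = local_store.get(key)
    if data is None:
        data = search_network(key)
    if data is None:
        for gateway_url in gateways:
            data = fetch(gateway_url, key)
            if data is not None:
                # Treat the file as if it had been found locally.
                local_store[key] = data
                break
    return data


# Demonstration with a stubbed fetch and made-up gateway addresses.
def stub_fetch(gateway_url, key):
    return b"found it" if gateway_url == "http://gw2.example" else None


store = {}
result = request_file(
    "some-key",
    store,
    search_network=lambda key: None,  # our own network comes up empty
    gateways=["http://gw1.example", "http://gw2.example"],
    fetch=stub_fetch,
)
print(result, "some-key" in store)
```

Note that the node never learns which foreign network stood behind the gateway that answered. All it sees is a URL that either produced the file or did not.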
Despite the desire for interoperability among networks, little has been done to facilitate this. Network designers are largely consumed by the difficult implementation details of their individual networks.
The only gatewaying project currently underway, to my knowledge, is the World Free Web (WFW) project, which aims to combine Freenet and the World Wide Web. While the Web may not at first seem like a file-sharing network as much as a publication medium, now that we find web sites offering remote hosting of vacation photographs and business documents, the two uses are merging into one.
Freenet and the Web complement each other nicely. Freenet is adaptive, temporary, and locationless, whereas the Web is static, semipermanent, and location-based. The point of the WFW project is to ease the load of popular content on web servers by integrating Freenet into web browsers. A WFW-enabled web browser will first check Freenet for the requested file. If the browser can’t find the file, it will fetch the file from the Web and insert it into Freenet. The net effect is that popular web sites will load faster and the web servers will not crash under the load. This project, like many open source projects existing today, really needs only developers. The concepts are sound and merely call for experts on the various browsers to integrate them.
The peer-to-peer file-sharing developer community is not large. While the peer-to-peer world is expected to explode in popularity, those who have code in the here and now are few. There has been much discussion of interoperability among the various projects, so it may well happen. The technical challenges of routing requests to gateways are difficult ones, but certainly no more difficult than the challenges involved in anonymity, scalability, performance, and security that network designers have already had to face.
I’d like to thank Christine and Steve for contributing greatly to my second draft, and Isolde for making me write when I’d rather go to the park.
[110] See the Jargon File “Imminent Death of the Net Predicted!”, http://www.tuxedo.org/~esr/jargon/jargon.html.