Chapter 17. Reputation

Richard Lethin, Reputation Technologies, Inc.

Reputation is the memory and summary of behavior from past transactions. In real life, we use it to help us set our expectations when we consider future transactions. A buyer depends on the reputation of a seller when he considers buying. A student considers the reputation of a university when she considers applying for admission, and the university considers the student’s reputation when it decides whether to admit her. In selecting a candidate, a voter considers the reputation of a politician for keeping his word.

The possible effect on one’s reputation also influences how one behaves: an individual might behave properly or fairly to ensure that her reputation is preserved or enhanced. In situations without reputation, where there is no prospect of memory after the transaction, behavior in the negotiation of the transaction can be zero-sum. This is the classic used car salesman situation in which the customer is sold a lemon at an unreasonable price, because once the customer drives off the lot, the salesman is never going to see her again.

A trade with a prospective new partner is risky if we don’t know how he behaved in the past. If we know something about how he’s behaved in the past, and if our prospect puts his reputation on the line, we will be more willing to trade. So reputation makes exchange freer, smoother, and more liquid, removing barriers of risk aversion that interfere with trade’s free flow.

Reputation does all this without a central authority. Naturally, therefore, reputation turns up frequently in any discussion about distributed entities interacting peer-to-peer—a situation that occurs at many levels over the Internet. Some of these levels are close to real life, such as trade in the emerging e-marketplaces and private exchanges. Others are more esoteric, such as the interaction of anonymous storage servers in the Free Haven system described in Chapter 12. Chapter 16 includes a discussion of the value of reputation.

The use of reputation as a distributed means of control over fairness is a topic of much interest in the research literature. Economists and game theorists have analyzed the way reputation motivates fair play in repeated games, as opposed to a single interaction, where selfish behavior is often the most rational choice. Researchers in distributed artificial intelligence look to reputation as a means of controlling the behavior of distributed agents that are supposed to contribute to a collective intelligence. Researchers in computer security look at deeper meanings of trust, one of which is reputation.

In this chapter, I will present a commercial system called the Reputation Server™[87] that tries to bring everyday aspects of reputation and trust into online transactions. While not currently organized in a peer-to-peer fashion itself, the service has the potential to become more distributed and prove useful to peer-to-peer systems as well as traditional online businesses.

The Reputation Server is a computer system available to entities engaging in a prospective transaction—a third party to a trade that can be used by any two parties who want reputation to serve as motivation for fair dealing.

The server accepts feedback on the performance of the entities after each transaction is finished and stores the information for use by future entities. It also provides scores summarizing the history of transactions that an entity has engaged in. The Reputation Server, by holding onto the histories of transactions, acts as the memory that helps entities build reputations.

A North American buyer of textiles might be considering purchasing from a new supplier in China. The buyer can check the Reputation Server for scores based on feedback from other buyers who have used that supplier. If the scores are good enough to go forward, the buyer will probably still insist that the trade be recorded in a transaction context on the Reputation Server—that the seller be willing to let others see feedback about its performance—in order to make it costly for the seller to perform poorly in the transaction. Without the Reputation Server, the buyer has to rely solely on other means of reducing risk, such as costly product inspections or insurance.[88]

But the motivation to use the Reputation Server is not exclusively on the buyer’s side: A reliable seller may insist on using the Reputation Server so that the trade can reinforce his reputation.

In some cases, the Reputation Server may be the only way to reduce risk. For example, two entities might want to trade in a securely pseudonymous manner, with payment by a nonrepudiable anonymous digital cash protocol. Product inspection might be unwanted because it reveals the entity behind the pseudonym. Once the digital cash is spent, there’s no chance of getting a refund. Reputation helps ease some of the buyer’s concern about the risk of this transaction: she can check the reputation of the pseudonym, and she has the recourse of lowering that reputation should the transaction go bad. Thus, the inventors of anonymous digital cash have long recognized the interdependence of pseudonymous commerce systems and reputations. Also, the topic gets attention in the Cypherpunks Cyphernomicon as an enabling factor in the adoption of anonymous payment technologies.[89]

But more mundane risks can also make using the Reputation Server worthwhile. The example I started with in this section, of a buyer in North America purchasing textiles from China, has some aspects of functional anonymity: even though the buyer and seller aren’t actively hiding from each other, they don’t know each other because of the geographic, political, cultural, class, and language barriers that separate them. Reputation Servers can be the social network that is otherwise lacking and that enforces good behavior or allows the system to correct itself. As the Internet bridges the traditional barriers to create new relationships, the need for Reputation Servers grows.

At first, the implementation of this system seems trivial: just a database, some messaging, and some statistics. However, the following architecture discussion will reveal that the issues are quite complex. With keen competition and high-value transactions, the stakes are high. This makes it important to consider the design carefully and take a principled approach.

Reputation domains, entities, and multidimensional reputations

To understand how the Reputation Server accomplishes its task, you have to start with the abstraction of a reputation domain, which is a context in which a sequence of trades will take place and in which reputations are formed and used. A domain is created, administered, and owned by one entity. For example, a consultant integrating the software components for a business-to-business, online e-marketplace might create a reputation domain for that e-marketplace on the Reputation Server. Thousands of businesses that will trade in the e-marketplace can use the same domain. Or someone might create a smaller domain consisting of auto mechanics in Cambridge and the car owners that purchase repairs. Or someone might create a domain for the anonymous servers forming Free Haven.

The domain owner can specify the domain’s rules about which entities can join, the definition of reputation within that domain, which information is going to be collected, who can access the data, and what they can access. Reputations form within the domain according to the specified configuration. For the moment, we assume that there is no information transfer among domains: A reputation within one domain is meaningless in another domain.[90]

Entities in a Reputation Server correspond to the parties for whom reputations will be forming and the parties who will be providing feedback. Entities might correspond to people, companies, software agents, or Pretty Good Privacy (PGP) public keys. They exist outside the domains, so it is possible for an entity to be a participant in multiple domains.

The domain has a great degree of latitude in how it defines reputation. This definition might be a simple scalar quantity representing an overall reputation, or a multidimensional quantity representing different aspects of an entity’s performance in transactions. For example, one of the dimensions of a seller’s reputation might be a metric measuring the quality of goods a seller ships; another might be the ability to ship on time. The scoring algorithms do not depend on what the individual dimensions “mean”; the dimensions are measures within a range, and the domain configuration simply names them and hooks them up to sources and readers.
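To make this concrete, the following Python sketch shows one way a domain owner might declare a multidimensional reputation. The Dimension and ReputationDomain structures, their fields, and the dimension names are hypothetical illustrations, not the Reputation Server's actual configuration format.

```python
from dataclasses import dataclass, field

@dataclass
class Dimension:
    """One named axis of reputation, measured within a fixed range."""
    name: str
    low: float = 0.0
    high: float = 1.0

@dataclass
class ReputationDomain:
    """Hypothetical domain configuration.

    The scoring algorithms never interpret what a dimension "means";
    the domain simply names the dimensions and hooks them up to the
    feedback sources and the readers allowed to see scores.
    """
    owner: str
    dimensions: list
    sources: set = field(default_factory=set)   # who may submit feedback
    readers: set = field(default_factory=set)   # who may read scores

# A domain for a textile e-marketplace with two reputation dimensions.
textiles = ReputationDomain(
    owner="emarketplace-integrator",
    dimensions=[Dimension("quality_of_goods"), Dimension("on_time_shipment")],
)
```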

The notion of a domain is powerful, even for definitions that might be considered too small to be meaningful. For example, a domain with only one buyer seems solipsistic (self-absorbed) but can in fact be quite useful to an entity for privately monitoring its suppliers. The domain can provide a common area for the storage and processing of quality, docking, and exception information that might otherwise be used by only one small part of the buyer’s organization or simply lost outright.

Reputation information about a supplier might be kept internal to the buyer if the buyer thinks this is of strategic importance (that is, if knowing which supplier is good or bad in particular areas conveys a competitive advantage to the buyer). On the other hand, if the buyer is willing to share the reputation information he has taken the trouble to accumulate, that information could help a seller attract other buyers. For example, ACME computer company might allow its ratings of suppliers to be shared externally to help its suppliers win other buyers; this benefits ACME by allowing its suppliers to amortize fixed costs, and ACME might even be able to negotiate preferred terms from the suppliers in return.

Before gaining a reputation, an entity needs to have an identity that is made known to the Reputation Server. The domain defines how identities are determined.

Techniques for assuring an entity’s identity are discussed in other areas of this book, notably Chapter 15 and Chapter 18. An entity’s identity, for instance, might be a certified public key or a simple username validated with password login on the Reputation Server.

Some properties of identities can influence the scoring system. One of the most critical questions is whether an entity can participate under multiple identities. Multiple participation might be difficult to prevent, because entities might be trivially able to adopt a new identity in a marketplace. In this situation, with weak identities, we have to be careful how we distinguish a bad reputation from a new reputation. This is because we may create a moral hazard: the gain from cheating may exceed the loss to reputation if the identity can be trivially discarded and a new identity trivially constructed. Weak identities also have implications for credibility, because it becomes hard to distinguish true feedback from feedback provided by the entity itself.

While it is possible to run a reputation domain for weak identities, it is easier to do so for strong identities. Reputation domains with weak identities require the system to obtain and process more data, while strong identities allow the system to “bootstrap” online reputations with some grounding in the real world.

We use the term marketplace loosely: generally it corresponds to an online e-marketplace, but a marketplace might also correspond to the distributed block trading that is taking place in Free Haven or the private purchasing activity of the single buyer who has set up a private reputation domain. While some marketplaces, such as eBay, include an embedded reputation system, our Reputation Server exists outside the marketplace so that it can serve many marketplaces of different types.

The separation of the Reputation Server from the marketplace creates relatively simple technical issues as well as more complex business issues. We discuss some of the business issues later in Section 17.10. The main technical issue is that the marketplace and the Reputation Server need to communicate. This is easy to solve: The Internet supports many protocols for passing messages, such as email, HTTP, and MQ. The XML language is excellent for exchanging content-rich messages.

One of the simple messages that the marketplace can send to the Reputation Server indicates the completion of a transaction. This message identifies the buyer and seller entities and gives a description of the type of transaction and the monetary value of the transaction. The description is important: A reputation for selling textiles might not reflect on the ability to sell industrial solvents.
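For illustration, here is a minimal Python sketch of such a transaction-completion notice, built with the standard library's XML module. The element names and fields are invented for this example; the chapter does not specify the Reputation Server's actual message schema.

```python
import xml.etree.ElementTree as ET

def transaction_completed(buyer_id, seller_id, description, value_usd):
    """Build a hypothetical transaction-completion message as XML."""
    msg = ET.Element("transactionCompleted")
    ET.SubElement(msg, "buyer").text = buyer_id
    ET.SubElement(msg, "seller").text = seller_id
    # The description matters: a reputation for selling textiles
    # might not reflect on the ability to sell industrial solvents.
    ET.SubElement(msg, "description").text = description
    ET.SubElement(msg, "value", currency="USD").text = str(value_usd)
    return ET.tostring(msg, encoding="unicode")

print(transaction_completed("buyer-042", "seller-317", "textiles", 12500))
```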

The transaction completion message permits the Reputation Server to accept feedback on the performance of entities in the transaction. For some domains, it also triggers the Reputation Server to send out a request for feedback on the transaction. In the most rudimentary case, the request for feedback and the results could be carried in electronic mail messages. Since a human being has to answer the email request for feedback, some messages may be discarded and only some transactions will get feedback. For this reason, it is obviously preferable to automate the collection, so some businesses may connect the traders’ Enterprise Resource Planning (ERP) systems to the Reputation Server. For automated peer-to-peer protocols like Free Haven, an automated exchange of feedback will be easier to generate.

The marketplace and the Reputation Server will also exchange other, more complex messages. For example, the marketplace might send a message indicating the start of a potential transaction. Some transactions take a long time from start to finish, perhaps several weeks. Providing the Reputation Server with an early indication of the prospective transaction allows the Reputation Server to provide supplementary services, such as messages indicating changes in reputation of a prospective supplier before the transaction is consummated.

One of the most interesting aspects of the Reputation Server is the scoring system, the manner in which it computes reputations from all of the feedback that it has gathered.

Why bother computing reputations at all? If, as asserted in the first sentence of this chapter, “Reputation is the memory and summary of behavior from past transactions,” why not simply make the reputation be the complete summary of all feedback received, verbatim? Some online auctions do in fact implement this, so that a trader can view the entire chain of feedback for a prospective partner. This is okay when the trader has the facility to process the history as part of the decision about whether to trade.

But more often, there is good reason for the Reputation Server to add value by processing the chain into a simple reputation score for the trader. First, the feedback chain may be sensitive information, because it includes a description of previous pricing and the goods traded. Scoring algorithms can mask details and protect the privacy of previous raters. This trade-off between hiding and revealing data is more subtle than encryption. Encryption seeks to transform data so that, to the unauthorized reader, it looks as much like noise as possible. With reputation, there is a need to simultaneously mask private aspects of the transaction history—even to the authorized reader—while allowing some portion of the history through so it can influence the reputation. Some of this is accomplished simply by compressing the many dimensions of the history into a single point, perhaps discretizing it or adding a noise source to further limit the information it reveals.
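A minimal sketch of that compression step, under assumed data structures: the full history (which would include prices and goods) stays private, and only a coarse, discretized, lightly jittered aggregate leaves the server.

```python
import random

def coarse_score(history, buckets=5, jitter=0.05):
    """Collapse a sensitive multidimensional history into one coarse score.

    history is a list of feedback records, each mapping dimension names
    to ratings in [0, 1].  Prices, goods, and rater identities never
    appear in the output: ratings are averaged across records and
    dimensions, a little noise is added, and the result is snapped to a
    small number of discrete levels.
    """
    ratings = [r for record in history for r in record.values()]
    if not ratings:
        return None                          # no feedback yet: no score
    mean = sum(ratings) / len(ratings)
    noisy = mean + random.uniform(-jitter, jitter)
    noisy = min(max(noisy, 0.0), 1.0)        # stay within the declared range
    return round(noisy * (buckets - 1)) / (buckets - 1)

history = [
    {"quality_of_goods": 0.9, "on_time_shipment": 0.8},
    {"quality_of_goods": 0.7, "on_time_shipment": 1.0},
]
print(coarse_score(history))                 # e.g. 0.75, one of 5 coarse levels
```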

Furthermore, the Reputation Server has a more global view of the feedback data set than one can learn from viewing a simple history listing, and it can include other sources of information to give a better answer about reputation. Stated bluntly, the Reputation Server can process a whole bunch of data, including data outside the history. For example, the Reputation Server may have information about the credibility of feedback sources derived from the performance of those sources in other contexts.

The Reputation Server is a platform for multiple scoring functions, and each domain can choose the kinds of scoring used and the functions that compute the scores.

A number of reputation metrics have been proposed in the literature. Some simply provide ad hoc scales, dividing reputations into discrete steps or assigning boundaries and steps arbitrarily. While ad hoc definitions of reputation can seem reasonable at first, they can have undesirable properties.[91] For example, simply incrementing reputation by one for each good transaction and decrementing it by one for each bad transaction allows a reputation to keep growing indefinitely even if a seller cheats one buyer out of every four. If the seller does a lot of volume, she could have a higher reputation in this system than someone who trades perfectly but does less than half the volume. Other reputation metrics can have high sensitivity to lies or losses of information.
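The arithmetic behind that pathology is easy to check; the numbers below are chosen purely for illustration.

```python
def plus_minus_score(good, bad):
    """The ad hoc metric: +1 per good transaction, -1 per bad one."""
    return good - bad

# A high-volume seller who cheats one buyer out of every four...
cheater = plus_minus_score(good=750, bad=250)   # 1,000 trades -> score 500

# ...still outscores a flawless seller with just under half the volume.
honest = plus_minus_score(good=499, bad=0)      # 499 trades   -> score 499

assert cheater > honest
```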

Other approaches to reputation are principled.[92] One approach I like works from statistical models of behavior, in which reputation is an unknown model parameter to be determined from the feedback data using Maximum Likelihood Estimation (MLE). MLE is a standard statistical technique: it chooses the model parameters that maximize the likelihood of the observed sample data.

The reputation calculation can also be performed with a Bayesian approach. In this approach, the Reputation Server makes explicit prior assumptions about a probability distribution for the reputation of entities, either the initial distribution that is assumed for every new entity or the distribution that has previously been calculated for entities. When new scores come in, this data is combined with the previous distribution to form a new posterior distribution that combines the new observations with the prior assumptions.
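As a sketch of that update, assume (purely for illustration) that a single reputation dimension is modeled as a normal quantity with known feedback noise. The conjugate normal-normal update below folds new ratings into the prior to give a posterior mean and variance.

```python
def normal_update(prior_mean, prior_var, ratings, noise_var):
    """Bayesian update for a normal prior with known observation noise.

    Returns the posterior mean and variance of the reputation after
    combining the prior assumptions with the new ratings.
    """
    n = len(ratings)
    if n == 0:
        return prior_mean, prior_var
    sample_mean = sum(ratings) / n
    # Precisions (1/variance) add; means combine weighted by precision.
    posterior_var = 1.0 / (1.0 / prior_var + n / noise_var)
    posterior_mean = posterior_var * (prior_mean / prior_var
                                      + n * sample_mean / noise_var)
    return posterior_mean, posterior_var

# A new entity starts with a vague prior centered at 0.5; three ratings arrive.
print(normal_update(prior_mean=0.5, prior_var=0.25,
                    ratings=[0.9, 0.8, 0.85], noise_var=0.04))
```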

Our reputation scores are multidimensional vectors of continuous quantities. An entity’s reputation is an underlying ideal to be estimated from the samples provided by the different entities giving feedback. The reputation estimate is accompanied by an expression of the confidence, or lack of confidence, in that estimate.

Our reputation calculator is a platform that accepts different statistical models of how entities might behave during the transaction and in providing feedback. For example, one simple model might assume that an entity’s performance rating follows a normal (bell curve) distribution with some mean and standard deviation. To make things even simpler, one can assume that feedback is always given honestly and with no bias. In this case, the MLE is a linear least squares fit of the feedback data.
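Under those simplifying assumptions, the maximum-likelihood reputation for a single dimension reduces to the sample mean, with the sample standard deviation describing the spread; a least-squares fit gives the same answer. A minimal sketch:

```python
import math

def gaussian_mle(ratings):
    """MLE for a normal model of one reputation dimension.

    With honest, unbiased feedback, maximizing the Gaussian likelihood
    is equivalent to a least-squares fit: the reputation estimate is the
    sample mean, and the spread is the sample standard deviation.
    """
    n = len(ratings)
    mean = sum(ratings) / n
    variance = sum((r - mean) ** 2 for r in ratings) / n
    return mean, math.sqrt(variance)

print(gaussian_mle([0.9, 0.8, 0.85, 0.95]))    # (estimate, spread)
```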

This platform will accept more sophisticated reputation models as the amount of data grows. Some of the model enhancements our company is developing concern the rate of reputation drift, the related weighting of more recent feedback, rater biases, the estimated credibility of sources, and contextual correlation. Each of these becomes an additional free parameter to be chosen by the MLE solver, and getting good estimates of these parameters obviously requires more data.

A property of this approach is that reputation does not continue increasing arbitrarily as time advances; it stays within the bounds established when the reputation domain was configured. Additional feedback increases the number of data points on which the extracted parameters are based, so as a trader earns more feedback, we usually report greater confidence in her reputation estimate. Note that confidence is distinct from the reputation estimate itself.

It’s interesting to think about how to incorporate the desire to punish poor performance quickly (making reputation “hard to build up, and easy to tear down”) into the model-based approach. It seems reasonable to impose a severe penalty on dishonest behavior in order to deter it. With an ad hoc reputation-scoring function, positive interactions can simply be given fewer reward points than the punishment points assessed for negative behavior. But how is the ratio of positive to negative feedback chosen? There are a number of approaches that permit higher sensitivity to negative behavior.

One approach is to increase the amount of history transmitted with the reputation so the client’s decision function can incorporate it. If recent negative behavior is of great concern, the reputation model can include a drift component that results in more weight toward recent feedback. Another approach is to weight positive and negative credibility differently, giving more credence to warnings.
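The sketch below illustrates both ideas with invented parameters: an exponential drift term weights recent feedback more heavily, and negative ratings are given a larger weight than positive ones. The half-life and the positive-to-negative ratio are exactly the kind of tunable choices discussed next.

```python
def weighted_score(feedback, half_life=10.0, negative_boost=3.0):
    """Score feedback with recency drift and extra weight on warnings.

    feedback is a list of (age, rating) pairs, with age measured in
    transactions since the feedback was given and ratings in [-1, +1].
    Older feedback decays exponentially with the given half-life, and
    negative ratings count negative_boost times more than positive ones.
    """
    numerator = denominator = 0.0
    for age, rating in feedback:
        weight = 0.5 ** (age / half_life)        # drift toward recent feedback
        if rating < 0:
            weight *= negative_boost             # give warnings more credence
        numerator += weight * rating
        denominator += weight
    return numerator / denominator if denominator else 0.0

# One recent bad report pulls the score far below the simple average of +0.5.
print(weighted_score([(0, -1.0), (1, +1.0), (2, +1.0), (3, +1.0)]))
```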

The design choices (including ad hoc parameter choices) depend intimately on the goals of the client and the characteristics of the marketplace. Such changes could be addressed by adapting the model to each domain, by representing the assumptions as parameters that each domain can tune or that can be extracted mechanically, and perhaps even by customizing the reputation component in a particular client.

How is MLE calculated? For simple models, MLE can be calculated analytically, by solving the statistical equations algebraically. Doing MLE algebraically has advantages: The answer is exact, updates can be computed quickly, and it is easier to break up the calculation in a distributed version of the Reputation Server. But an exact analytical solution may be hard to find, nonexistent, or computationally expensive to evaluate, depending on the underlying models. In that case, it may be necessary to use an approximation algorithm. Some of these algorithms are difficult to compute in a distributed manner, however, so here a centralized Reputation Server may be better than a distributed one.

One of the largest problems for the Reputation Server is the credibility of its sources. How can a source of feedback be trusted? Where possible, cryptographic techniques such as timestamps and digital signatures are used to gain confidence that a message originates from the right party. Even if we establish that the message is truly from the correct feedback source, how do we know that the source is telling the truth? This is the issue of source credibility, and it’s a hairy, hairy problem.

We address this in our Reputation Server by maintaining credibility measures for sources. These credibility measures factor into the scoring algorithms that form reputations—both our estimated reputation and the confidence that our service has in the estimate. Credibility measures are initialized based on heuristic judgments, and then updated over time using the Bayesian/MLE framework previously described. Sources that prove reliable over time increase their credibility. Sources that do not prove reliable find their credibility diminished.

This process can be automated through the MLE solver and folded into the scoring algorithm. Patterns of noncredible feedback are identified by the algorithm and given lower weights. Doing this, though, requires something more than the accumulated feedback from transactions; we should have an external reference or benchmark source of credible data. One way that we solve this is by allowing the domain configuration to designate benchmark sources. The Reputation Server assigns high credibility to those sources because the designation indicates that there is something special backing them up, such as a contractual arrangement, bonding of the result, or their offline reputation. In a sense, credibility flows from these benchmark sources to bootstrap the credibility of other sources.
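A minimal heuristic sketch of that flow of credibility, with invented data structures and parameters (the production system folds this into the Bayesian/MLE framework described earlier): benchmark sources keep their designated high credibility, and other sources drift toward or away from the credibility-weighted consensus depending on how well their feedback agrees with it.

```python
def update_credibility(credibility, feedback, benchmarks,
                       learning_rate=0.2, floor=0.05):
    """Adjust per-source credibility after one round of feedback.

    credibility maps source -> weight in (0, 1]; feedback maps
    source -> rating in [0, 1] for the same transaction.  Benchmark
    sources are left untouched; everyone else moves toward the level
    of agreement they showed with the weighted consensus.
    """
    total = sum(credibility[s] for s in feedback)
    consensus = sum(credibility[s] * r for s, r in feedback.items()) / total
    for source, rating in feedback.items():
        if source in benchmarks:
            continue                    # contractually backed; credibility fixed
        agreement = 1.0 - abs(rating - consensus)
        credibility[source] = max(
            floor,
            (1 - learning_rate) * credibility[source] + learning_rate * agreement,
        )
    return consensus, credibility

cred = {"auditor": 0.95, "buyer_a": 0.50, "buyer_b": 0.50}
ratings = {"auditor": 0.90, "buyer_a": 0.85, "buyer_b": 0.10}   # buyer_b disagrees
print(update_credibility(cred, ratings, benchmarks={"auditor"}))
```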

Popular online marketplaces such as auctions have rudimentary reputation systems, providing transaction feedback for participants. These marketplaces strongly protect their control over the reputations that appear on their sites, claiming they are proprietary to the marketplace company! The marketplaces use lawsuits to fight cross-references from other auctions and wholesale copying of reputations, and they discourage users from citing those reputations on other auction sites.

These practices raise the question: Who owns your reputation? The popular auction sites claim that they own your reputation: It is their proprietary information. It is easy to understand why this is the case. Portable reputations would be a threat to the auction sites, because they reduce a barrier to buyers and suppliers trading on competitor auctions. Portable reputations make it more difficult for auctions to get a return from their investment in technology development and marketing that helped build the reputation.

The Reputation Server supports auction sites by isolating the reputation domains unless the owners of the domains permit sharing. In cases where the sharing can be economically beneficial, the scoring algorithms can permit joining the data of two domains to achieve higher confidence reputations. This is performed only with the permission of the domain owners.

One obstacle to the use of the Reputation Server is a bootstrapping or chicken-and-egg problem. While the server is of some use even when empty of transaction histories (because it serves as a place where entities can put their reputations on the line), it can be difficult to convince a marketplace to use it until some reputation information starts to appear.

Consequently, our server offers features to bootstrap reputations similar to the way reputations might be bootstrapped in a real-world domain: through the use of references. A supplier entering the system can supply the names of trade references and contact information for those references. The server uses that contact information to gather the initial ratings. While the reference gathering process is obviously open to abuse, credibility metrics are applied to those initial references. To limit the risk of trusting the references from outside the reputation system, those credibility metrics can signal that the consequent reputation is usable only for small transactions. As time passes and transactions occur within the reputation system, the feedback from transactions replaces the reference-based information in the computation of the reputation.
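One hedged sketch of that policy, with invented weights and limits: reference-based ratings are discounted by a low credibility factor and justify only a small transaction cap, while each real in-system rating counts fully, gradually displacing the references and raising the cap.

```python
def bootstrapped_reputation(reference_ratings, transaction_ratings,
                            reference_credibility=0.3, base_limit=1000.0):
    """Blend outside references with in-system feedback and cap exposure.

    Ratings gathered from the supplier's named references are weighted
    by a small credibility factor; in-system transaction ratings count
    fully.  The returned limit is a hypothetical cap on transaction
    value that stays small while only references support the reputation.
    """
    ref_weight = reference_credibility * len(reference_ratings)
    txn_weight = float(len(transaction_ratings))
    total_weight = ref_weight + txn_weight
    if total_weight == 0:
        return 0.0, 0.0
    score = (reference_credibility * sum(reference_ratings)
             + sum(transaction_ratings)) / total_weight
    # References alone justify only a small limit; real feedback raises it.
    limit = base_limit * (0.1 * min(ref_weight, 1.0)
                          + txn_weight / (txn_weight + 2.0))
    return score, limit

print(bootstrapped_reputation([0.9, 0.85], []))                  # references only
print(bootstrapped_reputation([0.9, 0.85], [0.8, 0.9, 0.95]))    # after real trades
```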

Business theorists have observed that the ability to communicate broadly and deeply through the Internet at low cost is driving a process whereby large businesses break up into a more competitive system of smaller component companies. They call this process “deconstruction.”[93] This process is an example of Coase’s Law, which states that other things being equal, the cost of transacting—negotiating, paying, dealing with errors or fraud—between firms determines the optimal size of the firm.[94] When business transactions between firms are expensive, it’s more economical to have larger firms, even though larger firms are considered less efficient because they are slower to make decisions. When transactions are cheaper, smaller firms can replace the larger integrated entity.

As an example, Evans and Wurster point to the financial industry. Where previously a bank provided all services like investments and mortgages, there are now many companies on the Internet filling small niches of the former service. Aggregation sites find the best mortgage rate out of hundreds of banks, investment news services are dedicated solely to investment news feeds, and so on. Even complex processes like the manufacturing of automobiles—already spread over chains of multiple companies for manufacturing parts, chassis, subsystems—could be further deconstructed into smaller companies.[95]

With more entities, there is an increased need for tracking reputations at the interaction points between them. At the extreme, a firm might completely deconstruct: One vision is that the substations that currently make up a factory can become independent entities, all transacting in real time and automatically to accomplish the manufacturing task that previously occurred in the single firm. The Reputation Server, as one of the components reducing the cost of transacting between firms, serves as a factor to assist in this deconstruction, which results in lower manufacturing costs.

Central Reputation Server versus distributed Reputation Servers

The first version of the Reputation Server is a centralized web server with a narrow messaging interface. One could well argue that it should be decentralized so that the architecture conforms to our ultimate goal: to provide fairness in a noncentralized manner for peer-to-peer networks.

Can we design a network of distributed Reputation Servers? Yes, in some cases, such as when the reputation metric computation can be executed in a distributed fashion and can give meaningful results with partial information. Not all reputation metrics have these properties, however, so if the design goal of a distributed server is important, we should choose one that does.
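For instance, under the simple Gaussian model sketched earlier, the sufficient statistics (count, sum, and sum of squares) can be held per server and merged cheaply, so any server can answer from whatever partial information has reached it. A minimal sketch, assuming that model:

```python
from dataclasses import dataclass

@dataclass
class PartialStats:
    """Mergeable sufficient statistics held by one Reputation Server."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, rating):
        """Record one piece of feedback locally."""
        self.n += 1
        self.total += rating
        self.total_sq += rating * rating

    def merge(self, other):
        """Combine what two servers know; order and grouping don't matter."""
        return PartialStats(self.n + other.n,
                            self.total + other.total,
                            self.total_sq + other.total_sq)

    def estimate(self):
        """Gaussian MLE (mean, variance) from the feedback seen so far."""
        if self.n == 0:
            return None
        mean = self.total / self.n
        return mean, self.total_sq / self.n - mean * mean

server_a, server_b = PartialStats(), PartialStats()
server_a.add(0.9)
server_a.add(0.8)
server_b.add(0.95)
print(server_a.merge(server_b).estimate())   # matches a single central server
```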

Reputation is a subtle and important part of trade that motivates fair dealing. We have described technologies for translating the reputation concept into electronic trade, applicable to business transactions and peer-to-peer interaction. The Reputation Server provides these technologies. Scoring algorithms based on MLE and Bayesian techniques estimate reputations based on feedback received when trades occur. We describe enhancements for addressing the credibility of sources. Reputation domains, which are an abstraction mapped to the client marketplace, serve to store the configuration of rules about how reputations form for that marketplace, allowing the Reputation Server to be a platform for many different reputation systems.



[87] Reputation Server™ is a trademark of Reputation Technologies, Inc.

[88] These other risk reduction techniques can also be used with the Reputation Server.

[90] This constraint is relaxed later in the chapter.