In search, you're only as good as your first results. While the exact numbers vary by system and depend upon the users, the task, and the interface, it's a safe bet that the top three results will draw 80 percent of the attention. The remaining results on the first page may each earn a few percentage points. After that, visibility drops off a cliff.
This is important for two reasons. First, surfacing great results satisfies the simplest use case for search. Users enter queries, scan the first few results, click a link, and the search is complete. Best first is crucial to making search simple, fast, and relevant. Second, the first few results inordinately influence query reformulation. Users enter search terms, scan and learn from the first few results, and try a different query. For the 25–50 percent of search sessions that involve query reformulation, those first three results are a vital part of the user interface. What we find changes what we seek.
Consequently, best first must be a top priority during search engine selection. High-quality, transparent, flexible result-ranking algorithms are critical to success. They must be good out of the gate, and they should support tuning to the unique requirements of a particular content collection or application. The algorithms should account for:
Relevance. These algorithms focus on topical relevance, or aboutness. They aim to match the query keywords to the text of the content and metadata. Effective algorithms account for term order, proximity, location, frequency, and document length. An exact phrase match in a short title is worth more than an AND co-occurrence in a long body. A phrase that repeats on a page but is rare across the site merits extra weight. Relevance algorithms must also manage the transformation of text queries to account for plurals and other word variants (e.g., poet and poetry). Tuning may be required to strike the right balance of precision and recall. Relevance is typically the default setting, and in truth it is often a hybrid that combines the inputs of multiple algorithms into a balanced solution.
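To make those signals concrete, here is a minimal sketch of a toy scorer that rewards rarity across the collection, dampens raw term frequency, normalizes for document length, and boosts title and phrase matches. Every name and weight here is an illustrative assumption; production engines (e.g., BM25-based rankers) are far more sophisticated.

```python
import math

def score(query_terms, doc, corpus_size, doc_freq, avg_len):
    """Toy relevance score over a doc with 'title' and 'body' fields.

    All parameters and weights are hypothetical, for illustration only.
    """
    title = doc["title"].lower().split()
    body = doc["body"].lower().split()
    s = 0.0
    for term in query_terms:
        tf = body.count(term)
        if tf == 0 and term not in title:
            continue
        # Rarity: terms that are rare across the site earn more weight (IDF).
        idf = math.log((corpus_size + 1) / (doc_freq.get(term, 0) + 1)) + 1
        # Frequency and length: dampen raw counts, penalize long bodies.
        norm_tf = tf / (tf + 1.5 * (len(body) / avg_len))
        s += idf * norm_tf
        # Location: a match in a short title outweighs one in a long body.
        if term in title:
            s += 2.0 * idf / len(title)
    # Order and proximity: an exact phrase match in the title scores highest.
    if " ".join(query_terms) in doc["title"].lower():
        s *= 1.5
    return s
```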
Popularity. In most contexts, social data can deliver a big boost to semantic algorithms. Google's PageRank, which counts links as votes, was the first mainstream success. Today, popularity is typically a multialgorithmic measure. At Flickr, a photo's interestingness derives from views, comments, notes, bookmarks, favorites, and so on. At Amazon, users can sort by Bestselling or Best Reviews, but even when they sort by relevance, social data influences the results.
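In code, such a multialgorithmic measure might look like the sketch below. The signal names, weights, and the 70/30 blend with relevance are hypothetical tuning choices, not Flickr's or Amazon's actual formulas.

```python
import math

# Hypothetical weights for social signals feeding a popularity score.
WEIGHTS = {"views": 0.1, "comments": 1.0, "bookmarks": 2.0, "favorites": 3.0}

def popularity(signals):
    # Log-dampen raw counts so one runaway signal can't dominate.
    return sum(w * math.log1p(signals.get(name, 0))
               for name, w in WEIGHTS.items())

def blended_key(result):
    # Blend topical relevance with popularity; 70/30 is an assumed split.
    return 0.7 * result["relevance"] + 0.3 * popularity(result["signals"])

# Usage: results.sort(key=blended_key, reverse=True)
```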
Recency. Sorting by date is rarely a good default, but it is a useful option, especially for news and email applications in which reverse chronological order (newest first) is relatively common. In many cases, the date of publication or modification can serve as a valuable input into the general-purpose relevance algorithm by improving the freshness of top results.
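One common way to fold freshness into a general-purpose score is an exponential decay on document age, as in this sketch. The 30-day half-life and the floor are assumed tuning parameters; a news site might use hours, an archive years.

```python
import math, time

HALF_LIFE_DAYS = 30.0  # assumed tuning parameter

def freshness_boost(relevance, published_ts, now=None):
    now = now or time.time()
    age_days = max(0.0, (now - published_ts) / 86400)
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
    # Floor the multiplier so old-but-relevant documents aren't buried.
    return relevance * (0.5 + 0.5 * decay)
```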
Format. In pure form, format and content type are most useful as filters, allowing users to view only images, videos, or news. However, they can also help to boost the best results. For instance, on an intranet, HTML and PDF documents may be more polished than .doc or .xls files. In such cases, application-specific tuning that brings the best formats to the top is extremely valuable.
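That tuning can be as simple as a table of per-format multipliers, as in this sketch; the formats and values are invented for a hypothetical intranet.

```python
# Hypothetical per-format boosts for an intranet where HTML and PDF
# documents tend to be more polished than raw office files.
FORMAT_BOOST = {"html": 1.2, "pdf": 1.1, "doc": 0.9, "xls": 0.8}

def apply_format_boost(score, fmt):
    # Unknown formats pass through unchanged.
    return score * FORMAT_BOOST.get(fmt, 1.0)
```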
Personalization. A user's search history, social network, and current location (online or off) are just a few of the inputs that might influence the order of results. We'll delve into this topic when we explore the personalized search pattern. For now, let's just note that personalization is at least as difficult as it is desirable.
Diversity. In search, it's easy to get too much of a good thing. Diversity algorithms guard against redundant results and support query clarification and refinement by surfacing distinct meanings (e.g., apple and AAPL) and formats. Application-specific tuning delivers the right balance and a nice blend of results.
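One well-known approach is a greedy pass in the spirit of maximal marginal relevance (MMR): repeatedly pick the candidate that balances its own score against its similarity to results already chosen. In the sketch below, a crude title-overlap measure stands in for whatever similarity a real system would use (topics, hosts, formats, embeddings), and the trade-off weight is an assumed tuning parameter.

```python
def similarity(a, b):
    # Crude Jaccard overlap between two result titles; a stand-in only.
    wa = set(a["title"].lower().split())
    wb = set(b["title"].lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def diversify(candidates, k=10, trade_off=0.7):
    chosen, pool = [], list(candidates)
    while pool and len(chosen) < k:
        # Reward the candidate's own score, penalize redundancy with
        # the results already selected.
        best = max(pool, key=lambda c: trade_off * c["score"]
                   - (1 - trade_off) * max((similarity(c, s) for s in chosen),
                                           default=0.0))
        chosen.append(best)
        pool.remove(best)
    return chosen
```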
As designers, we need not understand exactly how these algorithms work, but they must be on our requirements list during search engine selection. We must tune them to our content or application. Generally, a blended default is in order. Users typically want results that are relevant, popular, and timely. A pure sort order, like the one shown in Figure 4-11, is a nice option but a poor default. Since The Little Prince is among the most popular books ever written, that's most likely the best first result. But without algorithms enhanced by social data, this library database serves up The Little Lame Prince instead.
Of course, algorithms aren't the only way to the top. While we must rely on software and distributed user behavior (e.g., tagging, bookmarking) to manage the long tail of search, applying centralized editorial effort to suggest Best Bets for the most common queries delivers a substantial return on investment. In most cases, the analysis of search query data reveals a power law distribution and invites us to apply the 80/20 rule. A small number of unique search phrases accounts for a large percentage of total queries. It has become a best practice for managers of large websites to integrate a simple database that matches these common search phrases to good starting points or destinations.
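That database really can be simple: a table mapping normalized head queries to hand-picked links, with the long tail falling through to the algorithms. The phrases and URLs in this sketch are invented for illustration.

```python
# Hypothetical Best Bets table: common query phrase -> curated links.
BEST_BETS = {
    "vacation policy": ["https://intranet.example.com/hr/vacation"],
    "expense report": ["https://intranet.example.com/finance/expenses"],
}

def normalize(query):
    # Lowercase and collapse whitespace so close variants match.
    return " ".join(query.lower().split())

def best_bets_for(query):
    # Head queries get curated links; everything else returns empty
    # and falls through to the ranking algorithms.
    return BEST_BETS.get(normalize(query), [])
```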
Best Bets goes by many names, including Suggested Links, Recommended Results, and Editor's Choice. Figure 4-13 shows that Microsoft also complements the algorithmic results with related products and downloads. This diversity enables query disambiguation by letting users clarify whether they want to buy a product or need support. Clearly, there's also an opportunity for cross-selling and upselling. Best Bets and search analytics in general are as useful in marketing circles as they are in the world of user experience.
For Best Bets, design considerations include the number and presentation of suggested links and their relationship to algorithmic results. Generally, one to three suggestions per query is sufficient. Ideally, links that appear as Best Bets are removed from the algorithmic list to avoid wasting valuable space with redundant results. And while it's not necessary to spatially separate the two types of results, in the interest of transparency it's helpful to label the Best Bets and visually distinguish them from the natural results.
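Putting those guidelines together, assembling the results page might look like the following sketch: cap the curated suggestions at three, label them for transparency, and drop duplicates from the algorithmic list. The dict fields and function names are assumptions for illustration.

```python
def assemble_results(curated_bets, algorithmic_results, max_bets=3):
    """Merge curated Best Bet URLs with algorithmic results (dicts
    carrying a 'url' field). Field names are hypothetical."""
    bets = curated_bets[:max_bets]
    bet_urls = set(bets)
    # Label Best Bets so users can distinguish them from natural results.
    page = [{"url": url, "label": "Best Bet"} for url in bets]
    # Remove redundant links from the algorithmic list.
    page += [r for r in algorithmic_results if r["url"] not in bet_urls]
    return page
```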
In short, best first is the most universal and important design pattern in search. Its design is intertwingled with other patterns. The first few results must satisfy the simple lookup and support query clarification and refinement, but those results may appear first in autocomplete and may be modified using faceted navigation. Or they may be prequalified with advanced or personalized search. Finally, it's no good delivering the best of the worst. It's impossible for users to find what they need when searching the wrong place, which is why we must study the precarious pattern of federated search.