Content-based filtering, on the other hand, is based on a description of items and a profile of the user's preferences, which are combined as follows. First, the items are described with attributes; to find similar items, we compare them using a distance or similarity measure, such as cosine distance or the Pearson correlation coefficient (there is more about distance measures in Chapter 1, Applied Machine Learning Quick Start). Next, the user profile enters the equation. Given feedback about the kinds of items the user likes, we can introduce weights that specify the importance of each item attribute. For instance, the Pandora Radio streaming service applies content-based filtering to create stations, using more than 400 attributes. A user initially picks a song with specific attributes, and, as the user provides feedback, the important song attributes are emphasized.
Because this approach initially needs very little user feedback, it effectively avoids the cold start issue.
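The combination of item attributes, a similarity measure, and feedback-driven weights can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four attributes, the song names, the additive `update_weights` rule, and the learning rate are hypothetical and do not describe how Pandora or any particular library implements content-based filtering.

```python
import math

def weighted_cosine_similarity(a, b, weights):
    """Cosine similarity of two attribute vectors, scaled by non-negative attribute weights."""
    dot = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no weighted attributes in common to compare
    return dot / (norm_a * norm_b)

def update_weights(weights, item, liked, rate=1.0):
    """Hypothetical feedback rule: raise the weights of attributes found in liked
    items and lower them for disliked items, never going below zero."""
    sign = 1.0 if liked else -1.0
    return [max(0.0, w + sign * rate * x) for w, x in zip(weights, item)]

def recommend(seed, catalog, weights, k=3):
    """Rank catalog items by weighted similarity to the seed item."""
    scored = [(name, weighted_cosine_similarity(seed, attrs, weights))
              for name, attrs in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical attributes: [acoustic, vocals, fast_tempo, distortion]
catalog = {
    "song_a": [1.0, 1.0, 0.0, 0.0],
    "song_b": [0.0, 1.0, 1.0, 1.0],
    "song_c": [1.0, 0.0, 0.0, 0.0],
}
seed = [1.0, 1.0, 1.0, 0.0]        # the song the user picked to start a station
weights = [1.0, 1.0, 1.0, 1.0]     # no feedback yet: all attributes count equally

before = recommend(seed, catalog, weights)

# The user likes song_b and dislikes song_c; the profile weights shift accordingly.
weights_after = update_weights(weights, catalog["song_b"], liked=True)
weights_after = update_weights(weights_after, catalog["song_c"], liked=False)

after = recommend(seed, catalog, weights_after)

print("before feedback:", before)
print("after feedback: ", after)
```

Note that the first ranking is produced from the seed song alone, before any feedback exists, which is why this style of recommender can start with so little user information. The feedback then only reorders the candidates by changing how much each attribute counts.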