Book ratings dataset

In this chapter, we will work with a book ratings dataset (Ziegler et al., 2005) that was collected in a four-week crawl. It contains data on 278,858 members of the Book-Crossing website and 1,157,112 ratings, both implicit and explicit, referring to 271,379 distinct ISBNs. User data is anonymized but includes demographic information. The dataset is taken from Improving Recommendation Lists Through Topic Diversification, Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen: Proceedings of the 14th International World Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan.

The Book-Crossing dataset consists of three files, as follows:

  • BX-Users: This file contains the user data. Note that user IDs (User-ID) have been anonymized and mapped to integers. Demographic data (Location and Age) is provided where available; otherwise, these fields contain null values.
  • BX-Books: This file contains the books, which are identified by their respective ISBNs. Invalid ISBNs have already been removed from the dataset. In addition, some content-based information is given (Book-Title, Book-Author, Year-Of-Publication, and Publisher), which has been obtained from Amazon Web Services. Note that when a book has several authors, only the first author is provided. URLs linking to cover images are also given in three variants (Image-URL-S, Image-URL-M, and Image-URL-L), corresponding to small, medium, and large images. These URLs point to the Amazon website.
  • BX-Book-Ratings: This file contains the book rating information. Ratings (Book-Rating) are either explicit, expressed on a scale of 1 to 10 (with higher values denoting higher appreciation), or implicit, expressed by 0.
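
Because a rating of 0 marks an implicit interaction rather than a low score, it usually helps to separate the two kinds of ratings before doing any analysis. The following is a minimal sketch of that split. It assumes the ratings are stored as semicolon-separated text with quoted fields and a header row of User-ID, ISBN, and Book-Rating; the file layout, the sample rows, and the split_ratings helper are illustrative assumptions, not part of the dataset description above.

```python
import csv
import io

# Hypothetical sample rows in the ASSUMED layout of BX-Book-Ratings
# (semicolon-separated, quoted fields). The values are made up.
SAMPLE = '''"User-ID";"ISBN";"Book-Rating"
"1";"0000000001";"0"
"1";"0000000002";"8"
"2";"0000000001";"10"
'''


def split_ratings(lines):
    """Separate explicit ratings (1 to 10) from implicit ones (0).

    `lines` is any iterable of text lines, such as an open file.
    Returns two lists of (user_id, isbn, rating) tuples.
    """
    explicit, implicit = [], []
    reader = csv.DictReader(lines, delimiter=";", quotechar='"')
    for row in reader:
        rating = int(row["Book-Rating"])
        record = (row["User-ID"], row["ISBN"], rating)
        if rating == 0:
            implicit.append(record)
        elif 1 <= rating <= 10:
            explicit.append(record)
        else:
            # Anything outside 0..10 contradicts the documented scale.
            raise ValueError(f"unexpected rating: {rating}")
    return explicit, implicit


explicit, implicit = split_ratings(io.StringIO(SAMPLE))
print(len(explicit), "explicit,", len(implicit), "implicit")
```

To run this against the real file, pass an open file handle instead of the in-memory sample. The encoding and delimiter may need adjusting to match the copy of the dataset you downloaded.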