Introduction to Information Retrieval by Manning, Christopher D.

Index

Title Page
Contents
Copyright Page
Preface

1 Boolean retrieval

1.1 An example information retrieval problem
1.2 A first take at building an inverted index
1.3 Processing Boolean queries
1.4 The extended Boolean model versus ranked retrieval
1.5 References and further reading

2 The term vocabulary and postings lists

2.1 Document delineation and character sequence decoding
2.2 Determining the vocabulary of terms
2.3 Faster postings list intersection via skip pointers
2.4 Positional postings and phrase queries
2.5 References and further reading

3 Dictionaries and tolerant retrieval

3.1 Search structures for dictionaries
3.2 Wildcard queries
3.3 Spelling correction
3.4 Phonetic correction
3.5 References and further reading

4 Index construction

4.1 Hardware basics
4.2 Blocked sort-based indexing
4.3 Single-pass in-memory indexing
4.4 Distributed indexing
4.5 Dynamic indexing
4.6 Other types of indexes
4.7 References and further reading

5 Index compression

5.1 Statistical properties of terms in information retrieval
5.2 Dictionary compression
5.3 Postings file compression
5.4 References and further reading

6 Scoring, term weighting, and the vector space model

6.1 Parametric and zone indexes
6.2 Term frequency and weighting
6.3 The vector space model for scoring
6.4 Variant tf–idf functions
6.5 References and further reading

7 Computing scores in a complete search system

7.1 Efficient scoring and ranking
7.2 Components of an information retrieval system
7.3 Vector space scoring and query operator interaction
7.4 References and further reading

8 Evaluation in information retrieval

8.1 Information retrieval system evaluation
8.2 Standard test collections
8.3 Evaluation of unranked retrieval sets
8.4 Evaluation of ranked retrieval results
8.5 Assessing relevance
8.6 A broader perspective: System quality and user utility
8.7 Results snippets
8.8 References and further reading

9 Relevance feedback and query expansion

9.1 Relevance feedback and pseudo relevance feedback
9.2 Global methods for query reformulation
9.3 References and further reading

10 XML retrieval

10.1 Basic XML concepts
10.2 Challenges in XML retrieval
10.3 A vector space model for XML retrieval
10.4 Evaluation of XML retrieval
10.5 Text-centric versus data-centric XML retrieval
10.6 References and further reading

11 Probabilistic information retrieval

11.1 Review of basic probability theory
11.2 The probability ranking principle
11.3 The binary independence model
11.4 An appraisal and some extensions
11.5 References and further reading

12 Language models for information retrieval

12.1 Language models
12.2 The query likelihood model
12.3 Language modeling versus other approaches in information retrieval
12.4 Extended language modeling approaches
12.5 References and further reading

13 Text classification and Naive Bayes

13.1 The text classification problem
13.2 Naive Bayes text classification
13.3 The Bernoulli model
13.4 Properties of Naive Bayes
13.5 Feature selection
13.6 Evaluation of text classification
13.7 References and further reading

14 Vector space classification

14.1 Document representations and measures of relatedness in vector spaces
14.2 Rocchio classification
14.3 k nearest neighbor
14.4 Linear versus nonlinear classifiers
14.5 Classification with more than two classes
14.6 The bias–variance tradeoff
14.7 References and further reading

15 Support vector machines and machine learning on documents

16 Flat clustering

16.1 Clustering in information retrieval
16.2 Problem statement
16.3 Evaluation of clustering
16.4 K-means
16.5 Model-based clustering
16.6 References and further reading

17 Hierarchical clustering

17.1 Hierarchical agglomerative clustering
17.2 Single-link and complete-link clustering
17.3 Group-average agglomerative clustering
17.4 Centroid clustering
17.5 Optimality of hierarchical agglomerative clustering
17.6 Divisive clustering
17.7 Cluster labeling
17.8 Implementation notes
17.9 References and further reading

18 Matrix decompositions and latent semantic indexing

18.1 Linear algebra review
18.2 Term–document matrices and singular value decompositions
18.3 Low-rank approximations
18.4 Latent semantic indexing
18.5 References and further reading

19 Web search basics

19.1 Background and history
19.2 Web characteristics
19.3 Advertising as the economic model
19.4 The search user experience
19.5 Index size and estimation
19.6 Near-duplicates and shingling
19.7 References and further reading

20 Web crawling and indexes

20.1 Overview
20.2 Crawling
20.3 Distributing indexes
20.4 Connectivity servers
20.5 References and further reading

21 Link analysis

21.1 The Web as a graph
21.2 PageRank
21.3 Hubs and authorities
21.4 References and further reading

Bibliography
Index