Mining of Massive Datasets by Leskovec, Jure

Index

Halftitle page
Title page
Copyright page
Contents
Preface

1 Data Mining

1.1 What is Data Mining?
1.2 Statistical Limits on Data Mining
1.3 Things Useful to Know
1.4 Outline of the Book
1.5 Summary of Chapter 1
1.6 References for Chapter 1

2 MapReduce and the New Software Stack

2.1 Distributed File Systems
2.2 MapReduce
2.3 Algorithms Using MapReduce
2.4 Extensions to MapReduce
2.5 The Communication Cost Model
2.6 Complexity Theory for MapReduce
2.7 Summary of Chapter 2
2.8 References for Chapter 2

3 Finding Similar Items

3.1 Applications of Near-Neighbor Search
3.2 Shingling of Documents
3.3 Similarity-Preserving Summaries of Sets
3.4 Locality-Sensitive Hashing for Documents
3.5 Distance Measures
3.6 The Theory of Locality-Sensitive Functions
3.7 LSH Families for Other Distance Measures
3.8 Applications of Locality-Sensitive Hashing
3.9 Methods for High Degrees of Similarity
3.10 Summary of Chapter 3
3.11 References for Chapter 3

4 Mining Data Streams

4.1 The Stream Data Model
4.2 Sampling Data in a Stream
4.3 Filtering Streams
4.4 Counting Distinct Elements in a Stream
4.5 Estimating Moments
4.6 Counting Ones in a Window
4.7 Decaying Windows
4.8 Summary of Chapter 4
4.9 References for Chapter 4

5 Link Analysis

5.1 PageRank
5.2 Efficient Computation of PageRank
5.3 Topic-Sensitive PageRank
5.4 Link Spam
5.5 Hubs and Authorities
5.6 Summary of Chapter 5
5.7 References for Chapter 5

6 Frequent Itemsets

6.1 The Market-Basket Model
6.2 Market Baskets and the A-Priori Algorithm
6.3 Handling Larger Datasets in Main Memory
6.4 Limited-Pass Algorithms
6.5 Counting Frequent Items in a Stream
6.6 Summary of Chapter 6
6.7 References for Chapter 6

7 Clustering

7.1 Introduction to Clustering Techniques
7.2 Hierarchical Clustering
7.3 K-means Algorithms
7.4 The CURE Algorithm
7.5 Clustering in Non-Euclidean Spaces
7.6 Clustering for Streams and Parallelism
7.7 Summary of Chapter 7
7.8 References for Chapter 7

8 Advertising on the Web

8.1 Issues in On-Line Advertising
8.2 On-Line Algorithms
8.3 The Matching Problem
8.4 The Adwords Problem
8.5 Adwords Implementation
8.6 Summary of Chapter 8
8.7 References for Chapter 8

9 Recommendation Systems

9.1 A Model for Recommendation Systems
9.2 Content-Based Recommendations
9.3 Collaborative Filtering
9.4 Dimensionality Reduction
9.5 The NetFlix Challenge
9.6 Summary of Chapter 9
9.7 References for Chapter 9

10 Mining Social-Network Graphs

10.1 Social Networks as Graphs
10.2 Clustering of Social-Network Graphs
10.3 Direct Discovery of Communities
10.4 Partitioning of Graphs
10.5 Finding Overlapping Communities
10.6 Simrank
10.7 Counting Triangles
10.8 Neighborhood Properties of Graphs
10.9 Summary of Chapter 10
10.10 References for Chapter 10

11 Dimensionality Reduction

11.1 Eigenvalues and Eigenvectors
11.2 Principal-Component Analysis
11.3 Singular-Value Decomposition
11.4 CUR Decomposition
11.5 Summary of Chapter 11
11.6 References for Chapter 11

12 Large-Scale Machine Learning

12.1 The Machine-Learning Model
12.2 Perceptrons
12.3 Support-Vector Machines
12.4 Learning from Nearest Neighbors
12.5 Comparison of Learning Methods
12.6 Summary of Chapter 12
12.7 References for Chapter 12

Index