Some examples of MapReduce applications

Here are a few examples of big data problems that can be solved with the MapReduce framework:

  1. Given a repository of text files, find the frequency of each word. This is called the WordCount problem.
  2. Given a repository of text files, find the number of words of each word length.
  3. Given two matrices in sparse matrix format, compute their product.
  4. Factor a matrix given in sparse matrix format.
  5. Given a symmetric graph whose nodes represent people and edges represent friendship, compile, for each pair of people, a list of the friends they have in common.
  6. Given a symmetric graph whose nodes represent people and edges represent friendship, compute the average number of friends by age.
  7. Given a repository of weather records, find the global minima and maxima for each year.
  8. Sort a large list. In most implementations of the MapReduce framework this problem is trivial, because the framework automatically sorts the output of the map() function by key.
  9. Reverse a graph.
  10. Find a minimum spanning tree (MST) of a given weighted graph.
  11. Join two large relational database tables.
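
To make the first two examples concrete, here is a minimal single-process sketch in Python. It is not a real MapReduce framework: the names map_reduce, word_mapper, length_mapper, and sum_reducer are hypothetical helpers invented for this illustration, and the input is three short strings instead of a repository of files. It only imitates the map, shuffle/sort, and reduce phases in memory. Because the keys are visited in sorted order, it also hints at why example 8 comes almost for free.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Imitate the map, shuffle/sort, and reduce phases in one process."""
    groups = defaultdict(list)
    # Map phase: each input record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Sort + reduce: visit keys in sorted order and collapse each value list.
    return [(key, reducer(key, groups[key])) for key in sorted(groups)]


def word_mapper(line):
    """Example 1 (WordCount): emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def length_mapper(line):
    """Example 2: emit (word length, 1) for every word in the line."""
    for word in line.split():
        yield len(word), 1


def sum_reducer(key, values):
    """Add up the 1s emitted for a key."""
    return sum(values)


# Stand-in for a repository of text files (an assumption for this sketch).
lines = ["the quick brown fox", "the lazy dog", "the fox"]

word_counts = map_reduce(lines, word_mapper, sum_reducer)
length_counts = map_reduce(lines, length_mapper, sum_reducer)

print(word_counts)
print(length_counts)
```

Only the mapper changes between the two problems; the reducer and the driver stay the same. In a real framework the map and reduce calls would run in parallel on many machines, which this sketch does not attempt to model.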

Examples 9 and 10 apply to graph structures (nodes and edges). For very large graphs, more efficient methods have since been developed; one example is the Apache Hama framework, created by Edward Yoon.