Some examples of MapReduce applications
Here are a few examples of big data problems that can be solved with the MapReduce framework:
- Given a repository of text files, find the frequency of each word. This is called the WordCount problem.
- Given a repository of text files, find the number of words of each word length.
- Given two matrices in sparse matrix format, compute their product.
- Factor a matrix given in sparse matrix format.
- Given a symmetric graph whose nodes represent people and edges represent friendship, compile a list of the friends that each pair of people has in common.
- Given a symmetric graph whose nodes represent people and edges represent friendship, compute the average number of friends by age.
- Given a repository of weather records, find the global minima and maxima for each year.
- Sort a large list. Note that in most implementations of the MapReduce framework, this problem is trivial, because the framework automatically sorts the output from the map() function.
- Reverse a graph.
- Find a minimal spanning tree (MST) of a given weighted graph.
- Join two large relational database tables.
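To make the first two problems concrete, here is a minimal single-machine sketch in Python. It is not the API of any real MapReduce framework: run_mapreduce, map_words, map_lengths, and reduce_sum are hypothetical names chosen for illustration, and the three sample documents are made up. The driver groups the mapper's output by key and visits the keys in sorted order, which imitates the sorting that most implementations perform between the map and reduce phases.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Iterator, List, Tuple


def map_words(document: str) -> Iterator[Tuple[str, int]]:
    """Map step for WordCount: emit (word, 1) for every word."""
    for word in document.lower().split():
        yield word, 1


def map_lengths(document: str) -> Iterator[Tuple[int, int]]:
    """Map step for the word-length problem: emit (length, 1)."""
    for word in document.split():
        yield len(word), 1


def reduce_sum(key: Hashable, values: List[int]) -> Tuple[Hashable, int]:
    """Reduce step: add up all the counts emitted for one key."""
    return key, sum(values)


def run_mapreduce(inputs: Iterable, mapper: Callable, reducer: Callable) -> list:
    """Hypothetical in-memory driver: map, group by key, sort keys, reduce."""
    groups = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)
    # Visiting keys in sorted order mimics the framework's shuffle-and-sort.
    return [reducer(key, groups[key]) for key in sorted(groups)]


# Made-up sample data standing in for a repository of text files.
documents = ["the quick brown fox", "the lazy dog", "the fox"]

word_counts = run_mapreduce(documents, map_words, reduce_sum)
length_counts = run_mapreduce(documents, map_lengths, reduce_sum)

print(word_counts)
print(length_counts)
```

Only the mapper changes between the two problems; the reducer and the driver are shared. In a real cluster the grouping and sorting would be done by the framework across many machines rather than by a dictionary in memory.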
The graph-reversal and spanning-tree examples apply to graph structures (nodes and edges). For very large graphs, more efficient methods have been developed, such as the Apache Hama framework created by Edward Yoon.
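As an illustration of how a graph problem fits the map and reduce pattern, the following Python sketch reverses a small directed graph stored as adjacency lists. The function names (map_reverse, reduce_collect, run_mapreduce), the record format, and the three-node sample graph are all assumptions made for this example, not part of any particular framework.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator, List, Tuple


def map_reverse(record: Tuple[str, List[str]]) -> Iterator[Tuple[str, str]]:
    """Map step: for each edge node -> successor, emit (successor, node)."""
    node, successors = record
    for successor in successors:
        yield successor, node


def reduce_collect(node: str, predecessors: List[str]) -> Tuple[str, List[str]]:
    """Reduce step: gather every node that pointed at this one."""
    return node, sorted(predecessors)


def run_mapreduce(inputs: Iterable, mapper: Callable, reducer: Callable) -> list:
    """Hypothetical in-memory driver: map, group by key, sort keys, reduce."""
    groups = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)
    return [reducer(key, groups[key]) for key in sorted(groups)]


# Made-up sample graph: a -> b, a -> c, b -> c, c -> a.
graph = [("a", ["b", "c"]), ("b", ["c"]), ("c", ["a"])]

reversed_graph = run_mapreduce(graph, map_reverse, reduce_collect)
print(reversed_graph)
```

Note that this sketch emits one pair per edge, so a node with no incoming edges does not appear in the reversed output. A fuller version would also emit a placeholder record for every node.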