Ask any highly successful company and it will unequivocally agree that data is a precious commodity. Companies use data not only to make informed short-term decisions that affect their day-to-day operations but also to guide their long-term strategy. In fact, in some industries (such as advertising), data is the product!
With the advent of cheap storage solutions, the amount of data being collected has grown exponentially over the last few years. Furthermore, storage requirements are expected to keep following an exponential curve well into the future.
While there are quite a few solutions for processing structured data (such as systems supporting map-reduce operations), they fall short when the data to be processed is organized as a graph. Running specialized algorithms against massive graphs is a fairly common use case for companies in the field of logistics or companies that operate social networks.
In this chapter, we will focus on systems that process graphs at scale. More specifically, the following topics will be covered:
- Understanding the Bulk Synchronous Parallel (BSP) model for distributing computation across multiple nodes
- Applying the BSP model principles to create our very own graph processing system in Go
- Using the graph system as a platform for solving graph-based problems such as shortest path and graph coloring
- Implementing an iterative version of the PageRank algorithm for the Links 'R' Us project