Scaling across with a CockroachDB-backed graph implementation

While the in-memory graph implementation is a great asset for running our unit tests, or even for spinning up small instances of the Links 'R' Us system for demonstration or end-to-end testing purposes, it is not something that we would want to use in a production-grade system.

First and foremost, the data in the in-memory store will not persist across service restarts. Even if we could somehow address this limitation (for example, by writing periodic snapshots of the graph to disk), the best we can do is scale our graph up: for example, we can run the link graph service on a machine with a faster CPU and/or more memory. But that's about it; as we anticipate that the graph will eventually outgrow the storage capacity of a single node, we need a solution that can scale across multiple machines.

To this end, the following sections will explore a second graph implementation that utilizes a database system that can support our scaling requirements. While there are undoubtedly quite a few DBMSes out there that can satisfy our needs, I have decided to base the graph implementation on CockroachDB [5] for the following reasons: