Scaling across with a CockroachDB-backed graph implementation

While the in-memory graph implementation is a great asset for running our unit tests, or even for spinning up small instances of the Links 'R' Us system for demonstration or end-to-end testing purposes, it is not something that we would want to use in a production-grade system.

First and foremost, the data in the in-memory store does not persist across service restarts. Even if we could somehow address this limitation (for example, by periodically snapshotting the graph to disk), the best we could do is scale our graph up (vertically): for example, by running the link graph service on a machine with a faster CPU and/or more memory. But that's about it; since we anticipate that the graph will eventually outgrow the storage capacity of a single node, we need a solution that can scale across multiple machines (horizontally).

To this end, the following sections explore a second graph implementation that is backed by a database system capable of meeting our scaling requirements. While there are undoubtedly quite a few DBMSes out there that could satisfy our needs, I have decided to base the graph implementation on CockroachDB ^[5] for the following reasons:

It scales horizontally: adding capacity is as simple as adding more nodes to the cluster. CockroachDB clusters automatically rebalance and heal themselves when nodes join or go down, a property that makes it ideal for our use case!
CockroachDB is fully ACID-compliant and supports distributed SQL transactions.
The SQL flavor supported by CockroachDB is compatible with the PostgreSQL syntax, which many of you should already be familiar with.
CockroachDB implements the PostgreSQL wire protocol; this means that we do not require a specialized driver package but can simply use the battle-tested, pure-Go Postgres ^[19] package to connect to the database.
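Because CockroachDB speaks the PostgreSQL wire protocol, a client addresses it with an ordinary postgresql:// connection URL. The following is a minimal sketch, using only the Go standard library, of how such a URL could be assembled. The helper name `buildCockroachDSN`, its parameters, and the `linkgraph` database name are hypothetical and chosen for illustration; 26257 is CockroachDB's default SQL port, and `sslmode=disable` is only appropriate for a local, insecure test cluster.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// buildCockroachDSN is a hypothetical helper that assembles a
// postgresql:// connection URL for a CockroachDB node. Since CockroachDB
// implements the PostgreSQL wire protocol, the same URL format that
// Postgres drivers understand can be used unchanged.
func buildCockroachDSN(host string, port int, user, database, sslMode string) string {
	u := url.URL{
		Scheme: "postgresql",
		User:   url.User(user),
		// JoinHostPort adds brackets around IPv6 literals when needed.
		Host: net.JoinHostPort(host, strconv.Itoa(port)),
		Path: "/" + database,
	}

	// Encode the TLS mode as a query parameter, e.g. "disable" for an
	// insecure local cluster.
	q := url.Values{}
	q.Set("sslmode", sslMode)
	u.RawQuery = q.Encode()

	return u.String()
}

func main() {
	// 26257 is CockroachDB's default SQL port; "linkgraph" is a
	// hypothetical database name used for illustration.
	dsn := buildCockroachDSN("localhost", 26257, "root", "linkgraph", "disable")
	fmt.Println(dsn) // postgresql://root@localhost:26257/linkgraph?sslmode=disable
}
```

Assuming the Postgres package referenced above is `github.com/lib/pq`, the resulting string could then be passed to the standard `database/sql` package via `sql.Open("postgres", dsn)` after blank-importing the driver; no CockroachDB-specific client code is involved.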