Simplifying end user interactions with the dbspgraph package

This chapter explores the various components of the distributed job runner implementation in detail. Nevertheless, we would rather keep all of these internal details hidden from the intended users of the dbspgraph package.

Essentially, we need to come up with a simplified API that the end users will use to interact with our package. As it turns out, this is quite easy to do. Assuming that the end users have already created (and tested) their graph algorithm with the help of the bspgraph package, they only need to provide a simple adaptor for interacting with the algorithm implementation. The set of required methods is encapsulated in the Runner interface definition, which is outlined as follows:

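type Runner interface {
    // StartJob prepares the graph and returns an executor built via the factory.
    StartJob(Details, bspgraph.ExecutorFactory) (*bspgraph.Executor, error)
    // CompleteJob persists the computed results once the job has succeeded.
    CompleteJob(Details) error
    // AbortJob lets the user clean up or roll back after a failed job.
    AbortJob(Details)
}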

The first argument to each one of the Runner methods is a structure that contains metadata about the currently executing job. The Details type mirrors the fields of the JobDetails protocol buffer message that the master broadcasts to each worker and is defined as follows:

type Details struct {
    JobID string
    CreatedAt time.Time
    PartitionFromID uuid.UUID
    PartitionToID uuid.UUID
}

The StartJob method provides a hook that allows the end users to initialize a bspgraph.Graph instance, load the appropriate set of data (vertices and edges), and use the provided ExecutorFactory argument to create a new Executor instance, which StartJob returns to the caller. As you probably guessed, our code will invoke StartJob with the appropriate custom executor factory depending on whether the code is executing on a worker or a master node.
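To make the factory hand-off more concrete, here is a minimal sketch of a StartJob-style hook that receives its executor factory from the caller. The Graph, Executor, and ExecutorFactory types below are simplified stand-ins rather than the real bspgraph types, and the startJob and factoryFor helpers are hypothetical names introduced purely for illustration:

```go
package main

import "fmt"

// Graph and Executor are hypothetical stand-ins for bspgraph.Graph and
// bspgraph.Executor; the real types carry far more state.
type Graph struct{ vertices []string }

type Executor struct {
	g    *Graph
	role string // records which factory built this executor
}

// ExecutorFactory is a simplified stand-in for bspgraph.ExecutorFactory.
type ExecutorFactory func(*Graph) *Executor

// startJob plays the part of a user-supplied StartJob hook: it builds the
// graph, loads some data, and delegates executor creation to the factory.
func startJob(factory ExecutorFactory) (*Executor, error) {
	g := &Graph{}
	g.vertices = append(g.vertices, "v1", "v2") // pretend to load a partition
	return factory(g), nil
}

// factoryFor is an assumed helper that returns a different factory
// depending on the role of the node that runs the job.
func factoryFor(role string) ExecutorFactory {
	return func(g *Graph) *Executor { return &Executor{g: g, role: role} }
}

func main() {
	for _, role := range []string{"master", "worker"} {
		exec, err := startJob(factoryFor(role))
		if err != nil {
			fmt.Println("start job:", err)
			return
		}
		fmt.Printf("%s executor over %d vertices\n", exec.role, len(exec.g.vertices))
	}
}
```

The point of the sketch is that the hook itself never needs to know which kind of node it runs on; that decision stays with whoever supplies the factory.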

Once both the master and the workers have completed the execution of the graph algorithm, we will arrange things so that the CompleteJob method is invoked. This is where the end user is expected to extract the computed application-specific results from the graph and persist them to stable storage.

On the other hand, should an error occur either while running the algorithm or while attempting to persist the results, our job coordinator will invoke the AbortJob method to notify the end user and let them properly clean up or take any required action for rolling back any changes already persisted to disk.