Creating a distributed barrier for the graph execution steps

A barrier can be thought of as a rendezvous point for a set of processes. Once a process enters the barrier, it is prevented from making any progress until all other expected processes also enter the barrier.

In Go, we could model a barrier with the help of the sync.WaitGroup primitive, as follows:

func barrier(numWorkers int) {
    var wg sync.WaitGroup
    wg.Add(numWorkers)

    for i := 0; i < numWorkers; i++ {
        go func() {
            wg.Done()
            fmt.Printf("Entered the barrier; waiting for other goroutines to join")
            wg.Wait()
            fmt.Printf("Exited the barrier")
        }()
    }

    wg.Wait()
}

To guarantee that each worker executes the various stages of the graph state machine in lock-step with the other workers, we must implement a similar barrier primitive. However, as far as our particular application is concerned, the goroutines that we are interested in synchronizing execute on different hosts. This obviously complicates things as we now need to come up with a distributed barrier implementation!

As we mentioned in the previous section, the master node will serve the role of the coordinator for the distributed barrier. To make the code easier to follow, in the following subsections, we will split our distributed barrier implementations into a worker-side and master-side implementation and examine them separately of each other.