To upsert a document to the index, we need to submit an update operation to the Elasticsearch cluster. The update request's contents are populated using the following block of code:
The makeEsDoc helper converts the input indexer.Document instance into a representation that Elasticsearch can process. It is important to note that the mapped document does not include a PageRank score value, even if one is present in the original document. This is intentional, as we only allow PageRank scores to be mutated via a call to UpdateScore. The doc_as_upsert flag serves as a hint to Elasticsearch that it should create the document if it does not already exist; that is, it should treat the update request as an upsert operation.
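The full helper code is not reproduced here, but as a minimal sketch (the esDoc type, its fields, and the buildUpdatePayload name are assumptions for illustration, not the book's actual code), the update payload simply pairs the mapped document with the doc_as_upsert flag:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// esDoc is a hypothetical, trimmed-down version of the document
// representation sent to Elasticsearch; the real type carries more fields.
type esDoc struct {
	LinkID  string `json:"LinkID"`
	URL     string `json:"URL"`
	Title   string `json:"Title"`
	Content string `json:"Content"`
}

// buildUpdatePayload wraps a mapped document in the envelope expected by
// the Elasticsearch update API, requesting upsert semantics.
func buildUpdatePayload(doc esDoc) map[string]interface{} {
	return map[string]interface{}{
		"doc":           doc,
		"doc_as_upsert": true,
	}
}

func main() {
	payload := buildUpdatePayload(esDoc{
		LinkID: "00000000-0000-0000-0000-000000000001",
		URL:    "https://example.com",
	})
	out, _ := json.Marshal(payload)
	fmt.Println(string(out))
}
```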
After populating the update document, we just need to serialize it into JSON, execute a synchronous update, and check for any reported errors:
var buf bytes.Buffer
err := json.NewEncoder(&buf).Encode(doc)
if err != nil {
    return xerrors.Errorf("index: %w", err)
}

res, err := i.es.Update(indexName, esDoc.LinkID, &buf, i.es.Update.WithRefresh("true"))
if err != nil {
    return xerrors.Errorf("index: %w", err)
}

var updateRes esUpdateRes
if err = unmarshalResponse(res, &updateRes); err != nil {
    return xerrors.Errorf("index: %w", err)
}
When performing any API call to Elasticsearch using the go-elasticsearch client, errors can be reported in two different ways:
- The client returns an error and a nil response value. This can happen, for instance, if the DNS resolution for the Elasticsearch nodes fails or if the client can't connect to any of the provided node addresses.
- Elasticsearch sends a JSON response that contains a structured error as its payload.
To deal with the latter case, we can use the handy unmarshalResponse helper, which checks for the presence of errors in the response and returns them as regular Go error values.
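A possible sketch of such a helper follows (the function signature and the exact shape of the error payload are assumptions about the real implementation, which decodes the client's response body):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// esErrorRes mirrors the structured error payload that Elasticsearch
// embeds in failed responses (shape simplified for this sketch).
type esErrorRes struct {
	Error struct {
		Type   string `json:"type"`
		Reason string `json:"reason"`
	} `json:"error"`
}

// unmarshalResponse decodes an Elasticsearch JSON response body into to.
// If the payload carries a structured error, it is surfaced as a regular
// Go error value instead.
func unmarshalResponse(body string, to interface{}) error {
	var errRes esErrorRes
	if err := json.Unmarshal([]byte(body), &errRes); err == nil && errRes.Error.Type != "" {
		return fmt.Errorf("%s: %s", errRes.Error.Type, errRes.Error.Reason)
	}
	return json.NewDecoder(strings.NewReader(body)).Decode(to)
}

func main() {
	var out map[string]interface{}
	err := unmarshalResponse(`{"error":{"type":"index_not_found_exception","reason":"no such index"}}`, &out)
	fmt.Println(err) // the structured error surfaces as a plain Go error
}
```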
What about document lookups? This operation is modeled as a search query where we try to match a single document with a specific link ID value. Like any other request to the Elasticsearch cluster, search queries are specified as JSON documents that are sent to the cluster via an HTTP POST request. The FindByID implementation creates the search query inline by defining a nested block of map[string]interface{} items which are then serialized via a JSON encoder instance:
var buf bytes.Buffer
query := map[string]interface{}{
    "query": map[string]interface{}{
        "match": map[string]interface{}{
            "LinkID": linkID.String(),
        },
    },
    "from": 0,
    "size": 1,
}
if err := json.NewEncoder(&buf).Encode(query); err != nil {
    return nil, xerrors.Errorf("find by ID: %w", err)
}
At this point, I would like to point out that I opted for an inline, type-less approach to defining the search query purely for the sake of simplicity. Ideally, instead of using maps, you would define nested structs for each portion of the query. Besides the obvious benefits of working with typed values, one other important benefit of working with structs is that we can switch to a much more efficient JSON encoder implementation that doesn't rely on reflection. One such example is easyjson [10], which utilizes code generation to create efficient JSON encoders/decoders and promises a 4x-5x increase in speed over the JSON encoder implementation that ships with the Go standard library.
After our query has been successfully serialized to JSON, we invoke the runSearch helper, which submits the query to Elasticsearch. The helper then deserializes the obtained response into a nested struct while at the same time checking for the presence of errors:
searchRes, err := runSearch(i.es, query)
if err != nil {
    return nil, xerrors.Errorf("find by ID: %w", err)
}

if len(searchRes.Hits.HitList) != 1 {
    return nil, xerrors.Errorf("find by ID: %w", index.ErrNotFound)
}

doc := mapEsDoc(&searchRes.Hits.HitList[0].DocSource)
If everything goes according to plan, we will receive a single result. The obtained result is then passed to the mapEsDoc helper, which converts it back into a Document model instance, as follows:
func mapEsDoc(d *esDoc) *index.Document {
    return &index.Document{
        LinkID:    uuid.MustParse(d.LinkID),
        URL:       d.URL,
        Title:     d.Title,
        Content:   d.Content,
        IndexedAt: d.IndexedAt.UTC(),
        PageRank:  d.PageRank,
    }
}
As you can see in the preceding snippet, the majority of the fields are simply copied over to the document, with the exception of the LinkID field, which must first be parsed from its string representation into a UUID value. The converted document is then returned to the caller of the FindByID method.