The Bing Visual Search API interprets images and returns insights about them. These insights include visually similar images, related searches, and shopping sources. The API can also identify people, places, and objects, as well as text.
You will typically upload an image to the API to retrieve insights on it. Alternatively, you can pass a URL that points to an image.
The endpoint you should use to query the Bing Visual Search API is https://api.cognitive.microsoft.com/bing/v7.0/images/visualsearch.
In either scenario, the following query parameters can be added:
In addition, two content headers must be specified. These are Content-Type and Ocp-Apim-Subscription-Key. The first one must be set to multipart/form-data;boundary={BOUNDARY}. The latter must specify the API key.
For more information on content headers, please visit https://docs.microsoft.com/en-us/azure/cognitive-services/bing-visual-search/overview#content-form-types.
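The request described above can be sketched in Python using only the standard library. This is a minimal illustration, not a definitive client: the function name build_visual_search_request, the form part name "image", the default filename, and the randomly generated boundary are assumptions made for the example. The function only builds the request; it does not send it.

```python
import uuid
import urllib.request

ENDPOINT = "https://api.cognitive.microsoft.com/bing/v7.0/images/visualsearch"


def build_visual_search_request(image_bytes, api_key, filename="image.jpg"):
    """Build (but do not send) a multipart POST request carrying an image.

    The part name "image" and the default filename are assumptions
    made for this sketch.
    """
    # Any string that does not occur in the body can serve as the boundary.
    boundary = uuid.uuid4().hex

    # A single-part multipart/form-data body: part headers, blank line,
    # raw image bytes, then the closing boundary marker.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="image"; '
        f'filename="{filename}"\r\n\r\n'.encode(),
        image_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])

    # The two headers named in the text.
    headers = {
        "Content-Type": f"multipart/form-data;boundary={boundary}",
        "Ocp-Apim-Subscription-Key": api_key,
    }
    return urllib.request.Request(
        ENDPOINT, data=body, headers=headers, method="POST"
    )
```

To send the request, you could pass the returned object to urllib.request.urlopen and read the JSON body from the response. Doing so requires a valid API key.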
Once the request has gone through, a JSON object will be returned as a response.
This object will contain two items: an array of tags and an image field. The image field simply holds the insights token for the image. Each entry in the list of tags contains a tag name and a list of actions (insights). A tag, in this context, means category. For instance, if an actor is recognized in the image, the tag for this might be Actor.
Each action, or insight, describes something about the image, such as text it contains or products discovered in it. The data included with an action varies with the type of insight.
To see a full list of default insights, please visit https://docs.microsoft.com/en-us/azure/cognitive-services/bing-visual-search/default-insights-tag.
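As a rough illustration of the response structure described above, the following Python sketch groups action types by tag name. The sample JSON is hypothetical and heavily trimmed. The field names displayName, actionType, and imageInsightsToken, as well as the helper name summarize_insights, are assumptions for this example, since the text above does not name them; check them against an actual response before relying on them.

```python
import json

# Hypothetical, trimmed-down response used only for illustration.
SAMPLE_RESPONSE = """
{
  "image": {"imageInsightsToken": "sample-token"},
  "tags": [
    {"displayName": "",
     "actions": [{"actionType": "VisualSearch"},
                 {"actionType": "PagesIncluding"}]},
    {"displayName": "Actor",
     "actions": [{"actionType": "Entity"}]}
  ]
}
"""


def summarize_insights(response_text):
    """Return the insights token and a mapping of tag name to action types."""
    data = json.loads(response_text)

    # The image entry carries the insights token for the image.
    token = data.get("image", {}).get("imageInsightsToken")

    summary = {}
    for tag in data.get("tags", []):
        # A tag with an empty name is labeled "(default)" in this sketch.
        name = tag.get("displayName") or "(default)"
        action_types = [a.get("actionType") for a in tag.get("actions", [])]
        summary.setdefault(name, []).extend(action_types)
    return token, summary


token, summary = summarize_insights(SAMPLE_RESPONSE)
for name, action_types in summary.items():
    print(name, action_types)
```

Each action in a real response carries more data than its type, so a fuller client would inspect the remaining fields of each action as well.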