Microsoft's search engine, Bing, has an API that enables us to send queries and receive results in our own application. For a certain number of queries (currently, 5,000 per month), Bing Search API is free to use. However, we must register for it by taking the following steps:
Enter Bing Search API in the Search the Marketplace field. A list of results should appear. Click on the link for Bing Search API (not any variant such as Bing Search API – Web Results Only).
Bing Search API has a third-party Python wrapper called pyBingSearchAPI. Download a ZIP archive of this wrapper from https://github.com/xthepoet/pyBingSearchAPI/archive/master.zip. Unzip it to our project folder. A script, bing_search_api.py, should now be located alongside our other scripts.
To build atop pyBingSearchAPI, we want a high-level interface to submit a query string and navigate through a resulting list of images, which should be in an OpenCV-compatible format. We will make a class, ImageSearchSession, offering such an interface. First, let's create a file, ImageSearchSession.py, and add the following import statements at the start of the file:
import bing_search_api
import numpy
import cv2
import pprint

import RequestsUtils
Note that we are using OpenCV, as well as pyBingSearchAPI, pretty-print (to log JSON results from the search), and our networking utility functions.
ImageSearchSession has member variables that store our Bing session (initialized using the Primary Account Key that we had copied from Azure Marketplace), the current query, metadata about the current image results, and metadata that helps us navigate to the previous and next results. We can initialize these variables as seen in the following code:
class ImageSearchSession(object):

    def __init__(self):
        # Replace the x's with the Primary Account Key of your
        # Microsoft Account.
        bingKey = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

        self.verbose = False

        self._bing = bing_search_api.BingSearchAPI(bingKey)
        self._query = ''
        self._results = []
        self._offset = 0
        self._numResultsRequested = 0
        self._numResultsReceived = 0
        self._numResultsAvailable = 0
As an alternative to being hardcoded in the script, our Primary Account Key can be loaded from a custom environment variable. For example, if we define an environment variable called BING_KEY, we can get its value in Python as os.environ['BING_KEY']. (We would need to import the os module of the Python standard library.) This change would make it easier to share our script securely because we would not need to blank out our Primary Account Key.
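The environment variable approach might look like the following minimal sketch. The helper name loadBingKey and its error message are assumptions made for illustration; only os.environ and the BING_KEY variable name come from the discussion above.

```python
import os

def loadBingKey(variableName='BING_KEY'):
    """Return the Primary Account Key stored in an environment variable.

    loadBingKey is a hypothetical helper, not part of pyBingSearchAPI.
    """
    key = os.environ.get(variableName)
    if not key:
        # Fail early with a clear message rather than sending
        # requests with an empty key later.
        raise RuntimeError(
            'Please set the %s environment variable.' % variableName)
    return key
```

With such a helper in place, the initializer could assign bingKey = loadBingKey() instead of a hardcoded string.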
We will provide getters for many of the member variables, as follows:
    @property
    def query(self):
        return self._query

    @property
    def offset(self):
        return self._offset

    @property
    def numResultsRequested(self):
        return self._numResultsRequested

    @property
    def numResultsReceived(self):
        return self._numResultsReceived

    @property
    def numResultsAvailable(self):
        return self._numResultsAvailable
Given these variables, we can navigate through a large set of results by fetching only a few results at a time, that is, by looking at a window into the results. We can move our window to earlier or later results, as needed, by simply adjusting the offset by the number of requested results and clamping the offset to the valid range. Here are the implementations of the searchPrev and searchNext methods, which rely on a more general search method that we will implement afterwards:
    def searchPrev(self):
        if self._offset == 0:
            return
        offset = max(0, self._offset - self._numResultsRequested)
        self.search(self._query, self._numResultsRequested, offset)

    def searchNext(self):
        if self._offset + self._numResultsRequested >= \
                self._numResultsAvailable:
            return
        offset = self._offset + self._numResultsRequested
        self.search(self._query, self._numResultsRequested, offset)
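To see the window arithmetic in isolation, consider the following sketch. The functions prevOffset and nextOffset are hypothetical standalone versions of the offset calculations in searchPrev and searchNext, and the figures (45 results, 20 per request) are made up for illustration.

```python
def prevOffset(offset, numResultsRequested):
    """Move the window back by one page, clamping at the first result."""
    return max(0, offset - numResultsRequested)

def nextOffset(offset, numResultsRequested, numResultsAvailable):
    """Move the window forward by one page, or stay put at the end."""
    candidate = offset + numResultsRequested
    if candidate >= numResultsAvailable:
        return offset
    return candidate

# Walk forward through 45 hypothetical results, 20 at a time.
offsets = [0]
while True:
    following = nextOffset(offsets[-1], 20, 45)
    if following == offsets[-1]:
        break
    offsets.append(following)
print(offsets)  # [0, 20, 40]
```

The last window starts at offset 40 and holds only 5 results, which is why the class tracks the number of results received separately from the number requested.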
The more general-purpose search method accepts a query string, a maximum number of results, and an offset relative to the first available result. We will store these arguments in member variables for reuse in the searchPrev and searchNext methods. Here is the first part of this method's implementation:
    def search(self, query, numResultsRequested=20, offset=0):
        self._query = query
        self._numResultsRequested = numResultsRequested
        self._offset = offset
Then, we set up our search parameters, specifying that the results should be in the JSON format and should include color photos only:
        params = {
            'ImageFilters': '"color:color+style:photo"',
            '$format': 'json',
            '$top': numResultsRequested,
            '$skip': offset
        }
We will request the results and parse the response to get the portion of the JSON related to image metadata, which we will store for use in other methods:
        response = self._bing.search('image', query, params)

        if not RequestsUtils.validateResponse(response):
            self._offset = 0
            self._numResultsReceived = 0
            return

        # In some versions, requests.Response.json is a dict.
        # In other versions, it is a method returning a dict.
        # Get the dict in either case.
        json = response.json
        if (hasattr(json, '__call__')):
            json = json()

        metaResults = json[u'd'][u'results'][0]
        if self.verbose:
            # The parenthesized form works in Python 2 and Python 3.
            print('Got results of Bing image search for "%s":' %
                  query)
            pprint.pprint(metaResults)

        self._results = metaResults[u'Image']
We will also parse and store metadata of the actual offset, actual number of results received, and number of results available:
        self._offset = int(metaResults[u'ImageOffset'])
        self._numResultsReceived = len(self._results)
        self._numResultsAvailable = \
            int(metaResults[u'ImageTotal'])
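To make the parsing steps concrete, here is a small sketch that applies the same lookups to a hand-written dictionary. The payload is hypothetical and heavily trimmed: it contains only the keys that the search method reads, and its URLs and counts are invented. The function parseImageMetadata is likewise an illustrative name, not part of our class.

```python
# A hypothetical payload shaped like the one that the search method
# expects. A real response would contain many more fields.
samplePayload = {
    u'd': {
        u'results': [{
            u'ImageOffset': u'20',
            u'ImageTotal': u'1000',
            u'Image': [
                {u'MediaUrl': u'http://example.com/0.jpg'},
                {u'MediaUrl': u'http://example.com/1.jpg'}
            ]
        }]
    }
}

def parseImageMetadata(payload):
    """Return (results, offset, numReceived, numAvailable)."""
    metaResults = payload[u'd'][u'results'][0]
    results = metaResults[u'Image']
    # The sample payload stores the counts as strings, so we
    # convert them to integers, as the search method does.
    offset = int(metaResults[u'ImageOffset'])
    numAvailable = int(metaResults[u'ImageTotal'])
    return results, offset, len(results), numAvailable

results, offset, numReceived, numAvailable = \
    parseImageMetadata(samplePayload)
print('%d %d %d' % (offset, numReceived, numAvailable))
```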
Although the search method fetches a textual description of results, including image URLs, it does not actually fetch any full-sized images. This is good because the full-sized images might be large and we do not need them all at once. Instead, we will provide another method, getCvImageAndUrl, to retrieve the image and image URL that have a specified index in the current results. The index is given as an argument. As an optional second argument, this method accepts a Boolean value that indicates whether a thumbnail should be used instead of the full-sized image. Thumbnails are small, and their URLs are included directly in the query results, so retrieval is quick in this case. Full-sized images may take longer to download. In either case, we use cvImageFromUrl to fetch and convert the image. This implementation is done in the following code:
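    def getCvImageAndUrl(self, index, useThumbnail = False):
        if index >= self._numResultsReceived:
            return None, None
        result = self._results[index]
        # The returned URL always refers to the full-sized image.
        url = result[u'MediaUrl']
        if useThumbnail:
            # Assumption: the Thumbnail entry is a dict that has
            # its own MediaUrl key.
            result = result[u'Thumbnail']
        return RequestsUtils.cvImageFromUrl(result[u'MediaUrl']), url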
The caller of getCvImageAndUrl is responsible for dealing gracefully with image downloads that are slow or that fail. Remember that our cvImageFromUrl function just logs an error and returns None if the download fails.
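One way for a caller to cope with failed downloads is to skip to the next result. The following is only a sketch under stated assumptions: getFirstAvailableImage and its maxAttempts parameter are hypothetical, and the function relies solely on getCvImageAndUrl returning None in place of an image that could not be retrieved. It does not address slow downloads, which would need a timeout in the networking code.

```python
def getFirstAvailableImage(session, maxAttempts=5,
                           useThumbnail=False):
    """Return the first (image, url) pair that downloads successfully.

    getFirstAvailableImage is a hypothetical helper. It assumes that
    session has a getCvImageAndUrl method that returns None in
    place of an image that is unavailable.
    """
    for index in range(maxAttempts):
        image, url = session.getCvImageAndUrl(index, useThumbnail)
        if image is not None:
            return image, url
    # Every attempt failed or ran past the available results.
    return None, None
```

The caller must still check whether the returned image is None before passing it to an OpenCV function.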
To test ImageSearchSession, let's write a main function that instantiates the class, sets verbose to True, searches for 'luxury condo sales', and writes the first resulting image to disk as shown in the following implementation:
def main():
    session = ImageSearchSession()
    session.verbose = True
    session.search('luxury condo sales')
    image, url = session.getCvImageAndUrl(0)
    # If the search or download failed, image is None.
    if image is not None:
        cv2.imwrite('image.png', image)

if __name__ == '__main__':
    main()
Now that we have a classifier and a search session, we are almost ready to proceed to the frontend of the Luxocator. We just need a few more utility functions to help us prepare data and images to bundle and display them.