Acquiring images from the Web

Our query images will come from a web search. Before we start implementing the search functionality, let's write some helper functions that fetch images via the Requests library and convert them to an OpenCV-compatible format. Because this functionality is highly reusable, we will put it in a module of standalone utility functions. Let's create a file called RequestsUtils.py and import NumPy, OpenCV, Requests, and sys (which we will use to log errors to the standard error stream), as follows:

import numpy
import cv2
import requests
import sys

As a module-level variable, let's store HEADERS, a dictionary of HTTP headers that we will send with our web requests. Some servers reject requests that appear to come from a bot. To improve the chance of our requests being accepted, let's set the 'User-Agent' header to a value that mimics a web browser, as follows:

# Spoof a browser's User-Agent string.
# Otherwise, some sites will reject us as a bot.
HEADERS = {
  'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; '
                'rv:25.0) Gecko/20100101 Firefox/25.0'
}
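To confirm that HEADERS actually travels with a request, we can build the request without sending it. Requests' Request(...).prepare() produces the exact PreparedRequest that would go over the wire. The following is a minimal offline sketch. prepareImageRequest is a hypothetical helper and is not part of RequestsUtils.py.

```python
import requests

# Same browser-like User-Agent as HEADERS in RequestsUtils.py.
HEADERS = {
  'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; '
                'rv:25.0) Gecko/20100101 Firefox/25.0'
}

def prepareImageRequest(url):
  """Hypothetical helper: build (but do not send) a GET request with HEADERS."""
  return requests.Request('GET', url, headers=HEADERS).prepare()

if __name__ == '__main__':
  prepared = prepareImageRequest('http://nummist.com/images/ceiling.gaze.jpg')
  print(prepared.method, prepared.url)
  # Header lookup is case-insensitive in Requests.
  print(prepared.headers['user-agent'])
```

Calling requests.get(url, headers=HEADERS) merges these headers with Requests' defaults. Per-request headers take precedence, so our browser string replaces the default python-requests User-Agent.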

Whenever we receive a response to a web request, we want to check whether its status code is 200 (OK). This is only a cursory test of whether the response is valid, but it is good enough for our purposes. We will implement this test in the following function, validateResponse. It returns True if the response is deemed valid; otherwise, it logs an error message and returns False:

def validateResponse(response):
  statusCode = response.status_code
  if statusCode == 200:
    return True
  url = response.request.url
  # Log the problem to stderr (this call works in Python 2 and 3).
  sys.stderr.write(
    'Received unexpected status code (%d) when requesting %s\n' %
      (statusCode, url))
  return False
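We can exercise both outcomes of validateResponse without touching the network by building requests.Response objects by hand. This is a sketch only. fakeResponse is a hypothetical helper, and the example.com URLs are placeholders.

```python
import sys
import requests

# Copy of validateResponse from RequestsUtils.py.
def validateResponse(response):
  statusCode = response.status_code
  if statusCode == 200:
    return True
  url = response.request.url
  sys.stderr.write(
    'Received unexpected status code (%d) when requesting %s\n' %
      (statusCode, url))
  return False

def fakeResponse(statusCode, url):
  """Hypothetical helper: a Response built by hand, with no network traffic."""
  response = requests.Response()
  response.status_code = statusCode
  # validateResponse reads response.request.url when logging.
  response.request = requests.Request('GET', url).prepare()
  return response

print(validateResponse(fakeResponse(200, 'http://example.com/found.jpg')))
# True
print(validateResponse(fakeResponse(404, 'http://example.com/missing.jpg')))
# False (after logging the 404 to stderr)
```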

With the help of HEADERS and validateResponse, we can try to get an image from a URL and return that image in an OpenCV-compatible format (failing that, we will return None). As an intermediate step, we will read the raw data from the web response into a NumPy array using a function called numpy.frombuffer. We will then interpret this data as an image using a function called cv2.imdecode. We pass it the cv2.IMREAD_COLOR flag so that the result is always a three-channel BGR image. Here is our implementation of a function called cvImageFromUrl that accepts a URL as an argument:

def cvImageFromUrl(url):
  response = requests.get(url, headers=HEADERS)
  if not validateResponse(response):
    return None
  # Wrap the raw bytes of the response body in a 1D uint8 array.
  imageData = numpy.frombuffer(response.content, numpy.uint8)
  # Decode the bytes (JPEG, PNG, and so on) as a 3-channel BGR image.
  image = cv2.imdecode(imageData, cv2.IMREAD_COLOR)
  if image is None:
    sys.stderr.write(
      'Failed to decode image from content of %s\n' % url)
  return image

To test these two functions, let's give RequestsUtils.py a main function. It downloads an image from the web, converts it to an OpenCV-compatible format, and writes it to disk using an OpenCV function called imwrite. Here is the implementation:

def main():
  image = \
    cvImageFromUrl('http://nummist.com/images/ceiling.gaze.jpg')
  if image is not None:
    cv2.imwrite('image.png', image)

if __name__ == '__main__':
  main()

To confirm that everything worked, open image.png and compare it to the online image, which you can view in a web browser at http://nummist.com/images/ceiling.gaze.jpg. Look for image.png in the current working directory, that is, the directory from which you ran RequestsUtils.py.

Note

Although we are putting a simple test of our RequestsUtils module in a main function, a more sophisticated and maintainable approach to writing tests in Python is to use the classes in the unittest module of the standard library. For more information, refer to the official documentation at https://docs.python.org/2/library/unittest.html.