Acquiring images from the Web

Our query images will come from a web search. Before we implement the search functionality, let's write some helper functions that fetch images via the Requests library and convert them to an OpenCV-compatible format. Because this functionality is highly reusable, we will put it in a module of utility functions. Let's create a file called RequestsUtils.py and import NumPy, OpenCV, and Requests, along with the standard sys module (which we will use to write error messages to stderr), as follows:

import numpy
import cv2
import requests
import sys

As a global variable, let's store HEADERS, a dictionary of headers that we will use while making web requests. Some servers reject requests that appear to come from a bot. To improve the chance of our requests being accepted, let's set the 'User-Agent' header to a value that mimics a web browser, as follows:

# Spoof a browser's User-Agent string.
# Otherwise, some sites will reject us as a bot.
HEADERS = {
  'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; ' + \
    'rv:25.0) Gecko/20100101 Firefox/25.0'
}
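
As a quick check that the spoofed header will actually be attached to our requests, we can build a request without sending it. The following is a minimal sketch, not part of RequestsUtils.py; it assumes that http://example.com/ is acceptable as a placeholder URL, and it repeats HEADERS so that it can run on its own:

```python
import requests

# Same dictionary as in RequestsUtils.py, repeated so this runs alone.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; '
                  'rv:25.0) Gecko/20100101 Firefox/25.0'
}

# http://example.com/ is a placeholder URL (an assumption).
# prepare() builds the request object but sends nothing over the network.
request = requests.Request('GET', 'http://example.com/', headers=HEADERS)
prepared = request.prepare()

# Read back the User-Agent header that is attached to the prepared request.
userAgent = prepared.headers['User-Agent']
print(userAgent)
```

The printed value should be the Firefox string rather than the default python-requests identifier that the library would otherwise send.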

Whenever we receive a response to a web request, we want to check whether the status code is 200 OK. This is only a cursory test of whether the response is valid, but it is good enough for our purposes. We will implement this test in the following function, validateResponse, which returns True if the response is deemed valid; otherwise, it prints an error message to stderr and returns False:

def validateResponse(response):
  statusCode = response.status_code
  if statusCode == 200:
    return True
  url = response.request.url
  sys.stderr.write(
    'Received unexpected status code (%d) when requesting %s\n' %
      (statusCode, url))
  return False
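
Because validateResponse only reads status_code and request.url, we can exercise it without touching the network. The following sketch uses two hypothetical stand-in types, FakeRequest and FakeResponse, which are not part of Requests, and a copy of validateResponse that writes to sys.stderr directly so that it runs under both Python 2 and Python 3:

```python
import sys
from collections import namedtuple

# Hypothetical stand-ins that carry only the attributes
# validateResponse reads. They are not part of Requests.
FakeRequest = namedtuple('FakeRequest', ['url'])
FakeResponse = namedtuple('FakeResponse', ['status_code', 'request'])

def validateResponse(response):
    # Accept only 200 OK; report anything else on stderr.
    statusCode = response.status_code
    if statusCode == 200:
        return True
    url = response.request.url
    sys.stderr.write(
        'Received unexpected status code (%d) when requesting %s\n' %
        (statusCode, url))
    return False

# The example.com URLs are placeholders; no request is made.
okResponse = FakeResponse(200, FakeRequest('http://example.com/a.jpg'))
missingResponse = FakeResponse(404, FakeRequest('http://example.com/b.jpg'))

okResult = validateResponse(okResponse)            # no message
missingResult = validateResponse(missingResponse)  # message on stderr
print(okResult, missingResult)
```

Note that any status other than 200, including other success codes such as 204, is treated as a failure. This is acceptable here because a 204 response carries no image data anyway.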

With the help of HEADERS and validateResponse, we can try to get an image from a URL and return it in an OpenCV-compatible format (failing that, we will return None). As an intermediate step, we will read raw data from a web response into a NumPy array using a function called numpy.frombuffer, which replaces the deprecated binary mode of numpy.fromstring. We will then interpret this data as an image using a function called cv2.imdecode. Here is our implementation of a function called cvImageFromUrl that accepts a URL as an argument:

def cvImageFromUrl(url):
  response = requests.get(url, headers=HEADERS)
  if not validateResponse(response):
    return None
  # View the raw response bytes as a 1D array of unsigned bytes.
  imageData = numpy.frombuffer(response.content, numpy.uint8)
  # Decode the bytes as a 3-channel BGR image.
  # (OpenCV 2.x also names this flag cv2.CV_LOAD_IMAGE_COLOR.)
  image = cv2.imdecode(imageData, cv2.IMREAD_COLOR)
  if image is None:
    sys.stderr.write(
      'Failed to decode image from content of %s\n' % url)
  return image

To test these two functions, let's give RequestsUtils.py a main function that downloads an image from the web, converts it to an OpenCV-compatible format, and writes it to disk using an OpenCV function called imwrite. Here is the implementation:

def main():
  image = cvImageFromUrl(
    'http://nummist.com/images/ceiling.gaze.jpg')
  if image is not None:
    cv2.imwrite('image.png', image)

if __name__ == '__main__':
  main()

To confirm that everything worked, open image.png (which should be in the same directory as RequestsUtils.py) and compare it to the online image, which you can view in a web browser at http://nummist.com/images/ceiling.gaze.jpg.