Generate 1,000 random numbers r_i between 0 and 1. Decide on a bin size and then take a histogram of 1/r_i.
Are there similar numbers of entries (i.e., counts within a factor of 10 of one another) in each histogram bin?
Propose a way of dealing with distributions that are highly nonlinear so that each bin has, within a factor of 10, the same amount of data.
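Part a, and one possible answer to part b, can be sketched in plain Python without OpenCV. The bin width, bin count, and random seed below are arbitrary choices; equal-frequency ("quantile") binning is one of several reasonable proposals.

```python
import random

random.seed(1)
n, k = 1000, 10
samples = sorted(1.0 / random.random() for _ in range(n))

# Fixed-width bins: the density of 1/r falls off like 1/x^2, so the low
# bins are crowded while the long tail is spread thinly over many bins.
width = 5.0
fixed = {}
for x in samples:
    b = int(x // width)
    fixed[b] = fixed.get(b, 0) + 1

# One possible fix (part b): equal-frequency ("quantile") bins, with edges
# taken from the sorted data so every bin holds n/k entries by construction.
edges = [samples[i * n // k] for i in range(k)] + [samples[-1]]
```

The fixed-width counts are wildly unbalanced, while the quantile edges guarantee equal counts no matter how nonlinear the distribution is.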
Take three images of a hand in each of the three lighting conditions discussed in the text. Use cvCalcHist() to make an RGB histogram of the flesh color of one of the hands photographed indoors.
Try using just a few large bins (e.g., 2 per dimension), a medium number of bins (16 per dimension), and many bins (256 per dimension). Then run a matching routine (using all of the histogram matching methods: correlation, chi-square, intersection, and Bhattacharyya) against the other indoor lighting images of hands. Describe what you find.
Now repeat with 8 and then 32 bins per dimension, and try matching across lighting conditions (train on indoor, test on outdoor). Describe the results.
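Lacking the actual photographs, the four cvCompareHist() methods can be sketched on synthetic pixel data. The function names (rgb_hist, compare, skin_patch) and the skin/background color ranges below are invented stand-ins, not part of the exercise.

```python
import math
import random

def rgb_hist(pixels, bins):
    # Normalized 3-D RGB histogram stored sparsely as a dict.
    step = 256 // bins
    h = {}
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        h[key] = h.get(key, 0) + 1
    total = float(len(pixels))
    return {k: v / total for k, v in h.items()}

def compare(h1, h2):
    # The four methods cvCompareHist() offers: correlation, chi-square,
    # intersection, and Bhattacharyya distance.
    keys = sorted(set(h1) | set(h2))
    p = [h1.get(k, 0.0) for k in keys]
    q = [h2.get(k, 0.0) for k in keys]
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    corr = (sum((a - mp) * (b - mq) for a, b in zip(p, q)) /
            math.sqrt(sum((a - mp) ** 2 for a in p) *
                      sum((b - mq) ** 2 for b in q)))
    chi = sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)
    inter = sum(min(a, b) for a, b in zip(p, q))
    bhatt = math.sqrt(max(0.0, 1.0 - sum(math.sqrt(a * b)
                                         for a, b in zip(p, q))))
    return corr, chi, inter, bhatt

random.seed(0)
def skin_patch():   # invented "indoor flesh" color range
    return [(random.randint(150, 255), random.randint(80, 180),
             random.randint(60, 140)) for _ in range(500)]
background = [(random.randint(0, 255), random.randint(0, 255),
               random.randint(0, 255)) for _ in range(500)]

h_model, h_test, h_bg = (rgb_hist(p, 4) for p in
                         (skin_patch(), skin_patch(), background))
```

With only 500 sample pixels, raising bins toward 256 per dimension makes the histograms so sparse that even two patches of the same hand barely intersect, which is the effect part a should reveal.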
As in exercise 2, gather RGB histograms of hand flesh color. Take one of the indoor histogram samples as your model and measure EMD (earth mover's distance) against the second indoor histogram and against the first outdoor shaded and first outdoor sunlit histograms. Use these measurements to set a distance threshold.
Using this EMD threshold, see how well you detect the flesh histogram of the third indoor histogram, the second outdoor shaded, and the second outdoor sunlit histograms. Report your results.
Take histograms of randomly chosen nonflesh background patches to see how well your EMD discriminates. Can it reject the background while matching the true flesh histograms?
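cvCalcEMD2() handles multidimensional signatures; for intuition, here is the 1-D special case, where EMD reduces to the L1 distance between cumulative histograms. The toy histograms are illustrative, not real flesh data.

```python
def emd_1d(p, q):
    # 1-D earth mover's distance for histograms of equal total mass:
    # accumulate the running surplus of p over q; the total absolute
    # surplus is the mass-times-distance cost of the optimal flow.
    assert abs(sum(p) - sum(q)) < 1e-9
    surplus, cost = 0.0, 0.0
    for a, b in zip(p, q):
        surplus += a - b
        cost += abs(surplus)
    return cost

model = [0.0, 1.0, 0.0, 0.0]
near  = [0.0, 0.0, 1.0, 0.0]   # same mass, shifted one bin
far   = [0.0, 0.0, 0.0, 1.0]   # shifted two bins
```

Unlike bin-to-bin measures, the cost grows smoothly with the shift, which is what makes a single distance threshold meaningful across lighting changes.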
Using your collection of hand images, design a histogram that can determine under which of the three lighting conditions a given image was captured. Toward this end, you should create features—perhaps sampling from parts of the whole scene, sampling brightness values, and/or sampling relative brightness (e.g., from top to bottom patches in the frame) or gradients from center to edges.
Assemble three histograms of flesh models, one from each of the three lighting conditions.
Use the first histograms from indoor, outdoor shaded, and outdoor sunlit as your models. Test each one of these against the second images in each respective class to see how well the flesh-matching score works. Report matches.
Use the "scene detector" you devised in part a to create a "switching histogram" model. First use the scene detector to determine which histogram model to use: indoor, outdoor shaded, or outdoor sunlit. Then use the corresponding flesh model to accept or reject the second flesh patch under all three conditions. How well does this switching model work?
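A sketch of the switching logic, with invented per-scene models (4-bin brightness histograms rather than full RGB) and a crude brightness threshold standing in for the part-a scene detector; all names and numbers here are illustrative.

```python
def intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))

# Illustrative per-scene flesh models; the real ones come from part b.
flesh_models = {
    "indoor":        [0.1, 0.6, 0.2, 0.1],
    "outdoor_shade": [0.2, 0.3, 0.4, 0.1],
    "outdoor_sun":   [0.0, 0.1, 0.3, 0.6],
}

def classify_scene(scene_mean):
    # Stand-in for the part-a scene detector: mean image brightness.
    if scene_mean < 90:
        return "indoor"
    if scene_mean < 170:
        return "outdoor_shade"
    return "outdoor_sun"

def switching_match(scene_mean, patch_hist, threshold=0.5):
    # Pick the model for the detected scene, then accept or reject
    # the flesh patch by histogram intersection against that model.
    scene = classify_scene(scene_mean)
    score = intersection(flesh_models[scene], patch_hist)
    return scene, score >= threshold

scene, accepted = switching_match(200, [0.0, 0.1, 0.35, 0.55])
```

The point of the switch is that a sunlit patch is scored against the sunlit model rather than a single global model, which is why it should outperform any one fixed histogram.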
Create a flesh-region interest (or "attention") detector. Just indoors for now, use several samples of hand and face flesh to create an RGB histogram. Use cvCalcBackProject() to find areas of flesh. Use cvErode() from Chapter 5 to clean up noise and then cvFloodFill() (from the same chapter) to find large areas of flesh in an image. These are your "attention" regions.
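Without OpenCV at hand, the pipeline can be sketched on a toy single-channel image: back-projection replaces each pixel with its histogram probability, a thresholded 3x3 minimum filter stands in for cvErode(), and a flood fill collects the large regions. All values below are synthetic.

```python
def backproject(img, hist, bins=4, maxval=256):
    # Replace each pixel with the probability of its bin under `hist`.
    step = maxval // bins
    return [[hist[p // step] for p in row] for row in img]

def erode(prob, thresh=0.3):
    # Keep a pixel only if its whole 3x3 neighborhood is probable flesh
    # (a crude stand-in for thresholding followed by cvErode()).
    h, w = len(prob), len(prob[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(prob[y + dy][x + dx] >= thresh
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out

def flood_regions(mask, min_area=4):
    # Flood-fill connected components and keep the large ones, in the
    # role cvFloodFill() plays; returns the area of each kept region.
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    areas = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                stack, area = [(sy, sx)], 0
                seen[sy][sx] = True
                while stack:
                    y, x = stack.pop()
                    area += 1
                    for ny, nx in ((y+1, x), (y-1, x), (y, x+1), (y, x-1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if area >= min_area:
                    areas.append(area)
    return areas

flesh_hist = [0.05, 0.05, 0.10, 0.80]   # invented: flesh is bright here
img = [[10] * 8 for _ in range(8)]      # dark background
for y in range(2, 6):
    for x in range(2, 6):
        img[y][x] = 200                 # a 4x4 "flesh" blob
regions = flood_regions(erode(backproject(img, flesh_hist)))
```

The surviving region is the eroded core of the blob; those cores are the "attention" regions the later exercises reuse.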
Try some hand-gesture recognition. Photograph a hand about 2 feet from the camera and create some (nonmoving) hand gestures: thumb up, thumb left, thumb right.
Using your attention detector from exercise 6, take image gradients in the area of detected flesh around the hand and create a histogram model for each of the three gestures. Also create a histogram of the face (if there's a face in the image) so that you'll have a (nongesture) model of that large flesh region. You might also take histograms of some similar but nongesture hand positions, just so they won't be confused with the actual gestures.
Test for recognition using a webcam: use the flesh interest regions to find "potential hands"; take gradients in each flesh region; use histogram matching above a threshold to detect the gesture. If two models are above threshold, take the better match as the winner.
Move your hand 1–2 feet further back and see if the gradient histogram can still recognize the gestures. Report your results.
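A sketch of a gradient-orientation histogram on a toy edge patch (a stand-in for a real hand silhouette); normalizing by total gradient magnitude is what lets the model tolerate the hand moving back, since a scaled copy of the same shape produces nearly the same normalized histogram.

```python
import math

def orientation_hist(img, bins=8):
    # Magnitude-weighted histogram of gradient orientations, computed
    # with central differences on the image interior.
    hist = [0.0] * bins
    rows, cols = len(img), len(img[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            dx = img[y][x + 1] - img[y][x - 1]
            dy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(dx, dy)
            if mag > 0:
                ang = math.atan2(dy, dx) % (2 * math.pi)
                hist[int(ang / (2 * math.pi) * bins) % bins] += mag
    total = sum(hist)
    return [v / total for v in hist] if total else hist

def vertical_edge(size):
    # Toy "gesture": a bright half against a dark half.
    return [[255 if x >= size // 2 else 0 for x in range(size)]
            for _ in range(size)]

h_near = orientation_hist(vertical_edge(16))  # hand close to the camera
h_far = orientation_hist(vertical_edge(8))    # same shape, half the size
```

Both scales put all their gradient mass in the same orientation bin, which is the behavior the exercise asks you to probe with real images.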
Repeat exercise 7 but with EMD for the matching. What happens to EMD as you move your hand back?
With the same images as before, but using captured image patches instead of histograms of the flesh around the hand, use cvMatchTemplate() instead of histogram matching. What happens to template matching when you move your hand backward so that its size is smaller in the image?
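The effect can be sketched with a brute-force squared-difference matcher (the role of cvMatchTemplate() with its CV_TM_SQDIFF method, plus cvMinMaxLoc() to find the best score); toy arrays stand in for the captured hand patches.

```python
def match_sqdiff(img, tmpl):
    # Slide `tmpl` over `img`; return the best (lowest) sum of squared
    # differences and the location where it occurs.
    th, tw = len(tmpl), len(tmpl[0])
    best, loc = None, None
    for y in range(len(img) - th + 1):
        for x in range(len(img[0]) - tw + 1):
            s = sum((img[y + j][x + i] - tmpl[j][i]) ** 2
                    for j in range(th) for i in range(tw))
            if best is None or s < best:
                best, loc = s, (y, x)
    return best, loc

template = [[200] * 4 for _ in range(4)]   # 4x4 "hand" patch

near = [[0] * 10 for _ in range(10)]       # hand at its original size
for y in range(2, 6):
    for x in range(3, 7):
        near[y][x] = 200

far = [[0] * 10 for _ in range(10)]        # hand moved back: now 2x2
for y in range(4, 6):
    for x in range(4, 6):
        far[y][x] = 200

score_near, loc_near = match_sqdiff(near, template)
score_far, _ = match_sqdiff(far, template)
```

Because the template is a fixed-size pixel patch, the match score degrades as soon as the hand's scale changes, whereas the normalized histogram representations of the earlier exercises are far less scale-sensitive.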