Computer vision is a vast world of knowledge and a decades-long research pursuit. Unlike many other disciplines, computer vision is not strongly hierarchical or vertical, which means new solutions for a given problem are not always better than, or even based on, preceding work. Because it is an applied field, computer vision algorithms are created with attention to the following aspects, which may explain the non-vertical development:
- Computation resources: CPU, GPU, embedded system, memory footprint, network connectivity.
- Data: Size of images, number of images, number of image streams (cameras), data type, sequentiality, lighting conditions, types of scenes, and so on.
- Performance requirements: Real-time output or other timing constraints (for example, human perception), accuracy, and precision.
- Meta-algorithmic: Algorithm simplicity (see Occam's razor), implementation system and external tools, availability of a formal proof.
Since every algorithm is created to cater to a particular subset of these considerations, one can never know for sure that it will outperform all others without properly testing some or all of them. Granted, testing all algorithms for a given problem is unrealistic, even if the implementations are indeed available, and OpenCV certainly has many implementations available, as we saw in the last section. On the other hand, computer vision engineers would be remiss if they did not consider the possibility that their implementations are not optimal because of their algorithm choices. This, in essence, flows from the no free lunch theorem, which states, in broad strokes, that no single algorithm is the best one over the entire space of possible datasets.
It is, therefore, a very welcome practice to test a set of different algorithmic options before committing to the best one out of that set. But how do we find the best one? The word best implies that each one will be better (or worse) than the others, which in turn suggests there is an objective scale or measurement on which they are all scored and ranked. Obviously, there is not a single measure (metric) for all algorithms in all problems; each problem will have its own. In many cases, the metric for success will be a measurement of error, the deviation from a known ground truth value that was sourced from humans or other algorithms we can trust. In optimization, this is known as a loss function or cost function, a function that we wish to minimize (or sometimes maximize) in order to find the best option, the one with the lowest score. Another prominent class of metrics cares less about the output quality (for example, error) and more about runtime, memory footprint, capacity and throughput, and so on.
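To make the loss-function idea concrete, here is a minimal sketch (my own illustration, with made-up values) that scores the outputs of two hypothetical algorithms against a ground-truth array using the mean absolute error; under this metric, the option with the lower score wins:

```python
import numpy as np

# Hypothetical ground truth and the outputs of two candidate algorithms (assumed values)
ground_truth = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
output_a = np.array([1.1, 2.3, 2.8, 4.2, 5.1])
output_b = np.array([0.8, 2.0, 3.5, 3.6, 5.4])

def mean_absolute_error(prediction, truth):
    """Loss function: average absolute deviation from the ground truth."""
    return np.mean(np.abs(prediction - truth))

for name, output in [("A", output_a), ("B", output_b)]:
    print(f"Algorithm {name}: MAE = {mean_absolute_error(output, ground_truth):.3f}")
```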
The following is a partial list of metrics we may see in select computer vision problems (a short sketch of computing a couple of them follows the table):
| Task | Example metrics |
| --- | --- |
| Reconstruction, registration, feature matching | Mean absolute error (MAE) |
| Object classification, recognition | Accuracy, precision, recall, F1-score, false-positive rate (FPR) |
| Segmentation, object detection | Intersection-over-Union (IoU) |
| Feature detection | Repeatability |
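As a small illustration (with assumed numbers of my own, not taken from the text), the following computes precision, recall, and F1-score from hypothetical confusion-matrix counts, and the IoU of two axis-aligned boxes:

```python
# Classification metrics from hypothetical confusion-matrix counts
tp, fp, fn = 80, 10, 20  # true positives, false positives, false negatives (assumed)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Intersection-over-Union for two axis-aligned boxes given as (x, y, width, height)
def iou(box_a, box_b):
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap rectangle corners
    x1, y1 = max(ax, bx), max(ay, by)
    x2, y2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

print(f"IoU = {iou((0, 0, 100, 100), (50, 50, 100, 100)):.3f}")
```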
The way to find the best algorithm for a given task is either to put all the options at our disposal through a test scenario and measure their performance on the metrics of choice, or to obtain someone else's measurements on a standard experiment or dataset. The highest-ranking option should be picked, where the ranking is derived from a combination of the metrics (in the case of just a single metric, it's an easy task). Next, we will try our hand at such a task, and make an informed choice on the best algorithm.
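Before moving on, here is a rough sketch of what such a head-to-head test could look like, under assumptions of my own: it times a few OpenCV feature detectors (ORB, AKAZE, BRISK) on a synthetic image and reports keypoint counts and runtime, ranking by a single metric. The set of detectors, the test image, and the ranking criterion are illustrative choices, not a recommendation from the text.

```python
import time
import numpy as np
import cv2

# Synthetic grayscale test image; in practice, use representative images from your problem domain
img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)

# Candidate algorithms (an assumed set, for illustration only)
detectors = {
    "ORB": cv2.ORB_create(),
    "AKAZE": cv2.AKAZE_create(),
    "BRISK": cv2.BRISK_create(),
}

results = []
for name, det in detectors.items():
    start = time.perf_counter()
    keypoints, descriptors = det.detectAndCompute(img, None)
    elapsed = time.perf_counter() - start
    results.append((name, len(keypoints), elapsed))

# Rank by runtime (a single metric keeps the ranking trivial)
for name, n_kp, elapsed in sorted(results, key=lambda r: r[2]):
    print(f"{name:6s} keypoints={n_kp:5d} time={elapsed * 1000:.1f} ms")
```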