OpenCV existed before the data revolution in computer vision. In the late 1990s, access to large amounts of data was not simple for computer vision researchers. Fast internet access was not common, and even universities and big research institutes were not strongly networked. The limited storage capacity of personal and institutional computers did not allow researchers and students to work with large amounts of data, let alone provide the computational power (memory and CPU) required to do so. Research on large-scale computer vision problems was therefore restricted to a select list of laboratories worldwide, among them MIT's Computer Science and Artificial Intelligence Lab (CSAIL), the University of Oxford Robotics Research Group, the Carnegie Mellon (CMU) Robotics Institute, and the California Institute of Technology (CalTech) Computational Vision Group. These laboratories also had the resources to curate large amounts of data on their own to serve the work of local scientists, and their compute clusters were powerful enough to handle data at that scale.
However, the beginning of the 2000s brought a change to this landscape. Fast connections turned the internet into a hub for research and data exchange, and in parallel, compute and storage capacity increased exponentially year over year. This democratization of large-scale computer vision work led to the creation of seminal datasets for computer vision, such as MNIST (1998), CMU PIE (2000), CalTech 101 (2003), and MIT's LabelMe (2005). The release of these datasets also spurred algorithmic research on large-scale image classification, detection, and recognition. Some of the most seminal work in computer vision was enabled directly or indirectly by these datasets, for example, LeCun's handwriting recognition (circa 1990), Viola and Jones' cascaded boosting face detector (2001), Lowe's SIFT (1999, 2004), Dalal and Triggs' HOG pedestrian classifier (2005), and many more.
The second half of the 2000s saw a sharp increase in data offerings, with many large datasets released, such as CalTech 256 (2006), ImageNet (2009), CIFAR-10 (2009), and PASCAL VOC (2010), all of which still play a vital role in today's research. With the advent of deep neural networks around 2010-2012, and the momentous win of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by Krizhevsky, Sutskever, and Hinton's AlexNet (2012), large datasets became the fashion, and the computer vision world changed. ImageNet itself has grown to monstrous proportions (more than 14 million photos), and other big datasets followed, such as Microsoft's COCO (2015, with 2.5 million labeled object instances), OpenImages V4 (2017, with more than nine million photos), and MIT's ADE20K (2017, with nearly 500,000 object segmentation instances). This trend pushed researchers to think on a larger scale, and today's machine learning models that tackle such data often have tens or even hundreds of millions of parameters (in a deep neural network), compared with the dozens of parameters typical a decade earlier.
OpenCV's early claim to fame was its built-in implementation of the Viola and Jones face detection method, based on a cascade of boosted classifiers, which was a reason many chose OpenCV for their research or practice. However, OpenCV did not target data-driven computer vision at first. In v1.0, the only machine learning algorithms were cascaded boosting, hidden Markov models, and a few unsupervised methods (such as K-means clustering and expectation maximization); much of the focus was on image processing, geometric shape and morphological analysis, and so on. Versions 2.x and 3.x added a great deal of standard machine learning capability to OpenCV, among them decision trees, random forests and gradient boosted trees, support vector machines (SVM), logistic regression, Naive Bayes classification, and more. As it stands, OpenCV is not a data-driven machine learning library, and recent versions make this more obvious. The opencv_dnn core module lets developers run models trained with external tools (for example, TensorFlow) inside an OpenCV environment, with OpenCV providing the image preprocessing and post-processing around inference. Nevertheless, OpenCV plays a meaningful role in data-driven pipelines, typically handling the stages before and after model inference.
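To make this workflow concrete, here is a minimal sketch of the kind of pipeline the dnn module enables: loading a network trained elsewhere and letting OpenCV handle the preprocessing and post-processing around inference. The model file names are hypothetical placeholders, and the SSD-style detection output layout is an assumption for illustration; the exact blob parameters depend on how the network was trained.

```python
import cv2
import numpy as np

# Load a network trained outside OpenCV (here, a hypothetical TensorFlow
# frozen graph and its text config).
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb",
                                    "graph_config.pbtxt")

# Preprocessing done by OpenCV: resize, scale, mean-subtract, and reorder
# channels into the NCHW blob the network expects.
image = cv2.imread("input.jpg")
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(300, 300),
                             mean=(104, 117, 123), swapRB=True, crop=False)

net.setInput(blob)
detections = net.forward()  # raw network output

# Post-processing done by OpenCV: map detections back to image coordinates
# and draw them (assuming an SSD-style output of shape (1, 1, N, 7)).
h, w = image.shape[:2]
for det in detections[0, 0]:
    confidence = float(det[2])
    if confidence > 0.5:
        x1, y1, x2, y2 = (det[3:7] * np.array([w, h, w, h])).astype(int)
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
```

In such a pipeline, the heavy lifting of training happens in an external framework, while OpenCV supplies the image I/O, blob construction, and drawing utilities around the model.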