Extremal region filtering

Although MSERs are a common way to select which extremal regions are worth working with, the Neumann and Matas algorithm takes a different approach: it submits all extremal regions to a sequential classifier trained for character detection. This classifier works in two stages:

  1. The first stage incrementally computes descriptors (bounding box, perimeter, area, and Euler number) for each region and submits them to a classifier that estimates the probability that the region is a character. Only the high-probability regions are passed on to stage 2.
  2. The second stage computes more detailed features: the hole area ratio, the convex hull ratio, and the number of outer boundary inflexion points. These allow the classifier to discard non-character regions, but they are also much slower to calculate.

In OpenCV, this process is implemented in a class called ERFilter. It is also possible to run the filter on different single-channel projections of the image, such as R, G, B, luminance, or a grayscale conversion, to increase the character detection rate. Finally, all of the detected characters must be grouped into text blocks (such as words or paragraphs); OpenCV 3.0 provides two algorithms for this purpose.

Note that since these operations require classifiers, a trained set must also be provided as input. OpenCV 4.0 provides some of these trained sets in the following sample package: https://github.com/opencv/opencv_contrib/tree/master/modules/text/samples. This also means that the algorithm is sensitive to the fonts used when the classifiers were trained.

A demonstration of this algorithm can be seen in the following video, provided by Neumann himself: https://www.youtube.com/watch?v=ejd5gGea2Fo&feature=youtu.be. Once the text is segmented, it just needs to be sent to an OCR engine such as Tesseract, much as we did in Chapter 10, Developing Segmentation Algorithms for Text Recognition. The only difference is that now we will use the OpenCV text module classes to interface with Tesseract, since they provide a way to encapsulate the specific OCR engine we are using.