Although MSERs are a common way to select which extremal regions are worth working with, the Neumann and Matas algorithm takes a different approach: it submits all extremal regions to a sequential classifier trained for character detection. This classifier works in two stages:
- The first stage incrementally computes descriptors (bounding box, perimeter, area, and Euler number) for each region. These descriptors are submitted to a classifier that estimates the probability that the region is a character. Only the high-probability regions are selected for stage 2.
- The second stage computes more informative features: the hole area ratio, the convex hull ratio, and the number of outer boundary inflexion points. These allow the classifier to discard non-character regions, but they are also much slower to calculate.
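To make the stage-1 descriptors concrete, here is a simplified sketch that computes them for a finished binary region mask. Note that this is an illustration, not the actual ERFilter code: the real implementation updates these values incrementally as the threshold grows, rather than scanning a final mask. The Euler number (connected components minus holes) is obtained by Gray's 2x2 quad-counting method, assuming 4-connectivity.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stage-1 descriptors for a single extremal region.
struct RegionDescriptors {
    int min_x, min_y, max_x, max_y; // bounding box
    int area;                       // foreground pixel count
    int perimeter;                  // foreground/background 4-edges
    int euler;                      // connected components minus holes
};

RegionDescriptors computeDescriptors(const std::vector<std::vector<int>>& mask) {
    const int h = static_cast<int>(mask.size());
    const int w = static_cast<int>(mask[0].size());
    // Zero-padded access so border pixels are handled uniformly.
    auto at = [&](int y, int x) {
        return (y >= 0 && y < h && x >= 0 && x < w) ? mask[y][x] : 0;
    };
    RegionDescriptors d{w, h, -1, -1, 0, 0, 0};
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (mask[y][x]) {
                ++d.area;
                d.min_x = std::min(d.min_x, x);
                d.min_y = std::min(d.min_y, y);
                d.max_x = std::max(d.max_x, x);
                d.max_y = std::max(d.max_y, y);
                // Every background 4-neighbour contributes one perimeter edge.
                d.perimeter += !at(y - 1, x) + !at(y + 1, x) +
                               !at(y, x - 1) + !at(y, x + 1);
            }
    // Euler number via 2x2 quad counting (Gray's algorithm, 4-connectivity):
    // count quads with one foreground pixel (q1), three (q3), and
    // two diagonally opposed (qd).
    int q1 = 0, q3 = 0, qd = 0;
    for (int y = -1; y < h; ++y)
        for (int x = -1; x < w; ++x) {
            int n = at(y, x) + at(y, x + 1) + at(y + 1, x) + at(y + 1, x + 1);
            if (n == 1) ++q1;
            else if (n == 3) ++q3;
            else if (n == 2 && at(y, x) == at(y + 1, x + 1)) ++qd; // diagonal pair
        }
    d.euler = (q1 - q3 + 2 * qd) / 4;
    return d;
}
```

For example, a single isolated pixel yields area 1, perimeter 4, and Euler number 1, while a 3x3 ring (one component with one hole) yields Euler number 0.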
In OpenCV, this process is implemented in the ERFilter class. It is also possible to run the filter on several single-channel projections of the image, such as the R, G, and B channels, luminance, or a grayscale conversion, to increase the character detection rate. Finally, all of the characters must be grouped into text blocks (such as words or paragraphs). OpenCV 3.0 provides two algorithms for this purpose:
- Prune exhaustive search: Also proposed by Matas in 2011, this algorithm does not need any previous training or classification, but it is limited to horizontally aligned text
- Hierarchical method for oriented text: This deals with text in any orientation, but needs a trained classifier
The need for a trained classifier also means that this algorithm is sensitive to the fonts used during training.
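The idea behind grouping horizontally aligned text can be illustrated with a much-simplified sketch (this is not OpenCV's actual pruned exhaustive search): character candidates, reduced to bounding boxes, are chained into a line when their vertical extents overlap and the horizontal gap between them is small. The gap threshold used here (at most one character width) is an illustrative assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A character candidate reduced to its bounding box.
struct Box { int x, y, w, h; };

// Two boxes belong to the same text line when their vertical extents
// overlap and the horizontal gap between them is at most roughly one
// character width (an assumed, illustrative threshold).
bool sameLine(const Box& a, const Box& b) {
    bool vOverlap = a.y < b.y + b.h && b.y < a.y + a.h;
    int gap = std::max(b.x - (a.x + a.w), a.x - (b.x + b.w));
    return vOverlap && gap <= std::max(a.w, b.w);
}

// Greedy left-to-right grouping of character candidates into lines.
std::vector<std::vector<Box>> groupHorizontal(std::vector<Box> chars) {
    std::sort(chars.begin(), chars.end(),
              [](const Box& a, const Box& b) { return a.x < b.x; });
    std::vector<std::vector<Box>> lines;
    for (const Box& c : chars) {
        bool placed = false;
        for (auto& line : lines)
            if (sameLine(line.back(), c)) {
                line.push_back(c);
                placed = true;
                break;
            }
        if (!placed) lines.push_back({c});
    }
    return lines;
}
```

Three boxes on one baseline and a fourth box far below would produce two groups: one line of three characters and one of a single character.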
A demonstration of this algorithm, provided by Neumann himself, can be seen in the following video: https://www.youtube.com/watch?v=ejd5gGea2Fo&feature=youtu.be. Once the text is segmented, it just needs to be sent to an OCR engine such as Tesseract, similar to what we did in Chapter 10, Developing Segmentation Algorithms for Text Recognition. The only difference is that now we will use the OpenCV text module classes to interface with Tesseract, since they encapsulate the specific OCR engine we are using.