Introducing optical character recognition

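Identifying text in an image is a very popular computer vision application. This process is commonly called optical character recognition (OCR), and it is divided into two phases: text preprocessing and segmentation, which prepares the image and finds the areas that may contain text, and text identification, which recognizes the characters in those areas.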

The preprocessing and segmentation phase can vary greatly depending on the source of the text. Let's take a look at common situations where preprocessing is done:

OCR application strategies also vary with the goal of the identification. Will the text be used for a full-text search, or should it be separated into logical fields that index a database for a structured search?

In this chapter, we will focus on preprocessing scanned text or text that has been photographed by a camera. We'll assume that the text is the main subject of the image, as in a photographed piece of paper or a card, such as this parking ticket:

We'll try to remove common noise, deal with text rotation (if any), and crop the possible text regions. Although most OCR APIs already do these things automatically, probably with state-of-the-art algorithms, it is still worth knowing how they work under the hood. This will help you understand the parameters of most OCR APIs and give you a better grasp of the problems you may face with OCR.
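To make those three steps concrete, here is a minimal plain-Python sketch on a tiny synthetic image. It is an illustration under stated assumptions, not a definitive implementation: the image is a list of rows of gray values from 0 (black) to 255 (white), a fixed threshold of 128 stands in for a proper binarization such as Otsu's method, "noise removal" only drops isolated ink pixels, and rotation is estimated with a simple projection-profile search over whole-degree angles. All function names (binarize, remove_isolated_pixels, estimate_skew, crop_to_text) are hypothetical.

```python
import math
from collections import Counter


def binarize(gray, threshold=128):
    """Mark dark pixels as ink (1) and everything else as background (0).

    Assumption: a fixed global threshold; real scans usually need an
    adaptive or Otsu threshold instead.
    """
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def remove_isolated_pixels(binary):
    """Drop ink pixels with no ink among their 8 neighbours (salt noise)."""
    h, w = len(binary), (len(binary[0]) if binary else 0)
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            has_neighbour = any(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                out[y][x] = 0
    return out


def estimate_skew(binary, angles=range(-15, 16)):
    """Projection-profile skew estimate, in degrees.

    Each ink pixel is projected onto the axis perpendicular to a text line
    at the candidate angle. When the angle matches the text, pixels of one
    line land in the same bin, so the sum of squared bin counts peaks.
    Positive angles mean lines descend to the right (image y grows down).
    """
    ink = [(x, y) for y, row in enumerate(binary) for x, v in enumerate(row) if v]
    if not ink:
        return 0
    best_angle, best_score = 0, -1
    for angle in angles:
        a = math.radians(angle)
        bins = Counter(round(y * math.cos(a) - x * math.sin(a)) for x, y in ink)
        score = sum(c * c for c in bins.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def crop_to_text(binary):
    """Crop to the bounding box of all ink pixels; [] if there is no ink."""
    rows = [y for y, row in enumerate(binary) if any(row)]
    if not rows:
        return []
    width = len(binary[0])
    cols = [x for x in range(width) if any(row[x] for row in binary)]
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


if __name__ == "__main__":
    page = [[255] * 60 for _ in range(30)]  # a white 60x30 "page"
    for x in range(5, 45):                  # a thick stroke standing in for a text line
        page[10][x] = page[11][x] = 20
    page[25][55] = 0                        # an isolated speck of noise
    ink = remove_isolated_pixels(binarize(page))
    print(estimate_skew(ink))               # 0 for a horizontal line
    text = crop_to_text(ink)
    print(len(text), len(text[0]))          # 2 40
```

Estimating the angle is only half of deskewing: the image would still have to be rotated back by that angle before cropping. In OpenCV this is typically done with cv::getRotationMatrix2D followed by cv::warpAffine, and cv::threshold with the THRESH_OTSU flag is a common replacement for the fixed threshold used here.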