Sending the output to a file

Let's change our main method to send the recognized output to a file. We do this by using a standard ofstream:

#include <cstring>   // strlen
#include <fstream>   // std::ofstream

int main(int argc, char* argv[])
{
   // Load the ticket image and binarize it
   Mat ticket = binarize(imread("ticket.png"));
   auto regions = findTextAreas(ticket);

   // Open the output file in binary mode
   std::ofstream file;
   file.open("ticket.txt", std::ios::out | std::ios::binary);

   // For each region
   for (const auto& region : regions) {
      // Deskew and crop the region, then run OCR on it
      auto cropped = deskewAndCrop(ticket, region);
      auto text = identifyText(cropped, "por");

      // Write the UTF-8 bytes, followed by a line break
      file.write(text, strlen(text));
      file << std::endl;
   }

   file.close();
   return 0;
}

The following line opens the file in binary mode:

file.open("ticket.txt", std::ios::out | std::ios::binary);

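This is important because Tesseract returns text encoded in UTF-8, where characters outside the ASCII range (such as the accented letters used in Portuguese) take more than one byte. In binary mode, the stream performs no newline translation, so the bytes reach the file exactly as Tesseract produced them. We also write the output directly, passing the buffer and its length in bytes (strlen counts bytes, not characters):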

file.write(text, strlen(text));
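
To make the difference between bytes and characters concrete, here is a minimal sketch that does not depend on Tesseract. The helper names writeUtf8Line and readBytes, as well as the sample file name, are assumptions made for illustration and are not part of the book's code.

```cpp
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not part of the book's code): writes a UTF-8
// encoded C string to a file opened in binary mode, followed by '\n'.
// Returns the number of bytes in the text, excluding the newline.
std::size_t writeUtf8Line(const std::string& path, const char* text)
{
    std::ofstream file(path, std::ios::out | std::ios::binary);
    const std::size_t bytes = std::strlen(text); // bytes, not characters
    file.write(text, static_cast<std::streamsize>(bytes));
    file.put('\n');
    return bytes;
}

// Hypothetical helper: reads a whole file back as raw bytes, so the
// result can be compared with what was written.
std::string readBytes(const std::string& path)
{
    std::ifstream file(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(file),
                       std::istreambuf_iterator<char>());
}
```

For example, the Portuguese word "ação", spelled out as the UTF-8 bytes "a\xC3\xA7\xC3\xA3o", has four characters, but writeUtf8Line("sample.txt", "a\xC3\xA7\xC3\xA3o") returns 6, and readBytes("sample.txt") gives back those same six bytes plus the newline.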

In the main method above, we called the identifyText function with "por" (Portuguese) as the input language, since this is the language the ticket was written in. You may use another photo, if you like, as long as you pass the matching language.

The complete source is provided in the segmentOcr.cpp file, which comes with this book.

ticket.png is a low-resolution image, chosen because we imagined you would want to display it in a window while studying this code. For this image, the Tesseract results are rather poor. If you want to test with a higher-resolution image, the code for this book also provides ticketHigh.png. To use it, change the number of dilation repetitions to 12 and the minimum box size from 20 to 60. You'll get a much higher confidence rate (about 87%), and the resulting text will be almost fully readable. The segmentOcrHigh.cpp file contains these modifications.