Choose OCR Software

One way or another, you need software to turn your raw scans into searchable PDFs. OCR software of some sort most likely came with your scanner, but if it didn’t—or if you’re not happy with its features or accuracy—you have oodles of other choices. This chapter provides an overview of major factors to consider when choosing Mac-compatible OCR software, along with a few specific suggestions for software to try (or avoid).

Determine Your Needs

I haven’t tried every scanner and every OCR application out there, but I’m going to go out on a limb and suggest that almost any combination of scanner and software can be made to yield acceptable results for most users. If you don’t want to agonize over the decision, the path of least resistance is to use whatever software came with your scanner.

However, you may be the sort of person who should look more deeply into the capabilities of OCR tools before jumping in if any of the following statements apply to you:

If you have none of those needs, feel free to skip to the next chapter, Configure Your Software. Otherwise, continue reading to learn about features to consider when evaluating OCR software.

Consider Important OCR Features

Comparing OCR apps for Mac OS X is less of a science than an art—and a messy one at that. The information available on developers’ Web sites varies tremendously in scope and detail. Some have elaborate user manuals, while others have only a brief how-to guide. Many offer downloadable demo versions, but some don’t. Developers use different terms to describe the same features, and have wildly divergent ideas about what constitutes a nicely usable interface. A feature that one developer considers too obvious to mention may be a main selling point for another. And although most of these applications claim to have outstanding OCR accuracy, objective measurements are notoriously difficult to come by.

In short, it’s harder than one might expect to evaluate OCR software without trying it out (and even then, results may be ambiguous). However, a few factors are worth looking for:

With those thoughts in mind, let’s look at the range of OCR programs that you can choose from. (I offer my recommendations after the list of applications, in Joe’s OCR Software Recommendations.)

Tags vs. Folders

I mentioned that, starting in 10.9 Mavericks, the Finder supports something called tags—and so do numerous other apps, including DEVONthink and Yojimbo. So, what is a tag anyway?

A tag is a text label you create yourself—a word or phrase that tells you something about what’s in a document. For example, you might use the word recipe as a tag and apply it to any document that contains a recipe. That would enable you to quickly locate recipes later, even if none of them contain the actual word “recipe” in their title or contents.

Any document can have more than one tag. So you might apply the tags recipe, dessert, French, and vegetarian to your recipe for crème brûlée. Other documents would have different combinations of tags, and as you search by tags, only those that match the combination you specify would show up.

You can use tags instead of, or in addition to, hierarchical filing. You might have tax-related documents scattered in dozens of folders on your disk, but if they’re all tagged with tax info, you can instantly see them all in one place. (Some apps, including the Finder, also associate a color with each text label to make them easier to identify quickly.)
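To make the idea of searching by a combination of tags concrete, here is a minimal Python sketch. It is only an illustration, not how the Finder or any particular app stores tags: the `Document` class, the `find_by_tags` function, and the file names are all hypothetical, and the case-insensitive matching is an assumption made for convenience.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A hypothetical document record: a file name plus a set of tags."""
    name: str
    tags: set[str] = field(default_factory=set)


def find_by_tags(documents, required):
    """Return the documents that carry every tag in `required`.

    Matching ignores case (an assumption for this sketch), so "french"
    matches a document tagged "French".
    """
    wanted = {tag.lower() for tag in required}
    return [
        doc for doc in documents
        if wanted <= {tag.lower() for tag in doc.tags}
    ]


# Example library; the file names are made up for illustration.
library = [
    Document("creme-brulee.pdf", {"recipe", "dessert", "French", "vegetarian"}),
    Document("ratatouille.pdf", {"recipe", "French", "vegetarian"}),
    Document("2013-return.pdf", {"tax info"}),
]

# Only documents that have BOTH tags are returned.
matches = find_by_tags(library, ["recipe", "dessert"])
print([doc.name for doc in matches])
```

Because each document holds a set of tags rather than a single folder location, the same file can turn up in any number of searches, which is the property that lets tags complement a folder hierarchy.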

Pick a Mac OCR Package

The number and variety of Mac apps that can produce a searchable PDF are growing constantly. As I said earlier, if whatever software was bundled with your scanner yields results you find acceptable, there’s no need to look further. But if you’re looking for a better OCR package than what you have now, you should find no shortage of choices.

In the first edition of this book, I described 21 OCR apps. That list went out of date almost immediately. And frankly, I consider only a fraction of those apps to be noteworthy. So, just as I did with scanners, I’ve relocated the information on OCR software to the online appendixes, where you can peruse features and prices at your leisure, and where I can more easily keep them up to date.

Here, I want to call your attention to just a few of those choices that I find particularly interesting for one reason or another. If an OCR tool you’re wondering about isn’t listed here, check the online appendixes.

Notable OCR tools for Mac include:

Tip: To learn all about DEVONthink, including how to use it for OCR and general-purpose document management, read my book Take Control of Getting Started with DEVONthink 2.

Tip: For a thorough tour of PDFpen and PDFpen Pro, read Michael Cohen’s Take Control of PDFpen 6.

OCR in the Cloud

A desktop application isn’t the only way to perform OCR on a scanned image. You can also use your Web browser to upload it to a special service, wait a few minutes for the server to perform the OCR, and then download a searchable PDF. ABBYY FineReader Online is such a service; using the same technology as the desktop version of ABBYY FineReader, it recognizes the text in a scanned or photographed image and converts it not only to PDF but also to an editable Word, Excel, RTF, or text file.

I’ve tried the service and it works, but because you must upload files one at a time, it makes sense only for occasional use (such as when you need to do OCR on the road and have no other tools available). Prices start at $3 per 20 pages, and decrease as you buy more; a 3-page demo account is free.

Another OCR-in-the-Cloud option, Free OCR, is, as the name suggests, free. But it’s even more awkward to use than ABBYY FineReader Online, and has several frustrating limitations—including the fact that it delivers only the recognized text by itself, not a searchable PDF.

Joe’s OCR Software Recommendations

Of the applications listed in the online appendixes, I have experience with about half. All the Mac OCR tools I’ve tried have had adequate (if not always great) accuracy, but some are easier to use than others.

My preference is for a tool that works more or less invisibly behind the scenes. I like to configure things so that images from my scanner get the OCR treatment without interrupting my work or taking over my screen. A few OCR applications—Presto PageManager, OmniPage, and Readiris Pro—are what I think of as “old school,” in that their design assumes you’ll open the application, initiate a single-page scan from within it (typically, on a flatbed scanner), watch the OCR as it progresses, edit the final document, and then save it. There’s nothing wrong with any of that—and all three of those applications can be used in a much more automated, hands-off way—but I tend to gravitate toward apps with a more modern, minimalist approach.

I’ve been pleased with the results, interface, and flexibility of ABBYY FineReader—especially the versions of it included with Fujitsu’s ScanSnap scanners and DEVONthink Pro Office. It’s reasonably fast, recognizes text in multiple languages without any fuss (for several years I scanned about equal amounts of English and French text), and requires no interaction under normal circumstances.

If you need an uncommon feature, you should go for one of the tools that offers it. Otherwise, if you’re looking strictly for OCR, I’d lean toward ABBYY FineReader, either an embedded version or the stand-alone FineReader Express. For PDF editing, I’d choose PDFpen over the much-more-expensive and automation-unfriendly Acrobat XI Pro; if document management is your focus, I’d go with DEVONthink Pro Office; and if you particularly need to deal with receipts, either Neat for Mac or Paperless is a fine choice.

OCR for Sheet Music

When I think about “business documents,” what normally comes to mind are pages full of text, tables, and graphics. But if your business happens to involve music, you may be interested to know that you can perform advanced OCR on sheet music, too! Specialized software can recognize notes and other musical symbols, transforming a scanned sheet into a file readable by various popular notation and scoring applications—and even into a MIDI file that you can play back immediately.

I’m aware of two Mac applications that can do this: