Configure Your Software

The best OCR software in the world can still produce lousy results if you don’t set it up just so and give it the best possible input material to work with. You’re looking for a combination of settings that gives you the best balance of OCR accuracy, processing speed, image quality, and file size. I help you figure out what those are in this chapter. I also show you several ways to automate scanning so that it takes as little manual effort as possible, and provide guidance about how to file your scanned documents so you can find and use them quickly in the future.

Understand the Scanning Process

The fact that your scanner includes OCR software, or that you’ve purchased such software separately, doesn’t necessarily mean that the process of creating a searchable PDF from a scanned document will be straightforward. It might be, but more often than not, it’s necessary to think through a multi-stage process, which may involve configuring the settings in two or more pieces of software.

Every scanner comes with customized software that handles the low-level communication between the scanner and your computer. For example, if you have a Fujitsu ScanSnap, the scanner-specific software is called ScanSnap Manager; with a Canon imageFORMULA scanner you’d use Canon CaptureOnTouch; with an Epson scanner it would be Epson Scan; and so on. This software is responsible for taking the raw data your scanner produces and turning it into a bitmap image stored on your hard disk. As a result, this software always provides some means of setting preferences such as resolution, destination, and file format. The scanner’s software may include many other capabilities, too, but for the moment, assume that its only purpose is to spit out a bitmap image, as shown in the top row of Figure 1.

If you were scanning photos, then the bitmap image would be all that you’d need. But for scanned documents, an additional step is generally necessary (the bottom row in Figure 1)—another piece of software opens the bitmap image, performs OCR on it, and generates a searchable PDF file.

**Figure 1:** Produce searchable PDFs from paper documents with as little manual effort as possible—ideally, no more than a single button press.

Since you want to avoid manual effort whenever you scan, you may feel some concern about the fact that two or more applications may be involved. Fortunately, the process has several potential shortcuts:

Now that you know where we’re going, your first step is to consult the documentation that came with your scanner and open whatever application is responsible for managing those low-level settings such as resolution and destination. In the next couple of topics, I give you some guidance as to how you should configure that software to end up with a bitmap image suitable for OCR, and to send that image (if possible) directly to the software that’ll handle the OCR process.

Because space doesn’t permit me to give detailed instructions for every application, I illustrate most of the settings that follow with examples from the most recent version of Fujitsu’s ScanSnap Manager software. If you’re using software from another manufacturer, the options and wording will vary, but you should find roughly comparable settings.

Choose Your Main Scanning Options

Of the many settings you may want to consider, three are of particular importance because they affect file size and OCR accuracy, among other things: Resolution, Color Mode, and Compression. I look at each of those in turn, and then discuss how they fit together.

Resolution

The first decision to make is the resolution at which your scanner will save images of scanned documents. Almost every scanner mentioned in this book has an optical resolution of 600 dpi, so that’s the maximum—but they can be set to scan at lower resolutions, too. The choice of this single number has significant implications:

You may have to experiment with resolution settings (in combination with color mode and compression) to find what works best for you. My experiments suggest that a good starting point is 300 dpi for grayscale and color scans and 600 dpi for black-and-white scans.

Color Mode

By color mode, I’m referring to whether the scanned image is black-and-white, grayscale, or color.

Black-and-white bitmaps take up very little disk space, while grayscale images take up more space, and color images still more. However, I’ve found that compared to black-and-white scans, grayscale or color images tend to have significantly better OCR accuracy, as well as superior reprint fidelity—even at a lower resolution (for example, a 300-dpi grayscale image produces better OCR accuracy than a 600-dpi black-and-white image). And, with careful attention to compression (discussed next), the file sizes need not be unreasonably large. In fact, because of the way some OCR software alters scanned images, you may paradoxically end up with far larger files with a black-and-white scan than with a grayscale scan at the same resolution!
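To make the size relationship concrete, here is a rough back-of-the-envelope calculation in Python. It assumes a US Letter page (8.5 by 11 inches) and bit depths of 1, 8, and 24 bits per pixel for black-and-white, grayscale, and color. The function name and page dimensions are illustrative choices, and the figures describe uncompressed bitmaps only, so real files will be far smaller once compression is applied.

```python
# Rough uncompressed size of one scanned page side; all names are illustrative.
BITS_PER_PIXEL = {"bw": 1, "gray": 8, "color": 24}

def raw_page_bytes(dpi, mode, width_in=8.5, height_in=11.0):
    """Return the approximate uncompressed bitmap size in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * BITS_PER_PIXEL[mode]
    return total_bits // 8  # ignores any per-row padding a real format adds

for dpi, mode in [(600, "bw"), (300, "gray"), (300, "color"), (600, "gray")]:
    megabytes = raw_page_bytes(dpi, mode) / 1_000_000
    print(f"{dpi} dpi {mode:>5}: {megabytes:.1f} MB")
```

Under these assumptions, a 300 dpi grayscale page starts out at about twice the raw size of a 600 dpi black-and-white page (roughly 8.4 MB versus 4.2 MB), and doubling the resolution quadruples the raw size.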

OCR accuracy and file size aside, if colors or gray shades are essential for the documents you’re scanning (for example, ones that include photographs or artwork), you’ll want to set the appropriate color mode. Your scanner may have an Auto setting that enables it to figure out the proper color mode as it scans, which may be even better (but do a few spot checks to confirm that it’s making wise choices).

Compression

The bitmap files produced by your scanner can be compressed using a variety of methods in order to reduce the file size. With black-and-white images, the compression is normally lossless (no information is discarded), whereas with grayscale and color scans, lossy compression is normally used—it shrinks the files much more, but decreases clarity at higher compression settings. Excessive compression can reduce OCR accuracy, not to mention making images less attractive, while using little or no compression results in unreasonably large files. So, in my experience, a medium compression setting is usually the best compromise.

However, let me qualify that in two ways.

First, although some OCR apps (such as PDFpen) leave the bitmap image alone and simply add the recognized text to the PDF, others try to recompress the image after recognizing the text. Usually this results in smaller files, but other times—particularly with black-and-white scans—this process uses a less-effective compression method that actually increases the file size, sometimes dramatically. (The stand-alone version of ABBYY FineReader Express, which is excellent in most other respects, happens to be a culprit in the latter category.) All that to say: no matter what you choose in terms of initial compression for your images, there’s no guarantee that your choice will be honored by software that processes the file later on. File sizes may get much better—or much worse.

Second, one OCR app—Readiris Pro—includes a proprietary compression algorithm called iHQC, which the developer claims can reduce storage space “up to 50 times,” with no loss of visual fidelity. Even if you don’t use Readiris Pro for OCR, you can use a separate tool called IRISCompressor to shrink PDFs after the fact, if small file size is crucial. (I have not tested iHQC, so I can’t comment on its effectiveness from personal experience.)
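The difference between lossless and lossy compression can be illustrated with a toy sketch in Python. This is not what scanner software actually does (scanners use image-specific codecs). It substitutes the standard library's zlib for the lossless step and a crude rounding of gray levels for the lossy step, and the synthetic "page" is made up purely for illustration.

```python
import random
import zlib

random.seed(1)
# Synthetic grayscale "page": near-white paper with slight scanner noise.
page = bytes(random.randint(235, 255) for _ in range(100_000))

# Lossless: decompressing gives back exactly the original bytes.
lossless = zlib.compress(page, 9)
assert zlib.decompress(lossless) == page

# Lossy stand-in: round each pixel down to a multiple of 8 before compressing.
# Fine detail is discarded for good, which is why the result is smaller.
quantized = bytes((p // 8) * 8 for p in page)
lossy = zlib.compress(quantized, 9)

print(len(page), len(lossless), len(lossy))
```

The lossy version is the smallest of the three, but the discarded detail cannot be recovered later, which is the tradeoff behind choosing a medium setting rather than the most aggressive one.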

Putting It All Together

Juggling three different variables to get the best combination of file size, OCR accuracy, and other benefits is no mean feat. I’ve performed hundreds of experiments with many combinations of settings and software in an attempt to find the sweet spot for my own needs, but there’s no single answer that’s ideal for every situation.

In general, I’ve found that 300 dpi grayscale scans, with medium compression, yield the most favorable tradeoff between file size and OCR accuracy. They also look very good and don’t tax my computer’s CPU excessively. Of course, if you work with different sorts of documents, different hardware, or different software than I do, you may reach other conclusions.

In any case, all these settings apply to the initial scanned image, before any OCR takes place. That means you must configure them in the scanner’s software package.

To illustrate with one version of ScanSnap Manager, the Scanning tab (Figure 2, slightly ahead) offers the following choices:

Note: The ScanSnap’s optical resolution is only 600 dpi, but it can use a software trick called interpolation to simulate a higher resolution—in this case, 1200 dpi.

The “Excellent” option results in much slower scans as well as huge files, and I’ve never found that extra-high resolution to be helpful.
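As a rough illustration of why interpolated resolution adds pixels but not detail, the following Python sketch doubles a single row of gray values by averaging neighbors. It is a simplified stand-in, not the ScanSnap's actual algorithm (which isn't described here), and the function name and sample values are invented.

```python
def upsample_row_2x(row):
    """Double the sample count of a row by inserting averaged neighbors."""
    if not row:
        return []
    out = []
    for left, right in zip(row, row[1:]):
        out.append(left)
        out.append((left + right) // 2)  # invented value between two real ones
    out.append(row[-1])
    out.append(row[-1])  # repeat the last sample to reach exactly 2x length
    return out

scanned = [0, 0, 255, 255, 0]  # samples captured at the optical resolution
print(upsample_row_2x(scanned))
# prints [0, 0, 0, 127, 255, 255, 255, 127, 0, 0]
```

Every inserted value is derived from its neighbors, so the file gets bigger without capturing any additional detail from the page.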

**Figure 2:** In ScanSnap Manager, choose a resolution from the Image Quality pop-up menu. The menu may be somewhat different depending on which version of the software you’re using.

Although I had Image Quality set to Automatic Resolution and Color Mode set to Auto Color Detection for years without any apparent problems, my latest experiments have led me to prefer an Image Quality setting of Best (for improved OCR accuracy) and a Color Mode of Gray (to avoid the possibility of getting black-and-white images, which not only have lower OCR accuracy but which can grow precipitously when fed through certain third-party applications).

Tip: Although I use ScanSnap Manager as an example, I can’t get into all its many configuration details. If you’d like to know more about setting up a ScanSnap and find Fujitsu’s documentation wanting, check out The Unofficial ScanSnap Setup Guide, a $5 ebook by Brooks Duncan, who runs the DocumentSnap Web site.

Set the Destination

The next choice you must make is the destination for the bitmap file. In general, you can choose either of the following:

**Figure 3:** In ScanSnap Manager, choose an application on the Application tab to send scanned documents directly to it.

**Figure 4:** Most scanner software lets you choose what format filenames have when documents are first scanned.

ScanSnap Manager and OCR

If you have a Fujitsu ScanSnap scanner, you may notice that its software offers up to four ways of performing OCR:

Fujitsu offers little explanation of these options and scant guidance as to when you should use which. You may have to experiment to find what works best for you. My own preference is “Use another application” with DEVONthink Pro Office, which has its own embedded (and reasonably customizable) version of ABBYY FineReader.

Your scanner software may offer more than just these two options (send scans to an application or save them to a file). For example, ScanSnap Manager lets you choose a feature called Quick Menu by selecting the Use Quick Menu checkbox at the top of the window. When you do this, every time you scan a document, a special window (the eponymous Quick Menu) pops up to let you choose what to do with that particular scan. This approach is useful if you do a variety of activities with scanned documents (for example, you print some, you email some, and you send still others to iPhoto). But since I almost always want to save my own scans as searchable PDFs, I leave this feature turned off most of the time.

Set the File Format

Your document will eventually end up as a PDF, but the bitmap image that your scanner produces initially could in theory be in any of several common formats, such as TIFF, JPEG, PNG, and (naturally) PDF. Most scanning software lets you choose the bitmap file format, although some offer more choices than others.

Here are my recommendations, in order from most to least preferable:

Set Other Scanning Options

The list of other scanning options you may be able to set is long and highly variable. And, in general, whether or how to change any of these things is up to you—you might want to experiment and see what works best with your combination of scanner, software, and documents. (Many of these settings are also available in certain OCR applications; it’s up to you to decide where it makes the most sense to use them.)

Examples of other commonly seen settings:

Set OCR Options

The preceding instructions should do it for configuring the software to create the initial bitmap files themselves. But whether you’re using OCR capabilities built into your scanner’s software or a separate OCR application, you should next take a quick spin through the OCR-related preferences. They’re likely to be less involved (and may include some of the same options described just previously), but at minimum, be sure you configure the following:

The settings for those last two items—destination and file name—may be obvious to you, or they may require more thought. And, if you’ll be sharing scanned files with others, some additional questions may arise. So before you finalize your settings, read the next section for advice on naming and filing your scans.

A Few Words about OCR Accuracy

I’ve already mentioned that resolution, color mode, and compression can affect OCR accuracy, regardless of which OCR tool you use. However, you may want to keep a couple more factors in mind, too.

Not all OCR software produces equally good results, even given the same, ideal file. Although I haven’t tested every app for accuracy, I can say that ABBYY FineReader (in whichever incarnation) produced the most accurate results I’ve seen, while Acrobat Pro was the least accurate of those I tested. PDFpen falls somewhere in between.

On the other hand, even the worst-performing OCR tool, working with a suboptimal file (say, one with too-low resolution and too much compression) yielded enough accuracy to enable me to search for key terms in the file—which is all I need anyway. For the purposes of searching, it doesn’t usually matter if punctuation and spacing are wrong, or if half a dozen words on a page are misspelled.

So, although you shouldn’t ignore accuracy as a criterion, it’s usually not worth a huge amount of effort to get near-perfect accuracy.

Choose a Naming and Filing Strategy

Naming your searchable PDFs and filing them (that is, storing them in some particular location) may be entirely separate activities, but it usually makes sense to do them together. And, your OCR software may expect you to make decisions up front about how these tasks will be handled. So it’s a good idea to think through your options carefully.

Fundamentally, you have four questions to answer:

Decide When to Name and File Documents

As you perform OCR on your scanned documents, you have three basic choices as to what happens next:

I can think of good reasons for choosing any of these approaches, but the important thing is to weigh the pros and cons, decide how you want to handle the process, and stick with it.

Use the Hands-off Approach

The appeal of the entirely hands-off approach is obvious: it requires no effort other than pressing a button. So, if you’re concerned that you’ll never get around to scanning otherwise, that’s a significant positive.

On the negative side, if you let your software name scanned documents, the names (usually a string of numbers based on the date and time) won’t be meaningful to you, and when it comes time to find a file, you won’t be able to distinguish one from another by name; you’ll have to examine their contents, too.

And, if you let your software file the documents too, they’ll almost certainly end up in a single big folder somewhere, which again makes it harder to find what you’re looking for later on.

Tip: DEVONthink Pro Office’s Auto Classify feature can sort scanned documents into folder-like groups automatically based on their contents, mitigating this problem somewhat.

For me, since I’m scanning documents in order to make my life easier, the negatives of the hands-off approach outweigh the positives.

Name and File as You Go

You can choose to name every document as soon as your OCR software turns it into a searchable PDF. At the same time, you can optionally file it in an appropriate folder and, if you’re using a document manager that supports tags (or the Finder, starting in 10.9 Mavericks), apply tags that will help you identify the document later. (Read the sidebar Tags vs. Folders.)

The big advantage to doing this is that you’ll make it much quicker to find the document later—and by doing this work right away, the subject matter of the document will be fresh in your mind, making naming easier.

The disadvantage is that it’s not merely more work; it also turns scanning into a task that demands your ongoing attention, because you have to stop after every document goes through the scanner, think about it, and perform one or more extra steps.

Name and File after the Fact

As a compromise between the first two options, you can let your computer process everything automatically at first, but then later (say, once a week) go back and review recently scanned documents, name them, file them in the proper locations, tag them, and so on.

Although this approach has the benefits of both of the other alternatives, it has a downside, too: it’s more time-consuming to identify and name documents after the fact than right away. And, if you let it go too long, the task might become so overwhelming that you never do it.

Nevertheless, this is what I do most of the time. I’m disciplined enough to avoid letting my unnamed scans pile up for months, and sometimes I’m even inspired enough to name files as I go.
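If you take the after-the-fact approach, a small script can remind you what still needs attention. The Python sketch below lists PDFs in a folder that still carry a timestamp-style name such as 2014_04_26_11_27_00.pdf. The filename pattern and the function name are assumptions; adjust the pattern to whatever your scanner software actually produces. The demonstration runs in a throwaway folder so that nothing on your disk is touched.

```python
import re
import tempfile
from pathlib import Path

# Assumed auto-generated name format: YYYY_MM_DD_HH_MM_SS.pdf
TIMESTAMP_NAME = re.compile(r"\d{4}(_\d{2}){5}\.pdf", re.IGNORECASE)

def unnamed_scans(folder):
    """Return PDFs in `folder` whose names still look auto-generated."""
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and TIMESTAMP_NAME.fullmatch(path.name)
    )

# Demonstration with two empty files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["2014_04_26_11_27_00.pdf", "2014-04-22 ABC Supply Invoice 416.pdf"]:
        (Path(tmp) / name).touch()
    print([path.name for path in unnamed_scans(tmp)])
    # prints ['2014_04_26_11_27_00.pdf']
```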

Tip: An app called PDFiler aims to speed up after-the-fact naming of PDFs by offering an intelligent, adaptive template for filling in names and choosing filing locations. The interface is unusual, to say the least, but check out the video for a demonstration of how it works.

Choose a Retrieval Method

After OCR is complete (whether or not you’ve taken the time to choose a file name) and you have a searchable PDF, you can leave it in the folder where it started—the one where the bitmap images straight from the scanner live—or you can move it somewhere else. I’m a proponent of the “somewhere else” approach, but before you can decide where, exactly, to store your files, you need to know what technique you’ll use to find and view your PDFs later. In particular, you need to decide whether you will store the PDF as an ordinary file in the Finder—and if so, where? Or, will you store everything in a document manager?

Use the Finder

The default way to retrieve documents is through the Finder, possibly with the help of Spotlight. Storing PDFs in regular Finder folders is easy—it happens automatically if you don’t take any other action, and Spotlight automatically indexes the documents. Because the PDFs are now ordinary, searchable files, you can organize them just like all your other documents—for example, if you have scanned documents relating to a specific project, they might go in that project’s folder in the Finder; or if your scans are of utility bills, they might go in a folder with other financial documents. Or, you may keep all your searchable PDFs, regardless of contents, together in one place.

Before you choose Spotlight and the Finder as your retrieval tools, spend a moment pondering these questions:

Tip: If you want to become a Spotlight power user, check out Sharon Zardetto’s book Take Control of Spotlight for Finding Anything on Your Mac.

Tip: If you’re undecided, you can use the Finder initially and move to a document manager later if you need more organizational power.

Use a Document Manager

I mentioned several document managers in the discussion of OCR software (Pick a Mac OCR Package). Essentially, they’re applications that provide their own storage, categorization, display, and search methods for files and other snippets of data. You might prefer one of these over storing files only in the Finder for any number of reasons, such as a more pleasant user interface, more-flexible (or faster) searching, support for tags in pre-Mavericks versions of Mac OS X, or other database features.

Neat for Mac and Paperless, both of which I covered earlier in Pick a Mac OCR Package, are OCR tools with built-in document managers (or vice-versa) that are specially designed to work with structured data such as receipts. And, I’ve mentioned that I’m personally a fan of DEVONthink Pro Office. However, a few other options are also worth considering, as long as you have some independent way to perform OCR:

If you’re considering one of these applications, I suggest downloading a demo version and making sure you can find a way (such as using an AppleScript folder action) to store your searchable PDFs directly in the document manager of your choice.

Depending on your needs, you may want to look for a few features in particular:

Choose a Naming Convention

I don’t want to belabor this point, nor can I provide any universal solutions, but it’s worth giving some thought to what you’ll name your searchable PDFs so that you can find them more easily later, and so that anyone else who needs to use the files can clue in to their contents. (If you’re content with file names like 2014_04_26_11_27_00.pdf then feel free to skip ahead to the next section!)

Suppose you’re scanning a stack of invoices. Naming them all “invoice.pdf” may be a bit better than nothing, but then, when you search for one of these and the result is a list of 100 files named “invoice.pdf,” you won’t know which is which without examining each one individually. On the other hand, although nothing prevents you from naming a file “Invoice #416, dated April 22, 2014, from ABC Supply Corp for $432.19.pdf,” that’s cumbersome to type and equally awkward to read. So, let me offer a few suggestions:

Tip: TextExpander can make it much easier to name files, including entering or reformatting dates. To learn more, see Michael Cohen’s book Take Control of TextExpander.
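One way to keep names consistent is to build them from the same few pieces every time. The Python sketch below assembles a name from a date, a sender, a document type, and an optional number. The particular order, the year-month-day date format, and the function name are illustrative assumptions rather than a prescribed convention, so substitute whatever scheme you settle on.

```python
import re
from datetime import date

def scan_filename(doc_date, sender, doc_type, number=None):
    """Build a name like '2014-04-22 ABC Supply Invoice 416.pdf'."""
    parts = [doc_date.isoformat(), sender, doc_type]
    if number is not None:
        parts.append(str(number))
    name = " ".join(parts)
    # Replace slashes and colons, which cause trouble in Mac file names.
    name = re.sub(r"[/:]", "-", name)
    return name + ".pdf"

print(scan_filename(date(2014, 4, 22), "ABC Supply", "Invoice", 416))
# prints 2014-04-22 ABC Supply Invoice 416.pdf
```

With the date first in year-month-day order, sorting the files by name also sorts them chronologically.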

Choose a Destination

If you’re storing searchable PDFs as ordinary, Finder-accessible files rather than using a document manager, make sure you put those files in a location that makes sense for your needs. Here are your options:

What about the Originals?

Your scanner software puts a bitmap file in a folder somewhere, and your OCR software converts it to a searchable PDF—possibly storing it in a document manager as well. So, what happens to that original, non-searchable bitmap image? Depending on your software and settings, it could be any of the following:

All these are valid choices, depending on your needs, but I want you to be aware of the possibilities and determine, or decide, what happens on your system to avoid surprises in the future (such as looking for an original you assumed you had, but finding it was deleted). I like to set my OCR software’s preferences to overwrite the bitmap-only PDF with a copy that also contains searchable text.

Automate OCR

Earlier in this chapter, I described the option of routing incoming scans to an OCR program or feature, which (if you’re lucky) then creates searchable PDFs automatically. If that’s what happens on your Mac, congratulations: you can skip this section. But if your scanner’s software doesn’t support that configuration, or if you want to use OCR software that doesn’t automatically generate a searchable PDF when it opens a document, read on for help with automating the process.

Any scanner can save bitmap files into a folder somewhere on your disk, so that’s our starting point. Fundamentally, you need to make both of the following tasks happen automatically:

Luckily, you can often use the same tool to accomplish both tasks: an AppleScript folder action. A folder action is an AppleScript that runs automatically when something happens to a specified folder—for example, you open or close it, or add files to it. So, the basic idea is this: create a folder action script and attach it to the folder where incoming scans are stored so that it watches for new files being added; have the script open those files in your OCR program and then instruct your OCR program to go ahead and process the files.

What’s particularly cool about this method is that sometimes it can even automate OCR in applications that don’t inherently support AppleScript—or don’t support it robustly. This is possible due to a feature called GUI Scripting, which means that instead of AppleScript issuing a direct command to an application to perform some action, it instead simulates the user actions of choosing menu commands, clicking buttons, filling in fields, and suchlike. Unfortunately, this means the application must be in the foreground—if you were to switch to another window while this was going on, the AppleScript would no longer be able to “see” and operate the necessary controls. Still, it’s way better than going through the entire process manually every time.
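The basic folder action idea (watch a folder, then hand each new file to the OCR program) can be sketched outside AppleScript as well. The Python example below is not one of the scripts provided with this book and does not use the Folder Actions mechanism; it simply polls a folder and calls a handler once for each file it hasn't seen before. The `process` callback stands in for whatever would drive your OCR application, and all the names are hypothetical.

```python
import time
from pathlib import Path

def new_files(folder, seen):
    """Return files in `folder` not yet in `seen`, and record them as seen."""
    fresh = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and not p.name.startswith(".") and p not in seen
    )
    seen.update(fresh)
    return fresh

def watch(folder, process, interval=5.0, max_polls=None):
    """Poll `folder`, calling `process(path)` once for each new file.

    Files already present on the first poll are treated as new.
    Pass `max_polls` to stop after a fixed number of polls.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in new_files(folder, seen):
            process(path)  # placeholder: hand the file to your OCR tool here
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

Calling `watch(scans_folder, print, max_polls=1)` would simply print the path of each file currently in the folder; a real handler would open the file in the OCR application instead. A folder action has the advantage that the system notifies the script when files arrive, so no polling loop is needed.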

I’d like to offer you prewritten AppleScripts for every OCR program and with every possible combination of settings and behaviors, but life is too short. So, instead, I’m providing four scripts that can drive a few popular OCR tools, and serve as examples on which you can base your own scripts for other applications. You can also, of course, modify any of my scripts to make it work differently according to your needs.

Download the scripts. After unzipping them (if that doesn’t happen automatically), move the files into either /Library/Scripts/Folder Action Scripts (which requires authenticating with your username and password) or ~/Library/Scripts/Folder Action Scripts (see Basics for help accessing it); in either case, if the folder doesn’t already exist, create it. (Be sure not to put the scripts in the similarly named Folder Actions folder, which may appear in the same location.) Then proceed with the instructions that follow to configure the scripts on your Mac.

Note: For GUI scripting to work with my scripts for Readiris and some versions of Acrobat Pro, you must enable access for assistive devices. To do so, in System Preferences, open the Accessibility pane (called Universal Access prior to 10.8 Mountain Lion) and make sure Enable Access for Assistive Devices is selected at the bottom of the window. But if you forget, it’s no problem; the scripts are smart enough to check for this setting and alert you if it’s incorrect.

Enable and Attach Folder Actions

Before you can use folder action scripts, you must enable the system-wide Folder Actions capability if you haven’t previously done so, and attach a particular script to the folder where your incoming scans are stored. These steps should work in 10.6 Snow Leopard and later:

  1. Right-click (Control-click) on the folder where your scanner stores new scans (see Set the Destination, earlier in this chapter), and from the contextual menu that appears, choose Services > Folder Actions Setup. Folder Actions Setup opens.
  2. In the dialog that appears, select the script you want to use, such as OCR This (PDFpen & PDFpenPro), and click Attach. (Although you can attach multiple AppleScripts to a single folder, I don’t recommend it. Pick a single script, and if need be, you can return to this dialog and change it later.)
  3. Make sure Enable Folder Actions is checked at the top of the Folder Actions Setup window. Your window should look something like Figure 5.
    **Figure 5:** You’re looking for approximately this end result (folder and script names may differ) after configuring Folder Actions.

  4. Quit Folder Actions Setup.

Use a Folder Action Script

You’re almost ready to go, but it’s best to tweak a setting or two for optimal behavior in your OCR application of choice, and to understand exactly what to expect of the scripts.

Acrobat Scripts

As I lamented earlier, Acrobat XI Pro is immune to this sort of scripting; however, if you’re still running an earlier version of Acrobat Pro (7, 8, 9, or X) you can use a folder action—but only after you’ve performed a slightly odd one-time procedure to prepare it for OCR:

  1. Open a PDF file (any file at all).
  2. Choose the Recognize Text command appropriate to the version of Acrobat you’re using:
    • Acrobat X Pro: Click Tools, then Recognize Text, then In This File.
    • Acrobat Pro 8 or 9: Choose Document > OCR Text Recognition > Recognize Text Using OCR.
    • Acrobat Pro or Standard 7: Choose Document > Recognize Text Using OCR > Start.
  3. Click Edit. In the dialog that appears, make sure the main language of your documents—for readers in North America, most likely English (US)—is chosen in the Primary OCR Language pop-up menu. Choose Searchable Image (Exact) from the PDF Output Style pop-up menu (although read Pick a Mac OCR Package for information on the ClearScan option).
  4. Click OK to close the Settings dialog, and then click Cancel (yes, Cancel) to dismiss the Recognize Text dialog.
  5. Close the PDF file you opened in Step 1.

Now you’re ready to try out the script—either by scanning a document or by dragging an existing scanned image into the folder to which the AppleScript is attached. I provide two different Acrobat scripts. Both work with Acrobat Standard version 7 and Acrobat Pro versions 7, 8, 9, and X, but they have slightly different behaviors:

Warning! Do not save the file in the folder to which the OCR This folder action is attached! If you do, this will trigger the script to run again on the new file.
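The same precaution applies to any automation that watches a folder: output written back into the watched folder looks like a new arrival and gets processed again. As a hedged sketch (the function name and folder layout are hypothetical, and this is not part of the OCR This scripts), a Python helper can refuse an output location that lies inside the watched folder.

```python
from pathlib import Path

def checked_output_path(watched_folder, output_path):
    """Return `output_path` resolved, or raise if it is inside `watched_folder`."""
    watched = Path(watched_folder).resolve()
    output = Path(output_path).resolve()
    # `parents` lists every enclosing folder of the output file.
    if watched == output or watched in output.parents:
        raise ValueError(f"{output} is inside the watched folder {watched}")
    return output
```

Saving searchable PDFs to a sibling folder passes the check, while a path inside the scans folder raises an error before anything is written.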

The “OCR This” Action in Acrobat X Pro

The design of Acrobat X Pro precludes running the OCR command directly using AppleScript. However, it does support a type of built-in automation called Actions. So, my OCR This scripts for Acrobat X Pro work around Acrobat’s limitations by creating a basic version of an Action that performs OCR, and then running that Action. As created by either of the OCR This (Acrobat) scripts, the “OCR This” Action prompts you to choose a filename and location for saving the searchable file.

For complicated reasons, I was unable to reliably automate creation of an Action that would save a file in place, although you can easily modify the Action to do so yourself with a few clicks. To change the way the Action behaves, follow these steps:

  1. In Acrobat X Pro, choose File > Action Wizard > Edit Actions.
  2. Select OCR This and click Edit.
  3. To change OCR settings, click the Options button by “Recognize Text (using OCR)”; this gives you the same settings as described earlier (see Step 3 under Acrobat Scripts, a page or so earlier).
  4. To make the Action save the file in place (rather than prompt you with a Save As dialog), choose The Same Folder Selected at Start from the Save To pop-up menu. Alternatively, to specify another folder, choose A Folder on My Computer, navigate to a folder, and click Choose.

In case you’re wondering, Acrobat XI Pro does still support Actions; unfortunately, it no longer makes those Actions accessible via a menu, so there’s no reliable way either to create an Action or to run one using AppleScript. They can only be run manually.

PDFpen Script

The OCR This (PDFpen & PDFpenPro) script works with either version of the PDFpen software (version 5 or later), without requiring any modification. However, for best results I suggest making one small change in PDFpen’s settings before using the script:

  1. Choose either PDFpen > Preferences or PDFpenPro > Preferences.
  2. Click OCR.
  3. Uncheck the Prompt for OCR When Opening a Scanned Document checkbox.

This last step may seem counterintuitive, but if you leave that box checked, then whenever the script runs automatically on a newly scanned document, PDFpen will display a dialog asking if you want to perform OCR on it. That isn’t actually a problem—the script still works—but there’ll be a delay of a few seconds, and that dialog (and the beep that sounds when it appears) may be confusing and distracting.

Once PDFpen is configured, scan a document (or drop an already-scanned document into your designated scans folder) to try the script.

Readiris Pro Script

The Readiris Pro script was created with version 14 of the app. I can’t guarantee how well the script or setup instructions will work with older or newer versions.

Before using the OCR This (Readiris) script, open Readiris Pro and set it up as follows:

  1. Choose Settings > Edit PDF Export Options. At the top of the window, click Destination. Make sure the File radio button is selected.
  2. Still in the PDF Export Options dialog, click PDF Options. Choose Image-Text from the Type pop-up menu, and uncheck the Embed Fonts and Create Bookmarks checkboxes if they’re checked. Leave the other settings in this dialog as they are, and click OK.
  3. Make sure the PDF button is selected in the Format and Destination portion of the toolbar.
  4. Choose Settings > Save as Default. (That way, these settings should stick when you use Readiris Pro again.)

When a new PDF file appears in the folder to which you’ve attached the script, Readiris opens the file, recognizes the text in it, and saves it as a PDF; it prompts you to enter a name and select a location. (Unfortunately, because of Readiris Pro’s poor AppleScript support, I was unable to find a good way to avoid this need for interaction.) After you save the file, Readiris creates a new document (which clears all the existing scanned pages from its list).

Note: If you happened to have any pages open in Readiris Pro before running a script, the script will try to close them (to avoid adding extra pages to your PDFs). Therefore, before doing any scanning, make sure you’ve saved anything you were previously working on.

Extend Folder Action Scripts

The scripts I’ve provided are all fairly simple, but depending on your needs, preferences, and willingness to tinker with AppleScript, you could enhance them to do other things.

Here are a few ideas: