Chapter 25. Unix

Once upon a time, Apple's Mac OS operating system and the Unix operating system were two completely separate worlds, each with its own idea of what constituted "scripting." On Mac OS, there was AppleScript and the OSA. On Unix, there was the command line, shell scripting, and the various shell scripting languages such as Perl. Now, with Mac OS X, those two worlds are united; both kinds of "scripting" are present, and there is communication between them, in both directions. Thus, you can combine the power of Unix scripting with the power of AppleScript.

The way you call a Unix tool from AppleScript is with the do shell script scripting addition command. The way you call from Unix into AppleScript is with the osascript tool. This chapter discusses both, along with some examples; you'll see how Perl, curl, and AppleScript can work together to perform a web query, and you'll see Ruby, AppleScript, and Microsoft Excel joining forces to perform textual analysis and graph the results.

Your first step in getting acquainted with the do shell script scripting addition command should be to read Apple's excellent technical note documenting it (http://developer.apple.com/technotes/tn2002/tn2065.html).

The direct object of do shell script is a string representing the text you would type at the command-line prompt in the Terminal. Actually, that's not quite true, so do not imagine that you can blithely test a prospective do shell script command simply by typing it in the Terminal; there might be some differences. Your Terminal shell is probably bash or tcsh, whereas the shell for do shell script is sh (you probably won't experience this as a difference in Tiger, though, where sh is bash). Also, the default paths used by do shell script might not be the same as your own shell's paths, so to specify a command, you might have to provide a full pathname, such as /usr/bin/perl instead of just perl. (That's not a real example, though, as perl will probably work just fine.)

The result of do shell script is whatever is returned from the command via standard output (stdout). Unix newline characters are converted to Mac return characters by default, but you can prevent this if you wish. If the command terminates with a nonzero result (an error), an error by the same number is thrown in your script, and you can use this number (along with the man pages for the command) to learn what went wrong.

For example, the following code requests a (decimal) number from the user and converts it to hex by means of the Unix printf command:

set theNum to text returned of (display dialog "Enter a number:" default answer "")
set s to "printf %X " & theNum
display dialog (do shell script s)
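
The shell side of this exchange can be sketched on its own. The following is a minimal illustration in plain sh, not part of the AppleScript; the function name to_hex and the variable names are made up for the example. It runs the same printf command, captures standard output (which is what do shell script hands back as its result), and records the exit status (which is what do shell script turns into an error when it is nonzero).

```shell
#!/bin/sh
# to_hex is a hypothetical helper name, used only for this sketch.
# It runs the same printf command that the AppleScript above builds.
to_hex() {
    printf '%X' "$1"
}

# Capture standard output, as do shell script does for its result.
result=$(to_hex 255)
# Capture the exit status of the command; zero means success.
exit_code=$?

echo "result=$result exit_code=$exit_code"
# prints: result=FF exit_code=0
```

Note that the argument is passed in double quotes here; the AppleScript equivalent of that precaution is the quoted form property.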

The do shell script command both accepts and returns Unicode text. But the actual medium of communication between AppleScript and the shell is UTF-8—that is, the direct object of do shell script is converted to UTF-8 before passing to the shell, and the reply from the shell is assumed to be UTF-8 and is converted to Unicode text (UTF-16) before arriving in your script. This is a sensible approach. Most Unix tools are probably unprepared to deal with Unicode input, and for characters in the ASCII range, the UTF-8 representation is the ASCII representation, so in effect the direct object is coerced automatically from Unicode text to a string, as long as it consists of just ASCII characters—which it usually will, so you usually won't have to worry about it. On the other hand, some Unix tools produce UTF-8 output, and it's nice to know that you can capture this seamlessly as the result; indeed, this can be a useful technique, as you saw in "Unicode Text" in Chapter 13.
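
To see why ASCII text needs no special handling, it can help to look at the bytes themselves. The sketch below assumes the standard od and tr tools are available; hex_bytes is a hypothetical helper name. It shows that an ASCII character occupies a single byte in UTF-8, identical to its ASCII code, while a non-ASCII character such as é (written here with octal escapes so that the script itself stays pure ASCII) occupies two bytes.

```shell
#!/bin/sh
# hex_bytes is a hypothetical helper: print the bytes of its argument in hex.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# An ASCII character: one byte, the same as its ASCII code.
hex_bytes 'A'; echo
# prints: 41

# e-acute encoded as UTF-8: two bytes.
hex_bytes "$(printf '\303\251')"; echo
# prints: c3a9
```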

The optional parameter with administrator privileges lets you run a command as root. Such a command will require administrator authentication; you can provide it as part of the command (with the user name and password parameters) or permit the authentication dialog to appear. The implementation of this option has been tweaked several times over the course of its history; this is unfortunate, as it means that the precise version of AppleScript your script runs on can make a crucial difference to the script's behavior. For example, before Tiger, multiple commands (separated by semicolons) in a do shell script command with the with administrator privileges parameter broke; in early versions of Tiger, scripts that called sudo or perl from do shell script along with the with administrator privileges parameter broke. Be careful, and test thoroughly.

The shell set up by do shell script is not interactive, so beware of tools that expect an interactive shell. Many such tools provide a noninteractive alternative, though, so you can still use them. For instance, if you wanted to call top, you could call it in a noninteractive form such as top -l1 and parse the result. The following (rather silly) example converts a number to a hex string by way of bc by writing out a small bc program file and calling bc to process it:

set theNum to text returned of (display dialog "Enter a number:" default answer "")
set t to path to temporary items
set posixT to POSIX path of t
set f to open for access file ((t as string) & "bctemp") with write permission
write "obase = 16\n" to f
write (theNum as string) & "\n" to f
write "quit\n" to f
close access f
set s to "/usr/bin/bc " & quoted form of (posixT & "bctemp")
display dialog (do shell script s)

(The text returned of display dialog is Unicode text in Tiger; I don't want to write Unicode text to the file, because bc won't be able to read it, so I coerce to a string.)
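
If the temporary file is not needed for its own sake, the same bc program can be supplied on standard input instead. This is a sketch in plain sh rather than the approach shown above, and the function name bc_hex is hypothetical. No quit line is needed here, because bc stops by itself when it reaches the end of its input. Since bc is not installed on every system, the demonstration call is guarded.

```shell
#!/bin/sh
# bc_hex is a hypothetical helper: convert a decimal number to hex with bc,
# feeding the small bc program through a pipe rather than a file.
bc_hex() {
    printf 'obase = 16\n%s\n' "$1" | bc
}

# Only run the demonstration if bc is actually present.
if command -v bc >/dev/null 2>&1; then
    bc_hex 255
    # prints: FF
fi
```

From AppleScript, the whole pipeline would simply become the string handed to do shell script.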

The Unix shell parsing and quotation rules can be something of a headache. A simple solution is to protect a string from the parsing rules completely by wrapping it in single quotes; AppleScript makes this easy with the quoted form property of a string (see "String Properties" in Chapter 13). But this does not absolve you from AppleScript's own rules for forming literal strings (see Table 13-1). So, for instance, in the system attribute example in "File and Machine Information" in Chapter 21, quotes are escaped in the literal string s, to get AppleScript to do the right thing; then the entire string is munged with quoted form to get the shell to do the right thing.
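
What quoted form does can be imitated in the shell itself. The sketch below is an illustration, not Apple's implementation; the function name quoted_form is borrowed for the example, and it assumes sed is available. It wraps the string in single quotes and rewrites each embedded single quote as '\'' (close the quotes, add an escaped quote, reopen the quotes), so that the shell later reads the whole thing as one literal word.

```shell
#!/bin/sh
# quoted_form is a hypothetical shell imitation of AppleScript's quoted form.
quoted_form() {
    # Replace every ' with '\'' so it survives inside single quotes.
    escaped=$(printf '%s' "$1" | sed "s/'/'\\\\''/g")
    # Wrap the escaped text in single quotes.
    printf "'%s'" "$escaped"
}

# The quoted text, as it would appear on a command line.
quoted_form "it's \$HOME"; echo
# prints: 'it'\''s $HOME'

# Round trip: the shell parses the quoted text back to the original string,
# with no variable expansion and no word splitting.
eval "printf '%s\n' $(quoted_form "it's \$HOME")"
# prints: it's $HOME
```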

Using a file as an intermediary can simplify things. When talking to Perl, for example, there is no problem forming a short Perl script and handing it to Perl directly by means of the -e switch; but for a longer Perl script it might make sense to write it into a file and then tell Perl to run the file. The Perl script can be created on the fly, but often there is no need; it might make more sense to prepare it beforehand.

Here's an example. I write for the weekly online Macintosh journal TidBITS, whose archives are searchable online: their web site has a page (http://db.tidbits.com) where you can enter words to search for, returning a page of links to past articles containing those words. We'll simulate this page, acting as a web client ourselves, with the help of curl (a brilliant Internet client Unix tool; read the man pages for more information). We have researched the HTML format of the results page, and we've prepared a Perl script to parse it and extract the URLs and titles of the found articles. The Perl script expects as argument the pathname of the file containing the HTML:

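# Read the whole HTML file named on the command line into $s.
$s = "";
while (<>) {
    $s .= $_;
}
# Keep everything after "search results", then the first table within that.
$s =~ m{search results (.*)$}si;
$1 =~ m{<table(.*?)</table>}si;
# Split the table into rows.
@rows = ($1 =~ m{<tr(.*?)</tr>}sig);
# Skip row 0 (presumably a header row); pull a URL and a title from each other row.
for ($i=0;$i<$#rows;$i++) {
    ($links[$i], $titles[$i]) =
        ($rows[$i+1] =~ m{<a href="(.*?)">(.*?)</a>}i);
}
# Print all the URLs first, then all the titles, one per line.
print join "\n", @links, @titles;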

Now for the AppleScript code. First we put up a dialog where the user can enter the terms to search for. We URL-encode these terms in a primitive way (substituting a plus sign for any spaces) and assemble the post data for the form submission. We use curl to submit this post data to the TidBITS server. The TidBITS server receives essentially the same HTTP request it would receive if a user entered the same search terms in the first field of the TidBITS search page and pressed the Submit button. The results come back as a page of HTML, which curl writes out to a file. We now hand this file over to our Perl script for parsing. The results come back from the Perl script as a single list that is really two lists: first the URLs of the found pages, then the titles of those same pages. We put up a list showing the titles; if the user chooses one, we ask the browser to display the corresponding URL.

set t to text returned of ¬
    (display dialog "Search TidBITS for:" default answer "")
set text item delimiters to "+"
set t to (words of t) as string
set d to "'-response=TBSearch.lasso&-token.srch=TBAdv"
set d to d & "&Article+HTML=" & t
set d to d & "&Article+Author=&Article+Title=&-operator"
set d to d & "=eq&RawIssueNum=&-operator=equals&ArticleDate"
set d to d & "=&-sortField=ArticleDate&-sortOrder=descending"
set d to d & "&-maxRecords=20&-nothing=MSExplorerHack&-nothing"
set d to d & "=Start+Search' "
set u to "http://db.tidbits.com/TBSrchAdv.lasso"
set f to quoted form of POSIX path of file ((path to temporary items as string) & "tempTidBITS")
do shell script "curl -d " & d & " -o " & f & " " & u
set perlScript to ... --where is the Perl script?
set r to do shell script "perl " & perlScript & " " & f
set L to paragraphs of r
set half to (count L) div 2
set L1 to items 1 thru half of L
set L2 to items (half + 1) thru -1 of L
set choice to (choose from list L2) as string
repeat with i from 1 to half
    if item i of L2 is choice then
        open location (item i of L1)
        exit repeat
    end if
end repeat
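
The primitive URL-encoding step at the top of the script can be tried by itself in the shell. This sketch only approximates the AppleScript: words of t also drops punctuation, whereas the hypothetical encode_terms function below merely turns each run of spaces into a single plus sign. The field name Article+HTML is taken from the post data above.

```shell
#!/bin/sh
# encode_terms is a hypothetical helper: replace each run of spaces with "+".
encode_terms() {
    printf '%s' "$1" | tr -s ' ' '+'
}

terms=$(encode_terms "Mac OS  X")
# Show the fragment of post data that carries the search terms.
echo "&Article+HTML=$terms"
# prints: &Article+HTML=Mac+OS+X
```

A real client would need fuller URL-encoding for characters such as & and =, which this sketch, like the AppleScript, does not attempt.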

The only unsettled question is the line in the middle of the AppleScript code where the variable perlScript must be set to the POSIX pathname of the Perl script file. If we happen to know the name and location of this file, then of course the problem is solved and we can just hardcode this information into the AppleScript code; for example:

set perlScript to POSIX path of file ((path to desktop as string) & "perlScript.pl")

But this seems a rather fragile approach, as we are relying on an important file to be in a certain location; the file could accidentally be moved or renamed. This situation is just the sort where the script bundle format comes in handy (see "Compiled Script Files" in Chapter 3). We'll put the Perl script file inside the AppleScript compiled script file's bundle; the user will never see it, the two files won't be accidentally separated, and the AppleScript code will know where the Perl file is.

Create the AppleScript compiled script file using a script editor application, and save it as a script bundle (let's call it searchTidBITS). Create the Perl script in some other way (using a text editor such as BBEdit), save it as parseHTML.pl, and then move it into Contents/Resources inside searchTidBITS—you can do this manually using the Show Package Contents command in the Finder, or you can use Apple's Script Editor, which represents Contents/Resources in the Bundle Contents drawer of the script window (you can drag parseHTML.pl directly from the Finder into that drawer). The missing line of our code will then read:

set perlScript to quoted form of POSIX path of (path to resource "parseHTML.pl")

Note, when testing, that this script won't work when run from Apple's Script Editor, because path to resource in that context looks inside the Script Editor bundle, not the script bundle. Instead, use Script Debugger for testing, or else put the script bundle into your ~/Library/Scripts folder and run it from the Script Menu. (In Chapter 27 we'll take this approach one step further with AppleScript Studio: we'll build the Perl script into the application bundle and we'll wrap a nicer interface around our AppleScript code.)