Implement passive, configurable spellchecking to create correctly-spelled listings in less time.
The success of any auction is largely due to how readily it can be found in eBay searches. As described in Chapter 2, eBay searches show only exact matches (with very few exceptions), which means, among other things, that spelling most definitely counts.
Turbo Lister and eBay's Sell Your Item form have spellcheck features, both of which use the old-school, manual approach that forces you to interrupt your work to review each individual mistake. This hack streamlines the process by summarizing the spelling errors in all your listings in one place.
This script requires the following modules and programs:
Table 8-2.
| Module/program name | Available at |
|---|---|
| HTML::FormatText | search.cpan.org/perldoc?HTML::FormatText |
| HTML::TreeBuilder | search.cpan.org/perldoc?HTML::TreeBuilder |
| HTML::Entities | search.cpan.org/perldoc?HTML::Entities |
| Lingua::Ispell | search.cpan.org/perldoc?Lingua::Ispell |
| ispell program (by Geoff Kuenning) | fmg-www.cs.ucla.edu/geoff/ispell.html |
Here's the script:
    #!/usr/bin/perl
    require 'ebay.pl';
    require HTML::TreeBuilder;
    require HTML::FormatText;
    use Lingua::Ispell qw( spellcheck );
    Lingua::Ispell::allow_compounds(1);

    $out1 = "";
    $outall = "";
    $numchecked = 0;
    $numfound = 0;
    $today = &formatdate(time);
    $yesterday = &formatdate(time - 86400);

    my $page_number = 1;
    PAGE:
    while (1) {
        # Retrieve one page of auctions started in the last 24 hours
        my $rsp = call_api({ Verb => 'GetSellerList',
                      DetailLevel => 0,
                           UserId => $user_id,
                    StartTimeFrom => $yesterday,
                      StartTimeTo => $today,
                       PageNumber => $page_number
        });
        if ($rsp->{Errors}) {
            print_error($rsp);
            last PAGE;
        }
        foreach (@{$rsp->{SellerList}{Item}}) {
            my %i = %$_;
            $id = $i{Id};
            if (! -e "$localdir/$id") {
                # Fetch the full listing, including its description
                my $rsp = call_api({ Verb => 'GetItem',
                              DetailLevel => 2,
                                       Id => $id
                });
                if ($rsp->{Errors}) {
                    print_error($rsp);
                } else {
                    my %i = %{$rsp->{Item}[0]};
                    my ($title, $description) = @i{qw/Title Description/};

                    # Check the title and description in a single pass
                    $spellthis = $title . " " . $description;

                    # Convert any HTML to plain text
                    $tree = HTML::TreeBuilder->new_from_content($spellthis);
                    $formatter = HTML::FormatText->new();
                    $spellthat = $formatter->format($tree);
                    $tree = $tree->delete;

                    for my $r ( spellcheck( $spellthat ) ) {
                        if ( $r->{'type'} eq 'miss' ) {
                            $out1 .= "'$r->{'term'}'";
                            $out1 .= " - near misses: @{$r->{'misses'}}\n";
                            $numfound++;
                        } elsif ( $r->{'type'} eq 'guess' ) {
                            $out1 .= "'$r->{'term'}'";
                            $out1 .= " - guesses: @{$r->{'guesses'}}\n";
                            $numfound++;
                        } elsif ( $r->{'type'} eq 'none' ) {
                            $out1 .= "'$r->{'term'}'";
                            $out1 .= " - no match.\n";
                            $numfound++;
                        }
                    }
                    $numchecked++;
                    if ($out1 ne "") {
                        $outall .= "Errors in #$id '$title':\n";
                        $outall .= "$out1\n\n";
                        $out1 = "";
                    }
                }
            }
        }
        last PAGE unless $rsp->{SellerList}{HasMoreItems};
        $page_number++;
    }

    # Print the summary for all auctions checked
    print "$numfound spelling errors found in $numchecked auctions:\n\n";
    print "$outall\n";
This script is based on the one in "Automatically Keep Track of Items You've Sold" [Hack #112], but it has a few important additions and changes.
First, instead of listing recently completed auctions, the GetSellerList API call is used to retrieve auctions that have started in the last 24 hours. This works well if you review your listings daily or schedule the script [Hack #21] to run every 24 hours, say, at 3:00 P.M. every day.
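The formatdate routine is not defined in this script and presumably comes from ebay.pl. As a rough sketch of how the 24-hour window could be computed, the following uses a hypothetical format_api_date helper and assumes the API accepts GMT timestamps in YYYY-MM-DD HH:MM:SS form; check both assumptions against your own copy of ebay.pl before relying on them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical stand-in for formatdate(); the timestamp layout
# (GMT, YYYY-MM-DD HH:MM:SS) is an assumption, not taken from ebay.pl.
sub format_api_date {
    my ($epoch) = @_;
    return strftime("%Y-%m-%d %H:%M:%S", gmtime($epoch));
}

# Return (from, to) strings covering the $hours hours that end at $now.
sub listing_window {
    my ($now, $hours) = @_;
    return (format_api_date($now - $hours * 3600), format_api_date($now));
}

my ($from, $to) = listing_window(time, 24);
print "StartTimeFrom => $from\n";
print "StartTimeTo   => $to\n";
```

Passing a larger number of hours to listing_window widens the range, which is handy when you run the script by hand after skipping a day.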
Second, since you want the auction descriptions, you need to use the GetItem API call for each auction you spellcheck. This means that spellchecking a dozen auctions will require 13 API calls: one call to retrieve the list (assuming it fits on a single page), and one for each auction.
The code actually responsible for performing the spellcheck starts where the title and description are concatenated into a single variable, $spellthis, so that only one spellcheck is necessary for each auction. Next, the HTML::TreeBuilder and HTML::FormatText modules are used to convert any HTML-formatted text to plain text.
Finally, the Lingua::Ispell module uses the external ispell program to perform a spellcheck on $spellthat (the cleaned-up version of $spellthis). As errors are found, suggestions are recorded in the $out1 variable, which is merged with $outall and displayed when the spellcheck is complete.
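The three error branches in the script differ only in which list of suggestions they print, so the reporting step can be sketched as a single helper. The example below is an illustration, not part of the original script: describe_result is a hypothetical name, and the sample records are written by hand to mimic the term, type, misses, and guesses fields the script reads from each spellcheck result, so it runs without ispell installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn one result record into a report line. Records that are not
# errors (for example, type 'ok') produce nothing.
sub describe_result {
    my ($r) = @_;
    my %list_for  = (miss => 'misses',      guess => 'guesses');
    my %label_for = (miss => 'near misses', guess => 'guesses');
    my $type = $r->{type};
    if (exists $list_for{$type}) {
        my @suggestions = @{ $r->{ $list_for{$type} } || [] };
        return "'$r->{term}' - $label_for{$type}: @suggestions\n";
    }
    return "'$r->{term}' - no match.\n" if $type eq 'none';
    return;
}

# Hand-written sample records standing in for spellcheck() output.
my @results = (
    { term => 'the',    type => 'ok' },
    { term => 'antiqe', type => 'miss', misses => [ 'antique', 'antic' ] },
    { term => 'zzyzx',  type => 'none' },
);

my $report = join '', map { describe_result($_) } @results;
print $report;
```

In the script itself, the number of lines returned for an auction would take the place of the repeated $numfound++ statements.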
Here are a few things you might want to do with this script:
Instead of simply printing out the results of the spellcheck, as the script does at the very end, you can quite easily have the results emailed to you [Hack #118].
Currently, the script performs a spellcheck on every running auction started in the last 24 hours. If you run the script every 24 hours, then this won't pose a problem. But if you choose to run the script manually and therefore specify a broader range of dates, you may wish to include error checking to prevent the script from needlessly checking the same auction twice.
Most spellcheckers have a means of adding new words to the dictionary, and this one should be no exception. An easy solution is to create a text file with the proper names, technical terms, and other words the spellchecker doesn't recognize. Then, include code to read the list and eliminate any matches, so that only newly unrecognized terms are caught.
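The custom word list idea might be sketched as follows. The helper names (load_known_words, drop_known_terms), the one-word-per-line file layout, and the case-insensitive matching are all assumptions made for this illustration; the demo reads the list from an in-memory string and uses hand-written result records so that it runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one word per line from a filehandle into a lookup hash.
# Words are lowercased; blank lines and lines starting with '#' are skipped.
sub load_known_words {
    my ($fh) = @_;
    my %known;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        next if $line eq '' or $line =~ /^#/;
        $known{ lc $line } = 1;
    }
    return \%known;
}

# Drop any result whose term appears in the personal word list.
sub drop_known_terms {
    my ($known, @results) = @_;
    return grep { !$known->{ lc $_->{term} } } @results;
}

# Demo: an in-memory word list instead of a real file.
my $wordlist = "# brand names\nNikon\nLionel\n";
open my $fh, '<', \$wordlist or die "Cannot open word list: $!";
my $known = load_known_words($fh);
close $fh;

# Hand-written records standing in for spellcheck() output.
my @results = (
    { term => 'Nikon',  type => 'none' },
    { term => 'camrea', type => 'miss', misses => ['camera'] },
);
my @remaining = drop_known_terms($known, @results);
print "Still flagged: $_->{term}\n" for @remaining;
```

To use this in the script, you would open your own word file once before the PAGE loop and pass the output of spellcheck( $spellthat ) through drop_known_terms before looping over the results.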