Chapter 22. Scheduling Webbots and Spiders

Up to this point, all of the example webbots have run only when executed directly from a command line or when loaded in a browser. In real-world situations, however, you may want to schedule your webbots and spiders to run automatically and periodically. This chapter describes methods for scheduling webbots to run unattended in a Windows environment. Most readers should have access to the scheduling tool I’ll be using here.

If you are using an operating system other than Windows, don’t despair. Most operating systems support scheduling software of some type. In Unix, Linux, and Mac OS X environments, you can use cron, a text-based scheduling tool whose jobs are defined with the crontab command. Many operating systems also provide a graphical user interface (GUI) scheduling tool similar to the one Windows uses.

As many of you may still be using Windows XP, I’ll describe how to schedule webbots using that environment. Then, I’ll show you how to use a more modern scheduler in Windows 7.

Regardless of the scheduler you use, you should create a batch file that executes the webbot. It is easier to schedule a batch file than to specify the PHP file directly, because the batch file adds flexibility in defining pathnames and allows multiple webbots, or events, to run from the same scheduled task. Example 22-1 shows the format for executing a PHP webbot from a batch file.

In the batch file shown in Example 22-1, the operating system executes the PHP interpreter, which subsequently executes my_webbot.php.
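
As a rough sketch of what such a batch file might contain, assume PHP is installed in C:\php and the webbot is stored in C:\webbots. Both paths are placeholders chosen for illustration, so substitute the locations on your own system.

```bat
@echo off
REM Hypothetical paths: adjust to your PHP installation and webbot folder.
REM php.exe is the command-line PHP interpreter; it runs the script named after it.
"C:\php\php.exe" "C:\webbots\my_webbot.php"
```

Quoting each path keeps the command working if a folder name contains spaces. To run several webbots from the same scheduled task, add one such line per webbot.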

You can also use a batch file to execute a remote webbot. Example 22-2 shows how to use PHP/CURL to execute a webbot that is on a remote webserver.
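
To make the idea concrete, here is a minimal sketch of a local PHP script that uses PHP/CURL to request a remote webbot's URL, which causes the remote server to run it. The function names, the timeout value, and the URL used below are assumptions made for illustration; they are not taken from the original example.

```php
<?php
// Sketch only: function names and defaults are hypothetical placeholders.

/**
 * Build the cURL options used to request a remote webbot.
 */
function remote_webbot_options(string $url, int $timeout_seconds = 60): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the response instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects, if any
        CURLOPT_TIMEOUT        => $timeout_seconds,
    ];
}

/**
 * Request the remote webbot's URL so that the remote server executes it.
 * Returns the response body, or throws if the transfer fails.
 */
function run_remote_webbot(string $url, int $timeout_seconds = 60): string
{
    $ch = curl_init();
    curl_setopt_array($ch, remote_webbot_options($url, $timeout_seconds));
    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException("Request to $url failed: " . curl_error($ch));
    }
    return $response;
}

// Run only when a URL is supplied on the command line.
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    try {
        echo run_remote_webbot($argv[1]), PHP_EOL;
    } catch (RuntimeException $e) {
        fwrite(STDERR, $e->getMessage() . PHP_EOL);
        exit(1);
    }
}
```

A batch file could then invoke this script with a line of the form `php run_remote_webbot.php http://www.example.com/remote_webbot.php`, where both the script name and the URL are placeholders. Keeping the request in its own function makes it easy to call several remote webbots from one scheduled task.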