Now that you know how to automate the task of launching webbots from both scheduled and nonscheduled events, it’s time for a few words of caution.
A common question when deploying webbots is how often to schedule a webbot to check if data has changed on a target server. The answer depends on your need for stealth and how often the target data changes. If your webbot must run without detection, you should limit the number of file accesses you perform, since every file your webbot downloads leaves a clue in the server’s log file. Your webbot becomes increasingly obvious as it creates more and more log entries.
The periodicity of your webbot’s execution may also hinge on how often your target changes. Additionally, you may require notification as soon as a particularly important website changes. Timeliness may drive the need to run the webbot more frequently. In any case, you never want to run a webbot more often than necessary. You should read Chapter 31 before you deploy a webbot that runs frequently or consumes excessive bandwidth from a server.
I always contend that you shouldn’t access a target more than necessary to perform a job. If you’re connecting to a target more than once every hour or so, you’re probably hitting it too hard. Obviously, the rules change if you own the target server.
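One way to enforce a ceiling on how often a webbot touches its target is to record the time of the last fetch and refuse to run again until a minimum interval has passed. The following is a minimal sketch, not a prescribed method; the function name `should_run` and the one-hour default are assumptions chosen to match the rule of thumb above.

```python
from typing import Optional

# Assumed floor between fetches: one hour, per the rule of thumb in the text.
MIN_INTERVAL_SECONDS = 60 * 60


def should_run(last_run: Optional[float], now: float,
               min_interval: float = MIN_INTERVAL_SECONDS) -> bool:
    """Return True if the webbot may fetch the target again.

    last_run and now are Unix timestamps in seconds; last_run is None
    when the webbot has never run before.
    """
    if last_run is None:
        return True
    return (now - last_run) >= min_interval
```

A scheduled script could call `should_run` with `time.time()` and a timestamp it saved on the previous run, and exit quietly when the result is false. That keeps the limit in force even if the scheduler is misconfigured to fire more often than intended.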
Remember that hardware and software are both subject to unexpected crashes. If your webbot performs a mission-critical task, make sure that your scheduler is not a single point of failure and that a crash in any one process step cannot bring down the entire webbot. Chapter 28 describes methods to ensure that your webbot does not stop working if a scheduled webbot fails to run.
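As a rough illustration of keeping one failing step from stopping the whole webbot, each step can be run inside its own error handler and its outcome recorded. This is only a sketch under assumed names (`run_steps` and the step callables are hypothetical), and it is not the technique described in Chapter 28.

```python
from typing import Callable, Dict, List, Tuple


def run_steps(steps: List[Tuple[str, Callable[[], None]]]) -> Dict[str, str]:
    """Run each named step in order, recording success or failure.

    An exception in one step is caught and logged in the result, so the
    remaining steps still get a chance to run.
    """
    results: Dict[str, str] = {}
    for name, step in steps:
        try:
            step()
            results[name] = "ok"
        except Exception as exc:  # isolate the failure to this step
            results[name] = f"failed: {exc}"
    return results
```

The returned dictionary can be written to a log or emailed, so that a failed step is noticed rather than silently skipped.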
The other potential problem with scheduled tasks is that they run precisely and repeatedly, creating entries in the target’s access log at the same hour, minute, and second every time. If you schedule your webbot to run once a month, this may not be a problem, but if a webbot runs daily at exactly the same time, it will become obvious to any competent system administrator that a webbot, not a human, is accessing the server. If you want to schedule a webbot that emulates a human using a browser, you should see Chapter 26 for more information.
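A simple way to blur the exact timing of a scheduled run is to have the webbot wait a random amount of time after the scheduler launches it. The sketch below assumes a hypothetical helper, `jittered_delay`, and an arbitrary 30-minute window; it addresses only the regular timestamp, not the other aspects of human emulation.

```python
import random


def jittered_delay(max_jitter_seconds: float, rng: random.Random) -> float:
    """Return a random delay, in seconds, between 0 and max_jitter_seconds.

    Passing the random generator in makes the function easy to test
    with a fixed seed.
    """
    if max_jitter_seconds < 0:
        raise ValueError("max_jitter_seconds must not be negative")
    return rng.uniform(0, max_jitter_seconds)
```

At the top of the scheduled script, a call such as `time.sleep(jittered_delay(30 * 60, random.Random()))` would spread the webbot's first request across a half-hour window, so the access log no longer shows the same hour, minute, and second every day.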