There are three basic ways to create additional instances of a webbot:
Fork additional harvesting processes from the same process.
Use the operating system to create multiple instances of the same script.
Execute the same webbot on multiple pieces of hardware.
Some webbot developers prefer to create new instances of the same webbot by forking processes from a single script. Forking is a technique in which a parent script creates child processes that run somewhat independently of it, which allows a single script to execute tasks in parallel. In the case of webbot development, forking could allow a single script to download web pages from multiple target websites at the same time.
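To make the idea of forking concrete, here is a minimal sketch, written in Python rather than PHP purely for illustration. It assumes a Unix-like system (where os.fork is available), and the harvest function and target names are hypothetical placeholders for real download logic. The parent forks one child per target, each child reports back through a pipe, and the parent collects the results.

```python
import os


def harvest(target):
    # Hypothetical placeholder for real download-and-parse work.
    return f"harvested {target}"


def fork_harvesters(targets):
    """Fork one child process per target and collect each child's result."""
    children = []
    for target in targets:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child process: do the work, report through the pipe, then exit
            # immediately so the child never runs the parent's remaining code.
            try:
                os.close(read_fd)
                with os.fdopen(write_fd, "w") as pipe:
                    pipe.write(harvest(target))
            finally:
                os._exit(0)
        # Parent process: keep only the read end and remember the child.
        os.close(write_fd)
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd) as pipe:
            results.append(pipe.read())  # reads until the child closes its end
        os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return results


if __name__ == "__main__":
    print(fork_harvesters(["site-a.example", "site-b.example"]))
```

All of the children are started before any result is read, so the harvesting runs in parallel even though the results come back in the original order.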
Forking is mentioned only because it is something you should be aware of, not because it’s something I necessarily recommend. Feel free to explore PHP’s forking commands (the process control, or PCNTL, functions) on your own if they interest you. You will not learn much about forking your webbot scripts here, though, because there are easier ways to accomplish the same thing. Forking also has the disadvantage of not providing access to additional IP addresses, because your forked instances will probably all run on the same computer.
Rather than developing methods to fork webbots, I find it much easier to run more than one copy of the same webbot at the same time. If your webbot is written correctly, you can create new instances of your webbot by simply running them in multiple command shells, as shown in Figure 25-5.
Later in this chapter, you’ll learn techniques for getting multiple instances of webbots to communicate with each other and work on the same team. Figure 25-5 shows three instances of the same webbot running at once, but once you have mastered these techniques, you can scale your webbot to whatever size you need by simply executing it in as many shells as required.
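Opening shells by hand is not the only way to start several copies. As a rough sketch (in Python, for illustration only), a small launcher can ask the operating system to start the same script several times. The one-line WEBBOT_CODE below is a hypothetical stand-in for a real webbot script, and the instance number passed on the command line is an assumption about how each copy might identify itself.

```python
import subprocess
import sys

# Hypothetical stand-in for a real webbot script: each copy reports its number.
WEBBOT_CODE = "import sys; print('webbot instance ' + sys.argv[1] + ' running')"


def launch_instances(count):
    """Start `count` copies of the same script and return what each printed."""
    # Start every copy first so that they all run at the same time.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WEBBOT_CODE, str(number)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for number in range(1, count + 1)
    ]
    # communicate() waits for a copy to exit and returns (stdout, stderr).
    return [proc.communicate()[0].strip() for proc in procs]


if __name__ == "__main__":
    for line in launch_instances(3):
        print(line)
```

Each copy is an ordinary, separate process, just as if it had been started in its own command shell.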
The example depicted in Figure 25-5 still has the problem that, unless the network traffic for each webbot is directed through a different proxy server, each instance of the webbot will have the same IP address and appear to be running as the same user. The easiest way to solve this problem is to run the same webbot script from separate computers. Running webbots on multiple pieces of hardware, however, requires central management. Networks of identical webbots running on many computers and controlled from a botnet server are known as botnets.
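For instances that do share one computer, the per-instance proxy idea mentioned above might be sketched as follows (again in Python, for illustration only). The proxy addresses are hypothetical and come from an address range reserved for documentation, and the round-robin assignment by instance number is an assumption, not a method prescribed here.

```python
import urllib.request

# Hypothetical proxy servers (192.0.2.0/24 is reserved for documentation).
PROXIES = [
    "http://192.0.2.10:8080",
    "http://192.0.2.11:8080",
    "http://192.0.2.12:8080",
]


def proxy_for_instance(instance_id, proxies=PROXIES):
    """Pick a proxy for a webbot instance, cycling through the list."""
    return proxies[instance_id % len(proxies)]


def opener_for_instance(instance_id):
    """Build a URL opener that sends this instance's traffic via its proxy."""
    proxy = proxy_for_instance(instance_id)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    for instance_id in range(4):
        print(instance_id, proxy_for_instance(instance_id))
```

An instance would then fetch pages with the open method of its own opener, so that its requests leave through its assigned proxy. With more instances than proxies, some instances will share an address.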