Setting Traps

Your strongest defenses against webbots are techniques that detect webbot behavior. Webbots behave differently from people because they are machines and don't have the reasoning ability of people. Therefore, a webbot will do things that a person won't do, and a webbot lacks information that a person either knows or can figure out by examining his or her environment.

A spider trap is a technique that capitalizes on the behavior of a spider, forcing it to identify itself without interfering with normal human use. The spider trap in the following example exploits the spider behavior of indiscriminately following every hyperlink on a web page. If some links are either invisible or unavailable to people using browsers, you'll know that any agent that follows one of those links is a spider. For example, consider the hyperlinks in Example 30-3.
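To make the idea concrete, here is a minimal sketch of a hidden-link spider trap, written with Python's standard http.server module rather than the code shown in Example 30-3. The page served to visitors contains trap links that a person browsing the site never sees, so any client that requests the trap URL is recorded as a suspected spider. The URL /trap.html and the suspected_spiders set are hypothetical names chosen for this sketch.

    # Minimal spider-trap sketch (illustrative only).
    # Humans never see or click the trap links below, so any client
    # that requests /trap.html is almost certainly a spider that
    # follows every hyperlink it finds.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    PAGE = b"""<html><body>
    <h1>Welcome</h1>
    <p>Normal content for human visitors.</p>
    <!-- Trap links: empty or hidden, so a browser user never follows them -->
    <a href="/trap.html"></a>
    <a href="/trap.html" style="display:none">hidden</a>
    </body></html>"""

    suspected_spiders = set()   # IP addresses that followed a trap link

    class TrapHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            ip = self.client_address[0]
            if self.path == "/trap.html":
                # Only an agent that follows every link ends up here
                suspected_spiders.add(ip)
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"Nothing to see here.")
            else:
                self.send_response(200)
                self.send_header("Content-Type", "text/html")
                self.end_headers()
                self.wfile.write(PAGE)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), TrapHandler).serve_forever()

In a real site you would of course plant the trap links in existing pages and record the detections in a log or database rather than an in-memory set.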

There are many ways to trap a spider. Other techniques include image maps with hot spots that don't exist and hyperlinks placed inside frames that have no width or height, which makes them invisible in a browser.

Once unwanted guests are detected, you can treat them to a variety of services.

Identifying a spider is the first step in dealing with it. Moreover, because webbots can spoof the agent names of ordinary browsers, a spider trap becomes a necessity for determining which traffic is automated and which is human. What you do once you detect a spider is up to you, but Table 30-1 should give you some ideas. Just remember to act within commonsense legal guidelines and your own website policies.
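As a rough sketch of acting on a trapped spider (the specific responses shown here are only examples, not items taken from Table 30-1), the function below throttles and refuses requests from addresses that previously followed a trap link. It assumes a suspected_spiders set like the one populated in the earlier sketch.

    # Illustrative response to a suspected spider; the choice of action
    # (ignore, throttle, block, serve alternate content) is yours.

    import time

    def respond_to_visitor(ip, suspected_spiders):
        """Return a (status_code, body) pair based on whether the
        visitor previously followed a trap link."""
        if ip in suspected_spiders:
            time.sleep(2)   # throttle: slow the webbot down
            return 403, b"Automated access is not permitted."
        return 200, b"Normal page content."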