How Cookies Challenge Webbot Design

Webservers will not think anything is wrong if your webbots don’t use cookies, since many people configure their browsers not to accept cookies for privacy reasons. However, if your webbot doesn’t support cookies, you will not be able to access the multitude of websites that demand their use. Moreover, if your webbot doesn’t support cookies correctly, you will lose your webbot’s stealthy properties. You also risk revealing sensitive information if your webbot returns cookies to servers that didn’t write them.

Cookies operate transparently—as such, we may forget that they even exist. Yet the data passed in cookies is just as important as the data transferred in GET or POST methods. While PHP/CURL automatically handles cookies for webbot developers, some instances still cause problems—most notably when cookies are supposed to expire or when multiple users (with separate cookies) need to use the same webbot.

Be aware that when PHP/CURL writes cookies to the cookie file, they all become permanent, just like cookies a browser writes to your hard drive. With the techniques described here, every cookie PHP/CURL accepts is written to the cookie file, whether or not it is intended to expire at the end of your session. This is usually not a problem unless your webbot accesses a website that manages authentication with temporary (session) cookies, which are normally erased when the browser closes. If you fail to purge your webbot’s temporary cookies and it accesses the same website for a whole year, you essentially tell the website’s system administrator that you haven’t closed your browser (let alone rebooted your computer!) for twelve months. Since this is an unlikely scenario, your account may receive unwanted attention, or your webbot may eventually violate the website’s authentication process. If you use PHP/CURL as described here, you need to delete your cookies manually every so often to avoid these problems.
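One way to do this periodic purge is sketched below. It assumes the cookie file uses the tab-separated Netscape format that libcurl writes, in which the fifth field is the expiry time and a value of 0 marks a session cookie. The function name purge_session_cookies() is hypothetical and is not part of LIB_http.

```php
<?php
// Hypothetical helper (not part of LIB_http): remove session cookies from
// the text of a Netscape-format cookie file, as written by libcurl.
function purge_session_cookies(string $cookie_text): string
{
    $kept = [];
    foreach (preg_split('/\r\n|\n|\r/', $cookie_text) as $line) {
        // An "#HttpOnly_" prefix marks a real cookie record, not a comment.
        $record = (strpos($line, '#HttpOnly_') === 0) ? substr($line, 10) : $line;
        $is_comment = ($record === '' || $record[0] === '#');
        $fields = explode("\t", $record);
        // Assumption: 7 tab-separated fields; index 4 holds the expiry time,
        // and an expiry of 0 denotes a session cookie.
        if (!$is_comment && count($fields) === 7 && $fields[4] === '0') {
            continue; // drop the session cookie
        }
        $kept[] = $line; // keep comments, blank lines, and persistent cookies
    }
    return implode("\n", $kept);
}
```

Between runs, a webbot could rewrite its cookie file with something like file_put_contents($file, purge_session_cookies(file_get_contents($file))). Deleting the whole file with unlink() is simpler, but it also discards persistent cookies.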

You can also avoid issues with unpurged cookies by inserting the line of code in Example 21-6 as part of your PHP/CURL session configuration, just after your other cookie declarations.

When this line of code is used, your cookies are still written to the cookie file on your hard drive, but session cookies (those that are supposed to expire) are not returned in subsequent sessions with the same website. In other words, the cookie file still records everything, while the webbot otherwise exhibits regular browser-like cookie behavior.
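The listing for Example 21-6 does not appear in this text. The PHP/CURL option whose documented behavior matches the description above is CURLOPT_COOKIESESSION, so the following is a sketch under that assumption rather than the original listing. The function name and the $cookie_file parameter are illustrative.

```php
<?php
// Sketch only: configure_cookie_handling() and $cookie_file are illustrative
// names, not part of LIB_http. $s is a handle returned by curl_init().
function configure_cookie_handling($s, string $cookie_file): bool
{
    // The usual cookie declarations: read from and write to the same file.
    $ok = curl_setopt($s, CURLOPT_COOKIEFILE, $cookie_file);
    $ok = curl_setopt($s, CURLOPT_COOKIEJAR, $cookie_file) && $ok;

    // Start a new cookie "session": session cookies loaded from the file
    // are ignored instead of being sent back to the server.
    $ok = curl_setopt($s, CURLOPT_COOKIESESSION, true) && $ok;

    return $ok; // true only if every option was accepted
}
```

The option is set after the other cookie declarations, as the text recommends. Persistent cookies in the file are unaffected.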

There is also a PHP/CURL option that allows you to use cookies without ever writing them to the cookie file. I caution against using this option, however, because the ability to look in your cookie file to see how cookies are read and written makes debugging complex situations easier.

In some applications, your webbots may need to manage cookies for multiple users. For example, suppose you write one of the procurement bots or snipers mentioned in Chapter 18. You may want to integrate the webbot into a website where several people may log in and specify purchases. If these people each have private accounts at the e-commerce website that the webbot targets, each user’s cookies will require separate management.

Webbots can manage multiple users’ cookies by employing a separate cookie file for each user. LIB_http, however, does not support multiple cookie files, so you will have to write a scheme that assigns the appropriate cookie file to each user. Instead of declaring the name of the cookie file once, as is done in LIB_http, you will need to define the cookie file each time a PHP/CURL session is used. For simplicity, it makes sense to use the person’s username in the name of the cookie file, as shown in Example 21-7.
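A per-user scheme along these lines might look like the sketch below. It is not the original listing for Example 21-7. The function names, the $cookie_dir parameter, and the "_cookies.txt" suffix are all assumptions made for illustration. The username is sanitized before it becomes part of a path, since it may come from user input.

```php
<?php
// Hypothetical helper: build a cookie file path from a username.
function cookie_file_for_user(string $username, string $cookie_dir): string
{
    // Replace anything other than letters, digits, "_" and "-" so the
    // username cannot escape $cookie_dir (for example with "../").
    $safe = preg_replace('/[^A-Za-z0-9_-]/', '_', $username);
    return rtrim($cookie_dir, '/\\') . DIRECTORY_SEPARATOR . $safe . '_cookies.txt';
}

// Hypothetical helper: open a PHP/CURL session bound to one user's cookies.
function open_session_for_user(string $username, string $target, string $cookie_dir)
{
    $cookie_file = cookie_file_for_user($username, $cookie_dir);

    $s = curl_init();
    curl_setopt($s, CURLOPT_COOKIEFILE, $cookie_file);   // read this user's cookies
    curl_setopt($s, CURLOPT_COOKIEJAR, $cookie_file);    // write this user's cookies
    curl_setopt($s, CURLOPT_URL, $target);               // define the target site
    curl_setopt($s, CURLOPT_RETURNTRANSFER, true);       // return the page as a string
    return $s;
}
```

Because sanitizing maps different characters to "_", two distinct usernames such as "a.b" and "a_b" would share a file under this sketch. A production scheme might key the file on a unique user ID instead.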