You can write webbots that support cookies without using PHP/CURL, but doing so adds to the complexity of your designs. Without PHP/CURL, you’ll have to read each returned HTTP header, parse the cookies, and store them for later use. You will also have to decide which cookies to send to which domains, manage expiration dates, and return everything correctly in headers of page requests. PHP/CURL does all this for you, automatically. Even with PHP/CURL, however, cookies pose challenges to webbot designers.
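To give a sense of the bookkeeping involved without PHP/CURL, here is a minimal sketch of only the first step: pulling cookies out of raw response headers. The function name parse_set_cookie_headers() and the shape of the returned array are assumptions made for illustration; they are not part of LIB_http. Domain matching, expiration handling, and sending the cookies back are deliberately left out.

```php
<?php
// Hypothetical helper (not part of LIB_http): extract cookies from raw HTTP headers.
function parse_set_cookie_headers(string $raw_headers): array
{
    $cookies = [];
    foreach (preg_split('/\r\n|\n|\r/', $raw_headers) as $line) {
        // Header names are case-insensitive, so match without regard to case.
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        // Split "name=value; attr=value; flag" into trimmed parts.
        $parts = array_map('trim', explode(';', substr($line, strlen('Set-Cookie:'))));

        // The first part is always the cookie's name=value pair.
        $pair = explode('=', array_shift($parts), 2);
        if (count($pair) !== 2 || trim($pair[0]) === '') {
            continue; // Skip malformed cookies.
        }
        $cookie = ['name' => trim($pair[0]), 'value' => $pair[1], 'attributes' => []];

        // Remaining parts are attributes such as path, expires, or secure.
        foreach ($parts as $attribute) {
            if ($attribute === '') {
                continue;
            }
            $kv = explode('=', $attribute, 2);
            // Flags without a value (for example "secure") are stored as true.
            $cookie['attributes'][strtolower(trim($kv[0]))] = $kv[1] ?? true;
        }
        $cookies[] = $cookie;
    }
    return $cookies;
}
```

Even this partial version has to cope with case-insensitive header names, optional attributes, and malformed lines, which is the kind of detail PHP/CURL takes care of for you.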
Fortunately, PHP/CURL does support cookies, and we can effectively use it to capture the cookies from the previous example, as shown in Example 21-3.
Example 21-3. Reading cookies with PHP/CURL and the LIB_http library
include("LIB_http.php");
$target = "http://www.WebbotsSpidersScreenScrapers.com/Listing_21_1.php";
http_get($target, "");
LIB_http defines the file where PHP/CURL stores cookies. This declaration is done near the beginning of the file, as shown in Example 21-4.
Example 21-4. Cookie file declaration, as made in LIB_http
# Location of your cookie file (must be a fully resolved address)
define("COOKIE_FILE", "c:\cookie.txt");
As noted in Example 21-4, the address for a cookie file should be fully resolved and within the local file structure. In my experience, relative cookie file addresses may work on one PHP/CURL platform but not on others. For that reason, if you write webbots that need to perform in a mixed environment (Windows, Linux, etc.), you should always define fully resolved paths for your cookie files.
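The sketch below shows one way to build a fully resolved cookie path that works on both Windows and Linux, and how such a path is typically handed to PHP/CURL. The source of LIB_http is not reproduced here, so this is not its implementation: the helper names absolute_cookie_path() and fetch_with_cookie_file() are hypothetical. CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE are the standard PHP/CURL options for writing and reading a cookie file.

```php
<?php
// Hypothetical helper: build a fully resolved cookie-file path.
// By default the file is placed in the directory of the current script.
function absolute_cookie_path(string $file_name, string $base_dir = __DIR__): string
{
    // Remove any trailing slash, then join with the platform's separator.
    return rtrim($base_dir, "/\\") . DIRECTORY_SEPARATOR . $file_name;
}

// Hypothetical helper: download a page while PHP/CURL manages cookies in a file.
function fetch_with_cookie_file(string $url, string $cookie_file)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);      // Return the page as a string
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);   // File that received cookies are written to
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);  // File that cookies are read from
    $page = curl_exec($ch);
    unset($ch);  // Releasing the handle lets libcurl write the cookie file
    return $page;
}
```

A call such as fetch_with_cookie_file($target, absolute_cookie_path("cookie.txt")) would then use the same absolute path regardless of the directory the script is launched from.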
If a PHP/CURL script downloads a web page that writes cookies, as the page in Example 21-1 does (the URL for this web page is available at this book’s website), PHP/CURL writes the cookies in Netscape Cookie Format to the file defined in the LIB_http configuration, as shown in Example 21-5.
Example 21-5. The cookie file, as written by PHP/CURL
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

www.webbotsspidersscreenscrapers.com	FALSE	/	FALSE	0	TemporaryCookie	66
www.webbotsspidersscreenscrapers.com	FALSE	/	FALSE	1324323775	PermanentCookie	88
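Each cookie line in this file holds seven fields: the domain, a flag saying whether subdomains are included, the path, a flag saying whether a secure connection is required, the expiration time as a Unix timestamp, the cookie name, and the cookie value. If a webbot needs to inspect these values, a minimal sketch of a reader might look like the following. The function name parse_netscape_cookie_text() and the array keys are assumptions chosen for illustration, and the sketch splits on any whitespace so that it accepts both tab-separated files and space-separated listings.

```php
<?php
// Hypothetical helper: convert Netscape-format cookie file text into an array of cookies.
function parse_netscape_cookie_text(string $text): array
{
    $fields = ['domain', 'include_subdomains', 'path', 'secure', 'expires', 'name', 'value'];
    $cookies = [];
    foreach (preg_split('/\r\n|\n|\r/', $text) as $line) {
        $line = trim($line);
        $http_only = false;
        // libcurl prefixes HttpOnly cookies with "#HttpOnly_"; strip it and remember the flag.
        if (strpos($line, '#HttpOnly_') === 0) {
            $http_only = true;
            $line = substr($line, strlen('#HttpOnly_'));
        }
        // Skip blank lines and ordinary comment lines.
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        // Limit the split to seven columns so a value containing spaces stays intact.
        $columns = preg_split('/\s+/', $line, 7);
        if (count($columns) !== 7) {
            continue; // Not a complete cookie line (this sketch also skips empty values).
        }
        $cookie = array_combine($fields, $columns);
        $cookie['include_subdomains'] = ($cookie['include_subdomains'] === 'TRUE');
        $cookie['secure'] = ($cookie['secure'] === 'TRUE');
        $cookie['expires'] = (int) $cookie['expires'];  // 0 marks a session (temporary) cookie
        $cookie['http_only'] = $http_only;
        $cookies[] = $cookie;
    }
    return $cookies;
}
```

Calling parse_netscape_cookie_text(file_get_contents(COOKIE_FILE)) on the file above would yield two entries, with an expiration of 0 for TemporaryCookie and 1324323775 for PermanentCookie.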