PHP/CURL and Cookies

You can write webbots that support cookies without using PHP/CURL, but doing so complicates your designs. Without PHP/CURL, you have to read each returned HTTP header, parse the cookies, and store them for later use. You also have to decide which cookies to send to which domains, manage expiration dates, and return everything correctly in the headers of later page requests. PHP/CURL does all of this for you automatically. Even with PHP/CURL, however, cookies pose challenges to webbot designers.
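
To give a sense of the work PHP/CURL saves, here is a minimal sketch of just the first step, pulling cookies out of raw response headers. The function name parse_set_cookie_headers() and the sample headers are hypothetical and are not part of LIB_http. The sketch does not store cookies, match them to domains, or expire them.

```php
<?php
// Hypothetical helper (not part of LIB_http): extract cookies from raw
// HTTP response headers, one small part of what PHP/CURL does for you.
function parse_set_cookie_headers(string $raw_headers): array
{
    $cookies = [];
    foreach (preg_split('/\r\n|\n/', $raw_headers) as $line) {
        // Header names are case-insensitive
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        // Split "name=value; attribute; attribute=value" into its parts
        $parts = array_map('trim', explode(';', substr($line, strlen('Set-Cookie:'))));
        $pair  = explode('=', array_shift($parts), 2);
        if (count($pair) !== 2 || $pair[0] === '') {
            continue; // Malformed cookie; skip it
        }
        $cookie = ['name' => $pair[0], 'value' => $pair[1], 'attributes' => []];
        foreach ($parts as $attribute) {
            if ($attribute === '') {
                continue;
            }
            // Attribute names (Path, Expires, Secure, ...) are case-insensitive;
            // attributes without a value, such as Secure, are stored as true
            $kv = explode('=', $attribute, 2);
            $cookie['attributes'][strtolower($kv[0])] = $kv[1] ?? true;
        }
        $cookies[] = $cookie;
    }
    return $cookies;
}

// Illustrative headers only; a real server's response will differ
$headers = "HTTP/1.1 200 OK\r\n"
         . "Set-Cookie: TemporaryCookie=66; path=/\r\n"
         . "Set-Cookie: PermanentCookie=88; expires=Mon, 19 Dec 2011 19:42:55 GMT; path=/\r\n";

foreach (parse_set_cookie_headers($headers) as $cookie) {
    echo $cookie['name'] . " = " . $cookie['value'] . "\n";
}
```

Even this simplified version ignores domain matching, expiration, and sending the cookies back, which is why letting PHP/CURL manage cookies is usually the better choice.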

Because PHP/CURL handles cookies for you, a few lines of code are enough to capture the cookies from the previous example, as shown in Example 21-3.

Example 21-3. Reading cookies with PHP/CURL and the LIB_http library

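# LIB_http configures PHP/CURL, including its cookie file
include("LIB_http.php");
$target = "http://www.WebbotsSpidersScreenScrapers.com/Listing_21_1.php";
# Fetch the page (the second argument is the referer; none is sent here)
http_get($target, "");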

LIB_http defines the file where PHP/CURL stores cookies. This declaration is done near the beginning of the file, as shown in Example 21-4.

Example 21-4. Cookie file declaration, as made in LIB_http

# Location of your cookie file (must be a fully resolved address)
# Single quotes keep the backslash from being read as an escape character
define("COOKIE_FILE", 'c:\cookie.txt');

As noted in Example 21-4, the path to a cookie file should be fully resolved (absolute) and within the local file structure. In my experience, relative cookie file paths may work on one PHP/CURL platform but not on others. For that reason, if you write webbots that need to run in a mixed environment (Windows, Linux, etc.), you should always define fully resolved paths for your cookie files.
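
One way to get a fully resolved path on any platform is to build it at runtime instead of hardcoding a drive letter. The sketch below is an illustration under stated assumptions, not LIB_http's own code: cookie_file_path() and fetch_with_cookies() are hypothetical names, and storing the cookie file next to the script is just one reasonable choice. CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE are the standard PHP/CURL options for writing and reading a cookie file.

```php
<?php
// Hypothetical helper (not part of LIB_http): build an absolute cookie-file
// path that is valid on both Windows and Linux.
function cookie_file_path(string $filename = 'cookie.txt'): string
{
    // realpath() resolves the script's directory to an absolute path;
    // DIRECTORY_SEPARATOR is "\" on Windows and "/" elsewhere
    return realpath(__DIR__) . DIRECTORY_SEPARATOR . $filename;
}

// Hypothetical sketch of a cookie-aware download using PHP/CURL directly
function fetch_with_cookies(string $url, string $cookie_file)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);       // return the page as a string
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);   // read stored cookies from this file
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);    // write received cookies to this file
    $page = curl_exec($ch);
    unset($ch); // releasing the handle flushes cookies to the cookie jar
    return $page;
}

define("COOKIE_FILE", cookie_file_path());

// Usage (commented out because it requires network access):
// $page = fetch_with_cookies("http://www.WebbotsSpidersScreenScrapers.com/Listing_21_1.php", COOKIE_FILE);
```

The directory holding the cookie file must be writable by the account that runs the webbot, or PHP/CURL will not be able to save any cookies.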

If a PHP/CURL script downloads a web page that writes cookies, as the page in Example 21-1 does (the URL for this web page is available at this book’s website), PHP/CURL writes the cookies in Netscape Cookie Format to the file defined in the LIB_http configuration, as shown in Example 21-5.

Example 21-5. The cookie file, as written by PHP/CURL

# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

www.webbotsspidersscreenscrapers.com     FALSE   /     FALSE   0            TemporaryCookie 66
www.webbotsspidersscreenscrapers.com     FALSE   /     FALSE   1324323775   PermanentCookie 88
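
Each record in this file has seven fields: domain, a flag indicating whether subdomains may read the cookie, path, a flag indicating whether the cookie requires a secure connection, expiration time as a Unix timestamp (0 for a cookie that expires with the session), name, and value. If your webbot needs to inspect the cookies PHP/CURL has stored, a small parser along the following lines may help. This is a sketch: parse_netscape_cookies() is a hypothetical name, not a LIB_http function, and it assumes the tab-separated layout that libcurl normally writes (the listing above shows the columns aligned with spaces).

```php
<?php
// Hypothetical helper (not part of LIB_http): parse the contents of a
// Netscape-format cookie file into an array of cookies.
function parse_netscape_cookies(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\r\n|\n/', $contents) as $line) {
        $http_only = false;
        // libcurl marks HttpOnly cookies with a "#HttpOnly_" prefix
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
            $http_only = true;
        }
        // Skip blank lines and comments
        if (trim($line) === '' || $line[0] === '#') {
            continue;
        }
        // Assumption: fields are separated by tabs
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // Not a cookie record
        }
        list($domain, $subdomains, $path, $secure, $expires, $name, $value) = $fields;
        $cookies[] = [
            'domain'             => $domain,
            'include_subdomains' => ($subdomains === 'TRUE'),
            'path'               => $path,
            'secure'             => ($secure === 'TRUE'),
            'expires'            => (int) $expires, // 0 means a session cookie
            'name'               => $name,
            'value'              => $value,
            'http_only'          => $http_only,
        ];
    }
    return $cookies;
}

// Usage, assuming COOKIE_FILE is defined as in Example 21-4:
// foreach (parse_netscape_cookies(file_get_contents(COOKIE_FILE)) as $cookie) {
//     echo $cookie['name'] . " = " . $cookie['value'] . "\n";
// }
```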

Note

Each web client maintains its own cookies, and the cookie file written by PHP/CURL is not the same cookie file created by your browser.