Since the output of this webbot contains formatted HTML, it is appropriate to run this webbot within a browser, as shown in Figure 10-2.
This webbot counts and identifies all the links on the target website. It also indicates the HTTP code and diagnostic message describing the status of the fetch used to download the page and displays the actual amount of time it took the page to load.
Let’s take this time to look at some of the libraries used by this webbot.
The following script creates an indexed array of HTTP error codes and their definitions. To use the array, simply include the library, insert your HTTP code value into the array, and echo as shown in Example 10-8.
Example 10-8. Decoding an HTTP code with LIB_http_codes
include(LIB_http_codes.php); echo $status_code_array[$YOUR_HTTP_CODE]['MSG']
LIB_http_codes
is essentially a group of array declarations, with the first element being the HTTP code and the second element, ['MSG']
, being the status message text. Like the others, this library is also available for download from this book’s website.
The library that creates fully resolved addresses, LIB_resolve_addresses
, is also available for download at the book’s website.
Before you download and examine this library, be warned that creating fully resolved URLs is a lot like making sausage—while you might enjoy how sausage tastes, you probably wouldn’t like watching those lips and ears go into the grinder. Simply put, the act of converting relative links into fully resolved URLs involves awkward, asymmetrical code with numerous exceptions to rules and many special cases. This library is extraordinarily useful, but it isn’t made up of pretty code.
If you don’t need to see how this conversion is done, there’s no reason to look. If, on the other hand, you’re intrigued by this description, feel free to download the library from the book’s website and see for yourself. More importantly, if you find a cleaner solution, please upload it to the book’s website to share it with the community.