This section expands on the concepts you learned in the previous section with simple yet demonstrative projects. Any of these projects, with further development, could be transformed from a simple webbot concept into a potentially marketable product.
The first project describes webbots that collect and analyze online prices from a mock store that exists on this book’s website. The prices change periodically, creating an opportunity for your webbots to analyze and make purchase decisions based on the price of items.
Since this example store exists solely for your experimentation, you can test your webbots with confidence on web pages that serve no commercial purpose and whose structure hasn’t changed since this book’s publication. This environment also gives you the freedom to make mistakes without worrying about the crumbs your webbots leave behind in an actual online store’s server log file.
The image-capturing webbot leverages your knowledge of downloading and parsing web pages to create an application that copies all the images on a web page (and their directory structure) to your local hard drive. In addition to creating a useful tool, you’ll also learn how to convert relative addresses into fully resolved URLs, a technique that is vital for later spidering projects.
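To give a feel for what converting relative addresses into fully resolved URLs involves, here is a minimal sketch. It is written in Python purely for illustration, and the function name `resolve_image_urls`, the page address, and the image paths are all hypothetical. It relies on the standard library's `urljoin`, which combines the address of the page a link was found on with the link itself.

```python
from urllib.parse import urljoin


def resolve_image_urls(page_url, image_srcs):
    """Return a fully resolved URL for each src value found on page_url."""
    return [urljoin(page_url, src) for src in image_srcs]


# Hypothetical page address and image references, for illustration only.
page = "http://www.example.com/store/index.html"
srcs = [
    "images/logo.gif",               # relative to the page's directory
    "/banners/top.jpg",              # relative to the site root
    "../art/photo.png",              # one directory above the page
    "http://cdn.example.com/a.gif",  # already fully resolved
]

for url in resolve_image_urls(page, srcs):
    print(url)
```

Each relative reference resolves against the page's own location, while an address that is already complete passes through unchanged. The resolved path is also what lets a webbot recreate the source's directory structure on the local drive.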
The link-verification project gives you the opportunity to write a webbot that automatically verifies that all the links on a web page point to valid web pages. I conclude the chapter with ideas for expanding this concept into a variety of useful tools and products.
This project describes a simple webbot that determines how highly a search engine ranks a website, given a set of search criteria. You’ll also find a host of ideas about how you can modify this concept to provide a variety of other services.
Aggregation is a technique that gathers the contents of multiple web pages into a single location. This project introduces techniques that make it easy to exploit the availability of RSS news services.
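As a rough picture of what aggregation looks like in practice, the sketch below merges the items from two RSS feeds into a single list. It is written in Python purely for illustration and is not the project's own code. The feed contents, the source names, and the function names `parse_rss_items` and `aggregate` are all hypothetical, and the feeds are supplied as strings so the example runs without a network connection.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_xml, source):
    """Extract the title and link of every <item> in one RSS document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "source": source,
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
        })
    return items


def aggregate(feeds):
    """Combine the items of several feeds (source name -> RSS XML string)."""
    combined = []
    for source, rss_xml in feeds.items():
        combined.extend(parse_rss_items(rss_xml, source))
    return combined


# Hypothetical feeds, inlined for illustration only.
FEEDS = {
    "News A": "<rss version='2.0'><channel><title>A</title>"
              "<item><title>First story</title><link>http://a.example.com/1</link></item>"
              "<item><title>Second story</title><link>http://a.example.com/2</link></item>"
              "</channel></rss>",
    "News B": "<rss version='2.0'><channel><title>B</title>"
              "<item><title>Other story</title><link>http://b.example.com/1</link></item>"
              "</channel></rss>",
}

for entry in aggregate(FEEDS):
    print(entry["source"], "|", entry["title"], "|", entry["link"])
```

Because RSS gives every story the same structure, one small parsing function can serve any number of feeds. That uniformity is what makes these services so easy to exploit.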
Webbots that use FTP are able to move the information they collect to an FTP server for storage or for use by other applications. In this chapter, we’ll explore methods for navigating on, uploading to, and downloading from FTP servers.
Here you will learn how to write webbots that read and delete messages from any POP3 mail server. The ability to read email allows a webbot to interpret instructions sent by email or to apply a variety of email filters.
In this chapter, you’ll learn various methods that allow your webbots to send email messages and notifications. You will also learn how to leverage what you learned in the previous chapter to create “smart email addresses” that can determine how to forward messages based on their content without modifying anything on the mail server.
This project describes how you can use form emulation and parsing techniques to transform any existing online application into a function you can call from any PHP program.