The chapters in this section explore the finer technical aspects of webbot and spider development. In the first two chapters, I’ll share lessons that I learned (sometimes the hard way) while writing very specialized webbots and spiders. I’ll also describe methods for leveraging PHP/CURL to create webbots that manage authentication, encryption, and cookies.
This discussion of spider design starts with an exploration of simple spiders that find and follow links on specific web pages. The conversation later expands to techniques for developing advanced spiders that autonomously roam the Internet, looking for specific information and dropping payloads (that is, performing predefined functions) as they find the desired information.
In this chapter, we’ll explore the design theory of writing snipers, webbots that automatically purchase items. Snipers are primarily used on online auction sites, “attacking” when a specific set of criteria is met.
Encrypted websites are not a problem for webbots using PHP/CURL. Here we’ll explore how online encryption certificates work and how PHP/CURL makes encryption easy to handle.
In this chapter on accessing authenticated (i.e., password-protected) sites, we’ll explore the various methods used to protect a website from unauthorized users. You’ll also learn how to write webbots that can automatically log in to these sites.
Advanced cookie management involves managing cookie expiration dates and multiple sets of cookies for multiple users. We’ll also explore PHP/CURL’s ability (and inability) to meet these challenges.
Webbots are most useful when they can be scheduled to run periodically and automatically. This chapter explores techniques that allow your webbots to run unattended while still simulating human activity.
Modern web development technologies, such as JavaScript, web sockets, AJAX, and Flash, complicate data extraction. These issues can nearly always be overcome with simple iMacros browser macros.
After you’ve mastered simple browser macro scripts, it’s time to learn a few tricks to make iMacros perform beyond its original scope. Here you’ll learn how to develop PHP scripts that write dynamic macros and how to make iMacros parse data or upload files.
This section’s final chapter describes various methods for deploying webbots, spiders, and screen scrapers in production environments, and explains how to gain maximum data-gathering capacity from your designs.