Part III. Advanced Technical Considerations

The chapters in this section explore the finer technical aspects of webbot and spider development. In the first two chapters, I’ll share lessons that I learned (sometimes the hard way) while writing very specialized webbots and spiders. I’ll also describe methods for leveraging PHP/CURL to create webbots that manage authentication, encryption, and cookies.

Chapter 17

  • This discussion of spider design starts with an exploration of simple spiders that find and follow links on specific web pages. The conversation later expands to techniques for developing advanced spiders that autonomously roam the Internet, looking for specific information and dropping payloads (performing predefined functions as they find the desired information).

Chapter 18

  • In this chapter, we’ll explore the design theory of writing snipers, webbots that automatically purchase items. Snipers are primarily used on online auction sites, “attacking” when a specific list of criteria is met.

Chapter 19

  • Encrypted websites are not a problem for webbots using PHP/CURL. Here we’ll explore how online encryption certificates work and how PHP/CURL makes encryption easy to handle.

Chapter 20

  • In this chapter on accessing authenticated (i.e., password-protected) sites, we’ll explore the various methods used to protect a website from unauthorized users. You’ll also learn how to write webbots that can automatically log in to these sites.

Chapter 21

  • Advanced cookie management involves managing cookie expiration dates and multiple sets of cookies for multiple users. We’ll also explore PHP/CURL’s ability (and inability) to meet these challenges.

Chapter 22

  • Webbots are most useful when they can be scheduled to run periodically and automatically. This chapter explores techniques that allow your webbots to run unattended while still simulating human activity.

Chapter 23

  • Modern web development techniques, such as JavaScript, web sockets, AJAX, and Flash, complicate data extraction. These issues can nearly always be overcome with the use of simple iMacros browser macros.

Chapter 24

  • After you’ve mastered simple browser macro scripts, it’s time to learn a few tricks to make iMacros perform beyond its original scope. Here you’ll learn how to develop PHP scripts that write dynamic macros and how to make iMacros parse data or upload files.

Chapter 25

  • This section’s final chapter describes various methods for deploying webbots, spiders, and screen scrapers in production environments, and how to gain maximum data-gathering capacity from your designs.