This book identifies the limitations of typical web browsers and explores how you can use webbots to capitalize on these limitations. You’ll learn how to design and write webbots through sample scripts and example projects. Moreover, you’ll find answers to larger questions of webbot design.
I’ve written webbots, spiders, and screen scrapers for more than 15 years, and in the process I’ve made most of the mistakes someone can make. Because webbots can make unconventional demands on websites, system administrators sometimes mistake webbots’ requests for attempts to hack into their systems. Thankfully, none of my mistakes has ever led to a courtroom, but they have resulted in intimidating phone calls, scary emails, and very awkward moments. Happily, I’ve learned from these situations, and it’s been a very long time since I’ve sat across the desk from an angry system administrator. You can spare yourself a lot of grief by reading my stories and learning from my mistakes.
You will learn about the technology needed to write a wide assortment of webbots. Some technical skills you’ll master include these:
Programmatically downloading websites
Decoding encrypted websites
Unlocking authenticated web pages
Managing cookies
Parsing data
Writing spiders
Managing the large amounts of data that webbots generate
This book uses several code libraries that make it easy for you to write webbots, spiders, and screen scrapers. The functions and declarations in these libraries provide the basis for most of the example scripts used in this book. You’ll save time by using these libraries because they handle the low-level work, leaving the higher-level planning and development to you. All of these libraries are available for download at this book’s website.