Index

A note on the digital index

Each link in an index entry is displayed as the title of the section in which that entry appears. Because some sections have multiple index markers, it is not unusual for an entry to have several links to the same section. Clicking any link takes you directly to the place in the text where the marker appears.

Symbols

$address array, Sending an Email with mail()
$content_type variable, Writing a Webbot That Sends Email Notifications
$data_array, Examining LIB_http’s Source Code, Getting the Session Value
for LIB_http library functions, Examining LIB_http’s Source Code
$FETCH_DELAY, Example Spider, exclude_link()
$filter_array, Choosing Data Sources for Webbots, Dealing with CDATA
$link_array elements, Setting the Page Base
$page_base variable, Defining the Page Base
$result array, FILE element, preg_replace(pattern, replacement, subject)
$status_code_array, Displaying the Page Status
$_GET array, Avoid Form Errors
$_POST array, Avoid Form Errors
& (ampersand), in GET method, Methods
. (period), as POP3 end-of-message indicator, The POP3 LIST Command
404 Not Found error, Avoid Making Requests for Pages That Don’t Exist, HTTP Codes
? (question mark), in GET method, Methods

A

abstractions, of program interface, LIB_mysql
access log file, Cookies, Why Design a Stealthy Webbot?, Why Design a Stealthy Webbot?
and webbot detection, Why Design a Stealthy Webbot?
error logging in, Why Design a Stealthy Webbot?
access rights, verifying, Protect Intellectual Property
action attribute, Reverse Engineering Form Interfaces, Analyzing a Form
for form analyzer, Analyzing a Form
of form, Reverse Engineering Form Interfaces
action of person, simulating, Fetching the Search Results
$address array, Sending an Email with mail()
agent name, Cookies, Familiarizing Yourself with the Default Values, Error Logs, Types of Open Proxies, Use the robots.txt File, Fun Things to Do with Unwanted Spiders, CURLOPT_URL, CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS
default for, LIB_http, Familiarizing Yourself with the Default Values
defining for PHP/CURL session, CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS
log record of, Error Logs
spoofing, Cookies, Types of Open Proxies, Use the robots.txt File, Fun Things to Do with Unwanted Spiders, CURLOPT_URL
aggregating information by relevance, Inspiration from Browser Limitations
aggregation webbots, Price-Monitoring Webbots, Aggregation Webbots, Choosing Data Sources for Webbots, Example Aggregation Webbot, Familiarizing Yourself with RSS Feeds, Downloading and Parsing the Target, Dealing with CDATA, Dealing with CDATA
and filtering, Dealing with CDATA
CDATA, Dealing with CDATA
choosing data sources, Choosing Data Sources for Webbots
downloading and parsing script, Downloading and Parsing the Target
RSS feeds, Example Aggregation Webbot
writing, Familiarizing Yourself with RSS Feeds
Alexa web-monitoring service, A Safer Method of Passing Variables to Webbots
“all rights reserved” notice, Copyrights Do Not Have to Be Registered
Amazon Web Services, SOAP interfaces, A Safer Method of Passing Variables to Webbots
ampersand (&), in GET method, Methods
anchor tags, Link-Verification Webbots (see links)
Andreessen, Marc, Introduction
anonymity, Configuring PHP to Send Mail, Commercial Proxies
as a process, Commercial Proxies
in commercial email, Configuring PHP to Send Mail
anti-pokerbot software, Webbots That Act on Your Behalf
Anti-Spam Law, Virginia, Trespass to Chattels
Apache, Hardware, Referer Management, http_get_withheader(), Synchronize Clocks, Basic Authentication, Why Design a Stealthy Webbot?
cookies, Basic Authentication
headers, http_get_withheader(), Synchronize Clocks
installing PHP on, Referer Management
log files, Why Design a Stealthy Webbot?
web server, Hardware
Application Program Interfaces (APIs), Spidering Search Engines Is a Bad Idea, Example Aggregation Webbot, Example Aggregation Webbot, Example Aggregation Webbot
Amazon, Example Aggregation Webbot
eBay, Example Aggregation Webbot
Google, Spidering Search Engines Is a Bad Idea
Google Maps, Example Aggregation Webbot
archive_links() function, archive_links()
ARPANET, FTP Webbots
array, Parsing a Data Set into an Array: parse_array(), Regular Expressions Provide Little (If Any) Context, The GET Method, Initialization and Downloading the Target, Initialization and Downloading the Target
assigning parsed data to, Initialization and Downloading the Target
elements, form data as, The GET Method
of <img> tags, src attribute from, Regular Expressions Provide Little (If Any) Context
parsing, Parsing a Data Set into an Array: parse_array(), Initialization and Downloading the Target
data set into, Parsing a Data Set into an Array: parse_array()
table into, Initialization and Downloading the Target
attributes, parsing values, Parsing a Data Set into an Array: parse_array()
audience, for Internet, Old-School Client-Server Technology
authentication, Introducing PHP/CURL, Introducing PHP/CURL, Example FTP Webbot, Procurement Webbot Theory, Authenticate Buyer, Synchronize Clocks, Types of Online Authentication, Types of Online Authentication, Strengthening Authentication by Combining Techniques, Strengthening Authentication by Combining Techniques, Basic Authentication, Basic Authentication, Basic Authentication, Basic Authentication, Cookie Session Example, Purging Temporary Cookies, Authenticate Users, CURLOPT_SSL_VERIFYPEER
and encryption, Basic Authentication
basic, Introducing PHP/CURL, Strengthening Authentication by Combining Techniques, Basic Authentication, CURLOPT_SSL_VERIFYPEER
by PHP/CURL, Introducing PHP/CURL
curl_setopt() function options for, CURLOPT_SSL_VERIFYPEER
test pages, Basic Authentication
default response to request, Introducing PHP/CURL
digest, Basic Authentication
example scripts and practice pages, Strengthening Authentication by Combining Techniques
for deterring webbots, Authenticate Users
FTP, Example FTP Webbot
of buyer by procurement webbot, Procurement Webbot Theory
of snipers, Authenticate Buyer, Purging Temporary Cookies
session, Basic Authentication
strengthening by combining techniques, Types of Online Authentication
types, Types of Online Authentication
with query string sessions, Cookie Session Example
automating tasks, Help Out a Busy Executive

B

bandwidth, Referer Management, Binary-Safe Download Routine, Authenticate Buyer, Final Thoughts, Maintain the Accuracy of Referer Values
consumption, Authenticate Buyer, Final Thoughts
hijacking, Binary-Safe Download Routine, Maintain the Accuracy of Referer Values
stealing, Referer Management
base64-encoding, Storing Images in a Database
basic authentication, Introducing PHP/CURL, Strengthening Authentication by Combining Techniques, Strengthening Authentication by Combining Techniques, CURLOPT_SSL_VERIFYPEER
by PHP/CURL, Introducing PHP/CURL
curl_setopt() function options for, CURLOPT_SSL_VERIFYPEER
test pages, Strengthening Authentication by Combining Techniques
batch file, for webbot, Scheduling Webbots and Spiders
Bcc: address field, Sending an Email with mail()
Beck & Tysver legal website, It’s All About Respect
Bidder’s Edge spiders, Trespass to Chattels
bids, timing placement of, Evaluate Results, Time to Bid?
Bina, Eric, Introduction
binary-safe download routine, Creating the Image-Capturing Webbot
Bing, spiders used by, Spiders
biometrics, Types of Online Authentication
blobs, storing images as, The exe_sql() Function
blogs, Example Aggregation Webbot, Adding Filtering to Your Aggregation Webbot, Trespass to Chattels, Internet Law
aggregation of, Example Aggregation Webbot
laws concerning, Trespass to Chattels, Internet Law
searching for spelling errors, Adding Filtering to Your Aggregation Webbot
botnet management, Distributing the Task over Multiple Computers, Distributing the Task over Multiple Computers, Botnet Communication Methods, Polling the Botnet Server, Determining If There Is a Task for the Harvester to Perform, Determining If There Is a Task for the Harvester to Perform, Performing Tasks, Performing Tasks
assigning tasks, Determining If There Is a Task for the Harvester to Perform
communication methods, Distributing the Task over Multiple Computers
determining tasks, Polling the Botnet Server
performing tasks, Performing Tasks
polling the botnet server, Botnet Communication Methods
task checkout, Determining If There Is a Task for the Harvester to Perform
uploading botnet data, Performing Tasks
broken links, webbot detecting, Link-Verification Webbots
browser buffering, Executing Webbots in Browsers
browser macros, Scraping Difficult Websites with Browser Macros, Scraping Difficult Websites with Browser Macros, Overcoming Webscraping Barriers with Browser Macros, Overcoming Webscraping Barriers with Browser Macros, Overcoming Webscraping Barriers with Browser Macros, Overcoming Webscraping Barriers with Browser Macros, Installing and Using iMacros, Creating Your First Macro, Macro Initialization, Recording the Google Session, Recording the Google Session, Running a Macro, Running a Macro, Hacking iMacros, Hacking iMacros for Added Functionality, Hacking iMacros for Added Functionality, Reasons for Not Using the iMacros Scripting Engine, Reasons for Not Using the iMacros Scripting Engine, Writing a Script That Creates a Dynamic Macro, Integrating External Data into Dynamically Created Macros, Integrating External Data into Dynamically Created Macros, Launching iMacros from Windows
adding functionality to, Hacking iMacros for Added Functionality
browser-like webbots, Overcoming Webscraping Barriers with Browser Macros
commands, Recording the Google Session
creating your first, Installing and Using iMacros, Creating Your First Macro, Macro Initialization
initialization, Macro Initialization
recording, Creating Your First Macro
defined, Overcoming Webscraping Barriers with Browser Macros
dynamic macros, Reasons for Not Using the iMacros Scripting Engine, Reasons for Not Using the iMacros Scripting Engine, Writing a Script That Creates a Dynamic Macro
integrating data with, Writing a Script That Creates a Dynamic Macro
scripts that create, Reasons for Not Using the iMacros Scripting Engine
hacking, Hacking iMacros
iMacros Scripting Engine, reasons not to use, Hacking iMacros for Added Functionality
installing, Overcoming Webscraping Barriers with Browser Macros
launching automatically, Integrating External Data into Dynamically Created Macros, Integrating External Data into Dynamically Created Macros, Launching iMacros from Windows
in Linux, Launching iMacros from Windows
in Windows, Integrating External Data into Dynamically Created Macros
necessity for, Running a Macro
overcoming barriers with, Overcoming Webscraping Barriers with Browser Macros
reasons to use, Scraping Difficult Websites with Browser Macros
running, Running a Macro
suggested standard initialization of, Recording the Google Session
browser-like webbots, Overcoming Webscraping Barriers with Browser Macros
browsers, Introduction, Old-School Client-Server Technology, Ideas for Webbot Projects, Executing Webbots in Command Shells, Don’t Blow Your Cover, Parsing the Search Results, Aggregation Webbots
emulating, Don’t Blow Your Cover
(see also browser macros)
executing webbots in, Executing Webbots in Command Shells
inspiration from limitations of, Ideas for Webbot Projects
problem with, Old-School Client-Server Technology
search engine treatment vs. treatment of webbot, Parsing the Search Results
tabbed browsing, Aggregation Webbots
business leaders, webbot benefits for, Webbot Developers Are in Demand
buy-it-now auction purchases, Authenticate Buyer

C

CamelCase, Organizing Data
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), Authenticate Users, Internet Law
Cascading Style Sheets (CSS), Webbots That Interpret What They Find Online, Removing Formatting, Overcoming Webscraping Barriers with Browser Macros
impact of removing HTML tags, Removing Formatting
case, Setting the Page Base, Adding Filtering to Your Aggregation Webbot
for naming, Setting the Page Base
sensitivity, stristr() function vs. strstr() function, Adding Filtering to Your Aggregation Webbot
Cc: address field, Sending an Email with mail()
CDATA tags, Downloading and Parsing the Target
certificates, A Quick Overview of Web Encryption
Children’s Online Privacy Protection Act (COPPA), Email, Webbots, and Spam
ciphers, Webbots and Cryptography, A Quick Overview of Web Encryption
client URL Request Library (cURL), Hardware, Plot Unauthorized Wi-Fi Networks, Downloading Web Pages
client-server technology, Old-School Client-Server Technology
clipping service, online, Protect Intellectual Property, Adding Filtering to Your Aggregation Webbot
clocks, synchronization for sniper, Authenticate Buyer
code, About the Website, About the Code
in book, About the Website
libraries available online, About the Code
collusion webbots, Webbots That Interpret What They Find Online
comma-separated value (CSV) files, Executing Webbots in Browsers, Instructions You’ll Want in Every Macro
file() function for downloading, Executing Webbots in Browsers
iMacros file format, Instructions You’ll Want in Every Macro
command shell, Executing Webbots in Command Shells, Experimenting with the Spider, Forking Processes
and spider scripts, Experimenting with the Spider
executing webbots in, Executing Webbots in Command Shells
leveraging operating system with, Forking Processes
Common Object Request Broker Architecture (CORBA), A Safer Method of Passing Variables to Webbots
communication, on incompatible systems, Plot Unauthorized Wi-Fi Networks
competitive advantage, What’s in It for You?, Automating Form Submission, Final Thoughts, Time to Bid?, Designing Stealthy Webbots and Spiders, Writing Fault-Tolerant Webbots, Killing Spiders
Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA), Authenticate Users, Internet Law
compressing data, Compressing Data
computers, Forking Processes, Trespass to Chattels
(see also server)
distributing tasks across multiple, Forking Processes
constructive hacking, Webbot Developers Are in Demand
$content_type variable, PHP and FTP
Content-Type line, http_get_withheader(), The POP3 RETR Command
for email message, The POP3 RETR Command
in HTTP header, http_get_withheader()
converting website into function, Converting a Website into a Function
cookies, Cookies, Familiarizing Yourself with the Default Values, Event Triggers, Basic Authentication, Authentication with Query Sessions, Advanced Cookie Management, How Cookies Work, How Cookies Work, PHP/CURL and Cookies, PHP/CURL and Cookies, PHP/CURL and Cookies, PHP/CURL and Cookies, Purging Temporary Cookies, Adapting to Changes in Cookie Management, Adapting to Changes in Cookie Management, Selectively Allow Access to Specific Web Agents
about, Advanced Cookie Management
adapting to management changes, Adapting to Changes in Cookie Management
and forms, Event Triggers
and webbot design, How Cookies Work
defaults for, Familiarizing Yourself with the Default Values
deleting, PHP/CURL and Cookies, Adapting to Changes in Cookie Management
expiration dates for, PHP/CURL and Cookies
for authentication, Basic Authentication
for deterring webbots, Selectively Allow Access to Specific Web Agents
managing multiple users’, Purging Temporary Cookies
persistence with, PHP/CURL and Cookies
PHP/CURL to read and write, Cookies
purging temporary, PHP/CURL and Cookies
restrictions, with proxies, Authentication with Query Sessions
viewing, How Cookies Work
COOKIE_FILE, PHP/CURL and Cookies
COPPA (Children’s Online Privacy Protection Act), Email, Webbots, and Spam
copyright issues, Database or File?, It’s All About Respect, Copyrights Do Not Have to Be Registered, Copyrights Do Not Have to Be Registered, You Cannot Copyright a Fact, You Can Use Some Material Under Fair Use Laws
and facts, You Cannot Copyright a Fact
fair use laws, You Can Use Some Material Under Fair Use Laws
registration, Copyrights Do Not Have to Be Registered
“all rights reserved” notice, Copyrights Do Not Have to Be Registered
CORBA (Common Object Request Broker Architecture), A Safer Method of Passing Variables to Webbots
crawlers, Spiders (see spiders)
cron command, Scheduling Webbots and Spiders
cryptography, Webbots and Cryptography
CSS (Cascading Style Sheets), Webbots That Interpret What They Find Online, Removing Formatting, Overcoming Webscraping Barriers with Browser Macros
impact of removing HTML tags, Removing Formatting
CSV (comma-separated value) files, Executing Webbots in Browsers, Instructions You’ll Want in Every Macro
file() function for downloading, Executing Webbots in Browsers
iMacros file format, Instructions You’ll Want in Every Macro
cURL (client URL Request Library), Hardware, Plot Unauthorized Wi-Fi Networks, Downloading Web Pages
curl_error() function, Executing the PHP/CURL Command
curl_exec() function, Executing the PHP/CURL Command
curl_getinfo() function, Executing the PHP/CURL Command
curl_init() function, Creating a Minimal PHP/CURL Session
curl_setopt() function, The POST Method, Binary-Safe Download Routine, A Quick Overview of Web Encryption, A Quick Overview of Web Encryption, Cookie Session Example, Cookie Session Example, Further Exploration, Further Exploration, Avoid Making Requests for Pages That Don’t Exist, Adapting to Changes in Cookie Management, Creating a Minimal PHP/CURL Session, CURLOPT_URL, CURLOPT_URL, CURLOPT_URL, CURLOPT_URL, CURLOPT_URL, CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS, CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS, CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS, CURLOPT_TIMEOUT, CURLOPT_TIMEOUT, CURLOPT_TIMEOUT, CURLOPT_SSL_VERIFYPEER, CURLOPT_SSL_VERIFYPEER, CURLOPT_SSL_VERIFYPEER, CURLOPT_SSL_VERIFYPEER, CURLOPT_SSL_VERIFYPEER, CURLOPT_VERBOSE, CURLOPT_VERBOSE, CURLOPT_VERBOSE, CURLOPT_VERBOSE, Executing the PHP/CURL Command
case sensitivity, CURLOPT_VERBOSE
CURLOPT_COOKIEFILE option, Cookie Session Example, Further Exploration, CURLOPT_TIMEOUT
CURLOPT_COOKIEJAR option, Cookie Session Example, Further Exploration, CURLOPT_TIMEOUT
CURLOPT_FOLLOWLOCATION option, CURLOPT_URL
CURLOPT_HEADER option, CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS
CURLOPT_HTTPHEADER option, CURLOPT_TIMEOUT
CURLOPT_MAXREDIRS option, Avoid Making Requests for Pages That Don’t Exist, CURLOPT_URL
CURLOPT_NOBODY option, CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS
CURLOPT_PORT option, CURLOPT_VERBOSE
CURLOPT_POST option, CURLOPT_SSL_VERIFYPEER
CURLOPT_POSTFIELDS option, CURLOPT_SSL_VERIFYPEER
CURLOPT_REFERER option, CURLOPT_URL
CURLOPT_RETURNTRANSFER option, CURLOPT_URL
CURLOPT_SSL_VERIFYHOST option, A Quick Overview of Web Encryption
CURLOPT_SSL_VERIFYPEER option, A Quick Overview of Web Encryption, CURLOPT_SSL_VERIFYPEER
CURLOPT_TIMEOUT option, Adapting to Changes in Cookie Management
CURLOPT_UNRESTRICTED_AUTH option, CURLOPT_SSL_VERIFYPEER
CURLOPT_URL option, CURLOPT_URL
CURLOPT_USERAGENT option, CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS
CURLOPT_USERPWD option, CURLOPT_SSL_VERIFYPEER
CURLOPT_VERBOSE option, CURLOPT_VERBOSE
executing, CURLOPT_VERBOSE
custom logs, and webbot detection, Error Logs

D

daily scheduling of webbots, Scheduling a Webbot to Run Daily
data, Parsing a Data Set into an Array: parse_array(), Reverse Engineering Form Interfaces, Form Handlers, Choosing Data Sources for Webbots, Trespass to Chattels
fields in forms, Reverse Engineering Form Interfaces, Form Handlers
networks, access and abuse, Trespass to Chattels
set, parsing into array, Parsing a Data Set into an Array: parse_array()
sources, choosing for aggregation webbot, Choosing Data Sources for Webbots
$data_array, Examining LIB_http’s Source Code, Getting the Session Value
for LIB_http library functions, Examining LIB_http’s Source Code
data management, Managing Large Amounts of Data, Managing Large Amounts of Data, Managing Large Amounts of Data, Naming Conventions, Storing Data in Structured Files, The exe_sql() Function, Database or File?, Database or File?, Compressing Data, Compressing Files on Your Hard Drive, Removing Formatting
organizing data, Managing Large Amounts of Data, Managing Large Amounts of Data, Naming Conventions, Storing Data in Structured Files, The exe_sql() Function
naming conventions, Managing Large Amounts of Data
storing images in database, The exe_sql() Function
storing text in database, Storing Data in Structured Files
structured files, Naming Conventions
reducing size, Database or File?, Database or File?, Compressing Data, Compressing Files on Your Hard Drive
data compression, Compressing Data
removing formatting, Compressing Files on Your Hard Drive
storing references to image files, Database or File?
thumbnailing images, Removing Formatting
data-only interfaces, Example Aggregation Webbot, Distributing Resources, Non-ASCII Content, Non-ASCII Content, XML, A Safer Method of Passing Variables to Webbots, Advantages of SOAP
lightweight data exchange, XML
REST (Representational State Transfer), Advantages of SOAP
SOAP (Simple Object Access Protocol), Distributing Resources, A Safer Method of Passing Variables to Webbots
XML (eXtensible Markup Language), Example Aggregation Webbot, Non-ASCII Content
database, Storing Data in Structured Files, The exe_sql() Function, Experimenting with the Spider
for saving links, Experimenting with the Spider
storing images in, The exe_sql() Function
storing text in, Storing Data in Structured Files
<data> tags, for insertion parse, Fetching the Search Results
dates, in filenames, Naming Conventions
DCOM (Distributed Component Object Model), A Safer Method of Passing Variables to Webbots
decode_zipcode() function, Defining the Interface
deep linking, Maintain the Accuracy of Referer Values
default file, for web page, Downloading Web Pages
delays, inserting between page fetches, Run Your Webbot During Busy Hours
DELE command (POP3), The POP3 RETR Command
deleting, Removing Unwanted Text: remove(), Compressing Files on Your Hard Drive, Removing Formatting, PHP/CURL and Cookies
cookies, PHP/CURL and Cookies
HTML formatting, Compressing Files on Your Hard Drive
unwanted text, Removing Unwanted Text: remove()
white space, Removing Formatting
delimiters, Using LIB_parse, Parsing Text Between Delimiters: return_between()
parsing text between, Parsing Text Between Delimiters: return_between()
splitting string at, Using LIB_parse
denial-of-service (DoS) attacks, preventing, exclude_link(), Many-to-One Environment, CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS
deployment of webbots, Deployment and Scaling (see scaling)
DES (Digital Encryption Standard), A Quick Overview of Web Encryption
describe_zipcode() function, Analyzing the Target Web Page
developers, webbot benefits for, What’s in It for You?
difficult websites, scraping, Scraping Difficult Websites with Browser Macros
digest authentication, Basic Authentication
digital certificate, Webbots and Cryptography
Digital Encryption Standard (DES), A Quick Overview of Web Encryption
directories, Naming Conventions, Binary-Safe Download Routine
script for creating, Binary-Safe Download Routine
disclaimer, Hardware
disk swapping, Experimenting with the Spider
Distributed Component Object Model (DCOM), A Safer Method of Passing Variables to Webbots
<div> tags, parsing data into array, Designing the Parsing Script
DoS (denial-of-service) attacks, preventing, exclude_link(), Many-to-One Environment, CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS
downloading, Downloading Web Pages, Downloading Web Pages, Think About Files, Not Web Pages, Executing Webbots in Browsers, Link-Verification Webbots, Generating Fully Resolved URLs, FTP Webbots
linked page, Generating Fully Resolved URLs
web pages, Downloading Web Pages
with FTP, FTP Webbots
with LIB_http, Downloading Web Pages
with link-verification webbot, Link-Verification Webbots
with PHP built-in functions, Think About Files, Not Web Pages
with PHP/CURL, Executing Webbots in Browsers
download_binary_file() function, Creating the Image-Capturing Webbot
download_images_for_page() function, Directory Structure
download_parse_rss() function, Familiarizing Yourself with RSS Feeds, Downloading and Parsing the Target

E

eBay, Help Out a Busy Executive, Choosing Data Sources for Webbots, Executing POP3 Commands with a Webbot, Authenticate Buyer, Evaluate Results, Running a Macro, Advantages of SOAP, Killing Spiders, Trespass to Chattels
snipers and, Authenticate Buyer
Electronic Frontier Foundation (EFF), Internet Law
email, Example FTP Webbot, Webbots That Read Email, The POP3 RETR Command, Executing POP3 Commands with a Webbot, Webbots That Send Email, Email, Webbots, and Spam, Email, Webbots, and Spam, Email, Webbots, and Spam, Sending an Email with mail(), Writing a Webbot That Sends Email Notifications, Keeping Legitimate Mail out of Spam Filters, Sending HTML-Formatted Email, Using Returned Emails to Prune Access Lists, The Windows 7 Task Scheduler
as webbot trigger, The Windows 7 Task Scheduler
for notification, Example FTP Webbot, Using Returned Emails to Prune Access Lists
of FTP transmission failure, Example FTP Webbot
of webbot action, Using Returned Emails to Prune Access Lists
guidelines, Email, Webbots, and Spam
headers, The POP3 RETR Command
keeping legitimate out of spam filter, Writing a Webbot That Sends Email Notifications
placing account information in script, Executing POP3 Commands with a Webbot
reading with webbots, Webbots That Read Email
sending, Webbots That Send Email, Email, Webbots, and Spam, Email, Webbots, and Spam, Sending an Email with mail(), Keeping Legitimate Mail out of Spam Filters
HTML-formatted, Keeping Legitimate Mail out of Spam Filters
notifications with webbots, Sending an Email with mail()
with mail() function, Email, Webbots, and Spam
with PHP, Email, Webbots, and Spam
undeliverable as alert to invalid address, Sending HTML-Formatted Email
email-controlled webbots, Executing POP3 Commands with a Webbot
encryption, Webbots and Cryptography, Webbots and Cryptography, A Quick Overview of Web Encryption, Final Thoughts, Use the robots.txt File
authentication and, Final Thoughts
certificate, A Quick Overview of Web Encryption
for deterring webbots, Use the robots.txt File
webbots using, Webbots and Cryptography
end-of-message indicator (POP3), The POP3 LIST Command
environments, Deployment and Scaling, Deployment and Scaling, One-to-One Environment, One-to-One Environment, Many-to-One Environment
many-to-many, One-to-One Environment
many-to-one, Many-to-One Environment
one-to-many, Deployment and Scaling
one-to-one, One-to-One Environment
error, http_get(), http_get(), Access Logs, Adapting to Network Outages and Network Congestion
handlers, Adapting to Network Outages and Network Congestion
information, http_get(), http_get()
from http_get() function, http_get()
from http_get_withheader() function, http_get()
logs, and webbot detection, Access Logs
eval() function, Lightweight Data Exchange
event triggers, Event Triggers
exclude_link() function, get_domain()
exclusion list, for spiders, exclude_link()
executing webbots, Executing Webbots in Command Shells, Executing Webbots in Command Shells
in browsers, Executing Webbots in Command Shells
in command shell, Executing Webbots in Command Shells
exe_sql() function, The update() Function
expiration dates, for cookies, How Cookies Work, Advanced Cookie Management
eXtensible Markup Language (XML), Example Aggregation Webbot, Determining If There Is a Task for the Harvester to Perform, Performing Tasks, Non-ASCII Content, XML
assigning tasks, Determining If There Is a Task for the Harvester to Perform, Performing Tasks
for RSS feeds, Example Aggregation Webbot
overhead, XML

F

facts, and copyright, You Cannot Copyright a Fact
fair use laws, You Can Use Some Material Under Fair Use Laws
fault-tolerant webbots, Writing Fault-Tolerant Webbots, Writing Fault-Tolerant Webbots, Avoid Making Requests for Pages That Don’t Exist, Follow Page Redirections, Maintain the Accuracy of Referer Values, Use Relative Parsing, Use Relative Parsing, Adapting to Changes in Cookie Management, Adapting to Changes in Cookie Management
cookie management changes, Adapting to Changes in Cookie Management
form changes, Use Relative Parsing
network outages and congestion, Adapting to Changes in Cookie Management
page content changes, Maintain the Accuracy of Referer Values
URL changes, Writing Fault-Tolerant Webbots, Avoid Making Requests for Pages That Don’t Exist, Follow Page Redirections, Use Relative Parsing
and referer values accuracy, Follow Page Redirections
page redirection, Avoid Making Requests for Pages That Don’t Exist
requests for nonexistent pages, Use Relative Parsing
$FETCH_DELAY, Example Spider, exclude_link()
fgets() function, Think About Files, Not Web Pages, Executing Webbots in Browsers
file() function, downloading files with, Executing Webbots in Browsers
file handle, Think About Files, Not Web Pages, Executing Webbots in Browsers
File Transfer Protocol (FTP), FTP Webbots, Example FTP Webbot
server, connecting to, Example FTP Webbot
webbots, FTP Webbots
filesystem, geographically structured, Storing Data in Structured Files
$filter_array, Choosing Data Sources for Webbots, Dealing with CDATA
filtering, Inspiration from Browser Limitations, Dealing with CDATA
by aggregation webbot, Dealing with CDATA
information by relevance, Inspiration from Browser Limitations
Flash, Barriers to Effective Web Scraping, Barriers to Effective Web Scraping, Authenticate Users
barrier to effective webscraping, Barriers to Effective Web Scraping
for deterring webbots, Authenticate Users
for website navigation, problems caused by, Barriers to Effective Web Scraping
fopen() function, Think About Files, Not Web Pages
<form> tag, action attribute, Form Handlers
form data variables, Form Handlers
format of names, Organizing Data
formatted_mail() function, Sending an Email with mail()
forms, Introducing PHP/CURL, Automating Form Submission, Automating Form Submission, Automating Form Submission, Automating Form Submission, Reverse Engineering Form Interfaces, Reverse Engineering Form Interfaces, Reverse Engineering Form Interfaces, Form Handlers, Form Handlers, Methods, The GET Method, Event Triggers, Event Triggers, Event Triggers, Analyzing a Form, Don’t Blow Your Cover, Defining the Interface, Analyzing the Target Web Page, Analyzing the Target Web Page, Analyzing the Target Web Page, Use Relative Parsing
adapting to changes in, Use Relative Parsing
analyzing, Analyzing a Form, Defining the Interface
and cookies, Event Triggers
avoiding errors, Don’t Blow Your Cover
emulation, Automating Form Submission, Automating Form Submission
legal issues and, Automating Form Submission
handlers, Reverse Engineering Form Interfaces
input tags, Form Handlers
interfaces, reverse engineering, Automating Form Submission
main parts, Reverse Engineering Form Interfaces
source code, Analyzing the Target Web Page, Analyzing the Target Web Page
displaying, Analyzing the Target Web Page
saving, Analyzing the Target Web Page
submission, Introducing PHP/CURL, Automating Form Submission, Reverse Engineering Form Interfaces, Form Handlers, Methods, The GET Method, Event Triggers, Analyzing the Target Web Page
data fields in forms, Form Handlers
event triggers, Event Triggers
form handlers, Reverse Engineering Form Interfaces
GET method, Methods
PHP/CURL for, Introducing PHP/CURL
POST method, The GET Method
unpredictability, Event Triggers
fputs() function, Removing Formatting, The Image-Processing Loop, The POP3 RETR Command
From: address field, Sending an Email with mail()
FTP (File Transfer Protocol), FTP Webbots, Example FTP Webbot
server, connecting to, Example FTP Webbot
webbots, FTP Webbots
ftp_cdup() function, PHP and FTP
ftp_chdir() function, PHP and FTP
ftp_delete() function, PHP and FTP
ftp_get() function, PHP and FTP
ftp_mkdir() function, PHP and FTP
ftp_put() function, PHP and FTP
ftp_rawlist() function, PHP and FTP
ftp_rename() function, PHP and FTP
ftp_rmdir() function, PHP and FTP
fully resolved URLs, PHP/CURL and Cookies
functions, Converting a Website into a Function, Converting a Website into a Function, Defining the Interface, Analyzing the Target Web Page, Getting the Session Value, Getting the Session Value
(see also individual function names)
converting website into, Converting a Website into a Function, Defining the Interface, Analyzing the Target Web Page, Getting the Session Value, Getting the Session Value
describe_zipcode() function, Analyzing the Target Web Page
interface definition, Getting the Session Value
submitting form, Getting the Session Value
target page analysis, Defining the Interface

H

hacking, Webbot Developers Are in Demand, Hacking iMacros, Why Design a Stealthy Webbot?
constructive, Webbot Developers Are in Demand
iMacros, Hacking iMacros
webbot activity misinterpreted as, Why Design a Stealthy Webbot?
handle for file, Think About Files, Not Web Pages
handshake process, A Quick Overview of Web Encryption
hard drives, compressing files on, Compressing Inbound Files
hardware requirements, About the Code
harvest, separating from payload, Experimenting with the Spider
harvest_links() function, harvest_links()
hash, Sending an Email with mail()
haystack, Removing Unwanted Text: remove()
<head> tag, detecting redirection, Avoid Making Requests for Pages That Don’t Exist, Use the robots.txt File
header tags, and search engine optimization, Meta Tags
headers, Generating Fully Resolved URLs, The POP3 LIST Command, Avoid Making Requests for Pages That Don’t Exist
in email, The POP3 LIST Command
redirection, Generating Fully Resolved URLs, Avoid Making Requests for Pages That Don’t Exist
Hello World! web page, Think About Files, Not Web Pages
hijacking bandwidth, Binary-Safe Download Routine, Maintain the Accuracy of Referer Values
holidays, scheduling webbots on, Run Your Webbot During Busy Hours
Hormel Foods Corporation, Webbots That Send Email
hotel room prices, aggregating and filtering data, Inspiration from Browser Limitations
href attribute, Parsing a Data Set into an Array: parse_array(), Generating Fully Resolved URLs
extracting value, Generating Fully Resolved URLs
of link tag, parsing, Parsing a Data Set into an Array: parse_array()
HTML (Hypertext Markup Language), Content Is Mixed with Markup, Parsing Text Between Delimiters: return_between(), Parsing a Data Set into an Array: parse_array(), Compressing Files on Your Hard Drive, Keeping Legitimate Mail out of Spam Filters
for formatting email, Keeping Legitimate Mail out of Spam Filters
parsing, Content Is Mixed with Markup, Parsing Text Between Delimiters: return_between(), Parsing a Data Set into an Array: parse_array()
content of reoccurring tags, Parsing a Data Set into an Array: parse_array()
poorly written web pages, Content Is Mixed with Markup
text between tags, Parsing Text Between Delimiters: return_between()
removing formatting, Compressing Files on Your Hard Drive
htmlspecialchars() function, Selectively Allow Access to Specific Web Agents
HTMLTidy (Tidy), Content Is Mixed with Markup, Measuring the Similarity of Strings
HTTP, Think About Files, Not Web Pages, Familiarizing Yourself with the Default Values, The GET Method, Familiarizing Yourself with RSS Feeds, Basic Authentication, Botnet Communication Methods, Status Codes
header, Familiarizing Yourself with the Default Values, The GET Method, Basic Authentication
and security, The GET Method
exchanging cookies in, Basic Authentication
protocol, Think About Files, Not Web Pages, Botnet Communication Methods
port for, Botnet Communication Methods
status codes, Familiarizing Yourself with RSS Feeds, Status Codes
HTTP codes, http_get_withheader(), Status Codes
from http_get_withheader() function, http_get_withheader()
http() routine, Familiarizing Yourself with the Default Values
HTTPS protocol, Webbots and Cryptography
http_get() function, Familiarizing Yourself with the Default Values, Examining LIB_http’s Source Code
http_get_form() function, Examining LIB_http’s Source Code
http_get_form_withheader() function, Examining LIB_http’s Source Code
http_get_withheader() function, http_get(), Examining LIB_http’s Source Code
http_header() function, Examining LIB_http’s Source Code
http_post_form() function, Examining LIB_http’s Source Code, Getting the Session Value
http_post_withheader() function, Examining LIB_http’s Source Code
human patterns, webbot simulation of, Log-Monitoring Software
Hypertext Markup Language, Keeping Legitimate Mail out of Spam Filters (see HTML (Hypertext Markup Language))

I

iMacros, Scraping Difficult Websites with Browser Macros (see browser macros)
image-capturing webbots, Image-Capturing Webbots, Creating the Image-Capturing Webbot, Creating the Image-Capturing Webbot, Binary-Safe Download Routine, Directory Structure
binary-safe download routine, Creating the Image-Capturing Webbot
directory structure, Binary-Safe Download Routine
execution, Creating the Image-Capturing Webbot
main script, Directory Structure
image-processing loop, The Image-Processing Loop
images, The exe_sql() Function, Removing Formatting, Binary-Safe Download Routine
borrowing from other sites, Binary-Safe Download Routine
storing in database, The exe_sql() Function
thumbnailing, Removing Formatting
<img> tags, Removing Unwanted Text: remove(), Defining the Page Base, Header Tags
alt attribute, Header Tags
parsing from downloaded web page, Defining the Page Base
src attribute from array, parsing, Removing Unwanted Text: remove()
incompatible systems, communication on, Plot Unauthorized Wi-Fi Networks, Executing POP3 Commands with a Webbot
index file, for web page, Downloading Web Pages
indexing web pages, by search engine spider, Header Tags
infinite loops, preventing, CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS
information, aggregating and filtering by relevance, Inspiration from Browser Limitations
initialization, Example Image-Capturing Webbot, Link-Verification Webbots, How the Search-Ranking Webbot Works
download_images_for_page() function, Example Image-Capturing Webbot
link-verification webbot, Link-Verification Webbots
search-ranking script, How the Search-Ranking Webbot Works
input tags in forms, Form Handlers
insert() function, LIB_mysql
insertion parse, Fetching the Search Results
installing, Introducing PHP/CURL, Referer Management, Overcoming Webscraping Barriers with Browser Macros
HTMLTidy, Introducing PHP/CURL
iMacros, Overcoming Webscraping Barriers with Browser Macros
PHP/CURL, Referer Management
intellectual property, Help Out a Busy Executive, Keeping Webbots out of Trouble
law, Keeping Webbots out of Trouble
protecting, Help Out a Busy Executive
interfaces, data-only, Non-ASCII Content
Internet, Old-School Client-Server Technology, Hardware, What’s in It for Business Leaders?, Trespass to Chattels
access to, Hardware
audience for, Old-School Client-Server Technology
customizing for business, What’s in It for Business Leaders?
law, Trespass to Chattels
Internet Explorer, setting webbot name to, Don’t Blow Your Cover
Internet Protocol (IP) addresses, Using Proxies to Become Anonymous
intranet, Hardware
IP (Internet Protocol) addresses, Using Proxies to Become Anonymous

L

landmark, Initialization and Downloading the Target, Initialization and Downloading the Target, Initialization and Downloading the Target, Reasons for Not Using the iMacros Scripting Engine, Maintain the Accuracy of Referer Values
for end of data, Initialization and Downloading the Target
for table heading row, Initialization and Downloading the Target
to identify table, Reasons for Not Using the iMacros Scripting Engine
using least likely to change, Maintain the Accuracy of Referer Values
legal issues, Automating Form Submission, Database or File?, Email, Webbots, and Spam, Fun Things to Do with Unwanted Spiders, Trespass to Chattels
(see also copyright issues)
for email, Email, Webbots, and Spam
in form emulation, Automating Form Submission
Internet, Trespass to Chattels
website policies and, Fun Things to Do with Unwanted Spiders
legitimate mail, keeping out of spam filters, Email, Webbots, and Spam
LIB_download_images library, Example Image-Capturing Webbot
LIB_http library, Introducing PHP/CURL, Referer Management, Familiarizing Yourself with the Default Values, Familiarizing Yourself with the Default Values, http_get_withheader(), Examining LIB_http’s Source Code, Examining LIB_http’s Source Code, Methods, Analyzing a Form, Cookie Session Example
default conditions for, Familiarizing Yourself with the Default Values
downloading with, Introducing PHP/CURL
file for storing cookies, Cookie Session Example
for form analysis emulation, Analyzing a Form
for form emulation, Methods
source code, Referer Management, Familiarizing Yourself with the Default Values, http_get_withheader(), Examining LIB_http’s Source Code, Examining LIB_http’s Source Code
defaults, Referer Management, Examining LIB_http’s Source Code
functions, Familiarizing Yourself with the Default Values, Examining LIB_http’s Source Code
LIB_http_codes library, Displaying the Page Status, Status Codes
LIB_mail library, Sending an Email with mail()
LIB_mysql library, Storing Data in Structured Files, LIB_mysql, LIB_mysql, LIB_mysql, The update() Function
exe_sql() function, LIB_mysql
insert() function, LIB_mysql
update() function, The update() Function
LIB_parse library, Basic Parsing Techniques
LIB_pop3 library, The POP3 RETR Command
LIB_resolve_addresses library, Link-Verification Webbots
LIB_rss library, Familiarizing Yourself with RSS Feeds
LIB_simple_spider library, Example Spider
LIB_thumbnail library, Removing Formatting
lightweight data exchange, XML
$link_array elements, Setting the Page Base
link-verification webbots, Link-Verification Webbots, Link-Verification Webbots, Initializing the Webbot and Downloading the Target, Setting the Page Base, Setting the Page Base, Setting the Page Base, Generating Fully Resolved URLs, Generating Fully Resolved URLs, Displaying the Page Status, Displaying the Page Status, LIB_http_codes
advanced options, LIB_http_codes
displaying page status, Displaying the Page Status
downloading linked page, Generating Fully Resolved URLs
flowchart, Initializing the Webbot and Downloading the Target
generating fully resolved URLs, Generating Fully Resolved URLs
initialization and downloading target, Link-Verification Webbots
parsing links, Setting the Page Base
running, Displaying the Page Status
setting page base, Setting the Page Base
verification loop, Setting the Page Base
links, Parsing a Data Set into an Array: parse_array(), Compressing Files on Your Hard Drive, Defining the Page Base, Link-Verification Webbots, Setting the Page Base, Experimenting with the Spider, Follow Page Redirections
broken, using webbot to detect, Link-Verification Webbots
href attribute of tag, parsing, Parsing a Data Set into an Array: parse_array()
impact of removing HTML tags, Compressing Files on Your Hard Drive
parsing, Setting the Page Base
relative, page base for, Defining the Page Base
saving in database, Experimenting with the Spider
well-defined, and search engine ranking, Follow Page Redirections
Linux, scheduling in, Scheduling Webbots and Spiders
LIST command (POP3), The POP3 LIST Command
Location: line, in HTTP header, Avoid Making Requests for Pages That Don’t Exist
log files, Why Design a Stealthy Webbot?, Error Logs
software for monitoring, Error Logs
webbot detection with, Why Design a Stealthy Webbot?
logging in, to POP3 mail server, The POP3 Protocol
login criteria, Types of Online Authentication

N

naming, Form Handlers, Don’t Blow Your Cover, Organizing Data
conventions, Organizing Data
data fields, Form Handlers
webbots, Don’t Blow Your Cover
National Oceanic and Atmospheric Administration (NOAA), Converting a Website into a Function
needle, Removing Unwanted Text: remove()
network, Think About Files, Not Web Pages, Writing Fault-Tolerant Webbots
adapting to outages and congestion, Writing Fault-Tolerant Webbots
socket, Think About Files, Not Web Pages
Next button, simulating person clicking, Fetching the Search Results
NOAA (National Oceanic and Atmospheric Administration), Converting a Website into a Function
nofollow option, for robots meta tag, Use the robots.txt File
noindex option, for robots meta tag, Use the robots.txt File
non-ASCII content, and search engine spiders, Non-ASCII Content
nonexistent web pages, Writing Fault-Tolerant Webbots, Use Relative Parsing, CURLOPT_TIMEOUT
avoiding requests for, Writing Fault-Tolerant Webbots
containing forms, Use Relative Parsing
timeouts to deal with, CURLOPT_TIMEOUT
null string, replacing text with, Detecting Whether a String Is Within Another String

P

package-tracking information, Webbots That Read Email
packet sniffer, Final Thoughts, Running a Macro
$page_base variable, Defining the Page Base
page base, Defining the Page Base, Initializing the Webbot and Downloading the Target
defining, Defining the Page Base
setting, Initializing the Webbot and Downloading the Target
page redirection, Avoid Making Requests for Pages That Don’t Exist, Selectively Allow Access to Specific Web Agents, CURLOPT_URL
CURLOPT_FOLLOWLOCATION option for, CURLOPT_URL
for deterring webbots, Selectively Allow Access to Specific Web Agents
page signature, Sending an Email with mail()
paid placements in search results, Search-Ranking Webbots
parse tolerance, Maintain the Accuracy of Referer Values
parse_array() function, Parsing a Data Set into an Array: parse_array(), preg_match_all(pattern, subject, result_array), Regular Expressions Provide Little (If Any) Context, Designing the Parsing Script, Defining the Page Base, Setting the Page Base, Parsing the Search Results
parsing, Basic Parsing Techniques, Content Is Mixed with Markup, Using LIB_parse, Parsing Text Between Delimiters: return_between(), Parsing a Data Set into an Array: parse_array(), Parsing a Data Set into an Array: parse_array(), Parsing a Data Set into an Array: parse_array(), Detecting Whether a String Is Within Another String, Measuring the Similarity of Strings, Advanced Parsing with Regular Expressions, Defining the Page Base, Setting the Page Base, Maintain the Accuracy of Referer Values
attribute values, Parsing a Data Set into an Array: parse_array()
data set into array, Parsing a Data Set into an Array: parse_array()
image tags from downloaded web page, Defining the Page Base
links, Setting the Page Base
poorly written HTML, Measuring the Similarity of Strings
position vs. relative, Maintain the Accuracy of Referer Values
src attribute, from array of <img> tags, Parsing a Data Set into an Array: parse_array()
standard routines for, Content Is Mixed with Markup
text between delimiters, Parsing Text Between Delimiters: return_between()
unformatted text, Detecting Whether a String Is Within Another String
with LIB_parse, Using LIB_parse
with regular expressions, Advanced Parsing with Regular Expressions
passwords, Types of Online Authentication, Authenticate Users
for deterring webbots, Authenticate Users
pattern matching, with regular expressions, Pattern Matching, the Key to Regular Expressions, Parsing Numbers, Parsing Numbers, Parsing Numbers, Matching Alpha Characters, Matching Alpha Characters, Specifying Alternate Matches, Specifying Alternate Matches
alpha, Parsing Numbers
alternate matches, Matching Alpha Characters
character sets, Parsing Numbers
grouping, Specifying Alternate Matches
numbers, Parsing Numbers
ranges, Specifying Alternate Matches
wildcards, Matching Alpha Characters
pay-per-click advertising, Internet Law
payload for spider, Example Spider, Experimenting with the Spider, Save Links in a Database
separating from harvest, Save Links in a Database
PEAR (PHP Extension and Application Repository), A Safer Method of Passing Variables to Webbots
penetration level for spider, Spiders
period (.), as POP3 end-of-message indicator, The POP3 LIST Command
periodicity of webbots, Scheduling a Webbot to Run Daily, Final Thoughts
permanent cookies, Advanced Cookie Management
persistence with cookies, Advanced Cookie Management
phishing attack, Email, Webbots, and Spam
phone numbers, parsing with regular expressions, Specifying Alternate Matches
PHP, About the Website, Hardware, Hardware, Downloading Web Pages, Think About Files, Not Web Pages, Removing Unwanted Text: remove(), Compressing Data, PHP and FTP, Email, Webbots, and Spam, Webbots and Cryptography, A Safer Method of Passing Variables to Webbots
and FTP, PHP and FTP
and SSL, Webbots and Cryptography
configuring to send email, Email, Webbots, and Spam
downloading, Downloading Web Pages, Think About Files, Not Web Pages
with built-in functions, Think About Files, Not Web Pages
with scripts, Downloading Web Pages
functions, Removing Unwanted Text: remove(), Compressing Data
for compressing data, Compressing Data
version 5 support for SOAP, A Safer Method of Passing Variables to Webbots
website, Hardware
PHP Extension and Application Repository (PEAR), A Safer Method of Passing Variables to Webbots
php.ini file, editing to show mail server location, Email, Webbots, and Spam
PHP/CURL, Introducing PHP/CURL, Introducing PHP/CURL, Referer Management, Webbots and Cryptography, A Quick Overview of Web Encryption, Basic Authentication, Avoid Making Requests for Pages That Don’t Exist, PHP/CURL Reference, Creating a Minimal PHP/CURL Session, Creating a Minimal PHP/CURL Session, CURLOPT_URL, Executing the PHP/CURL Command, Executing the PHP/CURL Command, Viewing PHP/CURL Errors
and certificates, A Quick Overview of Web Encryption
and cookies, Basic Authentication
downloading with, Introducing PHP/CURL
encryption and, Webbots and Cryptography
for following header redirections, Avoid Making Requests for Pages That Don’t Exist, CURLOPT_URL
installing, Referer Management
sessions, PHP/CURL Reference, Creating a Minimal PHP/CURL Session, Executing the PHP/CURL Command, Viewing PHP/CURL Errors
closing, Viewing PHP/CURL Errors
creating minimal, PHP/CURL Reference
initiating, Creating a Minimal PHP/CURL Session
retrieving information about, Executing the PHP/CURL Command
setting options, Creating a Minimal PHP/CURL Session
viewing errors, Executing the PHP/CURL Command
plotting Wi-Fi networks, Plot Unauthorized Wi-Fi Networks
pokerbots, Webbots That Interpret What They Find Online
POP3 protocol (Post Office Protocol 3), The POP3 Protocol, The POP3 Protocol, The POP3 RETR Command
authentication failure, The POP3 Protocol
executing commands with webbots, The POP3 RETR Command
port, The POP3 Protocol, Webbots and Cryptography
for HTTP and HTTPS protocols, Webbots and Cryptography
for POP3 server, The POP3 Protocol
position parsing, avoiding, Maintain the Accuracy of Referer Values
$_POST array, Avoid Form Errors
POST method, The GET Method, Access Logs
and errors, Access Logs
Post Office Protocol 3 (POP3), The POP3 Protocol, The POP3 Protocol, The POP3 RETR Command
authentication failure, The POP3 Protocol
executing commands with webbots, The POP3 RETR Command
preg_match() function, preg_replace(pattern, replacement, subject)
preg_match_all() function, preg_replace(pattern, replacement, subject)
preg_replace() function, Pattern Matching, the Key to Regular Expressions
preg_split() function, preg_match_all(pattern, subject, result_array)
price-monitoring webbots, Price-Monitoring Webbots, Price-Monitoring Webbots, Initialization and Downloading the Target
parsing script, Initialization and Downloading the Target
target, Price-Monitoring Webbots
procurement bot, Procurement Webbots and Snipers, Procurement Webbot Theory, Authenticate Buyer, Evaluate Results
purchase criteria, Evaluate Results
purchase triggers, Authenticate Buyer
theory, Procurement Webbot Theory
project ideas, Ideas for Webbot Projects, Webbots That Interpret What They Find Online, Webbots That Interpret What They Find Online, Webbots That Act on Your Behalf, Help Out a Busy Executive, Help Out a Busy Executive, Protect Intellectual Property, Protect Intellectual Property, Plot Unauthorized Wi-Fi Networks, Plot Unauthorized Wi-Fi Networks, Plot Unauthorized Wi-Fi Networks
automating tasks, Help Out a Busy Executive
communicating on incompatible systems, Plot Unauthorized Wi-Fi Networks
consolidating industry news articles, Webbots That Act on Your Behalf
intellectual property protection, Help Out a Busy Executive
online clipping service, Protect Intellectual Property
plotting Wi-Fi networks, Plot Unauthorized Wi-Fi Networks
pokerbots, Webbots That Interpret What They Find Online
tracking web technologies, Plot Unauthorized Wi-Fi Networks
verifying access rights, Protect Intellectual Property
WebSiteOptimization.com, Webbots That Interpret What They Find Online
projects, Price-Monitoring Webbots, Image-Capturing Webbots, Link-Verification Webbots, Search-Ranking Webbots, Aggregation Webbots, FTP Webbots, Webbots That Read Email, Email-Controlled Webbots, Converting a Website into a Function
aggregation webbots, Aggregation Webbots
converting website into function, Converting a Website into a Function
FTP webbots, FTP Webbots
image-capturing webbots, Image-Capturing Webbots
link-verification webbots, Link-Verification Webbots
price-monitoring webbots, Price-Monitoring Webbots
reading email with webbots, Webbots That Read Email
search-ranking webbots, Search-Ranking Webbots
sending email with webbots, Email-Controlled Webbots
proxies, Authentication with Query Sessions, Proxies, Proxies, Proxies in the Virtual World, Proxies in the Virtual World, Using a Proxy to Be Somewhere Else, Using a Proxy to Be Somewhere Else, Using a Proxy to Be Somewhere Else, Using a Proxy in a Browser, Using a Proxy in a Browser, Types of Open Proxies, Types of Open Proxies, Types of Open Proxies, Types of Open Proxies, Types of Open Proxies, Tor, Configuring PHP/CURL to Use Tor, Configuring PHP/CURL to Use Tor, Configuring PHP/CURL to Use Tor, Commercial Proxies
commercial, Configuring PHP/CURL to Use Tor
cookie restrictions with, Authentication with Query Sessions
creating a service, Commercial Proxies
defined, Proxies
listing services, Types of Open Proxies
open, Using a Proxy to Be Somewhere Else, Types of Open Proxies, Types of Open Proxies, Types of Open Proxies, Types of Open Proxies
anonymous, Types of Open Proxies
dark side of, Types of Open Proxies
spoofing, Types of Open Proxies
transparent, Types of Open Proxies
reasons developers use, Proxies in the Virtual World, Proxies in the Virtual World, Using a Proxy to Be Somewhere Else
anonymity, Proxies in the Virtual World
relocation, Using a Proxy to Be Somewhere Else
Tor, Tor, Configuring PHP/CURL to Use Tor, Configuring PHP/CURL to Use Tor
configuration for PHP/CURL, Configuring PHP/CURL to Use Tor
disadvantages of, Configuring PHP/CURL to Use Tor
using, Using a Proxy to Be Somewhere Else, Using a Proxy in a Browser, Using a Proxy in a Browser
in a browser, Using a Proxy in a Browser
with PHP/CURL, Using a Proxy in a Browser
public, capitalizing on inexperience with webbots, What’s in It for Business Leaders?
purchase, Procurement Webbot Theory, Authenticate Buyer
criteria, for procurement bot, Procurement Webbot Theory
triggers, for procurement bot, Authenticate Buyer

Q

query string sessions, authentication with, Cookie Session Example
question mark (?), in GET method, Methods
QUIT command (POP3), The POP3 RETR Command

R

random delay, Fetching the Search Results
ranking web pages, by search engine spider, Optimizing Web Pages for Search Engine Spiders
reading mail from POP3 server, Webbots That Read Email
Really Simple Syndication (RSS) feed, Choosing Data Sources for Webbots, Example Aggregation Webbot
realm, Basic Authentication
redirection, Cookies, Avoid Making Requests for Pages That Don’t Exist, Selectively Allow Access to Specific Web Agents, CURLOPT_URL
CURLOPT_FOLLOWLOCATION option for, CURLOPT_URL
for deterring webbots, Selectively Allow Access to Specific Web Agents
with PHP/CURL, Cookies
references to image files, storing, Database or File?
referer, Referer Management, http_get(), Analyzing a Form, Binary-Safe Download Routine, CURLOPT_URL
management, with PHP/CURL, Referer Management
variable, http_get(), Analyzing a Form, Binary-Safe Download Routine, CURLOPT_URL
regular expressions, Using LIB_parse, Using LIB_parse, Don’t Render Parsed Text While Debugging, Advanced Parsing with Regular Expressions, Advanced Parsing with Regular Expressions, Pattern Matching, the Key to Regular Expressions, Pattern Matching, the Key to Regular Expressions, Pattern Matching, the Key to Regular Expressions, Pattern Matching, the Key to Regular Expressions, Pattern Matching, the Key to Regular Expressions, Pattern Matching, the Key to Regular Expressions, preg_replace(pattern, replacement, subject), preg_replace(pattern, replacement, subject), preg_match_all(pattern, subject, result_array), preg_match_all(pattern, subject, result_array), Parsing Numbers, Parsing Numbers, Parsing Numbers, Matching Alpha Characters, Matching Alpha Characters, Specifying Alternate Matches, Specifying Alternate Matches, Specifying Alternate Matches, When Regular Expressions Are (or Aren’t) the Right Parsing Tool, When Regular Expressions Are (or Aren’t) the Right Parsing Tool, When Regular Expressions Are (or Aren’t) the Right Parsing Tool, Regular Expressions Provide Little (If Any) Context, Regular Expressions Provide Little (If Any) Context, Regular Expressions Provide Little (If Any) Context, Regular Expressions Complicate Your Code
advanced parsing with, Advanced Parsing with Regular Expressions
avoiding, Using LIB_parse, Don’t Render Parsed Text While Debugging
disadvantages of, When Regular Expressions Are (or Aren’t) the Right Parsing Tool, When Regular Expressions Are (or Aren’t) the Right Parsing Tool, Regular Expressions Provide Little (If Any) Context, Regular Expressions Provide Little (If Any) Context, Regular Expressions Provide Little (If Any) Context
complicating code, Regular Expressions Provide Little (If Any) Context
confusing choices, Regular Expressions Provide Little (If Any) Context
difficulty debugging, Regular Expressions Provide Little (If Any) Context
lack of context, When Regular Expressions Are (or Aren’t) the Right Parsing Tool
functions, Pattern Matching, the Key to Regular Expressions, Pattern Matching, the Key to Regular Expressions, preg_replace(pattern, replacement, subject), preg_replace(pattern, replacement, subject), preg_match_all(pattern, subject, result_array), preg_match_all(pattern, subject, result_array)
preg_match(), preg_replace(pattern, replacement, subject)
preg_match_all(), preg_replace(pattern, replacement, subject)
preg_replace(), Pattern Matching, the Key to Regular Expressions
preg_split(), preg_match_all(pattern, subject, result_array)
resemblance to PHP built-in, preg_match_all(pattern, subject, result_array)
parsing phone numbers with, Specifying Alternate Matches
pattern matching with, Pattern Matching, the Key to Regular Expressions, Parsing Numbers, Parsing Numbers, Parsing Numbers, Matching Alpha Characters, Matching Alpha Characters, Specifying Alternate Matches, Specifying Alternate Matches
alpha, Parsing Numbers
alternate matches, Matching Alpha Characters
character sets, Parsing Numbers
grouping, Specifying Alternate Matches
numbers, Parsing Numbers
ranges, Specifying Alternate Matches
wildcards, Matching Alpha Characters
speed of, vs. PHP built-in functions, Regular Expressions Complicate Your Code
types of, Pattern Matching, the Key to Regular Expressions, Pattern Matching, the Key to Regular Expressions, Pattern Matching, the Key to Regular Expressions
PCRE, Pattern Matching, the Key to Regular Expressions
POSIX, Pattern Matching, the Key to Regular Expressions
when to use, When Regular Expressions Are (or Aren’t) the Right Parsing Tool
relational database, Managing Large Amounts of Data
relative links, page base for, Defining the Page Base, Initializing the Webbot and Downloading the Target, Setting the Page Base, LIB_http_codes
relay host, Configuring PHP to Send Mail
relevance, aggregating and filtering information by, Ideas for Webbot Projects
Remote Procedure Call (RPC), A Safer Method of Passing Variables to Webbots
remote server, using PHP/CURL to execute webbot on, Scheduling Webbots and Spiders
remove() function, Removing Unwanted Text: remove()
replacing portion of string, Detecting Whether a String Is Within Another String
Reply-to: address field, Sending an Email with mail()
Representational State Transfer (REST), Advantages of SOAP
resolve_address() function, Generating Fully Resolved URLs
resources, distributing, Parsing and Returning the Result
respect, Keeping Webbots out of Trouble
REST (Representational State Transfer), Advantages of SOAP
$result array, FILE element, preg_replace(pattern, replacement, subject)
RETR command (POP3), The POP3 LIST Command
Return-path: address field, Sending an Email with mail()
return_between() function, Parsing Text Between Delimiters: return_between(), Regular Expressions Provide Little (If Any) Context, Downloading and Parsing the Target
reverse engineering form interfaces, Automating Form Submission
robot exclusion file, Use the robots.txt File
robots meta tag, Use the robots.txt File
robots.txt file, Use the robots.txt File
root, Defining the Page Base, archive_links()
directory, creating for imported file structure, Defining the Page Base
domain, parsing from target URL, archive_links()
RPC (Remote Procedure Call), A Safer Method of Passing Variables to Webbots
RSET command (POP3), The POP3 RETR Command
RSS (Really Simple Syndication) feed, Choosing Data Sources for Webbots, Example Aggregation Webbot

S

sale item, verifying availability, Authenticate Buyer
saving, Defining the Interface, Experimenting with the Spider
links in database, Experimenting with the Spider
source code for form, Defining the Interface
scaling, Deployment and Scaling, Deployment and Scaling, Deployment and Scaling, One-to-One Environment, One-to-One Environment, Many-to-One Environment, Many-to-One Environment, The Problems with Scaling Too Well, The Problems with Scaling Too Well, Forking Processes, Forking Processes
(see also botnet management)
causing DoS attacks, Many-to-One Environment
environments, Deployment and Scaling, Deployment and Scaling, One-to-One Environment, One-to-One Environment, Many-to-One Environment
many-to-many, One-to-One Environment
many-to-one, Many-to-One Environment
one-to-many, Deployment and Scaling
one-to-one, One-to-One Environment
multiple instances, creating, The Problems with Scaling Too Well, The Problems with Scaling Too Well, Forking Processes, Forking Processes
distributing tasks, Forking Processes
forking, The Problems with Scaling Too Well
leveraging the operating system, Forking Processes
scheduling, Regulate Page Requests, Scheduling Webbots and Spiders, Scheduling Webbots and Spiders, Scheduling a Webbot to Run Daily, Scheduling a Webbot to Run Daily, Complex Schedules, The Windows 7 Task Scheduler, The Windows 7 Task Scheduler, Final Thoughts, Run Your Webbot During Busy Hours, Error Handlers
adding variety to, Final Thoughts
and stealth, Run Your Webbot During Busy Hours
complex, Scheduling a Webbot to Run Daily
disabling, Error Handlers
for distributed spider, Regulate Page Requests
webbots to run daily, Scheduling a Webbot to Run Daily
webbots to run monthly, Complex Schedules
Windows 7 Task Scheduler, The Windows 7 Task Scheduler, The Windows 7 Task Scheduler
Windows XP Task Scheduler, Scheduling Webbots and Spiders
scraping, difficult websites, Scraping Difficult Websites with Browser Macros
scripts, What to Expect from This Book, About the Website, Measuring the Similarity of Strings
writing in small steps, Measuring the Similarity of Strings
search engine, Search-Ranking Webbots, Parsing the Search Results, Spiders, Many-to-One Environment, Designing Webbot-Friendly Websites, Optimizing Web Pages for Search Engine Spiders, Killing Spiders, Embed Text in Other Media
optimization, Search-Ranking Webbots, Many-to-One Environment, Designing Webbot-Friendly Websites
spiders, Spiders, Optimizing Web Pages for Search Engine Spiders, Embed Text in Other Media
design techniques hindering, Embed Text in Other Media
indexing web pages with, Optimizing Web Pages for Search Engine Spiders
Terms of Service agreement, Parsing the Search Results, Killing Spiders
search results page, parts of, Search-Ranking Webbots
search term, in URL, Initializing Variables
search-ranking webbots, Search-Ranking Webbots, Search-Ranking Webbots, What the Search-Ranking Webbot Does, What the Search-Ranking Webbot Does, What the Search-Ranking Webbot Does, How the Search-Ranking Webbot Works, Initializing Variables, Fetching the Search Results, Fetching the Search Results
fetching search results, Fetching the Search Results
how they work, What the Search-Ranking Webbot Does
initializing variables, How the Search-Ranking Webbot Works
parsing search results, Fetching the Search Results
running, What the Search-Ranking Webbot Does
search results page description, Search-Ranking Webbots
starting loop, Initializing Variables
what they do, What the Search-Ranking Webbot Does
Secure Sockets Layer (SSL), Creating the Image-Capturing Webbot, Webbots and Cryptography, A Quick Overview of Web Encryption, A Quick Overview of Web Encryption
CURLOPT_SSL_VERIFYHOST option for, A Quick Overview of Web Encryption
CURLOPT_SSL_VERIFYPEER option for, A Quick Overview of Web Encryption
sites, downloading images from, Creating the Image-Capturing Webbot
seed URL, Spiders
sending email, Webbots That Send Email
server, Don’t Blow Your Cover, Authenticate Buyer, Scheduling Webbots and Spiders, Trespass to Chattels
avoiding undue load on target, Trespass to Chattels
error log, form errors in, Don’t Blow Your Cover
obtaining clock value, Authenticate Buyer
remote, using PHP/CURL to execute webbot on, Scheduling Webbots and Spiders
session, Form Handlers, Analyzing the Target Web Page, Basic Authentication, Using a Proxy in a Browser
authentication, Basic Authentication
ID, forms with, Form Handlers
value, dynamically assigned, Analyzing the Target Web Page
with proxies, Using a Proxy in a Browser
set_time_limit() function, Example Spider, Adapting to Network Outages and Network Congestion
Short Message Service (SMS), Using Returned Emails to Prune Access Lists, SMS Gateways
Simple Object Access Protocol (SOAP), Distributing Resources, A Safer Method of Passing Variables to Webbots
simulating action of person, Log-Monitoring Software
single points of failure, avoiding, Final Thoughts
size reduction, Database or File?, Database or File?, Compressing Data, Compressing Files on Your Hard Drive
data compression, Compressing Data
removing formatting, Compressing Files on Your Hard Drive
storing references to image files, Database or File?
SMS (Short Message Service), Using Returned Emails to Prune Access Lists, SMS Gateways
snipers, Procurement Webbots and Snipers, Authenticate Buyer, Authenticate Buyer, Time to Bid?
authentication, Authenticate Buyer
clock synchronization, Authenticate Buyer
testing, Time to Bid?
SOAP (Simple Object Access Protocol), Distributing Resources, A Safer Method of Passing Variables to Webbots
socket management, with PHP/CURL, Referer Management
software, Hardware, Error Logs
for monitoring logs, Error Logs
requirements for, Hardware
source code, LIB_mysql, Defining the Interface, Analyzing the Target Web Page
configuration area of LIB_mysql, LIB_mysql
for form, Defining the Interface, Analyzing the Target Web Page
displaying, Defining the Interface
saving, Analyzing the Target Web Page
spam, Webbots That Send Email, Email, Webbots, and Spam, Writing a Webbot That Sends Email Notifications, Distributing the Task over Multiple Computers, Optimizing Web Pages for Search Engine Spiders, Meta Tags, Trespass to Chattels
filters, Email, Webbots, and Spam
keeping legitimate mail out of, Writing a Webbot That Sends Email Notifications
keywords, Meta Tags
law, Trespass to Chattels
spam indexing, Optimizing Web Pages for Search Engine Spiders
special characters, Initializing Variables, Selectively Allow Access to Specific Web Agents
spiders, Parsing the Search Results, Spiders, Spiders, Spiders, Spiders, Example Spider, Example Spider, harvest_links(), archive_links(), archive_links(), get_domain(), exclude_link(), Experimenting with the Spider, Experimenting with the Spider, Save Links in a Database, Regulate Page Requests, Embed Text in Other Media, Fun Things to Do with Unwanted Spiders, Fun Things to Do with Unwanted Spiders
adding payload, Experimenting with the Spider
distributing tasks across multiple computers, Save Links in a Database
examples, Example Spider
experimenting with, exclude_link()
how they work, Spiders
LIB_simple_spider library, Example Spider, harvest_links(), archive_links(), archive_links(), get_domain()
archive_links() function, archive_links()
exclude_link() function, get_domain()
get_domain() function, archive_links()
harvest_links() function, harvest_links()
maximum penetration level for, Spiders
of search engines, Parsing the Search Results
options for treating unwanted, Fun Things to Do with Unwanted Spiders
potential ideas for, Spiders
regulating page requests of, Regulate Page Requests
saving links in database, Experimenting with the Spider
setting traps for, Embed Text in Other Media
what to do with unwanted, Fun Things to Do with Unwanted Spiders
splitting string, at delimiter, Using LIB_parse
split_string() function, Using LIB_parse
SQL (Structured Query Language), Storing Data in Structured Files
src attribute, from array of <img> tags, parsing, Removing Unwanted Text: remove()
SSL (Secure Sockets Layer), Creating the Image-Capturing Webbot, Webbots and Cryptography, A Quick Overview of Web Encryption, A Quick Overview of Web Encryption
CURLOPT_SSL_VERIFYHOST option for, A Quick Overview of Web Encryption
CURLOPT_SSL_VERIFYPEER option for, A Quick Overview of Web Encryption
sites, downloading images from, Creating the Image-Capturing Webbot
$status_code_array, Displaying the Page Status
status codes, Status Codes, Status Codes, NNTP Codes
HTTP, Status Codes
NNTP, NNTP Codes
status messages, quantity created in file transfer, CURLOPT_VERBOSE
status of request, from http_get_withheader() function, http_get()
stealth, Designing Stealthy Webbots and Spiders, Designing Stealthy Webbots and Spiders, Log-Monitoring Software, Run Your Webbot During Busy Hours
and scheduling, Run Your Webbot During Busy Hours
reasons for, Designing Stealthy Webbots and Spiders
simulating human patterns in order to achieve, Log-Monitoring Software
Stenberg, Daniel, Introducing PHP/CURL
strings, Basic Parsing Techniques, Removing Unwanted Text: remove(), Detecting Whether a String Is Within Another String, Measuring the Similarity of Strings
detecting within strings, Removing Unwanted Text: remove()
measuring similarity of, Measuring the Similarity of Strings
replacing portion of, Detecting Whether a String Is Within Another String
splitting at delimiter, Basic Parsing Techniques
strip_cdata_tags() function, Using LIB_parse
strip_tags() function, Regular Expressions Provide Little (If Any) Context, Compressing Files on Your Hard Drive
stristr() function, Removing Unwanted Text: remove()
strpos() function, Parsing the Search Results
strstr() function, Detecting Whether a String Is Within Another String
structured files, Naming Conventions
Structured Query Language (SQL), Storing Data in Structured Files
str_replace() function, Detecting Whether a String Is Within Another String
submit button, Form Handlers
substr() function, preg_match_all(pattern, subject, result_array), Parsing the Search Results
synchronization, Plot Unauthorized Wi-Fi Networks, Authenticate Buyer
of clocks for snipers, Authenticate Buyer

T

tables, Initialization and Downloading the Target, Maintain the Accuracy of Referer Values
parsing data in, Initialization and Downloading the Target
using landmarks to identify, Maintain the Accuracy of Referer Values
tags, What to Expect from This Book (see individual tag names)
target URL, defining for PHP/CURL session, CURLOPT_URL
targets, What to Expect from This Book, Example Image-Capturing Webbot, Directory Structure
validation in download_images_for_page() function, Example Image-Capturing Webbot, Directory Structure
Task Scheduler (Windows 7), The Windows 7 Task Scheduler
Task Scheduler (Windows XP), Scheduling Webbots and Spiders, Scheduling a Webbot to Run Daily
complex scheduling, Scheduling a Webbot to Run Daily
tasks, automating, Help Out a Busy Executive
Telnet, Old-School Client-Server Technology, Introducing PHP/CURL, The POP3 Protocol
for executing POP3 commands, The POP3 Protocol
temporary cookies, Advanced Cookie Management, PHP/CURL and Cookies
purging, PHP/CURL and Cookies
Terms of Service agreements, Search-Ranking Webbots, Parsing the Search Results, Killing Spiders
for search engines, Search-Ranking Webbots
text, Removing Unwanted Text: remove(), Detecting Whether a String Is Within Another String, Storing Data in Structured Files, Using Returned Emails to Prune Access Lists, Authenticate Users, SMS Gateways
embedding in other media, Authenticate Users
messaging, Using Returned Emails to Prune Access Lists, SMS Gateways
parsing unformatted, Detecting Whether a String Is Within Another String
removing unwanted, Removing Unwanted Text: remove()
storing in database, Storing Data in Structured Files
thumbnailing images, Removing Formatting
Tidy (HTMLTidy), Content Is Mixed with Markup, Measuring the Similarity of Strings
time, Displaying the Page Status, Run Your Webbot During Busy Hours
required for downloading linked pages, Displaying the Page Status
running webbot during busy, Run Your Webbot During Busy Hours
timeout, Familiarizing Yourself with the Default Values, Example Spider, Adapting to Changes in Cookie Management, Adapting to Changes in Cookie Management, Adapting to Network Outages and Network Congestion, CURLOPT_TIMEOUT, CURLOPT_TIMEOUT
curl_setopt() function for, Adapting to Changes in Cookie Management, CURLOPT_TIMEOUT
default for, Familiarizing Yourself with the Default Values, Example Spider
and spiders, Example Spider
for PHP/CURL, Adapting to Changes in Cookie Management, CURLOPT_TIMEOUT
in PHP, changing, Adapting to Network Outages and Network Congestion
timestamp, Unix, Analyzing the Target Web Page
<title> tag, and spiders, Optimizing Web Pages for Search Engine Spiders
TLS (Transport Layer Security), Webbots and Cryptography
Tor, Tor, Configuring PHP/CURL to Use Tor, Configuring PHP/CURL to Use Tor
configuration for PHP/CURL, Configuring PHP/CURL to Use Tor
disadvantages of, Configuring PHP/CURL to Use Tor
tracking web technologies, Plot Unauthorized Wi-Fi Networks
TrackRates.com, Inspiration from Browser Limitations
transactional websites, Further Exploration
transfer protocols, PHP/CURL support for, Introducing PHP/CURL
Transport Layer Security (TLS), Webbots and Cryptography
trespass-to-chattels law, Reasons for Not Using the iMacros Scripting Engine, The Problems with Scaling Too Well, Final Thoughts, You Can Use Some Material Under Fair Use Laws
triggers, non-calendar-based, The Windows 7 Task Scheduler
trim() function, Regular Expressions Provide Little (If Any) Context
Tysver, Daniel A., It’s All About Respect

V

validation point, for downloaded web page, Avoid Making Requests for Pages That Don’t Exist
variables, passing to webbots, A Safer Method of Passing Variables to Webbots
verification loop, Initializing the Webbot and Downloading the Target
Virginia, Anti-Spam Law, Trespass to Chattels
virtual private networks (VPNs), Types of Online Authentication
virtual property, laws governing, Trespass to Chattels
VPNs (virtual private networks), Types of Online Authentication

W

W3C (World Wide Web Consortium), HTTP Codes, Generating Fully Resolved URLs
weather forecasts, Converting a Website into a Function
web agents, selectively allowing access to specific, Use the robots.txt File
web pages, Content Is Mixed with Markup, Defining the Page Base, Generating Fully Resolved URLs, Sending an Email with mail(), Writing Fault-Tolerant Webbots, Avoid Making Requests for Pages That Don’t Exist, Maintain the Accuracy of Referer Values, Designing Webbot-Friendly Websites, Optimizing Web Pages for Search Engine Spiders, Status Codes
accessibility to webbots, Designing Webbot-Friendly Websites
adapting to content changes, Maintain the Accuracy of Referer Values
avoiding requests for nonexistent, Writing Fault-Tolerant Webbots
displaying status of, Generating Fully Resolved URLs
notification of change in, Sending an Email with mail()
parsing image tags from downloaded, Defining the Page Base
poorly written HTML within, Content Is Mixed with Markup
ranking by search engine spider, Optimizing Web Pages for Search Engine Spiders
status of request for, Status Codes
validation point for, Avoid Making Requests for Pages That Don’t Exist
web services, XML, A Safer Method of Passing Variables to Webbots
designing custom lightweight, XML
web spiders, Spiders (see spiders)
web technologies, tracking, Plot Unauthorized Wi-Fi Networks
web walkers, Spiders (see spiders)
webbots (web robots), Old-School Client-Server Technology, What’s in It for You?, Uncovering the Internet’s True Potential, Uncovering the Internet’s True Potential, Webbot Developers Are in Demand, Ideas for Webbot Projects, Think About Files, Not Web Pages, Think About Files, Not Web Pages, Executing Webbots in Command Shells, Executing Webbots in Command Shells, Parsing the Search Results, Webbots That Read Email, The POP3 Protocol, The POP3 RETR Command, Webbots That Send Email, Parsing and Returning the Result, Time to Bid?, PHP/CURL and Cookies, Scheduling Webbots and Spiders, Scheduling a Webbot to Run Daily, Scheduling a Webbot to Run Daily, Complex Schedules, Final Thoughts, Reasons for Not Using the iMacros Scripting Engine, The Problems with Scaling Too Well, Designing Stealthy Webbots and Spiders, Log-Monitoring Software, Run Your Webbot During Busy Hours, Final Thoughts, Writing Fault-Tolerant Webbots, Killing Spiders, Killing Spiders, Killing Spiders, Use the robots.txt File, Use the robots.txt File, Use the robots.txt File, Selectively Allow Access to Specific Web Agents, Selectively Allow Access to Specific Web Agents, Authenticate Users, Embed Text in Other Media, Keeping Webbots out of Trouble, You Can Use Some Material Under Fair Use Laws
and trespass-to-chattels law, Reasons for Not Using the iMacros Scripting Engine, The Problems with Scaling Too Well, Final Thoughts, You Can Use Some Material Under Fair Use Laws
benefits of, What’s in It for You?, Uncovering the Internet’s True Potential, Webbot Developers Are in Demand
for business leaders, Webbot Developers Are in Demand
for developers, Uncovering the Internet’s True Potential
cookies and design of, PHP/CURL and Cookies
countermeasures for, Parsing the Search Results, Killing Spiders, Killing Spiders, Killing Spiders, Use the robots.txt File, Use the robots.txt File, Use the robots.txt File, Selectively Allow Access to Specific Web Agents, Selectively Allow Access to Specific Web Agents, Authenticate Users
allowing selective access to specific agents, Use the robots.txt File
embedding text in other media, Authenticate Users
obfuscation, Selectively Allow Access to Specific Web Agents
reasons for, Killing Spiders
robots meta tag, Use the robots.txt File
robots.txt file, Use the robots.txt File
Terms of Service agreements, Parsing the Search Results, Killing Spiders
with cookies, encryption, JavaScript, and redirection, Selectively Allow Access to Specific Web Agents
creating first script, Think About Files, Not Web Pages
daily scheduling of, Scheduling a Webbot to Run Daily
executing, Executing Webbots in Command Shells, Executing Webbots in Command Shells
in browsers, Executing Webbots in Command Shells
in command shell, Executing Webbots in Command Shells
fault-tolerant, Writing Fault-Tolerant Webbots
for reading email, Webbots That Read Email, The POP3 Protocol, The POP3 RETR Command
and executing POP3 commands, The POP3 RETR Command
and POP3 protocol, The POP3 Protocol
for sending email, Webbots That Send Email
growth in use, Uncovering the Internet’s True Potential
monthly scheduling of, Complex Schedules
periodicity of, Scheduling a Webbot to Run Daily, Final Thoughts
preparing to run as scheduled tasks, Scheduling Webbots and Spiders
preventing negative consequences of, Keeping Webbots out of Trouble
(see also copyright issues)
project ideas, Ideas for Webbot Projects
reasons for stealth, Designing Stealthy Webbots and Spiders
script, creating first, Think About Files, Not Web Pages
setting traps, Embed Text in Other Media
simulating human patterns, Log-Monitoring Software
spreading burden of running complex, Parsing and Returning the Result
testing, Time to Bid?
weekend scheduling of, Run Your Webbot During Busy Hours
webbot_error_handler() function, Adapting to Network Outages and Network Congestion
WEBBOT_NAME constant, Don’t Blow Your Cover
WebSiteOptimization.com, Webbots That Interpret What They Find Online
websites, About the Website, Webbots That Interpret What They Find Online, Converting a Website into a Function, Further Exploration, Authentication
converting into functions, Converting a Website into a Function
for book, About the Website
limiting access to, Authentication
optimizing performance of, Webbots That Interpret What They Find Online
transactional, Further Exploration
weekends, scheduling webbots not to run on, Run Your Webbot During Busy Hours
well-defined links, for search engine optimization, Optimizing Web Pages for Search Engine Spiders
white space, deleting, Detecting Whether a String Is Within Another String, Removing Formatting
Wi-Fi networks, plotting, Plot Unauthorized Wi-Fi Networks
Windows Task Scheduler, Scheduling Webbots and Spiders, Scheduling a Webbot to Run Daily, The Windows 7 Task Scheduler
Windows 7, The Windows 7 Task Scheduler
Windows XP, Scheduling Webbots and Spiders, Scheduling a Webbot to Run Daily
complex scheduling, Scheduling a Webbot to Run Daily
wireless subscriber, mail server, Sending Text Messages
World Wide Web, Introduction
World Wide Web Consortium (W3C), HTTP Codes, Generating Fully Resolved URLs
wrapper function, using PHP/CURL within, Referer Management

Y

Yahoo!, spiders used by, Spiders