Spiders

Spiders are classes that define how to crawl a specific site or domain and how to extract data from its pages; in other words, a spider is where we define the custom behavior for navigating and parsing the pages of a particular site.

The cycle a spider goes through is the following:

1. It starts by generating the initial requests for the first URLs to crawl.
2. Those requests are downloaded by Scrapy, and their responses are handed to callback functions.
3. In the callback functions, we parse the response content, typically using selectors (XPath selectors), and generate items from the extracted data.
4. Finally, the items returned by the spider can be passed to an item pipeline.