Creating our spider

This is the code for our first spider. Save it in a file named MySpider.py under the spiders directory in your project:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.item import Item


class MySpider(CrawlSpider):
    name = 'example.com'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com']

    # rules must be an iterable of Rule objects; note the trailing comma,
    # which makes this a one-element tuple. The callback names the method
    # that will process each followed page.
    rules = (
        Rule(LinkExtractor(allow=()), callback='parse_item'),
    )

    def parse_item(self, response):
        # Data would normally be extracted here with response.xpath()
        # or response.css(); for now we return an empty item.
        element = Item()
        return element

CrawlSpider provides a mechanism for following links that match certain patterns. In addition to the attributes inherited from the Spider class, this class introduces a new rules attribute, with which we tell the spider which links to follow and which callback should process each resulting page.
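The allow argument of a link extractor takes regular expressions that are matched against discovered URLs; only matching links are followed, and an empty allow (as in the spider above) lets every link through. The filtering idea can be sketched in plain Python; the patterns and URLs below are purely illustrative:

```python
import re

# Hypothetical allow patterns, as might be passed to LinkExtractor(allow=...)
allow_patterns = [r'/category/', r'/item\d+\.html']

def url_allowed(url, patterns):
    """Return True if the URL matches any allow pattern.
    An empty pattern list allows every URL, mirroring allow=()."""
    if not patterns:
        return True
    return any(re.search(p, url) for p in patterns)

urls = [
    'http://www.example.com/category/books',
    'http://www.example.com/item42.html',
    'http://www.example.com/about',
]

# Only the first two URLs match the allow patterns.
followed = [u for u in urls if url_allowed(u, allow_patterns)]
```

The real LinkExtractor does more than this (it parses anchor tags, canonicalizes URLs, and respects deny rules), but the allow check is regex matching of this kind.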