Chapter 11. Search-Ranking Webbots

Every day, millions of people find what they need online through search websites. If you own an online business, your search ranking may have far-reaching effects on that business. A higher-ranking search result should yield higher advertising revenue and more customers. Without knowing your search rankings, you have no way to measure how easy it is for people to find your web page, nor will you have a way to gauge the success of your attempts to optimize your web pages for search engines.

Manually finding your search ranking is not as easy as it sounds, especially if you are interested in the ranking of many pages for an assortment of search terms. If your web page appears on the first page of results, it’s easy to find, but if your page is listed somewhere on the sixth or seventh page, you’ll spend a lot of time figuring out how your website is ranked. Even searches for relatively obscure terms can return a large number of pages. (A recent Google search on the term “tapered drills”, for example, yielded over 3,940,000 results.) Since search engine spiders continually update their records, your search ranking may also change on a daily basis. To complicate matters further, a web page will have a different search ranking for every search term. Manually checking web page search rankings with a browser does not make sense. Webbots, however, make this task nearly trivial.

With all the search variations for each of your web pages, there is a need for an automated service to determine your web page’s search ranking. A quick Internet search will reveal several such services, like the one shown in Figure 11-1.

Figure 11-1. A search-ranking service, GoogleRankings.com

This chapter demonstrates how to design a webbot that finds the search ranking for a given domain and search term. While this project’s target is hosted on the book’s website, you can modify this webbot to work on a variety of available search services.[35] This example project also shows how to perform an insertion parse, which injects parsing tags into a downloaded web page to make parsing easier.
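To make the idea of an insertion parse concrete, here is a minimal sketch in Python. It is not this chapter's implementation: the function names, the `data` tag name, and the landmark strings are all hypothetical, and the sketch assumes the downloaded page contains two recognizable strings that bracket the region of interest.

```python
import re
from typing import Optional


def insert_parse_tags(html: str, start_landmark: str, end_landmark: str,
                      tag: str = "data") -> str:
    """Wrap the region from start_landmark up to end_landmark in <tag>...</tag>.

    The landmarks are assumed to be strings that reliably appear in the page;
    if either one is missing, the page is returned unchanged.
    """
    start = html.find(start_landmark)
    if start == -1:
        return html
    end = html.find(end_landmark, start + len(start_landmark))
    if end == -1:
        return html
    return html[:start] + f"<{tag}>" + html[start:end] + f"</{tag}>" + html[end:]


def extract_tagged(html: str, tag: str = "data") -> Optional[str]:
    """Return the text between the first inserted <tag> pair, or None."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", html, re.DOTALL)
    return match.group(1) if match else None


# Hypothetical page with comment landmarks around the results.
page = '<html><!-- results start --><a href="x">X</a><!-- results end --></html>'
tagged = insert_parse_tags(page, "<!-- results start -->", "<!-- results end -->")
results_block = extract_tagged(tagged)
```

Once the tags are in place, later parsing steps only need to look for the inserted tag pair rather than for the page's original, messier markup.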

Most search engines return two sets of results for any given search term, as shown in Figure 11-2. The most prominent search results are paid placements, which are purchased advertisements made to look something like search results. The other set of search results is made up of organic placements (or just organics), which are non-sponsored search results.

This chapter’s project focuses on organics because they’re the links that people are most likely to follow. Organics are also the search results whose visibility is improved through Search Engine Optimization.

The other part of the search result page we’ll focus on is the Next link. This is important because it tells our webbot where to find the next page of search results.

For our purposes, the search ranking is determined by counting the number of pages in the search results until the subject web page is first found. The page number is then combined with the position of the subject web page within the organic placements on that page. For example, if a web page is the sixth organic on the first result page, it has a search ranking of 1.6. If a web page is the third organic on the second page, its search ranking is 2.3.
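The ranking scheme above can be sketched in a few lines of Python. This is an illustration only: the function name `search_ranking` and the input format (one list of organic URLs per result page, in the order the pages were fetched) are assumptions, and downloading and parsing the result pages is left out.

```python
from typing import Optional, Sequence
from urllib.parse import urlparse


def search_ranking(pages: Sequence[Sequence[str]], domain: str) -> Optional[str]:
    """Return 'page.position' for the first organic link on `domain`.

    `pages` holds one list of organic URLs per result page. Returns None
    if the domain does not appear on any of the supplied pages.
    """
    for page_number, organics in enumerate(pages, start=1):
        for position, url in enumerate(organics, start=1):
            host = urlparse(url).hostname or ""
            # Match the domain itself or any of its subdomains.
            if host == domain or host.endswith("." + domain):
                return f"{page_number}.{position}"
    return None


# Hypothetical result pages: the subject domain is the third organic on page two.
example_pages = [
    ["http://a.test/", "http://b.test/", "http://c.test/"],
    ["http://d.test/", "http://e.test/", "http://www.example.com/drills"],
]
ranking = search_ranking(example_pages, "example.com")
```

The ranking is returned as a string rather than a number because it is a label, not a decimal value: the tenth organic on the first page is 1.10, which would be indistinguishable from 1.1 as a float.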



[35] If you modify this webbot to work on other search services, make sure you are not violating their respective Terms of Service agreements.