Application-level access to most web client activities is through the modules urllib and urllib2 (Section 42.6). urllib is the simple web interface; it provides basic functions for opening and retrieving web resources via their URLs.
The primary functions in urllib are urlopen( ), which opens a URL and returns a file-like object, and urlretrieve( ), which copies the entire web resource at the given URL to a local file. The file-like object returned by urlopen( ) supports the following methods: read( ), readline( ), readlines( ), fileno( ), close( ), info( ), and geturl( ). The first five methods work just like their file counterparts. info( ) returns a mimetools.Message object, which for HTTP requests contains the HTTP headers associated with the URL. geturl( ) returns the real URL of the resource, since the client may have been redirected by the web server before getting the actual content.
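As a rough sketch of how the file-like object behaves, the following opens a small temporary file through a file: URL, so it runs without a network connection. The temporary file, its contents, and the variable names are made up for illustration. The try/except import is an assumption added so the sketch also runs under Python 3, where these functions live in urllib.request.

```python
import os
import tempfile

try:
    from urllib import urlopen, pathname2url            # Python 2
except ImportError:
    from urllib.request import urlopen, pathname2url    # Python 3

# Create a small local file to stand in for a web resource (illustrative only).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as tmp:
    tmp.write(b"first line\nsecond line\n")

# Build a file: URL for the temporary file.
url = "file:" + pathname2url(path)

f = urlopen(url)
first = f.readline()     # reads one line, just as a file object would
rest = f.read()          # reads whatever remains
final_url = f.geturl()   # the URL the content actually came from
headers = f.info()       # header object; for HTTP it holds the HTTP headers
f.close()

os.remove(path)          # clean up the stand-in resource

print(first)
print(rest)
print(final_url)
```

With an http: URL the calls are the same; only the contents of the header object and the possibility of a redirect differ.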
urlretrieve( ) returns a tuple (filename, info), where filename is the local file to which the web resource was copied and info is the same as the return value from urlopen's info( ) method.
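The tuple returned by urlretrieve( ) can be sketched in the same way. This example again uses a temporary local file and a file: URL so that no network is needed; the file names and contents are hypothetical. It passes an explicit destination as the optional second argument, and the try/except import is an assumption so the sketch also runs under Python 3.

```python
import os
import tempfile

try:
    from urllib import urlretrieve, pathname2url            # Python 2
except ImportError:
    from urllib.request import urlretrieve, pathname2url    # Python 3

# Create a small local "resource" to retrieve (illustrative only).
fd, src = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "wb") as tmp:
    tmp.write(b"<html><body>spam</body></html>\n")

# Copy the resource to an explicit destination file.
dest = src + ".copy"
filename, info = urlretrieve("file:" + pathname2url(src), dest)

# filename names the local copy; info is the same kind of header object
# that urlopen( )'s info( ) method returns.
with open(filename, "rb") as copied:
    body = copied.read()

os.remove(src)
os.remove(dest)

print(filename)
print(body)
```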
If the result from either urlopen( ) or urlretrieve( ) is HTML, you can use htmllib to parse it.
urllib also provides a function urlencode( ), which converts dictionaries or sequences of two-element tuples into properly URL-encoded query strings. Here is an example session that uses the GET method to retrieve a URL containing parameters:
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
>>> print f.read()
The following example performs the same query but uses the POST method instead:
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
>>> print f.read()
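To see what urlencode( ) actually produces, here is a small sketch that builds the query string without contacting any server. It uses a list of tuples rather than a dictionary so that the parameter order is predictable. The second call, with a value containing a space and an ampersand, is an added illustration of the escaping, and the try/except import is an assumption so the sketch also runs under Python 3, where urlencode( ) lives in urllib.parse.

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3

# A sequence of (name, value) tuples keeps the parameters in a fixed order.
pairs = [("spam", 1), ("eggs", 2), ("bacon", 0)]
query = urlencode(pairs)
print(query)      # spam=1&eggs=2&bacon=0

# For a GET request, the encoded string is appended after a question mark.
url = "http://www.musi-cal.com/cgi-bin/query?%s" % query
print(url)

# Characters that are special in URLs are escaped automatically.
escaped = urlencode([("q", "spam & eggs")])
print(escaped)    # q=spam+%26+eggs
```

For a POST request the same encoded string is passed as the second argument to urlopen( ) instead of being appended to the URL.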
— DJPH