The S3 web service application program interface (API) is made available through two interfaces: REST and SOAP. In this book we will use the REST interface.
The S3 service implementation presented in this chapter uses the REST API functionality in the AWS Ruby module. The AWS module includes methods, presented in “REST API Implementation” in Chapter 2, that perform authentication, transmission, and response checking of REST API requests.
The REST API interface for the S3 service uses five HTTP methods to perform API operations: GET, HEAD, PUT, DELETE, and POST. The meaning of each method varies slightly, depending on what kind of S3 resource the operation is targeting: an object, a bucket, an Access Control List (ACL), or the S3 service itself. Table 3-1 lists some of the operations you can perform on S3 resources using different HTTP methods.
Table 3-1. Acting on S3 resources with HTTP methods
Resource | GET | HEAD | PUT | DELETE | POST |
---|---|---|---|---|---|
S3 Service | List your buckets | - | - | - | - |
Bucket | List the bucket’s objects | - | Create the bucket | Delete the bucket | - |
Object | Retrieve the object’s data and metadata | Retrieve the object’s metadata | Create or replace the object | Delete the object | Create or replace the object |
ACL (for a Bucket or Object resource) | Retrieve ACL settings | - | Apply new ACL settings | - | - |
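As a rough illustration of the Object row in Table 3-1, the following sketch builds (but does not send) one request per HTTP method using Ruby's standard net/http library. The path is a hypothetical placeholder, and the requests lack the authentication that the AWS module adds, so treat this as a picture of the method-to-operation mapping rather than working S3 client code.

```ruby
require 'net/http'

# Hypothetical bucket and object names, used only for illustration
path = '/bucket_name/objectkey'

# One unsent request object per operation in the Object row of Table 3-1
requests = {
  'retrieve data and metadata' => Net::HTTP::Get.new(path),
  'retrieve metadata only'     => Net::HTTP::Head.new(path),
  'create or replace'          => Net::HTTP::Put.new(path),
  'delete'                     => Net::HTTP::Delete.new(path)
}

requests.each do |operation, request|
  puts "#{request.method} #{request.path} (#{operation})"
end
```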
The most recent S3 API version available when this book was written was 2006-03-01. This version number is used as a component of the XML namespace of documents provided to and produced by the service, http://s3.amazonaws.com/doc/2006-03-01/.
In this chapter, we will gradually build up a complete implementation class called “S3” that you can use to interact with the S3 service. Example 3-1 shows a basic Ruby code stub that defines the S3 class, to which we will add API implementation methods as we proceed through the chapter. Save this code to a file named S3.rb in the same directory as the AWS module file AWS.rb, which we defined in Chapter 2.
Example 3-1. S3 class stub: S3.rb
    require 'AWS'
    require 'digest/md5'

    class S3
      include AWS # Include the AWS module as a mixin

      S3_ENDPOINT = "s3.amazonaws.com"
      XMLNS = 'http://s3.amazonaws.com/doc/2006-03-01/'

      # S3 API implementation methods will go here...

    end
This class will rely on the communication library implementation defined in the AWS module, which it includes as a mixin module. A mixin is a feature of Ruby that makes the variables and methods defined in a module available to a class.
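If mixins are unfamiliar, the following minimal sketch may help. The Greeter module and Client class are hypothetical stand-ins (they are not part of the AWS module); the point is only that including a module gives the class access to the module's methods and constants.

```ruby
# Hypothetical module standing in for a library such as the AWS module
module Greeter
  DEFAULT_GREETING = 'Hello'

  def greet(name)
    "#{DEFAULT_GREETING}, #{name}"
  end
end

class Client
  include Greeter # Mix the module's methods and constants into the class
end

client = Client.new
puts client.greet('S3')        # greet is defined in Greeter, not in Client
puts Client::DEFAULT_GREETING  # the constant is reachable through the class
```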
The S3 class defines two constant values. The S3_ENDPOINT constant defines the default endpoint hostname for S3 service requests. The XMLNS constant defines the XML namespace that is used in documents received from, or sent to, the service.
Resources in the S3 service are identified using URIs that include three components:
The location of the S3 service and the communication protocol (HTTP or HTTPS)
The S3 resource the request is targeting, such as a bucket or an object
Optional request parameters used to identify special resources, such as the access control settings associated with a resource
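To make these three components concrete, the short sketch below takes one URI apart with Ruby's standard uri library. The bucket and object names are placeholders.

```ruby
require 'uri'

# Placeholder bucket and object names
uri = URI.parse('https://s3.amazonaws.com/bucket_name/objectkey?acl')

puts uri.scheme # communication protocol
puts uri.host   # location of the S3 service
puts uri.path   # the resource being targeted
puts uri.query  # optional request parameters
```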
To perform an action on an S3 resource, you must first construct a URI string that identifies that resource. Constructing this URI is not a trivial task, because its content can vary a great deal depending on what kind of resource is involved and whether the request will use the standard S3 service domain or an alternative hostname instead. Table 3-2 shows some example URIs that may be used to represent resources in S3.
Table 3-2. S3 resource URI examples
Resource | Example URI |
---|---|
S3 Service | http://s3.amazonaws.com/ |
Bucket | http://s3.amazonaws.com/bucket_name |
Object | http://s3.amazonaws.com/bucket_name/objectkey |
ACL | http://s3.amazonaws.com/bucket_name/objectkey?acl |
Object via S3 subdomain | http://bucketname.s3.amazonaws.com/objectkey |
Object via virtual hostname | http://www.mydomain.com/objectkey |
S3 understands three different URI formats:
A URI with the default service hostname s3.amazonaws.com, which will contain the bucket name in its resource path, if a bucket or object is specified.
A URI with a subdomain hostname constructed from the bucket name and s3.amazonaws.com; a bucket name will not be included in the resource path.
A URI with a virtual hostname that matches the name of the bucket; a bucket name will not be included in the resource path.
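One way to see how the three formats differ is to ask where the bucket name lives in each. The sketch below is an illustration only: split_s3_uri is a hypothetical helper, not part of the S3 class built in this chapter, and it assumes the URI has already been recognized as an S3 resource.

```ruby
require 'uri'

S3_HOST = 's3.amazonaws.com' # default service hostname

# Hypothetical helper: returns [bucket, object_key] for any of the three formats
def split_s3_uri(uri_string)
  uri = URI.parse(uri_string)
  path = uri.path.sub(%r{\A/}, '')

  if uri.host == S3_HOST
    # Default format: the bucket is the first segment of the resource path
    bucket, key = path.split('/', 2)
  elsif uri.host.end_with?('.' + S3_HOST)
    # Subdomain format: the bucket is the hostname prefix
    bucket = uri.host.chomp('.' + S3_HOST)
    key = path
  else
    # Virtual hostname format: the whole hostname is the bucket name
    bucket = uri.host
    key = path
  end

  [bucket, key.to_s]
end

p split_s3_uri('http://s3.amazonaws.com/bucket_name/objectkey')
p split_s3_uri('http://bucketname.s3.amazonaws.com/objectkey')
p split_s3_uri('http://www.mydomain.com/objectkey')
```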
We will discuss how you can use these different URI formats in “Alternative Hostnames” later in this chapter. For the time being, we must decide which URI format to use for our S3 request messages. Unfortunately, we cannot simply choose one format that will work in all cases. Instead, we must choose between the subdomain format and the default format, depending on the name of the S3 bucket we are addressing.
The best URI construction to use is the subdomain format, because URIs of this type will work with buckets stored in any geographical location (see “Bucket Locations” for more information). Unfortunately, this format cannot be used when a bucket’s name is incompatible with DNS; for example, if it contains uppercase characters or underscores. In these cases, we must use the default S3 hostname format, in which the bucket name is contained in the resource path instead of the hostname.
In summary, we will use the subdomain format provided the name of the bucket we are addressing can be included in a valid hostname. Example 3-2 defines a method that determines whether we can build a valid subdomain URI for a given bucket name.
Example 3-2. Determine if a bucket name is a valid DNS component: S3.rb
    def valid_dns_name(bucket_name)
      if bucket_name.size > 63 or bucket_name.size < 3
        return false
      end

      return false unless bucket_name =~ /^[a-z0-9][a-z0-9.-]+$/

      return false unless bucket_name =~ /[a-z]/ # Cannot be an IP address

      bucket_name.split('.').each do |fragment|
        return false if fragment =~ /^-/ or fragment =~ /-$/ or fragment =~ /^$/
      end

      return true
    end
This method returns true if the bucket name will make a valid subdomain hostname. To make a valid subdomain, a bucket name must:
be between 3 and 63 characters long.
contain only lowercase letters, numbers, periods, or dashes.
start with a letter or a number.
contain at least one letter, so the bucket name cannot resemble an IP address.
have no components (fragments between period characters) that start with a dash, end with a dash, or are an empty string.
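To see which rule a particular name breaks, it can help to check the rules one at a time. The sketch below is a hypothetical diagnostic helper, not part of the S3 class; it mirrors the checks listed above but collects a message for each failed rule instead of returning false at the first failure.

```ruby
# Hypothetical helper: lists which of the rules above a bucket name violates
def dns_name_problems(bucket_name)
  problems = []

  unless (3..63).cover?(bucket_name.size)
    problems << 'must be between 3 and 63 characters long'
  end
  unless bucket_name =~ /\A[a-z0-9.-]*\z/
    problems << 'may contain only lowercase letters, numbers, periods, or dashes'
  end
  unless bucket_name =~ /\A[a-z0-9]/
    problems << 'must start with a letter or a number'
  end
  unless bucket_name =~ /[a-z]/
    problems << 'must contain at least one letter'
  end

  # Fragments are the pieces between period characters
  bad_fragment = bucket_name.split('.').any? do |fragment|
    fragment.empty? || fragment.start_with?('-') || fragment.end_with?('-')
  end
  problems << 'has a fragment that is empty or starts or ends with a dash' if bad_fragment

  problems
end

%w[bucketname bucket_name 192.168.0.1 bucket-.-name].each do |name|
  problems = dns_name_problems(name)
  puts "#{name}: #{problems.empty? ? 'valid' : problems.join('; ')}"
end
```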
Now that we have this method to determine when we can create subdomain URIs and when we cannot, we can move on to the code that will generate our URIs.
Example 3-3 defines a method that constructs URIs for S3 service requests. The URI will include a hostname that is either the default S3 service domain name or a subdomain, based on the result of the valid_dns_name method. The URI will also include a path specifying the bucket or object resource the request will act upon, if any, and any additional request parameters provided to the method.
Example 3-3. Generate S3 URI: S3.rb
    def generate_s3_uri(bucket_name='', object_name='', params=[])
      # Decide between the default and subdomain hostname formats
      if valid_dns_name(bucket_name)
        hostname = bucket_name + "." + S3_ENDPOINT
      else
        hostname = S3_ENDPOINT
      end

      # Build an initial secure or nonsecure URI for the end point.
      request_uri = (@secure_http ? "https://" : "http://") + hostname

      # Include the bucket name in the URI except for alternative hostnames
      if hostname == S3_ENDPOINT
        request_uri << '/' + URI.escape(bucket_name) if bucket_name != ''
      end

      # Add object name component to URI if present
      request_uri << '/' + URI.escape(object_name) if object_name != ''

      # Add request parameters to the URI. Each item in the params variable
      # is a hash dictionary containing multiple keys.
      query = ""
      params.each do |hash|
        hash.each do |name, value|
          query << '&' if query.length > 0

          if value.nil?
            query << "#{name}"
          else
            query << "#{name}=#{CGI::escape(value.to_s)}"
          end
        end
      end
      request_uri << "?" + query if query.length > 0

      return URI.parse(request_uri)
    end
The URI generated by this method will use the standard HTTP or secure HTTPS protocol, depending on whether the @secure_http variable is set. If the S3 bucket and object name parameters provided are not empty strings, they are added to the URI path. If the URI follows the subdomain format, only the object name, and not the bucket name, is added to the path.
The following examples demonstrate how this method can be used to construct URIs corresponding to the examples in Table 3-2 earlier in this chapter.
    # Load the S3 class and instantiate it in a variable
    irb> require 'S3'
    irb> s3 = S3.new

    # S3 Service URI
    irb> s3.generate_s3_uri().to_s
    => "https://s3.amazonaws.com"

    # Bucket URI (where the bucket name is a valid DNS name)
    irb> s3.generate_s3_uri('bucketname').to_s
    => "https://bucketname.s3.amazonaws.com"

    # Bucket URI (where the bucket name is not a valid DNS name)
    irb> s3.generate_s3_uri('bucket_name').to_s
    => "https://s3.amazonaws.com/bucket_name"

    # Object URI (where the bucket name is a valid DNS name)
    irb> s3.generate_s3_uri('bucket.name','objectkey').to_s
    => "https://bucket.name.s3.amazonaws.com/objectkey"

    # Object URI (where the bucket name is not a valid DNS name)
    irb> s3.generate_s3_uri('bucket-.-name','objectkey').to_s
    => "https://s3.amazonaws.com/bucket-.-name/objectkey"

    # Bucket's ACL URI
    irb> s3.generate_s3_uri('bucketname', '', [:acl=>nil]).to_s
    => "https://bucketname.s3.amazonaws.com?acl"
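One practical caveat when running the URI generation code on a current Ruby: URI.escape was removed in Ruby 3.0, so the calls in generate_s3_uri will fail there. The sketch below shows one possible substitute for escaping path components. The escape_path helper is hypothetical, and it is not a drop-in equivalent: it percent-encodes every character except letters, digits, and a few unreserved symbols within each segment, which is stricter than URI.escape was, while leaving the "/" separators between segments intact.

```ruby
require 'erb'

# Hypothetical helper: percent-encode each path segment but keep the
# "/" separators, so object keys that contain slashes stay readable.
def escape_path(path)
  path.split('/', -1).map { |segment| ERB::Util.url_encode(segment) }.join('/')
end

puts escape_path('photos/summer 2008/beach.jpg')
```

If you adopt an approach like this, check the escaped form of your object keys against the names S3 reports back, because differences in which characters are encoded can lead to requests that address a different key than intended.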