Interacting with S3

REST API

The S3 web service application program interface (API) is made available through two interfaces: REST and SOAP. In this book we will use the REST interface.

The S3 service implementation presented in this chapter uses the REST API functionality in the AWS Ruby module. The AWS module includes methods, presented in “REST API Implementation” in Chapter 2, that perform authentication, transmission, and response checking of REST API requests.

The REST interface for the S3 service uses five HTTP methods to perform API operations: GET, HEAD, PUT, DELETE, and POST. The meaning of each method varies slightly, depending on what kind of S3 resource the operation is targeting: an object, a bucket, an Access Control List (ACL), or the S3 service itself. Table 3-1 lists some of the operations you can perform on S3 resources using different HTTP methods.

Table 3-1. Acting on S3 resources with HTTP methods

Resource	GET	HEAD	PUT	DELETE	POST
S3 Service	List your buckets	-	-	-	-
Bucket	List the bucket’s objects	-	Create the bucket	Delete the bucket	-
Object	Retrieve the object’s data and metadata	Retrieve the object’s metadata	Create or replace the object	Delete the object	Create or replace the object
ACL (for a Bucket or Object resource)	Retrieve ACL settings	-	Apply new ACL settings	-	-
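
If it helps to keep Table 3-1 at hand while experimenting, the same information can be restated as a lookup hash. This is only an illustrative sketch: the constant S3_OPERATIONS and the method operation_for are hypothetical names and are not part of the S3 class we build in this chapter.

```ruby
# Hypothetical restatement of Table 3-1. Combinations the table marks
# with "-" are simply absent from the inner hashes.
S3_OPERATIONS = {
  :service => { 'GET' => 'List your buckets' },
  :bucket  => { 'GET'    => "List the bucket's objects",
                'PUT'    => 'Create the bucket',
                'DELETE' => 'Delete the bucket' },
  :object  => { 'GET'    => "Retrieve the object's data and metadata",
                'HEAD'   => "Retrieve the object's metadata",
                'PUT'    => 'Create or replace the object',
                'DELETE' => 'Delete the object',
                'POST'   => 'Create or replace the object' },
  :acl     => { 'GET' => 'Retrieve ACL settings',
                'PUT' => 'Apply new ACL settings' }
}

# Return the table entry for a resource and HTTP method, or nil if the
# table lists no operation for that combination.
def operation_for(resource, http_method)
  S3_OPERATIONS.fetch(resource, {})[http_method.to_s.upcase]
end

puts operation_for(:bucket, 'put')   # prints "Create the bucket"
```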

The most recent S3 API version available when this book was written was 2006-03-01. This version number forms part of the XML namespace of documents sent to and returned by the service: http://s3.amazonaws.com/doc/2006-03-01/.

S3 Implementation Stub

In this chapter, we will gradually build up a complete implementation class called “S3” that you can use to interact with the S3 service. Example 3-1 shows a basic Ruby code stub that defines the S3 class, to which we will add API implementation methods as we proceed through the chapter. Save this code to a file named S3.rb in the same directory as the AWS module file AWS.rb, which we defined in Chapter 2.

Example 3-1. S3 class stub: S3.rb

require 'AWS'
require 'digest/md5'

class S3
  include AWS # Include the AWS module as a mixin

  S3_ENDPOINT = "s3.amazonaws.com"
  XMLNS = 'http://s3.amazonaws.com/doc/2006-03-01/'
    
  # S3 API implementation methods will go here...
  
end

This class will rely on the communication library implementation defined in the AWS module, which it includes as a mixin module. A mixin is a feature of Ruby that makes the variables and methods defined in a module available to a class.
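
For readers new to mixins, the following minimal sketch shows the mechanism on its own. The Greeter module and Client class are hypothetical stand-ins invented for this illustration; they have nothing to do with the AWS module beyond using the same include technique.

```ruby
# Hypothetical module standing in for a shared library such as AWS
module Greeter
  DEFAULT_GREETING = 'Hello'

  def greet(name)
    "#{DEFAULT_GREETING}, #{name} (from #{self.class})"
  end
end

class Client
  include Greeter # Mix the module's methods and constants into Client
end

# The instance method defined in the module is now available on Client
puts Client.new.greet('S3')   # prints "Hello, S3 (from Client)"
```

Because the module is included rather than inherited from, a class remains free to have its own superclass and to mix in other modules as well.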

The S3 class defines two constant values. The S3_ENDPOINT constant defines the default endpoint hostname for S3 service requests. The XMLNS constant defines the XML namespace that is used in documents received from, or sent to, the service.

Constructing S3 URIs

Resources in the S3 service are identified using URIs that include three components:

The location of the S3 service and the communication protocol (HTTP or HTTPS)
The S3 resource the request is targeting, such as a bucket or an object
Optional request parameters used to identify special resources, such as the access control settings associated with a resource

To perform an action on an S3 resource, you must first construct a URI string that identifies that resource. Constructing this URI is not a trivial task, because its content can vary a great deal depending on what kind of resource is involved and whether the request will use the standard S3 service domain or an alternative hostname instead. Table 3-2 shows some example URIs that may be used to represent resources in S3.

Table 3-2. S3 resource URI examples

Resource	Example URI
S3 Service	http://s3.amazonaws.com/
Bucket	http://s3.amazonaws.com/bucket_name
Object	http://s3.amazonaws.com/bucket_name/objectkey
ACL	http://s3.amazonaws.com/bucket_name/objectkey?acl
Object via S3 subdomain	http://bucketname.s3.amazonaws.com/objectkey
Object via virtual hostname	http://www.mydomain.com/objectkey
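
To see the three components in one of these URIs, we can take the ACL example from Table 3-2 apart with Ruby's standard uri library. This is just a quick illustration and is not part of the S3 class.

```ruby
require 'uri'

uri = URI.parse('http://s3.amazonaws.com/bucket_name/objectkey?acl')

# Component 1: the protocol and the location of the service
puts uri.scheme   # prints "http"
puts uri.host     # prints "s3.amazonaws.com"

# Component 2: the resource being targeted (bucket and object key)
puts uri.path     # prints "/bucket_name/objectkey"

# Component 3: the optional request parameters
puts uri.query    # prints "acl"
```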

S3 understands three different URI formats:

A URI with the default service hostname s3.amazonaws.com, which will contain the bucket name in its resource path, if a bucket or object is specified.
A URI with a subdomain hostname constructed from the bucket name and s3.amazonaws.com; a bucket name will not be included in the resource path.
A URI with a virtual hostname that matches the name of the bucket; a bucket name will not be included in the resource path.
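
The difference between the three formats is easiest to see side by side. The sketch below uses a hypothetical helper, example_uris, that builds all three forms for one object; it performs no escaping or validation, and it assumes the bucket is named after the virtual hostname, as the third format requires.

```ruby
# Hypothetical helper for illustration only: no escaping, no validation
def example_uris(bucket_name, object_key)
  {
    # Default hostname: the bucket name appears in the resource path
    :default   => "http://s3.amazonaws.com/#{bucket_name}/#{object_key}",
    # Subdomain: the bucket name is prepended to the service hostname
    :subdomain => "http://#{bucket_name}.s3.amazonaws.com/#{object_key}",
    # Virtual hostname: the hostname is the bucket name itself
    :virtual   => "http://#{bucket_name}/#{object_key}"
  }
end

example_uris('www.mydomain.com', 'objectkey').each do |format, uri|
  puts "#{format}: #{uri}"
end
```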

We will discuss how you can use these different URI formats in “Alternative Hostnames” later in this chapter. For the time being, we must decide which URI format to use for our S3 request messages. Unfortunately, we cannot simply choose one format that will work in all cases. Instead, we must choose between the subdomain format and the default format, depending on the name of the S3 bucket we are addressing.

The best URI construction to use is the subdomain format, because URIs of this type will work with buckets stored in any geographical location (see “Bucket Locations” for more information). Unfortunately, this format cannot be used when a bucket’s name is not a valid DNS name; for example, if it contains uppercase characters or underscores. In these cases, we must use the default S3 hostname format, in which the bucket name is contained in the resource path instead of the hostname.

In summary, we will use the subdomain format provided the name of the bucket we are addressing can be included in a valid hostname. Example 3-2 defines a method that determines whether we can build a valid subdomain URI for a given bucket name.

Example 3-2. Determine if a bucket name is a valid DNS component: S3.rb

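def valid_dns_name(bucket_name)
  # Must be between 3 and 63 characters long
  if bucket_name.size > 63 or bucket_name.size < 3
    return false
  end

  # Only lowercase letters, numbers, periods, and dashes, starting with a
  # letter or number. \A and \z anchor the whole string, not just one line.
  return false unless bucket_name =~ /\A[a-z0-9][a-z0-9.-]+\z/

  return false unless bucket_name =~ /[a-z]/ # Cannot be an IP address

  # The -1 limit keeps trailing empty fragments, so "name." is rejected
  bucket_name.split('.', -1).each do |fragment|
    return false if fragment =~ /^-/ or fragment =~ /-$/ or fragment =~ /^$/
  end

  return true
end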

This method returns true if the bucket name will make a valid subdomain hostname. To make a valid subdomain, a bucket name must:

be between 3 and 63 characters long.
contain only lowercase letters, numbers, periods, or dashes.
start with a letter or a number.
contain at least one letter, so the bucket name cannot resemble an IP address.
have no components (fragments between period characters) that start with a dash, end with a dash, or are an empty string.
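
When a name is rejected, it can be useful to know which of these rules it broke. The following sketch restates the rules one at a time and collects the failures. The method name dns_name_problems and its message strings are hypothetical; this is a diagnostic aid we are assuming for illustration, not a method of the S3 class.

```ruby
# Hypothetical helper: returns a list describing each rule the name breaks.
# An empty list means the name can be used in a subdomain hostname.
def dns_name_problems(bucket_name)
  problems = []
  unless (3..63).include?(bucket_name.size)
    problems << 'must be between 3 and 63 characters long'
  end
  unless bucket_name =~ /\A[a-z0-9.-]*\z/
    problems << 'may contain only lowercase letters, numbers, periods, or dashes'
  end
  unless bucket_name =~ /\A[a-z0-9]/
    problems << 'must start with a letter or a number'
  end
  unless bucket_name =~ /[a-z]/
    problems << 'must contain at least one letter'
  end
  # The -1 limit keeps trailing empty fragments, so "name." is reported too
  fragments = bucket_name.split('.', -1)
  if fragments.any? { |f| f.empty? || f.start_with?('-') || f.end_with?('-') }
    problems << 'no component may be empty or start or end with a dash'
  end
  problems
end

['bucketname', 'bucket_name', 'bucket-.-name', '192.168.0.1'].each do |name|
  problems = dns_name_problems(name)
  puts "#{name}: #{problems.empty? ? 'valid' : problems.join('; ')}"
end
```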

Now that we have this method to determine when we can create subdomain URIs and when we cannot, we can move on to the code that will generate our URIs.

Example 3-3 defines a method that constructs URIs for S3 service requests. The URI will include a hostname that is either the default S3 service domain name or a subdomain, based on the result of the valid_dns_name method. The URI will also include a path specifying the bucket or object resource the request will act upon, if any, and any additional request parameters provided to the method.

Example 3-3. Generate S3 URI: S3.rb

def generate_s3_uri(bucket_name='', object_name='', params=[])
  # Decide between the default and subdomain hostname formats
  if valid_dns_name(bucket_name)
    hostname = bucket_name + "." + S3_ENDPOINT
  else
    hostname = S3_ENDPOINT
  end

  # Build an initial secure or nonsecure URI for the end point.
  request_uri = (@secure_http ? "https://" : "http://") + hostname

  # Include the bucket name in the URI except for alternative hostnames
  if hostname == S3_ENDPOINT
    request_uri << '/' + URI.escape(bucket_name) if bucket_name != ''
  end

  # Add object name component to URI if present
  request_uri << '/' + URI.escape(object_name) if object_name != ''

  # Add request parameters to the URI. Each item in the params variable
  # is a hash dictionary containing multiple keys.
  query = ""
  params.each do |hash|
    hash.each do |name, value|
      query << '&' if query.length > 0

      if value.nil?
        query << "#{name}"
      else
        query << "#{name}=#{CGI::escape(value.to_s)}"
      end
    end
  end
  request_uri << "?" + query if query.length > 0

  return URI.parse(request_uri)
end

The URI generated by this method will use the standard HTTP or secure HTTPS protocol, depending on whether the @secure_http variable is set. If the bucket and object name parameters are not empty strings, they are added to the URI path. If the URI follows the subdomain format, only the object name is added to the path, because the bucket name is already part of the hostname.
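
The parameter handling at the end of generate_s3_uri can also be tried in isolation. The sketch below pulls that loop into a hypothetical standalone method, build_query. As an assumption made to keep the sketch dependent on the uri library alone, it uses URI.encode_www_form_component where Example 3-3 uses CGI::escape. The prefix and max-keys parameters are example values only.

```ruby
require 'uri'

# Hypothetical standalone version of the query-building loop.
# params is an array of hashes; a nil value yields a bare parameter name.
def build_query(params)
  pairs = []
  params.each do |hash|
    hash.each do |name, value|
      if value.nil?
        pairs << name.to_s
      else
        pairs << "#{name}=#{URI.encode_www_form_component(value.to_s)}"
      end
    end
  end
  pairs.join('&')
end

# A parameter with no value, such as the ACL marker
puts build_query([{ :acl => nil }])
# prints "acl"

# Parameters with values; the slash in the value is percent-encoded
puts build_query([{ :prefix => 'photos/2008', 'max-keys' => 10 }])
# prints "prefix=photos%2F2008&max-keys=10"
```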

The following examples demonstrate how this method can be used to construct URIs corresponding to the examples in Table 3-2 earlier in this chapter.

# Load the S3 class and instantiate it in a variable
irb> require 'S3'
irb> s3 = S3.new

# S3 Service URI
irb> s3.generate_s3_uri().to_s
=> "https://s3.amazonaws.com"

# Bucket URI (where the bucket name is a valid DNS name)
irb> s3.generate_s3_uri('bucketname').to_s
=> "https://bucketname.s3.amazonaws.com"

# Bucket URI (where the bucket name is not a valid DNS name)
irb> s3.generate_s3_uri('bucket_name').to_s
=> "https://s3.amazonaws.com/bucket_name"

# Object URI (where the bucket name is a valid DNS name)
irb> s3.generate_s3_uri('bucket.name','objectkey').to_s
=> "https://bucket.name.s3.amazonaws.com/objectkey"

# Object URI (where the bucket name is not a valid DNS name)
irb> s3.generate_s3_uri('bucket-.-name','objectkey').to_s
=> "https://s3.amazonaws.com/bucket-.-name/objectkey"

# Bucket's ACL URI
irb> s3.generate_s3_uri('bucketname', '', [:acl=>nil]).to_s
=> "https://bucketname.s3.amazonaws.com?acl"