The SimpleDB service stores all content as text, including the attribute values that store your data. The service does not recognize data types in the same way that a relational database does. This feature makes the service more flexible, because you can store any values you like without having to worry about whether they match a predefined schema; however, it also means that the service is only able to compare or sort values based on lexicographical (alphabetical) ordering. Whereas a traditional database can compare various data types based on a full understanding of what the particular type means, SimpleDB is oblivious to the standard data types and will assume that an alphabetical ordering always makes sense.
If you intend to perform queries that use comparison operators, such as less-than and greater-than, you will have to carefully encode any nontextual data you store in the service so that its lexicographical ordering is the same as the expected ordering for data of that type. You will also need to be able to decode these text values when you retrieve them from SimpleDB.
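To see the problem concretely, the short sketch below sorts a handful of numbers as plain strings, which is how SimpleDB would compare them if they were stored without encoding. The sample values are made up for illustration, and nothing here calls the service itself.

```ruby
# Numbers stored as unencoded text are compared character by character.
numbers = [9, 10, 2, -5, 100]

# Convert each number to text and sort the way a text-only store would
as_text = numbers.map { |n| n.to_s }.sort
puts as_text.inspect

# The numeric ordering we actually want from a comparison query
as_numbers = numbers.sort
puts as_numbers.inspect
```

The text ordering places "10" and "100" ahead of "2" because the character "1" sorts before "2", which is exactly the mismatch that the encodings in this section are meant to remove.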
In this section we will define methods to encode and decode the most commonly used data types: Boolean, date, integer, and float. Our data-type encodings are designed to meet two criteria:
The encoding of data types into text strings is an advanced topic that we cannot discuss in depth in this book. The encoding techniques we present here are intended to meet the needs of most SimpleDB users, but there are bound to be some situations in which the reader will have to implement an encoding format that better suits his or her application. Regardless of how you encode your data, make sure the end result sorts correctly as text for the full range of values you intend to use.
Our encoding of Boolean values is very simple: !b represents true and !B represents false. Example 13-10 and Example 13-11 define methods that will encode and decode Boolean values in the SimpleDB class.
Example 13-10. Encode Boolean value: SimpleDB.rb
def encode_boolean(value)
  if value
    return '!b'
  else
    return '!B'
  end
end
Example 13-11. Decode Boolean value: SimpleDB.rb
def decode_boolean(value_str)
  if value_str == '!B'
    return false
  elsif value_str == '!b'
    return true
  else
    raise "Cannot decode boolean from string: #{value_str}"
  end
end
This Boolean encoding is easy to use and to recognize:
# Encoding boolean values
irb> sdb.encode_boolean(true)
=> "!b"
irb> sdb.encode_boolean(false)
=> "!B"
irb> sdb.encode_boolean(nil)
=> "!B"
We use the ISO 8601 date format to store dates, because this format was designed so that lexicographical order corresponds to chronological order in all but a few cases (such as dates before the year 0000). To ensure that encoded dates can be properly compared, we always convert dates to the UTC time zone. Example 13-12 and Example 13-13 define the methods that encode and decode date values.
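Example 13-12. Encode date value: SimpleDB.rb
# Time#iso8601 and Time.parse are provided by Ruby's standard 'time'
# library, so this file is assumed to require 'time'.
def encode_date(value)
  return "!d" + value.getutc.iso8601
end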
Example 13-13. Decode date value: SimpleDB.rb
def decode_date(value_str)
  if value_str[0..1] == '!d'
    return Time.parse(value_str[2..-1])
  else
    raise "Cannot decode date from string: #{value_str}"
  end
end
These date strings should look very familiar, because the AWS services use the ISO 8601 date format extensively.
irb> sdb.encode_date(Time.now)
=> "!d2008-01-03T05:14:39Z"
irb> sdb.decode_date('!d2008-01-03T05:12:50Z')
=> Thu Jan 03 05:12:50 UTC 2008
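The reason for converting to UTC before encoding can be shown with a small sketch. It compares two instants that are expressed in different time zones; the times and offsets are invented for illustration, and the snippet assumes a Ruby version in which Time.new accepts a UTC offset string.

```ruby
require 'time'

# 10:00 at UTC+09:00 is 01:00 UTC, so this instant comes first chronologically
earlier = Time.new(2008, 1, 3, 10, 0, 0, '+09:00')
# 05:00 UTC, four hours after the instant above
later = Time.utc(2008, 1, 3, 5, 0, 0)

# Sorting the local-time strings compares "10:00" against "05:00"
local_sorted = [earlier.iso8601, later.iso8601].sort

# Sorting after conversion to UTC compares "01:00" against "05:00"
utc_sorted = [earlier.getutc.iso8601, later.getutc.iso8601].sort

puts local_sorted.inspect
puts utc_sorted.inspect
```

Only the UTC strings sort in true chronological order; the local-time strings put the later instant first because its clock reading happens to be smaller.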
Integer values do not sort well when converted to text, because the ordering is affected by the number of digits in the string and by the presence of a minus sign for negative values. To encode nonnegative integers as text strings, we add zeros to the beginning of the string so that all integer strings are the same length. Encoding negative numbers is more difficult. In this case we add the negative value to an upper bound of 10 raised to the maximum number of digits, which is one more than the largest value our format can represent, so that values further below zero produce smaller encoded numbers. Nonnegative and negative numbers are identified with the prefixes !i and !I respectively; because an uppercase I sorts before a lowercase i, every negative number sorts before every nonnegative one. Example 13-14 and Example 13-15 define methods that encode and decode integer values.
Example 13-14. Encode integer value: SimpleDB.rb
def encode_integer(value, max_digits=18)
  upper_bound = (10 ** max_digits)

  if value >= upper_bound or value < -upper_bound
    raise "Integer #{value} is outside encoding range (-#{upper_bound} " +
          "to #{upper_bound - 1})"
  end

  if value < 0
    return "!I" + format("%0#{max_digits}d", upper_bound + value)
  else
    return "!i" + format("%0#{max_digits}d", value)
  end
end
Example 13-15. Decode integer value: SimpleDB.rb
def decode_integer(value_str)
  if value_str[0..1] == '!I'
    # Encoded value is a negative integer
    max_digits = value_str.size - 2
    upper_bound = (10 ** max_digits)
    return value_str[2..-1].to_i - upper_bound
  elsif value_str[0..1] == '!i'
    # Encoded value is a positive integer
    return value_str[2..-1].to_i
  else
    raise "Cannot decode integer from string: #{value_str}"
  end
end
Some example encodings may make clearer how the integer encoding format produces strings that sort in the correct order.
# Maximum number of digits allowed in encoded strings (default is 18)
irb> max_digits = 2
irb> sdb.encode_integer(7, max_digits)
=> "!i07"
irb> sdb.encode_integer(25, max_digits)
=> "!i25"
irb> sdb.encode_integer(-3, max_digits)
=> "!I97"
irb> sdb.encode_integer(-100, max_digits)
=> "!I00"

# Confirm the encoded values sort correctly
irb> ["!i07", "!i25", "!I97", "!I00"].sort
=> ["!I00", "!I97", "!i07", "!i25"]
# i.e., -100, -3, 7, 25
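A quick way to gain confidence in an encoding is to check that it preserves order across the whole range you intend to store. The sketch below does this for the two-digit case. The helper sketch_encode_integer is a hypothetical, stand-alone restatement of the scheme in Example 13-14 (without its range check) so that the snippet can run outside the SimpleDB class.

```ruby
# Hypothetical stand-alone copy of the integer scheme, for testing only
def sketch_encode_integer(value, max_digits = 2)
  upper_bound = 10**max_digits
  if value < 0
    # Negative values are shifted up by the upper bound: -3 becomes 97
    "!I" + format("%0#{max_digits}d", upper_bound + value)
  else
    # Nonnegative values are simply zero-padded: 7 becomes 07
    "!i" + format("%0#{max_digits}d", value)
  end
end

# Every value representable with two digits, in ascending numeric order
values = (-100..99).to_a
encoded = values.map { |v| sketch_encode_integer(v) }

# Shuffle the strings, sort them as text, and compare with the numeric order
order_preserved = (encoded.shuffle.sort == encoded)
puts order_preserved
```

If the final comparison ever prints false for a range you care about, the encoding does not sort correctly as text for that range and should not be used for comparison queries.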
The floating-point data type is the most difficult one to encode into text values, because we must handle three separate components: the number’s sign, exponent, and fraction. To encode a float’s sign, we use the !f and !F prefixes to represent positive and negative values respectively. We store the exponent as a zero-padded integer with an offset value added to convert negative exponents into positive values. The fraction component is stored as an integer using the same technique described above to handle positive and negative values. If the fraction component is too large to fit in the space allowed, we reduce the precision by rounding the value.
Example 13-16 defines a method that encodes a positive or negative floating-point value, while Example 13-17 defines a method that decodes it. By default, the number of digits allocated to the float's exponent is 2, which allows exponent values between -50 and 49 to be encoded. The default number of digits allocated to the float's fraction is 15, which is comparable to the 15 to 17 significant decimal digits offered by the double-precision floating-point type of most languages.
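The exponent offset is easier to follow with concrete numbers. The sketch below uses BigDecimal#split, the same call that Example 13-16 relies on, to break a value into its parts and then applies the offset for the positive-number case only. The helper name sketch_stored_exponent and the sample values are assumptions made for illustration.

```ruby
require 'bigdecimal'

# BigDecimal#split returns [sign, significant digits, base, exponent],
# where the value equals sign * 0.<digits> * base**exponent.
def sketch_stored_exponent(text, max_exp_digits = 2)
  exp_midpoint = (10**max_exp_digits) / 2   # 50 for two exponent digits
  _sign, _digits, _base, exponent = BigDecimal(text).split
  # Adding the midpoint turns exponents in -50..49 into 00..99
  format("%0#{max_exp_digits}d", exp_midpoint + exponent)
end

parts = BigDecimal('1234.5').split
puts parts.inspect

# 1234.5 is 0.12345 * 10**4, so its exponent 4 is stored as 54
puts sketch_stored_exponent('1234.5')
# 0.00012 is 0.12 * 10**-3, so its exponent -3 is stored as 47
puts sketch_stored_exponent('0.00012')
```

Because the stored exponent comes first in the encoded string, a positive number with a larger exponent always sorts after one with a smaller exponent, regardless of the fraction digits that follow.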
Example 13-16. Encode float value: SimpleDB.rb
# BigDecimal comes from Ruby's standard 'bigdecimal' library, so this
# file is assumed to require 'bigdecimal'.
def encode_float(value, max_exp_digits=2, max_precision_digits=15)
  exp_midpoint = (10 ** max_exp_digits) / 2

  sign, fraction, base, exponent = BigDecimal(value.to_s).split

  if exponent >= exp_midpoint or exponent < -exp_midpoint
    raise "Exponent #{exponent} is outside encoding range " +
          "(-#{exp_midpoint} " + "to #{exp_midpoint - 1})"
  end

  if fraction.size > max_precision_digits
    # Round fraction value if it exceeds allowed precision.
    fraction_str = fraction[0...max_precision_digits] + '.' +
                   fraction[max_precision_digits..-1]
    # Convert via an integer so that trailing zeros produced by the
    # rounding are kept (BigDecimal#split would strip them).
    fraction = BigDecimal(fraction_str).round(0).to_i.to_s
  elsif fraction.size < max_precision_digits
    # Right-pad fraction with zeros if it is too short.
    fraction = fraction + ('0' * (max_precision_digits - fraction.size))
  end

  # The zero value is a special case, for which the encoded exponent
  # must be 0, so we force the smallest possible exponent
  exponent = -exp_midpoint if value == 0

  if sign == 1
    return format("!f%0#{max_exp_digits}d", exp_midpoint + exponent) +
           format("!%0#{max_precision_digits}d", fraction.to_i)
  else
    fraction_upper_bound = (10 ** max_precision_digits)
    diff_fraction = fraction_upper_bound - BigDecimal(fraction)
    return format("!F%0#{max_exp_digits}d", exp_midpoint - exponent) +
           format("!%0#{max_precision_digits}d", diff_fraction)
  end
end
Example 13-17. Decode float value: SimpleDB.rb
def decode_float(value_str)
  prefix = value_str[0..1]

  if prefix != '!f' and prefix != '!F'
    raise "Cannot decode float from string: #{value_str}"
  end

  value_str =~ /![fF]([0-9]+)!([0-9]+)/
  exp_str = $1
  fraction_str = $2

  max_exp_digits = exp_str.size
  exp_midpoint = (10 ** max_exp_digits) / 2
  max_precision_digits = fraction_str.size

  if prefix == '!F'
    sign = -1
    exp = exp_midpoint - exp_str.to_i

    fraction_upper_bound = (10 ** max_precision_digits)
    fraction = fraction_upper_bound - BigDecimal(fraction_str)
  else
    sign = 1
    exp = exp_str.to_i - exp_midpoint

    fraction = BigDecimal(fraction_str)
  end

  return sign * "0.#{fraction.to_i}".to_f * (10 ** exp)
end
More encoding examples are shown here to further illustrate working with floating-point values.
irb> sdb.encode_float(0.0)
=> "!f00!000000000000000"
# Exponent: 0, 15-digit fraction: 000000000000000

irb> sdb.encode_float(12345678901234567890)
=> "!f70!123456789012346"
# Exponent: 20, Rounded 15-digit fraction: 123456789012346

irb> sdb.encode_float(0.12345678901234567890)
=> "!f50!123456789012346"
# Exponent: 0, Rounded 15-digit fraction: 123456789012346

irb> sdb.encode_float(-12345678901234567890)
=> "!F30!876543210987654"
# Exponent: -20, 15-digit fraction difference: 876543210987654

irb> sdb.encode_float(-0.12345678901234567890)
=> "!F50!876543210987654"
# Exponent: 0, 15-digit fraction difference: 876543210987654

# Confirm the encoded values sort correctly
irb> ["!f00!000000000000000","!f70!123456789012346","!f50!123456789012346",
irb>  "!F30!876543210987654","!F50!876543210987654"].sort
=> ["!F30!876543210987654",   # -12345678901234567890
    "!F50!876543210987654",   # -0.12345678901234567890
    "!f00!000000000000000",   # 0.0
    "!f50!123456789012346",   # 0.12345678901234567890
    "!f70!123456789012346"]   # 12345678901234567890
The documentation and code samples provided by Amazon describe alternative strategies for encoding integers and floating-point numbers. You may prefer Amazon's approach to ours, because it is more straightforward, though it requires that you know in advance the largest-magnitude negative number you will need to store.
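As a rough illustration of why such an approach needs the bounds in advance, the following sketch shifts every integer by a fixed offset and then zero-pads it. It is only a sketch of the general offset-and-pad idea and is not taken from Amazon's sample code; the helper names, the offset of 10,000, and the width of six digits are all hypothetical choices.

```ruby
# Hypothetical helpers: assume no stored value is below -10_000 and that
# every shifted value fits in six digits.
def offset_encode(value, offset = 10_000, digits = 6)
  shifted = value + offset
  if shifted < 0 || shifted >= 10**digits
    raise ArgumentError, "#{value} is outside the range this offset supports"
  end
  # Zero-pad so that all encoded strings have the same length
  format("%0#{digits}d", shifted)
end

def offset_decode(text, offset = 10_000)
  # String#to_i ignores the leading zeros added by the padding
  text.to_i - offset
end

samples = [-100, -3, 7, 25]
encoded_samples = samples.map { |v| offset_encode(v) }
puts encoded_samples.inspect
puts encoded_samples.map { |s| offset_decode(s) }.inspect
```

The scheme needs no sign prefix, but a value below the chosen offset cannot be stored at all, and changing the offset later would mean re-encoding every value already in the domain.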
To allow the SimpleDB class to automatically encode and decode attribute values on the fly, in Example 13-18 and Example 13-19 we will define the methods encode_attribute_value and decode_attribute_value. These are called by the existing class methods when attribute values are set (see Example 13-6) or retrieved (see Example 13-7).
Example 13-18. Encode an attribute value of any type: SimpleDB.rb
def encode_attribute_value(value)
  if value == true or value == false
    return encode_boolean(value)
  elsif value.is_a? Time
    return encode_date(value)
  elsif value.is_a? Integer
    return encode_integer(value)
  elsif value.is_a? Numeric
    return encode_float(value)
  else
    # No type-specific encoding is available, so we simply convert
    # the value to a string.
    return value.to_s
  end
end
Example 13-19. Decode an attribute value of any type: SimpleDB.rb
def decode_attribute_value(value_str)
  return '' if value_str.nil?

  # Check whether the '!' flag is present to indicate an encoded value
  return value_str if value_str[0..0] != '!'

  prefix = value_str[0..1].downcase
  if prefix == '!b'
    return decode_boolean(value_str)
  elsif prefix == '!d'
    return decode_date(value_str)
  elsif prefix == '!i'
    return decode_integer(value_str)
  elsif prefix == '!f'
    return decode_float(value_str)
  else
    return value_str
  end
end