Ruby Under a Microscope

Hash tables in Rubinius

At a high level, Rubinius uses the same hash table algorithm as MRI and JRuby - but using Ruby instead of C or Java. This means the Rubinius source code is about 10 times easier to understand than either the MRI or JRuby code, and is a great way to learn more about hash tables if you’re interested in getting your hands dirty without learning C or Java.

Here’s how hashes look inside of Rubinius:

Since this is just plain Ruby, in Rubinius your Ruby objects are actually implemented with a real Ruby class called Hash. You’ll see it has a few integer attributes, such as @size, @capacity and @max_entries, and also an instance variable called @entries which is the bin array that actually contains the hash data. Rubinius implements the bin array using a Ruby class called Rubinius::Tuple, a simple storage class similar to an array. Rubinius saves each hash element inside a Ruby object called Bucket, saved inside of the @entries Rubinius::Tuple array.

One difference you’ll see in the Rubinius hash table implementation is that it uses simple powers of two to decide how many hash bins to create, instead of prime numbers. Initially Rubinius uses 16 Bucket objects. Whenever Rubinius needs to allocate more bins, it just doubles the size of the bin array, @entries, in the code above. While theoretically this is less ideal than using prime numbers, it simplifies the code substantially and also allows Rubinius to use bitwise arithmetic to calculate the bin index, instead of having to divide and take the remainder/modulus.

You’ll find the Rubinius hash implementation in source code files called kernel/common/ hash18.rb and kernel/common/hash19.rb - Rubinius has entirely different implementations of hashes depending on whether you start in Ruby 1.8 or Ruby 1.9 compatibility mode. Here’s a snippet from hash18.rb, showing how Rubinius finds a value given a key:

def [](key)
  if item = find_item(key)
    item.value
  else
    default key
  end
end
...etc...
# Searches for an item matching +key+. Returns the item
# if found. Otherwise returns +nil+.
def find_item(key)
  key_hash = key.hash
  item = @entries[key_index(key_hash)]
  while item
    if item.match? key, key_hash
      return item
    end
    item = item.link
  end
end
...etc...
# Calculates the +@entries+ slot given a key_hash value.
def key_index(key_hash)
  key_hash & @mask
end

You can see the key_index method uses bitwise arithmetic to calculate the bin index, since the bin count will always be a power of two for Rubinius, and not a prime number.