Ruby Under a Microscope

Deducing what’s inside the Class structure

Two objects, one class

We saw above that every object remembers its class by saving a pointer to an RClass structure. What information does each RClass structure contain? What would I see if I could look inside a Ruby class? Let’s build up a model of what information must be present in RClass, and therefore, a technical definition of what a Ruby class is, based on what we know classes can do.

Every Ruby developer knows how to write a class: you type the class keyword, specify a name for the new class, and then type in the class’s methods. In fact, I already wrote a Ruby class this way in the previous section:

class Mathematician
  attr_accessor :first_name
  attr_accessor :last_name
end

As you probably know, attr_accessor is just shorthand for defining get and set methods for an attribute. The methods defined by attr_accessor also check for nil values. I don’t show this here. Here’s the more verbose way of defining the same Mathematician class:

class Mathematician
  def first_name
    @first_name
  end
  def first_name=(value)
    @first_name = value
  end
  def last_name
    @last_name
  end
  def last_name=(value)
    @last_name = value
  end
end

When I take a step back and look at this class, or any Ruby class, it looks like just a group of method definitions. I can assign behavior to an object by adding methods to its class, and when I call a method on an object, Ruby looks for the method in the object’s class. This leads me to my first definition of what a Ruby class is:

A Ruby class is a group of method definitions.
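One way to check this from Ruby code is to ask a class which methods it defines. The following is a minimal sketch using Ruby's built-in reflection methods; the class names ShortMathematician and VerboseMathematician are made up for this example so that the two styles can be compared side by side.

```ruby
# Two hypothetical classes: one uses attr_accessor, the other spells
# out the same getters and setters by hand.
class ShortMathematician
  attr_accessor :first_name
  attr_accessor :last_name
end

class VerboseMathematician
  def first_name
    @first_name
  end
  def first_name=(value)
    @first_name = value
  end
  def last_name
    @last_name
  end
  def last_name=(value)
    @last_name = value
  end
end

# instance_methods(false) lists only the methods defined directly in the
# class, leaving out anything inherited from Object.
p ShortMathematician.instance_methods(false).sort
p VerboseMathematician.instance_methods(false).sort
```

Both lines print the same four method names: first_name, first_name=, last_name and last_name=.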

Therefore, I know that the RClass structure for Mathematician must save a list of all the methods I defined in the class:

While reviewing my Ruby code above, notice that I’ve also created two instance variables called @first_name and @last_name. We saw earlier how Ruby stores these values in each RObject structure, but you may have noticed that the names of these variables were not stored in RObject, just the values were. (As I mentioned above, Ruby 1.8 actually stores the names in RObject as well.) Instead, Ruby must store the attribute names in RClass; this makes sense since the names will be the same for every Mathematician instance. Let’s redraw RClass again, including a table of attribute names as well this time:

Now my definition of a Ruby class is:

A Ruby class is a group of method definitions and a table of attribute names.
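The split between shared names and per-object values can be observed from Ruby code. This sketch only shows the visible behavior; it does not reveal where Ruby stores the names internally. The sample names assigned below are just example data.

```ruby
class Mathematician
  attr_accessor :first_name
  attr_accessor :last_name
end

euler = Mathematician.new
euler.first_name = "Leonhard"
euler.last_name  = "Euler"

euclid = Mathematician.new
euclid.first_name = "Euclid"
euclid.last_name  = "of Alexandria"

# The attribute names are identical for both objects...
p euler.instance_variables    # [:@first_name, :@last_name]
p euclid.instance_variables   # [:@first_name, :@last_name]

# ...while the values differ from object to object.
p euler.instance_variable_get(:@first_name)    # "Leonhard"
p euclid.instance_variable_get(:@first_name)   # "Euclid"
```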

At the beginning of this chapter I mentioned that everything in Ruby is an object. This might be true for classes too. It’s easy to prove this is, in fact, the case using IRB:

> p Mathematician.class
 => Class

You can see Ruby classes are all instances of the Class class; therefore, classes are also objects. Let’s update our definition of a Ruby class again:

A Ruby class is a Ruby object that also contains method definitions and attribute names.

Since Ruby classes are objects, we know that the RClass structure must also contain a class pointer and an instance variable array, the values that we know every Ruby object contains:

You can see I’ve added a pointer to the Class class, in theory the class of every Ruby class object. However, in Experiment 3-2 below I’ll show that actually this diagram is not accurate, that klass actually points to something else! I’ve also added a table of instance variables. Note: these are the class level instance variables. Don’t confuse this with the table of attribute names for the object level instance variables.
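To make the difference between class level and object level instance variables concrete, here is a small sketch. The variable name @type is hypothetical, and is deliberately used at both levels to show that the two never collide.

```ruby
class Mathematician
  # Inside the class body, self is the class object itself, so this
  # instance variable belongs to the Mathematician class object.
  @type = "class level"

  def initialize
    # Inside an instance method, self is the new object, so this is a
    # separate, object level instance variable.
    @type = "object level"
  end
end

p Mathematician.is_a?(Object)                       # true
p Mathematician.instance_variables                  # [:@type]
p Mathematician.instance_variable_get(:@type)       # "class level"
p Mathematician.new.instance_variable_get(:@type)   # "object level"
```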

As you can see, this is rapidly getting out of control; the RClass structure seems to be much more complex than the RObject structure was! But, don’t worry, we’re almost done. In a moment I’ll show you what the actual RClass structure looks like. But first, there are still two more important types of information we need to consider that each Ruby class contains.

Another essential feature of object oriented programming that Ruby implements is inheritance. Ruby implements single inheritance by allowing us to optionally specify one superclass when we create a class; if we don’t specify a superclass, Ruby assigns the Object class as the superclass. For example, I could rewrite my Mathematician class using a superclass like this:

class Mathematician < Person
...

Now every instance of Mathematician will include the same methods that instances of Person have. In this example, I might want to move the first_name and last_name accessor methods into Person. I could also move the @first_name and @last_name attributes into the Person class; all instances of Mathematician would still share these attributes. Somehow the Mathematician class must contain a reference to the Person class (its superclass) so that Ruby can find any methods or attributes that were actually defined in a superclass.

Let’s update my definition again, assuming that Ruby tracks the superclass using another pointer similar to klass:

A Ruby class is a Ruby object that also contains method definitions, attribute names and a superclass pointer.

And let’s redraw the RClass structure including the new superclass pointer:

At this point it is critical to understand the difference between the klass pointer and the super pointer. The klass pointer indicates which class the Ruby class object is an instance of. This will always be the Class class:

> p Mathematician.class
 => Class

Ruby uses the klass pointer to find the methods of the Mathematician class, such as the new method which every Ruby class implements.

However, the super pointer records which class is the superclass of this class:

> p Mathematician.superclass
 => Person

Ruby uses the “super” pointer to help find methods that each Mathematician instance has, such as first_name= or last_name. I’ll cover method lookup in the next section.
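The two relationships can be compared directly in a short script. This is a sketch of the visible behavior only; the Loner class is a hypothetical example of a class declared without an explicit superclass.

```ruby
class Person
  attr_accessor :first_name
  attr_accessor :last_name
end

class Mathematician < Person
end

# Hypothetical class with no explicit superclass.
class Loner
end

p Mathematician.class        # Class
p Mathematician.superclass   # Person
p Loner.superclass           # Object

# Mathematician defines no methods of its own, yet its instances can
# still use the accessors, because Ruby finds them in Person.
m = Mathematician.new
m.first_name = "Leonhard"
p m.first_name                                       # "Leonhard"
p Mathematician.instance_methods(false)              # []
p Mathematician.instance_method(:first_name).owner   # Person
```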

Now we have just one more feature of Ruby classes to cover: constants. As you probably know, Ruby allows you to define constant values inside of a class, like this:

class Mathematician < Person
  AREA_OF_EXPERTISE = "Mathematics"
  etc...

Constant names must start with a capital letter, and are valid within the scope of the current class. Curiously, Ruby actually allows you to change a constant’s value but will display a warning when you do so. Let’s add a constant table to our RClass structure, since Ruby must save these values inside each class:

That’s it – so now we can write a complete, technical definition of what a Ruby class is:

A Ruby class is a Ruby object that also contains method definitions, attribute names, a superclass pointer and a constants table.
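The constants that belong to a class can also be inspected from Ruby code. Here is a brief sketch using the Module#constants, const_get and const_defined? reflection methods; it shows that the constant is recorded with Mathematician and not with its superclass.

```ruby
class Person
end

class Mathematician < Person
  AREA_OF_EXPERTISE = "Mathematics"
end

# Passing false restricts the list to constants defined in this class.
p Mathematician.constants(false)                # [:AREA_OF_EXPERTISE]
p Mathematician::AREA_OF_EXPERTISE              # "Mathematics"
p Mathematician.const_get(:AREA_OF_EXPERTISE)   # "Mathematics"

# The superclass knows nothing about the constant.
p Person.const_defined?(:AREA_OF_EXPERTISE)     # false
```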

This isn’t as concise as the simple definition we had for what a Ruby object is, but each Ruby class does actually contain much more information than each Ruby object does. Ruby classes are obviously fundamental to the language.

The actual RClass structure

Now that we have built up a conceptual model for what information must be stored in RClass, let’s look at the actual C structure that Ruby uses to represent classes:

As you can see, Ruby actually uses two separate structures to represent each class: RClass and rb_classext_struct. However, these act as one large structure, since each RClass always contains a pointer (ptr) to a corresponding rb_classext_struct. You might guess that the Ruby core team decided to use two different structures because there are so many values to save, but more likely they created rb_classext_struct to hold internal values they didn’t want to expose in the public Ruby C extension API.

Like I did for RObject, on the left I show a VALUE pointer. Ruby always accesses classes using these VALUE pointers. On the right, you can see the technical names for all of the fields we just discussed:

flags and klass are the same RBasic values that every Ruby value contains.
m_tbl is the method table, a hash whose keys are the names or IDs of the methods and whose values are pointers to the method definitions, including the compiled YARV instructions.
iv_index_tbl is the attribute names table, a hash that maps each instance variable name to the index of the attribute’s value in each RObject instance variable array.
super is a pointer to the RClass structure for this class’s superclass.
iv_tbl contains the class level instance variables – both their names and values.
And finally const_tbl is a hash containing all of the constants – names and values – defined in this class’s scope. You can see that Ruby implements iv_tbl and const_tbl in the same way; that is, class level instance variables and constants are almost the same thing.
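To see how these tables fit together, here is a toy model written in Ruby itself. It is only a sketch of the idea described in the list above, not Ruby's actual implementation: ToyClass, ToyObject, toy_ivar_get and the sample data are all hypothetical names invented for this illustration, and the super field is called super_class because super is a Ruby keyword.

```ruby
# Toy model only: plain Ruby Structs standing in for the C structures.
ToyClass  = Struct.new(:klass, :m_tbl, :iv_index_tbl, :super_class, :iv_tbl, :const_tbl)
ToyObject = Struct.new(:klass, :ivptr)

# Read an instance variable in two steps: find the index for the name in
# the class's iv_index_tbl, then use that index in the object's own array.
def toy_ivar_get(object, name)
  index = object.klass.iv_index_tbl[name]
  index.nil? ? nil : object.ivptr[index]
end

TOY_MATHEMATICIAN = ToyClass.new(
  nil,                                       # klass (left out of this toy)
  {},                                        # m_tbl: method table
  { :@first_name => 0, :@last_name => 1 },   # iv_index_tbl: name => index
  nil,                                       # super_class (left out of this toy)
  {},                                        # iv_tbl: class level instance variables
  { AREA_OF_EXPERTISE: "Mathematics" }       # const_tbl: constants
)

# Each object stores only values; the names live in the class.
TOY_EULER  = ToyObject.new(TOY_MATHEMATICIAN, ["Leonhard", "Euler"])
TOY_EUCLID = ToyObject.new(TOY_MATHEMATICIAN, ["Euclid", nil])

p toy_ivar_get(TOY_EULER, :@last_name)     # "Euler"
p toy_ivar_get(TOY_EUCLID, :@first_name)   # "Euclid"
p toy_ivar_get(TOY_EULER, :@age)           # nil
```

The point of the model is that the name-to-index table exists once per class, while each object carries nothing but an array of values.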

Now let’s take a quick look at the actual RClass structure definition:

typedef struct rb_classext_struct rb_classext_t;
struct RClass {
    struct RBasic basic;
    rb_classext_t *ptr;
    struct st_table *m_tbl;
    struct st_table *iv_index_tbl;
};

Like the RObject definition we saw earlier, you can find this structure definition in the include/ruby/ruby.h file. It contains the RBasic values (flags and klass), the ptr pointer to rb_classext_struct, and the m_tbl and iv_index_tbl tables described above.

The rb_classext_struct structure definition, on the other hand, can be found in the internal.h C header file:

struct rb_classext_struct {
    VALUE super;
    struct st_table *iv_tbl;
    struct st_table *const_tbl;
};

Once again, you can see the values I described above. In Chapter 4 I’ll cover hash tables in detail, including the st_table type that Ruby uses here to save all of these values: the method table, the constant table, the class level instance variables and the instance variable names/IDs for instances of this class.

Experiment 3-2: Where does Ruby save class methods?

Above we saw how each RClass structure saves all the methods defined in a certain class; in my example:

class Mathematician
  def first_name
    @first_name
  end
end

Ruby stores information about the first_name method inside the RClass structure for Mathematician using the method table.

But what about class methods? It’s a common idiom in Ruby to save methods in a class directly, using this syntax:

class Mathematician
  def self.class_method
    puts "This is a class method."
  end
end

Or this syntax:

class Mathematician
  class << self
    def class_method
      puts "This is a class method."
    end
  end
end

Are they saved in the RClass structure along with the normal methods for each class, maybe with a flag to indicate they are class methods and not normal methods? Or are they saved somewhere else? Let’s find out!

It’s easy to see where class methods are not saved. They are obviously not saved in the RClass method table along with normal methods, since instances of Mathematician cannot call them:

obj = Mathematician.new
obj.class_method
=> undefined method `class_method' for
#<Mathematician:0x007fdd8384d1c8> (NoMethodError)

Thinking about this some more, since Mathematician is also a Ruby object – remember my definition from above:

A Ruby class is a Ruby object that also contains method definitions, attribute names, a superclass pointer and a constants table.

…then Ruby should save methods for Mathematician in the same way it saves them for any object: in the method table for the object’s class. That is, Ruby should get Mathematician’s class using the klass pointer and save the method in the method table in that RClass structure:

But actually Ruby doesn’t do this. You can prove this by creating another class and trying to call the new class method on it:

> class AnotherClass; end
> AnotherClass.class_method
=> undefined method `class_method' for AnotherClass:Class (NoMethodError)

If Ruby had added the class method to the method table in the Class class, then all classes in your application would have the method. Obviously this isn’t what we intended by writing a class method, and thankfully Ruby doesn’t implement class methods this way.
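Both negative results can be checked in one short script using respond_to? and instance_methods. This sketch returns the string from the class method instead of printing it, only to keep the output short.

```ruby
class Mathematician
  def self.class_method
    "This is a class method."
  end
end

class AnotherClass
end

# Only Mathematician itself responds to the class method.
p Mathematician.respond_to?(:class_method)       # true
p Mathematician.new.respond_to?(:class_method)   # false
p AnotherClass.respond_to?(:class_method)        # false

# It is in neither Mathematician's own method table...
p Mathematician.instance_methods(false).include?(:class_method)   # false
# ...nor among the methods the Class class gives to every class.
p Class.instance_methods.include?(:class_method)                  # false
```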

Then where do the class methods go? You can find a clue by using the ObjectSpace.count_objects method, as follows:

$ irb
> ObjectSpace.count_objects[:T_CLASS]
 => 859 
> class Mathematician; end
 => nil 
> ObjectSpace.count_objects[:T_CLASS]
 => 861

ObjectSpace.count_objects returns a hash containing the number of objects of each type that currently exist. In this test, I’m using the :T_CLASS key to get the count of class objects that exist in my IRB session. Before I create Mathematician, there are 859 classes. After I declare Mathematician, there are 861: two more. This seems a bit odd… I declared one new class but Ruby actually created two! What is the second one for? Where is it?

It turns out that whenever you create a new class, Ruby internally creates two classes! The first class is your new class: Ruby creates a new RClass structure to represent your class as I have described above. But internally Ruby also creates a second, hidden class called the “metaclass.” Why? Just for this reason: to save any class methods you might later create for your new class. In fact, Ruby sets the metaclass to be the class of your new class: it sets the klass pointer of your new RClass structure to point to the metaclass.

Without writing C code, there’s no easy way to see the metaclass or the klass pointer value, but you can obtain the metaclass as a Ruby object as follows:

class Mathematician
end
p Mathematician
p Mathematician.singleton_class

Running this I get:

$ ruby metaclass.rb
Mathematician
#<Class:Mathematician>

The first line displays the class itself, while the second line displays the class’s metaclass; the odd “#<Class:Mathematician>” syntax indicates that the second class is the metaclass for Mathematician. This is the second RClass structure that Ruby automatically created for me when I declared the Mathematician class. And this second RClass structure is where Ruby saves my class method.

If I now display the instance methods of the metaclass, I’ll see all the usual Ruby Class methods, along with my new class method for Mathematician:

p Mathematician.singleton_class.instance_methods
=> [ ... :class_method, ...  ]
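As a final check, the sketch below defines one class method with each of the two syntaxes shown earlier and asks Ruby which class owns them. The method names first_style and second_style are hypothetical, chosen only to tell the two definitions apart.

```ruby
class Mathematician
  def self.first_style
    "defined with def self."
  end

  class << self
    def second_style
      "defined inside class << self"
    end
  end
end

meta = Mathematician.singleton_class
p meta                                  # #<Class:Mathematician>

# Both syntaxes put the method in the metaclass...
p meta.instance_methods(false).sort     # [:first_style, :second_style]
p Mathematician.method(:first_style).owner == meta    # true
p Mathematician.method(:second_style).owner == meta   # true

# ...and neither adds anything to Mathematician's own method table.
p Mathematician.instance_methods(false)  # []
```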