How Ruby implements modules and method lookup

image
You can mix multiple modules into one class.

Most Ruby developers are used to the idea that Ruby only supports single inheritance; unlike C++ for example, you can only specify one superclass for each class. However, Ruby does allow for multiple inheritance in an indirect way using modules. You can include as many different modules into a class as you wish, each of them adding new methods and behavior.

How do modules work? Following the same pattern we’ve seen with RObject and RClass, is there also an RModule structure that defines a Ruby module? And how does Ruby keep track of which modules have been included in which classes? Finally, how does Ruby look up methods? How does it know whether to search for a certain method in a class or a module?

It turns out that Ruby doesn’t use an RModule structure. Internally, Ruby implements modules as classes. Whenever you create a module, Ruby actually creates another RClass/rb_classext_struct structure pair, just like it would for a new class. For example, when I define a new module like this:

module Professor
end

…internally Ruby will create a class, not a module! Here are the class structures again:

image

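However, while internally modules are really classes, they are still different from classes in two important ways: you can’t create object instances directly from a module, and you can’t specify a superclass for a module in your code.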

So in fact modules don’t use the iv_index_tbl value, since there are no object level attributes to keep track of. Modules don’t have object instances. Therefore, we can imagine modules using a slightly smaller version of the RClass structures:

image

Following the same train of thought, we can write a technical definition of a Ruby module as follows:

A Ruby module is a Ruby object that also contains method definitions, a superclass pointer and a constants table.
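The relationship between modules and classes is also visible from Ruby code. The following is a minimal sketch that only inspects the Ruby-level view, not the underlying C structures: it shows that Class inherits from Module, and that a module cannot be instantiated.

```ruby
module Professor
end

# A module is an instance of Module, not of Class.
p Professor.class                # Module
p Professor.is_a?(Class)         # false

# Class is a subclass of Module, so every class is also a module.
p Class.superclass               # Module

# Modules have no `new` method, so they cannot create object instances.
p Professor.respond_to?(:new)    # false
```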

What happens when you include a module in a class?

The real magic behind modules happens when you include one into a class. At the moment you include a module into a class, for example:

module Professor
end
class Mathematician
  include Professor
end

Ruby creates a copy of the RClass structure for the Professor module and inserts it as the new superclass for Mathematician. Ruby’s C source code refers to this copy of the module as an “included class.” The superclass of the new copy of Professor is set to the original superclass of Mathematician, preserving the superclass or “ancestor” chain:

image

Here I’ve kept things simple by only displaying the RClass structures and not the rb_classext_struct structures, which actually hold the super pointers.
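The included class itself is hidden from Ruby code, but its effect can be observed. In this small sketch, `ancestors` reports the Professor module right after Mathematician, while `superclass` skips the included class and still reports the original superclass (Object here, since no explicit superclass is given).

```ruby
module Professor
end

class Mathematician
  include Professor
end

# The module now sits between Mathematician and its original superclass.
p Mathematician.ancestors.first(3)    # [Mathematician, Professor, Object]

# Ruby hides the internal "included class" when you ask for the superclass.
p Mathematician.superclass            # Object

p Mathematician.include?(Professor)   # true
```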

Ruby’s method lookup algorithm

Why go to all of this trouble? Why does Ruby bother to change all of the super pointers to make included modules behave as if they were superclasses? Ruby does this to allow its method lookup algorithm to work properly, taking both superclasses and modules into account.

Understanding Ruby’s method lookup algorithm thoroughly is essential for every Ruby developer, so let’s take a close look at it:

image

What surprised me about this algorithm is how simple it is; Ruby simply follows the super pointers until it finds the class or module containing the target method. I had always imagined this would be a much more complex process: that Ruby would have to distinguish between modules and classes using some special logic, that it would have to handle the case when there were multiple included modules with some special code, and more. But no: it’s just a loop over the linked list of super pointers.
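The loop can be imitated in Ruby itself. This is only a sketch of the idea, not Ruby’s actual C implementation: `find_method_owner` is a hypothetical helper that walks `ancestors` (the Ruby-level view of the super pointer chain) and returns the first class or module whose own method table contains the method. The small classes are stand-ins for illustration.

```ruby
class Person
  attr_accessor :first_name
end

module Professor
  def lectures; end
end

class Mathematician < Person
  include Professor
end

# Hypothetical helper (not part of Ruby): walk the ancestor chain in order
# and return the first class or module that defines the method itself.
def find_method_owner(klass, method_name)
  klass.ancestors.find do |ancestor|
    ancestor.instance_methods(false).include?(method_name) ||
      ancestor.private_instance_methods(false).include?(method_name)
  end
end

p find_method_owner(Mathematician, :lectures)        # Professor
p find_method_owner(Mathematician, :first_name=)     # Person
p find_method_owner(Mathematician, :no_such_method)  # nil
```

Note that the helper never asks whether an ancestor is a class or a module; both are handled by the same loop.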

Let’s take an example and walk through the method lookup process. Suppose I decide to move my first and last name attributes out of Mathematician and into the Person superclass like this:

class Person
  attr_accessor :first_name
  attr_accessor :last_name
end

Remember my Mathematician class uses Person as the superclass and also now includes the Professor module:

module Professor
  def lectures
    # ...etc...
  end
end
class Mathematician < Person
  include Professor
end

Now, suppose I set the first name of a mathematician:

ramanujan = Mathematician.new
ramanujan.first_name = "Srinivasa"

To execute this code, Ruby needs to find the first_name= method. To do this, Ruby will start by taking the ramanujan object and getting its class via the klass pointer:

image

Then Ruby will look to see if Mathematician implements first_name= directly by looking through its method table:

image

Since I moved the attribute methods into the Person superclass, the first_name= method is no longer there. Instead Ruby will get the superclass of Mathematician using the super pointer:

image

Remember, this is not the Person class but instead is the “included class,” or copy of the Professor module. Ruby will now look through the method table for Professor, but will only find the lectures method, and not first_name=.

An important detail here is that, because of the way Ruby inserts modules above the original superclass in the superclass chain, methods in an included module will override methods present in a superclass. In this example, if Professor also had a first_name= method, Ruby would call it and not the method in Person.
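Here is a small sketch of that override. The `first_name=` method in Professor is a hypothetical addition made only for this illustration; it adds a prefix so it is easy to see which version Ruby called.

```ruby
class Person
  attr_accessor :first_name
end

module Professor
  # Hypothetical override, added only to show which method wins.
  def first_name=(value)
    @first_name = "Prof. #{value}"
  end
end

class Mathematician < Person
  include Professor
end

ramanujan = Mathematician.new
ramanujan.first_name = "Srinivasa"

# Professor comes before Person in the chain, so its method is found first.
p ramanujan.first_name                                 # "Prof. Srinivasa"
p Mathematician.instance_method(:first_name=).owner    # Professor
```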

Since in this example Ruby doesn’t find first_name= in Professor, it will continue to iterate over the super pointers – this time using the super pointer in Professor:

image

Note the superclass of the Professor module, or more precisely, the superclass of the included class copy of the Professor module, is now actually the Person class. This was the original superclass of Mathematician. Finally, Ruby can find the first_name= method and call it.
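The result of this walk can be checked from Ruby code. In this sketch, `Method#owner` reports the class or module in which Ruby found each method, and `ancestors` shows the order in which the chain was searched.

```ruby
class Person
  attr_accessor :first_name
  attr_accessor :last_name
end

module Professor
  def lectures; end
end

class Mathematician < Person
  include Professor
end

ramanujan = Mathematician.new
ramanujan.first_name = "Srinivasa"

# The search order: the class, then the included module, then the superclass.
p Mathematician.ancestors.first(3)        # [Mathematician, Professor, Person]

# Where each method was actually found.
p ramanujan.method(:first_name=).owner    # Person
p ramanujan.method(:lectures).owner       # Professor
```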

What is interesting here is that internally Ruby implements module inclusion using class inheritance. Saying that in a different way, as far as method lookup is concerned, there is no difference between including a module and specifying a superclass. Both make new methods available to the target class, and internally both use the class’s super pointer. Including multiple modules in a Ruby class really is equivalent to specifying multiple superclasses.

However, Ruby keeps things simple by enforcing a single list of ancestors. While including multiple modules does create multiple superclasses internally, Ruby maintains them in a single list for you. As a Ruby developer, you get the benefits of multiple inheritance – adding new behavior to a class from as many different modules as you would like – while keeping the simplicity of the single inheritance model. Ruby itself benefits from this simplicity as well! By enforcing this single list of superclass ancestors, Ruby’s method lookup algorithm can be very simple. Whenever you call a method on an object, all Ruby has to do is iterate through the superclass linked list until it finds the class or module that contains the target method.

Including two modules in one class

Ruby’s method lookup algorithm is simple, but the code it uses to include modules is not. As we saw above, when you include a module in a class, Ruby inserts a copy of the module into the class’s ancestor chain. This also means if you include two modules, one after the other, the second module will appear first in the ancestor chain… and will be found first by Ruby’s method lookup logic.

For example, suppose I include two modules into Mathematician:

class Mathematician < Person
  include Professor
  include Employee
end

Now Mathematician objects have methods from the Professor module, the Employee module and the Person class. But which methods will Ruby find first? Which methods override which?

Using a diagram, it’s easy to see the order: since I include the Professor module first, Ruby inserts its copy as a superclass first:

image

And now when I include the Employee module, its copy will be inserted above the Professor module’s copy using the same process:

image

This means that methods from Employee will override methods from Professor, which in turn will override methods from Person, the actual superclass.
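The following sketch confirms this ordering. The `role` method is a hypothetical name, defined in both modules only so that the winner is visible.

```ruby
class Person
end

module Professor
  def role; "professor"; end   # hypothetical method for illustration
end

module Employee
  def role; "employee"; end    # same name, to see which one wins
end

class Mathematician < Person
  include Professor
  include Employee
end

# The module included last appears first in the chain.
p Mathematician.ancestors.first(4)   # [Mathematician, Employee, Professor, Person]
p Mathematician.new.role             # "employee"
```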

Finally, modules don’t allow you to specify superclasses; i.e., I can’t write:

module Professor < Employee
end

But I can include one module into another like this:

module Professor
  include Employee
end

Now what happens when I include Professor, a module with other modules included in it, into Mathematician? Which methods will Ruby find first? Here’s what happens: first, when I include Employee into Professor, Ruby will create a copy of Employee and set it as the superclass of Professor internally:

image

That’s right: modules can’t have a superclass in your code, but inside of Ruby they can! This is because Ruby represents modules with classes internally. And now, finally, when I include Professor into Mathematician, Ruby iterates over the two modules and inserts them both as superclasses of Mathematician:

image

Now Ruby will find the methods in Professor first, and Employee second.
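This ordering can also be checked with `ancestors`. In this sketch Employee is included into Professor before Professor is included into Mathematician, matching the sequence described above.

```ruby
module Employee
  def hire_date; end
end

module Professor
  include Employee
  def lectures; end
end

class Person
end

class Mathematician < Person
  include Professor
end

# Internally Professor has a "superclass": the copy of Employee.
p Professor.ancestors                # [Professor, Employee]

# Both modules end up in Mathematician's chain, Professor first.
p Mathematician.ancestors.first(4)   # [Mathematician, Professor, Employee, Person]
```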

Experiment 3-3: Modifying a module after including it

image

Following a suggestion by Xavier Noria, this experiment will look at what happens when you modify a module after it’s been included into a class. Let’s reuse the same Mathematician class and the Professor module:

module Professor
  def lectures; end
end
class Mathematician
  attr_accessor :first_name
  attr_accessor :last_name
  include Professor
end

This time the Mathematician class contains the accessor methods for @first_name and @last_name, and I’ve also included the Professor module. If I inspect the methods of a mathematician object, I should see both the attribute methods, first_name=, etc., and the lectures method which came from Professor:

fermat = Mathematician.new
fermat.first_name = 'Pierre'
fermat.last_name = 'de Fermat'
p fermat.methods.sort
 => [ … :first_name, :first_name=, … :last_name, :last_name=, :lectures … ]

No surprise; I see all the methods.

Now let’s try adding some new methods to the Professor module after including it in the Mathematician class. Is Ruby smart enough to know the new methods should be added to Mathematician as well? Let’s find out.

module Professor
  def primary_classroom; end
end
p fermat.methods.sort
=> [ ... :first_name, :first_name=, ... :last_name, :last_name=, :lectures,
... :primary_classroom, ... ]

As you can see, I get all the methods, including the new primary_classroom method added to Professor after it was included in Mathematician. No surprise again – Ruby is one step ahead of me.

Now let’s try one more test. What will happen if I re-open the Professor module and include yet another module into it?

module Employee
  def hire_date; end
end
module Professor
  include Employee
end

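This is getting somewhat confusing now, so let me summarize what I’ve done so far: first I included the Professor module in the Mathematician class, then I added the primary_classroom method to Professor, and finally I included the Employee module in Professor.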

Let’s see if Ruby works as I expect:

p fermat.methods.sort
 => [ … :first_name, :first_name=, … :last_name, :last_name=, :lectures … ]

The hire_date method is not available in the fermat object. In the Ruby version used for this experiment, including a module into a module that was already included into a class does not affect that class. (Ruby 3.0 and later changed this: such an include now also reaches classes that had already included the module.) After learning about how Ruby implements modules this shouldn’t be too hard to understand. Including Employee into Professor does change the Professor module, but not the copy of Professor that Ruby created when I included it in Mathematician:

image

But what about the primary_classroom method? How was Ruby able to include primary_classroom in Mathematician, even though I added it to Professor after I included Professor in Mathematician? Looking at the diagram above, it’s clear Ruby created a copy of the Professor module before I added the new method to it. But the fermat object gets the new method… how?

To understand this, we need to take a closer look at how Ruby copies modules when you include them into a class. It turns out that Ruby copies the RClass structure, but not the underlying method table! Here’s what I mean:

image

Ruby doesn’t copy the method table for Professor. Instead, it simply sets m_tbl in the new copy of Professor, the “included class,” to point to the same method table. This means that modifying the method table, reopening the module and adding new methods, will change both the module and any classes it was already included in.
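The whole experiment fits in one short script. This is a sketch under one caveat: the first result should hold on any Ruby version, but the hire_date result depends on the version, since Ruby 3.0 and later propagate a late include to classes that already included the module. The script therefore prints that result together with the version instead of assuming it.

```ruby
module Professor
  def lectures; end
end

class Mathematician
  include Professor
end

fermat = Mathematician.new

# Reopen Professor after it was included: the method table is shared,
# so the new method is visible through Mathematician right away.
module Professor
  def primary_classroom; end
end
p fermat.respond_to?(:primary_classroom)   # true

# Now include another module into Professor after the fact.
module Employee
  def hire_date; end
end

module Professor
  include Employee
end

# Version dependent: older Rubies leave Mathematician's chain unchanged,
# while Ruby 3.0 and later update it.
puts "Ruby #{RUBY_VERSION}: hire_date visible? #{fermat.respond_to?(:hire_date)}"
```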