Ruby Foundations

This book relies heavily on a firm understanding of Ruby. This section will explain some aspects of Ruby that are often confusing or misunderstood. Some of this may be familiar, but these are important concepts that form the basis for the metaprogramming techniques covered later in this chapter.

Classes and Modules

Classes and modules are the foundation of object-oriented programming in Ruby. Classes facilitate encapsulation and separation of concerns. Modules can be used as mixins—bundles of functionality that are added onto a class to add behaviors in lieu of multiple inheritance. Modules are also used to separate classes into namespaces.

In Ruby, every class name is a constant. This is why Ruby requires class names to begin with an uppercase letter. The constant evaluates to the class object, which is an object of the class Class. This is distinct from the Class object, which represents the actual class Class. ^[3] When we refer to a "class object" (with a lowercase C), we mean any object that represents a class (including Class itself). When we refer to the "Class object" (uppercase C), we mean the class Class, which is the superclass of all class objects.

The class Class inherits from Module; every class is also a module. However, there is an important distinction. Classes cannot be mixed in to other classes, and classes cannot extend objects; only modules can.

Method Lookup

Method lookup in Ruby can be very confusing, but it is quite regular. The easiest way to understand complicated situations is to visualize the data structures that Ruby creates behind the scenes.

Every Ruby object^[4] has a set of fields in memory:

klass

A pointer to the class object of this object. (It is klass instead of class because the latter is a reserved word in C++ and Ruby; if it were called class, Ruby would compile with a C compiler but not with a C++ compiler. This deliberate misspelling is used everywhere in Ruby.)

iv_tbl

"Instance Variable Table," a hashtable containing the instance variables belonging to this object.

flags

A bitfield of Boolean flags with some status information, such as the object's taint status, garbage collection mark bit, and whether the object is frozen.

Every Ruby class or module has the same fields, plus two more:

m_tbl

"Method Table," a hashtable of this class or module's instance methods.

super

A pointer to this class or module's superclass.

These fields play a huge role in method lookup, and it is important that you understand them. In particular, you should pay close attention to the difference between the klass and super pointers of a class object.

The rules

The method lookup rules are very simple, but they depend on an understanding of how Ruby's data structures work. When a message is sent to an object, ^[5] the following steps occur:

Ruby follows the receiver's klass pointer and searches the m_tbl of that class object for a matching method. (The target of a klass pointer will always be a class object.)
If no method is found, Ruby follows that class object's super pointer and continues the search in the superclass's m_tbl.
Ruby progresses in this manner until the method is found or the top of the super chain is reached.
If the method is not found in any object on the chain, Ruby invokes method_ missing on the receiver of the original method. This starts the process over again, this time looking for method_missing rather than the original method.

These rules apply universally. All of the interesting things that method lookup involves (mixins, class methods, and singleton classes) are consequences of the structure of the klass and super pointers. We will now examine this process in detail.

Class inheritance

The method lookup process can be confusing, so we'll start simple. Here is the simplest possible class definition in Ruby:

	class A
	end

This code generates the following data structures in memory (see Figure 1-1).

Figure 1-1. Data structures for a single class

The double-bordered boxes represent class objects—objects whose klass pointer points to the Class object. A's super pointer refers to the Object class object, indicating that A inherits from Object. For clarity, from now on we will omit default klass pointers to Class, Module, and Object where there is no ambiguity.

The next-simplest case is inheritance from one class. Class inheritance simply follows the super pointers. For example, we will create a B class that descends from A:

	class B < A 
	end

The resulting data structures are shown in Figure 1-2.

Figure 1-2. One level of inheritance

The super keyword always delegates along the method lookup chain, as in the following example:

	class B
	  def initialize
	    logger.info "Creating B object"
	    super
	  end
	end

The call to super in initialize will follow the standard method lookup chain, beginning with A#initialize.

Class instantiation

Now we get a chance to see how method lookup is performed. We first create an instance of class B:

obj = B.new

This creates a new object, and sets its klass pointer to B's class object (see Figure 1-3).

Figure 1-3. Class instantiation

The single-bordered box around obj represents a plain-old object instance. Note that each box in this diagram is an object instance. However, the double-bordered boxes represent objects that are instances of the Class class (hence their klass pointer points to the Class object).

When we send obj a message:

	obj.to_s

this chain is followed:

obj's klass pointer is followed to B; B's methods (in m_tbl) are searched for a matching method.
No methods are found in B. B's super pointer is followed, and A is searched for methods.
No methods are found in A. A's super pointer is followed, and Object is searched for methods.
The Object class contains a to_s method in native code (rb_any_to_s). This is invoked, yielding a value like "#<B:0x1cd3c0>". The rb_any_to_s method examines the receiver's klass pointer to determine what class name to display; therefore, B is shown even though the method invoked resides in Object.

Including modules

Things get more complicated when we start mixing in modules. Ruby handles module inclusion with ICLASSes,^[6] which are proxies for modules. When you include a module into a class, Ruby inserts an ICLASS representing the included module into the including class object's super chain.

For our module inclusion example, let's simplify things a bit by ignoring B for now. We define a module and mix it in to A, which results in data structures shown in Figure 1-4:

	module Mixin
	  def mixed_method
	    puts "Hello from mixin"
	  end
	end
	
	class A
	  include Mixin
	end

Figure 1-4. Inclusion of a module into the lookup chain

Here is where the ICLASS comes into play. The super link pointing from A to Object is intercepted by a new ICLASS (represented by the box with the dashed line). The ICLASS is a proxy for the Mixin module. It contains pointers to Mixin's iv_tbl (instance variables) and m_tbl (methods).

From this diagram, it is easy to see why we need proxy classes: the same module may be mixed in to any number of different classes—classes that may inherit from different classes (thus having different super pointers). We could not directly insert Mixin into the lookup chain, because its super pointer would have to point to two different things if it were mixed in to two classes with different parents.

When we instantiate A, the structures are as shown in Figure 1-5:

	objA = A.new

Figure 1-5. Method lookup for a class with an included module

We invoke the mixed_method method from the mixin, with objA as the receiver:

	objA.mixed_method
	# >> Hello from mixin

The following method-lookup process takes place:

objA's class, A, is searched for a matching method. None is found.
A's super pointer is followed to the ICLASS that proxies Mixin. This proxy object is searched for a matching method. Because the proxy's m_tbl is the same as Mixin's m_tbl, the mixed_method method is found and invoked.

Many languages with multiple inheritance suffer from the diamond problem, which is ambiguity in resolving method calls on objects whose classes have a diamond-shaped inheritance graph, as shown in Figure 1-6.

Given this diagram, if an object of class D calls a method defined in class A that has been overridden in both B and C, there is ambiguity about which method should be called. Ruby resolves this by linearizing the order of inclusion. Upon a method call, the lookup chain is searched linearly, including any ICLASSes that have been inserted into the chain.

First of all, Ruby does not support multiple inheritance; however, multiple modules can be mixed into classes and other modules. Therefore, A, B, and C must be modules. We see that there is no ambiguity here; the method chosen is the latest one that was inserted into the lookup chain:

	module A
	  def hello
	    "Hello from A"
	  end
	end

Figure 1-6. The diamond problem of multiple inheritance

	module B
	  include A
	  def hello
	    "Hello from B"
	  end
	end

	module C
	  include A
	  def hello
	    "Hello from C"
	  end
	end

	class D
	  include B
	  include C
	end

	D.new.hello # => "Hello from C"

And if we change the order of inclusion, the result changes correspondingly:

	class D
	  include C
	  include B
	end
	
	D.new.hello # => "Hello from B"

In this last example, where B is included last, the object graph looks like Figure 1-7 (for simplicity, pointers to Object and Class have been elided).

Figure 1-7. Ruby's solution for the diamond problem: linearization

The singleton class

Singleton classes (also metaclasses or eigenclasses; see the upcoming sidebar, "Single-ton Class Terminology") allow an object's behavior to be different from that of other objects of its class. You've probably seen the notation to open up a singleton class before:

	class A
	end

	objA = A.new
	objB = A.new
	objA.to_s # => "#<A:0x1cd0a0>"
	objB.to_s # => "#<A:0x1c4e28>"

	class <<objA # Open the singleton class of objA
	  def to_s; "Object A"; end
	end

	objA.to_s # => "Object A"
	objB.to_s # => "#<A:0x1c4e28>"

The class <<objA notation opens objA's singleton class. Instance methods added to the singleton class function as instance methods in the lookup chain. The resulting data structures are shown in Figure 1-8.

Figure 1-8. Singleton class of an object

The objB instance is of class A, as usual. And if you ask Ruby, it will tell you that objA is also of class A:

	objA.class # => A

However, something different is going on behind the scenes. Another class object has been inserted into the lookup chain. This object is the singleton class of objA.We refer to it as "Class:objA" in this documentation. Ruby calls it a similar name: #<Class:#<A:0x1cd0a0>>. Like all classes, the singleton class's klass pointer (not shown) points to the Class object.

The singleton class is marked as a virtual class (one of the flags is used to indicate that a class is virtual). Virtual classes cannot be instantiated, and we generally do not see them from Ruby unless we take pains to do so. When we ask Ruby for objA's class, it traverses the klass and super pointers up the hierarchy until it finds the first nonvirtual class.

Therefore, it tells us that objA's class is A. This is important to remember: an object's class (from Ruby's perspective) may not match the object pointed to by klass.

Singleton classes are called singleton for a reason: there can only be one singleton class per object. Therefore, we can refer unambiguously to "objA's singleton class" or Class:objA. In our code, we can assume that the singleton class exists; in reality, for efficiency, Ruby creates it only when we first mention it.

Ruby allows singleton classes to be defined on any object except Fixnums or symbols. Fixnums and symbols are immediate values (for efficiency, they're stored as themselves in memory, rather than as a pointer to a data structure). Because they're stored on their own, they don't have klass pointers, so there's no way to alter their method lookup chain.

You can open singleton classes for true, false, and nil, but the singleton class returned will be the same as the object's class. These values are singleton instances (the only instances) of TrueClass, FalseClass, and NilClass, respectively. When you ask for the singleton class of true, you will get TrueClass, as the immediate value true is the only possible instance of that class. In Ruby:

	true.class # => TrueClass
	class << true; self; end # => TrueClass
	true.class == (class << true; self; end) # => true

Singleton classes of class objects

Here is where it gets complicated. Keep in mind the basic rule of method lookup: first Ruby follows an object's klass pointer and searches for methods; then Ruby keeps following super pointers all the way up the chain until it finds the appropriate method or reaches the top.

The important thing to remember is that classes are objects, too. Just as a plain-old object can have a singleton class, class objects can also have their own singleton classes. Those singleton classes, like all other classes, can have methods. Since the singleton class is accessed through the klass pointer of its owner's class object, the singleton class's instance methods are class methods of the singleton's owner.

The full set of data structures for the following code is shown in Figure 1-9:

	class A
	end

Class A inherits from Object. The A class object is of type Class. Class inherits from Module, which inherits from Object. The methods stored in A's m_tbl are instance methods of A. So what happens when we call a class method on A?

	A.to_s # => "A"

The same method lookup rules apply, with A as the receiver. (Remember, A is a constant that evaluates to A's class object.) First, Ruby follows A's klass pointer to Class. Class's m_tbl is searched for a function named to_s. Finding none, Ruby follows Class's super pointer to Module, where the to_s function is found (in native code, rb_mod_to_s).

Figure 1-9. Full set of data structures for a single class

This should not be a surprise. There is no magic here. Class methods are found in the exact same way as instance methods—the only difference is whether the receiver is a class or an instance of a class.

Now that we know how class methods are looked up, it would seem that we could define class methods on any class by defining instance methods on the Class object (to insert them into Class's m_tbl). Indeed, this works:

	class A; end
	# from Module#to_s
	A.to_s # => "A"

	class Class
	  def to_s; "Class#to_s"; end
	end
	
	A.to_s # => "Class#to_s"

That is an interesting trick, but it is of very limited utility. Usually we want to define unique class methods on each class. This is where singleton classes of class objects are used. To open up a singleton class on a class, simply pass the class's name as the object to the singleton class notation:

	class A; end
	class B; end

	class <<A
	  def to_s; "Class A"; end
	end

	A.to_s # => "Class A"
	B.to_s # => "B"

The resulting data structures are shown in Figure 1-10. Class B is omitted for brevity.

Figure 1-10. Singleton class of a class

The to_s method has been added to A's singleton class, or Class:A. Now, when A.to_s is called, Ruby will follow A's klass pointer to Class:A and invoke the appropriate method there.

There is one more wrinkle in method definition. In a class or module definition, self always refers to the class or module object:

	class A
	  self # => A
	end

So, inside A's class definition, class<<A can also be written class<<self, since inside that definition A and self refer to the same object. This idiom is used everywhere in Rails to define class methods. This example shows all of the ways to define class methods:

	class A
	  def A.class_method_one; "Class method"; end

	  def self.class_method_two; "Also a class method"; end

	  class <<A
	    def class_method_three; "Still a class method";
	  end
	end

	  class <<self
	    def class_method_four; "Yet another class method"; end
	  end
	end

	  def A.class_method_five
	    "This works outside of the class definition"
	  end

	  class <<A
	    def A.class_method_six
	      "You can open the metaclass outside of the class definition"
	    end
	  end

	  # Print the result of calling each method in turn
	  %w(one two three four five six).each do |number|
	    puts A.send(:"class_method_#{number}")
	  end

	  # >> Class method
	  # >> Also a class method
	  # >> Still a class method
	  # >> Yet another class method
	  # >> This works outside of the class definition
	  # >> You can open the metaclass outside of the class definition

This also means that inside a singleton class definition—as in any other class definition—self refers to the class object being defined. When we remember that the value of a block or class definition is the value of the last statement executed, we can see that the value of class <<objA; self; end is objA's singleton class. The class <<objA construct opens up the singleton class, and self (the singleton class) is returned from the class definition.

Putting this together, we can open up the Object class and add an instance method to every object that returns that object's singleton class:

	class Object
   	  def metaclass
	    class <<self
	       self
	    end
	  end
	 end

This method forms the basis of Metaid, which is described shortly.

Method missing

After all of that confusion, method_missing is remarkably simple. There is one rule: if the whole method lookup procedure fails all the way up to Object, method lookup is tried again, looking for a method_missing method rather than the original method. If the method is found, it is called with the same arguments as the original method, with the method name prepended. Any block given is also passed through.

The default method_missing function in Object (rb_method_missing) raises an exception.

Metaid

why the lucky stiff has created a tiny library for Ruby metaprogramming called metaid.rb. This snippet is useful enough to include in any project in which meta-programming is needed:^[7]

	class Object
	  # The hidden singleton lurks behind everyone
	  def metaclass; class << self; self; end; end
	  def meta_eval &blk; metaclass.instance_eval &blk; end

	  # Adds methods to a metaclass
	  def meta_def name, &blk
	    meta_eval { define_method name, &blk }
	  end

	  # Defines an instance method within a class
	  def class_def name, &blk
	    class_eval { define_method name, &blk }
	  end	
	end

This library defines four methods on every object:

metaclass: Refers to the singleton class of the receiver (self).
meta_eval: The equivalent of class_eval for singleton classes. Evaluates the given block in the context of the receiver's singleton class.
meta_def: Defines a method within the receiver's singleton class. If the receiver is a class or module, this will create a class method (instance method of the receiver's singleton class).
class_def: Defines an instance method in the receiver (which must be a class or module).

Metaid's convenience lies in its brevity. By using a shorthand for referring to and augmenting metaclasses, your code will become clearer rather than being littered with constructs like class << self; self; end. The shorter and more readable these techniques are, the more likely you are to use them appropriately in your programs.

This example shows how we can use Metaid to examine and simplify our singleton class hacking:

	class Person
	  def name; "Bob"; end
	  def self.species; "Homo sapiens"; end
	end

Class methods are added as instance methods of the singleton class:

	Person.instance_methods(false)            # => ["name"]
	Person.metaclass.instance_methods -
	  Object.metaclass.instance_methods       # => ["species"]

Using the methods from Metaid, we could have written the method definitions as:

Person.class_def(:name) { "Bob" }
Person.meta_def(:species) { "Homo sapiens" }

Variable Lookup

There are four types of variables in Ruby: global variables, class variables, instance variables, and local variables.^[8] Global variables are stored globally, and local variables are stored lexically, so neither of them is relevant to our discussion now, as they do not interact with Ruby's class system.

Instance variables are specific to a certain object. They are prefixed with one @ symbol: @price is an instance variable. Because every Ruby object has an iv_tbl structure, any object can have instance variables.

Since a class is also an object, a class can have instance variables. The following code accesses an instance variable of a class:

	class A
	  @ivar = "Instance variable of A"
	end

	A.instance_variable_get(:@ivar) # => "Instance variable of A"

Instance variables are always resolved based on the object pointed to by self. Because self is A's class object in the class A … end definition, @ivar belongs to A's class object.

Class variables are different. Any instance of a class can access its class variables (which start with @@). Class variables can also be referenced from the class definition itself. While class variables and instance variables of a class are similar, they're not the same:

	class A
	  @var = "Instance variable of A"
	  @@var = "Class variable of A"

	  def A.ivar
	    @var
	  end

	  def A.cvar
	    @@var
	  end
	end

	A.ivar # => "Instance variable of A"
	A.cvar # => "Class variable of A"

In this code sample, @var and @@var are stored in the same place: in A's iv_tbl. However, they are different variables, because they have different names (the @ symbols are included in the variable's name as stored). Ruby's functions for accessing instance variables and class variables check to ensure that the names passed are in the proper format:

	A.instance_variable_get(:@@var)
	# ~> -:17:in 'instance_variable_get': '@@var' is not allowed as an instance
	    variable name (NameError)

Class variables can be somewhat confusing to use. They are shared all the way down the inheritance hierarchy, so subclasses that modify a class variable will modify the parent's class variable as well.

	>> class A; @@x = 3 end
	=> 3
	>> class B < A; @@x = 4 end
	=> 4
	>> class A; @@x end
	=> 4

This may be useful, but it may also be confusing. Generally, you either want class instance variables—which are independent of the inheritance hierarchy—or the class inheritable attributes provided by ActiveSupport, which propagate values in a controlled, well-defined manner.

Blocks, Methods, and Procs

One powerful feature of Ruby is the ability to work with pieces of code as objects. There are three classes that come into play, as follows:

Proc: A Proc represents a code block: a piece of code that can be called with arguments and has a return value.
UnboundMethod: This is similar to a Proc; it represents an instance method of a particular class. (Remember that class methods are instance methods of a class object, so UnboundMethods can represent class methods, too.) An UnboundMethod must be bound to a class before it can be invoked.
Method: Method objects are UnboundMethods that have been bound to an object with UnboundMethod#bind. Alternatively, they can be obtained with Object#method.

Let's examine some ways to get Proc and Method objects. We'll use the Fixnum#+ method as an example. We usually invoke it using the dyadic syntax:

	3 + 5 # => 8

However, it can be invoked as an instance method of a Fixnum object, like any other instance method:

	3.+(5) # => 8

We can use the Object#method method to get an object representing this instance method. The method will be bound to the object that method was called on, 3.

	add_3 = 3.method(:+)
	add_3 # => #<Method: Fixnum#+>

This method can be converted to a Proc, or called directly with arguments:

	add_3.to_proc # => #<Proc:0x00024b08@-:6>
	add_3.call(5) # => 8
	# Method#[] is a handy synonym for Method#call.
	add_3[5] # => 8

There are two ways to obtain an unbound method. We can call instance_method on the class object:

	add_unbound = Fixnum.instance_method(:+)
	add_unbound # => #<UnboundMethod: Fixnum#+>

We can also unbind a method that has already been bound to an object:

	add_unbound == 3.method(:+).unbind # => true
	add_unbound.bind(3).call(5) # => 8

We can bind the UnboundMethod to any other object of the same class:

	add_unbound.bind(15)[4] # => 19

However, the object we bind to must be an instance of the same class, or else we get a TypeError:

	add_unbound.bind(1.5)[4] # =>
	# ~> -:16:in 'bind': bind argument must be an instance of Fixnum (TypeError)
	# ~> from -:16

We get this error because + is defined in Fixnum; therefore, the UnboundMethod object we receive must be bound to an object that is a kind_of?(Fixnum). Had the + method been defined in Numeric (from which both Fixnum and Float inherit), the preceding code would have returned 5.5.

Blocks to Procs and Procs to blocks

One downside to the current implementation of Ruby: blocks are not always Procs, and vice versa. Ordinary blocks (created with do…end or {}) must be attached to a method call, and are not automatically objects. For example, you cannot say code_ block ={puts"abc"}. This is what the Kernel#lambda and Proc.new functions are for: converting blocks to Procs. ^[9]

	block_1 = lambda { puts "abc" } # => #<Proc:0x00024914@-:20>
	block_2 = Proc.new { puts "abc" } # => #<Proc:0x000246a8@-:21>

There is a slight difference between Kernel#lambda and Proc.new. Returning from a Proc created with Kernel#lambda returns the given value to the calling function; returning from a Proc created with Proc.new attempts to return from the calling function, raising a LocalJumpError if that is impossible. Here is an example:

	def block_test
	  lambda_proc = lambda { return 3 }
	  proc_new_proc = Proc.new { return 4 }

	  lambda_proc.call # => 3
	  proc_new_proc.call # =>

	  puts "Never reached"
	end
	
	block_test # => 4

The return statement in lambda_proc returns the value 3 from the lambda. Conversely, the return statement in proc_new_proc returns from the calling function, block_test— thus, the value 4 is returned from block_test. The puts statement is never executed, because the proc_new_proc.call statement returns from block_test first.

Blocks can also be converted to Procs by passing them to a function, using & in the function's formal parameters:

	def some_function(&b)
	  puts "Block is a #{b} and returns #{b.call}"
	end

	some_function { 6 + 3 }
	# >> Block is a #<Proc:0x00025774@-:7> and returns 9

Conversely, you can also substitute a Proc with & when a function expects a block:

	add_3 = lambda {|x| x+3}
	(1..5).map(&add_3) # => [4, 5, 6, 7, 8]

Closures

Closures are created when a block or Proc accesses variables defined outside of its scope. Even though the containing block may go out of scope, the variables are kept around until the block or Proc referencing them goes out of scope. A simplistic example, though not practically useful, demonstrates the idea:

	def get_closure
	  data = [1, 2, 3]
	  lambda { data }
	end
	block = get_closure
	block.call # => [1, 2, 3]

The anonymous function (the lambda) returned from get_closure references the local variable data, which is defined outside of its scope. As long as the block variable is in scope, it will hold its own reference to data, and that instance of data will not be destroyed (even though the get_closure function returns). Note that each time get_closure is called, data references a different variable (since it is function-local):

	block = get_closure
	block2 = get_closure

	block.call.object_id # => 76200
	block2.call.object_id # => 76170

A classic example of closures is the make_counter function, which returns a counter function (a Proc) that, when executed, increments and returns its counter. In Ruby, make_counter can be implemented like this:

	def make_counter(i=0)
	  lambda { i += 1 }
	end

	x = make_counter
	x.call # => 1
	x.call # => 2

	y = make_counter
	y.call # => 1
	y.call # => 2

The lambda function creates a closure that closes over the current value of the local variable i. Not only can the variable be accessed, but its value can be modified. Each closure gets a separate instance of the variable (because it is a variable local to a particular instantiation of make_counter). Since x and y contain references to different instances of the local variable i, they have different state.

^[3] If that weren't confusing enough, the Class object has class Class as well.

^[4]Except immediate objects (Fixnums, symbols, true, false, and nil); we'll get to those later.

^[5]Ruby often co-opts Smalltalk's message-passing terminology: when a method is called, it is said that one is sending a message. The receiver is the object that the message is sent to.

^[6]ICLASS is Mauricio Fernández's term for these proxy classes. They have no official name but are of type T_ICLASS in the Ruby source.

^[7]"Seeing Metaclasses Clearly." http://whytheluckystiff.net/articles/seeingMetaclassesClearly.html

^[8]There are also constants, but they shouldn't vary. (They can, but Ruby will complain.)

^[9]Kernel#proc is another name for Kernel#lambda, but its usage is deprecated.