Ruby Under a Microscope

Blocks: Closures in Ruby

Sussman and Steele gave a useful definition of the term “closure”
in this 1975 academic paper, one of the so-called “Lambda Papers.”

Internally Ruby represents each block using a C structure called rb_block_t:

Exactly what are blocks? One way to answer this question would be to take a look at the values Ruby stores inside this structure. Just as we did in Chapter 3 with the RClass structure, let’s deduce what the contents of the rb_block_t structure must be based on what we know blocks can do in our Ruby code.

Starting with the most obvious attribute of blocks, we know each block must consist of a piece of Ruby code, or internally a set of compiled YARV byte code instructions. For example, if I call a method and pass a block as a parameter:

10.times do
  str = "The quick brown fox jumps over the lazy dog."
  puts str
end

…it’s clear that when executing the 10.times call, Ruby needs to know what code to iterate over. Therefore, the rb_block_t structure must contain a pointer to that code:

In this diagram, you can see a value called iseq which is a pointer to the YARV instructions for the Ruby code in my block.

Another obvious but often overlooked behavior of blocks is that they can access variables in the surrounding or parent Ruby scope. For example:

str = "The quick brown fox"
10.times do
  str2 = "jumps over the lazy dog."
  puts "#{str} #{str2}"
end

Here the puts function call refers equally well to the str variable located inside the block and the str2 variable from the surrounding code. We often take this for granted – obviously blocks can access values from the code surrounding them. This ability is one of the things that makes blocks useful.

If you think about this for a moment, you’ll realize blocks have in some sense a dual personality. On the one hand, they behave like separate functions: you can call them and pass them arguments just as you would with any function. On the other hand, they are part of the surrounding function or method. As I wrote the sample code above I didn’t think of the block as a separate function – I thought of the block’s code as just part of the simple, top level script that printed a string 10 times.

Stepping through how Ruby calls a block

How does this work internally? Does Ruby internally implement blocks as separate functions? Or as part of the surrounding function? Let’s step through the example above, slowly, and see what happens inside of Ruby when you call a block.

In this example when Ruby executes the first line of code, as I explained in Chapter 2, YARV will store the local variable str on its internal stack, and save its location in the DFP pointer located in the current rb_control_frame_t structure. If the outer code was located inside a function or method then the DFP would point to the stack frame as shown, but if the outer code was located in the top level scope of your Ruby program, then Ruby would use dynamic access to save the variable in the TOPLEVEL_BINDING environment instead – more on this in section 5.3. Regardless the DFP will always indicate the location of the str variable.

Next Ruby will come to the “10.times do” call. Before executing the actual iteration – before calling the times method – Ruby will create and initialize a new rb_block_t structure to represent the block. Ruby needs to create the block structure now, since the block is really just another argument to the times method:

To do this, Ruby copies the current value of the DFP, the dynamic frame pointer, into the new block. In other words, Ruby saves away the location of the current stack frame in the new block.

Next Ruby will proceed to call the times method on the object 10, an instance of the Fixnum class. While doing this, YARV will create a new frame on its internal stack. Now we have two stack frames: on the top is the new stack frame for the Fixnum.times method, and below is the original stack frame used by the top level function:

Ruby implements the times method internally using its own C code. It’s a built in method, but Ruby implements it the same way you probably would in Ruby. Ruby starts to iterate over the numbers 0, 1, 2, etc., up to 9, and calls yield, calling the block once for each of these integers. Finally, the code that implements yield internally actually calls the block each time through the loop, pushing a third Ruby actually pushes an extra, internal stack frame whenever you call yield before actually calling the block, so strictly speaking there should be four stack frames in this diagram. I only show three for the sake of clarity. frame onto the top of the stack for the code inside the block to use:

Above, on the left, we now have three stack frames:

On the top is the new stack frame for the block, containing the str2 variable.
In the middle is the stack frame used by the internal C code that implements the Fixnum.times method.
And at the bottom is the original function’s stack frame, containing the str variable from the outer scope.

While creating the new stack frame at the top, Ruby’s internal yield code copies the DFP from the block into the new stack frame. Now the code inside the block can access both its own local variables, via the rb_control_frame structure as usual, and indirectly the variables from the parent scope, via the DFP pointer using dynamic variable access, as I explained in Chapter 2. Specifically, this allows the puts statement to access the str2 variable from the parent scope.

Borrowing an idea from 1975

To summarize, we have seen now that Ruby’s rb_block_t structure contains two important values:

a pointer to a snippet of YARV code instructions, and
a pointer to a location on YARV’s internal stack, the location that was at the top of the stack when the block was created:

At first glance, this seems like a very technical, unimportant detail. This is obviously a behavior we expect Ruby blocks to exhibit, and the DFP seems to be just another minor, uninteresting part of Ruby’s internal implementation of blocks.

Or is it? I believe the DFP is actually a profound, important part of Ruby internals. The DFP is the basis for Ruby’s implementation of “closures,” a computer science concept invented long before Ruby was created in the 1990s. In fact, the Scheme programming language, a dialect of Lisp invented by Gerald Sussman and Guy Steele in 1975, was one of the first languages to formally implement closures – almost twenty years earlier! Here’s how Sussman and Steele defined the term “closure” in their 1975 paper Scheme: An Interpreter for Extended Lambda Calculus:

In order to solve this problem we introduce the notion of a closure [11, 14] which is a data structure containing a lambda expression, and an environment to be used when that lambda expression is applied to arguments.

Reading this again, a closure is defined to be the combination of:

A “lambda expression,” i.e. a function that takes a set of arguments, and
An environment to be used when calling that lambda or function.

I’ll have more context and information about “lambda expressions” and how Ruby’s borrowed the “lambda” keyword from Lisp in section 5-2, but for now take another look at the internal rb_block_t structure:

Notice that this structure meets the definition of a closure Sussman and Steele wrote back in 1975:

iseq is a pointer to a lambda expression – i.e. a function or code snippet, and
DFP is a pointer to the environment to be used when calling that lambda or function – i.e. a pointer to the surrounding stack frame.

Following this train of thought, we can see that blocks are Ruby’s implementation of closures. Ironically blocks, one of the features that in my opinion makes Ruby so elegant and natural to read – so modern and innovative – is based on research and work done at least 20 years before Ruby was ever invented!

In Ruby 1.9 and later you can find the actual definition of the rb_block_t structure in the vm_core.h file. Here it is:

typedef struct rb_block_struct {
    VALUE self;
    VALUE *lfp;
    VALUE *dfp;
    rb_iseq_t *iseq;
    VALUE proc;
} rb_block_t;

You can see the iseq and dfp values I described above, along with a few other values:

self: As we’ll see in the next section when I cover the lambdas, procs and bindings, the value the self pointer had when the block was first referred to is also an important part of the closure’s environment. Ruby executes block code inside the same object context the code outside the block had.
lfp: It turns out blocks also contain a local frame pointer, along with the dynamic frame pointer. However, Ruby doesn’t use local variable access inside of blocks; it doesn’t use the set/getlocal YARV instructions inside of blocks. Instead, Ruby uses this LFP for internal, technical reasons and not to access local variables.
proc: Finally, Ruby uses this value when it creates a proc object from a block. As we’ll see in the next section, procs and blocks are closely related.

Right above the definition of rb_block_t in vm_core.h you’ll see the rb_control_frame_t structure defined:

typedef struct {
    VALUE *pc;            /* cfp[0] */
    VALUE *sp;            /* cfp[1] */
    VALUE *bp;            /* cfp[2] */
    rb_iseq_t *iseq;      /* cfp[3] */
    VALUE flag;           /* cfp[4] */
    VALUE self;           /* cfp[5] / block[0] */
    VALUE *lfp;           /* cfp[6] / block[1] */
    VALUE *dfp;           /* cfp[7] / block[2] */
    rb_iseq_t *block_iseq;/* cfp[8] / block[3] */
    VALUE proc;           /* cfp[9] / block[4] */
    const rb_method_entry_t *me;/* cfp[10] */
} rb_control_frame_t;

Notice that this C structure also contains all of the same values the rb_block_t structure did: everything from self down to proc. The fact that these two structures share the same values is actually one of the interesting, but confusing, optimizations Ruby uses internally to speed things up a bit. Whenever you refer to a block for the first time by passing it into a method call, as I explained above, Ruby creates a new rb_block_t structure and copies values such as the LFP from the current rb_control_frame_t structure into it. However, by making the members of these two structures similar – rb_block_t is a subset of rb_control_frame_t; they contain the same values in the same order – Ruby is able to avoid creating a new rb_block_t structure and instead sets the pointer to the new block to refer to the common portion of the rb_control_frame structure. In other words, instead of allocating new memory to hold the new rb_block_t structure, Ruby simply passes around a pointer to the middle of the rb_control_frame_t structure. This is very confusing, but does avoid unnecessary calls to malloc, and speeds up the process of creating blocks.

Experiment 5-1: Which is faster: a while loop or passing a block to each?

I said earlier that in my opinion Ruby code containing blocks is often more elegant and succinct than the equivalent code would be using an older language such as C. For example, in C I would write a simple while loop to add up the numbers 1 through 10 like this:

#include <stdio.h>
main()
{
  int i, sum;
  i = 1;
  sum = 0;
  while (i <= 10) {
    sum = sum + i;
    i++;
  }
  printf("Sum: %d\n", sum);
}

…and I could use a while loop in Ruby in the same manner:

sum = 0
i = 1
while i <= 10
  sum += i
  i += 1
end
puts "Sum: #{sum}"

However, most Rubyists would write this loop using an iterator with a block, like this:

sum = 0
(1..10).each do |i|
  sum += i
end
puts "Sum: #{sum}"

Aesthetics aside, is there any performance penalty for using a block here? Does Ruby slow down significantly in order to create the new rb_block_t structure, copy the DFP value, and create new stack frames – everything I discussed above?

I won’t benchmark the C code – clearly that will be faster than either option using Ruby. Instead, let’s measure how long it takes Ruby to add up the integers 1 through 10 to obtain 55, using a simple while loop:

require 'benchmark'
ITERATIONS = 1000000
Benchmark.bm do |bench|
  bench.report("iterating from 1 to 10, one million times") do
    ITERATIONS.times do
      sum = 0
      i = 1
      while i <= 10
        sum += i
        i += 1
      end
    end
  end
end

Here I am using the benchmark library to measure the time required to run the while loop one million times. Admittedly I’m using a block to control the million iterations (ITERATIONS.times do) but I’ll use the same block in the next test as well.

On my laptop with Ruby 1.9.2, I can run through this code in just over a half second:

$ ruby while.rb
      user     system      total        real
      iterating from 1 to 10, one million times  0.520000   0.000000
                                                 0.520000 (  0.514112)

Now let’s measure the time required using the each iterator with a block:

require 'benchmark'
ITERATIONS = 1000000
Benchmark.bm do |bench|
  bench.report("iterating from 1 to 10, one million times") do
    ITERATIONS.times do
      sum = 0
      (1..10).each do |i|
        sum += i
      end
    end
  end
end

This time it takes somewhat longer to run through the loop a million times, about 0.8 seconds:

$ ruby each.rb
      user     system      total        real
      iterating from 1 to 10, one million times  0.800000   0.000000
                                                 0.800000 (  0.808752)

Ruby requires about 57% more time to call the block 10 times, compared to iterating through the simple while loop 10 times.

Time for 1,000,000 iterations (sec)

At first glance, 57% more time seems like a large performance penalty… just for making your Ruby code somewhat more readable and pleasant to look at. Depending on your work and the context of this while loop this may or may not be an important difference. If this loop were part of a time-sensitive, critical operation that your end users were waiting for – and if there weren’t other expensive operations inside the loop – then writing the iteration using an old-fashioned C style while loop might be worthwhile.

However, the performance of most Ruby applications, and certainly Ruby on Rails web sites, is usually limited by database queries, network connections and other factors – and not by Ruby execution speed. It’s rare that Ruby’s execution speed has an immediate, direct impact on your application’s overall performance. Of course, if you are using a large framework such as Ruby on Rails then your own Ruby code is a very small piece of a very large system. I imagine that Rails uses blocks and iterators many, many times while processing a simple HTTP request, apart from the Ruby code you write yourself.