In the previous section, we saw how Ruby maintained two stacks: an internal stack used by YARV as well as your Ruby program’s call stack. But something obvious was missing from both of these code examples: variables. Neither of my scripts used any Ruby variables – a more realistic example program would have used variables many times. How does Ruby handle variables internally? Where are they stored?
Storing variables is straightforward: Ruby stores all of the values you save in variables on YARV’s stack, along with the parameters to and return values from the YARV instructions. However, accessing these variables is not so simple. Internally Ruby uses two very different methods for saving and retrieving a value you save in a variable: local access and dynamic access.
Let’s start with local access first, since that’s simpler. Whenever you make a method call, Ruby sets aside some space on the YARV stack for any local variables that are declared inside the method you are calling. Ruby knows how many variables you are using by consulting the “local table” that was created for each method during the compilation step I covered in Chapter 1.
For example, suppose I write a very silly Ruby function to display a string:
On the left is my Ruby code, and on the right is a diagram showing the YARV stack and stack pointer. You can see that Ruby stores the variables on the stack just under the stack pointer. Notice there’s a space reserved for the str value on the stack, three slots under where the SP is, in other words at SP-3.
Ruby uses the svar/cref slot for one of two purposes. It might contain a pointer to a table of the “special variables” that exist in the current method. These are values such as $! (last exception raised) or $& (last regular expression match). Or it might contain a pointer to the current lexical scope. Lexical scope indicates which class or module you are currently adding methods to. In Experiment 2-2 I’ll explore what special variables are and how they work.
Ruby uses the first slot – the “special” variable – to keep track of information related to blocks. I’ll have more about this in a moment when I discuss dynamic variable access.
When my example code saves a value into str, Ruby just needs to write the value into that space on the stack:
Internally YARV uses another pointer similar to the stack pointer called the LFP or “Local Frame Pointer.” This points to where the local variables for the current method are located on the stack. Initially it is set to SP-1. Later the value of SP will change as YARV executes instructions, while the LFP value will normally remain constant.
Here are the YARV instructions that Ruby compiled my display_string function into:
First the putstring instruction saves the “Local access” string on the top of the stack, incrementing the SP pointer. Then you can see YARV uses the setlocal instruction to get the value at the top of the stack and save it in the space allocated on the stack for the str local variable. Internally, setlocal uses the LFP pointer and a numerical index indicating which variable to set – in this example that would be: “address of str = LFP-2.”
Next for the call to “puts str” Ruby uses the getlocal instruction:
Here Ruby has pushed the string value back onto the top of the stack, where it can be used as an argument for the call to the puts function.
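If you want to see these instructions on your own machine, you can ask MRI to disassemble a snippet. The sketch below assumes display_string looks like the method described above (the exact source is reconstructed from the text, so treat it as an assumption) and uses RubyVM::InstructionSequence, which is specific to MRI. Instruction names and operands differ between Ruby versions, so your listing may not match the one in this chapter exactly.

```ruby
# Reconstructed from the description in the text; the exact source is an
# assumption.
source = <<~RUBY
  def display_string
    str = 'Local access.'
    puts str
  end
RUBY

# RubyVM::InstructionSequence is MRI-specific. The listing includes the
# method body, its local table, and the instructions that write and read str.
listing = RubyVM::InstructionSequence.compile(source).disasm
puts listing
```

Look for the local table line and for the setlocal and getlocal style instructions that mention str.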
This works the same way if I instead pass the string in as a method parameter: method arguments are essentially the same as local variables.
The only difference between method arguments and local variables is that the calling code pushes the arguments onto the stack before the method call even occurs. In this example, there are no local variables, but the single argument appears on the stack just like a local variable:
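A quick way to see that Ruby treats arguments and locals alike, at least from the Ruby side, is Kernel#local_variables, which reports both. This is only a small sketch: the extra local named copy is something I added for illustration, and it says nothing about the exact stack layout.

```ruby
def display_string(str)
  copy = str        # an ordinary local variable, added for illustration
  local_variables   # Kernel#local_variables lists the argument and the local
end

names = display_string('Local access.')
p names
```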
Now let’s take a look at how dynamic variable access works, and what that “special” value is. Ruby uses dynamic access when you use a variable that’s defined in a different scope, for example when you write a block that references values in the parent scope. Here’s an example:
def display_string
  str = 'Dynamic access.'
  10.times do
    puts str
  end
end
Here str is again a local variable in display_string, and Ruby will save it using the setlocal instruction we saw above.
However, now I’m calling “puts str” from inside a block. To access the str local variable from the block, Ruby will have to use dynamic access to reach the stack frame for the parent scope. Before explaining exactly how dynamic access works, let’s first step through the process of calling the block to see how Ruby sets up the stack.
First Ruby will call the 10.times method, passing a block in as an argument:
First, notice the value 10 on the stack – this is the actual receiver of the method times. You can also see just above that Ruby has created a new stack frame on the right for the C code that implements Integer#times to use. Since I passed a block into the method call, Ruby saves a pointer to this block in the “special” variable on the stack. Each frame on the YARV stack corresponding to a method call keeps track of whether or not there was a block argument using this “special” variable. I’ll cover blocks and the rb_block_t structure in much more detail in Chapter 5.
Now Ruby will call the block’s code over and over again, 10 times:
You can see here that, as I explained in section 2.1, Ruby actually creates two new stack frames when you call a block: a FINISH frame and a BLOCK frame. The first FINISH frame is more or less a copy of the previous stack frame, holding the block as a parameter in the “special” variable. But when Ruby starts to execute the block itself, it changes the “special” variable to become something else: a pointer to the parent scope’s stack frame. This is known as the DFP or Dynamic Frame Pointer.
Ruby uses the DFP to enable dynamic variable access. Here are the YARV code instructions Ruby compiled my block into:
The dashed arrows indicate Ruby’s dynamic variable access: the getdynamic YARV instruction copies the value of str from the lower stack frame, from the parent or outer Ruby scope, up to the top of the stack, where the YARV instructions in the block can access it. Note how the DFP pointers, in a sense, form a ladder that Ruby can climb to access the local variables in the parent scope, or the grandparent scope, etc.
In the “getdynamic str, 1” call above, the second parameter 1 indicates which stack frame or Ruby scope to look for the variable str in. Ruby implements this by iterating through the DFP pointers that number of times. In this case Ruby moves up one scope before looking for str. If I had two nested blocks like this:
def display_string
  str = 'Dynamic access.'
  10.times do
    10.times do
      puts str
    end
  end
end
… then Ruby would have used “getdynamic str, 2” instead.
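You can check the level operand yourself by disassembling the nested version. This sketch again relies on the MRI-specific RubyVM::InstructionSequence. As far as I know, getdynamic exists only in Ruby 1.9; later versions express the same idea with a getlocal instruction that carries a level, so the filter below accepts either name.

```ruby
source = <<~RUBY
  def display_string
    str = 'Dynamic access.'
    10.times do
      10.times do
        puts str
      end
    end
  end
RUBY

listing = RubyVM::InstructionSequence.compile(source).disasm

# Keep only the instructions that read or write a variable. The exact
# instruction names depend on the Ruby version.
access = listing.lines.grep(/\b(get|set)(local|dynamic)/)
puts access
```

The line that reads str from inside the innermost block should show the variable name together with a level of 2.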
Let’s take a look at the actual C implementation of getdynamic. Like most of the other YARV instructions, Ruby implements getdynamic in the insns.def code file:
/**
  @c variable
  @e Get value of block local variable (pointed to by idx
     'level' indicates the nesting depth from the current
  @j level, idx で指定されたブロックローカル変数の値をスタックに
     level はブロックのネストレベルで、何段上か
 */
DEFINE_INSN
getdynamic
(dindex_t idx, rb_num_t level)
()
(VALUE val)
{
    rb_num_t i;
    VALUE *dfp2 = GET_DFP();
    for (i = 0; i < level; i++) {
        dfp2 = GET_PREV_DFP(dfp2);
    }
    val = *(dfp2 - idx);
}
Here the GET_DFP macro returns the DFP from the current scope. This macro is defined in the vm_insnhelper.h file along with a number of other YARV instruction related macros. Then Ruby iterates over the DFP pointers, moving from the current scope to the parent scope, and then from the parent scope to the grandparent scope, by repeatedly dereferencing the DFP pointers. Ruby uses the GET_PREV_DFP macro, also defined in vm_insnhelper.h, to move from one DFP to another. The level parameter indicates how many times to iterate, or how many rungs of the ladder to climb.
Finally, Ruby obtains the target variable using the idx parameter; this is the index of the target variable. Therefore, this line of code:
val = *(dfp2 - idx);
…gets the value from the target variable. It means:
Start from the address of the DFP for the target scope, dfp2, obtained previously from the GET_PREV_DFP iterations.
Subtract idx from this address. idx tells getdynamic the index of the local variable you want to load, or in other words how far down the stack the target variable is located.
Get the value from the YARV stack at this adjusted address.
So in my example above:
getdynamic str, 2
YARV will take the DFP from the scope two levels up on the YARV stack, and subtract the index value str (this might be 2 or 3 for example) from it to obtain a pointer to the str variable.
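To make the ladder concrete, here is a toy model of the same lookup written in Ruby. It is only a sketch under simplifying assumptions: the stack is a plain array, the slot at each frame's DFP holds the array index of the parent scope's DFP, and I picked an index of 2 for str. The names get_dynamic and stack are hypothetical, and MRI's real frames hold more than this.

```ruby
# A toy model of the YARV stack as a flat Ruby array. The layout is a
# simplification for illustration, not MRI's real memory layout.
def get_dynamic(stack, dfp, idx, level)
  level.times { dfp = stack[dfp] }  # like GET_PREV_DFP: climb one rung
  stack[dfp - idx]                  # like val = *(dfp2 - idx)
end

stack = [
  'Dynamic access.', # 0: str, at the method's DFP - 2
  :svar_cref,        # 1: the method's svar/cref slot
  nil,               # 2: the method's "special" slot (DFP = 2, no parent)
  :cref,             # 3: outer block's svar/cref slot
  2,                 # 4: outer block's DFP slot, points at the method frame
  :cref,             # 5: inner block's svar/cref slot
  4                  # 6: inner block's DFP slot, points at the outer block
]

# Like "getdynamic str, 2" executed in the inner block, with idx = 2.
value = get_dynamic(stack, 6, 2, 2)
puts value  # prints "Dynamic access."
```

Starting from the inner block's DFP at index 6, the loop follows two links (6 to 4, then 4 to 2) and then reads the slot two positions below the method's DFP.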
In the diagrams above I showed a value called svar/cref in the LFP-1 position on the stack. What are these two values? And how can Ruby save two values in one location on the stack? Why does it do this? Let’s take a look….
Most often the LFP-1 slot in the stack will contain the svar value - this is a pointer to a table of any special variables that might exist in this stack frame. In Ruby the term “special variables” refers to values that Ruby automatically creates for you as a convenience based on the environment or on recent operations. For example, Ruby sets $* to the ARGV array and $! to the last exception raised.
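Here is a small sketch of those two special variables in action. The variable names same_args and message are my own, introduced only for illustration.

```ruby
# $* refers to the command-line arguments, the same data as ARGV.
same_args = ($* == ARGV)

# $! holds the exception currently being handled; here I read its message.
message =
  begin
    raise ArgumentError, 'boom'
  rescue ArgumentError
    $!.message
  end

puts same_args  # prints "true"
puts message    # prints "boom"
```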
Notice that all of the special variables begin with the dollar sign character, which usually indicates a global variable. This raises the question: are special variables global variables? If so, then why does Ruby save a pointer to them on the stack? To find out, let’s create a simple Ruby script to match a string using a regular expression:
/fox/.match("The quick brown fox jumped over the lazy dog.\n")
puts "Value of $& in the top level scope: #{$&}"
Here I’m matching the word fox in the string using a regex. Then I print out the matching string using the $& special variable. Running this I get:
$ ruby regex.rb
Value of $& in the top level scope: fox
Now I’ll search the same string twice: first in the top level scope and then again from inside a method call:
str = "The quick brown fox jumped over the lazy dog.\n"
/fox/.match(str)

def search(str)
  /dog/.match(str)
  puts "Value of $& inside method: #{$&}"
end
search(str)

puts "Value of $& in the top level scope: #{$&}"
This is simple Ruby code, but it’s still a bit confusing. Here’s how this works:
First I search the string in the top scope for fox. This matches the word and saves fox into the $& special variable.
Then I call the search method and search for the word dog. I immediately print out the match using the same $& variable inside the method.
Finally I return to the top level scope and print out the value of $& again.
Running this test, I get:
$ ruby regex_method.rb
Value of $& inside method: dog
Value of $& in the top level scope: fox
This is what we expect, but think carefully about this for a moment. The $& variable is obviously not global since it has different values at different places in my Ruby script. Ruby preserves the value of $& from the top level scope during the execution of the search method, allowing me to print out the matching word “fox” from the original search.
Ruby provides for this behavior by saving a separate set of special variables at each level of the stack using the svar value:
Here you can see Ruby saved the “fox” string in a table referred to by the svar pointer for the top level scope, and saved the “dog” string in a different table for the inner method scope. Ruby finds the proper special variable table using the LFP pointer for each stack frame. Depending on exactly which special variable you use, the table in this diagram might be a hash table or just a simple C structure. I’ll discuss hash tables in Chapter 4.
Ruby saves actual global variables - these are variables you define using a dollar sign prefix - in a single, global hash table. Regardless of where you save or retrieve the value of a normal global variable, Ruby accesses the same global hash table.
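The contrast is easy to see side by side. In this sketch, $counter is an ordinary global variable that I made up for the example, while $& is a special variable; the method name search_and_count is also hypothetical.

```ruby
$counter = 0                # an ordinary global variable

def search_and_count(str)
  $counter += 1             # updates the one shared global
  /dog/.match(str)          # sets $& for this method's frame only
  $&
end

/fox/.match('The quick brown fox')
inside = search_and_count('the lazy dog')

results = [inside, $&, $counter]
p results  # => ["dog", "fox", 1]
```

The change to $counter made inside the method is visible at the top level, while the top level's $& still holds the match from its own search.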
Now let’s try one more test - what happens if I perform the search inside a block and not a method?
str = "The quick brown fox jumped over the lazy dog.\n"
/fox/.match(str)

2.times do
  /dog/.match(str)
  puts "Value of $& inside block: #{$&}"
end

puts "Value of $& in the top level scope: #{$&}"
Running this last test, I get:
$ ruby regex_block.rb
Value of $& inside block: dog
Value of $& inside block: dog
Value of $& in the top level scope: dog
Notice that now Ruby has overwritten the value of $& in the top scope with the matching word “dog” from the search I performed inside the block! This is by design: Ruby considers the top level scope and the inner block scope to be the same with regard to special variables. This is similar to how dynamic variable access works: we expect variables inside the block to have the same values as those in the parent scope.
Here is how Ruby implements this behavior:
Now Ruby has just a single special variable table, for the top level scope. Ruby finds the special variables using the LFP pointer, which points only to the top level scope. Inside the block scope, since there is no need for a separate copy of the special variables, Ruby takes advantage of the DFP-1 open slot and saves a value called the cref there instead.
What does the cref value mean? Unfortunately, I don’t have space in this book to explain this carefully, but in a nutshell cref indicates whether the given block should be executed in a different lexical scope compared to the parent frame. Lexical scope refers to the class or module that you are currently defining methods for. Ruby uses the cref value to implement metaprogramming API calls such as eval and instance_eval - the cref value is a pointer to the location on the lexical scope stack this block should be evaluated in. I’ll touch on these advanced concepts in Chapter 5, but you’ll have to wait for Ruby Under a Microscope - Part 2 to read a complete explanation of lexical scope and how Ruby implements it.
The best way to get an accurate list of all the special variables Ruby supports is to look right at the MRI C source; here’s a snippet of the C code that tokenizes your Ruby program. I’ve taken this from the parser_yylex function located in parse.y:
case '$':
  lex_state = EXPR_END;
  newtok();
  c = nextc();
  switch (c) {
    case '_':   /* $_: last read line string */
      c = nextc();
      if (parser_is_identchar()) {
          tokadd('$');
          tokadd('_');
          break;
      }
      pushback(c);
      c = '_';
      /* fall through */
    case '~':   /* $~: match-data */
    case '*':   /* $*: argv */
    case '$':   /* $$: pid */
    case '?':   /* $?: last status */
    case '!':   /* $!: error string */
    case '@':   /* $@: error position */
    case '/':   /* $/: input record separator */
    case '\\':  /* $\: output record separator */
    case ';':   /* $;: field separator */
    case ',':   /* $,: output field separator */
    case '.':   /* $.: last read line number */
    case '=':   /* $=: ignorecase */
    case ':':   /* $:: load path */
    case '<':   /* $<: reading filename */
    case '>':   /* $>: default output handle */
    case '\"':  /* $": already loaded files */
      tokadd('$');
      tokadd(c);
      tokfix();
      set_yylval_name(rb_intern(tok()));
      return tGVAR;
At the top of this code snippet you can see Ruby matches a dollar sign “$” character - this is part of the large C switch statement that tokenizes your Ruby code, the process I discussed back at the beginning of Chapter 1. This is followed by an inner switch statement that matches on the following character; each of these characters corresponds to a special variable.
Just a bit farther down in the function is more C code that parses other special variable tokens you write in your Ruby code - these are the “regex last match” and related special variables:
    case '&':   /* $&: last match */
    case '`':   /* $`: string before last match */
    case '\'':  /* $': string after last match */
    case '+':   /* $+: string matches last paren. */
      if (last_state == EXPR_FNAME) {
          tokadd('$');
          tokadd(c);
          goto gvar;
      }
      set_yylval_node(NEW_BACK_REF(c));
      return tBACK_REF;
Finally, this last snippet parses $1, $2, etc., producing the special variables that return the “nth back reference” from the last regular expression operation:
    case '1': case '2': case '3':
    case '4': case '5': case '6':
    case '7': case '8': case '9':
      tokadd('$');
      do {
          tokadd(c);
          c = nextc();
      } while (c != -1 && ISDIGIT(c));
      pushback(c);
      if (last_state == EXPR_FNAME) goto gvar;
      tokfix();
      set_yylval_node(NEW_NTH_REF(atoi(tok()+1)));
      return tNTH_REF;
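You can watch the tokenizer make this distinction from Ruby itself using Ripper, the lexer and parser library that ships with MRI. This is a sketch rather than a view of the C code above: Ripper reports its own event names instead of the C token names, and as far as I can tell it reports tGVAR as on_gvar and the back reference tokens as on_backref.

```ruby
require 'ripper'

# Lex four special variables, one per line, and keep only the tokens
# whose text starts with a dollar sign, as [text, event] pairs.
tokens = Ripper.lex("$_\n$*\n$&\n$1\n")
               .select { |tok| tok[2].start_with?('$') }
               .map { |tok| [tok[2], tok[1]] }

tokens.each { |text, event| puts "#{text.ljust(3)} #{event}" }
```

The first two variables come back as ordinary global variable tokens, while $& and $1 are reported differently, matching the separate case branches in parser_yylex.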