Chapter 1

Tokenization, Parsing and Compilation

[Image: Your code has a long road to take before Ruby ever runs it.]

How many times do you think Ruby reads and transforms your code before running it? Once? Twice? Whenever you run a Ruby script – whether it’s a large Rails application, a simple Sinatra website, or a background worker job – Ruby rips your code apart into small pieces and then puts them back together in a different format… three times! Between the time you type “ruby” and the moment you start to see actual output on the console, your Ruby code has a long road to take, a journey involving a variety of technologies, techniques and open source tools.

At a high level, here’s what this journey looks like:

[Diagram: your Ruby code → tokenize → parse → compile → instructions for Ruby’s virtual machine]

First, Ruby tokenizes your code. During this first step, Ruby reads the text characters in your code file and converts them into tokens. Think of tokens as the words that are used in the Ruby language. In the next step, Ruby parses these tokens; “parsing” means grouping the tokens into meaningful Ruby statements. This is analogous to grouping words into sentences. Finally, Ruby compiles these statements, or sentences, into low-level instructions that Ruby can execute later using a virtual machine.
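To make the three steps a little more concrete, here is a minimal sketch that peeks at each stage from inside Ruby itself. It assumes standard MRI Ruby (1.9 or later), where the Ripper library and the RubyVM::InstructionSequence class are available; the one-line sample script is only an illustrative choice, and the exact output format differs between Ruby versions.

```ruby
require 'ripper'

# A tiny sample script, chosen only for illustration.
code = "puts 2 + 2"

# Step 1: tokenize. Ripper.lex returns one array per token; the second
# and third elements are the token type and the token text.
tokens = Ripper.lex(code).map { |token| [token[1], token[2]] }

# Step 2: parse. Ripper.sexp returns a nested array (an S-expression)
# describing how the tokens were grouped into statements.
tree = Ripper.sexp(code)

# Step 3: compile. RubyVM::InstructionSequence is specific to MRI;
# disasm returns a human-readable listing of the compiled instructions.
instructions = RubyVM::InstructionSequence.compile(code).disasm

puts "Tokens:"
tokens.each { |type, text| puts "  #{type.inspect} #{text.inspect}" }

puts "Parse tree:"
p tree

puts "Compiled instructions:"
puts instructions
```

Note that this only inspects the script; none of the three calls actually runs it, so the sample code never prints 4.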

I’ll get to Ruby’s virtual machine, called “Yet Another Ruby Virtual Machine” (YARV), in Chapter 2, but first in this chapter I’ll describe the tokenizing, parsing and compiling processes that Ruby uses to understand the code you give it. Join me as I follow a Ruby script on its journey!

Chapter 1 Roadmap

  1. Tokens: the words that make up the Ruby language
  2. Experiment 1-1: Using Ripper to tokenize different Ruby scripts
  3. Parsing: how Ruby understands the code you write
    1. Understanding the LALR parse algorithm
    2. Some actual Ruby grammar rules
  4. Experiment 1-2: Using Ripper to parse different Ruby scripts
  5. Compilation: how Ruby translates your code into a new language
    1. Stepping through how Ruby compiles a simple script
    2. Compiling a call to a block
  6. Experiment 1-3: Using the RubyVM class to display YARV instructions
  7. Tokenization, parsing and compilation in JRuby
  8. Tokenization, parsing and compilation in Rubinius