Chapter 1

Tokenization, Parsing and Compilation

Your code has a long road to take before Ruby ever runs it.

How many times do you think Ruby reads and transforms your code before running it? Once? Twice? Whenever you run a Ruby script, whether it’s a large Rails application, a simple Sinatra website, or a background worker job, Ruby rips your code apart into small pieces and then puts them back together in a different format… three times! Between the moment you type “ruby” and the moment you see output on the console, your Ruby code takes a long journey involving a variety of technologies, techniques and open source tools.

At a high level, here’s what this journey looks like:

First, Ruby tokenizes your code: it reads the text characters in your code file and converts them into tokens. Think of tokens as the words of the Ruby language. Next, Ruby parses these tokens; “parsing” means grouping the tokens into meaningful Ruby statements, much as you would group words into sentences. Finally, Ruby compiles these statements into low-level instructions that it can execute later using a virtual machine.
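You can watch each of these three steps from Ruby itself. The following is a minimal sketch, assuming CRuby (MRI) 2.5 or later. It uses Ripper, a standard library that exposes a Ruby tokenizer and parser to Ruby code, for the first two steps, and `RubyVM::InstructionSequence`, which exists only in CRuby, for the third. Ripper’s output approximates Ruby’s internal tokens and parse tree rather than reproducing them exactly (for example, it also reports whitespace as tokens), so treat it as a window onto the process, not the internal data structures themselves.

```ruby
require 'ripper'

# A tiny one-line program to follow through all three steps.
code = "10.times do |n| puts n end"

# Step 1: tokenize. Ripper.lex returns one entry per token, shaped like
# [[line, column], token_type, token_text, lexer_state].
tokens = Ripper.lex(code)
tokens.each do |pos, type, text, _state|
  puts "#{pos.inspect} #{type} #{text.inspect}"
end

# Step 2: parse. Ripper.sexp groups the tokens into a nested
# S-expression whose nesting mirrors Ruby's grammar rules.
sexp = Ripper.sexp(code)
pp sexp

# Step 3: compile. On CRuby, RubyVM::InstructionSequence compiles the
# same source into YARV instructions; disasm renders them as text.
iseq = RubyVM::InstructionSequence.compile(code)
puts iseq.disasm
```

Run with `ruby`, the script prints the token list first, then the nested parse tree, then the YARV instructions, including a call to `times` and a separate instruction sequence for the block.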

I’ll cover Ruby’s virtual machine, called “Yet Another Ruby Virtual Machine” (YARV), in Chapter 2. First, in this chapter, I’ll describe the tokenizing, parsing and compiling processes that Ruby uses to understand the code you give it. Join me as I follow a Ruby script on its journey!


Chapter 1 Roadmap