Premature optimization is the root of all evil (or at least most of it) in programming.
—Donald Knuth (attributed to C. A. R. Hoare)
Performance is an interesting beast. Performance optimization has a bad reputation because it is usually done too early and too aggressively, often at the expense of readability, maintainability, and even correctness. Rails is generally fast enough, but it is possible to make it slow if you are not careful.
You should keep the following guidelines in mind when optimizing performance:
It is very tempting to try to squeeze every last bit of speed out of a piece of code, but often you can miss the bigger picture. No amount of C or assembly tweaking will make bubblesort faster than quicksort. Start your optimization from the top down.
Your code should be first easy to read and understand, and only then optimized for speed.
Typically, the code profile has a lopsided distribution: 80% of the time is spent in 20% of the code (for some value of 80% and 20%). It makes sense to spend your limited resources optimizing the sections that will bring the greatest gain in performance.
The only way to be certain about where your code is spending its time is to measure it. As in carpentry, the rule is "measure twice, cut once": you can waste a lot of effort if you make changes without being very sure exactly what those changes should be. In this chapter, we will explore some of the best tools and methods for determining where to cut.
Of course, in order to properly measure performance, we need tools. This section is concerned with analysis of Ruby and Rails code, as well as web applications in general. There is a series of tools that can be used to analyze the full Rails stack, from HTTP down to Ruby's internals.
The most basic high-level measurement you will be interested in is: in the ideal case, how fast can this server serve requests? While the answer is a somewhat nebulous value that often bears no relation to actual performance under typical load, it is still useful to compare against itself—for example, when testing caching or deploying a new feature set.
This technique is called black-box analysis: we measure how much traffic the server can handle, while treating it as a "black box." For now, we don't really care what's inside the box, only about how fast it can serve requests. We will leave the minutiae of profiling and tweaking until later.
For this stage, we will need a benchmarking utility—but first, a brief diversion into the world of mathematics.
It doesn't take much knowledge of statistics to properly interpret the results of black-box analysis, but there are some things you need to know.
Statistical analysis deals with the results of multiple samples, which in this case correspond to HTTP response times. In Ruby fashion, we will illustrate this with a Ruby array:
samples = %w(10 11 12 10 10).map{|x|x.to_f}
The average, or mean, of these samples is their sum divided by the number of samples. This is straightforward to translate into Ruby by adding a few methods to Enumerable:
module Enumerable
  def sum(identity = 0)
    inject(identity) {|sum, x| sum + x}
  end

  def mean
    sum.to_f / length
  end
end
This gives us predictable results:
samples.sum    # => 53.0
samples.length # => 5
samples.mean   # => 10.6
Everyone is familiar with the mean, but the problem is that by itself, the mean is nearly worthless for describing a data set. Consider these two sets of samples:
samples1 = %w(10 11 12 10 10  9 12 10  9  9).map{|x| x.to_f}
samples2 = %w( 2 11  6 14 20 21  3  4  8 13).map{|x| x.to_f}
These two data sets in fact have the same mean, 10.2. But they clearly represent wildly different performance profiles, as can be seen from their graph (see Figure 6-1).
We need a new statistic to measure how much the data varies from the mean. That statistic is the standard deviation. The standard deviation of a sample is calculated by taking the root mean square deviation from the sample mean. In Ruby, it looks like this:
module Enumerable
  def population_stdev
    Math.sqrt( map{|x| (x - mean) ** 2}.mean )
  end
end
This code maps over the collection, taking the square of each element's deviation from the mean. It then takes the mean of those squared deviations and finally the square root of that mean, yielding the standard deviation.
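As a quick sanity check, we can apply this to our original five samples. Working the definition by hand, the squared deviations (0.36, 0.16, 1.96, 0.36, and 0.36) average to 0.64, whose square root is 0.8:

samples.population_stdev # => 0.8 (floating-point rounding may show a few extra digits)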
However, this is only half the story. What has been introduced so far is the population standard deviation, while what we really want is the sample standard deviation. Without completely diving into the relevant mathematics, the basic difference between the two is whether the data represent an entire population or only a portion of it.
As our data set represents application response times, from which we want to infer a mean and confidence interval applicable to data points we have not sampled, we want to use the sample standard deviation. Using the population standard deviation on our sample would underestimate our population's actual standard deviation. Here is the Ruby code for the sample standard deviation, which we will use from here on out:
module Enumerable
  def stdev
    Math.sqrt( map{|x| (x - mean) ** 2}.sum / (length - 1) )
  end
end
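On the same five samples, the sample standard deviation comes out slightly larger than the population version: the squared deviations still sum to 3.2, but are now divided by 4 rather than 5. This is exactly the correction we want:

samples.stdev # => 0.894 (approximately)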
The standard deviation is a very useful way to get a feel for the amount of variation in a data set. We see that the second set of samples from above has a much larger standard deviation than the first:
samples1.stdev # => 1.13529242439509
samples2.stdev # => 6.7954232964384
The standard deviation has the same units as the sample data; if the original data were in milliseconds, then the samples have standard deviations of about 1.1 ms and 6.8 ms, respectively.
We can use the standard deviation to estimate a confidence interval. The confidence interval and mean together give us a good idea of the limits of the data. Assuming a normal distribution, [48] the following guidelines apply:
Approximately 68% of the data points lie within one standard deviation (σ) of the mean.
95% of the data is within 2σ of the mean.
99.7% of the data is within 3σ of the mean.
Using the second rule, we can construct a 95% confidence interval from the statistics we have computed. This Ruby code uses the mean and standard deviation to return a range in which 95% of the data should lie:
module Enumerable
  def confidence_interval
    (mean - 2*stdev) .. (mean + 2*stdev)
  end
end

samples1.confidence_interval # => 7.92941515120981..12.4705848487902
samples2.confidence_interval # => -3.39084659287681..23.7908465928768
We see that the server that produced the first set of samples will usually (95% of the time) respond in between 8 and 12 ms. On the other hand, the data from the second server varies so wildly as to be nearly meaningless.
When comparing two data sets, it is important to compare not just their means but their standard deviations and confidence intervals as well. It may look as if you have made an improvement, but if the confidence intervals overlap by a significant amount, there is no statistical significance to the result.
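As a rough illustration, the helper below (our own sketch, not part of any library; a formal comparison would use something like a t-test) checks whether two 95% confidence intervals share any values:

module Enumerable
  # Rough check: do the two 95% confidence intervals overlap at all?
  def overlapping_interval?(other)
    a = confidence_interval
    b = other.confidence_interval
    a.first <= b.last && b.first <= a.last
  end
end

samples1.overlapping_interval?(samples2) # => true; the difference is not significant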
Now that we know how to analyze results, we can benchmark a site. The best reference is Zed Shaw's instructions on tuning Mongrel using httperf (http://mongrel.rubyforge.org/docs/how_many_mongrels.html). We will not repeat the procedure here, but we will give some caveats (a sample httperf invocation follows the list):
Ensure that you have the front end server (usually Apache, lighttpd, or nginx) configured to serve static files. Then do a baseline measurement that requests a static file from the front end web server. You will never get Rails faster than this.
Run your tests from a machine as close as possible (in network terms) to the server. You want to eliminate latency and jitter (variance in latency) from the results.
Do not run performance tests from your web server. There are too many interactions between the server and the analyzer that will be confounded with your results. Even if CPU utilization is not a problem (such as on a multiprocessor machine), you will not know if I/O contention has skewed the results.
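For illustration only (the host, URI, and rates here are made up; follow Zed Shaw's instructions to choose values appropriate to your own setup), a basic httperf run against a single action might look like this:

httperf --server app.example.com --port 80 --uri /listings --num-conns 1000 --rate 50

httperf reports the achieved request rate along with summary statistics for response times, including a standard deviation, which you can interpret using the techniques above rather than looking at the average alone.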
The Ruby standard library includes Benchmark, which can be used to answer simple questions about code performance. The key word here is simple: it is all too easy to ignore confounding factors and take the numbers that Benchmark gives you as gospel.
Suppose we want to compare conventional method dispatch to the idiom of using method_missing and then examining the method name to decide what action to take. Here is a simple code example that benchmarks the two options:
require 'benchmark'

class Test
  def test_one
    1 + 1
  end

  def method_missing(method_id)
    case method_id
    when :test_unknown then 1 + 1
    else super
    end
  end
end

t = Test.new
Benchmark.bmbm do |b|
  b.report("Direct method call") do
    1_000_000.times { t.test_one }
  end
  b.report("method_missing") do
    1_000_000.times { t.test_unknown }
  end
end
Note what we are not testing: we are not comparing the raw speed of ordinary method dispatch versus a bare call to method_missing. We are comparing an ordinary method call to the standard Ruby practice of using method_missing to answer for one method name. This gives us answers that are more relevant to our question: "How much will method_missing hurt me in this particular piece of code?"
We use the Benchmark.bmbm method, which runs the entire benchmark suite once (the "rehearsal") to minimize startup costs and give the measured code a "warm start." To get the most accurate numbers possible, each trial runs one million method calls.
Benchmark.bmbm also triggers a garbage collection run before each measurement, because a collection occurring during the measured run would skew the results. Here is the output of that benchmark on my computer:
Rehearsal ------------------------------------------------------
Direct method call   0.350000   0.000000   0.350000 (  0.352929)
method_missing       0.480000   0.000000   0.480000 (  0.476009)
--------------------------------------------- total: 0.830000sec

                         user     system      total        real
Direct method call   0.320000   0.000000   0.320000 (  0.324030)
method_missing       0.480000   0.000000   0.480000 (  0.477420)
The rehearsal numbers come first, followed by the actual measurement. We can see that under this environment, the average cost of a normal method call is 320 nanoseconds (0.32 seconds per million calls), while a method_missing call and case statement take 480 nanoseconds. Modulo the accuracy of our measurement, this is a 50% performance penalty for using method_missing. Balanced against the additional power we get from method_missing, this certainly seems to be a good trade.
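The arithmetic behind those figures is simple to check from the real-run totals above:

direct  = 0.32 / 1_000_000     # 3.2e-07 seconds, or 320 ns per call
missing = 0.48 / 1_000_000     # 4.8e-07 seconds, or 480 ns per call
(missing - direct) / direct    # ~0.5, i.e., a 50% penalty per call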
Benchmark is a powerful tool, but it can quickly amount to guesswork and black magic. There is no use optimizing method dispatch unless it is a bottleneck. [49]
Don't just go poking around looking for code to benchmark, though. Profilers are much more powerful utilities when you don't know what needs optimization. We will consider them next.
The Rails Analyzer Tools (http://rails-analyzer.rubyforge.org/) are a set of utilities that can help profile your Rails application. While Benchmark is Ruby-specific and httperf will benchmark any web site, the Rails Analyzer Tools were written with Rails in mind.
The first tool, the Production Log Analyzer, scans your production logfiles to find the slowest actions in your deployed application. This is very useful for problem solving, but it has a number of downsides. It requires the logs to go through the SyslogLogger (provided with the Rails Analyzer Tools), so you must set this up before the requests come in. The tool also requires a syslogd that supports filtering on program names, which usually means either running BSD or installing syslog-ng. For these reasons, we will not go into detail on its use here. Complete setup instructions are available at http://rails-analyzer.rubyforge.org/pl_analyzel/.
Once you know which actions are hurting for performance, you need a way to dig deeper. Action Profiler is a library (and a corresponding executable, action_profiler) that glues Rails to a profiler. It profiles a single Rails action through the entire Rails stack and application code so that you can examine where the most time is being spent during that action.
Action Profiler can be used with one of several profiling engines: ZenProfiler, ruby-prof, and Ruby's built-in Profiler class. They will be tried in that order unless one is specified on the command line with the -P option. Here is a sample run of Ruby's Profiler, showing a flat call profile:
$ action_profiler -P Profiler ListingController#map
Warmup...
Profiling...
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
13.73      0.07      0.07      204     0.34     0.34  ERB::Util.html_escape
11.76      0.13      0.06     1462     0.04     0.08  String#gsub
11.76      0.19      0.06        6    10.00    10.00  AssetTagHelper.compute_public_path
 9.80      0.24      0.05       89     0.56     0.67  FormOptionsHelper.option_value_selected?
 9.80      0.29      0.05      178     0.28     0.28  Array#<<
 3.92      0.31      0.02       44     0.45     6.36  Array#each
 3.92      0.33      0.02      287     0.07     0.07  Kernel.respond_to?
 3.92      0.35      0.02       89     0.22     0.45  Array#each_index
 3.92      0.37      0.02       89     0.22     0.34  Inflector.humanize
 1.96      0.38      0.01      179     0.06     0.06  Kernel.===
 1.96      0.39      0.01       67     0.15     0.15  String#concat
 1.96      0.40      0.01       16     0.63     0.63  ActionView::Base#template_exists?
 1.96      0.41      0.01      613     0.02     0.02  String#to_s
 1.96      0.42      0.01        7     1.43     1.43  MonitorMixin.synchronize
(and many more lines...)
These results include the following columns.
Column name        | Description
-------------------|-----------------------------------------------------------
% Time             | Percentage of total time spent in only this function.
Cumulative Seconds | Total number of seconds spent in this function, including time spent in functions called by this one.
Self Seconds       | Total number of seconds spent in only this function, excluding time spent in functions called by this one.
Calls              | Total number of calls to this function during the profiling period.
Self ms/call       | Average number of milliseconds spent in this function per call, excluding calls to other functions.
Total ms/call      | Average number of milliseconds spent in this function per call, including calls to other functions.
Name               | Name of the function being profiled.
The generated profile shows that we are spending most of our time in five functions: html_escape (better known as h()), gsub, compute_public_path, option_value_selected?, and <<. This is hard to improve upon, and in fact this particular action is pretty snappy. There is not much that we can do to reduce time spent in methods like Array#<< except find ways to call them fewer times.
This example demonstrates both the power and weakness of flat profiles. Although they are very useful to get a general idea of which functions are taking the most time, they do not help very much with long-running or complicated programs. Rails applications certainly qualify as complicated—even a simple action can easily have hundreds of lines in the profile. It can be difficult to track down where your expensive functions are being called.
The solution to this complexity is, believe it or not, more complexity—provided we introduce some structure to the data. We will see later in the chapter how a graph profile, though it contains a huge amount of data, can offer better insight into where time is being spent.
The Rails Analyzer Tools are a collection of miscellaneous parts that help with performance tuning. Two of the utilities, bench and crawl, measure "black-box" performance by making rapid-fire HTTP requests to a site. We can already do this, and more, with httperf.
The IOTail library mixes a tail_lines method into IO and StringIO. This method yields once for each line that is added to an open file, as it is added. It is used by some of the other utilities in the Rails Analyzer Tools.
The real gem in this library, however, is rails_stat. This simple command-line utility takes as arguments the path to a Rails production log and an optional refresh interval, which defaults to 10 seconds. It sits on the logfile, watching for new lines, and summarizes the traffic to the site, refreshing at the given interval until it receives SIGINT (Ctrl-C):
$ rails_stat log/production.log
~ 1.4 req/sec, 0.0 queries/sec, 7.9 lines/sec
The implementation of rails_stat is very simple (it is based on IOTail). This code could be used as the basis for a flexible real-time log analytics system.
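As a rough sketch of what that might look like (the require statement and the log-line pattern are assumptions; tail_lines is the IOTail method described above, which yields each line as it is appended), here is a tiny request counter in the spirit of rails_stat:

require 'rubygems'
require 'rails_analyzer_tools'   # assumed require; adjust to however IOTail is packaged on your system

# Count completed requests by tailing the production log and reporting every 10 seconds.
INTERVAL = 10
count = 0
last_report = Time.now

File.open('log/production.log') do |log|
  log.tail_lines do |line|              # yields each new line as it is appended
    count += 1 if line =~ /^Completed/  # assumed format of a finished-request line in a Rails production log
    if Time.now - last_report >= INTERVAL
      puts "~ #{count.to_f / INTERVAL} req/sec"
      count = 0
      last_report = Time.now
    end
  end
end

Note that this sketch only reports when a new log line arrives; a production-quality version would report on a timer regardless of traffic.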
[48] It is reasonable to assume a normal distribution here. We can safely treat a series of server response times as i.i.d. random variables; therefore, by the central limit theorem, their distribution will converge to normal given enough samples.
[49] Here's a hint: at more than two million method calls per second, method dispatch is probably not your bottleneck.