Chapter 6. Performance

Premature optimization is the root of all evil (or at least most of it) in programming.

—Donald Knuth (attributed to C. A. R. Hoare)

Performance is an interesting beast. Performance optimization has a bad reputation because it is so often done too early and too aggressively, usually at the expense of readability, maintainability, and even correctness. Rails is generally fast enough, but it is possible to make it slow if you are not careful.

You should keep the following guidelines in mind when optimizing performance:

Algorithmic improvements always beat code tweaks

It is very tempting to try to squeeze every last bit of speed out of a piece of code, but in doing so you can miss the bigger picture. No amount of C or assembly tweaking will make bubblesort faster than quicksort. Start your optimization from the top down (the short sketch following these guidelines illustrates the point).

As a general rule, maintainability beats performance

Your code should be first easy to read and understand, and only then optimized for speed.

Only optimize what matters

Typically, the code profile has a lopsided distribution: 80% of the time is spent in 20% of the code (for some value of 80% and 20%). It makes sense to spend your limited resources optimizing the sections that will bring the greatest gain in performance.

Measure twice, cut once

The only way to be certain about where your code is spending its time is to measure it. And just as in carpentry, you can waste a lot of time if you make changes without being very sure exactly what those changes should be. In this chapter, we will explore some of the best tools and methods for determining where to cut.
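To make the first guideline concrete, here is a minimal sketch that pits a deliberately naive O(n²) bubble sort against Ruby's built-in Array#sort, using the Benchmark library covered later in this chapter. The method name, data size, and labels are arbitrary; the point is that no amount of tweaking the inner loop will close a gap this wide, while switching algorithms will.

	require 'benchmark'

	# A deliberately naive O(n^2) bubble sort, for illustration only.
	def bubble_sort(ary)
	  a = ary.dup
	  loop do
	    swapped = false
	    (a.length - 1).times do |i|
	      if a[i] > a[i + 1]
	        a[i], a[i + 1] = a[i + 1], a[i]
	        swapped = true
	      end
	    end
	    break unless swapped
	  end
	  a
	end

	data = Array.new(2_000) { rand(10_000) }

	Benchmark.bm(12) do |b|
	  b.report("bubble sort") { bubble_sort(data) }
	  b.report("Array#sort")  { data.sort }
	end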

Of course, in order to properly measure performance, we need tools. This section is concerned with analysis of Ruby and Rails code, as well as web applications in general. There are a series of tools that can be used to analyze the full Rails stack, from HTTP down to Ruby's internals.

The most basic high-level measurement you will be interested in is: in the ideal case, how fast can this server serve requests? While the answer is a somewhat nebulous value that often bears little relation to actual performance under typical load, it is still a useful baseline to compare against itself over time—for example, when testing caching or deploying a new feature set.

This technique is called black-box analysis: we measure how much traffic the server can handle, while treating it as a "black box." For now, we don't really care what's inside the box, only about how fast it can serve requests. We will leave the minutiae of profiling and tweaking until later.

For this stage, we will need a benchmarking utility—but first, a brief diversion into the world of mathematics.

It doesn't take much knowledge of statistics to properly interpret the results of black-box analysis, but there are some things you need to know.

Statistical analysis deals with the results of multiple samples, which in this case correspond to HTTP response times. In Ruby fashion, we will illustrate this with a Ruby array:

	samples = %w(10 11 12 10 10).map{|x| x.to_f}

The average, or mean, of these samples is their sum divided by the number of samples. This is straightforward to translate into Ruby—adding a few methods to Enumerable:

	module Enumerable
	  def sum(identity = 0)
	    inject(identity) {|sum, x| sum + x}
	  end

	  def mean
	    sum.to_f / length
	  end
	end

This gives us predictable results:

	samples.sum    # => 53.0
	samples.length # => 5
	samples.mean   # => 10.6

Everyone is familiar with the mean, but the problem is that by itself, the mean is nearly worthless for describing a data set. Consider these two sets of samples:

	samples1 = %w(10 11 12 10 10 9 12 10 9 9).map{|x| x.to_f}
	samples2 = %w( 2 11 6 14 20 21 3 4 8 13).map{|x| x.to_f}

These two data sets in fact have the same mean, 10.2. But they clearly represent wildly different performance profiles, as can be seen from their graph (see Figure 6-1).
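We can verify this with the mean method defined above:

	samples1.mean # => 10.2
	samples2.mean # => 10.2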

We need a new statistic to measure how much the data varies from the mean. That statistic is the standard deviation. The standard deviation of a sample is calculated by taking the root mean square deviation from the sample mean. In Ruby, it looks like this:

	module Enumerable
	  def population_stdev
	    Math.sqrt( map{|x| (x - mean) ** 2}.mean )
	  end
	end

This code maps over the collection, taking the square of the deviation of each element from the mean. It then takes the mean of those squared values, and takes the square root of the mean, yielding the standard deviation.

However, this is only half the story. What has been introduced so far is the population standard deviation, while what we really want is the sample standard deviation. Without completely diving into the relevant mathematics, the basic difference between the two is whether the data represent an entire population or only a portion of it.

As our data set represents application response times, from which we want to infer a mean and confidence interval applicable to data points we have not sampled, we want to use the sample standard deviation. Using the population standard deviation on our sample would underestimate our population's actual standard deviation. Here is the Ruby code for the sample standard deviation, which we will use from here on out:

	module Enumerable
	  def stdev
	    Math.sqrt( map{|x| (x - mean) ** 2}.sum / (length-1) )
	  end
	end

The standard deviation is a very useful way to get a feel for the amount of variation in a data set. We see that the second set of samples from above has a much larger standard deviation than the first:

	samples1.stdev # => 1.13529242439509
	samples2.stdev # => 6.7954232964384
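For comparison, the population formula defined earlier yields slightly smaller values on the same data (approximately 1.08 and 6.45), which is exactly the underestimation described above:

	samples1.population_stdev # => 1.077 (approximately)
	samples2.population_stdev # => 6.447 (approximately)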

The standard deviation has the same units as the sample data; if the original data were in milliseconds, then the samples have standard deviations of 1.1 ms and 6.8 ms, respectively.

We can use the standard deviations to estimate a confidence interval. The confidence interval and mean will give us a good idea of the limits of the data. Assuming a normal distribution,[48] the following guidelines apply: roughly 68% of the data lie within one standard deviation of the mean, roughly 95% lie within two standard deviations, and roughly 99.7% lie within three.

Using the second rule, we will generate a 95% confidence interval from the statistics we have generated. This Ruby code uses the mean and standard deviation to return a range in which 95% of the data should lie:

	module Enumerable
	  def confidence_interval
	    (mean - 2*stdev) .. (mean + 2*stdev)
	  end
	end

	samples1.confidence_interval # => 7.92941515120981..12.4705848487902
	samples2.confidence_interval # => -3.39084659287681..23.7908465928768

We see that the server that produced the first set of samples will usually (about 95% of the time) respond in between 8 and 12.5 ms. On the other hand, the data from the second server varies so wildly as to be nearly meaningless.

When comparing two data sets, it is important to compare not just their means but their standard deviations and confidence intervals as well. It may look as if you have made an improvement, but if the confidence intervals overlap by a significant amount, there is no statistical significance to the result.
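One way to make that comparison mechanical is a small helper along these lines; overlaps? is not part of the code above, just a hypothetical convenience for comparing two sets of samples.

	module Enumerable
	  # Hypothetical helper: do the 95% confidence intervals of two
	  # sample sets overlap? If they do, the apparent difference in
	  # means may not be statistically significant.
	  def overlaps?(other)
	    a, b = confidence_interval, other.confidence_interval
	    a.begin <= b.end && b.begin <= a.end
	  end
	end

	samples1.overlaps?(samples2) # => true, so treat any "improvement" with suspicion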

The Ruby standard library includes Benchmark, which can be used to answer simple questions about code performance. The key word here is simple: it is all too easy to ignore confounding factors and take the numbers that Benchmark gives you as gospel.

Suppose we want to compare conventional method dispatch to the idiom of using method_missing and then examining the method name to decide what action to take. Here is a simple code example that benchmarks the two options:

	require 'benchmark'
	
	class Test
	  def test_one
	    1 + 1
	  end

	  def method_missing(method_id)
	    case method_id
	    when :test_unknown then 1 + 1
	    else super
	    end
	  end
	end

	t = Test.new

	Benchmark.bmbm do |b|
	  b.report("Direct method call") do
	    1_000_000.times { t.test_one }
	  end

	  b.report("method_missing") do
	    1_000_000.times { t.test_unknown }
	  end
	end

Note what we are not testing: we are not comparing the raw speed of ordinary method dispatch versus a bare call to method_missing. We are comparing an ordinary method call to the standard Ruby practice of using method_missing to answer for one method name. This gives us answers that are more relevant to our question: "How much will method_missing hurt me in this particular piece of code?"

We use the Benchmark.bmbm method, which runs the entire benchmark suite once as a "rehearsal" before the real measurement, in order to minimize startup costs and give the measured code a "warm start." To get the most accurate numbers possible, each trial runs one million method calls.

Benchmark.bmbm also starts garbage collection before each measured run, so that garbage left over from the rehearsal is not collected in the middle of a measurement and does not skew the results. Here is the output of that benchmark on my computer:

	Rehearsal -------------------------------------------------------
	Direct method call    0.350000   0.000000   0.350000 (  0.352929)
	method_missing        0.480000   0.000000   0.480000 (  0.476009)
	---------------------------------------------- total: 0.830000sec

	                          user     system      total         real
	Direct method call    0.320000   0.000000   0.320000 (  0.324030)
	method_missing        0.480000   0.000000   0.480000 (  0.477420)

The rehearsal numbers come first, followed by the actual measurement. We can see that under this environment, the average cost of a normal method call is 320 nanoseconds (0.32 seconds per million calls), while a method_missing call and case statement take 480 nanoseconds. Modulo the accuracy of our measurement, this is a 50% performance penalty for using method_missing. Balanced against the additional power we get from method_missing, this certainly seems to be a good trade.

Benchmark is a powerful tool, but it can quickly amount to guesswork and black magic. There is no use optimizing method dispatch unless it is a bottleneck. [49]

Don't just go poking around looking for code to benchmark, though. Profilers are much more powerful utilities when you don't know what needs optimization. We will consider them next.
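For plain Ruby code outside of a Rails application, the quickest way to get a flat profile is the profiler in Ruby's standard library, which requires no changes to your code (script.rb here stands in for whatever file you want to profile):

	$ ruby -rprofile script.rb

Loading the profile library this way wraps the entire program and prints a flat call profile, much like the one shown later in this section, when the program exits.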

The Rails Analyzer Tools (http://rails-analyzer.rubyforge.org/) are a set of utilities that can help profile your Rails application. While Benchmark is Ruby-specific and httperf will benchmark any web site, the Rails Analyzer Tools were written with Rails in mind.
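httperf is the kind of utility you would reach for to take the black-box measurements described at the beginning of this chapter. As a rough sketch (the host, port, URI, and request counts are purely illustrative), a run against a development server might look like this:

	$ httperf --server localhost --port 3000 --uri /listings \
	    --num-conns 1000 --rate 50

httperf reports the request rate it sustained along with summary statistics for connection and reply times, which can be fed into the mean, standard deviation, and confidence interval calculations from earlier in this chapter.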

Once you know which actions are hurting for performance, you need a way to dig deeper. Action Profiler is a library (and a corresponding executable, action_profiler) that glues Rails to a profiler. It profiles a single Rails action through the entire Rails stack and application code so that you can examine where the most time is being spent during that action.

Action Profiler can be used with one of several profiling engines: ZenProfiler, ruby-prof, and Ruby's built-in Profiler class. They will be tried in that order unless one is specified on the command line with the -P option. Here is a sample run using Ruby's Profiler, showing a flat call profile:

	$ action_profiler -P Profiler ListingController#map
	Warmup...
	Profiling...
	  %    cumulative   self             self     total
	 time   seconds     seconds  calls  ms/call  ms/call  name
	 13.73     0.07        0.07    204     0.34     0.34  ERB::Util.html_escape
	 11.76     0.13        0.06   1462     0.04     0.08  String#gsub
	 11.76     0.19        0.06      6    10.00    10.00  AssetTagHelper.
	                                                        compute_public_path
	  9.80     0.24        0.05     89      0.56    0.67  FormOptionsHelper.
	                                                        option_value_selected?
	  9.80     0.29        0.05    178      0.28    0.28  Array#<<
	  3.92     0.31        0.02     44      0.45    6.36  Array#each
	  3.92     0.33        0.02    287      0.07    0.07  Kernel.respond_to?
	  3.92     0.35        0.02     89      0.22    0.45  Array#each_index
	  3.92     0.37        0.02     89      0.22    0.34  Inflector.humanize
	  1.96     0.38        0.01    179      0.06    0.06  Kernel.===
	  1.96     0.39        0.01     67      0.15    0.15  String#concat
	  1.96     0.40        0.01     16      0.63    0.63  ActionView::Base#
	                                                        template_exists?
	  1.96     0.41        0.01    613      0.02    0.02  String#to_s
	  1.96     0.42        0.01      7      1.43    1.43  MonitorMixin.synchronize
	(and many more lines...)

These results include the following columns:

% Time

Percentage of total time spent in only this function.

Cumulative Seconds

Total number of seconds spent in this function, including time spent in functions called by this one.

Self Seconds

Total number of seconds spent in only this function, excluding time spent in functions called by this one.

Calls

Total number of calls to this function during the profiling period.

Self ms/call

Average number of milliseconds spent in this function per call, excluding calls to other functions.

Total ms/call

Average number of milliseconds spent in this function per call, including calls to other functions.

Name

Name of the function being profiled.

The generated profile shows that we are spending most of our time in five functions: html_escape (better known as h()), gsub, compute_public_path, option_value_selected?, and <<. This is hard to improve upon, and in fact this particular action is pretty snappy. There is not much that we can do to reduce time spent in methods like Array#<< except find ways to call them fewer times.

This example demonstrates both the power and weakness of flat profiles. Although they are very useful to get a general idea of which functions are taking the most time, they do not help very much with long-running or complicated programs. Rails applications certainly qualify as complicated—even a simple action can easily have hundreds of lines in the profile. It can be difficult to track down where your expensive functions are being called.

The solution to this complexity is, believe it or not, more complexity—provided we introduce some structure to the data. We will see later in the chapter how a graph profile, though it contains a huge amount of data, can offer better insight into where time is being spent.



[48] It is reasonable to assume a normal distribution here. We can safely treat series of server response times as i.i.d. random variables; therefore, by the central limit theorem, the distribution will converge to normal given enough samples.

[49] Here's a hint: at more than two million method calls per second, method dispatch is probably not your bottleneck.