Premature optimization is the root of all evil (or at least most of it) in programming.
—Donald Knuth (attributed to C. A. R. Hoare)
Performance is an interesting beast. Performance optimization has a bad reputation because it is usually done too early and too aggressively, often at the expense of readability, maintainability, and even correctness. Rails is generally fast enough, but it is possible to make it slow if you are not careful.
You should keep the following guidelines in mind when optimizing performance:
It is very tempting to try to squeeze every last bit of speed out of a piece of code, but often you can miss the bigger picture. No amount of C or assembly tweaking will make bubblesort faster than quicksort. Start your optimization from the top down.
Your code should be first easy to read and understand, and only then optimized for speed.
Typically, the code profile has a lopsided distribution: 80% of the time is spent in 20% of the code (for some value of 80% and 20%). It makes sense to spend your limited resources optimizing the sections that will bring the greatest gain in performance.
The only way to be certain about where your code is spending its time is to measure it. As in carpentry, the rule is "measure twice, cut once": you can waste a lot of effort if you make changes without being very sure exactly what those changes should be. In this chapter, we will explore some of the best tools and methods for determining where to cut.
Of course, in order to properly measure performance, we need tools. This section is concerned with analysis of Ruby and Rails code, as well as web applications in general. There is a series of tools that can be used to analyze the full Rails stack, from HTTP down to Ruby's internals.
The most basic high-level measurement you will be interested in is: in the ideal case, how fast can this server serve requests? While the answer is a somewhat nebulous value that often bears no relation to actual performance under typical load, it is still useful to compare against itself—for example, when testing caching or deploying a new feature set.
This technique is called black-box analysis: we measure how much traffic the server can handle, while treating it as a "black box." For now, we don't really care what's inside the box, only about how fast it can serve requests. We will leave the minutiae of profiling and tweaking until later.
For this stage, we will need a benchmarking utility—but first, a brief diversion into the world of mathematics.
It doesn't take much knowledge of statistics to properly interpret the results of black-box analysis, but there are some things you need to know.
Statistical analysis deals with the results of multiple samples, which in this case correspond to HTTP response times. In Ruby fashion, we will illustrate this with a Ruby array:
samples = %w(10 11 12 10 10).map{|x|x.to_f}
The average, or mean, of these samples is their sum divided by the number of samples. This is straightforward to translate into Ruby by adding a few methods to Enumerable:
module Enumerable
  def sum(identity = 0)
    inject(identity) {|sum, x| sum + x}
  end

  def mean
    sum.to_f / length
  end
end
This gives us predictable results:
samples.sum    # => 53.0
samples.length # => 5
samples.mean   # => 10.6
Everyone is familiar with the mean, but the problem is that by itself, the mean is nearly worthless for describing a data set. Consider these two sets of samples:
samples1 = %w(10 11 12 10 10  9 12 10  9  9).map{|x| x.to_f}
samples2 = %w( 2 11  6 14 20 21  3  4  8 13).map{|x| x.to_f}
These two data sets in fact have the same mean, 10.2. But they clearly represent wildly different performance profiles, as can be seen from their graph (see Figure 6-1).
We need a new statistic to measure how much the data varies from the mean. That statistic is the standard deviation. The standard deviation of a sample is calculated by taking the root mean square deviation from the sample mean. In Ruby, it looks like this:
module Enumerable
  def population_stdev
    Math.sqrt( map{|x| (x - mean) ** 2}.mean )
  end
end
This code maps over the collection, taking the square of each element's deviation from the mean. It then takes the mean of those squared deviations and finally the square root of that mean, yielding the standard deviation.
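As a quick sanity check, we can apply this to our original five samples. Working the definition by hand, the squared deviations (0.36, 0.16, 1.96, 0.36, and 0.36) average to 0.64, whose square root is 0.8:

samples.population_stdev # => 0.8 (floating-point rounding may show a few extra digits)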
However, this is only half the story. What has been introduced so far is the population standard deviation, while what we really want is the sample standard deviation. Without completely diving into the relevant mathematics, the basic difference between the two is whether the data represent an entire population or only a portion of it.
As our data set represents application response times, from which we want to infer a mean and confidence interval applicable to data points we have not sampled, we want to use the sample standard deviation. Using the population standard deviation on our sample would underestimate our population's actual standard deviation. Here is the Ruby code for the sample standard deviation, which we will use from here on out:
module Enumerable
  def stdev
    Math.sqrt( map{|x| (x - mean) ** 2}.sum / (length - 1) )
  end
end
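On the same five samples, the sample standard deviation comes out slightly larger than the population version: the squared deviations still sum to 3.2, but are now divided by 4 rather than 5. This is exactly the correction we want:

samples.stdev # => 0.894 (approximately)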
The standard deviation is a very useful way to get a feel for the amount of variation in a data set. We see that the second set of samples from above has a much larger standard deviation than the first:
samples1.stdev # => 1.13529242439509
samples2.stdev # => 6.7954232964384
The standard deviation has the same units as the sample data; if the original data were in milliseconds, then the samples have standard deviations of about 1.1 ms and 6.8 ms, respectively.
We can use the standard deviation to estimate a confidence interval. The confidence interval and mean together give us a good idea of the limits of the data. Assuming a normal distribution, [48] the following guidelines apply:
Approximately 68% of the data points lie within one standard deviation (σ) of the mean.
95% of the data is within 2σ of the mean.
99.7% of the data is within 3σ of the mean.
Using the second rule, we can construct a 95% confidence interval from the statistics we have computed. This Ruby code uses the mean and standard deviation to return a range in which 95% of the data should lie:
module Enumerable
  def confidence_interval
    (mean - 2*stdev) .. (mean + 2*stdev)
  end
end

samples1.confidence_interval # => 7.92941515120981..12.4705848487902
samples2.confidence_interval # => -3.39084659287681..23.7908465928768
We see that the server that produced the first set of samples will usually (95% of the time) respond in between 8 and 12 ms. On the other hand, the data from the second server varies so wildly as to be nearly meaningless.
When comparing two data sets, it is important to compare not just their means but their standard deviations and confidence intervals as well. It may look as if you have made an improvement, but if the confidence intervals overlap by a significant amount, there is no statistical significance to the result.
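As a rough illustration, the helper below (our own sketch, not part of any library; a formal comparison would use something like a t-test) checks whether two 95% confidence intervals share any values:

module Enumerable
  # Rough check: do the two 95% confidence intervals overlap at all?
  def overlapping_interval?(other)
    a = confidence_interval
    b = other.confidence_interval
    a.first <= b.last && b.first <= a.last
  end
end

samples1.overlapping_interval?(samples2) # => true; the difference is not significant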
Now that we know how to analyze results, we can benchmark a site. The best reference is Zed Shaw's instructions on tuning Mongrel using httperf (http://mongrel.rubyforge.org/docs/how_many_mongrels.html). We will not repeat the procedure here, but we will give some caveats (a sample httperf invocation follows the list):
Ensure that you have the front end server (usually Apache, lighttpd, or nginx) configured to serve static files. Then do a baseline measurement that requests a static file from the front end web server. You will never get Rails faster than this.
Run your tests from a machine as close as possible (in network terms) to the server. You want to eliminate latency and jitter (variance in latency) from the results.
Do not run performance tests from your web server. There are too many interactions between the server and the analyzer that will be confounded with your results. Even if CPU utilization is not a problem (such as on a multiprocessor machine), you will not know if I/O contention has skewed the results.
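For illustration only (the host, URI, and rates here are made up; follow Zed Shaw's instructions to choose values appropriate to your own setup), a basic httperf run against a single action might look like this:

httperf --server app.example.com --port 80 --uri /listings --num-conns 1000 --rate 50

httperf reports the achieved request rate along with summary statistics for response times, including a standard deviation, which you can interpret using the techniques above rather than looking at the average alone.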
The Ruby standard library includes Benchmark, which can be used to answer simple questions about code performance. The key word here is simple: it is all too easy to ignore confounding factors and take the numbers that Benchmark gives you as gospel.
Suppose we want to compare conventional method dispatch to the idiom of using method_missing and then examining the method name to decide what action to take. Here is a simple code example that benchmarks the two options:
require 'benchmark'

class Test
  def test_one
    1 + 1
  end

  def method_missing(method_id)
    case method_id
    when :test_unknown then 1 + 1
    else super
    end
  end
end

t = Test.new
Benchmark.bmbm do |b|
  b.report("Direct method call") do
    1_000_000.times { t.test_one }
  end
  b.report("method_missing") do
    1_000_000.times { t.test_unknown }
  end
end
Note what we are not testing: we are not comparing the raw speed of ordinary method dispatch versus a bare call to method_missing. We are comparing an ordinary method call to the standard Ruby practice of using method_missing to answer for one method name. This gives us answers that are more relevant to our question: "How much will method_missing hurt me in this particular piece of code?"
We use the Benchmark.bmbm method, which runs the entire benchmark suite once (the "rehearsal") to minimize startup costs and give the measured code a "warm start." To get the most accurate numbers possible, each trial runs one million method calls.
Benchmark.bmbm also triggers a garbage collection run before each measurement, because a collection occurring during the measured run would skew the results. Here is the output of that benchmark on my computer:
Rehearsal ------------------------------------------------------
Direct method call   0.350000   0.000000   0.350000 (  0.352929)
method_missing       0.480000   0.000000   0.480000 (  0.476009)
--------------------------------------------- total: 0.830000sec

                         user     system      total        real
Direct method call   0.320000   0.000000   0.320000 (  0.324030)
method_missing       0.480000   0.000000   0.480000 (  0.477420)
The rehearsal numbers come first, followed by the actual measurement. We can see that under this environment, the average cost of a normal method call is 320 nanoseconds (0.32 seconds per million calls), while a method_missing call and case statement take 480 nanoseconds. Modulo the accuracy of our measurement, this is a 50% performance penalty for using method_missing. Balanced against the additional power we get from method_missing, this certainly seems to be a good trade.
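The arithmetic behind those figures is simple to check from the real-run totals above:

direct  = 0.32 / 1_000_000     # 3.2e-07 seconds, or 320 ns per call
missing = 0.48 / 1_000_000     # 4.8e-07 seconds, or 480 ns per call
(missing - direct) / direct    # ~0.5, i.e., a 50% penalty per call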
Benchmark is a powerful tool, but it can quickly amount to guesswork and black magic. There is no use optimizing method dispatch unless it is a bottleneck. [49]
Don't just go poking around looking for code to benchmark, though. Profilers are much more powerful utilities when you don't know what needs optimization. We will consider them next.
The Rails Analyzer Tools (http://rails-analyzer.rubyforge.org/) are a set of utilities that can help profile your Rails application. While Benchmark is Ruby-specific and httperf will benchmark any web site, the Rails Analyzer Tools were written with Rails in mind.
The first tool, the Production Log Analyzer, scans your production logfiles to find the slowest actions in your deployed application. This is very useful for problem solving, but it has a number of downsides. It requires the logs to go through the SyslogLogger (provided with the Rails Analyzer Tools), so you must set this up before the requests come in. The tool also requires a syslogd that supports filtering on program names, which usually means either running BSD or installing syslog-ng. For these reasons, we will not go into detail on its use here. Complete setup instructions are available at http://rails-analyzer.rubyforge.org/pl_analyzel/.
Once you know which actions are hurting for performance, you need a way to dig deeper. Action Profiler is a library (and a corresponding executable, action_profiler) that glues Rails to a profiler. It profiles a single Rails action through the entire Rails stack and application code so that you can examine where the most time is being spent during that action.
Action Profiler can be used with one of several profiling engines: ZenProfiler, ruby-prof, and Ruby's built-in Profiler class. They will be tried in that order unless one is specified on the command line with the -P option. Here is a sample run of Ruby's Profiler, showing a flat call profile:
$ action_profiler -P Profiler ListingController#map
Warmup...
Profiling...
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
13.73      0.07      0.07      204     0.34     0.34  ERB::Util.html_escape
11.76      0.13      0.06     1462     0.04     0.08  String#gsub
11.76      0.19      0.06        6    10.00    10.00  AssetTagHelper.compute_public_path
 9.80      0.24      0.05       89     0.56     0.67  FormOptionsHelper.option_value_selected?
 9.80      0.29      0.05      178     0.28     0.28  Array#<<
 3.92      0.31      0.02       44     0.45     6.36  Array#each
 3.92      0.33      0.02      287     0.07     0.07  Kernel.respond_to?
 3.92      0.35      0.02       89     0.22     0.45  Array#each_index
 3.92      0.37      0.02       89     0.22     0.34  Inflector.humanize
 1.96      0.38      0.01      179     0.06     0.06  Kernel.===
 1.96      0.39      0.01       67     0.15     0.15  String#concat
 1.96      0.40      0.01       16     0.63     0.63  ActionView::Base#template_exists?
 1.96      0.41      0.01      613     0.02     0.02  String#to_s
 1.96      0.42      0.01        7     1.43     1.43  MonitorMixin.synchronize
(and many more lines...)
These results include the following columns.
Column name        | Description
-------------------|-----------------------------------------------------------
% Time             | Percentage of total time spent in only this function.
Cumulative Seconds | Total number of seconds spent in this function, including time spent in functions called by this one.
Self Seconds       | Total number of seconds spent in only this function, excluding time spent in functions called by this one.
Calls              | Total number of calls to this function during the profiling period.
Self ms/call       | Average number of milliseconds spent in this function per call, excluding calls to other functions.
Total ms/call      | Average number of milliseconds spent in this function per call, including calls to other functions.
Name               | Name of the function being profiled.
The generated profile shows that we are spending most of our time in five functions: html_escape (better known as h()), gsub, compute_public_path, option_value_selected?, and <<. This is hard to improve upon, and in fact this particular action is pretty snappy. There is not much that we can do to reduce time spent in methods like Array#<< except find ways to call them fewer times.
This example demonstrates both the power and weakness of flat profiles. Although they are very useful to get a general idea of which functions are taking the most time, they do not help very much with long-running or complicated programs. Rails applications certainly qualify as complicated—even a simple action can easily have hundreds of lines in the profile. It can be difficult to track down where your expensive functions are being called.
The solution to this complexity is, believe it or not, more complexity—provided we introduce some structure to the data. We will see later in the chapter how a graph profile, though it contains a huge amount of data, can offer better insight into where time is being spent.
The Rails Analyzer Tools are a collection of miscellaneous parts that help with performance tuning. Two of the utilities, bench and crawl, measure "black-box" performance by making rapid-fire HTTP requests to a site. We can already do this, and more, with httperf.
The IOTail library mixes a tail_lines method into IO and StringIO. This method yields once for each line that is added to an open file, as it is added. It is used by some of the other utilities in the Rails Analyzer Tools.
The real gem in this library, however, is rails_stat. This simple command-line utility takes as arguments the path to a Rails production log and an optional refresh interval, which defaults to 10 seconds. It sits on the logfile, watching for new lines, and summarizes the traffic to the site, refreshing at the given interval until it receives SIGINT (Ctrl-C):
$ rails_stat log/production.log
~ 1.4 req/sec, 0.0 queries/sec, 7.9 lines/sec
The implementation of rails_stat is very simple (it is based on IOTail). This code could be used as the basis for a flexible real-time log analytics system.
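As a rough sketch of what that might look like (the require statement and the log-line pattern are assumptions; tail_lines is the IOTail method described above, which yields each line as it is appended), here is a tiny request counter in the spirit of rails_stat:

require 'rubygems'
require 'rails_analyzer_tools'   # assumed require; adjust to however IOTail is packaged on your system

# Count completed requests by tailing the production log and reporting every 10 seconds.
INTERVAL = 10
count = 0
last_report = Time.now

File.open('log/production.log') do |log|
  log.tail_lines do |line|              # yields each new line as it is appended
    count += 1 if line =~ /^Completed/  # assumed format of a finished-request line in a Rails production log
    if Time.now - last_report >= INTERVAL
      puts "~ #{count.to_f / INTERVAL} req/sec"
      count = 0
      last_report = Time.now
    end
  end
end

Note that this sketch only reports when a new log line arrives; a production-quality version would report on a timer regardless of traffic.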
[48] It is reasonable to assume a normal distribution here. We can safely treat a series of server response times as i.i.d. random variables; therefore, by the central limit theorem, their distribution will converge to normal given enough samples.
[49] Here's a hint: at more than two million method calls per second, method dispatch is probably not your bottleneck.