If you aren’t familiar with database indexes, Wikipedia has a pretty good definition,[35] but in essence, an index is a data structure created inside the database that speeds up query operations. Usually, databases use advanced data structures like B-trees to find the data you’re looking for without examining every single row in a table.
If you are familiar with indexes, you might only be familiar with the type of index that can be created by Active Record’s Migrations API. This API provides a “lowest common denominator” approach. The best we can do is create an index on last_name, first_name, and email. Doing so won’t actually help us, because of the search we are doing: we need to match values that *start with* the search term, ignoring case.
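For reference, Active Record’s `add_index :customers, [:last_name, :first_name, :email]` generates roughly the following SQL (the index name shown is an assumption for illustration):

```sql
-- Roughly the index Active Record's Migrations API would create.
-- The index name here is an assumption for illustration.
CREATE INDEX index_customers_on_last_name_and_first_name_and_email
  ON customers (last_name, first_name, email);
```

Because this index stores the raw column values, a case-insensitive prefix match like `lower(last_name) LIKE 'pat%'` can’t use it.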
Postgres allows much more sophisticated indexes to be created. To see how this helps, let’s ask Postgres to tell us how our existing query will perform. This can be done by preceding a SQL statement with EXPLAIN ANALYZE. The output is somewhat opaque, but it’s useful. We’ll walk through it step by step.
| $ bundle exec rails dbconsole |
| shine_development> EXPLAIN ANALYZE |
| SELECT * |
| FROM customers |
| WHERE |
| lower(first_name) like 'pat%' OR |
| lower(last_name) like 'pat%' OR |
| lower(email) = 'pat@example.com' |
| ORDER BY |
| email = 'pat@example.com' DESC, |
| last_name ASC ; |
| |
| QUERY PLAN |
| -------------------------------------------------------------------------------- |
① | Sort (cost=13930.19..13943.25 rows=5225 width=79) |
| (actual time=618.065..618.103 rows=704 loops=1) |
| Sort Key: (((email)::text = 'pat@example.com'::text)) DESC, last_name |
| Sort Method: quicksort Memory: 124kB |
② | -> Seq Scan on customers (cost=0.00..13607.51 rows=5225 width=79) |
| (actual time=0.165..612.380 rows=704 loops=1) |
③ | Filter: ((lower((first_name)::text) ~~ 'pat%'::text) OR |
| (lower((last_name)::text) ~~ 'pat%'::text) OR |
| (lower((email)::text) = 'pat@example.com'::text)) |
| Rows Removed by Filter: 349296 |
| Planning time: 1.223 ms |
④ | Execution time: 618.258 ms |
This gobbledegook is the query plan and is quite informative if you know how to interpret it. There are four parts to it that will help you understand how Postgres will execute our query.
Here, Postgres is telling us that it’s sorting the results, which makes sense, since we’re using an order by clause in our query. The details (for example, cost=13930.19..13943.25) are useful for fine-tuning queries, but we’re not concerned with that right now. Just take from this that sorting is part of the query.
This is the most important bit of information in this query plan. “Seq Scan on customers” means that Postgres has to examine every single row in the table to satisfy the query. This means that the bigger the table is, the more work Postgres has to do to search it. Queries that you run frequently should not require examining every row in the table for this reason.
This shows us how Postgres has interpreted our where clause. It’s more or less what was in our query, but Postgres has annotated it with the internal data types it’s using to interpret the values (`~~` is Postgres’s internal notation for like).
Finally, Postgres reports the actual runtime of the query (EXPLAIN ANALYZE really executes it). In this case, it’s more than half a second. That’s not much time to you or me, but to a database, it’s an eternity.
EXPLAIN vs. EXPLAIN ANALYZE

EXPLAIN ANALYZE actually runs the query, whereas EXPLAIN (without ANALYZE) will just show the query plan. This means that repeatedly executing EXPLAIN ANALYZE on the same query could produce different timings, because Postgres caches data it has recently read in memory. The query plan (everything up to “Execution time” in the output shown earlier) will always be the same. It’s not easy to control the caches Postgres uses, but if you vary the search string or ID in your WHERE clauses, you can often prevent cached data from skewing your timings.
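To see just the plan without running the query, drop ANALYZE (the search term here is arbitrary):

```sql
-- EXPLAIN alone plans the query but doesn't execute it,
-- so there are no "actual time" or "Execution time" lines.
EXPLAIN
SELECT *
FROM customers
WHERE lower(last_name) LIKE 'bob%';
```

Because nothing is executed, this is safe to use even on queries you suspect are expensive.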
Given all of this, it’s clear that our query will perform poorly. It’s likely that it performs poorly on our development machine, and will certainly not scale in a real-world scenario.
In most databases, because of the case-insensitive search and the use of like, there wouldn’t be much we could do. Postgres, however, can create an index that accounts for this way of searching.
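As a preview, the key Postgres feature is the *expression index*: an index built over the result of an expression such as lower(last_name), combined with a pattern-matching operator class so that left-anchored like searches can use it. A minimal sketch, assuming a varchar column (the index name is made up for illustration):

```sql
-- An expression index over the lowercased column. The
-- varchar_pattern_ops operator class lets Postgres use this
-- index for left-anchored LIKE patterns such as 'pat%'.
CREATE INDEX customers_lower_last_name
  ON customers (lower(last_name) varchar_pattern_ops);
```

We’d need one such index per searched column; we’ll build the real indexes step by step next.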