Our materialized view is so much faster than the regular view because it essentially caches the results of the backing query in a real table. The trade-off is that the view’s contents can lag behind the tables the backing query reads from.
Postgres provides a way to refresh the view via REFRESH MATERIALIZED VIEW. Before Postgres 9.4, refreshing a materialized view this way was a problem, because it locked the view for the duration of the refresh. That meant any application that wanted to query the view had to wait until the refresh completed. Since a refresh can take a long time, this made materialized views mostly useless before 9.4.
As of Postgres 9.4, the refresh can be done concurrently, allowing users of the view to keep querying the old data until the refresh is complete. This is what we’ll set up here, and it requires running the command REFRESH MATERIALIZED VIEW CONCURRENTLY.
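One prerequisite worth noting: Postgres allows REFRESH MATERIALIZED VIEW CONCURRENTLY only if the materialized view has at least one unique index covering all of its rows. If you created such an index on customer_id when you built the view, you’re all set; if not, a migration along these lines would add one (a sketch; the class and index names are arbitrary):
| class IndexCustomerDetailsOnCustomerId < ActiveRecord::Migration[5.0] |
|   def up |
|     execute %{ |
|       CREATE UNIQUE INDEX customer_details_customer_id |
|         ON customer_details(customer_id) |
|     } |
|   end |
| |
|   def down |
|     execute %{DROP INDEX customer_details_customer_id} |
|   end |
| end |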
Let’s try it out.
| sql> refresh materialized view concurrently customer_details; |
| sql> select * from customer_details where customer_id = 388399; |
| -[ RECORD 1 ]---------------+------------------------------ |
| customer_id | 388399 |
| first_name | Dave |
| last_name | Copeland |
| email | dave@dave.dave |
| username | davetron5000 |
| joined_at | 2015-06-25 08:28:54.327645 |
| billing_street | 530 Nienow Stravenue |
| billing_city | West Aniyah |
| billing_state | RI |
| billing_zipcode | 72842-8201 |
| shipping_address_id | 1 |
| shipping_street | 530 Nienow Stravenue |
| shipping_city | West Aniyah |
| shipping_state | RI |
| shipping_zipcode | 72842-8201 |
| shipping_address_created_at | 2015-06-20 16:51:06.891914-04 |
Now that you know how to refresh the view, the trick is knowing when to do it. That depends heavily on how often the underlying data changes and how important it is for users to see the most recent data in the view. We’ll look at two techniques here. The first is to create a Rake task that refreshes the view on a schedule. The second is to use database triggers to refresh the view whenever the underlying data changes.
The simplest way to refresh the view is to create a Rake task and arrange for that task to run on a regular schedule. You can do this by creating lib/tasks/refresh_materialized_views.rake and using the connection method on ActiveRecord::Base, which allows you to execute arbitrary SQL.
| desc "Refreshes materialized views" |
| task refresh_materialized_views: :environment do |
| ActiveRecord::Base.connection.execute %{ |
| REFRESH MATERIALIZED VIEW CONCURRENTLY customer_details |
| } |
| end |
You can then run it on the command line via bin/rails:
| $ bin/rails refresh_materialized_views |
With this in place, you can configure your production system to run this task periodically, for example using cron. How frequently to run it depends on how fresh the data needs to be for users, as well as how long the refresh takes. If users need the data to be fairly up-to-date, you could try running it every five minutes. If users can do their jobs without the absolute latest data, you could run it every hour or even every day.
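For example, if you happen to use the whenever gem (an assumption; a hand-written crontab entry works just as well), a minimal config/schedule.rb that runs the task every five minutes might look like this:
| # config/schedule.rb (a sketch using the whenever gem) |
| every 5.minutes do |
|   rake "refresh_materialized_views" |
| end |
Running whenever --update-crontab then writes the corresponding cron entry.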
If users need the view to be absolutely up-to-date with the underlying tables, you can have the database itself refresh it whenever the underlying data changes, by using triggers.
A database trigger is similar to an Active Record callback: it’s code that runs when certain events occur. In our case, we want to refresh our materialized view whenever data changes in any of the tables the view is based on.
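To make that analogy concrete, the pure Active Record version of the idea would be something like the sketch below. We won’t actually do this, because a callback only fires for writes that go through Rails, whereas a database trigger fires for any write to the table, no matter where it comes from.
| # A sketch for comparison only; we'll use database triggers instead |
| class Customer < ApplicationRecord |
|   after_commit do |
|     self.class.connection.execute( |
|       "REFRESH MATERIALIZED VIEW CONCURRENTLY customer_details") |
|   end |
| end |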
To do this, we’ll create a database function that refreshes the materialized view, and then create several triggers that use that function when the data in the relevant tables changes. We can do this all in a Rails migration, so let’s create one where we can put this code:
| $ bin/rails g migration trigger-refresh-customer-details |
| invoke active_record |
| create db/migrate/20161007122606_trigger_refresh_customer_details.rb |
First, we’ll create a function to refresh the materialized view. This requires using Postgres’s PL/pgSQL[71] language. It looks fairly archaic, but we don’t need to use much of it.
| execute %{ |
|   CREATE OR REPLACE FUNCTION |
|     refresh_customer_details() |
|   RETURNS TRIGGER LANGUAGE PLPGSQL |
|   AS $$ |
|   BEGIN |
|     REFRESH MATERIALIZED VIEW CONCURRENTLY customer_details; |
|     RETURN NULL; |
|   EXCEPTION |
|     WHEN feature_not_supported THEN |
|       RETURN NULL; |
|   END $$; |
| } |
The key part of this is RETURNS TRIGGER, which is what allows you to use this function in the triggers you’ll set up next. Also note the exception-handling clause that starts with EXCEPTION. This is similar to Ruby’s rescue keyword and is a way to handle errors that happen at runtime. You can provide any number of WHEN clauses to indicate how to handle a particular exception. In this case, we’re handling feature_not_supported, which is raised if this function runs before the materialized view has been populated. In practice this won’t happen, but it can in our test environment, since we reset the database during our tests.
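For comparison, the same shape in Ruby would be something like this (a sketch for illustration only, not code we need anywhere):
| begin |
|   ActiveRecord::Base.connection.execute( |
|     "REFRESH MATERIALIZED VIEW CONCURRENTLY customer_details") |
| rescue ActiveRecord::StatementInvalid |
|   # analogous to WHEN feature_not_supported THEN RETURN NULL |
| end |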
The form of the trigger we want looks like this:
| CREATE TRIGGER |
|   refresh_customer_details |
| AFTER |
|   INSERT OR |
|   UPDATE OR |
|   DELETE |
| ON |
|   customers |
| FOR EACH STATEMENT |
|   EXECUTE PROCEDURE refresh_customer_details(); |
The code for this trigger reflects what it does: any insert, update, or delete on the customers table causes the database to run refresh_customer_details. So, we just need to set this up for each table that’s relevant.
If we assume that the list of U.S. states never changes, we need triggers on customers and on three other tables: addresses, customers_shipping_addresses, and customers_billing_addresses. Since the code is almost the same for each table, we’ll loop over the table names and construct the SQL dynamically.
| class TriggerRefreshCustomerDetails < ActiveRecord::Migration[5.0] |
|   def up |
|     execute %{ |
|       CREATE OR REPLACE FUNCTION |
|         refresh_customer_details() |
|       RETURNS TRIGGER LANGUAGE PLPGSQL |
|       AS $$ |
|       BEGIN |
|         REFRESH MATERIALIZED VIEW CONCURRENTLY customer_details; |
|         RETURN NULL; |
|       EXCEPTION |
|         WHEN feature_not_supported THEN |
|           RETURN NULL; |
|       END $$; |
|     } |
|     %w(customers |
|        customers_shipping_addresses |
|        customers_billing_addresses |
|        addresses).each do |table| |
|       execute %{ |
|         CREATE TRIGGER refresh_customer_details |
|         AFTER |
|           INSERT OR |
|           UPDATE OR |
|           DELETE |
|         ON #{table} |
|         FOR EACH STATEMENT |
|         EXECUTE PROCEDURE |
|           refresh_customer_details() |
|       } |
|     end |
|   end |
| |
|   def down |
|     %w(customers |
|        customers_shipping_addresses |
|        customers_billing_addresses |
|        addresses).each do |table| |
|       execute %{DROP TRIGGER refresh_customer_details ON #{table}} |
|     end |
|     execute %{DROP FUNCTION refresh_customer_details()} |
|   end |
| end |
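One caveat about this migration: functions and triggers created with raw execute calls can’t be represented in a Ruby-format db/schema.rb. If you didn’t already switch to the SQL schema format when you created the view itself, do so now, so that the structure dump used to set up your test database includes the function and triggers:
| # config/application.rb |
| config.active_record.schema_format = :sql |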
After running bin/rails db:migrate, you can insert a new customer and see the view get refreshed automatically:
| sql> insert into customers( |
| first_name,last_name,email,username,created_at,updated_at) |
| values ( |
| 'Amy','Copeland','amy@amy.dave','amytron',now(),now()); |
| INSERT 0 1 |
| sql> select id from customers where username = 'amytron'; |
| id |
| -------- |
| 350002 |
| sql> insert into customers_shipping_addresses |
| (customer_id,address_id,"primary") |
| values |
| (350002,1,true); |
| INSERT 0 1 |
| sql> insert into customers_billing_addresses |
| (customer_id,address_id) |
| values |
| (350002,1); |
| INSERT 0 1 |
| sql> select * from customer_details where customer_id = 350002; |
| -[ RECORD 1 ]-------+--------------------------- |
| customer_id | 350002 |
| first_name | Amy |
| last_name | Copeland |
| email | amy@amy.dave |
| username | amytron |
| joined_at | 2015-06-26 08:17:17.536305 |
| billing_address_id | 1 |
| billing_street | 123 any st |
| billing_city | washington |
| billing_state | DC |
| billing_zipcode | 20001 |
| shipping_address_id | 1 |
| shipping_street | 123 any st |
| shipping_city | washington |
| shipping_state | DC |
| shipping_zipcode | 20001 |
You’ll notice that the inserts took a lot longer to execute than before. This is the downside of this technique: the materialized view is as up-to-date as it can possibly be, but every write now pays the cost of a full refresh. If your tables see a high volume of inserts, updates, or deletes, you’ll be refreshing the view constantly, and that could slow down your database. You’ll have to evaluate which technique is best, based on your actual usage.
You’ll also notice that your tests are now failing. Let’s fix those before moving on.
The issue you should be seeing is that one of the tests in spec/features/customer_search_spec.rb is failing with a message similar to Couldn't find CustomerDetail with 'customer_id'=11. This is because we’ve switched the controller’s show method to use CustomerDetail, which is backed by our materialized view, and that view requires that a customer have both a billing address and a shipping address: it’s using inner joins (the default when you write just JOIN).
An inner join tells Postgres not to return a row if there’s no matching data in the joined table. In our case, if there is no row in customers_shipping_addresses or customers_billing_addresses for a given customer, the query backing our view won’t return a row for that customer.
We could use an outer join, which tells Postgres to return a row with null values when there’s no related data, but we should only do that if our business rules allow it. In our case, they don’t: every customer must have a billing address and a primary shipping address. So instead of changing the query, we’ll change our test setup so it creates valid data.
First, we need to modify the existing create_customer method you created in Test the Typeahead Search to add the requisite addresses. We’ll assume the existence of a method create_address, which you’ll see in a moment.
| def create_customer(first_name:, last_name:, email: nil) |
|   username = "#{Faker::Internet.user_name}#{rand(1000)}" |
|   email ||= "#{username}#{rand(1000)}@" + |
|             "#{Faker::Internet.domain_name}" |
| |
» |   customer = Customer.create!( |
|     first_name: first_name, |
|     last_name: last_name, |
|     username: username, |
|     email: email |
|   ) |
| |
» |   customer.create_customers_billing_address(address: create_address) |
» |   customer.customers_shipping_address.create!(address: create_address, |
» |                                                primary: true) |
» |   customer |
| end |
If some of these methods are unfamiliar: Active Record provides create_customers_billing_address because we’ve configured the relationship using has_one.[72] Similarly, you can call create! on the customers_shipping_address relation because we used has_many.[73]
Now, let’s see create_address:
| def create_address |
|   state = State.find_or_create_by!( |
|     code: Faker::Address.state_abbr, |
|     name: Faker::Address.state) |
| |
|   Address.create!( |
|     street: Faker::Address.street_address, |
|     city: Faker::Address.city, |
|     state: state, |
|     zipcode: Faker::Address.zip) |
| end |
With that in place, our tests should be passing again.
If you choose not to use triggers to maintain your materialized view, you will need to execute REFRESH MATERIALIZED VIEW customer_details in your tests. You could do this by using RSpec’s hooks, which you learned about in Using DatabaseCleaner to Manage Test Data. Also note that the very first time you run your tests, you might get an error like “ERROR: materialized view 'customer_details' has not been populated.” In that case, manually refresh the materialized view in your test schema by running bin/rails dbconsole test and executing REFRESH MATERIALIZED VIEW customer_details.
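As a sketch of that approach (assuming the spec creates its test data in an earlier before block; hooks in the same example group run in the order they’re defined), you could add a hook like this to the failing spec:
| before do |
|   ActiveRecord::Base.connection.execute( |
|     "REFRESH MATERIALIZED VIEW customer_details") |
| end |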
The path that led us to materialized views was the promise of high performance and code simplicity. It may have felt circuitous, but it’s a great demonstration of the type of power you have as a full-stack developer. By understanding the breadth of tools available to you, and how to use them, you can create solutions that are simple.
Although you had to create a materialized view and triggers to keep it updated, you avoided setting up a new piece of caching infrastructure and got more out of the database you already have in place. Your Rails code looks like regular Rails code (you’re using Active Record to query data), and you didn’t need to write a background process to keep the data updated. Eventually, your query needs may become complex enough that this technique no longer fits, but it takes you a lot further than you could get with most other RDBMSs.