Keeping Materialized Views Updated

The reason our materialized view is so much faster than the regular view is because it essentially caches the results of the backing query into a real table. The trade-off is that the contents of the table could lag behind what’s in the tables that the backing query queries.

Postgres provides a way to refresh the view via REFRESH MATERIALIZED VIEW. Before Postgres 9.4, refreshing materialized views like this was a problem, because it would lock the view while it was being refreshed. That meant that any application that wanted to query the view would have to wait until the update was completed. Since this could potentially be a long time, it meant that materialized views were mostly useless before 9.4.

As of Postgres 9.4, the refresh can be done concurrently in the background, allowing users of the table to continue querying old data until the refresh is complete. This is what we’ll set up here, and it requires running the command REFRESH MATERIALIZED VIEW CONCURRENTLY.

Let’s try it out.

 sql>​​ ​​refresh​​ ​​materialized​​ ​​view​​ ​​concurrently​​ ​​customer_details;
 sql>​​ ​​select​​ ​​*​​ ​​from​​ ​​customer_details​​ ​​where​​ ​​customer_id​​ ​​=​​ ​​388399;
 -[ RECORD 1 ]---------------+------------------------------
 customer_id | 388399
 first_name | Dave
 last_name | Copeland
 email | dave@dave.dave
 username | davetron5000
 joined_at | 2015-06-25 08:28:54.327645
 billing_street | 530 Nienow Stravenue
 billing_city | West Aniyah
 billing_state | RI
 billing_zipcode | 72842-8201
 shipping_address_id | 1
 shipping_street | 530 Nienow Stravenue
 shipping_city | West Aniyah
 shipping_state | RI
 shipping_zipcode | 72842-8201
 shipping_address_created_at | 2015-06-20 16:51:06.891914-04

Now that you know how to refresh the view, the trick is when to do it. This highly depends on how often the underlying data changes and how important it is for you to see the most recent data in the view. We’ll look at two techniques for doing that here. The first is to create a Rake task to refresh the view on a schedule. The second is to use database triggers to refresh the view whenever underlying data changes.

Refreshing the View on a Schedule

The simplest way to refresh the view is to create a Rake task and then arrange for that task to be run on a regular schedule. You can do this by creating lib/tasks/refresh_materialized_views.rake and using the connection method on ActiveRecord::Base, which will allow us to execute arbitrary SQL.

10_materialized-views/10-data-model/shine/lib/tasks/refresh_materialized_views.rake
 desc ​"Refreshes materialized views"
 task ​refresh_materialized_views: :environment​ ​do
  ActiveRecord::Base.connection.execute ​%{
  REFRESH MATERIALIZED VIEW CONCURRENTLY customer_details
  }
 end

You can then run it on the command line via rails:

 $ ​​bin/rails​​ ​​refresh_materialized_views

With this in place, you can then configure our production system to run this periodically, for example using cron. How frequently to run it depends on how recent the data should be to users, as well as how long it takes to do the refresh. If users need the data to be fairly up-to-date, you could try running it every five minutes. If users can do their jobs without the absolute latest, you could run it every hour or even every day.

If users need it to be absolutely up-to-date with the underlying tables, you can have the database itself refresh whenever the underlying data changes by using triggers.

Refreshing the View with Triggers

A database trigger is similar to an Active Record callback: it’s code that runs when certain events occur. In our case, we’d want to refresh our materialized view whenever data in the tables that view is based on changes.

To do this, we’ll create a database function that refreshes the materialized view, and then create several triggers that use that function when the data in the relevant tables changes. We can do this all in a Rails migration, so let’s create one where we can put this code:

 $ ​​bin/rails​​ ​​g​​ ​​migration​​ ​​trigger-refresh-customer-details
 invoke active_record
 create db/migrate/20161007122606_trigger_refresh_customer_details.rb

First, we’ll create a function to refresh the materialized view. This requires using Postgres’s PL/pgSQL[71] language. It looks fairly archaic, but we don’t need to use much of it.

10_materialized-views/10-data-model/shine/db/mi … 161007122606_trigger_refresh_customer_details.rb
 execute ​%{
  CREATE OR REPLACE FUNCTION
  refresh_customer_details()
  RETURNS TRIGGER LANGUAGE PLPGSQL
  AS $$
  BEGIN
  REFRESH MATERIALIZED VIEW CONCURRENTLY customer_details;
  RETURN NULL;
  EXCEPTION
  WHEN feature_not_supported THEN
  RETURN NULL;
  END $$;
 }

The key part of this is RETURNS TRIGGER, which is what will allow you to use this function in the triggers you’ll set up next. Also note the exception-handling clause that starts with EXCEPTION. This is similar to Ruby’s rescue keyword and is a way to handle errors that happen at runtime. You can provide any number of WHEN clauses to indicate how to handle a particular exception. In this case, we’re handling feature_not_supported, which is thrown if we run this function before the materialized view has been updated. In practice this won’t happen, but in our testing environment it can, since we are resetting the database during our tests.

The form of a trigger we want will look like so:

 CREATE​ ​TRIGGER
  refresh_customer_details
 AFTER
 INSERT​ ​OR
 UPDATE​ ​OR
 DELETE
 ON
  customers
 FOR​ ​EACH​ STATEMENT
 EXECUTE​ ​PROCEDURE​ refresh_customer_details();

The code for this trigger reflects what it does: any insert, update, or delete on the customers table causes the database to run refresh_customer_details. So, we just need to set this up for each table that’s relevant.

If we assume that the list of U.S. states doesn’t change, we can set up triggers for the other three tables: ADDRESSES, CUSTOMERS_SHIPPING_ADDRESSES, and CUSTOMERS_BILLING_ADDRESSES. Since the code is almost the same for each table, we’ll loop over the table names and construct the SQL dynamically.

10_materialized-views/10-data-model/shine/db/mi … 161007122606_trigger_refresh_customer_details.rb
 class​ TriggerRefreshCustomerDetails < ActiveRecord::Migration[5.0]
 def​ up
  execute ​%{
  CREATE OR REPLACE FUNCTION
  refresh_customer_details()
  RETURNS TRIGGER LANGUAGE PLPGSQL
  AS $$
  BEGIN
  REFRESH MATERIALIZED VIEW CONCURRENTLY customer_details;
  RETURN NULL;
  EXCEPTION
  WHEN feature_not_supported THEN
  RETURN NULL;
  END $$;
  }
 %w(customers
  customers_shipping_addresses
  customers_billing_addresses
  addresses)​.each ​do​ |table|
  execute ​%{
  CREATE TRIGGER refresh_customer_details
  AFTER
  INSERT OR
  UPDATE OR
  DELETE
  ON #{table}
  FOR EACH STATEMENT
  EXECUTE PROCEDURE
  refresh_customer_details()
  }
 end
 end
 def​ down
 %w(customers
  customers_shipping_addresses
  customers_billing_addresses
  addresses)​.each ​do​ |table|
  execute ​%{DROP TRIGGER refresh_customer_details ON #{table}}
 end
  execute ​%{DROP FUNCTION refresh_customer_details()}
 end
 end

After we run rails db:migrate, you can insert new customers and see the view get refreshed automatically:

 sql>​​ ​​insert​​ ​​into​​ ​​customers(
  first_name,last_name,email,username,created_at,updated_at)
  values (
  'Amy','Copeland','amy@amy.dave','amytron',now(),now());
 INSERT 0 1
 sql>​​ ​​select​​ ​​id​​ ​​from​​ ​​customers​​ ​​where​​ ​​username​​ ​​=​​ ​​'amytron'​​;
  id
 --------
  350002
 sql>​​ ​​insert​​ ​​into​​ ​​customers_shipping_addresses
  (customer_id,address_id,"primary")
  values
  (350002,1,true);
 INSERT 0 1
 sql>​​ ​​insert​​ ​​into​​ ​​customers_billing_addresses
  (customer_id,address_id)
  values
  (350002,1);
 INSERT 0 1
 > select * from customer_details where customer_id = 350002;
 -[ RECORD 1 ]-------+---------------------------
 customer_id | 350002
 first_name | Amy
 last_name | Copeland
 email | amy@amy.dave
 username | amytron
 joined_at | 2015-06-26 08:17:17.536305
 billing_address_id | 1
 billing_street | 123 any st
 billing_city | washington
 billing_state | DC
 billing_zipcode | 20001
 shipping_address_id | 1
 shipping_street | 123 any st
 shipping_city | washington
 shipping_state | DC
 shipping_zipcode | 20001

You’ll notice that the inserts took a lot longer to execute than before. This is the downside of this technique. The materialized view is as up-to-date as it possibly can be; however, it updates slowly. If your table will have a high volume of writes or updates, you’ll be refreshing the view a lot, and this could slow down your database. You’ll have to evaluate which technique will be best, based on your actual usage.

One thing you’ll notice is that your tests are now failing. Let’s fix those before moving on.

Altering Tests to Work with the Materialized View

The issue you should be seeing is that one of the tests in spec/features/customer_search_spec.rb is failing with a message similar to Couldn’t find CustomerDetail with ’customer_id’=11. This is because we’ve switched the controller’s show method to use CustomerDetail, which is backed by our materialized view, and that view requires that a customer have a billing address and shipping address. It’s using inner joins (the default when you just use JOIN).

An inner join is a way to tell Postgres not to return data if there isn’t data in a related table. In our case, if there is no row in customers_shipping_addresses or customers_billing_addresses for the given customer, the query backing our view won’t return a row.

We could use an outer join, which would tell Postgres to return empty values when there isn’t related data, but this should only be done if it’s allowed by our business rules. In our case, it’s not. Every customer must have a billing address and a primary shipping address, so instead of changing our query, we’ll change our test data so it’s creating valid data.

First, we need to modify the existing create_customer method you created in Test the Typeahead Search to add the requisite addresses. We’ll assume the existence of a method create_address, which you’ll see in a moment.

10_materialized-views/10-data-model/shine/spec/features/customer_search_spec.rb
 def​ create_customer(first_name:, last_name:, ​email: ​​nil​)
  username = ​"​​#{​Faker::Internet.user_name​}#{​rand(1000)​}​​"
  email ||= ​"​​#{​username​}#{​rand(1000)​}​​@"​ +
 "​​#{​Faker::Internet.domain_name​}​​"
 
» customer = Customer.create!(
 first_name: ​first_name,
 last_name: ​last_name,
 username: ​username,
 email: ​email
  )
 
» customer.create_customers_billing_address(​address: ​create_address)
» customer.customers_shipping_address.create!(​address: ​create_address,
»primary: ​​true​)
» customer
 end

If you aren’t familiar with some of the methods Active Record creates for you, create_customers_billing_address is provided because we’ve configured the relationship using a has_one.[72] Similarly, you can call create! on the customers_shipping_address relation because we used has_many.[73]

Now, let’s see create_address:

10_materialized-views/10-data-model/shine/spec/features/customer_search_spec.rb
 def​ create_address
  state = State.find_or_create_by!(
 code: ​Faker::Address.state_abbr,
 name: ​Faker::Address.state)
 
  Address.create!(
 street: ​Faker::Address.street_address,
 city: ​Faker::Address.city,
 state: ​state,
 zipcode: ​Faker::Address.zip)
 end

With that in place, our tests should be passing again.

If you choose not to use triggers to maintain your materialized view, you will need to execute REFRESH MATERIALIZED VIEW customer_details in your tests. You could do this by using RSpec’s hooks, which you learned about in Using DatabaseCleaner to Manage Test Data. Also note that the very first time you run your test, you might get an error like “ERROR: materialized view ’customer_details’ has not been populated.” In that case, manually refresh the materialized view in your test schema by running bin/rails dbconsole test and executing REFRESH MATERIALIZED VIEW customer_details.

The path that led us to materialized views was the promise of high performance and code simplicity. It may have felt circuitous, but it’s a great demonstration of the type of power you have as a full-stack developer. By understanding the breadth of tools available to you, and how to use them, you can create solutions that are simple.

Although you had to create a materialized view and triggers to keep it updated, you were able to avoid setting up a new piece of infrastructure for caching and can get more out of the database system we already have in place. Our Rails code looks like regular Rails code (we’re using Active Record to query data), and you didn’t need to write a background process to keep our data updated. Eventually, you’ll have complex enough query needs that you can’t use this technique, but you’re getting a lot further than you would with other RDBMSs.