Our materialized view is so much faster than the regular view because it essentially caches the results of the backing query in a real table. The trade-off is that the view’s contents can lag behind the tables the backing query reads from.
Postgres provides a way to refresh the view via REFRESH MATERIALIZED VIEW. Before Postgres 9.4, refreshing a materialized view this way was a problem, because it locked the view for the duration of the refresh. That meant any application that wanted to query the view had to wait until the refresh completed. Since a refresh can take a long time, this made materialized views mostly useless before 9.4.
As of Postgres 9.4, the refresh can be done concurrently, allowing users of the view to keep querying the old data until the refresh is complete. This is what we’ll set up here, and it requires running the command REFRESH MATERIALIZED VIEW CONCURRENTLY.
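One prerequisite worth noting: Postgres allows REFRESH MATERIALIZED VIEW CONCURRENTLY only if the materialized view has at least one unique index covering all of its rows. If you created such an index on customer_id when you built the view, you’re all set; if not, a migration along these lines would add one (a sketch; the class and index names are arbitrary):
| class IndexCustomerDetailsOnCustomerId < ActiveRecord::Migration[5.0] |
|   def up |
|     execute %{ |
|       CREATE UNIQUE INDEX customer_details_customer_id |
|         ON customer_details(customer_id) |
|     } |
|   end |
| |
|   def down |
|     execute %{DROP INDEX customer_details_customer_id} |
|   end |
| end |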
Let’s try it out.
| sql> refresh materialized view concurrently customer_details; |
| sql> select * from customer_details where customer_id = 388399; |
| -[ RECORD 1 ]---------------+------------------------------ |
| customer_id | 388399 |
| first_name | Dave |
| last_name | Copeland |
| email | dave@dave.dave |
| username | davetron5000 |
| joined_at | 2015-06-25 08:28:54.327645 |
| billing_street | 530 Nienow Stravenue |
| billing_city | West Aniyah |
| billing_state | RI |
| billing_zipcode | 72842-8201 |
| shipping_address_id | 1 |
| shipping_street | 530 Nienow Stravenue |
| shipping_city | West Aniyah |
| shipping_state | RI |
| shipping_zipcode | 72842-8201 |
| shipping_address_created_at | 2015-06-20 16:51:06.891914-04 |
Now that you know how to refresh the view, the trick is knowing when to do it. That depends heavily on how often the underlying data changes and how important it is for users to see the most recent data in the view. We’ll look at two techniques here. The first is to create a Rake task that refreshes the view on a schedule. The second is to use database triggers to refresh the view whenever the underlying data changes.
The simplest way to refresh the view is to create a Rake task and arrange for that task to run on a regular schedule. You can do this by creating lib/tasks/refresh_materialized_views.rake and using the connection method on ActiveRecord::Base, which allows you to execute arbitrary SQL.
| desc "Refreshes materialized views" |
| task refresh_materialized_views: :environment do |
| ActiveRecord::Base.connection.execute %{ |
| REFRESH MATERIALIZED VIEW CONCURRENTLY customer_details |
| } |
| end |
You can then run it on the command line via bin/rails:
| $ bin/rails refresh_materialized_views |
With this in place, you can configure your production system to run this task periodically, for example using cron. How frequently to run it depends on how fresh the data needs to be for users, as well as how long the refresh takes. If users need the data to be fairly up-to-date, you could try running it every five minutes. If users can do their jobs without the absolute latest data, you could run it every hour or even every day.
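For example, if you happen to use the whenever gem (an assumption; a hand-written crontab entry works just as well), a minimal config/schedule.rb that runs the task every five minutes might look like this:
| # config/schedule.rb (a sketch using the whenever gem) |
| every 5.minutes do |
|   rake "refresh_materialized_views" |
| end |
Running whenever --update-crontab then writes the corresponding cron entry.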
If users need the view to be absolutely up-to-date with the underlying tables, you can have the database itself refresh it whenever the underlying data changes, by using triggers.
A database trigger is similar to an Active Record callback: it’s code that runs when certain events occur. In our case, we want to refresh our materialized view whenever data changes in any of the tables the view is based on.
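To make that analogy concrete, the pure Active Record version of the idea would be something like the sketch below. We won’t actually do this, because a callback only fires for writes that go through Rails, whereas a database trigger fires for any write to the table, no matter where it comes from.
| # A sketch for comparison only; we'll use database triggers instead |
| class Customer < ApplicationRecord |
|   after_commit do |
|     self.class.connection.execute( |
|       "REFRESH MATERIALIZED VIEW CONCURRENTLY customer_details") |
|   end |
| end |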
To do this, we’ll create a database function that refreshes the materialized view, and then create several triggers that use that function when the data in the relevant tables changes. We can do this all in a Rails migration, so let’s create one where we can put this code:
| $ bin/rails g migration trigger-refresh-customer-details |
| invoke active_record |
| create db/migrate/20161007122606_trigger_refresh_customer_details.rb |
First, we’ll create a function to refresh the materialized view. This requires using Postgres’s PL/pgSQL[71] language. It looks fairly archaic, but we don’t need to use much of it.
| execute %{ |
|   CREATE OR REPLACE FUNCTION |
|     refresh_customer_details() |
|   RETURNS TRIGGER LANGUAGE PLPGSQL |
|   AS $$ |
|   BEGIN |
|     REFRESH MATERIALIZED VIEW CONCURRENTLY customer_details; |
|     RETURN NULL; |
|   EXCEPTION |
|     WHEN feature_not_supported THEN |
|       RETURN NULL; |
|   END $$; |
| } |
The key part of this is RETURNS TRIGGER, which is what allows you to use this function in the triggers you’ll set up next. Also note the exception-handling clause that starts with EXCEPTION. This is similar to Ruby’s rescue keyword and is a way to handle errors that happen at runtime. You can provide any number of WHEN clauses to indicate how to handle a particular exception. In this case, we’re handling feature_not_supported, which is raised if this function runs before the materialized view has been populated. In practice this won’t happen, but it can in our test environment, since we reset the database during our tests.
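For comparison, the same shape in Ruby would be something like this (a sketch for illustration only, not code we need anywhere):
| begin |
|   ActiveRecord::Base.connection.execute( |
|     "REFRESH MATERIALIZED VIEW CONCURRENTLY customer_details") |
| rescue ActiveRecord::StatementInvalid |
|   # analogous to WHEN feature_not_supported THEN RETURN NULL |
| end |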
The form of the trigger we want looks like this:
| CREATE TRIGGER |
|   refresh_customer_details |
| AFTER |
|   INSERT OR |
|   UPDATE OR |
|   DELETE |
| ON |
|   customers |
| FOR EACH STATEMENT |
|   EXECUTE PROCEDURE refresh_customer_details(); |
The code for this trigger reflects what it does: any insert, update, or delete on the customers table causes the database to run refresh_customer_details. So, we just need to set this up for each table that’s relevant.
If we assume that the list of U.S. states never changes, we need triggers on customers and on three other tables: addresses, customers_shipping_addresses, and customers_billing_addresses. Since the code is almost the same for each table, we’ll loop over the table names and construct the SQL dynamically.
| class TriggerRefreshCustomerDetails < ActiveRecord::Migration[5.0] |
|   def up |
|     execute %{ |
|       CREATE OR REPLACE FUNCTION |
|         refresh_customer_details() |
|       RETURNS TRIGGER LANGUAGE PLPGSQL |
|       AS $$ |
|       BEGIN |
|         REFRESH MATERIALIZED VIEW CONCURRENTLY customer_details; |
|         RETURN NULL; |
|       EXCEPTION |
|         WHEN feature_not_supported THEN |
|           RETURN NULL; |
|       END $$; |
|     } |
|     %w(customers |
|        customers_shipping_addresses |
|        customers_billing_addresses |
|        addresses).each do |table| |
|       execute %{ |
|         CREATE TRIGGER refresh_customer_details |
|         AFTER |
|           INSERT OR |
|           UPDATE OR |
|           DELETE |
|         ON #{table} |
|         FOR EACH STATEMENT |
|         EXECUTE PROCEDURE |
|           refresh_customer_details() |
|       } |
|     end |
|   end |
| |
|   def down |
|     %w(customers |
|        customers_shipping_addresses |
|        customers_billing_addresses |
|        addresses).each do |table| |
|       execute %{DROP TRIGGER refresh_customer_details ON #{table}} |
|     end |
|     execute %{DROP FUNCTION refresh_customer_details()} |
|   end |
| end |
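One caveat about this migration: functions and triggers created with raw execute calls can’t be represented in a Ruby-format db/schema.rb. If you didn’t already switch to the SQL schema format when you created the view itself, do so now, so that the structure dump used to set up your test database includes the function and triggers:
| # config/application.rb |
| config.active_record.schema_format = :sql |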
After running bin/rails db:migrate, you can insert a new customer and see the view get refreshed automatically:
| sql> insert into customers( |
| first_name,last_name,email,username,created_at,updated_at) |
| values ( |
| 'Amy','Copeland','amy@amy.dave','amytron',now(),now()); |
| INSERT 0 1 |
| sql> select id from customers where username = 'amytron'; |
| id |
| -------- |
| 350002 |
| sql> insert into customers_shipping_addresses |
| (customer_id,address_id,"primary") |
| values |
| (350002,1,true); |
| INSERT 0 1 |
| sql> insert into customers_billing_addresses |
| (customer_id,address_id) |
| values |
| (350002,1); |
| INSERT 0 1 |
| sql> select * from customer_details where customer_id = 350002; |
| -[ RECORD 1 ]-------+--------------------------- |
| customer_id | 350002 |
| first_name | Amy |
| last_name | Copeland |
| email | amy@amy.dave |
| username | amytron |
| joined_at | 2015-06-26 08:17:17.536305 |
| billing_address_id | 1 |
| billing_street | 123 any st |
| billing_city | washington |
| billing_state | DC |
| billing_zipcode | 20001 |
| shipping_address_id | 1 |
| shipping_street | 123 any st |
| shipping_city | washington |
| shipping_state | DC |
| shipping_zipcode | 20001 |
You’ll notice that the inserts took a lot longer to execute than before. This is the downside of this technique: the materialized view is as up-to-date as it can possibly be, but every write now pays the cost of a full refresh. If your tables see a high volume of inserts, updates, or deletes, you’ll be refreshing the view constantly, and that could slow down your database. You’ll have to evaluate which technique is best, based on your actual usage.
You’ll also notice that your tests are now failing. Let’s fix those before moving on.
The issue you should be seeing is that one of the tests in spec/features/customer_search_spec.rb is failing with a message similar to Couldn't find CustomerDetail with 'customer_id'=11. This is because we’ve switched the controller’s show method to use CustomerDetail, which is backed by our materialized view, and that view requires that a customer have both a billing address and a shipping address: it’s using inner joins (the default when you write just JOIN).
An inner join tells Postgres not to return a row if there’s no matching data in the joined table. In our case, if there is no row in customers_shipping_addresses or customers_billing_addresses for a given customer, the query backing our view won’t return a row for that customer.
We could use an outer join, which tells Postgres to return a row with null values when there’s no related data, but we should only do that if our business rules allow it. In our case, they don’t: every customer must have a billing address and a primary shipping address. So instead of changing the query, we’ll change our test setup so it creates valid data.
First, we need to modify the existing create_customer method you created in Test the Typeahead Search to add the requisite addresses. We’ll assume the existence of a method create_address, which you’ll see in a moment.
| def create_customer(first_name:, last_name:, email: nil) |
|   username = "#{Faker::Internet.user_name}#{rand(1000)}" |
|   email ||= "#{username}#{rand(1000)}@" + |
|             "#{Faker::Internet.domain_name}" |
| |
» |   customer = Customer.create!( |
|     first_name: first_name, |
|     last_name: last_name, |
|     username: username, |
|     email: email |
|   ) |
| |
» |   customer.create_customers_billing_address(address: create_address) |
» |   customer.customers_shipping_address.create!(address: create_address, |
» |                                                primary: true) |
» |   customer |
| end |
If some of these methods are unfamiliar: Active Record provides create_customers_billing_address because we’ve configured the relationship using has_one.[72] Similarly, you can call create! on the customers_shipping_address relation because we used has_many.[73]
Now, let’s see create_address:
| def create_address |
|   state = State.find_or_create_by!( |
|     code: Faker::Address.state_abbr, |
|     name: Faker::Address.state) |
| |
|   Address.create!( |
|     street: Faker::Address.street_address, |
|     city: Faker::Address.city, |
|     state: state, |
|     zipcode: Faker::Address.zip) |
| end |
With that in place, our tests should be passing again.
If you choose not to use triggers to maintain your materialized view, you will need to execute REFRESH MATERIALIZED VIEW customer_details in your tests. You could do this by using RSpec’s hooks, which you learned about in Using DatabaseCleaner to Manage Test Data. Also note that the very first time you run your tests, you might get an error like “ERROR: materialized view 'customer_details' has not been populated.” In that case, manually refresh the materialized view in your test schema by running bin/rails dbconsole test and executing REFRESH MATERIALIZED VIEW customer_details.
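As a sketch of that approach (assuming the spec creates its test data in an earlier before block; hooks in the same example group run in the order they’re defined), you could add a hook like this to the failing spec:
| before do |
|   ActiveRecord::Base.connection.execute( |
|     "REFRESH MATERIALIZED VIEW customer_details") |
| end |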
The path that led us to materialized views was the promise of high performance and code simplicity. It may have felt circuitous, but it’s a great demonstration of the type of power you have as a full-stack developer. By understanding the breadth of tools available to you, and how to use them, you can create solutions that are simple.
Although you had to create a materialized view and triggers to keep it updated, you avoided setting up a new piece of caching infrastructure and got more out of the database you already have in place. Your Rails code looks like regular Rails code (you’re using Active Record to query data), and you didn’t need to write a background process to keep the data updated. Eventually, your query needs may become complex enough that this technique no longer fits, but it takes you a lot further than you could get with most other RDBMSs.