Preventing Bad Data Using Check Constraints

If you’ve done any database work at all, you’re no doubt familiar with a “not null” constraint that prevents inserting null into a column of the database:

 CREATE​ ​TABLE​ people (
  id ​INT​ ​NOT​ ​NULL​,
 name​ ​VARCHAR​(255) ​NOT​ ​NULL​,
  birthdate ​DATE​ ​NULL
 );

In this table, id and name may not be NULL, but birthdate may be. Postgres takes the “null constraint” concept much further by allowing arbitrary constraints on fields. Postgres also has support for regular expressions. This means you can create a constraint on your email field that requires its value to match the same regular expression you used in our Rails code. This would prevent non-company email addresses from being inserted into the table entirely.

First, create a new migration where you can add this constraint:

 $ ​​bundle​​ ​​exec​​ ​​rails​​ ​​g​​ ​​migration​​ ​​add-email-constraint-to-users
  invoke active_record
  create db/migrate/20160718143725_add_email_constraint_to_users.rb

The Domain-Specific Language (DSL) for writing Rails migrations doesn’t provide any means of creating this constraint, so you have to do it in straight SQL. Although Postgres Data Definition Language (DDL) looks different from what is normally used in migrations, it’s still relatively straightforward and well documented online.[28]

The basic structure of our constraint is that we want to “alter” the USERS to “add” a constraint that will “check” the email column for invalid values. Here’s what our migration will look like (see the following sidebar to learn why we’re using the older up and down methods):

3_postgres-login/10-constraint/shine/db/migrate/20160718143725_add_email_constraint_to_users.rb
 class​ AddEmailConstraintToUsers < ActiveRecord::Migration[5.0]
 def​ up
  execute ​%{
  ALTER TABLE
  users
  ADD CONSTRAINT
  email_must_be_company_email
  CHECK ( email ~* '^[^@]+@example​​\\​​.com$' )
  }
 end
 
 def​ down
  execute ​%{
  ALTER TABLE
  users
  DROP CONSTRAINT
  email_must_be_company_email
  }
 end
 end

The ~* operator is how Postgres does regular expression matching. Therefore, this code means that the email column’s value must match the regular expression we’ve given or the insert or update command will fail. The regular expression is more or less identical to the one we used when configuring Devise.

Let’s see it in action by first running the migrations.

 $ ​​bundle​​ ​​exec​​ ​​rails​​ ​​db:migrate
 == 20160718143725 AddEmailConstraintToUsers: migrating =============
 -- execute("
  ALTER TABLE
  users
  ADD CONSTRAINT
  email_must_be_company_email
  CHECK ( email ~* '^[^@]+@example\\.com' )
  ")
  ->​​ ​​0.0324s
 == 20160718143725 AddEmailConstraintToUsers: migrated (0.0324s) ====

If you ran the migrations and saw something like the following error, you’d need to do a bit more work to apply this change.

 $ ​​bundle​​ ​​exec​​ ​​rails​​ ​​db:migrate
 ActiveRecord::StatementInvalid: PG::CheckViolation: ERROR:
  check constraint "email_must_be_company_email" is violated by some row:
  ALTER TABLE
  users
  ADD CONSTRAINT
  email_must_be_company_email
  CHECK ( email ~* '[A-Za-z0-9._%-]+@example\.com' )
  ;

This means that at least one row in your development database has a value for the email column that violates our new constraint. Postgres is refusing to apply the constraint because it doesn’t know what to do.

In your development environment, it’s safe to manipulate the database or just blow it away. If you are seeing this issue, I recommend you just delete all the rows from the table since that’s easy to do via delete from users;. If you were doing this to an active, production data set, you would not have that luxury. You would need to get more creative. There are several ways of handling this:

In any case, if you’re adding constraints to a running production system, you’ll have to be more careful.

With the migration applied, let’s see how it works. First, insert a user whose email is on our company’s domain:

 $ ​​bundle​​ ​​exec​​ ​​rails​​ ​​dbconsole
 shine_development>​​ ​​INSERT​​ ​​INTO
  users (
  email,
  encrypted_password,
  created_at,
  updated_at
  )
  VALUES (
  'foo@example.com',
  '$abcd',
  now(),
  now()
  );
 INSERT 0 1

This works as expected. Now let’s try to insert a user using a different domain:

 shine_development>​​ ​​INSERT​​ ​​INTO
  users (
  email,
  encrypted_password,
  created_at,
  updated_at
  )
  VALUES (
  'foo@bar.com',
  '$abcd',
  now(),
  now()
  );
 ERROR: new row for relation "users" violates
  check constraint "email_must_be_company_email"
 DETAIL: Failing row contains (4,
  foo@bar.com,
  $abcd,
  null,
  null,
  null,
  0,
  null,
  null,
  null,
  null,
  '2015-03-03:12:12:14.000',
  '2015-03-03:12:12:14.000'0).

You can see that Postgres refuses to allow invalid data into the table (and you get a pretty useful error message as well). This means that a rogue application, bug in the code, or even a developer at a production console will not be able to allow access to any user who doesn’t have a company email address.

Given how little effort this was, and the peace of mind it gives us, it’s a no-brainer to add this level of security. Postgres makes it simple, meaning the cost of securing our website is low.

There’s one last thing we’ll need to change because we’re using a feature that’s Postgres-specific. By default, Rails stores a snapshot of the database schema in db/schema.rb, which is a Ruby source file using the DSL for Rails migrations. Rails creates this by examining the database schema and creating what is essentially a single migration, in Ruby, to create the schema from scratch. This is what tests use to create a fresh database.

The problem is that Rails doesn’t know about check constraints, so the one we just added won’t be present in db/schema.rb. This is easily remedied by telling Rails to use SQL, rather than Ruby, for storing the schema. You can do this by adding one line to config/application.rb.

3_postgres-login/10-constraint/shine/config/application.rb
 module​ Shine
 class​ Application < Rails::Application
» config.active_record.schema_format = ​:sql
 end
 end

You’ll then need to remove the old db/schema.rb file, create db/structure.sql by running migrations, and finally reset your test database by dropping it and re-creating it. You can do all this with rake:

 $ ​​rm​​ ​​db/schema.rb
 $ ​​bundle​​ ​​exec​​ ​​rails​​ ​​db:migrate
 $ ​​RAILS_ENV=test​​ ​​bundle​​ ​​exec​​ ​​rails​​ ​​db:drop
 $ ​​RAILS_ENV=test​​ ​​bundle​​ ​​exec​​ ​​rails​​ ​​db:create

You May Get Churn in db/structure.sql

images/aside-icons/note.png

Because db/structure.sql is a Postgres-specific dump of the schema, certain aspects of it are dependent on the local environment. For example, if you use add_foreign_key, the names Postgres auto-generates might be different on different machines.

It’s not a big problem for your application’s behavior, since db/structure.sql is not used in production, but it can make for unnecessary churn in your version control history. You can combat this by tightly controlling the versions of Postgres each developer is using, providing explicit names for constraints and indexes, and not committing spurious changes to the file.