Hashrocket.com / blog

Large polymorphic associations

Modeling Polymorphic Associations in a Relational Database

posted on and written by in

Image 100x100 jack christensen

Polymorphic associations can be difficult to correctly represent in a relational database. In this post I will compare and contrast four different ways to model these challenging relationships.

Imagine you are tasked with designing a fine-grained access control system for a large system. There are many resources across multiple types with permissions such as read, write, and admin. Permissions can be granted to users or user groups.

A good first step is to introduce an access control list entity. This ACL entity will be responsible for the entire process or determining whether a user has a permission on a resource. Every access controlled resource should have its own ACL. The ACL concept itself would be rather complicated, probably composed of several tables. But for this example we won't concern ourselves with the implementation of the ACL, instead we will use a stub acl table and focus on the challenge of how to relate an acl to arbitrary resource types. For this example, the resource types we want to access control will be document, image, file, and report.

Polymorphic Joins

A simple approach to connect an acl to a resource is to use two columns on the acl table: resource_type and resource_id. This approach was popularized by Ruby on Rails. The acl table could be defined as follows:

create table acl(
  id serial primary key,
  resource_type varchar not null,
  resource_id integer not null,
  -- other fields omitted
  unique(resource_id, resource_type)
);

The query for retrieving an acl for document id:42 would be like the following:

select *
from acl
where resource_type='document'
  and resource_id=42;

A serious problem with this approach is the database is very limited in the data integrity it can enforce due to the lack of foreign key constraints. It can ensure a resource has no more that one acl, but that is all. A resource can be missing an acl and an acl can point to a missing resource.

Join Table Per Relationship Type

Another approach is to use a join table for each relationship type.

create table acl_document(
  acl_id integer not null unique references acl,
  document_id integer not unique null references document
);

create table acl_image(
  acl_id integer not null unique references acl,
  image_id integer not null unique references image
);

-- and so on for each resource type

The query for retrieving an acl for document id:42 would be like the following:

select acl.*
from acl
  join acl_document on acl_document.acl_id=acl.id
where document_id=42;

This approach does use foreign key constraints, so the database can ensure that any connections between an acl and a resource are valid. However, it has no way to require that a resource has an acl. And while the uniqueness constraints ensure that an acl cannot be connected to multiple records of the same type (e.g. two document rows) it has no way to prevent an acl from being connected to resources of different types (e.g. one acl could be incorrectly connected to both a document and an image).

Reverse Belongs-To

Even though the acl logically belongs to the resource, this relationship can be reversed by including an acl_id on the resource. This is similar in implementation to class table inheritance, but the relationship is more of inclusion rather than is-a.

create table document(
  id serial primary key,
  acl_id integer not null unique references acl,
  --- other fields omitted
);

With this design, all resources are guaranteed to have valid references to an acl. However, there is no way to prevent orphan acl records. As with the previous technique there is no way to prevent different types of resources from pointing to the same acl.

Exclusive Belongs To (AKA Exclusive Arc)

In this model, the acl has foreign keys to all tables to which it can belong.

create table acl(
  id serial primary key,
  document_id integer references document,
  image_id integer references image,
  file_id integer references file,
  report_id integer references report,
  -- other fields omitted
  check(
    (
      (document_id is not null)::integer +
      (image_id is not null)::integer +
      (file_id is not null)::integer +
      (report_id is not null)::integer 
    ) = 1
  )
);

create unique index on acl (document_id) where document_id is not null;
create unique index on acl (image_id) where image_id is not null;
create unique index on acl (file_id) where file_id is not null;
create unique index on acl (report_id) where report_id is not null;

Take note of the check constraint. This ensures that an acl belongs to exactly one resource of any type. With this design an acl cannot be orphaned, but there is no way to enforce that a resource has an acl. Also important are the partial unique indexes. Limiting the unique indexes to only not null values dramatically saves space as well as reducing write operations on insert.

How to Choose

I usually dismiss a polymorphic join immediately due to its lack of data integrity guarantees. The only advantage it has is an ORM such as Rails' ActiveRecord may make it very easy to use.

Join table per relationship type is an improvement over polymorphic joins, but at the cost of an extra table per relationship. In addition, it doesn't do anything that either of the belongs-to models can't do better.

Both belongs-to models are nearly equivalent in their data integrity and performance characteristics. The major difference is whether the acl or the resource has to exist first to satisfy the foreign key constraint. As a resource is naturally the master record it is usually more convenient and intuitive to insert the resource first. This leads me to slightly slightly prefer exclusive belongs-to.

But there are a few concerns with exclusive belongs-to.

First, multiple null fields offends traditional relational design sensibilities. However, data integrity is maintained via the check constraint.

Second, wide tables may cause performance to suffer. But in the case of PostgreSQL, null values are almost free. A nullable field has a 1-bit per row overhead rounded up to the nearest byte (e.g. 30 nullable fields have 4 bytes of overhead). Combine that with partial indexes ignoring null values and the performance is not an issue.

Third, adding a new table requires adding a column to the exclusive belongs-to table. If this was a large, heavily used table there might be an issue with how long the table would be locked. With PostgreSQL, this is not a problem. Nullable fields can be added quickly regardless of table size. The updated check constraint can also be added without blocking concurrent usage.

In conclusion, I suggest using an exclusive belongs-to model to represent a polymorphic association. If situation or developer sensibilities preclude this, then use a reverse belongs-to model. It has nearly the same characteristics.