<home

Exclusive Subtyping in SQL

This is a revised version of a blog post from 2007 called Distributed Keys and Disjoint Subtypes. A copy of that original blog post is linked at the bottom of this page.

Subtyping is a very common data modelling scenario that gets extensive coverage in books on relational design [1] but it isn’t necessarily well supported by SQL DBMSs. The following is a simple technique for implementing exclusive subtypes using what Hugh Darwen calls a distributed key – a uniqueness constraint spanning multiple tables [2].

First, here’s an example of a scenario requiring mutually exclusive subtypes. A retailer sells three types of product: Books, CDs and DVDs. In their product database all these product types have two attributes in common: SKU and Title. They also have other attributes that are unique to one type or another. The unique attributes are: Number of Pages (for Books), Number of Tracks (for CDs) and Duration (DVDs). Here’s an ORM diagram for product and subtypes [3]:

Note the exclusive-or constraint represented by the symbol: . What it means is that a Product is ether a DVD or a CD or a Book but it can’t be more than one of those things. In other words, those categories of products are mutually exclusive.

Most textbooks don’t seem to address the problem of how to enforce such a constraint in SQL. ISO Standard SQL has no special exclusion constraint or table subtype constraint syntax. Assertions (the generalised form of constraints in standard SQL) aren't implemented by most SQL DBMSs - at least not by the products I use.

The solution I like to use is to create separate tables for the generic (supertype) attributes of a product and for each product subtype. Include a ProductType attribute in both the supertype and the subtype tables. ProductType is a code identifies the type of product: B,C or D for Book, CD or DVD. Then add a composite foreign key constraint on the superkey (SKU, ProductType) in each subtype table. Here are the constraints for the Books table:

ALTER TABLE Books
  ADD CONSTRAINT BooksProductsProductTypeFK
  FOREIGN KEY (SKU, ProductType)
  REFERENCES Products (SKU, ProductType);

ALTER TABLE Books
  ADD CONSTRAINT BooksProductTypeCK
  CHECK (ProductType ='B');

Similar constraints apply to CDs and DVDs. The following is the DDL for the complete schema. To aid readability I’ve omitted the constraint names. I’ve also added some defaults:

CREATE TABLE Products
 (SKU INT NOT NULL PRIMARY KEY,
  ProductType CHAR(1) NOT NULL
  CHECK (ProductType IN ('B','C','D')),
  Title VARCHAR(50) NOT NULL,
  UNIQUE (SKU,ProductType));

CREATE TABLE Books
 (SKU INT NOT NULL PRIMARY KEY,
  ProductType CHAR(1) DEFAULT 'B' NOT NULL, CHECK (ProductType ='B'),
  Pages SMALLINT NOT NULL,
  FOREIGN KEY (SKU,ProductType)
  REFERENCES Products (SKU,ProductType));

CREATE TABLE CDs
 (SKU INT NOT NULL PRIMARY KEY,
  ProductType CHAR(1) DEFAULT 'C' NOT NULL, CHECK (ProductType ='C'),
  Tracks SMALLINT NOT NULL,
  FOREIGN KEY (SKU,ProductType)
  REFERENCES Products (SKU,ProductType));

CREATE TABLE DVDs
 (SKU INT NOT NULL PRIMARY KEY,
  ProductType CHAR(1) DEFAULT 'D' NOT NULL, CHECK (ProductType ='D'),
  Duration SMALLINT NOT NULL,
  FOREIGN KEY (SKU,ProductType)
  REFERENCES Products (SKU,ProductType));

That’s all there is to it. The requirement that product types are mutually exclusive is enforced by the combination of the composite foreign key constraints and check constraints. The presence of the ProductType attribute in the subtype tables is a violation of 2nd Normal Form. The check constraints control that aspect of redundancy by ensuring that the product type is correct and consistent in each table.

Notice the apparently redundant uniqueness constraint on (SKU,ProductType) in the Products table. SQL requires that the set of columns in a FOREIGN KEY constraint exactly matches the set of columns of a uniqueness constraint in the table being referenced, even if - as in this case - a subset of those columns already has such a constraint.

I don't claim any originality for this design pattern. When I blogged an earlier version of this article in 2007 [4] I asked if anyone knew of another source for the same technique and Jon Heggland referred me to Joe Celko's book Trees and Hierarchies in SQL (Chapter 10). Joe later used my example in the 4th edition of his SQL for Smarties book (page 83).

Thanks to Hugh Darwen, Jon Heggland, Claes Welander, Johan Dufour and others on the Third Manifesto mailing list who commented on the original version of this article.

[1] See: Halpin - Information Modeling and Relational Databases; Pascal - Practical Issues in Database Management.

[2] "A 'distributed key' value is one that must exist as a key value in no more than one of the tables over which the constraint in question is defined", How To Handle Missing Information Without Using NULL, Hugh Darwen

[3] ORM 2 notation

[4] Distributed Keys and Disjoint Subtypes