<home

The Principle of Orthogonal Design

The Principle of Orthogonal Design (POD) is a database design principle devised by Chris Date and Dave McGoveran[1]. It is a formalised version of a familiar idea in software engineering, namely "Don't repeat yourself" or "Once and once only". POD tries to define precisely what kind of repetition ought to be avoided in a relational database design. It complements other design principles, such as Normalization, and helps to identify and eliminate problems that won't necessarily be avoided just by satisfying the Normal Forms.

The Principle of Orthogonal Design
Let A and B be two distinct relvars. There must not exist nonloss decompositions of A and B into A1, A2, …, Am and B1, …, Bn (respectively) such that two distinct projections in the set A1, A2, …, Am, B1, B2, …, Bn have overlapping meanings.

What this means is that database tables should not overlap in their meaning either in whole or in part. At its simplest, POD is violated if two distinct relvars (tables) consist of the same attributes. Take the example of a personnel database where each department's employees are held in separate but identical tables.


SALES
EmployeeNum EmployeeName
10372 Alice
10442 Bob
10433 Charles

MARKETING
EmployeeNum EmployeeName
10185 Doug
10251 Ellen
10575 Frank

Underlining denotes the key in each table. Assuming that the functional dependency EmployeeNum → EmployeeName is the only dependency here then we can be sure that the SALES and MARKETING tables satisfy the requirements of 6th Normal Form. In other words the tables are "fully normalized". There are some fairly clear disadvantages to this design however:

The obvious remedy is simply to combine the employee data into a single table and add a “Dept” column to identify the department of each employee.

Not all examples of repetition in table designs are so obvious. A similar situation can occur where table structures are not identical but only partially overlap - i.e. some subset of attributes is shared between tables. Tables that are structurally identical are just a worst case example of the more general problem of table structures which overlap. Table structures overlap when two distinct tables share some set of attributes which could be non-loss decomposed from those tables.[2] If a table satisfies 5th Normal Form then we know that the only possible non-loss decompositions are on sets of attributes which include a key (because any non-loss decomposition not implied by the keys of a table is a violation of 5th Normal Form).

Here's a revised version of our personnel database. The departments have been combined into a single table:

EMPLOYEE
EmployeeNum EmployeeName Dept
10185 Doug M
10251 Ellen M
10372 Alice S
10433 Charles S
10442 Bob S
10575 Frank M


This corrects the original problem of the duplicated table structures but now let's assume we need to add two more tables, to record the details of permanent and temporary staff respectively:


EMPLOYEE_PERM
EmployeeNum Salary StartDate
10185 41500 2010-09-10
10251 48000 2011-01-10
10442 42900 2013-09-10
10575 51200 2013-10-18

EMPLOYEE_TEMP
EmployeeNum Rate StartDate
10372 400 2013-02-04
10433 450 2013-04-30

The two extra tables exist because the attributes associated with permanent and temporary employees are different. Only permanent employees have a Salary attribute; only temporary employees have a Rate attribute. The StartDate is common to all employees but here it appears in both tables and that could give rise to some of the same disadvantages of our original two-table design. The presence of an attribute in multiple tables doesn't automatically imply a violation of POD because the POD is concerned only with sets of attributes common to non-loss decompositions of more than one table. In this case however, the non-loss decompositions from the EMPLOYEE_PERM and EMPLOYEE_TEMP tables would both include the projection on EmployeeNum and StartDate and therefore this design does not satisfy the requirements of POD. Assuming a set of tables satisfy 5th Normal Form, decompositions that may violate POD are only possible where multiple tables have at least one key shared between them. In the present case the shared key is obviously EmployeeNum.

We can satisfy 5th Normal Form by putting the StartDate attribute into any (or all!) of the three tables we have defined. However, the Principle of Orthogonal Design requires that the combination of EmployeeNum and StartDate can appear only in one place and logically enough that has to be the EMPLOYEE table because StartDate applies to all employees.

POD in its most complete and precise form poses some problems of interpretation and practicality. For example, is it always possible to satisfy POD and if not then what is the determining factor? Are there "good" and "bad" kinds of overlapping tables? When exactly do two identical projections qualify as "overlapping" each other? This last question is something that Chris Date has answered in several different ways over the years and is perhaps still open to interpretation. I'm not attempting to address all the issues here. Even so, the value of POD is that it can act as a useful guide to design issues that are otherwise difficult to define and describe.


[1] Date and McGoveran's original writings on the subject can be found here:

Part 1: http://web.archive.org/web/20100224075429/http://www.dbdebunk.com/page/page/622331.htm

Part 2: http://web.archive.org/web/20100224075056/http://www.dbdebunk.com/page/page/622312.htm

The formal definition was refined several times in later publications. For the purposes of this short introduction I've only given a simplified description that I believe is true to the general intention of POD but without covering all the details of its final form.


[2] In a later refinement of the principle its authors say that two tables overlap in violation of POD only when constraints allow that a given tuple could be inserted into either table. The version I'm describing here is stricter but simpler to explain.