The Primary Key Assumption

Logically and practically speaking there are really no "primary" keys in the relational model because all candidate keys can and do have exactly the same status, features and function: candidate keys are irreducibly unique, non-nullable identifiers for tuples in a relation.

The practice of singling out any one key as "preferred", "most important" or for any other purpose is a matter of convention and convenience rather than necessity. If a relation has only one candidate key then we might just as well call it a primary key by default. If a relation has more than one key then the choice of which to call primary is only as important as you want it to be and the significance of that choice is defined by the user and/or by the syntax and DBMS features used to define it.

E.F.Codd originally used the term "primary key" to mean any and all keys of a relation and stated that a relation could have more than one primary key. It was only later that the term "candidate key" appeared in the relational database literature and the term "primary key" took on its modern meaning of denoting one and only one key per relation.

What I refer to as "The Primary Key Assumption" is the conventional assumption among database professionals that exactly one key per table should be designated as primary.

*****

Are primary keys passé?

March 1, 2012

The idea of designating one key per table as a "primary" one is essentially superfluous, outdated and in many ways very unhelpful.

It is superfluous because logically speaking all keys can and do serve the same function. Leaving aside the limitations of any particular DBMS, logically speaking a "primary" key enjoys exactly the same features and functionality as any other key of the same table. The designation of one key as "primary" is therefore only as important as the database designer or user wants it to be. The distinction is arbitrary (that's the word used by E.F.Codd) and purely psychological (C.J.Date).

The concept is outdated because in modern practice it is commonplace for tables to have more than one key and for different users and consumers of data to have different "preferred" or "most significant" identifiers for the same piece of data. E.g.: an end user may recognise and use one key of a table (often the one called a "business" or "natural" key); a middle-tier programmer will possibly be more interested in a different key in the same table (e.g. a "surrogate" key); the DBA on the other hand may view the "clustered" key as the most important or maybe he is equally concerned with all keys that have indexes. So the preferred or most important key depends on the point-of-view and the intended usage - it is not a rigid structural feature at all.

The "primary key" concept is unhelpful for at least two reasons. Firstly, software vendors of database development tools, DBMSs and modelling tools have unfortunately attached all sorts of software features to the keys designated as "primary key". This actually works against the original concept. No longer do we just need to select one key per table that has some logical significance for the designer or user. We are encouraged or even compelled to choose "primary" keys to support this or that feature in X,Y or Z piece of software, regardless of other considerations. This is very regrettable because it represents a limitation and a lack of flexibility in software. We ought to be free to choose an appropriate key for each purpose and not be restricted to just one key per table for every purpose.

The final reason that primary keys are unhelpful is that they are a needless distraction from more important issues of database design. The primary key concept is given often vastly exaggerated significance in education, in textbooks on database design and in everyday data management practice. This is frequently to the detriment or actual exclusion of the more fundamental issue, i.e. that all of the keys and all of the other integrity constraints can be just as important to successful database design and implementation.

I have often argued that the term "primary key" ought to be deprecated and dropped from data management vocabulary as well as from data management software.

*****

Down with Primary Keys?

September 14, 2006

"Every table should have a primary key" is a familiar phrase that I must have used often enough. Of course I am wrong. Every table should have at least one candidate key. A primary key on the other hand is something that isn't formally necessary at all, although it is usually a very good idea. The designation of a primary key is supposed to be essentially a label of convenience - an aide-mémoire applied to one of the candidate keys in a table where such a key has some special significance for the user.

Confusingly, we also have something else called a PRIMARY KEY (note the capitals), which is a type of constraint in SQL. Is a PRIMARY KEY constraint the same thing as a primary key? Not really. The only significance that SQL attaches to a PRIMARY KEY constraint is that it becomes the default for a foreign key reference when no other columns are specified. This has never struck me as an especially useful feature.

If the PRIMARY KEY constraint has any other purpose at all it is to act as a hint that the key may be singled out for special treatment in some way. Microsoft SQL Server for example will use a clustered index for a key that is designated as a PRIMARY KEY - but only if you don't specify otherwise and only if no clustered index already exists on the same table. Exactly why it does this isn't very clear. Keys don't necessarily make a good choice for clustering.

In fact the very existence of the PRIMARY KEY in SQL seems to be an historical accident of some kind. According to author [Fabian Pascal] the earliest incarnations of SQL didn't have any key constraints and PRIMARY KEY was only later addded to the SQL standards. The designers of the standard obviously took the term from E.F.Codd.

Unfortunately the PRIMARY KEY (and perhaps the primary key concept as well to some extent) gets much more attention than it deserves. Many books and articles about SQL are wrong to the extent that they refer again and again to a PRIMARY KEY when they really mean just "a candidate key". Students and even professionals still obsess over the question of "What should be the primary key?" when they really ought to ask "What keys do I need?". Worst of all, keys other than the one chosen to be PRIMARY are sometimes forgotten about and not declared at all.

So what I've often contemplated is whether it would be better in the long run to boycott the PRIMARY KEY constraint altogether and use the logical equivalent - UNIQUE NOT NULL - for all my keys. I have never yet had the courage to do this, but only because I suspect colleagues and clients wouldn't be entirely happy about it. So here's a modest proposal for inclusion in the next ISO SQL standard. Do away with PRIMARY KEY and let's just have keys instead. All keys are equal and we've wasted enough time on the PRIMARY ones already!

*****

For further reading, Chris Date put the case better than I can in "The Primacy of Primary Keys: An Investigation" reprinted in "Relational Database Writings 1991-1994" (Addison-Wesley).