Logically and practically speaking there are really no "primary" keys in the relational model because all candidate keys can and do have exactly the same status, features and function: candidate keys are irreducibly unique, non-nullable identifiers for tuples in a relation.
The practice of singling out any one key as "preferred", "most important" or for any other purpose is a matter of convention and convenience rather than necessity. If a relation has only one candidate key then we might just as well call it a primary key by default. If a relation has more than one key then the choice of which to call primary is only as important as you want it to be and the significance of that choice is defined by the user and/or by the syntax and DBMS features used to define it.
E.F.Codd originally used the term "primary key" to mean any and all keys of a relation and stated that a relation could have more than one primary key. It was only later that the term "candidate key" appeared in the relational database literature and the term "primary key" took on its modern meaning of denoting one and only one key per relation.
What I refer to as "The Primary Key Assumption" is the conventional assumption among database professionals that exactly one key per table should be designated as primary.
*****
The idea of designating one key per table as a "primary" one is essentially superfluous, outdated and in many ways very unhelpful.
It is superfluous because logically speaking all keys can and do serve the same function. Leaving aside the limitations of any particular DBMS, logically speaking a "primary" key enjoys exactly the same features and functionality as any other key of the same table. The designation of one key as "primary" is therefore only as important as the database designer or user wants it to be. The distinction is arbitrary (that's the word used by E.F.Codd) and purely psychological (C.J.Date).
The concept is outdated because in modern practice it is commonplace for tables to have more than one key and for different users and consumers of data to have different "preferred" or "most significant" identifiers for the same piece of data. E.g.: an end user may recognise and use one key of a table (often the one called a "business" or "natural" key); a middle-tier programmer will possibly be more interested in a different key in the same table (e.g. a "surrogate" key); the DBA on the other hand may view the "clustered" key as the most important or maybe he is equally concerned with all keys that have indexes. So the preferred or most important key depends on the point-of-view and the intended usage - it is not a rigid structural feature at all.
The "primary key" concept is unhelpful for at least two reasons. Firstly, software vendors of database development tools, DBMSs and modelling tools have unfortunately attached all sorts of software features to the keys designated as "primary key". This actually works against the original concept. No longer do we just need to select one key per table that has some logical significance for the designer or user. We are encouraged or even compelled to choose "primary" keys to support this or that feature in X,Y or Z piece of software, regardless of other considerations. This is very regrettable because it represents a limitation and a lack of flexibility in software. We ought to be free to choose an appropriate key for each purpose and not be restricted to just one key per table for every purpose.
The final reason that primary keys are unhelpful is that they are a needless distraction from more important issues of database design. The primary key concept is given often vastly exaggerated significance in education, in textbooks on database design and in everyday data management practice. This is frequently to the detriment or actual exclusion of the more fundamental issue, i.e. that all of the keys and all of the other integrity constraints can be just as important to successful database design and implementation.
I have often argued that the term "primary key" ought to be deprecated and dropped from data management vocabulary as well as from data management software.
*****
Down with Primary Keys?
September
14, 2006
"Every
table should have a primary key" is a familiar phrase that I
must have used often enough. Of course I am wrong. Every table should
have at least one candidate
key.
A primary
key on
the other hand is something that isn't formally necessary at all,
although it is usually a very good idea. The designation of a primary
key is supposed to be essentially a label of convenience - an
aide-mémoire applied to one of the candidate keys in a table where
such a key has some special significance for the user.
Confusingly,
we also have something else called a PRIMARY KEY (note the capitals),
which is a type of constraint in SQL. Is a PRIMARY KEY constraint the
same thing as a primary
key?
Not really. The only significance that SQL attaches to a PRIMARY KEY
constraint is that it becomes the default for a foreign key reference
when no other columns are specified. This has never struck me as an
especially useful feature.
If the PRIMARY KEY constraint has
any other purpose at all it is to act as a hint that the key may be
singled out for special treatment in some way. Microsoft SQL Server
for example will use a clustered index for a key that is designated
as a PRIMARY KEY - but only if you don't specify otherwise and only
if no clustered index already exists on the same table. Exactly why
it does this isn't very clear. Keys don't necessarily make a good
choice for clustering.
In fact the very existence of the
PRIMARY KEY in SQL seems to be an historical accident of some kind.
According to author [Fabian Pascal] the earliest incarnations of SQL
didn't have any key constraints and PRIMARY KEY was only later addded
to the SQL standards. The designers of the standard obviously took
the term from E.F.Codd.
Unfortunately the PRIMARY KEY (and
perhaps the primary key concept
as
well to some extent) gets much more attention than it deserves. Many
books and articles about SQL are wrong to the extent that they refer
again and again to a PRIMARY KEY when they really mean just "a
candidate key". Students and even professionals still obsess
over the question of "What should be the primary key?" when
they really ought to ask "What keys do I need?". Worst of
all, keys other than the one chosen to be PRIMARY are sometimes
forgotten about and not declared at all.
So what I've often
contemplated is whether it would be better in the long run to boycott
the PRIMARY KEY constraint altogether and use the logical equivalent
- UNIQUE NOT NULL - for all my keys. I have never yet had the courage
to do this, but only because I suspect colleagues and clients
wouldn't be entirely happy about it. So here's a modest proposal for
inclusion in the next ISO SQL standard. Do away with PRIMARY KEY and
let's just have keys instead. All keys are equal and we've wasted
enough time on the PRIMARY ones already!
*****
For further reading, Chris Date put the case better than I can in "The Primacy of Primary Keys: An Investigation" reprinted in "Relational Database Writings 1991-1994" (Addison-Wesley).
See also: Why primary keys are superfluous