Appendix C. Historical Notes

History is not what you thought. It is what you can remember.

—W. C. Sellar and R. J. Yeatman: 1066 and All That

This appendix presents a brief and not unbiased survey of some of the seminal research publications in the field of design theory. The publications in question are listed in chronological order, more or less.

The relational model as such had its origins in two landmark papers by Codd:

E. F. Codd: “Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks,” IBM Research Report RJ599 (August 19th, 1969) and elsewhere.
E. F. Codd: “A Relational Model of Data for Large Shared Data Banks,” CACM 13, No. 6 (June 1970); republished in Milestones of Research—Selected Papers 1958-1982 (CACM 25th Anniversary Issue), CACM 26, No. 1 (January 1983) and elsewhere.

The first of these papers has nothing to say about design per se. The second, however, includes a section with the title “Normal Form” that includes the following tantalizing remarks: “Further operations of a normalizing kind are possible. These are not discussed in this paper.” These remarks appear following an example that shows how to eliminate relation valued attributes (see the answer to Exercise 12.8 in Appendix D); that’s why Codd uses the phrase “further operations” (emphasis added).

Incidentally, the second of the foregoing papers is the source of the term connection trap (see Chapter 9).

Design theory as such began with Codd’s introduction of FDs, 2NF, and 3NF in:

E.F. Codd: “Further Normalization of the Data Base Relational Model,” in Randall J. Rustin, ed., Data Base System: Courant Computer Science Symposia Series 6 (Prentice Hall, 1972).

Two brief comments here: First, the title is misleading—further normalization isn’t something that’s done to the relational model, it’s something that’s done to relvars, or rather to relvar designs. (To repeat from the answer to Exercise 1.1 in Appendix D, the relational model as such doesn’t care how the database happens to be designed, just so long as the design in question doesn’t violate any of the precepts of the relational model.) Second, a preliminary version of some of the material in this paper can be found in two earlier papers of Codd’s. The first is an IBM technical memo, “The Second and Third Normal Forms for the Relational Model,” dated October 6th, 1970. The second is “Normalized Data Base Structure: A Brief Tutorial,” Proc. 1971 ACM SIGFIDET Workshop on Data Description, Access, and Control, San Diego, Calif., (November 11th-12th, 1971).^[185]

Heath’s Theorem actually appeared before Codd’s 1972 normalization paper, in:

I. J. Heath: “Unacceptable File Operations in a Relational Data Base,” Proc. 1971 ACM SIGFIDET Workshop on Data Description, Access and Control, San Diego, Calif. (November 11th-12th, 1971).
BCNF was defined in the following paper (though it was there referred to as third normal form):
E. F. Codd: “Recent Investigations into Relational Data Base Systems,” Proc. IFIP Congress, Stockholm, Sweden (North-Holland, 1974) and elsewhere.
That same IFIP meeting also saw the first presentation of Armstrong’s axioms for FDs:
W. W. Armstrong: “Dependency Structures of Data Base Relationships,” Proc. IFIP Congress, Stockholm, Sweden (North-Holland, 1974).
MVDs and 4NF and what in Chapter 12 I referred to as Fagin’s Theorem^[186] were all defined in:
Ronald Fagin: “Multivalued Dependencies and a New Normal Form for Relational Databases,” ACM TODS 2, No. 3 (September 1977).
The MVD axiomatization was reported in:
Catriel Beeri, Ronald Fagin, and John H. Howard: “A Complete Axiomatization for Functional and Multivalued Dependencies,” Proc. 1977 ACM SIGMOD International Conference on Management of Data, Toronto, Canada (August 1977).
The theory of dependency preservation had its origins in:
Jorma Rissanen: “Independent Components of Relations,” ACM TODS 2, No. 4 (December 1977).

The following paper is generally credited with being the first to point out that relvars can exist that aren’t equal to the join of any two of their projections, but are equal to the join of three or more (though in fact, as mentioned in Chapter 9, Codd had effectively made the same observation in his 1969 paper). The paper is also the source of the chase algorithm; however, it considers only FDs and MVDs, not general JDs.

A. V. Aho, C. Beeri, and J. D. Ullman: “The Theory of Joins in Relational Databases,” ACM TODS 4, No. 3 (September 1979); previously published in Proc. 19th IEEE Symposium on Foundations of Computer Science (October 1977).
JDs as such were first defined in:
Jorma Rissanen: “Theory of Relations for Databases—A Tutorial Survey,” Proc. 7th Symposium on Mathematical Foundations of Computer Science, Springer-Verlag Lecture Notes in Computer Science 64 (Springer-Verlag, 1979).

The next paper introduced the concept of projection-join normal form (PJ/NF), also called 5NF. It can be regarded as the definitive statement of what might be called “classical” normalization theory—i.e., the theory of nonloss decomposition based on projection as the decomposition operator and natural join as the corresponding recomposition operator, and the normal forms BCNF, 4NF, and 5NF.

Ronald Fagin: “Normal Forms and Relational Database Operators,” Proc. 1979 ACM SIGMOD International Conference on Management of Data, Boston, Mass. (May/June 1979).

The next paper presents a sound and complete set of inference rules—in other words, an axiomatization—for inclusion dependencies (INDs). Note: I’m not aware of any formal treatment anywhere of equality dependencies (EQDs), which are an important, but in some ways rather trivial, special case.

Marco A. Casanova, Ronald Fagin, and Christos H. Papadimitriou: “Inclusion Dependencies and Their Interaction with Functional Dependencies,” Proc. 1st ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, Los Angeles, Calif. (March 1982).
Domain-key normal form is defined in:
Ronald Fagin: “A Normal Form for Relational Databases That Is Based on Domains and Keys,” ACM TODS 6, No. 3 (September 1981).
6NF is defined in:
C. J. Date, Hugh Darwen, and Nikos A. Lorentzos: Temporal Data and the Relational Model (Morgan Kaufmann, 2003).
As for orthogonality, the concept was first discussed, though not by that name, in:
C. J. Date and David McGoveran: “A New Database Design Principle,” in C. J. Date, Relational Database Writings 1991-1994 (Addison-Wesley, 1995); previously published in Database Programming & Design 7, No. 7 (July 1994).

Note, however, that orthogonality as described in the present book is significantly different from the version discussed in the foregoing paper. (I accept full responsibility for this state of affairs; although the concept was originally due to David McGoveran, I wrote the bulk of the referenced paper, and I realize now that I was rather confused when I did so.)

Stop Press: “Our” redundancy free normal form (see Chapter 13 and Appendix B) has recently been renamed essential tuple normal form (ETNF). It’s described in:

Hugh Darwen, C. J. Date, and Ronald Fagin: “A Normal Form for Preventing Redundant Tuples in Relational Databases,” Proc. 15th International Conference on Database Theory, Berlin, Germany (March 26th-29th, 2012).

All of the theorems, results, etc. discussed in Chapter 13 in connection with “our” RFNF (now ETNF) are proved in this paper, but the terminology has been revised somewhat. Let me relate the terminology of the paper to the terminology of Chapter 13. First of all, let R be a relvar, let r be a value of R, and let t be a tuple in r. Then t is redundant in r if and only if it’s either partly redundant or fully redundant. The paper shows that (a) such a tuple exists and is partly redundant if and only if R isn’t in BCNF (i.e., if and only if R is FD redundant—see Chapter 13); (b) such a tuple exists and is fully redundant if and only if a tuple forcing JD holds in R (i.e., if and only if R is JD redundant—again, see Chapter 13). Note, therefore, that “fully redundant” is not a special case of “partly redundant”; in fact, a tuple can be partly redundant without being fully so or the other way around. Finally, tuple t is essential in r if and only if it’s not redundant in r. If R is in ETNF, then every relation r that’s a legitimate value for R is such that every tuple is essential in r. Note: Our choice of the term essential for use in this context was influenced by Codd’s notion of essentiality, introduced in E. F. Codd and C. J. Date: "Interactive Support for Nonprogrammers: The Relational and Network Approaches," in Randall J. Rustin (ed.), Proc. ACM SIGMOD Workshop on Data Description, Access, and Control—Data Models: Data-Structure-Set versus Relational (Ann Arbor, Michigan, May 1st-3rd, 1974).^[187] Briefly, to say some data construct is essential in Codd’s sense is to say its loss would cause a loss of information. As already indicated, every tuple in every relation that’s a possible value for an ETNF relvar is essential in this sense.

^[185] This second paper isn’t concerned so much with 2NF and 3NF per se as it is with the idea that relations can represent anything that other data structures—hierarchies, networks, etc.—can. It does discuss 2NF and 3NF very briefly, but its coverage of those topics is essentially limited to giving a single fairly informal example in each case.

^[186]Actually there are scores of theoretical results in the computing literature, not just in the field of design theory as such, that could all justifiably be referred to as “Fagin’s Theorem.”

^[187]Republished in C. J. Date, Relational Database: Selected Writings (Addison-Wesley, 1986).