Appendix C. Historical Notes

History is not what you thought. It is what you can remember.

This appendix presents a brief and not unbiased survey of some of the seminal research publications in the field of design theory. The publications in question are listed in chronological order, more or less.

The relational model as such had its origins in two landmark papers by Codd:

The first of these papers has nothing to say about design per se. The second, however, includes a section with the title “Normal Form” that includes the following tantalizing remarks: “Further operations of a normalizing kind are possible. These are not discussed in this paper.” These remarks appear following an example that shows how to eliminate relation valued attributes (see the answer to Exercise 12.8 in Appendix D); that’s why Codd uses the phrase “further operations” (emphasis added).

Incidentally, the second of the foregoing papers is the source of the term connection trap (see Chapter 9).

Design theory as such began with Codd’s introduction of FDs, 2NF, and 3NF in:

Two brief comments here: First, the title is misleading—further normalization isn’t something that’s done to the relational model, it’s something that’s done to relvars, or rather to relvar designs. (To repeat from the answer to Exercise 1.1 in Appendix D, the relational model as such doesn’t care how the database happens to be designed, just so long as the design in question doesn’t violate any of the precepts of the relational model.) Second, a preliminary version of some of the material in this paper can be found in two earlier papers of Codd’s. The first is an IBM technical memo, “The Second and Third Normal Forms for the Relational Model,” dated October 6th, 1970. The second is “Normalized Data Base Structure: A Brief Tutorial,” Proc. 1971 ACM SIGFIDET Workshop on Data Description, Access, and Control, San Diego, Calif., (November 11th-12th, 1971).[185]

Heath’s Theorem actually appeared before Codd’s 1972 normalization paper, in:

The following paper is generally credited with being the first to point out that relvars can exist that aren’t equal to the join of any two of their projections, but are equal to the join of three or more (though in fact, as mentioned in Chapter 9, Codd had effectively made the same observation in his 1969 paper). The paper is also the source of the chase algorithm; however, it considers only FDs and MVDs, not general JDs.

The next paper introduced the concept of projection-join normal form (PJ/NF), also called 5NF. It can be regarded as the definitive statement of what might be called “classical” normalization theory—i.e., the theory of nonloss decomposition based on projection as the decomposition operator and natural join as the corresponding recomposition operator, and the normal forms BCNF, 4NF, and 5NF.

The next paper presents a sound and complete set of inference rules—in other words, an axiomatization—for inclusion dependencies (INDs). Note: I’m not aware of any formal treatment anywhere of equality dependencies (EQDs), which are an important, but in some ways rather trivial, special case.

Note, however, that orthogonality as described in the present book is significantly different from the version discussed in the foregoing paper. (I accept full responsibility for this state of affairs; although the concept was originally due to David McGoveran, I wrote the bulk of the referenced paper, and I realize now that I was rather confused when I did so.)

Stop Press: “Our” redundancy free normal form (see Chapter 13 and Appendix B) has recently been renamed essential tuple normal form (ETNF). It’s described in:

All of the theorems, results, etc. discussed in Chapter 13 in connection with “our” RFNF (now ETNF) are proved in this paper, but the terminology has been revised somewhat. Let me relate the terminology of the paper to the terminology of Chapter 13. First of all, let R be a relvar, let r be a value of R, and let t be a tuple in r. Then t is redundant in r if and only if it’s either partly redundant or fully redundant. The paper shows that (a) such a tuple exists and is partly redundant if and only if R isn’t in BCNF (i.e., if and only if R is FD redundant—see Chapter 13); (b) such a tuple exists and is fully redundant if and only if a tuple forcing JD holds in R (i.e., if and only if R is JD redundant—again, see Chapter 13). Note, therefore, that “fully redundant” is not a special case of “partly redundant”; in fact, a tuple can be partly redundant without being fully so or the other way around. Finally, tuple t is essential in r if and only if it’s not redundant in r. If R is in ETNF, then every relation r that’s a legitimate value for R is such that every tuple is essential in r. Note: Our choice of the term essential for use in this context was influenced by Codd’s notion of essentiality, introduced in E. F. Codd and C. J. Date: "Interactive Support for Nonprogrammers: The Relational and Network Approaches," in Randall J. Rustin (ed.), Proc. ACM SIGMOD Workshop on Data Description, Access, and Control—Data Models: Data-Structure-Set versus Relational (Ann Arbor, Michigan, May 1st-3rd, 1974).[187] Briefly, to say some data construct is essential in Codd’s sense is to say its loss would cause a loss of information. As already indicated, every tuple in every relation that’s a possible value for an ETNF relvar is essential in this sense.



[185] This second paper isn’t concerned so much with 2NF and 3NF per se as it is with the idea that relations can represent anything that other data structures—hierarchies, networks, etc.—can. It does discuss 2NF and 3NF very briefly, but its coverage of those topics is essentially limited to giving a single fairly informal example in each case.

[186] Actually there are scores of theoretical results in the computing literature, not just in the field of design theory as such, that could all justifiably be referred to as “Fagin’s Theorem.”

[187] Republished in C. J. Date, Relational Database: Selected Writings (Addison-Wesley, 1986).