SIXTH NORMAL FORM

Having said, or at least implied, that we won’t be departing in this chapter from our usual assumptions regarding decomposition and recomposition operators, I’ll begin my discussion of sixth normal form by doing exactly that ... In our book Temporal Data and the Relational Model (Morgan Kaufmann, 2003), Hugh Darwen, Nikos Lorentzos, and I define:

  1. Generalized versions of the projection and join operators, and hence

  2. A generalized form of join dependency, and hence

  3. A new normal form, which we call 6NF.

As the title of that book might suggest, these developments turn out to be particularly important in connection with temporal data, and they’re discussed in detail in that book. However, temporal data as such is beyond the scope of the book you’re reading right now; all I want to do here is give a definition of 6NF that works for “regular” (i.e., nontemporal) data, and I’ll assume from this point forward that all data is “regular” in this sense. Appealing only to projection and join as classically defined, therefore (and hence only to JDs as classically defined also),[122] here’s the 6NF definition:

     Relvar R is in sixth normal form (6NF) if and only if the only JDs that hold in R are trivial ones.

Of course, we can never get rid of trivial dependencies; thus, a relvar in 6NF can’t be nonloss decomposed at all, other than trivially. For that reason, a 6NF relvar is sometimes said to be irreducible (yet another kind of irreducibility, observe). Our usual shipments relvar SP is in 6NF, and so is relvar CTXD from Chapter 9; by contrast, our usual parts relvar P is in 5NF but not 6NF. (As for our usual suppliers relvar S, it isn’t even in 3NF, of course.)

Now, it follows immediately from the definition that every 6NF relvar is certainly in 5NF; in other words, 6NF implies 5NF. (That’s why it’s reasonable to use the name sixth normal form, because 6NF really does represent another step along the classical road from 1NF to 2NF to ... to 5NF.) What’s more, 6NF is always achievable. It’s also intuitively attractive, for the following reason: If relvar R is replaced by its 6NF projections R1, ..., Rn, then the predicates for R1, ..., Rn are all simple, and the predicate for R overall is the conjunction of those simple predicates (i.e., it’s a conjunctive predicate). Let me immediately explain what I mean by these remarks: A predicate is simple if it involves no connectives (AND, OR, NOT, and so on) and composite otherwise, and a conjunctive predicate is a composite predicate that’s the AND of two or more other predicates.

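For example, suppose we replace relvar P by its projections PN, PL, PW, and PC on attributes {PNO,PNAME}, {PNO,COLOR}, {PNO,WEIGHT}, and {PNO,CITY}, respectively. Then the predicates for these projections are as follows (note that they’re all simple):

     PN: Part PNO is named PNAME.
     PL: Part PNO has color COLOR.
     PW: Part PNO has weight WEIGHT.
     PC: Part PNO is stored in city CITY.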

And the predicate for P itself is the AND of these four.[123] As the example shows, therefore, relvars in 6NF can be thought of as breaking the meaning of the data down into pieces that can’t be broken down any further (they represent what are sometimes called “atomic facts” or, perhaps preferably, “irreducible facts”). Loosely, we might say the predicate for a 6NF relvar doesn’t involve any ANDs.
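The decomposition of P into PN, PL, PW, and PC can be sketched in ordinary code. What follows is only an illustration in Python, not the notation used elsewhere in this book: the helper names relation, project, and join are hypothetical, the part tuples are assumed sample values, and a relation is modeled simply as a set of tuples. The sketch takes the four binary projections and then joins them back together, suggesting (for this one sample value, at least) that the decomposition is nonloss.

```python
from functools import reduce

def relation(*rows):
    """Model a relation as a set of tuples; each tuple is a frozenset of
    (attribute, value) pairs. This representation is an assumption."""
    return {frozenset(row.items()) for row in rows}

def project(rel, attrs):
    """Classical projection: keep the named attributes, drop duplicates."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def join(r1, r2):
    """Classical natural join: combine tuples that agree on common attributes."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result

# Assumed sample value for relvar P (illustrative data only).
P = relation(
    {"PNO": "P1", "PNAME": "Nut",   "COLOR": "Red",   "WEIGHT": 12.0, "CITY": "London"},
    {"PNO": "P2", "PNAME": "Bolt",  "COLOR": "Green", "WEIGHT": 17.0, "CITY": "Paris"},
    {"PNO": "P3", "PNAME": "Screw", "COLOR": "Blue",  "WEIGHT": 17.0, "CITY": "Oslo"},
)

# The four binary projections discussed in the text.
PN = project(P, {"PNO", "PNAME"})
PL = project(P, {"PNO", "COLOR"})
PW = project(P, {"PNO", "WEIGHT"})
PC = project(P, {"PNO", "CITY"})

# Joining the projections back together yields the original sample value.
rejoined = reduce(join, [PN, PL, PW, PC])
print(rejoined == P)
```

A check of this kind on one sample value is evidence only. Whether the decomposition is nonloss in general depends on the dependencies that hold in the relvar (here, the fact that {PNO} is a key), not on any particular value.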

Aside: In this connection, let me briefly remind you of relvars CTX and SPJ from Chapter 12 and Chapter 9, respectively. For CTX, the predicate was certainly conjunctive—Course CNO can be taught by teacher TNO and course CNO uses textbook XNO—and decomposing that relvar into its binary (and in fact 6NF) projections on {CNO,TNO} and {CNO,XNO} effectively eliminated the AND. As for SPJ, the predicate there was conjunctive too, even though it didn’t appear so in the simplified form in which I stated it. Here’s a more complete version: Supplier SNO supplies part PNO to some project JNO and part PNO is supplied to project JNO by some supplier SNO and project JNO is supplied by supplier SNO with some part PNO. Again, decomposing the relvar into its three binary (and in fact 6NF) projections eliminates the ANDs. End of aside.

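Here now is a nice characterization of 6NF (in fact, it’s a theorem):

     Relvar R is in 6NF if and only if it’s in 5NF, is of degree n, and has no key of degree less than n − 1.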

For example, let relvar PLUS have attributes A, B, and C (so the degree is three), and let the relvar predicate be A + B = C. Then PLUS is in 5NF, and it has three keys (viz., AB, BC, and CA, to use Heath notation once again); however, none of those keys is of degree less than two, and PLUS is thus in 6NF.[124]
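The claim about the keys of PLUS can be explored with a small sketch. The code below (Python, with hypothetical helper names is_unique and candidate_keys) builds a finite sample of PLUS for A and B in the range 0 to 4, which is an assumption made purely so the check terminates, and then looks for the irreducible attribute combinations that are unique within that sample. Keys are properly a matter of what holds for every possible value of the relvar, so a sample can only illustrate the result, not prove it.

```python
from itertools import combinations

ATTRS = ("A", "B", "C")

# Assumed finite sample of PLUS: every tuple satisfies A + B = C.
PLUS_SAMPLE = [{"A": a, "B": b, "C": a + b} for a in range(5) for b in range(5)]

def is_unique(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attrs):
    """Irreducible unique attribute combinations, smallest first."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            reducible = any(set(k) <= set(subset) for k in keys)
            if not reducible and is_unique(rows, subset):
                keys.append(subset)
    return keys

keys = candidate_keys(PLUS_SAMPLE, ATTRS)
print(keys)

# The condition used in the text: degree three, no key of degree less than two.
no_small_key = min(len(k) for k in keys) >= len(ATTRS) - 1
print(no_small_key)
```

For this sample the combinations found correspond to AB, CA, and BC in Heath notation, each of degree two, which matches the situation described for PLUS.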

By the way, please don’t misunderstand me—I’m not saying that relvars should always be in 6NF, or that normalization should always be carried as far as 6NF. Sometimes some lower normal form (5NF, say) is at least adequate. What’s more, to repeat something I said in Chapter 8, a design can be fully normalized (meaning the relvars are all in 5NF, or even 6NF) and yet still be bad. For example, the projection of the suppliers relvar S on {SNO,STATUS} is certainly in 6NF, but it’s not a good design, as we saw in Chapter 6.

Another point to consider is that replacing a 5NF relvar by 6NF projections will probably lead to the need to enforce certain equality dependencies (EQDs). As we saw in the previous section, an EQD is a constraint to the effect that certain projections of certain relvars must be equal (speaking a trifle loosely). For example, if we decompose relvar P as discussed above into its projections PN, PL, PW, and PC, then the following constraints will probably apply:

     CONSTRAINT ... PL { PNO } = PN { PNO } ;
     CONSTRAINT ... PW { PNO } = PN { PNO } ;
     CONSTRAINT ... PC { PNO } = PN { PNO } ;
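To make the effect of these three constraints concrete, here is a rough sketch in Python rather than Tutorial D. The sample tuples and the helper names pno_projection and eqds_hold are assumptions introduced for illustration. Each relation is modeled as a set of (PNO, property) pairs, and the check simply compares the projections on PNO.

```python
# Assumed sample values; each tuple is (PNO, <property value>).
PN = {("P1", "Nut"), ("P2", "Bolt"), ("P3", "Screw")}
PL = {("P1", "Red"), ("P2", "Green"), ("P3", "Blue")}
PW = {("P1", 12.0), ("P2", 17.0), ("P3", 17.0)}
PC = {("P1", "London"), ("P2", "Paris"), ("P3", "Oslo")}

def pno_projection(rel):
    """Projection on PNO: the set of part numbers appearing in rel."""
    return {t[0] for t in rel}

def eqds_hold(pn, pl, pw, pc):
    """True if PL{PNO}, PW{PNO}, and PC{PNO} all equal PN{PNO}."""
    base = pno_projection(pn)
    return all(pno_projection(r) == base for r in (pl, pw, pc))

ok_before = eqds_hold(PN, PL, PW, PC)

# Deleting the color tuple for P2 from PL alone violates the first EQD.
PL_after_delete = PL - {("P2", "Green")}
ok_after = eqds_hold(PN, PL_after_delete, PW, PC)

print(ok_before, ok_after)
```

The second check fails because an update to one projection was not matched by corresponding updates to the others, which is exactly the kind of situation the EQDs are there to prevent.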

On the other hand, as explained elsewhere,[125] decompositions like the one under discussion can be a good basis for dealing with missing information. Suppose every part does always have a known name but doesn’t necessarily have a known color, weight, or city. Then a part with no known color will simply have no tuple in relvar PL (and similarly for weights and cities and relvars PW and PC, respectively). Of course, the equality dependencies will then become inclusion dependencies (actually foreign key constraints), from PL to PN, PW to PN, and PC to PN, respectively.
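The shift from equality dependencies to inclusion dependencies can be sketched as follows. Again this is Python with assumed sample values and hypothetical helper names, not Tutorial D. Part P2 is taken to have no known color, so it simply has no tuple in PL.

```python
# Assumed sample values; each tuple is (PNO, <property value>).
PN = {("P1", "Nut"), ("P2", "Bolt"), ("P3", "Screw")}
PL = {("P1", "Red"), ("P3", "Blue")}  # no tuple for P2: color not known

def pno_projection(rel):
    """Projection on PNO: the set of part numbers appearing in rel."""
    return {t[0] for t in rel}

# The equality dependency PL{PNO} = PN{PNO} no longer holds ...
equality_holds = pno_projection(PL) == pno_projection(PN)

# ... but the inclusion dependency (a foreign key from PL to PN) still does.
inclusion_holds = pno_projection(PL) <= pno_projection(PN)

# Parts with no known color are those in PN with no matching tuple in PL.
parts_without_color = pno_projection(PN) - pno_projection(PL)

print(equality_holds, inclusion_holds, parts_without_color)
```

Nothing resembling a null appears anywhere in this representation: the absence of a tuple in PL is what records the fact that the color is not known.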

The net of the foregoing discussion is as follows (I’ll express it in terms of the parts example, just for definiteness): If there are two or more properties that every part always has—say name and color—then separating those two properties into distinct projections is probably a bad idea; but if some property is “optional” (in other words, has the potential to be “missing” or unknown), then placing that property in a relvar of its own is probably a good idea.



[122] So I’m not really departing from our usual assumptions after all.

[123] In other words, every part has exactly one name, color, weight, and city. Indeed, it’s precisely because these things are so that we don’t actually need to decompose relvar P into its projections PN, PL, PW, and PC if we don’t want to; the single relvar P can effectively serve as shorthand for the combination of those four relvars.

[124] Actually, PLUS might be a relation constant rather than a relation variable—but it still has keys.

[125] See either SQL and Relational Theory or the book Database Explorations: Essays on The Third Manifesto and Related Topics, by Hugh Darwen and myself (Trafford, 2010).