Chapter 13. Additional Normal Forms

—Tom Stoppard: Rosencrantz and Guildenstern Are Dead

Now, this is not the end. It is not even the beginning of the end.

—Winston Churchill: The End of the Beginning

But it is, perhaps, the end of the beginning.

To paraphrase something I said in Chapter 9, I’ve assumed so far in this book that the only dependencies we care about^[120] are ones that have to do with projection as the decomposition operator and join as the corresponding recomposition operator. I also said that, given that assumption, it followed that 5NF was the final normal form. However, I did also say, in a footnote, that there was something called “sixth” normal form or 6NF. In fact, it turns out that we can define, not just 6NF as such, but several other normal forms also, all without departing from those same assumptions regarding available decomposition and recomposition operators. Figure 13-1 (an extended version of Figure 3-3 from Chapter 3) shows how some of those additional normal forms—viz., RFNF, SKNF, and 6NF, shown in boldface italics in the figure—fit into the overall scheme of things, as it were. In this chapter, I’ll be describing those three normal forms as well as (briefly) a few more, for completeness.

Figure 13-1. The normal form hierarchy (II)

EQUALITY DEPENDENCIES

Before describing the various additional normal forms as such, I need to spend a little time on another preliminary matter. Recall from Chapter 3 the example in which relvar S was replaced by its projections SNC and CT on {SNO,SNAME,CITY} and {CITY,STATUS}, respectively. As part of my discussion of that example, I pointed out that the following constraint—

     CONSTRAINT ... SNC { CITY } = CT { CITY } ;

—holds (or at least might hold) in the result of the decomposition, and I mentioned that this constraint was in fact an equality dependency. Here’s a definition:

Definition: Let R1 and R2 be relvars with headings H1 and H2, respectively. Also, let X1 and X2 be subsets of H1 and H2, respectively, such that there exists a possibly empty set of attribute renamings such that the result, R, of applying those renamings to the projection R1{X1} has heading X2. Then an equality dependency (EQD) between R1 and R2 is a statement to the effect that R and R2{X2} must be equal. (More generally, an EQD is any constraint that requires two relations to be equal.)

Actually, equality dependencies are an important special case of a more general phenomenon known as inclusion dependencies:

Definition: Let R1 and R2 be relvars with headings H1 and H2, respectively. Also, let X1 and X2 be subsets of H1 and H2, respectively, such that there exists a possibly empty set of attribute renamings such that the result, R, of applying those renamings to the projection R1{X1} has heading X2. Then an inclusion dependency (IND) from R1 to R2 is a statement to the effect that R must be included in (i.e., be a subset of) R2{X2}. (More generally, an IND is any constraint that requires one relation to be included in another.)

Points arising from this latter definition:

A foreign key constraint is a special case of an IND. In the suppliers-and-parts database, for example, {SNO} in relvar SP is a foreign key, referencing the key {SNO} in relvar S; thus, there’s an IND from SP to S—the projection of SP on {SNO} is included in the projection of S on {SNO}. But note that (to use the notation of the foregoing definition) INDs in general, unlike foreign key constraints in particular, don’t require X2 to be a key (or even a superkey) for R2.
As already noted, an EQD is a special case of an IND, too. To be more specific, the EQD “A = B” is equivalent to the pair of INDs “A is included in B” and “B is included in A.” In other words, an EQD is an IND that goes both ways, as it were.

Now, we’re going to be seeing lots of examples of EQDs in particular, as opposed to INDs in general, in what follows. In fact this state of affairs should be obvious: Nonloss decomposing a relvar into projections usually leads to INDs at least and often to EQDs, as we already know. However, it’s EQDs that don’t arise as a result of nonloss decomposition that are the interesting ones, in a way. The reason is that the existence of such an EQD often turns out to be a mark of redundancy—because if (as I put it in Chapter 3) some piece of information is recorded twice, an EQD might be what’s needed to keep the two representations in agreement.

By the way, if you haven’t heard much about EQDs before, you might be wondering why not, given their conceptual importance. In my opinion, the most likely reason for the omission is the SQL language ... As you’ll know if you’ve ever tried the exercise, EQDs are extremely awkward to formulate in SQL, because SQL has no direct way of expressing relational comparisons.^[121] A striking example in support of this contention can be found in the discussion of Example 12 in the section of that name in Chapter 15.

^[120] Apart from equality and inclusion dependencies, that is.

^[121]By SQL here, I mean SQL as defined by the SQL standard. The situation is even worse in mainstream implementations, where most EQDs can’t be formulated at all, owing to the fact that the implementations in question don’t allow subqueries in constraints.