... AND STILL ANOTHER

Consider now a revised version of the example from the previous section in which suppliers are again partitioned into classes, but each class has just one associated city (where each city in turn has just one associated status, as before). So we have a relvar RX3 that looks like this (again I’ll ignore supplier names for simplicity):

     RX3 { SNO , CLASS , CITY , STATUS }

In fact, of course, RX3 has the same heading as RX2 did, but the predicate is different: Supplier SNO is part of class CLASS, which has associated city CITY, which has status STATUS. The following FDs hold among others:

     { SNO }   → { CLASS }
     { CLASS } → { CITY }
     { CITY }  → { STATUS }

Relvar RX3 isn’t in 3NF, because in the FD {CLASS} → {CITY}, {CLASS} isn’t a superkey and {CITY} isn’t a subkey. (The same goes for {CITY} → {STATUS}, mutatis mutandis.) The conventional normalization procedure would thus recommend that we decompose the relvar by applying Heath’s Theorem to that FD {CLASS} → {CITY}. But if we do, this is what we get:

     RX3A { CLASS , CITY }
          KEY { CLASS }

     RX3B { SNO , CLASS , STATUS }
          KEY { SNO }

RX3A is in 3NF but RX3B is only in 2NF—and as you can see, the FD {CITY} → {STATUS} is lost. In fact, it would have been better to decompose on the basis of the FD {CITY} → {STATUS}:

     RX3A' { CITY , STATUS }
           KEY { CITY }

     RX3B' { SNO , CLASS , CITY }
           KEY { SNO }

RX3A' is in 3NF while RX3B' is only in 2NF, but at least the FD {CITY} → {STATUS} has been preserved. What’s more, we can now go on to decompose RX3B' on the basis of the FD {CLASS} → {CITY} to obtain:

     RX3BA' { CLASS , CITY }
            KEY { CLASS }

     RX3BB' { SNO , CLASS }
            KEY { SNO }

These relvars are both in 3NF.

So now we’ve seen four different examples of decompositions in which FDs are or might be lost. There’s more that could be said on the topic, but one clear message is: The conventional normalization procedure—in fact, the one that’s often taught in practice—is inadequate in several respects. To be specific:

Of course, the very nomenclature of “first,” “second,” etc. reinforces this last perception ... but that nomenclature is really nothing more than a historical accident, in a way. I mean, if the first of the normal forms to be defined had been BCNF—which it easily could have been, since the definition is so conceptually simple, involving as it does no mention of FD irreducibility, nonkey attributes, subkeys, 1NF, 2NF, or 3NF—then there would really never have been any need to call out 2NF and 3NF as specific normal forms, as such, at all.[58]



[58] In support of this contention, I’d like to quote something Codd himself had to say in the paper in which he introduced 2NF and 3NF (see Appendix C): “The basic ideas underlying [these] normal forms are simple, but they have many subtle ramifications. The author has found that numerous examples are needed to explain and motivate the precise definitions of these normal forms.”