Chapter 9. JDs and 5NF (Informal)

If you can’t beat them, join them

Just as Boyce/Codd normal form is defined in terms of functional dependencies, so fifth normal form (5NF) is defined in terms of join dependencies (JDs);[87] as noted in Chapter 4, in fact, 5NF is the normal form with respect to JDs, just as BCNF is the normal form with respect to FDs. And the treatment of these ideas in this part of the book therefore parallels the treatment of BCNF and FDs in Part II. In other words, I plan to treat the material both formally, in Chapter 10, and informally in the present chapter.

Let me immediately add that although 5NF is indeed “the” normal form with respect to JDs, this state of affairs shouldn’t be taken to mean that 5NF is the ultimate goal of the normalization process. Au contraire, in fact: There are at least two other normal forms that have a better claim to that title, as we’ll see in Chapter 13. From a pedagogical point of view, however—as well as from a historical one—I think it’s desirable to discuss 5NF in detail first. (I mention this point simply in order to avoid giving a false impression; one of my reviewers felt I should have presented the material in a different sequence, but I don’t agree.)

Now, in previous writings I’ve tended to regard JDs as if they were just a generalized kind of FD. I now think this perception is wrong, or at least misleading; I now think it’s better to regard JDs as a completely different phenomenon. Of course, FDs and JDs are both dependencies (i.e., constraints), and they do resemble each other in certain respects; in particular, the fact that a certain JD holds in relvar R implies that R can be nonloss decomposed in certain ways, just as the fact that a certain FD holds in relvar R also implies that R can be nonloss decomposed in certain ways. It’s also true that every FD implies a JD, so that if some FD F holds in relvar R, a certain JD J holds in R also. But not all JDs are implied by FDs; in fact, to speak very loosely—but I must emphasize that what I’m about to say is extremely imprecise—we might say that 5NF has to do with JDs that aren’t implied by FDs. That is, it’s if some relvar R is in BCNF, but is subject to some JD that’s not implied by FDs, that the notion of 5NF might be relevant.

Now, a relvar is in BCNF if and only if all FDs to which it’s subject are implied by keys. As you’d probably expect, therefore, a relvar is in 5NF if and only if all JDs to which it’s subject are implied by keys.[88] However, this latter notion—i.e., the notion of JDs being implied by keys—is a bit trickier to pin down than its FD counterpart; in fact, there’s a very rich theory surrounding these ideas, as we’ll soon see, and some of that theory can be a little overwhelming (not to say confusing) at first. You need to keep a clear head! As someone much more knowledgeable in these matters than I am once said to me: JDs are very mysterious.

So much for the preamble; now let’s get down to specifics.

Most of the time in this book so far, I’ve been making a tacit assumption: namely, I’ve been assuming that when we decompose some relvar, we always do so by replacing that relvar by exactly two of its projections.[89] (Note that Heath’s Theorem, which provides the formal underpinning for most of what I’ve had to say so far regarding nonloss decomposition, specifically addresses decomposition into exactly two projections.) What’s more, that assumption was fully warranted, so long as our target was only BCNF—in other words, it successfully carried us as far as that specific target. So you might be surprised to learn that there exist relvars that can’t be nonloss decomposed into two projections but can be nonloss decomposed into three (or maybe more than three).

As an aside, I note that, remarkably enough, Codd gave an example in 1969, in his very first paper on the relational model (see Appendix C), that showed he was aware of this possibility. However, that example was apparently overlooked by most of the paper’s original readers; certainly it seems to have come as a surprise to the research community when the possibility was rediscovered several years later (in 1977, to be precise).

Now, I said earlier, albeit loosely, that 5NF had to do with JDs that aren’t implied by FDs. I can now add, though again speaking very loosely, that it has to do with relvars that can’t be nonloss decomposed into two projections but can be nonloss decomposed into three or more. In other words, it’s when these circumstances arise—i.e., when there are JDs that aren’t implied by FDs, and relvars that can only be nonloss decomposed into more than two projections—that you do really have to come to grips with JDs and 5NF.

So what exactly do we mean when we say some JD holds in some relvar? Here’s a definition:

By way of a simple example, consider the suppliers relvar S once again. As we know, that relvar is subject to the FD {CITY} → {STATUS}, and so Heath’s Theorem tells us it can be nonloss decomposed into its projections on {SNO,SNAME,CITY} and {CITY,STATUS}. In other words, the following JD holds in that relvar:

      { { SNO , SNAME , CITY } , { CITY , STATUS } }

Points arising:

As a simple illustration of this last point, to say the following JD holds in relvar S—

      { { SNO , SNAME , CITY } , { CITY , STATUS } }

—is to say the tuple (s,n,t,c) appears in S if and only if there’s a tuple in S with SNO = s and SNAME = n and CITY = c and there’s a tuple in S with CITY = c and STATUS = t.

To continue with this example for a moment, the fact that the specified JD holds in relvar S is a logical consequence of Heath’s Theorem, as we know. In fact, we can now restate Heath’s Theorem as follows:

As stated earlier, therefore, FDs imply JDs—but not all JDs are implied by FDs, as we’ll see. Before I elaborate on this point, however, let me stress the requirement that the union of the components of a given JD must be equal to the pertinent heading. No analogous requirement applies to FDs; with FDs, the left and right sides don’t have to be such that their union is equal to the pertinent heading, they only have to be subsets of that heading. This distinction might help to illustrate the point (at least intuitively) that JDs and FDs really are different in kind, in a sense.

Now, the JD in the foregoing example—

      { { SNO , SNAME , CITY } , { CITY , STATUS } }

—is binary: It has two components, and it corresponds to a nonloss decomposition into two projections. Here by contrast is another JD that holds in relvar S:

      { { SNO , SNAME } , { SNO , CITY } , { CITY , STATUS } }

This one is ternary, but it’s derived, in effect, by “cascading” two binary ones:

It follows that the given ternary JD holds in the original relvar. By contrast, in the section immediately following, I’ll give an example of a ternary JD that’s not derived by cascading binary ones, and hence an example of a relvar that can be nonloss decomposed into three projections and not into two.



[87] So is 4NF, but I’m going to ignore 4NF (mostly) until Chapter 12.

[88] This very informal definition is the only one I’ll be giving for 5NF in the present chapter.

[89] Of course, I’m referring here to an individual step in the overall process. Clearly, repeated steps—i.e., repeated individual decompositions—will yield a result consisting of more than two projections, in general, even if each individual decomposition is into just two.

[90] The keyword EXISTS in the following definition denotes the existential quantifier. Refer to SQL and Relational Theory if you need a tutorial on such matters.

[91] I’m appealing here to an easily proved theorem: viz., that the FD XY holds in some projection of relvar R if and only if it holds in R itself (see Exercise 12.5 in Chapter 12).