In Chapter 13, I discussed a normal form I called RFNF (redundancy free normal form). However, I did also mention in that chapter the fact that that very same name had been used earlier in a paper by Millist W. Vincent to mean something rather different.[180] In this appendix, I present a brief introduction to Vincent’s RFNF.
Consider our usual suppliers relvar S, with its FD {CITY} → {STATUS} and sample value as shown in Figure 1-1. The tuple for supplier S1 in that relvar has city London and status 20; as a consequence, the tuple for supplier S4, which also has city London, must have status 20, for otherwise the FD {CITY} → {STATUS} would be violated. In a sense, therefore, the occurrence of that status value 20 in the tuple for supplier S4 is redundant, because there’s nothing else it could possibly be—it’s a logical consequence of, and is fully determined by, the values appearing elsewhere in the relation that’s the current value of the relvar at the time in question.
Examples like the foregoing provide the motivation for the following intuitively attractive definition (due to Vincent but considerably paraphrased here):
Definition: Let relation r be a value of relvar R, let t be a tuple in r, and let v be an attribute value within t. Then that occurrence of v within t is redundant in r, and R is subject to redundancy, if and only if replacing that occurrence of v by an occurrence of v′ (v′ ≠ v), while leaving everything else unchanged, causes some FD or JD of R to be violated.
In other words, redundancy exists if the attribute value occurrence in question must be v and nothing else.
Aside: Even though I said the foregoing definition is intuitively attractive (and I think it is), I think I should also point out that in at least one respect it’s a little strange, too. Consider the motivating example again, in which the tuple in relvar S for supplier S4 had to have status value 20 because the tuple for supplier S1 had status value 20. Observe now that the reverse argument holds equally well: The tuple for supplier S1 has to have status value 20 because the tuple for supplier S4 has status value 20! Now, it surely makes no sense to say those 20’s are both redundant (does it?) —but the fact that it appears to be arbitrary as to which of the two we regard as the redundant one does seem a little odd, at least to me. End of aside.
Be that as it may, Vincent goes on to define a new normal form, as follows:[181]
Definition: Relvar R is in (Vincent’s) redundancy free normal form, RFNF, if and only if it’s not subject to redundancy as just defined.
Now, it’s obvious that a relvar that’s not in BCNF (or indeed 4NF) isn’t in Vincent’s RFNF either by the foregoing definition (once again, see relvar S for an example). But what about a relvar that’s in “our” RFNF?[182] Well, recall the following example from Chapter 13. We’re given a relvar SPJ′, with attributes SNO (supplier number), PNO (part number), and JNO (project number), and predicate Supplier SNO supplies part PNO to project JNO. Also, the following dependencies hold:
{ SNO , PNO } → { JNO } { { SNO , PNO } , { PNO , JNO } , { JNO , SNO } }
Now, we saw in Chapter 13 that if the relvar contains the following three tuples—
t1 = s1 p1 j2
t2 = s1 p2 j1
t3 = s2 p1 j1
—then the following tuple has to appear as well:
t4 = s1 p1 j1
But {SNO,PNO} is a key; it follows that tuples t1 and t4, since they have the same key value, are in fact one and the same (and hence that j1 = j2). As a consequence, SPJ′ suffers from neither FD redundancy nor JD redundancy (as I defined those terms in Chapter 13), and the relvar is therefore in “our” RFNF.
Observe now, however, that the very fact that j2, in tuple t1, must be equal to j1 means the relvar is subject to redundancy by Vincent’s definition! It follows that the two RFNFs, ours and Vincent’s, are logically different; in fact, Vincent’s definition is strictly stronger than ours, in the sense that a relvar can be in RFNF by our definition without being in RFNF by Vincent’s, while the converse isn’t so. (It further follows that the two RFNFs really need different names, of course.) In fact, we can make the following stronger statement:
Theorem: 5NF implies SKNF; SKNF implies Vincent’s RFNF; Vincent’s RFNF implies our RFNF; and our RFNF implies 4NF. The reverse implications do not hold.
Vincent also proves the following useful result:
Theorem: Relvar R is in Vincent’s RFNF if and only if it’s in BCNF and, for every JD J that holds in R, the union of those components of J that are superkeys for R is equal to the heading of R.[183]
By way of example, consider relvar SPJ′ once again. As we know, the following JD holds in that relvar:
{ { SNO , PNO } , { PNO , JNO } , { JNO , SNO } }
The only component of this JD that’s a superkey is {SNO,PNO}; the union of all superkey components is certainly not equal to the heading, therefore, and the relvar is thus not in Vincent’s RFNF.
So SPJ′ is an example of a relvar that’s in our RFNF and not in Vincent’s. But can we find an example of a relvar that’s in Vincent’s RFNF and not in SKNF? Note that such relvars must exist, given that Vincent’s RFNF lies strictly between our RFNF and SKNF (that is, SKNF implies Vincent’s RFNF and Vincent’s RFNF implies our RFNF). Well, we can easily construct an example of such a relvar by taking relvar SPJ′ with its two dependencies as discussed previously—
{ SNO , PNO } → { JNO } { { SNO , PNO } , { PNO , JNO } , { JNO , SNO } }
—and adding another:
{ PNO , JNO } → { SNO }
This additional dependency (which implies, of course, that {PNO,JNO} is another key) corresponds to an additional business rule:
Any given part p is supplied to a given project j by at most one supplier s.
Observe now that this revised version of relvar SPJ′ satisfies the conditions of Vincent’s Theorem: To be specific, the superkey components of the JD {{SNO,PNO},{PNO,JNO},{JNO,SNO}} are {SNO,PNO} and {PNO,JNO}; their union is equal to the entire heading; and so the relvar is in Vincent’s RFNF.[184] At the same time, it isn’t in SKNF, because that same JD contains a component, {JNO,SNO}, that isn’t a superkey. So here we have an example of a relvar that’s in Vincent’s RFNF and not in SKNF.
So now we’ve seen examples of relvars that are (a) in Vincent’s RFNF and not in SKNF and (b) in our RFNF and not in Vincent’s. But the “syntactic” differences between these various normal forms can be a little hard to remember, and it might be helpful to summarize them here:
Relvar R is in our RFNF if and only if it’s in BCNF and, for every JD J that holds in R, some component of J is a superkey for R.
Relvar R is in Vincent’s RFNF if and only if it’s in BCNF and, for every JD J that holds in R, the union of those components of J that are superkeys for R is equal to the heading of R.
Relvar R is in SKNF if and only if, for every irreducible JD J that holds in R, every component of J is a superkey for R.
Now let me remind you of the following definition of redundancy from Chapter 15:
Definition: Let D be a database design; let DB be a database value (i.e., a set of values for the relvars mentioned in D) that conforms to D; and let p be a proposition not involving any existential quantification. If DB contains two or more distinct representations of p, then DB contains, and D permits, redundancy.
Consider relvar SPJ′ once again (the original version, without the additional FD {PNO,JNO} → {SNO}). I now claim that the design of this relvar does permit redundancy by this definition. To be specific (and with reference to the sample tuples shown earlier), the unquantified proposition Supplier s1 supplies part p1 to project j1 is represented both explicitly, by the appearance of tuple t1 (or tuple t4), and implicitly, as a logical consequence of the JD and the following propositions:
Supplier s1 supplies part p1 (to some project): Represented by tuple t1
Part p1 is supplied to project j1 (by some supplier): Represented by tuple t3
Project j1 is supplied by supplier s1 (with some part): Represented by tuple t2
Thus, I claim that Vincent’s specific definition of redundancy is consistent with, and can be seen as a special case of, the definition (“final version”) that I proposed for redundancy in general in Chapter 15.
I’d like to close this appendix by pointing out that a relvar can be in Vincent’s RFNF (and even in 5NF) and yet still be subject to redundancy of a kind that, while not identical to that defined by Vincent, is certainly very similar to it. Recall this example (discussed briefly in a footnote in Chapter 12): We’re given a relvar SPJQ, with attributes SNO, PNO, JNO, and QTY (only) and predicate “Supplier SNO supplies part PNO to project JNO in quantity QTY.” The sole key is {SNO,PNO,JNO}, and the relvar is therefore in BCNF. Note that the JD {{SNO,PNO},{PNO,JNO},{JNO,SNO}} does not hold in this relvar; however, it does hold in the projection of the relvar on {SNO,PNO,JNO} (in other words, it’s an embedded dependency; refer to Chapter 12 if you need to refresh your memory regarding this notion). Now suppose the relvar contains the following tuples (only):
s1 p1 j2 100
s1 p2 j1 200
s2 p1 j1 300
s1 p1 j3 400
By the embedded dependency, then, we must have j3 = j1. As previously stated, therefore, relvar SPJQ is certainly subject to redundancy; what’s more, the kind of redundancy it’s subject to is very similar to the kind defined by Vincent. Nevertheless, the relvar is in Vincent’s RFNF! The point is, Vincent’s redundancy is defined with respect to the FDs and JDs (only) of the relvar in question; it has nothing to say about any other constraints, such as embedded dependencies, that might also happen to hold.
[180] Millist W. Vincent: “Redundancy Elimination and a New Normal Form for Relational Database Design,” in B. Thalheim and L.Libkin (eds.), Semantics in Databases. Berlin, FDR: Springer-Verlag (1998).
[181] Note that the arbitrariness just referred to has no impact on this definition (perhaps fortunately).
[182] As noted in Chapter 13, “our” RFNF has since been renamed ETNF (“essential tuple normal form”). For further discussion, see the section “Stop Press” in Appendix C.
[183] What Vincent actually does is this: He defines relvar R to be in yet another normal form that he calls KCNF (key complete normal form) if and only if it satisfies the stated conditions (i.e., if and only if it’s in BCNF and, for every JD J that holds in R, the union of those components of J that are superkeys for R is equal to the heading of R). Then he goes on to prove that KCNF and his RFNF are logically equivalent.
[184] I’m relying here on the fact that this particular JD is the only nontrivial one to hold.