Now let’s get back to the suppliers-and-parts database as such, with sample values as shown in Figure 1-1 in the previous chapter. Here now are definitions of the three relvars in that database, expressed in a language called Tutorial D (see further explanation following the definitions):
VAR S BASE RELATION { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR } KEY { SNO } ; VAR P BASE RELATION { PNO CHAR , PNAME CHAR , COLOR CHAR , WEIGHT RATIONAL , CITY CHAR } KEY { PNO } ; VAR SP BASE RELATION { SNO CHAR , PNO CHAR , QTY INTEGER } KEY { SNO , PNO } FOREIGN KEY { SNO } REFERENCES S FOREIGN KEY { PNO } REFERENCES P ;
As I said, these definitions are expressed in a language called Tutorial D. Now, I believe that language is pretty much self-explanatory; however, a comprehensive description can be found if needed in the book Databases, Types, and the Relational Model: The Third Manifesto (3rd edition), by C. J. Date and Hugh Darwen (Addison-Wesley, 2006).[17] Note: As its title suggests, that book also introduces and explains The Third Manifesto, a precise though somewhat formal definition of the relational model and a supporting type theory (including, incidentally, a comprehensive model of type inheritance). In particular, it uses the name D as a generic name for any language that conforms to the principles laid down by The Third Manifesto. Any number of distinct languages could qualify as a valid D; sadly, however, SQL isn’t one of them, which is why examples in this book are expressed (where it makes any difference) in Tutorial D and not SQL. (Of course, Tutorial D is a valid D; in fact, it was explicitly designed to be suitable as a vehicle for illustrating and teaching the ideas of The Third Manifesto.)
Aside: This is as good a point as any to mention that the terminology used in the present book is based on that of the Manifesto. As a consequence, it does differ on occasion from that found in some of the design theory literature. For example, that literature typically doesn’t talk about relational headings; instead, it uses the term relation schema.[18] Nor does it talk about relation variables (relvars); instead, what this book refers to as a (relation) value that’s assigned to some relation variable it calls an instance of the corresponding schema. End of aside.
Back to the relvar definitions. As you can see, each of those definitions includes a KEY specification, which means that every relation that might ever be assigned to any of those relvars is required to satisfy the corresponding key constraint. (Recall from Chapter 1 that every relvar does have at least one key.) For example, every relation that might ever be assigned to relvar S is required to satisfy the constraint that no two distinct tuples in that relation have the same SNO value. What’s more, I’m going to assume throughout this book, barring explicit statements to the contrary, that the following functional dependency (FD) also holds in relvar S:
{ CITY }→ { STATUS }
You can read this FD, informally, as STATUS is functionally dependent on CITY, or as CITY functionally determines STATUS, or more simply as just CITY arrow STATUS. What it means is that every relation that might ever be assigned to relvar S is required to satisfy the constraint that if two tuples in that relation have the same CITY value, then they must also have the same STATUS value.[19] Observe that the sample value of relvar S given in Figure 1-1 does indeed satisfy this constraint. Note: I’ll have a great deal more to say about FDs later in Parts II and III of this book, but I’m sure you’re already familiar with the basic idea anyway.
Now, just as KEY specifications are used to declare key constraints, so we need some kind of syntax in order to be able to declare FD constraints. Tutorial D provides no specific syntax for that purpose, however[20] (nor does SQL, come to that). It does allow them to be expressed in a somewhat roundabout fashion—for example:
CONSTRAINT XCT COUNT ( S { CITY } ) = COUNT ( S { CITY , STATUS } ) ;
Explanation: In Tutorial D, an expression of the form r{A1,...,An} denotes the projection of relation r on attributes A1, ..., An. If the current value of relvar S is s (a relation), therefore, (a) the expression S{CITY} denotes the projection of s on CITY; (b) the expression S{CITY,STATUS} denotes the projection of s on CITY and STATUS; and (c) the constraint overall—which I’ve named, arbitrarily, XCT—requires the cardinalities (COUNT) of those two projections to be equal. (If it’s not obvious that requiring these two counts to be equal is equivalent to requiring the desired FD constraint to hold, try interpreting it in terms of the sample data in Figure 1-1.)
Aside: In case you feel those appeals to COUNT in the formulation of constraint XCT are somehow a little inelegant, here’s an alternative formulation that avoids them:
CONSTRAINT XCT WITH ( CT := S { CITY , STATUS } ) : AND ( JOIN { CT , CT RENAME { STATUS AS X } } , STATUS = X ) ;Explanation: First, the WITH specification (“WITH (...):”) serves merely to introduce a name, CT, that can be used repeatedly later in the overall expression to avoid having to write out several times the expression it stands for. Second, the Tutorial D RENAME operator is more or less self-explanatory (but is defined anyway, in the answer to Exercise 2.15 in Appendix D). Third, the Tutorial D expression AND(rx,bx), where rx is a relational expression and bx is a boolean expression, returns TRUE if and only if the condition denoted by bx evaluates to TRUE for every tuple in the relation denoted by rx. End of aside.
The foregoing state of affairs notwithstanding, I’ll assume throughout this book that FDs can be stated using the simpler arrow notation illustrated earlier. Analogous remarks apply to other kinds of dependencies also (in particular, to join dependencies and multivalued dependencies, which are introduced in Chapter 9 and Chapter 12, respectively).
I’ll close this chapter with a little teaser. Assuming the only constraints that apply to the suppliers-and-parts database are the foregoing FD and the specified key (and foreign key) constraints, then we can say that relvars S, P, and SP are in second, fifth, and sixth normal form, respectively. To understand the significance of these observations, please read on!
[17] Actually Tutorial D has been revised and extended somewhat since that book was first published. A description of the revised version (which is the version I’ll be using throughout the present book) can be found both in Database Explorations: Essays on The Third Manifesto and Related Topics, by C. J. Date and Hugh Darwen (Trafford, 2010) and on the website www.thethirdmanifesto.com (which, as its name suggests, also contains much current information regarding The Third Manifesto as such).
[18] I mustn’t give the impression that headings and (relational) schemas are exactly the same thing. Rather, a schema is the combination of a heading and certain dependencies (including but not necessarily limited to functional and join dependencies in particular, which are discussed in detail later in this book).
[19] This example of what FDs mean also serves to show why such dependencies are called functional. To elaborate: A function in mathematics is a mapping from one set A to some set B, not necessarily distinct from A, with the property that each element in A maps to just one element in B (but any number of distinct elements in A can map to the same element in B). In the example, therefore, we could say there’s a mapping from the set of CITY values in S to the set of STATUS values in S, and that mapping is indeed a mathematical function.
[20] One reason it doesn’t is that if the design recommendations discussed in the present book are followed, there should rarely be a need to declare FDs explicitly anyway.