1 What Is Relational?

Before jumping right into SQL, you should understand the logic behind the structure of the databases that SQL supports. In this chapter, you’ll learn why the relational database was invented, how it is constructed, and why you should use it. This information provides the foundation you need to understand what SQL really is all about and will eventually help to clarify how you can leverage SQL to your best advantage.

Types of Databases

What is a database? As you probably know, a database is an organized collection of data used to model some type of organization or organizational process. It really doesn’t matter whether you’re using paper or an application program to collect and store the data. You have a database as long as you’re collecting and storing data in some organized manner for a specific purpose. Throughout the remainder of this discussion, I’ll assume that you’re using an application program to collect and maintain your data.

Generally, two types of databases are used in database management: operational databases and analytical databases.

Operational databases are the backbone of many companies, organizations, and institutions throughout the world today. This type of database is primarily used to collect, modify, and maintain data on a day-to-day basis. The type of data stored is dynamic, meaning that it changes constantly and always reflects up-to-the-minute information. Organizations such as retail stores, manufacturing companies, hospitals and clinics, and publishing houses use operational databases because their data is in a constant state of flux.

In contrast, an analytical database stores and tracks historical and time-dependent data. It is a valuable asset for tracking trends, viewing statistical data over a long period, or making tactical or strategic business projections. The type of data stored is static, meaning that the data is never (or very rarely) modified, although new data might often be added. The information gleaned from an analytical database reflects a point-in-time snapshot of the data and is usually not up to date. Chemical labs, geological companies, and marketing analysis firms are examples of organizations that use analytical databases. Note that the data found in analytical databases is usually gleaned from an operational database. For example, sales history each month might be summarized and saved in an analytical database.

A Brief History of the Relational Model

Several types of database models exist. Some, such as hierarchical and network, are used only on legacy systems, while others, such as relational, have gained wide acceptance. You might also encounter discussions in other books about object, object-relational, or online analytical processing (OLAP) models. In fact, there are extensions defined in the SQL Standard that support these models, and some commercial database systems have implemented some of these extensions. For my purposes, however, I will focus strictly on the relational model and the core of the international SQL Standard.

In the Beginning …

The relational database was first conceived in 1969 and has arguably become the most widely used database model in database management today. The father of the relational model, Dr. Edgar F. Codd (1923–2003), was an IBM research scientist in the late 1960s and was at that time looking into new ways to handle large amounts of data. His dissatisfaction with database models and database products of the time led him to begin thinking of ways to apply the disciplines and structures of mathematics to solve the myriad problems he had been encountering. A mathematician by profession, he strongly believed that he could apply specific branches of mathematics to solve problems such as data redundancy, weak data integrity, and a database structure’s overdependence on its physical implementation.

Dr. Codd formally presented his new relational model in a landmark work titled “A Relational Model of Data for Large Shared Databanks” in June 1970.¹ He based his new model on two branches of mathematics—set theory and first-order predicate logic. Indeed, the name of the model itself is derived from the term relation, which is part of set theory. (A widely held misconception is that the relational model derives its name from the fact that tables within a relational database can be related to one another. However, the term relation in the model is used to describe what most relational database systems call a table. Now that you know the truth, you’ll have a peaceful, restful sleep tonight!) Fortunately, you don’t need to know the details of set theory or first-order predicate logic to design and use a relational database. If you use a good database design methodology—such as the one presented in Mike Hernandez’s Database Design for Mere Mortals, Third Edition (Addison-Wesley, 2013)—you can develop a sound and effective database structure that you can confidently use to collect and maintain any data. (Well, OK, you do need to understand a little bit about predicates and set theory to solve more complex problems. I cover the essentials that you need to know about predicates—really a fancy name for a filter—in Chapter 6, “Filtering Your Data,” and the basics of set theory in Chapter 7, “Thinking in Sets.”)

Relational Database Systems

A relational database management system (RDBMS) is a software application program you use to create, maintain, modify, and manipulate a relational database. Many RDBMS programs also provide the tools you need to create end-user applications that interact with the data stored in the database. RDBMS programs have continually evolved since their first appearance, and they continue to become more full-featured and powerful as advances occur in hardware technology and operating environments.

In the earliest days of the relational database, RDBMSs were written for use on mainframe computers. Two RDBMS programs prevalent in the early 1970s were System R, developed by IBM at its San Jose Research Laboratory in California, and Interactive Graphics Retrieval System (INGRES), developed at the University of California at Berkeley. These two programs contributed greatly to the general appreciation of the relational model.

As the benefits of the relational database became more widely known, many companies decided to make a slow move from hierarchical and network database models to the relational database model, thus creating a need for more and better mainframe RDBMS programs. The 1980s saw the development of various commercial RDBMSs for mainframe computers by companies such as Oracle and IBM.

The early to mid-1980s saw the rise of the personal computer, and with it, the development of PC-based RDBMS programs. Some of the early entries in this category, from companies such as Ashton-Tate and Fox Software, were nothing more than elementary file-based database-management systems. True PC-based RDBMS programs began to emerge with products developed by companies such as Microrim and Ansa Software. These companies helped to spread the idea and potential of database management from the mainframe-dominated domain of information systems departments to the desktop of the common end user.

The need to share data became apparent as more and more users worked with databases throughout the late 1980s and early 1990s. The concept of a centrally located database that could be made available to multiple users seemed a very promising idea. This would certainly make data management and database security much easier to implement. Database vendors such as Microsoft and Oracle responded to this need by developing client/server RDBMS programs.

The manner in which databases are used evolved immensely over the years, and many organizations began to realize that a lot of useful information could be gathered from data they stored in various relational and nonrelational databases. This prompted them to question whether there was a way to mine the data for useful analytical information that they could then use to make critical business decisions. Furthermore, they wondered whether they could consolidate and integrate their data into a viable knowledgebase for their organizations. Indeed, these would be difficult questions to answer.

IBM proposed the idea of a data warehouse, which, as originally conceived, would enable organizations to access data stored in any number of nonrelational databases. It was unsuccessful in its first attempts at implementing data warehouses, primarily because of the complexities and performance problems associated with such a task. Only since the 1990s has the implementation of data warehouses become more viable and practical. William H. (Bill) Inmon, widely regarded as the father of the data warehouse, is a strong and vocal advocate of the technology and has been instrumental in its evolution. Data warehouses are now more commonplace as companies move to leverage the vast amounts of data they’ve stored in their databases over the years.

The Internet has had a significant impact on the way organizations use databases. Many companies and businesses use the Web to expand their consumer base, and much of the data they share with and gather from these consumers is stored in a database. Developers commonly use eXtensible Markup Language (XML) to assemble and consolidate data from various relational and nonrelational systems.

There has been a considerable effort by various vendors to get their clients to create databases and store data in the “cloud”; that is, a location that is completely apart from the client’s location. The idea is that the client can access data from the cloud database via the Internet from anywhere at any time. As an example of using the “cloud” for database management, Microsoft’s focus in the last several releases of Microsoft Access has been to migrate data from desktop devices to cloud servers. Given the broad emergence and use of connected devices within the past several years (as of this writing), it will be interesting to see how database management systems evolve within this type of environment.

Anatomy of a Relational Database

According to the relational model, data in a relational database is stored in relations, which are perceived by the user as tables. Each relation is composed of tuples (records or rows) and attributes (fields or columns). A relational database has several other characteristics, which are discussed in this section.

Tables

Tables are the main structures in the database. Each table always represents a single, specific subject. The logical order of rows and columns within a table is of absolutely no importance. Every table contains at least one column—known as a primary key—that uniquely identifies each of its rows. (In Figure 1-1, for example, CustomerID is the primary key of the Customers table.) In fact, data in a relational database can exist independent of the way it is physically stored in the computer because of these last two table characteristics. This is great news for users because they aren’t required to know the physical location of a row in order to retrieve its data.

The subject that a given table represents can be either an object or an event. When the subject is an object, the table represents something that is tangible, such as a person, place, or thing. Regardless of its type, every object has characteristics that can be stored as data. You can then process this data in an almost infinite number of ways. Pilots, products, machines, students, buildings, and equipment are all examples of objects that can be represented by a table. Figure 1-1 illustrates one of the most common examples of this type of table.

When the subject of a table is an event, the table represents something that occurs at a given point in time and has characteristics you wish to record. These characteristics can be stored as data and then processed as information in the same manner as a table that represents some specific object. Examples of events you might need to record include judicial hearings, distributions of funds, lab test results, and geological surveys. In a sales orders database, an order can be considered both an object (the physical piece of paper representing an order) and an event (the shipment of the items ordered). Figure 1-2 shows an example of a table representing an event that we all have experienced at one time or another—a doctor’s appointment.

Columns

A column is the smallest structure in the database, and it represents a characteristic of the subject of the table to which it belongs. Columns are the structures that store data. You can retrieve the data in these columns and then present it as information in almost any configuration imaginable. Remember that the quality of the information you get from your data is in direct proportion to the amount of time you’ve dedicated to ensuring the structural integrity and data integrity of the columns themselves. There is just no way to underestimate the importance of columns.

Every column in a properly designed database contains one and only one value, and its name identifies the type of value it holds. This makes entering data into a column very intuitive. If you see columns with names such as FirstName, LastName, City, State, and ZipCode, you know exactly what type of value goes into each column. You’ll also find it very easy to sort the data by state or to look for everyone whose last name is Viescas.

Rows

A row represents a unique instance of the subject of a table. It is composed of the entire set of columns in a table, regardless of whether or not the columns contain any values. Because of the manner in which a table is defined, each row is identified throughout the database by a unique value in the primary key column(s) of that row.

In Figure 1-1, for example, each row represents a unique customer within the table, and the CustomerID column identifies a given customer throughout the database. In turn, each row includes all the columns within the table, and each column describes some aspect of the customer represented by the row. Rows are a key factor in understanding table relationships because you need to know how a row in one table relates to other rows in another table.

Keys

Keys are special columns that play very specific roles within a table. The type of key determines its purpose within the table. Although a table might contain several types of keys, I will limit my discussion to the two most important ones: the primary key and the foreign key.

A primary key consists of one or more columns that uniquely identify each row within a table. (When a primary key is composed of two or more columns, it is known as a composite primary key.) The primary key is the most important for two reasons: Its value identifies a specific row throughout the entire database, and its column identifies a given table throughout the entire database. Primary keys also enforce table-level integrity and help establish relationships with other tables. Every table in your database should have a primary key.

The AgentID column in Figure 1-3 is a good example of a primary key because it uniquely identifies each agent within the Agents table and helps to guarantee table-level integrity by ensuring nonduplicate rows. It is also used to establish relationships between the Agents table and other tables in the database, such as the Entertainers table shown in the example.

When you determine that a pair of tables has a relationship to each other, you typically establish the relationship by taking a copy of the primary key from the first table and inserting it into the second table, where it becomes a foreign key. (The term foreign key is derived from the fact that the second table already has a primary key of its own, and the primary key you are introducing from the first table is foreign to the second table.)

Figure 1-3 also shows a good example of a foreign key. In this example, AgentID is the primary key of the Agents table, and it is a foreign key in the Entertainers table. As you can see, the Entertainers table already has a primary key—EntertainerID. In this relationship, AgentID is the column that establishes the connection between Agents and Entertainers.

Foreign keys are important not only for the obvious reason that they help establish relationships between pairs of tables but also because they help ensure relationship-level integrity. This means that the rows in both tables will always be properly related because the values of a foreign key must be drawn from the values of the primary key to which it refers. Foreign keys also help you avoid the dreaded “orphaned rows,” a classic example of which is an order row without an associated customer. If you don’t know who placed the order, you can’t process it, and you obviously can’t invoice it. That’ll throw off your quarterly sales!

Views

A view is a virtual table composed of columns from one or more tables in the database. The tables that comprise the view are known as base tables. The relational model refers to a view as virtual because it draws data from base tables rather than storing any data on its own. In fact, the only information about a view that is stored in the database is its structure.

Views enable you to see the information in your database from many different perspectives, thus providing great flexibility for working with data. You can create views in a variety of ways—they are especially useful when based on multiple related tables. For example, you can create a view that summarizes information such as the total number of hours worked by every carpenter within the downtown Seattle area. Or you can create a view that groups data by specific columns. An example of this type of view is displaying the total number of employees in each city within every state of a specified set of regions. Figure 1-4 presents an example of a typical view.

In many RDBMS programs, a view is commonly implemented and referred to as a saved query or, more simply, a query. In most cases, a query has all the characteristics of a view, so the only difference is that it is referred to by a different name. (I often wonder if someone in some marketing department had something to do with this.) It’s important to note that some vendors refer to a query by its real name. Regardless of what it’s called in your RDBMS program, you’ll certainly use views in your database.

Having said that, the name of this book is SQL Queries for Mere Mortals, but I’m really focused on teaching you how to build views. As you’ll learn in Chapter 2, “Ensuring Your Database Structure Is Sound,” the correct way to design a relational database is to break up your data so that you have one table per subject or event. Most of the time, however, you’ll want to get information about related subjects or events—which customers placed what orders or what classes are taught by which instructors. To do that, you need to build a view, and you need to know SQL to do that.

Relationships

If rows in a given table can be associated in some way with rows in another table, the tables are said to have a relationship between them. The manner in which the relationship is established depends on the type of relationship. Three types of relationships can exist between a pair of tables: one-to-one, one-to-many, and many-to-many. Understanding relationships is crucial to understanding how views work and, by definition, how multi-table SQL queries are designed and used. (You’ll learn more about this in Part III, “Working with Multiple Tables.”)

One-to-One

A pair of tables has a one-to-one relationship when a single row in the first table is related to only one row in the second table, and a single row in the second table is related to only one row in the first table. In this type of relationship, one table is referred to as the primary table, and the other is referred to as the secondary table. You establish this relationship by taking the primary key of the primary table and inserting it into the secondary table, where it becomes a foreign key. This is a special type of relationship because in nearly all cases the foreign key also acts as the primary key of the secondary table.

Figure 1-5 shows an example of a typical one-to-one relationship in which Agents is the primary table and Compensation is the secondary table. The relationship between these tables is such that a single row in the Agents table can be related to only one row in the Compensation table, and a single row in the Compensation table can be related to only one row in the Agents table. Note that AgentID is indeed the primary key in both tables but also serves as a foreign key in the secondary table.

The selection of the table that will play the primary role in this type of relationship depends on whether rows can exist in one table with no matching row in the other table. You cannot add rows to the secondary table in a one-to-one relationship unless a matching row already exists in the primary table. For example, a new agent might be hired, but the compensation is not decided yet. You need to be able to define the agent without requiring that a matching compensation row exists, so Agent becomes the primary table. Also, defining a compensation row for an employee who doesn’t exist does not make sense, so clearly compensation is the secondary table. One-to-one relationships are not very common and are usually found in cases where a table has been split into two parts for confidentiality purposes.

One-to-Many

When a pair of tables has a one-to-many relationship, a single row in the first table can be related to many rows in the second table, but a single row in the second table can be related to only one row in the first table. This relationship is established by taking the primary key of the table on the “one” side and inserting it into the table on the “many” side, where it becomes a foreign key.

Figure 1-6 shows a typical one-to-many relationship. In this example, a single row in the Entertainers table can be related to many rows in the Engagements table, but a single row in the Engagements table can be related to only one row in the Entertainers table. As you probably have guessed, EntertainerID is a foreign key in the Engagements table.

Figure 1-6 An example of a one-to-many relationship

Many-to-Many

A pair of tables is in a many-to-many relationship when a single row in the first table can be related to many rows in the second table, and a single row in the second table can be related to many rows in the first table. To establish this relationship properly, you must create what is known as a linking table. This table provides an easy way to associate rows from one table with those of the other and will help to ensure that you have no problems adding, deleting, or modifying any related data. You define a linking table by taking a copy of the primary key of each table in the relationship and using them to form the structure of the new table. These columns actually serve two distinct roles: Together they form the composite primary key of the linking table, and separately they each serve as a foreign key.

A many-to-many relationship that has not been properly established is said to be unresolved. Figure 1-7 shows a clear example of an unresolved many-to-many relationship. In this case, a single row in the Customers table can be related to many rows in the Entertainers table, and a single row in the Entertainers table can be related to many rows in the Customers table.

Figure 1-7 An unresolved many-to-many relationship

This relationship is unresolved because of the inherent problem with a many-to-many relationship. The issue is this: How do you easily associate rows from the first table with rows in the second table? To reframe the question regarding the tables shown in Figure 1-7, how do you associate a single customer with several entertainers or a specific entertainer with several customers? (If you are running an entertainment booking agency, you certainly hope that any one customer will book multiple entertainers over time and that any one entertainer has more than one customer!) Do you insert a few columns from the Customers table into the Entertainers table? Or do you add several columns from the Entertainers table to the Customers table? Either of these approaches is going to create some problems when you try to work with related data, not least of which regards data integrity. The solution to this dilemma is to create a linking table in the manner previously stated. By creating and using the linking table, you can properly resolve the many-to-many relationship. Figure 1-8 shows this solution in practice.

Figure 1-8 A properly resolved many-to-many relationship

The linking table in Figure 1-8 was created by taking the CustomerID from the Customers table and the EntertainerID from the Entertainers table and using them as the basis for a new table. As with any other table in the database, the new linking table has its own name—Engagements. In fact, the Engagements table is a good example of a table that stores the information about an event. Entertainer 1003 (JV & the Deep Six) played an engagement for customer 10001 (Doris Hartwig) on February 23. And a linking table lets you store additional information about the link—like the date and perhaps the cost of the engagement. The real advantage of a linking table is that it allows you to associate any number of rows from both tables in the relationship. As the example shows, you can now easily associate a given customer with any number of entertainers or a specific entertainer with any number of customers.

As I stated earlier, understanding relationships will pay great dividends when you begin to work with multi-table SQL queries, so be sure to revisit this section when you begin working on Part III of this book.

What’s in It for You?

Why should you be concerned with understanding relational databases? Why should you even care what kind of environment you’re using to work with your data? And in addition to all this, what’s really in it for you? Here’s where the enlightenment starts and the fun begins.

The time you spend learning about relational databases is an investment, and it is to your distinct advantage to do so. You should develop a good working knowledge of the relational database because it’s the most widely used data model in existence today. Forget what you read in the trades and what Harry over in the Information Technology Services department told you—a vast majority of the data being used by businesses and organizations is being collected, maintained, and manipulated in relational databases. Yes, there have been extensions to the model, the application programs that work with relational databases have been injected with object orientation, and relational databases have been thoroughly integrated into the Web and the cloud. But no matter how you slice it, dice it, and spice it, it’s still a relational database! The relational database has been around for more than 40 years, it’s still going strong, and it’s not going be replaced anytime in the foreseeable future.

Nearly all commercial database management software used today is relational. (However, folks such as C. J. Date and Fabian Pascal might seriously question whether any commercial implementation is truly relational!) If you want to be gainfully employed in the database field, you’d better know how to design a relational database and how to implement it using one of the popular RDBMS programs. And now that so many companies and corporations depend on the Internet, the cloud, and connected services, you’d better have some Web development experience under your belt as well.

Having a good working knowledge of relational databases is helpful in many ways. For instance, the more you know about how relational databases are designed, the easier it will be for you to develop end-user applications for a given database. You’ll also be surprised by how intuitive your RDBMS program will become because you’ll understand why it provides the tools it does and how to use those tools to your best advantage. Your working knowledge will be a great asset as you learn how to use SQL because SQL is the standard language for creating, maintaining, and working with a relational database.

Where Do You Go from Here?

Now that you know the importance of learning about relational databases, you must understand that there is a difference between database theory and database design. Database theory involves the principles and rules that formulate the basis of the relational database model. It is what is learned in the hallowed halls of academia and then quickly dismissed in the dark dens of the real world. But theory is important, nonetheless, because it guarantees that the relational database is structurally sound and that all actions taken on the data in the database have predictable results. On the other hand, database design involves the structured, organized set of processes used to design a relational database. A good database design methodology will help you ensure the integrity, consistency, and accuracy of the data in the database and guarantee that any information you retrieve will be as accurate and up to date as possible.

If you want to design and create enterprise-wide databases, or develop web-based Internet commerce databases, or begin to delve into data warehousing, you should seriously think about studying database theory. This applies even if you’re not going to explore any of these areas but are considering becoming a high-end database consultant. For the rest of you who are going to design and create relational databases on a variety of platforms (which, I believe, is the vast majority of the people reading this book), learning a good, solid database design methodology will serve you well. Always remember that designing a database is relatively easy, but implementing a database within a specific RDBMS program on a particular platform is another issue altogether. (Another story, another book, another time.)

There are a number of good database design books on the market. Some, such as Mike Hernandez’s companion book Database Design for Mere Mortals, Third Edition (Addison-Wesley, 2013), deals only with database design methodologies. Others, such as C. J. Date’s An Introduction to Database Systems, Eighth Edition (Addison-Wesley, 2003), mix both theory and design. (Be warned, though, that the books dealing with theory are not necessarily light reading.) After you decide in which direction you want to go, select and purchase the appropriate books, grab a double espresso (or your beverage of choice), and dig right in. After you become comfortable with relational databases in general, you’ll find that you will need to study and become very familiar with SQL.

And that’s why you’re reading this book.

Summary

I began this chapter with a brief discussion of the different types of databases commonly found today. You learned that organizations working with dynamic data use operational databases, ensuring that the information retrieved is always as accurate and up-to-the-minute as possible. You also learned that organizations working with static data use analytical databases.

I then looked at a brief history of the relational database model. I explained that Dr. E. F. Codd created the model based on specific branches of mathematics and that the model has been in existence for nearly 50 years. Database software, as you now know, has been developed for various computing environments and has steadily grown in power, performance, and capability since the 1970s. From the mainframe to the desktop to the Web to connected services, RDBMS programs are the backbone of many organizations today.

Next, I looked at an anatomy of a relational database. I introduced you to its basic components and briefly explained their purpose. You learned about the three types of relationships and now understand their importance, not only regarding the database structure itself but also as they relate to your understanding of SQL.

Finally, I explained why it’s to your advantage to learn about relational databases and how to design them. You now know that the relational database is the most common type of database in use today and that just about every database software program you’re likely to encounter will be used to support a relational database. You now have some ideas of how to pursue your education on relational database theory and design a little further.

In the next chapter, you’ll learn some techniques to fine-tune your existing database structures.