Hour 7

Normalizing Your Database

What You’ll Learn in This Hour:

▶ Definition of normalization
▶ Benefits of normalization
▶ Advantages of denormalization
▶ Normalization techniques using the most common normal forms
▶ How to expand upon your data model to the BIRDS database
▶ Normalization techniques used with the sample BIRDS database

In this hour, you learn how to expand upon the beginnings of the logical database design process that you’ve learned in previous hours. This hour focuses on normalization, which is a technique that eliminates redundant data in a database and optimizes relationships between data entities. As stated before, this is not a database design book and this hour does not exhaustively cover normalization; however you learn how to quickly apply the basic concepts of normalization to your database design. Using the auxiliary database that you have been modeling the past couple hours, you finalize an entity relationship diagram that is ready to be implemented as a physical database using SQL commands. Everything that you have learned up to this point serves as an extremely strong foundation so that you can immediately apply all the SQL concepts that you learn in this book.

As with nearly everything covered up to this point, the great part about the process of normalization is that it works the same regardless of which relational database management system (RDBMS) you use. The advantages and disadvantages of both normalization and denormalization of a database are also discussed in this hour, along with data integrity versus performance issues that pertain to normalization.

Defining Normalization

Normalization is the process of reducing redundancies of data in a database. It comes into play when you are designing and redesigning a database, to help you eliminate or reduce redundant data. The actual guidelines of normalization, called normal forms, are discussed later in this hour.

The decision to cover normalization in this book was difficult because of the complexity involved. Understanding the rules of the normal forms can be challenging this early in your SQL journey. However, normalization is an important process that, if understood, increases your understanding of SQL.

Note

Understanding Normalization

The material in this hour attempts to simplify the process of normalization as much as possible. At this point, don’t be overly concerned with all the specifics of normalization; understanding the basic concepts is most important. Because this topic is not always easy to understand at first, several different types of examples are included, starting with standard employee-type data, then moving on to the BIRDS database, incorporating bird resource data into the BIRDS database, and finally finishing with your own opportunity to normalize photographer data that can also be integrated into the BIRDS database.

The Raw Database

A database that is not normalized might include data that is contained in one or more tables for no apparent reason. This is detrimental for security reasons, disk space usage, speed of queries, efficiency of database updates, and, maybe most important, data integrity. A database that has not been normalized has not been broken down logically into smaller, more manageable tables. Figure 7.1 illustrates an example of a simple database before normalization.

Figure of the Company database. The Company table ID is COMPANY_DATABASE. — **FIGURE 7.1**
The raw database

Determining the set of information that the raw database contains is one of the first and most important steps in logical database design, as previously discussed. You must know all the data elements that comprise your database to effectively apply the techniques discussed in this hour. Taking the time to gather the set of required data keeps you from having to backtrack your database design scheme because of missing data elements.

Logical Database Design

Any database should be designed with the end user in mind. As discussed in previous hours, logical database design, also referred to as the logical model, is the process of arranging data into logical, organized groups of objects that can easily be maintained. The logical design of a database should reduce data repetition or completely eliminate it. After all, why store the same data twice? In addition, the logical database design should strive to make the database easy to maintain and update. Naming conventions used in a database should also be standard and logical, to aid in this endeavor.

Note

The End User’s Needs

The needs of the end user should be one of the top considerations when designing a database. Remember that the end user is the person who ultimately uses the database. The user’s front-end tool (a client program that enables a user access to a database) should be easy to use, but this (along with optimal performance) cannot be achieved if the user’s needs are not considered.

Data Redundancy

Data should not be redundant; duplicated data should be kept to a minimum. For example, storing an employee’s home address in more than one table is unnecessary, as is storing a specific bird’s name in more than one table. Duplicate data takes up unnecessary space. Confusion is also a threat when, for instance, an address for an employee in one table does not match the address for the same employee in another table. Which table is correct? Do you have documentation to verify the employee’s current address? Data management is difficult enough; redundant data can be disastrous.

Reducing redundancy also ensures that updating the data within the database is a relatively simple process. If you have a single table for employees’ addresses and you update that table with new addresses, you can rest assured that it is updated for everyone who is viewing the data.

Note

Eliminating Data Redundancy

One of the main objectives of database design and, specifically, normalization is to eliminate redundant data as much as possible. Even if you cannot completely eliminate redundant data, you should minimize redundant data, to avoid issues with data integrity, relationships, and general usability and scalability of the database in the future.

Exploring the Most Common Normal Forms of the Normalization Process

Normalizing a relational database is achieved by applying general guidelines that are organized progressively in two phases called normal forms. The next sections discuss normal forms, an integral concept in the process of database normalization.

A normal form is a way of measuring the levels, or depth, to which a database has been normalized. A database’s level of normalization is determined by the normal form.

The following are the three most common normal forms in the normalization process:

▶ The first normal form
▶ The second normal form
▶ The third normal form

Additional normal forms exist beyond these, but they are used far less often than the three major ones noted here. Of the three major normal forms, each subsequent normal form depends on normalization steps taken in the previous normal form. For example, to normalize a database using the second normal form, the database must be in the first normal form.

First Normal Form (FNF): The Key

The objective of the first normal form is to divide the base data into logical entities and, ultimately, tables. When each entity has been designed, a primary key is assigned to most or all entities. Examine Figure 7.2, which illustrates how the raw database shown in Figure 7.1 has been redeveloped using the first normal form.

The first normal form of the Company database divided into three tables namely: EMPLOYEE_TBL , CUSTOMER_TBL, and PRODUCTS_TBL. — **FIGURE 7.2**
The first normal form

You can see that, to achieve the first normal form, data had to be broken into logical units of related information, each with a primary key. No repeated groups appear in any of the tables. Instead of one large table, you have three smaller, more manageable tables: EMPLOYEE_TBL, CUSTOMER_TBL, and PRODUCTS_TBL. The primary keys are normally the first columns listed in a table—in this case, EMP_ID, CUST_ID, and PROD_ID. This is a normal convention to use when diagramming your database, to ensure that it is easily readable.

However, your primary key can also be made up of more than one column in the data set. Often these values are not simple database-generated numbers; they are logical points of data such as a product’s name or a book’s ISBN. These values are commonly referred to as natural keys because they uniquely define a specific object, regardless of whether it is in a database. The main point to remember in selecting your primary key for a table is that it must uniquely identify a single row. Without this, you introduce the possibility of adding duplication into your results of queries and prevent yourself from successfully performing even simple actions, such as removing a particular row of data based solely on the key.

Figure 7.3 shows another example of the first normal form being applied to the BIRDS database. This is only a subset of the data from the BIRDS database to have a simplified example. The figure also shows how the BIRDS database is related in some way to the bird rescue data as well as the photographer data that we were working with in examples and exercises in this book.

The First normal form in the BIRDS database. — **FIGURE 7.3**
First normal form in the `BIRDS` database

After studying Figure 7.3, hopefully you can see that the basic bird information is grouped together into a single entity. The # represents a foreign key constraint that will eventually be placed on the BIRD_ID attribute of the BIRDS entity. When a table called BIRDS is later created based on this entity, a constraint will be placed on the BIRD_ID column. The BIRD_ID primary key is a unique identifier that ensures that every occurrence of a specific bird in the birds entity is unique. Maintaining duplicate records of the same bird is not necessary in this entity.

Note

First Normal Form

The objective of the first normal form is to divide the base data into logical units called entities. Each entity has attributes about the data contained within. These entities and attributes will become tables and columns eventually in the physical implementation based on the logical database design. When appropriate, a primary key should be assigned to each entity. To achieve first normal form, data is broken into logical units of related information. Assigning a primary key helps ensure that no duplicate records exist within a group of data.

Second Normal Form (SNF): The Whole Key

The objective of the second normal form is to enter data that is only partly dependent on the primary key into another table. Figure 7.4 illustrates the second normal form.

The second normal form is derived from the first normal form of a Company database. — **FIGURE 7.4**
The second normal form

As you can see in the figure, the second normal form is derived from the first normal form by further breaking two tables into more specific units.

EMPLOYEE_TBL is split into two tables, called EMPLOYEE_TBL and EMPLOYEE_PAY_TBL. Personal employee information is dependent on the primary key (EMP_ID), so that information remains in the EMPLOYEE_TBL (EMP_ID, LAST_NAME, FIRST_NAME, MIDDLE_NAME, ADDRESS, CITY, STATE, ZIP, PHONE, and PAGER). However, the information that is only partly dependent on the EMP_ID (each individual employee) populates EMPLOYEE_PAY_TBL (EMP_ID, POSITION, POSITION_DESC, DATE_HIRE, PAY_RATE, and DATE_LAST_RAISE). Notice that both tables contain the column EMP_ID. This is the primary key of each table and is used to match corresponding data between the two tables.

CUSTOMER_TBL is split into two tables, called CUSTOMER_TBL and ORDERS_TBL. The process here is similar to what occurred with the EMPLOYEE_TBL: Columns that were partly dependent on the primary key were directed to another table. The order of the information for a customer depends on each CUST_ID but does not directly depend on the general customer information in the original table.

Figure 7.5 illustrates another example of the second normal form in a subset of data in the BIRDS database (refer to Figure 7.3).

You can see here that, instead of having only one entity for birds, five entities have been derived from BIRDS. This is because the attributes that are only partly dependent on BIRDS, such as the food that a bird eats, have been separated into their own entities. The problem with the previous design shown in Figure 7.3 is that if food information is stored in the BIRDS entity, the only way for it to represent that a bird eats more than one type of food is to have multiple columns in the BIRDS entity for different foods (such as FOOD1, FOOD2, FOOD3, and so on). This is redundant information. Additionally, even though the food that a bird eats is partially dependent on a particular bird, a food is really an entity of its own and should become its own entity. This is the same case with nests, migration, and nicknames.

You can conclude the following relationships and information from Figure 7.5:

▶ BIRDS is the parent entity and the BIRD_ID in BIRDS is the primary key, which will be a parent key of entities that have foreign keys that reference birds. The BIRD_ID in BIRDS thus helps ensure that every record for a specific type of bird is unique within that entity.
▶ A bird builds only one type of nest. However, each specific type of nest can be built by many different types of birds. Therefore, the combination of NEST_ID and BIRD_ID comprises the primary key in NESTS. BIRD_ID is a foreign key that references the parent BIRD_ID in BIRDS.
▶ A bird might migrate to many different locations; conversely, each migration location might have many different types of birds that migrate there. Therefore, the primary key is the combination of MIGRATION_ID and BIRD_ID. MIGRATION_ID itself cannot represent uniqueness because multiple occurrences of migration location might still exist within that entity. BIRD_ID is a foreign key that references the BIRD_ID in BIRDS.
▶ A bird might have many nicknames; however, this figure indicates that each nickname is associated with only a single type of bird. Notice that, in this entity, the primary key is the combination of the BIRD_ID and NICKNAME itself.
▶ A bird might eat many different types of foods, and each different type of food might be eaten by many different types of birds. As with some of the other relationships in this figure, this is a many-to-many relationship. The FOOD_ID and BIRD_ID are two attributes that comprise the primary key in food because the combination of the two must be unique for every occurrence of data in this entity. The BIRD_ID is a foreign key that references the BIRD_ID in BIRDS.

Note

Second Normal Form

The objective of the second normal form is to move data that is only partly dependent on the primary key into another entity of its own. The second normal form is derived from the first normal form by further breaking down the original entity into two entities. Attributes that are only partly dependent on the primary key are moved to a new entity of their own.

Third Normal Form (TNF): Nothing but the Key

The third normal form’s objective is to remove data from a table that is not dependent on the primary key. Figure 7.6 illustrates the third normal form.

The third normal form of a Company database. — **FIGURE 7.6**
The third normal form

Another table was created to display the use of the third normal form. EMPLOYEE_PAY_TBL is split into two tables: One table contains the actual employee pay information; the other table contains the position descriptions, which do not need to reside in EMPLOYEE_PAY_TBL. The POSITION_DESC column is totally independent of the primary key, EMP_ID. As you can see, the normalization process is a series of steps that breaks down the data from the raw database into discrete tables of related data.

Figure 7.7 shows an example of taking data sets from the BIRDS database into the third normal form, based on what you started to model in Figure 7.5.

Third normal form in the BIRDS database. — **FIGURE 7.7**
Third normal form in the `BIRDS` database

Figure 7.7 illustrates how the BIRDS and FOOD entities have been further broken down into three different entities, using the guidelines of the third normal form. The relationship between BIRDS and FOOD alone is a many-to-many relationship. Many-to-many relationships should generally be avoided in a relational database and should instead be resolved using the normalization process. The problem with many-to-many relationships is that redundant data will certainly exist. Figure 7.7 shows how these two tables, or entities, have been expanded into three entities, called BIRDS, BIRDS_FOOD, and FOOD. BIRDS has information only about birds. FOOD has information only about specific food items. BIRDS_FOOD is simply an intermediate entity whose sole purpose is to provide a relationship between birds and food. In BIRDS, the primary key is BIRD_ID. In FOOD, the primary key is FOOD_ID. In BIRDS_FOOD, the primary key is a combination of BIRD_ID and FOOD_ID. For example, the BIRDS_FOOD entity has only one occurrence of a bird and a food item because it is not necessary to store more than one record of a bird eating a specific type of food. This would be considered redundant data, and the purpose of normalization and this specific example is to remove the redundant data from the database. With third normal form, you can see that only a single record exists for every specific type of bird in the database and only a single record exists for each specific type of food in the database. Ideally, the name “Bald Eagle” should exist only once in the entire database. Likewise, the word “Fish” as a food item should exist only one time in the entire database. If the database is designed properly, other entities simply refer to those entities and then are linked to the appropriate related information.

You can conclude the following relationships and information from Figure 7.7:

▶ BIRDS and FOOD are both the parent entities in Figure 7.7. BIRDS_FOOD is a child entity that refers to both BIRDS and FOOD. Even though the primary key in BIRDS_FOOD is a composite primary key consisting of both BIRD_ID and FOOD_ID, the BIRD_ID attribute individually is also a foreign key that references BIRD_ID in BIRDS; likewise, the FOOD_ID attribute individually in BIRDS_FOOD is a foreign key that references FOOD_ID in FOOD.
▶ A bird can eat many different types of foods. This information is found using the BIRDS_FOOD entity to access the FOOD entity.
▶ A food item can be eaten by many different types of birds. This information is also found by accessing the BIRDS_FOOD entity to get information from BIRDS.

Note

Third Normal Form

The objective of the third normal form is to remove data that is not directly dependent on the primary key. Remember that attributes do not have attributes of their own. Any attribute that appears to have attributes of its own should be moved to another entity altogether; only attributes that are directly dependent on the key should remain in an entity.

Naming Conventions

Naming conventions are one of the foremost considerations when you normalize a database. Names are how you refer to objects in the database. You want to give your tables descriptive names that enable you to easily find the type of information they contain. Descriptive table names are especially important for users who had no part in the database design but who need to query the database.

Companies should have a company-wide naming convention to provide guidance for naming not only tables within the database, but also users, filenames, and other related objects. Naming conventions also help in database administration by enabling users to more easily discern the purpose of tables and locations of files within a database system. Designing and enforcing naming conventions is one of a company’s first steps toward a successful database implementation.

Following are some examples of naming conventions you might use to name the BIRDS table. The best approach is to name objects in a way that describes the data contained or even the type of object; technically, though, the exact approach does not matter, as long as it is used consistently.

▶ Birds
▶ Bird
▶ BIRDS
▶ BIRD
▶ Bird_Table
▶ Birds_Table
▶ Birds_Tbl
▶ BIRDS_TABLE
▶ T_BIRDS
▶ t_Birds

One common debate among database designers is whether a table name should be singular or plural. For example, a table contains information about birds. However, another way of looking at the very same table is that the table contains information about each specific type of bird, in a singular sense. Again, the approach is based more on how you view the data; no real right or wrong solution exists, as long as you strive for consistency.

Benefits of Normalization

Normalization provides numerous benefits to a database. Some of the major benefits include the following:

▶ Greater overall database organization
▶ Reduction of redundant data
▶ Data consistency within the database
▶ A much more flexible database design
▶ A better handle on database security
▶ Reinforcement of referential integrity

The normalization process results in organization, which makes everyone’s job easier, from the user who accesses tables, to the database administrator (DBA) who is responsible for the overall management of every object in the database. Data redundancy is reduced, which simplifies data structures and conserves disk space. Because duplicate data is minimized, the possibility of inconsistent data is greatly reduced. For example, in one table, an individual’s name might read STEVE SMITH, whereas it might read STEPHEN R. SMITH in another table. Reducing duplicate data increases data integrity, or the assurance of consistent and accurate data within a database. Because the database has been normalized and broken into smaller tables, you have more flexibility in modifying existing structures. Modifying a small table with little data is much easier than modifying one big table that holds all the vital data in the database. Lastly, security is enhanced because the DBA can grant specific users access to a limited number of tables. Security is easier to control after normalization because data is grouped into neatly organized sets.

Drawbacks of Normalization

Most successful databases are normalized to some degree, but this process does incur one substantial drawback: reduced database performance. Query or transaction requests sent to the database then face limitations such as CPU usage, memory usage, and input/output (I/O) issues. To make a long story short, a normalized database requires much more CPU, memory, and I/O to process transactions and database queries than a denormalized database. A normalized database must locate the requested tables and then join the data from the tables to either get the requested information or process the wanted data. Hour 21, “Managing Database Users and Security,” includes a more in-depth discussion of database performance.

Denormalizing a Database

Denormalization is the process of starting with a normalized database and modifying table structures to allow controlled redundancy for increased database performance. Attempting to improve performance is the only reason to denormalize a database. A denormalized database is not the same as a database that has not been normalized. Denormalizing a database is the process of taking the level of normalization within the database down a notch or two. Remember, normalization can actually slow performance with its frequent table join operations.

Denormalization might involve recombining separate tables or creating duplicate data within tables, to reduce the number of tables that need to be joined to retrieve the requested data. This would result in less I/O and CPU time. Denormalization is normally advantageous in larger data warehousing applications, where aggregate calculations are made across millions of rows of data within tables.

Denormalization comes at a cost. Data redundancy increases in a denormalized database, which can improve performance but requires more extraneous efforts to keep track of related data. Application coding is more complicated because the data has been spread across various tables and might be more difficult to locate. In addition, referential integrity is more of a chore because related data is divided among multiple tables.

You can achieve a happy medium in both normalization and denormalization, but both processes require a thorough knowledge of the actual data and the specific business requirements of the pertinent company. If you decide to denormalize parts of your database structure, carefully document the process so that you can see exactly how you are handling issues such as redundancy to maintain data integrity within your systems.

Note

The Most Effective Normal Form

The third normal form is usually the most common and effective normal form applied to the design of a relational database. In most cases, the third normal form also provides a good balance between optimal design with minimal data redundancy and overall performance.

Applying Normalization to Your Database

This section walks you through the basic process of normalizing a database, based on the guidelines of the first three normal forms with the bird rescue data that you have been modeling during the last few hours.

Applying First Normal Form to Bird Rescues Data

In the example in Figure 7.8, basic bird rescue information is modelled in the leftmost box. To adhere to the first normal form, entities are created for each group of data with primary key designations. This is only a subset of data that might exist within the bird rescue data; remember that any example shown can vary, depending on your interpretation of the data itself and the data relationships.

Integrating bird rescue data into the BIRDS data using first normal form. — **FIGURE 7.8**
Integrating bird rescue data into the `BIRDS` data using first normal form

In this example in Figure 7.8, RESCUES is broken into two separate entities, called RESCUES and STAFF. This is because RESCUES is a group of data and a primary key of RESCUE_ID has been designated here. STAFF is another group of data and relates to RESCUES, but it has its own primary key of STAFF_ID that uniquely identifies every record within that entity. The relationship between RESCUES and STAFF in this example is a many-to-many relationship. Thus, each bird rescue might have many staff members, and each staff member might volunteer or work full time or part time at various different bird rescues. This is an example of first normal form.

Applying Second Normal Form to Bird Rescues Data

In the example in Figure 7.9, the bird rescue data that was shown in Figure 7.8 has been further expanded using the guidelines of the second normal form. Study this and try to understand both the relationships and why the data was further divided.

Integrating bird rescue data into the BIRDS data using second normal form. — **FIGURE 7.9**
Integrating bird rescue data into the `BIRDS` data using second normal form

Looking at Figure 7.9, you can see that RESCUES and STAFF are further divided into three different entities. STAFF originally had pay information about each staff member. However, every staff member could conceivably have multiple pay records, and each different type of salary or position could be applicable to various different staff members. Therefore, STAFF was broken into STAFF and STAFF_PAY. STAFF_PAY has its own identifier as a primary key called STAFF_ID, which is a child record or foreign key that references the STAFF_ID in the STAFF entity. In the second normal form, you move attributes to their own entity that are only partly dependent on the parent entity. Pay information related to a staff member is related but only partly dependent, so it then becomes its own entity.

Applying Third Normal Form to Bird Rescues Data

Figure 7.10 shows how the data in Figure 7.9 for bird rescues has been expanded into additional entities and attributes when applying the guidelines of the third normal form. Study this figure and try to understand the relationships between the entities and all the primary and foreign keys that now exist.

Integrating bird rescue data into the BIRDS data using third normal form. — **FIGURE 7.10**
Integrating bird rescue data into the `BIRDS` data using third normal form

After reviewing Figure 7.10, you can see that STAFF, which originally was only one entity, is now a total of four entities. STAFF_PAY itself has been broken into three different entities: STAFF_PAY, STAFF_TYPE, and STAFF_POSITIONS. Remember that the goal of normalization is to eliminate or reduce redundant data. The STAFF_TYPE (which could be perhaps full time, part time, or volunteer) is a type that might apply to any staff member in any rescue. Therefore, the STAFF_TYPE name itself should really exist only one time in the database. There is no reason for a staff type to exist more than once by name, although it can be associated with any staff member or rescue in the database now through relationships. The staff position itself can be perhaps a manager, an owner, or a bird expert, but each position should exist only one time in the database. The POSITION_ID helps ensure that every position is unique in the database, just as TYPE_ID for the STAFF_TYPE ensures that each staff member exists only one time in the database. STAFF_POSITIONS is a child entity that references STAFF_PAY, just as STAFF_TYPE is a child entity that references STAFF_PAY. In Figure 7.10, all redundant data at this point that has been recognized has been isolated into its own entities. This is the third normal form.

Summary

The normalization process gives you a way to effectively design your relational database so that you eliminate or reduce redundant data as much as possible and also organize your data entities and attributes effectively. Sometimes you need to make a difficult decision: to normalize or not to normalize. You always want to normalize a database, to some degree. But how much do you normalize a database without destroying performance? The real decision relies on the application. How large is the database? What is its purpose? What types of users will access the data? This hour covered the three most common normal forms, the concepts behind the normalization process, and the integrity of data. The normalization process involves many steps, most of which are optional but still vital to the functionality and performance of your database. Regardless of how deep you decide to normalize, you almost always face a trade-off, between either simple maintenance and questionable performance, or complicated maintenance and better performance. In the end, the individual (or team of individuals) designing the database must decide, and that person or team is responsible.

Q&A

Q. Why should I be so concerned with the end users’ needs when designing the database?

A. The end users might not have the technical knowledge, but they are some of the most knowledgeable data experts who use the database. Therefore, they should be at the center of any database design effort. The database designer only helps organize the data.

Q. Is normalization more advantageous than denormalization?

A. Normalization can be more advantageous. However, denormalization also can be more advantageous in some situations. Remember, many factors help determine which way to go. You will probably normalize your database to reduce repetition in the database, but you might denormalize it to a certain extent, to improve performance.

Q. Are there more than three levels of normalization?

A. More levels of normalization can be achieved in a relational database. However, keep in mind that this is not a database design book. The third normal form is the most common normal form that is used to model most relational databases. This book simply provides a general overview of the relational database and database design. The third normal form is really all you need to know to fully understand and unleash the power of SQL in most situations.

Workshop

The following workshop consists of a series of quiz questions and practical exercises. The quiz questions are designed to test your overall understanding of the current material. The practical exercises give you the opportunity to apply the concepts discussed during the current hour, as well as build on the knowledge you acquired in previous hours of study. Be sure to complete the quiz questions and exercises before continuing to the next hour. Refer to Appendix C, “Answers to Quizzes and Exercises,” for answers.

Quiz

1. True or false: Normalization is the process of grouping data into logical related groups.
2. True or false: Having no duplicate data or redundant data in a database, and having everything in the database normalized, is always the best way to go.
3. True or false: If data is in the third normal form, it is automatically in the first and second normal forms.
4. What is a major advantage of a denormalized database versus a normalized database?
5. What are some major disadvantages of denormalization?
6. How do you determine whether data needs to be moved to a separate table when normalizing your database?
7. What are the disadvantages of overnormalizing your database design?
8. Why is it important to eliminate redundant data?
9. What is the most common level of normalization?

Exercises

1. Assuming that you came up with a similar configuration to the entities shown in Figure 7.11 for the photographer data set that is to be incorporated into the BIRDS database, take a minute to compare this example to yours. Feel free to use this example as a baseline for these exercises, use your own, or combine the two as you see fit. Also review Figure 6.12 in Hour 6, “Defining Entities and Relationships,” to envision how this data might be integrated into the original BIRDS database.

Integrating bird rescue data into the BIRDS data using third normal form. The RESCUES and STAFF are further divided into five different entities. The data elements in the RESCUES are: hash Rescue_Id, Rescue_Name, Rescue_Address. STAFF is broken into four entities. The data elements in the STAFF are: hash Staff_Id, Staff_Name, Staff_Address. STAFF_PAY itself has been broken into three different entities: STAFF_ PAY, STAFF_TYPE, and STAFF_POSITIONS. The data elements in the STAFF_PAY are: hash Staff_Id, Staff_Salary, Type_Id, and Position_Id. The data elements in the Staff_Type are: hash Type_Id and Type_Name. The data elements in the Staff_Positions are: hash Position_Id and Position_Name. The relationship between RESCUES and STAFF is many-to-many. The relationship between STAFF and STAFF_PAY, between STAFF_PAY and Staff_Type, and between STAFF_PAY and Staff_Positions is one-to-many.

FIGURE 7.11
Example entities and attributes for photographer data
2. List some of the redundant data that you see in this example and the ERD you modeled.
3. Use the guidelines of the first normal form to model your database appropriately.
4. Use the guidelines in this hour to take your data model to the second normal form.
5. Finally, use the guidelines in this hour to take your data model to the third normal form.
6. Verbally describe all relationships in your third normal form model.
7. List all primary and foreign keys from your third normal form model.
8. Can you envision any other data that could be added to your ERD?