Hour 8

Defining Data Structures

What You’ll Learn in This Hour:

In this hour, you learn about the characteristics of data and how such data is stored in a relational database. After a brief discussion of data itself, you learn the basic data types that exist within the standard language of SQL. You see examples of different types of data and then apply those to the BIRDS database. You will walk through how to assign data types to the bird rescue data and, finally, assign data types to the photographer database that you have been designing to this point. Up to now, this book has focused on the concepts and logical design for building a foundation for a strong SQL database and applying SQL to everything you have learned. Now, we move on to the application of data types in SQL to begin defining the data that you have modeled.

Note

Moving from Logical to Physical Databases

So far, this book has mostly dealt with the logical design of a database. During the planning and design of a database, you typically refer to groups of data, or entities. You also refer to attributes, or fields within those entities. This hour uses more physical database terminology, which involves entities seen as tables that actually store physical data, and what we see now as columns within tables that were previously represented logically as attributes. A table is a group of data and columns or attributes that physically store the data.

Defining Data

Data is a collection of information stored in a database as one of several different data types. Data includes names, numbers, dollar amounts, text, graphics, decimals, figures, calculations, summarization, and just about anything else you can possibly imagine. Data can be stored in upper case, lower case, or mixed case. Data can be manipulated or changed; most data does not remain static for its lifetime.

Data types provide rules for data for particular columns. A data type deals with the way values are stored in a column for the length allocated for a column and whether certain values (such as alphanumeric, numeric, and date and time data) are allowed. A data type exists for every possible bit or combination of data that can be stored in a particular database. These data types store data such as characters, numbers, date and time, images, and other binary data. More specifically, the data might consist of names, descriptions, numbers, calculations, images, image descriptions, documents, and so forth.

Understanding Basic Data Types

The following sections discuss the basic data types that ANSI SQL supports. Data types are characteristics of the data itself whose attributes are placed in fields within a table. For example, you can specify that a field must contain numeric values, disallowing alphanumeric strings to be entered. After all, you would not want to enter alphabetic characters in a field for a dollar amount. Defining each field in the database with a data type eliminates much of the incorrect data found in a database from data entry errors. Field definition (data type definition) is a form of data validation that controls the type of data that may be entered into each given field.

Depending on your implementation of the relational database management system (RDBMS), certain data types can be converted automatically to other data types. This type of conversion is known as an implicit conversion, which means that the database handles the conversion for you. An example of this is taking a numeric value of 1000.92 from a numeric field and inputting it into a string field. Other data types cannot be converted implicitly by the host RDBMS and, therefore, must undergo an explicit conversion. This usually involves using a SQL function, such as CAST or CONVERT. In the following Oracle example, the current system date is retrieved from the database in the default date format, which is a date data type:

SELECT CAST('12/27/1974' AS DATETIME) AS MYDATE

SQL> SELECT SYSDATE FROM DUAL;

SYSDATE
---------
08-SEP-15

Note

The SELECT Statement

The SELECT statement is one of the most important SQL commands. SELECT enables you to query data from within the database. The following examples are simple SELECT statements that show simple data within a database. Hour 12, “Introduction to Database Queries,” covers the SELECT statement in detail; subsequent chapters deal with how to effectively manage and get data from the database.

If you want to change or display the date in a format other than the default data type, you can apply the Oracle TO_CHAR function to display the date as a character string. In the next example, this function retrieves only the current month:

SQL> SELECT TO_CHAR(SYSDATE, 'Month') MONTH
  2  FROM DUAL;

MONTH
------------------------------------
September

As with most other languages, the basic data types in SQL are as follows:

  •    String types

  •    Numeric types

  •    Date and time types

Tip

SQL Data Types

Every implementation of SQL has its own specific set of data types. Using implementation-specific data types is necessary to support each implementation’s approach to storing data. However, the basics are the same among all implementations.

Fixed-Length Strings

Constant characters, or strings that always have the same length, are stored using a fixed-length data type. The following is the standard for a SQL fixed-length character:

CHARACTER(n)

Here, n represents a number identifying the allocated or maximum length of the particular field with this definition.

Some implementations of SQL use the CHAR data type to store fixed-length data. You can store alphanumeric data in this data type. An example of a constant-length data type is a state abbreviation; all state abbreviations are two characters.

Spaces normally fill any extra spots when using a fixed-length data type. If a field’s length is set to 10 and the data entered fills only 5 places, the remaining 5 spaces are recorded as spaces. This padding of spaces ensures that each value in a field is a fixed length.

Caution

Fixed-Length Data Types

Be careful not to use a fixed-length data type for fields that might contain varying-length values, such as an individual’s name. If you use the fixed-length data type inappropriately, you will eventually encounter problems, such as wasted available space and the inability to make accurate comparisons between data.

Always use the varying-length data type for nonconstant character strings, to save database space.

Varying-Length Strings

SQL supports the use of varying-length strings, or strings whose length is not constant for all data. The following is the standard for a SQL varying-length character:

CHARACTER VARYING(n)

Here, n represents a number identifying the allocated or maximum length of the particular field with this definition.

Common data types for variable-length character values are the VARCHAR, VARBINARY, and VARCHAR2 data types. VARCHAR is the ANSI standard, which Microsoft SQL Server and MySQL use; Oracle uses both VARCHAR and VARCHAR2. The data stored in a character-defined column can be alphanumeric, which means that the data value may contain numeric characters. VARBINARY is similar to VARCHAR and VARCHAR2, except that it contains a variable length of bytes. Normally, you use a type such as this to store some kind of digital data such as an image file.

Remember that fixed-length data types typically pad spaces to fill in allocated places that the field is not using. The varying-length data type does not work this way. For instance, if the allocated length of a varying-length field is 10 and a string of 5 characters is entered, the total length of that particular value is only 5. Spaces are not used to fill unused places in a column.

Large Object Types

Some variable-length data types need to hold longer lengths of data than what is traditionally reserved for a VARCHAR field. The BLOB and TEXT data types are two examples of such data types in modern database implementations. These data types are specifically made to hold large sets of data. The BLOB is a binary large object, so its data is treated as a large binary string (a byte string). A BLOB is especially useful in an implementation that needs to store binary media files in the database, such as images or MP3s.

The TEXT data type is a large character string data type that can be treated as a large VARCHAR field. It is often used when an implementation needs to store large sets of character data in the database. An example of this is storing HTML input from the entries of a blog site. Storing this type of data in the database enables the site to be dynamically updated.

Numeric Types

Numeric values are stored in fields that are defined as some type of number, typically referred to as NUMBER, INTEGER, REAL, DECIMAL, and so on.

The following are the standards for SQL numeric values:

  •    BIT(n)

  •    BIT VARYING(n)

  •    DECIMAL(p,s)

  •    INTEGER

  •    SMALLINT

  •    BIGINT

  •    FLOAT(p,s)

  •    DOUBLE PRECISION(p,s)

  •    REAL(s)

Here, p represents a number that identifies the allocated or maximum length of the particular field for each appropriate definition.

Additionally, s is a number to the right of the decimal point, as in 34.ss.

A common numeric data type in SQL implementations is NUMERIC, which accommodates the direction for numeric values provided by ANSI. Numeric values can be stored as zero, positive, negative, fixed, and floating-point numbers. The following is an example using NUMERIC:

NUMERIC(5)

This example restricts the maximum value entered in a particular field to 99999. Note that all the database implementations used for the examples support the NUMERIC type but implement it as a DECIMAL.

Decimal Types

Decimal values are numeric values that include the use of a decimal point. The standard for a decimal in SQL follows, where p is the precision and s is the decimal’s scale:

DECIMAL(p,s)

The precision is the total length of the numeric value. In a numeric defined DECIMAL(4,2), the precision is 4, which is the total length allocated for a numeric value. The scale is the number of digits to the right of the decimal point. The scale is 2 in the previous DECIMAL(4,2) example. If a value has more places to the right side of the decimal point than the scale allows, the value is rounded; for instance, 34.33 inserted into a DECIMAL(3,1) is typically rounded to 34.3.

If a numeric value is defined as the following data type, the maximum value allowed is 99.99:

DECIMAL(4,2)

The precision is 4, which represents the total length allocated for an associated value. The scale is 2, which represents the number of places, or bytes, reserved to the right side of the decimal point. The decimal point does not count as a character.

Allowed values for a column defined as DECIMAL(4,2) include the following:

  •    12

  •    12.4

  •    12.44

  •    12.449

The last numeric value, 12.449, is rounded off to 12.45 upon input into the column. In this case, any numbers between 12.445 and 12.449 are rounded to 12.45.

Integers

An integer is a numeric value that does not contain a decimal; it contains only whole numbers (both positive and negative).

Valid integers include the following:

  •    1

  •    0

  •    -1

  •    99

  •    -99

  •    199

Floating-Point Decimals

Floating-point decimals are decimal values whose precision and scale are variable lengths and virtually without limit. Any precision and scale is acceptable. The REAL data type designates a column with single-precision floating-point numbers. The DOUBLE PRECISION data type designates a column that contains double-precision floating-point numbers. To be considered a single-precision floating point, the precision must be between 1 and 21, inclusive. To be considered a double-precision floating point, the precision must be between 22 and 53, inclusive. The following are examples of the FLOAT data type:

  •    FLOAT

  •    FLOAT(15)

  •    FLOAT(50)

Date and Time Types

Date and time data types are used to keep track of information concerning dates and time. Standard SQL supports DATETIME data types, which include the following specific data types:

  •    DATE

  •    TIME

  •    DATETIME

  •    TIMESTAMP

The elements of a DATETIME data type consist of the following:

  •    YEAR

  •    MONTH

  •    DAY

  •    HOUR

  •    MINUTE

  •    SECOND

Note

Fractions and Leap Seconds

The SECOND element can also be broken down to fractions of a second. The range is from 00.000 to 61.999, although some implementations of SQL do not support this range. The extra 1.999 seconds is used for leap seconds.

Be aware that each implementation of SQL might have its own customized data type for dates and times. The previous data types and elements are standards to which each SQL vendor should adhere, but most implementations have their own data type for date values. Those variations can affect both appearance and the way date information is actually stored internally. A length is not normally specified for a date data type.

Literal Strings

A literal string is a series of characters, such as a name or a phone number, that is explicitly specified by a user or program. Literal strings consist of data with the same attributes as the previously discussed data types, but the value of the string is known. The value of a column is usually unknown because a column typically has a different value associated with each row of data in a table.

You do not actually specify data types with literal strings; you simply specify the string. Some examples of literal strings follow:

  •    'Hello'

  •    45000

  •    "45000"

  •    3.14

  •    'November 1, 1997'

The alphanumeric strings are enclosed by single quotation marks, whereas the number value 45000 is not. Also notice that the second numeric value of 45000 is enclosed in quotation marks. Generally, character strings require quotation marks, whereas numeric strings do not.

The process that converts a number to a numeric type is known as an implicit conversion. In this process, the database attempts to figure out what type it needs to create for the object. Thus, if you do not have a number enclosed within single quotation marks, the SQL compiler assumes that you want a numeric type. When working with data, make sure that the data is being represented as you want it to be. Otherwise, it might skew your results or result in an unexpected error. You will see later how literal strings are used with database queries.

NULL Data Types

As you should know from Hour 1, “Understanding the Relational Database and SQL,” a NULL value is a missing value or a column in a row of data that has not been assigned a value. NULL values are used in nearly all parts of SQL, including the creation of tables, search conditions for queries, and even literal strings.

A NULL value is designated using the keyword NULL.

Because the following is in quotations, it represents not a NULL value, but a literal string containing the characters N-U-L-L:

'NULL'

When using the NULL data type, it is important to realize that data is not required in a particular field. If data is always required for a given field, always use NOT NULL with a data type. If there is a chance that a field might not always contain data, using NULL is a better practice.

Boolean Values

A Boolean value is a value of TRUE, FALSE, or NULL. Boolean values are used to make data comparisons. For example, when criteria specified for a query return the Boolean value of TRUE, then data is returned. If a Boolean value of FALSE or NULL is returned, data might not be returned.

Consider the following example:

WHERE NAME = 'SMITH'

This line might be a condition found in a query. The condition is evaluated for every row of data in the table that is queried. If the value of NAME is SMITH for a row of data in the table, the condition returns the value TRUE, thereby returning the data associated with that record.

Most database implementations do not implement a strict Boolean type and instead opt to use their own methodology. MySQL contains the Boolean type, but it is merely a synonym for its existing TINYINT type. Oracle prefers to direct its users to use a CHAR(1) value to denote a Boolean, and Microsoft SQL Server uses a value known as BIT.

Note

Differences in Data Type Implementations

Some of the data types mentioned during this hour are not available by name in some implementations of SQL. Data types are often named differently among implementations of SQL, but the concept behind each data type remains. Most, if not all, data types are supported by relational databases.

User-Defined Types

A user-defined type is a data type that the user defines. User-defined types allow users to customize their own data types to meet data storage needs and are based on existing data types. User-defined data types can assist the developer by providing greater flexibility during database application development because they maximize the number of possibilities for data storage. The CREATE TYPE statement is used to create a user-defined type.

For example, you can create a type as follows in Oracle:

CREATE TYPE PERSON AS OBJECT
(NAME       VARCHAR (30),
 SSN        VARCHAR (9));

You can reference your user-defined type as follows:

CREATE TABLE EMP_PAY
(EMPLOYEE   PERSON,
 SALARY     DECIMAL(10,2),
 HIRE_DATE  DATE);

Notice that the data type referenced for the first column, EMPLOYEE, is PERSON. PERSON is the user-defined type you created in the first example.

Domains

A domain is a set of valid data types that can be used. A domain is associated with a data type, so only certain data is accepted. After you create a domain, you can add constraints to the domain. Constraints work with data types, allowing you to further specify acceptable data for a field. The domain is used like the user-defined type.

User-defined domains are not nearly as common as user-defined types; for example, they are not supported by Oracle. The following syntax does not work with the implementations downloaded for this book, but it is an example of a basic syntax to create a domain:

CREATE DOMAIN MONEY_D AS NUMBER(8,2);

You add constraints to your domain as follows:

ALTER DOMAIN MONEY_D
ADD CONSTRAINT MONEY_CON1
CHECK (VALUE > 5);

You reference the domain as follows:

CREATE TABLE EMP_PAY
(EMP_ID        NUMBER(9),
 EMP_NAME      VARCHAR2(30),
 PAY_RATE      MONEY_D);

Using Data Types in the BIRDS Database

At this point, you have seen a detailed ERD for the BIRDS database with entities, attributes, and relationships. Primary and foreign keys have also been discussed. Additionally, you have also been provided a list of the actual data contained within the BIRDS database so that you understand what the data looks like. Following is a list of the columns in each table in the BIRDS database, along with their assigned data types and whether each column is mandatory (meaning NULL or NOT NULL, by definition). Primary and foreign keys are also identified here for your convenience.

BIRDS
bird_id       number(3)     not null     PK
bird_name     varchar(30)   not null
height        number(4,2)   not null
weight        number(4,2)   not null
wingspan      number(4,2)   null
eggs          number(2)     not null
broods        number(1)     null
incubation    number(2)     not null
fledging      number(3)     not null
nest_builder  char(1)       not null


NICKNAMES
bird_id    number(3)    not null   PK, FK
nickname   varchar(30)  not null   PK, FK


LOCATIONS
location_id     number(2)    not null    PK
location_name   varchar(30)  not null


PHOTOS
photo_id            number(5)     not null  PK
photo_file          varchar(30)   not null
photo_date          date          not null
photo_location_id   number(2)     not null
bird_id             number(3)     not null  FK


FOOD
food_id     number(3)      not null   PK
food_name   varchar(30)    not null


BIRDS_FOOD
bird_id    number(3)    not null   PK, FK
food_id    number(3)    not null   PK, FK


NESTS
nest_id     number(1)     not null    PK
nest_name   varchar(20)   not null


BIRDS_NESTS
bird_id    number(3)     not null   PK, FK
nest_id    number(1)     not null   PK, FK


MIGRATION
migration_id          number(2)   not null   PK
migration_location    varchar(30) not null


BIRDS_MIGRATION
bird_id         number(3)  not null   PK, FK
migration_id    number(2)  not null   PK, FK

Summary

Countless data types are available with SQL. If you have programmed in other languages, you probably recognize many of the data types mentioned. Data types allow different types of data to be stored in the database, ranging from simple characters, to decimal points, to date and time. Data types also place rules on data that can be inserted into columns in a table. This helps maintain the integrity of data, along with constraints such as primary and foreign keys. The concept of data types is the same in all languages, whether programming in a third-generation language such as C and passing variables, or using a relational database implementation and coding in SQL. Of course, each implementation has its own names for standard data types, but they basically work the same. Also remember that an RDBMS does not have to implement all the data types in the ANSI standard to be considered ANSI compliant. Therefore, it is prudent to check with the documentation of your specific RDBMS implementation to see what options you have available.

Take care in planning for both the near future and the distant future when deciding on data types, lengths, scales, and precisions in which to store your data. Business rules and how you want the end user to access the data are other factors in deciding on specific data types. To assign proper data types, you should know the nature of the data and how the data in the database is related. Everything you have learned in this hour is immediately applicable to your database design and the relational data you encounter in the real world.

Q&A

Q. Why can I enter numbers such as a person’s Social Security number in fields defined as character fields?

A. Numeric values are still alphanumeric, which are allowed in string data types. The process is called an implicit conversion because the database system handles it automatically. Typically, the only data stored as numeric values are values used in computations. However, it might be helpful to define all numeric fields with a numeric data type, to control the data entered in that field.

Q. I still do not understand the difference between constant-length and varying-length data types. Can you explain?

A. Say you have an individual’s last name defined as a constant-length data type with a length of 20 bytes. Suppose that the individual’s name is Smith. When the data is inserted into the table, 20 bytes are taken: 5 for the name and 15 for the extra spaces. (Remember that this is a constant-length data type.) If you use a varying-length data type with a length of 20 and insert Smith, only 5 bytes of space are taken. If you then imagine that you are inserting 100,000 rows of data into this system, you could possibly save 1.5 million bytes of data.

Q. Are there limits on the lengths of data types?

A. Yes, there are limits on the lengths of data types. These vary among the various implementations.

Workshop

The following workshop consists of a series of quiz questions and practical exercises. The quiz questions are designed to test your overall understanding of the current material. The practical exercises give you the opportunity to apply the concepts discussed during the current hour, as well as build on the knowledge you acquired in previous hours of study. Be sure to complete the quiz questions and exercises before continuing to the next hour. Refer to Appendix C, “Answers to Quizzes and Exercises,” for answers.

Quiz

  1. 1. What are the three most basic categories of data types?

  2. 2. True or false: An individual’s Social Security number, entered in the format '111111111', can be any of the following data types: constant-length character, varying-length character, or numeric.

  3. 3. True or false: The scale of a numeric value is the total length allowed for values.

  4. 4. Do all implementations use the same data types?

  5. 5. What are the precision and scale of the following?

    DECIMAL(4,2)
    DECIMAL(10,2)
    DECIMAL(14,1)
  6. 6. Which numbers can be inserted into a column whose data type is DECIMAL(4,1)?

    1. 16.2

    2. 116.2

    3. 16.21

    4. 1116.2

    5. 1116.21

Exercises

  1. 1. Assign the following column titles to a data type, decide on the proper length, and give an example of the data you would enter into that column:

    1. ssn

    2. state

    3. city

    4. phone_number

    5. zip

    6. last_name

    7. first_name

    8. middle_name

    9. salary

    10. hourly_pay_rate

    11. date_hired

  2. 2. Using the same column titles, decide whether they should be NULL or NOT NULL. Be sure to realize that, for some columns that would normally be NOT NULL, the column could be NULL, or vice versa, depending on the application:

    1. ssn

    2. state

    3. city

    4. phone_number

    5. zip

    6. last_name

    7. first_name

    8. middle_name

    9. salary

    10. hourly_pay_rate

    11. date_hired

  3. 3. Based on the birds rescue data from previous hours, assign data types and nullability as you see fit.

  4. 4. Based on the photographer data you have modeled, assign data types and nullability as you see fit.