What You’ll Learn in This Hour:
▶ A look at the underlying data of a table
▶ An introduction to the basic data types
▶ Instruction on using the various data types
▶ Examples depicting differences between data types
▶ Data types applied to the BIRDS
database
▶ Hands-on exercises to apply data types to your database design
In this hour, you learn about the characteristics of data and how such data is stored in a relational database. After a brief discussion of data itself, you learn the basic data types that exist within the standard language of SQL. You see examples of different types of data and then apply those to the BIRDS
database. You will walk through how to assign data types to the bird rescue data and, finally, assign data types to the photographer database that you have been designing to this point. Up to now, this book has focused on the concepts and logical design for building a foundation for a strong SQL database and applying SQL to everything you have learned. Now, we move on to the application of data types in SQL to begin defining the data that you have modeled.
Note
Moving from Logical to Physical Databases
So far, this book has mostly dealt with the logical design of a database. During the planning and design of a database, you typically refer to groups of data, or entities. You also refer to attributes, or fields within those entities. This hour uses more physical database terminology, which involves entities seen as tables that actually store physical data, and what we see now as columns within tables that were previously represented logically as attributes. A table is a group of data and columns or attributes that physically store the data.
Data is a collection of information stored in a database as one of several different data types. Data includes names, numbers, dollar amounts, text, graphics, decimals, figures, calculations, summarization, and just about anything else you can possibly imagine. Data can be stored in upper case, lower case, or mixed case. Data can be manipulated or changed; most data does not remain static for its lifetime.
Data types provide rules for data for particular columns. A data type deals with the way values are stored in a column for the length allocated for a column and whether certain values (such as alphanumeric, numeric, and date and time data) are allowed. A data type exists for every possible bit or combination of data that can be stored in a particular database. These data types store data such as characters, numbers, date and time, images, and other binary data. More specifically, the data might consist of names, descriptions, numbers, calculations, images, image descriptions, documents, and so forth.
The following sections discuss the basic data types that ANSI SQL supports. Data types are characteristics of the data itself whose attributes are placed in fields within a table. For example, you can specify that a field must contain numeric values, disallowing alphanumeric strings to be entered. After all, you would not want to enter alphabetic characters in a field for a dollar amount. Defining each field in the database with a data type eliminates much of the incorrect data found in a database from data entry errors. Field definition (data type definition) is a form of data validation that controls the type of data that may be entered into each given field.
Depending on your implementation of the relational database management system (RDBMS), certain data types can be converted automatically to other data types. This type of conversion is known as an implicit conversion, which means that the database handles the conversion for you. An example of this is taking a numeric value of 1000.92 from a numeric field and inputting it into a string field. Other data types cannot be converted implicitly by the host RDBMS and, therefore, must undergo an explicit conversion. This usually involves using a SQL function, such as CAST
or CONVERT
. In the following Oracle example, the current system date is retrieved from the database in the default date format, which is a date data type:
SELECT CAST('12/27/1974' AS DATETIME) AS MYDATE SQL> SELECT SYSDATE FROM DUAL; SYSDATE --------- 08-SEP-15
Note
The SELECT
Statement
The SELECT
statement is one of the most important SQL commands. SELECT
enables you to query data from within the database. The following examples are simple SELECT
statements that show simple data within a database. Hour 12, “Introduction to Database Queries,” covers the SELECT
statement in detail; subsequent chapters deal with how to effectively manage and get data from the database.
If you want to change or display the date in a format other than the default data type, you can apply the Oracle TO_CHAR
function to display the date as a character string. In the next example, this function retrieves only the current month:
SQL> SELECT TO_CHAR(SYSDATE, 'Month') MONTH 2 FROM DUAL; MONTH ------------------------------------ September
As with most other languages, the basic data types in SQL are as follows:
▶ String types
▶ Numeric types
▶ Date and time types
Tip
SQL Data Types
Every implementation of SQL has its own specific set of data types. Using implementation-specific data types is necessary to support each implementation’s approach to storing data. However, the basics are the same among all implementations.
Constant characters, or strings that always have the same length, are stored using a fixed-length data type. The following is the standard for a SQL fixed-length character:
CHARACTER(n)
Here, n
represents a number identifying the allocated or maximum length of the particular field with this definition.
Some implementations of SQL use the CHAR
data type to store fixed-length data. You can store alphanumeric data in this data type. An example of a constant-length data type is a state abbreviation; all state abbreviations are two characters.
Spaces normally fill any extra spots when using a fixed-length data type. If a field’s length is set to 10
and the data entered fills only 5 places, the remaining 5 spaces are recorded as spaces. This padding of spaces ensures that each value in a field is a fixed length.
Caution
Fixed-Length Data Types
Be careful not to use a fixed-length data type for fields that might contain varying-length values, such as an individual’s name. If you use the fixed-length data type inappropriately, you will eventually encounter problems, such as wasted available space and the inability to make accurate comparisons between data.
Always use the varying-length data type for nonconstant character strings, to save database space.
SQL supports the use of varying-length strings, or strings whose length is not constant for all data. The following is the standard for a SQL varying-length character:
CHARACTER VARYING(n)
Here, n
represents a number identifying the allocated or maximum length of the particular field with this definition.
Common data types for variable-length character values are the VARCHAR
, VARBINARY
, and VARCHAR2
data types. VARCHAR
is the ANSI standard, which Microsoft SQL Server and MySQL use; Oracle uses both VARCHAR
and VARCHAR2
. The data stored in a character-defined column can be alphanumeric, which means that the data value may contain numeric characters. VARBINARY
is similar to VARCHAR
and VARCHAR2
, except that it contains a variable length of bytes. Normally, you use a type such as this to store some kind of digital data such as an image file.
Remember that fixed-length data types typically pad spaces to fill in allocated places that the field is not using. The varying-length data type does not work this way. For instance, if the allocated length of a varying-length field is 10
and a string of 5 characters is entered, the total length of that particular value is only 5. Spaces are not used to fill unused places in a column.
Some variable-length data types need to hold longer lengths of data than what is traditionally reserved for a VARCHAR
field. The BLOB
and TEXT
data types are two examples of such data types in modern database implementations. These data types are specifically made to hold large sets of data. The BLOB
is a binary large object, so its data is treated as a large binary string (a byte string). A BLOB
is especially useful in an implementation that needs to store binary media files in the database, such as images or MP3s.
The TEXT
data type is a large character string data type that can be treated as a large VARCHAR
field. It is often used when an implementation needs to store large sets of character data in the database. An example of this is storing HTML input from the entries of a blog site. Storing this type of data in the database enables the site to be dynamically updated.
Numeric values are stored in fields that are defined as some type of number, typically referred to as NUMBER
, INTEGER
, REAL
, DECIMAL
, and so on.
The following are the standards for SQL numeric values:
▶ BIT(n)
▶ BIT VARYING(n)
▶ DECIMAL(p,s)
▶ INTEGER
▶ SMALLINT
▶ BIGINT
▶ FLOAT(p,s)
▶ DOUBLE PRECISION(p,s)
▶ REAL(s)
Here, p
represents a number that identifies the allocated or maximum length of the particular field for each appropriate definition.
Additionally, s
is a number to the right of the decimal point, as in 34.ss
.
A common numeric data type in SQL implementations is NUMERIC
, which accommodates the direction for numeric values provided by ANSI. Numeric values can be stored as zero, positive, negative, fixed, and floating-point numbers. The following is an example using NUMERIC
:
NUMERIC(5)
This example restricts the maximum value entered in a particular field to 99999
. Note that all the database implementations used for the examples support the NUMERIC
type but implement it as a DECIMAL
.
Decimal values are numeric values that include the use of a decimal point. The standard for a decimal in SQL follows, where p
is the precision and s
is the decimal’s scale:
DECIMAL(p,s)
The precision is the total length of the numeric value. In a numeric defined DECIMAL(4,2)
, the precision is 4, which is the total length allocated for a numeric value. The scale is the number of digits to the right of the decimal point. The scale is 2 in the previous DECIMAL(4,2)
example. If a value has more places to the right side of the decimal point than the scale allows, the value is rounded; for instance, 34.33
inserted into a DECIMAL(3,1)
is typically rounded to 34.3
.
If a numeric value is defined as the following data type, the maximum value allowed is 99.99
:
DECIMAL(4,2)
The precision is 4, which represents the total length allocated for an associated value. The scale is 2, which represents the number of places, or bytes, reserved to the right side of the decimal point. The decimal point does not count as a character.
Allowed values for a column defined as DECIMAL(4,2)
include the following:
▶ 12
▶ 12.4
▶ 12.44
▶ 12.449
The last numeric value, 12.449
, is rounded off to 12.45
upon input into the column. In this case, any numbers between 12.445
and 12.449
are rounded to 12.45
.
An integer is a numeric value that does not contain a decimal; it contains only whole numbers (both positive and negative).
Valid integers include the following:
▶ 1
▶ 0
▶ -1
▶ 99
▶ -99
▶ 199
Floating-point decimals are decimal values whose precision and scale are variable lengths and virtually without limit. Any precision and scale is acceptable. The REAL
data type designates a column with single-precision floating-point numbers. The DOUBLE PRECISION
data type designates a column that contains double-precision floating-point numbers. To be considered a single-precision floating point, the precision must be between 1 and 21, inclusive. To be considered a double-precision floating point, the precision must be between 22 and 53, inclusive. The following are examples of the FLOAT
data type:
▶ FLOAT
▶ FLOAT(15)
▶ FLOAT(50)
Date and time data types are used to keep track of information concerning dates and time. Standard SQL supports DATETIME
data types, which include the following specific data types:
▶ DATE
▶ TIME
▶ DATETIME
▶ TIMESTAMP
The elements of a DATETIME
data type consist of the following:
▶ YEAR
▶ MONTH
▶ DAY
▶ HOUR
▶ MINUTE
▶ SECOND
Note
Fractions and Leap Seconds
The SECOND
element can also be broken down to fractions of a second. The range is from 00.000
to 61.999
, although some implementations of SQL do not support this range. The extra 1.999 seconds is used for leap seconds.
Be aware that each implementation of SQL might have its own customized data type for dates and times. The previous data types and elements are standards to which each SQL vendor should adhere, but most implementations have their own data type for date values. Those variations can affect both appearance and the way date information is actually stored internally. A length is not normally specified for a date data type.
A literal string is a series of characters, such as a name or a phone number, that is explicitly specified by a user or program. Literal strings consist of data with the same attributes as the previously discussed data types, but the value of the string is known. The value of a column is usually unknown because a column typically has a different value associated with each row of data in a table.
You do not actually specify data types with literal strings; you simply specify the string. Some examples of literal strings follow:
▶ 'Hello'
▶ 45000
▶ "45000"
▶ 3.14
▶ 'November 1, 1997'
The alphanumeric strings are enclosed by single quotation marks, whereas the number value 45000
is not. Also notice that the second numeric value of 45000
is enclosed in quotation marks. Generally, character strings require quotation marks, whereas numeric strings do not.
The process that converts a number to a numeric type is known as an implicit conversion. In this process, the database attempts to figure out what type it needs to create for the object. Thus, if you do not have a number enclosed within single quotation marks, the SQL compiler assumes that you want a numeric type. When working with data, make sure that the data is being represented as you want it to be. Otherwise, it might skew your results or result in an unexpected error. You will see later how literal strings are used with database queries.
As you should know from Hour 1, “Understanding the Relational Database and SQL,” a NULL
value is a missing value or a column in a row of data that has not been assigned a value. NULL
values are used in nearly all parts of SQL, including the creation of tables, search conditions for queries, and even literal strings.
A NULL
value is designated using the keyword NULL
.
Because the following is in quotations, it represents not a NULL
value, but a literal string containing the characters N-U-L-L
:
'NULL'
When using the NULL
data type, it is important to realize that data is not required in a particular field. If data is always required for a given field, always use NOT NULL
with a data type. If there is a chance that a field might not always contain data, using NULL
is a better practice.
A Boolean value is a value of TRUE
, FALSE
, or NULL
. Boolean values are used to make data comparisons. For example, when criteria specified for a query return the Boolean value of TRUE
, then data is returned. If a Boolean value of FALSE
or NULL
is returned, data might not be returned.
Consider the following example:
WHERE NAME = 'SMITH'
This line might be a condition found in a query. The condition is evaluated for every row of data in the table that is queried. If the value of NAME
is SMITH
for a row of data in the table, the condition returns the value TRUE
, thereby returning the data associated with that record.
Most database implementations do not implement a strict Boolean type and instead opt to use their own methodology. MySQL contains the Boolean type, but it is merely a synonym for its existing TINYINT
type. Oracle prefers to direct its users to use a CHAR(1)
value to denote a Boolean, and Microsoft SQL Server uses a value known as BIT
.
Note
Differences in Data Type Implementations
Some of the data types mentioned during this hour are not available by name in some implementations of SQL. Data types are often named differently among implementations of SQL, but the concept behind each data type remains. Most, if not all, data types are supported by relational databases.
A user-defined type is a data type that the user defines. User-defined types allow users to customize their own data types to meet data storage needs and are based on existing data types. User-defined data types can assist the developer by providing greater flexibility during database application development because they maximize the number of possibilities for data storage. The CREATE TYPE
statement is used to create a user-defined type.
For example, you can create a type as follows in Oracle:
CREATE TYPE PERSON AS OBJECT (NAME VARCHAR (30), SSN VARCHAR (9));
You can reference your user-defined type as follows:
CREATE TABLE EMP_PAY (EMPLOYEE PERSON, SALARY DECIMAL(10,2), HIRE_DATE DATE);
Notice that the data type referenced for the first column, EMPLOYEE
, is PERSON
. PERSON
is the user-defined type you created in the first example.
A domain is a set of valid data types that can be used. A domain is associated with a data type, so only certain data is accepted. After you create a domain, you can add constraints to the domain. Constraints work with data types, allowing you to further specify acceptable data for a field. The domain is used like the user-defined type.
User-defined domains are not nearly as common as user-defined types; for example, they are not supported by Oracle. The following syntax does not work with the implementations downloaded for this book, but it is an example of a basic syntax to create a domain:
CREATE DOMAIN MONEY_D AS NUMBER(8,2);
You add constraints to your domain as follows:
ALTER DOMAIN MONEY_D ADD CONSTRAINT MONEY_CON1 CHECK (VALUE > 5);
You reference the domain as follows:
CREATE TABLE EMP_PAY (EMP_ID NUMBER(9), EMP_NAME VARCHAR2(30), PAY_RATE MONEY_D);
BIRDS
DatabaseAt this point, you have seen a detailed ERD for the BIRDS
database with entities, attributes, and relationships. Primary and foreign keys have also been discussed. Additionally, you have also been provided a list of the actual data contained within the BIRDS
database so that you understand what the data looks like. Following is a list of the columns in each table in the BIRDS
database, along with their assigned data types and whether each column is mandatory (meaning NULL
or NOT NULL
, by definition). Primary and foreign keys are also identified here for your convenience.
BIRDS bird_id number(3) not null PK bird_name varchar(30) not null height number(4,2) not null weight number(4,2) not null wingspan number(4,2) null eggs number(2) not null broods number(1) null incubation number(2) not null fledging number(3) not null nest_builder char(1) not null NICKNAMES bird_id number(3) not null PK, FK nickname varchar(30) not null PK, FK LOCATIONS location_id number(2) not null PK location_name varchar(30) not null PHOTOS photo_id number(5) not null PK photo_file varchar(30) not null photo_date date not null photo_location_id number(2) not null bird_id number(3) not null FK FOOD food_id number(3) not null PK food_name varchar(30) not null BIRDS_FOOD bird_id number(3) not null PK, FK food_id number(3) not null PK, FK NESTS nest_id number(1) not null PK nest_name varchar(20) not null BIRDS_NESTS bird_id number(3) not null PK, FK nest_id number(1) not null PK, FK MIGRATION migration_id number(2) not null PK migration_location varchar(30) not null BIRDS_MIGRATION bird_id number(3) not null PK, FK migration_id number(2) not null PK, FK
Countless data types are available with SQL. If you have programmed in other languages, you probably recognize many of the data types mentioned. Data types allow different types of data to be stored in the database, ranging from simple characters, to decimal points, to date and time. Data types also place rules on data that can be inserted into columns in a table. This helps maintain the integrity of data, along with constraints such as primary and foreign keys. The concept of data types is the same in all languages, whether programming in a third-generation language such as C and passing variables, or using a relational database implementation and coding in SQL. Of course, each implementation has its own names for standard data types, but they basically work the same. Also remember that an RDBMS does not have to implement all the data types in the ANSI standard to be considered ANSI compliant. Therefore, it is prudent to check with the documentation of your specific RDBMS implementation to see what options you have available.
Take care in planning for both the near future and the distant future when deciding on data types, lengths, scales, and precisions in which to store your data. Business rules and how you want the end user to access the data are other factors in deciding on specific data types. To assign proper data types, you should know the nature of the data and how the data in the database is related. Everything you have learned in this hour is immediately applicable to your database design and the relational data you encounter in the real world.
Q. Why can I enter numbers such as a person’s Social Security number in fields defined as character fields?
A. Numeric values are still alphanumeric, which are allowed in string data types. The process is called an implicit conversion because the database system handles it automatically. Typically, the only data stored as numeric values are values used in computations. However, it might be helpful to define all numeric fields with a numeric data type, to control the data entered in that field.
Q. I still do not understand the difference between constant-length and varying-length data types. Can you explain?
A. Say you have an individual’s last name defined as a constant-length data type with a length of 20 bytes. Suppose that the individual’s name is Smith. When the data is inserted into the table, 20 bytes are taken: 5 for the name and 15 for the extra spaces. (Remember that this is a constant-length data type.) If you use a varying-length data type with a length of 20 and insert Smith
, only 5 bytes of space are taken. If you then imagine that you are inserting 100,000 rows of data into this system, you could possibly save 1.5 million bytes of data.
Q. Are there limits on the lengths of data types?
A. Yes, there are limits on the lengths of data types. These vary among the various implementations.
The following workshop consists of a series of quiz questions and practical exercises. The quiz questions are designed to test your overall understanding of the current material. The practical exercises give you the opportunity to apply the concepts discussed during the current hour, as well as build on the knowledge you acquired in previous hours of study. Be sure to complete the quiz questions and exercises before continuing to the next hour. Refer to Appendix C, “Answers to Quizzes and Exercises,” for answers.
1. What are the three most basic categories of data types?
2. True or false: An individual’s Social Security number, entered in the format '111111111'
, can be any of the following data types: constant-length character, varying-length character, or numeric.
3. True or false: The scale of a numeric value is the total length allowed for values.
4. Do all implementations use the same data types?
5. What are the precision and scale of the following?
DECIMAL(4,2) DECIMAL(10,2) DECIMAL(14,1)
6. Which numbers can be inserted into a column whose data type is DECIMAL(4,1)
?
16.2
116.2
16.21
1116.2
1116.21
1. Assign the following column titles to a data type, decide on the proper length, and give an example of the data you would enter into that column:
ssn
state
city
phone_number
zip
last_name
first_name
middle_name
salary
hourly_pay_rate
date_hired
2. Using the same column titles, decide whether they should be NULL
or NOT NULL
. Be sure to realize that, for some columns that would normally be NOT NULL
, the column could be NULL,
or vice versa, depending on the application:
ssn
state
city
phone_number
zip
last_name
first_name
middle_name
salary
hourly_pay_rate
date_hired
3. Based on the birds rescue data from previous hours, assign data types and nullability as you see fit.
4. Based on the photographer data you have modeled, assign data types and nullability as you see fit.