What You’ll Learn in This Hour:
▶ How indexes work
▶ How to create an index
▶ The different types of indexes
▶ When to use indexes
▶ When not to use indexes
In this hour, you learn how to improve SQL statement performance by creating and using indexes. You begin with the CREATE INDEX
command and learn how to use indexes that have been created on tables.
An index is basically a pointer to data in a table. An index in a database is similar to an index in the back of a book. If you want to reference all the pages in a book that discuss a certain topic, you first refer to the index, which lists all topics alphabetically; the index then refers you to one or more specific page numbers. An index in a database works the same way: It points a query to the exact physical location of data in a table. You are actually directed to the data’s location in an underlying file of the database, but as far as you are concerned, you are referring to a table.
Which works faster, looking through a book page by page for some information or searching the book’s index and getting a page number? Of course, using the book’s index is the most efficient method, especially if the book is large. If you have a smaller book with just a few pages, however, flipping through the chapters for the information might go faster than flipping back and forth between the index and the chapters. When a database does not use an index, it is performing a full table scan, the same as flipping through a book page by page. Hour 23, “Improving Database Performance,” discusses full table scans.
An index is typically stored separately from the table for which the index was created. An index’s main purpose is to improve the performance of data retrieval. Indexes can be created or dropped with no effect on the data. However, after an index is dropped, performance of data retrieval might slow. Indexes take up physical space and can often grow larger than the table. Therefore, you need to consider them when estimating your database storage needs.
When an index is created, it records the location of values in a table that are associated with the column that is indexed. Entries are added to the index when new data is added to the table. When a query is executed against the database and a condition is specified on a column in the WHERE
clause that is indexed, the index is first searched for the values specified in the WHERE
clause. If the value is found in the index, the index returns the exact location of the searched data in the table. Figure 22.1 illustrates the function of an index.
FIGURE 22.1
Table access using an index
Suppose the following query was issued:
SELECT * FROM TABLE_NAME WHERE NAME = 'SMITH';
As Figure 22.1 shows, the NAME
index is referenced to resolve the location of all names equal to SMITH
. After the location is determined, the data can quickly be retrieved from the table. The data—names, in this case—is alphabetized in the index.
Note
Variations of Index Creation
Indexes can be created during table creation in certain implementations. Most implementations accommodate a command, aside from the CREATE TABLE
command, used to create indexes. Check your particular implementation for the exact syntax for the command, if any, that is available to create an index.
A full table scan occurs if there is no index on the table and the same query is executed. This means, then, that every row of data in the table is read to retrieve information pertaining to all individuals with the name SMITH
.
An index works faster because it typically stores information in an orderly, treelike format. Say that you have a list of books upon which you place an index. The index has a root node, which is the beginning point of each query. Then it is split into branches. Maybe in this case the index has two branches, one for letters A–L and the other for letters M–Z. If you ask for a book with a name that starts with the letter M, you enter the index at the root node and immediately travel to the branch containing letters M–Z. This effectively shortens the time needed to find the book by eliminating nearly half of the possibilities.
CREATE INDEX
CommandAs with many other statements in SQL, the CREATE INDEX
statement, varies greatly among relational database vendors. This command is used to create the various types of indexes available for a table. Most relational database implementations use the CREATE INDEX
statement:
CREATE INDEX INDEX_NAME ON TABLE_NAME
Some implementations allow the specification of a storage clause (as with the CREATE TABLE
statement), ordering (DESC
||ASC
), and the use of clusters. Check your particular implementation for its correct syntax.
You can create different types of indexes on tables in a database, all of which serve the same goal: to improve database performance by expediting data retrieval. This hour discusses single-column indexes, composite indexes, and unique indexes.
Indexing on a single column of a table is the simplest and most common manifestation of an index. A single-column index is one that is created based on only one table column. The basic syntax follows:
CREATE INDEX INDEX_NAME_IDX ON TABLE_NAME (COLUMN_NAME)
As you look at a few examples of creating indexes, you first create a table called BIRDS_TRAVEL
that is based on birds and their migration habits. Following is the CREATE
statement and a SELECT
statement from the new table that returns 56 rows of data. Remember that even though the database contains only 23 birds in the database, many birds migrate to many different locations. This creates a situation of duplicate bird names and duplicate migration location names in a table. In this example, 56 rows is not a lot. However, in a real organization, tables can easily contain millions of rows of data. This is where indexing is necessary to maintain acceptable performance.
SQL> create table birds_travel as 2 select b.bird_id, b.bird_name, b.wingspan, m.migration_location 3 from birds b, 4 migration m, 5 birds_migration bm 6 where b.bird_id = bm.bird_id 7 and m.migration_id = bm.migration_id; Table created. SQL> select * from birds_travel; BIRD_ID BIRD_NAME WINGSPAN MIGRATION_LOCATION ---------- -------------------- ---------- --------------------- 1 Great Blue Heron 78 Central America 3 Common Loon 99 Central America 7 Osprey 72 Central America 8 Belted Kingfisher 23 Central America 10 Pied-billed Grebe 6.5 Central America 11 American Coot 29 Central America 12 Common Sea Gull 18 Central America 14 Double-crested Cormorant 54 Central America 15 Common Merganser 34 Central America 16 Turkey Vulture 72 Central America 18 Green Heron 26.8 Central America 21 Great Egret 67.2 Central America 1 Great Blue Heron 78 Mexico 3 Common Loon 99 Mexico 7 Osprey 72 Mexico 8 Belted Kingfisher 23 Mexico 10 Pied-billed Grebe 6.5 Mexico 11 American Coot 29 Mexico 12 Common Sea Gull 18 Mexico 13 Ring-billed Gull 50 Mexico 14 Double-crested Cormorant 54 Mexico 15 Common Merganser 34 Mexico 16 Turkey Vulture 72 Mexico 18 Green Heron 26.8 Mexico 21 Great Egret 67.2 Mexico 22 Anhinga 42 Mexico 5 Golden Eagle 99 No Significant Migration 9 Canadian Goose 99 No Significant Migration 17 American Crow 39.6 No Significant Migration 20 Brown Pelican 99 No Significant Migration 23 Black Skimmer 15 No Significant Migration 19 Mute Swan 99 Partial, Open Water 1 Great Blue Heron 78 South America 7 Osprey 72 South America 8 Belted Kingfisher 23 South America 12 Common Sea Gull 18 South America 16 Turkey Vulture 72 South America 18 Green Heron 26.8 South America 1 Great Blue Heron 78 Southern United States 2 Mallard 72 Southern United States 3 Common Loon 99 Southern United States 4 Bald Eagle 99 Southern United States 6 Red Tailed Hawk 48 Southern United States 7 Osprey 72 Southern United States 8 Belted Kingfisher 23 Southern United States 9 Canadian Goose 99 Southern United States 10 Pied-billed Grebe 6.5 Southern United States 11 American Coot 29 Southern United States 12 Common Sea Gull 18 Southern United States 13 Ring-billed Gull 50 Southern United States 14 Double-crested Cormorant 54 Southern United States 15 Common Merganser 34 Southern United States 16 Turkey Vulture 72 Southern United States 18 Green Heron 26.8 Southern United States 21 Great Egret 67.2 Southern United States 22 Anhinga 42 Southern United States 56 rows selected.
The following focuses on creating a simple index on the BIRDS_TRAVEL
table on the BIRD_NAME
column. This column is being indexed because it would be typically used in a query to search for specific data within this table.
SQL> create index birds_travel_idx1 2 on birds_travel (bird_name); Index created.
Tip
Best Places for Single-Column Indexes
Single-column indexes are most effective when they are used on columns that are frequently used alone in the WHERE
clause as query conditions. Good candidates for a single-column index are individual identification numbers, serial numbers, and system-assigned keys.
You use unique indexes for performance and data integrity. A unique index does not allow duplicate values to be inserted into the table. Otherwise, the unique index performs the same way a regular index performs. The syntax follows:
CREATE UNIQUE INDEX INDEX_NAME ON TABLE_NAME (COLUMN_NAME)
The first example that follows attempts to create a unique index on the BIRD_NAME
within the BIRDS_TRAVEL
table. An error message is returned. A unique index cannot be created for this column because duplicate rows of data are already found in the table. Refer back to the previous output of the query. You can see duplicate values of bird names in the BIRDS_TRAVEL
table because birds can migrate to many different locations.
SQL> create unique index birds_travel_idx2 2 on birds_travel (bird_name); on birds_travel (bird_name) * ERROR at line 2: ORA-01452: cannot CREATE UNIQUE INDEX; duplicate keys found
Next is a legitimate example. Here a unique index is created on the combination of the BIRD_NAME
column in the MIGRATION_LOCATION
column. Even though both birds and migration locations have duplicate entries in this table, the combination of a bird’s name and the migration location should always be unique.
SQL> create unique index birds_travel_idx3 2 on birds_travel (bird_name, migration_location); Index created.
You might be wondering, what happens if a bird’s identification number is the primary key for a table? An index is usually implicitly created when you define a primary key for a table, so normally you do not also have to create a unique index on the table.
When working with objects such as unique indexes, it is often beneficial to create the indexes on empty tables during the creation of the database structure. This ensures that the data going into the structure already meets the demand of the constraints you want to place on it. If you work with existing data, you want to analyze the impact of whether the data needs to be adjusted to properly apply the index.
Tip
Unique Index Constraints
You can create a unique index only on a column in a table whose values are unique. In other words, you cannot create a unique index on an existing table with data that already contains records on the indexed key that are not unique. Similarly, you cannot create a unique index on a column that allows for NULL
values. If you attempt to create a unique index on a column that violates one of these principles, the statement fails.
A composite index is an index on two or more columns of a table. You should consider performance when creating a composite index because the order of columns in the index has a measurable effect on the data retrieval speed. Generally, the most restrictive value should be placed first for optimum performance. However, the columns that are always specified in your queries should be placed first. The syntax follows:
CREATE INDEX INDEX_NAME ON TABLE_NAME (COLUMN1, COLUMN2)
An example of a composite index follows:
SQL> create index birds_travel_idx4 2 on birds_travel (bird_name, wingspan); Index created.
In this example, you create a composite index based on two columns in the BIRDS_TRAVEL
table: BIRD_NAME
and WINGSPAN
. You might create a composite index here because of the assumption that these two columns are frequently used together as conditions in the WHERE
clause of a query.
In deciding whether to create a single-column index or a composite index, consider the column(s) that you might use frequently in a query’s WHERE
clause as filter conditions. If only one column is used, choose a single-column index. If two or more columns are frequently used in the WHERE
clause as filters, a composite index is the best choice.
Implicit indexes are indexes that are automatically created by the database server when an object is created. Indexes are automatically created for primary key constraints and unique constraints.
Why are indexes automatically created for these constraints? Imagine a database server. Now say that a user adds a new product to the database. The product identification is the primary key on the table, which means that it must be a unique value. To efficiently make sure the new value is unique among hundreds or thousands of records, the product identifications in the table must be indexed. Therefore, when you create a primary key or a unique constraint, an index is automatically created for you.
Unique indexes are implicitly used with a primary key for the primary key to work. Foreign keys are also excellent candidates for an index because you often use them to join the parent table. Most, if not all, columns used for table joins should be indexed.
Columns that you frequently reference in the ORDER BY
and GROUP BY
clauses should be considered for indexes. For example, if you are sorting on an individual’s name, having an index on the name column is beneficial: It renders an automatic alphabetical order on every name, thus simplifying the actual sort operation and expediting the output results.
Furthermore, you should create indexes on columns with a high number of unique values, or columns that, when used as filter conditions in the WHERE
clause, return a low percentage of rows of data from a table. This is where trial and error might come into play. Just as you should always test production code and database structures before implementing them into production, you should test indexes. Your testing should center on trying different combinations of indexes, no indexes, single-column indexes, and composite indexes; no cut-and-dried rule governs using indexes. The effective use of indexes requires a thorough knowledge of table relationships, query and transaction requirements, and the data itself.
Note
Plan for Indexing Accordingly
Be sure to plan your tables and indexes. Don’t assume that because an index has been created, all performance issues are resolved. The index might not help at all—it might actually hinder performance and take up disk space.
Although indexes are intended to enhance a database’s performance, sometimes you should avoid them. The following guidelines indicate when to reconsider using an index:
▶ Avoid using indexes on small tables. Indexes have an overhead associated with them in terms of query time to access them. In the case of small tables, the query engine to do a quick scan over the table more quickly than it can look at an index first.
▶ Do not use indexes on columns that return a high percentage of data rows when used as a filter condition in a query’s WHERE
clause. For instance, you do not see entries for the words the or and in the index of a book.
▶ You can index tables that have a frequent, large batch update jobs run. However, the index considerably slows the batch job’s performance. You can correct the conflict of having an index on a table that is frequently loaded or manipulated by a large batch process in this way: Drop the index before the batch job and then re-create the index after the job has completed. The indexes are also updated as the data is inserted, causing additional overhead.
▶ Do not use indexes on columns that contain a high number of NULL
values. Indexes operate best on columns that have a higher uniqueness of data between rows. If a lot of NULL
values exist, the index will skew toward the NULL
values, which might affect performance.
▶ Avoid indexing columns that are frequently manipulated. Maintenance on the index can become excessive.
You can see in Figure 22.2 that an index on a column such as gender might not prove beneficial. For example, suppose the following query was submitted to the database:
SELECT * FROM TABLE_NAME WHERE GENDER = 'FEMALE';
FIGURE 22.2
An example of an ineffective index
Referring to Figure 22.2, which is based on the previous query, you can see constant activity between the table and its index. Because a high number of data rows is returned for WHERE GENDER = 'FEMALE'
(or 'MALE'
), the database server constantly has to read the index, then the table, then the index, then the table, and so on. In this case, it might be more efficient to use a full table scan because a high percentage of the table must be read anyway.
Caution
Indexes Can Sometimes Lead to Performance Problems
Use caution when creating indexes on a table’s extremely long keys. Performance is inevitably slowed by high I/O costs.
As a general rule, do not use an index on a column used in a query’s condition that returns a high percentage of data rows from the table. In other words, do not create an index on a column such as gender or any column that contains few distinct values. This is often referred to as a column’s cardinality, or the uniqueness of the data. High cardinality refers to a very unique quality and describes information such as identification numbers. Low-cardinality values are not very unique and refer to columns such as gender.
You can alter an index after it has been created using syntax that is similar to the CREATE INDEX
syntax. The types of alterations that you can manage with the statement differ between implementations, but they handle all the basic variations of an index in terms of columns, ordering, and such. The syntax follows:
ALTER INDEX INDEX_NAME
Take care when altering an existing index on production systems. In most cases, the index is immediately rebuilt, which creates an overhead in terms of resources. In addition, on most basic implementations, while the index is being rebuilt, it cannot be utilized for queries. This further hinders system performance.
An index can be dropped rather simply. Check your particular implementation for the exact syntax, but most major implementations use the DROP
command. Take care when dropping an index because it can drastically slow (or, conversely, improve) performance. The syntax follows:
DROP INDEX INDEX_NAME SQL> drop index birds_travel_idx4; Index dropped.
MySQL uses a slightly different syntax; you also specify the table name of the table that you are dropping the index from:
DROP INDEX INDEX_NAME ON TABLE_NAME
The most common reason for dropping an index is to improve performance. Remember that if you drop an index, you can re-create it later. You might need to rebuild an index to reduce fragmentation. Often you must experiment with indexes in a database to determine the route to best performance; this might involve creating an index, dropping it, and eventually re-creating it, with or without modifications.
In this hour, you learned that you can use indexes to improve the overall performance of queries and transactions performed within the database. As with the index of a book, database indexes enable specific data to be quickly referenced from a table. The most common method for creating indexes is to use the CREATE INDEX
command. Different types of indexes are available among SQL implementations, including unique indexes, single-column indexes, and composite indexes. You need to consider many factors when deciding on the index type that best meets the needs of your database. The effective use of indexes often requires some experimentation, a thorough knowledge of table relationships and data, and a little patience—being patient when you create an index can save minutes, hours, or even days of work later.
Q. Does an index actually take up space the way a table does?
A. Yes, an index takes up physical space in a database. In fact, an index can become much larger than the table for which the index was created.
Q. If you drop an index so that a batch job can complete faster, how long does it take to re-create the index?
A. Many factors are involved, such as the size of the index being dropped, the CPU usage, and the machine’s power.
Q. Should all indexes be unique?
A. No, unique indexes allow no duplicate values. You might need to allow duplicate values in a table.
The following workshop consists of a series of quiz questions and practical exercises. The quiz questions are designed to test your overall understanding of the current material. The practical exercises give you the opportunity to apply the concepts discussed during the current hour, as well as build on the knowledge you acquired in previous hours of study. Be sure to complete the quiz questions and exercises before continuing to the next hour. Refer to Appendix C, “Answers to Quizzes and Exercises,” for answers.
1. What are some major disadvantages of using indexes?
2. Why is the order of columns in a composite index important?
3. Should a column with a large percentage of NULL
values be indexed?
4. Is the main purpose of an index to stop duplicate values in a table?
5. True or false: The main reason for a composite index is for aggregate function usage in an index.
6. What does cardinality refer to? What is considered a column of high cardinality?
1. For the following situations, decide whether an index should be used and, if so, what type of index should be used:
Several columns, but a small table
Medium-size table, no duplicates allowed
Several columns, very large table, several columns used as filters in the WHERE
clause
Large table, many columns, a lot of data manipulation
2. Write a SQL statement to create an index called FOOD_IDX
in FOOD
on the FOOD_NAME
column.
3. Write a SQL statement to create an index called WINGSPAN_IDX
in BIRDS
on the WINGSPAN
column.
4. Drop the indexes you created.
5. Study the tables used in this book. List some good candidates for indexed columns, based on how a user might search for data.
6. Create some additional indexes on your tables as you like. Think about how your data might be searched and where indexes can create efficiencies.