Hour 23

Improving Database Performance

What You’ll Learn in This Hour:

In this hour, you learn how to tune your SQL statements for maximum performance using some simple methods. Although this book has so far focused on how to write SQL, it is just as important to learn how to write efficient SQL that helps keep the database running optimally. This hour focuses on simple steps you can take when working with various queries to ensure that your SQL performs well.

Defining SQL Statement Tuning

SQL statement tuning is the process of optimally building SQL statements to achieve results in the most effective and efficient manner. SQL tuning begins with the basic arrangement of the elements in a query. Simple formatting can play a large role in optimizing a statement.

SQL statement tuning mainly involves tweaking a statement’s FROM and WHERE clauses. The database server decides how to evaluate a query using mostly these two clauses. To this point, you have learned the basics of the FROM and WHERE clauses; now it is time to fine-tune them for better results and happier users.

Comparing Database Tuning and SQL Statement Tuning

Before learning about SQL statement tuning, you need to understand the difference between tuning a database and tuning the SQL statements that access the database.

Database tuning is the process of tuning the actual database, which encompasses the allocated memory, disk usage, CPU, I/O, and underlying database processes. Tuning a database also involves the management and manipulation of the database structure, such as the design and layout of tables and indexes. In addition, database tuning often involves modification of the database architecture to optimize the use of the hardware resources available. You need to take many other considerations into account when tuning a database, but the database administrator (DBA) and system administrator normally accomplish these tasks. The objective of database tuning is to ensure that the database has been designed in a way that best accommodates the expected activity within the database.

SQL tuning is the process of tuning the SQL statements that access the database. These SQL statements include database queries and transactional operations, such as inserts, updates, and deletes. The objective of SQL statement tuning is to formulate statements that most effectively access the database in its current state, taking advantage of database and system resources and indexes. The objective is to reduce the operational overhead of executing the query on the database.

Note

Tuning Is Not One-Dimensional

You must perform both database tuning and SQL statement tuning to achieve optimal results when accessing the database. A poorly tuned database might render your efforts in SQL tuning useless, and vice versa. Ideally, you should first tune the database, ensure that indexes exist where needed, and then tune the SQL code.

Formatting Your SQL Statement

Formatting your SQL statement sounds like an obvious task, but it is worth mentioning. A newcomer to SQL will probably neglect to address several considerations when building a SQL statement. The upcoming sections discuss the following points—some are common sense and others are not so obvious:

  •    The format of SQL statements for readability

  •    The order of tables in the FROM clause

  •    The placement of the most restrictive conditions in the WHERE clause

  •    The placement of join conditions in the WHERE clause

Formatting a Statement for Readability

Formatting a SQL statement for readability is fairly obvious, but many SQL statements are not written neatly. The neatness of a statement does not affect the actual performance (the database does not care how neat the statement appears), but careful formatting is the first step in tuning a statement. When you look at a SQL statement with tuning intentions, making the statement readable is always the first priority. How can you determine whether the statement is well written if it is difficult to read?

Consider some basic rules for making a statement readable:

  •    Always begin a new line with each clause in the statement. For example, place the FROM clause on a separate line from the SELECT clause. Then place the WHERE clause on a separate line from the FROM clause, and so on.

  •    Use tabs or spaces for indentation when arguments of a clause in the statement exceed one line.

  •    Use tabs and spaces consistently.

  •    Use table aliases when multiple tables are used in the statement. Using the full table name to qualify each column in the statement quickly clutters the statement and makes reading it difficult.

  •    Use remarks sparingly in SQL statements (if they are available within your specific implementation). Remarks are great for documentation, but too many of them clutter a statement.

  •    Begin a new line with each column name in the SELECT clause if many columns are selected.

  •    Begin a new line with each table name in the FROM clause if many tables are used.

  •    Begin a new line with each condition of the WHERE clause. You can easily see all conditions of the statement and the order in which they are used.

Following is an example of a statement that is hard to decipher:

SQL> select birds.bird_name,
  2  birds.incubation + birds.fledging "PARENTING",    food.food_name,
  3  migration.migration_location
  4  from birds, food, birds_food, migration, birds_migration
  5  where birds.bird_id = birds_food.bird_id
  6    and food.food_id = birds_food.food_id
  7    and birds.bird_id = birds_migration.bird_id
  8    and migration.migration_id = birds_migration.migration_id
  9    and birds.wingspan > 48
 10    and birds.incubation + birds.fledging > 60
 11    and migration.migration_location not in ('Mexico', 'Central America')
 12    and food.food_name = 'Fish'
 13    order by birds.bird_name;


BIRD_NAME            PARENTING FOOD_NAME  MIGRATION_LOCATION
------------------- ---------- ---------- -----------------------
Bald Eagle                 126 Fish       Southern United States
Brown Pelican              107 Fish       No Significant Migration
Common Loon                111 Fish       Southern United States
Double-crested Cormorant    71 Fish       Southern United States
Golden Eagle               125 Fish       No Significant Migration
Great Blue Heron            88 Fish       Southern United States
Great Blue Heron            88 Fish       South America
Great Egret                 75 Fish       Southern United States
Osprey                     100 Fish       Southern United States
Osprey                     100 Fish       South America
Ring-billed Gull            61 Fish       Southern United States

11 rows selected.

Here the statement has been reformatted for improved readability:

SQL> select b.bird_name,
  2         b.incubation + b.fledging "PARENTING",
  3         f.food_name,
  4         m.migration_location
  5  from birds b,
  6       food f,
  7       birds_food bf,
  8       migration m,
  9       birds_migration bm
 10  where b.bird_id = bf.bird_id
 11    and f.food_id = bf.food_id
 12    and b.bird_id = bm.bird_id
 13    and m.migration_id = bm.migration_id
 14    and b.wingspan > 48
 15    and b.incubation + b.fledging > 60
 16    and m.migration_location not in ('Mexico', 'Central America')
 17    and f.food_name = 'Fish'
 18  order by bird_name;

Both statements have the same content, but the second statement is much more readable. It has been greatly simplified through the use of table aliases, which have been defined in the query’s FROM clause. In addition, the second statement aligns the elements of each clause, making each clause stand out.

Again, making a statement more readable does not directly improve its performance, but it assists you in making modifications and debugging a lengthy and otherwise complex statement. Now you can easily identify the columns selected, the tables used, the table joins performed, and the conditions placed on the query.

Note

Always Establish Standards

It is especially important to establish coding standards in a multiuser programming environment. If all code is consistently formatted, shared code and modifications to code are much easier to manage.

Arranging Tables in the FROM Clause

The arrangement or order of tables in the FROM clause can make a difference, depending on how the optimizer reads the SQL statement. For example, experienced users have found that listing the smaller tables first and the larger tables last is more efficient.

Take a look at an example of the FROM clause:

FROM SMALLEST TABLE,
     LARGEST TABLE

Note

Check for Performance When Using Multiple Tables

Check your particular implementation for performance tips, if any, when listing multiple tables in the FROM clause.

Ordering Join Conditions

As you learned in Hour 14, “Joining Tables in Queries,” most joins use a base table to link tables that have one or more common columns on which to join. The base table is the main table that most or all tables are joined to in a query. The column from the base table is normally placed on the right side of a join operation in the WHERE clause. The tables joined to the base table are normally listed in order from smallest to largest, similar to the tables listed in the FROM clause.

If a base table does not exist, the tables should be listed from smallest to largest, with the largest tables on the right side of the join operation in the WHERE clause. The join conditions should be in the first position(s) of the WHERE clause, followed by the filter clause(s), as shown here:

FROM TABLE1,                                   Smallest table
     TABLE2,                                    to
     TABLE3                                     Largest table, also base table
WHERE TABLE1.COLUMN = TABLE3.COLUMN       Join condition
  AND TABLE2.COLUMN = TABLE3.COLUMN       Join condition
[ AND CONDITION1 ]                           Filter condition
[ AND CONDITION2 ]                           Filter condition

Caution

Be Restrictive with Your Joins

Because joins typically return a high percentage of rows of data from the table(s), you should evaluate join conditions after more restrictive conditions.

In this example, TABLE3 is used as the base table. TABLE1 and TABLE2 are joined to TABLE3 for both simplicity and proven efficiency.

Identifying the Most Restrictive Condition

The most restrictive condition is typically the driving factor in achieving optimal performance for a SQL query. The most restrictive condition is the condition in the WHERE clause of a statement that returns the fewest rows of data. Conversely, the least restrictive condition is the condition in a statement that returns the most rows of data. This hour is concerned with the most restrictive condition simply because it does the most filtering of the data that is to be returned by the query.

Your goal is for the SQL optimizer to evaluate the most restrictive condition first because it returns a smaller subset of data, thus reducing the query's overhead. Effectively placing the most restrictive condition in the query requires knowing how the optimizer operates. Some optimizers seem to read the WHERE clause from the bottom up; in that case, you want to place the most restrictive condition last in the WHERE clause, because that is the condition the optimizer reads first. The following example shows how to structure the WHERE clause based on the restrictiveness of the conditions and the FROM clause based on the size of the tables:

FROM TABLE1,                                 Smallest table
     TABLE2,                                  to
     TABLE3                                   Largest table, also base table
WHERE TABLE1.COLUMN = TABLE3.COLUMN     Join condition
  AND TABLE2.COLUMN = TABLE3.COLUMN     Join condition
[ AND CONDITION1 ]                         Least restrictive
[ AND CONDITION2 ]                         Most restrictive

Caution

Always Test Your WHERE Clauses

If you do not know how your particular implementation's SQL optimizer works, the DBA does not know either, and you do not have sufficient documentation, you can execute a large query that takes a while to run and then rearrange conditions in the WHERE clause. Be sure to record the time it takes the query to complete after each change. You should have to run only a couple of tests to figure out whether the optimizer reads the WHERE clause from top to bottom or bottom to top. Turn off database caching during the testing for more accurate results.

Following is an example using a phony table:

Table: TEST
Row count: 95,867

Conditions:
WHERE LASTNAME = 'SMITH'     returns 2,000 rows
WHERE STATE = 'IN'           returns 30,000 rows

Most restrictive condition: WHERE LASTNAME = 'SMITH'

Following is the first query:

SELECT COUNT(*)
FROM TEST
WHERE LASTNAME = 'SMITH'
  AND STATE = 'IN';

  COUNT(*)
----------
     1,024

This is the second query:

SELECT COUNT(*)
FROM TEST
WHERE STATE = 'IN'
  AND LASTNAME = 'SMITH';

  COUNT(*)
----------
     1,024

Suppose that the first query completed in 20 seconds, whereas the second query completed in 10 seconds. Because the second query returned faster results and the most restrictive condition was listed last in the WHERE clause, you can safely assume that the optimizer reads the WHERE clause from the bottom up.
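If you want to run this kind of timing test yourself, the harness below sketches it with Python's built-in sqlite3 module and an invented TEST data set. Note that SQLite, like most modern cost-based optimizers, produces the same plan regardless of condition order, so on this engine the two timings come out essentially equal; the sketch simply shows the measurement pattern to apply against your own implementation.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TEST (LASTNAME TEXT, STATE TEXT)")
# Invented data: 'IN' rows are common, 'SMITH' rows are rarer
rows = ([("SMITH", "IN")] * 1000 + [("JONES", "IN")] * 20000
        + [("SMITH", "OH")] * 500)
conn.executemany("INSERT INTO TEST VALUES (?, ?)", rows)

def timed_count(sql):
    # Run the query and return its COUNT(*) plus elapsed wall time
    start = time.perf_counter()
    count = conn.execute(sql).fetchone()[0]
    return count, time.perf_counter() - start

c1, t1 = timed_count("SELECT COUNT(*) FROM TEST "
                     "WHERE LASTNAME = 'SMITH' AND STATE = 'IN'")
c2, t2 = timed_count("SELECT COUNT(*) FROM TEST "
                     "WHERE STATE = 'IN' AND LASTNAME = 'SMITH'")
print(c1, round(t1, 4))
print(c2, round(t2, 4))
```

Both orderings must return the same count; only the elapsed times can differ, and only on an optimizer that is sensitive to condition order.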

Note

Try to Use Indexed Columns

Using an indexed column as the most restrictive condition in a query is a good practice. Indexes generally improve a query’s performance.

Running Full Table Scans

A full table scan occurs when an index is not used by the query engine or no index is present for the table(s) being used. Full table scans usually return data much more slowly than when an index is used. The larger the table, the slower that data is returned when a full table scan is performed. The query optimizer decides whether to use an index when executing the SQL statement. In most cases, the index is used if it exists.

Some implementations have sophisticated query optimizers that can decide whether to use an index. Decisions such as this are based on statistics that are gathered on database objects, such as the size of an object and the estimated number of rows that are returned by a condition with an indexed column. Refer to your implementation documentation for specifics on the decision-making capabilities of your relational database’s optimizer.
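As an illustration of this statistics gathering, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in implementation (table and index names are invented): SQLite's ANALYZE command records per-index statistics in the sqlite_stat1 table, which its planner then consults when costing index access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BIRDS (BIRD_ID INTEGER PRIMARY KEY, WINGSPAN REAL)")
conn.execute("CREATE INDEX BIRDS_WINGSPAN_IDX ON BIRDS (WINGSPAN)")
conn.executemany("INSERT INTO BIRDS VALUES (?, ?)",
                 [(i, i % 90) for i in range(1000)])

# Gather the statistics the optimizer uses when costing index access
conn.execute("ANALYZE")
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)
```

Other implementations expose the same idea under different names (for example, gathering optimizer statistics in Oracle or updating statistics in SQL Server); check your documentation for the equivalent command.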

Avoid full table scans when reading large tables. For example, a full table scan is performed when a table that has no index is read, which usually takes considerably more time to return the data. For most larger tables, consider using an index. For small tables, as previously mentioned, the optimizer might choose a full table scan over the index even if the table is indexed. For a small indexed table, consider dropping the index and reserving that space for other objects in the database that need it.
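To see whether a given query performs a full table scan or uses an index, ask the optimizer for its plan. Here is a minimal sketch with Python's sqlite3 module (table and index names invented; Oracle users would use EXPLAIN PLAN instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (EMPLOYEEID INTEGER, LASTNAME TEXT)")
conn.executemany("INSERT INTO EMPLOYEES VALUES (?, ?)",
                 [(i, f"NAME{i}") for i in range(1000)])

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN describes table access
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

no_index = plan("SELECT * FROM EMPLOYEES WHERE LASTNAME = 'NAME5'")
conn.execute("CREATE INDEX EMP_LASTNAME_IDX ON EMPLOYEES (LASTNAME)")
with_index = plan("SELECT * FROM EMPLOYEES WHERE LASTNAME = 'NAME5'")
print(no_index)    # a full scan of EMPLOYEES
print(with_index)  # a search using EMP_LASTNAME_IDX
```

The same query switches from a scan to an index search the moment a suitable index exists, which is exactly the change you want on large tables.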

Tip

Simple Ways to Avoid Table Scans

The easiest and most obvious way to avoid a full table scan—aside from ensuring that indexes exist on the table—is to use conditions in a query’s WHERE clause to filter data to be returned.

This is a good time for a reminder of which data should be indexed. Consider, for example, the index of a book written about the BIRDS database: you would never index words or data such as and, the, bird, a specific weight, or a wingspan measurement. Indexing columns that do not contain many unique values is not beneficial. Instead, you want to index columns that tend to have many unique values, especially those used to search for data:

  •    Columns used as primary keys

  •    Columns used as foreign keys

  •    Columns frequently used to join tables

  •    Columns frequently used as conditions in a query

  •    Columns that have a high percentage of unique values

Tip

Table Scans Are Not Always Bad

Sometimes full table scans are beneficial. You want to perform them on queries against small tables or queries whose conditions return a high percentage of rows. The easiest way to force a full table scan is to avoid creating an index on the table.

Identifying Other Performance Considerations

Other performance considerations come into play as well when tuning SQL statements. The next sections discuss the following concepts:

  •    Using the LIKE operator and wildcards

  •    Avoiding the OR operator

  •    Avoiding the HAVING clause

  •    Avoiding large sort operations

  •    Using stored procedures

  •    Disabling indexes during batch loads

Using the LIKE Operator and Wildcards

The LIKE operator is a useful tool for placing conditions on a query in a flexible manner. Wildcards are well suited to queries that search for similar data (data that is not equal to an exact value specified), but where you place a wildcard in the pattern can determine whether the database is able to use an index.

Suppose you want to write a query against the fictitious EMPLOYEES table, selecting the EMPLOYEEID, LASTNAME, FIRSTNAME, and STATE columns. You need to know the employee identification, name, and state for all the employees with the last name Stevens. Three SQL statements with different wildcard placements serve as examples.

Query 1 follows:

SELECT EMPLOYEEID, LASTNAME, FIRSTNAME, STATE
FROM EMPLOYEES
WHERE LASTNAME LIKE 'STEVENS';

Next is Query 2:

SELECT EMPLOYEEID, LASTNAME, FIRSTNAME, STATE
FROM EMPLOYEES
WHERE LASTNAME LIKE '%EVENS%';

The final query is Query 3:

SELECT EMPLOYEEID, LASTNAME, FIRSTNAME, STATE
FROM EMPLOYEES
WHERE LASTNAME LIKE 'ST%';

The SQL statements do not necessarily return the same results. Query 1 is the most specific and more than likely returns fewer rows than the other two; because its pattern contains no wildcard at all, it behaves like an equality comparison and can take full advantage of an index on LASTNAME. Query 2 begins with a wildcard, which prevents an index on LASTNAME from being used, so it is likely the slowest of the three. Query 3 specifies the first letters of the string being searched for, so it can still use an index on LASTNAME for a range scan and is probably faster than Query 2.

With Query 1, you retrieve only individuals whose last name exactly matches Stevens, but Stevens can be spelled in different ways. Query 2 picks up spellings that contain the string EVENS. Query 3 picks up any last name that starts with ST, so it is the only query of the three that is guaranteed to catch both the Stevens and the Stephens.
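The index-use difference between these wildcard placements can be observed directly in an optimizer's plan output. Here is a sketch with Python's sqlite3 module (names invented; the NOCASE collation matches SQLite's case-insensitive LIKE so that its LIKE optimization can apply):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (EMPLOYEEID INTEGER, "
             "LASTNAME TEXT COLLATE NOCASE)")
conn.execute("CREATE INDEX EMP_LASTNAME_IDX ON EMPLOYEES (LASTNAME)")
conn.executemany("INSERT INTO EMPLOYEES VALUES (?, ?)",
                 [(1, "STEVENS"), (2, "STEPHENS"), (3, "EVENSON")])

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN describes table access
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

leading_wildcard = plan(
    "SELECT * FROM EMPLOYEES WHERE LASTNAME LIKE '%EVENS%'")
trailing_wildcard = plan(
    "SELECT * FROM EMPLOYEES WHERE LASTNAME LIKE 'ST%'")
print(leading_wildcard)   # leading % forces a scan; the index cannot help
print(trailing_wildcard)  # ST% becomes a range search on the index
```

The leading wildcard forces a scan of every row, whereas a pattern anchored at the start of the string is rewritten into a range condition the index can satisfy.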

Avoiding the OR Operator

Rewriting a SQL statement using the IN predicate instead of the OR operator can improve data retrieval speed, depending on your implementation. Your implementation's documentation describes tools you can use to time or compare the performance of the OR operator and the IN predicate. This section gives you an example of how to rewrite a SQL statement by removing the OR operator and replacing it with the IN predicate. Refer to Hour 13, “Using Operators to Categorize Data,” for more on using the OR operator and the IN predicate.

The following query uses the OR operator:

SELECT EMPLOYEEID, LASTNAME, FIRSTNAME
FROM EMPLOYEES
WHERE CITY = 'INDIANAPOLIS'
   OR CITY = 'KOKOMO'
   OR CITY = 'TERRE HAUTE';

This is the same query using the IN operator:

SELECT EMPLOYEEID, LASTNAME, FIRSTNAME
FROM EMPLOYEES
WHERE CITY IN ('INDIANAPOLIS', 'KOKOMO',
               'TERRE HAUTE');

The SQL statements retrieve the same data. However, through testing and experience, you may find that data retrieval is measurably faster when OR conditions are replaced with the IN predicate, as in the second query.
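Whichever form your optimizer prefers, the two queries must return the same rows, which is easy to verify before and after a rewrite. A sketch with Python's sqlite3 module and invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (EMPLOYEEID INTEGER, CITY TEXT)")
conn.executemany("INSERT INTO EMPLOYEES VALUES (?, ?)",
                 [(1, "INDIANAPOLIS"), (2, "KOKOMO"),
                  (3, "MUNCIE"), (4, "TERRE HAUTE")])

# The OR form and the IN form are logically equivalent rewrites
or_rows = conn.execute(
    "SELECT EMPLOYEEID FROM EMPLOYEES "
    "WHERE CITY = 'INDIANAPOLIS' OR CITY = 'KOKOMO' "
    "OR CITY = 'TERRE HAUTE'").fetchall()
in_rows = conn.execute(
    "SELECT EMPLOYEEID FROM EMPLOYEES "
    "WHERE CITY IN ('INDIANAPOLIS', 'KOKOMO', 'TERRE HAUTE')").fetchall()
print(sorted(or_rows) == sorted(in_rows))  # same employees either way
```

Confirming equivalence first lets you attribute any timing difference purely to how the optimizer handles each form.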

Avoiding the HAVING Clause

The HAVING clause is useful for filtering the results of a GROUP BY clause; however, it does not come without cost. The HAVING clause gives the SQL optimizer extra work, which results in extra time: the query must not only group result sets but also filter those groups through the restrictions of the HAVING clause. The queries performed in the BIRDS database are fairly simple and do not touch many rows of data, but in a larger database the HAVING clause can introduce noticeable overhead, especially when it contains complex logic or applies to a high number of groupings. If possible, write SQL statements without the HAVING clause, or keep its restrictions as simple as possible.
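One common and safe rewrite is available when a HAVING restriction does not involve an aggregate at all: move it into the WHERE clause so rows are filtered before grouping rather than after. A sketch with Python's sqlite3 module and invented BIRDS data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BIRDS (BIRD_NAME TEXT, WINGSPAN REAL)")
conn.executemany("INSERT INTO BIRDS VALUES (?, ?)",
                 [("Bald Eagle", 90), ("Bald Eagle", 84),
                  ("Great Egret", 67), ("Osprey", 72)])

# The restriction on BIRD_NAME involves no aggregate, so it can be
# applied in WHERE (before grouping) instead of HAVING (after)
with_having = conn.execute(
    "SELECT BIRD_NAME, AVG(WINGSPAN) FROM BIRDS "
    "GROUP BY BIRD_NAME HAVING BIRD_NAME <> 'Osprey'").fetchall()
with_where = conn.execute(
    "SELECT BIRD_NAME, AVG(WINGSPAN) FROM BIRDS "
    "WHERE BIRD_NAME <> 'Osprey' GROUP BY BIRD_NAME").fetchall()
print(with_where)   # same groups either way; fewer rows are grouped
```

Restrictions that reference aggregates, such as HAVING AVG(WINGSPAN) > 70, must of course stay in the HAVING clause.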

Avoiding Large Sort Operations

Large sort operations result from the use of the ORDER BY, GROUP BY, and HAVING clauses. Whenever a sort is performed, subsets of data must be held in memory or written to disk (if not enough space is available in allotted memory). Sorting data is often unavoidable; the main point is that sort operations affect a SQL statement's response time. Because you cannot always avoid large sort operations, it is best to schedule queries with large sorts as periodic batch processes during off-peak database usage; this ensures that the performance of most user processes is not affected.
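Whether a query incurs an explicit sort step is visible in the optimizer's plan. In the sqlite3 sketch below (names invented), the ORDER BY requires a temporary B-tree sort until an index exists that already delivers rows in sorted order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BIRDS (BIRD_NAME TEXT)")

def plan(sql):
    # Collect the detail text of every step in the query plan
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Without a supporting index, ORDER BY needs an explicit sort step
before = plan("SELECT BIRD_NAME FROM BIRDS ORDER BY BIRD_NAME")
conn.execute("CREATE INDEX BIRDS_NAME_IDX ON BIRDS (BIRD_NAME)")
after = plan("SELECT BIRD_NAME FROM BIRDS ORDER BY BIRD_NAME")
print(before)  # includes a TEMP B-TREE step for the ORDER BY
print(after)   # the index scan already delivers sorted order
```

An index that matches the ORDER BY is one way to avoid the sort entirely rather than merely scheduling it off-peak.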

Using Stored Procedures

You should create stored procedures for SQL statements that are executed on a regular basis—particularly large transactions or queries. Stored procedures are SQL statements that are compiled and permanently stored in the database in an executable format.

Normally, when a SQL statement is issued in the database, the database must check the syntax and convert the statement into an executable format within the database (called parsing). After it is parsed, the statement is stored in memory; however, it is not permanent. When other operations need memory, the statement might be ejected from memory. For stored procedures, the SQL statement is always available in an executable format and remains in the database until it is dropped like any other database object.
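Stored procedure languages vary by implementation (PL/SQL in Oracle, Transact-SQL in SQL Server), and SQLite has none at all, so the sketch below shows only the application-side analogue of the same parse-once idea: reusing a single parameterized statement, whose compiled form the sqlite3 module caches per connection, instead of re-parsing fresh SQL text on every call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BIRDS (BIRD_ID INTEGER, BIRD_NAME TEXT)")
conn.executemany("INSERT INTO BIRDS VALUES (?, ?)",
                 [(1, "Bald Eagle"), (2, "Osprey")])

# Reusing one parameterized statement lets the library cache the
# parsed, compiled form instead of re-parsing SQL text on each call
lookup = "SELECT BIRD_NAME FROM BIRDS WHERE BIRD_ID = ?"
names = [conn.execute(lookup, (bird_id,)).fetchone()[0]
         for bird_id in (1, 2)]
print(names)
```

A true stored procedure goes further by keeping the executable form inside the database itself, available to every session until the procedure is dropped.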

Using Views

Not everyone agrees on whether using views improves or degrades query performance; mostly, this depends on the situation. A standard view does not store data of its own: it is a stored query that the database expands against the base tables, using the indexes that have already been defined on them. Because a view typically exposes only a small subset of the columns and rows in the base tables, querying through a well-designed view encourages selective access paths, which can improve performance; a poorly designed view can just as easily hide an expensive query. It is easy to test a query with and without a view, and quite often you might find a significant performance opportunity.
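A simple way to begin such a test is to confirm that the view returns exactly what its underlying query returns, and then compare timings and plans for both forms. A sketch with Python's sqlite3 module (the BIG_BIRDS view and its data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BIRDS (BIRD_NAME TEXT, WINGSPAN REAL)")
conn.executemany("INSERT INTO BIRDS VALUES (?, ?)",
                 [("Bald Eagle", 90), ("Great Egret", 67), ("Osprey", 72)])

# A view is a stored query; selecting from it expands to the
# underlying SELECT against the base table
conn.execute("CREATE VIEW BIG_BIRDS AS "
             "SELECT BIRD_NAME FROM BIRDS WHERE WINGSPAN > 70")
direct = conn.execute(
    "SELECT BIRD_NAME FROM BIRDS WHERE WINGSPAN > 70").fetchall()
via_view = conn.execute("SELECT BIRD_NAME FROM BIG_BIRDS").fetchall()
print(via_view)
```

Once equivalence is confirmed, any performance difference you measure comes from how the optimizer merges the view's definition into the outer query.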

Disabling Indexes During Batch Loads

When a user submits a transaction to the database (INSERT, UPDATE, or DELETE), an entry is made to both the database table and any indexes associated with the table being modified. This means that if the EMPLOYEES table has an index and a user updates the EMPLOYEES table, an update also occurs to the index associated with the EMPLOYEES table. In a transactional environment, having a write to an index occur every time a write to the table occurs is usually not an issue.

During batch loads, however, an index can cause serious performance degradation. A batch load might consist of hundreds, thousands, or millions of manipulation statements or transactions. Because of their volume, batch loads take a long time to complete and are normally scheduled during off-peak hours, usually during evenings or weekends. Sometimes, simply dropping the indexes on the tables involved can dramatically decrease the time a batch load takes to complete: with no indexes to maintain, changes are written to the tables much faster, so the job finishes more quickly. When the batch load is complete, rebuild the indexes; during the rebuild, the indexes are repopulated with the appropriate data from the tables. Although creating an index on a large table can take a while, the overall time expended is usually less when you drop the index and rebuild it afterward.
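The drop-load-rebuild pattern can be sketched with Python's sqlite3 module (table and index names invented; on a real batch load you would time each variant on your own implementation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (EMP_ID INTEGER, LASTNAME TEXT)")
conn.execute("CREATE INDEX EMP_LASTNAME_IDX ON EMPLOYEES (LASTNAME)")

# Drop the index, run the batch load, then rebuild the index once,
# in a single pass over the finished table
conn.execute("DROP INDEX EMP_LASTNAME_IDX")
conn.executemany("INSERT INTO EMPLOYEES VALUES (?, ?)",
                 [(i, f"NAME{i}") for i in range(10000)])
conn.execute("CREATE INDEX EMP_LASTNAME_IDX ON EMPLOYEES (LASTNAME)")

count = conn.execute("SELECT COUNT(*) FROM EMPLOYEES").fetchone()[0]
print(count)
```

With the index dropped, each INSERT avoids an index maintenance write; the single rebuild at the end also produces a freshly packed, unfragmented index.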

Another advantage to rebuilding an index after a batch load completes is the reduction of fragmentation that is found in the index. When a database grows, records are added, removed, and updated. Fragmentation can then occur. For any database that experiences a lot of growth, it is a good idea to periodically drop and rebuild large indexes. When you rebuild an index, the number of physical extents that comprise the index decreases, less disk I/O is involved to read the index, the user gets results more quickly, and everyone is happy.

Using Cost-Based Optimization

Often you inherit a database that is in need of SQL statement tuning. These existing systems might have thousands of SQL statements executing at any given time. To optimize the amount of time spent on performance tuning, you need a way to determine what queries are most beneficial. This is where cost-based optimization comes into play. Cost-based optimization attempts to determine which queries are most costly in relation to the overall system resources spent. For instance, say you measure cost by execution duration and you have the following two queries with their corresponding run times:

SELECT * FROM EMPLOYEES
WHERE FIRSTNAME LIKE '%LE%';            2 sec

SELECT * FROM EMPLOYEES
WHERE FIRSTNAME LIKE 'G%';              1 sec

Initially, it might appear that the first statement is the one you need to concentrate your efforts on. However, what if the second statement is executed 1,000 times an hour but the first is performed only 10 times in the same hour? This makes a huge difference in how you allocate your time.

Cost-based optimization ranks SQL statements in order of total computational cost. Computational cost is easily determined based on some measure of query execution (duration, number of reads, and so on) multiplied by the number of executions over a given period:

Total Computational Cost = Execution Measure × (Number of Executions)

You get the most overall benefit by first tuning the queries with the most total computational cost. Looking at the previous example, if you cut each statement execution time in half, you can easily figure out the total computational savings:

Statement #1: 1 second × 10 executions = 10 seconds of computational savings

Statement #2: .5 second × 1000 executions = 500 seconds of computational savings

Now you can more easily understand why you should spend your valuable time on the second statement instead of the first. Not only have you worked to optimize your database, but you’ve optimized your time as well.
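The ranking arithmetic above is trivial to automate. A small Python sketch (the statement list and measurements are the invented figures from this section):

```python
# Rank statements by total computational cost: measure * executions
statements = [
    {"sql": "... LIKE '%LE%'", "seconds": 2.0, "executions_per_hour": 10},
    {"sql": "... LIKE 'G%'",   "seconds": 1.0, "executions_per_hour": 1000},
]
for stmt in statements:
    stmt["total_cost"] = stmt["seconds"] * stmt["executions_per_hour"]

# The most expensive statement overall comes first
ranked = sorted(statements, key=lambda s: s["total_cost"], reverse=True)
print([(s["sql"], s["total_cost"]) for s in ranked])

# Halving each statement's time gives the projected hourly savings
savings = [s["seconds"] / 2 * s["executions_per_hour"] for s in statements]
print(savings)  # [10.0, 500.0]
```

Fed with real measurements from your implementation's monitoring tools, the same loop tells you which statements to tune first.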

Tip

Performance Tools

Many relational databases have built-in tools that assist in SQL statement performance tuning. For example, Oracle has a tool called EXPLAIN PLAN that shows the user the execution plan of a SQL statement. Another Oracle tool, TKPROF, measures the actual elapsed time of a SQL statement. In SQL Server, the Query Analyzer has several options to give you an estimated execution plan or statistics from the executed query. Check with your DBA and implementation documentation for more information on available tools.

Summary

In this hour, you learned the meaning of tuning SQL statements in a relational database. You learned about two basic types of tuning: database tuning and SQL statement tuning. Both are vital to the efficient operation of the database and the SQL statements within it, and neither can deliver optimal overall performance without the other.

You read about methods for tuning a SQL statement, starting with a statement’s actual readability, which does not directly improve performance but does aid the programmer in the development and management of statements. One of the main issues that arises in SQL statement performance involves the use of indexes. Sometimes you should use indexes; other times you should avoid them. For all measures taken to improve SQL statement performance, you need to understand the data itself, the database design and relationships, and the users’ needs in accessing the database.

Q&A

Q. By following what I have learned about performance, what realistic performance gains can I expect to see in data retrieval time?

A. Realistically, you can see performance gains from fractions of a second to minutes, hours, or even days.

Q. How can I test my SQL statements for performance?

A. Each implementation should have a tool or system to check performance. Oracle7 was used to test the SQL statements in this book. Oracle offers several tools for checking performance, including the EXPLAIN PLAN, TKPROF, and SET commands. Check your particular implementation for tools that are similar to Oracle’s.

Workshop

The following workshop consists of a series of quiz questions and practical exercises. The quiz questions are designed to test your overall understanding of the current material. The practical exercises give you the opportunity to apply the concepts discussed during the current hour, as well as build on the knowledge you acquired in previous hours of study. Be sure to complete the quiz questions and exercises before continuing to the next hour. Refer to Appendix C, “Answers to Quizzes and Exercises,” for answers.

Quiz

  1. Is using a unique index on a small table beneficial?

  2. What happens when the optimizer chooses not to use an index on a table when a query has been executed?

  3. Should the most restrictive clause(s) be placed before or after the join condition(s) in the WHERE clause?

  4. When is the LIKE operator considered bad in terms of performance?

  5. How can you optimize batch load operations in terms of indexes?

  6. Which three clauses are the cause of sort operations that degrade performance?

Exercises

  1. Rewrite the following SQL statements to improve their performance. Use the fictitious EMPLOYEE_TBL and EMPLOYEE_PAY_TBL, as described here:

    EMPLOYEE_TBL
    EMP_ID        VARCHAR(9)     NOT NULL     Primary key,
    LAST_NAME     VARCHAR(15)    NOT NULL,
    FIRST_NAME    VARCHAR(15)    NOT NULL,
    MIDDLE_NAME   VARCHAR(15),
    ADDRESS       VARCHAR(30)    NOT NULL,
    CITY          VARCHAR(15)    NOT NULL,
    STATE         VARCHAR(2)     NOT NULL,
    ZIP           INTEGER(5)     NOT NULL,
    PHONE         VARCHAR(10),
    PAGER         VARCHAR(10),
    CONSTRAINT EMP_PK PRIMARY KEY (EMP_ID)

    EMPLOYEE_PAY_TBL
    EMP_ID            VARCHAR(9)     NOT NULL     Primary key,
    POSITION          VARCHAR(15)    NOT NULL,
    DATE_HIRE         DATETIME,
    PAY_RATE          DECIMAL(4,2)   NOT NULL,
    DATE_LAST_RAISE   DATETIME,
    SALARY            DECIMAL(8,2),
    BONUS             DECIMAL(8,2),
    CONSTRAINT EMP_FK FOREIGN KEY (EMP_ID)
    REFERENCES EMPLOYEE_TBL (EMP_ID)

    1. SELECT EMP_ID, LAST_NAME, FIRST_NAME, PHONE
       FROM EMPLOYEE_TBL
       WHERE SUBSTRING(PHONE, 1, 3) = '317'
          OR SUBSTRING(PHONE, 1, 3) = '812'
          OR SUBSTRING(PHONE, 1, 3) = '765';

    2. SELECT LAST_NAME, FIRST_NAME
       FROM EMPLOYEE_TBL
       WHERE LAST_NAME LIKE '%ALL%';

    3. SELECT E.EMP_ID, E.LAST_NAME, E.FIRST_NAME, EP.SALARY
       FROM EMPLOYEE_TBL E,
            EMPLOYEE_PAY_TBL EP
       WHERE E.LAST_NAME LIKE 'S%'
         AND E.EMP_ID = EP.EMP_ID;
  2. Add another table called EMPLOYEE_PAYHIST_TBL that contains a large amount of pay history data. Use the following table to write a series of SQL statements to address the following problems. Be sure that the queries you write perform well.

    EMPLOYEE_PAYHIST_TBL
    PAYHIST_ID        VARCHAR(9)     NOT NULL     Primary key,
    EMP_ID            VARCHAR(9)     NOT NULL,
    START_DATE        DATETIME       NOT NULL,
    END_DATE          DATETIME,
    PAY_RATE          DECIMAL(4,2)   NOT NULL,
    SALARY            DECIMAL(8,2)   NOT NULL,
    BONUS             DECIMAL(8,2)   NOT NULL,
    CONSTRAINT EMP_FK FOREIGN KEY (EMP_ID)
    REFERENCES EMPLOYEE_TBL (EMP_ID)
    1. Find the SUM of the salaried versus nonsalaried employees, by the year in which their pay started.

    2. Find the difference in the yearly pay of salaried employees versus nonsalaried employees, by the year in which their pay started. Consider the nonsalaried employees to be working full time during the year (PAY_RATE × 52 × 40).

    3. Find the difference in what employees make now versus what they made when they started with the company. Again, consider the nonsalaried employees to be full time. Also consider that the employees’ current pay is reflected in EMPLOYEE_PAY_TBL and also EMPLOYEE_PAYHIST_TBL. In the pay history table, the current pay is reflected as a row with the END_DATE for pay equal to NULL.