Improved Query Performance in DB2 11 for z/OS

by Terry Purcell

The “out-of-the-box” performance improvement message in DB2^® 10 for z/OS^® resonated very well with customers and provided motivation for the DB2 for z/OS optimizer development team to look for opportunities to continue this theme in DB2 11 for z/OS (hereafter referred to as DB2 11).

Customers have rightly pointed out that “out-of-the-box” isn’t really true for the optimizer, given that static SQL requires a REBIND to take advantage of new access paths and/or runtime optimizations. However, minimal intervention such as REBIND is certainly more desirable than requiring query rewrites or the creation of additional indexes to achieve performance gains. So while we aren’t promising that all DBA intervention is eliminated, we certainly received the message loud and clear that “closer to out of the box” is preferred. For dynamic SQL, the first execution results in a new prepare, so dynamic SQL can actually achieve the optimizations out of the box. The vast majority of business intelligence and analytics workloads use dynamic SQL, and with the ad hoc nature of SQL requests in these environments, minimal effort to achieve performance improvements is certainly preferred.

One major goal for the optimizer development team was therefore to increase focus on query performance improvements in DB2 11 that required minimal action to exploit. Given that there is considerable diversity in customer workloads, the first question is how do you identify what would benefit the broadest set of customers?

Fortunately, there are operations that are common to a majority of workloads but are not necessarily part of the DB2 optimizer. DB2 11 for z/OS delivers performance enhancements to decompression of data rows and to sort (which I’ll discuss briefly), as well as numerous internal code path optimizations that are not the focus here. But each of these improvements can enhance query performance generally.

The first task for optimizer development was to identify common query patterns. This job involved investigating customer workloads that DB2 development had obtained for performance testing, analyzing IBM and other enterprise resource planning (ERP) vendor-packaged applications and query generators, obtaining input from level 2 support on challenging query patterns, and studying lessons learned from customer proofs of concept (POCs) and migrations from other platforms. The results of this analysis were used to identify common patterns to target for DB2 11.

The following sections highlight the main query performance enhancements in DB2 11 for z/OS that require the least intervention to exploit, since these types of enhancements are of most interest for business intelligence and analytics workloads.

Predicate Indexability

Over the years, customers have been educated to understand that indexable predicates are the most efficient, while stage 2 predicates are least efficient. And, in traditional applications, developers were instructed by their DBAs to write their predicates so that matching index access was achievable for their queries. However, query generators, ERP applications, rapid development, and geographically dispersed end users can all result in limited ability for a DBA to control the quality of the SQL. So it is clear that there is a preference for DB2 to internally perform these rewrites or to optimize predicate performance.

DB2 11 rewrites some of the more common stage 2 local predicates, including the following, to an indexable form:

YEAR(DATE_COL)
DATE(TIMESTAMP_COL)
value BETWEEN C1 AND C2
SUBSTR(C1,1,10) ← SUBSTR from position 1 only

DB2 9 for z/OS already delivered the ability to create an index on an expression; however, this feature required the developer or DBA to identify the candidate queries and create the targeted indexes. The DB2 11 predicate rewrites allow optimal performance without the need for this intervention. In some cases, an index on expression may result in better performance for the query, and, for this reason, if an index on expression exists, the predicate will not be rewritten to an indexable form in DB2 11. Index on expression carries other overhead, such as resolution of the expression during insert or update, and also applicability only to the targeted expression. Coupled with the need for the DBA or end user to identify the need for the index, the anticipation is that many of these use cases today do not already have an applicable index on expression.

CASE expressions are also enhanced to support indexability, as Figure 1 shows. It is increasingly common to see complex resolution of code values into their business value being included in a view or table expression for use within a query, rather than the use of a code table or dimension table for this purpose. When used in predicates, DB2 11 will now be able to use these predicates as indexable rather than stage 2 predicates as in prior releases.

Other patterns that have been optimized include OR and IN predicate combinations, which are common in ERP applications. In some cases, single matching index access is now possible where previously only multi-index access was available. And in other cases, multi-index access is available where matching index access was not possible before.

Figure 1: CASE expression indexability

In addition, query generators are known to add dummy WHERE clause predicates to simplify their generation framework—for example, WHERE 1=1. DB2 11 enhances these patterns by removing unnecessary “always true” and “always false” predicates in some instances. Customers who have historically used query tricks such as OR 0=1 or OR 0<>0 should note that these tricks will continue to be honored by DB2.

The final enhancement within this topic is related to predicate pushdown. Prior to DB2 11, only simple predicates were pushed inside materialized views and table expressions—for example, views containing DISTINCT or UNION. DB2 11 extends predicate pushdown to include OR predicates, stage 2 predicates, and outer join ON clause predicates. This change allows the filtering to be applied before materialization, benefiting those workloads that include views or table expressions containing DISTINCT or GROUP BY, and/or when these views/table expressions are used in outer joins.

The aforementioned predicate patterns cover a broad array of ERP applications as well as known customer pain points.

Duplicate Removal

Duplicate removal using either DISTINCT or GROUP BY is another common usage in query processing. DB2 11 for z/OS has been enhanced to improve performance of DISTINCT, GROUP BY, and also non-correlated subqueries when an index exists to provide order. In prior releases, DB2 has been able to avoid or minimize the sort overhead for duplicate removal by scanning a candidate index in sequence.

Figure 2 provides a simplified example of how DB2 11 can use the non-leaf key information to skip forward to find the next distinct key value. This technique is applicable for DISTINCT, GROUP BY, and non-correlated subqueries. Prior to DB2 11, DB2 would scan the leaf pages and remove duplicates internally before returning the data to the application. The main difference with DB2 11 is that DB2 can remove the duplicates earlier by simply skipping over them within the index, regardless of whether the distance between distinct entries is a short or large distance—although a greater performance benefit is gained when whole index leaf pages can be skipped over.

Figure 2: DB2 11 index key skipping example

Optimization of DISTINCT and other duplicate removal patterns extends to join queries in DB2 11 where the join is coded as an existence check. In such queries, any duplicates introduced from the join are not required for the final result. Figure 3 provides two examples of the targeted query patterns. In DB2 11, DB2 can “early-out” from the join to the inner table as soon as the first match is found rather than processing all matching rows. Prior to DB2 11, this type of early-out was available only to correlated EXISTS subqueries that were transformed to a join.

Figure 3: Early-out join examples

Correlated subqueries may also see a performance improvement in DB2 11 due to optimized usage of a subquery cache that has existed in DB2 since V2. Figure 4 provides a very simple example of a common query pattern used in temporal or time-based implementations whereby the most recent (or least recent) version is required by the query. DB2 11 optimizes this pattern by recognizing when order is provided and adjusting the cache size accordingly.

Figure 4: Correlated subquery pattern optimized in DB2 11 for z/OS

Hash Join and Sparse Index

Most database management systems provide a hash join method for efficient join performance when a suitable join index does not exist, or if a large percentage of the rows will be joined. In these cases, hash join can be more efficient than either nested loop join or merge scan join. Sparse index has existed since DB2 V4 for non-correlated subqueries; it was exploited from DB2 V7 for star join and was opened up to non-star join in DB2 9. DB2 10 added hash support, but only in cases where no index existed to support a join.

DB2 11 extends hash join support by allowing it to be chosen in more cases and by optimizing memory usage, including run-time validation of available system memory and appropriate fallback to sparse index when the result cannot be contained in memory.

While these enhancements are available after first REBIND for static SQL or next execution for dynamic SQL, it is important to note that the DB2 subsystem parameter MXDTCACH controls the exploitation of hash join. The MXDTCACH default is set conservatively at 20 MB; customers may consider increasing this value to gain further improvement. Although skill may be required for optimal setting of MXDTCACH, the accounting and statistics reports do provide a simple record of the number of sparse indexes where a workfile was built. This information, along with the number of synchronous reads in the sort workfile buffer pool, can be used as a guide to increase MXDTCACH or to reduce VPSEQ in the sort buffer pool.

Page Range Screening and Indexing for Partitioned Table Spaces

When local predicates exist on the partitioning columns, DB2 is able to limit the access to only the qualified partitions. DB2 11 extends this support to join predicates. This and other DB2 11 enhancements should increase the scenarios in which data-partitioned secondary indexes (DPSIs) can be used in place of non-partitioned secondary indexes (NPIs or NPSIs) to achieve improved utility performance without compromising query performance.

Given a partitioning scheme in which the table is partitioned by columns used in queries as join predicates, DB2 11 can use those predicates to filter out unnecessary partitions and probe only the qualified part(s). This enhancement is most effective when the index for the join is a DPSI and the partitioning column(s) either are excluded from the index or are not the leading index column(s). Prior to DB2 11, optimal join performance could be achieved for this partitioning scheme only if the index was created as an NPI or if the index was partitioned but the partitioning column(s) was the leading index column(s) (meaning the index was a partitioning index, or PI, and not a DPSI).

Figure 5 illustrates a join between T1 and T2, where the inner table of the join (T2) uses a DPSI (on C1) and there exists a join predicate on the non-indexed YEAR column. In this example, each probe to the inner table can exploit the join predicate on the partitioning column to ensure that only the necessary partition is accessed for each inner table access. Prior to DB2 11, each inner table access would probe all partitions, with the YEAR join predicate being resolved after the data page was accessed. It is expected that customers were not accepting such poor performance previously, and thus the benefit of this enhancement is likely that a customer can convert to use a DPSI—which can ultimately benefit utility performance if this results in fewer NPIs on a table.

Figure 5: Page range screening from join

To further expand the use of DPSIs in DB2 11, there are additional enhancements for customers whose workloads already exploit DPSIs or who are considering moving more of their NPIs to DPSIs. A word of caution, however: While DB2 11 increases the sweet spot for DPSIs, there are still several scenarios where having one index b-tree structure (as with NPIs) provides considerably better query performance than one b-tree per partition.

When a query contains ORDER BY, GROUP BY, or DISTINCT and the query requires a subset of the table rows, an index is generally a more efficient way to provide that order rather than introducing a sort. For a partitioned table space, both a PI and an NPI can provide order that can match an ORDER BY, GROUP BY, or DISTINCT. However, a DPSI provides order only within a partition, not across the entire table space. DB2 can use one of two methods to allow a DPSI to provide order without requiring a sort:

Have parallelism provide order. With this method, each child task processes one partition, and the parent task merges the results to provide one ordered set.
Use “DPSI merge” (also known as “DPSI return in order”). With this method, DB2 processes each partition serially, but a merge process returns the result in order across all partitions.

It is this second merge process that has been enhanced in DB2 11. First, index on expression can now exploit the DPSI merge. Second, there are general performance improvements to DPSI merge, such as improved index lookaside (and thus getpage avoidance).

The next enhancement to DPSIs involves using parallelism for improved join performance when the partitioned table space is the inner table of a join. To benefit from this enhancement, the partitioning scheme needs to be based on columns that are not included as join predicates in a query. Figure 6 shows an example of this enhancement, which is referred to as a part-level (i.e., partition-level) join. In this example, the table is partitioned by YEAR (note that each partition is numbered from 2009 to 2013), although the query includes only join predicates on C1.

Figure 6: Part-level join

With the part-level join, each parallel child task processes only one partition, and the composite (outer) table is shared or replicated to each child task. This allows each child task to act as if it is simply a two-table join (involving one partition) rather than a join to multiple partitions.

One of the traditional complaints with DPSI join performance has been that the join would result in a large amount of random I/O since each partition was probed on the inner table of the join. However, these two DB2 11 enhancements, page range screening from join predicates and part-level join, should improve join performance for DPSIs and potentially allow more workloads to convert NPIs to exploit DPSIs.

RUNSTATS Enhancements

RUNSTATS is crucial to the DB2 for z/OS optimizer to ensure that accurate information is used for access path selection. And while running RUNSTATS is not a feature of DB2 that we can claim is devoid of DBA or user intervention, many customers do have automated or regular schedules for RUNSTATS collection. Thus, any enhancements that simplify integration with their schedule and/or improve the ability to recognize important statistics to collect are important enhancements.

DB2 10 delivered statistics profiles, which let customers combine the complexity of individualized statistics into a stored profile, which subsequent RUNSTATS executions could use to ensure consistent statistics collection. DB2 11 supports integrating these profiles into a LISTDEF, and when the USE PROFILE keyword is added to the LISTDEF, tables with a profile will collect their specialized statistics, while those without will continue to collect the basic statistics.

Next is simplified syntax in DB2 11 to clear out the statistics for a table and its associated index. The RUNSTATS keyword RESET ACCESSPATH will reset all statistics for the named objects back to -1s and clear out any specialized frequencies, cardinalities, or histograms from the catalog. Once you have a clean slate, you can then re-collect the desired statistics.

Finally, the major RUNSTATS enhancement is not actually an enhancement to RUNSTATS but guidance provided by the DB2 for z/OS optimizer about what statistics were missing as part of a BIND/REBIND, dynamic prepare, or explain. While determining the access path, the optimizer will externalize the fact that statistics that could be used by the optimizer were missing or conflicting. This information will be externalized to the catalog (from New Function Mode) and/or a new explain table (if that table exists). The DBA can use this information to determine what RUNSTATS to collect to potentially improve the optimizer’s chance at choosing an optimal performing access path. Alternatively, IBM Optim™ Query Workload Tuner can interpret the information and provide the RUNSTATS input. Having the statistics recommendations externalized as part of general query processing is a significant step forward compared with the task of individually analyzing a query or having tooling collect a workload for analysis.

Additional Performance Improvements

The focus to this point has been on the targeted performance improvements in DB2 11, since readers need to understand some detail to determine the applicability to their workload. DB2 11 also provides performance improvements that apply generally to an entire workload, and these require less explanation.

In recent DB2 releases, sort performance has been a high priority, given that most workloads involve sorting and this is an area of resource contention because all tasks converge on the same sort buffer pool and datasets. DB2 11 extends in-memory sort capabilities for smaller sorts and temporary storage of intermediate results for some correlated table expressions and subqueries. There are also general code path length optimizations for sort and reduced workfile usage for the final sort. In simple terms, these changes mean reduced contention for workfile resources and improved performance for workloads that involve sorting.

Decompression performance is also improved, which obviously benefits workloads that issue queries against compressed table spaces. As with the sorting enhancement, nothing is needed to benefit from the enhancement.

Similarly, DB2 11 includes numerous internal DB2 optimizations, such as those to the DECFLOAT data type, which is used extensively in XML and also when workloads involve local or join predicates with mismatched data types, such as character to numeric. In such cases, DB2 uses DECFLOAT as an intermediate data type to cast for the conversion. DECFLOAT users should see considerable performance improvement in DB2 11.

Summary

DB2 11 for z/OS includes other query optimization enhancements that are not specifically outlined in this paper because they either require greater DBA involvement to exploit or are less focused on improving performance for business intelligence and analytics workloads.

What you should take away from reading this article, however, is that DB2 11 for z/OS is full of query performance enhancements that require minimal intervention to exploit—more so than in any recent DB2 release. We certainly listened to the feedback from the (almost) “out-of-the-box” message from DB2 10 for z/OS and expect similar feedback from the query performance focus of DB2 11 for z/OS.