Few IT professionals can have missed the big data phenomenon that has manifested itself in recent years. Industry publications and IT analysts have devoted a huge percentage of their output to the subject (creating a big data challenge all their own in the process). There can be little doubt that the advent of new technologies and methods of customer and business interaction have created unique challenges for organizations wishing to create actionable insight from very large amounts of unstructured data. Innovative tools and techniques have been developed to cope with these “big data” challenges (and indeed some of them are discussed in this paper, in the “Hadoop and Big Data Support” section).
However, beyond this somewhat narrow definition of big data, many organizations have been dealing with the challenges of processing, maintaining, and analyzing ever-increasing amounts of more traditionally structured data for many years. The inherent scalability and resilience of IBM® DB2® for z/OS® and the underlying System z® platform have proven to be a compelling combination for such applications, and IBM continues to invest in extending DB2’s capabilities with each new release.
From transparent archiving to greater in-memory scalability through the use of 2 GB page frames, DB2 11 for z/OS, the latest release of IBM’s flagship database, contains many new features specifically designed to help customers to address the challenges of managing traditional big data. A wealth of material exists on the technical changes within DB2 11, but finding descriptions of how those new features will improve your business results can be a challenge. The main body of this paper provides a high-level overview of the major new features from an IT executive’s perspective, with emphasis on the underlying business value that DB2 11 can deliver.
This is the fourth paper in this series; previous editions highlighted the business value offered by DB2 for z/OS V8.1, DB2 9 for z/OS, and DB2 10 for z/OS.
NOTE: Throughout the remainder of this document, all references to “DB2 9,” “DB2 10,” and “DB2 11” refer to the relevant release of IBM DB2 for z/OS.
In this section, we take a detailed look at the major features of DB2 11 for z/OS and see how many of IBM’s most innovative enterprise customers plan to use them to deliver an enhanced IT service to the business. Many of these enhancements can deliver benefits “out of the box,” with little or no effort required to begin exploiting them, reducing the time-to-value for a DB2 11 upgrade. See “DB2 11 New Features by Implementation Effort” (opposite) for a breakdown of the effort required to exploit each new feature.
This section is organized around the key DB2 11 themes.
Even in the most favorable economic climate, businesses need to control costs and increase efficiency to improve their bottom line. In today’s increasingly challenging business environment, this continues to be a key factor for the survival and success of enterprises of all sizes.
This section examines the major DB2 11 enhancements that are aimed at delivering the highest efficiency for core IT systems that rely on DB2, a key design objective for the new release. These features can help reduce ongoing operational costs, improve developer and DBA productivity, and enhance customer experience by increasing performance and delivering a more responsive application.
Most DB2 for z/OS customers operate on a CPU usage-based charging model, so any increases or decreases in the amount of CPU required to run DB2 applications can have a direct and very significant impact on overall operational costs.
Traditionally, IBM has tried to limit the additional CPU cost of adding new functionality into each release, keeping the net CPU impact below 5 percent. The move to a 64-bit computing platform in DB2 for z/OS Version 8 was an exception to this rule and introduced some significant processing overheads that resulted in many customers experiencing net CPU increases of 5 to 10 percent following the upgrade.
DB2 9 for z/OS helped to redress the balance somewhat by delivering modest CPU improvements for many large customers, but the advent of DB2 10 completely changed the picture. IBM delivered the most aggressive performance improvements of any DB2 release in the past 20 years, with many customers seeing net CPU savings of 5 to 10 percent or more in their traditional DB2 online transaction processing (OLTP) workload without any application changes being required.1 Unsurprisingly, these savings proved to be very popular and are consistently quoted as being one of the major reasons for customers to upgrade to DB2 10.
One of the most compelling features of DB2 11 is the number of enhancements that can deliver business benefit with little or no change being required to existing applications. The lists below categorize the covered DB2 11 features in this paper according to the amount of effort required to exploit them:
Minor Implementation Effort – Immediate. These features are available immediately after upgrading to DB2 11, with no database or application changes required. A REBIND may be required.
Minor Implementation Effort – Deferred. These features do not require any database or application changes but will be available only after the DB2 system has been placed in New Function Mode.
Significant Database/System Changes Required. These features require some changes to be made to DB2 objects and structures (typically by the DBA), but no application changes. These changes are typically quicker and less expensive to implement/test than application changes.
Significant Application Changes Required. These enhancements require some degree of application change in order to implement and will therefore be the most expensive to implement and test.
Minor Implementation Effort – Immediate
CPU reductions
Application compatibility
pureXML enhancements
Optimizer and query performance improvements
Data sharing performance enhancements
Enhanced dynamic schema change (some features)
BIND/REBIND enhancements
Minor Implementation Effort – Deferred
zEC12 exploitation (also requires DB2 to be running on a zEC12-class server)
Temporal data enhancements
Utility enhancements
Enhanced dynamic schema change (some features)
Significant Database/System Changes Required
Java stored procedure enhancements
Extended log record addressing
Security enhancements
Significant Application Changes Required
Transparent archiving
Global variables
Variable arrays
SQL aggregation improvements
Hadoop and big data support
IBM has further developed the CPU reduction theme within DB2 11, with initial savings of up to 5 percent expected for customers running simple OLTP workloads. Significantly higher savings are possible for complex OLTP and query workloads, as discussed below. Because these improvements are due to internal DB2 code optimization, they are available in DB2 11 Conversion Mode, without the need for any application changes. Additional CPU savings are possible once customers begin to use some of the other DB2 11 enhancements that require application change, as described elsewhere in this section.
Some workloads will benefit more than others from the performance enhancements offered by DB2 11. Figure 1 breaks down the anticipated CPU savings by workload type.
Figure 1: DB2 11 CM vs. DB2 10 NFM – Expected CPU savings by workload type
The most significant benefits are expected to be seen within query workloads. Complex reporting queries can see up to 25 percent savings for uncompressed tables and up to 40 percent for queries on compressed tables. Reporting queries with heavy sort processing may also see additional DB2 CPU savings.
Traditional OLTP workloads are also likely to benefit from the efficiency enhancements in DB2 11. Savings of up to 5 percent are expected for simple OLTP,2 with reductions of up to 10 percent for more complex transactions. Finally, update-intensive batch workloads may enjoy CPU reductions of 5 to 15 percent.
Figures 2 and 3 depict some actual observed CPU reductions for sample workloads, run as part of IBM’s internal performance testing for the new release. These figures are broadly in line with the high-level expectations detailed above.
The overall out-of-the-box CPU savings within DB2 11 are expected to be one of the major factors supporting the business case for upgrading to the new release.
Figure 2: DB2 11 CM vs. DB2 10 NFM – Sample OLTP/batch CPU savings
Figure 3: DB2 11 CM vs. DB2 10 NFM – Sample query CPU savings
In August 2012, IBM announced the latest-generation IBM zEnterprise® EC12 (zEC12) enterprise servers, with up to 101 configurable processors per server, each running at an industry-leading 5.5 GHz. In addition to an impressive list of general performance and capacity improvements over the previous-generation z196 enterprise servers, the zEC12 models include a number of features that DB2 11 will specifically exploit.
2 GB page frames. DB2 10 for z/OS introduced support for 1 MB “large page frames,” an enhancement designed to reduce processing overheads for very large DB2 buffer pools by letting z/OS manage the underlying storage in fewer 1 MB pieces rather than many more 4 KB pieces (Figure 4).
Figure 4: DB2 10 large page frame enhancement
Many customers with larger DB2 buffer pools were able to achieve CPU savings of up to 4 percent by exploiting this capability. However, as memory prices fall and workloads increase, DB2 buffer pools continue to increase in size, and the overheads of managing even the larger 1 MB page frames start to become significant.
In recognition of these trends, when running on a zEC12 server DB2 11 will support even larger 2 GB page frames, each of which will map onto more than half a million 4 KB pages (Figure 5).
Figure 5: DB2 11 large page frame enhancement
Those customers using very large DB2 buffer pools will see further CPU reductions by moving to 2 GB page frames. Other sites may not have sufficiently large pools for 1 MB page frames to be a significant limitation today, but that situation will undoubtedly change in the future as buffer pool sizes continue to grow. By moving early to support 2 GB page frames, IBM has recognized and eliminated an important future scalability issue.
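As an illustrative sketch of how this is enabled (the buffer pool name and size are hypothetical, and 2 GB frames also require a suitable LFAREA specification in z/OS and a page-fixed pool; confirm exact syntax against the DB2 11 documentation), the frame size preference is expressed on the ALTER BUFFERPOOL command:

```sql
-- Request 2 GB page frames for a large, page-fixed buffer pool.
-- Requires zEC12-class hardware and an adequate LFAREA definition.
-- Pool name and VPSIZE are illustrative only.
-ALTER BUFFERPOOL(BP1) VPSIZE(4000000) PGFIX(YES) FRAMESIZE(2G)
```

Pools too small to fill a 2 GB frame would continue to use smaller frame sizes, so the option is most relevant to the very largest pools.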
DB2 code using large page frames. As discussed in the previous section, DB2 10 and DB2 11 have exploited 1 MB and 2 GB large page frames to allow more efficient handling of large buffer pools. However, despite the extensive use of large memory objects in the past few releases of DB2, the storage used for DB2 code (as opposed to the data held in buffer pools) remained backed by standard 4 KB page frames.
DB2 11 is able to utilize large page frames for DB2 code objects and log output buffers, in addition to buffer pools. This enhancement reduces the z/OS overheads associated with DB2 code objects, lowering CPU consumption and operational costs. (Support for running DB2 code in large page frames requires z/OS 2.1.)
Many new releases of DB2 introduce enhancements or new features that require application and/or SQL code to be changed. These include additional SQL reserved words, changes to DB2 behavior or processing, and even changes to SQL return codes. Although IBM tries to minimize these “incompatible changes,” they cannot always be avoided. They may be required in order to ensure that DB2 adheres to evolving SQL standards, to support new functionality, or perhaps to address an earlier defect in the DB2 code.
A major part of planning for a new release is to analyze the impact of these incompatible changes and arrange for the necessary amendments to be made to DB2 application code so it will continue to work as designed under the new release. This situation poses some challenges for DB2 customers.
Figure 6 depicts these challenges.
Figure 6: Application compatibility issues
To address these issues and allow customers to upgrade their DB2 systems with less effort and risk, IBM has introduced some new capabilities in DB2 11 for z/OS that remove the hard dependency on all remedial work being conducted before a version upgrade and allow the impact of incompatible changes to be more easily assessed. Figure 7 summarizes these enhancements.
Figure 7: DB2 11 application compatibility feature
When upgrading to DB2 11, customers will be able to defer some or all of the remedial work for incompatible SQL DML and XML changes and allow the DBA or developer to request that DB2 behaves the same as it did for DB2 10 on an application-by-application basis. Although the remedial work will still need to be done at some point, DBAs and developers are now free to schedule it for a later date and in a more manageable, staged fashion that conforms to the requirements of the business (e.g., as part of a regular application release). In the meantime, other applications can benefit from the enhancements in the new release.
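The per-application switch described above is expressed through a bind option and a matching special register. A hedged sketch follows (collection and package names are hypothetical; confirm option values against the DB2 11 documentation):

```sql
-- Keep an existing package running with DB2 10 SQL behavior until
-- its remedial work can be scheduled (names are illustrative).
REBIND PACKAGE(MYCOLL.LEGACY_PKG) APPLCOMPAT(V10R1);

-- A remediated application can opt in to full DB2 11 behavior.
REBIND PACKAGE(MYCOLL.NEW_PKG) APPLCOMPAT(V11R1);

-- For dynamic SQL, an equivalent session-level control is available.
SET CURRENT APPLICATION COMPATIBILITY = 'V10R1';
```

Because the setting is per package, applications can be moved to the new behavior one at a time as their remedial work completes.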
Furthermore, IBM has provided additional trace data in DB2 11 that can identify applications using incompatible SQL and XML statements after the version upgrade has been implemented. This will enable DBAs and developers to identify applications requiring remedial work much more efficiently, and with less risk of some being accidentally missed.
Because the intention of this feature is to allow more manageable implementation of remedial work, not to defer it indefinitely, backward compatibility is limited to a set number of previous releases. In DB2 11, this feature provides backward compatibility only with DB2 10. Beyond DB2 11, compatibility with up to two previous releases will be provided. This means that the release following DB2 11 will support both DB2 10 and DB2 11 compatibility, thereby allowing plenty of time for any remedial work to be undertaken.
By breaking the hard dependency on performing all remedial work prior to an upgrade and providing valuable tools to assist with the identification of that work, the DB2 Application Compatibility feature addresses many of the issues associated with handling incompatible changes in each new DB2 release. This capability should be a huge benefit to customers struggling to line up the necessary application development/DBA resources to address incompatible changes prior to DB2 11 implementation.
A common requirement for any IT application is to be able to archive old or less frequently accessed data. Regulatory restrictions may require data to be retained for many years, but access frequency tends to drop off dramatically as the data ages. Moving older data to a separate archive can reduce the cost of retrieving and maintaining more frequently used data and allow slower but much less expensive storage devices to be used.
Unfortunately, archiving is usually one of the last areas to be considered and developed for a new application, and it is therefore common to see it deferred until later code releases (or bypassed completely) if time and/or funding is scarce. Even when it is properly implemented, many hundreds of person-hours can be spent in implementing and testing the necessary logic to allow older data to be placed automatically in the archive store while ensuring that the application retains access to it when required.
DB2 11 introduces some new features to simplify application development for archiving data, as well as improve consistency between applications and reduce the amount of time required for testing. When defining the operational DB2 table, the DBA also defines an identical archive table and connects the two via an ALTER TABLE … ENABLE ARCHIVE statement. Any subsequent changes to the operational table (e.g., adding a column) will automatically be made to the archive table so they remain in step. If required, the archive table can be placed on older, cheaper disk devices, with the more frequently accessed operational data residing on faster storage.
Once an archiving relationship has been defined, DB2 can automatically and transparently handle archiving and retrieval of data from the operational and archive tables. The new DB2 11 global variable support (discussed further in the “Global Variables” section) is used to provide simple application-level switches to enable or disable archiving functionality at run time, as shown in Figure 8. For static SQL, DB2 automatically prepares two access path strategies: one for use when archiving is enabled and another for use when it is not.
This approach is very flexible, providing automatic archiving and transparent access to archived data while also retaining the ability to disable that functionality via a simple SQL statement (or BIND option) when performance is critical and/or archiving functionality is not required.
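The flow above can be sketched as follows (table names are hypothetical, and the built-in global variable names should be confirmed against the DB2 11 documentation):

```sql
-- Pair an operational table with an identically structured
-- archive table (names are illustrative).
CREATE TABLE ORDERS_ARCH LIKE ORDERS;
ALTER TABLE ORDERS ENABLE ARCHIVE USE ORDERS_ARCH;

-- Route deleted rows into the archive table instead of
-- discarding them.
SET SYSIBMADM.MOVE_TO_ARCHIVE = 'Y';
DELETE FROM ORDERS WHERE ORDER_DATE < CURRENT DATE - 7 YEARS;

-- Make subsequent queries transparently include archived rows.
SET SYSIBMADM.GET_ARCHIVE = 'Y';
SELECT * FROM ORDERS WHERE CUST_ID = ?;
```

Setting the variables back to their defaults restores normal, operational-data-only behavior, which is how performance-critical processes avoid the archive access path.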
Note that Version 3 of the IBM DB2 Analytics Accelerator product also offers some interesting options for handling archive data. For further details, see the “IBM DB2 Analytics Accelerator Enhancements” section.
Overall, the new transparent archive feature promises to significantly reduce the cost of designing, developing, and testing data archive processes for DB2 applications. While it will be of limited value for those applications that have already implemented such functionality manually, it could reduce developer/DBA effort by hundreds of person-hours for newly developed applications (or existing applications that did not originally implement an archiving strategy).
Figure 8: Transparent archive behaviors
Solutions developed using this feature will also benefit from the ability to dynamically enable and disable access to archive tables at run time, ensuring that no performance overhead exists for processes that require access to only the current operational data.
Temporal data support, introduced in DB2 10, provided a unique set of facilities to allow data to be queried as of a specific point in the past, present, or future. With so many IT systems needing to accommodate a historical perspective and maintain audit logs of changes made to sensitive data, DB2’s temporal support can save many hundreds of hours of design, coding, and testing that would otherwise be required to build this function manually for each application. However, the initial implementation had some restrictions, and, based on user feedback, IBM has enhanced DB2’s temporal capabilities within the new release in a number of important areas.
Temporal special registers. Although the initial implementation of temporal query in DB2 10 allowed existing tables to be easily converted via ALTER TABLE, it was still necessary to alter the SQL in applications to include the necessary temporal clauses.
DB2 11 introduces two new special registers (CURRENT TEMPORAL SYSTEM_TIME and CURRENT TEMPORAL BUSINESS_TIME), which implicitly provide temporal context to SQL queries without having to change them. When the special registers are left to the default NULL value, SQL executes without temporal context as usual. However, when they are set to a past or future timestamp value, that value is used to implicitly supply a temporal context to all SQL subsequently executed within the session. Provided that the underlying tables are temporal, this enhancement makes it possible to quickly and easily make DB2 applications temporally aware without the need to make any code changes.
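A brief sketch of the idea (the table name is hypothetical):

```sql
-- Give all subsequent SQL in this session an implicit temporal
-- context of one year ago; the application's SQL is unchanged.
SET CURRENT TEMPORAL SYSTEM_TIME = CURRENT TIMESTAMP - 1 YEAR;

-- No FOR SYSTEM_TIME clause is needed; DB2 applies the register.
SELECT BALANCE FROM ACCOUNTS WHERE ACCT_ID = ?;

-- Reset to NULL to return to normal, current-data behavior.
SET CURRENT TEMPORAL SYSTEM_TIME = NULL;
```

The same pattern applies to CURRENT TEMPORAL BUSINESS_TIME for business-period tables.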
System temporal performance optimization. Access to DB2 10 system temporal tables required DB2 to UNION together the base and history tables for all queries. Because most queries against such tables target current data, the additional access to the history table often imposed an unnecessary overhead.
The new temporal special registers (described above) provide an opportunity to avoid this situation. Provided that a new option (SYSTIMESENSITIVE (YES)) is specified when BINDing the application, DB2 will prepare two separate access path strategies for queries against system temporal tables. As shown in Figure 9, DB2 will then use the relevant access strategy depending on whether the CURRENT TEMPORAL SYSTEM_TIME register is set. This approach provides the best of both worlds, with applications able to access historical data when required but avoid unnecessary performance overheads when it is not.
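In sketch form (the collection and package names are hypothetical):

```sql
-- Prepare both a current-data-only access path and a base+history
-- path; at run time DB2 selects one based on whether the
-- CURRENT TEMPORAL SYSTEM_TIME register is set.
REBIND PACKAGE(MYCOLL.ACCT_PKG) SYSTIMESENSITIVE(YES);
```

A corresponding BUSTIMESENSITIVE option covers business-time tables.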
Figure 9: System temporal performance enhancement
Temporal view support. DB2 10 allowed temporal queries to be executed against DB2 tables but did not allow those queries to be executed against any views that referred to those tables. Because many customers use views heavily throughout their DB2 applications, this posed a significant restriction. DB2 11 removes this limitation, allowing non-temporal and temporal tables to be mixed freely within a view definition. Temporal queries can then be executed against the view, with the relevant AS OF predicates being applied to any temporal tables. Similarly, temporal UPDATE/DELETE logic is applied for any temporal tables updated via a view.
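As a hedged illustration of the new capability (view, table, and column names are hypothetical):

```sql
-- A view over a system-period temporal table.
CREATE VIEW V_POLICY AS
  SELECT POLICY_NO, COVERAGE FROM POLICY;

-- DB2 11 allows the temporal clause directly against the view;
-- DB2 applies it to the underlying temporal table.
SELECT COVERAGE
  FROM V_POLICY FOR SYSTEM_TIME AS OF TIMESTAMP '2013-06-30 00:00:00'
 WHERE POLICY_NO = ?;
```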
Together, DB2 11’s temporal data enhancements significantly expand the practical use cases for DB2 temporal tables and will allow many more customers to take advantage of the substantial productivity and consistency benefits they have to offer.
One of IBM’s stated objectives is to make it easier and more cost-effective to port applications from other databases to DB2. One of the major remaining barriers to this activity has been the fact that, unlike most other relational database management systems (RDBMSs), DB2 for z/OS did not offer support for global variables. These constructs make it possible for SQL statements within the same application context to share data without the need to use application logic or insert the data into temporary tables. Significant additional effort was required to rewrite applications designed for other RDBMSs that used global variables so they would work against DB2.
DB2 11 introduces support for global variables (Figure 10). You define global variables to DB2 via the new CREATE VARIABLE SQL statement, with metadata details being recorded in the DB2 catalog tables. As shown in the example, each application referring to a given global variable then has its own instance of it, which can be set and read independently of any other connection.
Figure 10: DB2 global variables
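The lifecycle described above can be sketched as follows (schema, variable, and table names are hypothetical):

```sql
-- Define a global variable; its metadata is recorded in the
-- DB2 catalog (names are illustrative).
CREATE VARIABLE APP.BATCH_REGION CHAR(4) DEFAULT 'EMEA';

-- Each connection gets its own instance, set and read
-- independently of any other connection.
SET APP.BATCH_REGION = 'APAC';

-- Any subsequent SQL in the session can reference the variable.
SELECT * FROM ORDERS WHERE REGION = APP.BATCH_REGION;
```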
The availability of this feature makes DB2 much more compatible with other RDBMS implementations, letting applications be more easily and cost-effectively ported to DB2. It also opens up some valuable new possibilities for developers and DBAs to more extensively parameterize SQL scripts, with associated productivity benefits.
Many applications need to handle repeating groups of data with a variable number of elements. For example, a stored procedure to perform credit scoring may need to accept multiple customer account numbers as input to allow it to check several accounts at once. Variable arrays are a common means of addressing this kind of requirement, allowing an arbitrary number of elements sharing the same data type to be easily and elegantly addressed within an application.
Unfortunately, despite the growing popularity of SQL stored procedures for handling critical business logic, previous releases of DB2 did not provide support for array handling within the SQL PL language in which they are written. Therefore, such requirements had to be satisfied by various workarounds, such as defining a long list of differently named stored procedure parameters, using temporary tables to store data, or concatenating all of the data into a long string and then splitting it up again. All of these solutions have serious drawbacks in terms of efficiency, developer productivity, application maintainability, and code portability.
DB2 11 introduces formal support for variable arrays within SQL stored procedures. An array data type is formally declared to DB2 (via the new CREATE TYPE … ARRAY SQL statement) and can then be used instead of a standard data type when defining an SQL PL variable or the input or output parameters of an SQL stored procedure.
Figure 11 shows the previously cited example of a credit scoring stored procedure that needs to accept multiple customer account numbers as input to allow it to check several accounts at once (up to 10 in this case). Before the availability of arrays, the SQL PL developer would have been forced to define separate I/O parameters for each element in the repeating groups as shown on the left. The same result is achieved far more elegantly and straightforwardly using arrays in the DB2 11 example on the right of the diagram.
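The DB2 11 side of that example might be declared as follows; this is a simplified sketch (type, procedure, and table names are hypothetical, and the scoring logic is purely illustrative):

```sql
-- An array type holding up to 10 account numbers.
CREATE TYPE ACCT_LIST AS CHAR(10) ARRAY[10];

-- A single array parameter replaces 10 separately named ones.
CREATE PROCEDURE CREDIT_SCORE (IN  ACCOUNTS ACCT_LIST,
                               OUT SCORE    INTEGER)
LANGUAGE SQL
BEGIN
  -- UNNEST exposes the array as a table for normal SQL processing.
  SELECT MIN(C.RATING) INTO SCORE
    FROM UNNEST(ACCOUNTS) AS A(ACCT_NO),
         CUSTOMER_RATING C
   WHERE C.ACCT_NO = A.ACCT_NO;
END
```

The caller simply passes however many account numbers it has, up to the declared maximum, rather than padding unused parameters.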
For customers making extensive use of SQL stored procedures, the availability of variable arrays removes a major gap in the capabilities of the SQL PL programming language. As a result, SQL stored procedures can be written with less developer effort, will be easier to maintain, and will run more efficiently than their DB2 10 equivalents.
Java® has long been considered a mainstream enterprise application development language, and many customers use it to write their DB2 stored procedures. As shown on the left in Figure 12, prior releases of DB2 used a 31-bit address space to execute these stored procedures, with each stored procedure running in its own Java Virtual Machine (JVM). This approach imposed a practical limit of two to five concurrent stored procedures within a single address space due to storage constraints and created a significant overhead due to the need to keep starting multiple JVMs.
DB2 11 addresses these limitations by supporting 64-bit multithreaded Workload Manager (WLM) stored procedure address spaces.3 As shown on the right side of the figure, this allows a single JVM to support multiple stored procedures, with up to 25 per WLM address space observed in early IBM testing.
Figure 11: DB2 variable arrays
Figure 12: DB2 Java stored procedure enhancements
This enhancement greatly improves the scalability of Java stored procedures, reducing operating costs by lowering the CPU and storage overheads associated with many JVM instances.
DB2 9 introduced pureXML®—a major new feature that let XML documents be stored natively within DB2 and retrieved easily using the power of SQL and XPath. Some improvements were provided in DB2 10, and DB2 11 introduces an even more comprehensive package of performance and functionality enhancements.4
XQuery support. XPath and XQuery are two of the most common languages used for accessing XML documents. XPath provides a simple, extremely efficient way to navigate the internal structure of an XML document and extract specific information from it. XQuery can be considered a superset of XPath; as shown in Figure 13, it includes all the XPath syntax wrapped within a much more comprehensive, general-purpose query language that is able to undertake more complex operations, such as XML result set transformation and joins.
Figure 13: XPath and XQuery
The pureXML feature within DB2 for Linux, UNIX and Windows has supported both XPath and XQuery for some time, but until now DB2 for z/OS allowed developers to use only XPath. DB2 11 redresses the balance, allowing the full power of XQuery to be unleashed when querying XML data within DB2 for z/OS. This will improve developer productivity and DB2 family compatibility.
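As a hedged sketch of what this enables (table, column, and element names are hypothetical), a FLWOR expression can now be embedded in SQL via XMLQUERY:

```sql
-- An XQuery FLWOR expression: iterate over items in each order
-- document, filter on price, and construct a new XML fragment.
SELECT XMLQUERY(
         'for $i in $d/order/item
          where $i/price > 100
          return <bigItem>{$i/name/text()}</bigItem>'
         PASSING O.XML_DOC AS "d")
  FROM ORDERS O;
```

Under XPath alone, the filtering and restructuring shown here would have required additional SQL or application logic.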
Performance enhancements to pureXML. The IBM development team has addressed a significant number of pureXML performance challenges within DB2 11. XML validation has been improved in several ways. The process can now be performed directly in binary format without having to convert to string XML first, LOAD is able to avoid revalidating XML if it has previously been validated, and DB2 is now able to revalidate only the changed parts of an XML document rather than the whole thing. Other performance enhancements include elimination of hotspots during XML INSERT and improvements to the XMLTABLE table function.
Figure 14 shows some of the CPU savings seen during internal IBM testing conducted during the Early Support Program (ESP).
Figure 14: DB2 11 pureXML – Sample CPU savings using internal IBM workloads
The pureXML feature has rapidly matured to the extent that many DB2 customers have adopted it as their strategic XML repository. The availability of the powerful XQuery language and some very welcome performance enhancements further consolidate this position.
No new release would be complete without further enhancements to DB2’s industry-leading optimizer, the key component that allows DB2 to pick the most efficient access path for a given query. Unless otherwise specified below, all of these capabilities are available immediately on entry to Conversion Mode for dynamic SQL, and at REBIND time for static SQL, with no changes being required to the application or SQL code.
The major items delivered as part of DB2 11 for z/OS include the following.
Hash join/sparse index enhancements. DB2 10 introduced hash join5 support for access paths using an In Memory Data Cache (IMDC).6 DB2 11 further optimizes memory usage by IMDCs to significantly improve query performance. At run time, DB2 will choose the most appropriate access strategy for each query based on the amount of memory currently available, as shown in Figure 15. Using this approach, DB2 can dynamically alter its behavior based on the specific workload executing at any given time, making best use of the memory available to provide optimum performance.
Figure 15: DB2 11 in-memory data cache optimization
IBM has also provided new trace data within DB2 11 to allow DB2 administrators to easily determine where performance could be improved by making additional memory available for IMDC operations.
These enhancements are expected to deliver measurable performance benefits “out of the box,” but significant additional benefits are expected for those customers running suitable query workloads that are able to increase the memory available for IMDC operations7 beyond DB2’s default value of 20 MB.
Predicate indexability improvements. Using an index is generally the most efficient way to retrieve a small number of rows from a large table, but some predicates cannot be used to match indexes; these are known as non-indexable. It is often possible for a developer to rewrite a query so that the non-indexable predicates are replaced with indexable equivalents. This will permit the query to execute much more efficiently, but it requires the developer to have a deep understanding of DB2 query performance.
In some cases, DB2 is able to automatically rewrite queries to make predicates indexable, and DB2 11 adds a number of additional capabilities in this area, targeting several specific predicate forms.
DB2 11 can also rewrite queries using OR predicates that involve NULLs to use IN-lists instead. This makes them eligible for a more efficient single matching index access path.
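As a simplified conceptual illustration of an OR-to-IN-list rewrite (the DB2 11 enhancement specifically targets the NULL-related variants described above; table and column names are hypothetical):

```sql
-- Original form, as often emitted by query generators:
SELECT * FROM CUSTOMER
 WHERE LAST_NAME = 'SMITH' OR LAST_NAME = 'SMYTHE';

-- Conceptual internal rewrite, eligible for a single matching
-- index access path:
SELECT * FROM CUSTOMER
 WHERE LAST_NAME IN ('SMITH', 'SMYTHE');
```

No application change is required; the rewrite is performed internally by the optimizer.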
Figure 16 shows some examples of these new query rewrite capabilities.
Figure 16: DB2 11 query rewrite examples
DB2 11 contains further enhancements to address SQL constructs typically seen in query generators and enterprise resource planning (ERP) applications. These include pruning “always true” predicates (extending the DB2 10 capability to prune “always false” predicates), allowing CASE predicates to be indexable, removing several other restrictions relating to the conversion of correlated subqueries, and “pushing down” predicates into materialized views.
Duplicate removal. Many SQL operations require DB2 to remove duplicate entries in a result set, including SELECT DISTINCT, GROUP BY, and non-correlated subqueries. Eliminating duplicates can be done via sorting, but this approach can be expensive in performance terms, so several other techniques can also be used. DB2 11 introduces three significant enhancements in this area.
First, DB2 11 is able to more efficiently use an index to eliminate duplicates via a technique known as index skipping. Rather than read all the qualifying bottom-level index leaf pages and throw away duplicates at run time (as depicted in Figure 17), DB2 11 can use a combination of leaf and non-leaf pages to skip directly to the next distinct value, as shown in Figure 18.
Figure 17: Pre-DB2 11 duplicate removal using index scan
Figure 18: DB2 11 Duplicate removal using index skipping
DB2 11 is also able to more efficiently process join queries that contain the GROUP BY or DISTINCT keywords, a common construct for ERP applications. This uses a process known as “early out” to abandon scanning the inner table after the first value has been returned, thereby avoiding the extra work needed to retrieve duplicate values and discard them later.
Finally, enhancements have been made to processing for correlated subqueries. DB2 11 can use another new early out optimization when the subquery uses the MAX or MIN function and can make better use of a result cache first introduced in DB2 for z/OS Version 2 for saving the 100 most recent correlated subquery execution results.
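A correlated subquery of the kind that benefits from these enhancements might look like the following sketch (the schema is invented for illustration):

```sql
-- For each customer, return only that customer's largest order.
-- DB2 11 can apply an early out once the inner MAX is resolved, and
-- can reuse cached subquery results for repeating CUSTOMER_ID values.
SELECT O.CUSTOMER_ID, O.ORDER_ID, O.AMOUNT
FROM   ORDERS O
WHERE  O.AMOUNT = (SELECT MAX(I.AMOUNT)
                   FROM   ORDERS I
                   WHERE  I.CUSTOMER_ID = O.CUSTOMER_ID);
```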
DPSIs and page range screening. Data-partitioned secondary indexes (DPSIs) were introduced in DB2 Version 8 as a means of providing many of the benefits of a secondary index while retaining good partition independence characteristics. While DPSIs can be very beneficial in some circumstances, they can also introduce performance compromises for some queries if they do not include predicates on the partitioning key. DB2 11 introduces some enhancements that significantly increase the “sweet spot” for DPSIs, allowing them to be more widely deployed.
First, page range screening for join predicates has been implemented. If a table is partitioned on one or more columns that are used in a join and indexes exist on other local or join columns, DB2 can apply page range screening from the join predicates and access only the qualified partitions for each table access. This behavior could significantly improve the performance of joins involving DPSIs.
The second enhancement involves improving the amount of parallelism available for queries involving DPSIs. If a table has been partitioned by a non-join column and has a DPSI on the join columns, DB2 can now choose to process each partition within a parallel child task. This can result in what was previously random I/O becoming sequential.
These enhancements are expected to provide a performance benefit for existing DPSI implementations, as well as significantly increase the sweet spot for DPSIs and enable them to be more widely deployed.
Optimizer RUNSTATS feedback. DB2's cost-based optimizer is highly dependent on both the quality and the quantity of table and index statistics available when it attempts to determine the optimum access path for a given query. While most DBAs have processes in place to ensure statistics are collected (via the DB2 RUNSTATS utility), these typically take the form of standard jobs using a common set of (or default) RUNSTATS options.
If the DB2 10 optimizer encounters poor or incomplete statistics, the DBA’s first warning is typically when a poor access path is selected. He or she then has to determine the cause and (if poor statistics are to blame) re-run RUNSTATS with the necessary additional options. This process is time-intensive and error-prone, and it relies on deep DBA knowledge of the correct RUNSTATS options for a given situation. Figure 19 depicts this situation.
Figure 19: DB2 10 optimizer feedback issue
DB2 11 attempts to improve things by providing a means for the optimizer to signal when poor statistics are encountered (Figure 20). Two new tables8 will be populated with the optimizer’s recommendations on RUNSTATS options that would help it to select a better access path. Although this information unfortunately cannot be consumed directly by RUNSTATS, additional tooling can generate a suitable RUNSTATS job to address the issue. DB2 customers are free to write this themselves or use a vendor tool, such as IBM Optim™ Query Workload Tuner (which has been specifically enhanced to exploit this functionality).
Figure 20: DB2 11 optimizer RUNSTATS feedback
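As a sketch of what such a generated job might contain — the database, table space, table, and column names below are placeholders — a recommendation for additional distribution statistics could translate into a utility control statement along these lines:

```
RUNSTATS TABLESPACE DBSALES.TSORDERS
  TABLE(SALES.ORDERS)
    COLGROUP(ORDER_STATUS) FREQVAL COUNT 10
  INDEX(ALL)
```

The COLGROUP and FREQVAL options collect the frequency statistics the optimizer asked for, in addition to the standard table and index statistics.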
With the necessary tooling in place, this enhancement can reduce the amount of DBA time spent addressing statistics-related access path issues and can help drive down operational costs by letting DB2 select more efficient access paths.
Extended optimization. During access path selection, the optimizer needs to determine how many rows will be filtered by each step in the access plan in order to determine the overall cost of the query. Most of the time, it is able to make good estimates, either based on specific statistics or by using default "filter factors" based on the predicates in the SQL statement. However, there are some situations where the optimizer needs more help, especially for queries with pronounced data and/or execution skew9 that use host variables or parameter markers. If this happens, the DBA typically tries to provide the optimizer with additional statistics (see the section on Optimizer RUNSTATS feedback above) or, in more extreme circumstances, may have to resort to access path hints.10
DB2 11 for z/OS provides an additional option by letting the DBA specify a “selectivity profile” for a specific SQL statement by inserting rows into a new table (DSN_PREDICATE_SELECTIVITY). This profile contains information about both the predicate filter factors and the execution frequency for a problematic query. A process similar to that used for access path hints can then be followed to allow the optimizer to use this information and select a better access strategy. However, unlike a hint, this doesn’t “lock in” a specific access path and still leaves the optimizer free to select the best available access path based on the improved data and execution information supplied in the selectivity profile.
As with the RUNSTATS recommendations previously described, the DBA is free to either manually insert rows into the selectivity table or use external tooling to assist (IBM’s Optim Query Workload Tuner will provide this functionality).
DB2 11’s new extended optimization capabilities give DBAs a better alternative to access path hints where the optimizer is unable to determine the best access strategy. This has the potential to improve DBA productivity as well as drive down operational costs for problematic queries through more efficient access path selection.
Data sharing was introduced way back in DB2 V4 and over the next few releases rapidly established itself as an unbeatable solution for customers requiring the highest levels of resilience and scalability. However, relatively few performance enhancements have been delivered since. Fortunately, DB2 data sharing performance has been selected as a focus area for DB2 11, and several significant enhancements are included in the new release.
Group buffer pool and castout enhancements. In a data sharing environment, the group buffer pools keep track of which data is being updated by which member and will also usually11 be used to cache copies of changed data. If an application updates a page that another member might want to access, DB2 copies the changed page from the local to the group buffer pool so other members can see the new version (Figure 21).
There is no direct connection between the coupling facility and disk, so when DB2 needs to externalize the changed data, it is written to disk via a dedicated pool in the owning member—a process known as “castout.”
Figure 21: DB2 data sharing topology
In situations where there is heavy and sustained write activity to the group buffer pool (e.g., when running large batch or utility processes), a lot of changed data has to be written to the group buffer pool. In extreme circumstances, this can lead to storage shortages and can ultimately compromise data availability.
To help avoid this situation, DB2 11 introduces a new capability12 known as group buffer pool write-around (Figure 22). This feature lets DB2 bypass the coupling facility and write changed pages directly to disk when the group buffer pool is under stress, significantly speeding the process of offloading changed pages. Once the amount of changed data in the group buffer pool has returned to a reasonable level, normal processing resumes.
In addition, castout processing itself has been improved to reduce the I/O elapsed time and optimize the communications between the coupling facility and the castout owners.
Figure 22: DB2 group buffer pool write-around
Other data sharing performance enhancements. Several other data sharing performance issues have been addressed in DB2 11, including enhancements to coupling facility DELETE_NAME requests to suppress unnecessary communications, a new option on light restart to include castout processing so that availability is improved, and various improvements to global locking and index split processing.
Together, DB2 11’s data sharing performance enhancements promise to significantly improve data sharing performance while further increasing DB2’s lead as the most scalable and available RDBMS in the industry.
DB2 utilities perform housekeeping and recovery functions that are essential to keep applications available and performing efficiently. IBM’s objective is to continue to drive down the CPU and elapsed time for utility operations while also reducing their availability impact and making them simpler to use. DB2 11 for z/OS includes some valuable enhancements in these areas.
REORG. A large number of enhancements have been made to the REORG utility. Improvements to online REORGs13 have reduced the elapsed time of the SWITCH phase by up to 91 percent in IBM testing and provide more control over the timing, thereby improving availability. DB2 11 also introduces automated mapping tables,14 improving DBA productivity and reducing the scope for human error.
Further new options allow the DBA to save CPU by choosing to reload data in the original order rather than sorting it and to reduce recovery CPU and elapsed time using partition-level inline image copies. Internal IBM testing showed a 28 percent reduction in elapsed time and a 49 percent reduction in CPU time when recovering a single partition of a 20-partition table space (Figure 23).
Figure 23: DB2 11 REORG – Partition-level inline copy
Other REORG enhancements include improvements to large object (LOB) processing, better support for rebalancing partitions following a partitioning key change (see the “Enhanced Dynamic Schema Change” section elsewhere in this paper), and updated defaults to reflect best practices.
Statistics. Collecting statistics is vital for good performance, but it can be expensive in CPU terms. The RUNSTATS utility has been enhanced to allow up to 80 percent of the CPU consumed when collecting distribution statistics to be offloaded to zIIP.
Improvements have also been made to inline statistics,15 increasing the zIIP offload potential and allowing histogram and distribution statistics to be collected inline.
The “Optimizer and Query Performance Improvements” section describes an important new feature that lets the optimizer recommend specific RUNSTATS options to improve query efficiency.
Backup and recovery. Improvements have been made to the elapsed time for DB2 catalog/directory recovery, reducing application outage in the event of a serious DB2 problem. Other enhancements include the removal of restrictions for point-in-time recovery following dynamic schema change (described in the “Enhanced Dynamic Schema Change” section) and new options to allow more efficient system cloning when using RESTORE SYSTEM.
Other utility enhancements. Other utility-related enhancements include cross-loader support for XML data, increased zIIP offload for LOAD REPLACE processing, and significant elapsed time reductions due to parallel data conversion during load.
Collectively, these enhancements can significantly reduce operational costs by driving down CPU consumption while also improving application availability and DBA productivity.
A number of other important performance and productivity enhancements are delivered in DB2 11, including the following.
Global Temporary Table enhancements. Declared Global Temporary Tables (DGTTs) are commonly used as an efficient way to hold intermediate results within complex processes such as stored procedures. DB2 11 allows logging to be disabled for DGTT processes, reducing CPU usage and log volumes (at the expense of some additional ROLLBACK considerations within the application). The CPU overheads for repeated COMMIT processing against DGTTs have also been reduced significantly by keeping prepared versions of the statement across COMMIT boundaries.
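A minimal sketch of the new option is shown below (the table and column names are invented); note that because the DGTT is not logged, the application must consider what happens to its contents on ROLLBACK:

```sql
-- Session-scoped work table for intermediate results;
-- NOT LOGGED avoids log writes for changes to this table
DECLARE GLOBAL TEMPORARY TABLE SESSION.WORK_TOTALS
  (ACCT_ID   INTEGER,
   TOTAL_AMT DECIMAL(15,2))
  ON COMMIT DELETE ROWS
  NOT LOGGED;
```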
Pseudo-deleted index entry cleanup. Under most circumstances, deleting rows from a table does not cause the associated index entries to be physically deleted, but merely marked as deleted. These are known as pseudo-deleted entries, and large numbers of them can cause performance issues until they are actually deleted via a subsequent REORG INDEX. DB2 11 is able to use up to 128 asynchronous system tasks to clean up these pseudo-deleted index entries, using information it already maintains about the number of entries awaiting deletion within each index. IBM has provided comprehensive control options that allow this activity to be limited by number of tasks, index, time of day, and so on. All CPU for the asynchronous clean-up tasks is zIIP-eligible.
Figure 24 compares CPU degradation over time for a DB2 10 index and the equivalent situation in DB2 11 with the cleanup process enabled.
Figure 24: CPU benefit of pseudo-deleted index entry cleanup
Sort improvement. Several sort-related enhancements are included in DB2 11. These include increasing the amount of memory available for sorting, more aggressive use of in-memory sorting, and a reduction in latch contention. Because such a large proportion of SQL operations use sort in one form or another, these enhancements will benefit most DB2 workloads.
SELECT from directory pagesets. DB2 has always permitted users to issue SELECT statements against the metadata stored in the DB2 catalog, and this is a vital source of information for developers and DBAs. In contrast, the DB2 directory (which contains much of the same data but in an internal DB2 format optimized for performance) has remained firmly off limits. DB2 11 allows SQL SELECT access to five directory pagesets16 for the first time, letting authorized users view DB2’s internal data for diagnostic purposes.
Enhanced dynamic schema change. In addition to reducing planned outages and the possibility of human error, the improvements described under “Enhanced Dynamic Schema Change” can improve DBA productivity by significantly reducing the effort required to implement the relevant schema changes.
SQL support for analytic workloads. DB2 11 includes some SQL enhancements specifically aimed at supporting analytic workloads. These are described in the “SQL Aggregation Improvements” section.
The System z platform is rightly famed as one of the most robust and secure computing platforms on the planet. However, business and regulatory requirements in this area continue to become more demanding, so this is an important and ongoing focus area for the DB2 development team.
This section groups together some key new features that make DB2 more resilient to the possible negative impacts of planned change, as well as increasing its ability to cope with ever-increasing workload volumes.
The DB2 recovery log records the changes made to DB2 objects and is critical for application and system restart/recovery. When DB2 was first released in 1983, it implemented a 6-byte log address (known as a log relative byte address, or log RBA), giving it the equivalent of 256 TB of total logging capacity. This amount was considered ample, allowing for many decades of logging at the volumes typical of the time.
Unfortunately, the relentless increase in transaction volumes, combined with a significant increase in log record size due to support for large objects (LOBs) and other new data types, has resulted in an explosion in DB2 logging volume. This has been exacerbated by recent trends to consolidate workloads into fewer DB2 data sharing members due to DB2 10’s scalability improvements, resulting in yet greater transaction volumes within each DB2 system.
The net result is that many customers are approaching the end of the DB2 log range in some of their older or more active systems, as depicted in Figure 25. In some extreme cases with ultra-high logging volumes, this “log wrapping” is happening as often as every two months.
Figure 25: 6-byte DB2 relative byte address
In prior releases, resetting the log RBA was a painful process for non-data sharing customers, requiring extensive work and an extended outage of a day or more. Data sharing customers have a less disruptive option available, by “retiring” the DB2 system approaching the end of the log and adding a new one to replace it. However, even this option introduces additional cost and operational complexity, with changes required to operator automation, batch scheduling, and more.
A similar situation exists with the log record sequence number (LRSN) used by data sharing to provide a common log identifier across all members in a data sharing group. The LRSN is also 6 bytes in length, but it is based on a clock value with a maximum that will be reached in 2042. While that may appear to be of no immediate concern, under certain circumstances17 a “delta” or offset is added to the current clock value, and in extreme circumstances the offset can bring the actual LRSN value much closer to the 2042 high value. In Figure 26, Customer A has no LRSN delta and therefore has around 30 years of log capacity. However, Customer B has a very high LRSN delta of 25 years, leaving less than five years before running out of log capacity. Unfortunately, in releases prior to DB2 11, there is no way to circumvent this issue.
Figure 26: 6-byte log record sequence number
DB2 11 addresses the previously discussed issues by expanding both the log RBA and LRSN from 6 bytes to 10 bytes. For the log RBA, this enhancement provides 1 yottabyte (2^80 bytes) of addressable range. To put that in context, the extreme DB2 customers currently wrapping the log every two days would be able to sustain 23.5 million years at the same logging rate before wrapping the expanded log (Figure 27).
Figure 27: 10-byte log record sequence number in DB2 11
For data sharing customers, 1 byte of the expanded LRSN value is used to extend the maximum clock value, with the remaining 3 bytes used to increase its accuracy. This provides in excess of 30,000 years of additional log capacity while also making the stored value 16 million times more precise (and thereby reducing performance issues associated with “LRSN spin”).
Implementation of the expanded RBA/LRSN formats is optional but highly recommended as DB2 11 will use 10-byte values internally once in New Function Mode, and there will be a small performance penalty for converting these to the old format. Data sharing customers will be able to maintain full application availability by converting one member at a time, while non-data sharing customers will need to schedule a small outage (typically less than a minute).
The move to 10-byte log addresses represents a major investment in DB2’s future and provides customers at or approaching the existing log limits with an elegant solution while also delivering some valuable data sharing performance benefits.
As applications evolve, the associated changes to DB2 objects such as tables and indexes remain one of the most common reasons for disruptive planned outages. Dynamic schema change allows DBAs to alter the definition of DB2 objects while maintaining access to the underlying data, with the change being fully materialized the next time the data is reorganized as part of routine housekeeping.
IBM began focusing on dynamic schema change back in DB2 Version 8 and has steadily expanded its capabilities in every release since. DB2 11 introduces the following additional enhancements:
Change partitioning limit keys. Partitioning a table can provide important scalability and manageability benefits, but it usually requires the DBA to specify a “limit key” for each partition so that DB2 knows which partition a given row belongs in.18 Previously, changing the limit keys immediately made all of the affected data in the table unavailable until a REORG was executed to redistribute the data according to the new limit keys, making this a highly disruptive operation.
DB2 11 lets applications continue to access the table after partitioning limit keys have been changed. The DBA still needs to run a REORG for the change to actually be implemented, but this can be scheduled at a convenient time and with data availability being maintained throughout.
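For illustration (the table and boundary values are invented), such a change is expressed as a simple ALTER, with the subsequent REORG doing the physical redistribution:

```sql
-- Move the upper boundary of partition 3; data remains available,
-- and a later REORG materializes the redistribution
ALTER TABLE SALES_HISTORY
  ALTER PARTITION 3 ENDING AT ('2013-12-31');
```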
Drop column. DB2 has long supported the ability to add new columns to an existing table, but dropping columns that are no longer needed meant dropping and re-creating the entire table. Many DB2 applications therefore leave these unwanted columns in place, leading to wasted space, poorer performance, and maintainability challenges.
The new release extends the ALTER TABLE SQL statement to allow columns to be immediately dropped from an existing DB2 table with no application outage. As with the partitioning limit key change described above, this is a “deferred change,” so although the column will no longer be returned if an application accesses the table, a subsequent REORG will be required to physically remove the column from the table data.
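A sketch of the new syntax (object names invented):

```sql
-- Deferred drop: the column immediately disappears from query
-- results, and a subsequent REORG physically removes it
ALTER TABLE CUSTOMER
  DROP COLUMN FAX_NUMBER RESTRICT;
```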
Recovery support for deferred schema changes. The deferred schema change capability introduced in DB2 10 and enhanced in DB2 11 (as discussed above) lets DBAs make schema changes at any time but defer the materialization of those changes until a REORG can be scheduled at a convenient time.
However, the DB2 10 implementation included a significant restriction relating to point-in-time recoveries. As Figure 28 illustrates, once the REORG was run to materialize the pending change (at T2 in the example), it was not possible to perform a recovery to a prior point in time. DB2 11 removes this restriction, allowing recovery to any valid prior point.
Figure 28: Recovery scenarios with deferred schema change
Together, DB2 11’s dynamic schema change enhancements significantly improve data availability. They also reduce the possibility of human error and improve DBA productivity as a single SQL statement can replace complex scripts to drop and re-create database objects.
The plan management features19 introduced in DB2 9 and DB2 10 went a long way toward addressing customer concerns about potential access path regression during BIND/REBIND activity. In particular, a new BIND option let customers ask DB2 to attempt to retain the existing access path by specifying APREUSE(ERROR). DB2 would then try to reuse the existing access path for every statement in a package, but if that wasn't possible for even one SQL statement, the entire package would fail to rebind. This situation, illustrated in the top half of Figure 29, required the DBA to manually resolve the problem in order to successfully rebind the offending package.
Figure 29: Retaining access paths with APREUSE(WARN)
DB2 11 introduces APREUSE(WARN), a new BIND option that provides more flexibility for the DBA. As shown in the bottom half of the figure, any access paths that cannot be retained will go through normal optimization to select a new one, while the access path for the remaining SQL statements will be retained.
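The option is specified on the BIND or REBIND command; for example (collection and package names invented):

```
REBIND PACKAGE(COLLA.PAYROLL) APREUSE(WARN)
```

Statements whose access paths cannot be reused are re-optimized and flagged with a warning, rather than failing the rebind of the whole package.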
This enhancement greatly improves the usability of APREUSE, especially for large-scale rebind activity such as that performed when upgrading to a new version of DB2. A greater proportion of packages can be successfully bound while retaining most of the existing access paths, leaving the DBA to review any new access paths as time allows.
DB2 received a comprehensive overhaul of its security features within the DB2 10 for z/OS release. DB2 11 adds some important new functionality.
RACF exit enhancements. IBM offers two options for managing access to DB2 data: internal DB2 security and external security managed by a product such as Resource Access Control Facility (RACF®). Many customers choose external security because this option lets RACF administrators handle access to DB2 objects using many of the same concepts, tools, and procedures as for any other type of resource. However, the exit responsible for communications between RACF and DB2 had some significant limitations.
First, the RACF exit did not honor use of the OWNER keyword20 when plans/packages were bound or rebound and would instead use the authorization ID of the invoker of the BIND. Many sites rely on the use of OWNER within their DB2 security design, making use of external RACF security difficult or impossible. Similar restrictions also existed with the use of dynamic SQL if DYNAMICRULES(BIND) was specified.
Second, some situations could occur where access to a DB2 resource had been revoked in RACF, but DB2 would continue to temporarily allow access. In the example in Figure 30, USER1 is still able to access TAB1 after the RACF administrator has removed her access, because DB2 uses the internal DB2 authorization cache, which is not updated in line with the RACF database. Although techniques existed to force DB2 to refresh its internal authorization cache, they required manual intervention and could easily be overlooked.
Figure 30: Example of RACF/DB2 authority caching issue
Both of these issues are resolved by the RACF exit changes delivered within DB2 11. A new installation parameter instructs DB2 to properly use the package owner for static and dynamic SQL authorization checks, ensuring that external RACF security behaves in a similar way to internal DB2 authorization management. Another new parameter enables a refresh function for DB2’s internal authorization cache. As Figure 31 shows, this ensures that DB2’s internal cache stays in step with changes made to RACF, thereby avoiding the potential exposure.
Figure 31: DB2 11 RACF/DB2 authority cache refresh
Column masking. The masking feature introduced in DB2 10 provided a powerful means of fully or partially concealing sensitive information by specifying a set of rules under which either the full value or a masked equivalent would be returned. (Typically this would be done according to the type of user making the request.) DB2 11 removes some of the restrictions previously imposed on the use of the SQL GROUP BY and DISTINCT clauses when querying a masked table.
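As a reminder of the general shape of the feature, a column mask is defined with CREATE MASK; the sketch below (table, column, and group names are invented) returns the full value only to members of a particular RACF group:

```sql
-- Return the full card number only to users in the PAYROLL group;
-- everyone else sees a masked value with the last four digits
CREATE MASK CARDNUM_MASK ON CUSTOMER
  FOR COLUMN CARD_NUMBER RETURN
    CASE WHEN VERIFY_GROUP_FOR_USER(SESSION_USER, 'PAYROLL') = 1
         THEN CARD_NUMBER
         ELSE 'XXXX-XXXX-XXXX-' CONCAT SUBSTR(CARD_NUMBER, 13, 4)
    END
  ENABLE;
```

The mask takes effect once column access control is activated for the table.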
Although less extensive than the security extensions delivered in the previous release, the DB2 11 enhancements resolve some significant issues and will allow more customers to take advantage of important features such as external authorization checking and column masking.
Managing persistent threads. The high-performance database access thread (DBAT) feature introduced in DB2 10 delivered significant CPU savings for many customers by letting DBATs retain DB2 resources across a COMMIT point, thereby allowing a similar type of thread reuse to that enjoyed by CICS® users for many years. However, this feature also introduced some challenges in performing routine BIND and DDL activities, which could struggle to obtain the necessary locks.
DB2 11 includes some additional enhancements to allow it to selectively free accumulated resources for high-performance DBATs, making it easier for BIND and DDL activity to “break in” and complete successfully. This, in turn, allows high-performance DBATs to be more widely deployed, with associated benefits in performance and operational cost reduction.
Autonomics improvements. IBM has a stated objective of increasing DB2’s ability to manage itself via new autonomic capabilities. DB2 11 includes new facilities to automatically clean up pseudo-deleted index entries (described in the “Other Efficiency Enhancements” section), and some improved techniques for reserving free space for SQL UPDATE operations to complete on the optimal page. In addition to improving DBA productivity, these autonomic facilities increase the robustness and availability of DB2 applications.
Scalability enhancements. Several enhancements have been made to remove DB2 scalability limitations. These include doubling the maximum number of open datasets that each DB2 system can access and increasing the maximum size of several internal objects to improve virtual storage usage.
Traditionally, DB2 for z/OS was considered primarily an OLTP data server, with the DB2 for Linux, UNIX, and Windows variant (or other vendors' databases) being a more common choice for analytics and data warehousing duties. This approach is often dictated by cost concerns or historical inertia, but the superior resilience and scalability of the IBM System z platform, combined with the increasing popularity of real-time warehousing, are leading many customers to re-examine this decision.
In a recent IBM survey,21 72 percent of the respondents indicated that they would be using transactional data from enterprise applications as input to their big data analysis programs. Given that both volume and velocity are highly significant in managing big data/analytics workloads, it makes sense for organizations to consider performing the analysis on the platform on which the source data resides.
DB2 9 and DB2 10 delivered some significant new functionality to support business analytics workloads, and DB2 11 further expands its capabilities in this area. However, some equally important developments are occurring within the supporting System z tools and infrastructure, and we will examine those as well in this section.
Analytics workloads commonly require complex aggregation operations to be performed in order to summarize large amounts of data. DB2 has long supported simple aggregation via the SQL GROUP BY clause (as shown in the example in Figure 32). This capability satisfies many basic requirements, but more sophisticated BI/analysis tools typically had to retrieve all of the detail/fact data from DB2 so they could perform more complex aggregation themselves.
DB2 11 extends the basic GROUP BY aggregation with three additional capabilities. First, the GROUPING SETS clause effectively allows multiple GROUP BY queries to be executed within a single SQL statement, as shown in Figure 33.
The example shows three separate aggregations (for week number, day of week, and sales person) being executed within a single SQL statement.
Figure 32: GROUP BY example
Figure 33: GROUPING SETS example
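In SQL terms, the pattern of Figure 33 looks broadly like this (the SALES table and its columns are illustrative):

```sql
-- Three independent aggregations returned by one statement:
-- totals by week number, by day of week, and by sales person
SELECT WEEK_NO, DAY_OF_WEEK, SALES_PERSON,
       SUM(SALES_AMT) AS TOTAL_SALES
FROM   SALES
GROUP BY GROUPING SETS ((WEEK_NO), (DAY_OF_WEEK), (SALES_PERSON));
```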
The second new capability introduced by DB2 11 allows nested groups with subtotals to be produced, using the ROLLUP function as shown in Figure 34. The same three aggregation attributes as in the previous example have been used, but this time they are nested within each other, with a subtotal provided at each level.
Figure 34: GROUP BY ROLLUP example
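Using the same illustrative SALES table, the nested form might be written as:

```sql
-- Nested aggregation: sales person within day of week within week
-- number, with subtotals at each level and a grand total
SELECT WEEK_NO, DAY_OF_WEEK, SALES_PERSON,
       SUM(SALES_AMT) AS TOTAL_SALES
FROM   SALES
GROUP BY ROLLUP (WEEK_NO, DAY_OF_WEEK, SALES_PERSON);
```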
Finally, a GROUP BY CUBE function provides a similar roll-up functionality to the GROUP BY ROLLUP feature described above, but with the addition of cross-tabulation rows to create the equivalent of a data cube for slice-and-dice analysis.
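Again with the illustrative SALES table, the cube variant simply substitutes the CUBE keyword:

```sql
-- Aggregates every combination of the three attributes, adding
-- cross-tabulation rows to form the equivalent of a data cube
SELECT WEEK_NO, DAY_OF_WEEK, SALES_PERSON,
       SUM(SALES_AMT) AS TOTAL_SALES
FROM   SALES
GROUP BY CUBE (WEEK_NO, DAY_OF_WEEK, SALES_PERSON);
```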
These new functions significantly expand DB2’s ability to natively aggregate large volumes of data, allowing BI/reporting tools to analyze DB2 for z/OS data with lower response times, and at less cost, than before.
Although IBM continues to expand DB2's native capabilities to more efficiently handle high-volume analytic workloads, it is very challenging for any general-purpose RDBMS to deliver the kind of query performance and efficiency that dedicated operational analytics appliances such as Netezza® can achieve without sacrificing OLTP performance. In recognition of this, IBM offers the DB2 Analytics Accelerator: a unique blend of System z and highly optimized Netezza technology for efficiently handling complex, performance-critical, operational analytic workloads.
As shown in Figure 35, in an IBM DB2 Analytics Accelerator implementation, no changes are required to applications; they continue to connect to DB2 to issue queries. The DB2 optimizer is responsible for deciding whether a given query should be offloaded; if so, it is passed to the locally connected accelerator host for execution. Upon completion, the results are passed back to DB2, which returns them to the calling application as if they had been obtained locally. Depending on the query, it is not unusual for queries that require hours to execute natively on DB2 to complete in seconds once offloaded to the accelerator.
Figure 35: IBM DB2 Analytics Accelerator architecture
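The optimizer's routing decision described above can be pictured as a simple cost comparison. This is an illustrative sketch only: the real DB2 optimizer weighs many more factors, and every name and the threshold below are invented.

```python
def route_query(est_cost_seconds, accelerator_online, tables_loaded,
                threshold=60.0):
    """Illustrative stand-in for the optimizer's offload decision.

    A query is sent to the accelerator only if one is available, the
    tables it touches have been loaded there, and the estimated native
    cost is high enough to justify offloading. All names and the
    threshold value are hypothetical.
    """
    if accelerator_online and tables_loaded and est_cost_seconds > threshold:
        return "accelerator"
    return "db2"
```

Applications never see this decision: as Figure 35 shows, they connect to DB2 as usual and receive results the same way regardless of where the query ran.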
Version 3 of the IBM DB2 Analytics Accelerator technology, announced by IBM in October 2012, brings some significant enhancements that work in concert with DB2 10 and DB2 11 to expand the query workload capacity and capabilities.
Incremental update. As the IBM DB2 Analytics Accelerator stores its own local copy of the data to be queried, processes are required to keep the accelerator data in step with the operational copy held by DB2. Previous incarnations of IBM DB2 Analytics Accelerator only offered the ability to perform a full refresh of the data (at the table or partition level), which had obvious drawbacks for large tables with relatively low update frequency. IBM DB2 Analytics Accelerator V3 introduces incremental update capabilities, which allow updates to be trickle-fed to the accelerator data store and provide for near real-time reporting.
Multi-temperature data. The “Transparent Archiving” section of this document discusses one option for handling rarely referenced DB2 historical data, but IBM DB2 Analytics Accelerator V3 provides another by letting data be moved out of DB2 to reside only in the accelerator as it ages. Known as the High Performance Storage Saver (HPSS), this feature provides a transparent means of moving rarely referenced data to a cheaper storage platform while still providing access when required.
Increased capacity. IBM DB2 Analytics Accelerator V3 extends the maximum effective capacity of an IBM DB2 Analytics Accelerator instance to a massive 1.3 PB.
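The incremental update capability described above amounts to applying a stream of captured changes to the accelerator's replica instead of reloading whole tables or partitions. A minimal sketch, with an invented change-record format (the real feature captures changes from the DB2 log):

```python
def apply_changes(replica, changes):
    """Trickle-feed captured changes into the accelerator's local copy.

    Each change record is (op, key, value) with op in {'I', 'U', 'D'}
    for insert, update, delete. The record format is invented for this
    sketch; the real mechanism is log-based capture inside DB2.
    """
    for op, key, value in changes:
        if op in ("I", "U"):
            replica[key] = value
        elif op == "D":
            replica.pop(key, None)
    return replica
```

Because only the changed rows move, a large table with a low update rate stays near real-time current without the cost of a full refresh.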
Version 4 of the IBM DB2 Analytics Accelerator, which should be announced by the time this paper is published, is expected to include a number of additional enhancements.
Accelerator modeling. DB2 has long provided the ability for customers to determine the amount of CPU that could be saved by directing eligible workload to zIIP22 engines. This enables customers to rapidly assess the ROI of potential future zIIP purchases. IBM DB2 Analytics Accelerator V4 provides a similar capability for accelerated queries, with new instrumentation to show the CPU time used for queries that would be eligible for offload to an accelerator (whether one is actually available or not).
More eligible queries. IBM DB2 Analytics Accelerator V4 significantly expands the number of queries eligible for offload. Static SQL queries are now eligible, and more DB2 data types and functions are supported.
Increased accelerator efficiency. Several enhancements have been made to improve the efficiency of the accelerator itself, including workload balancing across multiple accelerators, concurrent load/replication support, enhanced monitoring, and improved workload prioritization.
Enhanced HPSS solution. The HPSS capability introduced in IBM DB2 Analytics Accelerator V3 has been further enhanced with improvements to archiving functionality and the ability to more easily restore archived tables.
The combination of DB2 11 and the many new features in IBM DB2 Analytics Accelerator V3 and V4 significantly advances IBM’s objective of transforming DB2 into a truly universal DBMS, offering the best characteristics of both OLTP and analytical workloads.
Few IT professionals can have missed the big data phenomenon that has manifested itself in recent years. Despite the undeniable value of the highly structured operational data held within enterprise applications, a vast amount of less structured data is being generated by social media streams, telemetry, clickstreams, and many other sources. Being able to analyze these big data sources undoubtedly holds significant value for many organizations, but the sheer volume and velocity at which the data is produced makes it a very challenging task for traditional database systems. In response, a number of tools and techniques have emerged centered on the open source Hadoop framework, including IBM’s InfoSphere® BigInsights™ technology.
While these technologies address many of the challenges inherent in analyzing big data, they also introduce new ones for organizations wanting to gain new insights by integrating the analysis of big data with core operational information.
As shown in Figure 36, DB2 11 delivers some highly significant new features to allow DB2 and Hadoop/BigInsights to work together and better leverage each platform’s respective strengths. This capability lets data flow in both directions between DB2 and BigInsights, as follows.
Figure 36: DB2 11/Big data integration
From BigInsights to DB2. DB2 11 provides new connectors and capabilities to allow it to easily and efficiently access BigInsights data. A query can be passed to BigInsights using a standard JSON-based query language called Jaql.23 Once the query completes, BigInsights stores the results within its internal HDFS file system. A special new user-defined function (UDF) within DB2 11 then lets this data be read as if it were a DB2 table and even joined to other local DB2 tables if required. Figure 37 illustrates an example of this process.
From DB2 to BigInsights. In the opposite direction, DB2 11 data can be accessed from within a Jaql query running within BigInsights. This capability uses a Jaql JDBC module within BigInsights to allow Jaql queries to access DB2 (or any other Java-enabled DBMS), as in the example in Figure 38. DB2 sees this request as just another incoming JDBC/DRDA® remote connection, and returns the DB2 data to BigInsights, where it will be processed in the same way as if it had accessed native HDFS data.
This capability works well for low-volume queries that need access to current data, but some applications require a local copy of the DB2 data to be held within BigInsights/Hadoop for performance reasons. A tool known as Sqoop (“SQL-to-Hadoop”) can be used for this purpose. Sqoop allows bulk data transfers between DB2 and BigInsights and can load the extracted data into a local HDFS cache.
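The DB2-to-BigInsights direction boils down to the Jaql JDBC module issuing an ordinary SQL query and consuming the result set like any other data source. The sketch below uses Python's sqlite3 module as a stand-in for the DB2 JDBC connection, purely for illustration; the table and column names are invented.

```python
import sqlite3

# sqlite3 stands in for the JDBC/DRDA connection to DB2; the schema and
# data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 9.5), (2, 20.0), (3, 42.0)])

# The Jaql query's embedded SQL runs on DB2; the rows that come back are
# then processed exactly as if they had been read from native HDFS data.
rows = conn.execute("SELECT id, amount FROM orders WHERE amount > 10").fetchall()
```

For low volumes this live access keeps the Hadoop side working with fully current DB2 data; Sqoop takes over when a bulk local copy is more appropriate.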
Figure 37: DB2 11 Accessing BigInsights data
Figure 38: BigInsights accessing DB2 11 data
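The read-and-join pattern of Figure 37 can be sketched in a few lines of Python. The CSV-style result layout, the function names, and the sample data are all invented for illustration; in the real implementation the reading is done by the table UDF shipped with DB2 11, and the join is expressed in SQL.

```python
def read_biginsights_result(lines):
    """Stand-in for DB2 11's table UDF over a BigInsights/HDFS result file.

    Assumes a hypothetical 'user_id,score' layout for each line of the
    stored Jaql query result.
    """
    for line in lines:
        uid, score = line.split(",")
        yield int(uid), float(score)

def join_with_local_table(db2_rows, hdfs_lines):
    """Join the Hadoop-side result with rows from a local DB2 table,
    mirroring how the UDF's output can be joined in SQL."""
    scores = dict(read_biginsights_result(hdfs_lines))
    return [(uid, name, scores[uid]) for uid, name in db2_rows if uid in scores]
```

The point of the UDF is precisely that this plumbing disappears: the Hadoop result simply looks like another DB2 table to the query.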
The new integration capabilities delivered within DB2 11 allow organizations to more easily and efficiently combine the results of big data analysis with up-to-date operational data from DB2 OLTP databases, significantly increasing the practical value of any insights obtained.
IBM’s Query Management Facility™ (QMF™) tool is as old as DB2 itself and provides a solid platform for executing many customers’ reporting and analytics queries. QMF 11, which will be released at the same time as DB2 11 for z/OS, provides many new and enhanced facilities.
In addition to the specific analytic features discussed within this section, a number of other enhancements will be of direct benefit to BI/query workloads.
The significant CPU reductions provided by DB2 11 are directly applicable to analytics workloads and are expected to provide an immediate CPU/cost reduction of 10 to 40 percent for complex reporting queries.
Many of the optimizer enhancements will result in faster, more efficient performance for analytics query workloads (which are frequently complex and access large amounts of data) as well as traditional OLTP.
DB2’s temporal functionality has already proven very valuable in many warehousing/analysis environments, which commonly have to support a historical perspective. The DB2 11 enhancements described elsewhere in this paper will add even more value.
DB2 11’s transparent archiving capability could provide a useful alternative or extension to the use of temporal tables for managing historical warehouse data.
SQL PL enhancements for global variables and variable arrays should be of considerable interest within BI environments for the added flexibility they offer when constructing ETL processes.
This section outlines some of the high-level considerations around the timing and structure of the DB2 11 upgrade process.
DB2 10 for z/OS broke with recent tradition by letting customers upgrade either from DB2 9 or from the earlier DB2 Version 8. IBM made it clear at the time that this was an extraordinary measure, specifically intended to help customers who had been unable to upgrade their Version 8 systems due to budget pressures caused by the global economic downturn.
True to its word, IBM is not offering a “skip migration” option for DB2 11. Customers must be running DB2 10 in New Function Mode in order to begin the upgrade process. This means that customers still running DB2 9 (or earlier) systems must complete an upgrade to DB2 10 before they can begin planning for a further upgrade to DB2 11. Figure 39 provides an overview of a possible upgrade decision process, based on the currently used DB2 release level.
IMPORTANT NOTE FOR DB2 9 CUSTOMERS:
IBM has announced that support for DB2 9 will be withdrawn in June 2014. As most DB2 upgrade projects require six to 18 months to complete, it is important for all remaining DB2 9 customers to begin planning to upgrade to DB2 10 as soon as possible.
In addition to the requirement to be running DB2 10 in New Function Mode, a number of other conditions must be met in order to upgrade to DB2 11.
Hardware requirements. DB2 11 requires a System zEC12, z196, or z10® server supporting z/Architecture®. As outlined earlier, some DB2 11 features, such as 2 GB page frames, are supported only when running on a zEC12 server. DB2 11 is also expected to require increased real storage for a given workload compared to DB2 10.
System software requirements. DB2 11 requires z/OS V1.13 or later.24 Some of the enhancements related to security also require z/OS Security Server (RACF) V1.13.
Figure 39: DB2 11 upgrade decision process
Application requirements. The new DB2 11 Application Compatibility feature removes the usual requirement for application code to be altered to address incompatible changes. Although the code will still have to be changed at some point in the future, this is no longer a prerequisite for upgrading to the new release.
DB2 11 does not support packages bound prior to DB2 9, so these will need to be rebound in preparation for the upgrade. However, the plan management features introduced in DB2 10 should significantly reduce the effort and risk associated with this activity (see the section on BIND/REBIND enhancements for a discussion of the DB2 10 and DB2 11 plan management features).
Most DB2 customers will wait for at least 12 to 24 months after a new release becomes available before beginning their upgrade project, depending on their internal policies and the sophistication of their regression testing. As Figure 40 shows, adoption rates for DB2 10 were significantly higher than normal due to the significant business benefits that were quickly available within that release.
Figure 40: DB2 10 customer adoption
Recognizing the requirement for customers to be able to exploit enhancements and obtain business value as soon as possible, IBM placed additional emphasis on code quality throughout the DB2 11 Early Support Program with the objective of allowing a limited number of customers to upgrade their production systems before General Availability.
Each customer must carefully assess the business case for their specific environment in order to determine the timing of their DB2 11 upgrade project. However, the emphasis on early production readiness, combined with the significant business benefits outlined here, is expected to once again result in higher-than-average customer adoption rates for DB2 11.
The process of upgrading a DB2 environment from Version 10 to Version 11 follows the same overall structure as previous releases. Each DB2 system is first moved into DB2 11 Conversion Mode (CM) before entering New Function Mode (NFM) via an intermediate step known as Enabling New Function Mode (ENFM). Most new releases require changes to be made to the DB2 catalog and directory (typically upon entry to the CM and ENFM phases), and this can cause some disruption to application workloads.25
Although the IBM recommendation is to schedule a full DB2 outage during the transition to CM and ENFM, business demands for true 24x7 availability are making it increasingly difficult for customers to do so. Therefore, recent releases have made it possible to reduce the operational impact by permitting some application access to continue during the upgrade.26
Further progress has been made in this area within DB2 11, especially with regard to the second set of catalog changes made on entry to the ENFM phase. This process is expected to require considerably less elapsed time than the DB2 10 equivalent and also to be less disruptive while it is executing. One test conducted by IBM against a large catalog from an ESP customer showed an 18X elapsed time improvement for the ENFM catalog changes compared with DB2 10.
Figure 41 shows the impact of the DB2 11 upgrade process for non-data sharing customers who want to minimize the operational disruption associated with the upgrade. A complete outage is still required to stop and start the DB2 subsystem and run the initial catalog update upon entry to Conversion Mode, but full availability can then be restored for the duration of the CM phase. Further catalog updates are required during the ENFM phase, but these can run alongside some normal application workload if required.27
Figure 42 shows the equivalent situation for data sharing environments. In this case, a complete DB2 outage can be avoided as one member of the data sharing group can be transitioned to Conversion Mode at a time, allowing applications to continue accessing the other members of the data sharing group.
Although the IBM recommendation remains to schedule a dedicated outage during the transition to CM and ENFM, the enhancements within DB2 11 further reduce the operational impact for customers choosing to conduct an online migration.
Figure 41: Application availability during DB2 11 upgrade (non-data sharing)
Figure 42: Application availability during DB2 11 upgrade (data sharing)
This section is based on interviews with some of the organizations that participated in the DB2 11 Early Support Program. Based on their early experiences with the product, they outline the business benefits they expect by exploiting the features in the new release.
As one of the world’s biggest and most successful car manufacturers, the BMW Group is at the forefront of both automotive and IT innovation. DB2 for z/OS is a critical component of many of the company’s worldwide computer systems, from manufacturing to supplier management and customer ordering. In total, the German car company has more than 130 DB2 subsystems, belonging to over 40 data sharing groups and spread over eight z/OS LPARs.
BMW Group is no stranger to Early Support Programs. It participated in the ESP for DB2 for z/OS V2.1 way back in 1988, and more recently it was a key member of the DB2 10 for z/OS beta, which ran for most of 2010. “We participated in the DB2 10 ESP because at that time we were facing some real challenges with workload growth,” said BMW Group’s DB2 for z/OS Product Manager. “The CPU savings and virtual storage constraint relief offered by DB2 10 really helped us to cope.”
Fast forward to mid-2013, and BMW Group is still enjoying healthy growth across its major markets, with all of the associated pressures that can bring to the supporting IT infrastructure. “Virtual storage isn’t a big limitation for us anymore, but we expect the CPU savings in DB2 11 to provide the major business benefit for us,” he added. “We are very early in our performance testing, but we have already seen CPU reductions of 8 to 13 percent on some workloads due to the more efficient decompression algorithms.”
As part of the strategy to cope with the ongoing workload growth, BMW Group will also be upgrading its existing IBM system z10 EC servers to the latest generation zEC12 mainframes in the second half of 2013. BMW Group was an early adopter of the 1 MB large page frame support introduced by DB2 10, and the DB2 for z/OS Product Manager is looking forward to the opportunity to evaluate the benefits of the larger 2 GB page frames supported by DB2 11 when running on the new servers. “We use very large buffer pools within all of our production DB2 systems, so we’re anticipating further CPU savings when we can begin using 2 GB frames.”
BMW Group employs a highly centralized IT infrastructure, supporting all of its global business activities from within the Munich data center. That makes getting dedicated change slots difficult, so some of DB2 11’s availability improvements are also going to be most welcome for the DB2 team. “We operate in a very dynamic 24x7 environment, and we often have to update our applications while they continue to operate. The ability to break into persistent threads and the enhanced dynamic schema change capabilities will allow us to react to business requirements more quickly and with less operational impact,” said the Product Manager.
Another member of the BMW Group DB2 Team has been impressed with the stability of the new release: “We have encountered fewer install problems than we did at this stage with DB2 10, and our critical ISV tools are also stable, so overall we’re very pleased with code quality. Support from the local IBM team, the IBM moderators, and the lab members has also been first class.”
Perhaps the ultimate vote of confidence in the new release can be found in BMW Group’s provisional plans for rolling out DB2 11 once the ESP program ends. “We are currently scheduled to begin upgrading our main DB2 systems in February 2014,” said the team member, “with the objective of completing the rollout in plenty of time for our annual change freeze the following November.” With DB2 11 likely to be less than 6 months old when the rollout begins, that’s a powerful endorsement of the business benefits and product stability that BMW Group is expecting.
Bielefeld is a large city in North Rhine-Westphalia, the most populous state of Germany. As one of the largest municipal enterprises within the German energy market, Stadtwerke Bielefeld is responsible for the delivery of electricity, gas, district heating, and drinking water to the city’s 328,000 citizens. The company’s IT department runs IS-U, SAP’s industry-specific solution for the utilities industry, and this depends upon DB2 for z/OS as its back-end data store.
Stadtwerke Bielefeld currently has around 60 active DB2 10 subsystems supporting SAP, spread across four LPARs of an IBM z196 enterprise server. After hearing about how DB2 11 could benefit the IS-U application at an SAP customer event, DB2 DBA Bernd Klawa recommended that Stadtwerke Bielefeld get involved in the Early Support Program. “The main reason to get involved in the ESP program is new functionality for the SAP Software,” said Bernd. “The software layer for decompression is very valuable for SAP databases, since all Unicode table spaces are compressed.”
Although the interview with Bernd was conducted before detailed performance metrics were available, he was able to share some encouraging initial results: “RECOVER of catalog and directory runs more than twice as fast as on DB2 10. The SAP IS-U unbilled revenue application (a batch workload) showed an elapsed time reduction of about 20 percent in Conversion Mode. After migration to DB2 11 New Function Mode, I saw automatic index pseudo delete cleanup, which should reduce the need to REORG after big batch runs.”
Bernd also successfully tested the process of converting to DB2 11’s new longer 10-byte log addresses. “The old 6-byte log address is not really an issue for our data center. However, the conversion will still be done when we move to DB2 11 for improved logging performance.”
The experiences of Stadtwerke Bielefeld match those of many other ESP customers when it comes to the quality and stability of the new release. “I have been working with DB2 since Version 6, and I never saw such a robust release in such an early stage of development. The quality is very good,” commented Bernd.
What about Stadtwerke Bielefeld’s plans to upgrade their main DB2 systems to DB2 11 when the ESP program ends? Bernd is naturally keen to exploit the benefits within DB2 11 as soon as possible: “I expect to move to NFM in development and quality assurance systems as soon as DB2 11 is certified for usage by SAP.”
JN Data specializes in providing IT operations and engineering for large Danish financial institutions, including Jyske Bank, Nykredit, Bankdata, BEC, and SDC. DB2 for z/OS is a key component within this shared infrastructure, with around 100 DB2 data sharing members hosting development and production services for JN Data’s customers.
The company recognizes the importance of staying abreast of the latest developments within critical infrastructure components such as DB2 so that it can continue to deliver the best service to its clients. JN Data was a prominent member of the DB2 10 Early Support Program and was keen to repeat the experience when the DB2 11 ESP was announced.
Although new development features such as transparent archiving are important, according to Systems Programmer Frank Petersen it’s the operational enhancements that have the most immediate business value. “Features such as temporal tables and transparent archiving can save many hundreds of developer hours, but there is often quite a delay before a suitable application comes along to use them,” he said. “Operational items such as the ability to interrupt persistent threads can be used almost immediately and will make a much bigger impact on us in the short term. I expect this will allow us to use some BIND parameters that should give some significant CPU savings and especially for highly used packages driven through persistent threads, which are used widely in our online systems.”
What about other operational enhancements in DB2 11? “The move to 10-byte LRSN/RBA log addresses will also be important for us,” said Frank. “We hit the RBA issue on one of our DB2 systems a while ago and had to take manual action to resolve it. We’re expecting to encounter the same problem again within the next couple of years, so it’s great to see a properly engineered solution from IBM.”
Like their counterparts in many other organizations, JN Data’s DB2 support staff never seem to have enough hours in the day. “We love autonomics. DB2 11 has some really nice features for reducing the burden on the DBA, such as the automatic cleanup of pseudo-deleted index entries.” Sticking with the operational theme, Frank is also impressed with the improvements in the DB2 11 upgrade process, which make online upgrades more feasible. During its ESP testing, JN Data used IBM’s Optim Query Capture and Replay (OQCR) product to capture a few hours of workload on a real production system, then replayed that workload while attempting to upgrade one of its test systems to DB2 11. “It just worked, and the elapsed time for the catalog updates was much lower than before,” said Frank. “Getting dedicated change slots for DB2 upgrades is a huge challenge with the number of systems we support. This will allow us to upgrade at a quiet time but without making the application unavailable. We’ll be performing online upgrades for our main DB2 systems when we move to DB2 11.”
Other operational pressures will prevent JN Data from beginning their DB2 11 upgrade project until 2014, but Frank is confident that DB2 will be ready when they are: “The ESP code mostly just worked from day one. The difference in code quality between DB2 11 and previous versions at this stage of the ESP is very noticeable.”
Notes
1. To benefit from DB2’s improved ability to select the most efficient access path, a “rebind” will usually be required to allow DB2 to re-create the access path structures for an application. This does not require any changes to the application itself.
2. Note that achieving these savings may also require the new log address format to be implemented; for further details, see the “Extended Log Record Addressing” section.
3. 64-bit multithreaded WLM stored procedure address space requires Java 64-bit JDK, Version 1.6.
4. XQuery and some of the performance enhancements discussed in this section will also be retrofitted to DB2 10 for z/OS via the maintenance stream.
5. Hash join is an alternative join method to the merge scan or nested loop technique and can be more efficient in certain instances.
6. An In Memory Data Cache (IMDC) is an area of memory that holds intermediate work files when tables are joined. IMDCs were introduced in DB2 for z/OS Version 8 as a more cost-effective alternative to the sparse index technique used in prior versions of DB2 for z/OS. (DB2 can still use sparse indexes where insufficient memory is available to use an IMDC.)
7. The memory available for IMDC operations is controlled by the DB2 DSNZPARM parameter MXDTCACH.
8. The new tables are SYSIBM.SYSSTATFEEDBACK for recommendations generated during optimization of static/dynamic SQL during BIND, REBIND, or PREPARE, and DSN_STAT_FEEDBACK for recommendations generated during SQL EXPLAIN.
9. Data skew describes a common situation in which data stored in a DB2 table is not uniformly distributed. For example, a table containing information about U.K. citizens may contain 60 million rows and 200 different cities for primary address. A uniform distribution should result in each city having 60,000,000 / 200 = 300,000 citizens associated with it, but the city of London will have over 8 million. Execution skew describes a similar concept, where some literal values are used in queries much more than others. Again, queries WHERE CITY=‘LONDON’ are likely to be far more common than those WHERE CITY=‘WELLS’.
10. An access path hint lets the DBA strongly suggest a specific access path to DB2, overriding the optimizer’s usual cost-based analysis. This is generally considered a last resort and is usually discouraged, due to the additional effort required to create and subsequently maintain the hint.
11. It is possible to configure group buffer pools to not cache changed data, but this is unusual.
12. This capability also requires a new coupling facility microcode level (CFLEVEL) to be installed. IBM plans to retrofit this function to DB2 10 for z/OS via the maintenance stream.
13. Online REORGs let the majority of the work be done on a shadow copy of the data while full application access is maintained. The data is unavailable for only a short period at the end when access is switched to the shadow copy (known as the SWITCH phase).
14. Online REORGs require a separate mapping table to be defined, which lets DB2 keep track of the relationship between the rows in the original table and those in the shadow copy. In releases prior to DB2 11, these must be manually defined by the DBA.
15. Inline statistics are taken while another utility (e.g., REORG) is running. As that utility typically has to access all the data anyway, the overhead for simultaneously collecting statistics is much smaller than when running RUNSTATS separately.
16. SYSIBM.DBDR, SYSIBM.SCTR, SYSIBM.SPTR, SYSIBM.SYSLGRNX, and SYSIBM.SYSUTIL.
17. An LRSN delta may be set when enabling data sharing for a non-data sharing system. Higher delta values are typical when enabling data sharing on older DB2 systems, or repeatedly enabling and disabling data sharing.
18. Note that DB2 10 introduced an alternative approach known as “partition by growth” that does not require a partitioning key to be defined. However, the traditional “partition by range” approach is more commonly used.
19. This feature has several commonly used names, including “plan management,” “package management,” and “plan stability.”
20. On BIND, the OWNER keyword specifies the owner of the plan or package being bound, which must have access to execute all the static SQL inside the package being bound.
21. 2012 IBM Global Big Data Online Survey. Base: 60 IT professionals, multiple responses accepted.
22. One of the ways in which IBM is reducing the overall cost of mainframe workloads is to offer customers the option to install additional “specialty processors” within their System z machines. These processors are capable of running only specific types of work, but in so doing they can reduce the load on the general-purpose CP processors and therefore the amount of chargeable CPU consumed. The zIIP is a specialty processor designed to offload specific types of data and transaction processing workloads, such as remote SQL statements, some DB2 utility processing, and network encryption.
23. Jaql is a scripting language for enterprise data analysis based on the Hadoop MapReduce parallel programming framework. It was originally created by IBM and was donated to the open source community.
24. z/OS 1.13 has been available since July 2011, so this requirement should not be an issue for most customers.
25. The degree to which application access is disrupted depends on a number of factors, such as whether data sharing is used, the size of the DB2 catalog/directory, and the type of application workload that is active during the upgrade.
26. To minimize application impact, the catalog update process should be scheduled during a period of low DB2 system activity, when no changes are being made to the catalog by applications. This means that DDL and BIND/REBIND activity should not be allowed, but a small volume of normal application workload is acceptable.
27. FlashCopy is a function provided by IBM disk storage systems that allows near-instantaneous copies to be made of data. Other vendors provide similar functionality.
Acknowledgements
The author would like to thank the following people for their invaluable contributions to this paper: