4
Miscellaneous Performance and SQL Compatibility Enhancements
There’s no question that the most noticeable features of the DB2 10.5 release fall under the BLU Acceleration and DB2 pureScale enhancements category; that said, DB2 10.5 has a number of other enhancements that make it an even richer platform from an operational perspective. For example, there are DB2 10.5 enhancements aimed at the continual DB2 drumbeat of making it as seamless as possible to migrate to DB2 from the Oracle database (who’s kidding who: this is what we mean when we use marketing-speak and say “compatibility”); there are some features that have been driven into the product to make way for DB2’s new JavaScript Object Notation (JSON) document store capabilities (think MongoDB-styled applications); some general performance enhancements; and more. We decided to put all the “stuff” that isn’t a chapter on its own, but still pretty big from a SQL compatibility, availability, and performance perspective, into a set of Miscellaneous chapters.
image
Expression-Based Indexes
DB2 10.5 introduced the concept of expression-based indexes, which as its name suggests, allows you to define indexes on general expressions (this is used heavily in the DB2 JSON Document Store, which we cover in Chapter 6). Expression-based indexes are going to make applications that rely on the type of functional searches that this new feature empowers, to perform much faster (since they will no longer be forced to perform full table or index scans), as well as make the lives of application developers easier.
We think you’ll be better able to appreciate how expression-based indexes will help your DB2 environment with a simple example, so let’s start with the following table in mind:
image
To speed up query performance, a DBA typically would want to create indexes on important application lookups, which may include complex expressions such as a total compensation (equal to your salary plus your yearly bonus combined) and case-insensitive names (it’s faster to search for a name without having to worry about byte-level comparisons of uppercase or lowercase letters, and so on). For example, a ubiquitous set of indexes that illustrate this could be created as follows:
image
Of course, if you tried to create any of these indexes before DB2 10.5, they would fail, since index on expression support didn’t exist (you can certainly create generated columns in a base table like the ones shown here, just not indexes).
Get Faster: Query Processing Before and After DB2 10.5 with Support for Index Expressions
Keeping with our example to help you better understand the benefits that DB2 10.5 brings with its new index expression support, first let’s consider how things worked in DB2 10 (or earlier versions). Take for example the following ubiquitous SQL query that’s looking for an employee by last name without case sensitivity:
image
With the previous query (before expression-based indexes existed in DB2), DB2 would have to perform a full table (or index) scan to satisfy this query, even if there was an index on LASTNAME. In other words, to find any employee whose last name is RICK, all rows in the table (or all keys in the index) would need to be examined and compared against the value RICK.
Now consider that DB2 10.5 lets you define an index on UPPER(lastname); this is going to yield a number of benefits. First, in consideration of our example, this enhancement means that this popular query can now be executed with an index range scan that accesses the fraction of the index that corresponds to the actual values in the predicate. As you’ve likely inferred by the words “full table scan” in the description of how things worked in previous versions of DB2, there’s going to be a performance boost here. We typically see (and, of course, it depends on the data distribution and the query) a dramatic reduction in the number of pages that need to be accessed to address queries like the one in our example and this results in a dramatic improvement in performance.
Expression-Based Indexes Make Application Development Easier
Another benefit of the new DB2 10.5 index on expression capability is that it makes the whole process of application development much simpler. Before this feature was supported in DB2, developers would often resort to adding application-side code to approximate the behavior of a function-based index. Of course, the downside to this approach is that it required extra development effort, pushed data performance techniques out of the data layer, and required subsequent application deployment for new searches. DBAs could try and work around this missing feature with generated columns, but that’s more than likely to require additional storage; it’s a work-around, but it’s not as clean or efficient, and it has its limitations too.
For example, one approach that’s been used in the past to compensate for this missing feature in DB2 of old was to add a column to a table, where the column value is automatically generated based on a defined expression. The following is a typical workaround for this missing feature in previous versions of DB2:
image
Of course, the extra column consumes more space and adds complexity to the application in that it needs to deal with this new column.
Expression-Based Indexes Have Views
When you start to implement this handy new DB2 feature in your environment, we wanted to point out that you may see additional views being created when you use expression-based indexes; don’t worry about it—this is normal. DB2 will automatically create and manage (there’s nothing for you to do here) a set of statistical views to help facilitate statistics collection on your expression-based indexes. For example, when the EMP_UPPER index was created earlier in this section, DB2 would have automatically created a statistical view with a system-generated name—for example, it would have generated Data Definition Language (DDL) such as: CREATE VIEW emp_upper_v(K00) AS SELECT UPPER(lastname) FROM employee. If you were to drop the EMP_UPPER index, DB2 would automatically drop its associated statistical view.
As a DBA, when you collect statistics on a table with an expression-based index and choose to also include the collection on index statistics, DB2 will use this statistical view to collect the statistics that correspond to the specified expression. Again, all this occurs automatically, and the net effect is that there’s no need for you to collect statistics separately on the view.
image
Excluding NULL Keys from Indexes
Another index-related capability that’s new in DB3 10.5 in addition to expression-based indexes is the ability to exclude NULL keys from the indexes themselves. Quite simply, this new DB2 10.5 feature allows you to define indexes that ignore NULL keys. We bet you’re wondering at this point how ignoring NULL keys in an index could be valuable. As we’re sure you’re aware (and likely experienced), “more from less” has always been a key thrust of each new DB2 release, and that certainly applies with this feature when it comes to performance; you get more by excluding NULL keys from indexes and less resource consumption.
We’ll illustrate the benefits of NULL exclusions from indexes using an example. Consider the following ORDERS table from an online ordering system:
image
Let’s assume in this example that when a new order is submitted, the order number is not immediately known and, therefore, the order’s corresponding record gets inserted into the ORDERS table with a NULL value for the ORDER_NUM key.
This is pretty typical in a mobile world, where applications are often broken down into browsing and buying modules. When potential clients are browsing and adding items to their online shopping carts, that stuff is persisted (for abandoned cart analysis, availability in case of a browser’s session crash, and so on); however, a true order number isn’t generated until a potential client’s order has passed a payment authorization check and the item is verified to be in-stock (the application is likely using a very forgiving isolation level during the browsing and addition to cart phases of an order, but needs tighter semantics at purchase time). If the payment and availability checks pass, and the order is verified, a subsequent transaction in the ordering system assigns an order number to the corresponding row in the ORDERS table, which is updated with this order number. Let’s further assume that there’s a frequent need to look up past orders by order number, and this leads the DBA who supports this ordering application to create an index that looks like this:
image
If you were to eavesdrop on a DBA’s soliloquy while they created this supporting index structure (yes, we know a couple who think and talk to themselves just like this), you’d hear them say, “It’s a shame that this index needs to include orders that are in-process and don’t yet have order numbers. Wouldn’t it reduce index maintenance costs by only including orders that actually have order numbers? After all, when I look up orders, I’m only going to be interested in orders that have finished processing, those that are real and have been assigned order numbers.”
Our answer to such a thought-provoking and privately pontificating DBA is a resounding “Yes!” However, before DB2 10.5, there was no way to accomplish this, and now you get a good idea for the inspiration behind this index enhancement. Specifically, before DB2 10.5, all indexes had to index all the rows of a table, including those with NULL keys. The DB2 10.5 release adds the ability to exclude NULL keys from an index using the aptly named EXCLUDE NULL KEYS option of the CREATE INDEX statement; for example:
image
The index created with the previous DDL will instruct DB2 10.5 (or later) to ignore rows with NULL index keys during index maintenance. Quite simply, when a row with a NULL order number is inserted into our example index’s corresponding base table, no key will be inserted into the ORDER_NUMI index. What’s more, if (or when) that same row’s order number is updated from NULL to an actual non-NULL value (the order is hardened and becomes real), at that point in time, the correct value will be inserted into the index as a new index key.
This new feature not only avoids the processing overhead of unnecessarily maintaining NULL keys in an index, but also has the additional benefit in that there are savings associated with storage and buffer pool memory resources because the NULLs aren’t being stored. Of course, just like in the new expression-based indexes feature we talked about in the previous section, this is yet another feature that fits into the “do more with less” DB2 theme that repeats itself across every release—excluding NULLs from indexes can yield more performance with less resource consumption.
Index NULL Exclusion Simplifies Application Development
Excluding NULLs from indexes has another advantage beyond performance, and less resource consumption: it can foster a more simplified application development environment, particularly with respect to unique constraints. Continuing with our online ordering system example, it would surely make sense for each order number to be unique, and you could imagine the benefit if this business rule could be pushed into the database and DB2 could be instructed to enforce this constraint by defining the index as UNIQUE. Obviously, before DB2 10.5, where there was a potential for multiple order numbers that inserted as NULL values into the index at any given time, this option simply wasn’t possible. The old work-around? The application development team would need to manually code uniqueness checking and prevention into their application. Of course, as was the case with expression-based indexes, that’s more effort, code to maintain, potential for errors, a reduction in agility, an increase in deployment costs, and more. The goal is to push this into the database system (where it belongs), and with the DB2 10.5 support for NULL exclusion from indexes, this all becomes possible.
Of course, with the new enhancements for NULL exclusion from indexes in mind, you could create an index for our online ordering application as follows:
image
Now that this logic is pushed into the database, it doesn’t need to be implemented as application-side code. The application development teams could remove the code and simplify everyone’s lives. What’s more, it promotes reusability because different applications wouldn’t have to import the corresponding code module (or even worse, each write their own in a large company, and that’s not uncommon at all). They could just rely on the database to enforce this logic—the way it should be for data-related logic.
When 2 + 2 = 4: Getting Even Richer Semantics by Combining Features
At this point, you’re likely thinking that you should be able to mix the ability for DB2 10.5 to create expression-based indexes with its ability to exclude NULLS from the indexes themselves; what’s more, using these two features together will undoubtedly provide some synergistic benefits in the form of advanced capabilities. If this is what you’re thinking, then you’re bang on! As usual, we’ll use an example to illustrate this statement.
Let’s continue our example with the assumption that you added an ORDER_STATUS column to the ORDERS table we defined earlier in this chapter. As orders come into the system, the ORDER_STATUS column takes on a value of NEW. When the order is in the middle of processing, the ORDER_STATUS is creatively changed to PROCESSING. Finally, with an even greater dash of creativity, when the ORDER_STATUS is finished, its status is updated with the keyword COMPLETE. Let’s further assume that any completed order numbers are kept online in the system for several months for typical reporting purposes. With this in mind, your application’s order number generation logic allows for completed order numbers to be reused.
In this scenario we outlined earlier, you couldn’t apply a unique constraint directly to the ORDER_NUMI index because of the potential for order number reuse. However, in DB2 10.5, you can combine expression-based indexes and the ability to exclude NULL keys from an index to apply a unique constraint to the ORDER_NUMBER column, and furthermore, only apply this business rule if the status of the order is not COMPLETE, as follows:
image
If the DECODE option (this option is similar to the existing CASE expression; it was added in DB2 9.5 as part of the Oracle compatibility feature set) was a person talking to DB2, it would basically tell DB2, “If the ORDER_STATUS is NEW or PROCESSING, use ORDER_STATUS as the index key; otherwise, use NULL (if the ORDER_STATUS is COMPLETE).”
These features combined in the previous example would result in DB2 enforcing unique ORDER_NUMS so long as the ORDER_STATUS was NEW or PROCESSING; its net synergistic effect means that DB2 will enforce unique ORDER_NUMS so long as ORDER_STATUS is in a NEW or PROCESSING state, which provides an advanced semantic that precisely matches the needs of the application.
image
Random Ordering for Index Columns
Another interesting indexing enhancement in the DB2 10.5 release is the ability to randomize the order of index columns when they are stored in the index, which can help optimize performance in some particular scenarios (in case you are wondering why someone would want to do this). Keeping with the online ordering system example used in this chapter, let’s create the ORDER_NUMI unique index on the ORDER_NUM column in the ORDERS table as follows (this is a simpler form of this index than the one we created earlier):
image
Let’s assume that in our application, order numbers are generated by incrementing the last order number used (a typical algorithm for this kind of application). Now take a moment and consider what physically happens within the ORDER_NUMI index as new order numbers are inserted into the index through high-throughput workloads, where many thousands or more concurrent transactions may be generating order numbers. If region thrashing comes to mind, you’re spot-on. If you’re not a DBA, think about the last time you were at a conference and they opened the buffet luncheon doors and everyone went to the closest two buffet lines—things got pretty inefficient (and dare we say rowdy at some of the database conferences we’ve attended).
What’s happening is that all of these transactions (people trying to eat) are trying to insert their order numbers (get some food on their plate) into the same region of the index (from the same buffet line). Specifically, dropping the analogy because it’s making us hungry just writing it, the transactions are trying to insert order numbers into the same page of the index’s B-tree because indexes physically store columns in their ordered sequence. For example, in an ascending index, if order number 1000 is stored on index leaf page 5, DB2 will also try to store order number 1001 on leaf page 5.
In our running example, it’s conceivable that some individual index pages can be competed for by multiple concurrent transactions, leading to less-than-optimal performance, and this is where the DB2 10.5 random index feature comes in. As of DB2 10.5, you could define the ORDER_NUMI index using the RANDOM keyword in the following manner:
image
When the RANDOM keyword is used, an order number is randomized before it’s stored in the index (you can kind of think of it as a hashing algorithm, in the same manner that a row is hashed to a specific DB2 database partition when DB2 is clustered using the Database Partitioning Facility). This means that if order number 1000 is stored on index leaf page 5, order number 1001 will very likely be stored on some other page (something other than page 5), which, of course, has the downstream benefit of more evenly spreading out index page targets for agents with the intent of writing out a sequenced order number, alleviating the competition for individual index pages and leading to improved performance.
To Random Order an Index Column or Not to Random Order an Index Column… That Is the Question
We’re sure that random ordering an index column sounds great (and logical); however, as with most things in the computing world (and life, for that matter), there are some trade-offs to consider with the new DB2 10.5 random ordering on index capabilities feature so we thought we’d comment on them here. So why don’t you just use this new feature for all your indexes? As you might have concluded, if you instruct DB2 to store the index columns in random order, you won’t be able to use such an index to satisfy queries that have ordering requirements. For example, the random index created in the previous section could not be used by DB2 to satisfy the following query:
image
If your application is laden with a high dependency on ordering, a traditional ascending index on the ORDER_NUM column might just be your best bet. Of course, this guidance is going to change from application to application; it’s going to depend on the query makeup, the application’s concurrency, read to write ratio characteristics, and more. This all said, while not a panacea, random indexes are a great feature that can help improve application performance in some situations.
image
More Data on a Row: Extended Row Size
Before DB2 10.5, the maximum row size of a table was limited by the page size of the table space where the table was created. This means that before DB2 10.5, a 4KB page size could have a maximum row size of 4,005 bytes; an 8KB page was limited to 8,101 bytes; 16KB to 16,293 bytes; and a 32KB page was limited to 32,677 bytes.
These row limits, especially considering the effects of DB2 compression on them, have worked just fine for most applications. That said, we’ve come across occasional cases where an application is coded to require a row size that exceeds the 32,677 limit (which is dependent on a 32KB page size). Where we’ve seen this requirement is in applications where the team’s coding practice is to use arbitrarily large column sizes (popular in web-based applications where developers want a sort of “bit bucket” to dump character data into), as opposed to the true maximum size required by the application semantic, such as the DDL we had from one client show here:
image
If you were to try and create this CUSTOMERS table in a previous version of DB2, the CREATE TABLE statement would fail because the row size is too large. Our first recommendation to these kinds of application development houses is to use the more realistically sized (and efficient) VARCHAR data type for columns such as FIRSTNAME and LASTNAME; after all, we’ve yet to come across names this long. That said, we also understand that changing legacy or existing application code is not always practical, and so DB2 10.5 includes a new capability to extend the row size limit of a traditional DB2 table.
As of DB2 10.5, rows can exceed the table space page size. When a row exceeds the page size limits detailed earlier in this section, under the covers (and seamless to the DBA), DB2 will split up the row into separate pieces and place the portion that doesn’t fit on a single page into a separate “special” large object (LOB) that won’t be seen by applications as a column.
We want to note that the design point of DB2’s new extended row feature isn’t for use cases where you want to access all of the table’s columns and most of the rows in that table exceed the table’s page size. In this kind of workload pattern, the performance overhead of accessing the separate large object may become significant. In contrast, this new extended row support is focused on the far more typical, where very large rows are rare and/or only subsets of columns are typically referenced. In such cases, the overhead of accessing the separate large object will typically be insignificant.
As you’re probably aware, or rather come to expect from our track record in the previous decade of DB2 releases, our development teams are maniacally focused on delivering new capabilities that are simple to use and implement. You’ve seen this in a number of features; compression is a great example, with its management by intent policy. For example, in DB2, you simply specify a table attribute that tells DB2 you want it to optimize the ecosystem for storage savings and performance. When a table is created using this attribute, it will automatically enable row and temporary table compression; it will automatically enable index compression and pick and choose which of DB2’s three index compression algorithms has the most positive effect on your environment (it can choose on, all, or a subset). The self-tuning memory management (STMM) infrastructure in DB2 is designed to learn and profile your running application and adapt heap allocations for best performance. Of course, BLU Acceleration’s “Load and Go” proposition is the latest example of our incessant focus on consumability. When you consider that we keep talking about how DB2 automates the great things it does for you where it makes sense, it should be no surprise that extended rows are enabled by default for any new database you create in DB2 10.5.
With this in mind, we understand behavior consistency and predictability for existing workloads is just as important for our clients when they move to a new release of DB2. For this reason, if you have any existing databases in your environment and then upgrade to DB2 10.5, the new extended row support will be disabled. If you want to enable it, it’s simple—just update the EXTENDED_ROW_SZ database configuration parameter and set it to ENABLE; it’s an online operation.