5
Miscellaneous Availability Enhancements
DB2 10.5 delivers a number of availability enhancements in addition to the DB2 pureScale enhancements that we cover in Chapter 2, including extended capabilities and new features in the areas of space reclamation for insert time clustering tables (for example, extents don’t have to be completely free to take advantage of its special space reclamation characteristics); reorganization enhancements that affect tables with overflow and pointer records; adaptive compression support for online table reorganization; and some useful online table management features for tables with referential constraints that are no longer blocked. In this chapter, we outline these “not in the highlights reel” availability enhancements because they’re bound to be well appreciated by any DBA who manages a DB2 ecosystem.
Better Online Space Reclamation for Insert Time Clustering Tables
Insert time clustering (ITC) tables were first introduced in the DB2 10.1 release. ITC tables provide an effective way of maintaining data clustering and easier management of space utilization. ITC tables have similar characteristics to multidimensional clustering (MDC) tables. For example, both of these table types use block-based allocation and block indexes. From a clustering perspective, you can think of ITC tables as clustering data using a virtual column which clusters rows based on row insertion time. You create an ITC table by specifying the ORGANIZE BY INSERT TIME
clause on the CREATE TABLE
statement. You can use the ADMIN_MOVE_TABLE
routine to convert existing regular tables into ITC tables online, or use EXPORT
, IMPORT
, or LOAD
to move data from a source table to a target table. DB2 10.5 simplifies the management of these tables.
Understanding Insert Time Clustering Tables—A Primer
The best way to understand how ITC tables work is to compare them with regular tables, and we’ll start by taking a look at how a regular table manages its space. When rows are inserted into a regular table, they’re placed on an arbitrary page; more accurately, DB2 has an internal space search algorithm that selects a page with enough space for the row, but from a user’s point of view, the page is seemingly selected at random.
Consequently, as rows are deleted over time, a table can have quite a bit of free space that is spread out (or “fragmented”) in small quantities over many pages. Although this space can be consumed by newly inserted rows in the same table, other tables can’t easily use this space. If you want to enable other tables to use this newly available space, a reorganization (REORG
) operation must be performed, which has the effect of tightly packing the existing rows together on a data page, starting with the table’s first page. The end result is numerous empty pages at the end of the table, and a set of empty pages that are subsequently made available to other tables in the table space, as shown in Figure 5-1.
Figure 5-1 How DB2 performs a typical table reorganization
A reorganization operation such as the one shown in Figure 5-1 can be performed in place in an online manner. This method can take a fair amount of time to complete, due to the amount of data movement taking place; however, it’s important to note that this is by design: Online DB2 table reorganization is designed to make the table ultra-available during the operation. You can throttle or pause and resume such an operation in DB2, and its other benefit is that it bypasses the doubling of storage requirements that is typically required by other vendors’ “shadow” approaches to online table reorganization.
We can tell you that sets of rows that are inserted at about the same time are often deleted at about the same time. For example, consider an invoice tracking system that regularly runs a batch job to delete invoices that are more than 90 days old. When this batch job runs, any rows that were inserted 90 days ago are deleted. If these rows happen to be located on the same data pages, those pages are now empty and things are simple. Unfortunately, as with most things in life, it’s never quite that simple. In the case of a regular table, the rows that you want to delete are not likely to be on the same page. Wouldn’t it be nice if all the rows that are targeted for deletion based on time were initially inserted on the same pages? And if those pages, when empty, could easily be released back to the table space without a traditional reorganization operation, wouldn’t that be terrific? This is where ITC tables help because that’s exactly what they were designed to do.
An ITC table clusters rows according to the time at which they were inserted; rows that are inserted at about the same time are placed on the same page, or on a page in the same extent (an extent is just a contiguous group of pages). If these same rows are deleted during a batch operation, entire pages (or entire extents) become empty. Moreover, ITC tables allow such empty extents to be returned to the table space quickly (and automatically) through DB2’s automated table maintenance daemon, or manually by specifying the RECLAIM EXTENTS
option on the REORG TABLE
command, as in the following example:
We want to emphasize that this is a very quick and online operation. Unlike a traditional reorganization operation, space reclamation from ITC tables doesn’t involve the movement of rows, as was the case in Figure 5-1. In fact, all that happens is that the empty extents are returned to the table space, as shown in Figure 5-2.
Figure 5-2 How DB2 performs an ITC table reorganization
ITC table reorganizations are fast for two reasons. First, the empty extents are easy to find because they’re marked empty after the last row is deleted. Second, there’s no row or data movement—the operation just returns empty extents to the table space, in place and online.
What’s New for Insert Time Clustered Tables in DB2 10.5
In DB2 10.1, extents had to be completely empty for them to be returned to the table space. DB2 10.5 enhances the value proposition for ITC tables by enabling even partially empty extents to be reclaimed. To help explain why this enhancement to ITC tables could be useful, let’s return to the invoice system scenario. In a more true-to-life example, the batch delete job in our example might delete all rows that are more than 90 days old and that have been paid. This means that there might be the odd case in which one or two rows are left behind because the associated invoices haven’t been paid, thereby preventing all of this space from being reclaimed.
Consider the benefit if those rows representing unpaid invoices that are holding up the full reclamation of space in an almost empty extent could be consolidated on a small number of pages. If you think that’s a great idea, you’ll be pleased to know that’s exactly what the DB2 10.5 enhancement for ITC tables does: A REORG RECLAIM EXTENTS
operation automatically moves rows from almost empty extents to other extents so that the resulting empty extents can be returned to the table space, as shown in Figure 5-3. Again, and as you’ve come to expect from DB2, all of this occurs automatically whether you are using DB2’s automated table maintenance daemon or you are using the REORG RECLAIM EXTENTS
command to reclaim the space.
Figure 5-3 How DB2 10.5 performs an ITC table reorganization; it’s smarter, because it can pack nearly empty extents to automatically return more free space to the table space.
This improved RECLAIM EXTENTS
operation is still very fast in DB2 10.5 because the row movement is limited to nearly empty extents.
Note that you can use the ADMIN_GET_TAB_INFO
stored procedure to monitor the amount of space that is available for reclamation from an ITC table. For example, the following query returns the total amount of space that is available for reclamation from table T1:
DB2 10.5 updates the output from this command to accurately show the space that can be reclaimed by freeing up nearly empty extents.
Other DB2 10.5 Reorganization Enhancements
DB2 10.5 includes a few more enhancements for table reorganization that are designed to increase the availability of a DB2 server. Enhancements include a “fastpath” option for collapsing overflow and pointer records; support for adaptive compression with online, in-place reorganization; and support for tables with referential constraints that are referenced in the online ADMIN_MOVE_TABLE
stored procedure.
Fastpath! Collapsing Overflow and Pointer Records
Let’s start this section by ensuring that you understand what we mean by overflow and pointer records. Let’s assume that you’ve got an employee in Florida who moves from Tampa to St. Petersburg. This personnel change requires an update to the employee’s record in the EMPLOYEE
table. Changing the CITY
column value (defined as VARCHAR
) for this employee from “Tampa” to “St. Petersburg” increases the length of the row. Suppose that there isn’t enough free space on the page to contain this new value. In this case, DB2 will insert the new version of the row on a different page that has enough space and consider the old version of the row to be its internal location, and use an associated record identifier (RID) for the newly moved row. The new version of the row is called an overflow record, and the old version, which now contains an RID, is called a pointer record because it “points” to the location of the actual row.
When an overflow record and pointer are used to support an update, any index references to this row still refer to the old location of the row. Although this operation is seamless, if very frequent row-enlarging updates spawn the creation of many overflow and pointer records, performance-sensitive applications could see some impact because index access to these rows requires access to two pages instead of one: one to read the pointer record, and the other to follow the pointer record to the page that actually contains the row data. Of course, a reorganization operation will convert the overflow/pointer pair to a single normal record and update indexes; however, as we mentioned earlier, DB2’s online reorganization protocol is designed to be ultra-available and nonblocking, and can take some time to complete.
Suppose that you’ve been noticing a small performance degradation in some of your queries since a large number of updates occurred. After reviewing the built-in DB2 OVERFLOW_ACCESSES
monitor element, you realize that your table has some overflow and pointer records that are affecting the performance of your mission-critical applications. You obviously want to collapse them, but you don’t want to perform a complete table reorganization. The DB2 10.5 fastpath option directly addresses this scenario with the new CLEANUP OVERFLOWS
option on the REORG TABLE
command. For example:
This reorganization operation simply scans the table for overflow/pointer pairs and converts each pair into a normal record. Because this operation is limited to cleaning up overflow records alone, it executes much more quickly than a full reorganization operation.
Support for Adaptive Compression with Online, In-place Reorganization
Have you ever tried to perform an online, in-place table reorganization on a table that is using the adaptive compression feature that was introduced in the DB2 10.1 release? If you did, you likely weren’t pleased with the results: a “not supported” SQL error code. To be honest, this was simply a piece of work that didn’t get done in time for the general availability of the DB2 10.1 release, and for one reason or another (we were really going all in on the stuff we were coding up for DB2 10.5) it didn’t make it into a Fix Pack, and thus it makes its debut in DB2 10.5.
Support for ADMIN_MOVE_TABLE Routine with Referential Constraints
Have you ever tried to use the ADMIN_MOVE_TABLE
routine to move a table to a new table space, or make some general schema changes while maintaining online access to the table? If you had referential constraints defined on this table, you’d find out that you couldn’t do that. In DB2 10.5, you can.
Wrapping It Up…
In this chapter, we talked about some of the high-availability enhancements that get overshadowed when you look at all the availability work that’s gone into the DB2 pureScale 10.5 feature set. We also discussed how the ability to reclaim space from insert time clustering tables becomes easier in DB2 10.5, as does the management of tables with overflow and pointer records. Finally, we covered new support for some things that you couldn’t do before, such as invoking the ADMIN_MOVE_TABLE
routine on a table with referential constraints, or performing online reorganization operations on tables using adaptive compression. All in all, DB2 10.5 “raises the bar” for availability.