1
All Aboard! 15-Minute DB2 10.5 Tour Starts Here
Being a CIO or CTO is tough these days. Think about it: these executives face challenges that are so wide and varied that it’s dizzying, and you sometimes wonder how anything gets done. From bring your own device (BYOD), to separation of duty (SoD) and separation of concern (SoC), to high availability (HA) and disaster recovery (DR), to on-premise or off-premise cloud, columnar, and Big Data, there are more than enough buzzwords and real challenges to go around. The results of the most recent IBM CEO study, “Leading Through Connections” (www.ibm.com/services/us/en/c-suite/ceostudy2012/), highlighted that technology is the leading force for impacting immediate business, with people skills and market factors following closely behind. So how does a business deliver innovation and new systems when this study concluded that more and more of the relative proportion of IT budgets is eaten up maintaining existing systems? This same study noted that only one in five organizations allocates more than 50 percent of its IT budget to new projects, which is a sobering thought.
Supporting the conclusions of this IBM study, research from IDC’s “Converged Systems: End User Survey Results” (September 2012, Doc# 236966) reported that the proportion of IT budgets that is allocated to server management and administration is a whopping 68 percent. Of more concern is what this proportion used to be in 1996—29 percent! Oddly enough, this change, which resulted in many IT shops being thought of as cost centers rather than critical business innovation centers, has occurred during a time when many data center operations have been sent offshore to emerging economies such as Brazil, Russia, India, and China (often referred to as BRIC countries). This was supposed to reduce costs; however, administrative costs (which are people costs) are ironically going through the roof.
The rate of technological change and the realities of a Big Data world create even more pressure on these precious IT resources. The mobile-social-cloud evolution completely changes the volume, veracity, velocity, and variety characteristics of the data you work with every day. From BYOD to the fact that 90 percent of mobile users keep their devices within arm’s reach 100 percent of the time, it all means more data to manage, the extinction of batch windows, and continuous availability. These are all trends whose funding falls into the server spending and administration category of the IT shop.
Of course, all of this means that there’s less money for innovative projects and new use cases, which brings us back to the results of the IBM study: More and more IT budgets are getting allocated to “keeping the lights on” instead of focusing on new projects and capabilities to drive growth.
With this in mind, it’s no wonder that clients are looking for more efficient ways to work with their data. They are struggling to find new agile development methods, reduce management overhead, and fully optimize the use of their existing investments to free up capital for new investments. Although it’s fair to note that investing in technologies like multicore servers and solid-state disks (SSDs), among others, can help solutions run faster and more efficiently, solely investing in these technologies is not enough! For example, are you fully exploiting your hardware systems? Have you noticed that multicore systems don’t always make the software run faster? We’ve all had this experience at home: We get the latest quad-core machine and things don’t run twice as fast as they did on our old dual-core laptop. After all, Intel, AMD, and Power processing architectures all have built-in massive parallelism capabilities. The real question is: “Was my software written to fully exploit these capabilities?”
These challenges served as the inspiration behind the last decade of DB2 innovations, and the latest DB2 10.5 release is no exception. IBM has made significant investments that won’t just flatten the time to value for your solutions, but also will help you to derive the greatest benefit from the investments you’ve already made. For example, unlike other in-memory columnar offerings, DB2 isn’t going to force you to toss out your existing hardware to run it; in fact, the opposite is true. Your existing hardware will do even more for your solutions. For example, consider what Mohankumar Saraswatipura, the lead DBA at Reckitt Benckiser, had to say about DB2 10.5: “The performance of DB2 10.5 with BLU Acceleration is quite amazing. We ran our tests on a system that is about 16 times less powerful than our current production system. And yet BLU was able to outperform our production system in every aspect.” (Reckitt Benckiser makes a large assortment of award-winning nameplate products that you see at the grocery store and likely use every day—Lysol, for example, and who can forget the memorable catchphrase “Calgon, take me away!”)
DB2 10.5 incorporates leading technology from our research and development labs and delivers a groundbreaking DB2 release that includes the following enhancements:
• BLU Acceleration delivers incredible performance speed-up and compression ratios for analytical workloads, and it’s delivered with a “Load and Go” operational paradigm that’s more about what you don’t have to do from an administration perspective than what you have to do. This technology comes from a number of innovations from IBM Research and includes groundbreaking techniques such as dynamic in-memory processing and more.
• DB2 pureScale enhancements include new disaster recovery options for even higher availability, removal of planned maintenance outages, miscellaneous performance enhancements, elastic scalability, platform and interconnect flexibility, and more.
• NoSQL-style JSON Document Store delivers support for DB2 operating as a document store for JavaScript Object Notation (JSON) documents. This support includes plug-and-play support for MongoDB-created applications. (Remember that DB2 10 introduced NoSQL support for graph databases and has long had truly native XML document capabilities with its native pureXML store.)
• Cloud extensions and features make DB2 an optimal database for on- or off-premise cloud deployments. For example, both the DB2 NoSQL JSON Document Store and BLU Acceleration are cloud optimized.
• Oracle compatibility enhancements inflate the compatibility ratio of Oracle PL/SQL features that run natively in DB2 to such an extent that even well known Oracle pundits have taken notice, claiming their PL/SQL book samples can be used on DB2!
• Miscellaneous enhancements, such as various nips, tucks, and some neat new stuff, make DB2 more available, scalable, and consumable; these are the kind of features that never make it into the mainstream marketing messages, but you’re glad they’re in the product.
There is, of course, a lot more to the DB2 10.5 release than what we’ve listed here, and we cover a lot of it in this book. Don’t let the point release naming fool you: DB2 10.5 is an inflection point release that’s going to help you analyze your Big Data more quickly, compress your data to a fraction of what it was, keep you online, flatten your project’s time to market curve, and help you take advantage of the social-mobile-cloud opportunity.
While this chapter can’t give you a set of abs or an Olympic-sculpted build in 15 minutes, it can give you a quick recap of what was delivered in the DB2 10.1 release and a high-level tour of the DB2 10.5 release. After reading this chapter, you’ll have a good idea of the new opportunities that are open to you if you build your application to the DB2 platform. Each of the aforementioned themes is detailed in the remainder of this book, where we frame the business challenge, the solution, and the DB2 approach. Sometimes we contrast how things are done in DB2 compared to other marketplace vendors, sometimes we contrast to the way things were done in previous releases of DB2, and, of course, we explain how the new features work.
What Was Delivered: A Recap of DB2 10.1
The DB2 10.1 release included some special innovations to help you deliver solutions in a more agile, higher-performing, and more available manner than you ever could before. In this section, we highlight some of the more important features that were delivered in the DB2 10.1 release. We wrote a book about this release, which you can download for free (http://tinyurl.com/l2dtlbf) if you want all the details. By the way, if you’re currently running DB2 9.7, you might want to consider migrating directly to DB2 10.5, because you’ll get all of the great features mentioned in this section, as well as the ones we cover in the remainder of this book, with only a single migration effort.
DB2 10.1 Delivered Performance Improvements
DB2 10.1 was a fast release! A lot of work went into making sure that our clients received “out-of-the-box” performance improvements in the neighborhood of 35 percent (our lawyers want us to tell you that any results we mention in this book were observed in controlled environments and you might not get exactly the same results; we told our lawyers that some of our clients are experiencing even better results, so we invite you to try the technologies out for yourself and decide). New performance features debuting in DB2 10.1 included intraquery parallelism, new parallel algorithms, and a brand-new zigzag join operation that we’ve seen run commonplace queries for which it was designed three times faster than in previous releases! There are also performance enhancements to RUNSTATS, statistical views, index optimizations, hash joins, code path optimizations, range partitioning ATTACH operations, CPU workload allocation controls, and more.
DB2 10.1 Delivered Even Lower Storage Costs
If you haven’t already implemented DB2 compression, you’re using too much disk space for your indexes, writing out larger logs and backups, and not using your server’s memory allocations as effectively as you could. Implementing DB2 compression not only helps you save on disk space; in many cases, your applications will also run much faster! DB2 has delivered market-leading innovations around compression for almost a decade, and DB2’s compression capabilities represent one of the most popular feature sets in the history of DB2. After all, there’s a reason why Information Week stated that “[DB2] row-level compression is a revolutionary development that will leave Oracle and Microsoft green with envy.”
Clients that implemented DB2 compression before DB2 10.1 used a static table-wide dictionary and realized overall table compression savings of around 60 percent (typically, the largest tables would compress the most, at around 80 percent, but it all depends on the data). The DB2 9-point releases (DB2 9.5 and DB2 9.7) not only made compression autonomic, but also broadened the scope to database-wide compression with the addition of index, temporary-space (to the best of our knowledge, DB2 is still the only database in the world that can do this), inline-LOB, and XML compression. This had the net effect of even further boosting overall database compression ratios.
In DB2 10.1, adaptive compression was added to the mix. Adaptive compression added page-level dictionaries to the existing repertoire of compression techniques, giving DB2 the added ability to compress data at the page level (dynamic compression). These page-level dictionaries are automatically updated as new data is added. With this synergy, we saw average table compression ratios grow to about 70 to 80 percent. Of course, this technology had a beneficial effect on overall database compression ratios too. DB2 10.1 also added log compression, which even further compresses the overall database size.
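To make the interplay between a table-wide dictionary and page-level dictionaries more concrete, here is a minimal Python sketch. It is only an illustration of the two-level idea described above, not DB2’s on-disk format or algorithm; the function names, the dictionary sizes, and the sample city values are all assumptions made up for this example.

```python
from collections import Counter

def build_dictionary(values, max_entries):
    """Map the most frequent values to small integer codes."""
    most_common = Counter(values).most_common(max_entries)
    return {value: code for code, (value, _count) in enumerate(most_common)}

def compress_page(page, table_dict, page_dict_size=2):
    """Encode a page with the table dictionary first, then a page-local one."""
    # Values the table-wide dictionary missed are candidates for the page dictionary.
    leftovers = [v for v in page if v not in table_dict]
    page_dict = build_dictionary(leftovers, page_dict_size)
    encoded = []
    for value in page:
        if value in table_dict:
            encoded.append(("T", table_dict[value]))   # table-level code
        elif value in page_dict:
            encoded.append(("P", page_dict[value]))    # page-level code
        else:
            encoded.append(("RAW", value))             # stored uncompressed
    return page_dict, encoded

def decompress_page(page_dict, encoded, table_dict):
    """Reverse compress_page using both dictionaries."""
    table_rev = {code: value for value, code in table_dict.items()}
    page_rev = {code: value for value, code in page_dict.items()}
    lookup = {"T": table_rev, "P": page_rev}
    return [payload if kind == "RAW" else lookup[kind][payload]
            for kind, payload in encoded]

# Hypothetical data: "Lyon" is rare table-wide but dominates the second page.
pages = [["Toronto", "Toronto", "Ottawa", "Toronto"],
         ["Toronto", "Lyon", "Lyon", "Lyon"]]
table_dict = build_dictionary([v for page in pages for v in page], max_entries=1)
page_dict, encoded = compress_page(pages[1], table_dict)
```

In this toy data set, the table-wide dictionary only has room for the globally dominant value, so the page-local dictionary picks up the value that is common on one page but rare overall. That is the gap that page-level dictionaries are meant to close.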
How effective has the nonstop investment in compression been for DB2 clients? One of our clients moved from Oracle to DB2 in April 2008. Before moving, they had almost a terabyte of data. After moving to DB2 and capturing a half-decade of additional production data, their database size hasn’t changed! We think that this is a great example of how IBM’s ongoing investment in storage optimization empowers our clients to do more with less. How will the new BLU Acceleration capabilities that are part of DB2 10.5 help out this producer of one of the world’s most iconic brands? Their lead DBA told us, “Just when I thought things couldn’t get any better, BLU Acceleration came along.”
DB2 10.1 also introduced multitemperature storage management, which gave DBAs a management-by-intent framework whereby business policies could be applied for the most effective use of high-speed storage, such as SSD. Specifically, DB2 10.1 gave administrators the ability to create storage pools with specific devices assigned to them; for example, the fastest disks could be assigned to a HOT pool, typical disks to a WARM pool, and older disks relegated to unimportant or archive work in a COLD pool. New table spaces could simply point to these storage pools and inherit the performance characteristics of their devices.
DB2 10.1 Delivered Improved Availability and Scalability
In DB2 10.1, the DB2 pureScale code base was merged with DB2, eliminating the need to install this industry-leading technology as a separate product (which is how it was done when it made its debut in DB2 9.8). This integration gave clients support for range partitioning, table space–level backups, and Workload Manager (WLM) integration, among other notable features. Those clients who were reluctant to deploy InfiniBand in a pureScale configuration because it required extra work acquired the option to use the “secret sauce”: DB2 pureScale communication protocols, as of DB2 10.1, support RDMA over Converged 10Gb Ethernet (or RoCE, pronounced “rocky”) as an alternative interconnect.
Three significant features were also added for those clients who deploy DB2’s turnkey high-availability and disaster recovery (HADR) solution. HADR in DB2 10.1 can support up to three standby servers and enables you to implement a delayed APPLY to help prevent errant transactions from being applied on the backup server or avoid human error (the number-one cause of down time). For high-throughput environments, DB2 10.1 HADR includes support for transaction spooling, which can be useful in avoiding backpressure on the production server.
Finally, a new high-speed continuous data ingestion (CDI) utility was delivered as part of DB2 10.1. The multithreaded INGEST utility is a high-speed, client-side utility that streams data from files or pipes in various data formats into DB2 tables. If you’re familiar with DB2’s IMPORT and LOAD utilities, you can think of the INGEST utility as a cross between them that aims to combine the best features of both, as well as add some brand-new capabilities. INGEST keeps tables ultra available, because it doesn’t lock the target table; rather, it uses row locking to minimize its impact on concurrent activities against the same table. CDI is especially useful in situations where new files containing data are arriving all the time (in Big Data terms, we call this velocity). The data might be arriving continuously or trickling in from an extract, transform, and load (ETL) process. After experiencing just how beneficial this feature is, some clients have renamed their processes to extract, transform, and ingest (ETI)!
DB2 10.1 Delivered More Security for Multitenancy
Row and column access control (RCAC) support was added to DB2 10.1; in combination with the existing label-based access control (LBAC), security administrators get an even more extended arsenal for separation of duty, separation of concern, and principle of least privilege in their multitenancy database environments. Specifically, RCAC (or its common industry name, fine-grained access control, or FGAC) gives the security administrator (SECADM) new flexibility in defining what rows or columns users can access.
RCAC is based on a user’s database role. A DB2 role (introduced in DB2 9.5) entitles a user to retrieve only certain records from the database. For example, a bank teller might only be able to see the mortgage papers that were issued by his branch. Records that are outside of an individual’s role are never materialized (returned to the individual or part of interim processing answer sets). In addition, columns that are of a sensitive nature can be masked (think “*****”) or set to a NULL value. RCAC is extremely flexible and considerably simpler to implement than LBAC, which is more suited to organizations that have rigid hierarchy-based security roles, such as those you might see in a nation’s defense department.
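The bank teller scenario can be modeled in a few lines of Python. This is a conceptual sketch only: in DB2 itself, these rules are declared with CREATE PERMISSION and CREATE MASK statements and enforced by the database engine, whereas the role names, column names, and sample rows below are hypothetical and exist purely for illustration.

```python
# Hypothetical mortgage rows; all names and values are invented for this sketch.
MORTGAGES = [
    {"branch": "A", "customer": "Ana", "ssn": "123-45-6789", "balance": 250000},
    {"branch": "B", "customer": "Raj", "ssn": "987-65-4321", "balance": 410000},
]

def row_permission(user, row):
    """Row rule: auditors see every row, tellers only rows from their own branch."""
    if "AUDITOR" in user["roles"]:
        return True
    return "TELLER" in user["roles"] and row["branch"] == user["branch"]

def column_mask(user, column, value):
    """Column rule: the sensitive column is masked for everyone but auditors."""
    if column == "ssn" and "AUDITOR" not in user["roles"]:
        return "*****"
    return value

def select_mortgages(user, rows):
    """Apply the row rule first, so filtered rows never reach the result."""
    visible = (row for row in rows if row_permission(user, row))
    return [{col: column_mask(user, col, val) for col, val in row.items()}
            for row in visible]

teller = {"roles": {"TELLER"}, "branch": "A"}
auditor = {"roles": {"AUDITOR"}, "branch": None}
teller_view = select_mortgages(teller, MORTGAGES)
auditor_view = select_mortgages(auditor, MORTGAGES)
```

The point of the sketch is the ordering: rows outside the user’s role are dropped before any result is built, and masking is applied only to the rows that remain.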
DB2 10.1 Delivered the Dawn of DB2 NoSQL Support
DB2 10.1 introduced support for one of the four NoSQL database genres (Key Value, Columnar, Graph, and Document) with its support for a graph store that’s implemented using the World Wide Web Consortium’s (W3C) Resource Description Framework (RDF) standard and fronted by the popular JENA application programming interface (API). The RDF W3C standard describes relationships and resources in the form of subject-predicate-object. In the NoSQL world, these types of data entities and relationships are called triples, and they are stored in specialized triplestore databases. DB2 10.1 can function as a true triplestore database, providing the ability to store and query RDF data.
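If triples are new to you, the following Python sketch shows the subject-predicate-object shape and a wildcard pattern match over it. It is a toy in-memory model with made-up data, not the DB2 RDF store or the Jena API; the match function and its parameters are assumptions introduced for this example.

```python
# Each fact is one (subject, predicate, object) triple; the data is invented.
TRIPLES = [
    ("Alice", "worksFor", "IBM"),
    ("Bob", "worksFor", "IBM"),
    ("Alice", "knows", "Bob"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# "Who works for IBM?" expressed as a pattern with an unbound subject.
ibm_staff = [s for (s, _p, _o) in match(TRIPLES, predicate="worksFor", obj="IBM")]
```

A query language for RDF data is built around the same idea: you describe a pattern with some positions left unbound, and the store returns the bindings that satisfy it.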
In addition to supporting an RDF graph store, DB2 supports a W3C RDF query language called SPARQL, which is designed to retrieve data from an RDF graph store database and was devised by the W3C, the same people who developed many of the other standards on which the Web is based. SPARQL represents a strong foray by IBM into the NoSQL world. You’ll find out later in this book how DB2 implements a JSON document store that is all the rage in today’s mobile and cloud-dominated environment. BLU Acceleration includes columnar storage as one of its inspirations, and IBM InfoSphere BigInsights includes a nonforked Hadoop distribution and HBase, among other NoSQL technologies, such as Flume, Oozie, Hive, and more.
DB2 10.1 Delivered Even More Oracle Compatibility
Every DB2 release includes more Oracle compatibility features. We want to help you move to a better database! Included in DB2 10.1 were more flexible triggers that fire only once per SQL statement; locally defined data types, procedures, and functions in compound SQL statements; and a number of new scalar functions. With these enhancements, we’ve seen the ability for DB2 to natively process Oracle PL/SQL logic within single digits of 100 percent for the typical applications we see.
DB2 10.1 Delivered Temporal Data Management
Temporal data management is all about managing and processing data in relation to time. DB2’s support for system and business time (which we think is unique) enables you to automatically track and manage multiple versions of your data. When you define a system-period temporal table, you’re instructing DB2 to automatically capture changes to the state of your table and to save “old” rows in a history table, a separate table with the same structure as your base table. Whenever a row is updated or deleted, the “before image” of the affected row is inserted into this history table.
Through simple declarative SQL statements, you can instruct DB2 10.1 to maintain a history of database changes or track effective business dates automatically, eliminating the need for such logic to be hand-coded into triggers, stored procedures, or application code.
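The bookkeeping that a system-period temporal table takes off your hands can be sketched in Python as follows. This is a simplified model under stated assumptions: the class name, its methods, and the sample policy data are hypothetical, and DB2 does all of this inside the engine (queries can then ask for a past state with a FOR SYSTEM_TIME AS OF clause) rather than in application code like this.

```python
from datetime import datetime

class SystemPeriodTable:
    """Toy model: a current table plus a history table of "before images"."""

    def __init__(self):
        self.current = {}   # key -> (row, system_start)
        self.history = []   # (key, row, system_start, system_end)

    def upsert(self, key, row, now):
        # On update, the old version moves to history with its validity period.
        if key in self.current:
            old_row, old_start = self.current[key]
            self.history.append((key, old_row, old_start, now))
        self.current[key] = (row, now)

    def delete(self, key, now):
        old_row, old_start = self.current.pop(key)
        self.history.append((key, old_row, old_start, now))

    def as_of(self, key, when):
        """Return the version of the row that was current at time `when`."""
        if key in self.current:
            row, start = self.current[key]
            if start <= when:
                return row
        for hist_key, row, start, end in self.history:
            if hist_key == key and start <= when < end:
                return row
        return None

# Hypothetical insurance policy whose premium changes mid-year.
policies = SystemPeriodTable()
policies.upsert("P1", {"premium": 100}, datetime(2013, 1, 1))
policies.upsert("P1", {"premium": 120}, datetime(2013, 6, 1))
march_view = policies.as_of("P1", datetime(2013, 3, 1))
july_view = policies.as_of("P1", datetime(2013, 7, 1))
```

Every line of this class is logic that would otherwise live in triggers, stored procedures, or the application, which is exactly the hand-coding that the declarative approach removes.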
Introducing DB2 10.5: The Short Tour
No question about it, DB2 10.1 was an exciting release and has lots of great technology. If DB2 10.1 excites you, DB2 10.5 has the potential to blow you away, because beyond its enhancements of existing features, it introduces an inflection point technology known as BLU Acceleration.
In this section, we give you the short tour of the DB2 10.5 release. The purpose of this chapter is to introduce you to the key themes and technologies that debut in this release and that are described in the remainder of this book.
DB2 with BLU Acceleration
BLU Acceleration is one of the most significant pieces of technology that’s ever been delivered in DB2 and, we’d argue, in the database market in general. DB2 with BLU Acceleration delivers significant improvements in database compression and unparalleled performance improvements for analytic applications running on DB2. This technology is so important that Chapter 3 is dedicated to teaching you all about BLU, how it works, and what makes it so special.
There’s a lot of talk about in-memory columnar data processing in the industry today, but BLU Acceleration is so much more. DB2 actually sees memory as the new disk when it comes to performance optimization, in that you only go to that tier if you have to; it also doesn’t need to have all the data in memory. More about that later.
BLU Acceleration is not some new bolted-on storage engine that sits on top of DB2, or that requires you to upgrade your servers from among a small set of architectures. BLU Acceleration is part of the DB2 database engine’s DNA, and every facet of DB2 is aware of it. It’s not an afterthought, but a key component of the database. It uses the same pages, the same buffer pools, the same backup and recovery utilities, and more.
DB2 with BLU Acceleration includes a number of significant features that work together to make it an inflection point. We sometimes refer to this engineering as the “BLU Acceleration Seven Big Ideas.”
The First Big Idea: BLU Acceleration Is Simple (“Just Load and Go”)
BLU Acceleration is simple to implement and use. As we’ve said before, BLU Acceleration, from a management perspective, is more about what you no longer have to do than what you actually need to do. There’s no need to create indexes, reorganize tables, select a partitioning key, define multidimensional clustering tables, build materialized views, and so on. All you have to do is load the data into a table and go start running your queries.
We see this technology as being well suited to a multitude of scenarios and purposes, from instantly speeding up your SAP Business Warehouse (BW) environments (without the risk of application migration or tricky hardware swaps), to accelerating your Cognos environments (we call it “Fast on Fast”), to spinning off data marts and cubes for lightning-fast performance for agile campaign support, and more. Because DBAs don’t have to spend time on performance tuning, BLU Acceleration frees up their time to work on new value projects instead of trying to get value out of existing projects. BLU effectively enables DBAs to support more analytic applications and bring to the business more opportunities to make a difference. We’ve seen clients experience up to quadruple-digit performance speed-ups on some of their queries, with double-digit average improvements in query response time overall! (Note that we said average improvements in query response time—this is very different from typical marketing claims by other vendors who typically quote “highlight reel” single queries as opposed to average query speedup.)
Handelsbanken (described by Bloomberg as one of the most successful and secure banks in the world) stated that they “were very impressed with the performance and simplicity of BLU. We found that some queries achieved an almost 100 times [faster] speed-up with literally no tuning!” That sums up this big idea: load and go… go faster, that is.
The Second Big Idea: Extreme Compression and Computer-Friendly Encoding
If you thought that DB2 compression was great in DB2 10.1, then you’re in for a surprise when you try DB2 with BLU Acceleration. In our observations, clients are seeing ten-fold compression rates when they move tables into a BLU Acceleration format. All of this compression is done automatically for you; you don’t have to choose specific encoding mechanisms, or mix a myriad of high or low query and archive compression schemes to try to get it right. BLU Acceleration is about helping performance and compression.
BLU Acceleration uses a form of Huffman encoding to compress data, along with the more traditional adaptive dictionaries. By placing only values from an individual column on a page, DB2 is able to find more patterns and improve compression rates. As an added bonus, indexes, materialized query tables (MQTs), and other optimization objects aren’t required, so you eliminate the space and overhead of creating these objects—after all, there’s no point in compressing objects that you don’t need! Some of our clients have seen savings up to 25-fold when they consider the compression rates alongside the objects that they were able to drop from the database schema.
After a value has been compressed, DB2 combines it with other compressed values to create a record that fits into a CPU register. By organizing data in this way, DB2 is able to load blocks of data into a processor core without any additional formatting or overhead. This attention to detail is what puts the acceleration into BLU Acceleration. Finally, in most cases, DB2 with BLU Acceleration doesn’t need to uncompress the data when running a query.
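The following Python sketch illustrates the two ideas in this paragraph: packing several encoded values into one register-sized word, and answering a predicate by comparing codes rather than decoding values. To keep it short, it swaps in fixed-width 8-bit codes instead of the Huffman-style variable-length encoding mentioned earlier, and the 64-bit word size, function names, and sample column are assumptions for illustration only.

```python
CODE_BITS = 8                  # assumed fixed code width for this sketch
LANES = 64 // CODE_BITS        # eight 8-bit codes fit in one 64-bit word

def build_codes(column):
    """Assign a small integer code to each distinct value, in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(column)))}

def pack(codes):
    """Pack up to LANES codes into one 64-bit integer, first code in the low bits."""
    if len(codes) > LANES or any(c >= (1 << CODE_BITS) for c in codes):
        raise ValueError("codes do not fit into a single 64-bit word")
    word = 0
    for lane, code in enumerate(codes):
        word |= code << (lane * CODE_BITS)
    return word

def count_equal(word, n_codes, target_code):
    """Count lanes equal to target_code by comparing codes, never decoding values."""
    mask = (1 << CODE_BITS) - 1
    return sum(((word >> (lane * CODE_BITS)) & mask) == target_code
               for lane in range(n_codes))

# Hypothetical country column with eight values.
column = ["CA", "US", "US", "DE", "US", "CA", "US", "FR"]
codes = build_codes(column)
word = pack([codes[v] for v in column])
us_rows = count_equal(word, len(column), codes["US"])
```

The predicate country = 'US' is translated once into a code, and the scan then works entirely on small integers. The loop in count_equal visits lanes one at a time; hardware with vector instructions can compare all lanes of a word in a single operation.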
Triton’s head of DB2 Managed Services Team, Iqbal Goralwalla, told us that “when adaptive compression was introduced in DB2 10.1, having achieved storage savings of up to 70 percent, I was convinced this is as good as it gets. However, with DB2 10.5 and BLU Acceleration, I have been proven wrong! Converting my row-organized, uncompressed table to a column-organized table gave me a massive 93.5 percent storage saving!”
The Third Big Idea: Deep Hardware Exploitation
The third big idea revolves around exploiting existing hardware. Today’s processor architectures have wide registers that can hold multiple data values, along with the ability to perform a calculation on all of those values with a single instruction. This technology is called Single Instruction, Multiple Data (SIMD). Software that’s built to exploit SIMD (like DB2) can apply one instruction to several data values at once; in other words, SIMD multiplies the power of the CPU.
This built-in parallelism was never before exploited by DB2. Most vendors still don’t exploit it today, and those that do, from what we can tell, don’t do it to the extent that DB2 with BLU Acceleration does it. The DB2 value proposition here is that DB2 can further exploit the existing parallelism of your server instead of forcing you to buy new hardware to achieve similar results. You can take advantage of your existing hardware and get great performance from DB2.
To the best of our knowledge, DB2 with BLU Acceleration is the only technology of its kind that runs on IBM Power servers. Power servers are exceptionally good at SIMD exploitation because they typically have larger and wider registers in comparison to alternative architectures.
During the development of BLU Acceleration, we worked very closely with Intel to fully exploit their Advanced Vector Extensions (AVX) instruction set, which is available on Intel Xeon processor E5-based systems. Pauline Nist, general manager of Intel’s Enterprise Software Alliances, Datacenter & Connected Systems Group, summarized our joint engineering work by referencing an analytic test bed that Intel first ran on DB2 10.1 and then on DB2 10.5 with BLU Acceleration: “Intel is excited to see a 63x–133x (depending on the Intel processor) improvement in query processing performance using DB2 10.5 with BLU Acceleration over DB2 10.1.”
The Fourth Big Idea: Core-Friendly Parallelism
Modern chip architectures use a variety of cache levels to hold instructions and data. When looking for data, multilevel caches generally operate by checking the smallest Level 1 (L1) cache first, then the next larger cache (L2), followed by the shared cache (L3), before external memory (DRAM) is checked. The more data and instructions that you can keep in the CPU caches, the faster your workloads will complete.
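The lookup order just described can be modeled with a short Python sketch. The latency figures are round illustrative numbers chosen for this example, not measurements of any particular processor, and the function and variable names are hypothetical.

```python
# Illustrative access costs in CPU cycles; real values vary by processor.
LATENCY_CYCLES = {"L1": 4, "L2": 12, "L3": 40, "DRAM": 200}

def lookup(address, caches):
    """Check L1, then L2, then L3, then fall back to DRAM, summing the cost."""
    cost = 0
    for level in ("L1", "L2", "L3"):
        cost += LATENCY_CYCLES[level]
        if address in caches[level]:
            return level, cost
    return "DRAM", cost + LATENCY_CYCLES["DRAM"]

# Each larger cache level holds a superset of the smaller one in this toy setup.
caches = {"L1": {1}, "L2": {1, 2}, "L3": {1, 2, 3}}
l1_hit = lookup(1, caches)
dram_access = lookup(9, caches)
```

Even with made-up numbers, the shape of the result explains the design goal: a miss that falls all the way through to DRAM costs many times more than a hit in the first-level cache.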
BLU Acceleration, in combination with automated DB2 workload management, was designed to keep data in CPU caches as long as possible; in fact, its algorithms view spilling to DRAM in the same way as spilling to disk from a performance perspective. As a result, SQL requests that are running on a series of cores are prioritized so that one unit of work completes before another request is dispatched. This enables DB2 to achieve higher throughputs when workloads can be isolated to certain cores or sockets. By isolating workloads, DB2 can achieve higher cache coherency and eliminate low-level thrashing of the cache. The Reckitt Benckiser scenario referenced earlier in this chapter is a good example; even though they ran their tests in an environment that was 16 times less powerful than their production system, the test system outperformed the production system.
Other database products aren’t designed to exploit all of the cores on a server. A very common open-source relational database product is known for its scaling issues above a certain core threshold on a single server. This is exactly the pain point that BLU Acceleration addresses, because it was designed for today’s multicore systems. Kent Collins, a Database Solutions Architect at BNSF Railway, sums it up this way: “During our testing, we couldn’t help but notice that DB2 10.5 with BLU Acceleration is excellent at utilizing our hardware resources. The core-friendly parallelism that IBM talks about was clearly evident, and I didn’t even have to partition the data across multiple servers.”
The Fifth Big Idea: Column Store
There are many benefits that a column-organized approach brings to analytic workloads. Chief among them is the fact that column-organized tables typically compress much more effectively. The probability of finding repeating patterns on a page is very high when the data on the page is from the same column. Row-organized tables, on the other hand, store data from columns in the same row, and the data of those columns can vary widely, thereby reducing the probability of finding repeating patterns on a page.
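A small Python experiment shows why the layout matters. It uses run-length encoding (counting runs of identical adjacent values) as a simple stand-in for a real compressor; this is not the encoding that BLU Acceleration uses, and the three-column order table is invented for the example.

```python
from itertools import groupby

def run_count(stream):
    """Number of runs of identical adjacent values (fewer runs compress better)."""
    return sum(1 for _ in groupby(stream))

# Hypothetical table: (order_id, region, status), 1,000 rows.
rows = [(i,
         "EAST" if i < 500 else "WEST",
         "SHIPPED" if i < 900 else "PENDING")
        for i in range(1000)]

# Row-organized: values of different columns are interleaved on the "page".
row_stream = [value for row in rows for value in row]

# Column-organized: all values of one column are stored together.
column_stream = [row[c] for c in range(3) for row in rows]

row_runs = run_count(row_stream)
column_runs = run_count(column_stream)
```

In the row-organized stream, every value is followed by a value from a different column, so no two neighbors ever match. In the column-organized stream, the region and status columns each collapse into two runs, and only the unique order IDs resist compression.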
In DB2, column-organized tables can coexist with traditional row-organized tables; you’re not required to commit to one or the other approach for your entire database. And, because BLU Acceleration is built right into the DB2 engine, the SQL, optimizer, utilities, and other functions are fully aware of it.
The Sixth Big Idea: Scan-Friendly Memory Caching
Clients typically have a memory-to-disk ratio of 15 to 50 percent. What this means is that they can’t possibly fit their entire table, or even the data that is required to execute a complex query, entirely into memory. What does this mean for customers using DB2 with BLU Acceleration? Although DB2 would make good use of it, there’s no need to have excessive amounts of memory. BLU Acceleration was designed for the demands of a Big Data world where it is less and less likely that all of the data that queries need will fit into memory.
BLU Acceleration comes with a set of Big Data–aware algorithms for cleaning out memory pools that is more advanced than the typical “Least Recently Used” (LRU) algorithms that are associated with traditional technologies. These BLU Acceleration algorithms are designed from the ground up to detect “interesting” patterns and to hold those pages in the buffer pool as long as possible. The algorithms work side by side with DB2’s traditional row-based algorithms. The net result is that you can get significant performance boosts from data already in memory, even though the memory might not be as big as your tables.
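To see why plain LRU struggles with scans that are larger than memory, consider this Python simulation. The scan-friendly policy shown here is a textbook stand-in (evict the most recently used page) chosen because it is easy to follow; it is not the algorithm that BLU Acceleration uses, and the cache size and page counts are arbitrary assumptions.

```python
from collections import OrderedDict

def simulate(accesses, capacity, evict_most_recent):
    """Count cache hits; eviction removes either the newest or the oldest page."""
    cache = OrderedDict()              # ordered from least to most recently used
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)    # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=evict_most_recent)
            cache[page] = True
    return hits

# Five repeated scans over 10 pages with room for only 8 pages in memory.
scan = list(range(10)) * 5
lru_hits = simulate(scan, capacity=8, evict_most_recent=False)
scan_friendly_hits = simulate(scan, capacity=8, evict_most_recent=True)
```

Under LRU, each page is evicted shortly before the scan comes back around to it, so the cache never produces a hit even though it holds 80 percent of the data. A policy that keeps a stable subset of pages resident gets hits on every pass after the first, which is the behavior the paragraph above describes as getting value from memory that is smaller than your tables.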
The Seventh Big Idea: Data Skipping
DB2 keeps track of the minimum and maximum values that are found on a column-organized table’s data pages. This information is updated dynamically and automatically (you don’t have to manage anything) and is used by the query optimizer to skip those data pages that don’t contain values that are needed to satisfy the query. If you’re familiar with the Zone Maps used in the IBM PureData for Analytics offering (formerly known as Netezza), you can see where the inspiration for this feature came from; its net effect is a dramatic speed-up of query execution because a lot of unnecessary scanning is avoided.
A Final Thought Before You Delve into the Wild BLU Yonder
As you can see, BLU Acceleration is a remarkable innovation that’s part of the DB2 10.5 release. It gives you outstanding performance and extreme data compression without having to do complex tuning, or any tuning at all, for that matter. On average, the clients that we worked with have experienced 10-fold compression on their databases and found that their average query sets ran 10 to 25 times faster—and some even faster than that (of course, your results might vary). Mindray’s Xu Chang is a world-renowned expert on multiple database technologies. We found him on LinkedIn talking about his experiences with DB2 10.5: “While expanding our initial DB2 tests with BLU Acceleration, we continued to see exceptional compression rates—our tables compressed at over 92 percent. But our greatest thrill wasn’t the compression rates (though we really like those), but rather the improvement we found in query speed, which was more than 50 times faster than with row-organized tables.”
DB2 pureScale Goes Ultra HA, DR, and More…
DB2 10.5 contains a number of enhancements to pureScale, DB2’s continuous-availability solution that was first released on distributed platforms in 2009. DB2 pureScale technology is based on the “gold standard” DB2 for z/OS coupling facility (CF) and is available on both the Linux and AIX platforms. This technology enables multiple DB2 members to access the same database with full locking and logging integrity because the DB2 pureScale cluster caching facility (a software implementation of the CF, so we refer to it as the CF in this book) is very efficient at messaging, locking, and data movement among cluster members. It does this with next to no operating system overhead or noticeable impact on the database.
A number of neat features have been added to the pureScale technology in the DB2 10.5 release. First, DB2 pureScale now includes high-availability disaster recovery (HADR) support. HADR is useful, for example, when the primary data center goes down because of a complete power outage, flood, or other catastrophic event at the local site. HADR synchronizes data with a remote DB2 pureScale cluster, enabling that remote site to take over if your primary cluster goes down.
DB2 pureScale also gets a couple of new features that keep it available during planned maintenance windows. The first feature is the ability to add members online so that you can dynamically add more capacity to the system without shutting down the entire cluster. The second feature is online Fix Pack maintenance, which enables DB2 pureScale members to be brought down individually, have DB2 Fix Pack maintenance applied to them, and then be brought back online without affecting the availability of the application. After the entire cluster has been updated in this rolling fashion, it can be instructed to run the latest fix pack level. This rolling fix pack maintenance capability, along with the ability to perform rolling maintenance on the hardware, the operating system, the network, and other features, enhances DB2 pureScale’s ability to remain continuously available without planned maintenance interruptions.
DB2 10.5 also introduces enhanced lock management techniques to reduce batch overhead, randomized index keys to reduce index page hot spots, member subsets to manage multitenancy environments that differ in service-level agreements, and the ability to back up a DB2 pureScale cluster and restore it to a DB2 server that isn’t using the DB2 pureScale technology. Finally, the much-loved self-tuning memory manager (STMM) is now available on a per-member basis in a DB2 pureScale cluster.
DB2 as a JavaScript Object Notation Document Store
If you’re a die-hard relational DBA, we’re guessing that a year ago you had never heard of JavaScript Object Notation (JSON). Six months ago, you heard about it, but figured it would go away, just like your acid-washed jeans from the 1980s. Perhaps three months ago, you began hearing about it more and more, and the other day you woke up and said, “I’d better find out what JSON is all about, because his name keeps coming up everywhere I look, and I’ve never even met the man.” JSON isn’t a person; it’s a text-based open standard designed for data interchange that originated with JavaScript but has become ubiquitous. It’s the new XML in the application development and NoSQL worlds.
In addition to the giant step the DB2 10.1 release took into the NoSQL world by supporting a Jena Resource Description Framework (RDF) graph triplestore, DB2 10.5 adds support for the persistence of JSON objects in a document-style database. DB2 is embracing this use of JSON by implementing a Java-based JSON API and the popular MongoDB-compliant API as a first class citizen in DB2. The JSON API provides services to JSON applications that developers aren’t accustomed to in the NoSQL world; for example, atomicity and transactional control. But it still gives developers techniques like “fire and forget” to build Web 2.0 applications with speed and agility.
Quite simply, DB2 empowers you to use it as a NoSQL JSON document store, so that you can continue to take advantage of the flexibility of JSON from an application development perspective and still benefit from the use of DB2 as the robust database back end. You can use all of the DB2 facilities (such as replication, HADR, security, and other features) that aren’t necessarily present in some of the NoSQL database environments. Your developers can use this new JSON object model while ensuring that the data is kept in a robust, well-performing database engine like DB2.
Oracle Compatibility
To top off the DB2 10.5 release, there are a number of new Oracle compatibility features that help clients port their database applications from Oracle to DB2. These features enable them to break free of high maintenance costs and complexity, and leverage some of the most advanced database technology in the world today. Along with the addition of a number of new Oracle SQL functions, DB2 10.5 includes three new types of indexes and the ability to create tables whose fields are larger than the default data page.
There are three index enhancements in DB2 10.5. The first one enables you to create an index on an expression rather than use generated columns in the table. This type of index facilitates more efficient queries against the database and eliminates the storage requirements of generated columns, along with their maintenance headaches—not to mention it simplifies application development too.
The second index enhancement enables unique indexes to include NULL values. NULL values in unique indexes were a challenge for many clients because you couldn’t have more than one NULL value in a unique index. However, applications often require more than one unique value in a table (for example, EMPLOYEE_NUM and SSN). Although one value might be guaranteed to be available at insert time (EMPLOYEE_NUM), the second value might not be available yet. If more than one employee has forgotten to bring their SSN to work, the system cannot insert the record because that would duplicate the NULL value. As of DB2 10.5, multiple NULL values on a unique index are supported so that existing applications can be simplified and developers don’t have to worry about multiple NULL values causing integrity problems.
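The behavior described above can be modeled with a toy index. This is an illustrative sketch only: the `UniqueIndex` class is hypothetical and says nothing about how DB2 stores index keys. It shows only the rule that NULL (Python `None`) keys are left out of the duplicate check while real values must still be unique.

```python
class UniqueIndex:
    """Toy unique index that ignores NULL (None) keys when checking duplicates."""

    def __init__(self):
        self._keys = set()

    def insert(self, key):
        """Return True if the key is accepted, False if it is a duplicate."""
        if key is None:
            return True  # NULLs are not indexed, so they never collide
        if key in self._keys:
            return False
        self._keys.add(key)
        return True


ssn_index = UniqueIndex()
employees = [
    (1001, "123-45-6789"),
    (1002, None),           # SSN not available yet
    (1003, None),           # a second missing SSN is still accepted
    (1004, "123-45-6789"),  # a real duplicate is rejected
]
results = [ssn_index.insert(ssn) for _, ssn in employees]
print(results)  # prints "[True, True, True, False]"
```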
The third index enhancement is support for random index keys. Applications that typically have a huge number of inserts and updates occurring against the same index page can now have their keys randomly spread across multiple pages. The use of hashing can reduce the occurrence of “hot spots” in the index. This reduction of hot spot behavior is useful for some DB2 pureScale environments or online transaction processing (OLTP) systems.
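A small simulation shows the effect. The sketch below makes several assumptions for illustration: leaf pages are modeled as eight equal ranges of a 32-bit key space, and SHA-256 stands in for whatever randomization DB2 applies internally; `page_for` and `randomized` are hypothetical helpers.

```python
import hashlib
from collections import Counter

NUM_PAGES = 8        # hypothetical number of index leaf pages
KEY_SPACE = 2 ** 32  # hypothetical size of the ordered key space


def page_for(position):
    """Map a position in the ordered key space to a leaf page by range."""
    return position * NUM_PAGES // KEY_SPACE


def randomized(key):
    """Stand-in for key randomization: hash the key to a 32-bit position."""
    digest = hashlib.sha256(key.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:4], "big")


keys = range(1000, 1100)  # monotonically increasing keys, such as order numbers
ordered_pages = Counter(page_for(k) for k in keys)
random_pages = Counter(page_for(randomized(k)) for k in keys)
print(len(ordered_pages), len(random_pages))
```

All 100 sequential keys land on a single page in the ordered case, while the hashed keys are spread over several pages. The trade-off in this model is that hashed positions no longer follow key order, so the spread comes at the cost of ordered access through the index.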
The final compatibility feature enables developers to create tables whose row definitions are bigger than the current page size. Many Web 2.0 developers create tables with inflated column sizes, an approach that we don’t sanction for a variety of reasons that we cover later in this book. In DB2 10.5, if you insert a row that doesn’t fit on a single page, DB2 will spill portions of the row into a long field space. Of course, as you’d expect, DB2 automatically manages the spillover, so there’s nothing for a DBA to do here. Row management is done behind the scenes so that you’ll never get an error if you insert more data than what fits on the page.
Tools, Tools, Tools
DB2 10.5 has so many new features, you might wonder what additional tooling is available to help you administer the product. At the time that DB2 10.5 became generally available, a number of tools were updated to support the current release and some specific features:
• Day one support for DB2 BLU Acceleration: IBM InfoSphere Optim Query Workload Tuner V4.1 for DB2 for Linux, UNIX, and Windows gives expert advice on what tables to convert to column organization based on workload analysis, estimated benefit, and what-if analysis.
• DB2 pureScale Enhancements: The ability to perform rolling updates in pureScale has been incorporated into the Configuration Manager, along with improvements to the Task Assistant and the Script Scheduler.
• Additional Improvements: Data Studio now supports the setup and management of multistandby HADR, as well as the configuration of multiple federation objects. Recovery Expert includes support for adaptive compression and multitemperature storage. Merge Backup, Recovery Expert, and High Performance Unload have all been enhanced to support DB2 10.5.
The following list shows the products and tools that support DB2 10.5:
• IBM InfoSphere Data Architect V9.1 is a collaborative data design solution. It enables you to discover, model, relate, standardize, and integrate diverse and distributed data assets throughout your enterprise.
• IBM Data Studio V4.1 provides an integrated, modular environment for database development and administration of IBM DB2 for Linux, UNIX, and Windows. This software is available at no charge.
• IBM InfoSphere Optim Query Workload Tuner V4.1 for DB2 for Linux, UNIX, and Windows provides expert recommendations to help you improve the performance of query workloads.
• IBM InfoSphere Optim Performance Manager V5.3 for DB2 for Linux, UNIX, and Windows provides DBAs and other IT staff with the information that they need to manage performance proactively and avoid problems before they impact the business.
• IBM InfoSphere Optim Configuration Manager V3.1 for DB2 for Linux, UNIX, and Windows provides an inventory of clients and servers, tracks changes to client/server properties, and compares client/server environments with best-practice configurations.
• IBM InfoSphere Optim pureQuery Runtime V3.3 for Linux, UNIX, and Windows provides a runtime environment and an application programming interface (API) that enhances the performance of existing in-house applications without having to modify them.
• IBM DB2 Merge Backup V2.1 for Linux, UNIX, and Windows minimizes the impact of backups and shortens recovery times on production servers.
• IBM DB2 Recovery Expert V4.1 recovers database objects safely, precisely, and quickly without having to resort to full database recovery.
• IBM InfoSphere Optim High Performance Unload V5.1 for DB2 for Linux, UNIX, and Windows helps DBAs work with very large quantities of data with less effort and faster results.
• IBM InfoSphere Optim Query Capture and Replay V1.1 for DB2 for Linux, UNIX, and Windows captures production workloads and replays them in nonproduction environments for more realistic tests.
Many of these tools are provided as part of DB2 Advanced Workgroup Server Edition and DB2 Advanced Enterprise Edition. They can be purchased separately for the other DB2 editions.
Wrapping It Up…
In this chapter, we have given you a glimpse of the enhancements that we cover in this book. BLU Acceleration is one of the biggest innovations that DB2 has ever brought to market, if not the biggest. It’s a game changer. Nevertheless, continued investment in the key tenets that keep DB2’s marketplace edge (scalability, high availability, and performance) means that features pertaining to these tenets make it into every new DB2 release, and this release is no exception.
The world is changing, and in a mobile-social-cloud world, JSON is the new lingua franca. Mobile devices generate and consume data at unprecedented rates. As you’ve come to expect from DB2, the features that we deliver are consumable and comparatively light on the pocketbook; for example, BLU Acceleration is free for existing DB2 Advanced Enterprise Server Edition (DB2 AESE) customers, and we even introduced a brand-new Workgroup Advanced Edition to deliver this capability for small to medium-sized deployments with a price point that will make your jaw drop for what you get—in a good way.
We realize that your time is precious, and we want to thank you for at least getting to the end of Chapter 1. We hope that at this point you’re intrigued enough to continue reading. We promise that in return for the minimal investment you must make to read this entire book, you will have a great grasp of the DB2 10.5 release and of some major marketplace trends as well. Enjoy!