EXPLAIN Explained: Recent Changes That Let Us Use EXPLAIN Better
By Jim Dee

Introduction and Basics

This is the first half of a two-part article aimed at DBAs and application developers who use DB2 for z/OS. Both articles will explore a use of EXPLAIN which may be a little different from what you’re used to. The idea is to store access path data over time, and use this data to detect trends and to proactively identify issues for more detailed investigation. This article will review the basics of EXPLAIN and describe how to store trending data in your explain tables, with some examples of SQL to extract value from this. Then we will explore recent enhancements to DB2 for z/OS which allow us to more accurately simulate our production environment in test. The second half of the article will discuss the use of profiles and get into more enhancements which have given us the ability to improve optimization by passing more accurate data to the optimizer: virtual indexes, hints and APREUSE, and selectivity profiles.

A basic premise of this article is that the optimizer is accurate most of the time, and that the explain tables provide information which can be used to quickly narrow down the list of potential SQL problems associated with application changes, DB2 version upgrades, application of maintenance, and other environmental changes. I’m assuming you want to strive for continual improvements in SQL performance but minimize the risks of doing so. I assume that you’re familiar with the explain tables (PLAN_TABLE and its associated tables) and that you understand the basics of the EXPLAIN command or BIND with EXPLAIN(YES).

Access path stability

Access path stability (sometimes called “plan management”) is an important feature of DB2 that was added in DB2 9 and has been enhanced in DB2 10 and 11.
For static SQL, it provides the ability to “undo” a BIND which has experienced access path regression (or even avoid the BIND altogether!), but as we will see, it provides a lot more than that. Note that many of the new BIND options can be applied only to a package which was bound in DB2 9 or later. Space does not permit me to go into detail about access path stability, but the rest of this article assumes you are using at least PLANMGMT BASIC, and much of this material is about exploiting the access path data stored in the explain tables, as well as the package information now stored in the DB2 catalog.

Get Current with EXPLAIN

You have to cover the basics first. Make sure that you use the most current set of EXPLAIN tables (21 in DB2 11, at last count) for the version of DB2 you are running on. You can find the DDL to create this set of tables in SDSNSAMP(DSNTESC). If you’re planning to migrate to a new DB2 version, IBM allows you to work with these tables in the previous release, so you may want to get ahead of the game if you’re running on DB2 10, and install the DB2 11 format of the EXPLAIN tables now.

Note that the tablespaces created by this member are all defined as UNICODE. This is demanded in DB2 10 NFM and in DB2 11. If you have existing EBCDIC tables, you will have to DROP the tablespaces and CREATE new ones. You can run the DSNTIJPM job to identify PLAN_TABLES found in a subsystem which are not at the required level. IBM provides a PTF to create DSNTIJPM (with a different name) in the previous version of DB2; the PTF to provide DSNTIJPB in DB2 10 is UK98219.

Another change in DB2 10 is the replacement of the TIMESTAMP column with EXPLAIN_TIME. If you have SQL that reports on when EXPLAIN occurred, the new column is much more meaningful. It reports the time of cache entry for cached statements, or the time of BIND or EXPLAIN for those not in the cache.

Detect Trends and Changes

In addition to regular performance monitoring of your SQL applications, you can “monitor” your access paths by saving historical EXPLAIN data. Let’s discuss doing this with static SQL first; we’ll move on to dynamic SQL later. Assuming you BIND with EXPLAIN(YES), you might execute something like the following after every BIND or REBIND to see cost trends at a glance.

    SELECT SUBSTR(PL.COLLID,1,10) AS COLLID,
           SUBSTR(PL.PROGNAME,1,10) AS PROGNAME,
           DATE(PL.EXPLAIN_TIME) AS DATE,
           TIME(PL.EXPLAIN_TIME) AS TIME,
           COUNT(PL.QUERYNO) AS "STMT COUNT",
           DEC(SUM(ST.TOTAL_COST),8,2) AS "TOTAL COST"
      FROM SJD.PLAN_TABLE PL,
           SJD.DSN_STATEMNT_TABLE ST
     WHERE PL.COLLID = ST.COLLID
       AND PL.PROGNAME = ST.PROGNAME
       AND PL.EXPLAIN_TIME = ST.EXPLAIN_TIME
       AND PL.QUERYNO = ST.QUERYNO
     GROUP BY PL.COLLID, PL.PROGNAME, PL.EXPLAIN_TIME
     ORDER BY PL.COLLID, PL.PROGNAME, PL.EXPLAIN_TIME;

The output might look as follows for a very simple case.

    ---------+---------+---------+---------+---------+---------+---------+
    COLLID      PROGNAME    DATE        TIME      STMT COUNT  TOTAL COST
    ---------+---------+---------+---------+---------+---------+---------+
    MYCOLL      MYPACK      05/08/2014  11.19.38           5       10.55
    MYCOLL      MYPACK      05/08/2014  17.36.17           8       19.03
    DSNE610I NUMBER OF ROWS DISPLAYED IS 2

A few words are in order here about the “cost” column. First, please remember that the absolute value is meaningless; only the relative values matter. Theoretically, a lower cost is good and a higher cost is bad, but it’s worth analyzing any change in detail. Remember to consider and allow for changes in the sizes of the DB2 objects being accessed; if your business is growing, it is reasonable to expect that SQL costs will increase with time. Also, maybe the cost of a SQL statement increased because you changed your RUNSTATS options and the optimizer now has more detailed and accurate statistics to work with; in this case, the cost might increase and be more accurate whether the access path has changed or not.

The other information that is missing is the number of executions for each SQL statement. To accurately assess the relative impact, you could create a copy of PLAN_TABLE with an additional column like “EXECUTION_COUNT” (maybe a FLOAT column) and populate it with your estimates or data from traces or a monitor; that will “weight” the cost of each SQL statement.

The next logical step is to identify the statements for which cost changed and any new or deleted statements. In my simple example, we could just list the rows in the PLAN_TABLE but packages in the real world typically have many more than eight statements! If you have coded a unique value for QUERYNO in each SQL statement, or if your program code has not changed, then the SQL is relatively simple. This SQL is as follows.

    SELECT ST1.QUERYNO AS QUERYNO,
           COALESCE(DEC(ST1.TOTAL_COST,8,2),0) AS "OLD COST",
           COALESCE(DEC(ST2.TOTAL_COST,8,2),0) AS "NEW COST"
      FROM SJD.DSN_STATEMNT_TABLE ST1
           FULL JOIN SJD.DSN_STATEMNT_TABLE ST2
           ON ST1.QUERYNO = ST2.QUERYNO
     WHERE ST1.COLLID = 'MYCOLL'
       AND ST2.COLLID = 'MYCOLL'
       AND ST1.PROGNAME = 'MYPACK'
       AND ST2.PROGNAME = 'MYPACK'
       AND DATE(ST1.EXPLAIN_TIME) = '2014-05-08'
       AND TIME(ST1.EXPLAIN_TIME) = '17.36.17'
       AND DATE(ST2.EXPLAIN_TIME) = '2014-05-15'
       AND TIME(ST2.EXPLAIN_TIME) = '13.05.14'
       AND ST1.TOTAL_COST <> ST2.TOTAL_COST
     ORDER BY QUERYNO;

The results might look as follows:

    ---------+---------+---------+---------+----
    QUERYNO   OLD COST   NEW COST
    ---------+---------+---------+---------+----
        353        .15        .11
        360       8.07      16.24
    DSNE610I NUMBER OF ROWS DISPLAYED IS 2

After identifying the appropriate QUERYNO values, you can then retrieve the PLAN_TABLE row from each EXPLAIN and compare the access paths to find why calculated costs changed. I’m assuming here that the packages are not versioned; if they are, you need to add predicates for VERSION to the above SQL.

If like most of us, you have not coded QUERYNO in each of your SQL statements and the QUERYNO values have changed, this SQL gets a little more difficult. This SQL relies on the fact that code is usually added in chunks, and therefore the QUERYNO will change by a constant amount.

    WITH FIR(MIN_QNO, MAX_QNO) AS
           (SELECT MIN(QUERYNO), MAX(QUERYNO)
              FROM SJD.PLAN_TABLE
             WHERE COLLID = 'MYCOLL'
               AND PROGNAME = 'MYPACK'
               AND DATE(EXPLAIN_TIME) = '2014-05-08'
               AND TIME(EXPLAIN_TIME) = '11.19.38'),
         SEC(MIN_QNO, MAX_QNO) AS
           (SELECT MIN(QUERYNO), MAX(QUERYNO)
              FROM SJD.PLAN_TABLE
             WHERE COLLID = 'MYCOLL'
               AND PROGNAME = 'MYPACK'
               AND DATE(EXPLAIN_TIME) = '2014-05-08'
               AND TIME(EXPLAIN_TIME) = '17.36.17')
    SELECT ST1.QUERYNO AS "OLD QUERYNO",
           ST2.QUERYNO AS "NEW QUERYNO",
           COALESCE(DEC(ST1.TOTAL_COST,8,2),0) AS "OLD COST",
           COALESCE(DEC(ST2.TOTAL_COST,8,2),0) AS "NEW COST",
           SUBSTR(PACK.STATEMENT,1,60) AS STATEMENT
      FROM SJD.DSN_STATEMNT_TABLE ST1,
           SJD.DSN_STATEMNT_TABLE ST2,
           SYSIBM.SYSPACKSTMT PACK,
           FIR,
           SEC
     WHERE ST1.COLLID = 'MYCOLL'
       AND ST1.PROGNAME = 'MYPACK'
       AND ST2.COLLID = 'MYCOLL'
       AND ST2.PROGNAME = 'MYPACK'
       AND PACK.COLLID = 'MYCOLL'
       AND PACK.NAME = 'MYPACK'
       AND DATE(ST1.EXPLAIN_TIME) = '2014-05-08'
       AND TIME(ST1.EXPLAIN_TIME) = '11.19.38'
       AND DATE(ST2.EXPLAIN_TIME) = '2014-05-08'
       AND TIME(ST2.EXPLAIN_TIME) = '17.36.17'
       AND ((ST1.QUERYNO - ST2.QUERYNO) = (FIR.MIN_QNO - SEC.MIN_QNO)
         OR (ST1.QUERYNO - ST2.QUERYNO) = (FIR.MAX_QNO - SEC.MAX_QNO))
       AND ST2.QUERYNO = PACK.QUERYNO
       AND ST1.TOTAL_COST <> ST2.TOTAL_COST
     ORDER BY "OLD QUERYNO";

The SQL above will work if one or two groups of lines of code are added to your source.

The SQL below will identify new statements, and a very similar statement can be used to extract the statements that were removed. Please note that the extraction from SYSPACKSTMT is added for reference and to aid analysis; you may want to remove it if you have a large number of packages on your DB2 subsystem. Also, please note that this extraction is valid only if one of the packages you’re analyzing is the current one.

    WITH FIR(MIN_QNO, MAX_QNO) AS
           (SELECT MIN(QUERYNO), MAX(QUERYNO)
              FROM SJD.PLAN_TABLE
             WHERE COLLID = 'MYCOLL'
               AND PROGNAME = 'MYPACK'
               AND DATE(EXPLAIN_TIME) = '2014-05-08'
               AND TIME(EXPLAIN_TIME) = '11.19.38'),
         SEC(MIN_QNO, MAX_QNO) AS
           (SELECT MIN(QUERYNO), MAX(QUERYNO)
              FROM SJD.PLAN_TABLE
             WHERE COLLID = 'MYCOLL'
               AND PROGNAME = 'MYPACK'
               AND DATE(EXPLAIN_TIME) = '2014-05-08'
               AND TIME(EXPLAIN_TIME) = '17.36.17')
    SELECT ST.QUERYNO AS "NEW QUERYNO",
           COALESCE(DEC(ST.TOTAL_COST,8,2),0) AS COST,
           SUBSTR(PACK.STATEMENT,1,60) AS STATEMENT
      FROM SJD.DSN_STATEMNT_TABLE ST,
           SYSIBM.SYSPACKSTMT PACK
     WHERE ST.COLLID = 'MYCOLL'
       AND ST.PROGNAME = 'MYPACK'
       AND PACK.COLLID = 'MYCOLL'
       AND PACK.NAME = 'MYPACK'
       AND DATE(ST.EXPLAIN_TIME) = '2014-05-08'
       AND TIME(ST.EXPLAIN_TIME) = '17.36.17'
       AND ST.QUERYNO = PACK.QUERYNO
       AND NOT EXISTS
           (SELECT QUERYNO
              FROM SJD.DSN_STATEMNT_TABLE,
                   FIR,
                   SEC
             WHERE COLLID = 'MYCOLL'
               AND PROGNAME = 'MYPACK'
               AND DATE(EXPLAIN_TIME) = '2014-05-08'
               AND TIME(EXPLAIN_TIME) = '11.19.38'
               AND ((QUERYNO - ST.QUERYNO) = (FIR.MIN_QNO - SEC.MIN_QNO)
                 OR (QUERYNO - ST.QUERYNO) = (FIR.MAX_QNO - SEC.MAX_QNO)) )
     ORDER BY "NEW QUERYNO";

Since IBM added access path stability with DB2 9, we have had the ability to undo package changes after a REBIND which led to unexpected and unwelcome access path regressions. Getting information to find details about the change is also possible. The SYSPACKCOPY table was introduced in DB2 10, so now it is much easier to find information about previous copies of a package. If you bound and rebound with EXPLAIN, the following SQL will identify statements in your package for which the access path changed between the previous copy and the current one. We know that QUERYNO values did not change because the SQL has not changed.

    SELECT ST1.QUERYNO AS QUERYNO,
           COALESCE(DEC(ST1.TOTAL_COST,8,2),0) AS "PREVIOUS COST",
           COALESCE(DEC(ST2.TOTAL_COST,8,2),0) AS "CURRENT COST"
      FROM SJD.DSN_STATEMNT_TABLE ST1
           FULL JOIN SJD.DSN_STATEMNT_TABLE ST2
           ON ST1.QUERYNO = ST2.QUERYNO
     WHERE ST1.COLLID = 'MYCOLL'
       AND ST1.PROGNAME = 'MYPACK'
       AND ST1.EXPLAIN_TIME =
           (SELECT BINDTIME
              FROM SYSIBM.SYSPACKCOPY
             WHERE COLLID = 'MYCOLL'
               AND NAME = 'MYPACK'
               AND COPYID = 1)
       AND ST2.COLLID = 'MYCOLL'
       AND ST2.PROGNAME = 'MYPACK'
       AND ST2.EXPLAIN_TIME =
           (SELECT BINDTIME
              FROM SYSIBM.SYSPACKAGE
             WHERE COLLID = 'MYCOLL'
               AND NAME = 'MYPACK')
       AND ST1.TOTAL_COST <> ST2.TOTAL_COST
     ORDER BY QUERYNO;

In DB2 10, even if you did not remember to EXPLAIN when you bound any of the copies, you can now execute SQL something like the following to populate the PLAN_TABLE. To exploit this, you need only have bound the package in DB2 9 or later.

    EXPLAIN PACKAGE COLLECTION 'MYCOLL' PACKAGE 'MYPACK' COPY 'PREVIOUS';

Also, note that you can populate the PLAN_TABLE for all copies at once if you leave off the COPY option in the EXPLAIN PACKAGE statement. Please note that only the PLAN_TABLE and none of the other explain tables is populated, so it is still a good idea to remember to BIND and REBIND with EXPLAIN. Otherwise, you will not be able to run the cost comparisons above.

You can get the most out of your explain history if you use it proactively, to avoid binding (and promoting your code to go with it) if doing so will cause access path regression. To accomplish this, you can specify a BIND with “EXPLAIN ONLY”, which will populate all the explain tables without creating a new copy of the package. Then you can run the SQL we’ve discussed to verify that the optimizer anticipates the same or lower costs before actually doing the BIND or REBIND.

Another option we now have in DB2 10 and later is to specify a REBIND with APCOMPARE(ERROR) to stop the REBIND if any of the access paths have changed. The relative advantage of EXPLAIN ONLY is that you may find access path changes that you want to happen, because of lower costs. Last, remember that you can also specify REBIND with APCOMPARE(WARN) if you like living dangerously; in this case, the new package will be created but any changes in access path will cause messages to be issued.

Copy prod stats, copy processor info

There is yet another improvement we can make to this set of techniques. Everything discussed so far has assumed that the explain information is valid; in other words, we’ve talked about doing our analysis on the production system. Why not do this analysis on our test DB2, where the changed code is developed? We’ve been able to copy RUNSTATS statistics used by the optimizer from production to test for a long time, but the ability to simulate CPU speeds, buffer pool settings, etc., is relatively new.

The procedure and SQL to copy production statistics to test is documented in Chapter 50 of the “DB2 for z/OS 11 Managing Performance” manual referenced in the bibliography. The procedure to support modeling the environment is in Chapter 49 of the same book. You still must allow for different DSNZPARM settings, different versions of DB2, or different levels of z/OS.

Conclusion

I hope this article has given you some ideas to consider. In the second half of the article, we will look into some of the features IBM has added recently, to give you more control over the optimization process. Please send me an email with your experiences, questions, or comments. I can be reached at jim_dee@bmc.com.

Bibliography

Willie Favero’s blog at http://it.toolbox.com/blogs/db2zos/this-apar-is-just-way-too-cool-to-just-be-anapar-61153?rss=1 is about the SYSPROC.ADMIN_EXPLAIN_MAINT stored procedure. This is a tool to automate many aspects of EXPLAIN table maintenance. It was introduced with a PTF in DB2 9 and is of course available in DB2 10 and DB2 11.

DB2 11 for z/OS Performance Topics (Draft), SG24-8222-00. International Business Machines Corporation, 2014.

DB2 for z/OS 11 Managing Performance, SC19-4060-02. IBM Corporation.

DB2 for z/OS 11 Command Reference, SC19-4054-01. IBM Corporation.

DB2 for z/OS 11 SQL Reference, SC19-4066-02. IBM Corporation.

Biography

Jim Dee is the Chief Architect for DB2 for z/OS at BMC Software. He has worked at BMC since 1990, mostly in DB2 Backup and Recovery, as a developer, product author, and architect, and has held his current position as Chief Architect since 2007. Jim can be reached at jim_dee@bmc.com.
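
Supplementary sketch: weighting cost by execution count

The article suggests weighting each statement's cost by how often it executes, but gives no SQL for it. The following is only a minimal sketch of one way this might look, not the author's method. Instead of a copy of PLAN_TABLE with an extra column, it assumes a hypothetical side table, SJD.STMT_EXEC_COUNT, keyed by COLLID, PROGNAME, and QUERYNO, which you would populate yourself from estimates, traces, or a monitor. The table name, its layout, and the "WEIGHTED COST" column are all assumptions made for illustration; only DSN_STATEMNT_TABLE and its columns come from DB2.

```sql
-- Hypothetical side table (an assumption for this sketch, not part of DB2):
-- one row per statement, populated from your own estimates or monitor data.
CREATE TABLE SJD.STMT_EXEC_COUNT
      (COLLID          VARCHAR(128) NOT NULL,
       PROGNAME        VARCHAR(128) NOT NULL,
       QUERYNO         INTEGER      NOT NULL,
       EXECUTION_COUNT FLOAT        NOT NULL);

-- One row per EXPLAIN of the package: the sum of each statement's
-- estimated cost multiplied by its expected number of executions.
-- Statements with no row in the side table are weighted as one execution.
SELECT SUBSTR(ST.COLLID,1,10)   AS COLLID,
       SUBSTR(ST.PROGNAME,1,10) AS PROGNAME,
       DATE(ST.EXPLAIN_TIME)    AS DATE,
       TIME(ST.EXPLAIN_TIME)    AS TIME,
       DEC(SUM(ST.TOTAL_COST * COALESCE(EC.EXECUTION_COUNT, 1)),15,2)
                                AS "WEIGHTED COST"
  FROM SJD.DSN_STATEMNT_TABLE ST
       LEFT JOIN SJD.STMT_EXEC_COUNT EC
       ON  EC.COLLID   = ST.COLLID
       AND EC.PROGNAME = ST.PROGNAME
       AND EC.QUERYNO  = ST.QUERYNO
 WHERE ST.COLLID   = 'MYCOLL'
   AND ST.PROGNAME = 'MYPACK'
 GROUP BY ST.COLLID, ST.PROGNAME, ST.EXPLAIN_TIME
 ORDER BY ST.COLLID, ST.PROGNAME, ST.EXPLAIN_TIME;
```

Because the side table is keyed by QUERYNO, its rows would need to be refreshed whenever a source change shifts the QUERYNO values, unless QUERYNO is coded explicitly in each statement. As with the unweighted totals, only the relative values between EXPLAINs are meaningful.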