|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Oracle SQL Cost-based Optimizer Index Column OrderI have heard it said that because Oracle will reorder the WHERE clause statements to ensure that the leading edge of an index will be used if possible and, that because the new skip-scan index search method is available, it is not required to try to properly order the columns in a concatenated index. In tests against an Oracle database (9.0.1), I found this assertion, that the order of columns in a concatenated index is not important, to be false and can result in poorly performing queries. This chapter shows the results from some basic tests to try to disprove this statement.
Index Order and Structure
In a standard concatenated B-tree index, the first level will be based on the initial column values, and subsequent levels on the following columns. If the index is properly built, then the mapping from these levels into the source table will be fairly linear; that is, the index clustering factor will be close to the number of dirty blocks in the table. Exhibit 1 shows a properly ordered index in relation to its table.
Exhibit 1 -
A
Well-Ordered Index with a Low Clustering Factor If, on the other hand, an improper order is selected for the columns in the index, then the index will not be a linear match to the source table and a large clustering factor that is closer to the number of rows in the table will result. A large clustering factor results in more table reads to process the same amount of data as a small clustering factor. A poorly ordered index is shown in Exhibit 2.
Exhibit 2 -
A Poorly
Ordered Index with a High Clustering Factor A clustering factor can be equated to the number of table block reads that would be required to perform a full table scan based on reads from the index. A large clustering factor indicates that table blocks would be required to be accessed more than once for a particular index scan operation. The order in which a table’s columns are specified in a concatenated B-tree index can make a significant difference in the indexes ordering, and hence in its efficiency. Let’s examine some examples.
Examples Using Various Column Orders
in an Index When I need a test table, I use the DBA series of views to provide input to a CTAS (create table as select). For this example I will use the dba_objectsview because it has an easy structure to use for indexing. Exhibit 3 shows the CTAS and various statistics for the table TEST created as an image of the dba_objects view. CTAS to Create TEST Table SQL> create table test as select * from dba_objects; Table created. SQL> select count(*) from test; COUNT(*) ---------- 31472 SQL> SELECT 'test',COUNT( DISTINCT( SUBSTR( dbms_rowid.rowid_to_restricted(ROWID,1),1,8))) blocks 2 FROM test; TEST BLOCKS ---- ------ test 434 SQL> desc test Name Null? Type -------------- ------ -------------- OWNER VARCHAR2(30) OBJECT_NAME VARCHAR2(128) SUBOBJECT_NAME VARCHAR2(30) OBJECT_ID NUMBER DATA_OBJECT_ID NUMBER OBJECT_TYPE VARCHAR2(18) CREATED DATE LAST_DDL_TIME DATE TIMESTAMP VARCHAR2(19) STATUS VARCHAR2(7) TEMPORARY VARCHAR2(1) GENERATED VARCHAR2(1) SECONDARY VARCHAR2(1) Exhibit 3
CTAS to Create TEST Table Notice the number of blocks and the number of rows (434 and 31472), these will be important once we have our index clustering factors. Now that we have our table, let’s create several indexes using the same columns but placing them in different orders, the columns we will use are the OBJECT_TYPE, OWNER, and OBJECT_NAME. The possible orders for these columns are: n OBJECT_TYPE, OWNER, OBJECT_NAME n OBJECT_TYPE, OBJECT_NAME, OWNER n OBJECT_NAME, OWNER, OBJECT_TYPE n OBJECT_NAME, OBJECT_TYPE, OWNER n OWNER, OBJECT_NAME, OBJECT_TYPE n OWNER, OBJECT_TYPE, OBJECT_NAME Let’s select a couple of these orders and create some example indexes. Exhibit 4 shows the create statements for these indexes. Also shown in Exhibit 4 are the various clustering factors for the indexes, but let’s pull them all together in one spot and compare them; see Exhibit 5. SQL> create index test_ord1 on test(object_type,owner,object_name) SQL> / Index created. SQL> analyze index test_ord1 compute statistics; Index analyzed. SQL> select clustering_factor from user_indexes where index_name='TEST_ORD1'; CLUSTERING_FACTOR ----------------- 925 SQL> create index test_ord2 on test(object_name,owner,object_type); Index created. SQL> analyze index test_ord2 compute statistics; Index analyzed. SQL> select clustering_factor from user_indexes where index_name='TEST_ORD2'; CLUSTERING_FACTOR ----------------- 25289 SQL> create index test_ord3 on test(owner,object_name,object_type); Index created. SQL> analyze index test_ord3 compute statistics; Index analyzed. SQL> select clustering_factor from user_indexes where index_name='TEST_ORD3' SQL> / CLUSTERING_FACTOR ----------------- 451 SQL> create index test_ord4 on test(owner,object_type,object_name); Index created. SQL> analyze index test_ord4 compute statistics; Index analyzed. SQL> select clustering_factor from user_indexes where index_name='TEST_ORD4'; CLUSTERING_FACTOR ----------------- 908 Exhibit 4.Creation
of Various Indexes with Different Column Orders SQL> select index_name, clustering_factor from user_indexes where index_name like'TEST%'; INDEX_NAME CLUSTERING_FACTOR ----------------------- ----------------- TEST_ORD1 925 TEST_ORD2 25289 TEST_ORD3 451 TEST_ORD4 908 Exhibit 5.The
Clustering Factors for the Indexes As you can see, the clustering factors range from a low of 451 for TEST_ORD3 to a high of 25289 for TEST_ORD2. So, based on the clustering factor TEST_ORD3 is the best ordered index and TEST_ORD2 is the worst ordered index. Let's do some example queries and see how the various indexes fare as far as cost in the cost-based optimizer (CBO). SQL> analyze table test compute statistics; Table analyzed. Elapsed: 00:00:04.05 Now let’s issue a SELECT that will use one of the indexes, the one chosen by the CBO (see Exhibit 6). As you can see, the CBO used the index with the higher clustering factor based on the right-to-left read of the WHERE clause leading column rule. The overall cost of the operation was 2. Let’s force the optimizer to use a different index and see what the cost becomes. First, let’s use the lowest clustering factor index, TEST_ORD3. We will force the query to use the index through use of the index hint (Exhibit 7). SQL> select count(*) from test 2 where 3 owner='DBAUTIL' and object_type='TABLE'; COUNT(*) -------- 338 Elapsed: 00:00:00.02 Execution Plan --------------------------------------------- 0 SELECT STATEMENT Optimizer= CHOOSE (Cost=2 Card=1 Bytes=13) 1 0 SORT (AGGREGATE) 2 1 INDEX (RANGE SCAN) OF 'TEST_ ORD1' (NON-UNIQUE) (Cost=2 Card=65 Bytes=845) Statistics --------------------------------------------- 0 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 384 bytes sent via SQL*Net to client 503 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed Exhibit 6.Issuing
a SELECT Using One of the Indexes SQL> select /*+ index(test test_ord3) */ count(*) from test 2 where 3 owner='DBAUTIL' and object_type='TABLE'; COUNT(*) -------- 338 Elapsed: 00:00:00.02 Execution Plan --------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=14 Card=1 Bytes=13) 1 0 SORT (AGGREGATE) 2 1 INDEX (RANGE SCAN) OF 'TEST_ ORD3' (NON-UNIQUE) (Cost=14 Card=65 Bytes=845) Statistics --------------------------------------------- 0 recursive calls 0 db block gets 8 consistent gets 0 physical reads 0 redo size 384 bytes sent via SQL*Net to client 503 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed Exhibit 7.Forcing
the Query to Use the Index through Use of the INDEX Hint Note that the cost (14) was seven times the cost of our first query (2). The use of the TEST_ORD3 index resulted in a higher cost because the query had to traverse more of the index since the OWNER column was not the leading column. Let’s examine the results form using the highest clus- tering cost index, TEST_ORD2 (Exhibit 8). Wow! A whopping 110 times higher cost (221) over the lowest cost (2) so far. Have you noticed a quasi-correlation between the consistent gets statistics and the cost factors? SQL> select /*+ test(test_ord2) */ count(*) from test 2 where 3* owner='DBAUTIL' and object_type='TABLE' SQL> / COUNT(*) -------- 338 Elapsed: 00:00:00.02 Execution Plan -------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=221 Card=1 Bytes=13) 1 0 SORT (AGGREGATE) 2 1 INDEX (FULL SCAN) OF 'TEST_ ORD2' (NON-UNIQUE) (Cost=221 Card=65 Bytes=845) Statistics -------------------------------------------- 0 recursive calls 0 db block gets 221 consistent gets 0 physical reads 0 redo size 384 bytes sent via SQL*Net to client 503 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed Exhibit 8.
Using TEST_ORD2 to View Results Now look at TEST_ORD4 (Exhibit 9). TEST_ORD4 performs as well as the other two low-cost indexes. Let’s try the query with no index by using the full hint (Exhibit 10) and see what happens. aultdb1>select /*+ index(test test_ord4) */ count(*) from test 2 where 3 owner='DBAUTIL' and object_type='TABLE'; COUNT(*) -------- 338 Elapsed: 00:00:00.02 Execution Plan -------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=13) 1 0 SORT (AGGREGATE) 2 1 INDEX (RANGE SCAN) OF 'TEST_ ORD1' (NON-UNIQUE) (Cost=2 Card=65 Bytes=845) Statistics -------------------------------------------- 0 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 384 bytes sent via SQL*Net to client 503 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed Exhibit 9.
TEST_ORD4 SQL> select /*+ full(test) */ count(*) from test 2 where 3* owner='DBAUTIL' and object_type='TABLE' SQL> / COUNT(*) -------- 338 Elapsed: 00:00:00.02 Execution Plan -------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=67 Card=1 Bytes=13) 1 0 SORT (AGGREGATE) 2 1 TABLE ACCESS (FULL) OF 'TEST' (Cost=67 Card=65 Bytes=845) Statistics -------------------------------------------- 0 recursive calls 36 db block gets 446 consistent gets 0 physical reads 0 redo size 384 bytes sent via SQL*Net to client 503 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed Exhibit 10.
Using the FULL Hint As you can see, the indexes are making a difference as far as cost. However, if you improperly arranged the index columns as in TEST_ORD2, and you had no other column orders to compare to, from these statistics you would choose to use a full table scan, increasing your cost by up to a factor of 30 based on the best performing index. But what about a situation that requires an index scan? Lets change our query to use a like operation (Exhibit 11) and see what the CBO does with it. SQL> select count(*) from test 2 where 3* owner='DBAUTIL' and object_name like 'DBA%' SQL> / COUNT(*) -------- 6 Elapsed: 00:00:00.02 Execution Plan -------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=29) 1 0 SORT (AGGREGATE) 2 1 INDEX (RANGE SCAN) OF 'TEST_ ORD2' (NON-UNIQUE) (Cost=2 Card=1 Bytes=29) Statistics -------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 383 bytes sent via SQL*Net to client 503 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed Exhibit 11.
Using a LIKE Operation Not what we expected was it? In this case it looks like the CBO chose TEST_ORD2 although it has the highest clustering factor. Let’s look at some of the other indexes and see why. First, our lowest clustering factor index, TEST_ORD3 (Exhibit 12). As you can see, this index also produced a cost of 2, so the optimizer probably chose the TEST_ORD2 index based on column order. Now look at our previous queries’ high performer, TEST_ORD1 (Exhibit 13). There is our 221 cost again, and 221 consistent gets. Obviously, the TEST_ORD1 index would not be a good candidate for this query. How about TEST_ORD4? (See Exhibit .14.) Again, this demonstrates your columns you can reduce the cost of processing. For TEST_ORD4, we would have increased our cost by a factor of 7 over TEST_ORD2 or TEST_ORD3 by using the column order in TEST_ORD4 and by a factor of 110 using the order from TEST_ORD1. What would the cost be for a full table scan in this case? See Exhibit 15. SQL> select /*+ index(test test_ord3) */ count(*) from test 2 where 3* owner='DBAUTIL' and object_name like 'DBA%' SQL> / COUNT(*) -------- 6 Elapsed: 00:00:00.02 Execution Plan -------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=29) 1 0 SORT (AGGREGATE) 2 1 INDEX (RANGE SCAN) OF 'TEST_ ORD3' (NON-UNIQUE) (Cost=2 Card=1 Bytes=29) Statistics -------------------------------------------- 0 recursive calls 0 db block gets 2 consistent gets 0 physical reads 0 redo size 383 bytes sent via SQL*Net to client 503 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed Exhibit 12.
TEST_ORD3 SQL> select /*+ index(test test_ord1) */ count(*) from test 2 where 3* owner='DBAUTIL' and object_name like 'DBA%' SQL> / COUNT(*) -------- 6 Elapsed: 00:00:00.02 Execution Plan -------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=221 Card=1 Bytes=29) 1 0 SORT (AGGREGATE) 2 1 INDEX (FULL SCAN) OF 'TEST_ ORD1' (NON-UNIQUE) (Cost=221 Card=1 Bytes=29) Statistics -------------------------------------------- 0 recursive calls 0 db block gets 221 consistent gets 0 physical reads 0 redo size 383 bytes sent via SQL*Net to client 503 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed Exhibit 13.
TEST_ORD1 SQL> select /*+ index(test test_ord4) */ count(*) from test 2 where 3* owner='DBAUTIL' and object_name like 'DBA%' SQL> / COUNT(*) -------- 6 Elapsed: 00:00:00.02 Execution Plan -------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=14 Card=1 Bytes=29) 1 0 SORT (AGGREGATE) 2 1 INDEX (RANGE SCAN) OF 'TEST_ ORD4' (NON-UNIQUE) (Cost=14 Card=1 Bytes=29) Statistics -------------------------------------------- 0 recursive calls 0 db block gets 8 consistent gets 0 physical reads 0 redo size 383 bytes sent via SQL*Net to client 503 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed Exhibit 14.
TEST_ORD4 SQL> select /*+ full(test) */ count(*) from test 2 where 3* owner='DBAUTIL' and object_name like 'DBA%' SQL> / COUNT(*) -------- 6 Elapsed: 00:00:00.02 Execution Plan -------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=67 Card=1 Bytes=29) 1 0 SORT (AGGREGATE) 2 1 TABLE ACCESS (FULL) OF 'TEST' (Cost=67 Card=1 Bytes=29) Statistics -------------------------------------------- 0 recursive calls 36 db block gets 446 consistent gets 0 physical reads 0 redo size 383 bytes sent via SQL*Net to client 503 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed Exhibit 15.
Full Table Scan As we could have anticipated, the same as with the other query, but, had we used the column order in TEST_ORD1 for our index, based solely on the results from our analysis of the first query, we would choose a full table scan for this query and again increased our cost by up to a factor of 30 over the best performing indexes. These examples show that the index column order should be
analyzed for all of the major queries and a matrix showing cost prepared.
Using the matrix in Exhibit 16,
the index with the lowest overall average cost for all queries should be used
instead of just blindly choosing a column order. Query TEST_ORD1 TEST_ORD2 TEST_ORD3 TEST_ORD4FTS ------- --------- --------- --------- ------------ Q1 2 221 14 267 Q2 221 2 2 1467 Average 111.5 111.5 8 867 Exhibit 16.
Matrix Showing Cost Prepared So, for our indexes, the TEST_ORD3 or TEST_ORD4 column orders are the best choices, assuming that the queries have equal weight in our applications. However, if Q1 was performed thousands of times a day and Q2 only a few, then TEST_ORD1 would be a better choice; while if Q2 were the predominate query, then TEST_ORD2 would be a better choice. All of this goes to show that you must take into consideration the ordering of index columns and the predominant queries that will access those indexes in order to make a logical and rational choice of index column order.
|
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Copyright © 1996 - 2011 by
Burleson Enterprises. All rights reserved.
Oracle® is the registered trademark
of Oracle Corporation. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||