 |
|
The hash_area_size and
Disk I/O
Oracle Tips by
Burleson
|
As we have demonstrated, the hash join is more
memory intensive than a nested loop join. To be faster than a nested
loop join, we must set the hash_area_size large enough to hold
the entire hash table in memory (about 1.6 times the sum of the rows
in the table). If the hash join overflows the hash_area_size
memory, the hash join will page into the TEMP tablespace, severely
degrading the performance of the hash join. You can use the following
script, hash_area.sql, to dynamically allocate the proper
hash_area_size for your SQL query in terms of the size of your
target table.
hash_area.sql
set heading
off;
set feedback off;
set verify off;
set pages 999;
spool run_hash.sql
select
'alter session set hash_area_size='||trunc(sum(bytes)*1.6)||';'
from
dba_segments
where
segment_name = upper('&1');
spool off;
@run_hash
Here is the output from this script. As you see, we pass
the driving table name, and the script generates the appropriate
alter session command to ensure that we have enough space in
hash_area_size RAM to hold the driving table.
SQL> @hash_area
customer
alter session set hash_area_size=3774873;
In addition to seeing the hash_area_size, we must
also be able to adjust the degree of parallelism in cases where we use
a full-table scan to access the tables in a hash join. Let’s take a
look at how we can combine the use_hash and parallel
hints to improve the performance of hash joins.
The use_hash Hint
The use_hash hint requests a hash join
against the specified tables. The following query is an example of a
query that has been hinted to force a hash join with parallel query:
select /*+
use_hash(e,b) parallel(e, 4) parallel(b, 4) */
e.ename,
hiredate,
b.comm
from
emp e,
bonus b
where
e.ename = b.ename
;
Here is the execution plan for the hash join. Note that
both tables in this join are using parallel query to obtain their
rows.
OPERATION
----------------------------------------------------------------------
OPTIONS
OBJECT_NAME
POSITION
------------------------------ ---------------------------- ----------
SELECT STATEMENT
3
HASH JOIN
1
PARALLEL_TO_SERIAL
TABLE ACCESS
FULL
EMP
1
PARALLEL_TO_PARALLEL
TABLE ACCESS
FULL
BONUS
2
Finally, we must enable Oracle to invoke the
hash join. Since the CBO will naturally favor nested loop joins, we
must reset some important parameters to enable hash joining.
Enabling Your Database to Accept the use_hash Hint
The use_hash hint is very finicky, and
there are many conditions that must be satisfied. It is not uncommon
to find that a use_hash hint is ignored, and here are some
common causes of this problem.
-
Check initialization parameters Make
sure that you have the proper settings for
optimizer_index_cost_adj and optimizer_max_permutations
to limit the number of table join evaluations. Also check your
values for hash_area_size and hash_multiblock_io_count.
-
Verify the driving Table Make sure that
the smaller table is the driving table (the first table in the
from clause). This is because a hash join builds the memory
array using the driving table.
-
Analyze CBO statistics Check that
tables and/or columns of the join tables are appropriately analyzed.
-
Check for skewed columns Histograms are
recommended only for nonuniform column distributions. If necessary,
you can override the join order chosen by the cost-based optimizer
using the ordered hint.
-
Check RAM region Ensure that
hash_area_size is large enough to hold the smaller table in
memory. Otherwise, Oracle must write to the TEMP tablespace, slowing
down the hash join. Oracle recommends that the hash_area_size
for the driving table be set at 1.6 times the sum of bytes for the
driving table, and you can use the hash_area.sql script to
set the hash_area_size for your query session.
Next, let’s take a look at one of the oldest
table join methods, the sort merge join.
This is an excerpt from "Oracle High-Performance SQL Tuning" by Donald
K. Burleson, published by Oracle Press.
|