 |
|
The use_nl Hint
Oracle Tips by
Burleson
|
The use_nl hint forces a nested loop
join against the target tables. Unlike the other join hints that
specify both tables, the use_nl hint only requires the name of
the driving table (the first table in the from clause when
using the CBO). The use_nl hint is seldom used in SQL tuning
because both the CBO and the RBO tend to favor nested loop joins over
hash or merge joins. However, the use_nl hint is sometimes
useful for changing the driving table without changing the order of
the tables in the from clause. This is because it is not always
evident which table should be the driving table. Sometimes both tables
have a comparable distribution of rows, and you can often improve the
performance of a nested loop join by changing the driving table for
the join. For more details on the use_nl hint, see Chapter 12.
The Hash Join
The hash join was first introduced in
Oracle7.3 as an alternative to nested loop joins. The hash join
technique improves the join speed of equi-joins by loading the driving
table into RAM and using a hash technique to join into the second
table.
A hash join is
the recommended join method when the tables are different sizes and
the smaller table is close to the available memory in
hash_area_size. The basic premise of any hash join algorithm is to
build an in-memory hash table from the smaller of the input row
sources, the build input, and use the larger input to probe
the hash table (Figure 16-7).
Figure 7: A hash join
Hash join algorithms work well for simple hash
joins when the available hash area memory is large enough to hold the
build input, but it is not necessary to fit entire probe input in
hash_area_size memory. If the smaller driving table (the first
table in the from clause) in a hash join does not fit into
hash_area_size, Oracle will partition the hash probes and use
temporary segments in the TEMP tablespace to manage the overflow.
TIP: When the smaller driving table is small
enough to fit entirely into the RAM allocated by hash_area_size,
then the hash join will generally perform faster than a nested loop
join. Remember, you can adjust the hash_area_size for your
query at the session level using the alter session command.
Oracle recommends that the hash_area_size for the driving table
be set at 1.6 times the sum of bytes for the driving table.
Internally, a hash join involves two phases,
the partition phase and the join phase. To fully understand hash
joins, let’s examine each of these phases.
The Partitioning Phase of a Hash Join
When the
driving table cannot fit entirely into the RAM allocated by
hash_area_size, a hash partition occurs. In a partition,
contiguous pieces of the driving table are split into partitions on
temporary segments in the TEMP tablespace (Figure 16-8).
Figure 8: Hash query
partitioning
This partitioning effectively divides the hash
query into many smaller inputs that can be independently processed.
However, the problem of partitioning the inputs is not trivial. It is
difficult to have a partitioning scheme that will split any data
distribution into equal partitions without any skew. To minimize any
skew in the partitioning, Oracle relies on column histograms and
special bit-vector filtering techniques.
If, after partitioning, the smaller of the two
inputs is larger than the size of the memory available to build the
hash table, the hash table overflow is dealt with by performing a
nested-loops hash join. The hash table is created with the
build input partition and then the probe phase joins the tables.
Then, the remainder of the build input is iteratively retrieved, and
the hash table is built and joined with all the probe partitions until
all of the build input is consumed.
The Join Phase of a Hash Join
Partition pairs of the build and probe inputs
with the same key values are then joined (in what is called the
join phase). This algorithm, known as the grace join,
dramatically reduces the search space and key comparisons required for
doing the join.
A limitation of the hash join algorithm is that
it is based on the assumption that the distribution of join values in
the tables is not skewed and that each partition receives
approximately the same number of rows. This is generally not true and
partition skew is a reality that needs to be dealt with.
This is an excerpt from "Oracle High-Performance SQL Tuning" by Donald
K. Burleson, published by Oracle Press.
|