| |
 |
|
Tuning Guidelines for
Distributed Joins
Oracle Tips by
Burleson
|
Here is a set of guidelines that is used by SQL
tuning experts for tuning distributed joins:
-
The overall goal of tuning a distributed join
is to minimize the amount of data being transferred across Net8. If
possible, considering using advanced replication to replicate the
table to the local site to improve join speed.
-
The SQL that is passed to the remote instance
is available in the other column of the plan table. Hence,
you can use the enhanced explain plan utility
remote_plan.sql to tune distributed joins. You also must explain
the SQL portions on the remote database to get the whole picture.
-
The local table should always be the driving
table for a remote join. Oracle provides the driving_site
hint for this purpose. The driving_site hint tells Oracle to
make the site where the referenced table resides the driving site.
You want the driving site to be the one that has the larger amount
of data, so that the smaller amount of data will be passed over the
network.
-
For tables that join more than four tables,
the parse time to determine the table join order can be excessive
because n! table join methods must be evaluated. Joins of more than
four tables should be manually tuned for optimal table join order
and made permanent by using the ordered hint or stored
outlines. The use of the ordered hint or stored outlines will
cause the optimizer to skip the time-consuming table join
evaluation, improving overall performance.
-
For cases where both tables reside at a remote
database, you can force execution to be at the driving site by
converting the distributed join into a view that you define at the
remote instance and then query from your local instance.
-
When the amount of data to be retrieved from the
remote instance is small in relation to the local table total size,
then converting the join to a correlated subquery can improve
performance.
-
The entire SQL is not passed to the remote instance;
only the remote portion of the query is passed. Since Oracle only
sends the piece of the query to the remote site, the remote site is
not aware of the full SQL join. Hence, each CBO executes
independently of the others in a distributed query.
-
Always verify that the index column for the
join condition exists as an index on the remote site. It is very
common in untuned distributed joins to see remote table access being
performed with a full-table scan.
-
Always use the CBO for distributed queries.
The rule-based optimizer cannot generate nested loop joins between
remote and local tables when the tables are joined with equi-joins.
More importantly, the RBO cannot execute joins remotely, and all
joins must be executed at the driving site. This means that Oracle
must fetch the rows across Net8 before beginning the join.
-
All sorting of result sets is performed at the
initiating database. This can cause a huge impact on the TEMP
tablespace, so it is very important to minimize network traffic for
incoming result sets from the remote database.
In addition to these guidelines, Oracle has
other important limitations on distributed joins:
-
In the CBO, no more than 20 indexes per remote
table are considered when generating query plans. The order of the
indexes varies; if the 20-index limitation is exceeded, then random
variation in query plans can result.
-
Reverse indexes on remote tables are not
visible to the optimizer. This can prevent nested loop joins from
being used for remote tables if there is an equi-join using a column
with only a reverse index.
-
The CBO cannot recognize that a remote object
is partitioned. Thus, the optimizer can generate less than optimal
plans for remote partitioned objects, particularly when partition
pruning would have been possible had the object been local.
-
Remote views are not merged, and the optimizer
has no statistics for them. It is best to replicate all mergeable
views at all sites to obtain good query plans.
Conclusion
This chapter has been a basic review of the
table joins methods and the issues surrounding the optimization of
table join operations. The main points of this chapter include these:
-
The RBO
can only invoke a sort merge join or a nested loop join. Advanced
join methods such as hash joins and star joins are only available to
the CBO.
-
As self-join is a common operation against
tables where the rows are stored in time series order and you need
to compare values between time slices. These queries use a nested
loops access method and utilize the column index.
-
Anti-joins subqueries (i.e. subqueries with a
NOT IN clause) can sometime be replaced by a standard equi-join
where you remove the extra values with a where column is not null
clause.
-
A semi-join subquery (i.e. a subquery with an
exists clause) can often be rewritten as a standard equi-join
using a select distinct clause to remove duplicate rows.
-
The hash join can outperform the nested loop
join if the driving table is small and fits entirely into the
hash_area_size RAM region. You can use the hash_area.sql
script to dynamically generate the alter session command to
resize your hash_area_size for a particular query.
-
The CBO should always be used for remote table
joins because the RBO cannot access remote dictionary values.
-
Remote table joins can be explained with the
remote_plan.sql script, and you must also explain the SQL on the
remote database to fully understand the whole execution plan.
-
The goal of tuning a distributed table join is
to minimize the amount of data transported over the network.
Next, let’s move on and take a look at tuning
Oracle DML operations. While DML is relatively straightforward, there
are some important tuning mechanisms for improving the performance of
DML.
This is an excerpt from "Oracle High-Performance SQL Tuning" by Donald
K. Burleson, published by Oracle Press.
|