 |
|
Correlated Versus
Non-Correlated Subqueries
Oracle Tips by
Burleson
|
Since we have the choice of using a correlated
subquery or a non-correlated one, we need to examine the basic
properties of these constructs. In essence a “correlated” subquery
references the outer table inside the subquery, while the
non-correlated subquery does not reference the outer table. A
correlated subquery is evaluated once per row processed by the parent
query, while a non-correlated subquery is only executed once and the
result set is kept in memory (if the result set is small), or in an
Oracle temporary segment (if the result set is large). A “scalar”
subquery is a non-correlated subquery that returns a single row. Where
the subquery only returns one row, the Oracle optimizer will reduce
this result to a constant and only execute the subquery one time.
For our example, an identical query could be
written either as a correlated subquery or a non-correlated subquery,
and both will return the same result set.
Here is a non-correlated subquery:
select
count(*)
from
outer_table
where
key in
(select key from inner_table)
;
For the purpose of this discussion, we will
call the main query (the select count(*) ) the “outer” query,
as opposed to the “inner” subquery that is enclosed in parenthesis.
The choice of using a correlated or non-correlated subquery depends
upon the number of rows that are returned by the outer query and the
subquery. Given that both of these queries will return the same
result, which is the most efficient? Let’s begin our answer to this
question by exploring the number of rows that we get back from the
query.
Issues of Scale and Subqueries
In practice, the choice between a correlated or
non-correlated subquery depends upon the expected number of rows that
is returned by both the parent query and the subquery. Our goal is to
balance the overhead that is associated with each type of subquery.
-
Correlated subquery overhead The
subquery will be re-executed once for each row that is returned by
the outer query. Hence, we must ensure that the subquery uses an
index whenever possible.
-
Non-correlated subquery overhead The
subquery is executed once, and the result set is usually sorted and
kept in an Oracle temporary segment where it is referenced as each
row is returned by the parent query. In cases where the subquery
returns a large number of rows, a significant overhead will be
involved with sorting this result set and storing the temporary
segment.
In sum, our choice depends wholly on the number
of rows returned by the parent query. If our parent query returns only
a few rows, the overhead of re-executing the subquery is not so great.
On the other hand, if the parent query returns a million rows, the
subquery would be executed a million times. The same concept applies
to the number of rows returned by the subquery. If our subquery
returns only a few rows, then there is not much overhead in keeping
the result set in memory for the parent query. On the other hand, if
the subquery returns a million rows, then Oracle would need to store
this result set as a temporary segment on disk, and then sort the
segment in order to service each row of the parent query.
This is an excerpt from "Oracle High-Performance SQL Tuning" by Donald
K. Burleson, published by Oracle Press.
|