 |
|
Using Oracle Parallel
Query
Oracle Tips by Burleson
|
There are several Oracle parameters that are
set when using Oracle Parallel Query. Many of these are default values
and are set by Oracle when your database is created. Oracle Parallel
Query can be turned on in several ways. You can turn it on permanently
for a table, or you can isolate the Parallel Query to a single table.
Permanent Parallelism
L 12-1
Alter table customer parallel degree 35;
Single Query Parallelism
L 12-2
select /*+ FULL(emp) PARALLEL(emp, 35) */
emp_name
from
emp;
Note the use of the double hints in the
preceding query. Most Oracle DBAs always use the FULL hint with the
PARALLEL hint because they are both required to use Oracle Parallel
Query.
Most Oracle DBAs identify those tables that
perform full table scans and then alter those tables to specify the
degree of parallelism. This way, all full table scans against the
tables will invoke Oracle Parallel Query.
Parallel Query Parameters
There are several important Oracle parameters
that have a direct impact on the behavior of Oracle Parallel Query:
-
sort_area_size
(pre Oracle) The higher the value, the more memory is
available for individual sorts on each parallel process. Note that
the sort_area_size parameter allocates memory for every query
on the system that invokes a sort. For example, if a single query
needs more memory, and you increase the sort_area_size,
all Oracle tasks will allocate the new amount of sort area,
regardless of whether they will use all of the space. It is also
possible to dynamically change the sort_area_size for a
specific session with the alter session command. This
technique can be used when a specific transaction requires a larger
sort area than the default for the database. Note that in Oracle,
the sort areas are managed by the pga_aggregate_target
parameter.
-
parallel_min_servers This value
specifies the minimum number of query servers that will be active on
the instance. There are system resources involved in starting a
query server, and having the query server started and waiting for
requests will accelerate processing. Note that if the actual number
of required servers is less than the values of
parallel_min_servers, the idle query servers will be consuming
unnecessary overhead, and the value should be decreased.
-
parallel_max_servers This value
specifies the maximum number of query servers allowed on the
instance. This parameter will prevent Oracle from starting so many
query servers that the instance cannot service all of them properly.
-
optimizer_percent_parallel This
parameter defines the amount of parallelism that the optimizer uses
in its cost functions. The default of 0 means that the optimizer
chooses the best serial plan. A value of 100 means that the
optimizer uses each object's degree of parallelism in computing the
cost of a full table scan operation.
NOTE: Cost-based optimization will always be used
for any query that references an object with a nonzero degree of
parallelism. Hence, you should be careful when setting parallelism if
your default is optimizer_mode=RULE.
Setting the Optimal Degree of Parallelism
Determining the optimal degree of parallelism
for Oracle tasks is not easy. Because of the highly volatile nature of
most SMP systems, there is no general rule that will apply to all
situations. As you may know, the degree of parallelism is the number
of operating system processes that are created by Oracle to service
the query.
Oracle states that the optimal degree of
parallelism for a query is based on several factors. These factors are
presented in their order of importance:
-
The number of CPUs on the server.
-
The number of physical disks that the tables
resides on.
-
For parallelizing by partition, the number of
partitions that will be accessed, based upon partition pruning (if
appropriate).
-
For Parallel DML operations with global index
maintenance, the minimum number of transaction freelists among all
the global indexes to be updated. The minimum number of transaction
freelists for a partitioned global index is the minimum number
across all index partitions. This is a requirement in order to
prevent self-deadlock.
For example, if your system has 20 CPUs and you
issue a Parallel Query on a table that is stored on 15 disk drives,
the default degree of parallelism for your query is 15 query servers.
There has been a great deal of debate about
what number of parallel processes results in the fastest response
time. As a general rule, the optimal degree of parallelism can be
safely set to n-1 where n is the number of processors in
your SMP or MPP cluster.
In practice, the best method is a
trial-and-error approach. When tuning a specific query, the DBA can
set the query to force a full table scan and then experiment with
different degrees of parallelism until the fastest response time is
achieved.
Finding the Number of CPUs on Your Database Server
Sometimes the Oracle DBA does not know the
number of CPUs on the database server. The following UNIX commands can
be issued to report on the number of CPUs on the database server.
Windows NT
If you are using MS Windows NT, you can find
the number of CPUs by entering the Control Panel and choosing the
System icon.
Linux
To see the number of CPUs on a Linux server,
you can cat the /proc/cpuinfo file. In the following example,
we see that our Linux server has four CPUs:
L 12-3
>cat /proc/cpuinfo|grep
processor|wc -l
4
Solaris
In Sun Solaris, the prsinfo command can
be used to count the number of CPUs on the processor.
L 12-4
>psrinfo -v|grep "Status of processor"|wc -l
24
IBM-AIX
The following example is taken from an AIX
server, and shows that the server has four CPUs:
L 12-5
>lsdev -C|grep Process|wc –l
36
HP/UX
In HP UNIX, you can use the glance or top
utilities to display the number of CPUs.
NOTE: Parallel hints will often speed up index
creation even on uniprocessor machines. This is not because there is
more processing power available, but because there is less I/O wait
contention with multiple processes. On the other end of the spectrum,
we generally see diminishing elapsed time when the degree of
parallelism exceeds the number of processors in the cluster.
There are several formulas for computing the
optimal parallelism. Oracle provides a formula for computing the
optimal parallelism based on the number of CPUs and the number of
disks that the file is striped onto. Assume that D is the
number of devices that P is striped across (either SQL*Loader striping
or OS striping). Assume that C is the number of CPUs available:
L 12-6
P = ceil(D/max(floor(D/C), 1))
Simply put, the degree of parallelism for a
table should generally be the number of devices on which the table is
loaded, scaled down so that it isn't too much greater than the number
of CPUs. For example, with ten devices and eight CPUs, a good choice
for the degree of parallelism is ten. With only four CPUs, a better
choice of parallelism might be five.
However, this complex rule is not always
suitable for the real world. A better rule for setting the degree of
parallelism is to simply use the number of CPUs:
L 12-7
As a general rule, you can set the degree of
parallelism to the number of CPUs on your server, minus one. This is
because one processor will be required to handle the Parallel Query
coordinator.
Setting Automatic Parallelism
Oracle Parallel Query allows you to control the
number of Parallel Query slave processes that service a table. Oracle
Parallel Query processes can be seen on the server because background
processes will start when the query is serviced. These factotum
processes are generally numbered from P000 through Pnnn. For example,
if our server is on AIX, we can create a script to gather the optimal
degree of parallelism and pass this argument to the SQL.
parallel_query.ksh
L 12-8
#!/bin/ksh
# Get the number of CPUs
num_cpu=`lsdev –C|grep mem|wc –l`
optimal_parallelism=`expr $num_cpu`-1
sqlplus system/manager<<!
select /*+ FULL(employee_table) PARALLEL(employee_table, $optimal_parallelism)*/
employee_name
from
employee_table;
exit
!
Resource Contention and Oracle Parallel Query
There are several sources of contention in
Oracle Parallel Query. As we already mentioned, Oracle Parallel Query
works best on servers that have multiple CPUs, but we can often see
disk contention when the whole table resides on the same physical
disk. In short, the use of Oracle Parallel Query can precipitate
several external bottlenecks. These include:
-
Overloaded processors This is normally
evidenced when the vmstat run queue values exceed the number of CPUs
on the server.
-
Disk enqueues When multiple processes
compete for data blocks on the same disk, I/O related slowdowns may
occur. This is evidenced by high activity from the UNIX iostat
utility.
-
Increased RAM usage The parallel sorting
feature may increase the demands on the server RAM. Each parallel
process can allocate sort_area_size in RAM to manage the
sort.
Let's explore
things that we can do to prevent contention-related slowdowns when
using Oracle Parallel Query. To be most effective, the table should be
partitioned onto separate disk devices, such that each process can do
I/O against its segment of the table without interfering with the
other simultaneous query processes. However, the client/server
environment of the 1990s relies on RAID or a logical volume manager (LVM),
which scrambles datafiles across disk packs in order to balance the
I/O load. Consequently, full utilization of Parallel Query involves
“striping” a table across numerous datafiles, each on a separate
device. It is also important to note that large contiguous extents can
help the query coordinator break up scan operations more efficiently
for the query servers. Even if your system uses RAID or a logical
volume manager (such as Veritas), there are still some performance
gains from using Parallel Query. In addition to using multiple
processes to retrieve the table, the Query Manager will also dedicate
numerous processes to simultaneously sort the result set (see Figure
12-2).
Figure 12-108: Increase RAM memory demands with
Oracle Parallel Query
As we know, a RAM memory overload can cause
swapping on the database server. Because of the parallel sorting
feature, it is also a good idea to beef up the memory on the
processor. We may also see the TEMP tablespace fall short when using
Parallel Query and Parallel DML. Here is an example of the error:
L 12-9
SQL> alter session enable parallel dml;
Session altered.
SQL> insert /*+ parallel(customer, 6) */ into customer;
2 select /*+ full(c) parallel(c, 6) */
3 from customer c;
ERROR at line 3:
ORA-12801: error signaled in parallel query server P000
ORA-01652: unable to extend temp segment by 128000 in tablespace
CUSTOMER_TS
Disk Contention with Oracle Parallel Query
Many DBAs are surprised to note that Oracle
Parallel Query does not always improve the speed of queries where the
whole table resides on a single physical disk.
The data retrieval for a table on a single disk
will not be particularly fast, since all of the parallel retrieval
processes may be competing for a channel on the same disk. But each
sort process has its own sort area (as determined by the
sort_area_size parameter before Oracle), so the sorting of the
result set will progress very quickly. In addition to full table scans
and sorting, the Parallel Query option also allows for parallel
processes for merge JOINs and nested loops.
Using Parallel Query Hints
Invoking the Parallel Query with hints requires
several steps. The most important is that the execution plan for the
query specifies a full table scan. If the output of the execution plan
does not indicate a full table scan, the query can be forced to ignore
the index by using the FULL hint.
The number of processors dedicated to service
an SQL request is ultimately determined by Oracle Query Manager, but
the programmer can specify the upper limit on the number of
simultaneous processes. When using the cost-based optimizer, the
PARALLEL hint can be embedded into the SQL to specify the number of
processes. For instance:
L 12-10
select /*+ FULL(employee_table)
PARALLEL(employee_table, 35) */
employee_name
from
employee_table
where
emp_type = ''SALARIED'';
If you are using an SMP or MPP database server with many
CPUs, you can issue a parallel request and leave it up to each Oracle
instance to use its default degree of parallelism. For example:
L 12-11
select /*+ FULL(employee_table)
PARALLEL(employee_table, DEFAULT, DEFAULT) */
employee_name
from
employee_table
where
emp_type = ''SALARIED'';
In most cases, it is better for the Oracle DBA
to determine the optimal degree of parallelism and then set that
degree in the data dictionary with the following command:
L 12-12
Alter table employee_table parallel degree
35;
This way, the DBA can always be sure of the degree of
parallelism for any particular table.
Oracle also provides the
parallel_automatic_tuning parameter to assist in setting the best
degree of parallelism. When setting parallel_automatic_tuning,
you only need to specify parallelism for a table, and Oracle will
dynamically change the parallel_adaptive_multi_user parameter
to override the execution plan in favor of maintaining an acceptable
overall load on the database. You should also note that setting
parallel_automatic_tuning will cause extra storage in the large
pool because Oracle will allocate message buffers from the large pool
instead of the shared pool.
Now let's move on to look at how the Oracle DBA
can query the behavior of parallel queries.
This is an excerpt from "Oracle
High Performance tuning with STATSPACK" by Oracle Press.
 |
If you like Oracle tuning, you may enjoy the new book "Oracle
Tuning: The Definitive Reference", over 900 pages of BC's favorite tuning
tips & scripts.
You can buy it direct from the publisher for 30%-off and get instant access to
the code depot of Oracle tuning scripts. |
|