 |
|
Frequency of Statistics
Gathering
Oracle Tips by
Burleson
|
There's a great debate within the Oracle
community about how frequently to reanalyze statistics, and the level
of detail in the sample size for the statistics (Figure 14-2). Some
DBAs subscribe to the notion that the SQL statistics should be
reanalyzed very frequently to ensure optimal statistics and better CBO
execution plans. Others subscribe to the notion that statistics should
only be recomputed when the basic nature of the table data changes.
Figure 2: Statistics
gathering frequency versus sample size
Along the other dimension, we see debate
regarding the sample size that should be used when computing
statistics. We have two options available within the analyze table
command, the computes statistics option and the estimate
statistics command syntax. When we issue a computes statistics
command, the entire table is examined via a full-table scan, and very
accurate statistics are then placed inside the data dictionary for use
by the cost-based optimizer. By using the estimate statistics
syntax, samples are taken from the table, and the samples are stored
within the data dictionary. At the heart of this issue are the time
and database resource consumption required to reanalyze all of the
tables.
As you might
remember from your college statistics classes, a sample size greater
than 40 gives us averages that are within two standard deviations from
the mean for the population that is being sampled. Within Oracle,
however, there still exists a great deal of debate about the
trade-offs between taking more detail row samples and the additional
quality of the resulting statistics (Figure 14-3). Some people argue
that a sample 50 rows from each and every table gives excellent
statistics in order to drive their cost-based optimizers, while other
DBAs insist that a sample of at least 5,000 rows is required for many
table in order to get accurate statistics. Remember, the decisions of
the cost-based optimizer are only as good as the statistics that are
being fed to it.
Figure 3: The sample
size as a function of the quality of the estimate
Essentially we see the debate regarding
cost-based optimization falling along two dimensions, the frequency of
analysis and the depth of the analysis. We also have to remember that
if we plan to use optimizer plan stability on all our queries, the
statistics are meaningless to the cost-based optimizer because the
execution plan has been predetermined and saved in a stored outline.
Computing statistics for tables and indexes can
be a very time-consuming operation, especially for data warehouses as
systems that have many gigabytes or terabytes of information. In the
real world, most Oracle professionals use the estimate statistics
clause, sample a meaningful percentage of their data, and choose a
reanalysis frequency that coincides with any scheduled changes to the
database that might affect the distribution of values. For example, if
a database runs purges from their transaction table each month, the
period immediately following the purge would be a good time to
reanalyze the CBO statistics.
Regardless of your philosophy, let’s take a
look at how you can automate the reanalysis of your CBO statistics.
Gathering Statistics for the CBO
It is important that the statistics are
refreshed periodically, especially when the distribution of data
changes frequently. For this reason, the following SQL may be used to
run the analyze statements for all of the tables and indexes. It is
not always a good idea to use Oracle’s dbms_utility.analyze_schema
or dbms_ddl.analyze_object package to perform this task, since
a failure on one of the statements can affect the results of
subsequent analyze statements. When working with databases that
do not employ stored outlines, I use the following script to generate
and execute the proper SQL analyze syntax.
analyze.ksh
#!/bin/ksh
# First, we must set the environment . . . .
ORACLE_SID=$ORACLE_SID
export ORACLE_SID
# The line below is for Solaris databases. Otherwise, use /etc/oratab
ORACLE_HOME=`cat /var/opt/oracle/oratab|grep ^$ORACLE_SID:|cut -f2
-d':'`
export ORACLE_HOME
PATH=$ORACLE_HOME/bin:$PATH
export PATH
$ORACLE_HOME/bin/sqlplus /<<!
set pages 999
set heading off
set echo off
set feedback off
connect internal;
spool /export/home/oracle/analyze.sql;
select
'analyze table ‘||owner||’.’||table_name||' estimate statistics sample
5000 rows;'
from dba_tables
where owner not in (‘SYS’,’SYSTEM’,’PERFSTAT’);
select
'analyze index reader.'||index_name||' compute statistics;'
from dba_indexes
where owner not in (‘SYS’,’SYSTEM’,’PERFSTAT');
spool off;
set echo on
set feedback on
@/export/home/oracle/analyze
exit
!
Most shops schedule a script like this to run
weekly, or whenever there have been significant changes to the table
data. However, it is not necessary to reanalyze tables and indexes
that remain relatively constant. For example, a database in which the
tables have a constant number of rows and indexes where the
distribution of values remain constant will not benefit from frequent
reanalysis. Let’s continue with an overview of the important issues of
SQL tuning with a cost-based default optimizer.
This is an excerpt from "Oracle High-Performance SQL Tuning" by Donald
K. Burleson, published by Oracle Press.
|