 |
|
Collecting Statistics
Oracle Tips by Burleson
|
You can collect statistics about the physical
storage characteristics and data distribution of an index, table,
column, or cluster, and store them as histograms in the data
dictionary. For computing or estimating statistics, computation always
provides exact values but can take longer than estimation, and
requires large amounts of temporary tablespace (up to four times the
size of your largest table). Estimation is often much faster than
computation, and the results are usually nearly exact. You cannot
compute or estimate histogram statistics for the following column
types:
* REFs
* varrays
* nested tables
* LOBs
* LONGs
* object types
Use estimation, rather than computation,
unless you feel you need exact values. Some statistics are always
computed exactly, regardless of whether you specify computation or
estimation. If you choose estimation, and the time saved by estimating
a statistic is negligible, Oracle computes the statistic exactly.
If the data dictionary already contains
statistics for the analyzed object, Oracle updates the existing
statistics with the new ones.
ANALYZE Myths
Since ANALYZE was introduced, many myths about
its use have been circulated. Some of the more harmful ones are:
* You need to ANALYZE entire schema.
* ANALYZE does a full row count no matter
what.
* You can sample 5 to 10 percent and get good
results.
* You can sample 50 rows and get good results.
Let’s take a quick look at these to determine
if any are valid.
First, the statement that you need to analyze
the entire schema to get good results is not true. You only need to
analyze the tables that have changed. Oracle provides the monitoring
clause for the CREATE an ALTER TABLE commands, which will place a
table into monitored mode; any changes on the table will result in its
being analyzed the next time the DBMS_STATS.GATHER_STATISTICS
procedure is run. However, this means that if a table undergoes only a
few INSERT, UPDATE, and DELETE operations, it will be analyzed. If the
table has several million rows, and you only change 10, there is no
need to reanalyze it. Source 4.4 shows a procedure that can be
utilized to analyze tables based on a percent change in row count.
You may need to comment out the write_out
procedure and subsequent calls to it, I like to track what tables need
analysis using a dba_running_stats table.
CREATE OR
REPLACE PROCEDURE check_tables (
owner_name in varchar2,
pchng IN NUMBER,
lim_rows IN NUMBER) AS
CURSOR get_tab_count (own varchar2) IS
SELECT table_name, nvl(num_rows,1)
FROM dba_tables
WHERE owner = upper(own);
tab_name VARCHAR2(64);
rows NUMBER;
string VARCHAR2(255);
cur INTEGER;
ret INTEGER;
row_count NUMBER;
com_string VARCHAR2(255);
PROCEDURE write_out(
par_name IN VARCHAR2,
par_value IN NUMBER,
rep_ord IN NUMBER,
m_date IN DATE,
par_delta IN NUMBER) IS
BEGIN
INSERT INTO dba_running_stats VALUES(
par_name,par_value,rep_ord,m_date,par_delta
);
END;
BEGIN
The next line Is for schemas with many tables. If you don't
lose the cursors you can exceed open_cursor limits and flood the
shared pool.
DBMS_SESSION.SET_CLOSE_CACHED_OPEN_CURSORS(TRUE);
OPEN get_tab_count (owner_name);
LOOP
BEGIN
FETCH get_tab_count INTO tab_name,
rows;
tab_name:=owner_name||'.'||tab_name;
IF rows=0 THEN
rows:=1;
END IF;
EXIT WHEN get_tab_count%NOTFOUND;
DBMS_OUTPUT.PUT_LINE('Table name: '||tab_name||' rows: '||to_char(rows));
Need to have created the get_count
procedure in the same schema
GET_COUNT(tab_name,row_count);
IF row_count=0 THEN
row_count:=1;
END IF;
DBMS_OUTPUT.PUT_LINE('Row count for '||tab_name||': '||to_char(row_count));
DBMS_OUTPUT.PUT_LINE('Ratio: '||to_char(row_count/rows));
IF (row_count/rows)>1+(pchng/100) OR
(rows/row_count)>1+(pchng/100) THEN
BEGIN
IF (row_count<lim_rows) THEN
string :=
'ANALYZE TABLE '||tab_name||' COMPUTE STATISTICS ';
ELSE
string :=
'ANALYZE TABLE '||tab_name||' ESTIMATE STATISTICS SAMPLE 30 PERCENT';
END IF;
cur :=
DBMS_SQL.OPEN_CURSOR;
DBMS_OUTPUT.PUT_LINE('Beginning analysis');
DBMS_SQL.PARSE(cur,string,dbms_sql.v7);
ret :=
DBMS_SQL.EXECUTE(cur) ;
DBMS_SQL.CLOSE_CURSOR(cur);
DBMS_OUTPUT.PUT_LINE(' Table: '||tab_name||' had to be analyzed.');
write_out(' Table:
'||tab_name||' had to be analyzed.',
row_count/rows,33,sysdate,0);
EXCEPTION
WHEN OTHERS THEN
raise_application_error(-20002,'Error in analyze:
'||to_char(sqlcode)||' on '||tab_name,TRUE);
write_out(' Table:
'||tab_name||' error during analyze.
'||to_char(sqlcode), row_count/rows,33,sysdate,0);
IF
dbms_sql.is_open(cur) THEN
dbms_sql.close_cursor(cur);
END IF;
END;
END IF;
EXCEPTION
WHEN others THEN
null;
END;
COMMIT;
END LOOP;
CLOSE get_tab_count;
END;
Source 4.4 Procedure to conditionally ANALYZE
tables.
Notice in the procedure in Source 4.4 that you
specify the schema (owner) whose tables you wish to analyze, the
percent change (I usually use 10), and the number of rows at which to
switch from a COMPUTE to a ESTIMATE SAMPLE 30 PERCENT type of ANALYZE.
In one environment, using a procedure similar
to the one in Source 4.4, resulted in analysis times dropping from
over four hours to less than an hour.
To test the next myth, that ANALYZE does a
full row count no matter whether you use COMPUTE or ESTIMATE, I ran
various sample sizes. The tests were run first with a fixed row size
and then with a varying row size. Figure 4.3 is a chart of my results
on an Oracle8i, 8.1.7 database, and the actual numeric results are
shown in Tables 4.1 and 4.2.
Figure 4.3 Chart of statistics from various
sample sizes.
As you can see, the row counts did change with
sample size, as did other key statistics. In general, a sample size of
20 to 30 percent gives the best results on large tables. I don’t
suggest sampling based on row counts at all.
Table 4.1 Results of Various Sample Sizes for
Fixed-Size Row
Table 4.2 Results of Various Sample Sizes for
Variable-Size Row
To understand sampling based on number of
rows, let’s look at a graph that shows the results as the row count
increases from 0 to 5,000 rows for the same table. The graph is shown
in Figure 4.4.
Figure 4.4 Graph showing row count for
increasing values of rows sampled.
As you can see, the results vary widely until
a substantial number of rows have been sampled (the graph begins to
stabilize at around 2,800 rows).
Based on this research, I suggest using
COMPUTE or ESTIMATE SAMPLE 20-30 PERCENT to ANALYZE your database
tables.
See
Code Depot for Full Scripts
 |
This is an excerpt
from Mike Ault, bestselling author of "Oracle
10g Grid and Real Application Clusters".
You can buy it direct from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts. |
|