 |
|
Monitoring Database Writer Contention in
Oracle8i and Oracle
Oracle Tips by Burleson
|
As we know, the “write requests” statistic has
been removed from the v$sysstat view. It has been removed because the
use of multiple buffer pools and asynchronous DBWR processes now mean
that the number of write requests is no longer useful. Hence, we must
look at new v$sysstat statistics for monitoring the database writer.
Oracle8i now provides the free buffer wait
statistic in stats$buffer_pool_statistics. This statistic stores the
number of times a free buffer was requested in the SGA, but a block
was not available. A free buffer is a buffer that is not currently
being used by other database users. If accumulated time for the free
buffer wait event is too high then you can consider adding database
writer processes.
rpt_dbwr_alert_8i.sql (post Oracle 8.0.5 only)
-- Written
by Donald K. Burleson 1/25/01
set pages 999;
column c1 heading "Dirty Queue Length" format 9,999.99
column c2 heading "Free Buffer Requests" format 999,999,999
column c3 heading "DBWR checkpoints" format 999,999
column mydate heading 'Yr. Mo Dy Hr.' format a16
select distinct
to_char(snap_time,'yyyy-mm-dd HH24') mydate,
a.value c1,
b.free_buffer_wait c2,
c.value c3
from
stats$sysstat a,
stats$buffer_pool_statistics b,
stats$sysstat c,
stats$snapshot sn
where
sn.snap_id = a.snap_id
and
sn.snap_id = b.snap_id
and
sn.snap_id = c.snap_id
and
a.name = 'summed dirty queue length'
and
c.name = 'DBWR checkpoints'
and
a.value > 100
and
b.free_buffer_wait > 0
;
Here is a listing from this script. This will alert us
when the DBWR processes may be experiencing a problem.
Yr. Mo Dy
Hr. Summed dirty queue Free Buffer Wait DBWR
checkpoints
---------------- -------------------- -----------------
----------------
2001-11-14 08 .00
505,818 0
2001-11-14 08 .00
505,690 0
2001-11-15 07 .00
391,550 0
2001-11-16 07 .00 392,046
0
2001-11-17 19 .00
395,216 1
2001-11-20 09 .00
38,096 0
2001-11-20 10 .00
56,626 0
2001-11-20 11 .00
59,272 1
2001-11-20 12 .00
59,409 1
2001-11-20 13 .00
61,707 2
2001-11-20 14 .00
67,078 2
2001-11-20 15 .00
81,833 3
2001-11-20 16 .00
82,594 4
2001-11-20 17 .00
83,143 4
2001-11-20 18 .00
84,641 4
2001-11-20 19 .00
84,696 4
2001-11-20 20 .00
692,868 5
2001-11-20 21 .00
692,929 5
2001-11-21 00 .00
2,626 0
2001-11-21 01 .00
387,783 0
2001-11-21 02 .00
389,422 0
2001-11-21 03 .00
389,495 0
2001-11-21 04 .00
390,562 0
2001-11-21 05 .00
391,884 0
2001-11-21 06 .00
392,445 0
Now that we understand DBWR, let's look into
another important area, the detection and correction of buffer busy
waits.
Tuning the Shared Pool
The shared pool component of the Oracle SGA is
primarily used to store shared SQL cursors, SQL source and execution
plans, stored procedures, and session information, as well as to
function as a RAM cache for the data dictionary and library. The
following section will look at the relevant parameters that govern the
shared pool and show how STATSPACK can be used to monitor and help
tune the shared pool region of RAM.
The size of the shared pool is controlled with
the shared_pool_size Oracle parameter. This parameter is most
often adjusted in response to poor statistics for library cache hits
or high SQL reparsing within the library cache, but the shared pool is
unlike any other Oracle structure.
There are numerous times when making the shared
pool smaller will dramatically improve performance. In fact, many
Oracle instances with nonreusable SQL are configured with
shared_pool_size. Optimally, this setting should be large enough
to retain all the reusable library cache objects plus enough to
accommodate the maximum total of concurrent nonsharable object sizes.
On the surface, it is counterintuitive that the
shared pool performance will get worse as the size increases. In all
other areas of Oracle memory, more RAM memory will always result in
better performance. The explanation is that memory buffers usually
assists performance by eliminating physical disk access. However, in
the case of the shared pool and literal SQL, an inverted CPU-memory
trade-off develops where more memory actually results in larger data
structures to manage with more management overhead due to the
nonsharing of library cache objects. This overhead is CPU intensive,
so with enough SQL pressure, the system becomes CPU bound and
performance suffers drastically.
A landmark experiment titled “Piranhas in the
Pool” by John Beresniewicz (Oracle Internals, 2000) clearly
demonstrated this phenomenon and offered an explanation for this
behavior. This paper noted that the failure of SQL statements to
utilize bind variables directly contributed to poor performance within
the shared pool. Conversely, the use of bind variables (resulting in
identical SQL) creates many fewer library cache objects and rapid
hashing to the matching object. The library cache latch is thus
released more quickly, reducing the latching impact of the library
cache latch.
cursor_sharing and the Shared Pool
Starting in Oracle8i (8.1.6), Oracle introduced
an exciting new Oracle parameter called cursor_sharing.
cursor_sharing is designed to help manage the kinds of problems
inherent with nonsharable SQL. The cursor_sharing parameter has
the following values:
-
FORCE Library
cache object matching based on exact SQL (or PL/SQL) text match as
in pre-8.1.6 Oracle.
-
EXACT Oracle
automatically substitutes bind variables to replace literals in SQL
statements before library cache object matching takes place, causing
increased sharing of literal SQL.
-
SIMILAR Oracle
has this option to detect substantially similar SQL statements that
differ only in the values of host variables.
When cursor_sharing is set to FORCE,
Oracle adds an extra parsing process that identifies statements that
would be equivalent if they did not contain literal values in the SQL.
For systems with dynamic SQL with embedded literal values, the
cursor_sharing parameter will greatly improve performance.
For example, suppose the following statement
was stored in the shared pool:
select *
from customer where last_name = 'Burleson' and first_name = 'Don';
With cursor_sharing = FORCE, the
following statement will be recognized as identical to the first:
select *
from customer where last_name = 'Ault' and first_name = 'Mike';
The cursor_sharing facility will
translate the first statement into a host variable equivalent and use
its execution plan to execute the second statement:
select *
from customer where last_name = :var1 and first_name = :var2;
The effects of this parameter on systems with lots of
literal SQL is astounding. Beresniewicz confirmed that using
cursor_sharing results in performance advantages similar to those
obtained using bind variables. His results confirmed the following
performance gains:
Thus, it is clear that cursor_sharing
can be used to significantly enhance the performance of high-volume
literal SQL and is a great boon to the DBA saddled with such
applications.
John Beresniewicz reached the following
conclusions:
-
The library cache
and shared pool memory manager are integral components of the Oracle
server designed to create efficiencies and thereby improve
performance through the caching and reuse of CPU-intensive steps
during SQL processing.
-
Applications
characterized by high volumes of literal (nonsharable) SQL can
compromise these efficiencies, induce additional overhead, and
degrade performance.
-
Experimental
results confirm that this is the case and support the explanation
that contention for the shared pool and library cache latches plays
a major role in this problem.
-
The new
cursor_sharing initialization parameter introduced in Oracle
8.1.6 addresses and corrects the performance impact of literal SQL
by converting it to bind variable format before library cache object
matching is undertaken. Results show that this new feature does
indeed correct for the performance penalty of literal SQL and
promises to be a “silver bullet” for DBAs burdened with
pathologically nonsharable SQL.
Now that we see the benefits of cursor_sharing on
shared_pool performance, let's look at some STATSPACK scripts
that monitor the performance of the shared pool and the library cache.
Instance Event Waits and STATSPACK
There are several tables within STATSPACK that
can be used to give us insight into contention problems within the
Oracle instance.
The stats$system_event table contains detailed
statistics about system events and the time that they have to wait for
service. Unfortunately, the stats$system_event table contains internal
events such as ipc messages that are not relevant to tuning.
Fortunately, the DBA can filter through the events in the
stats$system_event table and only report on these events that are
interesting to the DBA. These events include:
We are interested in finding times when these events
experience an excessive amount of waits and time waiting. The
following STATSPACK script will identify those events where time
waited is greater than 100 or total waits for the event exceed 100:
rpt_event.sql
set pages
999;
set lines 80;
column mydate heading 'Yr. Mo Dy Hr' format a13;
column event format a30;
column waits format 999,999;
column secs_waited format 999,999,999;
column avg_wait_secs format 99,999;
break on to_char(snap_time,'yyyy-mm-dd') skip 1;
select
to_char(snap_time,'yyyy-mm-dd HH24') mydate,
e.event,
e.total_waits - nvl(b.total_waits,0) waits,
((e.time_waited - nvl(b.time_waited,0))/100) /
nvl((e.total_waits - nvl(b.total_waits,0)),0) avg_wait_secs
from
stats$system_event b,
stats$system_event e,
stats$snapshot sn
where
e.snap_id = sn.snap_id
and
b.snap_id = e.snap_id-1
and
b.event = e.event
and
(
e.event like 'SQL*Net%'
or
e.event in (
'latch free',
'enqueue',
'LGWR wait for redo copy',
'buffer busy waits'
)
)
and
e.total_waits - b.total_waits > 100
and
e.time_waited - b.time_waited > 100
;
Here is the output from this script. As you can see,
someone was running a large task on January 22 that stressed several
areas within the shared pool and communications subsystem.
Yr. Mo Dy
Hr EVENT WAITS AVG_WAIT_SECS
------------- ------------------------------ -------- -------------
2001-01-22 20 SQL*Net message from client 119,432 1
2001-01-22 20 SQL*Net more data from client 592 0
2001-01-22 20 buffer busy waits 605 0
2001-01-22 20 enqueue 826 0
2001-01-22 20 latch free 128,343 0
2001-01-22 21 SQL*Net message from client 30,249 3
2001-01-22 21 SQL*Net more data from client 253 0
2001-01-22 21 enqueue 740 0
2001-01-22 21 latch free 77,247 0
In most cases, the resolution to these problems is to
add additional blocks to the shared_pool, but we can always go
back to the stats$sql_summary table to see what SQL precipitated the
problems. Next, let's look at how STATSPACK can be used to monitor
background events within the instance.
Monitoring Background Events with STATSPACK
The stats_bg_event summary table provides a wealth of
information regarding important background events. To see the list of
events, we query the stats_bg_event summary table.
1* select
distinct event from STATS$BG_EVENT_SUMMARY
SQL> /
EVENT
------------------------------
buffer busy waits
checkpoint range buffer not saved
control file parallel write
control file sequential read
db file parallel write
db file scattered read
db file sequential read
direct path read
direct path write
enqueue
file identify
file open
free buffer waits
latch free
library cache pin
log file parallel write
log file sequential read
log file single write
log file switch completion
log file sync
pmon timer
process startup
rdbms ipc message
rdbms ipc reply
smon timer
write complete waits
The next step is to run a STATSPACK query that will
report on those events that exceed our predefined threshold. Note that
we remove the timer and message events since these are
seldom of interest when tuning a database.
rpt_bg_event_waits.sql
set pages
999;
set lines 80;
column mydate heading 'Yr. Mo Dy Hr' format a13;
column event format a30;
column total_waits heading 'tot waits' format 999,999;
column time_waited heading 'time wait' format 999,999;
column total_timeouts heading 'timeouts' format 9,999;
break on to_char(snap_time,'yyyy-mm-dd') skip 1;
select
to_char(snap_time,'yyyy-mm-dd HH24') mydate,
e.event,
e.total_waits - nvl(b.total_waits,0) total_waits,
e.time_waited - nvl(b.time_waited,0) time_waited,
e.total_timeouts - nvl(b.total_timeouts,0) total_timeouts
from
stats$bg_event_summary b,
stats$bg_event_summary e,
stats$snapshot sn
where
e.event not like '%timer'
and
e.event not like '%message%'
and
e.snap_id = sn.snap_id
and
b.snap_id = e.snap_id-1
and
b.event = e.event
and
e.total_timeouts > 50
and
(
e.total_waits - b.total_waits > 50
or
e.time_waited - b.time_waited > 50
)
;
Here is the output where we see the time, the event
name, the total waits for the event, the time waited, and the total
timeouts for the event:
Yr. Mo Dy
Hr EVENT tot waits time wait timeouts
------------- --------------------------- --------- ---------
--------
2001-12-28 12 buffer busy waits 52 0
45
2001-01-01 00 buffer busy waits 62 0
60
2001-01-01 01 buffer busy waits 53 0
49
2001-01-04 01 buffer busy waits 52 0
46
2001-01-07 18 rdbms ipc reply 85 0
54
2001-01-08 17 latch free 95 0
95
2001-01-17 23 buffer busy waits 82 0
76
2001-01-17 23 latch free 79 0
78
2001-01-21 23 latch free 51 0
51
2001-01-22 13 latch free 55 0
55
2001-01-22 14 latch free 52 0
52
2001-01-22 20 latch free 60 104
60
This report can provide vital clues for areas of
contention within our database. This script is incorporated into the
generic statspack_alert.sql script, so the DBA is always aware
of out-of-bounds conditions inside the database.
Next, let's examine the most important areas of
the shared pool, the library cache.
This is an excerpt from "Oracle
High Performance tuning with STATSPACK" by Oracle Press.
 |
If you like Oracle tuning, you may enjoy the new book "Oracle
Tuning: The Definitive Reference", over 900 pages of BC's favorite tuning
tips & scripts.
You can buy it direct from the publisher for 30%-off and get instant access to
the code depot of Oracle tuning scripts. |
|