
Redo-related wait events
There are a number of wait events that happen during redo activities, and most of them are I/O related. First, I will talk about the two most important wait events: 'log file parallel write' and 'log file sync'. Then I will mention some other important ones you should know about.
The 'log file parallel write' event
Oracle foreground processes wait for 'log file sync', whereas the LGWR process waits for 'log file parallel write'. Although we usually find 'log file sync' in the Top 5 Timed Events or the Wait Events section of the Statspack report, in order to understand it we will first look at 'log file parallel write'.
The LGWR background process waits for this event while it is copying redo records from the redo log buffer in memory to the current redo group's member log files on disk. Asynchronous I/O will be used, if available, to make the writes parallel; otherwise, the writes will be done sequentially, one member after the other. In either case, LGWR has to wait until the I/Os to all member log files are complete before the wait is completed. Hence the factor that determines the length of this wait is the speed with which the I/O subsystem can perform the writes to the log file members.
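One way to see how fast the I/O subsystem is completing these writes is to look at the distribution of wait times for the event. The query below is a minimal sketch, assuming Oracle 10g or later (where v$event_histogram is available) and a session that is allowed to query the view. The figures are cumulative since instance startup.

```sql
-- Distribution of LGWR write times since instance startup.
-- WAIT_TIME_MILLI is the upper bound of each bucket, in milliseconds;
-- WAIT_COUNT is the number of waits that fell into that bucket.
SELECT wait_time_milli,
       wait_count
FROM   v$event_histogram
WHERE  event = 'log file parallel write'
ORDER  BY wait_time_milli;
```

If most waits fall into the lowest buckets, the redo writes are probably not the bottleneck; a long tail in the higher buckets suggests that the I/O tuning options are worth a look.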
To reduce the time spent waiting for 'log file parallel write', one approach is to reduce the amount of redo generated by the database. The following are some options for doing that:
- Make use of the UNRECOVERABLE/NOLOGGING options.
- Reduce the number of redo group members to the minimum necessary to ensure that not all members can be lost at the same time.
- Do not leave tablespaces in BACKUP mode for longer than necessary.
- Use only the minimal level of supplemental logging required to achieve the required functionality, for example, in LogMiner, logical standby, or Streams.
- A last approach is to make sure your applications do not wait for LGWR; who cares if LGWR is waiting for I/O if you are not waiting for LGWR? You do this by:
  - Committing infrequently if you are doing a large load. If you load 1 million records and then commit once, you will have to wait a teeny tiny amount of time for LGWR to finish writing, since LGWR was writing continuously during your load (LGWR will have a lot of 'log file parallel write' waits, but you will have one tiny 'log file sync' wait). However, if you commit after each of the 1 million records, you will have to wait 1 million times for the log writer to write. That will be slow.
  - Looking into the ability to use asynchronous commits ('commit work write batch nowait', for example), available since Oracle 10g R2. This has to be done with care, because you are trading recoverability for speed: if buffered redo has not yet been written to the redo log files and your database crashes, you will end up losing committed data.
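The two commit strategies from the last option can be sketched as follows. This is only an illustration: load_demo is a hypothetical table created for the example, and the asynchronous commit syntax assumes Oracle 10g R2 or later.

```sql
-- Hypothetical table, used only for this illustration.
CREATE TABLE load_demo (id NUMBER, payload VARCHAR2(100));

-- Bulk load followed by a single commit: LGWR writes redo during the load,
-- and the session waits on 'log file sync' only once, at the end.
INSERT INTO load_demo (id, payload)
SELECT level, 'row ' || level
FROM   dual
CONNECT BY level <= 1000000;
COMMIT;

-- Asynchronous commit: control returns without waiting for LGWR, so redo
-- that is still in the log buffer is lost if the instance crashes.
INSERT INTO load_demo (id, payload) VALUES (1000001, 'async row');
COMMIT WORK WRITE BATCH NOWAIT;
```

The first form keeps full durability and simply avoids repeated waits; the second removes the wait altogether at the cost of possibly losing the most recent commits, so it only suits data you can afford to reload.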
Another approach is to tune the I/O itself:
- Place redo group members in different storage locations so that parallel writes do not contend with each other
- Do not use RAID-5 for redo log files
- Use ASM; raw devices were deprecated in Oracle 11g and desupported in Oracle 12c
- Use faster disks for redo log files
- If archiving is being used, configure redo storage so that writes for the current redo group members do not contend with reads for the group(s) currently being archived
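Before moving any files, it helps to see where the members of each group currently live. The query below is a minimal sketch that joins v$log and v$logfile; it assumes a session that is allowed to query both views.

```sql
-- One row per redo log member, with the state and size of its group.
-- Members of the same group that share a path prefix are likely to
-- share storage and therefore contend during parallel writes.
SELECT l.group#,
       l.status,
       l.archived,
       l.bytes / 1024 / 1024 AS size_mb,
       f.member
FROM   v$log l
       JOIN v$logfile f ON f.group# = l.group#
ORDER  BY l.group#, f.member;
```

The STATUS and ARCHIVED columns also show which group is CURRENT and which groups are still waiting to be archived, which is useful when checking whether archiver reads and LGWR writes land on the same devices.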
The 'log file sync' event
The 'log file sync' wait event occurs in Oracle foreground processes when they have issued a COMMIT or ROLLBACK operation and are waiting for it to complete. Part (but not all) of this wait includes waiting for the LGWR process to copy the redo records for the session's transaction from the log buffer memory to disk. In the time that a foreground process is waiting for 'log file sync', the LGWR will also wait for a portion of that time on 'log file parallel write'.
The key to understanding what is delaying 'log file sync' is comparing the average time spent waiting for 'log file sync' to that spent waiting for 'log file parallel write'. You can then take action as follows:
- If the average wait times are similar, then redo log file I/O is causing the delay, and the guidelines for tuning that I/O should be followed.
- If the average wait time for 'log file parallel write' is significantly smaller or larger than for 'log file sync', then the delay is caused by the other parts of the redo logging mechanism that occur during COMMIT/ROLLBACK (and are not I/O related). Sometimes there will be contention on the redo latches, evidenced by 'latch free' or 'LGWR wait for redo copy' wait events.
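The comparison described above can be made from v$system_event. The query below is a minimal sketch; the values are cumulative since instance startup, so for a specific workload window the interval figures from a Statspack report are a better basis. The 'latch: redo%' pattern is an assumption that covers releases from 10g onwards, where latch waits are reported under individual event names rather than only as 'latch free'.

```sql
-- Average wait, in milliseconds, for the events used in the comparison.
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1000 / NULLIF(total_waits, 0), 2) AS avg_wait_ms
FROM   v$system_event
WHERE  event IN ('log file sync',
                 'log file parallel write',
                 'LGWR wait for redo copy',
                 'latch free')
   OR  event LIKE 'latch: redo%'
ORDER  BY event;
```

Events that have never been waited on since startup do not appear in the view, so a missing row simply means there were no waits of that kind.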
The 'redo log space request' event
The 'redo log space requests' statistic indicates how many times a server process has had to wait for space in the online redo log files (this is not related to the redo log buffer, as many people think).
You can check the value of redo log space requests by querying the v$sysstat view, as in this example:
SQL> SELECT name,value FROM v$sysstat
  2  WHERE name LIKE '%redo log space requests%';

NAME                                          VALUE
---------------------------------------- ----------
redo log space requests                        1375
Use this information, together with the wait events, as an indication that checkpoints, DBWR, or archiving activity need tuning, rather than LGWR. The waits are caused by the online redo log files and not by the log buffer, so increasing the size of the log buffer will not solve the problem.
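To narrow down which of those activities is holding up the log switch, the related wait events can be listed as well. This is a minimal sketch, assuming the standard 'log file switch ...' event names (for example, 'log file switch (checkpoint incomplete)' and 'log file switch (archiving needed)'); the figures are cumulative since instance startup.

```sql
-- Time spent waiting for a log switch, broken down by cause.
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1000000, 1) AS seconds_waited
FROM   v$system_event
WHERE  event LIKE 'log file switch%'
ORDER  BY time_waited_micro DESC;
```

Waits dominated by the checkpoint-related event point towards DBWR and checkpoint tuning, or larger or more numerous redo log groups, whereas waits on the archiving-related event point towards the archiver.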
The 'log buffer space' event
The 'log buffer space' event occurs when server processes are waiting for free space in the log buffer, because redo is being generated faster than the LGWR process is writing it to the redo log files.
To solve this situation, you need to increase the redo log buffer size or, conversely, reduce the amount of redo being generated. If you have already tuned the redo log buffer size and the problem continues to happen, then the next step is to ensure that the disks on which the online redo logs reside do not suffer from I/O contention (only if LGWR is spending a lot of time in 'log file parallel write').
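Before resizing anything, it is worth checking the current buffer size and how often sessions actually had to wait for space in it. The queries below are a minimal sketch, assuming a session that is allowed to query v$parameter, v$sysstat, and v$system_event.

```sql
-- Current size of the redo log buffer, in bytes.
SELECT name, value
FROM   v$parameter
WHERE  name = 'log_buffer';

-- Retries to allocate log buffer space, next to the number of redo entries.
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('redo buffer allocation retries', 'redo entries');

-- Waits on the event itself since instance startup.
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1000000, 1) AS seconds_waited
FROM   v$system_event
WHERE  event = 'log buffer space';
```

A retries figure that is tiny compared with the number of redo entries, together with few 'log buffer space' waits, suggests that the buffer size is not the problem.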