Recovery on One Node
When an unplanned outage occurs on the primary node, Oracle RAC Guard
automatically fails over to the secondary node and notifies the user that a
role change has occurred. At this point, Oracle RAC Guard is operating in a
non-resilient state, with the primary role on the former secondary node.
After you have performed root cause analysis and repaired the source of the
fault, restore the secondary role on the former primary node by using the
restore command:
PFSCTL> restore
The primary and secondary roles have now been reversed. You can now continue
to operate in one of the following modes:
- Operate with reversed primary and secondary roles.
- Return to the original primary/secondary configuration.
- Choose a less critical application to restore.
Let's examine these modes.
Operate with Reversed Primary and Secondary Roles
After restoring both packs, you can continue to operate with primary and
secondary roles that are reversed from the initial state. For sites with
symmetric configurations, there is no need to return to the original state.
Returning to the original roles requires a planned outage and can be
avoided. In fact, some users intentionally operate with role reversal on a
fixed schedule (such as every three months), in order to test the
capabilities of the system.
Return to the Original Primary/Secondary Configuration
Returning to the original primary/secondary configuration requires a planned
outage while the primary role is moved. Plan it for a less busy part of your
business cycle and give advance notice to users. Execute it as follows:
# pfsctl
PFSCTL> switchover
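The role changes described so far (automatic failover, restore, then an optional switchover) can be sketched as a small state model. This is only an illustration in Python: the Cluster class and the failover, restore, and switchover functions are hypothetical names that mirror the behavior described above, not an Oracle RAC Guard API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cluster:
    """Which node holds each role; None means the role is not running."""
    primary: Optional[str]
    secondary: Optional[str]

    @property
    def resilient(self) -> bool:
        # The system is resilient only when both roles are running.
        return self.primary is not None and self.secondary is not None


def failover(c: Cluster) -> Cluster:
    """Unplanned outage on the primary node: the secondary takes over."""
    if c.secondary is None:
        raise ValueError("no secondary available to take over")
    return Cluster(primary=c.secondary, secondary=None)


def restore(c: Cluster, repaired_node: str) -> Cluster:
    """Bring the repaired node back with the secondary role."""
    if c.secondary is not None:
        raise ValueError("secondary role is already running")
    return Cluster(primary=c.primary, secondary=repaired_node)


def switchover(c: Cluster) -> Cluster:
    """Planned outage: swap the primary and secondary roles."""
    if not c.resilient:
        raise ValueError("switchover needs both roles running")
    return Cluster(primary=c.secondary, secondary=c.primary)


if __name__ == "__main__":
    state = Cluster(primary="Node A", secondary="Node B")
    for step in (failover,
                 lambda c: restore(c, "Node A"),
                 switchover):
        state = step(state)
        print(state, "resilient" if state.resilient else "NOT resilient")
```

In this model the state after restore is the reversed-roles mode, and the final switchover returns the cluster to its original configuration.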
Choose a Less Critical Application to Restore
If there is more than one uniquely identified database on each node, then
performance will be degraded after a failover, under most conditions. For
example, if a two-node cluster is in a primary/secondary configuration and
an unrelated database is running on the secondary node, then the secondary
node runs the primary services, as well as the unrelated database, after
failover and may be overloaded. In this situation, move the less critical
service to the other node when it is restored.
Perform the following steps for each of the services that are moved to the
restored node:
- Set the ORACLE_SERVICE and DB_NAME environment variables. For example:
$ export ORACLE_SERVICE=SALES
$ export DB_NAME=sales
- Restore the instance with the secondary role:
# pfsctl
PFSCTL> restore
- Move the primary role to the original primary node:
PFSCTL> switchover
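Because these steps are repeated for each service, it can help to confirm that both environment variables are set before starting pfsctl. The following is a minimal sketch in Python; the missing_vars helper is a hypothetical name introduced here, and the check only tests that the variables are non-empty, not that their values are valid.

```python
import os

# The two variables named in the first step above.
REQUIRED = ("ORACLE_SERVICE", "DB_NAME")


def missing_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Set these variables before running pfsctl:", ", ".join(missing))
    else:
        print("Environment is ready for", os.environ["ORACLE_SERVICE"])
```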
Recovery on Both Nodes
What happens when both instances fail, one after the other? During normal
operations, both nodes are up and operational. Pack A is running on
its home node, Node A, and has the primary role. It contains the primary
instance and an IP address. Pack B is running on its home node, Node B, and
has the secondary role. It contains the secondary instance and an IP
address.
If the primary instance (controlled by Pack A on Node A) fails, then Oracle
RAC Guard automatically initiates failover actions:
- The secondary instance becomes the primary instance.
- Pack A starts on Node B in foreign mode. This means that only its IP
address is activated on Node B. (This makes the secondary instance become
the primary).
After RAC Guard completes automatic actions, both Pack A and Pack B are
running on Node B. Pack B contains the primary instance and its IP address.
Pack A contains only an IP address. Nothing is running on Node A. The system
is not resilient. If the primary instance now fails again (this is the
former secondary instance, which became primary after the first failure),
then no instance is left and Pack A and Pack B contain only IP addresses.
In this situation, Pack B starts on its foreign node (Node A), while Pack A
is still running on Node B. At this point, only the IP addresses are up on
the nodes. Because no instance is running, Pack B switches back to its home
node (Node B) and tries to restart the primary instance. If the restart is
unsuccessful, Pack B switches again and starts on its foreign node (Node A).
The outcome of a double instance failure is:
- Both packs are running on their foreign nodes.
- Only the IP addresses are up.
- No instances are running.
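The sequence of pack movements above can be traced with a small model. This is a sketch under stated assumptions: the Pack class and the first_failure and second_failure functions are hypothetical names used only to illustrate the behavior described in this section, and the model assumes a two-node cluster.

```python
from dataclasses import dataclass
from typing import Optional

# In a two-node cluster, a pack's foreign node is simply the other node.
OTHER = {"Node A": "Node B", "Node B": "Node A"}


@dataclass
class Pack:
    name: str
    home: str
    node: str
    role: Optional[str]  # "primary", "secondary", or None (IP address only)

    @property
    def foreign(self) -> bool:
        return self.node != self.home


def first_failure(failed: Pack, survivor: Pack) -> None:
    """The primary instance fails: its pack moves to the other node in
    foreign mode (IP only) and the surviving instance becomes primary."""
    failed.node = OTHER[failed.home]
    failed.role = None
    survivor.role = "primary"


def second_failure(pack: Pack, restart_succeeds: bool) -> None:
    """The remaining instance fails as well."""
    pack.role = None
    pack.node = OTHER[pack.home]      # starts on its foreign node, IP only
    pack.node = pack.home             # no instance anywhere: go home, retry
    if restart_succeeds:
        pack.role = "primary"
    else:
        pack.node = OTHER[pack.home]  # restart failed: foreign node again


if __name__ == "__main__":
    pack_a = Pack("Pack A", "Node A", "Node A", "primary")
    pack_b = Pack("Pack B", "Node B", "Node B", "secondary")
    first_failure(pack_a, pack_b)
    second_failure(pack_b, restart_succeeds=False)
    for p in (pack_a, pack_b):
        print(p.name, "on", p.node, "| foreign:", p.foreign, "| role:", p.role)
```

With restart_succeeds=False, both packs end on their foreign nodes with no role, which matches the three outcomes listed above.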
The DBA must diagnose and repair the cause of the failures. Once the cause
for the failures is found and corrected, perform the following steps to
restart the instances:
- Halt both packs. This is done by entering the following command from
pfsctl:
PFSCTL> pfshalt
You will see output similar to the following if the command is successful:
pfshalt command succeeded.
- Start both packs. This is done by entering the following command from
pfsctl:
PFSCTL> pfsboot
You will see output similar to the following if the command is successful:
pfsboot command succeeded.