Recovering from an Unplanned Outage Using Oracle RAC Guard

September 11, 2003
Don Burleson

 

Recovery on One Node

When an unplanned outage occurs on the primary node, Oracle RAC Guard automatically fails over to the secondary node and notifies the user that a role change has occurred. At this point, Oracle RAC Guard is operating in a non-resilient state, with the primary role on the former secondary node.

After you have performed root cause analysis and repaired the source of the fault, restore the secondary role on the former primary node by using the restore command:

PFSCTL> restore

The primary and secondary roles have now been reversed. You can now continue to operate in one of the following modes:

  • Operate with reversed primary and secondary roles.
  • Return to the original primary/secondary configuration.
  • Choose a less critical application to restore.

Let's examine these modes.

Operate with Reversed Primary and Secondary Roles

After restoring both packs, you can continue to operate with primary and secondary roles that are reversed from the initial state. For sites with symmetric configurations, there is no need to return to the original state. Returning to the original roles requires a planned outage and can be avoided. In fact, some users intentionally operate with role reversal on a fixed schedule (such as every three months), in order to test the capabilities of the system.

Return to the Original Primary/Secondary Configuration

Returning to the original primary/secondary configuration requires a planned outage while the primary role is moved. Plan it for a less busy part of your business cycle and give advance notice to users. Execute it as follows:

# pfsctl
PFSCTL> switchover
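The role changes described so far (automatic failover, restore, and switchover) can be pictured as a small state model. The following Python sketch is only an illustration of the reasoning, not an Oracle API: the class name `Cluster`, the node names, and the rule that a switchover needs both roles running are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    """Toy model of which node holds which role (hypothetical, not an Oracle API)."""
    primary: Optional[str] = "node_a"
    secondary: Optional[str] = "node_b"
    down: Optional[str] = None  # node waiting for repair

    def is_resilient(self) -> bool:
        # Resilient only while both roles are held by a running node.
        return self.primary is not None and self.secondary is not None

    def fail_primary(self) -> None:
        """Unplanned outage: the secondary node takes over the primary role."""
        if self.secondary is None:
            raise RuntimeError("no secondary available to take over")
        self.down = self.primary
        self.primary, self.secondary = self.secondary, None

    def restore(self) -> None:
        """After repair: the failed node rejoins with the secondary role."""
        if self.down is None:
            raise RuntimeError("nothing to restore")
        self.secondary, self.down = self.down, None

    def switchover(self) -> None:
        """Planned outage: swap the primary and secondary roles."""
        # Assumption of this model: both roles must be running to swap them.
        if not self.is_resilient():
            raise RuntimeError("switchover needs both roles running")
        self.primary, self.secondary = self.secondary, self.primary


cluster = Cluster()
cluster.fail_primary()   # node_b holds the primary role, no secondary
cluster.restore()        # node_a rejoins as secondary: roles are reversed
cluster.switchover()     # roles return to the original configuration
print(cluster.primary, cluster.secondary, cluster.is_resilient())
```

In this model the cluster is non-resilient only between the failure and the restore, which matches the window the article warns about.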

Choose a Less Critical Application to Restore

If more than one uniquely identified database runs on each node, performance will usually be degraded after a failover. For example, if a two-node cluster is in a primary/secondary configuration and an unrelated database is running on the secondary node, then after failover the secondary node runs the primary services as well as the unrelated database, and may be overloaded. In this situation, move the less critical service to the other node once that node is restored.

Perform the following steps for each of the services that are moved to the restored node:

  1. Set the ORACLE_SERVICE and DB_NAME environment variables. For example:

$ export ORACLE_SERVICE=SALES
$ export DB_NAME=sales

  2. Restore the instance with the secondary role:

# pfsctl
PFSCTL> restore

  3. Move the primary role to the original primary node:

PFSCTL> switchover
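Because the steps above are repeated for each service, it is easy to start pfsctl with the environment still pointing at the previous service. A minimal pre-flight check, sketched in Python, might look like the following. The function name `missing_vars` is hypothetical; only the two variable names come from the steps above.

```python
import os
from typing import List, Mapping

# The two variables named in step 1.
REQUIRED_VARS = ("ORACLE_SERVICE", "DB_NAME")


def missing_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Set before running pfsctl: " + ", ".join(missing))
    else:
        print("Environment is set for service " + os.environ["ORACLE_SERVICE"])
```

The check only confirms that the variables are present. It cannot tell whether they name the service you intend to restore, so it is still worth reading the values back before running restore.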

Recovery on Both Nodes

What happens when the instances on both nodes fail, one after the other? Start from normal operation, in which both nodes are up and operational. Pack A is running on its home node, Node A, and has the primary role. It contains the primary instance and an IP address. Pack B is running on its home node, Node B, and has the secondary role. It contains the secondary instance and an IP address.

If the primary instance (controlled by Pack A on Node A) fails, then Oracle RAC Guard automatically initiates failover actions:

  • The secondary instance becomes the primary instance.
  • Pack A starts on Node B in foreign mode. This means that only its IP address is activated on Node B. (The instance that was secondary on Node B now serves as the primary.)

After RAC Guard completes its automatic actions, both Pack A and Pack B are running on Node B. Pack B contains the primary instance and its IP address. Pack A contains only an IP address. Nothing is running on Node A, and the system is not resilient. If the primary instance now fails as well (this is the former secondary instance, promoted to primary at the first failure), then Pack A and Pack B contain only IP addresses.

In this situation, Pack B starts on its foreign node (Node A). Pack A is still running on Node B. At this point, only the IP addresses are up on the nodes. Because no instance is running, Pack B switches back, restarts on its home node (Node B), and tries to restart the primary instance. If restarting the instance is unsuccessful, Pack B switches again and starts on its foreign node (Node A). The outcome of a double instance failure is:

  • Both packs are running on their foreign nodes.
  • Only the IP addresses are up.
  • No instances are running.
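The sequence that leads to this outcome can be traced with a short Python sketch. It is a simplified model of the behavior described above, not RAC Guard code: the `Pack` class and the two functions are hypothetical names, and the single `restart_succeeds` flag stands in for whatever actually determines whether the instance restarts.

```python
from dataclasses import dataclass
from typing import List

HOME = {"Pack A": "Node A", "Pack B": "Node B"}
FOREIGN = {"Pack A": "Node B", "Pack B": "Node A"}


@dataclass
class Pack:
    name: str
    node: str
    instance_up: bool  # False means the pack holds only its IP address


def first_failure(pack_a: Pack) -> None:
    """Primary instance fails: Pack A starts on its foreign node, IP only."""
    pack_a.node = FOREIGN[pack_a.name]
    pack_a.instance_up = False


def second_failure(pack_b: Pack, restart_succeeds: bool) -> List[str]:
    """Remaining instance fails; return the nodes Pack B visits, in order."""
    pack_b.instance_up = False
    # Pack B first starts on its foreign node, then goes home to retry.
    visited = [FOREIGN[pack_b.name], HOME[pack_b.name]]
    if restart_succeeds:
        pack_b.instance_up = True
    else:
        # Restart failed: Pack B ends up on its foreign node again.
        visited.append(FOREIGN[pack_b.name])
    pack_b.node = visited[-1]
    return visited


pack_a = Pack("Pack A", HOME["Pack A"], instance_up=True)
pack_b = Pack("Pack B", HOME["Pack B"], instance_up=True)
first_failure(pack_a)
path = second_failure(pack_b, restart_succeeds=False)
print(pack_a)
print(pack_b)
print(path)
```

With `restart_succeeds=False` the model ends with both packs on their foreign nodes and no instance running, which is the state listed above.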

The DBA must diagnose and repair the cause of the failures. Once the cause of the failures has been found and corrected, perform the following steps to restart the instances:

  1. Halt both packs. This is done by entering the following command from pfsctl:

PFSCTL> pfshalt

You will see output similar to the following if the command is successful:

pfshalt command succeeded.

  2. Start both packs. This is done by entering the following command from pfsctl:

PFSCTL> pfsboot

You will see output similar to the following if the command is successful:

pfsboot command succeeded.
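If the pfsctl session output is captured to a log, the two confirmation messages can be checked in order. The Python sketch below assumes the messages match the sample output shown above. Since the article says only that the output is "similar to" these lines, treat the matched text as an assumption and adjust it to what your installation prints. The function names are hypothetical.

```python
from typing import Mapping, Optional

# Halt must complete before boot, as in the two steps above.
RESTART_SEQUENCE = ("pfshalt", "pfsboot")


def succeeded(command: str, output: str) -> bool:
    """True if the output contains the success message for the command."""
    # Assumed message format, taken from the sample output in the steps above.
    return (command + " command succeeded") in output


def first_failed_step(outputs: Mapping[str, str]) -> Optional[str]:
    """Return the first command without a success message, or None if all passed."""
    for command in RESTART_SEQUENCE:
        if not succeeded(command, outputs.get(command, "")):
            return command
    return None


captured = {
    "pfshalt": "pfshalt command succeeded.",
    "pfsboot": "pfsboot command succeeded.",
}
print(first_failed_step(captured))
```

A return value of `None` means both steps reported success. Any other value names the step to investigate before going further.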




Copyright © 1996 -  2011 by Burleson Enterprises. All rights reserved.

Oracle® is the registered trademark of Oracle Corporation.