When the primary node fails and the application or database becomes
unavailable, a script or agent initiates the failover process. Failover in a
host-based database system usually includes the following steps, performed
in order:
- Detecting failure by monitoring the heartbeat and checking the status of
resources.
- Reorganizing cluster membership in the cluster manager.
- Transferring disk ownership from the primary node to a secondary node.
- Mounting the file system on a secondary node.
- Starting the database instance on a secondary node.
- Recovering the database and rolling back uncommitted data.
- Reestablishing client connections to the failover node (database).
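The strictly sequential nature of these steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_failover` and the step names are hypothetical, and each `lambda` stands in for a real cluster action (a disk import, a mount, a database startup) that would report success or failure.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_failover(steps: List[Step]) -> List[str]:
    """Run the steps in order and stop at the first one that fails."""
    completed = []
    for name, action in steps:
        if not action():
            # A later step must never run if an earlier one failed,
            # e.g. do not mount the file system before owning the disks.
            raise RuntimeError(f"failover aborted at step: {name}")
        completed.append(name)
    return completed


# Hypothetical placeholders: every action simply reports success.
FAILOVER_STEPS: List[Step] = [
    ("detect failure", lambda: True),
    ("reorganize cluster membership", lambda: True),
    ("transfer disk ownership", lambda: True),
    ("mount file system", lambda: True),
    ("start database instance", lambda: True),
    ("recover database", lambda: True),
    ("reestablish client connections", lambda: True),
]

if __name__ == "__main__":
    for step in run_failover(FAILOVER_STEPS):
        print("done:", step)
```

Stopping at the first failed step reflects the ordering constraint in the list: each step depends on the one before it having completed.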
Let us look at some examples of how a typical resource group is
configured in a Sun failover cluster, a Veritas cluster server, and the
Microsoft cluster service.
Examples
Sun Cluster 3.0 defines "agents" to describe a third-party application,
such as Oracle or the iPlanet Web Server, that has been configured to run on
a cluster rather than on a single server. The applications themselves need
no changes to run as cluster agents. They are merely "wrapped" with
scripts that allow the cluster framework to understand how to start, stop,
and monitor the health of the given service.
The Resource Group Manager (RGM) component of the cluster framework
supports the registration and operation of applications. Resources that have
dependencies on each other can be grouped together, so that in case of
failure, the correct reconfigurations can be made without impacting any
service not affected by this failure. The RGM also monitors the health of
the application and determines when failures happen and how to react to
them.
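One way to picture how grouped, interdependent resources can be brought online in the correct order is a topological sort over their dependencies. The sketch below is not the RGM's actual mechanism or API; the resource names and the `start_order`/`stop_order` helpers are hypothetical, and it relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical resource group: each resource maps to the set of
# resources that must be online before it can start.
dependencies: Dict[str, Set[str]] = {
    "disk_group": set(),
    "file_system": {"disk_group"},
    "logical_ip": set(),
    "database": {"file_system"},
    "listener": {"logical_ip", "database"},
}


def start_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every resource follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def stop_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Stopping is the reverse: dependents go offline before what they use."""
    return list(reversed(start_order(deps)))


if __name__ == "__main__":
    print("start:", start_order(dependencies))
    print("stop: ", stop_order(dependencies))
```

Several valid orders usually exist, so the exact output may vary; what matters is that the disk group precedes the file system and the database precedes the listener.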
Sun Cluster 2.2, by contrast, implements the concept of logical hosts to
achieve failover of data services. A logical host is a group consisting of
an IP name and address, and one or more disk groups. Configuring an
application failover involves creation of a logical host. The logical host
is the basic failover unit. Logical host definitions are created by the
administrator and are associated with a particular data service (such as
Oracle DB). The logical host has all the necessary information for a
designated backup system to take over the data services of a failed node.
During the failover process, the logical host is migrated over to the
backup node. Disk ownership is transferred, and the appropriate data services
are started on the backup node. Clients continue to address the same logical
host and associated data and services as they were doing before the
failover. The only difference is that the logical host is now owned by the
backup physical node, which has assumed the identity of the master node
through the implementation of the logical host.
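The logical host idea can be modeled as a small record whose client-visible identity stays fixed while its physical owner changes. This is a minimal sketch, not Sun Cluster configuration syntax: the `LogicalHost` class, its field names, the node names, and the address (taken from the 192.0.2.0/24 documentation range) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogicalHost:
    name: str               # IP name that clients connect to
    ip_address: str         # address that moves with the logical host
    disk_groups: List[str]  # storage that moves with the logical host
    data_service: str       # service associated with it, e.g. an Oracle DB
    owner: str              # physical node currently mastering the host


def fail_over(host: LogicalHost, backup_node: str) -> LogicalHost:
    """Hand the logical host to the backup node.

    Only the owner changes; the name, address, and disk groups that
    clients and the data service rely on stay exactly the same.
    """
    host.owner = backup_node
    return host


if __name__ == "__main__":
    lh = LogicalHost("oradb-lh", "192.0.2.10", ["oradg1"], "Oracle DB", "node-a")
    fail_over(lh, "node-b")
    print(lh.name, lh.ip_address, "is now owned by", lh.owner)
```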
In the case of the Veritas Cluster Server (VCS), the service group is the
basic unit of failover. The service group fails over to a backup node when
a failure occurs at the primary node. Service groups consist of related
resources that work together to deliver database service to clients. Service
groups allow you to monitor and control service availability as a whole, as
opposed to the individual items (servers, disks, software, etc.). The
failure of one critical item in the service group will cause the entire
group to fail over to another system.
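The rule that a single critical failure moves the whole group can be expressed as a short model. The sketch below is a simplification and not the VCS API or configuration format: the `Resource` and `ServiceGroup` classes, the resource names, and the node names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resource:
    name: str
    critical: bool
    online: bool = True


@dataclass
class ServiceGroup:
    name: str
    resources: List[Resource]
    node: str  # system where the group is currently online

    def needs_failover(self) -> bool:
        # One failed critical resource is enough to move the whole group.
        return any(r.critical and not r.online for r in self.resources)

    def evaluate(self, backup_node: str) -> str:
        """Move the entire group if needed and return the hosting node."""
        if self.needs_failover():
            self.node = backup_node
            for r in self.resources:
                r.online = True  # every resource restarts on the new node
        return self.node


if __name__ == "__main__":
    group = ServiceGroup(
        "oracle_sg",
        [Resource("disk", True), Resource("listener", True), Resource("mailer", False)],
        node="sys-a",
    )
    group.resources[1].online = False  # a critical resource fails
    print("group now on", group.evaluate("sys-b"))
```

In this model a failed non-critical resource leaves the group where it is, which matches the text's emphasis on "one critical item" as the trigger.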
For an Oracle database implemented within the framework of
Microsoft Cluster Service (MSC), the cluster group includes the following
resources:
- One or more virtual addresses, each of which consists of an IP address
and network name.
- The Oracle database server.
- All disks used by the Oracle database.
- A Net8 (or SQL*Net) network listener that listens on the virtual
address (or addresses) of the group for connection requests to the
databases in the group.
- An Oracle Intelligent Agent configured to use one of the group’s
virtual addresses (if Oracle Enterprise Manager will be used to manage the
database).
The cluster group is the basic failover unit in MSC. Oracle provides the
Oracle Fail Safe Manager tool to configure and manage the Oracle database
failover service within the MSC framework.
As the examples above show, the database instance is started fresh on the
backup node once the required resources are online. The arrangement is
mutually exclusive: at any given time the database instance resides either
on the primary node or on a backup node, never on both.