Windows Server 2008 R2 Unleashed (233 page)

When a problem is encountered with a cluster resource, the failover cluster service

attempts to fix the problem by restarting the resource and any dependent resources. If that

doesn’t work, the Services and Applications group that contains the resource is failed

over to another available node in the cluster, where it can then be restarted. Several

conditions can cause a Services and Applications group to fail over to a different cluster node.

Failover can occur when an active node in the cluster loses power or network connectivity

or suffers a hardware or software failure. In most cases, the failover process is either

noticed by the clients as a short disruption of service or is not noticed at all. Of course, if

failback is configured on a particular Services and Applications group and the group is

simply not stable but all possible nodes are available, the group will be continually moved

back and forth between the nodes until the failover threshold is reached. When this

happens, the cluster service shuts the group down and leaves it offline.
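
The failover threshold behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the `threshold` and `period_hours` parameters, and the sliding-window logic are hypothetical and are not taken from the cluster service itself.

```python
from dataclasses import dataclass, field


@dataclass
class GroupFailoverTracker:
    """Hypothetical tracker for one Services and Applications group."""
    threshold: int        # assumed: failovers allowed within the period
    period_hours: float   # assumed: length of the sliding window, in hours
    offline: bool = False
    _events: list = field(default_factory=list)

    def record_failure(self, now_hours: float) -> str:
        """Record a group failure and return the action taken."""
        if self.offline:
            return "offline"
        # Keep only the failovers that are still inside the window.
        self._events = [t for t in self._events
                        if now_hours - t < self.period_hours]
        if len(self._events) >= self.threshold:
            # Threshold reached: stop moving the group and leave it offline.
            self.offline = True
            return "offline"
        self._events.append(now_hours)
        return "failover"


tracker = GroupFailoverTracker(threshold=2, period_hours=6)
for hour in (0, 1, 2):
    print(hour, tracker.record_failure(hour))
```

In this sketch an unstable group is moved twice and then left offline on the third failure, which mirrors the back-and-forth behavior described in the text.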

To avoid unwanted failover, power management should be disabled on each of the cluster

nodes in the motherboard BIOS, on the network interface cards (NICs), and in the Power

applet in the operating system’s Control Panel. Power settings that allow a display to shut

off are okay, but the administrator must make sure that the disks, as well as each of the

network cards, are configured to never go into Standby mode.

Cluster nodes can monitor the status of resources running on their local system, and they

can also keep track of other nodes in the cluster through private network communication

messages called heartbeats. Heartbeat communication is used to determine the status of a

node and send updates of cluster configuration changes and the state of each node to the

cluster quorum.
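
As a rough illustration of how heartbeats can be used to decide whether a node is still alive, consider the following sketch. The class, the one-second interval, and the limit of five missed heartbeats are assumptions chosen for the example, not values quoted from this chapter.

```python
class HeartbeatMonitor:
    """Hypothetical monitor that tracks the last heartbeat seen per node."""

    def __init__(self, interval_s=1.0, missed_limit=5):
        self.interval_s = interval_s      # assumed time between heartbeats
        self.missed_limit = missed_limit  # assumed misses before a node is "down"
        self.last_seen = {}

    def heartbeat(self, node, now_s):
        """Record that a heartbeat from `node` arrived at time `now_s`."""
        self.last_seen[node] = now_s

    def down_nodes(self, now_s):
        """Return nodes whose last heartbeat is older than the allowed gap."""
        limit = self.interval_s * self.missed_limit
        return sorted(node for node, seen in self.last_seen.items()
                      if now_s - seen > limit)


monitor = HeartbeatMonitor()
monitor.heartbeat("node1", 0.0)
monitor.heartbeat("node2", 0.0)
monitor.heartbeat("node1", 4.0)   # node2 stays silent
print(monitor.down_nodes(6.0))
```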

The cluster quorum contains the cluster configuration data necessary to restore a cluster to

a working state. Regardless of which quorum model is chosen, each node in the cluster

needs access to the quorum resource, or the node will not be able to participate in

the cluster. This prevents “split-brain” syndrome, where two nodes in

the same cluster both believe they are the active node and try to control the shared

resource at the same time. Worse, when separate data sets are available, each node can

present its own set of data, which leads to changes in both data sets and a cascade of

subsequent issues. Windows Server 2008 R2 provides four different quorum models, which

are detailed in the section “Failover Cluster Quorum Models” later in this chapter.

CHAPTER 29

System-Level Fault Tolerance (Clustering/Network Load Balancing)

Network Load Balancing

The second clustering technology provided with Windows Server 2008 R2 is Network

Load Balancing (NLB). NLB clusters provide high network performance, availability, and

redundancy by balancing client requests across several servers with replicated

configurations. When client load increases, NLB clusters can easily be scaled out by adding more

nodes to the cluster to maintain or provide better response time to client requests. One

important point to note now is that NLB does not itself replicate server configuration or

application data sets.

Two great features of NLB are that no proprietary hardware is needed and an NLB cluster

can be configured and running in minutes. Keep in mind that within NLB clusters,

each server’s configuration must be updated independently. The NLB administrator is

responsible for making sure that application or service configuration, software versions,

operating system security settings, updates, and data are

kept consistent across all NLB cluster nodes. For details on installing NLB, refer to the

“Deploying Network Load Balancing Clusters” section later in this chapter.
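
Because NLB does not replicate configuration between nodes, an administrator might script a simple consistency check. The following is a minimal sketch, assuming each node's settings have already been collected into a dictionary; the node names and setting keys are hypothetical.

```python
def find_config_drift(node_configs):
    """Return {setting: {node: value}} for every setting that differs between nodes."""
    all_keys = set()
    for cfg in node_configs.values():
        all_keys.update(cfg)
    drift = {}
    for key in sorted(all_keys):
        # A setting missing on a node shows up as None and counts as drift.
        values = {node: cfg.get(key) for node, cfg in node_configs.items()}
        if len(set(values.values())) > 1:
            drift[key] = values
    return drift


# Hypothetical inventory collected from two NLB nodes.
inventory = {
    "web1": {"app_version": "2.1", "os_patch_level": "SP1"},
    "web2": {"app_version": "2.0", "os_patch_level": "SP1"},
}
print(find_config_drift(inventory))
```

How the inventory is gathered is left open here; the point is only that every node has to be compared, because nothing in NLB does this automatically.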

Overview of Failover Clusters

After an organization decides to cluster an application or service using failover clusters, it

must then decide which cluster configuration model best suits the needs of the particular

deployment. Failover clusters can be deployed using four different configuration models

that will accommodate most deployment scenarios and requirements. The four configuration

models in this case are defined by the quorum model selected: the

Node Majority Quorum, Node and Disk Majority Quorum, Node and File Share Majority

Quorum, and the No Majority: Disk Only Quorum. The typical and most common cluster

deployment that includes two or more nodes in a single data center is the Node and Disk

Majority Quorum model. Another configuration model of failover clusters that utilizes one

of the previously mentioned quorum models is the geographically dispersed cluster, which

is deployed across multiple networks and geographic locations. Geographically dispersed

clusters or stretch clusters will be detailed later in this chapter in the “Deploying Multisite

or Stretch Geographically Dispersed Failover Clusters” section.

Failover Cluster Quorum Models

As previously stated, Windows Server 2008 R2 failover clusters support four different

cluster quorum models. Each of these four models is best suited for specific configurations,

but if all the nodes and shared storage are configured, specified, and available during the

installation of the failover cluster, the best-suited quorum model is automatically selected.

Node Majority Quorum

The Node Majority Quorum model has been designed for failover cluster deployments

that contain an odd number of cluster nodes. When determining the quorum state of the

cluster, only the number of available nodes is counted. A cluster using the Node Majority

Quorum is called a Node Majority cluster. A Node Majority cluster remains up and

running if the number of available nodes exceeds the number of failed nodes. As an

example, in a five-node cluster, three nodes must be available for the cluster to remain

online. If three nodes fail in a five-node Node Majority cluster, the entire cluster is shut

down. Node Majority clusters have been designed and are well suited for geographically or

network dispersed cluster nodes, but for this configuration to be supported by Microsoft, it

takes serious effort, quality hardware, a third-party mechanism to replicate any back-end

data, and a very reliable network. Once again, this model works well for clusters with an

odd number of nodes.
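
The Node Majority rule is simple arithmetic, and a short sketch makes it concrete. The function names below are hypothetical; the logic only restates the rule given above, that available nodes must outnumber failed nodes.

```python
def node_majority_has_quorum(total_nodes, available_nodes):
    """True if available nodes outnumber failed nodes (Node Majority rule)."""
    failed_nodes = total_nodes - available_nodes
    return available_nodes > failed_nodes


def min_nodes_for_quorum(total_nodes):
    """Smallest number of available nodes that still forms a majority."""
    return total_nodes // 2 + 1


# Five-node example from the text: three nodes keep the cluster online,
# and losing three nodes shuts it down.
print(min_nodes_for_quorum(5))
print(node_majority_has_quorum(5, 3))
print(node_majority_has_quorum(5, 2))
```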

Node and Disk Majority Quorum

The Node and Disk Majority Quorum model determines whether a cluster can continue to

function by counting the number of available nodes and the availability of the cluster

witness disk. Using this model, the cluster quorum is stored on a cluster disk that is

accessible to all nodes in the cluster through a shared storage device using

Serial Attached SCSI (SAS), Fibre Channel, or iSCSI connections. This model is the closest

to the traditional single-quorum device cluster configuration model and is composed of

two or more server nodes that are all connected to a shared storage device. In this model,

only one copy of the quorum data is maintained on the witness disk. This model is well

suited for failover clusters using shared storage, all connected on the same network with

an even number of nodes. For example, on a 2-, 4-, 6-, 8-, or 16-node cluster using this

model, the cluster continues to function as long as half of the total nodes are available

and can contact the witness disk. In the case of a witness disk failure, a majority of the

nodes need to remain up and running for the cluster to continue to function. To calculate

this, take half of the total nodes and add one; the result is the lowest number of

available nodes that are required to keep a cluster running when the witness disk fails or

goes offline. For example, on a 6-node cluster using this model, if the witness disk fails,

the cluster will remain up and running as long as 4 nodes are available, but on a 2-node

cluster, if the witness disk fails, both nodes will need to remain up and running for the

cluster to function.
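
The examples above can be checked with a small vote-counting sketch. It assumes that each node and the witness disk contribute one vote each and that the cluster needs more than half of all votes; the function name is hypothetical.

```python
def node_and_disk_has_quorum(total_nodes, available_nodes, witness_available):
    """True if available votes are a strict majority of all votes.

    Assumption: every node and the witness disk count as one vote each.
    """
    total_votes = total_nodes + 1
    votes = available_nodes + (1 if witness_available else 0)
    return 2 * votes > total_votes


# 6-node cluster: half the nodes plus the witness disk is enough.
print(node_and_disk_has_quorum(6, 3, witness_available=True))
# 6-node cluster with a failed witness disk: four nodes are required.
print(node_and_disk_has_quorum(6, 4, witness_available=False))
print(node_and_disk_has_quorum(6, 3, witness_available=False))
# 2-node cluster with a failed witness disk: both nodes must stay up.
print(node_and_disk_has_quorum(2, 1, witness_available=False))
```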

Node and File Share Majority Quorum

The Node and File Share Majority Quorum model is very similar to the Node and Disk

Majority Quorum model, but instead of a witness disk, the quorum is stored on a file share.

The advantage of this model is that it can be deployed similarly to the Node Majority

Quorum model, but as long as the witness file share is available, this model can tolerate the

failure of half of the total nodes. This model is well suited for clusters with an even number

of nodes that do not utilize shared storage or clusters that span sites. This is the preferred

and recommended quorum configuration for geographically dispersed failover clusters.

No Majority: Disk Only Quorum

The No Majority: Disk Only Quorum model is best suited for testing the process and

behavior of deploying built-in or custom services and/or applications on a Windows

Server 2008 R2 failover cluster. In this model, the cluster can sustain the failure of all

nodes except one, as long as the disk containing the quorum remains available. The

limitation of this model is that the disk containing the quorum becomes a single point of

failure, which is why this model is not well suited for production deployments of

failover clusters.

As a best practice, before deploying a failover cluster, determine if shared storage will be

used, verify that each node can communicate with each LUN presented by the shared

storage device, and when the cluster is created, add all nodes to the list. This ensures that

the correct recommended cluster quorum model is selected for the new failover cluster.

When the recommended model utilizes shared storage and a witness disk, the smallest

available LUN will be selected. This can be changed, if necessary, after the cluster is created.
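
The suitability guidance in this section can be summarized as a small decision helper. This is a sketch of the guidance as described in the text, not the actual logic of the cluster creation wizard; the function names, parameters, and LUN sizes are hypothetical.

```python
def suggest_quorum_model(node_count, shared_storage, multisite=False):
    """Suggest a quorum model from the guidance in this section (illustrative only)."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if node_count % 2 == 1:
        # Odd node counts suit Node Majority.
        return "Node Majority"
    if shared_storage and not multisite:
        # Even node counts with shared storage on one network.
        return "Node and Disk Majority"
    # Even node counts without shared storage, or clusters that span sites.
    return "Node and File Share Majority"


def pick_witness_lun(lun_sizes_gb):
    """Return the name of the smallest available LUN, as the text describes."""
    return min(lun_sizes_gb, key=lun_sizes_gb.get)


print(suggest_quorum_model(2, shared_storage=True))
print(suggest_quorum_model(4, shared_storage=False))
print(pick_witness_lun({"LUN1": 500, "LUN2": 1, "LUN3": 50}))
```

The No Majority: Disk Only model is deliberately never suggested here, because the text recommends it only for testing.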

Choosing Applications for Failover Clusters

Many applications can run on failover clusters, but it is important to choose and test

those applications wisely. Even if an application can run on a failover cluster, it

might not be optimized for clustering or supported by the software vendor or Microsoft

when deployed on Windows Server 2008 R2 failover clusters. Work with the vendor to

determine requirements, functionality, and limitations (if any). Other major criteria that

should be met to ensure that an application can benefit from and adapt to running on a cluster

are the following:

• Because clustering is IP-based, the cluster application or applications must use an

IP-based protocol.

• Applications that require access to local databases must have the option of configuring

where the data can be stored so a drive other than the system drive can be specified

for data storage that is separate from the storage of the application core files.

• Some applications need to have access to data regardless of which cluster node they

are running on. With these types of applications, it is recommended that the data be

stored on a shared disk resource that will fail over with the Services and Applications

group. If an application will run and store data only on the local system or boot

drive, the Node Majority Quorum or the Node and File Share Majority Quorum

model should be used, along with a separate file replication mechanism for the

application data.

• Client sessions must be able to reestablish connectivity if the application encounters

a network disruption or fails over to an alternate cluster node. During the failover

process, there is no client connectivity until an application is brought back online.

If the client software does not try to reconnect and simply times out when a network

connection is broken, this application might not be well suited for failover or

NLB clusters.
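
The last criterion, that clients must try to reconnect, can be illustrated with a generic retry loop. This is a minimal sketch: `connect` stands for whatever function the client uses to open its session, and the attempt count and doubling delay are assumptions picked for the example.

```python
import time


def connect_with_retry(connect, attempts=5, initial_delay_s=1.0, sleep=time.sleep):
    """Call `connect` until it succeeds or the attempts run out.

    `connect` is a hypothetical callable that raises OSError while the
    clustered application is still being brought online on another node.
    """
    delay = initial_delay_s
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)   # wait before the next try
                delay *= 2     # back off a little more each time
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

A client built this way rides out the brief outage during failover, whereas a client that simply times out on the first broken connection would surface the failover to the user as an error.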

Cluster-aware applications that meet all of the preceding criteria are usually the best

applications to deploy in a Windows Server 2008 R2 failover cluster. Many services built in to

Windows Server 2008 R2 can be clustered and will fail over efficiently and properly. If a

particular application is not cluster-aware, be sure to investigate all the implications of the

application deployment on Windows Server 2008 R2 failover clusters before deploying or

spending any time prototyping the solution.

NOTE

If you’re purchasing a third-party software package to use for Windows Server 2008 R2

failover clustering, be sure that both Microsoft and the software manufacturer certify

that it will work on Windows Server 2008 R2 failover clusters; otherwise, support will

be limited or nonexistent when troubleshooting is necessary.

Shared Storage for Failover Clusters

Shared disk storage is a requirement for Windows Server 2008 R2 failover clusters using

the Node and Disk Majority Quorum and the Disk Only Quorum models. Shared storage

devices can be a part of any cluster configuration and when they are used, the disks, disk
