Windows Server 2008 R2 Unleashed

corruption, or a new application or driver installation could overwrite a critical file leaving

a system unstable or in a failed state. Also, more commonly found in today’s networks, a

security, application, or system update conflicts with an existing application or service

causing undesirable issues.

Prioritizing the Recovery

After all of the computer services and applications used on a network are identified, and the typical disaster scenarios to be covered by the backup and recovery plan are chosen, the next step is to prioritize how the recovery of critical systems and services will be executed. Prioritization usually means getting the most critical services up and running first. On corporate networks that use Microsoft Windows servers and client operating systems, this typically means networking services such as DNS and DHCP, as well as Active Directory domain controllers.
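One way to turn a priority list into a concrete recovery sequence is to record which services depend on which and then sort them so that prerequisites come first. The following is a minimal sketch of that idea, not a prescribed method: the service names and the dependency map are hypothetical examples, and it assumes Python 3.9 or later for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists the services that must be
# recovered before it. These entries are illustrative assumptions only.
DEPENDS_ON = {
    "dns": [],
    "dhcp": [],
    "active_directory": ["dns"],
    "file_server": ["active_directory"],
    "email": ["active_directory", "dns"],
}


def recovery_order(depends_on):
    """Return a sequence in which every service follows its prerequisites."""
    return list(TopologicalSorter(depends_on).static_order())


order = recovery_order(DEPENDS_ON)
print(order)
```

A dependency sort only guarantees that prerequisites are restored first. Business priority among services that do not depend on each other still has to come from the prioritized list itself.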

Maintaining up-to-date backup and recovery plans requires following strict processes when changing an organization’s computer and network infrastructure. With an up-to-date technology priority list, administrators can tackle the planning for the most important services first to ensure that if a disaster strikes sooner rather than later, the most important systems are always protected and recoverable.


Identifying Bare Minimum Services

The bare minimum services are the fewest possible services and applications that must be up and running for business operations to continue. Only the top few services and applications in the prioritized technology list will become part of the bare minimum services list. For example, a bare minimum computer service for a retail outlet could be a server that runs the retail software package and manages the register and receipt printer. For a web-based company, it could be the web and e-commerce servers that process online orders.

Determining the Service-Level Agreement and Return-to-Operation Requirements

A service-level agreement (SLA) is an estimate of the planned uptime or availability time frame for a system, service, or application. SLAs are usually defined in hours per day, week, month, or year and are expressed as percentages. For example, if the corner grocery store claims to be open 24 hours a day, every day of the year, the grocery store SLA is 100%. Another example could be an organization’s electronic fax services that should be available 7 days a week between the hours of 5:00 a.m. and 11:00 p.m.

Many organizations hope to achieve and maintain operation of the most critical services 24 hours a day, 7 days a week, or as close to 100% planned uptime as is logistically possible. A few common SLA targets are included in the following list:


. 99.999% planned uptime results in about 5.26 minutes of planned downtime or maintenance per year.

. 99.99% planned uptime results in about 52.6 minutes of planned downtime or maintenance per year.

. 99.9% planned uptime results in 8 hours, 45.6 minutes of planned downtime or maintenance per year.

. 99.7% planned uptime results in 26 hours and 17 minutes of planned downtime or maintenance per year.

. 99% planned uptime results in 87 hours and 36 minutes of planned downtime or maintenance per year.
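The figures in this list follow from simple arithmetic: the allowed downtime is the unavailable fraction of the period multiplied by the length of the period. The sketch below shows that calculation under the assumption of a 365-day year (8,760 hours). The function name and the period_hours parameter are illustrative choices, not anything defined by the text.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted in the period for a planned-uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100 - uptime_percent) / 100


# Print the yearly downtime budget for each target in the list above.
for target in (99.999, 99.99, 99.9, 99.7, 99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% -> {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

Passing a different period_hours value adapts the same calculation to a service that is only scheduled for part of the day, such as the fax service example with its 18-hour daily window.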

Executives and managers generally know that maintaining 100% planned uptime is not usually possible because of a number of factors. Many professionals also consider that the SLA must account for the time to recover after a failure or disaster is encountered. Ensure that everyone understands whether the SLA is defined as “planned” uptime or as “planned and unplanned” uptime, because the difference is huge. A recommendation is that an SLA be defined as planned uptime. The unplanned recovery time frame is referred to as the Return to Operation (RTO) number for the remainder of this section.

The RTO defines how long it will take to recover a system, service, application, or business operation after a failure or disaster has occurred. Of course, the shorter the RTO time frame is, the more the backup and recovery solution is likely to cost. For example, deploying a Windows Server 2008 R2 failover cluster can provide system recovery within seconds or minutes, but the hardware and software licensing costs would easily exceed the costs of a recovery plan that included diagnosing a hardware issue and waiting for a replacement part to arrive within a 4-hour window. The business owners or executives of an organization need to clearly understand how long it will take to recover from certain failures; that understanding will help determine the final accepted backup and recovery solution.

Separating the SLA and RTO in disaster recovery documentation can be a very valuable

tool to use when presenting the current or proposed computer and network infrastructure

disaster recovery solution to executives, managers, auditors, and customers. For example, a

service might be presented to customers with a 99.99% SLA. The same system can be

presented in the finer details to have a maximum of an 8-hour RTO, which will still meet

a 99.9% uptime in the event of a major disaster. This can also be worded as “This service

will provide 99.9% to 99.99% availability.”
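The relationship between an RTO and an availability figure can be checked with a short calculation. The following sketch assumes a single outage lasting the full RTO within a 365-day year. The function names are hypothetical, and a real assessment would also need to account for planned maintenance and for the possibility of more than one outage.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def availability_after_outage(outage_hours, period_hours=HOURS_PER_YEAR):
    """Availability percentage for a period that contains the given outage."""
    return 100 * (1 - outage_hours / period_hours)


def meets_target(outage_hours, target_percent):
    """True if one outage of this length still satisfies the uptime target."""
    return availability_after_outage(outage_hours) >= target_percent


rto_hours = 8
print(f"{availability_after_outage(rto_hours):.3f}% availability after the outage")
print("meets 99.9%: ", meets_target(rto_hours, 99.9))
print("meets 99.99%:", meets_target(rto_hours, 99.99))
```

Under these assumptions, one 8-hour recovery stays within a 99.9% yearly target but not within 99.99%, which is why the two numbers are worth presenting separately.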

Creating the Disaster Recovery Solution

When administrators understand what sorts of failures can occur and know which services

and applications are most critical to their organization, they have gathered almost all the

information necessary to create a preliminary high-level disaster recovery solution. Many

different pieces of information and several documents will be required, even for the


preliminary solutions. Some of the items required within the solution are listed in the

following sections.

Disaster Recovery Solution Overview Document

The Disaster Recovery Solution Overview document is a short narrative of the solution in

action, including presentations with quality graphics and/or Microsoft Visio diagrams.

This document first provides an executive summary, including only high-level details to

provide executives and management with enough information to understand what steps

are being taken to provide business continuity in the event of a disaster. The remainder of

the document should contain detailed information related to the plan, including many of

the following items:

. Current computer and network infrastructure review.

. Detailed history of the planning meetings and the information that was presented

and discussed in those meetings.

. The list of which disaster and outage scenarios will be greatly mitigated by this plan,

and which scenarios will not be addressed by this plan.

NOTE

Scenarios that will not be addressed in your organization’s disaster recovery solutions should still be referenced in the document to show that they were presented, discussed, and considered very unlikely to occur, too expensive to mitigate up front, or not important enough to justify dedicating budget or staff resources.


. The list of the most critical applications, systems, and services for the organization

and the potential impact to the business if these systems encounter a failure or are

not available.

. Description of the high-level solution, including how the proposed disaster recovery solution will enhance the organization by improving reliability and recoverability.

. Defined SLA and RTO time estimates that this solution provides for each failure and disaster scenario.

. Associated computer and network hardware specifications, including initial purchasing and ongoing support and licensing costs.

. Associated software specifications, including initial purchase and ongoing support, maintenance, and licensing costs.

. Additional WAN link costs.

. Additional outside services costs, including hosting services, data center lease costs,

offsite disk and tape storage fees, consulting costs for the project, technical writing,

document management, and ongoing support or lease costs.

. Estimated internal staffing resource assignment and utilization for the solution deployment, as well as the continuing utilization required to support the ongoing backup and recovery tasks.

. The initial estimated project schedule and project milestones.

Getting Disaster Recovery Solutions Approved

Prioritizing and identifying the bare minimum services is not only the responsibility of the IT staff; these decisions belong to management as well. The IT staff is responsible for identifying single points of failure, gathering usage statistics for applications and services, and possibly also understanding how an outage can affect business operations.

Before the executives can make a decision regarding budget for an organization’s disaster

recovery plan, they should be presented with as much information as possible to make the

most informed decision. As a general guideline, when presenting the preliminary disaster

recovery solution, make sure it includes the “In a perfect world with unlimited budget”

plan, along with one or two lower-cost plans with clearly highlighted extended downtime

or reduced functionality. Presenting alternate plans highlighting different costs and results


might help ensure that the solution gets approval in one form or another.

Getting the budget approved for a secondary disaster recovery solution is better than

getting no budget for the preferred solution. The staff should always try to be very clear

on the SLA for a chosen solution and to document or have a paper trail concerning all

disaster recovery solutions that have been accepted or denied. If a failure that could have

been planned for occurs but budget was denied, IT staff members or IT managers should

make sure to have all their facts straight and documentation to prove it.


CHAPTER 30

Backing Up the Windows Server 2008 R2 Environment

Documenting the Enterprise

So far, in the backup and recovery preparation, computer and network discovery has been performed, different failure scenarios have been considered, and the most critical services have been identified and prioritized. Now, it is time to start actually building the backup and disaster recovery plan that a qualified individual will use in the event of a failure. To begin creating the plan, the current computer and network infrastructure must be documented. Information on documenting a Windows Server 2008 R2 system can be found in Chapter 22, “Documenting a Windows Server 2008 R2 Environment.” Documentation should include, but not be limited to, the following:

. Server configuration document: This document details which services and applications the system provides, as well as the network settings, software installed, and hardware specifications.

. Server build document: This document contains step-by-step instructions on how to build a Windows Server 2008 R2 system for a specific role, such as domain controller or file server, including which software is required and hardware specifications. This document will also include specific security configurations, hardware and software configurations, and other organizational server configuration standards.

. Network diagrams: Network diagrams should contain network configurations, as well as the hardware included in the infrastructure and the WAN links.

. Network device configuration: These documents contain the configurations of the network devices, including the switches, firewall, and routers on the network.

. SAN configuration: Most medium- and large-size organizations utilize one form of centralized storage or another. When storage devices are utilized, these device configurations should be documented so they can be recovered in the event of a device issue.

. Software documentation: This document contains a list of all the software used in the organization, possibly including the licensing information and the storage location.

. Service accounts and password document: A master list of user accounts and network device usernames and passwords should be created and kept in a sealed envelope in a secured onsite and offsite location.

. Contact and support documentation: This document should contain all IT staff and vendor contact information required to support the infrastructure.
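A server configuration document is easier to keep current when each server's details are also held in a structured, machine-readable record. The sketch below is one possible shape for such a record, using only the Python standard library. The class name, field names, and sample values are all hypothetical and would need to be adapted to an organization's own documentation standards.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ServerConfigRecord:
    """Hypothetical record mirroring the server configuration document."""
    hostname: str
    role: str
    ip_addresses: list = field(default_factory=list)
    services: list = field(default_factory=list)
    installed_software: list = field(default_factory=list)
    hardware: dict = field(default_factory=dict)

    def to_json(self):
        """Serialize the record so it can be stored with the recovery plan."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Sample values are invented for illustration.
record = ServerConfigRecord(
    hostname="dc01",
    role="domain controller",
    ip_addresses=["192.0.2.10"],
    services=["DNS", "Active Directory Domain Services"],
    installed_software=["Windows Server 2008 R2"],
    hardware={"cpu_cores": 4, "memory_gb": 16},
)
print(record.to_json())
```

Records like this should not hold passwords. The service account and password list described above belongs in its separately secured location.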

Developing a Backup Strategy

Determining not only what needs to be backed up, but also how the backups will be

performed and stored, is an important task. Many organizations back up data to tape

media and have that media shipped to offsite storage locations on a weekly basis.

Windows Server Backup in Windows Server 2008 R2 is built to support scheduled backups to local internal disks, externally connected disks, and network shares. Windows Server
