Oracle RMAN 11g Backup and Recovery

Part IV: RMAN in the Oracle Ecosystem

Sync and split technology is an example of an innovative (and challenging) solution for storage recovery that complements or duplicates many of the features RMAN can accomplish independently. Over the past five years, sync and split has become a widely used technology to provide immediate and very fast system recovery at the storage hardware level.

In this chapter, we will provide an overview of what sync and split technology refers to. We won’t be discussing any single implementation in particular, but rather discussing the implications for RMAN and database backups. After the overview, we go into the specific steps required to integrate sync and split solutions into an RMAN backup strategy.

Sync and Split: Broken Mirror Backups

In the beginning, doing sync and split backups involved nothing more complicated than extending the functionality of hardware mirroring. The best way to explain this statement is through an example. Suppose we have a disk controller that has two hard drives. For redundancy, we set the RAID level to 0 + 1 so that we are mirroring everything on disk A to disk B. This gives us immediate protection against any kind of hardware failure on either disk A or disk B.

The next step, then, is to try to leverage the hardware mirror to provide logical fault tolerance.

That is the goal of sync and split technology: to provide a fallback position in case of some failure that has occurred on both copies in the mirror. For example, suppose that a user has deleted the entire oracle software tree or the oradata directory. Such a deletion would immediately occur at both copies in our mirror, so having a mirrored copy would do us no good.

So, what is the solution? The innovation is that any mirrored disk group may have two mirror groups, but may only ever have one mirror currently writing the identical bits as the primary disk group. Let’s build an example with three logical volumes, A, B, and C, all dedicated to the same data. Volumes A, B, and C are all mirrored copies of each other. However, at 2 P.M., volume A is split away from the mirror, leaving its bits “stuck” at the split time. Volumes B and C continue to be bit-for-bit copies. After four hours, at 6 P.M., volume C is split from volume B so that it no longer gets writes of data. At this point, there are three different copies of the data on the volume: a copy at 2 P.M., a copy at 6 P.M., and a current copy. There is also no redundancy to protect against a disk failure.

Where Are We in RAID?

Need a superfast, overly simplistic primer on RAID? We’re here for you. There are hundreds of theories, from the radical to the traditional, that outline the best possible solution for disk failure protection. Typically, the Oracle “technorati” have long taken the position that nothing beats RAID 0 + 1, in which you have two disk groups, group 1 and group 2, both of which have two disks. The two disks on group 1 are striped, so that data is evenly spread across both disks.

Group 2 is an exact copy, bit for bit, of group 1. This configuration gives us both performance, by striping across disks to avoid hot spots, and redundancy, by writing every bit twice.

Recently, we were reviewing the specs for a RAID 1 + 0 configuration, which is slightly different from 0 + 1. Instead of striping and then mirroring, a RAID 1 + 0 configuration mirrors and then stripes. The difference is best represented visually, as shown in the following illustration. Here, we mirror each disk separately so that we end up with four disk groups.

After mirroring each disk, we then stripe across the four mirrors.

Chapter 22: RMAN in Sync and Split Technology

It might seem like a small difference, but RAID 1 + 0 has greater fault tolerance, because the failure of any one disk does not take down the other mirrored disks. In RAID 0 + 1, if any disk in group 2 fails, the whole group goes offline. So, while both configurations survive a single disk failure, RAID 1 + 0 survives more combinations of multiple disk failures than RAID 0 + 1 does.
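The fault-tolerance comparison can be checked by brute force. The following is a minimal sketch, assuming a four-disk model (disks a1 and a2 on one side, b1 and b2 as their copies); the disk names and function names are ours, invented for illustration, and are not part of any vendor tool.

```python
from itertools import combinations

# Assumed four-disk layout: a1/a2 are the originals, b1/b2 are their copies.
DISKS = ("a1", "a2", "b1", "b2")


def raid01_survives(failed):
    """RAID 0+1: stripe a1+a2, stripe b1+b2, then mirror the two stripe sets.

    A stripe set is lost as soon as any one of its disks fails, so the
    array survives only while at least one stripe set is fully intact.
    """
    stripe_sets = [{"a1", "a2"}, {"b1", "b2"}]
    return any(not (stripe & failed) for stripe in stripe_sets)


def raid10_survives(failed):
    """RAID 1+0: mirror a1 with b1 and a2 with b2, then stripe across the pairs.

    The array survives while every mirrored pair still has one good disk.
    """
    mirror_pairs = [{"a1", "b1"}, {"a2", "b2"}]
    return all(pair - failed for pair in mirror_pairs)


def count_survivable(survives, failures):
    """Return (survivable, total) over all failure combinations of a given size."""
    combos = [set(c) for c in combinations(DISKS, failures)]
    return sum(1 for failed in combos if survives(failed)), len(combos)


for name, fn in (("RAID 0+1", raid01_survives), ("RAID 1+0", raid10_survives)):
    for n in (1, 2):
        ok, total = count_survivable(fn, n)
        print(f"{name}: survives {ok} of {total} possible {n}-disk failures")
```

In this model both layouts survive every single-disk failure, but RAID 0 + 1 survives only two of the six possible two-disk failures, while RAID 1 + 0 survives four of the six.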

To get back to our RAID 0 + 1 configuration, disk volume A will be “resilvered” up to disk B, which runs at the current point in time. This sync up is based on the fact that the volumes have a journaling mechanism in place that records all data changes. This journaling is more I/O on top of the multiple writes to each volume. Volume A will get access to the journals of changes on volume B and will apply all the changes until it is getting live writes at the same time as volume B. At this point, then, you have volumes A and B in redundant mode, and volume C is your fallback position, at 6 P.M. Figure 22-1 illustrates this process.
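To make the split and resilver steps concrete, here is a toy model of the three-volume example. It is only a sketch under our own assumptions: the Volume and MirrorSet classes, the single shared journal, and the block contents are all hypothetical, and real storage arrays implement journaling very differently.

```python
class Volume:
    """A toy volume: a dict mapping block number to block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.attached = True   # True while the volume receives live writes
        self.journal_pos = 0   # number of journal entries already applied


class MirrorSet:
    """A group of volumes that mirror each other, plus a change journal."""

    def __init__(self, names):
        self.volumes = {n: Volume(n) for n in names}
        self.journal = []      # ordered record of every (block, data) write

    def write(self, block, data):
        """Journal the change, then apply it to every attached volume."""
        self.journal.append((block, data))
        for vol in self.volumes.values():
            if vol.attached:
                vol.blocks[block] = data
                vol.journal_pos = len(self.journal)

    def split(self, name):
        """Break a volume away; its contents freeze at this moment."""
        self.volumes[name].attached = False

    def resync(self, name):
        """Replay the journal entries the volume missed, then reattach it."""
        vol = self.volumes[name]
        for block, data in self.journal[vol.journal_pos:]:
            vol.blocks[block] = data
        vol.journal_pos = len(self.journal)
        vol.attached = True


mirror = MirrorSet(["A", "B", "C"])
mirror.write(1, "1 P.M. data")
mirror.split("A")                 # 2 P.M.: volume A is frozen
mirror.write(1, "4 P.M. data")
mirror.split("C")                 # 6 P.M.: volume C is frozen
mirror.resync("A")                # A catches up from the journal, rejoins B
mirror.write(1, "7 P.M. data")

for name in "ABC":
    print(name, mirror.volumes[name].blocks)
```

After the run, volumes A and B both hold the 7 P.M. data, while volume C still holds what was on disk when it was split.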

[Illustration: volumes A, B, and C and the writes to disk that each receives, shown in three panels around the 2:00 P.M. and 6:00 P.M. splits]

FIGURE 22-1
Sync and split technology in action


This sync and split cycle goes on and on, ad infinitum. Every four hours, a volume is synced up to the primary volume, and another volume is split away to provide a fallback position in case of a logical failure. What happens at the time you actually encounter the logical failure? In our example, let’s assume that it is now 8 P.M. Volumes A and B are getting concurrent writes, and volume C is waiting idle at 6 P.M. At 8 P.M., a DBA is doing some system maintenance and deletes the system datafile from the production database. This is when the worrying begins.

Luckily, no unrecoverable data has been added to the database since the end of the day at 5 P.M. However, the nightly batch loads start in about 15 minutes. The DBA has a small window to get the production database back up and running.

With the database running entirely on the mirrored disk volumes A and B, the sync and split architecture has given our DBA an immediate solution. He immediately configures volume C, which was stuck at 6 P.M., as the primary volume and starts up the database. When the database looks for its datafiles, it finds all the files as they appear on volume C, at 6 P.M., and no deletes have taken place. By the time the DBA is finished, it is only 8:05 P.M. The batch processes will kick off on time. Figure 22-2 shows the process.

Oracle Databases on Sync and Split Volumes

The Oracle software files can reside on a sync and split volume and thus can help protect against logical corruption that occurs in the binaries themselves. No additional configuration is needed from an Oracle perspective. The files associated with an Oracle database, on the other hand, come with some very specific caveats and disclaimers when you start putting them on sync and split volumes. These caveats and disclaimers relate to the fact that Oracle files are always open and always have active writes taking place (this being the whole point of a good relational database). So, if you are actively writing to your database and it is mirrored on two drives, there will be consequences if you suddenly break the mirror, unbeknownst to the database.

FIGURE 22-2
Sync and split in action

Each vendor-specific solution is a bit different, but at some point, a volume that is getting active writes must turn off the writes to that volume while continuing to allow writes to another volume. And regardless of how a salesperson might pitch it, the process of breaking a mirror is not instantaneous. Breaking a mirror is more like peeling a banana: you start at the top and separate the peel from the fruit until you get to the bottom. Suppose your Oracle datafile is the fruit, and the mirrored copy of the datafile is the peel. If you peel away the mirror copy, you are starting at the beginning of the datafile, and the break is complete when you reach the end of the datafile. However, it is possible (likely) that Oracle will attempt to write to a block while the mirror is in the middle of peeling away. So, on the primary volume, nothing is wrong (the file header knows that an SCN has been advanced in the file and knows which block it was), but on the split mirror, the datafile header knows nothing about the written block. So, after the mirror break is complete, what do we have on the split mirror volume? One fuzzy datafile that is unrecoverable. Check out Figure 22-3 to see this.
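The banana analogy can be played out in a few lines of code. This is a deliberately simplified sketch, not a description of Oracle's real datafile format: we assume a datafile is just a header SCN plus a list of blocks, each stamped with the SCN of its last change, and every function name here is hypothetical.

```python
def make_datafile(scn, n_blocks):
    """Build a toy datafile: one header SCN and n blocks stamped with it."""
    return {"header_scn": scn, "blocks": [scn] * n_blocks}


def write_block(datafile, index, new_scn):
    """Simulate the database changing one block and recording the new SCN."""
    datafile["blocks"][index] = new_scn
    datafile["header_scn"] = new_scn


def split_mirror(primary, write_at_step=None, write_index=None, new_scn=None):
    """Peel the mirror away from top to bottom: header first, then each block.

    If write_at_step is given, the database writes to block write_index on
    the primary just before the block at that position is peeled away.
    """
    copy = {"header_scn": primary["header_scn"], "blocks": []}
    for step in range(len(primary["blocks"])):
        if step == write_at_step:
            write_block(primary, write_index, new_scn)
        copy["blocks"].append(primary["blocks"][step])
    return copy


def is_fuzzy(datafile):
    """A file is fuzzy if any block is newer than its header knows about."""
    return any(scn > datafile["header_scn"] for scn in datafile["blocks"])


# Case 1: nothing is written while the mirror is being split.
quiet_copy = split_mirror(make_datafile(100, 4))

# Case 2: block 3 is changed (SCN 101) while the split is halfway through.
busy_primary = make_datafile(100, 4)
busy_copy = split_mirror(busy_primary, write_at_step=2, write_index=3, new_scn=101)

print("quiet split fuzzy?  ", is_fuzzy(quiet_copy))
print("busy split fuzzy?   ", is_fuzzy(busy_copy))
print("busy primary fuzzy? ", is_fuzzy(busy_primary))
print("busy split contents:", busy_copy)
```

In the second case the primary stays self-consistent, but the split copy carries a block stamped with SCN 101 under a header that still says 100, which is the mismatch the text describes.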

Fear not, for there are ways to ensure that the split mirror is a healthy copy of the database. It just takes a bit of work first. How you configure Oracle database files in a sync and split environment depends on what type of files you are configuring: datafiles, control files, redo log files, or archive logs. The following sections address each in turn.

Datafiles

The previous section explained what happens to Oracle datafiles if a mirror split takes place without any preparation: the split volume copies of the files are left in a fuzzy, unusable state.

This is precisely the same predicament you run into if you simply take a copy of an online datafile without first putting it into hot backup mode. So, before you break the mirror, you must put all datafiles into hot backup mode. This is not an optional step, regardless of which vendor product you are using. Because the split generally takes a very short time, the amount of time in hot backup mode is much shorter than it would be if you were doing a copy against the same datafiles. And the I/O hit of running in backup mode (and producing more archive logs) will be relatively small, as well.

FIGURE 22-3
Unrecoverable fuzzy datafile


To alleviate the headaches of hot backup mode for those implementing sync and split architectures, Oracle has added syntax that allows you to put an entire database into hot backup mode with a single command:

alter database begin backup;

Previously, you had to put each tablespace into hot backup mode. If there is something preventing the file from going into backup mode, a warning is generated in the alert log, but the begin backup command proceeds anyway.

After the split is complete, you pull the database out of hot backup mode with the following command:

alter database end backup;
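Because the order of these steps matters, it can help to generate the whole sequence as one SQL*Plus script. The following is a minimal sketch under our own assumptions: build_split_script is a hypothetical helper, and the split command passed to it stands in for whatever your storage vendor actually provides. The host and whenever sqlerror lines are SQL*Plus commands, not SQL.

```python
def build_split_script(split_command):
    """Return SQL*Plus script text that wraps a mirror split in backup mode.

    split_command is a placeholder for the vendor-specific command that
    breaks the mirror; it is run from SQL*Plus through the HOST command.
    """
    lines = [
        # Stop before touching the mirror if begin backup raises an error.
        "whenever sqlerror exit failure",
        # Put every datafile in the database into hot backup mode.
        "alter database begin backup;",
        # Break the mirror while the datafiles are in backup mode.
        f"host {split_command}",
        # Take the database out of hot backup mode once the split returns.
        "alter database end backup;",
        "exit",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical vendor command, shown only to illustrate the call.
    print(build_split_script("/opt/vendor/bin/split_mirror --volume C"))
```

The generated text could be saved to a file and run from SQL*Plus by a suitably privileged user. Note that this sketch does not check whether the split command itself succeeded; a production script would need to.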

Control Files

A split mirror copy of a control file is in an unusable state immediately after the split mirror operation completes. The control file, in general, is up-to-date on the current state of all the datafiles. However, based on the total duration of the split itself, and the overall activity on the database at the time of the split, the control file at the split volume may not reflect much accurate data about the state of the datafiles.

Putting the database into hot backup mode cures most of these ills. With the database in hot backup mode, the control file is aware of a starting point at which recovery will be required, and from which it will be feasible. However, the control file is still at odds with reality: it thinks of itself as a current control file of an active database. This is hardly the case.

We’ve seen some implementations where a DBA insists on trying to keep the current control file available as such on the split volume, particularly if the split volume will be used for reporting purposes. However, when the time comes to put this control file into service for the sake of recovery, you have to use the using backup controlfile clause of the recover command so that the control file understands that some of its checkpoint and SCN information may not reflect reality:
