Pro Oracle Database 11g Administration (85 page)

Read Pro Oracle Database 11g Administration Online

Authors: Darl Kuhn

Tags: #Oracle DBA

CHAPTER 19 ■ RMAN RESTORE AND RECOVERY

other server were intact. I was able to restore and recover the production database from these backups.

We lost about a days worth of data (between corrupt archive logs and downtime when no incoming transactions were allowed), but we were able to get the database restored and recovered about 20 hours after the initial phone call. That was a long day.

Most situations in which you need to restore and recovery will not be as bad as the one just described. However, the previous scenario does highlight the need for:

• A Backup strategy.

• A DBA with backup & recovery skills.

• A restore and recovery strategy, including a requirement to periodically test the restore and recovery.

This chapter walks you through restore and recovery using RMAN. It covers many of the common tasks you will have to perform when dealing with media failures.

Determining Media Recovery Required

The term “media recovery” refers to the need to restore files that have been lost or damaged due to failure of the underlying storage media (usually a disk of some sort). Usually, you know that media recovery is required by some sort of an error like the following:

ORA-01157: cannot identify/lock data file 1 - see DBWR trace file

ORA-01110: data file 1: '/u02/oracle/oradata/E64208/system01.dbf'

The error may be displayed on your screen when performing DBA tasks such as stopping and starting the database. Or you might see such an error in a trace file or the alert.log file. If you don't notice the issue right away, with a severe media failure, the database will stop processing transactions and users will start calling you.

To understand how Oracle is determining that media recovery is required, you must understand how Oracle determines that everything is okay. When Oracle shuts down normally (IMMEDIATE, TRANSACTIONAL, NORMAL), part of the shutdown process is to flush all modified blocks to disk and mark the header of each datafile with the current SCN and to update the control file with the current SCN

information.

Upon startup, Oracle checks to see if the SCN in the control file matches the SCN in the header of the datafile. If there is a match, then Oracle attempts to open the datafiles and online redo log files. If all files are available and can be opened, Oracle starts normally. The following query compares the SCN in the control file (for each datafile) with the SCN in the datafile header: SET LINES 132

COL name FORM a40

COL status FORM A8

COL file# FORM 9999

COL control_file_SCN FORM 999999999999999

COL datafile_SCN FORM 999999999999999

--

SELECT

a.name

,a.status

,a.file#

,a.checkpoint_change# control_file_SCN

,b.checkpoint_change# datafile_SCN

,CASE

510

CHAPTER 19 ■ RMAN RESTORE AND RECOVERY

WHEN ((a.checkpoint_change# - b.checkpoint_change#) = 0) THEN 'Startup Normal'

WHEN ((b.checkpoint_change#) = 0) THEN 'File Missing?'

WHEN ((a.checkpoint_change# - b.checkpoint_change#) > 0) THEN 'Media Rec. Req.'

WHEN ((a.checkpoint_change# - b.checkpoint_change#) < 0) THEN 'Old Control File'

ELSE 'what the ?'

END datafile_status

FROM v$datafile a -- control file SCN for datafile

,v$datafile_header b -- datafile header SCN

WHERE a.file# = b.file#

ORDER BY a.file#;

If the control file SCN values are greater than the datafile SCN values, then media recovery is most likely required.

Determining What to Restore

Media recovery requires that you perform manual tasks to get your database back in one piece. These tasks usually involve a combination of RESTORE and RECOVER commands. You will have to issue an RMAN

RESTORE command if your datafiles have experienced media failure. This could be because of somebody accidentally deleting files or a disk failure.

How the Process Works

When you issue the RESTORE command, RMAN will automatically determine how to extract the datafiles from any of the following available backups:

• Full database backup

• Incremental level 0 backup

• Image copy backup generated by BACKUP AS COPY command

After the files are restored from a backup, you are required to apply redo to them via the RECOVER

command. When you issue the RECOVER command, Oracle will examine the SCNs in the affected datafiles and determine whether any of them need to be recovered. If the SCN in the datafile is less than the corresponding SCN in the control file, then media recovery will be required.

Oracle will retrieve the datafile SCN and then look for the corresponding SCN in the redo stream to determine where to start the recovery process. If the starting recovery SCN is in the online redo log files, the archived redo log files are not required for recovery.

During a recovery, RMAN will automatically determine how to apply redo. First, RMAN will apply any incremental backups available that are greater than zero, such as the incremental level 1. Next, any archived redo log files on disk will be applied. If the archived redo log files do not exist on disk, RMAN

will attempt to retrieve them from a backup set.

To be able to perform a complete recovery, all of the following conditions need to be true:

• Your database is in archivelog mode.

• You have a good baseline backup of your database.

• You have any required redo that has been generated since the backup (archived redo log files, online redo log files, or incremental backups that RMAN can use for recovery instead of applying redo).

511

CHAPTER 19 ■ RMAN RESTORE AND RECOVERY

There are a wide variety of restore and recovery scenarios. How you restore and recover depends directly on your backup strategy and what files have been damaged. Listed next are the general steps to follow when facing a media failure:

1. Determine what files need to be restored.

2. Depending on the damage, set your database mode to nomount, mount, or open.

3. Use the RESTORE command to retrieve files from RMAN backups.

4. Use the RECOVER command for datafiles requiring recovery.

5. Open your database.

Your particular restore and recovery scenario may not require that all of the previous steps be performed. For example, you may just want to restore your spfile, which doesn’t require a recovery step.

The first step in a restore and recovery process is to determine what files have experienced media failure. You can usually determine what files need to be restored from the following sources:

• Error messages displayed on your screen, either from RMAN or SQL*Plus

• Alert.log file and corresponding trace files

• Data dictionary views

If you're using Oracle Database 11
g
or higher, then in addition to the previously listed methods you should consider the Data Recovery Advisor for obtaining information about the extent of a failure and corresponding corrective action.

Using Data Recovery Advisor

The Data Recovery Advisor tool was introduced in Oracle Database 11
g
. In the event of a media failure, this tool will display the details of the failure, recommend corrective actions, and it will perform the recommended actions if you specify it to do so. It's like having another set of eyes to provide feedback when in a restore and recovery situation. There are three modes to Data Recovery Advisor:

• Listing failures.

• Suggesting corrective action.

• Running commands to repair failures.

The Data Recovery Advisor is invoked from RMAN. You can think of the Data Recovery Advisor as a set of RMAN commands that can assist you when dealing with media failures.

Listing Failures

When using the Data Recovery Advisor, the LIST FAILURE command is used to display any issues with the datafiles, control files, or online redo logs:

RMAN> list failure;

If there are no detected failures, you'll see a message indicating that there are no failures. Here is some sample output indicating that there may be an issue with a datafile: 512

CHAPTER 19 ■ RMAN RESTORE AND RECOVERY

List of Database Failures

=========================

Failure ID Priority Status Time Detected Summary

---------- -------- --------- ------------- -------

662 HIGH OPEN 16-SEP-10 One or more non-system datafiles are missing The prior message doesn’t indicate which specific file may be experiencing a failure. To dig a little deeper, use the DETAIL clause:

RMAN> list failure 662 detail;

List of Database Failures

=========================

Failure ID Priority Status Time Detected Summary

---------- -------- --------- ------------- -------

662 HIGH OPEN 16-SEP-10

One or more non-system datafiles are missing

Impact: See impact for individual child failures

List of child failures for parent failure ID 662

Failure ID Priority Status Time Detected Summary

---------- -------- --------- ------------- -------

665 HIGH OPEN 16-SEP-10 Datafile 7:

'/ora01/dbfile/O11R2/users02.dbf' is missing

Impact: Some objects in tablespace USERS might be unavailable

This output details which file has experienced a failure and the nature of the problem (file missing).

Suggesting Corrective Action

The ADVISE FAILURE command gives advice about how to recover from potential problems detected by the Data Recovery Advisor. If you have multiple failures with your database, you can directly specify the failure ID to get advice on a given failure like so:

RMAN> advise failure 665;

Here is some sample output for this particular issue:

=======================

1. If file /ora01/dbfile/O11R2/users02.dbf was unintentionally renamed or moved, restore it Automated Repair Options

========================

Option Repair Description

------ ------------------

1 Restore and recover datafile 7

Strategy: The repair includes complete media recovery with no data loss Repair script: /ora01/app/oracle/diag/rdbms/o11r2/O11R2/hm/reco_1184243250.hm In this case, the Data Recovery Advisor created a script that can be used to potentially fix the problem. The contents of the repair script can be viewed with an operating system editor. For example: $ vi /ora01/app/oracle/diag/rdbms/o11r2/O11R2/hm/reco_1184243250.hm

513

CHAPTER 19 ■ RMAN RESTORE AND RECOVERY

Here are the contents of the script (for this particular example):

# restore and recover datafile

sql 'alter database datafile 7 offline';

restore datafile 7;

recover datafile 7;

sql 'alter database datafile 7 online';

After reviewing the script, you can decide to manually run the suggested commands, or you can have the Data Recovery Advisor run the script via the REPAIR command (see the next section for details).

Repairing Failures

If you have identified a failure and viewed the recommended advice, you can proceed to actually repairing a failure. If you want to inspect what the REPAIR FAILURE command will do without actually running the commands, use the PREVIEW clause. Before you run the command, make sure you first run the LIST FAILURE and ADVISE FAILURE commands from the same connected session. In other words, the RMAN session that you're in must run the LIST and ADVISE commands within the same session before running the REPAIR command.

RMAN> repair failure preview;

If you're satisfied with the repair suggestions, then run the REPAIR FAILURE command.

RMAN> repair failure;

You'll be prompted at this point for confirmation.

Do you really want to execute the above repair (enter YES or NO)?

Type in YES to proceed.

YES

If all goes well, you should see a final message like this:

media recovery complete, elapsed time: 00:00:02

Finished recover at 16-SEP-10

sql statement: alter database datafile 7 online

repair failure complete


Note
You can run the Data Recovery Advisor commands from the RMAN command prompt or from Enterprise Manager.

In this way, you can use the RMAN commands of LIST FAILURE, ADVISE FAILURE, and REPAIR

FAILURE to resolve media failures. The Data Recovery Advisor can assist with most issues that you'll run into. I did have a couple scenarios where the following recommendation was provided: 514

CHAPTER 19 ■ RMAN RESTORE AND RECOVERY

Mandatory Manual Actions

========================

1. Please contact Oracle Support Services to resolve failure 149165...

Optional Manual Actions

=======================

no manual actions available

Automated Repair Options

========================

no automatic repair options available

This output indicates that you (the DBA) aren't quite out of a job yet. ;) Using RMAN to Stop/Start Oracle

You can use RMAN to stop and start your database with methods that are almost identical to those available through SQL*Plus. When performing restore and recovery operations, it’s often more convenient to stop and start your database from within RMAN. The following RMAN commands can be used to stop and start your database:

• SHUTDOWN

• STARTUP

• ALTER DATABASE

Shutting Down

The SHUTDOWN command works the same from RMAN as it does from SQL*Plus. There are four types of shutdown: ABORT, IMMEDIATE, NORMAL, and TRANSACTIONAL. I usually first attempt to use SHUTDOWN

IMMEDIATE to stop a database. If that doesn't work, don't hesitate to use SHUTDOWN ABORT. Here are some examples:

RMAN> shutdown immediate;

RMAN> shutdown abort;

If you don't specify a shutdown option, NORMAL is the default. Shutting a database down with NORMAL

is rarely viable as this mode waits for currently connected users to disconnect at their leisure. I never use NORMAL when shutting down a database.

Other books

Closer Still by Jo Bannister
Tides of Honour by Genevieve Graham
You Are a Writer by Jeff Goins, Sarah Mae
The Secret Bride by Diane Haeger
Friday Afternoon by Sylvia Ryan
Geoffrey's Rules by Emily Tilton
Rosemary's Baby by Levin, Ira
Private Tasting by Nina Jaynes
The Procedure by Tabatha Vargo, Melissa Andrea