SAP OSS Notes: SAP Note 23070 - Backup and Recovery: Basic Concepts

Symptom:

ORACLE backup and recovery,
basic concepts and contexts taking the example of a R/3 database

Cause and prerequisites

Supplementary to Notes 4161-4165

Solution

1. physical structure of an ORACLE database:
---------------------------------------------

Every ORACLE database consists of the following physical files:

Type *) Name, Meaning Example of physical file name in R/3 ------------------------------------------------------------ 1 DB profiles $ORACLE_HOME/dbs/initC11.ora
(parameter file,
"Init.ora")

2 Control file $ORACLE_HOME/dbs/cntrlC11.dbf
$ORACLE_HOME/sapdata1/cntrl/cntrlC11.dbf
(mirrored)
...

3 Data files $ORACLE_HOME/sapdata1/system_1/system.data1
$ORACLE_HOME/sapdata2/clud_1/clud.data1
...

4 ONLINE redo log $ORACLE_HOME/saplog1/log_g1_m1.dbf
files $ORACLE_HOME/saplog1/log_g2_m1.dbf
(redo log groups) ...
$ORACLE_HOME/saplog2/log_g1_m2.dbf
(mirrored: 2nd member of 1st group)
...

5 OFFLINE redo log $ORACLE_HOME/saparch/C11arch1_101.dbf
Files $ORACLE_HOME/saparch/C11arch1_102.dbf
(archive logs) -----------------------------------------------------------------------
*) Not an official ORACLE type - only for reference in this note

1.: Editable ASCII file, containing start parameters for the database
and is only read by ORACLE when starting the instance.
2.: Contains all information about the physical structure and status of
the database and should always be mirrored at software level
at least once or twice (by ORACLE)!
3.: At least one data file of the SYSTEM tablespace must always exist,
further data files for the SYSTEM or other tablespaces
depending on use. Every tablespace is based on at least one
data file. Data files generally have the greatest storage
requirement of all database files.
4.: Used for instance and media recovery, that is, restoring files
of type 3 after process and disk errors.
At least 2 online redo logs (= 2 redo log groups) belong, with
certain exceptions (undocumented parameter "_disable_logging" for
error correction, or acceleration of certain exception actions)
to each database.
(In R/3, 4 online redo logs are strongly recommended).
This should always be mirrored at least once at software level
(by ORACLE) (i.e. at least 2 members per group).
5.: These exist independently of the current database and are only
used for media recovery (disk error with type 3).
In R/3: Backup types 1 to 4 with BRBACKUP, (all 3 tools can
Backup type 5 with BRARCHIVE, also be called via
Reloading all types with BRRESTORE menu from SAPDBA)

Files of types 2 and 3, which contain all the data and control information for the database, represent the actual core of an ORACLE database. The redo logs (types 4 and 5) contain redundant information which is only needed for restoring (recovery of) the data files (type 3) in case of error. The parameter file (type 1) does not belong
directly to the database, but rather to the instance that works with
the database. (The same applies to types 4 and 5: every instance has
its own redo logs.)

2. Security concept in ORACLE:
---------------------------------

Since it would not make sense to shut down the database and generate a full backup of the database after every transaction, the data files (type 3) are only saved at longer time intervals (in SAP databases with BRBACKUP, for example).
For this purpose, all changes to the data files are stored automatically in separate files, the ONLINE redo log files (type 4) while the system is running. If errors should occur, the current consistent status of the data is "recovered" using the data backups and the stored changes.
The ONLINE redo logs are used cyclically, that is, whenever a file is full, the system switches automatically to the next file ("log switch"). When the last file is full, the first file is overwritten again.
With every log switch, the "log sequence number" of the current ONLINE redo log is incremented by 1 as compared with the previous redo log.
Since a "check point" (writing of all main storage buffer data to the physical data files) is written at every log switch at latest, an "instance recovery" (restoring of the database after crash) is always possible with 2 ONLINE redo logs.
However, for a "media recovery" (restoring of the database after a disk error), all redo log information that has accumulated since the data files (type 3) were backed up the last time is required.
For this reason, the redo log information must be kept for a lengthy period (as far back as the oldest backup of a data file extends). In addition, it must not be possible for the redo log information to be overwritten automatically before it is saved. To do this, change the status of the database from "NOARCHIVELOG" to "ARCHIVELOG".
After every log switch the now inactive ONLINE redo log is released for copying. It cannot be written to again until it has been copied to a target directory (or to tape) by the ORACLE archiver either automatically or after manual intervention (according to status - "automatic archival enabled/disabled"). These copies (one copy per redo log group in each case) are called OFFLINE redo logs (type 5). They are stored at SAP in the directory $ORACLE_HOME/saparch.
If the space available in the file system of the target directory of the OFFLINE redo logs is insufficient, the database stops (in ARCHIVELOG mode) when the next OFFLINE redo log to be overwritten cannot be saved (status referred to as "archive stuck"). Therefore, it is necessary at certain time intervals (varying according to the number and size of the OFFLINE redo logs and the space in the file system) to save the OFFLINE redo logs (generally to tape) and to delete them from the target directory of the ORACLE archiver. The tool BRARCHIVE can be used for this (possibly from within SAPDBA).
The archive modes referred to, the current log sequence numbers of the ONLINE redo logs and their ORACLE archiver backup status can be displayed using sqldba command "archive log list":
sqldba lmode=y
SQLDBA> connect internal
SQLDBA> archive log list;
SQLDBA> exit
(Caution: Saving the ONLINE redo logs with BRBACKUP to create 'saved ONLINE redo logs' as part of an OFFLINE backup of the database should not be confused with archiving of the ONLINE redo logs by the ORACLE archiver to create OFFLINE redo logs.)

3. Software mirroring
---------------------
Owing to the central significance of the control file (type 2) for controlling the database and of the ONLINE redo logs (type 4) for data security, ORACLE provides the option of mirroring these files.
If ORACLE detects a discrepancy within the control files, the database can no longer continue working and crashes the next time the control file is accessed. Once an intact control file has been copied over the faulty mirrored file (or after the defective file has been deleted/renamed in init.ora), the database can be started up again with an automatic instance recovery.
If an error occurs in a mirrored ONLINE redo log file, this is simply set to the status "STALE"; the database can continue operation. As soon as the defective online redo log is overwritten by a log switch, the error is corrected (provided there is no disk error).
SAP recommends at least 2 mirrored control files and 1 mirrored ONLINE redo log for each of these, the mirroring being, of course, to a different disk in each case. To mirror the control file, copy the control file to a different location with the database shut down and enter it along with its path into the list of init.ora parameters 'control_files=(...,...)'.
To mirror the ONLINE redo logs, add another member to one of the redo log groups (at SAP there are four of these) using the sqldba command
SQLDBA> alter database add logfile member 'filename' to group ;
with the database mounted, but closed. (For more information on redo logs, see Note no. 4163).
BRBACKUP saves only one of the mirrored files in each case. BRRESTORE restores the mirrored files automatically as copies. BRARCHIVE has an option of a two-fold backup of the OFFLINE redo log.
(Note: It makes sense to use ORACLE software mirroring even if hardware mirroring is already in use, for example, if a file is mistakenly deleted or overwritten manually because not very large files are involved).

4. Recovery
-----------
The aim of a recovery is to reload all changes made to data files (type 3) that were made since the last backup or regeneration of the data file, up to a specified point in time that applies for the complete database. These are then to be adjusted to correspond to the control file (type 2). The result of the recovery is a consistent data status.

A full recovery always consists of two phases:
1. The actual recovery, which is always directed forwards in time
and recovers all write operations
(sqldba command "recover ...")
sometimes also referred to as the roll-forward phase
2. The rollback phase during which all write operations originating
from as yet incomplete transaction (for example, transactions
with no "commit") are cancelled with the help of the rollback
segments (important: no "backwards recovery" exists!).
This takes place when the database is opened (command
"alter database open...").
Depending on whether you recover right up to the current time and thus restore all the redo log information or whether you opt to recover up to a particular point in time only and opt to do without the latest modifications, we speak either of a complete or incomplete ("point-in-time") recovery. Both cases result in a consistent database.
While it is irrelevant when a particular file was backup up and whether the backup involved is an ONLINE or OFFLINE backup, the end time must be the same for all data files.
This means, for example, that, for a complete recovery, all or only some defective files can be restored from the backup data carrier and recovered (the remaining data files containing no errors still have their current status). For an incomplete recovery, on the other hand, it is always necessary for all data files (even those that are not defective) to be reloaded.
On no account should you open a fully restored database that you wish to recover. Doing so will cause all files to be given the current time stamp and it will no longer be possible to reset to the first redo log after the backup to perform the recovery.

Summary of conditions for a successful recovery:
------------------------------------------------
(The conditions listed below are observed automatically if the tools SAPDBA, BRBACKUP, BRARCHIVE, BRRESTORE are used consistently as described in the SAP database documentation.
A:
The database control file is current or exists as a backup or can be reconstructed manually using "CREATE CONTROLFILE" and mirrors the physical structure of the database at the time when the recovery is completed.
(It is not a problem if data files were removed from the database since the last control file backup. However, no new files should have been added since this time, since these will not be recovered.)
B:
All data files of the database have a current status and are without errors or else exist as ONLINE or OFFLINE backups (these can also be mixed; the backup completion time lies before the recovery completion time) or can be regenerated with "CREATE DATAFILE".
(The latter is always possible if the original control file from the time when this data file originated exists, that is, no even older backup of the control file is used or no "CREATE CONTROLFILE" was executed subsequently.
A backup may not be older than the time when the file originated. If a tablespace is reorganized, for example, it is not possible to restore with older backups (even if the size and name are the same). This is because a change to the structure is involved.
Data files of tablespaces deleted during the period to be recovered need not exist. They are not recovered if the current control file is used.
All data files to be recovered must be set to ONLINE.
Files set to OFFLINE will not be recovered.
C:
All redo logs (OFFLINE and ONLINE) generated since the oldest backup to be used (for ONLINE backup: time of BEGIN BACKUP) or between the time when the file was generated and the end of the recovery, are available without exception (by log sequence number) until the time when the recovery should be completed.

SAP OSS Notes

22.2.11

SAP Note 23070 - Backup and Recovery: Basic Concepts

No comments:

Post a Comment

Subscribe