Restoring a Primary Database
________________________________________________________________________
Phase I: Clean up the old system
1) Log on to your primary Replication Server and suspend the DSI connection to the
primary database.
suspend connection to
srv1.pdb
2) Re-create the primary database.
- · If your primary database is irretrievably damaged, drop it. If the database is
marked suspect, you must use the
dbcc dbrepair command.
Drop database will
not execute on a suspect database.
drop database
pdb
dbcc dbrepair
(pdb,dropdb)
- · Drop any damaged database devices using sp_dropdevice.
- · Create new database devices using disk init (if necessary).
- · Create the new database using create database with the for load option.
3) Restore the primary database.
- · Load the database from the database dump.
load database
pdb from
<dump file>
- · Restore any transaction log dumps.
load transaction
pdb from
<dump file>
- · Activate the database using the online database command.
online database
pdb
Phase II: Re-Establish Replication
If you have done up-to-the-minute recovery, your primary database should be current, and
you can go directly to Step 8. Steps 4-7 are only necessary if you have not recovered all
of the data in the database.
4) Zero out the LTM truncation point. Use the RSSD for your primary Replication
Server, and run the following command:
rs_zeroltm
srv1,
pdb
5) Log on to the primary Replication Server and get the generation number for the
primary database. Record this number.
admin get_generation,
srv1,
pdb
6) Use your primary database, and update the generation number. The new number
should be higher than the old number.
dbcc settrunc(’ltm’, ’gen_id’,
<new number>)
The reason for doing this is that the generation number will differ between the RSSD on the Repserver controlling this db and the actual db The reason for doing this is that if you are restoring a primary database to an earlier state, you have to increment the database generation number so that the Replication Server does not ignore log records submitted after the database is reloaded.
7) Recover any lost or inconsistent data.
- · Recover any data modification lost from the primary database.
- · Re-synchronize the replicate database:
- · Re-materialize the replicate database.
- · Run rs_subcmp.
- · Replay the transaction log from a log backup.
8) Once the primary and replicate databases are correct, you can resume replication. Log
on to your primary Replication Server and resume the DSI connection to your primary
database.
resume connection to
srv1.pdb
9) Restart the RepAgent on your primary database.
10) Review your error logs to ensure that no errors have occurred.
Re-playing the Transaction Log
________________________________________________________________________
1) Log on to your Adaptive Server. Create a new database that has the same size and
segments as your primary database. This will be your temporary recovery database.
2) Shut down the Replication Server and restart it in single-user mode. To do this, you
should have a run server file that contains the
–M option.
Once the Replication Server comes up, check the error log to ensure that the server
came up in stand-alone mode.
3) Put the Replication Server into log recovery mode by executing the following
commands:
set log recovery for
srv1.
pdb
allow connections
4) Configure a RepAgent for the temporary recovery database that you created in Step 1.
Log into your primary data server, use the temporary recovery database, and execute
the following commands:
exec sp_config_rep_agent
temp_rep, ‘enable’,
‘rep1’,
’sa’,’’
5) Log on to your primary Adaptive Server and load the temporary database you created
with the database dump of your primary database.
load database
temp_rep from
’<dump file>’
go
online database temp_rep
6) Start the RepAgent for the temporary recovery database in recovery mode. Although
you are starting the RepAgent on the temporary recovery database, the ‘connect
database’ is your original primary database:
exec sp_start_rep_agent
temp_rep, recovery,
‘srv1’,‘pdb’,’rep1’
7) Verify that the RepAgent has replayed everything that you have loaded. There are
two ways to do this:
- · Examine the error log of your primary Adaptive Server, looking for a message
that states that recovery of the transaction log is complete.
- · Run sp_help_rep_agent on the RepAgent for the temporary recovery database:
exec sp_help_rep_agent
temp_rep, ‘recovery’
The recovery status should be “not running” or “end of log”.
8) Log on to your primary Adaptive Server and load the temporary database you created
with the first transaction dump of your primary database:
load transaction
temp_rep
from
’<dump file>’
9) Restoring the transaction dump will invalidate the LTM truncation point. Reset it by
using the temporary recovery database and running the following command:
dbcc settrunc(‘ltm’,’valid’)
10) Start the RepAgent for the temporary recovery database in recovery mode:
exec sp_start_rep_agent
temp_rep, recovery,
‘srv1’,‘pdb’,’rep1’
11) Verify that the RepAgent has replayed everything that you have loaded. There are
two ways to do this:
- · Examine the error log of your primary Adaptive Server, looking for a message
that states that recovery of the transaction log is complete.
- · Run sp_help_rep_agent on the RepAgent for the temporary recovery database:
exec sp_help_rep_agent
temp_rep, ‘recovery’
The recovery status should be “not running” or “end of log”.
12) Repeat Steps 8, 9, 10, and 11 for each transaction log dump until you have played all
transaction log dumps through the replication system.
13) Check the Replication Server error log for any loss detection messages. If you find
any, then this process has failed, and you must find another way of re-synchronizing
your data in the primary and replicate databases. Your other options are:
- · Re-materializing the replicate database.
- · Running rs_subcmp.
14) If you did not find loss detection messages in the previous step, shut down your
Replication Server, and restart it in normal mode. To do this, you should have a run
server file that does not contain the
–M option.
If you accidentally bring the server up in single-user mode, there will be a message to
this effect in the error log. Shut Replication Server down, and start it up using a
correct run server file.
15) Re-establish the LTM truncation point in the original primary database. Use the
primary database, and run the following command:
dbcc settrunc(‘ltm’,’valid’)
16) Restart the RepAgent for the original primary database in normal mode.
exec sp_start_rep_agent
pdb
17) Log on to your Replication Server and check for any open transactions
admin who,sqt
18) Drop the temporary recovery database, and remove its database devices.
Rebuilding Stable Queues
________________________________________________________________________
1) Log on to your replication server, and drop the damaged partition.
drop partition
<partition name>
2) If necessary, add a new partition to replace the disk space. Remember, the file or raw
partition must already exist before you execute this command.
- · In UNIX, you can use the touch command to create a file.
- · In Windows NT, you can create a file using a utility program such as Notepad.
add partition
<partition name>
on ’<
physical_name>’
with size
<size>
3) Check the disk space on your Replication Server. Your failed partition should contain
the status DROPPED, while your new partition should be listed as ONLINE.
admin disk_space
4) Rebuild the stable device:
rebuild queues
5) If the connection to your replicate database is suspended, resume it.
6) Examine your Replication Server error log. Wait for the log to display the message
“Rebuild Queues: Complete”.
You may see the error message, “DSI: detecting loss for database
’srv1.pdb’…” If
your RepAgent is not running, this message can be ignored.
7) If the RepAgent for the primary database is not running, start it up. Use your primary
database and run the following:
exec sp_start_rep_agent
pdb
8) Examine your Replication Server error log. Check for loss detection messages. If a
data loss was detected, then this process has not fully succeeded, and you have lost
transactions.
- · Compare the primary and replicate databases, and replace any lost data.
- · Use the ignore loss command to restart replication.
ignore loss from
srv1.pdb to
srv1.rdb
If no data loss was detected, Replication Server will end loss detection mode, and
normal replication may continue.
9) Check the disk space used by your Replication Server’s stable device.
admin disk_space
If the damaged partition no longer appears in this display, the file or raw partition that
was used by the damaged partition may be dropped.
Restoring an RSSD Database
________________________________________________________________________
Phase I: Clean up the old system
1) Shut down all RepAgents that connect to the current Replication Server.
2) If the Replication Server is still running, log on to it and shut it down.
3) Re-create the RSSD database.
- · If your RSSD database is irretrievably damaged, drop it. If the database is marked
suspect, you must use the
dbcc dbrepair command.
Drop database will not
execute on a suspect database.
drop database
rep1_RSSD
dbcc dbrepair
(rep1_RSSD,dropdb)
- · Drop any damaged database devices using sp_dropdevice.
- · Create new database devices using disk init (if necessary).
- · Create the new RSSD database using create database with the for load option.
4) Restore the RSSD database.
- · Load the database from the database dump.
load database
rep1_RSSD
from
‘<dump device>’
- · Load the database from any transaction log dumps. Be sure to apply the
transaction log dumps in order.
load tran
rep1_RSSD
from
‘<dump device>’
- · Activate the database using the online database command.
online database
rep1_RSSD
5) If your Replication Server’s RSSD had a RepAgent and you were able to recover the
database up-to-the-minute, re-establish its truncation point and restart your
RepAgents. Use the RSSD database and run the following command:
dbcc settrunc(’ltm’, ’valid’)
6) If your Replication Server’s RSSD had a RepAgent and there was an active route from
this replication server to another one, but you were unable to recover up-to-the-minute,
get the generation number.
admin get_generation,
srv1,
rep1_RSSD
Phase II: Rebuild the RSSD
If you have done up-to-the-minute recovery, your RSSD database should be current.
Restart your Replication Server and RepAgents. Steps 7-12 are only necessary if you
have not been able to recover the RSSD database to its current state.
7) Shut down the Replication Server and restart it in single-user mode. To do this, you
should have a run server file that contains the
–M option. Once the Replication
Server comes up, check the error log to ensure that the server came up in stand-alone
mode.
8) Rebuild the stable device:
rebuild queues
9) Start up the RepAgents for your primary databases in recovery mode. To do this, do
the following:
sp_start_rep_agent pdb1, recovery
10) Examine your Replication Server error log, and check for loss detection messages. If
a data loss was detected:
- · Compare the primary and replicate databases, and replace any lost data.
- · Use the ignore loss command to restart replication.
ignore loss from
srv1.pdb to
srv1.rdb
If no data loss was detected, Replication Server will end loss detection mode, and
normal replication may continue.
11) If this Replication Server is a primary Replication Server, with a route to a
downstream Replication Server, then you must clean up the truncation point in the
RSSD. Remember that a Primary Replication Server’s RSSD is a primary database,
replicating schema information to other RSSDs.
If your Replication Server is the only one in the domain, or if it is a replicate
Replication Server, you do not need to run this last step.
To clean up the truncation point, do the following:
- · Clear the RSSD’s LTM truncation point:
dbcc settrunc(’ltm’, ’ignore’)
- · Move the transaction log forward. Execute the following batch multiple times (40
times, for example):
begin tran
checkpoint
commit
If the number of log records does not increase, create a table in the database and
drop it, and then try the above batch again.
- · Reset the RSSD’s LTM truncation point:
dbcc settrunc(’ltm’, ’valid’)
- · Increment the generation id number in the RSSD:
dbcc settrunc(’ltm’, ’gen_id’,
<new number>)
12) Shut down your Replication Server and your RepAgents, and restart them in normal
mode.
- · To start your Replication Server in normal mode, you should have a run server file
that does not contain the
–M option. If you accidentally bring the server up in single-user
mode, there will be a message to this effect in the error log. Shut Replication
Server down, and start it up using a correct run server file.
- · To start a RepAgent in normal mode, run the following command:
sp_start_rep_agent
pdb
Avoiding Disaster
________________________________________________________________________
To avoid disasters on your RSSD database:
1) Put your RSSD on database devices that are not being used by any other databases. If
you can, put them on separate disk drives.
2) Separate your RSSD data and log portions on to separate database devices and, if
possible, separate disk drives.
3) Mirror your RSSD database devices.
4) Keep current backups of your RSSD database.
5) Perform transaction backups of your RSSD database to permit up-to-the-minute
recovery.
To avoid disasters on your stable device:
6) Mirror your stable device.
Note: Sybase does not provide a mechanism to mirror your stable device. Your disk
drives or operating system must do this.
7) Always have more disk space in your stable device than is strictly necessary. If
replication fails, having more disk space gives you time to fix the problem before the
stable device fills up.
To avoid disasters on your application databases:
8) Keep current backups of your primary database.
9) If your replicate database is significantly different than your primary database, and
would be difficult to rebuild, keep a current backup of it, also.
Miscellaneous suggestions
10) Document your system, so that you know the names and locations of all important
objects.
11) Create
rs_subcmp parameter files for every subscription you are replicating, so that
you don’t need to create them during a crisis.
12) On a regular basis, run
rs_subcmp against your primary and replicate tables to ensure
that they truly are synchronized.