subject: fix: NTBACKUP aborts with 2-byte logfile
posted: Sat, 26 Jul 2008 08:37:59 +0100


THE PROBLEM:

Following a reboot, the scheduled nightly backup aborts with a 2-byte
logfile and the following entries in the system log:

23:00:14 / 7035 / Service Control Manager / The Removable Storage
Service was successfully sent a start control.
23:00:14 / 7036 / Service Control Manager / The Removable Storage
Service entered the running state.
23:33:46 / 98 / Removable Storage Service / RSM was stopped.
23:33:46 / 7036 / Service Control Manager / The Removable Storage
Service entered the stopped state.

- in the Services control panel applet, the Removable Storage Service
is shown as stopped

- the recovery options for the Removable Storage Service are all set
to "restart the service" (this makes no difference)

- the backup log (created by the script which runs the backup)
reports the backup failed 33:33 minutes into the job (it's not always
exactly 33 minutes, but it usually is)

- in the MMC, under removable storage / libraries, the tape drive is
listed, but it has a red X on it

- within approx 10 seconds, and without user interaction, the red X
disappears

- in the Services control panel applet, the Removable Storage Service
is now shown as started

- this situation will recur nightly until the system is kicked
somehow* or until the fix is applied

- NOTE: some users may report the logfile size as 1Kb, however that
is Windows Explorer mis-reporting the actual size, which is 2 bytes

* try right-clicking the tape drive in the MMC


THE CAUSE:

This behaviour occurs when the backup job encounters incorrect media
in the drive. NTBACKUP shows an "insert correct media" dialog box
when this happens, BUT the dialog box closes AUTOMATICALLY (eg. times
out) after approx 33 minutes!!!!!!!!!! If the job is scheduled to run
unattended, the operator will never see this error. As nothing is
written anywhere in the logs to indicate the problem, the operator
will not be able to use the logs to see the error. The only way for
the operator to see the error is to watch it run. The actual message
is:

"pending operator request" (title of dialog box), "place media
'xxxxx' into drive or library unit" (text of dialog box), with two
buttons, "Complete" or "Refuse".

This dialog box is obscuring another dialog box, which says:

"mount waiting" (title of dialog box), "The library is unable to load
the media at this time because the drive is in use. [...] Please
wait for the drive to become available. You may cancel this request
at any time." (text of dialog box), with one button, "Cancel"

- if we leave the dialogs alone, they both automatically cancel after
approx 33 mins, and we get a 2-byte logfile

- if we click "Complete", the message reappears

- if we click "Refuse", the job aborts, and we get a 2-byte logfile

- if we click "Cancel", the job aborts, and we get a 2-byte logfile

Note that these dialogs may appear even when the /UM (unmanaged)
switch is used. With /UM, NTBACKUP is meant to ignore the tape label
and make a backup irrespective of which particular tape is in the
drive. Crucially, NTBACKUP does NOT do this after a tape change
following a reboot.

The dialogs can be observed by placing a PAUSE in the backup script,
immediately before NTBACKUP is called. The job will wait for you to
press a key, you do that and NTBACKUP will start, then you'll see the
dialogs. If you ignore them for approx 33 minutes, they will
disappear, and a 2-byte logfile will be created.


THE SOLUTION:

In your backup script, use RSM REFRESH /lf "tape drive name" before
starting NTBACKUP, and use RSM EJECT /lf"tape drive name" afterwards,
eg:

rsm refresh /lf ....
ntbackup ....
rsm eject /lf ....

This makes the /UM switch work as advertised. Why this works is not
known.


RSM BACKGROUNDER:

RSM.EXE is the Removable Storage Manager Command Line Interface

RSM.EXE and several other parts of RSS are not visible to the user,
even in when logged in as Administrator! The only way to see/run
C:\WINDOWS\SYSTEM32\RSM.EXE under Windows 2003 is to be running with
SYSTEM privs at the time (this can be done by scheduling a command
shell). (OK so there is another way, that involves scheduling a
command shell, then using the resulting shell (which has SYSTEM
privs) to copy C:\WINDOWS\SYSTEM32\RSM.EXE to a different directory,
then it can be seen/run as Administrator, anytime, without any more
scheduled command shells).

Force RSM to update its internal database, such that it matches the
actual tape in the drive:

rsm refresh /lf"Hewlett Packard DDS4 drive"

Sample output:

The command completed successfully.

Eject the tape:

rsm eject /lf"Hewlett Packard DDS4 drive"

Sample output:

The command completed successfully.

Display all the devices in the library:

rsm view /tlibrary /guiddisplay

Sample output:

LIBRARY

Off-line Media FFE025A310D542F7BF6096BBE36FBB49
Hewlett Packard DDS4 drive E5B20A78F69B402592910F2EFBB3FFA8
TSSTcorp CD-ROM TS-H192C D92DC77ABCC64C4BA320D4D9A533F355

The command completed successfully.

Display all media assigned to the specified device, but not in the
device:

rsm view /cgFFE025A310D542F7BF6096BBE36FBB49 /TPhysical_media

Sample output:

PHYSICAL_MEDIA

3
6
Media created 16/06/2006 at 14:14 - 1
Media created 21/08/2006 at 15:28 - 1
Media created 09/07/2007 at 16:38 - 1
C:/system state created 16/04/2008 at 16:33 - 1
Media created 05/07/2008 at 03:39 - 1
Media created 08/07/2008 at 02:24 - 1
Media created 09/07/2008 at 03:41 - 1
Media created 10/07/2008 at 03:43 - 1
Media created 11/07/2008 at 03:11 - 1
Media created 12/07/2008 at 18:32 - 1
Media created 01/07/2008 at 02:17 - 1
Media created 16/07/2008 at 05:08 - 1
Media created 16/07/2008 at 23:34 - 1

The command completed successfully.

Show the current media in the specified device:

rsm view /cgE5B20A78F69B402592910F2EFBB3FFA8 /TPhysical_media

Sample output:

PHYSICAL_MEDIA

Media created 04/07/2008 at 00:08 - 1

The command completed successfully.

[RSM commandline options]
http://www.nukz.net/reference/ntcmd/COMMANDS/R/RSM.HTM

[RSM tips]
http://www.lascon.co.uk/os00906.htm

[RSM articles]
http://labmice.techtarget.com/windows2000/diskmgmt/RSM.htm


OTHERS WITH THE SAME PROBLEM:

[W2k, is using RSM REFRESH and SLEEP in a script]
http://forums.windrivers.com/archive/index.php/t-55196.html

[NT or w2k, is using RSM /view to test for media in the drive]
http://net.mpei.ru/lang/rus/faqw2kxp/jsifaq/rh5955.htm

[2K3, has the 33-minute problem, no fix]
http://www.itnewsgroups.net/group/microsoft.public.windows.server.gene
ral/topic16112.aspx

[2K3+SBS2K3, has the 33-minute problem, no fix, suspects VSS]
http://forums.techarena.in/showthread.php?t=468374

[2K3+SBS2K3, has the 33-minute problem, fixed with RSM REFRESH before
the backup and RSM EJECT afterwards]
http://groups.google.com/group/microsoft.public.windows.server.general
/browse_thread/thread/4c10c86cc81fcaa0/0dbe1b96e5f83ce2?#0dbe1b96e5f83
ce2

[2K3, has the 33-minute problem, no fix]
http://www.mcse.ms/message2458800.html
http://www.eggheadcafe.com/software/aspnet/29453425/ntbackup-schedule-
not-wor.aspx

[2K3, has the 33-minute problem, no fix]
http://techrepublic.com.com/5208-6230-
0.html?forumID=101&threadID=256496&start=0

[2K3, has the 33-minute problem, no fix]
http://www.tech-
archive.net/Archive/Windows/microsoft.public.windows.server.active_dir
ectory/2005-01/1677.html

[2K3, has the 33-minute problem, is testing RSM REFRESH]
http://www.tek-tips.com/viewthread.cfm?qid=1453684&page=1

[2K3, has the 33-minute problem, fixed with RSM REFRESH, also
stops/starts RSS first]
http://www.google.com/translate?u=http%3A%2F%2Fwww.forum-
microsoft.org%2Ftopic80399.html&hl=en&ie=UTF8&sl=fr&tl=en

[NTBackup tips]
http://technet.microsoft.com/en-gb/magazine/cc160782.aspx

[How to schedule unattended backups using a stand-alone tape library]
http://support.microsoft.com/kb/239892

[W2K: Scheduled Backups Fail on Standalone Tape Drives After Changing
Media - says download hotfix]
http://support.microsoft.com/kb/267574/

[W2K: Removable Storage May Not Refresh the Tape - says download
hotfix]
http://support.microsoft.com/kb/330306

[2K3: The NTBackup tool may not run as expected when you scheduled an
NTBackup task in Windows Server 2003 - says incompatible with RDP]
http://support.microsoft.com/kb/927073/


PARTIAL SOLUTION:

(this guy stops and starts RSS beforehand, does a refresh, and also
seems to use some kind of SLEEP command, BUT he
doesn't do an eject afterwards)

net start "Removable Storage"
rem **** EXPECTED 30 seconds ****
Waiting 30
net stop "Removable Storage"
rem **** EXPECTED 30 seconds ****
Waiting 30
Refreshed rem *** on the tape drive
rsm refresh / lf "Hewlett Packard DDS4 drive"
rem **** EXPECTED 30 seconds ****
Waiting 30
....
ntbackup ......


DISCARDED POSSIBLE SOLUTIONS:

(this key does not exist in GPEDIT in Windows 2003?)
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID
=c00023443

The Removable Storage service hangs when you start your Windows 2000
Server
http://www.jsifaq.com/SF/Tips/Tip.aspx?id=4838

(describes errors we see but suggests a firmware change is to blame)
http://support.microsoft.com/default.aspx?scid=kb;en-us;841187

(won't eject, sees our errors once only I think)
http://www.mcse.ms/archive215-2004-8-938052.html

(blames a corrupted RSM database)
http://www.winserverkb.com/Uwe/Forum.aspx/windows-server-
sbs/38705/Tape-Drive-issue

(details how to clear a corrupted RSM database)
http://www.kbalertz.com/Q235032/Problems.Damaged.Removable.Storage.Man
ager.Database.aspx


Thank you Microsoft for wasting 50+ hours of my time on this problem.
Now I'm holding your hand every backup, you work fine. You call
yourself an OS? You are a joke, a toy, a superficial, bug-ridden
leper. At least you don't need 64-bit drivers just to be a print
server! Oh wait, you do...

---
* Origin: [adminz] tech, security, support -
http://cyberdelix.net/adminz/

generated by msg2page 0.06 on Jul 27, 2008 at 13:29:15