Recovery from a crash via backup tape- our experience
- Date: Thu, 16 Jun 2005 17:22:14 -0700
- From: Sue Boggs <boggs at ups dot edu>
- Subject: Recovery from a crash via backup tape- our experience
Hi,
Now that we are back up and running I wanted to do a summary of our
experience for the archive. I know while we were down I was searching the
archives to see what others had written, especially trying to find out how
long it would take to restore from backup and what, if anything, didn't
recover. The good news, I suppose, is that there isn't much in the archives,
meaning it hasn't happened often or hasn't been a big enough deal to write
about.
On Friday, June 3, at about 3:30 pm, our campus library suffered an
unanticipated power outage. Due to the length of the power outage and our
Sunday closure, I didn't come in to try to bring our Innopac back up until
Monday morning, June 6.
I called Innovative and, long story short, they couldn't get the Innopac up.
It turned out a disk in the disk cabinet had gone bad. Mei Ling at Innovative
said it looked like the system had gone down gracefully with the powerwatch
but it was just one of those things with older hardware. Our Alpha was
purchased in Dec. 2001 when we upgraded to Millennium. (I think of it as
the disk was old and with the shock of the outage it had a heart attack.)
III overnighted us a new disk cabinet and by 11:30 Tuesday morning it was
installed and they were transferring data from our Thursday night full
backup. By 1:30pm we could access the catalog. However, all was not
well. Lisa Bernard was now handling the call and has done a great job of
getting things fixed or routing us to the III staff person who does.
The list so far of things that didn't work (found when we came up 6/7 unless
noted):
* When we brought up Millennium the icons on the nav bar and edit bar
were gone (fixed 6/8)
* All login manager controlled options and preferences were gone
(templates, macros, initials associated with login, Data Exchange options)
* It wouldn't let me recreate the login preferences (discovered Wed.
6/8; fixed same day)
* The Millennium control bar didn't come up (fixed 6/8)
* In milcirc the circ desk and course reserve modes were totally gone
and it didn't remember the user's initials as it moved between the modes
that were left in circulation (fixed 6/9)
* We couldn't get into the manual (discovered Thur. 6/9, fixed same day)
* The InnReach mode wasn't working properly. When you clicked on
InnReach in the nav bar the nav bar changed to the InnReach options but the
main screen didn't change (discovered Friday 6/10, fixed Mon. 6/13)
* Print to email didn't work; the emails bounced (discovered Friday
6/10, fixed same day)
* Templates weren't prompting us for variable fields and wouldn't let
us edit the fields to prompt (discovered Wed. 6/15- see below for one
reason this took so long to find) (still outstanding)
At first I was told the Login Preferences were gone, so I'd just have to
recreate them, and I was thinking that was odd but normal. Finally, on
Thursday, Daven, who fixed the lost circulation desk problem by adding the
modes back, told me that "You are an older site (we came up in 1992) and
the backup wasn't including one directory"! That directory included the
login preferences and, I'm guessing, the two circ modes.
When they called to report that access to the manual was fixed, I was told
our old disks were so small they had moved the manual to a different
location. When they restored our backup onto the new, larger disks they put
the manual in the normal location, but the pointers were still pointing to
the old location. Based on this, I'm guessing the relationship between being
an old site and not having something backed up is that we had a directory
structure different than we would have had as a newer site, and somewhere
along the line that wasn't communicated to whoever sets up which
directories are backed up.
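For anyone who wants to sanity-check their own setup, here is a minimal
sketch of the idea: compare the directories the system actually uses against
the roots the backup job covers, and list anything left out. The function
name and every path below are hypothetical, made up for illustration. They
are not our real directory layout or anything from Innovative's backup tools.

```python
from pathlib import PurePosixPath


def uncovered_dirs(live_dirs, backup_roots):
    """Return the live directories that are not inside any backed-up root."""
    roots = [PurePosixPath(r) for r in backup_roots]
    missing = []
    for d in live_dirs:
        path = PurePosixPath(d)
        # Covered if the directory is a backup root or sits beneath one.
        covered = any(path == root or root in path.parents for root in roots)
        if not covered:
            missing.append(d)
    return missing


# Hypothetical paths for illustration only.
live = ["/app/data", "/app/manual", "/legacy/prefs"]
backed_up = ["/app"]
print(uncovered_dirs(live, backed_up))  # ['/legacy/prefs']
```

The comparison works on whole path components, so a root of /app does not
accidentally "cover" a directory such as /application.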
Restoring from the backup tape meant we lost everything we had done on
Friday. Besides just the local implications of figuring out what we had
to re-catalog or what items had circulated, we had the added complication
that we are an InnReach site; thus our holdings are mirrored on the central
union catalog, Summit. These records are updated in (almost) real time so
Summit accurately reflected what had been done for most of Friday. I was
hoping we could get a copy of items added on Friday that were in Summit and
just dump them back into our catalog. But of course it was more complicated
than that.
Another long story short: we could not get copies of the records dumped back,
and we actually couldn't catalog anything new either. On Tuesday Tim Auger,
manager of Union Database Services at III, realized that if we cataloged
anything we might re-use item numbers that had already been used on Friday,
and these would overlay the existing records in Summit. Tim was able to run
a list of items that had changed on 6/3 so we could check it against our
catalog. However, we didn't get this until Friday 6/10, and we then had to
check each of the items to determine whether it had been cataloged, checked
in, checked out, etc. With other things going on, and still finding the
little things wrong that we had to deal with, we didn't finish the list
until Monday. We notified Tim which were the new items and he
removed them from Summit and gave us the okay to start adding new item
records Tuesday morning June 14.
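The first pass over that list can be sketched roughly as below. This assumes
you can get both the changed-items list and your restored catalog's item
numbers as plain lists, and that an item number missing from the restored
catalog is a candidate for "newly cataloged on the lost day". We did this
check by hand; the function name and the item numbers are hypothetical.

```python
def split_changed_items(changed_ids, local_ids):
    """Split the changed-items list by whether each item survived the restore.

    Returns (new_items, existing_items):
      new_items      - not in the restored catalog, so probably created on
                       the lost day; their numbers are at risk of re-use
      existing_items - still in the catalog, so the change was something
                       else (check-in, check-out, edit) and needs review
    """
    local = set(local_ids)
    new_items = [i for i in changed_ids if i not in local]
    existing_items = [i for i in changed_ids if i in local]
    return new_items, existing_items


# Hypothetical item numbers for illustration only.
changed_on_lost_day = ["i1001", "i1002", "i1003"]
restored_catalog = ["i1002"]
new_items, existing_items = split_changed_items(changed_on_lost_day, restored_catalog)
print(new_items)       # ['i1001', 'i1003']
print(existing_items)  # ['i1002']
```

Either way, each item in the second group still has to be looked at
individually to work out what actually happened to it.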
So basically it took us six business days to get back to normal with no new
cataloging done in that time. And actually after nine days we still have
the one outstanding issue and I'm wondering if there is anything else we
haven't found yet. Luckily it is summer term.
Among the 20/20 hindsight things:
* Have a list of things to check after a crash. In the middle of
everything it is hard to think of all the possible things to check. I never
even thought of print to email for instance. I still think I must have
missed some things. While of course not everything could be tested, such as
the fiscal close we will do in the next few weeks, most things could.
Perhaps we could brainstorm one for the Clearinghouse?
* Print out p. 105281 of the manual regarding Responding to Power
Failure so it is handy to consult when a power outage occurs.
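As a starting point for such a checklist, here is a minimal sketch that
simply tracks which post-restore checks have been tested and which are still
open. The check names come from the problems we ran into above; the structure
and names in the code are my own assumptions, not an Innovative tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Check:
    name: str
    passed: Optional[bool] = None  # None means not yet tested


def outstanding(checks):
    """Return the names of checks that failed or have not been tested."""
    return [c.name for c in checks if c.passed is not True]


# Checks drawn from the problems listed in this message.
checklist = [
    Check("Nav bar and edit bar icons appear"),
    Check("Login manager options and preferences present"),
    Check("Millennium control bar comes up"),
    Check("Circ desk and course reserve modes available"),
    Check("Manual is accessible"),
    Check("InnReach mode changes the main screen"),
    Check("Print to email delivers"),
    Check("Templates prompt for variable fields"),
]

checklist[0].passed = True   # tested and working
checklist[6].passed = False  # tested and broken

for name in outstanding(checklist):
    print("TODO:", name)
```

Anything untested counts as outstanding, which is the point: the things
nobody thought to try are the ones that turn up a week later.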
Sorry for the length of this message,
Sue
**The opinions expressed above are mine and not those of Collins Memorial
Library or the University of Puget Sound
Sue Boggs
Cataloging & Library Technician
Technical Services
Library
University of Puget Sound
1500 N. Warner St. #1021
Tacoma, WA 98416-1021
(253) 879-2667
boggs at ups dot edu