Recovery from a crash via backup tape: our experience


Hi,

Now that we are back up and running, I wanted to write a summary of our experience for the archive. While we were down I was searching the archives to see what others had written, especially to find out how long a restore from backup would take and what, if anything, wouldn't recover. The good news, I suppose, is that there isn't much in the archives, meaning either it hasn't happened often or it hasn't been a big enough deal to write about.

On Friday, June 3, at about 3:30 pm our campus library suffered an unanticipated power outage. Because of the length of the outage and our Sunday closure, I didn't come in to try to bring our Innopac back up until Monday morning, June 6.

I called Innovative and, long story short, they couldn't get the Innopac up. It turned out a disk in the disk cabinet had gone bad. Mei Ling at Innovative said it looked like the system had gone down gracefully with the powerwatch, but it was just one of those things with older hardware. Our Alpha was purchased in Dec. 2001 when we upgraded to Millennium. (The way I think of it: the disk was old, and the shock of the outage gave it a heart attack.)

III overnighted us a new disk cabinet, and by 11:30 Tuesday morning it was installed and they were transferring data from our Thursday night full backup. By 1:30 pm we could access the catalog. However, all was not well. Lisa Bernard was now handling the call, and she has done a great job of getting things fixed or routing us to the III staff person who can fix them.

The list so far of things that didn't work (found when we came up 6/7 unless noted):
* When we brought up Millennium, the icons on the nav bar and edit bar were gone (fixed 6/8)
* All login-manager-controlled options and preferences were gone (templates, macros, initials associated with login, Data Exchange options)
* It wouldn't let me recreate the login preferences (discovered Wed. 6/8; fixed same day)
* The Millennium control bar didn't come up (fixed 6/8)
* In milcirc the circ desk and course reserve modes were completely gone, and it didn't remember the user's initials when moving between the circulation modes that were left (fixed 6/9)
* We couldn't get into the manual (discovered Thu. 6/9; fixed same day)
* The InnReach mode wasn't working properly. When you clicked InnReach in the nav bar, the nav bar changed to the InnReach options but the main screen didn't change (discovered Fri. 6/10; fixed Mon. 6/13)
* Print to email didn't work; the emails bounced (discovered Fri. 6/10; fixed same day)
* Templates weren't prompting us for variable fields and wouldn't let us edit the fields to prompt (discovered Wed. 6/15; see below for one reason this took so long to find) (still outstanding)

At first I was told the login preferences were simply gone and I'd have to recreate them, which seemed odd to me but plausible. Finally, on Thursday, Daven, who fixed the lost circulation desk problem by adding the modes back, told me: "You are an older site (we came up in 1992) and the backup wasn't including one directory"! That directory held the login preferences and, I'm guessing, the two circ modes.

When they called to report that access to the manual was fixed, I was told that our old disks were so small they had moved the manual to a different location. When they restored our backup onto the new, larger disks, they put the manual in the normal location, but the pointers still pointed to the old location. Based on this, I'm guessing the connection between being an old site and not having something backed up is that our directory structure differed from what a newer site would have, and somewhere along the line that wasn't communicated to whoever sets up which directories get backed up.

Restoring from the backup tape meant we lost everything we had done on Friday. Besides the local implications of figuring out what we had to re-catalog and which items had circulated, we had the added complication that we are an InnReach site: our holdings are mirrored in the central union catalog, Summit. These records are updated in (almost) real time, so Summit accurately reflected what had been done for most of Friday. I was hoping we could get a copy of the Friday additions from Summit and just load them back into our catalog. But of course it was more complicated than that.

To make another long story short, we could not get copies of the records loaded back, and we couldn't catalog anything new either. On Tuesday Tim Auger, manager of Union Database Services at III, realized that if we cataloged anything we might reuse item numbers that had already been assigned on Friday, and those new records would overlay the existing records in Summit. Tim was able to run a list of items that had changed on 6/3 so we could check it against our catalog. However, we didn't get the list until Friday 6/10, and we then had to check each item to determine whether it had been cataloged, checked in, checked out, etc. Between other demands and the small problems we kept finding and fixing, we didn't finish the list until Monday. We told Tim which items were new, he removed them from Summit, and he gave us the okay to start adding new item records on Tuesday morning, June 14.

So it took us six business days to get back to normal, with no new cataloging done in that time. And after nine days we still have one outstanding issue, and I'm wondering whether there is anything else we haven't found yet. Luckily it is summer term.

Among the 20/20 hindsight lessons:
* Have a list of things to check after a crash. In the middle of everything it is hard to think of everything that might need checking; I never even thought of print to email, for instance, and I still think I must have missed some things. Not everything can be tested right away (the fiscal close we will do in the next few weeks, for example), but most things can. Perhaps we could brainstorm such a list for the Clearinghouse?
* Print out p. 105281 of the manual, "Responding to Power Failure," so it is handy to consult when a power outage occurs.

Sorry for the length of this message,

Sue

**The opinions expressed above are mine and not those of Collins Memorial Library or the University of Puget Sound

Sue Boggs
Cataloging & Library Technician
Technical Services

Library
University of Puget Sound
1500 N. Warner St. #1021
Tacoma, WA 98416-1021

(253) 879-2667
boggs at ups dot edu

