Re: [IUG] URL Checker - who does the work, how long does it take, etc?


Dear Charis,

I do the bulk of the work with the URL checker for the HELIN Consortium, which consists of 10 academic and 14 hospital libraries. I work in the Consortium office supporting the libraries in cataloging, authority control, ERM and other activities.

Over time, you learn what categories of URL's are flagged every week as problems that are, in fact, fine. A very useful tool in the Millennium URL Checker module is the 'URL block' which prevents certain URL's from being checked. Here I list those servers which ALWAYS show up on the report such as purl.gpo.gov (because they all move to another URL) and resources going through our proxy server that are subscribed to by HELIN member libraries, including ebook collections (ebrary.com, hdl.handle.net for the ACLS collection, oxfordreference.com, xreferplus.com, and so on). I currently have about 150 entries in this file. This is, of course, not ideal since some URL's that no longer work may be blocked from being checked, but I was able to reduce the number of flagged URL's from over 10,000 to a couple of hundred for me to actually work on. Since most of the member libraries subscribe to these paid resources through a process I manage, I'm aware of when a new resource is added or cancelled.
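
As a rough illustration of the block-list idea (this is not how Millennium implements it, only a sketch of the same filtering logic), the snippet below drops URLs whose host matches a blocked server before any checking happens. The host names are the examples mentioned above; the function names and the sample URLs in the usage note are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical block list, seeded with the servers named in this message.
BLOCKED_HOSTS = {
    "purl.gpo.gov",
    "ebrary.com",
    "hdl.handle.net",
    "oxfordreference.com",
    "xreferplus.com",
}


def is_blocked(url, blocked_hosts=BLOCKED_HOSTS):
    """Return True if the URL's host is a blocked host or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == b or host.endswith("." + b) for b in blocked_hosts)


def urls_to_check(urls, blocked_hosts=BLOCKED_HOSTS):
    """Keep only the URLs that should still go through the link checker."""
    return [u for u in urls if not is_blocked(u, blocked_hosts)]
```

For example, `urls_to_check(["http://purl.gpo.gov/GPO/LPS1234", "http://www.example.org/report.pdf"])` keeps only the second URL. The trade-off is the one described above: a blocked host is never checked, so a dead link on that host goes unnoticed.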

For us, the checker runs automatically in the early hours of Tuesday morning. Each week, I move the file of bibliographic records generated by this process into a review file for further review. I sort this file a couple of different ways (by cat. date, material type, bib. level) to see what shows up for the first time. I can often resolve several of the ones for bibs. on order, videos, and serials without going into the URL Checker module itself. Libraries may create new review files from this one for their own bib. records to check them. Some do, some don't.

Most of the questionable URL's in my report are for US government documents, probably not surprisingly. Several of our libraries, but not all, use Marcive to receive bib. records for their repositories. I asked Marcive to send us only the purl when a bib. record contains both a purl and the actual URL; the latter often changed.

About once a month I use the URL Checker module itself, where I look most closely at bibs. with 404 (not found) and 301 (moved permanently) errors. For those web pages where I find an exact match for the resource cataloged, I correct the URL myself. For resources where I cannot reasonably and easily identify the correct new URL, I send an email to the head of cataloging for that library. That person may consult with the selector/liaison, find the new site for the resource themselves, or delete their holdings from OCLC and the bib. record from the catalog. I spend about 30 minutes weekly and 2 hours once a month doing this work.
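
The monthly triage described above can be sketched as a small grouping step. This assumes the report has been exported as rows of (bib record number, URL, HTTP status code); that row layout, the function name, and the record numbers in the usage note are assumptions for illustration, not the module's actual export format.

```python
from collections import defaultdict
from http import HTTPStatus

# The two statuses looked at most closely: 404 and 301.
PRIORITY = (HTTPStatus.NOT_FOUND, HTTPStatus.MOVED_PERMANENTLY)


def triage(report_rows):
    """Group (bib_id, url, status) rows by status phrase, keeping only 404s and 301s."""
    groups = defaultdict(list)
    for bib_id, url, status in report_rows:
        if status in PRIORITY:
            groups[HTTPStatus(status).phrase].append((bib_id, url))
    return dict(groups)
```

Calling `triage([("b1000001", "http://www.example.org/a", 404), ("b1000002", "http://www.example.org/b", 200)])` returns a single "Not Found" group holding the first record; the row with status 200 is dropped.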

For several of the reports (moved permanently, moved temporarily, not found, etc.) the program identifies a possible new URL. If that URL is the correct one, you can check it off and use the program to replace the URL in the bib. record. One issue I have is that the bib. record itself doesn't show unless you highlight it and click on edit or bring it up in the web; it's crucial to check the bibliographic information before making any changes to the URL.
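
How the program derives its "possible new URL" is not documented here. One plausible source is the Location header that a server sends with a redirect, so the sketch below resolves that header against the old URL to produce a candidate. The function name and the example URLs are hypothetical, and the result is only a suggestion for a person to verify against the bibliographic record.

```python
from urllib.parse import urljoin

# Redirect statuses whose Location header may point at the resource's new home.
REDIRECT_STATUSES = {301, 302, 307, 308}


def suggest_replacement(old_url, status, location):
    """Return an absolute candidate URL for a redirect, or None if there is none.

    `location` is the raw Location header value; it may be relative,
    so it is resolved against the URL that was originally requested.
    """
    if status not in REDIRECT_STATUSES or not location:
        return None
    candidate = urljoin(old_url, location)
    # A redirect back to the same URL is not a useful suggestion.
    return candidate if candidate != old_url else None
```

For instance, `suggest_replacement("http://www.example.org/docs/old.html", 301, "/archive/old.html")` yields `http://www.example.org/archive/old.html`, while a 404 with no Location header yields `None`.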

Currently the program only looks at URL's in the bib. record, not in attached records (item and holdings records) or ERM resource records. As a Functional Expert for the Innovative Users Group responsible for reviewing enhancement requests for the URL checker, I can say that many want III to expand the reach of this program to URL's wherever they appear.

I hope this helps,

--Martha

Martha Rice Sanders
Knowledge Management Librarian,
The HELIN Consortium
401-874-4951
msanders at etal dot uri dot edu

At 07:28 PM 1/7/2009, you wrote:
Hi-
The UC Berkeley library is halfway through a Millennium implementation
and we are interested in the URL checker module, but we have some
questions. If your library uses the URL checker, we'd be curious to know
a few things about how you use it:

- Who resolves the dead links? Reference staff? Technical Services
staff? Other? It seems like it would take reference skills to hunt down
the new replacement link.

- How do you resolve them? Web searching and seeing how many you can
find? Some systematic approach? Or do you simply remove the dead link
without an attempt to replace it? I know that if there's a redirect, the
product gives you that information in the report.

- Any sense of how many you process (resolve successfully or just
delete) in a given month, as compared to the number of URLs in your
catalog? Any sense of how long this takes (i.e., how many hours a month
your staff devote to this task)?

We've bought but not yet implemented the ERM product also - that comes
at the end of our implementation project. So if there are wrinkles that
pertain to ERM sites I'd be glad to know those too.

Many thanks,
Charis Takaro
UC Berkeley Library Integrated System Manager
--
This message was distributed through the Innovative Users Group INNOPAC list
Public replies: INNOPAC at innopacusers dot org
Update your subscription options: http://innopacusers.org/mailman/listinfo/innopac