Re: [IUG] Curious searching behavior


We've experienced this as well, three times since the middle of April.

The IP addresses in each case turn out to be in China, in or near the city of Beijing. (The traffic could have been redirected through those addresses by a hacker located elsewhere.) In each case the browser identifies itself as IE 5.x on Windows NT, a combination that is otherwise pretty rare among our users.

The script being used takes about 3 days to run on our server, harvesting about 70,000 to 90,000 MARC records a day. That's just short enough that by the time I can download server logs, detect it, and block that IP, they've finished. I did succeed in cutting off this access once, when they were 3/4 of the way done.

We've also been puzzled about the reason for this harvesting. Is someone making their own version of OCLC? Or cataloging their own collection of English language books? Is it a library school project?

I saw a version of this same thing many months ago, coming from an IP address in Florida. It occurs to me now that it may have been a dry run for whatever is happening now. It's non-destructive, though it does double our traffic (page views) on the days when it's happening. It hasn't slowed our system noticeably, but I worry that two or three of these running at the same time probably would. (Before our server replacement in August 2010, this would have shut us down, as we couldn't support 50,000 page views a day. Now it seems we can do many times that.)

I've tried to watch this traffic in real time using "List non-local access attempts allowed" in telnet (A/A/L/N). It's possible, but that list only seems to show one entry for every 50 or 100 hits from this script, and checking it every day is very time-consuming. We considered blocking all IP addresses coming from China, which seems to be a fairly common IT strategy, but it's a big, messy list to input through this telnet interface. Also, one of the IPs that this script came from wouldn't have been on any of the current lists, which demonstrates that these barn doors are almost impossible to lock ahead of time. We don't have our own firewall, though I should discuss this with our network provider.
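
For anyone who would rather test addresses against a country list outside the telnet interface, here is a minimal sketch in Python. The list itself is hypothetical: the two ranges below are reserved documentation ranges (RFC 5737) standing in for whatever published list you might download, and the function name is mine.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges (RFC 5737),
# used as placeholders; they are NOT real country allocations.
BLOCKED_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]

def is_blocked(ip, ranges=BLOCKED_RANGES):
    """Return True if ip falls inside any CIDR range in ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(r) for r in ranges)

print(is_blocked("192.0.2.17"))   # inside the first placeholder range
print(is_blocked("203.0.113.5"))  # outside every listed range
```

An address that is missing from the list simply passes the check, which is exactly the barn-door problem described above.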

Our normal page views per day (PVD) on the catalog run from 20,000 (Christmas Day) to 52,000, so seeing 75,000 or 90,000 PVD really got my attention.
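
A check like this could be scripted against downloaded logs instead of eyeballed. The sketch below is only an illustration under assumptions: it assumes the logs are in Apache-style common log format (I have not confirmed that our server writes that format), and the 20,000 hits-per-day threshold and the function name are placeholders to adjust.

```python
import re
from collections import Counter

# Assumed line format (Apache-style common log), e.g.:
# 203.0.113.9 - - [03/Jun/2011:02:15:01 -0700] "GET /x HTTP/1.1" 200 512
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):')

def heavy_hitters(lines, threshold=20000):
    """Count hits per (IP, day) and return those at or above threshold."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # lines that do not match the assumed format are skipped
            counts[(m.group(1), m.group(2))] += 1
    return {key: n for key, n in counts.items() if n >= threshold}
```

Calling `heavy_hitters(open("access.log"))` (the file name is hypothetical) returns a dictionary keyed by IP address and date, so a single address responsible for tens of thousands of hits in one day stands out immediately.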

Dan McMahon
Tech System Specialist
MARINet, Novato CA
http://marinet.lib.ca.us


-----Original Message-----
From: innopac-bounces at innovativeusers dot org [mailto:innopac-bounces at innovativeusers dot org] On Behalf Of Linda West
Sent: Wednesday, June 01, 2011 8:45 AM
To: innopac at innovativeusers dot org
Subject: [IUG] Curious searching behavior

I just pulled patron search statistics for May and have noticed some curious
behavior. It appears that someone has accessed/harvested? over 150,000 of
our MARC records by displaying the MARC record and then using the next
record button to go between records. This happened every Friday in May
early in the morning. Has anyone else experienced this or does anyone have
an idea of what has happened to us?


--
Linda H. West
Technical Services Director/Asst. Professor
John Vaughan Library
Northeastern State University
Tahlequah OK 74464
918-444-3280
west at nsuok dot edu

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.



--
This message was distributed through the Innovative Users Group INNOPAC list
Public replies: INNOPAC at innovativeusers dot org
You are currently subscribed to innopac as: dmcmahon at marinet dot info dot
Update your subscription options: http://innovativeusers.org/iug-discussion-list
