[Nottingham] agent idents for web strippers and email harvesters

leigh at rylands-internet-solutions.co.uk
Sun Oct 19 22:48:59 BST 2003


Listers,

In a moment of lucidity, a rather simple but effective ploy occurred to me for clawing 
back some (although probably not all) of the ground lost to the lowlifes who strip web 
sites and use email harvesting software to collect addresses for spamming.

I have had this running on a number of sites for the past two days (I only had the 
basic idea on Friday morning), and it has already turned away an email harvester from 
one of the sites.

While this system is building up an extensive database of bona fide browser types, 
what I really need is some HTTP_USER_AGENT strings for as many web strippers 
and email harvesters as I can find.
I currently have:

WebStripper/2.58
autoemailspider
EmailWolf 1.00
Mozilla/4.0 (compatible; Advanced Email Extractor v2.xx)
Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt; DTS Agent

... but could do with more.
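
For anyone wondering what a check of this kind might look like, here is a minimal 
sketch. It is written in Python purely for illustration and is not the PHP code running 
on the sites. The signature list is taken from the agents above; the function names 
(is_blocked, response_status), the case-insensitive substring matching, the 403 
response and the choice to let an empty agent string through are all assumptions.

```python
# Hypothetical sketch: only the signatures come from the list above.
# Matching rule, function names and status codes are assumptions.
BLOCKED_SIGNATURES = [
    "WebStripper",
    "autoemailspider",
    "EmailWolf",
    "Advanced Email Extractor",
    "DTS Agent",
]


def is_blocked(user_agent):
    """Return True if the agent string contains a known harvester signature."""
    if not user_agent:
        # Assumption: a missing or empty agent string is let through.
        return False
    ua = user_agent.lower()
    return any(signature.lower() in ua for signature in BLOCKED_SIGNATURES)


def response_status(user_agent):
    """Pick an HTTP status for the request: 403 for harvesters, 200 otherwise."""
    return 403 if is_blocked(user_agent) else 200
```

Matching on a substring rather than the full string means a new version number (say 
a later WebStripper release) would still be caught, at the cost of being easy to evade 
by anything that forges a browser's agent string.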

If you have a site that has been visited by one of these agents then please let me 
have the HTTP_USER_AGENT identifier.

If anyone is interested in having their sites protected with this system, I will 
probably be in a position to add other sites to the network in a few weeks.
At the moment only PHP pages can be protected, although I will extend this to ASP 
and ColdFusion shortly.

I am currently refining the details of the service this provides (e.g. automatic 
redirecting for WAP phones, WebTV etc). I am also developing a bit of load balancing 
and redundancy by setting up a chain of servers that can handle the queries should 
the one server that is running the system at the moment go down or suffer a load 
problem.
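
The chain-of-servers idea could be sketched roughly as follows, again in Python for 
illustration only. Each server is modelled as a plain function that answers a query 
or raises ConnectionError when it is unreachable; query_with_failover, primary and 
backup are hypothetical names, and the real service may work quite differently.

```python
def query_with_failover(servers, user_agent):
    """Ask each server in turn and return the first answer obtained."""
    last_error = None
    for server in servers:
        try:
            return server(user_agent)
        except ConnectionError as err:
            # This server is down or overloaded; try the next in the chain.
            last_error = err
    raise ConnectionError("no server in the chain answered") from last_error


# Stand-in servers for the example (assumptions, not real endpoints).
def primary(user_agent):
    raise ConnectionError("primary is down")


def backup(user_agent):
    return "allow"


# The primary fails, so the answer comes from the backup.
answer = query_with_failover([primary, backup], "Mozilla/4.0 (compatible)")
```

Trying the servers in a fixed order only gives redundancy; spreading load across them 
would need something more, such as rotating the starting point of the list.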

Regards

Leigh Silvester


