[Sussex] Apache throttling/fair use

Alan Pope alan.pope at gmail.com
Sun Jan 23 22:02:47 UTC 2005


On Sun, 23 Jan 2005 21:55:11 +0000, Steve Dobson <steve at dobson.org> wrote:
> Guys
> Do any of you know of an Apache module (or two) that I could use to
> do the following (or part thereof):

Your robots.txt should be a good starting place for this.

> 2). Stop web crawlers (like wget) from just grabbing the site's
>     content.

For example, in the robots.txt at the root of the website you could put
something like this:

User-agent: WGet
Disallow: /

User-agent: Wget
Disallow: /

User-agent: WebZIP
Disallow: /

User-agent: WebZIP/3.6
Disallow: /

User-agent: Plucker/Py-1.0
Disallow: /

User-agent: WebCapture
Disallow: /

User-agent: WebCopy
Disallow: /

User-agent: WebCopy/0.98b7
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: WebCopier v2.8
Disallow: /

User-agent: WebCopier v3.0
Disallow: /

User-agent: Mass Downloader
Disallow: /

User-agent: Mass Downloader/2.2
Disallow: /

User-agent: Web Downloader
Disallow: /

User-agent: Web Downloader/2.9
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Offline Explorer/2.5
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebStripper/2.14
Disallow: /

etc..
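
Bear in mind that robots.txt is only advisory: a well-behaved client
checks it before fetching, but nothing forces it to, so it will not stop
anyone who switches that check off. If you want to confirm that the rules
say what you intend, here is a minimal sketch using Python's standard
urllib.robotparser. The shortened rule list, the is_allowed helper and the
example.org URLs are my own illustrations, not part of any real site, and
the parser's matching is only one interpretation of how a crawler might
read the file.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules above (illustrative only).
ROBOTS_TXT = """\
User-agent: Wget
Disallow: /

User-agent: WebZIP
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if these robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Wget/1.9.1", "WebZIP/3.6", "Mozilla/5.0"):
        verdict = "allowed" if is_allowed(agent, "http://example.org/index.html") else "blocked"
        print(agent, verdict)
```

Agents with no matching entry fall through as allowed, since the file has
no "User-agent: *" section.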

Cheers,
Al.



