[Nottingham] Web page scraping

Tue Jul 31 15:45:08 BST 2007

It depends a lot on the quality of the HTML that you are looking at.
If it is simple and line-based, you may be able to treat it like a text
file and use grep and similar tools. Otherwise you might be better off
using Perl and some of the many modules for pulling apart HTML and
automating web access (e.g. WWW::Mechanize).
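
As a rough sketch of the grep approach, assuming a page where each link
sits on its own line: the file name sample.html and the example.org URLs
below are made up for illustration, and in practice the page would be
fetched first with something like wget or curl. It also relies on the -o
option, which GNU and BSD grep provide but POSIX does not require.

```shell
# Hypothetical sample page standing in for a downloaded file.
cat > sample.html <<'EOF'
<html><body>
<a href="http://example.org/one">One</a>
<a href="http://example.org/two">Two</a>
<p>No link on this line</p>
</body></html>
EOF

# -o prints only the matching part of each line (href="...");
# sed then strips the leading href=" and the trailing quote.
links=$(grep -o 'href="[^"]*"' sample.html | sed 's/^href="//; s/"$//')
echo "$links"
```

This only holds up while the markup stays that regular. Attributes split
across lines, single-quoted values or nested tags will break it, which is
where the Perl modules earn their keep.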

http://www.perl.com/pub/a/2003/01/22/mechanize.html

http://search.cpan.org/~petdance/WWW-Mechanize-1.30/lib/WWW/Mechanize.pm

-Cam