[Nottingham] Web page scraping
Camilo Mesias
camilo at mesias.co.uk
Tue Jul 31 15:45:08 BST 2007
It depends a lot on the quality of the HTML you are looking at.
If it is simple and line-based, you may be able to treat it like a text
file and use grep and similar tools. Otherwise you might be better off
using Perl and some of its many modules for pulling apart HTML and
automating web access (e.g. WWW::Mechanize):
http://www.perl.com/pub/a/2003/01/22/mechanize.html
http://search.cpan.org/~petdance/WWW-Mechanize-1.30/lib/WWW/Mechanize.pm
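For the simple, line-based case, a rough sketch in plain Perl (core only, no CPAN modules) might look like this. It assumes each link sits whole on one line as <a href="...">text</a>. The function name extract_links and the sample page are hypothetical, for illustration only. Links split across lines, or single-quoted and unquoted attributes, are where a proper HTML module starts to earn its keep.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: grep-style link extraction that ASSUMES simple,
# line-based HTML where every <a href="...">text</a> fits on one line.
sub extract_links {
    my ($html) = @_;
    my @links;
    for my $line (split /\n/, $html) {
        # Collect every double-quoted href and its link text on this line.
        while ($line =~ /<a\s+href="([^"]+)"[^>]*>([^<]*)<\/a>/gi) {
            push @links, [$1, $2];
        }
    }
    return @links;
}

# Hypothetical sample page used only to exercise the sketch.
my $page = <<'HTML';
<html><body>
<a href="http://www.example.com/one">One</a>
<p>No link here</p>
<a href="/two">Two</a> and <a href="/three">Three</a>
</body></html>
HTML

# Print each href and its link text, tab-separated.
for my $link (extract_links($page)) {
    print "$link->[0]\t$link->[1]\n";
}
```

Running it prints one tab-separated href/text pair per link found. For messier pages, the modules mentioned above handle the parsing and fetching properly.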
-Cam