[Nottingham] Web page scraping

Camilo Mesias camilo at mesias.co.uk
Tue Jul 31 15:45:08 BST 2007


It depends a lot on the quality of the HTML that you are looking at.
If it is simple and line-based, you may be able to treat it like a text
file and use grep and similar tools. Otherwise you might be better off
using Perl and some of the many modules for pulling apart HTML and
automating web access (e.g. WWW::Mechanize).

http://www.perl.com/pub/a/2003/01/22/mechanize.html

http://search.cpan.org/~petdance/WWW-Mechanize-1.30/lib/WWW/Mechanize.pm
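
For the simple, line-based case, here is a rough sketch of the grep
approach. The sample page and the extract_links name are made up for
illustration, and it assumes a grep that supports -o (GNU and BSD grep
both do). It only copes with double-quoted href attributes, so treat it
as a starting point rather than a general solution.

```shell
#!/bin/sh

# Hypothetical helper: read HTML on stdin, print one link target per line.
# grep -o prints only the matching part of each line; sed then strips
# the surrounding href="..." wrapper.
extract_links() {
    grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//'
}

# Made-up sample page with one tag per line.
sample='<html><body>
<a href="http://example.com/one">One</a>
<a href="http://example.com/two">Two</a>
<p>No link on this line</p>
</body></html>'

printf '%s\n' "$sample" | extract_links
```

On a real page you would pipe the output of wget -q -O - or curl -s
into extract_links instead of the sample text. Once the markup gets
messier than this (single quotes, attributes split across lines), the
Perl modules are the safer route.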

-Cam


