[Nottingham] Web page scraping

Michael Erskine msemtd at yahoo.co.uk
Tue Jul 31 15:37:53 BST 2007

On Tuesday 31 July 2007 15:16:57 Martin Garton wrote:
> On Tue, 2007-07-31 at 15:14 +0100, Martin wrote:
> > "Web page scraping":
> >
> > Anyone recommend any software for extracting info/tables from html and
> > web pages?
> wget, awk, sed, grep.
Perl (naturally!). For the page "getter", depending on how complex it needs
to be (do you need authentication? cookies?): lynx(1), wget(1), or
LWP::UserAgent(3). Then, depending on the complexity and variability of the
tabular data: a simple (or not so simple) regex, HTML::TableExtract(3),
HTML::Filter(3), or HTML::TokeParser(3).
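
To make the "extract the table" step concrete, here is a minimal sketch of the idea. It is not HTML::TableExtract or any of the Perl modules named above; it is a stand-in written in Python using only the standard library's html.parser. The names TableExtractor and extract_tables are made up for illustration, and the sketch assumes cells and rows are properly closed and tables are not nested. Fetching the page (wget, lynx, LWP::UserAgent) is left out; the function simply takes the HTML as a string.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by row and table.

    Hypothetical helper for illustration; assumes closed tags, no nesting.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return every table in `html` as a list of rows of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Calling extract_tables(page_source) gives one list per table, each holding one list of cell strings per row. For pages with sloppy or nested markup a dedicated module is likely the safer choice.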

Michael Erskine.

Whether you can hear it or not,
The Universe is laughing behind your back.
		-- National Lampoon, "Deteriorata"
