[Nottingham] Web page scraping
Michael Erskine
msemtd at yahoo.co.uk
Tue Jul 31 15:37:53 BST 2007
On Tuesday 31 July 2007 15:16:57 Martin Garton wrote:
> On Tue, 2007-07-31 at 15:14 +0100, Martin wrote:
> > "Web page scraping":
> >
> > Anyone recommend any software for extracting info/tables from html and
> > web pages?
>
> wget, awk, sed, grep.
Perl (naturally!), but it depends on how complex the page "getter" needs to be
(do you need authentication? cookies?): lynx(1), wget(1), or LWP::UserAgent(3).
Then, depending on how complex and variable the tabulated data is, a simple (or
not so simple) regex, HTML::TableExtract(3), HTML::Filter(3), HTML::TokeParser(3),
etc.
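For anyone without Perl to hand, here is a rough sketch of the token-based approach (the idea behind HTML::TokeParser and HTML::TableExtract) using only Python's standard `html.parser` module. It does not reproduce either Perl module's API. `TableExtractor` and `extract_tables` are hypothetical names made up for this example, and it assumes the page has already been fetched (e.g. with wget). It also assumes no nested tables and ignores rowspan/colspan. It does tolerate the common missing `</td>` and `</tr>` tags.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Hypothetical helper: collect each <table> as a list of rows of cell text.

    Assumes tables are not nested; rowspan/colspan are ignored.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.tables = []    # finished tables
        self._table = None  # rows of the table currently open
        self._row = None    # cells of the row currently open
        self._cell = None   # text fragments of the cell currently open

    def _end_cell(self):
        if self._cell is not None:
            # Join fragments and collapse runs of whitespace to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self._table.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = []
        elif tag == "tr" and self._table is not None:
            self._end_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag == "tr":
            self._end_row()
        elif tag == "table" and self._table is not None:
            self._end_row()   # close any row left open
            self.tables.append(self._table)
            self._table = None

    def handle_data(self, data):
        # Only text inside a cell is kept; everything else is skipped.
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return every table in `html` as a list of rows (lists of strings)."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


if __name__ == "__main__":
    page = """
    <p>Opening times</p>
    <table>
      <tr><th>Day</th><th>Hours</th></tr>
      <tr><td>Mon</td><td> 9:00 - 17:00 </td></tr>
      <tr><td>Sat<td>By
          appointment &amp; on request
    </table>
    """
    for row in extract_tables(page)[0]:
        print(row)
```

Feeding the page through a parser rather than a regex means tag case, attribute order and entity decoding are handled for you. The "simple regex" route is still fine when the page layout is fixed and you control it.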
Regards,
Michael Erskine.
--
Whether you can hear it or not,
The Universe is laughing behind your back.
-- National Lampoon, "Deteriorata"