[Nottingham] Web page scraping

Michael Erskine msemtd at yahoo.co.uk
Tue Jul 31 15:37:53 BST 2007


On Tuesday 31 July 2007 15:16:57 Martin Garton wrote:
> On Tue, 2007-07-31 at 15:14 +0100, Martin wrote:
> > "Web page scraping":
> >
> > Anyone recommend any software for extracting info/tables from html and
> > web pages?
>
> wget, awk, sed, grep.
 
Perl (naturally!). For fetching the page, the choice depends on how complex 
the "getter" needs to be (do you need authentication? cookies?): lynx(1), 
wget(1), or LWP::UserAgent(3). Then, depending on the complexity and 
variability of the tabular data, use a simple (or not so simple) regex, 
HTML::TableExtract(3), HTML::Filter(3), HTML::TokeParser(3), etc.
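
To give a feel for the "extract the table" half, here is a rough sketch. It 
is in Python rather than Perl, purely for illustration, and uses only the 
standard library's html.parser. The names TableRows, extract_rows and SAMPLE 
are made up for this example, and it assumes a single, non-nested table with 
properly closed <tr>/<td> tags; real pages are usually messier, which is 
where the modules above earn their keep.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, None outside <tr>
        self._cell = None   # text fragments of the open cell, None outside one

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed in this row


def extract_rows(html):
    """Return the table rows found in html as lists of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample data, standing in for a page you have already fetched.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td> 1.50 </td></tr>
  <tr><td>Gadget</td><td>2.25</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        print("\t".join(row))
```

Fetching is deliberately left out: save the page with wget or lynx first and 
pass the file contents to extract_rows().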

Regards,
Michael Erskine.

-- 
Whether you can hear it or not,
The Universe is laughing behind your back.
		-- National Lampoon, "Deteriorata"




	
	
		



