[Gllug] Screen-scraping tools (for HTML)?

Benedikt Heinen beh at icemark.net
Wed Nov 24 00:00:00 UTC 2004


> I'm thinking there must be an easier way ...  Does anyone know of any
> tools to help automating / screen scraping pages?

With Java I've successfully used NekoHTML to parse (dirty) HTML into clean 
XML documents, which can then easily be dissected with proper XPath 
queries.  (Your mileage with similar Perl/OCaml/... libraries may vary, but 
I'd look for a similar setup.)
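
To give a rough idea of the XPath half of that setup, here is a minimal 
sketch using only the JDK's own javax.xml classes.  It assumes the page has 
already been cleaned into well-formed XML (the step NekoHTML handles in the 
setup above), so the cleanup itself is not shown.  The class name, the 
sample markup and the "result" class attribute are all made up for 
illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of NekoHTML or any other library.
class ScrapeSketch {

    // Returns the href of every <a> that sits directly inside a
    // <td class="result">. Input is assumed to be well-formed XML already.
    static List<String> extractLinks(String cleanXml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(cleanXml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(
            "//td[@class='result']/a/@href", doc, XPathConstants.NODESET);

        List<String> links = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            links.add(hits.item(i).getNodeValue());
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample page, standing in for the cleaned-up output.
        String page =
            "<html><body><table><tr>"
            + "<td class=\"result\"><a href=\"/first\">First</a></td>"
            + "<td class=\"nav\"><a href=\"/next\">Next</a></td>"
            + "</tr></table></body></html>";
        for (String link : extractLinks(page)) {
            System.out.println(link);
        }
    }
}
```

The point of going via a DOM is that the query string is the only part 
that changes from page to page; the surrounding code stays the same.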



   Benedikt

 	INFLUENCE, n.  In politics, a visionary _quo_ given in exchange
 	  for a substantial _quid_.
 			(Ambrose Bierce, The Devil's Dictionary)
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



