[Gllug] Screen-scraping tools (for HTML)?
Benedikt Heinen
beh at icemark.net
Wed Nov 24 00:00:00 UTC 2004
> I'm thinking there must be an easier way ... Does anyone know of any
> tools to help automating / screen scraping pages?
With Java I've successfully used NekoHTML to parse (dirty) HTML into clean
XML documents, which can then be easily dissected with proper XPath
queries... (Your mileage with similar Perl/OCaml/... libs may vary, but
I'd look out for a similar setup)...
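
A rough sketch of the XPath half of that setup, using only the JDK's own
javax.xml classes. It assumes the page has already been cleaned into
well-formed XML (the NekoHTML step is not shown here), and the class name
XPathScrape, the method extractLinks and the sample markup are made up
for illustration:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathScrape {
    // Hypothetical helper: returns the href of every <a> element, in document order.
    // Assumes the input is already well-formed XML (i.e. the cleanup step has run).
    public static List<String> extractLinks(String xml) throws Exception {
        // Build a DOM tree from the cleaned-up markup.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Select all href attributes of <a> elements anywhere in the tree.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("//a/@href", doc, XPathConstants.NODESET);

        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            hrefs.add(nodes.item(i).getNodeValue());
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample standing in for the output of the HTML cleanup step.
        String xml = "<html><body>"
                + "<a href=\"/one\">first</a>"
                + "<p><a href=\"/two\">second</a></p>"
                + "</body></html>";
        for (String href : extractLinks(xml)) {
            System.out.println(href);
        }
    }
}
```

The same pattern should carry over to a DOM produced by an HTML-tolerant
parser, though element-name case and namespaces may need adjusting in the
XPath expression depending on how that parser is configured.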
Benedikt
INFLUENCE, n. In politics, a visionary _quo_ given in exchange
for a substantial _quid_.
(Ambrose Bierce, The Devil's Dictionary)
--
Gllug mailing list - Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug