[Gllug] Screen-scraping tools (for HTML)?
J F
jnns at linuxmail.org
Fri Nov 26 12:49:13 UTC 2004
> Our current project involves "automating" Google Adwords. I'm doing
> this by screen-scraping the HTML (LWP + HTML::TreeBuilder + lots of
> OCaml glue code). Google's HTML is hideous - so hideous in fact that
> HTML::TreeBuilder misparses a lot of it, resulting in nasty
> workarounds all over the place.
> I'm thinking there must be an easier way ... Does anyone know of any
> tools to help automating / screen scraping pages?
Use HTML Tidy (http://tidy.sourceforge.net) to clean the messy HTML before you try parsing it.
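
For what it's worth, here is a rough sketch of the idea in Python: pipe the fetched page through the tidy command-line tool, then hand the cleaned XHTML to whatever parser you already use (HTML::TreeBuilder in your case). It assumes a `tidy` binary is on your PATH. The helper names `tidy_command` and `tidy_html` are made up for illustration, and the choice of flags is just one reasonable starting point, not something I have tuned against Adwords pages.

```python
import subprocess


def tidy_command(tidy_path="tidy"):
    """Build the argument list for a quiet, XHTML-producing Tidy run.

    tidy_path is an assumption: adjust it if the binary lives elsewhere.
    """
    return [
        tidy_path,
        "-quiet",                 # suppress the summary banner
        "-asxhtml",               # emit well-formed XHTML
        "-numeric",               # numeric rather than named entities
        "-utf8",                  # treat input and output as UTF-8
        "--show-warnings", "no",  # keep stderr short
        "--force-output", "yes",  # still write output when errors are found
    ]


def tidy_html(raw_html, tidy_path="tidy"):
    """Return a cleaned-up copy of raw_html (a str) produced by Tidy."""
    proc = subprocess.run(
        tidy_command(tidy_path),
        input=raw_html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Tidy exits with 1 for warnings and 2 for errors, so a non-zero
    # status alone is not treated as failure here; empty output is.
    if not proc.stdout:
        raise RuntimeError("tidy produced no output: "
                           + proc.stderr.decode("utf-8", "replace"))
    return proc.stdout.decode("utf-8")
```

You would call `tidy_html(page_source)` on each fetched page and feed the result to the parser instead of the raw HTML. Forcing output means badly broken pages still come back as something parseable, at the cost of Tidy occasionally guessing the structure wrong.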
Cheers
--
Gllug mailing list - Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug