[Gllug] Screen-scraping tools (for HTML)?

Richard Jones rich at annexia.org
Tue Nov 23 22:11:55 UTC 2004


Our current project involves "automating" Google AdWords.  I'm doing
this by screen-scraping the HTML (LWP + HTML::TreeBuilder + lots of
OCaml glue code).  Google's HTML is hideous - so hideous, in fact, that
HTML::TreeBuilder misparses a lot of it, resulting in nasty
workarounds all over the place.
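
As a rough illustration of the kind of tolerant extraction this
involves, here is a minimal sketch.  It is in Python with only the
standard library's html.parser, not the LWP + HTML::TreeBuilder + OCaml
stack described above.  The sample markup, CellCollector and
extract_cells are hypothetical names invented for the example; nothing
here reflects Google's actual pages.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new <tr> implicitly ends any row left open.
            self._finish_row()
            self._row = []
        elif tag == "td":
            # A new <td> implicitly ends any cell left open.
            self._finish_cell()
            if self._row is None:
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td":
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        # Flush whatever was still open at end of input.
        self._finish_row()

    def _finish_cell(self):
        if self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _finish_row(self):
        self._finish_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None


def extract_cells(html):
    """Return the table cells found in html as a list of rows."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up markup with unclosed <td>, <tr> and <b> tags.
SAMPLE = ("<table><tr><td>Campaign<td>Clicks"
          "<tr><td><b>Shoes</td><td> 42 </tr></table>")

if __name__ == "__main__":
    for row in extract_cells(SAMPLE):
        print(row)
```

Because this reacts to tags as events instead of building a tree, a
missing </td> or </b> does not derail it, though nested tables would
need more care than this sketch takes.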

I'm thinking there must be an easier way ...  Does anyone know of any
tools to help with automating / screen-scraping pages?

Rich.

-- 
Richard Jones.  http://www.annexia.org/  http://www.j-london.com/
>>>   http://www.team-notepad.com/ - collaboration tools for teams   <<<
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
http://subjectlink.com/ - Lesson plans and source material for teachers