[Liverpool] Software patent data

Dave Brotherstone davegb at pobox.com
Sun Apr 2 08:30:39 BST 2006


In terms of spidering round a site, have you tried wget?  wget --help should
give you all you need to know, and it has a pretty flexible set of options
for controlling how far it recurses.
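If you'd rather do the navigation from Python, here is a minimal standard-library sketch of the same idea: follow links breadth-first from a start page, stay on one host, and stop at a fixed depth. The function names and the `fetch` parameter are my own (hypothetical), and it assumes the pages use plain HTML links rather than forms or JavaScript, which I haven't checked against the patent office site.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def default_fetch(url):
    # Download a page; latin-1 maps every byte, so decoding never fails.
    with urlopen(url) as resp:
        return resp.read().decode("latin-1")


def crawl(start_url, max_depth=2, fetch=default_fetch):
    """Breadth-first crawl from start_url, staying on the same host.

    Returns a dict mapping each visited URL to its HTML.
    max_depth=0 fetches only the start page.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    pages = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        pages[url] = html
        if depth >= max_depth:
            continue  # don't follow links beyond the depth limit
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link = urljoin(url, href).split("#", 1)[0]
            # Skip other hosts (and mailto: etc., which have no host).
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

Passing a stub `fetch` (e.g. a dict's `__getitem__`) lets you test it offline. Against a real site you'd probably also want a pause between requests so you don't hammer the server.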

I don't know about the PDF save issues, though.
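One thing that might help: a PDF file starts with the bytes `%PDF-` whatever extension the URL uses, so a script can fetch the document URL directly, check for that header and save it under a `.pdf` name. A rough sketch, assuming you can find the direct URL for each document (I haven't checked); the helper names are hypothetical. An empty or non-PDF response raises an error instead of quietly leaving an empty file behind.

```python
from pathlib import Path
from urllib.request import urlopen

PDF_MAGIC = b"%PDF-"  # PDF files begin with a header such as %PDF-1.4


def looks_like_pdf(data):
    """True if the bytes begin with the PDF header, whatever the URL's extension says."""
    return data.startswith(PDF_MAGIC)


def save_pdf(data, dest):
    """Write raw bytes to dest with a .pdf extension; refuse empty or non-PDF data."""
    if not looks_like_pdf(data):
        raise ValueError("response is not a PDF (got %r...)" % data[:20])
    path = Path(dest).with_suffix(".pdf")
    path.write_bytes(data)
    return path


def download_pdf(url, dest):
    # Hypothetical helper: fetch the document URL directly rather than saving from the browser.
    with urlopen(url) as resp:
        return save_pdf(resp.read(), dest)
```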

hth,
Dave.

On 31/03/06, tony burrows <tony at tonyburrows.uklinux.net> wrote:
>
> After the last meeting I started playing with the idea of getting data
> from the patent office.  Gave me an excuse to learn Python as well.
>
> Currently I can take a downloaded page and grab the relevant data
> into an XML file.  I know how to use Python to stuff it into MySQL, but
> I have hit a few problems.
>
> First, I'm not sure how to navigate around pages automatically so that I
> can grab stuff without having to do it all manually through a browser.
> Second, the search terms I'm using are vague at best: software,
> programming, computer.
> Third, I wanted to grab the actual patent document, then do a word count.
> The website provides it with some odd sort of extension that Firefox
> doesn't seem to handle (it's actually a PDF, and neither Konqueror nor
> Opera has a problem with it).  Worst of all, you only get a single page
> at a time, and it doesn't seem to want to save (when I tried and
> reopened it, there was nothing there).
> Any suggestions?
>
> Tony
>
> _______________________________________________
> Liverpool mailing list
> Liverpool at mailman.lug.org.uk
> https://mailman.lug.org.uk/mailman/listinfo/liverpool
>
>
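For the word count, one approach is to convert each saved PDF to text with the external `pdftotext` tool (from xpdf/poppler, assumed to be installed) and do the counting in Python. The sketch below also counts words that start with each search term, a crude stand-in for stemming so that "program" matches "programming"; the term list and function names are illustrative assumptions.

```python
import re
import shutil
import subprocess
from collections import Counter

# A "word" here is a run of letters, allowing internal apostrophes and hyphens.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z'-]*")


def pdf_to_text(pdf_path):
    """Extract text with pdftotext; '-' as the output file sends the text to stdout."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(["pdftotext", pdf_path, "-"], capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def word_stats(text, terms=("software", "program", "computer")):
    """Return the total word count and, per term, how many words start with it."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    counts = Counter(words)
    hits = {t: sum(n for w, n in counts.items() if w.startswith(t)) for t in terms}
    return len(words), hits
```

Prefix matching will also catch unrelated words (e.g. "computerised" is fine, but a term like "code" would match "codec"), so the term list needs some care.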

