[Liverpool] Software patent data

tony burrows tony at tonyburrows.uklinux.net
Fri Mar 31 17:12:29 BST 2006


After the last meeting I started playing with the idea of getting data 
from the patent office.  Gave me an excuse to learn Python as well.

Currently I can take a downloaded page and grab the relevant data 
into an XML file.  I know how to use Python to stuff it into MySQL, but 
I have hit a few problems.
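
A rough sketch of what that page-to-XML step could look like with a 
current Python 3 and only the standard library.  The table layout, the 
field names (number, title, applicant) and the sample rows are all made 
up for illustration; the real patent office pages will need their own 
handling.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def page_to_xml(html, fields=("number", "title", "applicant")):
    """Turn each table row into a <patent> element (field names are assumed)."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    root = ET.Element("patents")
    for row in parser.rows:
        patent = ET.SubElement(root, "patent")
        for name, value in zip(fields, row):
            ET.SubElement(patent, name).text = value
    return root


# Made-up sample standing in for a downloaded results page.
sample = """<table>
<tr><td>GB0000001</td><td>Example widget</td><td>Example Ltd</td></tr>
<tr><td>GB0000002</td><td>Another widget</td><td>Someone Else</td></tr>
</table>"""

xml_text = ET.tostring(page_to_xml(sample), encoding="unicode")
print(xml_text)
```

Keeping the parsing separate from the XML building means the same rows 
could later be handed to a database insert instead.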

First, I'm not sure how to navigate around pages automatically so that I 
can grab stuff without having to do it all manually through a browser.
Second, the search terms I'm using are vague at best - software, 
programming, computer.
Third, I wanted to grab the actual patent doc, then do a word count.  
The website provides this with some odd sort of extension that Firefox 
doesn't seem to handle (it's actually PDF, and neither Konqueror nor 
Opera has problems with it).  Worst of all, all you get is a single page 
at a time, which doesn't seem to want to save (when I tried saving one 
and reopened it, there was nothing there).
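
The word-count part itself is the easy bit once the text is out of the 
PDF.  A minimal sketch, assuming the document text is already available 
as a plain string (getting it out of the PDF is not shown, and the 
sample sentence is invented):

```python
import re
from collections import Counter


def word_counts(text):
    """Count how often each word occurs, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Invented text standing in for the contents of a patent document.
sample_text = "A method of sorting data. The method sorts data in a computer."

counts = word_counts(sample_text)
total = sum(counts.values())

print("total words:", total)
for word, n in counts.most_common(3):
    print(word, n)
```
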
Any suggestions?

Tony


