[Liverpool] Software patent data
tony burrows
tony at tonyburrows.uklinux.net
Fri Mar 31 17:12:29 BST 2006
After the last meeting I started playing with the idea of getting data
from the patent office. Gave me an excuse to learn Python as well.
Currently I can take a downloaded page and grab the relevant data
into an XML file. I know how to use Python to stuff it into MySQL, but
I have hit a couple of problems.
First, I'm not sure how to navigate around pages automatically so that I
can grab stuff without having to do it all manually through a browser.
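One possible way to do the navigation from Python, as a rough sketch only: build each results URL yourself and fetch it with the standard library instead of clicking through a browser. The address example.org and the parameter names q and page are placeholders I made up; the real ones would have to be read off the browser's address bar after running a search by hand, and this only works if the site uses plain GET requests.

```python
import time
import urllib.parse
import urllib.request

# Hypothetical search endpoint: the real patent office URL and its
# query parameter names are NOT known here and must be substituted.
BASE_URL = "http://example.org/search"


def build_search_url(term, page, base_url=BASE_URL):
    """Build the URL for one page of results (parameter names are assumed)."""
    query = urllib.parse.urlencode({"q": term, "page": page})
    return base_url + "?" + query


def fetch(url):
    """Download one page and return its body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_pages(term, first, last, delay=2.0):
    """Yield (page number, html) for a range of result pages.

    Pauses between requests so the server is not hammered.
    """
    for page in range(first, last + 1):
        yield page, fetch(build_search_url(term, page))
        time.sleep(delay)
```

Each page returned by fetch_pages could then go through the same parsing step that already works on a manually downloaded page.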
Second, the search terms I'm using are vague at best - software,
programming, computer.
Third, I wanted to grab the actual patent doc, then do a word-count.
The website provides this with some odd sort of extension that Firefox
doesn't seem to handle (it's actually PDF, and neither Konqueror nor Opera
has problems with it). Worst of all, all you get is a single page at a
time, which doesn't seem to want to save (when I tried and reopened
there was nothing there).
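For the word-count part, a minimal sketch, assuming the PDF pages can somehow be saved and converted to plain text first (that conversion step is not shown and is the unsolved bit). The function names are just illustrative, and the definition of a "word" here (runs of letters, digits and apostrophes) is one arbitrary choice.

```python
import re
from collections import Counter


def word_counts(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


def total_words(text):
    """Return the total number of words in the text."""
    return sum(word_counts(text).values())
```

Calling word_counts(text).most_common(20) would list the twenty most frequent words, which might also help with picking better search terms.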
Any suggestions?
Tony