[Liverpool] Software patent data
Julian Todd
julian at goatchurch.org.uk
Sun Apr 2 11:27:49 BST 2006
tony burrows wrote:
> After the last meeting I started playing with the idea of getting data
> from the patent office. Gave me an excuse to learn Python as well.
>
> Now currently I can take a downloaded page and grab the relevant data
> into an xml file. I know how to use python to stuff it into MySQL,
> but I have hit a couple of problems.
>
> First, I'm not sure how to navigate around pages automatically so that
> I can grab stuff without having to do it all manually through a browser.
All you need is urllib.urlopen(), read(), urlparse.urljoin() and some
regexp knowledge to get whatever you want from the internet, spider
around it, and capture the data.
http://docs.python.org/lib/module-urllib.html
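
A minimal sketch of that fetch, join and regexp loop follows. It is an
illustration only, not the publicwhip code. It assumes Python 3, where
urllib.urlopen() and urlparse.urljoin() are spelled
urllib.request.urlopen() and urllib.parse.urljoin(). The names HREF_RE,
fetch, extract_links and spider, the max_pages limit, and the rule of
only following links under the starting URL are all made up for the
example.

```python
import re
from collections import deque
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen

# Crude href matcher: fine for simple pages, but not a real HTML parser.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def fetch(url):
    """Download one page and return its text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(html, base_url):
    """Return every href in html as an absolute URL, fragment removed."""
    links = []
    for href in HREF_RE.findall(html):
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        links.append(absolute)
    return links


def spider(start_url, fetch_page=fetch, max_pages=10):
    """Breadth-first crawl of pages whose URL begins with start_url.

    fetch_page is a parameter so the crawl can be tried against canned
    pages before it is pointed at a live site.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        pages[url] = fetch_page(url)
        for link in extract_links(pages[url], url):
            # Assumption: stay inside the section we started from.
            if link.startswith(start_url) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Calling spider() with the address of an index page returns a dict
mapping each visited URL to its page text, which can then be picked
apart with further regexps. A real crawl would also want a pause
between requests and some error handling, both left out here.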
That's how I've done it for the whole of publicwhip. Arrange a date
with me if you want to know how to get started.

The technical term for what you are trying to do is making the data
accessible. So downloading all the data, adding a proper search engine,
and reposting it in a usable form is not violating the copyright; it's
making the data accessible for people who can't handle the patent
office's interface. Or so goes the argument. It hasn't been tested in
court, but the moral defense is: if the patent office is willing to
take on these improved capabilities which people need, then you will
take your website down. It should be as legal as caching the webpages
for quicker access.
Julian T.