[Gllug] pdf conversion to HTML

Richard Cohen richard at vmlinuz.org
Thu Jan 31 13:20:01 UTC 2002


On Thu, 31 Jan 2002, will wrote:

> I am sure there is one but I can't remember the name.  Is there a
> command line tool for converting pdf files to HTML?

Um...

pdf2html? :-)

It's based on xpdf, but the version of pdf2html out there is built against
an old version of xpdf.  I hacked it into the source for an up-to-date xpdf
because I wanted a) better PDF parsing and b) to turn off recognition of the
"don't copy" bit in the headers.[*]

I can't actually find a homepage for the pdf2html I've got, and my home
machine is turned off, so I can't look at it right now.  There does appear
to be *another* pdf2html, which simply dumps the PDF to images (one image
per page) and generates HTML that displays each page.  That's *not* the
one I've got at home, which actually converts PDF text to HTML text, and
does it pretty well.

> Will.

Cheers
Richard

[*] I don't think there's anything wrong with buying a PDF and 'ripping' it
to HTML so I can view it on my palmtop.  For personal use only...


-- 
Gllug mailing list  -  Gllug at linux.co.uk
http://list.ftech.net/mailman/listinfo/gllug