[Wylug-discuss] advice on ebook reader or similar...

Dave Fisher wylug-discuss at davefisher.co.uk
Sat Sep 24 21:27:09 UTC 2011


On 24 September 2011 20:51, Jim Jackson <jj at franjam.org.uk> wrote:
> generated a lot of discussion. I've googled around a bit more, and to be
> honest, everyone seems to knock ereader's abilities with PDFs.

The problem isn't the e-readers, it's the PDF format and its lack of
really quite basic metadata.

PDF is, in one respect, a kind of anti-XML ... all the markup is
dedicated to ensuring that the print/screen version looks right,
virtually none of it to ensuring that content is semantically
meaningful or translatable.

> re. kindle...
> On Thu, 22 Sep 2011, Dave Fisher wrote:
>> 2. It doesn't re-flow PDF text ... not a problem if you have the A4
>> Kindle, and mostly surmountable by (imperfect) conversion on the
>> smaller device.
>
> conversion to what format? Have you converted many PDFs? What's the
> gotchas? From my browsing, it appears to be graphics in PDFs that cause the
> main problems.

I have converted lots of PDFs to epub using Calibre.

An epub is, basically, a zipped-up collection of HTML files, with
separate files for front-matter, ToCs, volumes, indexes, etc.
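
Since an epub is a zip archive, you can peek inside one with nothing
more than Python's standard zipfile module. A minimal sketch, assuming
the content files use .html, .xhtml or .htm extensions (the function
name and the example path are made up for illustration):

```python
import zipfile

def list_epub_documents(epub_path):
    """Return the names of the (X)HTML files inside an epub archive."""
    # An epub is an ordinary zip file, so zipfile can open it directly.
    with zipfile.ZipFile(epub_path) as archive:
        return [
            name
            for name in archive.namelist()
            # Assumption: content documents carry one of these extensions.
            if name.lower().endswith((".html", ".xhtml", ".htm"))
        ]
```

Calling list_epub_documents("book.epub") on a converted file should
show roughly one entry per section of the book.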

Generally speaking, text flow, hypertext, and embedded objects like
images and tables are handled fairly well.

Obviously you lose page-layout.

Irritatingly, you don't lose the page headers and footers, but you can
set up regexp filters to take them out.

I haven't had time (or rather haven't bothered) to automate that yet
... vim does the job.
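
If you did want to automate it, something along these lines might be a
starting point. It is only a sketch in plain Python, not Calibre's own
filter mechanism, and the patterns (a running title, "Page N of M", and
bare page numbers) are hypothetical; you would need to adapt them to
whatever your PDFs actually repeat on each page:

```python
import re

# Hypothetical patterns for lines that are nothing but header/footer text.
HEADER_FOOTER = re.compile(r"(?:Some Book Title|Page \d+ of \d+|\d+)")

def strip_headers_footers(text, pattern=HEADER_FOOTER):
    """Drop every line that consists solely of a header/footer match."""
    kept = [
        line
        for line in text.splitlines()
        # fullmatch means the whole (stripped) line must match, so body
        # text that merely contains a number is left alone.
        if not pattern.fullmatch(line.strip())
    ]
    return "\n".join(kept)
```

Matching whole lines rather than substrings is the cautious choice: it
misses headers that got merged into a paragraph, but it won't eat
numbers in the body text.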

Dave


