[Gllug] Acceptable HTML
Simon Stewart
sms at lateral.net
Fri Jun 14 09:21:00 UTC 2002
On Fri, Jun 14, 2002 at 08:37:03AM +0100, tet at accucard.com wrote:
>
> >Without downloading and installing a massive office suite, if some
> >clueless soul has decided to "save as HTML" from Word 2000, what's the
> >best method to get it into a state that HTML Tidy is happy to look at?
> >
> >This particular document is scattered with tables and diagrams, and so
> >far neither AbiWord nor KWord has been even vaguely pleased to see
> >it....
>
> Mozilla (and Mozilla's composer) will happily load it. Of course that
> doesn't help you convert it to a usable format...
Mozilla does indeed load it and I've been using it as my editor, but
MS have done something very weird to their tables which I'm not too
impressed with. Grief! How hard can it be to generate acceptable HTML
from a word processor? Given that this is HTML and not PDF, I don't
care whether _all_ the formatting is in there: the words and the
annotations would be enough.
> >ObLinux: I only ask here because I want to be able to edit it properly
> >in emacs/vi on my Linux desktop machine. :)
>
> You can get it to a usable state (at least one that can be further fixed
> with HTML tidy) by using:
>
> sed 's,<o:p></o:p>,,g' file.mshtml > file.html
Will try this. Thanks, Tet!
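For the archive, here is a rough sketch of how that one-liner could be wrapped up as a filter. The function name strip_office_tags is made up for illustration, and the second expression assumes the export also contains the "<o:p>&nbsp;</o:p>" variant; I have not checked every pattern Word emits, so treat this as a starting point rather than a complete cleaner.

```shell
# Hypothetical helper: strip Word's empty Office-namespace paragraph
# markers from HTML on stdin and write the result to stdout.
strip_office_tags() {
    # The trailing "g" removes every occurrence on a line, not just the first.
    # The second expression is an assumption about what the export contains.
    sed -e 's,<o:p></o:p>,,g' \
        -e 's,<o:p>&nbsp;</o:p>,,g'
}

# Example use (file names as in the thread):
#   strip_office_tags < file.mshtml > file.html
```

The output should then be in a fit state to hand to HTML Tidy for the rest of the cleanup.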
Cheers,
Simon
--
... though the Japanese must be the most stupid people... I'm sure I
read somewhere that Tokyo has the densest population in the world...
- Gid Holyoake, sdm.
--
Gllug mailing list - Gllug at linux.co.uk
http://list.ftech.net/mailman/listinfo/gllug