[Gllug] Acceptable HTML

tet at accucard.com
Fri Jun 14 07:37:03 UTC 2002

>Without downloading and installing a massive office suite, if some
>clueless soul has decided to "save as HTML" from Word 2000, what's the
>best method to get it into a state that HTML Tidy is happy to look at?
>This particular document is scattered with tables and diagrams, and so
>far neither Abiword nor KWord has been even vaguely pleased to see

Mozilla (and Mozilla's composer) will happily load it. Of course that
doesn't help you convert it to a usable format...

>ObLinux: I only ask here because I want to be able to edit it properly
>in emacs/vi on my Linux desktop machine. :)

You can get it to a usable state (at least one that HTML Tidy can then
clean up further) by using:

	sed 's,<o:p></o:p>,,g' file.mshtml > file.html
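
If you end up doing this to more than one file, the same substitution can
be wrapped in a small shell function. This is only a rough sketch: the name
strip_word_markup is made up for illustration, and the second expression
(for the <o:p>&nbsp;</o:p> variant) is an assumption about what Word emits
for empty paragraphs, not something tested against the document in question.

```shell
# Hypothetical helper; the name is illustrative and not part of any tool.
# Reads Word-exported HTML on stdin and writes the stripped HTML to stdout.
strip_word_markup() {
    # First expression: delete every empty <o:p></o:p> pair on each line.
    # The trailing "g" matters; without it sed removes only the first
    # match per line.
    # Second expression (assumption): also delete pairs that contain
    # nothing but a non-breaking space entity.
    sed -e 's,<o:p></o:p>,,g' \
        -e 's,<o:p>&nbsp;</o:p>,,g'
}
```

Used as a filter:

	strip_word_markup < file.mshtml > file.html

The result should then be something HTML Tidy will at least agree to parse.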


Gllug mailing list  -  Gllug at linux.co.uk
