[Sussex] Converting OpenOffice documents to XHTML

Wed Nov 10 16:53:33 UTC 2004

Geoff

On Wed, Nov 10, 2004 at 04:34:50PM +0000, Geoff Teale wrote:
> Steve,
> 
> More detail please .. I'm now very experienced in coding in
> OpenOffice.org but I'm not quite clear about what you are trying to do
> here.

They currently have a paper-based process backed up by a manual.  They are
expanding the manual with lots of new material, and we have hit on the
idea of converting it into a "web application".

The manual is going to be converted into a set of XHTML pages. 
Rather than have them cut and paste the text into the database in a slow,
mandraulic process, I would much prefer to extract the structure
of the document (now in OpenOffice) and load that data into the database.
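Extracting the structure should be feasible because an OpenOffice.org Writer file (.sxw, or .odt in the newer OpenDocument format) is a ZIP archive whose text lives in content.xml, with headings stored as text:h elements carrying an outline level and ordinary paragraphs as text:p. Below is a minimal Python sketch of that step. The function name extract_outline is hypothetical, and it assumes the manual uses OpenOffice's heading styles so each heading actually has an outline level (headings typed as plain paragraphs would come out as level 0).

```python
import zipfile
import xml.etree.ElementTree as ET

# Text namespaces and the attribute each uses for a heading's outline level.
# OpenOffice.org 1.x (.sxw) uses text:level; OpenDocument (.odt) uses text:outline-level.
TEXT_NAMESPACES = {
    "http://openoffice.org/2000/text": "level",
    "urn:oasis:names:tc:opendocument:xmlns:text:1.0": "outline-level",
}


def extract_outline(doc):
    """Return (level, text) pairs in document order (hypothetical helper).

    doc is a path or binary file object for an .sxw/.odt file.
    Headings (text:h) get their outline level; ordinary paragraphs
    (text:p) get level 0. Empty paragraphs are skipped.
    """
    with zipfile.ZipFile(doc) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    items = []
    for elem in root.iter():
        # Namespaced element tags look like "{uri}local"; skip anything else.
        if not isinstance(elem.tag, str) or not elem.tag.startswith("{"):
            continue
        ns, local = elem.tag[1:].split("}", 1)
        if ns not in TEXT_NAMESPACES or local not in ("h", "p"):
            continue
        text = " ".join("".join(elem.itertext()).split())  # collapse whitespace
        if not text:
            continue
        if local == "h":
            level = int(elem.get("{%s}%s" % (ns, TEXT_NAMESPACES[ns]), "1"))
            items.append((level, text))
        else:
            items.append((0, text))
    return items
```

This walks every text:p in the file, including those inside tables and lists, so the result is a flat sequence that still needs nesting into sections.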

For example, where the document reads:

  1.1   Health and Safety

  1.1.2.  First Aid Kits

        The first aid kits need to be checked weekly to ensure each is
        stocked with the appropriate stuff.

I would like to turn that into something like:

  <section>
     <title>Health and Safety</title>
     <subsection>
        <title>First Aid Kits</title>
        <text>
            The first aid kits need to be checked weekly to ensure each is
            stocked with the appropriate stuff.
        </text>
     </subsection>
  </section>
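Given a flat sequence of (level, text) pairs, nesting them into the section/subsection layout above is essentially a stack walk: each new heading closes any open sections at the same or a deeper level before opening its own. A sketch using Python's standard ElementTree module follows; outline_to_xml, the manual root element and the subsubsection name for level 3 are hypothetical choices, not anything OpenOffice produces itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for outline levels 1, 2, 3; deeper levels reuse the last.
SECTION_TAGS = ["section", "subsection", "subsubsection"]


def outline_to_xml(items, root_tag="manual"):
    """Nest (level, text) pairs into section/subsection elements.

    Level 0 means a body paragraph, which becomes a <text> child of the
    most recent heading (or of the root if no heading has appeared yet).
    """
    root = ET.Element(root_tag)
    stack = [(0, root)]  # (outline level, element), outermost first
    for level, text in items:
        if level == 0:
            ET.SubElement(stack[-1][1], "text").text = text
            continue
        # Close any open sections at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        tag = SECTION_TAGS[min(level, len(SECTION_TAGS)) - 1]
        section = ET.SubElement(stack[-1][1], tag)
        ET.SubElement(section, "title").text = text
        stack.append((level, section))
    return root


items = [(1, "Health and Safety"), (2, "First Aid Kits"),
         (0, "The first aid kits need to be checked weekly.")]
print(ET.tostring(outline_to_xml(items), encoding="unicode"))
```

Levels are taken as given, so if the manual's top headings are numbered "1.1" at outline level 2, they would come out as subsection elements unless the levels are shifted first.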

That way I could then parse this new XML with PHP, extract the various
bits and insert them into the appropriate rows and columns in the database.
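The loading step might look roughly like the following. It is written in Python with an in-memory SQLite database purely as a stand-in for PHP and the yet-to-be-designed database; the sections table, with a parent_id column linking each subsection to its section, is a hypothetical schema, and load_sections is a hypothetical name.

```python
import sqlite3
import xml.etree.ElementTree as ET

SECTION_ELEMENTS = {"section", "subsection", "subsubsection"}  # hypothetical names


def load_sections(xml_text, conn):
    """Insert every section element as a row, linked to its parent section."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sections (
               id INTEGER PRIMARY KEY,
               parent_id INTEGER REFERENCES sections(id),
               title TEXT NOT NULL,
               body TEXT
           )"""
    )

    def walk(elements, parent_id):
        for elem in elements:
            if elem.tag not in SECTION_ELEMENTS:
                continue
            title = " ".join((elem.findtext("title") or "").split())
            # Join the section's own <text> children, collapsing the XML indentation.
            body = "\n\n".join(
                " ".join("".join(t.itertext()).split()) for t in elem.findall("text")
            ) or None
            cur = conn.execute(
                "INSERT INTO sections (parent_id, title, body) VALUES (?, ?, ?)",
                (parent_id, title, body),
            )
            walk(elem, cur.lastrowid)  # children point back at this row

    root = ET.fromstring(xml_text)
    # Accept either a bare <section> root (as in the example) or a wrapper element.
    walk([root] if root.tag in SECTION_ELEMENTS else root, None)
    conn.commit()
```

Storing the hierarchy as parent/child rows is one option that keeps the depth of the manual open-ended, which may suit a schema that hasn't been designed yet.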

I haven't yet designed the database, so I am flexible about how this is
best done.  I'm looking here for the best way of doing this.  I've seen
stuff on XML style conversion, but is this the way to go?

Steve