[Gllug] belated comment on XML

Russell Howe rhowe at wiss.co.uk
Wed Jul 7 01:05:36 UTC 2004


On Tue, Jul 06, 2004 at 06:58:03PM +0100, t.clarke wrote:
> Having been forced to extract data from an XML shipping-manifest and thus
> figure out some of XML's complexities, I can't help feeling that the language
> suffers from being designed by committee.  What, on the surface, seems an
> elegant concept has so many extra complex rules thrown in that it is
> on a par with EDI to get to grips with.  My first reaction was to regard it
> as computerised verbal diarrhoea: the actual content of a message is often
> smaller than the tags that surround it.

Possibly a sign that the XML language you're using is badly designed, or
that the data you're dealing with is minimal in itself but carries an
awful lot of metadata.

> As for being able to
> simply 'write' XML - what about the special characters that have to be replaced
> by 'entities' ?

Entities are just a form of escape sequence. A good form in that
they're well-defined; a bad form in that they have opening and closing
markers (& and ;), which some might say are unnecessary and too painful
to type.
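
For illustration, a minimal Java sketch of that escaping, assuming only
the five entities XML predefines (&amp; &lt; &gt; &quot; &apos;) are
wanted; the class name XmlEscape is made up for this example. Doing it
in a single pass means an & that has just been emitted is never escaped
a second time.

```java
// Hypothetical helper: escape the five characters XML predefines entities for.
final class XmlEscape {

    private XmlEscape() {}

    /** Returns text with &, <, >, " and ' replaced by predefined entity references. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);        // everything else passes through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Tom & Jerry <b>\"hi\"</b>"));
    }
}
```

Running main prints: Tom &amp; Jerry &lt;b&gt;&quot;hi&quot;&lt;/b&gt;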

Things which worry me about XML are mainly:

1) External entities - behaviour is largely undefined, but could lead to
arbitrary code inclusion and therefore 'execution' (for 'code' read XML,
and for 'execution' read parsing).
2) Character entity substitution - I bet most people don't check when
they should be replacing a character with a character entity. This can
lead to non-compliant XML, and lets unchecked input inject arbitrary
XML into the document, much like the SQL injection and XSS attacks you
see affecting websites.
3) Processing instructions - what a PI does is left entirely to the
application, and if a 3rd party is allowed to submit raw XML with
embedded PIs, the XML processor could be conned into doing all sorts of
things, depending on the PIs it supports.
4) Overriding DTD definitions in the internal subset - as long as access
to the internal subset is controlled, the developer is the only one
who can shoot himself in the foot.
5) Idiots, but then again, this is computing and idiots are always a
concern.

Probably others, but it's 2am.
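
Points 1 and 4 can at least be closed off on the parsing side. A minimal
sketch with the JDK's own JAXP parser, assuming untrusted documents never
legitimately need a DOCTYPE. The feature URI below is a Xerces one that
the JDK's bundled parser honours; other JAXP implementations may reject
it with a ParserConfigurationException. SafeParse is a hypothetical name.
It does nothing for point 3: processing instructions still arrive as
ordinary DOM nodes, and it is up to the application not to act on them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: parse untrusted XML with DOCTYPEs switched off entirely.
final class SafeParse {

    private SafeParse() {}

    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // Xerces feature, honoured by the JDK's bundled parser: any DOCTYPE is a
        // fatal error, so no external entities and no internal-subset overrides.
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Standard JAXP feature asking the implementation to apply its secure limits.
        f.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        f.setXIncludeAware(false);
        return f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String benign = "<manifest><item>crate</item></manifest>";
        String hostile = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE m [<!ENTITY x SYSTEM \"file:///etc/passwd\">]>"
                + "<m>&x;</m>";

        System.out.println(parse(benign).getDocumentElement().getTextContent());
        try {
            parse(hostile);
            System.out.println("hostile document accepted");
        } catch (SAXException e) {
            System.out.println("hostile document rejected: " + e.getMessage());
        }
    }
}
```

The benign manifest parses and prints crate; the hostile one, which tries
to pull /etc/passwd in through an external entity, is refused as soon as
the parser meets its DOCTYPE, before any entity is resolved.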

I still want to write an efficient character entity-iser in Java, which
will take a String and return an XML fragment conforming to a specified
DTD, using whatever character entities that DTD provides.

It should be easy to do, especially if I can find a nice DTD parser; I
just have no time :) It would make XSS attacks much easier to prevent in
webapps.
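
A rough sketch of what such an entity-iser might look like. There is no
DTD parser here: the table mapping code points to entity names is an
assumption, standing in for whatever a DTD parser would extract, and
Entityiser and toFragment are hypothetical names. Markup characters get
the predefined entities, characters in the table get their named entity,
other non-ASCII characters fall back to numeric character references, and
characters XML 1.0 forbids outright are rejected.

```java
import java.util.Map;

// Hypothetical entity-iser: text in, XML text fragment out.
// The entity table is an assumed stand-in for names read from a DTD.
final class Entityiser {

    private final Map<Integer, String> entities; // code point -> entity name

    Entityiser(Map<Integer, String> entities) {
        this.entities = Map.copyOf(entities);
    }

    String toFragment(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);   // handles surrogate pairs
            i += Character.charCount(cp);
            out.append(encode(cp));
        }
        return out.toString();
    }

    private String encode(int cp) {
        if (!isXmlChar(cp)) {
            throw new IllegalArgumentException(
                    String.format("U+%04X is not allowed in XML 1.0", cp));
        }
        switch (cp) {
            case '&':  return "&amp;";
            case '<':  return "&lt;";
            case '>':  return "&gt;";
            case '"':  return "&quot;";
            case '\'': return "&apos;";
            default:   break;
        }
        String name = entities.get(cp);
        if (name != null) {
            return "&" + name + ";";              // entity the (assumed) DTD declares
        }
        if (cp < 0x80) {
            return String.valueOf((char) cp);     // plain ASCII passes through
        }
        return String.format("&#x%X;", cp);       // numeric character reference
    }

    // The Char production from XML 1.0.
    private static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        Entityiser e = new Entityiser(Map.of(0xE9, "eacute"));
        System.out.println(e.toFragment("caf\u00E9 <script>\u20AC"));
    }
}
```

With eacute mapped to U+00E9, main prints caf&eacute; &lt;script&gt;&#x20AC;
The named reference only resolves in a document whose DTD declares
eacute, which is why the table has to come from the target DTD rather
than a hard-coded list.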

-- 
Russell Howe       | Why be just another cog in the machine,
rhowe at siksai.co.uk | when you can be the spanner in the works?
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug