[Gllug] unicode and cross site scripting vulnerabilities

Sean Burlington gllug at uncertainty.org.uk
Tue Feb 26 14:47:51 UTC 2002


On Tuesday 26 February 2002 1:59 pm, Simon Stewart wrote:
> On Tue, Feb 26, 2002 at 01:07:21PM +0000, Sean Burlington wrote:
> > Hi All,
> >    I need to make some dynamic web sites more suitable for
> > internationalisation ...
> >
> > but I also want to make sure that they are safe from cross site scripting
> > vulnerabilities ...
> >
> > one way I sometimes make data safe is to replace or delete all chars
> > except say a-zA-Z0-9
> >
> > this means that I can be really sure that no awkward chars like quotes or
> > <> will hang around to break things.
> >
> > As I understand it unicode complicates this situation in two ways...
> >
> > 1) chars like 'the chinese character for water' should be allowed
> > 2) there are several different ways of specifying (say) the quote char
> >
> > So. How do I get around this ?
> >
> > Do I have to find out all the ways of representing any unsafe chars, and
> > replace/encode these?
>
> If it helps, perl 5.6 supports unicode (or more precisely, utf8)
> although the support isn't terribly complete. Stick a "use utf8" at
> the head of your program, and things will start to work more according
> to plan. Having said this, \w etc. should all work as expected even if
> fed wide characters without any changes needing to be made[1]
>
> I would expect Ruby to have some pretty impressive unicode
> capabilities too, given its origins. I've not played with the language
> much at all, but I know that you can start the interpreter expecting a
> unicode source file (with "ruby -Ku", I believe)
>
> Finally, rooting through the JDK 1.4's new regex classes, you can get
> support for things like \p and \P similar to that provided by perl,
> and Java's strings are stored as Unicode in any case.
>

those java regexps look nice - and java looks like the language of choice for 
getting to grips with i18n anyway ...
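
To make the whitelist idea concrete, here is a minimal sketch of a Unicode-aware filter. It is in Python 3 rather than any of the languages mentioned above, purely for brevity, and the names NOT_WORD_OR_SPACE and strip_unsafe are made up for illustration. It assumes the input has already been decoded to a Unicode string, where \w covers letters and digits from any script.

```python
import re

# For str patterns in Python 3, \w is Unicode-aware: it matches letters,
# digits and underscore in any script, not just a-zA-Z0-9.
NOT_WORD_OR_SPACE = re.compile(r"[^\w\s]")

def strip_unsafe(text):
    """Delete every character that is not a word character or whitespace."""
    return NOT_WORD_OR_SPACE.sub("", text)

if __name__ == "__main__":
    print(strip_unsafe('水 <script>"x"</script>'))
```

This keeps the character for water while still dropping quotes and angle brackets, but it also throws away ordinary punctuation, so it only suits fields such as names or identifiers.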

what I am still unsure about is how to make sure that generated html doesn't
inadvertently contain markup from user input

only php seems to have a function like htmlspecialchars - and php doesn't
seem to cope well with unicode/utf-8 !

although - if you are generating a utf-8 web page - I guess it is just
< > & "
that you have to worry about - is anything else (8 bit) going to break the
markup ??
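
For illustration, escaping just those four characters could look like the following minimal sketch. It is in Python 3, the function name escape_html is hypothetical, and it assumes the text has already been decoded to a Unicode string and that attribute values are always wrapped in double quotes.

```python
def escape_html(text):
    """Escape the four characters that can introduce markup in HTML output."""
    # Replace & first so the entities added below are not escaped again.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;"))

if __name__ == "__main__":
    print(escape_html('<b>"水" & co</b>'))
```

Non-ASCII characters pass through untouched. Python's standard library has html.escape, which does the same job and with its default quote=True also escapes the single quote.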

and is a > still a > if it's Japanese !? :-)
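
One way to check this for utf-8 specifically is the sketch below (Python 3; the function name markup_byte_positions is made up for illustration). In utf-8 every byte of a multi-byte character is 0x80 or above, so the ASCII bytes for < > & " can only ever come from those characters themselves. The fullwidth form ＞ (U+FF1E) is a separate character that is not markup, although NFKC normalisation folds it to a plain >, so any escaping should happen after normalisation if normalisation is used at all.

```python
import unicodedata

def markup_byte_positions(text):
    """Return the byte offsets in the utf-8 encoding that hold < > & or "."""
    data = text.encode("utf-8")
    return [i for i, b in enumerate(data) if b in b'<>&"']

if __name__ == "__main__":
    print(markup_byte_positions("水 > 火"))   # plain ASCII > between kanji
    print(markup_byte_positions("水＞火"))    # fullwidth greater-than sign
    print(unicodedata.normalize("NFKC", "＞") == ">")
```

This says nothing about other encodings, which would need checking separately.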

-- 

Sean

-- 
Gllug mailing list  -  Gllug at linux.co.uk
http://list.ftech.net/mailman/listinfo/gllug