[sclug] URL encoding/decoding question

ed ed at ednevitible.co.uk
Sun Feb 19 18:15:56 UTC 2006


On Sun, 19 Feb 2006 16:58:44 +0100
pieter claassen <pieter at claassen.co.uk> wrote:

> I am trying to edit some database stored fields containing HTML with a
> JSP page.
> 
> I store the data URL encoded in UTF-8 in the db and decode it before
> rendering with the exception of URL parameter strings that contain
> HTML that are re-encoded so that they don't break the rendering of
> links in the browser.

The data, when echoed from the database, should not be in a format that
would break links unless you inserted something that would. I don't
quite see the reason for URL encoding.

> My questions:
> 1. I assume it makes no difference whether I store data encoded or not
> in the DB? The reason I went for encoding was in case there were some
> values that would screw the SQL insertion up (like "). Encoding and
> decoding a string should result in exactly the same value?

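executeQuery( page.toString().replaceAll( "'", "\\\\'" ) );

Should do the trick. All you need to do is replace all the ' characters
with \', then SQL should treat each one as a literal quote and just
insert it. (This assumes a database that accepts backslash escapes, as
MySQL does; standard SQL doubles the quote instead, ''.) To the best of
my knowledge that worked fine when I stored a few thousand binaries in
db rows.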
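
A rough sketch of the quote handling, in case it helps. The class name,
the "pages" table and its "body" column are made up for illustration,
and which escape style your database wants is an assumption you would
need to check. The last method shows JDBC parameter binding, which lets
the driver deal with quotes so no manual escaping is needed at all.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper class; names are for illustration only.
class QuoteHandling {

    // Backslash style: ' becomes \' (accepted by MySQL by default).
    // String.replace is literal, so no regex escaping is involved.
    static String backslashEscape(String value) {
        return value.replace("'", "\\'");
    }

    // Standard SQL style: ' becomes ''.
    static String doubleQuotes(String value) {
        return value.replace("'", "''");
    }

    // Parameter binding: the value is never spliced into the SQL text.
    // "pages" and "body" are assumed names, not taken from the thread.
    static void insertPage(Connection conn, String html) throws SQLException {
        String sql = "INSERT INTO pages (body) VALUES (?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, html);
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) {
        System.out.println(backslashEscape("it's"));
        System.out.println(doubleQuotes("it's"));
    }
}
```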

> 2. For some reason when I try to encode the " % " characters (space%
> space), I get an encoded value of "+%25+" in the database but when I
> try to decode this value, I get:

With URL encoding, spaces become + and reserved characters become a hex
reference. Confusingly, hex references are themselves written with a
leading '%', and % is 37 in decimal, 25 in hex, so it is represented as
'%25'. That makes "+%25+" the correct encoding of " % ".
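
A small sketch of that round trip using java.net.URLEncoder and
URLDecoder, assuming UTF-8 on both sides and Java 10 or later for the
Charset overloads. The class and method names are made up.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are for illustration only.
class PercentRoundTrip {

    // Form-style URL encoding: space -> '+', '%' -> "%25".
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Reverse of encode(), using the same charset.
    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String stored = encode(" % ");
        System.out.println("[" + stored + "]");
        System.out.println("[" + decode(stored) + "]");
    }
}
```

Encoding once and decoding once, with the same charset each time,
should give back exactly the string you started with.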

> 3. A big problem is the encoding of &euro; strings which give me
> "%26euro%3B" in the database and is then rendered by the browser
> inside a textarea block as € (the euro sign). The problem is that if
> I encode this symbol then I get "%C3%A2%C2%82%C2%AC" which in return
> encodes to "%C3%83%C2%A2%C3%82%C2%82%C3%82%C2%AC".

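I cannot be certain, but I think what you've done is encode the
character twice. The euro sign is three bytes in UTF-8 (E2 82 AC); if
those bytes are read back as three single-byte characters and encoded
as UTF-8 again, each becomes two bytes, which gives the six bytes in
"%C3%A2%C2%82%C2%AC". The "%26euro%3B" is just the entity &euro; with
its & and ; converted.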
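
As a sketch of how those values could come about: the code below
assumes the UTF-8 bytes are being read back as ISO-8859-1 somewhere
between the textarea and the database. That is my guess, the original
post does not show where it happens, and the class and method names
are made up.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are for illustration only.
class EuroDoubleEncoding {

    // Simulates the suspected fault: take the UTF-8 bytes of a string
    // and wrongly interpret each byte as one ISO-8859-1 character.
    static String misreadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // URL-encode as UTF-8.
    static String urlEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the euro sign
        String once = misreadAsLatin1(euro);
        String twice = misreadAsLatin1(once);

        System.out.println(urlEncode("&euro;")); // the HTML entity
        System.out.println(urlEncode(euro));     // correct single encoding
        System.out.println(urlEncode(once));     // after one misread
        System.out.println(urlEncode(twice));    // after two misreads
    }
}
```

If this is what is going on, the fix is to use the same charset for
every read and write rather than to add another encode or decode step.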

> How do I get text in the text area to be decoded to HTML values and
> then re-encoded before insertion in the DB to the same UTF-8 value?
> Why does this happen?

You don't have to URL-encode data to insert it into a database; the
insert happens on the backend and the data does not get transported
through HTTP.

> The java encoding and decoding calls are:

> The bottom line is that if I decode HTML, view it in a textarea and
> re-encode it, that it is not the same as it was before.

I expect you are losing things because the order in which you decode
does not mirror the order in which you encode.

Try the quote replacement above; it should solve your problems at the
database layer.

There are a few Java applets around on Google if you search for applet
urlencode or something similar.

-- 
Regards, Ed                      :: http://www.usenix.org.uk
:%s/\t/  /g                      :: proud unix system person
:%s/Open Source/Free Software/g


