[sclug] URL encoding/decoding question

Roland Turner SCLUG raz.fpyht.bet.hx at raz.cx
Thu Feb 23 09:58:33 UTC 2006


On Thu, 2006-02-23 at 09:53 +0100, pieter claassen wrote:

> Firstly, I converted the db character set to utf8 by dumping the data
> and recreating the table.

That's a given as potentially every character has to be re-encoded.

> But suddenly all my PreparedStatement queries failed with the following
> info in the mysql.log
> 
> SELECT data FROM content WHERE id=? and lang=?
> 
> It is clear that connector j (both version 3.1.12 and 5.0.0) failed to
> bind/substitute the variables for their place holders.

Yikes! This (clearly) is a show-stopper.

(My reservations about the behaviour of MySQL AB notwithstanding, I find
it difficult to believe that they could have a broken language binding
to something as widely used as Java. You should be able to sort this out
with MySQL AB or MySQL's user community.)

> I noticed in the mysql log that 
> 
>  5 Query       SET NAMES latin1
>  5 Query       SET character_set_results = NULL
> 
> Which I forced through the URL Connection parameters to now state
> 7 Query       SET NAMES utf8
> 7 Query       SET character_set_results = utf8

Hmm. Another thought, although one that could be difficult to follow
through: perhaps the JDBC driver itself has a problem with UTF-8. If you
are forcing encodings at this low-level behind the driver's back, maybe
it's barfing. (If this hypothesis is correct, then you need to get MySQL
out of the character-handling path. See below.)

> however current results on the test code below are:
> 7 Prepare     [1] SELECT data FROM content WHERE lang=?
> 7 Execute     [1] SELECT data FROM content WHERE lang=?
> 
> The workaround of using Statement rather than preparedStatement is not
> really an option since Tomcat uses preparedStatement internally for
> authentication.....

As per earlier this thread, unless you are actually secretly in the
employ of criminal organisations who are paying you to weaken
applications in order to facilitate worm propogation, this tradeoff
never arises. You must never construct queries by concatenating fixed
fragments with stuff from the web.

> Any ideas appreciated.

Apart from not-using-MySQL :-)

More usefully, you'll recall that I asked some questions very early in
the thread about CLOB vs. BLOB and pointed out the need for
compatibility in character-encoding interpretations. Unless you have a
particular need to do something like:

   SELECT *
   FROM wherever
   WHERE webFormFieldContents LIKE '%euro%';

(i.e. a need to qualify queries on the _content_ of the field containing
data from the web form) or to share the data with another language
(which may not share Java's ease of handling for Unicode) then you
probably don't need to involve the database in character handling at
all. This being the case, switch to BLOB or equivalent and let Java take
care of the conversion. Once you remove one entire layer of character
interpretation, your problem will either vanish, or become a while lot
simpler to solve.

- Raz



More information about the Sclug mailing list