[Slony1-general] Soliciting improvements to UTF8 test

Tue Jul 11 09:06:12 PDT 2006

We've got a test that is intended to seek out problems in handling of
UTF8 / multibyte values.  Some may recall that there was a condition in
1.1.0 where it was possible to cut off the final byte of such a
string...  The "testutf8" test specifically goes after that condition,
using the queries listed below.

    echo "INSERT INTO utf8table (string) values ('${txtb} - \303\241');" >> $GENDATA
    echo "INSERT INTO utf8table (string) values ('${txtb} -- \303\241');" >> $GENDATA
    echo "INSERT INTO utf8table (string) values ('\303\241 - ${txtb}');" >> $GENDATA
    echo "INSERT INTO utf8table (string) values ('\303\241 -- ${txtb}');" >> $GENDATA
    echo "INSERT INTO utf8table (string) values ('t3 -- \303\241 - ${txtb}');" >> $GENDATA
    echo "INSERT INTO utf8table (string) values ('t3 - \303\241 -- ${txtb}');" >> $GENDATA

While that probably is quite satisfactory for making sure that we do not
regress back into that particular failure, I expect that we might get
something out of adding some additional UTF8 characters and doing more
with them.
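
As a rough sketch of what "more" might look like, the lines below
repeat the leading/trailing patterns with one 2-byte, one 3-byte and
one 4-byte sequence.  The GENDATA and txtb defaults, the mktemp
fallback, and the choice of characters (U+00E1, U+20AC, U+1D11E) are my
assumptions, not something the current test does, and I have not
checked whether every backend version accepts the 4-byte case.  Note
that this writes raw bytes via printf rather than backslash escapes.

```shell
#!/bin/sh
# Hypothetical extension of the generator: the same "character first" and
# "character last" patterns, using UTF-8 sequences of 2, 3 and 4 bytes.

# Assumed defaults so the sketch runs standalone; the real test sets these.
GENDATA=${GENDATA:-$(mktemp)}
txtb=${txtb:-sample}

# Octal escapes for U+00E1 (2 bytes), U+20AC (3 bytes), U+1D11E (4 bytes).
for seq in '\303\241' '\342\202\254' '\360\235\204\236'; do
    ch=$(printf "$seq")    # expand the escapes into raw bytes
    # Multibyte character at the end of the string (the 1.1.0 failure case).
    printf "INSERT INTO utf8table (string) values ('%s - %s');\n" "$txtb" "$ch" >> "$GENDATA"
    # Multibyte character at the start of the string.
    printf "INSERT INTO utf8table (string) values ('%s - %s');\n" "$ch" "$txtb" >> "$GENDATA"
done
```

Putting the multibyte character last in the string is the interesting
case, since the 1.1.0 problem involved losing the final byte.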

I daresay I have no personal call to use UTF8 at this time, and a search
through the PG docs isn't turning up much on how to comprehensively
generate full sets of UTF8 characters.

I'd like, if I could, to generate sample data that covers every UTF8
character that exists, along with some patterns built from those
characters.  Does anyone have a link to a web page or relevant resource
("read ISO standard Foo" would not be welcome ;-)) that might help
educate me a little on this?
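
As a first stab while waiting on pointers: the byte layout of UTF-8 is
mechanical, so a shell function can turn a code point into the same
kind of octal escapes the test already uses.  This is only a sketch.
The function name utf8_octal and the sample range are my own invention,
and a full run would need to go up to U+10FFFF while skipping the
surrogate range U+D800..U+DFFF, which is not valid in UTF-8.

```shell
#!/bin/sh
# utf8_octal CODEPOINT
# Print the UTF-8 encoding of a decimal code point as backslash-octal
# escapes, e.g. 225 (U+00E1) gives \303\241.
# Hypothetical helper; not part of the existing test scripts.
utf8_octal() {
    cp=$1
    if [ "$cp" -lt 128 ]; then
        # 1 byte: 0xxxxxxx
        printf '\\%03o' "$cp"
    elif [ "$cp" -lt 2048 ]; then
        # 2 bytes: 110xxxxx 10xxxxxx
        printf '\\%03o\\%03o' \
            $(( 192 + cp / 64 )) $(( 128 + cp % 64 ))
    elif [ "$cp" -lt 65536 ]; then
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        printf '\\%03o\\%03o\\%03o' \
            $(( 224 + cp / 4096 )) $(( 128 + cp / 64 % 64 )) \
            $(( 128 + cp % 64 ))
    else
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        printf '\\%03o\\%03o\\%03o\\%03o' \
            $(( 240 + cp / 262144 )) $(( 128 + cp / 4096 % 64 )) \
            $(( 128 + cp / 64 % 64 )) $(( 128 + cp % 64 ))
    fi
}

# Sample run over U+00A1..U+00FF, one INSERT per code point, with the
# multibyte character placed last in the string.
txtb=${txtb:-sample}    # assumed default for a standalone run
cp_i=161
while [ "$cp_i" -le 255 ]; do
    printf "INSERT INTO utf8table (string) values ('%s - %s');\n" \
        "$txtb" "$(utf8_octal "$cp_i")"
    cp_i=$(( cp_i + 1 ))
done
```

The output keeps the escapes as text, in the same form as the existing
test lines.  If raw bytes are wanted instead, the result can be passed
to printf as a format string, e.g. printf "$(utf8_octal 225)".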