Sun May 28 15:16:28 PDT 2006
- Previous message: [Slony1-general] strategy to fix utf8 encoding errors
- Next message: [Slony1-general] strategy to fix utf8 encoding errors
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 5/27/06, Vivek Khera <vivek at khera.org> wrote:
> I have a database (rt3) in a postgres 8.0 server which has UNICODE
> encoding. It was replicated to another 8.0 DB just fine for a long
> time. Today I upgraded the replica to 8.1 and when I went to
> replicate it, I got UTF8 encoding failure from one of the tables:
> 'invalid byte sequence for encoding "UTF8": 0xa9'
>
> Aside from playing whack-a-mole and fixing the errors one at a time
> as they are reported by slon, what can I do to make the data UTF8
> safe for the strict checking of Pg 8.1?
>
> And what does one do to figure out what character to replace or do
> you generally just cut the offending character from the row?
>

When we migrated from 7.4 to 8.1, we had problems with bad characters.
There was a small set of bad characters, usually characters which
hadn't been translated to UTF-8 but were still in the original latin-1
or windows-1252 encoding.

Luckily, UTF-8 strings are pretty distinctive. It is pretty easy to
write a regex which only matches valid UTF-8 strings. You could run
that against a dump, against every column in every table, or against
particular problem columns.

If you have a good idea of what the original character set was and
what characters you can expect, then you can translate them to
Unicode.

- Ian
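As an illustration of the regex approach described above, here is a minimal Python sketch. It is not the code used in the migration mentioned in the message. The names `VALID_UTF8`, `is_valid_utf8`, `repair`, and the error-handler name `assume_cp1252` are made up for this example, and the choice of windows-1252 as the original encoding is an assumption you would have to confirm for your own data. Under that assumption, the byte 0xa9 from the error message is the copyright sign.

```python
import codecs
import re

# Matches only byte strings that are well-formed UTF-8: no stray
# continuation bytes, no overlong forms, no surrogates, nothing
# above U+10FFFF.
VALID_UTF8 = re.compile(
    rb"""\A(?:
        [\x00-\x7F]                        # ASCII
      | [\xC2-\xDF][\x80-\xBF]             # 2-byte sequences
      | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, excluding overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3-byte, general case
      | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, excluding surrogates
      | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, planes 1-3
      | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte, planes 4-15
      | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, plane 16
    )*\Z""",
    re.VERBOSE,
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is entirely well-formed UTF-8."""
    return VALID_UTF8.match(data) is not None


def _from_legacy(err):
    # Called by the UTF-8 decoder for each invalid byte run.
    # ASSUMPTION: the bad bytes are leftover windows-1252 text.
    # Bytes that windows-1252 leaves undefined become U+FFFD.
    bad = err.object[err.start:err.end]
    return bad.decode("cp1252", errors="replace"), err.end


codecs.register_error("assume_cp1252", _from_legacy)


def repair(data: bytes) -> bytes:
    """Return data as valid UTF-8, translating only the invalid bytes."""
    if is_valid_utf8(data):
        return data
    text = data.decode("utf-8", errors="assume_cp1252")
    return text.encode("utf-8")


if __name__ == "__main__":
    for sample in (b"caf\xc3\xa9", b"\xa9 2006"):
        print(sample, is_valid_utf8(sample), repair(sample))
```

To locate problem rows, one option is to read a plain-text dump in binary mode, call `is_valid_utf8` on each line, and print the line numbers that fail. Inspecting those lines before running `repair` is a reasonable way to check whether the windows-1252 assumption actually holds.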