Jason Culverhouse jason
Wed Aug 2 14:03:02 PDT 2006
FYI,
I had to do something like this when upgrading to Postgres 8.1, the  
database was only about ~25G.

Assuming that you UTF-8 errors are sparse in the database this should  
be a little easier.

1) Dump the database:
   pg_dump musubi > text.dump

2) Split the text dump into lines of 100,000.  This generates files  
with names 'line.00000'.   Make sure your suffix length (-a) is big  
enough to handle the number splits.  Make sure you have enough memory  
to process each split file, iconv will read the entire file into  
memory in the next step.

split -a 5 -d -l 100000 text.dump line.

3) Run this bash script in the directory with the split output
---
#!/bin/bash -x
for i in $(ls line.*); do
   iconv -c -f UTF-8 -t UTF-8 -o ${i}.fixed $i
   diff -q ${i}.fixed $i
   if [ "$?" = "0" ]; then
     rm ${i}.fixed $i
   fi
done
---

After this is done you will be left with files 'line.xxx' and  
'line.xxx.diff'  files that contain the rows that need to be fixed.
Execute:
   diff line.xxx line.xxx.diff

You will see the rows that need to be updated.

Most of this was put together from various postings on postgres  
mailings lists.

Jason

On Aug 2, 2006, at 1:00 PM, Marcin Mank wrote:

>>
>> There surely should be some better way, such as finding which  
>> specific
>> tuples are problematic, and updating them on the source database.
>>
>> A thought...  You might do two dumps:
>>
>> 1.  Raw, no conversion using iconv
>>
>> 2.  Another, which converts using iconv [generate this by passing  
>> file #1
> thru iconv...]
>>
>
> I am just struggling with this issue, my solution:
>
> CREATE OR REPLACE FUNCTION utf8_encode(text)
>   RETURNS text AS
> $BODY$
> ($s)=@_;
>  utf8::encode($s);
> return $s;
> $BODY$
>   LANGUAGE 'plperlu' IMMUTABLE;
>
>
>
> and now :
>
> foreach suspect table {
>     update table set field=utf8_encode(field) where field<>
> utf8_encode(field)
> }
>
> kinda slow, but might be good enough.
>
> Greetings
> Marcin
>
> _______________________________________________
> Slony1-general mailing list
> Slony1-general at gborg.postgresql.org
> http://gborg.postgresql.org/mailman/listinfo/slony1-general




More information about the Slony1-general mailing list