Wed Aug 2 14:03:02 PDT 2006
- Previous message: [Slony1-general] Very slow initial replica population
- Next message: [Slony1-general] Very slow initial replica population
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
FYI, I had to do something like this when upgrading to Postgres 8.1, the database was only about ~25G. Assuming that you UTF-8 errors are sparse in the database this should be a little easier. 1) Dump the database: pg_dump musubi > text.dump 2) Split the text dump into lines of 100,000. This generates files with names 'line.00000'. Make sure your suffix length (-a) is big enough to handle the number splits. Make sure you have enough memory to process each split file, iconv will read the entire file into memory in the next step. split -a 5 -d -l 100000 text.dump line. 3) Run this bash script in the directory with the split output --- #!/bin/bash -x for i in $(ls line.*); do iconv -c -f UTF-8 -t UTF-8 -o ${i}.fixed $i diff -q ${i}.fixed $i if [ "$?" = "0" ]; then rm ${i}.fixed $i fi done --- After this is done you will be left with files 'line.xxx' and 'line.xxx.diff' files that contain the rows that need to be fixed. Execute: diff line.xxx line.xxx.diff You will see the rows that need to be updated. Most of this was put together from various postings on postgres mailings lists. Jason On Aug 2, 2006, at 1:00 PM, Marcin Mank wrote: >> >> There surely should be some better way, such as finding which >> specific >> tuples are problematic, and updating them on the source database. >> >> A thought... You might do two dumps: >> >> 1. Raw, no conversion using iconv >> >> 2. Another, which converts using iconv [generate this by passing >> file #1 > thru iconv...] >> > > I am just struggling with this issue, my solution: > > CREATE OR REPLACE FUNCTION utf8_encode(text) > RETURNS text AS > $BODY$ > ($s)=@_; > utf8::encode($s); > return $s; > $BODY$ > LANGUAGE 'plperlu' IMMUTABLE; > > > > and now : > > foreach suspect table { > update table set field=utf8_encode(field) where field<> > utf8_encode(field) > } > > kinda slow, but might be good enough. > > Greetings > Marcin > > _______________________________________________ > Slony1-general mailing list > Slony1-general at gborg.postgresql.org > http://gborg.postgresql.org/mailman/listinfo/slony1-general
- Previous message: [Slony1-general] Very slow initial replica population
- Next message: [Slony1-general] Very slow initial replica population
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list