Gordon Shannon gordo169 at gmail.com
Sun Jul 19 16:43:37 PDT 2009

Gordon Shannon wrote:
> 
> 
> 
> Brian A. Seklecki-2 wrote:
>> 
>> Ughhh...whose RPMs are you using?  Sig11 is a segfault, which could
>> mean bad patches in the code by the RPM builder, or mismatched libraries,
>> etc.
>> 
>> Run your slon(8) process via strace(8) and get us the output.  Also,
>> build a debug binary (COPTS/CFLAGS+=-g)
>> 
> 
> Here is the srpm we started with:
> http://yum.pgsqlrpms.org/srpms/8.4/fedora/fedora-11-i386/slony1-2.0.2-1.f11.src.rpm
> 
> We built it unmodified except for disabling the documentation build. We
> confirmed that the downloaded source md5sum matches that of the source at:
> http://www.slony.info/downloads/2.0/source/
> 
> We are looking into the strace...  BTW, what do you mean by slon(8)?
> 
> Thx
> 
> 

I think I found the duplicate key problem.  The "log_actionseq" column in
the sl_log_1/2 tables is a bigint, but the compress_actionseq() function in
remote_worker.c works only with signed ints, not 64-bit integers.  So if a
value greater than 2,147,483,647 comes along, the value in curr_number will
overflow.

Here's the relevant debug info from the log:

14 0719 18:33:27 DEBUG4 compress_actionseq(list,subquery) Action list:
'4832430056','4832430057','4832430058','4832430059','4832430060','4832430061','4832430062','4832430063','4832430064','4832430065','4832430066','4832430067','4832430068','4832430069','4832430070','4832430071','4832430072','4832430073','4832430074','4832430075','4832430076','4832430077','4832430078','4832430079','4832430080','4832430081','4832430082','4832430083','4832430084','4832430085','4832430086','4832430087','4832430088','4832430089','4832430090','4832430091','4832430092','4832430093','4832430094','4832430095','4832430096','4832430097','4832430098','4832430099','4832430100','4832430101','4832430102','4832430103','4832430104','4832430105','4832430106','4832430107','4832430108','4832430109','4832430110','4832430111','4832430112','4832430113','4832430114','4832430115','4832430116','4832430117','4832430118','4832430119','4832430120','4832430121','4832430122','4832430123','4832430124','4832430125','4832430126','4832430127','4832430128','4832430129','4832430130','4832430131','4832430132','4832430133','4832430134','4832430135','4832430136','4832430137','4832430138','4832430139','4832430140','4832430141','4832430142','4832430143','4832430144','4832430145','4832430146','4832430147','4832430148','4832430149','4832430150','4832430151','4832430152','4832430153','4832430154','4832430155','4832430156','4832430157','4832430158','4832430159'
14 0719 18:33:27 DEBUG4 Finished number: 537462760
14 0719 18:33:27 DEBUG4 Finished number: 537462761
14 0719 18:33:27 DEBUG4 Finished number: 537462762
(...)
14 0719 18:33:27 DEBUG4 Finished number: 537462860
14 0719 18:33:27 DEBUG4 Finished number: 537462861
14 0719 18:33:27 DEBUG4 Finished number: 537462862
14 0719 18:33:27 DEBUG4 Finished number: 537462863
14 0719 18:33:27 DEBUG4 between entry - 537462760 537462863
14 0719 18:33:27 DEBUG4  compressed actionseq subquery...   log_actionseq not between '537462760' and '537462863'

Note that 537462760 is what you get when you truncate the 8-byte value
4832430056 to a 4-byte integer (4832430056 - 2^32 = 537462760). So
essentially it is trying to sync rows that already came over
in the subscription event.
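
The truncation can be checked with a few lines of C. This is only a sketch of the arithmetic; low32() and check_log_values() are names made up for illustration and are not part of Slony-I.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Keep only the low 32 bits of a 64-bit value, which is what is left
 * when an 8-byte number is squeezed into a 4-byte integer.
 */
uint32_t low32(int64_t value)
{
    return (uint32_t) ((uint64_t) value & 0xFFFFFFFFu);
}

/* Check the first and last action sequence numbers from the log above. */
void check_log_values(void)
{
    assert(low32(4832430056LL) == 537462760u);  /* first "Finished number" */
    assert(low32(4832430159LL) == 537462863u);  /* last "Finished number"  */
}
```

Both ends of the "between entry" range match the low 32 bits of the real values, which is consistent with the whole list having been shifted down by 2^32 (4294967296).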

Let me know if you need more details.

-Gordon

