Tue Jun 2 13:49:26 PDT 2009
Thanks for the idea, Jeff. I inspected the log more closely and found
what I think is the _real_ problem (unexpected chunk number)...
2009-05-28 23:49:31 CDT DEBUG2 remoteWorkerThread_1: Begin COPY of table
"finch"."blastres_qry"
2009-05-28 23:49:31 CDT DEBUG2 remoteWorkerThread_1: nodeon73 is 0
NOTICE: truncate of "finch"."blastres_qry" failed - doing delete
2009-05-28 23:49:31 CDT DEBUG2 remoteListenThread_1: LISTEN
2009-05-28 23:49:33 CDT DEBUG2 remoteListenThread_1: queue event 1,4141 SYNC
2009-05-28 23:49:33 CDT DEBUG2 remoteListenThread_1: UNLISTEN
2009-05-28 23:49:34 CDT DEBUG2 syncThread: new sl_action_seq 1 - SYNC 2412
2009-05-28 23:49:38 CDT DEBUG2 localListenThread: Received event 2,2412 SYNC
2009-05-28 23:49:44 CDT DEBUG2 syncThread: new sl_action_seq 1 - SYNC 2413
2009-05-28 23:49:44 CDT DEBUG2 localListenThread: Received event 2,2413 SYNC
2009-05-28 23:49:45 CDT DEBUG2 remoteListenThread_1: queue event 1,4142 SYNC
2009-05-28 23:49:53 CDT DEBUG2 remoteListenThread_1: LISTEN
2009-05-28 23:49:54 CDT DEBUG2 syncThread: new sl_action_seq 1 - SYNC 2414
2009-05-28 23:49:56 CDT DEBUG2 localListenThread: Received event 2,2414 SYNC
2009-05-28 23:50:00 CDT ERROR remoteWorkerThread_1: copy to stdout on
provider - PGRES_FATAL_ERROR ERROR: unexpected chunk number 8618
(expected 426) for toast value 34675334
2009-05-28 23:50:00 CDT WARN remoteWorkerThread_1: data copy for set 1
failed - sleep 30 seconds
2009-05-28 23:50:00 CDT DEBUG2 remoteListenThread_1: queue event 1,4143 SYNC
2009-05-28 23:50:00 CDT DEBUG2 remoteListenThread_1: UNLISTEN
WARNING: there is no transaction in progress
Any ideas on what could cause such an error?
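In case it helps, here is the kind of row-by-row probe I'm planning to run
against the provider to find which tuple holds the damaged toast value. This
is only a sketch: the key column name ("id") and the database name
("finch_db") below are placeholders, not necessarily what the schema
actually uses.

#!/bin/sh
# Probe finch.blastres_qry one row at a time and force each row to be
# detoasted; the row whose toast chunks are damaged should make the
# SELECT fail with the same "unexpected chunk number" error.
# "id" and "finch_db" are placeholders for the real key column and database.
for id in $(psql -A -t -d finch_db -c "SELECT id FROM finch.blastres_qry ORDER BY id"); do
    if ! psql -d finch_db -c "SELECT * FROM finch.blastres_qry WHERE id = $id" > /dev/null 2>&1; then
        echo "row id=$id failed to detoast"
    fi
done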
Sean
Jeff Frost wrote:
> Jeff Frost wrote:
>> Andrew Sullivan wrote:
>>> On Tue, Jun 02, 2009 at 10:30:45AM -0500, Sean Staats wrote:
>>>
>>>> I created a new replication cluster. It turns out that starting the
>>>> table IDs at id=1 and the sequence IDs at id=1001 didn't make any
>>>> difference, as slony gave me the same error (sequence ID 1001 has
>>>> already been assigned). Increasing the log verbosity to 4 doesn't
>>>> produce any more useful debugging information. Time for another approach.
>>>>
>>>> Would it make sense to create 2 different sets - one to replicate the
>>>> tables and one to replicate the sequences? Is there a downside to this
>>>> kind of workaround?
>>>>
>>>
>>> It'd be better to figure out what the duplication is caused by. Have
>>> a look in the _slony tables and check to see what's in there. Where's
>>> the collision?
>>>
>>>
>> I've seen this issue recently when the initial sync fails. If you
>> scroll further back in your logs, do you have a failure for the
>> initial copy_set? When this happens to me, slony seems to leave the
>> slave DB in a half-replicated state; it then reattempts the initial
>> sync, finds that the sequences are already in the _cluster.sl_sequence
>> table, and errors out. This requires dropping the node and starting
>> over. This is with version 1.2.16. I recall previous versions being
>> able to recover from a failed initial sync without intervention, but
>> my memory could be mistaken.
> In fact, here's how it looks in my logs:
>
> Jun 2 13:09:36 localhost slon[1867]: [274-1] 2009-06-02 13:09:36 PDT
> ERROR remoteWorkerThread_1: "select
> "_engage_cluster".tableHasSerialKey('"archive"."invitation"');"
> Jun 2 13:09:36 localhost slon[1867]: [274-2] could not receive data
> from server: Connection timed out
> Jun 2 13:09:36 localhost slon[1867]: [275-1] 2009-06-02 13:09:36 PDT
> WARN remoteWorkerThread_1: data copy for set 1 failed - sleep 30 seconds
> Jun 2 13:09:36 localhost postgres[1880]: [26-1] NOTICE: there is no
> transaction in progress
> Jun 2 13:10:06 localhost slon[1867]: [276-1] 2009-06-02 13:10:06 PDT
> DEBUG1 copy_set 1
> Jun 2 13:10:06 localhost slon[1867]: [277-1] 2009-06-02 13:10:06 PDT
> DEBUG1 remoteWorkerThread_1: connected to provider DB
> Jun 2 13:10:09 localhost slon[1867]: [278-1] 2009-06-02 13:10:09 PDT
> ERROR remoteWorkerThread_1: "select
> "_engage_cluster".setAddSequence_int(1, 4,
> Jun 2 13:10:09 localhost slon[1867]: [278-2]
> '"public"."tracking_sequence"', 'public.tracking_sequence sequence')"
> PGRES_FATAL_ERROR ERROR: Slony-I: setAddSequence_int():
> Jun 2 13:10:09 localhost slon[1867]: [278-3] sequence ID 4 has
> already been assigned
> Jun 2 13:10:09 localhost slon[1867]: [279-1] 2009-06-02 13:10:09 PDT
> WARN remoteWorkerThread_1: data copy for set 1 failed - sleep 60 seconds
>
> The DB in question is 144GB and it's being replicated over a
> relatively slow link. It seems to do about 1GB/hr, but never gets
> past 10GB. It always dies at that same point.
> --
> Jeff Frost, Owner <jeff at frostconsultingllc.com>
> Frost Consulting, LLC http://www.frostconsultingllc.com/
> Phone: 916-647-6411 FAX: 916-405-4032
>
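P.S. Regarding Jeff's note about having to drop the node and start over: the
slonik step for that would look roughly like the sketch below. The cluster
name, conninfo strings, and node IDs are placeholders, so adjust them before
running anything.

slonik <<_EOF_
# Sketch only: cluster name, conninfo strings, and node IDs are placeholders.
cluster name = finch_cluster;
node 1 admin conninfo = 'dbname=finch_db host=provider-host user=slony';
node 2 admin conninfo = 'dbname=finch_db host=subscriber-host user=slony';

# Drop the half-replicated subscriber (node 2) so it can be set up again
# from scratch before re-subscribing the set.
drop node (id = 2, event node = 1);
_EOF_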