Fri Feb 27 05:58:13 PST 2009
- Previous message: [Slony1-general] slony rep error on start up: Slony-I: setAddTable_int():
- Next message: [Slony1-general] how to sync laptop master to server slave?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I'm seeing a problem similar to that described in this thread: http://lists.slony.info/pipermail/slony1-general/2008-August/008492.html However, I was unable to isolate the cause. The solution was to restart the slons on all nodes, which unstuck things and all tests show that the cluster is replicating happily again. I'm trying to determine how/why the problem happened. In this case it was triggered by moving an origin around (as part of a routine disaster drill). The set moved to the backup without problem, but we didn't notice that the slon on node 2 had died until we went to move things back. At that point, the move set hung, and eventually bailed with "timeout exceeded while waiting for event confirmation" If anyone has any insight into this, I'm all ears. Otherwise, I can set up a test bed if there's anything I can do to provide more information. Some basic details: FreeBSD 6.3 PostgreSQL 8.3.5 Slony 1.2.15 Cluster = node 1: masterdb in facil0 node 2: localslavedb in facil0 node 3: remoteslavedb in facil1 2009 Feb 27 06:31:43 -05:00 localslavedb [slon][55583] [local2] [err] slon[55583]: [8393-1] [55583] CONFIG moveSet: set_id=1 old_origin=1 new_origin=3 2009 Feb 27 06:31:43 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8394-1] [55583] DEBUG1 remoteWorkerThread_3: helper thread for provider 3 created 2009 Feb 27 06:31:43 -05:00 localslavedb [slon][55583] [local2] [err] slon[55583]: [8395-1] [55583] CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1 2009 Feb 27 06:31:43 -05:00 localslavedb [slon][55583] [local2] [err] slon[55583]: [8396-1] [55583] CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=3 2009 Feb 27 06:31:43 -05:00 localslavedb [slon][55583] [local2] [err] slon[55583]: [8397-1] [55583] CONFIG storeListen: li_origin=3 li_receiver=2 li_provider=3 2009 Feb 27 06:31:43 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8398-1] [55583] DEBUG1 remoteWorkerThread_1: helper thread for provider 1 terminated 2009 Feb 27 06:31:43 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8399-1] [55583] DEBUG1 remoteWorkerThread_1: disconnecting from data provider 1 2009 Feb 27 06:31:52 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8400-1] [55583] DEBUG1 remoteWorkerThread_3: connected to data provider 3 on 'dbname=database_name_prod host=remoteslavedb-ifx.bdb.facil1 2009 Feb 27 06:31:52 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8400-2] user=database_name_slony password=[redacted]' 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [info] slon[55583]: [8401-1] [55583] INFO localListenThread: got restart notification 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][1058] [local2] [debug] slon[1058]: [2-1] [1058] DEBUG1 slon: retry requested 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8402-1] [55583] DEBUG1 main: scheduler mainloop returned 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [info] slon[55583]: [8403-1] [55583] INFO remoteListenThread_3: disconnecting from 'dbname=database_name_prod host=remoteslavedb-ifx.bdb.facil1 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [info] slon[55583]: [8403-2] user=database_name_slony password=[redacted]' 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [info] slon[55583]: [8404-1] [55583] INFO remoteListenThread_1: disconnecting from 'dbname=database_name_prod host=masterdb-db.cluster00.facil0 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [info] slon[55583]: [8404-2] user=database_name_slony password=[redacted]' 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8405-1] [55583] DEBUG1 syncThread: thread done 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8406-1] [55583] DEBUG1 cleanupThread: thread done 2009 Feb 27 06:32:01 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8407-1] [55583] DEBUG1 remoteWorkerThread_1: thread done 2009 Feb 27 06:32:02 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8408-1] [55583] DEBUG1 remoteListenThread_3: thread done 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8409-1] [55583] DEBUG1 remoteListenThread_1: thread done 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][55583] [local2] [debug] slon[55583]: [8410-1] [55583] DEBUG1 main: done 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][1058] [local2] [debug] slon[1058]: [3-1] [1058] DEBUG1 slon: restart of worker 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][1058] [local2] [err] slon[1058]: [1-1] [1058] CONFIG main: slon version 1.2.15 starting up 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [2-1] [27776] CONFIG main: local node id = 2 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [3-1] [27776] CONFIG main: launching sched_start_mainloop 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [4-1] [27776] CONFIG main: loading current cluster configuration 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [5-1] [27776] CONFIG storeNode: no_id=1 no_comment='Master node' 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [6-1] [27776] CONFIG storeNode: no_id=3 no_comment='remote slave node' 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [7-1] [27776] CONFIG storePath: pa_server=1 pa_client=2 pa_conninfo="dbname=database_name_prod host=masterdb-db.cluster00.facil0 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [7-2] user=database_name_slony password=[redacted]" pa_connretry=10 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [8-1] [27776] CONFIG storePath: pa_server=3 pa_client=2 pa_conninfo="dbname=database_name_prod host=remoteslavedb-ifx.bdb.facil1 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [8-2] user=database_name_slony password=[redacted]" pa_connretry=10 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [9-1] [27776] CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [10-1] [27776] CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=3 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [11-1] [27776] CONFIG storeListen: li_origin=3 li_receiver=2 li_provider=3 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [12-1] [27776] CONFIG storeSet: set_id=1 set_origin=3 set_comment='All tables' 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [warning] slon[27776]: [13-1] [27776] WARN remoteWorker_wakeup: node 3 - no worker thread 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [14-1] [27776] CONFIG storeSubscribe: sub_set=1 sub_provider=3 sub_forward='t' 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [warning] slon[27776]: [15-1] [27776] WARN remoteWorker_wakeup: node 3 - no worker thread 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [16-1] [27776] CONFIG enableSubscription: sub_set=1 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [warning] slon[27776]: [17-1] [27776] WARN remoteWorker_wakeup: node 3 - no worker thread 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [18-1] [27776] CONFIG main: configuration complete - starting threads 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [debug] slon[27776]: [19-1] [27776] DEBUG1 localListenThread: thread starts 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [20-1] [27776] FATAL localListenThread: "select "_database_name_prod".cleanupNodelock(); insert into 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [20-2] "_database_name_prod".sl_nodelock values ( 2, 0, "pg_catalog".pg_backend_pid()); " - ERROR: duplicate key value violates 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][27776] [local2] [err] slon[27776]: [20-3] unique constraint "sl_nodelock-pkey" 2009 Feb 27 06:32:04 -05:00 localslavedb [slon][1058] [local2] [debug] slon[1058]: [2-1] [1058] DEBUG1 slon: shutdown requested 2009 Feb 27 06:32:24 -05:00 localslavedb [slon][1058] [local2] [debug] slon[1058]: [3-1] [1058] DEBUG1 slon: child termination timeout - kill child 2009 Feb 27 06:32:24 -05:00 localslavedb [slon][1058] [local2] [debug] slon[1058]: [4-1] [1058] DEBUG1 slon: done -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
- Previous message: [Slony1-general] slony rep error on start up: Slony-I: setAddTable_int():
- Next message: [Slony1-general] how to sync laptop master to server slave?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list