Mon Jun 21 10:14:40 PDT 2010
- Previous message: [Slony1-bugs] [Bug 128] DROP TABLE replicatedTable leaves the cluster in a bad state
- Next message: [Slony1-bugs] [Bug 136] New: listen paths not being rebuilt correctly
http://www.slony.info/bugzilla/show_bug.cgi?id=135

           Summary: slon can get stuck waiting for a sl_config_lock it
                    already has
           Product: Slony-I
           Version: 2.0
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: low
         Component: slon
        AssignedTo: slony1-bugs at lists.slony.info
        ReportedBy: ssinger at ca.afilias.info
                CC: slony1-bugs at lists.slony.info
   Estimated Hours: 0.0


I am not exactly sure what is going on here, but this is what I have
observed.  As part of a test (FailNodeTest.js) a subscription process seemed
to have hung.

select pid, relname, mode, granted
  from pg_locks l, pg_class c
 where l.relation = c.oid;

  pid  |          relname           |        mode         | granted
-------+----------------------------+---------------------+---------
 13197 | sl_config_lock             | AccessExclusiveLock | t
 13378 | pg_class                   | AccessShareLock     | t
 13206 | sl_config_lock             | AccessExclusiveLock | f
 13378 | pg_class_relname_nsp_index | AccessShareLock     | t
 13378 | pg_locks                   | AccessShareLock     | t
 13378 | pg_class_oid_index         | AccessShareLock     | t

Note that pid 13197 holds the lock, while pid 13206 is waiting for it.

ps aux | grep 13197
 9994  13197  0.0  0.3  62020 10232 ?      Ss   12:45   0:00 postgres: slony test4 ::1(50428) idle in transaction
 9994  13384  0.0  0.0   7336   880 pts/0  R+   12:55   0:00 grep 13197

ssinger.local at ssinger:~/src/clustertest/clustertest$ lsof | grep 50428
slon      13137  ssinger.local   6u  IPv6  4391222  0t0  TCP localhost:50428->localhost:postgresql (ESTABLISHED)
postgres  13197  ssinger.local   7u  IPv6  4391254  0t0  TCP localhost:postgresql->localhost:50428 (ESTABLISHED)

So process 13137, which is the slon, has the lock.  But:

ps aux | grep 13206
 9994  13206  0.0  0.2  60740  8192 ?      Ss   12:45   0:00 postgres: slony test4 ::1(50432) LOCK TABLE waiting
 9994  13399  0.0  0.0   7340   900 pts/0  S+   12:56   0:00 grep 13206

ssinger.local at ssinger:~/src/clustertest/clustertest$ lsof | grep 50432
slon      13137  ssinger.local  10u  IPv6  4391239  0t0  TCP localhost:50432->localhost:postgresql (ESTABLISHED)
postgres  13206  ssinger.local   7u  IPv6  4391278  0t0  TCP localhost:postgresql->localhost:50432 (ESTABLISHED)

ssinger.local at ssinger:~/src/clustertest/clustertest$ ps aux | grep 13137
 9994  13137  0.0  0.0 160664  1448 pts/0  Sl   12:45   0:00 /usr/local/pgsql_cvs/bin/slon disorder_replica -d4 dbname=test4 host=localhost user=slony password=slony

This shows that process 13137 is also connected to a second backend, the one
that is waiting for the sl_config_lock.

In the logs I observe:

2010-06-21 12:46:16,425 [db4 stdout] DEBUG info.slony.clustertest.testcoordinator.slony.SlonLauncher db4 - 2010-06-21 12:46:16 EDT DEBUG2 remoteWorkerThread_3: Received event #3 from 5000000007 type:SUBSCRIBE_SET
2010-06-21 12:46:16,425 [db4 stdout] DEBUG info.slony.clustertest.testcoordinator.slony.SlonLauncher db4 - 2010-06-21 12:46:16 EDT WARN remoteWorkerThread_1: copy set: data provider 3 only on sync 5000000036 - sleep 5 seconds

That log message from remoteWorkerThread_1 keeps repeating.  I wonder if we
are 'sleeping' while holding a lock that prevents the sync from getting
processed.

The cluster configuration is 1===>3====>4, where a path ALSO exists between
nodes 1 and 4.  (There is also a node 5 receiving from 3, and node 2
receiving from 1.)
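
For anyone digging into a hang like this, a self-join on pg_locks along the
following lines (a sketch, not part of the session above; it uses only the
pg_locks/pg_class columns already shown) pairs each ungranted lock request
with the backend that already holds a lock on the same relation.  On the
output above it should report 13206 waiting behind 13197 on sl_config_lock:

-- Sketch: match each waiting lock request to the granted lock
-- on the same relation, so blocker/blocked pairs are explicit.
select waiter.pid  as waiting_pid,
       holder.pid  as holding_pid,
       c.relname,
       waiter.mode as waiting_mode
  from pg_locks waiter
  join pg_locks holder on holder.relation = waiter.relation
                      and holder.granted
                      and holder.pid <> waiter.pid
  join pg_class c      on c.oid = waiter.relation
 where not waiter.granted;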
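The blocked statement itself is easy to reproduce by hand, outside of slon.
Assuming the cluster name disorder_replica from the slon command line (so the
schema would be "_disorder_replica"), two psql sessions behave just like the
two backends above:

-- session 1 (plays the role of backend 13197): take the lock, then sit
-- idle in transaction, as the slon's first connection does
begin;
lock table "_disorder_replica".sl_config_lock in access exclusive mode;
-- no commit yet; ps now shows this backend as "idle in transaction"

-- session 2 (plays the role of backend 13206): this blocks until session 1
-- commits or rolls back, and ps shows it as "LOCK TABLE waiting"
begin;
lock table "_disorder_replica".sl_config_lock in access exclusive mode;

That is consistent with the idea that remoteWorkerThread_1 is sitting in its
5-second retry loop while its other connection still holds the open
transaction that took sl_config_lock.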