Mon Mar 20 00:38:14 PST 2006
- Previous message: [Slony1-general] slon process fail to restart when dropping other node.
- Next message: [Slony1-general] slon process fail to restart when dropping other node.
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
This appears to be a bug in remote_worker.c: rtcfg_lock() is called, but in some cases the main loop exits without calling rtcfg_unlock(). A diff for src/slon/remote_worker.c is attached.

On Wed, 08 Mar 2006 20:34:25 +0900
TANIDA Yutaka <tanida at sraoss.co.jp> wrote:

> Hi.
>
> I found a bug: an unexpected slon shutdown when dropping a node.
>
> -PostgreSQL 8.1.2
> -Slony-I 1.1.5
> --perltools was not used.
> -RHEL4 update2
>
> TO REPRODUCE THIS BUG:
>
> 1. Make sure PostgreSQL 8.1, pgbench and Slony-I are installed.
>
> 2. Execute add.sh as follows. It creates a cluster named "nodetest" with
> 1 master and 2 slaves. Paths exist between nodes 1-2 and 1-3.
>
> [tanida at srapc2209 sl]$ cat add.sh
> #!/bin/sh
> CLUSTERNAME=nodetest
>
> RUSER=postgres
> killall slon
> dropdb node1
> dropdb node2
> dropdb node3
> createdb node1
> createdb node2
> createdb node3
> createlang plpgsql node1
> pgbench -i -s 1 node1
> pg_dump -s node1 | psql node2
> pg_dump -s node1 | psql node3
> slonik <<_EOF_
> cluster name = $CLUSTERNAME;
> node 1 admin conninfo = 'dbname=node1';
> node 2 admin conninfo = 'dbname=node2';
> node 3 admin conninfo = 'dbname=node3';
> init cluster ( id=1, comment = 'Master');
> create set (id=1, origin=1, comment='All tables');
> table add key (node id = 1, fully qualified name = 'public.history');
> store node (id=2, comment = 'Slave');
> store node (id=3, comment = 'Slave');
> store path (server = 1, client = 2, conninfo='dbname=node1 ');
> store path (server = 2, client = 1, conninfo='dbname=node2 ');
> store path (server = 1, client = 3, conninfo='dbname=node1 ');
> store path (server = 3, client = 1, conninfo='dbname=node3 ');
>
> store listen (origin=1, provider = 1, receiver =2);
> store listen (origin=2, provider = 2, receiver =1);
> store listen (origin=1, provider = 1, receiver =3);
> store listen (origin=2, provider = 1, receiver =3);
> store listen (origin=3, provider = 3, receiver =1);
> store listen (origin=3, provider = 1, receiver =2);
> set add table (set id=1, origin=1, id=1, fully qualified name = 'public.accounts', comment='accounts table');
> set add table (set id=1, origin=1, id=2, fully qualified name = 'public.branches', comment='branches table');
> set add table (set id=1, origin=1, id=3, fully qualified name = 'public.tellers', comment='tellers table');
> set add table (set id=1, origin=1, id=4, fully qualified name = 'public.history', comment='history table', key = serial);
> #wait for event(origin=all,confirmed=all);
> _EOF_
> slon nodetest "dbname=node1" >node1.log 2>&1 &
> slon nodetest "dbname=node2" >node2.log 2>&1 &
> slon nodetest "dbname=node3" >node3.log 2>&1 &
> slonik <<_EOF_
> cluster name = $CLUSTERNAME;
> node 1 admin conninfo = 'dbname=node1';
> node 2 admin conninfo = 'dbname=node2';
> node 3 admin conninfo = 'dbname=node3';
> subscribe set ( id = 1, provider = 1, receiver = 2, forward = yes);
> subscribe set ( id = 1, provider = 1, receiver = 3, forward = yes);
> _EOF_
>
> 3. Execute del.sh as follows. It drops node 3.
>
> [tanida at srapc2209 sl]$ cat del.sh
> #!/bin/sh
> CLUSTERNAME=nodetest
>
> slonik <<_EOF_
> cluster name = $CLUSTERNAME;
> node 1 admin conninfo = 'dbname=node1';
> node 2 admin conninfo = 'dbname=node2';
> node 3 admin conninfo = 'dbname=node3';
> drop node (id=3);
> _EOF_
>
> 4. slon for node3 shuts down immediately, but after 20 seconds
> slon for node1 also shuts down, when it should have restarted.
>
> The log of node1 shows:
>
> 2006-03-08 20:24:11 JST INFO localListenThread: got restart notification - signal scheduler
> 2006-03-08 20:24:11 JST DEBUG1 slon: restart requested
> 2006-03-08 20:24:11 JST DEBUG1 cleanupThread: thread done
> 2006-03-08 20:24:11 JST DEBUG1 syncThread: thread done
> 2006-03-08 20:24:11 JST DEBUG1 main: scheduler mainloop returned
> 2006-03-08 20:24:11 JST DEBUG1 localListenThread: thread done
> 2006-03-08 20:24:31 JST WARN main: shutdown timeout exiting
> 2006-03-08 20:24:31 JST DEBUG1 slon: shutdown now requested
>
> It seems something unexpected happened in remoteListenThread or
> remoteWorkerThread and caused a deadlock, so slon waited 20 seconds and
> was shut down by the timeout.
>
> This example uses "drop node", but the problem also occurs with other
> statements that request a restart, such as "uninstall node", "move set"
> or "failover".
>
>
> --
> TANIDA Yutaka <tanida at sraoss.co.jp>
>
>
> _______________________________________________
> Slony1-general mailing list
> Slony1-general at gborg.postgresql.org
> http://gborg.postgresql.org/mailman/listinfo/slony1-general
>
>

--
TANIDA Yutaka <tanida at sraoss.co.jp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: crush_on_dropNode.diff
Type: application/octet-stream
Size: 686 bytes
Desc: not available
Url : http://gborg.postgresql.org/pipermail/slony1-general/attachments/20060320/f9053aa1/crush_on_dropNode.obj
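
The attached diff is not reproduced in this archive, so the following is only a minimal sketch of the kind of problem described above: a loop that takes a lock on each pass and leaves through an early exit without releasing it. It is not the actual Slony-I code or the actual patch. All names here (demo_lock, demo_lock_acquire, demo_lock_release, worker_loop_leaky, worker_loop_fixed) are hypothetical, and a plain flag stands in for the real mutex so the example stays single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a configuration lock. A real daemon would
 * use a mutex; a flag is enough to show which exit paths leak it. */
typedef struct {
    bool held;
} demo_lock;

void demo_lock_acquire(demo_lock *l)
{
    /* With a real mutex, acquiring while already held would block. */
    assert(!l->held);
    l->held = true;
}

void demo_lock_release(demo_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Leaky variant: the early exit leaves the loop with the lock held.
 * Returns true if the lock is still held after the loop. */
bool worker_loop_leaky(demo_lock *l, int iterations, int stop_at)
{
    for (int i = 0; i < iterations; i++) {
        demo_lock_acquire(l);
        if (i == stop_at)
            break;                  /* bug: no release on this path */
        demo_lock_release(l);
    }
    return l->held;
}

/* Fixed variant: every path out of the loop body releases the lock. */
bool worker_loop_fixed(demo_lock *l, int iterations, int stop_at)
{
    for (int i = 0; i < iterations; i++) {
        demo_lock_acquire(l);
        if (i == stop_at) {
            demo_lock_release(l);   /* release before leaving the loop */
            break;
        }
        demo_lock_release(l);
    }
    return l->held;
}
```

Starting from a released lock, worker_loop_leaky reports the lock as still held whenever the early exit is taken, while worker_loop_fixed never does. In a threaded program, any other thread that later needs the same lock would then wait indefinitely, which would be consistent with the 20-second shutdown timeout in the log above.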