Fri May 2 02:26:57 PDT 2008
> Do we need to apply this patch? We have 3 nodes, 1 origin with 2
> subscribers to it.

Yes.

> I'd have expected such a patch would have been applied to the 1.2.14rc
> tarball? I guess I don't understand something about the release
> procedures etc.

I think so too. The original patch which I sent was not applied correctly;
I hope it will be applied in release 1.2.14 or a 1.2.14rc2.

> > ----- Original Message ----
> > From: Yoshiharu Mori <y-mori at sraoss.co.jp>
> > To: Charles Duffy <charles.duffy at gmail.com>
> > Cc: slony1-general at lists.slony.info
> > Sent: Friday, 2 May, 2008 7:49:55 AM
> > Subject: Re: [Slony1-general] 1.2.14rc still does not appear to handle switchover cleanly
> >
> > Hi
> >
> > > > Now with 1.2.12 and 1.2.14rc (have not tested 1.2.13 yet, but since
> > > > it's apparent in 1.2.12 and in 1.2.14rc even with the "patch/possible
> > > > fix", I'm guessing the issue is very much in 1.2.13 as well) there is
> > > > a large issue, as failover and switchover are key elements in this
> > > > application.
> > >
> > > Just to add my test report on 1.2.14RC - I agree that it still exhibits
> > > the "move set" problem.
> >
> > Did you apply the patch from the following page?
> > http://lists.slony.info/pipermail/slony1-general/2008-March/007690.html
> >
> > > I tested it out with a 3-node PostgreSQL 8.3.1 cluster:
> > >
> > > node3 <-- node1 --> node2
> > >
> > > There's one replication set. It contains all the tables used in a pgbench
> > > installation (the test app). The initial state was: origin on node1,
> > > with node2 and node3 both subscribing directly to node1.
> > >
> > > It works fine in this initial state:
> > >
> > > node1=# insert into accounts values (100000000, 100000, 100000, 'hello sailor');
> > > INSERT 0 1
> > > node1=# \c node2
> > > You are now connected to database "node2".
> > > node2=# select * from accounts where aid = 100000000;
> > >     aid    |  bid   | abalance |    filler
> > > -----------+--------+----------+--------------------------------------------------------------------------------------
> > >  100000000 | 100000 |   100000 | hello sailor
> > > (1 row)
> > >
> > > node2=# \c node3
> > > You are now connected to database "node3".
> > > node3=# select * from accounts where aid = 100000000;
> > >     aid    |  bid   | abalance |    filler
> > > -----------+--------+----------+--------------------------------------------------------------------------------------
> > >  100000000 | 100000 |   100000 | hello sailor
> > > (1 row)
> > >
> > > node3=# \q
> > >
> > > So, we're getting data. It worked with a large pgbench run as well.
> > >
> > > The move set, however, causes problems:
> > >
> > > [ccd at hpsystem slontest]$ slonik_move_set 1 1 2 | slonik
> > > :5: Locking down set 1 on node 1
> > > :7: Locked down - moving it
> > > :9: Replication set 1 moved from node 1 to 2. Remember to
> > > :10: update your configuration file, if necessary, to note the new location
> > > :11: for the set.
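
A quick way to check, at this point, whether every node has confirmed the
MOVE_SET event is a query along these lines. This is only a sketch, not part
of the original report; it assumes the stock 1.2.x sl_event and sl_confirm
tables in the same _replication schema used in the sessions below, run on
the old origin (node1 here):

    -- For each MOVE_SET event, show how far each receiving node has
    -- confirmed events from that origin; a receiver whose last_confirmed
    -- is below ev_seqno has not yet confirmed the MOVE_SET.
    SELECT e.ev_origin, e.ev_seqno, e.ev_type,
           c.con_received, max(c.con_seqno) AS last_confirmed
      FROM _replication.sl_event e
      LEFT JOIN _replication.sl_confirm c ON c.con_origin = e.ev_origin
     WHERE e.ev_type = 'MOVE_SET'
     GROUP BY e.ev_origin, e.ev_seqno, e.ev_type, c.con_received
     ORDER BY e.ev_seqno, c.con_received;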
> > > [ccd at hpsystem slontest]$ psql -U postgres node1
> > > Password for user postgres:
> > > Welcome to psql 8.3.1, the PostgreSQL interactive terminal.
> > >
> > > node1=# select * from _replication.sl_subscribe ;
> > >  sub_set | sub_provider | sub_receiver | sub_forward | sub_active
> > > ---------+--------------+--------------+-------------+------------
> > >        1 |            1 |            3 | t           | t
> > >        1 |            2 |            1 | t           | t
> > > (2 rows)
> > >
> > > node1=# \c node2
> > > You are now connected to database "node2".
> > > node2=# select * from _replication.sl_subscribe ;
> > >  sub_set | sub_provider | sub_receiver | sub_forward | sub_active
> > > ---------+--------------+--------------+-------------+------------
> > >        1 |            1 |            3 | t           | t
> > >        1 |            2 |            1 | t           | t
> > > (2 rows)
> > >
> > > node2=# \c node3
> > > You are now connected to database "node3".
> > > node3=# select * from _replication.sl_subscribe ;
> > >  sub_set | sub_provider | sub_receiver | sub_forward | sub_active
> > > ---------+--------------+--------------+-------------+------------
> > >        1 |            1 |            2 | t           | t
> > >        1 |            1 |            3 | t           | t
> > >
> > >
> > > Node3 never reflects the new state of the cluster. Restarting slon
> > > daemons was ineffective.
> > > Looking at locks, the same symptomatic lock is present (as in all
> > > other reports of this issue):
> > >
> > > node3=# select * from pg_locks;
> > >    locktype    | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid  |        mode         | granted
> > > ---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+------+---------------------+---------
> > >  virtualxid    |          |          |      |       | 16/121     |               |         |       |          | 16/121             | 3480 | ExclusiveLock       | t
> > >  relation      |    16387 |    20718 |      |       |            |               |         |       |          | 16/121             | 3480 | AccessExclusiveLock | f
> > >  relation      |    16387 |    20663 |      |       |            |               |         |       |          | 21/108             | 3489 | AccessShareLock     | t
> > >  virtualxid    |          |          |      |       | 9/130      |               |         |       |          | 9/130              | 3426 | ExclusiveLock       | t
> > >  relation      |    16386 |    20311 |      |       |            |               |         |       |          | 9/130              | 3426 | RowExclusiveLock    | t
> > >  relation      |    16386 |    20303 |      |       |            |               |         |       |          | 9/130              | 3426 | RowExclusiveLock    | t
> > >  transactionid |          |          |      |       |            |         36277 |         |       |          | 9/130              | 3426 | ExclusiveLock       | t
> > >  virtualxid    |          |          |      |       | 4/396      |               |         |       |          | 4/396              | 3382 | ExclusiveLock       | t
> > >  relation      |    16387 |    10969 |      |       |            |               |         |       |          | 23/134             | 5232 | AccessShareLock     | t
> > >  virtualxid    |          |          |      |       | 21/108     |               |         |       |          | 21/108             | 3489 | ExclusiveLock       | t
> > >  relation      |    16386 |     2614 |      |       |            |               |         |       |          | 9/130              | 3426 | ExclusiveLock       | t
> > >  relation      |    16387 |    20669 |      |       |            |               |         |       |          | 21/108             | 3489 | AccessShareLock     | t
> > >  relation      |    16386 |     2614 |      |       |            |               |         |       |          | 4/396              | 3382 | ExclusiveLock       | f
> > >  virtualxid    |          |          |      |       | 23/134     |               |         |       |          | 23/134             | 5232 | ExclusiveLock       | t
> > >  relation      |    16387 |    20718 |      |       |            |               |         |       |          | 21/108             | 3489 | AccessExclusiveLock | t
> > > (15 rows)
> > >
> > > node3=# select * from pg_locks where granted = 'f';
> > >  locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid  |        mode         | granted
> > > ----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+------+---------------------+---------
> > >  relation |    16387 |    20718 |      |       |            |               |         |       |          | 16/121             | 3480 | AccessExclusiveLock | f
> > > (1 row)
> > >
> > > node3=# select relname,relnamespace from pg_class where oid = 20718;
> > >     relname     | relnamespace
> > > ----------------+--------------
> > >  sl_config_lock |        20491
> > > (1 row)
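
To see which backends are holding or waiting on sl_config_lock, and what they
are doing, something like the following can be run on the stuck node. This is
only a sketch, not from the original report; it assumes PostgreSQL 8.3, where
pg_stat_activity exposes procpid, waiting and current_query:

    -- Pair every lock taken or requested on sl_config_lock with the
    -- backend that holds or waits for it.
    SELECT l.pid, l.mode, l.granted,
           a.waiting, a.query_start, a.current_query
      FROM pg_locks l
      JOIN pg_class c ON c.oid = l.relation
      JOIN pg_stat_activity a ON a.procpid = l.pid
     WHERE c.relname = 'sl_config_lock';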
> > >
> > > looking at logfiles:
> > >
> > > Node2's slon daemon logfile contains:
> > > =================================
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteListenThread_1: queue event 1,67 MOVE_SET
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteListenThread_1: UNLISTEN
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_1: Received event 1,66 SYNC
> > > 2008-05-02 14:34:22 EST DEBUG2 calc sync size - last time: 1 last length: 7719 ideal: 7 proposed size: 3
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_1: SYNC 66 processing
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_1: syncing set 1 with 4 table(s) from provider 1
> > > 2008-05-02 14:34:22 EST DEBUG2 ssy_action_list length: 0
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_1: current local log_status is 0
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_1_1: current remote log_status = 0
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteHelperThread_1_1: 0.001 seconds delay for first row
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteHelperThread_1_1: 0.001 seconds until close cursor
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteHelperThread_1_1: inserts=0 updates=0 deletes=0
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_1: new sl_rowid_seq value: 1000000000011000
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_1: SYNC 66 done in 0.003 seconds
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_1: Received event 1,67 MOVE_SET
> > > 2008-05-02 14:34:22 EST CONFIG moveSet: set_id=1 old_origin=1 new_origin=2
> > > 2008-05-02 14:34:22 EST DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
> > > 2008-05-02 14:34:22 EST DEBUG2 sched_wakeup_node(): no_id=2 (0 threads + worker signaled)
> > > 2008-05-02 14:34:22 EST CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1
> > > 2008-05-02 14:34:22 EST DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
> > > 2008-05-02 14:34:22 EST CONFIG storeListen: li_origin=3 li_receiver=2 li_provider=3
> > > 2008-05-02 14:34:22 EST DEBUG2 sched_wakeup_node(): no_id=3 (0 threads + worker signaled)
> > > 2008-05-02 14:34:22 EST CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=3
> > > 2008-05-02 14:34:22 EST DEBUG2 sched_wakeup_node(): no_id=3 (0 threads + worker signaled)
> > > 2008-05-02 14:34:22 EST CONFIG storeListen: li_origin=3 li_receiver=2 li_provider=1
> > > 2008-05-02 14:34:22 EST DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
> > > 2008-05-02 14:34:22 EST DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
> > > 2008-05-02 14:34:22 EST DEBUG2 sched_wakeup_node(): no_id=3 (0 threads + worker signaled)
> > > 2008-05-02 14:34:22 EST DEBUG1 remoteWorkerThread_1: helper thread for provider 1 terminated
> > > 2008-05-02 14:34:22 EST DEBUG1 remoteWorkerThread_1: disconnecting from data provider 1
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteListenThread_3: start listening for event origin 1
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_3: forward confirm 1,66 received by 3
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_3: forward confirm 2,34 received by 3
> > > 2008-05-02 14:34:23 EST DEBUG2 remoteWorkerThread_1: forward confirm 2,34 received by 1
> > > 2008-05-02 14:34:23 EST DEBUG2 syncThread: new sl_action_seq 1 - SYNC 36
> > > 2008-05-02 14:34:25 EST DEBUG2 localListenThread: Received event 2,34 SYNC
> > > 2008-05-02 14:34:25 EST DEBUG2 localListenThread: Received event 2,35 ACCEPT_SET
> > > 2008-05-02 14:34:25 EST DEBUG2 localListenThread: ACCEPT_SET
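
The storeListen lines above can be cross-checked against what actually ended
up in the sl_listen table on node2 after the MOVE_SET was processed. A
minimal sketch, not from the original report, assuming the same _replication
schema as in the sessions above:

    -- Run on node2: one row per (event origin, provider) path this node
    -- listens on after the reconfiguration.
    SELECT li_origin, li_provider, li_receiver
      FROM _replication.sl_listen
     ORDER BY li_origin, li_provider;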
> > >
> > > Node3's slon daemon logfile contains:
> > > =================================
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_2: Received event 2,35 ACCEPT_SET
> > > 2008-05-02 14:34:22 EST DEBUG2 start processing ACCEPT_SET
> > > 2008-05-02 14:34:22 EST DEBUG2 ACCEPT: set=1
> > > 2008-05-02 14:34:22 EST DEBUG2 ACCEPT: old origin=1
> > > 2008-05-02 14:34:22 EST DEBUG2 ACCEPT: new origin=2
> > > 2008-05-02 14:34:22 EST DEBUG2 ACCEPT: move set seq=67
> > > 2008-05-02 14:34:22 EST DEBUG2 got parms ACCEPT_SET
> > > 2008-05-02 14:34:22 EST DEBUG2 ACCEPT_SET - node not origin
> > > 2008-05-02 14:34:22 EST DEBUG2 ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteListenThread_1: queue event 1,66 SYNC
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteListenThread_1: queue event 1,67 MOVE_SET
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_1: Received event 1,66 SYNC
> > > 2008-05-02 14:34:22 EST DEBUG2 calc sync size - last time: 1 last length: 111 ideal: 540 proposed size: 3
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_1: SYNC 66 processing
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_1: syncing set 1 with 4 table(s) from provider 1
> > > 2008-05-02 14:34:22 EST DEBUG2 ssy_action_list length: 0
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_1: current local log_status is 0
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_1_1: current remote log_status = 0
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteHelperThread_1_1: 0.000 seconds delay for first row
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteHelperThread_1_1: 0.000 seconds until close cursor
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteHelperThread_1_1: inserts=0 updates=0 deletes=0
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_1: new sl_rowid_seq value: 1000000000011000
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_1: SYNC 66 done in 0.002 seconds
> > > 2008-05-02 14:34:22 EST DEBUG2 remoteWorkerThread_1: Received event 1,67 MOVE_SET
> > > 2008-05-02 14:34:23 EST DEBUG2 syncThread: new sl_action_seq 1 - SYNC 35
> > > 2008-05-02 14:34:24 EST DEBUG2 localListenThread: Received event 3,35 SYNC
> > > 2008-05-02 14:34:26 EST DEBUG2 remoteListenThread_2: queue event 2,36 SYNC
> > > 2008-05-02 14:34:30 EST DEBUG2 remoteListenThread_1: queue event 1,68 SYNC
> > > 2008-05-02 14:34:32 EST DEBUG2 ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep
> > > 2008-05-02 14:34:33 EST DEBUG2 syncThread: new sl_action_seq 1 - SYNC 36
> > > 2008-05-02 14:34:34 EST DEBUG2 localListenThread: Received event 3,36 SYNC
> > > 2008-05-02 14:34:34 EST DEBUG2 remoteListenThread_1: queue event 1,69 SYNC
> > > 2008-05-02 14:34:34 EST DEBUG2 remoteListenThread_1: queue event 2,37 SYNC
> > > 2008-05-02 14:34:38 EST DEBUG2 remoteListenThread_1: LISTEN
> > > 2008-05-02 14:34:42 EST DEBUG2 ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep
> > > 2008-05-02 14:34:44 EST DEBUG2 syncThread: new sl_action_seq 1 - SYNC 37
> > > 2008-05-02 14:34:44 EST DEBUG2 remoteListenThread_1: queue event 1,70 SYNC
> > > 2008-05-02 14:34:44 EST DEBUG2 remoteListenThread_1: UNLISTEN
> > > 2008-05-02 14:34:44 EST DEBUG2 remoteListenThread_2: queue event 2,38 SYNC
> > > 2008-05-02 14:34:44 EST DEBUG2 localListenThread: Received event 3,37 SYNC
> > > 2008-05-02 14:34:48 EST DEBUG2 remoteListenThread_1: LISTEN
> > > 2008-05-02 14:34:52 EST DEBUG2 ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep
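
Since node3 keeps sleeping in ACCEPT_SET waiting for the corresponding
MOVE_SET, it can also be useful to look on node3 at which of these events
have actually been stored locally, and which origin node3 still records for
the set. Again only a sketch, not from the original report, assuming the same
_replication schema:

    -- Run on node3: locally stored MOVE_SET / ACCEPT_SET events ...
    SELECT ev_origin, ev_seqno, ev_type
      FROM _replication.sl_event
     WHERE ev_type IN ('MOVE_SET', 'ACCEPT_SET')
     ORDER BY ev_origin, ev_seqno;

    -- ... and the origin node3 currently believes each set has.
    SELECT set_id, set_origin
      FROM _replication.sl_set;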
> > > The "ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep" > > > continue.... > > > > > > > Sometimes in 1.2.12 and 1.2.14rc the failover works, > > > > > > Yes, I've also experienced this - _occasionally_ the first 'move set' > > > will work, but moving again always kills it. > > > > > > > > > > > I'm more than happy to work thru this as I really want to push out > > > > 8.3.1 and would love to have a functioning 1.2.14 slon release, but > > > > something bad happened between 1.2.11 and current.. Either something > > > > new that I have not added to my setup scripts or it's the code. > > > > > > > > I'll work with someone on this!!! > > > > > > I'm going to investigate this myself in more detail when I get some > > > time - if I come up with anything useful, I'll be posting... > > > > > > I know this post is basically just yet another description of the > > > issue, but I thought it might be valuable nonetheless. Anyone see > > > anything wrong in my setup that might be causing this, instead of the > > > bug? > > > > > > Thanks > > > > > > Charles Duffy > > > _______________________________________________ > > > Slony1-general mailing list > > > Slony1-general at lists.slony.info > > > http://lists.slony.info/mailman/listinfo/slony1-general > > > > > > > > > -- > > SRA OSS, Inc. Japan > > Yoshiharu Mori > > http://www.sraoss.co.jp/ > > _______________________________________________ > > Slony1-general mailing list > > Slony1-general at lists.slony.info > > http://lists.slony.info/mailman/listinfo/slony1-general > > > > > > > __________________________________________________________ > Sent from Yahoo! Mail. > A Smarter Email http://uk.docs.yahoo.com/nowyoucan.html > -- SRA OSS, Inc. 日本支社 Yoshiharu Mori <y-mori at sraoss.co.jp> http://www.sraoss.co.jp/