Wed Jan 4 10:00:05 PST 2006
- Previous message: [Slony1-general] DDL Script eror
- Next message: [Slony1-general] 1.1.5?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I have previously reported the problem of failover not working correctly if two slaves are subscribed to the same master node. I have now isolated that this only occurs if the master node is _not_ node 1.

e.g.:

    master_node        slaves
    node_1------------>node_101
                   \node_201

Failover works correctly.

    master_node        slaves
    node_101--------->node_201
                   \node_251

Failover causes the problem of the slave nodes pointing at each other, each thinking the other is both a master and a slave!

Here again is the complete test case and failure report.

=========================================================================

slony1-1.1.2
PostgreSQL 8.0.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.3 20030502 (Red Hat Linux 3.2.3-49)

sl_subscribe is not being updated correctly after a "FAILOVER".

I have the following config:

    node 1 admin conninfo='dbname=control host=main.comp.com port=5450 user=postgres';
    node 101 admin conninfo='dbname=masterdb host=main.comp.com port=5480 user=postgres';
    node 151 admin conninfo='dbname=masterdb host=slavea.comp.com port=5450 user=postgres';
    node 201 admin conninfo='dbname=masterdb host=slaveb.comp.com port=5480 user=postgres';
    node 251 admin conninfo='dbname=masterdb host=slavec.comp.com port=5480 user=postgres';

node 1 exists only as a controller and is not subscribed to any node
node 101 is the initial master
node 151 subscribes to node 101
node 201 subscribes to node 101
node 251 subscribes to node 201

pg_ctl and slon are stopped on node 101 to simulate system down.

Before failover I have

    sub_set | sub_provider | sub_receiver | sub_forward | sub_active
          1 |          101 |          151 | t           | t
          1 |          101 |          201 | t           | t
          1 |          201 |          251 | t           | t

However, after failover (id = 101, backup node = 201), I have

    sub_set | sub_provider | sub_receiver | sub_forward | sub_active
          1 |          201 |          251 | t           | t
          1 |          151 |          201 | t           | t
          1 |          201 |          251 | t           | t

on all nodes! Which is obviously wrong.
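For readers who want to check a subscription table mechanically, the following is a minimal sketch in Python, not part of Slony-I. It takes the (sub_set, sub_provider, sub_receiver) rows as quoted above and flags two conditions. The function name check_subscriptions and both rules are assumptions made for illustration: that a receiver should appear at most once per set, and that the node expected to be the set origin should not appear as a receiver of that set.

```python
from collections import Counter


def check_subscriptions(rows, origin):
    """Return a list of human-readable problems found in sl_subscribe-style rows.

    rows   -- iterable of (sub_set, sub_provider, sub_receiver) tuples
    origin -- node id expected to be the origin of the set(s)
    Both checks are illustrative assumptions, not Slony-I's own validation.
    """
    rows = list(rows)
    problems = []

    # Assumed rule 1: a node subscribes to a given set at most once.
    counts = Counter((sub_set, receiver) for sub_set, _provider, receiver in rows)
    for (sub_set, receiver), n in sorted(counts.items()):
        if n > 1:
            problems.append(f"set {sub_set}: receiver {receiver} appears {n} times")

    # Assumed rule 2: the origin of a set is never a receiver of that set,
    # and no node can be its own provider.
    for sub_set, provider, receiver in sorted(set(rows)):
        if receiver == origin:
            problems.append(
                f"set {sub_set}: origin {origin} is listed as a receiver of {provider}"
            )
        if provider == receiver:
            problems.append(f"set {sub_set}: node {receiver} subscribes to itself")

    return problems


# Rows copied from the two tables in the report.
before = [(1, 101, 151), (1, 101, 201), (1, 201, 251)]
after = [(1, 201, 251), (1, 151, 201), (1, 201, 251)]

if __name__ == "__main__":
    print(check_subscriptions(before, origin=101))
    for line in check_subscriptions(after, origin=201):
        print(line)
```

Run against the "before" rows with node 101 as origin, the sketch reports nothing. Run against the "after" rows with node 201 as the expected new origin, it flags the repeated row for receiver 251 and the row that makes node 201 a receiver of node 151.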
I have tried correcting the problem by manually deleting the incorrect provider and then cleaning sl_confirm, sl_event, sl_seqlog and sl_setsync on all nodes with

    delete from sl_confirm;
    delete from sl_event;
    delete from sl_seqlog;
    delete from sl_setsync;

after which slon can be restarted, but slony still thinks the new provider node is replicated, as evidenced by

    slaveb=# insert into activation_code_prefix
    slaveb-# (code_prefix, product_id)
    slaveb-# values
    slaveb-# ('XX99', 300);
    ERROR: Slony-I: Table activation_code_prefix is replicated and cannot be modified on a subscriber node

In plain language, this is very, very bad. :(

A fix or workaround would be greatly appreciated.

TIA,
Melvin Davidson