Cyril Scetbon cscetbon.ext at orange-ftgroup.com
Wed Jul 22 05:23:28 PDT 2009
Did you try with Slony 1.2? Are there any differences in the code of the
failover or drop node functions?

Richard Yen wrote:
> Hi,
>
> I've been trying to get failover to work in 2.0.2, but it seems to hang.
>
> I have a 3-node architecture, and have tried the instructions, per 
> http://www.slony.info/documentation/failover.html#COMPLEXFAILOVER
>
> Here's how I do it (node 1 is provider, and node 2 is failover node):
>    -- subscribe node 3 to node 2
>    -- execute FAILOVER
>    -- slonik hangs
>
> If I go into node 2 and look at sl_subscribe, there is only one 
> row with provider=2, subscriber=3 (which is correct and expected).  
> Looking at sl_status, everything appears to be running just 
> fine (sl_event_lag and sl_time_lag go up and down, as if there's 
> activity).  HOWEVER, if I do an update on node 2, the update never 
> makes it to node 3.  (Node 1 still says provider=1, subscriber=2 AND 
> provider=2, subscriber=3.)
>
> slonik is still running/hanging during all this.
>
> If I strace the slonik process, I find the following:
>
> ======BEGIN STRACE======
> rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
> sendto(3, "Q\0\0\0\30begin transaction; \0"..., 25, 0, NULL, 0) = 25
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, 
> revents=POLLIN}])
> recvfrom(3, "C\0\0\0\nBEGIN\0Z\0\0\0\5T"..., 16384, 0, NULL, NULL) = 17
> rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
> sendto(3, "Q\0\0\0Wselect nl_backendpid from 
> \"_sltest\".sl_nodelock     where nl_backendpid <> 28927; \0"..., 88, 
> 0, NULL, 0) = 88
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, 
> revents=POLLIN}])
> recvfrom(3, 
> "T\0\0\0&\0\1nl_backendpid\0\304\27Dn\0\3\0\0\0\27\0\4\377\377\377\377\0\0D\0\0\0\17\0\1\0\0\0\00529006D\0\0\0\17\0\1\0\0\0\00529011D\0\0\0\17\0\1\0\0\0\00529012C\0\0\0\vSELECT\0Z\0\0\0\5T"..., 
> 16384, 0, NULL, NULL) = 105
> rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
> sendto(3, "Q\0\0\0\32rollback transaction;\0"..., 27, 0, NULL, 0) = 27
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, 
> revents=POLLIN}])
> recvfrom(3, "C\0\0\0\rROLLBACK\0Z\0\0\0\5I"..., 16384, 0, NULL, NULL) 
> = 20
> rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
> sendto(4, "Q\0\0\0\30begin transaction; \0"..., 25, 0, NULL, 0) = 25
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> poll([{fd=4, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=4, 
> revents=POLLIN}])
> recvfrom(4, "C\0\0\0\nBEGIN\0Z\0\0\0\5T"..., 16384, 0, NULL, NULL) = 17
> rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
> sendto(4, "Q\0\0\0Wselect nl_backendpid from 
> \"_sltest\".sl_nodelock     where nl_backendpid <> 16155; \0"..., 88, 
> 0, NULL, 0) = 88
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> poll([{fd=4, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=4, 
> revents=POLLIN}])
> recvfrom(4, 
> "T\0\0\0&\0\1nl_backendpid\0\0\1\"\203\0\3\0\0\0\27\0\4\377\377\377\377\0\0D\0\0\0\17\0\1\0\0\0\00517510D\0\0\0\17\0\1\0\0\0\00517511C\0\0\0\vSELECT\0Z\0\0\0\5T"..., 
> 16384, 0, NULL, NULL) = 89
> rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
> sendto(4, "Q\0\0\0\32rollback transaction;\0"..., 27, 0, NULL, 0) = 27
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> poll([{fd=4, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=4, 
> revents=POLLIN}])
> recvfrom(4, "C\0\0\0\rROLLBACK\0Z\0\0\0\5I"..., 16384, 0, NULL, NULL) 
> = 20
> rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
> rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> nanosleep({1, 0}, {1, 0})               = 0
> ======END STRACE======
>
> This repeats over and over again in the log (infinite loop?)
>
> I also tried, on a separate occasion, the script provided by slony-ctl, 
> but had no luck. (It DOES, however, work when there are only 2 nodes.)
>
> Are there any known issues for 3+ node failover in 2.0.2?
>
> Would anyone be able to walk me through this, if perhaps I'm doing 
> something wrong?
>
> Thanks!
> --Richard
> _______________________________________________
> Slony1-general mailing list
> Slony1-general at lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general
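
For what it's worth, here is how I read the quoted strace: on each connection slonik opens a transaction, selects nl_backendpid from sl_nodelock for every backend other than its own, gets rows back (29006, 29011, 29012 on fd 3; 17510, 17511 on fd 4), rolls back, sleeps one second and starts again. That looks like a wait for the other node locks to go away. The Python below is only a sketch of that reading, not slonik's actual C code. The names wait_for_node_locks and make_stub are hypothetical, the database query is replaced by a stub, and max_polls is my addition so that the sketch terminates (the trace shows no such limit).

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_node_locks(
    fetch_other_pids: Callable[[], Iterable[int]],
    own_pid: int,
    max_polls: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll until no other backend PID is listed, or give up.

    Returns the poll number on which the list came back empty,
    or None if it never did within max_polls attempts.
    """
    for attempt in range(1, max_polls + 1):
        # Stands in for: select nl_backendpid from sl_nodelock
        #                where nl_backendpid <> own_pid
        others = [pid for pid in fetch_other_pids() if pid != own_pid]
        if not others:
            return attempt
        # The trace shows nanosleep({1, 0}) between rounds.
        sleep(1.0)
    return None


def make_stub(pids, clears_after):
    """Hypothetical stand-in for the query; returns no rows after N calls."""
    calls = {"n": 0}

    def fetch():
        calls["n"] += 1
        if clears_after is not None and calls["n"] > clears_after:
            return []
        return list(pids)

    return fetch


def no_sleep(seconds):
    """Skip the real delay so the demonstration runs instantly."""


# Locks never released: mirrors the behaviour in the trace (fd 3).
stuck = wait_for_node_locks(make_stub([29006, 29011, 29012], None), 28927, 5, no_sleep)

# Locks released after two polls, e.g. once the other backends exit (fd 4 PIDs).
released = wait_for_node_locks(make_stub([17510, 17511], 2), 16155, 5, no_sleep)

print(stuck, released)
```

If this reading is right, the loop only ends once the rows for the other backend PIDs disappear from sl_nodelock, so it may be worth checking which processes own those PIDs on each node.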

-- 
Cyril SCETBON - Database Engineer
Database Unit
AUSY for France Télécom - OPF/PORTAILS/DOP/HEBEX

Tel: +33 (0)4 97 12 87 60
Jabber : cscetbon at jabber.org
France Telecom - Orange
790 Avenue du Docteur Maurice Donat 
Bâtiment Marco Polo C1 - Bureau 202
06250 Mougins
France



