Wed Jul 22 05:23:28 PDT 2009
- Previous message: [Slony1-general] slon engine pointing to the wrong directory
- Next message: [Slony1-general] Vacuum of sl_1 and 2 logs. (postgres)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Did you try with Slony 1.2? Are there any differences in the code of the failover or drop node functions?

Richard Yen wrote:
> Hi,
>
> I've been trying to get failover to work in 2.0.2, but it seems to hang.
>
> I have a 3-node architecture, and have tried the instructions, per
> http://www.slony.info/documentation/failover.html#COMPLEXFAILOVER
>
> Here's how I do it (node 1 is provider, and node 2 is failover node):
> -- subscribe node 3 to node 2
> -- execute FAILOVER
> -- slonik hangs
>
> If I go into node 2 and look at sl_subscribe, there is only one
> row with provider=2, subscriber=3 (which is correct and expected).
> However, looking at sl_status, it looks like everything is running just
> fine (sl_event_lag and sl_time_lag go up and down, as if there's
> activity). HOWEVER, if I do an update on node 2, the update never
> makes it to node 3. (Node 1 still says provider=1, subscriber=2 AND
> provider=2, subscriber=3)
>
> slonik is still running/hanging during all this.
>
> If I strace the slonik process, I find the following:
>
> ======BEGIN STRACE======
> rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
> sendto(3, "Q\0\0\0\30begin transaction; \0"..., 25, 0, NULL, 0) = 25
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
> recvfrom(3, "C\0\0\0\nBEGIN\0Z\0\0\0\5T"..., 16384, 0, NULL, NULL) = 17
> rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
> sendto(3, "Q\0\0\0Wselect nl_backendpid from \"_sltest\".sl_nodelock where nl_backendpid <> 28927; \0"..., 88, 0, NULL, 0) = 88
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
> recvfrom(3, "T\0\0\0&\0\1nl_backendpid\0\304\27Dn\0\3\0\0\0\27\0\4\377\377\377\377\0\0D\0\0\0\17\0\1\0\0\0\00529006D\0\0\0\17\0\1\0\0\0\00529011D\0\0\0\17\0\1\0\0\0\00529012C\0\0\0\vSELECT\0Z\0\0\0\5T"..., 16384, 0, NULL, NULL) = 105
> rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
> sendto(3, "Q\0\0\0\32rollback transaction;\0"..., 27, 0, NULL, 0) = 27
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
> recvfrom(3, "C\0\0\0\rROLLBACK\0Z\0\0\0\5I"..., 16384, 0, NULL, NULL) = 20
> rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
> sendto(4, "Q\0\0\0\30begin transaction; \0"..., 25, 0, NULL, 0) = 25
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> poll([{fd=4, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=4, revents=POLLIN}])
> recvfrom(4, "C\0\0\0\nBEGIN\0Z\0\0\0\5T"..., 16384, 0, NULL, NULL) = 17
> rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
> sendto(4, "Q\0\0\0Wselect nl_backendpid from \"_sltest\".sl_nodelock where nl_backendpid <> 16155; \0"..., 88, 0, NULL, 0) = 88
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> poll([{fd=4, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=4, revents=POLLIN}])
> recvfrom(4, "T\0\0\0&\0\1nl_backendpid\0\0\1\"\203\0\3\0\0\0\27\0\4\377\377\377\377\0\0D\0\0\0\17\0\1\0\0\0\00517510D\0\0\0\17\0\1\0\0\0\00517511C\0\0\0\vSELECT\0Z\0\0\0\5T"..., 16384, 0, NULL, NULL) = 89
> rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
> sendto(4, "Q\0\0\0\32rollback transaction;\0"..., 27, 0, NULL, 0) = 27
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> poll([{fd=4, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=4, revents=POLLIN}])
> recvfrom(4, "C\0\0\0\rROLLBACK\0Z\0\0\0\5I"..., 16384, 0, NULL, NULL) = 20
> rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
> rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> nanosleep({1, 0}, {1, 0}) = 0
> ======END STRACE======
>
> This repeats over and over again in the log (infinite loop?)
>
> I also tried a different time with the script provided by slony-ctl,
> but no luck. (It DOES, however, work when there are only 2 nodes)
>
> Are there any known issues for 3+ node failover in 2.0.2?
>
> Would anyone be able to walk me through this, if perhaps I'm doing
> something wrong?
>
> Thanks!
> --Richard
> _______________________________________________
> Slony1-general mailing list
> Slony1-general at lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general

--
Cyril SCETBON - Database Engineer
Database Unit
AUSY for France Télécom - OPF/PORTAILS/DOP/HEBEX
Tel: +33 (0)4 97 12 87 60
Jabber: cscetbon at jabber.org
France Telecom - Orange
790 Avenue du Docteur Maurice Donat
Bâtiment Marco Polo C1 - Bureau 202
06250 Mougins
France

***********************************
This message and any attachments (the 'message') are confidential and intended solely for the addressees. Any unauthorised use or dissemination is prohibited. Messages are susceptible to alteration. France Telecom Group shall not be liable for the message if altered, changed or falsified. If you are not recipient of this message, please cancel it immediately and inform the sender.
************************************
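For readers following the strace in the quoted message: the two recvfrom() replies to the sl_nodelock query are raw PostgreSQL backend messages, and the backend PIDs they carry can be read out by walking the length-prefixed frames. The sketch below is only an illustration, not part of Slony or slonik. The function name decode_data_rows and the constant FD3_REPLY are made up for this example, and it assumes the payload printed by strace is complete and uses text-format columns.

```python
import struct
from typing import List, Optional


def decode_data_rows(stream: bytes) -> List[List[Optional[str]]]:
    """Return the column values of every DataRow ('D') message in a
    PostgreSQL backend byte stream (text-format columns assumed)."""
    rows = []
    pos = 0
    # Each message is: 1 type byte, then a 4-byte big-endian length
    # that counts itself but not the type byte.
    while pos + 5 <= len(stream):
        msg_type = stream[pos:pos + 1]
        (length,) = struct.unpack("!I", stream[pos + 1:pos + 5])
        body = stream[pos + 5:pos + 1 + length]
        if msg_type == b"D":
            (ncols,) = struct.unpack("!H", body[:2])
            offset = 2
            row = []
            for _ in range(ncols):
                (collen,) = struct.unpack("!i", body[offset:offset + 4])
                offset += 4
                if collen == -1:  # -1 marks a NULL column
                    row.append(None)
                else:
                    row.append(body[offset:offset + collen].decode())
                    offset += collen
            rows.append(row)
        pos += 1 + length
    return rows


# Reply on fd 3, copied from the strace output in the quoted message
# (strace's octal escapes are also valid in a Python bytes literal).
FD3_REPLY = (
    b"T\0\0\0&\0\1nl_backendpid\0\304\27Dn\0\3\0\0\0\27\0\4\377\377\377\377\0\0"
    b"D\0\0\0\17\0\1\0\0\0\00529006"
    b"D\0\0\0\17\0\1\0\0\0\00529011"
    b"D\0\0\0\17\0\1\0\0\0\00529012"
    b"C\0\0\0\vSELECT\0Z\0\0\0\5T"
)

if __name__ == "__main__":
    for row in decode_data_rows(FD3_REPLY):
        print(row[0])
```

Read this way, each pass of the loop gets back a non-empty list of other backend PIDs from sl_nodelock on both connections before sleeping for one second. That suggests, as an interpretation of the trace rather than a confirmed diagnosis, that slonik is polling until those sl_nodelock entries go away.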