bugzilla-daemon at main.slony.info
Fri May 13 11:24:12 PDT 2011
http://www.slony.info/bugzilla/show_bug.cgi?id=213

           Summary: failover while slon is not running
           Product: Slony-I
           Version: devel
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: low
         Component: slon
        AssignedTo: slony1-bugs at lists.slony.info
        ReportedBy: ssinger at ca.afilias.info
                CC: slony1-bugs at lists.slony.info
   Estimated Hours: 0.0


This bug was introduced in commits bfa8e601fe7ba1bd91a053901426d4f7195c53a0
(2.1.0) and 60566590d683b85733404ef290e6c1823c4c014c (2.0.5).

If a failover command is executed while the slon for the backup node (say
node 2) is not running, the most-ahead node (say node 3) will have a
FAILOVER_SET event generated with ev_origin=1 (the failing node).

For the failover to finish, that event needs to be processed on node 2.  When
the slon for node 2 is later started, it sees that no_active=false for the
failed node in sl_node (this change was made in the above referenced commits).
Since the failed node is inactive, no remoteWorkerThread_1 is started, so the
slon for node 2 will never process the FAILOVER_SET event, because that event
has ev_origin=1.
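
The startup behaviour described above can be illustrated with a small model.
This is only a simplified sketch, not slon's actual code: the Node and Event
classes and both function names are hypothetical, and the only rule modelled
is that a remote worker thread is started for every active node other than
the local one.

```python
from dataclasses import dataclass


@dataclass
class Node:
    no_id: int        # mirrors sl_node.no_id
    no_active: bool   # mirrors sl_node.no_active


@dataclass
class Event:
    ev_origin: int    # node the event originates from
    ev_type: str


def remote_worker_origins(local_id, nodes):
    """Origins that get a remote worker thread (assumed rule: active, non-local)."""
    return {n.no_id for n in nodes if n.no_id != local_id and n.no_active}


def stuck_events(local_id, nodes, events):
    """Remote events that no worker thread on the local node will ever pick up."""
    workers = remote_worker_origins(local_id, nodes)
    return [e for e in events
            if e.ev_origin != local_id and e.ev_origin not in workers]


# sl_node as seen by node 2 after the failover: node 1 is marked inactive.
nodes = [Node(1, False), Node(2, True), Node(3, True)]
events = [Event(1, "FAILOVER_SET")]

print(remote_worker_origins(2, nodes))   # only node 3 gets a worker
print(stuck_events(2, nodes, events))    # the FAILOVER_SET event is stuck
```

In this model, flipping no_active back to true for node 1 makes the stuck
list empty, which is the effect the manual change to sl_node relies on.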


As a workaround, if you get into this situation, you can manually (with psql)
set no_active=true for the failed node in sl_node on node 2, and then start
the slon for node 2.  It will now have a remoteWorkerThread_1 and will process
the FAILOVER_SET event.
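
A minimal sketch of the statement for the workaround follows. The helper name
and the cluster name "mycluster" are hypothetical; the sketch assumes the
usual Slony-I layout in which the cluster schema is the cluster name prefixed
with an underscore, and that the failed node is node 1 as in the example
above. The generated statement would be run with psql against node 2's
database before starting its slon.

```python
def reactivate_failed_node_sql(cluster_name: str, failed_node_id: int) -> str:
    """Build the UPDATE that marks the failed node active again in sl_node.

    cluster_name is a placeholder for your own cluster; the schema is
    assumed to be "_<cluster_name>".
    """
    # Quote the schema identifier, doubling any embedded double quotes.
    schema = '"_' + cluster_name.replace('"', '""') + '"'
    return (
        f"UPDATE {schema}.sl_node "
        f"SET no_active = true WHERE no_id = {int(failed_node_id)};"
    )


# Hypothetical cluster name; node 1 is the failed node in this report.
print(reactivate_failed_node_sql("mycluster", 1))
```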

Longer term, we probably need to separate a node's inactive status as used for
rebuilding listen paths and for waiting from the status that decides whether
slon starts worker threads for that node.


