Christopher Browne cbbrowne at ca.afilias.info
Thu Jan 14 09:06:31 PST 2010
Andy Dale wrote:
> Hi,
>
> I have attempted to investigate further into why the failover/drop 
> node is not being picked up on node 3.  Here is the actual output of 
> the slonik script in my original post:
>
> [oper at backup slonik]$ slonik forceProviderChangeToBackup.sk
> INFO: calling failedNode(1,2) on node 1
> forceProviderChangeToBackup.sk:9: NOTICE:  failedNode: set 1 has other 
> direct receivers - change providers only
> forceProviderChangeToBackup.sk:9: NOTICE:  failedNode: set 2 has no 
> other direct receivers - move now
> forceProviderChangeToBackup.sk:9: NOTICE:  failedNode: set 3 has no 
> other direct receivers - move now
> INFO: calling failedNode(1,2) on node 3
> forceProviderChangeToBackup.sk:9: NOTICE:  failedNode: set 1 has other 
> direct receivers - change providers only
> forceProviderChangeToBackup.sk:9: NOTICE:  failedNode: set 2 has no 
> other direct receivers - move now
> forceProviderChangeToBackup.sk:9: NOTICE:  failedNode: set 3 has no 
> other direct receivers - move now
> INFO: Waiting for slon engines to restart
> IMPORTANT: Last known SYNC for set 1 = 383
> INFO: Node with highest sync for set 1 is 2
> INFO: Node with highest sync for set 2 is 2
> INFO: Node with highest sync for set 3 is 2
>
> After inspecting the logfile generated by the slon process at node 
> 3, it seems to pick up on the fact that the set has been moved to 
> node 2, but it does not remove node 1.
>
> DEBUG2 remoteWorkerThread_2: Received event 2,180 ACCEPT_SET
> DEBUG2 start processing ACCEPT_SET
> DEBUG2 ACCEPT: set=1
> DEBUG2 ACCEPT: old origin=1
> DEBUG2 ACCEPT: new origin=2
> DEBUG2 ACCEPT: move set seq=384
> DEBUG2 got parms ACCEPT_SET
> DEBUG2 ACCEPT_SET - node not origin
> DEBUG2 remoteListenThread_2: queue event 2,183 SYNC
> DEBUG2 remoteListenThread_2: queue event 2,184 DROP_NODE
> DEBUG2 remoteListenThread_2: queue event 2,185 SYNC
> DEBUG2 ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep
> ERROR  slon_connectdb: PQconnectdb("dbname=db host=node1 port=5432 
> user=postgres") failed - could not connect to server: Connection refused
>         Is the server running on host "node 1" and accepting
>         TCP/IP connections on port 5432?
> WARN   remoteListenThread_1: DB connection failed - sleep 10 seconds
> DEBUG2 syncThread: new sl_action_seq 1 - SYNC 181
> DEBUG2 remoteListenThread_2: LISTEN
> DEBUG2 remoteListenThread_2: queue event 2,186 SYNC
> DEBUG2 remoteListenThread_2: UNLISTEN
> DEBUG2 ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep
> DEBUG2 localListenThread: Received event 3,181 SYNC
> ERROR  slon_connectdb: PQconnectdb("dbname=db host=node1 port=5432 
> user=postgres") failed - could not connect to server: Connection refused
>         Is the server running on host "node 1" and accepting
>         TCP/IP connections on port 5432?
>
>
> Does the below line mean it is waiting for some kind of notification 
> from somewhere? :
>     DEBUG2 ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep
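Yup, that indicates that node #3 hasn't completed the failover.  It has 
processed the ACCEPT_SET event from node 2, but it is still waiting for 
the matching MOVE_SET or FAILOVER_SET event, so it hasn't fully accepted 
the new provider.
I'm not sure what to suggest on that.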
>
> Additionally, does anyone know how to make the slon logs contain a 
> timestamp (e.g. DEBUG2 [2009-01-14 12:12] syncThread), as I find it 
> pretty hard to follow what is going on when comparing the log files at 
> multiple nodes.
http://www.slony.info/documentation/runtime-config.html

See the slon parameter log_timestamp.
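
As a rough sketch of what that could look like in a slon runtime 
configuration file: the parameter names below are the ones I'd expect 
from the runtime-config page above, and the format string is only an 
illustrative assumption, so check both against the documentation for 
your Slony-I version.

```
# slon.conf fragment (sketch, verify against your version's docs)

# Prefix every log line with a timestamp.
log_timestamp=true

# strftime-style format for that timestamp (assumed example value).
log_timestamp_format='%Y-%m-%d %H:%M:%S %Z'
```

Using the same format on every node makes it much easier to line up 
the logs side by side.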

Personally, I prefer to use syslog to collect slon logs, so that the 
timestamps are generated by syslog.  I know our DBAs don't use that; 
your mileage may vary.
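
Once the logs carry timestamps, interleaving the files from several 
nodes into one timeline is easy to script.  Here is a minimal sketch in 
Python; it is not part of Slony-I.  It assumes that every log entry 
starts with a "YYYY-MM-DD HH:MM:SS" prefix, that all nodes log in the 
same timezone, and that each file is already in chronological order. 
The names parse_entries and merge_logs are made up for illustration.

```python
from datetime import datetime
import heapq

# Width of the assumed "YYYY-MM-DD HH:MM:SS" prefix on each log entry.
TS_LEN = len("2010-01-14 09:06:31")


def parse_entries(node, lines):
    """Yield (timestamp, node, text) tuples for one node's log.

    Lines that do not start with a timestamp (for example the indented
    continuation of a connection error) are appended to the previous entry.
    """
    current = None
    for line in lines:
        line = line.rstrip("\n")
        try:
            ts = datetime.strptime(line[:TS_LEN], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            if current is not None:
                current[2] += "\n" + line
            continue
        if current is not None:
            yield tuple(current)
        current = [ts, node, line[TS_LEN:].strip()]
    if current is not None:
        yield tuple(current)


def merge_logs(logs):
    """Merge logs (a dict of node name -> iterable of lines) by timestamp.

    Each per-node stream must already be sorted by time; ties keep the
    order in which the nodes were given.
    """
    streams = [parse_entries(node, lines) for node, lines in logs.items()]
    return list(heapq.merge(*streams, key=lambda entry: entry[0]))
```

To use it, pass open file objects keyed by node name, e.g. 
merge_logs({"node2": open(path2), "node3": open(path3)}) where path2 
and path3 are whatever your log file paths happen to be, and print each 
entry with its node name in front.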

