Jan Wieck jan at wi3ck.info
Tue Oct 14 09:20:30 PDT 2014
On 10/14/2014 11:57 AM, Glyn Astill wrote:
>> From: Granthana Biswas <granthana.biswas at gmail.com>
>>To: Glyn Astill <glynastill at yahoo.co.uk>
>>Cc: "slony1-general at lists.slony.info" <slony1-general at lists.slony.info>
>>Sent: Tuesday, 14 October 2014, 16:48
>>Subject: [Slony1-general] Changing master node's IP & port
>>
>>Hi Glyn,
>>
>>Yes, I had stopped all the slons for every node, but I did not run: DELETE FROM "_Cluster1".sl_nodelock WHERE nl_nodeid = 1 AND nl_conncnt = 0;
>>
>
> Well, you shouldn't have had to run the delete; that's just to get you going.  I assume you're up and running now?

The cleanup procedure for nodelock checks whether the backend process 
holding the lock (if any) is still alive. If I had to guess, someone or 
something changed the IP address of the server without stopping the 
slon processes first, and that address change killed the connections 
without the TCP peers ever seeing RST or FIN packets. In that situation 
the database backend from the old slon connection is most likely 
sitting in a blocking read and will only notice that the connection is 
gone after a full TCP keepalive timeout, which defaults to several hours.
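
For example, you can check whether the PID recorded in sl_nodelock 
still belongs to a live backend. A minimal sketch, assuming the 
"_Cluster1" cluster name from this thread and the nl_backendpid column, 
on PostgreSQL 9.2 or later (older releases call the pg_stat_activity 
column procpid):

  -- rows with backend_alive = false are stale locks
  SELECT nl.nl_nodeid, nl.nl_backendpid,
         (sa.pid IS NOT NULL) AS backend_alive
    FROM "_Cluster1".sl_nodelock nl
    LEFT JOIN pg_stat_activity sa ON sa.pid = nl.nl_backendpid;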

Terminating the Slony-related database connections via 
pg_terminate_backend() will make the nodelock cleanup succeed.
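
Something along these lines should do it; a sketch under the same 
assumptions as above (cluster name "_Cluster1", node id 1 from this 
thread):

  -- terminate whatever backend still holds the node 1 lock
  SELECT pg_terminate_backend(nl_backendpid)
    FROM "_Cluster1".sl_nodelock
   WHERE nl_nodeid = 1;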

In any case it is good practice to use considerably more aggressive TCP 
keepalive settings on both sides, PostgreSQL and Slony. At my former 
workplace we used to set them to 60 seconds idle, then 9 keepalive 
probes at 7 second intervals. That lets connections time out in about 
two minutes.
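
Those numbers map onto the standard knobs. On the PostgreSQL side, in 
postgresql.conf:

  tcp_keepalives_idle = 60
  tcp_keepalives_interval = 7
  tcp_keepalives_count = 9

On the Slony side the conn_info string is an ordinary libpq connection 
string, so the matching libpq keepalive options can go there 
(host/dbname/user below are placeholders, adjust for your setup):

  conn_info='host=db1 dbname=mydb user=slony keepalives=1 keepalives_idle=60 keepalives_interval=7 keepalives_count=9'

That works out to 60 + 9 * 7 = 123 seconds, hence the roughly two minutes.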


Regards, Jan


-- 
Jan Wieck
Senior Software Engineer
http://slony.info

