bugzilla-daemon at main.slony.info
Tue Aug 3 10:51:05 PDT 2010
http://www.slony.info/bugzilla/show_bug.cgi?id=132

Steve Singer <ssinger at ca.afilias.info> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|low                         |medium

--- Comment #1 from Steve Singer <ssinger at ca.afilias.info> 2010-08-03 10:51:05 PDT ---
I've seen this happen several more times.

I think the following is happening.

-slon_retry() is called following the move set.
-slon_retry() signals the parent slon process, which in turn sends a kill to
the child.
-The child exits. I can find no 'cleanup' step that the child runs on a kill
to ensure the PostgreSQL connections are closed or to remove its entries from
sl_nodelock.
-The child is restarted by the parent.
-The child calls cleanupNodelock(), which checks whether the pid of the
backend registered in sl_nodelock is still around. I think the old backend
process is sometimes still around (it hasn't yet exited), maybe because it is
in the middle of a query and hasn't yet noticed that the slon it was talking
to has gone away.
-Since the backend process is still around, the row isn't deleted from
sl_nodelock, causing the insert into sl_nodelock to fail.
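
To make the suspected race concrete, here is a toy model of the sequence above. It is only a sketch, not Slony's code: NodeLockTable, cleanup_stale(), insert() and restart_child() are hypothetical names, and the model assumes one lock row per node and that a second insert for the same node fails.

```python
class NodeLockTable:
    """Stand-in for sl_nodelock: at most one row per node id (assumption)."""

    def __init__(self):
        self.rows = {}  # node id -> pid of the backend holding the lock

    def cleanup_stale(self, node_id, live_backend_pids):
        """Model of the cleanupNodelock() check described above:
        drop the row only if its backend pid is no longer alive."""
        pid = self.rows.get(node_id)
        if pid is not None and pid not in live_backend_pids:
            del self.rows[node_id]

    def insert(self, node_id, backend_pid):
        """Register a lock row; fails if a row for the node still exists."""
        if node_id in self.rows:
            raise RuntimeError("node %d is already locked" % node_id)
        self.rows[node_id] = backend_pid


def restart_child(table, node_id, new_backend_pid, live_backend_pids):
    """What the restarted child does: clean up, then try to take the lock.
    Returns True if the lock was taken, False if the insert failed."""
    table.cleanup_stale(node_id, live_backend_pids)
    try:
        table.insert(node_id, new_backend_pid)
        return True
    except RuntimeError:
        return False
```

In this model, if the old backend's pid is still in live_backend_pids when the child restarts, restart_child() returns False, which corresponds to the failed insert. Once the old backend has gone away, the same call succeeds.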

Since I've seen this happen more than once, and it causes the watchdog to
exit as well, I am bumping the priority.

Options to fix this include:
1) Having the slon worker exit properly and remove itself from the
sl_nodelock table before exiting.
2) Increasing the 'sleep' time before restarting the child. This doesn't
really fix the problem; it just makes it less likely.
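
A rough sketch of what option 1 could look like, written in Python rather than slon's C just to show the shape. It assumes the parent's kill is a catchable signal such as SIGTERM (a SIGKILL could not be handled this way). Worker and release_lock are hypothetical names; release_lock stands for whatever would delete the worker's sl_nodelock row and close its connections.

```python
import signal


class Worker:
    """Worker loop that releases its lock before exiting (illustrative only)."""

    def __init__(self, release_lock):
        # release_lock: hypothetical callback that removes the worker's
        # lock row and closes its database connections.
        self.release_lock = release_lock
        self.stop_requested = False

    def install_handler(self):
        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, self._on_term)

    def _on_term(self, signum, frame):
        # Only set a flag here; the real cleanup happens in run().
        self.stop_requested = True

    def run(self, do_work):
        try:
            while not self.stop_requested:
                do_work()
        finally:
            # Runs on a requested stop and also if do_work() raises.
            self.release_lock()
```

The handler only sets a flag, so the cleanup runs in the normal flow of the worker rather than inside the signal handler. The cost is that the worker only notices the request between calls to do_work(), so a long-running step delays the exit.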

-- 
Configure bugmail: http://www.slony.info/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
You are the assignee for the bug.

