Andrew Sullivan ajs at crankycanuck.ca
Mon Mar 24 07:18:40 PDT 2008
On Mon, Mar 24, 2008 at 11:10:40AM +0200, Henry wrote:

> Let's say you have many slaves being rep'd from a master.  Sometimes, one
> of these slaves will fall behind in a big way.  Even stopping all activity
> on all systems to allow it to catch up doesn't resolve the problem.

If you restart the slons, does it help?
 
> My question is the following:  from an admin point of view in trying to
> resolve this kind of issue, what slony tables should I poke around in (and
> what flag/s should I take note of), and what errors/footprints should I
> look for in the slony logs which might be contributing to the node in
> question never catching up?

It's hard to say in your case, because you've given us so little to work
with.  But I'd start by looking at the sl_status view in the cluster's
schema (_slony_schema here stands for "_" plus your cluster name).  I'd
also have a look at the SYNCs in the slon logs for the origin and that
replica, and compare them with the slon logs from a working replica.  I'd
also look at the pg_locks view on the affected node.
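
As a rough sketch of those first two checks (not a definitive procedure):
the Python below builds a lag query against sl_status and picks out the
nodes that look stuck.  The cluster name, the thresholds, and the helper
names status_query and lagging are made up for illustration.  It assumes
the query is run on the origin node and that rows come back as tuples with
st_lag_time as a timedelta, which is how psycopg2 returns an interval.

```python
from datetime import timedelta

# Locks that are being waited on; meant to be run on the affected node.
UNGRANTED_LOCKS = (
    "SELECT locktype, relation::regclass AS relation, mode, pid "
    "FROM pg_locks WHERE NOT granted"
)


def status_query(cluster):
    """Build a lag query against sl_status for the given cluster name."""
    # Slony-I keeps its catalog in a schema named "_" + cluster name.
    if not cluster.replace("_", "").isalnum():
        raise ValueError("unexpected characters in cluster name")
    return (
        "SELECT st_origin, st_received, st_lag_num_events, st_lag_time "
        f'FROM "_{cluster}".sl_status ORDER BY st_lag_num_events DESC'
    )


def lagging(rows, max_events=100, max_time=timedelta(minutes=10)):
    """Return the st_received node ids whose lag exceeds either threshold.

    The default thresholds are arbitrary placeholders, not recommendations.
    """
    return [r[1] for r in rows if r[2] > max_events or r[3] > max_time]
```

Feed the rows returned by status_query("mycluster") to lagging(), then run
UNGRANTED_LOCKS on whichever node it reports to see whether something there
is waiting on a lock.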

> My (horribly noob) solution so far has been to stop everything, drop
> replication systems from all nodes, and start again (a process which can
> throw a week in the drain).

That does not seem to be a great idea, I agree.  You could improve this
global thermonuclear war option to be merely a neutron bomb by performing a
DROP NODE for just the bad node.  But it'd be better to figure out what's
wrong.
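
For the neutron-bomb route, here is a minimal sketch of generating the
slonik script that drops a single node.  The cluster name, node ids, and
conninfo strings are placeholders, and drop_node_script is a hypothetical
helper, not part of Slony.  Which nodes need an ADMIN CONNINFO line can
depend on your Slony version, so check the slonik documentation for your
release before running anything.

```python
def drop_node_script(cluster, bad_node, event_node, conninfo):
    """Render a slonik script that drops one node.

    conninfo maps node id -> libpq connection string for the admin
    connections the script should declare (placeholder values here).
    """
    if bad_node == event_node:
        raise ValueError("the event node must be a surviving node")
    lines = [f"CLUSTER NAME = {cluster};"]
    for node_id in sorted(conninfo):
        lines.append(f"NODE {node_id} ADMIN CONNINFO = '{conninfo[node_id]}';")
    # The DROP NODE event is generated on event_node, not on the bad node.
    lines.append(f"DROP NODE (ID = {bad_node}, EVENT NODE = {event_node});")
    return "\n".join(lines) + "\n"
```

The resulting text would be fed to slonik on standard input.  Afterwards the
dropped node would have to be set up and subscribed again from scratch.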

A


More information about the Slony1-general mailing list