Henry henry at zen.co.za
Mon Mar 24 10:54:39 PDT 2008

On Mon, March 24, 2008 4:18 pm, Andrew Sullivan wrote:
> On Mon, Mar 24, 2008 at 11:10:40AM +0200, Henry wrote:
>
>> Let's say you have many slaves being rep'd from a master.  Sometimes,
>> one of these slaves will fall behind in a big way.  Even stopping all
>> activity on all systems to allow it to catch up doesn't resolve the
>> problem.
>
> If you restart the slons, does it help?

Yes, that's always been my first step.  The problem is that there's still a
very slow trickle of modifications coming in from outside the cluster (with
the major-volume modifications stopped), and even this slow trickle seems to
prevent the offending node from catching up.

>
>> My question is the following:  from an admin point of view, when trying
>> to resolve this kind of issue, which Slony tables should I poke around
>> in (and which flag/s should I take note of), and what errors/footprints
>> should I look for in the Slony logs that might explain why the node in
>> question never catches up?
>
> It's sort of impossible to say in your case, because you've given us so
> little to work with.  But I'd start looking at _slony_schema.sl_status.
> I'd also have a look at the syncs in the logs from the slons for the
> origin and that replica, and compare with the slon logs from a working
> replica.  I'd also look at the pg_locks view on the affected node.
>
>> My (horribly noob) solution so far has been to stop everything, drop
>> replication systems from all nodes, and start again (a process which
>> can throw a week down the drain).
>
> That does not seem to be a great idea, I agree.  You could improve this
> global thermonuclear war option to be merely a neutron bomb by performing
> a DROP NODE for just the bad node.  But it'd be better to figure out
> what's wrong.
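
[For reference, a minimal diagnostic pass along the lines Andrew describes
might look like the following.  This assumes a cluster named "mycluster",
so the Slony schema is _mycluster -- substitute your own cluster name.]

```sql
-- On the origin: how far behind is each subscriber?
-- st_lag_num_events and st_lag_time grow without bound on a stuck node.
SELECT st_origin, st_received, st_lag_num_events, st_lag_time
  FROM _mycluster.sl_status;

-- On the affected replica: is anything waiting on locks that could
-- be blocking the slon's work?
SELECT locktype, relation::regclass, pid, mode, granted
  FROM pg_locks
 WHERE NOT granted;
```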

Now that sounds like a killer idea (at least until I figure out what the
hell's causing the problem).  I'll poke around and see if I can figure out
how to drop a node and re-subscribe it (not sure if that's idiomatically
correct).
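
[For what it's worth, the drop-and-resubscribe sequence is usually
expressed as a slonik script along these lines.  The node IDs, set ID, and
conninfo strings below are made-up examples -- adjust them to match how
the cluster was originally built.]

```
# drop_and_resubscribe.slonik -- sketch only
cluster name = mycluster;
node 1 admin conninfo = 'dbname=app host=master';
node 2 admin conninfo = 'dbname=app host=replica';

# Remove the lagging node entirely; node 1 announces the event.
drop node (id = 2, event node = 1);

# Then re-create the node and subscribe it afresh, as when the
# cluster was first built:
# store node (id = 2, comment = 'replica');
# store path (server = 1, client = 2, conninfo = 'dbname=app host=master');
# subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
```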

h



More information about the Slony1-general mailing list