[Slony1-general] Feature Idea: improve performance when a large sl_log backlog exists

Tue Nov 23 08:15:28 PST 2010

On 10-11-23 09:48 AM, Vick Khera wrote:
> On Tue, Nov 23, 2010 at 9:31 AM, Steve Singer <ssinger at ca.afilias.info> wrote:
>> Slony can get into a state where it can't keep up/catch up with
>> replication because the sl_log table is so large.
>>
>>
>> Does this problem bite people often enough in the real world for us to
>> devote effort to fixing?
>>
>
> It used to happen to me a lot when I had my origin running on spinning
> media.  Ever since I moved to an SSD, it doesn't really happen.  At
> worst, when I do a large delete, I fall behind by a few minutes but it
> catches up quickly.  For me, it didn't even require taking the DB down
> for any extended period; just running a large update or delete that
> touched a great many rows (i.e., generated a lot of events in sl_log)
> could send the system into a tailspin that would take hours or possibly
> days (until we hit a weekend) to recover from.
>
> I am not sure it was caused by the log being too big... because
> sometimes reindexing the tables on the replica would clear up the
> backlog quickly too.  But I may be sniffing down the wrong trail.
>

The other place this will hit busy systems is during the initial sync.
If your database is very large (or very busy), a lot of log rows can
accumulate while that initial sync is going on.  OMIT_COPY doesn't help
you because it requires an outage to get the master and slave in sync
(the load time alone on a 1TB database is substantial).

CLONE PREPARE/FINISH don't help either, because a) they only work if
you already have at least one subscriber set up, and b) after you do the
CLONE PREPARE, any later transactions still need to be kept in sl_log
until the new slave is up and running.
