[Slony1-general] Manually kicking off a logswitch

Wed Sep 19 16:11:42 PDT 2012

On 9/19/2012 6:46 PM, Brian Fehrle wrote:
> Hi all,
>
> Postgres 8.4, slony 1.2.21
>
> Previously I reached out about an issue where the sl_log_1 or sl_log_2
> table would get so full that replication slowed to a crawl, processing
> only one event at a time. It seems that HUGE inserts, updates, or
> deletes on replicated tables are the cause, and since we're on Slony
> 1.2 there isn't much we can do to get around it.
>
> We saw this issue once the sl_log table grew above 9 million rows. We
> are now going down the path of doing much smaller groups of updates so
> we don't get into the same condition as before. We just did 1.2 million
> rows' worth of updates and it took only a few minutes to replicate it
> all to the slave. Good news.
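A minimal sketch of the "much smaller groups of updates" approach, written in Python against a DB-API connection (psycopg2-style `%s` placeholders assumed). The table `my_table`, the `id` key column and the `SET` clause are hypothetical placeholders. The only idea it illustrates is that each key range is committed as its own transaction, so the changes can be spread across several SYNC events rather than landing in one.

```python
import time
from typing import Iterator, Tuple

# Hypothetical statement: my_table, id and the SET clause are placeholders.
# The "%s" placeholders assume psycopg2's paramstyle.
UPDATE_SQL = "UPDATE my_table SET processed = true WHERE id BETWEEN %s AND %s"


def id_batches(min_id: int, max_id: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (lo, hi) key ranges that together cover [min_id, max_id]."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    lo = min_id
    while lo <= max_id:
        hi = min(lo + batch_size - 1, max_id)
        yield lo, hi
        lo = hi + 1


def run_batched_update(conn, min_id: int, max_id: int,
                       batch_size: int = 50_000, pause_s: float = 0.0) -> int:
    """Apply UPDATE_SQL one key range at a time, committing after each range.

    conn is any DB-API 2.0 connection. Returns the number of batches run.
    """
    batches = 0
    cur = conn.cursor()
    try:
        for lo, hi in id_batches(min_id, max_id, batch_size):
            cur.execute(UPDATE_SQL, (lo, hi))
            conn.commit()  # many small transactions instead of one huge one
            batches += 1
            if pause_s > 0:
                time.sleep(pause_s)  # optional breathing room for the subscriber
    finally:
        cur.close()
    return batches
```

For example, `run_batched_update(conn, 1, 1_200_000, batch_size=50_000, pause_s=5)` would issue 24 separate transactions.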
>
> But now our sl_log_1 table is sitting at 1.2 million rows, and we'd like
> Slony to switch and truncate it before we kick off a few million more
> rows' worth of updates. From what I can tell from the documentation,
> this doesn't happen all that often.
>
> So what are your thoughts on manually kicking off the log switch via
> "select _slony.logswitch_start()" on the master? I reviewed the code,
> and it won't let a switch occur if one is already in progress, so it
> seems pretty safe. However, it looks like all it really does is update
> a sequence to say "we're currently switching", and then Slony does the
> actual work in the background.
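A minimal sketch of doing this from a script, assuming the cluster schema `_slony` from the question and a DB-API (e.g. psycopg2) connection. It also assumes, based on my reading of the 1.2 sources rather than anything stated in this thread, that the sequence `_slony.sl_log_status` holds 0 or 1 when no switch is pending and 2 or 3 while one is; verify that for your version before relying on it. Since `logswitch_start()` already refuses to start a second switch, the pre-check only avoids provoking that error.

```python
import re

# Assumed encoding of <schema>.sl_log_status (verify for your version):
# 0 or 1 = no switch pending; 2 or 3 = switch started, old log not yet truncated.
IDLE_STATES = {0, 1}
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def start_logswitch(conn, schema: str = "_slony") -> bool:
    """Call <schema>.logswitch_start() only if no switch is pending.

    Returns True if a switch was requested, False if one is still running.
    """
    # Identifiers cannot be bound as query parameters, so validate the
    # schema name before putting it into the SQL text.
    if not _IDENT.fullmatch(schema):
        raise ValueError(f"unsafe schema name: {schema!r}")
    cur = conn.cursor()
    try:
        cur.execute(f"SELECT last_value FROM {schema}.sl_log_status")
        (status,) = cur.fetchone()
        if status not in IDLE_STATES:
            conn.rollback()  # end the read-only transaction, change nothing
            return False
        cur.execute(f"SELECT {schema}.logswitch_start()")
        conn.commit()
        return True
    finally:
        cur.close()
```

Run from cron, a False return simply means the previous switch is still waiting for its old log table to drain.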

The issue itself is caused by a problem with the log select query that 
was fixed in 2.1 (commit d4118d... from Jan 27, 2011).

>
> So my questions are: 1. Is this a safe practice? We may be doing it
> multiple times a day (guesstimate: ten or more times). 2. What is
> Slony doing in the background for this to occur? It looks like it
> switches to the new log right away, but takes some time before the old
> log is truncated. Does it need to wait until cleanupEvent can run on
> the data within, i.e. about 10 minutes? (#2 is more out of curiosity.)

slon calls the stored procedure cleanupEvent(interval). You can safely
call it yourself with an interval of a few minutes. The interval is the
minimum age an event must have before it is purged from sl_event. Even a
zero interval should be safe.
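A sketch of calling it by hand from Python, assuming the `_slony` schema, psycopg2-style `%s` placeholders, and the single-argument `cleanupEvent(interval)` form described above. The argument list of this function has changed between Slony releases, so check its signature on your own installation first. The minimum age is passed as a string such as `'300 seconds'` and cast to `interval` in SQL.

```python
import re
from datetime import timedelta

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def cleanup_sql(schema: str = "_slony") -> str:
    """Build the call; the schema name is validated, not parameterized."""
    if not _IDENT.fullmatch(schema):
        raise ValueError(f"unsafe schema name: {schema!r}")
    # Assumed signature: cleanupEvent(interval), as described in the reply.
    return f"SELECT {schema}.cleanupEvent(%s::interval)"


def run_cleanup(conn, min_age: timedelta = timedelta(minutes=5),
                schema: str = "_slony") -> None:
    """Purge events older than min_age from sl_event (zero should be safe)."""
    if min_age < timedelta(0):
        raise ValueError("min_age must not be negative")
    # Send the age as e.g. "300 seconds"; PostgreSQL casts it to interval.
    age = f"{int(min_age.total_seconds())} seconds"
    cur = conn.cursor()
    try:
        cur.execute(cleanup_sql(schema), (age,))
        conn.commit()
    finally:
        cur.close()
```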

However, Slony normally tries to do this every 10 minutes (if memory
serves). With a backlog that size, it is very likely that the previous
log switch hasn't finished yet, because the backlog is still in the
"old" log table.
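To make the timing concrete, here is a toy Python model of the switch; it does not talk to a database. The state numbering is an assumption taken from my reading of the 1.2 `sl_log_status` sequence, not something stated in this thread. What it captures is the point above: new writes go to the new log immediately, but the old log is only truncated once nothing in it is still waiting for subscribers, so a large backlog in the old table keeps the switch from finishing.

```python
from dataclasses import dataclass, field

# Assumed state encoding: state -> (active log, old log awaiting cleanup or None)
STATES = {
    0: ("sl_log_1", None),        # sl_log_1 active, sl_log_2 clean
    1: ("sl_log_2", None),        # sl_log_2 active, sl_log_1 clean
    2: ("sl_log_1", "sl_log_2"),  # switched to sl_log_1, sl_log_2 not yet truncated
    3: ("sl_log_2", "sl_log_1"),  # switched to sl_log_2, sl_log_1 not yet truncated
}


@dataclass
class LogSwitchModel:
    status: int = 0
    # Rows per log table that subscribers still need (not yet confirmed).
    pending_rows: dict = field(default_factory=lambda: {"sl_log_1": 0, "sl_log_2": 0})

    @property
    def active_log(self) -> str:
        return STATES[self.status][0]

    def write(self, n: int) -> None:
        """Replicated DML lands in whichever log is active right now."""
        self.pending_rows[self.active_log] += n

    def confirm(self, log: str, n: int) -> None:
        """Model subscribers confirming n rows of a log, so cleanup may drop them."""
        self.pending_rows[log] = max(0, self.pending_rows[log] - n)

    def logswitch_start(self) -> None:
        if self.status == 0:
            self.status = 3  # writes go to sl_log_2 immediately
        elif self.status == 1:
            self.status = 2  # writes go to sl_log_1 immediately
        else:
            raise RuntimeError("previous logswitch still in progress")

    def logswitch_finish(self) -> bool:
        """Periodic check (driven by slon's cleanup in the real system).

        Completes the switch only when the old log holds no needed rows.
        """
        old = STATES[self.status][1]
        if old is None:
            return False  # no switch in progress
        if self.pending_rows[old] > 0:
            return False  # backlog still lives in the old log
        self.status = 0 if old == "sl_log_2" else 1  # old log truncated
        return True
```

With 1.2 million unconfirmed rows in `sl_log_1`, `logswitch_finish()` keeps returning False and a second `logswitch_start()` is refused until those rows are confirmed.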

As I said, the problem is fixed in 2.1. If upgrading to Slony 2.1 is not
an option for you, you may consider backpatching the above commit into
your 1.2 installation.

Jan
