Brad Nicholson bnichols at ca.afilias.info
Thu Mar 18 05:28:44 PDT 2010
Venkatraju wrote:
> Hi,
>
> I have a two node Slony cluster running version 1.2.14 on PostgreSQL
> 8.1.11 on CentOS 5. I noticed that the subscriber had not been
> replicating for almost 6
> hours now (from sl_status). Slony logs on publisher and subscriber
> contain the following messages:
>
> NOTICE:  Slony-I: log switch to sl_log_2 still in progress - sl_log_1
> not truncated
> WARN   cleanupThread: "select "_cluster".logswitch_weekly(); " -
> ERROR: Previous logswitch still in progress
> CONTEXT:  SQL statement "SELECT  "_cluster".logswitch_start()"
>
> Both sl_log_1 and sl_log_2 tables contain a large number of rows
> (~500,000 rows in sl_log_1 and ~8 million in sl_log_2). Some
> questions:
>
> 1) We found some long running transactions active on the publisher.
> Could this cause logswitch to fail? The long running transactions were
> against a database that is not used in replication - could those
> queries hold up the logswitch?
>
>   
The long-running transactions could block the log switch.  The truncate 
of sl_log_1 and the switch back from sl_log_2 is waiting behind them.
Check pg_locks for ungranted locks - I expect you will find that the 
backend doing the log switch is waiting on a lock.
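
A rough way to check, assuming the standard 8.1 catalogs (pg_stat_activity 
uses procpid in that release):

```sql
-- Show any ungranted locks, with the backend and query stuck behind them
SELECT l.pid, l.relation::regclass, l.mode, l.granted, a.current_query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.procpid = l.pid
 WHERE NOT l.granted;

-- List backends by how long their current query has been running
SELECT procpid, usename, query_start, current_query
  FROM pg_stat_activity
 ORDER BY query_start;
```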

I would kill the long running transactions.  Slony does not function 
well on systems that have long-running transactions against the provider.

After killing them, run a vacuum on pg_listener as well.
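
Roughly like this - the PID here is a made-up example.  8.1 has 
pg_cancel_backend (pg_terminate_backend did not arrive until later), so 
if cancelling is not enough you would have to kill -TERM the backend 
process from the shell:

```sql
-- Cancel the offending backend's current query (12345 is a hypothetical PID
-- taken from pg_stat_activity.procpid)
SELECT pg_cancel_backend(12345);

-- Then clean up the listener table
VACUUM VERBOSE pg_listener;
```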
> 2) Slony maintenance page
> (http://www.slony.info/documentation/maintenance.html) says:
>
> "That means that on a regular basis, these tables are completely
> cleared out, so that you will not suffer from them having grown to
> some significant size, due to heavy load, after which they are
> incapable of shrinking back down."
>
> "incapable of shrinking back down"? Does this mean there is no way to
> recover from this state without rebuilding the DBs?
>   

You don't have to worry about that for the sl_log_1/2 tables.  Truncate 
gets rid of the dead space altogether.  Other tables in your system, 
however, may be bloated.

You don't need to rebuild your database to recover from this, but you do 
need to run some disruptive operations - VACUUM FULL + REINDEX (really 
slow), CLUSTER (not MVCC-safe in 8.1; be 100% sure there are no old 
transactions accessing your tables before you start if you don't want to 
lose data), or a no-op ALTER TABLE.  All of these operations will block 
access to the table while running.
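
For a hypothetical bloated table mytable with index mytable_pkey (the 
names are examples only), the three options look roughly like this - 
note that 8.1 still uses the older CLUSTER indexname ON tablename syntax:

```sql
-- Option 1: slow; rewrites the table, then rebuilds its indexes
VACUUM FULL mytable;
REINDEX TABLE mytable;

-- Option 2: rewrites the table in index order (8.1 syntax; not MVCC-safe
-- in that release)
CLUSTER mytable_pkey ON mytable;

-- Option 3: a no-op rewrite; in 8.1, retyping a column to its existing
-- type always forces a full table rewrite, dropping the dead space
ALTER TABLE mytable ALTER COLUMN id TYPE integer;
```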

Depending on the size of your DB and how badly it is bloated, reloading 
may be quicker.

-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


