Steve Singer ssinger at ca.afilias.info
Fri Feb 24 06:34:18 PST 2012
On 12-02-24 08:21 AM, Ulas Albayrak wrote:

You didn't say which version of Slony you are using with which version of 
PostgreSQL.

I don't see anything in the logs you posted about the slon for the 
origin node generating sync events.  At DEBUG2 or higher (at least on 
some versions of Slony) you should be getting "syncThread: new 
sl_action_seq %s" type messages in the log for the origin node's slon.
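
A quick cross-check (just a sketch, assuming the cluster schema is 
"_fleetcluster" as in your logs): on the origin database, see whether the 
action sequence the sync thread reads is still advancing.

   -- Run this twice, a few seconds apart, while writes hit replicated tables.
   -- If last_value never moves, no new actions are being logged to sync.
   SELECT last_value FROM "_fleetcluster".sl_action_seq;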

Are new SYNC events being generated in the origin's sl_event table with 
ev_origin=$originid?
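
Something like this should show it (a sketch; substitute your actual 
origin node id for the 1, and the schema name is taken from your logs):

   SELECT ev_origin,
          max(ev_seqno)     AS last_seqno,
          max(ev_timestamp) AS last_event
     FROM "_fleetcluster".sl_event
    WHERE ev_origin = 1        -- your origin node id
      AND ev_type = 'SYNC'
    GROUP BY ev_origin;

If last_seqno / last_event stop advancing while the origin keeps taking 
writes, the problem is on the event-generation side, not on the subscriber.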

Many versions of Slony require an exclusive lock on sl_event to generate 
sync events.  Do you have something preventing this?  (i.e., look in 
pg_locks to see if the slon sync connection is waiting on a lock).
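
A rough way to check (assuming a pre-9.2 PostgreSQL, where 
pg_stat_activity has procpid and current_query; on 9.2+ those become pid 
and query):

   -- list ungranted lock requests against sl_event and who is asking
   SELECT l.pid, l.mode, l.granted, a.current_query
     FROM pg_locks l
     JOIN pg_stat_activity a ON a.procpid = l.pid
    WHERE NOT l.granted
      AND l.relation = '_fleetcluster.sl_event'::regclass;

If one of the slon connections shows up here with granted = false, 
whatever session holds the conflicting lock is what is blocking sync 
event generation.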




> Hi,
>
> I have been trying to set up a small Slony cluster (only 2 nodes) for
> the last 2 days, but I can't get it to work. Every time I get the same
> result: the replication starts off fine. Slony starts copying, trying to
> bring all the tables in the subscribing node up to speed. But somewhere
> along the way the 2nd node stops getting updates. Slony replicates all
> the data in a specific table up to a specific point in time and then
> no more, and this point seems to coincide with when the copying of data
> for that specific table started.
>
> An example to illustrate the scenario:
>
> Let's say I have set up the whole replication system and then at 12:00
> I start the actual replication. Around 12:05 the copying of table A from
> node 1 to node 2 starts. It finishes, but only the data that was
> received before 12:05 gets copied to node 2. Then at 12:10 the copying of
> table B starts. Same thing here: Slony copies all the data that was
> received before 12:10 to node 2. And this is the same for all tables.
>
> The logs for the slon daemons show:
>
> Origin node:
> NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=21942
> CONTEXT:  SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
> PL/pgSQL function "cleanupevent" line 83 at PERFORM
> NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=21945
> CONTEXT:  SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
> PL/pgSQL function "cleanupevent" line 83 at PERFORM
> NOTICE:  Slony-I: Logswitch to sl_log_2 initiated
> CONTEXT:  SQL statement "SELECT "_fleetcluster".logswitch_start()"
> PL/pgSQL function "cleanupevent" line 101 at PERFORM
> 2012-02-24 12:17:39 CETINFO   cleanupThread:    0.019 seconds for cleanupEvent()
> NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=21949
> CONTEXT:  SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
> PL/pgSQL function "cleanupevent" line 83 at PERFORM
> NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=23779
> CONTEXT:  SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
> PL/pgSQL function "cleanupevent" line 83 at PERFORM
>
> Subscribing node:
> 2012-02-24 13:20:23 CETINFO   remoteWorkerThread_1: SYNC 5000000856
> done in 0.012 seconds
> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 1 with
> 9 table(s) from provider 1
> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 2 with
> 15 table(s) from provider 1
> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 3 with
> 4 table(s) from provider 1
> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 4 with
> 6 table(s) from provider 1
> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 5 with
> 3 table(s) from provider 1
> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 6 with
> 4 table(s) from provider 1
> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 7 with
> 3 table(s) from provider 1
> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 8 with
> 23 table(s) from provider 1
> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 9 with
> 8 table(s) from provider 1
> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: SYNC 5000000857
> done in 0.014 seconds
> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 1 with
> 9 table(s) from provider 1
> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 2 with
> 15 table(s) from provider 1
> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 3 with
> 4 table(s) from provider 1
> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 4 with
> 6 table(s) from provider 1
> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 5 with
> 3 table(s) from provider 1
> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 6 with
> 4 table(s) from provider 1
> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 7 with
> 3 table(s) from provider 1
> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 8 with
> 23 table(s) from provider 1
> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 9 with
> 8 table(s) from provider 1
> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: SYNC 5000000858
> done in 0.011 seconds
>
>
>
> Has anyone experienced this before, or does anyone have an idea what could be causing this?
>


