Ulas Albayrak ulas.albayrak at gmail.com
Fri Feb 24 07:41:52 PST 2012
Hi,

Sorry, seems I forgot to post versions:

Slony: 2.0.7
PostgreSQL:  9.0.5

I restarted the slon deamon with debug=4 on both nodes and this is what I got:

Origin:
2012-02-24 16:20:50 CETDEBUG2 localListenThread: Received event
1,5000002765 SYNC
2012-02-24 16:20:50 CETDEBUG2 syncThread: new sl_action_seq 45211 -
SYNC 5000002766
2012-02-24 16:20:52 CETDEBUG2 localListenThread: Received event
1,5000002766 SYNC
2012-02-24 16:20:52 CETDEBUG2 syncThread: new sl_action_seq 45233 -
SYNC 5000002767
2012-02-24 16:20:54 CETDEBUG2 localListenThread: Received event
1,5000002767 SYNC
2012-02-24 16:20:54 CETDEBUG2 syncThread: new sl_action_seq 45263 -
SYNC 5000002768
2012-02-24 16:20:56 CETDEBUG2 localListenThread: Received event
1,5000002768 SYNC
2012-02-24 16:20:56 CETDEBUG2 remoteWorkerThread_2: forward confirm
1,5000002768 received by 2
2012-02-24 16:21:04 CETDEBUG2 syncThread: new sl_action_seq 45263 -
SYNC 5000002769
2012-02-24 16:21:06 CETDEBUG2 remoteListenThread_2: queue event
2,5000001298 SYNC
2012-02-24 16:21:06 CETDEBUG2 remoteWorkerThread_2: Received event #2
from 5000001298 type:SYNC
2012-02-24 16:21:06 CETDEBUG1 calc sync size - last time: 1 last
length: 18005 ideal: 3 proposed size: 3
2012-02-24 16:21:06 CETDEBUG2 remoteWorkerThread_2: SYNC 5000001298 processing
2012-02-24 16:21:06 CETDEBUG1 remoteWorkerThread_2: no sets need
syncing for this event
2012-02-24 16:21:08 CETDEBUG2 localListenThread: Received event
1,5000002769 SYNC
2012-02-24 16:21:08 CETDEBUG2 syncThread: new sl_action_seq 45292 -
SYNC 5000002770
2012-02-24 16:21:08 CETDEBUG2 remoteListenThread_2: queue event
2,5000001299 SYNC
2012-02-24 16:21:08 CETDEBUG2 remoteWorkerThread_2: Received event #2
from 5000001299 type:SYNC
2012-02-24 16:21:08 CETDEBUG1 calc sync size - last time: 1 last
length: 2003 ideal: 29 proposed size: 3
2012-02-24 16:21:08 CETDEBUG2 remoteWorkerThread_2: SYNC 5000001299 processing
2012-02-24 16:21:08 CETDEBUG1 remoteWorkerThread_2: no sets need
syncing for this event
2012-02-24 16:21:10 CETDEBUG2 localListenThread: Received event
1,5000002770 SYNC
2012-02-24 16:21:10 CETDEBUG2 syncThread: new sl_action_seq 45322 -
SYNC 5000002771
2012-02-24 16:21:12 CETDEBUG2 localListenThread: Received event
1,5000002771 SYNC
2012-02-24 16:21:12 CETDEBUG2 syncThread: new sl_action_seq 45382 -
SYNC 5000002772
2012-02-24 16:21:14 CETDEBUG2 localListenThread: Received event
1,5000002772 SYNC
2012-02-24 16:21:16 CETDEBUG2 remoteWorkerThread_2: forward confirm
1,5000002772 received by 2
2012-02-24 16:21:18 CETDEBUG2 syncThread: new sl_action_seq 45411 -
SYNC 5000002773
2012-02-24 16:21:20 CETDEBUG2 localListenThread: Received event
1,5000002773 SYNC
2012-02-24 16:21:26 CETDEBUG2 syncThread: new sl_action_seq 45417 -
SYNC 5000002774
2012-02-24 16:21:26 CETDEBUG2 remoteListenThread_2: queue event
2,5000001300 SYNC
2012-02-24 16:21:26 CETDEBUG2 remoteWorkerThread_2: Received event #2
from 5000001300 type:SYNC
2012-02-24 16:21:26 CETDEBUG1 calc sync size - last time: 1 last
length: 18012 ideal: 3 proposed size: 3
2012-02-24 16:21:26 CETDEBUG2 remoteWorkerThread_2: SYNC 5000001300 processing
2012-02-24 16:21:26 CETDEBUG1 remoteWorkerThread_2: no sets need
syncing for this event
2012-02-24 16:21:26 CETDEBUG2 remoteWorkerThread_2: forward confirm
1,5000002773 received by 2
2012-02-24 16:21:28 CETDEBUG2 syncThread: new sl_action_seq 45430 -
SYNC 5000002775
2012-02-24 16:21:28 CETDEBUG2 remoteListenThread_2: queue event
2,5000001301 SYNC
2012-02-24 16:21:28 CETDEBUG2 remoteWorkerThread_2: Received event #2
from 5000001301 type:SYNC
2012-02-24 16:21:28 CETDEBUG1 calc sync size - last time: 1 last
length: 2004 ideal: 29 proposed size: 3
2012-02-24 16:21:28 CETDEBUG2 remoteWorkerThread_2: SYNC 5000001301 processing
2012-02-24 16:21:28 CETDEBUG1 remoteWorkerThread_2: no sets need
syncing for this event
2012-02-24 16:21:30 CETDEBUG2 syncThread: new sl_action_seq 45460 -
SYNC 5000002776
2012-02-24 16:21:30 CETDEBUG2 remoteWorkerThread_2: forward confirm
1,5000002775 received by 2
2012-02-24 16:21:32 CETDEBUG2 localListenThread: Received event
1,5000002774 SYNC
2012-02-24 16:21:32 CETDEBUG2 localListenThread: Received event
1,5000002775 SYNC
2012-02-24 16:21:32 CETDEBUG2 localListenThread: Received event
1,5000002776 SYNC

Subscriber:
2012-02-24 16:28:53 CETDEBUG2  ssy_action_list length: 0
2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1: current local
log_status is 2
2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: activate helper 1
2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: waiting for log data
2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: got work to do
2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1_1: current remote
log_status = 1
2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: allocate line buffers
2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: fetch from cursor
2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds
delay for first row
2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: fetched 0 log rows
2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: return 50 unused
line buffers
2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: 0.003 seconds
until close cursor
2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: inserts=0
updates=0 deletes=0
2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: sync_helper
timing:  pqexec (s/count)- provider 0.002/3 - subscriber 0.000/3
2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: sync_helper
timing:  large tuples 0.000/0
2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: change helper
thread status
2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: send DONE/ERROR
line to worker
2012-02-24 16:28:53 CETDEBUG3 remoteHelperThread_1_1: waiting for
workgroup to finish
2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: helper 1 finished
2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: returning lines to pool
2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: all helpers done.
2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: changing helper 1 to IDLE
2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1: cleanup
2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: waiting for work
2012-02-24 16:28:53 CETINFO   remoteWorkerThread_1: SYNC 5000002873
done in 0.014 seconds
2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: SYNC 5000002873
sync_event timing:  pqexec (s/count)- provider 0.001/1 - subsc
riber 0.008/1 - IUD 0.001/4
2012-02-24 16:28:55 CETDEBUG2 syncThread: new sl_action_seq 1 - SYNC 5000001345
2012-02-24 16:28:59 CETDEBUG2 localListenThread: Received event
2,5000001345 SYNC
2012-02-24 16:29:05 CETDEBUG2 syncThread: new sl_action_seq 1 - SYNC 5000001346
2012-02-24 16:29:11 CETDEBUG2 localListenThread: Received event
2,5000001346 SYNC
2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event
1,5000002874 SYNC
2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event
1,5000002875 SYNC
2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event
1,5000002876 SYNC
2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: Received event #1
from 5000002874 type:SYNC
2012-02-24 16:29:11 CETDEBUG1 calc sync size - last time: 1 last
length: 18011 ideal: 3 proposed size: 3
2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: SYNC 5000002876 processing
2012-02-24 16:29:11 CETDEBUG1 about to monitor_subscriber_query -
pulling big actionid list for 1
2012-02-24 16:29:11 CETINFO   remoteWorkerThread_1: syncing set 1 with
9 table(s) from provider 1
2012-02-24 16:29:11 CETDEBUG4  ssy_action_list value:
2012-02-24 16:29:11 CETDEBUG2  ssy_action_list length: 0
2012-02-24 16:29:11 CETINFO   remoteWorkerThread_1: syncing set 2 with
15 table(s) from provider 1
2012-02-24 16:29:11 CETDEBUG4  ssy_action_list value:
2012-02-24 16:29:11 CETDEBUG2  ssy_action_list length: 0
2012-02-24 16:29:11 CETINFO   remoteWorkerThread_1: syncing set 3 with
4 table(s) from provider 1
2012-02-24 16:29:11 CETDEBUG4  ssy_action_list value:
2012-02-24 16:29:11 CETDEBUG2  ssy_action_list length: 0
2012-02-24 16:29:11 CETINFO   remoteWorkerThread_1: syncing set 4 with
6 table(s) from provider 1
2012-02-24 16:29:11 CETDEBUG4  ssy_action_list value:
2012-02-24 16:29:11 CETDEBUG2  ssy_action_list length: 0
2012-02-24 16:29:11 CETINFO   remoteWorkerThread_1: syncing set 5 with
3 table(s) from provider 1
2012-02-24 16:29:11 CETDEBUG4  ssy_action_list value:
2012-02-24 16:29:11 CETDEBUG2  ssy_action_list length: 0
2012-02-24 16:29:11 CETINFO   remoteWorkerThread_1: syncing set 6 with
4 table(s) from provider 1
2012-02-24 16:29:11 CETDEBUG4  ssy_action_list value:
2012-02-24 16:29:11 CETDEBUG2  ssy_action_list length: 0
2012-02-24 16:29:11 CETINFO   remoteWorkerThread_1: syncing set 7 with
3 table(s) from provider 1
2012-02-24 16:29:11 CETDEBUG4  ssy_action_list value:
2012-02-24 16:29:11 CETDEBUG2  ssy_action_list length: 0
2012-02-24 16:29:11 CETINFO   remoteWorkerThread_1: syncing set 8 with
23 table(s) from provider 1
2012-02-24 16:29:11 CETDEBUG4  ssy_action_list value:
2012-02-24 16:29:11 CETDEBUG2  ssy_action_list length: 0
2012-02-24 16:29:11 CETINFO   remoteWorkerThread_1: syncing set 9 with
8 table(s) from provider 1
2012-02-24 16:29:11 CETDEBUG4  ssy_action_list value:
2012-02-24 16:29:11 CETDEBUG2  ssy_action_list length: 0
2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: current local
log_status is 2
2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: activate helper 1
2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: waiting for log data
2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: got work to do
2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1_1: current remote
log_status = 1
2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: allocate line buffers
2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: fetch from cursor
2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds
delay for first row
2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: fetched 0 log rows
2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: return 50 unused
line buffers
2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds
until close cursor
2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: inserts=0
updates=0 deletes=0
2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: sync_helper
timing:  pqexec (s/count)- provider 0.002/3 - subscriber 0.000/3
2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: sync_helper
timing:  large tuples 0.000/0
2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: change helper
thread status
2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: send DONE/ERROR
line to worker
2012-02-24 16:29:11 CETDEBUG3 remoteHelperThread_1_1: waiting for
workgroup to finish
2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: helper 1 finished
2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: returning lines to pool
2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: all helpers done.
2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: changing helper 1 to IDLE
2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: cleanup
2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: waiting for work
2012-02-24 16:29:11 CETINFO   remoteWorkerThread_1: SYNC 5000002876
done in 0.014 seconds
2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: SYNC 5000002876
sync_event timing:  pqexec (s/count)- provider 0.001/1 - subsc
riber 0.008/1 - IUD 0.001/4
2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: forward confirm
2,5000001346 received by 1

New SYNC events are being generated. I can find several rows in the
origin's sl_event table with ev_origin=$originid. None of them however
(including rows with ev_origin=$subscribedid) are older than about 15
minutes, even though the most recent data in the subscriber db is
several hours old.

I'm not really sure how to interpret the data in the pg_locks view,
but none of them (neither origin or subscriber) have any rows where
the "granted" column isn't set to TRUE, if that means anything?

/Ulas


On Fri, Feb 24, 2012 at 3:34 PM, Steve Singer <ssinger at ca.afilias.info> wrote:
> On 12-02-24 08:21 AM, Ulas Albayrak wrote:
>
> You didn't say what version of slony you are using with which version of
> postgresql.
>
> I don't see anything in the logs you posted about the slon for the origin
> node generating sync events.  At DEBUG2 or higher (at least ons some
> versions of slony) you should be getting "syncThread: new sl_action_seq %s "
> type messages in the log for the slon origin.
>
> Are new SYNC events being generated in the origin sl_event table with
> ev_origin=$originid?
>
> Many versions of slony require an exclusive lock on sl_event to generate
> sync events.  Do you have something preventing this?  (ie look in pg_locks
> to see if the slony sync connection is waiting on a lock).
>
>
>
>
>
>> Hi,
>>
>> I have been trying to set up a small Slony cluster (only 2 nodes) for
>> the last 2 days but I can't get it to work. Everytime I get the same
>> result: The replication starts of fine. Slony start copying, trying to
>> get all the tables in the subscribing node up to speed. But somewhere
>> along the way the 2nd node stops getting updates. Slony replicates all
>> the data in a specific table up to a specific point in time and then
>> no more. And this time seems to coincide with when the copying of data
>> for that specific table started.
>>
>> An example to illustrate the scenario:
>>
>> Let's say I have set up the whole replication system and then at 12:00
>> I start the actual replication. Around 12:05 copying of table A from
>> node 1 to node 2 starts. It finishes but only the data that was
>> received before  12:05 get copied to node 2. Then at 12:10 copying of
>> table B starts. Same thing here: Slony copies all the data that was
>> received before 12:10 to node 2. And this is the same for all tables.
>>
>> The logs for the slon deamons show:
>>
>> Origin node:
>> NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=21942
>> CONTEXT:  SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
>> PL/pgSQL function "cleanupevent" line 83 at PERFORM
>> NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=21945
>> CONTEXT:  SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
>> PL/pgSQL function "cleanupevent" line 83 at PERFORM
>> NOTICE:  Slony-I: Logswitch to sl_log_2 initiated
>> CONTEXT:  SQL statement "SELECT "_fleetcluster".logswitch_start()"
>> PL/pgSQL function "cleanupevent" line 101 at PERFORM
>> 2012-02-24 12:17:39 CETINFO   cleanupThread:    0.019 seconds for
>> cleanupEvent()
>> NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=21949
>> CONTEXT:  SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
>> PL/pgSQL function "cleanupevent" line 83 at PERFORM
>> NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=23779
>> CONTEXT:  SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
>> PL/pgSQL function "cleanupevent" line 83 at PERFORM
>>
>> Subscribing node:
>> 2012-02-24 13:20:23 CETINFO   remoteWorkerThread_1: SYNC 5000000856
>> done in 0.012 seconds
>> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 1 with
>> 9 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 2 with
>> 15 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 3 with
>> 4 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 4 with
>> 6 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 5 with
>> 3 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 6 with
>> 4 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 7 with
>> 3 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 8 with
>> 23 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: syncing set 9 with
>> 8 table(s) from provider 1
>> 2012-02-24 13:20:41 CETINFO   remoteWorkerThread_1: SYNC 5000000857
>> done in 0.014 seconds
>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 1 with
>> 9 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 2 with
>> 15 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 3 with
>> 4 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 4 with
>> 6 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 5 with
>> 3 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 6 with
>> 4 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 7 with
>> 3 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 8 with
>> 23 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: syncing set 9 with
>> 8 table(s) from provider 1
>> 2012-02-24 13:20:43 CETINFO   remoteWorkerThread_1: SYNC 5000000858
>> done in 0.011 seconds
>>
>>
>>
>> Have anyone experienced this before or have any idea what could be causing
>> this?
>>
>



-- 
Ulas Albayrak
ulas.albayrak at gmail.com


More information about the Slony1-general mailing list