[Slony1-general] Slony replication stops right after start
Fri Feb 24 08:44:19 PST 2012
On 12-02-24 10:41 AM, Ulas Albayrak wrote:
> Hi,
>
> Sorry, it seems I forgot to post versions:
>
> Slony: 2.0.7
> PostgreSQL: 9.0.5
>
> I restarted the slon daemon with debug=4 on both nodes, and this is what I got:
>
> Origin:
> 2012-02-24 16:20:50 CETDEBUG2 localListenThread: Received event 1,5000002765 SYNC
> 2012-02-24 16:20:50 CETDEBUG2 syncThread: new sl_action_seq 45211 - SYNC 5000002766

This shows that SYNC events are now being generated on the origin.

> Subscriber:
> 2012-02-24 16:28:53 CETDEBUG2 ssy_action_list length: 0
> 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1: current local log_status is 2
> 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: activate helper 1
> 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: waiting for log data
> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: got work to do
> 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1_1: current remote log_status = 1
> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: allocate line buffers
> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: fetch from cursor
> 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds delay for first row
> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: fetched 0 log rows
> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: return 50 unused line buffers
> 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: 0.003 seconds until close cursor
> 2012-02-24 16:28:53 CETDEBUG1 remoteHelperThread_1_1: inserts=0 updates=0 deletes=0

This shows that SYNC events are being processed on the slave and that there is nothing to do.

I also notice that you have 9 replication sets. Are changes failing to replicate for all tables in all the sets, or only for some of the sets?

sl_log_1 and sl_log_2 on the origin should have a record of rows that need to be replicated. If you insert a row into one of your tables, you should then see that data in sl_log_1 or sl_log_2. Is this the case?
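A minimal sketch of that check, assuming the "_fleetcluster" schema from your logs; "public.some_table" and "some_column" are just placeholders for one of the replicated tables:

  -- Insert a hypothetical test row (table and column names are placeholders).
  INSERT INTO public.some_table (some_column) VALUES ('slony log test');

  -- On the origin, the row should show up in whichever log table is currently
  -- active; check both, since logging switches between sl_log_1 and sl_log_2.
  SELECT count(*) FROM _fleetcluster.sl_log_1;
  SELECT count(*) FROM _fleetcluster.sl_log_2;

If both counts stay at 0 after the insert is committed, that would suggest the log trigger on the origin is not capturing the change at all.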
> 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: sync_helper timing: pqexec (s/count)- provider 0.002/3 - subscriber 0.000/3
> 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: sync_helper timing: large tuples 0.000/0
> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: change helper thread status
> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: send DONE/ERROR line to worker
> 2012-02-24 16:28:53 CETDEBUG3 remoteHelperThread_1_1: waiting for workgroup to finish
> 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: helper 1 finished
> 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: returning lines to pool
> 2012-02-24 16:28:53 CETDEBUG3 remoteWorkerThread_1: all helpers done.
> 2012-02-24 16:28:53 CETDEBUG4 remoteWorkerThread_1: changing helper 1 to IDLE
> 2012-02-24 16:28:53 CETDEBUG2 remoteWorkerThread_1: cleanup
> 2012-02-24 16:28:53 CETDEBUG4 remoteHelperThread_1_1: waiting for work
> 2012-02-24 16:28:53 CETINFO remoteWorkerThread_1: SYNC 5000002873 done in 0.014 seconds
> 2012-02-24 16:28:53 CETDEBUG1 remoteWorkerThread_1: SYNC 5000002873 sync_event timing: pqexec (s/count)- provider 0.001/1 - subscriber 0.008/1 - IUD 0.001/4
> 2012-02-24 16:28:55 CETDEBUG2 syncThread: new sl_action_seq 1 - SYNC 5000001345
> 2012-02-24 16:28:59 CETDEBUG2 localListenThread: Received event 2,5000001345 SYNC
> 2012-02-24 16:29:05 CETDEBUG2 syncThread: new sl_action_seq 1 - SYNC 5000001346
> 2012-02-24 16:29:11 CETDEBUG2 localListenThread: Received event 2,5000001346 SYNC
> 2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event 1,5000002874 SYNC
> 2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event 1,5000002875 SYNC
> 2012-02-24 16:29:11 CETDEBUG2 remoteListenThread_1: queue event 1,5000002876 SYNC
> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: Received event #1 from 5000002874 type:SYNC
> 2012-02-24 16:29:11 CETDEBUG1 calc sync size - last time: 1 last length: 18011 ideal: 3 proposed size: 3
> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: SYNC 5000002876 processing
> 2012-02-24 16:29:11 CETDEBUG1 about to monitor_subscriber_query - pulling big actionid list for 1
> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 1 with 9 table(s) from provider 1
> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value:
> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0
> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 2 with 15 table(s) from provider 1
> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value:
> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0
> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 3 with 4 table(s) from provider 1
> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value:
> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0
> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 4 with 6 table(s) from provider 1
> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value:
> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0
> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 5 with 3 table(s) from provider 1
> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value:
> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0
> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 6 with 4 table(s) from provider 1
> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value:
> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0
> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 7 with 3 table(s) from provider 1
> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value:
> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0
> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 8 with 23 table(s) from provider 1
> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value:
> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0
> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: syncing set 9 with 8 table(s) from provider 1
> 2012-02-24 16:29:11 CETDEBUG4 ssy_action_list value:
> 2012-02-24 16:29:11 CETDEBUG2 ssy_action_list length: 0
> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: current local log_status is 2
> 2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: activate helper 1
> 2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: waiting for log data
> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: got work to do
> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1_1: current remote log_status = 1
> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: allocate line buffers
> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: fetch from cursor
> 2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds delay for first row
> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: fetched 0 log rows
> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: return 50 unused line buffers
> 2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: 0.002 seconds until close cursor
> 2012-02-24 16:29:11 CETDEBUG1 remoteHelperThread_1_1: inserts=0 updates=0 deletes=0
> 2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: sync_helper timing: pqexec (s/count)- provider 0.002/3 - subscriber 0.000/3
> 2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: sync_helper timing: large tuples 0.000/0
> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: change helper thread status
> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: send DONE/ERROR line to worker
> 2012-02-24 16:29:11 CETDEBUG3 remoteHelperThread_1_1: waiting for workgroup to finish
> 2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: helper 1 finished
> 2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: returning lines to pool
> 2012-02-24 16:29:11 CETDEBUG3 remoteWorkerThread_1: all helpers done.
> 2012-02-24 16:29:11 CETDEBUG4 remoteWorkerThread_1: changing helper 1 to IDLE
> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: cleanup
> 2012-02-24 16:29:11 CETDEBUG4 remoteHelperThread_1_1: waiting for work
> 2012-02-24 16:29:11 CETINFO remoteWorkerThread_1: SYNC 5000002876 done in 0.014 seconds
> 2012-02-24 16:29:11 CETDEBUG1 remoteWorkerThread_1: SYNC 5000002876 sync_event timing: pqexec (s/count)- provider 0.001/1 - subscriber 0.008/1 - IUD 0.001/4
> 2012-02-24 16:29:11 CETDEBUG2 remoteWorkerThread_1: forward confirm 2,5000001346 received by 1
>
> New SYNC events are being generated. I can find several rows in the origin's sl_event table with ev_origin=$originid. None of them, however (including rows with ev_origin=$subscriberid), are older than about 15 minutes, even though the most recent data in the subscriber db is several hours old.
>
> I'm not really sure how to interpret the data in the pg_locks view, but neither the origin nor the subscriber has any rows where the "granted" column isn't set to TRUE, if that means anything.
>
> /Ulas
>
>
> On Fri, Feb 24, 2012 at 3:34 PM, Steve Singer <ssinger at ca.afilias.info> wrote:
>> On 12-02-24 08:21 AM, Ulas Albayrak wrote:
>>
>> You didn't say what version of slony you are using with which version of postgresql.
>>
>> I don't see anything in the logs you posted about the slon for the origin node generating sync events. At DEBUG2 or higher (at least on some versions of slony) you should be getting "syncThread: new sl_action_seq %s" type messages in the log for the slon origin.
>>
>> Are new SYNC events being generated in the origin sl_event table with ev_origin=$originid?
>>
>> Many versions of slony require an exclusive lock on sl_event to generate sync events. Do you have something preventing this? (i.e. look in pg_locks to see if the slony sync connection is waiting on a lock.)
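For that last check, here is a sketch against the 9.0 catalogs mentioned above (pg_stat_activity still uses procpid and current_query in that release); any row returned means some backend is waiting on a lock:

  -- Show ungranted locks together with the query of the waiting backend.
  SELECT l.locktype, l.relation::regclass AS relation, l.mode, a.current_query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.procpid = l.pid
   WHERE NOT l.granted;

An empty result is consistent with what you describe: nothing is blocked, so the stall is unlikely to be a lock wait on sl_event.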
>>
>>> Hi,
>>>
>>> I have been trying to set up a small Slony cluster (only 2 nodes) for the last 2 days, but I can't get it to work. Every time I get the same result: the replication starts off fine. Slony starts copying, trying to get all the tables in the subscribing node up to speed. But somewhere along the way the 2nd node stops getting updates. Slony replicates all the data in a specific table up to a specific point in time and then no more. And this time seems to coincide with when the copying of data for that specific table started.
>>>
>>> An example to illustrate the scenario:
>>>
>>> Let's say I have set up the whole replication system and then at 12:00 I start the actual replication. Around 12:05 copying of table A from node 1 to node 2 starts. It finishes, but only the data that was received before 12:05 gets copied to node 2. Then at 12:10 copying of table B starts. Same thing here: Slony copies all the data that was received before 12:10 to node 2. And this is the same for all tables.
>>>
>>> The logs for the slon daemons show:
>>>
>>> Origin node:
>>> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21942
>>> CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
>>> PL/pgSQL function "cleanupevent" line 83 at PERFORM
>>> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21945
>>> CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
>>> PL/pgSQL function "cleanupevent" line 83 at PERFORM
>>> NOTICE: Slony-I: Logswitch to sl_log_2 initiated
>>> CONTEXT: SQL statement "SELECT "_fleetcluster".logswitch_start()"
>>> PL/pgSQL function "cleanupevent" line 101 at PERFORM
>>> 2012-02-24 12:17:39 CETINFO cleanupThread: 0.019 seconds for cleanupEvent()
>>> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=21949
>>> CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
>>> PL/pgSQL function "cleanupevent" line 83 at PERFORM
>>> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=23779
>>> CONTEXT: SQL statement "SELECT "_fleetcluster".cleanupNodelock()"
>>> PL/pgSQL function "cleanupevent" line 83 at PERFORM
>>>
>>> Subscribing node:
>>> 2012-02-24 13:20:23 CETINFO remoteWorkerThread_1: SYNC 5000000856 done in 0.012 seconds
>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 1 with 9 table(s) from provider 1
>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 2 with 15 table(s) from provider 1
>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 3 with 4 table(s) from provider 1
>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 4 with 6 table(s) from provider 1
>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 5 with 3 table(s) from provider 1
>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 6 with 4 table(s) from provider 1
>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 7 with 3 table(s) from provider 1
>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 8 with 23 table(s) from provider 1
>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: syncing set 9 with 8 table(s) from provider 1
>>> 2012-02-24 13:20:41 CETINFO remoteWorkerThread_1: SYNC 5000000857 done in 0.014 seconds
>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 1 with 9 table(s) from provider 1
>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 2 with 15 table(s) from provider 1
>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 3 with 4 table(s) from provider 1
>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 4 with 6 table(s) from provider 1
>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 5 with 3 table(s) from provider 1
>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 6 with 4 table(s) from provider 1
>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 7 with 3 table(s) from provider 1
>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 8 with 23 table(s) from provider 1
>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: syncing set 9 with 8 table(s) from provider 1
>>> 2012-02-24 13:20:43 CETINFO remoteWorkerThread_1: SYNC 5000000858 done in 0.011 seconds
>>>
>>> Has anyone experienced this before, or does anyone have an idea what could be causing this?
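As a general way to keep an eye on this while you test, the sl_status view on the origin reports how far each subscriber is behind; a minimal sketch, again assuming the "_fleetcluster" schema from your logs:

  -- Per subscriber, as seen from the origin: how many SYNC events are still
  -- unconfirmed and how old the last confirmed event is.
  SELECT st_origin, st_received, st_lag_num_events, st_lag_time
    FROM _fleetcluster.sl_status;

In a healthy two-node setup both lag columns should stay small; if they grow steadily, SYNCs are being generated but not confirmed back by the subscriber.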