[Slony1-general] Do we need to stop the write activity during a slave initialization?

Mon Jun 4 19:43:26 PDT 2007

On 6/4/2007 6:57 PM, Dmitry Koterov wrote:
> No test cases, but I have some log lines, hope they would help.
> I also attach this logs to the mail.

What really would be interesting in this case is the sl_setsync for the 
failing subscriber together with the log row (below). That setsync 
should be the still the one created by the copy_set() and have an active 
xip vector showing transaction 264185 in progress and thus, not included 
in the COPY data.

Jan

> 
> -- From the slave log:
> 2007-06-04 18:28:28 GMT DEBUG2 localListenThread: Received event 3,1178 SYNC
> 2007-06-04 18:28:28 GMT DEBUG2 localListenThread: Received event 3,1179 
> SYNC
> 2007-06-04 18:28:28 GMT DEBUG2 remoteListenThread_1: queue event 1,2685 SYNC
> 2007-06-04 18:28:28 GMT DEBUG2 remoteListenThread_1: UNLISTEN
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_4: forward confirm 
> 3,1179 received by 4
> 2007-06-04 18:28:28 GMT DEBUG2 remoteListenThread_2: LISTEN
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_2: forward confirm 
> 1,2685 received by 2
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_2: forward confirm 
> 2,2330 received by 1
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_2: forward confirm 
> 4,1164 received by 1
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_2: forward confirm 
> 4,1164 received by 2
> 2007-06-04 18:28:28 GMT DEBUG2 remoteListenThread_4: queue event 4,1165 
> SYNC
> 2007-06-04 18:28:28 GMT DEBUG2 remoteListenThread_4: UNLISTEN
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_4: Received event 
> 4,1165 SYNC
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_4: SYNC 1165 processing
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_4: no sets need 
> syncing for this event
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_2: forward confirm 
> 3,1179 received by 2
> 2007-06-04 18:28:28 GMT DEBUG2 remoteListenThread_1: LISTEN
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_2: forward confirm 
> 3,1179 received by 1
> 2007-06-04 18:28:28 GMT DEBUG2 remoteListenThread_1: queue event 1,2686 SYNC
> 2007-06-04 18:28:28 GMT DEBUG2 remoteListenThread_1: UNLISTEN
> 2007-06-04 18:28:29 GMT DEBUG2 remoteListenThread_2: queue event 2,2331 SYNC
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_2: Received event 
> 2,2331 SYNC
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_2: SYNC 2331 processing
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_2: no sets need 
> syncing for this event
> 2007-06-04 18:28:29 GMT DEBUG2 syncThread: new sl_action_seq 1 - SYNC 1180
> 2007-06-04 18:28:29 GMT DEBUG2 remoteListenThread_4: LISTEN
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_4: forward confirm 
> 2,2331 received by 4
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_2: forward confirm 
> 3,1180 received by 2
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_2: forward confirm 
> 4,1165 received by 2
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_2: forward confirm 
> 1,2686 received by 2
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_2: forward confirm 
> 4,1165 received by 1
> 2007-06-04 18:28:29 GMT DEBUG1 copy_set 1
> 2007-06-04 18:28:29 GMT DEBUG1 remoteWorkerThread_1: connected to 
> provider DB
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_4: forward confirm 
> 3,1180 received by 4
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_1: prepare to copy 
> table "public"."***"
> 
> 
> -- An operation which fails:
> 2007-06-04 20:28:28 GMT ERROR  remoteWorkerThread_1: "
> insert into "public"."edge" 
> (obj_id,obj_comments_count,obj_last_commented,obj_marks_sum_sign,obj_marks_rating,obj_created,obj_type_did,edge_pid1,edge_pid2,edge_created,edge_status_did) 
> 
> values ('668066978','0',NULL,'0','0','2007-06-04 
> 18:28:28.674814','45892','149931528','899581307','2007-06-04 
> 18:28:28.674814','1');
> 
> 
> -- A corresponding event on Master
> =# select * from _m_cluster.sl_log_1 where log_cmddata like 
> '%edge_pid1%668066978%' order by log_actionseq limit 60;
>  log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype 
> |                                                                                                                                           
> log_cmddata                                
> ------------+---------+-------------+---------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
> 
>           1 | 264185  |           7 |          3424 | I           | 
> (obj_id,obj_comments_count,obj_last_commented,obj_marks_sum_sign,obj_marks_rating,obj_created,obj_type_did,edge_pid1,edge_pid2,edge_created,edge_status_did) 
> values ('668066978','0',NULL,'0','0','2007-06-04 18:28: 
> 28.674814','45892','149931528','899581307','2007-06-04 18:28:28.674814','1')
> 
> 
> -- Related events from Master:
> =# select * from _m_cluster.sl_event where ev_minxid=264185;
>  ev_origin | ev_seqno |        ev_timestamp        | ev_minxid | 
> ev_maxxid |                    ev_xip                    | ev_type | 
> ev_data1 | ev_data2 | ev_data3 | ev_data4 | ev_data5 | ev_data6 | 
> ev_data7 | ev_data8
> -----------+----------+----------------------------+-----------+-----------+----------------------------------------------+---------+----------+----------+----------+----------+----------+----------+----------+---------- 
> 
>          1 |     2686 | 2007-06-04 18:28:28.895215 | 264185    | 
> 264202    | '264185'                                     | SYNC    
> |          |          |          |          |          |          
> |          |
>          1 |     2687 | 2007-06-04 18:28:29.194165 | 264185    | 
> 264272    | '264269','264185','264270','264266','264263' | SYNC    
> |          |          |          |          |          |          
> |          |
> 
> 
> 
> 
> On 6/5/07, *Andrew Hammond* <andrew.george.hammond at gmail.com 
> <mailto:andrew.george.hammond at gmail.com>> wrote:
> 
>     On 6/4/07, Dmitry Koterov <dmitry at koterov.ru
>     <mailto:dmitry at koterov.ru>> wrote:
>      > Hello.
>      >
>      > Seems Slony sometimes tries to play an event created BEFORE the whole
>      > copying process was started?
>      > I have the following timing:
>      >
>      > 1) in 18:28:28 a row was added to the table tableName
>      > 2) in 18:28:30 we have DEBUG2 remoteWorkerThread_1: Begin COPY of
>     table
>      > "public"."tableName"
>      > 3) about 20 minutes later slony on the slave tried to replay an
>     event (1)
>      > and failed with "duplicate check" violation (because an element
>     was already
>      > present in the table thanks to COPY).
>      >
>      > So, seems we have to stop any write activity during the start of
>     a COPY
>      > process to avoid this.
> 
>     That certainly shouldn't happen. Do you have a detailed test case that
>     allows this error to be re-produced?
> 
>      > P.S.
>      > Not a single, but 2 slave machines were copied sequentially. The
>     first one
>      > copied and began to replicate successfully, but the second -
>     failed with
>      > described symphtoms.
>      > Slony 1.2.9.
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Slony1-general mailing list
> Slony1-general at lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #