Mon Jun 4 19:43:26 PDT 2007
- Previous message: [Slony1-general] Do we need to stop the write activity during a slave initialization?
- Next message: [Slony1-general] Re: Bug in cleanupNodelock prevents slony startup
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 6/4/2007 6:57 PM, Dmitry Koterov wrote:
> No test cases, but I have some log lines, hope they would help.
> I also attach this logs to the mail.
What really would be interesting in this case is the sl_setsync for the
failing subscriber together with the log row (below). That setsync
should be the still the one created by the copy_set() and have an active
xip vector showing transaction 264185 in progress and thus, not included
in the COPY data.
Jan
>
> -- From the slave log:
> 2007-06-04 18:28:28 GMT DEBUG2 localListenThread: Received event 3,1178 SYNC
> 2007-06-04 18:28:28 GMT DEBUG2 localListenThread: Received event 3,1179
> SYNC
> 2007-06-04 18:28:28 GMT DEBUG2 remoteListenThread_1: queue event 1,2685 SYNC
> 2007-06-04 18:28:28 GMT DEBUG2 remoteListenThread_1: UNLISTEN
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_4: forward confirm
> 3,1179 received by 4
> 2007-06-04 18:28:28 GMT DEBUG2 remoteListenThread_2: LISTEN
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_2: forward confirm
> 1,2685 received by 2
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_2: forward confirm
> 2,2330 received by 1
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_2: forward confirm
> 4,1164 received by 1
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_2: forward confirm
> 4,1164 received by 2
> 2007-06-04 18:28:28 GMT DEBUG2 remoteListenThread_4: queue event 4,1165
> SYNC
> 2007-06-04 18:28:28 GMT DEBUG2 remoteListenThread_4: UNLISTEN
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_4: Received event
> 4,1165 SYNC
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_4: SYNC 1165 processing
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_4: no sets need
> syncing for this event
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_2: forward confirm
> 3,1179 received by 2
> 2007-06-04 18:28:28 GMT DEBUG2 remoteListenThread_1: LISTEN
> 2007-06-04 18:28:28 GMT DEBUG2 remoteWorkerThread_2: forward confirm
> 3,1179 received by 1
> 2007-06-04 18:28:28 GMT DEBUG2 remoteListenThread_1: queue event 1,2686 SYNC
> 2007-06-04 18:28:28 GMT DEBUG2 remoteListenThread_1: UNLISTEN
> 2007-06-04 18:28:29 GMT DEBUG2 remoteListenThread_2: queue event 2,2331 SYNC
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_2: Received event
> 2,2331 SYNC
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_2: SYNC 2331 processing
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_2: no sets need
> syncing for this event
> 2007-06-04 18:28:29 GMT DEBUG2 syncThread: new sl_action_seq 1 - SYNC 1180
> 2007-06-04 18:28:29 GMT DEBUG2 remoteListenThread_4: LISTEN
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_4: forward confirm
> 2,2331 received by 4
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_2: forward confirm
> 3,1180 received by 2
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_2: forward confirm
> 4,1165 received by 2
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_2: forward confirm
> 1,2686 received by 2
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_2: forward confirm
> 4,1165 received by 1
> 2007-06-04 18:28:29 GMT DEBUG1 copy_set 1
> 2007-06-04 18:28:29 GMT DEBUG1 remoteWorkerThread_1: connected to
> provider DB
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_4: forward confirm
> 3,1180 received by 4
> 2007-06-04 18:28:29 GMT DEBUG2 remoteWorkerThread_1: prepare to copy
> table "public"."***"
>
>
> -- An operation which fails:
> 2007-06-04 20:28:28 GMT ERROR remoteWorkerThread_1: "
> insert into "public"."edge"
> (obj_id,obj_comments_count,obj_last_commented,obj_marks_sum_sign,obj_marks_rating,obj_created,obj_type_did,edge_pid1,edge_pid2,edge_created,edge_status_did)
>
> values ('668066978','0',NULL,'0','0','2007-06-04
> 18:28:28.674814','45892','149931528','899581307','2007-06-04
> 18:28:28.674814','1');
>
>
> -- A corresponding event on Master
> =# select * from _m_cluster.sl_log_1 where log_cmddata like
> '%edge_pid1%668066978%' order by log_actionseq limit 60;
> log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype
> |
> log_cmddata
> ------------+---------+-------------+---------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> 1 | 264185 | 7 | 3424 | I |
> (obj_id,obj_comments_count,obj_last_commented,obj_marks_sum_sign,obj_marks_rating,obj_created,obj_type_did,edge_pid1,edge_pid2,edge_created,edge_status_did)
> values ('668066978','0',NULL,'0','0','2007-06-04 18:28:
> 28.674814','45892','149931528','899581307','2007-06-04 18:28:28.674814','1')
>
>
> -- Related events from Master:
> =# select * from _m_cluster.sl_event where ev_minxid=264185;
> ev_origin | ev_seqno | ev_timestamp | ev_minxid |
> ev_maxxid | ev_xip | ev_type |
> ev_data1 | ev_data2 | ev_data3 | ev_data4 | ev_data5 | ev_data6 |
> ev_data7 | ev_data8
> -----------+----------+----------------------------+-----------+-----------+----------------------------------------------+---------+----------+----------+----------+----------+----------+----------+----------+----------
>
> 1 | 2686 | 2007-06-04 18:28:28.895215 | 264185 |
> 264202 | '264185' | SYNC
> | | | | | |
> | |
> 1 | 2687 | 2007-06-04 18:28:29.194165 | 264185 |
> 264272 | '264269','264185','264270','264266','264263' | SYNC
> | | | | | |
> | |
>
>
>
>
> On 6/5/07, *Andrew Hammond* <andrew.george.hammond at gmail.com
> <mailto:andrew.george.hammond at gmail.com>> wrote:
>
> On 6/4/07, Dmitry Koterov <dmitry at koterov.ru
> <mailto:dmitry at koterov.ru>> wrote:
> > Hello.
> >
> > Seems Slony sometimes tries to play an event created BEFORE the whole
> > copying process was started?
> > I have the following timing:
> >
> > 1) in 18:28:28 a row was added to the table tableName
> > 2) in 18:28:30 we have DEBUG2 remoteWorkerThread_1: Begin COPY of
> table
> > "public"."tableName"
> > 3) about 20 minutes later slony on the slave tried to replay an
> event (1)
> > and failed with "duplicate check" violation (because an element
> was already
> > present in the table thanks to COPY).
> >
> > So, seems we have to stop any write activity during the start of
> a COPY
> > process to avoid this.
>
> That certainly shouldn't happen. Do you have a detailed test case that
> allows this error to be re-produced?
>
> > P.S.
> > Not a single, but 2 slave machines were copied sequentially. The
> first one
> > copied and began to replicate successfully, but the second -
> failed with
> > described symphtoms.
> > Slony 1.2.9.
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Slony1-general mailing list
> Slony1-general at lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck at Yahoo.com #
- Previous message: [Slony1-general] Do we need to stop the write activity during a slave initialization?
- Next message: [Slony1-general] Re: Bug in cleanupNodelock prevents slony startup
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list