Thu May 20 07:48:26 PDT 2010
[Slony1-bugs] [Slony1-general] An old event not confirmed: A possible bug?
Jan Wieck wrote:
> On 5/20/2010 9:35 AM, Cyril Scetbon wrote:
>> Jan Wieck wrote:
>>> On 5/12/2010 10:31 AM, Gurjeet Singh wrote:
>>>> Hi All,
>>>>
>>>> I have two Slony test beds which show the exact same symptoms!
>>>>
>>>> select * from sl_event order by ev_seqno;
>>>>
>>>>  ev_origin |  ev_seqno  |        ev_timestamp        |    ev_snapshot     | ev_type |
>>>> -----------+------------+----------------------------+--------------------+---------+-
>>>>          2 | 5000000002 | 2010-04-30 08:32:38.622928 | 458:458:           | SYNC    |
>>>>          1 | 5000525721 | 2010-05-12 13:30:22.79626  | 72685915:72685915: | SYNC    |
>>>>          1 | 5000525722 | 2010-05-12 13:30:24.800943 | 72686139:72686139: | SYNC    |
>>>>          1 | 5000525723 | 2010-05-12 13:30:26.804862 | 72686224:72686224: | SYNC    |
>>>> ...
>>>
>>> Slony always keeps at least the last event per origin around. Otherwise
>>> the view sl_status would not work.
>>
>> Hi Jan,
>>
>> Can you say more about that? I posted a mail to slony1-bugs today because
>> test_slony_state.pl is warning us about old events (precisely the oldest
>> ones). This concerns events generated by the local node; I see events
>> from the local node only when I restart it:
>
> I presume that you have set sync_interval_timeout to zero on the
> subscribers, which will prevent the generation of SYNC events on those
> nodes because no actual replication work is ever generated there. It
> looks like test_slony_state.pl depends on that parameter being non-zero
> (the default is -t 10000, meaning every 10 seconds).

No, as you can see:

2010-05-20 15:31:55 CEST CONFIG slon: watchdog ready - pid = 23457
2010-05-20 15:31:55 CEST CONFIG main: Integer option vac_frequency = 3
2010-05-20 15:31:55 CEST CONFIG main: Integer option log_level = 2
2010-05-20 15:31:55 CEST CONFIG main: Integer option sync_interval = 500
2010-05-20 15:31:55 CEST CONFIG main: Integer option sync_interval_timeout = 10000
2010-05-20 15:31:55 CEST CONFIG main: Integer option sync_group_maxsize = 20
2010-05-20 15:31:55 CEST CONFIG main: Integer option desired_sync_time = 60000
2010-05-20 15:31:55 CEST CONFIG main: Integer option syslog = 0
2010-05-20 15:31:55 CEST CONFIG main: Integer option quit_sync_provider = 0
2010-05-20 15:31:55 CEST CONFIG main: Integer option quit_sync_finalsync = 0
2010-05-20 15:31:55 CEST CONFIG main: Integer option sync_max_rowsize = 8192
2010-05-20 15:31:55 CEST CONFIG main: Integer option sync_max_largemem = 5242880
2010-05-20 15:31:55 CEST CONFIG main: Integer option remote_listen_timeout = 300
2010-05-20 15:31:55 CEST CONFIG main: Boolean option log_pid = 0
2010-05-20 15:31:55 CEST CONFIG main: Boolean option log_timestamp = 1
2010-05-20 15:31:55 CEST CONFIG main: Boolean option cleanup_deletelogs = 0
2010-05-20 15:31:55 CEST CONFIG main: Real option real_placeholder = 0.000000

But this node is a receiver, and I saw in the code of the function
generate_sync_event that it does not generate a SYNC event on a node
which is not the origin of a set. That's why I presume no SYNC is
created except the mandatory one created at startup in syncThread_main:

    /*
     * We don't initialize the last known action sequence to the actual
     * value. This causes us to always create a SYNC event on startup,
     * just in case.
     */
    last_actseq_buf[0] = '\0';

    /*
     * Build the query that starts a transaction and retrieves the last
     * value from the action sequence.
     */
    dstring_init(&query1);
    slon_mkquery(&query1,
                 "start transaction;"
                 "set transaction isolation level serializable;"
                 "select last_value from %s.sl_action_seq;",
                 rtcfg_namespace);

    /*
     * Build the query that calls createEvent() for the SYNC
     */
    dstring_init(&query2);
    slon_mkquery(&query2,
                 "select %s.createEvent('_%s', 'SYNC', NULL)"
                 " from %s.sl_node where no_id = %s.getLocalNodeId('_%s') "
                 " and exists (select 1 from %s.sl_set where set_origin = no_id);",
                 rtcfg_namespace, rtcfg_cluster_name,
                 rtcfg_namespace, rtcfg_namespace, rtcfg_cluster_name,
                 rtcfg_namespace);
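With the %s placeholders filled in for our cluster (schema _OURCLUSTER),
the test that query2 performs boils down to roughly this -- a minimal
sketch, with createEvent() replaced by no_id so it is safe to run. A row
comes back only if the local node is the origin of at least one set,
which is the only case where the SYNC thread would call createEvent():

    -- returns a row only when the local node originates a set
    select no_id
      from _OURCLUSTER.sl_node
     where no_id = _OURCLUSTER.getLocalNodeId('_OURCLUSTER')
       and exists (select 1 from _OURCLUSTER.sl_set
                    where set_origin = no_id);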
> Jan

>> select * from _OURCLUSTER.sl_event where ev_origin = 102;
>>
>>  ev_origin | ev_seqno |        ev_timestamp        |     ev_snapshot      | ev_type | ev_data1 | ev_data2 | ev_data3 | ev_data4 | ev_data5 | ev_data6 | ev_data7 | ev_data8
>> -----------+----------+----------------------------+----------------------+---------+----------+----------+----------+----------+----------+----------+----------+----------
>>        102 |       51 | 2010-05-20 12:27:00.099562 | 338318875:338318875: | SYNC    |          |          |          |          |          |          |          |
>> (1 row)
>>
>> select * from _OURCLUSTER.sl_confirm where con_origin = 102;
>>
>>  con_origin | con_received | con_seqno |       con_timestamp
>> ------------+--------------+-----------+----------------------------
>>         102 |          101 |        51 | 2010-05-20 12:27:02.78581
>>         102 |          103 |        51 | 2010-05-20 12:27:00.118815
>>         102 |          104 |        51 | 2010-05-20 12:27:00.253975
>>
>> The SYNC appears in the slony logs as "new sl_action_seq 1 - SYNC %d".
>>
>>> What should worry you is that there are no newer SYNC events from node 2
>>> available. Slony does create a sporadic SYNC every now and then, even if
>>> there is no activity or the node isn't an origin anyway.
>>>
>>> Is it possible that node 2's clock is way off?
>>>
>>> Jan
>>>
>>>> The reason I think this _might_ be a bug is that on both clusters, the
>>>> slave node's sl_event has the exact same record for ev_seqno=5000000002
>>>> except for the timestamp; same origin, and same snapshot!
>>>>
>>>> The head of sl_confirm has:
>>>>
>>>> select * from sl_confirm order by con_seqno;
>>>>
>>>>  con_origin | con_received | con_seqno  |       con_timestamp
>>>> ------------+--------------+------------+----------------------------
>>>>           2 |            1 | 5000000002 | 2010-04-30 08:32:53.974021
>>>>           1 |            2 | 5000527075 | 2010-05-12 14:15:41.192279
>>>>           1 |            2 | 5000527076 | 2010-05-12 14:15:43.193607
>>>>           1 |            2 | 5000527077 | 2010-05-12 14:15:45.196291
>>>>           1 |            2 | 5000527078 | 2010-05-12 14:15:47.197005
>>>> ...
>>>>
>>>> Can someone comment on the health of the cluster? All events, except for
>>>> that one, are being confirmed and purged from the system regularly, so my
>>>> assumption is that the cluster is healthy and that the slave is in sync
>>>> with the master.
>>>>
>>>> Thanks in advance.
>>>> --
>>>> gurjeet.singh
>>>> @ EnterpriseDB - The Enterprise Postgres Company
>>>> http://www.enterprisedb.com
>>>>
>>>> singh.gurjeet@{ gmail | yahoo }.com
>>>> Twitter/Skype: singh_gurjeet
>>>>
>>>> Mail sent from my BlackLaptop device
>>>>
>>>> _______________________________________________
>>>> Slony1-general mailing list
>>>> Slony1-general at lists.slony.info
>>>> http://lists.slony.info/mailman/listinfo/slony1-general

--
Cyril SCETBON
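The staleness check behind the test_slony_state.pl warning can be
approximated by hand. A rough sketch (not the script's exact query),
against the same _OURCLUSTER schema: list, per origin, the newest event
still held in sl_event and its age. Since Slony keeps at least the last
event per origin around, a large age is normal for a node that never
originates SYNCs -- which is exactly the ev_seqno=5000000002 case above.

    -- newest retained event per origin, and how long ago it was created
    select ev_origin,
           max(ev_seqno)             as last_seqno,
           now() - max(ev_timestamp) as age
      from _OURCLUSTER.sl_event
     group by ev_origin
     order by ev_origin;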