Thu May 20 07:48:26 PDT 2010
Jan Wieck wrote:
> On 5/20/2010 9:35 AM, Cyril Scetbon wrote:
>
>> Jan Wieck wrote:
>>
>>> On 5/12/2010 10:31 AM, Gurjeet Singh wrote:
>>>
>>>
>>>> Hi All,
>>>>
>>>> I have two Slony test beds which show the exact same symptoms!
>>>>
>>>> select * from sl_event order by ev_seqno;
>>>>
>>>>  ev_origin |  ev_seqno  |        ev_timestamp        |     ev_snapshot     | ev_type |
>>>> -----------+------------+----------------------------+---------------------+---------+-
>>>>          2 | 5000000002 | 2010-04-30 08:32:38.622928 | 458:458:            | SYNC    |
>>>>          1 | 5000525721 | 2010-05-12 13:30:22.79626  | 72685915:72685915:  | SYNC    |
>>>>          1 | 5000525722 | 2010-05-12 13:30:24.800943 | 72686139:72686139:  | SYNC    |
>>>>          1 | 5000525723 | 2010-05-12 13:30:26.804862 | 72686224:72686224:  | SYNC    |
>>>> ...
>>>>
>>>>
>>>>
>>> Slony always keeps at least the last event per origin around. Otherwise
>>> the view sl_status would not work.
>>>
>>>
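(That would explain it: if I understand sl_status correctly, it joins the
newest sl_event row per origin against sl_confirm to compute the lag, so at
least one event per origin has to survive cleanup. A rough sketch of the kind
of query I mean, not the actual view definition:

    select e.ev_origin,
           c.con_received,
           max(e.ev_seqno)   - max(c.con_seqno)     as lag_num_events,
           current_timestamp - max(c.con_timestamp) as lag_time
      from _OURCLUSTER.sl_event e
      join _OURCLUSTER.sl_confirm c on c.con_origin = e.ev_origin
     group by e.ev_origin, c.con_received;

If the last event of an origin were purged, that join would return no rows
for it.)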
>> Hi Jan, can you say more about that? I posted a mail to slony1-bugs
>> today because test_slony_state.pl is warning us about old events
>> (exactly those oldest ones). This concerns events generated by the
>> local node: I only see events from the local node when I restart it:
>>
>
> I presume that you have set sync_interval_timeout to zero on the
> subscribers, which will prevent the generation of SYNC events on those
> nodes because no actual replication work is ever generated there. It
> looks like test_slony_state.pl depends on that parameter being non-zero
> (the default is -t 10000, meaning a SYNC every 10 seconds).
>
No, as you can see:

2010-05-20 15:31:55 CEST CONFIG slon: watchdog ready - pid = 23457
2010-05-20 15:31:55 CEST CONFIG main: Integer option vac_frequency = 3
2010-05-20 15:31:55 CEST CONFIG main: Integer option log_level = 2
2010-05-20 15:31:55 CEST CONFIG main: Integer option sync_interval = 500
2010-05-20 15:31:55 CEST CONFIG main: Integer option sync_interval_timeout = 10000
2010-05-20 15:31:55 CEST CONFIG main: Integer option sync_group_maxsize = 20
2010-05-20 15:31:55 CEST CONFIG main: Integer option desired_sync_time = 60000
2010-05-20 15:31:55 CEST CONFIG main: Integer option syslog = 0
2010-05-20 15:31:55 CEST CONFIG main: Integer option quit_sync_provider = 0
2010-05-20 15:31:55 CEST CONFIG main: Integer option quit_sync_finalsync = 0
2010-05-20 15:31:55 CEST CONFIG main: Integer option sync_max_rowsize = 8192
2010-05-20 15:31:55 CEST CONFIG main: Integer option sync_max_largemem = 5242880
2010-05-20 15:31:55 CEST CONFIG main: Integer option remote_listen_timeout = 300
2010-05-20 15:31:55 CEST CONFIG main: Boolean option log_pid = 0
2010-05-20 15:31:55 CEST CONFIG main: Boolean option log_timestamp = 1
2010-05-20 15:31:55 CEST CONFIG main: Boolean option cleanup_deletelogs = 0
2010-05-20 15:31:55 CEST CONFIG main: Real option real_placeholder = 0.000000
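
(For reference, those two settings could equally be set in a slon.conf; a
minimal sketch, assuming the parameter names match the CONFIG lines above:

    # slon.conf sketch -- parameter names taken from the CONFIG log above
    sync_interval = 500            # check for new replication work every 500 ms
    sync_interval_timeout = 10000  # force a SYNC at least every 10 s; 0 disables it

)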
But this node is a receiver, and I saw in the code of the function
generate_sync_event that it does not generate a SYNC event on a node
that is not the origin of a set. That's why I presume no SYNC is
created except the mandatory one created at startup in syncThread_main:
    /*
     * We don't initialize the last known action sequence to the actual value.
     * This causes that we create a SYNC event always on startup, just in
     * case.
     */
    last_actseq_buf[0] = '\0';

    /*
     * Build the query that starts a transaction and retrieves the last value
     * from the action sequence.
     */
    dstring_init(&query1);
    slon_mkquery(&query1,
                 "start transaction;"
                 "set transaction isolation level serializable;"
                 "select last_value from %s.sl_action_seq;",
                 rtcfg_namespace);

    /*
     * Build the query that calls createEvent() for the SYNC
     */
    dstring_init(&query2);
    slon_mkquery(&query2,
                 "select %s.createEvent('_%s', 'SYNC', NULL)"
                 " from %s.sl_node where no_id = %s.getLocalNodeId('_%s') "
                 " and exists (select 1 from %s.sl_set where set_origin = no_id);",
                 rtcfg_namespace, rtcfg_cluster_name,
                 rtcfg_namespace, rtcfg_namespace,
                 rtcfg_cluster_name, rtcfg_namespace);
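
For our cluster (named OURCLUSTER, hence the _OURCLUSTER schema) that second
query expands to roughly the following; the EXISTS clause seems to be what
prevents periodic SYNCs on a node that is not the origin of a set:

    select _OURCLUSTER.createEvent('_OURCLUSTER', 'SYNC', NULL)
      from _OURCLUSTER.sl_node
     where no_id = _OURCLUSTER.getLocalNodeId('_OURCLUSTER')
       and exists (select 1 from _OURCLUSTER.sl_set
                    where set_origin = no_id);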
>
> Jan
>
>
>> select * from _OURCLUSTER.sl_event where ev_origin = 102;
>>  ev_origin | ev_seqno |        ev_timestamp        |     ev_snapshot      | ev_type | ev_data1 | ev_data2 | ev_data3 | ev_data4 | ev_data5 | ev_data6 | ev_data7 | ev_data8
>> -----------+----------+----------------------------+----------------------+---------+----------+----------+----------+----------+----------+----------+----------
>>        102 |       51 | 2010-05-20 12:27:00.099562 | 338318875:338318875: | SYNC    |          |          |          |          |          |          |          |
>> (1 row)
>>
>> select * from _OURCLUSTER.sl_confirm where con_origin=102;
>> con_origin | con_received | con_seqno | con_timestamp
>> ------------+--------------+-----------+----------------------------
>> 102 | 101 | 51 | 2010-05-20 12:27:02.78581
>> 102 | 103 | 51 | 2010-05-20 12:27:00.118815
>> 102 | 104 | 51 | 2010-05-20 12:27:00.253975
>>
>> The SYNC appears in the slon logs as "new sl_action_seq 1 - SYNC %d".
>>
>>
>>> What should worry you is that there are no newer SYNC events from node 2
>>> available. Slony does create a sporadic SYNC every now and then even if
>>> there is no activity or the node isn't an origin anyway.
>>>
>>> Is it possible that node 2's clock is way off?
>>>
>>>
>>> Jan
>>>
>>>
>>>
>>>> The reason I think this _might_ be a bug is that on both clusters the
>>>> slave node's sl_event has the exact same record for ev_seqno=5000000002,
>>>> except for the timestamp: same origin and same snapshot!
>>>>
>>>> The head of sl_confirm has:
>>>>
>>>> select * from sl_confirm order by con_seqno;
>>>>
>>>> con_origin | con_received | con_seqno | con_timestamp
>>>> ------------+--------------+------------+----------------------------
>>>> 2 | 1 | 5000000002 | 2010-04-30 08:32:53.974021
>>>> 1 | 2 | 5000527075 | 2010-05-12 14:15:41.192279
>>>> 1 | 2 | 5000527076 | 2010-05-12 14:15:43.193607
>>>> 1 | 2 | 5000527077 | 2010-05-12 14:15:45.196291
>>>> 1 | 2 | 5000527078 | 2010-05-12 14:15:47.197005
>>>> ...
>>>>
>>>> Can someone comment on the health of the cluster? All events, except for
>>>> that one, are being confirmed and purged from the system regularly, so my
>>>> assumption is that the cluster is healthy and that the slave is in sync
>>>> with the master.
>>>>
>>>> Thanks in advance.
>>>> --
>>>> gurjeet.singh
>>>> @ EnterpriseDB - The Enterprise Postgres Company
>>>> http://www.enterprisedb.com
>>>>
>>>> singh.gurjeet@{ gmail | yahoo }.com
>>>> Twitter/Skype: singh_gurjeet
>>>>
>>>> Mail sent from my BlackLaptop device
>>>>
>>>>
>>>
>>>
>
>
>
--
Cyril SCETBON