lkv at defx.org lkv
Thu Jul 6 09:29:25 PDT 2006
Hi guys,

I'm trying to figure out why one of my cascaded replicas has suddenly
started falling behind. The setup has 3 nodes, and the database is not
very big (roughly 600 MB to 1 GB):

origin(1) <-> subscriber1(3) <-> subscriber2(2)

The origin and subscriber2 are on DSL; subscriber1 is a colocated server.

sl_path looks like this:

 pa_server | pa_client |   pa_conninfo   | pa_connretry 
-----------+-----------+-----------------+--------------
         3 |         1 | service=cmsb-fx |           10
         1 |         3 | service=cmsa-fx |           10
         2 |         3 | service=cmsj-fx |           10
         3 |         2 | service=cmsb-fx |           10

And sl_listen:

 li_origin | li_provider | li_receiver 
-----------+-------------+-------------
         1 |           1 |           3
         1 |           3 |           2
         3 |           3 |           1
         3 |           3 |           2
         2 |           3 |           1
         2 |           2 |           3
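
To rule out a routing problem, the two tables above can be cross-checked
with a small script. This is a rough sketch, not slonik's own validation,
and it assumes the usual Slony-I semantics: a listen row
(li_origin, li_provider, li_receiver) means the receiver pulls that
origin's events from the provider, so there must be an sl_path row with
pa_server = provider and pa_client = receiver. The helper name
find_listen_problems is hypothetical. For the rows pasted here it should
report no gaps.

```python
from itertools import product

# Rows copied from the sl_path output above: (pa_server, pa_client)
SL_PATH = [(3, 1), (1, 3), (2, 3), (3, 2)]

# Rows copied from the sl_listen output above:
# (li_origin, li_provider, li_receiver)
SL_LISTEN = [(1, 1, 3), (1, 3, 2), (3, 3, 1), (3, 3, 2), (2, 3, 1), (2, 2, 3)]


def find_listen_problems(paths, listens):
    """Hypothetical sanity check of listen rows against path rows."""
    problems = []
    path_set = set(paths)
    nodes = {node for pair in paths for node in pair}
    # Assumption: the receiver connects to the provider, so it needs a
    # path whose server is the provider and whose client is the receiver.
    for origin, provider, receiver in listens:
        if (provider, receiver) not in path_set:
            problems.append(
                f"node {receiver} listens to node {provider} for origin "
                f"{origin} but has no sl_path entry to reach node {provider}"
            )
    # Every node should hear about events from every other node.
    heard = {(origin, receiver) for origin, _, receiver in listens}
    for origin, receiver in product(sorted(nodes), repeat=2):
        if origin != receiver and (origin, receiver) not in heard:
            problems.append(
                f"node {receiver} never listens for events from node {origin}"
            )
    return problems


if __name__ == "__main__":
    found = find_listen_problems(SL_PATH, SL_LISTEN)
    print("\n".join(found) if found else "no listen/path gaps found")
```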

For some time now, I've been seeing this (from sl_status):

 st_origin | st_received | st_lag_num_events |   st_lag_time   
-----------+-------------+-------------------+-----------------
         1 |           2 |               158 | 04:39:19.80944
         1 |           3 |                 0 | 00:01:18.545123

(This was queried on the origin.)
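
The same lag can be watched automatically. Below is a minimal sketch that
parses the sl_status rows above and flags receivers whose
st_lag_num_events exceeds a threshold. The threshold of 100 comes from the
behaviour described further down (the subscriber never catches up once it
is past that); the helper names are hypothetical, and the parser only
handles intervals printed as HH:MM:SS[.ffffff], not ones that include a
day count.

```python
import re
from datetime import timedelta

# Rows copied from the sl_status output above:
# (st_origin, st_received, st_lag_num_events, st_lag_time)
SL_STATUS = [
    (1, 2, 158, "04:39:19.80944"),
    (1, 3, 0, "00:01:18.545123"),
]

# Hypothetical alert threshold, taken from the "over 100" observation.
LAG_EVENT_THRESHOLD = 100

_INTERVAL_RE = re.compile(r"^(\d+):(\d{2}):(\d{2}(?:\.\d+)?)$")


def parse_lag_time(text):
    """Parse an interval printed as HH:MM:SS[.ffffff] into a timedelta."""
    match = _INTERVAL_RE.match(text)
    if match is None:
        raise ValueError(f"unsupported interval format: {text!r}")
    hours, minutes, seconds = match.groups()
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds))


def lagging_receivers(rows, threshold=LAG_EVENT_THRESHOLD):
    """Return (origin, receiver, lag) for rows whose event lag exceeds threshold."""
    return [
        (origin, receiver, parse_lag_time(lag_time))
        for origin, receiver, lag_events, lag_time in rows
        if lag_events > threshold
    ]


if __name__ == "__main__":
    for origin, receiver, lag in lagging_receivers(SL_STATUS):
        print(f"node {receiver} is behind origin {origin} by {lag}")
```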

It looks to me like events from node 1 are not being applied on node 2.
The settings are:

[..]
sync_interval=60000
sync_interval_timeout=1000
sync_group_maxsize=1000
desired_sync_time=60000
[..]

I'm using Slony-I 1.1.5.

Once the lag goes over 100 events, it doesn't catch up, and I have to
drop and re-add the subscriber. I tried Rod's patch, but that didn't
speed things up either.

This whole setup had been working fine for 5-6 months, and then it
suddenly started acting weird.

Any clues? Any parameters I can tweak?

Thanks in advance,
l