[Slony1-general] proper procedure for re-starting slony after replication slave reboots

Wed Feb 20 15:12:33 PST 2008

On 2/18/2008 4:46 PM, Geoffrey wrote:
> Christopher Browne wrote:
>> Geoffrey <lists at serioustechnology.com> writes:
>>> I want to make sure I have a good handle on this issue.  We currently
>>> have a master/slave configuration.  In the event the slave must be
>>> rebooted, what are the proper steps to ensure that slony picks up from
>>> where it left off?
>>>
>>> What we are currently doing is simply restarting the slon daemons for
>>> each database.
>> 
>> That seems apropos.
>> 
>>> For the most part this appears to work, but what concerns me is that
>>> I have one table on one database where the number of records on the
>>> master node has not increased in a while and the slave does not
>>> appear to be 'trying to catch up.'  That is to say, the slave has
>>> fewer records in that table than the master and the slave table is
>>> not growing.
>> 
>> Well, then "get thee to the slon logs..."
>> 
>> -> Do they indicate, for the subscriber, that errors are being experienced?
> 
> Not that I can tell.  That is the first place I looked.
> 
>> -> Do they indicate that SYNCs are being processed, and data applied?
> 
> SYNCs are being processed, but the log says there is nothing to sync:
> 
> 2008-02-18 16:39:27 EST DEBUG2 localListenThread: Received event 1,10345 SYNC
> 2008-02-18 16:39:32 EST DEBUG2 remoteListenThread_2: queue event 2,7343 SYNC
> 2008-02-18 16:39:32 EST DEBUG2 remoteWorkerThread_2: Received event 2,7343 SYNC
> 2008-02-18 16:39:32 EST DEBUG2 remoteWorkerThread_2: SYNC 7343 processing
> 2008-02-18 16:39:32 EST DEBUG2 remoteWorkerThread_2: no sets need syncing for this event

This is the slon log of node 1. Unless node 1 is the lagging subscriber 
(which I actually don't expect), this isn't telling us much.

The question is whether the slon for the subscriber is running and 
whether that one's log shows any errors.
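
A minimal sketch of that check, assuming the subscriber's slon log uses
the same line layout as the excerpt quoted above (date, time, timezone,
level, thread name, message). The function name summarize_slon_log and
the choice to treat WARN, ERROR and FATAL as problem levels are
illustrative assumptions, not part of Slony-I itself. It counts lines
per level, collects the problem lines, and counts "SYNC n processing"
messages from remote worker threads:

```python
import re
from collections import Counter

# Assumed line layout, taken from the log excerpt in this thread:
#   <date> <time> <tz> <LEVEL> <thread>: <message>
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<tz>\S+) "
    r"(?P<level>[A-Z]+\d?) (?P<thread>\w+): (?P<msg>.*)$"
)

# Assumption: these levels are the ones worth a closer look.
PROBLEM_LEVELS = {"WARN", "ERROR", "FATAL"}

# Matches messages such as "SYNC 7343 processing".
SYNC_PROCESSING_RE = re.compile(r"SYNC \d+ processing")


def summarize_slon_log(lines):
    """Summarize an iterable of slon log lines.

    Returns a dict with a per-level line count, the list of lines at a
    problem level, and the number of SYNC events a remote worker thread
    started processing. Lines that do not match the assumed layout are
    skipped.
    """
    levels = Counter()
    problems = []
    syncs_processing = 0
    for raw in lines:
        line = raw.strip()
        m = LINE_RE.match(line)
        if m is None:
            continue
        levels[m["level"]] += 1
        if m["level"] in PROBLEM_LEVELS:
            problems.append(line)
        if (m["thread"].startswith("remoteWorkerThread")
                and SYNC_PROCESSING_RE.search(m["msg"])):
            syncs_processing += 1
    return {
        "levels": levels,
        "problems": problems,
        "syncs_processing": syncs_processing,
    }


# The log lines quoted earlier in this thread, unwrapped.
SAMPLE = [
    "2008-02-18 16:39:27 EST DEBUG2 localListenThread: Received event 1,10345 SYNC",
    "2008-02-18 16:39:32 EST DEBUG2 remoteListenThread_2: queue event 2,7343 SYNC",
    "2008-02-18 16:39:32 EST DEBUG2 remoteWorkerThread_2: Received event 2,7343 SYNC",
    "2008-02-18 16:39:32 EST DEBUG2 remoteWorkerThread_2: SYNC 7343 processing",
    "2008-02-18 16:39:32 EST DEBUG2 remoteWorkerThread_2: no sets need syncing for this event",
]

if __name__ == "__main__":
    print(summarize_slon_log(SAMPLE))
```

Fed the subscriber's log, an empty problem list together with a growing
count of "SYNC n processing" lines would suggest that the daemon is
running and working through events. Lines wrapped by a mail client will
not match the pattern and are skipped.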

Jan

> 
>> -> Is the subscriber in question behind (according to the view
>>    sl_status) by an increasing amount of time?
> 
> I'm not sure what I'm looking for here.  From the slave:
> 
> master=# select * from  _master_cluster.sl_status;
>  st_origin | st_received | st_last_event |      st_last_event_ts      | st_last_received |    st_last_received_ts     | st_last_received_event_ts  | st_lag_num_events | st_lag_time
> -----------+-------------+---------------+----------------------------+------------------+----------------------------+----------------------------+-------------------+-------------
>          2 |           1 |          7377 | 02/18/2008 16:45:11.298502 |             7377 | 02/18/2008 16:45:11.728581 | 02/18/2008 16:45:11.298502 |                 0 | @ 4.40 secs
> 
> 
>> Error messages in the slon logs should give some idea of what is going on.
> 
> 

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin