Christopher Browne cbbrowne at ca.afilias.info
Wed Jul 29 13:12:55 PDT 2009
Quoting Jason Culverhouse <jason at merchantcircle.com>:
> I see these notices on node 20
> NOTICE:  Slony-I: log switch to sl_log_1 still in progress - sl_log_2 not truncated
> NOTICE:  Slony-I: log switch to sl_log_1 still in progress - sl_log_2 not truncated
> NOTICE:  Slony-I: log switch to sl_log_1 still in progress - sl_log_2 not truncated
>
> I see FETCH statements taking up 100% cpu on node 30
>
> Any Idea on what to do here?

This happens when Slony-I keeps noticing tuples lingering in sl_log_2, so it
cannot finish the log switch and truncate that table.

You should be able to check on node 20 and see that "select * from  
_myschema.sl_log_2 limit 1;" returns a tuple.
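To get a rough sense of how much is stuck, you can also count what is sitting
in both log tables.  This is just a sketch, assuming _myschema stands in for
your cluster's schema name as above (and note that count(*) on a big log
table can take a while):

    -- compare the backlog in the two log tables
    select 'sl_log_1' as log_table, count(*) as rows_waiting from _myschema.sl_log_1
    union all
    select 'sl_log_2', count(*) from _myschema.sl_log_2;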

I'd suggest running test_slony_state{-dbi}.pl to check the general
condition of things.  It'll doubtless gripe about there being a lot of
tuples in sl_log_2.

I'd *suspect* that a confirmation isn't getting back to node 20, so it  
doesn't know that those tuples can be deleted, and hence never gets  
around to truncating the log table.
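One way to check that on node 20 is to look at sl_confirm and see how
recently each node has confirmed events originating on node 20.  Again, a
sketch with _myschema assumed as the cluster schema:

    -- last event from node 20 that each node has confirmed, and when
    select con_received,
           max(con_seqno)     as last_confirmed_event,
           max(con_timestamp) as last_confirmed_at
      from _myschema.sl_confirm
     where con_origin = 20
     group by con_received
     order by con_received;

If one of the receivers shows a timestamp that is hours old, that's the
communication path to chase.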

The "failure of event communications" is the thing I'd expect to be  
most likely to be the *real* cause.

I could be wrong, but it's pretty easy to check (see the other email  
thread today where I pointed someone at running the state test tool),  
and it is hopefully easy to rectify if that's the case.

If communications are fine, then the question becomes: why aren't those
old tuples in sl_log_2 getting trimmed?  Answering that should help direct
the investigation, whatever the cause turns out to be.
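One way to put a number on that, on node 20 (same schema-name assumption as
before), is to compare the newest event node 20 has generated against the
slowest confirmation it has received back:

    -- how far the slowest node lags behind node 20's event stream
    select (select max(ev_seqno)
              from _myschema.sl_event
             where ev_origin = 20)
           - min(last_seqno) as events_behind
      from (select max(con_seqno) as last_seqno
              from _myschema.sl_confirm
             where con_origin = 20
             group by con_received) as per_node;

The cleanup thread can only trim log rows whose events every node has
confirmed, so a large gap here would explain why sl_log_2 never empties out.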

FYI, if some misconfiguration caused this and the truncate then comes
through (~10-15 minutes later), you may want to induce another log switch
fairly soon, since sl_log_1 probably has a lot of data crudding it up by
now.  Someone asked how to induce a switch just recently, so take a peek
at the recent archives for how to do that.
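In short, and assuming once more that _myschema is your cluster schema
(double-check the function against your Slony-I version's docs), it comes
down to calling the log switch function on the origin:

    -- ask Slony-I to start another log switch
    select _myschema.logswitch_start();

The new switch can't complete until the previously active log table has
drained and been truncated, so make sure the first switch has actually
finished before kicking off another.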



