[Slony1-general] proper procedure for re-starting slony after replication slave reboots

Thu Feb 21 05:56:13 PST 2008

Andrew Sullivan wrote:
> On Wed, Feb 20, 2008 at 02:55:32PM -0500, Geoffrey wrote:
>>> 1.  Somewhere, your application or some person got in and removed (or maybe
>>> renamed and re-created) a table that was referenced by _something_ that was
>>> still open.
>> The only tables that could possibly be removed would be temp tables.  I 
>> assure you, none of the tables that are being replicated are being 
>> removed by anyone.  The application is not designed that way.
> 
> Just because an application isn't designed to do something doesn't mean it
> never does ;-)

Agreed, but given the way this data is used, I know this is not 
happening.  Our application is constantly touching virtually every 
table, so if one were ever dropped, the data integrity issue would be 
immediately apparent.

> Temp tables could indeed cause the message in question, but
> _only if_ something was looking for that temp table.  (A temp table created
> by a stored procedure without EXECUTE would fall into this case, for
> instance, because the plans are cached.)

This could be the case, but I don't know.  I've asked others on 
the team to verify.  The question is whether this error is related to 
our Slony problems.  We've only seen this error following the initial 
replication of data via Slony, so I suspect it's a Slony issue, but I 
don't know if it's related to our data problem.

>>> 2.  Slony was dropped from the node without some set of your connections
>>> having disconnected, and they're still expecting the triggers they can
>>> still see to be able to write into that table.
>> Can you define 'dropped from the node'?
> 
> Somehow, that node stopped being a Slony replica, and so the Slony schema
> was removed.

I would expect to see this blatantly noted in the log files.

> Someone attempted to insert something into a replicated table
> (or delete something, or update something), and the trigger fired without
> the underlying table into which to insert being there.  If someone had
> superuser permission on the database, and was fooling with the underlying
> Slony tables, for instance, all bets are off.  I have seen bigger messes
> created by fat fingers.

I'm the only person who has access to the replication server, so I don't 
believe this is the issue.  I didn't even start looking at any of the 
Slony schema data until the problem appeared.

>> I simply don't understand how one table in particular could get so far 
>> out of sync.  We're talking 300 records.
> 
> Yes.  Note, however, that 300 records could be just a couple of SYNCs, if
> the failure happened at just the right moment.

Understood.  The difference of 300 dropped to around 40 last night, 
but this morning it is back up over 250.
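
To see which rows differ rather than just how many, one option would be 
to pull the primary keys for the table from both nodes and diff them.  
A minimal sketch, assuming the keys have already been fetched into 
Python lists; the function name and the sample keys are made up for 
illustration and are not taken from our schema:

```python
def diff_keys(origin_keys, replica_keys):
    """Return (missing_on_replica, extra_on_replica) as sorted lists."""
    origin = set(origin_keys)
    replica = set(replica_keys)
    # Keys present on the origin but not yet on the replica.
    missing = sorted(origin - replica)
    # Keys present on the replica but absent from the origin.
    extra = sorted(replica - origin)
    return missing, extra


if __name__ == "__main__":
    # Hypothetical sample data standing in for the result of selecting
    # the primary key column from the same table on each node.
    origin_ids = [1, 2, 3, 4, 5]
    replica_ids = [1, 2, 4]
    missing, extra = diff_keys(origin_ids, replica_ids)
    print("missing on replica:", missing)
    print("extra on replica:", extra)
```

Running this a few times over the day would show whether the missing 
keys are only recent ones (the replica is lagging) or whether older 
rows stay missing.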

-- 
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
  - Benjamin Franklin