[Slony1-general] Replication problem

Fri Dec 9 15:13:08 PST 2005

On 12/8/2005 10:41 PM, Peter Davie wrote:

> Hi Jan,
> 
> Just to add to the nightmare... Even though some queries are cursor 
> based, the *postgres process* can still run out of memory when 
> performing queries (I have seen this happen with slony).

Both problems, slon/postgres running out of memory, as well as slon 
terminating, are addressed in HEAD (which will become 1.2).

Jan

> 
> Thanks,
> Peter
> 
> Jan Wieck wrote:
> 
>> On 12/8/2005 9:31 AM, cbbrowne at ca.afilias.info wrote:
>>
>>>> On 12/7/2005 9:23 PM, Peter Davie wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Using Slony1 version 1.1.0 at a customer site, the customer has had the
>>>>> slon daemons fall over on one of their slave servers (and didn't
>>>>> notice!) On restarting the slon processes, there is now an error being
>>>>> generated because it is attempting to malloc memory to record all of the
>>>>> outstanding transactions and the slon daemon is running out of memory.
>>>>> Is there any way forward to resolve this, or will I just have to
>>>>> uninstall the slave and resubscribe (which is my current plan).
>>>>
>>>>
>>>> This node must have been down for quite some time. A SYNC event in the
>>>> remote_worker queue takes about 200 bytes or so. How many million events
>>>> is this node behind? You could tell from looking at sl_status.
>>>>
>>>> And don't forget to VACUUM FULL ANALYZE that database after you've
>>>> dropped that node.
>>>
>>>
>>> Based on the symptoms, two things come to my mind:
>>>
>>> 1. Did the slon controlling the origin die? That would be the classic
>>> way for a SYNC to encompass a Very Long Period Of Time and hence a LOT of
>>> transactions.
>>
>>
>> That's not the case and it wouldn't cause the symptom observed. Unless 
>> there are large rows involved, the resulting, humungous sync chunk 
>> would just take a while, but since that operation is cursor driven 
>> even in 1.0, it won't cause slon to run out of memory.
>>
>>>
>>> There's a script in ~/tools that will generate SYNCs if you run it as a
>>> cron job. We run this in production so as to avoid this particular
>>> problem...
>>>
>>> 2. Is it possible that the subscriber is trying to process a whole bunch
>>> of SYNCs in one fell group?
>>>
>>> If you add the "-g 1" option, it'll go one SYNC at a time, which would
>>> somewhat alleviate the problem.
>>
>>
>> Would not be a problem either.
>>
>>
>> Jan
>>
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Slony1-general mailing list
> Slony1-general at gborg.postgresql.org
> http://gborg.postgresql.org/mailman/listinfo/slony1-general
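The sizing estimate quoted above (about 200 bytes per SYNC event in the remote_worker queue) can be turned into a rough back-of-envelope calculation. This is only a sketch: the 200-byte figure is the approximate number from the thread, the function names are hypothetical, and the lag values are made-up examples. The real lag would have to be read from sl_status.

```python
# Back-of-envelope estimate of slon memory needed to queue a SYNC backlog.
# ASSUMPTION: ~200 bytes per queued event, the approximate figure quoted
# in the thread; real usage will differ.
BYTES_PER_SYNC_EVENT = 200


def queue_memory_bytes(events_behind, bytes_per_event=BYTES_PER_SYNC_EVENT):
    """Approximate bytes needed to hold `events_behind` queued events."""
    if events_behind < 0:
        raise ValueError("events_behind must be non-negative")
    return events_behind * bytes_per_event


def format_mib(num_bytes):
    """Render a byte count as mebibytes with one decimal place."""
    return f"{num_bytes / (1024 * 1024):.1f} MiB"


if __name__ == "__main__":
    # Hypothetical lag values; substitute the lag reported by sl_status.
    for lag in (100_000, 1_000_000, 10_000_000):
        print(f"{lag:>10} events -> {format_mib(queue_memory_bytes(lag))}")
```

On this estimate, a backlog only starts to strain memory once the node is millions of events behind, which is why the question about the size of the lag matters.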

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #