Brian Fehrle brianf at consistentstate.com
Fri Jan 27 10:18:49 PST 2012
>>>> but ... isn't it slony which should not use more than
>>>> default_stack_size? Can't there be an underlying bug?
>>> If slony is leaking memory, or if the compression routine for the
>>> snapshot IDs isn't working properly, then it is a bug.  I haven't seen
>>> any evidence of this (nor have I analyzed the entire contents of his
>>> sl_event to figure out if that is the case).
>>>
>>> If a single SYNC group really had so many active xids that the
>>> resulting text exceeded what can be passed to a function under the
>>> default stack size, then this isn't a bug.
>>>
>>> In 2.2, on a failed SYNC, slon should now dynamically shrink the SYNC
>>> group size until it works (or reaches a size of 1).
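[For illustration, a minimal C sketch (not Slony source) of the retry
behavior described above. apply_sync_group() is a hypothetical stand-in
for the slon worker's grouped apply, and the size-4 failure threshold
is invented for the demo.]

/* Sketch: on a failed SYNC, halve the group size and retry until the
 * group applies or we are down to a single SYNC. */
#include <stdbool.h>
#include <stdio.h>

/* hypothetical stand-in: pretend groups larger than 4 overflow */
static bool apply_sync_group(int group_size)
{
    return group_size <= 4;
}

static bool sync_with_shrinking(int group_size)
{
    for (;;) {
        if (apply_sync_group(group_size))
            return true;            /* group applied cleanly */
        if (group_size == 1)
            return false;           /* even a single SYNC fails */
        group_size /= 2;            /* shrink and retry */
        fprintf(stderr, "SYNC failed, retrying with group size %d\n",
                group_size);
    }
}

int main(void)
{
    return sync_with_shrinking(100) ? 0 : 1;
}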
>>>
>> Very cool.
>>
>> Unfortunately I've now removed my logs due to space issues. But one
>> thing that concerns me is that I had two slave nodes that were both
>> behind the master at the same SYNC event. One node was on PostgreSQL
>> 9.1.2 (the one that hit this issue), and the other was on 8.4.9. When
>> I brought the daemon for the 8.4.9 node online, it synced up without
>> this issue, while the 9.1.2 node still had it. Both instances had the
>> same value for max_stack_depth.
> A different thing troubles me...
>
> The point of the "compress" step is to collapse runs of sequential
> transaction ID values into ranges, and that depends on the values
> being returned in sorted order so that consecutive values can be
> recognized as a run and compressed together.
>
> It seems as though the query is no longer returning the values in
> sorted order, which seems like a problem.
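
[For illustration, a minimal C sketch (not Slony's actual code) of the
run compression described above, showing why ordering matters: the same
xids collapse into "lo:hi" ranges when sorted, but barely compress at
all when returned out of order.]

#include <stdio.h>

/* Compress consecutive runs in a list of xids into "lo:hi" ranges.
 * Only adjacent values form runs, so unsorted input defeats it. */
static void compress_xids(const unsigned long *xids, int n)
{
    int i = 0;
    while (i < n) {
        int j = i;
        /* extend the run while the next value is consecutive */
        while (j + 1 < n && xids[j + 1] == xids[j] + 1)
            j++;
        if (j > i)
            printf("%lu:%lu ", xids[i], xids[j]);
        else
            printf("%lu ", xids[i]);
        i = j + 1;
    }
    printf("\n");
}

int main(void)
{
    unsigned long sorted[]   = {100, 101, 102, 103, 200, 201, 300};
    unsigned long unsorted[] = {102, 100, 103, 101, 300, 200, 201};

    compress_xids(sorted, 7);    /* prints: 100:103 200:201 300 */
    compress_xids(unsorted, 7);  /* prints: 102 100 103 101 300 200:201 */
    return 0;
}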

I may have another opportunity to see a large sync like this happen on 
my systems. If so, I'll keep the logs and see what I can find in terms 
of the compression.

- Brian F
