[Slony1-general] Buffering problem

Mon Sep 19 19:35:28 PDT 2005

On 9/19/2005 2:11 PM, Philip Warner wrote:

> Jan Wieck wrote:
> 
>>
>> You didn't misread the code. It indeed buffers based on a compiled-in
>> number of rows only and doesn't take the size into account at all. So
>> yes, the fetching thread needs to stop if the buffer grows too large.
>> Since it does block if all buffers are filled, that part wouldn't be
>> too complicated.
>>
>> What gets complicated is the fact that the buffer never shrinks! All
>> the buffer lines stay allocated and eventually get enlarged until slon
>> exits. So even if you stop fetching after you hit large rows, slowly
>> over time all buffer lines will get adjusted to that huge size. On
>> some operating systems (libc implementations to be precise) free()
>> isn't a solution here as it never returns memory to the OS, but keeps
>> the pages for future malloc()s.
> 
> Well, it would help, wouldn't it? If in one pass, row(1) had 37MB
> allocated, and in another pass row(2) wanted 37MB, at least another 37MB
> would not be grabbed from the OS -- the freed block would be available.
> 
>> The best way to tackle that would IMHO be to allow only certain buffer
>> lines to be used for huge rows and block if none of them is available.
> 
> Wouldn't this lead to ordering problems?
> 
> What about defining a MAX_ROW_BUFFER which represents the maximum
> allowed to be permanently allocated to command data fetched from the
> log. Then, only fetch cmddata for log rows up to this size. For rows
> larger than this, retrieve the PK and store in the list. When the item
> is to be processed, retrieve the cmddata directly using the PK.

That would create quite a nightmare in the thread coordination. The
thread that does the fetch would then need to be told by the thread
that does the apply to go and get a specific row instead.

What you could do to keep it simple is go with a free() approach:
free() buffers that are over a certain size after they are applied, and
have the fetch thread wait if the buffered amount exceeds your limit. In
addition, you probably want to make the initial fetch size a config
parameter and make the actual number of fetched rows depend on the
buffer's fill level, so to speak. The fuller the buffer is, the fewer
rows to fetch, in order to avoid "fetching 100 50M rows at once" by
surprise.
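
A rough sketch of that bookkeeping, just to illustrate the idea. None of
this is existing slon code: the struct names, the function names and the
three limits are all made up for the example, and the locking and
condition-variable wait between the fetch and apply threads is left out.
A return value of 0 from next_fetch_rows() stands for "fetch thread
waits".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical limits; in practice these would be config parameters. */
#define KEEP_LIMIT     (8u * 1024u)           /* lines larger than this are freed after apply */
#define MAX_BUFFERED   (64u * 1024u * 1024u)  /* fetch thread waits at or above this many bytes */
#define MAX_FETCH_ROWS 100                    /* rows per fetch when the buffer is empty */

typedef struct {
    char   *data;   /* row data, NULL if nothing is allocated */
    size_t  alloc;  /* bytes allocated for data */
    size_t  used;   /* bytes of the buffered row, 0 if the line is free */
} BufferLine;

typedef struct {
    size_t buffered;  /* total bytes fetched but not yet applied */
} BufferPool;

/* Fetch side: reserve room for one row, growing the line if needed.
 * Returns 0 on success, -1 if the allocation fails. */
int line_fill(BufferPool *pool, BufferLine *line, size_t row_len)
{
    if (row_len > line->alloc) {
        char *p = realloc(line->data, row_len);
        if (p == NULL)
            return -1;
        line->data = p;
        line->alloc = row_len;
    }
    line->used = row_len;
    pool->buffered += row_len;
    return 0;
}

/* Apply side: call after the row has been applied. Oversized lines are
 * free()d so they do not stay at the huge size forever. */
void line_release(BufferPool *pool, BufferLine *line)
{
    assert(pool->buffered >= line->used);
    pool->buffered -= line->used;
    line->used = 0;
    if (line->alloc > KEEP_LIMIT) {
        free(line->data);
        line->data = NULL;
        line->alloc = 0;
    }
}

/* Fetch side: number of rows to ask for next. Scales down linearly as
 * the buffer fills; 0 means the fetch thread should wait. */
int next_fetch_rows(const BufferPool *pool)
{
    size_t room, slice;
    int rows;

    if (pool->buffered >= MAX_BUFFERED)
        return 0;
    room = MAX_BUFFERED - pool->buffered;
    slice = MAX_BUFFERED / MAX_FETCH_ROWS;   /* bytes of room per row */
    rows = (int)(room / slice);
    if (rows > MAX_FETCH_ROWS)
        rows = MAX_FETCH_ROWS;
    return rows < 1 ? 1 : rows;
}
```

The fetch thread would call next_fetch_rows() before each FETCH and
line_fill() per row; the apply thread would call line_release() after
each row. Whether free() actually hands the memory back to the OS still
depends on the libc, as discussed above.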

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #