chris cbbrowne at ca.afilias.info
Thu Jul 17 15:54:28 PDT 2008
chris <cbbrowne at ca.afilias.info> writes:
> Christopher Browne <cbbrowne at ca.afilias.info> writes:
>> Over lunch, Jan and I had a chat about this; it looks like the
>> information we report in the logs isn't comprehensive enough to make
>> it easy to interpret which parts of SYNC processing are consuming
>> what time.
>>
>> The "straw man" idea we came up with is to do a much better breakdown
>> of the time, in particular, to record:
>>
>>  - time spent in PQexec() against the provider, broken down into...
>>     - time spent determining which transactions are part of the SYNC group
>>     - time spent processing the LOG cursor
>>  - time spent in PQexec() against the subscriber (the I/U/D phase)
>>  - the number of PQexec() calls
>>     against the provider
>>     against the subscriber
>>  - possibly, the number of times we grab timestamps
>
> And, let us augment this with the number of "large tuple" fetches...
> That will be really cheap from a gettimeofday() perspective, but it
> gives us a good idea of how much flow interruption takes place.

To better explain the above: at present, we wind up doing quite a lot
of deep guesstimating in order to infer where performance bottlenecks
may lie.

If we report these various values, namely:
 - How many PQexec() calls were made, and
 - How long those calls took

  a) Against the data provider, which tells us how expensive it was to PULL
     the sync,

  b) Against the subscriber, which tells us how expensive it was to load the
     data in there, and

  c) For large tuples that break up the usual "do 100 rows at a time"
     behaviour,

then that may be expected to allow people to determine MUCH more
readily where the bottlenecks are.  There tend to be three
characteristic ones:

  1.  The data provider may get overloaded.  "Why" is another question :-).

  2.  The subscriber may be less well appointed, and it might be getting
      overloaded.

  3.  The problem might be with the network connection; the slon may be too
      far away from, well, something...

With this data, it should be easier to distinguish between these
scenarios.
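
To illustrate the idea concretely, here is a minimal sketch (not the
actual patch; the names perf_mon and timed_exec() are invented for the
example) of a counter structure plus a PQexec() wrapper that
accumulates wall-clock time via gettimeofday():

/* Hypothetical sketch -- names invented for illustration. */
#include <sys/time.h>
#include <libpq-fe.h>

typedef struct
{
	double	provider_query_t;	/* seconds in PQexec() on the provider */
	double	subscriber_query_t;	/* seconds in PQexec() on the subscriber */
	int		provider_queries;	/* number of PQexec() calls, provider */
	int		subscriber_queries;	/* ... and subscriber */
	int		large_tuples;		/* fetches that broke the 100-row flow */
} perf_mon;

/* Run a query, adding the elapsed time to *elapsed and bumping *count. */
static PGresult *
timed_exec(PGconn *conn, const char *query, double *elapsed, int *count)
{
	struct timeval	tv1,
					tv2;
	PGresult	   *res;

	gettimeofday(&tv1, NULL);
	res = PQexec(conn, query);
	gettimeofday(&tv2, NULL);

	*elapsed += (tv2.tv_sec - tv1.tv_sec)
		+ (tv2.tv_usec - tv1.tv_usec) / 1000000.0;
	*count += 1;
	return res;
}

Each SYNC would accumulate into one perf_mon and log its totals at the
end, giving exactly the provider/subscriber breakdown described above.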

I now have a patch that seems to work reasonably well:

http://lists.slony.info/pipermail/slony1-patches/2008-July/000039.html

Jan, you had warned of concerns about multiple threads; I *think* that
this "immunizes itself" in that sync_helper() and sync_event() each
instantiate their own copy of the structure, pm, with "auto" extent, so
that if multiple threads are operating concurrently, each one gets its
own independent copy of pm on its own stack.
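
In shape (reusing the hypothetical perf_mon type from the sketch above;
real parameters and bodies elided), that reasoning looks like:

/* Sketch only -- not the actual function body from the patch. */
static void
sync_event(/* parameters elided */)
{
	perf_mon	pm = {0};	/* automatic storage: a fresh copy per
							 * call, on this thread's own stack */

	/*
	 * ... SYNC processing accumulates counts and elapsed times into
	 * pm, e.g. via timed_exec(conn, query, &pm.provider_query_t,
	 * &pm.provider_queries) ...
	 */

	/* ... and pm's totals get logged before returning. */
}

Since pm is never shared between threads, no locking should be needed.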

Perhaps I am woefully wrong, though :-).

I think I'd like to give the variables better names, and Jan suggested
adding a config variable to make this data collection optional.
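
That could be as simple as something like this (the option name
monitor_sync_timing is invented here; it would be whatever we settle
on for slon.conf), again reusing timed_exec() from the earlier sketch:

extern int	monitor_sync_timing;	/* invented config option; 0 = off */

static PGresult *
maybe_timed_exec(PGconn *conn, const char *query,
				 double *elapsed, int *count)
{
	if (!monitor_sync_timing)
		return PQexec(conn, query);		/* zero-overhead path */
	return timed_exec(conn, query, elapsed, count);
}

With the option off, we never call gettimeofday() at all.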

Jan, please browse and see if there are any woeful misapprehensions...
-- 
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/lsf.html
Rules  of the  Evil Overlord  #145. "My  dungeon cell  decor  will not
feature exposed pipes.  While they add to the  gloomy atmosphere, they
are good  conductors of vibrations and  a lot of  prisoners know Morse
code." <http://www.eviloverlord.com/>

