Thu Jul 17 15:54:28 PDT 2008
[Slony1-general] Slow Replication issue
chris <cbbrowne at ca.afilias.info> writes:

> Christopher Browne <cbbrowne at ca.afilias.info> writes:
>> Over lunch, Jan and I had a chat about this; it looks like we don't
>> report quite comprehensive enough information in the logs to make it
>> easy to interpret what parts of SYNC processing are consuming what
>> time.
>>
>> The "straw man" idea we came up with is to do a much better breakdown
>> of the time, in particular, to record:
>>
>>  - time spent in pqexec() against the provider, broken down into...
>>    - time spent determining which transactions are part of the SYNC group
>>    - time spent processing the LOG cursor
>>  - time spent in pqexec() against the subscriber (the I/U/D phase)
>>  - numbers of pqexecs()
>>    - against the provider
>>    - against the subscriber
>>  - possibly, the number of times we grab timestamps
>
> And let us augment this with the number of "large tuple" fetches...
> That will be really cheap from a gettimeofday() perspective, but gives
> us a good idea of how much flow interruption takes place.

To better explain the above: at present, we wind up needing to do quite
a lot of very deep guesstimating in order to infer where performance
bottlenecks may lie. If we report these various values, namely:

 - how many pqexecs, and
 - how long those pqexecs took

 a) against the data provider, which tells us how expensive it was to
    PULL the sync,
 b) against the subscriber, which tells us how expensive it was to load
    the data in there, and
 c) for large tuples that break up the usual "do 100 rows at a time"
    behaviour,

that may be expected to allow people to MUCH more readily determine
where the bottlenecks are. There tend to be three characteristic ones:

 1. The data provider may get overloaded. "Why" is another question :-).
 2. The subscriber may be less well appointed, and it might be getting
    overloaded.
 3. The problem might be with the network connection; the slon may be
    too far away from, well, something...

With this data, it should be easier to distinguish between these
scenarios.

I now have a patch that seems to work reasonably:
http://lists.slony.info/pipermail/slony1-patches/2008-July/000039.html

Jan, you had warned of concerns about multiple threads; I *think* that
this "immunizes itself" in that sync_helper() and sync_event()
instantiate their own copies of the structure, pm, with "auto" extent,
so that if there are multiple threads operating concurrently, each one
should have its own independent copy of pm on its stack (see the P.S.
below for a sketch of both points). Perhaps I am woefully wrong,
though :-).

I think I'd like to give the variables better names, and Jan suggested
adding a config variable to allow making this data collection optional.

Jan, please browse and see if there are any woeful misapprehensions...
-- 
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/lsf.html
Rules of the Evil Overlord #145. "My dungeon cell decor will not
feature exposed pipes. While they add to the gloomy atmosphere, they
are good conductors of vibrations and a lot of prisoners know Morse
code." <http://www.eviloverlord.com/>
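
P.S. To make the straw man concrete, here is a minimal sketch of the
sort of accounting structure and timing helper involved. This is
illustrative only; the type and function names here (perfmon_state,
start_measure(), end_measure()) are invented for this note and are not
necessarily what the patch itself uses:

    #include <sys/time.h>

    /* Accumulators for one SYNC; field names invented for illustration. */
    typedef struct perfmon_state
    {
        double provider_query_t;     /* time in pqexec() against the provider   */
        double subscriber_query_t;   /* time in pqexec() against the subscriber */
        double log_cursor_t;         /* time spent processing the LOG cursor    */
        int    num_provider_execs;   /* pqexec() calls against the provider     */
        int    num_subscriber_execs; /* pqexec() calls against the subscriber   */
        int    num_large_tuples;     /* fetches breaking the 100-row batching   */
        struct timeval t0;           /* scratch: start of current measurement   */
    } perfmon_state;

    static void
    start_measure(perfmon_state *pm)
    {
        gettimeofday(&pm->t0, NULL);
    }

    /* Elapsed seconds since start_measure(); the caller adds the result
     * into whichever accumulator applies and bumps the matching counter. */
    static double
    end_measure(perfmon_state *pm)
    {
        struct timeval t1;

        gettimeofday(&t1, NULL);
        return (t1.tv_sec - pm->t0.tv_sec)
            + (t1.tv_usec - pm->t0.tv_usec) / 1000000.0;
    }

Each pqexec() call site brackets itself with start_measure() /
end_measure(), and the totals get reported in the SYNC log output,
which is where the a/b/c breakdown above would come from.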
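
The thread-safety point, in miniature (again using the hypothetical
perfmon_state from the sketch above): because pm is declared as an
ordinary local variable, each thread running the function gets its own
copy on its own stack, so no locking should be needed:

    #include <string.h>

    static void *
    sync_helper(void *cdata)
    {
        perfmon_state pm;           /* "auto" extent: one copy per thread's stack */

        memset(&pm, 0, sizeof pm);  /* zero the accumulators for this run */

        /* ... do the work, timing into pm as we go ... */

        return NULL;
    }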