Fri Jul 11 11:06:50 PDT 2008
- Previous message: [Slony1-general] Slow Replication issue
- Next message: [Slony1-general] Slow Replication issue
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Cyril SCETBON <scetbon at echo.fr> writes: >> Is does not seem to hurt performance as you said, but do you see >> anything else ? > If I check the different grouping size I see : > > count size > > 1 255 > 1 763 > 2 7 > 290 3 > > If I understand, you both mean that 301.017 seconds is the duration of > the processing of events from node 1 which size is 255 and not the 290 > groups of 3 events ? Right, 301.017s is the duration of the processing of events from node 1. It's not reporting any duration information for the other nodes in that it's not doing the work of: - searching for the scope of the SYNC group, - opening a cursor to pull the relevant log data for that scope, and - applying the changes to the subscriber. The processing of the 290 groups of 3 events is taking place concurrently with the Real Work, in separate threads, so it's not consuming any time that's worth measuring. Over lunch, Jan and I had a chat about this; it looks like we don't report quite comprehensive enough information in the logs to make it easy to interpret what parts of SYNC processing are consuming what time. The "straw man" idea we came up with is to do a much better breakdown of the time, in particlar, to record: - time spent in pqexec() against the provider, broken down into... - time spent processing what transactions are part of the SYNC group - time spent processing the LOG cursor - time spent in pqexec() against the subscriber (the I/U/D phase) - numbers of pqexecs() against provider against subscriber - possibly, the number of times we grab timestamps This could be costly to record (since it will call gettimeofday() quite a lot of times), so we'd add in an option as to whether to do this analysis or not. This would put more useful information in the logs, specifically, it would make it easier to see where the bottleneck really lies. For instance, if the log indicates that 297s are being spent in pqexec() on the provider, and only 4s on pqexec() on the subscriber, that makes it pretty that there's some problem with the queries searching for data to replicate. Reverse the figures, and it becomes clear that, for some reason, the subscriber is very slow in accepting updates. In either case, that is helpful in suggesting where to look for improvements. To be sure, we'd not focus on the place with 4s; we'd focus on the 297s! Improvements welcome; I'll probably put off starting to implement anything until next week. -- select 'cbbrowne' || '@' || 'linuxfinances.info'; http://cbbrowne.com/info/lsf.html Rules of the Evil Overlord #145. "My dungeon cell decor will not feature exposed pipes. While they add to the gloomy atmosphere, they are good conductors of vibrations and a lot of prisoners know Morse code." <http://www.eviloverlord.com/>
- Previous message: [Slony1-general] Slow Replication issue
- Next message: [Slony1-general] Slow Replication issue
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list