Fri Jul 11 11:50:58 PDT 2008
- Previous message: [Slony1-general] Slow Replication issue
- Next message: [Slony1-general] Slow Replication issue
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Christopher Browne wrote: > Cyril SCETBON <scetbon at echo.fr> writes: > >>> Is does not seem to hurt performance as you said, but do you see >>> anything else ? >>> >> If I check the different grouping size I see : >> >> count size >> >> 1 255 >> 1 763 >> 2 7 >> 290 3 >> >> If I understand, you both mean that 301.017 seconds is the duration of >> the processing of events from node 1 which size is 255 and not the 290 >> groups of 3 events ? >> > > Right, 301.017s is the duration of the processing of events from node > 1. > > It's not reporting any duration information for the other nodes in > that it's not doing the work of: > > - searching for the scope of the SYNC group, > - opening a cursor to pull the relevant log data for that scope, and > - applying the changes to the subscriber. > > The processing of the 290 groups of 3 events is taking place > concurrently with the Real Work, in separate threads, so it's not > consuming any time that's worth measuring. > > Over lunch, Jan and I had a chat about this; it looks like we don't > report quite comprehensive enough information in the logs to make it > easy to interpret what parts of SYNC processing are consuming what > time. > > The "straw man" idea we came up with is to do a much better breakdown > of the time, in particlar, to record: > > - time spent in pqexec() against the provider, broken down into... > - time spent processing what transactions are part of the SYNC group > - time spent processing the LOG cursor > - time spent in pqexec() against the subscriber (the I/U/D phase) > - numbers of pqexecs() > against provider > against subscriber > - possibly, the number of times we grab timestamps > > This could be costly to record (since it will call gettimeofday() > quite a lot of times), so we'd add in an option as to whether to do > this analysis or not. > > This would put more useful information in the logs, specifically, it > would make it easier to see where the bottleneck really lies. > > For instance, if the log indicates that 297s are being spent in > pqexec() on the provider, and only 4s on pqexec() on the subscriber, > that makes it pretty that there's some problem with the queries > searching for data to replicate. > > Reverse the figures, and it becomes clear that, for some reason, the > subscriber is very slow in accepting updates. > I think that would be really a great improvement for log analysis > In either case, that is helpful in suggesting where to look for > improvements. To be sure, we'd not focus on the place with 4s; we'd > focus on the 297s! > > Improvements welcome; I'll probably put off starting to implement > anything until next week. > For now, I'll try to enable the log_duration and log_statement parameters in postgresql.conf file to investiguate. Thanks -- Cyril SCETBON
- Previous message: [Slony1-general] Slow Replication issue
- Next message: [Slony1-general] Slow Replication issue
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list