Fri Oct 15 23:30:31 PDT 2004
An issue we recently ran into was that our network guys would like to get a _clear_ picture as to how much bandwidth Slony-I is using. It has a potential impact on costs, as we have some bandwidth providers that charge extra for using more bandwidth than planned for. I'm sure many would be interested in getting a clearer picture as to how much of the bandwidth between their sites is getting "eaten up" by replication.

Here are the beginnings of "a thought" on how we might get some statistics (at this point incomplete, but hopefully still useful, at least provisionally) out of the slon processes.

A Method To Estimate Bandwidth Usage

A "first order" approximation of how much bandwidth is used might be obtained by recording the sizes of all the queries that get submitted. This would likely only reflect about half of the bandwidth usage, since for every "insert into table x values (a, b, c)" that goes to a subscribing node, the contents of that query were first retrieved from the provider node. That being said, knowing that "reality" is on the order of 2x the size of the queries submitted is still a useful thing to know.

A Mechanism To Determine Query Sizes

In the cases where slon submits queries, it virtually always does so via the following idiom:

    res = PQexec(dbconn, dstring_data(&query1));

dstring_data(x) is actually a macro that references a field in the query structure. One way to capture the size of the query would be to modify dstring_data(x) to surreptitiously measure the query with strlen() and add that value to a running counter. Alternatively, we might create a wrapper function and replace the calls to PQexec() with calls to it. That is probably less scary :-).

A Mechanism To Report Usage

There then needs to be a way to report the amount of query data that has passed through dstring_data(x)'s "mouth." I would suggest reporting it in the slon logs each time the cleanup cycle runs; that seems like a reasonably satisfactory interval. (Rough sketches of both the wrapper and the reporting call are appended after my signature.)

A Method That Is Incomplete

The above methodology is conspicuously incomplete in three ways that I can see:

- It does not make any attempt to measure the amount of bandwidth consumed by the results returned by queries. The size of a result set would be pretty messy to rummage through as a binary data structure. That being said, the results one would expect to be of material size are those coming from queries against the sl_log tables on the provider node, which then lead to queries being submitted to the subscriber node. We would expect those two to be of roughly equal size, which implies that multiplying the query sizes by 2 is a reasonable estimate for heavily updated databases.

- It does not make any attempt to separate statistics for "requests" going to the provider from those for updates going to the subscriber.

- It uses "size of queries" as a metric for "bandwidth used." That is necessarily only an approximation.

Nonetheless, if we can get the "low hanging fruit" of being able to easily provide _some_ numbers, that may suffice to allow the creation of estimates that are, if not exact, at least based on a rational process. It is better to have some numbers than to have no numbers...

-- 
"cbbrowne","@","ca.afilias.info" <http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)
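
To make the wrapper idea a little more concrete, here is a minimal sketch of what it might look like. The names used here (slon_pqexec, total_query_bytes, total_query_count) are made up for illustration and do not exist in the Slony-I source; and since slon is threaded, a real version would need a mutex (or per-thread counters) around the accumulation.

    #include <string.h>
    #include <libpq-fe.h>

    /* Running totals of SQL text handed to PQexec(); illustrative only. */
    long long total_query_bytes = 0;
    long long total_query_count = 0;

    /* Wrapper that tallies the size of each query before submitting it. */
    PGresult *
    slon_pqexec(PGconn *dbconn, const char *query)
    {
        total_query_bytes += (long long) strlen(query);
        total_query_count++;
        return PQexec(dbconn, query);
    }

Callers would then change

    res = PQexec(dbconn, dstring_data(&query1));

into

    res = slon_pqexec(dbconn, dstring_data(&query1));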
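
The reporting side could then be as simple as the following, assuming the counters above are visible to the cleanup thread. In slon itself this would presumably go through the usual logging facility rather than stderr; stderr is used here only to keep the sketch self-contained.

    #include <stdio.h>

    extern long long total_query_bytes;
    extern long long total_query_count;

    /* Called once per cleanup cycle to report accumulated query traffic. */
    void
    report_query_traffic(void)
    {
        fprintf(stderr,
                "cleanup: %lld queries submitted, %lld bytes of SQL; "
                "roughly double the byte count to estimate total bandwidth\n",
                total_query_count, total_query_bytes);
    }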